To build and run:

	sh mbtest.mk  # To build

	./iriw --ncycles 10000000 --jitter 8 --v  # To run IRIW

	Arguments:

	--cpuoffset n

		Start allocating CPUs at CPU n.

	--jitter n

		Fuzz start time relative to TBR.

	--ncycles n

		Number of passes through the test.  32-bit number.

	--noise_size n

		Number of cache lines to dump into store buffer.

	--test_cycle_tb_mask 0xhh

		When to start relative to TBR.

	--v

		Print arguments.  Highly recommended as -last- argument.


Filename conventions for .c files beginning with "mb_":

	b	Memory barrier
	e	"eieio" instruction (POWER-specific) -- orders stores
	h	Heavy-weight memory barrier
	i	Increment
	l	Load
	s	Store
	w	While loop (repeated load)

	Thus mb_lhs_ws.c has one thread doing a load, a heavy-weight
	barrier, and a store, and another thread doing a while loop
	followed by a store.

Variables available (all of type "long"):

	Cache line 0: state.a, state.a1, state.a2
	Cache line 1: state.b, state.b1, state.b2
	Cache line 2: state.c, state.c1, state.c2
	Cache line 3: state.d, state.d1, state.d2
	Cache line 4: state.e, state.e1, state.e2
	Cache line 5: state.v, state.v1, state.v2
	Cache line 6: state.x, state.x1, state.x2
	Cache line 7: state.y, state.y1, state.y2
	Cache line 8: state.z, state.z1, state.z2

	Of these, the following are initialized to zero at the beginning
	of each test iteration:

		state.a, state.b, state.c, state.d, state.x, state.y,
		state.z, state.a1, state.b1, state.c1, state.d1

	The others are not initialized at all, but perhaps should be,
	perhaps only at the very beginning of the test sequence.  But if
	you need more to be initialized, define the "INIT_VARS_EACH_CYCLE"
	macro with your desired initialization, invoked at the beginning
	of each test cycle.

	The state.badcount variable should be incremented when an error
	is detected.  The state.anomalies may be used as well to count
	questionable but legal sequences.  When it comes to memory
	ordering, the difference between an "error" and an "anomaly"
	can of course be somewhat subjective.  That said, the first
	time "badcount" is incremented, the working state will be dumped,
	while "anomalies" is silent.

Code sequences:

	Code sequences are specified by defining as many of the THREAD_0,
	THREAD_1, THREAD_2, THREAD_3, THREAD_4, and THREAD_5 macros
	as required.  You are required to define THREAD_0 and THREAD_1,
	the rest are optional.	If you need more than five threads,
	you will need to make the relevant modifications to mbtest.h.

	Thread 0 starts and stops the sequence.  Therefore, if you have
	an assertion that should be tested only after all threads have
	completed, put the assertion in thread 0 after ensuring that the
	value of state.f has reached zero.

Cache prewarming:

	Each element of the cache_preload[] array specifies the initial
	location of the cache line containing the specified variable.
	For example, consider the following:

	struct cache_preload cache_preload[] = {
		{ 1, &state.a },
		{ 2, &state.b },
		{ 2, &state.x },
		{ 1, &state.y },
		{-1, NULL },
	};

	Here, the cache lines containing the variables state.a and state.y
	will be pulled into a cache close to the CPU on which THREAD_0
	is running, and the cache lines containing the variables state.b
	and state.x will be pulled into a cache close to the CPU on which
	THREAD_1 is running.  Specifying that a given variable be located
	near more than one thread is legal, but probably an exercise in
	futility, with the result depending on the hardware in question.

	Specifying that a given variable should be located near a thread
	that you have not defined is equivalent to leaving that variable
	off the list.  Otherwise, any variables mentioned in this list
	will be initialized to zero.

Associating threads with CPUs

	The thread_assignment[] array controls which CPU each thread
	runs on.  For example, consider the following:

	struct thread_assignment thread_assignment[] = {
		{ 0, thread_0 },
		{ 2, thread_1 },
		{ 4, thread_2 },
		{-1, NULL },
	};

	This runs the code sequence specified by THREAD_0 on CPU 0, that
	specified by THREAD_1 on CPU 2, and that specified by THREAD_2
	on CPU 4.

	Note that specifying a given thread in the thread_assignment[]
	array without also defining the corresponding THREAD_ macro
	will get you a compiler error.	Defining a given THREAD_ macro
	without also placing a corresponding thread_ entry in the
	thread_assignment[] array will get you a run-time error.

	Note that the specific assignment of threads to CPUs required
	to force memory misordering scenarios can be extremely hardware
	dependent.  The assignment shown above might be appropriate for
	a system with two hardware threads per core.  Most tests will
	give the highest misordering rates if there are no levels of
	cache shared among the CPUs partipating in the test.
