qUark (quite Utilitarian arkitecture) quark is Utterly anarchistic, recidivist kids nFORTH = nanoFORTH: a viable stack-computer with 4-bit opcodes -- perhaps the irreducible ultimate in "software DeMorgan-ization"? v2.3 Feb 19, 1999 (this doc is verrry rough and crude) (c) vic plichota, original concept by Myron Plichota Dec '98, polished over the MISC list. ... (an almost) FORMAL DESCRIPTION: GLOSSARY: AKA: Also Known As cell-size: the nFORTH machine-implementation's native integer format iff: "if and only if" ILK: 'immediate' in-line constant, with the same number of bits as the cell-size IP: the Instruction Pointer, AKA 'program counter' N: Next-from-top entry on parameter-stack R: Top entry on return-stack T: Top entry on parameter-stack (. . .): parameter-stack diagram <. . .>: return-stack diagram |: pipe char used to indicate an "either-or" choice ... \ qUark dual stack machine instruction set: 32-bit cells and addresses \ Codespace is limited to the lowest 256Mcells. \ Upper pages are used for TBD on-chip and external peripherals and RAM. \ Eight 4-bit instructions are packed into each cell. \ Instruction slots are numbered 0..7 from ms nybl to ls nybl within the cell. \ Execution progresses from left to right. \ Most opcodes may occur in any slot, except JZ and CALL must occur in \ slot 0 with the absolute address in slots 1..7. \ LIT may occur in multiple slots with inline data in following cells. \ The PC is the only register in the programming model. It is 28 bits wide \ and is zero-extended to 32 bits when read. Memory-mapped devices are only \ mechanism provided to expand the computation hardware. \ The compiler generates inline code whenever \ [that would take less space than CALLing that word]. \ The assembler and compiler blur into one language. \ All words must be CALLable, whether they are inlineable or not. This is \ essential for the command line interpreter. \ Only words made entirely of 4-bit straight \ line code sequences up to a full octet or simple literals are [always inlined]. \ Examples would be TRUE, OVER, and OR. I want a smart compiler, but \ I don't want it to outsmart me;-)) My thinking is that if a word being \ compiled has conditionals or nests then you might as well call it anyway. \ [DAV: Right after a call, \ it takes less space to insert a full octet for another call \ than it does to insert 9 or more nybbles of inline code. \ However, immediately after a call and a 1 nybble opcode, \ it takes less space to inline a straight-line code sequence \ up to to 14 nybbles long \ than it does to pad out this cell with NOP \ and fill the next cell with the call. \ Obviously, when inlining takes exactly the same space \ as a CALL, inlining is faster. \ To minimize space, code sequences from 0 to 15 nybbles \ are all candidates for inlining. \ A really smart compiler, optimizing for space, \ always inlines \ straight-line code sequence from 0 to 8 nybbles, \ and would only CALL straight-line sequences \ from 9 to 15 nybbles \ when that would actually save space. \ ] \ Non-maskable interrupts: \ - reset \ - parameter stack overflow \ - parameter stack underflow \ - return stack overflow \ - return stack underflow \ - external NMI \ Maskable interrupts: \ - timer overflow \ - timer match \ - external IRQs \ Notes on instruction timing: \ \ Due to lack of a pipeline, no instruction or operand prefetching occurs. \ An octet fetch takes 1 cycle, followed by up to 8 instruction execute \ cycles. All instructions execute in 1 cycle. \ \ An octet fetch occurs during either of: \ - a CALL or JZ is executing in slot 0 \ - a RET is executing in any slot \ - all pending slots are NOPs \ - the slot 7 instruction is executing and not a LIT, @, or !. \ - the slot 7 instruction has finished executing. \ \ Note that CALL, JZ, and RET perform an octet fetch while executing, which \ achieves 1 cycle total execution time. [DAV: The branch instructions ( CALL and JZ and RET ) perform an octet fetch while executing, so they achieve 1 cycle total execution time before the fetched octet starts executing. In particular, a cell containing CALL or JZ executes in 1 cycle. A cell containing RET executes in 1 to 8 cycles. All other cells are straight-line code. The compiler always packs NOPs to the end of the cell, so cells with NOPs execute in 1 to 8 cycles. If the slot 7 instruction is not a NOP, LIT, @, or !, the cell executes in 8 cycles. If the slot 7 instruction is a LIT, @, or !, the cells executes in 9 cycles. ] [DAV: This seems like a lot of special cases, and seems to be unnecessarily constrained by RAM cycle time. Perhaps it would be simpler for the CPU to run at 8x RAM cycle time, so that all cells take 8 cycles to load and 8 cycles to execute (a total of only 2 RAM cycle times) except for LIT, @, or !. ... ] \ Instruction set summary: \ NOP ( --) no operation \ is necessary for alignment padding within a given octet whenever the next \ instruction is JZ or CALL. The compiler will fill any remaining slots in \ the current octet with NOPs in that case. An octet fetch is triggered \ whenever the unexecuted slots are made up entirely of NOPs, therefore the \ maximum execution penalty for compiler-generated NOPs is 1 cycle per \ instance. \ JZ ( n --) jump if false \ is the only conditional provided. An unconditional jump may be affected \ by DUP DUP XOR JZ. The compiler always generates this instruction in slot \ 0, and no provision is made in hardware for dealing with misalignment. \ CALL ( --) ( R: -- PC) call subroutine \ The compiler automatically chooses whether to CALL or generate inline code \ for previously defined words. An explicit CALL forces the CALL strategy and \ may also be used to force compilation of CALLs to IMMEDIATE words. The \ compiler always generates this instruction in slot 0, and no provision is \ made in hardware for dealing with misalignment. \ RET ( --) ( R: addr --) return from subroutine \ may be used to compile an explicit return before the terminating ; which \ implicitly generates a RET. \ LIT ( -- n) "value" push inline cell, PC++ \ causes a push of the cell pointed to by PC onto the parameter stack. The \ PC is then incremented. Up to 8 LITs may occur in an octet. \ @ ( addr -- n) fetch cell at addr \ ! ( n addr --) store n to cell at addr \ + ( n1 n2 -- n1+n2) addition \ is the only addition operator. No carry is provided. Multiple-precision \ addition may be performed by extending downwards with 31-bit unsigned \ fields in the lsbs and using bit 31 as a carry. Precision can be \ extended in 31-bit chunks this way. The msb of the mscell is the sign bit. \ NAND ( n1 n2 -- n1NANDn2) negative and \ XOR ( n1 n2 -- n1^n2) exclusive-or \ 2/ ( n -- n/2) arithmetic shift right, sign retained \ DUP ( n -- n n) duplicate \ DROP ( n --) discard \ SWAP ( n1 n2 -- n2 n1) exchange \ >R ( n --) ( R: -- n) pop parameter stack to return stack \ R> ( -- n) ( R: n --) pop return stack to parameter stack \ \ Coding examples: \ HEX : TRUE ( -- FFFFFFFF) FFFFFFFF ; : FALSE ( -- 0) 0 ; : MIN- ( -- 80000000) 80000000 ; : MAX+ ( -- 7FFFFFFF) 7FFFFFFF ; : OVER ( n1 n2 -- n1 n2 n1) >R DUP R> SWAP ; : ROT ( n1 n2 n3 -- n2 n3 n1) >R SWAP R> SWAP ; : R@ ( -- n) ( R: n -- n) R> DUP >R ; : EXECUTE ( addr --) ( R: -- return@) >R ; : INVERT ( n -- ~n) DUP NAND ; : NEGATE ( n -- -n) DUP NAND 1 + ; : - ( n1 n2 -- n1-n2) DUP NAND + 1 + ; : AND ( n1 n2 -- n1&n2) NAND DUP NAND ; : OR ( n1 n2 -- n1|n2) DUP NAND SWAP DUP NAND NAND ; : +! ( n addr --) DUP >R @ + R> ! ; : 2* ( n -- 2*n) DUP + ; : 0= ( n -- flag=n==0) IF FALSE RET ELSE TRUE THEN ; : 0<> ( n -- flag=n!=0) IF TRUE RET ELSE FALSE THEN ; : = ( n1 n2 -- flag=n1==n2) XOR 0= ; : 0< ( n -- flag=n<0) MIN- AND 0<> ; : CMPSGN ( n1 n2 -- n1 n2 flag=msb1!=msb2) OVER OVER XOR 0< ; : < ( n1 n2 -- flag=n1 ( n1 n2 -- flag=n1>n2) SWAP < ; : <= ( n1 n2 -- flag=n1<=n2) OVER OVER < ROT ROT = OR ; : >= ( n1 n2 -- flag=n1>=n2) SWAP <= ; : ?DUP ( n -- 0|n n) DUP IF DUP THEN ; : ABS ( n -- |n|) DUP 0< IF NEGATE THEN ; : SQRT ( +n -- +n**0.5) \ rounded down >R 0 DUP \ adapted from W. Baden BEGIN DUP R@ < WHILE SWAP 1 + SWAP OVER 2* + 1 + REPEAT R> DROP DROP ; : MOVE ( src@ dest@ count --) OVER + >R ( -- src@ dest@) ( R: -- last@) BEGIN DUP R@ XOR WHILE OVER @ OVER ! 1 + SWAP 1 + SWAP REPEAT R> DROP DROP DROP ; : U+ ( lsc1 lsc2 -- carry:sum) \ 31-bit add with carry in msb MAX+ AND SWAP MAX+ AND + ; : ?C ( lsc -- lsc 1|0) \ generate carry if msb = 1 DUP 0< IF 1 RET ELSE 0 THEN ; : D+ ( d1 d2 -- d1+d2) >R SWAP >R ( -- lsc1 lsc2) ( R: -- msc1 msc2) U+ ?C ( -- lsc1+lsc2 1|0) R> + R> + ; ( -- d1+d2) ( R: --) : DNEGATE ( d -- -d) INVERT SWAP INVERT SWAP 1 0 D+ ; : D- ( d1 d2 -- d1-d2) DNEGATE D+ ; \ Compiler stack security checking is sacrificed to allow the following \ examples to work. \ A calculated constant defined via colon definition. MIN- 10000 / : LEAST-Q15 ( -- k) LITERAL ; \ An uninitialized variable defined via colon definition. HERE 1 ALLOT : MY-VAR ( -- @) LITERAL ; \ An initialized variable defined via colon definition. HERE 12345678 , : MY-IVAR ( -- @) LITERAL ; \ An uninitialized array defined via colon definition. HERE 10 ALLOT : MY-ARRAY ( +n -- @) LITERAL + ; [David Cary: tiny little additions: \ copy instruction pointer to T (*Must* be called, *cannot* be inlined) : IP>T ( -- IP ) R> dup >R ; : <= ( n1 n2 -- flag=n1<=n2) > 0= ; \ These versions of 0<> only use 1 LIT, rather than 2 LITs. \ But are they really any better than the first 0<> ? : 0<> ( n -- flag=n!=0) true swap dup if else swap then drop ; : 0<> ( n -- flag=n!=0) true swap dup if swap then swap drop ; see http://www.rdrop.com/~cary/html/computer_architecture.html end http://www.rdrop.com/~cary/mirror/nFORTH.txt ]