Computer Architecture

updated 2002-12-21.

Some notes on Computer Architecture. Very incomplete.

Contents:

[FIXME: should I put links to Beowulf here ?]

David also maintains related files:

``I wisdom dwell with prudence, and find out knowledge of witty inventions.'' -- Proverbs 8:12

news

computer architecture news

The latest computer architecture information is at

computer architecture comic strips

The Freedom CPU Project

DLX

is one of the architectures suggested for the F-CPU. Its major advantage is: gcc has already been ported to it, so we don't need to write a new compiler on top of everything else. So we can immediately compile the Linux kernel and other code for this chip *now*, before we even start doing anything else, which should make it easy to cache analysis and other trade-off analysis. Also, it means that when the hardware is turned on for the first time, it could boot directly into Linux, which would be cool.

(Can we really go to 64 bit architecture and still use this compiler ?)

(Can we really use a TTA architecture for the attached FPUs ?)

MMIX

is one of the architectures suggested for the F-CPU.

TTA

is one of the architectures suggested for the F-CPU.

Using FPGAs to simulate CPUs

(only with FPGAs can you have reconfigurable computing)

see also robot_links.html#PLD Programmable Logic (FPGA, PLD, CPLD, etc.) and the devices needed to program them.

see also simple_cpu for simple CPU architectures that one would think would be simple to implement in a FPGA.

see also vlsi.html#PCI_on_FPGA for FPGA devices compatible with the PCI buss.

Extremely Simple CPU Architectures

see also "Tiny robots" robot_links.html#tiny

I am fascinated by CPU designs that are extremely simple, approaching "minimal", in several (sometimes incompatible) senses of the word "simple".

Here "CPU", "MCU", and "PE" are almost interchangeable.

Often these characteristics exhibit synergy -- when you eliminate some opcodes, that eliminates some gates (making it lower-cost) and makes it easier to understand (less documentation required). Occasionally, though, driving a design to extreme simplicity according to one measure causes extra complexity in another area to compensate. This is called the the Turing tarpit #tarpit .

For purposes of robotics, "low-cost" and "Simple interface" are usually the dominant considerations. Some of the concepts of massively parallel processing are also present when one builds swarms of simple robots, but the kind of random communication between constantly-changing arrangements of simple robots is very different than the communication between the rigidly connected (most commonly in a 2 D mesh or a hypercube) elements of common cellular automata and multi-processors.

See also Using FPGAs to simulate CPUs #FPGA for some very simple CPUs designed to fit onto FPGAs. 2 very different reasons for simplicity there: (a) a simpler CPU can fit onto a smaller FPGA, making it much less expensive. (b) making the CPU smaller allows you to fit more copies of the CPU on a given FPGA, increasing MIPs at no cost. [FIXME: should combine these into 1 section ? Since the same op-code set may be implemented many ways, in TTL, in FPGA, in custom VLSI, it doesn't really make sense to split them into seperate sections ... On the other hand, ``less hardware'' means something a little different in these 3 technologies. ]

[here I *list* simple CPUs I know about; Opcode considerations #considerations and #FPGA also talk about tips for designing new CPU architectures. ]

... Here I also list all the CPUs I know about that were built up out of TTL.

cellular automata

cellular automata is related to other interests I have:

[FIXME: CA stuff scattered elsewhere ...]

The cellular automata questions I'm most interested in are:

There's a little loop here -- first we use (simulated) cellular automata to learn about replication, then we use that information to design replicating tools. We also use (simulated) cellular automata to explore good rules and good patterns for computronium. Then we use those replicating tools to (build enough copies of themselves to) build computronium. Then we use *that* computronium (hardware cellular automata) as a better computer.

self-replication

It has been shown that self-replicators can be designed in cellular automata #cellular_automata . some of this deals with ``real'', physical replication. While other parts -- cellular automata, quines, etc -- deal only with patterns in a computer. Should I separate them ? But some of the theory applies to both.

related to reconfigurable robots robot_links.html#reconfigurable and robot construction (humans building robots) robot_links.html#construction and tool closure 3d_design.html#closure . and the bootstrap problem [FIXME:] [FIXME: cross-link all the self-replication stuff on my web pages. nano, robot, cellular automata, etc. Point back and forth from ``replication'' section to computer architecture # cellular automata robots nanotech idea_space and unknowns ]

Minimum Instruction Set Computing (MISC)

see #simple_cpu and minimal_instruction_set.html

other attempts at a minimal instruction set

Steamer16: a high-performance homebrewer's microprocessor http://www3.sympatico.ca/myron.plichota/steamer1.htm by Myron Plichota <myron.plichota at sympatico.ca>. written in VHDL and prototyped on a single wire-wrapped protoboard. Has only 7 instructions. Packets of 5 instructions (5 instructions * 3 bits = 15 bits/packet) execute in 6 cycles/packet at a cycle rate of 20 MHz. Both the address and data busses are 16 bits. Unusual ``ArF'' call/return protocol.

qUark ../mirror/quark.txt a viable stack-computer with 4-bit opcodes (c) vic plichota, original concept by Myron Plichota Dec '98.

------------------------------
Date: Sat, 20 Feb 1999 12:54:34 +0300
From: "Stas Pereverzev" 
To: MISC
Subject: Re: nFORTH v2.3
Content-Type: text/plain;
	charset="koi8-r"
Content-Transfer-Encoding: 7bit

>Comments, folks?


You need only five instructions, not 16. They are:

ALU:
1. nand
2. shr

RAM:
3. store ( addr n -- )
4. lit   ( -- n )

CONTROL:
5. ncret  \ JUMP to addr in N, if carry flag isn't set in T,
          \ also drop both T and N


Also, if PC is memory variable (or can be addressed as memory variable)
we can awoid "ncret" instruction:
In that case we sholud use NCSTORE instead STORE:
ncstore (addr n flag -- )

ncstore (addr n 0 ) - same as store,
ncstore (PC n -1 )  - same as ncret

We need only 2 bits per instruction in that case.

That all folks ;-)

Stas.

------------------------------

(does this really work ???)

Turing tarpit

Turing Tar Pit.

Opcode considerations

Some things you might think about if you are designing a new instruction set (for a Von Neumann machine).

subroutines

[FIXME: I have a lot of information on subroutines and the importance of well-factored code scattered about ... should I gather it all together here, or since it's so very important, leave a brief reference at each place ?]

Taxonomy of Multiprocessors

also see Koopman's taxonomy of microprocessors [FIXME: ]

_Computation Structures_ by Stephen A. Ward and Robert H. Halstead, Jr. (c)1990 by The Massachusetts Institute of Technology The MIT Press

p.606 Models of Computation The von Nuemann model of computation ... alternative models ... include:

The above list is neither exhaustive nor even representative...

... p.612 Taxonomy of Multiprocessors ... Michael Flynn's taxonomy ...

Computation Structures

Ward and Halstead

quotes from _Computation Structures_ by Stephen A. Ward and Robert H. Halstead, Jr. (c)1990 by The Massachusetts Institute of Technology The MIT Press

... p. 347 Programming languages offer several /classes/ (allocation disciplines) for storage. These differ ... in the degree to which responsibility for allocation and deallocation of storage is assumed implicitly by the system (as opposed as being left to the programmer).

... common storage classes:

... p.354 the /general-register/ organization, in which ... active registers are available for explicit use by programs to hold frequently accessed data. ...

Proponents of the general-register organization cite two relatively independent argument in its favor:

... objections to the general-register scheme: * Lack of implementation transparency. The number of general registers is effectively "frozen" in the machine-language specification; a higher-performance implementation, offering a greater number of registers, will require modification of the machine langaguage and hence of the software designed for it. * Programming difficulty. The need to allocate registers to variables complicates the programming task.

... There are alternatives to the general-register organization that offer, at some additional implementation cost, its performance and compactness advantages while avoiding the above objections. ... cache-memory ... ... tricks for address encoding ... [DAV: is it really possible to have a "no programmer-visible registers" architecture ?]

... p.365 Pascal and Ada are also concerned with /safety/ in the sense that they strive to make as many programming errors as possible detectable at compile time rather than run time.

neural networks (NNs)

[FIXME: gather other NN information scattered over my other pages here.]

minimal-gate CPU

[FIXME: move to #simple_cpu

http://www.dejanews.com/getdoc.xp?AN=411378028 says:

From: mj@isy.liu.se (Michael Josefsson)
Subject: How to do the smallest processor ever?
Date: 13 Nov 1998 00:00:00 GMT
Newsgroups: comp.arch

Hi all,

while everybody seem to make everything as parallell as possible (to get the
speed up), I tinker with the idea of making the easiest possible processor.

Wherever I can, I want to get away from hardware. So these are some ideas that I have
come up with (discussions baits perhaps):

1. Make it serial(!). Yes, it slows down things but OTOH there is only 1 bit
busses. Adding 2 numbers (8-bit) would require a great number of clocks in
multiples of eight. Slow? Yes! But feasible.

2. The original idea used only one 1 bit bus but I have come to the conclusion
(opposite the reason to do this in the first place) that it is far more easy
to implement something using 2 or 3 buses.

3. Minimise the number of registres. Yes a PC, IR and AR (address register)
would be hard to get by without, but otherwise it would suffice to have only
one other register - the accumulator. Incrementing PC is done with the ALU.

4. External memory is always parallell so I imagine that there would be 8
bit busses to the outside.

5. Speed would be slow with some 10-15 clocks to get the smallest operation
done. Given a TTL implementation the total number of mips, at 10 MHz say,
would be around 0.7 - not much, but then again what would one expect from a
handful of TTLs?

Some ASCII graphics might help



                (for shifts)                        * 1bit buses everywhere!
           |-----------<------------                * Accumulator knows how to shift
	|  -|\   ------------------ |               * Everything is clocked bit-by-bit
        |---| |--|8bit accumulator|--------|        * PC and IR better parallell?
        |   |/   ------------------        |
        |                                  |
        |                                  |
        |                    0/1(ld/pc++)  |            Proc Control
        |              /|     |            |            /\/\/\
        |            /  |-----/------------|             | | |
        |-----------|ALU<                             --------------
        |            \  |---------/-------------|-----|IR/OP decode|
        |              \|         |             |     --------------
        |           1bit ALU      |             |
        |                         |             |
        |          -------------  |             |
        |----------|Program cntr|-|       --------------
        |          -------------  |       |Parallell to|
        |                         |       |serial conv.|
        |          -------------  |       --------------
        |----------|Addr reg    |-|             |
        |          -------------  |      From ext memory
        |                         |
        |                         |
   -----------               -----------
   |serial to|               |serial to|
   |parallell|               |parallell|
   |converter|               |converter|
   -----------               ----------
        |                         |
        V                   Ext address
    To ext memory




Has this been done anywhere? The focus seems to get everything as fast as
possible and not as small as possible.


Cheers,
/Micke Josefsson
University of Linkoping
Sweden

One reply http://www.dejanews.com/getdoc.xp?AN=411507856.1

From: Gary Boone <gary@micromethods.com>
Subject: Re: How to do the smallest processor ever?
Date: 13 Nov 1998 00:00:00 GMT
X-Posted-Path-Was: not-for-mail
Content-Type: text/plain; charset=us-ascii
X-ELN-Date: Fri Nov 13 09:05:21 1998
Organization: Micro Methods
Mime-Version: 1.0
Newsgroups: comp.arch

Regarding minimal processor history aspects:

1. See "Computer Structures ..." Siewiorek, Bell & Newell,
   1982, beginning at page 581, for a description of TMS1000
   microcontrollers, an increased function (and cost
reduced)
   re-design based on TMS0100 architecture that I worked on.

2. In 1971 "state of the art" 10-micron 1-metal PMOS,
TMS0100
   4-bit data paths were used, not for performance, rather
to
   allow less chip area (and single-chip design) compared to
   then-conventional 1-bit serial data paths. Consider, for
   example, area savings due to 3-transistor RAM compared to
   conventional 6-transistor shift register memory. Control
   decode for 4-bit data paths also saved (no "bit" timing).

3. The "most minimal" processor I know of was made at
Litronix
   circa 1976, for use in digital watches. As I recall, this
   processor could only count, not even add. The architect
was
   Steve McCrystal. I've lost track of my copy of his
amazing
   instruction set. Steve, if you see this, please post it!

Gary Boone

The Motorola MC14500B http://www.microprocessor.sscc.ru/great/s2.html#MC14500B http://www.questlink.com/keyword.htm?proc=keyword&view=list&kwrd=14500 http://www.questlink.com/page.htm?proc=brief&part=MC14500B&vendor=11205&brief=22059&view=153540 had only 16 pins; processed everything serially.

The Museum of HP Calculators http://www.hpmuseum.org/ mentions some zero-IC calculators (slide rules, discrete transistors).

stack machines

(see Minimum Instruction Set Computing (MISC) for more stack machines)

see also data_compression.html#space-optimizing_compiler and data_compression.html#compact_instruction_set for more thoughts on compact, dense, non-bloated code.

operating systems development (your own OS)

David Cary thinks that if you know enough to develop an adequate OS, your time and skill could be applied to other projects more worthy of your effort todo.html .

Nevertheless, there's nothing like producing a system where you wrote every byte of its software. (It may be a waste of time, but it's sure a learning experience). You might want to start with a simple PIC robot_links.html#pic .

See also

[FIXME: consider moving to http://c2.com/cgi/wiki?OperatingSystemsResearch ... http://c2.com/cgi/wiki?OperatingSystemsDesignPrinciples ... or perhaps http://cliki.tunes.org/Microkernel-based%20OS ]

David Cary is fascinated by the variety of computer architectures, and nearly all of the "layers of abstraction" idea_space.html#level of a computer system (from the raw atoms, electrons, and photons at the bottom ... through typography ... to the user interface at the surface). However, I am singularly uninterested in the OS layer. I've seen this sequence of events far too often:

  1. Someone proposes a cool new computer architecture.
  2. then insists that a cool new OS must be written for it.
  3. Which means that all-new applications must be written for this OS.
  4. The time and effort of all these software projects drains the energy of the visionary. Time and resources are spread too thin.
  5. None of these projects get enough critical mass to finish a "killer application".
  6. all this effort comes to naught.

(For example of the ``everything must change'' mentality, see ``a major architectural overhaul is still required, on pretty much all fronts'' -- Ron http://www.cs.caltech.edu/~adam/LOCAL/faq-fork.html#munchkins , or check out Luke 5:36-38 )

Why re-invent the wheel ? There are already far too many OSes out there (in my opinion). Alan Turing proved that any OS can be ported to any computer architecture. Of course we want systems that are better in *every* area, but (correct me if I'm wrong here) it should be possible to just improve one area at a time. ( That's the main idea behind "scaffolding" and "levels" idea_space.html#level )

If you're interested in computer architecture, fine, invent a new architecture, and get an ugly old OS running on this clean new hardware (I hear that Linux is *designed* to be easy to port).

If you want to make your own OS, fine, get it running on ugly old hardware and port a bunch of the ugly old applications to it (there are tons of "free source" applications *designed* to be easy to port .

There's tons of applications (both open and closed source) that are written in C or C++. So you might consider

Porting "gcc" or another C compiler #porting_c to your OS. Once you have the C compiler and C libraries ported to your OS, suddenly thousands of programs "just work".

Porting a FORTH or FROTH compiler http://sourceforge.net/projects/froth/ is much easier than gcc, but not as many applications are written in this language.

If you're really serious about creating your own OS, here's some links I think you might be interested in.

First we have some general links relevant to developing a OS:

Next we have lots of stuff about particular OS ideas (usually embedded in a specific OS):

Some systems have the GUI mixed up in the OS. Other systems have a distinct GUI layer between between the application and the OS.

[FIXME: should I split out Beowulf and similar "operating systems" into a seperate category ?]

Hope you have fun !

compiler

There seems to be some confusion about what a compiler can and cannot do.

Here I have information about compilers in general. Also see Porting C compilers #porting_c

I'm especially interested in:

compiler quotes

Porting C compilers

Porting the "gcc" compiler; Porting other C compilers

see also general compiler quotes #compiler

2 sections here:

Porting "gcc", a full-size commercial-strength open-source compiler, and re-targeting it as a cross-compiler.

porting simpler C compilers, especially ones targeting very simple processors such as the Microchip PIC, especially ones that optimize space over time. This includes "Small C", a highly restricted subset of C (no structures; DAV finds this annoying) that was designed to require a far simpler compiler than one that can handle the full C language.

other C Compilers:

  • Small C cross-compiler for the Motorola DSP56800 http://home.attbi.com/~petegray

    Since the full source to GCC is so large, perhaps it would be good to get a simple C compiler working first on a new OS or architecture.

    (Does it make sense to bootstrap gcc using Small C or another C compiler ? Or should we just use gcc on a previous machine as a cross-compiler ? )

    information about porting "gcc":

    low power design

    low power design [Should this be moved to vlsi.html ?]

    Designing for low power has secondary benefits ... assembling a quiet PC hardware_david_uses.html#quiet_pc

    There are many levels to low-power design. To get a low-power web server (top level) requires software that doesn't waste power, a low-power CPU, low-power clocking, and low-power circuits (including low-power oscillator). [FIXME: the last section is over at schematic.html; should I break up the other levels into their own sections ?]

    see also:

    external links:

    top N books on CPU development

    historical computer architectures (computer museums)

    Here I point to a few museums that list lots and lots of computer architectures. [FIXME: delete all the redundant stuff on this page that can be found easier at these ``museums''].

    Some of these architectures can be simulated on FPGAs .

    Also see robot_links.html#ucontrollers for information about a few of the most popular architectures in production.

    dealing with interrupts

    dealing with interrupts: some ideas about interrupt-time code. [FIXME: make a clean seperation between peripheral interrupts here (which the CPU can safely ignore for one or two cycles), and MMU traps f_mmu.html which must block WRITE instructions before they overwrite ``protected'' memory. ] [FIXME: should I seperate out things that are only useful to CPU designers, vs. things that are only useful to CPU users ?]

    FORTH

    the FORTH programming language [FIXME: lots of links scattered elsewhere ...] [FIXME: should this section go elsewhere ? perhaps robot_links.html or to_program.html ? or its own dedicated page ? ]

    latest Forth news is, of course, at news:comp.lang.forth | http://groups.google.com/groups?group=comp.lang.forth .

    [FIXME: should I dump all these links over to one of the Forth Wiki ? ]

    ?]

    see also linux.html#style_guides

    see also machine_vision.html#fft for FFT programmed in FORTH.

    see also book.html for some books on forth [FIXME: move here, just leave link to here at book.html ?] [FIXME: should I seperate out 2 lists: one dealing with implementation (direct-threaded vs. indirect threaded, etc.) (perhaps useful for OS writers) and one dealing with programming (tutorials, advanced ideas, source code) ?]

    benchmarks

    benchmarks for computer architecture. CPU design goals. benchmarking.

    While faster is better, all else being equal, many times all else is not equal. Speed is becoming less important in computer design (many tasks can already be completed ``fast enough''). Power usage, reliability, ease-of-use, and other factors are becoming more important. We must make sure to test for the things we really want creed.html#really_want , and we don't need to test for things that are irrelevant.

    What do we really want ? Here are a few of the design goals for various systems. Naturally, if your particular goals heavily emphasize one of these, it will require a different architecture than one that emphasizes a different goal.

    CPU design goals: (see computer_architecture.html#CPU_goals )(2003-01-10)

    Over at data_compression.html#benchmark I list benchmark test images for image compression.

    assembling a quiet PC hardware_david_uses.html#quiet_pc

    One might think that comparing 2 pieces of software ``which one compresses my files smaller ?'', vs. comparing 2 CPUs ``which one executes my code faster / better / with less power'' are completely unrelated. But there's one tiny bit of overlap: data_compression.html#program_compression indicates that while dealing with compressed data and compressed executable code is often slower than uncompressed, sometimes there's a synergy -- if we compress it enough (using techniques I talk about there) so it all fits in a higher-level cache, then our programs run faster (what I'm talking about here).

    Over at bignums.html#cpu and at robot_links.html#ucontrollers and at vlsi.html#cRAM I list information about some historical CPUs. [FIXME: should I merge them ?]

    Why are there not more ``reliability'' benchmarks ?

    Other pages about lots of CPUs, reviews, compare and contrast, benchmarks, etc.

    hardware compilers

    unsorted

    unsorted


    Part of the

    Previous Utopia Webring Next
    Random Choice


    This page started 1998-04-21 and has backlinks

    known by AltaVista

    known by Yahoo!

    known by infoseek

    Send comments, suggestions, errors, bug reports to

    David Cary feedback.html
    d.cary@ieee.org.

    Return to index // end http://rdrop.com/~cary/html/computer_architecture.html