Wednesday, 28 July 2010

Thursday, 15 July 2010

Wednesday, 14 July 2010

Message from our President

Please be informed that today's and tomorrow's exams have been postponed. We'll follow the regular schedule until Saturday. The exams have been moved to Monday and Tuesday.

--Ronel James (2E President)

Monday, 12 July 2010

Chamber theater

Chamber theatre is a method of adapting literary works to the stage using a maximal amount of the work's original text and often minimal and suggestive settings. In Chamber Theater, narration is included in the performed text and the narrator might be played by multiple actors. Professor Robert S. Breen (1909-1991) introduced "Chamber Theater" to his Oral Interpretation Classes at Northwestern University in 1947. Northwestern's Professor of Performance Studies Frank Galati, who studied Chamber Theater with Dr. Breen, has directed highly acclaimed Chamber Theater Productions for the Goodman Theater and Steppenwolf Theater Companies in Chicago. Galati's Chamber Theater adaptation of John Steinbeck's The Grapes of Wrath won two Tony Awards on Broadway. One of the most famous and elaborate examples of chamber theatre is David Edgar's The Life and Adventures of Nicholas Nickleby, in which Charles Dickens's characters narrate themselves in third person. Set pieces are carried in and taken away during the performance, rather than between scenes, and objects may be represented in a mimetic manner. Another example is Matthew Spangler's stage adaptation of Khaled Hosseini's novel The Kite Runner.

http://en.wikipedia.org/wiki/Chamber_theatre

MICHELLE GOT THE HIGHEST SCORE ON THE NET2 QUIZ!

Congratulations!
Wow,
she got 33!


--ADMIN

Saturday, 10 July 2010

It's here! Weeee




WELCOME to the 2E BLOGSITE!

HANDOUTS (PC TROUBLESHOOTING)

About Different Types of Processors


There are many different processors on the market. However, there are only a few that you should consider purchasing. Whether you're buying a computer off the shelf, building it from scratch, or upgrading your CPU, you must put some time and thought into which processor to buy. The choice you make today will affect your computer's speed and functionality for years to come.

Types

1. There are two primary manufacturers of computer microprocessors. Intel and Advanced Micro Devices (AMD) lead the market in terms of speed and quality. Intel's desktop CPUs include Celeron, Pentium, and Core. AMD's desktop processors include Sempron, Athlon, and Phenom. Intel makes Celeron M, Pentium M, and Core mobile processors for notebooks. AMD makes mobile versions of its Sempron and Athlon, as well as the Turion mobile processor which comes in Ultra and Dual-Core versions. Both companies make both single-core and multi-core processors.

Features

2. Each processor has a clock speed, which is measured in gigahertz (GHz). A processor also has a front side bus, which connects it with the system's random access memory (RAM). CPUs also typically have two or three levels of cache. Cache is a type of fast memory that serves as a buffer between RAM and the processor. The processor's socket type determines which motherboards it can be installed on.
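As a rough illustration of why the front side bus matters (my own sketch, not from the handout; the figures are hypothetical), the bus's peak theoretical bandwidth is simply its effective clock multiplied by its width in bytes:

```python
# A rough sketch (mine, not the handout's): peak theoretical FSB
# bandwidth is the effective bus clock times the bus width in bytes.
def fsb_bandwidth_mb_s(fsb_mhz, bus_width_bytes=8):
    """Peak bandwidth in MB/s for a bus of the given width (default 64-bit)."""
    return fsb_mhz * bus_width_bytes

# An 800 MHz effective FSB with a 64-bit data path peaks at 6400 MB/s.
print(fsb_bandwidth_mb_s(800))  # -> 6400
```

Halving the FSB clock halves the ceiling on processor-to-RAM traffic, which is why the FSB figure matters when comparing CPUs.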

Function

3. A microprocessor is a silicon chip containing millions of microscopic transistors. This chip functions as the computer's brain. It processes the instructions or operations contained within executable computer programs. Instead of taking instructions directly off of the hard drive, the processor takes its instructions from memory. This greatly increases the computer's speed.

Considerations

4. If you're thinking about upgrading your processor yourself, you must check your motherboard specs first. The CPU you install must have the same socket size as the slot on the motherboard. Also, when you install a new processor, you may need to install a heat sink and fan. This is because faster processors produce more heat than slower ones. If you fail to protect your new CPU from this heat, you may end up replacing the processor.

Size

5. When it comes to processors, size matters. Whether you're buying a new computer or upgrading your old one, you must get the fastest processor you can afford. This is because the processor will become obsolete very quickly. Choosing a 3.6 GHz processor over a 2 GHz today can buy you several years of cheap computing time. Also check the speed of the front side bus (FSB) when purchasing your new computer or CPU. A front side bus of 800 MHz or greater is essential for fast processing speeds. The processor's cache is also important. Make sure it has at least 1 MB of last level cache if your computing needs are average. If you're an extreme gamer or if you run intensive graphics programs, get the processor with the largest cache that fits your budget. There can be hundreds of dollars' difference between the cheapest processors and the most expensive ones. However, investing just a little extra cash can get you a much better processor.

Benefits

6. Getting a processor with a dual, triple, or quad core can make a significant difference in the processing power of your computer. It's like having two, three, or four separate processors installed on your computer at one time. These processors work together to make your computer multitask faster and with greater efficiency. Getting a CPU with a larger front side bus can enhance the processor's ability to communicate with RAM, which will increase your computer's overall speed.

Functions of CPU Processor



A CPU processor or central processing unit controls the functions of most electronic products. The CPU accepts the input data, processes the information and sends it to the component that is in charge of executing the action. CPUs are also known as microprocessors and are at the center of any computer system. Although CPUs are most often thought of as a computer chip, they can also be found in many other electronic devices including cell phones, hand held devices, microwaves, television sets and toys.

History

1. The CPU evolved from miniature transistors and integrated circuits, which were developed in the early 1960s by IBM and other top technology companies of the time. By the early 1970s, integrated circuits were being manufactured commercially, and engineers took that technology and developed the CPU. Harnessing the transmission abilities of integrated circuits, engineers added the ability to process information and memory power. Combined, these elements became the core of the CPU. By the end of the 1970s, technology had reached the point where CPUs could be commercially produced and were the size of a fingernail.

During the 1980s, CPUs became a standard component in consumer electronics. They could be found in cameras, television sets and pocket calculators. By the next decade, the small size and cheap production cost of the CPU allowed computers to cross over from industry to the home. Today, engineers continue to fine-tune CPUs, making them smaller and more powerful.

CPU Parts

2. CPUs are made up of six key components, which work in conjunction to process and execute commands. The control unit is the brain of the CPU. This part receives the input data and decides where to send the processed information. The instruction cache is where the control unit's instructions are stored. Specific instruction data is loaded into the CPU when it is manufactured. The pre-fetch unit is the information portal. Input data goes through the pre-fetch unit, which stores a copy of the data before sending it on to be processed by the control unit. The decode unit translates the input instruction into binary code, which is then sent on to the ALU. The arithmetic logic unit, or ALU, receives the code from the decode unit and chooses the action needed to carry out the command. RAM and ROM are the CPU's memory cache. Here, all information that has been sent, received or preloaded is stored. Sections of the RAM and ROM can be accessed by the system user.

Process

3. There are a series of steps that a CPU performs to execute a command. Each command is handled individually and a CPU can process multiple commands in a matter of seconds. The more powerful the CPU, the faster the commands are processed.
1. A command is issued by the system user using an input device such as a keyboard or mouse.
2. The command is sent to the prefetch unit. The unit accesses the preloaded CPU memory to identify the command and sends it to the command unit.
3. The command unit determines what steps come next. This data is passed on to the decode unit.
4. The decode unit translates the data into binary code and sends it to the ALU.
5. The ALU changes the raw data into an actual command.
6. The ALU sends a copy of the command to the RAM or ROM before sending it back to the command unit.
7. The command unit sends the code to the part of the system that will actually perform the action.
8. The action is executed and the result is sent back to the user.
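The eight steps above can be sketched as a toy simulation (my own illustration; the function names and the string-based "binary code" are simplifications, not how real silicon works):

```python
# Toy walk-through of the command cycle described above, with each CPU
# part modelled as a plain function. Purely illustrative.
def prefetch(command):
    return {"raw": command}             # step 2: stage a copy of the input

def control_unit(packet):
    return packet["raw"]                # step 3: decide what happens next

def decode(instruction):
    # step 4: translate the instruction into a binary string
    return " ".join(format(ord(c), "08b") for c in instruction)

def alu(binary):
    # steps 5-6: turn the code into an executable action
    return f"execute[{binary[:8]}...]"

def run_command(command):
    packet = prefetch(command)          # step 2
    instruction = control_unit(packet)  # step 3
    binary = decode(instruction)        # step 4
    action = alu(binary)                # steps 5-6
    return action                       # steps 7-8: result returns to the user

print(run_command("ADD"))
```

Each command flows through the parts in a fixed order, which is why a faster clock (more steps per second) translates directly into more commands processed.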

Types

4. There are different types of CPUs; each type comes with varying degrees of speed, memory and preset instructions. The larger the CPU, the faster it can process, store and execute commands. A single-core CPU is the smallest unit available. It is usually found in smaller appliances that only perform a simple set of actions, such as a remote control or toy. Dual-core CPUs contain two command units and enough power and memory for most personal computers. Multi-core CPUs contain multiple command units. They are mainly used by large industrial electronic devices, servers, and network workstations.

Sizes

5. CPU size refers to the unit's power to perform tasks and the amount of memory space it contains. CPU size is measured in binary digits, called bits. Originally, CPUs contained 4 bits, but that has since evolved into 8 bits. 8-bit CPUs are the smallest and slowest components available and are used mostly in toys or household appliances.
16-bit and 32-bit CPUs have become the standard size and can be found in personal computers, laptops, cell phones and other electronic devices that perform a variety of tasks. 64-bit CPUs are becoming increasingly popular in high-end personal computers and laptops. There are also larger CPUs, which are usually used for industrial purposes.
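One way to see why bit size matters (a sketch of my own, not from the article): an n-bit word can represent 2**n distinct values, so each step up in width multiplies the range enormously:

```python
# Sketch: the number of distinct values an n-bit CPU can represent
# in one word is 2**n. These are the standard sizes named above.
for bits in (8, 16, 32, 64):
    print(f"{bits}-bit: {2**bits:,} distinct values")
```

An 8-bit word tops out at 256 values, while a 64-bit word can represent more than 18 quintillion, which is why wider CPUs can address far more memory and crunch larger numbers in one step.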

From AT to BTX:
Motherboard Form Factors

You've probably heard the term motherboard a thousand times, but do you know what it really means and how it relates to the rest of your computer?

The form factor of a motherboard determines the specifications for its general shape and size. It also specifies what type of case and power supply will be supported, the placement of mounting holes, and the physical layout and organization of the board. Form factor is especially important if you build your own computer systems and need to ensure that you purchase the correct case and components.

The Succession of Motherboard Form Factors

AT & Baby AT
Prior to 1997, IBM computers used large motherboards. After that, however, the size of the motherboard was reduced, and boards using the AT (Advanced Technology) form factor were released. The AT form factor is found in older computers (386 class or earlier). Some of the problems with this form factor mainly arose from the physical size of the board, which is 12" wide, often causing the board to overlap with space required for the drive bays.

Following the AT form factor, the Baby AT form factor was introduced. With the Baby AT form factor the width of the motherboard was decreased from 12" to 8.5", limiting problems associated with overlapping on the drive bays' turf. Baby AT became popular and was designed for peripheral devices — such as the keyboard, mouse, and video — to be contained on circuit boards that were connected by way of expansion slots on the motherboard.

Baby AT was not without problems however. Computer memory itself advanced, and the Baby AT form factor had memory sockets at the front of the motherboard. As processors became larger, the Baby AT form factor did not allow for space to use a combination of processor, heatsink, and fan. The ATX form factor was then designed to overcome these issues.

ATX
With the need for a more integrated form factor which defined standard locations for the keyboard, mouse, I/O, and video connectors, the ATX form factor was introduced in the mid-1990s. The ATX form factor brought about many changes in the computer. Since the expansion slots were put onto separate riser cards that plugged into the motherboard, the overall size of the computer and its case was reduced. The ATX form factor specified changes to the motherboard, along with the case and power supply. Some of the design specification improvements of the ATX form factor included a single 20-pin connector for the power supply, a power supply that blows air into the case instead of out for better air flow, less overlap between the motherboard and drive bays, and integrated I/O port connectors soldered directly onto the motherboard. The ATX form factor was an overall better design for upgrading.

micro-ATX
MicroATX followed the ATX form factor and offered the same benefits but improved the overall system design costs through a reduction in the physical size of the motherboard. This was done by reducing the number of I/O slots supported on the board. The microATX form factor also provided more I/O space at the rear and reduced emissions from using integrated I/O connectors.

LPX
While ATX is the most well-known and used form factor, there are also non-standard proprietary form factors which fall under the names LPX and Mini-LPX. The LPX form factor is found in low-profile cases (desktop model as opposed to a tower or mini-tower) with a riser card arrangement for expansion cards, where expansion boards run parallel to the motherboard. While this allows for smaller cases, it also limits the number of expansion slots available. Most LPX motherboards have sound and video integrated onto the motherboard. While this can make for a low-cost and space-saving product, they are generally difficult to repair due to a lack of space and overall non-standardization. The LPX form factor is not suited to upgrading and offers poor cooling.

NLX
Boards based on the NLX form factor hit the market in the late 1990s. This "updated LPX" form factor offered support for larger memory modules, tower cases, AGP video support and reduced cable length. In addition, motherboards are easier to remove. The NLX form factor, unlike LPX, is an actual standard, which means there are more component options for upgrading and repair.

Many systems that were formerly designed to fit the LPX form factor are moving over to NLX. The NLX form factor is well-suited to mass-market retail PCs.

BTX
The BTX, or Balanced Technology Extended, form factor, unlike its predecessors, is not an evolution of a previous form factor but a total break away from the popular and dominating ATX form factor. BTX was developed to take advantage of technologies such as Serial ATA, USB 2.0, and PCI Express. Changes to the layout with the BTX form factor include better component placement for back panel I/O controllers, and it is smaller than microATX systems. The BTX form factor provides the industry push to tower size systems with an increased number of system slots.

One of the most talked about features of the BTX form factor is that it uses in-line airflow. In the BTX form factor the memory slots and expansion slots have switched places, allowing the main components (processor, chipset, and graphics controller) to use the same airflow, which reduces the number of fans needed in the system and thereby reduces noise. To assist in noise reduction, BTX system-level acoustics have been improved by reduced air turbulence within the in-line airflow system.

Initially there will be three motherboards offered in the BTX form factor. The first, picoBTX, will offer four mounting holes and one expansion slot; microBTX will hold seven mounting holes and four expansion slots; and lastly, regular BTX will offer 10 mounting holes and seven expansion slots. The new BTX form factor design is incompatible with ATX, with the exception of being able to use an ATX power supply with BTX boards.

Today the industry accepts the ATX form factor as the standard; however, legacy AT systems are still widely in use. Since the BTX form factor design is incompatible with ATX, only time will tell if it will overtake ATX as the industry standard.

Did You Know...
ATX and Baby AT boards are approximately the same size, but the ATX board is rotated 90 degrees within the case to allow for easier access to components.

The motherboard is a vital component responsible for the smooth processing of data on a computer. Even a little damage to the motherboard can damage the entire system, because the motherboard is the place where all the hardware components attach. From the graphics card, RAM, and hard disk to the optical drive, everything is combined into one unit through the ports on the motherboard. Below is a list of general motherboard components:

  1. CPU Socket : The place where the processor is installed; its many small holes hold the processor's pins.
  2. RAM Slots : Places to install the memory modules (RAM). On a modern motherboard there are usually four DDR2 PC-6400 slots, with a capacity of up to 8GB and support for dual-channel configuration.
  3. Power Port : Older boards use a 20+4-pin connector, while newer ones use 24+4 pins. Through this port, all the power needed by the system is supplied by the power supply.
  4. Serial ATA Port : Newer motherboards use this port; usually there are 4 to 8 of them. It is used for hard disk drives with a SATA interface. The cable used is usually smaller than an IDE cable. SATA technology is growing rapidly; it has now reached the next generation, SATA-2, with data transfer speeds up to 3Gb/s.
  5. IDE Port : Found only on older motherboards. It is used to install hard disk drives and optical drives, and its cable is much wider.
  6. Chip-set : A chip that regulates the data traffic on the system. There are two chip-sets, each with a different function. The north-bridge chip-set is located between the processor and the RAM slots; it lets the RAM modules and the processor work together (managing the traffic of data between the CPU and RAM). The south-bridge chip-set is located at the bottom, either below the processor or under the slot for graphics cards. This chip-set handles the flow of data from the graphics card, hard disk and other motherboard peripherals.
  7. AGP / PCI Express Slot : Used to install the graphics card. The AGP slot is rarely found on newer boards, because the bandwidth it provides is not sufficient for today's graphics needs. Now almost all motherboards use the PCI Express slot, with its larger bandwidth. In fact, there are motherboards that have up to 4 PCI Express slots.
  8. PCI (Peripheral Component Interconnect) : The place to put additional cards such as a sound card, LAN card, TV tuner, and others. It works at a 33 MHz frequency.
  9. BIOS (Basic Input Output System) : Software embedded in the motherboard, kept powered by the motherboard battery. All initial configuration of the installed hardware can be accessed and changed through the BIOS.

Many motherboard manufacturers offer products complete with many additional features that increase compatibility with a wide range of hardware.

Motherboard Troubleshooting

A.)GENERAL TESTING TIPS.

Before you begin, download a few of our Diagnostic Software Tools to pinpoint possible problem areas in your PC. Ideally, troubleshooting is best accomplished with duplicate parts from a used computer, enabling "test" swapping of peripheral devices/cards/chips/cables. In general, it is best to troubleshoot on systems that have been leaned-out. Remove unnecessary peripherals (soundcard, modem, hard disk, etc.) to check the unworking device in as much isolation as possible. Also, when swapping devices, don't forget the power supply. An inadequate power supply (watts and volts) can cause intermittent problems at all levels, but especially with UARTs and HDs.

Inspect the motherboard for loose components. A loose or missing CPU, BIOS chip, crystal oscillator, or chipset chip will cause the motherboard not to function. Also check for loose or missing jumper caps, and missing or loose memory chips (cache and SIMMs or DIMMs). To possibly save you hours of frustration, I'll mention this here: check the BIOS Setup settings. 60% of the time this is the cause of many system failures. A quick fix is to restore the BIOS Defaults. Next, eliminate the possibility of interference by a bad or improperly set up I/O card by removing all cards except the video adapter. The system should at least power up and wait for a drive time-out. Insert the cards back into the system one at a time until the problem happens again. When the system does nothing, the problem will be with the last expansion card that was put in.
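The card-by-card elimination described above is essentially a linear search. A sketch in Python (the card names and the boot check are hypothetical placeholders, not real diagnostics):

```python
# Sketch of the one-at-a-time isolation procedure described above.
def find_bad_card(cards, boots_with):
    """Re-insert cards one at a time; the card present when the
    system first fails to boot is the suspect."""
    installed = ["video adapter"]   # start with the video adapter only
    for card in cards:
        installed.append(card)
        if not boots_with(installed):
            return card             # last card added -> suspect
    return None                     # all cards check out

# Example: pretend the modem is the faulty card.
suspect = find_bad_card(
    ["soundcard", "modem", "LAN card"],
    boots_with=lambda installed: "modem" not in installed,
)
print(suspect)  # -> modem
```

The same logic applies to swapping cables, memory, and power supplies: change one variable at a time so a failure points at exactly one component.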

B.)RESETTING CMOS.

Did you recently 'flash' your computer's BIOS and need to change a jumper to do so? Perhaps you left the jumper in the 'flash' position, which could cause the CMOS to be erased.

If you require a CMOS reset and don't have the proper jumper settings, try these methods: our Help Desk receives so many requests on Clearing BIOS/CMOS Passwords that we've put together a standard text outlining the various solutions.


C.)NO POWER.

Switching power supplies (the type most commonly used in PCs) cannot be adequately field-tested with V/OHM meters. Remember: for most switching power supplies to work, a floppy and at least 1 meg of memory must be present on the motherboard. If the necessary components are present on the motherboard and there is no power:

1) check the power cable to the wall and that the wall socket is working. (You'd be surprised!)
2) swap power supply with one that is known to work.
3) if the system still doesn't work, check for fuses on the motherboard. If there are none, you must replace the motherboard.


D.)PERIPHERAL WON'T WORK.

Peripherals are any devices that are connected to the motherboard, including I/O boards, RS232/UART devices (including mice and modems), floppies and fixed-disks, video cards, etc. On modern boards, many peripherals are integrated into the motherboard, meaning that if one peripheral fails, effectively the motherboard has to be replaced.* On older boards, peripherals were added via daughter boards.

*some MB CMOS's allow for disabling on-board devices, which may be an option for not replacing the motherboard -- though, in practicality, some peripheral boards can cost as much, if not more, than the motherboard. Also, failure of on-board devices may signal a cascading failure to other components.
1. New peripheral?

a) Check the MB BIOS documentation/setup to ensure that the BIOS supports the device and that the MB is correctly configured for the device.
(Note: when in doubt, reset CMOS to DEFAULT VALUES. These are optimized for the most generalized settings that avoid some of the conflicts that result from improper 'tweaking'.)
b) Check cable attachments & orientation (don't just look, reattach!)
c) If that doesn't work, double-check jumper/PnP (including software and/or MB BIOS set) settings on the device.
d) If that doesn't work, try another peripheral of same brand & model that is known to work.
e) If the swap peripheral works, the original peripheral is most likely the problem. (You can verify this by testing the non-working peripheral on a test MB of the same make & bios.)
f) If the swap peripheral doesn't work on the MB, verify the functionality of the first peripheral on a test machine. If the first peripheral works on another machine AND IF the set-up of the motherboard BIOS is verified AND IF all potentially conflicting peripherals have been removed OR verified to not be in conflict, the motherboard is suspect. (However, see #D below.)
g) At this point, recheck MB or BIOS documentation to see if there are known bugs with the peripheral AND to verify any MB or peripheral jumper settings that are necessary for the particular peripheral to work. Also, try a different peripheral of the same kind but a different make to see if it works. If it does not, swap the motherboard. (However, see #D below.)


2. Peripheral that worked before?

a) If the hood has been opened (or even if it has not), check the orientation and/or seating of the cables. Cables sometimes 'shake' loose or are accidentally pulled out by end-users, who then misalign or do not reattach them.
b) If that doesn't work, try the peripheral in another machine of the same make & bios that is known to work. If the peripheral still doesn't work, the peripheral is most likely the problem. (This can be verified by swapping-in a working peripheral of the same make and model AND that is configured the same as the one that is not working. If it works, then the first peripheral is the problem.)
c) If the peripheral works on another machine, double-check other peripherals and/or potential conflicts on the MB, including the power supply. If none can be found, suspect the MB.
d) At this point, recheck MB or BIOS documentation to see if there are known bugs with the peripheral AND to verify any jumper settings that might be necessary for the particular peripheral. Also, try another peripheral of the same kind but a different make to see if it works. If not, swap the motherboard!

E.)OTHER INDICATIONS OF A PROBLEM MOTHERBOARD.

1. CLOCK that won't keep correct time. >>Be sure to check/change the battery.

2. CMOS that won't hold configuration information. >>Again, check/change the battery.

Note about batteries and CMOS: in theory, CMOS should retain configuration information even if the system battery is removed or dies. In practice, some systems rely on the battery to hold this information. On these systems, a machine that is not powered-up for a week or two may report improper BIOS configuration. To check this kind of system, change the battery, power-up and run the system for several hours. If the CMOS is working, the information should be retained with the system off for more than 24 hours.


F.)BAD MOTHERBOARD OR OBSOLETE BIOS?

1. If the motherboard cannot configure to a particular peripheral, don't automatically assume a bad motherboard, even if the peripheral checks out on another machine -- especially if the other machine has a different BIOS revision. Check with the board manufacturer to see if a BIOS upgrade is available. Many BIOS upgrades can be made right on the MB with a FLASH RAM program provided by the board maker. See our BIOS page for more information.

Processor Types

ARM 1 (v1)

This was the very first ARM processor. Actually, when it was first manufactured in April 1985, it was the very first commercial RISC processor. Ever.
As a testament to the design team, it was "working silicon" in its first incarnation, it exceeded its design goals, and it used fewer than 25,000 transistors.

The ARM 1 was used in a few evaluation systems on the BBC micro (Brazil - BBC interfaced ARM), and a PC machine (Springboard - PC interfaced ARM).
It is believed a large proportion of Arthur was developed on the Brazil hardware.
In essence, it is very similar to an ARM 2 - the differences being that R8 and R9 are not banked in IRQ mode, there's no multiply instruction, no LDR/STR with register-specified shifts, and no co-processor gubbins.

ARM 2 (v2)

Experience with the ARM 1 suggested improvements that could be made. Such additions as the MUL and MLA instructions allowed for real-time digital signal processing. Back then, it was to aid in generating sounds. Who could have predicted exactly how suitable to DSP the ARM would be, some fifteen years later?
In 1985, Acorn hit hard times which led to it being taken over by Olivetti. It took two years from the arrival of the ARM to the launch of a computer based upon it...

...those were the days my friend, we thought they'd never end.
When the first ARM-based machines rolled out, Acorn could gladly announce to the world that they offered the fastest RISC processor around. Indeed, the ARM processor kicked ass across the computing league tables, and for a long time was right up there in the 'fastest processors' listings. But Acorn faced numerous challenges. The computer market was in disarray, with some people backing IBM's PC, some the Amiga, and all sorts of little itty-bitty things. Then Acorn went and launched a machine offering Arthur (which was about as nice as the first release of Windows), which had no user base, precious little software, and not much third-party support. But they succeeded.

The ARM 2 processor was the first to be used within the RISC OS platform, in the A305, A310, and A4x0 range, and was used on all of the early machines, including the A3000. It is clocked at 8MHz, which translates to approximately four and a half million instructions per second (0.56 MIPS/MHz).

ARM 3 (v2as)

Launched in 1989, this processor built on the ARM 2 by offering 4K of cache memory and the SWP instruction. The desktop computers based upon it were launched in 1990.
Internally, via the dedicated co-processor interface, CP15 was 'created' to provide processor control and identification.
Several speeds of ARM 3 were produced. The A540 runs a 26MHz version, and the A4 laptop runs a 24MHz version. By far the most common is the 25MHz version used in the A5000, though those with the 'alpha variant' have a 33MHz version.
At 25MHz, with 12MHz memory (a la A5000), you can expect around 14 MIPS (0.56 MIPS/MHz).
It is interesting to notice that the ARM3 doesn't 'perform' faster - both the ARM2 and the ARM3 average 0.56 MIPS/MHz. The speed boost comes from the higher clock speed, and the cache.
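The MIPS figures quoted here follow from a simple product, MIPS ≈ clock in MHz × MIPS-per-MHz; a quick sketch (my own arithmetic check, not from the original article):

```python
# The ARM 2 and ARM 3 both average 0.56 MIPS per MHz, so throughput
# scales linearly with clock speed (cache effects aside).
def mips(clock_mhz, mips_per_mhz=0.56):
    return clock_mhz * mips_per_mhz

print(round(mips(8), 2))   # ARM 2 at 8 MHz  -> ~4.48 MIPS
print(round(mips(25), 1))  # ARM 3 at 25 MHz -> ~14 MIPS
```

This is exactly the point made above: the ARM 3's advantage came from the higher clock and the cache, not from doing more work per cycle.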

Oh, and just to correct a common misunderstanding, the A4 is not a squashed down version of the A5000. The A4 actually came first, and some of the design choices were reflected in the later A5000 design.

ARM 250 (v2as)

The 'Electron' of ARM processors, this is basically a second level revision of the ARM 3 design which removes the cache, and combines the primary chipset (VIDC, IOC, and MEMC) into the one piece of silicon, making the creation of a cheap'n'cheerful RISC OS computer a simple thing indeed. This was clocked at 12MHz (the same as the main memory), and offers approximately 7 MIPS (0.58 MIPS/MHz).
This processor isn't as terrible as it might seem. That the A30x0 range was built with the ARM250 was probably more a cost-cutting exercise than intention. The ARM250 was designed for low power consumption and low cost, both important factors in devices such as portables, PDAs, and organisers - several of which were developed and, sadly, none of which actually made it to a release.

ARM 250 mezzanine

This is not actually a processor. It is included here for historical interest. It seems the machines that would use the ARM250 were ready before the processor, so early releases of the machine contained a 'mezzanine' board which held the ARM 2, IOC, MEMC, and VIDC.

ARM 4 and ARM 5

These processors do not exist.

More and more people began to be interested in the RISC concept, as at the same sort of time common Intel (and clone) processors showed a definite trend towards higher power consumption and greater need for heat dissipation, neither of which are friendly to devices that are supposed to be running off batteries.
The ARM design was seen by several important players as being the epitome of sleek, powerful RISC design.
It was at this time a deal was struck between Acorn, VLSI (long-time manufacturers of the ARM chipset), and Apple. This led to the death of the Acorn RISC Microprocessor, as Advanced RISC Machines Ltd was born. This new company was committed to design and support specifically for the processor, without the hassle and baggage of RISC OS (the main operating system for the processor and the desktop machines). Both of those would be left to Acorn.

In the change from being a part of Acorn to being ARM Ltd in its own right, the whole numbering scheme for the processors was altered.

ARM 610 (v3)

This processor brought with it two important 'firsts'. The first 'first' was full 32 bit addressing, and the second 'first' was the opening for a new generation of ARM based hardware.
Acorn responded by making the RiscPC. In the past, critics were none-too-keen on the idea of slot-in cards for things like processors and memory (as used in the A540), and by this time many people were getting extremely annoyed with the inherent memory limitations of the older hardware: the MEMC can only address 4Mb of memory, and adding more means daisy-chaining MEMCs - an idea that not only sounds hairy, it is hairy!
The RiscPC brought back the slot-in processor with a vengeance. Future 'better' processors were promised, and a second slot was provided so that alien processors such as the 80486 could be plugged in. As for memory, two SIMM slots were provided, and the memory was expandable to 256Mb. That may not sound like much by modern standards, but you can get a lot of mileage from a RiscPC fitted with a puny 16Mb of RAM.

But, always, we come back to the 32 bit issue. It has been with us and known about ever since the first RiscPC rolled out, but few people noticed or cared. Now, as the new generation of ARM processors drops the 26 bit 'emulation' modes, we RISC OS users are faced with the choice of getting ourselves sorted, or dying.
Ironically, the other mainstream operating systems for the RiscPC hardware - namely ARMLinux and netbsd/arm32 - are already fully 32 bit.

Several speeds were produced: 20MHz, 30MHz, and the 33MHz part used in the RiscPC.
The ARM610 processor features an on-board MMU to handle memory and a 4K cache, and it can even switch itself from little-endian operation to big-endian operation. The 33MHz version offers around 28MIPS (0.84 MIPS/MHz).

ARM 710

As an enhancement of the ARM610, the ARM 710 offers an increased cache size (8K rather than 4K), a clock frequency increased to 40MHz, an improved write buffer, and a larger TLB in the MMU.
Additionally, it supports CMOS/TTL inputs, Fastbus, and 3.3V power, but these features are not used in the RiscPC.
Clocked at 40MHz, it offers about 36MIPS (0.9 MIPS/MHz); combined with the higher clock speed, this makes it run an appreciable amount faster than the ARM 610.

ARM 7500

The ARM7500 is a RISC based single-chip computer with memory and I/O control on-chip to minimise external components. The ARM7500 can drive LCD panels/VDUs if required, and it features power management. The video controller can output up to a 120MHz pixel rate, 32bit sound, and there are four A/D convertors on-chip for connection of joysticks etc.
The processor core is basically an ARM710 with a smaller (4K) cache.
The video core is a VIDC2.
The IO core is based upon the IOMD.
The memory/clock system is very flexible, designed for maximum uses with minimum fuss. Setting up a system based upon the ARM7500 should be fairly simple.

ARM 7500FE

A version of the ARM 7500 with hardware floating point support.

StrongARM / SA110 (v4)

The StrongARM took the RiscPC from around 40MHz to 200-300MHz and showed a speed boost that was more than the hardware should have been able to support. Though still severely bottlenecked by the memory and I/O, the StrongARM made the RiscPC fly. The processor was the first to feature separate instruction and data caches, and this caused quite a lot of self-modifying code to fail - including, amusingly, Acorn's own runtime compression system. But on the whole, the incompatibilities were no more painful than an OS upgrade (anybody remember the RISC OS 2 to RISC OS 3 upgrade, when all the programs that used SYS OS_UpdateMEMC, 64, 64 for a speed boost froze the machine solid?).
In instruction terms, the StrongARM can offer half-word loads and stores, and signed half-word and byte loads and stores. Also provided are instructions for multiplying two 32 bit values (signed or unsigned) and returning a 64 bit result. These are documented in the ARM assembler user guide as only working in 32-bit mode; however, experimentation will show you that they work in 26-bit mode as well. Later documentation confirms this.
The cache has been split into separate instruction and data cache (Harvard architecture), with both of these caches being 16K, and the pipeline is now five stages instead of three.
In terms of performance... at 100MHz, it offers 114MIPS which doubles to 228MIPS at 200MHz (1.14 MIPS/MHz).

In order to squeeze the maximum from a RiscPC, the Kinetic includes fast RAM on the processor card itself, as well as a version of RISC OS that installs itself on the card. Apparently it flies due to removing the memory bottleneck, though this does cause 'issues' with DMA expansion cards.

SA1100 variant

This is a version of the SA110 designed primarily for portable applications. I mention it here as I am reliably informed that the SA1100 is the processor inside the 'faster' Panasonic satellite digibox. It contains the StrongARM core, MMU, cache, PCMCIA, general I/O controller (including two serial ports), and a colour/greyscale LCD controller. It runs at 133MHz or 200MHz and it consumes less than half a watt of power.

Thumb

The Thumb instruction set is a reworking of the ARM set, with a few things omitted. Thumb instructions are 16 bits (instead of the usual 32 bit). This allows for greater code density in places where memory is restricted. The Thumb set can only address the first eight registers, and there are no conditional execution instructions. Also, the Thumb cannot do a number of things required for low-level processor exceptions, so the Thumb instruction set will always come alongside the full ARM instruction set. Exceptions and the like can be handled in ARM code, with Thumb used for the more regular code.

Other versions

These versions are afforded less coverage due, mainly, to my not owning nor having access to any of these versions.
While my site started as a way to learn to program the ARM under RISC OS, the future is in embedded devices using these new systems, rather than the old 26 bit mode required by RISC OS...
...and so, these processors are something I would like to detail, in time.

M variants

This is an extension of the version three design (ARM 6 and ARM 7) that provides the extended 64 bit multiply instructions.
These instructions became a main part of the instruction set in the ARM version 4 (StrongARM, etc).

T variants

These processors include the Thumb instruction set (and, hence, no 26 bit mode).

E variants

These processors include a number of additional instructions which provide improved performance in typical DSP applications. The 'E' stands for "Enhanced DSP".

The future

The future is here. Newer ARM processors exist, but they are 32 bit devices.
This means, basically, that RISC OS won't run on them until all of RISC OS is modified to be 32 bit safe. As long as BASIC is patched, a reasonable software base will exist. However all C programs will need to be recompiled. All relocatable modules will need to be altered. And pretty much all assembler code will need to be repaired. In cases where source isn't available (ie, anything written by Computer Concepts), it will be a tedious slog.
It is truly one of the situations that could make or break the platform.

I feel that, as long as a basic C compiler/linker is made FREELY available, we should go for it. It need not be a 'good' compiler, as long as it is a drop-in replacement for Norcroft CC version 4 or 5. Why? Because RISC OS depends upon enthusiasts to create software, rather than big corporations. Without inexpensive, reasonable tools they might decide converting their software is too much bother, and may leave RISC OS to code for another platform.

I, personally, would happily download a freebie compiler/linker and convert much of my own code. It isn't plain sailing for us - think of all of the library code that needs to be checked. It will be difficult enough to obtain a 32 bit machine to check the code works correctly, never mind all the other pitfalls. Asking us for a grand to support the platform is only going to turn us away in droves. Heck, I'm still using ARM 2 and ARM 3 systems. Some of us smaller coders won't be able to afford such a radical upgrade. And that will be VERY BAD for the platform. Look how many people use the FREE user-created Internet suite in preference to commercial alternatives. Look at all of the support code available on Arcade BBS. Much of that will probably go, yes. But would a platform trying to re-establish itself really want to say goodbye to the rest?
I don't claim my code is wonderful, but if only one person besides myself makes good use of it - then it has been worth it.

Introduction

The processor (CPU, for Central Processing Unit) is the computer's brain. It allows the processing of numeric data, meaning information entered in binary form, and the execution of instructions stored in memory.

The first microprocessor (Intel 4004) was invented in 1971. It was a 4-bit calculation device with a speed of 108 kHz. Since then, microprocessor power has grown exponentially. So what exactly are these little pieces of silicon that run our computers?

Operation

The processor (called CPU, for Central Processing Unit) is an electronic circuit that operates at the speed of an internal clock, thanks to a quartz crystal that, when subjected to an electric current, sends pulses, called "peaks". The clock speed (also called the cycle rate) corresponds to the number of pulses per second, written in Hertz (Hz). Thus, a 200 MHz computer has a clock that sends 200,000,000 pulses per second. Clock frequency is generally a multiple of the system frequency (FSB, Front-Side Bus), meaning a multiple of the motherboard frequency.

With each clock peak, the processor performs an action that corresponds to an instruction or a part thereof. A measure called CPI (Cycles Per Instruction) gives a representation of the average number of clock cycles required for a microprocessor to execute an instruction. A microprocessor’s power can thus be characterized by the number of instructions per second that it is capable of processing. MIPS (millions of instructions per second) is the unit used and corresponds to the processor frequency divided by the CPI.
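The MIPS figure defined above is simply the clock frequency divided by the CPI. A quick sketch, with hypothetical figures:

```python
def mips(clock_hz: float, cpi: float) -> float:
    """MIPS = (instructions per second) / 1e6 = clock frequency / (CPI * 1e6)."""
    return clock_hz / (cpi * 1_000_000)

# A hypothetical 200 MHz processor averaging 2 clock cycles per instruction:
print(mips(200_000_000, 2))  # 100.0 MIPS
```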

Instructions

An instruction is an elementary operation that the processor can accomplish. Instructions are stored in the main memory, waiting to be processed by the processor. An instruction has two fields:

  • the operation code, which represents the action that the processor must execute;
  • the operand code, which defines the parameters of the action. The operand code depends on the operation. It can be data or a memory address.

  Operation Code | Operand Field

The number of bits in an instruction varies according to the type of data (between 1 and 4 8-bit bytes).
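As a toy illustration of how the two fields share an instruction word (the 4-bit/12-bit split and the opcode table here are invented, not taken from any real processor):

```python
OPCODES = {"LOAD": 0x1, "ADD": 0x2, "STORE": 0x3}

def encode(op: str, operand: int) -> int:
    """Pack a 4-bit operation code and a 12-bit operand into one 16-bit word."""
    return (OPCODES[op] << 12) | (operand & 0xFFF)

def decode(word: int) -> tuple[int, int]:
    """Split a 16-bit instruction word back into its two fields."""
    return word >> 12, word & 0xFFF

word = encode("ADD", 0x0A5)
print(hex(word))     # 0x20a5
print(decode(word))  # (2, 165)
```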

Instructions can be grouped by category, of which the main ones are:

  • Memory Access: accessing the memory or transferring data between registers.
  • Arithmetic Operations: operations such as addition, subtraction, division or multiplication.
  • Logic Operations: operations such as AND, OR, NOT, EXCLUSIVE OR (XOR), etc.
  • Control: sequence controls, conditional connections, etc.

Registers

When the processor executes instructions, data is temporarily stored in small, local memory locations of 8, 16, 32 or 64 bits called registers. Depending on the type of processor, the overall number of registers can vary from about ten to many hundreds.

The main registers are:

  • the accumulator register (ACC), which stores the results of arithmetic and logical operations;
  • the status register (PSW, Processor Status Word), which holds system status indicators (carry digits, overflow, etc.);
  • the instruction register (RI), which contains the current instruction being processed;
  • the ordinal counter (OC or PC for Program Counter), which contains the address of the next instruction to process;
  • the buffer register, which temporarily stores data from the memory.
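The roles of these registers can be sketched in a toy fetch-execute loop (the instruction format and program here are invented for illustration, not any real instruction set):

```python
# Each instruction is a (mnemonic, operand) pair held in main memory.
memory = [("LOAD", 5), ("ADD", 7), ("STORE", 0), ("HALT", 0)]
data = {0: 0}

acc = 0   # accumulator register: holds arithmetic results
pc = 0    # ordinal counter / program counter: address of the next instruction

while True:
    ri = memory[pc]   # instruction register: the current instruction
    pc += 1           # the counter now points at the next instruction
    op, operand = ri
    if op == "LOAD":
        acc = operand
    elif op == "ADD":
        acc += operand
    elif op == "STORE":
        data[operand] = acc
    elif op == "HALT":
        break

print(data[0])  # 12
```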

Cache Memory

Cache memory (also called buffer memory) is local memory that reduces waiting times for information stored in the RAM (Random Access Memory). In effect, the computer's main memory is slower than that of the processor. There are, however, types of memory that are much faster, but which have a greatly increased cost. The solution is therefore to include this type of local memory close to the processor and to temporarily store the primary data to be processed in it. Recent model computers have many different levels of cache memory:

  • Level one cache memory (called L1 Cache, for Level 1 Cache) is directly integrated into the processor. It is subdivided into two parts:
    • the first part is the instruction cache, which contains instructions from the RAM that have been decoded as they came across the pipelines.
    • the second part is the data cache, which contains data from the RAM and data recently used during processor operations.

Level 1 caches can be accessed very rapidly. Access waiting time approaches that of internal processor registers.

  • Level two cache memory (called L2 Cache, for Level 2 Cache) is located in the case along with the processor (in the chip). The level two cache is an intermediary between the processor, with its internal cache, and the RAM. It can be accessed more rapidly than the RAM, but less rapidly than the level one cache.
  • Level three cache memory (called L3 Cache, for Level 3 Cache) is located on the motherboard.

All these levels of cache reduce the latency time of various memory types when processing or transferring information. While the processor works, the level one cache controller can interface with the level two controller to transfer information without impeding the processor. As well, the level two cache interfaces with the RAM (level three cache) to allow transfers without impeding normal processor operation.
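The benefit of each cache level can be quantified with the standard average-access-time formula; the latency and miss-rate figures below are invented purely for illustration:

```python
def average_access_time(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Hypothetical: L1 hits in 1 ns; 5% of accesses miss and pay 50 ns to go deeper.
print(average_access_time(1.0, 0.05, 50.0))  # 3.5 ns
```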

Control Signals

Control signals are electronic signals that orchestrate the various processor units participating in the execution of an instruction. Control signals are sent using an element called a sequencer. For example, the Read / Write signal allows the memory to be told that the processor wants to read or write information.

Functional Units

The processor is made up of a group of interrelated units (or control units). Microprocessor architecture varies considerably from one design to another, but the main elements of a microprocessor are as follows:

  • A control unit that links the incoming data, decodes it, and sends it to the execution unit. The control unit is made up of the following elements:
    • the sequencer (or monitor and logic unit), which synchronizes instruction execution with the clock speed and sends the control signals;
    • the ordinal counter, which contains the address of the next instruction to execute;
    • the instruction register, which contains the instruction currently being executed.
  • An execution unit (or processing unit) that accomplishes tasks assigned to it by the instruction unit. The execution unit is made of the following elements:
    • The arithmetical and logic unit (written ALU). The ALU performs basic arithmetical calculations and logic functions (AND, OR, EXCLUSIVE OR, etc.);
    • The floating point unit (written FPU), which performs complex calculations that cannot be handled by the arithmetical and logic unit;
    • The status register;
    • The accumulator register.
  • A bus management unit (or input-output unit) that manages the flow of incoming and outgoing information and that interfaces with system RAM;

Transistor

To process information, the microprocessor has a group of instructions, called the "instruction set", made possible by electronic circuits. More precisely, the instruction set is made with the help of semiconductors, little "circuit switches" that use the transistor effect, discovered in 1947 by John Bardeen, Walter H. Brattain and William Shockley, who received a Nobel Prize in 1956 for it.

A transistor (a contraction of transfer resistor) is an electronic semiconductor component that has three electrodes and is capable of modifying the current passing through it using one of its electrodes (called the control electrode). These are referred to as "active components", in contrast to "passive components" such as resistors or capacitors, which only have two electrodes.

A MOS (metal oxide semiconductor) transistor is the most common type of transistor used to design integrated circuits. MOS transistors have two negatively charged areas, respectively called the source (which has an almost zero charge) and the drain (which has a 5V charge), separated by a positively charged region called the substrate. The substrate has a control electrode overlaid, called the gate, that allows a charge to be applied to the substrate.

When there is no charge on the control electrode, the positively charged substrate acts as a barrier and prevents electron movement from the source to the drain. However, when a charge is applied to the gate, the positive charges of the substrate are repelled and a negatively charged communication channel is opened between the source and the drain.

The transistor therefore acts as a programmable switch, thanks to the control electrode. When a charge is applied to the control electrode, it acts as a closed switch; when there is no charge, it acts as an open switch.
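The switch behaviour described above can be modelled as a boolean function, and chaining such switches is how logic gates arise. This is a heavily simplified sketch that ignores all analogue effects:

```python
def nmos_conducts(gate_charged: bool) -> bool:
    """An idealised MOS transistor: the channel opens only when the gate is charged."""
    return gate_charged

def nand(a: bool, b: bool) -> bool:
    """Two transistors in series pull the output low only when both gates are charged."""
    pulled_low = nmos_conducts(a) and nmos_conducts(b)
    return not pulled_low

print([nand(a, b) for a in (False, True) for b in (False, True)])
# [True, True, True, False]
```

NAND is a useful choice for the sketch because every other logic gate can be built out of NANDs alone.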

Integrated Circuits

Once combined, transistors can make logic circuits that, when combined in turn, form processors. The first integrated circuit dates back to 1958 and was built by Texas Instruments.

MOS transistors are therefore made on slices of silicon (called wafers) obtained after multiple processes. These slices of silicon are cut into rectangular elements to form a "circuit". Circuits are then placed in cases with input-output connectors, and the sum of these parts makes an "integrated circuit". The fineness of the engraving, written in microns (micrometers, written µm), defines the number of transistors per surface unit. There can be millions of transistors on one single processor.

Moore's Law, penned in 1965 by Gordon E. Moore, cofounder of Intel, predicted that processor performance (by extension, the number of transistors integrated on the silicon) would double every twelve months. This law was revised in 1975, bringing the period to 18 months. Moore's Law still holds today.
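The revised 18-month doubling period turns into a quick back-of-the-envelope calculation (the starting transistor count here is hypothetical):

```python
def transistors(initial: int, months: int, doubling_period: float = 18.0) -> int:
    """Project a transistor count forward under an 18-month doubling period."""
    return int(initial * 2 ** (months / doubling_period))

# A hypothetical chip with 1 million transistors, projected 9 years (108 months) out:
print(transistors(1_000_000, 108))  # 64000000 (six doublings)
```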

Because the rectangular case contains input-output pins that resemble legs, the term "electronic flea" is used in French to refer to integrated circuits.

Families

Each type of processor has its own instruction set. Processors are grouped into the following families, according to their unique instruction sets:

  • 80x86: the "x" represents the family. Mention is therefore made to 386, 486, 586, 686, etc.
  • ARM
  • IA-64
  • MIPS
  • Motorola 6800
  • PowerPC
  • SPARC

This explains why a program produced for a certain type of processor can only work directly on a system with another type of processor if there is instruction translation, called emulation. The term "emulator" is used to refer to the program performing this translation.

Instruction Set

An instruction set is the sum of basic operations that a processor can accomplish. A processor’s instruction set is a determining factor in its architecture, even though the same architecture can lead to different implementations by different manufacturers.

The processor works efficiently thanks to a limited number of instructions, hardwired into the electronic circuits. Most operations can be performed using these basic functions. Some architectures do, however, include advanced processor functions.

CISC Architecture

CISC (Complex Instruction Set Computer) architecture means hardwiring the processor with complex instructions that are difficult to create using basic instructions.

CISC is especially popular in 80x86 type processors. This type of architecture has an elevated cost because of the advanced functions printed on the silicon.

Instructions are of variable length and may sometimes require more than one clock cycle. Because CISC-based processors can only process one instruction at a time, the processing time is a function of the size of the instruction.

RISC Architecture

Processors with RISC (Reduced Instruction Set Computer) technology do not have hardwired, advanced functions.

Programs must therefore be translated into simple instructions which complicates development and/or requires a more powerful processor. Such architecture has a reduced production cost compared to CISC processors. In addition, instructions, simple in nature, are executed in just one clock cycle, which speeds up program execution when compared to CISC processors. Finally, these processors can handle multiple instructions simultaneously by processing them in parallel.

Technological Improvements

Over time, microprocessor manufacturers (known as chip foundries) have developed a certain number of improvements that optimize processor performance.

Parallel Processing

Parallel processing consists of simultaneously executing instructions from the same program on different processors. This involves dividing a program into multiple processes handled in parallel in order to reduce execution time.

This type of technology, however, requires synchronization and communication between the various processes, like the division of tasks in a business: work is divided into small discrete processes which are then handled by different departments. The operation of the enterprise may be greatly affected when communication between the departments does not work correctly.

Pipelining

Pipelining is technology that improves instruction execution speed by putting the steps into parallel.

To understand the pipeline’s mechanism, it is first necessary to understand the execution phases of an instruction. Execution phases of an instruction for a processor with a 5-step "classic" pipeline are as follows:

  • FETCH: retrieves the instruction from the cache;
  • DECODE: decodes the instruction and looks for operands (register or immediate values);
  • EXECUTE: performs the instruction (for example, if it is an ADD instruction, addition is performed, if it is a SUB instruction, subtraction is performed, etc.);
  • MEMORY: accesses the memory, and writes data or retrieves data from it;
  • WRITE BACK (retire): records the calculated value in a register.

Instructions are organized into lines in the memory and are loaded one after the other.

Thanks to the pipeline, instruction processing requires no more than the five preceding steps. Because the order of the steps is invariable (FETCH, DECODE, EXECUTE, MEMORY, WRITE BACK), it is possible to create specialized circuits in the processor for each one.

The goal of the pipeline is to perform each step in parallel with the preceding and following steps: reading one instruction (FETCH) while the previous one is being decoded (DECODE), while the one before that is being executed (EXECUTE), while the one before that is accessing memory (MEMORY), and while the first in the series is being recorded in a register (WRITE BACK).

In general, 1 to 2 clock cycles (rarely more) for each pipeline step or a maximum of 10 clock cycles per instruction should be planned for. For two instructions, a maximum of 12 clock cycles are necessary (10+2=12 instead of 10*2=20) because the preceding instruction was already in the pipeline. Both instructions are therefore being simultaneously processed, but with a delay of 1 or 2 clock cycles. For 3 instructions, 14 clock cycles are required, etc.
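The cycle counts quoted above follow a simple pattern, assuming the worst case of 2 cycles per step on the 5-step pipeline described:

```python
def pipelined_cycles(n_instructions: int, steps: int = 5, cycles_per_step: int = 2) -> int:
    """The first instruction fills the pipeline; each later one retires cycles_per_step after it."""
    fill = steps * cycles_per_step                       # 10 cycles for the first instruction
    return fill + (n_instructions - 1) * cycles_per_step

print([pipelined_cycles(n) for n in (1, 2, 3)])  # [10, 12, 14], matching the text
```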

The principle of a pipeline may be compared to a car assembly line. The car moves from one workstation to another by following the assembly line and is completely finished by the time it leaves the factory. To completely understand the principle, the assembly line must be looked at as a whole, and not vehicle by vehicle. Three hours are required to produce each vehicle, but one is produced every minute!

It must be noted that there are many different types of pipelines, varying from 2 to 40 steps, but the principle remains the same.

Superscaling

Superscaling consists of placing multiple processing units in parallel in order to process multiple instructions per cycle.

HyperThreading

HyperThreading (written HT) technology consists of placing two logic processors within one physical processor. The system then recognizes two physical processors and behaves like a multitasking system by sending two simultaneous threads; this is referred to as SMT (Simultaneous Multi-Threading). This "deception" allows processor resources to be used more effectively by keeping the processor supplied with data.

What Is CPU Overclocking?

While the words CPU and microprocessor are used interchangeably, in the world of personal computers (PC), a microprocessor is actually a silicon chip that contains a CPU. At the heart of all personal computers sits a microprocessor that controls the logic of almost all digital devices, from clock radios to fuel-injection systems for automobiles. The three basic characteristics that differentiate microprocessors are the following:

  • Instruction set: The set of instructions that the microprocessor can execute.
  • Bandwidth: The number of bits processed in a single instruction.
  • Clock speed: Given in megahertz (MHz), the clock speed determines how many instructions per second the processor can execute.

The higher the value, the more powerful the CPU. For example, a 32-bit microprocessor that runs at 50MHz is more powerful than a 16-bit microprocessor that runs at 25MHz.

If you think overclocking sounds like an ominous term, you have the right idea. Basically, overclocking means running a microprocessor faster than the clock speed for which it has been tested and approved. Overclocking is a popular technique for getting a little performance boost from your system without purchasing any additional hardware. Because of that performance boost, overclocking is very popular among hardcore 3D gamers.

Most times, overclocking will result in a performance boost of 10 percent or less. For example, a computer with an Intel Pentium III processor running at 933MHz could be configured to run at speeds equivalent to a Pentium III 1050MHz processor by increasing the bus speed on the motherboard. Overclocking will not always have exactly the same results: two identical systems being overclocked will most likely not produce the same results, and one will usually overclock better than the other.

To overclock your CPU you must be quite familiar with hardware, and it is always a procedure conducted at your own risk. When overclocking there are some problems and issues you'll have to deal with, such as heat. An overclocked CPU will have an increased heat output, which means you have to look at additional cooling methods to ensure proper cooling of an overclocked CPU. Standard heat sinks and fans will generally not support an overclocked system. Additionally, you also have to have some understanding of the different types of system memory. Even though your CPU can be overclocked, it doesn't mean your RAM modules will support the higher speeds.

Common CPU Overclocking Methods
The most common methods of overclocking your CPU are to raise the multiplier or raise the FSB (frontside bus) - while not the only options, they are the most common. To understand overclocking, you have to understand the basics of CPU speeds. The speed of a CPU is measured in megahertz (MHz) or gigahertz (GHz). This represents the number of clock cycles that can be performed per second. The more clock cycles your CPU can perform, the faster it processes information.

The formula for processor speed is: frontside bus x multiplier = processor speed.

Example:
(1) Pentium III 450MHz
The CPU runs at 450 million clock cycles per second - that is, at a speed of 450 megahertz. Using our processor speed equation we have: 100MHz (frontside bus) x 4.5 (multiplier) = 450MHz (processor speed)

The frontside bus connects the CPU to the main memory on the motherboard - basically, it's the conduit used by your entire system to communicate with your CPU. One caution with raising the FSB is that it can affect other system components. When you change the multiplier on a CPU, it will change only the CPU speed. If you change the FSB, you are changing the speed at which all components of your system communicate with the CPU.

Using our example above, the multiplier is 4.5. Since valid multipliers end in .0 or .5, you could try increasing the multiplier to 5.0 to obtain a performance boost (which would result in 100MHz x 5.0 = 500MHz). By far the easiest way to overclock a CPU is to raise the multiplier, but this cannot be done on all systems. The multiplier on newer Intel CPUs cannot be adjusted, leaving Intel overclockers with the FSB overclocking method (because of this, AMD is becoming a more popular choice for overclockers). The formula doesn't change when raising the FSB instead. In the example above the FSB was 100MHz; raising it to 133MHz would change the equation (133MHz x 4.5 = 598.5MHz).
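The frontside bus x multiplier arithmetic from these examples can be sketched directly:

```python
def cpu_speed_mhz(fsb_mhz: float, multiplier: float) -> float:
    """Processor speed = frontside bus speed x multiplier."""
    return fsb_mhz * multiplier

print(cpu_speed_mhz(100, 4.5))  # 450.0  (stock Pentium III 450)
print(cpu_speed_mhz(100, 5.0))  # 500.0  (raised multiplier)
print(cpu_speed_mhz(133, 4.5))  # 598.5  (raised FSB)
```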

Sometimes overclocking can be that simple -- other times it's not.

Depending on your motherboard, overclocking is done one of three ways: by changing jumper or dip-switch settings (from "on" to "off", or "closed" to "open"), by changing some of the Chipset Features settings in your BIOS, or by using a combination of both. In overclocking you will need to know your hardware, plan your overclocking method, and, of course, perform many tests once changes have been made. You may need to adjust your CPU voltage, and you will most likely have to try several settings before obtaining a successful and stable overclock result.

Overclocking Risks (and There Are Many)
Overclocking comes with many risks, such as overheating, so you should become familiar with all the pros and cons before you attempt it. Additionally, overclocking isn't supported by the major chip manufacturers which means overclocking your CPU will void your warranty. Overclocking can also decrease the lifespan of the CPU, cause failure in critical components and may even result in some data corruption. You may also notice an increase in unexplainable crashes and freezes.

You can find many complete step-by-step guides available online that detail the actual process of overclocking. If you've decided to take the plunge and overclock your CPU, we recommend you don't start with your only usable system (try practising on outdated and cheap hardware) and be sure to find a knowledgeable source of overclocking information to get you started in the right direction.

Did You Know...
"Multiplier locking forces the CPU to use a multiplier that is preset by the manufacturer. Intel has been quoted as saying they use multiplier locking to prevent unscrupulous retailers from overclocking processors to higher speeds, and selling overclocked systems to consumers for the same, higher price as the faster retail model."

Key Terms To Understanding Overclocking

CPU
Abbreviation of central processing unit. The CPU is the brains of the computer.

Overclock
To run a microprocessor faster than the speed for which it has been tested and approved.

frontside bus
The bus that connects the CPU to main memory on the motherboard.

More Overclocking Related Terms

clock speed
jumper
chipset
motherboard
bus
clock cycle

COMPUTER MEMORY

The system memory is the place where the computer holds current programs and data that are in use. There are various levels of computer memory, including ROM, RAM, cache, page and graphics memory, each with specific objectives for system operation. This section focusses on the role of computer memory and the technology behind it.

Although memory is used in many different forms around modern PC systems, it can be divided into two essential types: RAM and ROM. ROM, or Read Only Memory, is relatively small, but essential to how a computer works. ROM is always found on motherboards, but is increasingly found on graphics cards and some other expansion cards and peripherals. Generally speaking, ROM does not change. It forms the basic instruction set for operating the hardware in the system, and the data within remains intact even when the computer is shut down. It is possible to update ROM, but this is done only rarely, and only when needed. If ROM is damaged, the computer system simply cannot function.

RAM, or Random Access Memory, is "volatile." This means that it only holds data while power is present. RAM changes constantly as the system operates, providing the storage for all data required by the operating system and software. Because of the demands made by increasingly powerful operating systems and software, system RAM requirements have accelerated dramatically over time. For instance, at the turn of the millennium a typical computer may have had only 128MB of RAM in total, but in 2007 computers commonly shipped with 2GB of RAM installed, and may include graphics cards with their own additional 512MB of RAM or more.

Clearly, modern computers have significantly more memory than the first PCs of the early 1980s, and this has had an effect on development of the PC's architecture. The trouble is, storing and retrieving data from a large block of memory is more time-consuming than from a small block. With a large amount of memory, the difference in time between a register access and a memory access is very great, and this has resulted in extra layers of cache in the storage hierarchy.

When accessing memory, a fast processor will demand a great deal from RAM. At worst, the CPU may have to waste clock cycles while it waits for data to be retrieved. Faster memory designs and motherboard buses can help, but since the 1990s "cache memory" has been employed as standard between the main memory and the processor. Moreover, CPU architecture has evolved to include ever-larger internal caches. The organisation of data this way is immensely complex, and the system uses ingenious electronic controls to ensure that the data the processor needs next is already in cache, physically closer to the processor and ready for fast retrieval and manipulation.

Read on for a closer look at the technology behind computer memory, and how developments in RAM and ROM have enabled systems to function with seemingly exponentially increasing power.


L1 cache

The Level 1 cache, or primary cache, is on the CPU and is used for temporary storage of instructions and data organised in blocks of 32 bytes. Primary cache is the fastest form of storage. Because it is built into the chip with a zero wait-state (delay) interface to the processor's execution unit, it is limited in size.

Level 1 cache is implemented using Static RAM (SRAM) and until recently was traditionally 16KB in size. SRAM uses two transistors per bit and can hold data without external assistance, for as long as power is supplied to the circuit. The second transistor controls the output of the first: a circuit known as a "flip-flop" - so-called because it has two stable states which it can flip between. This is contrasted to dynamic RAM (DRAM), which must be refreshed many times per second in order to hold its data contents.

SRAM is manufactured in a way rather similar to processors: highly integrated transistor patterns photo-etched into silicon. Each SRAM bit comprises between four and six transistors, which is why SRAM takes up much more space than DRAM, which uses only one (plus a capacitor). This, plus the fact that SRAM is several times the cost of DRAM, explains why it is not used more extensively in PC systems.

Intel's P55 MMX processor, launched at the start of 1997, was noteworthy for the increase in size of its Level 1 cache to 32KB. The AMD K6 and Cyrix M2 chips launched later that year upped the ante further by providing Level 1 caches of 64KB. 64KB has remained the standard L1 cache size, though various multiple-core processors may utilise it differently.

For all L1 cache designs the control logic of the primary cache keeps the most frequently used data and code in the cache and updates external memory only when the CPU hands over control to other bus masters, or during direct memory access by peripherals such as optical drives and sound cards.

Some chipsets, such as the Pentium based Triton FX (and later), support a "write back" cache rather than a "write through" cache. Write through happens when a processor writes data simultaneously into cache and into main memory (to assure coherency). Write back occurs when the processor writes to the cache and then proceeds to the next instruction. The cache holds the write-back data and writes it into main memory when that data line in cache is to be replaced. Write back offers about 10% higher performance than write-through, but cache that has this function is more costly. A third type of write mode, write through with buffer, gives similar performance to write back.

L2 cache

Most PCs are offered with a Level 2 cache to bridge the processor/memory performance gap. Level 2 cache - also referred to as secondary cache - uses the same control logic as Level 1 cache and is also implemented in SRAM.

Level 2 cache typically comes in two sizes, 256KB or 512KB, and can be found soldered onto the motherboard, in a Card Edge Low Profile (CELP) socket or, more recently, on a COAST ("cache on a stick") module. The latter resembles a SIMM but is a little shorter and plugs into a COAST socket, which is normally located close to the processor and resembles a PCI expansion slot. The Pentium Pro deviated from this arrangement, siting the Level 2 cache on the processor chip itself.

The aim of the Level 2 cache is to supply stored information to the processor without any delay (wait-state). For this purpose, the bus interface of the processor has a special transfer protocol called burst mode. A burst cycle consists of four data transfers, with only the address of the first 64 bits output on the address bus. The most common Level 2 cache is synchronous pipelined burst.

A synchronous cache requires a supporting chipset, such as Triton. It can provide a 3-5% increase in PC performance because it is timed to a clock cycle. This is achieved by use of specialised SRAM technology which has been developed to allow zero wait-state access for consecutive burst read cycles. Pipelined Burst Static RAM (PB SRAM) has an access time in the range 4.5 to 8 nanoseconds (ns) and allows a transfer timing of 3-1-1-1 for bus speeds up to 133MHz. These numbers refer to the number of clock cycles for each access of a burst mode memory read. For example, 3-1-1-1 refers to three clock cycles for the first word and one cycle for each subsequent word.

For bus speeds up to 66MHz Synchronous Burst Static RAM (Sync SRAM) offers even faster performance, being capable of 2-1-1-1 burst cycles. However, with bus speeds above 66MHz its performance drops to 3-2-2-2, significantly slower than PB SRAM.

There is also asynchronous cache, which is cheaper and slower because it isn't timed to a clock cycle. With asynchronous SRAM, available in speeds between 12 and 20ns, all burst read cycles have a timing of 3-2-2-2 on a 50 to 66MHz CPU bus, which means that there are two wait-states for the lead-off cycle and one wait-state for the following three transfers of the burst cycle.

RAM - the Main Memory

A PC's third and principal level of system memory is referred to as main memory, or Random Access Memory (RAM). It is an impermanent store of data, acting, so to speak, as a staging post between the hard disk and the processor. The more data it is possible to hold in RAM, the faster the PC will run.

Main memory is attached to the processor via its address and data buses. Each bus consists of a number of electrical circuits or bits. The width of the address bus dictates how many different memory locations can be accessed, and the width of the data bus how much information is stored at each location. Every time a bit is added to the width of the address bus, the address range doubles. In 1985, Intel's 386 processor had a 32-bit address bus, enabling it to access up to 4GB of memory. The Pentium processor - introduced in 1993 - increased the data bus width to 64-bits, enabling it to access 8 bytes of data at a time.
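
The bus-width arithmetic above can be sketched directly. This is a quick illustrative Python snippet (the function names are ours, not any real API): each extra address line doubles the reachable address range, and the data bus width fixes how much moves per access.

```python
# Sketch of the bus-width arithmetic described above; names are illustrative.

def addressable_bytes(address_bus_bits):
    """Each additional address line doubles the number of reachable locations."""
    return 2 ** address_bus_bits

def bytes_per_access(data_bus_bits):
    """The data bus width fixes how much data moves in a single access."""
    return data_bus_bits // 8

# The 386's 32-bit address bus reaches 4GB; the Pentium's 64-bit data bus
# moves 8 bytes at a time - the figures quoted in the text.
print(addressable_bytes(32) // 2**30)  # 4
print(bytes_per_access(64))            # 8
```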

Each transaction between the CPU and memory is called a bus cycle. The number of data bits a CPU is able to transfer during a single bus cycle affects a computer's performance and dictates what type of memory the computer requires. By the late 1990s, most desktop computers were using 168-pin DIMMs, which supported 64-bit data paths.

Main memory is built up using DRAM chips, short for Dynamic RAM. DRAM has been developed over the years on two main fronts: to be more compact, and to be faster to access. These developments are explored in the sections that follow.

DRAM

DRAM chips are large, rectangular arrays of memory cells with support logic that is used for reading and writing data in the arrays, and refresh circuitry to maintain the integrity of stored data. Memory arrays are arranged in rows and columns of memory cells called wordlines and bitlines, respectively. Each memory cell has a unique location or address defined by the intersection of a row and a column.

DRAM is manufactured using a process similar to that used for processors: a silicon substrate is etched with the patterns that make the transistors and capacitors (and support structures) that comprise each bit. It costs much less than a processor because it is a series of simple, repeated structures, without the complexity of a single chip containing several million individually-located transistors; it is also cheaper than SRAM because it uses far fewer transistors per bit. Over the years, several different structures have been used to create the memory cells on a chip, and in today's technologies the support circuitry generally includes:

  • sense amplifiers to amplify the signal or charge detected on a memory cell
  • address logic to select rows and columns
  • Row Address Select (/RAS) and Column Address Select (/CAS) logic to latch and resolve the row and column addresses and to initiate and terminate read and write operations
  • read and write circuitry to store information in the memory's cells or read that which is stored there
  • internal counters or registers to keep track of the refresh sequence, or to initiate refresh cycles as needed
  • Output Enable logic to prevent data from appearing at the outputs unless specifically desired.

A transistor is effectively a switch which can control the flow of current - either on, or off. In DRAM, each memory cell uses a transistor paired with a capacitor to hold a single bit: if the transistor is "open" and current can flow, that's a 1; if it's closed, it's a 0. The capacitor holds the charge, but the charge soon leaks away, losing the data. To overcome this problem, other circuitry refreshes the memory, reading the value before it disappears completely and writing back a pristine version. This refreshing action is why the memory is called dynamic. The access time is expressed in nanoseconds (ns), and it is this figure that represents the "speed" of the RAM. Most Pentium-based PCs use 60 or 70ns RAM.
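
The "dynamic" behaviour can be pictured with a toy model - all the names and numbers below are invented purely for illustration, not a hardware simulation: the cell's charge leaks away each tick, and the refresh circuitry reads the value while it can still be sensed and rewrites a full charge.

```python
# Toy model of DRAM refresh (illustrative only; values are invented).

SENSE_THRESHOLD = 0.5   # below this, a stored 1 can no longer be read reliably
LEAK_PER_TICK = 0.1     # fraction of charge lost per tick

def tick(charge):
    """One interval of charge leakage."""
    return max(0.0, charge - LEAK_PER_TICK)

def refresh(charge):
    """Read the cell before the data is lost, then rewrite a full charge."""
    assert charge >= SENSE_THRESHOLD, "refreshed too late - data lost"
    return 1.0

charge = 1.0
for t in range(12):
    charge = tick(charge)
    if t % 4 == 3:      # refresh every 4 ticks, well before the threshold
        charge = refresh(charge)
print(round(charge, 1))  # 1.0 - the bit survives as long as refresh keeps up
```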

The process of refreshing actually interrupts and slows down the accessing of data, but clever cache design minimises this. However, as processor speeds passed the 200MHz mark, no amount of caching could compensate for the inherent slowness of DRAM, and other, faster memory technologies have largely superseded it.

DRAM Timing and Signals

The most difficult aspect of working with DRAM devices is resolving the timing requirements. DRAMs are generally asynchronous, responding to input signals whenever they occur. As long as the signals are applied in the proper sequence, with signal durations and delays between signals that meet the specified limits, the DRAM will work properly. These are few in number, comprising:

  • Row Address Select: The /RAS circuitry is used to latch the row address and to initiate the memory cycle. It is required at the beginning of every operation. /RAS is active low; that is, to enable /RAS, a transition from a high voltage to a low voltage level is required. The voltage must remain low until /RAS is no longer needed. During a complete memory cycle, there is a minimum amount of time that /RAS must be active, and a minimum amount of time that /RAS must be inactive, called the /RAS precharge time. /RAS may also be used to trigger a refresh cycle (/RAS Only Refresh, or ROR).
  • Column Address Select: /CAS is used to latch the column address and to initiate the read or write operation. /CAS may also be used to trigger a /CAS before /RAS refresh cycle. This refresh cycle requires /CAS to be active prior to /RAS and to remain active for a specified time. It is active low. The memory specification lists the minimum amount of time /CAS must remain active to initiate a read or write operation. For most memory operations, there is also a minimum amount of time that /CAS must be inactive, called the /CAS precharge time. (An ROR cycle does not require /CAS to be active.)
  • Address: The addresses are used to select a memory location on the chip. The address pins on a memory device are used for both row and column address selection (multiplexing). The number of addresses depends on the memory's size and organisation. The voltage level present at each address at the time that /RAS or /CAS goes active determines the row or column address, respectively, that is selected. To ensure that the row or column address selected is the one that was intended, set up and hold times with respect to the /RAS and /CAS transitions to a low level are specified in the DRAM timing specification.
  • Write Enable: The /WE signal is used to choose a read operation or a write operation. A low voltage level signifies that a write operation is desired; a high voltage level is used to choose a read operation. The operation to be performed is usually determined by the voltage level on /WE when /CAS goes low (Delayed Write is an exception). To ensure that the correct operation is selected, set up and hold times with respect to /CAS are specified in the DRAM timing specification.
  • Output Enable: During a read operation, this control signal is used to prevent data from appearing at the output until needed. When /OE is low, data appears at the data outputs as soon as it is available. /OE is ignored during a write operation. In many applications, the /OE pin is grounded and is not used to control the DRAM timing.
  • Data In or Out: The DQ pins (also called Input/Output pins or I/Os) on the memory device are used for input and output. During a write operation, a voltage (high=1, low=0) is applied to the DQ. This voltage is translated into the appropriate signal and stored in the selected memory cell. During a read operation, data read from the selected memory cell appears at the DQ once access is complete and the output is enabled (/OE low). At most other times, the DQs are in a high impedance state; they do not source or sink any current, and do not present a signal to the system. This also prevents DQ contention when two or more devices share the data bus.
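
The address multiplexing described above - the same pins carrying first the row address (latched on /RAS) and then the column address (latched on /CAS) - can be sketched as follows, assuming a hypothetical 1024x1024 cell array:

```python
# Sketch of multiplexed row/column addressing. A 1024x1024 array is assumed
# purely for illustration; real chip geometries vary.

ROWS, COLS = 1024, 1024

def split_address(addr):
    """Split a flat cell address into the row and column the chip latches."""
    row = addr // COLS   # driven on the address pins when /RAS goes low
    col = addr % COLS    # driven on the same pins when /CAS goes low
    return row, col

def join_address(row, col):
    """Recover the flat address from its row and column."""
    return row * COLS + col

row, col = split_address(1_000_000)
print(row, col)                        # 976 576
assert join_address(row, col) == 1_000_000
```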

Fast Page Mode DRAM

All types of memory are addressed as an array of rows and columns, and individual bits are stored in each cell of the array. With standard DRAM or FPM DRAM, which comes with access times of 70ns or 60ns, the memory management unit reads data by first activating the appropriate row of the array, activating the correct column, validating the data and transferring the data back to the system. The column is then deactivated, which introduces an unwanted wait state where the processor has to wait for the memory to finish the transfer. The output data buffer is then turned off, ready for the next memory access.

At best, with this scheme FPM can achieve a burst rate timing as fast as 5-3-3-3. This means that reading the first element of data takes five clock cycles, containing four wait-states, with the next three elements each taking three.

DRAM speed improvements have historically come from process and photolithography advances. More recent improvements in performance however have resulted from changes to the base DRAM architecture that require little or no increase in die size. Extended Data Out (EDO) memory is an example of this.

EDO DRAM

Extended Data Out DRAM comes in 70ns, 60ns and 50ns speeds. 60ns is the slowest that should be used in a 66MHz bus speed system (i.e. Pentium 100MHz and above) and the Triton HX and VX chipsets can also take advantage of the 50ns version. EDO DRAM doesn't demand that the column be deactivated and the output buffer turned off before the next data transfer starts. It therefore achieves a typical burst timing of 5-2-2-2 at a bus speed of 66MHz and can complete some memory reads a theoretical 27% faster than FPM DRAM.
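
A little arithmetic confirms the quoted figure: summing the burst timings gives 14 cycles for FPM's 5-3-3-3 against 11 for EDO's 5-2-2-2, the theoretical 27% gain.

```python
# The quoted burst timings: clock cycles for the first access and the three
# that follow it in a four-transfer burst.
FPM = (5, 3, 3, 3)   # Fast Page Mode
EDO = (5, 2, 2, 2)   # Extended Data Out

def burst_cycles(timing):
    """Total clock cycles to complete a four-transfer burst."""
    return sum(timing)

fpm, edo = burst_cycles(FPM), burst_cycles(EDO)
print(fpm, edo)                         # 14 11
print(round((fpm - edo) / edo * 100))   # 27 - the "theoretical 27%" quoted
```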

BEDO RAM

Burst EDO DRAM is an evolutionary improvement in EDO DRAM that contains a pipeline stage and a 2-bit burst counter. With the conventional DRAMs such as FPM and EDO, the initiator accesses DRAM through a memory controller. The controller must wait for the data to become ready before sending it to the initiator. BEDO eliminates the wait-states thus improving system performance by up to 100% over FPM DRAM and up to 50% over standard EDO DRAM, achieving system timings of 5-1-1-1 when used with a supporting chipset.

Despite the fact that BEDO arguably provides more improvement over EDO than EDO does over FPM, the standard has lacked chipset support and has consequently never really caught on, losing out to Synchronous DRAM (SDRAM).

SDRAM

The more recent Synchronous DRAM memory works quite differently from other memory types. It exploits the fact that most PC memory accesses are sequential and is designed to fetch all the bits in a burst as fast as possible. With SDRAM an on-chip burst counter allows the column part of the address to be incremented very rapidly which helps speed up retrieval of information in sequential reads considerably. The memory controller provides the location and size of the block of memory required and the SDRAM chip supplies the bits as fast as the CPU can take them, using a clock to synchronise the timing of the memory chip to the CPU's system clock.

This key feature of SDRAM gives it an important advantage over other, asynchronous memory types, enabling data to be delivered off-chip at burst rates of up to 100MHz. Once the burst has started, all remaining bits of the burst length are delivered at a 10ns rate. At a bus speed of 66MHz, SDRAM can achieve burst timings of 5-1-1-1. The first figure is higher than the timings for FPM and EDO RAM because more setting up is required for the initial data transfer. Even so, there's a theoretical improvement of 18% over EDO for the right type of data transfers.

However, since no reduction in the initial access is gained, it was not until the release of Intel's 440BX chipset, in early 1998, that the benefit of 100MHz page cycle time was fully exploited. However, even SDRAM cannot be considered as anything more than a stop-gap product as the matrix interconnection topology of the legacy architecture of SDRAM makes it difficult to move to frequencies much beyond 100MHz. The legacy pin function definition - separate address, control and data/DQM lines - controlled by the same clock source leads to a complex board layout with difficult timing margin issues. The 100MHz layout and timing issues might be addressed by skilful design, but only through the addition of buffering registers, which increases lead-off latency and adds to power dissipation and system cost.

Beyond 100MHz SDRAM, the next step in the memory roadmap was supposed to have been Direct Rambus DRAM (DRDRAM). According to Intel, the only concession to a transition period was to have been the S-RIMM specification, which allows PC100 SDRAM chips to use Direct RDRAM memory modules. However, driven by concerns that the costly Direct RDRAM would add too much to system prices, with the approach of 1999 there was a significant level of support for a couple of transitionary memory technologies.

PC133 SDRAM

Although most of the industry agrees that Rambus is an inevitable stage in PC development, PC133 SDRAM is seen as a sensible evolutionary technology and one that confers a number of advantages that make it attractive to chip makers unsure of how long interest in Direct RDRAM will take to materialise. Consequently, in early 1999, a number of non-Intel chipset makers decided to release chipsets that supported the faster PC133 SDRAM.

PC133 SDRAM is capable of transferring data at up to 1.06 GBps - compared with the hitherto conventional speeds of up to 800 MBps - requires no radical changes in motherboard engineering, carries no price premium on the memory chips themselves and poses no problems in volume supply. With the scheduled availability of Direct RDRAM reportedly slipping, it appeared that Intel had little option but to support PC133 SDRAM, especially given the widespread rumours that chipset and memory manufacturers were working with AMD to ensure that their PC133 SDRAM chips would work on the fast bus of the forthcoming K6-III processor.

At the beginning of 2000, NEC began sampling 128MB and 256MB SDRAM memory modules utilising the company's unique performance-enhancing Virtual Channel Memory (VCM) technology, first announced in 1997. Fabricated with an advanced 0.18-micron process and optimised circuit layout, and compliant with the PC133 SDRAM standard, VCM SDRAMs achieve high-speed operation with a read latency of 2 at 133MHz (7.5ns) and are package- and pin-compatible with standard SDRAMs.

VCRAM

The VCM architecture increases the memory bus efficiency and performance of any DRAM technology by providing a set of fast static registers between the memory core and I/O pins, resulting in reduced data access latency and reduced power consumption. Each data request from a memory master contains separate and unique characteristics. With conventional SDRAM, multiple requests from multiple memory masters can cause page thrashing and bank conflicts, which result in low memory bus efficiency. The VCM architecture assigns virtual channels to each memory master. Maintaining the individual characteristics of each memory master's request in this way enables the memory device to read, write and refresh in parallel operations, thus speeding up data transfer rates.

Continuing delays with Rambus memory as well as problems with its associated chipsets finally saw Intel bow to the inevitable in mid-2000 with the release of its 815/815E chipsets - its first to provide support for PC133 SDRAM.

DDR SDRAM

Double Data Rate (DDR) SDRAM is the other competing memory technology battling to provide system builders with a high-performance alternative to Direct RDRAM. As in standard SDRAM, DDR SDRAM is tied to the system's FSB, the memory and bus executing instructions at the same time rather than one of them having to wait for the other.

Traditionally, to synchronise logic devices, data transfers would occur on a clock edge. As a clock pulse oscillates between 1 and 0, data would be output on either the rising edge (as the pulse changes from a "0" to a "1") or on the falling edge. DDR DRAM works by allowing the activation of output operations on the chip to occur on both the rising and falling edge of the clock, thereby providing an effective doubling of the clock frequency without increasing the actual frequency.

DDR-DRAM first broke into the mainstream PC arena in late 1999, when it emerged as the memory technology of choice on graphics cards using nVidia's GeForce 256 3D graphics chip. Lack of support from Intel delayed its acceptance as a main memory technology. Indeed, when it did begin to be used as PC main memory, it was no thanks to Intel. This was late in 2000 when AMD rounded off what had been an excellent year for the company by introducing DDR-DRAM to the Socket A motherboard. While Intel appeared happy for the Pentium III to remain stuck in the world of PC133 SDRAM and expensive RDRAM, rival chipset maker VIA wasn't, coming to the rescue with the DDR-DRAM supporting Pro266 chipset.

By early 2001, DDR-DRAM's prospects had taken a major turn for the better, with Intel at last being forced to contradict its long-standing and avowed backing for RDRAM by announcing a chipset - codenamed "Brookdale" - that would be the company's first to support the DDR-DRAM memory technology. The i845 chipset duly arrived in mid-2001, although it was not before the beginning of 2002 that system builders would be allowed to couple it with DDR SDRAM.

DDR memory chips are commonly referred to by their data transfer rate. This value is calculated by doubling the bus speed to reflect the double data rate. For example, a DDR266 chip sends and receives data twice per clock cycle on a 133MHz memory bus. This results in a data transfer rate of 266MT/s (million transfers per second). Typically, 200MT/s (100MHz bus) DDR memory chips are called DDR200, 266MT/s (133MHz bus) chips are called DDR266, 333MT/s (166MHz bus) chips are called DDR333 chips, and 400MT/s (200MHz bus) chips are called DDR400.

DDR memory modules, on the other hand, are named after their peak bandwidth - the maximum amount of data they can deliver per second - rather than their clock rates. This is calculated by multiplying the amount of data a module can send at once (called the data path) by the speed the FSB is able to send it. The data path is measured in bits, and the FSB in MHz.

A PC1600 memory module (simply the DDR version of PC100 SDRAM) uses DDR200 chips and can deliver bandwidth of 1600MBps. PC2100 (the DDR version of PC133 SDRAM) uses DDR266 memory chips, resulting in 2100MBps of bandwidth. PC2700 modules use DDR333 chips to deliver 2700MBps of bandwidth and PC3200 - the fastest widely used form in late 2003 - uses DDR400 chips to deliver 3200MBps of bandwidth.
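
The naming arithmetic can be sketched as follows. Note that the marketed names round the figures for the 133MHz and 166MHz buses (whose exact clocks are 133⅓ and 166⅔ MHz), so the simple formula below gives 2128 and 2656 rather than the marketed 2100 and 2700:

```python
# Peak bandwidth of a 64-bit (8-byte) DDR module: two transfers per clock.
def peak_bandwidth_mbps(bus_mhz):
    """Bus clock (MHz) x 2 transfers/clock x 8 bytes/transfer = MBps."""
    return bus_mhz * 2 * 8

# Module names are the peak bandwidth in MBps (marketing rounds the odd buses).
for bus_mhz, module in ((100, "PC1600"), (133, "PC2100"),
                        (166, "PC2700"), (200, "PC3200")):
    print(module, peak_bandwidth_mbps(bus_mhz), "MBps")
```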

DDR400 was to prove the limit of DDR technology. In the spring of 2004, the next step in the evolutionary chain emerged, in the shape of the new DDR2 memory architecture.

DDR2 SDRAM

The transition from DDR to DDR2 memory was more evolutionary than revolutionary, the DDR2 architecture being essentially the same as that of its predecessor, with a number of enhancements designed to provide greater bandwidth and other features that help reduce power consumption:

  • Higher memory transfer rates: DDR2 supports higher-speed transfer rates than DDR, which is officially available in 266 megatransfers per second (MTps), 333 MTps and 400 MTps versions. Current DDR2 industry specifications call for 400 MTps, 533 MTps, 667 MTps and 800 MTps.
  • 4n prefetch: DDR2 devices use a 4n prefetch architecture instead of the 2n prefetch used for DDR. Using DDR2, the internal DRAM core is designed to read or write data at four times the width of the device's external interface. In comparison, using DDR the internal DRAM core is designed to read or write data at twice the width of the device's external interface. As a result, the internal DRAM core's 4n prefetch architecture enables DDR2 to attain higher memory transfer rates than DDR.
  • Reduced voltage: DDR2 devices use a supply voltage of 1.8 volts versus 2.5 volts for DDR - almost a 30 percent decrease in supply voltage. This voltage scaling enhancement has the potential to reduce overall power requirements for the memory subsystem.

DDR2 memories work with higher latencies than DDR memories, meaning that they take more clock cycles to deliver a requested item of data. For this reason, the performance advantage of DDR2 over DDR was not fully seen in practice until the emergence of DDR2-533 versions in mid-2004.

DDR2 is not backward compatible with DDR. Although DDR2 modules are the same length as DDR modules, differences exist in connectors, signalling, and supply voltages. For example, on DDR memories the resistive termination necessary for making the memory work is located on the motherboard, while on DDR2 memories this circuit is located inside the memory chip itself. Also, DDR modules have 184 contacts, while DDR2 modules have 240. Thus it is not possible to install DDR2 memories in DDR sockets or vice-versa.

The following table shows the bus frequency, data transfer rate, and bandwidth associated with each type of DDR2 module:

DDR2 type         Module speed grade   Bus frequency   Data transfer rate   Bandwidth (per channel)
DDR2 at 400 MHz   PC2-3200             200 MHz         400 MTps             3.2 GBps
DDR2 at 533 MHz   PC2-4300             266 MHz         533 MTps             4.3 GBps
DDR2 at 667 MHz   PC2-5300             333 MHz         667 MTps             5.3 GBps
DDR2 at 800 MHz   PC2-6400             400 MHz         800 MTps             6.4 GBps
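
These figures follow from the same arithmetic as before: twice the bus clock gives the transfer rate, and eight bytes per transfer gives the per-channel bandwidth. (The rounded 266/333 MHz bus values below yield 532/666 MTps rather than the marketed 533/667, which derive from the exact 266⅔ and 333⅓ MHz clocks.)

```python
# Reproducing the DDR2 figures: two transfers per bus clock over an
# 8-byte (64-bit) channel.
def ddr2_row(bus_mhz):
    transfers_mtps = bus_mhz * 2                # data transfer rate in MTps
    bandwidth_gbps = transfers_mtps * 8 / 1000  # per-channel bandwidth in GBps
    return transfers_mtps, bandwidth_gbps

for bus in (200, 266, 333, 400):
    t, b = ddr2_row(bus)
    print(f"{bus} MHz bus -> {t} MTps, {b:.1f} GBps")
```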

As processor power increased relentlessly - both by means of increased clock rates and wider FSBs - chipset manufacturers were quick to embrace the benefits of a dual-channel memory architecture as a solution to the growing bandwidth imbalance.

Dual-channel DDR

The terminology "dual-channel DDR" is, in fact, a misnomer: there is no such thing as dual-channel DDR memory. What do exist, however, are dual-channel platforms.

When properly used, the term "dual channel" refers to a DDR motherboard's chipset that is designed with two memory channels instead of one. The two channels handle memory processing more efficiently by utilising the theoretical bandwidth of both modules, thus reducing the system latencies - the timing delays that inherently occur with a single memory module. For example, one controller reads and writes data while the second controller prepares for the next access, eliminating the reset and setup delays that otherwise occur before one memory module can begin the read/write process all over again.

Consider an analogy in which data is filled into a funnel (memory), which then "channels" the data to the CPU.

Single-channel memory would feed the data to the processor via a single funnel at a maximum rate of 64 bits at a time. Dual-channel memory, on the other hand, utilises two funnels, thereby having the capability to deliver data twice as fast, at up to 128 bits at a time. The process works the same way when data is "emptied" from the processor by reversing the flow of data. A "memory controller" chip is responsible for handling all data transfers involving the memory modules and the processor. This controls the flow of data through the funnels, preventing them from being over-filled with data.

It is estimated that a dual-channel memory architecture is capable of increasing bandwidth by as much as 10%.

The majority of systems supporting dual-channel memory can be configured in either single-channel or dual-channel memory mode. The fact that a motherboard supports dual-channel DDR memory, does not guarantee that installed DIMMs will be utilised in dual-channel mode. It is not sufficient to just plug multiple memory modules into their sockets to get dual-channel memory operation - users need to follow specific rules when adding memory modules to ensure that they get dual-channel memory performance. Intel specifies that motherboards should default to single-channel mode in the event of any of these being violated:

  • DIMMs must be installed in pairs
  • Both DIMMs must use the same density memory chips
  • Both DIMMs must use the same DRAM bus width
  • Both DIMMs must be either single-sided or dual-sided.
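
Intel's rules above amount to a simple compatibility check. As a sketch (the DIMM field names below are hypothetical, chosen purely for illustration):

```python
# Sketch of the dual-channel pairing rules as a checker; the DIMM fields
# are hypothetical names for illustration, not a real API.
from dataclasses import dataclass

@dataclass
class DIMM:
    chip_density_mb: int   # density of the memory chips used
    bus_width: int         # DRAM bus width
    double_sided: bool     # single- or dual-sided module

def dual_channel_ok(dimms):
    """Return True only when every pairing rule is satisfied; otherwise
    the board defaults to single-channel mode."""
    if len(dimms) != 2:                 # DIMMs must be installed in pairs
        return False
    a, b = dimms
    return (a.chip_density_mb == b.chip_density_mb
            and a.bus_width == b.bus_width
            and a.double_sided == b.double_sided)

matched = [DIMM(256, 8, False), DIMM(256, 8, False)]
mismatched = [DIMM(256, 8, False), DIMM(512, 8, False)]
print(dual_channel_ok(matched))     # True  - runs in dual-channel mode
print(dual_channel_ok(mismatched))  # False - falls back to single-channel
```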

DDR SDRAM DIMMs have 184 pins (as opposed to 168 pins on SDRAM, or, 240 pins on DDR2 SDRAM), and can be differentiated from SDRAM DIMMs by the number of notches (DDR SDRAM has one, SDRAM has two).

1T-SRAM

Historically, while much more cost-effective than SRAM per megabit, traditional DRAM has always suffered speed and latency penalties that make it unsuitable for some applications. Consequently, product manufacturers have often been forced to opt for the more expensive but faster SRAM technology. By 2000, however, system designers had another option available to them, one that offers the best of both worlds: fast speed, low cost, high density and lower power consumption.

Though the inventor of 1T-SRAM - Monolithic System Technology Inc. (MoSys) - calls its design an SRAM, it is in fact based on single-transistor DRAM cells. As with any other DRAM, the data in these cells must be periodically refreshed to prevent data loss. What makes the 1T-SRAM unique is that it offers a true SRAM-style interface that hides all refresh operations from the memory controller.
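The idea of hiding refresh behind an SRAM-style interface can be illustrated with a toy model. This is emphatically not MoSys's actual design, just a sketch of the principle: the wrapper refreshes one row internally on every access, so the host never issues refresh commands itself.

```python
# Toy model of hiding DRAM refresh behind an SRAM-style interface.
# The internal refresh pointer advances on every access, so from the
# outside the part behaves like a refresh-free SRAM.

class HiddenRefreshRAM:
    def __init__(self, rows, cols):
        self.data = [[0] * cols for _ in range(rows)]
        self.rows = rows
        self.next_refresh = 0   # row to refresh on the next internal cycle

    def _refresh_one_row(self):
        # In real DRAM this would re-drive the row's cell capacitors;
        # here it is a placeholder that advances the refresh pointer.
        self.next_refresh = (self.next_refresh + 1) % self.rows

    def read(self, row, col):
        self._refresh_one_row()   # refresh piggybacks on the access
        return self.data[row][col]

    def write(self, row, col, value):
        self._refresh_one_row()
        self.data[row][col] = value

ram = HiddenRefreshRAM(rows=4, cols=8)
ram.write(2, 3, 0xAB)
print(hex(ram.read(2, 3)))   # 0xab
```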

Traditionally, SRAMs have been built using a bulky four- or six-transistor (4T, 6T) cell. The MoSys 1T-SRAM device is built on a single-transistor (1T) DRAM cell, allowing a reduction in die size of between 50% and 80% compared to SRAMs of similar density. Moreover, this high density is achieved whilst maintaining the refresh-free interface and low-latency random memory access cycle time associated with traditional six-transistor SRAM cells. As if these exceptional density and performance characteristics weren't enough, 1T-SRAM technology also offers dramatic power savings, using under a quarter of the power of traditional SRAM memories.

1T-SRAM is an innovation that promises to dramatically change the balance between the two traditional memory technologies. At the very least it will provide DRAM makers with the opportunity to squeeze significantly more margin from their established DRAM processes.

Direct RDRAM

Conventional DRAM architectures have reached their practical upper limit in operating frequency and bus width. With mass-market CPUs operating at over 300MHz and media processors executing more than 2 GOPS, it is clear that an external memory bandwidth of approximately 533 MBps cannot meet increasing application demands. The introduction of Direct Rambus DRAM (DRDRAM) in 1999 is likely to prove one of the long-term solutions to the problem.

Direct RDRAM is the result of a collaboration between Intel and a company called Rambus to develop a new memory system. It is a totally new RAM architecture, complete with bus mastering (the Rambus Channel Master) and a new pathway (the Rambus Channel) between memory devices (the Rambus Channel Slaves). Direct RDRAM is actually the third version of the Rambus technology. The original (Base) design ran at 600MHz and this was increased to 700MHz in the second iteration, known as Concurrent RDRAM.

A Direct Rambus channel includes a controller and one or more Direct RDRAMs connected together via a common bus, which can also connect to devices such as microprocessors, digital signal processors (DSPs), graphics processors and ASICs. The controller is located at one end, and the RDRAMs are distributed along the bus, which is parallel-terminated at the far end. The two-byte-wide channel uses a small number of very high-speed signals to carry all address, data and control information at up to 800MHz. The signalling technology is called Rambus Signalling Logic (RSL). Each RSL signal wire has equal loading and fan-out; the wires are routed parallel to one another on the top trace of a PCB, with a ground plane located on the layer underneath. Through continuous incremental improvement, signalling data rates are expected to increase by about 100MHz a year, reaching around 1000MHz by the year 2001.

At current speeds a single channel is capable of data transfer at 1.6 GBps and multiple channels can be used in parallel to achieve a throughput of up to 6.4 GBps. The new architecture will be capable of operating at a system bus speed of up to 133MHz.
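The channel arithmetic behind those figures is straightforward: a two-byte-wide channel clocked at 800MHz moves 1.6 GBps, and four channels in parallel reach 6.4 GBps. A minimal sketch:

```python
# Aggregate peak throughput of Rambus-style channels.
# Uses decimal units (1 GB = 1000 MB), matching the figures in the text.

def channel_gbps(width_bytes, clock_mhz, channels=1):
    """Peak throughput in GB/s for one or more parallel channels."""
    return width_bytes * clock_mhz * channels / 1000

print(channel_gbps(2, 800))              # 1.6 GBps per channel
print(channel_gbps(2, 800, channels=4))  # 6.4 GBps with four channels
```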

Problems with both the Rambus technology and Intel's chipset supporting it, the i820, delayed DRDRAM's appearance until late 1999, much later than originally planned. As a result of the delays, Intel had to provide a means for the 820 chipset to support SDRAM DIMMs as well as the new Direct RDRAM RIMM module. A consequence of this enforced compromise, and of the need for bus translation between the SDRAM DIMMs and the 820's Rambus interface, was that performance was slower than when the same DIMMs were used with the older 440BX chipset. Subsequently, the component that allowed the i820 to use SDRAM was found to be defective, forcing Intel to recall and replace all motherboards with the defective chip and to swap out the previously used SDRAM for far more expensive RDRAM memory.

And Intel's Rambus woes didn't stop there. The company was repeatedly forced to alter course in the face of continued market resistance to RDRAM and AMD's continuing success in embracing alternative memory technologies. In mid-2000, its 815/815E chipsets were the first to provide support for PC133 SDRAM, and a year later it revealed that its forthcoming i845 chipset would support both PC133 SDRAM and DDR SDRAM on Pentium 4 systems.

SIMMs

Memory chips are generally packaged into small plastic or ceramic dual inline packages (DIPs) which are themselves assembled into a memory module. The single inline memory module or SIMM is a small circuit board design.