Elbrus 2000

Original Article by:
Boris Babayan
Elbrus International, Moscow, Russia
Boris.Babayan@elbrus.ru

E2K Technology and Implementation

For many years the Elbrus team has been involved in the design and delivery of several generations of the most powerful Soviet computers. It has developed computers based on super-scalar, shared memory multiprocessing and EPIC architectures. The main goal has always been to create a computer architecture that is fast, compatible, reliable and secure.
The main technical achievements of the Elbrus line of computers designed by the team are high speed, full compatibility, trustworthiness (program security, hardware fault tolerance), low power consumption and dissipation, and low cost.

The Soviet era computers are:

  • Elbrus-1 (1979): a super-scalar RISC processor with out-of-order execution, speculative execution and register renaming. Capability-based security with dynamic type checking. Ten-CPU shared memory multiprocessor.
  • Elbrus-2 (1984): a ten-processor supercomputer.
  • Elbrus-3 (1991): an EPIC based VLIW CPU. Sixteen-processor shared memory multiprocessor.

The development continues with:

  • Elbrus-90micro (1998-2000): SPARC instruction set architecture (ISA) microprocessors MCST R80, R150, R500, R500S and MCST-4R working at 80, 150, 500 and 1000 MHz.
  • Elbrus-3M1 (2005): a 2-processor computer based on the Elbrus 2000 microprocessor, which employs a VLIW architecture and runs at 300 MHz. It is a further development of the Elbrus-3.
  • Elbrus МВ3S1/C (2009): a ccNUMA 4-processor computer based on the Elbrus-S microprocessor running at 500 MHz.

The latest Elbrus computers are:

  • Monoblock-KM4 (2012): a computer based on the Elbrus-2C+, a dual-core 512-bit VLIW microprocessor running at 500 MHz that also includes 4 digital signal processor (DSP) cores.

Our approach is ExpLicit Basic Resource Utilization Scheduling – ELBRUS.

Elbrus Instruction Structure

Elbrus instructions fully and explicitly control all hardware resources so that the compiler can perform static scheduling. An Elbrus instruction is therefore a variable-size wide instruction consisting of one mandatory header syllable and up to 15 optional instruction syllables, each controlling a specific resource.
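
The variable-size format can be illustrated with a small sketch. It is not the real E2K encoding: the 32-bit syllable width, the use of the low 15 header bits as a presence mask, and the names pack_wide_instruction and unpack_wide_instruction are assumptions made only for illustration. The sketch shows how a header can tell the decoder which optional syllables follow, so that unused resources cost no code space.

```python
from typing import Dict, List

MAX_OPTIONAL_SYLLABLES = 15   # from the text: up to 15 optional syllables
SYLLABLE_MASK = 0xFFFFFFFF    # assumption: every syllable is one 32-bit word


def pack_wide_instruction(syllables: Dict[int, int]) -> List[int]:
    """Pack {resource_slot: syllable_word} into [header, syllable, ...].

    Hypothetical layout: bit i of the header is set when the syllable
    controlling resource slot i is present.
    """
    header = 0
    body = []
    for slot in sorted(syllables):
        if not 0 <= slot < MAX_OPTIONAL_SYLLABLES:
            raise ValueError(f"slot {slot} out of range")
        header |= 1 << slot
        body.append(syllables[slot] & SYLLABLE_MASK)
    return [header] + body


def unpack_wide_instruction(words: List[int]) -> Dict[int, int]:
    """Recover {resource_slot: syllable_word} from a packed instruction."""
    header, body = words[0], words[1:]
    present = [s for s in range(MAX_OPTIONAL_SYLLABLES) if (header >> s) & 1]
    if len(present) != len(body):
        raise ValueError("header does not match the number of syllables")
    return dict(zip(present, body))
```

Under these assumptions, an instruction that drives only three resources occupies four words (header plus three syllables) rather than the sixteen a fixed-size format would need.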

Advantages of Elbrus Architecture

  • Performance: the highest speed for the given computational resources
    • Excellent cost-performance
    • Excellent performance for a given level of memory subsystem
    • Well-defined set of compiler optimizations needed to reach the limit
    • Highly universal
    • Can better utilize the large number of transistors in future chips
    • Better suited for high clock frequency implementation
  • Simplicity:
    • Simpler control logic
    • Simpler and more effective compiler optimization (explicit HW)
    • Easier and more reliable testing and HW correctness proof

The Elbrus approach allows the most efficient design of the main data path resources (execution units, internal memories and interconnections), free of the limitations imposed by analysis and scheduling hardware.

Support of Straight-Line Program

  • Wide instruction
  • Variable-size instructions (reducing the required code fetch throughput)
  • Score-boarding
  • Multi-port register file (split RF)
  • Unified register file for integer and floating point units
  • Increased number of registers in a single procedure context, provided through a variable-size window
  • Three independent register files for:
    • Integer and FP data, memory address pointers
    • Boolean predicates
  • HW implemented spill/fill mechanism (in separate hidden stack)
  • L1 Cache splitting

Support of Conditional Execution

  • Exclusion of control transfers from the data dependency graph: no conditional control transfer is needed to implement the semantics of conditional expressions
  • Speculative execution under explicit program control
  • Hoisting LOADs and other operations across basic blocks
  • Predicated execution
  • A large number of Boolean predicates and corresponding operations (in parallel with arithmetic operations)
  • Elimination of output dependencies
  • Introduction of control transfer statements during optimization
  • Prepare-to-branch operations
  • Instruction cache pre-load
  • Removing control transfer condition from critical path (unzipping)
  • Short pipeline – fast branch
  • Programmable branch predictor
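
Predicated execution, listed above, can be sketched with a toy interpreter. This is only an illustration of the general technique (if-conversion), not Elbrus code: the operation format, the register names and the function run_predicated are hypothetical. Each operation carries an optional guard, and an operation whose guard does not match is nullified instead of being skipped by a branch.

```python
from typing import Callable, Dict, List, Optional, Tuple

Guard = Optional[Tuple[str, bool]]          # (predicate name, required value)
Op = Tuple[str, Callable[..., int], Tuple[str, ...], Guard]


def run_predicated(ops: List[Op], regs: Dict[str, int]) -> Dict[str, int]:
    """Execute a branch-free list of operations in program order.

    An operation writes its result only when its guard matches the
    current value of the named predicate; otherwise it has no effect.
    """
    state = dict(regs)
    for dest, fn, srcs, guard in ops:
        if guard is not None:
            pred, wanted = guard
            if bool(state[pred]) != wanted:
                continue  # models a nullified operation: nothing is written
        state[dest] = fn(*(state[s] for s in srcs))
    return state


# "if a > b: x = a - b  else: x = b - a" converted to straight-line code:
# one compare sets predicate p0, then both arms are issued under guards.
ABS_DIFF: List[Op] = [
    ("p0", lambda a, b: int(a > b), ("a", "b"), None),
    ("x", lambda a, b: a - b, ("a", "b"), ("p0", True)),
    ("x", lambda a, b: b - a, ("a", "b"), ("p0", False)),
]
```

Calling run_predicated(ABS_DIFF, {"a": 7, "b": 3}) leaves the absolute difference in x without any control transfer in the operation list, which is what removes the branch from the data dependency graph.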

Loop Support

  • Loop overlapping
  • Basing register references
  • Basing predicate register references
  • Support for memory access to array elements (automatic reference pointer forwarding)
  • Array pre-fetch buffer
  • Loop unroll support
  • Loop control
  • Recurrent loop support (“shift-register”)
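
The idea behind based register references and the "shift-register" treatment of recurrences can be sketched as a rotating register file, a technique common to software-pipelined VLIW designs. The text does not describe the Elbrus mechanism in detail, so the class RotatingRegisterFile, its size and the rotation direction are assumptions for illustration only.

```python
class RotatingRegisterFile:
    """Registers addressed relative to a base that moves every iteration."""

    def __init__(self, size: int) -> None:
        self._regs = [0] * size
        self._base = 0

    def _phys(self, index: int) -> int:
        # Translate a register number used in the code to a physical slot.
        return (self._base + index) % len(self._regs)

    def read(self, index: int) -> int:
        return self._regs[self._phys(index)]

    def write(self, index: int, value: int) -> None:
        self._regs[self._phys(index)] = value

    def rotate(self) -> None:
        # After rotation, the value written as r[0] is read as r[1].
        self._base = (self._base - 1) % len(self._regs)


def fibonacci(n: int) -> int:
    """Recurrence x[i] = x[i-1] + x[i-2] with fixed register numbers."""
    rf = RotatingRegisterFile(4)
    rf.write(1, 1)  # x[i-1]
    rf.write(2, 0)  # x[i-2]
    for _ in range(n):
        rf.write(0, rf.read(1) + rf.read(2))  # identical operation each time
        rf.rotate()                           # values shift by one name
    return rf.read(2)
```

The loop body never copies values between registers: the same operation is issued every iteration, and the moving base renames the registers so that each result appears one position further along, like a shift register.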

Circuit design

An advanced circuit design has been developed in the Elbrus project to support implementation at extremely high clock frequencies. It introduces two new basic logic elements in addition to the traditional ones:

  • Universal self-reset logic with the following outstanding features:
    • No losses for latches
    • No losses for clock skew
    • Time borrowing
    • Low power dissipation
  • Differential logic for high speed long distance signal transfer

This logic supports clock frequencies 25-30% higher than those of the most advanced existing microprocessors.

Hardware Support of Binary Translation

Platform independent features:

  • Two virtual spaces
  • TLB design:
    • Write protection
    • Self-modifying code
    • I/O pages access
    • Protection
  • Call/return cache
  • Precise interrupt implementation (register context)

X86 platform specific features:

  • Integer arithmetic and logical primitives
  • Floating point arithmetic
  • Memory access (including memory models support)
  • LOCK prefix
  • Peripheral support

E2K Ensures Intel Compatibility Including:

  • Invisibility of the binary-compiled code to the original Intel code
  • Run-time code modifications
    • Run-time code creation
    • Self-modifying code
    • Code modification in MP system by other CPUs
    • Code modification by external sources (PCI, etc.)
  • Modification of the executable in the code file
  • Dynamic control transfer
  • Optimizations of memory access order
  • Proper interrupt handling
    • Asynchronous
    • Synchronous

Security

Elbrus security technology solves a critical problem of today: network security and full protection from viruses on the Internet. In addition, it provides ideal conditions for efficient debugging and facilitates advanced technology for system programming.

The principle for security is extremely simple: “You should not steal”. In information technology this means that one should access only data that one has created oneself or that has been given from outside with specific access rights.

All data are accessed through address information (references, pointers). If pointers are handled properly, the principle above holds and the system is secure. Unfortunately, it is impossible to check the correctness of pointer handling statically without imposing undue restrictions on programming. Full, strong and efficient dynamic control of explicit pointer handling, with no restrictions on programming, requires HW support. This is what Elbrus provides.

Traditional Approaches

To avoid pointer check problems, Java simply discards explicit pointer handling. This makes the language non-universal, and it still does not remove the need for dynamic checking (e.g. for array ranges). C and C++ include explicit pointer handling but, for efficiency reasons, omit dynamic checks entirely, which results in insecure programming.

Analysis of traditional approaches:

  1. Memory: languages have pointer types, but pointers are represented by regular integers that the user can manipulate explicitly. There is no check for proper pointer handling, so there is no security in memory
  2. File-system: there is no "pointer to a file" data type. A file reference is represented by a regular string. For a downloaded program to use such a reference, the file-system root is made accessible to it. There is no protection in the file-system, which gives good conditions for virus reproduction

Elbrus Approach

Elbrus hardware supports dynamic pointer checking. For this purpose, each pointer is marked with special type bits. This does not require non-standard DIMMs. In this way, perfect memory protection and debugging facilities are ensured. Using this technology we can run C and C++ programs in a fully secure environment, and Java becomes much more efficient.
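
A software model can make the idea of type-marked pointers concrete. The sketch below is a simplified illustration, not a description of the Elbrus hardware: the Pointer descriptor with base, size and offset fields, the TaggedMemory class and the allocator are all hypothetical. It shows two checks performed on every access: the value used as an address must be tagged as a pointer, and the access must stay inside the object the pointer was created for.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Pointer:
    """A word whose tag says "pointer"; it carries its own bounds."""
    base: int
    size: int
    offset: int = 0

    def add(self, delta: int) -> "Pointer":
        # Pointer arithmetic keeps the tag and the original bounds.
        return Pointer(self.base, self.size, self.offset + delta)


Word = Union[int, Pointer]  # an untagged integer or a tagged pointer


class TaggedMemory:
    def __init__(self, words: int) -> None:
        self._cells: List[Word] = [0] * words
        self._next_free = 0

    def allocate(self, size: int) -> Pointer:
        """In this model only the allocator creates pointers."""
        if self._next_free + size > len(self._cells):
            raise MemoryError("out of memory")
        ptr = Pointer(self._next_free, size)
        self._next_free += size
        return ptr

    def _check(self, ref: Word) -> int:
        if not isinstance(ref, Pointer):
            raise TypeError("value is not tagged as a pointer")
        if not 0 <= ref.offset < ref.size:
            raise IndexError("access outside the object's bounds")
        return ref.base + ref.offset

    def load(self, ref: Word) -> Word:
        return self._cells[self._check(ref)]

    def store(self, ref: Word, value: Word) -> None:
        self._cells[self._check(ref)] = value
```

In this model a plain integer such as 4 cannot be dereferenced even if it happens to be a valid address, and a pointer moved past the end of its object is rejected at the access, which is the kind of dynamic check that C and C++ normally omit.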

File-system and Network Security

To use these ideas in the file-system and Internet areas, C and C++ need to be extended with special data types: file and directory references. A file reference can then be passed to a downloaded program without giving it access to the file-system root, so full security is ensured.

E2K is fast, compatible, reliable and secure. It is a truly Internet-oriented microprocessor.
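
The file-reference idea can be sketched as follows. The FileRef class, its rights set and the in-memory dictionary standing in for a file system are hypothetical, and plain Python cannot make a reference unforgeable; in the scheme described here that guarantee would have to come from the typed references themselves. The sketch only shows the shape of the interface: untrusted code receives a reference to one file with specific rights and never sees a path or the root.

```python
from typing import Dict, FrozenSet


class FileRef:
    """A reference to exactly one file, carrying its own access rights."""

    def __init__(self, store: Dict[str, bytes], name: str,
                 rights: FrozenSet[str]) -> None:
        self.__store = store
        self.__name = name
        self.__rights = rights

    def read(self) -> bytes:
        if "read" not in self.__rights:
            raise PermissionError("reference does not grant read access")
        return self.__store[self.__name]

    def write(self, data: bytes) -> None:
        if "write" not in self.__rights:
            raise PermissionError("reference does not grant write access")
        self.__store[self.__name] = data


def downloaded_program(doc: FileRef) -> int:
    """Untrusted code: it is handed one reference, not a path or the root."""
    return len(doc.read())
```

A caller holding a file store could create FileRef(files, "report.txt", frozenset({"read"})) and hand only that object to downloaded_program; the program can read that one file but has no name through which to reach any other.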

Source: Euro-Par 2000 Parallel Processing: 6th International Euro-Par Conference, Munich, Germany, August/September 2000, Proceedings.
