Computer System

x86 Processor Architecture

- IA-32 is a basic model that we can use to understand computer architecture before moving on to more advanced 64-bit processors.

 

Basic Microcomputer : CPU + Memory

- As mentioned in previous articles, the job of the CPU is to successfully perform the 'fetch -> decode -> execute' cycle.

 

Clock

- Every operation is synchronized by the clock.

- Each step in the instruction cycle takes one clock cycle.

- Specifically, the stages work simultaneously within each clock period, each one on a different instruction, like a conveyor belt in a factory.

- This is called multi-stage pipelining.

 

- IA-32 processors are pipelined with a 6-stage execution cycle.

- The 'Bus Interface Unit (BIU)' handles memory access and I/O access.

- The 'Code Fetch Unit' gets instructions from the BIU and puts them into the instruction queue (I-queue).

- The 'Decode Unit' decodes instructions from the I-queue and translates them into microcode.

- The 'Execution Unit' executes the microcode instructions.

- The 'Segment Unit' translates logical addresses into linear addresses.

- The 'Paging Unit' translates linear addresses into physical addresses.

- Together, the Segment Unit and the Paging Unit map the logical addresses used by a program onto physical addresses.

 

6-stage non-pipelined processor

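- A processor that is not pipelined is very inefficient.

- With k stages, executing n instructions takes n*k cycles, because each instruction must pass through all k stages before the next one starts. Most stages sit idle most of the time, which wastes the CPU.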

 

 

6-stage pipelined processor

- A pipelined processor uses the CPU much more efficiently.

- With k stages, executing n instructions takes k+(n-1) cycles: k cycles for the first instruction, then one more cycle for each remaining instruction.
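
- As a rough illustration, the two cycle counts can be compared directly. This is a minimal sketch, assuming n is the number of instructions (n >= 1) and that every stage takes exactly one cycle; the function names are made up for this example.

```python
def non_pipelined_cycles(k: int, n: int) -> int:
    """Cycles needed when each instruction finishes all k stages before the next starts."""
    return n * k


def pipelined_cycles(k: int, n: int) -> int:
    """Cycles needed in a k-stage pipeline where every stage takes one cycle (n >= 1)."""
    # The first instruction takes k cycles to pass through the pipeline;
    # each later instruction completes one cycle after the one before it.
    return k + (n - 1)


k = 6  # number of stages, as in the 6-stage examples above
for n in (1, 3, 100):
    print(f"n={n}: non-pipelined={non_pipelined_cycles(k, n)}, "
          f"pipelined={pipelined_cycles(k, n)}")
```

- For a single instruction both counts are equal; the benefit of pipelining grows with the number of instructions.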

 

 

Superscalar processor with 2 or more pipelines

- A superscalar processor has two or more execution pipelines, which helps when a stage needs more than one clock cycle.

- For example, suppose that stage S4 needs 2 cycles per instruction and there is only a single pipeline.

- With k stages and n instructions, this takes (k+2n-1) cycles in total.

- As a result, in the 6-stage example S5 has nothing to do in cycles 7 and 9, because it is waiting for S4.
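
- The idle cycles can be reproduced with a small simulation. This is only a sketch under a simplified model (an assumption, not a description of real hardware): each stage holds one instruction at a time, and an instruction enters a stage as soon as both it and the stage are free. The names `schedule` and `idle_cycles` are made up for this example.

```python
def schedule(durations, n):
    """Return busy[i][s] = (first_cycle, last_cycle) of instruction i in stage s."""
    stage_free = [1] * len(durations)  # first cycle in which each stage is free
    busy = []
    for _ in range(n):
        ready = 1  # first cycle in which this instruction can enter its next stage
        row = []
        for s, d in enumerate(durations):
            start = max(ready, stage_free[s])
            end = start + d - 1
            row.append((start, end))
            stage_free[s] = end + 1
            ready = end + 1
        busy.append(row)
    return busy


def idle_cycles(busy, s):
    """Cycles between the first and last use of stage s in which it does nothing."""
    used = set()
    for row in busy:
        start, end = row[s]
        used.update(range(start, end + 1))
    return [c for c in range(min(used), max(used) + 1) if c not in used]


durations = [1, 1, 1, 2, 1, 1]   # S1..S6, with S4 taking 2 cycles
busy = schedule(durations, n=3)
total = busy[-1][-1][1]          # last cycle of the last instruction in S6
print(total)                     # 11, which equals k + 2n - 1 for k=6, n=3
print(idle_cycles(busy, 4))      # [7, 9]: S5 (index 4) is idle in these cycles
```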

 

 

u-pipe & v-pipeline

- To handle the problem described above, the processor uses two pipelines, called the u-pipeline and the v-pipeline.

- Stage S4 is duplicated into two units, S4-u and S4-v, which handle the odd-numbered and the even-numbered instructions respectively.

- With k stages and n instructions, this takes (k+n) cycles in total.
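
- The effect of duplicating S4 can be checked with a small simulation. This is a sketch under a simplified model (an assumption): each unit of a stage holds one instruction at a time, and instructions alternate between the units of a stage. The function name `finish_cycles` and the `units` parameter are made up for this example.

```python
def finish_cycles(durations, units, n):
    """Return the cycle in which each of n instructions leaves the last stage.

    durations[s] is the number of cycles stage s needs per instruction.
    units[s] is the number of parallel units for stage s (2 for S4-u and S4-v).
    """
    # stage_free[s][u]: first cycle in which unit u of stage s is free
    stage_free = [[1] * count for count in units]
    finish = []
    for i in range(n):
        ready = 1  # first cycle in which instruction i can enter its next stage
        for s, d in enumerate(durations):
            u = i % units[s]  # odd-numbered instructions use unit 0, even-numbered unit 1
            start = max(ready, stage_free[s][u])
            end = start + d - 1
            stage_free[s][u] = end + 1
            ready = end + 1
        finish.append(ready - 1)
    return finish


durations = [1, 1, 1, 2, 1, 1]                          # S4 takes 2 cycles
single = finish_cycles(durations, [1, 1, 1, 1, 1, 1], 3)
dual = finish_cycles(durations, [1, 1, 1, 2, 1, 1], 3)
print(single)  # [7, 9, 11]: the last instruction finishes in cycle k + 2n - 1
print(dual)    # [7, 8, 9]:  the last instruction finishes in cycle k + n
```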

 

 

- With a multi-stage pipeline, the CPU has to fetch instructions from memory all the time.

- The big problem is that, although memory speed is critical to execution speed, there is a large performance gap between the CPU and memory.

- One way to reduce this problem is to prefetch instructions into an instruction queue, so that the decode unit can take instructions from the queue instead of waiting for a memory fetch every time.

 

- By using a memory hierarchy, we can get close to the best speed with a reasonable amount of resources.

- Registers are controlled by the CPU.

- Cache memory is managed by hardware; software does not control it directly.

- Main memory is controlled by the operating system.

- The disk is also controlled by the operating system.

 

- The 'Principle of Locality' states that a program accesses a relatively small portion of its address space at any instant of time.

- The two types of locality are 'Temporal Locality (Locality in Time)' and 'Spatial Locality (Locality in Space)'.

- 'Temporal Locality (Locality in Time)' means that if an item is referenced, it will tend to be referenced again soon.

- 'Spatial Locality (Locality in Space)' means that if an item is referenced, items whose addresses are close by will tend to be referenced soon.

 

 

 

- Cache memory is typically built from SRAM (static RAM).

- Main memory is typically built from DRAM (dynamic RAM).

- When we need high-speed memory, we use cache memory.

- L1 cache is located inside the CPU.

- L2 cache may be located inside the CPU or outside it, depending on the processor.

- If the data to be read is already in cache memory, it is a 'cache hit'; if the data to be read is not in cache memory, it is a 'cache miss'.

- Data that will be read frequently in the future should stay in the cache, so that those reads are cache hits.
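
- Cache hits, cache misses, and the effect of locality can be illustrated with a toy cache model. This is only a sketch: the direct-mapped organization, the class name `DirectMappedCache`, and the sizes (4 lines of 16 bytes) are assumptions chosen for the example, not the layout of any real IA-32 cache.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only tracks which memory block each line holds."""

    def __init__(self, num_lines=4, block_size=16):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines  # tag of the block stored in each line
        self.hits = 0
        self.misses = 0

    def read(self, address):
        """Return True on a cache hit, False on a cache miss."""
        block = address // self.block_size  # which memory block holds the address
        index = block % self.num_lines      # the only line this block may occupy
        tag = block // self.num_lines       # identifies the block within that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag              # load the block, evicting the old one
        self.misses += 1
        return False


cache = DirectMappedCache()

# Spatial locality: neighbouring addresses share a block, so only the
# first access to each 16-byte block misses.
for address in range(64):
    cache.read(address)
print(cache.hits, cache.misses)  # 60 4

# Temporal locality: reading the same data again soon hits every time.
for address in range(64):
    cache.read(address)
print(cache.hits, cache.misses)  # 124 4
```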

 

Program Execution

- After booting, the operating system controls the execution of programs.

- When we type a program name, the OS searches for the filename in the current directory and, if it is found, retrieves the file information.

- The OS then determines the next available location in memory, loads the program file into it, and branches to the first instruction of the program to begin executing it.

- When the program finishes, its file handle is released, its memory space is deallocated, and control returns to the OS.

 

 

 

Registers

- A 16-bit Intel processor has 16-bit registers, which can be divided into general-purpose, special-purpose, and hidden registers.

- AX (Accumulator), BX (Base), CX (Counter), and DX (Data) are the 16-bit general-purpose registers.

- AX, BX, CX, and DX can each be split into two 8-bit general-purpose registers: AH/AL, BH/BL, CH/CL, and DH/DL.

- CS (Code Segment), DS (Data Segment), SS (Stack Segment), and ES (Extra Segment) are the segment registers.

- BP (Base Pointer), SP (Stack Pointer), SI (Source Index), and DI (Destination Index) are the pointer and index registers.

- IP (Instruction Pointer) and Flags (DF, CF, OF, SF, ZF, AF, PF) are the status/control registers.

- CS:IP (segment:offset) forms the complete address of the next instruction to execute.
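
- The relationship between a 16-bit register such as AX and its 8-bit halves AH and AL can be sketched with plain bit operations. The helper names `split_register` and `join_register` are made up for this example; the registers are modeled as ordinary integers.

```python
def split_register(value16):
    """Split a 16-bit register value into its (high, low) 8-bit halves."""
    value16 &= 0xFFFF                      # keep only 16 bits
    return (value16 >> 8) & 0xFF, value16 & 0xFF


def join_register(high, low):
    """Combine two 8-bit halves into one 16-bit register value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


ax = 0x1234
ah, al = split_register(ax)
print(hex(ah), hex(al))        # 0x12 0x34

# Changing AL changes AX, because AL is the low byte of the same register.
ax = join_register(ah, 0xFF)
print(hex(ax))                 # 0x12ff
```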

 

32-bit Intel Processor
64-bit Intel Processor

- The 32-bit and 64-bit Intel processors have slight changes, but overall they keep the same set of registers.

- When designing a new processor, it is crucial to provide a compatibility mode so that existing software keeps running.

 

 

 

Memory Addressing

- Memory is divided into segments.

- A logical address is written as "<segment address>:<offset address>".

- For example, the address 60250H can be written as 6000:0250.
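
- The example above can be checked with a short calculation. This sketch assumes 16-bit real-mode addressing, where the segment value is shifted left by 4 bits (multiplied by 16) and added to the offset, and the result is kept to 20 bits as on the 8086. The function name `physical_address` is made up for this example.

```python
def physical_address(segment, offset):
    """Real-mode address: segment shifted left by 4 bits (x16) plus the offset."""
    return ((segment << 4) + offset) & 0xFFFFF  # keep 20 bits, as on the 8086


print(hex(physical_address(0x6000, 0x0250)))  # 0x60250
# Different segment:offset pairs can name the same physical address.
print(hex(physical_address(0x6025, 0x0000)))  # 0x60250
```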

 

- From 00000H to 003FFH, there is an interrupt vector table of 32-bit addresses.

- Starting at 00400H, there are the software BIOS routines for managing the keyboard, display, and disk.
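
- The size of the interrupt vector table follows from the address range and the 32-bit entry width. The sketch below assumes that the entries are packed back to back starting at 00000H, so that entry n starts at n*4; the names used are made up for this example.

```python
IVT_BASE = 0x00000   # first byte of the interrupt vector table
IVT_END = 0x003FF    # last byte of the interrupt vector table
ENTRY_SIZE = 4       # bytes per entry: one 32-bit address

# 0x400 bytes / 4 bytes per entry
NUM_VECTORS = (IVT_END - IVT_BASE + 1) // ENTRY_SIZE


def vector_entry_address(n):
    """Memory address at which the entry for interrupt number n starts."""
    if not 0 <= n < NUM_VECTORS:
        raise ValueError(f"interrupt number out of range: {n}")
    return IVT_BASE + n * ENTRY_SIZE


print(NUM_VECTORS)                       # 256
print(hex(vector_entry_address(0x21)))   # 0x84
```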

 

 

 

Motherboard

- The motherboard holds many sockets, slots, chips, and connectors.

 

 

Programming Levels

- To control an I/O device, there are three levels of input-output control.

- The first level is calling I/O drivers that communicate directly with devices.

- OS security may prevent application-level code from working at this level.

- The second level is calling OS functions for I/O control.

- The third level is calling library functions for I/O control, which may not always be the fastest.

- Assembly language can perform input-output control at every level (1, 2, and 3) as well as at level 0 (hardware).