The ARM Architecture
ARM
•Introduction and processor modes
•Instruction Set Architecture – I
•Instruction Set Architecture- II
•Pipelining in ARM
ARM
• ARM: Advanced RISC Machines
• Most widely used 32-bit RISC instruction set
  architecture
• The relative simplicity makes it suitable for low power
  devices
• ARM7, ARM9, ARM11 and Cortex
• Approximately 90% of all embedded 32-bit RISC
  processors
• Used extensively in consumer electronics,
  including PDAs, mobile phones, digital media and music
  players, hand-held game consoles, calculators and
  computer peripherals such as hard drives and routers.
Product Code Description
• M: Multiplier
  The ARM processor has a hardware multiplier unit that performs
  multiplication
• I: EmbeddedICE Macrocell
  On-chip debug hardware (breakpoints and watchpoints). Used in
  advanced debugging.
• E: Enhanced Instruction Set
• J: Java Acceleration by Jazelle mode
  Hardware circuit used for running Java byte code
• F: Vector Floating point
  Hardware implementation of floating-point operations.
• S: Synthesizable Version
  The core is supplied as a soft (synthesizable) processor core,
  so the implementation can be modified.
Example
• ARM7TDMI
 This is an ARM7 family processor with T = Thumb
 instruction set, D = Debug unit, M = enhanced Multiplier
 (long multiply), I = EmbeddedICE macrocell.
• ARM946E-S
  1. ARM9xx core
  2. Enhanced Instruction set
  3. Synthesizable
ARM
• ARM has 3 instruction set states
   1. 32-bit ARM instruction set
   2. 16-bit Thumb instruction set
   3. 8-bit Jazelle instruction set
• ARM: 32-bit load/store architecture with every instruction
  being conditional.
• Thumb: 16-bit, with only branch instructions being conditional
  and only half of the registers used
• Jazelle: Allows Java byte code to be directly executed on the ARM
  architecture. Improves performance by 5x-10x
ARM- Processor Modes
• Seven basic operating modes exist:
   1. User: Unprivileged mode under which most tasks run
   2. FIQ: Entered when a high priority interrupt is raised
   3. IRQ: Entered when a low priority interrupt is raised
   4. Supervisor: Entered on reset and when a software
      Interrupt instruction is executed
   5. Abort: Used to handle memory access violations
   6. Undef: Used to handle undefined instructions
   7. System: Privileged mode using the same registers as user
      mode.
Register Organization Summary
 Mode     Own (banked) registers                       Shared with User mode
 User     r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr   -
 FIQ      r8-r12, r13 (sp), r14 (lr), spsr             r0-r7, r15, and cpsr
 IRQ      r13 (sp), r14 (lr), spsr                     r0-r12, r15, and cpsr
 SVC      r13 (sp), r14 (lr), spsr                     r0-r12, r15, and cpsr
 Undef    r13 (sp), r14 (lr), spsr                     r0-r12, r15, and cpsr
 Abort    r13 (sp), r14 (lr), spsr                     r0-r12, r15, and cpsr

 Thumb state low registers: r0-r7
 Thumb state high registers: r8-r15

Note: System mode uses the User mode register set
ARM- The Registers
• ARM has 37 registers, all of which are 32 bits long.
    –   1 dedicated program counter
    –   1 dedicated current program status register
    –   5 dedicated saved program status registers
    –   30 general purpose registers

• The current processor mode governs which of several banks is
  accessible. Each mode can access
    –   a particular set of r0-r12 registers
    –   a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
    –   the program counter, r15(pc)
    –   the current program status register, cpsr

   Privileged modes (except System) can also access
    – a particular spsr (saved program status register)
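The banking scheme above can be checked with a small sketch. It is only an illustrative model: the mode abbreviations (`fiq`, `irq`, `svc`, `abt`, `und`) and the `name_mode` naming are assumptions made for this example, chosen to mirror the SPSR_<mode> notation used in these slides.

```python
# Registers that each exception mode owns privately (banked copies).
# User and System modes have no banked registers of their own.
BANKED = {
    "fiq": ["r8", "r9", "r10", "r11", "r12", "r13", "r14"],
    "irq": ["r13", "r14"],
    "svc": ["r13", "r14"],
    "abt": ["r13", "r14"],
    "und": ["r13", "r14"],
}


def physical_register(mode, name):
    """Return the physical register that `name` refers to in `mode`."""
    if name in BANKED.get(mode, ()):
        return f"{name}_{mode}"
    return name  # shared with User mode


def all_physical_registers():
    """List every physical register: user set, cpsr, banked copies, spsrs."""
    regs = [f"r{i}" for i in range(16)] + ["cpsr"]
    for mode, names in BANKED.items():
        regs += [f"{n}_{mode}" for n in names]
        regs.append(f"spsr_{mode}")
    return regs


print(len(all_physical_registers()))
```

Counting the entries reproduces the total of 37 quoted above: 16 user registers, the cpsr, 7 FIQ copies, 2 copies for each of the other four exception modes, and 5 spsrs.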
Program Status Registers
    Bit layout: N = 31, Z = 30, C = 29, V = 28, Q = 27, J = 24,
    bits 23-8 undefined, I = 7, F = 6, T = 5, mode = bits 4-0
    Fields: f = bits [31:24], s = bits [23:16], x = bits [15:8], c = bits [7:0]
•        Condition code flags
           –     N = Negative result from ALU
           –     Z = Zero result from ALU
           –     C = ALU operation Carried out
           –     V = ALU operation overflowed
•        Sticky Overflow flag - Q flag
           – Architecture 5TE/J only
           – Indicates if saturation has occurred
•        J bit
           – Architecture 5TEJ only
           – J = 1: Processor in Jazelle state
•        Interrupt Disable bits.
           – I = 1: Disables the IRQ.
           – F = 1: Disables the FIQ.
•        T Bit
           – Architecture xT only
           – T = 0: Processor in ARM state
           – T = 1: Processor in Thumb state
•        Mode bits
           – Specify the processor mode
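As a minimal sketch of the layout described above, the following function splits a 32-bit status register value into its named bits. The function name and the sample value are assumptions for illustration; only the bit positions come from the slide.

```python
def decode_psr(value):
    """Split a 32-bit program status register value into its fields."""
    def bit(n):
        return (value >> n) & 1

    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),
        "Q": bit(27), "J": bit(24),
        "I": bit(7), "F": bit(6), "T": bit(5),
        "mode": value & 0x1F,  # bits 4-0
    }


# Hypothetical sample value: Z and C set, both interrupts disabled, ARM state.
fields = decode_psr(0x600000D3)
print(fields)
```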
Program Counter (r15)
• When the processor is executing in ARM state:
    – All instructions are 32 bits wide
    – All instructions must be word aligned
    – Therefore the PC value is stored in bits [31:2] with bits [1:0] undefined (as
      instruction cannot be halfword or byte aligned).

• When the processor is executing in Thumb state:
    – All instructions are 16 bits wide
    – All instructions must be halfword aligned
    – Therefore the PC value is stored in bits [31:1] with bit [0] undefined (as
      instruction cannot be byte aligned).

• When the processor is executing in Jazelle state:
    – All instructions are 8 bits wide
    – Processor performs a word access to read 4 instructions at once
Exception Handling
• When an exception occurs, the ARM:
  – Copies CPSR into SPSR_<mode>
  – Sets appropriate CPSR bits
     • Change to ARM state
     • Change to exception mode
     • Disable interrupts (if appropriate)
  – Stores the return address in LR_<mode>
  – Sets PC to vector address
• To return, exception handler needs to:
  – Restore CPSR from SPSR_<mode>
  – Restore PC from LR_<mode>
  This can only be done in ARM state.
• Vector Table
     0x1C   FIQ
     0x18   IRQ
     0x14   (Reserved)
     0x10   Data Abort
     0x0C   Prefetch Abort
     0x08   Software Interrupt
     0x04   Undefined Instruction
     0x00   Reset
  Vector table can be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices
Development of the ARM Architecture
• Early ARM architectures: versions 1, 2 and 3
• Version 4: halfword and signed halfword / byte support, System mode
    – SA-110, SA-1110
• Version 4T: Thumb instruction set
    – ARM7TDMI, ARM9TDMI, ARM720T, ARM940T
• Version 5TE: improved ARM/Thumb interworking, CLZ, saturated maths,
  DSP multiply-accumulate instructions
    – ARM1020E, XScale, ARM9E-S, ARM966E-S
• Version 5TEJ: Jazelle, Java bytecode execution
    – ARM9EJ-S, ARM926EJ-S, ARM7EJ-S, ARM1026EJ-S
• Version 6: SIMD instructions, multi-processing, V6 memory
  architecture (VMSA), unaligned data support
    – ARM1136EJ-S
The ARM Instruction Set part1
Main features of the
               ARM Instruction Set
•   All instructions are 32 bits long.
•   Most instructions execute in a single cycle.
•   Every instruction can be conditionally executed.
•   A load/store architecture
    – Data processing instructions act only on registers
       • Three operand format
       • Combined ALU and shifter for high speed bit manipulation
    – Specific memory access instructions with powerful
      auto-indexing addressing modes.
Conditional Execution
• Most instruction sets only allow branches to be executed
  conditionally by postfixing them with the appropriate condition
  code field.
• However by reusing the condition evaluation hardware, ARM
  effectively increases the number of instructions.
   – All instructions contain a condition field which determines whether
     the CPU will execute them.
   – Non-executed instructions soak up 1 cycle.
       • Still have to complete cycle so as to allow fetching and decoding of following
         instructions.
• This removes the need for many branches, which stall the pipeline
  (3 cycles to refill).
   – Allows very dense in-line code, without branches.
   – The time penalty of not executing several conditional instructions is
     frequently less than the overhead of the branch
     or subroutine call that would otherwise be needed.
The Condition Field
    The condition field (Cond) occupies bits [31:28] of every instruction.

0000 = EQ - Z set (equal)
0001 = NE - Z clear (not equal)
0010 = HS / CS - C set (unsigned higher or same)
0011 = LO / CC - C clear (unsigned lower)
0100 = MI - N set (negative)
0101 = PL - N clear (positive or zero)
0110 = VS - V set (overflow)
0111 = VC - V clear (no overflow)
1000 = HI - C set and Z clear (unsigned higher)
1001 = LS - C clear or Z set (unsigned lower or same)
1010 = GE - N set and V set, or N clear and V clear (>=)
1011 = LT - N set and V clear, or N clear and V set (<)
1100 = GT - Z clear, and either N set and V set, or N clear and V clear (>)
1101 = LE - Z set, or N set and V clear, or N clear and V set (<=)
1110 = AL - always
1111 = NV - reserved.
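The flag tests behind each condition code can be sketched as a lookup. This is an illustrative model using the standard ARM definitions, where the signed conditions compare N with V (GE and GT need N equal to V, LT and LE accept N different from V). The function name is an assumption for this example.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if the 4-bit condition `cond` holds for flags N, Z, C, V."""
    n, z, c, v = bool(n), bool(z), bool(c), bool(v)
    tests = {
        0b0000: z,                    # EQ
        0b0001: not z,                # NE
        0b0010: c,                    # HS / CS
        0b0011: not c,                # LO / CC
        0b0100: n,                    # MI
        0b0101: not n,                # PL
        0b0110: v,                    # VS
        0b0111: not v,                # VC
        0b1000: c and not z,          # HI
        0b1001: (not c) or z,         # LS
        0b1010: n == v,               # GE
        0b1011: n != v,               # LT
        0b1100: (not z) and n == v,   # GT
        0b1101: z or n != v,          # LE
        0b1110: True,                 # AL
    }
    if cond not in tests:
        raise ValueError("condition 1111 (NV) is reserved")
    return tests[cond]


# After comparing 3 with 5 (3 - 5): N set, Z clear, C clear, V clear.
print(condition_passed(0b1011, n=1, z=0, c=0, v=0))  # LT
print(condition_passed(0b1100, n=1, z=0, c=0, v=0))  # GT
```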
Using and updating the Condition Field
• To execute an instruction conditionally, simply postfix it with the
  appropriate condition:
    – For example an add instruction takes the form:
        • ADD r0,r1,r2 ; r0 = r1 + r2 (ADDAL)
    – To execute this only if the zero flag is set:
        • ADDEQ r0,r1,r2             ; If zero flag set then…
                                     ; ... r0 = r1 + r2
• By default, data processing operations do not affect the condition
  flags (apart from the comparisons where this is the only effect). To
  cause the condition flags to be updated, the S bit of the instruction
  needs to be set by postfixing the instruction (and any condition
  code) with an “S”.
    – For example to add two numbers and set the condition flags:
        • ADDS r0,r1,r2              ; r0 = r1 + r2
                                     ; ... and set flags
Data processing Instructions
• Largest family of ARM instructions, all sharing the same
  instruction format.
• Contains:
   –   Arithmetic operations
   –   Comparisons (no results - just set condition codes)
   –   Logical operations
   –   Data movement between registers
• Remember, this is a load / store architecture
   – These instructions only work on registers, NOT memory.
• They each perform a specific operation on one or two
  operands.
   – First operand always a register - Rn
   – Second operand sent to the ALU via barrel shifter.
ARM Processor
Data Movement
• Operations are:
   – MOV      operand2
   – MVN      NOT operand2
  Note that these make no use of operand1, i.e. operand1
  is ignored.
• Syntax:
   – <Operation>{<cond>}{S} Rd, Operand2
• Examples:
   – MOV r0, r1
   – MOVS r2, #10
   – MVNEQ r1,#0
Arithmetic Operations
• Operations are:
   –   ADD       operand1 + operand2
   –   ADC       operand1 + operand2 + carry
   –   SUB       operand1 - operand2
   –   SBC       operand1 - operand2 + carry -1
   –   RSB       operand2 - operand1
   –   RSC       operand2 - operand1 + carry - 1
• Syntax:
   – <Operation>{<cond>}{S} Rd, Rn, Operand2
• Examples
   –   ADD r0, r1, r2
   –   SUBGT r3, r3, #1
   –   RSBLES r4, r5, #5
   –   SUB r4,r5,r7,LSR r2    ; Logical right shift R7 by the number in
                              ; the bottom byte of R2, subtract result
                              ; from R5, and put the answer into R4.
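The six arithmetic operations listed above can be modelled on 32-bit values as follows. This is a sketch only: it computes the result written to Rd and ignores flag updates, and the `execute` helper is a name invented for this example.

```python
MASK32 = 0xFFFFFFFF

# Each entry maps a mnemonic to (operand1, operand2, carry) -> result,
# following the definitions in the operations list.
OPERATIONS = {
    "ADD": lambda op1, op2, carry: op1 + op2,
    "ADC": lambda op1, op2, carry: op1 + op2 + carry,
    "SUB": lambda op1, op2, carry: op1 - op2,
    "SBC": lambda op1, op2, carry: op1 - op2 + carry - 1,
    "RSB": lambda op1, op2, carry: op2 - op1,
    "RSC": lambda op1, op2, carry: op2 - op1 + carry - 1,
}


def execute(mnemonic, operand1, operand2, carry=0):
    """Return the 32-bit result of an arithmetic data processing operation."""
    return OPERATIONS[mnemonic](operand1, operand2, carry) & MASK32


print(hex(execute("SUB", 5, 7)))   # wraps around to a 32-bit value
print(execute("RSB", 5, 7))        # operand2 - operand1
```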
Logical Operations
• Operations are:
   –   AND    operand1 AND operand2
   –   EOR    operand1 EOR operand2
   –   ORR    operand1 OR operand2
   –   BIC    operand1 AND NOT operand2 [ie bit clear]
• Syntax:
   – <Operation>{<cond>}{S} Rd, Rn, Operand2
• Examples:
   – AND      r0, r1, r2
   – BICEQ    r2, r3, #7
   – EORS     r1,r3,r0
Multiplication Instructions
• The Basic ARM provides two multiplication instructions.
• Multiply
   – MUL{<cond>}{S} Rd, Rm, Rs            ; Rd = Rm * Rs
• Multiply Accumulate            - does addition for free
   – MLA{<cond>}{S} Rd, Rm, Rs,Rn         ; Rd = (Rm * Rs) + Rn
• Restrictions on use:
   – Rd and Rm cannot be the same register
       • Can be avoided by swapping Rm and Rs around. This works because
         multiplication is commutative.
   – Cannot use PC.
  These will be picked up by the assembler if overlooked.
• Operands can be considered signed or unsigned
   – Up to user to interpret correctly.
• The multiply form of the instruction gives Rd:=Rm*Rs. Rn is
  ignored, and should be set to zero for compatibility with
  possible future upgrades to the instruction set.
Multiplication Implementation
 • The ARM makes use of Booth’s Algorithm to perform integer
   multiplication.
 • On non-M ARMs this operates on 2 bits of Rs at a time.
        – For each pair of bits this takes 1 cycle (plus 1 cycle to start with).
        – However when there are no more 1’s left in Rs, the multiplication will
          early-terminate.
 • Example: Multiply 18 and -1 : Rd = Rm * Rs
     – Rm = 18, Rs = -1 (binary 1111 1111 1111 1111 1111 1111 1111 1111): 17 cycles
     – Rm = -1, Rs = 18 (binary 0000 0000 0000 0000 0000 0000 0001 0010): 4 cycles

 • Note: Compiler does not use early termination criteria to
   decide on which order to place operands.
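The cycle counts in the example can be reproduced with a simplified model of the early-termination rule as these slides describe it: one start-up cycle, then one cycle per group of Rs bits until the remaining bits are all zero. It is not a cycle-accurate model of any particular core. The function name and the `stop_on_ones` parameter are assumptions for this sketch, the latter modelling the M-variant rule of also stopping when the remaining bits are all ones.

```python
def multiply_cycles(rs, bits_per_cycle=2, stop_on_ones=False):
    """Estimate multiply cycles from the value of Rs (simplified model)."""
    rs &= 0xFFFFFFFF
    remaining = 32      # number of Rs bits not yet consumed
    cycles = 1          # start-up cycle
    while remaining > 0:
        rs >>= bits_per_cycle
        remaining -= bits_per_cycle
        cycles += 1
        if remaining <= 0:
            break
        all_ones = (1 << remaining) - 1
        if rs == 0 or (stop_on_ones and rs == all_ones):
            break       # early termination
    return cycles


print(multiply_cycles(-1))   # Rs = -1, 2 bits at a time
print(multiply_cycles(18))   # Rs = 18, 2 bits at a time
print(multiply_cycles(-1, bits_per_cycle=8, stop_on_ones=True))
```

With 2 bits per cycle the model gives 17 cycles for Rs = -1 and 4 cycles for Rs = 18. With 8 bits per cycle and termination on all ones it gives 2 cycles for either ordering.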
Booth’s Algorithm
Extended Multiply Instructions
• M variants of ARM cores contain extended multiplication
  hardware. This provides three enhancements:
   – An 8 bit Booth’s Algorithm is used
       • Multiplication is carried out faster (maximum for standard instructions
         is now 5 cycles).
   – Early termination method improved so that now completes
     multiplication when all remaining bit sets contain
       • all zeroes (as with non-M ARMs), or
       • all ones.
     Thus the previous example would early terminate in 2 cycles in
     both cases.
   – 64 bit results can now be produced from two 32bit operands
       • Higher accuracy.
       • Pair of registers used to store result.
Multiply-Long and
             Multiply-Accumulate Long
• Instructions are
    – MULL which gives RdHi,RdLo:=Rm*Rs
    – MLAL which gives RdHi,RdLo:=(Rm*Rs)+RdHi,RdLo
• However the full 64 bits of the result now matter (the lower precision
  multiply instructions simply throw the top 32 bits away)
    – Need to specify whether operands are signed or unsigned
• Therefore the syntax of the new instructions is:
    –   UMULL{<cond>}{S} RdLo,RdHi,Rm,Rs
    –   UMLAL{<cond>}{S} RdLo,RdHi,Rm,Rs
    –   SMULL{<cond>}{S} RdLo, RdHi, Rm, Rs
    –   SMLAL{<cond>}{S} RdLo, RdHi, Rm, Rs
• Not generated by the compiler.
   Warning : Unpredictable on non-M ARMs.
Operand restrictions
  • R15 must not be used as an operand or as a destination
  register.
  • RdHi, RdLo, and Rm must all specify different registers.
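A short sketch shows why the signed and unsigned long multiplies need separate instructions while the 32-bit MUL does not. The Python function names mirror the mnemonics but are otherwise assumptions for this example, and each long multiply returns the pair (RdLo, RdHi).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul(rm, rs):
    """32-bit multiply: the top 32 bits of the product are discarded."""
    return (rm * rs) & MASK32


def umull(rm, rs):
    """Unsigned 64-bit product, returned as (RdLo, RdHi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def smull(rm, rs):
    """Signed 64-bit product, returned as (RdLo, RdHi)."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product & MASK32, product >> 32


# The same bit patterns give the same low word but different high words.
print([hex(x) for x in umull(0xFFFFFFFF, 2)])
print([hex(x) for x in smull(0xFFFFFFFF, 2)])
```

The low word is identical in both cases, which is why the 32-bit MUL leaves the signed or unsigned interpretation up to the user.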
ISA part 1
Data Transfer

• ARM is a load/store architecture
• Involves
   -Load data from memory to register
   -Store data from register into memory
• ARM has three types of load/store instructions
  -LDR/STR
  -LDM/STM
  -SWP
LDR/STR Instructions
Types of load/store instructions

Simple load/store has options like the following
• LDR/STR       involved in loading/storing words (32 bits)
• LDRB/STRB     involved with a byte transfer
• In ARM v4 we also have support for halfwords (16 bits) and signed loads
   LDRH/STRH    halfword transfer without sign extension
   LDRSB/LDRSH  byte/halfword load with sign extension
• Condition codes can also be suffixed
   LDREQB/STREQB
• General syntax looks somewhat like:
   <LDR|STR>{<cond>}{<size>} Rd, <address>
Base Register
• STR r0,[r1] Stores the content of r0 at the address contained in r1
  LDR r2,[r1] Loads the content at the address contained in r1 into r2
  Figure: r0 (source register for STR) = 0x5 and r1 (base register) = 0x200.
  After the STR, memory address 0x200 holds 0x5. After the LDR, r2
  (destination register for LDR) = 0x5.
Offset from the base register

• ARM also supports accessing locations specified as an
  offset from the base register
• The offset can be
  An unsigned 12 bit immediate value (0-4095)
  A register with the option of shift
• Option exists for ‘+’ or ‘-’ from base register
• Offset can be applied
  - before transfer is made
    optionally auto-increments base register by using ‘!’
  - after transfer is made
    base register auto-incremented
Pre-Indexed Addressing

• Example: STR r0,[r1,#12]
  Figure: r1 (base register) = 0x200, offset = 12, r0 (source register
  for STR) = 0x5. The value 0x5 is stored at address 0x20c and r1 is
  left unchanged.


  •Offset value can as well be -12 (STR r0,[r1,#-12])
  •To perform auto increment on base reg STR r0,[r1,#12]!
    -updates base register to value 0x20C
  •If r2 contains 3 then this will yield the same result
   STR r0,[r1,r2,LSL#2]
  •Useful if only a particular element is to be accessed
Post Indexed Addressing
• Example: STR r0,[r1],#12
  Figure: r1 (original base register) = 0x200, offset = 12, r0 (source
  register for STR) = 0x5. The value 0x5 is stored at address 0x200 and
  r1 is then updated to 0x20c.


 •If r2 contains 3 then this will also yield the same result
   STR r0,[r1],r2,LSL #2
 •Useful if traversal is required through elements
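The difference between pre-indexed and post-indexed stores can be sketched with a dictionary standing in for memory. This is a minimal model under stated assumptions: the function names and the `writeback` parameter (modelling the '!' suffix) are invented for this example, and word alignment is not checked.

```python
MASK32 = 0xFFFFFFFF


def str_pre_indexed(memory, regs, rd, rn, offset, writeback=False):
    """STR Rd,[Rn,#offset] and, with writeback, STR Rd,[Rn,#offset]!"""
    address = (regs[rn] + offset) & MASK32
    memory[address] = regs[rd]
    if writeback:
        regs[rn] = address      # base register updated to the access address


def str_post_indexed(memory, regs, rd, rn, offset):
    """STR Rd,[Rn],#offset: access at the base, then update the base."""
    address = regs[rn]
    memory[address] = regs[rd]
    regs[rn] = (address + offset) & MASK32


regs = {"r0": 0x5, "r1": 0x200, "r2": 3}
memory = {}
str_pre_indexed(memory, regs, "r0", "r1", 12)
print(hex(regs["r1"]), {hex(a): v for a, v in memory.items()})

# A register offset shifted left by 2 (r2 = 3) gives the same offset of 12.
str_post_indexed(memory, regs, "r0", "r1", regs["r2"] << 2)
print(hex(regs["r1"]), {hex(a): v for a, v in memory.items()})
```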
For half words/signed byte access

• Instructions can be used in much the same way except
  - the offset value is restricted to 8 bits(0-255)
  - the registers cannot be shifted
For LDRH/STRH register offset
For LDRH/STRH immediate offset
LDM/STM (Block data transfer)
• Allow for transfer between 1-16 registers to or from memory
• The transferred registers can be:
  - Any subset of the current bank of registers (default).
  - Any subset of the user mode bank of registers when in a
    privileged mode (postfix instruction with a ‘^’).
Instruction Format
Block Data Transfer

• Base register determines where memory access can occur
• Base register can be updated after data transfer by suffixing a
  ‘!’
• These instructions are useful for
   - Saving and restoring context
   - moving large chunks of data to/from memory
Stack Example
Block Data Transfer

• One use of stacks is to temporarily create register space for
  subroutines
  STMFD sp!,{r0-r12, lr}         ; stack all registers
   ........                      ; and the return address
   ........
  LDMFD sp!,{r0-r12, pc}         ; load all the registers
                                 ; and return automatically

• If the pop instruction also had the ‘S’ bit set (using ‘^’) then
  the transfer of the PC when in a privileged mode would also
  cause the SPSR to be copied into the CPSR (see exception
  handling module).
Direct functionality Of Block Data Transfer

• When not being used for a stack operation these instructions
  can also be used in a generic way
• The LDM/STM support a further set of instructions
   – STMIA / LDMIA : Increment After
   – STMIB / LDMIB : Increment Before
   – STMDA / LDMDA : Decrement After
   – STMDB / LDMDB : Decrement Before
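The four addressing modes differ only in which addresses are touched and where the base register ends up. The sketch below computes both for a transfer of `count` registers. The function name is an assumption for this example, and the final base value only applies when write-back ('!') is requested. In all four modes the lowest-numbered register goes to the lowest address, so the address list is returned in ascending order.

```python
def block_transfer_addresses(base, count, mode):
    """Return (ascending word addresses, base value after write-back)."""
    size = 4 * count
    if mode == "IA":        # Increment After
        start, final = base, base + size
    elif mode == "IB":      # Increment Before
        start, final = base + 4, base + size
    elif mode == "DA":      # Decrement After
        start, final = base - size + 4, base - size
    elif mode == "DB":      # Decrement Before
        start, final = base - size, base - size
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return [start + 4 * i for i in range(count)], final


for mode in ("IA", "IB", "DA", "DB"):
    addresses, final = block_transfer_addresses(0x1000, 3, mode)
    print(mode, [hex(a) for a in addresses], hex(final))
```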
Criteria for different block data transfer
Swap Instruction

• The instruction is used to swap data between a register and a
  memory location
• This instruction is atomic (cannot be interrupted)
• The swap address is determined by the contents of the base
  register (Rn).
• The processor first reads the contents of the swap address.
  Then it writes the contents of the source register (Rm) to the
  swap address, and stores the old memory contents in the
  destination register (Rd).
• The same register may be specified as both the source and
  destination
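The read-then-write order described above can be sketched as follows. The model only captures the data movement, not the atomicity that the hardware guarantees, and the function name and dictionary-based memory are assumptions for this example.

```python
def swp(memory, regs, rd, rm, rn):
    """SWP Rd, Rm, [Rn]: exchange a register with a memory location."""
    address = regs[rn]              # swap address comes from the base register
    old_value = memory[address]     # 1. read the memory contents
    memory[address] = regs[rm]      # 2. write the source register to memory
    regs[rd] = old_value            # 3. old contents go to the destination


# Using the same register as source and destination swaps r0 with memory.
regs = {"r0": 0x11, "r1": 0x200}
memory = {0x200: 0x22}
swp(memory, regs, "r0", "r0", "r1")
print(hex(regs["r0"]), hex(memory[0x200]))
```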
Branch and Exchange




• Used to switch between the Thumb state and the ARM state
Branch and Branch with Link

• Branch instructions contain a signed 2’s complement 24 bit offset.
• This is shifted left two bits, sign extended to 32 bits, and added to
  the PC.
• The instruction can therefore specify a branch of +/- 32Mbytes.
• The branch offset must take account of the prefetch operation,
  which causes the PC to be 2 words (8 bytes) ahead of the current
  instruction.
• Branches beyond +/- 32Mbytes must use an offset or absolute
  destination which has been previously loaded into a register. In this
  case the PC should be manually saved in R14 if a Branch with Link
  type operation is required.
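The offset arithmetic described above can be sketched in both directions: encoding a target address into the 24-bit field, and recovering the target from the field. The function names are assumptions for this example. The constant 8 is the prefetch adjustment, since the PC is 2 words ahead of the branch instruction.

```python
def encode_branch_offset(instruction_address, target):
    """Return the 24-bit offset field for a branch to `target`."""
    delta = target - (instruction_address + 8)   # PC is 8 bytes ahead
    if delta % 4:
        raise ValueError("target must be word aligned")
    words = delta >> 2
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target is beyond the +/- 32 Mbyte range")
    return words & 0xFFFFFF


def branch_target(instruction_address, offset24):
    """Recover the branch target from the 24-bit offset field."""
    if offset24 & 0x800000:          # sign extend the 24-bit field
        offset24 -= 1 << 24
    return (instruction_address + 8 + (offset24 << 2)) & 0xFFFFFFFF


# A branch to itself needs an offset of -2 words to cancel the prefetch.
print(hex(encode_branch_offset(0x8000, 0x8000)))
print(hex(branch_target(0x8000, 0x000002)))
```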
Link Bit

• Branch with Link (BL) writes the old PC into the link register
  (R14) of the current bank.
• The PC value written into R14 is adjusted to allow for the
  prefetch, and contains the address of the instruction following
  the branch and link instruction.
• The CPSR is not saved with the PC
Barrel Shifter

• A barrel shifter is a digital circuit that can shift a data word by
  a specified number of bits in one clock cycle.
• It can be implemented as a sequence of multiplexers (mux.),
  and in such an implementation the output of one mux is
  connected to the input of the next mux in a way that depends
  on the shift distance.
• A barrel shifter is often implemented as a cascade of parallel
  2×1 multiplexers.
Using the Barrel Shifter




• There are 2 options for shifting
 - shift amount held in the bottom byte of a register
 - shift amount given as a 5 bit unsigned integer
Shift Operations

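• Logical Shift Left
• Shifts left by the specified amount (multiplies by a power of two)
• Example: LSL #5 (multiplies by 32)
  Figure: CF <- Destination <- 0 (zeros are shifted in at the right and
  the last bit shifted out goes to the carry flag)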
Shift Operations

• Logical Shift Right
• Shifts right without preserving sign bit
  Figure: 0 -> Destination -> CF (zeros are shifted in at the left)

• Arithmetic Shift Right
• Preserves the sign bit
  Figure: sign bit -> Destination -> CF (the sign bit is shifted in at
  the left)
Rotate

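• Rotate Right
  Similar to ASR, but the bits wrap around as they rotate: bits leaving
  the least significant end re-enter at the most significant end
  The last rotated bit is also used as the carry flag
  Figure: Destination -> CF, with the rotated bits fed back in at the left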
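The four barrel shifter operations can be modelled on 32-bit values as below. Each function returns the shifted result together with the carry-out bit (the last bit shifted or rotated out). This is a sketch restricted to shift amounts from 1 to 31, and the function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left: zeros enter at the right."""
    carry = (value >> (32 - amount)) & 1
    return (value << amount) & MASK32, carry


def lsr(value, amount):
    """Logical shift right: zeros enter at the left."""
    carry = (value >> (amount - 1)) & 1
    return (value & MASK32) >> amount, carry


def asr(value, amount):
    """Arithmetic shift right: copies of the sign bit enter at the left."""
    carry = (value >> (amount - 1)) & 1
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> amount) & MASK32, carry


def ror(value, amount):
    """Rotate right: bits leaving at the right re-enter at the left."""
    result = ((value >> amount) | (value << (32 - amount))) & MASK32
    return result, (result >> 31) & 1


print(lsl(1, 5))                                  # multiply by 32
print([hex(x) for x in lsr(0x80000000, 4)])
print([hex(x) for x in asr(0x80000000, 4)])
print([hex(x) for x in ror(0x12345678, 8)])
```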
Comparison
• The only effect of the comparisons is to
   – UPDATE THE CONDITION FLAGS. Thus no need to set S bit.
• Operations are:
   – CMP      operand1 - operand2, but result not written
   – CMN      operand1 + operand2, but result not written
   – TST      operand1 AND operand2, but result not written
   – TEQ      operand1 EOR operand2, but result not written
• Syntax:
   – <Operation>{<cond>} Rn, Operand2
• Examples:
   – CMP      r0, r1
   – TSTEQ r2, #5
ARM Processor
Pipelining
• Initially implemented a 3-stage pipeline
  organization (up to ARM7)
  – Fetch
  – Decode
  – Execute
• 3-stage pipeline organization
  – Principal components
     • The register bank
     • The barrel shifter
        – Can shift or rotate one operand by any number of bits
     • The ALU
     • The address register and incrementer
        – Select and hold all memory addresses and generate
          sequential addresses
     • The data registers
     • The instruction decoder and associated control logic
• Fetch - The instruction is
  fetched from memory and
  placed in the instruction
  pipeline
• Decode - The instruction is
  decoded and the datapath
  control signals prepared for
  the next cycle
• Execute - The register bank
  is read, an operand shifted,
  the ALU result generated
  and written back into
  destination register
• At any time slice, 3 different instructions may occupy
  each of these stages, so the hardware in each stage has
  to be capable of independent operations

• When the processor is executing data processing
  instructions, the latency = 3 cycles and the throughput
  = 1 instruction/cycle

• Drawback: Every data transfer instruction causes a
  pipeline “stall”. (There is a single memory for data and
  instructions, so the next instruction cannot be fetched while
  data is being read.)
5-stage Pipeline Organization
• Implemented in ARM9TDMI
• Tprog = Ninst * CPI / fclk
  – Tprog: the time taken to execute a given program
  – Ninst: the number of ARM instructions executed in
    the program (compiler dependent)
  – CPI: average number of clock cycles per
    instruction (hazards cause pipeline stalls, which raise it)
  – fclk: frequency
• Fetch
   – The instruction is fetched from
     memory and placed in the
     instruction pipeline
• Decode
   – The instruction is decoded and
     register operands read from the
     register files. There are 3
     operand read ports in the
     register file so most ARM
     instructions can source all their
     operands in one cycle
• Execute
   – An operand is shifted and the
     ALU result generated. If the
     instruction is a load or store,
     the memory address is
     computed in the ALU
• Buffer/Data
  – Data memory is accessed
    if required. Otherwise the
    ALU result is simply
    buffered for one cycle.
• Write back
  – The results generated by
    the instruction are written
    back to the register file,
    including any data loaded
    from memory.
5-stage pipeline organization
• Moved the register read step from the execute
  stage to the decode stage
• Execute stage was split into 3 stages- ALU,
  memory access, write back.
• Result: Better balanced pipeline with
  minimized latencies between stages, which
  can run at a faster clock speed.
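The relation Tprog = Ninst * CPI / fclk can be turned into a one-line calculation. The numbers below are purely hypothetical and chosen only to show how a lower CPI or a higher clock frequency shortens execution time. They are not measurements of any ARM core.

```python
def program_time(n_inst, cpi, f_clk_hz):
    """Tprog = Ninst * CPI / fclk, in seconds."""
    return n_inst * cpi / f_clk_hz


# Hypothetical workload: one million instructions.
baseline = program_time(1_000_000, cpi=1.5, f_clk_hz=100e6)
fewer_stalls = program_time(1_000_000, cpi=1.2, f_clk_hz=100e6)
faster_clock = program_time(1_000_000, cpi=1.5, f_clk_hz=200e6)
print(baseline, fewer_stalls, faster_clock)
```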
Pipeline Hazards
• There are situations, called hazards, that prevent the
  next instruction in the instruction stream from being
  executed during its designated clock cycle. Hazards
  reduce the performance from the ideal speedup
  gained by pipelining.
• There are three classes of hazards:
   – Structural Hazards
   – Data Hazards
   – Control Hazards
Structural Hazards
• When a machine is pipelined, the overlapped
  execution of instructions requires pipelining of
  functional units and duplication of resources
  to allow all possible combinations of
  instructions in the pipeline.
• If some combination of instructions cannot be
  accommodated because of a resource conflict,
  the machine is said to have a structural
  hazard.
• Ex. A machine has a single memory shared by data and
  instructions. As a result, when an
  instruction contains a data-memory reference (load),
  it will conflict with the instruction fetch for a
  later instruction (instr 3).
Solution
• To resolve this, we stall the pipeline for one clock
  cycle when a data-memory access occurs. The effect
  of the stall is actually to occupy the resources for
  that instruction slot. The following table shows how
  the stall is actually implemented.
Solution
• Another solution is to use separate instruction
  and data memories.
• ARM has moved from the von-Neumann
  architecture to the Harvard architecture in
  ARM9.
  – Implemented a 5-stage pipeline and separate data
    and instruction memory.
  – Doesn’t suffer from this hazard.
Data Hazards
• They arise when an instruction depends on the result of a
  previous instruction in a way that is exposed by the
  overlapping of instructions in the pipeline.
• The problem with data hazards can be solved with a
  hardware technique called data forwarding (by making
  use of feedback paths).
• Without forwarding, the pipeline would have to be
  stalled to get the results from the respective registers
• Example:
Data Hazards




•   The first forwarding is for value of R1 from EXadd to EXsub.
•   The second forwarding is also for value of R1 from MEMadd to EXand.
•   This code now can be executed without stalls.
•   Forwarding can be generalized to include passing the result directly
    to the functional unit that requires it: a result is forwarded from the
    output of one unit to the input of another, rather than just from the
    result of a unit to the input of the same unit.
Control Hazards
• They arise from the pipelining of branches and other
  instructions that change the PC.
Further Improvements
THANK YOU




•Alok Sharma
•Aniket Thakur
•Paritosh Ramanan
•Pavan A.R.

ARM Processor

  • 2. ARM •Introduction and processor modes •Instruction Set Architecture – I •Instruction Set Architecture- II •Pipelining in ARM
  • 3. ARM • ARM: Advanced RISC Machines • Most widely used 32- bit RISC instruction set architecture • The relative simplicity makes it suitable for low power devices • ARM7, ARM9, ARM11 and Cortex • Approximately 90% of all embedded 32-bit RISC processors • Used extensively in consumer electronics, including PDAs, mobile phones, digital media and music players, hand-held game consoles, calculators and computer peripherals such as hard drives and routers.
  • 4. Product Code Description • M: Multiplier The ARM processor has a hardware multiplier unit for multiplication • I: EmbeddedICE Macrocell Hardware circuit used to generate trace information. Used in advanced debugging. • E: Enhanced Instruction Set • J: Java Acceleration by Jazelle mode Hardware circuit used for running Java byte code • F: Vector Floating point Hardware implementation of floating-point operations. • S: Synthesizable Version The core is supplied as a soft processor core, so it can be modified.
  • 5. Example • ARM7TDMI This is the ARM7 family processor which has T= Thumb instruction set, D= Debug Unit, M= enhanced Multiplier, I= EmbeddedICE macrocell. • ARM946E-S 1. ARM9xx core 2. Enhanced Instruction set 3. Synthesizable
  • 6. ARM • ARM has 3 instruction set states 1. 32-bit ARM instruction set 2. 16-bit Thumb instruction set 3. 8- bit Jazelle instruction set • ARM – 32 bit Load/Store architecture with every instruction being conditional. • Thumb- 16 bit with only branch instructions being conditional and only half of the registers used • Jazelle- Allows Java byte code to be directly executed in ARM architecture. Improves performance by 5x-10x
  • 7. ARM- Processor Modes • Seven basic operating modes exist: 1. User: Unprivileged mode under which most tasks run 2. FIQ: Entered when a high priority interrupt is raised 3. IRQ: Entered when a low priority interrupt is raised 4. Supervisor: Entered on reset and when a software Interrupt instruction is executed 5. Abort: Used to handle memory access violations 6. Undef: Used to handle undefined instructions 7. System: Privileged mode using the same registers as user mode.
  • 8. Register Organization Summary
    – User: r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr
    – FIQ: User mode r0-r7, r15 and cpsr; banked r8-r12, r13 (sp), r14 (lr), spsr
    – IRQ, SVC, Undef, Abort: User mode r0-r12, r15 and cpsr; each mode has its own banked r13 (sp), r14 (lr), spsr
    – Thumb state low registers: r0-r7; Thumb state high registers: r8-r15
    – Note: System mode uses the User mode register set
  • 9. ARM- The Registers • ARM has 37 registers all of which are 32-bits long. – 1 dedicated program counter – 1 dedicated current program status register – 5 dedicated saved program status registers – 30 general purpose registers • The current processor mode governs which of several banks is accessible. Each mode can access – a particular set of r0-r12 registers – a particular r13 (the stack pointer, sp) and r14 (the link register, lr) – the program counter, r15(pc) – the current program status register, cpsr Privileged modes (except System) can also access – a particular spsr (saved program status register)
  • 10. Program Status Registers
    – Bit layout: N (31), Z (30), C (29), V (28), Q (27), J (24), I (7), F (6), T (5), mode (4-0); remaining bits undefined
    – Fields: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0
    • Condition code flags – N = Negative result from ALU – Z = Zero result from ALU – C = ALU operation Carried out – V = ALU operation oVerflowed
    • Sticky Overflow flag - Q flag – Architecture 5TE/J only – Indicates if saturation has occurred
    • J bit – Architecture 5TEJ only – J = 1: Processor in Jazelle state
    • Interrupt Disable bits – I = 1: Disables the IRQ – F = 1: Disables the FIQ
    • T Bit – Architecture xT only – T = 0: Processor in ARM state – T = 1: Processor in Thumb state
    • Mode bits – Specify the processor mode
  • 11. Program Counter (r15) • When the processor is executing in ARM state: – All instructions are 32 bits wide – All instructions must be word aligned – Therefore the PC value is stored in bits [31:2] with bits [1:0] undefined (as instruction cannot be halfword or byte aligned). • When the processor is executing in Thumb state: – All instructions are 16 bits wide – All instructions must be halfword aligned – Therefore the PC value is stored in bits [31:1] with bit [0] undefined (as instruction cannot be byte aligned). • When the processor is executing in Jazelle state: – All instructions are 8 bits wide – Processor performs a word access to read 4 instructions at once
  • 12. Exception Handling
    • When an exception occurs, the ARM:
      – Copies CPSR into SPSR_<mode>
      – Sets appropriate CPSR bits: change to ARM state, change to exception mode, disable interrupts (if appropriate)
      – Stores the return address in LR_<mode>
      – Sets PC to vector address
    • To return, exception handler needs to:
      – Restore CPSR from SPSR_<mode>
      – Restore PC from LR_<mode>
      This can only be done in ARM state.
    • Vector Table: 0x00 Reset, 0x04 Undefined Instruction, 0x08 Software Interrupt, 0x0C Prefetch Abort, 0x10 Data Abort, 0x14 (Reserved), 0x18 IRQ, 0x1C FIQ
    • Vector table can be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices
  • 13. Development of the ARM Architecture
    – Architectures 1 to 3: early ARM architectures
    – 4: halfword and signed halfword / byte support, System mode (SA-110, SA-1110)
    – 4T: Thumb instruction set (ARM7TDMI, ARM9TDMI, ARM720T, ARM940T)
    – 5TE: improved ARM/Thumb interworking, CLZ, saturated maths, DSP multiply-accumulate instructions (ARM1020E, XScale, ARM9E-S, ARM966E-S)
    – 5TEJ: Jazelle Java bytecode execution (ARM9EJ-S, ARM926EJ-S, ARM7EJ-S, ARM1026EJ-S)
    – 6: SIMD instructions, multi-processing, V6 memory architecture (VMSA), unaligned data support (ARM1136EJ-S)
  • 14. The ARM Instruction Set part1
  • 15. Main features of the ARM Instruction Set • All instructions are 32 bits long. • Most instructions execute in a single cycle. • Every instruction can be conditionally executed. • A load/store architecture – Data processing instructions act only on registers • Three operand format • Combined ALU and shifter for high speed bit manipulation – Specific memory access instructions with powerful auto-indexing addressing modes.
  • 16. Conditional Execution • Most instruction sets only allow branches to be executed conditionally by postfixing them with the appropriate condition code field. • However, by reusing the condition evaluation hardware, ARM effectively increases the number of instructions. – All instructions contain a condition field which determines whether the CPU will execute them. – Non-executed instructions soak up 1 cycle. • Still have to complete the cycle so as to allow fetching and decoding of following instructions. • This removes the need for many branches, which stall the pipeline (3 cycles to refill). – Allows very dense in-line code, without branches. – The time penalty of not executing several conditional instructions is frequently less than the overhead of the branch or subroutine call that would otherwise be needed.
  • 17. The Condition Field (Cond, bits 31-28 of the instruction)
    – 0000 = EQ - Z set (equal)
    – 0001 = NE - Z clear (not equal)
    – 0010 = HS / CS - C set (unsigned higher or same)
    – 0011 = LO / CC - C clear (unsigned lower)
    – 0100 = MI - N set (negative)
    – 0101 = PL - N clear (positive or zero)
    – 0110 = VS - V set (overflow)
    – 0111 = VC - V clear (no overflow)
    – 1000 = HI - C set and Z clear (unsigned higher)
    – 1001 = LS - C clear or Z set (unsigned lower or same)
    – 1010 = GE - N set and V set, or N clear and V clear (> or =)
    – 1011 = LT - N set and V clear, or N clear and V set (<)
    – 1100 = GT - Z clear, and either N set and V set, or N clear and V clear (>)
    – 1101 = LE - Z set, or N set and V clear, or N clear and V set (< or =)
    – 1110 = AL - always
    – 1111 = NV - reserved
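As a rough illustration of how the condition field is evaluated, the following Python sketch maps each 4-bit code to a test on the N, Z, C and V flags. The function name `condition_passed` is hypothetical, and the signed comparisons are written in the compact form "N equals V" (GE) and "N differs from V" (LT).

```python
def condition_passed(cond, n, z, c, v):
    """Return True if the 4-bit condition field passes for the given flags."""
    checks = {
        0b0000: z,                    # EQ: equal
        0b0001: not z,                # NE: not equal
        0b0010: c,                    # HS / CS: unsigned higher or same
        0b0011: not c,                # LO / CC: unsigned lower
        0b0100: n,                    # MI: negative
        0b0101: not n,                # PL: positive or zero
        0b0110: v,                    # VS: overflow
        0b0111: not v,                # VC: no overflow
        0b1000: c and not z,          # HI: unsigned higher
        0b1001: (not c) or z,         # LS: unsigned lower or same
        0b1010: n == v,               # GE: signed >=
        0b1011: n != v,               # LT: signed <
        0b1100: (not z) and n == v,   # GT: signed >
        0b1101: z or n != v,          # LE: signed <=
        0b1110: True,                 # AL: always
    }
    if cond not in checks:
        raise ValueError("condition 0b1111 (NV) is reserved")
    return bool(checks[cond])

# After CMP r0, r1 with r0 == r1 the Z flag is set, so EQ passes and NE fails.
print(condition_passed(0b0000, n=False, z=True, c=True, v=False))
print(condition_passed(0b0001, n=False, z=True, c=True, v=False))
```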
  • 18. Using and updating the Condition Field • To execute an instruction conditionally, simply postfix it with the appropriate condition: – For example an add instruction takes the form: • ADD r0,r1,r2 ; r0 = r1 + r2 (ADDAL) – To execute this only if the zero flag is set: • ADDEQ r0,r1,r2 ; If zero flag set then… ; ... r0 = r1 + r2 • By default, data processing operations do not affect the condition flags (apart from the comparisons where this is the only effect). To cause the condition flags to be updated, the S bit of the instruction needs to be set by postfixing the instruction (and any condition code) with an “S”. – For example to add two numbers and set the condition flags: • ADDS r0,r1,r2 ; r0 = r1 + r2 ; ... and set flags
  • 19. Data processing Instructions • Largest family of ARM instructions, all sharing the same instruction format. • Contains: – Arithmetic operations – Comparisons (no results - just set condition codes) – Logical operations – Data movement between registers • Remember, this is a load / store architecture – These instructions only work on registers, NOT memory. • They each perform a specific operation on one or two operands. – First operand is always a register - Rn – Second operand is sent to the ALU via the barrel shifter.
  • 22. Data Movement • Operations are: – MOV operand2 – MVN NOT operand2 Note that these make no use of operand1 i.e operand1 is ignored. • Syntax: – <Operation>{<cond>}{S} Rd, Operand2 • Examples: – MOV r0, r1 – MOVS r2, #10 – MVNEQ r1,#0
  • 23. Arithmetic Operations • Operations are: – ADD operand1 + operand2 – ADC operand1 + operand2 + carry – SUB operand1 - operand2 – SBC operand1 - operand2 + carry -1 – RSB operand2 - operand1 – RSC operand2 - operand1 + carry - 1 • Syntax: – <Operation>{<cond>}{S} Rd, Rn, Operand2 • Examples – ADD r0, r1, r2 – SUBGT r3, r3, #1 – RSBLES r4, r5, #5 – SUB r4,r5,r7,LSR r2 ; Logical right shift R7 by the number in ; the bottom byte of R2, subtract result ; from R5, and put the answer into R4.
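The six arithmetic operations can all be expressed through one 32-bit add-with-carry, which is a convenient way to check the "+ carry - 1" forms listed above. The Python sketch below is a model under that assumption, not a description of the actual ALU; the helper `add_with_carry` and the upper-case function names are illustrative only. Each function returns the 32-bit result together with the (N, Z, C, V) flags an S-suffixed instruction would set.

```python
MASK = 0xFFFFFFFF

def add_with_carry(x, y, carry_in):
    """32-bit x + y + carry_in; returns (result, (N, Z, C, V))."""
    x &= MASK
    y &= MASK
    total = x + y + carry_in
    result = total & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(total > MASK)
    # Signed overflow: both inputs share a sign that differs from the result's.
    v = ((~(x ^ y) & (x ^ result)) >> 31) & 1
    return result, (n, z, c, v)

# Subtraction is addition of the bitwise complement plus a carry-in of 1,
# which is why SBC and RSC read as "... + carry - 1".
def ADD(op1, op2):        return add_with_carry(op1, op2, 0)
def ADC(op1, op2, carry): return add_with_carry(op1, op2, carry)
def SUB(op1, op2):        return add_with_carry(op1, ~op2, 1)
def SBC(op1, op2, carry): return add_with_carry(op1, ~op2, carry)
def RSB(op1, op2):        return add_with_carry(op2, ~op1, 1)
def RSC(op1, op2, carry): return add_with_carry(op2, ~op1, carry)

print(SUB(5, 3))             # result 2, carry set because no borrow occurred
print(ADD(0x7FFFFFFF, 1))    # signed overflow sets V
```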
  • 24. Logical Operations • Operations are: – AND operand1 AND operand2 – EOR operand1 EOR operand2 – ORR operand1 OR operand2 – BIC operand1 AND NOT operand2 [ie bit clear] • Syntax: – <Operation>{<cond>}{S} Rd, Rn, Operand2 • Examples: – AND r0, r1, r2 – BICEQ r2, r3, #7 – EORS r1,r3,r0
  • 25. Multiplication Instructions • The Basic ARM provides two multiplication instructions. • Multiply – MUL{<cond>}{S} Rd, Rm, Rs ; Rd = Rm * Rs • Multiply Accumulate - does addition for free – MLA{<cond>}{S} Rd, Rm, Rs,Rn ; Rd = (Rm * Rs) + Rn • Restrictions on use: – Rd and Rm cannot be the same register • Can be avoided by swapping Rm and Rs around. This works because multiplication is commutative. – Cannot use PC. These will be picked up by the assembler if overlooked. • Operands can be considered signed or unsigned – Up to user to interpret correctly.
  • 26. • The multiply form of the instruction gives Rd:=Rm*Rs. Rn is ignored, and should be set to zero for compatibility with possible future upgrades to the instruction set.
  • 27. Multiplication Implementation • The ARM makes use of Booth’s Algorithm to perform integer multiplication. • On non-M ARMs this operates on 2 bits of Rs at a time. – For each pair of bits this takes 1 cycle (plus 1 cycle to start with). – However when there are no more 1’s left in Rs, the multiplication will early-terminate. • Example: Multiply 18 and -1 : Rd = Rm * Rs – Rm = 18 (0x00000012), Rs = -1 (0xFFFFFFFF): 17 cycles – Rm = -1 (0xFFFFFFFF), Rs = 18 (0x00000012): 4 cycles • Note: Compiler does not use early termination criteria to decide on which order to place operands.
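The cycle counts in this example follow directly from the rule stated on the slide. A minimal Python sketch, assuming exactly that rule (one start-up cycle, then one cycle per 2-bit pair of Rs until no 1 bits remain); the function name is hypothetical and real cores may differ in detail, for instance for Rs = 0:

```python
def mul_cycles_non_m(rs):
    """Estimated MUL cycles on a non-M ARM for a given Rs value."""
    rs &= 0xFFFFFFFF                      # view Rs as a 32-bit pattern
    pairs = (rs.bit_length() + 1) // 2    # 2-bit pairs up to the highest 1 bit
    return 1 + pairs                      # plus 1 cycle to start with

print(mul_cycles_non_m(-1))   # Rs = -1 is all ones: 16 pairs + 1
print(mul_cycles_non_m(18))   # Rs = 18 is 0b10010: 3 pairs + 1
```

Under this model, putting the operand with fewer significant bits in Rs gives the shorter multiplication.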
  • 29. Extended Multiply Instructions • M variants of ARM cores contain extended multiplication hardware. This provides three enhancements: – An 8 bit Booth’s Algorithm is used • Multiplication is carried out faster (maximum for standard instructions is now 5 cycles). – Early termination method improved so that now completes multiplication when all remaining bit sets contain • all zeroes (as with non-M ARMs), or • all ones. Thus the previous example would early terminate in 2 cycles in both cases. – 64 bit results can now be produced from two 32bit operands • Higher accuracy. • Pair of registers used to store result.
  • 30. Multiply-Long and Multiply-Accumulate Long • Instructions are – MULL which gives RdHi,RdLo:=Rm*Rs – MLAL which gives RdHi,RdLo:=(Rm*Rs)+RdHi,RdLo • However, the full 64 bits of the result now matter (the lower-precision multiply instructions simply throw the top 32 bits away) – Need to specify whether operands are signed or unsigned • Therefore the syntax of the new instructions is: – UMULL{<cond>}{S} RdLo,RdHi,Rm,Rs – UMLAL{<cond>}{S} RdLo,RdHi,Rm,Rs – SMULL{<cond>}{S} RdLo, RdHi, Rm, Rs – SMLAL{<cond>}{S} RdLo, RdHi, Rm, Rs • Not generated by the compiler. Warning : Unpredictable on non-M ARMs.
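The reason signedness matters only for the long forms can be seen in a small Python model. This is a sketch of the arithmetic, not of the hardware; the lower-case function names are illustrative, and each returns the pair (RdLo, RdHi).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def split64(value):
    """Split a 64-bit result into (RdLo, RdHi)."""
    value &= MASK64
    return value & MASK32, value >> 32

def umull(rm, rs):
    return split64((rm & MASK32) * (rs & MASK32))

def smull(rm, rs):
    return split64(to_signed32(rm) * to_signed32(rs))

def umlal(rd_lo, rd_hi, rm, rs):
    acc = ((rd_hi & MASK32) << 32) | (rd_lo & MASK32)
    return split64(acc + (rm & MASK32) * (rs & MASK32))

def smlal(rd_lo, rd_hi, rm, rs):
    acc = ((rd_hi & MASK32) << 32) | (rd_lo & MASK32)
    return split64(acc + to_signed32(rm) * to_signed32(rs))

# Same bit patterns, same low word, different high word:
print([hex(w) for w in umull(0xFFFFFFFF, 2)])   # 4294967295 * 2
print([hex(w) for w in smull(0xFFFFFFFF, 2)])   # -1 * 2
```

The low word is identical in both cases, which is why the 32-bit MUL does not need signed and unsigned variants.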
  • 31. Operand restrictions • R15 must not be used as an operand or as a destination register. • RdHi, RdLo, and Rm must all specify different registers.
  • 33. Data Transfer • ARM is a load/store architecture • Involves -Load data from memory to register -Store data from register into memory • ARM has three types of load/store instructions -LDR/STR -LDM/STM -SWP
  • 35. Types of load/store instructions Simple load/store has options like the following • LDR/STR: load/store words (32 bits) • LDRB/STRB: load/store a byte • ARM v4 also supports halfwords (16 bits): LDRH/STRH without sign extension, LDRSB/LDRSH loads with sign extension • Condition codes can also be suffixed: LDREQB/STREQB • General syntax looks somewhat like: <LDR|STR>{<cond>}{<size>} Rd, <address>
  • 36. Base Register • STR r0,[r1] Stores the content of r0 at the address contained in r1 • LDR r2,[r1] Loads the content at the address contained in r1 into r2 • Example: r0 (source register for STR) = 0x5, r1 (base register) = 0x200; after the STR, memory at 0x200 holds 0x5; after the LDR, r2 (destination register for LDR) = 0x5
  • 37. Offset from the base register • ARM also supports accessing locations at an offset from the base register • The offset can be - an unsigned 12 bit immediate value (0-4095) - a register, optionally shifted • The offset can be added to (‘+’) or subtracted from (‘-’) the base register • The offset can be applied - before the transfer is made: optionally auto-increments the base register by using ‘!’ - after the transfer is made: the base register is auto-incremented
  • 38. Pre-Indexed Addressing • Example: STR r0,[r1,#12] – r0 (source register for STR) = 0x5, r1 (base register) = 0x200, offset = 12 – 0x5 is stored at address 0x20c; r1 remains 0x200 •Offset value can as well be -12 (STR r0,[r1,#-12]) •To perform auto increment on the base register: STR r0,[r1,#12]! - updates the base register to the value 0x20C •If r2 contains 3 then this will yield the same result: STR r0,[r1,r2,LSL#2] •Useful if only a particular element is to be accessed
  • 39. Post Indexed Addressing • Example: STR r0,[r1],#12 – r0 (source register for STR) = 0x5, original r1 (base register) = 0x200, offset = 12 – 0x5 is stored at address 0x200; r1 is then updated to 0x20c •If r2 contains 3 then this will also yield the same result: STR r0,[r1],r2,LSL #2 •Useful if traversal is required through elements
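The difference between the pre-indexed and post-indexed forms comes down to which address is used for the access and what the base register holds afterwards. A minimal Python sketch, using the values from the two examples (base 0x200, offset 12); the function names are illustrative only:

```python
MASK32 = 0xFFFFFFFF

def pre_indexed(base, offset, writeback=False):
    """[Rn, offset] or [Rn, offset]!  Returns (access address, base afterwards)."""
    address = (base + offset) & MASK32
    return address, (address if writeback else base)

def post_indexed(base, offset):
    """[Rn], offset  Access at the base, then update the base."""
    return base, (base + offset) & MASK32

# STR r0,[r1,#12]        -> stores at 0x20c, r1 stays 0x200
print([hex(x) for x in pre_indexed(0x200, 12)])
# STR r0,[r1,#12]!       -> stores at 0x20c, r1 becomes 0x20c
print([hex(x) for x in pre_indexed(0x200, 12, writeback=True)])
# STR r0,[r1],#12        -> stores at 0x200, r1 becomes 0x20c
print([hex(x) for x in post_indexed(0x200, 12)])
# STR r0,[r1,r2,LSL#2] with r2 = 3: the offset is 3 << 2 = 12
print([hex(x) for x in pre_indexed(0x200, 3 << 2)])
```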
  • 40. For half words/signed byte access • Instructions can be used in much the same way except - the offset value is restricted to 8 bits(0-255) - the registers cannot be shifted
  • 43. LDM/STM (Block data transfer) • Allow for transfer between 1-16 registers to or from memory • The transferred registers can be: - Any subset of the current bank of registers (default). - Any subset of the user mode bank of registers when in a privileged mode (postfix instruction with a ‘^’).
  • 45. Block Data Transfer • Base register determines where memory access can occur • Base register can be updated after data transfer by suffixing a ‘!’ • These instructions are useful for - Saving and restoring context - moving large chunks of data to/from memory
  • 47. Block Data Transfer • One use of stacks is to temporarily create register space for subroutines STMFD sp!,{r0-r12, lr} ; stack all registers ........ ; and the return address ........ LDMFD sp!,{r0-r12, pc} ; load all the registers ; and return automatically • If the pop instruction also had the ‘S’ bit set (using ‘^’) then the transfer of the PC when in a privileged mode would also cause the SPSR to be copied into the CPSR (see exception handling module).
  • 48. Direct functionality Of Block Data Transfer • When not being used for a stack operation these instructions can also be used in a generic way • The LDM/STM support a further set of instructions – STMIA / LDMIA : Increment After – STMIB / LDMIB : Increment Before – STMDA / LDMDA : Decrement After – STMDB / LDMDB : Decrement Before
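The four suffixes differ only in which word addresses are touched and where the base register ends up when write-back (‘!’) is used. The Python sketch below models this, assuming the usual convention that the lowest-numbered register always goes to the lowest address; the function name is hypothetical and 32-bit address wrap-around is ignored.

```python
def block_transfer_addresses(mode, base, reg_count):
    """Return (word addresses in ascending register order, written-back base)."""
    size = 4 * reg_count
    if mode == "IA":      # Increment After
        start, new_base = base, base + size
    elif mode == "IB":    # Increment Before
        start, new_base = base + 4, base + size
    elif mode == "DA":    # Decrement After
        start, new_base = base - size + 4, base - size
    elif mode == "DB":    # Decrement Before
        start, new_base = base - size, base - size
    else:
        raise ValueError("mode must be IA, IB, DA or DB")
    return [start + 4 * i for i in range(reg_count)], new_base

for mode in ("IA", "IB", "DA", "DB"):
    addresses, new_base = block_transfer_addresses(mode, 0x1000, 3)
    print(mode, [hex(a) for a in addresses], hex(new_base))
```

For a full descending stack, STMFD corresponds to STMDB and LDMFD to LDMIA, so a push followed by a pop of the same register list returns the stack pointer to its starting value.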
  • 49. Criteria for different block data transfer
  • 51. Swap Instruction • The instruction is used to swap data between a register and a memory location • This instruction is atomic (cannot be interrupted) • The swap address is determined by the contents of the base register (Rn). • The processor first reads the contents of the swap address. Then it writes the contents of the source register (Rm) to the swap address, and stores the old memory contents in the destination register (Rd). • The same register may be specified as both the source and destination
  • 52. Branch and Exchange •Used to switch between the Thumb state and the ARM state
  • 54. Branch and Branch with Link • Branch instructions contain a signed 2’s complement 24 bit offset. • This is shifted left two bits, sign extended to 32 bits, and added to the PC. • The instruction can therefore specify a branch of +/- 32Mbytes. • The branch offset must take account of the prefetch operation, which causes the PC to be 2 words (8 bytes) ahead of the current instruction. • Branches beyond +/- 32Mbytes must use an offset or absolute destination which has been previously loaded into a register. In this case the PC should be manually saved in R14 if a Branch with Link type operation is required.
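The offset arithmetic described above can be checked with a short Python sketch. It assumes ARM state (word-aligned addresses, PC reading 8 bytes ahead of the branch); the function names are illustrative and only the 24-bit offset field is modelled, not the rest of the instruction encoding.

```python
def encode_branch_offset(instr_addr, target):
    """24-bit offset field for a B/BL at instr_addr that jumps to target."""
    delta = target - (instr_addr + 8)          # PC is 2 words ahead
    if delta % 4:
        raise ValueError("target must be word aligned")
    if not -(1 << 25) <= delta < (1 << 25):
        raise ValueError("target is outside the +/- 32 Mbyte range")
    return (delta >> 2) & 0xFFFFFF

def decode_branch_target(instr_addr, field):
    """Recover the branch target from the 24-bit offset field."""
    offset = (field & 0xFFFFFF) << 2           # shift left two bits
    if offset & (1 << 25):                     # sign extend from bit 25
        offset -= 1 << 26
    return (instr_addr + 8 + offset) & 0xFFFFFFFF

field = encode_branch_offset(0x8000, 0x8000)   # a branch to itself
print(hex(field), hex(decode_branch_target(0x8000, field)))
```

A branch to its own address therefore encodes an offset of -2 words, because the PC has already moved two instructions ahead.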
  • 55. Link Bit • Branch with Link (BL) writes the old PC into the link register (R14) of the current bank. • The PC value written into R14 is adjusted to allow for the prefetch, and contains the address of the instruction following the branch and link instruction. • The CPSR is not saved with the PC
  • 56. Barrel Shifter • A barrel shifter is a digital circuit that can shift a data word by a specified number of bits in one clock cycle. • It can be implemented as a sequence of multiplexers (mux.), and in such an implementation the output of one mux is connected to the input of the next mux in a way that depends on the shift distance. • A barrel shifter is often implemented as a cascade of parallel 2×1 multiplexers.
  • 57. Using the Barrel Shifter • There are 2 options for specifying the shift amount – the shift amount is held in the bottom byte of a register – the shift amount is given as a 5-bit unsigned integer
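  • 58. Shift Operations • Logical Shift Left (LSL) • Shifts left by the specified amount (multiplies by a power of two) • Example: LSL #5 multiplies by 32 • Zeros are shifted in at the bottom; the last bit shifted out of the top goes to the carry flag (CF)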
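  • 59. Shift Operations • Logical Shift Right (LSR) • Shifts right without preserving the sign bit: zeros are shifted in at the top and the last bit shifted out goes to the carry flag (CF) • Arithmetic Shift Right (ASR) • Preserves the sign bit: copies of the sign bit are shifted in at the top and the last bit shifted out goes to the carry flag (CF)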
  • 60. Rotate • Rotate Right (ROR) • Like a right shift, but the bits shifted out of the bottom wrap around to the top • The last bit rotated is also used as the carry flag (CF)
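The four shift types from the last three slides can be modelled on 32-bit values in Python. This is a sketch rather than a description of the hardware: the function names are hypothetical, each function returns the pair (result, carry out), and shift amounts are assumed to lie in the range 1 to 31 so that the special cases for 0 and 32 are left out.

```python
MASK = 0xFFFFFFFF  # keep results to 32 bits


def lsl(value, n):
    """Logical shift left: zeros enter at bit 0."""
    return (value << n) & MASK, (value >> (32 - n)) & 1


def lsr(value, n):
    """Logical shift right: zeros enter at bit 31."""
    return value >> n, (value >> (n - 1)) & 1


def asr(value, n):
    """Arithmetic shift right: copies of the sign bit enter at bit 31."""
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> n) & MASK, (value >> (n - 1)) & 1


def ror(value, n):
    """Rotate right: bits leaving bit 0 re-enter at bit 31."""
    result = ((value >> n) | (value << (32 - n))) & MASK
    return result, (result >> 31) & 1  # last bit rotated is now bit 31


if __name__ == "__main__":
    print(lsl(1, 5))                         # 1 * 32
    print([hex(x) for x in lsr(0x80000000, 4)])
    print([hex(x) for x in asr(0x80000000, 4)])
    print([hex(x) for x in ror(0x00000001, 1)])
```

Comparing `lsr` and `asr` on the same negative input shows the only difference between them: what is shifted in at the top.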
  • 61. Comparison • The only effect of the comparisons is to – UPDATE THE CONDITION FLAGS. Thus no need to set S bit. • Operations are: – CMP operand1 - operand2, but result not written – CMN operand1 + operand2, but result not written – TST operand1 AND operand2, but result not written – TEQ operand1 EOR operand2, but result not written • Syntax: – <Operation>{<cond>} Rn, Operand2 • Examples: – CMP r0, r1 – TSTEQ r2, #5
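As an illustration of "result not written, flags updated", the following Python sketch computes the N, Z, C and V flags that CMP would produce for two 32-bit operands. The function name `cmp_flags` is hypothetical, and the flag definitions are the standard ARM ones for subtraction (C is set when no borrow occurs) rather than something stated on the slide.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(op1, op2):
    """Return the (N, Z, C, V) flags for CMP op1, op2 on 32-bit values."""
    result = (op1 - op2) & MASK32  # computed but never written to a register
    n = result >> 31               # N: bit 31 of the result
    z = int(result == 0)           # Z: result is zero, i.e. operands are equal
    c = int(op1 >= op2)            # C: no borrow in the unsigned subtraction
    # V: operands had different signs and the result's sign differs from op1's
    v = (((op1 ^ op2) & (op1 ^ result)) >> 31) & 1
    return n, z, c, v


if __name__ == "__main__":
    print(cmp_flags(5, 5))           # equal operands set Z
    print(cmp_flags(0, 1))           # unsigned borrow clears C, result negative
    print(cmp_flags(0x80000000, 1))  # signed overflow sets V
```

A following conditional instruction such as one with the EQ suffix then simply tests Z.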
  • 63. Pipelining • ARM initially implemented a 3-stage pipeline organization (up to ARM7) – Fetch – Decode – Execute
  • 64. • 3-stage pipeline organization – Principal components • The register bank • The barrel shifter – Can shift or rotate one operand by any number of bits • The ALU • The address register and incrementer – Select and hold all memory addresses and generate sequential addresses • The data registers • The instruction decoder and associated control logic
  • 65. • Fetch - The instruction is fetched from memory and placed in the instruction pipeline • Decode - The instruction is decoded and the datapath control signals prepared for the next cycle • Execute - The register bank is read, an operand shifted, the ALU result generated and written back into destination register
  • 66. • At any time, 3 different instructions may occupy the 3 stages, so the hardware in each stage has to be capable of independent operation • When the processor is executing data processing instructions, the latency = 3 cycles and the throughput = 1 instruction/cycle • Drawback: Every data transfer instruction causes a pipeline “stall”. (A single memory holds both data and instructions, so the next instruction cannot be fetched while data is being read)
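The latency and throughput figures combine into a simple cycle count: the first instruction needs one cycle per stage, and each later instruction completes one cycle after its predecessor. A minimal Python sketch, with a hypothetical function name and with the total number of stall cycles passed in as an assumed input rather than derived from the instruction mix:

```python
def pipeline_cycles(num_instructions, stages=3, stall_cycles=0):
    """Cycles needed to run num_instructions through an ideal pipeline.

    stall_cycles is the total number of extra cycles lost to stalls;
    it is an input here, not something this sketch works out.
    """
    if num_instructions == 0:
        return 0
    fill = stages                     # latency of the first instruction
    steady = num_instructions - 1     # one more instruction completes per cycle
    return fill + steady + stall_cycles


if __name__ == "__main__":
    print(pipeline_cycles(1))                    # latency alone
    print(pipeline_cycles(10))                   # 3 + 9
    print(pipeline_cycles(10, stall_cycles=4))   # same run with 4 stall cycles
```

For long instruction streams the fill time becomes negligible, so the cost per instruction approaches 1 cycle plus the average stall cost.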
  • 67. 5-stage Pipeline Organization • Implemented in ARM9TDMI • Tprog = Ninst * CPI / fclk – Tprog: the time taken to execute a given program – Ninst: the number of ARM instructions executed in the program (compiler dependent) – CPI: the average number of clock cycles per instruction; hazards cause pipeline stalls, which increase the CPI – fclk: the clock frequency
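A worked instance of the formula, written as a Python sketch. The numbers are hypothetical and chosen only to show how lowering the CPI or raising the clock frequency shortens the run time; they are not measurements of any ARM core.

```python
def program_time(n_inst, cpi, f_clk_hz):
    """Tprog = Ninst * CPI / fclk, in seconds."""
    return n_inst * cpi / f_clk_hz


if __name__ == "__main__":
    # Hypothetical program: one million instructions.
    baseline = program_time(1_000_000, cpi=1.5, f_clk_hz=50e6)
    fewer_stalls = program_time(1_000_000, cpi=1.2, f_clk_hz=50e6)
    faster_clock = program_time(1_000_000, cpi=1.5, f_clk_hz=100e6)
    print(baseline, fewer_stalls, faster_clock)
```

With Ninst fixed by the compiler, the two remaining levers are exactly the ones a deeper pipeline targets: a higher fclk and fewer stall cycles per instruction.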
  • 68. • Fetch – The instruction is fetched from memory and placed in the instruction pipeline • Decode – The instruction is decoded and register operands are read from the register file. There are 3 operand read ports in the register file, so most ARM instructions can source all their operands in one cycle • Execute – An operand is shifted and the ALU result generated. If the instruction is a load or store, the memory address is computed in the ALU
  • 69. • Buffer/Data – Data memory is accessed if required. Otherwise the ALU result is simply buffered for one cycle. • Write back – The results generated by the instruction are written back to the register file, including any data loaded from memory.
  • 70. 5-stage pipeline organization • The register read step was moved from the execute stage to the decode stage • The execute stage was split into 3 stages: ALU, memory access, and write back. • Result: A better balanced pipeline with minimized latencies between stages, which can run at a faster clock speed.
  • 71. Pipeline Hazards • There are situations, called hazards, that prevent the next instruction in the instruction stream from being executed during its designated clock cycle. Hazards reduce the performance from the ideal speedup gained by pipelining. • There are three classes of hazards: – Structural Hazards – Data Hazards – Control Hazards
  • 72. Structural Hazards • When a machine is pipelined, the overlapped execution of instructions requires pipelining of functional units and duplication of resources to allow all possible combinations of instructions in the pipeline. • If some combination of instructions cannot be accommodated because of a resource conflict, the machine is said to have a structural hazard.
  • 73. • Example: A machine has a single memory shared between data and instructions. As a result, when an instruction contains a data-memory reference (load), it will conflict with the instruction fetch for a later instruction (instr 3):
  • 74. Solution • To resolve this, we stall the pipeline for one clock cycle when a data-memory access occurs. The effect of the stall is actually to occupy the resources for that instruction slot. The following table shows how the stall is actually implemented.
  • 75. Solution • Another solution is to use separate instruction and data memories. • ARM moved from a von Neumann architecture to a Harvard architecture with ARM9. – It implements a 5-stage pipeline with separate data and instruction memories. – It doesn’t suffer from this hazard.
  • 76. Data Hazards • They arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline. • The problem with data hazards can be solved with a hardware technique called data forwarding (by making use of feedback paths). • Without forwarding, the pipeline would have to be stalled to get the results from the respective registers • Example:
  • 77. Data Hazards • The first forwarding is for value of R1 from EXadd to EXsub. • The second forwarding is also for value of R1 from MEMadd to EXand. • This code now can be executed without stalls. • Forwarding can be generalized to include passing the result directly to the functional unit that requires it: a result is forwarded from the output of one unit to the input of another, rather than just from the result of a unit to the input of the same unit.
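The dependencies that forwarding has to cover can be found mechanically by looking for read-after-write pairs among nearby instructions. A minimal Python sketch follows; the function name `raw_hazards`, the tuple layout, and the `window` parameter are hypothetical, and the registers other than R1 in the example program are made up, since the slides only name R1 and the add, sub and and instructions.

```python
def raw_hazards(program, window=2):
    """List read-after-write dependencies between nearby instructions.

    program: list of (name, destination register, tuple of source registers)
    window:  how many following instructions can still see the old value
             (an assumed parameter, it depends on the pipeline depth)
    Returns tuples (producer name, consumer name, register).
    """
    hazards = []
    for i, (producer, dest, _) in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            consumer, later_dest, sources = program[j]
            if dest in sources:
                hazards.append((producer, consumer, dest))
            if later_dest == dest:  # value overwritten: later reads depend on j
                break
    return hazards


if __name__ == "__main__":
    program = [
        ("add", "r1", ("r2", "r3")),
        ("sub", "r4", ("r1", "r5")),
        ("and", "r6", ("r1", "r7")),
    ]
    for hazard in raw_hazards(program):
        print(hazard)
```

Each pair reported corresponds to one forwarding path in the slide: add to sub from the execute stage, and add to and from the memory stage.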
  • 78. Control Hazards • They arise from the pipelining of branches and other instructions that change the PC.
  • 80. THANK YOU •Alok Sharma •Aniket Thakur •Paritosh Ramanan •Pavan A.R.

Editor's notes

  1. Question: how can it result in a better balanced pipeline?; what do you mean by a balanced pipeline?