Exercises Embedded Systems
William Sandqvist [email protected]

1.1 The C-function int fac_c(int x)

    int fac_c(int x)
    {
      int f;
      if(x <= 0) f = 0;
      else
      {
        f = 1;
        while(x > 1)
        {
          f = f * x;
          x--;
        }
      }
      return f;
    }

fac_c(5) calculates 1*5*4*3*2*1 = 120.

We should document our code. You can find a flowchart tool in Word or PowerPoint. This could be useful for lab reports.

[Flowchart of int fac_c(int x): if x <= 0, Y: f = 0, then return f. N (else): f = 1, then while x > 1, Y: f = f*x, x = x-1, back to while; N: return f. End.]

main in C

    #include <stdio.h>

    extern int fac_asm(int);  /* Message to the linker: fac_asm() is an external function (from another file). */
    int fac_c(int);

    int main(void)
    {
      int c_result, asm_result;
      int x;
      while(1)
      {
        printf("Enter a number: ");
        if(scanf("%d", &x) != 1) break;  /* stop on invalid input or end of file */
        c_result = fac_c(x);
        asm_result = fac_asm(x);
        printf("C-result: %d\n", c_result);
        printf("Asm-result: %d\n", asm_result);
      }
      return 0;
    }

Structure diagram?
To document the program structure, a structure diagram could be useful. It can be translated directly into structured programming (while, if, else …). But in assembler we are not interested in the program structure, but in the program flow.

The Flowchart
The flowchart can be translated directly into assembler code.

How to program the Nios processor?
The Nios II processor is Altera's soft-core processor, a 32-bit RISC processor very similar to MIPS. It is designed to make efficient use of the resources in an FPGA. It comes in three versions: Small, Medium and Large (Nios II/e, /s and /f).

Nios II registers r0…r15
r0 always reads as zero: use it as the constant "0"! If you call a subroutine, save the contents of the registers you have used on the stack!

Nios II registers r16…r31
r27 (sp) points to the stack!
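The fac_c flowchart from exercise 1.1 can be checked in C before it is turned into assembler: each decision in the flowchart becomes one conditional goto, placed in the same order as the branches in the fac_asm listing. This is a minimal sketch; the function name fac_flow and the label names are hypothetical, chosen only to mirror the assembler labels.

```c
/* Hypothetical flowchart-style version of fac_c (same results).
   Each flowchart decision is one conditional goto, placed like the
   corresponding branch in fac_asm. */
int fac_flow(int x)
{
    int f;
    if (x <= 0) goto else_;      /* if(x <= 0)      ~ ble r4, r0, else   */
    f = 1;                       /* f = 1           ~ mov r2, r3         */
while_:
    if (x <= 1) goto endsub;     /* while(x > 1)    ~ ble r4, r3, endsub */
    f = f * x;                   /* f = f*x         ~ mul r2, r2, r4     */
    x = x - 1;                   /* x = x - 1       ~ sub r4, r4, r3     */
    goto while_;                 /* back to test    ~ br while           */
else_:
    f = 0;                       /* f = 0           ~ mov r2, r0         */
endsub:
    return f;                    /* return f        ~ ret                */
}
```

Reading the gotos top to bottom gives the same jump structure as the flowchart, which is why the step from flowchart to assembler is mostly a one-to-one translation.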
Register operations, R-type instructions

Program constants, I-type instructions
Some pseudoinstructions:
    movi  rB, IMMED    is   addi rB, r0, IMMED
    movia rB, label    is   orhi rB, r0, %hiadj(label)
                            addi rB, rB, %lo(label)

I-type, Branch
Pseudoinstruction: ble, branch if less than or equal (signed). ble is the bge instruction with registers A and B swapped!
IMM16 is a signed byte offset relative to the next instruction. Because instructions must be word-aligned, its two least significant bits are always zero.

Conditional operators of C
Compare two registers and branch relative if the expression is true. All conditional operators of C have assembly instructions (or pseudoinstructions): == beq, != bne, < blt, <= ble, > bgt, >= bge, and bltu, bleu, bgtu, bgeu for unsigned comparisons.

Memory content, Load and Store
Store in memory: stw r6, 100(rA) stores the word in r6 at address rA + 100. Load from memory: ldw r6, 100(rA) loads the word at address rA + 100 into r6.

The call and ret instructions
call saves the return address in r31 (ra) and jumps to the subroutine; ret jumps back to the address in ra.

From Flowchart to assembler

Assembler

    .global fac_asm             # fac_asm has to be made known to other files
    .text
    # Parameter in r4 (and if needed in r5, r6, r7)
    # Return value in r2 (and r3 if long or double)
    # We can use r2 and r3 for calculations until return
    # r8 … r15 must be saved by the caller of a sub
    fac_asm:                    # int r2 fac_asm(int r4 x), the function prototype
                                # r3 : for constant "1"
    if:     ble  r4, r0, else   # if(x <= 0)
            movi r3, 1          # constant "1"
            mov  r2, r3         # f = 1
    while:  ble  r4, r3, endsub # while(x > 1) {
            mul  r2, r2, r4     #   f = f*x
            sub  r4, r4, r3     #   x = x - 1
            br   while          # }
    else:   mov  r2, r0         # f = 0
    endsub: ret                 # return r2
    .end

Exercises Embedded Systems
2.1 Prioritized interrupts

Exercises Embedded Systems
2.2 Input/Output
R/W reverses the direction of the data bus. CS (Chip Select) enables the chip.
Connect an 8-register memory-mapped peripheral to the CPU.
The CPU has 8-bit address and data buses. The peripheral should have register addresses 0x10…0x17.

Decode, door lock
How do you open the door lock? Press 4 (d) and 8 (h) simultaneously, but don't press any other key!

Connections
    0x10 = 00010.000
    0x11 = 00010.001
    0x12 = 00010.010
    0x13 = 00010.011
    0x14 = 00010.100
    0x15 = 00010.101
    0x16 = 00010.110
    0x17 = 00010.111
The decoder takes A7 A6 A5 A4 A3 and activates CS when they are 00010. A2 A1 A0 connect to the register-select inputs RS2 RS1 RS0.

Why memory cache?

Exercises Embedded Systems
3.2 Hit rate and access time
a) tAVG = 8 ns, h = ? (h is the hit rate.)
b) tAVG = 15 ns, h = ?
c) tAVG = 6 ns, h = ?

Hit rate calculations, with cache access time tC = 5 ns and memory access time tM = 70 ns (the values used in the solution):

$$t_{AVG} = h\,t_C + (1-h)\,t_M = h\,(t_C - t_M) + t_M \quad\Rightarrow\quad h = \frac{t_M - t_{AVG}}{t_M - t_C}$$

a) $h = \frac{70 - 8}{70 - 5} \approx 0.954$
b) $h = \frac{70 - 15}{70 - 5} \approx 0.846$
c) $h = \frac{70 - 6}{70 - 5} \approx 0.985$

Exercises Embedded Systems
3.1 Memory system
In this example the block transfer is a cache line of 2 words. The memory is byte-organized, but we can draw it as if it were organized in memory lines of the same size as the cache line. This simplifies all figures.
Direct address mapping: memory line i goes to cache line j = i % K, where K is the number of cache lines.

Why block transfer?
• To transfer 1 "random" word from memory takes three bus cycles, 3 TBus/word (2 TBus are wait states).
• To transfer a "burst" of 2 words takes 3+1 bus cycles, 4/2 = 2 TBus/word.
• To transfer a "burst" of 4 words takes 3+1+1+1 bus cycles, 6/4 = 1.5 TBus/word.
• To transfer a "burst" of 8 words takes 3+1+1+1+1+1+1+1 bus cycles, 10/8 = 1.25 TBus/word.
Remember: to make these gains you must have use for most of the transferred words, otherwise block transfer can be even slower than random transfer! This is just an example; other access patterns exist, e.g. 5+3+3+3 and so on. The bus clock is derived from the processor clock, perhaps TBus = 10·TCPU.

Mapping of memory address
Memory 4 kB = 4·2^10 = 2^12 bytes. Memory address: mmmmmmmmmmmm (12 bits).
Cache 8 words = 8·4 = 32 bytes. Cache line 2 words = 2·4 = 8 bytes. Cache address: ll.w.bb (line, word, byte).
Memory to cache mapping:
    mmmmmmm.mm.m.mm
    ttttttt.ll.w.bb
The address tag: the address in the cache is independent of the tag bits!
Our example: the data addresses 0x010, 0x1FC, 0x168, 0x008, 0x014, 0x1F8, 0x00C are accessed in this order, and the whole sequence is repeated four times.

Memory and Cache
Data is accessed from three different locations (tags), but they map to the same lines in this small cache!

Direct mapped cache
    Address  Memory address     Tag (#)      Cache .ll.w.bb  Line(Tag#)
    0x010    0000000.10.0.00    0000000 (0)  .10.0.00        2(0)
    0x1FC    0001111.11.1.00    0001111 (1)  .11.1.00        3(1)
    0x168    0001011.01.0.00    0001011 (2)  .01.0.00        1(2)
    0x008    0000000.01.0.00    0000000 (0)  .01.0.00        1(0)
    0x014    0000000.10.1.00    0000000 (0)  .10.1.00        2(0)
    0x1F8    0001111.11.0.00    0001111 (1)  .11.0.00        3(1)
    0x00C    0000000.01.1.00    0000000 (0)  .01.1.00        1(0)
(0x00C = 0000 0000 1100 has the same tag, 0000000, as 0x008, so it maps to 1(0).)

Program execution
Cache accesses, Line#(Tag#), for the four passes:
    Pass 1: 2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(0)   C C C M H H H
    Pass 2: 2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(0)   H H M M H H H
    Pass 3: 2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(0)   H H M M H H H
    Pass 4: 2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(0)   H H M M H H H
C, cold miss = entry into a previously unused cache line (this counts as a miss)
M, miss = the previous entry in the line was from another location (tag)
H, hit = the previous entry in the line was from the same location (tag)

$$h = \frac{3 + 3 \cdot 5}{4 \cdot 7} = \frac{18}{28} \approx 0.64$$

2-way set associative cache
Memory address:  mmmmmmmm.m.m.mm
Address mapping: tttttttt.l.w.bb
OBSERVE! The set number is not included in the address map. Logic circuits within the associative cache take care of the set number and connect the CPU with the correct set. (Tags are stored in the associative cache for each line in every set. All sets are searched in parallel for the tag.)

Example of how an associative cache can boost performance
    Memory: 0x010, Tag: 0x01, Cache: 0x0 = 0b0.0.00
    Memory: 0x1FC, Tag: 0x1F, Cache: 0xC = 0b1.1.00
    Memory: 0x168, Tag: 0x16, Cache: 0x8 = 0b1.0.00
    Memory: 0x008, Tag: 0x00, Cache: 0x8 = 0b1.0.00
    Memory: 0x014, Tag: 0x01, Cache: 0x4 = 0b0.1.00
    Memory: 0x1F8, Tag: 0x1F, Cache: 0x8 = 0b1.0.00
    Memory: 0x00C, Tag: 0x00, Cache: 0xC = 0b1.1.00
(Nice example: the cache part is exactly one hex digit.)

Fewer conflict misses
Memory locations 0x010 and 0x014 are stored in cache line 0. They share the tag 0x01, so one entry holds both, and there are two sets.
0x1FC, 0x168, 0x008, 0x1F8 and 0x00C are stored in cache line 1. They come from three memory lines (tags 0x1F, 0x16 and 0x00), and two of them can be stored simultaneously, one in each set. You have to consider the exchange policy to analyse this example in full detail (not given).
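The hand count for the direct-mapped cache can be reproduced with a small simulation. This is a minimal sketch assuming the layout from the mapping slide: 12-bit byte addresses split as ttttttt.ll.w.bb, i.e. 4 cache lines of 2 words of 4 bytes. The names DirectCache, dm_access and dm_count_hits are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: 12-bit byte address = ttttttt.ll.w.bb,
   4 cache lines, 2 words per line, 4 bytes per word. */
#define DM_LINES 4

typedef struct {
    bool     valid[DM_LINES];  /* false until the line is first filled */
    unsigned tag[DM_LINES];    /* 7-bit tag held by each line */
} DirectCache;

/* Returns true on a hit; on a miss the line is loaded with the new tag. */
bool dm_access(DirectCache *c, unsigned addr)
{
    assert(addr < 0x1000u);               /* 4 kB memory: 12-bit addresses */
    unsigned line = (addr >> 3) & 0x3u;   /* ll: skip bb (2 bits) and w (1 bit) */
    unsigned tag  = addr >> 5;            /* the 7 remaining high bits */
    if (c->valid[line] && c->tag[line] == tag)
        return true;                      /* hit */
    c->valid[line] = true;                /* cold miss or conflict miss */
    c->tag[line]   = tag;
    return false;
}

/* Runs the access sequence `passes` times on an empty cache; returns the hit count. */
int dm_count_hits(const unsigned *seq, size_t n, int passes)
{
    DirectCache c = {0};
    int hits = 0;
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++)
            if (dm_access(&c, seq[i]))
                hits++;
    return hits;
}
```

With the sequence 0x010, 0x1FC, 0x168, 0x008, 0x014, 0x1F8, 0x00C and four passes, dm_count_hits counts 18 hits out of 28 accesses.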
Exchange policy: FIFO, RANDOM, LRU …
If the exchange policy were known, we could follow the cache accesses step by step to calculate the hit rate: line,set(tag) line,set(tag) …
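Once an exchange policy is chosen, the step-by-step analysis can be done mechanically. This sketch assumes LRU purely for illustration, since the exercise does not give the policy, and uses the 2-way layout tttttttt.l.w.bb from the slides (2 lines, 2 sets). SetAssocCache, sa_access and sa_count_hits are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: 12-bit byte address = tttttttt.l.w.bb,
   2 lines, 2 sets (ways), 2 words per line, 4 bytes per word.
   ASSUMPTION: LRU exchange policy; the exercise does not specify one. */
#define SA_LINES 2
#define SA_WAYS  2

typedef struct {
    bool     valid[SA_LINES][SA_WAYS];
    unsigned tag[SA_LINES][SA_WAYS];       /* 8-bit tag per line in each set */
    unsigned last_use[SA_LINES][SA_WAYS];  /* time of most recent access, for LRU */
    unsigned now;                          /* access counter used as a clock */
} SetAssocCache;

/* Returns true on a hit. On a miss, fills an empty set if the line has one,
   otherwise replaces the least recently used set. */
bool sa_access(SetAssocCache *c, unsigned addr)
{
    assert(addr < 0x1000u);                /* 12-bit addresses */
    unsigned line = (addr >> 3) & 0x1u;    /* l: skip bb (2 bits) and w (1 bit) */
    unsigned tag  = addr >> 4;             /* the 8 remaining high bits */
    c->now++;

    /* Compare the tag in every set (the hardware does this in parallel). */
    for (int s = 0; s < SA_WAYS; s++) {
        if (c->valid[line][s] && c->tag[line][s] == tag) {
            c->last_use[line][s] = c->now;
            return true;
        }
    }

    int victim = 0;
    for (int s = 0; s < SA_WAYS; s++) {
        if (!c->valid[line][s]) { victim = s; break; }      /* free set first */
        if (c->last_use[line][s] < c->last_use[line][victim])
            victim = s;                                      /* least recently used */
    }
    c->valid[line][victim]    = true;
    c->tag[line][victim]      = tag;
    c->last_use[line][victim] = c->now;
    return false;
}

/* Runs the access sequence `passes` times on an empty cache; returns the hit count. */
int sa_count_hits(const unsigned *seq, size_t n, int passes)
{
    SetAssocCache c = {0};
    int hits = 0;
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++)
            if (sa_access(&c, seq[i]))
                hits++;
    return hits;
}
```

Only the victim selection encodes the exchange policy, so replacing that loop with FIFO or random selection lets you compare policies on the same access sequence.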