Lab 1 - SPO 600
Description
In this lab, we work with a basic piece of assembly code that colors the screen in a 6502 emulator.
Here is our base code:
        lda #$00        ; set a pointer in memory location $40 to point to $0200
        sta $40         ; ... low byte ($00) goes in address $40
        lda #$02
        sta $41         ; ... high byte ($02) goes into address $41
        lda #$07        ; colour number
        ldy #$00        ; set index to 0
loop:   sta ($40),y     ; set pixel colour at the address (pointer)+Y
        iny             ; increment index
        bne loop        ; continue until done the page (256 pixels)
        inc $41         ; increment the page
        ldx $41         ; get the current page number
        cpx #$06        ; compare with 6
        bne loop        ; continue until done all pages
The code above fills the screen with yellow.
Calculating Performance
Base code
To calculate the performance, we need to know how long the CPU takes to run the program, which means counting the clock cycles it needs to execute the code. Each instruction takes a specific number of cycles (and occupies a specific number of bytes). To find out how many cycles each instruction takes, we check the 6502 manual. With those numbers, we can put the code in a table and analyze it as below:
Label | Instruction | Cycles | Count | Alt cycles | Alt count | Total
      | lda #$00    | 2      | 1     |            |           | 2
      | sta $40     | 3      | 1     |            |           | 3
      | lda #$02    | 2      | 1     |            |           | 2
      | sta $41     | 3      | 1     |            |           | 3
      | lda #$07    | 2      | 1     |            |           | 2
      | ldy #$00    | 2      | 1     |            |           | 2
loop: | sta ($40),y | 6      | 1024  |            |           | 6144
      | iny         | 2      | 1024  |            |           | 2048
      | bne loop    | 3      | 1020  | 4          | 4         | 3076
      | inc $41     | 5      | 4     |            |           | 20
      | ldx $41     | 3      | 4     |            |           | 12
      | cpx #$06    | 2      | 4     |            |           | 8
      | bne loop    | 4      | 4     |            |           | 16
================
Total cycles | 11338
CPU speed    | 1 MHz
µs per clock | 1
Time         | 11338 µs = 11.338 ms = 0.011338 s
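The arithmetic behind the table can be reproduced with a short script. This is only a sketch: the cycle figures are copied from the table above, while the names `ROWS`, `total_cycles`, and `CLOCK_MHZ` are made up for illustration.

```python
# Each row: (instruction, cycles, count, alt_cycles, alt_count),
# copied from the table above. alt_* is 0 where the table leaves it blank.
ROWS = [
    ("lda #$00",    2, 1,    0, 0),
    ("sta $40",     3, 1,    0, 0),
    ("lda #$02",    2, 1,    0, 0),
    ("sta $41",     3, 1,    0, 0),
    ("lda #$07",    2, 1,    0, 0),
    ("ldy #$00",    2, 1,    0, 0),
    ("sta ($40),y", 6, 1024, 0, 0),
    ("iny",         2, 1024, 0, 0),
    ("bne loop",    3, 1020, 4, 4),
    ("inc $41",     5, 4,    0, 0),
    ("ldx $41",     3, 4,    0, 0),
    ("cpx #$06",    2, 4,    0, 0),
    ("bne loop",    4, 4,    0, 0),
]


def total_cycles(rows):
    """Sum cycles*count + alt_cycles*alt_count over all rows."""
    return sum(c * n + ac * an for _, c, n, ac, an in rows)


CLOCK_MHZ = 1  # at 1 MHz, one clock cycle takes 1 microsecond

cycles = total_cycles(ROWS)
micros = cycles / CLOCK_MHZ
print(cycles, "cycles")
print(micros / 1000, "ms")
```

Changing a count or a cycle figure in `ROWS` shows right away how much a single instruction contributes to the total.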
The inner loop runs 256 times per page because every iteration increases the Y register by 1. Y has 8 bits, so it wraps from 255 back to 0 on the 256th iny; that sets the zero flag, bne loop falls through, and execution moves on to the next instruction. However, the table above shows every instruction in the inner loop running 1024 times. This is because the outer loop iterates 4 times: the page number in $41 starts at 2 and the loop stops when it reaches 6.
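The two nested loops can be checked with a small model. The sketch below is not an emulator; it only mimics the 8-bit wrap of Y and the page counter, and `fill_screen` is a hypothetical helper name.

```python
def fill_screen(colour=0x07, start_page=0x02, end_page=0x06):
    """Model of the base code: returns (memory, number_of_stores)."""
    memory = {}
    page = start_page      # high byte of the pointer, kept in $41
    y = 0                  # 8-bit index register
    stores = 0
    while True:
        while True:
            memory[(page << 8) + y] = colour   # sta ($40),y
            stores += 1
            y = (y + 1) & 0xFF                 # iny wraps from 255 to 0
            if y == 0:                         # bne loop falls through when Y is zero
                break
        page += 1                              # inc $41
        if page == end_page:                   # cpx #$06 / bne loop
            break
    return memory, stores


memory, stores = fill_screen()
print(stores)
print(hex(min(memory)), hex(max(memory)))
```

The store count is 4 pages of 256 pixels each, and the addresses written run from the first byte of page $02 to the last byte of page $05.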
Improved version
So how do we make this program run faster? One of the easiest ways is to reduce the number of loop iterations. For example, this is the inner loop of the base code:
loop:   sta ($40),y     ; set pixel colour at the address (pointer)+Y
        iny             ; increment index
        bne loop        ; continue until done the page (256 pixels)
If we update 2 pixels each time we loop, the number of iterations is halved:
loop: sta ($40),y
iny
sta ($40),y
iny
bne loop ; continue until done the page (256 pixels)
With this, bne loop executes half as often, which saves 1536 cycles under the same accounting as the table above. If we went all out for performance, we could copy and paste those two lines 256 times. However, who does that? I'm not going to copy/paste and count whether I have 256 copies of sta ($40),y and iny. Here is my version, where I repeat the pair 16 times, which results in this performance:
Label | Instruction | Cycles | Count | Alt cycles | Alt count | Total
      | lda #$00    | 2      | 1     |            |           | 2
      | sta $40     | 3      | 1     |            |           | 3
      | lda #$02    | 2      | 1     |            |           | 2
      | sta $41     | 3      | 1     |            |           | 3
      | lda #$07    | 2      | 1     |            |           | 2
      | ldy #$00    | 2      | 1     |            |           | 2
loop: | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | sta ($40),y | 6      | 64    |            |           | 384
      | iny         | 2      | 64    |            |           | 128
      | bne loop    | 3      | 60    | 4          | 4         | 196
      | inc $41     | 5      | 4     |            |           | 20
      | ldx $41     | 3      | 4     |            |           | 12
      | cpx #$06    | 2      | 4     |            |           | 8
      | bne loop    | 4      | 4     |            |           | 16
================
Total cycles | 8458
CPU speed    | 1 MHz
µs per clock | 1
Time         | 8458 µs = 8.458 ms = 0.008458 s
As you can see, we go from 11338 cycles to 8458 cycles, which is roughly a 25% performance improvement.
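How the unroll factor affects the total can be estimated with a small model. This is a sketch that follows the same accounting as the tables (3 cycles for each inner bne loop, 4 for the last one on each page); the function name `total_cycles` and its `unroll` parameter are assumptions made for illustration.

```python
def total_cycles(unroll):
    """Cycle total for the fill loop with `unroll` sta/iny pairs per iteration.

    Follows the accounting used in the tables: 3 cycles per inner
    `bne loop`, except 4 cycles for the last one on each page.
    """
    if 256 % unroll != 0:
        raise ValueError("unroll must divide 256 so that Y is zero at the branch")
    pages = 4
    setup = 2 + 3 + 2 + 3 + 2 + 2            # the six instructions before the loop
    stores = pages * 256 * (6 + 2)           # every pixel still needs one sta and one iny
    branches = pages * (256 // unroll)       # inner `bne loop` executions
    inner_bne = (branches - pages) * 3 + pages * 4
    outer = pages * (5 + 3 + 2 + 4)          # inc, ldx, cpx, bne once per page
    return setup + stores + inner_bne + outer


base = total_cycles(1)
for factor in (1, 2):
    cycles = total_cycles(factor)
    saved_percent = 100 * (base - cycles) / base
    print(factor, cycles, round(saved_percent, 1))
```

Other factors that divide 256 can be tried the same way. The store and increment work never shrinks, so only the inner branch cost drops as the factor grows.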
Update color
To change the color shown on the screen, we can check this document for the list of available colors (Peripherals and Memory Map section). Assuming we want light blue, we change
lda #$07
to
lda #$0e
This writes $0e to every pixel's memory location. The low four bits of each byte specify the color that pixel shows, and $0e is light blue.
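The role of the low four bits can be illustrated with a bit mask. This is a minimal sketch that assumes, as described above, that only the low four bits of a screen byte select the color; `colour_index` and `NAMES` are hypothetical names, and only the two colors mentioned in this post are listed.

```python
def colour_index(byte_value):
    """Return the 4-bit colour number selected by a byte written to screen memory."""
    return byte_value & 0x0F


# Colour names taken from the text above; the other 14 entries are omitted.
NAMES = {0x07: "yellow", 0x0E: "light blue"}

for value in (0x07, 0x0E, 0x1E, 0xFE):
    index = colour_index(value)
    print(hex(value), "->", index, NAMES.get(index, "?"))
```

Under that assumption, $0e, $1e, and $fe all select the same color, because they share the same low four bits.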
Display a different color in each quarter of the screen
To give each quarter of the screen a different color, we need to change the color value after each page, since one page of 256 pixels is a quarter of the screen. Right now the color value is stored in the accumulator, so we can increase it with adc #value, which adds the supplied number (plus the carry flag) to the accumulator. In this case I want to increase it by one:
        lda #$00        ; set a pointer in memory location $40 to point to $0200
        sta $40         ; ... low byte ($00) goes in address $40
        lda #$02
        sta $41         ; ... high byte ($02) goes into address $41
        lda #$04        ; colour number
        ldy #$00        ; set index to 0
loop:   sta ($40),y     ; set pixel colour at the address (pointer)+Y
        iny             ; increment index
        bne loop        ; continue until done the page (256 pixels)
        inc $41         ; increment the page
        ldx $41         ; get the current page number
        clc             ; clear carry so that adc adds exactly 1
        adc #1          ; next colour for the next page
        cpx #$06        ; compare with 6
        bne loop        ; continue until done all pages
And with that, each quarter of the screen is filled with a different color, $04 through $07 from top to bottom.
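Which color ends up on which page can be checked with a small model of the accumulator and the carry flag. This is a sketch, not an emulator: it assumes the carry flag is clear when the program starts, and `quarter_colours` is a hypothetical helper name.

```python
def quarter_colours(first_colour=0x04, start_page=0x02, end_page=0x06):
    """Return {page: colour} for the loop that adds 1 to A after each page."""
    a = first_colour
    carry = 0                          # assumed clear when the program starts
    colours = {}
    page = start_page
    while True:
        colours[page] = a & 0x0F       # colour used for all 256 pixels of this page
        page += 1                      # inc $41 / ldx $41
        total = a + 1 + carry          # adc #1 adds the operand plus the carry flag
        a = total & 0xFF
        carry = int(page >= end_page)  # cpx sets carry when X >= its operand
        if page == end_page:           # bne loop falls through on the last page
            break
    return colours


for page, colour in quarter_colours().items():
    print(hex(page), "->", hex(colour))
```

In this model the carry is still clear every time adc runs, because cpx #$06 only sets it on the final comparison, after the last page has been drawn.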
Impression
I'm super new to machine code, and it takes me forever to understand what I'm doing, as most of my work is with web and application development. I wouldn't have been able to finish this lab without the help of my professor. Now that I have a basic understanding, I think I'll work on a side project so that I can improve further and see what I can do with a CPU.