===================================================================
Assembly Tutorial
"This is for all you folks out there, who want to learn the magic art of Assembly programming." - MAD
Index of Section 1
- Ready To Start
- Memory Segmentation
- Code Example
- The Stack
- The Naming Convention

The first thing you need to know is that Assembly is a great, fast language, but only if you put time and effort into learning it. You must give all or nothing. (I suggest you give all.) And remember, the beginning is always boring and hard... so don't give up!
Ready to Start!
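First of all, we'll talk about the registers and then about the instructions that manipulate (change) them. The 8086 has fourteen 16-bit registers, each with its own purpose: the general-purpose registers AX, BX, CX and DX; the pointer and index registers SP, BP, SI and DI; the segment registers CS, DS, SS and ES; the instruction pointer IP; and the flags register. You might not understand the purpose of some of these registers yet, but be patient, I'll explain everything later.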
The general-purpose registers can be "split". For example, AH contains the high byte of AX and AL contains the low byte. You also have BH, BL, CH, CL, DH and DL. So if, e.g., DX contains the value 1234h, DH would be 12h and DL would be 34h.
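If it helps to see the split as plain arithmetic, here's a small sketch in Python (not assembly) that treats a 16-bit register as an ordinary number. The helper names high_byte, low_byte and join_bytes are made up for this illustration; they just do the shifting and masking that the processor does for you when you use DH, DL, AH and so on.

```python
def high_byte(reg16: int) -> int:
    """Upper 8 bits of a 16-bit value (like DH for DX)."""
    return (reg16 >> 8) & 0xFF


def low_byte(reg16: int) -> int:
    """Lower 8 bits of a 16-bit value (like DL for DX)."""
    return reg16 & 0xFF


def join_bytes(high: int, low: int) -> int:
    """Build a 16-bit value from its two halves (like DX from DH and DL)."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


dx = 0x1234
print(hex(high_byte(dx)), hex(low_byte(dx)))  # 0x12 0x34

# Writing only the high half (like MOV AH,09) leaves the low half alone.
ax = 0x1234
ax = join_bytes(0x09, low_byte(ax))
print(hex(ax))  # 0x934
```

So changing AH or AL never touches the other half of AX; the two halves are just two views of the same 16 bits.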
Test it! If you want to see all these registers and flags, go to DOS and start "debug" (just type debug). When you're in debug, type "r" and you'll see all the registers and some abbreviations for the flags. Type "q" to quit again. We won't use debug to program in this tutorial; we'll use a real assembler. I use TASM 3.2, but MASM or any other assembler works just fine too.
Memory Segmentation
Now I have to explain something about the way the 8086 uses memory (in real mode, which is the mode DOS runs in). Since the data bus of the 8086 is 16 bits wide, it can move and store 16 bits (1 word = 2 bytes) at a time. When the processor stores a word (16 bits), it stores the bytes in reverse order in memory: the low byte goes first (at the lower address), followed by the high byte. It looks like this:
1234h (word) ---> memory: 34h (byte) 12h (byte)
So if the memory looks like this: 78h 56h, and you get a word from memory, you'll get the value 5678h. (Note: I use an "h" after a number to indicate it's hexadecimal.) However, if you get just a byte from memory, it goes this way: memory 78h 56h -----> the first byte you get is 78h. Okay, pretty clear, huh?
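Here's the same idea as a small Python sketch, where a bytearray plays the role of memory. The helpers store_word and load_word are invented for this example; Python's int.to_bytes and int.from_bytes with "little" byte order do the low-byte-first work, which is exactly the 8086 order described above.

```python
def store_word(memory: bytearray, address: int, value: int) -> None:
    """Store a 16-bit word the 8086 way: low byte first, then high byte."""
    memory[address:address + 2] = value.to_bytes(2, "little")


def load_word(memory: bytearray, address: int) -> int:
    """Read a 16-bit word: the byte at the lower address is the low byte."""
    return int.from_bytes(memory[address:address + 2], "little")


memory = bytearray(4)  # a tiny pretend memory, all zeros

store_word(memory, 0, 0x1234)
print([hex(b) for b in memory[0:2]])  # ['0x34', '0x12']

memory[2:4] = bytes([0x78, 0x56])     # memory: 78h 56h
print(hex(load_word(memory, 2)))      # 0x5678 (reading a word)
print(hex(memory[2]))                 # 0x78 (reading just one byte)
```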
Now let's talk about segments. The 8086 divides its memory into segments. In real mode a segment is 64 KB big (an offset is a 16-bit number, so it can reach at most FFFFh bytes into the segment), and each segment has a number. These numbers are stored in the segment registers (see above). The three main segments are the code, data and stack segments. Segments overlap each other almost completely. If you start debug again and type "d", you can see some addresses at the left of the screen. The format is like this: 4576:0100. That's a memory address. The first number is the segment number and the second number is the offset within the segment. So FFFF:FFF0 means segment FFFFh, FFF0h bytes from the beginning of the segment.
As I said before, segments overlap. The address 0000:0010 is EXACTLY the same address as 0001:0000. That means that segments begin at paragraph boundaries. (A paragraph = 16 bytes, so a segment starts at an address divisible by 16.) Now you can start calculating REAL addresses in memory. An example: 0000:0010 means segment 0000h, offset 10h. Now we multiply the segment number by 16 and add the offset.
Note that the offset 10h means the value 16 in decimal:
0000:0010 --> 0*16+16 = 16
Next, the other address 0001:0000:
0001:0000 --> 1*16+0 = 16
Both give 16, so they really are the same memory location.
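You can let the computer do this calculation, too. In this Python sketch, physical_address and parse_address are hypothetical helpers (they are not part of debug or any assembler); parse_address reads an address written the way debug shows it, as hexadecimal segment:offset.

```python
from typing import Tuple


def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: segment number * 16 + offset."""
    return segment * 16 + offset


def parse_address(text: str) -> Tuple[int, int]:
    """Split a debug-style "SSSS:OOOO" string (hexadecimal) into numbers."""
    seg_text, off_text = text.split(":")
    return int(seg_text, 16), int(off_text, 16)


for text in ("0000:0010", "0001:0000"):
    seg, off = parse_address(text)
    print(text, "->", physical_address(seg, off))  # prints 16 for both
```

This sketch ignores one detail of the real 8086: it has only 20 address lines, so results above FFFFFh (like FFFF:FFF0 = 10FFE0h) wrap around to the start of memory.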
By the way, this segmentation of memory is how the processor addresses memory in real mode, the mode DOS runs in. On a 286 or higher, you have something called real mode and protected mode. This segment explanation is based on real mode; in protected mode it's way different, but don't bother, that's really complicated stuff you don't need to know now. For the DOS programs in this tutorial, just assume that what I explained about segments is ALWAYS true. But keep in the back of your mind that there's more... Trust me, I know what I'm talking about.
Our first program
Our first program will be a really simple one. I'll first give you the code and then I'll explain it. Here's the code; copy it and save it in a file called FIRST.ASM.
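.model small                ; small memory model: one code and one data segment
.stack                      ; reserve a stack (an .EXE needs one)
.data
message db "Hello world, I'm learning Assembly !!!", "$"   ; "$" marks the end for DOS
.code
main proc
mov ax,seg message          ; AX = segment number of message
mov ds,ax                   ; DS = that segment (DS can't be loaded with a constant)
mov ah,09                   ; DOS function 09h: print a "$"-terminated string
lea dx,message              ; DX = offset of message, so DS:DX points to it
int 21h                     ; call DOS
mov ax,4c00h                ; AH = 4Ch (exit program), AL = 00h (exit code)
int 21h                     ; call DOS: the program ends here
main endp
end main                    ; end of source; execution starts at main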
.model small : Lines that start with a "." give the assembler information. The word(s) after the dot say what kind of information. In this case it tells the assembler the program is small and doesn't need a lot of memory (one code segment and one data segment). I'll get back to this later.
.stack : Another line with information. This one tells the assembler that the "stack" segment starts here. The stack is used to store temporary data. Our program doesn't use it directly (although INT uses it behind the scenes), but it must be there, because we're making an .EXE file and these files MUST have a stack.
.data : indicates that the data segment starts here and that the stack segment ends here.
.code : indicates that the code segment starts here and that the data segment ends here.
main proc : Code is normally placed in procedures, much like functions in C or other languages. This line says that a procedure called main starts here; main endp says that the procedure is finished. Every procedure must have a start and an end.
end main : tells the assembler that the program is finished. It also tells the assembler where to start the program: at the procedure called main, in this case.
message db "xxxx" : DB means Define Byte, and that's exactly what it does: in the data segment it defines a couple of bytes. These bytes contain the text between the quotes. The "$" at the end marks the end of the string for DOS (more on that below). "message" is a name to identify this string of bytes. It's called an "identifier".
mov ax, seg message : AX is a register. You use registers all the time, so that's why you had to know about them before I could explain this. MOV is an instruction that moves data. It can have a few "operands" (don't worry, I'll explain these names later). Here the operands are AX and seg message. Seg message can be seen as a number: the number of the segment "message" is in (the data segment). We have to know this number so we can load the DS register with it; otherwise we can't get to the string in memory. We need to know WHERE the string is located in memory. The number is loaded into the AX register. MOV always copies data from the operand right of the comma into the operand left of the comma.
mov ds,ax : The MOV instruction again. Here it moves the number in the AX register (the number of the data segment) into the DS register. We have to load the DS register this way, with two instructions: just typing "mov ds,seg message" isn't possible, because the 8086 can't move a constant directly into a segment register.
mov ah, 09 : MOV again. This time it loads the AH register with the constant value nine.
lea dx, message : LEA means Load Effective Address. This instruction stores the offset of the string message within the data segment in the DX register. This offset is the second thing we need to know when we want to know where "message" is in memory. So now we have DS:DX. See the segment explanation above.
int 21h : This instruction causes an interrupt. The processor calls a routine somewhere in memory. The number 21h tells the processor which routine, in this case a DOS routine. INTs are very important and I'll explain more of them later, since they're also very, very complex. For now, just assume that it calls a procedure from DOS. The procedure looks at the AH register to find out what it has to do. In this example the value 9 in the AH register tells the procedure to write the string at DS:DX to the screen, up to (but not including) the "$".
mov ax, 4c00h : Load the AX register with the constant value 4c00h.
int 21h : The same INT again. But this time the AH register contains the value 4ch (AX=4c00h), and to the DOS procedure that means "exit program". The value of AL is used as an exit code; 00h means "no error".
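DOS isn't written in Python, of course, but a rough model can show what the two int 21h calls do with the registers. This is a toy under my own assumptions, not how DOS is implemented: memory is a Python bytes object with a pretend data segment at segment 0001h, dos_print_string is a made-up name that mimics function 09h by collecting characters from DS:DX up to the "$", and the exit code is simply AL split off from AX=4C00h.

```python
def dos_print_string(memory: bytes, ds: int, dx: int) -> str:
    """Mimic INT 21h function 09h: collect characters from DS:DX up to '$'.

    The '$' itself is not part of the output.
    """
    address = ds * 16 + dx               # real-mode address of the string
    end = memory.index(b"$", address)    # find the terminating '$'
    return memory[address:end].decode("ascii")


# A pretend data segment at segment 0001h, so its bytes start at address 16.
message = b"Hello world, I'm learning Assembly !!!$"
memory = bytes(16) + message

ds, dx = 0x0001, 0x0000                  # mov ds,ax  /  lea dx,message
print(dos_print_string(memory, ds, dx))

ax = 0x4C00                              # mov ax,4c00h
ah, al = ax >> 8, ax & 0xFF              # AH = 4Ch (exit program), AL = exit code
print(hex(ah), al)                       # 0x4c 0
```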
That's it!!! You now fully understand this program (I hope)
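First assemble and link the program (with TASM: "tasm first" and then "tlink first"; other assemblers have similar commands). Then go to DOS and type "debug first.exe". The debug screen will appear. When you are in the debugger, type "d". You'll see some addresses and our program.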
Now type "u" and you'll see a list that looks like this:
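0F77:0000 B8790F        MOV     AX,0F79
0F77:0003 8ED8          MOV     DS,AX
0F77:0005 B409          MOV     AH,09
First, 0F77:0000 is the segment number and offset. B8790F is the machine code of the mov ax,0f79 instruction. B8 means "mov ax," and 790F is the number (in reversed order, low byte first). Note that the instruction was:
mov ax,seg message
so the segment number of message is 0F79h. (Your numbers will probably be different, because they depend on where DOS loaded the program.)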
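To see how those three bytes break down, here's a tiny Python sketch. It only knows the single opcode B8h (mov ax, followed by a 16-bit number) from the listing above; decode_mov_ax is a hypothetical name, and this is nowhere near a real 8086 disassembler.

```python
def decode_mov_ax(code: bytes) -> str:
    """Decode one 'MOV AX, 16-bit constant' instruction (opcode B8h)."""
    if code[0] != 0xB8:
        raise ValueError("this sketch only understands opcode B8h (MOV AX,...)")
    # The 16-bit number follows the opcode, stored low byte first.
    value = int.from_bytes(code[1:3], "little")
    return f"MOV AX,{value:04X}"


print(decode_mov_ax(bytes.fromhex("B8790F")))  # MOV AX,0F79
```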
Now let's calculate another address for the data. Take 0F79:0000 and subtract 2 from the segment number. That gives you 0F77 (the code segment). The difference is 0002:0000 --> 2*16+0 = 32. Two segments further means 32 bytes further, so the offset becomes 32 (20h).
So at this location the data is 0F77:0020. Check it by typing "d 0f77:0020". Please note that it's the SAME data. We can see it at multiple addresses only because the segments overlap! But in the program we said the data had to be in a data segment. Remember the .data directive? Well, it IS in a data segment; the data is just stored right after the code, but that doesn't matter. We can still address the data with its own segment number (0F79) and an offset of zero.
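The same check as a Python sketch, using the segment*16+offset rule from the memory segmentation part. physical and offset_from are helper names made up for this example; offset_from answers the question "what offset does this address have when seen from another segment?"

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode address: segment number * 16 + offset."""
    return segment * 16 + offset


def offset_from(segment: int, address: int) -> int:
    """Offset of a real-mode address, seen from the given segment."""
    return address - segment * 16


data_start = physical(0x0F79, 0x0000)      # where the data segment begins
via_code_seg = physical(0x0F77, 0x0020)    # the same bytes, seen from the code segment
print(hex(data_start), hex(via_code_seg))  # 0xf790 0xf790
print(hex(offset_from(0x0F77, data_start)))  # 0x20
```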
Also note that after the int 21h instruction that ends the program, the data doesn't start immediately; first there are some unused bytes (probably zero). That's because segments start at paragraph boundaries. The data segment couldn't start at 0F77:0010 (= 0F78:0000), because there is code there; if there wasn't any code there, the data segment would have been 0F78. So the data segment has to be 0F79 (the next free paragraph), and the bytes between the code and the data just take up space. But that doesn't matter. Please remember that the assembler doesn't care in what order the segments appear in the .ASM file. In this example we declared the data segment before the code segment, but the data ends up after the code in memory.
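The "next free paragraph" is just the end of the code rounded up to a multiple of 16. Here's a Python sketch, assuming for illustration that the code of FIRST.ASM is 18 bytes long (that's my own count of its instructions, so treat it as an assumption; any length from 17 to 32 bytes gives the same answer). next_segment is a hypothetical helper name.

```python
PARAGRAPH = 16  # segments start on 16-byte boundaries


def next_segment(code_segment: int, code_length: int) -> int:
    """First segment number that starts after code_length bytes of code."""
    paragraphs = (code_length + PARAGRAPH - 1) // PARAGRAPH  # round up
    return code_segment + paragraphs


code_length = 18  # assumed size of FIRST.ASM's code, in bytes
data_segment = next_segment(0x0F77, code_length)
print(hex(data_segment))  # 0xf79
print(data_segment * 16 - (0x0F77 * 16 + code_length), "padding bytes")  # 14 padding bytes
```

With 16 bytes of code or less, the same function gives 0F78, which matches the "if there wasn't any code there" case.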
The Stack
The stack is a place where data is temporarily stored. The SS and SP registers point to that place like this: SS:SP. So the SS register contains the segment and the SP register contains the offset. There are a few instructions that make use of the stack; PUSH and POP are the most basic ones. PUSH "pushes" a value onto the stack and POP retrieves that value from the stack. It works like this:
MOV AX,1234H
PUSH AX
MOV AH,09
INT 21H
POP AX
The final value of AX will be 1234h. First we load 1234h into AX, then we push that value onto the stack. We now store 9 in AH, so AX will be 0934h, and execute an INT. Then we pop the AX register: we retrieve the pushed value from the stack, so AX contains 1234h again. Another example:
MOV AX,1234H
MOV BX,5678H
PUSH AX
PUSH BX
POP AX
POP BX
The final values will be AX=5678h and BX=1234h. First the value 1234h was pushed, and after that the value 5678h was pushed onto the stack. The stack works LIFO (Last In, First Out), so 5678h comes off first: AX pops that value and BX pops the next one.
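The LIFO behaviour is easy to model. In this Python sketch the Stack8086 class is invented for illustration, and the starting SP of 0400h is an arbitrary assumption. It follows the 8086 rule that PUSH first lowers SP by 2 and then stores the word at SS:SP, and POP reads the word at SS:SP and then raises SP by 2, so the stack grows downward in memory. To keep it short, each word is stored whole instead of as two bytes.

```python
class Stack8086:
    """Toy model of the 8086 stack (words kept whole, keyed by offset)."""

    def __init__(self, sp: int = 0x0400):
        self.sp = sp       # stack pointer: offset of the top of the stack in SS
        self.memory = {}   # offset -> 16-bit word

    def push(self, value: int) -> None:
        self.sp -= 2                       # the stack grows downward
        self.memory[self.sp] = value & 0xFFFF

    def pop(self) -> int:
        value = self.memory[self.sp]       # read the word on top
        self.sp += 2                       # then move SP back up
        return value


# The second example: PUSH AX, PUSH BX, POP AX, POP BX swaps AX and BX.
stack = Stack8086()
ax, bx = 0x1234, 0x5678
stack.push(ax)
stack.push(bx)
ax = stack.pop()
bx = stack.pop()
print(hex(ax), hex(bx), hex(stack.sp))  # 0x5678 0x1234 0x400
```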
Names
There are some names you need to know. Well, you don't HAVE to know them, but it's handy if you do. I'll use these names from now on, so better learn them.
Identifiers: An identifier is a name you apply to items in your program. The two types of identifiers are "name", which refers to the address of a data item, and "label", which refers to the address of an instruction. The same rules apply to names and labels.
Statements: A program is made of a set of statements. There are two types of statements: "instructions" such as MOV and LEA, and "directives", which tell the assembler to perform a specific action, like ".model small". A statement has this format:
identifier - operation - operand(s) - comment
The identifier is the name as explained above.
The operation is an instruction like MOV.
The operands provide information for the operation to act on. For example, in MOV AX,BX, MOV is the operation and AX,BX are the operands.
The comment is text you can add to explain your code; the assembler ignores everything it sees after a ";" on that line.
So a complete instruction looks like this:
MOVINSTRUCTION: MOV AX,BX ;this is a MOV instruction
The label and the comment are optional. In fact I already explained directives, but okay, I'll do it again. Directives provide the assembler with information on how to assemble an .ASM file. .MODEL SMALL and .CODE, for example, are directives.
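To see the four fields of a statement pulled apart, here's a rough Python sketch. parse_statement is a hypothetical helper written for this tutorial, not part of TASM or MASM. It only handles simple instruction lines, recognizes a label only when the first word ends in a colon, and doesn't understand a ";" or "," inside a quoted string.

```python
def parse_statement(line: str) -> dict:
    """Split 'label: OPERATION operands ;comment' into its four fields."""
    code, _, comment = line.partition(";")   # everything after ';' is a comment
    tokens = code.split(None, 1)             # first word, then the rest
    label = ""
    if tokens and tokens[0].endswith(":"):   # the optional label ends with ':'
        label = tokens[0][:-1]
        tokens = tokens[1].split(None, 1) if len(tokens) > 1 else []
    operation = tokens[0].upper() if tokens else ""
    operands = [op.strip() for op in tokens[1].split(",")] if len(tokens) > 1 else []
    return {"label": label, "operation": operation,
            "operands": operands, "comment": comment.strip()}


print(parse_statement("MOVINSTRUCTION: MOV AX,BX ;this is a MOV instruction"))
print(parse_statement("mov ah,09"))  # no label, no comment
```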
And so we have come to the end of Section 1 of this tutorial. If you fully understand this stuff (registers, flags, segments, stack, names, etc.) you may, from now on, call yourself a "Level 0 Assembly Coder". Congratulations!
In Part 2 I'll explain some more instructions and I'll explain how to address data yourself.
I'll also explain the Interrupts and interrupt table.
Design & Concept by UZillusion
Text by MAD
Send any comments and suggestions to UZTeam