Assembly Language - Part 1

OK, so here we go! At this point, it's important to state that I am assuming the reader is familiar with the basics of Intel 80x86 family assembly language. If not, there are several books available, and a few of them are even worth spending money on. However, I very strongly recommend starting by checking out a book from your local library to learn the basics, then visiting the Intel developer's site to get the specifications for the various 80x86 processors.

The most important tool the novice assembly programmer will have is, of all things, the cc (or gcc) compiler. Nearly every release of cc from any vendor (including DEC, Sun, HP, and others) will have a very intriguing switch, the -S switch. This switch will compile a .c file, but instead of continuing with assembling, linking and so on, it only creates an assembly language file, usually named with the same base name, but replacing the .c extension with a .s extension. Therefore, if you have a code snippet whose assembly language you want to see, you can copy that snippet into a function in a file (call it "test.c" for example), compile with the -S switch, and find the assembly language file (in this example, "test.s").
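
As a minimal sketch of that workflow, a file like the following could be saved as "test.c" and compiled with the -S switch. The function name square() and its body are placeholders of my own choosing, not anything the compiler requires:

```c
/* test.c - a hypothetical snippet to feed to "cc -S test.c".
   The function name and body are placeholders for whatever
   code you actually want to inspect. */
int square(int x)
{
    return x * x;
}
```

Running "cc -S test.c" on this file should leave a "test.s" next to it, containing the assembly language for square().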

Knowing that, let's take a look at why we might use such a process. If you'll forgive my use of a dirty word, let's look back to the early days of Windoze, where we find a time when Pascal was competing with 'C' to be the dominant programming language. These two languages had one feature in common (as with almost all higher-level languages): passing parameters into functions. Not wanting to spoil the surprise, Pascal would pass parameters in one order, and 'C' would pass parameters in reverse order. So, for this first example, let's look in great detail at how 'C' passes its parameters.

Let's consider the following file named "parameters.c" which includes two functions. The first function, int func1( int, int, int ), is the function I want to pass parameters to, call, and then receive a return value from, all being integer values for now. Inside the function, I am performing a number of calculations. Note that I am doing something different with each parameter so that I can identify exactly what is being done to each parameter. The value for p1 is simply pulled from the parameter, p2 is going to be multiplied by 2, and p3 is going to be subtracted from 3. A final thing to note is that this is a very simple code snippet - it is not expected to run, although by renaming "call_f1()" to "main()," we could compile and run this as a program, but it would not appear to do anything.

int func1( int p1, int p2, int p3 )
{
return( p1 + ( 2 * p2 ) + ( 3 - p3 ) );
}

int call_f1( void )
{
int a;

a = func1( 1, 2, 3 );

return -a;
}

That is pretty simple and straight-forward, isn't it? So, we use our friendly command:

gcc -S -masm=intel parameters.c

Wait, what's that "-masm=intel" thing? For one thing, it should be noted that Linux has its roots in unix, which arguably owes its existence to AT&T. AT&T uses a slightly different format from Intel, and for the moment, I want to remain true to the Intel format. If the -masm=intel switch is left out, the compiler will assume we intended to use the AT&T format, and we will get something that looks very different.

That being said, now we can look at the assembly language file produced:

        .file   "parameters.c"
.intel_syntax
.text
.globl func1
.type func1,@function
func1:
push %ebp
mov %ebp, %esp
mov %eax, DWORD PTR [%ebp+12]
sal %eax, 1
add %eax, DWORD PTR [%ebp+8]
sub %eax, DWORD PTR [%ebp+16]
add %eax, 3
leave
ret
.Lfe1:
.size func1,.Lfe1-func1
.globl call_f1
.type call_f1,@function
call_f1:
push %ebp
mov %ebp, %esp
sub %esp, 8
sub %esp, 4
push 3
push 2
push 1
call func1
add %esp, 16
mov DWORD PTR [%ebp-4], %eax
mov %eax, DWORD PTR [%ebp-4]
neg %eax
leave
ret
.Lfe2:
.size call_f1,.Lfe2-call_f1
.ident "GCC: (GNU) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)"

Holy code-glob, Batman! What is all of that? OK, let's look at this slowly. The first line is fairly common in almost all assembly language files. .file "filename" will set the name for the assembler to use in its listing. Note that the filename in the statement does not have to match the actual file name. In this case, the original file name was "parameters.c" but the name of the file we are looking at is actually "parameters.s" as a result of using gcc with the -S switch. The .file statement is not required, but any form of documentation is always recommended.

Next is the command, ".intel_syntax" which is simply a result of using the "-masm=intel" switch. This is important because it will tell the assembler that we are writing a file in the Intel syntax instead of the AT&T syntax. Without this command, the assembler will choke when it tries to read AT&T format and sees Intel syntax instead.

Now we find the ".text" command. This tells the assembler that the following lines of code belong to the program's main section of executable program code. Later, we will find other sections of code and data.

At long last, we finally arrive at the actual functions. Here, we find two lines, beginning with ".globl func1" which is marking the label, "func1" to be listed in the object file as a label that can be seen from external object files. (NOTE: Remember that object files are still not directly executable, and must still be linked to additional object files and/or libraries.) This is followed by a command that identifies the label "func1" as a function.
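
For comparison, a label is only marked ".globl" when the function has external linkage in the 'C' source. In this small sketch (the names helper() and visible() are made up for illustration), GCC would normally emit a ".globl" line for visible() but not for the static helper():

```c
/* helper() has internal linkage ("static"), so the compiler normally
   emits its label without a .globl directive. */
static int helper(int x)
{
    return x + 1;
}

/* visible() has external linkage, so its label is normally marked
   .globl and can be referenced from other object files. */
int visible(int x)
{
    return helper(x) * 2;
}
```

Compiling this with the -S switch and searching the output for ".globl" is an easy way to check the behavior of your own compiler.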

This is the point where the actual function is defined, including the label "func1" and the actual lines of code, which I will come back to in a moment.

After the function's body of code, we find two last lines of code, beginning with what looks like another command, ".Lfe1:" which is not a command, but simply a label definition. It is important to note that the label ".Lfe1" is not a legal identifier in 'C,' C++, or any of the derived languages, so it can never collide with a name from the source code. This label is also not made global (externally visible) because it is only used in the following line, which looks strange, but simply defines the total length of the function. This value is recorded in the symbol table in the object file, but it is not strictly required. (In fact, I have written many files without the .size command and never had any errors or problems from leaving it out. DO NOT take that as a recommendation to omit it; it is probably still wise to include it in all functions.)

OK, now to look at the generated code. The original reason for producing these files was to determine the order in which the compiler puts parameter values onto the stack. The easiest way to determine this is to look first at the line in call_f1():

  a = func1( 1, 2, 3 );

...and by looking at the assembly language that was produced, we find the following lines of code:

  push 3
  push 2
  push 1
  call func1

What this tells us is that just before calling the function, the three parameter values are pushed onto the stack, but the last parameter (the right-most) is pushed first, and the first parameter (the left-most) is pushed last. Why push them backwards like that? Consider the printf() function, formally declared as "int printf( const char *format, ... )" which means that the first parameter, a character pointer, must exist and be the first parameter. Any other parameters may or may not even exist. Because the parameters are pushed in reverse order, the first parameter always ends up in the same place on the stack, right next to the return address, no matter how many parameters follow it. The 'C' library (which contains the printf() function) looks only at the first parameter on the stack, and it will only look for other parameters when it discovers it needs them, as prescribed by "%s," "%d," "%c" and other special formatting commands requiring additional parameters.
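
The same idea can be sketched in 'C' itself with the standard <stdarg.h> macros. The function sum_ints() below is a made-up example, not a library function; its first parameter plays the role that the format string plays for printf(), telling the function how many further parameters to go looking for:

```c
#include <stdarg.h>

/* sum_ints() is a hypothetical example function.
   count says how many additional int parameters follow it. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    va_start(args, count);          /* start just past the last named parameter */
    for (i = 0; i < count; i++)
        total += va_arg(args, int); /* fetch the next int parameter */
    va_end(args);

    return total;
}
```

A call such as sum_ints( 3, 1, 2, 3 ) reads exactly three extra parameters, while sum_ints( 0 ) never looks past the first one.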

Before going into the called function, we have to pause for a moment to recognize that the stack builds in a downward direction. That means that when a value is pushed onto the stack, the stack pointer is decreased, not increased. At the moment just before the function is called, the program's stack looks like this, assuming higher addresses are shown at the top of the list:

  3
  2
  1 <--- Stack pointer

And immediately after the function call:

  3
  2
  1
  return address <--- Stack pointer
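
To make the downward growth concrete, here is a toy model in 'C'. It is only a sketch: the struct, the function names, and the 16-slot size are all my own inventions, each slot stands for one 4-byte word, and the "stack pointer" is an array index rather than a real address:

```c
#define STACK_WORDS 16

/* A toy model: each slot is one 4-byte word, and sp is an index
   into the array rather than a real memory address. */
struct toy_stack {
    int slot[STACK_WORDS];
    int sp;                 /* index of the most recently pushed word */
};

void toy_init(struct toy_stack *s)
{
    s->sp = STACK_WORDS;    /* empty: just past the highest slot */
}

/* Pushing moves the stack pointer DOWN, then stores the value. */
void toy_push(struct toy_stack *s, int value)
{
    s->sp -= 1;
    s->slot[s->sp] = value;
}

/* Replays "push 3; push 2; push 1" and returns the final stack pointer. */
int toy_push_params(struct toy_stack *s)
{
    toy_init(s);
    toy_push(s, 3);
    toy_push(s, 2);
    toy_push(s, 1);
    return s->sp;
}
```

After toy_push_params(), the value 1 sits at the lowest used index, which is where the stack pointer is left. The call instruction would then push one more word, the return address, below it.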

Now, we find ourselves inside the function, ready to execute the first operation, which, in code generated by the compiler as shown here, will be "push %ebp" followed by "mov %ebp,%esp" in Intel format or "mov %esp,%ebp" in AT&T format. These are actually the same command, but the two syntax modes switch the order of the operands around. What is really happening here is, we want to save the value that the EBP register had when we entered the function, but while we are in the function, we may need to change the stack pointer, so we'll use EBP to view parameters on the stack. The stack looks like this:

  3
  2
  1
  return address
  Entry EBP <--- Stack pointer <---EBP register

This is called the "stack frame." This stack frame is always going to give the function a common point of reference to be able to address (access) the parameters. Even when the stack pointer gets modified to point to different locations (see below), EBP will always point to the same location in the stack frame.

From this point, reading the parameters becomes a nearly trivial task. "Nearly," because there are some assumptions being made because of the data types chosen. In this case, the three parameters are all of type int, each of which requires 4 bytes (32 bits) of space, even though the values of 1, 2, and 3 could all be stored in much smaller data types. Likewise, since Linux here operates in 32-bit mode with a "flat" memory model, the return address also uses 4 bytes, and the saved EBP uses another 4 bytes, so every value on the stack consumes 4 bytes. This is important because EBP points at the saved EBP, the return address sits 4 bytes above that, and the first parameter sits 4 bytes above the return address. So, if I want to view the first parameter on the stack, I can use the expression "[%ebp+8]" to read this value:

  mov %eax, DWORD PTR [%ebp+8] #read param1, store in EAX register
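
The offset arithmetic can be written down as a tiny 'C' function. This is only a sketch under the assumptions just described (32-bit mode, every parameter 4 bytes wide); the name param_offset() is made up for illustration:

```c
/* Hypothetical helper: byte offset from EBP of the n-th parameter
   (n counts from 1), assuming every parameter is a 4-byte value. */
int param_offset(int n)
{
    const int saved_ebp = 4;       /* pushed by "push %ebp" */
    const int return_address = 4;  /* pushed by the call instruction */

    return saved_ebp + return_address + 4 * (n - 1);
}
```

For the three parameters of func1(), this gives offsets of 8, 12, and 16, which are the same offsets that appear in the generated code.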

Now, we can finally look at what happens in the function! We find the push and mov commands I described earlier, but then we find the list of operations:

  mov %eax, DWORD PTR [%ebp+12]
  sal %eax, 1
  add %eax, DWORD PTR [%ebp+8]
  sub %eax, DWORD PTR [%ebp+16]
  add %eax, 3

Very slowly, the first thing we see is that the program copies the second parameter into the EAX register, and then shifts it left one bit. This shift is the same as doubling the value, so EAX now contains the value 2*parameter2. This is NOT what our original statement showed, but we'll continue on for a moment. After having 2*parameter2 in the EAX register, the next thing we see is that we add the parameter1 value, followed by subtracting the third parameter, and finally adding 3. So, what we have in the EAX register is:

(2*p2)+p1-p3+3

Our original equation in the 'C' file is:

p1 + (2*p2) + (3-p3)

Fortunately, it is a simple matter of algebra to discover that the two equations are identical from a mathematical perspective. The only thing that is really any different is that the equation, as it exists in the 'C' file, would require extra storage space to store temporary values, while the equation actually used by the program does all of its work in the EAX register. This is a result of some optimization done by the compiler.
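
If algebra is not convincing enough, the two orderings can be compared directly in 'C'. This is just a sketch: the function names are my own, and the local variable eax merely stands in for the EAX register, with multiplication by 2 standing in for the one-bit shift:

```c
/* The expression exactly as written in parameters.c. */
int func1_as_written(int p1, int p2, int p3)
{
    return p1 + (2 * p2) + (3 - p3);
}

/* The same work in the order the generated instructions perform it.
   The variable eax stands in for the EAX register. */
int func1_as_compiled(int p1, int p2, int p3)
{
    int eax;

    eax = p2;        /* mov %eax, DWORD PTR [%ebp+12] */
    eax = eax * 2;   /* sal %eax, 1  (doubles the value) */
    eax = eax + p1;  /* add %eax, DWORD PTR [%ebp+8]  */
    eax = eax - p3;  /* sub %eax, DWORD PTR [%ebp+16] */
    eax = eax + 3;   /* add %eax, 3 */
    return eax;
}
```

For the call in our example, func1( 1, 2, 3 ), both versions work out to 5.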

Finally, notice that the function leaves its result in the EAX register. When we look at the function that calls func1(), we see that once program control returns to the instruction immediately after the call, it cleans up stack space, mostly a result of the three parameters it had to put onto the stack to call the function. There is a bit of a mystery here, because the three integers that were pushed onto the stack only consumed 12 bytes, and the cleanup of the stack is done by adding 16 back to the stack pointer. There is a subtraction of 4 from the stack pointer just before the pushes, so there is a total of 16 bytes, but why subtract 4 bytes more than necessary? I don't know what the answer to that is.
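
One plausible explanation, offered here as an assumption rather than something the listing itself proves, is stack alignment: GCC has a -mpreferred-stack-boundary option, and its default asks for the stack pointer to sit on a 16-byte boundary at the point of a call. The padding that would require can be sketched like this (the function name padding_for() is made up):

```c
/* Bytes of padding needed to round `bytes` up to a multiple of
   `boundary`. Using 16 as the boundary is an assumption about the
   compiler's default, not something the listing states. */
int padding_for(int bytes, int boundary)
{
    return (boundary - (bytes % boundary)) % boundary;
}
```

With 12 bytes of parameters and a 16-byte boundary, the padding comes out to 4 bytes, which matches the "sub %esp, 4" in the listing, and 12 + 4 matches the "add %esp, 16" after the call.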

Notice the next line:

  mov DWORD PTR [%ebp-4],%eax

What is happening here is a little obscure. If we look back at the original 'C' code again, we notice that the second function (the one which currently has our interest) contains a local variable, a. The variable a is only going to exist within this function, and the compiler will address this local variable very similarly to the parameters. In fact, the local variable(s) are actually part of the stack frame. func1() did not use any local variables, but the calling function does. A stack frame with a local variable looks like this (I added the values that refer to offsets in the addresses that are used, relative to the location EBP actually points to; the parameter slots are carried over from the func1() picture for reference, since call_f1() itself takes no parameters):

  +16 3 (parameter3)
  +12 2 (parameter2)
   +8 1 (parameter1)
   +4 return address
    0 <---EBP
   -4 local variable a
   -8 unused space <--- stack pointer

Notice in this stack frame that the stack pointer has been modified (by the instruction, "sub %esp,8") to leave 4 bytes for the local variable, a. I must admit, I am not sure why the compiler used 8 bytes for local storage. Usually, this can happen (even without local variables being declared) if the compiler decides that it needs to use temporary space, so even though no additional space is actually used, the compiler must have made the assumption that it would need this space.

The last thing that occurs in both functions, just before the ret, is the leave instruction. This actually undoes the first two instructions (push %ebp and mov): it copies EBP back into the stack pointer and then pops the saved EBP. Technically speaking, there is an "enter" instruction that could be used instead of the push/mov pair, but other switches need to be used at assemble time, or an error message comes up.
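
Here is a toy model in 'C' of how leave undoes the push/mov pair. Everything in it is invented for illustration: the struct toy_cpu, the function names, and the use of array indexes in place of real addresses (one slot per 4-byte word):

```c
#define FRAME_WORDS 16

/* Toy machine: word-sized slots, with esp and ebp as slot indexes. */
struct toy_cpu {
    int slot[FRAME_WORDS];
    int esp;
    int ebp;
};

/* "push %ebp" followed by "mov %ebp, %esp" (Intel operand order). */
void toy_prologue(struct toy_cpu *c)
{
    c->esp -= 1;
    c->slot[c->esp] = c->ebp;   /* save the caller's EBP */
    c->ebp = c->esp;            /* EBP now marks this frame */
}

/* "leave": copy EBP back into ESP, then pop the saved EBP. */
void toy_leave(struct toy_cpu *c)
{
    c->esp = c->ebp;            /* discard locals and scratch space */
    c->ebp = c->slot[c->esp];   /* restore the caller's EBP */
    c->esp += 1;
}
```

No matter how far the stack pointer is moved between toy_prologue() and toy_leave(), both registers come back to the values they had on entry, which is exactly why the function can return safely afterwards.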

Finally, here is the AT&T format. There isn't really that much difference, but what differences are here are very important.

The first thing to notice is the apparent reversal in operands. In the Intel format, the destination register is immediately after the instruction. In the Intel format, the command "mov %eax,2" should be read as "move into EAX register, the value of 2." The same instruction in AT&T format would appear as "mov $2,%eax" and should be read as, "move the value 2 into the EAX register," and in this case, the destination register is the last thing listed in the command.

Next, the immediate addressing mode in the Intel format does not use any special character modifier. Referring back to the previous statement for the mov instruction, notice how the 2 has a dollar sign ($) in front of it. In the AT&T format, if the dollar sign is omitted, the assumption is that the mode is direct addressing instead of immediate. In other words, the instruction "mov 2,%eax" will not load 2 into the EAX register, but instead, will attempt to read the memory at address 2, and whatever value is stored in that location will be loaded into the EAX register. Since it is extremely unlikely that this memory location will actually belong to the running program, this should cause a "segmentation violation" error. Running the program as root does not change that, and running ordinary programs as root is something you should never ever EVER even think about doing under any circumstances. (NOTE: device drivers often use assembly language, and they must always be loaded by root, but this is the only exception to that rule.)
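
The difference between the two modes can be sketched in 'C'. The array toy_memory here is a made-up stand-in for real memory, indexed by word rather than by byte, and both function names are my own:

```c
/* Immediate addressing: the operand IS the value.
   AT&T "movl $2, %eax", Intel "mov %eax, 2". */
int load_immediate(int value)
{
    return value;
}

/* Direct addressing: the operand is an ADDRESS, and the result is
   whatever is stored there. AT&T "movl 2, %eax".
   toy_memory is a hypothetical array standing in for real memory. */
int load_direct(const int *toy_memory, int address)
{
    return toy_memory[address];
}
```

With a toy memory holding 10, 20, 30, 40, load_immediate( 2 ) gives 2, while load_direct( toy_memory, 2 ) gives 30, the contents of slot 2.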

In Intel format, "mov %al,2" is used to load 2 into the 8-bit AL register, "mov %ax,2" is used to load 2 into the 16-bit AX register, and "mov %eax,2" will load 2 into the 32-bit EAX register. In the AT&T format, the mov instruction gets appended with a 'b,' 'w,' or 'l' in addition to the register name to indicate the size of the data moved. The command "movb $2,%al" will load 2 into the 8-bit register, the command "movw $2,%ax" will load a 16-bit value, and "movl $2,%eax" will load a 32-bit value. Part of the reason for this is self-checking code. The command "movb $2,%eax" will generate a warning because the command is to move an 8-bit value into a 32-bit register. The command "movb $32000,%al" will also generate an error because the number 32000 cannot fit into an 8-bit integer.
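
As a small sketch of that naming rule, here is a made-up 'C' helper, att_suffix(), that maps an operand size in bytes to the letter the AT&T format appends to the instruction:

```c
/* Hypothetical helper: the AT&T suffix letter for an operand size
   given in bytes. Returns 0 for sizes this sketch does not cover. */
char att_suffix(int size_in_bytes)
{
    switch (size_in_bytes) {
    case 1:  return 'b';   /* movb, 8-bit  (for example %al)  */
    case 2:  return 'w';   /* movw, 16-bit (for example %ax)  */
    case 4:  return 'l';   /* movl, 32-bit (for example %eax) */
    default: return 0;
    }
}
```

So a 4-byte int moved into EAX gets the 'l' suffix, which is why every instruction in the AT&T listing of our example ends in 'l'.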

The last variation is in the indirect addressing mode instructions. Looking at the third line of func1 in the two assembly files, the Intel command is "mov %eax, DWORD PTR [%ebp+12]" while the AT&T format instruction is "movl 12(%ebp),%eax." Now, in both cases, the assembler should be able to figure out that the size of the data transfer is 32 bits because the destination register is EAX. However, the Intel format output still uses the term "DWORD PTR" to state that the transfer size is 32 bits, with the square brackets indicating that the address mode is indirect. The AT&T format also still requires the 'l' to indicate 32 bits are to be transferred. The big difference is in how the EBP register is used. In the Intel format, the phrase is "[%ebp+12]" and in the AT&T format, the phrase is "12(%ebp)" to accomplish the same thing.
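
To see the two spellings side by side, here is a made-up 'C' helper, format_mem_operand(), that writes the same base-plus-offset operand in either style. It follows the listings in this article, including the % prefix on register names in the Intel output:

```c
#include <stdio.h>

/* Hypothetical helper: writes a base-plus-displacement operand into buf,
   either as "[%ebp+12]" (Intel style, as printed in this article's
   listings) or as "12(%ebp)" (AT&T style). */
void format_mem_operand(char *buf, size_t size, const char *reg,
                        int disp, int use_att)
{
    if (use_att)
        snprintf(buf, size, "%d(%%%s)", disp, reg);   /* disp(%reg) */
    else
        snprintf(buf, size, "[%%%s%+d]", reg, disp);  /* [%reg+disp] */
}
```

Called with the register name "ebp" and an offset of -4, the location of the local variable a, it produces "[%ebp-4]" for Intel and "-4(%ebp)" for AT&T.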

So without further hesitation, here is the same file, just in AT&T format:

        .file   "parameters.c"
.text
.globl func1
.type func1,@function
func1:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %eax
sall $1, %eax
addl 8(%ebp), %eax
subl 16(%ebp), %eax
addl $3, %eax
leave
ret
.Lfe1:
.size func1,.Lfe1-func1
.globl call_f1
.type call_f1,@function
call_f1:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
subl $4, %esp
pushl $3
pushl $2
pushl $1
call func1
addl $16, %esp
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
negl %eax
leave
ret
.Lfe2:
.size call_f1,.Lfe2-call_f1
.ident "GCC: (GNU) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)"

I didn't forget the ".ident ..." lines, but they are really nothing more than decoration. More than anything else, they are used to identify the compiler that produced the file. It barely even qualifies as documentation.

So, that's your introduction to assembly language in Linux. So, are you ready to go on? Part 2

Wenton's email (wenton@ieee.org)
