Assembly Language - Part 2

In part 1, I discussed some of the basics of assembly language in Linux, including both Intel and AT&T syntax.  Because AT&T is the native syntax in Linux (and Unix in general), I'm going to continue using only the AT&T syntax.  It will be a bit awkward for anyone who is only used to the Intel format, mostly because the operands will feel "backwards," but I think it is important to become accustomed to the feel of the native AT&T syntax.

Also in part 1, I gave a brief introduction to using parameters in 'C.'  From that section, the two most important points to remember are:

  1. The first parameter in the 'C' parameter list is the last parameter to be pushed onto the stack.
  2. The only return value from the 'C' function is actually stored in the AL/AX/EAX register at the end of the function.

Up to now, everything has only used integers because the data width for the Pentium family of processors is 32 bits for integers.  Conveniently, addresses are also 32 bits, although that is by the design of the Pentium chips.  However, a program will eventually need to deal with values that are not 32-bit data types.  As an example, consider the char data type in 'C.'  This is only 8 bits, and must be handled a little differently.  To see how the compiler handles the differences, consider the following 'C' fragment:

  
char global_char;
int  global_int;
  
void f1( char c, int i )
{
  global_char = c;
  global_int = i;
}

void caller( void )
{
  f1( 'A', 100000 );
}

...and the resulting assembly file:
NOTE: I added line numbers with the command

grep -n . <filename.s>

1:      .file   "char_param.c"
2:      .text
3:.globl f1
4:      .type   f1,@function
5:f1:
6:      pushl   %ebp
7:      movl    %esp, %ebp
8:      subl    $4, %esp
9:      movl    8(%ebp), %eax
10:     movb    %al, -1(%ebp)
11:     movb    -1(%ebp), %al
12:     movb    %al, global_char
13:     movl    12(%ebp), %eax
14:     movl    %eax, global_int
15:     leave
16:     ret
17:.Lfe1:
18:     .size   f1,.Lfe1-f1
19:.globl caller
20:     .type   caller,@function
21:caller:
22:     pushl   %ebp
23:     movl    %esp, %ebp
24:     subl    $8, %esp
25:     subl    $8, %esp
26:     pushl   $100000
27:     pushl   $65
28:     call    f1
29:     addl    $16, %esp
30:     leave
31:     ret
32:.Lfe2:
33:     .size   caller,.Lfe2-caller
34:     .comm   global_char,1,1
35:     .comm   global_int,4,4
36:     .ident  "GCC: (GNU) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)"

OK, time to look at this.  The first thing to look at is the caller function.  Looking at the 'C' code, I am very explicitly using the character 'A' in the char parameter, and the number 100000 in the integer parameter.  (Looking at the ASCII chart, we find that the character 'A' has an integer value of 65.)  Now, let's look at what the assembly language looks like.  Looking at line 28, we find the call to the f1() function.  Just prior to that, there are two commands to push values onto the stack.  First, in line 26, we are pushing the 32-bit value 100000.  We know it is a 32-bit value because it is too large to fit into a 16-bit integer, but also because of the 'l' that is tagged onto the 'push' command.  That 'l' means "long," which means a 32-bit integer.  In the next line, we are pushing the value 65 onto the stack.  This is the 'A' that we are pushing.  Notice that although the parameter is an 8-bit character, the compiler again uses "pushl."  The x86 has no "pushb" instruction that would push only an 8-bit value, and the compiler gives every parameter its own 32-bit slot so that the stack stays aligned.  (The hardware accesses memory 32 bits at a time, so the compiler does everything it can to use 32-bit integers.)

One last thing to mention quickly: notice lines 24, 25, and 29.  In lines 24 and 25, the compiler reserves 16 bytes on the stack (think of this as space for local variables in 'C') that the function never actually uses.  As far as I can tell, this version of GCC does it mainly to keep the stack pointer on a 16-byte boundary.  Line 29 then removes the two pushed parameters (8 bytes) together with the 8 bytes from line 25, and the leave instruction in line 30 releases the rest.  This is an example of why even 'C' is not as efficient as it could be.  It is not an error, and it can usually be avoided by using certain optimization switches on the command line at compile time, but the typical programmer will be unaware of this and not think to use them.
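
The widening of 'A' to a 32-bit value can also be seen from the 'C' side.  Here is a minimal sketch; the function name stack_slot_for_char() is my own invention for illustration, and it assumes an ASCII system where char is 8 bits, as on Linux/x86:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: widen a char argument to the 32-bit value that
   "pushl" places on the stack.  Assumes ASCII and an 8-bit char. */
static int32_t stack_slot_for_char(char c)
{
    int32_t slot = c;   /* integer promotion: 8 bits -> 32 bits */

    /* The value itself is unchanged; only the width grows. */
    assert(slot >= CHAR_MIN && slot <= CHAR_MAX);
    return slot;
}
```

Calling stack_slot_for_char('A') gives 65 held in a 4-byte value, which corresponds to the $65 pushed in line 27.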

So, let's look at what the called function, f1(), does with the two parameters.  Recall that the first two commands in every function build the stack frame, but also notice that the next line (line 8) builds a scratchpad area, 4 bytes long.  Again, it will only use 1 of them, but the compiler is still using 32-bit values everywhere it can, so it reserves 4 bytes instead of just the one byte it really wants.  So, let's look at the new stack frame, and I'll add the offsets relative to where %ebp is pointing:

+12:   100000 (the int i parameter)
+8:    65 (the 'A' expanded to 32 bits)
+4:    return address, pointing to the memory address containing line 29
+0:    stores the value of %ebp when the program entered this function
-4:    Scratchpad 32-bit word (NOTE: This is also the location %esp points to.)

Looking at this stack frame, we see everything interesting to our function.  As we look at line 9, we find the processor is getting the character parameter from the stack as a 32-bit integer, into the %eax register.  In line 10, we are saving only the %al register in the scratchpad area.  Notice the -1 value, offsetting the %ebp register to address memory space that was reserved.  If we were dealing with a short integer (16 bits), it would have to use a -2 offset.  In either case, it could just as well use the offset -4, the start of the reserved word, but the compiler didn't.  Deal with it, I guess.  If you were to rewrite the f1() function in assembly, you could change this without affecting system performance.  Finally, the program reloads the 8-bit value that it just saved, and re-saves it in the variable named "global_char."  (After that, it loads the int parameter, i, and stores it in the variable named "global_int.")
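
The step from %eax down to %al is just "keep the low 8 bits."  As a rough sketch in 'C' (the names low_byte() and low_word() are hypothetical, made up for this illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Model of "movl 8(%ebp), %eax" followed by "movb %al, global_char":
   only the low 8 bits of the 32-bit stack word survive. */
static uint8_t low_byte(uint32_t eax)
{
    uint8_t al = (uint8_t)(eax & 0xFFu);
    assert(al == eax % 256u);   /* same thing, said arithmetically */
    return al;
}

/* The 16-bit case mentioned above: %ax is the low 16 bits of %eax. */
static uint16_t low_word(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFFu);
}
```

For the value pushed by caller(), low_byte(65) is still 65, so nothing is lost when the 32-bit stack word is stored into the 8-bit global_char.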

And last, but not least, a quick look at lines 34 and 35.  These are the two lines defining the "global_char" and "global_int" variables.  The ".comm" command does a collection of things.  First, it defines the label for the variable.  Second, it inherently includes the ".globl" command to make the labels visible, even to other files that will be linked to this file.  (That means be careful: don't use the same global variable names anywhere else in any other files unless you actually mean to use these exact variables.)

Finally, notice the two values at the end of each .comm line.  The first is the length: for the character variable, the size is 1 byte, so the length is 1; for the integer variable, the size is 4 bytes, so the length is 4.  The second is an "alignment" value.  Since the character variable is a single byte, it can never be split across two separate physical memory locations.  The integer variable, however, is 32 bits long, and if it started at the wrong memory address, it could be split over two physical memory locations.  For example, if the first byte of the integer were at byte address 0 (remember that computers start counting with 0), then its 4 bytes would be at addresses 0, 1, 2, and 3.

When the computer is accessing memory, remember that the physical memory is laid out as being 32 bits wide.  This means that each physical memory location stores 4 bytes, which all have the same physical address but occupy separate bits within that location.  Byte addresses 0, 1, 2, and 3 are all stored in physical address 0, while bytes 4, 5, 6, and 7 are all stored in physical memory address 1.  To find the physical location of a specific byte, divide the byte address by 4.  The result will have a quotient and a remainder (even if one or both are 0).  The quotient is the physical address, and the remainder indicates which byte to use within that physical location.

Now, let's consider our two variables.  The character variable only needs a single byte, so let's assume it uses byte 0.  Next, we need a 32-bit integer, which could be stored in bytes 1, 2, 3, and 4.  However, if we were to do this, the microprocessor would have to read two separate physical addresses: first physical memory location 0 to read bytes 1, 2, and 3, and then physical memory location 1 to read byte 4.  If the assembler is instead told to skip bytes 1, 2, and 3 and use byte locations 4, 5, 6, and 7 to store the integer variable, then the microprocessor can read the entire 32-bit word from a single physical memory location.

Instructing the assembler to skip spare byte locations so that the variable starts at the next physical memory location is called "alignment," and the 4 at the end of that line is what requests it.  We are telling the assembler that the base address of the variable (its first byte) must be a byte address that leaves a remainder of 0 when divided by 4, which forces the entire 32-bit value into a single physical location.
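
The divide-by-4 arithmetic is easy to try out in 'C.'  This is only a sketch of the simplified memory model described above; the helper names (word_index, byte_in_word, align_up, words_touched) are my own and not part of any library:

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BYTES 4u   /* one 32-bit physical memory location */

/* Quotient: which physical location holds this byte address. */
static uint32_t word_index(uint32_t byte_addr)   { return byte_addr / WORD_BYTES; }

/* Remainder: which byte inside that physical location. */
static uint32_t byte_in_word(uint32_t byte_addr) { return byte_addr % WORD_BYTES; }

/* Round a byte address up to the next multiple of alignment
   (alignment must be a power of two, such as 1 or 4). */
static uint32_t align_up(uint32_t byte_addr, uint32_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return (byte_addr + alignment - 1u) & ~(alignment - 1u);
}

/* How many physical locations must be read to fetch `size` bytes
   starting at byte_addr. */
static uint32_t words_touched(uint32_t byte_addr, uint32_t size)
{
    assert(size > 0u);
    return word_index(byte_addr + size - 1u) - word_index(byte_addr) + 1u;
}
```

With the character in byte 0, an integer placed at byte 1 gives words_touched(1, 4) == 2, while align_up(1, 4) moves it to byte 4, where words_touched(4, 4) == 1.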

Now that we know how the two functions work together, we can see that there is a lot of optimizing that can be done.  For starters, I want to use only the assembler (not the linker) to produce a listing file and an object file.  The command looks like this:

as -a=char_param.l char_param.s -o char_param.o

The assembly code file (char_param.s) is assembled; the object file is named by -o char_param.o, and the listing file is named by -a=char_param.l.  Note that the listing file is still human-readable, but the object file is not.  Looking at the listing file, we see the following:

GAS LISTING char_param.s                        page 1
 
 
   1                            .file   "char_param.c"
   2                            .text
   3                    .globl f1
   4                            .type   f1,@function
   5                    f1:
   6 0000 55                    pushl   %ebp
   7 0001 89E5                  movl    %esp, %ebp
   8 0003 83EC04                subl    $4, %esp
   9 0006 8B4508                movl    8(%ebp), %eax
  10 0009 8845FF                movb    %al, -1(%ebp)
  11 000c 8A45FF                movb    -1(%ebp), %al
  12 000f A2000000              movb    %al, global_char
  12      00
  13 0014 8B450C                movl    12(%ebp), %eax
  14 0017 A3000000              movl    %eax, global_int
  14      00
  15 001c C9                    leave
  16 001d C3                    ret
  17                    .Lfe1:
  18                            .size   f1,.Lfe1-f1
  19                    .globl caller
  20                            .type   caller,@function
  21                    caller:
  22 001e 55                    pushl   %ebp
  23 001f 89E5                  movl    %esp, %ebp
  24 0021 83EC08                subl    $8, %esp
  25 0024 83EC08                subl    $8, %esp
  26 0027 68A08601              pushl   $100000
  26      00
  27 002c 6A41                  pushl   $65
  28 002e E8FCFFFF              call    f1
  28      FF
  29 0033 83C410                addl    $16, %esp
  30 0036 C9                    leave
  31 0037 C3                    ret
  32                    .Lfe2:
  33                            .size   caller,.Lfe2-caller
  34                            .comm   global_char,1,1
  35                            .comm   global_int,4,4
  36                            .ident  "GCC: (GNU) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)"
GAS LISTING char_param.s                      page 2
 
 
DEFINED SYMBOLS
                            *ABS*:00000000 char_param.c
        char_param.s:5      .text:00000000 f1
                            *COM*:00000001 global_char
                            *COM*:00000004 global_int
        char_param.s:21     .text:0000001e caller
 
NO UNDEFINED SYMBOLS

For the most part, this still looks the same, except that line numbers have been added, in addition to the machine language hexadecimal codes.  Each line that generates machine language includes an address and the machine code instruction.  For example, line 8 is the assembly line to subtract 4 from the %esp register.  The machine language instruction starts at address location 0x03 and contains the bytes 0x83, 0xEC, and 0x04.  (Note that these addresses are relative, and will be relocated to different locations when all object files are linked together.)  From this, we can determine that the length of the f1() function is 30 bytes.  You can either count the bytes, or remember the .size command.  Notice how the extra label, .Lfe1, is used.  The last command of the function, ret, is a single byte at 0x1d, so after it is emitted the location counter has moved up to 0x1e, and that is the value of .Lfe1.  Since f1 was defined at 0x00, the size of the function is 0x1e - 0x00, or 0x1e bytes (0x1e = 30).  Additionally, there are 11 commands in this function.  So, let's see what we can do.
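
If you would rather let the computer do the counting, the byte lengths can be read straight off the listing and added up.  This is just a sketch; the array and function names are hypothetical, and the lengths are copied by hand from listing lines 6 through 16:

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of each instruction in f1(), from listing lines 6-16:
   pushl, movl, subl, movl, movb, movb, movb, movl, movl, leave, ret */
static const unsigned f1_lengths[] = { 1, 2, 3, 3, 3, 3, 5, 3, 5, 1, 1 };

static size_t f1_instruction_count(void)
{
    return sizeof f1_lengths / sizeof f1_lengths[0];
}

static unsigned f1_size_in_bytes(void)
{
    unsigned total = 0;
    for (size_t i = 0; i < f1_instruction_count(); i++)
        total += f1_lengths[i];

    /* Must agree with the .size expression, .Lfe1 - f1 = 0x1e - 0x00. */
    assert(total == 0x1eu);
    return total;
}
```

The count comes out to 11 instructions and 30 bytes, matching the .size arithmetic.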

First, look at lines 9, 10, 11, and 12.  In this case, a single value is pulled from the stack, put into temporary storage, loaded back (into the same register), and then saved into the global_char variable.  True, there is some conversion happening (the 32-bit value is technically being converted to an 8-bit value), but the actual data bits are never modified.  Instead, it should be possible to get the parameter value from the stack, then save just the low byte into the global_char variable space, using only two commands:

movl 8(%ebp),%eax
movb %al,global_char

Next, after a little inspection, we find that the temporary scratchpad space is not used any more, so line 8 can be removed entirely.  We could also replace the pushl and movl instructions with the enter instruction, which takes about the same number of bytes but is only a single instruction; however, that requires modifying the assembler's command line, and there is very little gain.  Better still, we could eliminate the pushl and movl instructions, and the leave instruction, altogether, but we have to stop a moment and think: what does the stack frame look like?  Remember, we can't use the %ebp register if we don't set it up!  The offsets are now relative to the %esp pointer instead of %ebp:

+8: the int i parameter
+4: the char c parameter
+0: return address to the calling function <-- %esp points here
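
To convince yourself of these offsets, here is a toy model of the stack in 'C.'  Everything in it (toy_stack, toy_push, toy_read, toy_enter_f1) is hypothetical and exists only for this illustration, and the return address 0x33 is just the listing offset of line 29, not a real address:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_WORDS 16u               /* size of the toy stack, in 32-bit words */

/* Hypothetical model of a downward-growing stack of 32-bit words. */
typedef struct {
    uint32_t mem[TOY_WORDS];
    uint32_t esp;                   /* byte offset of the top of the stack */
} toy_stack;

/* "pushl": move %esp down 4 bytes, then store the value there. */
static void toy_push(toy_stack *s, uint32_t value)
{
    assert(s->esp >= 4u);
    s->esp -= 4u;
    s->mem[s->esp / 4u] = value;
}

/* Read the 32-bit word at base + offset, like "movl offset(base), %eax". */
static uint32_t toy_read(const toy_stack *s, uint32_t base, uint32_t offset)
{
    assert((base + offset) / 4u < TOY_WORDS);
    return s->mem[(base + offset) / 4u];
}

/* Build the stack as f1() sees it on entry, before any "pushl %ebp". */
static void toy_enter_f1(toy_stack *s)
{
    s->esp = TOY_WORDS * 4u;        /* empty stack */
    toy_push(s, 100000u);           /* line 26: pushl $100000  (int i)  */
    toy_push(s, 65u);               /* line 27: pushl $65      (char c) */
    toy_push(s, 0x33u);             /* line 28: call pushes the return address */
}
```

After toy_enter_f1(), toy_read(&s, s.esp, 4) returns 65 and toy_read(&s, s.esp, 8) returns 100000.  One more toy_push(), standing in for "pushl %ebp", moves the same two words to offsets 8 and 12, which is where the compiler's version of f1() looked for them.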

Now, the entire function looks like this:

f1:
movl 4(%esp), %eax
movb %al, global_char
movl 8(%esp), %eax
movl %eax, global_int
ret

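That leaves us with only 5 instructions, consuming 19 bytes by my count (each %esp-relative load needs one more addressing byte than its %ebp-relative counterpart).  We have reduced the physical space required for this function to roughly 63% of the original 30 bytes, and we have reduced the number of instructions (and with it, roughly, the processing time) by more than 50%.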

Now, let's assume that the f1() function was the only one we were interested in optimizing, and that the caller() function could be left alone.  (Also, we'll rename caller() to main(), to make this into a working program example.)  That leaves us with two files, f1.s and char_param_example.c, that look like this:

/* char_param_example.c */
void f1( char c, int i );   /* implemented in f1.s */

int main( void )
{
  f1( 'A', 100000 );
  return 0;
}

.file "f1.s"
.text
.globl f1
.type f1,@function
f1:
movl 4(%esp), %eax
movb %al, global_char
movl 8(%esp), %eax
movl %eax, global_int
ret
.Lfe1:
.size f1,.Lfe1-f1
.comm global_char,1,1
.comm global_int,4,4
.ident "Wenton's assembly language demos"

And now we can compile the .c file and assemble the .s file to get the two object files, char_param_example.o and f1.o:

"gcc char_param_example.c -c" will produce the char_param_example.o file
"as f1.s -o f1.o" will produce the f1.o file

And now we can link the two files together.  Because the program does still use 'C,' the standard C library must still be included, so for now, we will use the cc (gcc) program to manage this for us:

gcc char_param_example.o f1.o -o char_parameter_example

No, the program doesn't appear to do anything, but so far, this is only an example of how to generate assembly language from a 'C' program, isolate one function and optimize it, then relink the assembly file to the 'C' file into one program.

Now that we have this, it's time to look at a 'C'/assembly program that actually does something in Part 3.

Wenton's email (wenton@ieee.org)
