Assembly Language - Part 3

So, we finally have all the tools we need to merge an assembly language function into a 'C' program.  So let's look at a live, although still not very practical example.

For this example, I want to demonstrate using some of the standard 'C' library to manage our I/O.  We will start with the code we have already worked with in part 2, but we can use the 'C' libraries to make the program actually do something!

What I want my program to do is to ask the user for the length and width of a rectangle, and then calculate and display the rectangle's perimeter and area.  Just to keep things simple, we will only use integer values.  We can look at floating point values later.

I also want to do all the work in my assembly language file.  Why? Because it works out to be a more meaningful demonstration! If we were doing this in 'C,' this would be a trivial program...

#include <stdio.h>

void f1( char c, int i )
{
  int length, width;

  printf( "Please enter the rectangle's length: " );
  scanf( "%d", &length );
  printf( "Please enter the rectangle's width:  " );
  scanf( "%d", &width );

  printf( "The perimeter is %d\n", 2*(length+width) );
  printf( "The area is %d\n", length * width );
}

That's all well and good, but what does it take to accomplish the same task in assembly? Well, we have a number of tasks, so let's look at what we need to accomplish.

We have a number of chunks of data in use.  We have the two obvious pieces, the length and width variables.  What is less obvious is that we have some larger pieces, the 4 character strings in the printf() statements, as well as the character string used in the two scanf() statements.  The two format strings are both the same, and both are going to be constant, so we will use the same string twice.

So far, we have not discussed memory segments, but they are a very useful, and necessary, tool.  You may recall from way back in part 1 that there was a section of code called ".text" which was the storage area for program code.  This was, to be more precise, the segment in memory that stores this program's executable code.  Likewise, a program typically has another section called ".data" which stores data that is initialized with preset values, ".rodata" which stores read-only data, and a section called ".bss" which stores data that has no preset value and is simply filled with zeros when the program loads.
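
As a rough illustration of these segments from the 'C' side, here is a small sketch of where a typical GCC/ELF toolchain usually places different kinds of objects.  The names (greeting, start_value, scratch, check_initial_values) are made up for this example, and the exact placement is ultimately up to the compiler and linker.

```c
#include <assert.h>

/* Read-only data: a constant string like this usually lands in .rodata. */
const char greeting[] = "Hello from .rodata";

/* Initialized data: the value 42 has to be stored in the file, so this
   usually lands in .data. */
int start_value = 42;

/* No initializer: usually placed in .bss, which is zero-filled when the
   program loads (the C language guarantees the zero either way). */
int scratch;

/* Hypothetical helper: confirm the values the loader set up for us. */
void check_initial_values(void)
{
    assert(start_value == 42);  /* preset value came from .data      */
    assert(scratch == 0);       /* zero came from the .bss zero fill */
}
```

Compiling a file like this with the -S switch and reading the resulting .s file shows which section directives the compiler actually chose.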

The character strings are all constant values that are not allowed to change, so they are going to fit into the .rodata segment:

           .section .rodata
prompt1:   .string "Please enter the rectangle's length: "
prompt2:   .string "Please enter the rectangle's width:  "
perimeter: .string "The perimeter of the rectangle is %d.\n"
area:      .string "The area of the rectangle is %d.\n"
scanf_int: .string "%d"

Next, we have the two variables for length and width, and we have a number of options.  Our first option would be to put these variables into the stack frame, where they would be addressed using -4(%ebp) and -8(%ebp), as long as we don't confuse which is which.  It would be very easy to start thinking, "length will be stored at -4(%ebp) and width is stored at -8(%ebp)," but then, later on in the function, mistakenly use -8 when I wanted the length and -4 when I wanted the width, or as Bugs Bunny said once, "...or maybe it was the other way around in reverse."  This certainly invites a long conversation about comments and in-code documentation.  But you can read that for yourself.  Our other two options are to use space in the .data memory segment or the .bss segment.  I strongly recommend that you use the .bss segment for variables like these, because the .bss segment is always filled with zeros when the program loads, while the .data segment is meant for data that needs a specific initial value stored in the executable file.

For demonstration purposes, I want to show both, but I will start by using the .bss segment.  In this case, we have a few very simple lines of code:

        .bss
        .align 4

length: .int 0
        .type length,@object

width:  .int 0
        .type width,@object

Here, I am defining two variables, both are initialized to 0 when the program loads, both are integer values, and both are classified as objects (as opposed to functions).  If the program was not using 'C,' it would probably not be necessary to identify the two variables as objects, and we could probably get away with not doing this in this program since the 'C' portions of the code never access these variables.  Still, it is good to keep these identifications in place, just because we are not sure what the compiler or linker really use these values for.

So, now that we have our data figured out, we can start looking at the function.  Just one quick note first, remember in part 2 that the f1() function was receiving two parameters, but those parameters are not being used in the new version of f1() that we are writing.  Because we are not modifying the other 'C' file (that has main() and calls f1() ), we are not going to change the parameters; we are simply going to ignore them. Certainly, in a real-world application, we would want to go back and clean this up.

I would hope that, if you are reading this, walking through the steps of calculating the perimeter and area of a rectangle would insult you, so I am going to jump over that and move along to the actual code needed to accomplish the task.  For starters, we must ask the user to input data, then get the data from them.  In order to prompt the user for input, we need to start by sending them a character string telling them what we want.  We are going to accomplish this by using the printf() function from the standard 'C' library.

This is where we finally learn what those damned pointer thingys really are.  They are pointers into memory, of course, but at this level, we discover that they are really nothing but integer values.  It has become common in 'C'/C++ classes (lectures, not data classes) for instructors to use house numbers to illustrate memory addresses, and now we see that this analogy is far more precise than any other programming analogy I am aware of.  The memory addresses really are integer values, indicating byte addresses (not physical addresses) into memory.

So, in order to call the printf() function, we need to know the address of the string we want to send to the function.  That is why we used a label when we wrote the code defining the strings.  In the first case, we used the label "prompt1:" to identify the location of the string to print.  The colon is not part of the label; it merely tells the assembler that prompt1 is a label and not an instruction that it would fail to recognise.  So, the lines of code producing our call to the printf() function, including the string to print, look like this:

        pushl $prompt1
        call  printf
        add   $4,%esp       #We pushed a 4-byte value, clean the stack up!

Wow! Wasn't that simple? We only have 1 parameter to send to printf(), the address of prompt1.  ***DANGER*** Notice the dollar sign ($) sitting in front of the label name? This means that the value we want to push onto the stack is an immediate value: whatever integer value identifies the byte location of the string, that is the value we want to put on the stack.  If we leave off the dollar sign, we are not using immediate addressing anymore; we are using direct addressing, which means that the processor looks at the contents stored in that memory location.  This is not what we want.  The instruction would push the first four bytes of the string itself (the characters "Plea"), and printf() would then treat those bytes as an address, which most likely crashes the program.  You MUST use the dollar sign here to let the assembler know exactly what you are trying to do.
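
To make the difference between the address of a string and its contents a little more concrete, here is a small 'C' sketch of the same idea.  The function names are invented for this illustration, and the assembly instructions appear only in comments; this models the two addressing modes, it is not code from the program we are building.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

const char prompt1[] = "Please enter the rectangle's length: ";

/* What "pushl $prompt1" puts on the stack: the address itself,
   which is nothing more than an integer. */
uintptr_t address_of_prompt(void)
{
    return (uintptr_t)prompt1;
}

/* What "pushl prompt1" (no dollar sign) would put on the stack:
   the first four bytes stored AT that address. */
uint32_t first_four_bytes_of_prompt(void)
{
    uint32_t contents;
    memcpy(&contents, prompt1, sizeof contents);
    return contents;
}

/* Hypothetical self-check: the two values are entirely different things. */
void check_address_vs_contents(void)
{
    uint32_t contents = first_four_bytes_of_prompt();
    assert(memcmp(&contents, "Plea", 4) == 0);   /* bytes of the text */
    assert((const void *)address_of_prompt() == (const void *)prompt1);
}
```

Handing the four-byte value from first_four_bytes_of_prompt() to printf() as if it were a pointer is exactly the mistake the missing dollar sign would make.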

So, now that we have prompted the user to enter a number, I'd suggest actually doing something silly, like, oh, getting the number? To do this, we need to call the scanf() function.  Our call to the scanf() function has two parameters... the pointer to the format string ("%d") and the pointer into memory where the input value is expected to be stored.  Well, neither of those is particularly difficult, but here's your first quiz: which address gets pushed onto the stack first?

Give up? Remember, the first parameter in the 'C' list is always the last parameter to push.  At the risk of sounding biblical, always remember that phrase, "the first shall be last and the last shall be first," and everything will be OK.  So, our call to the scanf() function looks like this:

        pushl $length
        pushl $scanf_int
        call  scanf
        addl  $8,%esp  #we pushed 2 4-byte values, clean the stack up!
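
If the push order still feels backwards, a toy model of the stack may help.  The sketch below is only a simulation written in 'C': the array, the push() helper and the fake addresses are all invented for illustration, and 0xDEADBEEF simply stands in for the return address that the call instruction pushes.

```c
#include <assert.h>
#include <stdint.h>

/* A toy stack of 4-byte slots that grows downward, like the real one. */
enum { STACK_SLOTS = 16 };
uint32_t stack[STACK_SLOTS];
int sp = STACK_SLOTS;                 /* index of the current top of stack */

void push(uint32_t value)
{
    assert(sp > 0);                   /* refuse to overflow the toy stack */
    stack[--sp] = value;
}

/* Argument n (0-based) as the callee sees it right after the call:
   the slot at sp holds the return address, the arguments sit above it. */
uint32_t callee_arg(int n)
{
    return stack[sp + 1 + n];
}

/* Model of: pushl $length / pushl $scanf_int / call scanf */
void simulate_scanf_call(uint32_t fmt_addr, uint32_t length_addr)
{
    sp = STACK_SLOTS;                 /* start from an empty stack      */
    push(length_addr);                /* last 'C' argument goes first   */
    push(fmt_addr);                   /* first 'C' argument goes last   */
    push(0xDEADBEEF);                 /* stand-in for the return address */
}
```

After simulate_scanf_call(0x1000, 0x2000), callee_arg(0) is the format string's address and callee_arg(1) is the address of length.  Pushing in reverse is what leaves the first argument closest to the top of the stack, where a function like scanf() can always find its format string no matter how many arguments follow it.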

Sigh, that was far less interesting than you probably expected, wasn't it? Oh be honest, yes it was! So, the entire function at this point looks like this, including reading both inputs from the user:

        .type f1,@function
        .globl f1
f1:
        pushl %ebp
        movl  %esp,%ebp

        pushl $prompt1
        call  printf
        add   $4,%esp

        pushl $length
        pushl $scanf_int
        call  scanf
        add   $8,%esp

        pushl $prompt2
        call  printf
        addl  $4,%esp

        pushl $width
        pushl $scanf_int
        call  scanf
        addl  $8,%esp

At this point, we have all of our input data from the user, so we can perform our calculations and output.  I was going to perform my calculation for each of the area and perimeter, then output it, but I suddenly had this horrible image of my college professor hunting me down, so I'll do my calculations first, then worry about the output.

This means that I need to go back to my .bss segment and add two more variables to store my area and perimeter values in.  I'll save myself (and you) a lot of time by pointing out that we already have a label for both "area" and "perimeter" in the .rodata segment! That would have caused a strange error that could take hours for the novice to find.  Instead, I want to use the labels "f1_area" and "f1_perimeter" in the .bss segment to store these values.  Now to write the code to compute the area and perimeter:

        movl length,%eax #grab the stored value for the length
        addl width,%eax  #add the stored value for the width (length+width)
        addl %eax,%eax   #double it (2*(length+width))
        movl %eax,f1_perimeter

        movl length,%eax #grab the length again
        imull width,%eax #multiply by the width
        movl %eax,f1_area
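
For comparison, here is a sketch of the same arithmetic in 'C'.  The function names are made up for this example.  The last function is there because x86 also has a one-operand multiply that leaves a full 64-bit product split across %edx (high half) and %eax (low half); the sketch models that split with ordinary 64-bit arithmetic and is an illustration, not a description of what our function does.

```c
#include <assert.h>
#include <stdint.h>

/* Same steps as the assembly: add, then double by adding to itself. */
int rect_perimeter(int length, int width)
{
    int sum = length + width;
    return sum + sum;
}

/* A plain 32-bit multiply; only the low 32 bits of the product are kept. */
int rect_area(int length, int width)
{
    return length * width;
}

/* Model of a widening multiply: the 64-bit product is returned as a
   high half (what %edx would hold) and a low half (what %eax would hold). */
void widening_multiply(uint32_t a, uint32_t b, uint32_t *high, uint32_t *low)
{
    uint64_t product = (uint64_t)a * b;
    *high = (uint32_t)(product >> 32);
    *low  = (uint32_t)product;
}

/* Hypothetical self-check using a 3 by 4 rectangle. */
void check_rectangle_math(void)
{
    assert(rect_perimeter(3, 4) == 14);
    assert(rect_area(3, 4) == 12);
}
```

For rectangle sizes a user is likely to type in, the high half is zero and the low 32 bits are the whole answer, which is why storing only %eax is good enough here.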

And now, all that's left is to print out the results and return from the function.

        pushl f1_perimeter #get the value and push it onto the stack
        pushl $perimeter   #push the pointer to the string
        call  printf
        add   $8,%esp

        pushl f1_area      #get the value and push it
        pushl $area        #...and the string for area
        call  printf
        add   $8,%esp

        leave
        ret

...and that's it! That is how to call 'C' functions, even from the libraries, from assembly language.  One thing to keep in mind: if you do try to link to one of the libraries in 'C', and the library is not linked by default (like the standard 'C' library, libc, is), the cc command must still link the library into the program, just like it would if the file was all done in 'C' (using '-l<libname>').

One last note before graduating to part 4, I have been very careful about specifically only mentioning 'C' and not mentioning C++.  The reason for this is that C++ does a weird thing called "name mangling" to manage the overloaded functions.  This is actually a fairly intricate discussion which is not really in the same scope of this series of pages, but I may decide to re-address the concept later.  The typical way of getting the assembly language to merge into a C++ program would be to modify the function header/prototype.  In 'C,' we had to include a function prototype (outside of any function bodies):

void f1( char c, int i);

In C++, we need to do something similar, but it takes a slightly different form because we are trying to tell the C++ compiler that this function has the literal name "f1" without any of the name mangling:

extern "C" {
  void f1( char c, int i );
}

The only thing we lose when we give up the name mangling is we can not overload any functions, but this would probably not have a large impact on our design.  If it were absolutely necessary to overload a group of functions that are written in assembly, we would have to find the mangled name for all possible overloaded functions by writing a short C++ file that only contains the overloaded functions, compile with the -S switch and look at the .s file to see the "real" names the compiler gives the functions.
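
If the same prototype has to be shared by both 'C' and C++ source files, a common idiom is to wrap the extern "C" lines in a preprocessor test so that only a C++ compiler ever sees them.  The sketch below shows that idiom.  The f1 body and the f1_calls counter are stand-ins invented so the example links on its own; in our program the real f1 comes from the assembly file.

```c
#include <assert.h>

/* A C++ compiler defines __cplusplus; a 'C' compiler does not, so the
   extern "C" wrapper disappears when this is compiled as 'C'. */
#ifdef __cplusplus
extern "C" {
#endif

void f1( char c, int i );

#ifdef __cplusplus
}
#endif

/* Stand-in body (hypothetical): it only counts how often it is called. */
int f1_calls = 0;

void f1( char c, int i )
{
    (void)c;            /* parameters are ignored, as in our f1 */
    (void)i;
    f1_calls++;
}

/* Hypothetical caller showing that the plain name "f1" resolves. */
void call_f1_once(void)
{
    int before = f1_calls;
    f1( 'x', 7 );
    assert(f1_calls == before + 1);
}
```

In practice the guarded prototype would live in a header file that both the 'C' and the C++ sources include.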

And that's it! You should now be able to use assembly language routines in your 'C'/C++ programs.  What is even more interesting, because GNU made all of their compilers (g77=Fortran, gcj=Java, gnat=Ada) to be compatible with each other, you should also be able to link your assembly language routines with all those other languages, too.  I have found an excellent book to give demonstrations on mixing the high-level languages, and although it doesn't discuss mixing assembly language with the other languages, it does discuss assembly language briefly.  I strongly recommend you look for (and buy) this book: GCC - The Complete Reference ISBN: 0-07-222405-3

Whew! Time for a break! You've earned it.  Next, I will discuss building an all-assembly language program in Part 4.

Wenton's email (wenton@ieee.org)
