Assemblers and assembly language
Written by Harry Fairhead   
Wednesday, 16 January 2013
Article Index
Assemblers and assembly language
Symbols and addresses

The sort of instructions that most computers recognize are too simple for humans to be bothered with - and so we invented assembly language. Find out how it works and how it started the whole movement to abstract away from the computer's hardware.




We have already tackled what a computer program is but what we completely ignored is the question of how a computer program is created in the first place. The natural language of most computers is binary because this is what their hardware is based on.

When you write an instruction in binary it is more than just an abstract idea that something should be done and some how the machine mysteriously "understands" and does as it is told. The individual bits in the instruction control what the hardware does. One bit might select a particular register, another set the arithmetic unit to add and yet another might clear a status bit. The binary code for an instruction is the set of "levers" that makes the machine do what you want it to.

You can program a computer in binary if you really want to. However programming in binary isn’t fun and it isn’t very productive.You have to remember what each bit does and how to put them together to make the instructions you want. One single bit in error and the instruction means something else. Reading binary code is also a difficult thing for an average human. Even so in the early days this is how computers were programmed - often using the banks of switches and lamps on the front panels.

A better way of expressing things was quickly invented.

Mnemonic Codes

The first step away from pure binary or machine code was to make use of symbols for the instructions.

For example, in Pentium machine code the instruction to store an eight-bit value, xxxxxxxx, where x is either a 0 or 1, in register BL is:


For example, if the data is 01010101 the complete command would be:


This is slightly easier to read if it is written in hexadecimal as B055 which loads 55 hex into the BL register. (Notice that the B at the start of the instruction is just a coincidence and it has nothing to do with the name of the BL register.)

Hex may be better than binary but it still isn’t memorable for most normal humans. The solution is to use easy-to-remember symbols or mnemonics.

For example:

MOV BL, 055H

is naturally read as “move 55 hex into register BL”. This representation also hides the fact that all of the different “move” instructions have different binary machine codes. For example, to move the 16-bit value into the BX register you would use:


followed by the 16 bits you wanted to load but as a mnemonic it would be:

 MOV BX,01234H

which looks  lot more like the first move instruction.

In other words, the use of mnemonics not only makes things easier to read, it unifies the structure of the machine code by grouping all of the different move instructions as MOV followed by what should be moved from source to destination, i.e.

MOV destination,source

Notice that this is more sophisticated than you might think in that the instruction that the programmer thinks of as one single instruction corresponds to a very large set of machine code instructions that depend on what is being moved and to where.


Let's Build An Assembler

At first these mnemonics were just that – aids to memory.

The programmer wrote out the program using the mnemonics and then translated the program by hand, using a list of codes, into machine code.

But then some smart programmer had an idea that saved a great deal of work. After all programmers are all basically lazy, it’s the main reason for being a programmer!

Someone thought,

“what if I write a program to convert the mnemonics into machine code”.

Obvious really.

All that is needed is a big table of mnemonic to machine code conversions. The result of this idea was the first “assembler” and the first “assembly” language.

The terminology wasn’t quite stable in the early days and you will find that some earlier assembly languages were called “autocode” and many other things.   It doesn’t matter what you call it, the assembler idea is just the combination of using mnemonics to represent machine codes and using a program to translate them to machine code before running the program.

This is the start of a development process that goes from this basic use of mnemonics to all the way to the sophisticated programming languages we use today.

Abstraction avalanche!

After you take the step of moving even a little bit away from machine code there seemed to be nothing stopping the mad headlong rush away from it as fast as possible. In the early days the technology needed to implement an assembler was very simple.

All you needed was a lookup table of symbols. When the program contained the line:

MOV BX,01234H

the assembler simply split off the first three characters “MOV” and looked it up in a table that gave the machine code for it. Then it read the rest of the line and completed the instruction. The main difficulty in implementing an assembler is in the “read the rest of the line and complete the instruction” which is  more difficult than you might think.

For example, machine code only uses binary but a good assembler is going to let the programmer write numbers in hex or even better in decimal. So the assembler has to recognise the base used for a number, e.g. anything ending in H is in hex, and it has to convert the number to binary.

As you can see, assemblers start to get complicated very quickly but this doesn’t matter because as computers became more sophisticated assemblers became increasingly essential and increasingly easy to create.

The rush away from machine code is a natural one because you can start to create language facilities that just don’t exist in machine code or in the simple mnemonic codes used to represent it. As soon as you start to automate the conversion of mnemonic codes into machine code you start to think of ways that you can add instructions and features that make the programmer's life easier.



All you need is a lookup table to convert symbols to binary and you almost have an assembler.






Last Updated ( Wednesday, 03 January 2018 )