A Simple Virtual Machine
Written by Alexey Lyashko   
Wednesday, 01 February 2012

Pseudo Assembly Language

Now that we are done with the file format, we have to define our pseudo assembly language.

This includes both the definition of the commands and the instruction encoding. As this VM is designed only to encode and decode a short text message, there is no need to develop a full-scale command set.

All we need is MOV, XOR, ADD, LOOP and RET.

Before we start writing the macros that represent these commands, we have to think about instruction encoding.

This is not going to be difficult - we are not trying to be Intel. For simplicity, every instruction will consist of a two-byte header, followed by its immediate arguments if there are any.

Two bytes are enough to encode all the needed information: the opcode, the types of the arguments, the size of the arguments and the direction of the operation:

typedef struct _INSTRUCTION
{
    unsigned short opCode:5;    /* Opcode value */
    unsigned short opType1:2;   /* Type of the first operand, if present */
    unsigned short opType2:2;   /* Type of the second operand, if present */
    unsigned short opSize:2;    /* Size of the operand(s) */
    unsigned short reg1:2;      /* Index of the register used as first operand */
    unsigned short reg2:2;      /* Index of the register used as second operand */
    unsigned short direction:1; /* Direction of the operation */
} INSTRUCTION;
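
The C standard leaves the order and packing of bit-fields to the compiler, so a tool that has to write these headers to a file may prefer explicit shifts and masks. The sketch below shows one possible way to do that. It assumes the fields fill the 16-bit word from the lowest bit upward, in the order they appear in the structure, which the article does not state. DecodedInstruction, pack_instruction and unpack_instruction are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, lowest bit first, following the field order of
   the INSTRUCTION structure. This layout is an assumption. */
enum {
    SHIFT_OPCODE  = 0,   /* 5 bits */
    SHIFT_OPTYPE1 = 5,   /* 2 bits */
    SHIFT_OPTYPE2 = 7,   /* 2 bits */
    SHIFT_OPSIZE  = 9,   /* 2 bits */
    SHIFT_REG1    = 11,  /* 2 bits */
    SHIFT_REG2    = 13,  /* 2 bits */
    SHIFT_DIR     = 15   /* 1 bit  */
};

/* Hypothetical helper type: one plain integer per header field. */
typedef struct {
    unsigned opCode, opType1, opType2, opSize, reg1, reg2, direction;
} DecodedInstruction;

/* Packs the fields into a single 16-bit header word. */
uint16_t pack_instruction(const DecodedInstruction *d)
{
    /* Every field must fit into the width reserved for it. */
    assert(d->opCode < 32 && d->opType1 < 4 && d->opType2 < 4);
    assert(d->opSize < 4 && d->reg1 < 4 && d->reg2 < 4 && d->direction < 2);

    return (uint16_t)((d->opCode    << SHIFT_OPCODE)  |
                      (d->opType1   << SHIFT_OPTYPE1) |
                      (d->opType2   << SHIFT_OPTYPE2) |
                      (d->opSize    << SHIFT_OPSIZE)  |
                      (d->reg1      << SHIFT_REG1)    |
                      (d->reg2      << SHIFT_REG2)    |
                      (d->direction << SHIFT_DIR));
}

/* Splits a 16-bit header word back into its fields. */
DecodedInstruction unpack_instruction(uint16_t w)
{
    DecodedInstruction d;
    d.opCode    = (w >> SHIFT_OPCODE)  & 0x1F;
    d.opType1   = (w >> SHIFT_OPTYPE1) & 0x03;
    d.opType2   = (w >> SHIFT_OPTYPE2) & 0x03;
    d.opSize    = (w >> SHIFT_OPSIZE)  & 0x03;
    d.reg1      = (w >> SHIFT_REG1)    & 0x03;
    d.reg2      = (w >> SHIFT_REG2)    & 0x03;
    d.direction = (w >> SHIFT_DIR)     & 0x01;
    return d;
}
```

Under these assumptions, a header with opCode 1, a second operand of type 2, operand size 1 and the direction bit set packs to 0x8301. The field widths add up to exactly 16 bits (5 + 2 + 2 + 2 + 2 + 2 + 1), so nothing is left unused.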

Define the following constants:

/* Operand types */
#define OP_REG    0 /* Register operand */
#define OP_IMM    1 /* Immediate operand */
#define OP_MEM    2 /* Memory reference */
#define OP_NONE   3 /* No operand (optional) */

/* Operand sizes */
#define _BYTE     0
#define _WORD     1
#define _DWORD    2

/* Operation direction */
#define DIR_LEFT  0
#define DIR_RIGHT 1

/* Instructions (OpCodes) */
#define MOV       1
#define MOVI      7
#define ADD       2
#define SUB       3
#define XOR       4
#define LOOP      5
#define RET       6
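
A decoder needs to turn the size code from the header into a byte count before it can read an operand. The helper below is only a sketch: it assumes the usual x86 naming, where a BYTE is 1 byte, a WORD is 2 bytes and a DWORD is 4 bytes, so the size in bytes is 1 shifted left by the size code. operand_size_bytes is a hypothetical name, not part of the article's code.

```c
#include <assert.h>
#include <stddef.h>

/* Maps a size code (0 = _BYTE, 1 = _WORD, 2 = _DWORD) to a byte count,
   assuming x86-style operand sizes of 1, 2 and 4 bytes. */
size_t operand_size_bytes(unsigned opSize)
{
    assert(opSize <= 2);          /* code 3 is not defined by the article */
    return (size_t)1 << opSize;
}
```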

It seems to me that there is no reason to list all the macros defining our pseudo assembly opcodes here, as it would be a waste of space.

I will just list one as an example: the definition of the MOV instruction.

[Listing 1: Constants to be used with our pseudo assembly language]

[Listing 2: Macro defining the MOV instruction]

As you can see in the code above, I've been lazy again and decided that it would be easier to specify the size of the arguments explicitly, rather than writing some extra code to identify their size automatically.

In addition, the name of each instruction tells what that specific instruction is intended to do. For example, mov_rm moves a value from memory to a register, and the letters 'r' and 'm' tell what types of arguments are in use (register, memory). In this case, moving a WORD from memory to a register would look like this:

mov_rm REG_A, address, _WORD
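
To make the encoding concrete, here is a rough sketch of the bytes such an instruction could produce. Several details are assumptions made only for this illustration: register A has index 0, the header bits are laid out lowest-first in the order of the INSTRUCTION structure, the direction bit is left at 0, the header and the address are stored little-endian, and the address is a 32-bit immediate. SZ_WORD stands in for the article's _WORD, and emit_mov_rm is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the article's constants. */
#define OP_REG  0
#define OP_MEM  2
#define MOV     1
#define SZ_WORD 1   /* stands in for the article's _WORD */

/* Hypothetical: the article does not give the register indices. */
#define REG_A   0

/* Writes an assumed encoding of "mov_rm reg, address, size" to out and
   returns the number of bytes written (2-byte header + 4-byte address). */
size_t emit_mov_rm(uint8_t *out, unsigned reg, uint32_t address, unsigned size)
{
    assert(out != NULL && reg < 4 && size < 3);

    /* Header bits, lowest first: opCode:5, opType1:2, opType2:2, opSize:2,
       reg1:2, reg2:2, direction:1. reg2 and direction are left at 0. */
    uint16_t header = (uint16_t)(MOV | (OP_REG << 5) | (OP_MEM << 7) |
                                 (size << 9) | (reg << 11));

    out[0] = (uint8_t)(header & 0xFF);           /* header, low byte   */
    out[1] = (uint8_t)(header >> 8);             /* header, high byte  */
    out[2] = (uint8_t)(address & 0xFF);          /* immediate address, */
    out[3] = (uint8_t)((address >> 8) & 0xFF);   /* little-endian      */
    out[4] = (uint8_t)((address >> 16) & 0xFF);
    out[5] = (uint8_t)((address >> 24) & 0xFF);
    return 6;
}
```

Under these assumptions, emit_mov_rm(buf, REG_A, 0x40, SZ_WORD) writes the six bytes 01 03 40 00 00 00: a two-byte header followed by the immediate address.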

The whole code section (which currently contains only one function) is shown in the listing below:

[Listing 3]

This code loads the address of the message into the B register as an immediate value, loads the length of the message from the address labelled message_len into the C register, and then iterates message_len times, applying XOR to every byte of the message. "mov_rmi" performs the same operation as "mov_rm", but the address is taken from the register specified as the second parameter.
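
For readers who find it easier to follow in C, the loop described above does roughly the following. This is a sketch and not a translation of the listing: the XOR key is a hypothetical parameter (the text here does not say which value is used), and the local variables b and c only play the roles of the B and C registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XORs every byte of the message in place. The key parameter is an
   assumption made for this illustration. */
void xor_message(uint8_t *message, size_t message_len, uint8_t key)
{
    assert(message != NULL || message_len == 0);

    uint8_t *b = message;      /* role of register B: address of the message */
    size_t   c = message_len;  /* role of register C: loop counter           */

    while (c != 0) {
        *b ^= key;             /* XOR the current byte            */
        b++;                   /* advance to the next byte        */
        c--;                   /* one iteration less to go (LOOP) */
    }
}
```

Because XOR is its own inverse, calling the function twice with the same key restores the original message, which is why one routine can both encode and decode.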

This is what the output looks like in IDA Pro:

[Listing 4: Header]

[Listing 5: Code]

[Listing 6: Data and Export sections]



Last Updated ( Wednesday, 01 February 2012 )