Alright, now that we have some sort of a "compiler", we can start working on the VM itself. First of all, let us define a structure that represents our virtual CPU:
typedef struct _VCPU
{
   unsigned int   registers[4]; /* Four registers */
   unsigned int  *stackBase;    /* Pointer to the allocated stack */
   unsigned int  *stackPtr;     /* Pointer to the current position in stack */
   unsigned int   ip;           /* Instruction pointer */
   unsigned char *base;         /* Pointer to the buffer where our pseudo executable is loaded to */
} VCPU;
registers - general purpose registers. There is no need for any additional register in this VM's CPU;
stackBase - pointer to the beginning of the allocated region which we use as stack for our VM;
stackPtr - our stack pointer. Points to the current position in the stack;
ip - instruction pointer. Points to the next instruction to be executed. It must never point outside the buffer containing our pseudo executable;
base - pointer to the buffer which contains our executable. You may say that this is the memory of our VM.
In addition, you should implement at least the following functions:
allocate/free virtual CPU
load the pseudo executable into the VM's memory and set up the stack
a function to retrieve either a file offset or a normal pointer to an object exported by the pseudo executable
a function to set the instruction pointer (although this may be done by directly accessing the ip field of the virtual CPU)
a function that would run our pseudo code.
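As a rough illustration of the first two items and the instruction pointer setter, here is a minimal sketch. It is not the code from my VM. The function names (vcpu_alloc, vcpu_free, vcpu_load, vcpu_set_ip), the VM_STACK_WORDS constant, the downward-growing stack and the image_size parameter are all assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

#define VM_STACK_WORDS 256u /* assumed stack size, not taken from the article */

typedef struct _VCPU
{
   unsigned int   registers[4];
   unsigned int  *stackBase;
   unsigned int  *stackPtr;
   unsigned int   ip;
   unsigned char *base;
} VCPU;

/* Allocate a zeroed virtual CPU; returns NULL on failure. */
VCPU *vcpu_alloc(void)
{
   return calloc(1, sizeof(VCPU));
}

/* Release the CPU together with its stack and its image buffer. */
void vcpu_free(VCPU *cpu)
{
   if (cpu == NULL)
      return;
   free(cpu->stackBase);
   free(cpu->base);
   free(cpu);
}

/* Copy a pseudo executable image into VM memory and set up the stack.
   Returns 0 on success, -1 on failure. */
int vcpu_load(VCPU *cpu, const unsigned char *image, size_t size)
{
   unsigned char *mem;
   unsigned int  *stack;

   if (cpu == NULL || image == NULL || size == 0)
      return -1;

   mem   = malloc(size);
   stack = calloc(VM_STACK_WORDS, sizeof *stack);
   if (mem == NULL || stack == NULL)
   {
      free(mem);
      free(stack);
      return -1;
   }

   memcpy(mem, image, size);

   /* Drop anything loaded earlier before installing the new buffers. */
   free(cpu->base);
   free(cpu->stackBase);

   cpu->base      = mem;
   cpu->stackBase = stack;
   cpu->stackPtr  = stack + VM_STACK_WORDS; /* assumption: stack grows downward */
   cpu->ip        = 0;
   return 0;
}

/* Set ip, refusing offsets that fall outside the loaded image.
   The caller passes the image size, since VCPU does not store it. */
int vcpu_set_ip(VCPU *cpu, unsigned int offset, size_t image_size)
{
   if (cpu == NULL || offset >= image_size)
      return -1;
   cpu->ip = offset;
   return 0;
}
```

Every function that can fail reports it through its return value, so the caller can stop before running code on a half-initialized CPU. The export lookup and the run loop depend on the pseudo executable format and the instruction set, so they are left out of this sketch.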
In my case, the final source looks like this:
[Image: screenshot of the final source]
I decided not to include the VM's code here, as you should be able to write it yourself if the subject interests you. Note that the code in this article does not check return values; your own implementation should.
Summary
Although this article describes a trivial virtual machine which is only able to encode/decode a fixed length buffer, the concept itself may serve you well in software/data protection, as breaking into a VM is generally much harder than cracking native code: the attacker first has to work out the VM's instruction set before the protected logic can be read.
One more thing to add.
Our design allows us to call procedures provided by the pseudo executable, but there are several ways to allow the pseudo executable to "talk to us". The simplest (as it seems to me) is to implement interrupts.
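One possible way to arrange this, sketched under several assumptions: the host keeps a table of handler functions, and the run loop calls into that table whenever it decodes an interrupt instruction. The table type VM_INT_TABLE, the functions vm_set_handler and vm_raise, the vector count and the example handler are hypothetical names invented for this sketch. The instruction set described in this article has no interrupt opcode, so you would have to add one.

```c
#include <stddef.h>

#define VM_INT_COUNT 16u /* hypothetical number of interrupt vectors */

typedef struct _VCPU
{
   unsigned int   registers[4];
   unsigned int  *stackBase;
   unsigned int  *stackPtr;
   unsigned int   ip;
   unsigned char *base;
} VCPU;

/* A host-side handler receives the CPU so it can read and write registers. */
typedef int (*VM_INT_HANDLER)(VCPU *cpu);

typedef struct _VM_INT_TABLE
{
   VM_INT_HANDLER handlers[VM_INT_COUNT];
} VM_INT_TABLE;

/* Register a handler for interrupt number n. Returns 0 on success, -1 on failure. */
int vm_set_handler(VM_INT_TABLE *table, unsigned int n, VM_INT_HANDLER handler)
{
   if (table == NULL || n >= VM_INT_COUNT)
      return -1;
   table->handlers[n] = handler;
   return 0;
}

/* Meant to be called by the run loop when it decodes the (hypothetical)
   interrupt instruction. Returns the handler's result, or -1 if the
   interrupt number is out of range or has no handler installed. */
int vm_raise(const VM_INT_TABLE *table, VCPU *cpu, unsigned int n)
{
   if (table == NULL || cpu == NULL || n >= VM_INT_COUNT)
      return -1;
   if (table->handlers[n] == NULL)
      return -1;
   return table->handlers[n](cpu);
}

/* Example handler: the pseudo code passes a value in register 0 and
   gets the doubled value back in the same register. */
int vm_example_double_r0(VCPU *cpu)
{
   cpu->registers[0] *= 2u;
   return 0;
}
```

With this layout the pseudo executable passes arguments and receives results through the registers, much like a software interrupt on a real CPU, and the host decides which services it is willing to expose.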