Friday, April 8, 2011

How the ActionScript Virtual Machine Works

How the ActionScript Virtual Machine Works:

If you are involved with AS3 development you might have heard of a virtual machine that lives inside the Flash Player or of the so-called bytecode that your code gets transformed into. But what are they exactly?

A significant part of the Flash Player is the AVM – the ActionScript Virtual Machine. When you compile your AS3 code it gets transformed into a binary instruction set, called bytecode, which is embedded into the produced SWF. As a user loads the SWF into the Flash Player, the AVM parses the bytecode and executes it step by step.

Let’s examine the process in a little bit more detail: Have a look at the following statement and imagine we want to execute it (compute the result and assign it to “foo”):

foo = 2 + 3 * 4;

From a human’s point of view this line means “Multiply 3 by 4, add 2 to the result and assign that to a variable named foo”.

On the other hand if a computer reads this line it would store it as “character f, followed by character o, followed by character o, followed by character space, followed by character equals, followed by….” And so on. At this state this information is pretty useless and additional steps need to be taken in order to turn the statement into something that a machine can execute. Basically what we need to do is to compile the source code.


The Compiler

As I noted before, the statement above is raw source code, a collection of characters that mean nothing to a computer. In order to obtain useful information we have to run the code through a few procedures. First, we should turn the stream of characters into words and second, turn the words into sentences.

The first step of turning the characters into words is done by a tokenizer which basically performs lexical analysis of the source code. It runs over the characters and groups them into series of “tokens” (and it also determines their type – identifiers, operators, constants…). After the tokenizer (it’s also called a lexer BTW) finishes its job, we get an array which can be illustrated like this:

[Identifier foo][Operator equals][Integer 2][Operator plus][Integer 3][Operator multiply][Integer 4]

This is a higher level structure that contains “words” instead of the raw characters.

The resulting tokens are fed into a parser. It performs semantic analysis of the tokens and assembles them into machine instructions. In simpler words, it constructs sentences out of the words (tokens) and makes sense out of them (i.e. compiles instructions out of them). For instance if the parser is given a statement 2 + 3; i++; as tokens the parser must first separate the tokens into “sentences” (2 + 3 and i++) and then actually understand them (the first is add operation and second is an increment). After we understand the instruction we can actually compile instructions to the machine out of the input.

After the parsing the tokens of our string we get the following instructions:

push 2
push 3
push 4
multiply
add
assign "foo"

These are instructions that a computer can execute. Compress it to a binary format and you’ve got bytecode. Bytecode is a list of instructions that a machine is very good at processing and when processed in order yield the desired results.


The Interpreter

After we compiled the source code to bytecode we can execute it with a virtual machine. The VM is software that executes the bytecode one instruction at a time, so let’s just walk through the interpretation of our statement:

  1. push 2 — The command is to push the number 2 into the stack. The VM maintains a stack during execution which the commands can operate on, that is to push into and pop off values (http://en.wikipedia.org/wiki/Stack_(data_structure)). Currently the stack looks like this : [2]
  2. push 3 — push another integer into the stack. Now it looks like [2, 3];
  3. push 4 — push yet another integer into the stack. Now it looks like [2, 3, 4];
  4. multiply — this command pops 2 values off the stack, multiplies them and pushes the result back to the stack. It pops the values 3 and 4 (which are currently on the top of the stack), multiplies them and pushes the resulting 12 to the stack. It affects the stack to look like [2, 12];
  5. add — you’ve probably guessed it: The command pop 12 and 2 off the stack, adds them and pushes the resulting 14 into the stack. Now only 14 remains inside the stack.
  6. assign 'foo' — This command pops a value off the stack and assign it to a variable named foo. So now the variable foo contains value of 14 and the stack is empty.

That’s it! This is an example of an extremely simple statement executed of an extremely simple virtual machine. Let’s examine a slightly more complicated example: [...]

No comments:

Post a Comment