Can you crack it? … part II

In my previous post I talked about the GCHQ recruitment crack. At the end of that post I discovered I had only cracked the first of three parts. In this post I’ll talk about how I tackled the second part (which turned out to be my favourite). Throughout the series I want to draw your attention to details that seem superfluous or insignificant; keep them in mind.

The solution to the first part revealed a web address; let’s take a look. In case the website is no longer available, the contents can also be viewed here.

The web page is a piece of JavaScript code. The comments detail a set of rules that define a virtual machine. There is also a block of data, which is byte code. Looking at the function that executes the byte code, it is apparent that somebody has forgotten to write the virtual machine!

You may wonder how I worked this out so quickly; it’s plain and simple experience. If you write enough computer games, sooner or later someone will say ‘do you know what would be great? A scripting language!’. Before you can stop this person and their special brand of crazy, they’ve sold the idea to the powers that be. Before you know it you are splicing a version of Lua into the codebase or, worse still, hand-cranking your own scripting language. Whilst this is a painful experience, it does teach you how scripting languages work. Most scripting languages take human-readable text and convert (or compile) it to a form known as byte code, which is easier (and faster) for the computer to run. The byte code is usually made from a relatively small set of instructions.
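To make the idea of byte code concrete, here is a minimal sketch in C++. The opcodes `OP_PUSH`, `OP_ADD`, `OP_MUL` and `OP_HALT` and the function `runStackCode` are hypothetical, invented purely for illustration: this is a toy stack machine, not the challenge’s instruction set or any real scripting language’s byte code. The expression `2 + 3 * 4` compiles to nine bytes, which the interpreter executes one instruction at a time.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// A toy stack-based byte code, invented for illustration only.
enum StackOp : std::uint8_t { OP_PUSH = 0, OP_ADD = 1, OP_MUL = 2, OP_HALT = 3 };

// Executes the byte code and returns the value on top of the stack at OP_HALT.
int runStackCode(const std::vector<std::uint8_t>& code) {
    std::vector<int> stack;
    auto pop = [&stack] {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        const int top = stack.back();
        stack.pop_back();
        return top;
    };
    std::size_t pc = 0;  // index of the next byte to execute
    while (pc < code.size()) {
        switch (code[pc++]) {
        case OP_PUSH:  // the following byte is an immediate operand
            stack.push_back(code.at(pc++));
            break;
        case OP_ADD: {
            const int b = pop();
            const int a = pop();
            stack.push_back(a + b);
            break;
        }
        case OP_MUL: {
            const int b = pop();
            const int a = pop();
            stack.push_back(a * b);
            break;
        }
        case OP_HALT:
            return pop();
        default:
            throw std::runtime_error("unknown opcode");
        }
    }
    throw std::runtime_error("program ended without OP_HALT");
}
```

Running `runStackCode({OP_PUSH, 2, OP_PUSH, 3, OP_PUSH, 4, OP_MUL, OP_ADD, OP_HALT})` returns 14. A real compiler would emit this sequence from the source text; here it is written by hand.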

The great thing about byte code is that, as long as you know the rules, anyone can write an implementation in pretty much any language (in case the penny hasn’t dropped, this is how Java works). The code is presented in JavaScript, but there is no real reason to implement the VM in JavaScript. I took the view that JavaScript has a pretty poor debugging environment compared to C++ (especially when using Visual Studio), so I decided to write the VM in C++.

When implementing a VM you are writing a CPU emulator, so a grounding in how CPUs work is advantageous. If you got this far, that’s probably not a problem, as you will have had to deal with assembly language in the first part. CPUs are made up of several parts: registers, an instruction pointer and CPU flags (your Intel CPU is a far more complicated version of this design). The CPU variable definition in the listing gives us a good idea of what we need.

C++ gives us more control than JavaScript over how data is laid out in memory, and this allows us to do some nice things. Rather than shifting and masking the data, I created a bit-packed structure which represents an instruction (visible on the right). I can map this structure over memory to get a representation of the next instruction. Registers are represented by an array indexed 0-6, and a single byte is sufficient to represent the CPU flags.

The only other detail is the instruction pointer. The architecture stipulates that the address space is segmented (16 bit). Segmentation was originally a cost-cutting measure: it let a CPU with narrow registers still reach a larger address space than those registers alone could express. What this means in our implementation is that register 5 (the code segment) provides an offset to the instruction pointer, and register 6 (the data segment) provides an offset to the memory that we access.
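The CPU state and instruction decoding described above might look something like the following sketch. It assumes a first-byte layout of a 3-bit opcode, a 1-bit mod flag and a 4-bit register operand, 8-bit registers, and an 8086-style translation in which the segment is multiplied by 16; these are assumptions, so check them against the comments in the listing. The names `Instruction`, `decode`, `Cpu`, `codeAddress` and `dataAddress` are hypothetical. The sketch decodes with shifts and masks rather than a bit-packed structure, because C++ leaves the allocation order of bit-fields to the compiler.

```cpp
#include <array>
#include <cstdint>

// Hypothetical first-byte layout (check the widths against the brief):
// bits 7-5 opcode, bit 4 mod, bits 3-0 operand1 (a register index).
struct Instruction {
    std::uint8_t opcode;    // which operation to perform
    bool mod;               // how to interpret the optional operand2 byte
    std::uint8_t operand1;  // register index
};

// Shift-and-mask decode. A bit-field struct mapped over memory expresses the
// same layout more readably, but bit-field allocation order within a byte is
// implementation-defined, so this version is the portable one.
constexpr Instruction decode(std::uint8_t byte) {
    return Instruction{static_cast<std::uint8_t>((byte >> 5) & 0x07),
                       ((byte >> 4) & 0x01) != 0,
                       static_cast<std::uint8_t>(byte & 0x0F)};
}

// CPU state, using the post's register numbering: index 5 is the code
// segment and index 6 the data segment. 8-bit registers are an assumption.
struct Cpu {
    std::array<std::uint8_t, 7> r{};  // registers 0-6
    std::uint8_t ip = 0;              // instruction pointer, an offset into cs
    std::uint8_t fl = 0;              // flags: a single byte is enough
};

constexpr int kCodeSegment = 5;
constexpr int kDataSegment = 6;
// Assumption modelled on the 8086: physical address = segment * 16 + offset.
constexpr unsigned kSegmentShift = 4;

// Physical address of the next instruction (cs:ip).
inline unsigned codeAddress(const Cpu& cpu) {
    return (static_cast<unsigned>(cpu.r[kCodeSegment]) << kSegmentShift) + cpu.ip;
}

// Physical address of a data access (ds:offset).
inline unsigned dataAddress(const Cpu& cpu, std::uint8_t offset) {
    return (static_cast<unsigned>(cpu.r[kDataSegment]) << kSegmentShift) + offset;
}
```

For example, with the code segment register set to 0x10 and the instruction pointer at 0x04, `codeAddress` yields 0x104.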

Now that we have a CPU defined, we need to implement all of the instructions our VM supports, which are shown in the image on the left. Implementing each of them as a case in a switch statement is sufficient to form the basis of our execution unit. Each instruction can perform slightly different operations depending on the mod bit; the different operations are detailed in the brief. Writing each instruction didn’t take long, as by themselves they don’t do much, and within about 30 minutes I had completed my VM. So now what? What does the program do? Looking at the instruction set, specifically at xor, it seems very likely that the program decodes some kind of message. This is where the decision to write the VM in C++ paid off. The program didn’t work first time: I had a few typos and had not properly implemented some of the instructions. These problems were fairly easy to pick up in the Visual Studio debugger, and about 20 minutes later I had fixed all the bugs. So I put a breakpoint on the hlt instruction (the end of the program) and ran the code.
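A switch-based execution unit in the same spirit might look like the sketch below. The opcode numbering (`JMP` through `HLT`), the fixed two-byte format, the multiply-by-16 segment translation and the names `Vm`, `run` and `decodeDemo` are all assumptions made for illustration; the real brief defines its own instruction lengths and mod semantics, which this simplifies. `decodeDemo` runs a hand-assembled loop that xors three bytes in the data segment with a key, the kind of job the xor instruction suggests the real program performs.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical encoding for illustration; take the real table from the brief.
// Byte 1: bits 7-5 opcode, bit 4 mod, bits 3-0 operand1 (a register index).
// Simplification: every instruction except HLT has an operand2 byte, which is
// an immediate when mod is 1 and a register index when mod is 0.
enum Opcode : std::uint8_t { JMP, MOVR, MOVM, ADD, XOR, CMP, JMPE, HLT };

struct Vm {
    std::array<std::uint8_t, 7> r{};  // r[5] = code segment, r[6] = data segment
    std::uint8_t ip = 0;              // offset of the next instruction within cs
    std::uint8_t fl = 0;              // set by CMP: 0 equal, 0xFF less, 1 greater
    std::vector<std::uint8_t> mem = std::vector<std::uint8_t>(0x1100);
};

// Assumed 8086-style translation: physical address = segment * 16 + offset.
inline std::size_t physical(std::uint8_t segment, std::uint8_t offset) {
    return (static_cast<std::size_t>(segment) << 4) + offset;
}

// Fetch, decode and execute until HLT, with a step limit as a safety net.
void run(Vm& vm, int maxSteps = 100000) {
    for (int step = 0; step < maxSteps; ++step) {
        const std::size_t at = physical(vm.r[5], vm.ip);
        const std::uint8_t byte = vm.mem.at(at);
        const std::uint8_t opcode = byte >> 5;
        const bool mod = ((byte >> 4) & 1) != 0;
        if (opcode == HLT) return;
        std::uint8_t& r1 = vm.r.at(byte & 0x0F);
        const std::uint8_t op2 = vm.mem.at(at + 1);
        vm.ip = static_cast<std::uint8_t>(vm.ip + 2);
        // operand2 as a value: an immediate if mod is set, else a register.
        auto value = [&]() -> std::uint8_t { return mod ? op2 : vm.r.at(op2); };
        // Jump destination: the immediate if mod is set, else the value of r1.
        auto target = [&]() -> std::uint8_t { return mod ? op2 : r1; };
        switch (opcode) {
        case JMP:  vm.ip = target(); break;
        case MOVR: r1 = value(); break;
        case MOVM:  // mod 0: r1 = [ds:r2]; mod 1: [ds:r1] = r2
            if (mod) vm.mem.at(physical(vm.r[6], r1)) = vm.r.at(op2);
            else     r1 = vm.mem.at(physical(vm.r[6], vm.r.at(op2)));
            break;
        case ADD:  r1 = static_cast<std::uint8_t>(r1 + value()); break;  // wraps at 256
        case XOR:  r1 = static_cast<std::uint8_t>(r1 ^ value()); break;
        case CMP: {
            const std::uint8_t v = value();
            vm.fl = r1 == v ? 0 : (r1 < v ? 0xFF : 1);
            break;
        }
        case JMPE: if (vm.fl == 0) vm.ip = target(); break;
        default: throw std::runtime_error("unknown opcode");
        }
    }
    throw std::runtime_error("step limit reached without HLT");
}

// Usage: a hand-assembled loop that xors three bytes at ds:0 with the key 0x2A.
std::string decodeDemo() {
    Vm vm;
    vm.r[6] = 0x10;  // data segment at physical 0x100; cs stays 0
    const std::uint8_t program[] = {
        0x30, 0x00,  //  0: MOVR r0, 0          pointer into the data segment
        0x32, 0x03,  //  2: MOVR r2, 3          number of bytes to decode
        0x41, 0x00,  //  4: MOVM r1, [ds:r0]    load a byte
        0x91, 0x2A,  //  6: XOR  r1, 0x2A       decode it
        0x50, 0x01,  //  8: MOVM [ds:r0], r1    store it back
        0x70, 0x01,  // 10: ADD  r0, 1
        0xA0, 0x02,  // 12: CMP  r0, r2
        0xD0, 0x12,  // 14: JMPE 18             done?
        0x10, 0x04,  // 16: JMP  4
        0xE0,        // 18: HLT
    };
    const std::uint8_t encoded[] = {0x6D, 0x6F, 0x7E};  // "GET" xored with 0x2A
    std::copy(std::begin(program), std::end(program), vm.mem.begin());
    std::copy(std::begin(encoded), std::end(encoded), vm.mem.begin() + 0x100);
    run(vm);
    return std::string(vm.mem.begin() + 0x100, vm.mem.begin() + 0x103);
}
```

`decodeDemo()` returns "GET": the encoded bytes 0x6D, 0x6F and 0x7E are ‘G’, ‘E’ and ‘T’ xored with 0x2A. As in the post, the result is simply left in memory when HLT is reached.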

The breakpoint hit, but what was the result? The VM isn’t sophisticated enough to display the results of the program, so the results (the decoded message) would be lurking somewhere in memory. The program has a pretty small address space, so using the memory window in Visual Studio I was able to locate the results. Behold!
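Without Visual Studio, a few lines of C++ can do a similar job to its memory window: scan the VM’s memory for runs of printable ASCII, much as the Unix `strings` tool does. `findStrings` and its `minLength` parameter are hypothetical names for this sketch, which assumes the VM’s memory is held in a `std::vector<std::uint8_t>`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns every run of at least minLength printable ASCII bytes in memory,
// so a decoded message can be spotted without scrolling through a hex dump.
std::vector<std::string> findStrings(const std::vector<std::uint8_t>& memory,
                                     std::size_t minLength = 4) {
    std::vector<std::string> found;
    std::string current;
    for (std::uint8_t byte : memory) {
        if (byte >= 0x20 && byte < 0x7F) {  // printable ASCII, space included
            current.push_back(static_cast<char>(byte));
        } else {
            if (current.size() >= minLength) found.push_back(current);
            current.clear();
        }
    }
    if (current.size() >= minLength) found.push_back(current);  // run at the very end
    return found;
}
```

The minimum length filters out short runs of bytes that merely happen to fall in the printable range.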

There in memory was the decoded message ‘GET /da75370fe15c4148bd4ceec861fbdaa5.exe’. It had worked! Once again, completing the second part made me reflect on the even smaller demographic these challenges target. Whilst this is by no means true of all graduates, most of today’s graduates are unfamiliar with CPU architecture and low-level programming. I don’t draw any conclusion from this; I only point out that it’s interesting.

Remember the superfluous detail we were keeping an eye out for? Did you spot it? Check out the listing: do you see anything there that we didn’t use? That’s right, the firmware. It doesn’t seem to be important to the VM or to the decoding of the message, so why is it there? All will be revealed in the final article, ‘bringing it all together’.

You can check out the full listing of my VM here.