Last Thursday I saw an article about a website set up by GCHQ (Government Communications Headquarters). The website was a recruitment drive, but not your average recruitment drive. In order to get to the recruitment page you had to crack a code. Well, if that isn’t a red rag to your average programmer bull I don’t know what is. I thought I’d write a small article on how I approached the cracking.
Upon visiting the site you are presented with the screen on my left. The picture consists of a small grid of numbers, a time limit and a box where you can enter a keyword (presumably found by decoding the numbers). Looking at the numbers as a programmer, it seemed immediately obvious that they were base 16 numbers, or as we refer to them, hexadecimal. I decided to make the assumption that absolutely nothing on this web page was either an accident or a coincidence. Why were the numbers in hex? Well, you could argue that by recognising the numbers are in hex you have cracked the first code; after all, if you are a programmer hex is second nature to you. It seems this test is targeting a certain demographic, one that so far I fit into.
Static analysis of the numbers didn’t yield any obvious answers. A few patterns were notable: 41414141 and 42424242 are ASCII ("AAAA" and "BBBB"), so could there be more ASCII? I would need to look at these numbers in an editor. Remember I said that nothing on this page was left to chance? This isn’t really part of the code cracking, however this is where they give you a massive clue. The grid of numbers is a picture, a PNG. Now that’s weird, why wouldn’t they be text? Are you seriously going to make me type these numbers in? The answer to this is yes, and the only reason I can think of is to re-awaken a dormant memory of owning a ZX Spectrum. Why is this important? Well, any Spectrum owner will have at some point typed in pokes for infinite lives from ZZap!
The code would be printed in the magazine and you would type it in before loading the program. As a young boy I simply could not get my head around how this worked, and it seemed somehow like magic. Here’s an example:
chuckie egg 2 (P 'n' C)
loader (or use Multiface to enter POKEs)
100 FOR x=23296 TO 23308: READ a: POKE x,a: NEXT x: RANDOMIZE USR 23296
110 DATA 55,62,255,221,33,0,64,17,0,192,195,86,5
Once I’d typed in all the numbers from the grid I tried a few basic tricks: looking for strings, reversing the byte order, XORing various parts of the message. Meanwhile, in the back of my mind I was thinking about things I hadn’t thought of for 25 years, thinking about typing in all those DATA statements on the Spectrum. This voice got louder and louder until I actually listened to it. Yes of course, the numbers are a program. I pointed the disassembler at the data and sure enough it looked like code, but does it run? And if so, what does it do?
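The basic tricks mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling I actually used: the function names are made up for this example, and SAMPLE is invented data rather than the real grid.

```python
import re

# Invented sample data for illustration only, not the real grid.
SAMPLE = """
de ad be ef 41 41 41 41
42 42 42 42 00 01
"""

def parse_hex_grid(text: str) -> bytes:
    """Turn a whitespace-separated grid of hex byte pairs into raw bytes."""
    return bytes.fromhex("".join(text.split()))

def printable_runs(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)

def swap_dwords(data: bytes) -> bytes:
    """Reverse the byte order inside each 4-byte group (an endianness swap)."""
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))

def xor_with(data: bytes, key: bytes) -> bytes:
    """XOR data against a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

if __name__ == "__main__":
    grid = parse_hex_grid(SAMPLE)
    print(printable_runs(grid))
    print(swap_dwords(grid).hex(" "))
    print(xor_with(grid, b"\xff").hex(" "))
```

None of these tricks cracked the grid on their own, but they are cheap to try and quickly rule out the simplest encodings.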
The code in the disassembler looks like this. Fortunately I used to write a lot of assembler on my Amiga (68000), so I can read assembly language pretty well. The first statement in the program is both weird and, later on, important. There’s a JMP statement over the first 4 bytes, weird eh? Remember the mantra: nothing is a coincidence. The importance of this JMP was not obvious, so I carried on working out what the program did. The program operates in two stages. During the first stage the program creates a dictionary, a cypher lookup table, encoded with the phrase DEADBEEF (another thing to remember). Once the dictionary is created the program makes a fake call from the end of memory; this fake call allows the popping of the address just after the call. It uses this popped address as the location of the message to be decoded. The contents at this address are checked to see if they match 41414141 and then 42424242. The program passes the first test but then fails on the second test. What? Did I miss something? I have the cypher key, but no message? I have to say this stumped me for a little while, as I was convinced the message was inside the code. After pointing the decoding at various bits of memory and ending up with nothing, I decided to have a think.
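To make the two stages concrete, here is a minimal Python sketch. It assumes the scheme is RC4-style: a 256-entry table shuffled by the key, then a keystream XORed with the message. That is my reading of what the disassembly does, not a line-by-line translation, and the real routine may differ in details such as the byte order of the DEADBEEF key. The names build_table and decode are made up for this example.

```python
def build_table(key: bytes) -> list[int]:
    """Stage one: shuffle a 256-entry lookup table using the key (RC4-style)."""
    table = list(range(256))
    j = 0
    for i in range(256):
        j = (j + table[i] + key[i % len(key)]) & 0xFF
        table[i], table[j] = table[j], table[i]
    return table

def decode(table: list[int], message: bytes) -> bytes:
    """Stage two: XOR each message byte with a keystream byte from the table."""
    s = table.copy()  # work on a copy so the caller's table is left untouched
    i = j = 0
    out = bytearray()
    for byte in message:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

# Assumption: the key bytes in this order; the real code may store them reversed.
KEY = bytes.fromhex("deadbeef")

if __name__ == "__main__":
    table = build_table(KEY)
    secret = decode(table, b"example message")
    # XOR with the same keystream is its own inverse, so decoding twice round-trips.
    print(decode(table, secret))
```

Because the operation is a plain XOR against a keystream, the same function both encodes and decodes, which is why having the key without the message gets you nowhere.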
I went back to the website and started looking through the HTML, but nothing obvious stood out. I remembered doing a similar hack before where I found some information in a picture’s alpha channel. So I loaded up the image of the numbers and looked at the alpha channel: nothing. Then I thought, I’ve also seen chunks embedded in files (like a spare IFF chunk). I loaded the image into a hex editor, scrolled up and down, and nothing really stood out. I stepped back from the monitor and stared out the window for a little while. As my wandering gaze returned to the monitor I noticed that PNGs have a comments section. I never knew this, and it’s something I’d never used. The reason it didn’t stand out was because the comment looked close to binary, however on further inspection it was actually ASCII. Could this be the message? Maybe, I thought!
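A hex editor works, but walking the PNG chunks in code makes a hidden comment much harder to miss. The sketch below relies only on the standard PNG layout (an 8-byte signature, then chunks of length, type, data and CRC) and reads plain tEXt chunks; compressed zTXt or international iTXt chunks would need extra handling. The function names are made up for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_chunks(png: bytes):
    """Yield (chunk_type, data) for every chunk in a PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        if pos + 12 > len(png):
            raise ValueError("truncated chunk header")
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        end = pos + 12 + length
        if end > len(png):
            raise ValueError("truncated chunk data")
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png[end - 4:end])
        # The CRC covers the chunk type and the data, but not the length field.
        if zlib.crc32(ctype + data) != crc:
            raise ValueError("bad CRC in chunk %r" % ctype)
        yield ctype.decode("ascii"), data
        pos = end

def text_comments(png: bytes) -> dict[str, str]:
    """Collect keyword -> text pairs from uncompressed tEXt chunks."""
    comments = {}
    for ctype, data in iter_chunks(png):
        if ctype == "tEXt":
            keyword, _, text = data.partition(b"\x00")
            comments[keyword.decode("latin-1")] = text.decode("latin-1")
    return comments
```

Reading the image with open(path, "rb").read() and passing the bytes to text_comments would list any plain-text comments; printing the chunk types from iter_chunks shows whether there is anything beyond the usual IHDR, IDAT and IEND.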
I appended the message to the end of the decoding program and excitedly ran it, only to see it bomb out. Hmmm, is it possible this isn’t the message? Well, let’s step back and think. The encoded message would probably not be completely within the readable ASCII character range; after all, our dictionary is 256 bytes in length, and XORing it with the source message produces all kinds of byte values. Yet the message in front of me is completely in the readable ASCII range. If at this point you have ever done any PHP coding, or worked with web cookies, you will be screaming the answer. The trailing == should have really given the game away, however it’s all about familiarity and exposure to technologies.
The question is, how do we represent all the byte values from 0 to 255 in a readable text form, as would be required for a comment field? The answer is of course Base64 encoding. Again I felt the pace quicken. I found a website that decoded Base64 into hex (very handy). When decoded, the message began with 42424242, followed by a length. This was 100% the message, as that’s the signature the code was looking for.
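The same decoding step can be done locally instead of through a website. This is a small sketch under two assumptions of mine: that the length field is a 4-byte little-endian integer (the code is x86, so that seemed the natural guess), and that the payload follows straight after it. The sample comment is built in the code purely for illustration and is not the real one.

```python
import base64
import struct

def parse_payload(comment: str, magic: bytes = b"BBBB") -> bytes:
    """Base64-decode a comment, check the 42424242 signature, return the payload."""
    raw = base64.b64decode(comment)
    if raw[:4] != magic:
        raise ValueError("signature mismatch: %s" % raw[:4].hex())
    # Assumption: a 4-byte little-endian length follows the signature.
    (length,) = struct.unpack("<I", raw[4:8])
    return raw[8:8 + length]

# Invented sample, not the real comment: signature, length 5, then 5 payload bytes.
SAMPLE_COMMENT = base64.b64encode(b"BBBB" + struct.pack("<I", 5) + b"hello").decode("ascii")

if __name__ == "__main__":
    print(SAMPLE_COMMENT)
    print(base64.b64decode(SAMPLE_COMMENT).hex())
    print(parse_payload(SAMPLE_COMMENT))
```

The sample is 13 bytes long, which is one byte over a multiple of three, so its Base64 form ends in the same tell-tale == padding.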
As I ran the code I felt a rush of adrenaline (come on, for programmers this is exciting stuff!). I had by now become quite familiar with the decoding program, so I inserted a breakpoint after the decoding section. I knew that EDI pointed to the decode buffer; below is what I saw in memory.
This was pretty damn cool: from nothing I had unearthed the answer. Hang on though, that’s not the code I need to enter into the box, that’s a web address. What’s going on? If you go to the website (assuming it’s still up) you are led on to the second part of the puzzle. It seems that the grid of numbers was only the first of three parts. In my next post I’ll talk about how I solved the next part, the VM!
The listing for my decoding program is here.