Part of an ongoing personal project. Here are all the posts on the topic. The first phase of writing this decoder application has given me some insights into the complexity of how the mind works when solving puzzles. Also, I’ve improved my jQuery skills, which was partly the point.
Extremely Loud and Incredibly Close was the first fiction book I had read in about four years. It left quite an impression on me, and it contained some hidden gems, one of which is a message coded in digits. The main character’s grandfather, Thomas Schell I is a survivor of the bombing of Dresden during World War II. He lost his first wife in the bombing, and one of his only remaining connections is his wife’s sister. The two of them have one child and their relationship is strained. She emigrates to New York, and he remains in Germany for a few decades. Thomas is mute after the bombing, and he communicates mainly by writing in a notebook. After he learns of the September 11 attacks, in which his son died, he returns to New York to try to reconnect with his estranged family. He has the following phone conversation with the mother of his deceased child.
Out of my own curiosity, I set out to decode the message that Thomas typed. The beginning was easy enough to discover by making a table of the corresponding letters to digits on a phone keypad. “4, 3, 5, 5, 6” : “Hello” . “4, 7, 4, 8, 7, 3, 2, 5, 5, 9, 9, 6, 8” : “Is it really you?”. The responses from the grandmother help quite a bit. She says, “Is this a joke?” He responds, “This is not a joke!”
The rest is a bit more difficult. Others had the same curiosity to decipher the message, but this is where I thought it would be a good job for a computer to take on.
With PHP, I defined an array to associate each phone digit with its corresponding possible letters.
I put the raw digit message in a text file.
And I output the digit message with its possible letter options below in the index file.
Decoding: Part 1
My next step was to create an algorithm to decode the message.
My first attempt was basically:
- Combine ten digits at a time and create an array with all their possible combinations of letters.
- Compare each item in that array with a local dictionary file to see if that combination of words is in the dictionary
- If none of the attempted 10-letter words are in the dictionary, pop off the last digit, and see if there’s a matched 9-letter word.
- Rinse and repeat
This initial attempt of relying on an algorithm to take a first pass at the message resulted in a lot of bad guesses – much worse than I thought it would. When I ran the algorithm to determine only the first two words, I got the following:
which I thought was delightful since grits are often the first food that I put into my belly in the morning, but I knew it wasn’t the right interpretation of the message. Extending the same algorithm to run on the first 2.5 lines (I hadn’t typed in the entire digit message at this point) got some more gibberish:
All of these were technically words, but from my initial manual decoding, I knew that the first few words were “Hello. Is it really you?”, so clearly this initial guess was pretty far off. I think it picked up the actual message midstream though with “My name” at the end of the first line.
At this point, I learned a few things:
- This algorithm needs to be smarter. Guessing the longest possible word is not a very reliable way to go. I think the way that humans decode things like this is to use a combination of prioritizing common words and word collocation
- I need to insert spaces between words to make the output easier to read
- Manual correction is going to be more important than I initially thought, so it’s time to do some design work.
Next up, I’ll figure out how to better accommodate the manual correction and allow the user to run the guessing algorithm again from a specified point.