Digit Decoder

Thomas Schell Sr. in the Warner Bros. movie, Extremely Loud and Incredibly Close (2012)

This post is part of an ongoing personal project. Here are all the posts on the topic. The first phase of writing this decoder application has given me some insight into how complex the mind's work is when it solves puzzles. I’ve also improved my jQuery skills, which was partly the point.

Here’s the GitHub repository and the working prototype.

Background

Extremely Loud and Incredibly Close was the first fiction book I had read in about four years. It left quite an impression on me, and it contained some hidden gems, one of which is a message coded in digits. The main character’s grandfather, Thomas Schell Sr., is a survivor of the bombing of Dresden during World War II. He lost his first wife in the bombing, and one of his only remaining connections is his wife’s sister. The two of them have one child, and their relationship is strained. She emigrates to New York, and he remains in Germany for a few decades. Thomas is mute after the bombing, and he communicates mainly by writing in a notebook. After he learns of the September 11 attacks, in which his son died, he returns to New York to try to reconnect with his estranged family. He has the following phone conversation with the mother of his deceased child.

Initial Steps

Out of my own curiosity, I set out to decode the message that Thomas typed. The beginning was easy enough to work out by making a table that maps each digit on a phone keypad to its letters. “4, 3, 5, 5, 6”: “Hello”. “4, 7, 4, 8, 7, 3, 2, 5, 5, 9, 9, 6, 8”: “Is it really you?”. The responses from the grandmother help quite a bit. She says, “Is this a joke?” He responds, “This is not a joke!”
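A hand-decoded guess can be double-checked by going the other way, from letters back to digits. Here is a minimal sketch in JavaScript, assuming the standard keypad layout. It is not code from the repository, and `toDigits` is a made-up helper name.

```javascript
// Letters printed on each key of a standard phone keypad (keys 2 through 9).
const KEYPAD = {
  2: "abc", 3: "def", 4: "ghi", 5: "jkl",
  6: "mno", 7: "pqrs", 8: "tuv", 9: "wxyz",
};

// Invert the keypad into a letter -> digit lookup.
const LETTER_TO_DIGIT = {};
for (const [digit, letters] of Object.entries(KEYPAD)) {
  for (const letter of letters) {
    LETTER_TO_DIGIT[letter] = digit;
  }
}

// Hypothetical helper: turn a guessed phrase into its digit string,
// skipping spaces, punctuation, and anything else that is not a letter.
function toDigits(phrase) {
  return [...phrase.toLowerCase()]
    .map((ch) => LETTER_TO_DIGIT[ch] ?? "")
    .join("");
}

console.log(toDigits("Hello"));             // 43556
console.log(toDigits("Is it really you?")); // 4748732559968
```

If the digits printed for a guess match the digits in the book, the guess is at least a possible reading of that stretch.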

The rest is a bit more difficult. Others have been curious enough to try deciphering the message too, but this is where I thought a computer would be well suited to the job.

With PHP, I defined an array to associate each phone digit with its corresponding possible letters.

I put the raw digit message in a text file.

And I output the digit message with its possible letter options below in the index file.
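The original version of these steps is in PHP. As a rough equivalent in JavaScript (the language the later decoding code uses), the lookup and the letter options could look something like the sketch below. The name `letterOptions`, the sample message string, and the choice to skip commas and spaces are my assumptions for illustration, not what the repository does.

```javascript
// Digit -> possible letters on a standard phone keypad.
const KEYPAD = new Map([
  ["2", "abc"], ["3", "def"], ["4", "ghi"], ["5", "jkl"],
  ["6", "mno"], ["7", "pqrs"], ["8", "tuv"], ["9", "wxyz"],
]);

// Hypothetical helper: pair every symbol in the raw message with the
// letters it could stand for. Commas and whitespace are skipped.
// Anything else that is not a keypad digit (punctuation, 0, 1) is
// kept with an empty option list so it still shows up in the output.
function letterOptions(rawMessage) {
  const result = [];
  for (const ch of rawMessage) {
    if (KEYPAD.has(ch)) {
      result.push({ symbol: ch, options: KEYPAD.get(ch).split("") });
    } else if (!/[\s,]/.test(ch)) {
      result.push({ symbol: ch, options: [] });
    }
  }
  return result;
}

// Print each symbol with its candidate letters next to it.
for (const { symbol, options } of letterOptions("4, 3, 5, 5, 6!")) {
  console.log(symbol, options.join(" "));
}
```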

Decoding: Part 1

My next step was to create an algorithm to decode the message.

My first attempt was basically:

  1. Combine ten digits at a time and create an array with all their possible combinations of letters.
  2. Compare each item in that array with a local dictionary file to see if that combination of letters is a word in the dictionary.
  3. If none of the attempted 10-letter words are in the dictionary, pop off the last digit, and see if there’s a matching 9-letter word.
  4. Rinse and repeat.

(Based on an approach by John Resig – Dictionary Lookups in JavaScript)

Here’s the code at this stage. Most of the business is handled in javascripts/pickletters.js.

I knew that some manual correction would be needed, along with the ability to rerun the functions from a given point after the user made a correction. That’s why I did the bulk of the heavy lifting in JavaScript. I made these functions accept a “start index” parameter so that they could be easily reused from any starting position.
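The numbered steps above, together with the start index idea, can be sketched roughly as follows. This is a simplified illustration and not the contents of pickletters.js: the function names, the tiny dictionary, and the "?" placeholder for digits that match nothing are all assumptions of mine.

```javascript
const KEYPAD = {
  2: "abc", 3: "def", 4: "ghi", 5: "jkl",
  6: "mno", 7: "pqrs", 8: "tuv", 9: "wxyz",
};

// Step 1: every letter string that a run of digits could spell.
function letterCombos(digits) {
  let combos = [""];
  for (const d of digits) {
    const next = [];
    for (const prefix of combos) {
      for (const letter of KEYPAD[d] ?? "") next.push(prefix + letter);
    }
    combos = next;
  }
  return combos;
}

// Steps 2 and 3: try maxLen digits first, then drop the last digit
// on each miss until some combination is a dictionary word.
function longestWordAt(digits, start, dictionary, maxLen = 10) {
  const longest = Math.min(maxLen, digits.length - start);
  for (let len = longest; len >= 1; len--) {
    const match = letterCombos(digits.slice(start, start + len))
      .find((candidate) => dictionary.has(candidate));
    if (match) return match;
  }
  return null;
}

// Step 4: repeat from startIndex until the digits run out.
// A digit that starts no word is recorded as "?" and skipped.
function decodeFrom(digits, startIndex, dictionary) {
  const words = [];
  let i = startIndex;
  while (i < digits.length) {
    const word = longestWordAt(digits, i, dictionary);
    words.push(word ?? "?");
    i += word ? word.length : 1;
  }
  return words;
}

// Tiny stand-in for the real dictionary file.
const tinyDictionary = new Set(["hello", "is", "it", "really", "you", "grits"]);

console.log(decodeFrom("4355647487", 0, tinyDictionary));
// [ 'hello', 'grits' ]

// After manually fixing the second word to "is" (digits 5 and 6),
// the same function can be rerun from index 7.
console.log(decodeFrom("435564748732559968", 7, tinyDictionary));
// [ 'it', 'really', 'you' ]
```

Generating every combination for ten digits means up to about a million strings per position, which is tolerable for a short message but is one reason to consider indexing the dictionary by digit string instead.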

This initial attempt at letting an algorithm take a first pass at the message resulted in a lot of bad guesses, far more than I expected. When I ran the algorithm to determine only the first two words, I got the following:

First two words: 'Hello grits'

which I thought was delightful since grits are often the first food that I put into my belly in the morning, but I knew it wasn’t the right interpretation of the message. Extending the same algorithm to run on the first 2.5 lines (I hadn’t typed in the entire digit message at this point) got some more gibberish:

First handful of words: mostly gibberish

All of these were technically words, but from my initial manual decoding, I knew that the first few words were “Hello. Is it really you?”, so clearly this initial guess was pretty far off. I think it picked up the actual message midstream though with “My name” at the end of the first line.

Lessons learned

At this point, I learned a few things:

  1. This algorithm needs to be smarter. Guessing the longest possible word is not a very reliable way to go. I think humans decode things like this by prioritizing common words and by relying on word collocation (which words tend to appear together).
  2. I need to insert spaces between words to make the output easier to read.
  3. Manual correction is going to be more important than I initially thought, so it’s time to do some design work.
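One possible direction for the first two lessons, sketched here as an idea and not as something the project implements: index the dictionary by digit string, give each word a frequency, and pick the split of the whole message with the best total score. The word counts below are made up for illustration, `buildIndex` and `bestSplit` are hypothetical names, and collocation is not modeled at all.

```javascript
const KEYPAD = {
  2: "abc", 3: "def", 4: "ghi", 5: "jkl",
  6: "mno", 7: "pqrs", 8: "tuv", 9: "wxyz",
};
const LETTER_TO_DIGIT = {};
for (const [digit, letters] of Object.entries(KEYPAD)) {
  for (const letter of letters) LETTER_TO_DIGIT[letter] = digit;
}

// Made-up relative word counts, for illustration only.
const WORD_COUNTS = {
  hello: 50, is: 900, it: 800, really: 120, you: 950,
  grits: 1, good: 700, home: 500,
};

// Map each digit string to the most common word that spells it.
function buildIndex(counts) {
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  const index = new Map();
  for (const [word, count] of Object.entries(counts)) {
    const digits = [...word].map((ch) => LETTER_TO_DIGIT[ch]).join("");
    const logProb = Math.log(count / total);
    const current = index.get(digits);
    if (!current || logProb > current.logProb) {
      index.set(digits, { word, logProb });
    }
  }
  return index;
}

// Dynamic programming: best[i] is the highest-scoring way to split
// the first i digits into indexed words, or null if there is none.
function bestSplit(digits, index, maxLen = 10) {
  const best = new Array(digits.length + 1).fill(null);
  best[0] = { score: 0, words: [] };
  for (let end = 1; end <= digits.length; end++) {
    for (let start = Math.max(0, end - maxLen); start < end; start++) {
      const entry = index.get(digits.slice(start, end));
      if (!entry || best[start] === null) continue;
      const score = best[start].score + entry.logProb;
      if (best[end] === null || score > best[end].score) {
        best[end] = { score, words: [...best[start].words, entry.word] };
      }
    }
  }
  const full = best[digits.length];
  return full ? full.words.join(" ") : null;
}

const INDEX = buildIndex(WORD_COUNTS);
console.log(bestSplit("435564748732559968", INDEX)); // hello is it really you
console.log(bestSplit("4663", INDEX));               // good
```

In the first example the "grits" reading is dropped because nothing in this small word list can continue after it, so looking at the whole message already helps. The second example shows the frequency part: "good" and "home" share the digits 4663, and the more common word wins.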

Next up, I’ll figure out how to better accommodate the manual correction and allow the user to run the guessing algorithm again from a specified point.

10 thoughts on “Digit Decoder”

  1. Hi, I decoded a lot of this by hand tonight before finding your website and it was very tedious. I have been examining unintelligible bits and considering Greek and Latin phrases as well as German, because Thomas Schell spoke English as well as those three languages. I was wondering if your application accounts for more than just English and German, with Latin and Greek dictionaries as well?

    Thank you very much for taking the time to program this application. It is very well-done.

    – Daniel

  2. Thanks for getting in touch, Daniel. I’m glad I’m not the only one crazy enough to try to decode this.

    I actually used an English-only dictionary file (taken from a Scrabble dictionary, with a couple of one-letter words “A” and “I” added). You can see the file at dictionary.txt – then click “view raw”.

    I had thought there might be some German in there as well, but I hadn’t considered Latin and Greek (I didn’t remember that Thomas knew those languages).

    I’d be curious to know what you found in the message. I was only able to get so far with the project I made – there is still a lot that remains unsolved for me. One thing I noticed is that there is a repeated phrase of “I just arrived at the airport. I need to find” (the next word is not in the English dictionary). Then at other points, that phrase is interrupted with punctuation in nonsensical places.

    I actually emailed J.S. Foer about the project but only got a response from his agent.

  3. I started last night, and what I found yesterday was the first portion up until “find”, and I’m sure that the 8 characters in place of Thomas’ name must follow some kind of pattern I haven’t yet solved. I found 15 or 16 occurrences of 5683 “love”, a few more of “I just arrived at the airport, I need to find”, and two I found very peculiar: 5 instances of “65557” and 7 instances of “26545”. I remembered that Thomas basically outlines for you what he says in the paragraph directly before the mass of numbers: “WHY I’d left, WHERE I’d gone, HOW I’d found out about your death, WHY I’d come back, and WHAT I needed to do with the time I had left.” Both 26545 and 65557 are in the form of coordinates. It is possible that these are his way of telling her where he’s been. I found where the combinations of these two coordinates are:

    Let 65N 55′ 7” be A and

    26N 54′ 5” be B

    With latitudinal and longitudinal combinations (A,A), (A,B), (B,A), and (B,B), the locations are:

    1. Near the River Ob’, Russia
    2. Near Ranua, Finland and a Roadway I found labeled 941
    3. The Hindu Kush, in Pakistan, near a level area called “Nal” (I do not know if that is the name of the area or a town/village)
    4. Egypt – a place about equidistant from both Qasr Farafa and Abu Mingar

    With Thomas Sr.’s language abilities, he could travel through most of the Old World without any trouble, if he went to the right places. A trend in these locations is that they are all close to some waterway, city, or roadway, but never IN a large city. It’s possible I’m seeing a pattern where there is none, but I feel like these numbers do contain a message. While a lot is repetitive, that’s only to reinforce the idea of the breakdown of communication in the book. I don’t want to think that so many parts are unintelligible unless I absolutely know that they are. I’m going to use your software later and see what I can find that I missed by hand; I found most of the apparent stuff that your program did. Thank you again for coding this. I think a significance of the locations (if they are indeed meant to be so) is that some of these numbers are actually meant to be numbers. 65557 doesn’t form a single word given any continental or international keypad in any of the languages he speaks. If it really is meant to be intelligible, then I’m 90ish percent sure they’re meant to be numbers.

    Thanks, -Daniel

  4. Another idea: Thomas asks immediately before the numbers, “What, I wondered, is the sum of my life?” I’m going to take the sum of each line and of the total body starting from “I just arrived…” to the end, at the last “6-5-5-5-7”, after I finish my homework. Which means I might have to do it tomorrow.

    -Daniel

  5. Really good ideas with the numbers! I hadn’t thought of them representing anything other than words.

    Yeah, those two strings of “26545” and “65557” do seem to be significant, given the way they appear isolated before exclamation points and question marks.

    Also interesting that the message ends with “65557!”

    Coordinates could be an interesting avenue. Also, I wonder if there might be some meaning to the location on the phone keypad – 2 representing north, 8 representing south, 6 representing east, 4 representing west. Just a passing thought.

    Also, yes, that line “What, I wondered, is the sum of my life?” does sound quite like a hint…

    Well, you’ve got me thinking again about taking another look at the message as well – I’ll have to put some more thought to it in the coming days.

  6. Sorry that I haven’t replied in a while. I’ve been working on a project the school hired the App Development Club to do, and coding my own Chess Engine in C++. I plan to work on this more once my workload is a lot less dense, between projects and schoolwork. But if you’d keep me updated on any progress you make that would be greatly appreciated. Thanks,

    -Daniel

  7. Just a quick update. I spent a little bit of time going down a couple of paths.

    First, I tried some other ideas for what 2, 6, 5, 4, 5 might be. I was wondering whether the positions of the keys on the phone keypad were significant, and I thought that it might be a reference to the sign of the cross. Seems a little far-fetched though.

    Also, I thought a bit more about addition and subtraction, based on that line: “When the suffering is subtracted from the joy, what remains? What, I wondered, is the sum of my life?”. Before that, he mentions “Love” and “Hate”, I believe, so I tried subtracting the digits of “Hate” from the digits of “Love”, but I got a number that had a zero in it, which has no letters associated with it on the keypad.

    And I found this academic paper, Trauma in Jonathan Safran Foer’s Extremely Loud and Incredibly Close. Page 24 has some discussion of Thomas Schell and his code, but I didn’t get any new discoveries on the code from it.

  8. Hello,

    I love the book, and have read it more times than I can count. I was wondering if you have decoded the rest of the message?

    Thank you,

    Jennifer

  9. Sorry for the delay. I missed this pending comment for a while.

    The most I was able to decode was in this screenshot.

    I came to the conclusion that the message is most likely gibberish past a certain point. Parts of “I just arrived at the airport I need to find” repeat later on, and there are a few stretches without any possible vowels that seem too long to be words. It was a bit disheartening, but it was a fun journey.

    If you are able to decode any more of it, I’d be really curious to know what you find.
