Tuesday 10 March 2009

predictive text isotaps

So, ever since getting a girlfriend, I've been texting like crazy. I actually know my way around the numpad, and I've gotten pretty fast at texting. This has pulled my attention toward the input method most people use for writing text messages on their phone nowadays, called predictive text.

Every button to on your phone is mapped to multiple characters. This is how you can resolve phone numbers like 1-800-pizza, but they are also used for texting. 2 maps to {'a', 'b', 'c'}, 3 to {'d', 'e', 'g'}, et cetera. In the old, traditional texting method, you would press the 2 key once to get an 'a,' twice for a 'b,' and so on. In predictive text you tap the 2 key only once, no matter which letter you want. The software uses a dictionary of words to see which word your sequence matches to. For example, to get 'you,' you type 968. The dictionary reveals that 'you' is the only word with this combination, and that word is printed to the screen. This is much more efficient than old style texting, since you need to press less buttons to get your word.

Of course, there are inevitable some words that have the same number sequence. For example, 'home,' 'good' and 'gone' all have the sequence 4663. These words are referred to as isotaps. In this case you can select the word you want with the arrow keys. The list is of course ordered by frequency of use.

Isotaps are quite annoying, because they require that you keep looking at the keyboard in case you meet one. So, I was wondering, how many isotaps are there in the english language? Using some python magic I got the answer, along with a few other random statistics. This was ran against the Ubuntu word list, which is usually more expansive than a cell phone one, but it gives an indication. It would be interesting to compare against other languages, to see which languages is most amenable to text prediction (for a better indication, weight isotaps with word frequency)
  • number of isotaps: 14152
  • longest isotap: size(14) 78873322846617 ["putrefaction's", "stupefaction's"]
  • sequence with most isotaps: (length: 12) 22737 ['acres', 'bards', 'barer', 'bares', 'barfs', 'baser', 'bases', 'caper', 'capes', 'cards', 'cares', 'cases']
amount of sequences with x isotaps:
x = 1: 0
x = 2: 4429
x = 3: 980
x = 4: 308
x = 5: 129
x = 6: 39
x = 7: 14
x = 8: 12
x = 9: 3
x = 10: 1
x = 11: 0
x = 12: 1

amount of isotaps of length x:
x = 1: 24
x = 2: 151
x = 3: 645
x = 4: 2134
x = 5: 2937
x = 6: 3389
x = 7: 2603
x = 8: 1366
x = 9: 517
x = 10: 218
x = 11: 81
x = 12: 49
x = 13: 11
x = 14: 5

Note that there are more isotaps of length 14. given is merely an example. The script is available here. I'm afraid I didn't do any fancy graphs, but the data is pretty interesting I think.

This is why I don't like Mines


I'll wait a second while everyone absorbs that screen shot. Just click it and look at it in full size. still waiting... You got it? So yeah, that's a game of Mines (that's minesweeper if you use windows) that I just finished. 99 mines, time is 5:12. That's pretty awesome. Except that I messed up on the last mine. Those who are familiar with Mines will see, though, that the square I picked is equally as valid as the other one. I did not lose due to a logic error, but merely because I was forced to guess.

And that just bothers me. I like Mines because it is a simple logic game that does not take much mind power to solve but does require some thinking. The problem with it is that most of the games I play end up requiring some guessing to finish. This is frustrating. One guess reduces your chances of winning to a mere 5o percent, due to no fault on your part.

Which makes me wonder, is it possible to generate minefields which are ensured to be solvable without guessing? How expensive is that?