Tuesday, 10 March 2009

predictive text isotaps

So, ever since getting a girlfriend, I've been texting like crazy. I actually know my way around the numpad, and I've gotten pretty fast at texting. This has pulled my attention toward the input method most people use for writing text messages on their phone nowadays, called predictive text.

Every button to on your phone is mapped to multiple characters. This is how you can resolve phone numbers like 1-800-pizza, but they are also used for texting. 2 maps to {'a', 'b', 'c'}, 3 to {'d', 'e', 'g'}, et cetera. In the old, traditional texting method, you would press the 2 key once to get an 'a,' twice for a 'b,' and so on. In predictive text you tap the 2 key only once, no matter which letter you want. The software uses a dictionary of words to see which word your sequence matches to. For example, to get 'you,' you type 968. The dictionary reveals that 'you' is the only word with this combination, and that word is printed to the screen. This is much more efficient than old style texting, since you need to press less buttons to get your word.

Of course, there are inevitable some words that have the same number sequence. For example, 'home,' 'good' and 'gone' all have the sequence 4663. These words are referred to as isotaps. In this case you can select the word you want with the arrow keys. The list is of course ordered by frequency of use.

Isotaps are quite annoying, because they require that you keep looking at the keyboard in case you meet one. So, I was wondering, how many isotaps are there in the english language? Using some python magic I got the answer, along with a few other random statistics. This was ran against the Ubuntu word list, which is usually more expansive than a cell phone one, but it gives an indication. It would be interesting to compare against other languages, to see which languages is most amenable to text prediction (for a better indication, weight isotaps with word frequency)
  • number of isotaps: 14152
  • longest isotap: size(14) 78873322846617 ["putrefaction's", "stupefaction's"]
  • sequence with most isotaps: (length: 12) 22737 ['acres', 'bards', 'barer', 'bares', 'barfs', 'baser', 'bases', 'caper', 'capes', 'cards', 'cares', 'cases']
amount of sequences with x isotaps:
x = 1: 0
x = 2: 4429
x = 3: 980
x = 4: 308
x = 5: 129
x = 6: 39
x = 7: 14
x = 8: 12
x = 9: 3
x = 10: 1
x = 11: 0
x = 12: 1

amount of isotaps of length x:
x = 1: 24
x = 2: 151
x = 3: 645
x = 4: 2134
x = 5: 2937
x = 6: 3389
x = 7: 2603
x = 8: 1366
x = 9: 517
x = 10: 218
x = 11: 81
x = 12: 49
x = 13: 11
x = 14: 5

Note that there are more isotaps of length 14. given is merely an example. The script is available here. I'm afraid I didn't do any fancy graphs, but the data is pretty interesting I think.

3 comments:

Martijn said...

"Note that there are more isotaps of length 14. given is merely an example"

You mean that all the numbers are false, or only 14? Then why is 14 an example?

tania said...
This comment has been removed by the author.
tania said...

Wow, this is so cool! I really don't ever get any of these stuff, but this time I really tried to asimilate everything. It's really interesting, specially because the introduction text is a bit familiar to everybody. I always just type with the old way, for me it's a bit faster, because I don't get so confused, (yes, I'm not good with technology) i guess it works for me because it only requires clicking, no choosing or moving with the arrows... I don't know how some people can write text messages without even looking at the screen, and under the desks... I'm so bad at texting...