Every button to on your phone is mapped to multiple characters. This is how you can resolve phone numbers like 1-800-pizza, but they are also used for texting. 2 maps to {'a', 'b', 'c'}, 3 to {'d', 'e', 'g'}, et cetera. In the old, traditional texting method, you would press the 2 key once to get an 'a,' twice for a 'b,' and so on. In predictive text you tap the 2 key only once, no matter which letter you want. The software uses a dictionary of words to see which word your sequence matches to. For example, to get 'you,' you type 968. The dictionary reveals that 'you' is the only word with this combination, and that word is printed to the screen. This is much more efficient than old style texting, since you need to press less buttons to get your word.
Of course, there are inevitable some words that have the same number sequence. For example, 'home,' 'good' and 'gone' all have the sequence 4663. These words are referred to as isotaps. In this case you can select the word you want with the arrow keys. The list is of course ordered by frequency of use.
Isotaps are quite annoying, because they require that you keep looking at the keyboard in case you meet one. So, I was wondering, how many isotaps are there in the english language? Using some python magic I got the answer, along with a few other random statistics. This was ran against the Ubuntu word list, which is usually more expansive than a cell phone one, but it gives an indication. It would be interesting to compare against other languages, to see which languages is most amenable to text prediction (for a better indication, weight isotaps with word frequency)
- number of isotaps: 14152
- longest isotap: size(14) 78873322846617 ["putrefaction's", "stupefaction's"]
- sequence with most isotaps: (length: 12) 22737 ['acres', 'bards', 'barer', 'bares', 'barfs', 'baser', 'bases', 'caper', 'capes', 'cards', 'cares', 'cases']
x = 1: 0
x = 2: 4429
x = 3: 980
x = 4: 308
x = 5: 129
x = 6: 39
x = 7: 14
x = 8: 12
x = 9: 3
x = 10: 1
x = 11: 0
x = 12: 1
amount of isotaps of length x:
x = 1: 24
x = 2: 151
x = 3: 645
x = 4: 2134
x = 5: 2937
x = 6: 3389
x = 7: 2603
x = 8: 1366
x = 9: 517
x = 10: 218
x = 11: 81
x = 12: 49
x = 13: 11
x = 14: 5
Note that there are more isotaps of length 14. given is merely an example. The script is available here. I'm afraid I didn't do any fancy graphs, but the data is pretty interesting I think.