[New Algorithm] Anagram

Jon Bentley’s book Programming Pearls contains a problem about finding anagrams of words.
The statement:

Given a dictionary of english words, find all sets of anagrams. For instance, “pots”, “stop” and “tops” are all anagrams of one another because each can be formed by permuting the letters of the others.

I thought a bit and it came to me that the solution would be to obtain the signature of the word you’re searching and comparing it with all the words in the dictionary. All anagrams of a word should have the same signature. But how to achieve this? My idea was to use the Fundamental Theorem of Arithmetic:

The fundamental theorem of arithmetic states that

every positive integer (except the number 1) can be represented in exactly one way apart from rearrangement as a product of one or more primes

So the idea is to use an array of the first 26 prime numbers. Then for each letter in the word we get the corresponding prime number A = 2, B = 3, C = 5, D = 7 … and then we calculate the product of our input word. Next we do this for each word in the dictionary and if a word matches our input word, then we add it to the resulting list.

The performance is more or less acceptable. For a dictionary of 479828 words, it takes 160 ms to get all anagrams. This is roughly 0.0003 ms / word, or 0.3 microsecond / word. Algorithm’s complexity seems to be O(mn) or ~O(m) where m is the size of the dictionary and n is the length of the input word.

Here’s the code:

 

Leave a Reply

Your email address will not be published. Required fields are marked *