Posts tagged mongodb
So I’ve finished my anagram finder thing and it’s fast. http://anagrams.heroku.com/
The page has hardly any design as I’ve not had time to do that. It’s backed by MongoDB and searches documents of the form { word: "LOLCATZ", lookup: "acllotz" }, with an index on "lookup" to keep the queries fast.
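Here’s a minimal sketch of how a document of that shape could be built from a word. It’s in Python rather than the Ruby the app is written in, and the function name make_document is made up for illustration; only the document shape comes from the post.

```python
def make_document(word):
    """Build one document of the form { word: ..., lookup: ... }."""
    # The lookup key is the word's letters, lowercased and sorted
    # alphabetically, so every anagram of a word shares the same key.
    lookup = "".join(sorted(word.lower()))
    return {"word": word, "lookup": lookup}


doc = make_document("LOLCATZ")
print(doc)
```

Because every anagram collapses to the same lookup string, one indexed equality match on lookup finds all the exact anagrams of a word.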
One issue I had was that the permute function I wrote had a slight flaw, which meant it generated hardly any permutations. This took me a while to find because, when the search only returned a few results, I figured it was just because I wasn’t using a large wordlist. Now the permute function is fixed and many more words are returned.
Any feedback or suggestions would be appreciated.
Inserted 8,087 small documents in 6.1125 seconds from a small Sinatra (Ruby) app.
I decided to get familiar with MongoDB since I plan to use it in a product that I’m working on (I was going to go with Redis but realized that’s a hack, and while it works I haven’t been able to figure out an easy geospatial thing with Redis). A while ago I made a post about how NoSQL isn’t always the way to go. I was trying to find anagrams of words by computing a score for each word: map each letter to a prime number and take the product. Then a word can be built from a set of letters exactly when its score divides the letters’ score, and exact anagrams have equal scores. An example might help: ‘abc’ -> 2*3*5 = 30, ‘cab’ -> 5*2*3 = 30, and 30 % 30 == 0; also ‘ba’ -> 3*2 = 6 and 30 % 6 == 0. This works well for an in-memory system (it’s what I use in my Scala OMGBot that plays Letterblox) but not so much for a database like MongoDB.
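Here’s a minimal sketch of the prime-product scoring, in Python rather than the Scala the bot uses. The names score and can_be_formed_from are made up for illustration, and the a -> 2, b -> 3, c -> 5 mapping just follows the example above. The check is a divisibility (modulo) test; a gcd greater than one on its own would only tell you that two words share a letter.

```python
from string import ascii_lowercase

# The first 26 primes, one per letter: a -> 2, b -> 3, c -> 5, ...
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
LETTER_TO_PRIME = dict(zip(ascii_lowercase, PRIMES))


def score(word):
    """Product of the primes assigned to each letter of the word."""
    product = 1
    for letter in word.lower():
        product *= LETTER_TO_PRIME[letter]
    return product


def can_be_formed_from(word, letters):
    """True if word can be spelled using only the given letters."""
    # By unique factorization, score(word) divides score(letters) exactly
    # when every letter of word appears in letters often enough.
    return score(letters) % score(word) == 0


print(score("abc"), score("cab"))       # 30 30
print(can_be_formed_from("ba", "abc"))  # True
print(can_be_formed_from("ad", "abc"))  # False: 30 % 14 != 0
```

The modulo test has to be evaluated against every stored score, which is fine in memory but is exactly the kind of condition an index can’t help with.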
What I’m doing now is making an anagram lookup service with several improvements. Firstly, there won’t be so many random words as now I’m using the text of Sherlock Holmes as my wordlist rather than some arbitrary wordlist that I found on the internet. Also this will use MongoDB hosted at MongoHQ and will hopefully be more efficient and faster.
The plan to find anagrams with Mongo is to store documents in the form { word: "LOLCATZ", lookup: "acllotz" } with an index on lookup, the alphabetically sorted version of the word. Then, to find anagrams, all letter combinations of a word (each one sorted, so it can match a lookup value) will be collected into an array and Mongo will be queried via { lookup: { $in: ['a', 'c', … 'ac', … 'acllotz', …] } }. This should be much faster than the previous method, which relied heavily on the modulo operator, something that Mongo can’t index.
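Here’s a rough sketch of how that array and query could be built, in Python rather than the app’s Ruby. The names lookup_keys and build_query are made up for illustration. It also makes one assumption: since lookup holds sorted letters, it generates sorted letter combinations instead of full permutations, which may not be how the app’s own permute function works.

```python
from itertools import combinations


def lookup_keys(word):
    """Every distinct sorted letter combination of the word."""
    letters = sorted(word.lower())
    keys = set()
    for size in range(1, len(letters) + 1):
        # combinations() keeps the input order, so each combination of the
        # sorted letters is itself sorted; the set drops the duplicates that
        # repeated letters (like the two l's in "lolcatz") produce.
        for combo in combinations(letters, size):
            keys.add("".join(combo))
    return sorted(keys)


def build_query(word):
    """Query document matching any stored word made from these letters."""
    return {"lookup": {"$in": lookup_keys(word)}}


print(lookup_keys("cab"))  # ['a', 'ab', 'abc', 'ac', 'b', 'bc', 'c']
print(len(lookup_keys("lolcatz")))  # 95
```

The resulting dictionary is what you would hand to a driver’s find call. A 7-letter word only needs 95 keys here, whereas listing every ordering of every subset would run into the thousands.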
I should be done with this lookup system sometime tonight and I’ll make a post explaining a few things about it.