SOLR and Multi-Languages

One of our sites requires Japanese search capabilities, which looked easy on paper, but after playing around for the better part of a day I realized it wasn’t. It doesn’t take much to get it going, but sifting through mailing lists is pretty much the only way to get the details.

If you do a search using Japanese characters you’ll notice that solr will find results out of the box. The problem is that it uses English methods of search. The main problem this creates is that it assumes spaces separate words, which is not true in Japanese. A “tokenizer” creates the words for lucene to use, so we need a tokenizer that will work with Japanese. The CJK Tokenizer does the trick.

SOLR will not handle multiple languages in a single field, so we need to create a new field for each language. First, though, you need to create the field type in schema.xml, a text field whose analyzer uses solr.CJKTokenizerFactory:

Now define the field:

For whatever reason, I could not get this going without explicitly specifying which field the search should use, so I added “qf=body_ja” to my query string. If you see other syntax for defining the field type, don’t use it (at least with solr 1.3-1.4). It seems to break the words up correctly, but you won’t be able to search.
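As a rough illustration of the query string, here is a minimal Python sketch that builds the request URL. The host, port and path are assumptions for a default local install, and build_japanese_query is a made-up helper name; only “qf=body_ja” comes from my setup. Note that qf is read by the dismax query parser, so this assumes your request handler uses it.

```python
from urllib.parse import urlencode


def build_japanese_query(base_url, terms, field="body_ja"):
    """Build a select URL that points the search at the Japanese field.

    base_url is an assumption (whatever your Solr select handler is);
    urlencode percent-encodes the Japanese characters as UTF-8.
    """
    params = {"q": terms, "qf": field}
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr instance; adjust to your own install.
url = build_japanese_query("http://localhost:8983/solr/select", "東京")
print(url)
```

Encoding the terms as UTF-8 matters here; if the front end sends the characters in another encoding, the query will not match what was indexed.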

This tokenizer is not 100% accurate, however. It breaks the characters into overlapping pairs and solr does its best to find matches. I’m not sure how much it costs, but you can find a better tokenizer at basistech.com.
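To see why pairs are only an approximation, here is a small Python sketch of the idea. It is not the actual CJK Tokenizer code, just a simplified imitation (cjk_bigrams is a made-up name, and the real tokenizer treats Latin text differently).

```python
def cjk_bigrams(text):
    """Split text into overlapping two-character tokens.

    A simplified stand-in for what a bigram CJK tokenizer emits;
    whitespace is dropped and very short input is returned whole.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


phrase = "東京都庁"  # Tokyo Metropolitan Government building

# Whitespace splitting sees the whole phrase as a single "word".
print(phrase.split())

# Bigram splitting yields searchable pairs, including an accidental one.
print(cjk_bigrams(phrase))
```

The middle pair, 京都, happens to be the word for Kyoto, so a search for Kyoto can match a document about Tokyo. That kind of false hit is the price of pairing characters without a dictionary.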

After you adjust your front end to save into the new fields you should be off to the races.

SOLR 1.4 / LocalSOLR Gotchas

I wanted to write down some strange behavior I came across… hopefully it’ll help someone.

Firstly, SOLR 1.4 changed the date range query syntax.  Instead of “createddate[* to NOW]” try “createddate[* NOW]”.

Secondly, running an empty query with SOLR 1.4 and localsolr has a funny syntax. If I’m doing a blank search without localsolr, I’d just use a space (or %20) to select all. But when I include localsolr syntax, I have to use the old q=*:* syntax.
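In front-end code this boils down to picking the q value based on whether localsolr is involved. A minimal Python sketch, assuming a made-up helper name (select_all_query) and leaving out the localsolr parameters themselves, which you would pass in through extra_params:

```python
from urllib.parse import quote, urlencode


def select_all_query(use_localsolr, extra_params=None):
    """Return an encoded query string for a blank "select all" search.

    Without localsolr a single space is enough; with localsolr the
    explicit *:* form is needed. extra_params is a placeholder for
    whatever other parameters your request carries.
    """
    params = {"q": "*:*" if use_localsolr else " "}
    params.update(extra_params or {})
    # quote_via=quote encodes the space as %20 rather than "+".
    return urlencode(params, quote_via=quote)


print(select_all_query(use_localsolr=False))
print(select_all_query(use_localsolr=True))
```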

Hope this helps someone avoid wasting as much time as I did trying to figure it out. If you’re developing in Drupal and want location-based search using solr, please check out http://drupal.org/node/347428