All,

I'd like to put to you one of the harder problems we're struggling with:
In freetext queries, our experience is that people tend to write the "dumbest" version of the string they are searching for. For example, they are likely to write "Gothe" or "Goethe" rather than "Göthe". The problem is no smaller with accents: people tend to ignore them or get them wrong.

This is something we need to take into account, but we are unsure how to do it. We could dumb down all strings when indexing, so that an "ö" becomes "oe", but an example of where this would be wrong is not hard to find: was Göthe a great pöt, or was Goethe a great poet?

Has anyone else encountered the same problem, and if so, what is your take on it?

While acknowledging the obvious problems, our customer still feels that dumbing down certain characters at both ends is the best solution. Therefore, in our current Jena-based solution, we have implemented a scheme in which they can define a hash that maps e.g. "ø" and "ö" to "o", so that only Gothe is indexed. Then, if a user searches for Göthe, the query is rewritten as Gothe. This sort of does the job, and it is relatively simple to map several characters to one, but mapping one character to several (e.g. "ö" to "oe") is harder.

We are in the process of migrating the whole solution and taking Jena out of the mix for most components, so we are looking for a better solution to this problem than the one we have ourselves. Additionally, our own solution requires Jena, so we would prefer a Virtuoso-only solution.

Kind regards

Kjetil Kjernsmo
-- 
Senior Knowledge Engineer
Mobile: +47 986 48 234
Email: kjetil.kjern...@computas.com
Web: http://www.computas.com/

| SHARE YOUR KNOWLEDGE |

Computas AS PO Box 482, N-1327 Lysaker | Phone:+47 6783 1000 | Fax:+47 6783 1001
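
P.S. To make the "dumb down at both ends" idea concrete, here is a minimal sketch in Python. It is not our Jena code, only an illustration: the table contents and the names FOLD_TABLE and fold are made up, and the accent-stripping step is just one possible way to handle accents that the table does not cover.

```python
import unicodedata

# Hypothetical customer-defined folding table (illustrative values only).
# A value may be longer than one character, e.g. "ö" -> "oe".
FOLD_TABLE = {
    "ø": "o",
    "ö": "o",
    "Ø": "O",
    "Ö": "O",
}


def fold(text, table=FOLD_TABLE):
    """Apply the folding table, then strip any remaining combining accents."""
    # Compose first, so that "o" + combining diaeresis matches the "ö" key.
    composed = unicodedata.normalize("NFC", text)
    mapped = "".join(table.get(ch, ch) for ch in composed)
    # Decompose and drop combining marks for accents the table does not list.
    decomposed = unicodedata.normalize("NFD", mapped)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same function is applied to literals at index time and to the query.
print(fold("Göthe"))  # Gothe
print(fold("Gothe"))  # Gothe
```

Because the same fold is used on both sides, "Göthe" and "Gothe" meet at the same indexed form. Note that "Goethe" still does not: getting that to match needs a table such as {"ö": "oe"} instead, which in turn stops "Gothe" from matching, so the two spellings cannot both be folded onto one form with a single table.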