All,

I'd like to put one of the harder problems we're struggling with to you all:

In free-text queries, our experience is that people tend to write the "dumbest"
version of the string they are searching for. For example, they are likely to
write "Gothe" or "Goethe" rather than "Göthe". The problem is no smaller
with accents: people tend to ignore them or get them wrong. This is
something we need to take into account, but we are unsure how to do it.

We could dumb down all strings when indexing, so that an "ö" becomes "oe", but
an example of where this would be wrong is not hard to find: was Göthe a
great pöt, or was Goethe a great poet?
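
To make the trade-off concrete, here is a minimal Python sketch, assuming the
rule is applied at index time only. The function name `expand_umlaut` and the
single-entry rule are hypothetical, chosen purely for illustration; this is
not how any particular store implements it.

```python
def expand_umlaut(text: str) -> str:
    """Index-time rewrite: every "ö" becomes "oe" (illustrative rule only)."""
    return text.replace("ö", "oe")


# "Göthe" and "Goethe" now share an index key, which is the desired effect.
same_key = expand_umlaut("Göthe") == expand_umlaut("Goethe")

# The common misspelling "Gothe" is still not matched by this rule.
misses_gothe = expand_umlaut("Gothe") != expand_umlaut("Göthe")

# A genuine "oe", as in "poet", becomes indistinguishable from "ö".
collides = expand_umlaut("pöt") == expand_umlaut("poet")
```

All three flags come out true: the rule merges the two correct spellings, but
it also merges words that should stay distinct, and it does nothing for
"Gothe".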

Has anyone else encountered the same problem, and if so, what is your take on
it?

While acknowledging the obvious problems, our customer still feels that
dumbing down certain characters at both ends is the best solution. Therefore,
in our current Jena-based solution, we have implemented a mechanism where
they can define a hash that maps e.g. "ø" and "ö" to "o", so that only Gothe
is indexed. Then, if a user searches for Göthe, the query is rewritten as
Gothe. This sort of does the job, and it is relatively simple to map several
characters to one, but mapping one character to several is harder.
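
The idea can be sketched in a few lines of Python. This is only an
illustration of folding at both ends, not our Jena code; the table contents
and the names `FOLD_MAP` and `fold` are assumptions made up for the example.

```python
# Hypothetical folding table; in practice the customer defines the entries.
FOLD_MAP = {"ø": "o", "ö": "o", "Ø": "O", "Ö": "O"}


def fold(text: str) -> str:
    """Replace every mapped character and leave all others untouched."""
    return "".join(FOLD_MAP.get(ch, ch) for ch in text)


# The same function is applied when indexing and when rewriting the query,
# so both sides end up with the same folded form.
indexed_term = fold("Göthe")
query_term = fold("Göthe")
matches = indexed_term == query_term
```

Note that a query for "Goethe" passes through `fold` unchanged and therefore
still does not match the indexed "Gothe", which is the one-to-several case
mentioned above.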

We are in the process of migrating the whole solution and taking Jena out of
the mix for most components, so we are looking for a better solution to this
problem than our own. Additionally, our own solution requires Jena, so we
would prefer a Virtuoso-only solution.

Kind regards 

Kjetil Kjernsmo
-- 
Senior Knowledge Engineer
Mobile: +47 986 48 234
Email: kjetil.kjern...@computas.com   
Web: http://www.computas.com/

|  SHARE YOUR KNOWLEDGE  |

Computas AS  PO Box 482, N-1327 Lysaker | Phone:+47 6783 1000 | Fax:+47 6783 
1001

