If this is for a significant project and you are ready to pay for it, I believe BasisTech has commercial solutions in this area.

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

On 10 November 2015 at 08:46, Mahmoud Almokadem <prog.mahm...@gmail.com> wrote:
> Thanks Paul,
>
> The Arabic analyser applies its normalisation and stemming filters only to
> the single terms that come out of the standard tokenizer.
> Gathering all synonyms will be hard work. Should I customise my Tokenizer
> to handle this case?
>
> Sincerely,
> Mahmoud
>
>
> On Tue, Nov 10, 2015 at 3:06 PM, Paul Libbrecht <p...@hoplahup.net> wrote:
>
>> Mahmoud,
>>
>> there is an arabic analyzer:
>> https://wiki.apache.org/solr/LanguageAnalysis#Arabic
>> doesn't it do what you describe?
>> Synonyms probably work there too.
>>
>> Paul
>>
>> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
>> > 9 November 2015 17:47
>> > Thanks Jack,
>> >
>> > This is a good solution, but we have more combinations that I think
>> > can’t be handled as synonyms, like every word that starts with ‘عبد’ ‘Abd’
>> > and ‘أبو’ ‘Abo’. When using the Standard tokenizer on ‘أبو بكر’ ‘Abo
>> > Bakr’, it’ll be tokenised to ‘أبو’ and ‘بكر’ and the filters will be
>> > applied to each separate term.
>> >
>> > Is there a tokeniser available to tokenise ‘أبو *’ or ‘عبد *’ as a
>> > single term?
>> >
>> > Thanks,
>> > Mahmoud
>> >
>> >
>> >
>> > Jack Krupansky <mailto:jack.krupan...@gmail.com>
>> > 9 November 2015 16:47
>> > Use an index-time (but not query time) synonym filter with a rule like:
>> >
>> > Abd Allah,Abdallah
>> >
>> > This will index the combined word in addition to the separate words.
>> >
>> > -- Jack Krupansky
>> >
>> > On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem <prog.mahm...@gmail.com>
>> >
>> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
>> > 9 November 2015 10:48
>> > Hello,
>> >
>> > We are indexing Arabic content and facing a problem tokenizing multi-term
>> > phrases like 'عبد الله' 'Abd Allah': users will search for
>> > 'عبدالله' 'Abdallah' without a space and need to get the results for 'عبد
>> > الله' with a space. We are using StandardTokenizer.
>> >
>> >
>> > Is there any configuration to handle this case?
>> >
>> > Thank you,
>> > Mahmoud
>> >
>>
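
For illustration only, here is a rough sketch of the behaviour discussed in the thread: joining a prefix word such as 'عبد' or 'أبو' with the word that follows it, while optionally keeping the separate words too (as Jack's index-time synonym rule would). This is plain Python, not a Solr or Lucene API; the `PREFIXES` set and the `merge_prefixed_names` function are hypothetical names, and the prefix list is just the two examples from the thread.

```python
# Illustrative sketch only: plain Python, not a Solr/Lucene tokenizer or filter.
# PREFIXES is an assumed, incomplete list taken from the examples in the thread.
PREFIXES = {"عبد", "أبو"}


def merge_prefixed_names(tokens, keep_parts=True):
    """Join each prefix token with the token that follows it.

    With keep_parts=True the separate words are emitted as well, which
    mirrors indexing the combined word in addition to the separate words.
    """
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Only merge when a following token exists.
        if tok in PREFIXES and i + 1 < len(tokens):
            nxt = tokens[i + 1]
            out.append(tok + nxt)  # combined form written without a space
            if keep_parts:
                out.extend([tok, nxt])
            i += 2
        else:
            out.append(tok)
            i += 1
    return out
```

In a real analysis chain, a step like this would presumably have to run before normalisation and stemming, so that the prefixes are still in the form listed in `PREFIXES`; that ordering is an assumption, not something tested against Solr.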