Re: Arabic analyser

Mahmoud Almokadem Wed, 11 Nov 2015 01:32:41 -0800

Thanks Alex,

So does BasisTech work with the latest version of Solr?


Sincerely,
Mahmoud

On Tue, Nov 10, 2015 at 5:28 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> If this is for a significant project and you are ready to pay for it,
> I believe BasisTech has commercial solutions in this area.
>
> Regards,
>    Alex.
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 10 November 2015 at 08:46, Mahmoud Almokadem <prog.mahm...@gmail.com>
> wrote:
> > Thanks Paul,
> >
> > The Arabic analyser applies its normalisation and stemming filters only
> > to the single terms that come out of the standard tokenizer.
> > Gathering all the synonyms will be hard work. Should I customise my
> > Tokenizer to handle this case?
> >
> > Sincerely,
> > Mahmoud
> >
> >
> > On Tue, Nov 10, 2015 at 3:06 PM, Paul Libbrecht <p...@hoplahup.net>
> > wrote:
> >
> >> Mahmoud,
> >>
> >> There is an Arabic analyzer:
> >>   https://wiki.apache.org/solr/LanguageAnalysis#Arabic
> >> Doesn't it do what you describe?
> >> Synonyms probably work there too.
> >>
> >> Paul
> >>
> >> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
> >> > 9 November 2015 17:47
> >> > Thanks Jack,
> >> >
> >> > This is a good solution, but we have more combinations that I think
> >> > can’t be handled as synonyms, such as every word that starts with ‘عبد’
> >> > ‘Abd’ or ‘أبو’ ‘Abo’. When the Standard tokenizer is applied to ‘أبو بكر’
> >> > ‘Abo Bakr’, the text is tokenised into ‘أبو’ and ‘بكر’, and the filters
> >> > are applied to each term separately.
> >> >
> >> > Is there a tokeniser available that tokenises ‘أبو *’ or ‘عبد *’ as a
> >> > single term?
> >> >
> >> > Thanks,
> >> > Mahmoud
> >> >
> >> >
> >> >
> >> > Jack Krupansky <mailto:jack.krupan...@gmail.com>
> >> > 9 November 2015 16:47
> >> > Use an index-time (but not query-time) synonym filter with a rule like:
> >> >
> >> > Abd Allah,Abdallah
> >> >
> >> > This will index the combined word in addition to the separate words.
> >> >
> >> > -- Jack Krupansky
> >> >
> >> > On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem <prog.mahm...@gmail.com>
> >> >
> >> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
> >> > 9 November 2015 10:48
> >> > Hello,
> >> >
> >> > We are indexing Arabic content and have a problem tokenizing
> >> > multi-term phrases like 'عبد الله' 'Abd Allah'. Users will search for
> >> > 'عبدالله' 'Abdallah' without a space and need to get the results for
> >> > 'عبد الله' with a space. We are using StandardTokenizer.
> >> >
> >> >
> >> > Is there any configuration to handle this case?
> >> >
> >> > Thank you,
> >> > Mahmoud
> >> >
> >>
> >>
>
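
A minimal sketch of the index-time idea discussed in this thread, written in
plain Python rather than as a Solr filter. It assumes whitespace tokenization
and a hand-picked prefix set; the function name expand_compound_names and the
JOINING_PREFIXES set are hypothetical and are not part of Solr or Lucene. For
every token that is a known prefix, it also emits the prefix joined to the
following token at the same position, so the spaced and unspaced spellings
both end up in the index without listing every name as a synonym rule.

```python
# Hypothetical illustration only: this is not a Solr or Lucene API.
# Prefixes taken from the thread: 'Abd' and 'Abo'.
JOINING_PREFIXES = {"عبد", "أبو"}


def expand_compound_names(tokens, prefixes=JOINING_PREFIXES):
    """Return (position, term) pairs for an index-time token stream.

    Each original token keeps its own position. When a token is a known
    prefix and another token follows it, the joined form is emitted at the
    prefix's position as well, similar in spirit to an index-time synonym.
    """
    expanded = []
    for position, token in enumerate(tokens):
        expanded.append((position, token))
        if token in prefixes and position + 1 < len(tokens):
            expanded.append((position, token + tokens[position + 1]))
    return expanded


# Whitespace split stands in for the tokenizer in this sketch.
indexed = expand_compound_names("عبد الله".split())
for position, term in indexed:
    print(position, term)
```

Because the joined term is added only at index time, a query for the unspaced
form can match documents written with a space, while queries for the separate
words keep working.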
