Re: Arabic analyser

Alexandre Rafalovitch Tue, 10 Nov 2015 07:30:06 -0800

If this is for a significant project and you are ready to pay for it,
I believe BasisTech has commercial solutions in this area.


Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 10 November 2015 at 08:46, Mahmoud Almokadem <prog.mahm...@gmail.com> wrote:
> Thanks Paul,
>
> The Arabic analyser applies its normalisation and stemming filters only to
> the single terms produced by the standard tokenizer.
> Gathering all the synonyms will be hard work. Should I customise my tokenizer
> to handle this case?
>
> Sincerely,
> Mahmoud
>
>
> On Tue, Nov 10, 2015 at 3:06 PM, Paul Libbrecht <p...@hoplahup.net> wrote:
>
>> Mahmoud,
>>
>> There is an Arabic analyzer:
>>   https://wiki.apache.org/solr/LanguageAnalysis#Arabic
>> Doesn't it do what you describe?
>> Synonyms probably work there too.
>>
>> Paul
>>
>> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
>> > 9 November 2015 17:47
>> > Thanks Jack,
>> >
>> > This is a good solution, but we have more combinations that I don't think
>> > can be handled as synonyms, such as every name that starts with ‘عبد’ ‘Abd’
>> > or ‘أبو’ ‘Abo’. When the standard tokenizer is applied to ‘أبو بكر’ ‘Abo
>> > Bakr’, it is tokenised into ‘أبو’ and ‘بكر’, and the filters are
>> > applied to each term separately.
>> >
>> > Is there a tokeniser available that tokenises ‘أبو *’ or ‘عبد *’ as a
>> > single term?
>> >
>> > Thanks,
>> > Mahmoud
>> >
>> >
>> >
>> > Jack Krupansky <mailto:jack.krupan...@gmail.com>
>> > 9 November 2015 16:47
>> > Use an index-time (but not query-time) synonym filter with a rule like:
>> >
>> > Abd Allah,Abdallah
>> >
>> > This will index the combined word in addition to the separate words.
>> >
>> > -- Jack Krupansky
>> >
>> > On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem <prog.mahm...@gmail.com>
>> >
>> > Mahmoud Almokadem <mailto:prog.mahm...@gmail.com>
>> > 9 November 2015 10:48
>> > Hello,
>> >
>> > We are indexing Arabic content and have a problem tokenizing multi-term
>> > phrases like 'عبد الله' 'Abd Allah': users search for 'عبدالله'
>> > 'Abdallah' without a space and need to get the results for 'عبد الله'
>> > with a space. We are using StandardTokenizer.
>> >
>> >
>> > Is there any configuration to handle this case?
>> >
>> > Thank you,
>> > Mahmoud
>> >
>>
>>
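
For illustration, the index-time expansion discussed in this thread can be
sketched in plain Python. This is not Solr or Lucene code: the function name
expand_compound_names and the PREFIX_WORDS set are hypothetical, and the
prefix list contains only the two words mentioned in the thread. The sketch
keeps the separate tokens and adds the joined form, so that a query for the
unspaced spelling could still find the spaced one.

```python
# Hypothetical sketch of index-time expansion; not a Solr/Lucene API.
# Assumption: the name prefixes are limited to the two from the thread.
PREFIX_WORDS = {"عبد", "أبو"}


def expand_compound_names(tokens):
    """Return the tokens, adding a joined form after each prefix word.

    For a prefix word followed by another token, the concatenation of the
    two (no space) is emitted in addition to the original tokens.
    """
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if tok in PREFIX_WORDS and i + 1 < len(tokens):
            # Keep the separate words and also index the combined word.
            out.append(tok + tokens[i + 1])
    return out


if __name__ == "__main__":
    # Whitespace splitting stands in for the tokenizer in this sketch.
    print(expand_compound_names("عبد الله".split()))
```

How to get the same token stream out of a Solr analysis chain is left open
here; the sketch only shows the intended result for prefix-based names
without listing every name as a synonym.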
