Re: Arabic words search in solr

Steve Rowe Thu, 09 Mar 2017 11:14:55 -0800

Hi Mohan,

Your examples refer to documents I don’t have in my 9-document set, so I recast 
the problem as a query/doc combo I have from earlier in this thread, and I was 
able to restrict hits to only documents that contain all terms from the query.

If I use the query “name_ar:(شرطة ازكي)” I get 3 hits (I’ve left out some 
details):

-----
{ "responseHeader": { ... "params": { "q":"name_ar:(شرطة ازكي)", ... } },
  "response": { "numFound":3, "start":0,
    "docs": [
      { "id":"6", "name_ar":["شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية 
- - مركز شرطة إزكي"], ... },
      { "id":"3", "name_ar":["شرطة عمان السلطانية - قيادة شرطة محافظة شمال 
الشرقية - - مركز شرطة إبراء"], ... },
      { "id":"8", "name_ar":["وزارة الصحة - المديرية العامة للخدمات الصحية  
محافظة الداخلية -  - مستشفى إزكي (البدالة)  - الطوارئ"], ... }]}
-----

If I add “q.op=AND” to the request, only one of these documents matches - note 
that I’ve also checked the “debugQuery” option on the Admin UI:

-----
{ "responseHeader": { ...
  "params": { "q":"name_ar:(شرطة ازكي)", "q.op":"AND", "debugQuery":"true", ...
} },
  "response": { "numFound":1, "start":0,
    "docs": [
      { "id":"6", "name_ar":["شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية 
- - مركز شرطة إزكي"], ... }]},
  "debug": {
    "rawquerystring": "name_ar:(شرطة ازكي)",
    "querystring": "name_ar:(شرطة ازكي)",
    "parsedquery": "+name_ar:شرط +name_ar:ازك",
    "parsedquery_toString": "+name_ar:شرط +name_ar:ازك",
-----

Note the “parsedquery” above: each term carries its own field prefix and a 
leading ‘+’, which marks the term as required.  This is how the 
“name_ar:(شرطة ازكي)” query is interpreted when the “q.op=AND” request param 
is used.
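
As a rough sketch of the request behind the output above, the following Python 
builds a select URL carrying the “q.op” and “debugQuery” params.  The host, 
port, and core name (“arabic_core”) are placeholders I made up, not details 
from this thread; only the param names and the query come from the output 
above.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical Solr location; substitute your own host and core name.
SOLR_SELECT = "http://localhost:8983/solr/arabic_core/select"


def build_select_url(query, default_op="AND", debug=True):
    """Return a select URL for `query` with the given default operator."""
    params = {"q": query, "q.op": default_op}
    if debug:
        # debugQuery asks Solr to include the "debug" section in the response.
        params["debugQuery"] = "true"
    # urlencode percent-encodes the Arabic text as UTF-8.
    return SOLR_SELECT + "?" + urlencode(params)


url = build_select_url("name_ar:(شرطة ازكي)")
print(url)

# Decoding the query string again recovers the original params.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q.op"])
```

Fetching that URL (for example with urllib.request.urlopen) should return 
output shaped like the excerpt above, assuming a core with a “name_ar” field 
exists at that location.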

The equivalent query using ‘+’ signs is: “name_ar:(+شرطة +ازكي)”.  This *looks* 
strange because of how the Unicode bidirectional algorithm works: the 
characters are stored in the order typed, with each ‘+’ immediately before its 
term, but they may be displayed in a different order.  This W3C writeup uses 
Arabic to drive its discussion of display of strings that contain both RTL and 
LTR character runs, and I found it quite helpful here: 
<https://www.w3.org/International/articles/inline-bidi-markup/uba-basics>.
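
One way to check what is actually stored in such a query, independent of how 
it is displayed, is to walk the string in logical order.  This is only a 
small sketch using Python’s standard unicodedata module; the helper name 
follows_plus is mine.

```python
import unicodedata

query = "name_ar:(+شرطة +ازكي)"

# Print each character in logical (storage) order with its bidi class:
# "L" = left-to-right letter, "AL" = Arabic letter,
# "ES" = European separator (the '+' sign).
for index, char in enumerate(query):
    print(index, repr(char), unicodedata.bidirectional(char))


def follows_plus(text):
    """Return the character stored immediately after each '+' in `text`."""
    return [text[i + 1] for i, c in enumerate(text)
            if c == "+" and i + 1 < len(text)]


# In storage order, each '+' is followed directly by the first letter
# of the term it marks as required.
first_letters = follows_plus(query)
print(first_letters)
```

Whatever the display looks like, the listing shows each ‘+’ sitting directly 
in front of an Arabic letter, which is the order the query parser sees.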

Here’s the output from the “name_ar:(+شرطة +ازكي)” query:

-----
{ "responseHeader": { ... "params": { "q":"name_ar:(+شرطة +ازكي)", 
"debugQuery":"true", ... } },
  "response": { "numFound":1, "start":0,
    "docs": [
      { "id":"6", "name_ar":["شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية 
- - مركز شرطة إزكي"], ... }]},
  "debug": {
    "rawquerystring": "name_ar:(+شرطة +ازكي)",
    "querystring": "name_ar:(+شرطة +ازكي)",
    "parsedquery": "+name_ar:شرط +name_ar:ازك",
    "parsedquery_toString": "+name_ar:شرط +name_ar:ازك",
-----

The above is the same result (and has the same “parsedquery”) as the query 
“name_ar:(شرطة ازكي)” with request param “q.op=AND”.

I won’t show it here, but I get the same 1-hit result for this query when I use 
AND instead of ‘+’: “name_ar:(شرطة AND ازكي)”.  Note that the terms only 
*appear* to be in reverse order because of how the Unicode bidirectional 
algorithm works.
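
Because these queries are awkward to type and read in a bidi display, it may 
be easier to assemble them in code.  Below is a minimal sketch of the ‘+’ and 
AND forms; the function names are hypothetical, and it assumes the terms 
contain no query-syntax special characters (nothing is escaped here).

```python
def require_all_plus(field, terms):
    """Build field:(+t1 +t2 ...), marking every term as required with '+'."""
    return f"{field}:(" + " ".join("+" + term for term in terms) + ")"


def require_all_and(field, terms):
    """Build field:(t1 AND t2 ...), joining the terms with the AND operator."""
    return f"{field}:(" + " AND ".join(terms) + ")"


terms = ["شرطة", "ازكي"]
plus_query = require_all_plus("name_ar", terms)
and_query = require_all_and("name_ar", terms)

print(plus_query)
print(and_query)

# Regardless of how the lines above are displayed, the terms are stored
# in the order given in the list.
print(plus_query.index(terms[0]) < plus_query.index(terms[1]))
```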

> On Mar 9, 2017, at 2:30 AM, mohanmca01 <mohanmc...@gmail.com> wrote:
> 
> I saw your products in lucidworks website. Do you have any solr arabic
> support customized product?

Lucidworks doesn’t have a specifically Arabic-focused product, but we have 
helped people enable Arabic search in the past.  Click on the “Contact Us” link 
on the website if you’d like to talk to us about getting involved.

--
Steve
www.lucidworks.com
