Hi Erick,

"You still haven’t given an example of the results you’re seeing that are unexpected."

Here is an example of the data I received. Before starting the data update I have:
solrCloud: Expected series criteria:386062
Collected series: 386062
Number of requests: 40
Collected unique series: 386062.
The results are similar for the individual nodes in SolrCloud.
During the process of updating the series I have:
solrCloud: Expected series criteria:386062
Collected series: 445550
Number of requests: 124
Collected unique series: 386062.
First node:
Expected series criteria:386062
Collected series: 1442775
Number of requests: 146
Collected unique series: 386062.
Second node:
Expected series criteria:386062
Collected series: 242823
Number of requests: 26
Collected unique series: 242823.
After the data update completes, I get the same data as before the update.
Best,
Vlad
Mon, 28 Sep 2020 10:51:01 -0400, Erick Erickson <erickerick...@gmail.com> wrote:

I said nothing about docId changing. _Any_ sort criterion changing is an issue. You’re sorting by score. As you index documents, the new docs change the values used to calculate scores for _all_ documents, thus changing the sort order and potentially causing unexpected results when using cursorMark. That said, I don’t think you’re getting any different scores at all if you’re really searching for “(* AND *)”. Try returning score in the fl list; are they different?

You still haven’t given an example of the results you’re seeing that are unexpected. And my assumption is that you are seeing odd results when you call this query again with a cursorMark returned by a previous call. Or are you saying that you don’t think facet.query is returning the correct count? Be aware that Solr doesn’t support true Boolean logic, see: https://lucidworks.com/post/why-not-and-or-and-not/

There’s special handling for the form “fq=NOT something” to change it to “fq=*:* NOT something” that’s not present in something like “q=NOT something”. How that plays in facet.query I’m not sure, but try “facet.query=*:* NOT something” if the facet count is what the problem is.
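
For illustration only, here is a small Python sketch of how a request with the explicit “*:* NOT something” form and with score added to fl could be assembled. The function name `build_query` is hypothetical, the field names are taken from the request quoted in this thread, and `facet=true` is an assumption in case the request handler does not already enable faceting.

```python
from urllib.parse import urlencode


def build_query(series_filter="SERIES_ID:0"):
    """Build a Solr query string that spells out the "*:*" in pure-negative clauses."""
    params = [
        ("q", "*:*"),                                   # "select everything"
        ("fq", f"*:* NOT {series_filter}"),             # explicit form of fq=NOT ...
        ("facet", "true"),                              # assumption: faceting not on by default
        ("facet.query", f"{{!key=NUM_DOCS}}*:* NOT {series_filter}"),
        ("fl", "SERIES_ID,score"),                      # return score so it can be compared
    ]
    # urlencode percent-encodes spaces, colons, braces and so on.
    return urlencode(params)


if __name__ == "__main__":
    print("select?" + build_query())
```

Comparing the score values returned before and during the update should show whether the sort order is really shifting.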

I have no idea what you’re trying to accomplish with (* AND *) unless those are just placeholders and you put real text in them. That’s rather odd. *:* is “select everything”...

BTW, returning 10,000 docs is somewhat of an anti-pattern. If you really require that many documents, consider streaming.

On Sep 28, 2020, at 10:21 AM, vmakov...@xbsoftware.by wrote:

Hi, Erick

I have a Python script that sends requests with cursorMark. The script checks the data and reports the following values:
Expected series criteria:
Collected series:
Number of requests:
Collected unique series:
The request looks like this: select?indent=off&defType=edismax&wt=json&facet.query={!key=NUM_DOCS}NOT SERIES_ID:0&fq=NOT SERIES_ID:0&spellcheck=true&spellcheck.collate=true&spellcheck.extendedResults=true&facet.limit=-1&q=(* AND *)&qf=all_text_stemming all_text&fq=facet_db_code:( "CN" )&fq=-SERIES_CODE:( "TEST" )&fl=SERIES_ID&sort=score desc,docId asc&bq=SERIES_STATUS:T^5&bq=KEY_SERIES_FLAG:1^5&bq=accuracy_name:0&bq=SERIES_STATUS:C^-30&rows=10000&cursorMark=*
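
A minimal sketch of such a cursorMark loop, not the actual script: it follows the documented pattern of starting with cursorMark=*, reading nextCursorMark from each response, and stopping when the returned mark equals the one that was sent. The names `collect_series`, `fake_fetch` and `PAGES` are hypothetical; `fake_fetch` only stands in for a real HTTP call so the example runs without a Solr instance.

```python
# Canned responses keyed by the cursorMark that was sent (illustrative data only).
PAGES = {
    "*": ([1, 2], "A"),
    "A": ([2, 3], "B"),   # series 2 shows up again, as can happen during an update
    "B": ([], "B"),       # same mark returned: nothing left to read
}


def fake_fetch(params):
    """Stand-in for an HTTP request to Solr; returns a response-shaped dict."""
    ids, next_mark = PAGES[params["cursorMark"]]
    return {
        "response": {"docs": [{"SERIES_ID": i} for i in ids]},
        "nextCursorMark": next_mark,
    }


def collect_series(fetch, params):
    """Page through all results with cursorMark and count what was collected."""
    cursor = "*"
    requests_sent = 0
    collected = []
    while True:
        data = fetch({**params, "cursorMark": cursor})
        requests_sent += 1
        collected.extend(doc["SERIES_ID"] for doc in data["response"]["docs"])
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:   # end of results
            break
        cursor = next_cursor
    return {
        "collected": len(collected),
        "unique": len(set(collected)),
        "requests": requests_sent,
    }


if __name__ == "__main__":
    print(collect_series(fake_fetch, {"rows": 2}))
```

With a real server, `fetch` would wrap the HTTP call and return the parsed JSON body. A gap between the collected and unique counts means some series were returned more than once.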

DocId does not change during the data update. During the data update process in SolrCloud, the script returned an incorrect Number of requests and Collected series.

Best,
Vlad


Mon, 28 Sep 2020 08:54:57 -0400, Erick Erickson <erickerick...@gmail.com> wrote:

Define “incorrect” please. Also, showing the exact query you use would be helpful. That said, indexing data at the same time you are using cursorMark is not guaranteed to find all documents. Consider a sort with date asc, id asc. doc53 has a date of 2001 and you’ve already returned the doc. Next, you update doc53 to 2020. It now appears again sometime later in the results due to the changed data. Or the other way: doc53 starts with 2020, and while your cursorMark is in 2010, you change doc53 to have a date of 2001. It will never be returned. Similarly for anything else you change that’s relevant to the sort criteria you’re using. CursorMark doesn’t remember _documents_, just, well, call it the fingerprint (i.e. sort criteria values) of the last document returned so far.
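
The two cases above can be reproduced with a toy in-memory model. This is only a sketch of the idea, not how Solr implements cursors: the cursor is simply the (date, id) pair of the last document returned, and the names `fetch_page` and `scan` and the sample documents are made up for illustration.

```python
def fetch_page(docs, cursor, rows):
    """Return up to `rows` doc ids whose (date, id) key sorts strictly after `cursor`."""
    keys = sorted((date, doc_id) for doc_id, date in docs.items())
    page = [k for k in keys if cursor is None or k > cursor][:rows]
    next_cursor = page[-1] if page else cursor
    return [doc_id for _, doc_id in page], next_cursor


def scan(docs, rows, update):
    """Page through `docs`, applying `update` to the index after the first page."""
    docs = dict(docs)            # work on a copy so the caller's data is untouched
    collected, cursor, first = [], None, True
    while True:
        page, cursor = fetch_page(docs, cursor, rows)
        if not page:
            return collected
        collected.extend(page)
        if first:
            docs.update(update)  # simulate indexing while the scan is in progress
            first = False


if __name__ == "__main__":
    # doc53 was already returned with date 2001, then moves to 2020: returned twice.
    print(scan({"a": 2000, "doc53": 2001, "b": 2005, "c": 2010, "d": 2015},
               2, {"doc53": 2020}))
    # doc53 starts at 2020, then moves to 2001 behind the cursor: never returned.
    print(scan({"a": 2000, "b": 2005, "c": 2010, "d": 2015, "doc53": 2020},
               2, {"doc53": 2001}))
```

In the first run the collected count exceeds the unique count; in the second, one document is missing from the results altogether.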
Best,
Erick
On Sep 28, 2020, at 3:32 AM, vmakov...@xbsoftware.by wrote:
Good afternoon,
Could you please suggest a solution: during the data update process in SolrCloud, requests with cursorMark return incorrect data. I suppose that the result pages do not follow on from each other during indexing, because the data doesn't have enough time to be replicated between the nodes.
Kind regards,
Vladislav Makovski
Developer
XB Software Ltd. | Minsk, Belarus
Site: https://xbsoftware.com
Skype: vlad__makovski
Cell:  +37529 6484100