Re: Apache Solr Query Issue with huge data

uyil...@vivaldi.net.INVALID Fri, 05 Apr 2024 02:07:21 -0700

Hi,

Solr usually fills the heap with various caches, so I wouldn't worry much about 
it consuming 90% of the heap unless I get OutOfMemory errors.


Pagination using the start and rows parameters is intended for cases where the 
row count is low and the page number is also small (e.g. rows=10, second page). 
It becomes problematic when either of them is high.
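
To make the cost concrete, here is a rough sketch of the arithmetic, assuming the usual distributed-search behaviour where every shard has to collect and rank its top start + rows entries for each page, and the coordinating node merges what the shards send back. The class and method names are made up for illustration, and the numbers are upper bounds (a shard cannot return more entries than it has matches).

```java
public class DeepPagingCost {
    // Entries a single shard has to collect and rank for one page: start + rows.
    // This is an upper bound; a shard with fewer matches returns fewer entries.
    static long perShardEntries(long start, long rows) {
        return start + rows;
    }

    // Upper bound on the entries the coordinating node merges across all shards.
    static long coordinatorEntries(int shards, long start, long rows) {
        return shards * perShardEntries(start, rows);
    }

    public static void main(String[] args) {
        int shards = 4;        // shard count from the original question
        long rows = 300_000;   // page size from the original question (3 lakhs)
        for (long start = 0; start < 5_000_000; start += rows) {
            System.out.printf("start=%d perShard=%d coordinator=%d%n",
                    start,
                    perShardEntries(start, rows),
                    coordinatorEntries(shards, start, rows));
        }
    }
}
```

The work per request grows with start, so each later page is more expensive than the one before it, which matches the increasing response times you are seeing.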

To read millions of rows from Solr without issues, take a look at the /export 
handler (or /stream handler if you are using SolrCloud and want to export from 
a distributed collection in one go).
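
As a minimal sketch of calling /export from plain Java (no SolrJ), something like the following could work. The base URL, collection name, field names and output path are all placeholders I made up; the q, fl and sort parameters are the ones /export expects. It assumes the URL points at a single core or a single-shard collection; for a multi-shard collection, /stream is the handler to look at, as noted above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class ExportSketch {
    // URL-encode a single query-string value.
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Build the /export request URI. All arguments are caller-supplied placeholders.
    static URI exportUri(String baseUrl, String collection,
                         String query, String fields, String sort) {
        String queryString = "q=" + enc(query)
                + "&fl=" + enc(fields)
                + "&sort=" + enc(sort);
        return URI.create(baseUrl + "/" + collection + "/export?" + queryString);
    }

    // Stream the response body straight to a file so the full result set
    // is never held in the client's heap. Returns the HTTP status code.
    static int downloadTo(URI uri, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<Path> response =
                client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Hypothetical host, collection and fields; adjust to your setup.
        URI uri = exportUri("http://localhost:8983/solr", "mycollection",
                "*:*", "id,name_s", "id asc");
        System.out.println(uri);
    }
}
```

Calling downloadTo(uri, Path.of("export.json")) would then write the exported documents to disk, and the PDF generation can read from that file in whatever chunk size suits it.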

The only downside of the /export handler is that it requires all of the fields 
you want to export to have docValues enabled, so you may need to write a new 
schema and reindex your documents. Another catch is that most of the text field 
types don't support docValues, so you might need a copyField for each such field 
to store its value in a "string" typed field.
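
For illustration, here is a small sketch that builds a Schema API request body (to be POSTed to the collection's /schema endpoint) adding a docValues-enabled string field and a copyField into it. The field names "title" and "title_str" are hypothetical, and setting indexed and stored to false is just one possible choice for an export-only field; editing the schema file by hand would work equally well.

```java
public class DocValuesSchemaSketch {
    // Build a Schema API JSON body that adds a string field with docValues
    // and a copyField from an existing text field into it.
    // Both field names are supplied by the caller and are placeholders here.
    static String schemaCommands(String sourceField, String exportField) {
        return "{\n"
                + "  \"add-field\": {\"name\": \"" + exportField + "\", \"type\": \"string\", "
                + "\"indexed\": false, \"stored\": false, \"docValues\": true},\n"
                + "  \"add-copy-field\": {\"source\": \"" + sourceField + "\", "
                + "\"dest\": [\"" + exportField + "\"]}\n"
                + "}";
    }

    public static void main(String[] args) {
        // Hypothetical field names for the example.
        System.out.println(schemaCommands("title", "title_str"));
    }
}
```

Documents indexed before the schema change won't have values in the new field, hence the reindexing mentioned above.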

--ufuk yilmaz


________________________________
From: prasad bezavada <prasadbezav...@gmail.com>
Sent: Friday, April 5, 2024 11:36 AM
To: users@solr.apache.org <users@solr.apache.org>
Subject: Apache Solr Query Issue with huge data

Dear Team,

I'm currently using Solr version 8.11.3 on a machine with 125 GB of physical
memory and a 64 GB heap. The collection comprises 4 shards within the same
node. Through our Java application (SolrJ), we indexed approximately
8 million records from an RDBMS table into Solr.

Presently, my task is to query this index and export the results (5
million records matched by my Solr query) to PDF format via our Java
application. To avoid potential heap memory issues, I've implemented
pagination (3 lakhs, i.e. 300,000 rows per page) using the start and rows
parameters.

However, I've encountered an issue where the response time for subsequent
queries to fetch the next set of results (e.g., 3 to 6 lakhs, 6 to 9 lakhs)
progressively increases, leading to socket timeout exceptions.
Additionally, Solr's physical memory consumption exceeds 90%, without
releasing it.

I have several queries regarding this situation:

1. Why does the query time in Solr increase with each pagination query?
2. What causes Solr to occupy over 90% of physical memory and fail to release
   it?
3. What would be the optimal approach for retrieving 5 million records from
   our Java application and exporting them to PDF or other file formats?

Your insights and suggestions on resolving these issues would be greatly
appreciated.


--
*Thanks&Regards*

*Prasad Bezavada*
