Hi,

Solr usually fills the heap with various caches, so I wouldn't worry much about it consuming 90% of the heap unless you are getting OutOfMemory errors.
Pagination with the start and rows parameters is intended for cases where the row count is very low and the page number is also small (e.g. rows=10, page 2). It becomes problematic when either of them is high, because Solr has to collect and sort the first start+rows documents for every request, so each later page costs more than the one before it.

To read millions of rows from Solr without issues, take a look at the /export handler (or the /stream handler if you are using SolrCloud and want to export from a distributed collection in one go).

The only downside of the /export handler is that it requires all of the fields you want to export to have docValues enabled, so you may need to write a new schema and reindex your documents. Another catch is that most of the text field types don't support docValues, so you might need copyFields that store each such field in a "string" typed field.

--ufuk yilmaz

________________________________
From: prasad bezavada <prasadbezav...@gmail.com>
Sent: Friday, April 5, 2024 11:36 AM
To: users@solr.apache.org <users@solr.apache.org>
Subject: Apache Solr Query Issue with huge data

Dear Team,

I'm currently using Solr version 8.11.3 on a machine with 125 GB of physical memory and a 64 GB heap. The collection comprises 4 shards within the same node. Through our Java application (SolrJ), we indexed approximately 8 million records from an RDBMS table into Solr.

Presently, my task is to query this index and export the results (5 million records match my Solr query) to PDF format via our Java application. To avoid potential heap memory issues, I've implemented pagination (3 lakh, i.e. 300,000, rows per page) in the query using the start and rows parameters. However, I've encountered an issue where the response time for subsequent queries to fetch the next set of results (e.g., 3 to 6 lakh, 6 to 9 lakh) progressively increases, leading to socket timeout exceptions. Additionally, Solr's physical memory consumption exceeds 90% and is not released.

I have several questions regarding this situation:

Why does the query time in Solr increase with each pagination query?
What causes Solr to occupy over 90% of physical memory and fail to release it?
What would be the optimal approach for retrieving 5 million records from our Java application and exporting them to PDF or other file formats?

Your insights and suggestions on resolving these issues would be greatly appreciated.

--
*Thanks & Regards*
*Prasad Bezavada*
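
For reference, here is a minimal sketch of what calling the /export handler from a Java application might look like, using only the JDK's built-in HTTP client rather than SolrJ. The base URL, collection name, field names and output path are hypothetical placeholders, not values from this thread. The handler expects q, fl and sort parameters, and every field listed in fl and sort needs docValues, as noted in the reply.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

final class SolrExportSketch {

    // Builds the /export request URL. All arguments are caller-supplied;
    // the values used in the usage note below are assumptions.
    static String buildExportUrl(String solrBaseUrl, String collection,
                                 String query, String fields, String sort) {
        return solrBaseUrl + "/" + collection + "/export"
                + "?q=" + encode(query)
                + "&fl=" + encode(fields)
                + "&sort=" + encode(sort);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Writes the response body straight to a file, so the full result set
    // is never held in the application's heap.
    static Path downloadExport(String exportUrl, Path target)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(exportUrl)).GET().build();
        HttpResponse<Path> response =
                client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            // The target file then holds Solr's error message, not the export.
            throw new IOException("Export failed with HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

A call might look like `SolrExportSketch.downloadExport(SolrExportSketch.buildExportUrl("http://localhost:8983/solr", "mycollection", "*:*", "id,name_s", "id asc"), Path.of("export.json"))`. The PDF generation step could then read the file incrementally instead of paging through Solr.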