Hi Shawn and everyone who replied to the thread,

The Solr version is 5.2.1, and each document returns multi-valued fields for 
the majority of the fields defined in schema.xml. I'm in the process of pasting 
the content of my files to a paste website and will update the thread soon.

Thanks,
Srinivas


On 11/19/2018 2:31 AM, Srinivas Kashyap wrote:
> I have a Solr core with some 20 fields in it (all are stored and indexed). 
> For an environment, the number of documents is around 0.29 million. When I 
> run the full import through DIH, indexing completes successfully, but the 
> index occupies around 5 GB of disk space. Is there a way to check which 
> documents are consuming the most space? Put another way, can I sort the 
> index based on document size?

I am not aware of any way to do that.  There might be one that I don't know 
about, but if there were, it seems like I would have come across it before.

It is not very likely that the large index size is due to a single document or a 
handful of documents.  It is more likely that most documents are relatively 
large.  I could be wrong about that, though.
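One rough way to test that guess, offered as an assumption rather than anything Solr provides: fetch the stored documents yourself (for example by paging through a /select query with fl=* and cursorMark) and rank them by the length of their serialized stored fields. The helper rank_by_stored_size is hypothetical. It only approximates stored-field volume; the on-disk index also holds postings, norms, docValues and compressed data, so these numbers will not add up to the 5 GB.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def rank_by_stored_size(
    docs: Iterable[Dict[str, Any]], id_field: str = "id"
) -> List[Tuple[Any, int]]:
    """Return (doc id, approximate size in bytes) pairs, largest first.

    The size is the UTF-8 length of each document serialized as compact JSON.
    This approximates stored-field volume only, not the on-disk index size.
    """
    sizes = []
    for doc in docs:
        encoded = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
        sizes.append((doc.get(id_field), len(encoded.encode("utf-8"))))
    # Largest documents first, so outliers (if any) show up at the top.
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes


# Tiny made-up sample; real input would be the "docs" array from Solr responses.
sample = [{"id": "a", "name": "x"}, {"id": "b", "tags": ["red", "green", "blue"]}]
print(rank_by_stored_size(sample))  # [('b', 40), ('a', 21)]
```

If the top few entries are not dramatically larger than the rest, that supports the idea that most documents are simply large.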

If you have 290000 documents (which is how I interpreted 0.29 million) and the 
total index size is about 5 GB, then the average size per document in the index 
is about 18 kilobytes.  This is, in my view, pretty large.  I think most 
documents are typically 1-2 kilobytes.
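The arithmetic behind that estimate, as a small sketch. Whether "5 GB" means 5 x 2^30 or 5 x 10^9 bytes is an assumption: with 2^30 the average comes out near 18 KiB, and with 10^9 it is about 17 KB, so the conclusion is the same either way.

```python
def average_doc_size_kib(index_bytes: int, num_docs: int) -> float:
    """Average index size per document in KiB (1 KiB = 1024 bytes)."""
    if num_docs <= 0:
        raise ValueError("num_docs must be positive")
    return index_bytes / num_docs / 1024


# Figures from the thread: about 5 GB on disk, about 0.29 million documents.
# Reading "5 GB" as 5 GiB here is an assumption.
print(round(average_doc_size_kib(5 * 1024**3, 290_000), 1))  # 18.1
```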

Can we get your Solr version, a copy of your schema, and exactly what Solr 
returns in search results for a typically sized document?  You'll need to use a 
paste website or a file-sharing website ... if you try to attach these things 
to a message, the mailing list will most likely eat them, and we'll never see 
them. If you need to redact the information in search results ... please do it 
in a way that we can still see the exact size of the text -- don't just remove 
information, replace it with information that's the same length.
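A minimal sketch of length-preserving redaction, assuming the values are plain strings; redact_same_length is a hypothetical helper, not part of Solr. Letters become x and digits become 0, while spaces and punctuation stay in place, so each value keeps its character count. For non-ASCII text the UTF-8 byte count can still shrink (é is two bytes, x is one), so the size is only exact for ASCII input.

```python
import re


def redact_same_length(value: str) -> str:
    """Mask a string without changing its length in characters."""
    # [^\W\d_] matches any letter: word characters minus digits and underscore.
    masked = re.sub(r"[^\W\d_]", "x", value)
    # Digits become 0 so numeric formats (part numbers, dates) keep their shape.
    return re.sub(r"\d", "0", masked)


print(redact_same_length("Acme Corp, PO 12345"))  # xxxx xxxx, xx 00000
```

Applying it to field values only, and leaving field names and the response structure alone, keeps the pasted result representative of the real document size.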

Thanks,
Shawn

