Hi there.

We are currently indexing some PDF files. The main handler we hit is
/extract, where we perform simple processing (extract the relevant
content and store it in a few fields).
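For context, each request looks roughly like this (the core name, field
names, and id below are simplified placeholders for our real ones):

import requests

# Sketch of a single call to the /extract handler (Solr Cell).
# "mycore", "text_content", and the id are placeholders.
SOLR_EXTRACT = "http://localhost:8983/solr/mycore/update/extract"

params = {
    "literal.id": "doc-0001",        # stored id for this document
    "fmap.content": "text_content",  # map extracted body text to our field
    "commit": "false",               # we commit separately, not per request
}

with open("sample.pdf", "rb") as pdf:
    resp = requests.post(SOLR_EXTRACT, params=params, files={"file": pdf})
resp.raise_for_status()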

The PDF files are roughly 10 MB to 100 MB each, and we need the extracted
text to be available afterwards. Everything works correctly in our test
stages, but when we try to index all 14K files (around 120 GB) from a
client application that simply sends HTTP requests (via curl) to the
/extract handler over 3-4 concurrent threads, Solr crashes. I can't find
any relevant information in the Solr logs (we checked server/logs and
core_dir/tlog).
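In case it matters, the client boils down to something like this (URL,
paths, and thread count simplified):

import glob
from concurrent.futures import ThreadPoolExecutor

import requests

SOLR_EXTRACT = "http://localhost:8983/solr/mycore/update/extract"

def send(path):
    # Post one PDF; large files can take a while to parse, hence the timeout.
    with open(path, "rb") as pdf:
        resp = requests.post(
            SOLR_EXTRACT,
            params={"literal.id": path, "commit": "false"},
            files={"file": pdf},
            timeout=300,
        )
    if resp.status_code != 200:
        # Surface failures on the client, since the Solr logs show nothing.
        print(path, resp.status_code, resp.text[:200])

with ThreadPoolExecutor(max_workers=4) as pool:
    pool.map(send, glob.glob("pdfs/*.pdf"))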

My question is about performance. I think this is a fairly small amount of
data to be processing. The deployment is a single Solr instance in a Docker
container with 4 GB of JVM heap and ~50 GB of physical memory (as reported
by the dashboard).
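For what it's worth, we double-checked those numbers against the system
info endpoint that backs the dashboard (host and port below are our
container's):

import requests

# Same data the admin dashboard displays, fetched as JSON.
info = requests.get(
    "http://localhost:8983/solr/admin/info/system",
    params={"wt": "json"},
).json()

print(info["jvm"]["memory"])  # JVM heap figures shown on the dashboard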

I don't think it is normal behaviour for the handler to crash. So, what are
some general tips for improving performance in this scenario?


