Hi there. We are currently indexing some PDF files; the main indexing handler is /extract, where we do some simple processing (extract the relevant fields and store them in a few fields).
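For context, each document is sent with a plain curl call against the extracting handler, roughly like this (the core name, document id, and file path are placeholders; Solr's ExtractingRequestHandler is usually mounted at /update/extract, but ours is registered at /extract):

    curl "http://localhost:8983/solr/mycore/extract?literal.id=doc-0001&commit=false" \
         -F "file=@/data/pdfs/doc-0001.pdf"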
The PDFs are about 10 MB to 100 MB each, and we need the extracted text to remain available, so we store it. Everything works correctly in our test stages, but when we try to index all 14K files (around 120 GB) from a client application that just sends HTTP requests via curl through 3-4 concurrent threads to the /extract handler (a minimal sketch of the client loop is at the end of this message), Solr crashes. I can't find any relevant information in the Solr logs (we checked server/logs and the core's tlog directory).

My question is about performance. I think this is a fairly small amount of data to process. The deployment is a single Solr instance in a Docker container with a 4 GB JVM heap and ~50 GB of physical memory (as reported by the dashboard). I don't think it's normal behaviour for this handler to crash. So, what are some general tips for improving performance in this scenario?
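For reference, the client side is essentially just concurrent curls, along these lines (a minimal sketch: the directory, core name, and id scheme are illustrative, and file names are assumed to be URL-safe):

    # Post every PDF with up to 4 concurrent curl requests,
    # using the file name (without extension) as the document id.
    find /data/pdfs -name '*.pdf' -print0 \
      | xargs -0 -P 4 -I {} sh -c '
          f="$1"; id=$(basename "$f" .pdf)
          curl -s "http://localhost:8983/solr/mycore/extract?literal.id=${id}&commit=false" \
               -F "file=@${f}"
        ' _ {}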