Hello,

Is your indexer actually using the CPU and I/O? Check with iostat/vmstat. If it isn't, take several thread dumps with the jvisualvm sampler or jstack and try to work out what is blocking your threads from making progress. You may need to speed up your SQL data consumption. To do that, you can enable threads in DIH (only in 3.6.1), or move from N+1 SQL queries to a select-all/cache approach: http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor and https://issues.apache.org/jira/browse/SOLR-2382
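To make the N+1 versus select-all/cache difference concrete, here is a minimal sketch in Python against an in-memory SQLite database. It is not DIH configuration and not how CachedSqlEntityProcessor is implemented; the tables (doc, tag), columns, and function names are all made up for illustration. It only shows the idea: read the child table once into a map keyed by the parent id, instead of issuing one child query per document.

```python
import sqlite3
from collections import defaultdict


def build_db():
    """Create a tiny hypothetical schema: documents and their tags."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE tag (doc_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO doc VALUES (?, ?)",
                     [(1, "a"), (2, "b"), (3, "c")])
    conn.executemany("INSERT INTO tag VALUES (?, ?)",
                     [(1, "x"), (1, "y"), (3, "z")])
    return conn


def import_n_plus_one(conn):
    """One query for the parents, then one extra query per parent row."""
    parents = conn.execute("SELECT id, title FROM doc ORDER BY id").fetchall()
    queries = 1
    docs = []
    for doc_id, title in parents:
        rows = conn.execute(
            "SELECT name FROM tag WHERE doc_id = ? ORDER BY name",
            (doc_id,),
        ).fetchall()
        queries += 1
        docs.append({"id": doc_id, "title": title,
                     "tags": [r[0] for r in rows]})
    return docs, queries


def import_cached(conn):
    """Read the child table once, cache it by parent id, then join in memory."""
    cache = defaultdict(list)
    for doc_id, name in conn.execute(
            "SELECT doc_id, name FROM tag ORDER BY name"):
        cache[doc_id].append(name)
    queries = 1
    docs = []
    for doc_id, title in conn.execute("SELECT id, title FROM doc ORDER BY id"):
        # .get() avoids creating empty cache entries for documents without tags
        docs.append({"id": doc_id, "title": title,
                     "tags": cache.get(doc_id, [])})
    queries += 1
    return docs, queries


if __name__ == "__main__":
    conn = build_db()
    slow_docs, slow_queries = import_n_plus_one(conn)
    fast_docs, fast_queries = import_cached(conn)
    print("same documents:", slow_docs == fast_docs)
    print("queries issued:", slow_queries, "vs", fast_queries)
```

The query count of the first function grows with the number of documents, while the second stays constant at the cost of holding the child rows in memory. With ~15 child queries per document, that trade-off is worth checking against the heap you have available.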
Good luck

On Wed, Aug 8, 2012 at 9:16 AM, Pranav Prakash <pra...@gmail.com> wrote:

> Folks,
>
> My full data import takes ~80hrs. It has around ~9m documents and ~15 SQL
> queries for each document. The database servers are different from Solr
> Servers. Each document has an update processor chain which (a) calculates
> signature of the document using SignatureUpdateProcessorFactory and (b)
> Finds out terms which have term frequency > 2; using a custom processor.
> The index size is ~ 480GiB
>
> I want to know if the amount of time taken is too large compared to the
> document count? How do I benchmark the stats and what are some of the ways
> I can improve this? I believe there are some optimizations that I could do
> at Update Processor Factory level as well. What would be a good way to get
> dirty on this?
>
> *Pranav Prakash*
>
> "temet nosce"
>

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>