Re: Is this too much time for full Data Import?

Mikhail Khludnev Tue, 07 Aug 2012 23:37:03 -0700

Hello,

Is your indexer actually saturating CPU or I/O? Check with iostat/vmstat.
If it isn't, take several thread dumps with the jvisualvm sampler or jstack
and work out what is blocking your threads from making progress.
You may need to speed up your SQL data consumption. To do that, you can
enable threads in DIH (only in 3.6.1), or move from N+1 SQL queries to a
select-all/cache approach:
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor and
https://issues.apache.org/jira/browse/SOLR-2382
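
To make the N+1 vs. select-all/cache difference concrete, here is a minimal
sketch in Python with an in-memory SQLite database. It is not DIH code and not
your schema: the doc/tag tables and the function names are made up for
illustration. It only shows the idea behind CachedSqlEntityProcessor, which is
to read each child table once and join in memory instead of querying per
document.

```python
import sqlite3
from collections import defaultdict


def build_db():
    """Create a tiny hypothetical schema: parent docs and child tags."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE doc (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tag (doc_id INTEGER, name TEXT);
        INSERT INTO doc VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO tag VALUES (1, 'x'), (1, 'y'), (3, 'z');
    """)
    return conn


def import_n_plus_one(conn):
    """One parent query plus one child query per parent row."""
    parents = conn.execute("SELECT id, title FROM doc ORDER BY id").fetchall()
    queries = 1
    docs = []
    for doc_id, title in parents:
        rows = conn.execute(
            "SELECT name FROM tag WHERE doc_id = ? ORDER BY name", (doc_id,)
        ).fetchall()
        queries += 1
        docs.append({"id": doc_id, "title": title, "tags": [r[0] for r in rows]})
    return docs, queries


def import_cached(conn):
    """Read the child table once into a dict, then join in memory."""
    cache = defaultdict(list)
    for doc_id, name in conn.execute("SELECT doc_id, name FROM tag ORDER BY name"):
        cache[doc_id].append(name)
    queries = 1
    docs = []
    for doc_id, title in conn.execute("SELECT id, title FROM doc ORDER BY id"):
        # .get() avoids creating empty cache entries for docs without tags
        docs.append({"id": doc_id, "title": title, "tags": cache.get(doc_id, [])})
    queries += 1
    return docs, queries


if __name__ == "__main__":
    conn = build_db()
    slow_docs, slow_queries = import_n_plus_one(conn)
    fast_docs, fast_queries = import_cached(conn)
    print(slow_docs == fast_docs, slow_queries, fast_queries)
```

Both functions build the same documents, but the query count in the first one
grows with the number of documents. With ~9m documents and ~15 queries each,
that is roughly 135 million round trips to a remote database, against one
query per entity in the cached variant. The trade-off is that the cached
child rows have to fit in memory.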


Good luck

On Wed, Aug 8, 2012 at 9:16 AM, Pranav Prakash <pra...@gmail.com> wrote:

> Folks,
>
> My full data import takes ~80hrs. It has ~9m documents and ~15 SQL
> queries for each document. The database servers are separate from the Solr
> servers. Each document goes through an update processor chain which (a)
> calculates a signature of the document using SignatureUpdateProcessorFactory
> and (b) finds terms which have term frequency > 2, using a custom processor.
> The index size is ~480GiB.
>
> I want to know if the amount of time taken is too large compared to the
> document count? How do I benchmark the stats and what are some of the ways
> I can improve this? I believe there are some optimizations that I could do
> at Update Processor Factory level as well. What would be a good way to get
> dirty on this?
>
> *Pranav Prakash*
>
> "temet nosce"
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
