Hello all,

We are using DIH to index our data (~6M documents) and it's taking an extremely long time (~24 hours). I am trying to find ways to speed this up. From reading older posts, my understanding is that it should not take this long.

One probable bottleneck is that we have a sub-entity pulling in item descriptions from a separate datasource. We run each description through JTidy and then strip the HTML from it. Our data-config looks something like this: http://pastie.org/2067011

I've heard about entity threads and was wondering whether they would help in my case. I haven't been able to find any good documentation on them.

Another possible bottleneck is the number of sub-entities we have: 5 (only 1 of which uses CachedSqlEntityProcessor). Any ideas?
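
To make the question concrete, here is a rough sketch of what switching a description-style sub-entity to CachedSqlEntityProcessor might look like. This is not our real config (that is in the pastie link above); the table names, column names, and the "descriptions" datasource name are made up for illustration, and the JTidy step is left out.

```xml
<!-- Hypothetical sketch only: names below are placeholders, not our schema. -->
<entity name="item" query="SELECT id, name FROM item">
  <!-- Without caching, the sub-entity query runs once per parent row.
       With CachedSqlEntityProcessor, it runs once and rows are looked up
       in memory using the "where" key (child column = parent column). -->
  <entity name="description"
          dataSource="descriptions"
          processor="CachedSqlEntityProcessor"
          query="SELECT item_id, description FROM item_description"
          where="item_id=item.id"
          transformer="HTMLStripTransformer">
    <field column="description" name="description" stripHTML="true"/>
  </entity>
</entity>
```

My worry with this approach is memory: the cache would have to hold the whole result set of the sub-entity query, which for ~6M descriptions could be large.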

Thanks for the help

