Hello all,
We are using DIH to index our data (~6M documents) and it's taking an
extremely long time (~24 hours, which works out to roughly 70 documents
per second). I am trying to find ways that we can speed this up. I've
been reading through older posts and it's my understanding this should
not take that long.
One probable bottleneck is that we have a sub entity pulling in item
descriptions from a separate data source, which we then strip HTML from.
Before stripping the HTML we run it through JTidy. Our data-config looks
something like this: http://pastie.org/2067011
I've heard about entity threads and I was wondering if this would help
in my case? I haven't been able to find any good documentation on this.
Another possible bottleneck is the number of sub entities we have:
5 (only 1 of which uses CachedSqlEntityProcessor). Any ideas?
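
To make the caching question concrete, here is a minimal sketch of what
switching the description sub entity to CachedSqlEntityProcessor might
look like. The data source names, table and column names, and the
placeholder connection values are all hypothetical and not taken from
our actual config, and the JTidy step is left out:

```xml
<dataConfig>
  <!-- Hypothetical data sources; driver and url are placeholders -->
  <dataSource name="items_db" type="JdbcDataSource"
              driver="DRIVER_CLASS" url="JDBC_URL_FOR_ITEMS"/>
  <dataSource name="desc_db" type="JdbcDataSource"
              driver="DRIVER_CLASS" url="JDBC_URL_FOR_DESCRIPTIONS"/>
  <document>
    <!-- Root entity: one Solr document per row -->
    <entity name="item" dataSource="items_db"
            query="SELECT id, title FROM item">
      <!-- Cached sub entity: the query runs once and rows are looked up
           in memory by item_id, instead of one query per parent row -->
      <entity name="description" dataSource="desc_db"
              processor="CachedSqlEntityProcessor"
              query="SELECT item_id, description FROM item_description"
              where="item_id=item.id"
              transformer="HTMLStripTransformer">
        <field column="description" stripHTML="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The trade-off, as I understand it, is that the cached rows are held in
memory, so this may not be practical if the description table is very
large.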
Thanks for the help