Sub-entities can slow down indexing considerably. What is that datasource? Is it a DB? If so, try using CachedSqlEntityProcessor for that sub-entity.
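
For illustration, here is a rough sketch of what a cached sub-entity could look like in data-config.xml. I have not seen the actual config beyond the pastie link, so the data source names, JDBC URLs, credentials, and table and column names (item, item_description, item_id) are all made up; only the processor and where attributes are the point.

```xml
<dataConfig>
  <!-- Hypothetical data sources: names, driver, URLs and credentials are placeholders -->
  <dataSource name="main" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/items_db" user="solr" password="secret"/>
  <dataSource name="descriptions" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/descriptions_db" user="solr" password="secret"/>

  <document>
    <!-- Parent entity: one row per document -->
    <entity name="item" dataSource="main" query="SELECT id, title FROM item">

      <!-- Cached sub-entity: the query runs once, rows are held in memory,
           and each parent row is matched through the where attribute
           (sub-entity column = parent entity column) -->
      <entity name="description" dataSource="descriptions"
              processor="CachedSqlEntityProcessor"
              query="SELECT item_id, description FROM item_description"
              where="item_id=item.id">
        <field column="description" name="description"/>
      </entity>

    </entity>
  </document>
</dataConfig>
```

Without the cache, the sub-entity query is executed once per parent row, which for ~6M documents means ~6M round trips to that datasource. The trade-off is memory: the cached rows have to fit in the heap.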

On Tue, Jun 14, 2011 at 8:31 PM, Mark <static.void....@gmail.com> wrote:
> Hello all,
>
> We are using DIH to index our data (~6M documents) and it's taking an
> extremely long time (~24 hours). I am trying to find ways that we can speed
> this up. I've been reading through older posts and it's my understanding
> this should not take that long.
>
> One probable bottleneck is that we have a sub entity pulling in item
> descriptions from a separate datasource which we then strip html from.
> Before stripping the html we run it through JTidy. Our data-config looks
> something like this: http://pastie.org/2067011
>
> I've heard about entity threads and I was wondering if this would help in my
> case? I haven't been able to find any good documentation on this.
>
> Another possible bottleneck is the number of sub entities we have... 5
> (only 1 of which is CachedSqlEntityProcessor). Any ideas?
>
> Thanks for the help
>

--
-----------------------------------------------------
Noble Paul