> Do you think it's possible to return (in the nested entity) rows
> independent of the unique id, and let the processor decide when a document
> is complete?

I don't think so.

In my case, I had 9 (JDBC) entities for each document. Most of these entities returned a single column and a limited number of rows for each document. I observed a significant improvement in performance by using an aggregation query in my parent query. For example, in MySQL, I used the group_concat() function to aggregate all the values (separated by some delimiter) into a single column of the parent query's result set. I would then use a RegexTransformer to split this data on the same delimiter to populate a multi-valued field.

I actually got rid of 5 of the 9 entities in my data-config. It reduced the import time significantly, too.

Cheers
Avlesh

On Thu, Aug 6, 2009 at 10:15 PM, Chantal Ackermann <
chantal.ackerm...@btelligent.de> wrote:

> Hi all,
>
> to keep this thread up to date... ;-)
>
> d) jdbc batch size
> changed to 10. (Was default: 500, then 1000)
>
> The problem with my DIH setup is that the root entity query returns a huge
> set (all ids that shall be indexed). A larger fetchsize would be good for
> that query.
> The nested entity, however, only ever returns up to 9 rows. The constraints
> are so strict (by id) that there is no way any additional data could be
> pre-fetched.
> (Actually, shouldn't anyone using DIH with nested entities run into that
> problem?)
>
> After changing to 10, I cannot see that this low batch size slowed the
> indexer down (significantly).
>
> As I would like to stick with DIH (instead of dumping the data into CSV and
> then importing it), here is my question:
>
> Do you think it's possible to return (in the nested entity) rows
> independent of the unique id, and let the processor decide when a document
> is complete?
> The examples in the wiki always use an ID to get the data for the nested
> entity, so I'm not sure it was planned with that in mind. But as I'm already
> handling multiple db rows for one document, it might not be too difficult to
> change to handling the unique id correctly, as well?
> Of course, I would need something like a look-ahead to know whether the
> next row is already part of the next document.
>
> Cheers,
> Chantal
>
> Concerning the other settings (just fyi):
>
> a) mergeFactor 10 (and also tried 100)
> I don't think that changed anything for the worse, rather for the better. So,
> I'll stick with 10 from now on.
>
> b) ramBufferSizeMB
> tried 512, 1024. RAM usage went up when I increased from 256 to 512. Not
> sure about 1024. I'll stick to 512.
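
To make Avlesh's aggregation approach concrete, here is a minimal sketch in Python. It is not a DIH configuration: it assumes SQLite (via the standard sqlite3 module) as a stand-in for MySQL, since SQLite also offers group_concat(X, Y), and uses re.split to play the role of RegexTransformer's splitBy. The tables product and product_tag, the column tag and the "|" delimiter are all hypothetical.

```python
import re
import sqlite3

DELIM = "|"  # hypothetical delimiter; it must never occur inside the values


def make_demo_db():
    """Build a tiny in-memory database with hypothetical parent/child tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product_tag (product_id INTEGER, tag TEXT);
        INSERT INTO product VALUES (1, 'lamp'), (2, 'desk'), (3, 'chair');
        INSERT INTO product_tag VALUES (1, 'light'), (1, 'home'), (2, 'office');
    """)
    return conn


def load_docs(conn):
    """One row per parent instead of one nested query per parent.

    group_concat collapses all child values into a single column; the split
    afterwards turns that column back into a multi-valued field.
    Note: SQLite does not guarantee the order of values inside group_concat.
    """
    sql = """
        SELECT p.id, p.name, group_concat(t.tag, ?) AS tags
        FROM product p
        LEFT JOIN product_tag t ON t.product_id = p.id
        GROUP BY p.id, p.name
        ORDER BY p.id
    """
    docs = []
    for pid, name, tags in conn.execute(sql, (DELIM,)):
        docs.append({
            "id": pid,
            "name": name,
            # Same job as RegexTransformer's splitBy: undo the aggregation.
            # A parent with no children yields NULL (None), i.e. no values.
            "tags": re.split(re.escape(DELIM), tags) if tags else [],
        })
    return docs
```

For example, load_docs(make_demo_db()) yields three documents, the first with the tags "light" and "home" in some order. When doing this in MySQL, keep in mind that group_concat output is truncated at group_concat_max_len (1024 by default), so long value lists may need that setting raised.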
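
Chantal's look-ahead idea can be illustrated outside DIH (nothing in the thread says DIH supports it). If a single query returns all child rows ordered by document id, one pass can assemble documents: a document is complete as soon as the next row carries a different id. The sketch below assumes a hypothetical row layout of (doc_id, field, value) and relies on itertools.groupby, which groups consecutive rows with the same key.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple, Union

# Hypothetical row layout: (doc_id, field_name, value)
Row = Tuple[int, str, str]
Doc = Dict[str, Union[int, List[str]]]


def group_rows(rows: Iterable[Row]) -> Iterator[Doc]:
    """Assemble documents from rows that are already ORDER BY doc_id.

    groupby only merges *consecutive* rows with the same id, so the input
    must be sorted by id; otherwise one document is emitted in pieces.
    """
    for doc_id, doc_rows in groupby(rows, key=itemgetter(0)):
        doc: Doc = {"id": doc_id}
        for _, field, value in doc_rows:
            # Repeated field names become multi-valued (list) fields.
            doc.setdefault(field, []).append(value)
        yield doc
```

The sorting precondition is the price of avoiding one query per document: the database does the ordering once, and the consumer only ever needs to compare the current row's id with the previous one.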