Erik Hatcher-4 wrote:
>
> One thing that might help indexing speed - create a *single* SQL query
> to grab all the data you need without using DIH's sub-entities, at
> least the non-cached ones.
>
> Erik
>
> On Jun 2, 2010, at 12:21 PM, Blargy wrote:
>
>> As a data point, I routinely see clients index 5M items on normal
>> hardware in approx. 1 hour (give or take 30 minutes).
>>
>> Also wanted to add that our main entity (item) consists of 5
>> sub-entities (i.e., joins). 2 of those 5 are fairly small, so I am
>> using CachedSqlEntityProcessor for them, but the other 3 (which
>> include item_description) are normal.
>>
>> All the entities except item_description connect to datasource1.
>> They currently point to one physical machine, although we do have a
>> pool of 3 DBs that could be used if it helps. The other entity,
>> item_description, uses datasource2, which has a pool of 2 DBs that
>> could potentially be used. Not sure if that would help or not.
>>
>> I might as well add that the item description will have indexed,
>> stored and term vectors set to true.
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Importing-large-datasets-tp863447p865219.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>

I can't find any example of creating a single massive SQL query like this. Are there any out there? Will batching still work with such a large query?
--
View this message in context: http://lucene.472066.n3.nabble.com/Importing-large-datasets-tp863447p866506.html
Sent from the Solr - User mailing list archive at Nabble.com.
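
For illustration only, here is a rough sketch of what "one query instead of sub-entities" could look like. It is not DIH configuration and not taken from this thread: it uses Python's sqlite3 module with an in-memory database, and the child tables (item_category, item_tag), their columns, and the fetch_docs helper are all made up for the example. Each one-to-many child table is aggregated in its own subquery and joined back to item, so every item comes back as exactly one row; rows are then pulled from the cursor in batches. As far as I know, DIH's batchSize is handed to the JDBC driver as a fetch size, so it should apply to a single large query as well, but treat that as an assumption to verify against your driver.

```python
import sqlite3

# Hypothetical schema: "item" plus two one-to-many child tables that
# stand in for the non-cached DIH sub-entities.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE item_category (item_id INTEGER, name TEXT);
    CREATE TABLE item_tag (item_id INTEGER, tag TEXT);
    INSERT INTO item VALUES (1, 'lamp'), (2, 'desk');
    INSERT INTO item_category VALUES (1, 'home'), (1, 'lighting'), (2, 'office');
    INSERT INTO item_tag VALUES (1, 'brass');
""")

# One query for everything. Each child table is collapsed to one row per
# item before joining, so two one-to-many joins cannot multiply rows.
SINGLE_QUERY = """
    SELECT i.id, i.title, c.categories, t.tags
    FROM item i
    LEFT JOIN (SELECT item_id, GROUP_CONCAT(name, '|') AS categories
               FROM item_category GROUP BY item_id) c ON c.item_id = i.id
    LEFT JOIN (SELECT item_id, GROUP_CONCAT(tag, '|') AS tags
               FROM item_tag GROUP BY item_id) t ON t.item_id = i.id
    ORDER BY i.id
"""


def split_values(joined):
    """Turn a '|'-joined string (or NULL) back into a list of values."""
    return joined.split("|") if joined else []


def fetch_docs(connection, batch_size=1000):
    """Run the single query and yield one document per item, reading
    rows from the cursor batch_size at a time."""
    cursor = connection.execute(SINGLE_QUERY)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for item_id, title, categories, tags in rows:
            yield {
                "id": item_id,
                "title": title,
                "categories": split_values(categories),
                "tags": split_values(tags),
            }


docs = list(fetch_docs(conn, batch_size=1))
```

Two caveats for the setup described above. A single query can only join tables that live in the same database, so item_description on datasource2 would presumably still need its own entity. And the order of values inside each concatenated column is not guaranteed, so sort them if order matters.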