One thing that might help indexing speed - create a *single* SQL query
to grab all the data you need without using DIH's sub-entities, at
least the non-cached ones.
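A rough sketch of the difference, not taken from any real schema: the
table and column names below are made up for illustration, and an
in-memory SQLite database stands in for the real one. The first function
mimics what a non-cached sub-entity does (one extra query per parent
row); the second fetches the same rows with a single join.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_description (item_id INTEGER, description TEXT);
    INSERT INTO item VALUES (1, 'lamp'), (2, 'desk'), (3, 'chair');
    INSERT INTO item_description VALUES (1, 'brass lamp'), (2, 'oak desk');
""")


def fetch_with_sub_queries(conn):
    """Mimic a non-cached sub-entity: one extra query per parent row."""
    docs = []
    queries = 1  # the parent query
    parents = conn.execute("SELECT id, name FROM item ORDER BY id").fetchall()
    for item_id, name in parents:
        row = conn.execute(
            "SELECT description FROM item_description WHERE item_id = ?",
            (item_id,),
        ).fetchone()
        queries += 1
        docs.append((item_id, name, row[0] if row else None))
    return docs, queries


def fetch_with_single_join(conn):
    """Fetch the same data in one round trip using a LEFT JOIN."""
    sql = """
        SELECT i.id, i.name, d.description
        FROM item i
        LEFT JOIN item_description d ON d.item_id = i.id
        ORDER BY i.id
    """
    return conn.execute(sql).fetchall(), 1


nested_docs, nested_queries = fetch_with_sub_queries(conn)
joined_docs, joined_queries = fetch_with_single_join(conn)
print(nested_queries, joined_queries)
```

Both functions return the same rows; the nested version issues one query
per item on top of the parent query, which is what adds up over millions
of items. This only works as-is when the joined tables live in the same
database, so a table served from a separate datasource would need
different handling.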
Erik
On Jun 2, 2010, at 12:21 PM, Blargy wrote:
As a data point, I routinely see clients index 5M items on normal
hardware in approx. 1 hour (give or take 30 minutes).
Also wanted to add that our main entity (item) consists of 5
sub-entities (i.e., joins). 2 of those 5 are fairly small, so I am using
CachedSqlEntityProcessor for them, but the other 3 (which include
item_description) are normal.
All the entities except item_description connect to datasource1. They
currently point to one physical machine, although we do have a pool of
3 DBs that could be used if it helps. The other entity,
item_description, uses datasource2, which has a pool of 2 DBs that could
potentially be used. Not sure if that would help or not.
I might as well add that the item description will have indexed, stored,
and term vectors set to true.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Importing-large-datasets-tp863447p865219.html
Sent from the Solr - User mailing list archive at Nabble.com.