Re: Importing large datasets

Blargy Wed, 02 Jun 2010 19:34:46 -0700


Erik Hatcher-4 wrote:
> 
> One thing that might help indexing speed - create a *single* SQL query
> to grab all the data you need without using DIH's sub-entities, at
> least the non-cached ones.
> 
>       Erik
> 
> On Jun 2, 2010, at 12:21 PM, Blargy wrote:
> 
>>
>>
>> As a data point, I routinely see clients index 5M items on normal
>> hardware in approx. 1 hour (give or take 30 minutes).
>>
>> Also wanted to add that our main entity (item) consists of 5
>> sub-entities (i.e., joins). 2 of those 5 are fairly small, so I am using
>> CachedSqlEntityProcessor for them, but the other 3 (which include
>> item_description) are normal.
>>
>> All the entities except item_description connect to datasource1. They
>> currently point to one physical machine, although we do have a pool of
>> 3 DBs that could be used if it helps. The other entity,
>> item_description, uses datasource2, which has a pool of 2 DBs that
>> could potentially be used. Not sure if that would help or not.
>>
>> I might as well add that the item description will have indexed,
>> stored and term vectors set to true.
> 
> 
>


I can't find any examples of building one massive SQL query like this. Are
there any out there? Will batching still work with such a large query?
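
For reference, a flattened data-config.xml along the lines Erik suggests
might look like the sketch below. The table and column names (item, brand,
item_category, category), the JDBC driver, and the connection details are
all hypothetical, and GROUP_CONCAT is MySQL-specific. On batching:
JdbcDataSource accepts a batchSize attribute that is handed to the driver as
the fetch size, and with MySQL batchSize="-1" makes the driver stream rows
instead of buffering the whole result set.

```xml
<dataConfig>
  <!-- Hypothetical connection details; batchSize="-1" asks the MySQL driver to stream rows -->
  <dataSource name="datasource1"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://dbhost/items"
              user="solr" password="secret"
              batchSize="-1"/>
  <document>
    <!-- One joined query stands in for the non-cached sub-entities -->
    <entity name="item" dataSource="datasource1"
            transformer="RegexTransformer"
            query="SELECT i.id, i.title, b.name AS brand,
                          GROUP_CONCAT(c.name SEPARATOR '|') AS categories
                   FROM item i
                   LEFT JOIN brand b ON b.id = i.brand_id
                   LEFT JOIN item_category ic ON ic.item_id = i.id
                   LEFT JOIN category c ON c.id = ic.category_id
                   GROUP BY i.id, i.title, b.name">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="brand" name="brand"/>
      <!-- splitBy (a regex) turns the concatenated string into a multi-valued field -->
      <field column="categories" name="category" splitBy="\|"/>
    </entity>
  </document>
</dataConfig>
```

One caveat with this approach: a join only works across tables reachable
from the same connection, so item_description on datasource2 would probably
have to stay a sub-entity unless it can be queried from datasource1.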
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Importing-large-datasets-tp863447p866506.html
Sent from the Solr - User mailing list archive at Nabble.com.
