Wait! You're fetching records from one database and then doing lookups
against another DB? That makes this a completely different problem.

The DIH does not, to my knowledge, have the ability to "pool" these
queries. That is, it will not build a batch of 1000 keys from
datasource1 and then run a single query against datasource2 like:
    select foo from <table> where key_field IN (key1, key2, ... key1000);

That batched lookup is the efficient way to do what you want, so
you'll have to write your own client to do it.
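
For what it's worth, a rough sketch of such a client using JDBC and
SolrJ is below. All of the table, column, and field names, the JDBC
URLs, the Solr core URL, and the batch size are made-up placeholders,
so treat it as an outline rather than a drop-in solution:

    // Sketch: pull rows from datasource1 in batches, fetch the matching
    // descriptions from datasource2 with one IN query per batch, then
    // index the merged documents with SolrJ.
    import java.sql.*;
    import java.util.*;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchedImporter {
        private static final int BATCH_SIZE = 1000;

        public static void main(String[] args) throws Exception {
            try (Connection db1 = DriverManager.getConnection("jdbc:mysql://host1/items", "user", "pass");
                 Connection db2 = DriverManager.getConnection("jdbc:mysql://host2/descriptions", "user", "pass");
                 SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/items").build()) {

                Statement st = db1.createStatement();
                st.setFetchSize(BATCH_SIZE); // hint to the driver; driver-specific behavior
                ResultSet items = st.executeQuery("SELECT id, name FROM item");

                List<SolrInputDocument> batch = new ArrayList<>();
                while (items.next()) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", items.getLong("id"));
                    doc.addField("name", items.getString("name"));
                    batch.add(doc);

                    if (batch.size() == BATCH_SIZE) {
                        addDescriptions(db2, batch);
                        solr.add(batch);
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {   // leftover partial batch
                    addDescriptions(db2, batch);
                    solr.add(batch);
                }
                solr.commit();
            }
        }

        // One IN query against the second database per batch of ids.
        private static void addDescriptions(Connection db2, List<SolrInputDocument> batch) throws SQLException {
            Map<Long, SolrInputDocument> byId = new HashMap<>();
            StringJoiner placeholders = new StringJoiner(",", "(", ")");
            for (SolrInputDocument doc : batch) {
                byId.put((Long) doc.getFieldValue("id"), doc);
                placeholders.add("?");
            }
            String sql = "SELECT item_id, description FROM item_description WHERE item_id IN " + placeholders;
            try (PreparedStatement ps = db2.prepareStatement(sql)) {
                int i = 1;
                for (Long id : byId.keySet()) ps.setLong(i++, id);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        SolrInputDocument doc = byId.get(rs.getLong("item_id"));
                        if (doc != null) doc.addField("description", rs.getString("description"));
                    }
                }
            }
        }
    }

Add whatever error handling and commit strategy fits your setup; the
point is simply that one IN query per 1000 keys beats one lookup per
row.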

On Wed, Jun 2, 2010 at 12:00 PM, David Stuart
<david.stu...@progressivealliance.co.uk> wrote:
> How long does it take to grab all of the data via SQL? I found that by
> denormalizing the data into a lookup table I was able to index about
> 300k rows of similar data size with DIH, regex-splitting on some fields,
> in about 8 minutes. I know it's not quite the same scale, but with batching...
>
> David Stuart
>
> On 2 Jun 2010, at 17:58, Blargy <zman...@hotmail.com> wrote:
>
>>
>>> One thing that might help indexing speed - create a *single* SQL query
>>> to grab all the data you need without using DIH's sub-entities, at
>>> least the non-cached ones.
>>>
>>
>> Not sure how much that would help. As I mentioned, without the item
>> description import the full process takes 4 hours, which is bearable.
>> However, once I started to import the item descriptions, which live on a
>> separate machine/database, the import process exploded to over 24 hours.
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Importing-large-datasets-tp863447p865324.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com
