Alain,
How many rows did you export in this fashion and what was the
performance?

We do have Oracle as the underlying database, with data drawn from
multiple tables. The data is only one level deep, except for one table
where we need to traverse a hierarchy to get the information.

How many XML files did you feed into SOLR one at a time?

Shishir

-----Original Message-----
From: Alain Rogister [mailto:alain.rogis...@gmail.com] 
Sent: Tuesday, October 25, 2011 4:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Loading data to SOLR first time ( taking too long)

Are you loading data from multiple tables? How many levels deep? After
some experimenting, I gave up on the DIH for three reasons: it generated
very chatty (one row at a time) SQL against my schema, I hit concurrency
bugs unless multithreading was set to false, and I wasn't too confident
in the incremental mode against a complex schema.

Here is what worked for us (with Oracle):

- create materialized views; make sure that you include a
'lastUpdateTime' field in the main table. This step may be unnecessary
if your source data does not need any pre-processing / cleaning /
reorganizing.
- write a stored procedure that exports the data in Solr's XML format;
parameterize it with a range of primary keys of your main table so that
you can partition the export into manageable subsets. The XML format is
very simple, no need for complex in-the-database XML functions to
generate it.
- use the database scheduler to run that procedure as a set of jobs; run
a few of them in parallel.
- use curl or wget or similar to feed the XML files into the index as
soon as they are available.
- compress and archive the XML files; they will come in handy when you
need to provision another index instance and will save you a lot of
exporting time.
- make sure your stored procedure can work in incremental mode: e.g.
export all records updated after a certain timestamp; then just push the
resulting XML into Solr.
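
As a rough illustration of the export steps above, here is a minimal
Python sketch. It is not the stored procedure described in this thread;
it only shows the shape of the work: split a primary-key range into
chunks, render each chunk in Solr's XML update format
(<add><doc><field name="...">), and POST the result to the update
handler. The function names, the sample fields, and the update URL are
all hypothetical.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def key_ranges(min_id, max_id, chunk_size):
    """Yield inclusive (low, high) primary-key ranges covering min_id..max_id."""
    low = min_id
    while low <= max_id:
        high = min(low + chunk_size - 1, max_id)
        yield low, high
        low = high + 1


def to_solr_xml(rows):
    """Render rows (dicts of field name -> value) as one Solr <add> message."""
    parts = ["<add>"]
    for row in rows:
        parts.append("  <doc>")
        for name, value in row.items():
            if value is None:
                continue  # skip NULL columns instead of indexing the text "None"
            # escape() handles &, < and >; quoteattr() quotes the attribute value
            parts.append(
                f"    <field name={quoteattr(name)}>{escape(str(value))}</field>"
            )
        parts.append("  </doc>")
    parts.append("</add>")
    return "\n".join(parts)


def post_to_solr(xml_text, update_url):
    """POST an XML update message; update_url is an assumed endpoint.

    A separate <commit/> message is still needed to make the documents visible.
    """
    request = urllib.request.Request(
        update_url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical sample rows standing in for the exported view.
    sample = [
        {"id": 1, "title": "Bonds & notes", "region": None},
        {"id": 2, "title": "Equities", "region": "EMEA"},
    ]
    for low, high in key_ranges(1, 2, 1):
        chunk = [row for row in sample if low <= row["id"] <= high]
        print(to_solr_xml(chunk))
```

Each chunk can be written to its own file, which keeps the files small
enough to post independently and to archive for later re-indexing.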

Alain

On Tue, Oct 25, 2011 at 9:56 PM, Awasthi, Shishir
<shishir.awas...@baml.com> wrote:

> Hi,
>
> I recently started working on SOLR and loaded approximately 4 million
> records into SOLR using DataImportHandler. It took 5 days to
> complete this process.
>
> Can you please suggest how this can be improved? I would like this to
> be done in less than 6 hrs.
>
> Thanks,
>
> Shishir
>
> ----------------------------------------------------------------------
> This message w/attachments (message) is intended solely for the use of
> the intended recipient(s) and may contain information that is
> privileged, confidential or proprietary. If you are not an intended
> recipient, please notify the sender, and then please delete and
> destroy all copies and attachments, and be advised that any review or
> dissemination of, or the taking of any action in reliance on, the
> information contained in or attached to this message is prohibited.
> Unless specifically indicated, this message is not an offer to sell or
> a solicitation of any investment products or other financial product
> or service, an official confirmation of any transaction, or an
> official statement of Sender. Subject to applicable law, Sender may
> intercept, monitor, review and retain e-communications (EC) traveling
> through its networks/systems and may produce any such EC to
> regulators, law enforcement, in litigation and as required by law.
> The laws of the country of each sender/recipient may impact the
> handling of EC, and EC may be archived, supervised and produced in
> countries other than the country in which you are located. This
> message cannot be guaranteed to be secure or free of errors or
> viruses.
>
> References to "Sender" are references to any subsidiary of Bank of
> America Corporation. Securities and Insurance Products: * Are Not FDIC
> Insured * Are Not Bank Guaranteed * May Lose Value * Are Not a Bank
> Deposit * Are Not a Condition to Any Banking Service or Activity * Are
> Not Insured by Any Federal Government Agency. Attachments that are
> part of this EC may have additional important disclosures and
> disclaimers, which you should read.
> This message is subject to terms available at the following link:
> http://www.bankofamerica.com/emaildisclaimer. By messaging with Sender
> you consent to the foregoing.
>

