The CSV file is approx. 5 GB for 29 million rows.

As you say, Christopher, at the beginning we thought that reading chunk by chunk
from Oracle and writing to Solr
was the best strategy.

But from our tests, we've observed the following:

* CSV creation via PL/SQL is really, really fast: 40 minutes for the full dataset
  (using BULK COLLECT).
* Multiple SELECT calls from Java slow down the process. I think Oracle is the
  bottleneck here.
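Putting the figures from this thread side by side gives a rough sense of scale. This is only a back-of-envelope sketch: the middle line assumes the export time grows linearly with the row count, which we have not measured.

```latex
\begin{aligned}
\text{PL/SQL export rate} &\approx \frac{29\times10^{6}\ \text{rows}}{40\times 60\ \text{s}} \approx 12{,}083\ \text{rows/s}\\
\text{export time for } 42\times10^{6}\ \text{rows} &\approx \frac{42\times10^{6}\ \text{rows}}{12{,}083\ \text{rows/s}} \approx 3{,}476\ \text{s} \approx 58\ \text{min}\\
\text{rate implied by the 6 h estimate} &\approx \frac{42\times10^{6}\ \text{docs}}{6\times 3600\ \text{s}} \approx 1{,}944\ \text{docs/s}
\end{aligned}
```

If that assumption holds, the export itself would account for roughly one hour of the 6-hour estimate.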

Any other ideas/alternatives?

A few other points worth noting:

* We are going to enable autoCommit every 10 minutes / 10,000 rows, with no
  commits from the client.
* During indexing, we send all requests to a front-end load balancer that
  redirects them to the 3-node cluster.
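For reference, a policy matching the first point (a hard commit every 10 minutes or every 10,000 documents, whichever comes first) would look roughly like this fragment of solrconfig.xml. Setting `openSearcher` to false is an assumption, a common choice for bulk loads; with it, new documents only become visible to searches after a soft commit or a commit that opens a new searcher.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 10 minutes (600000 ms) or every 10000 added docs, whichever comes first -->
  <autoCommit>
    <maxTime>600000</maxTime>
    <maxDocs>10000</maxDocs>
    <!-- Assumption: don't open a new searcher on each hard commit during the bulk load -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```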

Thanks in advance!!

==> Great mailing list and a really awesome tool!!

-----Original Message-----
From: Christopher Schultz [mailto:ch...@christopherschultz.net] 
Sent: Monday, March 19, 2018 18:05
To: solr-user@lucene.apache.org
Subject: Re: Solr list question

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Mariano,

On 3/19/18 11:50 AM, LOPEZ-CORTES Mariano-ext wrote:
> Hello
> 
> We have a Solr index with 3 nodes, 1 shard and 2 replicas.
> 
> Our goal is to index 42 million rows. Indexing time is important.
> The data source is an Oracle database.
> 
> Our indexing strategy is:
> 
> *  Read from Oracle into one big CSV file.
> 
> *  Read from 4 files (the big file split into chunks) and send the documents
> to Solr via ConcurrentUpdateSolrClient.
> 
> Is this the optimal way to load such a large volume of data into Solr?
> 
> For reference, the estimated time for our solution is 6 hours.
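A minimal sketch of the second step of that strategy (reading the chunk files in parallel, one thread per file, and sending rows in batches), using only the JDK. `BatchSink` is a hypothetical interface and the naive comma split assumes no quoted fields; a real sink would presumably convert each row to a `SolrInputDocument` and pass the batch to one shared `ConcurrentUpdateSolrClient`, which already queues documents and sends them on its own threads, so the sink must be thread-safe.

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedCsvLoader {

    /** Hypothetical sink; called concurrently from several threads, so it must be thread-safe. */
    public interface BatchSink {
        void send(List<Map<String, String>> batch) throws Exception;
    }

    /** Reads one chunk file (header line first) and sends its rows in batches; returns the row count. */
    static long loadChunk(Path chunk, int batchSize, BatchSink sink) throws Exception {
        long count = 0;
        try (BufferedReader in = Files.newBufferedReader(chunk, StandardCharsets.UTF_8)) {
            String headerLine = in.readLine();
            if (headerLine == null) return 0;                 // empty file
            String[] header = headerLine.split(",", -1);
            List<Map<String, String>> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = in.readLine()) != null) {
                // Naive split: assumes no quoted fields containing commas
                String[] values = line.split(",", -1);
                Map<String, String> doc = new HashMap<>();
                for (int i = 0; i < header.length && i < values.length; i++) {
                    doc.put(header[i], values[i]);
                }
                batch.add(doc);
                count++;
                if (batch.size() == batchSize) {
                    sink.send(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) sink.send(batch);           // flush the remainder
        }
        return count;
    }

    /** Loads all chunk files in parallel, one thread per file; returns the total row count. */
    public static long loadAll(List<Path> chunks, int batchSize, BatchSink sink) throws Exception {
        if (chunks.isEmpty()) return 0;
        ExecutorService pool = Executors.newFixedThreadPool(chunks.size());
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path chunk : chunks) {
                results.add(pool.submit(() -> loadChunk(chunk, batchSize, sink)));
            }
            long total = 0;
            for (Future<Long> f : results) total += f.get();  // rethrows any task failure
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

The batch size here is illustrative; the right value depends on document size and would need measuring.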

How big are the CSV files? If most of the time is taken performing the various 
SELECT operations, then it's probably a good strategy.

However, you may find that using the disk as a buffer slows everything down 
because disk writes can be very slow.

Why not perform your SELECT(s) and write directly to Solr using one of the APIs 
(either a language-specific API, or through the HTTP API)?
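To illustrate the suggestion of skipping the disk buffer, here is a minimal producer/consumer sketch using only the JDK. One thread reads rows from a single source and hands batches to worker threads through a bounded queue, so the reader can never get more than a few batches ahead of Solr. The row iterator stands in for a single JDBC `ResultSet` (in real code, presumably with a larger `Statement.setFetchSize(...)` to reduce round trips to Oracle), and `BatchSink` is a hypothetical interface where a SolrJ client call would go; batch size, consumer count and queue capacity are illustrative, not tuned values.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StreamingIndexer {

    /** Hypothetical sink; a real one might wrap a SolrJ client. Must be thread-safe. */
    public interface BatchSink {
        void send(List<Map<String, String>> batch) throws Exception;
    }

    // Sentinel telling a consumer that no more batches will arrive
    private static final List<Map<String, String>> POISON = new ArrayList<>();

    /** Streams rows into a bounded queue drained by `consumers` threads; returns the rows sent. */
    public static long run(Iterator<Map<String, String>> rows, int batchSize, int consumers,
                           int queueCapacity, BatchSink sink) throws Exception {
        BlockingQueue<List<Map<String, String>>> queue = new ArrayBlockingQueue<>(queueCapacity);
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < consumers; i++) {
                results.add(pool.submit(() -> {
                    long sent = 0;
                    while (true) {
                        List<Map<String, String>> next = queue.take();
                        if (next == POISON) return sent;      // identity check, not equals()
                        sink.send(next);
                        sent += next.size();
                    }
                }));
            }
            // Producer runs on the calling thread; put() blocks while the queue is full
            List<Map<String, String>> batch = new ArrayList<>(batchSize);
            while (rows.hasNext()) {
                batch.add(rows.next());
                if (batch.size() == batchSize) {
                    queue.put(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) queue.put(batch);
            for (int i = 0; i < consumers; i++) queue.put(POISON);
            long total = 0;
            for (Future<Long> f : results) total += f.get();  // rethrows any consumer failure
            return total;
        } finally {
            pool.shutdownNow();                               // interrupts consumers on error
        }
    }
}
```

The bounded queue is the design choice that replaces the CSV file as a buffer: memory use stays capped at roughly `queueCapacity * batchSize` rows. One limitation of this sketch: if every consumer fails, the producer can block on `put()`, so a production loader would need to watch the futures while producing.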

Hope that helps,
- -chris
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQJRBAEBCAA7FiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlqv7aEdHGNocmlzQGNo
cmlzdG9waGVyc2NodWx0ei5uZXQACgkQHPApP6U8pFgJrg//RushznZlTg60TxdE
s/XKK+69s9c0+DwZ/IrU366j2ZOcJl8Osu9TpzaCSEpdWuulFG8qCSYThTngaijH
I02YCqnK9Ey4+6B7u9QECWNXjdlQXoeINjCnRLVENWzkSmht/U2nW3WTFEPKOvQ3
6ISTPATFnfo6Wt4VYrVefqO/yCCiR5bGL5LsSZYwvqlh9egR8K/wtf4sQ5kji3z+
r2Z0gYpR9igE3ZCIByf6QGq0Ftku90oFCG+kCVNOdgfqwkUaMdc7krv92oTSH4o5
BH+trc2jPf3HKFmp/ywRAPEhAfA5BwbT8vB9gwl/6vuT6efAot7xrLqduF3h7jG6
ffPtkEBbD/ld3inIVta6/hnUwxX9O1fBtJrZegD14cezLV9QcEWFJ8/lUfgGOTdX
ZuvwxBFhmCXE9EMWLlpdUOWK9iVBsZoQZxawoqw9xQauBp/Adg29fdeXmEkUssey
85HGDv/x33Bcr1xPGa8nOygWcZRUgGFCh871qStg9GeTNx3C/mSk0wxdKeUDRePg
GEuL0p803yCJYAddyF66nnx676LfFeDaocBJelx5UbiteNT23xut7jWP/COyOvoy
tpq3c9UfIkobgcA7bZ3IL2Og+hExgo+tLQXiOx6bf2TD1Jk2UOWWk1TAUspuUybD
VH6PlwgqcrO28Jx799mJvpIotoE=
=aMPk
-----END PGP SIGNATURE-----
