Re: Dataimport performance

2018-06-07 Thread Shawn Heisey
On 6/7/2018 12:19 AM, kotekaman wrote: sorry. may i know how to code it? Code *what*? Here's the same wiki page that I gave you for your last message: https://wiki.apache.org/solr/UsingMailingLists Even if I go to the Nabble website and discover that you've replied to a topic that's SEVEN A

Re: Dataimport performance

2018-06-07 Thread kotekaman
Sorry, may I know how to code it? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Dataimport performance

2010-12-19 Thread Lukas Kahwe Smith
On 19.12.2010, at 23:30, Alexey Serba wrote: > > Also Ephraim proposed a really neat solution with GROUP_CONCAT, but > I'm not sure that all RDBMS-es support that. That's MySQL-only syntax, but if you google you can find similar solutions for other RDBMSs. regards, Lukas Kahwe Smith m...@pooteew
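To illustrate the point about other databases: the sketch below uses SQLite (through Python's standard `sqlite3` module) rather than MySQL, because SQLite ships a similar `GROUP_CONCAT(expr, separator)` aggregate; PostgreSQL offers `string_agg` for the same purpose. The `track` and `artist` tables, their columns, and the `|` separator are assumptions made up for this example and are not taken from the thread.

```python
import sqlite3

# Hypothetical schema: one row per track, zero or more artist rows per track.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE artist (track_id INTEGER, name TEXT);
    INSERT INTO track VALUES (1, 'Song A'), (2, 'Song B'), (3, 'Song C');
    INSERT INTO artist VALUES (1, 'Ann'), (1, 'Bob'), (2, 'Cy');
""")

# One query returns one row per track, with all artist names folded into
# a single delimited string instead of one sub-query per track.
rows = conn.execute("""
    SELECT t.id, t.title, GROUP_CONCAT(a.name, '|') AS artists
    FROM track t
    LEFT JOIN artist a ON a.track_id = t.id
    GROUP BY t.id, t.title
    ORDER BY t.id
""").fetchall()


def to_doc(row):
    """Split the concatenated column back into a multi-valued field."""
    track_id, title, artists = row
    # GROUP_CONCAT yields NULL (None) when a track has no artists.
    # The aggregation order is not guaranteed, so sort for stable output.
    names = sorted(artists.split("|")) if artists else []
    return {"id": track_id, "title": title, "artists": names}


docs = [to_doc(r) for r in rows]
```

The separator must be a character that cannot occur in the data, since the import side has to split the column back into individual values.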

Re: Dataimport performance

2010-12-19 Thread Alexey Serba
> With subquery and with left join: 320k in 6 Min 30 It's 820 records per second. It's _really_ impressive considering the fact that DIH performs a separate SQL query for every record in your case. >> So there's one track entity with an artist sub-entity. My (admittedly >> rather limited) experien

Re: Dataimport performance

2010-12-16 Thread Glen Newton
CPU. > > James Dyer > E-Commerce Systems > Ingram Content Group > (615) 213-4311 > > > -Original Message- > From: Ephraim Ofir [mailto:ephra...@icq.com] > Sent: Thursday, December 16, 2010 3:04 AM > To: solr-user@lucene.apache.org > Subject: RE: Dataimport pe

RE: Dataimport performance

2010-12-16 Thread Dyer, James
213-4311 -Original Message- From: Ephraim Ofir [mailto:ephra...@icq.com] Sent: Thursday, December 16, 2010 3:04 AM To: solr-user@lucene.apache.org Subject: RE: Dataimport performance Check out http://mail-archives.apache.org/mod_mbox/lucene-solr-user/20

RE: Dataimport performance

2010-12-16 Thread Ephraim Ofir
[mailto:rob...@dubture.com] Sent: Wednesday, December 15, 2010 4:49 PM To: solr-user@lucene.apache.org Subject: Re: Dataimport performance i've benchmarked the import already with 500k records, one time without the artists subquery, and one time without the join in the main query: Without sub

Re: Dataimport performance

2010-12-15 Thread Lance Norskog
Can you do just one join in the top-level query? The DIH does not have a batching mechanism for these joins, but your database does. On Wed, Dec 15, 2010 at 7:11 AM, Tim Heckman wrote: > The custom import I wrote is a java application that uses the SolrJ > library. Basically, where I had sub-enti
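A minimal sketch of the difference, using an in-memory SQLite database instead of MySQL and DIH: the first function issues one extra query per parent row (the sub-entity pattern), the second lets the database do a single join and groups the rows in application code. Table names, column names, and the helper functions are hypothetical and exist only for this illustration.

```python
import sqlite3
from itertools import groupby


def make_join_db():
    """Build a tiny hypothetical track/artist database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE artist (track_id INTEGER, name TEXT);
        INSERT INTO track VALUES (1, 'Song A'), (2, 'Song B'), (3, 'Song C');
        INSERT INTO artist VALUES (1, 'Ann'), (1, 'Bob'), (2, 'Cy');
    """)
    return conn


def import_with_subqueries(conn):
    """One query for the tracks plus one query per track for its artists."""
    queries = 1
    docs = []
    tracks = conn.execute("SELECT id, title FROM track ORDER BY id").fetchall()
    for track_id, title in tracks:
        names = [r[0] for r in conn.execute(
            "SELECT name FROM artist WHERE track_id = ? ORDER BY name",
            (track_id,))]
        queries += 1
        docs.append({"id": track_id, "title": title, "artists": names})
    return docs, queries


def import_with_join(conn):
    """A single joined query; rows are grouped per track in Python."""
    rows = conn.execute("""
        SELECT t.id, t.title, a.name
        FROM track t
        LEFT JOIN artist a ON a.track_id = t.id
        ORDER BY t.id, a.name
    """).fetchall()
    docs = []
    for (track_id, title), group in groupby(rows, key=lambda r: (r[0], r[1])):
        # The LEFT JOIN produces a NULL name for tracks without artists.
        names = [r[2] for r in group if r[2] is not None]
        docs.append({"id": track_id, "title": title, "artists": names})
    return docs, 1


join_conn = make_join_db()
sub_docs, sub_queries = import_with_subqueries(join_conn)
join_docs, join_queries = import_with_join(join_conn)
```

Both functions build the same documents; only the number of round trips to the database differs, and that number grows with the row count in the sub-query version.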

Re: Dataimport performance

2010-12-15 Thread Tim Heckman
The custom import I wrote is a Java application that uses the SolrJ library. Basically, where I had sub-entities in the DIH config, I did the mappings inside my Java code. 1. Identify a subset or "chunk" of the primary IDs to work on (so I don't have to load everything into memory at once) and put
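The remaining steps are cut off in this snippet, so the following is only one possible reading of the chunking idea, not the poster's actual code. It is written in Python with SQLite rather than Java with SolrJ, it assumes a hypothetical `track`/`artist` schema, and it yields plain dictionaries where a real importer would send documents to Solr.

```python
import sqlite3
from collections import defaultdict


def make_chunk_db():
    """Build a tiny hypothetical track/artist database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE artist (track_id INTEGER, name TEXT);
        INSERT INTO track VALUES (1, 'Song A'), (2, 'Song B'), (3, 'Song C');
        INSERT INTO artist VALUES (1, 'Ann'), (1, 'Bob'), (2, 'Cy');
    """)
    return conn


def chunks(ids, size):
    """Yield consecutive slices of at most `size` primary ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_docs(conn, chunk_size):
    """Assemble documents chunk by chunk so memory use stays bounded."""
    ids = [r[0] for r in conn.execute("SELECT id FROM track ORDER BY id")]
    for chunk in chunks(ids, chunk_size):
        marks = ",".join("?" * len(chunk))
        # One query per chunk fetches every sub-entity row for that chunk.
        artists = defaultdict(list)
        for track_id, name in conn.execute(
                f"SELECT track_id, name FROM artist "
                f"WHERE track_id IN ({marks}) ORDER BY name", chunk):
            artists[track_id].append(name)
        # The parent rows are then joined to the sub-entities in memory.
        for track_id, title in conn.execute(
                f"SELECT id, title FROM track "
                f"WHERE id IN ({marks}) ORDER BY id", chunk):
            yield {"id": track_id, "title": title,
                   "artists": artists.get(track_id, [])}


chunk_conn = make_chunk_db()
chunked_docs = list(build_docs(chunk_conn, chunk_size=2))
```

With this layout the number of queries depends on the number of chunks rather than the number of rows, and the chunk size trades memory against round trips.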

Re: Dataimport performance

2010-12-15 Thread Robert Gründler
I've benchmarked the import already with 500k records, one time without the artists subquery, and one time without the join in the main query: Without subquery: 500k in 3 min 30 sec. Without join and without subquery: 500k in 2 min 30 sec. With subquery and with left join: 320k in 6 min 30 sec. so t
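For comparison, the three timings above can be converted to records per second. This is plain arithmetic on the figures quoted in the message; the `rate` helper and the dictionary labels are names invented for this sketch.

```python
def rate(records, minutes, seconds=0):
    """Return throughput in records per second."""
    return records / (minutes * 60 + seconds)


# Figures as quoted in the message above.
benchmarks = {
    "without subquery": rate(500_000, 3, 30),
    "without join and without subquery": rate(500_000, 2, 30),
    "with subquery and with left join": rate(320_000, 6, 30),
}

for label, per_second in benchmarks.items():
    print(f"{label}: {per_second:.0f} records/s")
```

The run with both the subquery and the left join comes out at roughly 820 records per second, compared with roughly 2,400 and 3,300 for the two simpler configurations.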

Re: Dataimport performance

2010-12-15 Thread Tim Heckman
2010/12/15 Robert Gründler : > The data-config.xml looks like this (only 1 entity): > [...] name="sf_unique_id"/> [...] So there's one track entity with an artist sub-entity. My (admittedly rather l

Re: Dataimport performance

2010-12-15 Thread Bernd Fehling
We are currently running Solr 4.x from trunk. -d64 -Xms10240M -Xmx10240M Total Rows Fetched: 24935988 Total Documents Skipped: 0 Total Documents Processed: 24568997 Time Taken: 5:55:19.104 24.5 million docs as XML from the filesystem in less than 6 hours. Maybe your MySQL is the bottleneck? Reg

Re: Dataimport performance

2010-12-15 Thread Robert Gründler
> What version of Solr are you using? Solr Specification Version: 1.4.1 Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42 Lucene Specification Version: 2.9.3 Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55 -robert > > Adam > > 2010/12/15 Robert Gründ

Re: Dataimport performance

2010-12-15 Thread Erick Erickson
You're adding on the order of 750 rows (docs)/second, which isn't bad... have you profiled the machine as this runs? Even just with top (assuming unix)... because the very first question is always "what takes the time, getting the data from MySQL or indexing or I/O?". If you aren't maxing out you

Re: Dataimport performance

2010-12-15 Thread Adam Estrada
What version of Solr are you using? Adam 2010/12/15 Robert Gründler > Hi, > > we're looking for some comparison-benchmarks for importing large tables > from a mysql database (full import). > > Currently, a full-import of ~ 8 Million rows from a MySQL database takes > around 3 hours, on a QuadCo