Re: NRT vs TLOG bulk indexing performances

2019-10-30 Thread Dominique Bejean
> > So, I understand that while the non-leader TLOG replica is copying the index from the leader, the leader stops indexing. > > One-shot heavy bulk indexing should be impacted much more than continuous light indexing. > > > > Regards. > > > >

Re: NRT vs TLOG bulk indexing performances

2019-10-26 Thread Erick Erickson
"I understand that while non leader TLOG is copying the index from leader, the leader stop indexing” This _better_ not be happening. If you can demonstrate this let’s open a JIRA. > On Oct 25, 2019, at 8:28 AM, Dominique Bejean > wrote: > > I understand that while non leader TLOG is copying th

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Erick Erickson
indexing. > One-shot heavy bulk indexing should be impacted much more than continuous light indexing. > > Regards. > > Dominique > > > On Fri, Oct 25, 2019 at 13:54, Shawn Heisey wrote: > >> On 10/25/2019 1:16 AM, Dominique Bejean w

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Ere Maijala
Shawn Heisey wrote on 25.10.2019 at 14:54: > With newer Solr versions, you can ask SolrCloud to prefer PULL replicas > for querying, so queries will be targeted to those replicas, unless they > all go down, in which case it will go to non-preferred replica types. I > do not know how to do this,
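The preference Shawn describes is, as far as I know, exposed in newer Solr releases (7.4 onward) as the shards.preference request parameter. A minimal sketch of building such a query URL follows; the host, port and collection name are placeholders and not taken from the thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def pull_preferring_query_url(base_url, collection, query):
    """Build a /select URL that asks SolrCloud to prefer PULL replicas."""
    params = {
        "q": query,
        # A preference, not a requirement: other replica types are still
        # used when no PULL replica is available.
        "shards.preference": "replica.type:PULL",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection, for illustration only.
url = pull_preferring_query_url("http://localhost:8983/solr", "mycollection", "*:*")
print(url)
```

The same parameter can also be set as a default on the request handler rather than on every request.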

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Shawn, So, I understand that while the non-leader TLOG replica is copying the index from the leader, the leader stops indexing. One-shot heavy bulk indexing should be impacted much more than continuous light indexing. Regards. Dominique On Fri, Oct 25, 2019 at 13:54, Shawn Heisey wrote: > On

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Shawn Heisey
On 10/25/2019 1:16 AM, Dominique Bejean wrote: For collection created with all replicas as NRT * Indexing time : 22 minutes For collection created with all replicas as TLOG * Indexing time : 34 minutes NRT indexes simultaneously on all replicas. So when indexing is done on one, it is a

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
est? > > > On 25.10.2019 at 09:16, Dominique Bejean <dominique.bej...@eolya.fr> wrote: > > > > Hi, > > > > I ran some benchmarks for bulk indexing in order to compare performance > > and resource usage for NRT versus TLOG replicas. > > >

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Jörn Franke
Which Solr version are you using, and how many times did you repeat the test? > On 25.10.2019 at 09:16, Dominique Bejean wrote: > > Hi, > > I ran some benchmarks for bulk indexing in order to compare performance > and resource usage for NRT versus TLOG replicas. > > Env

NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Hi, I ran some benchmarks for bulk indexing in order to compare performance and resource usage for NRT versus TLOG replicas. Environment: * SolrCloud with 4 Solr nodes (8 GB RAM, 4 GB heap) * 1 collection with 2 shards x 2 replicas (all NRT or all TLOG) * 1 core per Solr server Indexing of

Re: Solr CPU spiking up on bulk indexing

2019-06-18 Thread Erick Erickson
Dynamic fields don’t make any difference, they’re just like fixed fields as far as merging is concerned. So this is almost certainly merging being kicked off by your commits. The more documents and the more terms, the more work Lucene has to do, so I suspect this is just how things work. I’l

Re: Solr CPU spiking up on bulk indexing

2019-06-18 Thread Venu
Thanks Erick. I see the above pattern only at the time of commit. I have many fields (around 250 fields, of which around 100 are dynamic fields, plus around 3 n-gram fields and text fields; many of them are stored as well as indexed). Will a merge take a lot of t

Re: Solr CPU spiking up on bulk indexing

2019-06-16 Thread Erick Erickson
When indexing, segments are periodically merged by background threads, which can be quite CPU intensive. See: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Segment merges can be fairly long running, so even after indexing stops it can take some time for the CPU to s

Solr CPU spiking up on bulk indexing

2019-06-15 Thread Venu
Hi, while doing batch indexing, Solr CPU spikes regularly. I auto-commit every 5 minutes. Please find the image below. On stopping the indexing, the CPU returns to its normal state (around 20%). In the image a

Re: SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Erick Erickson
WithinMs) instead, which I >> expect would already improve performance. >> Does it matter which method I use? Beside the method taking a >> Collection there is also one that takes an >> Iterator ... and what about ConcurrentUpdateSolrClient? >> Should I use it for bulk

Re: SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Shawn Heisey
would already improve performance. > Does it matter which method I use? Beside the method taking a > Collection there is also one that takes an > Iterator ... and what about ConcurrentUpdateSolrClient? > Should I use it for bulk indexing instead of HttpSolrClient? > > Currently

SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Sebastian Riemer
to use add(Collection docs, int commitWithinMs) instead, which I expect would already improve performance. Does it matter which method I use? Besides the method taking a Collection there is also one that takes an Iterator ... and what about ConcurrentUpdateSolrClient? Should I use it for bulk
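The batching idea in this question (one request carrying many documents, with a commitWithin deadline instead of an explicit commit per document) is not specific to SolrJ. Here is a minimal sketch of the equivalent HTTP request, built but not sent; the host, collection name and field names are placeholders, and commitWithin is passed as a request parameter in milliseconds.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url, collection, docs, commit_within_ms):
    """Build one POST to /update carrying a whole batch of documents."""
    query = urlencode({"commitWithin": commit_within_ms})
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{base_url}/{collection}/update?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical documents and endpoint, for illustration only.
docs = [{"id": "1", "title": "first"}, {"id": "2", "title": "second"}]
req = build_update_request("http://localhost:8983/solr", "mycollection", docs, 10000)
print(req.full_url)
```

Passing the request to urllib.request.urlopen would actually send it; the point here is only that the whole batch travels in a single request.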

Re: problems with bulk indexing with concurrent DIH

2016-08-08 Thread Shawn Heisey
On 8/2/2016 7:50 AM, Bernd Fehling wrote: > Only assumption so far, DIH is sending the records as "update" (and > not pure "add") to the indexer which will generate delete files during > merge. If the number of segments is high it will take quite long to > merge and check all records of all segment

Re: problems with bulk indexing with concurrent DIH

2016-08-04 Thread Bernd Fehling
t;> If it runs with DIH it should run with SolrJ with additional >> performance >>>>>> boost. >>>>>> >>>>>> Bernd >>>>>> >>>>>> >>>>>> On 27.07.2016 at 16:03, Erick Erickson: >&

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Bernd Fehling
16 CPUs and my test continued with 8 >> concurrent DIHs. >> Then I was trying different and settings but >> now I'm stuck. >> I can't figure out what is the best setting for bulk indexing. >> What I see is that the indexing is "falling asl

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Shalin Shekhar Mangar
s too much for 16 CPUs, and my test continued with 8 > concurrent DIHs. > Then I was trying different and settings but > now I'm stuck. > I can't figure out what is the best setting for bulk indexing. > What I see is that the indexing is "falling asleep" after som

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Mikhail Khludnev
gt;>> or similar. Currently, you're putting a load on the Solr > >>>>> servers (especially if you're also using Tika) in addition > >>>>> to all indexing etc. > >>>>> > >>>>> Here's a sample: > >>&

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Bernd Fehling
especially if you're also using Tika) in addition >>>>>> to all indexing etc. >>>>>> >>>>>> Here's a sample: >>>>>> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ >>>>

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Bernd Fehling
gt;>>> to all indexing etc. >>>>> >>>>> Here's a sample: >>>>> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ >>>>> >>>>> Dodging the question I know, but DIH sometimes isn't >>>

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Susheel Kumar
//lucidworks.com/blog/2012/02/14/indexing-with-solrj/ > > >>> > > >>> Dodging the question I know, but DIH sometimes isn't > > >>> the best solution. > > >>> > > >>> Best, > > >>> Erick > > >

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Mikhail Khludnev
times isn't > >>> the best solution. > >>> > >>> Best, > >>> Erick > >>> > >>> On Wed, Jul 27, 2016 at 6:59 AM, Bernd Fehling > >>> wrote: > >>>> After enhancing the server with SSDs I'm

Re: problems with bulk indexing with concurrent DIH

2016-07-27 Thread Bernd Fehling
m trying to speed up indexing. >>>> >>>> The server has 16 CPUs and more than 100G RAM. >>>> JAVA (1.8.0_92) has 24G. >>>> SOLR is 4.10.4. >>>> Plain XML data to load is 218G with about 96M records. >>>> This will result in a s

Re: problems with bulk indexing with concurrent DIH

2016-07-27 Thread Erick Erickson
>> Plain XML data to load is 218G with about 96M records. > >> This will result in a single index of 299G. > >> > >> I tried with 4, 8, 12 and 16 concurrent DIHs. > >> 16 and 12 were too much for 16 CPUs, and my test continued with 8 > concurrent

Re: problems with bulk indexing with concurrent DIH

2016-07-27 Thread Bernd Fehling
>> Plain XML data to load is 218G with about 96M records. >> This will result in a single index of 299G. >> >> I tried with 4, 8, 12 and 16 concurrent DIHs. >> 16 and 12 were too much for 16 CPUs, and my test continued with 8 >> concurrent DIHs. >>

Re: problems with bulk indexing with concurrent DIH

2016-07-27 Thread Erick Erickson
en I was trying different and settings but > now I'm stuck. > I can't figure out what is the best setting for bulk indexing. > What I see is that the indexing is "falling asleep" after some time of > indexing. > It is only producing del-files, like _11_1.

problems with bulk indexing with concurrent DIH

2016-07-27 Thread Bernd Fehling
rrent DIHs. 16 and 12 were too much for 16 CPUs, and my test continued with 8 concurrent DIHs. Then I was trying different and settings but now I'm stuck. I can't figure out what is the best setting for bulk indexing. What I see is that the indexing is "falling asleep

Re: bulk indexing with optimistick lock

2015-02-13 Thread Scott Stults
This isn't a Solr-specific answer, but the easiest approach might be to just collect the document IDs you're about to add, query for them, and then filter out the ones Solr already has (this'll give you a nice list for later reporting). You'll need to keep your batch sizes below maxBooleanClauses i
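The approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: find_existing_ids is a hypothetical stand-in for whatever query you run against Solr (for example an OR query over the id field that returns only ids), and 1024 is used as the batch limit because that is a commonly shipped maxBooleanClauses value; check your own solrconfig.xml.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def split_new_and_existing(docs, find_existing_ids, max_clauses=1024):
    """Separate docs Solr does not have yet from those it already has.

    find_existing_ids(ids) is a caller-supplied (hypothetical) function that
    returns the subset of `ids` already present in the index.
    """
    new_docs, skipped_ids = [], []
    for batch in chunked(docs, max_clauses):
        existing = set(find_existing_ids([doc["id"] for doc in batch]))
        for doc in batch:
            if doc["id"] in existing:
                skipped_ids.append(doc["id"])  # keep for later reporting
            else:
                new_docs.append(doc)
    return new_docs, skipped_ids


# In-memory stand-in for the index, so the sketch runs without a Solr server.
already_indexed = {"b", "d"}
lookup_sizes = []


def fake_lookup(ids):
    lookup_sizes.append(len(ids))
    return [i for i in ids if i in already_indexed]


docs = [{"id": i} for i in ["a", "b", "c", "d"]]
new_docs, skipped_ids = split_new_and_existing(docs, fake_lookup, max_clauses=2)
print([d["id"] for d in new_docs], skipped_ids)
```

Note that checking and then adding is not atomic: another client can add the same id between the lookup and the update, so this does not replace version-based optimistic concurrency.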

bulk indexing with optimistick lock

2015-02-11 Thread Sankalp Gupta
Hi All, On our server side we are trying to add multiple documents in a list and then ask Solr to add them (using the SolrJ client), and then call commit after it has finished. Now we also want to control concurrency, and for that we wanted to use Solr's optimistic lock/versioning feature. That i

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-18 Thread adfel70
lf. It > sounds like a setting for some other piece of software, perhaps a > client, load balancer, or servlet container. > > Thanks, > Shawn -- View this message in context: http://lucene.472066.n3.nabble.com/bulk-indexing-EofExceptions-and-big-latencies-after-soft-commit-tp4124574p4125214.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-17 Thread Shawn Heisey
On 3/17/2014 7:07 AM, adfel70 wrote: > We currently have around 200 GB in a server. > I'm aware of the RAM issue, but it somehow doesn't seem related. > I would expect search latency problems, not strange EofExceptions. > > Regarding the http.timeout - I didn't change anything concerning this. > D

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-17 Thread adfel70
ning >> on 8gb heap jvms. each node has total of 64gb memory. >> My current collection (7 shards, 3 replicas) has around 500 million docs. >> I'm performing bulk indexing into the collection. I set softCommit to 10 >> minutes and hardCommit openSearcher=false to 15 minu

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-16 Thread Shawn Heisey
On 3/16/2014 10:34 AM, adfel70 wrote: > I have a 12-node Solr 4.6.1 cluster. Each node has 2 Solr processes, running > on 8 GB heap JVMs. Each node has a total of 64 GB memory. > My current collection (7 shards, 3 replicas) has around 500 million docs. > I'm performing bulk

bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-16 Thread adfel70
Hi, I have a 12-node Solr 4.6.1 cluster. Each node has 2 Solr processes, running on 8 GB heap JVMs. Each node has a total of 64 GB memory. My current collection (7 shards, 3 replicas) has around 500 million docs. I'm performing bulk indexing into the collection. I set softCommit to 10 minute

Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Software Dev
> We are testing our shiny new Solr Cloud architecture but we are > > experiencing some issues when doing bulk indexing. > > > > We have 5 solr cloud machines running and 3 indexing machines (separate > > from the cloud servers). The indexing machines pull off ids from a queue > > the

Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Otis Gospodnetic
testing our shiny new Solr Cloud architecture but we are > experiencing some issues when doing bulk indexing. > > We have 5 solr cloud machines running and 3 indexing machines (separate > from the cloud servers). The indexing machines pull off ids from a queue > then they index and

Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Shawn Heisey
On 1/23/2014 11:01 AM, Software Dev wrote: Is there any way to configure autoCommit, softCommit values on a per request basis? The majority of the time we have small flow of updates coming in and we would like to see them in ASAP. However we occasionally need to do some bulk indexing (once a

Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Software Dev
uent that hard commits. > > Is there any way to configure autoCommit, softCommit values on a per > request basis? The majority of the time we have small flow of updates > coming in and we would like to see them in ASAP. However we occasionally > need to do some bulk indexing (once a week

Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Software Dev
flow of updates coming in and we would like to see them in ASAP. However we occasionally need to do some bulk indexing (once a week or less) and the need to see those updates right away isn't as critical. I would say 95% of the time we are in "Index-Light Query-Light/Heavy" mode and

Re: Solr Cloud Bulk Indexing Questions

2014-01-22 Thread Erick Erickson
>>>>>> far as docs/second I would guess around 200/sec, which doesn't seem that high.

Re: Solr Cloud Bulk Indexing Questions

2014-01-22 Thread Software Dev
ou commit your updates? What is your >>>>>>> indexing rate in docs/second? >>>>>>> >>>>>>> In a SolrCloud setup, you should be using a CloudSolrServer. If the >>>>>>> server is having trouble keeping up with updates, switching

Re: Solr Cloud Bulk Indexing Questions

2014-01-22 Thread Andre Bois-Crettez
't help. So I suspect there's something not optimal about your setup that's the culprit. Best, Erick On Mon, Jan 20, 2014 at 4:00 PM, Software Dev <static.void@gmail.com> wrote: We are testing our shiny new Solr Cloud architecture but we are experiencing some issues whe

Re: Solr Cloud Bulk Indexing Questions

2014-01-21 Thread Software Dev
> In a SolrCloud setup, you should be using a CloudSolrServer. If the >> >>> server is having trouble keeping up with updates, switching to CUSS >> >>> probably wouldn't help. >> >>> >> >>> So I suspect there's something not optimal about y

Re: Solr Cloud Bulk Indexing Questions

2014-01-20 Thread Software Dev
's something not optimal about your setup that's > >>> the culprit. > >>> > >>> Best, > >>> Erick > >>> > >>> On Mon, Jan 20, 2014 at 4:00 PM, Software Dev <static.void@gmail.com> > >>> wrote: >

Re: Solr Cloud Bulk Indexing Questions

2014-01-20 Thread Mark Miller
>>> probably wouldn't help. >>> >>> So I suspect there's something not optimal about your setup that's >>> the culprit. >>> >>> Best, >>> Erick >>> >>> On Mon, Jan 20, 2014 at 4:00 PM, Software Dev >>>

Re: Solr Cloud Bulk Indexing Questions

2014-01-20 Thread Software Dev
> Erick >> >> On Mon, Jan 20, 2014 at 4:00 PM, Software Dev >> wrote: >> > We are testing our shiny new Solr Cloud architecture but we are >> > experiencing some issues when doing bulk indexing. >> > >> > We have 5 solr cloud machines running and 3 ind

Re: Solr Cloud Bulk Indexing Questions

2014-01-20 Thread Software Dev
e culprit. > > Best, > Erick > > On Mon, Jan 20, 2014 at 4:00 PM, Software Dev > wrote: > > We are testing our shiny new Solr Cloud architecture but we are > > experiencing some issues when doing bulk indexing. > > > > We have 5 solr cloud machines running

Re: Solr Cloud Bulk Indexing Questions

2014-01-20 Thread Erick Erickson
ot optimal about your setup that's the culprit. Best, Erick On Mon, Jan 20, 2014 at 4:00 PM, Software Dev wrote: > We are testing our shiny new Solr Cloud architecture but we are > experiencing some issues when doing bulk indexing. > > We have 5 solr cloud machines running and 3

Solr Cloud Bulk Indexing Questions

2014-01-20 Thread Software Dev
We are testing our shiny new Solr Cloud architecture but we are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines (separate from the cloud servers). The indexing machines pull off ids from a queue then they index and ship over a

Re: Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread Erick Erickson
ther you're using that. The test for > > whether you're running into that is whether you can continue > > to _query_, just not update. > > > > But you need to tell us more about your setup. In particular > > your commit settings (hard and soft), your solrconfig setti

Re: Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread steven crichton
rticular > your commit settings (hard and soft), your solrconfig settings, > particularly around autowarming, how you're "bulk indexing", > SolrJ? DIH? a huge CSV file? > > Best, > Erick > > > On Wed, Dec 4, 2013 at 2:30 PM, steven crichton <[h

Re: Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread Erick Erickson
your commit settings (hard and soft), your solrconfig settings, particularly around autowarming, how you're "bulk indexing", SolrJ? DIH? a huge CSV file? Best, Erick On Wed, Dec 4, 2013 at 2:30 PM, steven crichton wrote: > I am finding with a bulk index using SOLR 4.3 on Tomcat,

Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread steven crichton
ene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Bulk Indexing Question

2012-11-27 Thread Shawn Heisey
On 11/27/2012 1:07 PM, Joseph C. Trubisz wrote: When I curl a file to be indexed (in this case, as CSV), how do I know which index it’s going to, if I have multiple indexes currently being managed by Solr? For example, I have indexes for drug, company, author, abstract and I want to CSV load to

Bulk Indexing Question

2012-11-27 Thread Joseph C. Trubisz
Greetings… I’m new to Solr, so this might be a real amateur question. When I curl a file to be indexed (in this case, as CSV), how do I know which index it’s going to, if I have multiple indexes currently being managed by Solr? For example, I have indexes for drug, company, author, abstract and

Re: Bulk Indexing

2012-07-31 Thread Mikhail Khludnev
Usually, collecting the whole array hurts the client's JVM, while sending doc-by-doc bloats the server with a huge number of small requests. You just need to rewrite your code from the eager loop to a pulling iterator to be able to submit all docs via a single HTTP request http://wiki.apache.org/solr/Solrj#Streaming_document
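The "pulling iterator" idea can be illustrated outside SolrJ as well. Below is a minimal sketch, assuming a JSON update body: documents are validated and serialized lazily, one at a time, so neither a full in-memory array nor one request per document is needed. The record fields and the validation rule are made up for the example.

```python
import json


def validated(records, is_valid):
    """Lazily yield only the records that pass validation."""
    for record in records:
        if is_valid(record):
            yield record


def json_array_chunks(docs):
    """Yield the bytes of a single JSON array, one document at a time."""
    yield b"["
    for index, doc in enumerate(docs):
        if index:
            yield b","
        yield json.dumps(doc).encode("utf-8")
    yield b"]"


# Hypothetical records; a real source would be a database cursor or file reader.
records = ({"id": str(n), "price": n} for n in range(5))
body_chunks = json_array_chunks(validated(records, lambda r: r["price"] % 2 == 0))

# Joined here only to show the result; an HTTP client that accepts an iterable
# body could stream the chunks in one request without materializing them.
payload = b"".join(body_chunks)
print(payload.decode("utf-8"))
```

Nothing is read from the record source until the consumer asks for the next chunk, which is the property the eager loop lacks.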

Re: Bulk Indexing

2012-07-28 Thread Sohail Aboobaker
We have auto commit on and basically send records in a loop: after validating each record, we send it to the search service. Mikhail / Lan, are you suggesting that instead of sending them in a loop, we should collect them in an array and do a commit at the end? Is this better

Re: Bulk Indexing

2012-07-28 Thread Mikhail Khludnev
Lan, I assume that some particular server can freeze on such bulk. But overall message seems not absolutely correct to me. Solr has a lot of mechanisms to survive in such cases. Bulk indexing is absolutely right (if you submit single request with long iterator of SolrInputDocs). This indexing

RE: Bulk Indexing

2012-07-27 Thread Lan
'stop the world' events during indexing. - Update in batches with a commit at the end of the batch. -- View this message in context: http://lucene.472066.n3.nabble.com/Bulk-Indexing-tp3997745p3997815.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Bulk Indexing

2012-07-27 Thread Sohail Aboobaker
We will be using Solr 3.x. I was wondering whether we need to worry about this, as we have only 10k index entries at a time. It sounds like a very low number, and we have only one document type at this point. Should we worry about directly using SolrJ for indexing and searching for this low volume

Re: Bulk Indexing

2012-07-27 Thread Alexandre Rafalovitch
Haven't tried this but: 1) I think SOLR 4 supports on-the-fly core attach/detach/select. Can somebody confirm this? 2) If 1) is true, run everything as two cores. 3) One core is live in production 4) Second core is detached from SOLR and attached to something like SolrJ, which I believe can index w

RE: Bulk Indexing

2012-07-27 Thread Zhang, Lisheng
large, per folder). This is just my plan (not fully implemented yet). Best regards, Lisheng -Original Message- From: Sohail Aboobaker [mailto:sabooba...@gmail.com] Sent: Friday, July 27, 2012 6:56 AM To: solr-user@lucene.apache.org Subject: Bulk Indexing Hi, We have created a search

RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
-Original Message- From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] Sent: Thursday, July 26, 2012 12:46 PM To: solr-user@lucene.apache.org Subject: Re: Bulk indexing data into solr IIRC, about two months ago a problem with such a scheme was discussed here, but I can't remember the exact

Re: Bulk indexing data into solr

2012-07-26 Thread Mikhail Khludnev
Message- > From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] > Sent: Thursday, July 26, 2012 10:15 AM > To: solr-user@lucene.apache.org > Subject: Re: Bulk indexing data into solr > > > Coming back to your original question. I'm puzzled a little. > It

RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
@lucene.apache.org Subject: Re: Bulk indexing data into solr Coming back to your original question. I'm puzzled a little. It's not clear where you want to call the Lucene API directly from. If you mean that you have a standalone indexer which writes index files, and then it stops and these files become ava

Re: Bulk indexing data into solr

2012-07-26 Thread Mikhail Khludnev
Coming back to your original question. I'm puzzled a little. It's not clear where you want to call the Lucene API directly from. If you mean that you have a standalone indexer which writes index files, and then it stops and these files become available to the Solr process, it will work. Sharing index between proces

Re: Bulk indexing data into solr

2012-07-26 Thread Mikhail Khludnev
Right in time, guys. https://issues.apache.org/jira/browse/SOLR-3585 Here is a server-side update processing "fork". It does its best to halt processing when an exception occurs. Plug in this UpdateProcessor and specify the number of threads. Then submit a lazy iterator into StreamingUpdateServer at the client side

RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
Thanks very much, both your and Rafal's advice are very helpful! -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Thursday, July 26, 2012 8:47 AM To: solr-user@lucene.apache.org Subject: Re: Bulk indexing data into solr On 7/26/2012 7:34 AM, Rafał Kuć wrote:

Re: Bulk indexing data into solr

2012-07-26 Thread Shawn Heisey
On 7/26/2012 7:34 AM, Rafał Kuć wrote: If you use Java (and I think you do, because you mention Lucene) you should take a look at StreamingUpdateSolrServer. It not only allows you to send data in batches, but also index using multiple threads. A caveat to what Rafał said: The streaming object

Re: Bulk indexing data into solr

2012-07-26 Thread Rafał Kuć
Hello! If you use Java (and I think you do, because you mention Lucene) you should take a look at StreamingUpdateSolrServer. It not only allows you to send data in batches, but also index using multiple threads. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -

Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
Hi, I am starting to use Solr, and now I need to index a rather large amount of data. It seems that passing data to Solr through HTTP is rather inefficient. I am thinking of still calling the Lucene API directly for bulk indexing but using Solr for search; is this design OK? Thanks very much for your help, Li

Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-06-01 Thread Tanguy Moal
Lee, Thank you very much for your answer. Using the signature field as the uniqueKey is effectively what I was doing, so the "overwriteDupes=true" parameter in my solrconfig was somehow redundant, although I wasn't aware of it! =D In practice it works perfectly and that's the nice part. By

Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-05-31 Thread lee carroll
Tanguy, you might have tried this already, but can you set overwriteDupes to false and set the signature key to be the id? That way Solr will manage updates. From the wiki: http://wiki.apache.org/solr/Deduplication HTH Lee On 30 May 2011 08:32, Tanguy Moal wrote: > > Hello, > > Sorry for re-

Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-05-30 Thread Tanguy Moal
Hello, Sorry for re-posting this, but it seems my message got lost in the mailing list's message stream without catching anyone's attention... =D In short, has anyone already experienced dramatic indexing slowdowns during large bulk imports with overwriteDupes turned on and a fairly high dupli

Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-05-25 Thread Tanguy Moal
Dear list, I'm posting here after some unsuccessful investigations. In my setup I push documents to Solr using the StreamingUpdateSolrServer. I'm sending a comfortable initial amount of documents (~250M) and wished to perform overwriting of duplicated documents at index time, during the update