ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-12 Thread Shivaji Dutta
We have a customer that needs to update a few billion documents in SolrCloud. I know the suggested client is CloudSolrClient, for its load-balancing feature. As per the docs, CloudSolrClient is the SolrJ client class to communicate with SolrCloud. Instances of this class communicate with Zookeeper

Re: ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-12 Thread Shawn Heisey
On 1/12/2016 7:42 PM, Shivaji Dutta wrote: > Now since with ConcurrentUpdateSolrClient I am able to use a queue and a pool > of threads, which makes it more attractive to use over CloudSolrClient which > will use a HttpSolrClient once it gets a set of nodes to do the updates. > > What is the
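The "queue and a pool of threads" idea in the quoted question can be illustrated outside SolrJ. The following is only a minimal Python sketch of that general pattern (a bounded queue drained by worker threads that send documents in batches); it is not SolrJ code and does not reproduce how ConcurrentUpdateSolrClient is implemented. The names `bulk_send`, `send_batch` and `fake_send`, and all sizes, are hypothetical, and error handling is deliberately omitted.

```python
import queue
import threading


def bulk_send(docs, send_batch, num_threads=4, queue_size=100, batch_size=10):
    """Feed docs through a bounded queue to worker threads that send batches.

    send_batch is a caller-supplied function (hypothetical here) that takes a
    list of documents and ships it to the server.
    """
    work = queue.Queue(maxsize=queue_size)  # bounded: producer blocks when full
    sentinel = object()                     # tells a worker to stop

    def worker():
        batch = []
        while True:
            item = work.get()
            if item is sentinel:
                break
            batch.append(item)
            if len(batch) >= batch_size:
                send_batch(batch)
                batch = []
        if batch:                           # flush the final partial batch
            send_batch(batch)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for doc in docs:
        work.put(doc)
    for _ in threads:                       # one stop signal per worker
        work.put(sentinel)
    for t in threads:
        t.join()


# Stand-in for a real HTTP update call: just record what was "sent".
sent = []
sent_lock = threading.Lock()


def fake_send(batch):
    with sent_lock:
        sent.extend(batch)


bulk_send(({"id": str(i)} for i in range(1000)), fake_send)
print(len(sent))
```

The bounded queue is the design choice worth noting: it keeps a fast producer from building an unbounded backlog in memory when the senders are the bottleneck.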

collection reflection in resource manager node

2016-01-12 Thread vidya
Hi, I have created a collection on one datanode where a Solr server is deployed, say DN1. I have another datanode where a Solr server is deployed and which also runs the resource manager service, say DN2. When I created a collection using the solrctl command on DN1, it got reflected in DN2

RE: WArning in SolrCloud logs

2016-01-12 Thread Gian Maria Ricci - aka Alkampfer
Perfect, I'll remove the block and check whether the warning goes away. Thanks. -- Gian Maria Ricci Cell: +39 320 0136949 -Original Message- From: Alessandro Benedetti [mailto:abenede...@apache.org] Sent: Tuesday, 12 January 2016 10:43 To: solr-user@lucene.apache.org Subject: Re:

Re: indexing rich data with solr 5.3

2016-01-12 Thread kostali hassan
Yes, I am successfully indexing other files with DIH; now, when I try to index these files with ExtractingRequestHandler, I get this error: null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Error creating OOXML extractor at

RE: WArning in SolrCloud logs

2016-01-12 Thread Gian Maria Ricci - aka Alkampfer
This is the replication handler configured in solrconfig.xml; there is nothing else regarding replication. This configuration is used for a single-core test installation; then, to experiment with SolrCloud, I simply uploaded the very same configuration to SolrCloud. Do you think it could be the

Re: Change leader in SolrCloud

2016-01-12 Thread Alessandro Benedetti
I would like to make a special mention of the SolrCloud update request processor chain mechanism [1]. Quoting the documentation: In a distributed SolrCloud situation setup, all processors in the chain > *before* the DistributedUpdateProcessor are run on the first node that > receives an update

Re: WArning in SolrCloud logs

2016-01-12 Thread Alessandro Benedetti
To be honest, that block is not necessary anymore. As Erick and Shawn were saying, it is now implicit and defined by default. Cheers On 12 January 2016 at 08:22, Gian Maria Ricci - aka Alkampfer < alkamp...@nablasoft.com> wrote: > This is the replication handler configured in solrconfig.xml,

realtime get requirements

2016-01-12 Thread Matteo Grolla
Hi, can you confirm that realtime get requirements are just: true json true ${solr.ulog.dir:}

RE: Change leader in SolrCloud

2016-01-12 Thread Gian Maria Ricci - aka Alkampfer
Understood, thanks. I thought that the leader sent data to the other shards after indexing and autocommit took place, but I know that this is not the optimal situation. By sending all documents to all shards, Solr can guarantee consistency of data. Now everything is clearer. Thanks for the

Boost does not appear in solr debug explain debug

2016-01-12 Thread Vincenzo D'Amore
Hi all, looking at the parsedquery_toString debug output I have many fields, but there is one that has this configuration: ((attr_search:8 attr_search:gb)~2^5.0) I hope I am right, but I expect to find a boost on both of the matching values. Now, I don't understand why, even if both terms match, I

It's possible up and debug solr in eclipse IDE?

2016-01-12 Thread Rodrigo Testillano
I need to debug my custom processor (updateRequestProcessor) in my Eclipse IDE. With older Solr versions this was possible, but with Solr running as a service with Jetty I don't know if there is a way to do it. -- Regards, Rodrigo Testillano Tordesillas.

Re: It's possible up and debug solr in eclipse IDE?

2016-01-12 Thread Vincenzo D'Amore
Yep. I did this just a few hours ago. First, download the Solr source: wget http://it.apache.contactlab.it/lucene/solr/5.4.0/solr-5.4.0-src.tgz then untar the file. I'm not sure they are needed, but I already have the latest versions of ant, ivy and maven installed. Then in the solr-5.4.0 directory I did

Re: It's possible up and debug solr in eclipse IDE?

2016-01-12 Thread Vincenzo D'Amore
Mmmm... I'm not sure it's worth the trouble. Anyway, I'm just curious; when you find a way, let me know. On Tue, Jan 12, 2016 at 1:01 PM, Rodrigo Testillano < rodrite.testill...@gmail.com> wrote: > Yes, with remote debug is working, but i want up a jetty with solr in > Eclipse like i did with

Re: SolrCloud, DIH, and XPathEntityProcessor

2016-01-12 Thread Shawn Heisey
On 1/12/2016 6:05 AM, Tom Evans wrote: > Hi all, trying to move our Solr 4 setup to SolrCloud (5.4). Having > some problems with a DIH config that attempts to load an XML file and > iterate through the nodes in that file, it trys to load the file from > disk instead of from zookeeper. > >

Re: realtime get requirements

2016-01-12 Thread Shawn Heisey
On 1/12/2016 2:50 AM, Matteo Grolla wrote: > and that it works with any directory factory? (Not just > NRTCachingDirectoryFactory) Realtime Get relies on the updateLog to return uncommitted documents, and standard Lucene mechanisms to return documents that have already been committed. It should
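As a concrete illustration of what a Realtime Get request looks like, here is a minimal Python sketch that only builds the request URL for the `/get` handler. It assumes the handler is registered at `/get`, and that a single document is requested with `id` and several with a comma-separated `ids`; the host, port and collection name (`mycollection`) are placeholders, and `realtime_get_url` is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode


def realtime_get_url(base_url, collection, doc_ids):
    """Build a Realtime Get URL for one or more document IDs."""
    if not doc_ids:
        raise ValueError("at least one document ID is required")
    # One ID goes in `id`; several IDs go in a comma-separated `ids`.
    if len(doc_ids) == 1:
        params = {"id": doc_ids[0]}
    else:
        params = {"ids": ",".join(doc_ids)}
    params["wt"] = "json"
    return f"{base_url.rstrip('/')}/{collection}/get?{urlencode(params)}"


# Placeholder host and collection, for illustration only.
url = realtime_get_url("http://localhost:8983/solr", "mycollection", ["doc1"])
print(url)
```

Fetching that URL right after an update, before any commit, is a simple way to check that the updateLog is enabled and that uncommitted documents are being returned.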

Re: realtime get requirements

2016-01-12 Thread Erick Erickson
Right, the suggester had some bad behavior where it rebuilt on startup despite setting the flag to _not_ do that. Some details here: https://lucidworks.com/blog/2015/03/04/solr-suggester/ Best, Erick On Tue, Jan 12, 2016 at 8:12 AM, Matteo Grolla wrote: > ok, >

Re: solrcloud -How to delete a doc at a specific shard

2016-01-12 Thread Erick Erickson
bq: it is too hard understand, what do you mean "lots"? I mean that if you have one or two duplicate docs it's worth looking at things like leading or trailing spaces in the ID leading to IDs that look identical but aren't. If it's hundreds or thousands of docs, then it's probably indicative of
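The check for IDs that "look identical but aren't" can be done on an exported list of IDs before touching the index. The following Python sketch groups raw IDs by their whitespace-stripped form and reports the groups containing more than one distinct raw value. The sample IDs and the helper name `find_whitespace_collisions` are made up for illustration.

```python
from collections import defaultdict


def find_whitespace_collisions(doc_ids):
    """Return {stripped_id: [raw ids]} for IDs that differ only by
    leading or trailing whitespace."""
    groups = defaultdict(list)
    for raw in doc_ids:
        groups[raw.strip()].append(raw)
    # Keep only groups with more than one distinct raw spelling.
    return {key: raws for key, raws in groups.items() if len(set(raws)) > 1}


# Hypothetical sample: "A100 " and " B7" carry stray spaces.
ids = ["A100", "A100 ", " B7", "B7", "C3"]
collisions = find_whitespace_collisions(ids)
for key, raws in collisions.items():
    print(repr(key), "->", [repr(r) for r in raws])
```

Printing with `repr` makes the stray spaces visible, which is the whole point of the exercise.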

Re: It's possible up and debug solr in eclipse IDE?

2016-01-12 Thread Erick Erickson
And a neater way to debug is to step through the JUnit tests that exercise the code you need to work on, rather than attaching to a remote Solr. This is often much faster than compile/start Solr/attach. Of course some problems don't fit that process, but I

Re: indexing rich data with solr 5.3

2016-01-12 Thread Erick Erickson
Then you probably have a corrupt file or have discovered a Tika bug. Next I'd try running the file through stand-alone Tika, perhaps trying different versions of Tika. If this latter is the case, you can always use a more recent version of Tika with Solr and/or process the file on a SolrJ client

Re: Problems using MapReduceIndexerTool with multiple reducers

2016-01-12 Thread Douglas Rapp
As an update, I went ahead and used the Collection API and deleted the existing one, and then recreated it (specifying the compositeId router), and when I tried out MRIT, I didn't have any problems whatsoever with the number of reducers (and was able to cut the indexing time by over half!!). I'm
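The delete-and-recreate sequence described above maps onto two Collections API calls. This Python sketch only builds the two URLs; it assumes the standard `/admin/collections` endpoint with the `DELETE` and `CREATE` actions and the `router.name` parameter. The host, collection name, shard count and replication factor are placeholders rather than values from the original message, and `collections_api_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action}
    query.update(params)
    query["wt"] = "json"
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(query)}"


BASE = "http://localhost:8983/solr"  # placeholder host

# Step 1: drop the existing collection.
delete_url = collections_api_url(BASE, "DELETE", {"name": "mycollection"})

# Step 2: recreate it, naming the compositeId router explicitly.
create_url = collections_api_url(BASE, "CREATE", {
    "name": "mycollection",
    "numShards": 2,             # placeholder value
    "replicationFactor": 1,     # placeholder value
    "router.name": "compositeId",
})

print(delete_url)
print(create_url)
```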

Re: Problems using MapReduceIndexerTool with multiple reducers

2016-01-12 Thread Douglas Rapp
Great to know. Thank you very much for your assistance! On Tue, Jan 12, 2016 at 10:34 AM, Erick Erickson wrote: > bq: Do you know, is using the API the > recommended way of handling collections? As opposed to putting collection > folders containing "core.properties"

Re: solr in action - multiple language content in one field

2016-01-12 Thread Erick Erickson
Well, Solr _can_ put all the languages in one field... it's just that the user experience is sub-optimal. Stopwords, stemming rules, even tokenization vary between languages and using, say, the English stopwords for Catalan is not the best. And the CJK languages (Chinese, Japanese and Korean)

Re: Boost does not appear in solr debug explain debug

2016-01-12 Thread Erick Erickson
You won't necessarily find both if those values are NOT in the particular document. If you have a document you know contains both but doesn't appear in your results list, consider using explainOther to see how the doc of interest is actually scored. Best, Erick On Tue, Jan 12, 2016 at 1:54 AM,
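To make the explainOther suggestion concrete, this Python sketch builds a `/select` URL that turns on debug output and asks Solr to also explain a specific document against the main query. It assumes the `debugQuery` and `explainOther` request parameters; the host, collection name, query string and document ID (`DOC42`) are placeholders, and `explain_other_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def explain_other_url(base_url, collection, query, other_query):
    """Build a select URL that explains how `other_query` docs score for `query`."""
    params = {
        "q": query,
        "debugQuery": "true",         # enable the debug/explain section
        "explainOther": other_query,  # docs to explain even if not in the results
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Placeholder values for illustration only.
url = explain_other_url(
    "http://localhost:8983/solr",
    "mycollection",
    "attr_search:8 attr_search:gb",
    "id:DOC42",
)
print(url)
```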

Re: Problems using MapReduceIndexerTool with multiple reducers

2016-01-12 Thread Erick Erickson
bq: Do you know, is using the API the recommended way of handling collections? As opposed to putting collection folders containing "core.properties" file and "conf" folders (containing "schema.xml" and "solrconfig.xml", etc) all in the Solr home location? Absolutely and certainly DO use the

Re: SolrCloud, DIH, and XPathEntityProcessor

2016-01-12 Thread Erick Erickson
Yeah, that's essentially the nature of open source: someone gets frustrated enough with current behavior and fixes it ;)... There's never any harm in opening a JIRA; all you need to do is register. It's not a bad idea to open one as you _start_ writing the code, even providing very early versions

Re: Boost does not appear in solr debug explain debug

2016-01-12 Thread Chris Hostetter
: ((attr_search:8 attr_search:gb)~2^5.0) : : I hope to be right, but I expect to find a boost in both the values : matches. 1) "boost" information should show up as a detail of the "queryWeight", which is itself a detail of the "weight" of term clauses -- in the output you've included below,

Re: It's possible up and debug solr in eclipse IDE?

2016-01-12 Thread Rodrigo Testillano
Thank you so much! I'm going to try right now and tell you my results! 2016-01-12 12:47 GMT+01:00 Vincenzo D'Amore : > Yep. > > I have done this just few hours ago. > Let's download Solr source: > > wget http://it.apache.contactlab.it/lucene/solr/5.4.0/solr-5.4.0-src.tgz >

Re: solrcloud -How to delete a doc at a specific shard

2016-01-12 Thread vidya
So, you have deployed Solr servers on three nodes, namely 192.168.100.210, 211 and 212. Am I correct?

Re: It's possible up and debug solr in eclipse IDE?

2016-01-12 Thread Rodrigo Testillano
Yes, remote debugging is working, but I want to bring up a Jetty with Solr inside Eclipse like I did with Tomcat in older versions. Thank you very much for your help! I am going to try another way to do it, but maybe it will not be possible. 2016-01-12 12:51 GMT+01:00 Rodrigo Testillano

SolrCloud, DIH, and XPathEntityProcessor

2016-01-12 Thread Tom Evans
Hi all, trying to move our Solr 4 setup to SolrCloud (5.4). Having some problems with a DIH config that attempts to load an XML file and iterate through the nodes in that file; it tries to load the file from disk instead of from zookeeper. The file exists in zookeeper, adjacent to the

Re: SolrCloud, DIH, and XPathEntityProcessor

2016-01-12 Thread Shawn Heisey
On 1/12/2016 7:45 AM, Tom Evans wrote: > That makes no sense whatsoever. DIH loads the data_import.conf from ZK > just fine, or is that provided to DIH from another module that does > know about ZK? This is accomplished indirectly through a resource loader in the SolrCore object that is

Re: SolrCloud, DIH, and XPathEntityProcessor

2016-01-12 Thread Tom Evans
On Tue, Jan 12, 2016 at 3:00 PM, Shawn Heisey wrote: > On 1/12/2016 7:45 AM, Tom Evans wrote: >> That makes no sense whatsoever. DIH loads the data_import.conf from ZK >> just fine, or is that provided to DIH from another module that does >> know about ZK? > > This is

Re: SolrCloud, DIH, and XPathEntityProcessor

2016-01-12 Thread Tom Evans
On Tue, Jan 12, 2016 at 2:32 PM, Shawn Heisey wrote: > On 1/12/2016 6:05 AM, Tom Evans wrote: >> Hi all, trying to move our Solr 4 setup to SolrCloud (5.4). Having >> some problems with a DIH config that attempts to load an XML file and >> iterate through the nodes in that

Re: realtime get requirements

2016-01-12 Thread Matteo Grolla
Thanks Shawn, On a production solr instance some cores take a long time to load while others of similar size take much less. One of the differences between these cores is the directoryFactory. 2016-01-12 15:34 GMT+01:00 Shawn Heisey : > On 1/12/2016 2:50 AM, Matteo

Re: realtime get requirements

2016-01-12 Thread Matteo Grolla
OK, the suggester was responsible for the long load time. Thanks 2016-01-12 15:47 GMT+01:00 Matteo Grolla : > Thanks Shawn, > On a production solr instance some cores take a long time to load > while other of similar size take much less. One of the differences

Re: Problems using MapReduceIndexerTool with multiple reducers

2016-01-12 Thread Douglas Rapp
I'm actually not specifying any router, and assumed the "implicit" one was the default. The only resource I can find for setting the document router is when creating a new collection via the Collections API, which I am not using. What I do is define several options in the "solrconfig.xml" file,