Re: SolrCloud default shard assignment order not correct

2015-04-27 Thread spillane
Shawn, The 4.2 cloud graph ordering turned out not to be a problem, after the first startup of my 5 leaders and 5 replicas their shard assignments were 'fixed' in Zookeeper. I can now start them in any order and get the same graph.

SolrCloud Replication Issue

2015-04-27 Thread Amit L
Hi, A few days ago I deployed a solr 4.9.0 cluster, which consists of 2 collections. Each collection has 1 shard with 3 replicas on 3 different machines. On the first day I noticed this error appear on the leader. Full Log - http://pastebin.com/wcPMZb0s 4/23/2015, 2:34:37 PM SEVERE

Solr + RDF = SolRDF

2015-04-27 Thread Andrea Gazzarini
Hi guys, I'd like to share with you a project (actually a hobby for me) where I'm spending my free time, maybe someone could get some idea or benefit from it. https://github.com/agazzarini/SolRDF I called it SolRDF (Solr + RDF): It is a set of Solr extensions for managing (indexing and querying)

Re: Load balancer for indexing?

2015-04-27 Thread Chris Hostetter
: I manage a SolrCloud with 5 shards. Queries go thru an AWS load balancer but : indexing does not, so my leader1 is getting clobbered. Should my SolrJ app : be pointing at a load balancer and if so will indexing via the : ConcurrentUpdateSolrServer class still work? The Concurrent part

Re: Load balancer for indexing?

2015-04-27 Thread Shawn Heisey
On 4/27/2015 3:44 PM, spillane wrote: I manage a SolrCloud with 5 shards. Queries go thru an AWS load balancer but indexing does not, so my leader1 is getting clobbered. Should my SolrJ app be pointing at a load balancer and if so will indexing via the ConcurrentUpdateSolrServer class still

Re: SolrCloud Replication Issue

2015-04-27 Thread Amit L
Appreciate the response, to answer your questions. * Do you see this happen often? How often? It has happened twice in five days. The first two days after deployment. * Are there any known network issues? There are no obvious network issues but as these instances reside in AWS i cannot rule it

Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-27 Thread Gili Nachum
To prevent it from re occurring you could monitor index size and once above a certain size threshold add another machine and split the shard between existing and new machine. On Apr 20, 2015 9:10 PM, Rishi Easwaran rishi.easwa...@aol.com wrote: So is there anything that can be done from a tuning

Load balancer for indexing?

2015-04-27 Thread spillane
I manage a SolrCloud with 5 shards. Queries go thru an AWS load balancer but indexing does not, so my leader1 is getting clobbered. Should my SolrJ app be pointing at a load balancer and if so will indexing via the ConcurrentUpdateSolrServer class still work? -- View this message in

AW: Odp.: solr issue with pdf forms

2015-04-27 Thread Steve.Scholl
Erick, thanks a lot for helping me here. In my case it is the content field which is not displayed correctly. So I went to the schema browser like you pointed out. Here is the information I found: Field: content Field Type: text Properties: Indexed, Tokenized, Stored, TermVector Stored Schema:

stats component performance

2015-04-27 Thread Matteo Grolla
Hi, is there any public benchmark or description of how the solr stats component works? Matteo

Re: Auto replication mechanism in SolrCloud 5.1 not working

2015-04-27 Thread mihaela olteanu
Thanks for the reply. Now it's working but I'm not sure what change fixed this .. It might have been a communication error with ZooKeeper although I could not see anything as such in the logs. I found that ZooKeeper was for example generating some trace files in a location that was running out

Replication not triggered

2015-04-27 Thread Michael Lackhoff
We have old-fashioned replication configured between one master and one slave. Everything used to work but today I noticed that recent records were not present in the slave (same query gives hits on master but none on slave). The replication communication seems to work. This is what I get in the

RE: TIKA OCR not working

2015-04-27 Thread Uwe Schindler
Yes that is fixed. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Monday, April 27, 2015 4:29 PM To: u...@tika.apache.org Cc:

Integrating Solr with an existing web application - and SolrJ

2015-04-27 Thread O. Olson
I can get the standard Solr example to run within Jetty and I can use it through the velocity templates. I'm now thinking of integrating Solr with a couple of existing websites. In this regard, I have the following questions: 1. For a medium sized website (about 100+ concurrent users), what is

Re: TIKA OCR not working

2015-04-27 Thread Mattmann, Chris A (3980)
It should work out of the box in Solr as long as Tesseract is installed and on the class path. Solr had an issue with it since Tika sends 2 startDocument calls, but I fixed that with Uwe and it was shipped in 4.10.4 and in 5.x I think?

RE: TIKA OCR not working

2015-04-27 Thread Uwe Schindler
Hi, TIKA OCR is definitely working automatically with Solr 5.x. It is just important to install TesseractOCR on path (which is a native tool that does the actual work). On Ubuntu Linux, this should be quite simple (apt-get install tesseract-ocr or similar). You may also need to install

FW: TIKA OCR not working

2015-04-27 Thread Allison, Timothy B.
Trung, I haven't experimented with our OCR parser yet, but this should give a good start: https://wiki.apache.org/tika/TikaOCR . Have you installed tesseract? Tika colleagues, Any other tips? What else has to be configured and how? -Original Message- From: trung.ht

Re: Integrating Solr with an existing web application - and SolrJ

2015-04-27 Thread Doug Turnbull
1. Unless usage is very light, you likely want Solr to be on a different server. It's going to have different caching and system needs than your web app. You may also want to scale Solr independently from your web app. Think of it just like you think of a database-- do you want your MySQL instance

Field attribute default value

2015-04-27 Thread Steven White
Hi Everyone, I'm looking at https://cwiki.apache.org/confluence/display/solr/Defining+Fields and https://wiki.apache.org/solr/SchemaXml but cannot find an answer, so maybe it is someplace else? I need to know what is the default value for each field attribute (when that attribute is missing).

Re: SolrCloud Replication Issue

2015-04-27 Thread Erick Erickson
Amit: The fact that all instances are using no more than 30% isn't really indicative of whether or not GC pauses are a problem. If you have a large heap allocated to Java, then the to-be-collected objects will build up and _eventually_ you'll have a stop-the-world GC pause even though each

Re: Start Solr with multiple external zookeepers on Windows Server?

2015-04-27 Thread Zheng Lin Edwin Yeo
For my version, Solr-5.0.0, I use this command to start: java -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar For my setup, the 3 zookeeper servers are running on the same machine, but you can replace 'localhost' with your server IP addresses, and also replace the ports
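As an illustration of the command in the message above, here is a minimal Python sketch that assembles the comma-separated `zkHost` value from a list of host/port pairs. The helper names `zk_host_string` and `start_command` are hypothetical and exist only for this example; the host list mirrors the three local ZooKeeper instances mentioned in the message.

```python
def zk_host_string(hosts):
    """Join (host, port) pairs into the comma-separated value used for -DzkHost."""
    return ",".join(f"{host}:{port}" for host, port in hosts)


def start_command(hosts):
    """Build the argument list for the start command quoted in the message."""
    return ["java", f"-DzkHost={zk_host_string(hosts)}", "-jar", "start.jar"]


# Three ZooKeeper servers on the same machine, as in the message.
ensemble = [("localhost", 2181), ("localhost", 2182), ("localhost", 2183)]
print(" ".join(start_command(ensemble)))
```

For a real ensemble, the `localhost` entries would be swapped for the server addresses and ports in use.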

Re: TIKA OCR not working

2015-04-27 Thread trung.ht
Hi Uwe, Thanks for the answer, but it looks like it does not work on my machine. I use Mac OS 10.10.3, tesseract is installed through homebrew, and tested with the same file I post to solr. I think tesseract is on path since I run this command successfully: tesseract test_tesseract.png output

Solr node going to recovering state during heavy reindexing

2015-04-27 Thread Gopal Jee
We have a 26 node solr cloud cluster. During heavy re-indexing, some of the nodes go into recovering state. As per current config, soft commit is set to 15 minutes and hard commit to 30 sec. Moreover, zkClientTimeout is set to 30 sec in solr nodes. Please advise. Thanks Gopal

Re: Solr node going to recovering state during heavy reindexing

2015-04-27 Thread Shawn Heisey
On 4/27/2015 9:15 AM, Gopal Jee wrote: We have a 26 node solr cloud cluster. During heavy re-indexing, some of nodes go into recovering state. as per current config, soft commit is set to 15 minute and hard commit to 30 sec. Moreover, zkClientTimeout is set to 30 sec in solr nodes. Please

Re: Odp.: solr issue with pdf forms

2015-04-27 Thread Erick Erickson
We're still not quite there. There should be a load term info button on that page. Clicking that button will show you the terms in your index (as opposed to the raw stored input which is what you get when you look at results in the browser). My bet is that you'll see perfectly normal tokens in the

Re: Field attribute default value

2015-04-27 Thread Erick Erickson
I'd just define these in the fieldType definition explicitly. Then you're absolutely sure what each field has and can override as needed. Best, Erick On Mon, Apr 27, 2015 at 7:56 AM, Steven White swhite4...@gmail.com wrote: Hi Everyone, I'm looking at

Re: Integrating Solr with an existing web application - and SolrJ

2015-04-27 Thread O. Olson
Thank you very much Doug. I was thinking of putting Solr on a separate server, but I did not expect you to so strongly recommend Jetty. I think I would stick to the embedded Jetty, because I don't need the security. I'm using Solr 4.10.3 at the moment, so I'm not familiar with Solr 5. Thanks

Re: Solr node going to recovering state during heavy reindexing

2015-04-27 Thread Rajesh Hazari
Our production solr nodes were having a similar issue. With 4 nodes everything is normal, but when we try to increase the replicas (nodes) to 10, most of them went to recovery. our config params : nodes : 20 (replica in each node) soft commit is 6 sec hard commit is 5 min indexing scheduled time :

Re: and stopword in user query is being change to q.op=AND

2015-04-27 Thread Rajesh Hazari
I did go through the edismax documentation (solr 5.1 documentation), which suggests using the *stopwords* query param to signal the parser to respect stopfilterfactory while parsing, but I still do not see this happening. My final query looks like this

Re: Solr node going to recovering state during heavy reindexing

2015-04-27 Thread Rajesh Hazari
thanks, i am sure that we have missed this command line property, this gives me more information on how to use latest solr scripts more effectively. *Thanks,* *Rajesh**.* On Mon, Apr 27, 2015 at 12:04 PM, Shawn Heisey apa...@elyograg.org wrote: On 4/27/2015 9:15 AM, Gopal Jee wrote: We have

Start Solr with multiple external zookeepers on Windows Server?

2015-04-27 Thread Stephan Schubert
Hi everyone, how is it possible to start solr with an external set of zookeeper instances (quorum of 3 servers) on a windows server (2008R2)? From the wiki I got ( https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble ) bin\solr restart -c -p 8983 -z

Re: TIKA OCR not working

2015-04-27 Thread Konstantin Gribov
JFYI, there's no tesseract leptonica for centos6/rhel6 (even in epel), so I have specs for building tesseract and leptonica (its dependency) on github (https://github.com/grossws/tesseract-ocr-specs). Feel free to use if you're on centos/rhel. Also, tesseract language packs are trained for one

Re: Field attribute default value

2015-04-27 Thread Chris Hostetter
the defaults for a <field/> come from the <fieldType/> specified by the type attribute. From that point, the default behavior of a <fieldType/> can vary by the individual FieldType class implementation (ie: most fields default to omitTermFreqAndPositions=true but TextField defaults to false) or by
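The lookup order described in the message above can be sketched roughly as follows. This is a toy Python model, not Solr code: `OtherField`, the `FIELD_TYPES` entries, and the `resolve` helper are hypothetical, and the assumption that an attribute set directly on the field wins over the type is mine. Only the two `omitTermFreqAndPositions` defaults are taken from the message.

```python
# Class-level defaults. "OtherField" is a hypothetical stand-in for
# "most fields"; the True/False values are the ones quoted in the message.
CLASS_DEFAULTS = {
    "TextField": {"omitTermFreqAndPositions": False},
    "OtherField": {"omitTermFreqAndPositions": True},
}

# Hypothetical fieldType declarations, keyed by type name.
FIELD_TYPES = {
    "text": {"class": "TextField"},
    "plain": {"class": "OtherField"},
    "plain_tf": {"class": "OtherField", "omitTermFreqAndPositions": False},
}


def resolve(attr, field, field_types=FIELD_TYPES):
    """Return the effective value of attr for a field declaration."""
    if attr in field:                      # assumed: explicit field attribute wins
        return field[attr]
    ftype = field_types[field["type"]]
    if attr in ftype:                      # then the fieldType declaration
        return ftype[attr]
    return CLASS_DEFAULTS[ftype["class"]][attr]  # then the class default


print(resolve("omitTermFreqAndPositions", {"name": "content", "type": "text"}))
print(resolve("omitTermFreqAndPositions", {"name": "id", "type": "plain"}))
```

The sketch stops at the class default because the message is cut off before it names the remaining source of variation.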

Re: TIKA OCR not working

2015-04-27 Thread Mattmann, Chris A (3980)
Thanks Konstantin! ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email:

more like this generated query

2015-04-27 Thread alxsss
Hello, I am using solr-4.10.4 with mlt. I noticed that mlt constructs query which is missing some words. For example, for doc with title: Jennnifer Lopez keywords: Jennifer, concert, Hollywood the parsedquery generated by mlt for this doc is title:lopez keywords:jennifer keywords:concert

Re: /suggest through SolrJ?

2015-04-27 Thread Alessandro Benedetti
Just had the very same problem, and I confirm that it is currently quite a mess to manage suggestions in SolrJ! I have to go with manual JSON parsing. Cheers 2015-02-02 12:17 GMT+00:00 Jan Høydahl jan@cominvent.com: Using the /suggest handler wired to SuggestComponent, the SpellCheckResponse

Re: Start Solr with multiple external zookeepers on Windows Server?

2015-04-27 Thread Timothy Potter
Can you try defining the ZK_HOST in bin\solr.in.cmd instead of passing it on the command-line? On Mon, Apr 27, 2015 at 12:10 PM, Erick Erickson erickerick...@gmail.com wrote: What version of Solr are you using? 4.10.3? 5.1? And can we see the full output of your attempt to start Solr? There

Re: Start Solr with multiple external zookeepers on Windows Server?

2015-04-27 Thread Erick Erickson
What version of Solr are you using? 4.10.3? 5.1? And can we see the full output of your attempt to start Solr? There might be some more informative bits above the help response. Best, Erick On Mon, Apr 27, 2015 at 9:49 AM, Stephan Schubert stephan.schub...@sick.de wrote: Hi everyone, how is

Why are these two queries different?

2015-04-27 Thread Frank li
We ran two Solr queries that were supposed to return the same results but did not: Query 1: all_text:(US 4,568,649 A) parsedquery: (+((all_text:us ((all_text:4 all_text:568 all_text:649 all_text:4568649)~4))~2))/no_coord, Result: numFound: 0, Query 2: all_text:(US 4568649) parsedquery:
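The parsed form of Query 1 shows the term 4,568,649 expanded into four tokens. A rough Python sketch of one way such an expansion could arise, assuming the field's analyzer splits on punctuation and also emits the catenated parts (the message does not show the actual analyzer chain, and `split_and_catenate` is a hypothetical helper):

```python
import re


def split_and_catenate(term):
    """Split a term on non-alphanumeric characters, lowercase the parts,
    and append the catenated form when there was more than one part."""
    parts = [p.lower() for p in re.split(r"[^0-9A-Za-z]+", term) if p]
    tokens = list(parts)
    if len(parts) > 1:
        tokens.append("".join(parts))
    return tokens


print(split_and_catenate("4,568,649"))  # ['4', '568', '649', '4568649']
print(split_and_catenate("US"))         # ['us']
```

In Lucene's printed query syntax the `~4` after the group is a minimum-should-match count, so under this reading Query 1 would need all four tokens to be present in a matching document.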