DIH with SolrCloud

2013-10-07 Thread Prasi S
Hi, I have set up SolrCloud with Solr 4.4. The cloud has 2 Tomcat instances with a separate ZooKeeper. I execute the below command in the URL: http://localhost:8180/solr/colindexer/dataimportmssql?command=full-import&commit=true&clean=false 0 0 data-config-mssql.xml status idle 1 0 0 201

SolrCloud shard splitting keeps failing

2013-10-07 Thread Kalle Aaltonen
I have a test system where I have an index of 15M documents in one shard that I would like to split in two. I've tried it four times now. I have a stand-alone zookeeper running on the same machine. The end result is that I have two new shards with state "construction", and each has one replica whic

Re: How to round solr score ?

2013-10-07 Thread Mamta Thakur
Thanks for your replies. I am actually doing the frange approach for now. The only downside I see there is it makes the function call twice, calling createWeight() twice. And so my social connections are evaluated twice which is quite heavy operation. So I was thinking if I could get away with o

Re: Fix sort order within an index ?

2013-10-07 Thread Upayavira
On Mon, Oct 7, 2013, at 11:09 PM, user 01 wrote: > Any way to store documents in a fixed sort order within the indexes of > certain fields(either the arrival order or sorted by int ids, that also > serve as my unique key), so that I could store them optimized for > browsing > lists of items ? >

Re: Doing time sensitive search in solr

2013-10-07 Thread Erick Erickson
I'd index them as separate documents. Best, Erick On Mon, Oct 7, 2013 at 2:59 PM, Darniz wrote: > Thanks Eric > > Ok if we go by that proposal of copying all date fields into one bag_of_dates > field > > Hence now we have a field and it will look something like this. > > 2013-09-01T00:00:0

Re: solr cpu usage

2013-10-07 Thread Erick Erickson
Tim: Thanks! Mostly I wrote it to have something official looking to hide behind when I didn't have a good answer to the hardware sizing question :). On Mon, Oct 7, 2013 at 2:48 PM, Tim Vaillancourt wrote: > Fantastic article! > > Tim > > > On 5 October 2013 18:14, Erick Erickson wrote: > >> Fr

Re: Soft commit and flush

2013-10-07 Thread Erick Erickson
bq: If so, using soft commit without calling hard commit could cause OOM no. Aside from anything you have configured for auto(hard) commit, the ramBufferSizeMB in solrconfig.xml will flush the in-memory structures out to the segments when the size reaches this limit. It won't _close_ the current

Re: Search for non empty fields in a index with denormalized tables

2013-10-07 Thread Erick Erickson
I don't think your model fits well into Solr. What I'd do is make my uniqueKey the patient ID, and put the image names (or links or whatever) in a multiValued field. Then you can do what you want by a simple q=*:* -image_name:[* TO *] Best, Erick On Mon, Oct 7, 2013 at 9:20 AM, SandroZbinden wrote: > Ok
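
A minimal sketch of the model suggested above, assuming one document per patient with a multiValued image_name field. The sample documents and the helper name lacks_field are hypothetical; only the query string comes from the message. The list comprehension mimics locally what the negative range query selects, and the last lines show how the query could be URL-encoded for a request.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical sample data: one document per patient, images in a
# multi-valued field instead of one document per patient/image row.
patients = [
    {"id": "1", "lastname": "Miller", "image_name": ["xray.png", "mri.png"]},
    {"id": "2", "lastname": "Smith"},  # no images at all
]


def lacks_field(doc, field):
    """True when the document has no value for the field."""
    return not doc.get(field)


# Local equivalent of: q=*:* -image_name:[* TO *]
no_images = [doc["id"] for doc in patients if lacks_field(doc, "image_name")]

# URL-encode the query from the message for use in a request.
query_string = urlencode({"q": "*:* -image_name:[* TO *]"})
decoded = parse_qs(query_string)["q"][0]
```

Swapping the leading minus for nothing (q=image_name:[* TO *]) would select the opposite set, patients that have at least one image.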

Re: no such field error:smaller big block size details while indexing doc files

2013-10-07 Thread Erick Erickson
Well, one of the attributes parsed out of, probably the meta-information associated with one of your structured docs is SMALLER_BIG_BLOCK_SIZE_DETAILS and Solr Cell is faithfully sending that to your index. If you want to throw all these in the bit bucket, try defining a true catch-all field that ig

Re: Regarding edismax parsing

2013-10-07 Thread Erick Erickson
You're probably having a problem with the distinction between query parsing and analysis which has been discussed many times. The issue is that the query parser breaks things up into individual tokens and _then_ sends them to the analyzer chain as individual tokens (usually). Try escaping your spa

Issue with distributed spelling check in Solr 4.4

2013-10-07 Thread shamik
Hi, We are in the process of transitioning to SolrCloud (4.4) from Master-Slave architecture (4.2) . One of the issues I'm facing now is with making spell check work. It only seems to work if I explicitly set distrib=false. I'm using a custom request handler and included the spell check opt

Fix sort order within an index ?

2013-10-07 Thread user 01
Any way to store documents in a fixed sort order within the indexes of certain fields(either the arrival order or sorted by int ids, that also serve as my unique key), so that I could store them optimized for browsing lists of items ? The order for browsing is always fixed & there are no further f

Solr 4.5 - CoreAPI issue with CREATE

2013-10-07 Thread yriveiro
Hi, I'm doing replicas for my shards manually and the solr.xml config doesn't save the changes (solr.xml attribute "persist" = true). The command used is: curl 'http://192.168.2.18:8983/solr/admin/cores?action=CREATE&name=test_shard1_replica2&collection=test&shard=shard1' Someone else with the

Re: Adding OR operator in querystring and grouping fields?

2013-10-07 Thread PeterKerk
@Jason: your example worked perfectly! -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942p4093999.html Sent from the Solr - User mailing list archive at Nabble.com.

How to achieve distributed spelling check in SolrCloud ?

2013-10-07 Thread Shamik Bandopadhyay
Hi, We are in the process of transitioning to SolrCloud (4.4) from Master-Slave architecture (4.2) . One of the issues I'm facing now is with making spell check work. It only seems to work if I explicitly set distrib=false. I'm using a custom request handler and included the spell check option.

Split shard doesn't persist data correctly on solr.xml

2013-10-07 Thread yriveiro
I notice that when a SPLITSHARD operation finishes, the solr.xml is not updated properly. # Parent solr.xml: # Children solr.xml: # Parent Clusterstate: "shard1":{ "range":"8000-", "state":"inactive", "replicas":{"192.168.2.18:8983_solr_test_shard1_r

Re: Adding OR operator in querystring and grouping fields?

2013-10-07 Thread Jason Hellman
fq=here:there OR this:that For the lurker: an AND should be: fq=here:there&fq=this:that While you can, technically, pass: fq=here:there AND this:that Solr will cache the separate fq= parameters and reuse them in any context. The AND(ed) filter will be cached as a single entr
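
The distinction above can be sketched as follows. This is only an illustration of how the request parameters differ; the helper name filter_params is hypothetical, and the field names are the placeholders from the message.

```python
from urllib.parse import urlencode


def filter_params(clauses, operator):
    """Return (name, value) pairs for Solr fq parameters.

    OR: all clauses must share a single fq value.
    AND: each clause gets its own fq parameter, so each one can be
    cached and reused independently.
    """
    if operator == "OR":
        return [("fq", " OR ".join(clauses))]
    if operator == "AND":
        return [("fq", clause) for clause in clauses]
    raise ValueError("operator must be 'OR' or 'AND'")


clauses = ["here:there", "this:that"]
or_query = urlencode(filter_params(clauses, "OR"))
and_query = urlencode(filter_params(clauses, "AND"))

print(or_query)   # fq=here%3Athere+OR+this%3Athat
print(and_query)  # fq=here%3Athere&fq=this%3Athat
```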

Re: Adding OR operator in querystring and grouping fields?

2013-10-07 Thread Jack Krupansky
Combine the two filter queries with an explicit OR operator. -- Jack Krupansky -Original Message- From: PeterKerk Sent: Monday, October 07, 2013 1:50 PM To: solr-user@lucene.apache.org Subject: Re: Adding OR operator in querystring and grouping fields? Ok thanks. "you must combine them

SolrJ best practices

2013-10-07 Thread Mark
Are there any links describing best practices for interacting with SolrJ? I've checked the wiki and it seems woefully incomplete: (http://wiki.apache.org/solr/Solrj) Some specific questions: - When working with HttpSolrServer should we keep around instances for ever or should we create a single

Gracefully stopping jetty server - LockObtainFailedException

2013-10-07 Thread Ashwin Tandel
Hi, I have solr cloud(4.1) setup with embedded jetty server. I use the below command to start and stop the server. start server : nohup java -DSTOP.PORT=8085 -DSTOP.KEY= -DnumShards=2 -Dbootstrap_confdir=./solr/nlp/conf -Dcollection.configName=myconf -DzkHost=10.88.139.206:2181,10.88.139.206:2

Re: Doing time sensitive search in solr

2013-10-07 Thread Darniz
Thanks Eric Ok if we go by that proposal of copying all date fields into one bag_of_dates field Hence now we have a field and it will look something like this. 2013-09-01T00:00:00Z 2013-12-01T00:00:00Z Sept content : Honda is releasing the car this month Dec content : T

Re: Delete a field - Atomic updates (SOLR 4.1.0) without using null="true"

2013-10-07 Thread Jason Hellman
I don't know if there's a way to accomplish your goal directly, but as a pure workaround, you can write a routine to fetch all the stored values and resubmit the document without the field in question. This is what atomic updates do, minus the overhead of the transmission. On Oct 7, 2013, at 1
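
A rough sketch of the workaround described above, assuming every field of the document is stored so the fetched copy is complete. The function name without_field and the sample document are hypothetical. Dropping _version_ before resubmitting is also an assumption, made here so the re-add behaves as a plain overwrite.

```python
def without_field(stored_doc, field, internal_fields=("_version_",)):
    """Copy a fetched document, leaving out one field.

    internal_fields lists bookkeeping fields that are not sent back
    (an assumption of this sketch, not something the thread states).
    """
    return {
        name: value
        for name, value in stored_doc.items()
        if name != field and name not in internal_fields
    }


# Hypothetical document as it might come back from a query.
fetched = {
    "id": "doc-1",
    "title": "Example",
    "obsolete_note": "remove me",
    "_version_": 1448290463810879488,
}

# This cleaned copy is what would be re-added to the index.
cleaned = without_field(fetched, "obsolete_note")
```

Any field that is indexed but not stored would be lost by this round trip, which is the same limitation atomic updates have.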

Re: solr cpu usage

2013-10-07 Thread Tim Vaillancourt
Fantastic article! Tim On 5 October 2013 18:14, Erick Erickson wrote: > From my perspective, your question is almost impossible to > answer, there are too many variables. See: > > http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ > > Best

Re: {soft}Commit and cache flushing

2013-10-07 Thread Tim Vaillancourt
Is there a way to make autoCommit only commit if there are pending changes, ie: if there are 0 adds pending commit, don't autoCommit (open-a-searcher and wipe the caches)? Cheers, Tim On 2 October 2013 00:52, Dmitry Kan wrote: > right. We've got the autoHard commit configured only atm. The so

Delete a field - Atomic updates (SOLR 4.1.0) without using null="true"

2013-10-07 Thread SolrLover
I am using SOLR 4.1.0 and perform atomic updates on SOLR documents. Unfortunately there is a bug in 4.1.0 (https://issues.apache.org/jira/browse/SOLR-4297) that blocks me from using null="true" for deleting a field through atomic update functionality. Is there any other way to delete a field other

Re: Adding OR operator in querystring and grouping fields?

2013-10-07 Thread PeterKerk
Ok thanks. "you must combine them into one filter query parameter. ", how would I do that? Can I simply change the URL structure or must I change my schema.xml and/or data-config.xml?

Re: Adding OR operator in querystring and grouping fields?

2013-10-07 Thread Jack Krupansky
The default query operator applies only within a single query parameter. If you want to OR two filter queries, you must combine them into one filter query parameter. -- Jack Krupansky -Original Message- From: PeterKerk Sent: Monday, October 07, 2013 1:08 PM To: solr-user@lucene.apach

Adding OR operator in querystring and grouping fields?

2013-10-07 Thread PeterKerk
This query returns the correct results: http://localhost:8983/solr/tt/select/?indent=on&fq={!geofilt}&pt=41.7882,-71.9498&sfield=geolocation&d=2000&q=*:*&start=0&rows=12&fl=id,title&facet.mincount=1&fl=_dist_:geodist() However, I want to add OR select on a field city as well: &fq=city:(brooklyn)

Web App Engineer at Harvard-Smithsonian Astrophysical Observatory, full time, indefinite contract

2013-10-07 Thread Roman Chyla
Dear all, We are looking for a new member to join our team. This position requires solid knowledge of Python, plus experience with web development, HTML5, XSLT, JSON, CSS3, relational databases and NoSQL but search (and SOLR) is the central point of everything we do here. So, if you love SOLR/Luce

Re: Among LatLonType & SpatialRecursivePrefixTreeFieldType which one for filtering outside of bounding box?

2013-10-07 Thread Smiley, David W.
Use the location_rpt field type in the example schema.xml -- it has "good performance & less memory" (what you asked for) compared to LatLonType. To learn how to tweak some of the settings to get better performance at the expense of some accuracy, see http://wiki.apache.org/solr/SolrAdaptersForLuce

Re: feedback on Solr 4.x LotsOfCores feature

2013-10-07 Thread Shalin Shekhar Mangar
I think we'd all love to see those improvements land in Solr. I was involved in the work at AOL WebMail where the LotsOfCores idea originated. We had many of the problems that you've had to solve yourself. I remember that we switched to compound file format to reduce file descriptors. Also we had

Re: feedback on Solr 4.x LotsOfCores feature

2013-10-07 Thread Yago Riveiro
I assume that the LotsOfCores feature doesn't use ZooKeeper. I tried to simulate the cores as collections, but when the size of clusterstate.json grows beyond 1M, -Djute.maxbuffer is needed to raise the 1 MB limit. A naive question: why isn't clusterstate.json split by collection?

Re: Soft commit and flush

2013-10-07 Thread Guido Medina
Out of Memory Exception is well known as OOM. Guido. On 07/10/13 14:11, adfel70 wrote: Sorry, by "OOE" I meant Out of memory exception...

Re: Search for non empty fields in a index with denormalized tables

2013-10-07 Thread SandroZbinden
Okay I try to specify my question a little bit. I have a denormalized index of two sql tables patient and table. If I add a patient with two images to the solr index my index contains 3 documents. --- Pat_ID |Patient_Lastnname | Image_ID | Image_Name -

Re: Soft commit and flush

2013-10-07 Thread adfel70
Sorry, by "OOE" I meant Out of memory exception...

no such field error:smaller big block size details while indexing doc files

2013-10-07 Thread sweety
I'm trying to index .doc, .docx and pdf files. I'm using this command: curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@complex.doc" This is the error I get: Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log SEVERE: null:java.lang.RuntimeException

Re: feedback on Solr 4.x LotsOfCores feature

2013-10-07 Thread Erick Erickson
Thanks for the great writeup! It's always interesting to see how a feature plays out "in the real world". A couple of questions though: bq: We added 2 Cores options : Do you mean you patched Solr? If so are you willing to share the code back? If both are "yes", please open a JIRA, attach the patch

Regarding edismax parsing

2013-10-07 Thread Prashant Golash
Hi, I have a question regarding to parsing of tokens in edismax parser and subsequently a follow up question related to same. - Each field has list of analyzers and tokenizers as configured in schema.xml (Index and query time). Now, say I search for query - red shoes. So, is it like that

Re: Difference Between Query Time and Elapsed Time at Solrj Query Response

2013-10-07 Thread Erick Erickson
Query time is the time spent in Solr getting the search results. It does NOT include reading the bits off disk to assemble the response etc. elapsed time is the time from when the query was sent to the time it gets back. It includes qtime, reading the bits off disk to assemble the response, trans
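
The difference can be illustrated with a small sketch. It assumes a JSON response whose responseHeader carries QTime in milliseconds; fake_solr is a hypothetical stand-in for a real HTTP call so the example runs without a server, and timed_query is likewise an illustrative name, not a SolrJ method.

```python
import json
import time


def timed_query(send_request):
    """Return (qtime_ms, elapsed_ms) for one request.

    qtime_ms is what the server reports for the search itself;
    elapsed_ms is measured on the client around the whole round trip.
    """
    start = time.perf_counter()
    body = send_request()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    qtime_ms = json.loads(body)["responseHeader"]["QTime"]
    return qtime_ms, elapsed_ms


def fake_solr():
    # Stand-in for the network and response-assembly overhead.
    time.sleep(0.02)
    return (
        '{"responseHeader": {"status": 0, "QTime": 5},'
        ' "response": {"numFound": 0, "docs": []}}'
    )


qtime_ms, elapsed_ms = timed_query(fake_solr)
```

A large gap between the two numbers points at response assembly, transfer or client-side parsing, not at the search itself.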

Re: Soft commit and flush

2013-10-07 Thread Erick Erickson
bq: Does the NRTCachingDirectoryFactory relevant for both types of commit, or just for hard commit Don't know the code deeply, but NRT==Near Real Time == Soft commit I'd guess. bq: If soft commit does not flush... soft commit flushes the transaction log. On restart if the content of the tlog isn

Re: How to warm up filter queries for a category field with 1000 possible values ?

2013-10-07 Thread Erick Erickson
That's what the "autowarm" number for filterCache is about. It re-executes the last N fq clauses and caches them. Similarly for some of the other autowarm. But don't go wild here. Measure _then_ fix. Usually autowarming just a few (< 32) is sufficient. And remember that autowarming is done whenev
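
As a toy model of the idea above (not Solr's implementation), autowarming can be pictured as re-running the most recently used filters against the new index and seeding a fresh cache with the results. All names here (autowarm, run_filter, the sample filters) are hypothetical.

```python
from collections import OrderedDict


def autowarm(old_cache, run_filter, autowarm_count):
    """Seed a new cache from the last autowarm_count entries of the old one.

    old_cache is ordered oldest to newest; run_filter re-executes a
    filter against the new view of the index.
    """
    if autowarm_count <= 0:
        return OrderedDict()
    recent = list(old_cache.keys())[-autowarm_count:]
    return OrderedDict((fq, run_filter(fq)) for fq in recent)


# Old cache: filter string -> matching document ids (oldest first).
old_cache = OrderedDict(
    [("cat:books", {1, 2}), ("cat:music", {3}), ("cat:film", {4, 5})]
)

# Results the same filters produce after new documents were committed.
new_index = {"cat:books": {1, 2, 6}, "cat:music": {3}, "cat:film": {4, 5, 7}}

warmed = autowarm(old_cache, new_index.__getitem__, autowarm_count=2)
```

Every warmed entry costs one filter execution at commit time, which is why a small count is usually enough.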

Re: Doing time sensitive search in solr

2013-10-07 Thread Erick Erickson
Wait, are you saying you have fields like 2013-12-01T00:00:00Z_entryDate? So you have some wildcard definition in your schema like *_entryDate type="tdate"? If so, I think your model is just wrong and you should have some field(s) that you store dates in. That aside, and assuming you have wildcard

Re: Improving indexing performance

2013-10-07 Thread Erick Erickson
Just skimmed, but the usual reason you can't max out the server is that the client can't go fast enough. Very quick experiment: comment out the server.add line in your client and run it again, does that speed up the client substantially? If not, then the time is being spent on the client. Or split

How to share Schema between multicore on Solr 4.4

2013-10-07 Thread Dharmendra Jaiswal
I am using Solr 4.4 version with SolrCloud on Windows machine. Somehow i am not able to share schema between multiple core. My solr.xml file look like:- ${shareSchema:true} ${hostContext:SolrEngine} ${tomcat.port:8080} ${zkClientTimeout:15000} I have used core.properties file for each core. On

Re: Does the queryResultCache, contain all the results returned by main query or after filtering out

2013-10-07 Thread Erick Erickson
No, the queryResultCache contains the top N for the query, _including_ the filters. The idea is that you should be able to get the next page of results without going to any searching code. You couldn't do this in the scenario you describe. If your filters are truly unique, you'll gain a little

feedback on Solr 4.x LotsOfCores feature

2013-10-07 Thread Soyez Olivier
Hello, In my company, we use Solr in production to offer full text search on mailboxes. We host dozens of millions of mailboxes, but only webmail users (a few million) have this feature. We have the following use case: - non-static indexes with more updates (indexing and deleting) than select requests

Re: [SolrJ] HttpSolrServer - maxRetries

2013-10-07 Thread Bram Van Dam
On 10/07/2013 12:55 PM, Furkan KAMACI wrote: One more thing, could you say that which version of Solr you are using? The stacktrace comes from 4.2.1, but I suspect that this could occur on 4.4 as well. I've not been able to reproduce this consistently: it has happened twice (!) after indexing

Re: [SolrJ] HttpSolrServer - maxRetries

2013-10-07 Thread Furkan KAMACI
One more thing, could you say that which version of Solr you are using? 2013/10/7 Bram Van Dam > On 10/07/2013 11:51 AM, Furkan KAMACI wrote: > >> Could you send you error logs? >> > > Whoops, forgot to paste: > > > Caused by: org.apache.solr.client.solrj.SolrServerException: > IOException oc

Re: [SolrJ] HttpSolrServer - maxRetries

2013-10-07 Thread Bram Van Dam
On 10/07/2013 11:51 AM, Furkan KAMACI wrote: Could you send you error logs? Whoops, forgot to paste: Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8080/solr/fooIndex at org.apache.solr.client.solrj.impl.H

Re: [SolrJ] HttpSolrServer - maxRetries

2013-10-07 Thread Furkan KAMACI
Hi Bram; Could you send your error logs? 2013/10/7 Bram Van Dam > Hi folks, > > Long story short: I'm occasionally getting exceptions under heavy load > (SocketException: Connection reset). I would expect HttpSolrServer to try > again maxRetries-times, but it doesn't. > > For reasons I don't en

[SolrJ] HttpSolrServer - maxRetries

2013-10-07 Thread Bram Van Dam
Hi folks, Long story short: I'm occasionally getting exceptions under heavy load (SocketException: Connection reset). I would expect HttpSolrServer to try again maxRetries-times, but it doesn't. For reasons I don't entirely understand, the call to httpClient.execute(method) is not inside the
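
The behaviour the poster expected can be sketched generically. This is not HttpSolrServer's code; it only shows that the call itself has to sit inside the retry loop for a retry limit to have any effect. The names call_with_retries and flaky_request are hypothetical.

```python
def call_with_retries(operation, max_retries, retry_on=(ConnectionResetError,)):
    """Run operation, retrying up to max_retries times on retry_on errors."""
    attempt = 0
    while True:
        try:
            # The call is inside the loop, so a failure can be retried.
            return operation()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted, surface the last error
            attempt += 1


calls = {"count": 0}


def flaky_request():
    """Fail twice with a connection reset, then succeed."""
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConnectionResetError("Connection reset")
    return "ok"


result = call_with_retries(flaky_request, max_retries=3)
```

Retrying like this is only safe for requests that can be repeated without side effects, which is an assumption of the sketch.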

Re: Shard split issue

2013-10-07 Thread Yago Riveiro
If the replica has 20G, the recovery will most probably take more than 120 seconds. In my case I have SSDs and 120 is not enough. -- Yago Riveiro On Monday, October 7, 2013 at 9:19 AM, Shalin Shekhar Mangar wrote: > I think what is h

Difference Between Query Time and Elapsed Time at Solrj Query Response

2013-10-07 Thread Furkan KAMACI
The QueryResponse object in SolrJ has two different methods for the time required by a given query. One of them is for *QTime (queryTime)* and the other one is for *elapsedTime*. What are the differences between them, and what exactly is elapsedTime for?

Re: Shard split issue

2013-10-07 Thread Shalin Shekhar Mangar
I think what is happening here is that the sub shard replicas are taking time to recover. We use a core admin command to wait for the replicas to become active before the shard states are switched. The timeout value for that command is just 120 seconds. We should wait for more than that. I'll open