Backup stops replication

2013-07-06 Thread Cool Techi
Hi, We migrated from Solr 3.6 to Solr 4.3. When we fire a backup command (/replication?command=backup&location=/disk4/backups) on the slave, the slave stops replicating OR starts full replication from the master. This was not the behavior in the earlier version of Solr. I have checked the logs
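
For reference, the backup request in the message above takes two query parameters, `command` and `location`, joined by `&`. A minimal sketch of building that URL; the base URL `http://localhost:8983/solr` and the helper name `backup_url` are assumptions for illustration, not taken from the post:

```python
from urllib.parse import urlencode


def backup_url(base_url: str, location: str) -> str:
    """Build a replication-handler backup request URL (illustrative helper)."""
    # safe="/" keeps the slashes of the backup path readable in the query string.
    query = urlencode({"command": "backup", "location": location}, safe="/")
    return f"{base_url}/replication?{query}"


# Assumed base URL; substitute the slave's real host, port, and core path.
print(backup_url("http://localhost:8983/solr", "/disk4/backups"))
```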

Re: Changing the number of shards?

2013-07-06 Thread Rafał Kuć
Hello! Just to be perfectly clear here - Solr has advantage over ElasticSearch because it can split the live index. However, this sentence from Jack's mail is not true: In other words, even though you can easily “add shards” on ES, those are really just replicas of existing primary shards and

Re: [Announcement] Norch- a search engine for node.js

2013-07-06 Thread Fergus McDowall
(Given that hardware is sufficient) The upper limit of documents in Norch is determined by the capacity of levelDB, the underlying data store. I have heard tell of a slight performance drop off in LevelDB after 200 000 000 million entries. If you say that one Norch document generates roughly

Re: [Announcement] Norch- a search engine for node.js

2013-07-06 Thread Fergus McDowall
Norch is based on the node module search-index which is like a simplified lucene, built with Google's levelDB library (https://github.com/fergiemcdowall/search-index) (so posting here is a bit cheeky- but I figured the solr/lucene readership might be interested :-) F On Jul 6, 2013, at

Re: solrj distributed solr example

2013-07-06 Thread Shalin Shekhar Mangar
If you are asking about how to use Solrj with SolrCloud, see http://wiki.apache.org/solr/Solrj#Using_with_SolrCloud If you are not using SolrCloud then add shards=solr_server1:port/solr,solr_server2:port/solr as a parameter to a query request. Indexing doesn't need anything special -- you just
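
The non-SolrCloud case described above comes down to adding a comma-separated `shards` parameter to an ordinary query. A rough sketch of such a request URL, written in Python rather than SolrJ; port 8983, the `/select` handler path, and the helper name are assumptions:

```python
from urllib.parse import urlencode


def distributed_query_url(base_url, query, shards):
    """Build a query URL that fans out to the listed shards (illustrative helper)."""
    params = {"q": query, "shards": ",".join(shards)}
    # Keep the characters that make the shard list and *:* readable.
    query_string = urlencode(params, safe="/:,*")
    return base_url + "/select?" + query_string


# Assumed hosts and port, following the solr_server1:port/solr pattern in the message.
shards = ["solr_server1:8983/solr", "solr_server2:8983/solr"]
print(distributed_query_url("http://solr_server1:8983/solr", "*:*", shards))
```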

Restrict/change numFound solr result

2013-07-06 Thread aniljayanti
Hi, I am working on solr 3.3. i am getting total 120 records with below query, in response xml numFound is showing 540 records. http://localhost:8080/test/select?q=*:*rows=*120* response lst name=responseHeader int name=status0/int int name=QTime1/int lst name=params str

Re: Aggregate TermFrequency on Result Grouping / Field Collapsing

2013-07-06 Thread Erick Erickson
Well, you've just restated the problem. I'm asking what use-case this is supporting? You've said: he/she wants to know the occurrence of a specific term in the result set of that 'X1' search criteria OK, _why_? Idle curiosity? Ranking the docs? Choosing the most relevant? I don't think you can

Re: Is it possible to find a leader from a list of cores in solr via java code

2013-07-06 Thread Erick Erickson
bq: Reason being If a write request is sent to the replica it relays it to the leader and then the leader relays it to all the replicas. This will help me in saving some network traffic as my application performs continuous writes How are you indexing? SolrJ already is leader aware and sends

Re: Solr cloud date based paritioning

2013-07-06 Thread Erick Erickson
Not saying it's always one way or the other, just that one shouldn't automatically _assume_ putting the most recent data on a single node is automatically good. It may well be, but not in all cases. On Wed, Jul 3, 2013 at 12:21 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Exactly.

Re: omitTermFreqAndPositions=true in easy English, please?

2013-07-06 Thread Erick Erickson
Neither ORing or ANDing the terms together will help you regardless of whether you do it or the query parser you use does it behind the scenes. That's because you have _string_ fields. So Google Cloud Storage is a single token in your index, not three. You probably want to spend some time on the
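
The point about string fields can be pictured with a toy model. This is only a sketch: the two functions below are hypothetical stand-ins, and a real Solr text analyzer does considerably more than lowercase and split on whitespace.

```python
def index_as_string(value):
    """A string field keeps the whole value as one token."""
    return [value]


def index_as_text(value):
    """Crude stand-in for a tokenized text field: lowercase, split on whitespace."""
    return value.lower().split()


value = "Google Cloud Storage"
print(index_as_string(value))  # one token, so a search for "cloud" alone cannot match
print(index_as_text(value))    # three tokens, each searchable on its own
```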

Re: Solr 4.3 Master/Slave Issues

2013-07-06 Thread Erick Erickson
kill -9 is evil, you aren't certain what the state is afterwards, so the presence of the lock file is not surprising. solrconfig.xml has a commented-out entry <unlockOnStartup>false</unlockOnStartup>. I haven't personally used it, but it looks like it might help if you insist on kill -9.

Re: Invalid version (expected 2, but 60) or the data in not in 'javabin' format

2013-07-06 Thread Erick Erickson
The process looks like this: each shard returns the top 100K documents (actually the doc ID and whatever your sort criteria is, often just the score) _from every shard_, and the node that distributes that request then takes those 900K items and merges the list to get the 100K that satisfy the
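
The merge step described above can be sketched at a much smaller scale. This is an illustration of the idea only, not Solr's actual implementation; the shard data and the function name `merge_top_n` are made up:

```python
import heapq
from itertools import islice


def merge_top_n(shard_results, n):
    """Merge per-shard (score, doc_id) lists, each already sorted by descending score."""
    # heapq.merge expects ascending input, so order by the negated score.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, n))


# Three shards each send their top 3, so 9 entries travel to return 3.
shard_a = [(0.9, "a1"), (0.5, "a2"), (0.1, "a3")]
shard_b = [(0.8, "b1"), (0.7, "b2"), (0.2, "b3")]
shard_c = [(0.6, "c1"), (0.4, "c2"), (0.3, "c3")]
top = merge_top_n([shard_a, shard_b, shard_c], 3)
print(top)
```

The cost grows with shards times requested rows, which is why asking for the top 100K from nine shards means moving 900K entries.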

Re: Surprising score?

2013-07-06 Thread Erick Erickson
Not a problem. Index-time boosts are boosts made _when you're indexing_, not when you're querying, so omitting norms should still have your query boosting work. Also, try adding debug=all and examining the results, it'll show you exactly how scores were calculated. It does take a bit to work though

Re: DIH: HTMLStripTransformer in sub-entities?

2013-07-06 Thread Andy Pickler
That's exactly what turned out to be the problem. We thought we had already tried that permutation but apparently hadn't. I know it's obvious in retrospect. Thanks for the suggestion. Thanks, Andy Pickler On Wed, Jul 3, 2013 at 2:38 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: On Tue,

Re: Concurrent Modification Exception

2013-07-06 Thread Dmitry Kan
Ok. If you can create a test setting to repro the bug, I would suggest filing a jira. Sounds like a regression if your config remained otherwise same. On 5 Jul 2013 16:57, adityab aditya_ba...@yahoo.com wrote: well our observation leads us that this happens only during spell check. If we turn

Re: any plans to remove int32 limitation on the number of the documents in the index?

2013-07-06 Thread Erick Erickson
What does this have to do with removing the int32 limit? It's still the same problem if you have a high start parameter, it's the deep paging issue that's part of Solr. I know there has been work on this (you'll have to search the JIRAs), the basic idea is that you pass enough information that

Re: any plans to remove int32 limitation on the number of the documents in the index?

2013-07-06 Thread Erick Erickson
Oh, P.S. Solr is a great search engine, but it's certainly not the perfect answer to all problems. Mayhap you've hit on a case where it isn't the best solution. On Sat, Jul 6, 2013 at 8:22 AM, Erick Erickson erickerick...@gmail.com wrote: What does this have to do with removing the int32

Re: 2.1billion+ document

2013-07-06 Thread Erick Erickson
uniqueKey is used to enforce there being only a single copy of a doc. Say a doc changes and you re-index it. If there is a doc in the index already _with the same uniqueKey_ it'll be deleted and the new one will be the only one visible. Which implies that if you do implement the suggestions, be
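
The overwrite behavior described above can be modeled with a dictionary keyed on the uniqueKey field. This is a toy sketch; `TinyIndex` and the field name `id` are hypothetical and say nothing about how Solr stores documents internally:

```python
class TinyIndex:
    """Toy index: at most one visible document per uniqueKey value."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # Re-adding a document with the same key replaces the earlier copy.
        self.docs[doc[self.unique_key]] = doc


index = TinyIndex()
index.add({"id": "doc-1", "title": "first version"})
index.add({"id": "doc-1", "title": "second version"})
print(len(index.docs), index.docs["doc-1"]["title"])
```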

Re: Restrict/change numFound solr result

2013-07-06 Thread Erick Erickson
Solr only returns the number of documents specified by the rows parameter. You can page through your results by specifying, say, start=20&rows=20 then start=40&rows=20 etc. Or you can bump rows, but you really don't want to return huge result sets so you'll probably be paging sometime. But it is
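
Paging with start and rows can be sketched as a generator of parameter pairs. The helper name `page_params` is an assumption; the numbers reuse the figures from the original question (numFound of 540, rows of 120):

```python
def page_params(num_found, rows):
    """Yield the start/rows pair for each page needed to cover num_found results."""
    for start in range(0, num_found, rows):
        yield {"start": start, "rows": rows}


pages = list(page_params(540, 120))
for page in pages:
    print(page)
```

Each pair would be sent as `start=...&rows=...` on an otherwise identical query; the last page simply comes back with fewer than `rows` documents.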

Re: Changing the number of shards?

2013-07-06 Thread Jack Krupansky
I know this stuff can be confusing, but I think my statement is still true - it uses the phrase NEW primary shard - the promotion of a shard can only be to replace an existing primary shard, not to create a new and separate partitioning of key values. You left out the next sentence from my

Re: Concurrent Modification Exception

2013-07-06 Thread Yonik Seeley
On Fri, Jul 5, 2013 at 10:57 AM, adityab aditya_ba...@yahoo.com wrote: well our observation leads us that this happens only during spell check. If we turn off the spell check we don't see this issue occurring at all from our 24hrs test run. Are you using any custom components / plugins, or is

Solr 4.x union of cross-joins

2013-07-06 Thread mihaela olteanu
Hello, I have 3 indices that form a hierarchy. Basically these were constructed from 3 tables: parent, child1 and child2 and between parent and children there is a one to many relationship. parent (id,name) child1(id,join_key,field1)  child2(id,join_key,field2)  join_key is the foreign key

Re: Solr 4.x union of cross-joins

2013-07-06 Thread Walter Underwood
If you flatten this to make a single table with rows for each combination of parent, child1, and child2, the query will be simple and probably very fast. You've said it makes more sense to have three tables, and that is true for a relational database. For a search engine, like Solr, it makes
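
A rough sketch of the flattening suggested above, using the column names from the original question. It assumes that `join_key` in each child table holds the parent's `id`, which the truncated question does not fully confirm; the function name and sample rows are made up:

```python
from itertools import product


def flatten(parents, child1_rows, child2_rows):
    """Return one flat row per (parent, child1, child2) combination."""
    flat = []
    for parent in parents:
        # Assumption: join_key in each child table refers to the parent's id.
        # "or [None]" keeps parents that have no rows in one of the child tables.
        c1s = [c for c in child1_rows if c["join_key"] == parent["id"]] or [None]
        c2s = [c for c in child2_rows if c["join_key"] == parent["id"]] or [None]
        for c1, c2 in product(c1s, c2s):
            flat.append({
                "parent_id": parent["id"],
                "name": parent["name"],
                "field1": c1["field1"] if c1 else None,
                "field2": c2["field2"] if c2 else None,
            })
    return flat


parents = [{"id": 1, "name": "p1"}]
child1 = [{"id": 10, "join_key": 1, "field1": "a"},
          {"id": 11, "join_key": 1, "field1": "b"}]
child2 = [{"id": 20, "join_key": 1, "field2": "x"}]
rows = flatten(parents, child1, child2)
print(rows)
```

Each flat row could then be indexed as one document, at the cost of repeating parent data across rows.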

Re: Solr 4.x union of cross-joins

2013-07-06 Thread Yonik Seeley
On Sat, Jul 6, 2013 at 2:22 PM, mihaela olteanu mihaela...@yahoo.com wrote: Hello, I have 3 indices that form a hierarchy. Basically these were constructed from 3 tables: parent, child1 and child2 and between parent and children there is a one to many relationship. parent (id,name)

Re: Using the Schema API from SolrJ

2013-07-06 Thread Steven Glass
Does anyone have any idea how I can access the schema version info using SolrJ? Thanks. On Jul 3, 2013, at 4:16 PM, Steven Glass wrote: I'm using a Solr 4.3 server and accessing it from both a Java based desktop application using SolrJ and an Android based mobile application using my

Re: Using the Schema API from SolrJ

2013-07-06 Thread Jason Hellman
Steven, Some information can be gleaned from the system admin request handler: http://localhost:8983/solr/admin/system I am specifically looking at this: <lst name="core"><str name="schema">example</str> Mind you, that is a manually-set value in the schema file. But just in case you want to get crazy

Re: Using the Schema API from SolrJ

2013-07-06 Thread Steven Glass
Thanks for your response. But it seems like there should be a way to issue the equivalent of http://localhost:8983/solr/schema/version which returns {"responseHeader":{"status":0,"QTime":4},"version":1.5} from the server. I know how to do it using HTTPGet in
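
Reading the version out of that response is a one-line JSON lookup. A sketch in Python rather than SolrJ, using the response body quoted in the message with its JSON quoting restored; the helper name `schema_version` is an assumption:

```python
import json

# Response body as quoted in the message, with JSON quoting restored.
SAMPLE_BODY = '{"responseHeader": {"status": 0, "QTime": 4}, "version": 1.5}'


def schema_version(body: str) -> float:
    """Extract the schema version from a /schema/version JSON response body."""
    return json.loads(body)["version"]


print(schema_version(SAMPLE_BODY))
```

In a live setup the body would come from an HTTP GET against the `/schema/version` URL shown in the message.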

Re: Using the Schema API from SolrJ

2013-07-06 Thread Erick Erickson
You _should_ be able to use an HttpSolrServer, set the base URL and then go ahead and make the request. Haven't done it myself, but the SolrJ support for various Solr features often consists of just convenience methods over the underlying HTTP, saving you the necessity of, say, parsing the

Re: Concurrent Modification Exception

2013-07-06 Thread adityab
Nope, no custom plugin. We just use the DirectSpellCheck component. We have raised a ticket with LucidWorks; I will follow up on that and, once we have a JIRA, will update this post.

Re: [Announcement] Norch- a search engine for node.js

2013-07-06 Thread William Bell
Can it do Geo Spatial searching? (i.e. Find documents within 10 miles of a lat,long?) On Fri, Jul 5, 2013 at 12:53 PM, Fergus McDowall fergusmcdow...@gmail.com wrote: Here is some news that might be of interest to users and implementers of Solr

Lucene pass through for faceting in SOLR

2013-07-06 Thread William Bell
I submitted a JIRA ticket a while ago, since I thought that having a way to use the Lucene facets in SOLR could speed up our faceting. However, no one seems to have picked up the development. https://issues.apache.org/jira/browse/SOLR-4774 What is involved with hooking it into SOLR ? Similar to

Re: Find related words

2013-07-06 Thread William Bell
Why is LUCENE-474 not committed? On Thu, Jul 4, 2013 at 4:21 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Hi Dotan, (13/07/04 23:51), Dotan Cohen wrote: Thank you Jack and Koji. I will take a look at MLT and also at the .zip files from LUCENE-474. Koji, did you have to modify the code

RE: Solr 4.3 Master/Slave Issues

2013-07-06 Thread Cool Techi
Thanks Erick. My Tomcat setup is running on Ubuntu and has nothing else deployed other than the Solr war. My suspicion is it takes a long time to de-allocate the memory it has reserved for itself, but I will get a dump to find out more. regards, Ayush Date: Sat, 6 Jul 2013 07:55:57 -0400 Subject:

Re: Is it possible to find a leader from a list of cores in solr via java code

2013-07-06 Thread vicky desai
Hi Erick, I just wanted to clarify whether you got my concern right. If I send some documents to the replica core, won't it first have to send the documents to the leader core, which in turn would be sending them back to the replica cores? If yes, then this will lead to additional network traffic which can be

Re: Is it possible to find a leader from a list of cores in solr via java code

2013-07-06 Thread Jack Krupansky
There are three concepts to grasp: 1. You can send Solr update requests to ANY node of the cluster. Period. 2. Any extra network traffic (within the cluster) is likely to be negligible and absolutely not worth worrying about unless you have definitive evidence to the contrary. 3. Leader nodes in