SolrCloud and Join Queries

2013-01-04 Thread Hassan
Hi, I am considering SolrCloud for our applications but I have run into the limitation of not being able to use Join Queries in distributed searches. Our requirements are the following: - SolrCloud will serve many applications where each application index is separate from other application.

RE: MoreLikeThis supporting multiple document IDs as input?

2013-01-04 Thread David Parks
Aha! mlt=true, that was the key I hadn't worked out before (thought it was qt=mlt that achieved that), things are looking rosy now, and these results are a perfect fit for my needs. Thanks very much for your time to help explain this!! David -Original Message- From: Jack Krupansky

Re: SolrCloud and Join Queries

2013-01-04 Thread Per Steffensen
On 1/4/13 9:21 AM, Hassan wrote: Hi, I am considering SolrCloud for our applications but I have run into the limitation of not being able to use Join Queries in distributed searches. Our requirements are the following: - SolrCloud will serve many applications where each application index is

Solr 3.6.2 or 4.0

2013-01-04 Thread vijeshnair
We are starting a new e-com application from this month onwards, for which I am trying to identify the right SOLR release. We were using 3.4 in our previous project, but I have read in multiple blogs and forums about the improvements that SOLR 4 has in terms of efficient memory management, less

Re: Solr 3.6.2 or 4.0

2013-01-04 Thread Dikchant Sahi
As someone in the forum correctly said, if all Solr releases were evolutionary, Solr 4.0 is revolutionary. It has lots of improvements over the previous releases, like NoSQL features, atomic updates, cloud features and a lot more. Solr 4.0 would be the right migration, I believe. Can someone in the

Re: What can we do if one shard's index crash

2013-01-04 Thread Erick Erickson
First, I'm assuming SolrCloud with Zookeeper etc. 1) Don't do anything. If Node A is the leader, the replica for that shard will become the leader. 2) This is a little unclear. There are two cases: a) the leader crashed or b) the replica crashed. a) no problem, distributed

Re: Solr 3.6.2 or 4.0

2013-01-04 Thread Upayavira
3.6.2 is a maintenance release with bug fixes for existing 3.x users for whom an upgrade to 4.0 is too big a leap at present. 4.0 is the release that will see active development from here on in. If you are starting with a new project, 4.0 seems a reasonable place to start. I'd expect 4.1 to be

Re: distributed / federated search Solr

2013-01-04 Thread Upayavira
Solr does not support federated search in the form you describe - that is, to make a query to Solr which solr defers to another search system. There may be ways you could achieve it (Solr is pretty extensible) and such a feature would be a very useful one, but it would take some, likely

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
This is a good explanation and makes sense. The one inconsistency is referring to a replica of a shard that has no replication. But it's not that big of a problem. If you wove the term 'core' into your writeup below it would be complete and should be posted on the wiki.

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Jack Krupansky
I thought about adding Solr core, but it only muddies the water. Yes, it needs to be added, but carefully. In the context of SolrCloud, a Solr core is the underlying representation of a replica. Alternatively, a replica of a shard of a collection is implemented as a Solr core. [Need to factor

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
Yes. That's it. It's clear if we separate logical terms from physical terms. A simple cake diagram on the wiki along with perhaps a UML diagram will solidify these concepts.

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Yonik Seeley
On Fri, Jan 4, 2013 at 2:26 AM, Per Steffensen st...@designware.dk wrote: Our biggest problem is that we really haven't decided once and for all and made sure to reflect the decision consistently across code and documentation. As long as we haven't, I believe it is still OK to change our minds.

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
Agreed. But for completeness can it be node/collection/shard/replica/core?

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
Actually. Node/collection/shard/replica/core/index

Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Jay Parashar
Hi, I am trying to migrate to Solr 4 (from 3.6) for a multithreaded/multicollection environment using the Solrj java client. I need some clarification of when to use the Cloud Solr Server vs LBHttpSolrServer. Any help is appreciated. Which one do I use? The CloudSolrServer uses the LB server

Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Mark Miller
CloudSolrServer can be used for indexing and is smart about indexing since it knows the current cluster state. For 4.0 I'd use one per collection because there is a bug around this fixed in the upcoming 4.1 (using one for more than one collection). In fact, if you are moving to 4, it's a good

Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Luis Cappa Banda
Any estimated release date, Mark? I heard something about January. I was considering using 4.0 for production, but if the 4.1 release is incoming I could wait a little more. 2013/1/4 Mark Miller markrmil...@gmail.com CloudSolrServer can be used for indexing and is smart about indexing since it

Re: distributed / federated search Solr

2013-01-04 Thread Oleg Ruchovets
OK, thank you for the answer. Maybe you can point me to documentation or any other source where I can get an idea of how to develop such an extension. Thanks Oleg. On Fri, Jan 4, 2013 at 2:47 PM, Upayavira u...@odoko.co.uk wrote: Solr does not support federated search in the form you describe

Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Shawn Heisey
On 1/4/2013 8:54 AM, Luis Cappa Banda wrote: Any estimated release date, Mark? I heard something about January. I was considering using 4.0 for production, but if the 4.1 release is incoming I could wait a little more. I'm not a committer, but I contribute the occasional patch and keep an eye on

Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-04 Thread Bill Au
Thanks for pointing me to Solr's Zookeeper servlet. I will look at the source to see how I can use it to fulfill my needs. Bill On Thu, Jan 3, 2013 at 6:43 PM, Mark Miller markrmil...@gmail.com wrote: Technically, you want to make sure zookeeper reports the node as live and active. You could

RE: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Jay Parashar
Thanks Mark. -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Friday, January 04, 2013 9:51 AM To: solr-user@lucene.apache.org Subject: Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question) CloudSolrServer can be used for indexing and is smart about

Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Mark Miller
I'm going to push *hard* for a Jan release. Woe to those that get in my way :) - Mark On Jan 4, 2013, at 11:37 AM, Shawn Heisey s...@elyograg.org wrote: On 1/4/2013 8:54 AM, Luis Cappa Banda wrote: Any estimated release date, Mark? I heard something about January. I was considering using 4.0

Re: indexing cpu utilization

2013-01-04 Thread Uwe Reh
Hi Mark, SOLR-3929 rocks! A nightly build of 4.1 with maxIndexingThreads configured to 24 takes 80% to 100% of the cpu resources :-) Thank you, Otis and Gora mpstat 10 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 00 0 13 607 241 234 78 100

RE: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Markus Jelsma
Well, I hope this won't spoil everything then: https://issues.apache.org/jira/browse/SOLR-4260 I'll continue tests Monday -Original message- From:Mark Miller markrmil...@gmail.com Sent: Fri 04-Jan-2013 17:54 To: solr-user@lucene.apache.org Subject: Re: Solr 4 (CloudSolrServer and

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Alexandre Rafalovitch
Can I just start by saying that this was AMAZING. :-) When I asked the question, I certainly did not expect this level of detail. And I vote for the cake diagram on the wiki as well. Perhaps two, with the first one showing the trivial collapsed state of single collection/shard/replica/core. The

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Jack Krupansky
The entire collection does have an index - a distributed index - which consists of a Lucene index on each core/replica for the subset of the data in that shard. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Friday, January 04, 2013 1:12 PM To:

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
My understanding is core is a logical solr term. Index is a physical lucene term. A solr core is backed by a physical lucene index. One index per core. Solr team can correct me if it's not accurate. :)

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Alexandre Rafalovitch
Hmm. Doesn't that make (logical) index=collection? And (physical) index=core? Which creates duplication of terminology and at the same time can cause confusion between highest logical and lowest physical level. Regards, Alex. P.s. Hoping not to start a new terminology war. Personal blog:

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Yonik Seeley
On Fri, Jan 4, 2013 at 1:35 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Hmm. Doesn't that make (logical) index=collection? And (physical) index=core? Which creates duplication of terminology and at the same time can cause confusion between highest logical and lowest physical level.

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
I agree. In my opinion index is a low level lucene thing. I never say a collection has an index directly. That confuses levels and creates confusion. To me at least. I think the terminology discussed is good. Just some lingering usage inconsistencies.

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Upayavira
Using your terminology, I'd say core is a physical solr term, and index is a physical lucene term. A collection or a shard is a logical solr term. Upayavira On Fri, Jan 4, 2013, at 06:28 PM, darren wrote: My understanding is core is a logical solr term. Index is a physical lucene term. A solr

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
Good point. Agree. Sent from my Verizon Wireless 4G LTE Smartphone Original message From: Upayavira u...@odoko.co.uk Date: To: solr-user@lucene.apache.org Subject: Re: Terminology question: Core vs. Collection vs... Using your terminology, I'd say core is a physical

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Mark Miller
Currently a SolrCore is 1:1 with a low level Lucene index. There is no reason that needs to always be that way. It's possible that we may at some point add built-in micro sharding support that means a SolrCore could have multiple underlying Lucene indexes. Or we may not. - Mark On Jan 4,
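
[Editor's sketch] The layering debated in this thread can be pictured as a small data model. This is only an illustration of the terminology as described in the messages above (collection and shard as logical terms; a replica implemented as a Solr core, which is currently 1:1 with a Lucene index). It is not Solr's actual class structure. Every class name, the `build_collection` helper, the core naming scheme and the directory paths are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LuceneIndex:
    """Physical Lucene index on disk (path is hypothetical)."""
    directory: str


@dataclass
class SolrCore:
    """A replica of a shard, implemented as a core living on one node."""
    name: str
    node: str
    index: LuceneIndex  # currently one index per core, per the thread


@dataclass
class Shard:
    """Logical slice of a collection; served by one or more replicas."""
    name: str
    replicas: List[SolrCore] = field(default_factory=list)


@dataclass
class Collection:
    """Logical, possibly distributed, index made up of shards."""
    name: str
    shards: List[Shard] = field(default_factory=list)

    def cores(self) -> List[SolrCore]:
        return [core for shard in self.shards for core in shard.replicas]


def build_collection(name: str, num_shards: int,
                     replication_factor: int, nodes: List[str]) -> Collection:
    """Assign replicas to nodes round-robin (illustrative policy only)."""
    if not nodes:
        raise ValueError("at least one node is required")
    collection = Collection(name)
    slot = 0
    for s in range(1, num_shards + 1):
        shard = Shard(f"shard{s}")
        for r in range(1, replication_factor + 1):
            node = nodes[slot % len(nodes)]
            slot += 1
            core_name = f"{name}_shard{s}_replica{r}"  # assumed naming
            index = LuceneIndex(f"/data/{core_name}/index")
            shard.replicas.append(SolrCore(core_name, node, index))
        collection.shards.append(shard)
    return collection
```

With two shards and a replication factor of two, the model holds four cores and four Lucene indexes, which matches the "one index per core" reading; the micro-sharding idea Mark mentions would change `SolrCore.index` into a list.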

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Darren Govoni
Yes. In that case, core should best be described as a logical solr entity with various managed attributes and qualities above the physical layer (sorry, not trying to perpetuate this thread so much). On 01/04/2013 01:55 PM, Mark Miller wrote: Currently a SolrCore is 1:1 with a low level

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Per Steffensen
It was a very good explanation, Jack! I believe I have heard most of it before, so it is really not new for me. I DO understand that the name replica and replication-factor CAN be justified, but it requires a long and thorough explanation. And that's the point. A good name for a concept means

Re: distributed / federated search Solr

2013-01-04 Thread Upayavira
We're not gonna have documentation to explain it. I guess it is more a question of starting a discussion here about how to do it. My thought would be to write an adapter in front of your APIs to make it look like a Solr instance, and fake distributed search. But, to get that to work, you'd need

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Alexandre Rafalovitch
Would this be a reasonable (if very rough) attempt at cake diagram? https://docs.google.com/drawings/d/1XxLjds0OOm44zOVCMR-cwCJXnTs3C2x257KpCTxI1Ec/edit Not sure if I managed to get logical/physical separation clearly enough, but it could be a start. Regards, Alex. Personal blog:

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Mark Miller
On Jan 4, 2013, at 2:14 PM, Per Steffensen st...@designware.dk wrote: I'm not sure what the node tells Zookeeper and who does shard assignment. I mean, does a node explicitly say what shard it wants to be, or is that assigned by Zookeeper, or is that a node's choice/option? It's basically

Re: Solr 3.6.2 or 4.0

2013-01-04 Thread Otis Gospodnetic
Hi, If you don't need to shard your index and don't need NRT search Solr 3.x is much simpler to operate and is more mature. Otis Solr ElasticSearch Support http://sematext.com/ On Jan 4, 2013 7:08 AM, Dikchant Sahi contacts...@gmail.com wrote: As someone in the forum correctly said, if all

Re: distributed / federated search Solr

2013-01-04 Thread Oleg Ruchovets
Yes, it would be great to start a discussion of this topic. I am looking for some kick-start information to begin a more detailed investigation. And of course maybe someone has already faced this problem, so please share your ideas and experience. Thanks Oleg. On Fri, Jan 4, 2013 at 2:15 PM,

Re: Solr 3.6.2 or 4.0

2013-01-04 Thread Upayavira
I agree with the 'more mature' analysis, but surely you can use 4.0 in a 3.x style without greater difficulty, no? Upayavira On Fri, Jan 4, 2013, at 07:35 PM, Otis Gospodnetic wrote: Hi, If you don't need to shard your index and don't need NRT search Solr 3.x is much simpler to operate and

Re: background merge hit exception AND read past EOF: NIOFSIndexInput

2013-01-04 Thread Otis Gospodnetic
Sounds like you may have a corrupt index. Try running the CheckIndex tool. Otis Solr ElasticSearch Support http://sematext.com/ On Jan 3, 2013 8:59 AM, Karan jindal karanjindal1...@gmail.com wrote: Hi everyone, I have a solr index which is built using solr 3.2. I am facing two problems with

Re: distributed / federated search Solr

2013-01-04 Thread Alexandre Rafalovitch
I think the problem is that you have to interpret the user query (Solr has one syntax, other sources have a different one) and then combine results (how?). All of those are non-trivial. Have you looked at something like http://www.comcepta.com/en/enterprise-metasearch.html which builds on top of

RE: search features Endeca vs Solr

2013-01-04 Thread Dyer, James
Sachin, You might get more responses on this list if you can describe in a little more detail what your application needs to do. A lot of us haven't used Endeca and won't understand exactly what you mean here. With that said, I migrated a few apps from Endeca to Solr a few years back and will try to

Re: search features Endeca vs Solr

2013-01-04 Thread Mark Miller
On Jan 4, 2013, at 3:41 PM, Dyer, James james.d...@ingramcontent.com wrote: 4. Dynamic Business Rules. There is an open JIRA issue around biz rules and drools integration. Not sure if there is any work done there, but at least some notes about it last I looked. - Mark

Search Engineers at SimplyHired.com

2013-01-04 Thread Jagdish Nomula
Hello Solr-Users, I thought you, or someone you know, might be interested in a very important role here at Simply Hired. The Staff Search Engineer will own the responsibility of writing the search engine of SimplyHired. You will work on cutting edge machine learning, search and big data

Solr 4 exceptions on trying to create a collection

2013-01-04 Thread Jay Parashar
Hi All, I am getting exceptions on trying to create a collection. Any help is appreciated. While trying to create a collection, I got this error Caused by: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request at

Re: StatsComponent and query times while indexing

2013-01-04 Thread Otis Gospodnetic
Hi, I think what you are seeing is a general thing. Regular search is slower while there is indexing, too, of course. So maybe it's best to mentally decouple indexing part here and simply make your calls as fast as possible without indexing. Then you can add indexing and play with things like

Re: Solr 4 exceptions on trying to create a collection

2013-01-04 Thread Alexandre Rafalovitch
For the second one: Wrong version of library on a classpath or multiple versions of library on the classpath which causes wrong classes with missing fields/variables? Or library interface baked in and the implementation is newer. Some sort of mismatch basically. Most probably in Apache http

Re: StatsComponent and query times while indexing

2013-01-04 Thread Marcin Rzewucki
Thanks. I guess you're right - it's normal behaviour. Are there some guidelines on how to use ramBufferSizeMB, or is it only by testing? Do you know if DIH is gentler than indexing via REST or the solrj API? Kind regards. On 4 January 2013 23:14, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I

RE: Solr 4 exceptions on trying to create a collection

2013-01-04 Thread Jay Parashar
Thanks! I had a different version of httpclient in the classpath. So the 2nd exception is gone but now I am back to the first one org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request -Original Message- From: Alexandre Rafalovitch

Re: StatsComponent and query times while indexing

2013-01-04 Thread Upayavira
DIH won't make any real difference, I'd say. The work to write terms to your index still happens in either case. Upayavira On Fri, Jan 4, 2013, at 11:25 PM, Marcin Rzewucki wrote: Thanks. I guess you're right - it's normal behaviour. Are there some guidelines how to use ramBufferSizeMB or only

Re: Solr 4 exceptions on trying to create a collection

2013-01-04 Thread Alexandre Rafalovitch
Tried Wireshark yet to see what host/port it is trying to connect and why it fails? It is a complex tool, but well worth learning. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps

Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Jack Krupansky
That's probably as official as anything ever gets around here. -- Jack Krupansky -Original Message- From: Mark Miller Sent: Friday, January 04, 2013 11:47 AM To: solr-user@lucene.apache.org Subject: Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question) I'm going to push *hard*

Re: StatsComponent and query times while indexing

2013-01-04 Thread Otis Gospodnetic
If you index from the outside (i.e. not using DIH) you have more control: * how many threads you use * how you batch documents * how much you wait between indexing batches ... Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Jan 4, 2013 at 6:25 PM, Marcin Rzewucki
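
[Editor's sketch] The three knobs Otis lists (thread count, batch size, wait between batches) can be sketched as a small client-side driver. This is a minimal sketch, not a Solr or SolrJ API: `send_batch` is a hypothetical callable standing in for whatever actually posts documents to Solr, and the default values are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(docs: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most batch_size documents."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def index_in_batches(docs: Iterable[dict],
                     send_batch: Callable[[List[dict]], None],
                     batch_size: int = 500,
                     threads: int = 2,
                     pause_seconds: float = 0.0) -> int:
    """Send docs in batches on a thread pool; return how many were sent."""
    pending = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for batch in batched(docs, batch_size):
            pending.append((pool.submit(send_batch, batch), len(batch)))
            if pause_seconds > 0:
                time.sleep(pause_seconds)  # throttle between submissions
    sent = 0
    for future, size in pending:
        future.result()  # re-raises any exception from send_batch
        sent += size
    return sent
```

Lowering `threads` or raising `pause_seconds` trades indexing throughput for less pressure on concurrent queries, which is the kind of control the message describes.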

Re: Removing terms from a search query with no results

2013-01-04 Thread Jack Krupansky
Not at this time. That is something you would do at your app level - re-query with a looser query if zero results for the original query. -- Jack Krupansky -Original Message- From: Varun Thacker Sent: Friday, January 04, 2013 7:50 AM To: solr-user@lucene.apache.org Subject: Removing
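
[Editor's sketch] The app-level approach Jack describes (re-query with a looser query when the original returns nothing) might look like the following. It is a sketch under stated assumptions: `run_query` is a hypothetical callable that sends a query string to Solr and returns the matching documents, and dropping the trailing term is just one possible relaxation policy.

```python
from typing import Callable, List, Sequence, Tuple


def search_with_relaxation(terms: Sequence[str],
                           run_query: Callable[[str], List[dict]],
                           min_terms: int = 1) -> Tuple[str, List[dict]]:
    """AND all terms together; on zero hits, drop the last term and retry.

    Returns the query that produced results together with those results,
    or ("", []) if even the loosest allowed query found nothing.
    """
    remaining = list(terms)
    floor = max(1, min_terms)  # never issue an empty query
    while len(remaining) >= floor:
        query = " AND ".join(remaining)
        docs = run_query(query)
        if docs:
            return query, docs
        remaining.pop()  # loosen the query by removing the trailing term
    return "", []
```

Each retry costs another round trip to Solr, so an application would probably cap the number of attempts or order the terms so that the least important ones are dropped first.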

Re: Removing terms from a search query with no results

2013-01-04 Thread Otis Gospodnetic
Hi Varun, I don't think this exists in Solr... But have a look at http://sematext.com/products/dym-researcher/index.html . Look at the screenshot and you will spot something labeled as Relaxer in the blue area. This (Query) Relaxer is DYM ReSearcher's cousin and can be seen in action on

Re: SolrCloud and Join Queries

2013-01-04 Thread Otis Gospodnetic
Hi, I think things will work for Hassan as he described them. The key is not to shard in his case, that's all. Hassan, yes, 1-2M docs is small. But beware of creating a crazy number (e.g. thousands) of collections per server, as each collection has some cost. Otis -- Solr ElasticSearch

Re: Solr Cloud index refreshes after restart

2013-01-04 Thread Sai Gadde
Hi Erick, The issue was with zookeeper: when we tried to force full replication by cleaning the datadir in zookeeper, it caused the index removal. Our index always replicated in full, even on a short outage or restart. I think being too far out of date could be the reason. We felt zookeeper was to blame here.