Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Per Steffensen
On 1/3/13 5:58 PM, Walter Underwood wrote: A "factor" is multiplied, so multiplying the leader by a replicationFactor of 1 means you have exactly one copy of that shard. I think that recycling the term "replication" within Solr was confusing, but it is a bit late to change that. wunder Yes, t

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Per Steffensen
On 1/3/13 5:26 PM, Yonik Seeley wrote: I agree - it's pointless to have two replicas of the same shard on a single node. But I'm talking about having replicationFactor as a target, so when you start up *new* nodes they will become a replica for any shard where the number of replicas is currentl

search features Endeca vs Solr

2013-01-03 Thread Sachin Gangadhar Katarki
Hi Everyone, I am looking for search features in Solr similar to what Endeca provides. I have googled a lot but could not find answers for a few of them, and for some I could not find the right answer. 1) Keyword redirect: when searching for a particular keyword, the user is redirected to a configured URL 2) Di

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Per Steffensen
On 1/3/13 4:55 PM, Mark Miller wrote: Trying to forge our own path here seems more confusing than helpful IMO. We have enough issues with terminology right now - where we can go with the industry standard, I think we should. - Mark Fair enough. I don't think our biggest problem is whether we d

Re: Anybody uses PreAnalyzedField ?

2013-01-03 Thread Jack Krupansky
No, it's just like any other field type, except that it is truly literal - it specifies all the details of the token and attribute stream that will be output by the field analyzer, but in source form. In other words, it's a language for describing a token stream that can be compiled into the act

Oddity in /admin/ping

2013-01-03 Thread Shawn Heisey
I am seeing something a little weird in Solr branch_4x. It looks like 3.5 may have a similar issue, but I'd like to concentrate on the newer version for now. I am sending a request to /admin/ping. This in turn calls a request handler (using qt) that initiates a distributed search. Below, I'

Re: Most Popular Search / ExternalFileField

2013-01-03 Thread Mikhail Khludnev
Atuldj, You are right. There is no such out-of-the-box feature in Solr. The closest thing I'm aware of is Lucidworks.lucidimagination.com/display/lweug/Click+Scoring+Relevance+Framework But I've never used it. ExternalFileField is an appropriate building block. I'm trying to contribute so

SolrCloud Commit issues

2013-01-03 Thread davers
I am having issues any time I add documents or delete documents. The issue is that the log is reporting that the commit is happening but when I search after the commit I get no change in the result set. It's only after I manually commit again that I can see the new results. For example I have a So

Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-03 Thread Mark Miller
Technically, you want to make sure zookeeper reports the node as live and active. You could use the same api that the UI uses for that - the localhost:port/solr/zookeeper (I think?) servlet. If you can't reach it for a node, it's obviously down - if you can reach it, parse the json and see if

Re: What is group.query?

2013-01-03 Thread Yonik Seeley
From http://wiki.apache.org/solr/FieldCollapsing "Return a single group of documents that also match the given query." We can find the top documents that also match arbitrary queries with the group.query command (much like facet.query). For example, we could use this to find the top 3 docume

What is group.query?

2013-01-03 Thread Lance Norskog
What does group.query do? How is it different from q= and fq= ? Thanks.

Issue streaming .zip file using ContentStreamBase

2013-01-03 Thread Danny Dvinov
Hello, Wondering if anyone could point me to the right way of streaming a .zip file: my goal is to stream a zipped version of the index. I zip up the index files I get from calling IndexCommit#getFileNames, and then attempt to stream using a custom handler with the following in handleRequestBod

Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-03 Thread Bill Au
Thanks, Mark. That does remove the node. And it seems to do so permanently. Even when I restart Solr after unloading, it does not join the SolrCloud cluster. And I can get it to re-join the cluster by creating the core. Does anyone know if there is an API to determine the state of a node? When AWS

Streaming .zip file using ContentStreamBase

2013-01-03 Thread Danny Dvinov
Hello, Wondering if anyone could point me to the right way of streaming a .zip file: my goal is to stream a zipped version of the index. I zip up the index files I get from calling IndexCommit#getFileNames, and then attempt to stream using a custom handler with the following in handleRequestBod

Re: How heavy is HttpSolrServer

2013-01-03 Thread Shawn Heisey
On 1/3/2013 3:02 PM, Benjamin, Roy wrote: I currently create a new instance of HttpSolrServer for each update. It's convenient when sharding over a hundred shards in a heavily threaded updating client. How heavy is this class? Would it really be worth using a pool (map of pools really) to hol

How heavy is HttpSolrServer

2013-01-03 Thread Benjamin, Roy
I currently create a new instance of HttpSolrServer for each update. It's convenient when sharding over a hundred shards in a heavily threaded updating client. How heavy is this class? Would it really be worth using a pool (map of pools really) to hold on to previously created instances? Tha
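
The "map of previously created instances" idea in the question above could be sketched roughly as follows. This is only an illustration under assumptions: the class name `ClientCache` is hypothetical, the cache is generic so that it compiles without SolrJ on the classpath, and it only makes sense if the client type is safe to share between threads, which should be checked against the SolrJ documentation for the version in use.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical helper: keeps one client object per shard URL instead of
// creating a new one for every update.
class ClientCache<C> {
    private final ConcurrentMap<String, C> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    ClientCache(Function<String, C> factory) {
        this.factory = factory;
    }

    // Returns the cached client for this URL, creating it on first use.
    C get(String shardUrl) {
        return clients.computeIfAbsent(shardUrl, factory);
    }

    int size() {
        return clients.size();
    }

    public static void main(String[] args) {
        // A plain Object stands in for the real client here (assumption);
        // with SolrJ on the classpath the factory would construct the client.
        ClientCache<Object> cache = new ClientCache<>(url -> new Object());
        Object first = cache.get("http://shard1:8983/solr");
        Object second = cache.get("http://shard1:8983/solr");
        System.out.println(first == second);
        System.out.println(cache.size());
    }
}
```

With a hundred shards this holds at most a hundred long-lived objects, whatever the number of updating threads.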

Re: solr4.0 problem zkHost with multiple hosts throws out of range exception

2013-01-03 Thread Mark Miller
Cool, we should add this to the wiki. -Mark On Thursday, January 3, 2013, cmuarg wrote: > the solution: -DzkHost=zoo1:8983,zoo2:8983,zoo3:8983/solrroot > > thanks > /C > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/solr4-0-problem-zkHost-with-multiple-hosts-throw

theory of sets

2013-01-03 Thread Uwe Reh
Hi, I'm looking for a tricky solution to a common problem. I have to handle a lot of items and each could be a member of several groups. - "OK, just add a field called 'member_of'" No that's not enough, because each group is sorted and each member has a sortstring for this group. - "OK, still e

Re: Force SolrJ 4.0.0 to use XML to talk to Solr 1.4.1 server

2013-01-03 Thread Mark Bennett
Thank you Sean for the option. Your second post made me smile! -- Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513 On Thu, Jan 3, 2013 at 12:21 PM, Shawn Heisey wrote: > On 1/3/2013 12:39 PM, Shawn Heisey wrote: >

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Darren Govoni
I see. So sharding and distributing/replicating can have separate and different advantages. On 01/03/2013 01:06 PM, Lance Norskog wrote: Also, searching can be much faster if you put all of the shards on one machine, and the search distributor. That way, you search with multiple simultaneous t

Re: Force SolrJ 4.0.0 to use XML to talk to Solr 1.4.1 server

2013-01-03 Thread Shawn Heisey
On 1/3/2013 12:39 PM, Shawn Heisey wrote: This should work: server.setParser(new XMLResponseParser()); Additional note, because I seem to have trouble getting this through the heads of the developers in my organization. Mark, this is not directed at you, I just feel it may need saying in ge

Re: solr4.0 problem zkHost with multiple hosts throws out of range exception

2013-01-03 Thread cmuarg
the solution: -DzkHost=zoo1:8983,zoo2:8983,zoo3:8983/solrroot thanks /C -- View this message in context: http://lucene.472066.n3.nabble.com/solr4-0-problem-zkHost-with-multiple-hosts-throws-out-of-range-exception-tp4014440p4030394.html Sent from the Solr - User mailing list archive at Nabble.c
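
The format reported as working in this message is a comma-separated list of host:port pairs with the chroot appended once, at the end, and the option itself typed with an ASCII hyphen. A minimal sketch of assembling such a value follows; the class and method names are hypothetical and the host names are the ones from the message.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for assembling a zkHost value with a chroot suffix.
class ZkHostBuilder {
    // Joins host:port pairs with commas and appends the chroot once, at the end.
    static String build(List<String> hostPorts, String chroot) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one ZooKeeper address is required");
        }
        String hosts = String.join(",", hostPorts);
        if (chroot == null || chroot.isEmpty()) {
            return hosts;
        }
        // Accept the chroot with or without its leading slash.
        return hosts + (chroot.startsWith("/") ? chroot : "/" + chroot);
    }

    public static void main(String[] args) {
        String zkHost = build(Arrays.asList("zoo1:8983", "zoo2:8983", "zoo3:8983"), "/solrroot");
        System.out.println("-DzkHost=" + zkHost);
    }
}
```

As noted elsewhere in the thread, the chroot node (here `/solrroot`) must already exist in ZooKeeper.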

Re: Force SolrJ 4.0.0 to use XML to talk to Solr 1.4.1 server

2013-01-03 Thread Shawn Heisey
On 1/3/2013 12:24 PM, Mark Bennett wrote: I know I've seen this before, but I'll be darned if I can find it on Google. I have a SolrJ app that normally submits data to Solr 4.x. But sometimes it needs to submit to 1.4.1 for reasons I won't go in to. I'd like to stick with the 4.x jar files, bu

Force SolrJ 4.0.0 to use XML to talk to Solr 1.4.1 server

2013-01-03 Thread Mark Bennett
I know I've seen this before, but I'll be darned if I can find it on Google. I have a SolrJ app that normally submits data to Solr 4.x. But sometimes it needs to submit to 1.4.1 for reasons I won't go into. I'd like to stick with the 4.x jar files, but still submit to 1.x, and my understanding

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread davers
Yes that is exactly what I was hoping for. I can live with just adding nodes manually for now. Would be nice if this feature was included in 4.1 though as I will be waiting for the 4.1 release to make the jump to SolrCloud. -- View this message in context: http://lucene.472066.n3.nabble.com/Sol

Re: Odd exceptions in both 3.5 and 4.1-SNAPSHOT

2013-01-03 Thread Shawn Heisey
On 1/3/2013 10:40 AM, Shawn Heisey wrote: On 1/3/2013 8:11 AM, Michael Ryan wrote: We see these EofExceptions in our system occasionally. I believe they occur when our SolrJ client times out and closes the connection, before Jetty returns the response. This would make sense - the load balancer

Re: Upgrading from 3.6 to 4.0

2013-01-03 Thread Lance Norskog
Please start new mail threads for new questions. This makes it much easier to research old mail threads. Old mail is often the only documentation for some problems. On 01/02/2013 10:04 AM, Benjamin, Roy wrote: Will the existing 3.6 indexes work with 4.0 binary ? Will 3.6 solrJ clients work wit

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Lance Norskog
Also, searching can be much faster if you put all of the shards on one machine, and the search distributor. That way, you search with multiple simultaneous threads inside one machine. I've seen this make searches several times faster. On 01/03/2013 06:36 AM, Jack Krupansky wrote: Ah... the mul

Re: solr4.0 problem zkHost with multiple hosts throws out of range exception

2013-01-03 Thread Tomás Fernández Löbbe
I think it should be -DzkHost=zoo1:8983,zoo2:8983,zoo3:8983/solrroot Tomás On Thu, Jan 3, 2013 at 2:14 PM, Mark Miller wrote: > I don't really understand your question. More than one what? > > More than one external zk node? Start up an ensemble, and pass a comma sep > list of the addresses

Re: Odd exceptions in both 3.5 and 4.1-SNAPSHOT

2013-01-03 Thread Shawn Heisey
On 1/3/2013 8:11 AM, Michael Ryan wrote: We see these EofExceptions in our system occasionally. I believe they occur when our SolrJ client times out and closes the connection, before Jetty returns the response. This would make sense - the load balancer probably drops the healthcheck connecti

Re: solr4.0 problem zkHost with multiple hosts throws out of range exception

2013-01-03 Thread Mark Miller
I don't really understand your question. More than one what? More than one external zk node? Start up an ensemble, and pass a comma sep list of the addresses as the zkhost - each one should have the same chroot on it. - Mark On Jan 3, 2013, at 4:32 AM, cmuarg wrote: > Hello > > I have a zook

RE: Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Darren Govoni
And based on the previous explanation there is never a "copy of a shard". A shard represents and contains only replicas for itself, replicas being copies of cores within the shard. --- Original Message --- On 1/3/2013 11:58 AM Walter Underwood wrote:A "factor" is multiplied, so multip

Re: solr4.0 problem zkHost with multiple hosts throws out of range exception

2013-01-03 Thread cmuarg
Hello I have a zookeeper ensemble that is also used for other purposes and I don’t want the zookeeper root to get messed up with solrcloud things so I try to use ‘chroot’. One external zookeeper node works fine with -DzkHost=zoo1:8983/solrroot (solrroot must exist) but how do I specify more than one? Th

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Walter Underwood
A "factor" is multiplied, so multiplying the leader by a replicationFactor of 1 means you have exactly one copy of that shard. I think that recycling the term "replication" within Solr was confusing, but it is a bit late to change that. wunder On Jan 3, 2013, at 7:33 AM, Mark Miller wrote: >

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Yonik Seeley
On Thu, Jan 3, 2013 at 8:46 AM, Per Steffensen wrote: > There are defaults for both replicationFactor and maxShardsPerNode, so none > of them HAS to be provided - default is 1 in both cases. > > int repFactor = msgStrToInt(message, REPLICATION_FACTOR, 1); > int maxShardsPerNode = msgStr

Re: Anybody uses PreAnalyzedField ?

2013-01-03 Thread Alexandre Rafalovitch
So, do you need a custom request handler? Or it somehow fits into (say) eDismax handler? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately,

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Mark Miller
On Jan 3, 2013, at 10:55 AM, Mark Miller wrote: > > On Jan 3, 2013, at 10:42 AM, Per Steffensen wrote: > >> "Why Solr is better than its competitors" list :-) > > The problem is that it's not just Solr competitors. It seems to be pretty > much everyone. If you can provide counter examples,

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Mark Miller
On Jan 3, 2013, at 10:42 AM, Per Steffensen wrote: > "Why Solr is better than its competitors" list :-) The problem is that it's not just Solr competitors. It seems to be pretty much everyone. If you can provide counter examples, I'd be interested to see them, but I've found confirmation exam

RE: Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Darren Govoni
Great point. --- Original Message --- On 1/3/2013 10:42 AM Per Steffensen wrote: On 1/3/13 4:33 PM, Mark Miller wrote: > This has pretty much become the standard across other distributed systems and in the literat…err…books. Hmmm I'm not sure you are right about that. Maybe more than one

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Per Steffensen
On 1/3/13 4:33 PM, Mark Miller wrote: This has pretty much become the standard across other distributed systems and in the literat…err…books. Hmmm I'm not sure you are right about that. Maybe more than one distributed system calls them "Replica", but there are also a lot that don't. But if you

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Mark Miller
Happy to clarify. - Mark On Jan 3, 2013, at 10:02 AM, Per Steffensen wrote: > Ok, sorry. Easy to misunderstand, though. > > On 1/3/13 3:58 PM, Mark Miller wrote: >> MAX_INT is just a place holder for a high value given the context of this >> guy wanting to add replicas for as many machines as

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Mark Miller
This has pretty much become the standard across other distributed systems and in the literat…err…books. I first implemented it as you mention you'd like, but Yonik correctly pointed out that we were going against the grain. - Mark On Jan 3, 2013, at 10:01 AM, Per Steffensen wrote: > For the

Re: Anybody uses PreAnalyzedField ?

2013-01-03 Thread Jack Krupansky
You need to present your query terms in the same format as the pre-analyzed terms come in. In other words, you need to do the pre-analysis yourself when constructing the query. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Thursday, January 03, 2013 5:53 AM

RE: Odd exceptions in both 3.5 and 4.1-SNAPSHOT

2013-01-03 Thread Michael Ryan
We see these EofExceptions in our system occasionally. I believe they occur when our SolrJ client times out and closes the connection, before Jetty returns the response. -Michael -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Thursday, January 03, 2013 10:07 AM

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Jack Krupansky
Yes, in the context of SolrCloud, "Node" = "Solr server JVM". So, "node" is an instance of Solr, which can support multiple cores and multiple collections - or at least shards of multiple collections. -- Jack Krupansky -Original Message- From: Per Steffensen Sent: Thursday, January

Odd exceptions in both 3.5 and 4.1-SNAPSHOT

2013-01-03 Thread Shawn Heisey
I'm running 3.5.0 in production (with an old patch from SOLR-1972) and yesterday's branch_4x in dev (with the most recent SOLR-1972 patch). Both versions are spitting occasional exceptions. You can see them both here: http://pastie.org/private/o2ekh0drs4syqb6t8re4w I'm pretty sure that the 4

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Per Steffensen
Ok, sorry. Easy to misunderstand, though. On 1/3/13 3:58 PM, Mark Miller wrote: MAX_INT is just a place holder for a high value given the context of this guy wanting to add replicas for as many machines as he adds down the line. You are taking it too literally. - Mark

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Per Steffensen
For the same reasons that "Replica" shouldn't be called "Replica" (it requires too long an explanation to agree that it is an OK name), "replicationFactor" shouldn't be called "replicationFactor" as long as it refers to the TOTAL number of cores you get for your "Shard". "replicationFactor" woul

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Mark Miller
MAX_INT is just a place holder for a high value given the context of this guy wanting to add replicas for as many machines as he adds down the line. You are taking it too literally. - Mark On Jan 3, 2013, at 9:02 AM, Per Steffensen wrote: > On 1/3/13 2:50 AM, Mark Miller wrote: >> Unfortunate

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Per Steffensen
Hi. Here is my version; I do not believe the explanations so far have been very clear. We have the following concepts (here I will try to explain what each concept covers without naming it, which is hard): 1) Machines (virtual or physical) running Solr server JVMs (one machine can run several Solr server

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Jack Krupansky
Ah... the multiple shards (of the same collection) in a single node is about planning for future expansion of your cluster - create more shards than you need today, put more of them on a single node and then migrate them to their own nodes as the data outgrows the smaller number of nodes. In oth

Re: Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Yonik Seeley
On Thu, Jan 3, 2013 at 9:17 AM, Darren Govoni wrote: > I think what's confusing about your explanation below is when you have a > situation where there is no replication factor. That's possible too, yes? > > So in that case, is each core of a shard of a collection, still referred to > as a replica

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Jack Krupansky
There is always a replication factor, but it could be 1 - meaning there is only a single replica of the data for a shard. You can't have a replication factor of 0 - that would mean the data does not exist. Don't confuse the old pre-SolrCloud master/slave use of replica. There is no "replicatio
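
The arithmetic behind this point, as the thread describes it, is that replicationFactor counts the total number of cores per shard, so a collection has numShards times replicationFactor cores and a factor of 0 is not meaningful. A small worked sketch follows; the class and method names are hypothetical and this is not Solr's own code.

```java
// Hypothetical illustration of the "factor is multiplied" reading used in the thread.
class ReplicaMath {
    // Total cores in a collection: every shard has replicationFactor cores.
    static int totalCores(int numShards, int replicationFactor) {
        if (numShards < 1 || replicationFactor < 1) {
            // A factor of 0 would mean the shard's data does not exist anywhere.
            throw new IllegalArgumentException("numShards and replicationFactor must be at least 1");
        }
        return Math.multiplyExact(numShards, replicationFactor);
    }

    public static void main(String[] args) {
        // 2 shards, factor 1: one core per shard, no additional copies.
        System.out.println(totalCores(2, 1));
        // 2 shards, factor 3: three cores per shard.
        System.out.println(totalCores(2, 3));
    }
}
```

The two calls print 2 and 6 respectively.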

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Mark Miller
On Jan 3, 2013, at 9:17 AM, Darren Govoni wrote: > Even a non-replicated core is called a replica? To some :) Forcing agreement on terminology has been … challenging… And even if there is some agreement, new people come, old people that were not around for the agreement come back, etc. Usua

RE: Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Darren Govoni
Ah, ok. Good. Makes sense. I think I will draw all this up in a UML that includes the distinction between the "logical" terms and the "physical" terms (and their mapping) as they do get intertwined. I'll post it here when I'm done. --- Original Message --- On 1/3/2013 09:19 AM Jack Kr

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Jack Krupansky
A single shard MAY exist on a single core, but only if it is not replicated. Generally, a single shard will exist on multiple cores, each a replica of the source data as it comes into the update handler. -- Jack Krupansky -Original Message- From: Darren Govoni Sent: Thursday, January

RE: Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Darren Govoni
Yes. And it's worth noting that when you have multiple shards in a single node (@deprecated), they are shards of different collections... --- Original Message --- On 1/3/2013 09:16 AM Jack Krupansky wrote: And I would revise "node" to note that in SolrCloud a node is simply an instance

RE: Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Darren Govoni
I think what's confusing about your explanation below is when you have a situation where there is no replication factor. That's possible too, yes? So in that case, is each core of a shard of a collection, still referred to as a replica? To me a replica is a duplicate/backup of a shard's core.

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Jack Krupansky
And I would revise "node" to note that in SolrCloud a node is simply an instance of a Solr server. And, technically, you can have multiple shards in a single instance of Solr, separating the logical sharding of keys from the distribution of the data. -- Jack Krupansky -Original Message--

RE: Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Darren Govoni
Thanks. I got that part. A group of shards (and therefore cores) represents a collection, yes. But does a single shard exist only on a single core? --- Original Message --- On 1/3/2013 09:03 AM Jack Krupansky wrote: No, a shard is a subset (or "slice") of the collection. Sharding is a way of

Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-03 Thread Mark Miller
http://wiki.apache.org/solr/CoreAdmin#UNLOAD - Mark On Jan 3, 2013, at 9:06 AM, Bill Au wrote: > Mark, > What do you mean by "unload them"? > > I am using an AWS load balancer with my auto scaling group instead of > using Solr's built-in load balancer. I am not sharding my index. I am >

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Jack Krupansky
Oops... let me word that a little more carefully: ...we are "replicating the data of each shard". -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Thursday, January 03, 2013 9:03 AM To: solr-user@lucene.apache.org Subject: Re: Terminology question: Core vs. Collectio

Re: indexing cpu utilization

2013-01-03 Thread Mark Miller
On Jan 3, 2013, at 5:40 AM, Uwe Reh wrote: > "use more threads" vs. "use less threads" > It is a bit confusing. My point was to make sure you are using more than one thread. With 32 cores, probably a lot more than one thread. Otis' point was that you can also use too many threads. Both are

Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-03 Thread Bill Au
Mark, What do you mean by "unload them"? I am using an AWS load balancer with my auto scaling group instead of using Solr's built-in load balancer. I am not sharding my index. I am using SolrCloud for replication only. I am doing local search on each instance and sending all updates to the

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Jack Krupansky
No, a shard is a subset (or "slice") of the collection. Sharding is a way of "slicing" the original data, before we talk about how the shards get stored and replicated on actual Solr cores. Replicas are instances of the data for a shard. Sometimes people may loosely speak of a replica as being

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Per Steffensen
On 1/3/13 2:50 AM, Mark Miller wrote: Unfortunately, for 4.0, the collections API was pretty bare bones. You don't actually get back responses currently - you just pass off the create command to zk for the Overseer to pick up and execute. So you actually have to check the logs of the Overseer

Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-03 Thread Bill Au
With AWS auto scaling, one can specify a minimum number of instances for an auto scaling group. So there should never be an insufficient number of replicas. One can also specify a termination policy so that the newly added nodes are removed first. But with SolrCloud as long as there are enough

RE: Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Darren Govoni
Thanks again. (And sorry to jump into this convo) But I had a question on your statement: On 1/3/2013 08:07 AM Jack Krupansky wrote: Collection is the more modern term and incorporates the fact that the collection may be sharded, with each shard on one or more cores, with each core being a r

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Per Steffensen
On 1/3/13 3:05 AM, davers wrote: This is what I get from the leader overseer log: 2013-01-02 18:04:24,663 - INFO [ProcessThread:-1:PrepRequestProcessor@419] - Got user-level KeeperException when processing sessionid:0x23bfe1d4c280001 type:create cxid:0x58 zxid:0xfffe txntype:unknown

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Per Steffensen
There are defaults for both replicationFactor and maxShardsPerNode, so none of them HAS to be provided - default is 1 in both cases. int repFactor = msgStrToInt(message, REPLICATION_FACTOR, 1); int maxShardsPerNode = msgStrToInt(message, MAX_SHARDS_PER_NODE, 1); Remember that replica
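
The default handling quoted in this message can be illustrated with a stand-alone sketch. Note the assumptions: the `msgStrToInt` below is a hypothetical re-creation that reads from a plain `Map`, not the method from the Solr source, and the constant values are taken to be the Collections API parameter names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the default handling quoted in the message.
class CreateDefaults {
    static final String REPLICATION_FACTOR = "replicationFactor";
    static final String MAX_SHARDS_PER_NODE = "maxShardsPerNode";

    // Returns the integer value stored under key, or def when the key is absent.
    static int msgStrToInt(Map<String, String> message, String key, int def) {
        String value = message.get(key);
        return value == null ? def : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> message = new HashMap<>();
        message.put(REPLICATION_FACTOR, "2"); // maxShardsPerNode deliberately left out

        int repFactor = msgStrToInt(message, REPLICATION_FACTOR, 1);
        int maxShardsPerNode = msgStrToInt(message, MAX_SHARDS_PER_NODE, 1);
        System.out.println(repFactor + " " + maxShardsPerNode);
    }
}
```

Here the request supplies only replicationFactor, so maxShardsPerNode falls back to 1 and the program prints `2 1`.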

Re: MoreLikeThis supporting multiple document IDs as input?

2013-01-03 Thread Jack Krupansky
The MLT search component is enabled using &mlt=true and works on any normal Solr query. It gives a batch of similar documents for each search result of the original query, one batch per original query result. It uses the &mlt.count=n parameter to control how many similar results to return for e
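
A request using the two parameters named in this message might be assembled as in the sketch below. The base URL, the `/select` path and the example query are assumptions for illustration only, and a real request would likely need further `mlt.*` parameters that the snippet above does not mention.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that appends mlt=true and mlt.count to an ordinary query.
class MltQueryUrl {
    static String build(String baseUrl, String query, int mltCount) {
        try {
            // URL-encode the query so characters such as ':' are safe in the URL.
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&mlt=true&mlt.count=" + mltCount;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Base URL and query are placeholders, not values from the thread.
        System.out.println(build("http://localhost:8983/solr", "id:123", 3));
    }
}
```

Each document matched by `q` would then come back with its own batch of up to `mlt.count` similar documents, as the message describes.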

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Jack Krupansky
A node is a machine in a cluster or cloud (graph). It could be a real machine or a virtualized machine. Technically, you could have multiple virtual nodes on the same physical "box". Each Solr replica would be on a different node. Technically, you could have multiple Solr instances running on

RE: Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Darren Govoni
Good write up. And what about "node"? I think there needs to be an official glossary of terms that is sanctioned by the solr team and some terms still in use may need to be labeled "deprecated". After so many years, it's still confusing. --- Original Message --- On 1/3/2013 08:07 AM

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Jack Krupansky
Collection is the more modern term and incorporates the fact that the collection may be sharded, with each shard on one or more cores, with each core being a replica of the other cores within that shard of that collection. Instance is a general term, but is commonly used to refer to a running

RE: CPU spikes on trunk

2013-01-03 Thread Markus Jelsma
Thanks for pointing to visualvm again, will check that first in the future. There is no problem with trunk but there was a problem with my GC settings. I forgot to add an additional 0 to -XX:MaxGCPauseMillis so it became too small. Thanks, Markus -Original message- > From:Mark Miller

Re: indexing cpu utilization

2013-01-03 Thread Uwe Reh
Hi, thank you for the hints. On 3 January 2013 05:55, Mark Miller wrote: 32 cores eh? You probably have to raise some limits to take advantage of that. 32 cores isn't that much anymore. You can buy amd servers from Supermicro with two sockets and 32G of ram for less than 2500$. Systems with

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Alexandre Rafalovitch
Haven't seen these yet. These look like a great start, though now I see even more terms to figure out. Thank you, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all a

Re: SolrJ | IOException while Indexing a PDF document with additional fields

2013-01-03 Thread uwe72
Wasn't the stack trace in my previous posting? I get the same behavior when I use the HttpSolrServer.java. Here is the console output of the Solr server: 03.01.2013 11:32:31 org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1 03.01.2013 11:32:31 org.apache.solr.update.pr

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Aloke Ghoshal
Hi, If you haven't already, please refer to: http://www.ngdata.com/site/blog/57-ng.html http://lucene.472066.n3.nabble.com/solr-cloud-concepts-td3726292.html http://wiki.apache.org/solr/SolrCloud#FAQ Regards, Aloke On Thu, Jan 3, 2013 at 3:12 PM, Alexandre Rafalovitch wrote: > Hello, > > I am

Terminology question: Core vs. Collection vs...

2013-01-03 Thread Alexandre Rafalovitch
Hello, I am trying to understand the core Solr terminology. I am looking for correct rather than loose meaning as I am trying to teach an example that starts from easy scenario and may scale to multi-core, multi-machine situation. Here are the terms that seem to be all overlapping and/or crossing

RE: MoreLikeThis supporting multiple document IDs as input?

2013-01-03 Thread David Parks
I'm not seeing the results I would expect. In the previous email below it's stated that the "MLT search component" returns N results and K similar documents per EACH of the N results. If I'm not mistaken I access the "MLT search component" via a query to /solr/select/?qt=mlt, such as this: http:/