On 1/3/13 5:58 PM, Walter Underwood wrote:
A "factor" is multiplied, so multiplying the leader by a replicationFactor of 1
means you have exactly one copy of that shard.
I think that recycling the term "replication" within Solr was confusing, but it
is a bit late to change that.
wunder
Yes, t
On 1/3/13 5:26 PM, Yonik Seeley wrote:
I agree - it's pointless to have two replicas of the same shard on a
single node. But I'm talking about having replicationFactor as a
target, so when you start up *new* nodes they will become a replica
for any shard where the number of replicas is currentl
Hi Everyone,
I am looking for search features in Solr similar to what Endeca provides. I
have googled a lot but could not find answers for a few of them, and for others
I could not find the right answers.
1) Keyword re-direct : On search particular keyword user will be
redirected to configured URL
2) Di
On 1/3/13 4:55 PM, Mark Miller wrote:
Trying to forge our own path here seems more confusing than helpful
IMO. We have enough issues with terminology right now - where we can
go with the industry standard, I think we should. - Mark
Fair enough.
I don't think our biggest problem is whether we d
No, it's just like any other field type, except that it is truly literal -
it specifies all the details of the token and attribute stream that will be
output by the field analyzer, but in source form. In other words, it's a
language for describing a token stream that can be compiled into the act
I am seeing something a little weird in Solr branch_4x. It looks like
3.5 may have a similar issue, but I'd like to concentrate on the newer
version for now.
I am sending a request to /admin/ping. This in turn calls a request
handler (using qt) that initiates a distributed search. Below, I'
Atuldj,
You are right. There is no such out-of-the-box feature in Solr. The
closest thing that I'm aware of is
Lucidworks.lucidimagination.com/display/lweug/Click+Scoring+Relevance+Framework
But I've never used it.
ExternalFileField is an appropriate building block.
I'm trying to contribute so
I am having issues any time I add documents or delete documents. The issue is
that the log is reporting that the commit is happening but when I search
after the commit I get no change in the result set. It's only after I
manually commit again that I can see the new results.
For example I have a So
Technically, you want to make sure zookeeper reports the node as live and
active.
You could use the same api that the UI uses for that - the
localhost:port/solr/zookeeper (I think?) servlet.
If you can't reach it for a node, it's obviously down - if you can reach it,
parse the json and see if
From http://wiki.apache.org/solr/FieldCollapsing
"Return a single group of documents that also match the given query."
'''
We can find the top documents that also match arbitrary queries with
the group.query command (much like facet.query). For example, we could
use this to find the top 3 docume
What does group.query do? How is it different from q= and fq= ?
Thanks.
Hello,
Wondering if anyone could point me to the right way of streaming a .zip file:
my goal is to stream a zipped version of the index. I zip up the index files I
get from calling IndexCommit#getFileNames, and then attempt to stream using a
custom handler with the following in handleRequestBod
Thanks, Mark.
That does remove the node. And it seems to do so permanently. Even when I
restart Solr after unloading, it does not join the SolrCloud cluster. And
I can get it to re-join the cluster by creating the core.
Anyone know if there is an API to determine the state of a node. When AWS
On 1/3/2013 3:02 PM, Benjamin, Roy wrote:
I currently create a new instance of HttpSolrServer for each update. It's
convenient when sharding
over a hundred shards in a heavily threaded updating client. How heavy is this
class? Would
it really be worth using a pool (map of pools really) to hol
I currently create a new instance of HttpSolrServer for each update. It's
convenient when sharding
over a hundred shards in a heavily threaded updating client. How heavy is this
class? Would
it really be worth using a pool (map of pools really) to hold on to previously
created instances?
Tha
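One way the "map of previously created instances" idea from this question could look is sketched below. This is only an illustration: the `ClientCache` class is hypothetical and not part of SolrJ, and it is generic so the example runs without SolrJ on the classpath. It keeps one shared client per base URL and creates it on first use.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache, not part of SolrJ: hands out one shared client per shard URL.
public class ClientCache<C> {

    private final ConcurrentMap<String, C> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    public ClientCache(Function<String, C> factory) {
        this.factory = factory;
    }

    // Returns the cached client for this URL, creating it on first use.
    public C get(String baseUrl) {
        return clients.computeIfAbsent(baseUrl, factory);
    }

    // Number of distinct URLs that currently have a cached client.
    public int size() {
        return clients.size();
    }

    public static void main(String[] args) {
        // A plain Object stands in for a real client type in this demo.
        ClientCache<Object> cache = new ClientCache<>(url -> new Object());
        Object first = cache.get("http://host1:8983/solr");
        Object second = cache.get("http://host1:8983/solr");
        System.out.println("same instance reused: " + (first == second));
    }
}
```

With SolrJ 4.x the factory could plausibly be `url -> new HttpSolrServer(url)`, assuming a single instance is safe to share across updating threads; that assumption should be checked against the SolrJ documentation for the version in use.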
Cool, we should add this to the wiki.
-Mark
On Thursday, January 3, 2013, cmuarg wrote:
> the solution: -DzkHost=zoo1:8983,zoo2:8983,zoo3:8983/solrroot
>
> thanks
> /C
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr4-0-problem-zkHost-with-multiple-hosts-throw
Hi,
I'm looking for a tricky solution of a common problem. I have to handle
a lot of items and each could be member of several groups.
- "OK, just add a field called 'member_of'"
No that's not enough, because each group is sorted and each member has a
sortstring for this group.
- "OK, still e
Thank you Sean for the option.
Your second post made me smile!
--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
On Thu, Jan 3, 2013 at 12:21 PM, Shawn Heisey wrote:
> On 1/3/2013 12:39 PM, Shawn Heisey wrote:
>
I see. So sharding and distributing/replicating can have separate and
different advantages.
On 01/03/2013 01:06 PM, Lance Norskog wrote:
Also, searching can be much faster if you put all of the shards on one
machine, and the search distributor. That way, you search with
multiple simultaneous t
On 1/3/2013 12:39 PM, Shawn Heisey wrote:
This should work:
server.setParser(new XMLResponseParser());
Additional note, because I seem to have trouble getting this through the
heads of the developers in my organization. Mark, this is not directed
at you, I just feel it may need saying in ge
the solution: -DzkHost=zoo1:8983,zoo2:8983,zoo3:8983/solrroot
thanks
/C
--
View this message in context:
http://lucene.472066.n3.nabble.com/solr4-0-problem-zkHost-with-multiple-hosts-throws-out-of-range-exception-tp4014440p4030394.html
Sent from the Solr - User mailing list archive at Nabble.c
On 1/3/2013 12:24 PM, Mark Bennett wrote:
I know I've seen this before, but I'll be darned if I can find it on Google.
I have a SolrJ app that normally submits data to Solr 4.x. But sometimes
it needs to submit to 1.4.1 for reasons I won't go into.
I'd like to stick with the 4.x jar files, bu
I know I've seen this before, but I'll be darned if I can find it on Google.
I have a SolrJ app that normally submits data to Solr 4.x. But sometimes
it needs to submit to 1.4.1 for reasons I won't go into.
I'd like to stick with the 4.x jar files, but still submit to 1.x, and my
understanding
Yes that is exactly what I was hoping for. I can live with just adding nodes
manually for now. Would be nice if this feature was included in 4.1 though
as I will be waiting for the 4.1 release to make the jump to SolrCloud.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Sol
On 1/3/2013 10:40 AM, Shawn Heisey wrote:
On 1/3/2013 8:11 AM, Michael Ryan wrote:
We see these EofExceptions in our system occasionally. I believe they
occur when our SolrJ client times out and closes the connection,
before Jetty returns the response.
This would make sense - the load balancer
Please start new mail threads for new questions. This makes it much
easier to research old mail threads. Old mail is often the only
documentation for some problems.
On 01/02/2013 10:04 AM, Benjamin, Roy wrote:
Will the existing 3.6 indexes work with 4.0 binary ?
Will 3.6 solrJ clients work wit
Also, searching can be much faster if you put all of the shards on one
machine, and the search distributor. That way, you search with multiple
simultaneous threads inside one machine. I've seen this make searches
several times faster.
On 01/03/2013 06:36 AM, Jack Krupansky wrote:
Ah... the mul
I think it should be
-DzkHost=zoo1:8983,zoo2:8983,zoo3:8983/solrroot
Tomás
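To make the shape of the string suggested above explicit, here is a minimal Java sketch that assembles a zkHost value from a list of ZooKeeper addresses plus a single chroot appended once at the end. The class and method names (`ZkHostString`, `build`) are made up for illustration and are not part of Solr; the hosts and chroot are the ones from this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr: builds the value passed as -DzkHost.
public class ZkHostString {

    // Joins host:port pairs with commas and appends the chroot once, at the end.
    public static String build(List<String> hostPorts, String chroot) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one ZooKeeper address is required");
        }
        String hosts = String.join(",", hostPorts);
        if (chroot == null || chroot.isEmpty()) {
            return hosts;
        }
        // Add the leading slash if the caller left it out.
        String path = chroot.startsWith("/") ? chroot : "/" + chroot;
        return hosts + path;
    }

    public static void main(String[] args) {
        List<String> ensemble = Arrays.asList("zoo1:8983", "zoo2:8983", "zoo3:8983");
        System.out.println("-DzkHost=" + build(ensemble, "solrroot"));
    }
}
```

The chroot is attached to the whole connection string rather than to each host, which matches the form that worked for the original poster.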
On Thu, Jan 3, 2013 at 2:14 PM, Mark Miller wrote:
> I don't really understand your question. More than one what?
>
> More than one external zk node? Start up an ensemble, and pass a comma sep
> list of the addresses
On 1/3/2013 8:11 AM, Michael Ryan wrote:
We see these EofExceptions in our system occasionally. I believe they occur
when our SolrJ client times out and closes the connection, before Jetty returns
the response.
This would make sense - the load balancer probably drops the healthcheck
connecti
I don't really understand your question. More than one what?
More than one external zk node? Start up an ensemble, and pass a comma sep list
of the addresses as the zkhost - each one should have the same chroot on it.
- Mark
On Jan 3, 2013, at 4:32 AM, cmuarg wrote:
> Hello
>
> I have a zook
And based on the previous explanation there is never a "copy of a shard". A
shard represents and contains only replicas for itself, replicas being copies of cores
within the shard.
--- Original Message ---
On 1/3/2013 11:58 AM Walter Underwood wrote:
A "factor" is multiplied, so
multip
Hello
I have a zookeeper ensemble that is also used for other purposes and I don’t
want the zookeeper root get messed up with solrcloud things so I try to use
‘chroot’.
One external zookeeper node works fine with -DzkHost=zoo1:8983/solrroot
(solrroot must exist) but how do I specify more than one?
Th
A "factor" is multiplied, so multiplying the leader by a replicationFactor of 1
means you have exactly one copy of that shard.
I think that recycling the term "replication" within Solr was confusing, but it
is a bit late to change that.
wunder
On Jan 3, 2013, at 7:33 AM, Mark Miller wrote:
>
On Thu, Jan 3, 2013 at 8:46 AM, Per Steffensen wrote:
> There are defaults for both replicationFactor and maxShardsPerNode, so neither
> of them HAS to be provided - default is 1 in both cases.
>
> int repFactor = msgStrToInt(message, REPLICATION_FACTOR, 1);
> int maxShardsPerNode = msgStr
So, do you need a custom request handler? Or it somehow fits into (say)
eDismax handler?
Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately,
On Jan 3, 2013, at 10:55 AM, Mark Miller wrote:
>
> On Jan 3, 2013, at 10:42 AM, Per Steffensen wrote:
>
>> "Why Solr is better than its competitors" list :-)
>
> The problem is that it's not just Solr competitors. It seems to be pretty
> much everyone. If you can provide counter examples,
On Jan 3, 2013, at 10:42 AM, Per Steffensen wrote:
> "Why Solr is better than its competitors" list :-)
The problem is that it's not just Solr competitors. It seems to be pretty much
everyone. If you can provide counter examples, I'd be interested to see them,
but I've found confirmation exam
Great point.
--- Original Message ---
On 1/3/2013 10:42 AM Per Steffensen wrote:
On 1/3/13 4:33 PM, Mark Miller wrote:
> This has pretty much become the standard across other distributed systems
and in the literat…err…books.
Hmmm I'm not sure you are right about that. Maybe more than one
On 1/3/13 4:33 PM, Mark Miller wrote:
This has pretty much become the standard across other distributed systems and
in the literat…err…books.
Hmmm I'm not sure you are right about that. Maybe more than one
distributed system calls them "Replica", but there are also a lot that
don't. But if you
Happy to clarify.
- Mark
On Jan 3, 2013, at 10:02 AM, Per Steffensen wrote:
> Ok, sorry. Easy to misunderstand, though.
>
> On 1/3/13 3:58 PM, Mark Miller wrote:
>> MAX_INT is just a place holder for a high value given the context of this
>> guy wanting to add replicas for as many machines as
This has pretty much become the standard across other distributed systems and
in the literat…err…books.
I first implemented it as you mention you'd like, but Yonik correctly pointed
out that we were going against the grain.
- Mark
On Jan 3, 2013, at 10:01 AM, Per Steffensen wrote:
> For the
You need to present your query terms in the same format as the pre-analyzed
terms come in.
In other words, you need to do the pre-analysis yourself when constructing
the query.
-- Jack Krupansky
-Original Message-
From: Alexandre Rafalovitch
Sent: Thursday, January 03, 2013 5:53 AM
We see these EofExceptions in our system occasionally. I believe they occur
when our SolrJ client times out and closes the connection, before Jetty returns
the response.
-Michael
-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Thursday, January 03, 2013 10:07 AM
Yes, in the context of SolrCloud, "Node" = "Solr server JVM".
So, "node" is an instance of Solr, which can support multiple cores and
multiple collections - or at least shards of multiple collections.
-- Jack Krupansky
-Original Message-
From: Per Steffensen
Sent: Thursday, January
I'm running 3.5.0 in production (with an old patch from SOLR-1972) and
yesterday's branch_4x in dev (with the most recent SOLR-1972 patch).
Both versions are spitting occasional exceptions. You can see them both
here:
http://pastie.org/private/o2ekh0drs4syqb6t8re4w
I'm pretty sure that the 4
Ok, sorry. Easy to misunderstand, though.
On 1/3/13 3:58 PM, Mark Miller wrote:
MAX_INT is just a place holder for a high value given the context of this guy
wanting to add replicas for as many machines as he adds down the line. You are
taking it too literally.
- Mark
For the same reasons that "Replica" shouldn't be called "Replica" (it
requires too long an explanation to agree that it is an OK name),
"replicationFactor" shouldn't be called "replicationFactor" as long as
it refers to the TOTAL number of cores you get for your "Shard".
"replicationFactor" woul
MAX_INT is just a place holder for a high value given the context of this guy
wanting to add replicas for as many machines as he adds down the line. You are
taking it too literally.
- Mark
On Jan 3, 2013, at 9:02 AM, Per Steffensen wrote:
> On 1/3/13 2:50 AM, Mark Miller wrote:
>> Unfortunate
Hi
Here is my version - I do not believe the explanations have been very clear.
We have the following concepts (here I will try to explain what each
concept covers without naming it - it's hard)
1) Machines (virtual or physical) running Solr server JVMs (one machine
can run several Solr server
Ah... the multiple shards (of the same collection) in a single node is about
planning for future expansion of your cluster - create more shards than you
need today, put more of them on a single node and then migrate them to their
own nodes as the data outgrows the smaller number of nodes. In oth
On Thu, Jan 3, 2013 at 9:17 AM, Darren Govoni wrote:
> I think what's confusing about your explanation below is when you have a
> situation where there is no replication factor. That's possible too, yes?
>
> So in that case, is each core of a shard of a collection, still referred to
> as a replica
There is always a replication factor, but it could be 1 - meaning there is
only a single replica of the data for a shard. You can't have a replication
factor of 0 - that would mean the data does not exist.
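As a small worked example of that rule, the Java sketch below computes how many cores a collection ends up with, under the reading used in this message: replicationFactor counts every replica of a shard, so 1 means a single copy and 0 is rejected. The class and method names are illustrative only and are not Solr code.

```java
// Illustrative arithmetic only; these names are not Solr classes.
public class ReplicaMath {

    // Total cores for a collection: every shard gets replicationFactor replicas.
    public static int totalCores(int numShards, int replicationFactor) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be at least 1");
        }
        if (replicationFactor < 1) {
            // A factor of 0 would mean the data does not exist anywhere.
            throw new IllegalArgumentException("replicationFactor must be at least 1");
        }
        return numShards * replicationFactor;
    }

    public static void main(String[] args) {
        // Two shards with a single replica each.
        System.out.println(totalCores(2, 1));
        // Two shards with three replicas each.
        System.out.println(totalCores(2, 3));
    }
}
```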
Don't confuse the old pre-SolrCloud master/slave use of replica. There is no
"replicatio
On Jan 3, 2013, at 9:17 AM, Darren Govoni wrote:
> Even a non-replicated core is called a replica?
To some :) Forcing agreement on terminology has been … challenging…
And even if there is some agreement, new people come, old people that were not
around for the agreement come back, etc.
Usua
Ah, ok. Good. Makes sense.
I think I will draw all this up in a UML that includes the distinction between the
"logical" terms and the "physical" terms (and their mapping) as they do get
intertwined. I'll post it here when I'm done.
--- Original Message ---
On 1/3/2013 09:19 AM Jack Kr
A single shard MAY exist on a single core, but only if it is not replicated.
Generally, a single shard will exist on multiple cores, each a replica of
the source data as it comes into the update handler.
-- Jack Krupansky
-Original Message-
From: Darren Govoni
Sent: Thursday, January
Yes. And it's worth noting that when you have multiple shards in a single
node(@deprecated), they are shards of different collections...
--- Original Message ---
On 1/3/2013 09:16 AM Jack Krupansky wrote:
And I would revise "node" to note that in SolrCloud a node is simply an
instance
I think what's confusing about your explanation below is when you have a
situation where there is no replication factor. That's possible too, yes?
So in that case, is each core of a shard of a collection, still referred to as a replica?
To me a replica is a duplicate/backup of a shard's core.
And I would revise "node" to note that in SolrCloud a node is simply an
instance of a Solr server.
And, technically, you can have multiple shards in a single instance of Solr,
separating the logical sharding of keys from the distribution of the data.
-- Jack Krupansky
-Original Message--
Thanks. I got that part.
A group of shards (and therefore cores) represents a collection, yes. But does a single shard exist only on a single core?
--- Original Message ---
On 1/3/2013 09:03 AM Jack Krupansky wrote:
No, a shard is a subset (or "slice") of the collection. Sharding is a way of
http://wiki.apache.org/solr/CoreAdmin#UNLOAD
- Mark
On Jan 3, 2013, at 9:06 AM, Bill Au wrote:
> Mark,
> What do you mean by "unload them"?
>
> I am using an AWS load balancer with my auto scaling group instead of
> using Solr's built-in load balancer. I am not sharding my index. I am
>
Oops... let me word that a little more carefully:
...we are "replicating the data of each shard".
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Thursday, January 03, 2013 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Terminology question: Core vs. Collectio
On Jan 3, 2013, at 5:40 AM, Uwe Reh wrote:
> "use more threads" vs. "use less threads"
> It is a bit confusing.
My point was to make sure you are using more than one thread. With 32 cores,
probably a lot more than one thread.
Otis' point was that you can also use too many threads.
Both are
Mark,
What do you mean by "unload them"?
I am using an AWS load balancer with my auto scaling group instead of
using Solr's built-in load balancer. I am not sharding my index. I am
using SolrCloud for replication only. I am doing local search on each
instance and sending all updates to the
No, a shard is a subset (or "slice") of the collection. Sharding is a way of
"slicing" the original data, before we talk about how the shards get stored
and replicated on actual Solr cores. Replicas are instances of the data for
a shard.
Sometimes people may loosely speak of a replica as being
On 1/3/13 2:50 AM, Mark Miller wrote:
Unfortunately, for 4.0, the collections API was pretty bare bones. You don't
actually get back responses currently - you just pass off the create command to
zk for the Overseer to pick up and execute.
So you actually have to check the logs of the Overseer
With AWS auto scaling, one can specify a minimum number of instances for an
auto scaling group. So there should never be an insufficient number of
replicas. One can also specify a termination policy so that the newly
added nodes are removed first.
But with SolrCloud as long as there are enough
Thanks again. (And sorry to jump into this convo)
But I had a question on your statement:
On 1/3/2013 08:07 AM Jack Krupansky wrote:
Collection is the more modern term and incorporates the fact that the
collection may be sharded, with each shard on one or more cores, with each
core being a r
On 1/3/13 3:05 AM, davers wrote:
This is what I get from the leader overseer log:
2013-01-02 18:04:24,663 - INFO [ProcessThread:-1:PrepRequestProcessor@419]
- Got user-level KeeperException when processing sessionid:0x23bfe1d4c280001
type:create cxid:0x58 zxid:0xfffe txntype:unknown
There are defaults for both replicationFactor and maxShardsPerNode, so
neither of them HAS to be provided - default is 1 in both cases.
int repFactor = msgStrToInt(message, REPLICATION_FACTOR, 1);
int maxShardsPerNode = msgStrToInt(message, MAX_SHARDS_PER_NODE, 1);
Remember that replica
The MLT search component is enabled using &mlt=true and works on any normal
Solr query. It gives a batch of similar documents for each search result of
the original query, one batch per original query result. It uses the
&mlt.count=n parameter to control how many similar results to return for
e
A node is a machine in a cluster or cloud (graph). It could be a real
machine or a virtualized machine. Technically, you could have multiple
virtual nodes on the same physical "box". Each Solr replica would be on a
different node.
Technically, you could have multiple Solr instances running on
Good write up.
And what about "node"?
I think there needs to be an official glossary of terms that is sanctioned by the Solr
team, and some terms still in use may need to be labeled "deprecated". After so
many years, it's still confusing.
--- Original Message ---
On 1/3/2013 08:07 AM
Collection is the more modern term and incorporates the fact that the
collection may be sharded, with each shard on one or more cores, with each
core being a replica of the other cores within that shard of that
collection.
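The relationship described in that paragraph can be written down as a toy data model. The Java sketch below is only an illustration of the terminology, with hypothetical names that are not Solr's own classes: a collection maps each shard to the cores that hold a replica of it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the terms in this message; not Solr's own classes.
public class CollectionModel {

    // shard name -> names of the cores that each hold a replica of that shard
    private final Map<String, List<String>> shards = new LinkedHashMap<>();

    // Registers a core as one more replica of the given shard.
    public void addReplica(String shard, String coreName) {
        shards.computeIfAbsent(shard, s -> new ArrayList<>()).add(coreName);
    }

    public int shardCount() {
        return shards.size();
    }

    // Counts every core across all shards of the collection.
    public int coreCount() {
        int total = 0;
        for (List<String> cores : shards.values()) {
            total += cores.size();
        }
        return total;
    }

    public static void main(String[] args) {
        CollectionModel collection = new CollectionModel();
        collection.addReplica("shard1", "core_a");
        collection.addReplica("shard1", "core_b");
        collection.addReplica("shard2", "core_c");
        System.out.println(collection.shardCount() + " shards on "
                + collection.coreCount() + " cores");
    }
}
```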
Instance is a general term, but is commonly used to refer to a running
Thanks for pointing to visualvm again, will check that first in the future.
There is no problem with trunk but there was a problem with my GC settings. I
forgot to add an additional 0 to -XX:MaxGCPauseMillis so it became too small.
Thanks,
Markus
-Original message-
> From:Mark Miller
Hi,
thank you for the hints.
On 3 January 2013 05:55, Mark Miller wrote:
32 cores eh? You probably have to raise some limits to take advantage of
that.
32 cores isn't that much anymore. You can buy amd servers from
Supermicro with two sockets and 32G of ram for less than 2500$. Systems
with
Haven't seen these yet. These look like a great start, though now I see
even more terms to figure out.
Thank you,
Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all a
Wasn't it the stack trace in my posting before?
It is the same behavior when I use the HttpSolrServer.java
here is the console output of the solr server:
03.01.2013 11:32:31 org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1
03.01.2013 11:32:31 org.apache.solr.update.pr
Hi,
If you haven't already, please refer to:
http://www.ngdata.com/site/blog/57-ng.html
http://lucene.472066.n3.nabble.com/solr-cloud-concepts-td3726292.html
http://wiki.apache.org/solr/SolrCloud#FAQ
Regards,
Aloke
On Thu, Jan 3, 2013 at 3:12 PM, Alexandre Rafalovitch wrote:
> Hello,
>
> I am
Hello,
I am trying to understand the core Solr terminology. I am looking for
correct rather than loose meaning as I am trying to teach an example that
starts from easy scenario and may scale to multi-core, multi-machine
situation.
Here are the terms that seem to be all overlapping and/or crossing
I'm not seeing the results I would expect. In the previous email below it's
stated that the "MLT search component" returns N results and K similar
documents per EACH of the N results.
If I'm not mistaken I access the "MLT search component" via a query to
/solr/select/?qt=mlt, such as this:
http:/