I've tried (very simplistically) hitting a collection with a good variety
of searches and looking at the collection's heap memory and working out the
bytes / doc. I've seen results around 100 bytes / doc, and as low as 3
bytes / doc for collections with small docs. It's still a work-in-progress
-
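A minimal sketch of that measurement, assuming the heap numbers are read off externally (e.g. from the JVM memory stats on Solr's admin page); the function and parameter names here are illustrative, not part of any Solr API:

```python
# Rough sketch of the bytes-per-doc estimate described above.
# Assumption: heap_used_* values come from an external measurement;
# all names are made up for illustration.

def heap_bytes_per_doc(heap_used_idle: int, heap_used_loaded: int, num_docs: int) -> float:
    """Estimate steady-state heap cost per indexed document.

    heap_used_idle   -- heap bytes before warming the collection
    heap_used_loaded -- heap bytes after hitting it with a variety of searches
    num_docs         -- number of documents in the collection
    """
    return (heap_used_loaded - heap_used_idle) / num_docs

# e.g. ~1 GB of extra heap across 10M docs -> roughly 107 bytes/doc
print(heap_bytes_per_doc(2_000_000_000, 3_073_741_824, 10_000_000))
```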
While it's hard to answer this question because, as others have said, it
depends, I think it will be good if we can quantify or assess the cost of
running a SolrCore.
For instance, let's say that a server can handle a load of 10M indexed
documents (I omit search load on purpose for now) in a
Interesting nonetheless, Shawn :)
We use G1GC on our servers. We were on Java 7 (64-bit, RHEL6) but are
trying to migrate to Java 8 (which seems to cause more GC issues, so we
clearly need to tweak our settings); will investigate 8u40 though.
On 25 March 2015 at 04:23, Shawn Heisey
Thanks Erick,
I'm working on Solr 4.10.2 and all my dependency jars seem to be compatible
with this version.
I can't figure out which one causes this issue.
Thanks,
Regards,
On Tuesday 24 March 2015 at 23:45, Erick Erickson erickerick...@gmail.com
wrote:
bq: 13 more
Caused by:
On Wed, 2015-03-25 at 03:46 +0100, Ian Rose wrote:
Thus theoretically we could actually just use one single collection for
all of our customers (adding a 'customer:whatever' type fq to all
queries) but since we never need to query across customers it seemed
more performant (as well as safer -
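The single-shared-collection approach described above boils down to adding a per-customer filter query to every request. A hypothetical sketch, with the collection name and `customer` field made up for illustration:

```python
from urllib.parse import urlencode

# Hypothetical sketch: scoping every query in a shared collection to one
# customer with an fq clause. Collection and field names are made up.
def tenant_query(base_url: str, customer_id: str, user_query: str) -> str:
    params = {
        "q": user_query,
        # fq is cached in the filterCache independently of q, so the
        # per-customer restriction is cheap to reuse across queries.
        "fq": f"customer:{customer_id}",
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"

print(tenant_query("http://localhost:8983/solr/shared", "acme", "widgets"))
```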
In one of our production environments we use 32GB, 4-core, 3T RAID0
spinning disk Dell servers (do not remember the exact model). We have
about 25 collections with 2 replica (shard-instances) per collection on
each machine - 25 machines. Total of 25 coll * 2 replica/coll/machine *
25 machines
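Working out the totals implied by that layout:

```python
# Totals implied by the layout above: 25 collections, 2 replicas per
# collection on each machine, 25 machines.
collections = 25
replicas_per_collection_per_machine = 2
machines = 25

cores_per_machine = collections * replicas_per_collection_per_machine
total_cores = cores_per_machine * machines

print(cores_per_machine)  # 50 SolrCores on each machine
print(total_cores)        # 1250 SolrCores across the cluster
```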
On 3/25/2015 5:03 AM, Nitin Solanki wrote:
Please can anyone assist me? I am indexing on a single shard and it
is taking too much time to index the data. I am indexing around 49GB of
data on a single shard. What's wrong? Why is Solr taking so much time to
index the data?
Earlier I was
Per - Wow, 1 trillion documents stored is pretty impressive. One
clarification: when you say that you have 2 replica per collection on each
machine, what exactly does that mean? Do you mean that each collection is
sharded into 50 shards, divided evenly over all 25 machines (thus 2 shards
per
Hello,
* Updating my question again.*
Please can anyone assist me? I am indexing on a single shard and it
is taking too much time to index the data. I am indexing around 49GB of
data on a single shard. What's wrong? Why is Solr taking so much time to
index the data?
Earlier I was
On 25/03/15 15:03, Ian Rose wrote:
Per - Wow, 1 trillion documents stored is pretty impressive. One
clarification: when you say that you have 2 replica per collection on each
machine, what exactly does that mean? Do you mean that each collection is
sharded into 50 shards, divided evenly over
I have loved working on Solr, so thought of posting an Information
Retrieval/Text Mining requirement that we have for our GE Data Mining
Research Labs @ Bangalore. Apologies if it is considered inappropriate here.
Here goes the Job Description for those interested:
If Information Retrieval,
Hi,
I am a .NET developer, but I need to use Solr and specifically this good
plugin, AutoPhrasingTokenFilter.
I searched everywhere and couldn't find useful information; can anyone
help me run it in Solr 5.0 or even previous versions? I am not able to
add it to my Solr; it is throwing the below
If I do an initial search without any field sorting, and then do the exact
same query but also sort on one field, will I get the same result set in the
subsequent query, but sorted? In other words, does simply applying a sort
criterion affect the re-ranking of the full search, or does it just sort the
Hello,
I am familiar with the JMX points that Solr exposes to allow for monitoring of
statistics like QPS, numdocs, Average Query Time...
I am wondering if there is a way to configure Solr to automatically store the
value of these stats over time (for a given time interval), and then allow a
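The same statistics that show up over JMX are also exposed over HTTP at `/admin/mbeans?stats=true&wt=json`, so one sketch of "store these values over time" is a small poller. The URL, core name, and stat picked below are placeholders, and the response-flattening assumes the alternating category-name/category-dict list shape that handler returns:

```python
import json
import time
import urllib.request

# Sketch of polling Solr's mbeans stats handler on a timer. URL and core
# name are placeholders; swap print() for your time-series store.
MBEANS_URL = "http://localhost:8983/solr/mycore/admin/mbeans?stats=true&wt=json"

def pairs(mbeans_list):
    """The 'solr-mbeans' entry is a flat list alternating category name
    and category dict; yield (category, dict) pairs."""
    it = iter(mbeans_list)
    return zip(it, it)

def snapshot(response_dict):
    """Flatten one mbeans response into {(category, bean, stat): value}."""
    out = {}
    for category, beans in pairs(response_dict["solr-mbeans"]):
        for bean_name, bean in beans.items():
            for stat, value in (bean.get("stats") or {}).items():
                out[(category, bean_name, stat)] = value
    return out

def poll_forever(interval_sec=60):
    while True:
        with urllib.request.urlopen(MBEANS_URL) as resp:
            snap = snapshot(json.load(resp))
        row = (time.time(), snap.get(("QUERYHANDLER", "/select", "avgTimePerRequest")))
        print(row)  # append to a file / time-series store instead
        time.sleep(interval_sec)
```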
Hi,
I didn't find the answer yet, please help. We have a standalone Solr 5.0.0
with a few cores so far. One of those cores contains:
numDocs: 120M
deletedDocs: 110M
Our data change frequently, which is why there are so many deletedDocs.
The optimized core takes around 50GB on disk; we are now at almost 100GB
Hi
Is it possible for a replica to be DOWN while the node it resides on is
under /live_nodes? If so, what can lead to it, aside from someone unloading
a core?
I don't know whether each SolrCore reports its status to ZK independently, or
whether it's done by the Solr process as a whole.
Also, is it possible for
Hi Shawn,
Sorry for all the trouble.
Server configuration:
8 CPUs.
32 GB RAM
O.S. - Linux
*Earlier*, I was using 8 shards without replicas (the default is 1) using
SolrCloud. Only Solr is running on the server; there are no other
applications running. The Java heap is set to 4096 MB in
Hi,
You're right. Those sets are the same as each other; only the document order is different.
Koji
On 2015/03/26 0:53, innoculou wrote:
If I do an initial search without any field sorting, and then do the exact
same query but also sort on one field, will I get the same result set in the
subsequent query
You're still mixing master/slave with SolrCloud. Do _not_ reconfigure
the replication. If you want your core (we call them replicas in
SolrCloud) to appear on various nodes in your cluster, either create
the collection with the nodes specified (createNodeSet) or, once the
collection is created on
Images don't come through the mailing list, can't see your image.
Whether or not all the jars in the directory you're working on are
consistent is the least of your problems. Are the libs to be found in any
_other_ place specified on your classpath?
Best,
Erick
On Wed, Mar 25, 2015 at 12:36 AM,
On 3/25/2015 5:49 AM, Tom Evans wrote:
On Tue, Mar 24, 2015 at 4:00 PM, Tom Evans tevans...@googlemail.com wrote:
Hi all
We're migrating to SOLR 5 (from 4.8), and our infrastructure guys
would prefer we installed SOLR from an RPM rather than extracting the
tarball where we need it. They are
That's a high number of deleted documents as a percentage of your
index! Or at least I find those numbers surprising. When segments are
merged in the background during normal indexing, quite a bit of weight
is given to segments that have a high percentage of deleted docs. I
usually see at most
On 3/25/2015 9:08 AM, pavelhladik wrote:
Our data change frequently, which is why there are so many deletedDocs.
The optimized core takes around 50GB on disk; we are now at almost 100GB and I'm
looking for the best way to optimize this huge core without downtime. I
know optimization works in
bq: It does NOT optimize multiple replicas or shards in parallel.
This behavior was changed in 4.10 though, see:
https://issues.apache.org/jira/browse/SOLR-6264
So with 5.0 Pavel is seeing the result of that JIRA I bet.
I have to agree with Shawn, the optimization step should proceed
invisibly
Yeah, this is a head scratcher. But it _has_ to be that way for things
like edismax to work where you mix-and-match fielded and un-fielded
terms. I.e. I can have a query like q=field1:whatever some more
stuff&qf=field2,field3,field4 where I want whatever to be evaluated
only against field1, but the
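The mixed fielded/un-fielded edismax request described above can be sketched as request parameters. Field names are illustrative, and qf is written space-separated, which is the form Solr's edismax parser expects:

```python
from urllib.parse import urlencode

# Sketch of the edismax request shape described above: the fielded clause
# field1:whatever is evaluated only against field1, while the bare terms
# "some more stuff" are searched across the qf fields. Names are made up.
params = {
    "defType": "edismax",
    "q": "field1:whatever some more stuff",
    "qf": "field2 field3 field4",
}
query_string = urlencode(params)
print(query_string)
```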
Matt:
Not really. There's a bunch of third-party log analysis tools that
give much of this information (not everything exposed by JMX of course
is in the log files though).
Not quite sure whether things like Nagios, Zabbix, and the like have
this kind of stuff built in; seems like a natural
Hello, Chris Morley here, of Wayfair.com. I am working on the German
compound-splitter by Dawid Weiss.
I tried to upgrade the words.fst file that comes with the German
compound-splitter using Solr 3.5, but it doesn't work. Below is the
IndexNotFoundException that I get.
On 3/25/2015 8:42 AM, Nitin Solanki wrote:
Server configuration:
8 CPUs.
32 GB RAM
O.S. - Linux
snip
are running. Java heap set to 4096 MB in Solr. While indexing,
snip
*Currently*, I have 1 shard with 2 replicas using SOLR CLOUD.
Data Size:
102G
Hello,
solr.KeywordTokenizerFactory seems to split on whitespace, though according
to the Solr documentation it shouldn't do that.
For example, I have the following configuration for the fields proj_name and
proj_name_sort:
<field name="proj_name" type="sortable_text_general" indexed="true"
stored="true"/>
On Wed, Mar 25, 2015 at 2:40 PM, Shawn Heisey apa...@elyograg.org wrote:
I think you will only need to change the ownership of the solr home and
the location where the .war file is extracted, which by default is
server/solr-webapp. The user must be able to *read* the program data,
but should
This is a _very_ common thing we all had to learn; what you're seeing
is the result of the _query parser_, not the analysis chain. Anything
like
proj_name_sort:term1 term2 gets split at the query parser level;
attaching debug=query to the URL should show, down in the parsed
query section, something
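A small sketch of the point being made here, using the field name from the thread: the query parser splits on whitespace before the analyzer ever runs, so KeywordTokenizerFactory never sees the two terms as one value; quoting the value keeps it in a single clause:

```python
from urllib.parse import quote_plus

# The Lucene query parser splits on whitespace *before* analysis, so
# KeywordTokenizerFactory never sees "term1 term2" as one value.
# Quoting the value (or escaping the space) keeps it as a single clause.
unquoted = "proj_name_sort:term1 term2"    # term2 goes to the default field
quoted = 'proj_name_sort:"term1 term2"'    # one phrase against proj_name_sort

print(quote_plus(quoted))
# Add debug=query to the request to inspect the parsed query either way.
```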
On 3/25/2015 9:26 AM, Matt Kuiper wrote:
I am familiar with the JMX points that Solr exposes to allow for monitoring
of statistics like QPS, numdocs, Average Query Time...
I am wondering if there is a way to configure Solr to automatically store the
value of these stats over time (for a
Comments inline:
On Wed, Mar 25, 2015 at 8:30 AM, Shai Erera ser...@gmail.com wrote:
Hi
Is it possible for a replica to be DOWN, while the node it resides on is
under /live_nodes? If so, what can lead to it, aside from someone unloading
a core.
Yes, aside from someone unloading the index,
Thanks for a quick response.
It's a bit confusing that a query-time analyzer configured to use
KeywordTokenizerFactory does not keep the query criteria as a single token.
I guess whitespace is the special case because it separates clauses in a
query and is handled prior to analysis.
Actually I am handling a query the
Just to give a specific answer to the original question, I would say that
dozens of cores (collections) is certainly fine (assuming the total data
load and query rate is reasonable), maybe 50 or even 100. Low hundreds of
cores/collections MAY work, but isn't advisable. Thousands, if it works at
Thanks.
Does Solr ever clean up those states? I.e. does it ever remove down
replicas, or replicas belonging to non-live_nodes after some time? Or will
these remain in the cluster state forever (assuming they never come back
up)?
If they remain there, is there any penalty? E.g. Solr tries to send
Re,
Sorry about the image. So, here are all my dependency jars, listed below:
- commons-cli-2.0-mahout.jar
- commons-compress-1.9.jar
- commons-io-2.4.jar
- commons-logging-1.2.jar
- httpclient-4.4.jar
- httpcore-4.4.jar
- httpmime-4.4.jar
- junit-4.10.jar
- log4j-1.2.17.jar
Hi all,
I am wondering what the process is for applying Tokenizers and Filters (as
defined in the FieldType definition) to field contents that result from
CopyFields. To be more specific, in my Solr instance, I would like to support
query expansion by two means: removing stop words and adding
Hi,
I'm using a three level composite router in a solr cloud environment,
primarily for multi-tenant and field collapsing. The format is as follows.
*language!topic!url*.
An example would be :
ENU!12345!www.testurl.com/enu/doc1
GER!12345!www.testurl.com/ger/doc2
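The three-level composite IDs above can also be used to restrict queries to the shards holding a given prefix, by passing the prefix (with a trailing `!`) in the `_route_` parameter. A hedged sketch, using the values from the thread:

```python
from urllib.parse import urlencode

# Sketch of three-level composite-ID routing as described above. With IDs
# of the form language!topic!url, a query can be routed to the shard(s)
# holding a prefix by passing _route_ with a trailing '!'.
doc_id = "ENU!12345!www.testurl.com/enu/doc1"

params = {
    "q": "*:*",
    # route to the shard(s) holding the ENU!12345! prefix
    "_route_": "ENU!12345!",
}
print(urlencode(params))
```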
Thanks a lot, Michael. See replies below.
Am 25.03.2015 um 21:41 schrieb Michael Della Bitta
michael.della.bi...@appinions.com:
Two other things I noticed:
1. You probably don't want to store your copyFields. That's literally going
to be the same information each time.
OK, got it. I
Two other things I noticed:
1. You probably don't want to store your copyFields. That's literally going
to be the same information each time.
2. Your expectation that the pre-processed version of the text is added to the
index may be incorrect. Anything done in <analyzer type="query"> sections
actually
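A sketch of point 1 above in schema.xml terms: keep the copyField target unstored so the same text is not stored twice. Field and type names here are illustrative, not from the thread:

```xml
<!-- Sketch of the advice above: the copyField target is searchable but
     unstored, since storing it would duplicate the source fields.
     Field and type names are illustrative. -->
<field name="text_all" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="title" dest="text_all"/>
<copyField source="body"  dest="text_all"/>
```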
On Wed, Mar 25, 2015 at 12:51 PM, Shai Erera ser...@gmail.com wrote:
Thanks.
Does Solr ever clean up those states? I.e. does it ever remove down
replicas, or replicas belonging to non-live_nodes after some time? Or will
these remain in the cluster state forever (assuming they never come back
Hi Martin,
fq means filter query. Maybe you want to use the qf (query fields) parameter of
edismax?
On Wednesday, March 25, 2015 9:23 PM, Martin Wunderlich martin...@gmx.net
wrote:
Hi all,
I am wondering what the process is for applying Tokenizers and Filter (as
defined in the FieldType
Thanks a lot, Ahmet. I’ve just read up on this query field parameter and it
sounds good. Since the field contents are currently all identical, I can’t
really test it, yet.
Cheers,
Martin
Am 25.03.2015 um 21:27 schrieb Ahmet Arslan iori...@yahoo.com.INVALID:
Hi Martin,
fq means
Hi,
I have a field named GeoLocate with datatype location. For some lat and long
values it gives me the following error during the indexing process:
Can't parse point '139.9544301,35.4298081' because: Bad Y value 139.9544301
is not in boundary Rect(minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0)
Any idea
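What the error above is saying: Solr's location type expects "latitude,longitude", and latitude (the Y value) must lie in [-90, 90]. The point `139.9544301,35.4298081` has the two values swapped (139.95 is a longitude, in the Tokyo area), so reversing them fixes it. A small sketch of the same validation:

```python
# Solr's location type expects "latitude,longitude"; latitude (Y) must be
# within [-90, 90]. The failing point has lat and lon swapped.
def parse_latlon(point: str):
    lat_s, lon_s = point.split(",")
    lat, lon = float(lat_s), float(lon_s)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"Bad Y value {lat} is not in [-90, 90]")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"Bad X value {lon} is not in [-180, 180]")
    return lat, lon

print(parse_latlon("35.4298081,139.9544301"))  # OK once the values are swapped
```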
Re,
Finally, I think I found where this problem comes from. I didn't extend the
right class: instead of a Tokenizer, I was using a TokenFilter.
Erick, thanks for your replies. Regards.
On Wednesday 25 March 2015 at 23:55, Test Test andymish...@yahoo.fr wrote:
Re,
I have tried to remove
I agree the terminology is possibly a little confusing.
Stored refers to values that are stored verbatim. You can retrieve them
verbatim. Analysis does not affect stored values.
Indexed values are tokenized/transformed and stored inverted. You can't
recover the literal analyzed version (at least,
That should work. Check to be sure that you really are running Solr 5.0.
Was it an old version of trunk or the 5x branch before last August when the
terms query parser was added?
-- Jack Krupansky
On Tue, Mar 24, 2015 at 5:15 PM, Shamik Bandopadhyay sham...@gmail.com
wrote:
Hi,
I'm trying
Hello Chris - I don't know the token filter you mention, but I would like to
recommend Lucene's HyphenationCompoundWordTokenFilter. It works reasonably well
if you provide the hyphenation rules and a dictionary. It has some flaws, such
as decompounding into irrelevant subwords, overlapping
Re,
I have tried to remove all the redundant jar files. Then I relaunched it, but
it's blocked right away on the same issue.
It's very strange.
Regards,
On Wednesday 25 March 2015 at 23:31, Erick Erickson erickerick...@gmail.com
wrote:
Wait, you didn't put, say, lucene-core-4.10.2.jar
Martin:
Perhaps this would help
indexed=true, stored=true
field can be searched. The raw input (not analyzed in any way) can be
shown to the user in the results list.
indexed=true, stored=false
field can be searched. However, the field can't be returned in the
results list with the document.
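Erick's two combinations above, sketched as schema.xml field definitions (field and type names are illustrative):

```xml
<!-- indexed=true, stored=true: searchable, and the raw input can be
     shown to the user in the results list -->
<field name="title" type="text_general" indexed="true" stored="true"/>

<!-- indexed=true, stored=false: searchable, but the field can't be
     returned in the results list with the document -->
<field name="title_search" type="text_general" indexed="true" stored="false"/>
```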
Thanks Erick for the helpful explanations.
thanks
sumit
From: Erick Erickson [erickerick...@gmail.com]
Sent: Monday, March 23, 2015 4:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Difference in indexing using config file vs client i.e SolrJ
1 Either
The issue we had with Java 8 was with DIH handler. We were using Rhino and
with the new implementation in Java 8, we had several Regex expression
issues...
We are almost ready to go now, since we moved away from Rhino and now use
Java.
Bill
On Wed, Mar 25, 2015 at 2:14 AM, Daniel Collins
Erick,
Thanks for your help. I was able to fix the problem. I am working in non-SolrCloud mode.
Best Regards,
Ale
- Original Message -
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, 24 March 2015 10:14:22
Subject: [MASSMAIL]Re: Issues to create new
On Tue, Mar 24, 2015 at 4:00 PM, Tom Evans tevans...@googlemail.com wrote:
Hi all
We're migrating to SOLR 5 (from 4.8), and our infrastructure guys
would prefer we installed SOLR from an RPM rather than extracting the
tarball where we need it. They are creating the RPM file themselves,
and
On Wed, Mar 25, 2015 at 9:24 PM, Shai Erera ser...@gmail.com wrote:
There's even a param onlyIfDown=true which will remove a
replica only if it's already 'down'.
That will only work if the replica is in DOWN state correct? That is, if
the Solr JVM was killed, and the replica stays in
Wait, you didn't put, say, lucene-core-4.10.2.jar into your
contrib/tamingtext/dependency directory did you? That means you have
Lucene (and solr and solrj and ...) in your class path twice since
they're _already_ in your classpath by default since you're running
Solr.
All your jars should be in
In Solr 5 (or 4), is there an easy way to retrieve the list of words to
highlight?
Use case: allow an external application to highlight the matching words
of a matching document, rather than using the highlighted snippets
returned by Solr.
Thanks,
Damien
Hello,
Please can anyone assist me? I am indexing on a single shard and it
is taking too much time to index the data. I am indexing around 49GB of
data on a single shard. What's wrong? Why is Solr taking so much time to
index the data?
Earlier I was indexing the same data on 8 shards. At that time,
Thanks for letting us know the resolution, the problem was bugging me
Erick
On Wed, Mar 25, 2015 at 4:21 PM, Test Test andymish...@yahoo.fr wrote:
Re,
Finally, i think i found where this problem comes.I didn't use the right
class extender, instead using Tokenizers, i'm using Token
There's even a param onlyIfDown=true which will remove a
replica only if it's already 'down'.
That will only work if the replica is in the DOWN state, correct? That is, if
the Solr JVM was killed, and the replica stays ACTIVE, but its node is
not under /live_nodes, it won't get deleted? What I