Thanks for all your info.
I will try increasing the RAM and check it.
thanks,
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solrj-performance-bottleneck-tp2682797p2692503.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi,
For my Solr server, some of the query strings will be in Asian languages such
as Chinese or Japanese.
For such query strings, would the Standard or Dismax request handler work? My
understanding is that both the Standard and the Dismax handler tokenize the
query string by whitespace. And t
Hi,
The dreaded parent-child without denormalization question. What are one's
options for the following example:
parent: shoes
3 children. each with 2 attributes/fields: color and size
* color: red black orange
* size: 10 11 12
The goal is to be able to search for:
1) color:red AND size:10 a
Try giving Solr about 1.5GB by setting the Java heap params. Solr is usually CPU bound, so
medium or large instances are good.
Bill Bell
Sent from mobile
On Mar 16, 2011, at 10:56 AM, Asharudeen wrote:
> Hi
>
> Thanks for your info.
>
> Currently my index size is around 4GB. Normally in small instances
: I'm not sure if I get what you are trying to achieve. What do you mean
: by "constraint"?
"constraint" is fairly standard terminology when referring to facets; it's
used extensively in our facet docs and is even listed on Solr's glossary
page (although not specifically in the context of faceti
On 3/16/2011 6:09 PM, Shawn Heisey wrote:
du -hc *x
I was looking over the files in an index and I think it needs to include
more of the files for a true picture of RAM needs. I get 5.9GB running
the following command against a 16GB index. It excludes *.fdt (stored
field data) and *.tvf (t
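A runnable sketch of that sizing approach (the directory and file names below are stand-ins; a real check would point at Solr's data/index directory):

```shell
# Sum Lucene index file sizes while excluding stored-field data (*.fdt)
# and term-vector data (*.tvf), the file types least critical to have in
# the OS page cache for search speed. A scratch directory with dummy
# files stands in for a real index here.
INDEX_DIR=$(mktemp -d)
for f in _0.tis _0.frq _0.prx _0.fdt _0.tvf; do
  head -c 1024 /dev/zero > "$INDEX_DIR/$f"
done
# Pass du only the files we want counted, then read its grand total.
TOTAL=$(ls "$INDEX_DIR"/* | grep -v -e '\.fdt$' -e '\.tvf$' | xargs du -ck | tail -1)
echo "$TOTAL"
```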
On 3/16/2011 7:56 AM, Vadim Kisselmann wrote:
If the load is low, both slaves replicate with around 100MB/s from master.
But when I use Solrmeter (100-400 queries/min) for load tests (over
the load balancer), the replication slows down to an unacceptable
speed, around 100KB/s (at least that's wh
I agree with this, and it is even needed for function sorting on multivalued
fields. See the geohash patch for one way to deal with multivalued fields on
distance. Not ideal, but it works efficiently.
Bill Bell
Sent from mobile
On Mar 16, 2011, at 4:08 PM, Jonathan Rochkind wrote:
> Huh, so lucene
(11/03/17 3:53), Jonathan Rochkind wrote:
Interesting, any documentation on the PathTokenizer anywhere?
It is PathHierarchyTokenizer:
https://hudson.apache.org/hudson/job/Solr-trunk/javadoc/org/apache/solr/analysis/PathHierarchyTokenizerFactory.html
Koji
--
http://www.rondhuit.com/en/
Huh, so Lucene is actually doing what has been commonly described as
impossible in Solr?
But is Solr trunk, as the OP seemed to report, still not aware of
this, raising an error on a sort on a multi-valued field instead of just
saying, okay, we'll just pass it to Lucene anyway and go with luce
On Wed, Mar 16, 2011 at 5:46 PM, Chris Hostetter
wrote:
>
> : However, many of our multiValued fields are single valued for the majority
> : of documents in our index so we may not have noticed the incorrect sorting
> : behaviors.
>
> that would make sense ... if you use a multiValued field as if
> > I am using Solr 4.0 api
> > to search from index (made using solr1.4 version). I
> am
> > getting error Invalid version (expected 2, but 1) or
> the
> > data in not in 'javabin' format. Can anyone help me to
> fix
> > problem.
>
> You need to use solrj version 1.4 which is compatible
: However, many of our multiValued fields are single valued for the majority
: of documents in our index so we may not have noticed the incorrect sorting
: behaviors.
that would make sense ... if you use a multiValued field as if it were
single valued, you would never encounter a problem. if yo
It looks like Dismax query parser can somehow handle parens, used for
applying, for instance, + or - to a group, distributing it. But I'm not
sure what effect they have on the overall query.
For instance, if I give dismax this:
book (dog +( cat -frog))
debugQuery shows:
+((DisjunctionMaxQuery(
Actually, I dug in the logs again and, surprise, it sometimes still occurs with
`random` queries. Here are a few snippets from the error log. Somewhere
during that time there might be OOM-errors but older logs are unfortunately
rotated away.
2011-03-14 00:25:32,152 ERROR [solr.search.SolrCac
Hi,
> FWIW: it sounds like your problem wasn't actually related to your
> fieldCache, but probably instead it was because of how big your
> queryResultCache is
It's the same cluster as in the other thread. I decided a long time ago that
documentCache and queryResultCache wouldn't be a good
On Wed, Mar 16, 2011 at 5:10 PM, Robert Petersen wrote:
> OK I have a 30 GB index where there are lots of sparsely populated int
> fields and then one title field and one catchall field with title and
> everything else we want as keywords, the catchall field. I figure it is
> the biggest field in
: Alright, i can now confirm the issue has been resolved by reducing precision.
: The garbage collector on nodes without reduced precision has a real hard time
: keeping up and clearly shows a very different graph of heap consumption.
:
: Consider using MINUTE, HOUR or DAY as precision in case
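To illustrate the quoted advice (the field name and constants here are made up, not from the thread): rounding NOW in date math keeps repeated queries from generating a distinct value, and therefore a distinct cache entry, on every request.

```
&bf=recip(ms(NOW/HOUR,sort_date),3.16e-11,1,1)
&fq=sort_date:[NOW/DAY-7DAYS TO NOW/DAY]
```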
OK I have a 30 GB index where there are lots of sparsely populated int
fields and then one title field and one catchall field with title and
everything else we want as keywords, the catchall field. I figure it is
the biggest field in our documents which as I mentioned is otherwise
composed of a var
Hi Yonik,
I have run the queries against a single-index Solr with only 16M documents.
After adding facet.method=fc the results seemed to come back faster (first two
queries below), but still not fast enough.
Here are the fieldValueCache stats:
(facet.limit=100&facet.mincount=5&facet.method=fc, 5
> that is odd...
>
> can you let us know exactly what version of Solr/Lucene you are using (if
> it's not an official release, can you let us know exactly what the version
> details on the admin info page say; I'm curious about the svn revision)
Of course, that's the stable 1.4.1.
>
> can you al
: I.E. Instruct Solr that you are interested in documents that match a
: given query and then have Solr notify you (through whatever callback
: mechanism is specified) if and when a document appears that matches the
: query.
:
: We are planning on writing some software that will effectively grind
Lewis
Quick response: I am currently using Tomcat 7.0.8 with Solr (with no issues); I
will upgrade to 7.0.11 tonight and see if I run into the same issues.
Stay tuned as they say.
Cheers
François
On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote:
> Hello list,
>
> Is anyone running S
:
: Yesterday's error log contains something peculiar:
:
: ERROR [solr.search.SolrCache] - [pool-29-thread-1] - : Error during auto-
: warming of key:+*:*
:
(1.0/(7.71E-8*float(ms(const(1298682616680),date(sort_date)))+1.0))^20.0:java.lang.NullPointerException
: at org.apache.lucene.u
On Wed, Mar 16, 2011 at 8:05 AM, Dmitry Kan wrote:
> Hello guys. We are using sharded Solr 1.4 for heavy faceted search over the
> trigrams field with about 1 million entries in the result set and more
> than 100 million entries to facet on in the index. Currently the faceted
> search is ve
Hi Toke,
Thanks a lot for trying this out. I have to mention that the faceted
search hits only one specific shard by design, so in general the time to
query a shard directly and through the "proxy" Solr should be comparable.
Would it be feasible for you to make that field ngram'ed or is it too
Hi Kaushik,
If the field is being treated as blobs, you can try using the
FieldStreamDataSource mapping.
This handles the blob objects to extract contents from it.
This feature is available only after Solr 3.1, I suppose.
http://lucene.apache.org/solr/api/org/apache/solr/handler/dataimport/FieldS
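A hedged sketch of what that mapping can look like in data-config.xml (the entity, column, and data source names are illustrative, not from the thread):

```xml
<dataConfig>
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"/>
  <!-- FieldStreamDataSource streams a parent row's blob column
       to a nested entity processor -->
  <dataSource name="fieldReader" type="FieldStreamDataSource"/>
  <document>
    <entity name="doc" dataSource="db"
            query="SELECT id, attachment FROM docs">
      <entity name="blob" dataSource="fieldReader"
              processor="TikaEntityProcessor"
              dataField="doc.attachment">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```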
Hi Erik,
I have been reading about the progression of SOLR-792 into pivot faceting,
however can you expand to comment on
where it is committed. Are you referring to trunk?
The reason I am asking is that I have been using 1.4.1 for some time now and
have been thinking of upgrading to trunk... or
Interesting, any documentation on the PathTokenizer anywhere? Or just
have to find and look at the source? That's something I hadn't known
about, which may be useful to some stuff I've been working on depending
on how it works.
If nothing else, in the meantime, I'm going to take that exact mes
Oh, a doc count over 100M is a very different thing than a doc count around
1M. In your original message you said "I tried creating an index with 1M
documents, each with 100 unique terms in a field." If you instead have
100M documents, your use is a couple orders of magnitude larger than mine.
It a
Hello list,
Is anyone running Solr (in my case 1.4.1) on the above Tomcat dist? In the
past I have been using guidance in accordance with
http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat
but having upgraded from Tomcat 7.0.8 to 7.0.11 I am having problems
E.g.
INFO: Deplo
Hi Jonathan,
Thanks for sharing useful bits. Each shard has 16G of heap. Unless I do
something fundamentally wrong in the SOLR configuration, I have to admit,
that counting ngrams up to trigrams across whole set of shard's documents is
pretty intensive task, as each ngram can occur anywhere in the
I take raw user search term data, 'collapse' it into a form where I have
only unique terms, per store, ordered by frequency of searches over some
time period. The suggestions are then grouped and presented with store
breakouts. That sounds kind of like what this page is talking about
here, but I
Sorry, I missed the original mail on this thread
I put together that hierarchical faceting wiki page a couple of years ago when
helping a customer evaluate SOLR-64 vs. SOLR-792 vs. other approaches. Since
then, SOLR-792 morphed and is committed as pivot faceting. SOLR-64 spawned a
PathToke
Hi all!
I created a SolrJ project to run tests against Solr. I am inserting batches of
7000 records, each with 200 attributes, which adds up to approximately 13.77
MB per batch.
I am measuring the time it takes to add and commit each set of 7000
records to an instantiation of CommonsHttpSolrServer.
Eac
On Wed, Mar 16, 2011 at 12:56 PM, Asharudeen wrote:
> Currently my index size is around 4GB. Normally in small instances total
> available memory will be 1.6GB. In my setup, I allocated around 1GB as a
> heap size for tomcat. Hence I believe, remaining 600 MB will be used for OS
> cache.
Actually
Hi
Thanks for your info.
Currently my index size is around 4GB. Normally in small instances total
available memory will be 1.6GB. In my setup, I allocated around 1GB as a
heap size for tomcat. Hence I believe, remaining 600 MB will be used for OS
cache.
I believe, I need to migrate my Solr insta
Hi,
This is also where I am having problems. I have not been able to understand
very much on the wiki.
I do not understand how to configure the faceting we are referring to.
Although I know very little about this, I can't help but think that the wiki is
quite clearly inaccurate in some way!
Any
On Wed, Mar 16, 2011 at 9:50 PM, Kaushik Chakraborty wrote:
> The query's there in the data-config.xml. And the query's fetching as
> expected from the database.
[...]
Doh! Sorry, had missed that somehow.
So, the relevant part is:
SELECT ... p.message as solr_post_message,
What is the field typ
Ah, wait, you're doing sharding? Yeah, I am NOT doing sharding, so that
could explain our different experiences. It seems like sharding
definitely has trade-offs, makes some things faster and other things
slower. So far I've managed to avoid it, in the interest of keeping
things simpler and e
I don't know anything about trying to use map-reduce with Solr.
But I can tell you that with about 6 million entries in the result set,
and around 10 million values to facet on (facetting on a multi-value
field) -- I still get fine performance in my application. In the worst
case it can take m
The query's there in the data-config.xml. And the query's fetching as
expected from the database.
Thanks,
Kaushik
On Wed, Mar 16, 2011 at 9:21 PM, Gora Mohanty wrote:
> On Wed, Mar 16, 2011 at 2:29 PM, Stefan Matheis
> wrote:
> > Kaushik,
> >
> > I just remembered an ML post from a few weeks ago ..
On Wed, Mar 16, 2011 at 2:29 PM, Stefan Matheis
wrote:
> Kaushik,
>
> i just remembered an ML-Post few weeks ago .. same problem while
> importing geo-data
> (http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2254395.html)
> - the solution was:
>
>> CAST( CONCAT( lat, ','
On Wed, 2011-03-16 at 13:05 +0100, Dmitry Kan wrote:
> Hello guys. We are using sharded Solr 1.4 for heavy faceted search over the
> trigrams field with about 1 million entries in the result set and more
> than 100 million entries to facet on in the index. Currently the faceted
> search is v
Heh heh, you say "it worked correctly for me" yet you didn't actually have
multi-valued data ;-) Funny.
The only solution right now is to store the max and min in indexed
single-valued fields at index time. This is pretty straightforward to do.
Even if/when Solr supports sorting on a mult
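As a sketch, the index-time workaround described here might look like this in schema.xml, with the client computing the min and max before sending each document (field names and types are assumptions):

```xml
<!-- Multi-valued source field; cannot be sorted on directly -->
<field name="price" type="tfloat" indexed="true" stored="true"
       multiValued="true"/>
<!-- Single-valued companions, populated by the indexing client -->
<field name="price_min" type="tfloat" indexed="true" stored="false"/>
<field name="price_max" type="tfloat" indexed="true" stored="false"/>
```

Sorting then uses `sort=price_min asc` or `sort=price_max desc` as appropriate.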
Hi David,
It did seem to work correctly for me - we had it running on our production
indexes for some time and we never noticed any strange sorting behavior.
However, many of our multiValued fields are single valued for the majority
of documents in our index so we may not have noticed the incorre
Hi,
We are looking for someone who can provide online training for Ruby and
Rails.
I found your profile interesting, and if you are interested then please do
reply to this mail.
If not, then please do not consider this message as spam.
If you are interested then let me know -
How much
Hi everyone,
I have Solr running on one master and two slaves (load balanced) via
Solr 1.4.1 native replication.
If the load is low, both slaves replicate with around 100MB/s from master.
But when I use Solrmeter (100-400 queries/min) for load tests (over
the load balancer), the replication slow
On 16.03.2011 14:12, Erlend Garåsen wrote:
>
> We are unsure whether we should use SSL in order to communicate with
> our Solr server since it will increase the cost of creating http
> connections. If we go for SSL, is it advisable to do some additional
> settings for the HttpClient in order to r
On Wed, Mar 16, 2011 at 7:25 AM, rahul wrote:
> In our setup, we are having Solr index in one machine. And Solrj client part
> (java code) in another machine. Currently as you suggest, if it may be a
> 'not enough free RAM for the OS to cache' then whether I need to increase
> the RAM in the machi
We are unsure whether we should use SSL in order to communicate with our
Solr server since it will increase the cost of creating http
connections. If we go for SSL, is it advisable to do some additional
settings for the HttpClient in order to reduce the connection costs?
After reading the Co
What Solr are you using? That filter is not in pre-3.1 releases.
On Wednesday 16 March 2011 13:55:21 Brian Lamb wrote:
> Hi all,
>
> I am setting up multicore, and the schema.xml file in the core0 folder says
> not to use that one because it's very stripped down. So I copied the schema
> from example
Hi all,
I am setting up multicore, and the schema.xml file in the core0 folder says
not to use that one because it's very stripped down. So I copied the schema
from example/solr/conf but now I am getting a bunch of class not found
exceptions:
SEVERE: org.apache.solr.common.SolrException: Error loa
> When I use the Porter Stemmer in
> Solr, it appears to take words that are
> stemmed and replace them with the root word in the index.
> I verified this by looking at analysis.jsp.
>
> Is there an option to expand the stemmer to include all
> combinations of the
> word? Like include 's, ly, etc?
> does anyone have a successful setup (=pom.xml) that
> specifies the
> Hudson snapshot repository :
>
> https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/lastStableBuild/artifact/maven_artifacts
> (or that for trunk)
>
> and entries for any solr snapshot artifacts which are then
> foun
No, not setting those options in the query or schema.xml file.
I'll try what you said, however.
Thanks
Chris Hostetter-3 wrote:
>
> : We have a "D" field (string, indexed, stored, not required) that is
> returned
> : * when we search with the standard request handler
> : * when we search with
Hello guys. We are using sharded Solr 1.4 for heavy faceted search over the
trigrams field with about 1 million entries in the result set and more
than 100 million entries to facet on in the index. Currently the faceted
search is very slow, taking about 5 minutes per query. Would running on
Hmm, I'm not sure if it's supposed to stem that way, but if it doesn't and you
insist, then you might be able to abuse the PatternReplaceFilterFactory.
On Wednesday 16 March 2011 06:02:32 Bill Bell wrote:
> When I use the Porter Stemmer in Solr, it appears to take words that are
> stemmed and replac
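For what it's worth, an analyzer chain abusing PatternReplaceFilterFactory the way suggested here might look like this (the tokenizer choice and the pattern are a sketch, not tested against the OP's data):

```xml
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- Strip trailing possessive 's before stemming -->
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="'s$" replacement="" replace="all"/>
  <filter class="solr.PorterStemFilterFactory"/>
</analyzer>
```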
Yes, due to warmup queries Solr may run out of heap space at startup.
On Monday 14 March 2011 16:52:15 Ranma wrote:
> I am still stuck at the same point.
>
> Looking here and there I could read that the memory limit (heap space) may
> need to be increased to -Xms512M -Xmx512M when launching the
Hi,
Thanks for your information.
One simple question. Please clarify me.
In our setup, we are having Solr index in one machine. And Solrj client part
(java code) in another machine. Currently as you suggest, if it may be a
'not enough free RAM for the OS to cache' then whether I need to increase
Hello,
I have a problem with the SOLR spellchecker component. This is the problem:
Searching term = Company: American today, City: London (two fields:
copyfield to one: Spell )
User search = American tuday, Londen
What I want is a collation of: American today london. SOLR returns with the
q par
Hi all,
does anyone have a successful setup (=pom.xml) that specifies the
Hudson snapshot repository :
https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/lastStableBuild/artifact/maven_artifacts
(or that for trunk)
and entries for any solr snapshot artifacts which are then found by
Mave
Hi Upayavira,
I use the term constraint to define additional options for a user to refine
a search with under each facet. If we could think
of them as sub-facets, then maybe this would explain it in slightly better terms.
I didn't add additional document source types in my original email but if I
kn
Kaushik,
I just remembered an ML post from a few weeks ago .. same problem while
importing geo-data
(http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2254395.html)
- the solution was:
> CAST( CONCAT( lat, ',', lng ) AS CHAR )
at that time I searched a little bit for the reason
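In context, the CAST workaround sits in the DIH entity query roughly like this (the table and column names are assumptions):

```sql
-- CAST(... AS CHAR) makes the MySQL JDBC driver hand DIH a string
-- rather than a byte array for the concatenated coordinates.
SELECT id, name,
       CAST( CONCAT( lat, ',', lng ) AS CHAR ) AS location
FROM places;
```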
AWESOME, thanks for your time!
Regards
James
On Wed, Mar 16, 2011 at 6:14 PM, David Smiley (@MITRE.org) <
dsmi...@mitre.org> wrote:
> Hi. Where did you find such an obtuse example?
>
> Recently, Solr supports sorting by function query. One such function is
> named "query" which takes a query
Hello list,
the dismax query type has one feature that is particularly nice... the ability
to expand a query's tokens across many fields. This is really useful for such
jobs as "prefer a match in title, prefer exact matches over stemmed matches
over phonetic matches".
My problem: I wish to do