Re: Solrj performance bottleneck

2011-03-16 Thread rahul
Thanks for all your info. I will try increasing the RAM and check it. Thanks.

What request handlers to use for query strings in Chinese or Japanese?

2011-03-16 Thread Andy
Hi, For my Solr server, some of the query strings will be in Asian languages such as Chinese or Japanese. For such query strings, would the Standard or Dismax request handler work? My understanding is that both the Standard and the Dismax handler tokenize the query string by whitespace. And t

Parent-child options

2011-03-16 Thread Otis Gospodnetic
Hi, The dreaded parent-child without denormalization question. What are one's options for the following example: parent: shoes 3 children. each with 2 attributes/fields: color and size * color: red black orange * size: 10 11 12 The goal is to be able to search for: 1) color:red AND size:10 a

Re: Solrj performance bottleneck

2011-03-16 Thread Bill Bell
Try giving Solr about 1.5 GB by setting the Java params. Solr is usually CPU bound, so medium or large instances are good. Bill Bell Sent from mobile On Mar 16, 2011, at 10:56 AM, Asharudeen wrote: > Hi > > Thanks for your info. > > Currently my index size is around 4GB. Normally in small instances

Re: Faceting help

2011-03-16 Thread Chris Hostetter
: I'm not sure if I get what you are trying to achieve. What do you mean : by "constraint"? "constraint" is fairly standard terminology when referring to facets, it's used extensively in our facet docs and is even listed on Solr's glossary page (although not specifically in the context of faceti

Re: Replication slows down massively during high load

2011-03-16 Thread Shawn Heisey
On 3/16/2011 6:09 PM, Shawn Heisey wrote: du -hc *x I was looking over the files in an index and I think it needs to include more of the files for a true picture of RAM needs. I get 5.9GB running the following command against a 16GB index. It excludes *.fdt (stored field data) and *.tvf (t
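The sizing idea in this message (add up the index files, leaving out the stored-field and term-vector files) can be sketched as below. This is only an illustration under assumptions: the function name and the default exclusion list are made up here, the exact command used in the message is not shown in this digest, and the result is a rough estimate rather than a measurement of real RAM needs.

```python
import os

# Extensions left out of the estimate, following the message above:
# .fdt holds stored field data, .tvf holds term vector data.
EXCLUDED_EXTENSIONS = (".fdt", ".tvf")


def estimate_cache_bytes(index_dir, excluded=EXCLUDED_EXTENSIONS):
    """Sum the sizes of the files in index_dir, skipping excluded extensions.

    Hypothetical helper: it only adds up file sizes and does not know
    which parts of the index are actually read at query time.
    """
    total = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        # str.endswith accepts a tuple of suffixes.
        if os.path.isfile(path) and not name.endswith(excluded):
            total += os.path.getsize(path)
    return total
```

Calling `estimate_cache_bytes("/path/to/index")` returns a byte count that can be compared with the memory left over for the OS disk cache.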

Re: Replication slows down massively during high load

2011-03-16 Thread Shawn Heisey
On 3/16/2011 7:56 AM, Vadim Kisselmann wrote: If the load is low, both slaves replicate with around 100MB/s from master. But when I use Solrmeter (100-400 queries/min) for load tests (over the load balancer), the replication slows down to an unacceptable speed, around 100KB/s (at least that's wh

Re: Sorting on multiValued fields via function query

2011-03-16 Thread Bill Bell
I agree with this and it is even needed for function sorting for multivalued fields. See geohash patch for one way to deal with multivalued fields on distance. Not ideal but it works efficiently. Bill Bell Sent from mobile On Mar 16, 2011, at 4:08 PM, Jonathan Rochkind wrote: > Huh, so lucene

Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-16 Thread Koji Sekiguchi
(11/03/17 3:53), Jonathan Rochkind wrote: Interesting, any documentation on the PathTokenizer anywhere? It is PathHierarchyTokenizer: https://hudson.apache.org/hudson/job/Solr-trunk/javadoc/org/apache/solr/analysis/PathHierarchyTokenizerFactory.html Koji -- http://www.rondhuit.com/en/
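For readers wondering what a path hierarchy tokenizer produces, here is a small Python imitation of the idea: one token per ancestor path. It is a sketch for simple inputs only, not the Lucene implementation; the function name is invented for illustration and options such as replacement characters or skipping levels are not modelled.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Return one token per level of the path, shortest first.

    Illustrative stand-in for the tokenizer's default behaviour;
    empty tokens (from a leading delimiter) are dropped.
    """
    parts = path.split(delimiter)
    prefixes = (delimiter.join(parts[:i]) for i in range(1, len(parts) + 1))
    return [prefix for prefix in prefixes if prefix]
```

Indexing a category such as `books/fiction/scifi` this way yields a token for every level, so a facet or filter on `books/fiction` also matches documents filed deeper in the tree.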

Re: Sorting on multiValued fields via function query

2011-03-16 Thread Jonathan Rochkind
Huh, so lucene is actually doing what has been commonly described as impossible in Solr? But is Solr trunk, as the OP person seemed to report, still not aware of this and raising on a sort on multi-valued field, instead of just saying, okay, we'll just pass it to lucene anyway and go with luce

Re: Sorting on multiValued fields via function query

2011-03-16 Thread Yonik Seeley
On Wed, Mar 16, 2011 at 5:46 PM, Chris Hostetter wrote: > > : However, many of our multiValued fields are single valued for the majority > : of documents in our index so we may not have noticed the incorrect sorting > : behaviors. > > that would make sense ... if you use a multiValued field as if

Re: Version Incompatibility(Invalid version (expected 2, but 1) or the data in not in 'javabin' format)

2011-03-16 Thread Ahmet Arslan
> >           I am using Solr 4.0 api > > to search from index (made using solr1.4 version). I > am > > getting error Invalid version (expected 2, but 1) or > the > > data in not in 'javabin' format. Can anyone help me to > fix > > problem. > > You need to use solrj version 1.4 which is compatible

Re: Sorting on multiValued fields via function query

2011-03-16 Thread Chris Hostetter
: However, many of our multiValued fields are single valued for the majority : of documents in our index so we may not have noticed the incorrect sorting : behaviors. that would make sense ... if you use a multiValued field as if it were single valued, you would never encounter a problem. if yo

dismax parser, parens, what do they do exactly

2011-03-16 Thread Jonathan Rochkind
It looks like Dismax query parser can somehow handle parens, used for applying, for instance, + or - to a group, distributing it. But I'm not sure what effect they have on the overall query. For instance, if I give dismax this: book (dog +( cat -frog)) debugQuery shows: +((DisjunctionMaxQuery(

Re: Error during auto-warming of key

2011-03-16 Thread Markus Jelsma
Actually, i dug in the logs again and surprise, it sometimes still occurs with `random` queries. Here are a few snippets from the error log. Somewhere during that time there might be OOM-errors but older logs are unfortunately rotated away. 2011-03-14 00:25:32,152 ERROR [solr.search.SolrCac

Re: FunctionQueries and FieldCache and OOM

2011-03-16 Thread Markus Jelsma
Hi, > FWIW: it sounds like your problem wasn't actually related to your > fieldCache, but probably instead it was because of how big your > queryResultCache is It's the same cluster as in the other thread. I decided a long time ago that documentCache and queryResultCache wouldn't be a good

Re: i don't get why my index didn't grow more...

2011-03-16 Thread Yonik Seeley
On Wed, Mar 16, 2011 at 5:10 PM, Robert Petersen wrote: > OK I have a 30 gb index where there are lots of sparsely populated int > fields and then one title field and one catchall field with title and > everything else we want as keywords, the catchall field. I figure it is > the biggest field in

Re: FunctionQueries and FieldCache and OOM

2011-03-16 Thread Chris Hostetter
: Alright, i can now confirm the issue has been resolved by reducing precision. : The garbage collector on nodes without reduced precision has a real hard time : keeping up and clearly shows a very different graph of heap consumption. : : Consider using MINUTE, HOUR or DAY as precision in case
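As an illustration of what "reducing precision" to MINUTE, HOUR or DAY means for a timestamp, the sketch below rounds a datetime down to the chosen unit. It assumes the precision in question is that of a time value used in the query (the message is truncated here, so this is a guess at the context), and the helper name is invented. Coarser values repeat across requests, so fewer distinct values end up in caches.

```python
from datetime import datetime

# Which datetime components to zero out for each precision.
_RESET = {
    "MINUTE": {"second": 0, "microsecond": 0},
    "HOUR": {"minute": 0, "second": 0, "microsecond": 0},
    "DAY": {"hour": 0, "minute": 0, "second": 0, "microsecond": 0},
}


def round_down(ts, precision):
    """Round a datetime down to MINUTE, HOUR or DAY (hypothetical helper)."""
    return ts.replace(**_RESET[precision])
```

With HOUR precision, every request issued within the same hour sees the same rounded value instead of a new millisecond-level one.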

i don't get why my index didn't grow more...

2011-03-16 Thread Robert Petersen
OK I have a 30 gb index where there are lots of sparsely populated int fields and then one title field and one catchall field with title and everything else we want as keywords, the catchall field. I figure it is the biggest field in our documents which as I mentioned is otherwise composed of a var

Re: faceting over ngrams

2011-03-16 Thread Dmitry Kan
Hi Yonik, I have run the queries against a single-index Solr with only 16M documents. After attaching facet.method=fc the results seemed to come faster (first two queries below), but still not fast enough. Here are the fieldValueCache stats: (facet.limit=100&facet.mincount=5&facet.method=fc, 5

Re: Error during auto-warming of key

2011-03-16 Thread Markus Jelsma
> that is odd... > > can you let us know exactly what version of Solr/Lucene you are using (if > it's not an official release, can you let us know exactly what the version > details on the admin info page say, i'm curious about the svn revision) Of course, that's the stable 1.4.1. > > can you al

Re: 'Registering' a query / Percolation

2011-03-16 Thread Chris Hostetter
: I.E. Instruct Solr that you are interested in documents that match a : given query and then have Solr notify you (through whatever callback : mechanism is specified) if and when a document appears that matches the : query. : : We are planning on writing some software that will effectively grind

Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-16 Thread François Schiettecatte
Lewis Quick response, I am currently using Tomcat 7.0.8 with solr (with no issues), I will upgrade to 7.0.11 tonight and see if I run into the same issues. Stay tuned as they say. Cheers François On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote: > Hello list, > > Is anyone running S

Re: Error during auto-warming of key

2011-03-16 Thread Chris Hostetter
: : Yesterday's error log contains something peculiar: : : ERROR [solr.search.SolrCache] - [pool-29-thread-1] - : Error during auto- : warming of key:+*:* : (1.0/(7.71E-8*float(ms(const(1298682616680),date(sort_date)))+1.0))^20.0:java.lang.NullPointerException : at org.apache.lucene.u

Re: faceting over ngrams

2011-03-16 Thread Yonik Seeley
On Wed, Mar 16, 2011 at 8:05 AM, Dmitry Kan wrote: > Hello guys. We are using shard'ed solr 1.4 for heavy faceted search over the > trigrams field with about 1 million of entries in the result set and more > than 100 million of entries to facet on in the index. Currently the faceted > search is ve

Re: faceting over ngrams

2011-03-16 Thread Dmitry Kan
Hi Toke, Thanks a lot for trying this out. I have to mention, that the facetted search hits only one specific shard by design, so in general the time to query a shard directly and through the "proxy" SOLR should be comparable. Would it be feasible for you to make that field ngram'ed or is it too

Re: SOLR DIH importing MySQL "text" column as a BLOB

2011-03-16 Thread Jayendra Patil
Hi Kaushik, If the field is being treated as blobs, you can try using the FieldStreamDataSource mapping. This handles the blob objects to extract contents from it. This feature is available only after Solr 3.1, I suppose. http://lucene.apache.org/solr/api/org/apache/solr/handler/dataimport/FieldS

RE: hierarchical faceting, SOLR-792 - confused on config

2011-03-16 Thread McGibbney, Lewis John
Hi Erik, I have been reading about the progression of SOLR-792 into pivot faceting, however can you expand to comment on where it is committed. Are you referring to trunk? The reason I am asking is that I have been using 1.4.1 for some time now and have been thinking of upgrading to trunk... or

Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-16 Thread Jonathan Rochkind
Interesting, any documentation on the PathTokenizer anywhere? Or just have to find and look at the source? That's something I hadn't known about, which may be useful to some stuff I've been working on depending on how it works. If nothing else, in the meantime, I'm going to take that exact mes

Re: faceting over ngrams

2011-03-16 Thread Jonathan Rochkind
Oh, doc count over 100M is a very different thing than doc count about 1M. In your original message you said "I tried creating an index with 1M documents, each with 100 unique terms in a field." If you instead have 100M documents, your use is a couple orders of magnitude larger than mine. It a

Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-16 Thread McGibbney, Lewis John
Hello list, Is anyone running Solr (in my case 1.4.1) on above Tomcat dist? In the past I have been using guidance in accordance with http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat but having upgraded from Tomcat 7.0.8 to 7.0.11 I am having problems E.g. INFO: Deplo

Re: faceting over ngrams

2011-03-16 Thread Dmitry Kan
Hi Jonathan, Thanks for sharing useful bits. Each shard has 16G of heap. Unless I do something fundamentally wrong in the SOLR configuration, I have to admit, that counting ngrams up to trigrams across whole set of shard's documents is pretty intensive task, as each ngram can occur anywhere in the

RE: Different options for autocomplete/autosuggestion

2011-03-16 Thread Robert Petersen
I take raw user search term data, 'collapse' it into a form where I have only unique terms, per store, ordered by frequency of searches over some time period. The suggestions are then grouped and presented with store breakouts. That sounds kind of like what this page is talking about here, but I
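The "collapse" step described in this message can be sketched roughly as follows. The input shape (pairs of store and raw term) and the normalisation (lowercasing, squeezing whitespace) are assumptions made for the example; the poster's actual pipeline is not shown.

```python
from collections import Counter, defaultdict


def collapse_search_terms(log_entries):
    """Collapse raw (store, term) pairs into unique terms per store.

    Returns {store: [term, ...]} with terms ordered by how often
    they were searched, most frequent first. Illustrative only.
    """
    counts = defaultdict(Counter)
    for store, raw_term in log_entries:
        # Assumed normalisation: lowercase and collapse whitespace.
        term = " ".join(raw_term.lower().split())
        if term:
            counts[store][term] += 1
    return {
        store: [term for term, _ in counter.most_common()]
        for store, counter in counts.items()
    }
```

The per-store lists can then be loaded into whatever suggestion index is used, which keeps the store breakout available at display time.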

Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-16 Thread Erik Hatcher
Sorry, I missed the original mail on this thread. I put together that hierarchical faceting wiki page a couple of years ago when helping a customer evaluate SOLR-64 vs. SOLR-792 vs. other approaches. Since then, SOLR-792 morphed and is committed as pivot faceting. SOLR-64 spawned a PathToke

Error: "Unbuffered entity enclosing request can not be repeated."

2011-03-16 Thread André Santos
Hi all! I created a SolrJ project to run test Solr. So, I am inserting batches of 7000 records, each with 200 attributes which adds up approximately to 13.77 Mb per batch. I am measuring the time it takes to add and commit each set of 7000 records to an instantiation of CommonsHttpSolrServer. Eac
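The measurement loop described here (send fixed-size batches and time each add-and-commit) can be sketched in a few lines. This is a generic illustration, not SolrJ code: `send_batch` is a hypothetical callable standing in for whatever adds and commits one batch.

```python
import time
from itertools import islice


def batches(records, batch_size=7000):
    """Yield successive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


def time_batches(records, send_batch, batch_size=7000):
    """Call send_batch on each chunk; return (chunk size, seconds) pairs.

    send_batch is a placeholder for the real add-and-commit call.
    """
    timings = []
    for chunk in batches(records, batch_size):
        start = time.perf_counter()
        send_batch(chunk)
        timings.append((len(chunk), time.perf_counter() - start))
    return timings
```

Recording the size next to each duration makes it easy to see whether a slow batch was simply a larger or final partial one.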

Re: Solrj performance bottleneck

2011-03-16 Thread Yonik Seeley
On Wed, Mar 16, 2011 at 12:56 PM, Asharudeen wrote: > Currently my index size is around 4GB. Normally in small instances total > available memory will be 1.6GB. In my setup, I allocated around 1GB as a > heap size for tomcat. Hence I believe, remaining 600 MB will be used for OS > cache. Actually

Re: Solrj performance bottleneck

2011-03-16 Thread Asharudeen
Hi Thanks for your info. Currently my index size is around 4GB. Normally in small instances total available memory will be 1.6GB. In my setup, I allocated around 1GB as a heap size for tomcat. Hence I believe, remaining 600 MB will be used for OS cache. I believe, I need to migrate my Solr insta
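The arithmetic in this message can be written out as a tiny sketch. The helper names are made up, sizes are taken as round decimal megabytes, and memory used by the OS itself and by other processes is ignored, so the real headroom would be smaller.

```python
def os_cache_headroom_mb(total_ram_mb, jvm_heap_mb):
    """Memory left for the OS disk cache after the JVM heap is reserved."""
    return total_ram_mb - jvm_heap_mb


def cacheable_fraction(index_size_mb, cache_mb):
    """Share of the index that could sit in the OS cache at once."""
    return min(1.0, cache_mb / index_size_mb)


# Figures from the message: 1.6 GB RAM, 1 GB heap, 4 GB index.
headroom = os_cache_headroom_mb(1600, 1000)
fraction = cacheable_fraction(4000, headroom)
```

With these figures the headroom is the 600 MB mentioned in the message, a small share of a 4 GB index, which fits the advice elsewhere in the thread to add RAM.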

RE: hierarchical faceting, SOLR-792 - confused on config

2011-03-16 Thread McGibbney, Lewis John
Hi, This is also where I am having problems. I have not been able to understand very much on the wiki. I do not understand how to configure the faceting we are referring to. Although I know very little about this, I can't help but think that the wiki is quite clearly inaccurate in some way! Any

Re: SOLR DIH importing MySQL "text" column as a BLOB

2011-03-16 Thread Gora Mohanty
On Wed, Mar 16, 2011 at 9:50 PM, Kaushik Chakraborty wrote: > The query's there in the data-config.xml. And the query's fetching as > expected from the database. [...] Doh! Sorry, had missed that somehow. So, the relevant part is: SELECT ... p.message as solr_post_message, What is the field typ

Re: faceting over ngrams

2011-03-16 Thread Jonathan Rochkind
Ah, wait, you're doing sharding? Yeah, I am NOT doing sharding, so that could explain our different experiences. It seems like sharding definitely has trade-offs, makes some things faster and other things slower. So far I've managed to avoid it, in the interest of keeping things simpler and e

Re: faceting over ngrams

2011-03-16 Thread Jonathan Rochkind
I don't know anything about trying to use map-reduce with Solr. But I can tell you that with about 6 million entries in the result set, and around 10 million values to facet on (facetting on a multi-value field) -- I still get fine performance in my application. In the worst case it can take m

Re: SOLR DIH importing MySQL "text" column as a BLOB

2011-03-16 Thread Kaushik Chakraborty
The query's there in the data-config.xml. And the query's fetching as expected from the database. Thanks, Kaushik On Wed, Mar 16, 2011 at 9:21 PM, Gora Mohanty wrote: > On Wed, Mar 16, 2011 at 2:29 PM, Stefan Matheis > wrote: > > Kaushik, > > > > i just remembered an ML-Post few weeks ago ..

Re: SOLR DIH importing MySQL "text" column as a BLOB

2011-03-16 Thread Gora Mohanty
On Wed, Mar 16, 2011 at 2:29 PM, Stefan Matheis wrote: > Kaushik, > > i just remembered an ML-Post few weeks ago .. same problem while > importing geo-data > (http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2254395.html) > - the solution was: > >> CAST( CONCAT( lat, ','

Re: faceting over ngrams

2011-03-16 Thread Toke Eskildsen
On Wed, 2011-03-16 at 13:05 +0100, Dmitry Kan wrote: > Hello guys. We are using shard'ed solr 1.4 for heavy faceted search over the > trigrams field with about 1 million of entries in the result set and more > than 100 million of entries to facet on in the index. Currently the faceted > search is v

Re: Sorting on multiValued fields via function query

2011-03-16 Thread Smiley, David W.
Heh heh, you say "it worked correctly for me" yet you didn't actually have multi-valued data ;-) Funny. The only solution right now is to store the max and min into indexed single-valued fields at index time. This is pretty straight-forward to do. Even if/when Solr supports sorting on a mult
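The workaround suggested here, storing the minimum and maximum of a multi-valued field in single-valued fields at index time, might look like this on the client side. The `_min` and `_max` suffixes and the dict-based document are assumptions for the example; the schema would also need matching single-valued, indexed fields.

```python
def add_sort_fields(doc, field):
    """Return a copy of doc with <field>_min and <field>_max added.

    Hypothetical helper: doc is a plain dict and doc[field] a list.
    Documents with no values for the field are returned unchanged.
    """
    values = doc.get(field) or []
    enriched = dict(doc)
    if values:
        enriched[field + "_min"] = min(values)
        enriched[field + "_max"] = max(values)
    return enriched
```

Sorting ascending on the `_min` field and descending on the `_max` field then gives a well-defined order even when documents carry several values.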

Re: Sorting on multiValued fields via function query

2011-03-16 Thread harish.agarwal
Hi David, It did seem to work correctly for me - we had it running on our production indexes for some time and we never noticed any strange sorting behavior. However, many of our multiValued fields are single valued for the majority of documents in our index so we may not have noticed the incorre

Online training for ruby and rails

2011-03-16 Thread hi . sinie
Hi, We are looking for some one who can provide online training for ruby and rails I found your profile interesting and If you are Interested then please do reply me for this mail. If not then please do not consider this message as a spam. If you are Interested then let me know - How much

Replication slows down massively during high load

2011-03-16 Thread Vadim Kisselmann
Hi everyone, I have Solr running on one master and two slaves (load balanced) via Solr 1.4.1 native replication. If the load is low, both slaves replicate with around 100MB/s from master. But when I use Solrmeter (100-400 queries/min) for load tests (over the load balancer), the replication slow

Re: SSL and connection pooling

2011-03-16 Thread Em
On 16.03.2011 14:12, Erlend Garåsen wrote: > > We are unsure whether we should use SSL in order to communicate with > our Solr server since it will increase the cost of creating http > connections. If we go for SSL, is it advisable to do some additional > settings for the HttpClient in order to r

Re: Solrj performance bottleneck

2011-03-16 Thread Yonik Seeley
On Wed, Mar 16, 2011 at 7:25 AM, rahul wrote: > In our setup, we are having Solr index in one machine. And Solrj client part > (java code) in another machine. Currently as you suggest, if it may be a > 'not enough free RAM for the OS to cache' then whether I need to increase > the RAM in the machi

SSL and connection pooling

2011-03-16 Thread Erlend Garåsen
We are unsure whether we should use SSL in order to communicate with our Solr server since it will increase the cost of creating http connections. If we go for SSL, is it advisable to do some additional settings for the HttpClient in order to reduce the connection costs? After reading the Co

Re: Multicore

2011-03-16 Thread Markus Jelsma
What Solr are you using? That filter is not in pre-3.1 releases. On Wednesday 16 March 2011 13:55:21 Brian Lamb wrote: > Hi all, > > I am setting up multicore and the schema.xml file in the core0 folder says > not to use that one because it's very stripped down. So I copied the schema > from example

Multicore

2011-03-16 Thread Brian Lamb
Hi all, I am setting up multicore and the schema.xml file in the core0 folder says not to use that one because it's very stripped down. So I copied the schema from example/solr/conf but now I am getting a bunch of class not found exceptions: SEVERE: org.apache.solr.common.SolrException: Error loa

Re: Stemming question

2011-03-16 Thread Ahmet Arslan
> When I use the Porter Stemmer in > Solr, it appears to take words that are > stemmed and replace them with the root word in the index. > I verified this by looking at analysis.jsp. > > Is there an option to expand the stemmer to include all > combinations of the > word? Like include 's, ly, etc?

Re: Maven : Specifying SNAPSHOT Artifacts and the Hudson Repository

2011-03-16 Thread Ahmet Arslan
> does anyone have a successful setup (=pom.xml) that > specifies the > Hudson snapshot repository : > > https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/lastStableBuild/artifact/maven_artifacts > (or that for trunk) > > and entries for any solr snapshot artifacts which are then > foun

Re: Dismax: field not returned unless in sort clause?

2011-03-16 Thread mrw
No, not setting those options in the query or schema.xml file. I'll try what you said, however. Thanks Chris Hostetter-3 wrote: > > : We have a "D" field (string, indexed, stored, not required) that is > returned > : * when we search with the standard request handler > : * when we search with

faceting over ngrams

2011-03-16 Thread Dmitry Kan
Hello guys. We are using shard'ed solr 1.4 for heavy faceted search over the trigrams field with about 1 million of entries in the result set and more than 100 million of entries to facet on in the index. Currently the faceted search is very slow, taking about 5 minutes per query. Would running on

Re: Stemming question

2011-03-16 Thread Markus Jelsma
Hmm, i'm not sure if it's supposed to stem that way but if it doesn't and you insist then you might be able to abuse the PatternReplaceFilterFactory. On Wednesday 16 March 2011 06:02:32 Bill Bell wrote: > When I use the Porter Stemmer in Solr, it appears to take words that are > stemmed and replac

Re: Solr admin page timed out and index updating issues

2011-03-16 Thread Markus Jelsma
Yes, due to warmup queries Solr may run out of heap space at start up. On Monday 14 March 2011 16:52:15 Ranma wrote: > I am still stuck at the same point. > > Looking here and there I could read that the memory limit (heap space) may > need to be increased to -Xms512M -Xmx512M when launching the

Re: Solrj performance bottleneck

2011-03-16 Thread rahul
Hi, Thanks for your information. One simple question. Please clarify me. In our setup, we are having Solr index in one machine. And Solrj client part (java code) in another machine. Currently as you suggest, if it may be a 'not enough free RAM for the OS to cache' then whether I need to increase

Multiple spellchecker

2011-03-16 Thread royr
Hello, I have a problem with the SOLR spellchecker component. This is the problem: Searching term = Company: American today, City: London (two fields: copyfield to one: Spell ) User search = American tuday, Londen What i want is a collation of: American today london. SOLR returns with the q par

Maven : Specifying SNAPSHOT Artifacts and the Hudson Repository

2011-03-16 Thread Chantal Ackermann
Hi all, does anyone have a successful setup (=pom.xml) that specifies the Hudson snapshot repository : https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/lastStableBuild/artifact/maven_artifacts (or that for trunk) and entries for any solr snapshot artifacts which are then found by Mave

RE: Faceting help

2011-03-16 Thread McGibbney, Lewis John
Hi Upayavira, I use the term constraint to define additional options for a user to refine search with under each facet. If we could think of them as sub facet's then maybe this would explain in slightly better terms. I didn't add additional document source types in my original email but if I kn

Re: SOLR DIH importing MySQL "text" column as a BLOB

2011-03-16 Thread Stefan Matheis
Kaushik, i just remembered an ML-Post few weeks ago .. same problem while importing geo-data (http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2254395.html) - the solution was: > CAST( CONCAT( lat, ',', lng ) AS CHAR ) at that time i search a little bit for the reason

Re: noobie question: sorting

2011-03-16 Thread James Lin
AWESOME, thanks for your time! Regards James On Wed, Mar 16, 2011 at 6:14 PM, David Smiley (@MITRE.org) < dsmi...@mitre.org> wrote: > Hi. Where did you find such an obtuse example? > > Recently, Solr supports sorting by function query. One such function is > named "query" which takes a query

query expansion à la dismax

2011-03-16 Thread Paul Libbrecht
Hello list, the dismax query type has one feature that is particularly nice... the ability to expand tokens to a query to many fields. This is really useful to do such jobs as "prefer a match in title, prefer exact matches over stemmed matches over phonetic matches". My problem: I wish to do
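To make the dismax feature mentioned here concrete, the sketch below builds request parameters that spread one user query over several fields with different boosts. `defType` and `qf` are standard Solr parameters; the field names (`title_exact`, `title_stemmed`, `title_phonetic`) and boost values are invented for illustration and would have to exist in the schema.

```python
from urllib.parse import urlencode


def dismax_params(user_query, field_boosts):
    """Build dismax request parameters from a {field: boost} mapping.

    Field names are hypothetical; qf lists each field as name^boost.
    """
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"defType": "dismax", "q": user_query, "qf": qf}


params = dismax_params(
    "solr book",
    {"title_exact": 8, "title_stemmed": 4, "title_phonetic": 2},
)
query_string = urlencode(params)  # append to the select URL of the Solr server
```

Giving the exact-match field the largest boost and the phonetic field the smallest is one way to express "prefer exact matches over stemmed matches over phonetic matches".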