Re: summing facets on a specific field

2012-02-06 Thread Johannes Goll
I meant stats=true&stats.field=price&stats.facet=category 2012/2/6 Johannes Goll : > you can use the StatsComponent > > http://wiki.apache.org/solr/StatsComponent > > with stats=true&stats.price=category&stats.facet=category > > and pull the sum fields from the resulting stats facets. > > Johanne
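The corrected parameters above can be put together as a request URL. This is a minimal sketch only: the host, port, and `/select` path are assumptions about a default local install, and `stats_facet_query` is a hypothetical helper name, not part of Solr or of the original message. The `stats`, `stats.field`, and `stats.facet` parameters are the ones named in the correction.

```python
from urllib.parse import urlencode

def stats_facet_query(base_url, stat_field, facet_field, q="*:*"):
    """Build a StatsComponent request that computes statistics (including
    the sum) of stat_field, broken down by each value of facet_field."""
    params = [
        ("q", q),
        ("rows", 0),                   # only the stats section is needed, not documents
        ("stats", "true"),
        ("stats.field", stat_field),   # field whose values are summed
        ("stats.facet", facet_field),  # field to break the statistics down by
    ]
    return base_url + "?" + urlencode(params)

# Assumed base URL for a local Solr instance; adjust to your setup.
url = stats_facet_query("http://localhost:8983/solr/select", "price", "category")
print(url)
```

The per-category sums would then be read from the stats section of the response, as the message describes.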

Re: summing facets on a specific field

2012-02-06 Thread Johannes Goll
you can use the StatsComponent http://wiki.apache.org/solr/StatsComponent with stats=true&stats.price=category&stats.facet=category and pull the sum fields from the resulting stats facets. Johannes 2012/2/5 Paul Kapla : > Hi everyone, > I'm pretty new to solr and I'm not sure if this can even

Re: Performance degradation with distributed search

2012-02-06 Thread XJ
Yonik, thanks for your explanation. I've created a ticket here https://issues.apache.org/jira/browse/SOLR-3104 On Mon, Feb 6, 2012 at 4:28 PM, Yonik Seeley wrote: > On Mon, Feb 6, 2012 at 6:16 PM, XJ wrote: > > Sorry I didn't make this clear. Yeah we use dismax in main query, as > well as > > in

Re: multiple values encountered for non multiValued field type:[text/html, text, html]

2012-02-06 Thread William_Xu
Thank you for your reply, it is very helpful to me!

Re: Solr with Scala

2012-02-06 Thread Tommy Chheng
I have created a Solr plugin using Scala. It works without problems. I wouldn't go as far as using Scala to improve Solr performance, but you can definitely use Scala to add missing functionality or custom query parsing. Just build a jar using maven/sbt and put it in Solr's lib directory. On Sun,

spell check - preserve case in suggestions

2012-02-06 Thread Satish Kumar
Hi, Say that the field name has the following terms: Giants Manning New York When someone searches for "gants" or "Gants", I need the suggestion to be returned as "Giants" (capital G - same case as in the content that was indexed). Using lowercase filter in both index and query analyzers I get

Re: Performance degradation with distributed search

2012-02-06 Thread Yonik Seeley
On Mon, Feb 6, 2012 at 5:53 PM, XJ wrote: > Yes as I mentioned in previous email, we do dismax queries(with different mm > values), solr function queries (map, etc) math calculations (sum, product, > log). I understand those are expensive. But worst case it should only double > the time not going

Re: Performance degradation with distributed search

2012-02-06 Thread XJ
Yes, as I mentioned in my previous email, we do dismax queries (with different mm values), Solr function queries (map, etc.), and math calculations (sum, product, log). I understand those are expensive. But in the worst case it should only double the time, not go from 200ms to 1200ms, right? XJ On Mon, Feb 6, 201

Re: Performance degradation with distributed search

2012-02-06 Thread Yonik Seeley
On Mon, Feb 6, 2012 at 5:35 PM, XJ wrote: > hm.. just looked at the log only 112 matched, and start=0, rows=30 Are any of the sort criteria sort-by-function with anything complex (like an embedded relevance query)? -Yonik lucidimagination.com > > On Mon, Feb 6, 2012 at 1:33 PM, Yonik Seeley >

Re: Performance degradation with distributed search

2012-02-06 Thread XJ
hm.. just looked at the log only 112 matched, and start=0, rows=30 On Mon, Feb 6, 2012 at 1:33 PM, Yonik Seeley wrote: > On Mon, Feb 6, 2012 at 3:30 PM, oleole wrote: > > Thanks for your reply. Yeah that's the first thing I tried (adding > fsv=true > > to the query) and it surprised me too. Coul

Re: Performance degradation with distributed search

2012-02-06 Thread Yonik Seeley
On Mon, Feb 6, 2012 at 3:30 PM, oleole wrote: > Thanks for your reply. Yeah that's the first thing I tried (adding fsv=true > to the query) and it surprised me too. Could it due to we're using many > complex sortings (20 sortings with dismax, and, or...). Any thing it can be > optimized? Looks lik

Re: Performance degradation with distributed search

2012-02-06 Thread XJ
BTW we just upgraded to Solr 3.5 from Solr 1.4. That's why we want to explore the improvements/new features of distributed search. On Mon, Feb 6, 2012 at 12:30 PM, oleole wrote: > Yonik, > > Thanks for your reply. Yeah that's the first thing I tried (adding fsv=true > to the query) and it surpris

Re: Performance degradation with distributed search

2012-02-06 Thread oleole
Yonik, Thanks for your reply. Yeah, that's the first thing I tried (adding fsv=true to the query) and it surprised me too. Could it be due to our using many complex sorts (20 sorts with dismax, and, or...)? Is there anything that can be optimized? Looks like it's calculated twice in Solr? XJ -- View t

solrcore.properties

2012-02-06 Thread Walter Underwood
Looking at SOLR-1335 and the wiki, I'm not quite sure of the final behavior for this. These properties are per-core, and not visible in other cores, right? Are variables substituted in solr.xml, so I can swap in different properties files for dev, test, and prod? Like this: If that does not

Re: Parallel indexing in Solr

2012-02-06 Thread Erick Erickson
. I've had recurring discussions with "executive level folks" that no matter how many VMs you host on a machine, and no matter how big that machine is, there really, truly, *is* some hardware underlying it all that really, truly, *does* have some limits. And adding more VMs doesn't somehow get aro

Re: SolrCell maximum file size

2012-02-06 Thread Augusto Camarotti
Thanks for the tips Erick, I'm really talking about 2.5GB files full of data to be indexed. Like .csv files or .xls, .ods and so on. I guess I will try greatly increasing the memory the JVM is able to use. Regards, Augusto >>> Erick Erickson 1/27/2012 1:22 pm >>> Hmmm, I'd go c

Commit call - ReadTimeoutException -> usage scenario for big update requests and the ioexception case

2012-02-06 Thread Torsten Krah
Hi, I wonder if it is possible to commit data to Solr without having to catch SocketReadTimeout exceptions. I am calling commit(false, false) using a streaming server instance, but I still have to wait > 30 seconds and catch the timeout from the HTTP method. It does not matter if it's 30 or 60, it wil

Re: Parallel indexing in Solr

2012-02-06 Thread Per Steffensen
Sami Siren wrote: On Mon, Feb 6, 2012 at 2:53 PM, Per Steffensen wrote: Actually right now, I am trying to find out what my bottleneck is. The setup is more complex than I would bother you with, but basically I have servers with 80-90% IO-wait and only 5-10% "real CPU usage". It might not

Re: Parallel indexing in Solr

2012-02-06 Thread Per Steffensen
So SolrJ with CommonsHttpSolrServer will not support handling several requests concurrently? Nope. Use StreamingUpdateSolrServer, it should be just a drop-in with a different constructor. I will try to do that. It is a little bit difficult for me, as we are actually not dealing with

Re: Parallel indexing in Solr

2012-02-06 Thread Sami Siren
On Mon, Feb 6, 2012 at 2:53 PM, Per Steffensen wrote: > Actually right now, I am trying to find our what my bottleneck is. The setup > is more complex, than I would bother you with, but basically I have servers > with 80-90% IO-wait and only 5-10% "real CPU usage". It might not be a > Solr-relat

Re: Parallel indexing in Solr

2012-02-06 Thread Erick Erickson
Right. See below. On Mon, Feb 6, 2012 at 7:53 AM, Per Steffensen wrote: > See response below > > Erick Erickson wrote: > >> Unfortunately, the answer is "it depends(tm)". >> >> First question: How are you indexing things? SolrJ? post.jar? >> > > SolrJ, CommonsHttpSolrServer > >> But some observat

Re: Searching context within a book

2012-02-06 Thread Robert Stewart
You are probably better off splitting up each book into separate SOLR documents, one document per paragraph (each document with same book ID, ISBN, etc.). Then you can use field-collapsing on the book ID to return a single document per book. And you can use highlighting to show the paragraph
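The approach described above (one Solr document per paragraph, collapsed by book ID, with highlighting) could be sketched roughly as follows. The field names `id`, `book_id`, `title`, and `paragraph` and both helper functions are hypothetical, chosen only for illustration; `group`, `group.field`, `hl`, and `hl.fl` are the standard result-grouping and highlighting request parameters. The sketch also assumes paragraphs are separated by blank lines and that the paragraph field is stored so it can be highlighted.

```python
from urllib.parse import urlencode

def paragraphs_to_docs(book_id, title, text):
    """Split a book into one document per paragraph.
    Every document carries the same book_id so results can be grouped by book."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {"id": f"{book_id}-{n}", "book_id": book_id, "title": title, "paragraph": p}
        for n, p in enumerate(paragraphs, start=1)
    ]

def collapsed_search_params(words):
    """Query string that groups hits by book and highlights the matching paragraph."""
    return urlencode([
        ("q", f"paragraph:({words})"),
        ("group", "true"),           # field collapsing / result grouping
        ("group.field", "book_id"),  # one group per book
        ("hl", "true"),              # highlighting for the surrounding context
        ("hl.fl", "paragraph"),
    ])

docs = paragraphs_to_docs("b1", "Example Book", "First paragraph.\n\nSecond paragraph.")
params = collapsed_search_params("second")
```

Each dictionary in `docs` would be sent to Solr as a separate document; how it is posted (SolrJ, post.jar, HTTP) is left open here.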

Re: Replication problem on windows

2012-02-06 Thread Rafał Kuć
Hello! Thanks for the answer Shawn. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > On 2/6/2012 3:04 AM, Rafał Kuć wrote: >> Hello! >> >> We have Solr running on Windows. Once in a while we see a problem with >> replication failing. While slave server replic

Re: Replication problem on windows

2012-02-06 Thread Shawn Heisey
On 2/6/2012 3:04 AM, Rafał Kuć wrote: Hello! We have Solr running on Windows. Once in a while we see a problem with replication failing. While slave server replicates the index, it throws exception like the following: SEVERE: Unable to copy index file from: D:\web\solr\collection\data\index.20

Is Solr waiting for data to arrive

2012-02-06 Thread Per Steffensen
Hi, I have a setup where a lot is going on, but where there is about 80-90% IO-wait (%wa in top). I have a suspicion that this is due to slow networking. I would like someone to help me interpret thread dumps (retrieved using kill -3). Whenever I do thread dumps I see that most threads have th

Re: effect of continuous deletes on index's read performance

2012-02-06 Thread Michael McCandless
On Mon, Feb 6, 2012 at 8:20 AM, prasenjit mukherjee wrote: > Pardon my ignorance, Why can't the IndexWriter and IndexSearcher share > the same underlying in-memory datastructure so that IndexSearcher need > not be reopened with every commit. Because the semantics of an IndexReader in Lucene guar

Re: effect of continuous deletes on index's read performance

2012-02-06 Thread prasenjit mukherjee
Pardon my ignorance, but why can't the IndexWriter and IndexSearcher share the same underlying in-memory data structure so that the IndexSearcher need not be reopened with every commit? On 2/6/12, Erick Erickson wrote: > Your continuous deletes won't affect performance > noticeably, that's true. > > But

Re: effect of continuous deletes on index's read performance

2012-02-06 Thread Nagendra Nagarajayya
You could also try Solr 3.4 with RankingAlgorithm as this offers NRT. You can get more information about NRT for Solr 3.4 from here: http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org http://rankingalgorithm.tgels.org On 2/

Re: effect of continuous deletes on index's read performance

2012-02-06 Thread Erick Erickson
Your continuous deletes won't affect performance noticeably, that's true. But you're really doing bad things with the commit after every add or delete. You haven't said whether you have a master/ slave setup or not, but assuming you're searching on the same machine you're indexing to, each time yo

Re: Parallel indexing in Solr

2012-02-06 Thread Per Steffensen
See response below Erick Erickson wrote: Unfortunately, the answer is "it depends(tm)". First question: How are you indexing things? SolrJ? post.jar? SolrJ, CommonsHttpSolrServer But some observations: 1> sure, using multiple cores will have some parallelism. So will using a single co

Searching context within a book

2012-02-06 Thread pistacchio
I'm very new to Solr and I'm evaluating it. My task is to look for words within a corpus of books and return them within a small context. So far, I'm storing the books in a database split by paragraphs (slicing the books by line breaks), I do a fulltext search and return the row. In Solr, would I

multiple values encountered for non multiValued field type:[text/html, text, html]

2012-02-06 Thread William_Xu
Hi everyone, when I index my crawl results from a BBS site with Solr, I get this error. Could someone help me? My Solr schema is:

Re: Which Tokeniser (and/or filter)

2012-02-06 Thread Robert Brown
mapping dots to spaces. I don't think that's workable anyway since ".net" would cause issues. Trying out the wdftypes now... --- IntelCompute Web Design & Local Online Marketing http://www.intelcompute.com On Mon, 6 Feb 2012 04:10:18 -0800 (PST), Ahmet Arslan wrote: >> My fear is what will t

Re: Which Tokeniser (and/or filter)

2012-02-06 Thread Ahmet Arslan
> My fear is what will then happen with > highlighting if I use re-mapping? What do you mean by re-mapping?

Re: Which Tokeniser (and/or filter)

2012-02-06 Thread Robert Brown
My fear is what will then happen with highlighting if I use re-mapping? On Mon, 6 Feb 2012 03:33:03 -0800 (PST), Ahmet Arslan wrote: >> I need to tokenise on whitespace, full-stop, and comma >> ONLY. >> >> Currently using solr.WhitespaceTokenizerFactory with >> WordDelimiterFilterFactory but th

Re: Which Tokeniser (and/or filter)

2012-02-06 Thread Ahmet Arslan
> I need to tokenise on whitespace, full-stop, and comma > ONLY. > > Currently using solr.WhitespaceTokenizerFactory with > WordDelimiterFilterFactory but this is also splitting on > &, /, new-line, etc. WDF is customizable via types="wdftypes.txt" parameter. https://svn.apache.org/repos/asf/lu

Improving performance for SOLR geo queries?

2012-02-06 Thread Matthias Käppler
Hi, we need to perform fast geo lookups on an index of ~13M places, and were running into performance problems here with SOLR. We haven't done a lot of query optimization / SOLR tuning up until now so there's probably a lot of things we're missing. I was wondering if you could give me some feedbac

Phonetic search and matching

2012-02-06 Thread Dirk Högemann
Hi, I have a question on phonetic search and matching in solr. In our application all the content of an article is written to a full-text search field, which provides stemming and a phonetic filter (cologne phonetic for german). This is the relevant part of the configuration for the index analyzer

Re: multiple values encountered for non multiValued field type:[text/html, text, html]

2012-02-06 Thread tamanjit.bin...@yahoo.co.in
Hi I am not sure if what you are doing is possible i.e. having a schema other than that provided by nutch. The schema provided by nutch in its directory \conf is to be used as the solr schema. -- View this message in context: http://lucene.472066.n3.nabble.com/multiple-values-encountered-for-non-

Replication problem on windows

2012-02-06 Thread Rafał Kuć
Hello! We have Solr running on Windows. Once in a while we see a problem with replication failing. While slave server replicates the index, it throws exception like the following: SEVERE: Unable to copy index file from: D:\web\solr\collection\data\index.2011102510\_3s.fdt to: D:\web\solr\Col

Symbols in synonyms

2012-02-06 Thread Robert Brown
Is it good practice, common, or even possible to put symbols in my list of synonyms? I'm having trouble indexing and searching for "A&E", with it being split on the &. We already convert .net to dotnet, but don't want to store every combination of 2 letters, A&E, M&E, etc. -- IntelComp

Which Tokeniser (and/or filter)

2012-02-06 Thread Robert Brown
Hi, I need to tokenise on whitespace, full-stop, and comma ONLY. Currently using solr.WhitespaceTokenizerFactory with WordDelimiterFilterFactory but this is also splitting on &, /, new-line, etc. It seems such a simple setup, what am I doing wrong? what do you use for such "normal searchin
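To make the requirement concrete, here is a small sketch of the target behaviour outside Solr: split on whitespace, full stop, and comma only, leaving `&` and `/` intact. It is not a Solr analyzer and says nothing about how WordDelimiterFilterFactory behaves; the `tokenise` helper and the sample text are made up for illustration.

```python
import re

# Split on runs of whitespace (including new-lines), full stops, and commas only.
SPLIT_PATTERN = re.compile(r"[\s.,]+")

def tokenise(text):
    """Return the tokens between separators, dropping empty strings."""
    return [token for token in SPLIT_PATTERN.split(text) if token]

tokens = tokenise("A&E and/or M&E, e.g. .net")
print(tokens)  # ['A&E', 'and/or', 'M&E', 'e', 'g', 'net']
```

Note that ".net" comes out as "net" under this rule, so terms with a meaningful leading dot would still need separate handling.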

Re: multiple values encountered for non multiValued field type:[text/html, text, html]

2012-02-06 Thread William_Xu
error message: org.apache.solr.common.SolrException: ERROR: [http://bbs.dichan.com/] multiple values encountered for non multiValued field type: [text/html, text, html] at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:242) at org.apache.solr.update.proc

Weighting categories

2012-02-06 Thread Ramo Karahasan
Hi, I have a table with products and their categories. Is it possible to weight categories, so that a user who searches for "apple ipad" doesn't get a magazine about the apple ipad as the first result but the "hardware" apple ipad? I'm using DIH for indexing data, but don't know if there is any