Query boosting

2010-02-18 Thread deepak agrawal
Hi, I want to boost results through the query. I have 4 fields in our schema: <field name="UPDBY" type="text" indexed="true" stored="true"/> <field name="TO" type="text" indexed="true" stored="true"/> <field name="CC" type="text" indexed="true" stored="true"/> <field name="BCC" type="text" indexed="true" stored="true"/> If I search

Re: labeling facets and highlighting question

2010-02-18 Thread gwk
There's a ! missing in there, try {!key=label}. Regards, gwk On 2/18/2010 5:01 AM, adeelmahmood wrote: okay so if I don't want to do any excludes then I am assuming I should just put in {key=label}field .. I tried that and it doesn't work .. it says undefined field {key=label}field Lance
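A minimal sketch of the fix gwk describes, assuming a hypothetical field named category and the default example URL (neither appears in the thread). It only builds the request URL; the point is that the local-params prefix must start with "{!", otherwise Solr reads the braces as part of the field name.

```python
from urllib.parse import urlencode, parse_qs

def facet_url(base_url, field, label):
    """Build a select URL that facets on `field` and labels the facet `label`."""
    params = {
        "q": "*:*",
        "facet": "true",
        # Local params start with "{!"; "{key=...}" would be taken as a field name.
        "facet.field": "{!key=%s}%s" % (label, field),
    }
    return base_url + "?" + urlencode(params)

# "category" and the base URL are illustrative assumptions.
url = facet_url("http://localhost:8983/solr/select", "category", "label")
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["facet.field"][0])
```

urlencode percent-encodes the braces and the exclamation mark, so the URL is safe to paste into a browser or HTTP client.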

Re: Upgrading Tika in Solr

2010-02-18 Thread Christian Vogler
Just a word of caution: I've been bitten by this bug, which affects Tika 0.6: https://issues.apache.org/jira/browse/PDFBOX-541 It causes the parser to go into an infinite loop, which isn't exactly great for server stability. Tika 0.4 is not affected in the same way - as far as I remember, the

Re: Query boosting

2010-02-18 Thread Paul Dhaliwal
Try using the dismax handler: http://wiki.apache.org/solr/DisMaxRequestHandler This would be a very good read for you. You would use the bq (boost query) parameter, and it should look something like this: bq=UPDBY:deepak^5.0+TO:deepak^4.0+CC:deepak^3.0+BCC:deepak^2.0 Paul On Thu, Feb 18, 2010
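The bq value Paul suggests can be assembled programmatically. The sketch below is only an illustration: the base URL, the defType=dismax parameter and the qf list are assumptions about the setup, not something stated in the thread.

```python
from urllib.parse import urlencode

def boost_query(term, boosts):
    """Return a bq string such as 'UPDBY:deepak^5.0 TO:deepak^4.0'."""
    return " ".join("%s:%s^%s" % (field, term, boost)
                    for field, boost in boosts.items())

# Field names and boosts are taken from the reply above.
boosts = {"UPDBY": 5.0, "TO": 4.0, "CC": 3.0, "BCC": 2.0}
bq = boost_query("deepak", boosts)

params = {
    "q": "deepak",
    "defType": "dismax",        # assumption: dismax selected per request
    "qf": " ".join(boosts),     # assumption: search the same four fields
    "bq": bq,
}
# urlencode turns spaces into "+", which matches the "+" separators in the reply.
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```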

Re: getting unexpected statscomponent values

2010-02-18 Thread Koji Sekiguchi
solr-user wrote: Hossman, what do you mean by including a TestCase? Will create issue in Jira asap; I will include the URL, schema and some code to generate sample data I think they are good for TestCase. Koji -- http://www.rondhuit.com/en/

java.io.IOException: read past EOF after Solr 1.4.0

2010-02-18 Thread Koji Sekiguchi
Using release-1.4.0 or the trunk branch of Solr, indexing the example data and searching for a word with a 0 boost: http://localhost:8983/solr/select/?q=usb^0.0 I got the following exception: java.io.IOException: read past EOF at

some scores to 0 using omitNorns=false

2010-02-18 Thread Raimon Bosch
Hi, We did some tests with omitNorms=false. We have seen that on the last page of results some scores are set to 0.0. These zero scores are problematic for our sorters. Could it be some kind of bug? Regards, Raimon Bosch.

Re: java.io.IOException: read past EOF after Solr 1.4.0

2010-02-18 Thread Yonik Seeley
2010/2/18 Koji Sekiguchi k...@r.email.ne.jp: Using release-1.4.0 or the trunk branch of Solr, indexing the example data and searching for a word with a 0 boost: http://localhost:8983/solr/select/?q=usb^0.0 Confirmed - looks like Solr is requesting an incorrect docid. I'm looking into it. -Yonik

score computation for dismax handler

2010-02-18 Thread bharath venkatesh
Hi, When a query is made across multiple fields in the dismax handler using the qf parameter, I have observed (with debugQuery enabled) that the resulting score is the maximum of the per-field scores. But I want the resulting score to be the sum of the scores across fields (like the standard
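For reference, a disjunction-max query combines per-field scores as the maximum plus a tie-breaker multiple of the remaining scores; dismax exposes that multiplier as the tie parameter. The sketch below is plain arithmetic with made-up scores, not Solr code, and it ignores the other factors that go into a real Lucene score.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does:
    best field score plus `tie` times the sum of the other field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

scores = [0.4, 0.25, 0.1]                 # hypothetical per-field scores
max_only = dismax_score(scores, tie=0.0)  # pure max, as observed in the question
summed = dismax_score(scores, tie=1.0)    # behaves like a sum across fields
print(max_only, summed)
```

Under this formula, tie=1.0 gives the summed behaviour asked about, and values in between blend the two.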

Re: Realtime search and facets with very frequent commits

2010-02-18 Thread Otis Gospodnetic
Hi Janne, I *think* Ocean Realtime Search has been superseded by Lucene NRT search. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Janne Majaranta janne.majara...@gmail.com To:

Re: parsing strings into phrase queries

2010-02-18 Thread Robert Muir
i gave it a rough shot Lance, if there's a better way to explain it, please edit On Wed, Feb 17, 2010 at 10:23 PM, Lance Norskog goks...@gmail.com wrote: That would be great. After reading this and the PositionFilter class I still don't know how to use it. On Wed, Feb 17, 2010 at 12:38 PM,

Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-18 Thread Otis Gospodnetic
Hi Tom, 32MB is very low, 320MB is medium, and I think you could go higher, just pick whichever garbage collector is good for throughput. I know Java 1.6 update 18 also has some Hotspot and maybe also GC fixes, so I'd use that. Finally, this sounds like a good use case for reindexing with

Re: optimize is taking too much time

2010-02-18 Thread Jagdish Vasani
Hi, you should not optimize the index after each document insert. Instead, you should optimize it after inserting a good number of documents, because optimize merges all segments into one according to the Lucene index settings. thanks, Jagdish On Fri, Feb 12, 2010 at 4:01 PM, mklprasad
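One way to picture this advice is as a stream of XML update messages in which optimize appears once, at the end, rather than after every add. This is a sketch only: the batch size and the document shape are assumptions, and it generates the messages without sending them to a server.

```python
from xml.sax.saxutils import escape, quoteattr

def update_messages(docs, commit_every=1000):
    """Yield Solr XML update messages: one <add> per document, a <commit/>
    after every `commit_every` documents, and a single <optimize/> at the end."""
    pending = 0
    for doc in docs:
        fields = "".join(
            "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
            for name, value in doc.items()
        )
        yield "<add><doc>%s</doc></add>" % fields
        pending += 1
        if pending == commit_every:
            yield "<commit/>"
            pending = 0
    if pending:
        yield "<commit/>"   # flush the last partial batch
    yield "<optimize/>"     # merge segments once, after all inserts

# Five hypothetical documents, committed in batches of two.
msgs = list(update_messages([{"id": i} for i in range(5)], commit_every=2))
print(len(msgs), msgs[-1])
```

Each message would normally be POSTed to the update handler; how often to commit, and whether to optimize at all, depends on index size and query load.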

Re: getting unexpected statscomponent values

2010-02-18 Thread Erick Erickson
SOLR makes heavy use of JUnit for testing. The real advantage of a JUnit testcase being attached is that it can then be permanently incorporated into the SOLR builds. If you're unfamiliar with JUnit, then providing the raw data that illustrates the bug allows people who work on SOLR to save a

Re: some scores to 0 using omitNorns=false

2010-02-18 Thread adeelmahmood
I was going to ask a question about this, but you seem like you might have the answer for me: what exactly does the omitNorms option do (or what is it expected to do)? Also, could you please help me understand what the termVectors and multiValued options do? Thanks for your help Raimon Bosch wrote: Hi,

Re: Realtime search and facets with very frequent commits

2010-02-18 Thread Janne Majaranta
Hi Otis, Ok, now I'm confused ;) There seems to be a bit of activity though when looking at the last updated timestamps in the google code project wiki: http://code.google.com/p/oceansearch/w/list The Tag Index feature sounds very interesting. -Janne 2010/2/18 Otis Gospodnetic

Re: some scores to 0 using omitNorns=false

2010-02-18 Thread Raimon Bosch
I am not an expert in the Lucene scoring formula, but omitNorms=false makes the scoring formula a little bit more complex, taking into account boosting for fields and documents. If I'm not wrong (if I am, please correct me) I think that omitNorms=false takes into account the queryNorm(q) and

including 'the' dismax query kills results

2010-02-18 Thread Nagelberg, Kallin
I've noticed some peculiar behavior with the dismax searchhandler. In my case I'm making the search The British Open, and am getting 0 results. When I change it to British Open I get many hits. I looked at the query analyzer and it should be broken down to british and open tokens ('the' is a

Re: optimize is taking too much time

2010-02-18 Thread NarasimhaRaju
Hi, You can also make use of the autocommit feature of Solr. You have two possibilities: commit based either on the maximum number of uncommitted docs or on time. See the updateHandler section of your solrconfig.xml. Example: <autoCommit> <!-- <maxDocs>1</maxDocs> --> <!-- maximum time (in MS) after adding

Re: dataimporthandler and expungeDeletes=false

2010-02-18 Thread Jorg Heymans
I found the error. The uniqueKey definition in schema.xml was not set to the primary key field/column as returned by the deletedPkQuery. Jorg On Wed, Feb 17, 2010 at 11:38 AM, Jorg Heymans jorg.heym...@gmail.com wrote: Looking closer at the documentation, it appears that expungeDeletes in fact

Re: Realtime search and facets with very frequent commits

2010-02-18 Thread Jason Rutherglen
Janne, I don't think there's any activity happening there. SOLR-1606 is the tracking issue for moving to per segment facets and docsets. I haven't had an immediate commercial need to implement those. Jason On Thu, Feb 18, 2010 at 7:04 AM, Janne Majaranta janne.majara...@gmail.com wrote: Hi

replications issue

2010-02-18 Thread giskard
Hi all, I've set up Solr replication as described in the wiki. When I start the replication, a directory called index.$numbers is created; after a while it disappears and a new index.$othernumbers is created. index/ remains untouched with an empty index. Any clue? Thank you in advance, Riccardo

Schema error unknown field

2010-02-18 Thread Pulkit Singhal
I'm getting the following exception SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'desc' I'm wondering what I need to do in order to add the desc field to the Solr schema for indexing?

@Field annotation support

2010-02-18 Thread Pulkit Singhal
Hello All, When I use Maven or Eclipse to try and compile my bean which has the @Field annotation as specified in http://wiki.apache.org/solr/Solrj page ... the compiler doesn't find any class to support the annotation. What jar should we use to bring in this custom Solr annotation?

Re: Schema error unknown field

2010-02-18 Thread Erick Erickson
Adding desc as a field in your schema.xml file would be my first guess. Providing some explanation of what you're trying to do would help diagnose your issues. HTH Erick On Thu, Feb 18, 2010 at 12:21 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: I'm getting the following exception

Re: parsing strings into phrase queries

2010-02-18 Thread Kevin Osborn
The PositionFilter worked great for my purpose along with another filter that I built. In my case, my indexed data may be something like X150. So, a query for Nokia X150 should match. But I don't want random matches on x. However, if my indexed data is G7, I do want a query on PowerShot G7 to

Re: Faceting

2010-02-18 Thread José Moreira
Have you used UIMA? I did a quick read of the docs and it seems to do what I'm looking for. 2010/2/11 Otis Gospodnetic otis_gospodne...@yahoo.com Note that UIMA doesn't do NER itself (as far as I know), but instead relies on GATE or OpenNLP or OpenCalais, AFAIK :) Those interested in UIMA

Re: Realtime search and facets with very frequent commits

2010-02-18 Thread Janne Majaranta
Ok, thanks. -Janne 2010/2/18 Jason Rutherglen jason.rutherg...@gmail.com Janne, I don't think there's any activity happening there. SOLR-1606 is the tracking issue for moving to per segment facets and docsets. I haven't had an immediate commercial need to implement those. Jason On

Re: including 'the' dismax query kills results

2010-02-18 Thread Joe Calderon
Use the common grams filter; it'll create tokens for stop words and their adjacent terms. On Thu, Feb 18, 2010 at 7:16 AM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: I've noticed some peculiar behavior with the dismax searchhandler. In my case I'm making the search The British Open,

Re: Deleting spelll checker index

2010-02-18 Thread darniz
Thanks. If this is really the case, I declared a new field called mySpellTextDup and retired the original field. Now I have a new field which powers my dictionary with no words in it, and I am free to index whichever terms I want. This is not the best solution, but I can't think of a

Re: Schema error unknown field

2010-02-18 Thread Pulkit Singhal
I guess my n00b-ness is showing :) I started off using the instructions directly from http://wiki.apache.org/solr/Solrj and there was no mention of schema there, and even after getting this error and searching for schema.xml in the wiki ... I found no meaningful hits, so I thought it best to ask.

Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-18 Thread Tom Burton-West
Thanks Otis, I don't know enough about Hadoop to understand the advantage of using Hadoop in this use case. How would using Hadoop differ from distributing the indexing over 10 shards on 10 machines with Solr? Tom Otis Gospodnetic wrote: Hi Tom, 32MB is very low, 320MB is medium, and

Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-18 Thread Yonik Seeley
On Thu, Feb 18, 2010 at 8:52 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: 32MB is very low, 320MB is medium, and I think you could go higher, just pick whichever garbage collector is good for throughput.  I know Java 1.6 update 18 also has some Hotspot and maybe also GC fixes, so

Re: Schema error unknown field

2010-02-18 Thread Smiley, David W.
On Feb 18, 2010, at 3:27 PM, Erick Erickson wrote: The Manning book for SOLR or LucidWorks are good resources And of course the PACKT book ;-) ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

Run Solr within my war

2010-02-18 Thread Pulkit Singhal
Hello Everyone, I do NOT want to host Solr separately. I want to run it within my war with the Java Application which is using it. How easy/difficult is that to setup? Can anyone with past experience on this topic, please comment. thanks, - Pulkit

Re: Run Solr within my war

2010-02-18 Thread Dave Searle
Why would you want to? Surely having it separate increases scalability? On 18 Feb 2010, at 22:23, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello Everyone, I do NOT want to host Solr separately. I want to run it within my war with the Java Application which is using it. How

Re: Schema error unknown field

2010-02-18 Thread Erick Erickson
Oops, got my Manning MEAP edition of LIA II mixed up with my PACKT SOLR 1.4 book. But some author guy caught my gaffe <G>... Erick On Thu, Feb 18, 2010 at 5:13 PM, Smiley, David W. dsmi...@mitre.org wrote: On Feb 18, 2010, at 3:27 PM, Erick Erickson wrote: The Manning book for SOLR or

Re: Run Solr within my war

2010-02-18 Thread Pulkit Singhal
Yeah I have been pitching that but I want all the functionality of Solr in a small package because it is not a concern given the specifically limited data set being searched upon. I understand that the # of users is still another part of this equation but there just aren't that many at this time

spellcheck.build=true has no effect

2010-02-18 Thread darniz
Hello All. After doing a lot of research I came to this conclusion; please correct me if I am wrong. I noticed that if you have buildOnCommit and buildOnOptimize set to true in your spell check component, then the spell check builds whenever a commit or optimize happens, which is the desired behaviour

Range Searches in Collections

2010-02-18 Thread cjkadakia
Hi, I'm trying to do a search on a range of floats that are part of my solr schema. Basically we have a collection of fees that are associated with each document in our index. The query I tried was: q=fees:[3 TO 10] This should return me documents with Fee values between 3 and 10 inclusively,

Re: Run Solr within my war

2010-02-18 Thread Richard Frovarp
On 2/18/2010 4:22 PM, Pulkit Singhal wrote: Hello Everyone, I do NOT want to host Solr separately. I want to run it within my war with the Java Application which is using it. How easy/difficult is that to setup? Can anyone with past experience on this topic, please comment. thanks, - Pulkit

Re: Range Searches in Collections

2010-02-18 Thread Otis Gospodnetic
Hm, yes, it sounds like your fees field has multiple values/tokens, one for each fee. That's full-text search for you. :) How about having multiple fee fields, each with just one fee value? Otis

Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-18 Thread Otis Gospodnetic
Hi Tom, It wouldn't. I didn't see the mention of parallel indexing in the original email. :) Otis - Original Message From: Tom Burton-West tburtonw...@gmail.com To:

Re: parsing strings into phrase queries

2010-02-18 Thread Otis Gospodnetic
This sounds useful to me! Here's a pointer: http://wiki.apache.org/solr/HowToContribute Thanks! Otis From: Kevin Osborn osbo...@yahoo.com To:

Re: replications issue

2010-02-18 Thread Otis Gospodnetic
giskard, Is this on the master or on the slave(s)? Maybe you can paste your replication handler config for the master and your replication handler config for the slave. Otis

Re: @Field annotation support

2010-02-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
solrj jar On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello All, When I use Maven or Eclipse to try and compile my bean which has the @Field annotation as specified in http://wiki.apache.org/solr/Solrj page ... the compiler doesn't find any class to

Re: optimize is taking too much time

2010-02-18 Thread mklprasad
Jagdish Vasani-2 wrote: Hi, you should not optimize the index after each document insert. Instead, you should optimize it after inserting a good number of documents, because optimize merges all segments into one according to the Lucene index settings. thanks, Jagdish On Fri,