Re: How do I sort lucene search results by relevance and time?

2011-05-11 Thread Otis Gospodnetic
If only you were using Solr http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Johnbin Wang > To: java-user@l

Help needed on Ant build script for creating Lucene index

2011-05-11 Thread Saurabh Gokhale
Hi, Can someone please point me to an example of an Ant build script for creating a Lucene index? It is part of Lucene contrib, but I could not get much from the documentation on the Lucene site. Thanks, Saurabh

found workaround: Query on using Payload with MoreLikeThis class

2011-05-11 Thread Saurabh Gokhale
Hi All, I am not sure if anyone got a chance to go over my question (below). The question was whether I can modify the MoreLikeThis.like() result using index-time boosting. I have found a workaround, as there is no easy way to influence the MoreLikeThis result using an index-time payload value. The wo

Re: Can I omit ShingleFilter's filler tokens

2011-05-11 Thread William Koscho
I meant I'm trying for #2 so this should work (got my numbers mixed up). Thanks again Bill On 5/11/11, William Koscho wrote: > #1 is what I'm trying for, so I'll give setEnablePositionIncrements(false) a > try. Thanks for everyone's help. > > Bill > > On 5/11/11, Steven A Rowe wrote: >> Yes, StopFilte

Re: Can I omit ShingleFilter's filler tokens

2011-05-11 Thread William Koscho
#1 is what I'm trying for, so I'll give setEnablePositionIncrements(false) a try. Thanks for everyone's help. Bill On 5/11/11, Steven A Rowe wrote: > Yes, StopFilter.setEnablePositionIncrements(false) will almost certainly get > higher throughput than inserting PositionFilter. Like PositionFilter, thi

Re: Bug in BrazilianAnalyzer?

2011-05-11 Thread Adriano Crestani
Hi, I think you forgot to attach the JUnit. On Wed, May 11, 2011 at 10:04 AM, wrote: > Hi, > I did a test to understand the use of '*' and '?'. > If I use StandardAnalyzer I get the expected results, but if I use > BrazilianAnalyzer I get wrong results. > Please, where is my mistake? Junit is

Bug in BrazilianAnalyzer?

2011-05-11 Thread paulocsc
Hi, I did a test to understand the use of '*' and '?'. If I use StandardAnalyzer I get the expected results, but if I use BrazilianAnalyzer I get wrong results. Please, where is my mistake? Junit is at the end. Paulo Cesar cities = {"Brasília","Brasilândia","Braslândia", "São Paulo", "São Roque

Re: Non-English Languages Search

2011-05-11 Thread Robert Muir
On Mon, May 9, 2011 at 5:32 PM, Provalov, Ivan wrote: > We are planning to ingest some non-English content into our application. All > content is OCR'ed and there are a lot of misspellings and garbage terms > because of this. Each document has one primary language with some > exceptions (e.

RE: Can I omit ShingleFilter's filler tokens

2011-05-11 Thread Steven A Rowe
Yes, StopFilter.setEnablePositionIncrements(false) will almost certainly get higher throughput than inserting PositionFilter. Like PositionFilter, this will buy you #2 (create shingles as if stopwords were never there), but not #1 (don't create shingles across stopwords). > -Original Messa

Re: Can I omit ShingleFilter's filler tokens

2011-05-11 Thread Robert Muir
Another idea is to call .setEnablePositionIncrements(false) on your StopFilter. On Wed, May 11, 2011 at 8:27 AM, Steven A Rowe wrote: > Hi Bill, > > I can think of two possible interpretations of "removing filler tokens": > > 1. Don't create shingles across stopwords, e.g. for text "one two three four

RE: Can I omit ShingleFilter's filler tokens

2011-05-11 Thread Steven A Rowe
Hi Bill, I can think of two possible interpretations of "removing filler tokens": 1. Don't create shingles across stopwords, e.g. for text "one two three four five" and stopword "three", bigrams only, you'd get ("one two", "four five"), instead of the current ("one two", "two _", "_ four", "fou
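
The bigram outcomes discussed in this thread can be sketched in plain Java. This is only an illustration of the three behaviors (filler tokens, interpretation #1, interpretation #2) on the example text "one two three four five" with stopword "three". It does not call ShingleFilter, StopFilter, or any other Lucene class, and the class and method names (ShingleSketch, withFillers, noShinglesAcrossStopwords, ignoringStopwords) are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; no Lucene classes are used here.
class ShingleSketch {
    static final String FILLER = "_";

    // Builds bigrams from adjacent entries of the given list.
    static List<String> bigrams(List<String> slots) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < slots.size(); i++) {
            out.add(slots.get(i) + " " + slots.get(i + 1));
        }
        return out;
    }

    // Behavior described as "current" in the thread: a removed stopword
    // leaves a gap that shows up as a filler token in the shingles.
    // (Simplified: only the single-stopword case from the thread is modeled.)
    static List<String> withFillers(List<String> tokens, Set<String> stopwords) {
        List<String> slots = new ArrayList<>();
        for (String t : tokens) {
            slots.add(stopwords.contains(t) ? FILLER : t);
        }
        return bigrams(slots);
    }

    // Interpretation #1: do not create shingles across stopwords.
    static List<String> noShinglesAcrossStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        String prev = null;
        for (String t : tokens) {
            if (stopwords.contains(t)) {
                prev = null; // a stopword breaks the chain
                continue;
            }
            if (prev != null) {
                out.add(prev + " " + t);
            }
            prev = t;
        }
        return out;
    }

    // Interpretation #2: create shingles as if stopwords were never there.
    static List<String> ignoringStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) {
                kept.add(t);
            }
        }
        return bigrams(kept);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("one", "two", "three", "four", "five");
        Set<String> stop = Collections.singleton("three");
        System.out.println(withFillers(tokens, stop));
        System.out.println(noShinglesAcrossStopwords(tokens, stop));
        System.out.println(ignoringStopwords(tokens, stop));
    }
}
```

Per the replies in this thread, disabling position increments on the stop filter corresponds to the third method (#2), not the second (#1).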

Re: Sharding Techniques

2011-05-11 Thread Ian Lea
I'm sure that you should try building one large index and convert to NumericField wherever you can. I'm convinced that will be faster - but as ever, the proof will be in the numbers. On repeated terms, I believe that Lucene will search multiple times. If so, I'd guess it is just something that ha

Re: Sharding Techniques

2011-05-11 Thread Ian Lea
Ganesh Nobody is saying that sharding is never a good idea - it just doesn't seem to be applicable in the case being discussed. On my indexes I care much more about speed of searching rather than speed of indexing. The latter typically happens in the background in the dead of night and within r

Re: Sharding Techniques

2011-05-11 Thread Samarendra Pratap
Hi Tom, The more responses I get in this thread, the more I feel that our application needs optimization. 350 GB and less than 2 seconds!!! That's much more than my expectation :-) (in current scenario). *characteristics of slow queries:* there are a few reasons for greater search time