Adding line count to a document

2006-02-28 Thread Eyal Post
I'd like to add a line count field to my indexed document. The obvious way is to read my file twice, once to tokenize it and add its content to a field in the document and once to count the number of lines in it and add it to another field. Any idea how I can optimize this and read the file once?
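
One way to read the file only once is to buffer its text while counting newlines in the same loop, then build both fields from the result. The following is a minimal sketch in plain Java, not an answer given in the thread; the names `SinglePassLoader`, `Result`, and `load` are hypothetical, and it assumes the file is small enough to hold in memory and that lines end in `\n` or `\r\n`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper: reads a character stream once, keeping the text
// and counting lines in the same pass.
final class SinglePassLoader {

    /** Text of the input plus the number of lines seen while reading it. */
    static final class Result {
        final String content;
        final int lineCount;

        Result(String content, int lineCount) {
            this.content = content;
            this.lineCount = lineCount;
        }
    }

    static Result load(Reader in) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[8192];
        int newlines = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            // Count line terminators in the chunk before appending it.
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n') {
                    newlines++;
                }
            }
            text.append(buf, 0, n);
        }
        // A non-empty final line without a trailing newline still counts.
        int len = text.length();
        boolean unterminatedLastLine = len > 0 && text.charAt(len - 1) != '\n';
        return new Result(text.toString(), newlines + (unterminatedLastLine ? 1 : 0));
    }

    public static void main(String[] args) throws IOException {
        Result r = load(new StringReader("first\nsecond\nthird"));
        System.out.println(r.lineCount); // prints 3
    }
}
```

The returned `content` could then go into the tokenized field and `String.valueOf(lineCount)` into a second field of the same document. For files too large to buffer, a counting wrapper around the `Reader` would be needed instead, with the caveat that the count is only known after the reader has been fully consumed.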

Re: Hacking proximity search: looking for feedback

2006-02-28 Thread Chris Hostetter
: Very good points, I hadn't considered the term frequency of the digits : affecting scoring. As an aside, can that aspect of the score be ignored for : these fields? The easiest way is to use a boost that is so low it's insignificant, or you could subclass TermQuery and override getSimilarity t

RE: Inside a Boolean Query

2006-02-28 Thread Seeta Somagani
Thanks Yonik, there they are -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 28, 2006 4:59 PM To: java-user@lucene.apache.org Subject: Re: Inside a Boolean Query On 2/28/06, Seeta Somagani <[EMAIL PROTECTED]> wrote: > Is there a way that I can dete

Re: Inside a Boolean Query

2006-02-28 Thread Yonik Seeley
On 2/28/06, Seeta Somagani <[EMAIL PROTECTED]> wrote: > Is there a way that I can detect the composition of a BooleanQuery, > rather than just extract the individual terms? Hi Seeta, I think BooleanQuery.getClauses() might be what you are looking for. -Yonik

Inside a Boolean Query

2006-02-28 Thread Seeta Somagani
Hi, I need to return the context of the terms along with the results. The approach that I'm using is to 1) detect what kind of query it is, 2) extract the terms of the query, 3) fetch the context of the individual terms, and 4) finally join them depending on the

Re: Hacking proximity search: looking for feedback

2006-02-28 Thread Jeff Rodenburg
Very good points, I hadn't considered the term frequency of the digits affecting scoring. As an aside, can that aspect of the score be ignored for these fields? I need to spend more time with FunctionQuery, I haven't given it the attention it deserves. Great feedback, thanks for the notes. -- j

RE: RangeQuery or BooleanQuery?

2006-02-28 Thread Seeta Somagani
Hoss, Your observation about the spaces seems very likely. I therefore removed the spaces, padded the numbers and also tried using the RangeFilter, but still I got the same result. Upon closer inspection of my code, I found that I was tokenizing the "id" field, which was rendering that field illeg
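
The padding step mentioned above matters because range bounds are compared against index terms as strings, not as numbers. The sketch below illustrates this in plain Java without touching Lucene; `PaddedIds`, `pad`, and `inRange` are hypothetical names, and the width of 10 digits is an arbitrary assumption.

```java
import java.util.Locale;

// Hypothetical illustration of why numeric ids are zero-padded before
// being indexed for lexicographic range matching.
final class PaddedIds {

    /** Left-pads a non-negative id with zeros to a fixed width. */
    static String pad(int id, int width) {
        if (id < 0 || width < 1) {
            throw new IllegalArgumentException("id must be >= 0 and width >= 1");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + width + "d", id);
    }

    /** Inclusive range check using plain string comparison. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "15" sorts between "104" and "200" as a string.
        System.out.println(inRange("15", "104", "200")); // prints true

        // Padded: string order now agrees with numeric order.
        System.out.println(inRange(pad(15, 10), pad(104, 10), pad(200, 10)));  // prints false
        System.out.println(inRange(pad(150, 10), pad(104, 10), pad(200, 10))); // prints true
    }
}
```

The same padding has to be applied both when the id is indexed and when the range bounds are built, and, as the message above notes, the field must not be tokenized in a way that alters the padded value.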

Re: Hacking proximity search: looking for feedback

2006-02-28 Thread Jeff Rodenburg
Michael - Great thoughts, and thanks for the feedback. Following on the Range Query approach, how is performance? I found the range approach (albeit with the exact values) to be slower than the parsed-string approach I posited. On the custom scoring, is the distance element for sorting or as a

Re: Hacking proximity search: looking for feedback

2006-02-28 Thread Jeff Rodenburg
I'm in the same boat as Michael on this one. It's not a matter of finding the right technology to do geo-locational calculations, but rather being able to accomplish that task in conjunction with keyword search. -- j On 2/28/06, Bryzek.Michael <[EMAIL PROTECTED]> wrote: > > Our geo searches are

RE: Hacking proximity search: looking for feedback

2006-02-28 Thread Bryzek.Michael
Our geo searches are combined with keyword searches. We previously performed all of our queries in the database (Oracle 10g w/ interMedia for the unstructured portion) but found that it was easier to scale search outside the database than within. -Original Message- From: John Powers

RE: Hacking proximity search: looking for feedback

2006-02-28 Thread John Powers
I don't know if this matters, but we do all of our geolocating in SQL with decent speed. All the trig is in the query itself and then we can limit top 5, top 10 etc. for what we show. Is the data such that you need Lucene? Can I ask what causes it to be beyond a database's ability? -Orig

RE: Hacking proximity search: looking for feedback

2006-02-28 Thread Bryzek.Michael
Jeff - This is an interesting approach. On our end, we have experimented with two variants: Variant 1: Use Range Query Rather than precomputing the boolean clauses yourself, index encoded latitude and longitude values and use a Range Query. We encode by adding 1000 to each of the values. Note: W

Re: Hacking proximity search: looking for feedback

2006-02-28 Thread Chris Hostetter
: Geo definition: : Boxing around a center point. It's not critical to do a radius search with : a given circle. A boxed approach allows for taller or wider frames of : reference, which are applicable for our use. if you are just looking to confine your results to a box then i think RangeFilteri

Hacking proximity search: looking for feedback

2006-02-28 Thread Jeff Rodenburg
I've been wrestling with a way to index and search data with a geo-positional aspect. By a geo-positional search, I want to constrain search results within a given location range. Furthermore, I want to allow the user to set/change the geo-positional boundaries as needed for their search. This i

Re: search problem

2006-02-28 Thread Chris Hostetter
: price and about 10 more additional fields. I want to not just find : something in the index also I want to get the lists of all brands and : price. The list of brands is needed for displaying all of the products : and the quantity of products of this brand for certain search request. 1) iterati

RE: RangeQuery or BooleanQuery?

2006-02-28 Thread Chris Hostetter
there are a couple of things that could be happening that make your results unexpected... : But, when I enter the query - id: [104 TO 200] content: "Marbella : España" it's just returning me all the results while ignoring the range. 1) if you really have a space between the "id:" and the "[104 T

Re: Indexing performance with Lucene 1.9

2006-02-28 Thread Eric Jain
Otis Gospodnetic wrote: Regarding performance fix - if you can be more precise (is it really just more or less or is it as good as before), that would be great for those of us itching to use 1.9. To be more precise: The patch reduced the time required to build one large index from 13 to 11 ho

Re: Indexing performance with Lucene 1.9

2006-02-28 Thread Eric Jain
Otis Gospodnetic wrote: Regarding performance fix - if you can be more precise (is it really > just more or less or is it as good as before), that would be great > for those of us itching to use 1.9. Yes, I can confirm that performance differs by no more than 3.1 fraggles. ;-) --

Re: Indexing performance with Lucene 1.9

2006-02-28 Thread Otis Gospodnetic
Hi Eric, Regarding performance fix - if you can be more precise (is it really just more or less or is it as good as before), that would be great for those of us itching to use 1.9. Thanks, Otis - Original Message From: Eric Jain <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent:

RE: RangeQuery or BooleanQuery?

2006-02-28 Thread Seeta Somagani
Please ignore the ContextQueryParser... I dumped that and switched back to the QueryParser which still gives me the same result. Thanks. Seeta Somagani -Original Message- From: Seeta Somagani [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 28, 2006 10:54 AM To: java-user@lucene.apache

RangeQuery or BooleanQuery?

2006-02-28 Thread Seeta Somagani
Hi, My documents are in the following format. doc.add(new Field ("id",page, Field.Store.YES, Field.Index.TOKENIZED)); doc.add(new Field ("content",fileContent.toString(), Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS_OFFSETS)); I need to make a query on

Re: Restricting the number of docs per search field

2006-02-28 Thread emerson cargnin
Yes, the bottleneck is definitely in Lucene. The index is quite big, three files with more than 1 GB. We are querying for HTML extracts, with the id together, but it can return almost 50 extracts for each ID, and just the first 2 will be used. We could as well do 10 queries (that's the max number

Re: Restricting the number of docs per search field

2006-02-28 Thread Grant Ingersoll
Do you want the first 2 docs, regardless of score, with the same property or do you want the 2 highest scoring docs with the same property? You might look at the HitCollector search method on IndexSearcher. Btw, the Filter that is required can be null. The HitCollector interface allows you t

Re: Restricting the number of docs per search field

2006-02-28 Thread emerson cargnin
Does anyone know a solution for that? I know there's a method that returns a TopDocs, but it needs a filter, and in my case, I'll need the first 2 of each doc with the same value in a given property. On 27/02/06, emerson cargnin <[EMAIL PROTECTED]> wrote: > > Hi all > > Due to a performance problem, I

Re: How to Configure lucene using tomcat

2006-02-28 Thread Erik Hatcher
On Feb 28, 2006, at 6:53 AM, Haritha_Parvatham wrote: I believe that snowball uses porter stemming algorithm. The Snowball stemmer came from Porter, yes. There was a looser stemming algorithm prior to Snowball though, which the PorterStemFilter uses. Anyway Is there is any other a

Re: Filter Field.Keyword possible?

2006-02-28 Thread Erik Hatcher
On Feb 28, 2006, at 8:11 AM, Samuru Jackson wrote: Also heed the other recommendations in LIA and don't necessarily use Filters when BooleanQuery clauses will suffice. There is overhead involved in the Filter mechanism in terms of executing multiple queries to build all the filters you're prop

Re: search problem

2006-02-28 Thread Michael D. Curtin
Anton Potehin wrote: I have a problem. There is an index, which contains about 6,000,000 records (15,000,000 will be soon) the size is 4GB. Index is optimized and consists of only one segment. This index stores the products. Each product has brand, price and about 10 more additional fields. I

Re: Filter Field.Keyword possible?

2006-02-28 Thread Samuru Jackson
> Also heed the other recommendations in LIA and don't necessarily use > Filters when BooleanQuery clauses will suffice. There is overhead > involved in the Filter mechanism in terms of executing multiple > queries to build all the filters you're proposing. I'm aware of the fact that using multip

How to Configure lucene using tomcat

2006-02-28 Thread Haritha_Parvatham
Hi Erik, I am sorry for using the same subject line. I believe that Snowball uses the Porter stemming algorithm. Is there any alternative to Snowball? I want a stemmer which supports multilingualism. Please help me in configuring Lucene step by step on my system. I have downloaded lucene

Re: Filter Field.Keyword possible?

2006-02-28 Thread Erik Hatcher
Haritha - please do not hijack threads (meaning you're replying to a message with one subject, but starting a new one with the same "subject" line). Please create a brand new message to the list with a new subject. The SnowballAnalyzer is available in Lucene, which incorporates the Snowb

RE: Filter Field.Keyword possible?

2006-02-28 Thread Haritha_Parvatham
Hi, Lucene uses a stemmer for supporting multilingualism. The stemming algorithm differs from language to language. Can you tell me how many different types of stemmer are available and which stemmers Lucene supports? I believe it supports the Snowball stemmer. I have downloaded the Snowball stemmer. It support

Re: Filter Field.Keyword possible?

2006-02-28 Thread Erik Hatcher
On Feb 28, 2006, at 6:10 AM, Samuru Jackson wrote: Hi again! 2) Use a QueryFilter with that same TermQuery, and apply that Filter to your search method. Thanks for the hint - I just bought "Lucene in Action" and now I'm more into it :-) But at this point I'm facing some Filter pro

Re: Filter Field.Keyword possible?

2006-02-28 Thread Erik Hatcher
On Feb 28, 2006, at 6:14 AM, Haritha_Parvatham wrote: Hi, Is there someone who can guide me in deploying Lucene 1.4.3? I have the Lucene 1.4.3 sources. Please tell me the procedure to run Lucene on my system. I am using Windows as the OS. First steps are to familiarize yourself with just what exactly Lucene

RE: Filter Field.Keyword possible?

2006-02-28 Thread Haritha_Parvatham
Hi, Is there someone who can guide me in deploying Lucene 1.4.3? I have the Lucene 1.4.3 sources. Please tell me the procedure to run Lucene on my system. I am using Windows as the OS. Thanks, -Original Message- From: Samuru Jackson [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 28, 2006 4:41 PM

Re: Filter Field.Keyword possible?

2006-02-28 Thread Samuru Jackson
Hi again! > 2) Use a QueryFilter with that same TermQuery, and apply that Filter > to your search method. Thanks for the hint - I just bought "Lucene in Action" and now I'm more into it :-) But at this point I'm facing some Filter problems again. As proposed in LiA the easiest way would

Re: Indexing performance with Lucene 1.9

2006-02-28 Thread Eric Jain
Daniel Naber wrote: A fix has now been committed to trunk in SVN, it should be part of the next 1.9 release. Performance seems to have recovered, more or less, thanks!

Maven2 Lucene 1.9 package

2006-02-28 Thread Andreas Baumann
Hi all, I need an official Maven2 package and wanted to build one. Then I saw the following in the documentation of Maven: "Maven partners The following sites sync automatically their project repository with the central one. If you want a project from any of this sites to be uploaded to ibib

Efficiently updating indexed documents

2006-02-28 Thread Nadav Har'El
A few days ago someone on this list asked how to efficiently "update" documents in the index, i.e., delete the old version of the document (found by some unique id field) and add the new version. The problem was that opening and closing the IndexReader and IndexWriter after each document was inef

search problem

2006-02-28 Thread Anton Potehin
I have a problem. There is an index, which contains about 6,000,000 records (15,000,000 will be soon) the size is 4GB. Index is optimized and consists of only one segment. This index stores the products. Each product has brand, price and about 10 more additional fields. I want to not just find so