Re: Benchmarking LUCENE-584 with contrib/benchmark

2007-04-02 Thread Antony Bowesman
Otis Gospodnetic wrote: Here is one more related question. It looks like the o.a.l.benchmark.Driver class is supposed to be a generic driver class that uses the Benchmarker configured in one of those conf/*.xml files. However, I see StandardBenchmarker.class hard-coded there: digester

Can Query.toString() output be parsed to the same query?

2007-04-02 Thread Kun Hong
Hi, I am new to Lucene. I find that the output of the Query.toString() method cannot be parsed back into the same query. Is that true? If so, why not make the output of Query.toString() parsable back into the same query? Unless something prevents us from doing so, such as: no

How to calculate centroid from HITS?

2007-04-02 Thread Lokeya
Hi All, I have run a query and got back a Hits object, which is a collection of documents. I want to find the centroid of these documents. Centroid = the top 35 (for example) most common terms across all the documents in the Hits object. Is there any API in Lucene for this? Thanks in advance.
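The "centroid" described in this question can be sketched in plain Java, without any Lucene API. The sketch below is only an illustration under stated assumptions: the text of each hit has already been pulled out into a String, terms are split on non-word characters and lower-cased, and "most common" is read as "appears in the most documents". The class and method names (CentroidSketch, topTerms) are hypothetical and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CentroidSketch {

    /**
     * Returns the n terms that occur in the most documents.
     * Ties are broken alphabetically so the result is deterministic.
     */
    public static List<String> topTerms(List<String> docs, int n) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            // Use a set so that each term is counted once per document.
            Set<String> seen = new HashSet<>(Arrays.asList(doc.toLowerCase().split("\\W+")));
            seen.remove(""); // split() yields an empty token if the text starts with a separator
            for (String term : seen) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        List<String> terms = new ArrayList<>(docFreq.keySet());
        terms.sort((a, b) -> {
            int byCount = Integer.compare(docFreq.get(b), docFreq.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "lucene index search",
                "lucene search engine",
                "lucene query");
        System.out.println(topTerms(docs, 2));
    }
}
```

For a real index the per-document text would have to come from stored fields or term vectors, and a proper analyzer would replace the naive split; both of those are left out here on purpose.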

Re: Benchmarking LUCENE-584 with contrib/benchmark

2007-04-02 Thread Doron Cohen
Hi Otis, you could use the byTask package - add your-type-of-search-task. Suffix the new task class name by "Task" - e.g. NewNameTask - and then you can use the 'command' "NewName" in an alg file. I am not sure you can extend/reuse the existing ReadTask for this, because its implementation of sear

Re: Benchmarking LUCENE-584 with contrib/benchmark

2007-04-02 Thread Otis Gospodnetic
Here is one more related question. It looks like the o.a.l.benchmark.Driver class is supposed to be a generic driver class that uses the Benchmarker configured in one of those conf/*.xml files. However, I see StandardBenchmarker.class hard-coded there: digester.addObjectCreate("benchmar

Benchmarking LUCENE-584 with contrib/benchmark

2007-04-02 Thread Otis Gospodnetic
Hi, I'm looking at benchmarking Paul's http://issues.apache.org/jira/browse/LUCENE-584 code. I'd like to compare either: HitCollector.collect(doc, score) vs. MatchCollector.collect(doc) or IndexSearcher.search(Weight, Filter, HitCollector) vs. IndexSearcher.match(Query, MatchCollector) ..

Re: Deeper Ranking Issues in Lucene

2007-04-02 Thread Otis Gospodnetic
Xiangyu Jin, a better place to ask is the java-user list. You'll want to subscribe to that. You didn't mention Similarity/DefaultSimilarity classes. Maybe that's what you missed. Otis

Re: precision double sortable String

2007-04-02 Thread Yonik Seeley
On 4/2/07, MB Leasing <[EMAIL PROTECTED]> wrote: > In Solr, NumberUtils.double2sortableStr prints out a literal question mark character '?'. You shouldn't try to print it out... it's essentially binary :-) -Yonik

Re: precision double sortable String

2007-04-02 Thread Peter W.
One more thing... It could optionally be indexed and stored as a String; then the contents of the Hits object could be placed into a Collection with a comparator that sorts double values in reverse order. Regards, Peter W. On Apr 2, 2007, at 12:02 PM, "Peter W." <[EMAIL PROTECTED]> wrote: Hi
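The comparator idea in this message can be sketched as follows. It is a minimal illustration, assuming the stored field values have already been read out of the hits into a collection of Strings; the class and method names (ReverseDoubleSort, sortDescending) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class ReverseDoubleSort {

    /** Sorts stored string values by their numeric double value, largest first. */
    public static List<String> sortDescending(Collection<String> storedValues) {
        List<String> sorted = new ArrayList<>(storedValues);
        // Swapping a and b in the comparison gives reverse (descending) order.
        sorted.sort((a, b) -> Double.compare(Double.parseDouble(b), Double.parseDouble(a)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(".346210426731253", ".9", ".05");
        System.out.println(sortDescending(values));
    }
}
```

This sorts in memory after the search, so it only suits result sets small enough to load completely; it does not make the field sortable inside the index.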

precision double sortable String

2007-04-02 Thread MB Leasing
Hi, I'm trying to turn a double with fifteen digits of decimal precision (.346210426731253) into a sortable String for Lucene. NumberTools is for longs so doesn't apply. In Solr, NumberUtils.double2sortableStr prints out a literal question mark character '?' . I tried making it
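One way to get a printable, sortable string for a double is sketched below. This is not Solr's NumberUtils implementation; it is a common bit-manipulation technique, shown here under the assumption that a fixed-width hexadecimal string is acceptable as the indexed term. The class and method names (SortableDouble, encode) are hypothetical.

```java
public class SortableDouble {

    /**
     * Encodes a double as a 16-character lowercase hex string whose
     * lexicographic order matches the numeric order of the doubles.
     */
    public static String encode(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative values: flip every bit so that larger magnitudes sort first.
        // Non-negative values: flip only the sign bit so they sort above negatives.
        bits ^= (bits >> 63) | Long.MIN_VALUE;
        // %016x prints the long as unsigned hex, zero-padded to a fixed width.
        return String.format("%016x", bits);
    }

    public static void main(String[] args) {
        System.out.println(encode(.346210426731253));
        System.out.println(encode(-1.5));
    }
}
```

Because every encoded value has the same length and uses only the characters 0-9 and a-f, plain string comparison orders the terms numerically, and the values can be printed safely for debugging.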

Re: search-time boosting

2007-04-02 Thread Mike Klaas
On 4/2/07, Ofer Nave <[EMAIL PROTECTED]> wrote: I'd like to be able to boost documents at search-time, and I'm not sure how to do it. Example: I'm building a search engine for products (comparison shopping). Many queries tend to indicate a category (e.g., 'digital cameras') as opposed to a pro

short documents = help me tweak Similarity??

2007-04-02 Thread John Kleven
My documents are cars... i.e., Nissan Altima Sports Package Nissan Altima Standard The problem I have is that when I search "Nissan Altima", I want to get the 2nd hit back first, i.e. "Nissan Altima Standard", because it is shorter. However, this doesn't happen. They are both scored exactly the same.

search-time boosting

2007-04-02 Thread Ofer Nave
I'd like to be able to boost documents at search-time, and I'm not sure how to do it. Example: I'm building a search engine for products (comparison shopping). Many queries tend to indicate a category (e.g., 'digital cameras') as opposed to a product (e.g., 'canon powershot'). I have the na

Re: HITS and termDoc give different results

2007-04-02 Thread dziadgba dziadgba
You were right, thanks for the help. dziadgba 2007/3/11, Doron Cohen <[EMAIL PROTECTED]>: Is "Text" the only field in the index? Note that the search only looks at field "Text", while the terms() iteration as it appears in that code might bump into a term with the same text but in another field. A better c

Re: Searches fail while indexwriter is open

2007-04-02 Thread baronDodd
Many thanks for your response, some good points which I had not thought of, but unfortunately the problem remains. To clarify, my index sequence in pseudo-code is this: if( fileExists( filePath ) ){ createIndexReader(); deleteDoc( docNumber ); } createIndexWriter(); indexDoc

Re: Searches fail while indexwriter is open

2007-04-02 Thread Erick Erickson
Yes, you can search while index writes are taking place, but when you open an index reader, it essentially takes a snapshot of the index, and further modifications of the index are not visible to that searcher as long as it's open. You must close and re-open the reader (and associated searcher

Re: Morphological Search Problem

2007-04-02 Thread Grant Ingersoll
Have you used Luke to see what is actually in the index? Or written some test cases for your analyzer to know that the appropriate tokens are coming out of your analyzer? Also, could you give more details about the filters you are using? I am not familiar w/ ExactTokensConstructorFilter,

Morphological Search Problem

2007-04-02 Thread Shaimaa Mohamed
Dear all, We are using a Unified Analyzer as the analyzer of Lucene so as to be able to index and search Arabic and English documents as well. Here is the code: public TokenStream tokenStream(String FieldName, Reader reader) { switch(analysisMode) {

Searches fail while indexwriter is open

2007-04-02 Thread baronDodd
I am currently writing a Lucene application and having a huge headache with concurrency. My requirements are that each time a file is indexed a search on its path is performed to see if an update (delete then re-index) is required. If a document with the same path exists then an IndexReader delet

Re: Lock files in a read-only application

2007-04-02 Thread Michael McCandless
"Nilesh Bansal" <[EMAIL PROTECTED]> wrote: > thanks for your replies. i have two more questions. > > You need to be really certain your own locking protects Lucene > > properly. Specifically, no IndexReader can be created (restarted) > > while a writer is open against the index, and, only one writ