RE: How to obtain raw scores?

2008-11-19 Thread Ng Vinny
Hi, is there any documentation that says whether scores obtained from TopDocs.scoreDocs[i].score are comparable across queries? I am having this problem myself, so I would really appreciate it if anyone has some pointers. At [1], it seems like they are not. Is there any solution to enable this co

Re: altering the value of non indexed fields

2008-11-19 Thread Michael McCandless
Unfortunately, not yet. There have been discussions about this, including this issue for "column-stride fields": https://issues.apache.org/jira/browse/LUCENE-1231 But no real progress on it lately... Mike Diego Cassinera wrote: Hello All I'm writing an application to move full te

Re: How to search documents taking in account the dates ???

2008-11-19 Thread Erick Erickson
Well, MultiSearcher is just a Searcher, so you have available all of the search methods on Searcher. One of which is: search public TopFieldDocs *search*(Query query, Filter filter, int n, Sort sort)
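To illustrate why the Sort argument matters, here is a minimal stand-in sketch that does not call Lucene at all. It assumes that, without a Sort, hits come back in descending score order, and it contrasts that with an explicit ordering on a year field. The Hit class, the sample data, and both comparators are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortByYearSketch {
    /** Hypothetical stand-in for a search hit: a stored year plus a relevance score. */
    static final class Hit {
        final int year;
        final float score;

        Hit(int year, float score) {
            this.year = year;
            this.score = score;
        }
    }

    /** Relevance order: highest score first (assumed default when no Sort is given). */
    static final Comparator<Hit> BY_SCORE = (a, b) -> Float.compare(b.score, a.score);

    /** Field order: ascending year, ties broken by relevance. */
    static final Comparator<Hit> BY_YEAR =
            Comparator.comparingInt((Hit h) -> h.year).thenComparing(BY_SCORE);

    /** Made-up hits whose scores do not follow the year order. */
    static List<Hit> sampleHits() {
        return Arrays.asList(new Hit(2005, 0.7f), new Hit(2003, 0.9f), new Hit(2004, 0.4f));
    }

    /** Returns the years of the hits after applying the given ordering. */
    static List<Integer> yearsInOrder(List<Hit> hits, Comparator<Hit> order) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(order);
        List<Integer> years = new ArrayList<>();
        for (Hit h : sorted) {
            years.add(h.year);
        }
        return years;
    }

    public static void main(String[] args) {
        System.out.println("by score: " + yearsInOrder(sampleHits(), BY_SCORE));
        System.out.println("by year:  " + yearsInOrder(sampleHits(), BY_YEAR));
    }
}
```

With this sample data, relevance order interleaves the years, while the explicit year ordering groups them. Passing a Sort on the year field to the search method quoted above is intended to play the role of BY_YEAR here.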

Re: How to search documents taking in account the dates ???

2008-11-19 Thread Ariel
Well, this is what I am doing: queryString="year:[2003 TO 2005]" [CODE] Query pquery = null; Hits hits = null; Analyzer analyzer = null; analyzer = new SnowballAnalyzer("English"); try { pquery = MultiFieldQueryParser.parse(new String[] {queryString, queryString}, new S

Re: How to search documents taking in account the dates ???

2008-11-19 Thread Ian Lea
Are you using one of the search methods that includes sorting? If not, then do. If you are, then you need to tell us exactly what you are doing and exactly what you reckon is going wrong. -- Ian. On Wed, Nov 19, 2008 at 6:23 PM, Ariel <[EMAIL PROTECTED]> wrote: > it is supposed lucene make a

Re: Term numbering and range filtering

2008-11-19 Thread Paul Elschot
Tim, On Wednesday 19 November 2008 02:32:40, Tim Sturge wrote: ... > >> > >> This is less than 2x slower than the dedicated bitset and more > >> than 50x faster than the range boolean query. > >> > >> Mike, Paul, I'm happy to contribute this (ugly but working) code > >> if there is interest. Let

Re: How to search documents taking in account the dates ???

2008-11-19 Thread Ariel
Lucene is supposed to sort lexicographically, but this is not happening. Could you tell me what I'm doing wrong? I hope you can help me. Regards On Wed, Nov 19, 2008 at 11:56 AM, Ariel <[EMAIL PROTECTED]> wrote: > Thanks, that was very helpful, but I have a question when I make the > searche

Re: IndexSearcher and multi-threaded performance

2008-11-19 Thread Tomer Gabel
It's more than possible, it's probable. Cache thrashing would definitely be my first guess; with so many copies of the exact same data you're not only missing out on significant gains with the L2 cache, you're also taking a major hit with every cache miss (which probably happens every context swit

RE: How to obtain raw scores?

2008-11-19 Thread Teruhiko Kurosaka
Please ignore this question. I've noticed it was answered in another thread just before I posted my question. Answer: use TopDocs.scoreDocs[i].score T. "Kuro" Kurosaka, Basis Technology San Francisco, California, U.S.A. -

Re: Lucene implementation/performance question

2008-11-19 Thread Greg Shackles
I have a couple quick questions...it might just be because I haven't looked at this in a week now (got pulled away onto some other stuff that had to take priority). In the searching phase, I would run the search across all page documents, and then for each of those pages, do a search with PayloadS

Re: How to search documents taking in account the dates ???

2008-11-19 Thread Ariel
Thanks, that was very helpful, but I have a question: when I run the searches, the results are not sorted according to the range. For example, for year:[2003 TO 2008], the first page shows 2003 documents, the second page 2005 documents, and the third page 2004 documents. I don't see any sort crit

Re: 2.4 Performance

2008-11-19 Thread Paul Elschot
On Wednesday 19 November 2008 03:39:01, [EMAIL PROTECTED] wrote: ... > > Our design is roughly as follows: we have some pre-query filters, > queries typically involving around 25 clauses, and some > post-processing of hits. We collect counts and filter post query > using a hit collector, which use

How to obtain raw scores?

2008-11-19 Thread Teruhiko Kurosaka
Hello, Is there any way to obtain a raw hit score? I understand the deprecated Hits.getScore() returns normalized scores, relative to each query. Is TopDocs.scoreDocs[i].score also normalized, or raw? I'd like to compare confidence levels of hits among different queries. Thanks. T. "Kuro" K

altering the value of non indexed fields

2008-11-19 Thread Diego Cassinera
Hello All, I'm writing an application to move full-text search out of my RDBMS. Today the app hits the DB twice: 1) to do the search itself, and 2) to format the output of the search results. In my plan I'm moving everything to Lucene documents that contain fields where I will be doing the s

Re: How to search documents taking in account the dates ???

2008-11-19 Thread Ian Lea
Hi - sounds like you need a range query. http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Range%20Searches -- Ian. On Wed, Nov 19, 2008 at 4:02 PM, Ariel <[EMAIL PROTECTED]> wrote: > Hi everybody: > > I need to make search with lucene 2.3.2, taking in account the dates, > previously
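As background to the range-query suggestion, the following sketch assumes (as the linked query syntax page for 2.3.2 describes) that range bounds are compared as strings rather than as numbers. It uses plain Java string comparison, not Lucene, and the class and method names are hypothetical. It shows why four-digit years behave as expected while values of differing widths need zero-padding at index and query time.

```java
import java.util.Locale;

class LexicographicRange {
    /** Inclusive range test using string (lexicographic) comparison, not numeric. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    /** Zero-pads a non-negative number so that string order matches numeric order. */
    static String pad(int value, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    public static void main(String[] args) {
        // Fixed-width years compare the same way as numbers.
        System.out.println(inRange("2004", "2003", "2005"));  // true
        // A longer term can fall inside the range as a string
        // even though 20041 is far outside it as a number.
        System.out.println(inRange("20041", "2003", "2005")); // true
        // Unpadded bounds of different widths: "7" sorts after "10".
        System.out.println(inRange("7", "5", "10"));          // false
        // Padding every value to the same width restores numeric order.
        System.out.println(inRange(pad(7, 2), pad(5, 2), pad(10, 2))); // true
    }
}
```

For a field that only ever holds four-digit years, as in the question, no padding should be needed; the padding helper matters once values of different lengths share a field.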

How to search documents taking in account the dates ???

2008-11-19 Thread Ariel
Hi everybody: I need to search with Lucene 2.3.2, taking dates into account. When I built the index, I created a date field storing the year in which each document was created. At search time I would like to retrieve documents that were created before a year or aft

Re: Spread of lucene score

2008-11-19 Thread Mark Miller
excitingComm2 wrote: Hi everybody, as far as I know the lucene score is an arbitrary number between 0.0 and 1.0. Is this correct, that the scores in my resultset are always normalised to this spread or is it possible to get higher scores? Regards, John W. Hits is the class that did the norma
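A rough sketch of the kind of normalization being discussed, written without Lucene. It assumes the behaviour of the deprecated Hits class in Lucene 2.x as I understand it: scores are divided by the top raw score only when that top score exceeds 1.0, and are otherwise left alone. The class and method names are hypothetical, and the assumption is worth checking against the Hits source for your version.

```java
import java.util.Arrays;

class HitsStyleNormalization {
    /**
     * Scales raw scores so the best hit becomes 1.0, but only when the
     * best raw score is above 1.0 (assumed Hits behaviour, see lead-in).
     */
    static float[] normalize(float[] rawScores) {
        float max = 0.0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        // No scaling when the top score is already at or below 1.0.
        float scoreNorm = (rawScores.length > 0 && max > 1.0f) ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * scoreNorm;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] queryA = {4.0f, 2.0f, 1.0f}; // made-up raw scores, top score above 1.0
        float[] queryB = {0.8f, 0.4f};       // made-up raw scores, top score below 1.0
        System.out.println(Arrays.toString(normalize(queryA))); // [1.0, 0.5, 0.25]
        System.out.println(Arrays.toString(normalize(queryB))); // [0.8, 0.4]
    }
}
```

Under this assumption, the top hit of query A and the top hit of query B end up at 1.0 and 0.8 even though their raw scores were 4.0 and 0.8, which is one reason normalized scores say little about relative confidence across queries.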

Spread of lucene score

2008-11-19 Thread excitingComm2
Hi everybody, as far as I know the lucene score is an arbitrary number between 0.0 and 1.0. Is this correct, that the scores in my resultset are always normalised to this spread or is it possible to get higher scores? Regards, John W. -- View this message in context: http://www.nabble.com/Spre

Re: InstatiatedIndex questions

2008-11-19 Thread David Causse
Hi Karl, The reset() problem is not very problematic I can adapt our TokenStreams. For the Serialization : as we need to share very small indexes (200 docs max) in a cluster we need to serialize something. I was planning to use the Java Serialization with maybe some compression on the resulting

Re: Special characters prevent entity being indexed

2008-11-19 Thread Erick Erickson
I'm going to have to punt on what Hibernate does/doesn't do since I have no experience there. But in general analyzers are very important. StandardAnalyzer, for instance, tries to recognize e-mail addresses. So it'll create some very interesting tokens, some that are unexpected unless you really k

Re: InstatiatedIndex questions

2008-11-19 Thread karl wettin
Hi David, thanks for the report! I suppose you speak of IndexWriter vs InstantiatedIndexWriter? These are definitely considered discrepancy problems. I've created a new issue in the tracker: http://issues.apache.org/jira/browse/LUCENE-1462 For what reason do you try to serialize the InstantatedIn

Re: Special characters prevent entity being indexed

2008-11-19 Thread Pekka Nykyri
Thanks for the quick answer! I haven't specified the analyzer, so it should be the StandardAnalyzer. I forgot to mention that I'm using Lucene via Hibernate Search, where I can easily define the fields in the Hibernate POJO classes. But as far as I know this shouldn't change things that much bec

Re: 2.4 Performance

2008-11-19 Thread Michael McCandless
Can you describe the queries in more detail? Can you narrow down exactly which operations / types of queries are substantially slower? Also, I'm assuming both of you are NOT on Windows? NIOFSDirectory has poor performance on Windows due to this bug in Sun's JVM: http://bugs.sun.com/

InstatiatedIndex questions

2008-11-19 Thread David Causse
Hi, Here are some differences I noticed between InstantiatedIndex and RAMDirectory: - RAMDirectory seems to call reset() on TokenStreams the first time, which allows some objects to be initialised before streaming starts; InstantiatedIndex does not. - I can serialize a RAMDirectory but I cannot

Re: 2.4 Performance

2008-11-19 Thread Eric Bowman
[EMAIL PROTECTED] wrote: > On an index of around 20 gigs I've been seeing a performance drop of > around 35% after upgrading to 2.4 (measured on ~1 requests > identical requests, executed in parallel against a threaded lucene / > apache setup, after a roughly 1 query warmup). The principal

Re: Searching repeating fields

2008-11-19 Thread Eran Sevi
If you don't have a lot of entries for each invoice you can duplicate the invoice for each entry - you'll have some field duplications (and bigger index size) between the different invoices but it'll be easy to find exactly what you want. If you have too many different values, I built a solution s