date:20070803

Re: Getting only the Ids, not the whole documents.

2007-08-03 Thread Mike Klaas

You still have a disk seek per doc if the index can't fit in memory (usually more costly than reading the fields). Why not use FieldCache? -Mike On 2-Aug-07, at 5:41 PM, Mark Miller wrote: If you are just retrieving your custom id and you have more stored fields (and they are not tiny) yo

Re: How can I get the Document Frequency for a specific term??? And more questions...

2007-08-03 Thread Grant Ingersoll

On Aug 3, 2007, at 9:47 AM, tierecke wrote: Hi, Can I know in how many documents a term appears (DF - Document Frequency)? Does Lucene keep it? Can I retrieve it? See the TermEnum class (IndexReader.terms()). Now - an even more advanced question: Since I have a 77GB index, I cut it into

Nested Fields

2007-08-03 Thread Spencer Tickner

Hi, and thanks in advance for any help. I'm fairly new to Lucene, so excuse the ignorance. I'm attempting to field an XML document with nested fields. So: This That would give me hits for: bar:This bat:That foo:ThisThat The only way I can see of doing this now is to field each eleme

Re: multiple field searcher

2007-08-03 Thread Steven Rowe

qaz zaq wrote: > I have Search Terms: T1, T2... Tn. Also I have document fields of F1 F2... Fm. > > I want to search the match documents across F1 to Fm fields,i.e., all of the > T1, T2, ...Tn need to be matched, but can be in the combination of T1, T2, > ... Tn field. > > I check the MultiFie

Re: Performance improvements using writer.delete vs reader.delete

2007-08-03 Thread Mark Miller

Heh. I suppose I'll defer to your judgment. In my mind, the simple system to make is to just buffer the adds, buffer the deletes - later apply the adds, apply the deletes (or the reverse). I am sure something in Solr would have a more sophisticated process, but my guess was about what the new L

multiple field searcher

2007-08-03 Thread qaz zaq

I have search terms T1, T2, ... Tn and document fields F1, F2, ... Fm. I want to find matching documents across the F1 to Fm fields, i.e., all of T1, T2, ... Tn need to be matched, but can be in the combination of T1, T2, ... Tn field. I checked the MultiFieldQueryParser, it doesn't app

Re: Can I do boosting based on term postions?

2007-08-03 Thread Shailendra Sharma

Ah, good way! On 8/4/07, Paul Elschot <[EMAIL PROTECTED]> wrote: > > On Friday 03 August 2007 20:35, Shailendra Sharma wrote: > > Paul, > > > > If I understand Cedric right, he wants to have different boosting > depending > > on search term positions in the document. By using SpanFirstQuery he >

Re: Can I do boosting based on term postions?

2007-08-03 Thread Paul Elschot

On Friday 03 August 2007 20:35, Shailendra Sharma wrote: > Paul, > > If I understand Cedric right, he wants to have different boosting depending > on search term positions in the document. By using SpanFirstQuery he will > only be able to consider in terms till particular position; > but he won

Re: Can I do boosting based on term postions?

2007-08-03 Thread Shailendra Sharma

Paul, If I understand Cedric right, he wants to have different boosting depending on search term positions in the document. By using SpanFirstQuery he will only be able to consider terms up to a particular position, but he won't be able to do something like the following: a) Give 100% boosting to ma

Re: Performance improvements using writer.delete vs reader.delete

2007-08-03 Thread Mike Klaas

On 3-Aug-07, at 3:27 AM, Mark Miller wrote: Also, IndexWriter probably buffers better than you would. If you buffer a delete with IndexWriter and then add a document that would be removed by that delete right after, when the buffered deletes are flushed, your latest doc will not be removed

Re: strange MultiFieldQueryParser error: java.lang.Integer

2007-08-03 Thread Luca Rondanini

Sometimes I feel stupid! ;) Thank you very much! Luca testn wrote: Boost must be Map Luca123 wrote: Hi all, I've always used the MultiFieldQueryParser class without problems but now i'm experiencing a strange problem. This is my code: Map boost = new HashMap(); boost.put("field1",5); boos

Re: extracting non-english text from word, pdf, etc....??

2007-08-03 Thread Ryan Ackley

The textmining library (textmining.org) for Word docs should work fine with non-english text as well. Let me know if it doesn't On 8/2/07, Ben Litchfield <[EMAIL PROTECTED]> wrote: > In terms of PDF documents... > > PDFBox should work just fine with any latin based languages; at this > time certai

Re: strange MultiFieldQueryParser error: java.lang.Integer

2007-08-03 Thread testn

Boost must be Map Luca123 wrote: > > Hi all, > I've always used the MultiFieldQueryParser class without problems but > now i'm experiencing a strange problem. > This is my code: > > Map boost = new HashMap(); > boost.put("field1",5); > boost.put("field2",1); > > Analyzer analyzer = new Standa
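The java.lang.Integer in the subject line is consistent with a ClassCastException: `boost.put("field1",5)` autoboxes to an Integer, while the boost map is presumably read back as Float values. The following is a minimal plain-Java sketch of that failure mode and of the fix. It does not call Lucene; `readBoost` is a hypothetical stand-in for whatever code consumes the map, and the assumption that the values are cast to Float is mine, not something stated in the thread.

```java
import java.util.HashMap;
import java.util.Map;

final class BoostMapSketch {
    // Hypothetical consumer: assumes every boost value is a Float,
    // so an Integer value fails with ClassCastException at the cast.
    static float readBoost(Map<String, ?> boosts, String field) {
        Object raw = boosts.get(field);
        return ((Float) raw).floatValue();
    }

    // Returns true when reading the boost fails with ClassCastException.
    static boolean failsWithClassCast(Map<String, ?> boosts, String field) {
        try {
            readBoost(boosts, field);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> wrong = new HashMap<>();
        wrong.put("field1", 5);   // autoboxed to Integer

        Map<String, Float> right = new HashMap<>();
        right.put("field1", 5f);  // Float, as the consumer expects

        System.out.println("Integer value fails: " + failsWithClassCast(wrong, "field1"));
        System.out.println("Float value reads as: " + readBoost(right, "field1"));
    }
}
```

Declaring the map with Float values (for example `Map<String, Float>`) lets the compiler reject `put("field1", 5)` before it can fail at run time.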

Re: Can I do boosting based on term postions?

2007-08-03 Thread Paul Elschot

Cedric, You can choose the end limit for SpanFirstQuery yourself. Regards, Paul Elschot On Friday 03 August 2007 05:38, Cedric Ho wrote: > Hi Paul, > > Isn't SpanFirstQuery only match those with position less than a > certain end position? > > I am rather looking for a query that would score

Re: Get the TokenStream of an indexed but unstored field

2007-08-03 Thread tierecke

I fixed my question later. I meant I did not STORE the documents themselves. Anyway - the issue is already solved, thanks to testn. But there are new hard (for me) questions. Thanks a lot! Erick Erickson wrote: > > I indexed a large number of large documents, but I did not index the > document the

Re: Get the TokenStream of an indexed but unstored field

2007-08-03 Thread Erick Erickson

<<>> This is really confusing since it's self-contradictory. Could you post the lines where you do the document.add() for the fields in question? Best Erick On 8/3/07, tierecke <[EMAIL PROTECTED]> wrote: > > > Hi, > > I indexed a large number of large documents, but I did not index the > documen

strange MultiFieldQueryParser error: java.lang.Integer

2007-08-03 Thread Luca Rondanini

Hi all, I've always used the MultiFieldQueryParser class without problems but now I'm experiencing a strange problem. This is my code: Map boost = new HashMap(); boost.put("field1",5); boost.put("field2",1); Analyzer analyzer = new StandardAnalyzer(STOP_WORDS); String[] s_fields = new String[2

Re: Get the terms and frequency vector of an indexed but unstored field

2007-08-03 Thread tierecke

Thanks a lot, that works 100%!... Fortunately, I did use the flag to state that Lucene should store the term frequency vector. Otherwise, I'd have to index 77GB right now... :-) -- View this message in context: http://www.nabble.com/Get-the-terms-and-frequency-vector-of-an-indexed-but-unstored-f

How can I get the Document Frequency for a specific term???

2007-08-03 Thread tierecke

Hi, Can I know in how many documents a term appears (DF - Document Frequency)? Does Lucene keep it? Can I retrieve it? thanks a lot from Amsterdam, Nir. -- View this message in context: http://www.nabble.com/How-can-I-get-the-Document-Frequency-for-a-specific-termtf4212615.html#a11983532
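For readers unfamiliar with the term, document frequency counts how many documents contain a term at least once, regardless of how often it occurs inside each document. In the Lucene API of this period the stored value can be read with `IndexReader.docFreq(Term)` or `TermEnum.docFreq()`. The sketch below only illustrates the definition in plain Java on an in-memory list of strings; the class name, the whitespace tokenization, and the lowercasing are assumptions made for the example and have nothing to do with how Lucene stores the statistic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DocFreqSketch {
    // Counts, for every term, the number of documents that contain it.
    static Map<String, Integer> documentFrequencies(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            // A set, so repeated occurrences inside one document count once.
            Set<String> unique = new HashSet<>(Arrays.asList(doc.toLowerCase().split("\\s+")));
            for (String term : unique) {
                if (!term.isEmpty()) {
                    df.merge(term, 1, Integer::sum);
                }
            }
        }
        return df;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("lucene index", "Lucene lucene search", "index");
        System.out.println(documentFrequencies(docs));
    }
}
```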

Re: How do YOU detect corrupt indexes?

2007-08-03 Thread Joe R

We're planning on using encryption at the filesystem level (whole-disk encryption) and, to be honest, I don't have a mechanism that can produce the changes I'm talking about. Neither does my boss, unfortunately ;) He came along one day and asked, "how do we know when data changed on disk without

Re: Get the terms and frequency vector of an indexed but unstored field

2007-08-03 Thread testn

You can use IndexReader.getTermFreqVectors(int n) to get all terms and their frequencies. Make sure that when you create the index, you choose to store term vectors by specifying a Field.TermVector option. Check out http://www.cnlp.org/presentations/slides/AdvancedLuceneEU.pdf tierecke wrote: > > Hi, >

Re: Performance improvements using writer.delete vs reader.delete

2007-08-03 Thread Mark Miller

Also, IndexWriter probably buffers better than you would. If you buffer a delete with IndexWriter and then add a document that would be removed by that delete right after, when the buffered deletes are flushed, your latest doc will not be removed. It's unlikely your own buffer system would work
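The ordering behaviour described here can be illustrated with a toy buffer in plain Java. This is a sketch of the idea only, not Lucene's implementation: the class, its methods, and the use of a bare string id per document are all hypothetical. The key point is that each buffered delete remembers how many adds were buffered before it, so at flush time it removes only documents added earlier.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BufferedDeleteSketch {
    private final List<String> committed = new ArrayList<>();
    private final List<String> bufferedAdds = new ArrayList<>();
    // Delete id -> number of adds that were buffered when the delete arrived.
    private final Map<String, Integer> bufferedDeletes = new LinkedHashMap<>();

    void add(String id) {
        bufferedAdds.add(id);
    }

    void delete(String id) {
        bufferedDeletes.put(id, bufferedAdds.size());
    }

    void flush() {
        List<String> survivors = new ArrayList<>();
        for (int i = 0; i < bufferedAdds.size(); i++) {
            String id = bufferedAdds.get(i);
            Integer limit = bufferedDeletes.get(id);
            // A buffered add survives unless a later delete targets its id.
            if (limit == null || i >= limit) {
                survivors.add(id);
            }
        }
        // Everything already committed predates every buffered delete.
        committed.removeAll(bufferedDeletes.keySet());
        committed.addAll(survivors);
        bufferedAdds.clear();
        bufferedDeletes.clear();
    }

    List<String> committed() {
        return new ArrayList<>(committed);
    }

    // Delete "doc1", then re-add it before flushing: the re-added copy survives.
    static List<String> demo() {
        BufferedDeleteSketch w = new BufferedDeleteSketch();
        w.add("doc1");
        w.flush();
        w.delete("doc1");
        w.add("doc1");
        w.add("doc2");
        w.flush();
        return w.committed();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A naive buffer that applies all adds first and all deletes afterwards (or the reverse) would either drop the re-added document or keep the stale one, which is the pitfall the message warns about.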

Get the terms and frequency vector of an indexed but unstored field

2007-08-03 Thread tierecke

Hi, I indexed a large number of large documents, but I did not store the documents themselves, just indexed them. Now I am interested in getting the vector (i.e.: the terms indexed and the frequency) of that indexed but unstored field. doc.getField(fieldname) returns null. How can I get the data?

23 matches
