You still have a disk seek per doc if the index can't fit in memory
(usually more costly than reading the fields).
Why not use FieldCache?
-Mike
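For context, a minimal sketch of the FieldCache approach Mike is suggesting, written against the Lucene 2.x API of the period; the field name "id" and the index path are placeholders:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class FieldCacheSketch {
    public static void main(String[] args) throws Exception {
        // FieldCache reads every value of the field from the inverted
        // index once and caches the array for the lifetime of the
        // IndexReader, so per-hit lookups become a plain array access
        // instead of a stored-field disk seek.
        IndexReader reader = IndexReader.open(args[0]);
        String[] ids = FieldCache.DEFAULT.getStrings(reader, "id");

        int docNum = 0; // a Lucene document number taken from a hit
        System.out.println(ids[docNum]);
        reader.close();
    }
}
```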
On 2-Aug-07, at 5:41 PM, Mark Miller wrote:
If you are just retrieving your custom id and you have more stored
fields (and they are not tiny) yo
On Aug 3, 2007, at 9:47 AM, tierecke wrote:
Hi,
Can I know in how many documents a term appears (DF - Document
Frequency)?
Does Lucene keep it? Can I retrieve it?
See the TermEnum class (IndexReader.terms()).
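Concretely, that answer might look like this (a sketch against the Lucene 2.x API; the field and term names are made up):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class DocFreqSketch {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(args[0]);

        // Document frequency of one known term:
        int df = reader.docFreq(new Term("contents", "lucene"));
        System.out.println("df(contents:lucene) = " + df);

        // Or walk all terms; TermEnum exposes docFreq() per term:
        TermEnum terms = reader.terms();
        while (terms.next()) {
            System.out.println(terms.term() + " appears in "
                    + terms.docFreq() + " docs");
        }
        terms.close();
        reader.close();
    }
}
```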
Now - an even more advanced question:
Since I have a 77GB index, I cut it into
Hi, and thanks in advance for any help.
I'm fairly new to Lucene, so excuse the ignorance. I'm attempting to
field XML documents with nested fields. So:
<foo><bar>This</bar><bat>That</bat></foo>
would give me hits for:
bar:This
bat:That
foo:ThisThat
The only way I can see of doing this now is to field each
eleme
qaz zaq wrote:
> I have Search Terms: T1, T2... Tn. Also I have document fields of F1 F2... Fm.
>
> I want to find matching documents across the F1 to Fm fields, i.e., all of
> T1, T2, ... Tn need to be matched, but the matches may fall in any
> combination of the fields.
>
> I check the MultiFie
Heh. I suppose I'll defer to your judgment. In my mind, the simple
system to build is to just buffer the adds and buffer the deletes, then
later apply the adds and apply the deletes (or the reverse). I am sure something
in Solr would have a more sophisticated process, but my guess was about
what the new L
I have Search Terms: T1, T2... Tn. Also I have document fields of F1 F2... Fm.
I want to find matching documents across the F1 to Fm fields, i.e., all of
T1, T2, ... Tn need to be matched, but the matches may fall in any
combination of the fields.
I checked the MultiFieldQueryParser; it doesn't app
Ah, good way!
On 8/4/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
>
> On Friday 03 August 2007 20:35, Shailendra Sharma wrote:
> > Paul,
> >
> > If I understand Cedric right, he wants to have different boosting
> depending
> > on search term positions in the document. By using SpanFirstQuery he
>
On Friday 03 August 2007 20:35, Shailendra Sharma wrote:
> Paul,
>
> If I understand Cedric right, he wants to have different boosting depending
> on search term positions in the document. By using SpanFirstQuery he will
> only be able to consider terms up to a particular position;
> but he won
Paul,
If I understand Cedric right, he wants to have different boosting depending
on search term positions in the document. By using SpanFirstQuery he will
only be able to consider terms up to a particular position, but he won't be
able to do something like following:
a) Give 100% boosting to ma
On 3-Aug-07, at 3:27 AM, Mark Miller wrote:
Also, IndexWriter probably buffers better than you would. If you
buffer a delete with IndexWriter and then add a document that would
be removed by that delete right after, when the buffered deletes
are flushed, your latest doc will not be removed
Sometimes I feel stupid! ;)
Thank you very much!
Luca
testn wrote:
Boost must be a Map<String, Float>.
Luca123 wrote:
Hi all,
I've always used the MultiFieldQueryParser class without problems, but
now I'm experiencing a strange problem.
This is my code:
Map boost = new HashMap();
boost.put("field1",5);
boos
The textmining library (textmining.org) for Word docs should work fine
with non-English text as well. Let me know if it doesn't.
On 8/2/07, Ben Litchfield <[EMAIL PROTECTED]> wrote:
> In terms of PDF documents...
>
> PDFBox should work just fine with any Latin-based languages; at this
> time certai
Boost must be a Map<String, Float>.
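The fix testn is pointing at, sketched against the Lucene 2.x MultiFieldQueryParser constructor that accepts a boosts map (its values must be Floats; the field names and query string are placeholders):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Query;

public class BoostSketch {
    public static void main(String[] args) throws Exception {
        // The boosts map must hold Float values; putting a bare int
        // (autoboxed to Integer) is what breaks the parser's lookup.
        Map boosts = new HashMap();
        boosts.put("field1", new Float(5));
        boosts.put("field2", new Float(1));

        String[] fields = { "field1", "field2" };
        MultiFieldQueryParser parser =
                new MultiFieldQueryParser(fields, new StandardAnalyzer(), boosts);
        Query q = parser.parse("some query");
        System.out.println(q);
    }
}
```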
Luca123 wrote:
>
> Hi all,
> I've always used the MultiFieldQueryParser class without problems, but
> now I'm experiencing a strange problem.
> This is my code:
>
> Map boost = new HashMap();
> boost.put("field1",5);
> boost.put("field2",1);
>
> Analyzer analyzer = new Standa
Cedric,
You can choose the end limit for SpanFirstQuery yourself.
Regards,
Paul Elschot
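For illustration, a SpanFirstQuery with a self-chosen end limit, per Paul's point (Lucene 2.x API; the field, term, and the limit 10 are arbitrary):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanFirstSketch {
    public static void main(String[] args) {
        // Matches "lucene" only when it occurs within the first 10
        // positions of the "body" field; the end limit is yours to
        // choose per query.
        SpanFirstQuery q = new SpanFirstQuery(
                new SpanTermQuery(new Term("body", "lucene")), 10);
        System.out.println(q);
    }
}
```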
On Friday 03 August 2007 05:38, Cedric Ho wrote:
> Hi Paul,
>
> Isn't SpanFirstQuery only match those with position less than a
> certain end position?
>
> I am rather looking for a query that would score
I fixed my question later: I meant that I did not STORE the documents themselves.
Anyway, the issue is already solved, thanks to testn.
But there are new hard (for me) questions.
Thanks a lot!
Erick Erickson wrote:
>
> I indexed a large number of large documents, but I did not index the
> document the
This is really confusing since it's self-contradictory. Could you
post the lines where you do the document.add() for the fields in
question?
Best
Erick
On 8/3/07, tierecke <[EMAIL PROTECTED]> wrote:
>
>
> Hi,
>
> I indexed a large number of large documents, but I did not index the
> documen
Hi all,
I've always used the MultiFieldQueryParser class without problems, but
now I'm experiencing a strange problem.
This is my code:
Map boost = new HashMap();
boost.put("field1",5);
boost.put("field2",1);
Analyzer analyzer = new StandardAnalyzer(STOP_WORDS);
String[] s_fields = new String[2
Thanks a lot, that works 100%!...
Fortunately, I did use the flag to state that Lucene should store the term
frequency vector. Otherwise, I'd have to index 77GB right now... :-)
--
View this message in context:
http://www.nabble.com/Get-the-terms-and-frequency-vector-of-an-indexed-but-unstored-f
Hi,
Can I know in how many documents a term appears (DF - Document Frequency)?
Does Lucene keep it? Can I retrieve it?
thanks a lot from Amsterdam,
Nir.
We're planning on using encryption at the filesystem level (whole-disk
encryption) and, to be honest, I don't have a mechanism that can produce the
changes I'm talking about. Neither does my boss, unfortunately ;) He came
along one day and asked, "how do we know when data changed on disk without
You can use IndexReader.getTermFreqVectors(int n) to get all terms and their
frequencies. Make sure that when you create the index you choose to store
term vectors by specifying a Field.TermVector option.
Check out http://www.cnlp.org/presentations/slides/AdvancedLuceneEU.pdf
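In code, both halves of that answer might look like this (Lucene 2.x API; the field name "contents" is a placeholder):

```java
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

public class TermVectorSketch {
    // At index time: ask Lucene to store the term vector for the field.
    static Field makeField(String text) {
        return new Field("contents", text,
                Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES);
    }

    // At read time: fetch the terms and their frequencies for one doc.
    static void dump(IndexReader reader, int docNum) throws Exception {
        TermFreqVector tfv = reader.getTermFreqVector(docNum, "contents");
        String[] terms = tfv.getTerms();
        int[] freqs = tfv.getTermFrequencies();
        for (int i = 0; i < terms.length; i++) {
            System.out.println(terms[i] + " -> " + freqs[i]);
        }
    }
}
```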
tierecke wrote:
>
> Hi,
>
Also, IndexWriter probably buffers better than you would. If you buffer
a delete with IndexWriter and then add a document that would be removed
by that delete right after, when the buffered deletes are flushed, your
latest doc will not be removed. It's unlikely your own buffer system
would work
Hi,
I indexed a large number of large documents, but I did not store the
documents themselves, just indexed them.
Now I am interested in getting the vector (i.e., the terms indexed and their
frequencies) of that indexed but unstored field.
doc.getField(fieldname) returns null.
How can I get the data?