On 05/03/2012 19:26, Chris Hostetter wrote:
: very small to occasionally very large. It also might be the case that
: cover letters and e-mails while short might not be really something to
: heavily discount. The lower discount range can be ignored by setting
: the min of any sweet spot to 1.
Hi there,
Is Java7 now safe to use with Lucene? If so, is there a minimum Lucene version
I must use with it?
Thanks,
- Chris
I've posted a self-contained test case to github of a mystery.
git://github.com/bimargulies/lucene-4-update-case.git
The code can be seen at
https://github.com/bimargulies/lucene-4-update-case/blob/master/src/test/java/org/apache/lucene/BadFieldTokenizedFlagTest.java.
I write a doc to an index,
Hi,
Any version of Lucene should be compatible with Java 7, if you use at least
JDK7 update 1. There are some minor issues with older Lucene versions when
*building* the package and *running tests*, but the precompiled binaries are
fine. But you should use Lucene/Solr 3.5 as a minimum, as this one
Under "LUCENE-1458, LUCENE-2111: Flexible Indexing", CHANGES.txt
appears to be missing one critical hint. If you have existing code
that called IndexReader.terms(), where do you start to get a
FieldsEnum?
AtomicReader.fields()
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
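To make the migration concrete, here is a hedged sketch of how an old IndexReader.terms() loop maps onto the new API. It assumes the 4.0 development branch as of this thread (FieldsEnum, Terms.iterator(null), SlowCompositeReaderWrapper, which comes up later in the thread); exact names and signatures may differ between snapshots, so treat this as an outline, not a definitive implementation:

```java
// Starting from a composite reader (e.g. DirectoryReader) you first
// need an atomic view of the index.
AtomicReader atomicReader = SlowCompositeReaderWrapper.wrap(reader);
Fields fields = atomicReader.fields();   // may be null if nothing is indexed
if (fields != null) {
  FieldsEnum fieldsEnum = fields.iterator();
  String field;
  while ((field = fieldsEnum.next()) != null) {
    Terms terms = fields.terms(field);
    TermsEnum termsEnum = terms.iterator(null);
    BytesRef term;
    while ((term = termsEnum.next()) != null) {
      // one (field, term) pair per iteration, replacing the old
      // IndexReader.terms() / TermEnum loop
    }
  }
}
```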
> -Original Message-
> From: Benson Margulies [mailto:bimargul...@gmail.com]
> Sent: Tuesday, March 06, 2012 2:50 PM
> To: java-user@lucene.apache.org
> Subject:
I think MIGRATE.txt talks about this?
Mike McCandless
http://blog.mikemccandless.com
On Tue, Mar 6, 2012 at 8:50 AM, Benson Margulies wrote:
> Under "LUCENE-1458, LUCENE-2111: Flexible Indexing", CHANGES.txt
> appears to be missing one critical hint. If you have existing code
> that called Inde
On Tue, Mar 6, 2012 at 8:56 AM, Uwe Schindler wrote:
> AtomicReader.fields()
I went and read up on AtomicReader in CHANGES.txt. Should I call
SegmentReader.getReader(IOContext)?
I just posted a patch to CHANGES.txt to clarify before I read your
email, shall I improve it to use this instead of
On Tue, Mar 6, 2012 at 9:09 AM, Michael McCandless
wrote:
> I think MIGRATE.txt talks about this?
Yes it does, but it doesn't actually answer the specific question. See
LUCENE-3853 where I added what seems to be missing. If it's somewhere
else in the file I apologize.
>
> Mike McCandless
>
> htt
Oh, I see, I didn't read far enough down. Well, the patch still
repairs a bug in the code fragment relative to the Term enumeration.
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail
I think the issue is that your analyzer is StandardAnalyzer, yet the field
text value is "value-1".
So StandardAnalyzer will tokenize this into two terms: "value" and "1".
But later, you proceed to do TermQueries on "value-1". This term won't
exist... TermQuery and friends that take a Term don't analyze any text.
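A rough, self-contained illustration of that mismatch (plain Java; the regex split below only approximates what StandardAnalyzer does to this particular input, it is not the real analyzer):

```java
import java.util.Arrays;

public class TokenMismatch {
    public static void main(String[] args) {
        // StandardAnalyzer breaks "value-1" at the hyphen, so the index
        // ends up containing two terms, not one.
        String[] indexedTerms = "value-1".split("[^A-Za-z0-9]+");
        System.out.println(Arrays.toString(indexedTerms)); // [value, 1]

        // A TermQuery is not analyzed: it looks up the literal term,
        // which was never indexed, so nothing matches.
        boolean matches = Arrays.asList(indexedTerms).contains("value-1");
        System.out.println(matches); // false
    }
}
```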
Oh, ouch, there's no SegmentReader.getReader, I was reading IndexWriter. Sorry.
On Tue, Mar 6, 2012 at 9:14 AM, Benson Margulies wrote:
> On Tue, Mar 6, 2012 at 8:56 AM, Uwe Schindler wrote:
>> AtomicReader.fields()
On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir wrote:
> I think the issue is that your analyzer is standardanalyzer, yet field
> text value is "value-1"
Robert,
Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
I'll push another copy that shows that it works fine when the
On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies wrote:
> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir wrote:
>> I think the issue is that your analyzer is standardanalyzer, yet field
>> text value is "value-1"
>
> Robert,
>
> Why is this field analyzed at all? It's built with StringField.TYPE_STO
Hi,
MultiFields should only be used (as it is slow) if you know exactly what you
are doing and what the consequences are. There is a change in Lucene 4.0, so
you can no longer get terms and postings from a top-level (composite) reader. More
info is also here: http://goo.gl/lMKTM
Uwe
-
Uwe Sch
On Tue, Mar 6, 2012 at 9:34 AM, Uwe Schindler wrote:
> Hi,
>
> MultiFields should only be used (as it is slow) if you know exactly what you
> are doing and what the consequences are. There is a change in Lucene 4.0, so
> you can no longer get terms and postings from a top-level (composite) reader.
Dear list,
I have a quite specific issue on which I would appreciate very much
having some thoughts before I start the actual implementation. Here's my
task description:
I would like to index corpora that have already been tokenized by an
external tokenizer. This tokenization is stored in an extern
Hmm something is up here... I'll dig. Seems like we are somehow
analyzing StringField when we shouldn't...
Mike McCandless
http://blog.mikemccandless.com
On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir wrote:
> On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies
> wrote:
>> On Tue, Mar 6, 2012 at 9
On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir wrote:
> On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies
> wrote:
>> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir wrote:
>>> I think the issue is that your analyzer is standardanalyzer, yet field
>>> text value is "value-1"
>>
>> Robert,
>>
>> Why is
Hi,
The recommended way to get an atomic reader from a composite reader is to use
SlowCompositeReaderWrapper.wrap(reader). MultiFields is now purely internal. I
think it's only public because the codecs package may need it, otherwise it
should be pkg-private.
-
Uwe Schindler
H.-H.-Meier-Al
String field is analyzed, but with KeywordTokenizer, so all should be fine.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Tuesday, March 06
On Tue, Mar 6, 2012 at 9:47 AM, Uwe Schindler wrote:
> String field is analyzed, but with KeywordTokenizer, so all should be fine.
I filed LUCENE-3854.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -Original Message
On Tue, Mar 6, 2012 at 9:46 AM, Uwe Schindler wrote:
> Hi,
>
> The recommended way to get an atomic reader from a composite reader is to use
> SlowCompositeReaderWrapper.wrap(reader). MultiFields is now purely internal.
> I think it's only public because the codecs package may need it, otherwise
Thanks Benson: looks like the problem revolves around indexing
Document/Fields you get back from IR.document... this has always been
'lossy', but I think this is a real API trap.
Please keep testing :)
On Tue, Mar 6, 2012 at 9:58 AM, Benson Margulies wrote:
> On Tue, Mar 6, 2012 at 9:47 AM, Uwe S
On Tue, Mar 6, 2012 at 10:04 AM, Robert Muir wrote:
> Thanks Benson: looks like the problem revolves around indexing
> Document/Fields you get back from IR.document... this has always been
> 'lossy', but I think this is a real API trap.
>
> Please keep testing :)
Got a suggestion for sneaking arou
On Tue, Mar 6, 2012 at 10:06 AM, Benson Margulies wrote:
> On Tue, Mar 6, 2012 at 10:04 AM, Robert Muir wrote:
>> Thanks Benson: looks like the problem revolves around indexing
>> Document/Fields you get back from IR.document... this has always been
>> 'lossy', but I think this is a real API trap.
I have an ID field that contains about 100,000 unique ids. If I want to
query all records with ids [1-100], how should I be doing this?
I tried doing it the following way:
Query qry = new MultiFieldQueryParser( fi
You'll need to pad your ids to make this work.
01
02
etc.
with a length to match the max you require, now or in the future.
Or, better, upgrade to a recent release and use NumericField.
--
Ian.
On Mon, Mar 5, 2012 at 9:46 PM, Kushal Dave wrote:
> I have an ID field that contains abo
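For what it's worth, a minimal sketch (plain Java, no Lucene) of why the padding matters: string range queries compare terms lexicographically, and zero-padding to a fixed width makes that order agree with numeric order. The width of 6 here is just an example:

```java
public class PaddedIds {
    // Zero-pad an id to a fixed width so string order matches numeric order.
    static String pad(int id, int width) {
        return String.format("%0" + width + "d", id);
    }

    public static void main(String[] args) {
        // Unpadded: "100" sorts before "2", so a string range [1 TO 100]
        // would misbehave on multi-digit ids.
        System.out.println("100".compareTo("2") < 0);             // true: broken order

        // Padded to 6 digits (pick a width covering the largest id you expect):
        System.out.println(pad(2, 6));                            // 000002
        System.out.println(pad(100, 6).compareTo(pad(2, 6)) > 0); // true: correct order
    }
}
```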
I have a number of fields that either only ever have a term frequency of
1, or I don't want them to be disadvantaged if they do have a greater term
frequency, and I never boost the field, so I disable norms for these
fields with Field.Index.ANALYZED_NO_NORMS or
Field.Index.NOT_ANALYZED_NO_NORMS.
B
On 06/03/2012 21:44, Paul Taylor wrote:
I have a number of fields that either only ever have a term frequency
of 1 or I don't want them to be disadvantaged if they do have a greater
term frequency, and I never boost the field so I disable norms for
these fields with Field.Index.ANALYZED_NO_NORM
On 05/03/2012 23:24, Robert Muir wrote:
On Mon, Mar 5, 2012 at 6:01 PM, Paul Hill wrote:
I would definitely not suggest using SSS for fields like legal brief text or
emails where there is huge
variability in the length of the content -- I can't think of any context where a
"short" email is
de
On Tue, Mar 6, 2012 at 5:57 PM, Paul Taylor wrote:
>> Hello,
>>
>> what is previously Similarity in older releases is moved to
>> TFIDFSimilarity: it extends Similarity and exposes a vector-space API,
>> with its same formulas in the javadocs:
>>
>> https://builds.apache.org/view/G-L/view/Lucene/j
i.e. Field length :)
A trivial question maybe: if one uses these flags, does that mean they don't
need to override the computeNorm method as shown in Simon's article on
searchworkings? I am referring to the case when one doesn't want to use norms.
h.
-Original Message-
From: Paul Taylor
I'm running with 3.4 code and have studied up on all the API related to the
optimize() replacements and understand I needn't worry about deleted documents,
but I still want to ask a few things about keeping the index in good shape
and about merge policy.
I have an index with 421163 documents (in