contrib in revision 786474
Thanks for all your help! I will now open another issue for the remaining
(optional) tasks
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
>
We can separately tweak the javadocs...
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene - Java
> Issue Type: New Feature
>
day or two :-)
When committing, I will also remove TrieRange from contrib/search (not included
in patch).
If you want to make javadocs updates, feel free to post an updated patch or do
it after I have committed.
After that I will do some work for NumericField and NumericSortField as well as
moving
[
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1673:
--
Attachment: LUCENE-1673.patch
Here is an intermediate update...
> Move TrieRange to c
ds is controversial, I
think for 2.9, we should only add NumericField at indexing (document
creation) time. So, we don't store a new bit in stored fields file
and the index format is unchanged.
> Move TrieRange to core
> --
>
> Key: LUCENE-167
open for cutting over contrib/spatial to
NumericUtils
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene - Java
>
ic fields, build something
schema-like and put it in contribs. If it's hard to build - Lucene core is to
blame, it's not extensible enough. From my experience, for that purpose it's
okay as it is.
> Move TrieRange to core
> --
>
>
currently exist
and is not necessary for Trie.
Want a convenience method for the user? TrieUtils.createDocumentField(...) ,
same as the sortField currently works.
The current Trie behavior works the same way everything else does in Lucene...
changing that an
) ,
same as the sortField currently works.
The current Trie behavior works the same way everything else does in Lucene...
changing that and encoding types into the index deserves its own issue and
discussion (and something big like that doesn't seem to belong in 2.9 which is
wind
y?
I do agree that retrieving a doc is already "buggy", in that various
things are lost from your index time doc (a well known issue at this
point!), but I don't think we should intentionally make that behavior
even more buggy, if we can help it...
> Move TrieRange to core
> -
my opinion, DateTools has its usage.
OK I agree, we should leave DateTools un-deprecated. If/when we offer
easier integration for Dates w/ Numeric*, we can reconsider
deprecation at that point.
> Move TrieRange to core
> --
>
> Key: LUCENE
e core indexing and
the types of fields... they aren't coupled now except when the generic format
of the index changes (like omitNorms, omitTf, indexed, etc).
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https
even down to the
millisecond). The important thing is: the lower precision terms are not at
common date boundaries.
Because of these different use cases, in my opinion, DateTools has its usage.
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
>
om 0:00 on the day to 0:00 on the following day exclusive.
Couldn't we have a NumericTermQuery for such cases? You have the full
precision term in the index...
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https:
on the day to 0:00 on the following day exclusive.
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene - Java
> Issue Type: New Fe
ic"; then FieldsReader would return a
NumericField.
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene - Java
> Issue Type: New Fe
mpatient, this is how it's used", and then a separate section
detailing how it works, what precisionStep means (and tradeoffs of high/low
values for it), the reference to the full paper, etc.
But we can iterate on the javadocs in the separate issue, too.
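To make the precisionStep tradeoff mentioned above concrete, here is a minimal sketch. It is not the Lucene implementation. It assumes the simplified view that a 64-bit value is indexed once per precision level, where the shift grows in steps of precisionStep and the lowest bits are dropped at each level. How the shift is stored in the term and how signs are handled is left out. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Lucene.
public class PrecisionStepSketch {

    // Returns the value at each precision level: shift = 0, step, 2*step, ... (< 64).
    static List<Long> shiftedValues(long value, int precisionStep) {
        if (precisionStep < 1 || precisionStep > 64) {
            throw new IllegalArgumentException("precisionStep must be in 1..64");
        }
        List<Long> levels = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            levels.add(value >>> shift); // drop the lowest 'shift' bits
        }
        return levels;
    }

    // Number of terms produced per 64-bit value: ceil(64 / precisionStep).
    static int termsPerValue(int precisionStep) {
        return (64 + precisionStep - 1) / precisionStep;
    }

    public static void main(String[] args) {
        System.out.println(shiftedValues(0x1234L, 8));
        System.out.println(termsPerValue(4) + " terms vs " + termsPerValue(16) + " terms");
    }
}
```

In this sketch a step of 4 yields 16 terms per value and a step of 16 yields 4. A smaller step means more indexed terms per value but coarser levels available to cover a range with fewer terms, which is the kind of tradeoff the javadocs section would need to spell out.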
> Move TrieRange to core
>
add a short note to package.html in
analysis and search.
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene - Java
>
hese (NumericField, NumericSortField)
are important to do for 2.9. Maybe others (adding support for the
missing numeric types (byte & short)) can wait.
Let's wrap this one up and move onto the next ones ;)
> Move TrieRange to core
> --
>
>
't have back-compat issues, so it could be
added any time - no need to link it to this issue or to rush it.
I think the same, I should first resolve this and open some more issues :-)
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
>
have back-compat issues, so it could be added
any time - no need to link it to this issue or to rush it.
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
>
about it
- [unneeded SortField factory, parsers] -> extra issue, maybe after 3.0
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene - Java
, 2009 10:51 PM
> To: java-dev@lucene.apache.org
> Subject: Re: [jira] Commented: (LUCENE-1673) Move TrieRange to core
>
> On Mon, Jun 15, 2009 at 4:42 PM, Mark Miller wrote:
>
> > Remember the last time we started to push for 2.9 in Dec/Jan :)
>
> Yes this is very
On Mon, Jun 15, 2009 at 4:42 PM, Mark Miller wrote:
> Remember the last time we started to push for 2.9 in Dec/Jan :)
Yes this is very much on my mind too!!
So maybe, it's a race between the trie* "group" of issues, and the other 28 ;)
Mike
-
Michael McCandless (JIRA) wrote:
We're forking off new 2.9 issues left and
right here!!
Evil :) You guys are like a small team working against me.
We still have 29+- issues to wrap up though, so probably plenty of time.
I hope we can set a rough target date soon though - it really feels like
issue for how to best integrate/default SortField
and FieldCache.
bq. Nevertheless, I would like to remove emphasis from NumericUtils (which is
in reality a helper class).
+1
bq. For bytes, TrieRange is not very interesting, for shorts, maybe, but I
would subsume them during indexing as simple
e in complete, it's really useless. Maybe I will
add a getShift() method to NumericUtils that returns the shift value of a
Token/String. See the java-dev mailing with Yonik.
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https:
pend() in NumericTokenStream? This makes
it really easy to index.
The only good thing of NumericField would be the possibility to automatically
disable TF and Norms per default when indexing.
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
>
r trie indexed fields as well? (Since
SortField, FieldCache support these numeric types too...).
For bytes, TrieRange is not very interesting, for shorts, maybe, but I would
subsume them during indexing as simple integers. You could not speed up
searching, but limit index size a little bit.
bq. C
uses the right parser?
A factory method TrieUtils.getSortField() could also return the right SortField.
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
>
).
Leaving it in util seems OK, since it's used by analysis & searching.
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene
ortField, FieldCache support these numeric types too...).
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene - Java
> Issue Type:
opened after this.
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Sear
the class using any subclass of
java.lang.Number? Because type safety to prevent people from doing things like
new NumericRangeQuery(field,precStep,new Long(val1),new Float(val2)) which may
lead to undefined behaviour. The second problem is missing type safety with
auto-boxing in Jav
w Long(val1),new Float(val2)) which may
lead to undefined behaviour. The second problem is missing type safety with
auto-boxing in Java 5.
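One way to get the type safety discussed here is a private constructor plus one static factory per numeric type, similar in spirit to the newFloatRange(...) factory named elsewhere in this thread. The following is only a sketch under that assumption. TypedRangeSketch, its fields, and the null-means-open-bound convention are hypothetical and are not the Lucene NumericRangeQuery API.

```java
// Hypothetical illustration class, not the Lucene NumericRangeQuery.
public final class TypedRangeSketch<T extends Number & Comparable<T>> {
    private final String field;
    private final T min; // null means an open lower bound (assumption of this sketch)
    private final T max; // null means an open upper bound (assumption of this sketch)

    // Private: callers must go through a factory, so both bounds share one type.
    private TypedRangeSketch(String field, T min, T max) {
        this.field = field;
        this.min = min;
        this.max = max;
    }

    public static TypedRangeSketch<Long> newLongRange(String field, Long min, Long max) {
        return new TypedRangeSketch<>(field, min, max);
    }

    public static TypedRangeSketch<Float> newFloatRange(String field, Float min, Float max) {
        return new TypedRangeSketch<>(field, min, max);
    }

    // Inclusive containment check on both ends.
    public boolean contains(T value) {
        return (min == null || min.compareTo(value) <= 0)
            && (max == null || max.compareTo(value) >= 0);
    }

    public String getField() {
        return field;
    }

    public static void main(String[] args) {
        TypedRangeSketch<Long> range = TypedRangeSketch.newLongRange("price", 1L, 10L);
        System.out.println(range.contains(5L));
        // newLongRange("price", 1L, 2.0f) would not compile: mixed bound types are rejected.
    }
}
```

With this shape, a call mixing a Long and a Float bound fails at compile time instead of leading to undefined behaviour at search time.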
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issu
vent people from doing things like
new NumericRangeQuery(field,precStep,new Long(val1),new Float(val2)) which may
lead to undefined behaviour. The second problem is missing type safety with
auto-boxing in Java 5.
> Move TrieRange to core
> --
>
>
nStep and
friends).
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene - Java
> Issue Type: New Feature
> Components:
l comment, moving TrieRange to core should be moving it to the
core and perhaps renaming the classes if we can think of a better name. Some of
the other stuff belongs in a different issue.
I think this is correct. I will post a patch soon, that leaves TrieUtils alive.
> Move TrieR
about an
analysis.numeric package?
As a general comment, moving TrieRange to core should be moving it to the core
and perhaps renaming the classes if we can think of a better name. Some of the
other stuff belongs in a different issue.
> Move TrieR
with
Shai's and Michael's and Jason's changes here.
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene - Java
>
TrieUtils)
- the default parsers (private) in FieldCache renaming to also PlainText* (but
accessible)
- add TrieUtils.XxxParser to FieldCache (but accessible)
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://i
the
internal encoding name from TrieUtils)
- the default parsers (private) in FieldCache renaming to also PlainText* (but
accessible)
- add TrieUtils.XxxParser to FieldCache (but accessible)
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
>
, as TRIE no longer used) and
the parser instances can be added to FieldCache.
For indexing or querying it is not required for end users, one can use
NumericTokenStream and NumericRangeQuery for all his needs.
So NumberUtils is more internal than before.
Any thoughts?
> Move TrieRange
bq. NumericRangeQuery.newFloatRange(Float a, Float b, precisionStep) and so on.
Could we also do this for a "term range"? Then, we could have a single
RangeQuery that rewrites to the right impl based on what kind of range you are
doing?
(And in fact it could fold in FieldCacheRangeFilter too).
> Move
No we do not have such an issue, as far as I know. Storing some
version/field type info would be great. In this case we could maybe extend
TrieRange in future to use a different encoding or e.g. CSF for the highest
precision (as Michael Busch suggested in Amsterdam).
Because TrieRange was and
s
> > change. EG TermRangeQuery... to emphasize that you use it for
> > non-numbers. The javadocs of TermRangeQuery should point to
> > Int/LongRangeQuery as strongly preferred for numeric ranges.
>
> Cool. For the others, too (FieldCacheRangeQuery).
> {quote}
>
> Yes.
>
>
>
ame RangeQuery to something else, with this
> change. EG TermRangeQuery... to emphasize that you use it for
> non-numbers. The javadocs of TermRangeQuery should point to
> Int/LongRangeQuery as strongly preferred for numeric ranges.
Cool. For the others, too (FieldCacheR
eferred for numeric ranges.
{quote}
Cool. For the others, too (FieldCacheRangeQuery).
There is a lot more to decide, I will keep this issue open a little bit before
starting to work to collect ideas!
> Move TrieRange to core
> --
>
>
ust noticed the code fragment in the javadocs for
LongTrieTokenStream won't compile, because the setValue method is not
available for TokenStream; the stream should be defined as
LongTrieTokenStream, I think?; same with IntTrieTokenStream)
> Move TrieRange to core
>
u could consider putting everything in o.a.l.trie .
I'd prefer to have explicit class names containing Long, Int etc, and also
containing Trie.
I don't know the details of the tokenizing, but AbstractTrieField sounds just
right.
> Move TrieRange to core
> --
ribs, you won't be bound by any other
back-compat policies besides common sense. :)
> Move TrieRange to core
> --
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene -
Move TrieRange to core
--
Key: LUCENE-1673
URL: https://issues.apache.org/jira/browse/LUCENE-1673
Project: Lucene - Java
Issue Type: New Feature
Components: Search
Affects Versions: 2.9
Reporter
uals() and toString() of TrieRangeQueries in
revision 767982.
> Rewrite TrieRange to use MultiTermQuery
> ---
>
> Key: LUCENE-1602
> URL: https://issues.apache.org/jira/browse/LUCENE-1602
>
[
https://issues.apache.org/jira/browse/LUCENE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler resolved LUCENE-1602.
---
Resolution: Fixed
Committed revision 765618.
> Rewrite TrieRange to use MultiTermQu
also added
svn:eol-style to all files in trie and test-trie.
Because this is not yet committed, the patch may still fail to apply, but I
will commit in the next few hours.
> Rewrite TrieRange to use MultiTermQuery
> ---
>
> Key:
, so it's clear you have to provide a MultiTermQuery
yourself (via subclass) to use it.
> Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
>
>
> Key: LUCENE-1603
>
good? MultiTermQueryWrapperFilter or simpler
MultiTermFilter? It's not really either one; it's a mix between a wrapper and the
real filter: it wraps the query, but does the getDocIdSet and TermEnums itself.
> Changes for TrieRange in FilteredTermEnum and MultiTermQuery impr
[
https://issues.apache.org/jira/browse/LUCENE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-1603.
Resolution: Fixed
> Changes for TrieRange in FilteredTermEnum and MultiTermQu
ll commit shortly. Thanks Uwe!
> Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
>
>
> Key: LUCENE-1603
> URL: https://issues.apache.org
drop-in-backwards-compatibility with this patch
applied, with Windows, the checkout through Ant does not work correctly? I also
set the native line ending svn property of the new file in the patch.
The update of the TrieRange follows after this is committed, code not affected,
compiles still fine
those filters MultiTermQueryWrapperFilter (name to be discussed).
> Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
>
>
> Key: LUCENE-1603
> URL: https://is
this, would be to make the *Filter counterparts
subclasses of a new superclass MultiTermFilter, that just passes all methods to
the corresponding query.
That'd be great -- can you reopen this & attach patch?
> Changes for TrieRange in FilteredTermEnum and MultiTermQ
atch?
> Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
>
>
> Key: LUCENE-1603
> URL: https://issues.apache.org/jira/browse/LUCENE-1603
>
the number of terms methods also to the
*Filter counterparts of the MultiTermQueries (as the new methods only
automatically appear in subclasses, but not in related pass-to-query-only
classes)?
In trie-range I have these pass-to-query methods.
> Changes for TrieRange in FilteredTermE
. The ZIP file is
not added, as these methods are not really needed for testing, just for
completeness.
> Rewrite TrieRange to use MultiTermQuery
> ---
>
> Key: LUCENE-1602
> URL: https://issues.apache.org/jira/bro
ooks good.
> Rewrite TrieRange to use MultiTermQuery
> ---
>
> Key: LUCENE-1602
> URL: https://issues.apache.org/jira/browse/LUCENE-1602
> Project: Lucene - Java
> Issue Type: New Feat
bug in the tests (filter term count was
incorrect)
> Rewrite TrieRange to use MultiTermQuery
> ---
>
> Key: LUCENE-1602
> URL: https://issues.apache.org/jira/browse/LUCENE-1602
> Proj
uick pass. I can't compile/test based on the zip
(since we renamed the new method), but I like the new approach.
> Rewrite TrieRange to use MultiTermQuery
> ---
>
> Key: LUCENE-1602
> URL: https://issues.apach
[
https://issues.apache.org/jira/browse/LUCENE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-1603.
Resolution: Fixed
Thanks Uwe!
> Changes for TrieRange in FilteredTermEnum
ways must call
clear first).
> Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
>
>
> Key: LUCENE-1603
> URL: https://issues.apache.org/jira/browse/LUCE
[
https://issues.apache.org/jira/browse/LUCENE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1603:
--
Attachment: LUCENE-1603.patch
Here the patch with the suggested changes.
TrieRange test
ient, it should not be serialized...
> Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
>
>
> Key: LUCENE-1603
> URL: https://issues.apache.org/jira
that
to get "number of unique terms" vs "amount of work (seeks) done".
If we do change it, how about "get/clearTotalNumberOfTerms()"?
> Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
> ---
ange. In case of a change, the method should be called
getCurrentNumberOfTerms() or something like that -- naming is the hardest one.
> Changes for TrieRange in FilteredTermEnum and MultiTermQuery impr
. Please
note: two test files were removed by the patch, so you must remove them by hand.
I tried to patch a fresh checkout of trunk without problems. TortoiseMerge
patched all files without problems.
> Rewrite TrieRange to use MultiTermQu
you instead post a tar file w/ your current *.java?
> Rewrite TrieRange to use MultiTermQuery
> ---
>
> Key: LUCENE-1602
> URL: https://issues.apache.org/jira/browse/LUCENE-1602
> Project: L
d we allow lastNumberOfTerms to be the sum of all
invocations? (Instead of clearing it per segment)? And maybe add a
resetLastNumberOfTerms, in case one wants to re-use a MultiTermQuery and
recheck that count.
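The accumulate-and-reset idea proposed here could look roughly like the sketch below. It is only an illustration: TermCountSketch and addSegmentTerms are hypothetical names, and the get/clear method names are borrowed from the "get/clearTotalNumberOfTerms()" suggestion made elsewhere in this thread, not from committed code.

```java
// Hypothetical illustration class, not the MultiTermQuery implementation.
public class TermCountSketch {
    // Sum of term counts over all per-segment invocations since the last clear.
    private int totalNumberOfTerms = 0;

    // Called once per segment with the number of terms visited in that segment.
    void addSegmentTerms(int termsInSegment) {
        if (termsInSegment < 0) {
            throw new IllegalArgumentException("term count must not be negative");
        }
        totalNumberOfTerms += termsInSegment;
    }

    int getTotalNumberOfTerms() {
        return totalNumberOfTerms;
    }

    // Lets a caller re-use the same query object and recheck the count from zero.
    void clearTotalNumberOfTerms() {
        totalNumberOfTerms = 0;
    }

    public static void main(String[] args) {
        TermCountSketch counter = new TermCountSketch();
        counter.addSegmentTerms(3);
        counter.addSegmentTerms(4);
        System.out.println(counter.getTotalNumberOfTerms());
        counter.clearTotalNumberOfTerms();
        System.out.println(counter.getTotalNumberOfTerms());
    }
}
```

Summing across segments means the caller decides when the count starts over, instead of losing earlier segments' counts on each invocation.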
> Changes for TrieRange in FilteredTermEnum and MultiTermQuery impr
[
https://issues.apache.org/jira/browse/LUCENE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reassigned LUCENE-1603:
--
Assignee: Michael McCandless
> Changes for TrieRange in FilteredTermEnum
[
https://issues.apache.org/jira/browse/LUCENE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-1603:
---
Fix Version/s: 2.9
> Changes for TrieRange in FilteredTermEnum and MultiTermQu
s the old problem of TortoiseSVN on Windows: It
generates CR-LF instead of LF alone. dostounix should help. I always apply
patches using the TortoiseSVN merge function (that also keeps track of correct
local versions from patch header and merges all together).
> Rewrite TrieRange to use Mu
inverse range
- a comparison of TrieRange and classic Range in term numbers
The patch LUCENE-1603 must be applied before.
> Rewrite TrieRange to use MultiTermQuery
> ---
>
> Key: LUCENE-1602
> URL: https://issues.
could be
improved...
> Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
>
>
> Key: LUCENE-1603
> URL: https://issues.apache.org/jira/bro
Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
Key: LUCENE-1603
URL: https://issues.apache.org/jira/browse/LUCENE-1603
Project: Lucene - Java
How did you generate the diff?
> Rewrite TrieRange to use MultiTermQuery
> ---
>
> Key: LUCENE-1602
> URL: https://issues.apache.org/jira/browse/LUCENE-1602
> Project: Lucene - Java
>
both the EMPTY_DOCID_SET optimization and the
getTermCount() addition to MultiTermQuery.
> Rewrite TrieRange to use MultiTermQuery
> ---
>
> Key: LUCENE-1602
> URL: https://issues.apache.org/jira/browse/LUCENE-16
uery:
The original version of TrieRange had a shortcut in the getDocIdSet call: If
the range was inverse and would return for sure no documents, it returned the
DocIdSet.EMPTY_DOCID_SET instance and did not allocate any OpenBitSet.
MultiTermQuery could also do this automatically, if
core:
- Make the private members protected, to have access to them from the very
special TrieRangeTermEnum
- Fix a small inconsistency (docFreq() now only returns a value if a valid
term exists)
- All core tests pass
2. TrieRange patch:
- New TrieRangeQuery classes
Rewrite TrieRange to use MultiTermQuery
---
Key: LUCENE-1602
URL: https://issues.apache.org/jira/browse/LUCENE-1602
Project: Lucene - Java
Issue Type: New Feature
Components: contrib
> > For me it would not be a problem, I would use a FilteredTermEnum and
> > subclass it, but would only implement next() and the other abstract
> methods
> > would be dummies (including difference() returning 1.0f). Only the enum
> and
> > the term should have a protected access or a getter in thi
On Mon, Apr 13, 2009 at 12:05 PM, Uwe Schindler wrote:
> For me it would not be a problem, I would use a FilteredTermEnum and
> subclass it, but would only implement next() and the other abstract methods
> would be dummies (including difference() returning 1.0f). Only the enum and
> the term shou
> > MultiTermQuery has in its protected getEnum() returning
> FilteredTermEnum.
> > For TrieRange, the return should be changed to TermEnum, it is not
> needed to
> > have a FilteredTermEnum (FilteredTermEnum is only an implementation, the
> > method should return an
Uwe Schindler wrote:
MultiTermQuery has in its protected getEnum() returning FilteredTermEnum.
For TrieRange, the return should be changed to TermEnum, it is not needed to
have a FilteredTermEnum (FilteredTermEnum is only an implementation, the
method should return an abstract TermEnum). If this
Hi,
this has now been discussed many times on this list, but I did not get a
resolution on whether we should include TrieRange in the core or not.
When thinking about it and looking in the latest developments about
TrieRange (TokenStreams for indexing), I plan to do the following:
a) Put the classes into the
o on with 831...
Hear, hear!
> Make TrieRange completely independent from Document/Field with TokenStream of
> prefix encoded values
> ---
>
> Key: LUCENE-1582
>
filter tests.
> Make TrieRange completely independent from Document/Field with TokenStream of
> prefix encoded values
> ---
>
> Key: LUCENE-1582
>
this.
Finally: Let's go on with 831... :-)
> Make TrieRange completely independent from Document/Field with TokenStream of
> prefix encoded values
> ---
>
>
ache part... thanks Uwe!
> Make TrieRange completely independent from Document/Field with TokenStream of
> prefix encoded values
> ---
>
> Key: LUCENE-1582
>
ache look OK -- I'll commit shortly. I'll tone back
the javadoc to an Expert/non-back-compat warning. It doesn't matter much since
with LUCENE-831, we should be able to remove it entirely, before releasing 2.9.
> Make TrieRange completely independent from Document/Field with To
think, the changes in FieldCache are OK, can you commit only the
changes to the FieldCache?
> Make TrieRange completely independent from Document/Field with TokenStream of
> prefi
Change: prefixCodedTo...() now accepts CharSequence instead of String
(because only this interface's methods are needed for decoding).
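As a rough illustration of why CharSequence is enough for decoding, the sketch below reads its input using only length() and charAt(), so it works for a String and a StringBuilder alike. The 7-bits-per-char layout, the class name, and the decode method are assumptions made up for this example. The actual prefix-coded format is not shown in this thread.

```java
// Hypothetical illustration class, not the TrieUtils decoder.
public class CharSequenceDecodeSketch {

    // Decodes chars that each carry 7 payload bits, most significant group first
    // (a made-up layout for this sketch). Only CharSequence methods are used.
    static long decode(CharSequence encoded) {
        long value = 0L;
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c > 0x7f) {
                throw new NumberFormatException("invalid char at position " + i);
            }
            value = (value << 7) | c;
        }
        return value;
    }

    public static void main(String[] args) {
        String asString = String.valueOf(new char[] {1, 2});
        StringBuilder asBuilder = new StringBuilder(asString);
        // Both calls compile and agree, without converting the builder to a String.
        System.out.println(decode(asString) == decode(asBuilder));
    }
}
```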
> Make TrieRange completely independent from Document/Field with TokenStream of
> prefix enc