Interesting question. Does zero-padding make primary key lookups faster or
slower in Lucene?
From my tests it would seem that non-padded keys are quicker to look up than
zero-padded ones (tested with random access on indexes of varying sizes, up
to 5m unique keys).
However I imagine there could
Hi
You may put two fields in your document: one containing the decoded values
and the other the original values. You still need to implement your own
query builder, though, so that searches specify the decoded field while the
original field is shown to the user.
On 9/18/07, [EMAIL PROTECTED] <[EMAIL PROTECTE
: > : I want to use INT sorting instead, but these strings can not be parsed
: > : back into integers by Java's built in parsing functions, which is
: >
: > 1) Take a look at FieldCache.IntParser and
: > FieldCache.getInts(IndexReader,String,IntParser) .. you can use it in your
: > own custom Sort
On 6/9/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: I have an integer field that I've indexed after converting to a string
: using NumberTools.longToString().
: Now I want to sort my results using this field. Everything works when
: treating the field as a string, but is very slow and memor
: I have an integer field that I've indexed after converting to a string
: using NumberTools.longToString().
: Now I want to sort my results using this field. Everything works when
: treating the field as a string, but is very slow and memory intensive.
:
: I want to use INT sorting instead, but
> An int will be stored as a 2 char string which will be sorted "char by char"
> so they will be almost as fast as sorting as integers.
John, two problems:
1) Memory consumption - string sorting uses String[] instead of int[]
2) Lucene uses UTF-8 to store strings, and you can't round-trip
arbit
Doug Cutting writes (3/22/2005 10:05 AM):
Chuck Williams wrote:
If there is going to be any generalization to built-in sorting
representations, I'd like to suggest two things be included:
1. Fix issue 34028 (delete the one word "final")
Done.
Thank you!
2. Include a provision for query-time
Doug Cutting apache.org> writes:
> I'd like to see benchmarks that demonstrate the improvement before we
> consider including such a patch. You're making a lot of assumptions
> about where time is spent performing numeric searching and sorting.
> Sort and RangeFilter are already pretty effici
Chuck Williams wrote:
If there is going to be any generalization to built-in sorting
representations, I'd like to suggest two things be included:
1. Fix issue 34028 (delete the one word "final")
Done.
2. Include a provision for query-time parameters
Can you provide a proposal?
Doug
--
John Patterson writes (3/22/2005 12:56 AM):
It would be great if this could be incorporated into Lucene as it will make
numeric searches much more efficient. I will soon need to store simple
geographical data in my index to do a "find the nearest x" type of search.
I just added "find the neares
John Patterson wrote:
It would be great if this could be incorporated into Lucene as it will make
numeric searches much more efficient.
I'd like to see benchmarks that demonstrate the improvement before we
consider including such a patch. You're making a lot of assumptions
about where time is sp
Chris Hostetter fucit.org> writes:
> I haven't worked through the math to prove to myself that your algorithm
> is a viable way of expressing any Integer as a 4 byte String, such that
> any two Integers sort lexicographically correctly as strings ... but let's
> assume that I have, and that it works
: > I can see in FieldDocSortedHitQueue where the case statement deals with
: > the various types of SortField, but at that point it's comparing FieldDoc
: > objects whose fields[i] is expected to already be an "Integer" object.
: > where is that "Integer" object parsed from the String value of th
Chris Hostetter fucit.org> writes:
>
> So why couldn't a user specified NumberFormat object be used to
> convert that string into an Integer? Allowing people to format
> their numbers in a way that sorts lexicographically for Range Filters,
> but still get the good Numeric Sot
: One annoyance I have run across is the impedance mismatch between
: range queries and sorting.
:
: If your terms are indexed as standard numbers, then integer sorting
: is fast, but range queries don't work (for negative values). If you
: format the terms such that range queries work for any i
: One annoyance I have run across is the impedance mismatch between
: range queries and sorting.
:
: If your terms are indexed as standard numbers, then integer sorting
: is fast, but range queries don't work (for negative values). If you
: format the terms such that range queries work for any in
Erik Hatcher ehatchersolutions.com> writes:
> Lucene's index works with any String. But, when dealing with numbers
> and dates such that range queries work, they need to be formatted in a
> way that makes them orderable.
What I am suggesting here is storing numeric values as unsigned binary v
Using a zero-padded number like 0001 is fine for both range queries
and for integer-based sorting. Are you finding otherwise?
Erik
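
A minimal sketch of the zero-padding idea, in plain Java with no Lucene
dependency. The class and method names (ZeroPaddedKeys, pad, inRange) are
hypothetical and only illustrate the point: once non-negative numbers are
padded to a fixed width, plain String comparison agrees with numeric order,
so both sorting and inclusive range checks behave as expected. Negative
values are rejected here because they would need a different encoding.

```java
import java.util.Arrays;

class ZeroPaddedKeys {
    // Hypothetical helper: left-pads a non-negative value with zeros to a fixed width.
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value does not fit in width " + width);
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // Inclusive range test using plain String comparison only.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String[] terms = {pad(9, 4), pad(10, 4), pad(100, 4), pad(2, 4)};
        Arrays.sort(terms);
        System.out.println(Arrays.toString(terms)); // [0002, 0009, 0010, 0100]
        System.out.println(inRange(pad(10, 4), pad(2, 4), pad(100, 4))); // true
    }
}
```

The width has to be chosen up front and used for every value in the field;
mixing widths brings back the ordering problem that padding was meant to fix.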
On Mar 18, 2005, at 12:46 PM, Yonik Seeley wrote:
There is prefix compression used on term values. So you could pad
numbers with lots of leading zeros a
> There is prefix compression used on term values. So you could pad
> numbers with lots of leading zeros and not incur much additional
> size... 0001, for example.
Interesting...
One annoyance I have run across is the impedance mismatch between
range queries and sorting.
If your terms a
On Mar 18, 2005, at 11:21 AM, John Patterson wrote:
Because Lucene deals with Strings, ordered lexicographically.
I thought lexicographical ordering simply used the Unicode value of the
chars and so would also work with non-alphanumeric strings.
Lucene's index works with any String. But, when deal
> Because Lucene deals with Strings, ordered lexicographically.
I thought lexicographical ordering simply used the Unicode value of the chars and
so would also work with non-alphanumeric strings.
> Is there an issue you're encountering?
No issue - I will soon need to add a lot of unstored numeric
Because Lucene deals with Strings, ordered lexicographically.
Is there an issue you're encountering?
Erik
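
For readers unfamiliar with the point being made, here is a small sketch in
plain Java (no Lucene involved) of what lexicographic ordering does to
unformatted numbers. The class name LexicographicOrderDemo is made up for
illustration; the comparison itself is just Java's built-in String ordering,
which compares char by char.

```java
import java.util.Arrays;

class LexicographicOrderDemo {
    // Sorts a copy of the input using String's natural (char-by-char) ordering.
    static String[] sortAsStrings(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] terms = {"9", "10", "100", "2"};
        // Numeric order would be 2, 9, 10, 100; String order is different.
        System.out.println(Arrays.toString(sortAsStrings(terms))); // [10, 100, 2, 9]
    }
}
```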
On Mar 18, 2005, at 4:31 AM, John Patterson wrote:
Hi all,
I was wondering why NumberTools and DateTools create strings restricted to
alphanumeric values?
John.
22 matches