Re: NumberTools

2008-01-16 Thread mark harwood
Interesting question. Does zero-padding make primary key lookups faster or slower in Lucene? From my tests it would seem that non-padded keys are quicker to look up than zero-padded ones (tested doing random access on indexes of varying sizes up to 5m unique keys). However I imagine there could

Re: NumberTools - Range Searches

2007-09-18 Thread Mohammad Norouzi
Hi, you may put two fields in your document: one containing the decoded values and another the original values. You still need to implement your own query builder, so that searches specify the decoded field while the original field is shown to the user. On 9/18/07, [EMAIL PROTECTED] <[EMAIL PROTECTE

Re: Numbertools and efficient sorting

2006-06-11 Thread Chris Hostetter
: > : I want to use INT sorting instead, but these strings can not be parsed back into integers by Java's built in parsing functions, which is : > 1) Take a look at FieldCache.IntParser and FieldCache.getInts(IndexReader,String,IntParser) .. you can use it in your own custom Sort

Re: Numbertools and efficient sorting

2006-06-10 Thread Benjamin Stein
On 6/9/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : I have an integer field that I've indexed after converting to a string using NumberTools.longToString(). Now I want to sort my results using this field. Everything works when treating the field as a string, but is very slow and memor

Re: Numbertools and efficient sorting

2006-06-09 Thread Chris Hostetter
: I have an integer field that I've indexed after converting to a string using NumberTools.longToString(). Now I want to sort my results using this field. Everything works when treating the field as a string, but is very slow and memory intensive. : I want to use INT sorting instead, but

Re: NumberTools

2005-03-24 Thread Yonik Seeley
> An int will be stored as a 2 char string which will be sorted "char by char" so they will be almost as fast as sorting as integers. John, two problems: 1) Memory consumption - string sorting uses String[] instead of int[] 2) Lucene uses UTF-8 to store strings, and you can't round-trip arbit

Re: NumberTools

2005-03-22 Thread Chuck Williams
Doug Cutting writes (3/22/2005 10:05 AM): Chuck Williams wrote: If there is going to be any generalization to built-in sorting representations, I'd like to suggest two things be included: 1. Fix issue 34028 (delete the one word "final") Done. Thank you! 2. Include a provision for query-time

Re: NumberTools

2005-03-22 Thread John Patterson
Doug Cutting writes: > I'd like to see benchmarks that demonstrate the improvement before we consider including such a patch. You're making a lot of assumptions about where time is spent performing numeric searching and sorting. Sort and RangeFilter are already pretty effici

Re: NumberTools

2005-03-22 Thread Doug Cutting
Chuck Williams wrote: If there is going to be any generalization to built-in sorting representations, I'd like to suggest two things be included: 1. Fix issue 34028 (delete the one word "final") Done. 2. Include a provision for query-time parameters Can you provide a proposal? Doug --

Re: NumberTools

2005-03-22 Thread Chuck Williams
John Patterson writes (3/22/2005 12:56 AM): It would be great if this could be incorporated into Lucene as it will make numeric searches much more efficient. I will soon need to store simple geographical data in my index to do a "find the nearest x" type of search. I just added "find the neares

Re: NumberTools

2005-03-22 Thread Doug Cutting
John Patterson wrote: It would be great if this could be incorporated into Lucene as it will make numeric searches much more efficient. I'd like to see benchmarks that demonstrate the improvement before we consider including such a patch. You're making a lot of assumptions about where time is sp

Re: NumberTools

2005-03-22 Thread John Patterson
Chris Hostetter writes: > I haven't worked through the math to prove to myself that your algorithm is a viable way of expressing any Integer as a 4 byte String, such that any two Integers sort lexicographically correctly as strings ... but let's assume that I have, and that it works

Re: NumberTools

2005-03-22 Thread Chris Hostetter
: > I can see in FieldDocSortedHitQueue where the case statement deals with the various types of SortField, but at that point it's comparing FieldDoc objects whose fields[i] is expected to already be an "Integer" object. Where is that "Integer" object parsed from the String value of th

Re: NumberTools

2005-03-21 Thread John Patterson
Chris Hostetter writes: > So why couldn't a user specified NumberFormat object be used to convert that string into an Integer? Allowing people to format their numbers in a way that sorts lexicographically for Range Filters, but still get the good Numeric Sot

Re: NumberTools

2005-03-21 Thread Chuck Williams
: One annoyance I have run across is the impedance mismatch between range queries and sorting. : If your terms are indexed as standard numbers, then integer sorting is fast, but range queries don't work (for negative values). If you format the terms such that range queries work for any i

Re: NumberTools

2005-03-21 Thread Chris Hostetter
: One annoyance I have run across is the impedance mismatch between range queries and sorting. : If your terms are indexed as standard numbers, then integer sorting is fast, but range queries don't work (for negative values). If you format the terms such that range queries work for any in
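To make the negative-value problem concrete, here is a minimal sketch of one way to format signed longs so that string order matches numeric order. It is not NumberTools' actual encoding and not code from this thread: the class name SortableLongSketch, its method names, and the choice of sign-bit flipping with fixed-width hex are all assumptions made for illustration.

```java
import java.util.Locale;

// Illustrative sketch only; not Lucene's NumberTools implementation.
final class SortableLongSketch {

    // Flip the sign bit so that signed order matches unsigned order,
    // then write a fixed-width, 16-digit lowercase hex string.
    static String encode(long value) {
        return String.format(Locale.ROOT, "%016x", value ^ Long.MIN_VALUE);
    }

    // Reverse the encoding: parse the hex digits and flip the sign bit back.
    static long decode(String term) {
        return Long.parseUnsignedLong(term, 16) ^ Long.MIN_VALUE;
    }

    // Inclusive range check performed purely on the encoded strings,
    // the way a term range comparison would see them.
    static boolean inRange(long value, long lower, long upper) {
        String term = encode(value);
        return term.compareTo(encode(lower)) >= 0
            && term.compareTo(encode(upper)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(encode(-5) + " sorts before " + encode(3) + ": "
            + (encode(-5).compareTo(encode(3)) < 0));
        System.out.println("decode(encode(-5)) = " + decode(encode(-5)));
        System.out.println("-5 in [-10, 10]: " + inRange(-5, -10, 10));
    }
}
```

The trade-off raised in the thread still applies to a scheme like this: the encoded terms are strings, so a numeric sort would need a custom parser to turn them back into numbers.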

Re: NumberTools

2005-03-18 Thread John Patterson
Erik Hatcher writes: > Lucene's index works with any String. But, when dealing with numbers and dates such that range queries work, they need to be formatted in a way that makes them orderable. What I am suggesting here is storing numeric values as unsigned binary v

Re: NumberTools

2005-03-18 Thread Erik Hatcher
Using a zero-padded number like 0001 is fine for both range queries and for integer-based sorting. Are you finding otherwise? Erik On Mar 18, 2005, at 12:46 PM, Yonik Seeley wrote: There is prefix compression used on term values. So you could pad numbers with lots of leading zeros a
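As a rough illustration of the zero-padding point (not code from the thread): plain Java string comparison orders unpadded numbers character by character, so "10" sorts before "9", while fixed-width zero-padded forms sort in numeric order. The class name ZeroPadSketch, its method names, and the width of 10 digits are assumptions chosen for the example, and the sketch only covers non-negative values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; names and padding width are hypothetical.
final class ZeroPadSketch {

    // Left-pad a non-negative int with zeros to a fixed width of 10 digits.
    static String pad(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        return String.format(Locale.ROOT, "%010d", value);
    }

    // Pad each value, then sort the resulting strings lexicographically.
    static List<String> sortedPadded(int... values) {
        List<String> terms = new ArrayList<>();
        for (int v : values) {
            terms.add(pad(v));
        }
        Collections.sort(terms); // String.compareTo compares char by char
        return terms;
    }

    public static void main(String[] args) {
        // Unpadded: "10" compares as less than "9" because '1' < '9'.
        System.out.println("\"10\" before \"9\": " + ("10".compareTo("9") < 0));
        // Padded: 9 compares as less than 10, matching numeric order.
        System.out.println("pad(9) before pad(10): " + (pad(9).compareTo(pad(10)) < 0));
        System.out.println(sortedPadded(100, 9, 10, 1));
    }
}
```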

Re: NumberTools

2005-03-18 Thread Yonik Seeley
> There is prefix compression used on term values. So you could pad numbers with lots of leading zeros and not incur much additional size... 0001, for example. Interesting... One annoyance I have run across is the impedance mismatch between range queries and sorting. If your terms a

Re: NumberTools

2005-03-18 Thread Erik Hatcher
On Mar 18, 2005, at 11:21 AM, John Patterson wrote: Because Lucene deals with Strings, lexicographically ordered. I thought lexicographical ordering simply used the Unicode value of the chars and so would also work with non-alphanumeric strings. Lucene's index works with any String. But, when deal

Re: NumberTools

2005-03-18 Thread John Patterson
> Because Lucene deals with Strings, lexicographically ordered. I thought lexicographical ordering simply used the Unicode value of the chars and so would also work with non-alphanumeric strings. > Is there an issue you're encountering? No issue - I will soon need to add a lot of unstored numeric

Re: NumberTools

2005-03-18 Thread Erik Hatcher
Because Lucene deals with Strings, lexicographically ordered. Is there an issue you're encountering? Erik On Mar 18, 2005, at 4:31 AM, John Patterson wrote: Hi all, I was wondering why NumberTools and DateTools create strings restricted to alphanumeric values? John.