add(CharSequence) in automaton builder

2011-04-01 Thread Dawid Weiss
Mike, can you remember what ordering is required for add(CharSequence)? I see it requires INPUT_TYPE.BYTE4 (assert fst.getInputType() == FST.INPUT_TYPE.BYTE4;), but this would imply the order of full Unicode code points on the input? Is this what String comparators do by default (I doubt, but wanted

Re: add(CharSequence) in automaton builder

2011-04-01 Thread Robert Muir
On Fri, Apr 1, 2011 at 7:58 AM, Dawid Weiss dawid.we...@gmail.com wrote: Mike, can you remember what ordering is required for add(CharSequence)? I see it requires INPUT_TYPE.BYTE4 assert fst.getInputType() == FST.INPUT_TYPE.BYTE4; but this would imply the order of full unicode codepoints on

Re: add(CharSequence) in automaton builder

2011-04-01 Thread Dawid Weiss
(sorry not mike, but) you are right, String.compareTo() compares in utf-16 order by default. this is not consistent with the order the

He, he, thanks Robert. We have these anti-child-abuse commercials on TV right now: you never know who's on the other side... How appropriate for this situation.
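To illustrate the ordering difference discussed here, the following is a minimal sketch using only the JDK, not the builder's own code. The class and method names (CodePointOrderDemo, compareByCodePoint) are hypothetical. It compares two strings code point by code point and contrasts the result with String.compareTo(), which compares UTF-16 code units.

```java
// Hypothetical demo class; not part of Lucene.
class CodePointOrderDemo {

    /** Compares two sequences by Unicode code point instead of by UTF-16 code unit. */
    static int compareByCodePoint(CharSequence a, CharSequence b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int cpA = Character.codePointAt(a, i);
            int cpB = Character.codePointAt(b, j);
            if (cpA != cpB) {
                return Integer.compare(cpA, cpB);
            }
            // Supplementary code points occupy two chars, so advance accordingly.
            i += Character.charCount(cpA);
            j += Character.charCount(cpB);
        }
        // One input is exhausted: the shorter sequence sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";                 // U+FF5E, a single UTF-16 code unit
        String supplementary = "\uD835\uDD04"; // U+1D504, encoded as a surrogate pair

        // UTF-16 order: the high surrogate 0xD835 is below 0xFF5E.
        System.out.println("compareTo sign:          "
                + Integer.signum(bmp.compareTo(supplementary)));
        // Code point order: 0xFF5E is below 0x1D504.
        System.out.println("compareByCodePoint sign: "
                + Integer.signum(compareByCodePoint(bmp, supplementary)));
    }
}
```

The two comparisons disagree only when a supplementary character is compared against a BMP character above the surrogate range (U+E000 to U+FFFF); for all other inputs the two orders coincide.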

Re: add(CharSequence) in automaton builder

2011-04-01 Thread Robert Muir
On Fri, Apr 1, 2011 at 8:25 AM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: Yes, this is what I also figured out. The unicode code point order is also impl. in BytesRef.getUTF8SortedAsUnicodeComparator, correct? For what I need I'll use raw utf8 byte order, it doesn't matter as long as
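The raw UTF-8 byte order mentioned here agrees with Unicode code point order, provided the bytes are compared as unsigned values. The following is a minimal JDK-only sketch of that comparison; the class and method names (Utf8OrderDemo, compareUtf8Bytes) are hypothetical, and it does not use Lucene's BytesRef.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Lucene.
class Utf8OrderDemo {

    /** Encodes both strings as UTF-8 and compares the bytes as unsigned values. */
    static int compareUtf8Bytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            // Java bytes are signed, so mask to 0..255 before comparing.
            int bx = x[i] & 0xFF;
            int by = y[i] & 0xFF;
            if (bx != by) {
                return Integer.compare(bx, by);
            }
        }
        // Common prefix: the shorter byte sequence sorts first.
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";                 // U+FF5E
        String supplementary = "\uD835\uDD04"; // U+1D504

        System.out.println("UTF-8 byte order sign: "
                + Integer.signum(compareUtf8Bytes(bmp, supplementary)));
        System.out.println("UTF-16 order sign:     "
                + Integer.signum(bmp.compareTo(supplementary)));
    }
}
```

The unsigned mask matters: with signed bytes, every multi-byte sequence (lead byte 0xC2 or above) would sort before plain ASCII.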

Re: add(CharSequence) in automaton builder

2011-04-01 Thread Michael McCandless
On Fri, Apr 1, 2011 at 8:29 AM, Robert Muir rcm...@gmail.com wrote: sorry, since you were talking about the charsequence api to builder, i assumed for a second you were working with chars/Strings, and forgot about how this is confusingly mixed with, yet distinct from, the whole BYTE1/BYTE4

Re: add(CharSequence) in automaton builder

2011-04-01 Thread Dawid Weiss
sorry, since you were talking about the charsequence api to builder, i assumed for a second you were working with chars/Strings, and forgot about how this is confusingly mixed with, yet distinct from, the whole BYTE1/BYTE4 selection in builder :) I am working with strings because that's what