Re: are long words split into up to 256 long tokens?

2010-04-21 Thread jm

RE: are long words split into up to 256 long tokens?

2010-04-21 Thread Uwe Schindler

Re: are long words split into up to 256 long tokens?

2010-04-21 Thread jm
oh, yes it does extend CharTokenizer... thanks Ahmet. I had searched the Lucene source code for 256 and found nothing suspicious, and that was itself suspicious because it clearly looked like an internal limit. Of course I should have searched for 255... I'll see how I proceed because I don't want to use a cus

Re: are long words split into up to 256 long tokens?

2010-04-21 Thread Ahmet Arslan
> Is 256 some inner maximum too in some Lucene internal that causes this? What is happening is that the long word is split into smaller words of up to 256 characters, and then the min and max limits are applied. Is that correct? I have removed LengthFilter and still see the splitting at 256 happen. I w
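The behavior being described is that Lucene's CharTokenizer caps each emitted token at 255 characters (its internal MAX_WORD_LEN constant), so a longer run of characters comes out as consecutive 255-char pieces, and any LengthFilter only sees the pieces afterwards. A minimal stdlib sketch of that two-stage pipeline (not Lucene code itself; `tokenize` and `lengthFilter` here are illustrative stand-ins):

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSplitDemo {
    // CharTokenizer's internal cap: tokens longer than this are split.
    static final int MAX_WORD_LEN = 255;

    // Sketch of the splitting: one long word becomes consecutive 255-char tokens.
    static List<String> tokenize(String word) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < word.length(); i += MAX_WORD_LEN) {
            tokens.add(word.substring(i, Math.min(i + MAX_WORD_LEN, word.length())));
        }
        return tokens;
    }

    // Sketch of LengthFilter: drop tokens whose length falls outside [min, max].
    static List<String> lengthFilter(List<String> tokens, int min, int max) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (t.length() >= min && t.length() <= max) kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        String longWord = "a".repeat(600);  // a 600-char "word"
        List<String> tokens = tokenize(longWord);
        // The split happens inside the tokenizer, before LengthFilter ever
        // runs, which is why removing LengthFilter does not stop it.
        System.out.println(tokens.size());           // 3 pieces: 255 + 255 + 90
        System.out.println(tokens.get(0).length());  // 255
        System.out.println(lengthFilter(tokens, 2, 100)
                .size());                            // only the 90-char tail survives
    }
}
```

This also shows why the observed boundary looks like "about 256": the pieces are exactly 255 characters long, which is the number to search for in the source.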

are long words split into up to 256 long tokens?

2010-04-21 Thread jm
I am analyzing this with my custom analyzer: String s = "mail77 mail8 tc ro45mine durante jjkk
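The custom analyzer itself is not shown in the truncated message, but a quick way to see the reported effect is to compare short words against one long word under a simple whitespace tokenization with the 255-char cap applied. This is a hypothetical stand-in, not the poster's analyzer:

```java
import java.util.ArrayList;
import java.util.List;

public class AnalyzerCheck {
    // Assumed cap, matching CharTokenizer's 255-char token limit.
    static final int MAX_WORD_LEN = 255;

    // Stand-in analyzer: split on whitespace, then cap each token at 255 chars.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            for (int i = 0; i < word.length(); i += MAX_WORD_LEN) {
                out.add(word.substring(i, Math.min(i + MAX_WORD_LEN, word.length())));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Short words from the sample pass through unchanged...
        System.out.println(analyze("mail77 mail8 tc ro45mine durante jjkk"));
        // ...but a 300-char word comes out as a 255-char token plus a 45-char token.
        List<String> split = analyze("x".repeat(300));
        System.out.println(split.size() + " tokens: "
                + split.get(0).length() + " + " + split.get(1).length());
    }
}
```

Printing the token stream like this makes the 255-char boundary visible directly, without needing to read the tokenizer source.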