
Re: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-20 Thread Paul Taylor

On 19/10/2011 15:17, Steven A Rowe wrote: Hi Paul, What version of Lucene are you using? The JFlex spec you quote below looks pre-v3.1? Yes, we copied a version of StandardTokenizer from 2.4 to make some changes, we are actually on 3.1 now but haven't spent any time looking at the new token

RE: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-19 Thread Steven A Rowe

> "'java- > u...@lucene.apache.org'" > Subject: Re: How do you see if a tokenstream has tokens without consuming > the tokens ? > > On 18/10/2011 05:19, Steven A Rowe wrote: > > Hi Paul, > > > > You could add a rule to the StandardTokenizer JFlex

RE: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-19 Thread Steven A Rowe

Hi Paul, On 10/19/2011 at 5:26 AM, Paul Taylor wrote: > On 18/10/2011 15:25, Steven A Rowe wrote: > > On 10/18/2011 at 4:57 AM, Paul Taylor wrote: > > > On 18/10/2011 06:19, Steven A Rowe wrote: > > > > Another option is to create a char filter that substitutes > > > > PUNCT-EXCLAMATION for exclam

Re: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-19 Thread Paul Taylor

y when the entire input consists exclusively of whitespace and punctuation. These symbols would then be left intact by StandardTokenizer. Steve -Original Message- From: Paul Taylor [mailto:paul_t...@fastmail.fm] Sent: Monday, October 17, 2011 8:13 AM To: 'java-user@lucene.a

Re: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-19 Thread Paul Taylor

On 18/10/2011 15:25, Steven A Rowe wrote: Hi Paul, On 10/18/2011 at 4:57 AM, Paul Taylor wrote: On 18/10/2011 06:19, Steven A Rowe wrote: Another option is to create a char filter that substitutes PUNCT-EXCLAMATION for exclamation points, PUNCT-PERIOD for periods, etc., Yes that is how I firs

RE: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-18 Thread Steven A Rowe

Hi Paul, On 10/18/2011 at 4:57 AM, Paul Taylor wrote: > On 18/10/2011 06:19, Steven A Rowe wrote: > > Another option is to create a char filter that substitutes > > PUNCT-EXCLAMATION for exclamation points, PUNCT-PERIOD for periods, > > etc., > > Yes that is how I first did it No, I don't think
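
The substitution idea quoted above can be sketched in plain Java. This is only an illustration and does not use Lucene's char filter API: the class name `PunctuationSubstituter` and the method `substitute` are hypothetical, only the two placeholder names come from the thread, and the rule that substitution applies solely to inputs made up of whitespace and punctuation is an assumption based on the other messages in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: a plain-Java pre-pass over the input text.
final class PunctuationSubstituter {
    // Only these two placeholder names appear in the thread; others would be added the same way.
    private static final Map<Character, String> NAMES = new LinkedHashMap<>();
    static {
        NAMES.put('!', "PUNCT-EXCLAMATION");
        NAMES.put('.', "PUNCT-PERIOD");
    }

    static String substitute(String input) {
        // Assumption: inputs containing any letter or digit are passed through unchanged.
        for (int i = 0; i < input.length(); i++) {
            if (Character.isLetterOrDigit(input.charAt(i))) {
                return input;
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String name = NAMES.get(c);
            if (name == null) {
                out.append(c); // unmapped symbols and whitespace are kept as they are
            } else {
                // Surround each placeholder with spaces so adjacent symbols become separate words.
                out.append(' ').append(name).append(' ');
            }
        }
        return out.toString().trim();
    }
}
```

With this sketch, `PunctuationSubstituter.substitute("!")` returns `PUNCT-EXCLAMATION`, while `substitute("Help!")` returns the input unchanged.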

Re: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-18 Thread Paul Taylor

On 18/10/2011 06:19, Steven A Rowe wrote: Hi Paul, You could add a rule to the StandardTokenizer JFlex grammar to handle this case, bypassing its other rules. Hmm, don't really understand JFlex, but that is a possibility, but would prefer to do in Java c

RE: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-17 Thread Steven A Rowe

exclusively of whitespace and punctuation. These symbols would then be left intact by StandardTokenizer. Steve > -Original Message- > From: Paul Taylor [mailto:paul_t...@fastmail.fm] > Sent: Monday, October 17, 2011 8:13 AM > To: 'java-user@lucene.apache.org' > Sub

Re: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-17 Thread Sujit Pal

Hi Paul, Since you have modified the StandardAnalyzer (I presume you mean StandardFilter), why not do a check on the term.text() and if it's all punctuation, skip the analysis for that term? Something like this in your StandardFilter: public final boolean incrementToken() throws IOException { Ch
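
The snippet above is cut off before its code, so the following is not a reconstruction of it. It is a minimal plain-Java sketch of the kind of "is this text all punctuation" check the message describes, with no Lucene dependency. The class and method names are hypothetical, and treating "punctuation" as "anything that is not a letter or digit" (whitespace included) is an assumption.

```java
// Hypothetical helper, not part of Lucene or of the original message.
final class PunctuationCheck {
    /**
     * Returns true when the text is non-empty and contains no letter or digit,
     * i.e. it consists only of punctuation, symbols and whitespace.
     */
    static boolean isPunctuationOrWhitespaceOnly(CharSequence text) {
        if (text.length() == 0) {
            return false;
        }
        // Iterate by code point so characters outside the BMP are classified correctly.
        for (int i = 0; i < text.length(); ) {
            int codePoint = Character.codePointAt(text, i);
            if (Character.isLetterOrDigit(codePoint)) {
                return false;
            }
            i += Character.charCount(codePoint);
        }
        return true;
    }
}
```

A filter could call such a check on the current term's text and pass the term through untouched when it returns true. For example, `isPunctuationOrWhitespaceOnly("!!!")` is true and `isPunctuationOrWhitespaceOnly("AC/DC")` is false.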

How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-17 Thread Paul Taylor

We have a modified version of a Lucene StandardAnalyzer, which we use for tokenizing music metadata such as artist names & song titles, so typically only a few words. When tokenizing, it usually strips out punctuation, which is correct; however, if the input text consists of only punctuation
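
The general pattern behind the question, checking whether a pull-style stream has anything left without losing what was read, can be sketched in plain Java by buffering the items read during the check. This is a generic illustration and not Lucene's TokenStream API: `LookaheadStream` and its methods are hypothetical, and the `Supplier` that returns `null` when exhausted is an assumed stand-in for a token source.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical wrapper: lets callers test for remaining items without dropping them.
final class LookaheadStream<T> {
    private final Supplier<T> source;            // assumed to return null once exhausted
    private final Deque<T> buffer = new ArrayDeque<>();
    private boolean exhausted;

    LookaheadStream(Supplier<T> source) {
        this.source = source;
    }

    /** True if at least one more item is available; the item read is buffered, not discarded. */
    boolean hasMore() {
        if (!buffer.isEmpty()) {
            return true;
        }
        if (exhausted) {
            return false;
        }
        T item = source.get();
        if (item == null) {
            exhausted = true;
            return false;
        }
        buffer.addLast(item);
        return true;
    }

    /** Returns the next item, replaying any buffered one first, or null when exhausted. */
    T next() {
        return hasMore() ? buffer.removeFirst() : null;
    }
}
```

Calling `hasMore()` any number of times before `next()` leaves the sequence of returned items unchanged, which is the property the question asks for.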

10 matches
