URL Tokenization

2010-06-23 Thread Sudha Verma
Hi, I am new to lucene and I am using Lucene 3.0.2. I am using Lucene to parse text which may contain URLs. I noticed the StandardTokenizer keeps the email addresses in one token, but not the URLs. I also looked at Solr wiki pages, and even though the wiki page for solr.StandardTokenizerFactory s
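
The behaviour asked about here (keeping a whole URL in one token, the way email addresses already are) can be illustrated without Lucene. The following is a minimal sketch using only `java.util.regex`. It is not StandardTokenizer's grammar and not the LUCENE-2167 patch. The class name `UrlAwareTokenizer` and the three patterns are hypothetical simplifications chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: splits text into tokens while
// keeping URLs and email addresses whole.
class UrlAwareTokenizer {

    // Alternatives are tried left to right at each position, so a URL or
    // email is matched as one token before the plain-word rule can split it.
    // These patterns are deliberately loose and do not follow the URL RFCs.
    private static final Pattern TOKEN = Pattern.compile(
            "(?:https?|ftp)://[^\\s<>\"]+"          // simplified URL
            + "|[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+"   // simplified email address
            + "|\\w+");                             // ordinary word

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints each token on its own line.
        for (String t : tokenize("Mail bob@example.com or see http://example.com/a?b=1 today")) {
            System.out.println(t);
        }
    }
}
```

One known limitation of this sketch: punctuation that directly follows a URL, such as a sentence-ending period, is swallowed into the URL token. A real analyzer would need stricter rules.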

RE: URL Tokenization

2010-06-23 Thread Steven A Rowe
RFCs. Steve

> -----Original Message-----
> From: Sudha Verma [mailto:verma.su...@gmail.com]
> Sent: Wednesday, June 23, 2010 2:07 PM
> To: java-user@lucene.apache.org
> Subject: URL Tokenization
>
> Hi,
>
> I am new to lucene and I am using Lucene 3.0.2.

Re: URL Tokenization

2010-06-24 Thread Sudha Verma
Hi Steve,

Thanks for the quick reply and implementing support for URL tokenization. Another newbie question about applying this patch.

I have the Lucene 3.0.2 source and I downloaded the patch and tried to apply it:

lucene-3.0.2> patch -p0 < LUCENE-2167.patch

Comes back with the error m

RE: URL Tokenization

2010-06-24 Thread Steven A Rowe
> Subject: Re: URL Tokenization
>
> Hi Steve,
>
> Thanks for the quick reply and implementing support for URL tokenization.
> Another newbie question about applying this patch.
>
> I have the Lucene 3.0.2 source and I downloaded the patch and tried to
> apply it:

Re: URL Tokenization

2010-06-25 Thread Sudha Verma
> > -----Original Message-----
> > From: Sudha Verma [mailto:verma.su...@gmail.com]
> > Sent: Wednesday, June 23, 2010 2:07 PM
> > To: java-user@lucene.apache.org
> > Subject: URL Tokenization
> >
> > Hi,
> >
> > I am new to lucene and I am using L