Re: How to not tokenize HTML tag from input string

2007-02-08 Thread Yonik Seeley
On 2/8/07, Peter W. <[EMAIL PROTECTED]> wrote:
Using a parser to get text out of HTML or XML (including RSS and Atom) is only easy if you control the source documents. HTML pages in the wild are very different: they generate exceptions you must catch and deal with.
Yes, that's why the Solr version isn…

Re: How to not tokenize HTML tag from input string

2007-02-08 Thread Peter W.
http://issues.apache.org/jira/browse/SOLR-42
: Date: Wed, 7 Feb 2007 17:04:54 -0800 (PST)
: From: Joe Tang <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: How to not tokenize HTML tag from input string
:
: My work is to index ke…

Re: How to not tokenize HTML tag from input string

2007-02-08 Thread Chris Hostetter
…Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: How to not tokenize HTML tag from input string
:
: My work is to index keywords with a document. In my case, the document is
: made up of HTML tags, which I don't want to index.
:
: For example:
: Inp…

Re: How to not tokenize HTML tag from input string

2007-02-07 Thread Erick Erickson
…index them. For example:
Input Document: You are welcome Testing text
Expected Keywords: keywords:You keywords:are keywords:welcome keywords:Testing keywords:text
Is there any way I can keep the tags from becoming keywords?
-- View this message in context: http://www.nabble.com/How-to-not-tokeniz…
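The question boils down to removing markup before the text reaches the tokenizer. Below is a minimal, lenient sketch of that idea in plain Java, assuming a simple character-level scan is acceptable. It is not Lucene's or Solr's API: the class `HtmlKeywordSketch` and the methods `stripTags` and `keywords` are hypothetical names for illustration. The archive shows the example input with its tags already removed, so the markup in `main` is also a hypothetical reconstruction. In a real index, the stripped string would be handed to a Lucene analyzer instead of the whitespace split used here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or Solr.
public class HtmlKeywordSketch {

    // Naive, lenient tag stripper: drops everything between '<' and '>'.
    // It never throws on malformed markup; an unclosed tag simply swallows
    // the rest of the input.
    public static String stripTags(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inTag) {
                if (c == '>') {
                    inTag = false;
                    out.append(' '); // treat a tag boundary as a word separator
                }
            } else if (c == '<') {
                inTag = true;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Splits the stripped text on whitespace and prefixes each token with
    // the field name, mimicking the "keywords:You" form in the question.
    public static List<String> keywords(String field, String html) {
        List<String> terms = new ArrayList<>();
        for (String tok : stripTags(html).trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                terms.add(field + ":" + tok);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Hypothetical markup; the original input's tags are not preserved.
        String input = "<html><b>You are welcome</b><p>Testing text</p></html>";
        // prints [keywords:You, keywords:are, keywords:welcome, keywords:Testing, keywords:text]
        System.out.println(keywords("keywords", input));
    }
}
```

This sketch deliberately stays tolerant of broken HTML rather than parsing it, but it is far from complete. It does not decode entities such as `&amp;`, does not skip the contents of `<script>` or `<style>`, and a literal `<` in ordinary text hides everything after it. Inserting a space at each tag boundary also splits a word that has markup in the middle of it.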

How to not tokenize HTML tag from input string

2007-02-07 Thread Joe Tang
…there any way I can keep the tags from becoming keywords? -- View this message in context: http://www.nabble.com/How-to-not-tokenize-HTML-tag-from-input-string-tf3190778.html#a8857789 Sent from the Lucene - Java Users mailing list archive at Nabble.