On Jan 17, 2005, at 4:51 AM, Chris Lamprecht wrote:
I submitted a testcase --
http://issues.apache.org/bugzilla/show_bug.cgi?id=33134
I reviewed and applied your contributed unit test. Thanks!
Erik
-
To unsubscribe, e-mail
€ 0.02: Indexing code "++" is a stop term, it might be in English text
as well. 'C' is a not very descriptive but perfectly valid variable name. '#'
is used in some old Morse transcripts, I think. I am not going to die or
get fired, but I'd suggest not including those tokens in a standard
anything.
I personally don't have a problem with that change; however, I don't
like changing such things, as they can lead to unexpected and confusing
issues later. Suppose someone upgrades their version of Lucene without
re-indexing, and queries that used to work no longer work? (sure, I
agree it is
Erik, Paul, Daniel,
I submitted a testcase --
http://issues.apache.org/bugzilla/show_bug.cgi?id=33134
On a related note, what do you all think about updating the
StandardAnalyzer grammar to treat "C#" and "C++" as tokens? It's a
small modification to the grammar -- NutchAnalysis.jj has it.
-Chr
I don't see any tests of StandardAnalyzer either. Your contribution
would be most welcome. There are tests that use StandardAnalyzer, but
not to test it directly.
Erik
On Jan 16, 2005, at 11:48 PM, Chris Lamprecht wrote:
Does anyone have a unit test for StandardAnalyzer? I've modified
Chris,
On Monday 17 January 2005 05:49, Chris Lamprecht wrote:
> PS-I didn't find any in lucene CVS head, and I'd be glad to contribute
> some unit tests.
Under Unix this will give you the cvs head:
cvs -d :pserver:[EMAIL PROTECTED]:/home/cvspublic checkout jakarta-lucene
The tests are in the j
PS-I didn't find any in lucene CVS head, and I'd be glad to contribute
some unit tests.
> Does anyone have a unit test for StandardAnalyzer? I've modified the
Does anyone have a unit test for StandardAnalyzer? I've modified the
StandardAnalyzer javacc grammar to tokenize "c#" and "c++" without
removing the "#" and "++" parts, using pieces of the grammar from
Nutch. Now I'd like to make sure I didn't change the way it parses
any other tokens. Thanks,
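
The check asked for above (new tokens are kept, everything else still tokenizes as before) can be sketched as a small table-driven regression test. This is only an illustration in plain Java and does not use Lucene, the StandardAnalyzer JavaCC grammar, or the Nutch grammar: SketchTokenizer, its regular expression, and the check helper are all hypothetical stand-ins, and the lowercasing and punctuation handling are assumptions rather than the analyzer's actual behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in tokenizer; NOT Lucene's StandardAnalyzer.
final class SketchTokenizer {
    // First alternative: a standalone "c" followed by "++" or "#".
    // Second alternative: an ordinary run of letters and digits.
    private static final Pattern TOKEN = Pattern.compile(
            "\\bc(?:\\+\\+|#)|[\\p{L}\\p{N}]+", Pattern.CASE_INSENSITIVE);

    // Returns the lowercased tokens found in the text, in order.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Fails loudly if the tokens for the input differ from the expected ones.
    static void check(String input, String... expected) {
        List<String> actual = tokenize(input);
        if (!actual.equals(Arrays.asList(expected))) {
            throw new AssertionError(input + " -> " + actual);
        }
    }

    public static void main(String[] args) {
        // New behaviour: "c#" and "c++" survive as whole tokens.
        check("I use C# and C++ daily", "i", "use", "c#", "c++", "daily");
        // Regression cases: "#" and "+" elsewhere are still dropped.
        check("a+b, x#1", "a", "b", "x", "1");
        check("abc++ cat", "abc", "cat");
        System.out.println("all checks passed");
    }
}
```

Against the real analyzer, the same table of inputs and expected tokens would be run through StandardAnalyzer before and after the grammar change, so that any difference outside the "c#" and "c++" rows shows up as a failure.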