Hi Jack,
Thanks! I'll look at it.
Koji
On 2015/02/24 22:29, Jack Krupansky wrote:
This is the first mention that I have seen for that corpus on this list.
There seem to be more than a few references when I google for ""brown
corpus" lucene", such as:
https://github.com/INL/BlackLab/wiki/Blacklab-query-tool
-- Jack Krupansky
On Tue, Feb 24, 2015 at 1:40 AM, Koji Sekiguchi wrote:
The word delimiter filter has the ability to pass a table which specifies
the type for a character:
http://lucene.apache.org/core/4_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html
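For anyone landing on this thread later, here is a minimal sketch (not part of the original mail) of how such a character-type table can be passed, written against the Lucene 4.5.x API that the javadoc link above points at. The choice to map '_' to ALPHA, and the sample strings, are purely illustrative assumptions.

import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class CharTypeTableDemo {
    public static void main(String[] args) throws Exception {
        // Build a type table for the ASCII range; characters beyond the table
        // length fall back to the filter's built-in classification.
        byte[] typeTable = new byte[128];
        for (int c = 0; c < typeTable.length; c++) {
            if (Character.isLowerCase(c)) {
                typeTable[c] = WordDelimiterFilter.LOWER;
            } else if (Character.isUpperCase(c)) {
                typeTable[c] = WordDelimiterFilter.UPPER;
            } else if (Character.isDigit(c)) {
                typeTable[c] = WordDelimiterFilter.DIGIT;
            } else {
                typeTable[c] = WordDelimiterFilter.SUBWORD_DELIM;
            }
        }
        // Illustrative override: treat '_' as an ordinary letter so that
        // "foo_bar" is not split on the underscore.
        typeTable['_'] = WordDelimiterFilter.ALPHA;

        TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_45,
                new StringReader("foo_bar wi-fi"));
        ts = new WordDelimiterFilter(ts, typeTable,
                WordDelimiterFilter.GENERATE_WORD_PARTS, null);

        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString()); // foo_bar, wi, fi
        }
        ts.end();
        ts.close();
    }
}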
Thanks, I think I got it.
-Original Message-
From: John Byrne [mailto:john.by...@propylon.com]
Sent: Friday, July 17, 2009 2:43 PM
To: java-user@lucene.apache.org
Subject: Re: Tokenizer question: how can I force ? and ! to be separate tokens?
Yes, you could even use the WhitespaceTokenizer and then look for the
symbols in a token filter. You would get [you?] as a single token; your
job in the token filter is then to store the [?] and return the [you].
The next time the token filter is called for the next token, you return
the [?] that you stored.
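A minimal sketch of the filter described above, using the attribute-based TokenFilter API; the class name is made up for illustration, and offset/position-increment handling is omitted for brevity.

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Splits a trailing '?' or '!' off a token and emits it as a separate
 * token on the next call to incrementToken().
 */
public final class TrailingPunctuationFilter extends TokenFilter {

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private String pending; // the stored '?' or '!' waiting to be emitted

    public TrailingPunctuationFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (pending != null) {
            // Emit the punctuation we held back on the previous call.
            termAtt.setEmpty().append(pending);
            pending = null;
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        int len = termAtt.length();
        if (len > 1) {
            char last = termAtt.charAt(len - 1);
            if (last == '?' || last == '!') {
                pending = String.valueOf(last); // store the [?] ...
                termAtt.setLength(len - 1);     // ... and return the [you]
            }
        }
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        pending = null;
    }
}

Wired after a WhitespaceTokenizer, [you?] comes out as [you] followed by [?].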
I'd think extending WhitespaceTokenizer would be a good place to start.
Then create a new Analyzer that exactly mirrors your current Analyzer,
with the exception that it uses your new tokenizer instead of
WhitespaceTokenizer (well, there is of course my assumption that you
are using an Analyzer).
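To round out that suggestion, here is a sketch of an Analyzer that mirrors a simple whitespace-based chain but swaps in the custom tokenization. It is written against the Lucene 4.x createComponents() API rather than the older tokenStream() API this 2009 thread would have used, and it reuses the filter sketched above instead of a subclassed tokenizer, which is a deliberate simplification; the LowerCaseFilter stage is only a stand-in for whatever your current analyzer does.

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

/**
 * Mirrors a whitespace + lowercase analyzer, but plugs in the
 * punctuation-splitting filter sketched earlier in this thread.
 */
public final class PunctuationAwareAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_45, reader);
        TokenStream result = new TrailingPunctuationFilter(source);
        result = new LowerCaseFilter(Version.LUCENE_45, result);
        return new TokenStreamComponents(source, result);
    }
}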
Hello,
> I have two questions.
>
> First, Is there a tokenizer that takes every word and simply
> makes a token
> out of it?
org.apache.lucene.analysis.WhitespaceTokenizer
> So it looks for two white spaces and takes the characters
> between them and makes a token out of them?
>
> If this to
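For completeness, a small assumed usage example of the WhitespaceTokenizer answer above (Lucene 4.x API); note that punctuation stays attached to the surrounding word, which is exactly why the token-filter approach earlier in this thread is needed.

import java.io.StringReader;

import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class WhitespaceTokenizerDemo {
    public static void main(String[] args) throws Exception {
        // Every maximal run of non-whitespace characters becomes one token.
        WhitespaceTokenizer tok = new WhitespaceTokenizer(
                Version.LUCENE_45, new StringReader("How are you? Fine!"));
        CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
        tok.reset();
        while (tok.incrementToken()) {
            System.out.println(term.toString()); // How / are / you? / Fine!
        }
        tok.end();
        tok.close();
    }
}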