Hi,
First of all, please always make sure that you use
exactly the same Analyzer for indexing and searching.
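To see why this matters, here is a toy model in plain Java. It uses no Lucene classes, and the class and method names are hypothetical. Documents are indexed with a lower-casing "analyzer". A query analyzed the same way finds the document. The same query analyzed without lower-casing finds nothing, because the index only contains lower-cased terms.

```java
import java.util.*;

// Toy inverted index (not Lucene) showing that index-time and query-time
// analysis must produce the same terms for a match to happen.
final class AnalyzerMismatchDemo {

    // Hypothetical "analyzer": split on non-letters/digits, optionally lowercase.
    static List<String> analyze(String text, boolean lowercase) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (t.isEmpty()) continue;
            tokens.add(lowercase ? t.toLowerCase(Locale.ROOT) : t);
        }
        return tokens;
    }

    // term -> ids of the documents containing it; indexing always lowercases
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id), true)) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Returns ids of documents containing any of the (already analyzed) query terms
    static Set<Integer> search(Map<String, Set<Integer>> index, List<String> queryTerms) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : queryTerms) {
            hits.addAll(index.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index =
                buildIndex(List.of("Lucene in Action", "Indexing PDF files"));
        // Same analysis as at index time: "Lucene" -> "lucene" matches doc 0
        System.out.println(search(index, analyze("Lucene", true)));  // [0]
        // Different analysis: the term stays "Lucene" and matches nothing
        System.out.println(search(index, analyze("Lucene", false))); // []
    }
}
```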
I am not familiar with the BrazilianAnalyzer, but I saw
in its source code that it does not use an ISOLatin1AccentFilter,
which replaces accented characters (ç -> c).
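As an illustration of what accent folding does to a single token, here is a minimal sketch in plain Java. It does not use Lucene's ISOLatin1AccentFilter. Instead it swaps in java.text.Normalizer: the token is decomposed (NFD), and the combining marks are dropped. The results are similar for Portuguese text but not guaranteed to be identical to the Lucene filter. The class name is hypothetical.

```java
import java.text.Normalizer;

// Accent folding with java.text.Normalizer (a substitute technique, not
// Lucene's ISOLatin1AccentFilter).
final class AccentFolding {

    // NFD splits e.g. "ç" into "c" + U+0327 (combining cedilla); removing all
    // non-spacing marks (category Mn) then leaves the base letters.
    static String fold(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("a\u00e7\u00e3o"));       // "ação"    -> acao
        System.out.println(fold("Cora\u00e7\u00e3o"));    // "Coração" -> Coracao
    }
}
```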
Probab
You can extend the StandardAnalyzer.
The only thing you have to do is override the tokenStream method, like
this:
/** Constructs a {@link StandardTokenizer} filtered by a {@link
StandardFilter}, a {@link LowerCaseFilter} and a {@link
StopFilter}. */
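The order of that chain (tokenizer, then filters applied one after another, with an accent filter added at the end) can be modelled in plain Java. This is only a sketch of the idea, not Lucene's API. Each "filter" is one step in a loop. The stop word set and the class name are hypothetical examples.

```java
import java.text.Normalizer;
import java.util.*;

// Plain-Java model of an analysis chain: tokenizer -> lower-case filter ->
// stop filter -> accent filter. Not Lucene classes; stop words are made up.
final class AnalysisChainSketch {

    static final Set<String> STOP_WORDS = Set.of("a", "o", "de", "e", "the", "and");

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {    // "tokenizer"
            if (raw.isEmpty()) continue;
            String token = raw.toLowerCase(Locale.ROOT);         // "lower-case filter"
            if (STOP_WORDS.contains(token)) continue;            // "stop filter"
            token = Normalizer.normalize(token, Normalizer.Form.NFD)
                              .replaceAll("\\p{Mn}+", "");       // "accent filter"
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        // "O Coração e a Razão" -> [coracao, razao]
        System.out.println(analyze("O Cora\u00e7\u00e3o e a Raz\u00e3o"));
    }
}
```

Because the stop filter runs before accent folding here, accented stop words would have to be listed in their accented form. Moving the accent step earlier changes that, which is one reason the filter order matters.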
Hello Luceners
I have started a new project and need to index PDF documents.
There are several projects around that can extract the content,
such as PDFBox, Xpdf and pjclassic.
As far as I can tell from the FAQs and examples, all of these
tools offer simple text extraction.
Which of these open sour
Hello Sengly
First of all, you have to make sure that the Fields you add to a
Document are created with the appropriate constructor. You have to
enable term vectors (Field.TermVector.YES):
new Field("text", "your text...", Field.Store.YES,
Field.Index.TOKENIZED,Field.Ter
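Conceptually, a term vector stores, for one field of one document, each distinct term together with its frequency. In Lucene 2.x the stored vector can be read back with IndexReader.getTermFreqVector(docNumber, field), whose TermFreqVector offers getTerms() and getTermFrequencies(). The sketch below only models that data in plain Java. The class name is hypothetical, and it does not read a real index.

```java
import java.util.*;

// Plain-Java model of a term vector for one document field:
// sorted distinct terms mapped to their frequencies.
final class TermVectorSketch {

    static SortedMap<String, Integer> termVector(List<String> tokens) {
        SortedMap<String, Integer> tv = new TreeMap<>();
        for (String t : tokens) {
            tv.merge(t, 1, Integer::sum);   // count each occurrence
        }
        return tv;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("to", "be", "or", "not", "to", "be");
        System.out.println(termVector(tokens)); // {be=2, not=1, or=1, to=2}
    }
}
```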
Have a look at the TermDocs interface in the API.
You can get term frequencies from an open IndexReader:
TermDocs termDocs = reader.termDocs(term);
where "term" represents the current Term.
Now advance to a matching document with termDocs.next(); then you can call:
termDocs.freq()
to get the frequency of the term within the current document.
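The usual pattern is a loop: while (termDocs.next()) { ... termDocs.doc() ... termDocs.freq() ... }. Each call to next() moves to the next document that contains the term. Here is a plain-Java model of that iteration. PostingsCursor is a hypothetical stand-in with the same next()/doc()/freq() shape. It is not a Lucene class, and its postings are hard-coded arrays instead of an index.

```java
// Hypothetical model of TermDocs-style iteration over one term's postings.
final class PostingsCursor {
    private final int[] docs;   // ids of documents containing the term
    private final int[] freqs;  // frequency of the term in each of those docs
    private int pos = -1;       // before the first document until next() is called

    PostingsCursor(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    boolean next() {
        if (pos < docs.length) pos++;
        return pos < docs.length;
    }

    int doc()  { return docs[pos]; }
    int freq() { return freqs[pos]; }

    // Total occurrences of the term across all documents
    static int totalFrequency(PostingsCursor td) {
        int total = 0;
        while (td.next()) {
            total += td.freq();
        }
        return total;
    }

    public static void main(String[] args) {
        PostingsCursor td = new PostingsCursor(new int[]{0, 3, 7}, new int[]{2, 1, 4});
        while (td.next()) {
            System.out.println("doc " + td.doc() + ": freq " + td.freq());
        }
    }
}
```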
For th
Write your own analyzer, which applies the appropriate filters in its
"tokenStream" method.
In "tokenStream" you define how the input is analyzed and
tokenized.
Your analyzer must extend the abstract class Analyzer. The easiest way
is to create a new class (Analyzer), which
You can adapt the source code of StopAnalyzer.java in the analysis
package, or I suppose you can use the constructor that takes a stop
word list and pass it an empty list (but please check this).
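The effect of an empty stop word list can be shown with a small plain-Java sketch. If I remember correctly, Lucene 2.x's StopAnalyzer has a constructor taking a String[] of stop words, but please verify that against your version. The class below is hypothetical and only models the behaviour: tokens are lower-cased and dropped if they appear in the configured set, so an empty set keeps every token.

```java
import java.util.*;

// Hypothetical stop-word analyzer model: lowercase tokens, drop configured
// stop words. With an empty stop list nothing is removed.
final class ConfigurableStopAnalyzer {
    private final Set<String> stopWords = new HashSet<>();

    ConfigurableStopAnalyzer(Collection<String> stopWords) {
        for (String w : stopWords) {
            this.stopWords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (t.isEmpty()) continue;
            String token = t.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The Art of Search";
        System.out.println(new ConfigurableStopAnalyzer(List.of()).analyze(text));
        // [the, art, of, search]
        System.out.println(new ConfigurableStopAnalyzer(List.of("the", "of")).analyze(text));
        // [art, search]
    }
}
```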
If you don't know Luke yet, use this small tool to inspect your index and
verify your indexing process.
http://www.getopt