Otis,
Can you give me/us a rough idea of what these are supposed to do? It's hard
to extrapolate the terse unit test code into much of a general notion. I
searched the archives with little success.
Regards,
Terry
- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To:
I am using the latest PDFBox library for parsing. I can parse English
documents successfully, but when I parse a document containing both English
and Japanese I do not get the results I expected.
Has anyone tried using the PDFBox library to parse Japanese documents? Or
do I need to use another parser, such as xPDF?
Hi Scott,
Thanks for your advice. I am now using POI to convert Word documents, and I
make sure that I convert the text to Unicode before I put it into Lucene for
indexing. It is working perfectly fine. Which parser is best for parsing PDF
documents? I tried PDFBox, but it seems it doesn't work well with Japanese
How exactly would you take advantage of a subclassable Hits class?
On Mar 21, 2004, at 6:01 AM, Terry Steichen wrote:
Does anyone know why the Hits class is final (thus preventing it from
being subclassed)?
Regards,
Terry
-
I have not tried these other tools yet.
Have you asked Ben Litchfield, the PDFBox author, about handling of
Japanese text?
Otis
--- Chandan Tamrakar [EMAIL PROTECTED] wrote:
I am using the latest PDFBox library for parsing. I can parse English
documents successfully, but when I parse a document
Yes he did, but I was away the past couple days. As this is more of a
PDFBox issue I responded in the PDFBox forums, please follow the thread
there if you are interested.
Ben
On Mon, 22 Mar 2004, Otis Gospodnetic wrote:
I have not tried these other tools yet.
Have you asked Ben
On Fri, 2004-03-19 at 11:58, Doug Cutting wrote:
Doug Cutting wrote:
On Thu, 2004-03-18 at 13:32, Doug Cutting wrote:
Have you tried assigning these very small boosts (0 < boost < 1) and
assigning other query clauses relatively large boosts (boost > 1)?
I don't think you understood my
Re-directing to lucene-user list.
One way of doing this is by writing a custom Analyzer that throws away
words you don't want to index (see an example of custom Analyzer in
jGuru FAQ). Another way would be to just re-use the existing Analyzers
and add words you don't want indexed to the
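
The first approach above boils down to dropping unwanted words before they reach the index. A minimal, library-free sketch of that filtering step follows. The class `StopWordFilter` and its `filter` method are hypothetical names used only for illustration and are not part of the Lucene API; a real custom Analyzer would apply the same idea to Lucene's token stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: drops unwanted words from text
// before the remaining tokens would be handed to an indexer.
class StopWordFilter {

    // Splits on runs of characters that are neither letters nor digits,
    // lower-cases each token, and skips any token found in stopWords
    // (stopWords is assumed to contain lower-case entries only).
    public static List<String> filter(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String normalized = token.toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty() && !stopWords.contains(normalized)) {
                kept.add(normalized);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = Set.of("the", "and", "a");
        // Only the words that are not in the stop list are printed.
        System.out.println(filter("The quick brown fox and a lazy dog", stopWords));
    }
}
```

Keeping the unwanted words in a plain set means the list can be changed without touching the tokenizing logic.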
Erik,
There are a number of different possibilities which I'm still evaluating.
But if there is some significant reason for *not* subclassing Hits
(performance?), that will have a major bearing on whether the approach I'm
evaluating makes sense.
So, let me rephrase my question: Is the final
Terry,
I'm still quite curious how you plan to take advantage of a
subclassable Hits. Are you going to create your own IndexSearcher that
returns your subclass somehow?
You could use a HitCollector (which is what is used under the covers of
the Hits returning methods anyway) to emulate
I have some code that creates a lucene index. It has been working fine
with lucene-1.3-rc1.jar but I wanted to upgrade to lucene-1.3-final.jar.
I did this and the indexer breaks. I get the following error when
running the index with 1.3-final:
Optimizing the index
IOException:
Lucene does not iterate through the termPositions on one of my indexed data
sources. It used to iterate properly through this data source, but not
anymore. I tried on a different indexed data source and it iterates
properly. The Lucene index directory does not have any lock files either.
My code
Dan wrote:
I have some code that creates a lucene index. It has been working fine
with lucene-1.3-rc1.jar but I wanted to upgrade to
lucene-1.3-final.jar. I did this and the indexer breaks. I get the
following error when running the index with 1.3-final:
Optimizing the index
IOException:
Or use IndexWriter.setUseCompoundFile(true) to reduce the number of files
created by Lucene.
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#setUseCompoundFile(boolean)
=Matt
Kevin A. Burton wrote:
Dan wrote:
I have some code that creates a lucene index. It
Just an RFE... if a lock times out we should probably throw the name of
the FSDirectory (or if it's a RAMDirectory) ...
I'm lazy, so this is a reminder either for me to do this or to wait
until one of you guys takes care of it :)
Kevin
--
Please reply using PGP.