Hi,
OpenOffice documents are getting indexed, but when I search for words from
those documents I am not seeing the correct results.
regards,
ganesh
Uwe Schindler wrote:
>
> For converting full text to plain text for indexing, look at Apache Tika,
> which has a converter for OpenDocument: http
Hi,
OpenOffice documents are getting indexed, but when I search for words from
those documents I am not seeing the correct results.
regards,
ganesh
ganesh H D wrote:
>
> Hi,
>
> I have been working with Apache Lucene for the past 3 days. I tried to deploy
> the sample application which we get from
For converting full text to plain text for indexing, look at Apache Tika,
which has a converter for OpenDocument: http://lucene.apache.org/tika/
This mailing list is *about* the development of Lucene, not for questions
about *how* to develop your own code that uses Lucene.
-
Uwe Schindler
H.-H.-Meier-
Questions about *using* the Lucene APIs should be sent to the *user* list
... the dev list is for discussion about the development of the internals.
Please ask your question on that list, but before doing so you may want to
check out the FAQ on TooManyClauses and search the archives for
"prefix To
On 17 Mar 2007 at 06:01, Lokeya wrote:
Help Appreciated.
There are even more helpful people on java-user. You have a
greater chance of getting a good answer in time there, as this forum
focuses on development of the actual API rather than consumer
implementations.
--
karl
The approach I am currently using is (pseudo code):
select count(*) from docs
  where date_modified > lastIndexRunDate
if ((countChangedOrNew / reader.numDocs) > 50%)
{
    // quicker to rebuild the whole index
    wipeIndex;
    select * from docs
    for (each record)
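The rebuild-or-update decision in the pseudocode above can be sketched in Java as follows. This is only a sketch under stated assumptions: the class name `ReindexPolicy`, the method `shouldRebuild`, and the treatment of an empty index are hypothetical and not part of Lucene or of the original post. The two counts would come from the `select count(*)` query and from `reader.numDocs`. The division is done in floating point, because an integer division of the two counts would truncate to 0.

```java
/** Hypothetical helper (not a Lucene class): full rebuild vs. incremental update. */
final class ReindexPolicy {
    /** Assumed cut-off taken from the pseudocode: more than 50% changed means rebuild. */
    static final double REBUILD_THRESHOLD = 0.5;

    /**
     * @param changedOrNew rows modified since the last index run (from the count query)
     * @param indexedDocs  documents currently in the index (e.g. reader.numDocs)
     * @return true if wiping and rebuilding the index is assumed to be quicker
     */
    static boolean shouldRebuild(long changedOrNew, long indexedDocs) {
        if (changedOrNew < 0 || indexedDocs < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        if (indexedDocs == 0) {
            // Assumption: with an empty index there is nothing to update in place.
            return true;
        }
        // Floating-point division; integer division would always yield 0 here.
        return (double) changedOrNew / indexedDocs > REBUILD_THRESHOLD;
    }
}
```

With 60 changed rows against 100 indexed documents the helper chooses a rebuild; with exactly 50 it does not, since the comparison is strict.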
Hi
I am indexing database tables with large amounts of data via Lucene. Do I need to
reindex the whole table(s) whenever changes are made to keep the search up to date?
Since it is time-consuming to create a new index from scratch every time the data
in the tables is modified, can anybody suggest some work
age-
From: Mario Alejandro M. [mailto:[EMAIL PROTECTED]
Sent: 23 January 2006 15:58
To: Otis Gospodnetic
Cc: [email protected]
Subject: Re: Indexing Urls pointing to same content
I know Lucene is not a web indexer... maybe I explained this badly.
I'm asking about how to STORE the data, not about ho
I know Lucene is not a web indexer... maybe I explained this badly.
I'm asking about how to STORE the data, not about how to locate it. If two files are
the same (using MD5 is my current approach), then I plan to STORE the content once,
but it is necessary to add both locations.
Example:
c:\file1 Content: One
c:\file2
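The idea of storing identical content once while keeping every location can be sketched in plain Java. This is a minimal sketch under assumptions, not the poster's code: the class `ContentStore` and its methods are hypothetical, content is assumed to be text hashed as UTF-8 bytes, and only the JDK's `MessageDigest` is used. In a Lucene index the same shape might correspond to one document whose content field is added once and whose location field is added once per path.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory store: one copy of each content, many locations. */
final class ContentStore {
    private final Map<String, String> contentByDigest = new LinkedHashMap<>();
    private final Map<String, List<String>> locationsByDigest = new LinkedHashMap<>();

    /** Returns the MD5 checksum of the content as 32 lowercase hex characters. */
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Stores the content only if its digest is new; always records the location. */
    String add(String location, String content) {
        String digest = md5Hex(content);
        contentByDigest.putIfAbsent(digest, content);
        locationsByDigest.computeIfAbsent(digest, k -> new ArrayList<>()).add(location);
        return digest;
    }

    int distinctContents() {
        return contentByDigest.size();
    }

    List<String> locationsOf(String digest) {
        return locationsByDigest.getOrDefault(digest, Collections.emptyList());
    }
}
```

Adding two locations with identical content yields the same digest for both, a single stored copy, and two recorded locations. MD5 only indicates that two contents are very likely equal; a byte-for-byte comparison on a digest match would rule out collisions.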
Mario,
Lucene != web indexer, so Lucene doesn't know anything about files or URLs,
etc. It just indexes what it's told. You should check how Nutch does it, and
I believe it does it by comparing "fingerprints" of web pages. Fingerprints
are MD5 checksums, but I believe the recent changes ther
:
: Taking this to java-dev: Since this is such a common issue, would it
: be feasible for Lucene to have some sort of capability to be told
: what field is the unique one and automatically update (delete, and
: add) a document added with a duplicate of a unique field? This
: would probably requi
Taking this to java-dev: Since this is such a common issue, would it
be feasible for Lucene to have some sort of capability to be told
what field is the unique one and automatically update (delete, and
add) a document added with a duplicate of a unique field? This
would probably require t
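The delete-then-add behaviour proposed here can be illustrated with a toy in-memory index. This is a sketch of the semantics only, under stated assumptions: `UniqueKeyIndex` and `addOrUpdate` are hypothetical names, documents are modelled as simple field-to-value maps, and nothing below is Lucene's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical toy index that enforces one document per unique-field value. */
final class UniqueKeyIndex {
    private final String uniqueField;
    private final Map<String, Map<String, String>> docsByKey = new LinkedHashMap<>();

    UniqueKeyIndex(String uniqueField) {
        this.uniqueField = uniqueField;
    }

    /**
     * Adds the document; if one with the same unique-field value exists,
     * it is deleted first. Returns true when an older document was replaced.
     */
    boolean addOrUpdate(Map<String, String> doc) {
        String key = doc.get(uniqueField);
        if (key == null) {
            throw new IllegalArgumentException("document lacks unique field: " + uniqueField);
        }
        boolean replaced = docsByKey.remove(key) != null; // the "delete" step
        docsByKey.put(key, new LinkedHashMap<>(doc));     // the "add" step
        return replaced;
    }

    Map<String, String> get(String key) {
        return docsByKey.get(key);
    }

    int size() {
        return docsByKey.size();
    }
}
```

Adding two documents that share the same value in the unique field leaves one document in the index, holding the fields of the later one.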
Please post to java-user for such questions in the future.
The short answer with Lucene is, if you can get text, you can index
it. Lucene doesn't crawl URLs. Maybe you want Nutch instead for
this feature? Or perhaps WebDAV access? Lots of ways, none
directly related to Lucene though.
: you'll probably need an HTTP client module (commons-httpclient or something)
More specifically: when dealing with Lucene, the concept of a "document"
is very specific: it is an instance of
org.apache.lucene.document.Document. How you construct one of these
Document objects in your application is
You'll probably need an HTTP client module (commons-httpclient or something)
2005/10/27, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
> Can Lucene index remote documents? For example, if there are some documents
> at http://server:/documents, can I index the documents directory tree?
> Any help wou
15 matches