RE: Indexing Open office documents

2008-11-24 Thread ganesh H D
Hi, OpenOffice documents are getting indexed, but when I search for words from those documents I am not seeing the correct results. Regards, ganesh Uwe Schindler wrote: > > For converting full text to plain text for indexing look at Apache TIKA, > which has a converter for OpenDocument: http

Re: Indexing Open office documents

2008-11-24 Thread ganesh H D
Hi, OpenOffice documents are getting indexed, but when I search for words from those documents I am not seeing the correct results. Regards, ganesh ganesh H D wrote: > > Hi, > > I have been working on Apache Lucene for the past 3 days. I tried to deploy > the sample application which we get from

RE: Indexing Open office documents

2008-11-21 Thread Uwe Schindler
For converting full text to plain text for indexing look at Apache TIKA, which has a converter for OpenDocument: http://lucene.apache.org/tika/ This mailing list is *about* the development of Lucene, not about questions on *how* to develop your own code that uses Lucene. - Uwe Schindler H.-H.-Meier-

Re: Indexing and searching help

2007-07-03 Thread Chris Hostetter
Questions about *using* the Lucene APIs should be sent to the *user* list ... the dev list is for discussion about the development of the internals. Please ask your question on that list, but before doing so you may want to check out the FAQ on TooManyClauses and search the archives for "prefix To

Re: Indexing time taken is too long - Help Appreciated.

2007-03-17 Thread karl wettin
On 17 Mar 2007 at 06:01, Lokeya wrote: Help Appreciated. There are even more helpful people on java-users. You have a greater chance of getting a good answer in time there, as this forum focuses on development of the actual API rather than consumer implementations. -- karl

Re: Re Indexing

2006-02-23 Thread mark harwood
The approach I am currently using is (pseudo code): select count(*) from docs where date_modified > lastIndexRunDate if ((countChangedOrNew/reader.numDocs) >50%) { //quicker to rebuild the whole index wipeIndex; Select * from docs for (each record)
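The rebuild-or-update decision in the pseudo code above can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, the two counts stand in for the `select count(*)` result and `reader.numDocs`, and the snippet is cut off before the incremental branch, so only the threshold test is shown. The cast to `double` matters because integer division would truncate the ratio to 0 for any count below the index size.

```java
// Hypothetical helper: decides between a full rebuild and an incremental update.
class ReindexPlanner {
    // Fraction of changed or new records above which a full rebuild is assumed
    // to be quicker (the 50% figure from the post).
    static final double REBUILD_THRESHOLD = 0.5;

    // changedOrNew: rows modified since the last index run (assumed input).
    // indexedDocs:  number of documents currently in the index (assumed input).
    static boolean shouldRebuild(long changedOrNew, long indexedDocs) {
        if (indexedDocs <= 0) {
            return true; // empty index: there is nothing to update in place
        }
        // Cast first so the division is done in floating point.
        return (double) changedOrNew / indexedDocs > REBUILD_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(shouldRebuild(600, 1000)); // 60% changed
        System.out.println(shouldRebuild(100, 1000)); // 10% changed
    }
}
```

The threshold is strict, as in the pseudo code, so exactly 50% changed does not trigger a rebuild.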

Re Indexing

2006-02-23 Thread N
Hi, I am indexing database tables with huge amounts of data via Lucene. Do I need to reindex the whole table(s) as changes are made, to keep the search up to date? Since it is time consuming to create a new index from scratch every time the data in the tables is modified, can anybody suggest some work

RE: Indexing Urls pointing to same content

2006-01-23 Thread Gwyn Carwardine
From: Mario Alejandro M. [mailto:[EMAIL PROTECTED] Sent: 23 January 2006 15:58 To: Otis Gospodnetic Cc: [email protected] Subject: Re: Indexing Urls pointing to same content I know Lucene is not a web indexer... maybe I explain this bad. I'm asking in how STORE the data, not in ho

Re: Indexing Urls pointing to same content

2006-01-23 Thread Mario Alejandro M.
I know Lucene is not a web indexer... maybe I explained this badly. I'm asking how to STORE the data, not how to locate it. If two files are the same (using MD5 is my current approach), then I plan to STORE the content once, but it is necessary to add the two locations. Example: c:\file1 Content: One c:\file2
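A minimal sketch of the "store the content once, keep every location" idea, using only the Java standard library. The class `ContentStore` and its methods are hypothetical, and the second file's content is assumed to equal the first's, since the example in the post is cut off. In a Lucene index the same shape would presumably be one document per fingerprint carrying a location field with several values, but no Lucene calls are shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical store: content is kept once per MD5 fingerprint,
// and every location that carries that content is recorded next to it.
class ContentStore {
    private final Map<String, String> contentByHash = new LinkedHashMap<>();
    private final Map<String, List<String>> locationsByHash = new LinkedHashMap<>();

    // Hex-encoded MD5 of the content, used as the fingerprint.
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Records a location; the content itself is stored only the first time it is seen.
    String add(String location, String content) {
        String hash = md5Hex(content);
        contentByHash.putIfAbsent(hash, content);
        locationsByHash.computeIfAbsent(hash, k -> new ArrayList<>()).add(location);
        return hash;
    }

    int storedContents() {
        return contentByHash.size();
    }

    List<String> locations(String hash) {
        return locationsByHash.getOrDefault(hash, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        ContentStore store = new ContentStore();
        String hash = store.add("c:\\file1", "One");
        store.add("c:\\file2", "One"); // assumed duplicate content
        System.out.println(store.storedContents() + " stored, at " + store.locations(hash));
    }
}
```

Matching fingerprints only suggest identical content; a byte-for-byte comparison on a hash match would rule out collisions if that matters for the application.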

Re: Indexing Urls pointing to same content

2006-01-20 Thread Otis Gospodnetic
Mario, Lucene != web indexer, so Lucene doesn't know anything about files or URLs, etc. It just indexes what it's told. You should check how Nutch does it, and I believe it does it by comparing "fingerprints" of web pages. Fingerprints are MD5 checksums, but I believe the recent changes ther

Re: Indexing

2005-10-31 Thread Chris Hostetter
: : Taking this to java-dev: Since this is such a common issue, would it : be feasible for Lucene to have some sort of capability to be told : what field is the unique one and automatically update (delete, and : add) a document added with a duplicate of a unique field? This : would probably requi

Re: Indexing

2005-10-31 Thread Erik Hatcher
Taking this to java-dev: Since this is such a common issue, would it be feasible for Lucene to have some sort of capability to be told what field is the unique one and automatically update (delete, and add) a document added with a duplicate of a unique field? This would probably require t
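The proposed behaviour (declare one field as unique, and have an add with a duplicate key act as delete-then-add) can be modelled with a small in-memory sketch. This is not Lucene code: `UniqueKeyIndex`, `doc`, and `addOrUpdate` are hypothetical names, and documents are plain maps of field name to value, purely to show the semantics being asked for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of "update = delete by unique field, then add".
class UniqueKeyIndex {
    private final String uniqueField;
    private final List<Map<String, String>> docs = new ArrayList<>();

    UniqueKeyIndex(String uniqueField) {
        this.uniqueField = uniqueField;
    }

    // Convenience builder: doc("id", "1", "body", "text") gives {id=1, body=text}.
    static Map<String, String> doc(String... fieldsAndValues) {
        Map<String, String> d = new HashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            d.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return d;
    }

    // Removes any document with the same unique-field value, then adds the new one.
    void addOrUpdate(Map<String, String> doc) {
        String key = doc.get(uniqueField);
        if (key == null) {
            throw new IllegalArgumentException("missing unique field: " + uniqueField);
        }
        docs.removeIf(existing -> key.equals(existing.get(uniqueField))); // delete
        docs.add(doc);                                                    // add
    }

    int size() {
        return docs.size();
    }

    // Returns the given field of the document with this key, or null if absent.
    String get(String key, String field) {
        for (Map<String, String> d : docs) {
            if (key.equals(d.get(uniqueField))) {
                return d.get(field);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        UniqueKeyIndex index = new UniqueKeyIndex("id");
        index.addOrUpdate(doc("id", "1", "body", "old text"));
        index.addOrUpdate(doc("id", "1", "body", "new text")); // replaces the first
        System.out.println(index.size() + " doc, body=" + index.get("1", "body"));
    }
}
```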

Re: Indexing Remote Documents

2005-10-27 Thread Erik Hatcher
Please post to java-user for such questions in the future. The short answer with Lucene is, if you can get text, you can index it. Lucene doesn't crawl URLs. Maybe you want Nutch instead for this feature? Or perhaps WebDAV access? Lots of ways, none directly related to Lucene though.

Re: Indexing Remote Documents

2005-10-27 Thread Chris Hostetter
: probably you'll need http client module (commons-httpclient or something) More specifically: when dealing with Lucene, the concept of a "document" is very specific: it is an instance of org.apache.lucene.document.Document. How you construct one of these Document objects in your application is

Re: Indexing Remote Documents

2005-10-27 Thread DalHo Park
You'll probably need an HTTP client module (commons-httpclient or something) 2005/10/27, [EMAIL PROTECTED] <[EMAIL PROTECTED]>: > Can Lucene index remote documents? For example, if there are some documents > at http://server:/documents, can I index the documents directory tree? > Any help wou