lucene with dbx reader

2007-01-25 Thread Bhavin Pandya
Hi guys, can I open Outlook Express mails (dbx file) in Java and index the mails using Lucene? Is anybody aware of such a dbx reader? I tried one, "mstos", but it's not working properly... - Bhavin Pandya

Re: Building Lucene index for XML document

2007-01-25 Thread maureen tanuwidjaja
Thanks a lot Daniel :) Regards, Maureen Daniel Noll <[EMAIL PROTECTED]> wrote: maureen tanuwidjaja wrote: > Before implementing this search engine,I have designed to build the > index in such a way that every XML tag is converted using binary > value,in order to reduce the size ind

Re: Exception while retrieving 100th element id in hits.id()

2007-01-25 Thread Mukesh Bhardwaj
Thanks Doron, you are right, I'm performing a delete operation. Doron Cohen <[EMAIL PROTECTED]> wrote: Hi Mukesh, Are you by any chance deleting docs in that loop, using the same reader as the one used by the searcher? If so, using a separate reader for delete would fix that. Also see related disc

Re: lucene with dbx reader

2007-01-25 Thread karl wettin
On 25 Jan 2007 at 10:15, Bhavin Pandya wrote: Can I open Outlook Express mails (dbx file) in Java and index the mails using Lucene? Not out of the box, sorry. But I think Outlook Express files are nothing but OLE2 documents. If that is true, then you could use the POIFS part of Jakarta P

Lock obtain timed out SimpleFSLock

2007-01-25 Thread maureen tanuwidjaja
Hi, I am indexing thousands of XML documents, and it stops after indexing for about 7 hrs ... Indexing C:\sweetpea\wikipedia_xmlfiles\part-0\37003.xml Indexing C:\sweetpea\wikipedia_xmlfiles\part-0\37004.xml Indexing C:\sweetpea\wikipedia_xmlfiles\part-0\37008.xml Indexing C:\swee

Re: Building Lucene index for XML document

2007-01-25 Thread maureen tanuwidjaja
Btw Daniel, can you please give me a reference that explains SegmentTermEnum/Field Infos, if one exists? I searched, but the best I can find is http://lucene.apache.org/java/docs/clover/org/apache/lucene/index/SegmentTermEnum.html which is the source code only... Many thanks and B

Re: Lock obtain timed out SimpleFSLock

2007-01-25 Thread Michael McCandless
maureen tanuwidjaja wrote: I am indexing thousands of XML document,then it stops after indexing for about 7 hrs ... Indexing C:\sweetpea\wikipedia_xmlfiles\part-0\37027.xml java.io.IOException: Lock obtain timed out: [EMAIL PROTECTED]:\sweetpea\dual_index\DI\write.lock java.lang

jetspeed lucene portlet example

2007-01-25 Thread e.j.w.vanbloem
Hello, I am new to Lucene and I tried to make a simple search engine by following the book 'portal development with open source tools', but I cannot get it to work. Can somebody give me or direct me to a simple search code example for Jetspeed 2? Regards, Erik

AW: modifier.optimize() causes Java heap space (OutOfMemoryException)

2007-01-25 Thread Marcel Morisse
Hey, thank you for all your help, I could likely fix the problem. Unfortunately I needed 10 hours to find the error (1 line of code) ;-) I forgot to close the IndexSearcher after an index query, so, I think, lots of instances of IndexSearcher (one for every query) were held in memory. Thi
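
The leak described above (one searcher per query, never closed) can be illustrated in plain Java. This is only a sketch under stated assumptions: `TrackedSearcher` is a hypothetical stand-in that counts open instances, not a Lucene class, and `search` is a placeholder. With a real IndexSearcher the equivalent fix would be to call close() in a finally block or to reuse one searcher across queries.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a searcher-like resource; not a Lucene class.
class TrackedSearcher implements AutoCloseable {
    // Number of instances that were opened but not yet closed.
    static final AtomicInteger OPEN = new AtomicInteger();

    TrackedSearcher() {
        OPEN.incrementAndGet();
    }

    // Placeholder "search": returns the query length as a fake hit count.
    int search(String query) {
        return query.length();
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

class SearcherLeakDemo {
    // Leaky variant: a new searcher per query that is never closed.
    static int leakyQuery(String q) {
        TrackedSearcher s = new TrackedSearcher();
        return s.search(q);
    }

    // Fixed variant: try-with-resources closes the searcher even if search throws.
    static int safeQuery(String q) {
        try (TrackedSearcher s = new TrackedSearcher()) {
            return s.search(q);
        }
    }
}
```

After many calls to `safeQuery` the open-instance counter stays at zero, while every call to `leakyQuery` leaves one more instance open.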

Re: Lock obtain timed out SimpleFSLock

2007-01-25 Thread Erick Erickson
One way to mitigate the cost of this kind of thing is to create a series of indexes on portions of your corpus and then merge them. Say you have 10,000 documents. Create 10 separate indexes of 1,000 documents each then use IndexWriter.addIndexes to make them all into a single index. This pre-supp
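
The batching half of this suggestion can be sketched without any Lucene code. The helper below (`BatchPlanner.partition`, a name made up for illustration) only splits a document list into fixed-size batches; building one index per batch and merging them with IndexWriter.addIndexes, as the post describes, is left out.

```java
import java.util.ArrayList;
import java.util.List;

class BatchPlanner {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With 10,000 documents and a batch size of 1,000 this yields the 10 batches from the example; a final partial batch is kept rather than dropped.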

Re: Lock obtain timed out SimpleFSLock

2007-01-25 Thread maureen tanuwidjaja
Hi Mike,thanks for the reply... 1.Here is the class that I use for indexing.. package edu.ntu.ce.maureen.index; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.

Re: Lock obtain timed out SimpleFSLock

2007-01-25 Thread Erick Erickson
Don't do it that way. You're opening and closing your IndexWriter for each document, which is extremely wasteful. And given locking has been a source of much discussion on this list, it's not clear that locking will withstand this kind of hammering. You want to do something like IndexWriter writ
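
The difference between the two loop shapes can be sketched in plain Java. `CountingWriter` is a hypothetical stand-in, not Lucene's IndexWriter: it only counts how often a writer is opened, on the assumption that each open corresponds to one write-lock acquisition.

```java
import java.util.List;

// Hypothetical stand-in for an index writer; not a Lucene class.
class CountingWriter implements AutoCloseable {
    // How many writers were opened (assumed: one lock acquisition each).
    static int lockAcquisitions = 0;
    int docsAdded = 0;

    CountingWriter() {
        lockAcquisitions++;
    }

    void addDocument(String doc) {
        docsAdded++;
    }

    @Override
    public void close() {
        // A real writer would flush and release its lock here.
    }
}

class WriterLoopDemo {
    // Wasteful shape: open and close a writer for every single document.
    static int indexPerDocument(List<String> docs) {
        int added = 0;
        for (String doc : docs) {
            try (CountingWriter w = new CountingWriter()) {
                w.addDocument(doc);
                added += w.docsAdded;
            }
        }
        return added;
    }

    // Recommended shape: open once, add every document, close once.
    static int indexWithOneWriter(List<String> docs) {
        try (CountingWriter w = new CountingWriter()) {
            for (String doc : docs) {
                w.addDocument(doc);
            }
            return w.docsAdded;
        }
    }
}
```

Both methods add the same number of documents, but the per-document variant opens as many writers as there are documents, while the second opens exactly one.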

Re: Lock obtain timed out SimpleFSLock

2007-01-25 Thread Michael McCandless
Erick Erickson wrote: Don't do it that way. You're opening and closing your IndexWriter for each document, which is extremely wasteful. And given locking has been a source of much discussion on this list, it's not clear that locking will withstand this kind of hammering. You want to do something

RE: Low hits

2007-01-25 Thread DECAFFMEYER MATHIEU
Thank you for your reply. There is not much help in the Regain community, but I can see that when I type e.g. title:logistics I get a score of about 0.70, and likewise 0.70 for headlines:logistics. But when I type logistics I get 0.02. I do not understand this, since I added this word as title and headlines and I need a higher sc

Re: Lock obtain timed out SimpleFSLock

2007-01-25 Thread maureen tanuwidjaja
Hi Erick and Mike Really thanks a lot for the advice... =) I will fix my code..I'll let you guys know if any problem arises. Many thanks and best regards ^ ^ MauReen Michael McCandless <[EMAIL PROTECTED]> wrote: Erick Erickson wrote: > Don't do it that way. You're ope

SpellChecker::suggestSimilar() Question

2007-01-25 Thread Ryan O'Hara
It seems that the suggestions returned by SpellChecker::suggestSimilar (queryText, num_sug, reader, field, bool) are randomly chosen, then sorted. By altering num_sug (10, 5, 3,2,1), I received the following suggestions for "gnetics": suggestion0: genetics suggestion1: ginetics suggestion2:

What type of query best for OR with high score?

2007-01-25 Thread Arturo Perez
Hi all, Which type of query should I use for the following type of thing. I have multiple words/phrases. I want to run a search for them all OR'd together. But I want the documents with the most distinct matches to have the highest score. An example. I want to search for "TOM OR DICK OR HARRY
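
The ranking asked for here (documents matching more distinct terms come first) can be shown with a toy scorer in plain Java. This is not Lucene's scoring; `DistinctMatchRanker` and its whitespace tokenization are assumptions made for illustration. In Lucene itself, as far as I know, a BooleanQuery of optional clauses gets a similar effect from the coord factor of the default Similarity, which favours documents matching more clauses.

```java
import java.util.*;

class DistinctMatchRanker {
    // Counts how many distinct query terms occur in the text
    // (case-insensitive, tokens split on whitespace).
    static int distinctMatches(String text, Set<String> terms) {
        Set<String> tokens = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+")));
        int count = 0;
        for (String term : terms) {
            if (tokens.contains(term.toLowerCase(Locale.ROOT))) {
                count++;
            }
        }
        return count;
    }

    // Orders documents by number of distinct matching terms, highest first.
    // The sort is stable, so ties keep their input order.
    static List<String> rank(List<String> docs, Set<String> terms) {
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator
                .comparingInt((String d) -> distinctMatches(d, terms))
                .reversed());
        return sorted;
    }
}
```

A document that repeats TOM four times counts as one distinct match and therefore ranks below a document containing TOM and DICK once each.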

Anyone have an XMLAnalyzer?

2007-01-25 Thread Arturo Perez
Is there an analyzer that can work with XML? Any suggestions for such? -arturo

Re: SpellChecker::suggestSimilar() Question

2007-01-25 Thread karl wettin
On 25 Jan 2007 at 20:43, Ryan O'Hara wrote: Is there any way to sort the suggestions beforehand, so that grabbing only one suggestion would give you the best suggestion, in this case "genetics"? Without having looked at the code for a long time, I think the problem is what the lucene scoring cons
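
One workaround, independent of how SpellChecker scores internally, is to request a larger number of suggestions and re-rank them on the client side. The sketch below uses plain Levenshtein edit distance with alphabetical tie-breaking; `SuggestionRanker` is a hypothetical helper, and the choice of metric and tie-break is an assumption, not what the SpellChecker class does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SuggestionRanker {
    // Classic Levenshtein edit distance using two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Sorts candidates by distance to the query; ties are broken alphabetically.
    static List<String> rank(String query, List<String> candidates) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator
                .comparingInt((String c) -> distance(query, c))
                .thenComparing(Comparator.naturalOrder()));
        return sorted;
    }
}
```

Taking the first element of the ranked list then gives the closest candidate regardless of the order in which the suggestions were returned. Note that "genetics" and "ginetics" are both one edit away from "gnetics", so a pure edit-distance ranking cannot separate them; some extra signal, such as how often each word occurs in the index, would be needed for that.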

Re: Building Lucene index for XML document

2007-01-25 Thread Doron Cohen
Hi Maureen, Some relevant info in the file formats doc - http://lucene.apache.org/java/docs/fileformats.html Regards, Doron maureen tanuwidjaja <[EMAIL PROTECTED]> wrote on 25/01/2007 01:31:25: > btw Daniel,can please give me the reference to find the explanation > about SegmentTermEnum/Field I

Re: Lock obtain timed out SimpleFSLock

2007-01-25 Thread Mark Miller
But, locking should be fine even for this "hammering" use case (and if it's not, that's a bug, and I'd really like to know about it!). I have hammered over 2.5 million 5-10k docs into an index this way (a realtime system that I had not yet added a special load call to) and had 0 problems. On

Re: Anyone have an XMLAnalyzer?

2007-01-25 Thread Simon Willnauer
It's just a google query away :) http://www.google.com/search?hl=de&q=Lucene+XML+analyze&btnG=Google-Suche&meta= best regards simon On 1/25/07, Arturo Perez <[EMAIL PROTECTED]> wrote: Is there an analyzer that can work with XML? Any suggestions for such? -arturo --

Re: corrupt index: .fdx and stored norms

2007-01-25 Thread Doron Cohen
Hi Nick, Have you managed to solve/recreate this issue? There has been recent progress on index corruption issues: http://issues.apache.org/jira/browse/LUCENE-140 http://issues.apache.org/jira/browse/LUCENE-784 In those cases an application created FSDirectory with create=false and created

Re: Anyone have an XMLAnalyzer?

2007-01-25 Thread Arturo Pérez
In article <[EMAIL PROTECTED]>, "Simon Willnauer" <[EMAIL PROTECTED]> wrote: > http://www.google.com/search?hl=de&q=Lucene+XML+analyze&btnG=Google-Suche&meta= Yeah, I'd seen that. I was hoping for something a bit more tightly integrated than Digester. More specifically, I already parse my

Re: Lucene Indexing

2007-01-25 Thread Sairaj Sunil
Hi, I was asking what exactly the inverted indexing strategy used for storing the index is. Is the index data structure batch-based, B-tree-based, or segment-based? On 1/25/07, Rajiv Roopan <[EMAIL PROTECTED]> wrote: http://lucene.apache.org/java/docs/api/org/

How many documents in the biggest Lucene index to date?

2007-01-25 Thread Bill Taylor
I have used Lucene to index a small collection - only a few hundred documents. I have a potential client who wants to index a collection which will start at about a million documents and could easily grow to two million. Has anyone used Lucene with an index that large? Thank you very much

Re: Extending scoring to eliminate sorting on timestamp

2007-01-25 Thread Chris Hostetter
: For various reasons, we'd like to eliminate the sort step. can you elaborate on what those reasons are? FunctionQuery (in the solr code base, you'll find lots of discussion in the archives of this list) can let you use a numeric field value in the score calculation, but it still uses the Field

Re: Building Lucene index for XML document

2007-01-25 Thread maureen tanuwidjaja
Thanks Doron =) Regards, Maureen Doron Cohen <[EMAIL PROTECTED]> wrote: Hi Maureen, Some relevant info in the file formats doc - http://lucene.apache.org/java/docs/fileformats.html Regards, Doron maureen tanuwidjaja wrote on 25/01/2007 01:31:25: > btw Daniel,can please give me the ref