Re: indexing incrementally concurrently

2004-07-05 Thread Michael Wechner
Erik Hatcher wrote: On Jul 5, 2004, at 9:00 AM, Michael Wechner wrote: If several users are saving documents on the server concurrently and during saving the index shall be updated incrementally ... do I have to make sure that it's going to be "threadsafe" or does Lucene take care o

indexing incrementally concurrently

2004-07-05 Thread Michael Wechner
If several users are saving documents on the server concurrently and during saving the index shall be updated incrementally ... do I have to make sure that it's going to be "threadsafe" or does Lucene take care of this? Thanks Michi -- Michael Wechner Wyona Inc. - Open Source Con

Re: incrementally indexing a million documents

2004-07-05 Thread Michael Wechner
p the precision on the directory to a 3 digit setup or a 4 digit setup (once you automate it, sky's the limit) Hope this helps Nader Henein Michael Wechner wrote: I try to index around a million documents. The problem is that I run out of memory during sorting by uid when I go through the d

incrementally indexing a million documents

2004-06-14 Thread Michael Wechner
llion documents). Is there another approach than sorting by uid? Thanks Michi

Re: sorting by date (XML)

2004-04-27 Thread Michael Wechner

Re: sorting by date (XML)

2004-04-27 Thread Michael Wechner
hich is the DOC ID) and save yourself the pain unfortunately this isn't possible. Thanks a lot for your help Michi Hope this helps. Nader Henein -Original Message- From: Michael Wechner [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 27, 2004 3:52 PM To: Lucene Users List Subj

sorting by date (XML)

2004-04-27 Thread Michael Wechner
my XML files contain something like 20040427... and I would like to sort by this date. So I guess I need to modify the Documentparser and generate something like a millisecond field and then sort by this, correct? Has anyone done something like this yet? Thanks Michi -- Michael Wechner
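
A minimal sketch of the "millisecond field" idea raised in the question above, assuming the date arrives as a plain yyyyMMdd string and is interpreted as midnight UTC. The class and method names (DateSortKey, toEpochMillis) are hypothetical, and no Lucene API is used here; how the value would be stored in the index is left open. Note also that fixed-width yyyyMMdd strings already compare in chronological order as plain strings, so a conversion may not be needed at all.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Lucene: turns a yyyyMMdd string into a sortable number.
final class DateSortKey {
    private DateSortKey() {}

    // Parses e.g. "20040427" and returns milliseconds since the epoch.
    // Assumption: the date is taken as midnight UTC.
    static long toEpochMillis(String yyyymmdd) {
        LocalDate date = LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE);
        return date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        // Prints the sort key for the example date from the question.
        System.out.println(toEpochMillis("20040427"));
    }
}
```

Sorting on the numeric key and sorting on the original string give the same order as long as every value uses the same eight-digit layout.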

Re: what web crawler work best with Lucene?

2004-04-27 Thread Michael Wechner

Re: org.apache.lucene.demo.IndexHTML - parse JSP files?

2003-03-24 Thread Michael Wechner
John Bresnik wrote: anyone know of a quick and easy way to get this demo [org.apache.lucene.demo.IndexHTML] to parse JSP files as well? I used a crawler to create a local [static] version of the site [i.e. they are no longer "JSP" files, just the html output from the original JSP file - but in

Re: xpdf parser usage for lucene

2003-02-25 Thread Michael Wechner
Pinky Iyer wrote: Hi! I am trying to use xpdf as a PDF parser. The problem I encounter is that when I come across a file with a .pdf extension, I call the pdftotext script to convert it to text, which in turn uses the file system and leaves the same file with a .txt extension in the same dir. How can i get thi

Re: PLAN: WebLucene -- Lucene Web interface, use XML as a lightweight protocol.

2003-02-20 Thread Michael Wechner
That's very interesting. I have tried something similar by integrating Lucene into Wyona, which is a CMS based on Cocoon, and I also separated Structure from Layout. You can try it out at HTML: http://195.226.6.70:8080/wyona-cms/oscom/search-oscom/lucene?publication-id=all&queryString=Cocoon+Wyo

Re: or

2003-01-30 Thread Michael Wechner
Erik Hatcher wrote: On Thursday, January 30, 2003, at 07:07 PM, Michael Wechner wrote: Maybe Erik wants to include an "improved" version of my code snippet into CVS. Only if it can be made generic somehow - but that might be a bit tricky to implement depending on how crazy we

Re: or

2003-01-30 Thread Michael Wechner
Erik Hatcher wrote: On Thursday, January 30, 2003, at 06:59 PM, Michael Wechner wrote: 2) I got two Javadoc warnings, because @return was empty within HtmlDocument (getDocument() and Document()) picky picky! :) But thanks - I'll correct those too. sorry for that, but ant

Re: or

2003-01-30 Thread Michael Wechner
VS. I guess I am not the only one wanting to exclude certain parts from an HTML page ;-) All the best Michael Regards, Kelvin The book giving manifesto - http://how.to/sharethisbook On Thu, 30 Jan 2003 10:56:50 +0100, Michael Wechner said: Hi I am looking for an HTMLParser w

Re: or

2003-01-30 Thread Michael Wechner
Ronnie Kolehmainen wrote: Michael, the HtmlDocument class supports ignoring tags, ie all text inside specified tag names is ignored. Look at the setIgnoreTags(String [] ignoredtags) method. Remember to also include "script" and "style" in this array along with your custom tag names. I am not

Re: or

2003-01-30 Thread Michael Wechner
= title + body and your class HtmlDocument contents=body 2) I got two Javadoc warnings, because @return was empty within HtmlDocument (getDocument() and Document()) Thanks very much for your help Michael Erik On Thursday, January 30, 2003, at 04:56 AM, Michael Wechner wrote: Hi

or

2003-01-30 Thread Michael Wechner
Hi I am looking for an HTMLParser which skips text tagged by or something similar. This way I could exclude for instance a "global navigation section" within the HTML International Business Science ... It seems that the current demo/HTMLParser (http://lucene.sourceforge.net/cgi-bin/faq/faq

Re: Searcher is not returning the records - help please

2003-01-25 Thread Michael Wechner
Are you using HTMLDocument for indexing? The name of the content field is "contents" and not "content", which you are using within your code (please see below). Vinu SB wrote: Hi, I am relatively new to Lucene. My indexing process is going fine, as and when I upload the files. But when I search the

Re: Indexing other documents (.pdf et .doc)

2002-12-19 Thread Michael Wechner
Friaa Nafaa wrote: Hello, I use Lucene with Tomcat and I can now index and search all html documents. But I would like to index other documents such as pdf or Word (.doc), I hope that someone can help me! Concerning PDF: Before indexing you should extract the text from the PDF and save it as

Re: Score: Lucene 1.2 versus 1.3-dev1

2002-12-16 Thread Michael Wechner
-created, scores will be as before. (cutting) -Original Message- From: Michael Wechner [mailto:[EMAIL PROTECTED]] Sent: Monday, December 16, 2002 4:34 PM To: [EMAIL PROTECTED] Subject: Score: Lucene 1.2 versus 1.3-dev1 Hi I started to deploy Lucene 1.3-dev1 from CVS very recently

Score: Lucene 1.2 versus 1.3-dev1

2002-12-16 Thread Michael Wechner
Hi I started to deploy Lucene 1.3-dev1 from CVS very recently and noticed that the "score" is kind of different. In the case of Lucene1.2 I received scores such as for instance 3.45345234 * 10e-1 In the case of Lucene1.3-dev1 I am receiving scores such as for instance 3.23232131 *10e-8 Is t