Erik Hatcher wrote:
On Jul 5, 2004, at 9:00 AM, Michael Wechner wrote:
If several users are saving documents on the server concurrently
and the index has to be updated incrementally during each save ... do
I have to make sure that the update is "thread-safe" myself, or does Lucene
take care of this?
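Whatever guarantees Lucene itself gives, one conservative way to handle concurrent saves is to funnel every index update through a single lock so that only one update runs at a time. The following is a minimal sketch of that idea, not Lucene's API: IndexUpdate and SerializedIndexUpdater are hypothetical names, and the delegate stands in for whatever code performs the actual incremental update.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the code that performs one incremental index update. */
interface IndexUpdate {
    void apply(String documentId);
}

/** Serializes index updates so that concurrent saves never update the index in parallel. */
class SerializedIndexUpdater {
    private final ReentrantLock lock = new ReentrantLock();
    private final IndexUpdate delegate;

    SerializedIndexUpdater(IndexUpdate delegate) {
        this.delegate = delegate;
    }

    /** Called from each request thread after a document has been saved. */
    void documentSaved(String documentId) {
        lock.lock();
        try {
            // Only one thread at a time reaches the real update code.
            delegate.apply(documentId);
        } finally {
            lock.unlock();
        }
    }
}
```

Each save handler would call documentSaved(...) on one shared instance. The cost is that saves queue up behind each other while the index is being updated.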
Thanks
Michi
--
Michael Wechner
Wyona Inc. - Open Source Con
p the precision on the directory
to a 3 digit setup or a 4 digit setup (once you automate it, sky's the
limit)
Hope this helps
Nader Henein
Michael Wechner wrote:
I try to index around a million documents. The problem is
that I run out of memory during sorting by uid when I go through
the d
llion
documents).
Is there another approach than sorting by uid?
Thanks
Michi
--
Michael Wechner
Wyona Inc. - Open Source Content Management - Apache Lenya
http://www.wyona.com http://cocoon.apache.org/lenya/
[EMAIL PROTECTED][EMAIL PROT
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Michael Wechner
Wyona Inc. - Open Source Content Management - Apache Lenya
http://www
hich is the DOC ID) and save yourself the pain
unfortunately this isn't possible.
Thanks a lot for your help
Michi
Hope this helps.
Nader Henein
-----Original Message-----
From: Michael Wechner [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 27, 2004 3:52 PM
To: Lucene Users List
Subj
my XML files contain something like
20040427...
and I would like to sort by this date.
So I guess I need to modify the Documentparser and generate something like
a millisecond field and then sort by this, correct?
Has anyone done something like this yet?
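A possible way to derive such a millisecond value, sketched under the assumption that the date value starts with eight digits in yyyyMMdd form (the value above is truncated, so this is a guess). DateSortKey and toMillis are hypothetical names, and the day is interpreted as starting at midnight UTC.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that turns a yyyyMMdd-prefixed value into a sortable number. */
class DateSortKey {
    /** Returns the milliseconds since 1970-01-01 UTC for the start of the given day. */
    static long toMillis(String value) {
        // Assumption: the first eight characters are the date, e.g. "20040427".
        LocalDate date = LocalDate.parse(value.substring(0, 8), DateTimeFormatter.BASIC_ISO_DATE);
        return date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }
}
```

Note that strings in yyyyMMdd form already sort chronologically when compared as plain text, so the conversion may only be needed if a numeric field is required.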
Thanks
Michi
--
Michael Wechner
--
Michael Wechner
Wyona Inc. - Open Source Content Management - Apache Lenya
http://www.wyona.com http://cocoon.apache.org/lenya/
[EMAIL PROTECTED][EMAIL PROTECTED
John Bresnik wrote:
Does anyone know of a quick and easy way to get this demo
[org.apache.lucene.demo.IndexHTML] to parse JSP files as well? I used a
crawler to create a local [static] version of the site [i.e. they are no
longer "JSP" files, just the HTML output from the original JSP file - but in
Pinky Iyer wrote:
Hi !
I am trying to use xpdf as the PDF parser. The problem is that when I encounter
a file with a .pdf extension, I call the pdftotext script to convert it to text, which in
turn uses the file system and leaves the same file with a .txt extension in the same directory.
How can I get thi
That's very interesting.
I have tried something similar by integrating
Lucene into Wyona, which is a CMS based on Cocoon,
and I also separated Structure from Layout. You can try it out at
HTML:
http://195.226.6.70:8080/wyona-cms/oscom/search-oscom/lucene?publication-id=all&queryString=Cocoon+Wyo
Erik Hatcher wrote:
On Thursday, January 30, 2003, at 07:07 PM, Michael Wechner wrote:
Maybe Erik wants to include an "improved" version of my code snippet
into CVS.
Only if it can be made generic somehow - but that might be a bit
tricky to implement depending on how crazy we
Erik Hatcher wrote:
On Thursday, January 30, 2003, at 06:59 PM, Michael Wechner wrote:
2) I got two Javadoc warnings, because @return was empty within
HtmlDocument (getDocument() and Document())
picky picky! :) But thanks - I'll correct those too.
sorry for that, but ant
VS.
I guess I am not the only one wanting to exclude certain parts from an
HTML page ;-)
All the best
Michael
Regards,
Kelvin
The book giving manifesto - http://how.to/sharethisbook
On Thu, 30 Jan 2003 10:56:50 +0100, Michael Wechner said:
Hi
I am looking for an HTMLParser w
Ronnie Kolehmainen wrote:
Michael,
the HtmlDocument class supports ignoring tags, ie all text inside specified
tag names is ignored. Look at the setIgnoreTags(String [] ignoredtags)
method. Remember to also include "script" and "style" in this array along
with your custom tag names.
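As a small illustration of that advice, the array passed to setIgnoreTags could be built by a helper that always includes "script" and "style". This is only a sketch: IgnoreTags and withDefaults are hypothetical names, "nav" is a made-up custom tag, and the call to HtmlDocument itself is left out because its exact usage is not shown here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper that builds the tag list for setIgnoreTags(String[]). */
class IgnoreTags {
    /** Returns "script" and "style" followed by the custom tag names, without duplicates. */
    static String[] withDefaults(String... customTags) {
        Set<String> tags = new LinkedHashSet<>();
        tags.add("script");
        tags.add("style");
        for (String tag : customTags) {
            tags.add(tag);
        }
        return tags.toArray(new String[0]);
    }
}
```

For example, IgnoreTags.withDefaults("nav") yields an array holding "script", "style" and "nav", which could then be handed to setIgnoreTags.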
I am not
= title + body
and your class HtmlDocument
contents=body
2) I got two Javadoc warnings, because @return was empty within
HtmlDocument (getDocument() and Document())
Thanks very much for your help
Michael
Erik
On Thursday, January 30, 2003, at 04:56 AM, Michael Wechner wrote:
Hi
I am looking for an HTMLParser which skips text tagged by
or something similar. This way I could exclude for
instance a "global navigation section" within the HTML
International
Business
Science
...
It seems that the current demo/HTMLParser
(http://lucene.sourceforge.net/cgi-bin/faq/faq
Are you using HTMLDocument for indexing? The name of the content field is
"contents", not "content", which is what you are using in your code (please
see below).
Vinu SB wrote:
Hi,
I am relatively new to Lucene. My indexing process is
going fine, as and when I upload the files. But when I
search the
Friaa Nafaa wrote:
Hello, I use Lucene with Tomcat and I can now index and search all HTML documents. But I would like to index other documents such as PDF or Word (.doc). I hope that someone can help me!
Concerning PDF:
Before indexing you should extract the text from the PDF and save it
as
-created, scores will be
as before. (cutting)
-----Original Message-----
From: Michael Wechner [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 16, 2002 4:34 PM
To: [EMAIL PROTECTED]
Subject: Score: Lucene 1.2 versus 1.3-dev1
Hi
I started to deploy Lucene 1.3-dev1 from CVS very recently and
noticed that the "score" is kind of different.
With Lucene 1.2 I received scores such as, for instance,
3.45345234 * 10e-1
With Lucene 1.3-dev1 I am receiving scores such as, for instance,
3.23232131 * 10e-8
Is t