OK, thanks.
Adam
> -----Original Message-----
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
> Sent: Monday, January 31, 2005 5:51 PM
> To: Lucene Users List; [EMAIL PROTECTED]
> Subject: RE: carrot2 question too - Re: Fun with the Wikipedia
>
> Adam,
>
> Dawid posted some code that lets you
Hi Adam.
Otis and David have already provided you with pointers to my previous
post regarding Carrot2-Lucene integration, so just a tiny note here:
Also, when I looked at Carrot2, the pipeline is implemented over HTTP. I
wonder how efficient that is, or can it be changed, for instance for an
Check out http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html
which provides some pointers and code which should be helpful.
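A minimal sketch of one common fix when an external tool such as pdftohtml works from a shell but hangs or fails silently when launched from Java: read the child's output while it runs and check its exit code. This assumes the cause is an unread output stream or an unreported error, which the original message does not confirm. It uses ProcessBuilder rather than Runtime.exec(); the class name ExternalTool and its Result type are hypothetical, and the command line is whatever you pass in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ExternalTool {

    /** Exit code and combined stdout/stderr text of a finished command. */
    public static final class Result {
        public final int exitCode;
        public final String output;

        private Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** Runs the command, draining its output so the child cannot block on a full pipe. */
    public static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so a single reader drains both streams.
        builder.redirectErrorStream(true);
        Process process = builder.start();

        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            // Read until the child closes its output, i.e. until it exits.
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        // Only wait for the exit code after the output has been consumed.
        int exitCode = process.waitFor();
        return new Result(exitCode, output.toString());
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java ExternalTool <command> [args...]");
            return;
        }
        Result result = run(Arrays.asList(args));
        System.out.print(result.output);
        System.out.println("exit code: " + result.exitCode);
    }
}
```

Passing the program and each argument as separate list elements avoids relying on a shell to split the command line, which is one place where Windows and Linux behave differently. The captured output usually contains the tool's own error message (missing library, wrong path, no permission) when the exit code is nonzero.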
Cheers,
Kelvin
http://www.supermind.org
On Mon, 31 Jan 2005 19:01:11 +0100, Bertrand VENZAL wrote:
> Hi all,
>
> I've a kind of problem to execute a convertin
I will assume you are asking this question on the lucene mailing list
because you now want to index that PDF document.
Have you tried PDFBox? It can't create an HTML file for you, but it can
extract text.
Ben
http://www.pdfbox.org
On Mon, 31 Jan 2005, Bertrand VENZAL wrote:
> Hi all,
>
> I v
Hi all,
I have a problem running a tool that converts a PDF to HTML under Linux.
I have an executable, "pdftohtml", which works correctly in batch mode, and
it also works when I call it from Java under Windows 2000, BUT it does not
work at all on the server under
Otis Gospodnetic wrote:
Adam,
Dawid posted some code that lets you use Carrot2 locally with Lucene,
See the embedded zip URL here for the Carrot2/Lucene code (it may also be
in the Carrot2 CVS tree). This is what I used as the basis for the
Wikipedia/cluster stuff:
http://www.newsarch.com/archive/mai
Adam,
Dawid posted some code that lets you use Carrot2 locally with Lucene,
without the componentized pipeline system described on the Carrot2 site.
Otis
--- Adam Saltiel <[EMAIL PROTECTED]> wrote:
> David, Hi,
> Would you be able to comment on the coincidentally recent thread " RE: ->
> Grouping Sear
Yura Smolsky wrote:
There is a big difference between using the compound index format and
multiple files. I have tested it on a big index (45 GB). When I used the
compound file format, optimize took 3 times more space, because the *.cfs
file needs to be unpacked.
Now I use the non-compound file format. It needs like twice
David, Hi,
Would you be able to comment on the coincidentally recent thread " RE: ->
Grouping Search Results by Clustering Snippets:"?
Also, when I looked at Carrot2, the pipeline is implemented over HTTP. I
wonder how efficient that is, or can it be changed, for instance for an all
local implementa
I am pleased to announce that MindRetrieve 0.4.0 has been released.
MindRetrieve is a desktop search tool that helps users search and organize
the web pages they have seen. Download it from http://mindretrieve.berlios.de/.
Every day we read a large amount of information from the World Wide Web.
The t
Hi.
Coming up with answers... a little belated, but hope you're still on:
We have been experimenting with Carrot2 and are very pleased so far.
Only one issue: there is no release, not even an alpha one, and the
dependencies seemed to be patched (JAMA).
Yes, there is no "official" release. We just don
Saad,
Here is what I got. I will post again, and be more
specific.
-Y
--- Nader Henein <[EMAIL PROTECTED]> wrote:
> We'll need a little more detail to help you: what
> are the sizes of your
> updates and how often are they updated?
>
> 1) No, just re-open the index writer every time to
> re-inde
Hi Jonathan,
> Yet another burning question :-). Can someone explain how the document
> numbers in Lucene documents work? For example, the TermDocs.doc()
> method returns "the current doc number." How can I get this doc number
> if I just have a Document?
>
I don't think you can.
A document