We did a simple one a while ago. It could probably be a bit more
sophisticated, but it seems to do its job in the little bit of testing we
did.
See
http://cvs.sourceforge.net/viewcvs.py/toscanaj/docco/source/org/tockit/docco/documenthandler/OpenOfficeDocumentHandler.java?rev=1.4&view=auto
HTH,
Pe
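For readers who cannot reach the linked handler, here is a minimal, independent sketch of the same idea (not the code from that file). An OpenOffice.org document is a ZIP archive whose text lives in the content.xml entry, so a single SAX pass over that entry can collect the character data. The class name is hypothetical. The sketch only turns the end of each text:p (paragraph) and text:h (heading) element into a newline; tabs, runs of spaces encoded as text:s and other inline markup are ignored.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class OpenOfficeTextSketch {
    // Text namespaces of OpenOffice.org 1.x (.sxw) and OpenDocument (.odt) files.
    private static final String OOO1_TEXT_NS = "http://openoffice.org/2000/text";
    private static final String ODF_TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Returns the plain text of content.xml, one line per paragraph or heading. */
    static String extractText(String path)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("not an OpenOffice document (no content.xml): " + path);
            }
            StringBuilder text = new StringBuilder();
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    // content.xml may reference "office.dtd", which is not in the archive;
                    // supply an empty DTD instead of letting the parser look for it.
                    return new InputSource(new StringReader(""));
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    boolean textNs = OOO1_TEXT_NS.equals(uri) || ODF_TEXT_NS.equals(uri);
                    if (textNs && (localName.equals("p") || localName.equals("h"))) {
                        text.append('\n'); // end of a paragraph or heading
                    }
                }
            };
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so uri/localName are filled in
            try (InputStream in = zip.getInputStream(entry)) {
                factory.newSAXParser().parse(in, handler);
            }
            return text.toString();
        }
    }
}
```

Calling `OpenOfficeTextSketch.extractText("report.sxw")` would return the document body as plain text, ready to be added to a Lucene Document as a text field.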
I have the same requirement. I used antiword, xlhtml and ppthtml on Win2K. I
called them with Runtime.exec(). There are still problems: all three hang
sometimes. Otherwise, it worked. I indexed several hundred thousand
files in development mode. I never got it into production.
Argyn
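Since the converters sometimes hang, one way to keep an indexing run going is to give each external call a time limit and kill the process when it runs over. A sketch, assuming Java 9 or later (for ProcessBuilder.Redirect.DISCARD); it uses ProcessBuilder, which starts the same kind of Process as Runtime.exec(), and the class and method names are hypothetical. Standard output is read on a separate thread so a tool that writes a lot cannot stall on a full pipe.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ExternalExtractor {

    /** Signals that the external tool was killed because it ran too long. */
    static final class ExtractionTimeoutException extends IOException {
        ExtractionTimeoutException(String message) { super(message); }
    }

    /**
     * Runs a command such as ["antiword", "file.doc"] and returns its standard output.
     * The process is killed if it has not exited after timeoutSeconds.
     */
    static String run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectError(ProcessBuilder.Redirect.DISCARD); // keep warnings out of the text
        Process process = builder.start();
        process.getOutputStream().close(); // the tools read nothing from stdin

        ByteArrayOutputStream output = new ByteArrayOutputStream();
        // Drain stdout concurrently; otherwise a chatty tool can block on a full pipe.
        Thread drainer = new Thread(() -> copy(process.getInputStream(), output));
        drainer.setDaemon(true); // never keep the JVM alive because of a stuck child
        drainer.start();

        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            drainer.join(1000);
            throw new ExtractionTimeoutException(
                "timed out after " + timeoutSeconds + "s: " + command);
        }
        drainer.join();
        if (process.exitValue() != 0) {
            throw new IOException("exit code " + process.exitValue() + ": " + command);
        }
        // Assumption: the tool writes UTF-8; adjust to the converter's actual output encoding.
        return new String(output.toByteArray(), StandardCharsets.UTF_8);
    }

    private static void copy(InputStream in, ByteArrayOutputStream out) {
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException ignored) {
            // The stream is closed when the process is killed; keep whatever was read.
        }
    }
}
```

For example, `ExternalExtractor.run(Arrays.asList("antiword", "letter.doc"), 30)` returns antiword's plain-text output, which it writes to standard output by default; a timeout can be logged and the file skipped instead of stalling the whole run.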
On Mon,
On Monday 19 April 2004 14:01, Mario Ivankovits wrote:
> Stephane James Vaucher wrote:
> > Anyone try what Joerg suggested here?
> > http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgNo=6231
>
> Don't know what you would like to do, but if you simply would like to
> extract text,
Actually, the objective would be to use OO to extract text from MS Office
formats. If I read your code correctly, it only works with OO documents,
since those are stored as XML.
Thanks for the code for OO docs, though,
sv
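The distinction matters because the two families look different on disk: OpenOffice.org files are ZIP archives (they start with the bytes PK\3\4), while binary Word, Excel and PowerPoint files are OLE2 compound documents (they start with D0 CF 11 E0 A1 B1 1A E1). A small sniffer such as the hypothetical one below could route each file either to the XML-based handler or to an MS Office converter. Any ZIP file matches the first signature, so finding content.xml inside the archive is the real confirmation.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class FormatSniffer {
    enum Kind { ZIP_CONTAINER, OLE2_COMPOUND, UNKNOWN }

    // "PK\3\4": local file header of a ZIP archive (OpenOffice.org documents).
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // OLE2 compound document signature (binary .doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Classifies a file from its first few bytes. */
    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2_COMPOUND;
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP_CONTAINER;
        return Kind.UNKNOWN;
    }

    /** Reads up to 8 bytes from the file and classifies them. */
    static Kind sniff(String path) throws IOException {
        byte[] header = new byte[8];
        int n = 0;
        try (InputStream in = new FileInputStream(path)) {
            int r;
            while (n < header.length && (r = in.read(header, n, header.length - n)) != -1) {
                n += r;
            }
        }
        return sniff(Arrays.copyOf(header, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```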
On Mon, 19 Apr 2004, Mario Ivankovits wrote:
> Stephane James Vaucher wrote:
Dear all,
I hate to be insistent, but I have a large live website with a growing,
un-optimizable Lucene index, which therefore has its appointment
with destiny pencilled into The Diary of Doom on a date roughly
three weeks hence.
So if I'm doing something stupid, or there's a workaround, or s
Stephane James Vaucher wrote:
Anyone try what Joerg suggested here?
http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgNo=6231
Don't know what you would like to do, but if you simply would like to
extract text, you could try this snippet:
---snip--- J
I'll make a copy of the code available on the wiki before it disappears
off the Web.
Now for some info on using OO on a production system:
http://www.oooforum.org/forum/viewtopic.php?t=2913&highlight=jurt
OO works well (if slowly), but it is
not multi-threaded (only the communication bridge is).
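If OO itself cannot serve concurrent requests, a multi-threaded indexer can still share one OO connection by funnelling every conversion through a single worker thread. A minimal sketch, assuming a hypothetical DocumentConverter interface that wraps whatever code talks to the OO bridge (this is not the UNO API): callers on any thread submit a job and wait for its result, while the single-thread executor guarantees that conversions never overlap.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the code that drives OpenOffice.org over the bridge. */
interface DocumentConverter {
    String toText(String path) throws Exception;
}

final class SerializedConverter implements AutoCloseable {
    private final DocumentConverter delegate;
    // Exactly one worker thread: every conversion runs on it, one after another.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    SerializedConverter(DocumentConverter delegate) {
        this.delegate = delegate;
    }

    /** Safe to call from many indexing threads; blocks until this job has run. */
    String toText(String path) throws Exception {
        Future<String> result = worker.submit(() -> delegate.toText(path));
        try {
            return result.get();
        } catch (ExecutionException e) {
            // Re-throw the converter's own exception rather than the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) throw (Exception) cause;
            throw e;
        }
    }

    @Override
    public void close() {
        worker.shutdown(); // lets queued jobs finish, accepts no new ones
    }
}
```

The queue inside the executor does the locking, so the indexing threads need no explicit synchronization; throughput is still limited to one OO conversion at a time.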
Qu