Hi Alex,
Alex Ott writes:
> res is a map consisting of:
> - :text -> the extracted text
> - all other keys -> metadata from the document
Works like a charm, thanks a bunch!
--
Bastien
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this
$ lein try clj-tika "1.2.0"
user=> (use 'tika)
user=> (def res (parse "https://www.oasis-open.org/committees/download.php/25054/07-08-22-MetaData-Examples.odt"))
#'user/res
res is a map consisting of:
- :text -> the extracted text
- all other keys -> metadata from the document
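Since res is a plain Clojure map, the text and the metadata can be pulled apart with core functions. A minimal sketch: split-parse-result is a hypothetical helper (not part of clj-tika), and the sample map below uses made-up metadata keys, since the real key names depend on the document and on clj-tika:

```clojure
(defn split-parse-result
  "Hypothetical helper (not part of clj-tika): given a result map shaped as
  described above, returns a vector of [extracted-text metadata-map]."
  [res]
  [(:text res)            ; the extracted plain text
   (dissoc res :text)])   ; every remaining key is document metadata

;; Usage with a made-up map standing in for a real parse result:
(split-parse-result {:text "Hello" :author "Jane Doe"})
;; => ["Hello" {:author "Jane Doe"}]
```

With a real result, destructuring reads naturally: (let [[text metadata] (split-parse-result res)] (println text)).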
On Wed, Jun 4, 20
Hi Alex,
Alex Ott writes:
> Pantomime doesn't currently support text extraction, but you can use
> https://github.com/alexott/clj-tika (somewhat outdated, though); it
> uses Apache Tika for text extraction.
Thanks -- I stumbled upon clj-tika but didn't understand how to use
it. Would you h
Hi,
Pantomime doesn't currently support text extraction, but you can use
https://github.com/alexott/clj-tika (somewhat outdated, though); it uses
Apache Tika for text extraction.
On Wed, Jun 4, 2014 at 1:27 AM, Bastien wrote:
> Hi all,
>
> I'm trying to get the content of an ODT file as pla
Hi Denis,
Denis Fuenzalida writes:
> I've created a small gist which shows how to use the ODFDOM API which
> is much simpler to use:
>
> https://gist.github.com/dfuenzalida/a1e9755e9b2e7f638620
Thanks a lot for this! I tested it, and I can get the human-readable
text from an arbitrary .odt file.
Hi Jeffrey,
"'Jeffrey Cummings' via Clojure" writes:
> You may want to look at Docjure https://github.com/mjul/docjure
>
>
> It parses .xlsx files; it may be able to parse .odt files.
Thanks, but I don't see anything in docjure about parsing .odt files.
Or am I missing something?
--
Basti
>
> You may want to look at Docjure https://github.com/mjul/docjure
It parses .xlsx files; it may be able to parse .odt files. It uses the
Apache POI Java library for parsing.
Jeff
I've created a small gist which shows how to use the ODFDOM API which is
much simpler to use:
https://gist.github.com/dfuenzalida/a1e9755e9b2e7f638620
On Tuesday, June 3, 2014 at 20:58:20 UTC-4, Denis Fuenzalida wrote:
>
> Hi Bastien,
>
> ODT files from OpenOffice/LibreOffice are just Zip
Hi Bastien,
ODT files from OpenOffice/LibreOffice are just ZIP archives containing a
set of XML files, plus folders for any images or media you've inserted
into the document. The text itself is stored in a file called
"content.xml" inside the archive.
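Because the text lives in content.xml, it can be read with nothing beyond the JDK's ZIP support and Clojure's bundled clojure.xml parser. A minimal sketch, assuming only the standard ODF element names text:p (paragraph) and text:h (heading). odt->text is a hypothetical name, and this is not the code from the ODFDOM gist mentioned in this thread. It emits one line per paragraph or heading and ignores styles, table layout and <text:s/> space markers. Text in paragraphs nested inside other paragraphs (footnotes, for example) may appear twice:

```clojure
(ns odt-text
  (:require [clojure.xml :as xml]
            [clojure.string :as string])
  (:import (java.util.zip ZipFile)))

;; Qualified element names (clojure.xml does not resolve namespaces) that
;; hold a paragraph or a heading in ODF content.xml.
(def ^:private paragraph-tags #{"text:p" "text:h"})

(defn- node-text
  "Concatenates every text node below an XML element, including text inside
  nested text:span elements. Note: clojure.xml drops whitespace-only text
  nodes, so a lone space between two spans may be lost."
  [node]
  (->> (xml-seq node)
       (filter string?)
       (apply str)))

(defn odt->text
  "Hypothetical helper: reads content.xml from the ODT (ZIP) archive at
  path and returns its paragraphs and headings, one per line."
  [path]
  (with-open [zip (ZipFile. (str path))]
    (let [entry (.getEntry zip "content.xml")]
      (when-not entry
        (throw (ex-info "content.xml not found; is this an ODT file?"
                        {:path path})))
      (with-open [in (.getInputStream zip entry)]
        (->> (xml-seq (xml/parse in))       ; depth-first walk of all nodes
             (filter #(and (map? %)         ; elements only, not strings
                           (contains? paragraph-tags (name (:tag %)))))
             (map node-text)
             (string/join "\n"))))))
```

Usage would be something like (println (odt-text/odt->text "report.odt")), where the file name is only a placeholder.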
There's a plain Java parser for ODT files
Hi all,
I'm trying to get the content of an ODT file as plain text.
I've found Pantomime, but I don't understand how to use it.
Can anyone put me on the right track with a minimal working
example?
Thanks in advance!
--
Bastien