Re: Parsing ODT files (with Pantomime?)

2014-06-04 Thread Bastien
Hi Alex, Alex Ott writes: > res - the map consisting of: > - :text -> extracted text > - all other fields - metadata from document Works like a charm, thanks a bunch! -- Bastien -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this

Re: Parsing ODT files (with Pantomime?)

2014-06-04 Thread Alex Ott
>lein try clj-tika "1.2.0"
user=> (use 'tika)
user=> (def res (parse "https://www.oasis-open.org/committees/download.php/25054/07-08-22-MetaData-Examples.odt"))
#'user/res

res is a map consisting of:
- :text -> the extracted text
- all other fields -> metadata from the document

On Wed, Jun 4, 20
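Given a `res` map like the one above, a small helper can separate the extracted text from the metadata. This is a hedged sketch that relies only on the shape Alex describes (`:text` plus metadata fields); `split-tika-result` is a hypothetical name and not part of clj-tika, and the actual metadata key names depend on what clj-tika returns.

```clojure
;; Hypothetical helper (not part of clj-tika). It assumes only the result
;; shape described above: :text holds the text, every other key is metadata.
(defn split-tika-result
  [res]
  {:text     (:text res)
   :metadata (dissoc res :text)})

;; Usage with the `res` bound in the REPL session above:
;; (println (:text (split-tika-result res)))
;; (keys (:metadata (split-tika-result res)))
```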

Re: Parsing ODT files (with Pantomime?)

2014-06-04 Thread Bastien
Hi Alex, Alex Ott writes: > Pantomime doesn't support text extraction right now, but you can use https://github.com/alexott/clj-tika (although it's outdated); it uses Apache Tika for text extraction. Thanks! I stumbled upon clj-tika but didn't understand how to use it. Would you h

Re: Parsing ODT files (with Pantomime?)

2014-06-04 Thread Alex Ott
Hi, Pantomime doesn't support text extraction right now, but you can use https://github.com/alexott/clj-tika (although it's outdated); it uses Apache Tika for text extraction. On Wed, Jun 4, 2014 at 1:27 AM, Bastien wrote: > Hi all, > > I'm trying to get the content of an ODT file as pla

Re: Parsing ODT files (with Pantomime?)

2014-06-04 Thread Bastien
Hi Denis, Denis Fuenzalida writes: > I've created a small gist which shows how to use the ODFDOM API, which is much simpler to use: https://gist.github.com/dfuenzalida/a1e9755e9b2e7f638620 Thanks a lot for this! I tested it and I can get the human-readable text from an arbitrary .odt file

Re: Parsing ODT files (with Pantomime?)

2014-06-04 Thread Bastien
Hi Jeffrey, "'Jeffrey Cummings' via Clojure" writes: > You may want to look at Docjure (https://github.com/mjul/docjure). It parses .xlsx files; it may be able to parse .odt files. Thanks, but I don't see anything in Docjure about parsing .odt files. Or am I missing something? -- Basti

Re: Parsing ODT files (with Pantomime?)

2014-06-03 Thread 'Jeffrey Cummings' via Clojure
You may want to look at Docjure (https://github.com/mjul/docjure). It parses .xlsx files; it may be able to parse .odt files. It uses the Apache POI Java library for parsing. Jeff

Re: Parsing ODT files (with Pantomime?)

2014-06-03 Thread Denis Fuenzalida
I've created a small gist which shows how to use the ODFDOM API, which is much simpler to use: https://gist.github.com/dfuenzalida/a1e9755e9b2e7f638620 On Tuesday, June 3, 2014 20:58:20 UTC-4, Denis Fuenzalida wrote: > > Hi Bastien, > > ODT files from OpenOffice/LibreOffice are just Zip

Re: Parsing ODT files (with Pantomime?)

2014-06-03 Thread Denis Fuenzalida
Hi Bastien, ODT files from OpenOffice/LibreOffice are just ZIP archives containing a set of XML files, plus folders for any images or media you've inserted into the document. The text itself is stored in a file called "content.xml" inside the archive. There's a plain Java parser for ODT files
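Following that description, the text can be read straight out of `content.xml` using only the JDK (`java.util.zip` plus the built-in DOM parser), without Pantomime or Tika. This is a minimal sketch under stated assumptions, not Denis's parser or the ODFDOM API: `odt->text` and `block-texts` are hypothetical names, and it only collects `text:p` (paragraph) and `text:h` (heading) elements. Special elements such as `text:s` (repeated spaces) and `text:tab` contribute no characters, and footnote text ends up merged into the paragraph that contains it.

```clojure
(ns odt-text
  (:require [clojure.string :as str])
  (:import (java.util.zip ZipFile)
           (javax.xml.parsers DocumentBuilderFactory)
           (org.w3c.dom Element Node)))

;; Namespace URI bound to the "text:" prefix in ODF content.xml.
(def ^:private odf-text-ns "urn:oasis:names:tc:opendocument:xmlns:text:1.0")

(defn- block-texts
  "Hypothetical helper: walks the DOM under `node` in document order and
  returns the text of every text:p and text:h element. It does not descend
  into a matched element, so nested content (e.g. footnotes) stays merged
  into its parent paragraph."
  [^Node node]
  (if (and (instance? Element node)
           (= odf-text-ns (.getNamespaceURI node))
           (#{"p" "h"} (.getLocalName node)))
    [(.getTextContent node)]
    (let [children (.getChildNodes node)]
      (mapcat #(block-texts (.item children %))
              (range (.getLength children))))))

(defn odt->text
  "Hypothetical helper: returns the paragraphs and headings of the ODT file
  at `path` as plain text, one block per line."
  [path]
  ;; An ODT file is a ZIP archive; the document body lives in content.xml.
  (with-open [zip (ZipFile. (str path))]
    (let [entry   (or (.getEntry zip "content.xml")
                      (throw (ex-info "No content.xml in archive" {:path path})))
          factory (doto (DocumentBuilderFactory/newInstance)
                    ;; Required so getNamespaceURI/getLocalName are populated.
                    (.setNamespaceAware true))
          doc     (with-open [in (.getInputStream zip entry)]
                    (.parse (.newDocumentBuilder factory) in))]
      ;; str/join realizes the lazy seq before the archive is closed.
      (str/join "\n" (block-texts (.getDocumentElement doc))))))
```

Usage would be something like `(println (odt-text/odt->text "/path/to/document.odt"))` (the path is a placeholder). Walking the tree instead of calling `getTextContent` on the root keeps paragraph boundaries as line breaks rather than running all the text together.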

Parsing ODT files (with Pantomime?)

2014-06-03 Thread Bastien
Hi all, I'm trying to get the content of an ODT file as plain text. I've found Pantomime, but I don't understand how to use it. Can anyone put me on the right track with a minimal working example? Thanks in advance! -- Bastien