Re: Read Microsoft Word .doc files in Clojure

2014-12-05 Thread Ron Toland
Divya,
Here's a simple example for extracting text from an input stream (which you can convert any file into):
    (ns sample.tika
      (:require [clj-tika.core :as tika]))
    (defn extract-text
      "Extracts the text from the input stream"
      [input-stream]
      (tika/parse input-stream))
Ron

Re: Read Microsoft Word .doc files in Clojure

2014-12-05 Thread Gary Verhaegen
clj-tika seems to be abandoned (and is marked as deprecated). You will probably be better off using Tika directly through interop.
On Friday, 5 December 2014, Divya Shravanthi wrote:
> Hi Ron,
>
> Could you please share an example of how to pull simple text from pdf/doc
> files. I couldn't find
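For reference, calling Tika directly through interop might look roughly like the sketch below. It assumes the Apache Tika jars (tika-core plus tika-parsers) are on the classpath; the namespace name `sample.tika-interop` and the function name `extract-text` are made up for illustration and are not from the thread.

```clojure
(ns sample.tika-interop
  (:require [clojure.java.io :as io])
  (:import (org.apache.tika Tika)))

(defn extract-text
  "Returns the plain text of the document at `source` (a path, File or URL).
  Tika detects the format itself, so the same call should cover .doc, .pdf
  and .odt as long as the matching parsers are on the classpath."
  [source]
  ;; The facade truncates long documents by default; -1 lifts that limit.
  (let [tika (doto (Tika.)
               (.setMaxStringLength -1))]
    (with-open [in (io/input-stream source)]
      (.parseToString tika in))))
```

Usage would be something like `(extract-text "report.doc")`, where the file name is only a placeholder.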

Re: Read Microsoft Word .doc files in Clojure

2014-12-05 Thread Divya Shravanthi
Hi Ron,
Could you please share an example of how to pull simple text from pdf/doc files. I couldn't find a proper tutorial for clj-tika.
Thanks
On Friday, 3 January 2014 05:03:11 UTC+5:30, Ron Toland wrote:
>
> If all you need is the text, you could use Apache Tika to extract it:
> http://tik

Re: Read Microsoft Word .doc files in Clojure

2014-01-03 Thread Brendan Younger
I've used the Java code in TextExtractor http://stackoverflow.com/questions/10250617/java-apache-poi-can-i-get-clean-text-from-ms-word-doc-files with good success in Clojure projects. Either throw it in as Java source or convert to Clojure code. You'll probably want the tika-parsers jar ins

Re: Read Microsoft Word .doc files in Clojure

2014-01-02 Thread Ron Toland
If all you need is the text, you could use Apache Tika to extract it: http://tika.apache.org/
There's a simple Clojure lib to get you started: https://github.com/alexott/clj-tika
I've used it to pull text out of .doc, .pdf, and .odt files.
Ron
On Wednesday, January 1, 2014 11:49:30 PM UTC-8,

Re: Read Microsoft Word .doc files in Clojure

2014-01-02 Thread Frank Hale
One solution is to use ClojureCLR and the OpenXML SDK.
On Thu, Jan 2, 2014 at 11:08 AM, Dennis Haupt wrote:
> use apache poi and write a small wrapper or something
> this is what i did
>
> 2014/1/2 Joshua Mendoza
>
>> Hi!,
>>
>> I've been looking for libraries or resources to read MS .doc fi

Re: Read Microsoft Word .doc files in Clojure

2014-01-02 Thread Dennis Haupt
use apache poi and write a small wrapper or something
this is what i did
2014/1/2 Joshua Mendoza
> Hi!,
>
> I've been looking for libraries or resources to read MS .doc files in
> Clojure, but found none. Does anyone have tried, used, encountered or
> witnessed such a thing to read them?
>
> I
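A small wrapper around Apache POI, as suggested above, could be sketched like this. It is not the wrapper Dennis wrote (the thread does not show it); it assumes the POI jars including poi-scratchpad are on the classpath, and the names `sample.poi-doc` and `read-doc-text` are hypothetical.

```clojure
(ns sample.poi-doc
  (:require [clojure.java.io :as io])
  (:import (org.apache.poi.hwpf HWPFDocument)
           (org.apache.poi.hwpf.extractor WordExtractor)))

(defn read-doc-text
  "Returns the text of a binary Word 97-2003 .doc file at `source`.
  HWPF only reads the old binary format, so .docx files need a different
  POI component."
  [source]
  (with-open [in (io/input-stream source)]
    ;; HWPFDocument reads the whole stream up front, so the stream can be
    ;; closed as soon as the text has been extracted.
    (.getText (WordExtractor. (HWPFDocument. in)))))
```

Calling `(read-doc-text "some-file.doc")` (placeholder file name) should return the document body as a single string.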

Read Microsoft Word .doc files in Clojure

2014-01-02 Thread Joshua Mendoza
Hi! I've been looking for libraries or resources to read MS .doc files in Clojure, but found none. Has anyone tried, used, encountered, or witnessed such a thing? I found a lot of info made publicly available by the government in .doc files, but I want to process them automaticall