Divya,
Here's a simple example of extracting text from an input stream (you can open
any file as one):
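(ns sample.tika
  (:require [clj-tika.core :as tika]))

(defn extract-text
  "Extracts the text from the input stream"
  [input-stream]
  ;; If your clj-tika version returns a map (metadata plus :text) rather
  ;; than a string, use (:text (tika/parse input-stream)) instead.
  (tika/parse input-stream))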
Ron
clj-tika seems to be abandoned (and is marked as deprecated). You will
probably be better off using Tika directly through interop.
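For reference, a minimal interop sketch using Tika's `org.apache.tika.Tika`
facade, which detects the format and picks a parser for you. It assumes
tika-core and the Tika parser modules are on the classpath (coordinates and
versions are not given in this thread); the namespace and function names are
only illustrative.

```clojure
(ns sample.tika-interop
  ;; Assumption: tika-core plus the Tika parser modules are on the classpath.
  (:require [clojure.java.io :as io])
  (:import (org.apache.tika Tika)))

(defn extract-text
  "Returns the plain text of `source` (a path, File, URL or InputStream).
  Tika auto-detects whether it is a PDF, .doc, .odt, etc."
  [source]
  (let [tika (doto (Tika.)
               ;; The facade caps returned strings by default; -1 means unlimited.
               (.setMaxStringLength -1))]
    (with-open [in (io/input-stream source)]
      (.parseToString tika in))))
```

Usage would be something like `(extract-text "report.pdf")`, where the file
name is just an example.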
On Friday, 5 December 2014, Divya Shravanthi wrote:
Hi Ron,
Could you please share an example of how to pull simple text from pdf/doc
files. I couldn't find a proper tutorial for clj-tika.
Thanks
On Friday, 3 January 2014 05:03:11 UTC+5:30, Ron Toland wrote:
I've used the Java code in TextExtractor
(http://stackoverflow.com/questions/10250617/java-apache-poi-can-i-get-clean-text-from-ms-word-doc-files)
with good success in Clojure projects. Either throw it in as Java source
or convert to Clojure code. You'll probably want the tika-parsers jar
ins
If all you need is the text, you could use Apache Tika to extract
it: http://tika.apache.org/
There's a simple Clojure library to get you
started: https://github.com/alexott/clj-tika
I've used it to pull text out of .doc, .pdf, and .odt files.
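If you go to Tika directly instead of through a wrapper, the lower-level
route is `AutoDetectParser` with a `BodyContentHandler`, which also gives you
the metadata Tika collected. This is a sketch assuming tika-core and the Tika
parser modules are on the classpath; the namespace, function name and result
map shape are made up for illustration.

```clojure
(ns sample.tika-parsers
  ;; Assumption: tika-core plus the Tika parser modules are on the classpath.
  (:require [clojure.java.io :as io])
  (:import (org.apache.tika.metadata Metadata)
           (org.apache.tika.parser AutoDetectParser ParseContext)
           (org.apache.tika.sax BodyContentHandler)))

(defn extract-text-and-metadata
  "Parses `source` (path, File, URL or InputStream) with AutoDetectParser and
  returns {:text ..., :metadata {name value}}."
  [source]
  (let [parser   (AutoDetectParser.)
        ;; -1 disables BodyContentHandler's default write limit
        handler  (BodyContentHandler. -1)
        metadata (Metadata.)]
    (with-open [in (io/input-stream source)]
      (.parse parser in handler metadata (ParseContext.)))
    {:text     (str handler)                 ; body text collected by the handler
     :metadata (into {} (for [n (.names metadata)]
                          [n (.get metadata n)]))}))
```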
Ron
On Wednesday, January 1, 2014 11:49:30 PM UTC-8,
One solution is to use ClojureCLR and the OpenXML SDK.
On Thu, Jan 2, 2014 at 11:08 AM, Dennis Haupt wrote:
Use Apache POI and write a small wrapper or something;
this is what I did.
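A small POI wrapper could look like the sketch below. It is not the wrapper
mentioned above, just an illustration; it assumes poi-scratchpad (which holds
the HWPF classes for the old binary .doc format) is on the classpath, and the
function names are hypothetical.

```clojure
(ns sample.poi
  ;; Assumption: org.apache.poi/poi-scratchpad is on the classpath.
  (:require [clojure.java.io :as io]
            [clojure.string :as string])
  (:import (org.apache.poi.hwpf.extractor WordExtractor)))

(defn doc->text
  "Returns the text of a binary Word (.doc) file as one string."
  [source]
  (with-open [in (io/input-stream source)]
    (.getText (WordExtractor. in))))

(defn doc->paragraphs
  "Returns the non-blank paragraphs of a .doc file as trimmed strings."
  [source]
  (with-open [in (io/input-stream source)]
    (->> (.getParagraphText (WordExtractor. in)) ; String[] of paragraphs
         (map string/trim)
         (remove string/blank?)
         vec)))
```

Passing something that is not an OLE2 .doc file (for example a .docx) makes
the `WordExtractor` constructor throw, so a batch job over many files would
want to catch that per file.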
2014/1/2 Joshua Mendoza
Hi!
I've been looking for libraries or resources to read MS .doc files in
Clojure, but found none. Has anyone tried, used, encountered or
witnessed such a thing?
I found a lot of information the government makes publicly available in .doc files,
but I want to process them automatically