On 07/06/2013 09:42 AM, Denis Papathanasiou wrote:
I have a plain text file containing an English-language essay that I'd
like to split into sentences, based on the presence of punctuation.
If you want a natural language processing-based version of a
sentence detector, clojure-opennlp[1]
I have a plain text file containing an English-language essay that I'd like
to split into sentences, based on the presence of punctuation.
I wrote this function to determine if a given character is an English
punctuation mark:
(defn ispunc? [c]
  (> (count (filter #(= % c) '("." "!" "?" ";"))) 0))
I
On Sat, Jul 6, 2013 at 11:42 AM, Denis Papathanasiou
denis.papathanas...@gmail.com wrote:
(def my-text (slurp "mytext.txt"))
(def my-sentences (partition-by ispunc? my-text))
Unfortunately, this returns a sequence of 1, whose first and only element
contains the entire text, since ispunc?
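A likely explanation, offered as a sketch rather than a diagnosis of the exact code posted: iterating over a string in Clojure yields characters, and a character is never `=` to a one-character string or to a symbol. A predicate that compares against anything other than character literals therefore returns false for every element, and partition-by produces a single run. The name `punc?` and the sample string below are made up for illustration.

```clojure
;; A character and a one-character string are different types.
(= \. ".")   ; false
(= \. \.)    ; true

;; A set of character literals works as a membership test.
;; `punc?` is a hypothetical name, not taken from the original post.
(defn punc? [c]
  (contains? #{\. \! \? \;} c))

(def sample "Hi there. How are you? Fine!")

;; partition-by starts a new group whenever the predicate's value changes,
;; so runs of text and punctuation marks alternate.
(def pieces (map #(apply str %) (partition-by punc? sample)))
;; pieces => ("Hi there" "." " How are you" "?" " Fine" "!")
```

Note that the punctuation marks come back as separate elements, so they would still need to be rejoined with the preceding text.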
On Saturday, July 6, 2013 1:22:32 PM UTC-4, Lars Nilsson wrote:
[snip]
If that kind of splitting is really all you require,
(clojure.string/split my-text #"[.!?;]") or (re-seq #"[^.!?;]+"
my-text)
Thanks!
Is there any way to preserve the actual punctuation? That's why I was
looking at
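
One possible way to keep the punctuation, sketched here under the assumption that each sentence should carry its own terminator: let the regular expression match the terminator along with the text before it. The function name `sentences` and the sample string are hypothetical, and this does not handle abbreviations such as "Dr.", which is where an NLP-based detector would do better.

```clojure
(require '[clojure.string :as str])

(def sample "Hi there. How are you? Fine!")

;; Match a run of non-terminators followed by any terminators that come
;; right after it, so the punctuation stays attached to its sentence.
;; `sentences` is a hypothetical helper name.
(defn sentences [text]
  (->> (re-seq #"[^.!?;]+[.!?;]*" text)
       (map str/trim)
       (remove str/blank?)))

(sentences sample)
;; => ("Hi there." "How are you?" "Fine!")
```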
On Sat, Jul 6, 2013 at 5:02 PM, Denis Papathanasiou
denis.papathanas...@gmail.com wrote:
On Saturday, July 6, 2013 1:22:32 PM UTC-4, Lars Nilsson wrote:
[snip]
If that kind of splitting is really all you require,
(clojure.string/split my-text #"[.!?;]") or (re-seq #"[^.!?;]+"
my-text)
Is there