Re: Most idiomatic way of splitting a string into sentences?

2013-07-07 Thread Lee Hinman
On 07/06/2013 09:42 AM, Denis Papathanasiou wrote: I have a plain text file containing an English-language essay that I'd like to split into sentences, based on the presence of punctuation. If you want a natural language processing-based version of a sentence detector, clojure-opennlp[1]

Most idiomatic way of splitting a string into sentences?

2013-07-06 Thread Denis Papathanasiou
I have a plain text file containing an English-language essay that I'd like to split into sentences, based on the presence of punctuation. I wrote this function to determine if a given character is an English punctuation mark: (defn ispunc? [c] (> (count (filter #(= % c) '("." "!" "?" ";"))) 0)) I

Re: Most idiomatic way of splitting a string into sentences?

2013-07-06 Thread Lars Nilsson
On Sat, Jul 6, 2013 at 11:42 AM, Denis Papathanasiou denis.papathanas...@gmail.com wrote: (def my-text (slurp "mytext.txt")) (def my-sentences (partition-by ispunc? my-text)) Unfortunately, this returns a sequence of 1, whose first and only element contains the entire text, since ispunc?
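
A possible reading of the single-element result, not confirmed by the truncated message above: walking over a string yields Character values, so a predicate that compares against one-character strings never matches and partition-by never sees a boundary. The sketch below assumes that is the cause; the names punctuation, punc-char?, sample, and chunks are hypothetical.

```clojure
;; Assumption: punctuation should be compared as characters, because
;; seq-ing over a string yields java.lang.Character values.
;; (= \. ".") is false: a Character never equals a String.
(def punctuation #{\. \! \? \;})

(defn punc-char?
  "True when c is one of the characters in the punctuation set."
  [c]
  (contains? punctuation c))

(def sample "Hi there. How are you? Fine!")

;; partition-by starts a new group every time the predicate's result
;; changes, so text runs and punctuation runs alternate.
(def chunks
  (map #(apply str %) (partition-by punc-char? sample)))
```

With a character-based predicate the groups alternate between text and punctuation, so each sentence and its terminator land in adjacent elements and still have to be joined back together.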

Re: Most idiomatic way of splitting a string into sentences?

2013-07-06 Thread Denis Papathanasiou
On Saturday, July 6, 2013 1:22:32 PM UTC-4, Lars Nilsson wrote: [snip] If that kind of splitting is really all you require, (clojure.string/split my-text #"[.!?;]") or (re-seq #"[^.!?;]+" my-text) Thanks! Is there any way to preserve the actual punctuation? That's why I was looking at

Re: Most idiomatic way of splitting a string into sentences?

2013-07-06 Thread Lars Nilsson
On Sat, Jul 6, 2013 at 5:02 PM, Denis Papathanasiou denis.papathanas...@gmail.com wrote: On Saturday, July 6, 2013 1:22:32 PM UTC-4, Lars Nilsson wrote: [snip] If that kind of splitting is really all you require, (clojure.string/split my-text #"[.!?;]") or (re-seq #"[^.!?;]+" my-text) Is there
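
For the follow-up question about keeping the punctuation, one option is to extend the re-seq pattern so each match also takes the terminators that follow it. This is a minimal sketch, not necessarily what was suggested in the truncated reply; sample and sentences are hypothetical names.

```clojure
(require '[clojure.string :as str])

(def sample "Hi there. How are you? Fine!")

;; Each match is a run of non-punctuation characters followed by any
;; punctuation that ends it. The trailing * lets a final fragment with
;; no terminator match as well.
(def sentences
  (map str/trim (re-seq #"[^.!?;]+[.!?;]*" sample)))
```

A purely regex-based split like this will still break on abbreviations such as "Dr." or on decimal numbers, which is the case the NLP-based detector mentioned earlier in the thread is meant to handle.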