Help on a Clojure performance question

2011-07-08 Thread Christopher
Hi all, I have recently been watching a set of videos from O'Reilly on MapReduce. The author of the series is using Python for all of the examples, but, in an effort to use Clojure more, I've been following along and writing my code in Clojure. When I implemented the mapper function that he descri

Re: Help on a Clojure performance question

2011-07-08 Thread Ken Wesson
On Fri, Jul 8, 2011 at 7:05 PM, Christopher wrote: > ;; mapper.clj > > (use ['clojure.java.io :only '(reader)]) > (use ['clojure.string :only '(split)]) > > (defn mapper [lines] >  (doseq [line lines] >    (doseq [word (split line #"\s+")] >      (println (str word "\t1") > > (mapper (line-seq

Re: Help on a Clojure performance question

2011-07-08 Thread Michael Klishin
2011/7/9 Christopher > % time cake run mapper.clj < input.txt > real 0m3.573s > user 0m2.031s > sys 0m1.528s > These numbers include JVM startup overhead (which is significant compared to Python startup overhead). -- MK http://github.com/michaelklishin http://twitter.com/michaelklis

Re: Help on a Clojure performance question

2011-07-08 Thread Christopher
Hi Michael, Thanks for the comments. I want to point out, though, that I'm using cake to run the program, which keeps an instance of the JVM spun up at all times. That should remove the startup time, unless I am misunderstanding how cake works. Also, the startup time should be constant (say a few se

Re: Help on a Clojure performance question

2011-07-08 Thread Alan Malloy
On Jul 8, 4:17 pm, Ken Wesson wrote: > On Fri, Jul 8, 2011 at 7:05 PM, Christopher wrote: > > ;; mapper.clj > > > (use ['clojure.java.io :only '(reader)]) > > (use ['clojure.string :only '(split)]) > > > (defn mapper [lines] > >  (doseq [line lines] > >    (doseq [word (split line #"\s+")] > >  

Re: Help on a Clojure performance question

2011-07-08 Thread Christopher
Hi Ken, Thanks for the comment. I tried what you suggested, but I am not getting any reflection warnings. That said, comments like this are exactly what I am looking for; I had no idea that you could turn on checking for reflection issues. I'd love it if I could find a way to speed this piece of c
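
For readers who have not used the reflection check mentioned above, here is a minimal sketch of how it behaves. The function names `shout-reflective` and `shout-hinted` are hypothetical and do not come from the thread. The mapper, as far as it is quoted here, makes no direct Java method calls, which would be consistent with seeing no warnings.

```clojure
;; Ask the compiler to report every interop call it cannot resolve statically.
(set! *warn-on-reflection* true)

;; Hypothetical example: no type hint, so the compiler cannot tell which
;; class .toUpperCase belongs to. It prints a reflection warning when this
;; form is compiled and looks the method up reflectively at run time.
(defn shout-reflective [s]
  (.toUpperCase s))

;; Hypothetical example: the ^String hint lets the compiler emit a direct
;; method call, so no warning is printed for this definition.
(defn shout-hinted [^String s]
  (.toUpperCase s))

;; Both versions return the same value; only the call mechanism differs.
(println (shout-reflective "word") (shout-hinted "word"))
```

A warning would only be expected for forms like `shout-reflective`, where a Java method is invoked on a value of unknown type.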

Re: Help on a Clojure performance question

2011-07-08 Thread Benny Tsai
Hi Christopher, I ran your code with only one modification, using the "time" macro to measure the execution time of the mapper function itself:

(use ['clojure.java.io :only '(reader)])
(use ['clojure.string :only '(split)])

(defn mapper [lines]
  (doseq [line lines]
    (doseq [word (split line
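
The excerpt above is cut off, so the following is only a sketch of what timing the mapper call by itself might look like, not a reconstruction of the code that was posted. The `sample-lines` data is a made-up stand-in for the real input file.

```clojure
(require '[clojure.string :as string])

;; Same shape as the mapper quoted in the thread: emit "word<TAB>1" per word.
(defn mapper [lines]
  (doseq [line lines]
    (doseq [word (string/split line #"\s+")]
      (println (str word "\t1")))))

;; Hypothetical sample input, standing in for the lines read from stdin.
(def sample-lines
  ["the quick brown fox"
   "jumps over the lazy dog"])

;; `time` evaluates its body, prints "Elapsed time: ... msecs", and returns
;; the body's value. Wrapping only the mapper call keeps JVM and tool
;; startup out of the measurement.
(time (mapper sample-lines))
```

Measured this way, the figure covers only the work done inside `mapper`, which makes it easier to separate the function's cost from launcher overhead.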

Re: Help on a Clojure performance question

2011-07-08 Thread David Nolen
Running a program like that with cake run is awful, use AOT:

(ns clj-play.mapper
  (:use [clojure.java.io :only [reader]])
  (:use [clojure.string :only [split]])
  (:gen-class))

(defn mapper [lines]
  (doseq [line lines]
    (doseq [word (split line #"\s+")]
      (println (str word "\t1"))

Re: Help on a Clojure performance question

2011-07-08 Thread David Nolen
Here's a very ugly low-level version just to show that it can be done:

(ns clj-play.mapper
  (:use [clojure.java.io :only [reader]])
  (:use [clojure.string :only [split]])
  (:gen-class))

(set! *warn-on-reflection* true)

(defn mapper [^java.io.BufferedReader r ^java.io.OutputStreamWriter out]
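
The body of this low-level version is not preserved in the excerpt. The sketch below is one way such a function could be written with the same two type-hinted parameters; it is an assumption for illustration and not the code that was posted. It replaces `println` with direct writes and flushes once at the end. `println` flushes after each line under Clojure's default `*flush-on-newline*` setting, which is a plausible source of overhead in the original.

```clojure
(require '[clojure.string :as string])

(set! *warn-on-reflection* true)

;; Assumed implementation: only the parameter list comes from the thread.
(defn mapper
  [^java.io.BufferedReader r ^java.io.OutputStreamWriter out]
  ;; Read one line at a time until .readLine returns nil at end of input.
  (loop [line (.readLine r)]
    (when line
      ;; The ^String hint lets the compiler pick Writer.write(String)
      ;; without reflection.
      (doseq [^String word (string/split line #"\s+")]
        (.write out word)
        (.write out "\t1\n"))
      (recur (.readLine r))))
  ;; Flush once at the end instead of after every line.
  (.flush out))

;; Possible wiring to stdin/stdout (left as a comment so that loading this
;; file does not block waiting for input):
;; (mapper (java.io.BufferedReader. *in*)
;;         (java.io.OutputStreamWriter. System/out))
```

Hinting both arguments keeps every interop call in the loop free of reflection, so `*warn-on-reflection*` should stay silent for this definition.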

Re: Help on a Clojure performance question

2011-07-08 Thread Christopher
Thanks Benny. I tried again without using cake and just compiling the code into a jar and it does execute much better. I guess using the cake run command as a way to avoid the JVM startup overhead isn't the best option for writing highly performant code. I was kind of hoping that after the first ru

Re: Help on a Clojure performance question

2011-07-08 Thread Christopher
Hi David, Thanks for the comments and the code rewrite. This is excellent information. I just tried it out on my own system and got the same results. This is a really great example of how to optimize Clojure code. I'm considering using Clojure for some more research-oriented work where I will need

Re: Help on a Clojure performance question

2011-07-08 Thread Kenny Stone
JVM is very slow to start. Try measuring around your method calls instead. Also try running it for a long enough time to see the JVM GC kick the butt of Python's GC... On Fri, Jul 8, 2011 at 6:19 PM, Michael Klishin wrote: > 2011/7/9 Christopher > >> % time cake run mapper.clj < input.txt >>