Hi all,
I have recently been watching a set of videos from O'Reilly on
MapReduce. The author of the series is using Python for all of the
examples, but, in an effort to use Clojure more, I've been following
along and writing my code in Clojure. When I implemented the mapper
function that he descri
On Fri, Jul 8, 2011 at 7:05 PM, Christopher wrote:
> ;; mapper.clj
>
> (use ['clojure.java.io :only '(reader)])
> (use ['clojure.string :only '(split)])
>
> (defn mapper [lines]
>   (doseq [line lines]
>     (doseq [word (split line #"\s+")]
>       (println (str word "\t1")))))
>
> (mapper (line-seq
2011/7/9 Christopher
> % time cake run mapper.clj < input.txt
> real 0m3.573s
> user 0m2.031s
> sys 0m1.528s
>
These numbers include JVM startup overhead (which is significant compared to
Python startup overhead).
--
MK
http://github.com/michaelklishin
http://twitter.com/michaelklis
Hi Michael,
Thanks for the comments. I should point out, though, that I'm using
cake to run the program, which keeps an instance of the JVM spun up at
all times. That should remove the startup time, unless I am
misunderstanding how cake works. Also, the startup time should be
constant (say a few se
On Jul 8, 4:17 pm, Ken Wesson wrote:
> On Fri, Jul 8, 2011 at 7:05 PM, Christopher wrote:
> > ;; mapper.clj
>
> > (use ['clojure.java.io :only '(reader)])
> > (use ['clojure.string :only '(split)])
>
> > (defn mapper [lines]
> >   (doseq [line lines]
> >     (doseq [word (split line #"\s+")]
> >
Hi Ken,
Thanks for the comment. I tried what you suggested, but I am not
getting any reflection warnings. That said, comments like this are
exactly what I am looking for; I had no idea that you could turn on
checking for reflection issues. I'd love it if I could find a way to
speed this piece of c
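
For anyone following along, here is a minimal sketch of what turning on reflection checking looks like. The function names len-reflective and len-hinted are made up for illustration and do not come from the thread; the point is only that an unhinted Java method call triggers a compile-time warning, while a type-hinted one does not.

```clojure
;; Ask the compiler to warn whenever it has to fall back to reflection.
(set! *warn-on-reflection* true)

;; Hypothetical example: the type of s is unknown, so the compiler
;; prints a reflection warning and resolves .length at runtime.
(defn len-reflective [s]
  (.length s))

;; Hypothetical example: the ^String hint lets the compiler emit a
;; direct call to String.length, so no warning is printed.
(defn len-hinted [^String s]
  (.length s))
```

If no warnings appear when the mapper file is loaded, as reported above, reflection is probably not what is slowing it down.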
Hi Christopher,
I ran your code with only one modification, using the "time" macro to
measure the execution time of the mapper function itself:
(use ['clojure.java.io :only '(reader)])
(use ['clojure.string :only '(split)])
(defn mapper [lines]
  (doseq [line lines]
    (doseq [word (split line
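
A minimal sketch of the idea of timing only the mapper call, so that JVM startup is left out of the measurement. It assumes the input is a small in-memory vector instead of stdin, and it aliases clojure.string as str; both are choices made for this illustration and may differ from the code that was actually run.

```clojure
(require '[clojure.string :as str])

(defn mapper
  "Prints each whitespace-separated word followed by a tab and a 1."
  [lines]
  (doseq [line lines
          word (str/split line #"\s+")]
    (println (str word "\t1"))))

;; time evaluates its body, prints the elapsed wall-clock time to *out*,
;; and returns the body's value. Only the mapper call is measured here.
;; The input vector is a stand-in for the real input file.
(time (mapper ["the quick brown fox" "jumps over the lazy dog"]))
```

Because time writes to the same stream as println, the elapsed-time line ends up mixed into the mapper output; for a real pipeline it would need to be sent elsewhere.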
Running a program like that with cake run is awful; use AOT:
(ns clj-play.mapper
(:use [clojure.java.io :only [reader]])
(:use [clojure.string :only [split]])
(:gen-class))
(defn mapper [lines]
  (doseq [line lines]
    (doseq [word (split line #"\s+")]
      (println (str word "\t1"))
Here's a very ugly low-level version just to show that it can be done:
(ns clj-play.mapper
(:use [clojure.java.io :only [reader]])
(:use [clojure.string :only [split]])
(:gen-class))
(set! *warn-on-reflection* true)
(defn mapper [^java.io.BufferedReader r ^java.io.OutputStreamWriter out]
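
The rest of that function is cut off in this archive, so the following is not the code from the message. It is only a rough sketch of the same general approach (type-hinted Java interop, a manual read loop, direct writes in place of println). The name mapper-low-level and the choice of java.io.Writer for the output parameter are assumptions made for this illustration.

```clojure
(set! *warn-on-reflection* true)

(defn mapper-low-level
  "Reads lines from r and writes each word, a tab and a 1 to out.
  Hypothetical sketch; not the function from the original message."
  [^java.io.BufferedReader r ^java.io.Writer out]
  (loop [line (.readLine r)]
    ;; .readLine returns nil at end of input, which ends the loop.
    (when line
      (doseq [^String word (.split ^String line "\\s+")]
        (.write out word)
        (.write out "\t1\n"))
      (recur (.readLine r))))
  ;; Flush once at the end so buffered output is not lost.
  (.flush out))
```

It could be driven from stdin and stdout with something like (mapper-low-level (java.io.BufferedReader. *in*) *out*), since *out* is a java.io.Writer by default.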
Thanks, Benny. I tried again without using cake, just compiling the
code into a jar, and it does execute much faster. I guess using the
cake run command as a way to avoid the JVM startup overhead isn't the
best option for writing highly performant code. I was kind of hoping
that after the first ru
Hi David,
Thanks for the comments and the code rewrite. This is excellent
information. I just tried it out on my own system and got the same
results. This is a really great example of how to optimize Clojure
code. I'm considering using Clojure for some more research-oriented
work where I will need
The JVM is very slow to start. Try measuring around your method calls instead.
Also try running it for long enough to see the JVM's GC kick the butt
of Python's GC...
On Fri, Jul 8, 2011 at 6:19 PM, Michael Klishin wrote:
> 2011/7/9 Christopher
>
>> % time cake run mapper.clj < input.txt
>>