Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-07-04 Thread Jason Dusek
So I wonder what the timings for Haskell, O'Caml and Clojure are now, given the patch to GHC. -- Jason Dusek Linux User #510144 | http://counter.li.org/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/ha

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-07-02 Thread Nicolas Pouillard
On Thu, 24 Jun 2010 19:25:35 -0700 (PDT), braver wrote: > Claus -- cafe5 is pretty much where it's at. You're right, the proggy > was used as the bug finder, actually at cafe3, still using ByteString. > > Having translated it from Clojure to Haskell to OCaml, I'm now > debugging the logic and pe

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-28 Thread Antoine Latter
On Mon, Jun 28, 2010 at 2:32 PM, Don Stewart wrote: > claus.reinke: >> >> To binary package users/authors: is there a typed version of binary (that >> is, one that records and checks a representation of the serialized type >> before actual (de-)serialization)? It >> would be nice to have such a ty

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-28 Thread Don Stewart
claus.reinke: > > To binary package users/authors: is there a typed version of binary (that > is, one that records and checks a representation of the serialized type > before actual (de-)serialization)? It > would be nice to have such a type check, even though it > wouldn't protect against missin

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-28 Thread Claus Reinke
Claus -- cafe5 is pretty much where it's at. You're right, the proggy was used as the bug finder, actually at cafe3, still using ByteString. It would be useful to have a really tiny data source - no more than 100 entries per Map should be sufficient to confirm or reject hunches about potentia

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-25 Thread Simon Marlow
On 25/06/2010 04:23, Don Stewart wrote: deliverable: Simon -- so how can I get me a new ghc now? From git, I suppose? (It used to live in darcs...) It still lives in darcs. Nightly builds are here: http://www.haskell.org/ghc/dist/stable/dist/ You'll want to check with Simon that the patch

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-25 Thread Simon Marlow
On 24/06/2010 21:40, Claus Reinke wrote: I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4. An update on this: with the help of Alex I tracked down the problem (an integer overflow bug in GHC's memory allocator), and his program now runs t

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread Don Stewart
deliverable: > Simon -- so how can I get me a new ghc now? From git, I suppose? (It > used to live in darcs...) It still lives in darcs. Nightly builds are here: http://www.haskell.org/ghc/dist/stable/dist/ You'll want to check with Simon that the patch got pushed, though, first. -- Don

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread Ivan Miljenovic
On 25 June 2010 12:32, braver wrote: > Simon -- so how can I get me a new ghc now?  From git, I suppose?  (It > used to live in darcs...) Still does if I recall correctly. -- Ivan Lazar Miljenovic ivan.miljeno...@gmail.com IvanMiljenovic.wordpress.com

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread braver
Simon -- so how can I get me a new ghc now? From git, I suppose? (It used to live in darcs...) -- Alexy

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread braver
Simon -- amazing feat! Thanks for tracking it down. I'll now happily rely on the Haskell version if it is fast enough :). -- Alexy

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread braver
Claus -- cafe5 is pretty much where it's at. You're right, the proggy was used as the bug finder, actually at cafe3, still using ByteString. Having translated it from Clojure to Haskell to OCaml, I'm now debugging the logic and perhaps the conceptual data structures. Then better maps will be tri

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread braver
On Jun 24, 5:07 am, Johan Tibell wrote: > The new "The Performance of Haskell containers package" paper compares the > performance of, among other things, Maps holding Strings/ByteString. It also > improves the performance of many operations on these. I think it's very > relevant to your work. > >

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread Claus Reinke
I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4. An update on this: with the help of Alex I tracked down the problem (an integer overflow bug in GHC's memory allocator), and his program now runs to completion. So this was about keepin

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread Duncan Coutts
On 24 June 2010 13:10, Simon Marlow wrote: > On 17/06/2010 06:23, braver wrote: > >> I'll work with Simon to investigate the runtime, but would welcome any >> ideas on further speeding up cafe4. > > An update on this: with the help of Alex I tracked down the problem (an > integer overflow bug in G

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread Don Stewart
marlowsd: >> I'll work with Simon to investigate the runtime, but would welcome any >> ideas on further speeding up cafe4. > > An update on this: with the help of Alex I tracked down the problem (an > integer overflow bug in GHC's memory allocator), and his program now > runs to completion. > >

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread Simon Marlow
On 17/06/2010 06:23, braver wrote: With @dafis's help, there's a version tagged cafe3 on the master branch which is better performing with ByteString. I also went ahead and interned ByteString as Int, converting the structure to IntMap everywhere. That's reflected on the new "intern" branch at

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread Johan Tibell
On Tue, Jun 15, 2010 at 11:24 PM, braver wrote: > Wren -- thanks for the clarification! Someone said that Foldable on > Trie may not be very efficient -- is that true? > > I use ByteString as a node type for the graph; these are Twitter user > names. Surely it's useful to replace them with Int,

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-18 Thread wren ng thornton
braver wrote: Wren -- thanks for the clarification! Someone said that Foldable on Trie may not be very efficient -- is that true? That was probably me saying that I had worked on some more efficient implementations than those currently in use. Alas, the more efficient ones seem to alter the

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-17 Thread braver
On Jun 17, 2:36 pm, "Claus Reinke" wrote: > > I'll work with Simon to investigate the runtime, but would welcome any > > ideas on further speeding up cafe4. > > Just a wild guess, but those foldWithKeys make me nervous. > > The result is strict, the step function tries to be strict, but if > you l

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-17 Thread braver
Since folks got interested, I've added a doc/ subdirectory (on the "intern" branch) with a PDF defining my karmic social capital mathematically. It is this definition which is faithfully computed both in Clojure and Haskell. I've also added a LICENSE file basically stating that this research is t

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-17 Thread Claus Reinke
I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4. Just a wild guess, but those foldWithKeys make me nervous. The result is strict, the step function tries to be strict, but if you look at the code for Data.IntMap.foldr, it doesn't really
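The concern about folds can be illustrated with a minimal sketch. It assumes a current containers release, where `Data.IntMap.Strict` provides `foldrWithKey` and `foldlWithKey'`; the `Scores` type and both function names are hypothetical and are not taken from the cafe4 code.

```haskell
import qualified Data.IntMap.Strict as IM

-- Hypothetical stand-in for per-user data: interned user id -> score.
type Scores = IM.IntMap Int

-- Lazy right fold: the accumulator is not forced between steps, so
-- without optimisation this can build a chain of suspended additions.
sumLazy :: Scores -> Int
sumLazy = IM.foldrWithKey (\_key v acc -> v + acc) 0

-- Strict left fold: the accumulator is evaluated at every step,
-- so the running total stays a plain Int.
sumStrict :: Scores -> Int
sumStrict = IM.foldlWithKey' (\acc _key v -> acc + v) 0
```

Both functions return the same total. Whether the lazy version actually leaks depends on the compiler's strictness analysis, so this shows the pattern and is not a diagnosis of the program discussed here.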

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-16 Thread braver
With @dafis's help, there's a version tagged cafe3 on the master branch which is better performing with ByteString. I also went ahead and interned ByteString as Int, converting the structure to IntMap everywhere. That's reflected on the new "intern" branch at tag cafe4. Still it can't do the ful

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-16 Thread Simon Marlow
On 15/06/2010 20:43, braver wrote: On Jun 15, 6:27 am, Simon Marlow wrote: On 15/06/2010 06:09, braver wrote: In fact, the tag cafe2, when run on the full dataset, gets stuck at 11 days, with RAM slowly getting into 50 GB; a previous version caused ghc 6.12.1 to segfault around day 12 -- -deb

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-15 Thread Daniel Fischer
On Tuesday 15 June 2010 23:26:10, Don Stewart wrote: > deliverable: > > Wren -- thanks for the clarification! Someone said that Foldable on > > Trie may not be very efficient -- is that true? > > > > I use ByteString as a node type for the graph; these are Twitter user > > names. Surely it's usef

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-15 Thread Don Stewart
deliverable: > Wren -- thanks for the clarification! Someone said that Foldable on > Trie may not be very efficient -- is that true? > > I use ByteString as a node type for the graph; these are Twitter user > names. Surely it's useful to replace them with Int, which I'll try, > but Clojure works

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-15 Thread braver
Wren -- thanks for the clarification! Someone said that Foldable on Trie may not be very efficient -- is that true? I use ByteString as a node type for the graph; these are Twitter user names. Surely it's useful to replace them with Int, which I'll try, but Clojure works with Java String fine an

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-15 Thread wren ng thornton
braver wrote: On Jun 14, 11:40 am, Don Stewart wrote: Oh, you'll want insertWith'. You might also consider bytestring-trie for the Graph, and IntMap for the AdjList? Yeah, I saw jsonb using Trie and thought there's a reason for it. But it's very API-poor compared with Map, e.g. there's not

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-15 Thread braver
On Jun 15, 6:27 am, Simon Marlow wrote: > On 15/06/2010 06:09, braver wrote: > > > In fact, the tag cafe2, when run on the full dataset, gets stuck at 11 > > days, with RAM slowly getting into 50 GB; a previous version caused > > ghc 6.12.1 to segfault around day 12 -- -debug showing an assert > >

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-15 Thread Don Stewart
deliverable: > > If you just want to optimize it and not compare exactly equal idiomatic > > code, > > you should stop using functional data structures and use a structure that > > fits > > your problem (the ST monad has been designed for that in Haskell), because > > compilers do not detect sing

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-15 Thread braver
> If you just want to optimize it and not compare exactly equal idiomatic code, > you should stop using functional data structures and use a structure that fits > your problem (the ST monad has been designed for that in Haskell), because > compilers do not detect single-threaded usage and rewrite a
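The quoted advice about the ST monad can be sketched as follows, assuming users have already been interned as dense Int ids in the range 0 to n-1. The function names are hypothetical, and the example only counts occurrences; it is not the computation from the thread.

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, elems)

-- Count how often each interned id in [0, n) occurs, updating an
-- unboxed mutable array in place inside ST. An id outside the range
-- raises an index error.
countIds :: Int -> [Int] -> UArray Int Int
countIds n ids = runSTUArray $ do
  counts <- newArray (0, n - 1) 0
  forM_ ids $ \i -> do
    c <- readArray counts i
    writeArray counts i (c + 1)
  return counts

-- Convenience view of the counts as a list, ordered by id.
countList :: Int -> [Int] -> [Int]
countList n = elems . countIds n
```

The mutation is confined to `runSTUArray`, so `countIds` remains a pure function from the caller's point of view.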

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-15 Thread Simon Marlow
On 15/06/2010 06:09, braver wrote: In fact, the tag cafe2, when run on the full dataset, gets stuck at 11 days, with RAM slowly getting into 50 GB; a previous version caused ghc 6.12.1 to segfault around day 12 -- -debug showing an assert failure in Storage.c. ghc 6.10 got stuck at 30 days for g

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-14 Thread Ketil Malde
braver writes: > In fact, the tag cafe2, when run on the full dataset, gets stuck at 11 > days, with RAM slowly getting into 50 GB One tip might be to limit available heap memory by using +RTS -M2G (or whatever your real memory is). If (as seems likely) the RAM usage leads to thrashing (the sym

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-14 Thread braver
In fact, the tag cafe2, when run on the full dataset, gets stuck at 11 days, with RAM slowly getting into 50 GB; a previous version caused ghc 6.12.1 to segfault around day 12 -- -debug showing an assert failure in Storage.c. ghc 6.10 got stuck at 30 days for good, and when profiling crashed twice

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-14 Thread braver
OK, sample data is uploaded to data/sample in the git repo, and README.md updated with the build and run command lines. I've achieved a bit more strictness, again with great help from @dons, @dafis, and other great folks here and #haskell, but it's still slower than Clojure and occupies a bit more

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-14 Thread braver
On Jun 14, 11:40 am, Don Stewart wrote: > Oh, you'll want insertWith'. > > You might also consider bytestring-trie for the Graph, and IntMap for > the AdjList? Yeah, I saw jsonb using Trie and thought there's a reason for it. But it's very API-poor compared with Map, e.g. there's not even a fol
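For readers on a newer toolchain, here is a small sketch of the strict insertion suggested above. It assumes a current containers release, where the role of `insertWith'` is played by `insertWith` from `Data.Map.Strict`; the `countMentions` function and its String keys are hypothetical.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Count how often each name occurs. Data.Map.Strict.insertWith forces
-- the combined value before storing it, so the map holds evaluated
-- Ints and not chains of pending (+) applications.
countMentions :: [String] -> M.Map String Int
countMentions = foldl' (\m name -> M.insertWith (+) name 1 m) M.empty
```

The strict `foldl'` keeps the map itself evaluated between steps, and the strict `insertWith` does the same for the values stored in it.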

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-14 Thread Don Stewart
deliverable:
> I've supplied a profile report there. Since I load the graphs in memory and then walk them a lot, the time seems expected. It allocates a lot, though. The main graph type is
>
> type Graph = M.Map User AdjList
> type AdjList = M.Map Day Reps
> type User = B.ByteString
> ty

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-14 Thread John Van Enk
Would it be possible to use an IntMap instead of your OrdMap? Perhaps zip your users with [0..] and key off the integer? As a side note, I threw this package onto Hackage a while ago and may suit your needs if you decide to move to something like IntMap: http://hackage.haskell.org/package/EnumMap
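The suggestion to zip users with [0..] might look roughly like this. The `User` type follows the definition quoted elsewhere in the thread, while `internTable` and `nameTable` are hypothetical helpers written for this sketch and assume the set of user names is known up front.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type User = B.ByteString

-- Give every distinct user name a dense Int id, numbered in sorted order.
internTable :: [User] -> M.Map User Int
internTable users = M.fromList (zip (S.toAscList (S.fromList users)) [0 ..])

-- Reverse table, for turning ids back into names when reporting results.
nameTable :: M.Map User Int -> IM.IntMap User
nameTable table = IM.fromList [(i, u) | (u, i) <- M.toList table]
```

After interning, the graph can be keyed by Int in an IntMap, and names are only looked up at the input and output boundaries.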

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-14 Thread braver
I've supplied a profile report there. Since I load the graphs in memory and then walk them a lot, the time seems expected. It allocates a lot, though. The main graph type is

    type Graph = M.Map User AdjList
    type AdjList = M.Map Day Reps
    type User = B.ByteString
    type Day = Int
    type Reps = M.Map