One more suggestion: Try simply creating one giant map that maps
complete postal code strings directly to the associated data values,
without any tree data structure explicitly created in your code. The
code will be a lot simpler, and the underlying data structure is
likely to be competit
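
A minimal sketch of the flat-map idea, assuming each input line has the hypothetical form "postcode,value" (the real file layout is not shown in this thread, and the sample rows and the names parse-row and rows->lookup are made up for illustration):

```clojure
(require '[clojure.string :as str])

;; Assumption: one "postcode,value" pair per line.
(defn parse-row
  "Splits a line into a [postcode value] pair."
  [line]
  (let [[code value] (str/split line #"," 2)]
    [(str/trim code) (str/trim value)]))

(defn rows->lookup
  "Builds one flat map keyed by the complete postal code string."
  [lines]
  (into {} (map parse-row lines)))

;; Made-up sample rows, only to show the shape of the result.
(def lookup
  (rows->lookup ["AB10 1AA,area-1"
                 "SW1A 1AA,area-2"]))

(get lookup "SW1A 1AA") ; => "area-2"
```

A lookup is then a single hash-map access on the whole code, with no per-character nesting to build or walk.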
Check out assoc-in, get-in, and update-in. They make working with
nested maps a breeze. Here's a rewrite of your code:
(ns user
  (:require [clojure.string :as str]
            [clojure.java.io :as io]))

(def postcode-trie
  (with-open [r (io/reader "/path/to/data.csv")]
    (reduce
      (fn [tri
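
For readers unfamiliar with the three functions mentioned above, here is a small standalone sketch of how they behave on a nested map keyed by characters. The codes and values are made up, and this is not the truncated rewrite itself:

```clojure
;; assoc-in creates the intermediate maps as needed.
(def trie
  (-> {}
      (assoc-in (seq "AB1") 10)
      (assoc-in (seq "AB2") 20)))
;; trie => {\A {\B {\1 10, \2 20}}}

;; get-in walks the same key path to read a leaf.
(get-in trie (seq "AB2")) ; => 20

;; update-in applies a function to the leaf at that path.
(def bumped (update-in trie (seq "AB1") inc))
(get-in bumped (seq "AB1")) ; => 11
```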
On Nov 3, 2010, at 8:31 PM, Andy Fingerhut wrote:
> I'd recommend changing your code to use Java chars instead of single-char
> Java strings. That plus the change Paul suggests below should reduce memory
> consumption significantly.
Another option is to .intern() the strings. You'll still crea
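
A small sketch of the two options being discussed, using a made-up code "AB1". The var names are hypothetical; the calls are plain Clojure and java.lang.String methods:

```clojure
(def code "AB1")

;; Option 1: use Java chars as keys. seq on a string yields chars.
(def char-keys (seq code))
;; char-keys => (\A \B \1)

;; Option 2: keep single-character strings, but intern them so equal
;; keys share one canonical String instance.
(def interned-keys
  (doall (map (fn [c] (.intern ^String (str c))) code)))

;; Interned strings with the same contents are the same object.
(identical? (first interned-keys) (.intern "A")) ; => true
```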
In addition to this, note that every java.lang.String, i.e. every "A"
in your example data structures below, requires storage for one
java.lang.Object (8 bytes on 32-bit JVMs, 16 bytes on 64-bit JVMs)
plus an array of Java chars, where I think that is equal to the size
of a java.lang.Objec
Why not just sort the text file and then build the merged trees
directly, without the numerous intermediate trees?
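
One way this could look, as a rough sketch rather than a tested solution for the 1.5m-row case. It assumes the entries are already sorted by postcode, and that no code is a prefix of another code. The name sorted->trie and the sample data are hypothetical:

```clojure
(defn sorted->trie
  "entries is a seq of [chars value] pairs sorted by postcode.
  Entries sharing a first character are adjacent, so each subtree is
  built once from its own run instead of by merging many small trees."
  [entries]
  (into {}
        (for [group (partition-by (fn [[cs _]] (first cs)) entries)]
          (let [[[c & more] _] (first group)]
            [c (if (nil? more)
                 ;; Last character of the code: store the value.
                 ;; If the same code repeats, the last row wins.
                 (second (last group))
                 ;; Otherwise drop the shared character and recurse.
                 (sorted->trie (map (fn [[cs v]] [(rest cs) v]) group)))]))))

;; Made-up data, sorted first as the suggestion requires.
(def sample
  (->> [["AC1" 3] ["AB2" 2] ["AB1" 1]]
       (sort-by first)
       (map (fn [[code v]] [(seq code) v]))))

(sorted->trie sample)
;; => {\A {\B {\1 1, \2 2}, \C {\1 3}}}
```

The recursion depth is bounded by the length of a postcode, not by the number of rows.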
On Wed, Nov 3, 2010 at 12:22 PM, Paul Ingles wrote:
> Hi,
>
> I've been playing around with breaking apart a list of postal codes to
> be stored in a tree with leaf nodes containing
I could be missing something, but since you say you're running into
problems as your data gets large, I think you're possibly running into
two things:
1. You're reading all the data into memory, and keeping it there, in
the form of the lines from the file.
2. The way you're defining postcodes-from-f
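
To illustrate the first point, here is a sketch that folds each line into the result as it is read, so the full sequence of lines is never retained. It assumes a hypothetical "postcode,value" line format, and load-trie is an illustrative name, not a function from the original code:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn load-trie
  "Reads path line by line and accumulates a nested map keyed by chars.
  reduce consumes the lazy line-seq eagerly inside with-open, and
  nothing else holds a reference to the head of the sequence."
  [path]
  (with-open [r (io/reader path)]
    (reduce (fn [trie line]
              ;; Assumption: each line is "postcode,value".
              (let [[code value] (str/split line #"," 2)]
                (assoc-in trie (seq code) value)))
            {}
            (line-seq r))))
```

Because the reduction finishes before with-open closes the reader, the result is a fully realized map and no lazy sequence escapes the open file.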
On Wed, Nov 3, 2010 at 12:22 PM, Paul Ingles wrote:
> (defn merge-tree
>   [tree other]
>   (if (not (map? other))
>     tree
>     (merge-with (fn [x y] (merge-tree x y))
>                 tree other)))
>
You can get rid of the anonymous function here and just do (merge-with
merge-tree tree other).
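
With that change applied, the function would read as follows. The sample trees are made up to show the behavior:

```clojure
(defn merge-tree
  [tree other]
  (if (not (map? other))
    tree
    ;; merge-with calls merge-tree on the two values whenever both
    ;; maps contain the same key, so no wrapper fn is needed.
    (merge-with merge-tree tree other)))

(merge-tree {\A {\B {\1 :x}}}
            {\A {\B {\2 :y}}})
;; => {\A {\B {\1 :x, \2 :y}}}

;; When both sides hold a non-map leaf for the same key, the value
;; from the first tree is kept.
(merge-tree {\A 1} {\A 2})
;; => {\A 1}
```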
Hi,
I've been playing around with breaking apart a list of postal codes to
be stored in a tree with leaf nodes containing information about that
area. I have some code that works with medium-ish size inputs but
fails with a GC Overhead error with larger input sets (1.5m rows) and
would really appr