Re: Constructing Nested Trees and Memory Consumption

2010-11-03 Thread Andy Fingerhut
One more suggestion: Try simply creating one giant map that maps complete postal code strings directly to the associated data values, without any tree data structure explicitly created in your code. The code will be a lot simpler, and the underlying data structure is likely to be competit
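
A minimal sketch of the flat-map suggestion, assuming each input line has the form "postcode,value". The thread does not show the real file layout, so `sample-lines`, `parse-line`, and `postcode->data` are hypothetical names used only for illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample rows; the real CSV columns are an assumption.
(def sample-lines
  ["AB10 1AA,57.1"
   "AB10 1AB,57.2"
   "SW1A 1AA,51.5"])

(defn parse-line
  "Splits one line into a [postcode value] pair at the first comma."
  [line]
  (let [[code value] (str/split line #"," 2)]
    [code value]))

;; One flat map keyed by the complete postcode string, with no nested tree.
(def postcode->data
  (into {} (map parse-line sample-lines)))

(get postcode->data "AB10 1AA") ; => "57.1"
```

Lookup is then a single `get` on the full code, at the cost of losing prefix queries that a tree would support.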

Re: Constructing Nested Trees and Memory Consumption

2010-11-03 Thread Justin Kramer
Check out assoc-in, get-in, and update-in. They make working with nested maps a breeze. Here's a rewrite of your code:

(ns user
  (:require [clojure.string :as str]
            [clojure.java.io :as io]))

(def postcode-trie
  (with-open [r (io/reader "/path/to/data.csv")]
    (reduce (fn [tri
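
As a rough illustration of the three functions named in this message (not the rewrite from the message itself, which is cut off in this listing), the sketch below builds a small nested map keyed by the characters of a code. The codes and the `{:id ...}` leaf values are made up.

```clojure
;; Each key path is the sequence of characters in a hypothetical code.
(def trie
  (-> {}
      (assoc-in (seq "AB1") {:id 1})
      (assoc-in (seq "AB2") {:id 2})))

;; trie is now {\A {\B {\1 {:id 1}, \2 {:id 2}}}}

;; get-in walks the same path to fetch a leaf.
(def leaf (get-in trie (seq "AB1")))

;; update-in returns a new trie with the function applied at that path.
(def updated (update-in trie (seq "AB1") assoc :seen true))
```

Because `assoc-in` creates the intermediate maps on demand, no separate merge step is needed when entries are added one at a time.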

Re: Constructing Nested Trees and Memory Consumption

2010-11-03 Thread Michael Gardner
On Nov 3, 2010, at 8:31 PM, Andy Fingerhut wrote:
> I'd recommend changing your code to use Java chars instead of single-char
> Java strings. That plus the change Paul suggests below should reduce memory
> consumption significantly.

Another option is to .intern() the strings. You'll still crea
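
The two options discussed here can be sketched as follows. This is only an illustration with a made-up code "AB1"; `intern-key` is a hypothetical helper, and how much memory either approach saves in practice is not measured here.

```clojure
;; Splitting into one-character strings produces a String per character.
(def as-strings (map str "AB1"))   ; ("A" "B" "1")

;; Calling seq on the string yields Character values instead.
(def as-chars (seq "AB1"))         ; (\A \B \1)

(defn intern-key
  "Returns the canonical interned instance of s (java.lang.String/intern)."
  [^String s]
  (.intern s))

;; Two distinct String objects with equal contents intern to the same instance.
(def same-instance?
  (identical? (intern-key (String. "A"))
              (intern-key (String. "A"))))
```

With interning, equal keys share one String object, though each temporary string is still allocated before it is interned.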

Re: Constructing Nested Trees and Memory Consumption

2010-11-03 Thread Andy Fingerhut
In addition to this, note that every java.lang.String, i.e. every "A" in your example data structures below, requires storage for one java.lang.Object (8 bytes on 32-bit JVMs, 16 bytes on 64-bit JVMs) plus an array of Java chars, where I think that is equal to the size of a java.lang.Objec

Re: Constructing Nested Trees and Memory Consumption

2010-11-03 Thread Leif Walsh
Why not just sort the text file and then build the merged trees directly, without the numerous intermediate trees?

On Wed, Nov 3, 2010 at 12:22 PM, Paul Ingles wrote:
> Hi,
>
> I've been playing around with breaking apart a list of postal codes to
> be stored in a tree with leaf nodes containing

Re: Constructing Nested Trees and Memory Consumption

2010-11-03 Thread Paul Mooser
I could be missing something, but since you say you're running into problems as your data gets large, I think you're possibly running into 2 things:

1. You're reading all the data into memory, and keeping it there, in the form of the lines from the file.
2. The way you're defining postcodes-from-f
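
One way to avoid the first problem is to fold over the lines as they are read instead of holding them all. The sketch below is an assumption-laden illustration: `load-postcodes` is a hypothetical function, the "code,value" line format is invented, and a `StringReader` stands in for the real file so the example is self-contained.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])
(import 'java.io.StringReader)

;; Stand-in for the real file contents; the column layout is an assumption.
(def sample-csv "AB1,1\nAB2,2\nSW1,3\n")

(defn load-postcodes
  "Reduces over the lines of source one at a time, keeping only the
  accumulated map rather than the full sequence of lines."
  [source]
  (with-open [r (io/reader source)]
    (reduce (fn [m line]
              (let [[code value] (str/split line #"," 2)]
                (assoc m code value)))
            {}
            (line-seq r))))

(def loaded (load-postcodes (StringReader. sample-csv)))
```

Since `reduce` is eager, the whole file is consumed before `with-open` closes the reader, and nothing retains the head of the line sequence.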

Re: Constructing Nested Trees and Memory Consumption

2010-11-03 Thread Ken Wesson
On Wed, Nov 3, 2010 at 12:22 PM, Paul Ingles wrote:
> (defn merge-tree
>   [tree other]
>   (if (not (map? other))
>     tree
>     (merge-with (fn [x y] (merge-tree x y))
>                 tree other)))

You can get rid of the anonymous function here and just do (merge-with merge-tree tree other).
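
Applied to the quoted function, the simplification would look roughly like this. The two small input trees are made up for the example.

```clojure
(defn merge-tree
  "Recursively merges two nested maps; where both sides have the same key,
  the values are merged by merge-tree itself."
  [tree other]
  (if (not (map? other))
    tree
    (merge-with merge-tree tree other)))

(def merged
  (merge-tree {\A {\B {\1 :x}}}
              {\A {\B {\2 :y}}}))
;; merged => {\A {\B {\1 :x, \2 :y}}}
```

Passing `merge-tree` directly works because `merge-with` calls its function with exactly the two conflicting values, which is the same argument list the anonymous wrapper forwarded.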

Constructing Nested Trees and Memory Consumption

2010-11-03 Thread Paul Ingles
Hi, I've been playing around with breaking apart a list of postal codes to be stored in a tree with leaf nodes containing information about that area. I have some code that works with medium-ish size inputs but fails with a GC Overhead error with larger input sets (1.5m rows) and would really appr