Re: processing large text files

2014-10-26 Thread Alan Busby
On Mon, Oct 27, 2014 at 12:52 PM, Brian Craft wrote: > Makes sense, but not an option for this application. What about something > similar to iota, backed with byte arrays, or something? As Patrick pointed out, if you're working directly with byte arrays you might want to use mmap which is wha

Re: processing large text files

2014-10-26 Thread Brian Craft
On Sunday, October 26, 2014 6:51:18 PM UTC-7, TheBusby wrote: > > On Mon, Oct 27, 2014 at 7:10 AM, Brian Craft > wrote: > >> I found iota, which looks like a good solution for the read portion of >> the problem. However I also need to process the data in the file. If I >> start with an iota/ve

Re: processing large text files

2014-10-26 Thread Plínio Balduino
I wrote some sample code to process the English Wikipedia file dump (+- 40GB) and used nothing but core Clojure and a bzip library. I'll put it on GitHub to show you. I hope it helps. Plinio Balduino > On 26/10/2014, at 23:51, Alan Busby wrote: > >> On Mon, Oct 27, 2014 a

Re: processing large text files

2014-10-26 Thread Alan Busby
On Mon, Oct 27, 2014 at 7:10 AM, Brian Craft wrote: > I found iota, which looks like a good solution for the read portion of the > problem. However I also need to process the data in the file. If I start > with an iota/vec and need to sort it, something like > > (sort (iota/vec "foo")) > Short d

processing large text files

2014-10-26 Thread Patrick Logan
The JVM on most platforms has good support for memory-mapped files.
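
To illustrate the memory-mapping suggestion, here is a minimal sketch using plain Java interop from Clojure. It is not code from the thread: the function name count-lines-mmap is hypothetical, and it assumes the file is under 2 GB (the limit of a single MappedByteBuffer) and that lines end in a newline byte (10). It scans raw bytes without ever creating a String.

```clojure
(import '(java.io RandomAccessFile)
        '(java.nio MappedByteBuffer)
        '(java.nio.channels FileChannel$MapMode))

(defn count-lines-mmap
  "Counts newline bytes in the file at `path` by mapping it read-only.
  Hypothetical helper; assumes the file is smaller than 2 GB."
  [^String path]
  (with-open [raf (RandomAccessFile. path "r")]
    (let [ch (.getChannel raf)
          ;; Map the whole file; the OS pages data in on demand,
          ;; so nothing is copied onto the JVM heap up front.
          ^MappedByteBuffer buf (.map ch FileChannel$MapMode/READ_ONLY 0 (.size ch))]
      (loop [n 0]
        (if (.hasRemaining buf)
          ;; .get advances the buffer position by one byte.
          (recur (if (== 10 (.get buf)) (inc n) n))
          n)))))
```

Usage would look like (count-lines-mmap "foo"). Larger files would need to be mapped in several windows, which this sketch does not attempt.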

Re: processing large text files

2014-10-26 Thread bernardH
On Sunday, October 26, 2014 11:10:19 PM UTC+1, Brian Craft wrote: > > The Java overhead for Strings is incredible. Even moderate-sized input > files will consume all memory. Are there good existing solutions? > When needed (large size) and possible (not exactly text as in Unicode, but ASCII or

processing large text files

2014-10-26 Thread Brian Craft
The Java overhead for Strings is incredible. Even moderate-sized input files will consume all memory. Are there good existing solutions? I found iota, which looks like a good solution for the read portion of the problem. However I also need to process the data in the file. If I start with an io