Re: Trying to reduce memory usage

2021-02-22 Thread Jon Degenhardt via Digitalmars-d-learn
On Tuesday, 23 February 2021 at 00:08:40 UTC, tsbockman wrote: On Friday, 19 February 2021 at 00:13:19 UTC, Jon Degenhardt wrote: It would be interesting to see how the performance compares to tsv-uniq (https://github.com/eBay/tsv-utils/tree/master/tsv-uniq). The prebuilt binaries turn on all

Re: Trying to reduce memory usage

2021-02-22 Thread tsbockman via Digitalmars-d-learn
On Friday, 19 February 2021 at 00:13:19 UTC, Jon Degenhardt wrote: On Wednesday, 17 February 2021 at 04:10:24 UTC, tsbockman wrote: I spent some time experimenting with this problem, and here is the best solution I found, assuming that perfect de-duplication is required. (I'll put the code up o

Re: Trying to reduce memory usage

2021-02-18 Thread Jon Degenhardt via Digitalmars-d-learn
On Wednesday, 17 February 2021 at 04:10:24 UTC, tsbockman wrote: I spent some time experimenting with this problem, and here is the best solution I found, assuming that perfect de-duplication is required. (I'll put the code up on GitHub / dub if anyone wants to have a look.) It would be inter

Re: Trying to reduce memory usage

2021-02-16 Thread tsbockman via Digitalmars-d-learn
On Wednesday, 17 February 2021 at 04:10:24 UTC, tsbockman wrote: On files small enough to fit in RAM, it is similar in speed to the other solutions posted, but less memory hungry. Memory consumption in this case is around (sourceFile.length + 32 * lineCount * 3 / 2) bytes. Run time is similar t
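To get a feel for the memory estimate quoted above, the formula can be written out as a small helper. This is only a sketch of the arithmetic: the name `estimatedBytes` and the example file size and line count are made up for illustration and do not come from the post.

```d
/// Estimated memory use in bytes, following the quoted formula:
/// sourceFile.length + 32 * lineCount * 3 / 2
/// (hypothetical helper name; only the formula itself is from the post).
ulong estimatedBytes(ulong sourceLength, ulong lineCount)
{
    return sourceLength + 32 * lineCount * 3 / 2;
}

unittest
{
    // Hypothetical input: a 1 GiB file containing 10 million lines.
    // 2^30 bytes for the file plus 48 bytes per line.
    assert(estimatedBytes(1UL << 30, 10_000_000) == 1_553_741_824);
}
```

The per-line term works out to 48 bytes, so for that made-up input the overhead on top of the file itself is about 480 MB.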

Re: Trying to reduce memory usage

2021-02-16 Thread tsbockman via Digitalmars-d-learn
On Friday, 12 February 2021 at 01:23:14 UTC, Josh wrote: I'm trying to read in a text file that has many duplicated lines and output a file with all the duplicates removed. By the end of this code snippet, the memory usage is ~5x the size of the infile (which can be multiple GB each), and when

Re: Trying to reduce memory usage

2021-02-13 Thread Daniel N via Digitalmars-d-learn
On Saturday, 13 February 2021 at 04:19:17 UTC, Ali Çehreli wrote: On 2/11/21 6:22 PM, H. S. Teoh wrote:
> bool[size_t] hashes;
I would start with an even simpler solution until it's proven that there still is a memory issue:
    import std.stdio;
    void main() {
        bool[string] lines;

Re: Trying to reduce memory usage

2021-02-12 Thread Ali Çehreli via Digitalmars-d-learn
On 2/11/21 6:22 PM, H. S. Teoh wrote:
> bool[size_t] hashes;
I would start with an even simpler solution until it's proven that there still is a memory issue:
    import std.stdio;
    void main() {
        bool[string] lines;
        foreach (line; stdin.byLine) {
            if (line !in li
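The snippet above is cut off, so what follows is not the posted code. It is a minimal sketch of the same idea (an associative array keyed by the full line text), with the body after the `!in` test filled in as an assumption. The function name `dedupLines` and the `emit` callback are hypothetical and are there only to make the logic testable without stdin.

```d
/// Passes each distinct line of `input` to `emit` exactly once,
/// in first-seen order. (Hypothetical helper, not from the post.)
void dedupLines(Range)(Range input, scope void delegate(const(char)[]) emit)
{
    bool[string] seen; // keys are the full text of every distinct line so far
    foreach (line; input)
    {
        if (line !in seen)
        {
            emit(line);
            // byLine reuses its buffer between iterations, so the key
            // must be an immutable copy of the line.
            seen[line.idup] = true;
        }
    }
}

unittest
{
    string[] got;
    dedupLines(["a", "b", "a", "c", "b"], (const(char)[] l) { got ~= l.idup; });
    assert(got == ["a", "b", "c"]);
}
```

To use it as a filter, `main` could call `dedupLines(stdin.byLine, (const(char)[] line) { writeln(line); })` after `import std.stdio;`. Memory grows with the total size of the distinct lines, since every one of them is kept as a key.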

Re: Trying to reduce memory usage

2021-02-12 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Feb 12, 2021 at 07:23:12AM +0000, frame via Digitalmars-d-learn wrote:
> On Friday, 12 February 2021 at 02:22:35 UTC, H. S. Teoh wrote:
>
> > This turns the OP's O(n log n) algorithm into an O(n) algorithm,
> > doesn't need to copy the entire content of the file into memory, and
> > also u

Re: Trying to reduce memory usage

2021-02-12 Thread frame via Digitalmars-d-learn
On Friday, 12 February 2021 at 07:23:12 UTC, frame wrote: On Friday, 12 February 2021 at 02:22:35 UTC, H. S. Teoh wrote: This turns the OP's O(n log n) algorithm into an O(n) algorithm, doesn't need to copy the entire content of the file into memory, and also uses much less memory by storing

Re: Trying to reduce memory usage

2021-02-11 Thread frame via Digitalmars-d-learn
On Friday, 12 February 2021 at 02:22:35 UTC, H. S. Teoh wrote: This turns the OP's O(n log n) algorithm into an O(n) algorithm, doesn't need to copy the entire content of the file into memory, and also uses much less memory by storing only hashes. But this kind of hash is maybe insufficient
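Assuming the concern is that two different lines can share the same `size_t` hash and one of them would then be dropped, one possible mitigation is to key the set on a much wider digest. The sketch below uses `sha256Of` from `std.digest.sha`. The choice of SHA-256 and the names `dedupByDigest` and `emit` are assumptions for illustration, not something proposed in the post.

```d
import std.digest.sha : sha256Of;

/// Like hash-based de-duplication, but remembers a 32-byte SHA-256
/// digest per distinct line instead of a size_t hash.
/// (Hypothetical helper; the digest choice is an assumption.)
void dedupByDigest(Range)(Range input, scope void delegate(const(char)[]) emit)
{
    bool[ubyte[32]] seen; // one fixed-size digest per distinct line
    foreach (line; input)
    {
        auto digest = sha256Of(line); // ubyte[32]
        if (digest !in seen)
        {
            seen[digest] = true;
            emit(line);
        }
    }
}

unittest
{
    string[] got;
    dedupByDigest(["x", "y", "x"], (const(char)[] l) { got ~= l.idup; });
    assert(got == ["x", "y"]);
}
```

This trades more memory per line and more hashing time for a far smaller chance of a false match. It still does not compare the line text itself, so it is not a proof of perfect de-duplication.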

Re: Trying to reduce memory usage

2021-02-11 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Feb 12, 2021 at 01:45:23AM +0000, mw via Digitalmars-d-learn wrote:
> On Friday, 12 February 2021 at 01:23:14 UTC, Josh wrote:
> > I'm trying to read in a text file that has many duplicated lines and
> > output a file with all the duplicates removed.
>
> If you only need to remove duplicat

Re: Trying to reduce memory usage

2021-02-11 Thread mw via Digitalmars-d-learn
On Friday, 12 February 2021 at 01:23:14 UTC, Josh wrote: I'm trying to read in a text file that has many duplicated lines and output a file with all the duplicates removed. If you only need to remove duplicates, keeping (and comparing) a string hash for each line is good enough. Memory usage should
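The hash-per-line suggestion could look roughly like the sketch below, which stores only a `size_t` per distinct line in a `bool[size_t]` table. It relies on druntime's built-in `hashOf`. The function name `dedupByHash` and the `emit` callback are hypothetical, and this is an illustration of the idea rather than code from the thread.

```d
/// Passes each line of `input` to `emit` unless a line with the same
/// hash has already been seen. Only the hashes are kept in memory.
/// (Hypothetical helper, not from the post.)
void dedupByHash(Range)(Range input, scope void delegate(const(char)[]) emit)
{
    bool[size_t] hashes; // one size_t per distinct line, not the text
    foreach (line; input)
    {
        immutable h = hashOf(line); // built-in hash from druntime
        if (h !in hashes)
        {
            hashes[h] = true;
            emit(line);
        }
    }
}

unittest
{
    string[] got;
    dedupByHash(["a", "b", "a", "c", "b"], (const(char)[] l) { got ~= l.idup; });
    assert(got == ["a", "b", "c"]);
}
```

Each line is read and hashed once, and memory scales with the number of distinct lines rather than their length. The cost is that two different lines with the same hash would be treated as duplicates.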

Trying to reduce memory usage

2021-02-11 Thread Josh via Digitalmars-d-learn
I'm trying to read in a text file that has many duplicated lines and output a file with all the duplicates removed. By the end of this code snippet, the memory usage is ~5x the size of the infile (which can be multiple GB each), and when this is in a loop the memory usage becomes unmanageable a