On Tuesday, 23 February 2021 at 00:08:40 UTC, tsbockman wrote:
On Friday, 19 February 2021 at 00:13:19 UTC, Jon Degenhardt wrote:
It would be interesting to see how the performance compares to
tsv-uniq (https://github.com/eBay/tsv-utils/tree/master/tsv-uniq).
The prebuilt binaries turn on all ...

On Wednesday, 17 February 2021 at 04:10:24 UTC, tsbockman wrote:
I spent some time experimenting with this problem, and here is
the best solution I found, assuming that perfect de-duplication
is required. (I'll put the code up on GitHub / dub if anyone
wants to have a look.)

It would be interesting to see how the performance compares to
tsv-uniq (https://github.com/eBay/tsv-utils/tree/master/tsv-uniq).
The prebuilt binaries turn on all ...

On Wednesday, 17 February 2021 at 04:10:24 UTC, tsbockman wrote:
On files small enough to fit in RAM, it is similar in speed to the
other solutions posted, but less memory hungry. Memory consumption
in this case is around (sourceFile.length + 32 * lineCount * 3 / 2)
bytes. Run time is similar to ...
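
(To put that estimate in scale, with made-up numbers rather than
figures from the thread: a 1 GiB file containing 10 million lines
would come to roughly 1,073,741,824 + 32 * 10,000,000 * 3 / 2 =
1,073,741,824 + 480,000,000 bytes, i.e. about 1.45 GiB of peak
memory -- far below the ~5x blow-up described in the original post.)
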
On Saturday, 13 February 2021 at 04:19:17 UTC, Ali Çehreli wrote:
On 2/11/21 6:22 PM, H. S. Teoh wrote:
> bool[size_t] hashes;

I would start with an even simpler solution until it's proven that
there still is a memory issue:

import std.stdio;

void main() {
    bool[string] lines;

    foreach (line; stdin.byLine) {
        if (line !in lines) {
            stdout.writeln(line);
            lines[line.idup] = true; // .idup: byLine reuses its buffer
        }
    }
}
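
(A usage sketch for the snippet above, assuming it is compiled to a
binary named "dedup" -- the name is mine, not from the thread:

    ./dedup < infile > outfile

Because it streams line by line, memory grows with the number of
distinct lines kept in the associative array, not with file size.)
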
On Fri, Feb 12, 2021 at 07:23:12AM +0000, frame via Digitalmars-d-learn wrote:
> On Friday, 12 February 2021 at 02:22:35 UTC, H. S. Teoh wrote:
>
> > This turns the OP's O(n log n) algorithm into an O(n) algorithm,
> > doesn't need to copy the entire content of the file into memory, and
> > also uses much less memory by storing only hashes.
...

On Friday, 12 February 2021 at 02:22:35 UTC, H. S. Teoh wrote:
This turns the OP's O(n log n) algorithm into an O(n) algorithm,
doesn't need to copy the entire content of the file into memory,
and also uses much less memory by storing only hashes.

But this kind of hash is maybe insufficient ...
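
(For readers skimming the archive, a minimal sketch of the hash-only
idea under discussion -- my own illustration, not code posted in the
thread. frame's objection applies: two distinct lines can collide on
the same 64-bit hash, in which case the second one is wrongly dropped,
so this is approximate rather than perfect de-duplication:

    import std.stdio;

    void main() {
        bool[size_t] seen; // stores 8-byte hashes only, never the lines

        foreach (line; stdin.byLine) {
            const h = hashOf(line); // hashOf comes from druntime's object module
            if (h !in seen) {
                stdout.writeln(line);
                seen[h] = true;
            }
        }
    }

)
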
On Fri, Feb 12, 2021 at 01:45:23AM +0000, mw via Digitalmars-d-learn wrote:
> On Friday, 12 February 2021 at 01:23:14 UTC, Josh wrote:
> > I'm trying to read in a text file that has many duplicated lines and
> > output a file with all the duplicates removed.
>
> If you only need to remove duplicates, keeping (and comparing) a
> string hash for each line is good enough. Memory usage should ...
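
(A back-of-the-envelope on the collision worry raised above, using my
own numbers rather than the thread's: with a 64-bit hash, the birthday
bound puts the chance of any collision among n distinct lines near
n^2 / 2^65; for 100 million lines that is about (10^8)^2 / 2^65, or
roughly 0.03% -- rare but not zero, which is the trade-off against the
exact bool[string] approach earlier in the thread.)
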
On Friday, 12 February 2021 at 01:23:14 UTC, Josh wrote:
I'm trying to read in a text file that has many duplicated lines
and output a file with all the duplicates removed. By the end of
this code snippet, the memory usage is ~5x the size of the infile
(which can be multiple GB each), and when this is in a loop the
memory usage becomes unmanageable and ...
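
(The snippet itself is cut off in this archive. As a purely
illustrative sketch of the kind of approach the replies criticize --
whole file materialized in memory, O(n log n) sort, per H. S. Teoh's
comment above -- and explicitly not the OP's actual code; file names
are placeholders:

    import std.stdio;
    import std.array : array;
    import std.algorithm : sort, uniq;

    void main() {
        // Materializes every line, then sorts: memory becomes a multiple
        // of the file size and the work is O(n log n). Sorting also
        // reorders the output, unlike the streaming approaches above.
        auto lines = File("infile.txt").byLineCopy.array;
        lines.sort();

        auto outFile = File("outfile.txt", "w");
        foreach (line; lines.uniq)
            outFile.writeln(line);
    }

The streaming AA-based and hash-based replies above avoid both costs.)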