> "Rich" == Rich Shepard writes:
Rich> I have a large data file that contains duplicate rows. 'uniq'
Rich> finds those rows that match character-by-character, but not
Rich> those that match only on the first three fields (separated by
Rich> '|').

No, uniq *disregards* rows that match character-by-character.
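For reference, one field-aware alternative is awk, keying on the first three
'|'-separated fields; the sample data and file name below are made up:

```shell
# Hypothetical sample rows: same location/date/chemical keys, differing
# concentrations in the fourth field
printf '%s\n' 'site1|2012-10-30|arsenic|0.5' \
              'site1|2012-10-30|arsenic|0.7' \
              'site2|2012-10-30|lead|0.1' > data.txt

# -F'|' splits each row on '|'; a row is printed only the first time
# its three-field key is seen
awk -F'|' '!seen[$1 FS $2 FS $3]++' data.txt
```

This keeps the first row for each key and preserves input order, so no
separate sort pass is needed.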
On Tue, 30 Oct 2012, Fred James wrote:
> How about (see below) ... would that work with "-u"?
> "`-k POS1[,POS2]'
> `--key=POS1[,POS2]'
> Specify a sort field that consists of the part of the line between
> POS1 and POS2 (or the end of the line, if POS2 is omitted),
> _inclusive_.
>
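With GNU sort, Fred's -k/-u suggestion might look like this (sample data and
file name are hypothetical; when a key is given, -u compares only the key
fields, so one line per distinct key survives):

```shell
# Hypothetical sample rows with duplicate first-three-field keys
printf '%s\n' 'site1|2012-10-30|arsenic|0.5' \
              'site1|2012-10-30|arsenic|0.7' \
              'site2|2012-10-30|lead|0.1' > data.txt

# -t'|' sets the field separator; -k1,3 keys on fields 1 through 3;
# -u emits only the first line of each group of equal keys
sort -t'|' -k1,3 -u data.txt
```

Note that which of the duplicate rows survives depends on the sort order of
the non-key fields, so this fits best when any row of a group will do.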
# from Rich Shepard on Tuesday 30 October 2012:
> I have a large data file that contains duplicate rows. 'uniq' finds
> those rows that match character-by-character, but not those that match
> only on the first three fields (separated by '|').
Hi Rich,
perl -e 'while (<>) {
    my $k = join "|", (split /\|/)[0..2];   # key = first three fields
    print unless $seen{$k}++;               # keep first row for each key
}' file
Rich Shepard wrote:
> On Tue, 30 Oct 2012, Scott Bigelow wrote:
>
>> ... although if your fields are all significantly different in length, it
>> probably won't work as well as Dale's solution.
> Scott,
>
> The first field has a varying number of characters, which is why I didn't
> use the -w option.
On Tue, 30 Oct 2012, Scott Bigelow wrote:
> ... although if your fields are all significantly different in length, it
> probably won't work as well as Dale's solution.
Scott,
The first field has a varying number of characters, which is why I didn't
use the -w option.
Thanks,
Rich
On Tue, 30 Oct 2012, Dale Snell wrote:
> From the above, may I take it that any data other than the first three
> fields is irrelevant? If so, use cut(1) to write those fields,
> line-by-line, to a scratch file. Then sort(1) said file, and use uniq(1)
> to delete the duplicate lines.
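A sketch of that cut/sort/uniq pipeline, with hypothetical sample data and
scratch-file names:

```shell
# Hypothetical input; fields are location|date|chemical|concentration
printf '%s\n' 'site1|2012-10-30|arsenic|0.5' \
              'site1|2012-10-30|arsenic|0.7' \
              'site2|2012-10-30|lead|0.1' > data.txt

# Write only fields 1-3 to a scratch file, then sort and drop duplicates
cut -d'|' -f1-3 data.txt > scratch.txt
sort scratch.txt | uniq > unique-keys.txt
```

This yields one line per distinct location/date/chemical key, though the
remaining fields (the concentrations) are discarded along the way.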
Dale,
uniq also has the "-w" flag, which instructs it to only compare the first N
characters in a line:
-w, --check-chars=N
compare no more than N characters in lines
although if your fields are all significantly different in length, it
probably won't work as well as Dale's solution.
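For completeness, a sketch of -w (a GNU uniq extension) on data contrived so
that the first three fields always span exactly 10 characters:

```shell
# Hypothetical fixed-width demo: "aaa|bb|ccc" is always 10 characters
printf '%s\n' 'aaa|bb|ccc|0.5' \
              'aaa|bb|ccc|0.7' \
              'ddd|ee|fff|0.1' > demo.txt

# uniq needs sorted input; -w 10 compares only the first 10 characters,
# i.e. the three-field key
sort demo.txt | uniq -w 10
```

As soon as the key fields vary in width, a fixed character count cuts
through the wrong field, which is exactly the problem Rich describes.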
On Tue, 30 Oct 2012 16:17:08 -0700 (PDT)
Rich Shepard wrote:
> I have a large data file that contains duplicate rows. 'uniq'
> finds those rows that match character-by-character, but not those
> that match only on the first three fields (separated by '|'). There are
> rows with the same location
I have a large data file that contains duplicate rows. 'uniq' finds those
rows that match character-by-character, but not those that match only on the
first three fields (separated by '|'). There are rows with the same location
ID, date, and chemical that have different concentrations listed, and