Re: [PLUG] Finding partial duplicate rows with uniq

2012-10-30 Thread Russell Senior
> "Rich" == Rich Shepard writes: Rich>I have a large data file that contains duplicate rows. 'uniq' Rich> finds those rows that match character-by-character, but not Rich> those who match only on the first three fields (separated by Rich> '|'). No, uniq *disregards* rows that match char

Re: [PLUG] Finding partial duplicate rows with uniq

2012-10-30 Thread Rich Shepard
On Tue, 30 Oct 2012, Fred James wrote:
> How about (see below) ... would that work with "-u"?
> "`-k POS1[,POS2]'
> `--key=POS1[,POS2]'
> Specify a sort field that consists of the part of the line between
> POS1 and POS2 (or the end of the line, if POS2 is omitted),
> _inclusive_.
>
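On the question of combining a sort key with "-u": a minimal sketch, assuming '|' as the delimiter and made-up sample rows (the IDs, dates, and values are hypothetical). With -u, sort keeps one line per set of lines whose keys compare equal, so this removes partial duplicates rather than listing them, and the other concentrations are discarded.

```shell
# Hypothetical sample rows: location ID | date | chemical | concentration
rows=$(printf '%s\n' \
  'W1|2012-01-05|As|0.010' \
  'W2|2012-01-05|As|0.020' \
  'W1|2012-01-05|As|0.015')

# -t'|'   use '|' as the field separator
# -k1,3   the sort key runs from the start of field 1 to the end of field 3
# -u      output only one line for each set of lines with equal keys
deduped=$(printf '%s\n' "$rows" | sort -t'|' -k1,3 -u)

printf '%s\n' "$deduped"
```

Which of the two W1 rows survives depends on the sort implementation, so this is only suitable when any one representative row will do.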

Re: [PLUG] Finding partial duplicate rows with uniq

2012-10-30 Thread Eric Wilhelm
# from Rich Shepard on Tuesday 30 October 2012:
> I have a large data file that contains duplicate rows. 'uniq' finds
> those rows that match character-by-character, but not those who match
> only on the first three fields (separated by '|').
Hi Rich,
  perl -e 'while(<>) { my $k = join "|",

Re: [PLUG] Finding partial duplicate rows with uniq

2012-10-30 Thread Fred James
Rich Shepard wrote:
> On Tue, 30 Oct 2012, Scott Bigelow wrote:
>
>> ... although if your fields are all significantly different in length, it
>> probably won't work as well as Dale's solution.
> Scott,
>
> The first field has a varying number of characters, that's why I didn't
> use the -w opt

Re: [PLUG] Finding partial duplicate rows with uniq

2012-10-30 Thread Rich Shepard
On Tue, 30 Oct 2012, Scott Bigelow wrote:
> ... although if your fields are all significantly different in length, it
> probably won't work as well as Dale's solution.
Scott,
The first field has a varying number of characters, that's why I didn't use the -w option.
Thanks,
Rich

Re: [PLUG] Finding partial duplicate rows with uniq

2012-10-30 Thread Rich Shepard
On Tue, 30 Oct 2012, Dale Snell wrote:
> From the above, may I take it that any data other than the first three
> fields is irrelevant? If so, use cut(1) to write those fields,
> line-by-line, to a scratch file. Then sort(1) said file, and use uniq(1)
> to delete the duplicate lines.
Dale,
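The cut/sort/uniq approach quoted above might look roughly like this. It is a sketch under assumptions: the sample rows and the scratch-file names (data.txt, keys.txt) are hypothetical, and only the three key fields survive the cut step.

```shell
# Hypothetical sample data: location ID | date | chemical | concentration
tmp=$(mktemp -d)
printf '%s\n' \
  'W1|2012-01-05|As|0.010' \
  'W2|2012-01-05|As|0.020' \
  'W1|2012-01-05|As|0.015' > "$tmp/data.txt"

# Write only the first three fields to a scratch file, sorted so that
# identical keys end up on adjacent lines (uniq only compares neighbours).
cut -d'|' -f1-3 "$tmp/data.txt" | sort > "$tmp/keys.txt"

# Collapse adjacent duplicates: one line per distinct key.
unique_keys=$(uniq "$tmp/keys.txt")

# Variation: list only the keys that occur more than once.
repeated_keys=$(uniq -d "$tmp/keys.txt")

printf 'distinct keys:\n%s\n' "$unique_keys"
printf 'repeated keys:\n%s\n' "$repeated_keys"
rm -rf "$tmp"
```

The repeated keys could then be used to pull the full rows back out of the original file if the differing concentrations matter.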

Re: [PLUG] Finding partial duplicate rows with uniq

2012-10-30 Thread Scott Bigelow
uniq also has the "-w" flag, which instructs it to only compare the first N characters in a line:
  -w, --check-chars=N   compare no more than N characters in lines
although if your fields are all significantly different in length, it probably won't work as well as Dale's soluti
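A small illustration of the "-w" flag, assuming GNU uniq (both -w and -D are GNU options) and hypothetical rows whose three key fields happen to occupy exactly 16 characters. If the key width varies from row to row, a fixed N cannot line up with the field boundary and this approach breaks down.

```shell
# Hypothetical rows, already sorted; "W1|2012-01-05|As" is 16 characters long.
rows=$(printf '%s\n' \
  'W1|2012-01-05|As|0.010' \
  'W1|2012-01-05|As|0.015' \
  'W2|2012-01-05|As|0.020')

# -w 16  compare no more than the first 16 characters of each line
# -D     print every line that belongs to a group of duplicates
dups=$(printf '%s\n' "$rows" | uniq -w 16 -D)

printf '%s\n' "$dups"
```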

Re: [PLUG] Finding partial duplicate rows with uniq

2012-10-30 Thread Dale Snell
On Tue, 30 Oct 2012 16:17:08 -0700 (PDT) Rich Shepard wrote:
> I have a large data file that contains duplicate rows. 'uniq'
> finds those rows that match character-by-character, but not those who
> match only on the first three fields (separated by '|'). There are
> rows with the same locatio

[PLUG] Finding partial duplicate rows with uniq

2012-10-30 Thread Rich Shepard
I have a large data file that contains duplicate rows. 'uniq' finds those rows that match character-by-character, but not those who match only on the first three fields (separated by '|'). There are rows with the same location ID, date, and chemical that have different concentrations listed, and
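One possible way to list the rows described here, offered as a sketch rather than anything proposed in the thread: it uses awk (not uniq) and reads the file twice, first counting each three-field key and then printing every row whose key occurs more than once. The sample rows are hypothetical.

```shell
# Hypothetical sample: location ID | date | chemical | concentration
data=$(mktemp)
printf '%s\n' \
  'W1|2012-01-05|As|0.010' \
  'W2|2012-01-05|As|0.020' \
  'W1|2012-01-05|As|0.015' > "$data"

# Pass 1 (NR == FNR, i.e. still reading the first copy): count each key.
# Pass 2: print every row whose three-field key was seen more than once.
conflicts=$(awk -F'|' '
  NR == FNR { count[$1 FS $2 FS $3]++; next }
  count[$1 FS $2 FS $3] > 1
' "$data" "$data")

printf '%s\n' "$conflicts"
rm -f "$data"
```

Because the rows are printed in their original order and in full, the differing concentrations for each repeated key stay visible; no prior sort is needed.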