Re: [Boston.pm] perl program to count distinct values - can it be made faster

2014-03-09 Thread Steve Tolkin
[much snipped] 4A. Normally i'd use Heuristic: convert the hash-array update loop to array/hash slice [ www.stonehenge.com/merlyn/UnixReview/col68.html ] We'd like 0 .. $NF range

Re: [Boston.pm] perl program to count distinct values - can it be made faster

2014-03-09 Thread Bill Ricker
On Sun, Mar 9, 2014 at 11:59 AM, Steve Tolkin stevetol...@comcast.net wrote: Since I can just set a constant value Aha, this suggests using that optimization in the loop form as well, since you don't care about the Count. $aoh[$_]{$data[$_]} ||= 1 foreach 0 .. $#data; forces the
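
A minimal sketch of the flag-instead-of-count loop quoted above, for readers who want to try it. The sample rows are made up for illustration; only @aoh, @data and the ||= 1 statement come from the thread.

```perl
use strict;
use warnings;

my @aoh;    # array of hashes: one hash of seen values per column

# Hypothetical sample rows, already split into fields.
my @rows = ( [qw(a b)], [qw(a c)], [qw(a b)] );

for my $row (@rows) {
    my @data = @$row;
    # Mark each value as seen. The stored 1 is only a flag, not a count.
    $aoh[$_]{ $data[$_] } ||= 1 foreach 0 .. $#data;
}

# The number of keys in each hash is the distinct count for that column.
my @distinct = map { scalar keys %$_ } @aoh;
print "@distinct\n";
```

Whether ||= 1 is measurably faster than ++ on a given perl build is something to benchmark rather than assume.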

[Boston.pm] perl program to count distinct values - can it be made faster

2014-03-08 Thread Steve Tolkin
I wrote a simple perl program to count the number of distinct values in each field of a file. It just reads each line once, sequentially. The input files vary in the number of rows: 1 million is typical, but some have as many as 100 million rows, and are 10 GB in size, so I am reluctant to use
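
For orientation, here is a rough sketch of the kind of single-pass program described above. It is not Steve's actual code: the sub name count_distinct, the comma separator and the in-memory sample lines are assumptions made for illustration.

```perl
use strict;
use warnings;

# Count distinct values per field in one sequential pass.
# Returns one count per column, in column order.
sub count_distinct {
    my ($split_char, @lines) = @_;
    my @seen;    # $seen[$col]{$value} = 1
    for my $line (@lines) {
        chomp $line;
        # Limit -1 keeps trailing empty fields.
        my @fields = split /\Q$split_char\E/, $line, -1;
        $seen[$_]{ $fields[$_] } ||= 1 for 0 .. $#fields;
    }
    return map { scalar( keys %{ $_ || {} } ) } @seen;
}

my @counts = count_distinct( ',', "a,b,c\n", "a,x,c\n", "d,b,c\n" );
print join( ',', @counts ), "\n";
```

A real run would read lines from a filehandle instead of a list. Memory grows with the number of distinct values per column, not with the number of rows.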

Re: [Boston.pm] perl program to count distinct values - can it be made faster

2014-03-08 Thread Uri Guttman
On 03/08/2014 12:56 PM, David Larochelle wrote: In terms of style, I suggest you take a look at Perl Best Practices. I would change $firstline = $_ if $numlines == 0; to if ($numlines == 0) { $firstline = $_; } why even have $numlines? use the builtin $. that alone is a nice

Re: [Boston.pm] perl program to count distinct values - can it be made faster

2014-03-08 Thread Gyepi SAM
On Sat, Mar 08, 2014 at 10:59:22AM -0500, Steve Tolkin wrote: # return code: 0 == success; 1 == some warnings; 2 == some errors my $rc = 0; This value never changes. I assume the larger program could change it. my $split_char=','; # CHANGE ME IF NEEDED (later use getopts) my @aoh; # array

Re: [Boston.pm] perl program to count distinct values - can it be made faster

2014-03-08 Thread Charles Reitzel
I think Gyepi SAM is getting close to the issue: regex. it is clearly NOT i/o bound if scanning once per field is about as fast. Regexes are slow. Very slow. Replace w/ simple parsing logic based on string position and you'll get the speed up. Note, I like regex just fine, but I have
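
One way to read "simple parsing logic based on string position" is a loop over index and substr. The sketch below is an illustration only: split_on_char is a hypothetical helper, not code from the thread, and whether it actually beats split for a given file should be measured (for example with the core Benchmark module), not assumed.

```perl
use strict;
use warnings;

# Split a line on a literal separator using index/substr, no regex.
sub split_on_char {
    my ( $line, $sep ) = @_;
    my @fields;
    my $start = 0;
    while ( ( my $pos = index( $line, $sep, $start ) ) >= 0 ) {
        push @fields, substr( $line, $start, $pos - $start );
        $start = $pos + length $sep;
    }
    # Whatever follows the last separator is the final field (may be empty).
    push @fields, substr( $line, $start );
    return @fields;
}

my @fields = split_on_char( "a,,b,", "," );
print scalar(@fields), " fields\n";
```

Unlike split with its default limit, this keeps trailing empty fields, which matters if an empty value should count as a distinct value.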

Re: [Boston.pm] perl program to count distinct values - can it be made faster

2014-03-08 Thread David Larochelle
On Sat, Mar 8, 2014 at 1:40 PM, Gyepi SAM gy...@praxis-sw.com wrote: For fun, I wrote a version in Go and it's twice as fast as the perl version. I imagine a C version would be faster yet, but I get paid for that kind of fun. I'd be happy to send you the Go version if you're interested. I'm

Re: [Boston.pm] perl program to count distinct values - can it be made faster

2014-03-08 Thread Bill Ricker
Hi Steve, You've got AIX again? Cool. I like AIX. Let me know if you need help ! (not just perl) Or, maybe someone has a different and faster program that does count distinct on the fields in a file. Outside the box, If the data is coming from or going to a SQL system, SQL can do the COUNT

Re: [Boston.pm] perl program to count distinct values - can it be made faster

2014-03-08 Thread Gyepi SAM
On Sat, Mar 08, 2014 at 02:12:50PM -0500, David Larochelle wrote: On Sat, Mar 8, 2014 at 1:40 PM, Gyepi SAM gy...@praxis-sw.com wrote: For fun, I wrote a version in Go and it's twice as fast as the perl version. I imagine a C version would be faster yet, but I get paid for that kind of

Re: [Boston.pm] perl program to count distinct values - can it be made faster

2014-03-08 Thread David Larochelle
Thanks Gyepi On Mar 8, 2014 3:27 PM, Gyepi SAM gy...@praxis-sw.com wrote: On Sat, Mar 08, 2014 at 02:12:50PM -0500, David Larochelle wrote: On Sat, Mar 8, 2014 at 1:40 PM, Gyepi SAM gy...@praxis-sw.com wrote: For fun, I wrote a version in Go and it's twice as fast as the perl version.

Re: [Boston.pm] perl program to count distinct values - can it be made faster

2014-03-08 Thread Conor Walsh
On Sat, Mar 8, 2014 at 10:59 AM, Steve Tolkin stevetol...@comcast.net wrote: I wrote a simple perl program to count the number of distinct values in each field of a file. It just reads each line once, sequentially. The input files vary in the number of rows: 1 million is typical, but some