Tolkin
Cc: Boston Perl Mongers
Subject: Re: [Boston.pm] perl program to count distinct values - can it be made faster
[much snipped]
4A. Normally I'd use the
Heuristic: convert the hash-array update loop to an array/hash slice [
www.stonehenge.com/merlyn/UnixReview/col68.html ]
We'd like the 0 .. $NF range
On Sun, Mar 9, 2014 at 11:59 AM, Steve Tolkin stevetol...@comcast.net wrote:
Since I can just set a constant value
Aha, this suggests using that optimization in the loop form as well, since
you don't care about the Count.
$aoh[$_]{$data[$_]} ||= 1 foreach 0 .. $#data;
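To make the two forms concrete, here is a small self-contained sketch: the per-field `||=` loop from the thread, plus a hash slice that marks many values of one field in a single assignment (the data and field layout are made up for illustration, and whether the slice actually wins should be benchmarked):

```perl
use strict;
use warnings;

# Loop form (as in the thread): one ||= per field per row.
my @aoh;
for my $row ( ['a','b'], ['a','c'] ) {
    $aoh[$_]{ $row->[$_] } ||= 1 foreach 0 .. $#$row;
}

# Slice form: within a single field's hash, a hash slice marks
# many values in one statement instead of one assignment per value.
my %seen;
my @values = qw(a b a c);
@seen{@values} = (1) x @values;   # one assignment, all keys at once

printf "field0 distinct=%d, slice distinct=%d\n",
    scalar keys %{ $aoh[0] }, scalar keys %seen;
```

Since only the keys matter for a distinct count, the assigned value (here 1) is arbitrary.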
forces the
I wrote a simple perl program to count the number of distinct values in each
field of a file. It just reads each line once, sequentially. The input
files vary in the number of rows: 1 million is typical, but some have as
many as 100 million rows, and are 10 GB in size, so I am reluctant to use
On 03/08/2014 12:56 PM, David Larochelle wrote:
In terms of style, I suggest you take a look at Perl Best Practices. I
would change
$firstline = $_ if $numlines == 0;
to
if ($numlines == 0)
{
$firstline = $_;
}
why even have $numlines? use the builtin $.; that alone is a nice
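As a sketch of the `$.` suggestion, using an in-memory filehandle so the example runs standalone ($. is Perl's builtin input line counter, so no hand-rolled $numlines is needed):

```perl
use strict;
use warnings;

my $firstline;
# Demo data: a scalar ref opened as a filehandle stands in for the real file.
open my $fh, '<', \"header\nrow1\nrow2\n" or die $!;
while (<$fh>) {
    $firstline = $_ if $. == 1;   # $. == 1 on the first line read
    # ... per-line counting would go here ...
}
close $fh;
print $firstline;   # "header\n"
```

One caveat from perlvar: `$.` is reset when the filehandle is explicitly closed, so grab what you need before the close.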
On Sat, Mar 08, 2014 at 10:59:22AM -0500, Steve Tolkin wrote:
# return code: 0 == success; 1 == some warnings; 2 == some errors
my $rc = 0;
This value never changes. I assume the larger program could change it.
my $split_char=','; # CHANGE ME IF NEEDED (later use getopts)
my @aoh; # array of hashes
I think Gyepi SAM is getting close to the issue: regex. It is clearly
NOT i/o bound if scanning once per field is about as fast. Regexes
are slow. Very slow. Replace w/ simple parsing logic based on string
position and you'll get the speedup.
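One way to sketch that string-position parsing is with index/substr instead of split (comma separator assumed; whether this actually beats split on a given perl should be benchmarked):

```perl
use strict;
use warnings;

# Split-free field extraction: walk the line with index(), slice
# fields out with substr(). No regex engine involved.
sub fields_by_index {
    my ($line) = @_;
    my @data;
    my $pos = 0;
    while ((my $next = index($line, ',', $pos)) >= 0) {
        push @data, substr($line, $pos, $next - $pos);
        $pos = $next + 1;
    }
    push @data, substr($line, $pos);   # last field (no trailing comma)
    return @data;
}

my @f = fields_by_index("a,bb,,c");
print scalar(@f), " fields: @f\n";   # 4 fields, including the empty one
```

Note this keeps empty fields, matching `split /,/, $line, -1` rather than plain split, which drops trailing empties.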
Note, I like regex just fine, but I have
On Sat, Mar 8, 2014 at 1:40 PM, Gyepi SAM gy...@praxis-sw.com wrote:
For fun, I wrote a version in Go and it's twice as fast as the perl
version. I imagine a C version would be faster yet, but I get paid for that
kind of fun. I'd be happy to send you the Go version if you're interested.
I'm
Hi Steve,
You've got AIX again? Cool.
I like AIX. Let me know if you need help! (not just perl)
Or, maybe someone has a different and faster program that does count
distinct on the fields in a file.
Outside the box: if the data is coming from or going to a SQL system, SQL
can do the COUNT
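In that case one COUNT(DISTINCT ...) query per column does the whole job. A sketch that only builds the statements, since running them needs a live handle (with DBI you would pass each to $dbh->selectrow_array; the table and column names here are made up for illustration):

```perl
use strict;
use warnings;

# Build one COUNT(DISTINCT ...) query per column. Names are
# hypothetical; substitute the real table and field names.
my $table   = 'mytable';
my @columns = qw(field1 field2 field3);
my @sql = map { "SELECT COUNT(DISTINCT $_) FROM $table" } @columns;
print "$_\n" for @sql;
```

The database then does the hashing and counting server-side, which sidesteps reading 10 GB through Perl entirely.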
Thanks Gyepi
On Mar 8, 2014 3:27 PM, Gyepi SAM gy...@praxis-sw.com wrote: