w00t! 12 cents! please donate to TriLUG. :p wow, that perl stuff... zoiks, amazing, I am no longer worthy... and what an incredible way to bring the thread on-topic. wow. *stares blankly in awe*
laters,
David McD

On 8/24/05, Aaron S. Joyner <[EMAIL PROTECTED]> wrote:
> This is exactly the kind of solution I was looking for. :) Although,
> Timothy doesn't seem to want the quarter (as evidenced by his lack of
> links), so I'll just send 13 cents on over to Lee, who came up with the
> shortest post the soonest, and 12 cents to David for the longest. The
> comparison for the shortest between the two candidates comes out
> something like this:
> 5 12 61 = Cygwin
> 3 15 83 = Pavlov's dog
>
> The Cygwin post wins in everything but line count. Since "shortest" is an
> incomplete description (shortest vertically when printed, fewest bytes,
> fewest words, fewest bytes after maximum gzip compression, aka least
> "information", etc.), I'd go with the default of whoever wins the wc
> character count. For those not familiar with the above output, man wc.
>
> Now for the dissection of Tim's post, for the curious. I welcome his
> commentary, or additional comments on how many revs of that perl one
> liner he went through before he got it counting right. :)
>
> Timothy A. Chagnon wrote:
>
> >...Nasty fun with perl, just counting lines to get a good guess:
> >$ mkdir joyner
> >$ wget -nd -nH -P joyner -r -l 1 -A gz \
> >-X Week-of http://www.trilug.org/pipermail/trilug/
> >
>
> Get the files linked from the TriLUG archives, recursively down one
> level, which end in .gz, and store them on the local disk (in "joyner").
>
> >$ gunzip -c joyner/*gz >trilug.txt
> >
>
> Decompress them all into a file called trilug.txt (thereby creating a
> single text file with all the posts to trilug, ever).
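For anyone taking Aaron's "man wc" advice on the tie-break: with no options, wc prints lines, words, and bytes, in that order, so "5 12 61" reads as 5 lines, 12 words, 61 bytes. Picking the post with the smallest byte count can be scripted roughly like this. It is only a sketch: the shortest_post helper and any file names you feed it are hypothetical, not something from the thread.

```shell
#!/bin/sh
# shortest_post FILE...: print the name of the file with the fewest bytes,
# i.e. the "wc character count" tie-break. Hypothetical helper for illustration.
shortest_post() {
    for f in "$@"; do
        # $(( )) strips the leading blanks some wc implementations emit
        printf '%s %s\n' "$(( $(wc -c < "$f") ))" "$f"
    done | sort -n | head -n 1 | cut -d' ' -f2-
}
```

Usage would be something like: shortest_post cygwin.txt pavlov.txt (each hypothetical file holding one post). Swapping wc -c for wc -w or wc -l gives the "fewest words" or "fewest lines" readings of "shortest" instead.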
>
> >$ perl -n -e 'if( /^From: / ){ if($count){print "$count\n";$count=0}
> >if(/joyner/){$joyner=1;}else{$joyner=0} }else{if($joyner){ if(/^Date: /||/^Subject:/){print;} if(!/^>/&&/[a-zA-Z]/){$count++;}} }' trilug.txt |
> >perl -n -e 'chomp; if(/^Date/) {$d=$_;}else{if(/^Sub/){$s=$_;}else{print
> >"$_ $d $s\n";}}' | sort -n -k1
> >
>
> To deconstruct this, it helps to break it down from a one-liner into
> properly indented code, which would be commented something like this:
>
> if( /^From: / ){                       # If it's the start of a message...
>   if($count) { print "$count\n"; $count=0; }  # consider it the end of the
>                                        # previous message; print the count
>   if(/joyner/) { $joyner=1; }          # if the From line contains "joyner", mark it
>   else { $joyner=0; }                  # otherwise, clear that mark
> }
> else{                                  # if this is a line in a message...
>   if($joyner) {                        # marked as written by me...
>     if(/^Date: /||/^Subject:/) { print; }  # print the Date and Subject headers
>     if(!/^>/&&/[a-zA-Z]/) { $count++; }    # and count every unquoted line containing a letter
>   }
> }
>
> This ends the first script. He runs that script across the trilug.txt
> file, which produces output consisting of just a Date: line, a Subject:
> line, and a line count for each message. He then runs this second script,
> using the previous script's output as its input:
>
> chomp;                        # clear off the newline character
> if(/^Date/) {                 # if it's the date line,
>   $d=$_;                      # stick the line in the var $d
> }
> else{
>   if(/^Sub/){                 # if it's a subject line,
>     $s=$_;                    # stick it in the var $s
>   }
>   else{
>     print "$_ $d $s\n";       # print the count, date, and subject on one line
>   }
> }
>
> He then takes the output of that and runs it through sort, in order to...
> well... I'll leave that up to the reader.
>
> So who wants to point out potential points for optimization of his code? Tim,
> care to comment on / condense / clean up anything? :)
>
> Aaron S. Joyner
>
> --
> TriLUG mailing list : http://www.trilug.org/mailman/listinfo/trilug
> TriLUG Organizational FAQ : http://trilug.org/faq/
> TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
> TriLUG PGP Keyring : http://trilug.org/~chrish/trilug.asc
