I'm now using a Perl script which Malcolm Beattie kindly gave me. I
made some minor changes to make it more general, but the main logic is
his. I ran it and it was significantly faster than BASH. And I then
managed to use up all the space in the filesystem that I had it on.
OOPS. I'm going to need to
The internal bash parameter expansion functions (e.g. ${line%% *}) tend to be
quite inefficient. Here is one example, compare the performance of bash
substitution to Perl substitution:
#!/bin/bash
comma_sep=$(perl -e 'for($i=0;$i<1000;$i++) { print("$i;") };')
time space_sep=${comma_sep//;/ }
ti
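The Perl side of the comparison is cut off in this excerpt; a minimal sketch of
one way to time the equivalent substitution in Perl (my assumption, using the
core Benchmark module) would be:

#!/usr/bin/perl
# Sketch only: times the same semicolon-to-space substitution in Perl.
use strict;
use warnings;
use Benchmark qw(timethis);

my $comma_sep = join ';', 0 .. 999;        # same test string as above
timethis(10_000, sub {
    (my $space_sep = $comma_sep) =~ tr/;/ /;
});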
On 31 January 2013 22:01, Philipp Kern wrote:
> even quite old Intel boxes manage to saturate 1 GE easily. You're
> copying stuff into the send buffer and ringing a bell.
>
> Nowadays it doesn't seem hard to do 10 GE with a Linux box, especially if
> you've got HW assist on the network card. The z n
I want to thank Malcolm Beattie for the Perl script. I'm running it
now. It has already finished processing one generation, having split
it up into 60 output files. This has been about 3 hours now.
Significantly faster. I only made one change. I did a close on all the
cached output files after fini
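For anyone reading along without the script, a rough sketch of the approach
being described (my assumption, not Malcolm's actual code) is: read each
compressed generation through bzcat, cache one output filehandle per key built
from the record's first two words plus the generation, and close the cached
handles once that generation is finished:

#!/usr/bin/perl
# Sketch only; the file-name pattern and output naming are assumptions.
use strict;
use warnings;

my %out;                                   # cache of open output filehandles
for my $in (glob 'irradu00.g*.bz2') {
    (my $gen = $in) =~ s/^irradu00\.//;    # strip prefix, as in the bash script
    $gen =~ s/\.bz2$//;                    # strip suffix
    open my $bz, '-|', 'bzcat', $in or die "bzcat $in: $!";
    while (my $line = <$bz>) {
        my ($w1, $w2) = split ' ', $line;  # first two "words" of the record
        my $name = "$w1.$w2.$gen";         # assumed output naming
        unless ($out{$name}) {
            open $out{$name}, '>>', $name or die "open $name: $!";
        }
        print { $out{$name} } $line;
    }
    close $bz;
    close $_ for values %out;              # the change mentioned above:
    %out = ();                             # close cached files per generation
}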
Rob,
On Thu, Jan 31, 2013 at 03:14:09PM +0100, you wrote the following:
> On 31 January 2013 14:38, Philipp Kern wrote:
> > Also you should be able to do between 100MB/s to 1GB/s on 10 GE, which is
> My rule of thumb is that pumping 100 MB/s or so through the Linux
> TCP/IP stack will burn a
On 31 January 2013 14:38, Philipp Kern wrote:
> Also you should be able to do between 100MB/s to 1GB/s on 10 GE, which is
My rule of thumb is that pumping 100 MB/s or so through the Linux
TCP/IP stack will burn a CPU, maybe half if he can use large packets.
So 1 GB/s takes 5-10 CPUs if the wire
John,
On Wed, Jan 30, 2013 at 08:24:04AM -0600, you wrote the following:
> Well, I know that downloading the 160 Gig uncompressed data takes
> about 8 hours on the 10 Gig/sec Ethernet connection. I then bzip2
> compress [...]
don't do that. If you just need a compression on such a high throu
Many thanks for that Perl code. I've taken it and will see how fast it is.
On Wed, Jan 30, 2013 at 7:55 AM, Malcolm Beattie wrote:
> John McKown writes:
> Perl (and Python) aren't simply interpreted. In the case of perl, it
> compiles the source into an internal op tree (rather like bytecode)
>
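As an illustration of the op tree being described (not part of the original
message), perl can dump it with the B::Concise compiler backend, e.g.:

perl -MO=Concise,-exec -e '($x = "a;b;c") =~ s/;/ /g'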
On Wed, Jan 30, 2013 at 11:56 PM, Chase, John wrote:
>> -----Original Message-----
>> From: Linux on 390 Port On Behalf Of John McKown
>>
>> Well, I know that downloading the 160 Gig uncompressed data takes about 8
>> hours on the 10 Gig/sec
>> Ethernet connection. I then bzip2 compress that to a
John McKown writes:
> This is more a curiosity question. I have written a bash script which
> reads a bzip2 compressed set of files. For each record in the file, it
> writes the record into a file name based on the first two "words" in
> the record and the "generation number" from the input file na
> -----Original Message-----
> From: Linux on 390 Port On Behalf Of John McKown
>
> Well, I know that downloading the 160 Gig uncompressed data takes about 8
> hours on the 10 Gig/sec
> Ethernet connection. I then bzip2 compress that to about 50 Meg. Which I
> binary upload back to z/OS
> for sa
Regarding i/o buffering, as Rob discusses
On 1/30/13 2:40 AM, Rob van der Heij wrote:
>
> If the input files have a lot of 'chunks' that go to the same output
> file, it might be fairly easy to gobble up the ones that go together
> and write them in a single go. Based on more heuristics, you may b
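A rough sketch of that batching idea in Perl (the 1 MB threshold and the
subroutine names are illustrative assumptions, not from Rob's message):
accumulate lines per output file in memory and append each buffer in a single
write:

use strict;
use warnings;

my %buf;                                   # pending output, keyed by file name
sub buffer_line {
    my ($name, $line) = @_;
    $buf{$name} .= $line;
    flush_file($name) if length($buf{$name}) >= 1_048_576;   # ~1 MB, tunable
}
sub flush_file {
    my ($name) = @_;
    open my $fh, '>>', $name or die "open $name: $!";
    print {$fh} delete $buf{$name};        # one append per chunk, not per record
    close $fh;
}
# at end of run: flush_file($_) for keys %buf;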
Well, I know that downloading the 160 Gig uncompressed data takes
about 8 hours on the 10 Gig/sec Ethernet connection. I then bzip2
compress that to about 50 Meg. Which I binary upload back to z/OS for
safety (since it's sitting on my Linux desktop) in just a few minutes.
But bzgrep can scan the co
And I should mention, as Shane G. suggested, for simple, big, text
processing stuff like this you'll get as good performance from perl as
from compiled code. There's a reason big text processing genomics stuff
like bioperl is written in perl
I'm not a python wizard, but I have to imagine tha
>>> On 1/30/2013 at 08:44 AM, John McKown wrote:
> But I may be forced into using C or C++ for
> speed. Too bad I'm not a very good C programmer.
Based on the number of buffer overflow exploits that have been discovered over
the years, I don't think very many people are.
Mark Post
--
On Thu, Jan 31st, 2013 at 12:44 AM, John McKown wrote:
> Thanks to all for the input! I _tried_ to run the script over night. I
> added an echo to tell me which input file I was working on. I came in
> this morning. It had been running from 14:00 to 06:30 (16 1/2 hours)
> and was still on the firs
Thanks to all for the input! I _tried_ to run the script over night. I
added an echo to tell me which input file I was working on. I came in
this morning. It had been running from 14:00 to 06:30 (16 1/2 hours)
and was still on the first input file. That ain't gonna cut it. Time
to rethink. Using a
From: Linux on 390 Port [LINUX-390@VM.MARIST.EDU] On Behalf Of John McKown
Sent: 29 January 2013 23:13
To: LINUX-390@VM.MARIST.EDU
Subject: Speed of BASH script vs. Python vs. Perl vs. compiled
This is more a curiosity question. I have written a bash script which
reads a bzip2 compressed set of files. For each recor
On 30 January 2013 05:17, Patrick Spinler wrote:
> Since no one else seems to be pointing this out, I can see at least one
> potential optimization:
>
> This will re-open the output file and seek to the end on every output.
> That's a factor of 2 or 3 more syscalls every time. (possibly plus the
>
Since no one else seems to be pointing this out, I can see at least one
potential optimization:
On 1/29/13 4:13 PM, John McKown wrote:
>
> If you're interested, the bash script looks like:
>
> #!/bin/bash
> for i in irradu00.g*.bz2;do
> gen=${i#irradu00.}; # remove prefix
> gen=${g
On Tue, Jan 29, 2013 at 8:55 PM, John McKown
wrote:
> Interesting. I've tried looking at R, but just can't get the time to read
the books I've bought.
I've been doing some analyses with R. It is *very* complex with lots
and lots of commands. I've found faculty who teach courses in which R
I don't think you're going to see much (if any) improvement, really. This
process is pretty much a simple filter, and you're mostly I/O bound, so C/C++
aren't really going to help much, and the amount of code you'll need to write
to simulate the parsing capabilities of any of the shell languages
Interesting. I've tried looking at R, but just can't get the time to read
the books I've bought.
On Jan 29, 2013 7:39 PM, "David Boyes" wrote:
> > Actually, it is the IRRADU00 reformatted RACF audit records from SMF.
> Can't
> > process the SMF itself easily on VM/CMS or Linux.
>
> I have a f
> Interesting. I've tried looking at R, but just can't get the time to read the
> books I've bought.
Another option for CMS or Linux might be the really ancient version of MACSYMA
that lives on the MVS CBT tape.
If you have access to a Fortran compiler, that beastie can eat structured
re
> Actually, it is the IRRADU00 reformatted RACF audit records from SMF. Can't
> process the SMF itself easily on VM/CMS or Linux.
I have a faint memory that someone took the SMF publication, extracted the
record layouts and created some data descriptions for the S statistical tool on
Linux. Don'
Actually, it is the IRRADU00 reformatted RACF audit records from SMF. Can't
process the SMF itself easily on VM/CMS or Linux.
On Jan 29, 2013 6:14 PM, "Michael Harding" wrote:
> Based on John's previous posts and the dataset names referenced in this,
> I'd say he's playing with the output from z/
Based on John's previous posts and the dataset names referenced in this,
I'd say he's playing with the output from z/OS' RACF database unload. If
his zlinux is VM-hosted, I'd be inclined to bring the files to VM first,
mung them up with CMS pipelines then transfer the output files to his
guest. O
Lifetime is as long as I work here and want to. This is not production. The
data is RACF audit information that I alone use to answer ad hoc
information requests. If pushed, I must recreate the report using z/OS
procedures. I do this on my Linux box to save z/OS cycles.
I may look at using ooREXX,
Shell is a write-only language. (that's an opinion) What I mean is,
maintaining "applications" written in shell, even BASH, is *hard*.
However ...
What is the extent and life of this script? If the purpose is to
wrap-up a number of other programs, then use a shell. I agree with
Jon. You're ca
Although I'm a fan of Python, I wouldn't rewrite your program. Bash itself
is an interpreted language like your other options of Python / Perl. You're
going to get near native (compiled program) speed by using that call to
bzcat for your decompression and then I like how your while loop is using
ba
This is more a curiosity question. I have written a bash script which
reads a bzip2 compressed set of files. For each record in the file, it
writes the record into a file name based on the first two "words" in
the record and the "generation number" from the input file name. Due to
the extreme size o