[Unicon-group] Efficient rearranging of bytes...

Steve Wampler Fri, 12 Dec 2003 10:33:17 -0800

After seeing the fun Art and Frank have been having (neat stuff!), I
thought I'd contribute a bit more code to this list...


A recent discussion on big- versus little-endianness on
comp.os.linux.misc led me to revise a Unicon program I
had written a while back that converts a file from big-endian
to little-endian long values.

The result is a program that efficiently (for Unicon, at least)
rearranges bytes in a file according to an arbitrary pattern - so
long as the pattern can be expressed in 256 bytes or less.  To
generalize things, I've created a class MapBytes and added it to
the Utils package available at:

    http://tapestry.tucson.az.us/unicon

I chose a class (the Utils package previously had a simple
procedure mapBytes()) because separate class instances can
describe different rearrangements simultaneously, and each
instance avoids the cost of repeatedly 'bootstrapping' a
transposition (byte rearrangements are really just transposition
ciphers) into an efficient internal representation.
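
For anyone who hasn't seen the idiom, here is a minimal sketch of how
a transposition can be written with the built-in map() function.
The sample strings are made up for illustration and are not taken
from the attached program.

```unicon
procedure main()
    # map(s1, s2, s3) returns a copy of s1 in which each character
    # that appears in s2 is replaced by the character at the same
    # position in s3.  With s1 as the output pattern and s2 as the
    # input pattern, the result is s3 with its characters rearranged.
    write(map("4321", "1234", "abcd"))    # writes dcba
    write(map("2143", "1234", "abcd"))    # writes badc
end
```

Note that map() requires s2 and s3 to be the same length, which is
why the data has to be processed one pattern-sized chunk at a time.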

The class also provides methods for transposing bytes
in arbitrary strings, so you can build other custom
transposition programs (including ones that handle
transpositions that cannot be expressed in 256 bytes or fewer).
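
As a rough usage sketch, calling the string methods directly might
look like the following.  It assumes the Utils package containing
MapBytes is installed and importable; the sample string is only an
illustration.

```unicon
import Utils

procedure main()
    local m
    m := MapBytes("1234", "4321")        # reverse every 4-byte chunk
    write(m.mapIn("abcdefgh"))           # writes dcbahgfe
    write(m.mapOut(m.mapIn("abcdefgh"))) # writes abcdefgh
end
```

Any trailing bytes that do not fill a whole chunk are copied through
unchanged.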

The heart of the code is a variant (with the recursion removed) of
the transposition approach described in Chapter 20 of the 3rd
edition of the Icon Programming Language book, with some safety
checks added to the bootstrapping code.
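
To give a feel for the bootstrapping step, here is a small sketch
that expands a 4-byte pattern into an 8-byte one.  It uses an
8-character stand-in string so the output is readable; the class
itself starts from all of &cset.  The local variable names simply
mirror the class fields.

```unicon
procedure main()
    local inM, outM, inMap, outMap
    inM  := "1234"
    outM := "4321"

    # Start from a string of distinct characters and trim it to a
    # whole number of short patterns.
    inMap := "abcdefgh"
    inMap := inMap[1+:((*inMap/*inM)*(*inM))]

    # Apply the short transposition to each chunk of inMap to get
    # the matching long output pattern.
    outMap := ""
    inMap ? {
        while outMap ||:= map(outM, inM, move(*inM))
        }

    write(outMap)                           # writes dcbahgfe
    write(map(outMap, inMap, "ABCDEFGH"))   # writes DCBAHGFE
end
```

One map() call with the long patterns now does the work of two calls
with the short ones; with 256-byte patterns the savings are larger.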

Both the main program and the MapBytes class source are attached
below, but you'll have to either modify both slightly or install
the Unilib packages from the above web site to actually use the
code.

If people have suggested improvements or comments, I'll be happy
to look at them!


-Steve
-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

#<p>
#  Rearrange bytes in a file to some pattern  (default is
#     to reverse bytes in every 4-byte chunk).
#</p>
#<p>
# <b>Author:</b> Steve Wampler (<i>[EMAIL PROTECTED]</i>)
#</p>
#<p>
#  This file is in the <i>public domain</i>.
#</p>

import Utils

#<p>
# This program rearranges the bytes in standard input, writing
#   the rearrangement to standard output.  Arguments are:
#</p>
#<p>
#<br>    --inmap=INMAP        # default INMAP  is 1234
#<br>    --outmap=OUTMAP      # default OUTMAP is 4321
#<br>    --blocksize=BSIZE    # default BSIZE  is 64Kbytes
#</p>
#<p>
#  BSIZE is the number of bytes to load from the input file on
#  each read.
#  Normally, the default BSIZE is sufficient - it's really
#  included as an option to allow experimentation.
#</p>
procedure main(args)

    helpMesg(zapPrefix(!args,"--help"))

    inMap  := zapPrefix(!args,"--inmap=")  | "1234"
    outMap := zapPrefix(!args,"--outmap=") | "4321"
    bSize  := zapPrefix(!args,"--blocksize=")

    MapBytes(inMap,outMap,bSize).mapInFile(&input, &output)

end

procedure helpMesg()
    writes(&errout, "Usage: mapBytes --inmap=INMAP --outmap=OUTMAP")
    write(&errout, " [--blocksize=BSIZE]")
    write(&errout, "\tCopies input to output, reordering bytes.")
    write(&errout, "")
    write(&errout, "  INMAP -- pattern for input byte sequence")
    write(&errout, " OUTMAP -- rearrangement of INMAP")
    write(&errout, "  BSIZE -- bytes to read at a time from standard input")
    write(&errout, "")
    write(&errout, "   INMAP defaults to \"1234\"")
    write(&errout, "  OUTMAP defaults to \"4321\"")
    write(&errout, "   BSIZE defaults to 64KBytes")
    stop()
end

#<p>
#  Efficient byte-level transpositions
#</p>
#<p>
# <b>Author:</b> Steve Wampler (<i>[EMAIL PROTECTED]</i>)
#</p>
#<p>
#  This file is in the <i>public domain</i>.
#</p>

package Utils

link "Class"

#<p>
# Efficiently rearrange bytes in a string.  This is a class so that
#    multiple simultaneous rearrangements can be carried out efficiently.
#    Use a different class instance for each different rearrangement.
#</p>
#<p>
# Handles any transposition that can be expressed in multiples of
#   256-bytes or less.
#</p>
class MapBytes: Class (inM, inMap, outM, outMap, blkSize)

    #<p>
    # Return a copy of s with the bytes rearranged 
    #</p>
    method mapIn(s)
        local ns
        ns := ""
        s ? {
            while ns ||:= map(outMap, inMap, move(*inMap))
            while ns ||:= map(outM,   inM,   move(*inM))
            ns ||:= tab(0)
            }
        return ns
    end

    #<p>
    # Reverse the byte rearrangement
    #</p>
    method mapOut(s)
        local ns
        ns := ""
        s ? {
            while ns ||:= map(inMap, outMap, move(*outMap))
            while ns ||:= map(inM,   outM,   move(*outM))
            ns ||:= tab(0)
            }
        return ns
    end

    #<p>
    # Copy one file to another with mapping (leaves files open).
    #</p>
    method mapInFile(inFile,    # Input file (already opened for reading)
                     outFile,   # Output file (already opened for writing)
                     blockSize  # If present, overrides class' blkSize
                    )
        blockSize := adjBlockSize(blockSize)
        while writes(outFile, mapIn(reads(inFile, blockSize)))
    end

    #<p>
    # Copy one file to another with reverse mapping (leaves files open)
    #</p>
    method mapOutFile(inFile,    # Input file (already opened for reading)
                      outFile,   # Output file (already opened for writing)
                      blockSize  # If present, overrides class' blkSize
                     )
        blockSize := adjBlockSize(blockSize)
        while writes(outFile, mapOut(reads(inFile, blockSize)))
    end

    #<p>
    #  Adjust the given blockSize for file reads() to a valid value.
    #  <i>Used internally.</i>
    #</p>
    method adjBlockSize(blockSize)
        /blockSize := blkSize;
        blockSize <:= *inM                       # Won't work with less!
        blockSize := (blockSize / *inM) * (*inM) # Must be multiple of *inM!
        return blockSize
    end

    #<p>
    #  Create a class instance suitable for transposing bytes in
    #  a string.  Efficiently handles large strings.
    #</p>
    initially(in,        # Arrangement of bytes found in input string
              out,       # Rearrangement of those bytes for output string
              blockSize  # Default size of file read/write operations (bytes).
                         # <i>Adjusted automatically to be a multiple
                         # of *in</i>.
             )
        inM  := \in  | "1"
        outM := \out | inM               # Default is identity mapping
        blkSize := 64*1024;              # Default is large blocks
        blkSize := adjBlockSize(blockSize)

        # bootStrap to longest possible transposition strings...
        inMap := string(&cset)
        inMap := inMap[1+:((*inMap/*inM)*(*inM))]
        outMap := ""
        inMap ? while(outMap ||:= map(outM, inM, move(*inM)))
end
