On Fri, Mar 21, 2014 at 2:35 PM, Philippe Ombredanne
<pombreda...@nexb.com> wrote:
> On Thu, Mar 20, 2014 at 9:50 PM, Marc-Antoine Ruel <mar...@chromium.org> 
> wrote:
[...]
>> - Encoding, I had to write a state machine to read the logs properly, see
>> the >110 lines of strace_process_quoted_arguments(). That's independent of
>> -x.
>
> Agreed this is an issue and that is one reason for getting a
> structured output. Let me make a separate post with a python snippet
> that may help you there.

Marc-Antoine:

In your case, the standard Python shlex module may be of some help.
It already has much of the logic needed to lex shell-like quoted arguments.
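
As a minimal sketch of what shlex does with strace-style arguments (the
sample string is made up for illustration and is not taken from a real
trace), a lexer configured to split on commas as well as whitespace
behaves like this:

```python
import shlex

# Hypothetical strace-style argument string, for illustration only.
sample = '"/bin/sh", ["sh", "-c", "echo hi, there"]'

lexer = shlex.shlex(sample, posix=True)
lexer.commenters = ''          # do not treat '#' as a comment marker
lexer.whitespace_split = True  # split tokens on whitespace characters only
lexer.whitespace += ','        # treat commas as argument separators too

tokens = list(lexer)
print(tokens)  # ['/bin/sh', '[sh', '-c', 'echo hi, there]']
```

The comma inside the quoted string is preserved, while the array brackets
stay glued to the first and last array elements, so a cleanup pass on '['
and ']' is still needed afterwards.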

The Python snippet after my signature demonstrates this approach. The
code is not copyrighted: I place it in the public domain, so feel free
to reuse it.

Cordially
-- 
Philippe Ombredanne


import re
import shlex

# catch things like , [/* 65 vars */]
VARS_COMMENT = re.compile(r', \[\/\* \d+ vars \*\/\]')

def decode_args(args):
    """
    Return a list of arguments from args string.

    Based on a classical strace output like::
        execve("/bin/bash", ["/bin/bash", "-c", "gcc -Wall -Wwrite-strings -g -O2   -o strace bjm.o ."...], [/* 24 vars */]) = 0
    The expected args string is something like::
        "/bin/bash", ["/bin/bash", "-c", "gcc -Wall -Wwrite-strings -g -O2   -o strace bjm.o ."...], [/* 24 vars */]
    And the returned list looks like::
        ['/bin/bash', '/bin/bash', '-c', 'gcc -Wall -Wwrite-strings -g -O2   -o strace bjm.o ....']
    'Inner' arguments could be further decoded using the same approach.
    """
    try:
        # First some cleanup on args
        # remove var comments
        cleaned = re.sub(VARS_COMMENT, '', args)
        # remove deleted info: this can happen in decoded file descriptors
        # read(0</tmp/sh-thd-1391680596 (deleted)>, "..."..., 61) = 61
        cleaned = cleaned.replace(' (deleted)>', '>')
        # Then lex
        lexed = shlex.shlex(cleaned, posix=True)
        lexed.commenters = ''
        # use comma and whitespace as args delimiters
        lexed.whitespace_split = True
        lexed.whitespace += ','
        decoded = list(lexed)
        # Then fix brackets: [ at beginning and ] at end of each arg
        # FIXME: should do it only on the first and last arg but not all args
        fixed = [arg.lstrip('[').rstrip(']') for arg in decoded]
    except ValueError as e:
        # shlex raises ValueError on malformed input such as an unclosed quote
        raise ValueError('Error while decoding args: %r: %s' % (args, e))
    return fixed
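
The docstring's remark that 'inner' arguments could be further decoded
can be sketched as follows. This is only an illustration: it takes the
last element of the returned list shown in the docstring and splits it
with shlex.split(). Since strace truncated that string, the resulting
command line is incomplete and its final token is just the leftover
dots.

```python
import shlex

# Last element of the returned list shown in the docstring example.
# The trailing "...." is the command's own "." followed by the "..."
# that strace appends when it truncates a string.
inner = 'gcc -Wall -Wwrite-strings -g -O2   -o strace bjm.o ....'

# shlex.split() applies POSIX shell quoting rules and collapses the
# repeated spaces between arguments.
inner_args = shlex.split(inner)
print(inner_args)
```

Running strace with a larger -s value avoids the truncation in the first
place, so the inner command can be recovered in full.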

_______________________________________________
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel
