On Fri, Mar 21, 2014 at 2:35 PM, Philippe Ombredanne <pombreda...@nexb.com> wrote:
> On Thu, Mar 20, 2014 at 9:50 PM, Marc-Antoine Ruel <mar...@chromium.org>
> wrote:
[...]
>> - Encoding, I had to write a state machine to read the logs properly, see
>> the >110 lines of strace_process_quoted_arguments(). That's independent of
>> -x.
>
> Agreed this is an issue and that is one reason for getting a
> structured output. Let me make a separate post with a python snippet
> that may help you there.
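To make the encoding issue quoted above concrete: inside a quoted string argument, strace is assumed here to print \n, \t, \r, \f, \v, \\ and \" as in C, other non-printable bytes as octal escapes such as \33, and hex escapes such as \x1b when run with -x. Below is a minimal sketch of turning the raw text between one pair of quotes back into bytes. The name unescape_strace is hypothetical, and the escape set is an assumption based on strace's usual output, not an exhaustive list. It should run on the raw text before any other unquoting pass, because a pass that has already reduced \\ to \ would make the decoder misread the character that follows.

```python
import re

# One-character escapes, following C conventions (assumed to match strace).
_SIMPLE_ESCAPES = {
    b'n': b'\n', b't': b'\t', b'r': b'\r', b'f': b'\f', b'v': b'\v',
    b'\\': b'\\', b'"': b'"',
}

# A backslash followed by xNN (hex, as with strace -x), 1 to 3 octal digits
# (assumed default for other non-printable bytes), or any other character.
_ESCAPE = re.compile(rb'\\(?:x([0-9a-fA-F]{2})|([0-7]{1,3})|(.))', re.DOTALL)


def _decode_escape(match):
    hex_digits, octal_digits, char = match.groups()
    if hex_digits is not None:
        return bytes([int(hex_digits, 16)])
    if octal_digits is not None:
        return bytes([int(octal_digits, 8)])
    # Unknown escapes are kept verbatim rather than guessed at.
    return _SIMPLE_ESCAPES.get(char, b'\\' + char)


def unescape_strace(quoted):
    """Hypothetical helper: decode the raw text found between the double
    quotes of one strace string argument into the bytes it stands for."""
    # Assumption: strace escapes every non-printable byte, so the quoted
    # text itself is plain ASCII.
    return _ESCAPE.sub(_decode_escape, quoted.encode('ascii'))


print(unescape_strace(r'\33[0m'))  # b'\x1b[0m'
```

Decoding to bytes rather than str keeps arbitrary binary payloads (for example from read or write calls) intact; callers can decode to text afterwards if they know the encoding.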
Marc-Antoine:
In your case, the standard Python shlex module may be of some help: it
already has much of the logic needed to lex shell-like quoted arguments.
Here is a Python snippet that demonstrates this approach. This code is not
copyrighted; I place it in the public domain if you ever fancy reusing it.

Cordially
--
Philippe Ombredanne


import re
import shlex

# catch things like , [/* 65 vars */]
VARS_COMMENT = re.compile(r', \[\/\* \d+ vars \*\/\]')


def decode_args(args):
    """
    Return a list of arguments from args string.

    Based on a classical strace output like::

        execve("/bin/bash", ["/bin/bash", "-c", "gcc -Wall -Wwrite-strings -g -O2 -o strace bjm.o ."...], [/* 24 vars */]) = 0

    The expected args string is something like::

        "/bin/bash", ["/bin/bash", "-c", "gcc -Wall -Wwrite-strings -g -O2 -o strace bjm.o ."...], [/* 24 vars */]

    And the returned list looks like::

        ['/bin/bash', '/bin/bash', '-c', 'gcc -Wall -Wwrite-strings -g -O2 -o strace bjm.o ....']

    'Inner' arguments could be further decoded using the same approach.
    """
    try:
        # First some cleanup on args
        # remove var comments
        cleaned = re.sub(VARS_COMMENT, '', args)
        # remove deleted info: this can happen in decoded file descriptors
        # read(0</tmp/sh-thd-1391680596 (deleted)>, "..."..., 61) = 61
        cleaned = cleaned.replace(' (deleted)>', '>')

        # Then lex
        lexed = shlex.shlex(cleaned, posix=True)
        lexed.commenters = ''
        # use comma and whitespace as args delimiters
        lexed.whitespace_split = True
        lexed.whitespace += ','
        decoded = list(lexed)

        # Then fix brackets: [ at beginning and ] at end of each arg
        # FIXME: should only do it on the first and last args, not on all args
        fixed = [arg.lstrip('[').rstrip(']') for arg in decoded]
    except ValueError as e:
        # shlex raises ValueError e.g. on an unclosed quotation
        raise ValueError('Error while decoding args: %r (%s)' % (args, e))
    return fixed

_______________________________________________
Strace-devel mailing list
Strace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/strace-devel
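The decode_args docstring notes that "inner" arguments could be decoded the same way. Below is a minimal sketch for the common sh -c CMD case. It assumes the input is the argument vector only, i.e. what remains after dropping the execve path that decode_args returns as its first element; split_shell_command is a hypothetical name. When strace has truncated a string (the "..."... marker), the command may end inside an open quote, in which case shlex raises ValueError, and the last word may be the truncation dots rather than a real argument.

```python
import shlex


def split_shell_command(argv):
    """Hypothetical helper: if argv runs a shell with -c CMD, return CMD
    lexed into words with shlex; otherwise return None.

    argv is the argument vector only, without the execve path that
    precedes it in strace output.
    """
    # Look for the first "-c" after the program name; the argument that
    # follows it is the command string the shell will run.
    for i, arg in enumerate(argv[1:-1], start=1):
        if arg == '-c':
            # POSIX mode splits CMD roughly as a shell would (no expansions).
            # May raise ValueError on an unclosed quote, e.g. a truncated string.
            return shlex.split(argv[i + 1])
    return None


decoded = ['/bin/bash', '/bin/bash', '-c',
           'gcc -Wall -Wwrite-strings -g -O2 -o strace bjm.o ....']
print(split_shell_command(decoded[1:]))
# ['gcc', '-Wall', '-Wwrite-strings', '-g', '-O2', '-o', 'strace', 'bjm.o', '....']
```

Scanning for "-c" rather than assuming it is argv[1] also covers invocations such as sh -e -c CMD.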