Sorry it took so long to respond, just getting back from the holidays. You all have given me much to think about. I've read all the messages through once, now I need to go trough them again and try to apply the ideas. I'll be posting other questions as I run into problems. BTW, Danny, best explanation of generators I've heard, well done and thank you.
regards, Richard On Thu, Dec 24, 2015 at 4:54 PM, Danny Yoo <d...@hashcollision.org> wrote: > > I think what I need to do would be analogous to (pardon if I'm using the > > wrong terminology, at this poing in the discussion I am officially out of > > my depth) sending the input stream to a buffer(s) until the ETX for that > > message comes in, shoot the buffer contents to the parser while accepting > > the next STX + message fragment into the buffer, or something analogous. > > Yes, I agree. It sounds like you have one process read the socket and > collect chunks of bytes delimited by the STX markers. It can then > send those chunks to the XML parser. > > > We can imagine one process that reads the socket and spits out a list > of byte chunks: > > chunks = readDelimitedChunks(socket) > > and another process that parses those chunks and does something with them: > > for chunk in chunks: > .... > > > It would be nice if we could organize the program like this. But one > problem is that chunks might not be finite! The socket might keep on > returning bytes. If it keeps returning bytes, we can't possibly > return a finite list of the chunked bytes. > > > What we really want is something like: > > chunkStream = readDelimitedChunks(socket) > for chunk in chunkStream: > .... > > where chunkStream is itself like a socket: it should be something that > we can repeatedly read from as if it were potentially infinite. > > > We can actually do this, and it isn't too bad. There's a mechanism in > Python called a generator that allows us to write function-like things > that consume streams of input and produce streams of output. Here's a > brief introduction to them. > > For example, here's a generator that knows how to produce an infinite > stream of numbers: > > ############## > def nums(): > n = 0 > while True: > yield n > n += 1 > ############## > > What distinguishes a generator from a regular function? The use of > "yield". A "yield" is like a return, but rather than completely > escape out of the function with the return value, this generator will > remember what it was doing at that time. Why? Because it can > *resume* itself when we try to get another value out of the generator. > > Let's try it out: > > ##################### > > >>> numStream = nums() > >>> numStream.next() > 0 > >>> numStream.next() > 1 > >>> numStream.next() > 2 > >>> numStream.next() > 3 > >>> numStream.next() > 4 > ##################### > > Every next() we call on a generator will restart it from where it left > off, until it reaches its next "yield". That's how we get this > generator to return an infinite sequence of things. > > > That's how we produce infinite sequences. And we can write another > generator that knows how to take a stream of numbers, and square each > one. > > ######################## > def squaring(stream): > for n in stream: > yield n > ######################## > > > Let's try it. > > > ######################## > > >>> numStream = nums() > >>> squaredNums = squaring(numStream) > >>> squaredNums.next() > 0 > >>> squaredNums.next() > 1 > >>> squaredNums.next() > 4 > >>> squaredNums.next() > 9 > >>> squaredNums.next() > 16 > ######################## > > > If you have experience with other programming languages, you may have > heard of the term "co-routine". What we're doing with this should be > reminiscent of coroutine-style programming. We have one generator > feeding input into the other, with program control bouncing back and > forth between the generators as necessary. > > > So that's a basic idea of generators. It lets us write processes that > can deal with and produce streams of data. In the context of sockets, > this is particularly helpful, because sockets can be considered a > stream of bytes. > > > Here's another toy example that's closer to the problem you're trying > to solve. Let's say that we're working on a program to alphabetize > the words of a sentence. Very useless, of course. :P We might pass > it in the input: > > this > is > a > test > of > the > emergency > broadcast > system > > and expect to get back the following sentence: > > hist > is > a > estt > fo > eht > ceeegmnry > aabcdorst > emssty > > We can imagine one process doing chunking, going from a sequence of > characters to a sequence of words: > > ########################################### > def extract_words(seq): > """Yield the words in a sequence of characters.""" > buffer = [] > for ch in seq: > if ch.isalpha(): > buffer.append(ch) > elif buffer: > yield ''.join(buffer) > del buffer[:] > # If we hit the end of the buffer, we still might > # need to yield one more result. > if buffer: > yield ''.join(buffer) > ########################################### > > > and a function that transforms words to their munged counterpart: > > ######################### > def transform(word): > """"Munges a word into its alphabetized form.""" > chars = list(word) > chars.sort() > return ''.join(chars) > ######################### > > This forms the major components of a program that can do the munging > on a file... or a socket! > > > Here's the complete example: > > > ############################################# > import sys > > def extract_words(seq): > """Yield the words in a sequence of characters.""" > buffer = [] > for ch in seq: > if ch.isalpha(): > buffer.append(ch) > elif buffer: > yield ''.join(buffer) > del buffer[:] > # If we hit the end of the buffer, we still might > # need to yield one more result. > if buffer: > yield ''.join(buffer) > > def transform(word): > """"Munges a word into its alphabetized form.""" > chars = list(word) > chars.sort() > return ''.join(chars) > > > def as_byte_seq(f): > """Return the bytes of the file-like object f as a > sequence.""" > while True: > ch = f.read(1) > if not ch: break > yield ch > > > if __name__ == '__main__': > for word in extract_words(as_byte_seq(sys.stdin)): > print(transform(word)) > ############################################ > > > > If you have questions, please feel free to ask. Good luck! > -- All internal models of the world are approximate. ~ Sebastian Thrun _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor