On Feb 16, 2011, at 6:54 AM, Ian Barber wrote:

> On Wed, Feb 16, 2011 at 2:30 AM, Chuck Remes <[email protected]> wrote:
>> So, it's trying to expand the buffer past 10MB and fails. The buffer starts
>> out at 512k (new default I set) and grows very rapidly. What kind of data is
>> 0mq putting on this internal socket that could grow to this size so quickly?
>>
>> I did track down the code in this component that appears to trigger it. It's
>> publishing as fast as it can to a PUB socket. The socket is allocated with
>> its default settings (HWM, etc.) so it should be able to grow to the size of
>> memory if the subscribers are too slow. This box has 12GB RAM and is nowhere
>> near its limit.
>>
>> Suggestions?
>>
>> cr
>
> Hi Chuck,
>
> I saw some similar stuff a while back due to two sockets with the same
> identity being connected at the same time (which shouldn't be able to happen,
> but did). Just as a thought - is there any chance something like that could be
> happening? At least then there might be a workaround.
Interesting idea. I do set all of my identities, but each one uses a string prepended with a random number (0 to 999_999_999), so collisions are extremely unlikely. I'll audit that code to make sure it isn't happening, though.

In any event, I *think* I may have resolved this particular issue. Let me explain in detail the entire setup, my hypothesis for what was occurring, and how I solved it.

Purpose:

The system is a set of distributed components forming a trading system "back tester" that allows simulation of trading ideas (strategies) against historical data. The historical data can be either tick-level data (hundreds/thousands of ticks per second) or bar data (tick data summarized at a 1-minute, 5-minute, X-minute level). To backtest a trading idea against data going back as far as 2002, the system must be able to run faster than "real time"; otherwise it would take 9 years to backtest a single strategy (i.e. 1 second of tick data would take 1 second of real time to process).

To accomplish faster-than-real-time operation, the distributed components use a simulated clock signal that is broadcast amongst the components. The signal is incremented in lock-step with the most recently published tick data (which contains a timestamp) so that component A has the same clock time as the other components. To make sure the clock signal advances correctly, each component must receive all tick timestamps and agree that they have the same one before they signal the clock component that it may advance the clock.

All socket handling uses the reactor pattern, so all sends/recvs are NOBLOCK and occur within a single thread. A component/process may have multiple threads, *each* with its own reactor. All cross-thread communication is accomplished via 0mq sockets, so there aren't any user-code mutexes.

Problem:

One of the components is the tick broadcaster. It receives requests for a specific contract_id and timestamp and then begins publishing every tick of that contract.
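In rough terms, each request handler amounts to something like the sketch below (a pyzmq-flavored illustration, not my actual code; ticks_for() is a stand-in for the real historical-data lookup, and the message format here is made up for the example):

```python
import zmq

def serve_request(contract_id, start_ts, pub, ticks_for):
    """Publish every tick of the requested contract, starting at the
    requested timestamp, onto the PUB socket.

    ticks_for(contract_id, start_ts) is a hypothetical stand-in for the
    real data source; it yields (timestamp, payload) pairs in order.
    """
    for ts, payload in ticks_for(contract_id, start_ts):
        # Encode and publish one tick. Subscribers here use a '' topic,
        # so every message is delivered to every subscriber.
        pub.send(b"%d %d %s" % (contract_id, ts, payload))
```

In the real component this runs inside a reactor callback, so the send is expected to return quickly and let the loop recycle.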
This component has 4 threads, each running its own reactor. Requests are load-balanced across all 4 using the XREQ/XREP load-balancing semantics through an intermediate QUEUE device. All pretty standard stuff.

Each reactor has a non-blocking client that subscribes to all tick data via a SUB socket with a subscription topic of '' (everything). It drops/closes all messages except for the last one read before EAGAIN and decodes it for its timestamp. It matches that timestamp against what it expects and either signals the clock component (another process) to advance, or it goes back to read more ticks to find the appropriate timestamp. The subscriber should be faster than the publisher because it *only* decodes the last message, whereas the publisher has to encode everything for transmission. Subscribers do way less work.

I have special logic for handling weekends. I deal with futures markets, so they close at 5pm on Friday afternoon and don't reopen again until around 4pm on Sunday. I need to have the clock automatically skip forward so I don't process all of those "dead" seconds on Friday night, Saturday, and Sunday morning.

I had a flaw in this skip logic that skipped the clock forward to Monday instead of Sunday. So, instead of having to encode and publish 1 minute of new tick data on Sunday, it was publishing 24 hours' worth of tick data (through Monday) in one tight loop before it would allow the clock to advance again. So the tick broadcaster was in a tight (blocking) loop encoding and publishing a few million ticks to 4 subscribers within the same process, as well as to a few other subscribers in other processes/components. This is exactly where it died.

Hypothesis:

As a rule, reactors aren't supposed to "block" except for very short periods of time to accomplish their work. Since each "reactor tick" could publish its usual 1 second (or 1 minute) of tick data in a few dozen milliseconds, this "blocking" really didn't last long.
When it had to publish 24 *hours'* worth of tick data, it was now blocking for tens of seconds before letting the reactor loop recycle and allowing all registered sockets to get serviced. Two possibilities:

1. The publisher pushed too much data to its socket. HWM was set to the default, so it had no limit other than memory (many GBs, which I could see from monitoring were not being consumed). One of the subscribers shares the same reactor loop as the publisher, so it didn't get a chance to read anything off of its SUB socket *until the PUB was done*. The commands for it backed up and overflowed the mailbox.

or

2. The other 5 subscribers (3 more in-process, 2 in other processes) were too slow in reading from the single publisher. Commands in their mailboxes backed up and overflowed the mailbox.

My money is on #1. It fits Martin Sustrik's explanation of how mailboxes are used for signaling between I/O threads and user threads. If none of them call a 0mq function for a while, the commands can back up. In my case, the publisher *was* calling zmq_send(), but one of the receivers was on the *same* thread and had no opportunity to call zmq_recv() until the publisher was done sending millions of messages.

The Fix:

I fixed it by finally detecting and repairing the "weekend clock skip" bug. It no longer tries to publish 24 hours' worth of ticks in one shot. The *most* it will try to publish at once is 1 minute. I let it run overnight with the fix and it didn't assert at all. <sigh of relief>

So it's not really "fixed" in the 0mq library; I just avoid the condition that overruns the OS buffers and causes the library to abort. Still, this seems like a use-case that 0mq should be able to handle without aborting. It appears to stem from its use of a socketpair, which has limits set by the OS. Perhaps the socketpair could be replaced with an internal in-memory queue of some nature to accomplish the same task. Then the queue size would be limited only by available heap space.
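For the curious, the repaired skip logic boils down to something like this (an illustrative Python sketch, not my actual code; the 17:00/16:00 session times are from the futures schedule described above, and the buggy version jumped an extra day to Monday):

```python
from datetime import timedelta

def next_session_open(now):
    """After the Friday 17:00 futures close, skip the simulated clock
    straight to the Sunday ~16:00 reopen.

    The bug was skipping to Monday instead, which forced 24 hours'
    worth of ticks to be encoded and published in one tight loop
    before the clock could advance again.
    """
    if now.weekday() == 4 and now.hour >= 17:  # Friday, at/after the close
        sunday = now + timedelta(days=2)       # Friday + 2 days = Sunday
        return sunday.replace(hour=16, minute=0, second=0, microsecond=0)
    return now  # market is open; no skip needed
```

With the skip landing on Sunday's reopen, the publisher never has more than about a minute of ticks to push per reactor tick, so the co-located SUB sockets get serviced in between.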
Thank you to all of those on the list who participated in this thread or emailed me off-list. I appreciated your ideas. It would have taken me far longer to resolve this problem without your suggestions. Many, many thanks.

cr
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
