Re: [zeromq-dev] How to define a global queue
I think the question you must ask yourself is how you have compared a practical implementation of your business requirements against different products. Your question suggests a fairly naive understanding of complex software systems; no one here can answer it unless you want consultants who can consider your business complexities. Forgive my frankness. On 22 Dec 2015 3:32 pm, "Louis Hust" wrote: > > Hi, all > > We are designing distributed software and we'd like to have a global queue: many producer programs can send messages to the global queue, and many consumer programs can receive messages from the global queue. > Each message in the global queue can be consumed by every consumer independently, not in a load-balanced way. > > RabbitMQ can satisfy our need, but we want to compare more MQs, and ZMQ stood out to us for its performance. > > But we found that ZMQ is not standalone software; instead it is a library. > > Does ZMQ fit this situation? Can anyone give me some idea about it? > > I am a newbie to ZMQ, any idea will be appreciated!
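Since libzmq is a library rather than a broker, the usual answer to "where is the queue process?" is to write the small broker yourself. Below is a minimal sketch (the TCP ports are assumptions, not anything from this thread) of such a "global queue" built from ZeroMQ's XSUB/XPUB proxy: producer programs connect PUB sockets to the frontend, consumer programs connect SUB sockets to the backend, and every consumer receives every message, i.e. fan-out rather than load-balancing.

/* global_queue.c: minimal fan-out broker sketch (assumed ports) */
#include <zmq.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();

    void *frontend = zmq_socket (ctx, ZMQ_XSUB);   /* producers connect PUB here */
    zmq_bind (frontend, "tcp://*:5555");

    void *backend = zmq_socket (ctx, ZMQ_XPUB);    /* consumers connect SUB here */
    zmq_bind (backend, "tcp://*:5556");

    zmq_proxy (frontend, backend, NULL);           /* blocks, forwarding messages and
                                                      subscriptions in both directions */
    zmq_close (frontend);
    zmq_close (backend);
    zmq_ctx_term (ctx);
    return 0;
}

A producer is then just a PUB socket connected to port 5555; each consumer connects a SUB socket to port 5556 and subscribes to the prefixes it wants. Unlike RabbitMQ, this gives you no persistence: messages held in the broker's memory die with the broker.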
Re: [zeromq-dev] Reliable pub sub on c++ linux
That doesn't make sense. Did you set the HWM on both sides of the connection? On 28 Apr 2015 17:26, "Peter Krey" wrote: > This fixed it so that the pub can hold up to a few seconds of throughput > in memory: > > int hwm = 900; > publisher.setsockopt( ZMQ_SNDHWM, &hwm, sizeof (hwm)); > > The documentation said that an HWM of zero would never drop; I think what > happened was that memory couldn't be allocated fast enough. My test app is > sending a few million msgs/sec on the publisher. > > On Mon, Apr 27, 2015 at 1:08 PM, Peter Krey wrote: > >> I have the HWM set to zero on recv and pub. I am keeping track of sequence >> numbers received on the SUB socket which are sent out by the PUB socket. Here >> is an example output. >> >> The PUB socket is publishing a uint64_t seqNumber. If I change the socket >> types to PAIR, no seqNumbers are ever missed. >> >> seqNumber missed 2301000 >> seqNumber missed 2303206 >> seqNumber missed 2305000 >> seqNumber missed 2306820 >> seqNumber missed 2309353 >> seqNumber missed 2311575 >> seqNumber missed 2314514 >> seqNumber missed 2316767 >> seqNumber missed 2318000 >> seqNumber missed 2319924 >> seqNumber missed 2321730 >> seqNumber missed 2323618 >> seqNumber missed 2325000 >> seqNumber missed 2326963 >> seqNumber missed 2329000 >> seqNumber missed 2330664 >> seqNumber missed 2333000 >> seqNumber missed 2334997 >> seqNumber missed 2336000 >> seqNumber missed 2338000 >> seqNumber missed 234 >> seqNumber missed 2343000 >> seqNumber missed 2344933 >> seqNumber missed 2346401 >> seqNumber missed 2349000 >> seqNumber missed 2351000 >> seqNumber missed 2352309 >> seqNumber missed 2354198 >> seqNumber missed 2356000 >> seqNumber missed 2357645 >> >> On Mon, Apr 27, 2015 at 12:56 PM, Pieter Hintjens wrote: >> >>> You can increase the HWM on sender and receiver to match your >>> expectations. >>> >>> If you set the HWM to zero there will never be any message loss, which >>> also means your publisher will explode if the subscriber stops >>> reading. >>> >>> On Mon, Apr 27, 2015 at 9:03 PM, Peter Krey wrote: >>> > Hi, >>> > >>> > What is the best way to get guaranteed in-order delivery over the pub-sub >>> > framework in zmq using C++ on Linux? >>> > >>> > I have a test server and client running zmq PUB and SUB sockets. The PUB >>> > pushes sequence numbers as fast as possible in a tight loop. The SUB socket >>> > misses around one in every 10k messages. >>> > >>> > Thanks
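For later readers, here is a minimal C sketch (endpoints assumed) of what "set the HWM on both sides" means. Two details the thread turns on: the option value must be an int, and it must be set before bind/connect so it applies to the pipes those calls create.

/* hwm_both_sides.c: sketch only; the endpoints and the value 900 are assumptions */
#include <zmq.h>
#include <assert.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    int hwm = 900;      /* roughly "a few seconds of throughput" per peer */
    int rc;

    void *pub = zmq_socket (ctx, ZMQ_PUB);
    rc = zmq_setsockopt (pub, ZMQ_SNDHWM, &hwm, sizeof (hwm));
    assert (rc == 0);                             /* set BEFORE bind */
    zmq_bind (pub, "tcp://*:5556");

    void *sub = zmq_socket (ctx, ZMQ_SUB);
    rc = zmq_setsockopt (sub, ZMQ_RCVHWM, &hwm, sizeof (hwm));
    assert (rc == 0);                             /* set BEFORE connect */
    zmq_setsockopt (sub, ZMQ_SUBSCRIBE, "", 0);   /* receive everything */
    zmq_connect (sub, "tcp://localhost:5556");

    /* ... send/recv loops elided ... */
    zmq_close (sub);
    zmq_close (pub);
    zmq_ctx_term (ctx);
    return 0;
}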
Re: [zeromq-dev] Reliable pub sub on c++ linux
Do you have the code where you set the HWMs? On 27 Apr 2015 21:08, "Peter Krey" wrote: > I have the HWM set to zero on recv and pub. I am keeping track of sequence > numbers received on the SUB socket which are sent out by the PUB socket. Here > is an example output. > > The PUB socket is publishing a uint64_t seqNumber. If I change the socket > types to PAIR, no seqNumbers are ever missed. > > seqNumber missed 2301000 > seqNumber missed 2303206 > seqNumber missed 2305000 > seqNumber missed 2306820 > seqNumber missed 2309353 > seqNumber missed 2311575 > seqNumber missed 2314514 > seqNumber missed 2316767 > seqNumber missed 2318000 > seqNumber missed 2319924 > seqNumber missed 2321730 > seqNumber missed 2323618 > seqNumber missed 2325000 > seqNumber missed 2326963 > seqNumber missed 2329000 > seqNumber missed 2330664 > seqNumber missed 2333000 > seqNumber missed 2334997 > seqNumber missed 2336000 > seqNumber missed 2338000 > seqNumber missed 234 > seqNumber missed 2343000 > seqNumber missed 2344933 > seqNumber missed 2346401 > seqNumber missed 2349000 > seqNumber missed 2351000 > seqNumber missed 2352309 > seqNumber missed 2354198 > seqNumber missed 2356000 > seqNumber missed 2357645 > > On Mon, Apr 27, 2015 at 12:56 PM, Pieter Hintjens wrote: > >> You can increase the HWM on sender and receiver to match your >> expectations. >> >> If you set the HWM to zero there will never be any message loss, which >> also means your publisher will explode if the subscriber stops >> reading. >> >> On Mon, Apr 27, 2015 at 9:03 PM, Peter Krey wrote: >> > Hi, >> > >> > What is the best way to get guaranteed in-order delivery over the pub-sub >> > framework in zmq using C++ on Linux? >> > >> > I have a test server and client running zmq PUB and SUB sockets. The PUB >> > pushes sequence numbers as fast as possible in a tight loop. The SUB socket >> > misses around one in every 10k messages. >> > >> > Thanks
Re: [zeromq-dev] SOLUTION-- Encryption failure problem and wireless connectivity
Do you see the same for any other TCP traffic? On 15 Dec 2014 19:40, "Steve Murphy" wrote: > Pieter-- > > I'm sorry if I gave the wrong impression. I didn't make it clear that my app was running on over a dozen different hosts, all spread across the internet. I was also trying to run it on my home machine, and found I couldn't get a CURVE connection to any other box from just my local machine. I don't think that CURVE affects the network at all. My theory is that my wireless connection was so lousy at the moment (it varies with time and, apparently, weather conditions), quite apart from CURVE, that CURVE couldn't get a word in edgewise and the handshake couldn't complete the startup protocol. The same software, running on a machine with a more solid network connection, yielded successful CURVE encryption and communication. > > At least, that's my theory as to why it wouldn't work on my machine. > > I suspect that, had I run the program some hours/days earlier or later, I wouldn't have had any problems. Such are the transient vagaries of wireless, temperature, weather, auroras, sunspots, and maybe even the phase of the moon. > > murf > > On Mon, Dec 15, 2014 at 5:10 AM, Pieter Hintjens wrote: >> There's no theory where CURVE encryption can affect network performance. So anything you're seeing which suggests this is coincidence. The only plausible interference I could imagine is heavy CPU cost on a node causing it to be slightly slower, yet this should make the network happier, not sadder. >> >> If you have any theory how encryption could affect network reliability, I'd like to hear it. >> >> On Sun, Dec 14, 2014 at 2:47 AM, Steve Murphy wrote: >> > Pieter-- >> > >> > Could you elaborate a little on the coincidence? I, and maybe others, could benefit by your thoughts, I believe! >> > >> > murf >> > >> > On Thu, Dec 11, 2014 at 10:45 AM, Pieter Hintjens wrote: >> >> Since CurveZMQ runs over TCP, and the encryption is entirely abstracted from the network, this is probably coincidence. >> >> >> >> On Thu, Dec 11, 2014 at 4:17 PM, Steve Murphy wrote: >> >> > Hello, fellow zeromq devs! >> >> > >> >> > Some months ago, I posted a problem I was having that was quite vexing. Since then, I figured it out, and thought I should share before it completely gets forgotten. >> >> > >> >> > The problem appeared, at first blush, to be an incompatibility between Ubuntu and CentOS. My home node is running Ubuntu, and all my other nodes were mostly CentOS. All the CentOS nodes were behaving normally, with CURVE encryption between them. But on my home Ubuntu machine, the same code would not establish an encrypted connection. >> >> > >> >> > At last, after wiresharking the back-and-forth protocol of CURVE encryption, I saw that the protocol seemed to get to a certain stage, and then just quit. I delved deeper and deeper into the code underneath, and still, no particular failure point! >> >> > >> >> > Then it hit me: my home is connected to the internet via a wireless connection. Could it be my connection? I ran mtr between my home machine and the other CentOS machines, and sure enough, I was seeing 50% packet loss! I had not noticed any performance drop in my connection; no slowdowns. Normally mtr between my home and the internet is pretty clean, but that week it was a bit shaky. >> >> > >> >> > I moved the testing off my machine, and no problem. >> >> > >> >> > So, I think I may have found a packet loss percentage at which CURVE encryption will no longer operate (but unencrypted connections will); but, to be fair, the connection is via Motorola Canopy hardware, and the other end of the link is somewhere near 6 miles away. Packet losses in that environment could get somewhat selective as to size or timing. >> >> > >> >> > Just a heads-up to the other newbies on this mailing list, of a possible pitfall, and how to detect it. >> >> > >> >> > murf >> > >> > -- >> > Steve Murphy >> > ParseTree Corporation >> > 57 Lane 17 >> > Cody, WY 82414 >> > ✉ murf at parsetree dot com >> > ☎ 307-899-5535 > > -- > Steve Murphy > ParseTree Corporation > 57 Lane 17 > Cody, WY 82414 > ✉ murf at parsetree dot com > ☎ 307-899-5535
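A practical way to confirm this kind of failure without wireshark is ZeroMQ's socket monitor: on a link too lossy for the handshake, the monitor shows TCP connects that are never followed by useful traffic, just disconnect/reconnect cycles. A hedged sketch follows; the endpoint names are made up, and the two-frame event layout is the libzmq 4.x monitor protocol (16-bit event id plus 32-bit value in frame one, endpoint string in frame two).

/* monitor_curve.c: diagnostic sketch, assumed endpoints */
#include <zmq.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *client = zmq_socket (ctx, ZMQ_DEALER);
    /* ... CURVE client keys would be set on 'client' here ... */

    zmq_socket_monitor (client, "inproc://monitor", ZMQ_EVENT_ALL);
    void *mon = zmq_socket (ctx, ZMQ_PAIR);
    zmq_connect (mon, "inproc://monitor");

    zmq_connect (client, "tcp://server.example:9000");

    while (1) {
        zmq_msg_t frame;
        zmq_msg_init (&frame);
        zmq_msg_recv (&frame, mon, 0);            /* frame 1: event id + value */
        uint16_t event;
        memcpy (&event, zmq_msg_data (&frame), sizeof (event));
        zmq_msg_close (&frame);

        zmq_msg_init (&frame);
        zmq_msg_recv (&frame, mon, 0);            /* frame 2: endpoint address */
        zmq_msg_close (&frame);

        if (event == ZMQ_EVENT_CONNECTED)
            printf ("TCP up; security handshake starting\n");
        else if (event == ZMQ_EVENT_DISCONNECTED)
            printf ("peer gone; handshake may have failed\n");
    }
}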
Re: [zeromq-dev] Process eating 100 % of one core
How long does the CPU use last when it does happen? Or does it stay at 100% till restart? On 7 Nov 2014 10:21, "Emmanuel TAUREL" wrote: > Hello all, > > We are using ZMQ (still release 3.2.4) mainly on Linux boxes, with the PUB/SUB model. > Our system runs 24/7. From time to time, we have some of our PUB processes eating 100 % of one core of our CPUs. > We don't know yet what exactly triggers this phenomenon and therefore we are not able to reproduce it. It does not happen often (once every 3/6 months!!) > Nevertheless, we did some analysis last time it happened. > > Here are the results of "strace" on the PUB process: > > 2889 10:53:18.021013 epoll_wait(19, {{EPOLLERR|EPOLLHUP, {u32=335547808, u64=140097873776032}}}, 256, 4294967295) = 1 > 2889 10:53:18.021041 epoll_ctl(19, EPOLL_CTL_MOD, 49, {0, {u32=335547808, u64=140097873776032}}) = 0 > 2889 10:53:18.021068 epoll_wait(19, {{EPOLLERR|EPOLLHUP, {u32=335547808, u64=140097873776032}}}, 256, 4294967295) = 1 > 2889 10:53:18.021096 epoll_ctl(19, EPOLL_CTL_MOD, 49, {0, {u32=335547808, u64=140097873776032}}) = 0 > 2889 10:53:18.021123 epoll_wait(19, {{EPOLLERR|EPOLLHUP, {u32=335547808, u64=140097873776032}}}, 256, 4294967295) = 1 > 2889 10:53:18.021151 epoll_ctl(19, EPOLL_CTL_MOD, 49, {0, {u32=335547808, u64=140097873776032}}) = 0 > 2889 10:53:18.021178 epoll_wait(19, {{EPOLLERR|EPOLLHUP, {u32=335547808, u64=140097873776032}}}, 256, 4294967295) = 1 > 2889 10:53:18.021206 epoll_ctl(19, EPOLL_CTL_MOD, 49, {0, {u32=335547808, u64=140097873776032}}) = 0 > 2889 10:53:18.021233 epoll_wait(19, {{EPOLLERR|EPOLLHUP, {u32=335547808, u64=140097873776032}}}, 256, 4294967295) = 1 > 2889 10:53:18.021260 epoll_ctl(19, EPOLL_CTL_MOD, 49, {0, {u32=335547808, u64=140097873776032}}) = 0 > 2889 10:53:18.021288 epoll_wait(19, {{EPOLLERR|EPOLLHUP, {u32=335547808, u64=140097873776032}}}, 256, 4294967295) = 1 > > From the number of epoll_wait()/epoll_ctl() pairs and their period (two pairs in 100 us), it is clear that this is the thread which eats the CPU. > From the flags returned by epoll_wait() (EPOLLERR|EPOLLHUP), it seems that something wrong has happened on one of the file descriptors (number 49, if I look at the epoll_ctl() argument). It is confirmed by the result of "lsof" on the same PUB process: > > Starter 2863 dserver 49u sock 0,6 0t0 7902 can't identify protocol > > If I take control of the PUB process with gdb and request this thread's stack trace, I get: > > #0 0x7fb65d3205ca in epoll_ctl () from /lib/x86_64-linux-gnu/libc.so.6 > #1 0x7fb65e23c298 in zmq::epoll_t::reset_pollin (this=, handle_=) at epoll.cpp:101 > #2 0x7fb65e253da1 in zmq::stream_engine_t::in_event (this=0x7fb6509d8c10) at stream_engine.cpp:216 > #3 0x7fb65e23c46b in zmq::epoll_t::loop (this=0x7fb6611c5b70) at epoll.cpp:154 > #4 0x7fb65e257de6 in thread_routine (arg_=0x7fb6611c5be0) at thread.cpp:83 > #5 0x7fb65de0d0a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 > #6 0x7fb65d32004d in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > Even if something wrong has happened on the socket associated with fd 49, I think ZMQ should not enter a "crazy" loop. > Is it a known issue? > Is there something we could do to prevent this from happening? > > Thanks in advance for your help > > Emmanuel
Re: [zeromq-dev] zeromq, abort(), and high reliability environments
How about not sending an ack to your users until the unit of work they input has cleared the pipeline? That way the input application can decide what to do. Obviously this depends on your application... On 9 Aug 2014 03:12, "Dylan Cali" wrote: > Hey guys, > > What is the right way to use zeromq in high-reliability environments? In certain insane/impossible situations (e.g. out of memory, out of file descriptors, etc.) libzmq assertions will fail and it will abort. > > I came across a thread by Martin where he addresses a similar situation [1]. If I'm reading his argument correctly, the gist in general is: if it's impossible to connect due to some error, then you're dead in the water anyway. Crash loudly and immediately with the error (the fail-fast paradigm), fix the error, and then restart the process. > > I actually agree with this philosophy, but a user would say, "You terminated my entire application stack and didn't give me a chance to clean up! I had very important data in memory and it's gone!" This is especially the case with Java programmers, who Always Expect an Exception. > > For example, in the case of being out of file descriptors, the jzmq bindings will abort, but a Java programmer would expect to get an Exception with the "Too Many Open Files" error. > > I guess one possible retort is: if the data in memory was so important, why didn't you have redundancy/failover/some kind of playback log? Why did you put all your eggs in one basket, assuming your process would never crash? > > Is that the right answer here (basically, blame the user for not having disaster recovery), or is there a different/better way to address the high-reliability scenario? > > I came across another thread where Martin gets this very complaint (zeromq aborted my application!) and basically says: well, if you really, really want to, you can install a signal handler for SIGABRT, but caveat emptor [2]. > > To me, this is playing with fire, dangerous, and just a Bad Idea. But maybe it's worth the risk in high-reliability environments? > > Thanks in advance for any advice or thoughts. > > [1] http://lists.zeromq.org/pipermail/zeromq-dev/2009-May/000784.html > [2] http://lists.zeromq.org/pipermail/zeromq-dev/2011-October/013608.html
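A hedged sketch of the end-to-end ack idea suggested above (the endpoint is assumed): the producer reports success to its user only after the far end of the pipeline confirms the unit of work, so a crash anywhere in between surfaces as a missing ack that the input side can retry or fail loudly on. REQ/REP is the simplest shape of this.

/* acked_producer.c: sketch only, assumed endpoint */
#include <zmq.h>
#include <assert.h>
#include <string.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *req = zmq_socket (ctx, ZMQ_REQ);
    zmq_connect (req, "tcp://localhost:5557");

    const char *work = "unit-of-work";
    zmq_send (req, work, strlen (work), 0);

    char ack [16];
    int n = zmq_recv (req, ack, sizeof (ack), 0);   /* blocks until confirmed */
    assert (n >= 0);
    /* only now acknowledge the user's input as durable */

    zmq_close (req);
    zmq_ctx_term (ctx);
    return 0;
}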
Re: [zeromq-dev] question about sub/pub speed and capability
Given the specificity of your requirement, would it not be easier to implement a pub/sub on your own hardware and network and measure how long it takes, with payloads realistically sized to match what you are sending? You can't escape testing it yourself anyway. Benchmark apps are included in the zmq source code. Also be aware that the high water marks default to 1000, so if you burst more than 1k sends you can lose messages with pub/sub. From: Johnny Lee / Sent: Monday, 14 July 2014 15:00 / To: zeromq-dev@lists.zeromq.org / Reply To: ZeroMQ development list / Subject: [zeromq-dev] question about sub/pub speed and capability: General question about sub/pub capabilities: our program needs to quickly send a message to possibly hundreds of receivers. Let's say we want 1 publisher publishing a single, simple message to 400 subscribers within 5 seconds. As I understand it, the ZMQ protocol really has the subscribers "pull" the message as opposed to a single sender "pushing out" hundreds of messages to the different receivers; how fast can all of the receivers/subscribers get the message? Would this depend on the "horsepower" of the workstations? On how robust the network infrastructure is? What would be the limitations that would slow down the message transmission? Please let me know if you need clarifying details. Thank you.
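One way to run that measurement, sketched below under assumptions (made-up endpoint, and clocks reasonably synchronised across machines): put the publisher's clock in the payload and let each of the 400 subscribers compute its own delivery delay on receipt.

/* fanout_latency_pub.c: sketch only, assumed endpoint */
#include <zmq.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

static int64_t now_us (void)
{
    struct timespec ts;
    clock_gettime (CLOCK_REALTIME, &ts);
    return (int64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *pub = zmq_socket (ctx, ZMQ_PUB);
    zmq_bind (pub, "tcp://*:5558");

    sleep (1);                     /* crude allowance for slow joiners */

    int64_t sent_at = now_us ();
    zmq_send (pub, &sent_at, sizeof (sent_at), 0);
    /* each subscriber computes now_us() - sent_at when the message arrives */

    zmq_close (pub);
    zmq_ctx_term (ctx);
    return 0;
}

The local_thr/remote_thr perf tools shipped in the libzmq source tree answer the raw throughput half of the question; this sketch targets the fan-out latency half.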
Re: [zeromq-dev] PUB/SUB unreliability
Thanks. There was also an error in my error handling, which is why it was never flagged; I imagine it's the same in my app code. The uint64_t came from the CLI argument-handling lib, which is why it was used over int. A lesson learned there. On 16 June 2014 19:13, Pieter Hintjens wrote: > And indeed, this code prints "-1" as the return code: > > void *context = zmq_ctx_new (); > void *publisher = zmq_socket (context, ZMQ_PUB); > uint64_t rhwm = 0; > int rc = zmq_setsockopt (publisher, ZMQ_SNDHWM, &rhwm, sizeof (rhwm)); > printf ("RC=%d\n", rc); > > -Pieter > > On Mon, Jun 16, 2014 at 8:03 PM, Pieter Hintjens wrote: > > Hmm, it does check the size of the passed argument, and if that's > > wrong, returns an error (which you do check for). > > > > On Mon, Jun 16, 2014 at 7:36 PM, Gerry Steele wrote: > >> Hi Pieter, you have struck on something there. > >> > >> Converting it to int seems to yield the correct behaviour. > >> > >> I guess with the way setsockopt works, type coercion doesn't happen. > >> > >> Embarrassing! But at least we got to the bottom of it. > >> > >> I was able to send billions of events without incurring loss. Apologies for taking everyone's time. > >> > >> Thanks all. > >> > >> g > >> > >> On 16 June 2014 18:22, Pieter Hintjens wrote: > >>> OK, just to double check, you're using ZeroMQ 4.0.x? In your test case > >>> (which I'm belatedly looking at), you use a uint64_t for the hwm > >>> values; it should be int. Probably not significant. > >>> > >>> On Mon, Jun 16, 2014 at 6:20 PM, Gerry Steele wrote: > >>> > In the parent email I have links to the minimal examples on gist.github.com > >>> > > >>> > Happy to open an issue and commit them later on if that's what you need. > >>> > > >>> > Thanks > >>> > > >>> > On 16 Jun 2014 14:43, "Pieter Hintjens" wrote: > >>> >> Gerry, can you provide a minimal test case that shows the behavior? Thanks. > >>> >> > >>> >> On Mon, Jun 16, 2014 at 12:49 PM, Gerry Steele wrote: > >>> >> > Thanks Pieter. I can't try this out till I get home, but it is looking like HWM overflows. > >>> >> > > >>> >> > If you run the utilities you notice the drops start happening after precisely 1000 events in the first instance (which is the default HWM). > >>> >> > > >>> >> > There was another, largely ignored, thread about this recently mentioning the same problem. > >>> >> > > >>> >> > I also tried setting the HWM values to a number greater than the number of events and it seemed to have no effect either. > >>> >> > > >>> >> > g > >>> >> > > >>> >> > On 16 Jun 2014 09:32, "Pieter Hintjens" wrote: > >>> >> >> On Mon, Jun 16, 2014 at 9:10 AM, Gerry Steele wrote: > >>> >> >> > >>> >> >> > Big chunks of messages go missing mid-flow and then pick up again. There is no literature that indicates that is expected behaviour. > >>> >> >> > >>> >> >> Right. The two plausible causes for this are (a) HWM overflows, and (b) temporary network disconnects. You have excluded (a), though to be paranoid I'd probably add some temporary logging to libzmq's pub socket to shout out if/when it does hit the HWM. To detect (b) you could use the socket monitoring. The third possibility is that you're doing something wrong with subscriptions... though that seems unlikely.
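For later readers, a minimal sketch of the failure mode this thread converged on: ZMQ_SNDHWM takes an int, so passing a uint64_t makes zmq_setsockopt fail with EINVAL, and an unchecked return code leaves the socket at the default HWM of 1000, which then shows up as silent message loss.

/* hwm_type_bug.c: demonstrates the rejected option size */
#include <zmq.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *pub = zmq_socket (ctx, ZMQ_PUB);

    uint64_t bad = 0;       /* wrong width for this option */
    if (zmq_setsockopt (pub, ZMQ_SNDHWM, &bad, sizeof (bad)) != 0)
        printf ("rejected: %s\n", zmq_strerror (errno));   /* "Invalid argument" */

    int good = 0;           /* 0 = no limit; see the caveats elsewhere in this thread */
    if (zmq_setsockopt (pub, ZMQ_SNDHWM, &good, sizeof (good)) != 0)
        printf ("unexpected: %s\n", zmq_strerror (errno));

    zmq_close (pub);
    zmq_ctx_term (ctx);
    return 0;
}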
Re: [zeromq-dev] PUB/SUB unreliability
Hi Pieter, you have struck on something there. Converting it to int seems to yield the correct behaviour. I guess with the way setsockopt works, type coercion doesn't happen. Embarrassing! But at least we got to the bottom of it. I was able to send billions of events without incurring loss. Apologies for taking everyone's time. Thanks all. g On 16 June 2014 18:22, Pieter Hintjens wrote: > OK, just to double check, you're using ZeroMQ 4.0.x? In your test case > (which I'm belatedly looking at), you use a uint64_t for the hwm > values; it should be int. Probably not significant. > > On Mon, Jun 16, 2014 at 6:20 PM, Gerry Steele wrote: > > In the parent email I have links to the minimal examples on gist.github.com > > > > Happy to open an issue and commit them later on if that's what you need. > > > > Thanks > > > > On 16 Jun 2014 14:43, "Pieter Hintjens" wrote: > >> Gerry, can you provide a minimal test case that shows the behavior? Thanks. > >> > >> On Mon, Jun 16, 2014 at 12:49 PM, Gerry Steele wrote: > >> > Thanks Pieter. I can't try this out till I get home, but it is looking like HWM overflows. > >> > > >> > If you run the utilities you notice the drops start happening after precisely 1000 events in the first instance (which is the default HWM). > >> > > >> > There was another, largely ignored, thread about this recently mentioning the same problem. > >> > > >> > I also tried setting the HWM values to a number greater than the number of events and it seemed to have no effect either. > >> > > >> > g > >> > > >> > On 16 Jun 2014 09:32, "Pieter Hintjens" wrote: > >> >> On Mon, Jun 16, 2014 at 9:10 AM, Gerry Steele wrote: > >> >> > >> >> > Big chunks of messages go missing mid-flow and then pick up again. There is no literature that indicates that is expected behaviour. > >> >> > >> >> Right. The two plausible causes for this are (a) HWM overflows, and (b) temporary network disconnects. You have excluded (a), though to be paranoid I'd probably add some temporary logging to libzmq's pub socket to shout out if/when it does hit the HWM. To detect (b) you could use the socket monitoring. The third possibility is that you're doing something wrong with subscriptions... though that seems unlikely. > >> >> > >> >> -Pieter -- Gerry Steele
Re: [zeromq-dev] PUB/SUB unreliability
In the parent email I have links to the minimal examples on gist.github.com. Happy to open an issue and commit them later on if that's what you need. Thanks On 16 Jun 2014 14:43, "Pieter Hintjens" wrote: > Gerry, can you provide a minimal test case that shows the behavior? Thanks. > > On Mon, Jun 16, 2014 at 12:49 PM, Gerry Steele wrote: > > Thanks Pieter. I can't try this out till I get home, but it is looking like > > HWM overflows. > > > > If you run the utilities you notice the drops start happening after > > precisely 1000 events in the first instance (which is the default HWM). > > > > There was another, largely ignored, thread about this recently mentioning the > > same problem. > > > > I also tried setting the HWM values to a number greater than the number of > > events and it seemed to have no effect either. > > > > g > > > > On 16 Jun 2014 09:32, "Pieter Hintjens" wrote: > >> > >> On Mon, Jun 16, 2014 at 9:10 AM, Gerry Steele > >> wrote: > >> > >> > Big chunks of messages go missing mid-flow and then pick up again. There > >> > is > >> > no literature that indicates that is expected behaviour. > >> > >> Right. The two plausible causes for this are (a) HWM overflows, and > >> (b) temporary network disconnects. You have excluded (a), though to be > >> paranoid I'd probably add some temporary logging to libzmq's pub > >> socket to shout out if/when it does hit the HWM. To detect (b) you > >> could use the socket monitoring. The third possibility is that you're > >> doing something wrong with subscriptions... though that seems > >> unlikely. > >> > >> -Pieter
Re: [zeromq-dev] PUB/SUB unreliability
Thanks Pieter. I can't try this out till I get home, but it is looking like HWM overflows. If you run the utilities you notice the drops start happening after precisely 1000 events in the first instance (which is the default HWM). There was another, largely ignored, thread about this recently mentioning the same problem. I also tried setting the HWM values to a number greater than the number of events and it seemed to have no effect either. g On 16 Jun 2014 09:32, "Pieter Hintjens" wrote: > On Mon, Jun 16, 2014 at 9:10 AM, Gerry Steele > wrote: > > > Big chunks of messages go missing mid-flow and then pick up again. There > is > > no literature that indicates that is expected behaviour. > > Right. The two plausible causes for this are (a) HWM overflows, and > (b) temporary network disconnects. You have excluded (a), though to be > paranoid I'd probably add some temporary logging to libzmq's pub > socket to shout out if/when it does hit the HWM. To detect (b) you > could use the socket monitoring. The third possibility is that you're > doing something wrong with subscriptions... though that seems > unlikely. > > -Pieter
Re: [zeromq-dev] PUB/SUB unreliability
The issue I'm seeing is not indicative of having no subscribers, and it has nothing to do with sending backed-up data to late subscribers; that is something pub/sub should never do, and in my testing I ruled it out. Big chunks of messages go missing mid-flow and then pick up again. There is no literature that indicates that is expected behaviour. On 16 Jun 2014 05:40, "Justin Karneges" wrote: > Pubsub is by definition unreliable, since messages are dropped if there are no subscribers. > > An argument could be made that ZeroMQ ought to support reliable reconnection for known subscribers, so that temporary disconnects between publisher and subscriber don't result in any lost messages. However, the key here is "temporary". If a subscriber remains disconnected for a very long time, then the question becomes how long the publisher should queue messages for a lost subscriber. And unless the answer is "for all time", then, well... you still have unreliability. > > So, because subscribers may or may not exist at the time of publish, and because you'll never have an infinite queue, it's best to just assume that pubsub isn't reliable. Build reliability around it. > > Some philosophy: > http://zguide.zeromq.org/page:all#Pros-and-Cons-of-Pub-Sub > > On 06/15/2014 04:43 AM, Gerry Steele wrote: > > Thanks Charles, that's pretty much my understanding too. Meaning this is > > a bug either in my implementation or in zeromq. > > > > I understand the implications of the slow-consumer problem, but the > > fundamental issue here is to establish trust in PUB/SUB. > > > > On 14 June 2014 21:09, Charles Remes wrote: > > > > Let's back up for a second. > > > > Take a look at the man page for zmq_setsockopt and read the section > > on ZMQ_SNDHWM. It clearly states that zero means "no limit." Second, > > it also states that when the socket reaches its exceptional state > > then it will either block or drop messages depending on socket type. > > > > Next, look at the man page for zmq_socket and check the ZMQ_PUB > > section. The socket will reach its mute state (its exceptional > > state) when it reaches its high water mark. When it's mute, it will > > drop messages. > > > > So, taking the two together, a socket with a ZMQ_SNDHWM of 0 > > should never drop messages because it will never reach its mute state. > > > > The one exception to this is when there are no SUB sockets connected > > to the PUB socket. When there are no connections, all messages are > > dropped (because no one is listening and there are no queues created). > > > > However, I highly recommend *against* setting HWM to 0 for a PUB > > socket. Here's why: > > > > 1. It gives you a false sense of security that all messages will be > > delivered. > > If the publishing process dies, any messages in queue go with it, so > > they'll never get delivered. > > > > 2. Your subscribers might be too slow. > > If your subscribers can't keep up with the message flow and the > > publisher starts queueing, it *will* run out of memory. You'll > > either exhaust the amount of memory allowed by your process, or your > > OS will start paging & swapping and you'll wish the process had just > > died. > > > > cr > > > > On Jun 13, 2014, at 5:34 PM, Gerry Steele wrote: > >> Hi Brian > >> > >> I noticed your comment on another thread about this and I think > >> you got it a bit wrong: > >> > >> > The high water mark is a hard limit on the maximum number of > >> outstanding messages ØMQ shall queue in memory for any single peer > >> that the specified *socket* is communicating with. *A value of zero > >> means no limit.* > >> > >> and from your link: > >> > >> > Since v3.x, ØMQ forces default limits on its internal buffers > >> (the so-called high-water mark or HWM), so publisher crashes are > >> rarer *unless you deliberately set the HWM to infinite.* > >> > >> Nothing I read indicates anything other than that no messages > >> should be dropped once connections are made. > >> > >> Thanks > >> G > >> > >> On
Re: [zeromq-dev] PUB/SUB unreliability
Thanks Charles, that's pretty much my understanding too. Meaning this is a bug either in my implementation or in zeromq. I understand the implications of the slow-consumer problem, but the fundamental issue here is to establish trust in PUB/SUB. On 14 June 2014 21:09, Charles Remes wrote: > Let's back up for a second. > > Take a look at the man page for zmq_setsockopt and read the section on > ZMQ_SNDHWM. It clearly states that zero means "no limit." Second, it also > states that when the socket reaches its exceptional state then it will > either block or drop messages depending on socket type. > > Next, look at the man page for zmq_socket and check the ZMQ_PUB section. > The socket will reach its mute state (its exceptional state) when it > reaches its high water mark. When it's mute, it will drop messages. > > So, taking the two together, a socket with a ZMQ_SNDHWM of 0 should > never drop messages because it will never reach its mute state. > > The one exception to this is when there are no SUB sockets connected to > the PUB socket. When there are no connections, all messages are dropped > (because no one is listening and there are no queues created). > > However, I highly recommend *against* setting HWM to 0 for a PUB socket. > Here's why: > > 1. It gives you a false sense of security that all messages will be > delivered. > If the publishing process dies, any messages in queue go with it, so > they'll never get delivered. > > 2. Your subscribers might be too slow. > If your subscribers can't keep up with the message flow and the publisher > starts queueing, it *will* run out of memory. You'll either exhaust the > amount of memory allowed by your process, or your OS will start paging & > swapping and you'll wish the process had just died. > > cr > > On Jun 13, 2014, at 5:34 PM, Gerry Steele wrote: > > Hi Brian > > I noticed your comment on another thread about this and I think you got it > a bit wrong: > > > The high water mark is a hard limit on the maximum number of > outstanding messages ØMQ shall queue in memory for any single peer that the > specified *socket* is communicating with. *A value of zero means no limit.* > > and from your link: > > > Since v3.x, ØMQ forces default limits on its internal buffers (the > so-called high-water mark or HWM), so publisher crashes are rarer *unless > you deliberately set the HWM to infinite.* > > Nothing I read indicates anything other than that no messages should be > dropped once connections are made. > > Thanks > G > > On 13 June 2014 23:17, Brian Knox wrote: > >> "From what I've read, PUB/SUB should be reliable when the *_HWM options are set to >> zero (don't drop). By reliable I mean no messages should fail to be >> delivered to an already connected consumer." >> >> Your understanding of pub-sub behavior and how it interacts with the HWM >> is incorrect. Please see: http://zguide.zeromq.org/php:chapter5 >> >> Brian >> >> On Fri, Jun 13, 2014 at 2:33 PM, Gerry Steele wrote: >> >>> I've read everything I can find, including the printed book, but I am at >>> a loss for a definitive statement of how PUB/SUB should behave in >>> zmq. >>> >>> A production system I'm using is experiencing message loss between >>> several nodes using PUB/SUB. >>> >>> From what I've read, PUB/SUB should be reliable when the *_HWM options are set to >>> zero (don't drop). By reliable I mean no messages should fail to be >>> delivered to an already connected consumer. >>> >>> I implemented some utilities to reproduce the message loss in my system: >>> >>> zmq_sub: https://gist.github.com/easytiger/992b3a29eb5c8545d289 >>> zmq_pub: https://gist.github.com/easytiger/e382502badab49856357 >>> >>> zmq_pub takes a number of events to send and the logging frequency, and >>> zmq_sub only takes the logging frequency. zmq_sub prints out the number of msgs >>> received vs the packet contents containing the integer packet count from >>> the publisher. >>> >>> It can be seen when sending events in a tight loop that messages simply >>> go missing midway through (the loss is not at the beginning or end, ruling out slow >>> joiners etc.) >>> >>> In a small loop it usually works ok: >>> >>> $ ./zmq_pub 2000 1000 >>> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1000 with >>> rc=58 >>> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 with rc=58
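For readers without access to the gists, this is the gap-detection technique those test utilities implement, reduced to a sketch (endpoint assumed): the publisher sends a monotonically increasing uint64 counter and the subscriber reports any jump, which is exactly how HWM drops become visible.

/* gap_detect_sub.c: sketch only, assumed endpoint */
#include <zmq.h>
#include <stdint.h>
#include <stdio.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *sub = zmq_socket (ctx, ZMQ_SUB);
    zmq_setsockopt (sub, ZMQ_SUBSCRIBE, "", 0);
    zmq_connect (sub, "tcp://localhost:5556");

    uint64_t expected = 0;
    while (1) {
        uint64_t seq;
        if (zmq_recv (sub, &seq, sizeof (seq), 0) != sizeof (seq))
            continue;                         /* skip malformed frames */
        if (expected != 0 && seq != expected)
            printf ("gap: expected %llu got %llu\n",
                    (unsigned long long) expected, (unsigned long long) seq);
        expected = seq + 1;
    }
}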
Re: [zeromq-dev] PUB/SUB unreliability
Hi Pieter, As per the code, binds and connects are done after ZMQ_SNDHWM and ZMQ_RCVHWM are set, in both the publisher and the subscriber. Thanks g On 14 June 2014 23:37, Pieter Hintjens wrote: > Are you setting the HWM to zero before doing any binds or connects, or after? > > Also, are you setting the HWM both at publisher and at subscriber, or at one side only? > > -Pieter > > On Fri, Jun 13, 2014 at 8:33 PM, Gerry Steele wrote: > > I've read everything I can find, including the printed book, but I am at a > > loss for a definitive statement of how PUB/SUB should behave in zmq. > > > > A production system I'm using is experiencing message loss between several > > nodes using PUB/SUB. > > > > From what I've read, PUB/SUB should be reliable when the *_HWM options are set to > > zero (don't drop). By reliable I mean no messages should fail to be > > delivered to an already connected consumer. > > > > I implemented some utilities to reproduce the message loss in my system: > > > > zmq_sub: https://gist.github.com/easytiger/992b3a29eb5c8545d289 > > zmq_pub: https://gist.github.com/easytiger/e382502badab49856357 > > > > zmq_pub takes a number of events to send and the logging frequency, and > > zmq_sub only takes the logging frequency. zmq_sub prints out the number of msgs > > received vs the packet contents containing the integer packet count from the > > publisher. > > > > It can be seen when sending events in a tight loop that messages simply go > > missing midway through (the loss is not at the beginning or end, ruling out slow > > joiners etc.) > > > > In a small loop it usually works ok: > > > > $ ./zmq_pub 2000 1000 > > sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1000 with rc=58 > > sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 with rc=58 > > > > $ ./zmq_sub 1 > > > > RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1 > > RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2 > > RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #3 > > RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #4 > > [...] > > RECV:2000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 > > > > You can see every message was sent as the counts align. > > > > However, increase the message counts and messages start going missing: > > > > $ ./zmq_pub 20 10 > > sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #10 with rc=60 > > sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #20 with rc=60 > > > > ./zmq_sub 1 > > RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #11000 > > RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #21000 > > RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #31610 > > RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #42000 > > RECV:5|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #52524 > > RECV:6|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #64654 > > RECV:7|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #77298 > > RECV:8|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #90117 > > RECV:9|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #102864 > > RECV:10|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #115846 > > RECV:11|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #129135 > > RECV:12|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #141606 > > RECV:13|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #154179 > > RECV:14|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #166627 > > RECV:15|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #179166 > > RECV:16|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #192247 > > > > Is this expected behaviour? With PUSH/PULL I get no loss at all with similar > > utilities. > > > > If I put more work between sends (e.g. cout each time) and the full message > > the results are better. > > > > zmq_push: https://gist.github.com/easytiger/2c4f806594ccfbc74f54 > > zmq_pull: https://gist.github.com/easytiger/268a630fd22f959fde93 > > > > Is there an issue/bug in my implementation that would cause this? > > > > Using zeromq 4.0.3 > > > > Many Thanks > > Gerry -- Gerry Steele
Re: [zeromq-dev] Universal Fast Sockets and ZeroMQ
From what I can tell, it is preloaded, so it intercepts socket calls, and if the calls are destined to go over loopback it uses its own protocol and IPC mechanism. http://grokbase.com/t/zeromq/zeromq-dev/13apd39pg8/jeromq-zeromq-transparent-acceleration-with-fast-sockets On 2 Jun 2014 20:31, "Jonathan Jekeli" wrote: > Earlier this year, I saw some posts from someone regarding Universal Fast > Sockets, saying that they greatly decreased the latency of zeromq ( > http://lists.zeromq.org/pipermail/zeromq-dev/2014-February/025452.html). > After doing some research, I tracked it back to TorusWare, a Spanish company > that advertised a gain of 2400% ( > http://torusware.com/increase-zeromq-performance-by-up-to-2400/). > However, I couldn't find any numbers outside of what they themselves > advertised, and no real documentation. I was just wondering if anyone out > there had tried Universal Fast Sockets or had any experience or insight > into them? > > Thanks, > > Jon
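For readers unfamiliar with the mechanism being described (presumably LD_PRELOAD interposition; that is an inference, not something TorusWare documents here): a preloaded shared library defines its own versions of libc's socket functions, which the dynamic linker resolves ahead of libc's, giving it a hook to reroute loopback traffic onto a faster transport. A purely illustrative pass-through sketch, not TorusWare's code:

/* interpose.c: build with gcc -shared -fPIC interpose.c -o interpose.so -ldl
   and run a program under LD_PRELOAD=./interpose.so. Illustrative only. */
#define _GNU_SOURCE
#include <dlfcn.h>

int socket (int domain, int type, int protocol)
{
    /* look up the real libc socket() behind us in link order */
    int (*real_socket) (int, int, int) =
        (int (*)(int, int, int)) dlsym (RTLD_NEXT, "socket");

    /* a real accelerator would also interpose connect()/bind() to learn the
       destination and substitute a fast-path descriptor for loopback
       traffic; this sketch just passes the call through */
    return real_socket (domain, type, protocol);
}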
Re: [zeromq-dev] PUB/SUB unreliability
Hi Brian, I noticed your comment on another thread about this and I think you got it a bit wrong: > The high water mark is a hard limit on the maximum number of outstanding messages ØMQ shall queue in memory for any single peer that the specified *socket* is communicating with. *A value of zero means no limit.* And from your link: > Since v3.x, ØMQ forces default limits on its internal buffers (the so-called high-water mark or HWM), so publisher crashes are rarer *unless you deliberately set the HWM to infinite.* Nothing I read indicates anything other than that no messages should be dropped once connections are made. Thanks G On 13 June 2014 23:17, Brian Knox wrote: > "From what I've read, PUB/SUB should be reliable when the *_HWM options are set to > zero (don't drop). By reliable I mean no messages should fail to be > delivered to an already connected consumer." > > Your understanding of pub-sub behavior and how it interacts with the HWM > is incorrect. Please see: http://zguide.zeromq.org/php:chapter5 > > Brian > > On Fri, Jun 13, 2014 at 2:33 PM, Gerry Steele wrote: > >> I've read everything I can find, including the printed book, but I am at a >> loss for a definitive statement of how PUB/SUB should behave in zmq. >> >> A production system I'm using is experiencing message loss between >> several nodes using PUB/SUB. >> >> From what I've read, PUB/SUB should be reliable when the *_HWM options are set to >> zero (don't drop). By reliable I mean no messages should fail to be >> delivered to an already connected consumer. >> >> I implemented some utilities to reproduce the message loss in my system: >> >> zmq_sub: https://gist.github.com/easytiger/992b3a29eb5c8545d289 >> zmq_pub: https://gist.github.com/easytiger/e382502badab49856357 >> >> zmq_pub takes a number of events to send and the logging frequency, and >> zmq_sub only takes the logging frequency. zmq_sub prints out the number of msgs >> received vs the packet contents containing the integer packet count from >> the publisher. >> >> It can be seen when sending events in a tight loop that messages simply >> go missing midway through (the loss is not at the beginning or end, ruling out slow >> joiners etc.) >> >> In a small loop it usually works ok: >> >> $ ./zmq_pub 2000 1000 >> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1000 with rc=58 >> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 with rc=58 >> >> $ ./zmq_sub 1 >> >> RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1 >> RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2 >> RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #3 >> RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #4 >> [...] >> RECV:2000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 >> >> You can see every message was sent as the counts align. >> >> However, increase the message counts and messages start going missing: >> >> $ ./zmq_pub 20 10 >> >> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #10 with >> rc=60 >> sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #20 with >> rc=60 >> >> ./zmq_sub 1 >> RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #11000 >> RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #21000 >> RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #31610 >> RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #42000 >> RECV:5|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #52524 >> RECV:6|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #64654 >> RECV:7|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #77298 >> RECV:8|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #90117 >> RECV:9|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #102864 >> RECV:10|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #115846 >> RECV:11|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #129135 >> RECV:12|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #141606 >> RECV:13|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #154179 >> RECV:14|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #166627 >> RECV:15|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #179166 >> RECV:16|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #192247 >> >> Is this expected behaviour? With PUSH/PULL I get no loss at all with similar utilities.
[zeromq-dev] PUB/SUB unreliability
I've read everything I can find, including the printed book, but I am at a loss for a definitive statement of how PUB/SUB should behave in zmq. A production system I'm using is experiencing message loss between several nodes using PUB/SUB. From what I've read, PUB/SUB should be reliable when the *_HWM options are set to zero (don't drop). By reliable I mean no messages should fail to be delivered to an already connected consumer. I implemented some utilities to reproduce the message loss in my system: zmq_sub: https://gist.github.com/easytiger/992b3a29eb5c8545d289 zmq_pub: https://gist.github.com/easytiger/e382502badab49856357 zmq_pub takes a number of events to send and the logging frequency, and zmq_sub only takes the logging frequency. zmq_sub prints out the number of msgs received vs the packet contents containing the integer packet count from the publisher. It can be seen when sending events in a tight loop that messages simply go missing midway through (the loss is not at the beginning or end, ruling out slow joiners etc.) In a small loop it usually works ok: $ ./zmq_pub 2000 1000 sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1000 with rc=58 sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 with rc=58 $ ./zmq_sub 1 RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #1 RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2 RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #3 RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #4 [...] RECV:2000|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #2000 You can see every message was sent as the counts align. However, increase the message counts and messages start going missing: $ ./zmq_pub 20 10 sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #10 with rc=60 sent MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #20 with rc=60 ./zmq_sub 1 RECV:1|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #11000 RECV:2|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #21000 RECV:3|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #31610 RECV:4|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #42000 RECV:5|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #52524 RECV:6|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #64654 RECV:7|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #77298 RECV:8|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #90117 RECV:9|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #102864 RECV:10|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #115846 RECV:11|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #129135 RECV:12|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #141606 RECV:13|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #154179 RECV:14|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #166627 RECV:15|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #179166 RECV:16|MESSAGE PAYLOAD OF A NONTRIVAL SIZE KIND OF AND SUCH #192247 Is this expected behaviour? With PUSH/PULL I get no loss at all with similar utilities. If I put more work between sends (e.g. cout each time) and the full message, the results are better. zmq_push: https://gist.github.com/easytiger/2c4f806594ccfbc74f54 zmq_pull: https://gist.github.com/easytiger/268a630fd22f959fde93 Is there an issue/bug in my implementation that would cause this? Using zeromq 4.0.3 Many Thanks Gerry -- Gerry Steele
Re: [zeromq-dev] 10 seconds delay on a PUB/SUB
On a tangent... Does high watermark = 0 really make pub/sub fully reliable? That wasn't my understanding. Could be wrong. How big are the messages you are sending? Can you reproduce this on the same hardware with a hello-world pub/sub for messages of the same size? On 14 Mar 2014 15:06, "Giacomo Tesio" wrote: > Hi, I'm getting 5 to 10 seconds of delay on a pub/sub socket with low load > (in a context with heavy load on other sockets). > > I'm using NetMQ on Windows 7, with tcp transport on 127.0.0.1 (indeed it > should be ipc, but that's not supported on Windows AFAIK). > > This is the topology: > > We have Server A, Client B and Client C. > > Server binds a PUB with heavy load (let's call it PUB1), publishing 50-100 > msg/s with a few daily peaks at 1000 msg/s. > > Server binds a PUB with small load (let's call it PUB2), publishing 50-500 > msg *each day*. Note however that these messages are sent in groups of 1 > to 5 within a few milliseconds. > > Client B connects with a SUB socket to PUB1; > Client C connects with two SUB sockets to PUB1 and PUB2. > > My issue is that when a group of messages is sent on PUB2, the first is > received almost instantly by Client C, but the others arrive seconds > later, spaced seconds apart. > > For example, here are a few times from today's problems. > > Sent from Server A -> Received by Client C > 09:00:59.608 -> 09:01:05.643 > 09:01:00.055 -> 09:01:05.64 > 09:01:00.117 -> 09:01:10.928 > 09:01:02.883 -> 09:01:16.172 > 09:01:05.541 -> 09:01:18.754 > > How can I reduce this delay? > I tried to increase the ThreadPoolSize up to the number of CPUs, but > without success. > Note that I (must) have HighWaterMark = 0 on every socket (I can't lose > messages), but the machine is full of free memory (4 GB are always free) > and never uses more than 40% of each CPU. > > Giacomo
Re: [zeromq-dev] "Resource temporarily unavailable" with pub socket
Worth mentioning that there is a connected consumer and some messages are being sent across OK. On Wednesday, August 14, 2013 11:30:36 AM UTC+1, Gerry Steele wrote: > > As part of a larger program I ran into a "Resource temporarily > unavailable" error when trying to send a message via a PUB socket. > > The program listens on multiple ipc transports to another process via SUB > (this works well). I then do something to the data (removed from the example) > and send it out again on a PUB socket. > > This happens if I run it in a separate thread, and it even occurs after moving > it to the main thread. I can't see anything obviously wrong with the code. > > Full code listing here: > > http://paste.ubuntu.com/5984515/ > > Many Thanks > GS
[zeromq-dev] "Resource temporarily unavailable" with pub socket
As part of a larger program I ran into a "Resource temporarily unavailable" error when trying to send a message via a PUB socket. The program listens on multiple ipc transports to another process via SUB (this works well). I then do something to the data (removed from the example) and send it out again on a PUB socket. This happens if I run it in a separate thread, and it even occurs after moving it to the main thread. I can't see anything obviously wrong with the code. Full code listing here: http://paste.ubuntu.com/5984515/ Many Thanks GS
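For context on the error itself: "Resource temporarily unavailable" is strerror(EAGAIN), which zmq_send documents for non-blocking sends that cannot be queued. A plain PUB socket normally drops silently at the HWM rather than returning EAGAIN, so seeing it on a PUB send often points elsewhere, e.g. ZeroMQ sockets are not thread-safe, and using one from a thread other than its creator can produce arbitrary errors. A hedged sketch of defensive send handling (assumed endpoint; PUSH is used because it can legitimately report EAGAIN):

/* defensive_send.c: sketch only, assumed endpoint */
#include <zmq.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *push = zmq_socket (ctx, ZMQ_PUSH);
    zmq_connect (push, "tcp://localhost:5559");

    const char *msg = "update";
    if (zmq_send (push, msg, strlen (msg), ZMQ_DONTWAIT) == -1) {
        if (errno == EAGAIN)
            fprintf (stderr, "queue full or no peer yet; retry later\n");
        else
            fprintf (stderr, "send failed: %s\n", zmq_strerror (errno));
    }

    zmq_close (push);
    zmq_ctx_term (ctx);
    return 0;
}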