Re: Streaming DRPC?

Adam Lewis Sun, 16 Feb 2014 04:40:26 -0800

It might help to understand your use case a little bit better.  Are the
requests for a finite amount of data that you just happen to want to stream
out, or are they more akin to a subscription for an unbounded amount of
data?  Also, does the request contain a specification of what needs to be
computed?  That is, do you imagine your bolts (or various trident
components) each keeping a list of active requests and modifying their
behaviour (or data source in the case of spouts) as the requests come in?
 You might want to check out the storm-signals project for ideas of how you
could orchestrate all the components to do the right thing in face of your
requests coming in: https://github.com/ptgoetz/storm-signals



On Sat, Feb 15, 2014 at 7:30 PM, Carl Lerche <m...@carllerche.com> wrote:

> Hey Adam,
>
> Actually, that's quite a good idea. I'm glad you responded, this is a
> better approach than what I was going to attempt (aka, mega hacks). I
> understand how your approach could be done with non-DRPC trident. The
> one drawback that I can think of would be that the request message
> would need to be part of a batch, so if matches only happen every
> 15~20 seconds, it would take a while for the response to start
> arriving.
>
> Perhaps you could elaborate more on how one could use vanilla storm
> could be used to solve this? Perhaps it could help with the batch
> delay problem.
>
> Cheers,
> Carl
>
> On Sat, Feb 15, 2014 at 4:42 AM, Adam Lewis <m...@adamlewis.com> wrote:
> > Hi Carl,
> >
> > DRPC is inherently synchronous in the way it works so if I understand
> what
> > you are trying to do correctly then I suggest you stick to non-DRPC
> trident
> > or even vanilla storm.  You can setup some messaging queues to handle the
> > input (request) and output (streaming result).  Include a field in the
> input
> > tuple that can be used to correlate any downstream results (where that
> ID is
> > client generated), then create a client which handles publishing to the
> > input queue and subscribing to the output queue (filtering on messages
> which
> > have the input correlation id).  You can partition your storm topology
> (and
> > the input and output queues) on that correlation ID to achieve some load
> > balancing.  Finally, if your output streams are infinite, you need some
> > mechanism to stop them...
> >
> > As a side benefit, you can overcome some limitations of DRPC such as no
> > control over serialization and even have multiple trident streams all
> > writing to your output queue (whereas DRPC doesn't support that sort of
> > branching in the topology).
> >
> > Adam
> >
> >
> > On Fri, Feb 14, 2014 at 7:06 PM, Carl Lerche <m...@carllerche.com> wrote:
> >>
> >> Hello,
> >>
> >> I noticed that the DRPC client only allows a single response (and for
> >> that response to be JSON encoded). I was hoping to implement some sort
> >> of DRPC stream where I a constant stream of data based on a given
> >> query. Also, is there a way to serialize the response using Kryo
> >> instead of JSON? The specific format of my data is not very JSON
> >> friendly.
> >>
> >> Cheers,
> >> Carl
> >
> >
>

Re: Streaming DRPC?

Reply via email to