Trident executes a batch every 500ms (by default). Each batch involves a
round of coordination messages going out to all the bolts, even if the
batch is empty. So that's what you're seeing.
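If that coordination traffic matters for you, the interval is configurable.
A minimal sketch of raising it, using the stock
topology.trident.batch.emit.interval.millis key (its 500ms default lives in
defaults.yaml):

import backtype.storm.Config;

Config conf = new Config();
// Raise the Trident batch interval from the 500ms default to 2 seconds
// to cut down on coordination traffic when batches are often empty.
conf.put("topology.trident.batch.emit.interval.millis", 2000);

The tradeoff is latency: a longer interval means tuples wait longer before
being processed.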
On Fri, Sep 26, 2014 at 12:32 PM, Deepak Subhramanian <deepak.subhraman...@gmail.com> wrote:
Storm processes many records from a partition at once. If a record fails,
all tuples from the offset onwards will have to be retried. So in your
case, if a record before that record failed, it's possible that a
subsequent record that had already been successfully processed was retried.
This is true …
You'd probably want to make use of the component-specific configuration on
spouts and bolts to funnel the information to your scheduler as to which
tasks should be colocated.
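As a sketch of what I mean (the "colocation-group" key is made up, not a
built-in setting, and EventSpout/LookupBolt are placeholders; a custom
IScheduler would read the key back out of each component's conf):

import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
// Tag the components that must share a node with a custom config key
// that only your scheduler knows about.
builder.setSpout("events", new EventSpout(), 2)
       .addConfiguration("colocation-group", "group-a");
builder.setBolt("lookup", new LookupBolt(), 4)
       .shuffleGrouping("events")
       .addConfiguration("colocation-group", "group-a");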
On Mon, Sep 1, 2014 at 1:42 PM, Nathan Marz wrote:
> Yes, you can definitely accomplish that by writing …
> …read latency really matters here. Is a pluggable scheduler (all bolt tasks
> in a single node) and a static ConcurrentHashMap a good approach to it?
>
> Thanks,
> Michael
>
>
>
> On Mon, Sep 1, 2014 at 8:10 PM, Nathan Marz wrote:
>
>> There's not much more to it than what was documented on the blog post …
There's not much more to it than what was documented on the blog post
announcing it:
http://storm.incubator.apache.org/2013/01/11/storm082-released.html
What are your questions about it?
On Mon, Sep 1, 2014 at 12:04 PM, Michael Vogiatzis
wrote:
> Hello,
>
> Does anyone know of any good resources …
Hi Jie,
A few questions:
1. How many topologies do you have running on this cluster?
2. Once the topology recovers, do the errors still happen or do they
disappear for the most part?
3. Have you noticed any worker failures on the cluster around the time
these errors happen?
-Nathan
On Tue, Aug …
Deletion is typically done by running a job that copies the master dataset
into a new folder, filtering out the bad data along the way. This is
expensive, but that's OK since it's only done in rare circumstances. When
I've done this in the past I've been extra careful before deleting the
corrupted master dataset.
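One way to sketch that copy-and-filter job with plain Hadoop MapReduce (the
paths and the BadRecords.isCorrupt check are placeholders for your own
layout and validation logic):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FilterBadData {
    public static class FilterMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws java.io.IOException, InterruptedException {
            // BadRecords.isCorrupt is a stand-in for your validation rule.
            if (!BadRecords.isCorrupt(value)) {
                ctx.write(NullWritable.get(), value); // keep only good records
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "filter-master-copy");
        job.setJarByClass(FilterBadData.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0);                 // map-only copy
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("/data/master"));
        FileOutputFormat.setOutputPath(job, new Path("/data/master-filtered"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Only after verifying the new folder would you swap it in and delete the old
one.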
You shouldn't do this; the behavior is undefined.
On Sat, Jul 26, 2014 at 2:26 PM, Rishabh Goyal wrote:
> Hi,
>
> What happens if I ack and fail the same tuple during the execute method?
> Is the tuple finally acked or replayed?
>
>
--
Twitter: @nathanmarz
http://nathanmarz.com
A stack dump of all workers would be useful in the case of a topology
freeze.
On Thu, Jun 19, 2014 at 3:52 PM, Nathan Marz wrote:
> There were a bunch of changes to the internals, so a regression is
> certainly possible. Let us know as many details as possible if you are able
> to reproduce it.
There were a bunch of changes to the internals, so a regression is
certainly possible. Let us know as many details as possible if you are able
to reproduce it.
On Thu, Jun 19, 2014 at 3:09 PM, Andrew Montalenti
wrote:
> Another interesting 0.9.2 issue I came across: the IConnection interface
> …
> …that be a bad practice?)
>
> - update a state with something like partitionAggregate(sf) or
> partitionPersist(sf)
>
> - create a newStaticState(sf) for querying this state (that we have updated)
> later in the topology
>
>
> On Jun 17, 2014 2:18 AM, "Nathan Marz" wrote:
Static state just refers to a state that is not maintained by your Trident
topology but which you still want to be able to query, so something like a
database that some other system is responsible for updating.
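For example, a rough sketch of querying such a state (externalDbFactory and
UserLookup stand in for your own StateFactory and QueryFunction, and
clickSpout for your spout):

import backtype.storm.tuple.Fields;
import storm.trident.TridentState;
import storm.trident.TridentTopology;

TridentTopology topology = new TridentTopology();
// The database is owned by another system; this topology only reads it.
TridentState userDb = topology.newStaticState(externalDbFactory);
topology.newStream("clicks", clickSpout)
        .stateQuery(userDb, new Fields("userId"),
                    new UserLookup(), new Fields("userInfo"));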
On Mon, Jun 16, 2014 at 4:21 AM, Carlos Rodriguez wrote:
> Hi guys,
>
> We are using …
When possible, it will do as much aggregation Storm-side as it can so as to
minimize how much it needs to interact with the database. So if you do a
persistent global count, for example, it will compute the count for the
batch (in parallel), and then the task that finishes the global count will
do a single get/update against the database.
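For instance, a global count along these lines (MemoryMapState is the
in-memory state from storm.trident.testing; eventSpout is a placeholder):

import backtype.storm.tuple.Fields;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.testing.MemoryMapState;

TridentTopology topology = new TridentTopology();
// Partial counts are computed in parallel per partition; only the final
// combined total triggers a single read/write against the backing state.
topology.newStream("events", eventSpout)
        .persistentAggregate(new MemoryMapState.Factory(),
                             new Count(), new Fields("count"));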
#10 - 5 points
On Sun, May 18, 2014 at 4:47 AM, Marco wrote:
> #3 - 2 pt.
> #4 - 1 pt.
> #5 - 2 pts.
>
> On Friday, May 16, 2014 at 18:41, P. Taylor Goetz
> wrote:
>
>
> This is a call to vote on selecting the top 3 Storm logos from the 11
> entries received. This is the first of two rounds …
> …er batch coordinator?
>
> Weide
>
>
> On Tue, May 13, 2014 at 11:30 AM, Nathan Marz wrote:
>
>> Both. topology.max.spout.pending specifies how many batches are processed
>> in parallel. However, for state updates, the batches are processed
>> sequentially. So the state update for batch 2 won't be executed until the
>> state update for batch 1 succeeds.
persistentAggregate uses partitionPersist under the hood.
On Wed, May 14, 2014 at 10:13 AM, Nathan Marz wrote:
> Yes.
>
>
> On Tue, May 13, 2014 at 5:33 PM, Weide Zhang wrote:
>
>> Hi Nathan,
>>
>> I have a follow-up question on this. If I …
Both. topology.max.spout.pending specifies how many batches are processed
in parallel. However, for state updates, the batches are processed
sequentially. So the state update for batch 2 won't be executed until the
state update for batch 1 succeeds.
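The pipelining knob is the standard max-spout-pending setting; a minimal
sketch:

import backtype.storm.Config;

Config conf = new Config();
// Up to 5 batches may be in flight at once, but their state updates
// are still committed strictly in batch order.
conf.setMaxSpoutPending(5);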
On Tue, May 13, 2014 at 10:31 AM, Raphael Hsieh wrote:
Yes, that's right.
On Wed, Apr 30, 2014 at 4:43 AM, Kashyap Mhaisekar wrote:
> Ok. So, using BaseBasicBolt guarantees acks for tuples on execute method
> completion, right?
>
> Regards,
> Kashyap
>
>
> On Wednesday, April 30, 2014, Nathan Marz wrote:
>
>> …
That sounds like a bad idea. The whole point of using BasicBolts is to not
have to manually ack tuples. If you want control over acking, use RichBolts.
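A rough sketch of the RichBolt side, where acking is explicit (the
uppercasing is just filler logic):

import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class ManualAckBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;  // RichBolts get the raw collector
    }

    @Override
    public void execute(Tuple input) {
        // Anchor the emit to the input tuple, then ack it yourself.
        collector.emit(input, new Values(input.getString(0).toUpperCase()));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("upper"));
    }
}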
On Tue, Apr 29, 2014 at 9:18 PM, Kashyap Mhaisekar wrote:
> Hi,
> Is there a way to get a reference to the OutputCollector using
> BasicOutputCollector? I w…
> …described here
> (https://github.com/nathanmarz/storm/wiki/Trident-state#opaque-transactional-spouts)
> myself?
>
>
>
> On Thu, Apr 17, 2014 at 11:00 AM, Nathan Marz wrote:
>
>> …how does this happen? Will it query the datastore for the current value,
>> then add the current aggregate value to the stored value in order to create
>> the global aggregate? Where does this logic happen? I can't seem to find
>> where this happens in the per…
Batches are processed sequentially, but each batch is partitioned (and
therefore processed in parallel). As a batch is processed, it can be
repartitioned an arbitrary number of times throughout the Trident topology.
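For example, a sketch of a batch being repartitioned twice on its way
through (MySpout, ParseEvent, and CountByUser are placeholders for your own
spout, Function, and Aggregator):

import backtype.storm.tuple.Fields;
import storm.trident.TridentTopology;

TridentTopology topology = new TridentTopology();
// Each batch flows through these repartitionings in turn; within a
// batch, every partition is processed in parallel.
topology.newStream("events", new MySpout())
        .shuffle()                             // repartition randomly
        .each(new Fields("raw"), new ParseEvent(), new Fields("userId"))
        .partitionBy(new Fields("userId"))     // repartition by field
        .partitionAggregate(new Fields("userId"),
                            new CountByUser(), new Fields("count"));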
On Wed, Apr 16, 2014 at 4:28 PM, Raphael Hsieh wrote:
> Hi, I found myself being …
topology.optimize doesn't do anything at the moment. It was something
planned for in the early days but turned out to be unnecessary.
On Fri, Mar 21, 2014 at 9:00 PM, bijoy deb wrote:
> Thanks Drew. I am going to try those options and see if that helps.
>
> Thanks
> Bijoy
>
>
> On Fri, Mar 21, 201…
No, it doesn't have to be. You're in full control of it. Internally Storm
generates its own tuple ID and maintains a map from that globally unique
tuple id to your spout id. The spout id is simply used in the ack/fail
methods of the spout (so that you know what was acked/failed).
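A bare-bones sketch of that round trip inside a BaseRichSpout subclass
(collector is the SpoutOutputCollector saved in open(); nextOffset and
readRecord are hypothetical helpers over your data source):

// Emit with your own message id; Storm hands the same id back on ack/fail.
public void nextTuple() {
    long recordOffset = nextOffset();
    collector.emit(new Values(readRecord(recordOffset)), recordOffset);
}

@Override
public void ack(Object msgId) {
    // msgId is the recordOffset emitted above; mark it done in your source
}

@Override
public void fail(Object msgId) {
    // re-queue the offset so it gets replayed on a later nextTuple
}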
On Tue, Mar 18, 2…
> …connect to other tech, storm-kafka,
> storm-cassandra, etc.
> extras for other things like storm-starter, storm-deploy, storm-puppet
>
>
>
> On 13 Mar 2014, at 3:57 pm, Nathan Marz wrote:
>
> I don't like either name tbh. Storm itself is already broken into modules
> …
> >>> hi, I am in agreement with Taylor and believe I understand his
> >>> intent. An incredible tool/framework/application like Storm is
> >>> only enhanced and gains value from the number of well maintained
> >>> and vetted modules that can …
The latest Storm release, 0.9.1, is all Apache-licensed. We replaced the
ZeroMQ transport with a Netty transport (though you can still use the
ZeroMQ transport by changing the settings).
On Mon, Mar 3, 2014 at 12:46 AM, Richards Peter wrote:
> Hi team,
>
> We are using Storm 0.8.2 in our project. I …
Throwing a FailedException is how you programmatically fail a batch.
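A minimal sketch inside a Trident function (the null check is a stand-in
for whatever validation you actually need):

import backtype.storm.topology.FailedException;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

public class ValidateRecord extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        if (tuple.getString(0) == null) {   // stand-in validation rule
            // Fails the whole batch; Trident will replay it.
            throw new FailedException("null record in batch");
        }
        collector.emit(tuple.getValues());
    }
}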
On Sun, Mar 2, 2014 at 8:45 AM, David Smith wrote:
> Also, I don't see a way to fail a batch programmatically like you can do
> with traditional Storm. What happens if I throw a FailedException from
> within a function, state q…
Snapshottable is used for storing a single value, like a global count.
SnapshotGet retrieves that value into your Stream.
The globalKey is fixed.
On Sun, Mar 2, 2014 at 8:02 PM, Jahagirdar, Madhu <
madhu.jahagir...@philips.com> wrote:
> All,
>
> 1) Could anyone explain the use case where Snap…
I'm only +1 for pulling in storm-kafka and updating it. Other projects put
these contrib modules in a "contrib" folder and keep them managed as
completely separate codebases. As it's not actually a "module" necessary
for Storm, there's an argument for doing it that way rather than via
the multi-module …
If you're using transactional states, then Trident will store the batch id
with the stored values. So if there's a partial failure, only the ones with
different batch ids will be updated on the next try.
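Conceptually (this is an illustration of the semantics, not Trident's
actual storage code), each stored value carries the batch id of the last
update, which is what makes replays idempotent:

// Sketch of the skip-on-replay rule a transactional state applies per key.
class StoredValue {
    long txid;   // batch id of the last applied update
    long count;
}

StoredValue update(StoredValue stored, long currentTxid, long batchDelta) {
    if (stored != null && stored.txid == currentTxid) {
        return stored;                               // batch already applied
    }
    StoredValue next = new StoredValue();
    next.txid = currentTxid;
    next.count = (stored == null ? 0 : stored.count) + batchDelta;
    return next;
}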
On Wed, Feb 19, 2014 at 12:11 PM, Adrian Mocanu
wrote:
> Hi
>
> When using multiPut in Trident …
That's not true; it won't make a difference performance-wise whether you
emit one tuple or many in nextTuple. The batching happens automatically
after that.
On Mon, Feb 17, 2014 at 3:51 PM, Danijel Schiavuzzi wrote:
> Also, Trident is batch-based, so in nextTuple() you should really emit
> batch…
Yes, batchId + partitionIndex consistently represents the same data as long
as:
1. Any repartitioning you do is deterministic (so partitionBy is, but
shuffle is not)
2. You're using a spout that replays the exact same batch each time (which
is true of transactional spouts but not of opaque transactional spouts).
Yes, it is arbitrary. Opening an issue for this is a good idea.
On Fri, Feb 14, 2014 at 5:24 PM, Adam Lewis wrote:
> This is promising: a 150KB chunk size is giving me over 1 MB/sec, and a
> 300KB chunk size up to 2 MB/sec, although with spikier performance. My
> experimental setup leaves a lot to be desired …
1 is already calculated by Storm and reported in the default metrics. For 2,
you could track that yourself in your spout implementation.
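A sketch of that bookkeeping inside a spout (nextMessageId and nextRecord
are hypothetical helpers; replace the trailing comment with your metrics
sink):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Record when each tuple was emitted and compute the latency on ack.
private final Map<Object, Long> emitNanos = new ConcurrentHashMap<>();

public void nextTuple() {
    Object msgId = nextMessageId();                  // hypothetical id scheme
    emitNanos.put(msgId, System.nanoTime());
    collector.emit(new Values(nextRecord()), msgId);
}

@Override
public void ack(Object msgId) {
    Long start = emitNanos.remove(msgId);
    if (start != null) {
        long latencyMs = (System.nanoTime() - start) / 1_000_000L;
        // feed latencyMs into your metrics
    }
}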
On Mon, Feb 3, 2014 at 7:40 PM, Sho Nakatani wrote:
> Hi all,
>
> I'm new to Storm.
> I'm trying to find a way to measure latency per tuple.
>
> What exactly I want is: …
Correct, they will be run sequentially within the bolt. Threading within a
bolt adds a ton of complication (needing to synchronize around the output
collector, for instance), so that's really bad. What are you trying to
achieve by adding more threading? You won't get better resource usage,
because the r…
A batch is all the tuples being computed on at once in each run of the
topology. Each stage of the processing is split into partitions for
parallelization.
On Mon, Jan 20, 2014 at 3:31 AM, Michal Singer wrote:
> Hi, it is not clear what the difference is between a batch and a partition
> in the Trident …
All this information is available through the TopologyContext.
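For instance, a sketch of pulling task identity out in prepare() and using
it to key the bolt's saved state (the key scheme is just an example):

import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;

public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    // Task identity, all available from the context:
    String componentId = context.getThisComponentId();
    int taskIndex = context.getThisTaskIndex(); // index within the component
    // Key the bolt's persistent state per task, e.g. "counter-3":
    String stateKey = componentId + "-" + taskIndex;
}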
On Mon, Jan 13, 2014 at 1:34 AM, Klausen Schaefersinho <
klaus.schaef...@gmail.com> wrote:
> Hi,
>
> I have to store the state of a bolt persistently. As I have quite a lot of
> events coming through, I would like to use some kind of …
You shouldn't do that, because the spout will be unable to ack or fail
tuples while it's blocked.
On Fri, Dec 27, 2013 at 5:06 PM, Link Wang wrote:
> hi, all
> I'd like to know what will happen if I provide a blocking or dead-loop
> implementation of nextTuple in my spout. Does it just mean that …
Actually, never mind, those cases aren't problems, since these objects are
contained within the operation of the combiner aggregator. But I still
wouldn't do it, as it goes against the spirit of the API.
On Sat, Dec 21, 2013 at 6:56 PM, Nathan Marz wrote:
> They could potentially …
> …in what
> circumstances could those values be reused?
>
> Thanks.
>
>
> On Sat, Dec 21, 2013 at 5:49 PM, Nathan Marz wrote:
>
>> You shouldn't do that. Those input values could potentially be re-used
>> for other operations. I would recommend looking into using persistent
>> data structures.
You shouldn't do that. Those input values could potentially be re-used for
other operations. I would recommend looking into using persistent data
structures.
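For example, a combiner that never mutates its inputs: a simple sum whose
combine() returns a fresh value instead of modifying val1 or val2 in place:

import storm.trident.operation.CombinerAggregator;
import storm.trident.tuple.TridentTuple;

public class Sum implements CombinerAggregator<Long> {
    @Override
    public Long init(TridentTuple tuple) {
        return tuple.getLong(0);
    }

    @Override
    public Long combine(Long val1, Long val2) {
        return val1 + val2;   // new value; inputs left untouched
    }

    @Override
    public Long zero() {
        return 0L;
    }
}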
On Sat, Dec 21, 2013 at 6:42 AM, Adam Lewis wrote:
> Hi All,
>
> When implementing CombinerAggregator in Trident, is there any issue …