Let me start sharing my thoughts. :)
1. We need to enrich the docs about metrics / stats.
In fact, I couldn't find the fact that topology stats are sampled by default
with a sample rate of 0.05 in the docs when I was a newbie to Apache Storm. It
misled me and left me asking "Why is there a difference
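A minimal sketch of overriding that sampling, assuming the standard
topology.stats.sample.rate key and Storm 1.x package names:

    import org.apache.storm.Config;

    Config conf = new Config();
    // The default rate is 0.05: only ~1 in 20 tuples updates the counters and
    // the totals are scaled up, which is why UI numbers can look "off".
    // 1.0 counts every tuple, at the cost of a little extra bookkeeping.
    conf.put(Config.TOPOLOGY_STATS_SAMPLE_RATE, 1.0d);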
@Jason: Thanks. Tried searching for Storm code which starts Ephemeral
nodes, but couldn't find it. (I'm new to Hadoop and Storm, so perhaps I was
searching for the wrong thing)
@Jungtaek: Will explore component tasks. Meanwhile, I had considered
Trident, but didn't go ahead because it was not
Hi Storm users,
I'm Jungtaek Lim, committer and PMC member of Apache Storm.
If you subscribe to the dev@ mailing list, you may have seen that we've
recently been addressing the metrics feature in Apache Storm.
For now, improvements are going forward based on the current metrics feature.
- Improve (Topology)
No, I just would like to say it's not strange.
What you need to know is that there could be some additional latency
when your Spout spends time in nextTuple(), ack(), or fail(). KafkaSpout
reads data from Kafka, so I assume there's execution time in nextTuple()
on KafkaSpout.
We're using the out-of-the-box Kafka spout; are there any issues related to
that here?
> One thing you might want to check is that the event handler on a Spout is
> single-threaded, which means the same thread calls nextTuple(), ack(), and
> fail().
I'm not following you here. What would I be checking here?
Oh right, sorry, it was introduced in Storm 1.0.0.
And max-spout-pending controls the calls to nextTuple(), so tuple latency
shouldn't be affected.
One thing you might want to check is that the event handler on a Spout is
single-threaded, which means the same thread calls nextTuple(), ack(), and
fail().
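For reference, a minimal sketch of capping in-flight tuples, assuming Storm
1.x package names (the value 500 is an arbitrary example):

    import org.apache.storm.Config;

    Config conf = new Config();
    // At most 500 emitted-but-not-yet-acked tuples per spout task (applies
    // only to tuples emitted with message ids). While the cap is reached,
    // Storm simply stops calling nextTuple(), so this throttles emission
    // rather than adding per-tuple latency.
    conf.setMaxSpoutPending(500);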
Yes it appears that the skipped-max-spout metric is only in 1.0.0 as part
of the automatic back pressure fixes (
https://issues.apache.org/jira/browse/STORM-886)
On Tue, Apr 19, 2016 at 9:21 PM, Kevin Conaway
wrote:
> We are already sending our metrics to graphite but
Thanks John. I had tried profiling locally with YourKit, but my
recollection was that almost all of the time was spent in
backtype.storm.timer sleep, which didn't smell right to me.
On Tue, Apr 19, 2016 at 8:43 PM, wrote:
> One thing you can do is profile a worker process
We are already sending our metrics to Graphite, but I don't see
__skipped-max-spout being logged. Was that added after 0.10? I don't even
see a reference to it in the codebase.
We are capturing the queue metrics for each component (send_queue and
receive_queue) and the population for each is
Hi Kevin,
You can attach a metrics consumer to log additional information for that
topology, like disruptor queue metrics, __skipped-max-spout for spouts, etc.
Please refer to
https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/metric/LoggingMetricsConsumer.java
for how
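A minimal sketch of registering it in the topology config, assuming Storm
1.x package names (0.10.x uses the backtype.storm packages instead):

    import org.apache.storm.Config;
    import org.apache.storm.metric.LoggingMetricsConsumer;

    Config conf = new Config();
    // Register one LoggingMetricsConsumer task; it logs every metrics data
    // point the topology reports (disruptor queue metrics,
    // __skipped-max-spout, and so on).
    conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);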
One thing you can do is profile a worker process with jvisualvm to see what
time is spent where for each executor thread, as well as in the Netty and
LMAX layers.
--John
Sent from my iPhone
> On Apr 19, 2016, at 8:41 PM, Kevin Conaway wrote:
>
> In Storm 0.10, is
In Storm 0.10, is there a way to monitor the _maxSpoutPending_ value? I
don't see it exposed in any of the metrics that Storm publishes, but I'd like
to be able to see how much time each tuple spends waiting and how big the
pending queue size is for each spout task. Is this possible?
Our topology
Hi Harsha,
When the Storm 1.0.0 KafkaSpout (from the storm-kafka jar) attempts to read
from the Kafka 0.8.2.1 partition, an IllegalArgumentException is thrown, the
root exception of which is as follows:
at java.nio.Buffer.limit(Buffer.java:267)
at
Hi,
I've done a similar thing before, with the exception that I was reading from
Cassandra. The concept is the same though. Assuming you know that you
have 10,000 records and you want each spout to read 1,000 of them, you
would launch 10 instances of the spout. The first thing they do
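The message is cut off here, but a sketch of one common way each spout
instance can claim its slice, assuming a known total of 10,000 records (the
class name and field names are made up for illustration):

    import java.util.Map;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;

    public class RangeSpout extends BaseRichSpout {
        private long start, end;   // this task's slice of the record range

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            int numTasks = context.getComponentTasks(context.getThisComponentId()).size();
            int myIndex  = context.getThisTaskIndex();   // 0-based, unique per task
            long total   = 10000L;                       // assumed total record count
            long slice   = total / numTasks;             // e.g. 1,000 records per spout
            start = myIndex * slice;
            end   = (myIndex == numTasks - 1) ? total : start + slice;
        }

        @Override
        public void nextTuple() {
            // read and emit records in [start, end) from the data store here
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("record"));
        }
    }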
Hi,
Thanks a lot for the response!
I'll check the security settings, but I don't configure anything specific in my
storm.yaml, so I guess the defaults.yaml entries are used.
My configuration is as follows:
storm.zookeeper.servers:
- {{ZOOKEEPER_HOST}}
storm.zookeeper.port: {{ZOOKEEPER_PORT}}
Hello!
I also found a post with a similar error to yours. Perhaps you can get
some clues from it.
http://mail-archives.apache.org/mod_mbox/storm-user/201603.mbox/%3c0dd9aa99-8504-43c9-b3a8-6196def07...@viaplay.com%3E
On Tue, Apr 19, 2016 at 2:25 PM, Spico Florin wrote:
> Hi!
I have this
Config config = new Config();
MongoDatabaseManager mongoManager = new MongoDatabaseManager();
config.put("MongoManager", mongoManager);
and MongoDatabaseManager is an empty class:
public class MongoDatabaseManager implements Serializable {}
But after submitting the topology, I get
Thanks guys.
I didn't understand "*...spout instances by utilizing Zookeeper.*". How does
one utilize Zookeeper? Is it the same as ".setNumTasks(10)" for a Spout?
As of now I've set
config.setNumWorkers(2);
and
builder.setSpout("mongoSpout", new MongoSpout()).setNumTasks(2);
I'm able to get
Please help!
From: Victor Kovrizhkin
Date: Monday, April 18, 2016 at 9:28 PM
To: "user@storm.apache.org"
Subject: Storm 1.0.0 DRPC connection refused
Hi Good People!
I’m trying to update my cluster running Storm 0.10.0 with DRPC to Storm 1.0.0.
I’ve updated all machines with latest
Hi Nikos,
My topic has 6 partitions, so my kafka-spout parallelism hint is also 6.
I have 10 workers and the KafkaSpout has 6 executors.
With only the Kafka spout and no bolts I got 100,000/s, but I expected
1,000,000/s.
Regards,
-Anandh Kumar
On Mon, Apr 18, 2016 at 8:53 PM, Nikos R. Katsipoulakis <
Correction - group on partition id
On Apr 19, 2016 6:33 AM, "Navin Ipe"
wrote:
> I've seen this:
> http://storm.apache.org/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.html
> but it doesn't explain how workers coordinate with each other, so
>
Hi Navin,
I'm not sure if this scenario is a perfect fit for Storm, since you want
precise control of colocation. But if I understand your problem correctly,
the following could be a viable approach:
1. Establish a total order of spout instances by utilizing Zookeeper. Your
spout instances will
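The reply is cut off here; a hedged sketch of what step 1 might look like
with Apache Curator, where the ephemeral-sequential node names give the
total order (the connect string and znode paths are made up, and exception
handling is omitted):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;
    import org.apache.zookeeper.CreateMode;

    CuratorFramework zk = CuratorFrameworkFactory.newClient(
            "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
    zk.start();
    // Each spout instance registers itself; the sequence suffix ZooKeeper
    // appends gives a total order, and the ephemeral mode removes the node
    // automatically if the spout's worker dies.
    String myNode = zk.create()
            .creatingParentsIfNeeded()
            .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
            .forPath("/my-topology/spouts/member-");
    // Sorting the children of /my-topology/spouts and finding myNode's
    // position yields this spout's rank in the total order.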
Hey,
One way I handle a similar problem: if only 1 worker slot is available
on 1 VM, then based on the hostname/host IP I force it to fetch its rows
from the database. Another choice, with a different setup, is using HDFS in
place of MySQL.
eg.
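The example itself is cut off; a rough sketch of the hostname-based row
selection described above, with a made-up shard count and query (exception
handling omitted):

    import java.net.InetAddress;

    // Derive a stable shard number from this worker's hostname, then only
    // fetch the rows belonging to that shard.
    String host  = InetAddress.getLocalHost().getHostName();
    int shard    = Math.abs(host.hashCode()) % 10;   // 10 assumed shards
    String query = "SELECT * FROM records WHERE id % 10 = " + shard;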