Re: [DISCUSS] Would like to make collective intelligence about Metrics on Storm

2016-04-19 Thread Jungtaek Lim
Let me start by sharing my thoughts. :) 1. We need to enrich the docs about metrics / stats. In fact, I couldn't learn from the docs that topology stats are sampled by default with a sample rate of 0.05 when I was a newbie to Apache Storm. It misled me into asking "Why is there a difference
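A minimal sketch of overriding that default for a single topology, assuming the standard Config key and Storm 1.0.x package names:

    import org.apache.storm.Config;

    Config conf = new Config();
    // Default is 0.05, i.e. only 5% of tuples feed the stats shown in the UI.
    // 1.0 makes the counts exact, at some bookkeeping cost.
    conf.put(Config.TOPOLOGY_STATS_SAMPLE_RATE, 1.0d);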

Re: How does one distribute database iteration across workers?

2016-04-19 Thread Navin Ipe
@Jason: Thanks. I tried searching for the Storm code which creates ephemeral nodes, but couldn't find it. (I'm new to Hadoop and Storm, so perhaps I was searching for the wrong thing.) @Jungtaek: Will explore component tasks. Meanwhile, I had considered Trident, but didn't go ahead because it was not

[DISCUSS] Would like to make collective intelligence about Metrics on Storm

2016-04-19 Thread Jungtaek Lim
Hi Storm users, I'm Jungtaek Lim, committer and PMC member of Apache Storm. If you subscribe to the dev@ mailing list, you may have seen that we're currently addressing the metrics feature of Apache Storm. For now, improvements are going forward based on the current metrics feature. - Improve (Topology)

Re: Monitoring Max Spout Pending

2016-04-19 Thread Jungtaek Lim
No, I just would like to say it's not strange. What you need to know is that there can be some additional latency when your Spout spends some time in nextTuple(), ack(), or fail(). KafkaSpout reads data from Kafka, so I assume there's execution time in nextTuple() on KafkaSpout. Apache

Re: Monitoring Max Spout Pending

2016-04-19 Thread Kevin Conaway
We're using the out-of-the-box Kafka spout; are there any issues related to that here? > One thing you would like to check is that the event handler on a Spout is single-threaded, which means the same thread calls nextTuple(), ack(), and fail(). I'm not following you here. What would I be checking here?

Re: Monitoring Max Spout Pending

2016-04-19 Thread Jungtaek Lim
Oh right, sorry, it was introduced in Storm 1.0.0. And max-spout-pending controls the calls to nextTuple(), so tuple latency shouldn't be affected. One thing you would like to check is that the event handler on a Spout is single-threaded, which means the same thread calls nextTuple(), ack(), and fail().
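A sketch of what that single thread implies for spout code (hypothetical spout, Storm 1.0.x package names; the point is that blocking anywhere stalls everything else on the loop):

    import java.util.Map;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;

    public class SingleThreadAwareSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            // Runs on the same executor thread as ack()/fail():
            // return quickly when there is no data instead of blocking.
            String msg = pollNonBlocking();
            if (msg != null) {
                collector.emit(new Values(msg), msg); // msg doubles as the message id
            }
        }

        @Override
        public void ack(Object msgId) { /* keep cheap: shares the thread above */ }

        @Override
        public void fail(Object msgId) { /* likewise */ }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("msg"));
        }

        private String pollNonBlocking() { return null; } // placeholder data source
    }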

Re: Monitoring Max Spout Pending

2016-04-19 Thread Kevin Conaway
Yes, it appears that the skipped-max-spout metric is only in 1.0.0, as part of the automatic backpressure fixes ( https://issues.apache.org/jira/browse/STORM-886) On Tue, Apr 19, 2016 at 9:21 PM, Kevin Conaway wrote: > We are already sending our metrics to Graphite but

Re: Monitoring Max Spout Pending

2016-04-19 Thread Kevin Conaway
Thanks John. I had tried profiling locally with YourKit, but my recollection was that almost all of the time was spent in backtype.storm.timer sleep, which didn't smell right to me. On Tue, Apr 19, 2016 at 8:43 PM, wrote: > One thing you can do is profile a worker process

Re: Monitoring Max Spout Pending

2016-04-19 Thread Kevin Conaway
We are already sending our metrics to Graphite but I don't see __skipped-max-spout being logged. Was that added after 0.10? I don't even see a reference to it in the codebase. We are capturing the queue metrics for each component (send_queue and receive_queue) and the population for each is

Re: Monitoring Max Spout Pending

2016-04-19 Thread Jungtaek Lim
Hi Kevin, You can attach a metrics consumer to log additional information for that topology, like disruptor queue metrics, __skipped-max-spout for spouts, etc. Please refer to https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/metric/LoggingMetricsConsumer.java for how
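A minimal sketch of registering that consumer programmatically, assuming Storm 1.0.x package names (it can also be registered in storm.yaml):

    import org.apache.storm.Config;
    import org.apache.storm.metric.LoggingMetricsConsumer;

    Config conf = new Config();
    // Parallelism 1: a single consumer task writes all metrics to the metrics log.
    conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);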

Re: Monitoring Max Spout Pending

2016-04-19 Thread hokiegeek2
One thing you can do is profile a worker process with jvisualvm to see where time is spent for each executor thread, as well as in the Netty and LMAX layers. --John Sent from my iPhone > On Apr 19, 2016, at 8:41 PM, Kevin Conaway wrote: > > In Storm 0.10, is

Monitoring Max Spout Pending

2016-04-19 Thread Kevin Conaway
In Storm 0.10, is there a way to monitor the _maxSpoutPending_ value? I don't see it exposed in any of the metrics that Storm publishes, but I'd like to be able to see how much time each tuple spends waiting and how big the pending queue size is for each spout task. Is this possible? Our topology
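For context, the knob being asked about is set per topology like this (a sketch; package name shown for 1.0.x, 0.10 uses backtype.storm.Config; the replies above conclude the related __skipped-max-spout metric only arrived in 1.0.0):

    import org.apache.storm.Config;

    Config conf = new Config();
    // Cap on tuples a spout task may have in flight (emitted but not yet acked/failed).
    conf.setMaxSpoutPending(500);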

Re: Is Storm 1.0.0 compatible with Kafka 0.8.2.x?

2016-04-19 Thread John Yost
Hi Harsha, When the Storm 1.0.0 KafkaSpout (from the storm-kafka jar) attempts to read from the Kafka 0.8.2.1 partition, an IllegalArgumentException is thrown, the root exception of which is as follows: at java.nio.Buffer.limit(Buffer.java:267) at

Re: How does one distribute database iteration across workers?

2016-04-19 Thread Jason Kusar
Hi, I've done a similar thing before, with the exception that I was reading from Cassandra. The concept is the same, though. Assuming you know that you have 10,000 records and you want each spout to read 1,000 of them, you would launch 10 instances of the spouts. The first thing they do
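One simple way to carve up the range like that, sketched with the task index rather than Zookeeper (so not necessarily Jason's exact mechanism), inside the spout's open():

    import org.apache.storm.task.TopologyContext;

    // context is the TopologyContext passed to open()
    int numTasks  = context.getComponentTasks(context.getThisComponentId()).size(); // e.g. 10
    int taskIndex = context.getThisTaskIndex();                                     // 0..numTasks-1
    long total = 10_000L;           // record count from the example above
    long slice = total / numTasks;  // 1,000 records per spout
    long start = taskIndex * slice; // first record this instance reads
    // ...query records [start, start + slice) from the database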

Re: Storm 1.0.0 DRPC connection refused

2016-04-19 Thread Victor Kovrizhkin
Hi, Thanks a lot for the response! I’ll check the security settings, but I don’t configure anything specific in my storm.yaml, so I guess the defaults.yaml entries are used. My configuration is as follows: storm.zookeeper.servers: - {{ZOOKEEPER_HOST}} storm.zookeeper.port: {{ZOOKEEPER_PORT}}

Re: Storm 1.0.0 DRPC connection refused

2016-04-19 Thread Spico Florin
Hello! I also found a post with a similar error to yours. Perhaps you can get some clues from it: http://mail-archives.apache.org/mod_mbox/storm-user/201603.mbox/%3c0dd9aa99-8504-43c9-b3a8-6196def07...@viaplay.com%3E On Tue, Apr 19, 2016 at 2:25 PM, Spico Florin wrote: > Hi!

How do you add a custom class to Config?

2016-04-19 Thread Navin Ipe
I have this: Config config = new Config(); MongoDatabaseManager mongoManager = new MongoDatabaseManager(); config.put("MongoManager", mongoManager); and MongoDatabaseManager is an empty class: public class MongoDatabaseManager implements Serializable {} But after submitting the topology, I get
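If the failure is about the conf not being JSON-serializable (Storm's topology conf must survive a JSON round trip, not just Java serialization), a common workaround is to hand the object to the component instead; a sketch, where MongoWriterBolt is a hypothetical bolt taking the manager as a constructor argument:

    import org.apache.storm.topology.TopologyBuilder;

    MongoDatabaseManager mongoManager = new MongoDatabaseManager();
    TopologyBuilder builder = new TopologyBuilder();
    // Components only need java.io.Serializable, so the manager can ride along
    // as a constructor argument rather than a Config entry.
    builder.setBolt("mongo-writer", new MongoWriterBolt(mongoManager), 2);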

Re: How does one distribute database iteration across workers?

2016-04-19 Thread Navin Ipe
Thanks guys. I didn't understand "*...spout instances by utilizing Zookeeper.*". How does one utilize Zookeeper? Is it the same as ".setNumTasks(10)" for a Spout? As of now I've set config.setNumWorkers(2); and builder.setSpout("mongoSpout", new MongoSpout()).setNumTasks(2); I'm able to get

Re: Storm 1.0.0 DRPC connection refused

2016-04-19 Thread Victor Kovrizhkin
Please help! From: Victor Kovrizhkin Date: Monday, April 18, 2016 at 9:28 PM To: "user@storm.apache.org" Subject: Storm 1.0.0 DRPC connection refused Hi Good People! I’m trying to update my cluster running Storm 0.10.0 with DRPC to Storm 1.0.0. I’ve updated all machines with the latest
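One 0.10-to-1.0 client-side change worth checking while debugging this (a sketch, not a confirmed diagnosis of the refusal): in 1.0.0 DRPCClient takes a conf, and 3772 is the default drpc.port:

    import java.util.Map;
    import org.apache.storm.utils.DRPCClient;
    import org.apache.storm.utils.Utils;

    Map conf = Utils.readStormConfig(); // defaults.yaml merged with storm.yaml
    DRPCClient client = new DRPCClient(conf, "drpc-host", 3772); // host is a placeholder
    String result = client.execute("my-function", "args");       // function name is hypothetical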

Re: Storm 0.10.0 Benchmark issue

2016-04-19 Thread Anandh Kumar
Hi Nikos, My topic has 6 partitions, so my Kafka spout's parallelism hint is also 6. My worker count is 10 and the KafkaSpout executor count is 6. With only the Kafka spout and no bolts I got 100,000/s, but expected 1,000,000/s. Regards, -Anandh Kumar On Mon, Apr 18, 2016 at 8:53 PM, Nikos R. Katsipoulakis <
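Wiring that looks roughly like this with the storm-kafka spout (a sketch with placeholder names, shown with 1.0.x package names; 0.10 uses the storm.kafka package):

    import org.apache.storm.kafka.KafkaSpout;
    import org.apache.storm.kafka.SpoutConfig;
    import org.apache.storm.kafka.ZkHosts;
    import org.apache.storm.topology.TopologyBuilder;

    ZkHosts hosts = new ZkHosts("zkhost:2181");   // placeholder ZooKeeper address
    SpoutConfig spoutConfig = new SpoutConfig(hosts, "my-topic", "/kafka-spout", "bench-id");
    TopologyBuilder builder = new TopologyBuilder();
    // Parallelism 6 matches the 6 partitions: more executors would sit idle,
    // fewer would make one executor juggle several partitions.
    builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 6);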

Re: How does one distribute database iteration across workers?

2016-04-19 Thread Alexander T
Correction - group on partition id On Apr 19, 2016 6:33 AM, "Navin Ipe" wrote: > I've seen this: > http://storm.apache.org/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.html > but it doesn't explain how workers coordinate with each other, so >

Re: How does one distribute database iteration across workers?

2016-04-19 Thread Alexander T
Hi Navin, I'm not sure this scenario is a perfect fit for Storm, since you want precise control of colocation. But if I understand your problem correctly, the following could be a viable approach: 1. Establish a total order of spout instances by utilizing Zookeeper. Your spout instances will
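One possible shape for step 1, sketched with Apache Curator (paths and addresses are placeholders; run this from the spout's open() and handle the checked exceptions):

    import java.util.Collections;
    import java.util.List;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;
    import org.apache.zookeeper.CreateMode;

    CuratorFramework zk = CuratorFrameworkFactory.newClient(
            "zkhost:2181", new ExponentialBackoffRetry(1000, 3));
    zk.start();
    // Each spout registers an ephemeral-sequential znode; sorting the children
    // yields a total order that repairs itself when an instance dies.
    String path = zk.create()
            .creatingParentsIfNeeded()
            .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
            .forPath("/myapp/spouts/member-");
    List<String> members = zk.getChildren().forPath("/myapp/spouts");
    Collections.sort(members);
    int rank = members.indexOf(path.substring("/myapp/spouts/".length()));
    // rank is this spout's position in the total order; map it to a record range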

Re: How does one distribute database iteration across workers?

2016-04-19 Thread anshu shukla
Hey, one way I handle a similar problem: say only 1 worker slot is there on 1 VM; then, based on the hostname/host IP, I force each worker to fetch its own rows from the database. Another choice, but with a different setup, is using HDFS in place of MySQL. e.g.
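A tiny sketch of the hostname-keyed idea (the bucket count and the host-to-rows mapping are assumptions you'd pin down for your cluster):

    import java.net.InetAddress;

    // Throws UnknownHostException; handle or declare it in real code.
    String host = InetAddress.getLocalHost().getHostName();
    int bucket = Math.floorMod(host.hashCode(), 4); // 4 VMs assumed, for illustration
    // fetch only the database rows assigned to this bucket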