Re: Storm 1.0.0 upgrade Serialization issue

2016-05-11 Thread Jungtaek Lim
Yeah, I'm also okay with it, since a topology not starting for some reason is definitely not a minor issue. I just feel that we may want to collect bugs for some period. On Thu, May 12, 2016 at 1:13 PM, P. Taylor Goetz wrote: > I'm okay with a quick turnaround release for this fix. We've

Need sample codes for querying data from Cassandra

2016-05-11 Thread Alex Chew
Hi all, I want to run some tests against Cassandra. In my case I want to query data from Cassandra using a query statement like 'select user_id,fname,lname from users'. In the storm-cassandra source code I can only find a CassandraWriterBolt. Is there a similar bolt implementation like

Re: Storm 1.0.0 upgrade Serialization issue

2016-05-11 Thread P. Taylor Goetz
I'm okay with a quick turnaround release for this fix. We've got two valid reports of it, and more will follow quickly as users continue to upgrade. -Taylor > On May 11, 2016, at 11:00 PM, Jungtaek Lim wrote: > > KB, > > Submitted pull request:

Re: Storm 1.0.0 upgrade Serialization issue

2016-05-11 Thread Jungtaek Lim
KB, Submitted pull request: https://github.com/apache/storm/pull/1412 for 1.x. Since Storm 1.0.1 was released on May 6, I feel we may want to gather more bugfixes to prepare the next version, and then go on with the release process. But if we think it's critical or even a blocker, we could initiate discussion for

Re: Benchmarks

2016-05-11 Thread cogumelosmaravilha
No, this is very time consuming! I need to write the best code to get the top speed, tune configuration files, choose the best driver, etc. I'm thinking of Hadoop and PostgreSQL. In most of the projects we need an ACID and a NoSQL database. Because Storm is so fast I can't send the data in the last

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Milind Vaidya
That was "the" thing in mind. I guess I should give it a try and then see how it performs and see how convenient it is, can't just speculate that. On Wed, May 11, 2016 at 2:44 PM, Nathan Leung wrote: > You don't have to batch the whole tuple in supervisor memory, the data is

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Milind Vaidya
I also considered that. The approach was as follows: 1. Can the existing storm-kafka setup be leveraged? 2. Is there any "proven" open source framework for the same? Spark looks like the next "best" option, keeping the paradigm the same. We also considered Secor

Benchmarks

2016-05-11 Thread cogumelosmaravilha
Hi all, I made some database benchmarks that I want to share. Source code and drivers are in Python. Kernel 4.4.10-low-latency. Hardware: Core-i7 3.6, 32GB RAM. Mock data record: 571fa68da32119f501015f5f 947a20A0A9c5d28550110E05 e71bB0597363389420459F41 2016-05-11 11:48:55.948 57 118 01

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Nathan Leung
You don't have to batch the whole tuple in supervisor memory, the data is already in Kafka. Just keep the tuple ID and write to file. When you close the file ack all of the tuple IDs. On May 11, 2016 5:42 PM, "Steven Lewis" wrote: > It sounds like you want to use Spark
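The suggestion above (keep only the tuple IDs, write the payloads to a file, ack everything once the file is closed) could be sketched roughly as follows. This is plain Python, not the Storm API: `BatchingFileWriter`, the `ack` callback and the `max_messages` limit are hypothetical names chosen for illustration. In a real bolt the callback would be the collector's ack, and the size limit would more likely be expressed in bytes.

```python
import os
import tempfile


class BatchingFileWriter:
    """Writes payloads to a local batch file and acks tuple IDs only after the file is closed."""

    def __init__(self, directory, max_messages, ack):
        self.directory = directory
        self.max_messages = max_messages  # hypothetical batch limit (count, not bytes)
        self.ack = ack                    # hypothetical callback: ack(tuple_id)
        self.pending_ids = []             # IDs written to the open file, not yet acked
        self.closed_files = []            # paths of completed batch files
        self.file = None
        self.path = None

    def write(self, tuple_id, payload):
        # Open a new batch file lazily on the first message of a batch.
        if self.file is None:
            fd, self.path = tempfile.mkstemp(dir=self.directory, suffix=".batch")
            self.file = os.fdopen(fd, "w")
        self.file.write(payload + "\n")
        self.pending_ids.append(tuple_id)  # only the ID is kept in memory
        if len(self.pending_ids) >= self.max_messages:
            self.close_batch()

    def close_batch(self):
        if self.file is None:
            return
        # Make the data durable before acknowledging anything.
        self.file.flush()
        os.fsync(self.file.fileno())
        self.file.close()
        for tuple_id in self.pending_ids:
            self.ack(tuple_id)
        self.closed_files.append(self.path)
        self.pending_ids = []
        self.file = None


# Small demonstration with an in-memory list standing in for the acker.
acked = []
workdir = tempfile.mkdtemp()
writer = BatchingFileWriter(workdir, max_messages=3, ack=acked.append)
writer.write("id-1", "a")
writer.write("id-2", "b")
acked_before_close = list(acked)  # nothing is acked while the file is still open
writer.write("id-3", "c")         # third message fills the batch and closes the file
```

If the worker dies before `close_batch` runs, none of the pending IDs have been acked, so the spout would replay those messages from Kafka; the price is that a partially written file may contain duplicates of what gets replayed.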

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Steven Lewis
It sounds like you want to use Spark / Spark Streaming to do that kind of batching output. From: Milind Vaidya > Reply-To: "user@storm.apache.org" > Date: Wednesday, May

Re: Storm cluster with time

2016-05-11 Thread cogumelosmaravilha
In the example topology there is no bottleneck. Try to find it in yours! Google "storm latency". On 11-05-2016 22:28, sam mohel wrote: Thanks for helping. But excuse me, I didn't get what big values you mean in the visualization graph? On Wednesday, May 11, 2016,

Re: Is Storm visualization enough for performance ?

2016-05-11 Thread sam mohel
Thanks, I'll try to read it. Thanks again. On Wednesday, May 11, 2016, Spico Florin wrote: > Hi! > Storm UI should give you a bird's-eye overview of the topology behavior on > your cluster. For different tools and techniques for finding performance > issues and fine tuning I

Re: Storm cluster with time

2016-05-11 Thread sam mohel
Thanks for helping. But excuse me, I didn't get what big values you mean in the visualization graph? On Wednesday, May 11, 2016, wrote: > Check the image. Where you find big values there's the bottleneck. > > > Quoting sam mohel

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Milind Vaidya
Yeah. We have some microbatching in place for other topologies. This one is a little ambitious, in the sense that each message is 1~2KB in size, so grouping them into a reasonable chunk is necessary, say 500KB ~ 1 GB (this is just my guess, I am not sure how much S3 supports or what is optimal). Once

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Nathan Leung
You can micro batch kafka contents into a file that's replicated (e.g. HDFS) and then ack all of the input tuples after the file has been closed. On Wed, May 11, 2016 at 3:43 PM, Milind Vaidya wrote: > in case of failure to upload a file or disk corruption leading to loss of

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Milind Vaidya
In case of failure to upload a file, or disk corruption leading to loss of the file, we have only the current offset in the Kafka Spout but no record of which offsets were lost with the file and need to be replayed. So these can be stored externally in ZooKeeper and could be used to account for lost

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Milind Vaidya
Well I will have a look into it. I know it is kind of conflict of interest to use storm to batch data. But S3 does not support individual message appending so needs to be batched, persisted locally and then bulk uploaded. I am just trying to explore if it is possible as we have pretty stable

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Nathan Leung
Why not just ack the tuple once it's been written to a file? If your topology fails then the data will be re-read from Kafka. Kafka spout already does this for you. Then uploading files to S3 is the responsibility of another job. For example, a storm topology that monitors the output folder.

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Milind Vaidya
It does not matter, in the sense that I am ready to upgrade if this thing is on the roadmap. Nonetheless: kafka_2.9.2-0.8.1.1, apache-storm-0.9.4. On Wed, May 11, 2016 at 5:53 AM, Abhishek Agarwal wrote: > Which version of storm-kafka are you using? > > On Wed, May 11,

New supervisor & topology scheduling

2016-05-11 Thread Simon Cooper
When a new supervisor is added to a cluster, does the scheduler get called to reschedule topologies onto the new supervisor? Using the default scheduler, this doesn’t happen (topologies stay where they are), but I’m not sure if this is due to nimbus not calling the scheduler or the scheduler

Re: Storm 1.0.0 upgrade Serialization issue

2016-05-11 Thread KB
Hi Jungtaek, Thanks for providing the snapshot build with this fix. I have verified the fix and it is working fine. Please let me know when I can expect the release with the fix. Once again, thanks a lot for looking into this. On Wed, May 11, 2016 at 8:07 AM, Jungtaek Lim

Re: Storm 1.0.0 simple UI auth

2016-05-11 Thread Abhishek Agarwal
I don't know much about hadoop authentication but the code doesn't support user.name property. https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/PseudoAuthenticationHandler.java#L39 Filter provides the

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Abhishek Agarwal
Which version of storm-kafka are you using? On Wed, May 11, 2016 at 12:29 AM, Milind Vaidya wrote: > Anybody ? Anything about this ? > > On Wed, May 4, 2016 at 11:31 AM, Milind Vaidya wrote: > >> Is there any way I can know what Kafka offset corresponds

Re: Spout Thread Waiting

2016-05-11 Thread Adrien Carreira
Hi Julien, I'm using it to have a metric of how many tuples are being processed. The problem isn't that the value is set too high, the problem is that nextTuple is not called. If you look at this screen: - Green color : Queue size (Another thread in Spout, populating tuple from ES) - Blue

Re: Storm cluster with time

2016-05-11 Thread Sai Dilip Reddy Kiralam
Check the last-10m capacity in the bolt stats; if the value is near 1, that bolt is the black sheep of your topology, so increase the parallelism of that bolt. *Best regards,* *K.Sai Dilip Reddy.* On Wed, May 11, 2016 at 1:43 PM, wrote: > Check the image. Where you find
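For context on the advice above: the capacity figure shown in the Storm UI is roughly (number of tuples executed × average execute latency) / window length, so a value near 1 means the bolt was busy for almost the whole window. A minimal Python sketch of that check follows. The bolt names, the sample numbers and the 0.9 threshold are made up for illustration; real values would be read off the UI.

```python
WINDOW_MS = 10 * 60 * 1000  # the "last 10m" window, in milliseconds


def bolt_capacity(executed, execute_latency_ms, window_ms=WINDOW_MS):
    """Fraction of the window the bolt spent executing tuples."""
    return executed * execute_latency_ms / window_ms


def bottleneck_bolts(stats, threshold=0.9):
    """Return the names of bolts whose capacity is at or above the threshold."""
    return [
        name
        for name, (executed, latency_ms) in stats.items()
        if bolt_capacity(executed, latency_ms) >= threshold
    ]


# Hypothetical numbers: bolt name -> (executed count, execute latency in ms)
example_stats = {
    "parse": (550_000, 1.0),
    "enrich": (120_000, 0.5),
}
hot = bottleneck_bolts(example_stats)
```

With these sample numbers only the hypothetical "parse" bolt crosses the threshold, so it would be the candidate for a higher parallelism hint.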

Re: hikiari pool failing

2016-05-11 Thread Sai Dilip Reddy Kiralam
Hi, I'm also looking for an answer to the same question. *Best regards,* *K.Sai Dilip Reddy.* On Wed, May 11, 2016 at 3:30 PM, sujitha chinnu wrote: > Hi, > > I tried the example code of creating table and inserting the values to the > table using storm jdbc example

Re: Is Storm visualization enough for performance ?

2016-05-11 Thread Spico Florin
Hi! Storm UI should give you a bird's-eye overview of the topology behavior on your cluster. For different tools and techniques for finding performance issues and fine tuning, I recommend reading the book "Storm Applied", specifically the chapters that cover these subjects. https://www.safaribooksonline.com/ They