What are the use cases where technologies like Kafka, Storm, Flink, Hive, Hadoop and Spark differentiate?

2018-06-27 Thread Malik, Shibha (GE Renewable Energy, consultant)
Hi all, What are the use cases where technologies like Kafka, Storm, Flink, Hive, Hadoop and Spark differentiate? Is there good material online, or a book to refer to for this? Thanks, Shibha

Re: Kafka-Storm: troubleshooting low R/W throughput

2015-03-22 Thread Manu Zhang
Hi Emmanuel, You can first run a Kafka producer perf test (bin/kafka-producer-perf-test.sh) with your Storm consumers, and a Kafka consumer perf test (bin/kafka-consumer-perf-test.sh) with your own producers, respectively, to see if the bottleneck is really in Kafka. Thanks, Manu Zhang On Mon,
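The advice above is to baseline Kafka in isolation before blaming Storm. A sketch of how the two bundled perf scripts are typically invoked; the broker address and topic name are placeholders, and exact flag names vary by Kafka version (0.8-era scripts used e.g. `--broker-list`), so check `--help` on your distribution:

```shell
# Producer side: how fast can the cluster absorb writes?
bin/kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 1000000 \
  --record-size 100 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092

# Consumer side: how fast can the cluster serve reads?
bin/kafka-consumer-perf-test.sh \
  --bootstrap-server localhost:9092 \
  --topic perf-test \
  --messages 1000000
```

If both numbers comfortably exceed what the Storm topology achieves, the bottleneck is downstream of Kafka.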

Re: Kafka-Storm: troubleshooting low R/W throughput

2015-03-22 Thread Harsha
Hi Emmanuel, Can you post your Kafka server.properties? And in your producer, are you distributing your messages across all Kafka topic partitions? -- Harsha On March 20, 2015 at 12:33:02 PM, Emmanuel (ele...@msn.com) wrote: Kafka on test cluster: 2 Kafka nodes, 2GB, 2CPUs 3 Zookeeper
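Harsha's question matters because if every message lands on one partition, throughput caps at a single broker and a single consumer. A minimal sketch of the idea (Kafka's real default partitioner uses murmur2 on the serialized key; the built-in `hash()` here is only for illustration):

```python
# Illustration: keyed messages spread across partitions under hash
# partitioning. A constant key (or hard-coded partition) would funnel
# everything into one partition.
def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's murmur2-based default partitioner.
    return hash(key) % num_partitions

NUM_PARTITIONS = 5
counts = {p: 0 for p in range(NUM_PARTITIONS)}
for i in range(10_000):
    counts[partition_for(f"sensor-{i}", NUM_PARTITIONS)] += 1

print(counts)  # roughly even spread across the 5 partitions
```

With distinct keys, load spreads evenly; with one fixed key, only one counter would be non-zero.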

Kafka-Storm: troubleshooting low R/W throughput

2015-03-20 Thread Emmanuel
Kafka on test cluster: 2 Kafka nodes, 2GB, 2 CPUs; 3 Zookeeper nodes, 2GB, 2 CPUs. Storm: 3 nodes, 3 CPUs each, on the same Zookeeper cluster as Kafka. 1 topic, 5 partitions, replication x2. Whether I use 1 slot for the Kafka Spout or 5 slots (=#partitions), the throughput seems about the same. I can't
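One reason spout parallelism may not change throughput: each Kafka partition is consumed by at most one spout task, so consumer parallelism is capped at the partition count, and tasks beyond it sit idle. A sketch under a simple round-robin assignment (illustrative, not Storm's actual scheduler):

```python
# Assign partitions to consumer tasks round-robin, to show the cap:
# parallelism beyond the partition count buys nothing.
def assign(partitions: int, tasks: int) -> dict:
    assignment = {t: [] for t in range(tasks)}
    for p in range(partitions):
        assignment[p % tasks].append(p)
    return assignment

print(assign(5, 1))  # one task reads all 5 partitions
print(assign(5, 8))  # tasks 5-7 are assigned nothing
```

If 1 slot and 5 slots perform the same, the bottleneck is likely elsewhere (broker I/O, network, or downstream bolts) rather than spout parallelism.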

Re: Help is processing huge data through Kafka-storm cluster

2014-06-19 Thread Shaikh Ahmed
Hi All, Thanks for your valuable comments. Sure, I will give Samza and DataTorrent a try. Meanwhile, I am sharing a screenshot of the Storm UI. Please have a look at it. The Kafka producer is able to push 35 million messages to the broker in two hours, at a rate of approx. 4k messages per second. On other
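A quick sanity check on the quoted rate (35 million messages over two hours), which comes out slightly under 5k/sec:

```python
# Back-of-envelope on the producer rate stated above.
messages = 35_000_000
seconds = 2 * 60 * 60  # two hours
rate = messages / seconds
print(f"{rate:,.0f} msgs/sec")  # prints: 4,861 msgs/sec
```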

Re: Help is processing huge data through Kafka-storm cluster

2014-06-19 Thread hsy...@gmail.com
To clarify my last email: by 10 nodes, I mean 10 Kafka partitions distributed across 10 different brokers. In my test, DataTorrent can scale up linearly with Kafka partitions without any problem. Whatever you produce to Kafka, it can easily ingest into your application. And I'm quite sure it can

Re: Help is processing huge data through Kafka-storm cluster

2014-06-17 Thread hsy...@gmail.com
Hi Shaikh, I have heard of some throughput bottlenecks in Storm; it cannot really scale up with Kafka. I recommend you try the DataTorrent platform (https://www.datatorrent.com/). The platform itself is not open source, but it has an open-source library (https://github.com/DataTorrent/Malhar) which contains

Re: Help is processing huge data through Kafka-storm cluster

2014-06-17 Thread Robert Rodgers
We have been experimenting with Samza, which is also worth a look. It's basically a topic-to-topic node on YARN. On Jun 17, 2014, at 10:44 AM, hsy...@gmail.com wrote: Hi Shaikh, I have heard of some throughput bottlenecks in Storm; it cannot really scale up with Kafka. I recommend you to try

Re: Help is processing huge data through Kafka-storm cluster

2014-06-17 Thread Neha Narkhede
Samza is an open source stream processing framework built on top of Kafka and YARN. It is high-throughput, scalable, and has built-in state management and fault-tolerance support. Though I may be biased, it is worth taking a look :-) Thanks, Neha On Tue, Jun 17, 2014 at 10:55 AM, Robert Rodgers

Re: Help is processing huge data through Kafka-storm cluster

2014-06-15 Thread pushkar priyadarshi
What throughput are you getting from your Kafka cluster alone? Storm throughput can depend on what processing you are actually doing inside it, so you must look at each component, starting with Kafka first. Regards, Pushkar On Sat, Jun 14, 2014 at 8:44 PM, Shaikh Ahmed rnsr.sha...@gmail.com

Re: Help is processing huge data through Kafka-storm cluster

2014-06-15 Thread pushkar priyadarshi
And one more thing: using Kafka metrics you can easily monitor at what rate you are able to publish to Kafka, and at what speed your consumer (in this case your spout) is able to drain messages out of Kafka. It's possible that, due to slow draining, even the publishing rate in the worst case might get
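The point above is that a slow consumer shows up as ever-growing backlog (consumer lag). A toy model of it, with made-up rates, just to make the failure mode concrete:

```python
# If the spout drains slower than the producer publishes, backlog
# grows linearly and without bound. Rates here are illustrative.
def lag_after(seconds: int, publish_rate: float, drain_rate: float) -> float:
    """Backlog in messages after `seconds`, assuming constant rates."""
    return max(0.0, (publish_rate - drain_rate) * seconds)

print(lag_after(60, publish_rate=5000, drain_rate=4000))  # 60000.0 -- a full minute of backlog
print(lag_after(60, publish_rate=4000, drain_rate=5000))  # 0.0 -- consumer keeps up
```

Comparing the publish-rate and fetch-rate metrics over time tells you which side of this inequality you are on.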

Re: Help is processing huge data through Kafka-storm cluster

2014-06-15 Thread Robert Hodges
Hi Riyaz, There are a number of reasons that you may be getting low performance. Here are some questions to get started: 1. How big are your messages? To meet your throughput requirement you need a minimum of 10K messages per second continuously. You specified a replication factor of 3 so at

Re: Help is processing huge data through Kafka-storm cluster

2014-06-15 Thread Robert Hodges
+1 for detailed examination of metrics. You can see the main metrics here: https://kafka.apache.org/documentation.html#monitoring Jconsole is very helpful for looking quickly at what is going on. Cheers, Robert On Sun, Jun 15, 2014 at 7:49 AM, pushkar priyadarshi

Help is processing huge data through Kafka-storm cluster

2014-06-14 Thread Shaikh Ahmed
Hi, Daily we download 28 million messages, and monthly it goes up to 800+ million. We want to process this amount of data through our Kafka and Storm cluster and would like to store it in an HBase cluster. We are targeting to process one month of data in one day. Is it possible? We have
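A back-of-envelope on the stated target (800M+ messages in a single day), which is where the roughly 10K messages/second figure mentioned earlier in the thread comes from:

```python
# Required sustained rate to push one month of data through in one day.
messages = 800_000_000
seconds_per_day = 24 * 60 * 60
required_rate = messages / seconds_per_day
print(f"{required_rate:,.0f} msgs/sec sustained")  # prints: 9,259 msgs/sec sustained
```

Any pause for GC, rebalancing, or HBase compaction raises the rate needed during the remaining time, so the cluster must be sized with headroom above this average.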

Re: Kafka-Storm Run-time Exception

2014-06-06 Thread Abhishek Bhattacharjee
dependency included with kafka_2.9.2-0.8.1 and has more than one dependency that uses Scala 2.8.x. If this is where it's coming from, a solution is to just exclude jline from the kafka/storm-kafka dependency in your pom.xml. ex: <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka_2.9.2

Kafka-Storm Run-time Exception

2014-06-05 Thread Abhishek Bhattacharjee
I am using Kafka with Storm. I am using Maven to build my topology, and I am using Scala 2.9.2, the same as kafka_2.9.2-0.8.1. The topology builds perfectly using Maven. But when I submit the topology to Storm I get the following exception: java.lang.NoSuchMethodError:

Re: Kafka-Storm Run-time Exception

2014-06-05 Thread Andrew Neilson
it's coming from, a solution is to just exclude jline from the kafka/storm-kafka dependency in your pom.xml. ex: <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka_2.9.2</artifactId> <version>0.8.1</version> <exclusions> <exclusion> <groupId>jline</groupId> <artifactId>jline
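For reference, the truncated pom.xml fragment above, reconstructed with the closing tags it implies (coordinates and version exactly as given in the thread):

```xml
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.9.2</artifactId>
  <version>0.8.1</version>
  <exclusions>
    <exclusion>
      <groupId>jline</groupId>
      <artifactId>jline</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Excluding jline keeps the Scala-2.8-linked transitive jar off the topology's classpath, which is what triggers the NoSuchMethodError.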

kafka-storm-starter released: code examples that integrate Kafka 0.8 and Storm 0.9

2014-05-23 Thread Michael G. Noll
Hi everyone, to sweeten the upcoming long weekend I have released code examples that show how to integrate Kafka 0.8+ with Storm 0.9+, while using Apache Avro as the data serialization format. https://github.com/miguno/kafka-storm-starter Since the integration of the latest Kafka and Storm

Re: kafka-storm-starter released: code examples that integrate Kafka 0.8 and Storm 0.9

2014-05-23 Thread Neha Narkhede
+, while using Apache Avro as the data serialization format. https://github.com/miguno/kafka-storm-starter Since the integration of the latest Kafka and Storm versions have been a popular topic on the mailing lists (read: many questions/threads) I hope that this will help a little to not only

kafka + storm

2014-01-01 Thread S Ahmed
not be suitable, right? I first learned of Kafka + Storm from a post by someone from Loggly, but Loggly can process items randomly, I would imagine, because at the end of the day each log item is timestamped, so after it is processed and indexed things would be fine. But if your use case is such that processing

Re: kafka + storm

2014-01-01 Thread Joseph Lawson
is very important, so using something like Storm would not be suitable, right? I first learned of Kafka + Storm from a post by someone from Loggly, but Loggly can process items randomly, I would imagine, because at the end of the day each log item is timestamped, so after it is processed and indexed

Re: kafka + storm

2014-01-01 Thread Saulius Zemaitaitis
, but just a quick question: Storm seems to have all these workers, but the way it seems to me, the order in which these items are processed off the queue is very random, correct? In my use case order is very important, so using something like Storm would not be suitable, right? I first learned of kafka
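On the ordering question running through this thread: Kafka guarantees order only *within* a partition, so the usual answer is to key all events for a given entity to the same partition; they are then consumed in publish order even though partitions are processed in parallel. A sketch of that idea (names are illustrative, not a Kafka API):

```python
# Toy model: a keyed "send" appends to one of N partitions, so all
# events for one key land in a single partition, in publish order.
from collections import defaultdict

NUM_PARTITIONS = 3
partitions = defaultdict(list)

def send(key: str, value: int) -> None:
    partitions[hash(key) % NUM_PARTITIONS].append((key, value))

for seq in range(5):
    send("user-42", seq)  # interleaved with another key...
    send("user-99", seq)

# ...yet within any partition, each key's events stay in order:
for msgs in partitions.values():
    per_key = [v for k, v in msgs if k == "user-42"]
    assert per_key == sorted(per_key)
```

Global order across keys is not preserved, but per-key order usually is what "order matters" actually requires.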

Application Logic: In Kafka, Storm or Redis?

2013-08-28 Thread Yavar Husain
the points for a particular time, then only do I give it to Kafka/Storm. I am confused :) Any help would be appreciated. Sorry for any grammatical errors; I was just thinking aloud and jotting down my question. Regards, Yavar

Re: Application Logic: In Kafka, Storm or Redis?

2013-08-28 Thread Travis Brady
. Where should I place my application logic: (1) in Kafka, (2) in Storm, or should I use something like Redis to get all the timestamped data, and only when I get all the points for a particular time give it to Kafka/Storm? I am confused :) Any help would be appreciated. Sorry for any
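A minimal sketch of the "buffer first, then hand off" option from the question: accumulate timestamped points in a dict (standing in for Redis) and emit a window downstream only once it is complete. `EXPECTED_POINTS` and all names here are assumptions for illustration, not from the thread:

```python
# Buffer points per timestamp; release a window to Kafka/Storm only
# when all its points have arrived.
from collections import defaultdict

EXPECTED_POINTS = 3          # hypothetical: points that make a window "complete"
buffer = defaultdict(list)   # timestamp -> points received so far
emitted = []                 # stand-in for the hand-off to Kafka/Storm

def on_point(ts: int, value: float) -> None:
    buffer[ts].append(value)
    if len(buffer[ts]) == EXPECTED_POINTS:
        emitted.append((ts, buffer.pop(ts)))  # window complete: hand it off

for ts, v in [(1, 0.1), (2, 0.5), (1, 0.2), (1, 0.3), (2, 0.6)]:
    on_point(ts, v)

print(emitted)  # window ts=1 is complete and emitted; ts=2 still buffered
```

The same logic could equally live in a Storm bolt with windowing, which avoids the extra Redis hop; buffering outside makes sense mainly when completeness must be checked before any stream processing starts.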