Hi all,
What are the use cases that differentiate technologies like Kafka, Storm,
Flink, Hive, Hadoop, and Spark?
Is there a good material online or book to refer for this ?
Thanks,
Shibha
Hi Emmanuel,
You can first run the Kafka producer perf test (bin/kafka-producer-perf-test.sh)
with your Storm consumers, and the Kafka consumer perf test
(bin/kafka-consumer-perf-test.sh) with your own producers, to see whether the
bottleneck is really in Kafka.
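For example, the invocations might look like the following. The broker list, ZooKeeper address, topic name, and message count are placeholders (not from this thread), and the exact flags vary across Kafka versions, so check each script's --help first; the commands are echoed here so they can be reviewed before running:

```shell
# Placeholder cluster settings (adjust to your environment).
BROKERS="localhost:9092"
ZK="localhost:2181"
TOPIC="storm-ingest"
NUM_MESSAGES=1000000

# Producer-side baseline: how fast the brokers can absorb writes.
PRODUCER_CMD="bin/kafka-producer-perf-test.sh --broker-list $BROKERS --topics $TOPIC --messages $NUM_MESSAGES"
echo "$PRODUCER_CMD"

# Consumer-side baseline: how fast messages can be drained.
CONSUMER_CMD="bin/kafka-consumer-perf-test.sh --zookeeper $ZK --topic $TOPIC --messages $NUM_MESSAGES"
echo "$CONSUMER_CMD"
```

If both baselines comfortably exceed the rates you see end-to-end, the bottleneck is more likely in the Storm topology than in Kafka.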
Thanks,
Manu Zhang
On Mon,
Hi Emmanuel,
Can you post your Kafka server.properties? Also, in your producer, are you
distributing your messages across all Kafka topic partitions?
--
Harsha
On March 20, 2015 at 12:33:02 PM, Emmanuel (ele...@msn.com) wrote:
Kafka on test cluster:
2 Kafka nodes, 2GB, 2 CPUs
3 Zookeeper nodes, 2GB, 2 CPUs
Storm: 3 nodes, 3 CPUs each, on the same Zookeeper cluster as Kafka.
1 topic, 5 partitions, replication x2
Whether I use 1 slot for the Kafka Spout or 5 slots (=#partitions), the
throughput seems about the same.
I can't
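One thing worth keeping in mind when comparing those two runs: each Kafka partition is consumed by at most one spout task, so spout parallelism beyond the partition count buys nothing; and if 1 slot and 5 slots give the same throughput, the bottleneck is likely not the spout at all. A minimal sketch of that ceiling (partition count taken from the message above, slot counts illustrative):

```shell
PARTITIONS=5

# Active spout tasks are capped at the partition count; extra tasks sit idle.
for SLOTS in 1 5 10; do
  ACTIVE=$(( SLOTS < PARTITIONS ? SLOTS : PARTITIONS ))
  echo "slots=$SLOTS -> active spout tasks=$ACTIVE"
done
```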
Hi All,
Thanks for your valuable comments.
Sure, I will give Samza and DataTorrent a try.
Meanwhile, I am sharing a screenshot of the Storm UI. Please have a look at it.
The Kafka producer is able to push 35 million messages to the broker in two
hours, at a rate of approx. 4k messages per second. On other
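As a quick sanity check on those figures (35 million messages over two hours, taken from the message above), the integer average works out slightly higher than the quoted 4k:

```shell
MESSAGES=35000000
ELAPSED=$(( 2 * 3600 ))          # two hours, in seconds
RATE=$(( MESSAGES / ELAPSED ))   # integer average, roughly 4.9k msg/s
echo "average producer rate: $RATE messages/second"
```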
To clarify my last email: by 10 nodes, I mean 10 Kafka partitions distributed
across 10 different brokers. In my test, DataTorrent can scale up linearly
with Kafka partitions without any problem. Whatever you produce to Kafka, it
can easily ingest into your application. And I'm quite sure it can
Hi Shaikh,
I have heard of throughput bottlenecks in Storm; it cannot really scale up
with Kafka.
I recommend you try the DataTorrent platform (https://www.datatorrent.com/).
The platform itself is not open source, but it has an open-source library
(https://github.com/DataTorrent/Malhar) which contains
We have been experimenting with Samza, which is also worth a look. It's
basically a topic-to-topic node on YARN.
On Jun 17, 2014, at 10:44 AM, hsy...@gmail.com wrote:
Hi Shaikh,
I heard some throughput bottleneck of storm. It cannot really scale up with
kafka.
I recommend you to try
Samza is an open-source stream processing framework built on top of Kafka
and YARN. It is high-throughput, scalable, and has built-in state management
and fault-tolerance support. Though I may be biased, it is worth taking a
look :-)
Thanks,
Neha
On Tue, Jun 17, 2014 at 10:55 AM, Robert Rodgers
What throughput are you getting from your Kafka cluster alone? Storm
throughput can depend on what processing you are actually doing inside it,
so you must look at each component, starting with Kafka first.
Regards,
Pushkar
On Sat, Jun 14, 2014 at 8:44 PM, Shaikh Ahmed rnsr.sha...@gmail.com
And one more thing: using Kafka metrics you can easily monitor at what rate
you are able to publish to Kafka, and at what speed your consumer (in this
case your spout) is able to drain messages out of Kafka. It's possible that,
due to slow draining, even the publishing rate in the worst case might get
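The comparison described above reduces to watching whether lag grows, i.e. whether the publish rate exceeds the drain rate over time. A toy sketch with hypothetical rates (illustrative numbers, not measurements from this thread):

```shell
PUBLISH_RATE=4000   # msgs/sec produced into Kafka (hypothetical)
DRAIN_RATE=3000     # msgs/sec consumed by the spout (hypothetical)

# Positive growth means the consumer is falling further behind every second.
LAG_GROWTH=$(( PUBLISH_RATE - DRAIN_RATE ))
if [ "$LAG_GROWTH" -gt 0 ]; then
  echo "consumer falling behind by $LAG_GROWTH messages/second"
else
  echo "consumer keeping up"
fi
```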
Hi Riyaz,
There are a number of reasons that you may be getting low performance.
Here are some questions to get started:
1. How big are your messages? To meet your throughput requirement you need
a minimum of 10K messages per second continuously. You specified a
replication factor of 3 so at
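The arithmetic behind that minimum, using the figures from Shaikh's message (800+ million messages to be processed within one day) and the stated replication factor of 3:

```shell
MESSAGES_PER_DAY=800000000
SECONDS_PER_DAY=86400

# Roughly 9.3k msg/s sustained, hence the "minimum of 10K" figure.
RATE=$(( MESSAGES_PER_DAY / SECONDS_PER_DAY ))
# With replication factor 3, each message is written 3 times across the brokers.
WRITES_WITH_REPLICATION=$(( RATE * 3 ))
echo "required rate: $RATE msg/s; replicated broker writes: $WRITES_WITH_REPLICATION msg/s"
```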
+1 for detailed examination of metrics. You can see the main metrics here:
https://kafka.apache.org/documentation.html#monitoring
JConsole is very helpful for looking quickly at what is going on.
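For example, one way to get JConsole attached (assuming the broker scripts honor the JMX_PORT environment variable, as the 0.8-era kafka-run-class.sh does; the host and port below are placeholders). The commands are echoed here so they can be reviewed before running:

```shell
# Start the broker with remote JMX enabled on a chosen port.
START_CMD="JMX_PORT=9999 bin/kafka-server-start.sh config/server.properties"
echo "$START_CMD"

# Point JConsole at the broker to browse the kafka.* MBeans.
JCONSOLE_CMD="jconsole broker-host:9999"
echo "$JCONSOLE_CMD"
```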
Cheers, Robert
On Sun, Jun 15, 2014 at 7:49 AM, pushkar priyadarshi
Hi,
Daily we download 28 million messages, and monthly it goes up to 800+
million.
We want to process this amount of data through our Kafka and Storm clusters
and would like to store it in an HBase cluster.
We are targeting to process one month of data in one day. Is it possible?
We have
dependency included with
kafka_2.9.2-0.8.1 and has more than one dependency that uses scala 2.8.x.
If this is where it's coming from, a solution is to just exclude jline from
the kafka/storm-kafka dependency in your pom.xml. ex:
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.9.2</artifactId>
  <version>0.8.1</version>
  <exclusions>
    <exclusion>
      <groupId>jline</groupId>
      <artifactId>jline</artifactId>
    </exclusion>
  </exclusions>
</dependency>
I am using Kafka with Storm. I am using Maven to build my topology, and I am
using Scala 2.9.2, same as kafka_2.9.2-0.8.1.
The topology builds perfectly using Maven, but when I submit the topology to
Storm I get the following exception:
java.lang.NoSuchMethodError:
If this is where it's coming from, a solution is to just exclude jline from
the kafka/storm-kafka dependency in your pom.xml. ex:
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.9.2</artifactId>
  <version>0.8.1</version>
  <exclusions>
    <exclusion>
      <groupId>jline</groupId>
      <artifactId>jline</artifactId>
    </exclusion>
  </exclusions>
</dependency>
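Before adding the exclusion, it can help to confirm which dependency is actually pulling in jline; Maven's standard dependency:tree goal supports filtering for a single artifact (echoed here as a sketch to run in your project directory):

```shell
# Show only the dependency-tree paths that include jline.
TREE_CMD="mvn dependency:tree -Dincludes=jline"
echo "$TREE_CMD"
```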
Hi everyone,
to sweeten the upcoming long weekend I have released code examples that
show how to integrate Kafka 0.8+ with Storm 0.9+, while using Apache Avro
as the data serialization format.
https://github.com/miguno/kafka-storm-starter
Since the integration of the latest Kafka and Storm versions has been a
popular topic on the mailing lists (read: many questions/threads), I hope
that this will help a little to not only
, but just a quick question: Storm seems to have all these workers, but the
way it seems to me, the order in which these items are processed off the
queue is very random, correct?
In my use case order is very important, so using something like Storm would
not be suitable, right?
I first learned of Kafka + Storm based on a post by someone from Loggly, but
Loggly can process items randomly, I would imagine, because at the end of
the day each log item is timestamped, so after it is processed and indexed
things would be fine. But if your use case is such that processing
Where should I place my application logic: (1) in Kafka, (2) in Storm, or
should I use something like Redis to get all the timestamped data, and only
when I have all the points for a particular time do I give it to
Kafka/Storm?
I am confused :) Any help would be appreciated. Sorry for any grammatical
errors, as I was just thinking aloud and jotting down my question.
Regards,
Yavar