And one more thing: using Kafka metrics you can easily monitor the rate at
which you are able to publish to Kafka and the speed at which your consumer
(in this case your spout) is able to drain messages out of Kafka. It is
possible that a slow drain will, in the worst case, even affect the
publishing rate: if the consumer lags too far behind, consuming the older
messages will result in disk seeks.
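Lag on a partition is just the broker's log-end offset minus the consumer's
committed offset; a minimal sketch of that bookkeeping (the offset numbers
below are made up for illustration — on a real 0.8-era cluster you would read
them from JMX or the ConsumerOffsetChecker tool):

```java
public class LagCheck {
    // Consumer lag on a partition: how far the consumer trails the log head.
    static long lag(long logEndOffset, long committedOffset) {
        return logEndOffset - committedOffset;
    }

    public static void main(String[] args) {
        // Hypothetical offsets for illustration only.
        long logEndOffset = 5_000_000L;    // newest offset on the broker
        long committedOffset = 4_200_000L; // spout's committed position
        System.out.println("lag = " + lag(logEndOffset, committedOffset));
        // A lag that grows over time means the spout is not keeping up,
        // and reads will eventually miss the page cache and hit disk.
    }
}
```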


On Sun, Jun 15, 2014 at 8:16 PM, pushkar priyadarshi <
priyadarshi.push...@gmail.com> wrote:

> What throughput are you getting from your Kafka cluster alone? Storm
> throughput can depend on what processing you are actually doing inside
> it, so you must look at each component, starting with Kafka first.
>
> Regards,
> Pushkar
>
>
> On Sat, Jun 14, 2014 at 8:44 PM, Shaikh Ahmed <rnsr.sha...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Daily we download 28 million messages, and monthly it adds up to 800+
>> million.
>>
>> We want to process this volume of data through our Kafka and Storm
>> clusters and store it in an HBase cluster.
>>
>> We are aiming to process one month of data in one day. Is that possible?
>>
>> We set up our cluster expecting to process a million messages per second,
>> as mentioned on the web. Unfortunately, we have ended up processing only
>> 1,200-1,700 messages per second. If we continue at this speed it will
>> take at least 10 days to process 30 days of data, which is not a viable
>> solution in our case.
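>> As a rough sanity check on the target: 800 million messages spread over
>> one day works out to roughly 9,259 messages per second, about six to
>> eight times our current 1,200-1,700. A quick sketch of the arithmetic:

```java
public class RateCheck {
    // Messages per second needed to push a month's volume through in a day.
    static long requiredMsgsPerSec(long monthlyMessages) {
        return monthlyMessages / (24L * 60 * 60); // 86,400 seconds per day
    }

    public static void main(String[] args) {
        long required = requiredMsgsPerSec(800_000_000L);
        System.out.println("required: ~" + required + " msg/s");
        System.out.println("current : 1200-1700 msg/s");
    }
}
```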
>>
>> I suspect that we have to change some configuration to achieve this goal,
>> and I am looking for help from experts to get there.
>>
>> *Kafka Cluster:*
>> Kafka is running on two dedicated machines with 48 GB of RAM and 2 TB of
>> storage. Our Kafka cluster has 11 broker nodes in total, spread across
>> these two servers.
>>
>> *Kafka Configuration:*
>> producer.type=async
>> compression.codec=none
>> request.required.acks=-1
>> serializer.class=kafka.serializer.StringEncoder
>> queue.buffering.max.ms=100000
>> batch.num.messages=10000
>> queue.buffering.max.messages=100000
>> default.replication.factor=3
>> controlled.shutdown.enable=true
>> auto.leader.rebalance.enable=true
>> num.network.threads=2
>> num.io.threads=8
>> num.partitions=4
>> log.retention.hours=12
>> log.segment.bytes=536870912
>> log.retention.check.interval.ms=60000
>> log.cleaner.enable=false
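>> For reference, the first seven lines above are 0.8-era producer settings
>> and the rest are broker settings; the producer subset, as it might be
>> handed to a producer config (a sketch, not our actual code):

```java
import java.util.Properties;

public class ProducerSettings {
    // The producer-side subset of the configuration listed above.
    static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("producer.type", "async");
        p.setProperty("compression.codec", "none");
        p.setProperty("request.required.acks", "-1"); // wait for all replicas
        p.setProperty("serializer.class", "kafka.serializer.StringEncoder");
        p.setProperty("queue.buffering.max.ms", "100000");
        p.setProperty("batch.num.messages", "10000");
        p.setProperty("queue.buffering.max.messages", "100000");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(producerProps());
    }
}
```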
>>
>> *Storm Cluster:*
>> Storm is running with 5 supervisors and 1 nimbus on IBM servers with 48 GB
>> of RAM and 8 TB of storage. These servers are shared with the HBase cluster.
>>
>> *Kafka spout configuration*
>> kafkaConfig.bufferSizeBytes = 1024*1024*8;
>> kafkaConfig.fetchSizeBytes = 1024*1024*4;
>> kafkaConfig.forceFromStart = true;
>>
>> *Topology: StormTopology*
>> Spout        - partitions: 4
>> First bolt   - parallelism hint: 6, num tasks: 5
>> Second bolt  - parallelism hint: 5
>> Third bolt   - parallelism hint: 3
>> Fourth bolt  - parallelism hint: 3, num tasks: 4
>> Fifth bolt   - parallelism hint: 3
>> Sixth bolt   - parallelism hint: 3
>>
>> *Supervisor configuration:*
>>
>> storm.local.dir: "/app/storm"
>> storm.zookeeper.port: 2181
>> storm.cluster.mode: "distributed"
>> storm.local.mode.zmq: false
>> supervisor.slots.ports:
>>     - 6700
>>     - 6701
>>     - 6702
>>     - 6703
>> supervisor.worker.start.timeout.secs: 180
>> supervisor.worker.timeout.secs: 30
>> supervisor.monitor.frequency.secs: 3
>> supervisor.heartbeat.frequency.secs: 5
>> supervisor.enable: true
>>
>> storm.messaging.netty.server_worker_threads: 2
>> storm.messaging.netty.client_worker_threads: 2
>> storm.messaging.netty.buffer_size: 52428800 #50MB buffer
>> storm.messaging.netty.max_retries: 25
>> storm.messaging.netty.max_wait_ms: 1000
>> storm.messaging.netty.min_wait_ms: 100
>>
>>
>> supervisor.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
>> worker.childopts: "-Xmx2048m -Djava.net.preferIPv4Stack=true"
>>
>>
>> Please let me know if more information is needed.
>>
>> Thanks in advance.
>>
>> Regards,
>> Riyaz
>>
>
>
