Re: Transition from kafka 0.8 to 0.10

2018-10-01 Thread Milind Vaidya
tConfig.setProp. The properties are documented at > https://kafka.apache.org/documentation/#newconsumerconfigs. > > Den tir. 18. sep. 2018 kl. 00.17 skrev Milind Vaidya : > >> Hi >> >> We had been using kafka 0.8 with Storm. It was upgraded to >> kafka_2.11-0.10.0
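The reply above is truncated by the archive; as a hedged sketch of what setting consumer properties on the new spout looks like (assuming storm-kafka-client 1.2.x, where KafkaSpoutConfig.Builder exposes setProp; the broker, topic, and group id below are placeholders):

    import org.apache.storm.kafka.spout.KafkaSpout;
    import org.apache.storm.kafka.spout.KafkaSpoutConfig;

    // Any key from the new-consumer configuration page can be passed through setProp().
    KafkaSpoutConfig<String, String> spoutConfig =
        KafkaSpoutConfig.builder("broker1:9092", "events")     // placeholder broker/topic
            .setProp("group.id", "storm-events-reader")        // placeholder group id
            .setProp("max.poll.records", 200)
            .build();
    KafkaSpout<String, String> spout = new KafkaSpout<>(spoutConfig);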

Capturing Built in metrics with V2

2018-09-24 Thread Milind Vaidya
Hi I am trying to use the metrics support in Storm 1.2.2. As mentioned in the documentation, the conventional metrics support will be deprecated. Does this mean the support for capturing built-in metrics will go away as well? Is there any way to capture built-in metrics with V2? Thanks, Milind
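Storm 1.2 ships the new Dropwizard-based metrics (V2) alongside the old IMetric API; whether every built-in metric is bridged to V2 depends on the release, so treat the following as a sketch of the custom-metric side only (assuming TopologyContext.registerCounter as in the 1.2 metrics documentation):

    import java.util.Map;
    import com.codahale.metrics.Counter;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    public class CountingBolt extends BaseRichBolt {
        private transient Counter processed;   // Dropwizard counter owned by Storm
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            // V2 metrics are registered on the TopologyContext and shipped by
            // whichever metrics reporters the cluster configures.
            this.processed = context.registerCounter("events-processed");
        }

        @Override
        public void execute(Tuple tuple) {
            processed.inc();
            collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no output streams
        }
    }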

Transition from kafka 0.8 to 0.10

2018-09-17 Thread Milind Vaidya
Hi We had been using kafka 0.8 with Storm. It was upgraded to kafka_2.11-0.10.0.1 and Storm 1.1.1 as of now. Though the libraries changed, the code pretty much remained the same. Now we are trying to upgrade to version 1.2.2 of Storm and also look into KafkaSpoutRetryService. This also leads to us
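For the KafkaSpoutRetryService part, a minimal sketch of wiring an exponential-backoff retry policy into the new spout (assuming storm-kafka-client 1.2.x; the interval values are illustrative, not recommendations):

    import org.apache.storm.kafka.spout.KafkaSpoutConfig;
    import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff;
    import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff.TimeInterval;
    import org.apache.storm.kafka.spout.KafkaSpoutRetryService;

    // 500 ms initial delay, 2 ms increment, unlimited retries, capped at 10 s.
    KafkaSpoutRetryService retry = new KafkaSpoutRetryExponentialBackoff(
        TimeInterval.milliSeconds(500),
        TimeInterval.milliSeconds(2),
        Integer.MAX_VALUE,
        TimeInterval.seconds(10));

    KafkaSpoutConfig<String, String> conf =
        KafkaSpoutConfig.builder("broker1:9092", "events")     // placeholders
            .setRetry(retry)
            .build();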

Re: Storm-Hive : lock acquiring problem

2018-06-12 Thread Milind Vaidya
h hive release they want to use storm-hive with. The > documentation for storm-hive should also be updated to reflect this > requirement. > > Happy to provide PRs if that sounds like a good idea. > > Thanks. > > On Fri, Jun 8, 2018 at 3:21 PM, Abhishek Raj > wrote: > >>

Re: Storm-Hive : lock acquiring problem

2018-06-07 Thread Milind Vaidya
On Thursday, June 7, 2018, 11:08 AM, Milind Vaidya wrote: > Hi > I am using storm and storm-hive version 1.1.1 to store data directly to the > hive cluster. > After using the mvn shade plugin and overcoming a few other errors I am now >

Storm-Hive : lock acquiring problem

2018-06-07 Thread Milind Vaidya
Hi I am using storm and storm-hive version 1.1.1 to store data directly to the hive cluster. After using the mvn shade plugin and overcoming a few other errors I am now stuck at this point. The strange thing observed was that a few partitions were created but the data was not inserted. dt=17688/platform=site/c

Too many tuples failing in KafkaSpout

2017-05-15 Thread Milind Vaidya
I have a Kafka - Kafka Spout - Storm Bolts setup. It processes heavy data (well, it is supposed to). I am accumulating it in files and eventually moving them to an "uploading" directory. Another bolt uploads them to S3. If anything happens to a file: say an IO error, a file open/close error, a transfer e
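When a bolt does slow file and S3 work, tuples often fail simply because they time out; a hedged sketch of the two knobs usually tuned first (values are illustrative):

    import org.apache.storm.Config;

    Config conf = new Config();
    // Allow more time before in-flight tuples are failed as timed out (default 30 s).
    conf.setMessageTimeoutSecs(120);
    // Bound how many tuples the spout keeps in flight so slow bolts are not swamped.
    conf.setMaxSpoutPending(500);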

Re: Sharing resources across tasks

2017-04-26 Thread Milind Vaidya
Hi > > In the current topology setup with 3 bolts, reading from a kafka spout, the > config is such that there are multiple tasks within a worker. > > So 1 kafka spout + 3 bolts = min 4 executors in a worker, and then each > executor has multiple tasks. (Please correct me if my understanding is > wrong h

Acking and Anchoring issues

2016-11-18 Thread Milind Vaidya
I have the following topology structure: Kafka Spout. Bolt A: reads tuples from the spout and extracts some info: _collector.emit(tuple, new Values(...)); _collector.ack(tuple); in case of exception / error: _collector.fail(tuple). Bolt B: Create files based on info extra
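A minimal sketch of the Bolt A pattern described above, with anchoring so that a fail() anywhere downstream replays the original tuple from the spout (field names are placeholders):

    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    public class ExtractBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext ctx, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            try {
                String info = tuple.getString(0);           // whatever Bolt B needs
                collector.emit(tuple, new Values(info));    // anchored emit
                collector.ack(tuple);
            } catch (Exception e) {
                collector.fail(tuple);                      // spout will replay
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("info"));
        }
    }

Note that acking in Bolt A only marks Bolt A's step complete; because the emit is anchored, the tuple tree stays open until Bolt B acks too.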

Re: Multi Threading precautions for multiple executers / tasks

2016-11-15 Thread Milind Vaidya
Johansen > > On Tue, Nov 15, 2016 at 3:59 PM, Milind Vaidya wrote: > >> Hi >> >> I have a use case where a few files in a directory need to be >> processed by a certain bolt x written in Java. >> >> I am setting the number of executors and tasks the same

Multi Threading precautions for multiple executers / tasks

2016-11-15 Thread Milind Vaidya
Hi I have a use case where a few files in a directory need to be processed by a certain bolt x written in Java. I am setting the number of executors and tasks to the same value, which is > 1. Say I have 4 executors and tasks. As I understand, these are essentially threads in the worker process. Now I wan
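Each executor is a thread in the worker JVM, so anything reachable from more than one executor (static fields, shared caches, a shared directory scan) needs explicit thread safety. A hedged sketch of one way to let several executors claim files without double-processing (class and field names are hypothetical):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    public class FileProcessingBolt extends BaseRichBolt {
        // Shared by every executor thread in this worker JVM: must be thread-safe.
        private static final ConcurrentHashMap<String, Boolean> claimed = new ConcurrentHashMap<>();

        private OutputCollector collector;   // per-executor state, set in prepare()

        @Override
        public void prepare(Map conf, TopologyContext ctx, OutputCollector collector) {
            // prepare() runs once per executor thread; keep mutable state in
            // instance fields, not statics, unless it is explicitly synchronized.
            this.collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            String file = tuple.getString(0);
            // putIfAbsent is atomic, so exactly one executor claims each file.
            if (claimed.putIfAbsent(file, Boolean.TRUE) == null) {
                // ... process the file ...
            }
            collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer d) { }
    }

Note the static map only de-duplicates within one worker; executors in other workers would need a grouping (e.g. fieldsGrouping on the file name) or external coordination.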

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Milind Vaidya
a is > already in Kafka. Just keep the tuple ID and write to file. When you close > the file ack all of the tuple IDs. > On May 11, 2016 5:42 PM, "Steven Lewis" wrote: > >> It sounds like you want to use Spark / Spark Streaming to do that kind of >> batch
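A minimal sketch of that hold-and-ack-on-close pattern (the file helpers are hypothetical stubs):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    public class BatchingFileBolt extends BaseRichBolt {
        private OutputCollector collector;
        private final List<Tuple> pending = new ArrayList<>();  // awaiting file close

        @Override
        public void prepare(Map conf, TopologyContext ctx, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            appendToBatchFile(tuple);   // hypothetical: write the record to the open file
            pending.add(tuple);         // hold the tuple; do NOT ack yet
            if (batchIsFull()) {        // hypothetical rotation condition
                closeBatchFile();       // hypothetical: after this the data is durable
                for (Tuple t : pending) {
                    collector.ack(t);   // only now may Kafka offsets advance past these
                }
                pending.clear();
            }
        }

        private void appendToBatchFile(Tuple t) { /* ... */ }
        private boolean batchIsFull() { return false; /* ... */ }
        private void closeBatchFile() { /* ... */ }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer d) { }
    }

The spout's message timeout must comfortably exceed the file rotation interval, or held tuples will be failed and replayed while still pending.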

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Milind Vaidya
.com/pinterest/secor/blob/master/DESIGN.md) Streamx (https://github.com/qubole/streamx) looks promising too, with Secor looking more promising. On Wed, May 11, 2016 at 2:40 PM, Steven Lewis wrote: > It sounds like you want to use Spark / Spark Streaming to do that kind of > batching outpu

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Milind Vaidya
then ack all of the input tuples after the file has been closed. > > On Wed, May 11, 2016 at 3:43 PM, Milind Vaidya wrote: > >> in case of failure to upload a file or disk corruption leading to loss of >> file, we have only current offset in Kafka Spout but have no record as to &

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Milind Vaidya
for you. Then uploading files to S3 is the > responsibility of another job. For example, a storm topology that monitors > the output folder. > > Monitoring the data from Kafka all the way out to S3 seems unnecessary. > > On Wed, May 11, 2016 at 1:50 PM, Milind Vaidya wrote: >

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Milind Vaidya
nsibility of another job. For example, a storm topology that monitors >> the output folder. >> >> Monitoring the data from Kafka all the way out to S3 seems unnecessary. >> >> On Wed, May 11, 2016 at 1:50 PM, Milind Vaidya wrote: >> >>> It does not

Re: Getting Kafka Offset in Storm Bolt

2016-05-11 Thread Milind Vaidya
M, Milind Vaidya wrote: > >> Anybody ? Anything about this ? >> >> On Wed, May 4, 2016 at 11:31 AM, Milind Vaidya wrote: >> >>> Is there any way I can know what Kafka offset corresponds to current >>> tuple I am processing in a bolt ? >>> >

Re: Getting Kafka Offset in Storm Bolt

2016-05-10 Thread Milind Vaidya
Anybody? Anything about this? On Wed, May 4, 2016 at 11:31 AM, Milind Vaidya wrote: > Is there any way I can know what Kafka offset corresponds to the current tuple > I am processing in a bolt ? > > Use case : Need to batch events from Kafka, persist them to a local file > and ev

Getting Kafka Offset in Storm Bolt

2016-05-04 Thread Milind Vaidya
Is there any way I can know what Kafka offset corresponds to the current tuple I am processing in a bolt? Use case: need to batch events from Kafka, persist them to a local file and eventually upload it to S3. To manage failure cases, I need to know the Kafka offset for a message, so that it can
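One hedged possibility with the old storm-kafka spout: the MessageMetadataScheme hook (present in later storm-kafka releases) lets the spout emit the partition and offset next to the payload, so a downstream bolt can read them by field. A sketch, where the scheme class name is my own:

    import java.nio.ByteBuffer;
    import java.util.List;
    import org.apache.storm.kafka.MessageMetadataScheme;
    import org.apache.storm.kafka.MessageMetadataSchemeAsMultiScheme;
    import org.apache.storm.kafka.Partition;
    import org.apache.storm.kafka.StringScheme;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;

    // Emits the Kafka partition and offset alongside the payload so a bolt
    // can recover them with tuple.getLongByField("offset").
    public class PayloadWithOffsetScheme extends StringScheme implements MessageMetadataScheme {
        @Override
        public List<Object> deserializeMessageWithMetadata(ByteBuffer message, Partition partition, long offset) {
            return new Values(StringScheme.deserializeString(message), partition.partition, offset);
        }

        @Override
        public Fields getOutputFields() {
            return new Fields(StringScheme.STRING_SCHEME_KEY, "partition", "offset");
        }
    }

    // Wiring: spoutConfig.scheme = new MessageMetadataSchemeAsMultiScheme(new PayloadWithOffsetScheme());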

Re: Data loss scenarios

2016-01-20 Thread Milind Vaidya
:22 AM, John Yost wrote: > The only data loss I've seen is where a topology with KafkaSpout gets so > far behind that the Kafka log segment for a given partition is rotated. In > such a scenario, you'll see an OffsetOutOfRangeException. > > --John > > On Tue, Jan 19,

Process/Thread ids for bolts and spouts

2016-01-19 Thread Milind Vaidya
Is there any way to know the process/thread ids of the kafka spout and underlying bolts in a topology from the linux command line? As an extension of the other thread about failure scenarios, I want to manually kill these individual workers/executors/tasks if possible, to simulate the corresponding failure scenarios an
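Executors are threads, not processes, so from the command line you can only kill a whole worker JVM (its PID); to map components onto workers and threads, here is a sketch that logs the mapping from prepare() (the logging itself is illustrative):

    import java.lang.management.ManagementFactory;
    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    public class WhereAmIBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            // RuntimeMXBean name is "pid@hostname" on typical JVMs.
            System.err.println("component=" + context.getThisComponentId()
                    + " task=" + context.getThisTaskId()
                    + " workerJvm=" + ManagementFactory.getRuntimeMXBean().getName()
                    + " thread=" + Thread.currentThread().getName());
        }

        @Override
        public void execute(Tuple tuple) {
            collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer d) { }
    }

With the PID in hand, kill -9 on the worker simulates a worker failure; killing a single executor or task in isolation is not possible from outside the JVM.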

Re: Data loss scenarios

2016-01-19 Thread Milind Vaidya
done properly. Though data could be lost due to retention > kicking in on the kafka side. The topology will keep retrying a timed-out message but > kafka is not going to keep it forever. > > On Fri, Jan 15, 2016 at 12:21 AM, Milind Vaidya wrote: > >> Hi >> >> I have be

Replaying of logs and Trident

2016-01-14 Thread Milind Vaidya
We have been using a regular storm, topology-bolt setup for a while. The input to storm is from a kafka cluster and zookeeper keeps the metadata. I was looking at Trident for its exactly-once paradigm. We are trying to achieve minimum data loss, which may lead to replaying the logs (Kafka stores

Data loss scenarios

2016-01-14 Thread Milind Vaidya
Hi I have been using a kafka-storm setup for more than a year, running almost 10 different topologies. The flow is something like this: Producer --> Kafka Cluster --> Storm cluster --> MongoDB. The zookeeper keeps the metadata. So far the approach was a little ad hoc and we want it to be more discipl

Re: NoNode exception on storm-kafka

2016-01-14 Thread Milind Vaidya
Try SpoutConfig conf = new SpoutConfig(hosts, topicName, "/event_spout", "event_spout"); You had given an empty string in the conf; the zkRoot parameter seems to be missing. On Wed, Jan 13, 2016 at 4:46 PM, Jamie W wrote: > Hi, > > I'm having troubles using KafkaSpout in storm-kafka. It can connect to >
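For reference, the full wiring, assuming the storm-kafka SpoutConfig(BrokerHosts, topic, zkRoot, id) constructor (host names are placeholders):

    import org.apache.storm.kafka.BrokerHosts;
    import org.apache.storm.kafka.KafkaSpout;
    import org.apache.storm.kafka.SpoutConfig;
    import org.apache.storm.kafka.ZkHosts;

    // zkRoot is the ZooKeeper path under which the spout stores consumed
    // offsets; an empty string yields malformed paths and NoNode errors.
    BrokerHosts hosts = new ZkHosts("zk1:2181");
    SpoutConfig spoutConfig = new SpoutConfig(hosts, "event_topic", "/event_spout", "event_spout");
    KafkaSpout spout = new KafkaSpout(spoutConfig);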

Re: KafkaBolt question: number of executors

2016-01-13 Thread Milind Vaidya
> partitions. In the example I presented in this thread, that would be 2 > topics * 10 partitions per topic = 20. > > Just wondering if my logic makes sense and/or if there is a better > parallelism strategy for KafkaBolts. > > Thanks > > --John > > On Wed, Jan 13,

Re: KafkaBolt question: number of executors

2016-01-12 Thread Milind Vaidya
Hi John, No, it is not driven by the number of topics or partitions. This is a number you can configure while setting the bolt into the topology builder. Here is a useful link with more detail: http://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html On Tue, Jan 1
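A minimal sketch of where that number goes (package names follow Storm 1.x; 0.10.x used the storm.kafka.bolt namespace, and the component ids are placeholders):

    import org.apache.storm.kafka.bolt.KafkaBolt;
    import org.apache.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    // The parallelism hint (number of executors) is the third argument to
    // setBolt(); it is independent of topic and partition counts.
    builder.setBolt("kafka-writer", new KafkaBolt<String, String>(), 4)
           .shuffleGrouping("upstream-bolt");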