tConfig.setProp. The properties are documented at
> https://kafka.apache.org/documentation/#newconsumerconfigs.
>
> Den tir. 18. sep. 2018 kl. 00.17 skrev Milind Vaidya :
>
>> Hi
>>
>> We had been using kafka 0.8 with Storm. It was upgraded to
>> kafka_2.11-0.10.0
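For reference, the setProp advice above might look roughly like the sketch below with the storm-kafka-client spout. The broker address, topic and group id are placeholders, not values from this thread; any property name from the new-consumer config page can be passed the same way.

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.storm.kafka.spout.KafkaSpout;
    import org.apache.storm.kafka.spout.KafkaSpoutConfig;

    public class SpoutConfigSketch {
        public static KafkaSpout<String, String> buildSpout() {
            // Any property from the Kafka new-consumer config page can be passed
            // through setProp on the builder; these two are just examples.
            KafkaSpoutConfig<String, String> conf =
                KafkaSpoutConfig.builder("broker1:9092", "events")        // placeholders
                    .setProp(ConsumerConfig.GROUP_ID_CONFIG, "storm-consumer-group")
                    .setProp(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500)
                    .build();
            return new KafkaSpout<>(conf);
        }
    }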
Hi
I am trying to use the metric support in Storm 1.2.2. As mentioned in the
documentation, the conventional metric support will be deprecated.
Does this mean the support for capturing built-in metrics will go away as
well ?
Is there any way to capture built-in metrics with V2 ?
Thanks,
Milind
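I can't speak to whether the built-in metrics carry over, but for reference, registering a custom metric through the V2 API looks roughly like the sketch below. It assumes the registerCounter method that the Dropwizard-based V2 metrics system added to TopologyContext in 1.2; the bolt and metric names are made up.

    import java.util.Map;
    import com.codahale.metrics.Counter;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    public class CountingBolt extends BaseRichBolt {
        private OutputCollector collector;
        private transient Counter processed;   // Dropwizard counter from the V2 API

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            // Register a custom counter with the V2 metrics system.
            this.processed = context.registerCounter("processed-tuples");
        }

        @Override
        public void execute(Tuple tuple) {
            processed.inc();
            collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // nothing emitted in this sketch
        }
    }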
Hi
We had been using Kafka 0.8 with Storm. It was upgraded to
kafka_2.11-0.10.0.1 and Storm 1.1.1 as of now. Though the libraries changed,
the code pretty much remained the same.
Now we are trying to upgrade to version 1.2.2 of Storm and also look into
KafkaSpoutRetryService. This also leads to us
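For reference, wiring a retry policy into the storm-kafka-client spout config looks roughly like the sketch below; the backoff intervals, retry cap, broker and topic are invented for the sketch, not recommendations.

    import org.apache.storm.kafka.spout.KafkaSpoutConfig;
    import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff;
    import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff.TimeInterval;
    import org.apache.storm.kafka.spout.KafkaSpoutRetryService;

    public class RetrySketch {
        public static KafkaSpoutConfig<String, String> buildConfig() {
            // Exponential backoff: initial delay, delay period, max retries, max delay.
            KafkaSpoutRetryService retry = new KafkaSpoutRetryExponentialBackoff(
                TimeInterval.microSeconds(500),
                TimeInterval.milliSeconds(2),
                Integer.MAX_VALUE,
                TimeInterval.seconds(10));
            return KafkaSpoutConfig.builder("broker1:9092", "events")     // placeholders
                .setRetry(retry)
                .build();
        }
    }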
h hive release they want to use storm-hive with. The
> documentation for storm-hive should also be updated to reflect this
> requirement.
>
> Happy to provide PRs if that sounds like a good idea.
>
> Thanks.
>
> On Fri, Jun 8, 2018 at 3:21 PM, Abhishek Raj
> wrote:
>
>>
>
>
> On Thursday, June 7, 2018, 11:08 AM, Milind Vaidya
> wrote:
>
> Hi
>
> I am using storm and storm-hive version 1.1.1 to store data directly to a
> Hive cluster.
>
> After using the mvn shade plugin and overcoming a few other errors I am now
>
Hi
I am using storm and storm-hive version 1.1.1 to store data directly to a
Hive cluster.
After using the mvn shade plugin and overcoming a few other errors I am now stuck
at this point.
The strange thing observed was that a few partitions were created but the data
was not inserted.
dt=17688/platform=site/c
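For context, a bare-bones storm-hive wiring usually looks something like the sketch below; the metastore URI, database, table and the column/partition field names are placeholders (and Hive streaming generally expects the target table to be bucketed, stored as ORC and transactional).

    import org.apache.storm.hive.bolt.HiveBolt;
    import org.apache.storm.hive.bolt.mapper.DelimitedRecordHiveMapper;
    import org.apache.storm.hive.common.HiveOptions;
    import org.apache.storm.tuple.Fields;

    public class HiveBoltSketch {
        public static HiveBolt build() {
            // Column and partition fields must match the tuple fields emitted by the
            // upstream bolt and the Hive table definition.
            DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
                .withColumnFields(new Fields("id", "payload"))
                .withPartitionFields(new Fields("dt", "platform"));
            HiveOptions options =
                new HiveOptions("thrift://metastore-host:9083", "default", "events", mapper)
                    .withTxnsPerBatch(10)
                    .withBatchSize(1000)
                    .withIdleTimeout(10);
            return new HiveBolt(options);
        }
    }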
I have a Kafka - Kafka Spout - Storm Bolts setup.
It processes heavy data (well, it is supposed to). I am accumulating it
in files and eventually moving them to an "uploading" directory. Another bolt
uploads them to S3.
If anything happens to a file: say an IO error, an error opening or closing the
file, transfer e
Hi
>
> In the current topology setup with 3 bolts, reading from a kafka spout, the
> config is such that there are multiple tasks within a worker.
>
> So 1 kafka spout + 3 bolts = min 4 executors in a worker, and then each
> executor has multiple tasks. (Please correct me if my understanding is
> wrong h
I have the following topology structure:
Kafka Spout
Bolt A : Reads tuples from the spout and extracts some info
_collector.emit(tuple, new Values(...));
_collector.ack(tuple);
In case of exception / error:
_collector.fail(tuple);
Bolt B : Creates files based on info extra
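Expanded slightly, the Bolt A pattern above would typically look like the sketch below; class, field and method names are made up for the sketch.

    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    public class BoltA extends BaseRichBolt {
        private OutputCollector _collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            _collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            try {
                String info = extract(tuple.getString(0));
                // Anchor the emitted tuple to the input so downstream failures replay it.
                _collector.emit(tuple, new Values(info));
                _collector.ack(tuple);
            } catch (Exception e) {
                // On any error, fail the input so the spout can replay it.
                _collector.fail(tuple);
            }
        }

        private String extract(String raw) {
            return raw.trim();   // stand-in for the real extraction logic
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("info"));
        }
    }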
Johansen
>
> On Tue, Nov 15, 2016 at 3:59 PM, Milind Vaidya wrote:
>
>> Hi
>>
>> I am having a use case where a few files in a directory need to be
>> processed by a certain bolt x written in Java.
>>
>> I am setting the number of executors and tasks the same
Hi
I am having a use case where a few files in a directory need to be
processed by a certain bolt x written in Java.
I am setting the number of executors and tasks the same, which is > 1. Say I
have 4 executors and tasks.
As I understand, these are essentially threads in the worker process. Now I
wan
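As a concrete illustration of the executor/task split being described (component names and numbers are invented): the parallelism hint passed to setBolt is the number of executors, and setNumTasks fixes the number of tasks spread across them.

    import java.util.Map;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Tuple;

    public class ParallelismSketch {

        // Minimal stand-ins so the sketch compiles; the real topology would use
        // the Kafka spout and bolt x from this thread.
        public static class NoopSpout extends BaseRichSpout {
            public void open(Map conf, TopologyContext ctx, SpoutOutputCollector c) {}
            public void nextTuple() {}
            public void declareOutputFields(OutputFieldsDeclarer d) {}
        }

        public static class NoopBolt extends BaseBasicBolt {
            public void execute(Tuple tuple, BasicOutputCollector collector) {}
            public void declareOutputFields(OutputFieldsDeclarer d) {}
        }

        public static TopologyBuilder build() {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("kafka-spout", new NoopSpout(), 1);
            // Parallelism hint = 4 executors; setNumTasks(4) = 4 tasks, one per executor.
            builder.setBolt("bolt-x", new NoopBolt(), 4)
                   .setNumTasks(4)
                   .shuffleGrouping("kafka-spout");
            return builder;
        }
    }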
a is
> already in Kafka. Just keep the tuple ID and write to file. When you close
> the file ack all of the tuple IDs.
> On May 11, 2016 5:42 PM, "Steven Lewis" wrote:
>
>> It sounds like you want to use Spark / Spark Streaming to do that kind of
>> batch
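A rough illustration of the "write to file, ack when the file is closed" idea from this thread: the rotation threshold, file path and error handling below are invented for the sketch, and the topology message timeout would have to be longer than the time it takes to fill a file.

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    public class FileBatchBolt extends BaseRichBolt {
        private static final int BATCH_SIZE = 10_000;   // invented rotation threshold

        private OutputCollector collector;
        private BufferedWriter writer;
        private final List<Tuple> pending = new ArrayList<>();

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            try {
                if (writer == null) {
                    writer = Files.newBufferedWriter(
                        Paths.get("/tmp/batch-" + System.currentTimeMillis() + ".log"));
                }
                writer.write(tuple.getString(0));
                writer.newLine();
                pending.add(tuple);                      // hold the tuple, ack later
                if (pending.size() >= BATCH_SIZE) {
                    writer.close();                      // file closed: safe to ack
                    writer = null;
                    pending.forEach(collector::ack);
                    pending.clear();
                }
            } catch (IOException e) {
                // File trouble: fail everything still pending so the spout replays it.
                pending.forEach(collector::fail);
                pending.clear();
                writer = null;
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // nothing emitted downstream in this sketch
        }
    }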
.com/pinterest/secor/blob/master/DESIGN.md)
Streamx (https://github.com/qubole/streamx) looks promising too, with Secor
looking more promising.
On Wed, May 11, 2016 at 2:40 PM, Steven Lewis
wrote:
> It sounds like you want to use Spark / Spark Streaming to do that kind of
> batching outpu
then ack all of the input tuples after the file has been closed.
>
> On Wed, May 11, 2016 at 3:43 PM, Milind Vaidya wrote:
>
>> in case of failure to upload a file or disk corruption leading to loss of a
>> file, we have only the current offset in the Kafka Spout but no record as to
&
for you. Then uploading files to S3 is the
> responsibility of another job. For example, a storm topology that monitors
> the output folder.
>
> Monitoring the data from Kafka all the way out to S3 seems unnecessary.
>
> On Wed, May 11, 2016 at 1:50 PM, Milind Vaidya wrote:
>
nsibility of another job. For example, a storm topology that monitors
>> the output folder.
>>
>> Monitoring the data from Kafka all the way out to S3 seems unnecessary.
>>
>> On Wed, May 11, 2016 at 1:50 PM, Milind Vaidya wrote:
>>
>>> It does not
M, Milind Vaidya wrote:
>
>> Anybody ? Anything about this ?
>>
>> On Wed, May 4, 2016 at 11:31 AM, Milind Vaidya wrote:
>>
>>> Is there any way I can know what Kafka offset corresponds to the current
>>> tuple I am processing in a bolt?
>>>
>
Anybody ? Anything about this ?
On Wed, May 4, 2016 at 11:31 AM, Milind Vaidya wrote:
> Is there any way I can know what Kafka offset corresponds to the current tuple
> I am processing in a bolt?
>
> Use case: I need to batch events from Kafka, persist them to a local file
> and ev
Is there any way I can know what Kafka offset corresponds to the current tuple
I am processing in a bolt?
Use case: I need to batch events from Kafka, persist them to a local file
and eventually upload it to S3. To manage failure cases, I need to know
the Kafka offset for a message, so that it can
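One way to get at the offset, assuming the newer storm-kafka-client spout rather than what was available when this thread was written: its default record translator emits topic/partition/offset/key/value fields, so a downstream bolt can read the offset by field name. The field names below assume that default translator.

    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Tuple;

    public class OffsetAwareBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            // Field names assume the storm-kafka-client default record translator,
            // which emits "topic", "partition", "offset", "key" and "value".
            String topic   = tuple.getStringByField("topic");
            int partition  = tuple.getIntegerByField("partition");
            long offset    = tuple.getLongByField("offset");
            String payload = tuple.getStringByField("value");
            // ...batch payload to a local file keyed by (topic, partition, offset)...
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // nothing emitted in this sketch
        }
    }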
:22 AM, John Yost wrote:
> The only data loss I've seen is where a topology with KafkaSpout gets so
> far behind that the Kafka log segment for a given partition is rotated. In
> such a scenario, you'll see an OffsetOutOfRangeException.
>
> --John
>
> On Tue, Jan 19,
Is there any way to know the process/thread IDs of the Kafka spout and underlying
bolts in a topology on the Linux command line?
As an extension of the other thread about failure scenarios, I want to manually
kill these individual workers/executors/tasks if possible to simulate
corresponding failure scenarios an
done properly. Though data could be lost due to retention kicking
> in on the Kafka side. The topology will keep retrying a timed-out message but
> Kafka is not going to keep it forever.
>
> On Fri, Jan 15, 2016 at 12:21 AM, Milind Vaidya wrote:
>
>> Hi
>>
>> I have be
We have been using a regular Storm topology-and-bolt setup for a while.
The input to Storm is from a Kafka cluster and ZooKeeper keeps the metadata.
I was looking at Trident for its exactly-once paradigm. We are trying
to achieve minimum data loss, which may lead to replaying the logs (Kafka
stores
Hi
I have been using a kafka-storm setup for more than a year, running almost 10
different topologies.
The flow is something like this:
Producer --> Kafka Cluster --> Storm cluster --> MongoDB.
ZooKeeper keeps the metadata.
So far the approach was a little ad hoc and I want it to be more discipl
Try
SpoutConfig conf = new SpoutConfig(hosts, topicName, "/event_spout",
"event_spout");
You had given an empty string in the conf; the zkRoot parameter seems to be missing.
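Spelled out a little more (the ZooKeeper connect string and topic are placeholders; package names are from the 1.x org.apache.storm.kafka line, older releases used backtype.storm):

    import org.apache.storm.kafka.BrokerHosts;
    import org.apache.storm.kafka.KafkaSpout;
    import org.apache.storm.kafka.SpoutConfig;
    import org.apache.storm.kafka.StringScheme;
    import org.apache.storm.kafka.ZkHosts;
    import org.apache.storm.spout.SchemeAsMultiScheme;

    public class OldKafkaSpoutSketch {
        public static KafkaSpout build() {
            BrokerHosts hosts = new ZkHosts("zkhost1:2181,zkhost2:2181");   // placeholder
            // zkRoot ("/event_spout") is where the spout stores its offsets in ZooKeeper,
            // and the last argument is the consumer id under that root.
            SpoutConfig conf = new SpoutConfig(hosts, "event_topic", "/event_spout", "event_spout");
            conf.scheme = new SchemeAsMultiScheme(new StringScheme());
            return new KafkaSpout(conf);
        }
    }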
On Wed, Jan 13, 2016 at 4:46 PM, Jamie W wrote:
> Hi,
>
> I'm having trouble using KafkaSpout in storm-kafka. It can connect to
>
> partitions. In the example I presented in this thread, that would be 2
> topics * 10 partitions per topic = 20.
>
> Just wondering if my logic makes sense and/or if there is a better
> parallelism strategy for KafkaBolts.
>
> Thanks
>
> --John
>
> On Wed, Jan 13,
Hi John,
No. It is not driven by the number of topics or partitions. This is a number
you can configure when adding the bolt to the topology builder.
Here is a useful link to clarify more:
http://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
On Tue, Jan 1
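To make the reply above concrete, a rough sketch with the 1.x storm-kafka KafkaBolt; the broker, topic, component ids and the parallelism of 20 are all placeholders.

    import java.util.Properties;
    import org.apache.storm.kafka.bolt.KafkaBolt;
    import org.apache.storm.kafka.bolt.mapper.FieldNameBasedTupleToKafkaMapper;
    import org.apache.storm.kafka.bolt.selector.DefaultTopicSelector;
    import org.apache.storm.topology.TopologyBuilder;

    public class KafkaBoltParallelismSketch {
        public static void wire(TopologyBuilder builder) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // placeholder
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // Default mapper expects tuple fields named "key" and "message".
            KafkaBolt<String, String> bolt = new KafkaBolt<String, String>()
                .withProducerProperties(props)
                .withTopicSelector(new DefaultTopicSelector("out-topic"))
                .withTupleToKafkaMapper(new FieldNameBasedTupleToKafkaMapper<String, String>());

            // The parallelism hint (20 here) is whatever you choose; it is not tied
            // to the number of topics or partitions. "upstream-bolt" is a placeholder id.
            builder.setBolt("kafka-bolt", bolt, 20).shuffleGrouping("upstream-bolt");
        }
    }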