Re: ISR churn

2017-03-23 Thread Radu Radutiu
I see no errors related to zookeeper. I searched all the logs (kafka and zookeeper) for all 4 servers for all entries in the minute with the ISR change at 08:23:54. Here are the logs: Node n1 kafka_2.12-0.10.2.0/logs/state-change.log:[2017-03-23 08:23:55,151] TRACE Broker 1 cached leader info (Le

Re: Relationship fetch.replica.max.bytes and message.max.bytes

2017-03-23 Thread Ben Stopford
Hi Kostas - The docs for replica.fetch.max.bytes should be helpful here: The number of bytes of messages to attempt to fetch for each partition. This is not an absolute maximum, if the first message in the first non-empty partition of the fetch is larger than this value, the message will still be

Re: Kafka running in VPN

2017-03-23 Thread Ben Stopford
The bootstrap servers are only used to make an initial connection. From there the clients request metadata, which provides them with a 'map' of the cluster. The addresses in the metadata are those registered in Zookeeper by each broker. They don't relate to the bootstrap list in any way. You can c

Re: Relationship fetch.replica.max.bytes and message.max.bytes

2017-03-23 Thread Manikumar
Also, even though the replica fetcher makes progress, the individual message size should be less than or equal to message.max.bytes. Otherwise, we will get a RecordTooLargeException. On Thu, Mar 23, 2017 at 2:34 PM, Ben Stopford wrote: > Hi Kostas - The docs for replica.fetch.max.bytes should be helpful h

Benchmarking streaming frameworks

2017-03-23 Thread Giselle van Dongen
Dear users of Streaming Technologies, As a PhD student in big data analytics, I am currently in the process of compiling a list of benchmarks (to test multiple streaming frameworks) in order to create an expanded benchmarking suite. The benchmark suite is being developed as a part of my current wo

Kafka high cpu usage and disconnects

2017-03-23 Thread Paul van der Linden
Hi, I deployed Kafka about a week ago, but there are a few problems with how Kafka behaves. The first is the surprisingly high resource usage: one is the memory (1.5-2 GB for each broker, 3 brokers), although this might be normal. The other is the CPU usage, which starts with 20% minimum on e

Re: Error in running PageViewTypedDemo

2017-03-23 Thread Shanthi Nellaiappan
Any example for the above would be appreciated. Thanks On Wed, Mar 22, 2017 at 2:50 PM, Shanthi Nellaiappan wrote: > Thanks for the info. > With "page2",{"user":"2", "page":"22", "timestamp":143527817} as input > for streams-pageview-input an "2",{"region":"CA","timestamp":143527817} >

Re: Kafka high cpu usage and disconnects

2017-03-23 Thread Paul van der Linden
Hi, I deployed Kafka about a week ago, but there are a few problems with how Kafka behaves. The first is the surprisingly high resource usage: one is the memory (1.5-2 GB for each broker, 3 brokers), although this might be normal. The other is the CPU usage, which starts with 20% minimum on e

Re: Kafka high cpu usage and disconnects

2017-03-23 Thread Manikumar
1. Maybe you can monitor thread-wise CPU usage and correlate it with a thread dump to identify the bottleneck. 2. The broker config property connections.max.idle.ms is used to close idle connections; the default is 10 min. On Thu, Mar 23, 2017 at 3:55 PM, Paul van der Linden wrote: > Hi, > > I deploye

Re: Need help determining consumer group offsets

2017-03-23 Thread Alexandru Ionita
Hi Greg!! Are you using offset auto commit or do you commit manually? 2017-03-22 22:21 GMT+01:00 Greg Lloyd : > I have a 0.8.2.2 cluster which has been configured > with offsets.storage=kafka. We are experiencing some issues after a few > nodes went down and wrong nodes were brought up in their

Re: Kafka high cpu usage and disconnects

2017-03-23 Thread Paul van der Linden
Thanks. I managed to get a cpu dump from staging. The output: THREAD START (obj=5427, id = 24, name="RMI TCP Accept-0", group="system") THREAD START (obj=5427, id = 21, name="main", group="main") THREAD START (obj=5427, id = 25, name="SensorExpiryThread", group="main") THRE

Getting current value of aggregated key

2017-03-23 Thread Jon Yeargers
If I have an aggregation : KTable, VideoLogLine> outTable = sourceStream.groupByKey().reduce(rowReducer, TimeWindows.of(60 * 60 * 1000L).until(10 * 60 * 1000L), "HourAggStore"); how would I go about getting some value from this with a separate process? I have the "
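For readers following the window arithmetic in the question, here is a minimal sketch of how a timestamp maps to a window. It assumes the windows are tumbling (advance equals size) and aligned to the epoch, which is how Kafka Streams' TimeWindows is understood to behave; the function name and constants are illustrative, not part of any Kafka API.

```python
# Values copied from the question: TimeWindows.of(60 * 60 * 1000L).until(10 * 60 * 1000L)
WINDOW_SIZE_MS = 60 * 60 * 1000   # window size: one hour
RETENTION_MS = 10 * 60 * 1000     # until(...) argument: ten minutes


def tumbling_window_start(timestamp_ms, size_ms=WINDOW_SIZE_MS):
    """Return the start of the epoch-aligned tumbling window containing timestamp_ms.

    Assumption: windows do not overlap and the first window starts at 0.
    """
    return (timestamp_ms // size_ms) * size_ms


def window_bounds(timestamp_ms, size_ms=WINDOW_SIZE_MS):
    """Return (start, end_exclusive) of the window containing timestamp_ms."""
    start = tumbling_window_start(timestamp_ms, size_ms)
    return start, start + size_ms
```

A separate process that wants the current value for a key would look in the window whose bounds contain the current time. Note that the retention passed to until(...) in the question is shorter than the window size, which may be worth double-checking.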

Re: Kafka high cpu usage and disconnects

2017-03-23 Thread Jaikiran Pai
One thing that you might want to check is the number of consumers that are connected/consuming against this Kafka setup. We have consistently noticed that the CPU usage of the broker is very high even with very few consumers (around 10 Java consumers). There's even a JIRA for it. From what I re

Re: Error in running PageViewTypedDemo

2017-03-23 Thread Michael Noll
The quickest answer I can give you is trying a similar example [1], where we provide a driver that generates the required input data for the page view example. -Michael [1] https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/main/java/io/confluent/examples/streams/PageViewRegi

Re: Getting current value of aggregated key

2017-03-23 Thread Michael Noll
Jon, you can use Kafka's interactive queries feature for this: http://docs.confluent.io/current/streams/developer-guide.html#interactive-queries -Michael On Thu, Mar 23, 2017 at 1:52 PM, Jon Yeargers wrote: > If I have an aggregation : > > KTable, VideoLogLine> outTable = > sourceStream.grou

Re: clearing an aggregation?

2017-03-23 Thread Michael Noll
> Since apparently there isn't a way to iterate through Windowed KTables Im > guessing that this sort of 'aggregate and clear' approach still requires an > external datastore (like Redis). Please correct me if Im wrong. You don't need an external datastore. You can use state stores for that: http

Broker heap size

2017-03-23 Thread Avi Asulin
Hi, is it normal for a broker to have high heap usage (in my case 7 GB out of 8 GB), or does it suggest there is a configuration/hardware issue? Thanks, Avi

Mailing list docs are misleading/broken/outdated

2017-03-23 Thread Zac Harvey
I am trying to unsubscribe to this mailing list. When I go to: https://kafka.apache.org/contact Apache Kafka kafka.apache.org Contact Mailing Lists. We have a few mailing lists hosted by Apache: users@kafka.apache.org: A list for general user questions about K

Re: Relationship fetch.replica.max.bytes and message.max.bytes

2017-03-23 Thread Kostas Christidis
On Thu, Mar 23, 2017 at 5:04 AM, Ben Stopford wrote: > Hi Kostas - The docs for replica.fetch.max.bytes should be helpful here: > > The number of bytes of messages to attempt to fetch for each partition. > This is not an absolute maximum, if the first message in the first > non-empty partition of

Re: Benchmarking streaming frameworks

2017-03-23 Thread Eno Thereska
Hi Giselle, Great idea! In Kafka Streams we have a few micro-benchmarks we run nightly. They are at: https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/perf/SimpleBenchmark.java

Re: Relationship fetch.replica.max.bytes and message.max.bytes

2017-03-23 Thread Ismael Juma
Hi Kostas, Yes, equal is fine. Here is the code that prints an error if replication fails due to this: error(s"Replication is failing due to a message that is greater than replica.fetch.max.bytes for partition $topicPartition. " + "This generally occurs when the max.message.bytes has been overrid
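The relationship discussed in this thread can be sketched as a small sanity check. It only encodes what the replies state: replica.fetch.max.bytes should be at least as large as the largest allowed message, and equal is fine. The function name and the sample values are hypothetical, chosen for illustration.

```python
def check_replication_sizes(message_max_bytes, replica_fetch_max_bytes):
    """Return a list of warnings for a broker's size settings.

    Assumption: per the thread, replica.fetch.max.bytes >= message.max.bytes
    is the safe relationship, and equality is acceptable.
    """
    warnings = []
    if replica_fetch_max_bytes < message_max_bytes:
        warnings.append(
            "replica.fetch.max.bytes (%d) is smaller than message.max.bytes (%d)"
            % (replica_fetch_max_bytes, message_max_bytes)
        )
    return warnings


# Illustrative values only, not recommended settings.
ok = check_replication_sizes(message_max_bytes=1_000_000, replica_fetch_max_bytes=1_000_000)
bad = check_replication_sizes(message_max_bytes=2_000_000, replica_fetch_max_bytes=1_000_000)
```

A topic-level max.message.bytes override would need the same comparison, since the error message quoted above mentions that case.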

Kafka queue full configuration

2017-03-23 Thread Mohapatra, Sudhir (Nokia - IN/Gurgaon)
Hi, We are trying to simulate the kafka queue full scenarios on kafka 0.10.0. I have seen in earlier versions there is a configuration parameter "queue.buffering.max.messages" which can be set to simulate the queue full scenario. But in the kafka 0.10.0 this parameter is not there. https://kafka.

Re: Mailing list docs are misleading/broken/outdated

2017-03-23 Thread Zac Harvey
Any ideas? From: Zac Harvey Sent: Thursday, March 23, 2017 10:03:48 AM To: users@kafka.apache.org Subject: Mailing list docs are misleading/broken/outdated I am trying to unsubscribe to this mailing list. When I go to: https://kafka.apache.org/contact Apache

Re: Kafka high cpu usage and disconnects

2017-03-23 Thread Paul van der Linden
Doesn't seem to be the clients indeed. Maybe it already uses 13% of cpu on maintaining the cluster. With no connections at all, except zookeeper and the other 2 brokers. This is the cpu usage: CPU SAMPLES BEGIN (total = 86359) Thu Mar 23 16:47:26 2017 rank self accum count trace method 1 8

Should MirrorMaker produce round robin with this configuration?

2017-03-23 Thread Chris Neal
Hi everyone, I am using MirrorMaker to consume from a 0.8.2.2 cluster and produce to a 0.10.2 cluster. All the topics have two partitions on both clusters. My consumer.properties is: zookeeper.connect=[string of servers] group.id=MirrorMaker num.consumer.fetchers=2 partition.assignment.strategy

Re: Mailing list docs are misleading/broken/outdated

2017-03-23 Thread Zac Harvey
Hi, how do I unsubscribe from this mailing list? From: Zac Harvey Sent: Thursday, March 23, 2017 10:03:48 AM To: users@kafka.apache.org Subject: Mailing list docs are misleading/broken/outdated I am trying to unsubscribe to this mailing list. When I go to: htt

Re: kafka streams in-memory Keyvalue store iterator remove broken on upgrade to 0.10.2.0 from 0.10.1.1

2017-03-23 Thread Matthias J. Sax
There is a difference between .delete() and it.remove(). .delete() can only be called in a Streams operator that is responsible for maintaining the state. This is of course required so that the developer writing the operator has full control over the store. However, it.remove() is called *outside* fr

Re: Reg: Callback on kafka metadata update

2017-03-23 Thread Sumit Maheshwari
Can someone please help with the reply ? On Tue, Mar 21, 2017 at 2:53 PM, Sumit Maheshwari wrote: > Hi, > > I am looking for a callback that I can depend on to get notified when > kafka metadata changes. > For example: > >- Creation of new topic >- Addition of partitions to existing topi

Re: Mailing list docs are misleading/broken/outdated

2017-03-23 Thread Zac Harvey
Hi, how do I unsubscribe from this mailing list? The details in the docs are not the correct set of steps. From: Zac Harvey Sent: Thursday, March 23, 2017 10:03:48 AM To: users@kafka.apache.org Subject: Mailing list docs are misleading/broken/outdated I am trying

Re: Error in running PageViewTypedDemo

2017-03-23 Thread Matthias J. Sax
I guess the console producer inserts the data as String -- and not as "binary JSON". Try to use a different serializer to insert data in the format Streams expects. -Matthias On 3/22/17 2:50 PM, Shanthi Nellaiappan wrote: > Thanks for the info. > With "page2",{"user":"2", "page":"22", "timest

Re: Need help determining consumer group offsets

2017-03-23 Thread Greg Lloyd
Hi Alex, Nice to see you on this list. I actually figured out my problem was quite silly. I was prefixing the consumer group name with the cassandra keyspace and was looking for the wrong group name. On Thu, Mar 23, 2017 at 7:44 AM, Alexandru Ionita < alexandru.ion...@gmail.com> wrote: > Hi Gre

APPLICATION_SERVER_CONFIG ?

2017-03-23 Thread Jon Yeargers
What does this config param do? I see it referenced / used in some samples and here ( https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams )

Kafka Streams and reliable state stores

2017-03-23 Thread Steven Schlansker
Hello everyone, I am looking to enhance my Kafka Streams based application from one instance to many. Part of the difficulty is that it seems that all of the state providers are "instance local", either in memory or on local disk. This means to answer queries for non-local partitions you have to

Re: Fast way search data in kafka

2017-03-23 Thread Milind Vaidya
That looks like a faster option. Now the thing is that --file requires a list of comma-separated files. Is there any way to look at all files in a directory? I tried *log but it did not work. Or will I have to script something to do that? On Sat, Mar 4, 2017 at 9:04 PM, Guozhang Wang wrote: > Hi Milin

Re: Mailing list docs are misleading/broken/outdated

2017-03-23 Thread Guozhang Wang
Zac, I tried it myself using a different email account to dev list and it works as expected. This is what I got as confirmations: 1) first an email titled "confirm unsubscribe from d...@kafka.apache.org" .. 2) follow its steps to send to the confirmation address, then you'll receive an email tit

Re: Fast way search data in kafka

2017-03-23 Thread Marko Bonaći
You can use something like this to get a comma-separated list of all files in a folder: ls -l | awk '{print $9}' ORS=',' Marko Bonaći Monitoring | Alerting | Anomaly Detection | Centralized Log Management Solr & Elasticsearch Support Sematext | Contact

Re: Fast way search data in kafka

2017-03-23 Thread Milind Vaidya
Yup. I hacked a small script in bash to do it for all files and per file as well. Thanks. On Thu, Mar 23, 2017 at 2:31 PM, Marko Bonaći wrote: > You can use something like this to get a comma-separated list of all files > in a folder: > > ls -l | awk '{print $9}' ORS=',' > > Marko Bonaći > Moni

Re: APPLICATION_SERVER_CONFIG ?

2017-03-23 Thread Matthias J. Sax
The config does not "do" anything. It's metadata that gets broadcast to other Streams instances for the IQ (interactive queries) feature. See this blog post for more details: https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/ Happy to answer any follow-up questions. -Matt

Re: Kafka Streams and reliable state stores

2017-03-23 Thread Matthias J. Sax
> There's also race conditions here -- what if node B owns partition 1, > node A redirects a query from a key in that partition, then B fails over to A > concurrently? You will get an exception, and you need to refresh your metadata. Afterward, you need to query again. This blog post gives more

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-23 Thread Matthias J. Sax
Jay, about the naming schema: >>1. "kstreams" - the DSL >>2. "processor api" - the lower level callback/topology api >>3. KStream/KTable - entities in the kstreams dsl >>4. "Kafka Streams" - General name for stream processing stuff in Kafka, >>including both kstreams and the p

Kafka behavior when Zookeeper cluster does leader reelection

2017-03-23 Thread sdeokule
Hello, We seem to observe that after the zookeeper cluster had a leader reelection, the kafka broker required a restart before the broker id showed up in /brokers/ids on ZK. Is this a known issue and/or a manifestation of [KAFKA-2729] Cached zkVersion not equal to that in zookeeper, broker not

Re: Benchmarking streaming frameworks

2017-03-23 Thread David Garcia
I don’t think “benchmarking” frameworks WRT Kafka is particularly informative. The various frameworks available are better compared WRT their features and processing limitations. For example, Akka-streams for kafka offers a more intuitive way to express asynchronous operations. If you were

Re: Should MirrorMaker produce round robin with this configuration?

2017-03-23 Thread Manikumar
Are you sure target cluster topics have more than one partition? If you are sending keyed messages, they may be going to the same partition. On Thu, Mar 23, 2017 at 11:15 PM, Chris Neal wrote: > Hi everyone, > > I am using MirrorMaker to consume from a 0.8.2.2 cluster and produce to a > 0.10.2 c
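The point about keyed messages can be illustrated with a toy partitioner. This is only a sketch of the general idea (hash the key, take it modulo the partition count; spread unkeyed messages in turn). It uses CRC32 as a stand-in hash, which is an assumption for illustration and not the hash the Java producer's default partitioner uses, so the partition numbers will not match a real cluster.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function mapping a message key (bytes or None) to a partition.

    Assumptions: keyed messages are placed by hash(key) % num_partitions,
    and unkeyed messages are distributed round robin.
    """
    counter = itertools.count()

    def partition(key):
        if key is None:
            # No key: rotate through the partitions.
            return next(counter) % num_partitions
        # Same key always hashes to the same partition.
        return zlib.crc32(key) % num_partitions

    return partition


choose = make_partitioner(2)
keyed = [choose(b"same-key") for _ in range(4)]
unkeyed = [choose(None) for _ in range(4)]
```

In this sketch every message with the same key lands on one partition, while the unkeyed messages alternate between the two, which matches the behavior suggested in the reply above.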

Re: Reg: Callback on kafka metadata update

2017-03-23 Thread Manikumar
We don't have this notification functionality. One way to implement this is by setting watches on the respective zookeeper nodes and listening for notifications. On Thu, Mar 23, 2017 at 11:49 PM, Sumit Maheshwari wrote: > Can someone please help with the reply ? > > On Tue, Mar 21, 2017 at 2:53 PM, Sumi

ORC plugin for Kafka HDFS connector

2017-03-23 Thread Manoj Murumkar
Hi, I am developing a connector to support ORC data type in HDFS connector. Everything is in place except for hive integration. Specifically, in the SchemaFileReader implementation. It wants to extract Avro schema from ORC record. However, I am unable to get record name from ORC record in order to

Re: ORC plugin for Kafka HDFS connector

2017-03-23 Thread Manoj Murumkar
>> It wants to extract Avro schema from ORC record. Should say: It wants to extract connect schema from ORC record. On Thu, Mar 23, 2017 at 11:14 PM, Manoj Murumkar wrote: > Hi, > > I am developing a connector to support ORC data type in HDFS connector. > Everything is in place except for hive