So if I am running my streams app across a cluster of different machines, each machine should have a different client id.
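Something like this, I assume (just a sketch; suffixing the client id with the hostname is my own idea for keeping it unique and stable per machine):

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties buildConfig() throws UnknownHostException {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "new-part-advice");
    // give every machine in the cluster a distinct, stable client id,
    // e.g. derived from its hostname
    props.put(StreamsConfig.CLIENT_ID_CONFIG,
        "new-part-advice-" + InetAddress.getLocalHost().getHostName());
    return props;
}
// the thread-level metric names then become <client.id>-StreamThread-<n>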
On 4 Mar 2017 12:36 a.m., "Guozhang Wang" <wangg...@gmail.com> wrote:

> Sachin,
>
> The reason that you got the metrics name
>
> new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1
>
> is that you did not set "CLIENT_ID_CONFIG" in your app, so KafkaStreams
> has to use a default combo of "appId: new-part-advice" plus "processId: a
> UUID that guarantees uniqueness across machines" as its clientId.
>
> As for the metrics name, it is always set as clientId + "-" + threadName,
> where "StreamThread-1" is your thread name, which is unique only WITHIN
> the JVM; that is why we still need the globally unique clientId to
> distinguish threads across machines.
>
> I just checked the source code and this logic was not changed from 0.10.1
> to 0.10.2, so I guess you set your clientId to "new-advice-1" in 0.10.1 as
> well?
>
> Guozhang
>
> On Fri, Mar 3, 2017 at 4:02 AM, Eno Thereska <eno.there...@gmail.com> wrote:
>
> > Hi Sachin,
> >
> > Now that Confluent Platform 3.2 is out, we also have some more
> > documentation on this here:
> > http://docs.confluent.io/3.2.0/streams/monitoring.html
> > We added a note on how to add other metrics.
> >
> > Yes, your calculation of the poll time makes sense. The important
> > metrics are the "info" ones that are on by default. However, for
> > stateful applications, if you suspect that state stores might be
> > bottlenecking, you might want to collect those metrics too.
> >
> > On the benchmarks, the ones called "processstreamwithstatestore" and
> > "count" are the closest to a benchmark of RocksDB with the default
> > configs. The first writes each record to RocksDB, while the second
> > performs simple aggregates (reads from and writes to RocksDB).
> >
> > We might need to add more benchmarks here; it would be great to get
> > some ideas and help from the community, e.g. a pure RocksDB benchmark
> > that doesn't go through streams at all.
> >
> > Could you open a JIRA on the name issue please? As an "improvement".
> >
> > Thanks
> > Eno
> >
> > > On Mar 2, 2017, at 6:00 PM, Sachin Mittal <sjmit...@gmail.com> wrote:
> > >
> > > Hi,
> > > I had checked the monitoring docs, but could not figure out which
> > > metrics are the important ones.
> > >
> > > Also, mainly I am looking at the average time spent between two
> > > successive poll requests. Can I say that the average time between
> > > two poll requests is the sum of
> > >
> > > commit + poll + process + punctuate (latency-avg)?
> > >
> > > Also, I checked the benchmark test results but could not find any
> > > information on RocksDB metrics for fetch and put operations. Is
> > > there any benchmark for these, or, based on the values in my
> > > previous mail, can something be said about their performance?
> > >
> > > Lastly, can we get some help on names like
> > > new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1
> > > and have a more standard thread name like
> > > new-advice-1-StreamThread-1 (as in version 0.10.1.1), so we can log
> > > these metrics as part of our cron jobs?
> > >
> > > Thanks
> > > Sachin
> > >
> > > On Thu, Mar 2, 2017 at 9:31 PM, Eno Thereska <eno.there...@gmail.com> wrote:
> > >
> > >> Hi Sachin,
> > >>
> > >> The new streams metrics are now documented at
> > >> https://kafka.apache.org/documentation/#kafka_streams_monitoring
> > >> Note that not all of them are turned on by default.
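> > >>
> > >> To turn the rest on, a minimal sketch (this assumes the
> > >> metrics.recording.level streams config; INFO is the default level):
> > >>
> > >> import java.util.Properties;
> > >> import org.apache.kafka.streams.StreamsConfig;
> > >>
> > >> Properties props = new Properties();
> > >> // DEBUG records everything, including the per-store rocksdb metrics
> > >> props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG");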
> > >>
> > >> We have several benchmarks that run nightly to monitor streams
> > >> performance. They all stem from the SimpleBenchmark.java benchmark,
> > >> and their results are published nightly at
> > >> http://testing.confluent.io (e.g., under the trunk results). E.g.,
> > >> looking at today's results:
> > >> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-03-02--001.1488449554--apache--trunk--ef92bb4/report.html
> > >> If you search for "benchmarks.streams" you'll see results from a
> > >> series of benchmarks, ranging from simply consuming, to simple
> > >> topologies with a source and sink, to joins and a count aggregate.
> > >> These run on AWS nightly, but you can also run them manually on
> > >> your own setup.
> > >>
> > >> In addition, the code can programmatically check KafkaStreams.state()
> > >> and register listeners for when the state changes. For example, the
> > >> state can change from "running" to "rebalancing".
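> > >>
> > >> A minimal sketch of such a listener (this assumes a KafkaStreams
> > >> instance named "streams"; do check the javadocs of your exact
> > >> version):
> > >>
> > >> import org.apache.kafka.streams.KafkaStreams;
> > >>
> > >> streams.setStateListener(new KafkaStreams.StateListener() {
> > >>     @Override
> > >>     public void onChange(KafkaStreams.State newState,
> > >>                          KafkaStreams.State oldState) {
> > >>         // log every transition into REBALANCING so that frequent
> > >>         // rebalances under load show up in the logs
> > >>         if (newState == KafkaStreams.State.REBALANCING) {
> > >>             System.out.println("rebalance started, was " + oldState);
> > >>         }
> > >>     }
> > >> });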
> > >>
> > >> It is likely we'll need more metrics moving forward, and it would
> > >> be great to get feedback from the community.
> > >>
> > >> Thanks
> > >> Eno
> > >>
> > >>> On 2 Mar 2017, at 11:54, Sachin Mittal <sjmit...@gmail.com> wrote:
> > >>>
> > >>> Hello All,
> > >>> I had a few questions regarding monitoring of a kafka streams
> > >>> application, and what are some important metrics we should collect
> > >>> in our case.
> > >>>
> > >>> Just a brief overview: we have a single-threaded application
> > >>> (0.10.1.1) reading from a single-partition topic, and it is
> > >>> working all fine.
> > >>> Then we have the same application (using 0.10.2.0) multi-threaded,
> > >>> with 4 threads per machine and a 3-machine cluster, reading from
> > >>> the same but partitioned topic (12 partitions).
> > >>> Thus we have each thread processing a single partition, the same
> > >>> as in the earlier case.
> > >>>
> > >>> The new setup also works fine in steady state, but under load it
> > >>> somehow triggers frequent re-balances, and then we run into all
> > >>> sorts of issues, like a stream thread dying due to
> > >>> CommitFailedException or entering a deadlock state.
> > >>> After a while we restart all the instances; then it works fine for
> > >>> a while, and again we get the same problem, and so it goes on.
> > >>>
> > >>> 1. So, just for monitoring: when the first thread fails, what
> > >>> would be some important metrics we should be collecting to get
> > >>> some sense of what's going on?
> > >>>
> > >>> 2. Is there any metric that tells the time elapsed between
> > >>> successive poll requests, so we can monitor that?
> > >>>
> > >>> Also, I did monitor rocksdb put and fetch times for these 2
> > >>> instances, and here is the output I get:
> > >>> 0.10.1.1
> > >>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1 key-table-put-avg-latency-ms
> > >>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1:
> > >>> 206431.7497615029
> > >>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1 key-table-fetch-avg-latency-ms
> > >>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1:
> > >>> 2595394.2746129474
> > >>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1 key-table-put-qps
> > >>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1:
> > >>> 232.86299499317252
> > >>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1 key-table-fetch-qps
> > >>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1:
> > >>> 373.61071016166284
> > >>>
> > >>> The same values for 0.10.2.0:
> > >>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1 key-table-put-latency-avg
> > >>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1:
> > >>> 1199859.5535022356
> > >>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1 key-table-fetch-latency-avg
> > >>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1:
> > >>> 3679340.80748852
> > >>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1 key-table-put-rate
> > >>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1:
> > >>> 56.134778706069184
> > >>> $>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1 key-table-fetch-rate
> > >>> #mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1:
> > >>> 136.10721427931827
> > >>>
> > >>> I notice that the results for 0.10.2.0 are much worse than the
> > >>> same for 0.10.1.1.
> > >>>
> > >>> I would like to know:
> > >>> 1. Is there any benchmark on rocksdb as to what rate/latency it
> > >>> should be doing put/fetch operations at?
> > >>>
> > >>> 2. What could be the cause of the inferior numbers in 0.10.2.0? Is
> > >>> it because this application is also running three other threads
> > >>> doing the same thing?
> > >>>
> > >>> 3. Also, what's with the name
> > >>> new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1?
> > >>> I wanted to put this as part of my cronjob, so why can't we have a
> > >>> simpler name, like we have in 0.10.1.1, so that it is easy to
> > >>> write the script?
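> > >>>
> > >>> For reference, a sketch of the kind of cron script I mean (this
> > >>> assumes jmxterm, which is what produced the $>get output above;
> > >>> the jar name, host and port are placeholders, and the UUID part
> > >>> has to be pasted in by hand after every restart, which is exactly
> > >>> the pain point):
> > >>>
> > >>> CID=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1
> > >>> echo "get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=$CID key-table-put-latency-avg" \
> > >>>   | java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:9999 -n >> rocksdb-metrics.log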
> > >>>
> > >>> Thanks
> > >>> Sachin
>
> --
> -- Guozhang