Can you post full log of one the workers/executors that reads at least one
record? You can remove application logs.

On Mon, Nov 6, 2017 at 12:45 PM, NerdyNick <[email protected]> wrote:

> It's all partitions and it's random on how far behind they show. The topic
> in question has a low events/sec but over sized partition count. So not
> every partition has data to be read the full length of time. So given this,
> it's almost like the partition is drained before the batch end read time.
> Which seems to result in the consumer not posting an update to the offset
> within that batch. Because the offsets will get posted just not with the
> rate of the batches. So it'll look like it falls behind then somewhat
> catches up.
>
> On Mon, Nov 6, 2017 at 12:53 PM, Raghu Angadi <[email protected]> wrote:
>
>> On Mon, Nov 6, 2017 at 10:39 AM, NerdyNick <[email protected]> wrote:
>>
>>> Hey Raghu,
>>>
>>> Runner is Spark on Yarn cluster. Executor/Thread would be the
>>> Thread(Core) within a Spark Executor. I have nothing I can find in the logs
>>> saying that they aren't being closed cleanly outside of forced shutdown.
>>> Interval I have now is 30ms with a min read time of 2secs. Watching the
>>> finished tasks for the read job shows each thread running for the 2secs,
>>> some die early with no log as to why. However watching the offset within
>>> Kafka Manager shows the lag growing, as if the commit isn't happening.
>>> However the event throughput shows it matching that of the kafka topic msgs
>>> inbound. I can't seem to find a working option for getting the offsets
>>> KafkaIO believes it has out of either to compare numbers. I tried the Spark
>>> Metrics sinks provided by the runner, as KafkaIO appears to provide it via
>>> those. But they don't appear to work and actually caused issues, no longer
>>> appeared, with the Streaming stats Spark maintains.
>>>
>>
>> May be one or two Kafka partitions aren't being consumed (due to some
>> error). You could check on Kafka if all the partitions are behind or just
>> some. Otherwise what you describe sounds like bugs/issues in the pipeline
>> (spark or user). If feasible, you can try running with direct-runner to see
>> if auto_commit does not advance.
>>
>> I would also encourage you to file bugs about the missing metrics in
>> spark (and/or report here on another thread).
>>
>> Raghu.
>>
>>
>>> Consumer group name is consistent as a bash script is being used to
>>> launch jobs to guarantee uniformity between launches.
>>>
>>> From what I've been able to dissect from the KafkaIO class. I believe
>>> the watermark management layer via the PartitionState might be the best
>>> tiein for doing manual kafka offset management via the
>>> Consumer.commitSync() or Consumer.commitAsync() methods. This would allow
>>> better uniform and consistent reporting of offset in sync with the values
>>> being maintained in the watermark.
>>>
>>> On Mon, Nov 6, 2017 at 11:11 AM, Raghu Angadi <[email protected]>
>>> wrote:
>>>
>>>> What is the runner?
>>>> Can you elaborate a bit more what you mean by 'executor/thread
>>>> shutting down'? If the KafkaIO reader shutdown cleanly, it would call close
>>>> the consumer cleanly, triggering auto commit. But if you shutdown a
>>>> pipeline, it might not cleanly close the consumers. What is the auto_commit
>>>> interval you have? Please note that there is no way to coordinate
>>>> consistency between Beam pipeline and externally maintained auto commit
>>>> offsets since it is outside Beam. 'Drain' feature Dataflow can help (it
>>>> lets a clean shutdown of the pipeline), also note that many runners provide
>>>> clean ways to update a pipeline that keeps all the state from previous run
>>>> (in this case Kafka offsets), which is the only way for Beam to provide its
>>>> processing guarantees across runs.
>>>>
>>>> KafkaIO leaves auto_commit handling  entirely to KafkaConsumer. If you
>>>> are seeing consumer is not honoring the auto_committed offset, please check
>>>> the log from KafkConsumer on the worker. Only user error I could think of
>>>> is some typo in consumer group name upon restart.
>>>>
>>>> Currently KafkaIO does not actively participate in auto_commit
>>>> management. It lets user directly set KafkaConsumer configuration. May be
>>>> there is a case for some more active support for auto_commit management.
>>>> Please provide more details in your case so that we can discuss actual
>>>> specifics and potential improvements it provides.
>>>>
>>>>
>>>> On Mon, Nov 6, 2017 at 8:15 AM, NerdyNick <[email protected]> wrote:
>>>>
>>>>> There seems to be a lot of oddities with the auto offset committer and
>>>>> the watermark management as well as kafka offsets in general.
>>>>>
>>>>> One issue I keep having is the auto committer will just not commit any
>>>>> offsets. So the topic will look like it's backing up. From what I've been
>>>>> able to trace on it it appears to be in relation to the executor/thread
>>>>> shutting down before the auto commit has a chance to run. Even though the
>>>>> min read times are set. It still prematurely shuts down. Turning auto
>>>>> commit interval down seems to help but doesn't resolve the issue. Just
>>>>> seems to allow it to correct itself much quicker.
>>>>>
>>>>> Another I just had happen is after restarting a pipeline the auto
>>>>> committed offsets reset to the earliest record and the pipeline appears to
>>>>> be working on those records. Which is odd in contrary to a lot of things.
>>>>> When I shut the pipeline down it was only a few thousand records behind.
>>>>> The consumer is configured to start at the latest offset not the earliest.
>>>>> Give that It would appear the recorded watermarks had an odd corruption or
>>>>> something where they believed they where in the past.
>>>>>
>>>>> --
>>>>> Nick Verbeck - NerdyNick
>>>>> ----------------------------------------------------
>>>>> NerdyNick.com
>>>>> Coloco.ubuntu-rocks.org
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Nick Verbeck - NerdyNick
>>> ----------------------------------------------------
>>> NerdyNick.com
>>> Coloco.ubuntu-rocks.org
>>>
>>
>>
>
>
> --
> Nick Verbeck - NerdyNick
> ----------------------------------------------------
> NerdyNick.com
> Coloco.ubuntu-rocks.org
>

Reply via email to