Here is the Stack Overflow link:

https://stackoverflow.com/questions/73780259/spark-structured-streaming-stderr-getting-filled-up

On Mon, Sep 19, 2022 at 4:41 PM karan alang <karan.al...@gmail.com> wrote:

> I've created a Stack Overflow question for this as well.
>
> On Mon, Sep 19, 2022 at 4:37 PM karan alang <karan.al...@gmail.com> wrote:
>
>> Hello All,
>> I have a Spark Structured Streaming job on GCP Dataproc which picks up
>> data from Kafka, processes it, and pushes the results back into Kafka
>> topics.
>>
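>> For context, here is a stripped-down sketch of what the job does (the
>> broker address, output topic, and checkpoint path below are placeholders,
>> not the real values):
>>
>> ```
>> from pyspark.sql import SparkSession
>>
>> spark = SparkSession.builder.appName("versa-structured-stream").getOrCreate()
>>
>> # Read from the Kafka source topic (broker address is a placeholder).
>> df = (spark.readStream
>>       .format("kafka")
>>       .option("kafka.bootstrap.servers", "broker:9092")
>>       .option("subscribe", "syslog.ueba-us4.v1.versa.demo3")
>>       .load())
>>
>> # Processing happens here; the real job does more than a cast.
>> out = df.selectExpr("CAST(key AS STRING) AS key",
>>                     "CAST(value AS STRING) AS value")
>>
>> # Push the processed records back into Kafka (topic/path are placeholders).
>> query = (out.writeStream
>>          .format("kafka")
>>          .option("kafka.bootstrap.servers", "broker:9092")
>>          .option("topic", "output-topic")
>>          .option("checkpointLocation", "gs://bucket/checkpoints")
>>          .start())
>> query.awaitTermination()
>> ```
>>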
>> A couple of questions:
>> 1. Does Spark write all logging (incl. INFO, WARN, etc.) to stderr?
>> What I notice is that stdout is empty, while all the logging goes to
>> stderr.
>>
>> 2. Is there a way for me to expire the data in stderr (i.e. age out the
>> older logs)? Since this is a long-running streaming job, stderr fills up
>> over time (the LogLength below is ~44 GB) and the nodes/VMs become
>> unavailable. (See the rolling-appender sketch below for one approach I'm
>> considering.)
>>
>> Please advise.
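>>
>> Here is the rolling-appender approach I'm considering (untested; a sketch
>> assuming a log4j 1.x based Spark build, and the file name and size/backup
>> limits below are placeholders). The idea is to ship a custom
>> log4j.properties that redirects logging into capped, rolled files instead
>> of unbounded stderr:
>>
>> ```
>> # log4j.properties - route logging to a rolling, size-capped file
>> log4j.rootCategory=INFO, rolling
>> log4j.appender.rolling=org.apache.log4j.RollingFileAppender
>> log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
>> log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
>> # spark.yarn.app.container.log.dir resolves to the YARN container log dir
>> log4j.appender.rolling.file=${spark.yarn.app.container.log.dir}/spark.log
>> # Placeholder limits: ~100 MB per file, keep at most 5 rolled files
>> log4j.appender.rolling.maxFileSize=100MB
>> log4j.appender.rolling.maxBackupIndex=5
>> ```
>>
>> and point both the driver and the executors at it on submit
>> (my_streaming_job.py is a placeholder):
>>
>> ```
>> spark-submit \
>>   --files log4j.properties \
>>   --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
>>   --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
>>   my_streaming_job.py
>> ```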
>>
>> Here is output of the yarn logs command :
>> ```
>>
>> root@versa-structured-stream-v1-w-1:/home/karanalang# yarn logs
>> -applicationId application_1663623368960_0008 -log_files stderr -size -500
>>
>> 2022-09-19 23:26:01,439 INFO client.RMProxy: Connecting to
>> ResourceManager at versa-structured-stream-v1-m/10.142.0.62:8032
>>
>> 2022-09-19 23:26:01,696 INFO client.AHSProxy: Connecting to Application
>> History server at versa-structured-stream-v1-m/10.142.0.62:10200
>>
>> Can not find any log file matching the pattern: [stderr] for the
>> container: container_e01_1663623368960_0008_01_000003 within the
>> application: application_1663623368960_0008
>>
>> Container: container_e01_1663623368960_0008_01_000002 on
>> versa-structured-stream-v1-w-2.c.versa-sml-googl.internal:8026
>>
>> LogAggregationType: LOCAL
>>
>>
>> =======================================================================================================================
>>
>> LogType:stderr
>>
>> LogLastModifiedTime:Mon Sep 19 23:26:02 +0000 2022
>>
>> LogLength:44309782124
>>
>> LogContents:
>>
>> , tenantId=3, vsnId=0, mstatsTotSentOctets=48210,
>> mstatsTotRecvdOctets=242351, mstatsTotSessDuration=300000,
>> mstatsTotSessCount=34, mstatsType=dest-stats, destIp=165.225.216.24,
>> mstatsAttribs=,topic=syslog.ueba-us4.v1.versa.demo3,customer=versa  type(row)
>> is ->  <class 'str'>
>>
>> 22/09/19 23:26:02 WARN
>> org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer: KafkaDataConsumer
>> is not running in UninterruptibleThread. It may hang when
>> KafkaDataConsumer's methods are interrupted because of KAFKA-1894
>>
>> End of LogType:stderr.This log file belongs to a running container
>> (container_e01_1663623368960_0008_01_000002) and so may not be complete.
>>
>> ***********************************************************************
>>
>>
>>
>> Container: container_e01_1663623368960_0008_01_000001 on
>> versa-structured-stream-v1-w-1.c.versa-sml-googl.internal:8026
>>
>> LogAggregationType: LOCAL
>>
>>
>> =======================================================================================================================
>>
>> LogType:stderr
>>
>> LogLastModifiedTime:Mon Sep 19 22:54:55 +0000 2022
>>
>> LogLength:17367929
>>
>> LogContents:
>>
>> on syslog.ueba-us4.v1.versa.demo3-2
>>
>> 22/09/19 22:52:52 INFO
>> org.apache.kafka.clients.consumer.internals.SubscriptionState: [Consumer
>> clientId=consumer-spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor-1,
>> groupId=spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor]
>> Resetting offset for partition syslog.ueba-us4.v1.versa.demo3-2 to offset
>> 449568676.
>>
>> 22/09/19 22:54:55 ERROR
>> org.apache.spark.executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
>>
>> End of LogType:stderr.
>>
>> ***********************************************************************
>>
>> ```
>>
