I've created a Stack Overflow question for this as well.

On Mon, Sep 19, 2022 at 4:37 PM karan alang <karan.al...@gmail.com> wrote:
> Hello All,
>
> I have a Spark Structured Streaming job on GCP Dataproc which picks up
> data from Kafka, does some processing, and pushes the data back into Kafka
> topics.
>
> A couple of questions:
>
> 1. Does Spark put all the logging (incl. INFO, WARN, etc.) into stderr?
> What I notice is that stdout is empty, while all the logging goes to
> stderr.
>
> 2. Is there a way for me to expire the data in stderr (i.e. expire the
> older logs)?
> Since I have a long-running streaming job, stderr fills up over time and
> the nodes/VMs become unavailable.
>
> Please advise.
>
> Here is the output of the yarn logs command:
>
> ```
> root@versa-structured-stream-v1-w-1:/home/karanalang# yarn logs -applicationId application_1663623368960_0008 -log_files stderr -size -500
>
> 2022-09-19 23:26:01,439 INFO client.RMProxy: Connecting to ResourceManager at versa-structured-stream-v1-m/10.142.0.62:8032
> 2022-09-19 23:26:01,696 INFO client.AHSProxy: Connecting to Application History server at versa-structured-stream-v1-m/10.142.0.62:10200
> Can not find any log file matching the pattern: [stderr] for the container: container_e01_1663623368960_0008_01_000003 within the application: application_1663623368960_0008
>
> Container: container_e01_1663623368960_0008_01_000002 on versa-structured-stream-v1-w-2.c.versa-sml-googl.internal:8026
> LogAggregationType: LOCAL
> =======================================================================================================================
> LogType:stderr
> LogLastModifiedTime:Mon Sep 19 23:26:02 +0000 2022
> LogLength:44309782124
> LogContents:
> , tenantId=3, vsnId=0, mstatsTotSentOctets=48210, mstatsTotRecvdOctets=242351, mstatsTotSessDuration=300000, mstatsTotSessCount=34, mstatsType=dest-stats, destIp=165.225.216.24, mstatsAttribs=,topic=syslog.ueba-us4.v1.versa.demo3,customer=versa type(row) is -> <class 'str'>
> 22/09/19 23:26:02 WARN org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer: KafkaDataConsumer is not running in UninterruptibleThread. It may hang when KafkaDataConsumer's methods are interrupted because of KAFKA-1894
> End of LogType:stderr. This log file belongs to a running container (container_e01_1663623368960_0008_01_000002) and so may not be complete.
> ***********************************************************************
>
> Container: container_e01_1663623368960_0008_01_000001 on versa-structured-stream-v1-w-1.c.versa-sml-googl.internal:8026
> LogAggregationType: LOCAL
> =======================================================================================================================
> LogType:stderr
> LogLastModifiedTime:Mon Sep 19 22:54:55 +0000 2022
> LogLength:17367929
> LogContents:
> on syslog.ueba-us4.v1.versa.demo3-2
> 22/09/19 22:52:52 INFO org.apache.kafka.clients.consumer.internals.SubscriptionState: [Consumer clientId=consumer-spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor-1, groupId=spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor] Resetting offset for partition syslog.ueba-us4.v1.versa.demo3-2 to offset 449568676.
> 22/09/19 22:54:55 ERROR org.apache.spark.executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> End of LogType:stderr.
> ***********************************************************************
> ```
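Following up on my own question 2: one thing I plan to try (a sketch based on Spark's documented support for `${spark.yarn.app.container.log.dir}` in log4j configs on YARN — I have not verified it on Dataproc yet) is shipping a custom `log4j.properties` that replaces the default console appender, which writes everything to the container's stderr, with a size-bounded RollingFileAppender:

```properties
# Custom log4j.properties for Spark on YARN (log4j 1.x, as bundled with Spark 3.x).
# Route all logging to a rolling file inside the YARN container log dir instead
# of the unbounded stderr stream.
log4j.rootCategory=INFO, rolling

log4j.appender.rolling=org.apache.log4j.RollingFileAppender
# ${spark.yarn.app.container.log.dir} is substituted by Spark for each container
log4j.appender.rolling.File=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.rolling.MaxFileSize=128MB
log4j.appender.rolling.MaxBackupIndex=5
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
```

The file would then be distributed and activated via spark-submit, e.g. `--files log4j.properties --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"`. The file name, 128MB size, and backup count are just placeholder choices on my side. This would also answer question 1 as I understand it: stdout is empty simply because Spark's default log4j console appender targets System.err, not because of anything Dataproc-specific.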