Re: How to get logging right for Spark applications in the YARN ecosystem

2019-08-01 Thread Srinath C
Hi Raman, you could probably use the rolling file appender in log4j to compress the rotated log files. Regards. On Fri, Aug 2, 2019 at 12:47 AM raman gugnani wrote: > Hi, > > I am looking for the right solution for logging the logs produced by the > executors. Most of the places I have seen logging done

Re: Using G1GC in Spark

2018-06-14 Thread Srinath C
You'll have to use the "spark.executor.extraJavaOptions" configuration parameter (see the documentation link): --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" Regards, Srinath. On Thu, Jun 14, 2018 at 4:44 PM Aakash
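As a rough illustration of the reply above, the option can be assembled into a spark-submit command line. This is only a sketch: the helper name `spark_submit_args` and the application file `my_app.py` are hypothetical and do not come from the thread; only the `spark.executor.extraJavaOptions` key and the `-XX:+UseG1GC` flag do.

```python
import shlex


def spark_submit_args(app, executor_java_opts):
    """Build a spark-submit argument list that sets executor JVM options.

    `app` and this helper are illustrative assumptions, not part of Spark.
    """
    # All JVM flags for the executors go into one space-separated value.
    opts = " ".join(executor_java_opts)
    return [
        "spark-submit",
        "--conf",
        f"spark.executor.extraJavaOptions={opts}",
        app,
    ]


# Hypothetical application name; the G1GC flag is the one from the reply.
args = spark_submit_args("my_app.py", ["-XX:+UseG1GC"])

# shlex.join quotes each argument as needed for a POSIX shell.
print(shlex.join(args))
```

Passing the arguments as a list keeps the whole `key=value` pair as a single argument even when several JVM flags are combined with spaces.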

Re: [Spark Optimization] Why is one node getting all the pressure?

2018-06-12 Thread Srinath C
> Below are the new executors shown while the updated run is taking place - > > Thanks, > Aakash. > > On Tue, Jun 12, 2018 at 2:14 PM, Srinath C wrote: > >> Hi Aakash, >> >> Can you check the logs for Executor ID 0? It was restarted on worker

Re: [Spark Optimization] Why is one node getting all the pressure?

2018-06-12 Thread Srinath C
Hi Aakash, Can you check the logs for Executor ID 0? It was restarted on worker 192.168.49.39, perhaps due to an OOM error or something similar. I also observed that the number of tasks is high and that they are unevenly distributed across the workers. Check if there are too many partitions in the RDD and tune it using

Re: AWS credentials needed while trying to read a model from S3 in Spark

2018-05-09 Thread Srinath C
You could use IAM roles in AWS to access the data in S3 without credentials. See this link and this link for an

Timezone conversion using from_utc_timestamp

2018-02-24 Thread Srinath C
Hi, This is a question regarding timezone conversion with the from_utc_timestamp function. The observation is that the function returns different values for a zoneId and a zoneOffset that denote the same timezone. Ex: "America/Los_Angeles" and "-08:00". System timezone is +05:30. Timestamp: 1519430400
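For reference, the conversion can be checked outside Spark with Python's standard library. This is a minimal sketch, not Spark's from_utc_timestamp: it assumes the timestamp 1519430400 is in epoch seconds, and it does not model the +05:30 system timezone mentioned in the question. On that date Los Angeles is on standard time (UTC-08:00), so the zone ID and the fixed offset should give the same wall-clock time.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: the timestamp from the question is in seconds since the epoch.
EPOCH_SECONDS = 1519430400

# Interpret the value as an instant in UTC.
utc_instant = datetime.fromtimestamp(EPOCH_SECONDS, tz=timezone.utc)

# Convert the same instant using a region-based zone ID ...
by_zone_id = utc_instant.astimezone(ZoneInfo("America/Los_Angeles"))

# ... and using a fixed offset of -08:00.
by_offset = utc_instant.astimezone(timezone(timedelta(hours=-8)))

# Compare the local wall-clock values with the zone information dropped.
same_wall_clock = by_zone_id.replace(tzinfo=None) == by_offset.replace(tzinfo=None)

print("UTC:      ", utc_instant.isoformat())
print("zone ID:  ", by_zone_id.isoformat())
print("offset:   ", by_offset.isoformat())
print("same wall clock:", same_wall_clock)
```

A region-based zone ID and a fixed offset only agree while the region is actually at that offset; during daylight saving time "America/Los_Angeles" is at -07:00, so the two would differ by an hour for summer dates.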