Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-19 Thread DB Tsai
If it's standalone mode, it's even easier. You should be able to connect to a Hadoop 2.6 HDFS using the 3.2 client. In your k8s cluster, just don't put Hadoop 2.6 into your classpath.
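A minimal sketch of what that looks like in practice, assuming a Spark 3.0 distribution that bundles the Hadoop 3.2 client; the master URL, host names, and paths below are hypothetical, not taken from the thread:

    # Sketch (hypothetical hosts/paths): Spark 3.0 with its bundled Hadoop 3.2
    # client reading from a Hadoop 2.6 HDFS. The k8s image carries only the
    # Spark distribution -- no Hadoop 2.6 jars on the classpath.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("spark3-to-hdfs26")
        .master("spark://spark-master.example.com:7077")  # standalone master, not YARN
        .getOrCreate()
    )

    # The 3.2 client speaks a wire protocol the 2.6 NameNode still understands.
    df = spark.read.parquet("hdfs://namenode26.example.com:8020/warehouse/events")
    df.show()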

Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-19 Thread Ashika Umanga Umagiliya
Hello "spark.yarn.populateHadoopClasspath" is used in YARN mode correct? However our Spark cluster is standalone cluster not using YARN. We only connect to HDFS/Hive to access data.Computation is done on our spark cluster running on K8s (not Yarn) On Mon, Jul 20, 2020 at 2:04 PM DB Tsai wrote:

Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-19 Thread Prashant Sharma
Hi Ashika, Hadoop 2.6 is no longer supported, and since it has not been maintained in the last two years, it may have unpatched security issues. From Spark 3.0 onwards we no longer support it; in other words, we have modified our codebase in a way that Hadoop 2.6 won't work.

Re: Spark UI

2020-07-19 Thread Piyush Acharya
https://www.youtube.com/watch?v=YgQgJceojJY (Xiao's video)

Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-19 Thread Ashika Umanga
Greetings, Hadoop 2.6 support has been removed according to this ticket: https://issues.apache.org/jira/browse/SPARK-25016. We run our Spark cluster on K8s in standalone mode. We access HDFS/Hive running on a Hadoop 2.6 cluster. We've been using Spark 2.4.5 and are planning to upgrade to Spark 3.0.0.

Re: Spark UI

2020-07-19 Thread Xiao Li
https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc for the Spark UI. Xiao

Re: Overwrite Mode not Working Correctly in spark 3.0.0

2020-07-19 Thread anbutech
Hi, when I'm using option 1, it completely overwrites the whole table, which is not expected here; I'm running this for multiple tables with different hours. When I'm using option 2, I'm getting the following error: Predicate references non-partition column 'json_feeds_flatten_data'. Only the partition columns may be referenced.
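For reference, a hedged sketch of the fix that error points at: with Delta on Spark 3.0.0, a replaceWhere predicate may reference only the table's partition columns. The path and the date/hour partition columns below are hypothetical, suggested by the "different hours" remark:

    # Sketch (hypothetical names): scope the overwrite with a predicate on
    # partition columns only -- e.g. `date`/`hour` -- never on a data column
    # such as `json_feeds_flatten_data`.
    (df.write.format("delta")
       .mode("overwrite")
       .option("replaceWhere", "date = '2020-07-19' AND hour = 10")
       .save("/delta/json_feeds"))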

Spark UI

2020-07-19 Thread venkatadevarapu
Hi, I'm looking for a tutorial/video/material which explains the content of the various tabs in the Spark Web UI. Can someone direct me to the relevant info? Thanks

Re: Schedule/Orchestrate spark structured streaming job

2020-07-19 Thread Piyush Acharya
Some options for workflow engines: https://medium.com/@xunnan.xu/workflow-processing-engine-overview-2018-airflow-vs-azkaban-vs-conductor-vs-oozie-vs-amazon-step-90affc54d53b. A streaming query is an infinitely running job, so you only have to trigger it once, unless you are running it with Trigger.Once.
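To illustrate the Trigger.Once pattern: a minimal PySpark sketch with hypothetical paths. Each scheduled run drains whatever input is new, then exits, so a scheduler like Airflow can treat it as a batch job:

    # Sketch: structured streaming with Trigger.Once (hypothetical paths).
    # The checkpoint tracks progress between runs, so each re-submission
    # picks up where the previous one stopped.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("once-demo").getOrCreate()

    df = spark.readStream.format("text").load("s3a://my-bucket/input/")

    query = (
        df.writeStream
          .format("parquet")
          .option("path", "s3a://my-bucket/output/")
          .option("checkpointLocation", "s3a://my-bucket/chk/")
          .trigger(once=True)
          .start()
    )
    query.awaitTermination()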

Re: Overwrite Mode not Working Correctly in spark 3.0.0

2020-07-19 Thread Piyush Acharya
Can you please send the error message? It would be very helpful for getting to the root cause.

Re: OOM while processing read/write to S3 using Spark Structured Streaming

2020-07-19 Thread Piyush Acharya
Please try the maxBytesPerTrigger option; probably the files are big enough to crash the JVM. Please give some info on the executors and the files (size etc.). Regards, Piyush
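A hedged sketch of the rate-limiting idea: for the built-in file sources the knob I'm aware of is maxFilesPerTrigger (maxBytesPerTrigger is, as far as I know, a Delta-source option), so the sketch below uses the former; the schema, cap, and path are hypothetical:

    # Sketch (hypothetical schema/paths): cap how much each micro-batch ingests
    # so one trigger doesn't pull thousands of gzipped files into memory at once.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StringType

    spark = SparkSession.builder.appName("rate-limit").getOrCreate()

    json_schema = StructType().add("id", StringType()).add("payload", StringType())

    inputDS = (
        spark.readStream
             .format("json")
             .schema(json_schema)               # streaming file sources need an explicit schema
             .option("maxFilesPerTrigger", 50)  # hypothetical cap; tune to executor memory
             .load("s3a://my-bucket/input/")
    )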

Schedule/Orchestrate spark structured streaming job

2020-07-19 Thread anbutech
Hi Team, I'm very new to Spark Structured Streaming. Could you please guide me on how to schedule/orchestrate a Spark Structured Streaming job? Any scheduler similar to Airflow? I know Airflow doesn't support streaming jobs. Thanks, Anbu

Overwrite Mode not Working Correctly in spark 3.0.0

2020-07-19 Thread anbutech
Hi Team, I'm facing weird behavior with a PySpark DataFrame (Databricks Delta, Spark 3.0.0 supported). I have tried the below two options to write the processed DataFrame into a Delta table with respect to the partition columns in the table. Actually, overwrite mode completely overwrites the whole table.
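For context, a hedged sketch of why that happens: on a Delta table, a plain mode("overwrite") replaces the entire table even when it is partitioned; scoping the write needs an explicit replaceWhere predicate, as sketched earlier in this thread. Names below are hypothetical:

    # Sketch (hypothetical names): plain overwrite on a partitioned Delta table
    # replaces ALL partitions, not just those present in df -- matching the
    # behavior described above.
    (df.write.format("delta")
       .mode("overwrite")
       .partitionBy("date", "hour")
       .save("/delta/json_feeds"))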

Re: OOM while processing read/write to S3 using Spark Structured Streaming

2020-07-19 Thread Sanjeev Mishra
Can you reduce maxFilesPerTrigger further and see if the OOM still persists? If it does, then the problem may be somewhere else.

Re: OOM while processing read/write to S3 using Spark Structured Streaming

2020-07-19 Thread Jungtaek Lim
Please provide logs and a heap dump file for the OOM case; otherwise no one can say what the cause is. Add JVM options to the driver/executor => -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath="...dir..."
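One way to wire those flags in, as a sketch under the assumption that the job is launched with spark-submit: the executor-side conf can be set when the session is built, but the driver JVM is already running by then, so its flag belongs on the launch command. The dump directory is hypothetical:

    # Sketch: enable heap dumps on OOM (hypothetical dump directory).
    # The driver-side flag is best passed at launch time, e.g.:
    #   spark-submit --conf "spark.driver.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps" app.py
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("oom-debug")
        .config("spark.executor.extraJavaOptions",
                "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps")
        .getOrCreate()
    )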

OOM while processing read/write to S3 using Spark Structured Streaming

2020-07-19 Thread Rachana Srivastava
Issue: I am trying to process 5000+ gzipped JSON files periodically from S3 using Structured Streaming code. Here are the key steps:
- Read the JSON schema and broadcast it to the executors
- Read the stream: Dataset inputDS = sparkSession.readStream() .format("text") .option("inf
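The snippet is cut off, but a plausible reconstruction of the described pattern, as a hedged PySpark sketch (the original appears to be Java; the bucket names, per-trigger cap, and parquet sink are hypothetical, not the original code):

    # Sketch of the described pipeline: schema derived once up front, then the
    # gzipped JSON streamed as text and parsed with that schema.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col

    spark = SparkSession.builder.appName("s3-json-stream").getOrCreate()

    # One-off batch read of a sample to obtain the schema (.gz is read transparently).
    json_schema = spark.read.json("s3a://my-bucket/input/sample/").schema

    raw = (
        spark.readStream
             .format("text")
             .option("maxFilesPerTrigger", 100)  # bound each micro-batch
             .load("s3a://my-bucket/input/")
    )
    parsed = raw.select(from_json(col("value"), json_schema).alias("j")).select("j.*")

    query = (
        parsed.writeStream
              .format("parquet")
              .option("path", "s3a://my-bucket/output/")
              .option("checkpointLocation", "s3a://my-bucket/chk/")
              .start()
    )
    query.awaitTermination()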