If it's standalone mode, it's even easier. You should be able to
connect to a Hadoop 2.6 HDFS using the Hadoop 3.2 client. In your K8s
cluster, just don't put the Hadoop 2.6 jars on your classpath.
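For illustration, a minimal untested sketch in PySpark; the namenode
host and paths below are placeholders:

from pyspark.sql import SparkSession

# Spark 3.x job using the bundled Hadoop 3.2 client libraries to read
# from an HDFS cluster that is still running Hadoop 2.6.
spark = (SparkSession.builder
         .appName("hdfs26-via-hadoop32-client")
         .config("spark.hadoop.fs.defaultFS", "hdfs://old-namenode:8020")
         .getOrCreate())

df = spark.read.text("hdfs://old-namenode:8020/data/sample.txt")
df.show()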
On Sun, Jul 19, 2020 at 10:25 PM Ashika Umanga Umagiliya wrote:
>
> Hello
>
> "spark.yarn.populateHadoopClasspath" is used in YARN mode, correct?
> However our Spark cluster is a standalone cluster, not using YARN.
> We only connect to HDFS/Hive to access data. Computation is done on our
> Spark cluster running on K8s (not YARN).
Hello

"spark.yarn.populateHadoopClasspath" is used in YARN mode, correct?
However our Spark cluster is a standalone cluster, not using YARN.
We only connect to HDFS/Hive to access data. Computation is done on our
Spark cluster running on K8s (not YARN).
On Mon, Jul 20, 2020 at 2:04 PM DB Tsai wrote:
Hi Ashika,

Hadoop 2.6 is no longer supported, and since it has not been maintained
for the last two years, it may have unpatched security issues. From
Spark 3.0 onwards we no longer support it; in other words, we have
modified our codebase in a way that Hadoop 2.6 won't work. However, i
https://www.youtube.com/watch?v=YgQgJceojJY (Xiao's video)
On Mon, Jul 20, 2020 at 8:03 AM Xiao Li wrote:
> https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc
> for Spark UI.
>
> Xiao
>
> On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu wrote:
>
>> Hi,
>>
>> I'm looking for a tutorial/video/material which explains the content of
>> the various tabs in the Spark Web UI.
>> Can someone direct me to the relevant info?
Greetings,
Hadoop 2.6 has been removed according to this ticket:
https://issues.apache.org/jira/browse/SPARK-25016
We run our Spark cluster on K8s in standalone mode.
We access HDFS/Hive running on a Hadoop 2.6 cluster.
We've been using Spark 2.4.5 and are planning to upgrade to Spark 3.0.0.
However,
https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc
for Spark UI.
Xiao
On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu wrote:

> Hi,
>
> I'm looking for a tutorial/video/material which explains the content of
> the various tabs in the Spark Web UI.
> Can someone direct me to the relevant info?
Hi,
When I'm using option 1, it completely overwrites the whole table; this
is not expected here. I'm running this for multiple tables with
different hours.
When I'm using option 2, I'm getting the following error:
Predicate references non-partition column 'json_feeds_flatten_data'.
Only the partition columns may be referenced.
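For reference, a rough, untested sketch of a partition-scoped overwrite
with Delta's replaceWhere; the table path, partition columns (date,
hour) and values below are all placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-partition-overwrite").getOrCreate()
df = spark.read.json("/tmp/staging/feed/")  # processed data, placeholder path

# replaceWhere may only reference partition columns; filtering on a data
# column such as 'json_feeds_flatten_data' raises the error above.
(df.write.format("delta")
   .mode("overwrite")
   .option("replaceWhere", "date = '2020-07-19' AND hour = 10")
   .save("/delta/json_feeds"))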
Hi,
I'm looking for a tutorial/video/material which explains the content of
the various tabs in the Spark Web UI.
Can someone direct me to the relevant info?
Thanks
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Some options for workflow engines:
https://medium.com/@xunnan.xu/workflow-processing-engine-overview-2018-airflow-vs-azkaban-vs-conductor-vs-oozie-vs-amazon-step-90affc54d53b
Streaming is a kind of infinitely running job, so you only have to
trigger it once, unless you are running it with Trigger.Once; see the
sketch below.
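A rough, untested sketch of a Trigger.Once query that a scheduler such
as Airflow could kick off on a cron (paths and formats are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trigger-once-demo").getOrCreate()

# Hypothetical file source; any streaming source behaves the same way.
df = spark.readStream.format("text").load("/data/incoming")

query = (df.writeStream
           .format("parquet")
           .option("checkpointLocation", "/data/ckpt")  # progress kept between runs
           .option("path", "/data/out")
           .trigger(once=True)  # process available data, then stop
           .start())
query.awaitTermination()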
Can you please send the error message? It would be very helpful for
getting to the root cause.
On Sun, Jul 19, 2020 at 10:57 PM anbutech wrote:
> Hi Team,
>
> I'm facing weird behavior with a PySpark dataframe (Databricks, Delta,
> Spark 3.0.0 supported).
>
> I have tried the below two options to write the processed dataframe data
> into the delta table with respect to the partition columns in the table.
Please try the maxBytesPerTrigger option; the files are probably big
enough to crash the JVM.
Please give some info on the executors and the files (size etc.).

Regards,
..Piyush

On Sun, Jul 19, 2020 at 3:29 PM Rachana Srivastava wrote:

> *Issue:* I am trying to process 5000+ gzipped json files periodically
> from S3 using Structured Streaming code.
Hi Team,
I'm very new to Spark Structured Streaming. Could you please guide me on
how to schedule/orchestrate a Spark Structured Streaming job? Is there
any scheduler similar to Airflow? I know Airflow doesn't support
streaming jobs.
Thanks
Anbu
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Hi Team,
I'm facing weird behavior with a PySpark dataframe (Databricks, Delta,
Spark 3.0.0 supported).

I have tried the below two options to write the processed dataframe data
into the delta table with respect to the partition columns in the table.
Actually, overwrite mode completely overwrites the whole table.
Can you reduce maxFilesPerTrigger further and see if the OOM still
persists? If it does, then the problem may be somewhere else.
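For example, a minimal untested sketch that caps each micro-batch; the
path and the cap of 10 are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("small-batches").getOrCreate()

# Limit each trigger to 10 input files so one micro-batch cannot pull in
# thousands of gzipped files at once.
inputDS = (spark.readStream
                .format("text")
                .option("maxFilesPerTrigger", 10)
                .load("s3a://my-bucket/input/"))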
> On Jul 19, 2020, at 5:37 AM, Jungtaek Lim wrote:
>
> Please provide logs and a dump file for the OOM case - otherwise no one
> could say what the cause is.
>
> Add JVM options to driver/executor => -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath="...dir..."
Please provide logs and a dump file for the OOM case - otherwise no one
could say what the cause is.
Add JVM options to driver/executor => -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="...dir..."
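If it helps, the flags can be wired in via spark-defaults.conf, e.g.
(the dump directory is a placeholder):

spark.driver.extraJavaOptions    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps
spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps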
On Sun, Jul 19, 2020 at 6:56 PM Rachana Srivastava wrote:

> *Issue:* I am trying to process 5000+ gzipped json files periodically
> from S3 using Structured Streaming code.
Issue: I am trying to process 5000+ gzipped json files periodically from
S3 using Structured Streaming code.

Here are the key steps:

- Read the json schema and broadcast it to the executors
- Read the stream:
  Dataset inputDS = sparkSession.readStream()
      .format("text")
      .option("inf