ML using Spark Connect

2023-12-01 Thread Faiz Halde
Hello, is it possible to run Spark ML using Spark Connect 3.5.0? So far I've had no success setting up a Connect client that uses the ML package. The ML package depends on spark-core/spark-sql AFAIK, which seems to be shadowing the Spark Connect client classes. Do I have to exclude any dependencies from the mllib
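For context, a minimal sketch of a Spark Connect client session (assuming pyspark 3.5 with the connect extra installed and a Connect server already running at sc://localhost:15002 — both placeholders):

```python
# Hedged sketch: a thin Spark Connect client session (Spark 3.5.x).
# Server URL is a placeholder; classic JVM-backed pyspark.ml classes may not
# work over Connect, which is the issue raised in this thread.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
df = spark.createDataFrame([(1.0,), (2.0,)], ["feature"])
df.show()  # executed by the Connect server, not in the client process
```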

Re: Error using SPARK with Rapid GPU

2022-11-30 Thread Alessandro Bellina
Vajiha filed a spark-rapids discussion here https://github.com/NVIDIA/spark-rapids/discussions/7205, so if you are interested please follow there. On Wed, Nov 30, 2022 at 7:17 AM Vajiha Begum S A < vajihabegu...@maestrowiz.com> wrote: > Hi, > I'm using an Ubuntu system with the NVIDIA Quadro

Error using SPARK with Rapid GPU

2022-11-30 Thread Vajiha Begum S A
Hi, I'm using an Ubuntu system with an NVIDIA Quadro K1200 with 20 GB GPU memory. Installed: cuDF 22.10.0 jar file, rapids-4-spark_2.12-22.10.0 jar file, CUDA Toolkit 11.8.0 (Linux version), Java 8. I'm running only a single server; the master is localhost. I'm trying to run PySpark code through spark

Error - using Spark with GPU

2022-11-30 Thread Vajiha Begum S A
spark-submit /home/mwadmin/Documents/test.py 22/11/30 14:59:32 WARN Utils: Your hostname, mwadmin-HP-Z440-Workstation resolves to a loopback address: 127.0.1.1; using ***.***.**.** instead (on interface eno1) 22/11/30 14:59:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

Re: Re: Spark got incorrect scala version while using spark 3.2.1 and spark 3.2.2

2022-08-26 Thread ckgppl_yan
Oh, I got it. I thought Spark could pick up the local Scala version. - Original Message - From: Sean Owen To: ckgppl_...@sina.cn Cc: user Subject: Re: Spark got incorrect scala version while using spark 3.2.1 and spark 3.2.2 Date: 2022-08-26 21:08 Spark is built with and ships with a copy of Scala. It doesn't use

Re: Spark got incorrect scala version while using spark 3.2.1 and spark 3.2.2

2022-08-26 Thread pengyh
Good answer, nice to know too. Sean Owen wrote: Spark is built with and ships with a copy of Scala. It doesn't use your local version.

Re: Spark got incorrect scala version while using spark 3.2.1 and spark 3.2.2

2022-08-26 Thread Sean Owen
Spark is built with and ships with a copy of Scala. It doesn't use your local version. On Fri, Aug 26, 2022 at 2:55 AM wrote: Hi all, I found a strange thing. I have run Spark 3.2.1 prebuilt in local mode. My OS Scala version is 2.13.7. But when I run spark-submit then check the

Spark got incorrect scala version while using spark 3.2.1 and spark 3.2.2

2022-08-26 Thread ckgppl_yan
Hi all, I found a strange thing. I have run Spark 3.2.1 prebuilt in local mode. My OS Scala version is 2.13.7. But when I ran spark-submit and checked the Spark UI, the web page showed that my Scala version is 2.13.5. I used spark-shell, and it also showed that my Scala version is 2.13.5. Then I tried

Re: [Spark SQL]: Configuring/Using Spark + Catalyst optimally for read-heavy transactional workloads in JDBC sources?

2022-05-18 Thread Gavin Ray
Following up on this in case anyone runs across it in the archives in the future. From reading through the config docs and trying various combinations, I've discovered that: - You don't want to disable codegen. This roughly doubled the time to perform simple, few-column/few-row queries from basic
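For concreteness, a hedged sketch of keeping whole-stage codegen at its default when building the session (spark.sql.codegen.wholeStage is the relevant conf; the rest is illustrative):

```python
# Hedged sketch: leave whole-stage codegen enabled (the default) for
# read-heavy JDBC workloads, per the finding above.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("jdbc-read-heavy")
         .config("spark.sql.codegen.wholeStage", "true")  # do not set to false
         .getOrCreate())
```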

[Spark SQL]: Configuring/Using Spark + Catalyst optimally for read-heavy transactional workloads in JDBC sources?

2022-05-16 Thread Gavin Ray
Hi all, I've not got much experience with Spark, but have been reading the Catalyst and Datasources V2 code/tests to try to get a basic understanding. I'm interested in trying Catalyst's query planner + optimizer for queries spanning one-or-more JDBC sources. Somewhat unusually, I'd like to do

trouble using spark in kubernetes

2022-05-03 Thread Andreas Klos
Hello everyone, I am trying to run a minimal example in my k8s cluster. First, I cloned the petastorm GitHub repo: https://github.com/uber/petastorm Second, I created a Docker image as follows: FROM ubuntu:20.04 RUN apt-get update -qq RUN apt-get install -qq -y software-properties-common RUN

Re: [EXTERNAL] Re: Unable to access Google buckets using spark-submit

2022-02-14 Thread Saurabh Gulati
Subject: [EXTERNAL] Re: Unable to access Google buckets using spark-submit Hi Gaurav, All, I'm doing a spark-submit from my local system to a GCP Dataproc c

Re: Unable to access Google buckets using spark-submit

2022-02-13 Thread karan alang
Put the GS access jar with your Spark jars — that's what the class-not-found exception is pointing you towards. On Fri, Feb 11, 2022 at 11:58 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote: BTW I also answered you in stackove

Re: Unable to access Google buckets using spark-submit

2022-02-13 Thread karan alang
owards. On Fri, Feb 11, 2022 at 11:58 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote: BTW I also answered you in stackoverflow: https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submi

Re: Unable to access Google buckets using spark-submit

2022-02-13 Thread karan alang
Thanks, Mich - will check this and update. Regards, Karan Alang On Sat, Feb 12, 2022 at 1:57 AM Mich Talebzadeh wrote: BTW I also answered you in stackoverflow: https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit HT

Re: Unable to access Google buckets using spark-submit

2022-02-13 Thread Mich Talebzadeh
and I quote: "I'm trying to access google buckets, when using spark-submit and running into issues. What needs to be done to debug/fix this" — quote from Stack Overflow. Hence the approach adopted is correct. He has created a bucket in GCP called gs://spark-jars-karan/ and wants to ac

Re: Unable to access Google buckets using spark-submit

2022-02-12 Thread Gourav Sengupta
On Fri, Feb 11, 2022 at 11:58 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote: BTW I also answered you in stackoverflow: https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submi

Re: Unable to access Google buckets using spark-submit

2022-02-12 Thread Holden Karau
88934/unable-to-access-google-buckets-using-spark-submit HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your ow

Re: Unable to access Google buckets using spark-submit

2022-02-12 Thread Mich Talebzadeh
BTW I also answered you in stackoverflow: https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Tale

Re: Unable to access Google buckets using spark-submit

2022-02-12 Thread Mich Talebzadeh
.UnsupportedFileSystemException: No FileSystem for scheme "gs" ``` I tried adding the --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem to the spark-submit command, but getting ClassNotFoundException. Details are in s

Unable to access Google buckets using spark-submit

2022-02-11 Thread karan alang
are in stackoverflow: https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit Any ideas on how to fix this? tia!
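Pulling the thread's pieces together, a hedged sketch of the fix discussed above (the connector jar path is a placeholder; the filesystem classes are the ones named in the thread, plus the matching AbstractFileSystem binding):

```python
# Hedged sketch: ship the GCS connector jar and register the gs:// scheme.
# Jar path is a placeholder; the bucket name is the one from the thread.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("gcs-access")
         .config("spark.jars", "/path/to/gcs-connector-hadoop3-latest.jar")
         .config("spark.hadoop.fs.gs.impl",
                 "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
         .config("spark.hadoop.fs.AbstractFileSystem.gs.impl",
                 "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
         .getOrCreate())

df = spark.read.text("gs://spark-jars-karan/somefile.txt")  # placeholder object
```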

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Mich Talebzadeh
coordinates? So that we run something like pip install or download from the PyPI index? From: Mich Talebzadeh Sent: Wednesday, 24 November 2021 18:28 Cc: user@spark.apache.org Subject: Re: [issue] not able to add external libs to pyspark job while using s

RE: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Bode, Meikel, NMA-CFD
using spark-submit The easiest way to set this up is to create a dependencies.zip file. Assuming that you have a virtual environment already set up, where there is a directory called site-packages, go to that directory and just create a minimal shell script, say package_and_zip_dependencies.sh, to do
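The same packaging step sketched in Python rather than shell (the venv path and Python version are assumptions):

```python
# Hedged sketch: zip a virtualenv's site-packages into dependencies.zip,
# to be shipped with --py-files. Paths assume a venv at ./venv on Python 3.7.
import shutil

shutil.make_archive("dependencies", "zip", "venv/lib/python3.7/site-packages")
```

Then submit with something like `spark-submit --py-files dependencies.zip your_job.py`.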

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Mich Talebzadeh
Dear Spark team, hope my email finds you well. I am using pyspark 3.0 and facing an issue with adding external library [configparser] while running the job using [spark-submit] & [yarn]. issue: import configparser ImportError: No module named c

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Atheer Alabdullatif
external libs to pyspark job while using spark-submit That's not how you add a library. From th

Re: [issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Sean Owen
facing an issue with adding external library [configparser] while running the job using [spark-submit] & [yarn]. issue: import configparser ImportError: No module named configparser 21/11/24 08:54:38 INFO util.ShutdownHookManager: Shutdown hook called

[issue] not able to add external libs to pyspark job while using spark-submit

2021-11-24 Thread Atheer Alabdullatif
Dear Spark team, hope my email finds you well. I am using pyspark 3.0 and facing an issue with adding an external library [configparser] while running the job using [spark-submit] & [yarn]. issue: import configparser ImportError: No module named configparser 21/11/24 08:54:38

Accessing a kerberized HDFS using Spark on Openshift

2021-10-13 Thread Gal Shinder
Hi, I have a pod on openshift 4.6 running a jupyter notebook with spark 3.1.1 and python 3.7 (based on open data hub, tweaked the dockerfile because I wanted this specific python version). I'm trying to run spark in client mode using the image of google's spark operator

How to process S3 data in Scalable Manner Using Spark API (wholeTextFile VERY SLOW and NOT scalable)

2021-10-02 Thread Alchemist
Issue: We are using the wholeTextFile() API to read files from S3, but this API is extremely slow due to the reasons mentioned below. The question is how to fix this issue. Here is our analysis so far: the issue is that we are using Spark's wholeTextFile API to read S3 files. The wholeTextFile API works in two step
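The analysis is cut off here, but a commonly suggested scalable alternative (an assumption on my part, not necessarily this thread's conclusion) is the DataFrame text reader plus input_file_name(), which splits listing and reading across tasks while keeping the per-file association that wholeTextFile() provides:

```python
# Hedged sketch: read many S3 text files in parallel instead of wholeTextFile().
# Assumes s3a:// credentials are configured; bucket/prefix are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name

spark = SparkSession.builder.appName("s3-read").getOrCreate()
df = (spark.read.text("s3a://my-bucket/my-prefix/*")       # split across tasks
          .withColumn("source_file", input_file_name()))   # keep file identity
```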

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-20 Thread Mich Talebzadeh
BTW what assumption is there that the thread owner is writing to the cluster? The thrift server is running locally on localhost:1. I concur that JDBC to remote Hive is needed. However, this is not the impression I get here. df.write .format("jdbc") .option("url",

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-20 Thread Daniel de Oliveira Mantovani
>From the Cloudera Documentation: https://docs.cloudera.com/documentation/other/connectors/hive-jdbc/latest/Cloudera-JDBC-Driver-for-Apache-Hive-Install-Guide.pdf UseNativeQuery 1: The driver does not transform the queries emitted by applications, so the native query is used. 0: The driver

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-20 Thread Daniel de Oliveira Mantovani
Insert mode is "overwrite", so it shouldn't matter whether the table already exists or not. The JDBC driver should be based on the Cloudera Hive version; we can't know the CDH version he's using. On Tue, Jul 20, 2021 at 1:21 PM Mich Talebzadeh wrote: The driver is fine and latest and it

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-20 Thread Mich Talebzadeh
The driver is fine and up to date, and it should work. I have asked the thread owner to send the DDL of the table and how the table is created. In this case JDBC from Spark expects the table to be there. The error below java.sql.SQLException: [Cloudera][HiveJDBCDriver](500051) ERROR processing

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-20 Thread Daniel de Oliveira Mantovani
Badrinath is trying to write to a Hive in a cluster where he doesn't have permission to submit spark jobs, he doesn't have Hive/Spark metadata access. The only way to communicate with this third-party Hive cluster is through JDBC protocol. [ Cloudera Data Hub - Hive Server] <-> [Spark Standalone]

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-19 Thread Artemis User
As Mich mentioned, no need to use the JDBC API; using the DataFrameWriter's saveAsTable method is the way to go. The JDBC driver is for a JDBC client (a Java client, for instance) to access the Hive tables in Spark via the Thrift server interface. -- ND On 7/19/21 2:42 AM, Badrinath Patchikolla

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-19 Thread Badrinath Patchikolla
I have been trying to create a table in Hive from Spark itself. In local mode it works; what I am trying here is, from Spark standalone, to create a managed table in Hive (another Spark cluster, basically CDH) using JDBC mode. When I try that, below are the errors I am facing. On Thu, 15

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-19 Thread Mich Talebzadeh
Your driver seems to be OK. hive_driver: com.cloudera.hive.jdbc41.HS2Driver However, this is the SQL error you are getting: Caused by: com.cloudera.hiveserver2.support.exceptions.GeneralException: [Cloudera][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: 4, SQL state:

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-15 Thread Mich Talebzadeh
Have you created that table in Hive, or are you trying to create it from Spark itself? Your Hive is local, so in this case you don't need a JDBC connection. Have you tried: df2.write.mode("overwrite").saveAsTable("mydb.mytable") HTH view my Linkedin profile
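A minimal runnable sketch of the suggested metastore write (assuming Hive support is enabled and a database mydb exists; the DataFrame is illustrative):

```python
# Hedged sketch of the saveAsTable suggestion from the reply above.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-write")
         .enableHiveSupport()   # needed so saveAsTable goes to the metastore
         .getOrCreate())

df2 = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
df2.write.mode("overwrite").saveAsTable("mydb.mytable")
```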

Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-15 Thread Badrinath Patchikolla
Hi, trying to write data from Spark to Hive in JDBC mode; below is the sample code (Spark standalone 2.4.7): 21/07/15 08:04:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Setting default log level to

Re: Insert into table with one of the values derived from a DB function using spark

2021-06-20 Thread Mich Talebzadeh
'.'+tableName user = self.config['OracleVariables']['oracle_user'] password = self.config['OracleVariables']['oracle_password'] driver = self.config['OracleVariables']['oracle_driver']

Re: Insert into table with one of the values derived from a DB function using spark

2021-06-19 Thread Sebastian Piu
tchsize = self.config['OracleVariables']['fetchsize'] read_df = s.loadTableFromJDBC(self.spark,oracle_url,fullyQualifiedTableName,user,password,driver,fetchsize) # check that all rows are there if df2.

Re: Insert into table with one of the values derived from a DB function using spark

2021-06-19 Thread Mich Talebzadeh
oaded to Oracle table, quitting") sys.exit(1) in the statement where it says option("dbtable", tableName). \ You can replace *tableName* with the equivalent SQL insert statement

Re: Insert into table with one of the values derived from a DB function using spark

2021-06-19 Thread ayan guha
option("dbtable", tableName). \ You can replace *tableName* with the equivalent SQL insert statement. You will need a JDBC driver for Oracle, say ojdbc6.jar, in $SPARK_HOME/conf/spark-defau

Re: Insert into table with one of the values derived from a DB function using spark

2021-06-18 Thread Mich Talebzadeh
.driver.extraClassPath /home/hduser/jars/jconn4.jar:/home/hduser/jars/ojdbc6.jar HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> *Disclaimer:

Re: Insert into table with one of the values derived from a DB function using spark

2021-06-18 Thread Anshul Kala
spark.driver.extraClassPath /home/hduser/jars/jconn4.jar:/home/hduser/jars/ojdbc6.jar HTH view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> *Disclaimer:* Use it at your own risk. Any and all resp

Re: Insert into table with one of the values derived from a DB function using spark

2021-06-18 Thread Mich Talebzadeh
ll in no case be liable for any monetary damages arising from such loss, damage or destruction. On Fri, 18 Jun 2021 at 20:49, Anshul Kala wrote: Hi All, I am using spark to ingest data from file to database Oracle table. For one of the fields, the value to be populated i

Insert into table with one of the values derived from a DB function using spark

2021-06-18 Thread Anshul Kala
Hi All, I am using Spark to ingest data from a file into an Oracle database table. For one of the fields, the value to be populated is generated from a function that is written in the database. The input to the function is one of the fields of the data frame. I wanted to use Spark JDBC write to perform
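For orientation, a hedged sketch of the plain Spark JDBC write the replies build on (every connection detail is a placeholder, and an Oracle driver jar such as ojdbc6.jar must be on the classpath):

```python
# Hedged sketch: Spark JDBC write to Oracle; URL, table, and credentials are
# placeholders. Per the thread, "dbtable" can be swapped for an equivalent
# SQL statement to invoke a DB-side function.
(df.write
   .format("jdbc")
   .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")
   .option("dbtable", "MY_SCHEMA.MY_TABLE")
   .option("user", "scott")
   .option("password", "tiger")
   .option("driver", "oracle.jdbc.OracleDriver")
   .mode("append")
   .save())
```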

Re: Moving millions of files using spark

2021-06-16 Thread Molotch
Definitely not a Spark task. Moving files within the same filesystem is merely a linking exercise; you don't have to actually move any data. Write a shell script creating hard links in the new location; once you're satisfied, remove the old links, profit.
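The reply suggests a shell script; the same idea sketched in Python (the source/destination paths and the name pattern are placeholders, and hard links require both paths on one filesystem):

```python
# Hedged Python version of the hard-link approach described above.
import os

src = "/data/incoming"              # placeholder source folder
for name in os.listdir(src):
    prefix = name.split("_")[0]     # assumed pattern: <prefix>_rest-of-name
    dst_dir = os.path.join("/data/sorted", prefix)
    os.makedirs(dst_dir, exist_ok=True)
    os.link(os.path.join(src, name), os.path.join(dst_dir, name))
# once verified, remove the originals to complete the "move"
```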

Moving millions of files using spark

2021-06-16 Thread rajat kumar
Hello, I know this might not be a valid use case for Spark, but I have millions of files in a single folder. The file names follow a pattern, and based on that pattern I want to move them to different directories. Can you please suggest what can be done? Thanks, rajat

Data Lakes using Spark

2021-04-07 Thread Boris Litvak
Hi Friends, I’d like to publish a document to Medium about data lakes using Spark. Its latter parts include info that is not widely known, unless you have experience with data lakes. https://github.com/borislitvak/datalake-article/blob/initial_comments/Building%20a%20Real%20Life%20Data%20Lake

Re: [Spark SQL]: Can complex oracle views be created using Spark SQL

2021-03-23 Thread Mich Talebzadeh
ying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Mon, 22 Mar 2021 at 05:38, Gaurav Singh wrote: Hi Team, We have lots of complex oracle views (containing multiple tables, joins, analytical and aggregate functions, sub queries etc) and we are wondering if Spark can help us execute those views faster. Also we want to know if those complex views can be implemented using Spark SQL? Thanks and regards, Gaurav Singh +91 8600852256

Re: [Spark SQL]: Can complex oracle views be created using Spark SQL

2021-03-22 Thread Mich Talebzadeh
nt to know if those complex views can be implemented using Spark SQL? Thanks and regards, Gaurav Singh +91 8600852256

[Spark SQL]: Can complex oracle views be created using Spark SQL

2021-03-21 Thread Gaurav Singh
Hi Team, We have lots of complex Oracle views (containing multiple tables, joins, analytical and aggregate functions, subqueries etc.) and we are wondering if Spark can help us execute those views faster. Also, we want to know if those complex views can be implemented using Spark SQL? Thanks

Re: Using Spark as a fail-over platform for Java app

2021-03-12 Thread Jungtaek Lim
, March 12, 2021 at 2:53 PM To: User Subject: [EXTERNAL] Using Spark as a fail-over platform for Java app

Re: Using Spark as a fail-over platform for Java app

2021-03-12 Thread Lalwani, Jayesh
balancer will shift the traffic to the healthy node until the crashed node recovers. From: Sergey Oboguev Date: Friday, March 12, 2021 at 2:53 PM To: User Subject: [EXTERNAL] Using Spark as a fail-over platform for Java app

Using Spark as a fail-over platform for Java app

2021-03-12 Thread Sergey Oboguev
I have an existing plain-Java (non-Spark) application that needs to run in a fault-tolerant way, i.e. if the node crashes then the application is restarted on another node, and if the application crashes because of internal fault, the application is restarted too. Normally I would run it in a

Re: Hive using Spark engine vs native spark with hive integration.

2020-10-07 Thread Patrick McCarthy
I think a lot will depend on what the scripts do. I've seen some legacy hive scripts which were written in an awkward way (e.g. lots of subqueries, nested explodes) because pre-spark it was the only way to express certain logic. For fairly straightforward operations I expect Catalyst would reduce

Re: Hive using Spark engine vs native spark with hive integration.

2020-10-06 Thread Ricardo Martinelli de Oliveira
My 2 cents is that this is a complicated question since I'm not confident that Spark is 100% compatible with Hive in terms of query language. I have an unanswered question in this list about this:

Hive using Spark engine vs native spark with hive integration.

2020-10-06 Thread Manu Jacob
Hi All, Not sure if I need to ask this question on the spark community or the hive community list. We have a set of hive scripts that run on EMR (Tez engine). We would like to experiment by moving some of it onto Spark. We are planning to experiment with two options. 1. Use the current code based on

Unable to run bash script when using spark-submit in cluster mode.

2020-07-23 Thread Nasrulla Khan Haris
Hi Spark Users, I am trying to execute a bash script from my Spark app. I can run the command below without issues from spark-shell; however, when I use it in the Spark app and submit with spark-submit, the container is not able to find the directories. val result = "export LD_LIBRARY_PATH=/

RE: Unable to run bash script when using spark-submit in cluster mode.

2020-07-23 Thread Nasrulla Khan Haris
Are local paths not exposed in containers? Thanks, Nasrulla From: Nasrulla Khan Haris Sent: Thursday, July 23, 2020 6:13 PM To: user@spark.apache.org Subject: Unable to run bash script when using spark-submit in cluster mode. Importance: High Hi Spark Users, I am trying to execute bash

Re: OOM while processing read/write to S3 using Spark Structured Streaming

2020-07-19 Thread Piyush Acharya
Please try the maxBytesPerTrigger option; the files are probably big enough to crash the JVM. Please give some info on the executors and the files (size etc.). Regards, ..Piyush On Sun, Jul 19, 2020 at 3:29 PM Rachana Srivastava wrote: *Issue:* I am trying to process 5000+ files of gzipped json file

Re: OOM while processing read/write to S3 using Spark Structured Streaming

2020-07-19 Thread Sanjeev Mishra
Can you reduce maxFilesPerTrigger further and see if the OOM still persists? If it does, then the problem may be somewhere else. On Jul 19, 2020, at 5:37 AM, Jungtaek Lim wrote: Please provide logs and dump file for the OOM case - otherwise no one could say what's the cause.

Re: OOM while processing read/write to S3 using Spark Structured Streaming

2020-07-19 Thread Jungtaek Lim
Please provide logs and dump file for the OOM case - otherwise no one could say what's the cause. Add JVM options to driver/executor => -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath="...dir..." On Sun, Jul 19, 2020 at 6:56 PM Rachana Srivastava wrote: > *Issue:* I am trying to process 5000+

OOM while processing read/write to S3 using Spark Structured Streaming

2020-07-19 Thread Rachana Srivastava
Issue: I am trying to process 5000+ gzipped JSON files periodically from S3 using Structured Streaming code. Here are the key steps: - Read the JSON schema and broadcast it to executors - Read Stream Dataset inputDS = sparkSession.readStream() .format("text")
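A hedged sketch of throttling the file source as the replies suggest (the path and trigger size are placeholders):

```python
# Hedged sketch: cap how many S3 files enter each micro-batch.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-stream").getOrCreate()

input_ds = (spark.readStream
            .format("text")
            .option("maxFilesPerTrigger", 100)   # throttle files per batch
            .load("s3a://my-bucket/incoming/"))  # gzipped JSON read as text
```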

Re: Issue in parallelization of CNN model using spark

2020-07-17 Thread Mukhtaj Khan
allelize model training developed using standard libraries like Keras, use Horovod from Uber. https://horovod.readthedocs.io/en/stable/spark_include.html On Mon, Jul 13, 2020 at 6:59 AM Mukhtaj Khan wrote: Dear Spark User

Re: “Pyspark.zip does not exist” using Spark in cluster mode with Yarn

2020-07-16 Thread Hulio andres
ent: Thursday, July 16, 2020 at 6:54 PM From: "Davide Curcio" To: "user@spark.apache.org" Subject: “Pyspark.zip does not exist” using Spark in cluster mode with Yarn I'm trying to run some Spark script in cluster mode using Yarn but I've always obt

“Pyspark.zip does not exist” using Spark in cluster mode with Yarn

2020-07-16 Thread Davide Curcio
I'm trying to run a Spark script in cluster mode using Yarn, but I've always obtained this error. I read in other similar questions that the cause can be: "local" hard-coded as the master, but I don't have that; a wrong HADOOP_CONF_DIR environment variable inside spark-env.sh, but it

Re: Issue in parallelization of CNN model using spark

2020-07-14 Thread Anwar AliKhan
Ok, thanks. You can buy it here https://www.amazon.com/s?k=hands+on+machine+learning+with+scikit-learn+and+tensorflow+2=2U0P9XVIJ790T=Hands+on+machine+%2Caps%2C246=nb_sb_ss_i_1_17 This book is like an accompaniment to the Andrew Ng course on Coursera. It uses the exact same mathematical notations,

Re: Issue in parallelization of CNN model using spark

2020-07-14 Thread Sean Owen
It is still copyrighted material, no matter its state of editing. Yes, you should not be sharing this on the internet. On Tue, Jul 14, 2020 at 9:46 AM Anwar AliKhan wrote: Please note it is freely available because it is an early unedited raw edition. It is not 100% complete, it is not

Re: Issue in parallelization of CNN model using spark

2020-07-14 Thread Anwar AliKhan
On Mon, Jul 13, 2020 at 6:59 AM Mukhtaj Khan wrote: Dear Spark User I am trying to parallelize the CNN (convolutional neural network) model using spark. I have developed the model using pyt

Re: Issue in parallelization of CNN model using spark

2020-07-14 Thread Patrick McCarthy
Dear Spark User I am trying to parallelize the CNN (convolutional neural network) model using spark. I have developed the model using python and Keras library. The model works fine on a single machine but when we try on multi

Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Anwar AliKhan
M Mukhtaj Khan wrote: Dear Spark User I am trying to parallelize the CNN (convolutional neural network) model using spark. I have developed the model using python and Keras library. The model works fine on a single machine but when we try on multiple machines,

Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Anwar AliKhan
libraries like Keras, use Horovod from Uber. https://horovod.readthedocs.io/en/stable/spark_include.html On Mon, Jul 13, 2020 at 6:59 AM Mukhtaj Khan wrote: Dear Spark User I am trying to parallelize the CNN (convolutional neural network) mo

Using Spark UI with Running Spark on Hadoop Yarn

2020-07-13 Thread ArtemisDev
when running spark on a hadoop yarn cluster. Is this correct? Does the spark history server have the same user functions as the Spark UI? But how could this be possible (the possibility of using the Spark UI) if the Spark master server isn't active when all the job scheduling and resource

Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Sean Owen
Mukhtaj Khan wrote: Dear Spark User I am trying to parallelize the CNN (convolutional neural network) model using spark. I have developed the model using python and Keras library. The model works fine on a single machine but when we try on multiple machines, the e

Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Juan Martín Guillén
CNN (convolutional neural network) model using spark. I have developed the model using python and Keras library. The model works fine on a single machine but when we try on multiple machines, the execution time remains the same as sequential. Could you please tell me whether there is any built-in library for

Issue in parallelization of CNN model using spark

2020-07-13 Thread Mukhtaj Khan
Dear Spark User I am trying to parallelize the CNN (convolutional neural network) model using Spark. I have developed the model using Python and the Keras library. The model works fine on a single machine, but when we try on multiple machines, the execution time remains the same as sequential. Could
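The replies point to Horovod on Spark; a hedged sketch of that pattern (assumes horovod[spark,tensorflow] is installed on all nodes; the model body is illustrative, not the poster's network):

```python
# Hedged sketch of the Horovod-on-Spark suggestion from the replies.
import horovod.spark

def train():
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd
    hvd.init()  # one training process per Spark task
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(0.001))
    model.compile(optimizer=opt, loss="mse")
    # ... model.fit(...) on this worker's shard of the data ...

horovod.spark.run(train, num_proc=4)  # 4 parallel training tasks
```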

Re: Is it possible to use Hadoop 3.x and Hive 3.x using spark 2.4?

2020-07-06 Thread Daniel de Oliveira Mantovani
Hi Teja, To access Hive 3 using Apache Spark 2.x you need to use this connector from Cloudera: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html . It has many limitations. You just can write to Hive
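A hedged sketch of the HWC usage pattern from Cloudera's documentation (assumes the connector jar and the matching pyspark_llap zip are shipped with the job, and the HiveServer2/LLAP URLs are configured via --conf):

```python
# Hedged sketch per Cloudera's Hive Warehouse Connector docs; connection
# configuration is assumed to be supplied through spark-submit --conf options.
from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession

spark = SparkSession.builder.appName("hwc-demo").getOrCreate()
hive = HiveWarehouseSession.session(spark).build()
hive.setDatabase("default")
df = hive.executeQuery("SELECT * FROM some_table")  # placeholder table
```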

Re: Is it possible to use Hadoop 3.x and Hive 3.x using spark 2.4?

2020-07-06 Thread Sean Owen
2.4 works with Hadoop 3 (optionally) and Hive 1. I doubt it will work connecting to Hadoop 3 / Hive 3; it's possible in a few cases. It's also possible some vendor distributions support this combination. On Mon, Jul 6, 2020 at 7:51 AM Teja wrote: > > We use spark 2.4.0 to connect to Hadoop 2.7

Is it possible to use Hadoop 3.x and Hive 3.x using spark 2.4?

2020-07-06 Thread Teja
We use Spark 2.4.0 to connect to a Hadoop 2.7 cluster and query from Hive Metastore version 2.3. But the cluster managing team has decided to upgrade to Hadoop 3.x and Hive 3.x. We could not migrate to Spark 3 yet, which is compatible with Hadoop 3 and Hive 3, as we could not test if anything

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Something Something
way: Dataset productUpdates = watermarkedDS .groupByKey( (MapFunction<..., String>) event

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
appConfig, accumulators), Encoders.bean(ModelStateInfo.class), Encoders.bean(ModelUpdate.class),

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Something Something
Yes, that's exactly how I am creating them. Question... Are you using 'Stateful Structured Streaming' in which

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Something Something
updateAcrossEvents ) And updating the Accumulator inside 'updateAcrossEvents'? We're experiencing this only under 'S

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
am getting the values printed in my driver log as well as sent to Grafana. Not sure where and when I saw 0 before. My deploy mode

Re: Using Spark Accumulators with Structured Streaming

2020-06-07 Thread Something Something
Create accumulators like this: AccumulatorV2 accumulator = sparkContext.longAccumulator(name); On Tue, May 26, 2020
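For readers following along, a hedged PySpark analogue of the pattern under discussion (a driver-side long accumulator updated from a streaming query; the source and names are illustrative):

```python
# Hedged sketch: a numeric accumulator updated from Structured Streaming via
# foreachBatch. As the thread notes, values are only reliable on the driver.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("acc-demo").getOrCreate()
events = spark.readStream.format("rate").load()   # synthetic stream
rows_seen = spark.sparkContext.accumulator(0)     # driver-side accumulator

def count_batch(batch_df, batch_id):
    batch_df.foreach(lambda row: rows_seen.add(1))  # add() runs on executors

query = events.writeStream.foreachBatch(count_batch).start()
```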

Re: Using Spark Accumulators with Structured Streaming

2020-06-04 Thread ZHANG Wei
about the Application Specific Accumulators. The other standard counters such as 'event.progress.inputRowsPerSecond' are getting populated correctly!

Re: Using Spark Accumulators with Structured Streaming

2020-06-01 Thread ZHANG Wei
print in OnQueryProgress. I use LongAccumulator as well. Yes, it prints on my local but not on cluster. But one consolation is that when I send metrics to Grafana, the

Re: Using Spark Accumulators with Structured Streaming

2020-05-30 Thread Srinivas V
Accumulators. The other standard counters such as 'event.progress.inputRowsPerSecond' are getting populated correctly! On Mon, May 25, 2020 at 8:39 PM Srinivas V

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Something Something
wrote: Hello, Even for me it comes as 0 when I print in OnQueryProgress. I use LongAccumulator as well. Yes, it prints on my local but not on

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
But one consolation is that when I send metrics to Grafana, the values are coming there. On Tue, May 26, 2020 at 3:10 AM Something Something <mailinglist...@gmail.com

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Something Something
ist...@gmail.com> wrote: No this is not working even if I use LongAccumulator. On Fri, May 15, 2020 at 9:54 PM ZHANG Wei wrote: There is a restriction in Accumulat

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Something Something
No this is not working even if I use LongAccumulator. On Fri, May 15, 2020 at 9:54 PM ZHANG Wei wrote:

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
There is a restriction in AccumulatorV2 API [1], the OUT type should be atomic or thread safe. I'm wondering if the implementation for `java.util.Map[T, Long

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread ZHANG Wei
`java.util.Map[T, Long]` can meet it or not. Is there any chance to replace CollectionLongAccumulator by CollectionAccumulator[2] or LongAccumulator[3] and test if the StreamingListener and ot

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Something Something
replace CollectionLongAccumulator by CollectionAccumulator[2] or LongAccumulator[3] and test if the StreamingListener and other codes are able to work?

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.AccumulatorV2

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread ZHANG Wei
.apache.spark.util.AccumulatorV2 [2]

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
-- Cheers, -z [1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.AccumulatorV2 [2] http://spark.apache.org/docs/late
