[Spark-SQL] Dataframe write saveAsTable failed

2023-06-26 Thread Anil Dasari
Hi, We recently upgraded Spark from 2.4.x to 3.3.1, and managed table creation while writing a dataframe with saveAsTable failed with the error below. Can not create the managed table(``) The associated location('hdfs:') already exists. At a high level, our code does the following before writing dataframe as t

Accelerating Spark SQL / Dataframe using GPUs & Alluxio

2021-04-23 Thread Bin Fan
Hi Spark users, We have been working on GPU acceleration for Apache Spark SQL / Dataframe using the RAPIDS Accelerator for Apache Spark <https://www.nvidia.com/en-us/deep-learning-ai/solutions/data-science/apache-spark-3/> and open source project Alluxio <https://github.com/Alluxi

[Spark SQL]: Dataframe group by potential bug (Scala)

2019-10-31 Thread ludwiggj
This is using Spark Scala 2.4.4. I'm getting some very strange behaviour after reading in a dataframe from a json file, using sparkSession.read in permissive mode. I've included the error column when reading in the data, as I want to log details of any errors in the input json file. My suspicion i

[Spark SQL]: DataFrame schema resulting in NullPointerException

2017-11-19 Thread Chitral Verma
Hey, I'm working on this use case that involves converting DStreams to Dataframes after some transformations. I've simplified my code into the following snippet so as to reproduce the error. Also, I've mentioned below my environment settings. *Environment:* Spark Version: 2.2.0 Java: 1.8 Executi

RE: Spark SQL DataFrame to Kafka Topic

2017-05-16 Thread Revin Chalil
Thanks Michael, that worked, appreciate your help. From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Monday, May 15, 2017 11:45 AM To: Revin Chalil Cc: User Subject: Re: Spark SQL DataFrame to Kafka Topic The foreach sink from that blog post requires that you have a DataFrame with

Re: Spark SQL DataFrame to Kafka Topic

2017-05-15 Thread Michael Armbrust
an > wrote: > > Yes, it is called Structured Streaming: https://docs.databricks.com/_static/notebooks/structured-streaming-kafka.html > > http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html > > > > On Fri, Jan 13, 2017 at 3:32 AM,

RE: Spark SQL DataFrame to Kafka Topic

2017-05-15 Thread Revin Chalil
Kumar ; User ; senthilec...@apache.org; Ofir Manor ; Hemanth Gudela ; lucas.g...@gmail.com; Koert Kuipers ; silvio.fior...@granturing.com Subject: RE: Spark SQL DataFrame to Kafka Topic Hi TD / Michael, I am trying to use the foreach sink to write to Kafka and followed this<ht

RE: Spark SQL DataFrame to Kafka Topic

2017-05-14 Thread Revin Chalil
.das1...@gmail.com] Sent: Friday, January 13, 2017 3:31 PM To: Koert Kuipers Cc: Peyman Mohajerian ; Senthil Kumar ; User ; senthilec...@apache.org Subject: Re: Spark SQL DataFrame to Kafka Topic Structured Streaming has a foreach sink, where you can essentially do what you want wit

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread vaquar khan
ause another shuffle. >> So I am not sure if it is a smart way. >> >> Yong >> >> -- >> *From:* shyla deshpande >> *Sent:* Wednesday, March 29, 2017 12:33 PM >> *To:* user >> *Subject:* Re: Spark SQL, dataframe join questions. >> >>

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread Vidya Sujeet
-- > *From:* shyla deshpande > *Sent:* Wednesday, March 29, 2017 12:33 PM > *To:* user > *Subject:* Re: Spark SQL, dataframe join questions. > > > > On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande > wrote: > >> Following are my questions. Thank you

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread Yong Zhang
owing join COULD cause another shuffle. So I am not sure if it is a smart way. Yong From: shyla deshpande Sent: Wednesday, March 29, 2017 12:33 PM To: user Subject: Re: Spark SQL, dataframe join questions. On Tue, Mar 28, 2017 at 2:57 PM, shyla desh

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread shyla deshpande
On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande wrote: > Following are my questions. Thank you. > > 1. When joining dataframes is it a good idea to repartition on the key column > that is used in the join or > the optimizer is too smart so forget it. > > 2. In RDD join, wherever possible we do

Re: Spark SQL DataFrame to Kafka Topic

2017-01-24 Thread ayan guha
docs.databricks.com/_static/notebooks/structured-streaming-kafka.html > > http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html > > On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar > wrote: > > Hi Team , > > Sorry if this question alr

Re: Spark SQL DataFrame to Kafka Topic

2017-01-24 Thread Koert Kuipers
ming-programming-guide.html >>> >>> On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar >>> wrote: >>> >>>> Hi Team , >>>> >>>> Sorry if this question already asked in this forum.. >>>> >>>>

Re: Spark SQL DataFrame to Kafka Topic

2017-01-13 Thread Tathagata Das
notebooks/structured-streaming-kafka.html >> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html >> >> On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar >> wrote: >> >>> Hi Team , >>> >>> Sorry if this question

Re: Spark SQL DataFrame to Kafka Topic

2017-01-13 Thread Koert Kuipers
.org/docs/latest/structured-streaming-programming-guide.html > > On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar > wrote: > >> Hi Team , >> >> Sorry if this question already asked in this forum.. >> >> Can we ingest data to Apache Kafka Topic f

Re: Spark SQL DataFrame to Kafka Topic

2017-01-13 Thread Peyman Mohajerian
question already asked in this forum.. > > Can we ingest data to Apache Kafka Topic from Spark SQL DataFrame ?? > > Here is my Code which Reads Parquet File : > > *val sqlContext = new org.apache.spark.sql.SQLContext(sc);* > > *val df = sqlContext.

Spark SQL DataFrame to Kafka Topic

2017-01-13 Thread Senthil Kumar
Hi Team, Sorry if this question has already been asked in this forum. Can we ingest data into an Apache Kafka topic from a Spark SQL DataFrame? Here is my code, which reads a Parquet file: *val sqlContext = new org.apache.spark.sql.SQLContext(sc);* *val df = sqlContext.read.parquet("

Spark sql dataframe

2016-06-29 Thread pooja mehta
Hi, I want to add a metadata field to the StructField case class in Spark: case class StructField(name: String). How can the metadata be carried over during query execution?

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Yin Huai
Looks like the problem is df.rdd does not work very well with limit. In scala, df.limit(1).rdd will also trigger the issue you observed. I will add this in the jira. On Mon, Sep 21, 2015 at 10:44 AM, Jerry Lam wrote: > I just noticed you found 1.4 has the same issue. I added that as well in > th

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Jerry Lam
I just noticed you found 1.4 has the same issue. I added that as well in the ticket. On Mon, Sep 21, 2015 at 1:43 PM, Jerry Lam wrote: > Hi Yin, > > You are right! I just tried the scala version with the above lines, it > works as expected. > I'm not sure if it happens also in 1.4 for pyspark bu

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Jerry Lam
Hi Yin, You are right! I just tried the scala version with the above lines, it works as expected. I'm not sure if it happens also in 1.4 for pyspark but I thought the pyspark code just calls the scala code via py4j. I didn't expect that this bug is pyspark specific. That surprises me actually a bi

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Yin Huai
Seems 1.4 has the same issue. On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote: > btw, does 1.4 have the same problem? > > On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote: > >> Hi Jerry, >> >> Looks like it is a Python-specific issue. Can you create a JIRA? >> >> Thanks, >> >> Yin >> >> On Mon,

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Yin Huai
btw, does 1.4 have the same problem? On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote: > Hi Jerry, > > Looks like it is a Python-specific issue. Can you create a JIRA? > > Thanks, > > Yin > > On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam wrote: > >> Hi Spark Developers, >> >> I just ran some very s

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Yin Huai
Hi Jerry, Looks like it is a Python-specific issue. Can you create a JIRA? Thanks, Yin On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam wrote: > Hi Spark Developers, > > I just ran some very simple operations on a dataset. I was surprised by the > execution plan of take(1), head() or first(). > > Fo

Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Jerry Lam
Hi Spark Developers, I just ran some very simple operations on a dataset. I was surprised by the execution plan of take(1), head() or first(). For your reference, this is what I did in pyspark 1.5: df=sqlContext.read.parquet("someparquetfiles") df.head() The above lines take over 15 minutes. I wa

Re: Possible issue for Spark SQL/DataFrame

2015-08-12 Thread Eugene Morozov
"salary")) * }} * @group dfops */ On 10 Aug 2015, at 09:36, Netwaver wrote: > Hi Spark experts, > I am now using Spark 1.4.1 and trying Spark SQL/DataFrame > API with text file in below format > id gender height >

Re:Re: Possible issue for Spark SQL/DataFrame

2015-08-12 Thread Netwaver
0, 2015 at 12:06 PM, Netwaver wrote: Hi Spark experts, I am now using Spark 1.4.1 and trying Spark SQL/DataFrame API with text file in below format id gender height 1 M 180

Re: Possible issue for Spark SQL/DataFrame

2015-08-10 Thread Akhil Das
Isn't it space-separated data? It is neither comma (,) separated nor pipe (|) separated data. Thanks Best Regards On Mon, Aug 10, 2015 at 12:06 PM, Netwaver wrote: > Hi Spark experts, > I am now using Spark 1.4.1 and trying Spark SQL/DataFrame > API with text file

Possible issue for Spark SQL/DataFrame

2015-08-09 Thread Netwaver
Hi Spark experts, I am now using Spark 1.4.1 and trying Spark SQL/DataFrame API with text file in below format id gender height 1 M 180 2 F 167 ... ... But I meet

Re: Spark SQL DataFrame: Nullable column and filtering

2015-08-01 Thread Martin Senne
> >> >>>> val df = ..... // some code that creates a DataFrame >> >>>> df.filter( df("columnname").isNotNull() ) >> >>>> >> >>>> +-+-++ >> >>>> |x|a| y| >> >>>>

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-31 Thread Martin Senne
> >>>> |2|bob|5| > >>>> +-+---+-+ > >>>> > >>>> > >>>> Unfortunately, while this is true for a nullable column > (according to > >>>> df.printSchema), it is not true for a column that is not nullable: >

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Martin Senne
lse) >>>> >>>> +-+-++ >>>> |x|a| y| >>>> +-+-++ >>>> |1|hello|null| >>>> |2| bob| 5| >>>> +-+-++ >>>> >>>> such that the output is not affected by the

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Michael Armbrust
t;>> A came uo with this:* >>> >>> /** >>>* Set, if a column is nullable. >>>* @param df source DataFrame >>>* @param cn is the column name to change >>>* @param nullable is the flag to set, such that the column is either &g

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Martin Senne
either >> nullable or not >>*/ >> def setNullableStateOfColumn( df: DataFrame, cn: String, nullable: >> Boolean) : DataFrame = { >> >> val schema = df.schema >> val newSchema = StructType(schema.map { >> cas

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Michael Armbrust
Boolean) : DataFrame = { > > val schema = df.schema > val newSchema = StructType(schema.map { > case StructField( c, t, _, m) if c.equals(cn) => StructField( c, t, > nullable = nullable, m) > case y: StructField => y > }) > df.sqlContext.createDataFrame( df.rdd, newSchema) >

Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread martinibus77
comments?* Cheers and thx in advance, Martin -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-DataFrame-Nullable-column-and-filtering-tp24087.html Sent from the Apache Spark User List mailing