Hi,
We recently upgraded Spark from 2.4.x to 3.3.1, and managed table creation
while writing a dataframe with saveAsTable now fails with the error below:
Can not create the managed table(``). The associated
location('hdfs:') already exists.
At a high level, our code does the following before writing the dataframe as t
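One common cause after the 3.x upgrade is a stale directory left behind by a
previously dropped table. Below is a minimal sketch of the usual cleanup,
assuming a hypothetical table name and warehouse path and a shell-style
spark session and df; not necessarily what this pipeline should do:

import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical table name and location, for illustration only.
val tableName = "db.my_table"
val tablePath = new Path("hdfs:///warehouse/db.db/my_table")

// Drop the catalog entry, then remove any leftover directory so that
// saveAsTable can recreate the managed table at a clean location.
spark.sql(s"DROP TABLE IF EXISTS $tableName")
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
if (fs.exists(tablePath)) fs.delete(tablePath, true)

df.write.mode("overwrite").saveAsTable(tableName)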
Hi Spark users,
We have been working on GPU acceleration for Apache Spark SQL / Dataframe
using the RAPIDS Accelerator for Apache Spark
<https://www.nvidia.com/en-us/deep-learning-ai/solutions/data-science/apache-spark-3/>
and the open source project Alluxio <https://github.com/Alluxi
This is Spark 2.4.4 with Scala. I'm getting some very strange behaviour
after reading in a dataframe from a json file, using sparkSession.read in
permissive mode. I've included the error column when reading in the data, as
I want to log details of any errors in the input json file.
My suspicion i
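For reference, a minimal sketch of a permissive JSON read that surfaces the
error column; the path and column name here are assumptions, not taken from
the original report:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("json-errors").getOrCreate()

// PERMISSIVE mode puts each malformed record into the corrupt-record column.
val df = spark.read
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .json("/data/input.json")

// Spark 2.x disallows filtering on only the corrupt column of an uncached
// JSON dataframe, so cache first, then log the bad records.
df.cache()
df.filter(df("_corrupt_record").isNotNull).show(false)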
Hey,
I'm working on a use case that involves converting DStreams to
DataFrames after some transformations. I've simplified my code into the
following snippet so as to reproduce the error. I've also listed my
environment settings below.
*Environment:*
Spark Version: 2.2.0
Java: 1.8
Executi
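For context, a minimal sketch of the usual DStream-to-DataFrame pattern via
foreachRDD; the socket source, schema, and query here are hypothetical, not
the original code, and sc is the usual shell SparkContext:

import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class Event(id: String, value: Long)

val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.socketTextStream("localhost", 9999)

lines.map(_.split(","))
  .map(a => Event(a(0), a(1).toLong))
  .foreachRDD { rdd =>
    // Get or create a SparkSession from the RDD's context inside the closure.
    val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
    import spark.implicits._
    val df = rdd.toDF()
    df.createOrReplaceTempView("events")
    spark.sql("SELECT id, SUM(value) FROM events GROUP BY id").show()
  }

ssc.start()
ssc.awaitTermination()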
Thanks Michael, that worked; I appreciate your help.
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Monday, May 15, 2017 11:45 AM
To: Revin Chalil
Cc: User
Subject: Re: Spark SQL DataFrame to Kafka Topic
The foreach sink from that blog post requires that you have a DataFrame with
an
> wrote:
>
> Yes, it is called Structured Streaming:
> https://docs.databricks.com/_static/notebooks/structured-streaming-kafka.html
> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
>
> On Fri, Jan 13, 2017 at 3:32 AM,
Kumar; User; senthilec...@apache.org; Ofir Manor; Hemanth Gudela;
lucas.g...@gmail.com; Koert Kuipers; silvio.fior...@granturing.com
Subject: RE: Spark SQL DataFrame to Kafka Topic
Hi TD / Michael,
I am trying to use the foreach sink to write to Kafka and followed
this<ht
.das1...@gmail.com]
Sent: Friday, January 13, 2017 3:31 PM
To: Koert Kuipers
Cc: Peyman Mohajerian ; Senthil Kumar
; User ; senthilec...@apache.org
Subject: Re: Spark SQL DataFrame to Kafka Topic
Structured Streaming has a foreach sink, where you can essentially do what you
want wit
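For reference, a minimal sketch of that foreach-sink approach from the
Spark 2.1 era, writing each row to Kafka with a plain producer; the broker
address, topic, and streamingDF are hypothetical:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.sql.{ForeachWriter, Row}

// One producer per partition/epoch; open/process/close is the ForeachWriter
// contract in Structured Streaming.
class KafkaSink(brokers: String, topic: String) extends ForeachWriter[Row] {
  var producer: KafkaProducer[String, String] = _

  def open(partitionId: Long, version: Long): Boolean = {
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    producer = new KafkaProducer[String, String](props)
    true
  }

  def process(row: Row): Unit =
    producer.send(new ProducerRecord(topic, row.mkString(",")))

  def close(errorOrNull: Throwable): Unit =
    if (producer != null) producer.close()
}

// Usage inside a streaming query (names hypothetical):
// streamingDF.writeStream.foreach(new KafkaSink("broker:9092", "events")).start()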
>> cause another shuffle.
>> So I am not sure if it is a smart way.
>>
>> Yong
>>
>> --
>> *From:* shyla deshpande
>> *Sent:* Wednesday, March 29, 2017 12:33 PM
>> *To:* user
>> *Subject:* Re: Spark SQL, dataframe join questions.
--
> *From:* shyla deshpande
> *Sent:* Wednesday, March 29, 2017 12:33 PM
> *To:* user
> *Subject:* Re: Spark SQL, dataframe join questions.
>
> On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande wrote:
>
>> Following are my questions. Thank you
following join COULD cause another shuffle. So I
am not sure if it is a smart way.
Yong
From: shyla deshpande
Sent: Wednesday, March 29, 2017 12:33 PM
To: user
Subject: Re: Spark SQL, dataframe join questions.
On Tue, Mar 28, 2017 at 2:57 PM, shyla desh
On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande wrote:
> Following are my questions. Thank you.
>
> 1. When joining dataframes, is it a good idea to repartition on the key
> column that is used in the join, or is the optimizer smart enough that we
> can forget it?
>
> 2. In RDD join, wherever possible we do
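For context on question 1, a minimal sketch of the two options; the table
and column names are hypothetical. Catalyst normally plans the shuffle for
the join itself, so the explicit repartition mostly pays off when the keyed
data is cached and reused across several joins:

import org.apache.spark.sql.functions.col

// Option A: let the optimizer plan the shuffle as part of the join.
val joinedA = ordersDF.join(customersDF, Seq("customer_id"))

// Option B: pre-partition both sides on the join key and cache, which can
// help when the same partitioned data feeds more than one join.
val ordersByKey    = ordersDF.repartition(col("customer_id")).cache()
val customersByKey = customersDF.repartition(col("customer_id")).cache()
val joinedB = ordersByKey.join(customersByKey, Seq("customer_id"))

joinedA.explain() // compare the physical plans to see where shuffles occur
joinedB.explain()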
> https://docs.databricks.com/_static/notebooks/structured-streaming-kafka.html
> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
>
> On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar wrote:
>
> Hi Team,
>
> Sorry if this question alr
>>> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
>>>
>>> On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar wrote:
>>>
>>>> Hi Team,
>>>>
>>>> Sorry if this question was already asked in this forum.
>> https://docs.databricks.com/_static/notebooks/structured-streaming-kafka.html
>> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
>>
>> On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar wrote:
>>
>>> Hi Team,
>>>
>>> Sorry if this question
> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
>
> On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar wrote:
>
>> Hi Team,
>>
>> Sorry if this question was already asked in this forum.
>>
>> Can we ingest data to an Apache Kafka topic f
question was already asked in this forum.
>
> Can we ingest data to an Apache Kafka topic from a Spark SQL DataFrame?
>
> Here is my code, which reads a Parquet file:
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
> val df = sqlContext.
Hi Team,
Sorry if this question was already asked in this forum.
Can we ingest data to an Apache Kafka topic from a Spark SQL DataFrame?
Here is my code, which reads a Parquet file:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.parquet("
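For reference, a minimal sketch of the batch DataFrame-to-Kafka write that
the built-in Kafka sink (Spark 2.2+) later made possible; the Parquet path,
broker, topic, and key column are hypothetical:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-to-kafka").getOrCreate()
val df = spark.read.parquet("/data/events.parquet")

// The Kafka sink expects a string or binary `value` column (and an
// optional `key`); here every row is serialized to JSON.
df.selectExpr("CAST(id AS STRING) AS key", "to_json(struct(*)) AS value")
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("topic", "events")
  .save()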
Hi,
I want to add a metadata field to the StructField case class in Spark:
case class StructField(name: String)
How can the metadata be carried over during query execution?
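For what it's worth, StructField already carries a Metadata parameter, and
metadata attached via Column.as is propagated through the plan. A minimal
sketch, where the column name, metadata key, and df are hypothetical:

import org.apache.spark.sql.types.{Metadata, MetadataBuilder}

// StructField's actual signature is roughly:
//   case class StructField(name: String, dataType: DataType,
//                          nullable: Boolean, metadata: Metadata)
val meta: Metadata = new MetadataBuilder()
  .putString("comment", "height in centimeters")
  .build()

// Attach metadata to a column; it survives into the result schema.
val df2 = df.withColumn("height", df("height").as("height", meta))
println(df2.schema("height").metadata)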
Looks like the problem is that df.rdd does not work very well with limit. In
Scala, df.limit(1).rdd will also trigger the issue you observed. I will add
this to the JIRA.
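A minimal sketch of the behavior being discussed, assuming a hypothetical
Parquet path: head() is executed through the optimized collect-limit path,
while converting to an RDD loses that short-circuit.

val df = sqlContext.read.parquet("someparquetfiles")

df.head()               // fast: only reads what the limit needs
df.limit(1).rdd.count() // slow in 1.4/1.5: the RDD conversion forces a
                        // plan that can scan far more of the input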
On Mon, Sep 21, 2015 at 10:44 AM, Jerry Lam wrote:
> I just noticed you found 1.4 has the same issue. I added that as well in
> th
I just noticed you found 1.4 has the same issue. I added that as well in
the ticket.
On Mon, Sep 21, 2015 at 1:43 PM, Jerry Lam wrote:
> Hi Yin,
>
> You are right! I just tried the Scala version with the lines above, and it
> works as expected.
> I'm not sure if it happens also in 1.4 for pyspark bu
Hi Yin,
You are right! I just tried the Scala version with the lines above, and it
works as expected.
I'm not sure whether it also happens in 1.4 for pyspark, but I thought the
pyspark code just calls the Scala code via py4j. I didn't expect this bug
to be pyspark-specific. That actually surprises me a bi
Seems 1.4 has the same issue.
On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote:
> btw, does 1.4 have the same problem?
>
> On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote:
>
>> Hi Jerry,
>>
>> Looks like it is a Python-specific issue. Can you create a JIRA?
>>
>> Thanks,
>>
>> Yin
>>
>> On Mon,
btw, does 1.4 have the same problem?
On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote:
> Hi Jerry,
>
> Looks like it is a Python-specific issue. Can you create a JIRA?
>
> Thanks,
>
> Yin
>
> On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam wrote:
>
>> Hi Spark Developers,
>>
>> I just ran some very s
Hi Jerry,
Looks like it is a Python-specific issue. Can you create a JIRA?
Thanks,
Yin
On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam wrote:
> Hi Spark Developers,
>
> I just ran some very simple operations on a dataset. I was surprised by
> the execution plan of take(1), head() or first().
>
> Fo
Hi Spark Developers,
I just ran some very simple operations on a dataset. I was surprised by the
execution plan of take(1), head() or first().
For your reference, this is what I did in pyspark 1.5:
df=sqlContext.read.parquet("someparquetfiles")
df.head()
The above lines take over 15 minutes. I wa
uot;salary"))
* }}
* @group dfops
*/
On 10 Aug 2015, at 09:36, Netwaver wrote:
> Hi Spark experts,
> I am now using Spark 1.4.1 and trying the Spark SQL/DataFrame
> API with a text file in the format below:
> id gender height
>
On Mon, Aug 10, 2015 at 12:06 PM, Netwaver wrote:
Hi Spark experts,
I am now using Spark 1.4.1 and trying the Spark SQL/DataFrame API
with a text file in the format below:
id gender height
1 M 180
Isn't it space-separated data? It is neither comma (,) separated nor
pipe (|) separated.
Thanks
Best Regards
On Mon, Aug 10, 2015 at 12:06 PM, Netwaver wrote:
> Hi Spark experts,
> I am now using Spark 1.4.1 and trying the Spark SQL/DataFrame
> API with a text file
Hi Spark experts,
I am now using Spark 1.4.1 and trying the Spark SQL/DataFrame API
with a text file in the format below:
id gender height
1 M 180
2 F 167
... ...
But I meet
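A minimal sketch of the usual 1.4-era approach, assuming a hypothetical file
path and the shell SparkContext sc: split each line on whitespace, map to a
case class, and convert the RDD to a DataFrame.

import org.apache.spark.sql.SQLContext

case class Person(id: Int, gender: String, height: Int)

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Drop the header row, split on runs of whitespace, and build the DataFrame.
val raw = sc.textFile("people.txt")
val header = raw.first()
val people = raw.filter(_ != header)
  .map(_.split("\\s+"))
  .map(a => Person(a(0).toInt, a(1), a(2).toInt))
  .toDF()

people.filter($"gender" === "M").agg("height" -> "avg").show()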
>
>>>> val df = .....  // some code that creates a DataFrame
>>>> df.filter( df("columnname").isNotNull() )
>>>>
>>>> +---+-----+----+
>>>> |  x|    a|   y|
>>>> +---+-----+----+
>>>> |  2|  bob|   5|
>>>> +---+-----+----+
>>>>
>>>> Unfortunately, while this is true for a nullable column (according to
>>>> df.printSchema), it is not true for a column that is not nullable:
lse)
>>>>
>>>> +---+-----+----+
>>>> |  x|    a|   y|
>>>> +---+-----+----+
>>>> |  1|hello|null|
>>>> |  2|  bob|   5|
>>>> +---+-----+----+
>>>>
>>>> such that the output is not affected by the
>>> I came up with this:
>>>
>>> /**
>>>  * Set whether a column is nullable.
>>>  * @param df source DataFrame
>>>  * @param cn the column name to change
>>>  * @param nullable the flag to set, such that the column is either
either
>> nullable or not
>>  */
>> def setNullableStateOfColumn(df: DataFrame, cn: String,
>>     nullable: Boolean): DataFrame = {
>>   val schema = df.schema
>>   val newSchema = StructType(schema.map {
>>     cas
Boolean): DataFrame = {
>
>   val schema = df.schema
>   val newSchema = StructType(schema.map {
>     case StructField(c, t, _, m) if c.equals(cn) =>
>       StructField(c, t, nullable = nullable, m)
>     case y: StructField => y
>   })
>   df.sqlContext.createDataFrame(df.rdd, newSchema)
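Assembled from the quoted fragments above, a self-contained sketch of the
helper with the imports it needs; the usage lines assume a hypothetical df
with a column "y":

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

// Rebuild the schema with the nullable flag changed for one column, then
// recreate the DataFrame over the same rows.
def setNullableStateOfColumn(df: DataFrame, cn: String,
    nullable: Boolean): DataFrame = {
  val newSchema = StructType(df.schema.map {
    case StructField(c, t, _, m) if c == cn => StructField(c, t, nullable, m)
    case other => other
  })
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}

// val df2 = setNullableStateOfColumn(df, "y", nullable = false)
// df2.printSchema()  // "y" now shows nullable = false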
Any comments?
Cheers and thx in advance,
Martin