>
> I don't have a good sense of the overhead of continuing to support
> Python 2; is it large enough to consider dropping it in Spark 3.0?
>
from the build/test side, it will actually be pretty easy to continue
support for Python 2.7 for Spark 2.x, as the feature sets won't be expanding.
Here’s the tweet from the horse’s mouth:
https://twitter.com/gvanrossum/status/1133496146700058626?s=21
Cheers
Jules
—
Sent from my iPhone
Pardon the dumb thumb typos :)
> On May 29, 2019, at 10:12 PM, Sean Owen wrote:
>
> Deprecated -- certainly and sooner than later.
> I don't have a good sense of the overhead of continuing to support
> Python 2; is it large enough to consider dropping it in Spark 3.0?
Deprecated -- certainly and sooner than later.
I don't have a good sense of the overhead of continuing to support
Python 2; is it large enough to consider dropping it in Spark 3.0?
On Wed, May 29, 2019 at 11:47 PM Xiangrui Meng wrote:
>
> Hi all,
>
> I want to revive this old thread since no action was taken so far.
Hi all,
I want to revive this old thread since no action was taken so far. If we
plan to mark Python 2 as deprecated in Spark 3.0, we should do it as early
as possible and let users know ahead of time. PySpark depends on Python,
numpy, pandas, and pyarrow, all of which are sunsetting Python 2 support by
Don't you have a date/timestamp to handle updates? So you're talking about
CDC? If you have a timestamp, you can check whether that key (or those keys)
already exists; if it exists, check whether the timestamp matches; if it
matches, ignore the record, and if it doesn't, update it.
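A minimal PySpark sketch of that check, assuming a key column `id` and a
timestamp column `last_modified` (the column names and paths are made up,
not from this thread):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Assumed inputs: the existing table and the incoming delta.
target = spark.read.parquet("/warehouse/table_b")
updates = spark.read.parquet("/staging/table_a_delta")

# Bring the target's key + timestamp alongside each incoming row.
existing = target.select(
    F.col("id").alias("t_id"),
    F.col("last_modified").alias("t_last_modified"),
)
joined = updates.join(existing, updates["id"] == existing["t_id"], "left")

# Keep rows that are new (no matching key) or changed (timestamp differs);
# rows whose timestamp already matches are ignored, as described above.
to_apply = joined.where(
    F.col("t_id").isNull()
    | (F.col("last_modified") != F.col("t_last_modified"))
).drop("t_id", "t_last_modified")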
On Thu 30 May, 2019, 7:11 AM Genieliu, wrote:
>
Aren't step 1 and step 2 producing a copy of Table A?
Hi all,
I am new to Spark and I am trying to write an application using dataframes
that normalizes data.
So I have a dataframe `denormalized_cities` with 3 columns: COUNTRY, CITY,
CITY_NICKNAME
Here is what I want to do:
1. Map by country, then for each country generate a new ID and write
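The message is cut off here, so this is only a guess at the intent: a hedged
PySpark sketch that builds a countries table with a generated ID and joins it
back to produce a normalized cities table. Everything beyond the COUNTRY,
CITY and CITY_NICKNAME columns (IDs, output paths) is an assumption of mine:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Assumed input: denormalized_cities(COUNTRY, CITY, CITY_NICKNAME)
countries = (
    denormalized_cities
    .select("COUNTRY")
    .distinct()
    # dense_rank over the country name gives a deterministic ID per country;
    # fine here because the distinct country list is small.
    .withColumn("COUNTRY_ID", F.dense_rank().over(Window.orderBy("COUNTRY")))
)

cities = (
    denormalized_cities
    .join(countries, on="COUNTRY", how="inner")
    .select("COUNTRY_ID", "CITY", "CITY_NICKNAME")
)

# Hypothetical output locations.
countries.write.mode("overwrite").parquet("/warehouse/countries")
cities.write.mode("overwrite").parquet("/warehouse/cities")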
Hello,
I am trying to understand the *content* of a checkpoint and corresponding
recovery; understanding the process of checkpointing is obviously the
natural way of going about it and so I went over the following list:
- medium post
Why don't you simply copy the whole of the delta data (Table A) into a stage
table (the temp table in your case) and insert depending on a *WHERE NOT
EXISTS* check against the primary key/composite key that already exists in
Table B?
That's faster and does the reconciliation job smoothly enough.
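A rough Spark SQL sketch of that stage-table idea, assuming both tables can
be registered as views and share a primary key column `pk` (the key name and
paths are made up):

# Register the delta (Table A) and the target (Table B) as temporary views.
spark.read.parquet("/data/table_a").createOrReplaceTempView("stage_a")
spark.read.parquet("/data/table_b").createOrReplaceTempView("table_b")

# Select only the rows whose key is not already present in Table B.
new_rows = spark.sql("""
    SELECT s.*
    FROM stage_a s
    WHERE NOT EXISTS (
        SELECT 1 FROM table_b b WHERE b.pk = s.pk
    )
""")

# Land the new rows separately first; appending directly to a path that is
# also being read in the same job can be fragile.
new_rows.write.mode("overwrite").parquet("/data/table_b_increment")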
Others, any
Hello,
I am trying to understand the content of a checkpoint and corresponding
recovery.
*My understanding of Spark checkpointing:*
If you have really long DAGs and your Spark cluster fails, checkpointing
helps by persisting intermediate state, e.g. to HDFS. So, a DAG of 50
transformations can
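For what it's worth, a tiny PySpark illustration of that; the checkpoint
directory and the toy 50-step chain below are made up:

from pyspark.sql import functions as F

spark.sparkContext.setCheckpointDir("hdfs:///tmp/spark-checkpoints")  # assumed path

df = spark.range(0, 1_000_000)
# Stand-in for a long chain of transformations.
for i in range(50):
    df = df.withColumn(f"c{i}", F.col("id") * i)

# checkpoint() materializes the data to the checkpoint dir and truncates the
# lineage, so a recomputation does not have to replay the whole 50-step DAG.
df = df.checkpoint(eager=True)
df.count()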
Hey Aakash,
That will work for records which don't exist yet in the target table. What
about records which have to be updated? As I mentioned, I want to do an
upsert. That means I want to add records that don't exist yet and update
those which already exist.
Thanks
Tom
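With plain Parquet tables (no MERGE), one hedged way to get both the insert
and the update is to rebuild the target: keep the Table B rows whose keys are
not in Table A, then union all Table A rows back in. The key column `pk`, the
paths, and the assumption that both tables share a schema are mine, not from
the thread:

table_a = spark.read.parquet("/data/table_a")   # delta: new + changed rows
table_b = spark.read.parquet("/data/table_b")   # large existing target

# Rows of B that are NOT touched by the delta (anti join on the key).
untouched = table_b.join(table_a.select("pk"), on="pk", how="left_anti")

# Upserted result = untouched old rows + all delta rows
# (new keys are inserted, existing keys are effectively updated).
upserted = untouched.unionByName(table_a)

# Write to a new location and swap afterwards; overwriting the path that is
# also being read in the same job will fail.
upserted.write.mode("overwrite").parquet("/data/table_b_new")

With a 3 TB target this rewrites everything, so partition-level overwrite, or
a table format with MERGE support, may be worth considering.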
On Wed 29 May 2019 at 18:39,
Hey Guys,
I am wondering what your approach to the following scenario would be:
I have two tables - one (Table A) is relatively small (e.g. 50 GB) and the
second one (Table B) is much bigger (e.g. 3 TB). Both are Parquet tables.
I want to ADD all records from Table A to Table B which don't exist in
Table B
Nope. Not at all.
On Sun, May 26, 2019 at 8:15 AM yeikel valdes wrote:
> Isn't match_recognize just a filter?
>
> df.filter(predicate)?
>
>
> On Sat, 25 May 2019 12:55:47 -0700, kanth...@gmail.com wrote:
>
> Hi All,
>
> Does Spark SQL have match_recognize? I am not sure why CEP
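For context on the "not at all": a filter judges each row on its own, while
match_recognize describes patterns over ordered sequences of rows. A rough
PySpark contrast, using a hypothetical `df` with made-up columns `symbol`,
`ts`, `price`:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Row-wise filter: every row is evaluated independently.
spikes = df.filter(F.col("price") > 100)

# A tiny slice of what CEP / match_recognize expresses: a condition across
# consecutive rows per key, here "price rose two ticks in a row" via lag().
w = Window.partitionBy("symbol").orderBy("ts")
rising_twice = (
    df.withColumn("prev", F.lag("price", 1).over(w))
      .withColumn("prev2", F.lag("price", 2).over(w))
      .filter((F.col("price") > F.col("prev")) & (F.col("prev") > F.col("prev2")))
)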
Hi,
A few thoughts to add to Nicholas' apt reply.
We were loading multiple files from AWS S3 in our Spark application. When
the Spark step that loads the files is called, the driver spends significant
time fetching the exact paths of the files from AWS S3,
especially because we specified S3 paths like regex
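To make that listing cost concrete, a small sketch of the difference between
a broad glob (which the driver has to expand by listing S3) and passing
explicit prefixes; the bucket and layout are invented:

# Broad glob: before any executor work starts, the driver lists every object
# under the prefix to resolve the pattern, which is where the time goes.
df_glob = spark.read.parquet("s3a://my-bucket/events/2019-*/part-*.parquet")

# Explicit, narrower paths avoid most of that listing.
paths = [
    "s3a://my-bucket/events/2019-05-29/",
    "s3a://my-bucket/events/2019-05-30/",
]
df_explicit = spark.read.parquet(*paths)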