date:20200707

Re: [Announcement] Cloud data lake conference with heavy focus on open source

2020-07-07 Thread Ashley Hoff

Interesting. You've piqued my interest. Will the sessions be available after the conference? (I'm in the wrong timezone to see this during daylight hours.) On Wed, Jul 8, 2020 at 2:40 AM ldazaa11 wrote: > Hello Sparkers, > > If you’re interested in how Spark is being applied in cloud data la

Re: Mocking pyspark read writes

2020-07-07 Thread Jörn Franke

Write to a local temp directory via file:// ? > On 07.07.2020 at 20:07, Dark Crusader wrote: > > > Hi everyone, > > I have a function which reads and writes a parquet file from HDFS. When I'm > writing a unit test for this function, I want to mock this read & write. > > How do you achieve
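The suggestion above could look roughly like the following sketch. It assumes the function under test takes its input and output locations as parameters; `local_uri`, `round_trip` and `demo` are hypothetical names, not taken from the thread, and `demo` needs pyspark installed to run.

```python
import tempfile
from pathlib import Path


def local_uri(path):
    """Turn a local filesystem path into a file:// URI."""
    return Path(path).resolve().as_uri()


def round_trip(spark, src_uri, dst_uri):
    """Hypothetical function under test: paths are parameters, not hard-coded HDFS URIs."""
    df = spark.read.parquet(src_uri)
    df.write.mode("overwrite").parquet(dst_uri)


def demo():
    # Requires pyspark; imported here so the helpers above stay importable without it.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("test").getOrCreate()
    try:
        with tempfile.TemporaryDirectory() as tmp:
            src = local_uri(Path(tmp) / "in")
            dst = local_uri(Path(tmp) / "out")
            # Write a small fixture, then exercise the function against local files.
            spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"]).write.parquet(src)
            round_trip(spark, src, dst)
            assert spark.read.parquet(dst).count() == 2
    finally:
        spark.stop()
```

With this shape, production code passes hdfs:// locations and the test passes file:// locations under a temporary directory, so nothing has to be mocked.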

Mocking pyspark read writes

2020-07-07 Thread Dark Crusader

Hi everyone, I have a function which reads and writes a parquet file from HDFS. When I'm writing a unit test for this function, I want to mock this read & write. How do you achieve this? Any help would be appreciated. Thank you.
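One possible way to mock the read and write, as a minimal sketch: pass the session into the function and substitute a `MagicMock` from the standard library. `copy_parquet`, `run_mocked_copy` and the paths are hypothetical, used only for illustration.

```python
from unittest.mock import MagicMock


def copy_parquet(spark, src_path, dst_path):
    """Hypothetical function under test: read parquet from src, write it to dst."""
    df = spark.read.parquet(src_path)
    df.write.mode("overwrite").parquet(dst_path)
    return df


def run_mocked_copy():
    # A MagicMock stands in for the SparkSession, so no cluster or HDFS is needed.
    spark = MagicMock(name="spark")
    fake_df = spark.read.parquet.return_value

    result = copy_parquet(spark, "hdfs:///in/data.parquet", "hdfs:///out/data.parquet")

    # Verify that the read and the write were requested with the expected arguments.
    spark.read.parquet.assert_called_once_with("hdfs:///in/data.parquet")
    fake_df.write.mode.assert_called_once_with("overwrite")
    fake_df.write.mode.return_value.parquet.assert_called_once_with(
        "hdfs:///out/data.parquet"
    )
    return result is fake_df
```

This checks only that the calls were made, not that any data was actually read or written; a test against real files would cover that part.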

[Announcement] Cloud data lake conference with heavy focus on open source

2020-07-07 Thread ldazaa11

Hello Sparkers, If you’re interested in how Spark is being applied in cloud data lake environments, then you should check out a new 1-day LIVE, virtual conference on July 30. This conference is called Subsurface and the focus is technical talks tailored specifically for data architects and engine

Re: When does SparkContext.defaultParallelism have the correct value?

2020-07-07 Thread Sean Owen

If not set explicitly with spark.default.parallelism, it will default to the number of cores currently available (minimum 2). At the very start, some executors haven't completed registering, which I think explains why it goes up after a short time. (In the case of dynamic allocation it will change
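The behaviour described above can be modelled with a toy function. This is a simplified illustration of the rule as stated in the reply, not Spark's actual implementation; `default_parallelism` and its parameters are hypothetical.

```python
def default_parallelism(registered_executor_cores, configured=None):
    """Toy model: an explicit setting wins, otherwise use the cores registered so far (minimum 2)."""
    if configured is not None:
        return configured
    return max(sum(registered_executor_cores), 2)


# Before any executor has registered, the minimum applies.
at_startup = default_parallelism([])
# Once two 4-core executors have registered, the value goes up.
after_registration = default_parallelism([4, 4])
# An explicit setting is returned regardless of registered executors.
explicit = default_parallelism([4, 4], configured=16)
```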

ANALYZE command not supported on Spark 2.3.2?

2020-07-07 Thread daniel123

Does anyone know if ANALYZE TABLE is supported on Spark 2.3.2? The command doesn't appear in the documentation (spark.apache.org/docs/2.3.2/sql-programming-guide.html), although we can launch it, with strange results. The ANALYZE TABLE job takes hours and doesn't launch any executors, it just runs in

how to disable hivemetastore connection

2020-07-07 Thread iamabug

Hi community, I am running hundreds of Spark jobs at the same time, which causes the number of Hive Metastore connections to be very high (> 1K). Since the jobs do not really use HMS, I wish to disable that. I have tried setting spark.s


7 matches
