Hi
What is the reason that Spark still comes with Parquet 1.6.0rc3? It seems like
newer Parquet versions are available (e.g. 1.6.0). This would fix problems with
‘spark.sql.parquet.filterPushdown’, which is currently disabled by default
because of a bug in Parquet 1.6.0rc3.
Thanks!
Eric
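(For context on that flag: it can be switched per SQLContext. A rough sketch below, assuming a Spark 1.3/1.4-style API and a hypothetical path and column name; only worth enabling once a Parquet build with the fix is on the classpath.)

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  val sc = new SparkContext(new SparkConf().setAppName("parquet-filter-pushdown"))
  val sqlContext = new SQLContext(sc)

  // Off by default in these releases because of the Parquet 1.6.0rc3 bug;
  // flip it on once a fixed Parquet build is on the classpath.
  sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

  // With pushdown enabled, simple predicates can be evaluated against
  // Parquet row-group statistics instead of scanning every row.
  val df = sqlContext.parquetFile("/path/to/parquet")   // hypothetical path
  df.filter(df("status") === 200).count()               // hypothetical column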
Spark was upgraded to Parquet 1.7.0 (which is exactly the same as 1.6.0, with the
package name renamed from com.twitter to org.apache.parquet) on the master branch
recently.
Cheng
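(For anyone depending on Parquet directly, the rename shows up in the artifact coordinates as well; a build.sbt sketch, same code base under new coordinates:)

  // Up to 1.6.0, Parquet artifacts were published under the com.twitter group id:
  libraryDependencies += "com.twitter" % "parquet-hadoop" % "1.6.0"

  // From 1.7.0 on, the group id (and the Java packages) are org.apache.parquet:
  libraryDependencies += "org.apache.parquet" % "parquet-hadoop" % "1.7.0"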
On 6/12/15 6:16 PM, Eric Eijkelenboom wrote:
Hi
What is the reason that Spark still comes with Parquet 1.6.0rc3? It seems like
newer Parquet versions are available (e.g. 1.6.0).
Q1: It is related to the number of files in the path.
Q2:
To reduce the number of partitions you can use rdd.repartition(x), where x is the
target number of partitions. Depending on your case, repartition can be a heavy
operation, since it shuffles the data.
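A small sketch of that trade-off (Spark 1.3-style API; the path and target count are placeholders): repartition always shuffles, while coalesce only merges existing partitions and is usually cheaper when strictly shrinking the count.

  val df = sqlContext.parquetFile("/path/to/parquet")   // placeholder path
  println(df.rdd.partitions.length)                     // partition count Spark chose on read

  // repartition(x) does a full shuffle, spreading the data evenly over x partitions
  val shuffled = df.rdd.repartition(200)

  // coalesce(x) merges existing partitions without a shuffle (by default),
  // which is cheaper when only reducing the count
  val merged = df.rdd.coalesce(200)
  println(merged.partitions.length)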
Regards.
Miguel.
On Tue, May 5, 2015 at 3:56 PM, Eric Eijkelenboom <eric.eijkelenb...@gmail.com> wrote:
Hello guys
Q1: How does Spark determine the number of partitions when reading a Parquet
file?
val df = sqlContext.parquetFile(path)
Is it some way related to the number of Parquet row groups in my input?
Q2: How can I reduce this number of partitions? Doing this:
df.rdd.coalesce(200).count
sqlContext.load() took about 30 minutes for 5000 Parquet files on S3, the same as
with Spark 1.3.0.
Any help would be greatly appreciated!
Thanks a lot.
Eric
On 10 Apr 2015, at 16:46, Eric Eijkelenboom eric.eijkelenb...@gmail.com
wrote:
Hi Ted
Ah, I guess the term ‘source’ confused me :)
Loading takes around 30 minutes when working with Parquet files. Previously,
we used LZO files and did not experience this problem.
Bonus info:
This also happens when I use auto partition discovery (i.e.
sqlContext.parquetFile("/path/to/logsroot/")).
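(For reference, "auto partition discovery" means Spark treats key=value directories under the root as partition columns. A made-up layout, since the real log structure isn't shown here:)

  // Hypothetical directory layout under the root passed to parquetFile:
  //   /path/to/logsroot/year=2015/month=04/part-00000.parquet
  //   /path/to/logsroot/year=2015/month=05/part-00000.parquet
  // Reading the root discovers year/month as partition columns.
  val logs = sqlContext.parquetFile("/path/to/logsroot/")
  logs.printSchema()   // schema includes the discovered partition columns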
What can I do to avoid this?
Thanks in advance!
Eric Eijkelenboom