RE: Can we load csv partitioned data into one DF?

2016-02-22 Thread Mohammed Guller
> Hello all, I am facing a silly data question. If I have +100 csv files which are part of the same data, but each csv covers, for example, one year on a timeframe column (i.e. partitioned by year), what would you suggest instead of loading all those files and joining them? Final target would be parquet.

Re: Can we load csv partitioned data into one DF?

2016-02-22 Thread Mich Talebzadeh
Indeed this will work. Additionally, the files could be compressed as well (gz or bzip2):
val df = sqlContext.read.format("com.databricks.spark.csv").option("inferSchema", "true").option("header", "true").load("/data/stg")
On 22/02/2016 15:32, Alex Dzhagriev wrote:
> Hi Saif,
> You can put your files into one directory and read it as text. Another option is to read them separately and then union the datasets.
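A minimal, self-contained sketch of the directory-load approach from Mich's snippet, assuming Spark 1.x with the spark-csv package on the classpath; the output path "/data/parquet" and the partition column "year" are illustrative, not from the thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("csv-to-parquet"))
val sqlContext = new SQLContext(sc)

// spark-csv reads every file under the directory into a single DataFrame;
// gzip/bzip2-compressed inputs are decompressed transparently.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/data/stg")

// Final target from the original question: one Parquet dataset.
// partitionBy assumes a "year" column exists (it may need to be derived
// from the timeframe column first).
df.write.partitionBy("year").parquet("/data/parquet")

Pointing load() at the directory avoids reading and joining the +100 files one by one, which was the concern in the original question.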

Can we load csv partitioned data into one DF?

2016-02-22 Thread Saif.A.Ellafi
Hello all, I am facing a silly data question. If I have +100 csv files which are part of the same data, but each csv covers, for example, one year on a timeframe column (i.e. partitioned by year), what would you suggest instead of loading all those files and joining them? Final target would be parquet.

Re: Can we load csv partitioned data into one DF?

2016-02-22 Thread Alex Dzhagriev
Hi Saif, You can put your files into one directory and read it as text. Another option is to read them separately and then union the datasets. Thanks, Alex. On Mon, Feb 22, 2016 at 4:25 PM, wrote:
> Hello all, I am facing a silly data question.
> If I have +100 csv files which are part of the same data, but each csv covers, for example, one year on a timeframe column (i.e. partitioned by year), what would you suggest instead of loading all those files and joining them? Final target would be parquet.
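A minimal sketch of Alex's second option (read each file separately, then union), again assuming Spark 1.x with spark-csv and an existing sqlContext; the per-year path pattern and the year range are hypothetical:

import org.apache.spark.sql.DataFrame

// Hypothetical naming scheme: one csv file per year under /data/stg.
val years = 2010 to 2015
val perYear: Seq[DataFrame] = years.map { y =>
  sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load(s"/data/stg/data_$y.csv")
}

// In Spark 1.x the union operator on DataFrames is unionAll;
// schemas must line up by position across all the yearly files.
val df: DataFrame = perYear.reduce(_ unionAll _)
df.write.parquet("/data/parquet")   // final target: parquet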