Re: Spark SQL, Hive & Parquet data types

2015-02-23 Thread The Watcher
> Yes, recently we improved ParquetRelation2 quite a bit. Spark SQL uses its
> own Parquet support to read partitioned Parquet tables declared in the Hive
> metastore. Only writing to partitioned tables is not covered yet. These
> improvements will be included in Spark 1.3.0.

Just created SPARK-5…
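The read path described in the quote can be exercised directly. A minimal sketch, assuming a Spark 1.3-era deployment with a Hive metastore; the table name `events` and partition column `dt` are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object ReadPartitionedParquet {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("read-partitioned-parquet"))
        val hc = new HiveContext(sc)

        // With spark.sql.hive.convertMetastoreParquet (true by default),
        // metastore Parquet tables are read through Spark SQL's native
        // Parquet support rather than Hive's SerDe.
        hc.setConf("spark.sql.hive.convertMetastoreParquet", "true")

        // Partition pruning: only files under dt=2015-02-23 are scanned.
        val df = hc.sql("SELECT * FROM events WHERE dt = '2015-02-23'")
        df.show()
      }
    }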

Re: Spark SQL, Hive & Parquet data types

2015-02-20 Thread The Watcher
> 1. In Spark 1.3.0, timestamp support was added. Also, Spark SQL uses its
> own Parquet support to handle both the read path and the write path when
> dealing with Parquet tables declared in the Hive metastore, as long as
> you're not writing to a partitioned table. So yes, you can.

Ah, I h…
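To make the write-path caveat concrete: a minimal sketch, assuming Spark 1.3, of a non-partitioned Parquet table with a TIMESTAMP column written through Spark SQL's own Parquet support. The table names `clicks` and `staging_clicks` are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object TimestampParquetWrite {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("timestamp-parquet-write"))
        val hc = new HiveContext(sc)

        // TIMESTAMP columns in Parquet are usable from Spark 1.3.0 onward.
        hc.sql("CREATE TABLE IF NOT EXISTS clicks (user STRING, ts TIMESTAMP) STORED AS PARQUET")

        // A non-partitioned insert stays on the native write path; writing
        // to a *partitioned* table is the case not covered yet.
        hc.sql("INSERT INTO TABLE clicks SELECT user, ts FROM staging_clicks")
      }
    }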

Spark SQL, Hive & Parquet data types

2015-02-19 Thread The Watcher
Still trying to get my head around Spark SQL & Hive. 1) Let's assume I *only* use Spark SQL to create and insert data into Hive tables, declared in a Hive metastore. Does it matter at all whether Hive supports the data types I need with Parquet, or is all that matters what Catalyst & Spark's Parquet…
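One way to answer this question empirically is to look at the physical plan. A minimal sketch, assuming Spark 1.3 and a hypothetical table `t`: explain() shows a native Parquet scan when Spark SQL handles the table itself, and a HiveTableScan when it falls back to Hive's SerDe, so Hive's own Parquet type support only matters on the fallback path.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object WhichParquetPath {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("which-parquet-path"))
        val hc = new HiveContext(sc)

        hc.sql("CREATE TABLE IF NOT EXISTS t (k INT, v STRING) STORED AS PARQUET")

        // Flip this to false to force Hive's SerDe and compare the plans.
        hc.setConf("spark.sql.hive.convertMetastoreParquet", "true")
        hc.sql("SELECT * FROM t").explain(true)
      }
    }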

Hive SKEWED feature supported in Spark SQL?

2015-02-19 Thread The Watcher
I have done some testing of inserting into tables defined in Hive using Spark 1.2, and I can see that the PARTITION clause is honored: data files get created in multiple subdirectories correctly. I tried the SKEWED BY ... ON ... STORED AS DIRECTORIES clause in the CREATE TABLE statement, but I didn't see subdirecto…
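For reference, a minimal sketch of the experiment described above, assuming a Spark 1.2-era HiveContext; the table, column, and source names (`skew_test`, `k`, `v`, `dt`, `src`, `'hot_key'`) are hypothetical. The partitioned insert produces dt= subdirectories, while the SKEWED BY clause parses but, per the observation above, produced no skew subdirectories.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object SkewedByTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("skewed-by-test"))
        val hc = new HiveContext(sc)

        hc.sql("""CREATE TABLE IF NOT EXISTS skew_test (k STRING, v STRING)
                  |PARTITIONED BY (dt STRING)
                  |SKEWED BY (k) ON ('hot_key') STORED AS DIRECTORIES""".stripMargin)

        // Honored: files land under .../skew_test/dt=2015-02-19/
        hc.sql("""INSERT OVERWRITE TABLE skew_test PARTITION (dt = '2015-02-19')
                  |SELECT k, v FROM src""".stripMargin)
      }
    }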

Spark & Hive

2015-02-15 Thread The Watcher
I'm a little confused about Hive & Spark; can someone shed some light? Using Spark, I can access the Hive metastore and run Hive queries. Since I am able to do this in standalone mode, it can't be using MapReduce to run the Hive queries, and I suppose it's building a query plan and executing it…
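A minimal sketch of what that looks like in practice, assuming a Spark 1.3-era HiveContext and a hypothetical table `some_hive_table`: HiveQL is parsed, Catalyst builds the plan, and Spark executes it as ordinary Spark stages, which is why it works without MapReduce even on a local or standalone master.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveWithoutMapReduce {
      def main(args: Array[String]): Unit = {
        // A local master is enough: no Hadoop MapReduce cluster is involved.
        val conf = new SparkConf().setAppName("hive-without-mr").setMaster("local[2]")
        val sc = new SparkContext(conf)
        val hc = new HiveContext(sc)

        val df = hc.sql("SELECT COUNT(*) FROM some_hive_table")
        df.explain()   // physical plan shows Spark operators, not MR stages
        df.collect().foreach(println)
      }
    }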