Re: Spark parquet file read problem !

2017-08-01 Thread ??????????
I have no idea about this. ---Original--- From: "serkan taş"<serkan_...@hotmail.com> Date: 2017/7/31 16:42:59 To: "pandees waran"<pande...@gmail.com>;"??"<1427357...@qq.com>; Cc: "user@spark.apache.org"<user@sp

Re: Spark parquet file read problem !

2017-07-31 Thread serkan taş
ser@spark.apache.org"<user@spark.apache.org>; Subject: Re: Spark parquet file read problem ! I checked and realised that the schemas of the files differ: some fields are missing, and some fields have the same name but a different type. How may I overcome the issue?

Re: Spark parquet file read problem !

2017-07-31 Thread ??????????
Please add the mergeSchema option to the reader. ---Original--- From: "serkan taş"<serkan_...@hotmail.com> Date: 2017/7/31 13:54:14 To: "pandees waran"<pande...@gmail.com>; Cc: "user@spark.apache.org"<user@spark.apache.org>; Subject: Re: Spark parque
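
For reference, the option suggested here is spelled `mergeSchema` in Spark's Parquet reader. A minimal sketch of passing it, assuming `spark` is an existing SparkSession; the helper name `read_with_merged_schema` is hypothetical and not from the thread:

```python
def read_with_merged_schema(spark, path):
    """Read the Parquet files under `path`, asking Spark to merge their schemas.

    `spark` is assumed to be an existing SparkSession (an assumption of this
    sketch); the helper itself is illustrative only.
    """
    # mergeSchema is off by default because merging schemas is relatively expensive.
    return spark.read.option("mergeSchema", "true").parquet(path)
```

It could be called as `parquetFile = read_with_merged_schema(spark, "hdfs://xxx/20170719/")`. Schema merging fills columns that are missing from some files with nulls, but it should not be expected to reconcile fields that share a name and have incompatible types; those files would likely need to be rewritten or cast first.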

Re: Spark parquet file read problem !

2017-07-30 Thread serkan taş
<pande...@gmail.com> Sent: Sunday, July 30, 2017 7:12:55 PM To: serkan taş Cc: user@spark.apache.org Subject: Re: Spark parquet file read problem ! I have encountered a similar error when the schemas / datatypes conflict in those 2 parquet files. Are you sure that the 2 indi

Re: Spark parquet file read problem !

2017-07-30 Thread serkan taş
* for what? Yehuda Finkelshtein wrote (30 Jul 2017 20:45): Try adding "*" at the end of the folder path: parquetFile = spark.read.parquet("hdfs://xxx/20170719/*") On Jul 30, 2017 19:13, "pandees waran"

Re: Spark parquet file read problem !

2017-07-30 Thread Yehuda Finkelshtein
Try adding "*" at the end of the folder path: parquetFile = spark.read.parquet("hdfs://xxx/20170719/*") On Jul 30, 2017 19:13, "pandees waran" wrote: I have encountered a similar error when the schemas / datatypes conflict in those 2 parquet files. Are you sure

Re: Spark parquet file read problem !

2017-07-30 Thread pandees waran
I have encountered a similar error when the schemas / datatypes conflict in those 2 parquet files. Are you sure that the 2 individual files have the same structure with matching datatypes? If not, you have to fix this by enforcing default values for the missing values to make the
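
One way to check whether two files really share a structure is to compare their column-to-type mappings. The sketch below works on plain dictionaries; in PySpark such a mapping could be obtained with `dict(df.dtypes)` after reading each file individually. The helper name `diff_schemas` and the sample columns are made up for illustration and do not come from the thread.

```python
def diff_schemas(left, right):
    """Compare two {column name: type string} mappings.

    Returns (only_left, only_right, conflicts), where conflicts maps each
    shared column whose types differ to its (left type, right type) pair.
    """
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    conflicts = {
        name: (left[name], right[name])
        for name in sorted(set(left) & set(right))
        if left[name] != right[name]
    }
    return only_left, only_right, conflicts


# Made-up schemas for illustration; the thread does not list the real columns.
file_a = {"id": "bigint", "name": "string", "score": "double"}
file_b = {"id": "string", "name": "string"}

only_a, only_b, conflicts = diff_schemas(file_a, file_b)
print(only_a)     # ['score']
print(only_b)     # []
print(conflicts)  # {'id': ('bigint', 'string')}
```

Columns reported in `only_a` or `only_b` are the missing fields; entries in `conflicts` are the same-name, different-type fields that would need a cast or a rewrite before the files can be read together.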

Spark parquet file read problem !

2017-07-30 Thread serkan taş
Hi, I have a problem reading parquet files located in HDFS. If I read the files individually, nothing goes wrong and I can get the file content: parquetFile = spark.read.parquet("hdfs://xxx/20170719/part-0-3a9c226f-4fef-44b8-996b-115a2408c746.snappy.parquet") and parquetFile =