Hi,
I have a Spark job whose output DataFrame contains a column named Id, which
is a GUID string.
We will use Id to filter data in another Spark application, so it should be
a partition key.
I found these two methods on the Internet:
1.
DataFrame.write.save("Id") method will help, but the possible v
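For later readers, a minimal sketch of writing the output partitioned by a column, assuming Spark 1.4+ (the DataFrame name df and the output path are placeholders; note that partitioning by a GUID creates one directory per distinct value, which can be a very large number):
// Sketch only: write the output DataFrame with one folder per Id value.
df.write.partitionBy("Id").parquet("hdfs:///data/output")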
… Cheng Lian :
> Is there any chance that " spark.sql.hive.convertMetastoreParquet" is
> turned off?
>
> Cheng
>
> On 11/4/15 5:15 PM, Rex Xiong wrote:
>
> Thanks Cheng Lian.
> I found that in 1.5, if I use Spark to create this table with partition
> discovery, the partition pr
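For reference, the setting Cheng asks about can be checked and changed from spark-shell (a minimal sketch; the second argument to getConf is just the fallback value):
// Sketch: read the current value, then set it explicitly.
sqlContext.getConf("spark.sql.hive.convertMetastoreParquet", "true")
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")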
plan optimization is different.
2015-11-03 23:10 GMT+08:00 Cheng Lian :
> SPARK-11153 should be irrelevant because you are filtering on a partition
> key while SPARK-11153 is about Parquet filter push-down and doesn't affect
> partition pruning.
>
> Cheng
>
>
> On 11/
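To make the distinction concrete, here is a small sketch (the path, column names, and the day=... folder layout are assumptions, not from this thread): a filter on a partition column is answered by partition pruning, while a filter on an ordinary data column is what Parquet push-down (and SPARK-11153) covers.
// Sketch: `day` is a partition column discovered from day=... folders, `value` is a data column.
val events = sqlContext.read.parquet("hdfs:///data/events")
events.filter("day = '2015-11-03'").count()   // partition pruning: only the matching folder is scanned
events.filter("value > 100").count()          // relies on Parquet filter push-down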
On Oct 31, 2015 at 7:38 PM, "Rex Xiong" wrote:
> Add back this thread to email list, forgot to reply all.
> On Oct 31, 2015 at 7:23 PM, "Michael Armbrust" wrote:
>
>> Not that I know of.
>>
>> On Sat, Oct 31, 2015 at 12:22 PM, Rex Xiong wrote:
>>
>>> Good to kn
Add back this thread to email list, forgot to reply all.
On Oct 31, 2015 at 7:23 PM, "Michael Armbrust" wrote:
> Not that I know of.
>
> On Sat, Oct 31, 2015 at 12:22 PM, Rex Xiong wrote:
>
>> Good to know that, will have a try.
>> So there is no easy way to achieve it in p
Hi folks,
I have a Hive external table with partitions.
Every day, an app will generate a new partition day=yyyy-MM-dd stored as
Parquet and run a Hive add-partition command.
In some cases, we will add an additional column to new partitions and update
the Hive table schema, then a query across new and old
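For illustration, a rough sketch of the daily step described above (hiveContext, the table name, and the paths are placeholders, not the actual job):
// Sketch: write the new day's Parquet data, then register the folder as a partition.
df.write.parquet("hdfs:///data/my_table/day=2015-11-04")
hiveContext.sql(
  "ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (day='2015-11-04') " +
  "LOCATION 'hdfs:///data/my_table/day=2015-11-04'")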
I finally resolved this issue by adding --conf spark.executor.extraClassPath=snakeyaml-1.10.jar
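For anyone hitting the same thing, the fix combines with --jars roughly like this (a sketch, not the exact command from this thread; the application jar path is a placeholder):
spark-submit --master yarn-cluster --jars snakeyaml-1.10.jar --conf spark.executor.extraClassPath=snakeyaml-1.10.jar hdfs:///path/to/app.jar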
2015-10-16 22:57 GMT+08:00 Rex Xiong :
> Hi folks,
>
> In my Spark application, the executor tasks depend on snakeyaml-1.10.jar.
> I built it with Maven and it works fine:
> spark-submit
Hi folks,
In my Spark application, the executor tasks depend on snakeyaml-1.10.jar.
I built it with Maven and it works fine:
spark-submit --master local --jars d:\snakeyaml-1.10.jar ...
But when I try to run it on YARN, I have an issue; it seems the Spark executor
cannot find the jar file:
spark-subm
I use "spark-submit --master yarn-cluster hdfs://.../a.jar .." to submit
my app to YARN.
Then I updated a.jar in HDFS and ran the command again, and I found that a log
line which had been removed still shows up in the "yarn logs" output.
Is there a cache mechanism I need to disable?
Thanks
In YARN client mode, the Spark driver URL will be redirected to the YARN web proxy
server, but I don't want to use this dynamic name; is it possible to still
use host:port as in standalone mode?
Hi,
I try to use it for one table created in Spark, but it seems the results are
all empty. I want to get metadata for the table; what are the other options?
Thanks
+-----------+
|     result|
+-----------+
| # col_name|
|
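In case it helps later readers, two other ways to pull table metadata from spark-shell (a sketch; the table name is a placeholder):
// Sketch: print the schema from the catalog, or run DESCRIBE through SQL.
sqlContext.table("my_table").printSchema()
sqlContext.sql("DESCRIBE EXTENDED my_table").collect().foreach(println)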
I remember that in a previous PR, schema merging could be disabled by
setting spark.sql.hive.convertMetastoreParquet.mergeSchema to false.
But in the 1.4 release I don't see this config anymore; is there a new way to
do it?
Thanks
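I'm not certain about 1.4, but in 1.5 the knobs look roughly like this (a sketch; the path is a placeholder): merging is off by default and can be enabled globally or per read.
// Sketch: global default via SQL conf, plus a per-read override through the Parquet option.
sqlContext.setConf("spark.sql.parquet.mergeSchema", "false")
val merged = sqlContext.read.option("mergeSchema", "true").parquet("hdfs:///data/some_table")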
Hi,
We have a 3-node master setup with ZooKeeper HA.
The driver can find the master with spark://xxx:xxx,xxx:xxx,xxx:xxx,
but how can I find out the active Master's UI without looping through all 3
nodes?
Thanks
sses will hit
> the metadata cache.
>
> Thanks,
>
> Yin
>
> On Tue, Apr 21, 2015 at 1:13 AM, Rex Xiong wrote:
>
>> We have a similar issue with massive Parquet files. Cheng Lian, could
>> you have a look?
>>
>> 2015-04-08 15:47 GMT+08:00 Zheng, Xu
We have a similar issue with massive Parquet files. Cheng Lian, could you
have a look?
2015-04-08 15:47 GMT+08:00 Zheng, Xudong :
> Hi Cheng,
>
> I tried both of these patches, and they still don't seem to resolve my issue. I
> found that most of the time is spent on this line in newParquet.scala:
>
> ParquetF
Hi Spark Users,
I'm testing the new Parquet partition discovery feature in 1.3.
I have 2 subfolders, each with 800 rows.
/data/table1/key=1
/data/table1/key=2
In spark-shell, run this command:
val t = sqlContext.createExternalTable("table1", "hdfs:///data/table1",
"parquet")
t.count
It shows
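As a side check from the same shell, a filter on the discovered partition column should prune down to a single folder (a sketch, not part of the original test):
// Sketch: `key` is the partition column discovered from the folder names.
t.filter("key = 1").count()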
Hi,
I got this error when creating a Hive table from a Parquet file:
DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.UnsupportedOperationException: Parquet does not support
timestamp. See HIVE-6384
I checked HIVE-6384; it's fixed in 0.14.
The Hive in the Spark build is a customized v
Hi there,
I have an app talking to Spark Hive Server using Hive ODBC, and querying is OK.
But through this interface I can't get many running details when my query goes
wrong; only one error message is shown.
I want to get the job id for my query, so that I can go to the Application Detail
UI to see what's going o