Re: Spark 1.6.0 + Hive + HBase

2016-02-15 Thread chutium
Has anyone taken a look at this issue: https://issues.apache.org/jira/browse/HIVE-11166 ? I got the same exception when inserting into an HBase table.

hive client.getAllPartitions in lookupRelation can take a very long time

2014-09-02 Thread chutium
In our Hive warehouse there are many tables with a lot of partitions, such as: scala> hiveContext.sql("use db_external") scala> val result = hiveContext.sql("show partitions et_fullorders").count result: Long = 5879 I noticed that this part of the code:

RE: HiveContext, schemaRDD.printSchema get different dataTypes, feature or a bug? really strange and surprised...

2014-09-01 Thread chutium
Thanks a lot, Hao, I finally solved this problem. The changes to CSVSerDe are here: https://github.com/chutium/csv-serde/commit/22c667c003e705613c202355a8791978d790591e By the way, "add jar" in Spark Hive or hive-thriftserver never works for us, so we build Spark with libraryDependencies += csv-serde

Re: HiveContext, schemaRDD.printSchema get different dataTypes, feature or a bug? really strange and surprised...

2014-08-31 Thread chutium
Hi Cheng, thank you very much for helping me finally find out the secret of this magic... Actually, we defined this external table with SID STRING, REQUEST_ID STRING, TIMES_DQ TIMESTAMP, TOTAL_PRICE FLOAT, ... but using desc table ext_fullorders it is only shown as [# col_name

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-08-31 Thread chutium
Has anyone tried to build it with hadoop.version=2.0.0-mr1-cdh4.3.0 or hadoop.version=1.0.3-mapr-3.0.3? See the comments in https://issues.apache.org/jira/browse/SPARK-3124 and https://github.com/apache/spark/pull/2035 I built a Spark snapshot with hadoop.version=1.0.3-mapr-3.0.3 and the ticket creator built

HiveContext, schemaRDD.printSchema get different dataTypes, feature or a bug? really strange and surprised...

2014-08-26 Thread chutium
Is there any dataType auto-conversion or detection in HiveContext? All columns of a table are defined as string in the Hive metastore. One column is total_price with values like 123.45, and this column is then recognized as dataType Float in HiveContext... Is this a feature or a bug? it really

HiveContext, schemaRDD.printSchema get different dataTypes, feature or a bug? really strange and surprised...

2014-08-26 Thread chutium
Is there any dataType auto-conversion or detection in HiveContext? All columns of a table are defined as string in the Hive metastore. One column is total_price with values like 123.45, and this column is then recognized as dataType Float in HiveContext... Is this a feature or a bug? it

Re: HiveContext, schemaRDD.printSchema get different dataTypes, feature or a bug? really strange and surprised...

2014-08-26 Thread chutium
Oops, I tried it on a managed table and the column types are not changed, so it is most likely due to the serde lib CSVSerDe (https://github.com/ogrodnek/csv-serde/blob/master/src/main/java/com/bizo/hive/serde/csv/CSVSerde.java#L123) or maybe CSVReader from opencsv?... But if the columns are defined as

Re: Spark SQL Query and join different data sources.

2014-08-21 Thread chutium
As far as I know, HQL queries try to find the schema info of all the tables in the query from the Hive metastore, so it is not possible to join tables from sqlContext using hiveContext.hql. But this should work: hiveContext.hql("select ...").regAsTable("a") sqlContext.jsonFile("xxx").regAsTable("b") then

Re: spark-shell is broken! (bad option: '--master')

2014-08-08 Thread chutium
Does no one use spark-shell on the master branch? I created a PR as a follow-up to SPARK-2678 and PR #1801: https://github.com/apache/spark/pull/1861