Spark Streaming: HiveContext within Custom Actor

2014-12-29 Thread sranga
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-HiveContext-within-Custom-Actor-tp20892.html

Can HiveContext be used without using Hive?

2014-12-09 Thread Manoj Samel
From the 1.1.1 documentation, it seems one can use HiveContext instead of SQLContext without having a Hive installation. The benefit is a richer SQL dialect. Is my understanding correct? Thanks

Re: Can HiveContext be used without using Hive?

2014-12-09 Thread Michael Armbrust
That is correct. The HiveContext will create an embedded metastore in the current directory if you have not configured Hive. On Tue, Dec 9, 2014 at 5:51 PM, Manoj Samel manojsamelt...@gmail.com wrote: From the 1.1.1 documentation, it seems one can use HiveContext instead of SQLContext without
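A minimal sketch of what this looks like, assuming Spark 1.1-style APIs: with no hive-site.xml on the classpath, the embedded Derby metastore (a metastore_db directory) is created in the current working directory on first use.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  val sc = new SparkContext(new SparkConf().setAppName("no-hive-needed"))
  // No Hive installation required; an embedded metastore is created locally.
  val hiveContext = new HiveContext(sc)
  // HiveQL gives the richer dialect mentioned above.
  hiveContext.sql("CREATE TABLE IF NOT EXISTS kv (key INT, value STRING)")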

Re: Can HiveContext be used without using Hive?

2014-12-09 Thread Anas Mosaad
Manoj Samel manojsamelt...@gmail.com wrote: From the 1.1.1 documentation, it seems one can use HiveContext instead of SQLContext without having a Hive installation. The benefit is a richer SQL dialect. Is my understanding correct? Thanks

RE: Can HiveContext be used without using Hive?

2014-12-09 Thread Cheng, Hao
It works exactly like Create Table As Select (CTAS) in Hive. Cheng Hao

Re: Is there a way to get column names using hiveContext ?

2014-12-08 Thread Michael Armbrust
You can call .schema on SchemaRDDs. For example: results.schema.fields.map(_.name) On Sun, Dec 7, 2014 at 11:36 PM, abhishek reachabhishe...@gmail.com wrote: Hi, I have iplRDD which is JSON, and I do the steps below and query through HiveContext. I get the results but without column headers
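Putting the question below and this answer together, a sketch (assuming iplRDD is an RDD[String] of JSON documents and hiveContext is already constructed):

  val teamRDD = hiveContext.jsonRDD(iplRDD)   // schema inferred from the JSON
  // schema is a StructType; its fields carry the column names
  val columnNames = teamRDD.schema.fields.map(_.name)
  columnNames.foreach(println)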

Is there a way to get column names using hiveContext ?

2014-12-07 Thread abhishek
Hi, I have iplRDD which is JSON, and I do the steps below and query through HiveContext. I get the results but without column headers. Is there a way to get the column names? val teamRDD = hiveContext.jsonRDD(iplRDD) teamRDD.registerTempTable("teams") hiveContext.cacheTable("teams") val result

error when importing HiveContext

2014-11-07 Thread Pagliari, Roberto
I'm getting this error when importing HiveContext: from pyspark.sql import HiveContext Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/path/spark-1.1.0/python/pyspark/__init__.py", line 63, in <module> from pyspark.context import SparkContext File "/path/spark-1.1.0

Re: error when importing HiveContext

2014-11-07 Thread Davies Liu
pyspark.sql import HiveContext Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/path/spark-1.1.0/python/pyspark/__init__.py", line 63, in <module> from pyspark.context import SparkContext File "/path/spark-1.1.0/python/pyspark/context.py", line 30, in <module>

Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread tridib
Help please! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18280.html

Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread Jimmy McErlain
On Thu, Nov 6, 2014 at 9:22 AM, tridib tridib.sama...@live.com wrote: Help please! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18280.html

Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread Terry Siu
What version of Spark are you using? Did you compile your Spark version and if so, what compile options did you use? On 11/6/14, 9:22 AM, tridib tridib.sama...@live.com wrote: Help please!

RE: Unable to use HiveContext in spark-shell

2014-11-06 Thread Tridib Samanta
sqlContext: org.apache.spark.sql.hive.HiveContext private local <triedcooking> sqlContext tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext Apply( // def <init>(sc: org.apache.spark.SparkContext): org.apache.spark.sql.hive.HiveContext in class HiveContext, tree.tpe

Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread Terry Siu
I am using spark 1.1.0. I built it using: ./make-distribution.sh -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests My ultimate goal is to execute a query on a parquet file with nested structure and cast

Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread tridib
Yes. I have org.apache.hadoop.hive package in spark assembly. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18322.html

Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread tridib
I built spark-1.1.0 on a fresh machine and the issue is gone! Thank you all for your help. Thanks Regards Tridib -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18324.html

Unable to use HiveContext in spark-shell

2014-11-05 Thread tridib
I am connecting to a remote master using the spark shell. Then I am getting the following error while trying to instantiate HiveContext: scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) error: bad symbolic reference. A signature in HiveContext.class refers to term hive in package

How to retrieve spark context when hiveContext is used in Spark Streaming

2014-10-29 Thread critikaled
("spark.cassandra.connection.host", "127.0.0.1").set("spark.cleaner.ttl", "300") val context = new SparkContext(conf) val hiveContext = new HiveContext(context) import com.dgm.Trail.hiveContext._ context textFile "logs/log1.txt" flatMap { data => val Array(id, signals) = data split '|' signals
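The question in the subject has a direct answer in the Spark 1.x API: SQLContext (and therefore HiveContext) takes the SparkContext as a public constructor val, so it can simply be read back. A one-line sketch, assuming the hiveContext from the snippet above:

  val sc: SparkContext = hiveContext.sparkContext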

Ephemeral Hive metastore for HiveContext?

2014-10-27 Thread Jianshi Huang
There's an annoying small usability issue in HiveContext. By default, it creates a local metastore which forbids other processes using HiveContext from being launched from the same directory. How can I make the metastore local to each HiveContext? Is there an in-memory metastore configuration? /tmp

Re: Ephemeral Hive metastore for HiveContext?

2014-10-27 Thread Cheng Lian
There's an annoying small usability issue in HiveContext. By default, it creates a local metastore which forbids other processes using HiveContext from being launched from the same directory. How can I make the metastore local to each HiveContext? Is there an in-memory metastore configuration

Re: Ephemeral Hive metastore for HiveContext?

2014-10-27 Thread Ted Yu
as metastore https://db.apache.org/derby/docs/10.7/devguide/cdevdvlpinmemdb.html I'll investigate this when free, guess we can use this for Spark SQL Hive support testing. On 10/27/14 4:38 PM, Jianshi Huang wrote: There's an annoying small usability issue in HiveContext. By default

Re: Ephemeral Hive metastore for HiveContext?

2014-10-27 Thread Cheng Lian
Thanks Ted, this is exactly what Spark SQL LocalHiveContext does. To make an embedded metastore local to a single HiveContext, we must allocate different Derby database directories for each HiveContext, and Jianshi is also trying to avoid that. On 10/27/14 9:44 PM, Ted Yu wrote: Please see
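Combining Ted's Derby link with the standard Hive metastore connection property, a hedged hive-site.xml sketch (an assumption pieced together from the thread, not a tested recipe): point each process at its own in-memory Derby database so no shared metastore_db directory is created on disk.

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:memory:sparkmetastore;create=true</value>
  </property>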

Re: Ephemeral Hive metastore for HiveContext?

2014-10-27 Thread Jianshi Huang
at 9:57 PM, Cheng Lian lian.cs@gmail.com wrote: Thanks Ted, this is exactly what Spark SQL LocalHiveContext does. To make an embedded metastore local to a single HiveContext, we must allocate different Derby database directories for each HiveContext, and Jianshi is also trying to avoid

Re: Spark - HiveContext - Unstructured Json

2014-10-22 Thread Harivardan Jayaraman
HiveContext so that it can be accessed from the JDBC Thrift Server. I notice there are primarily only two methods available on the SchemaRDD for data - saveAsTable and insertInto. One defines the schema while the other can be used to insert into the table, but there is no way to ALTER the table
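For reference, the two SchemaRDD calls being discussed, sketched with hypothetical names (Spark 1.1 APIs):

  val rows = hiveContext.jsonRDD(jsonLines)  // jsonLines: RDD[String], hypothetical
  rows.saveAsTable("events")                 // defines the table and fixes its schema
  moreRows.insertInto("events")              // appends rows matching that schema

As the poster notes, neither call alters an existing table's schema, which is the crux of the question.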

Re: Spark - HiveContext - Unstructured Json

2014-10-21 Thread Cheng Lian
Hi, I have unstructured JSON as my input which may have extra columns from row to row. I want to store these JSON rows using HiveContext so that it can be accessed from the JDBC Thrift Server. I notice there are primarily only two methods available on the SchemaRDD for data - saveAsTable

Re: Spark SQL HiveContext Projection Pushdown

2014-10-13 Thread Michael Armbrust
Is there any plan to support windowing queries? I know that Shark supported it in its last release and expected it to be already included. Someone from redhat is working on this. Unclear if it will make the 1.2 release.

Re: Unable to share Sql between HiveContext and JDBC Thrift Server

2014-10-10 Thread Cheng Lian
| and |start-thriftserver.sh| under the same |$SPARK_HOME| should work. Just verified this against Spark 1.1. On 10/10/14 9:32 AM, Steve Arnold wrote: I am writing a Spark job to persist data using HiveContext so that it can be accessed via the JDBC Thrift server. Although my code doesn't throw an error

Unable to share Sql between HiveContext and JDBC Thrift Server

2014-10-09 Thread Steve Arnold
I am writing a Spark job to persist data using HiveContext so that it can be accessed via the JDBC Thrift server. Although my code doesn't throw an error, I am unable to see my persisted data when I query from the Thrift server. I tried three different ways to get this to work: 1) val

Spark SQL HiveContext Projection Pushdown

2014-10-08 Thread Anand Mohan
we have timestamp-based duplicate removal which requires windowing queries, which are not working in SQLContext.sql parsing mode. 2. We then tried HiveQL using HiveContext by creating a Hive external table backed by the same Parquet data. However, in this mode, projection pushdown doesn't seem

Re: Spark SQL HiveContext Projection Pushdown

2014-10-08 Thread Michael Armbrust
We are working to improve the integration here, but I can recommend the following when running Spark 1.1: create an external table and set spark.sql.hive.convertMetastoreParquet=true. Note that even with a HiveContext we don't support window functions yet. On Wed, Oct 8, 2014 at 10:41 AM, Anand
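Michael's recommendation in code form, a sketch assuming Spark 1.1 and a hypothetical table name:

  hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")
  // reads of the metastore Parquet table now go through Spark SQL's native
  // Parquet support, which performs projection pushdown
  hiveContext.sql("SELECT col1 FROM parquet_table")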

Re: HiveContext: cache table not supported for partitioned table?

2014-10-03 Thread Du Li
Cache table works with partitioned table. I guess you're experimenting with a default local metastore

HiveContext: cache table not supported for partitioned table?

2014-10-02 Thread Du Li
Hi, In Spark 1.1 HiveContext, I ran a create partitioned table command followed by a cache table command and got a java.sql.SQLSyntaxErrorException: Table/View 'PARTITIONS' does not exist. But cache table worked fine if the table is not a partitioned table. Can anybody confirm that cache
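A sketch of the reported repro (Spark 1.1; table and column names hypothetical):

  hiveContext.sql("CREATE TABLE logs (line STRING) PARTITIONED BY (dt STRING)")
  hiveContext.cacheTable("logs")  // reported to fail here with the 'PARTITIONS' error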

Getting table info from HiveContext

2014-10-02 Thread Banias
Hi, Would anybody know how to get the following information from HiveContext given a Hive table name? - partition key(s) - table directory - input/output format I am new to Spark. And I have a couple tables created using Parquet data like: CREATE EXTERNAL TABLE parquet_table ( COL1 string

Re: Getting table info from HiveContext

2014-10-02 Thread Michael Armbrust
We actually leave all the DDL commands up to Hive, so there is no programmatic way to access the things you are looking for. On Thu, Oct 2, 2014 at 5:17 PM, Banias calvi...@yahoo.com.invalid wrote: Hi, Would anybody know how to get the following information from HiveContext given a Hive table
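Not from this thread, but one workaround consistent with "DDL is left to Hive" is to ask Hive itself through HiveQL. A hedged sketch:

  // DESCRIBE EXTENDED returns partition keys, table location and I/O formats as rows
  hiveContext.sql("DESCRIBE EXTENDED parquet_table").collect().foreach(println)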

Re: HiveContext: cache table not supported for partitioned table?

2014-10-02 Thread Cheng Lian
:39 AM, Du Li wrote: Hi, In Spark 1.1 HiveContext, I ran a create partitioned table command followed by a cache table command and got a java.sql.SQLSyntaxErrorException: Table/View 'PARTITIONS' does not exist. But cache table worked fine if the table is not a partitioned table. Can anybody

Re: Getting table info from HiveContext

2014-10-02 Thread Banias
...@yahoo.com.invalid wrote: Hi, Would anybody know how to get the following information from HiveContext given a Hive table name? - partition key(s) - table directory - input/output format I am new to Spark. And I have a couple tables created using Parquet data like: CREATE EXTERNAL TABLE parquet_table

Re: problem with HiveContext inside Actor

2014-09-26 Thread Cheng Lian
NPE since the |Driver| is created in a thread different from the one in which the |HiveContext| (and the contained |SessionState|) was constructed. On 9/19/14 3:31 AM, Du Li wrote: I have figured it out. As shown in the code below, if the HiveContext hc were created in the actor object and used to create

Re: problem with HiveContext inside Actor

2014-09-26 Thread Cheng Lian
causes NPE since the |Driver| is created in a thread different from the one in which the |HiveContext| (and the contained |SessionState|) was constructed. On 9/19/14 3:31 AM, Du Li wrote: I have figured it out. As shown in the code below, if the HiveContext hc were created in the actor object and used

Re: problem with HiveContext inside Actor

2014-09-18 Thread Du Li
I have figured it out. As shown in the code below, if the HiveContext hc were created in the actor object and used to create a db in response to a message, it would throw a null pointer exception. This is fixed by creating the HiveContext inside the MyActor class instead. I also tested the code
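A condensed sketch of the fix described here (names illustrative, Spark 1.1-style API): the point is that the HiveContext, and hence Hive's thread-local state, is constructed inside the actor rather than in the enclosing object.

  import akka.actor.Actor
  import org.apache.spark.SparkContext
  import org.apache.spark.sql.hive.HiveContext

  class MyActor(sc: SparkContext) extends Actor {
    // constructed in the actor's thread, not in the companion object / main thread
    private val hc = new HiveContext(sc)
    def receive = {
      case dbName: String => hc.sql(s"CREATE DATABASE IF NOT EXISTS $dbName")
    }
  }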

problem with HiveContext inside Actor

2014-09-17 Thread Du Li
Hi, I wonder if anybody has had a similar experience or any suggestions here. I have an akka Actor that processes database requests in high-level messages. Inside this Actor, it creates a HiveContext object that does the actual db work. The main thread creates the needed SparkContext and passes

RE: problem with HiveContext inside Actor

2014-09-17 Thread Cheng, Hao
Hi Du, I am not sure what you mean by "triggers the HiveContext to create a database"; do you create a subclass of HiveContext? Just be sure you call HiveContext.sessionState eagerly, since it will set the proper hiveconf into the SessionState; otherwise the HiveDriver will always get

Re: problem with HiveContext inside Actor

2014-09-17 Thread Michael Armbrust
- dev Is it possible that you are constructing more than one HiveContext in a single JVM? Due to global state in Hive code this is not allowed. Michael On Wed, Sep 17, 2014 at 7:21 PM, Cheng, Hao hao.ch...@intel.com wrote: Hi, Du I am not sure what you mean “triggers the HiveContext

Move Spark configuration from SPARK_CLASSPATH to spark-default.conf, HiveContext went wrong with Class com.hadoop.compression.lzo.LzoCodec not found

2014-09-17 Thread Zhun Shen
Hi there, My production environment is AWS EMR with Hadoop 2.4.0 and Spark 1.0.2. I moved the Spark configuration in SPARK_CLASSPATH to spark-default.conf, and then the HiveContext went wrong. I also found this WARN info: "WARN DataNucleus.General: Plugin (Bundle) org.datanucleus.store.rdbms is already

Re: problem with HiveContext inside Actor

2014-09-17 Thread Du Li
Thanks for your reply. Michael: No, I only create one HiveContext in the code. Hao: Yes, I subclass HiveContext and define my own function to create a database, and then subclass an akka Actor to call that function in response to an abstract message. Following your suggestion, I called println

SparkSQL HiveContext TypeTag compile error

2014-09-11 Thread Du Li
Hi, I have the following code snippet. It works fine in spark-shell but in a standalone app it reports "No TypeTag available for MySchema" at compile time when calling hc.createSchemaRDD(rdd). Anybody know what might be missing? Thanks, Du -- import org.apache.spark.sql.hive.HiveContext

Re: SparkSQL HiveContext TypeTag compile error

2014-09-11 Thread Du Li
Hi, I have the following code snippet. It works fine in spark-shell but in a standalone app it reports "No TypeTag available for MySchema" at compile

Re: SparkSQL HiveContext TypeTag compile error

2014-09-11 Thread Du Li
Solved it. The problem occurred because the case class was defined within a test case in FunSuite. Moving the case class
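The fix, sketched (hc and sc assumed in scope; MySchema is the hypothetical case class from the snippet): TypeTag derivation needs the case class at the top level of the file, not inside a method or test body.

  // top level of the file -- OK
  case class MySchema(id: Int, name: String)

  // inside the test case, this now compiles:
  val schemaRdd = hc.createSchemaRDD(sc.parallelize(Seq(MySchema(1, "a"))))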

Re: Problem Accessing Hive Table from hiveContext

2014-09-01 Thread Yin Huai
Hello Igor, Although Decimal is supported, Hive 0.12 does not support user-definable precision and scale (they were introduced in Hive 0.13). Thanks, Yin On Sat, Aug 30, 2014 at 1:50 AM, Zitser, Igor igor.zit...@citi.com wrote: Hi All, New to spark and using Spark 1.0.2 and hive 0.12. If
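A hedged sketch of the way out that follows from Yin's answer (not verbatim from the thread): drop the precision/scale on Hive 0.12, or move to Hive 0.13 to keep decimal(5,2).

  // works on Hive 0.12: DECIMAL without user-definable precision/scale
  hiveContext.hql("CREATE TABLE test_datatypes (testbigint BIGINT, testdec DECIMAL)")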

Re: SparkSQL HiveContext No Suitable Driver / Cannot Find Driver

2014-08-30 Thread Denny Lee
/incubator-spark-user/201406.mbox/%3ccadoad2ks9_qgeign5-w7xogmrotrlbchvfukctgstj5qp9q...@mail.gmail.com%3E. Currently using Spark-1.1 (grabbed from git two days ago) and using Hive 0.12 with my metastore in MySQL.  If I run any HiveContext statements, it results in cannot find the driver in CLASSPATH
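A likely remedy, an assumption since the thread is truncated here, is to put the metastore's JDBC driver jar on the driver classpath at launch (jar name hypothetical); the Parquet thread below reports success with the same flag:

  spark-shell --driver-class-path /path/to/mysql-connector-java.jar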

Problem Accessing Hive Table from hiveContext

2014-08-29 Thread Zitser, Igor
Hi All, New to spark and using Spark 1.0.2 and hive 0.12. If the hive table is created as test_datatypes(testbigint bigint, ss bigint), select * from test_datatypes works fine from spark. For create table test_datatypes(testbigint bigint, testdec decimal(5,2)): scala> val

Problem using accessing HiveContext

2014-08-28 Thread Zitser, Igor
Hi, While using HiveContext: if the hive table is created as test_datatypes(testbigint bigint, ss bigint), the select below works fine. For create table test_datatypes(testbigint bigint, testdec decimal(5,2)): scala> val dataTypes = hiveContext.hql("select * from test_datatypes") 14/08/28 21:18:44 INFO

Re: Does HiveContext support Parquet?

2014-08-27 Thread Silvio Fiorito
-shell --master spark://S4:7077 --jars /home/hduser/parquet-hive-bundle-1.5.0.jar #import related hiveContext (successful) ... # create parquet table: hql("CREATE TABLE parquet_test (id int, str string, mp MAP<STRING,STRING>, lst ARRAY<STRING>, strct STRUCT<A:STRING,B:STRING>) PARTITIONED BY (part string) ROW

Re: Does HiveContext support Parquet?

2014-08-27 Thread lyc
Thanks a lot. Finally, I can create a parquet table using your command --driver-class-path. I am using hadoop 2.3. Now, I will try to load data into the tables. Thanks, lyc -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-HiveContext-support-Parquet

Re: Does HiveContext support Parquet?

2014-08-27 Thread Michael Armbrust
. Thanks, lyc -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-HiveContext-support-Parquet-tp12209p12931.html

Re: Does HiveContext support Parquet?

2014-08-26 Thread lyc
/hduser/parquet-hive-bundle-1.5.0.jar #import related hiveContext (successful) ... # create parquet table: hql("CREATE TABLE parquet_test (id int, str string, mp MAP<STRING,STRING>, lst ARRAY<STRING>, strct STRUCT<A:STRING,B:STRING>) PARTITIONED BY (part string) ROW FORMAT SERDE

Re: HiveContext ouput log file

2014-08-25 Thread Michael Armbrust
...@yahoo.com.invalid wrote: Hello All, I have executed the following UDF SQL in my spark hivecontext: hiveContext.hql("select count(t1.col1) from t1 join t2 where myUDF(t1.id, t2.id) = true") Where do I find the count output? Thanks and Regards, Sankar S.

Re: Is hive UDF are supported in HiveContext

2014-08-19 Thread chutium
There is no collect_list in Hive 0.12; try this after this ticket is done: https://issues.apache.org/jira/browse/SPARK-2706 I am also looking forward to this. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-hive-UDF-are-supported-in-HiveContext

RE: Does HiveContext support Parquet?

2014-08-18 Thread lyc
I followed your instructions to try to load data in parquet format through hiveContext but failed. Do you happen to know where I went wrong in the following steps? The steps I am following are: 1. download parquet-hive-bundle-1.5.0.jar 2. revise hive-site.xml to include this: property

Re: Does HiveContext support Parquet?

2014-08-18 Thread Silvio Fiorito
on Hive 0.12. That's what's worked for me. Thanks, Silvio On 8/18/14, 3:14 PM, lyc yanchen@huawei.com wrote: I followed your instructions to try to load data in parquet format through hiveContext but failed. Do you happen to know where I went wrong in the following steps? The steps I am

RE: Does HiveContext support Parquet?

2014-08-16 Thread Silvio Fiorito
Thank you for your reply. Do you know where I can find some detailed information about how to use Parquet in HiveContext? Any information

RE: Does HiveContext support Parquet?

2014-08-16 Thread lyc
Thanks for your help. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-HiveContext-support-Parquet-tp12209p12231.html

RE: Does HiveContext support Parquet?

2014-08-16 Thread Flavio Pompermaier
from all applications? Best, FP On Aug 16, 2014 5:29 PM, lyc yanchen@huawei.com wrote: Thanks for your help. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-HiveContext-support-Parquet-tp12209p12231.html

RE: Does HiveContext support Parquet?

2014-08-16 Thread Silvio Fiorito
If you're using HiveContext then all metadata is in the Hive metastore as defined in hive-site.xml. Concurrent writes should be fine as long as you're using a concurrent metastore db.

Re: Does HiveContext support Parquet?

2014-08-16 Thread Michael Armbrust
with the HiveContext. In 1.1 we have renamed this function to registerTempTable to make this clearer. 2) If I have multiple HiveContexts (one per application) using the same parquet table, is there any problem with inserting concurrently from all applications? This is not supported.
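The 1.1 rename, sketched (file and table names hypothetical):

  val people = hiveContext.jsonFile("people.json")
  people.registerTempTable("people")  // Spark 1.0 called this registerAsTable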

Does HiveContext support Parquet?

2014-08-15 Thread lyc
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-HiveContext-support-Parquet-tp12209.html

Re: Does HiveContext support Parquet?

2014-08-15 Thread Silvio Fiorito
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-HiveContext-support-Parquet-tp12209.html

Re: Does HiveContext support Parquet?

2014-08-15 Thread lyc
Thank you for your reply. Do you know where I can find some detailed information about how to use Parquet in HiveContext? Any information is appreciated. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-HiveContext-support-Parquet-tp12209p12216.html

Re: CDH5, HiveContext, Parquet

2014-08-11 Thread chutium
hive-thriftserver does not work with parquet tables in the hive metastore either; will this PR fix that too? And is there no need to change any pom.xml? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/CDH5-HiveContext-Parquet-tp11853p11880.html

CDH5, HiveContext, Parquet

2014-08-10 Thread Eric Friedman
I have a CDH5.0.3 cluster with Hive tables written in Parquet. The tables have the DeprecatedParquetInputFormat on their metadata, and when I try to select from one using Spark SQL, it blows up with a stack trace like this: java.lang.RuntimeException: java.lang.ClassNotFoundException:

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Sean Owen
As far as I can tell, the method was removed after 0.12.0 in the fix for HIVE-5223 (https://github.com/apache/hive/commit/4059a32f34633dcef1550fdef07d9f9e044c722c#diff-948cc2a95809f584eb030e2b57be3993), and that fix was back-ported in its entirety to 5.0.0+:

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Eric Friedman
Hi Sean, Thanks for the reply. I'm on CDH 5.0.3 and upgrading the whole cluster to 5.1.0 will eventually happen but not immediately. I've tried running the CDH spark-1.0 release and also building it from source. This, unfortunately goes into a whole other rathole of dependencies. :-( Eric

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Sean Owen
Hm, I was thinking that the issue is that Spark has to use a forked hive-exec since hive-exec unfortunately includes a bunch of dependencies it shouldn't. It forked Hive 0.12.0: http://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/0.12.0 ... and then I was thinking maybe CDH wasn't

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Eric Friedman
Yeah, that's what I feared. Unfortunately upgrades on very large production clusters aren't a cheap way to find out what else is broken. Perhaps I can create an RCFile table and sidestep parquet for now. On Aug 10, 2014, at 1:45 PM, Sean Owen so...@cloudera.com wrote: Hm, I was thinking

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Michael Armbrust
I imagine it's not the only instance of this kind of problem people will ever encounter. Can you rebuild Spark with this particular release of Hive? Unfortunately the Hive APIs that we use change too much from release to release to make this possible. There is a JIRA for compiling Spark SQL

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Eric Friedman
Thanks Michael, I can try that too. I know you guys aren't in sales/marketing (thank G-d), but given all the hoopla about the CDH-DataBricks partnership, it'd be awesome if you guys were somewhat more aligned, by which I mean that the DataBricks releases on Apache that say for CDH5 would

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Yin Huai
In case the link to PR #1819 is broken, here it is: https://github.com/apache/spark/pull/1819. On Sun, Aug 10, 2014 at 5:56 PM, Eric Friedman eric.d.fried...@gmail.com wrote: Thanks Michael, I can try that too. I know you guys aren't in sales/marketing (thank G-d), but given all the hoopla

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Eric Friedman
On Sun, Aug 10, 2014 at 2:43 PM, Michael Armbrust mich...@databricks.com wrote: if I try to add hive-exec-0.12.0-cdh5.0.3.jar to my SPARK_CLASSPATH, in order to get DeprecatedParquetInputFormat, I find out that there is an incompatibility in the SerDeUtils class. Spark's Hive snapshot

Got error "java.lang.IllegalAccessError" when using HiveContext in Spark shell on AWS

2014-08-07 Thread Zhun Shen
Hi, When I try to use HiveContext in the Spark shell on AWS, I get the error java.lang.IllegalAccessError: tried to access method com.google.common.collect.MapMaker.makeComputingMap(Lcom/google/common/base/Function;)Ljava/util/concurrent/ConcurrentMap. I followed the steps below to compile and install

Re: Got error "java.lang.IllegalAccessError" when using HiveContext in Spark shell on AWS

2014-08-07 Thread Cheng Lian
Hey Zhun, Thanks for the detailed problem description. Please see my comments inlined below. On Thu, Aug 7, 2014 at 6:18 PM, Zhun Shen shenzhunal...@gmail.com wrote: Caused by: java.lang.IllegalAccessError: tried to access method

Re: Got error "java.lang.IllegalAccessError" when using HiveContext in Spark shell on AWS

2014-08-07 Thread Zhun Shen
Hi Cheng, I replaced Guava 15.0 with Guava 14.0.1 in my spark classpath, the problem was solved. So your method is correct. It proved that this issue was caused by AWS EMR (ami-version 3.1.0) libs which include Guava 15.0. Many thanks and see you in the first Spark User Beijing Meetup

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-08-01 Thread chenjie
the spark logs say? Enable debug mode to see what's going on in spark-shell when it tries to interact with and init HiveContext. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838p11147

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-31 Thread Andrew Lee
Could you enable the HistoryServer and provide the properties and CLASSPATH for the spark-shell? And run the 'env' command to list your environment variables? By the way, what do the spark logs say? Enable debug mode to see what's going on in spark-shell when it tries to interact with and init HiveContext

RE: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-29 Thread nikroy16
Thanks for the response... hive-site.xml is in the classpath so that doesn't seem to be the issue. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838p10871.html

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-29 Thread Michael Armbrust
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838p10871.html

HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-28 Thread nikroy16
Hi, Even though hive.metastore.warehouse.dir in hive-site.xml is set to the default user/hive/warehouse and the permissions are correct in hdfs, HiveContext seems to be creating the metastore locally instead of in hdfs. After looking into the spark code, I found the following in HiveContext.scala
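For comparison, the property in question, shown here with a fully qualified HDFS URI, which is one way (an assumption, not confirmed by this thread) to rule out the default filesystem resolving to the local disk:

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://namenode:8020/user/hive/warehouse</value>
  </property>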

Support for Percentile and Variance Aggregation functions in Spark with HiveContext

2014-07-25 Thread vinay . kashyap
.applyOrElse(Analyzer.scala:113)     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165) I have read in the documentation that with HiveContext Spark SQL supports all the UDFs supported in Hive. I want to know if there is anything else I need to follow to use Percentile
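Since percentile is a Hive UDAF, it should be reachable through HiveQL. A hedged sketch (column and table names hypothetical; percentile requires an integral argument):

  val p95 = hiveContext.hql("SELECT percentile(CAST(latency AS BIGINT), 0.95) FROM requests")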

How to add jar with SparkSQL HiveContext?

2014-06-17 Thread Earthson
I have a problem with the add jar command: hql("add jar /.../xxx.jar") Error: Exception in thread "main" java.lang.AssertionError: assertion failed: No plan for AddJar ... How can I do this with HiveContext? I can't find any API to do it. Does SparkSQL with Hive support UDF/UDAF?

Re: How to add jar with SparkSQL HiveContext?

2014-06-17 Thread Michael Armbrust
in thread "main" java.lang.AssertionError: assertion failed: No plan for AddJar ... How can I do this with HiveContext? I can't find any API to do it. Does SparkSQL with Hive support UDF/UDAF? -- View this message in context: How to add jar with SparkSQL HiveContext
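Not stated in this truncated reply, but a workaround visible elsewhere in this archive (e.g. the Parquet threads below) is to ship the jar when launching the shell instead of via HiveQL:

  spark-shell --jars /.../xxx.jar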

Re: SQLContext and HiveContext Query Performance

2014-06-04 Thread Zongheng Yang
to run the optimizer and generate physical plans that slows down the query. Thanks, Zongheng On Wed, Jun 4, 2014 at 2:16 PM, ssb61 santoshbalma...@gmail.com wrote: -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SQLContext-and-HiveContext-Query-Performance

Re: SQLContext and HiveContext Query Performance

2014-06-04 Thread ssb61
mapPartitions at Exchange.scala:44 - 13 s Thanks, Santosh -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SQLContext-and-HiveContext-Query-Performance-tp6948p6981.html
