Unable to create hive table using HiveContext

2015-12-23 Thread Soni spark
Hi friends, I am trying to create a Hive table through Spark with Java code in Eclipse using the code below. HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc()); sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)"); but I am getting an error
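
For reference, a minimal self-contained sketch of the same setup (Scala here, though the post uses Java; assumes spark-core and spark-hive 1.x on the classpath):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("create-hive-table").setMaster("local[*]"))
    // HiveContext needs a reachable metastore; with no hive-site.xml it falls
    // back to a local derby metastore created in the working directory.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")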

Re: Unable to create hive table using HiveContext

2015-12-23 Thread Zhan Zhang
<soni2015.sp...@gmail.com> wrote: Hi friends, I am trying to create a Hive table through Spark with Java code in Eclipse using the code below. HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc()); sqlContext

Can't read data correctly through beeline when data is save by HiveContext

2015-12-22 Thread licl
Hi, here is my Java code: SparkConf sparkConf = Constance.getSparkConf(); JavaSparkContext sc = new JavaSparkContext(sparkConf); SQLContext sql = new SQLContext(sc); HiveContext sqlContext = new HiveContext(sc.sc()); List fields = new

Re: Can't read data correctly through beeline when data is save by HiveContext

2015-12-22 Thread licl
-is-save-by-HiveContext-tp25774p25776.html

Re: Can't read data correctly through beeline when data is save by HiveContext

2015-12-22 Thread licl
I solved this now; just run 'refresh table shop.id' in beeline. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-read-data-correctly-through-beeline-when-data-is-save-by-HiveContext-tp25774p25779.html
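
The same refresh can be issued from a HiveContext; a sketch (the table name is taken from the post, and whether refreshTable accepts a qualified db.table name may depend on the Spark version):

    // Invalidate cached metadata/file listings so files written by another
    // session become visible.
    hiveContext.refreshTable("shop.id")
    // or as SQL, which is what the poster ran in beeline:
    hiveContext.sql("REFRESH TABLE shop.id")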

Re: HiveContext Self join not reading from cache

2015-12-18 Thread Gourav Sengupta
> Hi Ted, > > The self join works fine on tables where the HiveContext tables are direct > hive tables, therefore > > table1 = hiveContext.sql("select columnA, columnB from hivetable1") > table1.registerTempTable("table1") > table1.cache() > table1

Re: HiveContext Self join not reading from cache

2015-12-18 Thread Gourav Sengupta
in SPARK > > https://forums.databricks.com/questions/2142/self-join-in-spark-sql.html > > > Regards, > Gourav > > On Thu, Dec 17, 2015 at 10:52 AM, Gourav Sengupta < > gourav.sengu...@gmail.com> wrote: > >> Hi Ted, >> >> The self join works fi

Re: HiveContext Self join not reading from cache

2015-12-18 Thread Gourav Sengupta
a < > gourav.sengu...@gmail.com> wrote: > >> hi, >> >> I think that people have reported the same issue elsewhere, and this >> should be registered as a bug in SPARK >> >> https://forums.databricks.com/questions/2142/self-join-in-spark-sql.html

Re: HiveContext Self join not reading from cache

2015-12-18 Thread Ted Yu
[programme_key#1802,is_logged_in#1295L,is_4od_video_view#1327L], >> (MetastoreRelation default, omnitureweb_log, None), [hit_month#1289 IN >> (2015-11),hit_day#1290 IN (20)] >> >> Code Generation: true >> >> >> >> Regards, >> Gourav

Re: HiveContext Self join not reading from cache

2015-12-17 Thread Gourav Sengupta
Hi Ted, The self join works fine on tables where the HiveContext tables are direct hive tables, therefore table1 = hiveContext.sql("select columnA, columnB from hivetable1") table1.registerTempTable("table1") table1.cache() table1.count() and if I do a self join on table1 t

Re: HiveContext Self join not reading from cache

2015-12-16 Thread Ted Yu
I did the following exercise in spark-shell ("c" is a cached table): scala> sqlContext.sql("select x.b from c x join c y on x.a = y.a").explain == Physical Plan == Project [b#4] +- BroadcastHashJoin [a#3], [a#125], BuildRight :- InMemoryColumnarTableScan [b#4,a#3], InMemoryRelation

HiveContext Self join not reading from cache

2015-12-16 Thread Gourav Sengupta
Hi, This is how the data can be created: 1. TableA : cached() 2. TableB : cached() 3. TableC: TableA inner join TableB cached() 4. TableC join TableC does not take the data from cache but starts reading the data for TableA and TableB from disk. Does this sound like a bug? The self join between
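
A sketch of how the report can be reproduced and checked, in the spirit of the spark-shell verification posted elsewhere in this thread (table and column names are the thread's own placeholders):

    // Cache the base table through the catalog and materialize the cache.
    val table1 = hiveContext.sql("select columnA, columnB from hivetable1")
    table1.registerTempTable("table1")
    hiveContext.cacheTable("table1")
    hiveContext.sql("select count(*) from table1").collect()  // forces the cache to fill

    // If the cache is used, the physical plan of a self join should contain
    // InMemoryColumnarTableScan rather than a scan of the Hive relation.
    hiveContext.sql(
      "select x.columnA from table1 x join table1 y on x.columnB = y.columnB").explain()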

Re: hiveContext: storing lookup of partitions

2015-12-16 Thread Jeff Zhang
Hi, >>> >>> I have a HIVE table with a few thousand partitions (based on date and >>> time). It takes a long time to run it for the first time and then >>> subsequently it is fast. >>> >>> Is there a way to store the cache of partition lookups

Re: hiveContext: storing lookup of partitions

2015-12-16 Thread Gourav Sengupta
Sengupta < >>> gourav.sengu...@gmail.com> wrote: >>> >>>> Hi, >>>> >>>> I have a HIVE table with a few thousand partitions (based on date and >>>> time). It takes a long time to run it for the first time and then >>>> sub

Re: hiveContext: storing lookup of partitions

2015-12-16 Thread Gourav Sengupta
> I start a new SPARK instance (cannot keep my personal server running >> continuously), I can immediately restore the temptable in hiveContext >> without asking it to go again and cache the partition lookups? >> >> Currently it takes around 1.5 hours for me just to cache in

hiveContext: storing lookup of partitions

2015-12-15 Thread Gourav Sengupta
server running continuously), I can immediately restore the temptable in hiveContext without asking it to go again and cache the partition lookups? Currently it takes around 1.5 hours for me just to cache in the partition information and after that I can see that the job gets queued in the SPARK UI
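
One way to avoid re-paying the discovery cost each session, sketched under the assumption that a Parquet snapshot of the table is acceptable (paths and table names are hypothetical, and this is not necessarily what the thread ultimately recommended):

    // Pay the partition-discovery cost once and persist the result...
    val snapshot = hiveContext.table("my_partitioned_table")
    snapshot.write.parquet("hdfs:///snapshots/my_partitioned_table")

    // ...then, in a fresh Spark instance, re-register the snapshot directly;
    // reading Parquet by path skips the Hive partition lookup entirely.
    val restored = hiveContext.read.parquet("hdfs:///snapshots/my_partitioned_table")
    restored.registerTempTable("my_table")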

Re: hiveContext: storing lookup of partitions

2015-12-15 Thread Jeff Zhang
n subsequently it > is fast. > > Is there a way to store the cache of partition lookups so that every time > I start a new SPARK instance (cannot keep my personal server running > continuously), I can immediately restore the temptable in hiveContext > without asking it to go again and cache

Kryo serialization fails when using SparkSQL and HiveContext

2015-12-14 Thread Linh M. Tran
Hi everyone, I'm using HiveContext and SparkSQL to query a Hive table and doing join operation on it. After changing the default serializer to Kryo with spark.kryo.registrationRequired = true, the Spark application failed with the following error: java.lang.IllegalArgumentException: Class
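
With spark.kryo.registrationRequired=true, every class that crosses the wire must be registered up front, including Spark SQL internals. A sketch of the registration side (the class list is illustrative; the class to add is the one named in the IllegalArgumentException):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrationRequired", "true")
      // Register whatever the error message complains about, for example:
      .registerKryoClasses(Array(
        classOf[Array[String]],
        classOf[org.apache.spark.sql.Row]
      ))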

Re: Kryo serialization fails when using SparkSQL and HiveContext

2015-12-14 Thread Michael Armbrust
wrote: > Hi everyone, > I'm using HiveContext and SparkSQL to query a Hive table and doing join > operation on it. > After changing the default serializer to Kryo with > spark.kryo.registrationRequired = true, the Spark application failed with > the following error: > > java

Using TestHiveContext/HiveContext in unit tests

2015-12-11 Thread Sahil Sareen
I'm trying to do this in unit tests: val sConf = new SparkConf() .setAppName("RandomAppName") .setMaster("local") val sc = new SparkContext(sConf) val sqlContext = new TestHiveContext(sc) // tried new HiveContext(sc) as well But I get this: *[sc
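
A sketch of the usual shape of such a test (assumes ScalaTest, and note TestHiveContext lives in the spark-hive test artifact, so it must be declared as a test dependency):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.test.TestHiveContext
    import org.scalatest.{BeforeAndAfterAll, FunSuite}

    class MySuite extends FunSuite with BeforeAndAfterAll {
      private var sc: SparkContext = _
      private var hc: TestHiveContext = _

      override def beforeAll(): Unit = {
        sc = new SparkContext(new SparkConf().setAppName("RandomAppName").setMaster("local"))
        hc = new TestHiveContext(sc)  // manages its own temporary warehouse/metastore dirs
      }

      override def afterAll(): Unit = sc.stop()

      test("simple query") { hc.sql("SELECT 1").collect() }
    }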

Re: Using TestHiveContext/HiveContext in unit tests

2015-12-11 Thread Michael Armbrust
t; > val sConf = new SparkConf() > .setAppName("RandomAppName") > .setMaster("local") > val sc = new SparkContext(sConf) > val sqlContext = new TestHiveContext(sc) // tried new > HiveContext(sc) as well > > > But I get this:

Re: HiveContext creation failed with Kerberos

2015-12-09 Thread Neal Yin
ate: Tuesday, December 8, 2015 at 4:09 AM To: "user@spark.apache.org" <user@spark.apache.org> Subject: Re: HiveContext creation failed with Kerberos On 8 Dec 2015, at 06:52, Neal Yin <neal@workday.com

Re: HiveContext creation failed with Kerberos

2015-12-08 Thread Steve Loughran
On 8 Dec 2015, at 06:52, Neal Yin wrote: 15/12/08 04:12:28 ERROR transport.TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism

HiveContext creation failed with Kerberos

2015-12-07 Thread Neal Yin
eSparkConf(…) val sparkContext = new JavaSparkContext(sparkConf) new HiveContext(sparkContext.sc) // failed }) Spark context boots up fine with UGI, but HiveContext creation failed with following message. If I manually do kinit within same shell, this code works. Any thoughts? 15/12/0
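
A sketch of the keytab-based login pattern the thread is circling (principal and keytab path are placeholders; sparkContext is assumed to be in scope as a SparkContext):

    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.security.UserGroupInformation
    import org.apache.spark.sql.hive.HiveContext

    // Log in from a keytab and build the HiveContext inside doAs, so the
    // SASL/GSS handshake with the metastore runs with the Kerberos credentials
    // instead of relying on an external kinit.
    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      "someuser@EXAMPLE.COM", "/etc/security/keytabs/someuser.keytab")
    val hiveContext = ugi.doAs(new PrivilegedExceptionAction[HiveContext] {
      override def run(): HiveContext = new HiveContext(sparkContext)
    })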

RE: RE: error while creating HiveContext

2015-11-27 Thread Chandra Mohan, Ananda Vel Murugan
uru...@honeywell.com>; user <user@spark.apache.org> Subject: Re: RE: error while creating HiveContext Could you provide your hive-site.xml file info? Best, Sun. fightf...@163.com From: Chandra Mohan, Ananda Vel Murugan

Re: RE: error while creating HiveContext

2015-11-27 Thread fightf...@163.com
Could you provide your hive-site.xml file info? Best, Sun. fightf...@163.com From: Chandra Mohan, Ananda Vel Murugan Date: 2015-11-27 17:04 To: fightf...@163.com; user Subject: RE: error while creating HiveContext Hi, I verified and I could see hive-site.xml in spark conf directory

RE: error while creating HiveContext

2015-11-27 Thread Chandra Mohan, Ananda Vel Murugan
Subject: Re: error while creating HiveContext Hi, I think you just want to put the hive-site.xml in the spark/conf directory and it will be loaded into the Spark classpath. Best, Sun. fightf...@163.com From: Chandra Mohan,

error while creating HiveContext

2015-11-26 Thread Chandra Mohan, Ananda Vel Murugan
Hi, I am building a spark-sql application in Java. I created a maven project in Eclipse and added all dependencies including spark-core and spark-sql. I am creating a HiveContext in my Spark program and then trying to run sql queries against my Hive table. When I submit this job in Spark, for some

Re: error while creating HiveContext

2015-11-26 Thread fightf...@163.com
Hi, I think you just want to put the hive-site.xml in the spark/conf directory and it will be loaded into the Spark classpath. Best, Sun. fightf...@163.com From: Chandra Mohan, Ananda Vel Murugan Date: 2015-11-27 15:04 To: user Subject: error while creating HiveContext Hi, I am building
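
If editing spark/conf is not an option, the key setting usually carried by hive-site.xml can also be supplied programmatically; a sketch (the metastore URI is a placeholder):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    // Point the context at an existing remote metastore before the first query.
    hiveContext.setConf("hive.metastore.uris", "thrift://metastore-host:9083")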

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Jerry Lam
Hi Zhan, Thank you for providing a workaround! I will try this out but I agree with Ted, there should be a better way to capture the exception and handle it by just initializing SQLContext instead of HiveContext. WARN the user that something is wrong with his hive setup. Having

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Zhan Zhang
1:9083 HW11188:spark zzhang$ By the way, I don’t know whether there is any caveat for this workaround. Thanks. Zhan Zhang On Nov 6, 2015, at 2:40 PM, Jerry Lam <chiling...@gmail.com> wrote: Hi Zhan, I don’t use HiveContext features at

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Zhan Zhang
I agree, with a minor change: adding a config to provide the option to init SQLContext or HiveContext, with HiveContext as the default, instead of bypassing when hitting the Exception. Thanks. Zhan Zhang On Nov 6, 2015, at 2:53 PM, Ted Yu <yuzhih...@gmail.com>

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Ted Yu
I would suggest adding a config parameter that allows bypassing initialization of HiveContext in case of SQLException Cheers On Fri, Nov 6, 2015 at 2:50 PM, Zhan Zhang <zzh...@hortonworks.com> wrote: > Hi Jerry, > > OK. Here is an ugly workaround. > > Put a hive-site.xml u

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Zhan Zhang
If your assembly jar has the hive jar included, the HiveContext will be used. Typically, HiveContext has more functionality than SQLContext. In what case do you have to use SQLContext for something that cannot be done by HiveContext? Thanks. Zhan Zhang On Nov 6, 2015, at 10:43 AM, Jerry Lam <chiling...@gmail.

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Jerry Lam
Hi Zhan, I don’t use HiveContext features at all. I use mostly the DataFrame API. It is sexier and much less typing. :) Also, HiveContext requires a metastore database setup (derby by default). The problem is that I cannot have 2 spark-shell sessions running at the same time in the same host (e.g

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Zhan Zhang
ree with Ted, there should be a better way to capture the exception and handle it by just initializing SQLContext instead of HiveContext. WARN the user that something is wrong with his hive setup. Having spark.sql.hive.enabled false configuration would be lovely too. :) Just an addi

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Jerry Lam
Hi Ted, I was trying to set spark.sql.dialect to sql so as to specify that I only need “SQLContext”, not HiveContext. It didn’t work; it still instantiates HiveContext. Since I don’t use HiveContext and I don’t want to start a mysql database because I want to have more than 1 session of spark-shell

[Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Jerry Lam
Hi spark users and developers, Is it possible to disable HiveContext from being instantiated when using spark-shell? I got the following errors when I have more than one session started. Since I don't use HiveContext, it would be great if I could have more than 1 spark-shell running at the same time

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Jerry Lam
What is interesting is that the pyspark shell works fine with multiple sessions on the same host even though multiple HiveContexts have been created. What does pyspark do differently in terms of starting up the shell? > On Nov 6, 2015, at 12:12 PM, Ted Yu <yuzhih...@gmail.com

Why does predicate pushdown not work on HiveContext (concrete HiveThriftServer2) ?

2015-10-31 Thread Martin Senne
Hi all, # Program Sketch I create a HiveContext `hiveContext`. With that context, I create a DataFrame `df` from a JDBC relational table. I register the DataFrame `df` via df.registerTempTable("TESTTABLE"). I start a HiveThriftServer2 via HiveThriftServer2.startWithContext(h
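
A sketch of that program flow (JDBC URL and table are placeholders):

    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    val hiveContext = new HiveContext(sc)
    // Load the relational table over JDBC; predicates against this DataFrame
    // can be pushed down to the database, which is what the question probes.
    val df = hiveContext.read.format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")
      .option("dbtable", "testtable")
      .load()
    df.registerTempTable("TESTTABLE")
    // Expose the temp table to beeline/JDBC clients.
    HiveThriftServer2.startWithContext(hiveContext)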

RE: HiveContext ignores ("skip.header.line.count"="1")

2015-10-26 Thread Cheng, Hao
I am not sure if we really want to support that with HiveContext, but a workaround is to use the Spark package at https://github.com/databricks/spark-csv From: Felix Cheung [mailto:felixcheun...@hotmail.com] Sent: Tuesday, October 27, 2015 10:54 AM To: Daniel Haviv; user Subject: RE: HiveContext
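
A sketch of the suggested workaround with spark-csv, which consumes the header itself rather than relying on the Hive table property (the path is a placeholder; requires the com.databricks:spark-csv package on the classpath):

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")        // the first line is treated as a header, not data
      .option("inferSchema", "true")
      .load("hdfs:///data/my_csv_table/")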

Re: HiveContext ignores ("skip.header.line.count"="1")

2015-10-26 Thread Daniel Haviv
I will, thank you. > On 27 Oct 2015, at 4:54, Felix Cheung <felixcheun...@hotmail.com> wrote: > > Please open a JIRA? > > > Date: Mon, 26 Oct 2015 15:32:42 +0200 > Subject: HiveContext ignores ("skip.header.line.count"="1") > From: daniel.ha

RE: HiveContext ignores ("skip.header.line.count"="1")

2015-10-26 Thread Felix Cheung
Please open a JIRA? Date: Mon, 26 Oct 2015 15:32:42 +0200 Subject: HiveContext ignores ("skip.header.line.count"="1") From: daniel.ha...@veracity-group.com To: user@spark.apache.org Hi, I have a csv table in Hive which is configured to skip the header row

HiveContext ignores ("skip.header.line.count"="1")

2015-10-26 Thread Daniel Haviv
Hi, I have a csv table in Hive which is configured to skip the header row using TBLPROPERTIES("skip.header.line.count"="1"). When querying from Hive the header row is not included in the data, but when running the same query via HiveContext I get the header row. I made sure t

RE: Insert via HiveContext is slow

2015-10-09 Thread Cheng, Hao
I think DF performs the same as the SQL API does in the multi-inserts, if you don’t use the cached table. Hao From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com] Sent: Friday, October 9, 2015 3:09 PM To: Cheng, Hao Cc: user Subject: Re: Insert via HiveContext is slow Thanks Hao

Re: Insert via HiveContext is slow

2015-10-09 Thread Daniel Haviv
out soon. > > > > Hao > > > > *From:* Daniel Haviv [mailto:daniel.ha...@veracity-group.com] > *Sent:* Friday, October 9, 2015 3:08 AM > *To:* user > *Subject:* Re: Insert via HiveContext is slow > > > > Forgot to mention that my insert is a multi table insert

RowNumber in HiveContext returns null or negative values

2015-10-08 Thread Saif.A.Ellafi
Hi all, would this be a bug?? val ws = Window. partitionBy("clrty_id"). orderBy("filemonth_dtt") val nm = "repeatMe" df.select(df.col("*"), rowNumber().over(ws).cast("int").as(nm))
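
For readability, the reported snippet reconstructed as a compilable sketch (Spark 1.5-era API, where the function was still called rowNumber; column names are from the report, and df is assumed to be in scope):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.rowNumber

    val ws = Window.partitionBy("clrty_id").orderBy("filemonth_dtt")
    val nm = "repeatMe"
    // Window functions in 1.5 require a HiveContext-backed DataFrame.
    df.select(df.col("*"), rowNumber().over(ws).cast("int").as(nm))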

Re: RowNumber in HiveContext returns null or negative values

2015-10-08 Thread Michael Armbrust
Which version of Spark? On Thu, Oct 8, 2015 at 7:25 AM, wrote: > Hi all, would this be a bug?? > > val ws = Window. > partitionBy("clrty_id"). > orderBy("filemonth_dtt") > > val nm = "repeatMe" >

RE: RowNumber in HiveContext returns null or negative values

2015-10-08 Thread Saif.A.Ellafi
Hi, thanks for looking into it. v1.5.1. I am really worried. I don't have hive/hadoop for real in the environment. Saif From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Thursday, October 08, 2015 2:57 PM To: Ellafi, Saif A. Cc: user Subject: Re: RowNumber in HiveContext returns null

RE: RowNumber in HiveContext returns null or negative values

2015-10-08 Thread Saif.A.Ellafi
...@databricks.com Cc: user@spark.apache.org Subject: RE: RowNumber in HiveContext returns null or negative values Hi, thanks for looking into it. v1.5.1. I am really worried. I don't have hive/hadoop for real in the environment. Saif From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Thursday

Re: Insert via HiveContext is slow

2015-10-08 Thread Daniel Haviv
Oct 8, 2015 at 9:51 PM, Daniel Haviv < daniel.ha...@veracity-group.com> wrote: > Hi, > I'm inserting into a partitioned ORC table using an insert sql statement > passed via HiveContext. > The performance I'm getting is pretty bad and I was wondering if there are > ways to speed thi

RE: Insert via HiveContext is slow

2015-10-08 Thread Cheng, Hao
out soon. Hao From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com] Sent: Friday, October 9, 2015 3:08 AM To: user Subject: Re: Insert via HiveContext is slow Forgot to mention that my insert is a multi table insert : sqlContext2.sql("""from avro_events later

Insert via HiveContext is slow

2015-10-08 Thread Daniel Haviv
Hi, I'm inserting into a partitioned ORC table using an insert sql statement passed via HiveContext. The performance I'm getting is pretty bad and I was wondering if there are ways to speed things up. Would saving the DF like this df.write().mode(SaveMode.Append).partitionBy("date").s
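
The DataFrame-based alternative being weighed, completed as a sketch (the partition column is from the post; the format and path are assumptions):

    import org.apache.spark.sql.SaveMode

    df.write
      .mode(SaveMode.Append)
      .partitionBy("date")
      .format("orc")                       // ORC writing needs a HiveContext in 1.x
      .save("/apps/hive/warehouse/mydb.db/my_orc_table")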

RE: RowNumber in HiveContext returns null or negative values

2015-10-08 Thread Saif.A.Ellafi
Repartition and default parallelism to 1, in cluster mode, is still broken. So the problem is not the parallelism, but the cluster mode itself. Something wrong with HiveContext + cluster mode. Saif From: saif.a.ell...@wellsfargo.com [mailto:saif.a.ell...@wellsfargo.com] Sent: Thursday, October

Re: RowNumber in HiveContext returns null or negative values

2015-10-08 Thread Michael Armbrust
Can you open a JIRA? On Thu, Oct 8, 2015 at 11:24 AM, <saif.a.ell...@wellsfargo.com> wrote: > Repartition and default parallelism to 1, in cluster mode, is still > *broken*. > > > > So the problem is not the parallelism, but the cluster mode itself. > Something wrong

hiveContext sql number of tasks

2015-10-07 Thread patcharee
Hi, I do a sql query on about 10,000 partitioned orc files. Because of the partition schema the files cannot be merged any longer (to reduce the total number). From this command hiveContext.sql(sqlText), the 10K tasks were created to handle each file. Is it possible to use fewer tasks? How
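
The scan itself still launches one task per file split, but the downstream stages can be narrowed; a sketch (200 is an arbitrary target):

    // coalesce avoids a full shuffle while reducing the partition count for
    // everything that follows the scan.
    val result = hiveContext.sql(sqlText).coalesce(200)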

Please help: Processes with HiveContext slower in cluster

2015-10-05 Thread Saif.A.Ellafi
Hi, I have a HiveContext job which takes less than 1 minute to complete in local mode with 16 cores. However, when I launch it over a stand-alone cluster, it takes forever and probably can't even finish, even when the only node up in the cluster is the same one on which I execute it locally. How could I

Stopping SparkContext and HiveContext

2015-09-13 Thread Ophir Cohen
on a 'clean' context. I found it hard to do: first of all, I couldn't find any way to determine whether a SparkContext is already stopped. It has a flag for that, but it's private. Another problem is that when creating a local HiveContext it initializes a derby instance. When trying to create a new

Re: Stopping SparkContext and HiveContext

2015-09-13 Thread Ted Yu
> would like to create a new SparkContext in order to run the tests on a 'clean' > context. > I found it hard to do: first of all, I couldn't find any way to > determine whether a SparkContext is already stopped. It has a flag for that > but it's private. > Another problem is that when crea

Re: Stopping SparkContext and HiveContext

2015-09-13 Thread Ted Yu
sts on a 'clean' >> context. >> I found it hard to do: first of all, I couldn't find any way to >> determine whether a SparkContext is already stopped. It has a flag for that >> but it's private. >> Another problem is that when creating a local HiveContext it initializes >>

Re: Creating Parquet external table using HiveContext API

2015-09-10 Thread Michael Armbrust
rote: > Hi, > I want to create an external hive table using HiveContext. I have the > following: > 1. full path/location of parquet data directory > 2. name of the new table > 3. I can get the schema as well. > > What API will be the best (for 1.3.x or 1.4.x)? I can see 6

Re: Creating Parquet external table using HiveContext API

2015-09-10 Thread Mohammad Islam
TIONS (path '')") When you specify the path it's automatically created as an external table. The schema will be discovered. On Wed, Sep 9, 2015 at 9:33 PM, Mohammad Islam <misla...@yahoo.com.invalid> wrote: Hi, I want to create an external hive table using HiveContext. I have the following
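
The programmatic form of the same answer, sketched (table name and path are placeholders):

    // One of the createExternalTable overloads: source "parquet", schema
    // discovered from the files, and the table is created as external because
    // a path is supplied.
    hiveContext.createExternalTable(
      "my_table", "parquet", Map("path" -> "hdfs:///data/parquet/my_table"))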

Creating Parquet external table using HiveContext API

2015-09-09 Thread Mohammad Islam
Hi, I want to create an external hive table using HiveContext. I have the following: 1. full path/location of parquet data directory, 2. name of the new table, 3. I can get the schema as well. What API will be the best (for 1.3.x or 1.4.x)? I can see 6 createExternalTable() APIs but not sure which

Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-26 Thread Mike Trienis
of the unit test) and I believe it has something to do with HiveContext not reclaiming memory after it is finished (or I'm not shutting it down properly). It could very well be related to sbt, however, it's not clear to me. On Tue, Aug 25, 2015 at 1:12 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote

Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-26 Thread Michael Armbrust
. However, the primary issue is that running the same unit test in the same JVM (multiple times) results in increased memory (each run of the unit test) and I believe it has something to do with HiveContext not reclaiming memory after it is finished (or I'm not shutting it down properly). It could

How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-25 Thread Mike Trienis
Hello, I am using sbt and created a unit test where I create a `HiveContext` and execute some query and then return. Each time I run the unit test the JVM will increase its memory usage until I get the error: Internal error when running tests: java.lang.OutOfMemoryError: PermGen space Exception
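
A build.sbt sketch of the settings that usually tame PermGen growth in this situation: fork the test JVM so each run gets a fresh PermGen, and raise the cap (values are illustrative; MaxPermSize matters on Java 7 and earlier):

    // Run tests in a forked JVM instead of inside sbt's own JVM.
    fork in Test := true
    // Options for the forked test JVM.
    javaOptions in Test ++= Seq("-Xmx2g", "-XX:MaxPermSize=512m")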

Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-25 Thread Yana Kadiyska
test where I create a `HiveContext` and execute some query and then return. Each time I run the unit test the JVM will increase its memory usage until I get the error: Internal error when running tests: java.lang.OutOfMemoryError: PermGen space Exception in thread Thread-2

Re: shutdown local hivecontext?

2015-08-06 Thread Cesar Flores
Well. I managed to solve that issue after running my tests on a linux system instead of windows (which I was originally using). However, now I have an error when I try to reset the hive context using hc.reset(). It tries to create a file inside directory /user/my_user_name instead of the usual

Re: shutdown local hivecontext?

2015-08-06 Thread Cesar Flores
Well, I try this approach, and still have issues. Apparently TestHive can not delete the hive metastore directory. The complete error that I have is: 15/08/06 15:01:29 ERROR Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.

Best practices to call hiveContext in DataFrame.foreach in executor program or how to have a for loop in driver program

2015-08-05 Thread unk1102
if not exists and does insert into using hiveContext.sql. Now we can't execute hiveContext in an executor, so I have to execute this for loop in the driver program, and it runs serially one by one. When I submit this Spark job in a YARN cluster, almost all the time my executor gets lost because of shuffle not found

HiveContext error

2015-08-05 Thread Stefan Panayotov
Hello, I am trying to define an external Hive table from Spark HiveContext like the following: import org.apache.spark.sql.hive.HiveContext val hiveCtx = new HiveContext(sc) hiveCtx.sql(s"""CREATE EXTERNAL TABLE IF NOT EXISTS Rentrak_Ratings (Version string, Gen_Date string, Market_Number

shutdown local hivecontext?

2015-08-03 Thread Cesar Flores
We are using a local hive context in order to run unit tests. Our unit tests run perfectly fine if we run them one by one using sbt, as in the next example: sbt test-only com.company.pipeline.scalers.ScalerSuite.scala sbt test-only com.company.pipeline.labels.ActiveUsersLabelsSuite.scala However, if we

Re: shutdown local hivecontext?

2015-08-03 Thread Michael Armbrust
TestHive takes care of creating a temporary directory for each invocation so that multiple test runs won't conflict. On Mon, Aug 3, 2015 at 3:09 PM, Cesar Flores ces...@gmail.com wrote: We are using a local hive context in order to run unit tests. Our unit tests runs perfectly fine if we run

create HiveContext if available, otherwise SQLContext

2015-07-16 Thread Koert Kuipers
Has anyone tried to create HiveContext only if the class is available? I tried this: implicit lazy val sqlc: SQLContext = try { Class.forName("org.apache.spark.sql.hive.HiveContext", true, Thread.currentThread.getContextClassLoader) .getConstructor(classOf[SparkContext]).newInstance(sc
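
The posted fragment, completed as one plausible whole (the catch arm is an assumption about where the truncated post was going; sc is assumed to be in scope):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext

    implicit lazy val sqlc: SQLContext = try {
      // Reflectively instantiate HiveContext so there is no compile-time
      // dependency on spark-hive.
      Class.forName("org.apache.spark.sql.hive.HiveContext",
          true, Thread.currentThread.getContextClassLoader)
        .getConstructor(classOf[SparkContext])
        .newInstance(sc)
        .asInstanceOf[SQLContext]
    } catch {
      case _: ClassNotFoundException => new SQLContext(sc)  // hive classes absent
    }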

Use rank with distribute by in HiveContext

2015-07-16 Thread Lior Chaga
Does spark HiveContext support the rank() ... distribute by syntax (as in the following article- http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/doing_rank_with_hive )? If not, how can it be achieved? Thanks, Lior
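
Since Spark 1.4, HiveContext accepts Hive's windowing syntax, which covers the pattern in the article; a sketch (table and column names are placeholders):

    // DISTRIBUTE BY / SORT BY in the article map onto the window's
    // PARTITION BY / ORDER BY clauses.
    hiveContext.sql(
      """SELECT category, item, sales,
        |       rank() OVER (PARTITION BY category ORDER BY sales DESC) AS rnk
        |FROM sales_table""".stripMargin)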

RE: Use rank with distribute by in HiveContext

2015-07-16 Thread java8964
the customize UDF of rank. Yong Date: Thu, 16 Jul 2015 15:10:58 +0300 Subject: Use rank with distribute by in HiveContext From: lio...@taboola.com To: user@spark.apache.org Does spark HiveContext support the rank() ... distribute by syntax (as in the following article- http://www.edwardcapriolo.com

Re: Use rank with distribute by in HiveContext

2015-07-16 Thread Todd Nist
functions (SQL name / DataFrame method): rank/rank, dense_rank/denseRank, percent_rank/percentRank, ntile/ntile, row_number/rowNumber. HTH. -Todd On Thu, Jul 16, 2015 at 8:10 AM, Lior Chaga lio...@taboola.com wrote: Does spark HiveContext support the rank() ... distribute by syntax (as in the following article- http://www.edwardcapriolo.com

Re: create HiveContext if available, otherwise SQLContext

2015-07-16 Thread Yin Huai
...@tresata.com wrote: has anyone tried to create HiveContext only if the class is available? I tried this: implicit lazy val sqlc: SQLContext = try { Class.forName("org.apache.spark.sql.hive.HiveContext", true, Thread.currentThread.getContextClassLoader) .getConstructor(classOf[SparkContext

Re: create HiveContext if available, otherwise SQLContext

2015-07-16 Thread Koert Kuipers
#L1023-L1037). What is the version of Spark you are using? How did you add the spark-csv jar? On Thu, Jul 16, 2015 at 1:21 PM, Koert Kuipers ko...@tresata.com wrote: has anyone tried to create HiveContext only if the class is available? I tried this: implicit lazy val sqlc: SQLContext = try

Re: create HiveContext if available, otherwise SQLContext

2015-07-16 Thread Koert Kuipers
HiveContext only if the class is available? I tried this: implicit lazy val sqlc: SQLContext = try { Class.forName("org.apache.spark.sql.hive.HiveContext", true, Thread.currentThread.getContextClassLoader) .getConstructor(classOf[SparkContext]).newInstance(sc).asInstanceOf[SQLContext

Re: create HiveContext if available, otherwise SQLContext

2015-07-16 Thread Koert Kuipers
16, 2015 at 1:21 PM, Koert Kuipers ko...@tresata.com wrote: has anyone tried to create HiveContext only if the class is available? I tried this: implicit lazy val sqlc: SQLContext = try { Class.forName("org.apache.spark.sql.hive.HiveContext", true, Thread.currentThread.getContextClassLoader

Re: create HiveContext if available, otherwise SQLContext

2015-07-16 Thread Yin Huai
#L1023-L1037). What is the version of Spark you are using? How did you add the spark-csv jar? On Thu, Jul 16, 2015 at 1:21 PM, Koert Kuipers ko...@tresata.com wrote: has anyone tried to create HiveContext only if the class is available? I tried this: implicit lazy val sqlc: SQLContext = try

Re: create HiveContext if available, otherwise SQLContext

2015-07-16 Thread Koert Kuipers
/apache/spark/blob/master/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala#L1023-L1037). What is the version of Spark you are using? How did you add the spark-csv jar? On Thu, Jul 16, 2015 at 1:21 PM, Koert Kuipers ko...@tresata.com wrote: has anyone tried to make HiveContext

HiveContext with Cloudera Pseudo Cluster

2015-07-10 Thread Sukhmeet Sethi
Hi All, I am trying to run a simple join on Hive through the Spark shell on a pseudo Cloudera cluster on an Ubuntu machine: val hc = new HiveContext(sc); hc.sql("use testdb"); But it is failing with the message: org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: testdb

RE: [SparkR] Float type coercion with hiveContext

2015-07-08 Thread Sun, Rui
To: huangzheng Cc: Apache Spark User List Subject: Re: [SparkR] Float type coercion with hiveContext I used spark 1.4.0 binaries from official site: http://spark.apache.org/downloads.html And running it on: * Hortonworks HDP 2.2.0.0-2041 * with Hive 0.14 * with disabled hooks for Application Timeline Servers

Re: [SparkR] Float type coercion with hiveContext

2015-07-08 Thread Evgeny Sinelnikov
track it to see how it will be solved. Ray -Original Message- From: Evgeny Sinelnikov [mailto:esinelni...@griddynamics.com] Sent: Monday, July 6, 2015 7:27 PM To: huangzheng Cc: Apache Spark User List Subject: Re: [SparkR] Float type coercion with hiveContext I used spark 1.4.0

HiveContext throws org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

2015-07-07 Thread bdev
Just trying to get started with Spark and attempting to use HiveContext using spark-shell to interact with existing Hive tables on my CDH cluster but keep running into errors (pls see below) when I do 'hiveContext.sql("show tables")'. Wanted to know what all JARs need to be included to have

Re: HiveContext throws org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

2015-07-07 Thread prosp4300
trying to get started with Spark and attempting to use HiveContext using spark-shell to interact with existing Hive tables on my CDH cluster but keep running into errors (pls see below) when I do 'hiveContext.sql("show tables")'. Wanted to know what all JARs need to be included to have this working

RE: HiveContext throws org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

2015-07-07 Thread Cheng, Hao
[mailto:buntu...@gmail.com] Sent: Tuesday, July 7, 2015 5:07 PM To: user@spark.apache.org Subject: HiveContext throws org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient Just trying to get started with Spark and attempting to use HiveContext using spark-shell to interact with existing

[SparkR] Float type coercion with hiveContext

2015-07-06 Thread Evgeny Sinelnikov
Hello, I've run into trouble with float type coercion on SparkR with hiveContext. result <- sql(hiveContext, "SELECT offset, percentage from data limit 100") show(result) DataFrame[offset:float, percentage:float] head(result) Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot

Re: [SparkR] Float type coercion with hiveContext

2015-07-06 Thread Evgeny Sinelnikov
) 6:31 PM To: user user@spark.apache.org; Subject: [SparkR] Float type coercion with hiveContext Hello, I've run into trouble with float type coercion on SparkR with hiveContext. result <- sql(hiveContext, "SELECT offset, percentage from data limit 100") show(result) DataFrame[offset:float

Float type coercion on SparkR with hiveContext

2015-07-03 Thread Evgeny Sinelnikov
Hello, I've run into trouble with float type coercion on SparkR with hiveContext. result <- sql(hiveContext, "SELECT offset, percentage from data limit 100") show(result) DataFrame[offset:float, percentage:float] head(result) Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot

Re: Starting Spark without automatically starting HiveContext

2015-07-03 Thread Akhil Das
. Thanks Best Regards On Thu, Jul 2, 2015 at 6:11 PM, Daniel Haviv daniel.ha...@veracity-group.com wrote: Hi, I've downloaded the pre-built binaries for Hadoop 2.6 and whenever I start the spark-shell it always starts with HiveContext. How can I disable the HiveContext from being initialized

Re: Starting Spark without automatically starting HiveContext

2015-07-03 Thread Daniel Haviv
The main reason is Spark's startup time and the need to configure a component I don't really need (without configs the HiveContext takes more time to load) Thanks, Daniel On 3 July 2015, at 11:13, Robin East robin.e...@xense.co.uk wrote: As Akhil mentioned there isn’t AFAIK any kind

Re: Starting Spark without automatically starting HiveContext

2015-07-03 Thread ayan guha
HiveContext should be a superset of SQLContext, so you should be able to perform all your tasks. Are you facing any problem with HiveContext? On 3 Jul 2015 17:33, Daniel Haviv daniel.ha...@veracity-group.com wrote: Thanks I was looking for a less hack-ish way :) Daniel On Fri, Jul 3, 2015

Re: Starting Spark without automatically starting HiveContext

2015-07-03 Thread Daniel Haviv
for Hadoop 2.6 and whenever I start the spark-shell it always starts with HiveContext. How can I disable the HiveContext from being initialized automatically? Thanks, Daniel

Starting Spark without automatically starting HiveContext

2015-07-02 Thread Daniel Haviv
Hi, I've downloaded the pre-built binaries for Hadoop 2.6 and whenever I start the spark-shell it always starts with HiveContext. How can I disable the HiveContext from being initialized automatically? Thanks, Daniel

Spark SQL parallel query submission via single HiveContext

2015-06-29 Thread V Dineshkumar
Hi, As per my use case I need to submit multiple queries to Spark SQL in parallel, but because HiveContext achieves thread safety through locking, the jobs are getting submitted sequentially. I could see many threads waiting for the HiveContext. on-spray-can-akka.actor.default-dispatcher-26 - Thread t@149
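
A sketch of multi-threaded submission against a single context; whether compilation still serializes on the HiveContext lock depends on the Spark version. The pool name and queries are placeholders, and FAIR scheduling must be enabled via spark.scheduler.mode=FAIR:

    import scala.concurrent.Future
    import scala.concurrent.ExecutionContext.Implicits.global

    val queries = Seq("SELECT count(*) FROM t1", "SELECT count(*) FROM t2")
    queries.foreach { q =>
      Future {
        // Local properties are per-thread; assign each job to a fair pool.
        sc.setLocalProperty("spark.scheduler.pool", "adhoc")
        hiveContext.sql(q).collect()
      }
    }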

HiveContext /Spark much slower than Hive

2015-06-24 Thread afarahat
from mx3.post_tp_annotated_mb_impr where ad_id = 30590918987 and datestamp ='20150623' ) Thanks Ayman -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-Spark-much-slower-than-Hive-tp23480.html

Re: Does HiveContext connect to HiveServer2?

2015-06-24 Thread Nitin kak
Hi Marcelo, The issue does not happen while connecting to the hive metastore; that works fine. It seems that HiveContext only uses the Hive CLI to execute the queries, while HiveServer2 does not support it. I don't think you can specify any configuration in hive-site.xml which can make it connect
