I have a table, dw_bid, that was created in Hive and has nothing to do with Spark. I have data in Avro that I want to join with the dw_bid table, and this join needs to be done using Spark SQL. However, for some reason Spark says the dw_bid table does not exist. How do I tell Spark that dw_bid is a table created in Hive, and read it?
Query that is run from Spark SQL
==============================

insert overwrite table sojsuccessevents2_spark
select guid,sessionKey,sessionStartDate,sojDataDate,seqNum,eventTimestamp,siteId,successEventType,sourceType,itemId,
  shopCartId,b.transaction_Id as transactionId,offerId,b.bdr_id as userId,priorPage1SeqNum,priorPage1PageId,exclWMSearchAttemptSeqNum,exclPriorSearchPageId,
  exclPriorSearchSeqNum,exclPriorSearchCategory,exclPriorSearchL1,exclPriorSearchL2,currentImpressionId,sourceImpressionId,exclPriorSearchSqr,exclPriorSearchSort,
  isDuplicate,b.bid_date as transactionDate,auctionTypeCode,isBin,leafCategoryId,itemSiteId,b.qty_bid as bidQuantity,
  b.bid_amt_unit_lstg_curncy * b.bid_exchng_rate as bidAmtUsd,offerQuantity,offerAmountUsd,offerCreateDate,buyerSegment,buyerCountryId,sellerId,sellerCountryId,
  sellerStdLevel,cssSellerLevel,a.experimentChannel
from sojsuccessevents1 a
join dw_bid b
  on a.itemId = b.item_id and a.transactionId = b.transaction_id
where b.auct_end_dt >= '2015-02-16'
  AND b.bid_dt >= '2015-02-16'
  AND b.bid_type_code IN (1,9)
  AND b.bdr_id > 0
  AND (b.bid_flags & 32) = 0
  and lower(a.successEventType) IN ('bid','bin')

If I create sojsuccessevents2_spark from the Hive command line and then run the above command from my Spark SQL program, I get the error "sojsuccessevents2_spark table not found".

Hence I dropped the table from Hive and ran create table sojsuccessevents2_spark from Spark SQL before running the above command; that works until it hits the next roadblock: "dw_bid table not found".

This makes me believe that Spark for some reason is not able to read/understand tables created outside Spark. I did copy /apache/hive/conf/hive-site.xml into the Spark conf directory.

Please suggest.
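For context on where the lookup is happening: a plain org.apache.spark.sql.SQLContext keeps its table catalog in memory and only sees tables registered in that session, so a table that lives in the Hive metastore (like dw_bid) is invisible to it; metastore tables resolve only through a HiveContext, which reads hive-site.xml from the Spark conf directory. A minimal sketch of that setup, assuming Spark 1.x with the spark-avro package on the classpath (the object name and input path are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import com.databricks.spark.avro._ // provides sqlContext.avroFile

object HiveJoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveJoinSketch"))

    // HiveContext (unlike the plain SQLContext) talks to the Hive metastore,
    // so tables created from the Hive CLI, such as dw_bid, resolve by name.
    val sqlContext = new HiveContext(sc)

    // Register the Avro data as a temp table alongside the metastore table...
    sqlContext.avroFile("/path/to/events.avro").registerTempTable("sojsuccessevents1")

    // ...and both names are now visible to the same SQL session.
    sqlContext.sql(
      "select count(*) from sojsuccessevents1 a join dw_bid b on a.itemId = b.item_id").show()
  }
}
```

Since HiveContext extends SQLContext, existing avroFile/registerTempTable code should work unchanged; only the construction of the context differs.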
Logs
———
15/03/26 03:50:40 INFO HiveMetaStore.audit: ugi=dvasthimal ip=unknown-ip-addr cmd=get_table : db=default tbl=dw_bid
15/03/26 03:50:40 ERROR metadata.Hive: NoSuchObjectException(message:default.dw_bid table not found)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1560)
15/03/26 03:50:40 ERROR yarn.ApplicationMaster: User class threw exception: no such table List(dw_bid); line 1 pos 843
org.apache.spark.sql.AnalysisException: no such table List(dw_bid); line 1 pos 843
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:178)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:187)

Regards,
Deepak

On Thu, Mar 26, 2015 at 4:27 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
> [...]
>
> On Thu, Mar 26, 2015 at 1:26 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>
>> I have a Hive table named dw_bid; when I run hive from the command prompt
>> and run describe dw_bid, it works.
>>
>> I want to join an Avro file (table) in HDFS with this Hive dw_bid table,
>> and I refer to it as dw_bid from my Spark SQL program; however, I see:
>>
>> 15/03/26 00:31:01 INFO HiveMetaStore.audit: ugi=dvasthimal ip=unknown-ip-addr cmd=get_table : db=default tbl=dw_bid
>> 15/03/26 00:31:01 ERROR metadata.Hive: NoSuchObjectException(message:default.dw_bid table not found)
>>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1560)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>
>> Code:
>>
>> import com.databricks.spark.avro._ // provides sqlContext.avroFile
>>
>> val successDetail_S1 = sqlContext.avroFile(input)
>> successDetail_S1.registerTempTable("sojsuccessevents1")
>> val countS1 = sqlContext.sql(
>>   "select guid,sessionKey,sessionStartDate,sojDataDate,seqNum,eventTimestamp,siteId,successEventType,sourceType,itemId," +
>>   " shopCartId,b.transaction_Id as transactionId,offerId,b.bdr_id as userId,priorPage1SeqNum,priorPage1PageId,exclWMSearchAttemptSeqNum,exclPriorSearchPageId," +
>>   " exclPriorSearchSeqNum,exclPriorSearchCategory,exclPriorSearchL1,exclPriorSearchL2,currentImpressionId,sourceImpressionId,exclPriorSearchSqr,exclPriorSearchSort," +
>>   " isDuplicate,b.bid_date as transactionDate,auctionTypeCode,isBin,leafCategoryId,itemSiteId,b.qty_bid as bidQuantity," +
>>   " b.bid_amt_unit_lstg_curncy * b.bid_exchng_rate as bidAmtUsd,offerQuantity,offerAmountUsd,offerCreateDate,buyerSegment,buyerCountryId,sellerId,sellerCountryId," +
>>   " sellerStdLevel,cssSellerLevel,a.experimentChannel" +
>>   " from sojsuccessevents1 a join dw_bid b" +
>>   " on a.itemId = b.item_id and a.transactionId = b.transaction_id" +
>>   " where b.bid_type_code IN (1,9) AND b.bdr_id > 0 AND (b.bid_flags & 32) = 0 and lower(a.successEventType) IN ('bid','bin')")
>> println("countS1.first:" + countS1.first)
>>
>> Any suggestions on how to refer to a Hive table from Spark SQL?
>> --
>> Deepak

-- 
Deepak