Good day colleagues. Quick question on Parquet and DataFrames. Right now I have four parquet files stored in HDFS under the same path: /path/to/parquets/parquet1, /path/to/parquets/parquet2, /path/to/parquets/parquet3, /path/to/parquets/parquet4. I want to perform a union on all of these parquet files. Is there any way of doing this other than DataFrame's unionAll?

Thank you very much in advance.

Andres Fernandez
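One alternative, assuming the four files share a compatible schema: in Spark 1.x, DataFrameReader.parquet accepts several paths at once, so the union can be done in a single read instead of chained unionAll calls. A minimal spark-shell sketch:

// Read all four parquet files as one DataFrame in a single call;
// DataFrameReader.parquet takes a varargs list of paths.
val df = sqlContext.read.parquet(
  "/path/to/parquets/parquet1",
  "/path/to/parquets/parquet2",
  "/path/to/parquets/parquet3",
  "/path/to/parquets/parquet4")

// The result is the row-wise union of the four files.
df.count()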
From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Tuesday, March 01, 2016 1:50 PM
To: Jeff Zhang
Cc: Yogesh Vyas; user@spark.apache.org
Subject: Re: Save DataFrame to Hive Table

Hi,

It seems that your code does not specify in which database your table is created. Try this:

scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> // Choose a database
scala> HiveContext.sql("show databases").show
scala> HiveContext.sql("use test")  // I chose the test database
scala> HiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value STRING)")
scala> HiveContext.sql("desc TableName").show
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|     key|      int|   null|
|   value|   string|   null|
+--------+---------+-------+

// Create a simple DataFrame
val a = Seq((1, "Mich"), (2, "James"))
val b = a.toDF

// Let me keep it simple: create a temporary table and do a simple insert/select. No need to convolute it.
b.registerTempTable("tmp")

// Remember this temporary table is registered in the shell's SQL context, NOT in the HiveContext created above, so that HiveContext will NOT see the table:
HiveContext.sql("INSERT INTO TableName SELECT * FROM tmp")
org.apache.spark.sql.AnalysisException: no such table tmp; line 1 pos 36

// This will work:
sql("INSERT INTO TableName SELECT * FROM tmp")
sql("select count(1) from TableName").show
+---+
|_c0|
+---+
|  2|
+---+

HTH

Dr Mich Talebzadeh
LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 1 March 2016 at 06:33, Jeff Zhang <zjf...@gmail.com> wrote:

The following line does not execute the SQL, so the table is not created. Add .show() at the end to execute it:

hiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value STRING)")
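The pitfall behind the AnalysisException in Mich's reply above is that, in Spark 1.x, each SQLContext or HiveContext instance keeps its own registry of temporary tables, so a table registered through one context is invisible to another, even on the same SparkContext. A minimal spark-shell sketch of that behaviour (the names hc, df and tmp2 are illustrative):

// The shell's built-in sqlContext and a freshly constructed
// HiveContext have separate temporary-table registries.
val hc = new org.apache.spark.sql.hive.HiveContext(sc)

val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b")))
df.registerTempTable("tmp")          // registered on sqlContext only

sqlContext.sql("SELECT * FROM tmp")  // works
hc.sql("SELECT * FROM tmp")          // AnalysisException: no such table tmp

// Registering through the same context you query from avoids the error:
hc.createDataFrame(Seq((1, "a"))).registerTempTable("tmp2")
hc.sql("SELECT * FROM tmp2")         // works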
On Tue, Mar 1, 2016 at 2:22 PM, Yogesh Vyas <informy...@gmail.com> wrote:

Hi,

I have created a DataFrame in Spark, and now I want to save it directly into the Hive table. How do I do it?

I have created the Hive table using the following HiveContext:

HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());
hiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value STRING)");

I am using the following to save it into Hive:

DataFrame.write().mode(SaveMode.Append).insertInto("TableName");

But it gives the error:

Exception in thread "main" java.lang.RuntimeException: Table Not Found: TableName
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:139)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:257)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:266)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:264)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:264)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:254)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:916)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:916)
    at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:914)
    at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:918)
    at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:917)
    at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:921)
    at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:921)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:926)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:924)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:930)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:930)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
    at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:176)
    at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:164)
    at com.honeywell.Track.combine.App.main(App.java:451)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

--
Best Regards

Jeff Zhang
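A closing note on the stack trace above: the lookup fails in SimpleCatalog, the catalog of a plain SQLContext, which suggests the DataFrame being written was not built from the HiveContext that created the table. A sketch of the pattern the thread converges on, creating and writing through one and the same HiveContext; the data and names here are illustrative, assuming Spark 1.x:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value STRING)")

import hiveContext.implicits._
// Illustrative stand-in data for the DataFrame being saved
val df = Seq((1, "one"), (2, "two")).toDF("key", "value")

// Written through the same HiveContext, so "TableName" resolves
// against the Hive metastore catalog rather than SimpleCatalog
df.write.mode(SaveMode.Append).insertInto("TableName")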