I followed your instructions to load data in Parquet format through
HiveContext, but it failed. Do you know what I did wrong in the following
steps?
These are the steps I followed:
1. Download "parquet-hive-bundle-1.5.0.jar".
2. Add the following property to hive-site.xml:
<property>
  <name>hive.jar.directory</name>
  <value>/home/hduser/hive/lib/parquet-hive-bundle-1.5.0.jar</value>
  <description>
    This is the location hive in tez mode will look for to find a site wide
    installed hive instance. If not set, the directory under
    hive.user.install.directory corresponding to current user name will be used.
  </description>
</property>
3. Copy hive-site.xml to all nodes.
4. Start spark-shell and try to create a table:
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext._
hql("""create table part (P_PARTKEY INT, P_NAME STRING, P_MFGR STRING,
  P_BRAND STRING, P_TYPE STRING, P_SIZE INT, P_CONTAINER STRING,
  P_RETAILPRICE DOUBLE, P_COMMENT STRING) STORED AS PARQUET""")
Then I got this error:
14/08/18 19:09:00 ERROR Driver: FAILED: SemanticException Unrecognized file format in STORED AS clause: PARQUET
org.apache.hadoop.hive.ql.parse.SemanticException: Unrecognized file format in STORED AS clause: PARQUET
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.handleGenericFileFormat(BaseSemanticAnalyzer.java:569)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:8968)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8313)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:441)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
	at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:186)
	at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:160)
	at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:250)
	at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:247)
	at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:85)
	at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:90)
	at $line44.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
	at $line44.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
	at $line44.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
	at $line44.$read$$iwC$$iwC$$iwC.<init>(<console>:27)
	at $line44.$read$$iwC$$iwC.<init>(<console>:29)
	at $line44.$read$$iwC.<init>(<console>:31)
	at $line44.$read.<init>(<console>:33)
	at $line44.$read$.<init>(<console>:37)
	at $line44.$read$.<clinit>(<console>)
	at $line44.$eval$.<init>(<console>:7)
	at $line44.$eval$.<clinit>(<console>)
	at $line44.$eval.$print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
	at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
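In case it helps narrow things down: my assumption is that the Hive version bundled with this Spark build predates the `STORED AS PARQUET` shorthand (added in Hive 0.13), so one variant I could try is to name the Parquet SerDe and input/output formats explicitly. The sketch below only builds the DDL string. The object `ParquetDdl` and its method `parquetTableDdl` are hypothetical helpers, and the three class names are my assumption about what parquet-hive-bundle-1.5.0.jar provides, not something I have verified.

```scala
// Hypothetical helper: builds a CREATE TABLE statement that names the Parquet
// SerDe and input/output formats explicitly, instead of relying on the
// STORED AS PARQUET shorthand that older Hive versions reject.
object ParquetDdl {
  // ASSUMPTION: class names shipped in parquet-hive-bundle-1.5.0.jar (unverified).
  val SerDe        = "parquet.hive.serde.ParquetHiveSerDe"
  val InputFormat  = "parquet.hive.DeprecatedParquetInputFormat"
  val OutputFormat = "parquet.hive.DeprecatedParquetOutputFormat"

  /** Renders `CREATE TABLE name (col type, ...)` followed by explicit Parquet storage clauses. */
  def parquetTableDdl(table: String, columns: Seq[(String, String)]): String = {
    val cols = columns.map { case (name, tpe) => s"$name $tpe" }.mkString(", ")
    s"CREATE TABLE $table ($cols) " +
      s"ROW FORMAT SERDE '$SerDe' " +
      s"STORED AS INPUTFORMAT '$InputFormat' OUTPUTFORMAT '$OutputFormat'"
  }

  // Column list copied from the part table in step 4.
  val partColumns: Seq[(String, String)] = Seq(
    "P_PARTKEY" -> "INT", "P_NAME" -> "STRING", "P_MFGR" -> "STRING",
    "P_BRAND" -> "STRING", "P_TYPE" -> "STRING", "P_SIZE" -> "INT",
    "P_CONTAINER" -> "STRING", "P_RETAILPRICE" -> "DOUBLE", "P_COMMENT" -> "STRING")

  def main(args: Array[String]): Unit =
    println(parquetTableDdl("part", partColumns))
}
```

If I tried this, I would run it in spark-shell as `hql(ParquetDdl.parquetTableDdl("part", ParquetDdl.partColumns))`, with the bundle jar on the driver classpath (for example `spark-shell --jars /home/hduser/hive/lib/parquet-hive-bundle-1.5.0.jar`) rather than through `hive.jar.directory`, whose own description says it applies to Hive in Tez mode.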
Many thanks for your help!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Does-HiveContext-support-Parquet-tp12209p12318.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.