I was able to query data from the Impala table. Here is my Git repo for
anyone who would like to take a look:-
https://github.com/morfious902002/impala-spark-jdbc-kerberos
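For anyone hitting the same driver and Kerberos issues, a minimal spark-submit sketch of the general approach (the jar name, paths, principal, and keytab below are placeholders, and the Impala JDBC 4.1 driver from earlier in the thread is assumed):

```shell
# Ship the Impala JDBC driver to every executor, make it visible to the
# driver's class loader, and log in from a keytab on YARN.
# All paths, the principal, and the class/jar names are placeholders.
spark-submit \
  --master yarn --deploy-mode cluster \
  --jars /path/to/ImpalaJDBC41.jar \
  --driver-class-path ImpalaJDBC41.jar \
  --conf spark.executor.extraClassPath=ImpalaJDBC41.jar \
  --principal user@EXAMPLE.REALM \
  --keytab /path/to/user.keytab \
  --class com.example.ImpalaReader \
  my-job.jar
```

In cluster mode, --jars copies the jar into each container's working directory, which is why the bare file name can be used on the extraClassPath settings.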
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com
Did you ever find a solution to this? If so, can you share it? I am running
into a similar issue in YARN cluster mode connecting to an Impala table.
The issue seems to be with the primordial class loader. I cannot place the
driver jars at the same location on every node, but I have uploaded them to
HDFS. I have tried SPARK_YARN_DIST_FILES as well as SPARK_CLASSPATH on the
edge node with no luck. Is there another way to load these jars through
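Since the jars are already on HDFS, one hedged option is to let spark-submit distribute them rather than relying on SPARK_CLASSPATH (which is deprecated in Spark 1.x); the paths and class names below are placeholders:

```shell
# --jars accepts hdfs:// URIs; YARN localizes the jar into each container's
# working directory, after which the bare file name is on the classpath.
# JDBC drivers generally must be on the system classpath for DriverManager
# to find them, hence the extraClassPath entries on both sides.
spark-submit \
  --master yarn --deploy-mode cluster \
  --jars hdfs:///libs/ImpalaJDBC41.jar \
  --conf spark.driver.extraClassPath=ImpalaJDBC41.jar \
  --conf spark.executor.extraClassPath=ImpalaJDBC41.jar \
  --class com.example.MyJob \
  my-job.jar
```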
Hi,
I am trying to create a DataFrame by querying an Impala table. It works fine
in my local environment, but when I try to run it on the cluster I get either
Error: java.lang.ClassNotFoundException: com.cloudera.impala.jdbc41.Driver
or
No suitable driver found.
Can someone help me or direct me to
We are using Spark 1.6.1 on a CDH 5.5 cluster. The job worked fine with
Kerberos, but when we implemented Encryption at Rest we ran into the
following issue:-
Df.write().mode(SaveMode.Append).partitionBy("Partition").parquet(path);
I have already tried setting these values with no success :-
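The values tried are cut off above, but with HDFS transparent encryption the client has to be able to reach the KMS. A hedged sketch of the Hadoop-side properties sometimes involved, passed through Spark's spark.hadoop.* prefix (the KMS host/port and job names are placeholders, and which property applies depends on the Hadoop version):

```shell
# spark.hadoop.* entries are copied into the job's Hadoop Configuration.
# The two key-provider properties are the older and newer spellings of the
# same KMS client setting; kms-host:16000 is a placeholder.
spark-submit \
  --master yarn --deploy-mode cluster \
  --conf spark.hadoop.dfs.encryption.key.provider.uri=kms://http@kms-host:16000/kms \
  --conf spark.hadoop.hadoop.security.key.provider.path=kms://http@kms-host:16000/kms \
  --class com.example.ParquetWriter \
  my-job.jar
```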
I am using Spark 1.6.1 and writing to HDFS. In some cases it seems like all
the work is being done by one thread. Why is that?
Also, I need parquet.enable.summary-metadata to register the parquet files
to Impala.
Df.write().partitionBy("COLUMN").parquet(outputFileLocation);
It also seems
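On the summary-metadata point, a hedged way to set that Hadoop property without touching code is Spark's spark.hadoop.* conf prefix (class and jar names below are placeholders):

```shell
# spark.hadoop.* entries are copied into the job's Hadoop Configuration,
# so this enables the _metadata/_common_metadata summary files for Impala:
spark-submit \
  --conf spark.hadoop.parquet.enable.summary-metadata=true \
  --class com.example.ParquetWriter \
  my-job.jar
```

As for one thread doing all the work: the number of concurrent writers equals the number of partitions of the DataFrame, so a `Df.repartition(n)` before the write (n being a value you choose) typically spreads the output across n parallel tasks.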
I have a Spark job that creates 6 million rows in RDDs. I convert the RDDs
into a DataFrame and write it to HDFS. Currently it takes 3 minutes to write
it to HDFS.
I am using Spark 1.5.1 with YARN.
Here is the snippet:-
RDDList.parallelStream().forEach(mapJavaRDD -> {
if
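The parallelStream() pattern above can itself become a bottleneck: parallel streams run on the JVM's shared common ForkJoinPool, whose default parallelism is (cores - 1), and blocking Spark actions inside the forEach tie those threads up. A self-contained sketch of making the pool explicit (plain Java, no Spark; the pool size of 4 is just an example):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;

public class ParallelStreamDemo {

    // Sum of squares computed on a caller-supplied ForkJoinPool instead of
    // the shared common pool, so the degree of parallelism is explicit.
    static int sumOfSquares(List<Integer> inputs, int parallelism) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // A parallel stream started inside a task submitted to a
            // ForkJoinPool executes on that pool, not the common pool.
            return pool.submit(() ->
                    inputs.parallelStream().mapToInt(i -> i * i).sum()
            ).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Default parallelism that parallelStream() uses when not wrapped:
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("sum: " + sumOfSquares(Arrays.asList(1, 2, 3, 4), 4));
    }
}
```

The same wrapping works for a list of RDDs: submit the forEach to a dedicated pool sized to how many concurrent Spark jobs the cluster should run.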
Hi,
I created a cluster using the spark-ec2 script, but it installs HDFS version
1.0. I would like to use this cluster to connect to Hive installed on a
Cloudera CDH 5.3 cluster, but I am getting the following error:-
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot
communicate
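That IPC error indicates a Hadoop 2.x NameNode rejecting a Hadoop 1.x client. The spark-ec2 script has a flag for launching with Hadoop 2 instead of the default; the key pair, key file, and cluster name below are placeholders:

```shell
# Launch the EC2 cluster with Hadoop 2.x so its HDFS client can talk to a
# CDH 5.x (Hadoop 2.5-based) NameNode; -k/-i values are placeholders.
./spark-ec2 -k mykey -i ~/mykey.pem \
  --hadoop-major-version=2 \
  launch my-spark-cluster
```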
Hi,
I am trying to create a Spark cluster using the spark-ec2 script that
supports 2.5.0-cdh5.3.2 for both HDFS and Hive. I created a cluster by
adding --hadoop-major-version=2.5.0, which solved some of the errors I was
getting. But now when I run a SELECT query on Hive I get the following
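The query and error text are cut off above, but a common prerequisite when pointing Spark at an existing CDH Hive deployment is making the metastore configuration visible to Spark. A sketch, assuming a reachable CDH gateway host (host and paths are placeholders):

```shell
# Copy the CDH cluster's Hive client configuration into Spark's conf dir so
# HiveContext / spark-sql talks to the existing metastore instead of a
# local Derby one (host and paths are examples):
scp user@cdh-gateway:/etc/hive/conf/hive-site.xml $SPARK_HOME/conf/
```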