Thanks Lijun,
I found an answer that fixed my problem. Apparently, Ambari's configuration of
HDFS HA mode set up all of the core Hortonworks-packaged components (including
Hive) correctly for HA operation. Hive would no longer accept the
hdfs://<hostname>:<port>/ syntax in HA mode and was expecting the
hdfs://<ha_cluster_name>/ format.
The issue was that the Ambari 2.6 platform set the current core-site.xml
property, “fs.defaultFS”, to point to my new HA cluster, “hdfs://bdp01”.
However, Kylin apparently still expects the deprecated “fs.default.name”
property to hold this setting. So, I went into the Ambari HDFS advanced
configs, under the “Custom core-site” section, added a custom property
“fs.default.name=hdfs://bdp01”, and rolled out the configs via Ambari. Kylin
was then able to find the updated core-site.xml file in
${KYLIN_HOME}/Hadoop-conf/core-site.xml (a symbolic link to the
Ambari-configured file). Once this was done, Kylin was once again able to
build cubes!
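For anyone hitting the same thing, the relevant core-site.xml entries end up
looking roughly like the sketch below. The nameservice name “bdp01” is specific
to my cluster, so substitute your own HA nameservice:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://bdp01</value>
</property>
<!-- deprecated alias added via "Custom core-site" for clients (like Kylin
     here) that still read the old property name -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://bdp01</value>
</property>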
-Phil
From: Lijun Cao <[email protected]>
Sent: Monday, November 12, 2018 6:46 PM
To: [email protected]
Subject: Re: Kylin 2.3.1 failing cube build on HDFS High-Availability cluster
Hi Phil,
How is your HBase cluster deployed? Is it deployed as a standalone
cluster?
Here is a blog post that mentions the NN HA settings
(http://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/), although
the scenario there is an HBase cluster deployed as a standalone cluster.
See if it helps you.
Best Regards
Lijun Cao
On Nov 13, 2018, at 02:39, Phil Scott <[email protected]> wrote:
Folks,
My real question: Are there any settings in kylin.properties, or in
hdfs-site.xml or hive-site.xml, that can clue Kylin in to the required syntax
for HA HDFS URLs?
Background:
I have been running Kylin 2.3.1 for almost a year (very happily) on a
Hortonworks HDP 2.6 cluster. This weekend, my HDFS NameNode had an issue and
went down. I decided to upgrade the cluster to HDFS High Availability mode.
See:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.2/bk_ambari-operations/content/how_to_configure_namenode_high_availability.html
for details.
My HDFS cluster is now operating in HA mode, but my Kylin cube builds are
failing on step 1. They had been working fine up until this change.
Once in HA mode, HDFS clients are supposed to recognize from the hdfs-site.xml
file that HA mode is enabled and use a different syntax for HDFS URLs. For
example, in the logs for cube-build step 1, Kylin is telling Hive to create an
external table and map its “location” to an HDFS path, using the old
NameNode’s hostname directly (like this…)
(**** CREATE EXTERNAL TABLE code snipped out above ***)
STORED AS SEQUENCEFILE
LOCATION
'hdfs://pschd01.internaldomain.com:8020/kylin/kylin_metadata/kylin-51b56b1a-0f95-4825-ab13-d23a5ccb90ee/kylin_intermediate_ereputationv2_reviews_distinct_v2_prod_cube_453e6583_b7fb_4e62_8ffc_a330bb4e246f';
(*** ALTER TABLE command comes next ***)
In the above, the ‘hdfs://pschd01.internaldomain.com:8020/’ address is
directly addressing the old HDFS NameNode. This throws an error as
follows:
Failed with exception Wrong FS:
hdfs://pschd01.internaldomain.com:8020/kylin/kylin_metadata/kylin-51b56b1a-0f95-4825-ab13-d23a5ccb90ee/kylin_intermediate_ereputationv2_reviews_distinct_v2_prod_cube_453e6583_b7fb_4e62_8ffc_a330bb4e246f/.hive-staging_hive_2018-11-12_01-22-26_487_4731507377031334971-1/-ext-10000,
expected: hdfs://bdp01
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask
So, Hive is complaining that it expects the new HA syntax,
hdfs://<ha_service_name>/, instead of hdfs://<namenode_host>:<namenode_port>/.
It looks like Kylin is generating Hive statements that use the old NameNode
host syntax, but it needs to somehow be configured to use the new HDFS HA
syntax.
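For context, after the Ambari HA wizard runs, the hdfs-site.xml that HDFS
clients read defines the HA nameservice roughly like the sketch below. The
second NameNode host (pschd02) and the nn1/nn2 ids are illustrative
placeholders; only the “bdp01” nameservice and the pschd01 host appear in the
error above:

<property>
  <name>dfs.nameservices</name>
  <value>bdp01</value>
</property>
<property>
  <name>dfs.ha.namenodes.bdp01</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.bdp01.nn1</name>
  <value>pschd01.internaldomain.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.bdp01.nn2</name>
  <value>pschd02.internaldomain.com:8020</value>
</property>
<!-- clients resolve hdfs://bdp01 through this failover proxy provider -->
<property>
  <name>dfs.client.failover.proxy.provider.bdp01</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>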
I appreciate any help!!!
-Phil