Hi all,
I have carefully followed the instructions at
http://kylin.apache.org/docs23/install/kylin_aws_emr.html
My idea is to use S3 as the storage for HBase. I have configured the cluster
following the instructions, but the tables that contain the cube definitions
stay "in transition" when I deploy a new cluster, and the Kylin metadata seems
outdated...
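For reference, this is how I check the table state on the master node of the new cluster (I am assuming the default metadata table name kylin_metadata here; adjust if yours differs):

# list the HBase tables and check whether the Kylin metadata table is enabled
# (kylin_metadata is the default name, assumed, not necessarily yours)
echo "list" | hbase shell
echo "is_enabled 'kylin_metadata'" | hbase shell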
These are the steps I follow to create the cluster.
Cluster creation command:
aws emr create-cluster \
--applications Name=Hadoop Name=Hue Name=Spark Name=Zeppelin Name=Ganglia Name=Hive Name=HBase Name=HCatalog Name=Tez \
--tags 'hive=' 'spark=' 'zeppelin=' \
--ec2-attributes 'file://../config/ec2-attributes.json' \
--release-label emr-5.16.0 \
--log-uri 's3n://sns-da-logs/' \
--instance-groups 'file://../config/instance-hive-datawarehouse.json' \
--configurations 'file://../config/hive-hbase-s3.json' \
--auto-scaling-role EMR_AutoScaling_DefaultRole \
--ebs-root-volume-size 10 \
--service-role EMR_DefaultRole \
--enable-debugging \
--name 'hbase-hive-datawarehouse' \
--scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
--region us-east-1
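In case it matters, instance-hive-datawarehouse.json is along these lines (the instance types and counts below are just placeholders, not my exact values):

[
  {
    "Name": "Master",
    "InstanceGroupType": "MASTER",
    "InstanceType": "m4.xlarge",
    "InstanceCount": 1
  },
  {
    "Name": "Core",
    "InstanceGroupType": "CORE",
    "InstanceType": "m4.xlarge",
    "InstanceCount": 2
  }
]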
My configuration hive-hbase-s3.json:
[
  {
    "Classification": "hive-site",
    "Configurations": [],
    "Properties": {
      "hive.metastore.warehouse.dir": "s3://xxxxxxxx-datawarehouse/hive.db",
      "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
      "javax.jdo.option.ConnectionPassword": "xxxxx",
      "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxxxxx:3306/hive_metastore?createDatabaseIfNotExist=true",
      "javax.jdo.option.ConnectionUserName": "xxxx"
    }
  },
  {
    "Classification": "hbase",
    "Configurations": [],
    "Properties": {
      "hbase.emr.storageMode": "s3"
    }
  },
  {
    "Classification": "hbase-site",
    "Configurations": [],
    "Properties": {
      "hbase.rpc.timeout": "3600000",
      "hbase.rootdir": "s3://xxxxxx-hbase/"
    }
  },
  {
    "Classification": "core-site",
    "Properties": {
      "io.file.buffer.size": "65536"
    }
  },
  {
    "Classification": "mapred-site",
    "Properties": {
      "mapred.map.tasks.speculative.execution": "false",
      "mapred.reduce.tasks.speculative.execution": "false",
      "mapreduce.map.speculative": "false",
      "mapreduce.reduce.speculative": "false"
    }
  }
]
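After the cluster comes up I can confirm on the master node that HBase really resolved the S3 rootdir, with something like:

# print the effective hbase.rootdir as HBase resolved it
hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.rootdir
# and check that HBase is actually writing under that bucket
aws s3 ls s3://xxxxxx-hbase/ --recursive | head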
When I shut down the cluster, I run these commands:
../kylin_home/bin/kylin.sh stop
# Before you shutdown/restart the cluster, you must back up the "/kylin" data on HDFS to S3 with S3DistCp
aws s3 rm s3://xxxxxx-config/metadata/kylin/ --recursive
s3-dist-cp --src=hdfs:///kylin --dest=s3://xxxxxx-config/metadata/kylin
bash /usr/lib/hbase/bin/disable_all_tables.sh
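When the new cluster is up, I copy the backup back before starting Kylin, roughly like this (same bucket and paths as above; maybe the problem is in this step):

# restore the Kylin working dir from S3 back to HDFS on the new cluster
s3-dist-cp --src=s3://xxxxxx-config/metadata/kylin --dest=hdfs:///kylin
# then start Kylin again
../kylin_home/bin/kylin.sh start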
Could you please point out what I am missing?
Thanks in advance.