Hi Moisés,
If I have understood correctly, you have been able to deploy Kylin on EMR successfully, but you lose the metadata when you terminate the cluster. Is that right? Have you tried to restore a Kylin metadata backup after cluster re-creation? Moreover, do you enable all HBase tables after cluster re-creation?

We successfully deployed Kylin on EMR using S3 as storage for HBase and Hive, but our configuration differs on two points:

· We use EMRFS (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html) with consistent view enabled:

  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.consistent.retryPeriodSeconds": "10",
      "fs.s3.consistent": "true",
      "fs.s3.consistent.retryCount": "5",
      "fs.s3.consistent.metadata.tableName": "EmrFSMetadata"
    },
    "Configurations": []
  }

· We deployed Kylin on an EC2 machine separated from the cluster, so the Kylin instance itself is not lost when the cluster is terminated.
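In case it is useful, below is a rough sketch of the restore flow after re-creating the cluster. It is only a sketch: it reuses the backup path from your own commands and assumes the default Kylin table names (kylin_metadata and the KYLIN_* cube tables), so adjust both to your environment.

  # Copy the Kylin metadata backup from S3 back to HDFS on the new
  # cluster (the reverse of the s3-dist-cp backup step)
  s3-dist-cp --src=s3://xxxxxx-config/metadata/kylin --dest=hdfs:///kylin

  # Re-enable the tables that disable_all_tables.sh disabled before
  # shutdown; EMR ships no "enable" counterpart, so use the hbase shell.
  # The grep assumes the default Kylin table names mentioned above.
  for t in $(echo "list" | hbase shell 2>/dev/null | grep -E '^(kylin_metadata|KYLIN_)'); do
    echo "enable '$t'" | hbase shell
  done

  # Start Kylin again so it picks up the restored metadata
  ../kylin_home/bin/kylin.sh start

As far as I understand, with hbase.rootdir on S3 the table data itself survives termination; what is lost is the HDFS working directory and the enabled/disabled state of the tables, which is what the backup and the enable step address.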
I hope this helps you.

Roberto Tardío

From: Moisés Català [mailto:[email protected]]
Sent: Saturday, 28 July 2018 16:17
To: [email protected]
Subject: Kylin with S3, cubes tables get in transition when new cluster booted

Hi all,

I have followed carefully the instructions provided in http://kylin.apache.org/docs23/install/kylin_aws_emr.html

My idea is to use S3 as the storage for HBase. I have configured the cluster following the instructions, but the tables that contain the cube definitions stay "in transition" when a new cluster is deployed, and the Kylin metadata seems outdated...

These are the steps I follow to create the cluster.

Cluster creation command:

aws emr create-cluster \
  --applications Name=Hadoop Name=Hue Name=Spark Name=Zeppelin Name=Ganglia Name=Hive Name=HBase Name=HCatalog Name=Tez \
  --tags 'hive=' 'spark=' 'zeppelin=' \
  --ec2-attributes 'file://../config/ec2-attributes.json' \
  --release-label emr-5.16.0 \
  --log-uri 's3n://sns-da-logs/' \
  --instance-groups 'file://../config/instance-hive-datawarehouse.json' \
  --configurations 'file://../config/hive-hbase-s3.json' \
  --auto-scaling-role EMR_AutoScaling_DefaultRole \
  --ebs-root-volume-size 10 \
  --service-role EMR_DefaultRole \
  --enable-debugging \
  --name 'hbase-hive-datawarehouse' \
  --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
  --region us-east-1

My configuration hive-hbase-s3.json:

[
  {
    "Classification": "hive-site",
    "Configurations": [],
    "Properties": {
      "hive.metastore.warehouse.dir": "s3://xxxxxxxx-datawarehouse/hive.db",
      "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
      "javax.jdo.option.ConnectionPassword": "xxxxx",
      "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxxxxx:3306/hive_metastore?createDatabaseIfNotExist=true",
      "javax.jdo.option.ConnectionUserName": "xxxx"
    }
  },
  {
    "Classification": "hbase",
    "Configurations": [],
    "Properties": {
      "hbase.emr.storageMode": "s3"
    }
  },
  {
    "Classification": "hbase-site",
    "Configurations": [],
    "Properties": {
      "hbase.rpc.timeout": "3600000",
      "hbase.rootdir": "s3://xxxxxx-hbase/"
    }
  },
  {
    "Classification": "core-site",
    "Properties": {
      "io.file.buffer.size": "65536"
    }
  },
  {
    "Classification": "mapred-site",
    "Properties": {
      "mapred.map.tasks.speculative.execution": "false",
      "mapred.reduce.tasks.speculative.execution": "false",
      "mapreduce.map.speculative": "false",
      "mapreduce.reduce.speculative": "false"
    }
  }
]

When I shut down the cluster I run these commands:

# Before shutting down or restarting the cluster, the "/kylin" data on
# HDFS must be backed up to S3 with S3DistCp
../kylin_home/bin/kylin.sh stop
aws s3 rm s3://xxxxxx-config/metadata/kylin --recursive
s3-dist-cp --src=hdfs:///kylin --dest=s3://xxxxxx-config/metadata/kylin
bash /usr/lib/hbase/bin/disable_all_tables.sh

Please, could you be so kind as to tell me what I am missing?

Thanks in advance
