Hi Moisés,

 

If I have understood correctly, you have been able to deploy Kylin on EMR successfully, but you lose the metadata when you terminate the cluster. Is that right?

 

Have you tried restoring the Kylin metadata backup after re-creating the cluster? Moreover, do you re-enable all HBase tables after the cluster is re-created?
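In case it helps, below is a minimal sketch of such a restore, run on the new cluster before starting Kylin. It assumes the backup sits under the S3 path from your mail (a placeholder bucket) and that the tables follow the default "kylin_metadata"/"KYLIN_..." naming; it is just the shape of the procedure, not an official script:

# Copy the Kylin metadata backup from S3 back into HDFS
s3-dist-cp --src=s3://xxxxxx-config/metadata/kylin --dest=hdfs:///kylin

# EMR ships disable_all_tables.sh but no enable counterpart, so re-enable
# the Kylin tables through the hbase shell (the name filter is an assumption)
echo "list" | hbase shell 2>/dev/null | grep -i '^kylin' | while read t; do
  echo "enable '$t'" | hbase shell
done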

 

We successfully deployed Kylin on EMR using S3 as storage for both HBase and Hive, but our configuration differs on two points:

- We use EMRFS (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html) with the consistent view enabled; see the sync sketch after this list:

  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.consistent.retryPeriodSeconds": "10",
      "fs.s3.consistent": "true",
      "fs.s3.consistent.retryCount": "5",
      "fs.s3.consistent.metadata.tableName": "EmrFSMetadata"
    },
    "Configurations": []
  }

- We deployed Kylin on an EC2 instance separate from the cluster.
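One caveat on the EMRFS point: the consistent view keeps its metadata in a DynamoDB table that outlives the cluster, so it can drift from the real bucket contents across cluster re-creations. A minimal sketch of checking and repairing it from the master node, using your placeholder bucket (emrfs is the CLI that EMR installs on the master):

# Show where the DynamoDB metadata and the actual S3 contents disagree
emrfs diff s3://xxxxxx-hbase/

# Rebuild the consistent-view metadata from the actual bucket contents
emrfs sync s3://xxxxxx-hbase/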

 

I hope this helps you.

 

Roberto Tardío

 

From: Moisés Català [mailto:[email protected]] 
Sent: Saturday, July 28, 2018 16:17
To: [email protected]
Subject: Kylin with S3, cube tables get stuck in transition when a new cluster is booted

 

Hi all,

 

I have carefully followed the instructions provided at 
http://kylin.apache.org/docs23/install/kylin_aws_emr.html

 

My idea is to use S3 as the storage for HBase. I have configured the cluster 
following those instructions, but the tables that contain the cube definitions 
stay "in transition" when I deploy a new cluster, and the Kylin metadata seems 
outdated...

 

These are the steps I follow to create the cluster

 

Cluster creation command:

 

aws emr create-cluster \
  --applications Name=Hadoop Name=Hue Name=Spark Name=Zeppelin Name=Ganglia Name=Hive Name=HBase Name=HCatalog Name=Tez \
  --tags 'hive=' 'spark=' 'zeppelin=' \
  --ec2-attributes 'file://../config/ec2-attributes.json' \
  --release-label emr-5.16.0 \
  --log-uri 's3n://sns-da-logs/' \
  --instance-groups 'file://../config/instance-hive-datawarehouse.json' \
  --configurations 'file://../config/hive-hbase-s3.json' \
  --auto-scaling-role EMR_AutoScaling_DefaultRole \
  --ebs-root-volume-size 10 \
  --service-role EMR_DefaultRole \
  --enable-debugging \
  --name 'hbase-hive-datawarehouse' \
  --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
  --region us-east-1

 

 

My configuration hive-hbase-s3.json:

 

[
  {
    "Classification": "hive-site",
    "Configurations": [],
    "Properties": {
      "hive.metastore.warehouse.dir": "s3://xxxxxxxx-datawarehouse/hive.db",
      "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
      "javax.jdo.option.ConnectionPassword": "xxxxx",
      "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxxxxx:3306/hive_metastore?createDatabaseIfNotExist=true",
      "javax.jdo.option.ConnectionUserName": "xxxx"
    }
  },
  {
    "Classification": "hbase",
    "Configurations": [],
    "Properties": {
      "hbase.emr.storageMode": "s3"
    }
  },
  {
    "Classification": "hbase-site",
    "Configurations": [],
    "Properties": {
      "hbase.rpc.timeout": "3600000",
      "hbase.rootdir": "s3://xxxxxx-hbase/"
    }
  },
  {
    "Classification": "core-site",
    "Properties": {
      "io.file.buffer.size": "65536"
    }
  },
  {
    "Classification": "mapred-site",
    "Properties": {
      "mapred.map.tasks.speculative.execution": "false",
      "mapred.reduce.tasks.speculative.execution": "false",
      "mapreduce.map.speculative": "false",
      "mapreduce.reduce.speculative": "false"
    }
  }
]
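For what it's worth, a quick sanity check that HBase really is rooted in S3 once the cluster is up (a sketch, using the same placeholder bucket as above):

# Print the effective hbase.rootdir as HBase resolves it
hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.rootdir

# The bucket should contain HBase data once the cluster is running
aws s3 ls s3://xxxxxx-hbase/ --recursive | head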

 

When I shut down the cluster I perform these commands:

 

../kylin_home/bin/kylin.sh stop

 

 

# Before you shut down or restart the cluster, you must back up the "/kylin"
# data on HDFS to S3 with S3DistCp:

  

# aws s3 rm does not expand wildcards, so remove the old backup recursively
aws s3 rm s3://xxxxxx-config/metadata/kylin --recursive

s3-dist-cp --src=hdfs:///kylin --dest=s3://xxxxxx-config/metadata/kylin

 

bash /usr/lib/hbase/bin/disable_all_tables.sh
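Before actually terminating, a quick check that the backup copy completed (a sketch, comparing the HDFS source with the same placeholder S3 destination):

# Size of the Kylin working directory on HDFS
hdfs dfs -du -s -h /kylin

# Object count and total size of the backup copy in S3
aws s3 ls s3://xxxxxx-config/metadata/kylin --recursive --summarize | tail -2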

 

 

Please, could you be so kind as to tell me what I am missing?

 

 

Thanks in advance

 
