Hi,

Sometimes EMRFS becomes inconsistent with S3: EMRFS uses a DynamoDB table to cache the object entries and their status. If you or your applications update S3 directly (not via EMRFS), then the entries in EMRFS become inconsistent.
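If you want to confirm it is this kind of inconsistency first, I believe the EMRFS CLI also has a diff sub-command (run it on the master node; the path below is only a placeholder for your own HBase root dir):

    # list differences between the EMRFS metadata store and the objects actually in S3
    emrfs diff s3://your-bucket/da-hbase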
You can refer to this post:
https://stackoverflow.com/questions/39823283/emrfs-file-sync-with-s3-not-working

In my experience, I did this once or twice:

    emrfs delete s3://path
    emrfs import s3://path
    emrfs sync s3://path

The key point is that, when using EMRFS, all updates to the bucket should go through EMRFS, not directly through S3.

Hope this can help.

2018-07-30 23:26 GMT+08:00 Moisés Català <[email protected]>:

> Thanks for the tips Roberto,
>
> You're right: when I deploy EMR and install Kylin, everything works like a
> charm; I can even build the sample cube.
>
> I have added the config you suggested about using EMRFS in emrfs-site and
> I have launched a brand new cluster.
> I also deployed Kylin and built the cube. Finally, I shut down Kylin and
> disabled all HBase tables.
>
> Unfortunately, when I launch a new cluster, the HBase master node can't boot;
> looking at the log, this appears:
>
> 2018-07-30 15:00:31,103 ERROR [ip-172-31-85-0:16000.activeMasterManager]
> consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item
> /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 2018-07-30 15:02:49,220 ERROR [ip-172-31-85-0:16000.activeMasterManager]
> consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item
> /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 2018-07-30 15:09:01,324 ERROR [ip-172-31-85-0:16000.activeMasterManager]
> consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item
> /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 2018-07-30 15:09:01,325 FATAL [ip-172-31-85-0:16000.activeMasterManager]
> master.HMaster: Failed to become active master
> com.amazon.ws.emr.hadoop.fs.consistency.exception.ConsistencyException: 1
> items inconsistent (no s3 object for associated metadata item). First
> object: /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
>     at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.listStatus(ConsistencyCheckerS3FileSystem.java:749)
>     at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.listStatus(ConsistencyCheckerS3FileSystem.java:519)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy30.listStatus(Unknown Source)
>     at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.listStatus(S3NativeFileSystem2.java:206)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1532)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1558)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1603)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1597)
>     at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.listStatus(EmrFileSystem.java:347)
>     at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1737)
>     at org.apache.hadoop.hbase.util.FSTableDescriptors.getCurrentTableInfoStatus(FSTableDescriptors.java:377)
>     at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:358)
>     at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:339)
>     at org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.needsMigration(FSTableDescriptorMigrationToSubdir.java:59)
>     at org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.migrateFSTableDescriptorsIfNecessary(FSTableDescriptorMigrationToSubdir.java:45)
>     at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:526)
>     at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:166)
>     at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:141)
>     at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:725)
>     at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:198)
>     at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1907)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-07-30 15:09:01,326 FATAL [ip-172-31-85-0:16000.activeMasterManager]
> master.HMaster: Unhandled exception. Starting shutdown.
>
> I have attached the full log to the email.
>
> What am I missing???
>
> Thanks in advance
>
>
> On 30 Jul 2018, at 9:02, <[email protected]> <[email protected]> wrote:
>
> Hi Moisés,
>
> If I have understood right, you have been able to deploy Kylin on EMR
> successfully. However, you lose metadata when you terminate the cluster. Is
> that right?
>
> Have you tried restoring the Kylin metadata backup after cluster re-creation?
> Moreover, do you enable all HBase tables after cluster re-creation?
>
> We successfully deployed Kylin on EMR using S3 as storage for HBase and
> Hive, but our configuration differs on two points:
>
> - We use EMRFS (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html):
>
>     {
>       "Classification": "emrfs-site",
>       "Properties": {
>         "fs.s3.consistent.retryPeriodSeconds": "10",
>         "fs.s3.consistent": "true",
>         "fs.s3.consistent.retryCount": "5",
>         "fs.s3.consistent.metadata.tableName": "EmrFSMetadata"
>       },
>       "Configurations": []
>     }
>
> - We deployed Kylin on an EC2 machine separate from the cluster.
>
> I hope this helps you.
>
> Roberto Tardío
>
> From: Moisés Català [mailto:[email protected]]
> Sent: Saturday, 28 July 2018 16:17
> To: [email protected]
> Subject: Kylin with S3, cubes tables get in transition when new cluster booted
>
> Hi all,
>
> I've followed carefully the instructions provided in
> http://kylin.apache.org/docs23/install/kylin_aws_emr.html
>
> My idea is to use S3 as the storage for HBase. I have configured the
> cluster following the instructions, but the tables that contain the cube
> definitions stay "in transition" when a new cluster is deployed, and the
> Kylin metadata seems outdated...
>
> These are the steps I follow to create the cluster.
>
> Cluster creation command:
>
> aws emr create-cluster \
>   --applications Name=Hadoop Name=Hue Name=Spark Name=Zeppelin Name=Ganglia Name=Hive Name=Hbase Name=HCatalog Name=Tez \
>   --tags 'hive=' 'spark=' 'zeppelin=' \
>   --ec2-attributes 'file://../config/ec2-attributes.json' \
>   --release-label emr-5.16.0 \
>   --log-uri 's3n://sns-da-logs/' \
>   --instance-groups 'file://../config/instance-hive-datawarehouse.json' \
>   --configurations 'file://../config/hive-hbase-s3.json' \
>   --auto-scaling-role EMR_AutoScaling_DefaultRole \
>   --ebs-root-volume-size 10 \
>   --service-role EMR_DefaultRole \
>   --enable-debugging \
>   --name 'hbase-hive-datawarehouse' \
>   --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
>   --region us-east-1
>
> My configuration hive-hbase-s3.json:
>
> [
>   {
>     "Classification": "hive-site",
>     "Configurations": [],
>     "Properties": {
>       "hive.metastore.warehouse.dir": "s3://xxxxxxxx-datawarehouse/hive.db",
>       "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
>       "javax.jdo.option.ConnectionPassword": "xxxxx",
>       "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxxxxx:3306/hive_metastore?createDatabaseIfNotExist=true",
>       "javax.jdo.option.ConnectionUserName": "xxxx"
>     }
>   },
>   {
>     "Classification": "hbase",
>     "Configurations": [],
>     "Properties": {
>       "hbase.emr.storageMode": "s3"
>     }
>   },
>   {
>     "Classification": "hbase-site",
>     "Configurations": [],
>     "Properties": {
>       "hbase.rpc.timeout": "3600000",
>       "hbase.rootdir": "s3://xxxxxx-hbase/"
>     }
>   },
>   {
>     "Classification": "core-site",
>     "Properties": {
>       "io.file.buffer.size": "65536"
>     }
>   },
>   {
>     "Classification": "mapred-site",
>     "Properties": {
>       "mapred.map.tasks.speculative.execution": "false",
>       "mapred.reduce.tasks.speculative.execution": "false",
>       "mapreduce.map.speculative": "false",
>       "mapreduce.reduce.speculative": "false"
>     }
>   }
> ]
>
> When I shut down the cluster, I perform these commands:
>
> ../kylin_home/bin/kylin.sh stop
>
> # Before you shut down/restart the cluster, you must back up the "/kylin"
> # data on HDFS to S3 with S3DistCp
> aws s3 rm s3://xxxxxx-config/metadata/kylin/*
> s3-dist-cp --src=hdfs:///kylin --dest=s3://xxxxxx-config/metadata/kylin
>
> bash /usr/lib/hbase/bin/disable_all_tables.sh
>
> Please, could you be so kind as to tell me what I am missing?
>
> Thanks in advance

--
Best regards,

Shaofeng Shi 史少锋
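P.S. For the restore that Roberto asked about, a rough sketch of the reverse direction on the newly created cluster (the bucket and table names below are placeholders, not taken from your mail) is to copy the backup from S3 back to HDFS, re-enable the HBase tables, and then start Kylin:

    # copy the backed-up Kylin working dir from S3 back to HDFS (placeholder bucket)
    s3-dist-cp --src=s3://xxxxxx-config/metadata/kylin --dest=hdfs:///kylin

    # re-enable the Kylin HBase tables; 'kylin_metadata' is the default metadata
    # table name, your cube tables may be named differently (check with 'list')
    echo "enable 'kylin_metadata'" | hbase shell

    # then start Kylin again
    ../kylin_home/bin/kylin.sh start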
