Thanks, I'll check this out.
On Mon, Aug 6, 2018, 3:10, ShaoFeng Shi <[email protected]> wrote:

> Hi Moses,
>
> Your steps look good. Re-syncing the EMRFS cache is optional; you can decide
> whether to add that.
>
> One more thing to take care of: the HBase tables Kylin created have a
> coprocessor registered. The coprocessor jar location is on HDFS (because you
> are using "hdfs:///kylin" as Kylin's working dir). When the EMR cluster is
> restarted, the NameNode will change, so the old coprocessor location will be
> out of date, which causes an HBase region server start error. In this case,
> you need to re-deploy the coprocessor:
>
> https://kylin.apache.org/docs/howto/howto_update_coprocessor.html
>
> Please give it a try. If you encounter the problem again, please check the
> HBase master and region server logs as well. Besides, AWS support is worth
> trying.
>
> In the cloud, S3 has higher reliability and durability than HDFS. I suggest
> using HDFS for intermediate storage (like cube computing) and S3 as the final
> storage (HBase), and using a cron job to incrementally back up the HDFS data
> to S3.
>
>
> 2018-08-02 16:35 GMT+08:00 Moisés Català <[email protected]>:
>
>> Thanks ShaoFeng Shi,
>>
>> As recommended in the installation tutorial, I use HDFS for intermediate
>> data storage, so before shutting down the cluster I back up hdfs://user/kylin
>> to S3 with dist-cp.
>>
>> I have 2 buckets, and I don't make any modifications to the S3 HBase root
>> directly. These are my buckets:
>>
>> - Configuration bucket: s3://xxxx-config/metadata/kylin, where I store the
>>   contents of hdfs:///user/Kylin
>> - HBase rootdir: s3://xxxx-hbase/storage
>>
>> When I shut down the cluster I execute these commands in a shutdown script:
>>
>> #!/bin/bash
>> # Stop Kylin
>> $KYLIN_HOME/bin/kylin.sh stop
>>
>> # To shut down an Amazon EMR cluster without losing data that hasn't been
>> # written to Amazon S3, the MemStore cache needs to flush to Amazon S3 to
>> # write new store files. To do this, you can run a shell script provided on
>> # the EMR cluster.
>> bash /usr/lib/hbase/bin/disable_all_tables.sh
>>
>> # Before you shut down/restart the cluster, you must back up the "/kylin"
>> # data on HDFS to S3 with S3DistCp, or you may lose data and be unable to
>> # recover the cluster later.
>> s3-dist-cp --src=hdfs:///kylin --dest=s3://da-config/metadata/kylin
>>
>> s3-dist-cp creates a Hadoop job, so it will be tracked by the EMRFS
>> consistent view.
>>
>> So should I add these commands to my shutdown script?
>>
>> emrfs delete s3://xxxx-config/metadata/kylin
>> emrfs import s3://xxxx-config/metadata/kylin
>> emrfs sync s3://xxxx-config/metadata/kylin
>>
>> emrfs delete s3://xxxx-hbase/storage
>> emrfs import s3://xxxx-hbase/storage
>> emrfs sync s3://xxxx-hbase/storage
>>
>> Should I do something with the HBase root directory in S3?
>>
>> When I start a brand-new cluster, apart from doing:
>>
>> hadoop fs -mkdir /kylin
>> s3-dist-cp --src=s3://xxxx-config/metadata/kylin --dest=hdfs:///kylin
>>
>> do I have to take any other action?
>>
>> Thank you very much for your help.
>>
>> A final question: is it worth using S3 as HBase storage for a production
>> environment, or would it be safer to use just HDFS? My plan is to use
>> Hive + Kylin as an EDW.
>>
>> Moisés Català
>> Senior Data Engineer
>> La Cupula Music - Sonosuite
>> T: +34 93 250 38 05
>> www.lacupulamusic.com
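Relating ShaoFeng's coprocessor note at the top to the restore steps just above: a minimal sketch of the re-deploy command, following the CLI shown on the linked how-to page. The arguments (the coprocessor jar path and "all") vary across Kylin versions, so verify them against that page and your installation before running this.

  # After the new cluster (and its new NameNode) is up and Kylin's metadata is
  # restored, re-register the coprocessor on the Kylin HTables.
  $KYLIN_HOME/bin/kylin.sh org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI \
      $KYLIN_HOME/lib/kylin-coprocessor-*.jar all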
>> On Aug 1, 2018, at 3:10, ShaoFeng Shi <[email protected]> wrote:
>>
>> Hi,
>>
>> Sometimes the EMRFS becomes inconsistent with S3; EMRFS uses a DynamoDB
>> table to cache the object entries and their status. If you or your
>> applications update S3 directly (not via EMRFS), then the entries in EMRFS
>> become inconsistent.
>>
>> You can refer to this post:
>> https://stackoverflow.com/questions/39823283/emrfs-file-sync-with-s3-not-working
>>
>> In my experience, I did this once or twice:
>>
>> emrfs delete s3://path
>> emrfs import s3://path
>> emrfs sync s3://path
>>
>> The key point is: when using EMRFS, all updates to the bucket should go
>> through EMRFS, not directly to S3. Hope this can help.
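A small convenience around the sequence above: the three commands could be wrapped in a helper that first reports what is out of sync. This is only a sketch; it assumes the standard emrfs CLI available on the EMR master node, and the path argument is a placeholder.

  #!/bin/bash
  # Usage: ./emrfs_resync.sh s3://bucket/path
  # Rebuild the EMRFS metadata entries for a path from what is actually in S3.
  set -e
  path="$1"
  emrfs diff "$path"      # report inconsistencies between metadata and S3
  emrfs delete "$path"    # drop the stale metadata entries for this path
  emrfs import "$path"    # re-create metadata entries from the S3 listing
  emrfs sync "$path"      # make sure metadata and S3 agree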
>> 2018-07-30 23:26 GMT+08:00 Moisés Català <[email protected]>:
>>
>>> Thanks for the tips, Roberto.
>>>
>>> You're right: when I deploy EMR and install Kylin, everything works like a
>>> charm; I can even build the sample cube.
>>>
>>> I have added the config you suggested about using EMRFS in emrfs-site and
>>> I have launched a brand-new cluster. I also deployed Kylin and built the
>>> cube. Finally, I shut down Kylin and disabled all HBase tables.
>>>
>>> Unfortunately, when I launch a new cluster, the HBase master node can't
>>> boot. Looking at the log, this appears:
>>>
>>> 2018-07-30 15:00:31,103 ERROR [ip-172-31-85-0:16000.activeMasterManager] consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
>>> 2018-07-30 15:02:49,220 ERROR [ip-172-31-85-0:16000.activeMasterManager] consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
>>> 2018-07-30 15:09:01,324 ERROR [ip-172-31-85-0:16000.activeMasterManager] consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
>>> 2018-07-30 15:09:01,325 FATAL [ip-172-31-85-0:16000.activeMasterManager] master.HMaster: Failed to become active master
>>> com.amazon.ws.emr.hadoop.fs.consistency.exception.ConsistencyException: 1 items inconsistent (no s3 object for associated metadata item). First object: /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
>>>     at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.listStatus(ConsistencyCheckerS3FileSystem.java:749)
>>>     at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.listStatus(ConsistencyCheckerS3FileSystem.java:519)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>     at com.sun.proxy.$Proxy30.listStatus(Unknown Source)
>>>     at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.listStatus(S3NativeFileSystem2.java:206)
>>>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1532)
>>>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1558)
>>>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1603)
>>>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1597)
>>>     at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.listStatus(EmrFileSystem.java:347)
>>>     at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1737)
>>>     at org.apache.hadoop.hbase.util.FSTableDescriptors.getCurrentTableInfoStatus(FSTableDescriptors.java:377)
>>>     at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:358)
>>>     at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:339)
>>>     at org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.needsMigration(FSTableDescriptorMigrationToSubdir.java:59)
>>>     at org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.migrateFSTableDescriptorsIfNecessary(FSTableDescriptorMigrationToSubdir.java:45)
>>>     at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:526)
>>>     at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:166)
>>>     at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:141)
>>>     at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:725)
>>>     at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:198)
>>>     at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1907)
>>>     at java.lang.Thread.run(Thread.java:748)
>>> 2018-07-30 15:09:01,326 FATAL [ip-172-31-85-0:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
>>>
>>> I have attached the full log to the email.
>>>
>>> What am I missing???
>>>
>>> Thanks in advance
>>>
>>>
>>> On Jul 30, 2018, at 9:02, <[email protected]> wrote:
>>>
>>> Hi Moisés,
>>>
>>> If I have understood correctly, you have been able to deploy Kylin on EMR
>>> successfully; however, you lose metadata when you terminate the cluster.
>>> Is that right?
>>>
>>> Have you tried restoring the Kylin metadata backup after cluster
>>> re-creation? Moreover, do you enable all HBase tables after cluster
>>> re-creation?
>>>
>>> We successfully deployed Kylin on EMR using S3 as storage for HBase and
>>> Hive, but our configuration differs in 2 points:
>>>
>>> · We use EMRFS
>>>   (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html)
>>>   with this classification:
>>>
>>>   {
>>>     "Classification": "emrfs-site",
>>>     "Properties": {
>>>       "fs.s3.consistent.retryPeriodSeconds": "10",
>>>       "fs.s3.consistent": "true",
>>>       "fs.s3.consistent.retryCount": "5",
>>>       "fs.s3.consistent.metadata.tableName": "EmrFSMetadata"
>>>     },
>>>     "Configurations": []
>>>   }
>>>
>>> · We deployed Kylin on an EC2 machine separate from the cluster.
>>>
>>> I hope this helps you.
>>>
>>> Roberto Tardío
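A quick way to confirm that the consistent view described above is actually active on a running cluster; this is a sketch that assumes the emrfs CLI on the master node and the EmrFSMetadata table name from the configuration above.

  # On the EMR master node: show the DynamoDB metadata store backing EMRFS.
  emrfs describe-metadata

  # Or inspect the table directly from anywhere with AWS credentials.
  aws dynamodb describe-table --table-name EmrFSMetadata \
      --query 'Table.{Status:TableStatus,Items:ItemCount}'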
>>>
>>> From: Moisés Català <[email protected]>
>>> Sent: Saturday, July 28, 2018 16:17
>>> To: [email protected]
>>> Subject: Kylin with S3, cubes tables get in transition when new cluster booted
>>>
>>> Hi all,
>>>
>>> I've carefully followed the instructions provided in
>>> http://kylin.apache.org/docs23/install/kylin_aws_emr.html
>>>
>>> My idea is to use S3 as the storage for HBase. I have configured the
>>> cluster following the instructions, but the tables that contain the cube
>>> definitions stay "in transition" when I deploy a new cluster, and the
>>> Kylin metadata seems outdated...
>>>
>>> These are the steps I follow to create the cluster.
>>>
>>> Cluster creation command:
>>>
>>> aws emr create-cluster \
>>>   --applications Name=Hadoop Name=Hue Name=Spark Name=Zeppelin Name=Ganglia Name=Hive Name=Hbase Name=HCatalog Name=Tez \
>>>   --tags 'hive=' 'spark=' 'zeppelin=' \
>>>   --ec2-attributes 'file://../config/ec2-attributes.json' \
>>>   --release-label emr-5.16.0 \
>>>   --log-uri 's3n://sns-da-logs/' \
>>>   --instance-groups 'file://../config/instance-hive-datawarehouse.json' \
>>>   --configurations 'file://../config/hive-hbase-s3.json' \
>>>   --auto-scaling-role EMR_AutoScaling_DefaultRole \
>>>   --ebs-root-volume-size 10 \
>>>   --service-role EMR_DefaultRole \
>>>   --enable-debugging \
>>>   --name 'hbase-hive-datawarehouse' \
>>>   --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
>>>   --region us-east-1
>>>
>>> My configuration hive-hbase-s3.json:
>>>
>>> [
>>>   {
>>>     "Classification": "hive-site",
>>>     "Configurations": [],
>>>     "Properties": {
>>>       "hive.metastore.warehouse.dir": "s3://xxxxxxxx-datawarehouse/hive.db",
>>>       "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
>>>       "javax.jdo.option.ConnectionPassword": "xxxxx",
>>>       "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxxxxx:3306/hive_metastore?createDatabaseIfNotExist=true",
>>>       "javax.jdo.option.ConnectionUserName": "xxxx"
>>>     }
>>>   },
>>>   {
>>>     "Classification": "hbase",
>>>     "Configurations": [],
>>>     "Properties": {
>>>       "hbase.emr.storageMode": "s3"
>>>     }
>>>   },
>>>   {
>>>     "Classification": "hbase-site",
>>>     "Configurations": [],
>>>     "Properties": {
>>>       "hbase.rpc.timeout": "3600000",
>>>       "hbase.rootdir": "s3://xxxxxx-hbase/"
>>>     }
>>>   },
>>>   {
>>>     "Classification": "core-site",
>>>     "Properties": {
>>>       "io.file.buffer.size": "65536"
>>>     }
>>>   },
>>>   {
>>>     "Classification": "mapred-site",
>>>     "Properties": {
>>>       "mapred.map.tasks.speculative.execution": "false",
>>>       "mapred.reduce.tasks.speculative.execution": "false",
>>>       "mapreduce.map.speculative": "false",
>>>       "mapreduce.reduce.speculative": "false"
>>>     }
>>>   }
>>> ]
>>>
>>> When I shut down the cluster I perform these commands:
>>>
>>> ../kylin_home/bin/kylin.sh stop
>>>
>>> # Before you shut down/restart the cluster, you must back up the "/kylin"
>>> # data on HDFS to S3 with S3DistCp
>>> aws s3 rm s3://xxxxxx-config/metadata/kylin/*
>>> s3-dist-cp --src=hdfs:///kylin --dest=s3://xxxxxx-config/metadata/kylin
>>>
>>> bash /usr/lib/hbase/bin/disable_all_tables.sh
>>>
>>> Please, could you be so kind as to tell me what I am missing?
>>>
>>> Thanks in advance
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
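Closing the loop on ShaoFeng's suggestion at the top of the thread (a cron job that incrementally backs up the HDFS data to S3): a rough sketch of what that could look like. The schedule, paths, and bucket name are illustrative, and hadoop distcp -update is used here as one way to copy only new or changed files.

  # Added to the hadoop user's crontab on the EMR master node (crontab -e).
  # Every hour, copy new or changed files under hdfs:///kylin to the backup bucket.
  0 * * * * hadoop distcp -update hdfs:///kylin s3://xxxx-config/metadata/kylin >> /tmp/kylin-hdfs-backup.log 2>&1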
