RE: [EXTERNAL] Re: Accumulo on S3

Arvind Shyamsundar Fri, 03 Apr 2020 09:55:24 -0700

Hi Josh, I do have a recording of your talk from Nov 12, 2019. Let me work 
separately with Marc Parisi and you on an appropriate way to share it 
broadly, and then we can update this thread.


Thanks.

Arvind Shyamsundar

-----Original Message-----
From: Josh Elser <[email protected]> 
Sent: Friday, April 3, 2020 9:10 AM
To: [email protected]
Subject: [EXTERNAL] Re: Accumulo on S3

It sounds like you're running into the known S3 consistency issues. 
However, I don't know whether EMRFS is supposed to support all of the 
things that Accumulo requires. I would assume that EMRFS should be bridging the 
gap from S3 (a blobstore) to the consistent, distributed FileSystem that Accumulo 
expects. Their summary[1] indicates that consistent listings and 
read-after-write consistency are solved, which addresses a big problem. I'm not 
sure whether you also get atomic rename from it.
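
To make the rename concern concrete: Accumulo writes a new file under a temporary name (for example `A00000ci.rf_tmp` in Kevin's first trace below), renames it into place, and treats a `false` result from the rename as an error. S3 has no native rename, so S3 connectors implement it as a copy followed by a delete. The Java sketch below is a toy, in-memory model under stated assumptions: `LaggyBlobStore` and `TmpFileCommit` are hypothetical names, visibility lag is modeled by an explicit `settle()` call, and none of it is S3, EMRFS, Hadoop, or Accumulo code. It only illustrates how delayed visibility by itself can produce both kinds of error Kevin reported.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy in-memory blobstore (hypothetical; not S3, EMRFS, or Hadoop code).
 * When consistent == false, new writes stay invisible to readers until settle()
 * is called, standing in for eventual-consistency lag. Deletes are immediate here.
 */
final class LaggyBlobStore {
  private final boolean consistent;
  private final Map<String, byte[]> visible = new HashMap<>();
  private final Map<String, byte[]> pending = new HashMap<>();

  LaggyBlobStore(boolean consistent) {
    this.consistent = consistent;
  }

  /** Stores an object; on a laggy store it is accepted but not yet readable. */
  void put(String key, byte[] data) {
    (consistent ? visible : pending).put(key, data);
  }

  /** Makes every pending write visible, i.e. the lag has elapsed. */
  void settle() {
    visible.putAll(pending);
    pending.clear();
  }

  /** Reads an object, failing the way a missing file does. */
  byte[] open(String key) throws FileNotFoundException {
    byte[] data = visible.get(key);
    if (data == null) {
      throw new FileNotFoundException("No such file or directory '" + key + "'");
    }
    return data;
  }

  /** No native rename: copy to the new key, then delete the old one (two separate steps). */
  boolean rename(String src, String dst) {
    byte[] data = visible.get(src);
    if (data == null) {
      // Source not visible (yet). Hadoop's FileSystem.rename also reports some failures as false.
      return false;
    }
    put(dst, data);      // the copy is itself a new write, so it can lag too
    visible.remove(src); // delete the source
    return true;
  }
}

/** Write-to-temp-then-rename commit that turns a false rename into an IOException. */
final class TmpFileCommit {
  private TmpFileCommit() {}

  static void commit(LaggyBlobStore fs, String finalName, byte[] data) throws IOException {
    String tmp = finalName + "_tmp";
    fs.put(tmp, data);
    if (!fs.rename(tmp, finalName)) {
      throw new IOException("Rename " + tmp + " to " + finalName + " returned false");
    }
  }
}
```

With `new LaggyBlobStore(true)` every write is visible immediately, roughly as on HDFS, and the same commit succeeds. With `false`, the commit fails with a "returned false" message shaped like the one in the first trace, and a file renamed after the lag elapses can still be missing when opened, like `F00000nz.rf` in the second trace.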

This presentation[2], which I put together earlier this year on cloud storage 
for BigTables, should be a good primer and may help you understand what's going 
on. I gave it at a meetup here in MD a couple of months back, but I don't think 
we were recording it.

[1] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html
[2] https://drive.google.com/file/d/1Or1s-X0JjiLM87HKIOWlh3WlkdUQfYH9/view?usp=sharing
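
Kevin's second trace below (a `FileNotFoundException` when opening `F00000nz.rf` for a major compaction) looks like the read-after-write side of the same problem. As AWS describes it in the EMRFS documentation[1], consistent view tracks the objects EMRFS writes in a DynamoDB table and retries an operation when S3 does not yet agree with that metadata. The Java sketch below shows only the bounded-retry half of that idea, under stated assumptions: `BoundedRetry.openWithRetries` is a hypothetical helper, not an EMRFS or Hadoop API, and the attempt count and sleep are placeholders rather than EMRFS defaults.

```java
import java.io.FileNotFoundException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retry an open that may fail only because a new object is not yet visible. */
final class BoundedRetry {
  private BoundedRetry() {}

  static <T> T openWithRetries(Callable<T> open, int maxAttempts, long sleepMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    FileNotFoundException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return open.call();
      } catch (FileNotFoundException e) {
        // Possibly just visibility lag; any other exception propagates immediately.
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMillis);
        }
      }
    }
    // Still missing after every attempt: treat the file as genuinely absent.
    throw last;
  }
}
```

Retrying only on `FileNotFoundException` keeps other failures fast; the trade-off is that a file that really is gone now takes `maxAttempts` tries to report, and retries alone cannot tell lag apart from loss the way a metadata store can.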

On 4/2/20 3:56 PM, Kevin Hobbs wrote:
> Accumulo Users,
> 
> Is AWS EMR's "EMRFS consistent view" useful or required for Accumulo2 
> on S3? Has anyone else tried EMR + Accumulo2 on S3?
> 
> I have incorporated *most* of the steps in the blog post
> 
> https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html
> 
> into an AWS EMR bootstrap action that creates an Accumulo cluster 
> running on emr-6.0.0-beta2. I have not used the hadoop-aws-relocated 
> jar because the EMR jars are available.
> 
> I am able to use a GeoMesa snapshot to ingest and retrieve data on the
> S3 volume. However, I just tried an ingest of about 10 GB, which 
> progressed smoothly for a while until the master's web UI reported 
> "MajC Failed, extent = a<;":
> 
> java.io.IOException: Rename s3://THEBUCKET/accumulo/tables/a/default_tablet/A00000ci.rf_tmp to s3://THEBUCKET/accumulo/tables/a/default_tablet/A00000ci.rf returned false
>      at org.apache.accumulo.tserver.tablet.DatafileManager.rename(DatafileManager.java:85)
>      at org.apache.accumulo.tserver.tablet.DatafileManager.bringMajorCompactionOnline(DatafileManager.java:533)
>      at org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:2051)
>      at org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2164)
>      at org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:37)
>      at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
>      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>      at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>      at java.lang.Thread.run(Thread.java:748)
> 
> 
> A bit later it reported:
> 
> java.io.FileNotFoundException: No such file or directory 's3://THEBUCKET/accumulo/tables/c/t-0000090/F00000nz.rf'
>      at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:808)
>      at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:1212)
>      at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:902)
>      at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:207)
>      at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$CachableBuilder.lambda$fsPath$0(CachableBlockFile.java:91)
>      at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:172)
>      at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:400)
>      at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1156)
>      at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1251)
>      at org.apache.accumulo.core.file.rfile.RFileOperations.getReader(RFileOperations.java:53)
>      at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:68)
>      at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:83)
>      at org.apache.accumulo.core.file.FileOperations$ReaderBuilder.build(FileOperations.java:478)
>      at org.apache.accumulo.tserver.tablet.Compactor.openMapDataFiles(Compactor.java:299)
>      at org.apache.accumulo.tserver.tablet.Compactor.compactLocalityGroup(Compactor.java:344)
>      at org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:225)
>      at org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:2039)
>      at org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2164)
>      at org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:37)
>      at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
>      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>      at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>      at java.lang.Thread.run(Thread.java:748)
> 
> 
> These look like the same sort of problems HBase on EMR can have when 
> EMRFS isn't functioning properly.
> 
> --Kevin
> 
> On 3/3/20 1:57 PM, Jim Hughes wrote:
>> Hi all,
>>
>> The next major release of GeoMesa is aimed at supporting Accumulo 2.x. 
>> As part of testing, my coworker Kevin and I are trying out Accumulo
>> 2.0 on S3.
>>
>> Keith's blog post[1] is great. Among those who have tested Accumulo 2.0 
>> in AWS, has anyone tried using EMR for the underlying HDFS cluster (and 
>> then installing Accumulo via bootstrap actions)? Is there a 
>> preferred/suggested deployment strategy?
>>
>> Cheers,
>>
>> Jim
>>
>> 1. https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html
>>
