Unfortunately, that feature wasn't added until 2.0, which hasn't yet been released, but I'm hoping it will be later this year.
However, I'm not convinced this is a write pattern issue. I commented on
https://github.com/GoogleCloudPlatform/bigdata-interop/issues/103#issuecomment-399608543

On Fri, Jun 22, 2018 at 1:50 PM Stephen Meyles <smey...@gmail.com> wrote:

> Knowing that HBase has been run successfully on ADLS, I went looking there
> (as they have the same WAL write pattern). This is informative:
>
> https://www.cloudera.com/documentation/enterprise/5-12-x/topics/admin_using_adls_storage_with_hbase.html
>
> which suggests a need to split the WALs off onto HDFS proper versus ADLS (or
> presumably GCS), barring changes in the underlying semantics of each. AFAICT
> you can't currently configure Accumulo to send WAL logs to a separate
> cluster - is this correct?
>
> S.
>
> On Fri, Jun 22, 2018 at 9:07 AM, Stephen Meyles <smey...@gmail.com> wrote:
>
>> > Did you try to adjust any Accumulo properties to do bigger writes less
>> > frequently or something like that?
>>
>> We're using BatchWriters and sending reasonably large batches of
>> Mutations. Given the stack traces in both our cases are related to WAL
>> writes, it seems like batch size would be the only tweak available here
>> (though, without reading the code carefully, it's not even clear to me that
>> it is impactful), but if others have suggestions I'd be happy to try them.
>>
>> Given we have this working well and stable in other clusters atop
>> traditional HDFS, I'm currently pursuing this further with MS to
>> understand the variance to ADLS. Depending on what emerges from that I may
>> circle back with more details and a bug report and start digging in more
>> deeply to the relevant code in Accumulo.
>>
>> S.
>>
>> On Fri, Jun 22, 2018 at 6:09 AM, Maxim Kolchin <kolchin...@gmail.com>
>> wrote:
>>
>>> > If somebody is interested in using Accumulo on GCS, I'd like to
>>> > encourage them to submit any bugs they encounter, and any patches (if they
>>> > are able) which resolve those bugs.
>>>
>>> I'd like to contribute a fix, but I don't know where to start. We tried
>>> to get help from Google Support about [1] over email, but they just
>>> say that GCS doesn't support such a write pattern. In the end, we can
>>> only guess how to adjust Accumulo's behaviour to minimise broken
>>> connections to GCS.
>>>
>>> BTW, although we observe this exception, the tablet server doesn't fail,
>>> which means that after some retries it is able to write WALs to GCS.
>>>
>>> @Stephen,
>>>
>>> > as discussions with MS engineers have suggested, similar to the GCS
>>> > thread, that small writes at high volume are, at best, suboptimal for ADLS.
>>>
>>> Did you try to adjust any Accumulo properties to do bigger writes less
>>> frequently or something like that?
>>>
>>> [1]: https://github.com/GoogleCloudPlatform/bigdata-interop/issues/103
>>>
>>> Maxim
>>>
>>> On Thu, Jun 21, 2018 at 7:17 AM Stephen Meyles <smey...@gmail.com>
>>> wrote:
>>>
>>>> I think we're seeing something similar but in our case we're trying to
>>>> run Accumulo atop ADLS.
>>>> When we generate sufficient write load we start to
>>>> see stack traces like the following:
>>>>
>>>> [log.DfsLogger] ERROR: Failed to write log entries
>>>> java.io.IOException: attempting to write to a closed stream;
>>>>     at com.microsoft.azure.datalake.store.ADLFileOutputStream.write(ADLFileOutputStream.java:88)
>>>>     at com.microsoft.azure.datalake.store.ADLFileOutputStream.write(ADLFileOutputStream.java:77)
>>>>     at org.apache.hadoop.fs.adl.AdlFsOutputStream.write(AdlFsOutputStream.java:57)
>>>>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:48)
>>>>     at java.io.DataOutputStream.write(DataOutputStream.java:88)
>>>>     at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
>>>>     at org.apache.accumulo.tserver.logger.LogFileKey.write(LogFileKey.java:87)
>>>>     at org.apache.accumulo.tserver.log.DfsLogger.write(DfsLogger.java:537)
>>>>
>>>> We have developed a rudimentary LogCloser implementation that allows us
>>>> to recover from this, but overall performance is significantly impacted.
>>>>
>>>> > As for the WAL closing issue on GCS, I recall a previous thread
>>>> > about that
>>>>
>>>> I searched more for this but wasn't able to find anything, nor anything
>>>> similar re: ADL. I am also curious about the earlier question:
>>>>
>>>> >> Does Accumulo have a specific write pattern [to WALs], so that file
>>>> >> system may not support it?
>>>>
>>>> as discussions with MS engineers have suggested, similar to the GCS
>>>> thread, that small writes at high volume are, at best, suboptimal for ADLS.
>>>>
>>>> Regards
>>>>
>>>> Stephen
>>>>
>>>> On Wed, Jun 20, 2018 at 11:20 AM, Christopher <ctubb...@apache.org>
>>>> wrote:
>>>>
>>>>> For what it's worth, this is an Apache project, not a Sqrrl project.
>>>>> Amazon is free to contribute to Accumulo to improve its support of their
>>>>> platform, just as anybody is free to do.
>>>>> Amazon may start contributing more
>>>>> as a result of their acquisition... or they may not. There is no reason to
>>>>> expect that their acquisition will have any impact whatsoever on the
>>>>> platforms Accumulo supports, because Accumulo is not, and has not ever
>>>>> been, a Sqrrl project (although some Sqrrl employees have contributed), and
>>>>> thus will not become an Amazon project. It has been, and will remain, a
>>>>> vendor-neutral Apache project. Regardless, we welcome contributions from
>>>>> anybody which would improve Accumulo's support of any additional platform
>>>>> alternatives to HDFS, whether it be GCS, S3, or something else.
>>>>>
>>>>> As for the WAL closing issue on GCS, I recall a previous thread about
>>>>> that... I think a simple patch might be possible to solve that issue, but
>>>>> to date, nobody has contributed a fix. If somebody is interested in using
>>>>> Accumulo on GCS, I'd like to encourage them to submit any bugs they
>>>>> encounter, and any patches (if they are able) which resolve those bugs. If
>>>>> they need help submitting a fix, please ask on the dev@ list.
>>>>>
>>>>> On Wed, Jun 20, 2018 at 8:21 AM Geoffry Roberts <threadedb...@gmail.com> wrote:
>>>>>
>>>>>> Maxim,
>>>>>>
>>>>>> Interesting that you were able to run Accumulo on GCS. I never thought of
>>>>>> that--good to know.
>>>>>>
>>>>>> Since I am now an AWS guy (at least for the time being), in light of
>>>>>> the fact that Amazon purchased Sqrrl, I am interested to see what
>>>>>> develops.
>>>>>>
>>>>>> On Wed, Jun 20, 2018 at 5:15 AM, Maxim Kolchin <kolchin...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Geoffry,
>>>>>>>
>>>>>>> Thank you for the feedback!
>>>>>>>
>>>>>>> Thanks to [1, 2], I was able to run an Accumulo cluster on Google VMs
>>>>>>> with GCS instead of HDFS. And I used Google Dataproc to run Hadoop jobs
>>>>>>> on Accumulo.
>>>>>>> Almost everything was good until I ran into some
>>>>>>> connection issues with GCS. Quite often, the connection to GCS breaks on
>>>>>>> writing or closing WALs.
>>>>>>>
>>>>>>> To all,
>>>>>>>
>>>>>>> Does Accumulo have a specific write pattern, so that file system may
>>>>>>> not support it? Are there Accumulo properties which I can play with to
>>>>>>> adjust the write pattern?
>>>>>>>
>>>>>>> [1]: https://github.com/cybermaggedon/accumulo-gs
>>>>>>> [2]: https://github.com/cybermaggedon/accumulo-docker
>>>>>>>
>>>>>>> Thank you!
>>>>>>> Maxim
>>>>>>>
>>>>>>> On Tue, Jun 19, 2018 at 10:31 PM Geoffry Roberts <threadedb...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I tried running Accumulo on Google. I first tried running it on
>>>>>>>> Google's pre-made Hadoop. I found the various file paths one must
>>>>>>>> contend with are different on Google than on a straight download from
>>>>>>>> Apache. It seems they moved things around. To counter this, I installed
>>>>>>>> my own Hadoop along with Zookeeper and Accumulo on a Google node. All
>>>>>>>> went well until one fine day when I could no longer log in. It seems
>>>>>>>> Google had pushed out some changes overnight that broke my client-side
>>>>>>>> Google Cloud installation. Google referred the affected to a lengthy,
>>>>>>>> easy-to-make-a-mistake procedure for resolving the issue.
>>>>>>>>
>>>>>>>> I decided life was too short for this kind of thing and switched to
>>>>>>>> Amazon.
>>>>>>>>
>>>>>>>> On Tue, Jun 19, 2018 at 7:34 AM, Maxim Kolchin <kolchin...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Does anyone have experience running Accumulo on top of Google
>>>>>>>>> Cloud Storage instead of HDFS? In [1] you can see some details if
>>>>>>>>> you've never heard of this feature.
>>>>>>>>>
>>>>>>>>> I see some discussion (see [2], [3]) around this topic, but it
>>>>>>>>> looks to me that this isn't as popular as, I believe, it should be.
>>>>>>>>>
>>>>>>>>> [1]: https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage
>>>>>>>>> [2]: https://github.com/apache/accumulo/issues/428
>>>>>>>>> [3]: https://github.com/GoogleCloudPlatform/bigdata-interop/issues/103
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Maxim
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> There are ways and there are ways,
>>>>>>>>
>>>>>>>> Geoffry Roberts
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> There are ways and there are ways,
>>>>>>
>>>>>> Geoffry Roberts