Just FYI: A separate discussion was started in the GCS connector issue tracker to come up with a way to support Accumulo. See https://github.com/GoogleCloudPlatform/bigdata-interop/issues/104
It'd be great to draw more attention to the issue, so if you're interested, please add a thumbs-up :)

Maxim

On Fri, Jun 22, 2018 at 4:09 PM Maxim Kolchin <kolchin...@gmail.com> wrote:

> > If somebody is interested in using Accumulo on GCS, I'd like to encourage them to submit any bugs they encounter, and any patches (if they are able) which resolve those bugs.
>
> I'd like to contribute a fix, but I don't know where to start. We tried to get help from Google Support about [1] over email, but they just say that GCS doesn't support such a write pattern. In the end, we can only guess how to adjust Accumulo's behaviour to minimise broken connections to GCS.
>
> BTW, although we observe this exception, the tablet server doesn't fail, so after some retries it is able to write WALs to GCS.
>
> @Stephen,
>
> > as discussions with MS engineers have suggested, similar to the GCS thread, small writes at high volume are, at best, suboptimal for ADLS.
>
> Did you try to adjust any Accumulo properties to do bigger writes less frequently, or something like that?
>
> [1]: https://github.com/GoogleCloudPlatform/bigdata-interop/issues/103
>
> Maxim
>
> On Thu, Jun 21, 2018 at 7:17 AM Stephen Meyles <smey...@gmail.com> wrote:
>
>> I think we're seeing something similar, but in our case we're trying to run Accumulo atop ADLS.
>> When we generate sufficient write load, we start to see stack traces like the following:
>>
>> [log.DfsLogger] ERROR: Failed to write log entries
>> java.io.IOException: attempting to write to a closed stream
>>   at com.microsoft.azure.datalake.store.ADLFileOutputStream.write(ADLFileOutputStream.java:88)
>>   at com.microsoft.azure.datalake.store.ADLFileOutputStream.write(ADLFileOutputStream.java:77)
>>   at org.apache.hadoop.fs.adl.AdlFsOutputStream.write(AdlFsOutputStream.java:57)
>>   at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:48)
>>   at java.io.DataOutputStream.write(DataOutputStream.java:88)
>>   at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
>>   at org.apache.accumulo.tserver.logger.LogFileKey.write(LogFileKey.java:87)
>>   at org.apache.accumulo.tserver.log.DfsLogger.write(DfsLogger.java:537)
>>
>> We have developed a rudimentary LogCloser implementation that allows us to recover from this, but overall performance is significantly impacted.
>>
>> > As for the WAL closing issue on GCS, I recall a previous thread about that
>>
>> I searched more for this but wasn't able to find anything, nor anything similar re: ADL. I am also curious about the earlier question:
>>
>> > Does Accumulo have a specific write pattern [to WALs], so that file system may not support it?
>>
>> as discussions with MS engineers have suggested, similar to the GCS thread, small writes at high volume are, at best, suboptimal for ADLS.
>>
>> Regards
>>
>> Stephen
>>
>> On Wed, Jun 20, 2018 at 11:20 AM, Christopher <ctubb...@apache.org> wrote:
>>
>>> For what it's worth, this is an Apache project, not a Sqrrl project. Amazon is free to contribute to Accumulo to improve its support of their platform, just as anybody is free to do. Amazon may start contributing more as a result of their acquisition... or they may not.
>>> There is no reason to expect that their acquisition will have any impact whatsoever on the platforms Accumulo supports, because Accumulo is not, and has never been, a Sqrrl project (although some Sqrrl employees have contributed), and thus will not become an Amazon project. It has been, and will remain, a vendor-neutral Apache project. Regardless, we welcome contributions from anybody which would improve Accumulo's support of any additional platform alternatives to HDFS, whether it be GCS, S3, or something else.
>>>
>>> As for the WAL closing issue on GCS, I recall a previous thread about that... I think a simple patch might solve that issue, but to date, nobody has contributed a fix. If somebody is interested in using Accumulo on GCS, I'd like to encourage them to submit any bugs they encounter, and any patches (if they are able) which resolve those bugs. If they need help submitting a fix, please ask on the dev@ list.
>>>
>>> On Wed, Jun 20, 2018 at 8:21 AM Geoffry Roberts <threadedb...@gmail.com> wrote:
>>>
>>>> Maxim,
>>>>
>>>> Interesting that you were able to run Accumulo on GCS. I never thought of that--good to know.
>>>>
>>>> Since I am now an AWS guy (at least for the time being), in light of the fact that Amazon purchased Sqrrl, I am interested to see what develops.
>>>>
>>>> On Wed, Jun 20, 2018 at 5:15 AM, Maxim Kolchin <kolchin...@gmail.com> wrote:
>>>>
>>>>> Hi Geoffry,
>>>>>
>>>>> Thank you for the feedback!
>>>>>
>>>>> Thanks to [1, 2], I was able to run an Accumulo cluster on Google VMs with GCS instead of HDFS, and I used Google Dataproc to run Hadoop jobs on Accumulo. Almost everything worked well until I faced some connection issues with GCS. Quite often, the connection to GCS breaks on writing or closing WALs.
>>>>>
>>>>> To all,
>>>>>
>>>>> Does Accumulo have a specific write pattern, such that a file system may not support it? Are there Accumulo properties which I can play with to adjust the write pattern?
>>>>>
>>>>> [1]: https://github.com/cybermaggedon/accumulo-gs
>>>>> [2]: https://github.com/cybermaggedon/accumulo-docker
>>>>>
>>>>> Thank you!
>>>>> Maxim
>>>>>
>>>>> On Tue, Jun 19, 2018 at 10:31 PM Geoffry Roberts <threadedb...@gmail.com> wrote:
>>>>>
>>>>>> I tried running Accumulo on Google. I first tried running it on Google's pre-made Hadoop. I found the various file paths one must contend with are different on Google than on a straight download from Apache; it seems they moved things around. To counter this, I installed my own Hadoop along with Zookeeper and Accumulo on a Google node. All went well until one fine day when I could no longer log in. It seems Google had pushed out some changes overnight that broke my client-side Google Cloud installation. Google referred the affected to a lengthy, error-prone procedure for resolving the issue.
>>>>>>
>>>>>> I decided life was too short for this kind of thing and switched to Amazon.
>>>>>>
>>>>>> On Tue, Jun 19, 2018 at 7:34 AM, Maxim Kolchin <kolchin...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Does anyone have experience running Accumulo on top of Google Cloud Storage instead of HDFS? See [1] for details if you haven't heard about this feature.
>>>>>>>
>>>>>>> I see some discussion (see [2], [3]) around this topic, but it looks to me that this isn't as popular as, I believe, it should be.
>>>>>>>
>>>>>>> [1]: https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage
>>>>>>> [2]: https://github.com/apache/accumulo/issues/428
>>>>>>> [3]: https://github.com/GoogleCloudPlatform/bigdata-interop/issues/103
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Maxim
>>>>>>
>>>>>> --
>>>>>> There are ways and there are ways,
>>>>>>
>>>>>> Geoffry Roberts
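[Editor's note] The recurring question in this thread is whether Accumulo's WAL write pattern (many small, synchronous writes) is a poor fit for object stores such as GCS and ADLS. As a rough, self-contained illustration of why "bigger writes less frequently" helps, here is a minimal Java sketch. It is not Accumulo, GCS, or ADLS code: `CountingOutputStream` is a hypothetical stand-in for a remote object-store stream, and the 64 KiB buffer size is an arbitrary assumption.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for a remote object-store stream: it records how
// many write() calls reach the "remote" side (each one modeling a round trip).
class CountingOutputStream extends OutputStream {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    int remoteWrites = 0;

    @Override
    public void write(int b) {
        remoteWrites++;
        data.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        remoteWrites++;
        data.write(b, off, len);
    }

    int size() { return data.size(); }
}

public class CoalescedWalDemo {
    public static void main(String[] args) throws IOException {
        CountingOutputStream direct = new CountingOutputStream();
        CountingOutputStream coalesced = new CountingOutputStream();

        // 10,000 tiny "log entries" written directly: one remote call each.
        for (int i = 0; i < 10_000; i++) {
            direct.write(new byte[] {1, 2, 3, 4});
        }

        // The same entries through a 64 KiB buffer: the small writes are
        // coalesced into far fewer, much larger remote calls.
        try (OutputStream wal = new BufferedOutputStream(coalesced, 64 * 1024)) {
            for (int i = 0; i < 10_000; i++) {
                wal.write(new byte[] {1, 2, 3, 4});
            }
        }

        System.out.println("direct remote writes:    " + direct.remoteWrites);
        System.out.println("coalesced remote writes: " + coalesced.remoteWrites);
    }
}
```

Both streams receive the same 40,000 bytes, but the buffered path collapses thousands of tiny calls into a handful of large flushes, which is the kind of batching the thread suggests object stores prefer. The trade-off for a WAL is durability: data sitting in a buffer is not yet persisted, so any real tuning has to respect Accumulo's sync/flush guarantees.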