Ah, ok. One of the comments on the issue led me to believe that it was the
same issue as the missing custom log closer.

On Sat, Jun 23, 2018, 01:10 Stephen Meyles <smey...@gmail.com> wrote:

> > I'm not convinced this is a write pattern issue, though. I commented
> on..
>
> The note there suggests the need for a LogCloser implementation; in my
> (ADLS) case I've written one and have it configured - the exception I'm
> seeing involves failures during writes, not during recovery (though it then
> leads to a need for recovery).
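(A note for anyone landing here from search: in Accumulo 1.x the custom closer is, as far as I recall, registered via the master's recovery property in accumulo-site.xml. `com.example.AdlLogCloser` below is a hypothetical class name standing in for an actual implementation; the class must implement the LogCloser interface and be on the server classpath.)

```xml
<property>
  <name>master.walog.closer.implementation</name>
  <!-- hypothetical custom closer for ADLS; replaces the default
       HadoopLogCloser used during WAL recovery -->
  <value>com.example.AdlLogCloser</value>
</property>
```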
>
> S.
>
> On Fri, Jun 22, 2018 at 4:33 PM, Christopher <ctubb...@apache.org> wrote:
>
>> Unfortunately, that feature wasn't added until 2.0, which hasn't yet been
>> released, but I'm hoping it will be later this year.
>>
>> However, I'm not convinced this is a write pattern issue. I commented on
>> https://github.com/GoogleCloudPlatform/bigdata-interop/issues/103#issuecomment-399608543
>>
>> On Fri, Jun 22, 2018 at 1:50 PM Stephen Meyles <smey...@gmail.com> wrote:
>>
>>> Knowing that HBase has been run successfully on ADLS, I went looking there
>>> (as they have the same WAL write pattern). This is informative:
>>>
>>>
>>> https://www.cloudera.com/documentation/enterprise/5-12-x/topics/admin_using_adls_storage_with_hbase.html
>>>
>>> which suggests a need to split the WALs off on HDFS proper versus ADLS
>>> (or presumably GCS) barring changes in the underlying semantics of each.
>>> AFAICT you can't currently configure Accumulo to send WAL logs to a
>>> separate cluster - is this correct?
>>>
>>> S.
>>>
>>>
>>> On Fri, Jun 22, 2018 at 9:07 AM, Stephen Meyles <smey...@gmail.com>
>>> wrote:
>>>
>>>> > Did you try to adjust any Accumulo properties to do bigger writes
>>>> less frequently or something like that?
>>>>
>>>> We're using BatchWriters and sending reasonably large batches of
>>>> Mutations. Given that the stack traces in both our cases are related to WAL
>>>> writes, it seems like batch size would be the only tweak available here
>>>> (though, without reading the code carefully, it's not even clear to me that
>>>> it is impactful), but if others have suggestions I'd be happy to try.
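(For context, the batching knobs referred to above are the standard client-side BatchWriterConfig settings. A sketch of pushing toward fewer, larger flushes follows; the Connector, table name, and values are placeholders, not from this thread, and as far as I can tell these only change client-side buffering, not how the tserver itself writes to the WAL.)

```java
// Sketch only: assumes accumulo-core on the classpath and an existing
// Connector; "testtable" and all values are illustrative placeholders.
import java.util.concurrent.TimeUnit;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;

public class LargeBatchSketch {
  static void write(Connector conn) throws Exception {
    BatchWriterConfig cfg = new BatchWriterConfig()
        .setMaxMemory(64 * 1024 * 1024)      // buffer ~64 MB before flushing
        .setMaxLatency(2, TimeUnit.MINUTES)  // let batches accumulate longer
        .setMaxWriteThreads(4);
    BatchWriter bw = conn.createBatchWriter("testtable", cfg);
    try {
      Mutation m = new Mutation("row1");
      m.put("cf", "cq", new Value("v".getBytes()));
      bw.addMutation(m);
    } finally {
      bw.close(); // flushes any buffered mutations
    }
  }
}
```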
>>>>
>>>> Given we have this working well and stably in other clusters atop
>>>> traditional HDFS, I'm currently pursuing this further with Microsoft to
>>>> understand the variance with ADLS. Depending on what emerges from that I may
>>>> circle back with more details and a bug report and start digging in more
>>>> deeply to the relevant code in Accumulo.
>>>>
>>>> S.
>>>>
>>>>
>>>> On Fri, Jun 22, 2018 at 6:09 AM, Maxim Kolchin <kolchin...@gmail.com>
>>>> wrote:
>>>>
>>>>> > If somebody is interested in using Accumulo on GCS, I'd like to
>>>>> encourage them to submit any bugs they encounter, and any patches (if they
>>>>> are able) which resolve those bugs.
>>>>>
>>>>> I'd like to contribute a fix, but I don't know where to start. We
>>>>> tried to get help from Google Support about [1] over email, but
>>>>> they just say that GCS doesn't support such a write pattern. In the end,
>>>>> we can only guess at how to adjust Accumulo's behaviour to minimise broken
>>>>> connections to GCS.
>>>>>
>>>>> BTW, although we observe this exception, the tablet server doesn't
>>>>> fail, which suggests that after some retries it is able to write WALs to GCS.
>>>>>
>>>>> @Stephen,
>>>>>
>>>>> > as discussions with MS engineers have suggested, similar to the GCS
>>>>> thread, that small writes at high volume are, at best, suboptimal for 
>>>>> ADLS.
>>>>>
>>>>> Did you try to adjust any Accumulo properties to do bigger writes less
>>>>> frequently or something like that?
>>>>>
>>>>> [1]: https://github.com/GoogleCloudPlatform/bigdata-interop/issues/103
>>>>>
>>>>> Maxim
>>>>>
>>>>> On Thu, Jun 21, 2018 at 7:17 AM Stephen Meyles <smey...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I think we're seeing something similar but in our case we're trying
>>>>>> to run Accumulo atop ADLS. When we generate sufficient write load we 
>>>>>> start
>>>>>> to see stack traces like the following:
>>>>>>
>>>>>> [log.DfsLogger] ERROR: Failed to write log entries
>>>>>> java.io.IOException: attempting to write to a closed stream;
>>>>>> at
>>>>>> com.microsoft.azure.datalake.store.ADLFileOutputStream.write(ADLFileOutputStream.java:88)
>>>>>> at
>>>>>> com.microsoft.azure.datalake.store.ADLFileOutputStream.write(ADLFileOutputStream.java:77)
>>>>>> at
>>>>>> org.apache.hadoop.fs.adl.AdlFsOutputStream.write(AdlFsOutputStream.java:57)
>>>>>> at
>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:48)
>>>>>> at java.io.DataOutputStream.write(DataOutputStream.java:88)
>>>>>> at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
>>>>>> at
>>>>>> org.apache.accumulo.tserver.logger.LogFileKey.write(LogFileKey.java:87)
>>>>>> at org.apache.accumulo.tserver.log.DfsLogger.write(DfsLogger.java:537)
>>>>>>
>>>>>> We have developed a rudimentary LogCloser implementation that allows
>>>>>> us to recover from this, but overall performance is still significantly
>>>>>> impacted.
>>>>>>
>>>>>> > As for the WAL closing issue on GCS, I recall a previous thread
>>>>>> about that
>>>>>>
>>>>>> I searched more for this but wasn't able to find anything, nor
>>>>>> anything similar re: ADL. I am also curious about the earlier question:
>>>>>>
>>>>>> >> Does Accumulo have a specific write pattern [to WALs], so that
>>>>>> file system may not support it?
>>>>>>
>>>>>> as discussions with MS engineers have suggested, similar to the GCS
>>>>>> thread, that small writes at high volume are, at best, suboptimal for 
>>>>>> ADLS.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Stephen
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 20, 2018 at 11:20 AM, Christopher <ctubb...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> For what it's worth, this is an Apache project, not a Sqrrl project.
>>>>>>> Amazon is free to contribute to Accumulo to improve its support of their
>>>>>>> platform, just as anybody is free to do. Amazon may start contributing 
>>>>>>> more
>>>>>>> as a result of their acquisition... or they may not. There is no reason 
>>>>>>> to
>>>>>>> expect that their acquisition will have any impact whatsoever on the
>>>>>>> platforms Accumulo supports, because Accumulo is not, and has not ever
>>>>>>> been, a Sqrrl project (although some Sqrrl employees have contributed), 
>>>>>>> and
>>>>>>> thus will not become an Amazon project. It has been, and will remain, a
>>>>>>> vendor-neutral Apache project. Regardless, we welcome contributions from
>>>>>>> anybody which would improve Accumulo's support of any additional 
>>>>>>> platform
>>>>>>> alternatives to HDFS, whether it be GCS, S3, or something else.
>>>>>>>
>>>>>>> As for the WAL closing issue on GCS, I recall a previous thread
>>>>>>> about that... I think a simple patch might be possible to solve that 
>>>>>>> issue,
>>>>>>> but to date, nobody has contributed a fix. If somebody is interested in
>>>>>>> using Accumulo on GCS, I'd like to encourage them to submit any bugs 
>>>>>>> they
>>>>>>> encounter, and any patches (if they are able) which resolve those bugs. 
>>>>>>> If
>>>>>>> they need help submitting a fix, please ask on the dev@ list.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 20, 2018 at 8:21 AM Geoffry Roberts <
>>>>>>> threadedb...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Maxim,
>>>>>>>>
>>>>>>>> Interesting that you were able to run A on GCS.  I never thought of
>>>>>>>> that--good to know.
>>>>>>>>
>>>>>>>> Since I am now an AWS guy (at least or the time being), in light of
>>>>>>>> the fact that Amazon purchased Sqrrl,  I am interested to see what 
>>>>>>>> develops.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 20, 2018 at 5:15 AM, Maxim Kolchin <
>>>>>>>> kolchin...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Geoffry,
>>>>>>>>>
>>>>>>>>> Thank you for the feedback!
>>>>>>>>>
>>>>>>>>> Thanks to [1, 2], I was able to run an Accumulo cluster on Google VMs
>>>>>>>>> with GCS instead of HDFS, and I used Google Dataproc to run Hadoop jobs
>>>>>>>>> on Accumulo. Almost everything was good until I faced some connection
>>>>>>>>> issues with GCS. Quite often, the connection to GCS breaks while
>>>>>>>>> writing or closing WALs.
>>>>>>>>>
>>>>>>>>> To all,
>>>>>>>>>
>>>>>>>>> Does Accumulo have a specific write pattern that some file systems
>>>>>>>>> may not support? Are there Accumulo properties I can play with to
>>>>>>>>> adjust the write pattern?
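(For later readers: a couple of server-side properties that influence the WAL write pattern in Accumulo 1.x, shown as an accumulo-site.xml sketch. The values are illustrative only, and the property names should be checked against your version's documentation.)

```xml
<property>
  <name>tserver.mutation.queue.max</name>
  <value>4M</value>
  <!-- larger queue: mutations are written to the WAL in bigger chunks -->
</property>
<property>
  <name>tserver.walog.max.size</name>
  <value>1G</value>
  <!-- larger WALs: files are closed and rolled over less often -->
</property>
```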
>>>>>>>>>
>>>>>>>>> [1]: https://github.com/cybermaggedon/accumulo-gs
>>>>>>>>> [2]: https://github.com/cybermaggedon/accumulo-docker
>>>>>>>>>
>>>>>>>>> Thank you!
>>>>>>>>> Maxim
>>>>>>>>>
>>>>>>>>> On Tue, Jun 19, 2018 at 10:31 PM Geoffry Roberts <
>>>>>>>>> threadedb...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I tried running Accumulo on Google.  I first tried running it on
>>>>>>>>>> Google's pre-made Hadoop.  I found the various file paths one must 
>>>>>>>>>> contend
>>>>>>>>>> with are different on Google than on a straight download from 
>>>>>>>>>> Apache.  It
>>>>>>>>>> seems they moved things around.  To counter this, I installed my own 
>>>>>>>>>> Hadoop
>>>>>>>>>> along with Zookeeper and Accumulo on a Google node.  All went well 
>>>>>>>>>> until
>>>>>>>>>> one fine day when I could no longer log in.  It seems Google had 
>>>>>>>>>> pushed out
>>>>>>>>>> some changes overnight that broke my client-side Google Cloud
>>>>>>>>>> installation.  Google referred the affected to a lengthy,
>>>>>>>>>> easy-to-make-a-mistake procedure for resolving the issue.
>>>>>>>>>>
>>>>>>>>>> I decided life was too short for this kind of thing and switched
>>>>>>>>>> to Amazon.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 19, 2018 at 7:34 AM, Maxim Kolchin <
>>>>>>>>>> kolchin...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Does anyone have experience running Accumulo on top of Google
>>>>>>>>>>> Cloud Storage instead of HDFS? In [1] you can see some details if
>>>>>>>>>>> you've never heard of this feature.
>>>>>>>>>>>
>>>>>>>>>>> I see some discussion (see [2], [3]) around this topic, but it
>>>>>>>>>>> looks to me that this isn't as popular as I believe it should be.
>>>>>>>>>>>
>>>>>>>>>>> [1]:
>>>>>>>>>>> https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage
>>>>>>>>>>> [2]: https://github.com/apache/accumulo/issues/428
>>>>>>>>>>> [3]:
>>>>>>>>>>> https://github.com/GoogleCloudPlatform/bigdata-interop/issues/103
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Maxim
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> There are ways and there are ways,
>>>>>>>>>>
>>>>>>>>>> Geoffry Roberts
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> There are ways and there are ways,
>>>>>>>>
>>>>>>>> Geoffry Roberts
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
