Thanks for sharing, Maxim.

What kind of failure/recovery testing did you do as a part of this? If you haven't done any yet, are you planning to do some such testing?

- Josh

On 1/15/19 10:02 AM, Maxim Kolchin wrote:
Hi,

I just wanted to leave intermediate feedback on the topic.

So far, Accumulo works pretty well on top of Google Storage. The aforementioned issue still exists, but it doesn't break anything. However, I can't give you any useful performance numbers at the moment.

The cluster:

  - master (with zookeeper) (n1-standard-1) + 2 tservers (n1-standard-4)
  - 32+ billion entries
  - 5 tables (excluding system tables)

Some averaged numbers from two use cases:

 - batch write into pre-split tables with 40 client machines + 4 tservers (n1-standard-4) - max speed 1.5M entries/sec.
 - sequential read with 2 client iterators (1 filters by key, 2 filters by timestamp), with 5 client machines + 2 tservers (n1-standard-4) and fewer than 60k entries returned - max speed 1M+ entries/sec (see the sketch below).
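
For reference, the read side looks roughly like the sketch below
(instance name, table name, row and timestamp bounds are placeholders,
and I'm assuming the timestamp filter is the stock TimestampFilter
attached as a scan-time iterator; our key filter is attached the same
way):

  import java.util.Map;

  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.client.IteratorSetting;
  import org.apache.accumulo.core.client.Scanner;
  import org.apache.accumulo.core.client.ZooKeeperInstance;
  import org.apache.accumulo.core.client.security.tokens.PasswordToken;
  import org.apache.accumulo.core.data.Key;
  import org.apache.accumulo.core.data.Range;
  import org.apache.accumulo.core.data.Value;
  import org.apache.accumulo.core.iterators.user.TimestampFilter;
  import org.apache.accumulo.core.security.Authorizations;

  public class FilteredScan {
    public static void main(String[] args) throws Exception {
      // placeholder instance and credentials
      Connector conn = new ZooKeeperInstance("accumulo", "zk1:2181")
          .getConnector("reader", new PasswordToken("secret"));

      // sequential scan over a bounded key range
      Scanner scanner = conn.createScanner("events", Authorizations.EMPTY);
      scanner.setRange(new Range("row_000", "row_999"));

      // scan-time iterator: keep only entries inside a timestamp window
      IteratorSetting ts = new IteratorSetting(30, "tsFilter", TimestampFilter.class);
      TimestampFilter.setStart(ts, 1514764800000L, true);
      TimestampFilter.setEnd(ts, 1546300800000L, true);
      scanner.addScanIterator(ts);

      long count = 0;
      for (Map.Entry<Key,Value> entry : scanner) {
        count++; // real code processes entry.getKey()/getValue() here
      }
      scanner.close();
      System.out.println(count + " entries returned");
    }
  }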

Maxim

On Mon, Jun 25, 2018 at 12:57 AM Christopher <[email protected] <mailto:[email protected]>> wrote:

    Ah, ok. One of the comments on the issue led me to believe that it
    was the same issue as the missing custom log closer.

    On Sat, Jun 23, 2018, 01:10 Stephen Meyles <[email protected]
    <mailto:[email protected]>> wrote:

         > I'm not convinced this is a write pattern issue, though. I
        commented on..

        The note there suggests the need for a LogCloser implementation;
        in my (ADLS) case I've written one and have it configured - the
        exception I'm seeing involves failures during writes, not during
        recovery (though it then leads to a need for recovery).
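
        For anyone else heading down this path, a minimal version
        looks roughly like the sketch below (interface and property
        names as of 1.9.x, so treat it as illustrative rather than a
        drop-in fix; on a store with no lease-recovery step the close
        can essentially be a no-op):

            import org.apache.accumulo.core.conf.AccumuloConfiguration;
            import org.apache.accumulo.server.fs.VolumeManager;
            import org.apache.accumulo.server.master.recovery.LogCloser;
            import org.apache.hadoop.fs.Path;

            public class AdlLogCloser implements LogCloser {
              @Override
              public long close(AccumuloConfiguration conf, VolumeManager fs, Path path) {
                // No lease to recover on ADLS; returning 0 tells the
                // master the log can be recovered immediately.
                return 0;
              }
            }

        It then gets wired in via master.walog.closer.implementation
        in accumulo-site.xml.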

        S.

        On Fri, Jun 22, 2018 at 4:33 PM, Christopher
        <[email protected] <mailto:[email protected]>> wrote:

            Unfortunately, that feature wasn't added until 2.0, which
            hasn't yet been released, but I'm hoping it will be later
            this year.
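
            (For anyone following along: in 2.0 the idea is to list a
            second volume and prefer it for the WAL "logger" scope,
            roughly along these lines. Volume and property names below
            are placeholders from memory and may not be exact - check
            the 2.0 docs once it lands.)

                instance.volumes=adl://example.azuredatalakestore.net/accumulo,hdfs://namenode:8020/accumulo-wal
                general.volume.chooser=org.apache.accumulo.server.fs.PreferredVolumeChooser
                general.custom.volume.preferred.default=adl://example.azuredatalakestore.net/accumulo
                general.custom.volume.preferred.logger=hdfs://namenode:8020/accumulo-wal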

            I'm not convinced this is a write pattern issue, though.
            I commented on
            https://github.com/GoogleCloudPlatform/bigdata-interop/issues/103#issuecomment-399608543

            On Fri, Jun 22, 2018 at 1:50 PM Stephen Meyles
            <[email protected] <mailto:[email protected]>> wrote:

                Knowing that HBase has been run successfully on ADLS,
                I went looking there (as they have the same WAL write
                pattern). This is informative:

                
                https://www.cloudera.com/documentation/enterprise/5-12-x/topics/admin_using_adls_storage_with_hbase.html

                which suggests a need to split the WALs off on HDFS
                proper versus ADLS (or presumably GCS) barring changes
                in the underlying semantics of each. AFAICT you can't
                currently configure Accumulo to send WAL logs to a
                separate cluster - is this correct?

                S.


                On Fri, Jun 22, 2018 at 9:07 AM, Stephen Meyles
                <[email protected] <mailto:[email protected]>> wrote:

                    > Did you try to adjust any Accumulo properties to do
                    bigger writes less frequently or something like that?

                    We're using BatchWriters and sending reasonably
                    large batches of Mutations. Given the stack traces
                    in both our cases are related to WAL writes, it seems
                    like batch size would be the only tweak available
                    here (though, without reading the code carefully,
                    it's not even clear to me that it is impactful), but
                    if others have suggestions I'd be happy to try.
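
                    Concretely, the only knobs on our side are the
                    BatchWriterConfig ones, something along these
                    lines (table name and sizes are placeholders,
                    and note these shape the client -> tserver
                    batching, not how the tserver writes its WAL):

                        import java.util.concurrent.TimeUnit;

                        import org.apache.accumulo.core.client.BatchWriter;
                        import org.apache.accumulo.core.client.BatchWriterConfig;
                        import org.apache.accumulo.core.client.Connector;
                        import org.apache.accumulo.core.client.TableNotFoundException;

                        public class WriterFactory {
                          // Bigger buffer + higher latency => fewer, larger batches.
                          static BatchWriter create(Connector conn) throws TableNotFoundException {
                            BatchWriterConfig cfg = new BatchWriterConfig()
                                .setMaxMemory(64L * 1024 * 1024)       // client-side buffer per writer
                                .setMaxLatency(120, TimeUnit.SECONDS)  // max time before a forced flush
                                .setMaxWriteThreads(8);                // parallel sends to tservers
                            return conn.createBatchWriter("events", cfg);
                          }
                        }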

                    Given we have this working well and stably in other
                    clusters atop traditional HDFS, I'm currently
                    pursuing this further with MS to understand the
                    variance on ADLS. Depending on what emerges from that,
                    I may circle back with more details and a bug report
                    and start digging more deeply into the relevant
                    code in Accumulo.

                    S.


                    On Fri, Jun 22, 2018 at 6:09 AM, Maxim Kolchin
                    <[email protected] <mailto:[email protected]>>
                    wrote:

                        > If somebody is interested in using Accumulo on
                        GCS, I'd like to encourage them to submit any bugs
                        they encounter, and any patches (if they are able)
                        which resolve those bugs.

                        I'd like to contribute a fix, but I don't know
                        where to start. We tried to get help from
                        Google Support about [1] over email, but
                        they just say that GCS doesn't support such a
                        write pattern. In the end, we can only guess how
                        to adjust Accumulo's behaviour to minimise
                        broken connections to GCS.

                        BTW, although we observe this exception, the
                        tablet server doesn't fail, which means that
                        after some retries it is able to write WALs to GCS.
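
                        For completeness, the server-side knobs we
                        see as candidates are the WAL-related tserver
                        settings, e.g. (shown as key=value for
                        brevity, names per the 1.9 docs; whether they
                        actually change the write pattern GCS sees is
                        exactly what we can't tell):

                            tserver.walog.max.size=2G     # roll WALs less often
                            tserver.wal.blocksize=512M    # block size requested for WAL files
                            tserver.wal.replication=0     # 0 = filesystem default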

                        @Stephen,

                        > as discussions with MS engineers have suggested,
                        similar to the GCS thread, that small writes at
                        high volume are, at best, suboptimal for ADLS.

                        Did you try to adjust any Accumulo properties to
                        do bigger writes less frequently or something
                        like that?

                        [1]: https://github.com/GoogleCloudPlatform/bigdata-interop/issues/103

                        Maxim

                        On Thu, Jun 21, 2018 at 7:17 AM Stephen Meyles
                        <[email protected] <mailto:[email protected]>>
                        wrote:

                            I think we're seeing something similar but
                            in our case we're trying to run Accumulo
                            atop ADLS. When we generate sufficient write
                            load we start to see stack traces like the
                            following:

                            [log.DfsLogger] ERROR: Failed to write log entries
                            java.io.IOException: attempting to write to a closed stream;
                                at com.microsoft.azure.datalake.store.ADLFileOutputStream.write(ADLFileOutputStream.java:88)
                                at com.microsoft.azure.datalake.store.ADLFileOutputStream.write(ADLFileOutputStream.java:77)
                                at org.apache.hadoop.fs.adl.AdlFsOutputStream.write(AdlFsOutputStream.java:57)
                                at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:48)
                                at java.io.DataOutputStream.write(DataOutputStream.java:88)
                                at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
                                at org.apache.accumulo.tserver.logger.LogFileKey.write(LogFileKey.java:87)
                                at org.apache.accumulo.tserver.log.DfsLogger.write(DfsLogger.java:537)

                            We have developed a rudimentary LogCloser
                            implementation that allows us to recover
                            from this, but overall performance is
                            significantly impacted.

                             > As for the WAL closing issue on GCS, I
                            recall a previous thread about that

                            I searched more for this but wasn't able to
                            find anything, nor similar re: ADL. I am
                            also curious about the earlier question:

                            >> Does Accumulo have a specific write pattern
                            [to WALs], so that file system may not support it?

                            as discussions with MS engineers have
                            suggested, similar to the GCS thread, that
                            small writes at high volume are, at best,
                            suboptimal for ADLS.

                            Regards

                            Stephen

                            On Wed, Jun 20, 2018 at 11:20 AM,
                            Christopher <[email protected]
                            <mailto:[email protected]>> wrote:

                                For what it's worth, this is an Apache
                                project, not a Sqrrl project. Amazon is
                                free to contribute to Accumulo to
                                improve its support of their platform,
                                just as anybody is free to do. Amazon
                                may start contributing more as a result
                                of their acquisition... or they may not.
                                There is no reason to expect that their
                                acquisition will have any impact
                                whatsoever on the platforms Accumulo
                                supports, because Accumulo is not, and
                                has not ever been, a Sqrrl project
                                (although some Sqrrl employees have
                                contributed), and thus will not become
                                an Amazon project. It has been, and will
                                remain, a vendor-neutral Apache project.
                                Regardless, we welcome contributions
                                from anybody which would improve
                                Accumulo's support of any additional
                                platform alternatives to HDFS, whether
                                it be GCS, S3, or something else.

                                As for the WAL closing issue on GCS, I
                                recall a previous thread about that... I
                                think a simple patch might be possible
                                to solve that issue, but to date, nobody
                                has contributed a fix. If somebody is
                                interested in using Accumulo on GCS, I'd
                                like to encourage them to submit any
                                bugs they encounter, and any patches (if
                                they are able) which resolve those bugs.
                                If they need help submitting a fix,
                                please ask on the dev@ list.



                                On Wed, Jun 20, 2018 at 8:21 AM Geoffry
                                Roberts <[email protected]
                                <mailto:[email protected]>> wrote:

                                    Maxim,

                                    Interesting that you were able to
                                    run Accumulo on GCS.  I never thought
                                    of that--good to know.

                                    Since I am now an AWS guy (at least
                                    for the time being), in light of the
                                    fact that Amazon purchased Sqrrl, I
                                    am interested to see what develops.


                                    On Wed, Jun 20, 2018 at 5:15 AM,
                                    Maxim Kolchin <[email protected]
                                    <mailto:[email protected]>> wrote:

                                        Hi Geoffry,

                                        Thank you for the feedback!

                                        Thanks to [1, 2], I was able to
                                        run an Accumulo cluster on Google
                                        VMs with GCS instead of HDFS, and
                                        I used Google Dataproc to run
                                        Hadoop jobs on Accumulo. Almost
                                        everything was good until I faced
                                        some connection issues with GCS.
                                        Quite often, the connection to GCS
                                        breaks while writing or closing
                                        WALs.
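
                                        In short, the setup from [1]
                                        boils down to wiring the GCS
                                        connector into Hadoop and
                                        pointing Accumulo's volume at
                                        a bucket, roughly (key=value
                                        shown for brevity; bucket and
                                        project names are placeholders):

                                        # core-site.xml (GCS connector)
                                        fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
                                        fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
                                        fs.gs.project.id=my-gcp-project

                                        # accumulo-site.xml (use the bucket as the volume)
                                        instance.volumes=gs://my-accumulo-bucket/accumulo

                                        plus the connector jar and
                                        service-account credentials
                                        on every node.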

                                        To all,

                                        Does Accumulo have a specific
                                        write pattern, so that a file
                                        system may not support it? Are
                                        there Accumulo properties which
                                        I can play with to adjust the
                                        write pattern?

                                        [1]: https://github.com/cybermaggedon/accumulo-gs
                                        [2]: https://github.com/cybermaggedon/accumulo-docker

                                        Thank you!
                                        Maxim

                                        On Tue, Jun 19, 2018 at 10:31 PM
                                        Geoffry Roberts
                                        <[email protected]
                                        <mailto:[email protected]>>
                                        wrote:

                                            I tried running Accumulo on
                                            Google.  I first tried
                                            running it on Google's
                                            pre-made Hadoop.  I found
                                            the various file paths one
                                            must contend with are
                                            different on Google than on
                                            a straight download from
                                            Apache.  It seems they moved
                                            things around.  To counter
                                            this, I installed my own
                                            Hadoop along with Zookeeper
                                            and Accumulo on a
                                            Google node.  All went well
                                            until one fine day when I
                                            could no longer log in.  It
                                            seems Google had pushed out
                                            some changes overnight that
                                            broke my client-side Google
                                            Cloud installation. Google
                                            referred those affected to a
                                            lengthy, easy-to-make-a-mistake
                                            procedure for resolving the
                                            issue.

                                            I decided life was too short
                                            for this kind of thing and
                                            switched to Amazon.

                                            On Tue, Jun 19, 2018 at 7:34
                                            AM, Maxim Kolchin
                                            <[email protected]
                                            <mailto:[email protected]>>
                                            wrote:

                                                Hi all,

                                                Does anyone have
                                                experience running
                                                Accumulo on top of
                                                Google Cloud Storage
                                                instead of HDFS? In [1]
                                                you can see some details
                                                if you have never heard of
                                                this feature.

                                                I see some discussion
                                                (see [2], [3]) around
                                                this topic, but it seems
                                                to me that this isn't as
                                                popular as, I believe, it
                                                should be.

                                                [1]: https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage
                                                [2]: https://github.com/apache/accumulo/issues/428
                                                [3]: https://github.com/GoogleCloudPlatform/bigdata-interop/issues/103

                                                Best regards,
                                                Maxim




                                            --
                                            There are ways and there are ways,

                                            Geoffry Roberts




                                    --
                                    There are ways and there are ways,

                                    Geoffry Roberts


