If it were a class-loading issue, I would expect to see an exception of some kind. Maybe double-check that flink-shaded-hadoop is not in the lib directory. (Usually I would ask for the full classpath that the HS is started with, but as it turns out this isn't being logged :( See FLINK-18008.)
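
One quick way to run that check, as a minimal sketch: the function below lists any flink-shaded-hadoop jars sitting directly in a given directory. The function name and the FLINK_HOME variable in the usage comment are assumptions for illustration; point it at whichever lib directory your History Server is actually started from.

```shell
# List flink-shaded-hadoop jars in the given lib directory (hypothetical helper).
# Prints one path per match; prints nothing if none are found.
find_shaded_hadoop() {
  find "$1" -maxdepth 1 -name 'flink-shaded-hadoop*.jar' -print
}

# Usage (FLINK_HOME is an assumption about your install layout):
#   find_shaded_hadoop "$FLINK_HOME/lib"
```

Empty output means no shaded Hadoop jar is present in that directory.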

The fact that overview.json and jobs/overview.json are missing indicates that something goes wrong directly on startup. What is supposed to happen is that the HS starts, fetches all currently available archives and then creates these files.
So it seems like the download gets stuck for some reason.
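
To see which of these files were actually created, something like the following could be run against one of the flink-web-history-* directories under historyserver.web.tmpdir. This is only a sketch: the function name is made up, and it reports a zero-byte file separately from a missing one, since the directory listing further down the thread shows an empty jobs/overview.json.

```shell
# Report whether the generated overview files exist and are non-empty.
# $1: a flink-web-history-* directory under historyserver.web.tmpdir
check_overviews() {
  dir=$1
  for f in overview.json jobs/overview.json; do
    if [ -s "$dir/$f" ]; then
      echo "ok      $f"      # exists and has content
    elif [ -e "$dir/$f" ]; then
      echo "empty   $f"      # exists but is zero bytes
    else
      echo "missing $f"      # was never created
    fi
  done
}

# Usage (directory name taken from the listing later in this thread):
#   check_overviews /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
```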

Can you use jstack to create a thread dump, and see what the Flink-HistoryServer-ArchiveFetcher is doing?
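
A minimal sketch of how to pull just that thread out of the dump, assuming the thread name contains "Flink-HistoryServer-ArchiveFetcher" (as in the DEBUG log lines quoted below) and that jstack separates thread entries with blank lines. The function name and the pgrep-based PID lookup are assumptions; use whatever identifies your HS process.

```shell
# Print only the archive fetcher thread's stack from a jstack dump on stdin.
# Printing starts at the matching thread header line and stops after the
# next blank line, which ends that thread's entry.
fetcher_stack() {
  awk '/Flink-HistoryServer-ArchiveFetcher/ {show=1} show {print} /^$/ {show=0}'
}

# Usage (assumes exactly one JVM whose command line contains "HistoryServer"):
#   jstack "$(pgrep -f HistoryServer)" | fetcher_stack
```

Taking two or three dumps a few seconds apart and comparing the fetcher's stack should show whether it is stuck in one place or still making progress.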

I will also file a JIRA for adding more logging statements, like when fetching starts/stops.

On 27/05/2020 20:57, Hailu, Andreas wrote:

Hi Chesnay, apologies for not getting back to you sooner here. So I did what you suggested - I downloaded a few files from my jobmanager.archive.fs.dir HDFS directory to a locally available directory named /local/scratch/hailua_p2epdlsuat/historyserver/archived/. I then changed my historyserver.archive.fs.dir to file:///local/scratch/hailua_p2epdlsuat/historyserver/archived/ and that seemed to work. I’m able to see the history of the applications I downloaded. So this points to a problem with sourcing the history from HDFS.
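
A rough sketch of that local-filesystem test, for anyone repeating it. The helper name and the count of 5 are arbitrary choices for illustration; the paths are the ones used in this thread. It assumes each `hdfs dfs -ls` entry is a single line with the path as the last field and that paths contain no spaces.

```shell
# Read `hdfs dfs -ls` output on stdin and print the first N paths.
# Entries have 8 whitespace-separated fields with the path last; the
# "Found N items" header has fewer fields and is skipped.
first_archives() {
  awk -v n="$1" 'NF >= 8 && count < n {print $NF; count++}'
}

# Usage: copy a handful of archives to the local directory, then set
# historyserver.archive.fs.dir to the matching file:// URI.
#   hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs | first_archives 5 |
#     while read -r p; do
#       hdfs dfs -get "$p" /local/scratch/hailua_p2epdlsuat/historyserver/archived/
#     done
```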

Do you think this could be classpath related? This is what we use for our HADOOP_CLASSPATH var:

/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/lib/*:/gns/software/ep/da/dataproc/dataproc-prod/lakeRmProxy.jar:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/bin::/gns/mw/dbclient/postgres/jdbc/pg-jdbc-9.3.v01/postgresql-9.3-1100-jdbc4.jar

You can see we have references to Hadoop mapred/yarn/hdfs libs in there.

*// *ah**

*From:*Chesnay Schepler <ches...@apache.org>
*Sent:* Sunday, May 3, 2020 6:00 PM
*To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
*Subject:* Re: History Server Not Showing Any Jobs - File Not Found?

yes, exactly; I want to rule out that (somehow) HDFS is the problem.

I couldn't reproduce the issue locally myself so far.

On 01/05/2020 22:31, Hailu, Andreas wrote:

    Hi Chesnay, yes – they were created using Flink 1.9.1 as we’ve
    only just started to archive them in the past couple of weeks. Could
    you clarify how you want to try local filesystem archives? As
    in changing jobmanager.archive.fs.dir and historyserver.web.tmpdir
    to the same local directory?

    *// *ah

    *From:*Chesnay Schepler <ches...@apache.org>
    *Sent:* Wednesday, April 29, 2020 8:26 AM
    *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
    *Subject:* Re: History Server Not Showing Any Jobs - File Not Found?

    hmm...let's see if I can reproduce the issue locally.

    Are the archives from the same version the history server runs on?
    (Which I suppose would be 1.9.1?)

    Just for the sake of narrowing things down, it would also be
    interesting to check if it works with the archives residing in the
    local filesystem.

    On 27/04/2020 18:35, Hailu, Andreas wrote:

        bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/

        total 8

        drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43
        flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

        drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22
        flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

        There are just two directories in here. I don’t see cache
        directories from my attempts today, which is interesting.
        Looking a little deeper into them:

        bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

        total 1756

        drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs

        bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs

        total 0

        -rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

        There are indeed archives already in HDFS – I’ve included some
        in my initial mail, but here they are again just for reference:

        -bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

        Found 44282 items

        -rw-r----- 3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

        -rw-r----- 3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

        -rw-r----- 3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

        ...

        *// *ah

        *From:*Chesnay Schepler <ches...@apache.org>
        *Sent:* Monday, April 27, 2020 10:28 AM
        *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
        *Subject:* Re: History Server Not Showing Any Jobs - File Not Found?

        If historyserver.web.tmpdir is not set then java.io.tmpdir is
        used, so that should be fine.

        What are the contents of
        /local/scratch/flink_historyserver_tmpdir?

        I assume there are already archives in HDFS?

        On 27/04/2020 16:02, Hailu, Andreas wrote:

            My machine’s /tmp directory is not large enough to support
            the archived files, so I changed my java.io.tmpdir to be
            in some other location which is significantly larger. I
            hadn’t set anything for historyserver.web.tmpdir, so I
            suspect it was still pointing at /tmp. I just tried
            setting historyserver.web.tmpdir to the same location as
            my java.io.tmpdir location, but I’m afraid I’m still
            seeing the following issue:

            2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG
            HistoryServerStaticFileServerHandler - Unable to load
            requested file /overview.json from classloader

            2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG
            HistoryServerStaticFileServerHandler - Unable to load
            requested file /jobs/overview.json from classloader

            flink-conf.yaml for reference:

            jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

            historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

            historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

            Did you have anything else in mind when you said pointing
            somewhere funny?

            *// *ah

            *From:*Chesnay Schepler <ches...@apache.org>
            *Sent:* Monday, April 27, 2020 5:56 AM
            *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
            *Subject:* Re: History Server Not Showing Any Jobs - File Not Found?

            overview.json is a generated file that is placed in the
            local directory controlled by historyserver.web.tmpdir.

            Have you configured this option to point to some non-local
            filesystem? (Or if not, is the java.io.tmpdir property
            pointing somewhere funny?)

            On 24/04/2020 18:24, Hailu, Andreas wrote:

                I’m having a further look at the code in
                HistoryServerStaticFileServerHandler - is there an
                assumption about where overview.json is supposed to be
                located?

                *// *ah

                *From:*Hailu, Andreas [Engineering]
                *Sent:* Wednesday, April 22, 2020 1:32 PM
                *To:* 'Chesnay Schepler' <ches...@apache.org>; Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
                *Subject:* RE: History Server Not Showing Any Jobs - File Not Found?

                Hi Chesnay, thanks for responding. We’re using Flink
                1.9.1. I enabled DEBUG level logging and this is
                something relevant I see:

                2020-04-22 13:25:52,566
                [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG
                DFSInputStream - Connecting to datanode 10.79.252.101:1019

                2020-04-22 13:25:52,567
                [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG
                SaslDataTransferClient - SASL encryption trust check:
                localHostTrusted = false, remoteHostTrusted = false

                2020-04-22 13:25:52,567
                [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG
                SaslDataTransferClient - SASL client skipping
                handshake in secured configuration with privileged
                port for addr = /10.79.252.101, datanodeId =
                DatanodeInfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]

                *2020-04-22 13:25:52,571
                [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG
                DFSInputStream - DFSInputStream has been closed already*

                *2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG
                HistoryServerStaticFileServerHandler - Unable to load
                requested file /jobs/overview.json from classloader*

                2020-04-22 13:25:52,576 [IPC Parameter Sending Thread
                #0] DEBUG Client$Connection$3 - IPC Client
                (1578587450) connection to
                d279536-002.dc.gs.com/10.59.61.87:8020 from
                d...@gs.com sending #1391

                Aside from that, it looks like a lot of logging around
                datanodes and block location metadata. Did I miss
                something in my classpath, perhaps? If so, do you have
                a suggestion on what I could try?

                *// *ah

                *From:*Chesnay Schepler <ches...@apache.org>
                *Sent:* Wednesday, April 22, 2020 2:16 AM
                *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
                *Subject:* Re: History Server Not Showing Any Jobs - File Not Found?

                Which Flink version are you using?

                Have you checked the history server logs after
                enabling debug logging?

                On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote:

                    Hi,

                    I’m trying to set up the History Server, but none
                    of my applications are showing up in the Web UI.
                    Looking at the console, I see that all of the
                    calls to /overview return the following 404
                    response: {"errors":["File not found."]}.

                    I’ve set up my configuration as follows:

                    JobManager Archive directory:

                    *jobmanager.archive.fs.dir*: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

                    -bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

                    Found 44282 items

                    -rw-r----- 3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

                    -rw-r----- 3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

                    -rw-r----- 3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

                    ...

                    ...

                    History Server will fetch the archived jobs from
                    the same location:

                    *historyserver.archive.fs.dir*: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

                    So I’m able to confirm that there are indeed
                    archived applications that I should be able to
                    view in the histserver. I’m not able to find out
                    what file the overview service is looking for from
                    the repository – any suggestions as to what I
                    could look into next?

                    Best,

                    Andreas

                    
------------------------------------------------------------------------


                    Your Personal Data: We may collect and process
                    information about you that may be subject to data
                    protection laws. For more information about how we
                    use and disclose your personal data, how we
                    protect your information, our legal basis to use
                    your information, your rights and who you can
                    contact, please refer to:
                    www.gs.com/privacy-notices
                    <http://www.gs.com/privacy-notices>

                

