That was it! Issue resolved :). I ran my test twice - 40 out of 40 successful. I also checked it once more manually in the node console and in Ezra Project, and it works without issues now!
Thank you so much, Troy.

Now I can switch to the latest SWORD trunk with Ezra Project!

Best regards,
Tobias

On 10/17/20 8:13 PM, Troy A. Griffitts wrote:

untar now updated. Thanks for your time with this.


On 10/17/20 4:36 PM, Tobias Klein wrote:

Updated to SVN Rev. 3813.

Still hanging here:

#0  0x000056143eb2f28c in sword::FileMgr::sysOpen(sword::FileDesc*) ()
#1  0x000056143eb18b0b in sword::FileDesc::getFd() ()
#2  0x000056143eb9323b in (anonymous namespace)::untar(void*, char const*) ()
#3  0x000056143eb93b60 in sword::ZipCompress::unTarGZ(int, char const*) ()
#4  0x000056143eb483f6 in sword::InstallMgr::refreshRemoteSource(sword::InstallSource*) ()

Best regards,
Tobias

On 10/17/20 1:52 PM, Troy A. Griffitts wrote:

OK Tobias,

Give it a go when you have a chance and let me know.

Troy


On 10/17/20 12:16 PM, Troy A. Griffitts wrote:
The unTarGZ is also a new method and it looks like it is using the default file handle pool functionality of FileMgr, from looking at your stack trace. Give me about an hour and I'll have a chance to take a look at it. Good news is that it's not having trouble in the CURLFTPTransport. I have the same change queued up for commit for the other 3 transport impls so I will go ahead and commit those, as well. Thank you for working through this with me.

Troy

On October 17, 2020 10:44:18 AM GMT+02:00, Tobias Klein <cont...@tklein.info> wrote:

    Dear Troy,

    Thank you so much for the help and all your work on this.
    Unfortunately the issue is still not resolved for me based on
    your latest commits.

    I have n threads that all run InstallMgr::refreshRemoteSource.
    n corresponds to the number of repositories available, so it's
    currently 10.
    The operation works until at some point things start hanging
    again. Sometimes it happens after ten consecutive calls of the
    update function, sometimes after only three, and I also
    had it hang after a single attempt.
    In the calling function the hanging occurs when I join the
    threads (waiting for them to complete).

    Looking at details in gdb I find this at the point of hanging:

    (gdb) info threads
      Id   Target Id                                          Frame
      102  Thread 0x7f8e3cdd1700 (LWP 46520) "node_sword_cli" 0x000056411eacb46c in sword::FileMgr::sysOpen(sword::FileDesc*) ()
      95   Thread 0x7f8e37fff700 (LWP 46514) "node_sword_cli" 0x000056411eacb283 in sword::FileMgr::sysOpen(sword::FileDesc*) ()
      93   Thread 0x7f8e3ddf6700 (LWP 46511) "node_sword_cli" 0x000056411eacb283 in sword::FileMgr::sysOpen(sword::FileDesc*) ()
      91   Thread 0x7f8e2bfff700 (LWP 46510) "node_sword_cli" 0x000056411eacb46c in sword::FileMgr::sysOpen(sword::FileDesc*) ()
    * 90   Thread 0x7f8e2b7fe700 (LWP 46509) "node_sword_cli" 0x000056411eacb296 in sword::FileMgr::sysOpen(sword::FileDesc*) ()
      1    Thread 0x7f8e3ddf9e00 (LWP 46380) "node_sword_cli" 0x00007f8e40d98cd7 in __pthread_clockjoin_ex () from /lib/x86_64-linux-gnu/libpthread.so.0

    And this stacktrace for each individual thread (relevant portion):
    #0  0x000056411eacb46c in sword::FileMgr::sysOpen(sword::FileDesc*) ()
    #1  0x000056411eab4b0b in sword::FileDesc::getFd() ()
    #2  0x000056411eb2fb70 in sword::ZipCompress::unTarGZ(sword::FileDesc*, char const*) ()
    #3  0x000056411eae4437 in sword::InstallMgr::refreshRemoteSource(sword::InstallSource*) ()

    I'm not sure whether all of these threads here are now hanging
    or only one of them. It could be just the one that the main
    function tries to join right now.

    Another observation is that I am getting random output like
    this during the process (it happens with different conf files,
    not this one all the time):

    error writing
    /home/tobi/.sword/installMgr/20120711005000/mods.d/ngu_BL_1987.conf
    skipping...

    I didn't get these error messages with earlier SVN revisions.

    To be sure I just once more tested with SVN Rev. 3759 and there
    I consistently get 20 out of 20 attempts successful.

    Best regards,
    Tobias

    PS: I'm sending this e-mail the second time, didn't seem to
    come through via mailman the first time (at 9:11 CEST).

    On 10/15/20 8:09 PM, Troy A. Griffitts wrote:

    Dear Tobias,

    Thank you for all the great information.  This enabled me to
    isolate the change which caused the issue.

    So, for a bit of background: SWORD has no calls to mark
    critical sections which might be problematic for re-entrant
    usage.  This is due to the many implementations of
    threading across many different platforms over the years,
    before C++11.  But, as a policy to support clients which
    desire to use SWORD in a multithreaded manner, we do our best
    to make this safe by advising clients to use separate SWMgr
    instances per thread.  There are still some shared objects in
    this scenario, but we do our best to do all the writing to
    these shared objects upon initialization.  We broke this rule
    in commit 3760, which is what caused your problem.

    SWORD has a facility to pool open file handles, to help OSs
    which have small open file handle limits.  This work is done
    in FileMgr.  Recently, to support Windows Unicode path names
    (the commit you found which breaks your multithreaded use), we
    rounded up all remaining native file IO calls and replaced
    them to use FileMgr for the IO, and then extended FileMgr to
    handle Windows Unicode paths in a Windows-specific manner.
    One of these changes was in CURLFTPTransport, which is where
    you are having the issue.

    The problem is that, where previously this class was directly
    opening a FILE to do its writing, commit 3760 changed this to
    use FileMgr to open the file, which involves the SWORD-wide
    file handle pool.  Since we are creating a new file, we are
    always writing to this shared pool container, which is not
    threadsafe.  My guess is that you have two threads trying to
    update the pool container at exactly the same time.

    Using the file handle pool is usually safe, because SWMgr
    "opens" all of its file handles on initialization.  (These
    are not actually opening OS file handles, but instead updating
    the file handle pool container with proxy objects which delay
    the actual OS open until it is needed.)  The point is that this
    writing to the shared file handle pool container is done on
    creation of the SWMgr.  Afterward, the shared pool is only
    read, and each object in the pool is owned by only one thread
    if the "each thread must have its own SWMgr" rule is followed.
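
To illustrate the kind of race described here, the following is a minimal sketch, not SWORD's actual FileMgr code. `HandlePool`, `registerFile`, and `registerFromThreads` are hypothetical names, and `std::list` only stands in for whatever container the real pool uses. Several threads add entries to one shared container, and a `std::mutex` serializes those writes. Without the lock, the concurrent `push_front` calls would be a data race, which is undefined behavior and can plausibly show up as the crashes and hangs reported in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a process-wide pool of file-handle proxies.
// This is NOT SWORD's FileMgr; it only models "one container, many writers".
class HandlePool {
public:
    // Adds a proxy entry for `path`. The lock serializes writers; without it,
    // concurrent push_front calls on the same std::list are a data race.
    void registerFile(const std::string& path) {
        std::lock_guard<std::mutex> guard(mutex_);
        entries_.push_front(path);
    }

    // Number of registered entries, read under the same lock.
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return entries_.size();
    }

private:
    mutable std::mutex mutex_;
    std::list<std::string> entries_;
};

// Starts `threadCount` threads that each register `filesPerThread` entries,
// then waits for all of them to finish.
inline void registerFromThreads(HandlePool& pool, int threadCount, int filesPerThread) {
    assert(threadCount >= 0 && filesPerThread >= 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&pool, t, filesPerThread] {
            for (int i = 0; i < filesPerThread; ++i) {
                pool.registerFile("source" + std::to_string(t) + "/file" + std::to_string(i));
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
}
```

Locking is only one generic way to make such a container safe. As explained above, SWORD's policy is different: it avoids writing to shared objects after initialization rather than guarding them.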

    Regardless of the details, I believe I have committed a fix
    for you.  In short, I have changed CURLFTPTransport to follow
    our rule to avoid writing to shared objects when we might be
    re-entrant.  Here we now use FileMgr's methods which isolate
    OS implementation, but not FileMgr's file handle pool (as it
    did not previously use the pool before this commit).  This
    should allow this to still take advantage of the Windows
    OS-specific implementation, and also avoid the critical
    section.  Can you please try SVN head and let me know if we
    are back to 20 out of 20 successes?
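
The idea in the paragraph above (keep the file access, skip the shared pool) can be sketched generically. This is an illustration under assumptions, not the committed SWORD change: `writePayload` and `writeInParallel` are hypothetical names, and `std::ofstream` stands in for whatever file API a transport really uses. Each thread opens and writes its own file through a stream that is local to the call, so no object is written by more than one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a transfer's write step. The stream is local to
// this call, so nothing is shared between threads. Returns true on success.
inline bool writePayload(const std::string& path, const std::string& payload) {
    assert(!path.empty());  // the caller must supply a target path
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    out << payload;
    return static_cast<bool>(out);
}

// Writes one file per thread. Each thread stores its outcome in its own
// element of `results` (1 = success, 0 = failure), so the threads never
// write to a common object.
inline std::vector<char> writeInParallel(const std::vector<std::string>& paths,
                                         const std::string& payload) {
    std::vector<char> results(paths.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < paths.size(); ++i) {
        workers.emplace_back([&paths, &payload, &results, i] {
            results[i] = writePayload(paths[i], payload) ? 1 : 0;
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return results;
}
```

The result vector uses `char` rather than `bool` on purpose: distinct elements of a `std::vector<char>` can be written from different threads, whereas `std::vector<bool>` packs its elements and does not give that guarantee.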

    Thanks again for the very helpful debug log and exact revision
    where failure began.

    Troy


    On 10/13/20 10:08 PM, Tobias Klein wrote:

    I managed to get a backtrace to a segmentation fault using GDB.

    It seems like the crash is happening in sword::FileMgr::open( ...

    The starting point is sword::InstallMgr::refreshRemoteSource,
    as I wrote before.

    Best regards,
    Tobias

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7f1af3fff700 (LWP 220833)]
    0x00007f1b027045a4 in sword::FileMgr::open(char const*, int, int, bool) ()
       from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
    (gdb) backtrace
    #0  0x00007f1b027045a4 in sword::FileMgr::open(char const*, int, int, bool) ()
       from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
    #1  0x00007f1b0276ad7b in sword::(anonymous namespace)::my_fwrite(void*, unsigned long, unsigned long, void*) ()
       from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
    #2  0x00007f1b180626bf in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
    #3  0x00007f1b18074a2b in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
    #4  0x00007f1b1807e2e4 in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
    #5  0x00007f1b1807f6f9 in curl_multi_perform () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
    #6  0x00007f1b18075d13 in curl_easy_perform () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
    #7  0x00007f1b0276b683 in sword::CURLFTPTransport::getURL(char const*, char const*, sword::SWBuf*) ()
       from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
    #8  0x00007f1b0271d5d2 in sword::InstallMgr::remoteCopy(sword::InstallSource*, char const*, char const*, bool, char const*) ()
       from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
    #9  0x00007f1b0271edc7 in sword::InstallMgr::refreshRemoteSource(sword::InstallSource*) ()
       from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
    #10 0x00007f1b026ad734 in RepositoryInterface::refreshIndividualRemoteSource(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<void (unsigned int)>*) ()
       from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
    #11 0x00007f1b026b17dd in std::thread::_State_impl<std::thread::_Invoker<std::tuple<int (RepositoryInterface::*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<void (unsigned int)>*), RepositoryInterface*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<void (unsigned int)>*> > >::_M_run() ()
       from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
    #12 0x00007f1b1d622cb4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #13 0x00007f1b1e20a609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
    #14 0x00007f1b1e131103 in clone () from /lib/x86_64-linux-gnu/libc.so.6

    On 10/13/20 1:07 PM, Tobias Klein wrote:

    Hi Troy,

    I tested more SVN revisions of SWORD trunk (starting from my
    stable version until I hit the bug) and I can now say that

    SVN Rev. 3759 is the last SVN revision that works without
    hanging for the below mentioned scenario. (20 out of 20
    tests successful)

    SVN Rev. 3760 is the first SVN revision where the hanging
    occurs. The commit message is "First cut at better isolation
    of FileIO to FileMgr and providing a WIN32 impl with works
    with wchar_t".

    Modified files:
    include/filemgr.h
    include/swbuf.h
    lib/bcppmake/libsword.bpr
    src/mgr/curlftpt.cpp
    src/mgr/curlhttpt.cpp
    src/mgr/filemgr.cpp
    src/mgr/installmgr.cpp
    src/mgr/swmgr.cpp
    src/utilfuns/utilstr.cpp

    Maybe this helps to find the root-cause.

    Best regards,
    Tobias

    On 10/12/20 9:20 PM, Tobias Klein wrote:

    I'll see whether I can collect a stack trace. It may take
    some time until I have it.

    The multi-threaded "remote source refreshing" worked
    without issues until recently.

    Here is the code of the function that does the actual work
    in a thread. See
    https://github.com/tobias-klein/node-sword-interface/blob/787160ccb4b3bab2a762d22f74031c7237edc803/src/sword_backend/repository_interface.cpp#L105

    
    int RepositoryInterface::refreshIndividualRemoteSource(string remoteSourceName,
        std::function<void(unsigned int progress)>* progressCallback)
    {
        //cout << "Refreshing source " << remoteSourceName << endl << flush;
        InstallSource* source = this->getRemoteSource(remoteSourceName);
        int result = this->_installMgr->refreshRemoteSource(source);

        if (result != 0) {
            cerr << "Failed to refresh source " << remoteSourceName << endl << flush;
        }

        remoteSourceUpdateMutex.lock();
        this->_remoteSourceUpdateCount++;
        unsigned int totalPercent = (unsigned int)calculateIntPercentage<double>(
            this->_remoteSourceUpdateCount, this->_remoteSourceCount);

        if (progressCallback != 0) {
            (*progressCallback)(totalPercent);
        }
        remoteSourceUpdateMutex.unlock();

        return result;
    }
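
For readers following along, here is a hedged sketch of how a worker like the one above might be launched once per source and then joined. It is not the node-sword-interface code: `refreshAll` is a hypothetical name, and the `refresh` parameter stands in for the real per-source work. It also uses `std::lock_guard` in place of manual `lock()`/`unlock()`, so the mutex is released even if the progress callback throws.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Runs `refresh` once per source name, each call in its own thread, and joins
// all threads before returning. `refresh` is a stand-in for the real work and
// returns 0 on success. Returns the number of sources whose refresh failed.
inline int refreshAll(const std::vector<std::string>& sources,
                      const std::function<int(const std::string&)>& refresh,
                      const std::function<void(unsigned int)>& onProgress) {
    assert(static_cast<bool>(refresh));  // a callable worker is required

    std::mutex progressMutex;
    unsigned int done = 0;
    const unsigned int total = static_cast<unsigned int>(sources.size());
    std::vector<int> results(sources.size(), 0);

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < sources.size(); ++i) {
        workers.emplace_back([&, i] {
            // Each thread writes only its own element of `results`.
            results[i] = refresh(sources[i]);

            // The counter and the callback are shared, so they are guarded.
            std::lock_guard<std::mutex> guard(progressMutex);
            ++done;
            if (onProgress) {
                onProgress(done * 100 / total);
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }

    int failures = 0;
    for (int result : results) {
        if (result != 0) {
            ++failures;
        }
    }
    return failures;
}
```

A caller would pass a function that refreshes a single source plus an optional progress callback; because the callback runs under the mutex, it is invoked by only one thread at a time.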

    Best regards,
    Tobias

    On 10/12/20 9:01 PM, Troy A. Griffitts wrote:
    Any luck getting a stack trace on crash?

    Regarding the "multithreaded mode", I'd have to get a bit
    more information as to exactly how you are sharing SWORD
    objects across your threads. Generally, as a rule, you
    shouldn't. We recommend a separate instance of SWMgr per
    thread and that probably goes for InstallMgr, as well.

    Troy

    On October 12, 2020 8:29:31 PM GMT+02:00, Tobias Klein
    <cont...@tklein.info> wrote:

        Hi Troy,

        I'm using curl on all three platforms.

        Regarding the timeout configuration, I have not changed
        anything yet; making this configurable in Ezra
        Project is still on my to-do list.

        I just checked on Linux.
        With the old version (May 18th 2020) no hanging or
        crash in 10 out of 10 times.
        With the new version (latest trunk / SWORD 1.9 RC3) I
        get 1 x crash, 2 x hanging, 7 x working.

        I'm running the InstallMgr::refreshRemoteSource "in a
        multi-threaded mode".

        Best regards,
        Tobias

        On 10/12/20 6:59 PM, Troy A. Griffitts wrote:
        Hi Tobias,

        What transport library are you building with? ftplib
        or curl?

        Have you changed the value of our new timeout from
        the default? I believe we decided on 10 seconds.

        Troy

        On October 12, 2020 6:46:54 PM GMT+02:00, Tobias
        Klein <cont...@tklein.info> wrote:

            Hi Troy,

            In my latest Ezra Project builds using SWORD trunk I’ve been
            noticing random "hangs" and crashes related to "updating remote
            sources". I suppose it must be around
            InstallMgr::refreshRemoteSource.

            This was still rock solid when using SWORD trunk from May 18th
            2020, but not so any more with the recent SWORD trunk.

            Unfortunately I cannot pinpoint this more specifically. I just
            wanted to first share this observation, because it’s worrying me.

            I’ve been noticing this regression both on Windows and macOS. Need
            to check later whether this also happens on Linux, cannot recall
            it right now.

            Best regards,
            Tobias
            
------------------------------------------------------------------------
            sword-devel mailing list:sword-devel@crosswire.org
            http://crosswire.org/mailman/listinfo/sword-devel
            Instructions to unsubscribe/change your settings at above page


-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

