Dear Tobias,
Thank you for all the great information. This enabled me to
isolate the change which caused the issue.
So, for a bit of background: SWORD has no calls to mark
critical sections which might be problematic for re-entrant
usage. This has been due to the many threading implementations
across many different platforms over the years, before C++11.
But, as a policy to support clients which wish to use SWORD in
a multithreaded manner, we do our best to make this safe by
advising clients to use separate SWMgr instances per thread.
There are still some shared objects in this scenario, but we
do our best to do all the writing to these shared objects upon
initialization. We broke this rule in commit 3760, which is
what caused your problem. SWORD has a facility to pool open
file handles, to help OSs which have small open file handle
limits. This work is done in FileMgr.
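To make the pooling idea concrete, here is a minimal C++ sketch of a bounded handle pool. It is an illustration under assumptions, not SWORD's FileMgr: BoundedFilePool and its methods are hypothetical names. Files are registered by path without opening anything, an OS handle is opened lazily on first use, and when the limit is reached the least recently used handle is closed. Note that it has no locking at all, so it is only safe if its container is never modified by two threads at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <list>
#include <string>
#include <vector>

// Hypothetical sketch (not SWORD's FileMgr): many registered files, but at
// most maxOpen OS-level handles open at any moment. No locking: one thread only.
class BoundedFilePool {
public:
    explicit BoundedFilePool(std::size_t maxOpen) : maxOpen_(maxOpen) {
        assert(maxOpen_ >= 1);
    }
    ~BoundedFilePool() {
        for (Entry &e : entries_)
            if (e.fp) std::fclose(e.fp);
    }
    BoundedFilePool(const BoundedFilePool &) = delete;
    BoundedFilePool &operator=(const BoundedFilePool &) = delete;

    // Registers a path (a "proxy") and returns its id. Nothing is opened yet.
    std::size_t add(const std::string &path) {
        entries_.push_back(Entry{path, nullptr});
        return entries_.size() - 1;
    }

    // Returns an open FILE* for the entry, opening it on demand and closing
    // the least recently used open handle if the limit has been reached.
    // A real pool would also remember each file's offset across close and
    // reopen; this sketch simply reopens at the start of the file.
    std::FILE *get(std::size_t id) {
        Entry &e = entries_[id];
        if (e.fp) {
            lru_.remove(id);  // move to the most-recently-used end
            lru_.push_back(id);
            return e.fp;
        }
        if (lru_.size() >= maxOpen_) {
            std::size_t victim = lru_.front();
            lru_.pop_front();
            std::fclose(entries_[victim].fp);
            entries_[victim].fp = nullptr;
        }
        e.fp = std::fopen(e.path.c_str(), "rb");
        if (e.fp) lru_.push_back(id);
        return e.fp;
    }

    std::size_t openCount() const { return lru_.size(); }

private:
    struct Entry {
        std::string path;
        std::FILE *fp;
    };
    std::vector<Entry> entries_;
    std::list<std::size_t> lru_;  // ids of open entries, oldest first
    std::size_t maxOpen_;
};
```

With a limit of 2 and three registered files, touching all three leaves only two OS handles open; touching the first file again transparently reopens it and closes the oldest remaining one.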
Recently, to support Windows Unicode path names (the commit
you found which breaks your multithreaded use), we rounded up
all remaining native file IO calls, replaced them to use
FileMgr for the IO, and then extended FileMgr to handle Windows
Unicode paths in a Windows-specific manner. One of these
changes was in CURLFTPTransport, which is where you are having
the issue. The problem is that, where previously this class
opened a FILE directly to do its writing, commit 3760
changed this to use FileMgr to open the file, which involves
the SWORD-wide file handle pool. Since we are creating a new
file, we are always writing to this shared pool container,
which is not threadsafe. My guess is that you have two
threads trying to update the pool container at exactly the
same time. Using the file handle pool is usually safe,
because SWMgr "opens" all of its file handles on
initialization. (These are not actual OS file handles; SWMgr
only adds proxy objects to the file handle pool container,
and the proxies delay the actual OS open until it is needed.)
The point is that this shared file handle pool container
writing is done on creation of the SWMgr; afterward, the
shared file handle pool is only read, and each object in the
pool is owned by only one thread if the "each thread must have
its own SWMgr" rule is followed.
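The "write during initialization, only read afterward" rule can be sketched in C++ as follows. ProxyEntry, sharedPool, worker and runAll are hypothetical names standing in for the pool container and its proxy objects; this is not SWORD code. All insertions into the shared vector happen on one thread before any worker starts, after which each worker only reads the container and mutates the one entry it owns, so no mutex is needed. If a worker instead inserted into the container while others were running (roughly what opening a new file through the pool does), that would be a data race and undefined behavior, which would be consistent with the SIGSEGV in FileMgr::open.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a pooled proxy object (not SWORD's API).
struct ProxyEntry {
    std::string path;
    std::size_t uses;  // written only by the thread that owns this entry
};

// Shared container. No mutex: it is safe only because every insertion
// happens before the worker threads start; afterwards it is only read.
std::vector<ProxyEntry> sharedPool;

void worker(std::size_t myIndex, std::size_t iterations) {
    assert(myIndex < sharedPool.size());
    // Indexing the vector only reads the container; the writes below go to
    // an entry that no other thread touches.
    ProxyEntry &mine = sharedPool[myIndex];
    for (std::size_t i = 0; i < iterations; ++i) ++mine.uses;
    // Calling sharedPool.push_back() here, from several threads at once,
    // would be a data race on the vector itself: undefined behavior.
}

std::size_t runAll(std::size_t nThreads, std::size_t iterations) {
    // Initialization phase: the only writes to the container, made on one
    // thread before any worker exists.
    sharedPool.clear();
    for (std::size_t i = 0; i < nThreads; ++i)
        sharedPool.push_back(ProxyEntry{"module" + std::to_string(i), 0});

    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < nThreads; ++i)
        threads.emplace_back(worker, i, iterations);
    for (std::thread &t : threads) t.join();

    // After join(), every worker's writes are visible to this thread.
    std::size_t total = 0;
    for (const ProxyEntry &e : sharedPool) total += e.uses;
    return total;
}
```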
Regardless of the details, I believe I have committed a fix
for you. In short, I have changed CURLFTPTransport to follow
our rule to avoid writing to shared objects when we might be
re-entrant. It now uses FileMgr's methods which isolate the
OS implementation, but not FileMgr's file handle pool (it did
not use the pool before this commit either). This should
allow it to still take advantage of the Windows OS-specific
implementation while avoiding the critical section. Can you
please try SVN head and let me know if we are back to 20 out
of 20 successes?
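A rough C++ sketch of the pattern the fix describes: the download's write callback writes to a file handle that belongs to that one transfer, instead of going through a shared pool. PrivateDest and privateWrite are hypothetical names, and this is not the committed SWORD code. std::fopen stands in for FileMgr's OS-isolation methods, which is what the actual fix uses so that Windows Unicode paths keep working. The callback has the shape libcurl expects for CURLOPT_WRITEFUNCTION; the backtrace shows my_fwrite calling FileMgr::open, which suggests a similar open-on-first-write, so the sketch opens lazily too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical per-transfer destination. Each transfer (and therefore each
// thread) owns its own instance, so nothing here is shared.
struct PrivateDest {
    explicit PrivateDest(std::string p) : path(std::move(p)) {}
    ~PrivateDest() { if (fp) std::fclose(fp); }
    PrivateDest(const PrivateDest &) = delete;
    PrivateDest &operator=(const PrivateDest &) = delete;

    std::string path;
    std::FILE *fp = nullptr;  // opened lazily on the first write
};

// Has the shape libcurl expects for CURLOPT_WRITEFUNCTION. Wiring it up
// would look like this (not compiled here):
//   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, privateWrite);
//   curl_easy_setopt(curl, CURLOPT_WRITEDATA, &dest);
std::size_t privateWrite(char *ptr, std::size_t size, std::size_t nmemb,
                         void *userdata) {
    PrivateDest *dest = static_cast<PrivateDest *>(userdata);
    assert(dest != nullptr);
    if (!dest->fp) {
        // Plain OS-level open: no shared handle pool is read or written.
        dest->fp = std::fopen(dest->path.c_str(), "wb");
        if (!dest->fp) return 0;  // returning less than size*nmemb aborts the transfer
    }
    // fwrite returns the number of items written; libcurl wants bytes.
    return std::fwrite(ptr, size, nmemb, dest->fp) * size;
}
```

Because every transfer owns its own PrivateDest, two threads downloading at the same time never touch the same object, so no critical section is needed.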
Thanks again for the very helpful debug log and exact revision
where failure began.
Troy
On 10/13/20 10:08 PM, Tobias Klein wrote:
I managed to get a backtrace to a segmentation fault using GDB.
It seems like the crash is happening in sword::FileMgr::open( ...
The starting point is sword::InstallMgr::refreshRemoteSource
as I was writing before.
Best regards,
Tobias
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f1af3fff700 (LWP 220833)]
0x00007f1b027045a4 in sword::FileMgr::open(char const*, int, int, bool) () from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
(gdb) backtrace
#0 0x00007f1b027045a4 in sword::FileMgr::open(char const*, int, int, bool) () from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
#1 0x00007f1b0276ad7b in sword::(anonymous namespace)::my_fwrite(void*, unsigned long, unsigned long, void*) () from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
#2 0x00007f1b180626bf in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#3 0x00007f1b18074a2b in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#4 0x00007f1b1807e2e4 in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#5 0x00007f1b1807f6f9 in curl_multi_perform () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#6 0x00007f1b18075d13 in curl_easy_perform () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#7 0x00007f1b0276b683 in sword::CURLFTPTransport::getURL(char const*, char const*, sword::SWBuf*) () from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
#8 0x00007f1b0271d5d2 in sword::InstallMgr::remoteCopy(sword::InstallSource*, char const*, char const*, bool, char const*) () from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
#9 0x00007f1b0271edc7 in sword::InstallMgr::refreshRemoteSource(sword::InstallSource*) () from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
#10 0x00007f1b026ad734 in RepositoryInterface::refreshIndividualRemoteSource(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<void (unsigned int)>*) () from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
#11 0x00007f1b026b17dd in std::thread::_State_impl<std::thread::_Invoker<std::tuple<int (RepositoryInterface::*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<void (unsigned int)>*), RepositoryInterface*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<void (unsigned int)>*> > >::_M_run() () from /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
#12 0x00007f1b1d622cb4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007f1b1e20a609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#14 0x00007f1b1e131103 in clone () from /lib/x86_64-linux-gnu/libc.so.6
On 10/13/20 1:07 PM, Tobias Klein wrote:
Hi Troy,
I tested more SVN revisions of SWORD trunk (starting from my
stable version until I hit the bug) and I can now say that
SVN Rev. 3759 is the last SVN revision that works without
hanging for the below mentioned scenario. (20 out of 20
tests successful)
SVN Rev. 3760 is the first SVN revision where the hanging
occurs. The commit message is "First cut at better isolation
of FileIO to FileMgr and providing a WIN32 impl with works
with wchar_t".
Modified files:
include/filemgr.h
include/swbuf.h
lib/bcppmake/libsword.bpr
src/mgr/curlftpt.cpp
src/mgr/curlhttpt.cpp
src/mgr/filemgr.cpp
src/mgr/installmgr.cpp
src/mgr/swmgr.cpp
src/utilfuns/utilstr.cpp
Maybe this helps to find the root-cause.
Best regards,
Tobias
On 10/12/20 9:20 PM, Tobias Klein wrote:
I'll see whether I can collect a stack trace. It may take
some time until I have it.
The multi-threaded "remote source refreshing" worked
without issues until recently.
Here is the code of the function that does the actual work
in a thread.
See
https://github.com/tobias-klein/node-sword-interface/blob/787160ccb4b3bab2a762d22f74031c7237edc803/src/sword_backend/repository_interface.cpp#L105.
int RepositoryInterface::refreshIndividualRemoteSource(string remoteSourceName,
                                                       std::function<void(unsigned int progress)>* progressCallback)
{
    //cout << "Refreshing source " << remoteSourceName << endl << flush;
    InstallSource* source = this->getRemoteSource(remoteSourceName);
    int result = this->_installMgr->refreshRemoteSource(source);
    if (result != 0) {
        cerr << "Failed to refresh source " << remoteSourceName << endl << flush;
    }

    remoteSourceUpdateMutex.lock();
    this->_remoteSourceUpdateCount++;
    unsigned int totalPercent =
        (unsigned int)calculateIntPercentage<double>(this->_remoteSourceUpdateCount,
                                                     this->_remoteSourceCount);
    if (progressCallback != 0) {
        (*progressCallback)(totalPercent);
    }
    remoteSourceUpdateMutex.unlock();

    return result;
}
Best regards,
Tobias
On 10/12/20 9:01 PM, Troy A. Griffitts wrote:
Any luck getting a stack trace on crash?
Regarding the "multithreaded mode", I'd have to get a bit
more information as to exactly how you are sharing SWORD
objects across your threads. Generally, as a rule, you
shouldn't. We recommend a separate instance of SWMgr per
thread and that probably goes for InstallMgr, as well.
Troy
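Troy's per-thread rule, applied to the refresh code quoted above, might look like the following C++ sketch. FakeInstallMgr, Progress, refreshOne and refreshAll are hypothetical stand-ins; in real code each thread would construct its own sword::InstallMgr (constructor arguments omitted here, since they depend on the application's configuration). Only the application's own progress counter is shared between threads, and it is guarded with std::lock_guard so the mutex is released automatically at the end of the scope.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for sword::InstallMgr. Each instance is created,
// used and destroyed on a single thread.
class FakeInstallMgr {
public:
    int refreshRemoteSource(const std::string &sourceName) {
        lastSource_ = sourceName;  // private state, invisible to other threads
        return 0;                  // 0 means success, as in the quoted code
    }
private:
    std::string lastSource_;
};

// The only state shared between threads, so the only state behind a mutex.
struct Progress {
    std::mutex m;
    unsigned done = 0;
    unsigned failed = 0;
    unsigned total = 0;
};

void refreshOne(const std::string &source, Progress &progress,
                const std::function<void(unsigned)> &onPercent) {
    FakeInstallMgr mgr;  // one manager per thread, never shared
    int result = mgr.refreshRemoteSource(source);

    std::lock_guard<std::mutex> lock(progress.m);  // RAII: unlocked at end of scope
    if (result != 0) ++progress.failed;
    ++progress.done;
    onPercent(progress.done * 100 / progress.total);
}

unsigned refreshAll(const std::vector<std::string> &sources,
                    const std::function<void(unsigned)> &onPercent) {
    Progress progress;
    progress.total = static_cast<unsigned>(sources.size());

    std::vector<std::thread> threads;
    for (const std::string &s : sources)
        threads.emplace_back(refreshOne, std::cref(s), std::ref(progress),
                             std::cref(onPercent));
    for (std::thread &t : threads) t.join();

    assert(progress.done == progress.total);
    return progress.done - progress.failed;  // number of successful refreshes
}
```

The design choice is the one Troy describes: no manager object crosses a thread boundary, and the mutex protects only the counter the threads genuinely share.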
On October 12, 2020 8:29:31 PM GMT+02:00, Tobias Klein
<cont...@tklein.info> wrote:
Hi Troy,
I'm using curl on all three platforms.
Regarding the timeout configuration, I have not changed
anything yet; making this configurable in Ezra
Project is still on my to-do list.
I just checked on Linux.
With the old version (May 18th 2020) no hanging or
crash in 10 out of 10 times.
With the new version (latest trunk / SWORD 1.9 RC3) I
get 1 x crash, 2 x hanging, 7 x working.
I'm running the InstallMgr::refreshRemoteSource "in a
multi-threaded mode".
Best regards,
Tobias
On 10/12/20 6:59 PM, Troy A. Griffitts wrote:
Hi Tobias,
What transport library are you building with? ftplib
or curl?
Have you changed our new timeout value from the
default (10 seconds, I believe we decided on)?
Troy
On October 12, 2020 6:46:54 PM GMT+02:00, Tobias
Klein <cont...@tklein.info> wrote:
Hi Troy,
In my latest Ezra Project builds using SWORD trunk I’ve been noticing
random "hangs" and crashes related to "updating remote sources". I suspect it
is somewhere around InstallMgr::refreshRemoteSource.
This was still rock solid when using SWORD trunk from May 18th
2020, but not any more with the recent SWORD trunk.
Unfortunately I cannot pinpoint this more specifically. I just
wanted to share this observation first, because it’s worrying me.
I’ve been noticing this regression on both Windows and macOS. I need
to check later whether this also happens on Linux; I cannot recall right now.
Best regards,
Tobias
------------------------------------------------------------------------
sword-devel mailing list:sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
--
Sent from my Android device with K-9 Mail. Please
excuse my brevity.