Hi David and Robbin,

Please review a new version of the fix that makes the service thread 
responsible for the thread table growth.

Webrev:  http://cr.openjdk.java.net/~dtitov/8231666/webrev.02/
Bug: https://bugs.openjdk.java.net/browse/JDK-8231666 

Testing:  Mach5 tier1, tier2, and tier3 tests successfully passed.

Thank you!

Best regards,
Daniil 

On 10/2/19, 3:26 PM, "David Holmes" <david.hol...@oracle.com> wrote:

    Hi Daniil,
    
    On 3/10/2019 2:21 am, Daniil Titov wrote:
    > Hi David and Robbin,
    > 
    > Could we consider  making the ServiceThread responsible for the 
ThreadIdTable resizing in the similar way how
    > it works for  StringTable  and ResolvedMethodTable, rather than having 
ThreadIdTable::add() method calling ThreadIdTable::grow()?
    > As I understand It should solve  the current  issue and  address the 
concern that  the doing the resizing could be a relatively long and
    > doing it without polling  for safepoints or while the holding 
Threads_lock is not desirable.
    
    I originally rejected copying that part of the code from the other 
    tables as it seems to introduce unnecessary complexity. Having a 
    separate thread trying to grow the table when it could be concurrently 
    having threads added and removed seems like it could introduce hard to 
    diagnose performance pathologies. It also adds what we know to be a 
    potentially long running action to the workload of the service thread, 
    which means it may also impact the other tasks the service thread is 
    doing, thus potentially introducing even more hard to diagnose 
    performance pathologies.
    
    So this change does concern me. But go ahead and trial it.
    
    Thanks,
    David
    
    
    > Thank you,
    > Daniil
    > 
    > 
    > On 10/2/19, 6:25 AM, "David Holmes" <david.hol...@oracle.com> wrote:
    > 
    >      Hi Robbin,
    >      
    >      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
    >      > Hi David,
    >      >
    >      >> What if the table is full and must be grown?
    >      >
    >      > The table uses chaining, it just means load factor tip over what is
    >      > considered a good backing array size.
    >      
    >      Coleen raised a good question in a separate discussion, which made me
    >      realize that once the table has been initially populated all 
subsequent
    >      additions, and hence all subsequent calls to grow() always happen 
with
    >      the Threads_lock held. So we can't just defer the grow().
    >      
    >      >> That aside, I'd like to know how expensive it is to grow this 
table.
    >      >> What are we talking about here?
    >      >
    >      > We use global counter which on write_synchronize must scan all
    >      > threads to make sure they have seen the update (there some
    >      > optimization to avoid it if there is no readers at all). Since this
    >      > table contains the threads, we get double penalized, for each new
    >      > thread the synchronization cost increase AND the number of items.
    >      >
    >      > With concurrent reads you still need many thousands of threads, but
    >      > I think I saw someone mentioning 100k threads, assuming concurrent
    >      > queries the resize can take hundreds of ms to finish. Note that 
reads
    >      > and inserts still in operate roughly at the same speed while
    >      > resizing. So a longer resize is only problematic if we do not
    >      > respect safepoints.
    >      I think if anything were capable of running 100K threads we would be
    >      hitting far worse scalability bottlenecks than this. But this does 
seem
    >      problematic.
    >      
    >      Thanks,
    >      David
    >      -----
    >      
    >      > Thanks, Robbin
    >      >
    >      >>
    >      >> David
    >      >>
    >      >>> /Robbin
    >      >>>
    >      >>> On 2019-10-02 08:46, David Holmes wrote:
    >      >>>> Hi Daniil,
    >      >>>>
    >      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
    >      >>>>> Please review a change that fixes the issue. The problem here 
is
    >      >>>>> that that the thread is added to the ThreadIdTable  
(introduced in
    >      >>>>> [3]) while the Threads_lock is held by
    >      >>>>> JVM_StartThread. When new thread is added  to the thread table 
the
    >      >>>>> table checks if its load factor is greater than required and 
if so
    >      >>>>> it grows itself while polling for safepoints.
    >      >>>>> After changes [4]  an attempt to block the thread while 
holding the
    >      >>>>> Threads_lock  results in assertion in
    >      >>>>> Thread::check_possible_safepoint().
    >      >>>>>
    >      >>>>> The fix  proposed by David Holmes ( thank you, David!)  is to 
skip
    >      >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() method if the
    >      >>>>> current thread owns the Threads_lock.
    >      >>>>
    >      >>>> Sorry but looking at the fix in context now I think it would be
    >      >>>> better to do this:
    >      >>>>
    >      >>>>      while (gt.do_task(jt)) {
    >      >>>>        if (Threads_lock->owner() == jt) {
    >      >>>>          gt.pause(jt);
    >      >>>>          ThreadBlockInVM tbivm(jt);
    >      >>>>          gt.cont(jt);
    >      >>>>        }
    >      >>>>      }
    >      >>>>
    >      >>>> This way we don't waste time with the pause/cont when there's no
    >      >>>> safepoint pause going to happen - and the owner() check is 
quicker
    >      >>>> than owned_by_self(). That partially addresses a general 
concern I
    >      >>>> have about how long it may take to grow the table, as we are
    >      >>>> deferring safepoints until it is complete in this 
JVM_StartThread
    >      >>>> usecase.
    >      >>>>
    >      >>>> In the test you don't need all of:
    >      >>>>
    >      >>>>    32  * @run clean ThreadStartTest
    >      >>>>    33  * @run build ThreadStartTest
    >      >>>>    34  * @run main ThreadStartTest
    >      >>>>
    >      >>>> just the last @run suffices to build and run the test.
    >      >>>>
    >      >>>> Thanks,
    >      >>>> David
    >      >>>> -----
    >      >>>>
    >      >>>>> Testing : Mach 5 tier1 and tier2 completed successfully, tier3 
is
    >      >>>>> in progress.
    >      >>>>>
    >      >>>>> [1] Webrev: 
http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
    >      >>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
    >      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
    >      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
    >      >>>>>
    >      >>>>> Best regards,
    >      >>>>> Danill
    >      >>>>>
    >      >>>>>
    >      
    > 
    > 
    


Reply via email to