Re: RFR: 8231666: ThreadIdTable::grow() invokes invalid thread transition

David Holmes Fri, 04 Oct 2019 21:02:55 -0700

Hi Daniil,

On 5/10/2019 1:23 pm, Daniil Titov wrote:

Hi David and Robbin,


Please review a new version of the fix that adds the max size check 
check_concurrent_work code [1].


That change seems fine.

    I don't think you need to repeat the load factor check here:

void ThreadIdTable::do_concurrent_work(JavaThread* jt) {

         assert(_is_initialized, "Thread table is not initialized");
         _has_work = false;
         double load_factor = get_load_factor();
         log_debug(thread, table)("Concurrent work, load factor: %g",
    load_factor);
         if (load_factor > PREF_AVG_LIST_LEN &&
    !_local_table->is_max_size_reached()) {
           grow(jt);
         }
       }

as we will only execute this code if the load factor was seen to be too

    high.


I decided to leave it unchanged since in my understanding it could be the case 
when some threads exited and
were removed from the table after the work was triggered but before the service 
thread called do_concurrent_work()
method. In this case we might have the load factor back to the normal and 
therefore have no need to increase the size
  of the thread table.

True, but if new threads get added again you could just repeat theprocess. This is a more stable process if you use an "edge trigger"rather than a "level trigger". But either way we are making assumptionsabout the pattern of adding and removing threads. So okay to leave as-is.


So, good to go.

Thanks,
David

Testing: Mach5 tier1, tier2, and tier3 tests passed.

[1] Webrev:  http://cr.openjdk.java.net/~dtitov/8231666/webrev.03/
[2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666

Thank you,
Daniil

On 10/3/19, 11:30 PM, "Robbin Ehn" <[email protected]> wrote:

     Hi Daniil,

>

     > You might also want to put the max size check in the 
check_concurrent_work code:
     >
     > +   // Resize if we have more items than preferred load factor
     > +   if ( load_factor > PREF_AVG_LIST_LEN && 
!_local_table->is_max_size_reached()) {
     >
     > so that we don't keep waking up the service thread for nothing if the 
table gets
     > full.

Yes that would be a good, otherwise seems fine.>

     > Thanks,
     > David
     > -----
     >
     >> Testing:  Mach5 tier1, tier2, and tier3 tests successfully passed.

And if you have not done so, you should test this with the benchmark you have as

     a stress test and see that this does what we think.

Thanks, Robbin>>

     >> Thank you!
     >>
     >> Best regards,
     >> Daniil
     >>
     >> On 10/2/19, 3:26 PM, "David Holmes" <[email protected]> wrote:
     >>
     >>      Hi Daniil,
     >>      On 3/10/2019 2:21 am, Daniil Titov wrote:
     >>      > Hi David and Robbin,
     >>      >
     >>      > Could we consider  making the ServiceThread responsible for the
     >> ThreadIdTable resizing in the similar way how
     >>      > it works for  StringTable  and ResolvedMethodTable, rather than 
having
     >> ThreadIdTable::add() method calling ThreadIdTable::grow()?
     >>      > As I understand It should solve  the current  issue and  address 
the
     >> concern that  the doing the resizing could be a relatively long and
     >>      > doing it without polling  for safepoints or while the holding
     >> Threads_lock is not desirable.
     >>      I originally rejected copying that part of the code from the other
     >>      tables as it seems to introduce unnecessary complexity. Having a
     >>      separate thread trying to grow the table when it could be 
concurrently
     >>      having threads added and removed seems like it could introduce 
hard to
     >>      diagnose performance pathologies. It also adds what we know to be a
     >>      potentially long running action to the workload of the service 
thread,
     >>      which means it may also impact the other tasks the service thread 
is
     >>      doing, thus potentially introducing even more hard to diagnose
     >>      performance pathologies.
     >>      So this change does concern me. But go ahead and trial it.
     >>      Thanks,
     >>      David
     >>      > Thank you,
     >>      > Daniil
     >>      >
     >>      >
     >>      > On 10/2/19, 6:25 AM, "David Holmes" <[email protected]> 
wrote:
     >>      >
     >>      >      Hi Robbin,
     >>      >
     >>      >      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
     >>      >      > Hi David,
     >>      >      >
     >>      >      >> What if the table is full and must be grown?
     >>      >      >
     >>      >      > The table uses chaining, it just means load factor tip 
over what is
     >>      >      > considered a good backing array size.
     >>      >
     >>      >      Coleen raised a good question in a separate discussion, 
which made me
     >>      >      realize that once the table has been initially populated all
     >> subsequent
     >>      >      additions, and hence all subsequent calls to grow() always 
happen
     >> with
     >>      >      the Threads_lock held. So we can't just defer the grow().
     >>      >
     >>      >      >> That aside, I'd like to know how expensive it is to grow 
this
     >> table.
     >>      >      >> What are we talking about here?
     >>      >      >
     >>      >      > We use global counter which on write_synchronize must 
scan all
     >>      >      > threads to make sure they have seen the update (there some
     >>      >      > optimization to avoid it if there is no readers at all). 
Since this
     >>      >      > table contains the threads, we get double penalized, for 
each new
     >>      >      > thread the synchronization cost increase AND the number 
of items.
     >>      >      >
     >>      >      > With concurrent reads you still need many thousands of 
threads, but
     >>      >      > I think I saw someone mentioning 100k threads, assuming 
concurrent
     >>      >      > queries the resize can take hundreds of ms to finish. 
Note that
     >> reads
     >>      >      > and inserts still in operate roughly at the same speed 
while
     >>      >      > resizing. So a longer resize is only problematic if we do 
not
     >>      >      > respect safepoints.
     >>      >      I think if anything were capable of running 100K threads we 
would be
     >>      >      hitting far worse scalability bottlenecks than this. But 
this does
     >> seem
     >>      >      problematic.
     >>      >
     >>      >      Thanks,
     >>      >      David
     >>      >      -----
     >>      >
     >>      >      > Thanks, Robbin
     >>      >      >
     >>      >      >>
     >>      >      >> David
     >>      >      >>
     >>      >      >>> /Robbin
     >>      >      >>>
     >>      >      >>> On 2019-10-02 08:46, David Holmes wrote:
     >>      >      >>>> Hi Daniil,
     >>      >      >>>>
     >>      >      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
     >>      >      >>>>> Please review a change that fixes the issue. The 
problem
     >> here is
     >>      >      >>>>> that that the thread is added to the ThreadIdTable
     >> (introduced in
     >>      >      >>>>> [3]) while the Threads_lock is held by
     >>      >      >>>>> JVM_StartThread. When new thread is added  to the 
thread
     >> table the
     >>      >      >>>>> table checks if its load factor is greater than 
required and
     >> if so
     >>      >      >>>>> it grows itself while polling for safepoints.
     >>      >      >>>>> After changes [4]  an attempt to block the thread 
while
     >> holding the
     >>      >      >>>>> Threads_lock  results in assertion in
     >>      >      >>>>> Thread::check_possible_safepoint().
     >>      >      >>>>>
     >>      >      >>>>> The fix  proposed by David Holmes ( thank you, 
David!)  is
     >> to skip
     >>      >      >>>>> the ThreadBlockInVM inside ThreadIdTable::grow() 
method if the
     >>      >      >>>>> current thread owns the Threads_lock.
     >>      >      >>>>
     >>      >      >>>> Sorry but looking at the fix in context now I think it 
would be
     >>      >      >>>> better to do this:
     >>      >      >>>>
     >>      >      >>>>      while (gt.do_task(jt)) {
     >>      >      >>>>        if (Threads_lock->owner() == jt) {
     >>      >      >>>>          gt.pause(jt);
     >>      >      >>>>          ThreadBlockInVM tbivm(jt);
     >>      >      >>>>          gt.cont(jt);
     >>      >      >>>>        }
     >>      >      >>>>      }
     >>      >      >>>>
     >>      >      >>>> This way we don't waste time with the pause/cont when 
there's no
     >>      >      >>>> safepoint pause going to happen - and the owner() 
check is
     >> quicker
     >>      >      >>>> than owned_by_self(). That partially addresses a 
general
     >> concern I
     >>      >      >>>> have about how long it may take to grow the table, as 
we are
     >>      >      >>>> deferring safepoints until it is complete in this
     >> JVM_StartThread
     >>      >      >>>> usecase.
     >>      >      >>>>
     >>      >      >>>> In the test you don't need all of:
     >>      >      >>>>
     >>      >      >>>>    32  * @run clean ThreadStartTest
     >>      >      >>>>    33  * @run build ThreadStartTest
     >>      >      >>>>    34  * @run main ThreadStartTest
     >>      >      >>>>
     >>      >      >>>> just the last @run suffices to build and run the test.
     >>      >      >>>>
     >>      >      >>>> Thanks,
     >>      >      >>>> David
     >>      >      >>>> -----
     >>      >      >>>>
     >>      >      >>>>> Testing : Mach 5 tier1 and tier2 completed 
successfully,
     >> tier3 is
     >>      >      >>>>> in progress.
     >>      >      >>>>>
     >>      >      >>>>> [1] Webrev:
     >> http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
     >>      >      >>>>> [2] Bug: 
https://bugs.openjdk.java.net/browse/JDK-8231666
     >>      >      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
     >>      >      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
     >>      >      >>>>>
     >>      >      >>>>> Best regards,
     >>      >      >>>>> Danill
     >>      >      >>>>>
     >>      >      >>>>>
     >>      >
     >>      >
     >>      >
     >>
     >>

Re: RFR: 8231666: ThreadIdTable::grow() invokes invalid thread transition

Reply via email to