I posted a PR + Jira ticket with the update: https://github.com/apache/ignite/pull/10123
The PR checks are still running/pending. Any feedback or help is appreciated.

Art

On Tue, Jun 28, 2022 at 10:53 PM Pavel Tupitsyn <ptupit...@apache.org> wrote:

> Thank you for tracking this down! An additional map by name is a good idea
> there.
>
> > CONCURRENCY NOTE: these two maps need to update concurrently
> All updates are triggered by discovery events, which are raised under
> "synchronized (discoEvtMux)" in GridDiscoveryManager,
> so it is safe to update two maps together.
>
> > is desc.name() unique?
> Yes
>
> On Wed, Jun 29, 2022 at 2:06 AM Arthur Naseef <artnas...@apache.org>
> wrote:
>
>> The following is taking most of the time:
>>
>> @Nullable private ServiceInfo lookupInRegisteredServices(String name) {
>>     for (ServiceInfo desc : registeredServices.values()) {
>>         if (desc.name().equals(name))
>>             return desc;
>>     }
>>
>>     return null;
>> }
>>
>> After changing that to use a Map lookup:
>>
>> - 50,000 service startup in *8s* (down from around 70s)
>> - 100,000 service startup in *14s* (right around 2x of the 50K timing)
>>
>> Here's the change I tested (note it's shortened) - it's not 100%, but
>> fine for my test case, I believe:
>>
>> private final ConcurrentMap<String, ServiceInfo> registeredServicesByName
>>     = new ConcurrentHashMap<>();
>>
>> @Nullable private ServiceInfo lookupInRegisteredServices(String name) {
>>     return registeredServicesByName.get(name);
>> }
>>
>> private void registerService(ServiceInfo desc) {
>>     desc.context(ctx);
>>
>>     // (CONCURRENCY NOTE: these two maps need to update concurrently)
>>     registeredServices.put(desc.serviceId(), desc);
>>     registeredServicesByName.put(desc.name(), desc);
>> }
>>
>> That's in IgniteServiceProcessor.java.
>>
>> Any thoughts? I'll gladly clean this up and make a PR - would appreciate
>> feedback to help address possible questions with this change (e.g. is
>> desc.name() unique?).
>>
>> Art
>>
>> On Tue, Jun 28, 2022 at 12:27 PM Arthur Naseef <artnas...@apache.org>
>> wrote:
>>
>>> Yes. The "services" in our case will be schedules that periodically
>>> perform fast operations.
>>>
>>> For example, a service could be "ping this device every <x> seconds".
>>>
>>> Art
>>>
>>> On Tue, Jun 28, 2022 at 12:20 PM Pavel Tupitsyn <ptupit...@apache.org>
>>> wrote:
>>>
>>>> > we do not plan to make cross-cluster calls into the services
>>>>
>>>> If you are making local calls, I think there is no point in using
>>>> Ignite services.
>>>> Can you describe the use case - what are you trying to achieve?
>>>>
>>>> On Tue, Jun 28, 2022 at 8:55 PM Arthur Naseef <artnas...@apache.org>
>>>> wrote:
>>>>
>>>>> Hello - I'm getting started with Ignite and looking seriously at using
>>>>> it for a specific use case.
>>>>>
>>>>> Working on a Proof-Of-Concept (POC), I have a question related
>>>>> to performance, and I'm wondering whether the solution, using Ignite
>>>>> Services, is a good fit for the use case.
>>>>>
>>>>> In my testing, I am getting the following timings:
>>>>>
>>>>> - Startup of 20,000 ignite services takes 30 seconds
>>>>> - Startup of 50,000 ignite services takes 250 seconds
>>>>>    - The 2.5x increase from 20,000 to 50,000 yielded a > 8x increase
>>>>>    in startup time (appears to be superlinear growth)
>>>>>
>>>>> Watching the JVM during this time, I see the following:
>>>>>
>>>>> - Heap usage is not significant (do not see signs of GC)
>>>>> - CPU usage is only slightly increased - on the order of 20% total
>>>>> (system has 12 cores/24 threads)
>>>>> - Network utilization is reasonable
>>>>> - Futex system call (measured with "strace -r") appears to be
>>>>> taking the most time by far.
>>>>>
>>>>> The use case involves the following:
>>>>>
>>>>> - Startup of up to hundreds of thousands of services at cluster
>>>>> spin-up
>>>>> - Frequent, small adjustments to the services running over time
>>>>> - Need to rebalance when a new node joins the cluster, or an old
>>>>> one leaves the cluster
>>>>> - Once the services are deployed, we do not plan to make
>>>>> cross-cluster calls into the services (i.e. we do *not* plan to
>>>>> use ignite's services().serviceProxy() on these)
>>>>> - Jobs don't look like a fit because these (1) are "long-running"
>>>>> (actually periodically scheduled tasks) and (2) they need to
>>>>> redistribute even after they start running
>>>>>
>>>>> This is starting to get long. I have more details to share. Here is
>>>>> the repo with the code being used to test, and a link to a wiki page
>>>>> with some of the details:
>>>>>
>>>>> https://github.com/opennms-forge/distributed-scheduling-poc/
>>>>>
>>>>> https://github.com/opennms-forge/distributed-scheduling-poc/wiki/Ignite-Startup-Performance
>>>>>
>>>>> Questions I have in mind:
>>>>>
>>>>> - Are services a good fit here? We expect to reach upwards of
>>>>> 500,000 services in a cluster with multiple nodes.
>>>>> - Any thoughts on tracking down the bottleneck and alleviating
>>>>> it? (I have started taking timing measurements in the Ignite code)
>>>>>
>>>>> Stopping here - please ask questions and I'll gladly fill in details.
>>>>> Any tips are welcome, including ideas for tracking down just where the
>>>>> bottleneck exists.
>>>>>
>>>>> Art
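The name-indexed lookup discussed in this thread can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Ignite's code: `ServiceInfoStub` and `ServiceRegistrySketch` are hypothetical names, the service id is modeled as a `String` only to keep the example self-contained, and the `synchronized` block merely stands in for the `discoEvtMux` lock that, per Pavel's reply, already serializes these updates in `GridDiscoveryManager`. It also includes an unregister path, which the shortened patch in the thread does not show.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Ignite's internal ServiceInfo, reduced to the two
// fields the lookup needs. The id is a String only to keep the sketch self-contained.
final class ServiceInfoStub {
    private final String serviceId;
    private final String name;

    ServiceInfoStub(String serviceId, String name) {
        this.serviceId = serviceId;
        this.name = name;
    }

    String serviceId() { return serviceId; }

    String name() { return name; }
}

// Hypothetical registry mirroring the patch in the thread: one map keyed by id,
// a second keyed by name so lookups by name avoid scanning every entry.
final class ServiceRegistrySketch {
    private final Map<String, ServiceInfoStub> byId = new ConcurrentHashMap<>();
    private final Map<String, ServiceInfoStub> byName = new ConcurrentHashMap<>();

    // Serializes writers so both maps change together. This lock only stands in
    // for the discoEvtMux lock mentioned in the thread. Readers take no lock, so a
    // concurrent reader can briefly see a descriptor in one map but not the other.
    private final Object writeLock = new Object();

    void register(ServiceInfoStub desc) {
        synchronized (writeLock) {
            byId.put(desc.serviceId(), desc);
            byName.put(desc.name(), desc);
        }
    }

    void unregister(String serviceId) {
        synchronized (writeLock) {
            ServiceInfoStub removed = byId.remove(serviceId);
            // Drop the name entry only if it still points at this descriptor.
            if (removed != null)
                byName.remove(removed.name(), removed);
        }
    }

    // Expected O(1) hash lookup instead of an O(n) scan over byId.values().
    ServiceInfoStub lookupByName(String name) {
        return byName.get(name);
    }

    int size() {
        return byId.size();
    }

    public static void main(String[] args) {
        ServiceRegistrySketch reg = new ServiceRegistrySketch();
        reg.register(new ServiceInfoStub("id-1", "ping-device-1"));
        reg.register(new ServiceInfoStub("id-2", "ping-device-2"));
        System.out.println(reg.lookupByName("ping-device-2").serviceId()); // prints id-2
        reg.unregister("id-2");
        System.out.println(reg.lookupByName("ping-device-2")); // prints null
    }
}
```

Because the thread confirms that `desc.name()` is unique, a plain one-to-one map by name is enough here; non-unique names would instead need a map from name to a collection of descriptors.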
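The superlinear startup timings reported in this thread are consistent with a linear scan running once per deployment. Assume, as the thread does not state outright, that each new service triggers one lookup of a name that is not yet registered. Then the k-th deployment compares against k existing entries, so n deployments cost 0 + 1 + ... + (n - 1) = n(n - 1)/2 comparisons. The hypothetical `ScanCostModel` below simulates that pattern and checks it against the closed form. It is a cost model, not a benchmark of Ignite.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cost model: each new service name is checked against every
// already-registered name with a linear scan, like the original
// lookupInRegisteredServices loop.
final class ScanCostModel {
    // Simulates n sequential registrations and counts equals() calls.
    static long countScanComparisons(int n) {
        List<String> registered = new ArrayList<>();
        long comparisons = 0;
        for (int i = 0; i < n; i++) {
            String name = "svc-" + i;
            for (String existing : registered) {
                comparisons++;
                if (existing.equals(name))
                    break; // never taken here: all generated names are unique
            }
            registered.add(name);
        }
        return comparisons;
    }

    // Closed form: 0 + 1 + ... + (n - 1) = n(n - 1) / 2.
    static long predictedComparisons(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(countScanComparisons(1_000));
        System.out.println(predictedComparisons(1_000));
        double ratio = (double) predictedComparisons(50_000) / predictedComparisons(20_000);
        System.out.printf("50K vs 20K comparison ratio: %.2f%n", ratio);
    }
}
```

For 20,000 vs 50,000 services the model predicts about 6.25x more comparisons. That is the same order as the measured jump from 30 s to 250 s (about 8.3x), while a hash lookup keeps the total work roughly linear. This is consistent with the reported 8 s for 50,000 services and 14 s for 100,000.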