Sergey Korotkov created IGNITE-16424:
----------------------------------------

             Summary: Generate test data in several threads in 
DataGenerationApplication
                 Key: IGNITE-16424
                 URL: https://issues.apache.org/jira/browse/IGNITE-16424
             Project: Ignite
          Issue Type: Improvement
            Reporter: Sergey Korotkov
            Assignee: Sergey Korotkov


DataGenerationApplication is used to fill an Ignite cluster with test data. 
It currently accepts the following parameters:
 * number of caches to create
 * number of entries to put into each cache (in fact, an interval of integer keys)
 * size of each cache entry
 * number of backup partitions for caches created

Currently the application creates and fills caches one by one in a single 
thread. For huge caches this is a very time-consuming operation. On the other 
hand, parallel loading in several threads via the Ignite Streamer is known to 
work well and can speed up the process significantly.
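
As an illustration of how the load could be divided, here is a minimal sketch of splitting one cache's integer key interval into contiguous per-thread chunks whose sizes differ by at most one. The function name `split_key_range` and the half-open interval convention are assumptions made for this example; the application itself may distribute work differently (for instance, per cache rather than per key range).

```python
def split_key_range(start, stop, threads):
    """Split the half-open key interval [start, stop) into `threads`
    contiguous chunks whose sizes differ by at most one.

    Hypothetical helper for illustration only; not part of
    DataGenerationApplication.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    if stop < start:
        raise ValueError("stop must be >= start")

    total = stop - start
    # Every chunk gets `base` keys; the first `extra` chunks get one more.
    base, extra = divmod(total, threads)

    ranges = []
    lo = start
    for i in range(threads):
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


if __name__ == "__main__":
    # 10 keys over 3 threads: chunk sizes 4, 3, 3.
    print(split_key_range(0, 10, 3))
```

Each thread would then stream only the keys in its own chunk, so no key is loaded twice and none is skipped.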

It's also known that such parallel operations are very heap-memory intensive. 
Special attention should be paid to this issue.

---

So, the tasks look like this:
 # Modify the DataGenerationApplication so that it loads data in several 
threads. It should accept a new integer parameter, _*threads*_ (1 by default), 
and use that number of threads to load data, trying to spread the work evenly 
between them.
 # Try to work out a heuristic for how the required heap memory depends on the 
number of threads, the number of caches, and the data size. This may require 
implementing a tool to collect actual heap usage metrics from 
DataGenerationApplication after a test run.
 # Implement this heuristic in Python (to start DataGenerationApplication with 
the appropriate heap memory JVM options).
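
A possible shape for the Python side of tasks 2 and 3 is sketched below. Everything about the model is an assumption: the linear form, the function names `estimate_heap_mb` and `heap_jvm_opts`, and all default coefficients are placeholders that would have to be replaced by values fitted to real heap usage measurements. The sketch also ignores the number of caches, on the unverified assumption that only data in flight per thread matters. Only the `-Xms` and `-Xmx` JVM options themselves are standard.

```python
import math


def estimate_heap_mb(threads, entry_size_bytes,
                     base_mb=512,
                     buffered_entries_per_thread=10_000,
                     overhead_factor=2.0):
    """Estimate the heap size in MiB for a data generation run.

    Hypothetical linear model (NOT measured):
        heap = base + threads * buffered entries * entry size * overhead
    All default coefficients are placeholders to be fitted from metrics.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    buffered_bytes = (threads * buffered_entries_per_thread
                      * entry_size_bytes * overhead_factor)
    return base_mb + math.ceil(buffered_bytes / (1024 * 1024))


def heap_jvm_opts(heap_mb):
    """Build the JVM heap options for the given heap size in MiB."""
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


if __name__ == "__main__":
    heap = estimate_heap_mb(threads=4, entry_size_bytes=1024)
    print(heap_jvm_opts(heap))
```

Keeping the coefficients as keyword arguments makes it easy to re-fit the model once actual heap usage metrics are available, without changing the callers.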


--
This message was sent by Atlassian Jira
(v8.20.1#820001)
