RE: Effective way to pre-load data around 10 TB

Stanislav Lukyanov Wed, 19 Dec 2018 08:55:18 -0800

The problem might be that the HDD is not fast enough and also suffers from
random reads (IgniteCache::preloadPartition at least tries to read sequentially).
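One way to act on the random-read concern, without the new API, is to cap how many partition scans run at the same time, so a single HDD is not asked to serve hundreds of partitions at once. Below is a minimal plain-Java sketch of that idea using a Semaphore. It is not what IGNITE-8873 does. The class name BoundedSubmitter, the maxInFlight value, and the use of a Supplier<CompletableFuture<Void>> as a stand-in for one per-partition affinityRunAsync call are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public final class BoundedSubmitter {
    /**
     * Starts every task but keeps at most maxInFlight of them running at once.
     * Each Supplier is a hypothetical stand-in for one per-partition async job.
     */
    static void runBounded(List<Supplier<CompletableFuture<Void>>> tasks, int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<Void>> started = new ArrayList<>();
        for (Supplier<CompletableFuture<Void>> task : tasks) {
            // Wait for a free slot before submitting the next partition.
            permits.acquireUninterruptibly();
            CompletableFuture<Void> f;
            try {
                f = task.get();
            } catch (RuntimeException e) {
                permits.release();
                throw e;
            }
            // Free the slot once this partition's job has finished (or failed).
            started.add(f.whenComplete((result, error) -> permits.release()));
        }
        // Wait for the remaining in-flight jobs; join() rethrows any failure.
        CompletableFuture.allOf(started.toArray(new CompletableFuture[0])).join();
    }
}
```

With Ignite, each IgniteFuture could be adapted, for example by completing a CompletableFuture from IgniteFuture.listen(...). The right limit depends on the disks and would have to be measured.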


Also, do you have enough RAM to store all the data? If not, you shouldn't
preload all of it, just the amount that fits into RAM.
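A rough way to decide how much to preload is to divide the memory you can spare by an estimated per-partition size. Here is a minimal plain-Java sketch of that arithmetic. The memory budget and the cache size used in main are hypothetical inputs that you would have to measure on your own cluster. 1024 is Ignite's default partition count.

```java
public final class PreloadBudget {
    /**
     * Returns how many partitions fit into the memory budget.
     * budgetBytes and avgPartitionBytes are hypothetical, measured inputs.
     */
    static int partitionsThatFit(long budgetBytes, long avgPartitionBytes, int totalPartitions) {
        if (avgPartitionBytes <= 0)
            throw new IllegalArgumentException("avgPartitionBytes must be positive");
        long fit = budgetBytes / avgPartitionBytes;
        // Never return more partitions than the cache actually has.
        return (int) Math.min(fit, totalPartitions);
    }

    public static void main(String[] args) {
        // Hypothetical: a 400 GB cache over 1024 partitions, 200 GB to spare.
        long avgPartition = (400L << 30) / 1024;
        System.out.println(partitionsThatFit(200L << 30, avgPartition, 1024)); // prints 512
    }
}
```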

Anyway, I think your best option is to implement the same thing that
https://issues.apache.org/jira/browse/IGNITE-8873 does.
For example, you can try to backport the commit on top of 2.6.
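For reference, IGNITE-8873 shipped in Ignite 2.7 as IgniteCache.preloadPartition / preloadPartitionAsync / localPreloadPartition (it applies to caches with native persistence). The following is a minimal sketch, assuming Ignite 2.7 or later (not 2.6 as used in this thread), that preloads only as many partitions as fit. The class name, the limit parameter, and the choice of the lowest-numbered partitions are hypothetical. The total partition count can come from ignite.affinity(cacheName).partitions().

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.lang.IgniteFuture;

public final class PartitionPreloader {
    /** Partitions 0..min(total, limit)-1; picking the lowest numbers is an assumption. */
    static List<Integer> partitionsToPreload(int totalPartitions, int limit) {
        List<Integer> parts = new ArrayList<>();
        for (int p = 0; p < Math.min(totalPartitions, limit); p++)
            parts.add(p);
        return parts;
    }

    /** Preloads up to `limit` partitions with the Ignite 2.7+ preload API. */
    static void preload(IgniteCache<?, ?> cache, int totalPartitions, int limit) {
        List<IgniteFuture<Void>> futures = new ArrayList<>();
        for (int p : partitionsToPreload(totalPartitions, limit))
            futures.add(cache.preloadPartitionAsync(p));
        // Block until every requested partition has been loaded into memory.
        for (IgniteFuture<Void> f : futures)
            f.get();
    }
}
```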

Stan

From: Naveen
Sent: 5 December 2018 7:59
To: user@ignite.apache.org
Subject: RE: Effective way to pre-load data around 10 TB

Thanks Stan. That may take a little longer to implement, and we are in a
hurry to build this preloading functionality.

Can someone suggest how to improve this preload process?

This is how we are preloading:

1. Send an async request for every partition with the code below. The loop
is repeated for each of our caches:

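        List<String> cacheList = Arrays.asList(cacheName);
        List<IgniteFuture<Void>> futures = new ArrayList<>();

        // One affinity-collocated job per partition: each job runs
        // on the node that owns partition i.
        for (int i = 0; i < affinity.partitions(); i++) {
            futures.add(compute.affinityRunAsync(cacheList, i,
                new DataPreloadTask(cacheList, i)));
        }

        // Keep every future instead of overwriting a single variable,
        // so we can wait until all partitions have been read.
        for (IgniteFuture<Void> future : futures)
            future.get();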
                        
2. Inside DataPreloadTask, which runs on the Ignite node, I just execute a
scan query for the given partition and iterate through the cursor, without
doing anything else:


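        IgniteCache<Object, Object> igniteCache = localIgnite.cache(cacheName);

        // Scan only the given partition. The loop body is intentionally empty:
        // iterating the cursor is what pulls each entry from disk into memory.
        try (QueryCursor<Cache.Entry<Object, Object>> cursor =
                 igniteCache.query(new ScanQuery<Object, Object>().setPartition(partitionNo))) {
            for (Cache.Entry<Object, Object> entry : cursor) {
                // no-op
            }
        }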
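The snippet above shows only the body of the job. Here is a minimal sketch of what a complete DataPreloadTask could look like as an IgniteRunnable, matching the constructor used in step 1. This is a hypothetical reconstruction, not the poster's actual class. The Ignite instance is injected on the executing node via @IgniteInstanceResource.

```java
import java.util.List;

import javax.cache.Cache;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;
import org.apache.ignite.lang.IgniteRunnable;
import org.apache.ignite.resources.IgniteInstanceResource;

/** Hypothetical reconstruction: scans one partition of each listed cache. */
public class DataPreloadTask implements IgniteRunnable {
    private static final long serialVersionUID = 1L;

    private final List<String> cacheNames;
    private final int partitionNo;

    /** Injected by Ignite on the node where the job executes. */
    @IgniteInstanceResource
    private transient Ignite localIgnite;

    public DataPreloadTask(List<String> cacheNames, int partitionNo) {
        this.cacheNames = cacheNames;
        this.partitionNo = partitionNo;
    }

    @Override public void run() {
        for (String cacheName : cacheNames) {
            IgniteCache<Object, Object> cache = localIgnite.cache(cacheName);
            try (QueryCursor<Cache.Entry<Object, Object>> cursor =
                     cache.query(new ScanQuery<Object, Object>().setPartition(partitionNo))) {
                for (Cache.Entry<Object, Object> ignored : cursor) {
                    // Touching each entry is enough to bring its pages into memory.
                }
            }
        }
    }
}
```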

However, this is quite slow: it takes more than 3 hours to read one cache
with 400 M records. We have 30 such caches to load, so this is not
efficient enough.
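To put numbers on this: 400 M records in 3 hours is roughly 37,000 records per second. Because it takes "more than" 3 hours, that figure is an upper bound. If the 30 caches were read one after another at the same speed, that would mean at least 90 hours. The record size is not given, so this cannot be converted to MB/s. The following is a small plain-Java sketch of the arithmetic. The class and method names are illustrative only.

```java
public final class PreloadRate {
    /** Records per second for a given record count and duration in hours. */
    static double recordsPerSecond(long records, double hours) {
        return records / (hours * 3600.0);
    }

    /** Total hours if caches are read one after another at the same speed. */
    static double sequentialHours(int caches, double hoursPerCache) {
        return caches * hoursPerCache;
    }

    public static void main(String[] args) {
        System.out.printf("%.0f records/s%n", recordsPerSecond(400_000_000L, 3.0)); // 37037 records/s
        System.out.printf("%.0f hours%n", sequentialHours(30, 3.0));               // 90 hours
    }
}
```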

Can we improve this? We have very powerful machines with 128 CPUs, 2 TB of
RAM and HDDs, and CPU utilization is not very high while we preload the
data.
Would changing the thread pool size have any impact on this read?
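Regarding the thread pool: jobs sent with affinityRunAsync execute in Ignite's public thread pool, whose default size is max(8, number of available processors), so it is already large on a 128-CPU machine. If the HDD is the bottleneck, as suggested above, more threads are unlikely to help, and fewer concurrent scans may even read more sequentially. The following is a minimal configuration sketch. The value passed in is an arbitrary example to be measured, not a recommendation.

```java
import org.apache.ignite.configuration.IgniteConfiguration;

public final class PoolConfigSketch {
    /** Builds a node configuration with an explicit public pool size. */
    static IgniteConfiguration preloadConfig(int publicPoolSize) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Compute jobs (including affinityRunAsync jobs) run in the public pool.
        cfg.setPublicThreadPoolSize(publicPoolSize);
        return cfg;
    }
}
```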

Thanks
Naveen



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
