Re: very fast loading of very big table

Stephen Darlington Fri, 19 Feb 2021 05:41:31 -0800

I think it’s more that putAll is mostly atomic, so the more records you 
save in one chunk, the more locking, etc. happens. Distributing the work as 
compute jobs means all the putAlls will be local, which is beneficial, and the 
size of each put is going to be smaller (also beneficial).
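
To make the "split by affinity key" step concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions: `partitionOf` is a hypothetical stand-in for the cache's real affinity function (in Ignite you would ask the cache's affinity API instead), and the class and method names are made up for this example. Each resulting batch is what one compute job would pass to a local putAll.

```java
import java.util.HashMap;
import java.util.Map;

class AffinityBatches {
    // Hypothetical stand-in for the cache's affinity function:
    // maps a key to one of `partitions` partitions.
    static int partitionOf(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Splits one large batch into one smaller batch per partition,
    // so each batch can be written where its keys live.
    static <K, V> Map<Integer, Map<K, V>> splitByPartition(Map<K, V> rows, int partitions) {
        Map<Integer, Map<K, V>> batches = new HashMap<>();
        for (Map.Entry<K, V> e : rows.entrySet()) {
            int p = partitionOf(e.getKey(), partitions);
            batches.computeIfAbsent(p, x -> new HashMap<>()).put(e.getKey(), e.getValue());
        }
        return batches;
    }

    public static void main(String[] args) {
        Map<Integer, String> rows = new HashMap<>();
        for (int i = 0; i < 10; i++) {
            rows.put(i, "row-" + i);
        }
        // Each entry of `batches` would become one small, local putAll.
        Map<Integer, Map<Integer, String>> batches = splitByPartition(rows, 4);
        batches.forEach((p, b) -> System.out.println("partition " + p + ": " + b.size() + " rows"));
    }
}
```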


But that’s a lot of work that the data streamer already does for you, and the 
data streamer also batches updates, so it would still be faster.
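
As a rough mental model of that batching (not Ignite's actual implementation), the sketch below buffers entries per target node and ships a buffer once it is full. Everything here is an assumption made for illustration: the `BufferedLoader` class, the modulo-based node selection, and the `sender` callback are hypothetical. With Ignite itself you would get an IgniteDataStreamer from `ignite.dataStreamer(cacheName)` and call `addData(key, value)` on it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Conceptual model only: per-node buffers that are sent when they fill up.
class BufferedLoader<K, V> implements AutoCloseable {
    private final int nodes;
    private final int bufferSize;
    // Hypothetical callback that ships one batch to one node.
    private final BiConsumer<Integer, Map<K, V>> sender;
    private final Map<Integer, Map<K, V>> buffers = new HashMap<>();

    BufferedLoader(int nodes, int bufferSize, BiConsumer<Integer, Map<K, V>> sender) {
        this.nodes = nodes;
        this.bufferSize = bufferSize;
        this.sender = sender;
    }

    // Adds one entry; sends the target node's buffer once it is full.
    void addData(K key, V value) {
        int node = Math.floorMod(key.hashCode(), nodes); // stand-in for affinity mapping
        Map<K, V> buf = buffers.computeIfAbsent(node, n -> new HashMap<>());
        buf.put(key, value);
        if (buf.size() >= bufferSize) {
            buffers.remove(node);
            sender.accept(node, buf);
        }
    }

    // Sends whatever is still buffered.
    void flush() {
        buffers.forEach(sender);
        buffers.clear();
    }

    @Override
    public void close() {
        flush();
    }
}
```

The caller just adds entries one at a time and closes the loader at the end; grouping by node and choosing batch sizes happen behind the scenes, which is the work the compute-job approach would otherwise have to do by hand.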

> On 19 Feb 2021, at 13:33, Maximiliano Gazquez <maximiliano....@gmail.com> 
> wrote:
> 
> What would be the difference between doing cache.putAll(all rows) and 
> separating them by affinity key and executing putAll inside a compute job?
> If I'm not mistaken, doing putAll should end up splitting those rows by 
> affinity key in one of the servers, right?
> Is there a comparison of the two?
> 
> On Fri, Feb 19, 2021 at 9:51 AM Taras Ledkov <tled...@gridgain.com> wrote:
> Hi Vladimir,
> 
> Did you try the SQL command 'COPY FROM <csv_file>' via thin JDBC?
> This command uses 'IgniteDataStreamer' to write data into the cluster and 
> parses the CSV on the server node.
> 
> PS. AFAIK IgniteDataStreamer is one of the fastest ways to load data.
> 
>> Hi Denis,
>> 
>> Data space is 3.7 GB according to the MSSQL table properties
>> 
>> Vladimir
>> 
>> 9:47, 19 February 2021, Denis Magda <dma...@apache.org>:
>> Hello Vladimir, 
>> 
>> Good to hear from you! How much is that in gigabytes?
>> 
>> -
>> Denis
>> 
>> 
>> On Thu, Feb 18, 2021 at 10:06 PM <vtcher...@gmail.com> wrote:
>> In Sep 2020 I published a paper about Loading Large Datasets into Apache 
>> Ignite by Using a Key-Value API (English [1] and Russian [2] versions). The 
>> approach described works in production, but shows unacceptable performance 
>> for very large tables.
>> 
>> The story continues, and yesterday I finished the proof of concept for 
>> very fast loading of a very big table. A partitioned MSSQL table of about 
>> 295 million rows was loaded by a 4-node Ignite cluster in 3 min 35 sec. Each 
>> node executed its own SQL queries in parallel and then distributed the 
>> loaded values across the other cluster nodes.
>> 
>> That result will probably be of interest to the community.
>> 
>> Regards,
>> Vladimir Chernyi
>> 
>> [1] https://www.gridgain.com/resources/blog/how-fast-load-large-datasets-apache-ignite-using-key-value-api
>> [2] https://m.habr.com/ru/post/526708/
>> 
>> 
>> -- 
>> Sent from the Yandex.Mail mobile app
> -- 
> Taras Ledkov
> Mail-To: tled...@gridgain.com
