Re: Slow data load in ignite from S3

David Harvey Mon, 02 Apr 2018 11:09:32 -0700

When I did this, I found that with Ignite Persistence there is a lot of
write amplification (many times more bytes written to SSD than data bytes
written) during checkpoints, which makes sense because Ignite
writes whole pages, and each record written dirties pieces of many pages.


The SSD write latency and throughput become critical. On slower devices
(e.g., EBS GP2) separating the WAL can help a bit, but the key is device
write speed. On AWS, I found I needed to use local storage.
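
As a rough sketch of the layout discussed in this thread (WAL and page store on separate devices, plus a preset region initial size), a Spring fragment along these lines might be a starting point. The mount points are hypothetical and are not taken from the thread; the property names are those of DataStorageConfiguration and DataRegionConfiguration. Note that the configuration quoted below sets walMode to NONE, in which case no WAL is written and the WAL paths only matter once WAL is enabled.

```xml
<!-- Sketch only: the /mnt/... paths are assumed mount points, not from the thread. -->
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <!-- Page store (checkpoint writes) on one local device (assumed path). -->
        <property name="storagePath" value="/mnt/ssd0/ignite/store"/>
        <!-- WAL and WAL archive on a second device (assumed paths). -->
        <property name="walPath" value="/mnt/ssd1/ignite/wal"/>
        <property name="walArchivePath" value="/mnt/ssd1/ignite/wal-archive"/>
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="name" value="Default_Region"/>
                <property name="persistenceEnabled" value="true"/>
                <!-- initialSize equal to maxSize, so the region should not need
                     to grow in extents during the load. -->
                <property name="initialSize" value="#{60L * 1024 * 1024 * 1024}"/>
                <property name="maxSize" value="#{60L * 1024 * 1024 * 1024}"/>
            </bean>
        </property>
    </bean>
</property>
```

Whether this helps depends on the devices behind those paths; on slow volumes the checkpoint writes to the page store are still likely to dominate.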

On Mon, Apr 2, 2018 at 6:22 AM, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:

> Hi Rahul,
>
> Possibly, mostly new data is being loaded into Ignite.
> I mean that Ignite allocates new pages rather than updating existing ones.
>
> In that case, you may not benefit from increasing the checkpoint region
> size. It will just defer the checkpoint.
>
> Also, you can try moving the WAL and the Ignite store to different disks and
> setting the region's initial size to reduce or avoid region extent allocation.
>
> On Mon, Apr 2, 2018 at 9:59 AM, rahul aneja <rahulaneja...@gmail.com>
> wrote:
>
>> Hi Andrey,
>>
>> Yes, we are using SSD. Earlier we were using the default checkpoint buffer
>> of 256 MB. To reduce the checkpoint frequency, we increased the buffer
>> size, but it didn’t have any impact on performance.
>>
>> On Fri, 30 Mar 2018 at 10:49 PM, Andrey Mashenkov <
>> andrey.mashen...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Possibly, storage is a bottleneck or the checkpoint buffer is too large.
>>> Do you use Provisioned IOPS SSD?
>>>
>>>
>>> On Fri, Mar 30, 2018 at 3:32 PM, rahul aneja <rahulaneja...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are trying to load ORC data (around 50 GB) stored on S3 from Spark
>>>> using the DataFrame API. It starts fast with good write throughput, and
>>>> then after some time throughput just drops and it gets stuck.
>>>>
>>>> We also tried changing multiple configurations, but with no luck:
>>>> 1. enabling checkpoint write throttling
>>>> 2. disabling throttling and increasing the checkpoint buffer
>>>>
>>>>
>>>> Please find below configuration and properties of the cluster
>>>>
>>>>
>>>>    1. 10-node cluster of r4.4xl (EMR on AWS), shared with Spark
>>>>    2. Ignite is started with -Xms20g -Xmx30g
>>>>    3. Cache mode is partitioned
>>>>    4. Persistence is enabled
>>>>    5. DirectIO is enabled
>>>>    6. No backup
>>>>
>>>> <property name="dataStorageConfiguration">
>>>>     <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>>>         <!-- Enable write throttling. -->
>>>>         <property name="writeThrottlingEnabled" value="false"/>
>>>>         <property name="defaultDataRegionConfiguration">
>>>>             <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>>>                 <property name="persistenceEnabled" value="true"/>
>>>>                 <property name="checkpointPageBufferSize" value="#{20L * 1024 * 1024 * 1024}"/>
>>>>                 <property name="name" value="Default_Region"/>
>>>>                 <property name="maxSize" value="#{60L * 1024 * 1024 * 1024}"/>
>>>>             </bean>
>>>>         </property>
>>>>         <property name="walMode" value="NONE"/>
>>>>     </bean>
>>>> </property>
>>>>
>>>>
>>>> Thanks in advance,
>>>>
>>>> Rahul Aneja
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Andrey V. Mashenkov
>>>
>>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>

