[ 
https://issues.apache.org/jira/browse/IGNITE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459546#comment-16459546
 ] 

Joel Lang commented on IGNITE-8359:
-----------------------------------

So, using the 2.5 nightly build from April 27th, I ran the code that generates the
6,000,000 entries for cache A and then the 12,000,000 entries for cache B, using a
data streamer for each. This was again on a Linux VM running on an HDD.

I started it on Friday before I left work. When I came back into the office, I
found that it still had not finished; it was about 95-97% done.
The fact that it didn't finish over such a long period of time is a bit obscene.

> Severe performance degradation with persistence and data streaming on HDD
> -------------------------------------------------------------------------
>
>                 Key: IGNITE-8359
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8359
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache, persistence, sql, streaming
>    Affects Versions: 2.4, 2.5
>         Environment: Linux CentOS 7 VM using Ignite DirectIO plugin with HDD.
>            Reporter: Joel Lang
>            Priority: Major
>
> I am testing the use of Ignite's native persistence to store a data set long 
> term. This is on a 2.5 nightly build. To do this, I am using Ignite's data 
> streamers to stream 6,000,000 entries into cache A and 12,000,000 entries 
> into cache B, simulating the upper limit for 2 years' worth of data.
> The test ran smoothly on my personal machine, which runs Windows on an SSD, 
> but ran into tremendous issues on a development test machine, which is a Linux 
> VM using an HDD. I realize from the Ignite documentation that it 
> specifically excludes HDDs as a basis for a persistent store, but 
> perhaps my experience could lead to improvements in SSD performance too.
> The root issue is that, over time, cache updates become severely bottlenecked 
> by reading SQL index pages from disk in order to update the index. If I had 
> to guess, this is related to BPlusTree.findInsertionPoint() having to load 
> pages from disk when they have been evicted.
> I used a 2.5 nightly build because 2.3 and 2.4 have the same issue, and there 
> the whole process was further bottlenecked by a lock being held by Ignite while 
> it read the page from disk in PageMemoryImpl.acquirePage(). 2.5 fixed the lock.
> The performance issue was much more severe in the previously mentioned cache 
> B, which contains user comments on entries in cache A. The key for each 
> comment entry is a Java class containing the creation timestamp and the 
> string key of the owning entry in cache A. The owning entry's key is indexed 
> so that comments can be queried by their owner. In this test case there were two 
> comments in cache B for every entry in cache A.
> I found that even 25% of the way through streaming data into cache B, it 
> would take anywhere from 15 to 35 seconds to insert a batch of 2000 comments. 
> This slowed streaming to a crawl and meant that streaming would need to 
> continue overnight to have any hope of finishing.
> This also raises concerns about data rebalancing, which would suffer the same 
> performance penalty and similarly take at least a day to rebalance both 
> caches.
> I am worried that updating the index depends on a large number of disk reads, 
> even though this is considerably faster with an SSD than with an HDD. I have 
> also not been able to test whether SSD performance differs when running in a 
> VM, which is another worry.
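
The description guesses that BPlusTree.findInsertionPoint() stalls because index pages have been evicted. The toy model below illustrates why that would get worse as the index grows. It is not Ignite's BPlusTree or PageMemoryImpl. It assumes an evenly filled sorted index split into fixed-size leaf pages, and an LRU set that holds only some of those pages in memory. Every insert must touch the leaf covering its key, and each miss stands in for a disk read. The page size, memory budget and key orders are all hypothetical. The model contrasts keys that arrive in index order with keys whose order is unrelated to the index.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

class IndexPageMissSketch {

    /** LRU set of resident page ids; a miss models reading a page from disk. */
    static final class PageCache extends LinkedHashMap<Long, Boolean> {
        private final int capacity;
        long misses;

        PageCache(int capacity) {
            super(16, 0.75f, true); // access order => LRU iteration order
            this.capacity = capacity;
        }

        void touch(long pageId) {
            if (get(pageId) == null) { // get() also refreshes the LRU position
                misses++;
                put(pageId, Boolean.TRUE);
            }
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
            return size() > capacity; // evict least recently used page
        }
    }

    /**
     * Counts page misses for `inserts` keys in [0, keySpace). The leaf page of
     * a key is approximated as key / keysPerPage (hypothetical even fill).
     */
    static long simulate(boolean sequential, int inserts, long keySpace,
                         int keysPerPage, int cachedPages, long seed) {
        PageCache cache = new PageCache(cachedPages);
        Random rnd = new Random(seed);
        for (int i = 0; i < inserts; i++) {
            long key = sequential ? (long) i * keySpace / inserts
                                  : (long) (rnd.nextDouble() * keySpace);
            cache.touch(key / keysPerPage);
        }
        return cache.misses;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 10,000 leaf pages, only 1,000 fit in memory.
        long seq = simulate(true, 1_000_000, 1_000_000L, 100, 1_000, 42L);
        long rnd = simulate(false, 1_000_000, 1_000_000L, 100, 1_000, 42L);
        System.out.println("sequential misses: " + seq);
        System.out.println("random misses:     " + rnd);
    }
}
```

With these numbers the sequential run misses exactly once per leaf page (10,000 misses). In the random-order run only 10% of the leaves are resident at any time, so roughly 90% of inserts miss. On an HDD, each of those misses is a random seek.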

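For reference, the cache B key described above (creation timestamp plus the String key of the owning cache A entry) might look roughly like the sketch below. The class name, field names and timestamp unit are hypothetical, because the report does not show the real class. Ignite's @QuerySqlField(index = true) annotation, which would typically mark the indexed owner field, appears only in a comment. That keeps the sketch compilable without Ignite on the classpath.

```java
import java.io.Serializable;
import java.util.Objects;

/** Hypothetical composite key for a comment entry in cache B. */
final class CommentKey implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Comment creation time; epoch milliseconds is an assumed unit. */
    private final long createdAt;

    /**
     * Key of the owning entry in cache A. In the reported setup this field is
     * indexed; with annotation-based Ignite config it would typically carry
     * {@code @QuerySqlField(index = true)} (omitted here).
     */
    private final String ownerKey;

    CommentKey(long createdAt, String ownerKey) {
        this.createdAt = createdAt;
        this.ownerKey = Objects.requireNonNull(ownerKey, "ownerKey");
    }

    long createdAt() { return createdAt; }

    String ownerKey() { return ownerKey; }

    // Value-based equality: keys with the same fields are interchangeable.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CommentKey)) return false;
        CommentKey other = (CommentKey) o;
        return createdAt == other.createdAt && ownerKey.equals(other.ownerKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(createdAt, ownerKey);
    }

    @Override
    public String toString() {
        return "CommentKey[createdAt=" + createdAt + ", ownerKey=" + ownerKey + "]";
    }
}
```

Because every comment insert also inserts its owner key into the secondary index, the test data set produces 12,000,000 such index updates for cache B on top of the primary key updates.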

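Here is a rough back-of-envelope projection from the batch timings in the description. It makes two assumptions. First, the 15 to 35 s per 2000-entry batch rate stays constant, which is optimistic since batches kept slowing down. Second, 75% of cache B's 12,000,000 entries remain:

```latex
\begin{aligned}
\text{remaining entries} &= 0.75 \times 12{,}000{,}000 = 9{,}000{,}000 \\
\text{throughput} &= \frac{2000}{35\ \text{s}} \text{ to } \frac{2000}{15\ \text{s}} \approx 57 \text{ to } 133\ \text{entries/s} \\
\text{time left} &= 9{,}000{,}000 \times \frac{15\ \text{s}}{2000} \text{ to } 9{,}000{,}000 \times \frac{35\ \text{s}}{2000} = 67{,}500 \text{ to } 157{,}500\ \text{s} \approx 18.75 \text{ to } 43.75\ \text{h}
\end{aligned}
```

Even the optimistic end is most of a day. That is consistent with the run started on Friday still not having finished.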

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
