[jira] [Updated] (FLINK-21859) Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c419297d active"

2021-03-31 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21859: Attachment: tm.log.zip > Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c4192

[jira] [Updated] (FLINK-21859) Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c419297d active"

2021-03-31 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21859: Attachment: (was: tm_log.zip) > Batch job fails due to "Could not mark slot 61a637e3977c58a0e6

[jira] [Commented] (FLINK-21859) Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c419297d active"

2021-03-31 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312155#comment-17312155 ] Yingjie Cao commented on FLINK-21859: - [~trohrmann] I have uploaded the log, hope it

[jira] [Updated] (FLINK-21859) Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c419297d active"

2021-03-31 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21859: Attachment: jm.log.zip > Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c4192

[jira] [Updated] (FLINK-21859) Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c419297d active"

2021-03-31 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21859: Attachment: tm_log.zip > Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c4192

[jira] [Commented] (FLINK-21859) Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c419297d active"

2021-03-30 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311429#comment-17311429 ] Yingjie Cao commented on FLINK-21859: - [~trohrmann] I did not keep the log, I encoun

[jira] [Commented] (FLINK-16404) Avoid caching buffers for blocked input channels before barrier alignment

2021-03-30 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-16404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311255#comment-17311255 ] Yingjie Cao commented on FLINK-16404: - [~assassinj] I would not expect that your que

[jira] [Closed] (FLINK-16298) GroupWindowTableAggregateITCase.testEventTimeTumblingWindow fails on Travis

2021-03-30 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-16298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao closed FLINK-16298. --- Resolution: Won't Fix > GroupWindowTableAggregateITCase.testEventTimeTumblingWindow fails on Travis

[jira] [Commented] (FLINK-16298) GroupWindowTableAggregateITCase.testEventTimeTumblingWindow fails on Travis

2021-03-30 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-16298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311228#comment-17311228 ] Yingjie Cao commented on FLINK-16298: - It has been a long time since the issue was reported

[jira] [Commented] (FLINK-21879) ActiveResourceManagerTest.testWorkerRegistrationTimeoutNotCountingAllocationTime fails on AZP

2021-03-29 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311141#comment-17311141 ] Yingjie Cao commented on FLINK-21879: - The test still seems to fail: [https://dev.azu

[jira] [Updated] (FLINK-16641) Announce sender's backlog to solve the deadlock issue without exclusive buffers

2021-03-28 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-16641: Fix Version/s: (was: 1.13.0) 1.14.0 > Announce sender's backlog to solve th

[jira] [Updated] (FLINK-16012) Reduce the default number of exclusive buffers from 2 to 1 on receiver side

2021-03-28 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-16012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-16012: Fix Version/s: (was: 1.13.0) 1.14.0 > Reduce the default number of exclusiv

[jira] [Updated] (FLINK-18762) Make network buffers per incoming/outgoing channel can be configured separately

2021-03-28 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-18762: Fix Version/s: (was: 1.13.0) 1.14.0 > Make network buffers per incoming/out

[jira] [Updated] (FLINK-16428) Fine-grained network buffer management for backpressure

2021-03-28 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-16428: Fix Version/s: (was: 1.13.0) 1.14.0 > Fine-grained network buffer managemen

[jira] [Commented] (FLINK-16428) Fine-grained network buffer management for backpressure

2021-03-28 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310448#comment-17310448 ] Yingjie Cao commented on FLINK-16428: - Though the PR is ready, I guess we do not hav

[jira] [Updated] (FLINK-21859) Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c419297d active"

2021-03-25 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21859: Affects Version/s: 1.12.0 > Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c4

[jira] [Updated] (FLINK-21859) Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c419297d active"

2021-03-25 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21859: Fix Version/s: 1.13.0 > Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c41929

[jira] [Updated] (FLINK-21789) Make FileChannelManagerImpl#getNextPathNum select data directories fairly

2021-03-25 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21789: Parent: (was: FLINK-19614) Issue Type: Bug (was: Sub-task) > Make FileChannelManagerI

[jira] [Updated] (FLINK-21790) Shuffle data directories to make directory selection of different TaskManagers fairer

2021-03-25 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21790: Issue Type: Improvement (was: Bug) > Shuffle data directories to make directory selection of diff

[jira] [Updated] (FLINK-21790) Shuffle data directories to make directory selection of different TaskManagers fairer

2021-03-25 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21790: Parent: (was: FLINK-19614) Issue Type: Bug (was: Sub-task) > Shuffle data directories

[jira] [Commented] (FLINK-21788) Throw PartitionNotFoundException if the partition file has been lost for blocking shuffle

2021-03-25 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308615#comment-17308615 ] Yingjie Cao commented on FLINK-21788: - The problem should exist since 1.9. > Throw

[jira] [Closed] (FLINK-20758) Use region file mechanism for shuffle data reading before we switch to managed memory

2021-03-24 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao closed FLINK-20758. --- Resolution: Won't Fix > Use region file mechanism for shuffle data reading before we switch to > ma

[jira] [Updated] (FLINK-21951) Fix wrong if condition in BufferReaderWriterUtil#writeBuffers

2021-03-24 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21951: Description: The wrong if condition in BufferReaderWriterUtil#writeBuffers may lead to data loss w

[jira] [Created] (FLINK-21951) Fix wrong if condition in BufferReaderWriterUtil#writeBuffers

2021-03-24 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-21951: --- Summary: Fix wrong if condition in BufferReaderWriterUtil#writeBuffers Key: FLINK-21951 URL: https://issues.apache.org/jira/browse/FLINK-21951 Project: Flink

[jira] [Closed] (FLINK-20824) BlockingShuffleITCase. testSortMergeBlockingShuffle test failed with "Inconsistent availability: expected true"

2021-03-24 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao closed FLINK-20824. --- Resolution: Duplicate > BlockingShuffleITCase. testSortMergeBlockingShuffle test failed with > "Inc

[jira] [Commented] (FLINK-20824) BlockingShuffleITCase. testSortMergeBlockingShuffle test failed with "Inconsistent availability: expected true"

2021-03-24 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307628#comment-17307628 ] Yingjie Cao commented on FLINK-20824: - As the fix for FLINK-20547 has been merged. W

[jira] [Commented] (FLINK-20824) BlockingShuffleITCase. testSortMergeBlockingShuffle test failed with "Inconsistent availability: expected true"

2021-03-23 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306837#comment-17306837 ] Yingjie Cao commented on FLINK-20824: - Hi, I am pretty sure this is the same issue w

[jira] [Commented] (FLINK-20547) Batch job fails due to the exception in network stack

2021-03-22 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306733#comment-17306733 ] Yingjie Cao commented on FLINK-20547: - [~pnowojski] I have opened a PR for this issu

[jira] [Updated] (FLINK-20547) Batch job fails due to the exception in network stack

2021-03-22 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-20547: Affects Version/s: 1.12.0 1.12.1 1.12.2 > Batch job

[jira] [Commented] (FLINK-20547) Batch job fails due to the exception in network stack

2021-03-22 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305984#comment-17305984 ] Yingjie Cao commented on FLINK-20547: - [~pnowojski] Thanks for the confirmation. You

[jira] [Updated] (FLINK-21789) Make FileChannelManagerImpl#getNextPathNum select data directories fairly

2021-03-22 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21789: Parent: FLINK-19614 Issue Type: Sub-task (was: Improvement) > Make FileChannelManagerImpl

[jira] [Updated] (FLINK-21790) Shuffle data directories to make directory selection of different TaskManagers fairer

2021-03-22 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21790: Parent: FLINK-19614 Issue Type: Sub-task (was: Improvement) > Shuffle data directories to

[jira] [Updated] (FLINK-21788) Throw PartitionNotFoundException if the partition file has been lost for blocking shuffle

2021-03-22 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21788: Parent: FLINK-19614 Issue Type: Sub-task (was: Bug) > Throw PartitionNotFoundException if

[jira] [Commented] (FLINK-20547) Batch job fails due to the exception in network stack

2021-03-21 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305878#comment-17305878 ] Yingjie Cao commented on FLINK-20547: - [~pnowojski] After reading the code, I think

[jira] [Updated] (FLINK-21789) Make FileChannelManagerImpl#getNextPathNum select data directories fairly

2021-03-20 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21789: Summary: Make FileChannelManagerImpl#getNextPathNum select data directories fairly (was: Make Fil

[jira] [Updated] (FLINK-21790) Shuffle data directories to make directory selection of different TaskManagers fairer

2021-03-20 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21790: Summary: Shuffle data directories to make directory selection of different TaskManagers fairer (w

[jira] [Updated] (FLINK-21790) Shuffle shuffle data directories to make directory selection of different TaskManagers fairer

2021-03-20 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21790: Summary: Shuffle shuffle data directories to make directory selection of different TaskManagers fa

[jira] [Updated] (FLINK-21790) Shuffle data directories to make data directory selection of different TaskManagers fairer

2021-03-20 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21790: Summary: Shuffle data directories to make data directory selection of different TaskManagers faire

[jira] [Commented] (FLINK-20547) Batch job fails due to the exception in network stack

2021-03-19 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304845#comment-17304845 ] Yingjie Cao commented on FLINK-20547: - [~kevin.cyj] Not much progress so far. Previou

[jira] [Commented] (FLINK-21857) StackOverflow for large parallelism jobs when processing EndOfChannelStateEvent

2021-03-18 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304069#comment-17304069 ] Yingjie Cao commented on FLINK-21857: - [~roman_khachatryan] 8000 * 8000 can reproduc

[jira] [Created] (FLINK-21859) Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c419297d active"

2021-03-18 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-21859: --- Summary: Batch job fails due to "Could not mark slot 61a637e3977c58a0e6b73533c419297d active" Key: FLINK-21859 URL: https://issues.apache.org/jira/browse/FLINK-21859 Pr

[jira] [Commented] (FLINK-21857) StackOverflow for large parallelism jobs when processing EndOfChannelStateEvent

2021-03-18 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303949#comment-17303949 ] Yingjie Cao commented on FLINK-21857: - cc @[~pnowojski] [~AHeise] [~roman_khachatrya

[jira] [Created] (FLINK-21857) StackOverflow for large parallelism jobs when processing EndOfChannelStateEvent

2021-03-18 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-21857: --- Summary: StackOverflow for large parallelism jobs when processing EndOfChannelStateEvent Key: FLINK-21857 URL: https://issues.apache.org/jira/browse/FLINK-21857 Project

[jira] [Created] (FLINK-21850) Improve document and config description of sort-merge blocking shuffle

2021-03-17 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-21850: --- Summary: Improve document and config description of sort-merge blocking shuffle Key: FLINK-21850 URL: https://issues.apache.org/jira/browse/FLINK-21850 Project: Flink

[jira] [Commented] (FLINK-19938) Implement shuffle data read scheduling for sort-merge blocking shuffle

2021-03-16 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-19938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303119#comment-17303119 ] Yingjie Cao commented on FLINK-19938: - [~dahaishuantuoba] I will update the PR this

[jira] [Commented] (FLINK-21416) FileBufferReaderITCase.testSequentialReading fails on azure

2021-03-16 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303043#comment-17303043 ] Yingjie Cao commented on FLINK-21416: - > Maybe we can speed up the test? Make it lig

[jira] [Updated] (FLINK-18727) Remove the previous finished empty Buffer in PipelinedSubpartition when adding a new Buffer

2021-03-14 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-18727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-18727: Fix Version/s: (was: 1.13.0) > Remove the previous finished empty Buffer in PipelinedSubpartit

[jira] [Closed] (FLINK-17208) Reduce redundant data available notification of PipelinedSubpartition

2021-03-14 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-17208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao closed FLINK-17208. --- Resolution: Won't Fix > Reduce redundant data available notification of PipelinedSubpartition >

[jira] [Commented] (FLINK-17208) Reduce redundant data available notification of PipelinedSubpartition

2021-03-14 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-17208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301429#comment-17301429 ] Yingjie Cao commented on FLINK-17208: - This issue is introduced by FLINK-16428, now

[jira] [Closed] (FLINK-18727) Remove the previous finished empty Buffer in PipelinedSubpartition when adding a new Buffer

2021-03-14 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-18727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao closed FLINK-18727. --- Resolution: Won't Fix > Remove the previous finished empty Buffer in PipelinedSubpartition when > a

[jira] [Commented] (FLINK-18727) Remove the previous finished empty Buffer in PipelinedSubpartition when adding a new Buffer

2021-03-14 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-18727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301427#comment-17301427 ] Yingjie Cao commented on FLINK-18727: - This issue is introduced by FLINK-16428, now

[jira] [Created] (FLINK-21790) Shuffle data directories to make data directory section of different TaskManagers fairer

2021-03-14 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-21790: --- Summary: Shuffle data directories to make data directory section of different TaskManagers fairer Key: FLINK-21790 URL: https://issues.apache.org/jira/browse/FLINK-21790

[jira] [Created] (FLINK-21789) Make FileChannelManagerImpl#getNextPathNum select data directory fairly

2021-03-14 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-21789: --- Summary: Make FileChannelManagerImpl#getNextPathNum select data directory fairly Key: FLINK-21789 URL: https://issues.apache.org/jira/browse/FLINK-21789 Project: Flink

[jira] [Created] (FLINK-21788) Throw PartitionNotFoundException if the partition file has been lost for blocking shuffle

2021-03-14 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-21788: --- Summary: Throw PartitionNotFoundException if the partition file has been lost for blocking shuffle Key: FLINK-21788 URL: https://issues.apache.org/jira/browse/FLINK-21788

[jira] [Comment Edited] (FLINK-19938) Implement shuffle data read scheduling for sort-merge blocking shuffle

2021-03-14 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-19938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301139#comment-17301139 ] Yingjie Cao edited comment on FLINK-19938 at 3/14/21, 12:38 PM: --

[jira] [Commented] (FLINK-19938) Implement shuffle data read scheduling for sort-merge blocking shuffle

2021-03-14 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-19938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301139#comment-17301139 ] Yingjie Cao commented on FLINK-19938: - [~dahaishuantuoba] Thanks for your interest.

[jira] [Updated] (FLINK-20740) Use managed memory to avoid direct memory OOM error for sort-merge shuffle (introduce a separated buffer pool)

2021-03-13 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-20740: Summary: Use managed memory to avoid direct memory OOM error for sort-merge shuffle (introduce a s

[jira] [Created] (FLINK-21778) Use heap memory instead of direct memory as index entry cache for sort-merge shuffle

2021-03-13 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-21778: --- Summary: Use heap memory instead of direct memory as index entry cache for sort-merge shuffle Key: FLINK-21778 URL: https://issues.apache.org/jira/browse/FLINK-21778 Pr

[jira] [Updated] (FLINK-21777) Replace the 4M data writing cache of sort-merge shuffle with writev system call

2021-03-13 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-21777: Description: Currently, the sort-merge shuffle implementation uses 4M unmanaged direct memory as c

[jira] [Created] (FLINK-21777) Replace the 4M data writing cache of sort-merge shuffle with writev system call

2021-03-13 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-21777: --- Summary: Replace the 4M data writing cache of sort-merge shuffle with writev system call Key: FLINK-21777 URL: https://issues.apache.org/jira/browse/FLINK-21777 Project

[jira] [Commented] (FLINK-20758) Use region file mechanism for shuffle data reading before we switch to managed memory

2021-03-09 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298070#comment-17298070 ] Yingjie Cao commented on FLINK-20758: - We decided to switch to managed memory in 1.13

[jira] [Updated] (FLINK-20758) Use region file mechanism for shuffle data reading before we switch to managed memory

2021-03-09 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-20758: Fix Version/s: (was: 1.12.3) > Use region file mechanism for shuffle data reading before we sw

[jira] [Updated] (FLINK-20740) Use managed memory (network memory) to avoid direct memory OOM error for sort-merge shuffle (introduce a separated buffer pool)

2021-03-07 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-20740: Summary: Use managed memory (network memory) to avoid direct memory OOM error for sort-merge shuff

[jira] [Updated] (FLINK-20740) Use managed memory (network memory) to avoid direct memory OOM error for sort-merge shuffle

2021-03-07 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-20740: Description: Currently, sort-merge blocking shuffle uses some unmanaged memory for data writing a

[jira] [Commented] (FLINK-21416) FileBufferReaderITCase.testSequentialReading fails on azure

2021-03-07 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297044#comment-17297044 ] Yingjie Cao commented on FLINK-21416: - Sorry for the late reply, I think it is the s

[jira] [Commented] (FLINK-16641) Announce sender's backlog to solve the deadlock issue without exclusive buffers

2021-03-07 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296894#comment-17296894 ] Yingjie Cao commented on FLINK-16641: - [~pnowojski] [~zjwang] I have updated the PR

[jira] [Updated] (FLINK-20740) Use managed memory (network memory) to avoid direct memory OOM error for sort-merge shuffle

2021-01-27 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-20740: Summary: Use managed memory (network memory) to avoid direct memory OOM error for sort-merge shuff

[jira] [Commented] (FLINK-20824) BlockingShuffleITCase. testSortMergeBlockingShuffle test failed with "Inconsistent availability: expected true"

2020-12-30 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17256822#comment-17256822 ] Yingjie Cao commented on FLINK-20824: - [~hxbks2ks] Thanks for reporting this issue,

[jira] [Created] (FLINK-20758) Use region file mechanism for shuffle data reading before we switch to managed memory

2020-12-23 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-20758: --- Summary: Use region file mechanism for shuffle data reading before we switch to managed memory Key: FLINK-20758 URL: https://issues.apache.org/jira/browse/FLINK-20758 P

[jira] [Updated] (FLINK-20758) Use region file mechanism for shuffle data reading before we switch to managed memory

2020-12-23 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-20758: Labels: usability (was: ) > Use region file mechanism for shuffle data reading before we switch t

[jira] [Created] (FLINK-20757) Optimize data broadcast for sort-merge shuffle

2020-12-23 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-20757: --- Summary: Optimize data broadcast for sort-merge shuffle Key: FLINK-20757 URL: https://issues.apache.org/jira/browse/FLINK-20757 Project: Flink Issue Type: Sub-

[jira] [Updated] (FLINK-20740) Use managed memory to avoid direct memory OOM error for sort-merge shuffle

2020-12-22 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-20740: Parent: FLINK-19614 Issue Type: Sub-task (was: Bug) > Use managed memory to avoid direct

[jira] [Updated] (FLINK-20740) Use managed memory to avoid direct memory OOM error for sort-merge shuffle

2020-12-22 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-20740: Summary: Use managed memory to avoid direct memory OOM error for sort-merge shuffle (was: Use man

[jira] [Created] (FLINK-20740) Use managed memory to avoid direct memory OOM error

2020-12-22 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-20740: --- Summary: Use managed memory to avoid direct memory OOM error Key: FLINK-20740 URL: https://issues.apache.org/jira/browse/FLINK-20740 Project: Flink Issue Type:

[jira] [Commented] (FLINK-20547) Batch job fails due to the exception in network stack

2020-12-10 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247236#comment-17247236 ] Yingjie Cao commented on FLINK-20547: - [~roman_khachatryan] We are using 8000 becaus

[jira] [Commented] (FLINK-20547) Batch job fails due to the exception in network stack

2020-12-10 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247222#comment-17247222 ] Yingjie Cao commented on FLINK-20547: - [~roman_khachatryan] To be honest, I am not s

[jira] [Commented] (FLINK-20547) Batch job fails due to the exception in network stack

2020-12-10 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247144#comment-17247144 ] Yingjie Cao commented on FLINK-20547: - [~roman_khachatryan] I encountered this excep

[jira] [Commented] (FLINK-20547) Batch job fails due to the exception in network stack

2020-12-09 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247029#comment-17247029 ] Yingjie Cao commented on FLINK-20547: - I also encountered this issue these days when

[jira] [Commented] (FLINK-17208) Reduce redundant data available notification of PipelinedSubpartition

2020-12-07 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-17208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245683#comment-17245683 ] Yingjie Cao commented on FLINK-17208: - [~AHeise] This ticket was created for FLINK-1

[jira] [Updated] (FLINK-17208) Reduce redundant data available notification of PipelinedSubpartition

2020-12-07 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-17208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-17208: Fix Version/s: (was: 1.12.0) > Reduce redundant data available notification of PipelinedSubpar

[jira] [Reopened] (FLINK-20035) BlockingShuffleITCase unstable with "Could not start rest endpoint on any port in port range 8081"

2020-11-23 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao reopened FLINK-20035: - [~rmetzger] I reopened this ticket. > BlockingShuffleITCase unstable with "Could not start rest end

[jira] [Comment Edited] (FLINK-20035) BlockingShuffleITCase unstable with "Could not start rest endpoint on any port in port range 8081"

2020-11-22 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237069#comment-17237069 ] Yingjie Cao edited comment on FLINK-20035 at 11/23/20, 2:42 AM: --

[jira] [Commented] (FLINK-20035) BlockingShuffleITCase unstable with "Could not start rest endpoint on any port in port range 8081"

2020-11-22 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237069#comment-17237069 ] Yingjie Cao commented on FLINK-20035: - [~rmetzger] I made a mistake in the previous

[jira] [Commented] (FLINK-19582) FLIP-148: Introduce sort-merge based blocking shuffle to Flink

2020-11-16 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-19582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232714#comment-17232714 ] Yingjie Cao commented on FLINK-19582: - [~568793...@qq.com] Sorry for the late reply.

[jira] [Commented] (FLINK-19925) Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2020-11-16 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-19925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232674#comment-17232674 ] Yingjie Cao commented on FLINK-19925: - [~AHeise] Could you please explain a bit more

[jira] [Updated] (FLINK-19983) ShuffleCompressionITCase.testDataCompressionForSortMergeBlockingShuffle unstable

2020-11-08 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-19983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-19983: Fix Version/s: 1.12.0 > ShuffleCompressionITCase.testDataCompressionForSortMergeBlockingShuffle >

[jira] [Updated] (FLINK-20035) BlockingShuffleITCase unstable with "Could not start rest endpoint on any port in port range 8081"

2020-11-08 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-20035: Fix Version/s: 1.12.0 > BlockingShuffleITCase unstable with "Could not start rest endpoint on any

[jira] [Commented] (FLINK-20035) BlockingShuffleITCase unstable with "Could not start rest endpoint on any port in port range 8081"

2020-11-08 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228349#comment-17228349 ] Yingjie Cao commented on FLINK-20035: - This failure is caused by port bind exception

[jira] [Commented] (FLINK-19983) ShuffleCompressionITCase.testDataCompressionForSortMergeBlockingShuffle unstable

2020-11-08 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-19983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228333#comment-17228333 ] Yingjie Cao commented on FLINK-19983: - After some investigation, I find that the sta

[jira] [Updated] (FLINK-19938) Implement shuffle data read scheduling for sort-merge blocking shuffle

2020-11-06 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-19938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-19938: Affects Version/s: (was: 1.12.0) > Implement shuffle data read scheduling for sort-merge block

[jira] [Updated] (FLINK-19938) Implement shuffle data read scheduling for sort-merge blocking shuffle

2020-11-06 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-19938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao updated FLINK-19938: Fix Version/s: (was: 1.12.0) > Implement shuffle data read scheduling for sort-merge blocking

[jira] [Commented] (FLINK-19983) ShuffleCompressionITCase.testDataCompressionForSortMergeBlockingShuffle unstable

2020-11-06 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-19983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227242#comment-17227242 ] Yingjie Cao commented on FLINK-19983: - I will take a look at this issue. > ShuffleC

[jira] [Created] (FLINK-20013) BoundedBlockingSubpartition may leak network buffer if task is failed or canceled

2020-11-05 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-20013: --- Summary: BoundedBlockingSubpartition may leak network buffer if task is failed or canceled Key: FLINK-20013 URL: https://issues.apache.org/jira/browse/FLINK-20013 Proje

[jira] [Commented] (FLINK-19925) Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2020-11-05 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-19925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227154#comment-17227154 ] Yingjie Cao commented on FLINK-19925: - Usually, it indicates an unstable network whi

[jira] [Closed] (FLINK-20010) SinkITCase.writerAndCommitterAndGlobalCommitterExecuteInStreamingMode fails on Azure Pipeline

2020-11-05 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingjie Cao closed FLINK-20010. --- Resolution: Fixed > SinkITCase.writerAndCommitterAndGlobalCommitterExecuteInStreamingMode fails > o

[jira] [Commented] (FLINK-20010) SinkITCase.writerAndCommitterAndGlobalCommitterExecuteInStreamingMode fails on Azure Pipeline

2020-11-05 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227123#comment-17227123 ] Yingjie Cao commented on FLINK-20010: - It is confirmed by [~maguowei] that the issue

[jira] [Commented] (FLINK-20010) SinkITCase.writerAndCommitterAndGlobalCommitterExecuteInStreamingMode fails on Azure Pipeline

2020-11-05 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-20010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227116#comment-17227116 ] Yingjie Cao commented on FLINK-20010: - It seems the exception stack is incomplete, h

[jira] [Created] (FLINK-20010) SinkITCase.writerAndCommitterAndGlobalCommitterExecuteInStreamingMode fails on Azure Pipeline

2020-11-05 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-20010: --- Summary: SinkITCase.writerAndCommitterAndGlobalCommitterExecuteInStreamingMode fails on Azure Pipeline Key: FLINK-20010 URL: https://issues.apache.org/jira/browse/FLINK-20010

[jira] [Created] (FLINK-19991) UnalignedCheckpointITCase#shouldPerformUnalignedCheckpointMassivelyParallel fails on Azure Pipeline

2020-11-05 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-19991: --- Summary: UnalignedCheckpointITCase#shouldPerformUnalignedCheckpointMassivelyParallel fails on Azure Pipeline Key: FLINK-19991 URL: https://issues.apache.org/jira/browse/FLINK-19991

[jira] [Commented] (FLINK-19645) ShuffleCompressionITCase.testDataCompressionForBlockingShuffle is instable

2020-11-05 Thread Yingjie Cao (Jira)
[ https://issues.apache.org/jira/browse/FLINK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226566#comment-17226566 ] Yingjie Cao commented on FLINK-19645: - After some investigation, I didn't find any p
