Hi Yun, I've copied 77e77928-cb26-4543-bd41-e785fcac49f0 and _metadata to Google drive: https://drive.google.com/drive/folders/1J3nwvQupLBT5ZaN_qEmc2y_-MgFz0cLb?usp=sharing
Compression was never enabled (the docs say that RocksDB's incremental checkpoints always use snappy compression; I'm not sure whether that has any effect on savepoints).

Thanks,
Alexey

________________________________
From: Yun Tang <myas...@live.com>
Sent: Wednesday, March 17, 2021 12:33 AM
To: Alexey Trenikhun <yen...@msn.com>; Tzu-Li (Gordon) Tai <tzuli...@apache.org>; user@flink.apache.org <user@flink.apache.org>
Subject: Re: EOFException on attempt to scale up job with RocksDB state backend

Hi Alexey,

Thanks for your quick response. I have checked the two different logs and still cannot understand why this could happen. Take "wasbs://gsp-st...@gspstatewestus2dev.blob.core.windows.net/gsp/savepoints/savepoint-000000-67de6690143a/77e77928-cb26-4543-bd41-e785fcac49f0" for example: the key group range offset has been intersected correctly during rescale for task "Intake voice calls (6/7)". The only thing I could doubt is whether Azure Blob Storage worked as expected when seeking to the offset [1]. Have you ever enabled snappy compression [2] [3] for savepoints?

Could you also share the file "wasbs://gsp-st...@gspstatewestus2dev.blob.core.windows.net/gsp/savepoints/savepoint-000000-67de6690143a/77e77928-cb26-4543-bd41-e785fcac49f0" so that I could seek locally to see whether it works as expected? Moreover, could you also share the savepoint metadata "wasbs://gsp-st...@gspstatewestus2dev.blob.core.windows.net/gsp/savepoints/savepoint-000000-67de6690143a/_metadata"?
[1] https://github.com/apache/flink/blob/dc404e2538fdfbc98b9c565951f30f922bf7cedd/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/restore/RocksDBFullRestoreOperation.java#L211
[2] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#compression
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#execution-checkpointing-snapshot-compression

Best
Yun Tang

________________________________
From: Alexey Trenikhun <yen...@msn.com>
Sent: Wednesday, March 17, 2021 14:25
To: Yun Tang <myas...@live.com>; Tzu-Li (Gordon) Tai <tzuli...@apache.org>; user@flink.apache.org <user@flink.apache.org>
Subject: Re: EOFException on attempt to scale up job with RocksDB state backend

Attached.

________________________________
From: Yun Tang <myas...@live.com>
Sent: Tuesday, March 16, 2021 11:13 PM
To: Alexey Trenikhun <yen...@msn.com>; Tzu-Li (Gordon) Tai <tzuli...@apache.org>; user@flink.apache.org <user@flink.apache.org>
Subject: Re: EOFException on attempt to scale up job with RocksDB state backend

Hi Alexey,

Thanks for your reply. Could you also share the logs from a normal restore, as I wrote in the previous thread, so that I could compare?
Best
Yun Tang

________________________________
From: Alexey Trenikhun <yen...@msn.com>
Sent: Wednesday, March 17, 2021 13:55
To: Yun Tang <myas...@live.com>; Tzu-Li (Gordon) Tai <tzuli...@apache.org>; user@flink.apache.org <user@flink.apache.org>
Subject: Re: EOFException on attempt to scale up job with RocksDB state backend

Hi Yun,

I'm attaching a shorter version of the log; it looks like the full version didn't come through.

Thanks,
Alexey

________________________________
From: Yun Tang <myas...@live.com>
Sent: Tuesday, March 16, 2021 8:05 PM
To: Alexey Trenikhun <yen...@msn.com>; Tzu-Li (Gordon) Tai <tzuli...@apache.org>; user@flink.apache.org <user@flink.apache.org>
Subject: Re: EOFException on attempt to scale up job with RocksDB state backend

Hi Alexey,

Judging by the line numbers of the method calls, I believe your exception messages were printed by Flink-1.12.2, not Flink-1.12.1. Could you share the exception message from Flink-1.12.1 when rescaling? Moreover, I hope you could share more logs from restoring and rescaling. I want to see the details of the key group handle [1].

[1] https://github.com/apache/flink/blob/dc404e2538fdfbc98b9c565951f30f922bf7cedd/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/restore/RocksDBFullRestoreOperation.java#L153

Best

________________________________
From: Alexey Trenikhun <yen...@msn.com>
Sent: Tuesday, March 16, 2021 15:10
To: Yun Tang <myas...@live.com>; Tzu-Li (Gordon) Tai <tzuli...@apache.org>; user@flink.apache.org <user@flink.apache.org>
Subject: Re: EOFException on attempt to scale up job with RocksDB state backend

Also, restoring from the same savepoint without a change in parallelism works fine.
________________________________
From: Alexey Trenikhun <yen...@msn.com>
Sent: Monday, March 15, 2021 9:51 PM
To: Yun Tang <myas...@live.com>; Tzu-Li (Gordon) Tai <tzuli...@apache.org>; user@flink.apache.org <user@flink.apache.org>
Subject: Re: EOFException on attempt to scale up job with RocksDB state backend

No, I believe the original exception was from 1.12.1 to 1.12.1.

Thanks,
Alexey

________________________________
From: Yun Tang <myas...@live.com>
Sent: Monday, March 15, 2021 8:07:07 PM
To: Alexey Trenikhun <yen...@msn.com>; Tzu-Li (Gordon) Tai <tzuli...@apache.org>; user@flink.apache.org <user@flink.apache.org>
Subject: Re: EOFException on attempt to scale up job with RocksDB state backend

Hi,

Can you scale the job at the same version, from 1.12.1 to 1.12.1?

Best
Yun Tang

________________________________
From: Alexey Trenikhun <yen...@msn.com>
Sent: Tuesday, March 16, 2021 4:46
To: Tzu-Li (Gordon) Tai <tzuli...@apache.org>; user@flink.apache.org <user@flink.apache.org>
Subject: Re: EOFException on attempt to scale up job with RocksDB state backend

The savepoint was taken with 1.12.1; I've tried to scale up using the same version and 1.12.2.

________________________________
From: Tzu-Li (Gordon) Tai <tzuli...@apache.org>
Sent: Monday, March 15, 2021 12:06 AM
To: user@flink.apache.org <user@flink.apache.org>
Subject: Re: EOFException on attempt to scale up job with RocksDB state backend

Hi,

Could you provide info on the Flink version used?

Cheers,
Gordon

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
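
For anyone following the thread, here is a rough sketch of what "the key group range offset has been intersected" means for a subtask such as "Intake voice calls (6/7)" during a rescale. The range arithmetic follows, as far as I recall, what Flink's KeyGroupRangeAssignment does, but the classes and function names below are a simplified stand-in and not Flink's API. The max parallelism (128), the old parallelism (4) and the per-key-group offsets are invented for illustration; the thread does not state them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class KeyGroupRange:
    """Simplified stand-in for a key group range; both ends are inclusive."""
    start: int
    end: int

    def intersect(self, other: "KeyGroupRange") -> Optional["KeyGroupRange"]:
        start = max(self.start, other.start)
        end = min(self.end, other.end)
        return KeyGroupRange(start, end) if start <= end else None


def key_group_range_for_subtask(max_parallelism: int, parallelism: int,
                                subtask_index: int) -> KeyGroupRange:
    """Key groups owned by a zero-based subtask index at a given parallelism."""
    start = (subtask_index * max_parallelism + parallelism - 1) // parallelism
    end = ((subtask_index + 1) * max_parallelism - 1) // parallelism
    return KeyGroupRange(start, end)


def offsets_to_read(handle_range: KeyGroupRange, offsets: Dict[int, int],
                    subtask_range: KeyGroupRange) -> List[Tuple[int, int]]:
    """(key group, file offset) pairs a subtask would seek to in one state file."""
    overlap = handle_range.intersect(subtask_range)
    if overlap is None:
        return []
    return [(kg, offsets[kg]) for kg in range(overlap.start, overlap.end + 1)]


# Hypothetical setup: max parallelism 128, savepoint written at parallelism 4,
# restored at parallelism 7. Subtask "6/7" has zero-based index 5.
MAX_PARALLELISM = 128
new_range = key_group_range_for_subtask(MAX_PARALLELISM, 7, 5)

# State file written by old subtask 2, with made-up offsets (1000 bytes apart).
old_range = key_group_range_for_subtask(MAX_PARALLELISM, 4, 2)
old_offsets = {kg: 1000 * (kg - old_range.start)
               for kg in range(old_range.start, old_range.end + 1)}

to_read = offsets_to_read(old_range, old_offsets, new_range)
print(new_range, old_range, to_read)
```

With these made-up numbers, the new subtask only reads the tail of the old subtask's file, so it has to seek to an offset in the middle of that file. A restore at unchanged parallelism reads each file from its first key group onward, which is one reason the two cases can behave differently on the same savepoint.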