Sorry for the late answer. I haven’t been in the office.

The logs show no problems.
Almost all files that remain in the shared subfolder are 1121 bytes in size. 
The exception is the files from the latest checkpoint (30 files for all operators).
For each historic checkpoint, six files remain (parallelism is 6).

checkpoints/stp/2b160fc9e5eba47d1906d04f36b399bf/shared/

-rw-r--r--. 1 flink ice     11318 Dec  4 10:23 046b16e5-0edc-4ca5-9757-d89aa86f5d5c
-rw-r--r--. 1 flink ice    308383 Dec  4 10:23 dab0f801-8907-4b10-ae8b-5f5f71c28524
-rw-r--r--. 1 flink ice    101035 Dec  4 10:23 e4915253-7025-4e36-8671-58a371f202ff
-rw-r--r--. 1 flink ice     11318 Dec  4 10:23 613a9fd1-4545-4785-82ef-92f8c4818bc0
-rw-r--r--. 1 flink ice    308270 Dec  4 10:23 23771709-03ef-4417-acd2-1c27c8b0785e
-rw-r--r--. 1 flink ice     11318 Dec  4 10:23 59643ae2-57b8-4d9e-9f2e-977545db9238
-rw-r--r--. 1 flink ice    102444 Dec  4 10:23 f9a70bb7-d4f3-4d94-af31-8e94276001b1
-rw-r--r--. 1 flink ice    308346 Dec  4 10:23 bafc1a20-e5a4-4b06-93fe-db00f3f3913a
-rw-r--r--. 1 flink ice     96035 Dec  4 10:23 a8b3aa75-fa44-4fc2-ab49-437d49410acd
-rw-r--r--. 1 flink ice     11318 Dec  4 10:23 961f1472-b61c-4c04-b544-4549c3ce4038
-rw-r--r--. 1 flink ice    308387 Dec  4 10:23 7245dcfe-62e8-42a3-94a9-0698bdf9fb4d
-rw-r--r--. 1 flink ice     99209 Dec  4 10:23 5ad87ff2-604c-40d9-9262-76ac6369453d
-rw-r--r--. 1 flink ice     11318 Dec  4 10:23 ccaae280-837a-4bc9-95cd-3612c4cd435c
-rw-r--r--. 1 flink ice    308451 Dec  4 10:23 69b650dd-7969-4891-b973-9bd168e2f40e
-rw-r--r--. 1 flink ice    105638 Dec  4 10:23 16f3092d-76eb-4fd7-87b4-2eb28ca4696c
-rw-r--r--. 1 flink ice     11318 Dec  4 10:23 eccb19a0-4570-4e49-be97-707248690fe8
-rw-r--r--. 1 flink ice    308513 Dec  4 10:23 47259326-cb62-4abb-ad0b-5e2deda97685
-rw-r--r--. 1 flink ice    109918 Dec  4 10:23 aebc4390-3467-4596-a5fa-0d0a6dc74f54
-rw-r--r--. 1 flink ice 259444946 Dec  4 10:22 3e97ea93-4bc9-4404-aeca-eed31e96a14b
-rw-r--r--. 1 flink ice 247501755 Dec  4 10:22 b23d2c3a-94eb-45f0-bd29-457d8796624d
-rw-r--r--. 1 flink ice 247754788 Dec  4 10:22 4eb66b02-6758-4cff-9de2-fb7399b2ac0b
-rw-r--r--. 1 flink ice 247281033 Dec  4 10:22 2aeeb9ad-2714-481c-a7a4-04148f72671d
-rw-r--r--. 1 flink ice 247345955 Dec  4 10:22 5ccf700a-bd83-4e02-93a5-db46ca71e47a
-rw-r--r--. 1 flink ice 259312070 Dec  4 10:22 3bebe32d-0ad3-4f4e-a3aa-21719fa62c87
-rw-r--r--. 1 flink ice     97551 Dec  4 10:22 5a4f8a3a-0f26-46e7-883e-d6fafd733183
-rw-r--r--. 1 flink ice    104198 Dec  4 10:22 cdbb4913-7dd0-4614-8b81-276a4cdf62cc
-rw-r--r--. 1 flink ice    101466 Dec  4 10:22 ce7f0fea-8cd3-4827-9ef1-ceba569c2989
-rw-r--r--. 1 flink ice    108561 Dec  4 10:22 5bd3f681-c131-4c41-9fdc-6a39b9954aa7
-rw-r--r--. 1 flink ice     98649 Dec  4 10:22 d5d8eb16-3bd8-4695-91a0-9d9089ca9510
-rw-r--r--. 1 flink ice    102071 Dec  4 10:22 f8e34ef1-60d6-4c0a-954b-64c8a0320834
-rw-r--r--. 1 flink ice      1121 Dec  4 10:21 8fda9911-f63e-45a6-b95a-5c93fe99d0fd
-rw-r--r--. 1 flink ice      1121 Dec  4 10:21 82545c1b-69d6-499b-a9fd-62e227b820c6
-rw-r--r--. 1 flink ice      1121 Dec  4 10:21 f9fa3bba-c92d-4dda-b16e-0ba417edf5d2
-rw-r--r--. 1 flink ice      1121 Dec  4 10:21 844fa51d-bb74-4bec-ab15-e52d37703d24
-rw-r--r--. 1 flink ice      1121 Dec  4 10:21 2115654a-4544-41cc-bbee-a36d0d80d8eb
-rw-r--r--. 1 flink ice      1121 Dec  4 10:21 acfc1566-5f14-47d7-ae54-7aa1dfb3859c
-rw-r--r--. 1 flink ice      1121 Dec  4 10:16 b0144120-cce0-4b4d-9f8c-1564b9abedd9
-rw-r--r--. 1 flink ice      1121 Dec  4 10:16 8ab4ddab-3665-4307-a581-ab413e1e2080
-rw-r--r--. 1 flink ice      1121 Dec  4 10:16 0f8c4b1a-df5d-47f7-b960-e671cfc3c666
-rw-r--r--. 1 flink ice      1121 Dec  4 10:16 40baf147-400e-455f-aea3-074355a77031
-rw-r--r--. 1 flink ice      1121 Dec  4 10:16 47d2deca-1703-4dd3-9fea-e027087d553e
-rw-r--r--. 1 flink ice      1121 Dec  4 10:16 aa336ce0-3689-4b7d-a472-b0a3ed2f5eb9
-rw-r--r--. 1 flink ice      1121 Dec  4 10:11 ee15f1e0-d23c-4add-86b4-e4ab51bb2a20
-rw-r--r--. 1 flink ice      1121 Dec  4 10:11 f440b5cf-8f62-4532-a886-a2cedc9a043e
-rw-r--r--. 1 flink ice      1121 Dec  4 10:11 de423c46-4288-464b-97cb-6f7764b88dfd
-rw-r--r--. 1 flink ice      1121 Dec  4 10:11 273a15cb-8c9f-4412-b5d2-68397ba461c9
-rw-r--r--. 1 flink ice      1121 Dec  4 10:11 bb38b011-070d-4c21-b04a-4e923f85de86
-rw-r--r--. 1 flink ice      1121 Dec  4 10:11 969abc07-d313-4d79-8119-6e1f3886be48
-rw-r--r--. 1 flink ice      1121 Dec  4 10:06 eb0b2591-653c-47bd-a6b2-9f6634ff4f0a
-rw-r--r--. 1 flink ice      1121 Dec  4 10:06 20b7e49a-ace5-4ef7-987f-0d328f47c56f
-rw-r--r--. 1 flink ice      1121 Dec  4 10:06 a25c2bd9-7fe9-4558-b9dd-30b525a0b435
-rw-r--r--. 1 flink ice      1121 Dec  4 10:06 dcd0852f-58dc-467e-93db-5700cd4f606e
-rw-r--r--. 1 flink ice      1121 Dec  4 10:06 400e5038-2913-4aea-932d-92f508bd38f7
-rw-r--r--. 1 flink ice      1121 Dec  4 10:06 10ce727b-9389-4911-b0d4-1b342dd3232c
-rw-r--r--. 1 flink ice      1121 Dec  4 10:01 daec0dcb-384a-4d86-a423-7e2b0482b70e
-rw-r--r--. 1 flink ice      1121 Dec  4 10:01 c787d58a-4bd5-4d9a-a4d8-47d9618552ff
-rw-r--r--. 1 flink ice      1121 Dec  4 10:01 b2c1383a-8452-4ec6-9064-a5e8f56e6f21
-rw-r--r--. 1 flink ice      1121 Dec  4 10:01 65c0c908-b604-4ac9-a72f-4bb87676df11
-rw-r--r--. 1 flink ice      1121 Dec  4 10:01 42d59072-95ec-42d5-81ff-b9447cb39fd0
-rw-r--r--. 1 flink ice      1121 Dec  4 10:01 b40c2577-e604-482d-86a1-ddb785e3b799
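
A listing like the one above can be summarized by file size to see how many small leftover files there are compared with files from the latest checkpoint. The following is only a minimal sketch using the JDK file API; the class name `SharedDirSummary` and the default path are hypothetical and not part of Flink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

class SharedDirSummary {

    /** Counts the regular files directly under dir, grouped by file size in bytes. */
    static Map<Long, Integer> countBySize(Path dir) throws IOException {
        Map<Long, Integer> counts = new TreeMap<>();
        try (Stream<Path> files = Files.list(dir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(file)) {
                    counts.merge(Files.size(file), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // The default path is an assumption; pass the real "shared" directory as the first argument.
        Path shared = Paths.get(args.length > 0 ? args[0] : "checkpoints/shared");
        countBySize(shared).forEach(
                (size, count) -> System.out.println(size + " bytes: " + count + " file(s)"));
    }
}
```

Running it periodically against the shared directory would show whether only the count of the smallest files keeps growing.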

The TtlDB state backend is mostly a copy of the current 
flink-statebackend-rocksdb, except that it uses the TtlDB class instead of the 
RocksDB class. It is set up with the following configuration:


    protected void buildDefaultStateBackend(StreamExecutionEnvironment executionEnvironment) throws IOException {
        FsStateBackend fsStateBackend = new FsStateBackend(configuration.getCheckpointDirectory(), true);
        RocksDBStateBackend rocksDBStateBackend = new de.helaba.rtts.ice.rocksdb.ttl.RocksDBStateBackend(
                fsStateBackend, TernaryBoolean.TRUE, configuration.getStateTimeToLive());
        rocksDBStateBackend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM);
        rocksDBStateBackend.setOptions(new StpRocksDbOptions());
        executionEnvironment.setStateBackend(rocksDBStateBackend);
    }

    protected void configureCheckpointing(StreamExecutionEnvironment executionEnvironment) {
        CheckpointConfig checkpointConfig = executionEnvironment.getCheckpointConfig();
        checkpointConfig.setCheckpointInterval(configuration.getCheckpointInterval());
        checkpointConfig.setMinPauseBetweenCheckpoints(configuration.getCheckpointMinPause());
        checkpointConfig.enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION);
        checkpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        executionEnvironment.getConfig().setUseSnapshotCompression(true);
    }



From: Andrey Zagrebin [mailto:and...@data-artisans.com]
Sent: Thursday, 29 November 2018 15:38
To: Winterstein, Bernd
Cc: Kostas Kloudas; user; Stefan Richter; Till Rohrmann; Stephan Ewen
Subject: Re: number of files in checkpoint directory grows endlessly

Could you share the logs so that we can check for possible failures to subsume 
or remove previous checkpoints?
What are the sizes of the files? That can help to understand how compaction goes.
Could you also provide more details on how you set up TtlDB with Flink?

Best,
Andrey


On 29 Nov 2018, at 11:34, Andrey Zagrebin <and...@data-artisans.com> wrote:

Compaction merges SST files in the background using native threads. While 
merging, it filters out removed and expired data. In general, the idea is that 
there are enough resources for compaction to keep up with the DB update rate 
and reduce storage. It can be quite I/O intensive. Compaction has a lot of 
tuning knobs and statistics to monitor the process [1]. Tuning them is usually 
outside the scope of Flink because it depends on the state access pattern of 
the application. You can create and set a RocksDBStateBackend for your 
application in Flink and configure it with custom RocksDB/column-specific 
options.

[1] https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
[2] https://github.com/facebook/rocksdb/wiki/Compaction
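
To make the compaction description above more concrete, here is a toy model of a single merge step. It is a sketch under simplifying assumptions, not RocksDB code: the class `CompactionSketch`, its `Entry` type, and the use of hours as the time unit are hypothetical, and real RocksDB compaction works on levels and has more rules about when deletion markers may be dropped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class CompactionSketch {

    /** One stored entry: a value (null marks a deletion) and the hour it was written. */
    static final class Entry {
        final String value;
        final long writtenAtHour;

        Entry(String value, long writtenAtHour) {
            this.value = value;
            this.writtenAtHour = writtenAtHour;
        }
    }

    /**
     * Merges several sorted "files" (ordered oldest first) into one. For each key the entry
     * from the newest file wins; deletion markers and entries older than ttlHours are dropped.
     */
    static TreeMap<String, Entry> compact(List<TreeMap<String, Entry>> filesOldestFirst,
                                          long nowHour, long ttlHours) {
        TreeMap<String, Entry> merged = new TreeMap<>();
        for (TreeMap<String, Entry> file : filesOldestFirst) {
            merged.putAll(file); // entries from newer files overwrite older ones
        }
        merged.values().removeIf(e -> e.value == null || nowHour - e.writtenAtHour > ttlHours);
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, Entry> older = new TreeMap<>();
        older.put("a", new Entry("1", 1));
        older.put("b", new Entry("2", 2));
        TreeMap<String, Entry> newer = new TreeMap<>();
        newer.put("a", new Entry(null, 10)); // "a" was deleted later
        newer.put("c", new Entry("3", 20));

        // With a 24 hour TTL at hour 30: "a" is deleted, "b" is expired, only "c" survives.
        System.out.println(compact(Arrays.asList(older, newer), 30, 24).keySet()); // prints [c]
    }
}
```

Two input files become one smaller output file, which is the effect that lets older files eventually disappear.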


On 29 Nov 2018, at 11:20, <bernd.winterst...@dev.helaba.de> wrote:

We use TtlDB because the state contents should expire automatically after 24 
hours. Therefore, the only change we made was to have the state backend use 
TtlDB with a fixed retention time instead of RocksDB.

Our I/O is slow because we only have SAN volumes available. Can you further 
clarify the problem with slow compaction?

Regards,

Bernd


-----Original Message-----
From: Andrey Zagrebin [mailto:and...@data-artisans.com]
Sent: Thursday, 29 November 2018 11:01
To: Winterstein, Bernd
Cc: Kostas Kloudas; user; s.rich...@data-artisans.com; 
t...@data-artisans.com; step...@data-artisans.com
Subject: Re: number of files in checkpoint directory grows endlessly

If you use incremental checkpoints, the state backend stores raw RocksDB SST 
files, which represent all state data. Each checkpoint adds SST files with new 
updates that are not present in the previous checkpoint, basically their 
difference.

One of the following could be happening:
- old keys are not explicitly deleted and do not expire (depending on how TtlDB is used)
- compaction is too slow to drop older SST files from the latest checkpoint, 
which would allow them to be deleted together with the previous checkpoints
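
The second possibility above can be illustrated with a toy simulation. This is a hedged sketch and not Flink's actual bookkeeping of shared state: the class `SharedFilesSketch`, the file names, and the rule "each checkpoint adds exactly one new SST file" are assumptions made only for illustration. A file has to stay in the shared directory as long as at least one retained checkpoint references it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

class SharedFilesSketch {

    /** Files that must be kept: everything referenced by at least one retained checkpoint. */
    static Set<String> filesToKeep(Deque<Set<String>> retainedCheckpoints) {
        Set<String> keep = new TreeSet<>();
        for (Set<String> checkpoint : retainedCheckpoints) {
            keep.addAll(checkpoint);
        }
        return keep;
    }

    /**
     * Simulates a number of checkpoints. Before each checkpoint one new SST file is added.
     * If compactEvery > 0, every compactEvery-th round merges all live files into one.
     * Returns how many shared files remain after the last checkpoint.
     */
    static int simulate(int checkpoints, int compactEvery, int numRetained) {
        Set<String> liveFiles = new TreeSet<>();
        Deque<Set<String>> retained = new ArrayDeque<>();
        for (int i = 1; i <= checkpoints; i++) {
            liveFiles.add("sst-" + i); // new updates are flushed into a new file
            if (compactEvery > 0 && i % compactEvery == 0) {
                liveFiles.clear(); // compaction replaces all live files with one merged file
                liveFiles.add("merged-" + i);
            }
            retained.addLast(new TreeSet<>(liveFiles)); // the checkpoint references all live files
            while (retained.size() > numRetained) {
                retained.removeFirst(); // older checkpoints are subsumed
            }
        }
        return filesToKeep(retained).size();
    }

    public static void main(String[] args) {
        System.out.println(simulate(10, 0, 1)); // no compaction: prints 10
        System.out.println(simulate(10, 5, 1)); // compaction every 5 rounds: prints 1
    }
}
```

Even with a single retained checkpoint, the number of shared files in this model grows with every checkpoint unless compaction merges the small files.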


On 29 Nov 2018, at 10:48, <bernd.winterst...@dev.helaba.de> wrote:

Hi,
We use Flink 1.6.2. In the checkpoint directory there is only one chk-xxx 
directory. Therefore I would expect that only one checkpoint remains.
The value of 'state.checkpoints.num-retained' is not set explicitly.

The problem is not the number of checkpoints but the number of files in the 
"shared" directory next to the chk-xxx directory.


-----Original Message-----
From: Andrey Zagrebin [mailto:and...@data-artisans.com]
Sent: Thursday, 29 November 2018 10:39
To: Kostas Kloudas
Cc: Winterstein, Bernd; user; Stefan Richter; Till Rohrmann; Stephan Ewen
Subject: Re: number of files in checkpoint directory grows endlessly

Hi Bernd,

Did you change 'state.checkpoints.num-retained' in flink-conf.yaml? By default, 
only one checkpoint should be retained.

Which version of Flink do you use?
Can you check whether the Job Master logs contain a warning like this:
`Fail to subsume the old checkpoint`?

Best,
Andrey


On 29 Nov 2018, at 10:18, Kostas Kloudas <k.klou...@data-artisans.com> wrote:

Hi Bernd,

I think Till, Stefan or Stephan (cc'ed) are the best people to answer your 
question.

Cheers,
Kostas

________________________________


Landesbank Hessen-Thueringen Girozentrale Anstalt des oeffentlichen
Rechts
Sitz: Frankfurt am Main / Erfurt
Amtsgericht Frankfurt am Main, HRA 29821 / Amtsgericht Jena, HRA
102181

Bitte nutzen Sie die E-Mail-Verbindung mit uns ausschliesslich zum 
Informationsaustausch. Wir koennen auf diesem Wege keine rechtsgeschaeftlichen 
Erklaerungen (Auftraege etc.) entgegennehmen.

Der Inhalt dieser Nachricht ist vertraulich und nur fuer den angegebenen 
Empfaenger bestimmt. Jede Form der Kenntnisnahme oder Weitergabe durch Dritte 
ist unzulaessig. Sollte diese Nachricht nicht fur Sie bestimmt sein, so bitten 
wir Sie, sich mit uns per E-Mail oder telefonisch in Verbindung zu setzen.

Please use your E-mail connection with us exclusively for the exchange of 
information. We do not accept legally binding declarations (orders, etc.) by 
this means of communication.

The contents of this message is confidential and intended only for the
recipient indicated. Taking notice of this message or disclosure by third 
parties is not permitted. In the event that this message is not intended for 
you, please contact us via E-mail or phone.


