I was glad to find that bravo has now been updated to support installing it to a local Maven repo.

I was able to load a checkpoint created by my job, thanks to the example provided in the bravo README, but I'm still missing the essential piece. My code was:

    OperatorStateReader reader = new OperatorStateReader(env2, savepoint, "DistinctFunction");
    DontKnowWhatTypeThisIs reducingState = reader.readKeyedStates(what should I put here?);

I don't know how to read the values collected from reduce() calls in the state. Is there a way to access the reducing state of the window with bravo? I'm a bit confused about how this works, because when I check with a debugger, Flink internally uses a ReducingStateDescriptor with name=window-contents, yet reading operator state for "DistinctFunction" at least didn't throw an exception ("window-contents" threw – obviously there's no operator by that name).
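For context, here is my (possibly wrong) understanding of how Flink registers that state, paraphrased from stepping through WindowedStream.reduce with the debugger – so the snippet below is only a sketch of the Flink internals, not the exact code, and inputType / executionConfig are just placeholders for the stream's TypeInformation and the job's ExecutionConfig:

    // Paraphrased from org.apache.flink.streaming.api.datastream.WindowedStream#reduce
    // (my reading of the sources, not the exact Flink code):
    // the window contents are kept in keyed state under the fixed descriptor name
    // "window-contents", inside the window operator ("DistinctFunction" in my job).
    ReducingStateDescriptor<Map<String, String>> stateDesc =
            new ReducingStateDescriptor<>(
                    "window-contents",          // fixed state name used by the window operator
                    new DistinctFunction(),     // the user-supplied ReduceFunction
                    inputType.createSerializer(executionConfig)); // placeholders: the stream's TypeInformation<Map<String, String>> and ExecutionConfig

So I would expect the state I'm after to be the keyed ReducingState registered as "window-contents" inside the operator "DistinctFunction" – I just can't figure out what reader I should pass to readKeyedStates() to get those values out, or whether bravo supports reducing window state at all.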
Cheers,
Juho

On Mon, Oct 15, 2018 at 2:25 PM Juho Autio <juho.au...@rovio.com> wrote: > Hi Stefan, > > Sorry but it doesn't seem immediately clear to me what's a good way to use > https://github.com/king/bravo. > > How are people using it? Would you for example modify build.gradle somehow > to publish the bravo as a library locally/internally? Or add code directly > in the bravo project (locally) and run it from there (using an IDE, for > example)? Also it doesn't seem like the bravo gradle project supports > building a flink job jar, but if it does, how do I do it? > > Thanks. > > On Thu, Oct 4, 2018 at 9:30 PM Juho Autio <juho.au...@rovio.com> wrote: > >> Good then, I'll try to analyze the savepoints with Bravo. Thanks! >> >> > How would you assume that backpressure would influence your updates? >> Updates to each local state still happen event-by-event, in a single >> reader/writing thread. >> >> Sure, just an ignorant guess by me. I'm not familiar with most of Flink's >> internals. Anyway, high backpressure is not seen on this job after it has >> caught up with the lag, so I thought it would be worth mentioning. >> >> On Thu, Oct 4, 2018 at 6:24 PM Stefan Richter < >> s.rich...@data-artisans.com> wrote: >> >>> Hi, >>> >>> Am 04.10.2018 um 16:08 schrieb Juho Autio <juho.au...@rovio.com>: >>> >>> > you could take a look at Bravo [1] to query your savepoints and to >>> check if the state in the savepoint complete w.r.t your expectations >>> >>> Thanks. I'm not 100% sure if this is the case, but to me it seemed like the >>> missed ids were being logged by the reducer soon after the job had started >>> (after restoring a savepoint). But on the other hand, after that I also >>> made another savepoint & restored that, so what I could check is: does that >>> next savepoint have the missed ids that were logged (a couple of minutes >>> before the savepoint was created, so there should've been more than enough >>> time to add them to the state before the savepoint was triggered) or not. >>> Anyway, if I were able to verify with Bravo that the ids are missing >>> from the savepoint (even though the reducer logged that it saw them), would >>> that help in figuring out where they are lost? Is there some major >>> difference compared to just looking at the final output after the window has >>> been triggered? >>> >>> >>> >>> I think that makes a difference. For example, you can investigate if >>> there is a state loss or a problem with the windowing. In the savepoint you >>> could see which keys exist and to which windows they are assigned. 
Also >>> just to make sure there is no misunderstanding: only elements that are in >>> the state at the start of a savepoint are expected to be part of the >>> savepoint; all elements between start and completion of the savepoint are >>> not expected to be part of the savepoint. >>> >>> >>> > I also doubt that the problem is about backpressure after restore, >>> because the job will only continue running after the state restore is >>> already completed. >>> >>> Yes, I'm not suspecting that the state restoring would be the problem >>> either. My concern was about backpressure possibly messing with the updates >>> of reducing state? I would tend to suspect that updating the state >>> consistently is what fails, where heavy load / backpressure might be a >>> factor. >>> >>> >>> >>> How would you assume that backpressure would influence your updates? >>> Updates to each local state still happen event-by-event, in a single >>> reader/writing thread. >>> >>> >>> On Thu, Oct 4, 2018 at 4:18 PM Stefan Richter < >>> s.rich...@data-artisans.com> wrote: >>> >>>> Hi, >>>> >>>> you could take a look at Bravo [1] to query your savepoints and to >>>> check if the state in the savepoint complete w.r.t your expectations. I >>>> somewhat doubt that there is a general problem with the state/savepoints >>>> because many users are successfully running it on a large state and I am >>>> not aware of any data loss problems, but nothing is impossible. What the >>>> savepoint does is also straight forward: iterate a db snapshot and write >>>> all key/value pairs to disk, so all data that was in the db at the time of >>>> the savepoint, should show up. I also doubt that the problem is about >>>> backpressure after restore, because the job will only continue running >>>> after the state restore is already completed. Did you check if you are >>>> using exactly-one-semantics or at-least-once semantics? Also did you check >>>> that the kafka consumer start position is configured properly [2]? Are >>>> watermarks generated as expected after restore? >>>> >>>> One more unrelated high-level comment that I have: for a granularity of >>>> 24h windows, I wonder if it would not make sense to use a batch job >>>> instead? >>>> >>>> Best, >>>> Stefan >>>> >>>> [1] https://github.com/king/bravo >>>> [2] >>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration >>>> >>>> Am 04.10.2018 um 14:53 schrieb Juho Autio <juho.au...@rovio.com>: >>>> >>>> Thanks for the suggestions! >>>> >>>> > In general, it would be tremendously helpful to have a minimal >>>> working example which allows to reproduce the problem. >>>> >>>> Definitely. The problem with reproducing has been that this only seems >>>> to happen in the bigger production data volumes. >>>> >>>> That's why I'm hoping to find a way to debug this with the production >>>> data. With that it seems to consistently cause some misses every time the >>>> job is killed/restored. >>>> >>>> > check if it happens for shorter windows, like 1h etc >>>> >>>> What would be the benefit of that compared to 24h window? >>>> >>>> > simplify the job to not use a reduce window but simply a time window >>>> which outputs the window events. Then counting the input and output events >>>> should allow you to verify the results. If you are not seeing missing >>>> events, then it could have something to do with the reducing state used in >>>> the reduce function. 
>>>> >>>> Hm, maybe, but not sure how useful that would be, because it wouldn't >>>> yet prove that it's related to reducing, because not having a reduce >>>> function could also mean smaller load on the job, which might alone be >>>> enough to make the problem not manifest. >>>> >>>> Is there a way to debug what goes into the reducing state (including >>>> what gets removed or overwritten and what restored), if that makes sense..? >>>> Maybe some suitable logging could be used to prove that the lost data is >>>> written to the reducing state (or at least asked to be written), but not >>>> found any more when the window closes and state is flushed? >>>> >>>> On configuration once more, we're using RocksDB state backend with >>>> asynchronous incremental checkpointing. The state is restored from >>>> savepoints though, we haven't been using those checkpoints in these tests >>>> (although they could be used in case of crashes – but we haven't had those >>>> now). >>>> >>>> On Thu, Oct 4, 2018 at 3:25 PM Till Rohrmann <trohrm...@apache.org> >>>> wrote: >>>> >>>>> Hi Juho, >>>>> >>>>> another idea to further narrow down the problem could be to simplify >>>>> the job to not use a reduce window but simply a time window which outputs >>>>> the window events. Then counting the input and output events should allow >>>>> you to verify the results. If you are not seeing missing events, then it >>>>> could have something to do with the reducing state used in the reduce >>>>> function. >>>>> >>>>> In general, it would be tremendously helpful to have a minimal working >>>>> example which allows to reproduce the problem. >>>>> >>>>> Cheers, >>>>> Till >>>>> >>>>> On Thu, Oct 4, 2018 at 2:02 PM Andrey Zagrebin < >>>>> and...@data-artisans.com> wrote: >>>>> >>>>>> Hi Juho, >>>>>> >>>>>> can you try to reduce the job to minimal reproducible example and >>>>>> share the job and input? >>>>>> >>>>>> For example: >>>>>> - some simple records as input, e.g. tuples of primitive types saved >>>>>> as cvs >>>>>> - minimal deduplication job which processes them and misses records >>>>>> - check if it happens for shorter windows, like 1h etc >>>>>> - setup which you use for the job, ideally locally reproducible or >>>>>> cloud >>>>>> >>>>>> Best, >>>>>> Andrey >>>>>> >>>>>> On 4 Oct 2018, at 11:13, Juho Autio <juho.au...@rovio.com> wrote: >>>>>> >>>>>> Sorry to insist, but we seem to be blocked for any serious usage of >>>>>> state in Flink if we can't rely on it to not miss data in case of >>>>>> restore. >>>>>> >>>>>> Would anyone have suggestions for how to troubleshoot this? So far I >>>>>> have verified with DEBUG logs that our reduce function gets to process >>>>>> also >>>>>> the data that is missing from window output. >>>>>> >>>>>> On Mon, Oct 1, 2018 at 11:56 AM Juho Autio <juho.au...@rovio.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Andrey, >>>>>>> >>>>>>> To rule out for good any questions about sink behaviour, the job was >>>>>>> killed and started with an additional Kafka sink. >>>>>>> >>>>>>> The same number of ids were missed in both outputs: KafkaSink & >>>>>>> BucketingSink. >>>>>>> >>>>>>> I wonder what would be the next steps in debugging? >>>>>>> >>>>>>> On Fri, Sep 21, 2018 at 3:49 PM Juho Autio <juho.au...@rovio.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Thanks, Andrey. >>>>>>>> >>>>>>>> > so it means that the savepoint does not loose at least some >>>>>>>> dropped records. >>>>>>>> >>>>>>>> I'm not sure what you mean by that? 
I mean, it was known from the >>>>>>>> beginning, that not everything is lost before/after restoring a >>>>>>>> savepoint, >>>>>>>> just some records around the time of restoration. It's not 100% clear >>>>>>>> whether records are lost before making a savepoint or after restoring >>>>>>>> it. >>>>>>>> Although, based on the new DEBUG logs it seems more like losing some >>>>>>>> records that are seen ~soon after restoring. It seems like Flink would >>>>>>>> be >>>>>>>> somehow confused either about the restored state vs. new inserts to >>>>>>>> state. >>>>>>>> This could also be somehow linked to the high back pressure on the >>>>>>>> kafka >>>>>>>> source while the stream is catching up. >>>>>>>> >>>>>>>> > If it is feasible for your setup, I suggest to insert one more >>>>>>>> map function after reduce and before sink. >>>>>>>> > etc. >>>>>>>> >>>>>>>> Isn't that the same thing that we discussed before? Nothing is sent >>>>>>>> to BucketingSink before the window closes, so I don't see how it would >>>>>>>> make >>>>>>>> any difference if we replace the BucketingSink with a map function or >>>>>>>> another sink type. We don't create or restore savepoints during the >>>>>>>> time >>>>>>>> when BucketingSink gets input or has open buckets – that happens at a >>>>>>>> much >>>>>>>> later time of day. I would focus on figuring out why the records are >>>>>>>> lost >>>>>>>> while the window is open. But I don't know how to do that. Would you >>>>>>>> have >>>>>>>> any additional suggestions? >>>>>>>> >>>>>>>> On Fri, Sep 21, 2018 at 3:30 PM Andrey Zagrebin < >>>>>>>> and...@data-artisans.com> wrote: >>>>>>>> >>>>>>>>> Hi Juho, >>>>>>>>> >>>>>>>>> so it means that the savepoint does not loose at least some >>>>>>>>> dropped records. >>>>>>>>> >>>>>>>>> If it is feasible for your setup, I suggest to insert one more map >>>>>>>>> function after reduce and before sink. >>>>>>>>> The map function should be called right after window is triggered >>>>>>>>> but before flushing to s3. >>>>>>>>> The result of reduce (deduped record) could be logged there. >>>>>>>>> This should allow to check whether the processed distinct records >>>>>>>>> were buffered in the state after the restoration from the savepoint >>>>>>>>> or not. >>>>>>>>> If they were buffered we should see that there was an attempt to >>>>>>>>> write them >>>>>>>>> to the sink from the state. >>>>>>>>> >>>>>>>>> Another suggestion is to try to write records to some other sink >>>>>>>>> or to both. >>>>>>>>> E.g. if you can access file system of workers, maybe just into >>>>>>>>> local files and check whether the records are also dropped there. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Andrey >>>>>>>>> >>>>>>>>> On 20 Sep 2018, at 15:37, Juho Autio <juho.au...@rovio.com> wrote: >>>>>>>>> >>>>>>>>> Hi Andrey! >>>>>>>>> >>>>>>>>> I was finally able to gather the DEBUG logs that you suggested. In >>>>>>>>> short, the reducer logged that it processed at least some of the ids >>>>>>>>> that >>>>>>>>> were missing from the output. >>>>>>>>> >>>>>>>>> "At least some", because I didn't have the job running with DEBUG >>>>>>>>> logs for the full 24-hour window period. So I was only able to look >>>>>>>>> up if I >>>>>>>>> can find *some* of the missing ids in the DEBUG logs. Which I did >>>>>>>>> indeed. 
>>>>>>>>> >>>>>>>>> I changed the DistinctFunction.java to do this: >>>>>>>>> >>>>>>>>> @Override >>>>>>>>> public Map<String, String> reduce(Map<String, String> value1, >>>>>>>>> Map<String, String> value2) { >>>>>>>>> LOG.debug("DistinctFunction.reduce returns: {}={}", >>>>>>>>> value1.get("field"), value1.get("id")); >>>>>>>>> return value1; >>>>>>>>> } >>>>>>>>> >>>>>>>>> Then: >>>>>>>>> >>>>>>>>> vi flink-1.6.0/conf/log4j.properties >>>>>>>>> >>>>>>>>> log4j.logger.org.apache.flink.streaming.runtime.tasks.StreamTask=DEBUG >>>>>>>>> log4j.logger.com.rovio.ds.flink.uniqueid.DistinctFunction=DEBUG >>>>>>>>> >>>>>>>>> Then I ran the following kind of test: >>>>>>>>> >>>>>>>>> - Cancelled the on-going job with savepoint created at ~Sep 18 >>>>>>>>> 08:35 UTC 2018 >>>>>>>>> - Started a new cluster & job with DEBUG enabled at ~09:13, >>>>>>>>> restored from that previous cluster's savepoint >>>>>>>>> - Ran until caught up offsets >>>>>>>>> - Cancelled the job with a new savepoint >>>>>>>>> - Started a new job _without_ DEBUG, which restored the new >>>>>>>>> savepoint, let it keep running so that it will eventually write the >>>>>>>>> output >>>>>>>>> >>>>>>>>> Then on the next day, after results had been flushed when the >>>>>>>>> 24-hour window closed, I compared the results again with a batch >>>>>>>>> version's >>>>>>>>> output. And found some missing ids as usual. >>>>>>>>> >>>>>>>>> I drilled down to one specific missing id (I'm replacing the >>>>>>>>> actual value with AN12345 below), which was not found in the stream >>>>>>>>> output, >>>>>>>>> but was found in batch output & flink DEBUG logs. >>>>>>>>> >>>>>>>>> Related to that id, I gathered the following information: >>>>>>>>> >>>>>>>>> 2018-09-18~09:13:21,000 job started & savepoint is restored >>>>>>>>> >>>>>>>>> 2018-09-18 09:14:29,085 missing id is processed for the first >>>>>>>>> time, proved by this log line: >>>>>>>>> 2018-09-18 09:14:29,085 DEBUG >>>>>>>>> com.rovio.ds.flink.uniqueid.DistinctFunction - >>>>>>>>> DistinctFunction.reduce returns: s.aid1=AN12345 >>>>>>>>> >>>>>>>>> 2018-09-18 09:15:14,264 first synchronous part of checkpoint >>>>>>>>> 2018-09-18 09:15:16,544 first asynchronous part of checkpoint >>>>>>>>> >>>>>>>>> ( >>>>>>>>> more occurrences of checkpoints (~1 min checkpointing time + ~1 >>>>>>>>> min delay before next) >>>>>>>>> / >>>>>>>>> more occurrences of DistinctFunction.reduce >>>>>>>>> ) >>>>>>>>> >>>>>>>>> 2018-09-18 09:23:45,053 missing id is processed for the last time >>>>>>>>> >>>>>>>>> 2018-09-18~10:20:00,000 savepoint created & job cancelled >>>>>>>>> >>>>>>>>> To be noted, there was high backpressure after restoring from >>>>>>>>> savepoint until the stream caught up with the kafka offsets. >>>>>>>>> Although, our >>>>>>>>> job uses assign timestamps & watermarks on the flink kafka consumer >>>>>>>>> itself, >>>>>>>>> so event time of all partitions is synchronized. As expected, we >>>>>>>>> don't get >>>>>>>>> any late data in the late data side output. >>>>>>>>> >>>>>>>>> From this we can see that the missing ids are processed by the >>>>>>>>> reducer, but they must get lost somewhere before the 24-hour window is >>>>>>>>> triggered. >>>>>>>>> >>>>>>>>> I think it's worth mentioning once more that the stream doesn't >>>>>>>>> miss any ids if we let it's running without interruptions / state >>>>>>>>> restoring. >>>>>>>>> >>>>>>>>> What's next? 
>>>>>>>>> >>>>>>>>> On Wed, Aug 29, 2018 at 3:49 PM Andrey Zagrebin < >>>>>>>>> and...@data-artisans.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Juho, >>>>>>>>>> >>>>>>>>>> > only when the 24-hour window triggers, BucketingSink gets a >>>>>>>>>> burst of input >>>>>>>>>> >>>>>>>>>> This is of course totally true, my understanding is the same. We >>>>>>>>>> cannot exclude problem there for sure, just savepoints are used a >>>>>>>>>> lot w/o >>>>>>>>>> problem reports and BucketingSink is known to be problematic with >>>>>>>>>> s3. That >>>>>>>>>> is why, I asked you: >>>>>>>>>> >>>>>>>>>> > You also wrote that the timestamps of lost event are 'probably' >>>>>>>>>> around the time of the savepoint, if it is not yet for sure I would >>>>>>>>>> also >>>>>>>>>> check it. >>>>>>>>>> >>>>>>>>>> Although, bucketing sink might loose any data at the end of the >>>>>>>>>> day (also from the middle). The fact, that it is always around the >>>>>>>>>> time of >>>>>>>>>> taking a savepoint and not random, is surely suspicious and possible >>>>>>>>>> savepoint failures need to be investigated. >>>>>>>>>> >>>>>>>>>> Regarding the s3 problem, s3 doc says: >>>>>>>>>> >>>>>>>>>> > The caveat is that if you make a HEAD or GET request to the key >>>>>>>>>> name (to find if the object exists) before creating the object, >>>>>>>>>> Amazon S3 >>>>>>>>>> provides 'eventual consistency' for read-after-write. >>>>>>>>>> >>>>>>>>>> The algorithm you suggest is how it is roughly implemented now >>>>>>>>>> (BucketingSink.openNewPartFile). My understanding is that >>>>>>>>>> 'eventual consistency’ means that even if you just created file (its >>>>>>>>>> name >>>>>>>>>> is key) it can be that you do not get it in the list or exists (HEAD) >>>>>>>>>> returns false and you risk to rewrite the previous part. >>>>>>>>>> >>>>>>>>>> The BucketingSink was designed for a standard file system. s3 is >>>>>>>>>> used over a file system wrapper atm but does not always provide >>>>>>>>>> normal file >>>>>>>>>> system guarantees. See also last example in [1]. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Andrey >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> https://codeburst.io/quick-explanation-of-the-s3-consistency-model-6c9f325e3f82 >>>>>>>>>> >>>>>>>>>> On 29 Aug 2018, at 12:11, Juho Autio <juho.au...@rovio.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Andrey, thank you very much for the debugging suggestions, I'll >>>>>>>>>> try them. >>>>>>>>>> >>>>>>>>>> In the meanwhile two more questions, please: >>>>>>>>>> >>>>>>>>>> > Just to keep in mind this problem with s3 and exclude it for >>>>>>>>>> sure. I would also check whether the size of missing events is >>>>>>>>>> around the >>>>>>>>>> batch size of BucketingSink or not. >>>>>>>>>> >>>>>>>>>> Fair enough, but I also want to focus on debugging the most >>>>>>>>>> probable subject first. So what do you think about this – true or >>>>>>>>>> false: >>>>>>>>>> only when the 24-hour window triggers, BucketinSink gets a burst of >>>>>>>>>> input. >>>>>>>>>> Around the state restoring point (middle of the day) it doesn't get >>>>>>>>>> any >>>>>>>>>> input, so it can't lose anything either. Isn't this true, or have I >>>>>>>>>> totally >>>>>>>>>> missed how Flink works in triggering window results? I would not >>>>>>>>>> expect >>>>>>>>>> there to be any optimization that speculatively triggers early >>>>>>>>>> results of a >>>>>>>>>> regular time window to the downstream operators. >>>>>>>>>> >>>>>>>>>> > The old BucketingSink has in general problem with s3. 
>>>>>>>>>> Internally BucketingSink queries s3 as a file system to list already >>>>>>>>>> written file parts (batches) and determine index of the next part to >>>>>>>>>> start. >>>>>>>>>> Due to eventual consistency of checking file existence in s3 [1], the >>>>>>>>>> BucketingSink can rewrite the previously written part and basically >>>>>>>>>> loose >>>>>>>>>> it. >>>>>>>>>> >>>>>>>>>> I was wondering, what does S3's "read-after-write consistency" >>>>>>>>>> (mentioned on the page you linked) actually mean. It seems that this >>>>>>>>>> might >>>>>>>>>> be possible: >>>>>>>>>> - LIST keys, find current max index >>>>>>>>>> - choose next index = max + 1 >>>>>>>>>> - HEAD next index: if it exists, keep adding + 1 until key >>>>>>>>>> doesn't exist on S3 >>>>>>>>>> >>>>>>>>>> But definitely sounds easier if a sink keeps track of files in a >>>>>>>>>> way that's guaranteed to be consistent. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Juho >>>>>>>>>> >>>>>>>>>> On Mon, Aug 27, 2018 at 2:04 PM Andrey Zagrebin < >>>>>>>>>> and...@data-artisans.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> true, StreamingFileSink does not support s3 in 1.6.0, it is >>>>>>>>>>> planned for the next 1.7 release, sorry for confusion. >>>>>>>>>>> The old BucketingSink has in general problem with s3. >>>>>>>>>>> Internally BucketingSink queries s3 as a file system >>>>>>>>>>> to list already written file parts (batches) and determine index >>>>>>>>>>> of the next part to start. Due to eventual consistency of checking >>>>>>>>>>> file >>>>>>>>>>> existence in s3 [1], the BucketingSink can rewrite the previously >>>>>>>>>>> written >>>>>>>>>>> part and basically loose it. It should be fixed for >>>>>>>>>>> StreamingFileSink in >>>>>>>>>>> 1.7 where Flink keeps its own track of written parts and does not >>>>>>>>>>> rely on >>>>>>>>>>> s3 as a file system. >>>>>>>>>>> I also include Kostas, he might add more details. >>>>>>>>>>> >>>>>>>>>>> Just to keep in mind this problem with s3 and exclude it for >>>>>>>>>>> sure I would also check whether the size of missing events is >>>>>>>>>>> around the >>>>>>>>>>> batch size of BucketingSink or not. You also wrote that the >>>>>>>>>>> timestamps of >>>>>>>>>>> lost event are 'probably' around the time of the savepoint, if it >>>>>>>>>>> is not >>>>>>>>>>> yet for sure I would also check it. >>>>>>>>>>> >>>>>>>>>>> Have you already checked the log files of job manager and task >>>>>>>>>>> managers for the job running before and after the restore from the >>>>>>>>>>> check >>>>>>>>>>> point? Is everything successful there, no errors, relevant warnings >>>>>>>>>>> or >>>>>>>>>>> exceptions? >>>>>>>>>>> >>>>>>>>>>> As the next step, I would suggest to log all encountered events >>>>>>>>>>> in DistinctFunction.reduce if possible for production data and check >>>>>>>>>>> whether the missed events are eventually processed before or after >>>>>>>>>>> the >>>>>>>>>>> savepoint. The following log message indicates a border between the >>>>>>>>>>> events >>>>>>>>>>> that should be included into the savepoint (logged before) or not: >>>>>>>>>>> “{} ({}, synchronous part) in thread {} took {} ms” (template) >>>>>>>>>>> Also check if the savepoint has been overall completed: >>>>>>>>>>> "{} ({}, asynchronous part) in thread {} took {} ms." 
>>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Andrey >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html >>>>>>>>>>> >>>>>>>>>>> On 24 Aug 2018, at 20:41, Juho Autio <juho.au...@rovio.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> Using StreamingFileSink is not a convenient option for >>>>>>>>>>> production use for us as it doesn't support s3*. I could use >>>>>>>>>>> StreamingFileSink just to verify, but I don't see much point in >>>>>>>>>>> doing so. >>>>>>>>>>> Please consider my previous comment: >>>>>>>>>>> >>>>>>>>>>> > I realized that BucketingSink must not play any role in this >>>>>>>>>>> problem. This is because only when the 24-hour window triggers, >>>>>>>>>>> BucketingSink gets a burst of input. Around the state restoring >>>>>>>>>>> point >>>>>>>>>>> (middle of the day) it doesn't get any input, so it can't lose >>>>>>>>>>> anything >>>>>>>>>>> either (right?). >>>>>>>>>>> >>>>>>>>>>> I could also use a kafka sink instead, but I can't imagine how >>>>>>>>>>> there could be any difference. It's very real that the sink doesn't >>>>>>>>>>> get any >>>>>>>>>>> input for a long time until the 24-hour window closes, and then it >>>>>>>>>>> quickly >>>>>>>>>>> writes out everything because it's not that much data eventually >>>>>>>>>>> for the >>>>>>>>>>> distinct values. >>>>>>>>>>> >>>>>>>>>>> Any ideas for debugging what's happening around the savepoint & >>>>>>>>>>> restoration time? >>>>>>>>>>> >>>>>>>>>>> *) I actually implemented StreamingFileSink as an alternative >>>>>>>>>>> sink. This was before I came to realize that most likely the sink >>>>>>>>>>> component >>>>>>>>>>> has nothing to do with the data loss problem. I tried it with >>>>>>>>>>> s3n:// path >>>>>>>>>>> just to see an exception being thrown. In the source code I indeed >>>>>>>>>>> then >>>>>>>>>>> found an explicit check for the target path scheme to be "hdfs://". >>>>>>>>>>> >>>>>>>>>>> On Fri, Aug 24, 2018 at 7:49 PM Andrey Zagrebin < >>>>>>>>>>> and...@data-artisans.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Ok, I think before further debugging the window reduced state, >>>>>>>>>>>> could you try the new ‘StreamingFileSink’ [1] introduced in >>>>>>>>>>>> Flink 1.6.0 instead of the previous 'BucketingSink’? >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> Andrey >>>>>>>>>>>> >>>>>>>>>>>> [1] >>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html >>>>>>>>>>>> >>>>>>>>>>>> On 24 Aug 2018, at 18:03, Juho Autio <juho.au...@rovio.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Yes, sorry for my confusing comment. I just meant that it seems >>>>>>>>>>>> like there's a bug somewhere now that the output is missing some >>>>>>>>>>>> data. >>>>>>>>>>>> >>>>>>>>>>>> > I would wait and check the actual output in s3 because it is >>>>>>>>>>>> the main result of the job >>>>>>>>>>>> >>>>>>>>>>>> Yes, and that's what I have already done. There seems to be >>>>>>>>>>>> always some data loss with the production data volumes, if the job >>>>>>>>>>>> has been >>>>>>>>>>>> restarted on that day. >>>>>>>>>>>> >>>>>>>>>>>> Would you have any suggestions for how to debug this further? >>>>>>>>>>>> >>>>>>>>>>>> Many thanks for stepping in. >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Aug 24, 2018 at 6:37 PM Andrey Zagrebin < >>>>>>>>>>>> and...@data-artisans.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Juho, >>>>>>>>>>>>> >>>>>>>>>>>>> So it is a per key deduplication job. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Yes, I would wait and check the actual output in s3 because it >>>>>>>>>>>>> is the main result of the job and >>>>>>>>>>>>> >>>>>>>>>>>>> > The late data around the time of taking savepoint might be >>>>>>>>>>>>> not included into the savepoint but it should be behind the >>>>>>>>>>>>> snapshotted >>>>>>>>>>>>> offset in Kafka. >>>>>>>>>>>>> >>>>>>>>>>>>> is not a bug, it is a possible behaviour. >>>>>>>>>>>>> >>>>>>>>>>>>> The savepoint is a snapshot of the data in transient which is >>>>>>>>>>>>> already consumed from Kafka. >>>>>>>>>>>>> Basically the full contents of the window result is split >>>>>>>>>>>>> between the savepoint and what can come after the savepoint'ed >>>>>>>>>>>>> offset in >>>>>>>>>>>>> Kafka but before the window result is written into s3. >>>>>>>>>>>>> >>>>>>>>>>>>> Allowed lateness should not affect it, I am just saying that >>>>>>>>>>>>> the final result in s3 should include all records after it. >>>>>>>>>>>>> This is what should be guaranteed but not the contents of the >>>>>>>>>>>>> intermediate savepoint. >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> Andrey >>>>>>>>>>>>> >>>>>>>>>>>>> On 24 Aug 2018, at 16:52, Juho Autio <juho.au...@rovio.com> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for your answer! >>>>>>>>>>>>> >>>>>>>>>>>>> I check for the missed data from the final output on s3. So I >>>>>>>>>>>>> wait until the next day, then run the same thing re-implemented >>>>>>>>>>>>> in batch, >>>>>>>>>>>>> and compare the output. >>>>>>>>>>>>> >>>>>>>>>>>>> > The late data around the time of taking savepoint might be >>>>>>>>>>>>> not included into the savepoint but it should be behind the >>>>>>>>>>>>> snapshotted >>>>>>>>>>>>> offset in Kafka. >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, I would definitely expect that. It seems like there's a >>>>>>>>>>>>> bug somewhere. >>>>>>>>>>>>> >>>>>>>>>>>>> > Then it should just come later after the restore and should >>>>>>>>>>>>> be reduced within the allowed lateness into the final result >>>>>>>>>>>>> which is saved >>>>>>>>>>>>> into s3. >>>>>>>>>>>>> >>>>>>>>>>>>> Well, as far as I know, allowed lateness doesn't play any role >>>>>>>>>>>>> here, because I started running the job with allowedLateness=0, >>>>>>>>>>>>> and still >>>>>>>>>>>>> get the data loss, while my late data output doesn't receive >>>>>>>>>>>>> anything. >>>>>>>>>>>>> >>>>>>>>>>>>> > Also, is this `DistinctFunction.reduce` just an example or >>>>>>>>>>>>> the actual implementation, basically saving just one of records >>>>>>>>>>>>> inside the >>>>>>>>>>>>> 24h window in s3? then what is missing there? >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, it's the actual implementation. Note that there's a keyBy >>>>>>>>>>>>> before the DistinctFunction. So there's one record for each key >>>>>>>>>>>>> (which is >>>>>>>>>>>>> the combination of a couple of fields). In practice I've seen >>>>>>>>>>>>> that we're >>>>>>>>>>>>> missing ~2000-4000 elements on each restore, and the total output >>>>>>>>>>>>> is >>>>>>>>>>>>> obviously much more than that. >>>>>>>>>>>>> >>>>>>>>>>>>> Here's the full code for the key selector: >>>>>>>>>>>>> >>>>>>>>>>>>> public class MapKeySelector implements >>>>>>>>>>>>> KeySelector<Map<String,String>, Object> { >>>>>>>>>>>>> >>>>>>>>>>>>> private final String[] fields; >>>>>>>>>>>>> >>>>>>>>>>>>> public MapKeySelector(String... 
fields) { >>>>>>>>>>>>> this.fields = fields; >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> @Override >>>>>>>>>>>>> public Object getKey(Map<String, String> event) throws >>>>>>>>>>>>> Exception { >>>>>>>>>>>>> Tuple key = >>>>>>>>>>>>> Tuple.getTupleClass(fields.length).newInstance(); >>>>>>>>>>>>> for (int i = 0; i < fields.length; i++) { >>>>>>>>>>>>> key.setField(event.getOrDefault(fields[i], ""), i); >>>>>>>>>>>>> } >>>>>>>>>>>>> return key; >>>>>>>>>>>>> } >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> And a more exact example on how it's used: >>>>>>>>>>>>> >>>>>>>>>>>>> .keyBy(new MapKeySelector("ID", "PLAYER_ID", >>>>>>>>>>>>> "FIELD", "KEY_NAME", "KEY_VALUE")) >>>>>>>>>>>>> .timeWindow(Time.days(1)) >>>>>>>>>>>>> .reduce(new DistinctFunction()) >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Aug 24, 2018 at 5:26 PM Andrey Zagrebin < >>>>>>>>>>>>> and...@data-artisans.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Juho, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Where exactly does the data miss? When do you notice that? >>>>>>>>>>>>>> Do you check it: >>>>>>>>>>>>>> - debugging `DistinctFunction.reduce` right after resume in >>>>>>>>>>>>>> the middle of the day >>>>>>>>>>>>>> or >>>>>>>>>>>>>> - some distinct records miss in the final output >>>>>>>>>>>>>> of BucketingSink in s3 after window result is actually triggered >>>>>>>>>>>>>> and saved >>>>>>>>>>>>>> into s3 at the end of the day? is this the main output? >>>>>>>>>>>>>> >>>>>>>>>>>>>> The late data around the time of taking savepoint might be >>>>>>>>>>>>>> not included into the savepoint but it should be behind the >>>>>>>>>>>>>> snapshotted >>>>>>>>>>>>>> offset in Kafka. Then it should just come later after the >>>>>>>>>>>>>> restore and >>>>>>>>>>>>>> should be reduced within the allowed lateness into the final >>>>>>>>>>>>>> result which >>>>>>>>>>>>>> is saved into s3. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Also, is this `DistinctFunction.reduce` just an example or >>>>>>>>>>>>>> the actual implementation, basically saving just one of records >>>>>>>>>>>>>> inside the >>>>>>>>>>>>>> 24h window in s3? then what is missing there? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> Andrey >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 23 Aug 2018, at 15:42, Juho Autio <juho.au...@rovio.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> I changed to allowedLateness=0, no change, still missing data >>>>>>>>>>>>>> when restoring from savepoint. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Aug 21, 2018 at 10:43 AM Juho Autio < >>>>>>>>>>>>>> juho.au...@rovio.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I realized that BucketingSink must not play any role in this >>>>>>>>>>>>>>> problem. This is because only when the 24-hour window triggers, >>>>>>>>>>>>>>> BucketinSink gets a burst of input. Around the state restoring >>>>>>>>>>>>>>> point >>>>>>>>>>>>>>> (middle of the day) it doesn't get any input, so it can't lose >>>>>>>>>>>>>>> anything >>>>>>>>>>>>>>> either (right?). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I will next try removing the allowedLateness entirely from >>>>>>>>>>>>>>> the equation. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In the meanwhile, please let me know if you have any >>>>>>>>>>>>>>> suggestions for debugging the lost data, for example what logs >>>>>>>>>>>>>>> to enable. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We use FlinkKafkaConsumer010 btw. Are there any known issues >>>>>>>>>>>>>>> with that, that could contribute to lost data when restoring a >>>>>>>>>>>>>>> savepoint? 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Aug 17, 2018 at 4:23 PM Juho Autio < >>>>>>>>>>>>>>> juho.au...@rovio.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Some data is silently lost on my Flink stream job when >>>>>>>>>>>>>>>> state is restored from a savepoint. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Do you have any debugging hints to find out where exactly >>>>>>>>>>>>>>>> the data gets dropped? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> My job gathers distinct values using a 24-hour window. It >>>>>>>>>>>>>>>> doesn't have any custom state management. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> When I cancel the job with savepoint and restore from that >>>>>>>>>>>>>>>> savepoint, some data is missed. It seems to be losing just a >>>>>>>>>>>>>>>> small amount >>>>>>>>>>>>>>>> of data. The event time of lost data is probably around the >>>>>>>>>>>>>>>> time of >>>>>>>>>>>>>>>> savepoint. In other words the rest of the time window is not >>>>>>>>>>>>>>>> entirely >>>>>>>>>>>>>>>> missed – collection works correctly also for (most of the) >>>>>>>>>>>>>>>> events that come >>>>>>>>>>>>>>>> in after restoring. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> When the job processes a full 24-hour window without >>>>>>>>>>>>>>>> interruptions it doesn't miss anything. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Usually the problem doesn't happen in test environments >>>>>>>>>>>>>>>> that have smaller parallelism and smaller data volumes. But in >>>>>>>>>>>>>>>> production >>>>>>>>>>>>>>>> volumes the job seems to be consistently missing at least >>>>>>>>>>>>>>>> something on >>>>>>>>>>>>>>>> every restore. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This issue has consistently happened since the job was >>>>>>>>>>>>>>>> initially created. It was at first run on an older version of >>>>>>>>>>>>>>>> Flink >>>>>>>>>>>>>>>> 1.5-SNAPSHOT and it still happens on both Flink 1.5.2 & 1.6.0. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm wondering if this could be for example some >>>>>>>>>>>>>>>> synchronization issue between the kafka consumer offsets vs. >>>>>>>>>>>>>>>> what's been >>>>>>>>>>>>>>>> written by BucketingSink? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1. Job content, simplified >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> kafkaStream >>>>>>>>>>>>>>>> .flatMap(new ExtractFieldsFunction()) >>>>>>>>>>>>>>>> .keyBy(new MapKeySelector(1, 2, 3, 4)) >>>>>>>>>>>>>>>> .timeWindow(Time.days(1)) >>>>>>>>>>>>>>>> .allowedLateness(allowedLateness) >>>>>>>>>>>>>>>> .sideOutputLateData(lateDataTag) >>>>>>>>>>>>>>>> .reduce(new DistinctFunction()) >>>>>>>>>>>>>>>> .addSink(sink) >>>>>>>>>>>>>>>> // use a fixed number of output partitions >>>>>>>>>>>>>>>> .setParallelism(8)) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> /** >>>>>>>>>>>>>>>> * Usage: .keyBy("the", "distinct", "fields").reduce(new >>>>>>>>>>>>>>>> DistinctFunction()) >>>>>>>>>>>>>>>> */ >>>>>>>>>>>>>>>> public class DistinctFunction implements >>>>>>>>>>>>>>>> ReduceFunction<java.util.Map<String, String>> { >>>>>>>>>>>>>>>> @Override >>>>>>>>>>>>>>>> public Map<String, String> reduce(Map<String, String> >>>>>>>>>>>>>>>> value1, Map<String, String> value2) { >>>>>>>>>>>>>>>> return value1; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2. 
State configuration >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> boolean enableIncrementalCheckpointing = true; >>>>>>>>>>>>>>>> String statePath = "s3n://bucket/savepoints"; >>>>>>>>>>>>>>>> new RocksDBStateBackend(statePath, >>>>>>>>>>>>>>>> enableIncrementalCheckpointing); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Checkpointing Mode Exactly Once >>>>>>>>>>>>>>>> Interval 1m 0s >>>>>>>>>>>>>>>> Timeout 10m 0s >>>>>>>>>>>>>>>> Minimum Pause Between Checkpoints 1m 0s >>>>>>>>>>>>>>>> Maximum Concurrent Checkpoints 1 >>>>>>>>>>>>>>>> Persist Checkpoints Externally Enabled (retain on >>>>>>>>>>>>>>>> cancellation) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 3. BucketingSink configuration >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We use BucketingSink, I don't think there's anything >>>>>>>>>>>>>>>> special here, if not the fact that we're writing to S3. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> String outputPath = "s3://bucket/output"; >>>>>>>>>>>>>>>> BucketingSink<Map<String, String>> sink = new >>>>>>>>>>>>>>>> BucketingSink<Map<String, String>>(outputPath) >>>>>>>>>>>>>>>> .setBucketer(new ProcessdateBucketer()) >>>>>>>>>>>>>>>> .setBatchSize(batchSize) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> .setInactiveBucketThreshold(inactiveBucketThreshold) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> .setInactiveBucketCheckInterval(inactiveBucketCheckInterval); >>>>>>>>>>>>>>>> sink.setWriter(new IdJsonWriter()); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 4. Kafka & event time >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> My flink job reads the data from Kafka, using a >>>>>>>>>>>>>>>> BoundedOutOfOrdernessTimestampExtractor on the kafka consumer >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> synchronize watermarks accross all kafka partitions. We also >>>>>>>>>>>>>>>> write late >>>>>>>>>>>>>>>> data to side output, but nothing is written there – if it >>>>>>>>>>>>>>>> would, it could >>>>>>>>>>>>>>>> explain missed data in the main output (I'm also sure that our >>>>>>>>>>>>>>>> late data >>>>>>>>>>>>>>>> writing works, because we previously had some actual late data >>>>>>>>>>>>>>>> which ended >>>>>>>>>>>>>>>> up there). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 5. allowedLateness >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It may be or may not be relevant that I have also enabled >>>>>>>>>>>>>>>> allowedLateness with 1 minute lateness on the 24-hour window: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If that makes sense, I could try removing allowedLateness >>>>>>>>>>>>>>>> entirely? That would be just to rule out that Flink doesn't >>>>>>>>>>>>>>>> have a bug >>>>>>>>>>>>>>>> that's related to restoring state in combination with the >>>>>>>>>>>>>>>> allowedLateness >>>>>>>>>>>>>>>> feature. After all, all of our data should be in a good enough >>>>>>>>>>>>>>>> order to not >>>>>>>>>>>>>>>> be late, given the max out of orderness used on kafka consumer >>>>>>>>>>>>>>>> timestamp >>>>>>>>>>>>>>>> extractor. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you in advance! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>> >>>> >>> >>> >> >