Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-09 Thread Aljoscha Krettek
Hi Maximilian, I’m currently running some tests again on a cluster to try and pinpoint the problem. Just to make sure, you are using Hadoop 2.4.1 with Yarn and Kafka 0.8, correct? In the meantime, could you maybe run a test where you completely bypass Kafka, just so we can see whether the probl

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-08 Thread Maximilian Bode
Hi Aljoscha, yeah I should have been clearer. I did mean those accumulators but am not trusting them in the sense of total number (as you said, they are reset on failure). On the other hand, if they do not change for a while it is pretty obvious that the job has ingested everything in the queue

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-08 Thread Aljoscha Krettek
Hi, with accumulator you mean the ones you get from RuntimeContext.addAccumulator/getAccumulator? I’m afraid these are not fault-tolerant which means that the count in these probably doesn’t reflect the actual number of elements that were processed. When a job fails and restarts the accumulator

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-08 Thread Maximilian Bode
Hi, thanks for the fast answer. Answers inline. > Am 08.03.2016 um 13:31 schrieb Aljoscha Krettek : > > Hi, > a missing part file for one of the parallel sinks is not necessarily a > problem. This can happen if that parallel instance of the sink never received > data after the job successfully

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-08 Thread Aljoscha Krettek
Hi, a missing part file for one of the parallel sinks is not necessarily a problem. This can happen if that parallel instance of the sink never received data after the job successfully restarted. Missing data, however, is a problem. Maybe I need some more information about your setup: - When

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-08 Thread Maximilian Bode
Hi Aljoscha, oh I see. I was under the impression this file was used internally and the output being completed at the end. Ok, so I extracted the relevant lines using for i in part-*; do head -c $(cat "_$i.valid-length" | strings) "$i" > "$i.final"; done which seems to do the trick. Unf

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-08 Thread Aljoscha Krettek
Hi, are you taking the “.valid-length” files into account. The problem with doing “exactly-once” with HDFS is that before Hadoop 2.7 it was not possible to truncate files. So the trick we’re using is to write the length up to which a file is valid if we would normally need to truncate it. (If th

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-07 Thread Maximilian Bode
Hi Aljoscha, thank you very much, I will try if this fixes the problem and get back to you. I am using 1.0.0 as of today :) Cheers, Max — Maximilian Bode * Junior Consultant * maximilian.b...@tngtech.com TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring Geschäftsführer: Henrik Kl

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-07 Thread Aljoscha Krettek
Hi Maximilian, sorry for the delay, we where very busy with the release last week. I had a hunch about the problem but I think I found a fix now. The problem is in snapshot restore. When restoring, the sink tries to clean up any files that where previously in progress. If Flink restores to the s

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-07 Thread Maximilian Bode
Hi Aljoscha, did you by any chance get around to looking at the problem again? It seems to us that at the time when restoreState() is called, the files are already in final status and there are additional .valid-length files (hence there are neither pending nor in-progress files). Furthermore,

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-03 Thread Maximilian Bode
Hi Aljoscha, thank you for the fast answer. The files in HDFS change as follows: -before task manager is killed: [user@host user]$ hdfs dfs -ls -R /hdfs/dir/outbound -rw-r--r-- 2 user hadoop2435461 2016-03-03 14:51 /hdfs/dir/outbound/_part-0-0.in-progress -rw-r--r-- 2 user hadoop2404

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-03 Thread Aljoscha Krettek
Hi, did you check whether there are any files at your specified HDFS output location? If yes, which files are there? Cheers, Aljoscha > On 03 Mar 2016, at 14:29, Maximilian Bode wrote: > > Just for the sake of completeness: this also happens when killing a task > manager and is therefore proba

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-03 Thread Maximilian Bode
Just for the sake of completeness: this also happens when killing a task manager and is therefore probably unrelated to job manager HA. > Am 03.03.2016 um 14:17 schrieb Maximilian Bode : > > Hi everyone, > > unfortunately, I am running into another problem trying to establish exactly > once gu

Jobmanager HA with Rolling Sink in HDFS

2016-03-03 Thread Maximilian Bode
Hi everyone, unfortunately, I am running into another problem trying to establish exactly once guarantees (Kafka -> Flink 1.0.0-rc3 -> HDFS). When using RollingSink> sink = new RollingSink>("hdfs://our.machine.com:8020/hdfs/dir/outbound"); sink.setBucketer(new NonRollingBucketer()); output.add