Hi Maximilian,
I’m currently running some tests again on a cluster to try and pinpoint the
problem. Just to make sure, you are using Hadoop 2.4.1 with Yarn and Kafka 0.8,
correct?
In the meantime, could you maybe run a test where you completely bypass Kafka,
just so we can see whether the probl
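A minimal sketch of such a Kafka-free test, assuming the Flink 1.0 Java API and leaving the sink part of the job as it is (the generated source, output path and job name below are placeholders, not taken from the original message):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.NonRollingBucketer;
import org.apache.flink.streaming.connectors.fs.RollingSink;

public class KafkaFreeRollingSinkTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5000); // keep checkpointing on, as in the real job

        // generated records stand in for the Kafka topic
        DataStream<String> input = env
            .generateSequence(0, 10_000_000)
            .map(new MapFunction<Long, String>() {
                @Override
                public String map(Long value) {
                    return "record-" + value;
                }
            });

        RollingSink<String> sink =
            new RollingSink<>("hdfs://our.machine.com:8020/hdfs/dir/outbound-test"); // placeholder path
        sink.setBucketer(new NonRollingBucketer());
        input.addSink(sink);

        env.execute("rolling-sink-without-kafka");
    }
}

If the missing data still shows up with a job like this, Kafka could be ruled out as the cause.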
Hi Aljoscha,
yeah, I should have been clearer. I did mean those accumulators, but I am not
trusting them as an exact total count (as you said, they are reset on failure).
On the other hand, if they do not change for a while, it is pretty obvious that
the job has ingested everything in the queue
Hi,
with accumulators, do you mean the ones you get from
RuntimeContext.addAccumulator/getAccumulator? I’m afraid these are not
fault-tolerant, which means that the count in them probably doesn’t reflect the
actual number of elements that were processed. When a job fails and restarts,
the accumulators are reset.
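For illustration, registering such an accumulator looks roughly like this (a sketch against the Flink 1.0 Java API; the class name and accumulator name are made up):

import org.apache.flink.api.common.accumulators.IntCounter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class CountingMapper extends RichMapFunction<String, String> {
    private final IntCounter recordCount = new IntCounter();

    @Override
    public void open(Configuration parameters) {
        // registered per task; the value is not checkpointed, so it resets
        // when the job fails and restarts
        getRuntimeContext().addAccumulator("records-processed", recordCount);
    }

    @Override
    public String map(String value) {
        recordCount.add(1);
        return value;
    }
}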
Hi,
thanks for the fast answer. Answers inline.
> On 08.03.2016 at 13:31, Aljoscha Krettek wrote:
>
> Hi,
> a missing part file for one of the parallel sinks is not necessarily a
> problem. This can happen if that parallel instance of the sink never received
> data after the job successfully restarted.
Hi,
a missing part file for one of the parallel sinks is not necessarily a problem.
This can happen if that parallel instance of the sink never received data after
the job successfully restarted.
Missing data, however, is a problem. Maybe I need some more information about
your setup:
- When
Hi Aljoscha,
oh, I see. I was under the impression this file was used internally and that
the output would be completed at the end. Ok, so I extracted the relevant lines
using
for i in part-*; do
  # 'strings' pulls the printable byte count out of the .valid-length file;
  # head -c then keeps only that many bytes of the part file
  head -c $(cat "_$i.valid-length" | strings) "$i" > "$i.final"
done
which seems to do the trick.
Unf
Hi,
are you taking the “.valid-length” files into account? The problem with doing
“exactly-once” with HDFS is that before Hadoop 2.7 it was not possible to
truncate files. So the trick we’re using is to write the length up to which a
file is valid if we would normally need to truncate it. (If th
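A rough Java equivalent of the shell one-liner quoted earlier on this page, i.e. reading the recorded length and then copying only the valid prefix of a part file (a sketch using Hadoop's FileSystem API; the cluster URI and file names are taken from the HDFS listing elsewhere in this thread and may need adjusting):

import java.io.ByteArrayOutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ValidLengthReader {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
            URI.create("hdfs://our.machine.com:8020"), new Configuration());
        Path part = new Path("/hdfs/dir/outbound/part-0-0");
        Path lengthFile = new Path(part.getParent(), "_" + part.getName() + ".valid-length");

        // the .valid-length file may contain non-printable framing bytes
        // (hence the 'strings' in the shell one-liner); keep digits only
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (FSDataInputStream in = fs.open(lengthFile)) {
            IOUtils.copyBytes(in, bytes, 4096, false);
        }
        long validLength = Long.parseLong(bytes.toString("UTF-8").replaceAll("[^0-9]", ""));

        // copy only the valid prefix of the part file
        byte[] buffer = new byte[4096];
        long remaining = validLength;
        try (FSDataInputStream in = fs.open(part);
             FSDataOutputStream out = fs.create(new Path(part + ".final"))) {
            while (remaining > 0) {
                int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read < 0) {
                    break;
                }
                out.write(buffer, 0, read);
                remaining -= read;
            }
        }
    }
}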
Hi Aljoscha,
thank you very much, I will check whether this fixes the problem and get back to you.
I am using 1.0.0 as of today :)
Cheers,
Max
—
Maximilian Bode * Junior Consultant * maximilian.b...@tngtech.com
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Kl
Hi Maximilian,
sorry for the delay, we were very busy with the release last week. I had a
hunch about the problem but I think I found a fix now. The problem is in
snapshot restore. When restoring, the sink tries to clean up any files that
were previously in progress. If Flink restores to the s
Hi Aljoscha,
did you by any chance get around to looking at the problem again? It seems to
us that at the time when restoreState() is called, the files are already in
final status and there are additional .valid-length files (hence there are
neither pending nor in-progress files). Furthermore,
Hi Aljoscha,
thank you for the fast answer. The files in HDFS change as follows:
- before the task manager is killed:
[user@host user]$ hdfs dfs -ls -R /hdfs/dir/outbound
-rw-r--r--   2 user hadoop    2435461 2016-03-03 14:51 /hdfs/dir/outbound/_part-0-0.in-progress
-rw-r--r--   2 user hadoop    2404
Hi,
did you check whether there are any files at your specified HDFS output
location? If yes, which files are there?
Cheers,
Aljoscha
> On 03 Mar 2016, at 14:29, Maximilian Bode wrote:
>
> Just for the sake of completeness: this also happens when killing a task
> manager and is therefore probably unrelated to job manager HA.
Just for the sake of completeness: this also happens when killing a task
manager and is therefore probably unrelated to job manager HA.
> On 03.03.2016 at 14:17, Maximilian Bode wrote:
>
> Hi everyone,
>
> unfortunately, I am running into another problem trying to establish exactly
> once guarantees (Kafka -> Flink 1.0.0-rc3 -> HDFS).
Hi everyone,
unfortunately, I am running into another problem trying to establish exactly
once guarantees (Kafka -> Flink 1.0.0-rc3 -> HDFS).
When using
RollingSink<...> sink =
    new RollingSink<...>("hdfs://our.machine.com:8020/hdfs/dir/outbound");
sink.setBucketer(new NonRollingBucketer());
output.add
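A complete setup along these lines might look roughly as follows (a sketch only: the element type, topic name, deserialization schema and Kafka connection properties are placeholders, as they are cut off above):

import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.NonRollingBucketer;
import org.apache.flink.streaming.connectors.fs.RollingSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaToHdfsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5000); // checkpointing is required for exactly-once

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("zookeeper.connect", "zookeeper:2181"); // placeholder
        kafkaProps.setProperty("bootstrap.servers", "kafka:9092");     // placeholder
        kafkaProps.setProperty("group.id", "flink-exactly-once-test"); // placeholder

        DataStream<String> output = env.addSource(
            new FlinkKafkaConsumer08<>("inbound-topic", new SimpleStringSchema(), kafkaProps));

        RollingSink<String> sink =
            new RollingSink<>("hdfs://our.machine.com:8020/hdfs/dir/outbound");
        sink.setBucketer(new NonRollingBucketer());
        output.addSink(sink);

        env.execute("kafka-to-hdfs");
    }
}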