Hi Robert,
Uncaught exceptions that cause the job to fall into a fail-and-restart loop
are analogous to the corrupt-record case I mentioned.
With exactly-once guarantees, the job will roll back to the last complete
checkpoint, which "resets" the Flink consumer to some earlier Kafka
partition offset.
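The rewind behaviour can be sketched with a small self-contained simulation (plain Java, not the Flink API; the partition and offset values are invented):

```java
import java.util.HashMap;
import java.util.Map;

public class CheckpointRewind {
    public static void main(String[] args) {
        // partition -> offset currently being read
        Map<Integer, Long> current = new HashMap<>();
        // partition -> offset stored in the last complete checkpoint
        Map<Integer, Long> checkpoint = new HashMap<>();

        current.put(0, 120L);
        checkpoint.put(0, 100L);

        // On failure, the consumer position is reset to the checkpointed
        // offset, so records 100..119 are re-read: state is exactly-once,
        // but side effects downstream may observe those records twice.
        current.putAll(checkpoint);
        System.out.println(current.get(0)); // prints 100
    }
}
```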
Hi,
We are seeing this exception in one of our jobs whenever a checkpoint or
savepoint is performed.
java.lang.RuntimeException: Error while adding data to RocksDB
at
org.apache.flink.contrib.streaming.state.RocksDBListState.add(RocksDBListState.java:119)
at
org.apache.flink.runtime.state.Us
Normally it should return 0 ms when there is no latency, not NaN. My real
data size is 1 KB, but for now I'm using 200 bytes; I will try it with the
real size later.
For the data generator, it is an infinite for loop.
Thanks.
2017-11-22 18:11 GMT+01:00 Timo Walther :
> At a first glance I would s
I usually refer to this:
https://github.com/FelixNeutatz/parquet-flinktacular
On 22 Nov 2017 18:29, "Fabian Hueske" wrote:
> Hi Ebru,
>
> AvroParquetOutputFormat seems to implement Hadoop's OutputFormat interface.
> Flink provides a wrapper for Hadoop's OutputFormat [1], so you can try to
> wra
Hi Ebru,
AvroParquetOutputFormat seems to implement Hadoop's OutputFormat interface.
Flink provides a wrapper for Hadoop's OutputFormat [1], so you can try to
wrap AvroParquetOutputFormat in Flink's HadoopOutputFormat.
Hope this helps,
Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-r
@Patrick: Do you have any advice?
Am 11/22/17 um 5:52 PM schrieb domi...@dbruhn.de:
Hey everyone,
I'm trying since hours to get Flink 1.3.2 (downloaded for hadoop 2.7)
to snapshot/checkpoint to an S3 bucket which is hosted in the
eu-central-1 region. Everything works fine for other regions. I'
At a first glance I would say that your data size is very small. Flink
is able to process millions of records on a single machine. It might be
that the records are produced too quickly to be useful for latency measurement.
Is your data generator never-ending?
Am 11/22/17 um 4:13 PM schrieb Ladhari
Hey everyone,
I'm trying since hours to get Flink 1.3.2 (downloaded for hadoop 2.7) to
snapshot/checkpoint to an S3 bucket which is hosted in the eu-central-1
region. Everything works fine for other regions. I'm running my job on a
JobTracker in local mode. I googled the internet and found seve
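For what it's worth, eu-central-1 only accepts AWS Signature Version 4, which the s3 clients bundled with Hadoop 2.7 do not use unless the region endpoint is set explicitly; a commonly suggested fix (an assumption for this particular setup) is to point s3a at the regional endpoint in core-site.xml:

```xml
<!-- core-site.xml: assumes the s3a filesystem from Hadoop 2.7 is in use -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.eu-central-1.amazonaws.com</value>
</property>
```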
Jared has a good point; what is mvn dependency:tree showing?
On Wed, Nov 22, 2017 at 7:54 AM, Jared Stehler <
jared.steh...@intellifylearning.com> wrote:
> Protobuf is notorious for throwing things like “class not found” when
> built and run with different versions of the library; I believe flink
Thanks Timo for your answer.
I have tried setLatencyTrackingInterval(1000) but I got the same
result (latency: NaN).
My Flink job implements a geofencing pattern:
- [Latitude, Longitude] < IN | OUT > Location ? Send Notification : None
In my stress test I'm using data that always send no
Hi Shailesh,
your JobManager log suggests that this same JVM instance actually contains a
TaskManager as well (sorry for not noticing earlier). Also, this time there is
nothing regarding the BlobServer/BlobCache, but it looks like the TaskManager
may think the JobManager is down.
Can you try wi
Protobuf is notorious for throwing things like “class not found” when built
and run with different versions of the library; I believe Flink is using
protobuf 2.5.0 and you mentioned using 2.6.1, which could well be the
cause of this issue.
--
Jared Stehler
Chief Architect - Intellify Lea
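The version check Jared and Gordon suggest can be run with Maven's dependency plugin (the artifact filter below is an assumption; adjust it to your project):

```shell
# Show which protobuf version actually ends up on the classpath
mvn dependency:tree -Dincludes=com.google.protobuf:protobuf-java

# -Dverbose also shows versions Maven omitted during conflict resolution,
# which reveals transitive dependencies pulling in a different protobuf
mvn dependency:tree -Dverbose -Dincludes=com.google.protobuf:protobuf-java
```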
Hello all,
We are trying to write a DataSet in Parquet format. We use
AvroParquetOutputFormat, but it is not compatible with Flink’s FileOutputFormat.
Is there a way to write a DataSet as Parquet?
-Ebru
Thanks Gordon
But what if there is an uncaught exception in processing of the record (during
normal job execution, after deserialization)?
After the restart strategy exceeds the failure rate, the job will fail and on
re-run it would start at the same offset, right?
Is there a way to avoid this an
Hi,
the sampling functions are exposed in
org.apache.flink.api.java.utils.DataSetUtils. So you can basically
create something like:
final HadoopInputFormat inputFormat =
HadoopInputs.readHadoopFile(new TextInputFormat(), LongWritable.class,
Text.class, hdfsPath);
final DataSet> input
Hi Vishal,
shouldn't it be possible to configure a proxy user via core-site.xml?
Flink is also using this XML for HDFS.
You can also set the configuration files manually, see
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#hdfs
Regards,
Timo
Am 11/21/17 um 3:
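A proxy-user entry in core-site.xml typically looks like this (the user name "flink" and the wildcards are placeholders; restrict them as your security setup requires):

```xml
<!-- core-site.xml: allow user "flink" (placeholder) to impersonate others -->
<property>
  <name>hadoop.proxyuser.flink.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.flink.groups</name>
  <value>*</value>
</property>
```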
Hi Sadok,
it would be helpful if you could tell us a bit more about your job. E.g.
a skewed key distribution, where keys are only sent to one third of your
operators, cannot use your CPUs' full capabilities.
The latency tracking interval is in milliseconds. Can you try if 1000
would fix your p
Hi!
The FlinkKafkaConsumer can handle watermark advancement with
per-Kafka-partition awareness (across partitions of different topics).
You can see an example of how to do that here [1].
Basically what this does is that it generates watermarks within the Kafka
consumer individually for each Kafka
Hi Robert,
As expected with exactly-once guarantees, a record that caused a Flink job
to fail will be reprocessed when the job restarts.
For some specific "corrupt" record that causes the job to fall into a
fail-and-restart loop, there is a way to let the Kafka consumer skip t
Hi,
When you join multiple streams with different watermarks,
the resulting stream's watermark will be the smallest of the input
watermarks,
as long as you don't explicitly assign a new watermark generator.
In your example, if small_topic has watermark at time t1, big_topic has
watermark at
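The min rule can be illustrated with a tiny self-contained sketch (plain Java, not the Flink API; the watermark values are invented):

```java
import java.util.Arrays;

public class MinWatermark {
    // An operator with several inputs can only advance its event-time
    // clock as far as its slowest input: the combined watermark is the
    // minimum of the input watermarks.
    static long combinedWatermark(long... inputWatermarks) {
        return Arrays.stream(inputWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long small = 1_000L; // hypothetical watermark of small_topic
        long big   = 5_000L; // hypothetical watermark of big_topic
        System.out.println(combinedWatermark(small, big)); // prints 1000
    }
}
```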
Sorry guys,
in the previous message, when I talked about the TaskManagers' performance,
I meant *JobManager* performance.
Francisco Gonzalez wrote
> Hi guys,
>
> After investigating a bit more about this topic, we found a solution
> adding
> a small change in the Flink-1.3.2 source code.
>
> W
Hi guys,
After investigating a bit more about this topic, we found a solution by
adding a small change to the Flink 1.3.2 source code.
We found that the issue occurred when different threads tried to build the
Tuple2 object at the same time (because they use the
static ExecutionEnvironment variable
Hi,
Yes, if I remember correctly, this was changed in 1.2 to always include the
user-jar in the system classloader on YARN. With Flink 1.4 we are changing the
user-code classloader to load classes from the user-jar first (child-first
classloading) by default so a lot of the comments on avoiding
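In Flink 1.4 that default is controlled by a single configuration key, shown here as a flink-conf.yaml fragment (flipping it restores the pre-1.4 behaviour):

```yaml
# flink-conf.yaml (Flink 1.4+): child-first is the default;
# set to parent-first to restore the old parent-first resolution
classloader.resolve-order: child-first
```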
Hi Dominik,
the Web UI shows you the status of a checkpoint [0], so it might be
possible to retrieve the information via REST calls. Usually, you should
perform a savepoint for planned restarts. If a savepoint is successful
you can be sure to restart from it.
Otherwise the platform from data
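Triggering a savepoint and resuming from it is done via the flink CLI (the job ID, savepoint paths, and jar name below are placeholders):

```shell
# Trigger a savepoint for a running job; the CLI prints the savepoint path
flink savepoint <jobId> hdfs:///flink/savepoints

# Cancel the job with a savepoint in one step
flink cancel -s hdfs:///flink/savepoints <jobId>

# Resume a new run of the job from that savepoint
flink run -s hdfs:///flink/savepoints/savepoint-abc123 my-job.jar
```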
Hey,
we are running Flink 1.3.2 with streaming jobs and we are running into
issues when we are restarting a complete job (which can happen due to
various reasons: upgrading of the job, restarting of the cluster,
failures). The problem is that there is no automated way to find out
from which ch
I would like to understand how FlinkKafkaConsumer treats "unbalanced"
topics.
We're using FlinkKafkaConsumer010 with 2 topics, say "small_topic" &
"big_topic".
After restoring from an old savepoint (4 hours before), I checked the
consumer offsets on Kafka (Flink commits offsets to kafka for refer
But wouldn't a failed dependency show another ClassNotFoundException?
On Tuesday, 21 November 2017 20:31:58 CET Gordon Weakliem wrote:
> Isn't one cause for ClassNotFoundException that the class can't load due to
> failed dependencies or a failure in a static constructor?
>
> If jar -tf target/pr
Hi All,
I want to stress-test my Flink app implementation: event
generation with a ParallelSourceFunction, then measuring the latency,
throughput, CPU & memory leaks ...
But when testing, I noticed that:
- the maximum CPU usage is 30-33%
- latency is always NaNd NaNh in the dashb