Hi,
Flink does not support auto-scaling yet. Rescaling operations currently
are always manual, i.e., take a savepoint of the Flink job and, when
restoring from the savepoint, define a new parallelism for the job.
As for the metrics to be used for auto-scaling, I can imagine that it would
be possible
Hi,
Another detail not that apparent in the description is that the assignment
would only be evenly distributed assuming that the open Kinesis shards have
consecutive shard ids, and are of the same Kinesis stream.
Once you reshard a Kinesis stream, it could be that the shard ids are no
longer consecutive.
Hi,
This is most likely an exception that indicates either that 1) you are
using mismatching versions of Flink in your application code and the
installed Flink cluster, or 2) your application code isn't properly
packaged.
From your exception, I'm guessing it is the latter case. If so, I would
suggest
Hi Jayant,
What is the Kryo exception message that you are getting?
Cheers,
Gordon
On 25 October 2018 at 5:17:13 PM, Jayant Ameta (wittyam...@gmail.com) wrote:
Hi,
I've not configured any serializer in the descriptor. (Neither in flink job,
nor in state query client).
Which serializer should
Hi Jose,
As far as I know, you should be able to use keyed state on a stream returned by
the DataStreamUtils.reinterpretAsKeyedStream function. That shouldn’t be the issue
here.
Have you looked into the logs for any meaningful exceptions explaining why the
restore failed?
That would be helpful here to understand
Hi!
How are you packaging your Flink program? This looks like a simple dependency
error.
If you don’t know where to start when beginning to write your Flink program,
the quickstart Maven templates are always a good place to begin with [1].
Cheers,
Gordon
[1]
https://ci.apache.org/projects/fli
Hi,
I’m forwarding this question to Stefan (cc’ed).
He would most likely be able to answer your question, as he has done
substantial work in the RocksDB state backends.
Cheers,
Gordon
On 24 October 2018 at 8:47:24 PM, chandan prakash (chandanbaran...@gmail.com)
wrote:
Hi,
I am new to Flink.
Hi,
Could you point to the AWS Kinesis Java API that exposes record headers?
As far as I can tell from the Javadoc, I can’t seem to find how to retrieve
headers from Kinesis records.
If there is a way to do that, then it might make sense to expose that from the
Kinesis connector’s serialization
Hi,
Since Flink 1.5, you should be able to set all available configurations on
the ClientConfiguration through the consumer Properties (see FLINK-9188
[1]).
The way to do that would be to prefix the configuration you want to set
with "aws.clientconfig" and add that to the consumer properties.
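For illustration, a rough sketch of what that could look like (the
`socketTimeout` field name is just one example of a ClientConfiguration field;
the stream name and region are placeholders):

```
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;

Properties consumerConfig = new Properties();
consumerConfig.setProperty("aws.region", "us-east-1");
// Any ClientConfiguration field should be settable via the "aws.clientconfig"
// prefix; "socketTimeout" here is just an illustrative field name.
consumerConfig.setProperty("aws.clientconfig.socketTimeout", "5000");

FlinkKinesisConsumer<String> consumer =
    new FlinkKinesisConsumer<>("my-stream", new SimpleStringSchema(), consumerConfig);
```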
Hi Julio,
As Renjie had already mentioned, to achieve exactly-once semantics with the
Kafka consumer, Flink needs to have control over the Kafka partition to source
subtask assignment.
To add a bit more detail here, this is due to the fact that each subtask writes
to Flink managed state the cu
Hi,
The case you described looks a lot like this issue with the Flink Kafka
Consumer in 1.3.0 / 1.3.1:
https://issues.apache.org/jira/browse/FLINK-7143
If this is the case, you would have to upgrade to 1.3.2 or above to
overcome this.
The issue ticket description contains some things to keep in mind
Hi Miki,
The latest stable version of the Elasticsearch connector, as of Flink 1.5.x, is
Elasticsearch 5.x.
As for Elasticsearch 6.x, there are some PRs that have been open for a
while and have already been discussed quite thoroughly [1] [2].
Till and I have talked about merging these for
Hi,
It should be possible to deploy a single Flink cluster across
geo-distributed nodes, but Flink currently offers no optimization for such
a specific use case.
AFAIK, the general pattern for dealing with geographically distributed data
sources right now, would be to replicate data across cluster
Hi Giriraj,
The fact that the Flink Kafka Consumer doesn't use the group.id property
is expected behavior.
Since the partition-to-subtask assignment of the Flink Kafka Consumer needs
to be deterministic in Flink, the consumer uses static assignment instead
of the more high-level consumer group
Hi,
The tests in Flink use an `AbstractStreamOperatorTestHarness` that allows
wrapping an operator, feeding input elements into the operator, getting the
outputs, and also snapshotting / restoring operator state.
I’m not sure of your specific case, but in general that test harness utility
can be used to write such tests.
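As a rough sketch of how a harness-based test could look (these are internal
test utility classes, so exact constructors and return types vary between
Flink versions):

```
import java.util.concurrent.ConcurrentLinkedQueue;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.operators.StreamMap;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
import org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness;

// Wrap the function under test in an operator and drive it with the harness.
MapFunction<String, String> toUpper = String::toUpperCase;
OneInputStreamOperatorTestHarness<String, String> harness =
    new OneInputStreamOperatorTestHarness<>(new StreamMap<>(toUpper));

harness.open();
harness.processElement(new StreamRecord<>("hello", 42L));

// Inspect what the operator emitted ...
ConcurrentLinkedQueue<Object> output = harness.getOutput();

// ... and exercise a snapshot of the operator state.
harness.snapshot(0L, 0L);
harness.close();
```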
Hi,
This is a notice for users who use the Flink Kinesis connector to produce data
to AWS Kinesis Streams, with Flink versions 1.4.2 or below.
For Flink versions 1.4.2 and below, the KPL client version used by default in
the Kinesis connectors, KPL 0.12.5, is no longer supported by AWS Kinesis
)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
From: "Liu, Gavin (CAI - Atlanta)"
Date: Wednesday, June 20, 2018 at 12:11 PM
To: "Tzu-
Hi Vishal,
Kryo has a serializer called `CompatibleFieldSerializer` that allows for simple
backward compatibility changes, such as adding non-optional fields / removing
fields.
If using the KryoSerializer is a must, then a good thing to do is to register
Kryo's `CompatibleFieldSerializer` as
Hi Gavin,
The problem is that the Kinesis producer currently does not propagate
backpressure properly.
Records are added to the internally used KPL client’s queue, without any queue
size limit.
This is considered a bug, and already has a pull request for it [1], which we
should probably push t
Hi,
You can “skip” the corrupted message by returning `null` from the deserialize
method of the user-provided DeserializationSchema.
This lets the Kafka connector consider the record as processed and advance the
offset, without emitting anything downstream for it.
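A minimal sketch of such a schema (`MyEvent` and the parsing logic are
placeholders for your own types):

```
import java.io.IOException;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

public class SkipCorruptSchema implements DeserializationSchema<MyEvent> {

    @Override
    public MyEvent deserialize(byte[] message) throws IOException {
        try {
            return parseEvent(message);
        } catch (Exception e) {
            // Returning null tells the Kafka connector to treat the record as
            // processed (offset advances) without emitting anything downstream.
            return null;
        }
    }

    @Override
    public boolean isEndOfStream(MyEvent nextElement) {
        return false;
    }

    @Override
    public TypeInformation<MyEvent> getProducedType() {
        return TypeInformation.of(MyEvent.class);
    }

    private MyEvent parseEvent(byte[] bytes) {
        // ... your actual parsing logic; MyEvent is a placeholder type ...
        return new MyEvent(bytes);
    }
}
```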
Hope this helps!
Cheers,
Gordon
> :-)
>
> Alexey
>
> On Thu, Jun 14, 2018 at 12:20 PM Tzu-Li (Gordon) Tai
> wrote:
>
> > Hi,
> >
> > This could be related:
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-1-4-and-below-STOPS-writin
Hi,
This could be related:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-1-4-and-below-STOPS-writing-to-Kinesis-after-June-12th-td22687.html#a22701.
In short, the KPL library version used by default in the 1.4.x Kinesis
connector is no longer supported by AWS.
cords_lag_max" seems to be new (with the
attempt thingy).
On the "KafkaConsumer" front, but it only has the "commited_offset" for each
partition.
On Wed, Jun 13, 2018 at 5:41 AM, Tzu-Li (Gordon) Tai
wrote:
Hi,
Which Kafka version are you using?
AFAIK, the only recent
Hi,
Which Kafka version are you using?
AFAIK, the only recent changes to Kafka connector metrics in the 1.4.x series
would be FLINK-8419 [1].
The ‘records_lag_max’ metric is a Kafka-shipped metric simply forwarded from
the internally used Kafka client, so nothing should have been affected.
Do
Hi Jayant,
Yes, you don’t have to use an anonymous class for the sink function. An actual
separate class works just as well.
The class fields don’t need to be marked as transient or checkpointed, since
they should just be constants that come with instantiation of the sink
function, or could eve
Hi,
FYI, this is the JIRA ticket for the issue:
https://issues.apache.org/jira/browse/FLINK-8836
Yes, this seems to be only included in 1.5.0 (to be released), and 1.4.3 (there
has been no discussion on releasing that yet).
It could also be possible that the reported issue was caused by
https:
Hi,
Timo is correct - partition discovery is supported by the consumer only
starting from Flink 1.4.
The expected behaviour without partition discovery is that the list of
partitions picked up on the first execution of the job will be the list of
subscribed partitions across all executions.
Wh
Ah, correct, sorry for the incorrect link.
Thanks Ted!
On 7 May 2018 at 11:43:12 AM, Ted Yu (yuzhih...@gmail.com) wrote:
It seems the correct JIRA should be FLINK-9303
On Sun, May 6, 2018 at 8:29 PM, Tzu-Li (Gordon) Tai wrote:
Hi Edward,
Thanks for bringing this up, and I think your
Hi,
AFAIK, there is no built-in feature for scheduling / orchestrating
submission of Flink jobs.
However, you should be able to easily use tools like cron jobs to do that.
It should work by just taking a savepoint of your running job, then resuming
from that savepoint, and doing this periodically.
Cheers,
Hi Sampath,
Do you already have a target JIRA that you would like to work on?
Once you have one, let us know the JIRA issue ID and your JIRA account ID,
then we'll assign you contributor permissions. With that, you can pick up
unassigned JIRA issues to work on by yourself in the future.
Cheers,
Hi Edward,
Thanks for bringing this up, and I think your suggestion makes sense.
The problem is that the Kafka consumer has no notion of "closed" partitions
at the moment, so partitions statically assigned to the Kafka client are
never removed and are continuously requested for records.
For e
Hi,
Do you mean tests to verify that some metric is actually registered?
AFAIK, this is not really easy to do as a unit test.
One possible way is to have an integration test that uses a metrics reporter,
against which you can verify.
For example, the Kafka consumer integration tests that use
Hi Sebastien,
You need to add the dependency under a “dependencies” section, like so:
…
Then it should be working.
I would also recommend using the Flink quickstart Maven templates [1], as they
already have a well defined Maven project skeleton for Flink jobs.
Cheers,
Gordon
[1]
h
@Alexander
Sorry about that, that would be my mistake. I’ll close FLINK-9204 as a
duplicate and leave my thoughts on FLINK-9155. Thanks for pointing out!
On 19 April 2018 at 2:00:51 AM, Elias Levy (fearsome.lucid...@gmail.com) wrote:
Either proposal would work. In the latter case, at a minimum
Hi,
The partition-to-subtask assignment is not locality aware.
There were discussions about exposing functionality for custom user-defined
assignment methods; it might be possible to leverage that for a
locality-aware assignment.
Unfortunately, this feature is not implemented, yet.
The rela
Hi,
The docs here [1] provide some example snippets of using the Kafka connector
to consume from / write to Kafka topics.
Once you have consumed a `DataStream` from a Kafka topic using the Kafka
consumer, you can use Flink transformations such as map, flatMap, etc. to
perform processing on the records, as sketched below.
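For a quick picture, a hedged end-to-end sketch, assuming the 0.11 connector
and string records (topic names and addresses are placeholders):

```
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "example-group");

// Consume from an input topic ...
DataStream<String> stream =
    env.addSource(new FlinkKafkaConsumer011<>("input-topic", new SimpleStringSchema(), props));

// ... transform the records ...
DataStream<String> upperCased = stream.map(String::toUpperCase);

// ... and write the results back to another topic.
upperCased.addSink(
    new FlinkKafkaProducer011<>("localhost:9092", "output-topic", new SimpleStringSchema()));

env.execute("kafka-read-transform-write");
```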
Hi,
These are valid concerns. And yes, AFAIK users have been writing to logs within
the deserialization schema to track this. The connectors as of now have no
logging themselves in case of a skipped record.
I think we can implement both logging and metrics to track this, most of which
you have
Hi Ashish,
I don't really see why there are outputs in the out file for the program you
provided. Perhaps others could chime in here ..
As for your second question regarding window outputs:
Yes, subsequent window operators should definitely be doable in Flink.
This is just a matter of multiple tr
Hi,
How are you registering your event time timers in processElement?
If you are continuously registering them, and watermarks are correctly
generated upstream, then the onTimer method should be invoked properly.
For your 1-to-many case, I would assume that whenever a new key arrives
(that previ
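For reference, a minimal sketch of the continuous registration pattern
described above (`MyEvent` and its timestamp accessor are placeholder names):

```
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class TimerExample extends ProcessFunction<MyEvent, String> {

    @Override
    public void processElement(MyEvent value, Context ctx, Collector<String> out) {
        // Register an event time timer relative to the element's timestamp;
        // it fires once the upstream watermark passes that point.
        ctx.timerService().registerEventTimeTimer(value.getTimestamp() + 60_000L);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        // Invoked when a registered event time timer fires.
        out.collect("timer fired at " + timestamp);
    }
}
```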
Hi Max!
Before we jump into the custom ProcessFunction approach:
Have you also checked out using the RocksDB state backend, and whether or not
it is suitable for your use case?
For state that would not fit into memory, that is usually the go-to state
backend to use.
If you’re sure a custom Proc
Hi Julio,
I'm not really sure, but do you think it is possible that there could be
some hard data retention setting for your Kafka topics in the staging
environment?
As in, at some point in time and maybe periodically, all data in the Kafka
topics are dropped and therefore the consumers effectively
Hi Alberto,
Looking at the code, I think the current behavior is that all timers (both
processing time and event time) are re-registered on restore, and therefore
should be triggered automatically.
So, for processing time timers, on restore all timers that were supposed to be
fired while the job was down will fire immediately on restore.
partition to a topic that is being consumed.
>> Secor started consuming it as expected, but Flink didn't – or at least it
>> isn't reporting anything about doing so. The new partition is not shown in
>> Flink task metrics or consumer offsets committed by Flink.
>>
Hi Manuel,
Thanks a lot for reporting this!
Yes, this issue is most likely related to the recent changes to shading the
Elasticsearch connector dependencies, though it is a bit curious why I didn’t
bump into it before while testing it.
The Flink job runs Lucene queries on a data stream which e
Flink
dev? If yes, I could rebuild my 1.5-SNAPSHOT and try again.
On Thu, Mar 22, 2018 at 4:18 PM, Tzu-Li (Gordon) Tai
wrote:
Hi Juho,
Can you confirm that the new partition is consumed, but only that Flink’s
reported metrics do not include them?
If yes, then I think your observations can
Hi Juho,
Can you confirm that the new partition is consumed, but only that Flink’s
reported metrics do not include them?
If yes, then I think your observations can be explained by this issue:
https://issues.apache.org/jira/browse/FLINK-8419
This issue should have been fixed in the recently rele
Hi Gyula,
Are you using Flink 1.4.x, and have partition discovery enabled?
If yes, then both the state of previously existing topics, as well as
partitions of the newly specified topics will be consumed.
Cheers,
Gordon
On Tue, Mar 20, 2018 at 6:01 AM, Ankit Chaudhary wrote:
> Did you changed t
The Apache Flink community is very happy to announce the release of Apache
Flink 1.3.3, which is the third bugfix release for the Apache Flink 1.3 series.
Apache Flink® is an open-source stream processing framework for distributed,
high-performing, always-available, and accurate data streaming
The Apache Flink community is very happy to announce the release of Apache
Flink 1.4.2, which is the second bugfix release for the Apache Flink 1.4
series.
Apache Flink® is an open-source stream processing framework for distributed,
high-performing, always-available, and accurate data streaming applications.
Hi Ankit,
This is a known issue in 1.4.1. Please see
https://issues.apache.org/jira/browse/FLINK-8741.
The release for 1.4.2 will include a fix for this issue, and we already have a
release candidate being voted at the moment.
Hopefully, it will be released soon, probably early next week.
Cheers,
Hi Chengzhi,
Yes, generally speaking, you would launch a separated job to do the
backfilling, and then shut down the job after the backfilling is completed.
For this to work, you’ll also have to keep in mind that writes to the external
sink must be idempotent.
Are you using Kafka as the data source?
Hi Philip,
Yes, I also have the question that Fabian mentioned. Did you start observing
this only after upgrading to 1.4.0?
Could you let me know what exactly your deserialization schema is doing? I
don’t have any clues at the moment, but maybe there are hints there.
Also, you mentioned that th
Hi,
Are you using a `RichSinkFunction`? There you should have access to the runtime
context, which you can use to access keyed state.
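A rough sketch of what that could look like (assuming the sink is applied on a
keyed stream, e.g. `stream.keyBy(...).addSink(...)`; `CountingSink` is just an
illustrative example):

```
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class CountingSink<T> extends RichSinkFunction<T> {

    private transient ValueState<Long> countPerKey;

    @Override
    public void open(Configuration parameters) {
        // Keyed state is only accessible when the sink runs on a keyed stream.
        countPerKey = getRuntimeContext().getState(
            new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void invoke(T value) throws Exception {
        Long current = countPerKey.value();
        countPerKey.update(current == null ? 1L : current + 1);
    }
}
```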
Cheers,
Gordon
On 24 February 2018 at 3:04:55 PM, Kien Truong (duckientru...@gmail.com) wrote:
Hi,
It seems that I can't use managed keyed state inside s
Hi,
Good to see that you have it working! Yes, each of the Kafka version-specific
connectors also has a dependency on the base Kafka connector module.
Note that it is usually not recommended to put optional dependencies (such as
the connectors) under the lib folder.
To add additional dependenc
The Apache Flink community is very happy to announce the release of Apache
Flink 1.4.1, which is the first bugfix release for the Apache Flink 1.4 series.
Apache Flink® is an open-source stream processing framework for distributed,
high-performing, always-available, and accurate data streaming applications.
need a different implementation of
KafkaFetcher.runFetchLoop that has slightly different logic for changing
running to be false.
What would you recommend in this case?
From: Tzu-Li (Gordon) Tai [mailto:tzuli...@apache.org]
Sent: Thursday, February 08, 2018 12:24 PM
To: user
Hi Hayden,
Have you tried looking into `KeyedDeserializationSchema#isEndOfStream` [1]?
I think that could be what you are looking for. It signals the end of the
stream when consuming from Kafka.
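A hedged sketch of how that could be used (the `END_OF_STREAM` marker is an
assumed convention of your own data, not a Flink API):

```
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema;

public class BoundedStringSchema implements KeyedDeserializationSchema<String> {

    @Override
    public String deserialize(byte[] messageKey, byte[] message,
                              String topic, int partition, long offset) throws IOException {
        return new String(message, StandardCharsets.UTF_8);
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        // Once this returns true, the consumer subtask stops fetching.
        return "END_OF_STREAM".equals(nextElement);
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
}
```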
Cheers,
Gordon
On 8 February 2018 at 10:44:59 AM, Marchant, Hayden (hayden.march...@citi.com)
wrote
there was some ability to override the topic. Is there a feature
that allows me to do that?
If not do you think this would be a worthwhile addition?
Thanks again,
--
Christophe
On Mon, Feb 5, 2018 at 9:52 AM, Tzu-Li (Gordon) Tai wrote:
Hi Christophe,
You can set the parallelism of the
clearly no. You have to set your parallelism yourself and then it will round
robin between them.
Thanks again,
--
Christophe
On Mon, Feb 5, 2018 at 9:52 AM, Tzu-Li (Gordon) Tai wrote:
Hi Christophe,
You can set the parallelism of the FlinkKafkaConsumer independently of the
total numbe
Hi Christophe,
You can set the parallelism of the FlinkKafkaConsumer independently of the
total number of Kafka partitions (across all subscribed streams, including
newly created streams that match a subscribed pattern).
The consumer deterministically assigns each partition to a single consumer
long you don't rename this
Tuple2TypeInformation around everything will work.. but it feels very
suboptimal.
On 29 January 2018 at 12:33, Tzu-Li (Gordon) Tai wrote:
Hi,
In the Scala API, type serializers may be anonymous classes generated by Scala
macros, and would therefore contain a
Hi,
In the Scala API, type serializers may be anonymous classes generated by Scala
macros, and would therefore contain a reference to the wrapping class (i.e.,
your `Operators` class).
Since Flink currently serializes serializers into the savepoint to be used for
deserialization on restore, and
Hi Christophe,
Thanks a lot for the contribution! I’ll add reviewing the PR to my backlog.
I would like / will try to take a look at the PR by the end of this week, after
some 1.4.1 blockers which I’m still busy with.
Cheers,
Gordon
On 29 January 2018 at 9:25:27 AM, Fabian Hueske (fhue...@gmail
Hi Tarandeep,
The error you are seeing only occurs when, on startup, the consumer couldn’t
retrieve any partition information from Kafka.
Therefore, according to your description, there should be another error that
caused the previous execution of the job to fail. Could you check that? Maybe
On Mon, Jan 22, 2018 at 6:26 PM, Tzu-Li (Gordon) Tai
wrote:
Hi Philip,
Thanks a lot for reporting this, and looking into this in detail.
Your observation sounds accurate to me. The `endingSequenceNumber` would no
longer be null once a shard is closed, so on restore that would mislead the
Hi Philip,
Thanks a lot for reporting this, and looking into this in detail.
Your observation sounds accurate to me. The `endingSequenceNumber` would no
longer be null once a shard is closed, so on restore that would mislead the
consumer into thinking that it’s a new shard and start consuming it from
versions of
the connector jars. Once the base jar is in the mvn repository, this won't be
as problematic.
On Friday, January 12, 2018, 9:46:22 AM EST, Tzu-Li (Gordon) Tai
wrote:
Hi Jason,
The KeyedDeserializationSchema is located in the flink-connector-kafka-base
module, so you'll need
Weird huh? I can't see how I would've changed anything related to these when
making those minor code changes required in upgrading to 1.4.
Cheers,
Juho
On Fri, Jan 12, 2018 at 2:58 PM, Tzu-Li (Gordon) Tai
wrote:
Hi Juho,
Could your key type have possibly changed / been modified ac
Hi Jared,
I currently don't have a solid idea of what may be happening, but from the
stack dump you provided, it seems like the client connection you are using
in the Elasticsearch API call bridge is stuck, even after the cleanup.
Do you think there could be some issue with closing the client you
Hi Jason,
The KeyedDeserializationSchema is located in the flink-connector-kafka-base
module, so you'll need to include the jar for that too [1].
Cheers,
Gordon
[1]
https://repo1.maven.org/maven2/org/apache/flink/flink-connector-kafka-base_2.11/1.4.0/
Hi!
Do you mean that you want to count all elements across all partitions of a
DataStream?
To do that, you'll need to transform the DataStream with an operator of
parallelism 1, e.g.
DataStream<T> stream = ...;
stream.map(new CountingMap<>()).setParallelism(1);
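For completeness, a possible sketch of the `CountingMap` referenced above (kept
non-checkpointed for simplicity, so the count would reset on failure):

```
import org.apache.flink.api.common.functions.MapFunction;

// Emits a running count of all elements it sees; run it with parallelism 1
// so that it observes every element of the stream.
public class CountingMap<T> implements MapFunction<T, Long> {

    private long count = 0;

    @Override
    public Long map(T value) {
        return ++count;
    }
}
```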
Cheers,
Gordon
Hi Seth,
Thanks a lot for the report!
I think your observation is expected behaviour, if there really is a binary
incompatible change between Scala minor releases.
And yes, the type information macro in the Scala API is very sensitive to
the exact Scala version used. I had in the past also observ
Hi,
Externalized checkpoints [1] seem to be exactly what you are looking for.
Checkpoints are by default not persisted, unless configured otherwise to be
externalized so that they are not automatically cleaned up when the job
fails. They can be used to resume the job.
On the other hand, it woul
Hi Juho,
Could your key type have possibly changed / been modified across the
upgrade?
Also, from the error trace, it seems like the failing restore is of a window
operator. What window type are you using?
That exception is a result of either mismatching key serializers or
namespace serializers (
Hi Juho,
Now that I think of it this seems like a bug to me: why does the call to
getSideOutput succeed if it doesn't provide _any_ input?
With the way side outputs work, I don’t think this is possible (or would make
sense). An operator does not know whether or not it would ever emit some
elem
we can definitely look into that.
If no, is there a workaround to implement or customize AWS Utils?
Thank you
On Jan 11, 2018, at 6:41 PM, Tzu-Li (Gordon) Tai wrote:
Hi Sree,
Are Temporary Credentials automatically shipped with AWS EC2 instances when
delegated to the role?
If yes, you should
Hi Sree,
Are Temporary Credentials automatically shipped with AWS EC2 instances when
delegated to the role?
If yes, you should be able to just configure the properties so that the Kinesis
consumer automatically fetches credentials from the AWS instance.
To do that, simply do not provide the Access Key and Secret Key in the
configuration properties.
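A hedged sketch of such a configuration (assuming the `AUTO` credentials
provider option, which defers to the AWS default provider chain; stream name
and region are placeholders):

```
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;

Properties config = new Properties();
config.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1");
// "AUTO" uses the AWS default credentials provider chain, which picks up
// the EC2 instance profile credentials; no access keys are configured.
config.setProperty(AWSConfigConstants.AWS_CREDENTIALS_PROVIDER, "AUTO");

FlinkKinesisConsumer<String> consumer =
    new FlinkKinesisConsumer<>("my-stream", new SimpleStringSchema(), config);
```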
Hi Gyula,
Is your 1.3 savepoint from Flink 1.3.1 or 1.3.0?
In those versions, we had a critical bug that caused duplicate partition
assignments in corner cases, so the assignment logic was altered from 1.3.1
to 1.3.2 (and therefore also 1.4.0).
If you indeed were using 1.3.1 or 1.3.0, and you are
Hi Jaxon,
The threading model is implemented differently between the Kafka08Fetcher and
all other fetcher versions higher than 0.9+ because the Kafka Java clients used
between these versions have different abstraction levels.
The Kafka08Fetcher still uses the low-level `SimpleConsumer` API, whi
Hi Vishal,
AFAIK, intermittent restore failures from savepoints should not be expected.
Do you still have the logs from the failed restore attempts? What exceptions
were the restores failing on?
We would need to take a look at the logs to figure what may be going on.
Best,
Gordon
Hi Mans,
What's the difference between an operator and a function?
An operator in Flink needs to handle processing of watermarks, records, and
checkpointing of the operator state.
To implement one, you need to extend the AbstractStreamOperator base class.
It is considered a very low-level API
nized by the Gmail!
On Mon, Dec 18, 2017 at 10:14 PM, Tzu-Li (Gordon) Tai
wrote:
Hi Soheil,
It seems like you are trying to link optional Flink libraries that are not
shipped with the binary Flink distributions.
Have you taken a look at this doc [1]? It should contain sufficient
information for
Hi Soheil,
It seems like you are trying to link optional Flink libraries that are not
shipped with the binary Flink distributions.
Have you taken a look at this doc [1]? It should contain sufficient
information for your problem.
Cheers,
Gordon
[1]
http://apache-flink-user-mailing-list-archive.2
Hi Jayant,
Updating your job application / operator code at runtime is currently not
available in Flink.
It is, however, achievable by taking a savepoint of your job and then
restoring from the savepoint with your upgraded application.
There are a few points to keep in mind, especially regarding job state
compatibility
Hi,
lateness is record time or the real world time?
maxOutOfOrderness is record time or the real world time?
Both the allowed lateness of window operators and the maxOutOfOrderness of the
BoundedOutOfOrdernessTimestampExtractor refer to event time.
I.e.,
- given the end timestamp of a window is x (in event time), the window fires
once the watermark passes x, and its state is kept until the watermark passes
x plus the allowed lateness.
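To make this concrete, a small sketch where both are expressed in event time
(`MyEvent` and its accessors are placeholders):

```
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

// Given: DataStream<MyEvent> stream
DataStream<MyEvent> withTimestamps = stream.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) {
        @Override
        public long extractTimestamp(MyEvent element) {
            // Event time: the timestamp carried in the record itself.
            return element.getEventTimestamp();
        }
    });

withTimestamps
    .keyBy(event -> event.getKey())
    .timeWindow(Time.minutes(1))
    .allowedLateness(Time.seconds(30)) // also interpreted in event time
    .sum("value");
```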
/browse/FLINK-8270
On 15 December 2017 at 4:12:24 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org)
wrote:
Hi 杨光,
Thanks a lot for reporting and looking into this with such detail!
Your observations are correct: the changes from 1.3.2 to 1.4.0 in the
YarnTaskManagerRunner caused the local Keytab
Hi Seth,
Some clarifications to point out:
Quick follow up question. Is there some way to notify a TimestampAssigner that
is consuming from an idle source?
In Flink, idle sources would emit a special idleness marker event that notifies
downstream time-based operators to not wait for its watermarks
Hi Navneeth,
The exception you are getting is a Kafka NetworkException.
From the provided information I can’t really tell much and can only guess, but
are you sure that the client / broker versions match?
It seems like you are using 0.10; the default client version in the Flink
Kafka 0.10 connector
Hi,
I've just elevated FLINK-5479 to BLOCKER for 1.5.
Unfortunately, AFAIK there is no easy workaround solution for this issue
yet in the releases so far.
The min watermark logic that controls per-partition watermark emission is
hidden inside the consumer, making it hard to work around it.
One p
Hi Connie,
We do have a pull request for the feature, which should almost be ready
after rebasing: https://github.com/apache/flink/pull/3915, JIRA:
https://issues.apache.org/jira/browse/FLINK-6352.
This means, of course, that the feature isn't part of any release yet. We
can try to make sure this h
Hi,
I’ve created a PR to publicly expose the feature:
https://github.com/apache/flink/pull/5117.
Whether or not we should include this in the next release candidate for 1.4 is
still up for discussion.
Best,
Gordon
On 4 December 2017 at 3:02:29 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org
Hi Soheil,
That feature is actually already internally available.
The only issue is that the functionality is not yet exposed via any public APIs
on the Kafka consumer.
Please see this JIRA here: https://issues.apache.org/jira/browse/FLINK-8190.
I’m not sure of exposing the pattern-based subscri
Hi Mike,
The rationale behind implementing the FlinkFixedPartitioner as the default
is so that each Flink sink partition (i.e. one sink parallel subtask) maps
to a single Kafka partition.
One other thing to clarify:
By setting the partitioner to null, the partitioning is based on a hash of
the re
Hi Federico,
It seems like the state cannot be restored because the class of the state type
(i.e., Event) had been modified since the savepoint, and therefore has a
conflicting serialVersionUID with the one in the savepoint.
This can happen if Java serialization is used for some part of y
Hi Soheil,
AFAIK, there is no built-in byte array deserializer in Flink.
However, it is very simple to implement one.
You can do that by implementing the `DeserializationSchema` interface, and in
the implementation of the `deserialize` method, simply return the fetched bytes
from Kafka as the result, as sketched below.
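A minimal sketch of such a pass-through schema:

```
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.PrimitiveArrayTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;

public class ByteArraySchema implements DeserializationSchema<byte[]> {

    @Override
    public byte[] deserialize(byte[] message) {
        // Hand the fetched bytes through unchanged.
        return message;
    }

    @Override
    public boolean isEndOfStream(byte[] nextElement) {
        return false;
    }

    @Override
    public TypeInformation<byte[]> getProducedType() {
        return PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO;
    }
}
```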
Hi Robert,
Uncaught exceptions that cause the job to fall into a fail-and-restart loop
are similar to the corrupt record case I mentioned.
With exactly-once guarantees, the job will roll back to the last complete
checkpoint, which "resets" the Flink consumer to some earlier Kafka
partition offset
Hi!
The FlinkKafkaConsumer can handle watermark advancement with
per-Kafka-partition awareness (across partitions of different topics).
You can see an example of how to do that here [1].
Basically, what this does is generate watermarks within the Kafka
consumer individually for each Kafka partition.
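A hedged sketch of the pattern (assuming the 0.11 consumer; `MyEvent` /
`MyEventSchema` as well as `props` and `env` are placeholders):

```
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

FlinkKafkaConsumer011<MyEvent> consumer =
    new FlinkKafkaConsumer011<>("topic", new MyEventSchema(), props);

// Assigning the extractor on the consumer itself (instead of on the resulting
// DataStream) makes the source generate watermarks per Kafka partition.
consumer.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(5)) {
        @Override
        public long extractTimestamp(MyEvent element) {
            return element.getEventTimestamp();
        }
    });

DataStream<MyEvent> stream = env.addSource(consumer);
```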
Hi Robert,
As expected with exactly-once guarantees, a record that caused a Flink job
to fail will be attempted to be reprocessed on the restart of the job.
For some specific "corrupt" record that causes the job to fall into a
fail-and-restart loop, there is a way to let the Kafka consumer skip t
Hi Tony,
Thanks for the report. At first glance of the description, what you described
doesn’t seem to match the expected behavior.
I’ll spend some time later today to check this out.
Cheers,
Gordon
On 15 November 2017 at 5:08:34 PM, Tony Wei (tony19920...@gmail.com) wrote:
Hi Gordon,
When I
The `KafkaTopicPartitionAssigner.assign(partition, numParallelSubtasks)` method
returns the index of the target subtask for a given Kafka partition.
The implementation in that method ensures that the same subtask index will
always be returned for the same partition.
Each consumer subtask will locally evaluate this method to determine which
partitions it should subscribe to.
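Roughly, the logic looks like the following (a sketch from memory of the
1.4-era code base; consult the actual `KafkaTopicPartitionAssigner` for the
authoritative version):

```
// Deterministic partition-to-subtask assignment.
public static int assign(KafkaTopicPartition partition, int numParallelSubtasks) {
    // The start index is derived from the topic name, so that partitions of the
    // same topic are spread round-robin starting at a topic-specific subtask.
    int startIndex =
        ((partition.getTopic().hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
    // Same partition + same parallelism => always the same subtask index.
    return (startIndex + partition.getPartition()) % numParallelSubtasks;
}
```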