Hey Caio,
Your analysis of the problem sounds right to me; I don't have a good
solution for you :(
> I've validated that CPU profiles show clearAllState using a significant
> amount of CPU.
Did you use something like async-profiler here? Do you have more info on
the breakdown of what used the CPU?
>>>> hives.org) and we could generally
>>>> customize the experience more towards Apache Flink. If we go for Slack,
>>>> let's definitely try to archive it like Airflow did. If we do this, we
>>>> do
e once the invite link expires. It's not a nice solution, but it'll
work.
(1) https://the-asf.slack.com/archives/CBX4TSBQ8/p1652125017094159
On Mon, May 9, 2022 at 3:59 PM Robert Metzger wrote:
> Thanks a lot for your answer. The onboarding experience to the ASF Slack
> is
> p for the ASF instance of Slack, you can
> only get there if you're a committer or if you're invited by a committer.
>
> On Mon, 9 May 2022 at 15:15, Robert Metzger wrote:
>
> > Sorry for joining this discussion late, and thanks for the summary
> > Xintong!
> >
Sorry for joining this discussion late, and thanks for the summary Xintong!
Why are we considering a separate Slack instance instead of using the ASF
Slack instance?
The ASF instance is paid, so all messages are retained forever, and quite a
few people are already on that Slack instance.
There is
Hi,
I suspect that this error is not caused by Flink code (because our
serializer stack is fairly stable; there would be more users reporting such
issues if it were a bug in Flink).
In my experience, these issues are caused by broken serializer
implementations (e.g. a serializer being used by multiple threads
Hi Salva,
my somewhat wild guess (because I'm not very involved with the Scala
development on Flink): I would stick with option 1 for now. It should be
easier now for the Flink community to support Scala versions past 2.12
(because we don't need to worry about Scala 2.12+ support for Flink's
internals
Hi,
IngestionTime is usually used when the records don't have a proper event
time associated with them, but the job has a long topology and users want to
preserve the (time) order of events as they arrive in the system.
In that sense, you can use the regular event time watermark strategies also
for ingestion time.
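For example, a minimal sketch of an "ingestion time" strategy built from
the regular watermark APIs (the event type is a placeholder):

WatermarkStrategy<MyEvent> ingestionTime =
    WatermarkStrategy.<MyEvent>forMonotonousTimestamps()
        .withTimestampAssigner(
            // assign the wall-clock arrival time as the event timestamp
            (event, previousTimestamp) -> System.currentTimeMillis());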
Hi,
multiple records are in the system at the same time because Flink is
buffering records in various components for efficiency reasons. That's why
you see that an individual record might have a latency of ~100ms, while
Flink is processing many more messages.
On Tue, Apr 5, 2022 at 12:54 PM lo
Hi Matt,
At first glance your code looks fine. I guess you'll need to follow the
codepaths more with the debugger.
Have you made sure that "reachedEnd()" returns false?
On Tue, Apr 5, 2022 at 9:42 AM Matthew Brown wrote:
> Hi all,
>
> I'm attempting to build a Table API connector for BigQuery
Hi Oran,
as you've already suggested, you could just use a (flat)map function that
takes an ObjectNode and outputs a string.
In the mapper, you can do whatever you want in case of an invalid object:
log it, discard it, write an "error json string", write to a side output
stream, ...
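A minimal sketch of such a mapper (the validity check is a placeholder;
use whichever ObjectNode type your deserializer actually produces):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class JsonToStringMapper implements FlatMapFunction<ObjectNode, String> {
    @Override
    public void flatMap(ObjectNode node, Collector<String> out) {
        if (node.hasNonNull("id")) { // stand-in for your validity check
            out.collect(node.toString()); // forward valid objects as JSON strings
        } else {
            // invalid object: log it, drop it, or emit an error record instead
        }
    }
}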
Hi Sweta,
yes, you cannot run a Flink job compiled against Flink 1.13 against a
1.14 cluster. But if you are only using stable APIs of Flink, you should be
able to compile your job with the 1.14 dependencies without touching the
code.
See also:
https://nightlies.apache.org/flink/flink-docs-rele
Hi Javier,
I suspect that TwitterServer is using some classloading / dependency
injection / service loading "magic" that is causing this.
I would try to find out, either by attaching a remote debugger (should be
possible when executing in cluster mode locally) or by adding log
statements in the code
Hi Parag,
it seems that you are submitting a job with the same job id multiple times.
An easy fix would be generating a new job id each time you are submitting
the job.
To debug this: check out the Flink JobManager logs; there are log messages
for every job submission.
On Thu, Jan 27, 2022 at 9
Hi Paul,
where are you storing your checkpoints, and what's their size?
IIRC, Flink won't trigger a new checkpoint before the old ones have been
cleaned up, and if your checkpoints are large and stored on S3, it can take
a while to clean them up (especially with the Hadoop S3 plugin, using
pre
Instead of using `reinterpretAsKeyedStream`, can you use keyBy and see if
the behavior becomes deterministic?
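Something like this (the key selector and process function are just
placeholders):

// an explicit keyBy redistributes the records and lets Flink derive the
// key groups itself, instead of trusting the pre-partitioning
stream
    .keyBy(record -> record.getKey())
    .process(new MyKeyedProcessFunction());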
On Thu, Jan 27, 2022 at 9:49 PM Alexis Sarda-Espinosa <
alexis.sarda-espin...@microfocus.com> wrote:
> I'm not sure if the issue in [1] is relevant since it mentions the Table
> API, but it
Hi Alexis,
> The usage of Custom Resource Definitions (CRDs). The main reason given to
> me was that such resources are global (for a given cluster) and that is
> not desired. I know that ultimately a CR based on a CRD can be scoped to
> a specific namespace, but customer is king…
I don't think th
Hi Jessy,
> Which approach is suitable for a standalone deployment in Kubernetes? Do
> we have some best practises for running Flink applications on K8s?
I would deploy Flink in Application Mode using the standalone K8s
deployment:
https://nightlies.apache.org/flink/flink-docs-master/docs/deploym
Hi Ayush,
I couldn't find the documentation you've mentioned. Can you send me a link
to it?
On Tue, Dec 7, 2021 at 10:40 AM Ayush Chauhan
wrote:
> Hi,
>
> Can you please let me know the alternatives of isEndOfStream() as now
> according to docs this method will no longer be used to determine th
Hi Jérémy,
In my understanding of the StateFun docs, you need to pass custom
properties using "ingress.spec.properties".
For example:
ingresses:
  - ingress:
      meta:
        type: io.statefun.kafka/ingress
        id: project.A/input
      spec:
        properties:
          max.request.size
Hi,
I guess all the commits mentioned in all the subtasks of this ticket will
give you the feature: https://issues.apache.org/jira/browse/FLINK-23451
However, I'm pretty sure that you can't just cherry-pick such a big feature
to an older Flink version.
I would rather try to fix the connector to up
ful checkpoint.
>
> I'm confused why `811d3b279c8b26ed99ff0883b7630242` is used and not the
> auto-generated uid. That seems like a bug.
>
> On Tue, Dec 7, 2021 at 10:40 PM Robert Metzger
> wrote:
>
>> Hi Dan,
>>
>> When restoring a savepoint/checkpoin
Hi Puneet,
Are you submitting the Flink jobs using the "/bin/flink" command line tool
to a cluster in session mode?
Maybe the command line tool is just "fire and forget" submitting the job to
the cluster; that's why the listeners are firing immediately.
Can you try to use "env.executeAsync()" instead?
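A rough sketch of the async variant (the job name is a placeholder):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// ... build your pipeline ...
JobClient client = env.executeAsync("my-job");
// executeAsync() returns as soon as the job is submitted; the JobClient
// lets you poll the job status afterwards
System.out.println(client.getJobStatus().get());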
Hi Dan,
When restoring a savepoint/checkpoint, Flink is matching the state for the
operators based on the uuid of the operator. The exception says that there
is some state that doesn't match any operator. So from Flink's perspective,
the operator is gone.
Here is more information:
https://nightlie
@Matthias Pohl: I've also been annoyed by this 30
days limit, but I'm not aware of a way to globally change the default. I
would ask in #asfinfra in the ASF Slack.
On Thu, Sep 30, 2021 at 12:19 PM Till Rohrmann wrote:
> Thanks for the hint with the managed search engines Matthias. I think this
Hi,
afaik the only real blocker for ARM support was a RocksDB binary for ARM.
This has been resolved and is scheduled to be released with 1.14.0:
https://issues.apache.org/jira/browse/FLINK-13598
If you have an ARM machine available, you could even help the community in
the release verification process
Hey Andreas,
This could be related, too:
https://github.com/apache/hadoop/pull/110/files#diff-0a2e55a2f79ea4079eb7b77b0dc3ee562b383076fa0ac168894d50c80a95131dR950
I guess in Flink this would be
s3.endpoint: your-endpoint-hostname
Where your-endpoint-hostname is a region-specific endpoint, which y
Hi,
Yes, "rest.bind-port" seems to be set to "35485" on the JobManager
instance. Can you double check the configuration that is used by Flink?
The JobManager also prints the effective configuration on startup.
You'll probably see the value there as well.
On Wed, Sep 22, 2021 at 6:48 PM Cur
Hi,
What happens if you do not set any boundedness on the KafkaSource?
For a DataStream job in streaming mode, the Kafka source should be
unbounded.
From reading the code, it seems that setting unbounded(latest) should not
trigger the behavior you mention ... but the Flink docs are not clearly
worded here.
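For reference, a sketch of a KafkaSource that stays unbounded (servers,
topic and group id are placeholders):

KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("broker:9092")
        .setTopics("input-topic")
        .setGroupId("my-group")
        .setStartingOffsets(OffsetsInitializer.latest())
        // no setBounded()/setUnbounded() call: in streaming mode the
        // source keeps reading forever
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();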
Hi Kamaal,
I would first suggest understanding the performance bottleneck, before
applying any optimizations.
Idea 1: Are your CPUs fully utilized?
If yes, good, then scaling up will probably help.
If not, then there's another inefficiency.
Idea 2: How fast can you get the data into your job, with
>
> Just FYI - We are using Fixed Delay Restart (5 times, 10s delay)
>
> On Thu, Sep 9, 2021 at 4:29 PM Robert Metzger wrote:
>
>> Hi Puneet,
>>
>> Can you provide us with the JobManager logs of this incident? Jobs should
>> not disappear, they should res
ner.java:84)
> at
> org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:70)
> at
> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleRequest$0(JarRunHandler.java:102)
> at
> java.util.concurrent.CompletableFu
Hi Yuval,
EOF exceptions during serialization are usually an indication that some
serializer in the serializer chain is somehow broken.
What data type are you serializing? Does it include some type serialized
by a custom serializer, or Kryo, ...?
On Thu, Sep 9, 2021 at 4:35 PM Yuval Itzchakov
> On Thu, Sep 9, 2021 at 4:26 PM Robert Metzger wrote:
>
>> Hey,
>>
>> Why do you have these dependencies in your pom?
>>
>>
>>
>> org.apache.kafka
>> kafka-clients
>> 2.8
Hi,
from my understanding of the code [1], the task scheduling first considers
the state location, and then uses the evenly spread out scheduling strategy
as a fallback. So in my understanding of the code, the local recovery
should have preference over the evenly spread out strategy.
If you can e
Hi Puneet,
Can you provide us with the JobManager logs of this incident? Jobs should
not disappear, they should restart on other Task Managers.
On Wed, Sep 8, 2021 at 3:06 PM Puneet Duggal
wrote:
> Hi,
>
> So for past 2-3 days i have been looking for documentation which
> elaborates how flink t
Hey,
Why do you have these dependencies in your pom?
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.12</artifactId>
    <version>2.8.0</version>
</dependency>
They are not needed for using the Kafka connector of Flink.
Is the Kubernetes server you are using particularly busy? Maybe these
issues occur because the server is overloaded?
"Triggering checkpoint 2193 (type=CHECKPOINT) @ 1630681482667 for job
."
"Completed checkpoint 2193 for job (474
Hey Joshua,
Can you first validate whether the Docker image you've built is valid by
running it locally on your machine?
I would recommend putting the S3 filesystem files into the plugins [1]
directory to avoid classloading issues.
Also, you don't need to build custom images if you want to use build-i
>>
>> The partitions are assigned equally if we are reading from a single topic.
>>
>> Our use case is to read from multiple topics [topics via regex pattern]; we
>> use 6 topics and 1 partition per topic for this job.
>>
>> In this case , few of the kafka consu
untime?
>
> *From: *Robert Metzger
> *Date: *Tuesday, August 3, 2021 at 11:52 AM
> *To: *Weston Woods
> *Cc: *"user@flink.apache.org"
> *Subject: *Re: Savepoint class refactor in 1.11 causing restore from 1.9
> savepoint to fail
>
Hi Svend,
I'm a bit confused by this statement:
> * In streaming mode, with checkpointing but removing the `setBounded()`
> on the Kafka source yields the same result
My expectation would be that the source runs forever, if it is not bounded.
Are you sure this error message is not coming from another
Hi Robin,
Let's say you have two checkpoints #1 and #2, where #1 has been created by
an old version of your job, and #2 has been created by the new version.
When can you delete #1?
In #1, there's a directory "/shared" that contains data that is also used
by #2, because of the incremental nature of
Hi Weston,
I have never looked into the savepoint migration code paths myself, but I
know that savepoint migration across multiple versions is not supported
(1.9 can only migrate to 1.10, not 1.11). We have test coverage for these
migrations, and I would be surprised if this "Savepoint" class migr
Hey,
generally, the mailing lists (and JIRA) are indexed by search engines, in
particular Google. As long as you have a specific enough search string
(such as an exception message), you should find past problems and solutions.
You can also download the entire Flink mailing list archives. For exam
Hey Sudhanva,
Have you configured IntelliJ to include dependencies with "Provided" Scope
when executing your main method?
I also noticed that you are using Flink 1.13.1 and 1.13.0 in your pom. It's
probably not an issue in this case, but it can cause problems.
On Fri, Jul 30, 2021 at 10:29 AM Sudh
Hey Joe,
thanks a lot for reaching out regarding this.
I have no explanation for why this exists, but since there's no ticket
about this yet, I filed one:
https://issues.apache.org/jira/browse/FLINK-23589
I also pinged some committers who can hopefully provide some
additional context.
I would pro
Hi Alexis,
I hope I'm not stating the obvious, but have you checked this documentation
page:
https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/debugging/debugging_classloading/#unloading-of-dynamically-loaded-classes-in-user-code
In particular the shutdown hooks we've introduced in F
+1 to this change!
When I was working on the reactive mode blog post [1] I also ran into this
issue, leading to a poor "out of the box" experience when scaling down.
For my experiments, I've chosen a timeout of 8 seconds, and the cluster has
been running for 76 days (so far) on Kubernetes.
I also
our reply.
>
> Will this feature be released in version 1.14?
>
> Best,
>
> Hui
>
> *From:* Robert Metzger [mailto:rmetz...@apache.org]
> *Sent:* July 20, 2021, 19:45
> *To:* Wanghui (HiCampus)
> *Cc:* user@flink.apache.org
> *Subject:* Re: Some question of RocksDB state
Hi Prasanna,
which Flink version and Kafka connector are you using? (the "KafkaSource"
or "FlinkKafkaConsumer"?)
The partition assignment for the FlinkKafkaConsumer is defined here:
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/st
The RocksDB version provided by Flink does not currently run on ARM.
However, there are some efforts / hints:
- https://stackoverflow.com/a/44573013/568695
- https://issues.apache.org/jira/browse/FLINK-13448
- https://issues.apache.org/jira/browse/FLINK-13598
I would recommend voting and commenting on these tickets.
Your understanding of the problem is correct -- the serialization cost is
the reason for the high CPU usage.
What you can also try to optimize is the serializers you are using (by
using data types that are efficient to serialize). See also this blog post:
https://flink.apache.org/news/2020/04/15/f
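As a sketch of what "efficient to serialize" means: a class following
Flink's POJO rules (public class, public no-arg constructor, public
fields) avoids the Kryo fallback entirely. The field names here are made
up:

// recognized by Flink's POJO serializer, no Kryo involved
public class TaxiRide {
    public long rideId;
    public double lon;
    public double lat;
    public TaxiRide() {}
}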
Are you using remote disks for RocksDB? (I guess that's EBS on AWS.) Afaik
there are usually limitations w.r.t. the IOPS you can perform.
I would generally recommend measuring where the bottleneck is coming from.
It could be that your CPUs are at 100%, then adding more machines / cores
will help (m
> ..." job finishes. Using
> executeAsync(), which is non-blocking, will lead to the "next" job starting
> before "this" job finishes.*
>
>
> On Tue, Jun 22, 2021 at 11:13 PM Robert Metzger
> wrote:
>
>> Hi Qihua,
>>
>> Application Mode
Hi Qihua,
Application Mode is meant for executing one job at a time, not multiple
jobs on the same JobManager.
If you want to do that, you need to use session mode, which allows managing
multiple jobs on the same JobManager.
On Tue, Jun 22, 2021 at 10:43 PM Qihua Yang wrote:
> Hi Arvid,
>
> Do
for users, I think these instructions can appear
> in the Configuration of the official document.
>
> Best,
> Jason
>
> JasonLee1781
> jasonlee1...@163.com
>
> immutable memtables metrics increased not as fast as before the
> backpressure.
>
> I can provide more context if you are interested. We are still debugging
> on this issue.
>
> Rommel
>
> On Wed, Jun 16, 2021 at 4:25 AM Robert Metzger
> wrote:
>
> to test this hypothesis by
> making the same comparison with some simpler state inside the aggregation
> window.
>
> On Wed, 16 Jun 2021, 7:58 pm Robert Metzger, wrote:
>
>> Depending on the datatypes you are using, seeing 3x more CPU usage seems
>> realistic.
>>
It seems like the PrometheusReporter doesn't support HTTPS.
The Flink reporter seems to be based on the HttpServer prometheus client. I
wonder if using the servlet client would allow us to add HTTPS support:
https://github.com/prometheus/client_java/blob/master/simpleclient_servlet/src/main/java/i
Note to others on this mailing list. This email has also been sent with the
subject "Flink parameter configuration does not take effect" to this list.
I replied there; let's also discuss there.
On Tue, Jun 15, 2021 at 7:39 AM Jason Lee wrote:
> Hi everyone,
>
> When I was researching and using
Hi Felipe,
Which data source are you using?
> Then, in the MyCoGroupFunction there are only events of stream02
Are you storing events in your state?
> Is this the case where I have to use RichCoGroupFunction and save the
> state by implementing the CheckpointedFunction?
If you want your state to
Thanks a lot for sharing the solution on the mailing list and in the
ticket.
On Tue, Jun 15, 2021 at 11:52 AM Patrick Lucas
wrote:
> Alright, I figured it out—it's very similar to FLINK-13703, but instead of
> having to do with immutable fields, it's due to use of the Avro Gradle
> plugin option
Depending on the datatypes you are using, seeing 3x more CPU usage seems
realistic.
Serialization can be quite expensive. See also:
https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html
Maybe it makes sense to optimize there a bit.
On Tue, Jun 15, 2021 at 5:23 PM JING ZHAN
Hi Marco,
The DataSet API will not run out of memory, as it spills to disk if the
data doesn't fit anymore.
Load is distributed by partitioning data.
Giving you advice depends a bit on the use-case. I would explore two major
options:
a) reading the data from postgres using Flink's SQL JDBC connec
Hi Dipanjan,
Using Flink SQL's MATCH_RECOGNIZE operator is certainly a good idea if you
are looking for a non-programmatic way to do CEP with Flink.
On Wed, Jun 16, 2021 at 6:44 AM Dipanjan Mazumder wrote:
> Hi,
>
> Can we say that Flink SQL is kind of a DSL overlay on flink CEP , i
> mean
Thanks for the logs.
The OK job seems to read from "s3a://test-bucket/", while the KO job reads
from "s3a://bucket-test/". Could it be that you are just trying to access
the wrong bucket?
What I also found interesting from the KO Job TaskManager is this log
message:
Caused by: java.net.NoRouteTo
Hi Thomas,
My gut feeling is that you can use the available resources more efficiently.
What's the size of a checkpoint for your job (you can see that from the
UI)?
Given that your cluster has an aggregate of 64 * 12 = 768gb of memory
available, you might be able to do everything in memory (
Hi Jason,
How are you deploying your Flink SQL tasks? (are you using
per-job/application clusters, or a session cluster? )
I agree that the configuration management is not optimal in Flink. By
default, I would recommend assuming that all configuration parameters are
cluster settings, which requir
Hi Yidan,
it seems that the attachment did not make it through the mailing list. Can
you copy-paste the text of the exception here or upload the log somewhere?
On Wed, Jun 16, 2021 at 9:36 AM yidan zhao wrote:
> Attachment is the exception stack from flink's web-ui. Does anyone
> have also met
+1 (binding)
- Tried out reactive mode from the Scala 2.11 binary locally (with scale
up & stop with savepoint)
- reviewed website update
- randomly checked a jar file in the staging repo (the flink-python jar
looks okay; I just checked superficially)
On Fri, May 28, 2021 at 5:16 AM Leonard Xu
Hi Bob,
if you don't need any time characteristics, go with processing time.
Ingestion time will call System.currentTimeMillis() on every incoming
record, which is a somewhat expensive call.
Event time (and ingestion time) will attach a long field to each record,
making the records 8 bytes larger
Hi,
can you provide the JobManager log of that run? It seems that the operation
timed out. The JobManager log will help us to get some insights into the
root cause.
On Tue, May 18, 2021 at 1:42 PM V N, Suchithra (Nokia - IN/Bangalore) <
suchithra@nokia.com> wrote:
> Hi,
>
>
>
> Stop command
Hi Theo,
Since you are running Flink locally, it would be quite easy to attach a
profiler to Flink to see where most of the CPU cycles are burned (or: check
if you are maybe IO bound?) ... this could provide us with valuable data
for deciding on the next steps.
On Tue, May 18, 2021 at 5:26 PM Theo
Hi Yaroslav,
My recommendation is to go with the 2nd pattern you've described, but I
only have limited insights into real world production workloads.
Besides the parallelism configuration, I also recommend looking into slot
sharing groups, and maybe disabling operator chaining.
I'm pretty sure so
>
> On Thu, 20 May 2021 at 05:35, Robert Metzger wrote:
>
>> Hey Igor,
>>
>> 1) yes, reactive mode indeed does the same.
>> 2) No, HA mode is only storing some metadata in ZK about the leadership
>> and latest checkpoints, but the checkpoints itself are the sam
Hey Igor,
1) yes, reactive mode indeed does the same.
2) No, HA mode only stores some metadata in ZK about the leadership and
latest checkpoints, but the checkpoints themselves are the same. They should
be usable for a changed job graph (if the state matches the operators by
setting the UUIDs [1]
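Setting explicit uids looks roughly like this (class and uid names are
placeholders):

// stable uids let Flink map the state back to the operators,
// even after the job graph has changed
DataStream<Event> events = env
    .addSource(new MySource()).uid("source-id")
    .keyBy(e -> e.key)
    .map(new MyStatefulMapper()).uid("mapper-id");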
Hi Suchithra,
this is very likely a version mixup: can you make sure all jars in your
classpath are Flink 1.11.1?
On Mon, May 17, 2021 at 2:05 PM V N, Suchithra (Nokia - IN/Bangalore) <
suchithra@nokia.com> wrote:
> Hi,
>
>
>
> With flink 1.11.1 version, taskmanager initialization is failing
Hi Ragini,
Since this exception is coming from the HBase client, I assume the issue
has nothing to do with Flink directly.
I would recommend carefully studying the HBase client configuration
parameters, and maybe setting up a simple Java application that "hammers"
data into HBase at a maximum rate to under
t
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> ~[flink-dist_2.12-1.13.0.jar:1.13.0]
> at
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>
Hi,
can you check the client log in the "log/" directory?
The Flink client will try to access the K8s API server to retrieve the
endpoint of the jobmanager. For that, the pod needs to have permissions
(through a service account) to make such calls to K8s. My hope is that the
logs or previous messag
Thanks a lot to everybody who has contributed to the release, in particular
the release managers for running the show!
On Tue, May 4, 2021 at 8:54 AM Konstantin Knauf
wrote:
> Thank you Dawid and Guowei! Great job everyone :)
>
> On Mon, May 3, 2021 at 7:11 PM Till Rohrmann wrote:
>
>> This is
Quick comment on the Kryo type registration and the messages you are
seeing: the messages are expected. What they are saying is that we are not
serializing the type using Flink's POJO serializer, but we are falling
back to Kryo.
Since you are registering all the instances of Number that you a
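For reference, registering types with Kryo looks like this (the concrete
classes are just examples):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// registered classes are written as small integer tags instead of full
// class names, which shrinks the serialized data and speeds up Kryo
env.getConfig().registerKryoType(java.math.BigDecimal.class);
env.getConfig().registerKryoType(java.math.BigInteger.class);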
Hi Dan,
Can you describe under which conditions you are missing records (after a
machine failure, after a Kafka failure, after taking and restoring from a
savepoint, ...).
Are many records missing? Are "the first records" or the "latest records"
missing? Any individual records missing, or larger b
Thanks a lot for your message. This could be a bug in Flink. It seems that
the archival of the execution graph is failing because some classes are
unloaded.
What I observe from your stack traces is that some classes are loaded from
flink-dist_2.11-1.11.2.jar, while other classes are loaded from
te
Hey Prashant,
the Kafka Consumer parallelism is constrained by the number of partitions
the topic(s) have. If you have configured the Kafka Consumer in Flink with
a parallelism of 100, but your topic has only 20 partitions, 80 consumer
instances in Flink will be idle.
On Mon, Apr 26, 2021 at 2:54
Hi Dan,
can you provide me with the JobManager logs to take a look as well? (This
will also tell me which Flink version you are using)
On Mon, Apr 26, 2021 at 7:20 AM Dan Hill wrote:
> My Flink job failed to checkpoint with a "The job has failed" error. The
> logs contained no other recent e
> Regards
> Klemens
>
>
> On 15.04.2021, at 21:29, Robert Metzger wrote:
>
> Hi,
>
> a DEBUG log of the client would indeed be nice.
> Can you adjust this file:
>
> conf/log4j-cli.properties
>
> to the f
based on
> https://www.alibabacloud.com/blog/deep-insights-into-flink-sql-flink-advanced-tutorials_596628.
> If it's doable, then I'll be able to solve our problem with applying
> streamfilesink to the transformed dataset.
>
> Best wishes,
> Chen-Che Huang
>
> On 2021
> test the Flink Pipelines until Silicon is supported.
>
> Nevertheless, thanks for your answer. If there is anything I can provide
> you to narrow down the problem, I am happy to help.
>
> Regards
> Klemens
>
> Am 15.04.2021 um 20:59 schrieb Robert Metzger :
>
>
Hey Chen-Che Huang,
I guess the StreamingFileSink is what you are looking for. It is documented
here:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html
I drafted a short example (that is not production ready), which does
roughly what you are asking for:
htt
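In the same spirit, a minimal sketch (not production ready; the output
path is a placeholder):

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// writes each incoming string as one line into rolling part files
StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("file:///tmp/output"),
                      new SimpleStringEncoder<String>("UTF-8"))
        .build();
stream.addSink(sink); // 'stream' is your DataStream<String>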
Hey Klemens,
I'm sorry that you are running into this. Looks like you are the first (of
probably many people) to use Flink on an M1 chip.
If you are up for it, we would really appreciate a fix for this issue, as a
contribution to Flink.
Maybe you can distill the problem into an integration test,
Hi,
I'm not aware of any known issues with Hadoop and Flink on Docker.
I also tried what you are doing locally, and it seems to work:
flink-jobmanager| 2021-04-15 18:37:48,300 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - Starting
StandaloneSessionClusterEntrypoint.
Hey Yaroslav,
GCS is a somewhat popular filesystem that should work fine with Flink.
It seems that the initial scale of a bucket is 5000 read requests per
second (https://cloud.google.com/storage/docs/request-rate), your job
should be at roughly the same rate (depending on how fast your job resta
It could very well be that your job gets stuck in a restart loop for some
reason. Can you either post the full TaskManager logs here, or try to
figure out yourself why the first checkpoint that timed out, timed out?
Backpressure or blocked operators are a common cause for this. In your
case, it cou
1.953gb (2097152000 bytes)
> INFO [] - Off-heap: 128.000mb (134217728 bytes)
> INFO [] - JVM Metaspace: 256.000mb (268435456 bytes)
> INFO [] - JVM Overhead: 264.889mb (277756136 bytes)
>
>
> On Fri, Mar 26, 2021 at 4:03 AM Robert Metzger
Hey Matthias,
Maybe the classpath contains Hadoop libraries, but not the HDFS libraries?
The "DistributedFileSystem" class needs to be accessible to the
classloader. Can you check if that class is available?
Best,
Robert
On Thu, Mar 25, 2021 at 11:10 AM Matthias Seiler <
matthias.sei...@campus.t
Hey Arran,
It seems that the preferred way, even in the Java API, is to use a DDL
statement:
https://github.com/apache/flink/blob/master/flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/api/bridge/java/StreamTableEnvironment.java#L602-L639
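For example (a sketch; table name, schema and connector are made up):

TableEnvironment tableEnv =
    TableEnvironment.create(EnvironmentSettings.inStreamingMode());
tableEnv.executeSql(
    "CREATE TABLE orders (" +
    "  order_id BIGINT," +
    "  amount DOUBLE" +
    ") WITH (" +
    "  'connector' = 'datagen'" +
    ")");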
Hope this helps!
Best,
Robert
Hey Sandeep,
(Maybe this thread is also relevant:
https://lists.apache.org/thread.html/7d56267d4c2344ccb5a774896682d0a3efb38c1c215ef3500c3569a2%40%3Cuser.flink.apache.org%3E
)
> My question is how do I initialise the pipeline for the first set of
records in the database? i.e. those that are not C