When this happens, it appears that one of the workers fails but the rest of the
workers continue to run. How can I configure the app to recover itself
completely from the last successful checkpoint when this happens?
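One common setup is checkpointing combined with a restart strategy, so that a single task failure restarts the whole job from the latest completed checkpoint rather than leaving it partially running. A minimal sketch (class name and intervals are illustrative, assuming the DataStream API):

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointRecoveryExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60s; on a task failure, Flink restarts the job
        // from the latest completed checkpoint.
        env.enableCheckpointing(60_000);

        // Retry up to 3 times, waiting 10s between attempts, before the
        // job is declared failed.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.seconds(10)));

        // ... job topology ...
        env.execute("checkpoint-recovery-example");
    }
}
```

Note that by default the failure of one task triggers a restart of the whole job, which is what restores all workers to a consistent checkpointed state.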
--- Original Message ---
On Monday, December
I have a job using TTL map state and RocksDB state backend. If I
lengthen the TTL on the map state and resume the job from savepoint (or
checkpoint assuming I don't change the state backend), will new values
added to that map have the new TTL or will the old TTL in the savepoint
override my
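For reference, the TTL is supplied through the state descriptor when the operator registers its state at (re)start, so it is the job's current configuration that gets applied. A sketch (names and durations are illustrative, assuming the StateTtlConfig API):

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

// Illustrative descriptor; the TTL below is whatever the job configures
// when it is (re)started from the savepoint.
StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.hours(48))                                // lengthened TTL
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite) // refresh on writes
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        .build();

MapStateDescriptor<String, Long> descriptor =
        new MapStateDescriptor<>("my-map-state", String.class, Long.class);
descriptor.enableTimeToLive(ttlConfig);
```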
Hi all,
I specify the exact offsets the consumer should start from for each
partition, but the Kafka consumer cannot periodically commit the offsets to
Zookeeper.
I have disabled checkpointing only when the job is stopped. This is my code:
val properties = new Properties()
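For reference, pinning start offsets and controlling offset commits can be sketched like this (in Java; topic, partitions, and offset values are made up, assuming the universal FlinkKafkaConsumer API; the version-specific consumer classes take the same calls):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition;

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092"); // example value
properties.setProperty("group.id", "my-group");                // example value

FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), properties);

// Exact start offsets per partition (hypothetical values).
Map<KafkaTopicPartition, Long> specificOffsets = new HashMap<>();
specificOffsets.put(new KafkaTopicPartition("my-topic", 0), 23L);
specificOffsets.put(new KafkaTopicPartition("my-topic", 1), 31L);
consumer.setStartFromSpecificOffsets(specificOffsets);

// With checkpointing disabled, the consumer falls back to the Kafka
// client's periodic auto-commit; with checkpointing enabled, this flag
// controls committing offsets back on completed checkpoints.
consumer.setCommitOffsetsOnCheckpoints(true);
```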
Why does Flink do resource management by only slots, not by TaskManagers
and slots?
If there is one Flink cluster used to submit multiple jobs, how do I make
the JobManager distribute subtasks evenly across all TaskManagers?
Now, the JobManager treats the slots globally, so some jobs' operators are
assigned only
Folks,
We've been having a tough time building a Spring Boot app (JAR) to get
our Flink jobs running in our Flink cluster.
The Spring Application Context is always getting set to null when the
Flink job runs - has anyone had luck with this? Any help would
be appreciated.
Thanks a lot
I am using 1.7 and job cluster on k8s.
Here is how I start my job
docker-entrypoint.sh job-cluster -j
com.zendesk.fraud_prevention.examples.ConnectedStreams
--allowNonRestoredState
*Seems like --allowNonRestoredState is not honored*
=== Logs ===
You can pass a ParametersProvider to the jdbc input format in order to
parallelize the fetch.
Of course, you don't want to kill the MySQL server with too many requests in
parallel, so you'll probably put a limit on the parallelism of the input
format.
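A sketch of the suggestion above (driver, URL, table, and id range are made up; assuming the flink-jdbc JDBCInputFormat builder):

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.io.jdbc.split.NumericBetweenParametersProvider;
import org.apache.flink.api.java.typeutils.RowTypeInfo;

// Hypothetical table/column names; the provider fills the two '?' slots,
// splitting the 0..80M id range into chunks of 10_000 rows each.
JDBCInputFormat inputFormat = JDBCInputFormat.buildJDBCInputFormat()
        .setDrivername("com.mysql.jdbc.Driver")
        .setDBUrl("jdbc:mysql://localhost:3306/mydb")
        .setQuery("SELECT id, name FROM my_table WHERE id BETWEEN ? AND ?")
        .setParametersProvider(new NumericBetweenParametersProvider(10_000L, 0L, 80_000_000L))
        .setRowTypeInfo(new RowTypeInfo(
                BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO))
        .finish();
```

Each parameter pair becomes one small query, so the fetch parallelizes; something like `env.createInput(inputFormat).setParallelism(4)` would then cap how many run against the database at once.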
On Tue, 4 Dec 2018, 17:31 miki haiat wrote:
> Hi,
> I
Perhaps related to this, one of my Tasks does not seem to be restoring
correctly / checkpointing. It hangs during the checkpoint process, then
causes a timeout and says "Checkpoint Coordinator is suspended." I
have increased the "slot.idle.timeout" as was recommended here
I'm using Flink 1.4.2 and running Flink on Yarn. Job runs with a parallelism
of 2. Each task manager is allocated 1 core. When the container memory
exceeds the allocated memory yarn kills the container as expected.
{"debug_level":"INFO","debug_timestamp":"2018-12-04
We are seeing this OutOfMemoryError in the container logs. How can we
increase the memory to take full advantage of the cluster? Or do we just
have to more aggressively scale?
Best,
Austin
java.lang.OutOfMemoryError: GC overhead limit exceeded
at
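A common starting point is raising the TaskManager heap while keeping a cutoff for non-heap overhead, so YARN does not kill the container; the values below are illustrative, assuming Flink 1.5+ configuration keys (older versions use `taskmanager.heap.mb`):

```yaml
# flink-conf.yaml - example values, tune to your container sizes
taskmanager.heap.size: 4096m            # JVM heap per TaskManager
containerized.heap-cutoff-ratio: 0.25   # fraction reserved for non-heap overhead
taskmanager.numberOfTaskSlots: 2
```

The same can be passed per session on the CLI, e.g. `flink run -m yarn-cluster -ytm 4096 ...` (example flag value). If the `GC overhead limit exceeded` errors persist after sizing the heap to the containers, scaling out is the remaining lever.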
Hi all,
We have a Flink 1.6 streaming application running on Amazon EMR, with a
YARN session configured with 20GB for the Task Manager, 2GB for the Job
Manager, and 4 slots (number of vCPUs), in detached mode. Each Core Node
has 4 vCores, 32 GB mem, 32 GB disc, and each Task Node has 4 vCores, 8
Hi,
I want to query a SQL table that contains ~80M rows.
There are a few ways to do that and I wonder what the best way is.
1. Using JDBCInputFormat -> convert to a DataSet and output it without
doing any logic in the DataSet, passing the full query in the
Hi!
We have been running Flink on Yarn for quite some time and historically we
specified port ranges so that the client can access the cluster:
yarn.application-master.port: 100-200
Now we updated to flink 1.7 and try to migrate away from the legacy
execution mode but we run into a problem that
Unfortunately, setting the parallelism per SQL operator is not supported
right now.
We are currently thinking about a way of having fine-grained control
about properties of SQL operators but this is in an early design phase
and might take a while
Am 04.12.18 um 13:05 schrieb clay:
All calls to createColumnFamily were replaced with createColumnFamilyWithTtl
private Tuple2> tryRegisterKvStateInformation(
StateDescriptor stateDesc,
TypeSerializer namespaceSerializer,
@Nullable StateSnapshotTransformer snapshotTransformer)
Hi Timo,
First, thank you very much - I have solved the problems.
Regarding the problem of too-large state: I set the global parallelism to 7
for the program, which solved my problem very well; checkpointing is very
fast. But I would like to ask if there is a way to set parallelism for each
Hi Derek,
what I would recommend to use is to trigger the cancel with savepoint
command [1]. This will create a savepoint and terminate the job execution.
Next you simply need to respawn the job cluster which you provide with the
savepoint to resume from.
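The command sequence can be sketched as follows (savepoint path and job id are placeholders):

```shell
# Take a savepoint and cancel the running job in one step.
flink cancel -s hdfs:///savepoints/my-job <jobId>

# Respawn the job cluster, resuming from the savepoint path printed above.
flink run -s <savepointPath> my-job.jar
```

For a containerized job cluster, the savepoint path is typically handed to the cluster entrypoint instead of `flink run`, but the savepoint-then-respawn flow is the same.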
[1]
Sorry for the late answer. I haven’t been in the office.
The logs show no problems.
The files that remain in the shared subfolder are almost all 1121 bytes. Except
the files from the latest checkpoint (30 files for all operators)
For each historic checkpoint six files remain (parallelism is 6)
Hi Derek,
I think your automation steps look good.
Recreating deployments should not take long
and as you mention, this way you can avoid unpredictable old/new version
collisions.
Best,
Andrey
> On 4 Dec 2018, at 10:22, Dawid Wysakowicz wrote:
>
> Hi Derek,
>
> I am not an expert in
I forgot to mention that the job was recently moved from the cluster with
SSD disk to SATA and SSD disk. On the old hardware, everything worked fine.
Flink version is 1.6.2. The RocksDB settings were FLASH-optimized.
I've changed to SPINNING_DISK_OPTIMIZED and it didn't have any effect.
Old
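For reference, switching between the predefined RocksDB profiles can be sketched like this (checkpoint URI is a placeholder, assuming the RocksDBStateBackend API):

```java
import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

RocksDBStateBackend backend =
        new RocksDBStateBackend("hdfs:///flink/checkpoints"); // placeholder URI
// Tune RocksDB for the disks actually backing the state directories.
backend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED);
env.setStateBackend(backend);
```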
Hi Derek,
I am not an expert in kubernetes, so I will cc Till, who should be able
to help you more.
As for the automation for similar process I would recommend having a
look at dA platform[1] which is built on top of kubernetes.
Best,
Dawid
[1] https://data-artisans.com/platform-overview
On
Hi Rakesh,
Could you explain a little bit what is the actual problem? What do you
expect as the output and what actually happens? It is hard to guess what
is the problem you're facing.
Best,
Dawid
On 03/12/2018 12:19, Rakesh Kumar wrote:
>
> Hello Team,
>
>
> public class FlinkJoinDataStream {
Hi,
yes, this was an unintended behavior that got fixed in Flink 1.7.
See https://issues.apache.org/jira/browse/FLINK-10474
Regards,
Timo
Am 04.12.18 um 05:21 schrieb clay:
I have found out that checkpointing is not triggered. Regarding the IN
operation in Flink SQL, this SQL will trigger
Hi Flavio,
Thank you for the example. It is definitely gonna be helpful for some
people!
Best,
Dawid
On 04/12/2018 09:05, Flavio Pompermaier wrote:
> Yesterday it was working...alternatively you can look at
> https://github.com/ckan/ckan-instances/blob/gh-pages/config/instances.json
>
> The
Yesterday it was working...alternatively you can look at
https://github.com/ckan/ckan-instances/blob/gh-pages/config/instances.json
The purpose of this code is to share with the community an example of a
useful input format that could be interesting for people working with open
data (indeed there