Flink HA on Kubernetes - RPC port

2023-01-20 Thread bastien dine
Hello, We are migrating our HA setup from ZK to K8S, and we have a question regarding the RPC port. Previously with ZK, the RPC connection config was the : high-availability.jobmanager.port We were expecting that the config will be the same with K8S HA, as the doc says : "The port (range) used

DataSet API, chaining database access

2022-07-12 Thread bastien dine
Can someone help me ? *Regards,* ------ Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io

Re: New KafkaSource API : Change in default behavior regarding starting offset

2022-06-15 Thread bastien dine
ns that we drop some of our kafka source state to read again from kafka committed offset, but maybe nodoby does that ^^ Anyway thanks for the focus on the release note ! Best Regards, ------ Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le mer. 15 juin 20

Re: New KafkaSource API : Change in default behavior regarding starting offset

2022-06-15 Thread bastien dine
whether this is a changed behavior or I am missing something Bastien DINE Freelance Data Architect / Software Engineer / Sysadmin http://bastiendine.io Le mar. 14 juin 2022 à 23:08, Jing Ge a écrit : > Hi Bastien, > > Thanks for asking. I didn't find any call of setStartFromGroupOffsets

New KafkaSource API : Change in default behavior regarding starting offset

2022-06-14 Thread bastien dine
tedOffsets(OffsetResetStrategy.EARLIEST This change can lead to big troubles if user pay no attention to this point when migrating from old KafkaConsumer to new KafkaSource, Regards, Bastien ------ Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io

Re: question on dataSource.collect() on reading states from a savepoint file

2022-02-10 Thread bastien dine
ink as we are > planning to write to S3? > > Thanks again for your help. > > Antonio. > > On Thu, Feb 10, 2022 at 11:49 AM bastien dine > wrote: > >> Hello Antonio, >> >> .collect() method should be use with caution as it's collecting the >> Da

Re: question on dataSource.collect() on reading states from a savepoint file

2022-02-10 Thread bastien dine
it into a formatted file instead (like CSV one) or if it's too big, into something splittable Regards, Bastien -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le jeu. 10 févr. 2022 à 20:32, Antonio Si a écrit : > Hi, > > I am using the

Re: Flink High-Availability and Job-Manager recovery

2022-02-04 Thread bastien dine
Hello, On k8s the current recommendation is to set up 1 job manager with H-A enabled, so that cluster do not lost state upon crash 1. The storage dir can for sure be on kube PV, the directory should be shared within all JM, you will need to map the volume to the same local directory (e.g /data)

Re: Pojo State Migration - NPE with field deletion

2022-02-03 Thread bastien dine
t throwing an exception ? Or do I try to find the root cause, namely why the field serializer of the deleted field is still present ? Regards, ------ Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le mer. 2 févr. 2022 à 16:37, Alexis Sarda-Espinosa < alexi

Pojo State Migration - NPE with field deletion

2022-02-02 Thread bastien dine
Hello, I have some trouble restoring a state (pojo) after deleting a field According to documentation, it should not be a problem with POJO : *"Fields can be removed. Once removed, the previous value for the removed field will be dropped in future checkpoints and savepoints."* Here is a short

Re: Problem when restoring from savepoint with missing state & POJO modification

2020-12-09 Thread bastien dine
, Bastien -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le mar. 8 déc. 2020 à 08:54, Yun Tang a écrit : > Hi Bastien, > > Flink supports to register state via state descriptor when > calling runtimeContext.getState(). However, on

Problem when restoring from savepoint with missing state & POJO modification

2020-12-07 Thread bastien dine
on't want them to contain non-used state If anybody has experienced an issue like that before or knows how to handle this, I would be glad to discuss ! Best regards, -- Bastien DINE Data Architect / Software Engineer / Sysadmin

Re: Keyed raw state - example

2019-11-18 Thread bastien dine
Hello Congxian, Thanks for your response, Don't you have an example with an Operator extending the AbstractUdfStreamOperator? Using the context.getRawKeyedStateInputs() (& output to snapshots) TimeService is reimplementing the whole stuff :/ -- Bastien DINE Data Archi

Keyed raw state - example

2019-11-15 Thread bastien dine
( need to access it and update) so it's taking a crazy amount of time and we would like to have it serialized only on snapshot, so using Raw state is a possible good solution, But i cannot find anyexample of it :/ Thanks and best regards, Bastien DINE Freelance Data Architect / Software Engineer

Async operator with a KeyedStream

2019-10-31 Thread bastien dine
enough ressources to increase my parallelism), As far as I have seen, this is not possible When I write something like Async.orderedWait(myStream.keyBy(myKeyselector)), the keyBy is totally ignored Have you a solution for this? Best Regards, Bastien -- Bastien DINE Data Architect

NullPointerException on StreamTask

2019-05-27 Thread bastien dine
(Constructor.java:423) at org.apache.flink.runtime.taskmanager.Task.loadAndInstantiateInvokable(Task.java:1405) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:689) at java.lang.Thread.run(Thread.java:748) -- Bastien DINE Data Architect / Software Engineer / Sysadmin

Re: Job cluster and HA

2019-05-25 Thread bastien dine
with this entrypoint) it will read in ZK / HDFS and restore previous execution Regards, Bastien ------ Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le ven. 24 mai 2019 à 23:22, Boris Lublinsky a écrit : > Hi, > I was experimenting with HA latel

State migration into multiple operators

2019-05-14 Thread bastien dine
containing field1, field2 I would like to split it into two operator B & C with, respectively, state field1 and state field2 How can I split my state upon multiple operators ? Best Regards, Bastien ------ Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io

H-A Deployment : Job / task manager configuration

2019-02-05 Thread bastien dine
between job manager / task manager / zk Best Regards, Bastien -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io

Submit a job to a remote cluster : RemoteEnvironnement or ClientLevel ?

2019-02-01 Thread bastien dine
sing the clientLevel to submit a program ? (with a regular execution environment ) https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html#client-level What are the caveats of each methods ? Best Regards, Bastien -- Bastien DINE Data Architect / Software Engi

Re: One TaskManager per node or multiple TaskManager per node

2019-01-15 Thread bastien dine
Hello Jamie, Does #1 apply to batch jobs too ? Regards, -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le lun. 14 janv. 2019 à 20:39, Jamie Grier a écrit : > There are a lot of different ways to deploy Flink. It would be easier to >

Re: Flink on Kubernetes - Hostname resolution between job/tasks-managers

2019-01-15 Thread bastien dine
Nevermind.. Problem already discussed in thread : Flink 1.7 jobmanager tries to lookup taskmanager by its hostname in k8s environment" ------ Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le mar. 15 janv. 2019 à 15:16, bastien dine a écrit : &g

Flink on Kubernetes - Hostname resolution between job/tasks-managers

2019-01-15 Thread bastien dine
eed to had a service or something to make it work ? Best Regards, Bastien -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io

Re: Question about key group / key state & parallelism

2018-12-12 Thread bastien dine
cally do the processing, right ? How can I deal with that ? Best Regard and many thanks ! Bastien -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le mer. 12 déc. 2018 à 13:39, Hequn Cheng a écrit : > Hi Bastien, > > Each key “belongs” to exa

Question about key group / key state & parallelism

2018-12-12 Thread bastien dine
is related, does "works with the keys" means always the same keys ? Best Regards, Bastien -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io

Re: Flink - Metric are not reported

2018-11-28 Thread bastien dine
Yea, that was I was thinking.. Batch can be quick Can I report metric on "added" action ? (i should override the notifyOnAddedMetric to report ?) ------ Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le mer. 28 nov. 2018 à 15:54, Chesnay Scheple

Re: Flink - Metric are not reported

2018-11-28 Thread bastien dine
logy ? *Note* : I am using DataSet API (so my program is batch, and not continuous) Regards, Bastien -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le mar. 27 nov. 2018 à 17:07, Chesnay Schepler a écrit : > Please enable WARN logging an

Flink - Metric are not reported

2018-11-27 Thread bastien dine
Name)) .output(new MetricGaugeOutputFormat<>()); // dummy output -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io

Re: Multiple env.execute() into one Flink batch job

2018-11-23 Thread bastien dine
Oh god, if we have some code with Accumulator after the env.execute(), this will not be executed on the JobManager too ? Thanks, I would be interested indeed ! -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le ven. 23 nov. 2018 à 16:37, Flavio

Multiple env.execute() into one Flink batch job

2018-11-23 Thread bastien dine
Hello, I need to chain processing in DataSet API, so I am launching severals jobs, with multiple env.execute() : topology1.define(); env.execute; topogy2.define(); env.execute; This is working fine when I am running it within IntellIiJ But when I am deploying it into my cluster, it only launch

Re: Call batch job in streaming context?

2018-11-23 Thread bastien dine
Hi Eric, You can run a job from another one, using the REST API This is the only way we have found to launch a batch job from a streaming job -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le ven. 23 nov. 2018 à 11:52, Piotr Nowojski a écrit

DataSet - Broadcast set in output format

2018-11-22 Thread bastien dine
Hello, I would like to use a broadcast variable in my outputformat (to pass some information, and control execution flow) How would I do it ? .output does not have a .withBroadcast function as it does not extends SingleInputUdfOperator -- Bastien DINE Data Architect / Software

Re: Metric on JobManager

2018-11-21 Thread bastien dine
} -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le mer. 21 nov. 2018 à 22:44, Jamie Grier a écrit : > What you're describing is not possible. There is no runtime context or > metrics you can use at that point. > > The best you can

Metric on JobManager

2018-11-21 Thread bastien dine
(i.e not in a operator) before & after the env.execute.. How can i retrieve the runtime context, from the execution env maybe ? Regards, Bastien -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io

Re: Counting DataSet in DataFlow

2018-11-07 Thread bastien dine
Hi Fabian, Thanks for the response, I am going to use the second solution ! Regards, Bastien -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le mer. 7 nov. 2018 à 14:16, Fabian Hueske a écrit : > Another option for certain tasks is to w

Counting DataSet in DataFlow

2018-11-07 Thread bastien dine
Hello, I would like to a way to count a dataset to check if it is empty or not.. But .count() throw an execution and I do not want to do separe job execution plan, as hthis will trigger multiple reading.. I would like to have something like.. Source -> map -> count -> if 0 -> do someting

Re: Flink - Process datastream in a bounded context (like Dataset) - Unifying stream & batch

2018-09-25 Thread bastien dine
Best, Hequn > [1] https://flink.apache.org/flink-applications.html#layered-apis > [2] > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html#joins > > On Tue, Sep 25, 2018 at 4:14 PM bastien dine > wrote: > >> Hello everyone, >> >> I need

Flink - Process datastream in a bounded context (like Dataset) - Unifying stream & batch

2018-09-25 Thread bastien dine
Hello everyone, I need to join some files to perform some processing.. The dataset API is a perfect way to achieve this, I am able to do it when I read file in batch (csv) However in the prod environment, I will receive thoses files in kafka messages (one message = one line of a file) So I am