Re: Flink 1.9Version State TTL parameter configuration it does not work

2020-12-04 Thread Andrey Zagrebin
Hi Yang, (redirecting this to user mailing list as this is not a dev question) I am not sure why the state loading is stuck after enabling the compaction filter but the background cleanup of RocksDB state with TTL will not work without activating the filter. This happens on RocksDB opening in Fli

Re: Caching Mechanism in Flink

2020-11-19 Thread Andrey Zagrebin
Hi Iacovos, As Matthias mentioned tasks' off-heap has nothing to do with the memory segments. This memory component is reserved only for the user code. The memory segments are managed by Flink and used for batch workloads, like in memory joins etc. They are part of managed memory (taskmanager.mem

Re: Logs of JobExecutionListener

2020-11-19 Thread Andrey Zagrebin
> > Which version are you using? > > I used the exact same commands on Flink 1.11.0 and I didn't get the job > > listener output. > > > > On Thu, Nov 19, 2020, 12:53 Andrey Zagrebin wrote: > > > >> Hi Flavio and Aljoscha, > >>

Re: Logs of JobExecutionListener

2020-11-19 Thread Andrey Zagrebin
lusterClient, you are > >> not actually using that manually, right? You're just submitting your job > >> via "bin/flink run ...", right? > >> > >> What's the exact invocation of "bin/flink run" that you're using? > >>

Re: Logs of JobExecutionListener

2020-11-19 Thread Andrey Zagrebin
Hi Flavio, I think I can reproduce what you are reporting (assuming you also pass '--output' to 'flink run'). I am not sure why it behaves like this. I would suggest filing a Jira ticket for this. Best, Andrey On Wed, Nov 18, 2020 at 9:45 AM Flavio Pompermaier wrote: > is this a bug or is it a

Re: Strange behaviour when using RMQSource in Flink 1.11.2

2020-11-18 Thread Andrey Zagrebin
Hi Thomas, I am not an expert on RMQSource connector but your concerns look valid. Could you file a Jira issue in Flink issue tracker? [1] I cannot immediately refer to a committer who could help with this but let's hope that the issue gets attention. If you want to contribute an improvement for

Re: Support of composite data types in flink-parquet

2020-10-20 Thread Andrey Zagrebin
Hi Jon, I have found this ticket [1]. It looks like what you are looking for. Best, Andrey [1] https://issues.apache.org/jira/browse/FLINK-17782 On Tue, Oct 20, 2020 at 4:50 PM Jon Alberdi wrote: > Hello, as stated at > https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connecto

Re: rename error in flink sql

2020-10-20 Thread Andrey Zagrebin
Hi, I am not an SQL expert but I would not expect the original POJO to match the new row with the renamed field. Maybe Timo or Dawid have to add something. Best, Andrey On Tue, Oct 20, 2020 at 4:56 PM 大森林 wrote: > > I'm learning "select"from > official document >

Re: what's the new version of createTemporaryView?

2020-10-20 Thread Andrey Zagrebin
Hi, if you check the JavaDocs of the deprecated 'createTemporaryView', they suggest using another overloaded method: <T> void createTemporaryView(String path, DataStream<T> dataStream, Expression... fields); I suppose it should then be: tEnv.createTemporaryView("orderA", orderA, $("user"), $("product"), $("amount"));

Re: flink session job retention time

2020-10-09 Thread Andrey Zagrebin
Hi Richard, If you mean the retention of completed jobs, there are following options: jobstore.cache-size [1] jobstore.expiration-time [2] jobstore.max-capacity [3] Best, Andrey [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#jobstore-cache-size [2] https://ci.ap

Re: Flink DynamoDB stream connector losing records

2020-09-10 Thread Andrey Zagrebin
I'm using Kinesis analytics application >> which supports only Flink 1.8 >> >> Regards, >> Jiawei >> >> On Thu, Sep 10, 2020 at 10:13 PM Andrey Zagrebin >> wrote: >> >>> Hi Jiawei, >>> >>> Could you try Flink latest

Re: Flink DynamoDB stream connector losing records

2020-09-10 Thread Andrey Zagrebin
Hi Jiawei, Could you try the latest Flink release, 1.11? 1.8 will probably not get bugfix releases. I will cc Ying Xu who might have a better idea about the DynamoDB source. Best, Andrey On Thu, Sep 10, 2020 at 3:10 PM Jiawei Wu wrote: > Hi, > > I'm using AWS kinesis analytics application with Flin

Re: Flink OnCheckpointRollingPolicy streamingfilesink

2020-08-29 Thread Andrey Zagrebin
ommendation for : env.getCheckpointConfig. > *setMaxConcurrentCheckpoints*(concurrentchckpt) ? > > 1 or higher based on what factor. > > > Regards, > Vijay > > > On Tue, Aug 25, 2020 at 8:55 AM Andrey Zagrebin > wrote: > >> Hi Vijay, >> >> I think it depends on your

Re: OOM error for heap state backend.

2020-08-26 Thread Andrey Zagrebin
to JVM heap size on a single node without the > node being used for other tasks ? > I have attached GC log for TM and JM > > > > [1] > https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-fsstatebackend > > Best, > Vishwas > > On W

Re: Example flink run with security options? Running on k8s in my case

2020-08-26 Thread Andrey Zagrebin
Hi Adam, maybe also check your SSL setup in a local cluster to exclude possibly related k8s things. Best, Andrey On Wed, Aug 26, 2020 at 3:59 PM Adam Roberts wrote: > Hey Nico - thanks for the prompt response, good catch - I've just tried > with the two security options (enabling rest and inte

Re: OOM error for heap state backend.

2020-08-26 Thread Andrey Zagrebin
er task manager, this creates a cascading > effect and affects other jobs running on the cluster. My tests were run in > a single node cluster with 1 TM and 4 task slots with a parallelism of 4. > > Best, > Vishwas > > On Tue, Aug 25, 2020 at 10:02 AM Andrey Zagrebin > wrote

Re: How jobmanager and task manager communicates with each other ?

2020-08-25 Thread Andrey Zagrebin
Hi Sidhant, (1) If we are not using Flink's HA services then how we can dynamically > configure task manager nodes to connect to job manager? Any suggestions or > best practices? Not sure what you mean by 'dynamically'. I think you have to restart the task manager with the new configuration to co

Re: Transition Flink job from Java to Scala with state migration

2020-08-25 Thread Andrey Zagrebin
Hi Daksh, You need to find which type causes the problem: Long, MyCustomObject or maybe something else. You could share the logs with full exception stack trace. My guess is that your scala code uses another serializer for the failing type. See also docs to understand serialization in Flink [1] Wh

Re: Flink OnCheckpointRollingPolicy streamingfilesink

2020-08-25 Thread Andrey Zagrebin
Hi Vijay, I think it depends on your job requirements, in particular how many records are processed per second and how much resources you have to process them. If the checkpointing interval is short then the checkpointing overhead can be too high and you need more resources to efficiently keep up

Re: OOM error for heap state backend.

2020-08-25 Thread Andrey Zagrebin
Hi Vishwas, If you use Flink 1.7, check the older memory model docs [1] because you referred to the new memory model of Flink 1.10 in your reference 2. Could you also share a screenshot where you get the state size of 2.5 GB? Do you mean Flink WebUI? Generally, it is quite hard to estimate the on-

Re: Hive module in Flink 1.10 has no plus, greaterThan, etc. functions

2020-08-25 Thread Andrey Zagrebin
Hi Faaron, This mailing list is for support in English. Could you translate your question into English? You can also subscribe to the user mailing list in Chinese to get support in Chinese [1] Best, Andrey [1] user-zh-subscr...@flink.apache.org On Fri, Aug 21, 2020 at 4:43 AM faaron zheng wrot

Re: Monitor the usage of keyed state

2020-08-25 Thread Andrey Zagrebin
Hi Mu, I would suggest to look into RocksDB metrics which you can enable as Flink metrics [1] Best, Andrey [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#rocksdb-native-metrics On Fri, Aug 21, 2020 at 4:27 AM Mu Kong wrote: > Hi community, > > I have a Flink

Re: Is there a way to start a timer without ever receiving an event?

2020-08-12 Thread Andrey Zagrebin
I do not think so. Each timer in KeyedProcessFunction is associated with the key. The key is implicitly set into the context from the record which is currently being processed. On Wed, Aug 12, 2020 at 8:00 AM Marco Villalobos wrote: > In the Stream API KeyedProcessFunction,is there a way to star

Re: JM & TM readiness probe

2020-08-12 Thread Andrey Zagrebin
Hi Alexey, As far as I know, TaskManager does not expose the REST API. ResourceManager redirects some REST calls to TaskManager [1]: /taskmanagers/:taskmanagerid/metrics /taskmanagers/:taskmanagerid/thread-dump These calls may be not so lightweight. I do not know others or how you ask e.g. the sta

Re: Purpose of starting LeaderRetrievalService in DefaultDispatcherResourceManagerComponentFactory#create

2020-07-02 Thread Andrey Zagrebin
Hi Linlin, There may be a historic confusion in terminology. We often refer to 'JobManager' as a component which manages a single job. Names of all related classes usually contain 'JobManager'. At the same time, we can refer to it as a master process in Flink's cluster, potentially running multipl

Re: Performance issue associated with managed RocksDB memory

2020-06-26 Thread Andrey Zagrebin
Hi Juha, > I can also submit the more complex test with the bigger operator and and a > window operator. There's just gonna be more code to read. Can I attach a > file here or how should I submit a larger chuck of code? You can just attach the file with the code. > 2. I'm not sure what would / s

Re: Performance issue associated with managed RocksDB memory

2020-06-25 Thread Andrey Zagrebin
Hi Juha, Thanks for sharing the testing program to expose the problem. This indeed looks suboptimal if X does not leave space for the window operator. I am adding Yu and Yun who might have a better idea about what could be improved about sharing the RocksDB memory among operators. Best, Andrey O

Re: Is State TTL possible with event-time characteristics ?

2020-06-17 Thread Andrey Zagrebin
Hi Arti, Any program can use State with TTL but the state can only expire in processing time at the moment even if you configure event-time characteristics. As Congxian mentioned, the event time for TTL is planned. The cleanup says that it will not be removed 'by default'. The following sections

Re: Internal state and external stores conditional usage advice sought (dynamodb asyncIO)

2020-06-09 Thread Andrey Zagrebin
Hi Orionemail, There is no simple state access in asyncIO operator. I think this would require a custom caching solution. Maybe, other community users solved this problem in some other way. Best, Andrey On Mon, Jun 8, 2020 at 5:33 PM orionemail wrote: > Hi, > > Following on from an earlier em

Re: Flink on yarn : yarn-session understanding

2020-06-09 Thread Andrey Zagrebin
Hi Anuj, Afaik, the REST API should work for both modes. What is the issue? Maybe, some network problem to connect to YARN application master? Best, Andrey On Mon, Jun 8, 2020 at 4:39 PM aj wrote: > I am running some stream jobs that are long-running always. I am currently > submitting each jo

Re: Run command after Batch is finished

2020-06-08 Thread Andrey Zagrebin
Hi Mark, I do not know how you output the results in your pipeline. If you use DataSet#output(OutputFormat outputFormat), you could try to extend the format with a custom close method which should be called once the task of the sink batch operator is done in the task manager. I also cc Aljoscha, m

Re: Data Quality Library in Flink

2020-06-08 Thread Andrey Zagrebin
Hi Anuj, I am not familiar with data quality measurement methods and deequ in depth. What you describe looks like monitoring some data metrics. Maybe, there are other community users aware of better solution. Meanwhile, I would recommend to implement the checks a

Re: The trigger of State TTL

2020-06-02 Thread Andrey Zagrebin
Hi Lec Ssmi, It depends on the state backend you use. The heap state backend needs that either some state access happens or some records get processed in the operator [1]. The RocksDB requires that the state size is big enough (roughly speaking) to trigger the compaction process to clean the expir

Re: Memory issue in Flink 1.10

2020-05-27 Thread Andrey Zagrebin
Hi Steve, RocksDB does not contribute to the JVM direct memory. RocksDB off-heap memory consumption is part of managed memory [1]. You got `OutOfMemoryError: Direct buffer memory` which is related to the JVM direct memory, also off-heap but managed by JVM. The JVM direct memory limit depends on t

Re: Flink 1.8.3 Kubernetes POD OOM

2020-05-24 Thread Andrey Zagrebin
> Hi Andrey, > We don't use RocksDB. As I said in the original email I am using File > Based. Even though our cluster is on Kubernetes, our Flink cluster uses Flink's > standalone resource manager. We have not yet integrated our Flink with > Kubernetes. > >

Re: Flink 1.8.3 Kubernetes POD OOM

2020-05-22 Thread Andrey Zagrebin
Hi Josson, Do you use state backend? is it RocksDB? Best, Andrey On Fri, May 22, 2020 at 12:58 PM Fabian Hueske wrote: > Hi Josson, > > I don't have much experience setting memory bounds in Kubernetes myself, > but my colleague Andrey (in CC) reworked Flink's memory configuration for > the las

Re: Issue with single job yarn flink cluster HA

2020-04-02 Thread Andrey Zagrebin
that too small value >>>> will also cause unexpected failover because of network problem. >>>> >>>> >>>> Best, >>>> Yang >>>> >>>> Dinesh J wrote on Wed, Mar 25, 2020 at 4:20 PM: >>>> >>>>> Hi Andrey, >>>

Re: Issue with single job yarn flink cluster HA

2020-03-24 Thread Andrey Zagrebin
Hi Dinesh, If the current leader crashes (e.g. due to network failures) then getting these messages does not look like a problem during the leader re-election. They look to me just as warnings that caused failover. Do you observe any problem with your application? Does the failover not work, e.g. n

Re: [DISCUSS] FLIP-111: Docker image unification

2020-03-22 Thread Andrey Zagrebin
l > configuration options (e.g., flink_docker_utils set_web_ui_port 8081). It > should be fine enough to do it via flink_docker-utils conifgure rest.port > 8081 if we cannot solve it via the general configuration mechanism. > > Cheers, > Till > > On Wed, Mar 18, 2020 a

Re: Flink long state TTL Concerns

2020-03-20 Thread Andrey Zagrebin
cluster running one job. >>> The total parallelism of the whole cluster is equal to the number of >>> taskmanagers where each task manager has 1 core cpu accounting for 1 slot. >>> If we add a state ttl, do you have any recommendation as to how much I >>> should bu

Re: Flink long state TTL Concerns

2020-03-19 Thread Andrey Zagrebin
Hi Matt, Generally speaking, using state with TTL in Flink should not differ a lot from just using Flink with state [1]. You have to provision your system so that it can keep the state of size which is worth of 7 days. The existing Flink state backends provide background cleanup to automatically

Re: Upgrade flink fail from 1.9.1 to 1.10.0

2020-03-18 Thread Andrey Zagrebin
Hi Reo, I do not think this is always guaranteed by Flink API. The usual supported way is to: - take a savepoint - upgrade the cluster (JM and TM) - maybe rebuild the job against the new Flink version - start the job from the savepoint [1] The externalised checkpoints also do not have to be alwa

Re: Question about RocksDBStateBackend Compaction Filter state cleanup

2020-03-17 Thread Andrey Zagrebin
Hi Lake, When the Flink doc mentions a state entry in RocksDB, we mean one key/value pair stored by user code over any keyed state API (keyed context in keyed operators obtained e.g. from keyBy() transformation). In case of map or list, the doc means map key/value and list element. - value/aggreg

Re: [DISCUSS] FLIP-111: Docker image unification

2020-03-16 Thread Andrey Zagrebin
> > > I would second that it is desirable to support Java 11 and in general > use a base image that allows the (straightforward) use of more recent > versions of other software (Python etc.) > > > > > https://github.com/apache/flink-docker/blob/d3416e720377e9b4c07a2d0f4

Re: Restoring state from an incremental RocksDB checkpoint

2020-03-13 Thread Andrey Zagrebin
//ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/checkpoints.html#directory-structure> > On 14 Mar 2020, at 00:12, Andrey Zagrebin wrote: > > Hi Yuval, > > You should be able to restore from the last checkpoint by restarting the job > with the same checkpoint

Re: Restoring state from an incremental RocksDB checkpoint

2020-03-13 Thread Andrey Zagrebin
Hi Yuval, You should be able to restore from the last checkpoint by restarting the job with the same checkpoint directory. An incremental part is removed only if none of retained checkpoints points to it. Best, Andrey > On 13 Mar 2020, at 16:06, Yuval Itzchakov wrote: > > Hi, > > We're usin
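The retention rule in the reply above (an incremental part is removed only if none of the retained checkpoints points to it) behaves like reference counting over shared files. The following is a minimal plain-Java sketch of that idea only; the class `SharedFileRegistrySketch`, its methods, and the `sst-*` file names are hypothetical and are not Flink's implementation or API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: counts how many retained checkpoints reference each file.
final class SharedFileRegistrySketch {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final Map<Long, List<String>> retained = new HashMap<>();

    /** Records a retained checkpoint and the files (old and new) it points to. */
    void register(long checkpointId, List<String> files) {
        retained.put(checkpointId, new ArrayList<>(files));
        for (String file : files) {
            refCounts.merge(file, 1, Integer::sum);
        }
    }

    /** Drops a checkpoint and returns the files that no retained checkpoint references any more. */
    Set<String> discard(long checkpointId) {
        Set<String> deletable = new TreeSet<>();
        List<String> files = retained.remove(checkpointId);
        if (files == null) {
            return deletable; // unknown or already discarded checkpoint
        }
        for (String file : files) {
            // Returning null from the remapping function removes the entry.
            Integer left = refCounts.computeIfPresent(file, (k, c) -> c == 1 ? null : c - 1);
            if (left == null) {
                deletable.add(file);
            }
        }
        return deletable;
    }

    public static void main(String[] args) {
        SharedFileRegistrySketch registry = new SharedFileRegistrySketch();
        registry.register(1L, Arrays.asList("sst-1", "sst-2"));
        // Checkpoint 2 reuses the files of checkpoint 1 and adds one new file.
        registry.register(2L, Arrays.asList("sst-1", "sst-2", "sst-3"));
        System.out.println(registry.discard(1L)); // prints []
        System.out.println(registry.discard(2L)); // prints [sst-1, sst-2, sst-3]
    }
}
```

In this sketch, discarding checkpoint 1 frees nothing because checkpoint 2 still points to the same files, which is why the latest retained checkpoint stays restorable.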

Re: [Survey] Default size for the new JVM Metaspace limit in 1.10

2020-03-13 Thread Andrey Zagrebin
by the suggested change. Any feedback is appreciated. Best, Andrey On Tue, Mar 3, 2020 at 6:35 PM Andrey Zagrebin wrote: > Hi All, > > Recently, FLIP-49 [1] introduced the new JVM Metaspace limit in the 1.10 > release [2]. Flink scripts, which start the task manager JVM process

Re: [DISCUSS] FLIP-111: Docker image unification

2020-03-10 Thread Andrey Zagrebin
onvenient for users without any learning curve. >> `docker run flink session_jobmanager -D rest.bind-port=8081` >> >> >> > About the logging >> >> Updating the `log4j-console.properties` to support multiple appender is a >> better option. >> Currently,

Re: Flink Deployment failing with RestClientException

2020-03-05 Thread Andrey Zagrebin
Hi Samir, It may be a known issue [1][2] where some action during job submission takes too long but eventually completes in the job manager. Have you checked the job manager logs for any other failures besides "Ask timed out"? Have you checked the Web UI whether all the jobs have been starte

[DISCUSS] FLIP-111: Docker image unification

2020-03-04 Thread Andrey Zagrebin
Hi All, If you have ever touched the docker topic in Flink, you probably noticed that we have multiple places in docs and repos which address its various concerns. We have prepared a FLIP [1] to simplify the perception of docker topic in Flink by users. It mostly advocates for an approach of exte

[Survey] Default size for the new JVM Metaspace limit in 1.10

2020-03-03 Thread Andrey Zagrebin
Hi All, Recently, FLIP-49 [1] introduced the new JVM Metaspace limit in the 1.10 release [2]. Flink scripts, which start the task manager JVM process, set this limit by adding the corresponding JVM argument. This has been done to properly plan resources, especially to derive container size for Yar

Re: Colocating Sub-Tasks across jobs / Sharing Task Slots across jobs

2020-02-27 Thread Andrey Zagrebin
Hi Ben, I think at the moment, it is not possible because of current scheduling design which Xintong has already mentioned. The jobs are completely isolated and there is no synchronisation between their deployment. Alignment of tasks by e.g. key groups in general is difficult as it is up to the sc

Re: Tests in FileUtilsTest while building Flink in local

2020-02-21 Thread Andrey Zagrebin
These tests also fail on my Mac. It may be some macOS setup-related issue. I created a Jira ticket for that: https://issues.apache.org/jira/browse/FLINK-16198 > On 20 Feb 2020, at 12:03, Chesnay Schepler wrote: > > Is the stacktrace identical in both tests? > > Did these fail on the command-l

Re: Rescaling a running topology

2020-02-11 Thread Andrey Zagrebin
Hi Stephen, I am sorry that you had this experience with the rescale API. Unfortunately, the rescale API was always experimental and had some flaws. Recently, Flink community decided to disable it temporarily with the 1.9 release, see more explanation here [1]. I would advise the manual rescaling

Re: Task-manager kubernetes pods take a long time to terminate

2020-02-07 Thread Andrey Zagrebin
Hi guys, It looks suspicious that the TM pod termination is potentially delayed by the reconnect to a killed JM. I created an issue to investigate this: https://issues.apache.org/jira/browse/FLINK-15946 Let's continue the discussion there. Best, Andrey On Wed, Feb 5, 2020 at 11:49 AM Yang Wang

Re: NoClassDefFoundError when an exception is throw inside invoke in a Custom Sink

2020-02-06 Thread Andrey Zagrebin
Hi David, This looks like a problem with resolution of maven dependencies or something. The custom WindowParquetGenericRecordListFileSink probably transitively depends on org/joda/time/format/DateTimeParserBucket and it is missing on the runtime classpath of Flink. Best, Andrey On Wed, Feb 5, 20

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-16 Thread Andrey Zagrebin
Hi Stephan, Thanks for starting this discussion. I am +1 for this change. In general, number of timer state keys can have the same order as number of main state keys. So if RocksDB is used for main state for scalability, it makes sense to have timers there as well unless timers are used for only v

Re: Scala ListBuffer cannot be used as a POJO type in Flink

2019-12-17 Thread Andrey Zagrebin
Hi Utopia, There were already a couple of hints in the comments to your Stack Overflow questions about immutability. In general, I would also recommend this because when you work with Flink state the general API contract is that if you update your state object (schoolDescriptor) you have to call

Re: Job manager is failing to start with an S3 no key specified exception [1.7.2]

2019-12-10 Thread Andrey Zagrebin
Hi Harshith, Could you share your full log files from the job master? As I understand, this stack trace already belongs to a failover attempt, what was the original cause of failover? Do you still have any other job state in S3 for this cluster id `flink-2`? Have you tried the latest vers

Re: Emit intermediate accumulator state of a session window

2019-12-05 Thread Andrey Zagrebin
Hi Chandu, I am not sure whether using the windowing API is helpful in this case at all. At least, you could try to consume the data not only by windowing but also by a custom stateful function. You look into the AggregatingState [1]. Then you could do whatever you want with the current aggregate

[PROPOSAL/SURVEY] Enable background cleanup for state with TTL by default

2019-11-27 Thread Andrey Zagrebin
Hi all, We were thinking about enabling background cleanup for the state with TTL by default: StateTtlConfig#Builder#cleanupInBackground() Previously, we did not have it in the first implementation of TTL if you remember. So technically, we were a bit conservative to not enable it by default at o

Re: ProcessFunction Timer

2019-10-18 Thread Andrey Zagrebin
Hi Navneeth, You could also apply filtering on the incoming records before windowing. This might save you some development effort but I do not know full details of your requirement whether filtering is sufficient. In general, you can use timers as you suggested as the windowing itself works in a s

Re: about Kafka sink and 2PC function

2019-10-18 Thread Andrey Zagrebin
Hi, This is the contract of 2PC transactions. Multiple commit retries should result in only one commit which actually happens in the external system. The external system has to support deduplication of committed transactions, e.g. by some unique id. Best, Andrey > On 10 Oct 2019, at 07:15, 121
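The contract described above (commit retries must result in a single effective commit, with the external system deduplicating by a unique id) can be sketched in plain Java. This is only an illustration under assumptions: `IdempotentCommitSketch` is a hypothetical stand-in for the external system, and the transaction id `txn-42` is made up; it is not the Kafka sink or any Flink API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an external system that deduplicates commits by transaction id.
final class IdempotentCommitSketch {
    private final Map<String, String> committed = new HashMap<>();

    /**
     * Applies the transaction once. Returns true if this call applied it,
     * false if the id was already committed (i.e. the call was a retry).
     */
    boolean commit(String transactionId, String payload) {
        return committed.putIfAbsent(transactionId, payload) == null;
    }

    int committedCount() {
        return committed.size();
    }

    public static void main(String[] args) {
        IdempotentCommitSketch system = new IdempotentCommitSketch();
        boolean first = system.commit("txn-42", "records 1..100");
        // A retry after a failure between "commit sent" and "commit acknowledged".
        boolean retry = system.commit("txn-42", "records 1..100");
        System.out.println(first + " " + retry + " " + system.committedCount()); // prints "true false 1"
    }
}
```

The retry is accepted without error but changes nothing, so the committing side can safely repeat the commit after a restore.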

Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-19 Thread Andrey Zagrebin
> >> > >> Congratulations Andrey! >> > >> >> > >> On Wed, Aug 14, 2019 at 10:14 PM chaojianok >> wrote: >> > >> >> > >> > Congratulations Andrey! >> > >> > At 2019-08-14 21:26:37, "Till Rohrm

Re: Error while running flink job on local environment

2019-07-30 Thread Andrey Zagrebin
Hi Vinayak, the error message provides a hint about changing config options, you could try to use StreamExecutionEnvironment.createLocalEnvironment(2, customConfig); to increase resources. this issue might also address the problem, it will be part of 1.9 release: https://issues.apache.org/jira/bro

Re: StreamingFileSink part file count reset

2019-07-30 Thread Andrey Zagrebin
Hi Sidhartha, This is a general limitation now because Flink does not keep counters for all buckets but only a global one. Flink assumes that the sink can write to any bucket any time and the counter is not reset to not rewrite the previously written file number 0. Best, Andrey On Tue, Jul 30, 2
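The limitation described above (one global counter instead of one counter per bucket) can be illustrated with a small plain-Java sketch. The class `GlobalPartCounterSketch`, the bucket ids, and the simplified `part-<n>` naming are assumptions for illustration and do not reproduce the actual StreamingFileSink code or its file naming.

```java
// Hypothetical illustration of a sink that numbers part files with a single global counter.
final class GlobalPartCounterSketch {
    // One counter shared by all buckets; it is never reset when a new bucket starts.
    private long nextPart = 0;

    /** Returns the (simplified) path of the next part file opened in the given bucket. */
    String openPartFile(String bucketId) {
        return bucketId + "/part-" + (nextPart++);
    }

    public static void main(String[] args) {
        GlobalPartCounterSketch sink = new GlobalPartCounterSketch();
        System.out.println(sink.openPartFile("2019-07-30--10")); // prints 2019-07-30--10/part-0
        System.out.println(sink.openPartFile("2019-07-30--10")); // prints 2019-07-30--10/part-1
        // A new bucket continues from the global counter instead of starting at 0.
        System.out.println(sink.openPartFile("2019-07-30--11")); // prints 2019-07-30--11/part-2
    }
}
```

Because the counter only grows, a file number is never reused, so a late write to an older bucket cannot overwrite a previously written part file.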

Re: Job submission timeout with no error info.

2019-07-22 Thread Andrey Zagrebin
og is at: job > manager log > <https://drive.google.com/file/d/1iNOs2E69jevF9pu1t7uw6Gj2XZKJoWpC/view?usp=sharing> > . > > > > Thanks, > > -Fakrudeen > > (define (sqrte n xn eph) (if (> eph (abs (- n (* xn xn xn (sqrte n (/ > (+ xn (/ n xn)) 2) eph))) >

Re: Job submission timeout with no error info.

2019-07-19 Thread Andrey Zagrebin
Hi Fakrudeen, which Flink version do you use? could you share full client and job manager logs? Best, Andrey On Fri, Jul 19, 2019 at 7:00 PM Fakrudeen Ali Ahmed wrote: > Hi, > > > > We are submitting a Flink topology [YARN] and it fails during upload of > the jar with no error info. > > > > [m

Re: Consuming data from dynamoDB streams to flink

2019-07-19 Thread Andrey Zagrebin
Hi Vinay, 1. I would assume it works similar to kinesis connector (correct me if wrong, people who actually developed it) 2. If you have activated just checkpointing, the checkpoints are gone if you externally kill the job. You might be interested in savepoints [1] 3. See paragraph in [2] about ki

Re: Apache Flink - Event time and process time timers with same timestamp

2019-07-19 Thread Andrey Zagrebin
Hi, Event and processing time timers have independent state storage. You can use both independently, so I would expect two firings with different domains. `TimeCharacteristic` is for operations where you do not explicitly tell the time type, like windowing. Best, Andrey On Fri, Jul 19, 2019 at 8

Re: Checkpoints timing out for no apparent reason

2019-07-19 Thread Andrey Zagrebin
Hi Sergei, If you want just to try increasing the timeouts, you could change the checkpoint timeout in env.getCheckpointConfig().setCheckpointTimeout(...) [1] or s3 client timeouts (see presto or hdfs for s3 configuration, there are some network timeouts) [2]. Otherwise it would be easier to inve

Re: Queryable state and TTL

2019-07-03 Thread Andrey Zagrebin
Hi Avi, It is on the road map but I am not aware about plans of any contributor to work on it for the next releases. I think the community will firstly work on the event time support for TTL. I will loop Yu in, maybe he has some plans to work on TTL for the queryable state. Best, Andrey On Wed,

Re: Could not load the native RocksDB library

2019-07-03 Thread Andrey Zagrebin
Hi Samya, Additionally to Haibo's answer: Have you tried the previous 1.7 version of Flink? The Rocksdb version was upgraded in 1.8 version. Best, Andrey On Wed, Jul 3, 2019 at 5:21 AM Haibo Sun wrote: > Hi, Samya.Patro > > I guess this may be a setup problem. What OS and what version of JDK

Re: Received fatal alert: certificate_unknown

2019-05-17 Thread Andrey Zagrebin
Hi Pedro, thanks for letting us know! Best, Andrey On Fri, May 17, 2019 at 4:29 PM PedroMrChaves wrote: > We found the issue. > > It was using the DNSName for the certificate validation and we were > accessing via localhost. > > > > - > Best Regards, > Pedro Chaves > -- > Sent from: > http://

Re: flink 1.4.2. java.lang.IllegalStateException: Could not initialize operator state backend

2019-05-16 Thread Andrey Zagrebin
The stack trace shows that the state is being restored which has probably already happened after job restart. I am wondering why it has been restarted after some time of running. Could you share full job/task manager logs? On Thu, May 16, 2019 at 6:26 AM anaray wrote: > Thank You Andrey. Arity o

Re: Getting java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError when stopping/canceling job.

2019-05-16 Thread Andrey Zagrebin
flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader.loadClass(FlinkUserCodeClassLoaders.java:129) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 10 more > > On Wed, 15 May 2019 at 12:00, Andrey Zagrebin > wrote: > >> Hi John, >> &g

Re: Checkpoints periodically fail with hdfs as the state backend - Could not flush and close the file system output stream

2019-05-16 Thread Andrey Zagrebin
Hi, could you also post job master logs? and ideally full task manager logs. This failure can be caused by some other previous failure. Best, Andrey On Wed, May 15, 2019 at 2:48 PM PedroMrChaves wrote: > Hello, > > Every once in a while our checkpoints fail with the following exception: > > /A

Re: Getting java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError when stopping/canceling job.

2019-05-15 Thread Andrey Zagrebin
Hi John, could you share the full stack trace or better logs? It looks like something is trying to be executed in vertx.io code after the local task has been stopped and the class loader for the user code has been unloaded. Maybe from some daemon thread pool. Best, Andrey On Wed, May 15, 2019 a

Re: flink 1.4.2. java.lang.IllegalStateException: Could not initialize operator state backend

2019-05-15 Thread Andrey Zagrebin
Hi, I am not sure that FLINK-8836 is related to the failure in the stack trace. You say you are using Flink in production, does it mean it always worked and has started to fail recently? From the stack trace, it looks like the arity of some Tup

Re: Table program cannot be compiled

2019-05-15 Thread Andrey Zagrebin
Hi, I am looping in Timo and Dawid to look at the problem. On Tue, May 14, 2019 at 9:12 PM shkob1 wrote: > BTW looking at past posts on this issue[1] it should have been fixed? i'm > using version 1.7.2 > Also the recommendation was to use a custom function, though that's exactly > what im doing

Re: Migrating Existing TTL State to 1.8

2019-05-15 Thread Andrey Zagrebin
Hi Ning, If you have not activated non-incremental checkpointing then taking a savepoint is the only way to trigger the full snapshot. In any case, it will take time. The incremental cleanup strategy is applicable only for heap state backend and does nothing for RocksDB backend. At the moment, yo

Re: [Discuss] Semantics of event time for state TTL

2019-04-15 Thread Andrey Zagrebin
Hi everybody, Thanks a lot for your detailed feedback on this topic. It looks like we can already do some preliminary wrap-up for this discussion. As far as I see we have the following trends: *Last access timestamp: **Event timestamp of currently being processed record* *Current timestamp to c

[Discuss] Semantics of event time for state TTL

2019-04-04 Thread Andrey Zagrebin
Hi All, As you might have already seen there is an effort tracked in FLINK-12005 [1] to support event time scale for state with time-to-live (TTL) [2]. While thinking about design, we realised that there can be multiple options for semantics of this feature, depending on use case. There is also so

Re: How can I get the right TaskExecutor in ProcessFunction

2019-04-02 Thread Andrey Zagrebin
Hi, What kind of information do you need about the TaskExecutor? This is usually quite low level type of information which might change randomly, e.g. after restore. What is the original problem why you need it? Maybe, there is another solution. E.g. you can get index of local parallel subtask of

Re: Async Function Not Generating Backpressure

2019-03-21 Thread Andrey Zagrebin
e case, we prefer the backpressure to slow down the source so > that the write to Cassandra is not delayed while the source is consuming > fast. > > Thanks, > Seed > > On Wed, Mar 20, 2019 at 9:38 AM Andrey Zagrebin > wrote: > >> Hi Seed, >> >> Sorry

Re: Async Function Not Generating Backpressure

2019-03-20 Thread Andrey Zagrebin
Hi Seed, Sorry for confusion, I see now it is separate. Back pressure should still be created because internal async queue has capacity but not sure about reporting problem, Ken and Till probably have better idea. As for consumption speed up, async operator creates another thread to collect the r

Re: ingesting time for TimeCharacteristic.IngestionTime on unit test

2019-03-19 Thread Andrey Zagrebin
Hi Avi, Do you use a processing time timer (timerService().registerProcessingTimeTimer)? Why do you need ingestion time? Do you set TimeCharacteristic.IngestionTime? Best, Andrey On Tue, Mar 19, 2019 at 1:11 PM Avi Levi wrote: > Hi, > Our stream is not based on time sequence and we do not use ti

Re: Flink 1.7.2 extremely unstable and losing jobs in prod

2019-03-19 Thread Andrey Zagrebin
Hi Bruno, could you also share the job master logs? Thanks, Andrey On Tue, Mar 19, 2019 at 12:03 PM Bruno Aranda wrote: > Hi, > > This is causing serious instability and data loss in our production > environment. Any help figuring out what's going on here would be really > appreciated. > > We

Re: End-to-end exactly-once semantics in simple streaming app

2019-03-19 Thread Andrey Zagrebin
Hi Patrick, One approach, I would try, is to use Flink state and sync it with database in initializeState and CheckpointListener.notifyCheckpointComplete. Basically issue only idempotent updates to database but only when the last checkpoint is securely taken and records before it are not processed

Re: Async Function Not Generating Backpressure

2019-03-19 Thread Andrey Zagrebin
Hi Seed, I think the back pressure should emerge by blocking the AsyncFunction.asyncInvoke call. So it depends on how ResultFuture is generated from the Cassandra client: whether or not it blocks on submitting a request when the number of pending requests is too big. Maybe, AsyncFunction.asy

Re: Re: Flink 1.7.2: All jobs are getting deployed on the same task manager

2019-03-19 Thread Andrey Zagrebin
Hi Kumar and Andrea, this is a known change in Flink behaviour from 1.4 to 1.5 (after FLIP-6). There is an issue to track progress on more fine-grained task distribution [1]. Best, Andrey [1] https://issues.apache.org/jira/browse/FLINK-11815 On Mon, Mar 18, 2019 at 1:28 PM Kumar Bolar, Harshith

Re: Iterator Data Sync

2019-03-19 Thread Andrey Zagrebin
Hi Mikhail, could you create a JIRA issue to discuss the change? Best, Andrey On Mon, Mar 18, 2019 at 3:10 PM Mikhail Pryakhin wrote: > Hello Flink community! > > I've come across of employing an "Iterator Data Sync"[1] approach to test > output from a streaming pipeline. The pipeline consists

Re: Is there a Flink DataSet equivalent to Spark's RDD.persist?

2019-02-26 Thread Andrey Zagrebin
Hi Frank, This feature is currently under discussion. You can follow it in this issue: https://issues.apache.org/jira/browse/FLINK-11199 Best, Andrey On Thu, Feb 21, 2019 at 7:41 PM Frank Grimes wrote: > Hi, > > I'm trying to port an existing Spark job to Flink and have gotten stuck on > the s

Re: Flink Streaming Job Task is getting cancelled silently and causing job to restart

2019-02-25 Thread Andrey Zagrebin
Hi Sohi, There seems to be no Avro implementation of the Encoder interface used in StreamingFileSink, but maybe it could be implemented based on AvroKeyValueWriter without much effort. There is also a DefaultRollingPolicy which is based on time and number of records. It might create a temporar

Re: Metrics for number of "open windows"?

2019-02-25 Thread Andrey Zagrebin
Hi Andrew, Just to add to Rong's answer, if you use the RocksDB state backend, you can activate state metrics forwarded from RocksDB [1]. Best, Andrey [1] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#rocksdb On Thu, Feb 21, 2019 at 11:22 PM Rong Rong wrote: > Hi

Re: Flink Streaming Job Task is getting cancelled silently and causing job to restart

2019-02-25 Thread Andrey Zagrebin
Hi Sohi, I would also recommend trying the newer StreamingFileSink which is available in Flink 1.7.x [1]. Best, Andrey [1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/streamfile_sink.html On Sun, Feb 24, 2019 at 4:14 AM sohimankotia wrote: > Hi Erik, > > Are you

Re: Flink 1.6.1 Kerberos configuration

2019-02-22 Thread Andrey Zagrebin
Hi Marke, which storage layer causes the problem? Not sure, but some implementations might use different approaches internally and not update ticket automatically or use hadoop/jaas security. Best, Andrey On Fri, Feb 22, 2019 at 9:45 AM Marke Builder wrote: > Hello, > > I'm using flink 1.6.1 f

Re: Jira issue Flink-11127

2019-02-22 Thread Andrey Zagrebin
cc alek...@ververica.com On Fri, Feb 22, 2019 at 1:28 AM Boris Lublinsky < boris.lublin...@lightbend.com> wrote: > Adding metric-query port makes it a bit better, but there is still an error > > > 019-02-22 00:03:56,173 INFO > org.apache.flink.runtime.taskexecutor.TaskExecutor- Could

Re: Production readiness

2019-02-15 Thread Andrey Zagrebin
Hi Aitozi, Flink will check upon job start and fail if - parallelism > max parallelism (KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex) or - max parallelism of savepoint > max parallelism of restored job (Checkpoints.loadAndValidateCheckpoint). Theoretically that would be possible w

Re: Impact of occasional big pauses in stream processing

2019-02-14 Thread Andrey Zagrebin
Hi Ajay, Technically, it will immediately block the thread of MyKeyedProcessFunction subtask scheduled to some slot and basically block processing of the key range assigned to this subtask. Practically, I agree with Rong's answer. Depending on the topology of your inputStream, it can eventually bl
