Re: Flink-1.13 UDAF registration: accumulator type inference fails

2021-07-14 Thread Tianwang Li
Tested and verified, it works. Thanks!

    @FunctionHint(
            accumulator = @DataTypeHint(value = "RAW", bridgedTo = HyperLogLogPlus.class),
            input = @DataTypeHint("STRING"),
            output = @DataTypeHint("BIGINT"))
    public void accumulate(HyperLogLogPlus acc, String id) {
        acc.offer(id);
    }

Caizhi Weng
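
For reference, a minimal sketch of how those hints might sit on a complete AggregateFunction. Everything beyond the accumulate() shown in the thread (the class name, the precision, and the cardinality-based getValue()) is illustrative, not taken from the thread:

    import com.clearspring.analytics.stream.cardinality.HyperLogLogPlus;
    import org.apache.flink.table.annotation.DataTypeHint;
    import org.apache.flink.table.annotation.FunctionHint;
    import org.apache.flink.table.functions.AggregateFunction;

    // Explicit hints keep Flink 1.13 from reflectively analyzing the
    // accumulator class, which is what fails for HyperLogLogPlus.
    @FunctionHint(
            accumulator = @DataTypeHint(value = "RAW", bridgedTo = HyperLogLogPlus.class),
            input = @DataTypeHint("STRING"),
            output = @DataTypeHint("BIGINT"))
    public class HllCountFunction extends AggregateFunction<Long, HyperLogLogPlus> {

        @Override
        public HyperLogLogPlus createAccumulator() {
            return new HyperLogLogPlus(14); // precision is illustrative
        }

        public void accumulate(HyperLogLogPlus acc, String id) {
            acc.offer(id);
        }

        @Override
        public Long getValue(HyperLogLogPlus acc) {
            return acc.cardinality();
        }
    }

Such a class would presumably be registered with tableEnv.createTemporarySystemFunction("hlp_count", HllCountFunction.class), matching the function name in the error elsewhere in this thread.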

Re: Flink large-window performance problem

2021-07-14 Thread Wanghui (HiCampus)
Would increasing the parallelism also work? On 2021/07/15 02:45:18, "Michael Ran" <g...@163.com> wrote: > Either increase memory or parallelism, shrink the window, or reduce the data retention time > > On 2021-07-15 10:23:25, "Hui Wang" <46...@qq.com.INVALID> wrote: > > >A large Flink window buffers so much data that the JVM does frequent full GCs, processing becomes extremely slow, and the job finally OOMs. How should this be tuned? > > >

Re: Flink large-window performance problem

2021-07-14 Thread Wanghui (HiCampus)
In an aarch64 + JRE 8 environment, I hit the following error when using the RocksDB state backend. Also, can RocksDB solve the large-window OOM problem, and what is the underlying principle? Caused by: java.lang.Exception: Exception while creating StreamOperatorStateContext. at

Re: Flink large-window performance problem

2021-07-14 Thread Hui Wang
My system is aarch64; when using RocksDB as the state backend, the following error occurs: Caused by: java.io.IOException: Could not load the native RocksDB library at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.ensureRocksDBIsLoaded(RocksDBStateBackend.java:948) at
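
A hedged sketch of how the RocksDB backend is enabled on Flink 1.13 (the checkpoint path is a placeholder). The aarch64 failure itself is a platform issue: the error means the native RocksDB binary bundled with Flink could not be loaded for this architecture.

    import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RocksDbJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // RocksDB spills keyed state to local disk, so large-window state
            // is bounded by disk rather than the JVM heap. That is why it can
            // relieve the full-GC/OOM pressure discussed in this thread.
            env.setStateBackend(new EmbeddedRocksDBStateBackend());
            env.getCheckpointConfig().setCheckpointStorage("file:///tmp/checkpoints");
            env.enableCheckpointing(60_000);
            // ... build the pipeline here, then call env.execute()
        }
    }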

Re: Flink-1.13 UDAF registration: accumulator type inference fails

2021-07-14 Thread Caizhi Weng
Hi! Automatic type inference has changed since Flink 1.11; you may need to add some annotations so the types can be inferred. See the documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/table/functions/udfs/#%e7%b1%bb%e5%9e%8b%e6%8e%a8%e5%af%bc Tianwang Li wrote on Thu, Jul 15, 2021 at 10:14 AM: > Registering a UDAF on Flink-1.13 fails with accumulator type inference, >

Flink UDF errors out when parsing with jsonpath

2021-07-14 Thread casel.chen
The project uses the dependencies below; flink-shaded-hadoop-2-uber is needed because jobs are submitted through the YARN executor.

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-planner-blink_2.12</artifactId>
        <version>1.12.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-shaded-hadoop-2-uber</artifactId>
        <version>2.8.3-10.0</version>
    </dependency>

Re: Flink large-window performance problem

2021-07-14 Thread Jingsong Li
Are you not using RocksDB? On Thu, Jul 15, 2021 at 10:46 AM Michael Ran wrote: > Either increase memory or parallelism, shrink the window, or reduce the data retention time > On 2021-07-15 10:23:25, "Hui Wang" <463329...@qq.com.INVALID> wrote: > >A large Flink window buffers so much data that the JVM does frequent full GCs, processing becomes extremely slow, and the job finally OOMs. How should this be tuned? > -- Best, Jingsong Lee

Re: Flink large-window performance problem

2021-07-14 Thread Michael Ran
Either increase memory or parallelism, shrink the window, or reduce the data retention time. On 2021-07-15 10:23:25, "Hui Wang" <463329...@qq.com.INVALID> wrote: >A large Flink window buffers so much data that the JVM does frequent full GCs, processing becomes extremely slow, and the job finally OOMs. How should this be tuned?

Flink large-window performance problem

2021-07-14 Thread Hui Wang
A large Flink window buffers so much data that the JVM does frequent full GCs, processing becomes extremely slow, and the job finally OOMs. How should this be tuned?
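
One common mitigation, sketched here under the assumption that the job buffers raw events per window: aggregate incrementally so the window keeps a single accumulator per key instead of every event. The input stream and its Tuple2 type are assumptions:

    import org.apache.flink.api.common.functions.ReduceFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // reduce() is applied as each event arrives, so the window state is one
    // Tuple2 per key, not the full event list a process()-only window keeps.
    DataStream<Tuple2<String, Long>> counts = input
            .keyBy(t -> t.f0)
            .window(TumblingEventTimeWindows.of(Time.hours(1)))
            .reduce((ReduceFunction<Tuple2<String, Long>>)
                    (a, b) -> Tuple2.of(a.f0, a.f1 + b.f1));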

Re: Flink large-window performance problem

2021-07-14 Thread [sender name lost to encoding corruption]
[Short reply lost to encoding corruption] ---- Original ---- From: "user-zh"

Flink-1.13 UDAF registration: accumulator type inference fails

2021-07-14 Thread Tianwang Li
Registering a UDAF on Flink-1.13 fails with accumulator type inference; the same worked on Flink-1.10. How should a UDAF be registered in the new version? Error message: Caused by: org.apache.flink.table.api.ValidationException: SQL validation failed. An error occurred in the type inference logic of function 'default_catalog.default_database.hlp_count'. at

Re: High DirectByteBuffer Usage

2021-07-14 Thread bat man
Hi Timo, I am looking at these options. However, I have a couple of questions: 1. The off-heap usage grows over time. My job does not do any off-heap operations, so I don't think there is a leak there. Even after GC it keeps adding a few MBs after hours of running. 2. Secondly, I am seeing as the

Re: Flink UDF Scalar Function called only once for all rows of select SQL in case of no parameter passed

2021-07-14 Thread shamit jain
Thanks!! On 2021/07/14 02:26:47, JING ZHANG wrote: > Hi, Shamit Jain, > In fact, it is an optimization to simplify expressions. > If a UDF has no parameters, the optimizer looks at it as an expression > which always generates a constant result. > So it would be calculated once in optimization
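
Where that folding is unwanted (for example, a parameterless UDF that reads a clock), the usual escape hatch is to declare the function non-deterministic. A minimal sketch:

    import org.apache.flink.table.functions.ScalarFunction;

    public class CurrentMillis extends ScalarFunction {

        public long eval() {
            return System.currentTimeMillis();
        }

        // A non-deterministic function cannot be constant-folded into a
        // single precomputed value, so eval() runs for every row again.
        @Override
        public boolean isDeterministic() {
            return false;
        }
    }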

Re: Kafka Consumer stop consuming data

2021-07-14 Thread Aeden Jameson
Yes, that's the scenario I was referring to. Besides that, I'm not sure what another source of your issue could be. On Tue, Jul 13, 2021 at 5:35 PM Jerome Li wrote: > Hi Aeden, > > > > Thanks for getting back. > > > > Do you mean one of the partitions is in idle state and not new watermark >
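
If an idle partition is indeed holding back the watermark, a hedged sketch of the usual remedy (the event type and the durations are assumptions):

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;

    // withIdleness() marks a partition idle after the given gap, so its
    // missing watermarks stop holding back the downstream minimum watermark.
    WatermarkStrategy<MyEvent> strategy = WatermarkStrategy
            .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
            .withIdleness(Duration.ofMinutes(1));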

Re: java.lang.Exception: Could not complete the stream element: Record @ 1626200691540 :

2021-07-14 Thread Ragini Manjaiah
Hi, Following the suggestion, I overrode the timeout method in the async function. The Flink job processes real-time events for a few minutes and later hangs, not processing at all. Is there any issue with the method below? I see 0 records per second. Can you please help here? @Override public void
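
For comparison, a hedged sketch of an async function whose timeout() honors the contract (the String-to-String types and the lookup are assumptions). The key point: both asyncInvoke() and timeout() must complete the ResultFuture; a timeout() that returns without completing it leaves the operator waiting forever, which looks exactly like a job that hangs after a few minutes:

    import java.util.Collections;
    import java.util.concurrent.CompletableFuture;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

    public class LookupFunction extends RichAsyncFunction<String, String> {

        @Override
        public void asyncInvoke(String input, ResultFuture<String> resultFuture) {
            CompletableFuture
                    .supplyAsync(input::toUpperCase) // placeholder for the real lookup
                    .thenAccept(v -> resultFuture.complete(Collections.singleton(v)));
        }

        @Override
        public void timeout(String input, ResultFuture<String> resultFuture) {
            // Drop the timed-out element instead of failing or hanging the job.
            resultFuture.complete(Collections.emptyList());
        }
    }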

Re: Dead Letter Queue for JdbcSink

2021-07-14 Thread Maciej Bryński
This is the idea. Of course you need to wrap more functions, like: open, close, notifyCheckpointComplete, snapshotState, initializeState and setRuntimeContext. The problem is that if you want to catch the problematic record you need to set the batch size to 1, which gives very bad performance. Regards,
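
A hedged sketch of that wrapper idea (class names are invented, and only invoke() is shown; as noted above, a real version would also delegate open, close, snapshotState, and the rest):

    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.apache.flink.streaming.api.functions.sink.SinkFunction;

    // Delegates to the primary sink and diverts records it rejects to a
    // dead-letter sink. With a JDBC batch size above 1 a whole batch fails
    // at once, hence the batch-size-1 caveat above.
    public class DeadLetterSinkWrapper<T> extends RichSinkFunction<T> {

        private final SinkFunction<T> primary;
        private final SinkFunction<T> deadLetter;

        public DeadLetterSinkWrapper(SinkFunction<T> primary, SinkFunction<T> deadLetter) {
            this.primary = primary;
            this.deadLetter = deadLetter;
        }

        @Override
        public void invoke(T value, Context context) throws Exception {
            try {
                primary.invoke(value, context);
            } catch (Exception e) {
                deadLetter.invoke(value, context); // route the failed record
            }
        }
    }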

Re: Dead Letter Queue for JdbcSink

2021-07-14 Thread Rion Williams
Hi Maciej, Thanks for the quick response. I wasn't aware of the idea of using a SinkWrapper, but I'm not quite certain that it would suit this specific use case (as a SinkFunction / RichSinkFunction doesn't appear to support side-outputs). Essentially, what I'd hope to accomplish would be to pick

Re: Dead Letter Queue for JdbcSink

2021-07-14 Thread Maciej Bryński
Hi Rion, We have implemented such a solution with a Sink Wrapper. Regards, Maciek On Wed, Jul 14, 2021 at 4:21 PM Rion Williams wrote: > > Hi all, > > Recently I've been encountering an issue where some external dependencies or > process causes writes within my JDBCSink to fail (e.g. something is

Re: Question about POJO rules - why fields should be public or have public setter/getter?

2021-07-14 Thread Timo Walther
Hi Naehee, the serializer for case classes is generated using the Scala macro that is also responsible for extracting the TypeInformation implicitly from your DataStream API program. It should be possible to use the POJO serializer with case classes. But wouldn't it be easier to just use regular
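
For contrast, a minimal Java class that does satisfy Flink's POJO rules (names are illustrative): a public class, a public no-argument constructor, and every field either public or exposed through a conventional getter/setter pair. Classes missing any of these fall back to the slower Kryo serializer:

    public class SensorReading {

        public String sensorId;     // public field: serialized directly

        private double temperature; // private field: needs the getter/setter below

        public SensorReading() {}   // required no-arg constructor

        public double getTemperature() {
            return temperature;
        }

        public void setTemperature(double temperature) {
            this.temperature = temperature;
        }
    }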

Re: Stateful Functions PersistentTable duration

2021-07-14 Thread Ammon Diether
Excellent, thank you. On Wed, Jul 14, 2021 at 5:53 AM Igal Shilman wrote: > Hi Ammon, > > The duration is per item, and the cleanup happens transparently and > incrementally via RocksDB (background compactions with a custom filter) [1] > > In your example a gets cleaned up, while b will be

Re: Running Flink Dataset jobs Sequentially

2021-07-14 Thread Ken Krugler
Hi Jason, Yes, I write the files inside of the mapPartition function. Note that you can get multiple key groups inside of one partition, so you have to manage your own map from the key group to the writer. The Flink DAG ends with a DiscardingSink, after the mapPartition. And no, we didn’t
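
A hedged sketch of that pattern (Record, RecordWriter, and the key field are hypothetical stand-ins for whatever the real job uses):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.flink.api.common.functions.RichMapPartitionFunction;
    import org.apache.flink.util.Collector;

    // One partition can contain several key groups, so keep one writer per
    // key instead of assuming a single file per partition.
    public class PartitionedFileWriter extends RichMapPartitionFunction<Record, Void> {

        @Override
        public void mapPartition(Iterable<Record> records, Collector<Void> out) throws Exception {
            Map<String, RecordWriter> writers = new HashMap<>();
            for (Record r : records) {
                writers.computeIfAbsent(r.key, RecordWriter::new).write(r);
            }
            for (RecordWriter w : writers.values()) {
                w.close(); // flush and close each key's file
            }
        }
    }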

Dead Letter Queue for JdbcSink

2021-07-14 Thread Rion Williams
Hi all, Recently I've been encountering an issue where some external dependencies or process causes writes within my JDBCSink to fail (e.g. something is being inserted with an explicit constraint that never made its way there). I'm trying to see if there's a pattern or recommendation for

Re: Flink 1.13.1 PartitionNotFoundException

2021-07-14 Thread Debraj Manna
I have increased it to 9 and it seems to be running fine. If I still see the failure when I add some load, I will post back in this thread. On Wed, Jul 14, 2021 at 7:19 PM Debraj Manna wrote: > Yes, I forgot to mention in my first email. I have tried increasing >

Re: [External] NullPointerException on accumulator after Checkpointing

2021-07-14 Thread Timo Walther
Hi Clemens, first of all, can you try to use the MapView within an accumulator POJO class? This might solve your exception. I'm not sure if we support the views as top-level accumulators. In any case this seems to be a bug. I will open an issue once I get your feedback. We might simply throw
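
A hedged sketch of the suggested shape (field names are illustrative): the MapView sits inside the accumulator POJO rather than being the accumulator itself:

    import org.apache.flink.table.api.dataview.MapView;

    // Used as the ACC type of an AggregateFunction; Flink backs the view
    // with state instead of materializing the whole map in the accumulator.
    public class MyAccumulator {
        public MapView<String, Integer> counts = new MapView<>();
        public long total = 0L;
    }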

Re: Upsert Kafka SQL Connector used as a sink does not generate an upsert stream

2021-07-14 Thread Timo Walther
Hi Carlos, currently, the changelog output might not always be optimal. We are continuously improving this. For the upsert Kafka connector, we have added a reducing buffer to avoid those tombstone messages: https://issues.apache.org/jira/browse/FLINK-21191 Unfortunately, this is only
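
A hedged sketch of enabling that buffer on a Flink 1.13 upsert-kafka sink (tableEnv, the schema, topic, and servers are placeholders); both buffer options must be set for the reducing buffer to kick in:

    tableEnv.executeSql(
        "CREATE TABLE out_upsert (" +
        "  id STRING," +
        "  cnt BIGINT," +
        "  PRIMARY KEY (id) NOT ENFORCED" +
        ") WITH (" +
        "  'connector' = 'upsert-kafka'," +
        "  'topic' = 'out'," +
        "  'properties.bootstrap.servers' = 'localhost:9092'," +
        "  'key.format' = 'json'," +
        "  'value.format' = 'json'," +
        "  'sink.buffer-flush.max-rows' = '1000'," +
        "  'sink.buffer-flush.interval' = '1s'" +
        ")");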

Re: High DirectByteBuffer Usage

2021-07-14 Thread Timo Walther
Hi Hemant, did you check out the dedicated page for memory configuration and troubleshooting: https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-direct-buffer-memory
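
A hedged flink-conf.yaml sketch of the knobs that page discusses (the sizes are placeholders to tune, not recommendations):

    # Headroom for direct (off-heap) allocations made by user code.
    taskmanager.memory.task.off-heap.size: 256m
    # JVM overhead is also carved out of direct memory.
    taskmanager.memory.jvm-overhead.max: 1g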

Re: Stateful Functions PersistentTable duration

2021-07-14 Thread Igal Shilman
Hi Ammon, The duration is per item, and the cleanup happens transparently and incrementally via RocksDB (background compactions with a custom filter) [1] In your example a gets cleaned up, while b will be cleaned in ~10min. Kind regards, Igal. [1]
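
A hedged sketch of per-item expiration in the embedded SDK (the class and state names are invented):

    import java.time.Duration;
    import org.apache.flink.statefun.sdk.Context;
    import org.apache.flink.statefun.sdk.StatefulFunction;
    import org.apache.flink.statefun.sdk.annotations.Persisted;
    import org.apache.flink.statefun.sdk.state.Expiration;
    import org.apache.flink.statefun.sdk.state.PersistedTable;

    public class SeenTracker implements StatefulFunction {

        // Each entry expires ~10 minutes after it was last written,
        // independently of the other entries in the table.
        @Persisted
        private final PersistedTable<String, Long> seen =
                PersistedTable.of("seen", String.class, Long.class,
                        Expiration.expireAfterWriting(Duration.ofMinutes(10)));

        @Override
        public void invoke(Context context, Object input) {
            seen.set(input.toString(), System.currentTimeMillis());
        }
    }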

Re: Flink 1.13.1 PartitionNotFoundException

2021-07-14 Thread Debraj Manna
Yes, I forgot to mention in my first email. I have tried increasing taskmanager.network.request-backoff.max to 3 in flink-conf.yaml. But I am getting the same error. On Wed, Jul 14, 2021 at 7:10 PM Timo Walther wrote: > Hi Debraj, > > I could find quite a few older emails that were

Re: Process finite stream and notify upon completion

2021-07-14 Thread Tamir Sagi
Thank you so much, mate. Now it makes sense to me. I will test it and keep you posted if anything else comes up. Best, Tamir.

Re: Flink 1.13.1 PartitionNotFoundException

2021-07-14 Thread Timo Walther
Hi Debraj, I could find quite a few older emails that were suggesting to play around with the `taskmanager.network.request-backoff.max` option. This was also recommended in the link that you shared. Have you tried it? Here is some background:

Re: Process finite stream and notify upon completion

2021-07-14 Thread Piotr Nowojski
Hi Tamir, Ok, let's take a step back. First of all let's assume we have a bounded source already. If so, when this source ends, it will emit MAX_WATERMARK shortly before closing and ending the stream. This MAX_WATERMARK will start flowing through your job graph firing all remaining event time
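
A hedged sketch of one way to observe that final watermark (the String types are illustrative): an event-time timer at Long.MAX_VALUE can only fire when MAX_WATERMARK arrives, so onTimer() becomes a per-key end-of-input hook:

    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class EndOfInputNotifier extends KeyedProcessFunction<String, String, String> {

        @Override
        public void processElement(String value, Context ctx, Collector<String> out) {
            // Re-registering the same timestamp is deduplicated, so this is cheap.
            ctx.timerService().registerEventTimeTimer(Long.MAX_VALUE);
            out.collect(value);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
            out.collect("input finished for key " + ctx.getCurrentKey());
        }
    }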

Re: Process finite stream and notify upon completion

2021-07-14 Thread Timo Walther
Hi Tamir, a nice property of watermarks is that they are kind of synchronized across input operators and their partitions (i.e. parallel instances). Bounded sources will emit a final MAX_WATERMARK once they have processed all data. When you receive a MAX_WATERMARK in your current operator,

Re: Process finite stream and notify upon completion

2021-07-14 Thread Tamir Sagi
Hey Piotr, Thank you for the fast response. The refs are good; however, to be honest, I'm a little confused about the trick with MAX_WATERMARK. Maybe I'm missing something. keep in mind Flink is a distributed system so downstream operators/functions might still be busy for some time

Re: Kafka Consumer Retries Failing

2021-07-14 Thread Piotr Nowojski
Hi Rahul, I would highly doubt that you are hitting the network bottleneck case. It would require either a broken environment/network or throughputs in orders of GB/second. More likely you are seeing empty input pool and you haven't checked the documentation [1]: > inPoolUsage - An estimate of

Flink 1.13.1 PartitionNotFoundException

2021-07-14 Thread Debraj Manna
Hi, I am observing that my Flink job is failing with the below error 2021-07-14T12:07:00.918Z INFO runtime.executiongraph.Execution flink-akka.actor.default-dispatcher-29 transitionState:1446 MetricAggregateFunction -> (Sink: LateMetricSink10, Sink: TSDBSink9) (12/30)

Re: Kafka Consumer Retries Failing

2021-07-14 Thread Rahul Patwari
Thanks, Piotrek. We have two Kafka sources. We are facing this issue for both of them. The downstream tasks with the sources form two independent directed acyclic graphs, running within the same Streaming Job. For Example: source1 -> task1 -> sink1 source2 -> task2 -> sink2 There is

Re: Process finite stream and notify upon completion

2021-07-14 Thread Piotr Nowojski
Hi Tamir, Sorry, I missed that you want to use Kafka. In that case I would suggest trying out the new KafkaSource [1] interface and its built-in boundedness support [2][3]. Maybe it will do the trick? If you want to be notified explicitly about the completion of such a bounded Kafka stream, you
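
A hedged sketch of such a bounded KafkaSource (topic, group, and servers are placeholders): once the source reaches the bound it finishes, emitting MAX_WATERMARK so the job can end on its own:

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setTopics("input-topic")
            .setGroupId("my-group")
            .setStartingOffsets(OffsetsInitializer.earliest())
            .setBounded(OffsetsInitializer.latest()) // stop at the offsets seen at startup
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();
    DataStream<String> stream =
            env.fromSource(source, WatermarkStrategy.noWatermarks(), "bounded-kafka");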

Re: Process finite stream and notify upon completion

2021-07-14 Thread Tamir Sagi
Hey Piotr, Thank you for your response. I saw the exact suggestion answered by David Anderson [1] but did not really understand how it may help. Sources, when finishing, emit org.apache.flink.streaming.api.watermark.Watermark#MAX_WATERMARK. Assuming 10 messages are sent to a Kafka topic,

Re: Kafka Consumer Retries Failing

2021-07-14 Thread Piotr Nowojski
Hi, Waiting for memory from LocalBufferPool is a perfectly normal symptom of backpressure [1][2]. Best, Piotrek [1] https://flink.apache.org/2021/07/07/backpressure.html [2] https://www.ververica.com/blog/how-flink-handles-backpressure On Wed, Jul 14, 2021 at 6:05 AM Rahul Patwari wrote: >

Re: savepoint failure

2021-07-14 Thread Till Rohrmann
Hi Dan, Can you provide us with more information about your job (maybe even the job code or a minimally working example), the Flink configuration, the exact workflow you are doing and the corresponding logs and error messages? Cheers, Till On Tue, Jul 13, 2021 at 9:39 PM Dan Hill wrote: >

Re: flink oom

2021-07-14 Thread Hui Wang
Hi, I do have a state backend enabled: it is the filesystem backend, with checkpoints every half hour. ---Original--- From: "Caizhi Weng"

Re: User Classpath from Plugin

2021-07-14 Thread Chesnay Schepler
You can't access the user classpath from plugins. On 14/07/2021 00:18, Mason Chen wrote: I've read this page (https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/filesystems/plugins/

flink oom

2021-07-14 Thread Hui Wang
Hi: I ran into the following problem: a Flink OVER window with a window size of 1 hour. The job OOMs frequently, and the GC logs show very frequent GC. Does the window buffer all window events?