Re: 广告业务中 如何用flink替换spark一些逻辑处理,是否需要用到processfunction

2021-08-17 Thread yidan zhao
逻辑混乱,没听懂你的需求。大搜? 张锴 于2021年8月18日周三 上午10:26写道: > > 需求描述: > 需要将目前的spark程序替换成flink去做,在梳理逻辑的时候有一块不知道用flink咋实现,spark是按每三分钟一个批次来跑的。 > 描述如下: > 广告日志按照ask日志->bid->show->click顺序流程,要求是要将不同的日志都与bid日志merge,来保证bid数据的完整性,key按sessionid+Adid做唯一 > 逻辑:spark读取多个日志topic >

flink任务触发检查点时报错,非必现。并发访问Map异常。

2021-08-17 Thread yidan zhao
下面是异常栈,我检查了出问题的那个task,该task包含2个算子A和B。 B是异步算子,但是目前无任何状态。A是广播处理算子(接受普通流和广播流),也仅用到broadcast state。 请问有人能分析下啥问题导致的Map并发访问问题吗。 2021-08-18 06:18:37 java.io.IOException: Could not perform checkpoint 575 for operator ual_transform_UserLogBlackUidJudger -> ual_transform_IpLabel (18/60)#0. at

广告业务中 如何用flink替换spark一些逻辑处理,是否需要用到processfunction

2021-08-17 Thread 张锴
需求描述: 需要将目前的spark程序替换成flink去做,在梳理逻辑的时候有一块不知道用flink咋实现,spark是按每三分钟一个批次来跑的。 描述如下: 广告日志按照ask日志->bid->show->click顺序流程,要求是要将不同的日志都与bid日志merge,来保证bid数据的完整性,key按sessionid+Adid做唯一 逻辑:spark读取多个日志topic

Re: PyFlink StreamingFileSink bulk-encoded format (Avro)

2021-08-17 Thread Dian Fu
Hi Kamil, AFAIK, it should still not support Avro format in Python StreamingFileSink in the Python DataStream API. However, I guess you could convert DataStream to Table[1] and then you could use all the connectors supported in the Table & SQL. In this case, you could use the FileSystem

Re: Flink taskmanager in crash loop

2021-08-17 Thread Yangze Guo
> 2021-08-16 15:58:13.986 [Cancellation Watchdog for Source: MASKED] ERROR > org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal > error > occurred while executing the TaskManager. Shutting it down... > org.apache.flink.util.FlinkRuntimeException: Task did not exit gracefully >

Re: Handling HTTP Requests for Keyed Streams

2021-08-17 Thread JING ZHANG
Hi Rion, Your solution is good. It seems that you need enrich a stream with data queries from external Http request. There is another solution for reference, just like the mechanism of lookup join in Flink SQL. Lookup Join in Flink SQL supports two modes: Async mode and Sync mode. For each input

flink Kinesis Consumer Connected but not consuming

2021-08-17 Thread tarun joshi
Hey All, I am running flink in docker containers (image Tag :flink:scala_2.11-java11) on EC2. I am able to connect to a Kinesis Connector but nothing is being consumed. My command to start Jobmanager and TaskManager : *docker run \--rm \--volume /root/:/root/ \--env

Re: flink not able to get scheme for S3

2021-08-17 Thread tarun joshi
Thanks Chesnay ! that helped me resolve the issue On Fri, 6 Aug 2021 at 04:31, Chesnay Schepler wrote: > The reason this doesn't work is that your application works directly > against Hadoop. > The filesystems in the plugins directory are only loaded via specific > code-paths, specifically

Re: redis sink from flink

2021-08-17 Thread Jin Yi
great, thanks for the pointers everyone. i'm going to pursue rolling my own built around lettuce since it seems more feature-full wrt async semantics. On Mon, Aug 16, 2021 at 7:21 PM Yik San Chan wrote: > By the way, this post in Chinese showed how we do it exactly with code. > >

Re: Can we release new flink-connector-redis? Thanks!

2021-08-17 Thread Hongbo Miao
Really appreciate, Austin! Hongbo On Aug 17, 2021, 10:33 -0700, Austin Cawley-Edwards , wrote: > Hi Hongbo, > > Thanks for your interest in the Redis connector! I'm not entirely sure what > the release process is like for Bahir, but I've pulled in @Robert Metzger who > has been involved in the

Task Managers having trouble registering after restart

2021-08-17 Thread Kevin Lam
Hi all, I'm observing an issue sometimes, and it's been hard to reproduce, where task managers are not able to register with the Flink cluster. We provision only the number of task managers required to run a given application, and so the absence of any of the task managers causes the job to enter

Re: Can we release new flink-connector-redis? Thanks!

2021-08-17 Thread Austin Cawley-Edwards
Hi Hongbo, Thanks for your interest in the Redis connector! I'm not entirely sure what the release process is like for Bahir, but I've pulled in @Robert Metzger who has been involved in the project in the past and can give an update there. Best, Austin On Tue, Aug 17, 2021 at 10:41 AM Hongbo

Re: Flink taskmanager in crash loop

2021-08-17 Thread Abhishek Rai
Before these message, there is the following message in the log: 2021-08-12 23:02:58.015 [Canceler/Interrupts for Source: MASKED]) (1/1)#29103' did not react to cancelling signal for 30 seconds, but is stuck in method: java.base@11.0.11/jdk.internal.misc.Unsafe.park(Native Method)

Re: Flink taskmanager in crash loop

2021-08-17 Thread Abhishek Rai
Thanks Yangze, indeed, I see the following in the log about 10s before the final crash (masked some sensitive data using `MASKED`): 2021-08-16 15:58:13.985 [Canceler/Interrupts for Source: MAKSED] WARN org.apache.flink.runtime.taskmanager.Task - Task 'MASKED' did not react to cancelling signal

Can we release new flink-connector-redis? Thanks!

2021-08-17 Thread Hongbo Miao
Hi Flink friends, I recently have a question about how to set TTL to make Redis keys expire in flink-connector-redis. I originally posted at Stack Overflow at  https://stackoverflow.com/questions/68795044/how-to-set-ttl-to-make-redis-keys-expire-in-flink-connector-redis Then I found there is a

Re: Handling HTTP Requests for Keyed Streams

2021-08-17 Thread Rion Williams
Hi Caizhi, I don’t mind the request being synchronous (or not using the Async I/O connectors). Assuming I go down that route would this be the appropriate way to handle this? Specifically creating an HttpClient and storing the result in state and on a keyed stream if the state was empty? It

flinksql的udf中可以使用Operator state的api么?

2021-08-17 Thread andrew
hi,你好: 通过flinksql读kafka数据流,实现监控用户信息基于上一次状态值发生变更触发最新用户信息输出.

Re:回复:如何监控kafka延迟

2021-08-17 Thread andrew
@Jimmy Zhang 了解下checkpoint/savepoint 中间计算的结果可以间隔时间写入外部hdfs等 在 2021-08-09 09:51:21,"Jimmy Zhang" 写道: >您好,看到你们在用kafka相关metrics,我想咨询一个问题。你们是否遇见过在重启一个kafka sink >job后,相关指标清零的情况?这样是不是就无法持续的进行数据想加?我们想做一个数据对账,查询不同时间段的输出量统计,这样可能中间归零就有问题,所以想咨询下,任何的回复都非常感谢! > > > > >| >Best, >Jimmy >| > >Signature is

Re: Upgrading from Flink on YARN 1.9 to 1.11

2021-08-17 Thread David Morávek
Hi Andreas, the problem here is that the command you're using is starting a per-job cluster (which is obvious from the used deployment method " YarnClusterDescriptor.deployJobCluster"). Apparently the `-m yarn-cluster` flag is deprecated and no longer supported, I think this is something we

PyFlink StreamingFileSink bulk-encoded format (Avro)

2021-08-17 Thread Kamil ty
Hello, I'm trying to save my data stream to an Avro file on HDFS. In Flink documentation I can only see explanations for Java/Scala. However, I can't seem to find a way to do it in PyFlink. Is this possible to do in PyFlink currently? Kind Regards Kamil

????????

2021-08-17 Thread 1421070960
??

Re: Flink On Yarn HA 部署模式下Flink程序无法启动

2021-08-17 Thread 周瑞
您好,我的版本是1.13.1 --Original-- From: "Yang Wang"https://issues.apache.org/jira/browse/FLINK-19212 Best, Yang 周瑞

Re: Flink On Yarn HA 部署模式下Flink程序无法启动

2021-08-17 Thread Yang Wang
看报错应该是个已知问题[1]并且已经在1.11.2中修复 [1]. https://issues.apache.org/jira/browse/FLINK-19212 Best, Yang 周瑞 于2021年8月17日周二 上午11:04写道: > 您好:Flink程序部署在Yran上以Appliation Mode 模式启动的,在没有采用HA > 模式的时候可以正常启动,配置了HA之后,启动异常,麻烦帮忙看下是什么原因导致的. > > > HA 配置如下: > high-availability: zookeeper high-availability.storageDir:

Re: Problems with reading ORC files with S3 filesystem

2021-08-17 Thread Piotr Jagielski
Hi David, Thanks for your answer. I finally managed to read ORC files by: - switching to s3a:// in my Flink SQL table path parameter - providing all the properties in Hadoop's core-site.xml file (fs.s3a.endpoint, fs.s3a.path.style.access, fs.s3a.aws.credentials.provider, fs.s3a.access.key,

Re: NullPointerException in StateTable.put()

2021-08-17 Thread László Ciople
I removed that line from the code and it seems to have solved the problem. Thank you very much! :) All the best, Laszlo On Tue, Aug 17, 2021 at 9:54 AM László Ciople wrote: > Ok, thank you for the tips. I will modify it and get back to you :) > > On Tue, Aug 17, 2021 at 9:42 AM David Morávek

Re: NullPointerException in StateTable.put()

2021-08-17 Thread László Ciople
Ok, thank you for the tips. I will modify it and get back to you :) On Tue, Aug 17, 2021 at 9:42 AM David Morávek wrote: > Hi Laszlo, > > Please use reply-all for mailing list replies. This may help others > finding their answer in the future ;) > > >>

flink ??????????????truncate table????

2021-08-17 Thread Asahi Lee
hi! flink??truncate table??flink hivetruncate table??

Re: RabbitMQ 3.9+ Native Streams

2021-08-17 Thread David Morávek
This would be awesome! We have the contribution guide [1] that should give you a rough idea on how to approach the contribution. Let me know if you need any further guidance, I'd be happy to help ;) [1]

flink 1.13.1??????????hive??????????insert overwirite??????????????????????????????????????????

2021-08-17 Thread Asahi Lee
hi?? ??sqlselect0??hive INSERT OVERWRITE target_table SELECT * from source_table where id 10;

Re: NullPointerException in StateTable.put()

2021-08-17 Thread David Morávek
Hi Laszlo, Please use reply-all for mailing list replies. This may help others finding their answer in the future ;) > sb.append(DeviceDetail.class.getName()).append('@').append(Integer.toHexString(System.identityHashCode(this))).append('['); This part will again make your key

Re:回复:如何监控kafka延迟

2021-08-17 Thread RS
1. metric指标每次都会清0的2. 数据对账的话, 可以将每次的统计数据按时间点保存起来, 然后查询时间范围的时候, 做sum求和来对账 在 2021-08-09 09:51:43,"Jimmy Zhang" 写道: >您好,看到你们在用kafka相关metrics,我想咨询一个问题。你们是否遇见过在重启一个kafka sink >job后,相关指标清零的情况?这样是不是就无法持续的进行数据想加?我们想做一个数据对账,查询不同时间段的输出量统计,这样可能中间归零就有问题,所以想咨询下,任何的回复都非常感谢! > > > > >| >Best, >Jimmy >| >