Re: Re: flink on yarn fails to start

2020-12-23 Thread magichuang
Thank you so much!!! So that's what it was; I assumed -s was the abbreviation for slot. Thanks to this friend for the explanation, the job can be submitted now. > -- Original Message -- > From: "Yang Wang" > Sent: 2020-12-24 11:01:46 > To: user-zh > Cc: > Subject: Re: flink on yarn fails to start > > There is a problem with how your command is written: flink run -m yarn-cluster -ynm traffic -s 2 -p 2 -ytm 1024 -py > traffic.py > >

Re: Flink catalog+hive issue

2020-12-23 Thread 19916726683
You can refer to this: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html The pasted code is the setFullFileStatus method of org.apache.hadoop.hive.io.HdfsUtils. Original Message Sender: Rui Li <lirui.fu...@gmail.com> Recipient: user-zh <user...@flink.apache.org> Date: Thursday, Dec 24, 2020 11:33

pyflink 1.12: error when using the connector read.query parameter

2020-12-23 Thread 肖越
Using DDL to define a connector to a MySQL database, hoping to fetch the data directly by submitting SQL: source_ddl = """ CREATE TABLE source_table( yldrate DECIMAL, pf_id VARCHAR, symbol_id VARCHAR) WITH( 'connector' = 'jdbc', 'url' = 'jdbc:mysql://ip/db',
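For reference: read.query does not appear among the documented options of the new 'connector' = 'jdbc' factory in 1.12, which would explain the error. Below is a minimal pyflink sketch that sticks to documented options and expresses the query in Flink SQL instead; all connection details, the table name, and credentials are placeholders, not values from the thread:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    # Batch-mode table environment (blink planner, Flink 1.12).
    # Requires flink-connector-jdbc and the MySQL driver on the classpath.
    t_env = TableEnvironment.create(
        EnvironmentSettings.new_instance().in_batch_mode().use_blink_planner().build())

    t_env.execute_sql("""
        CREATE TABLE source_table (
            yldrate DECIMAL,
            pf_id VARCHAR,
            symbol_id VARCHAR
        ) WITH (
            'connector' = 'jdbc',
            'url' = 'jdbc:mysql://ip/db',
            'table-name' = 'my_table',
            'username' = 'user',
            'password' = 'pass'
        )
    """)

    # The query is expressed in Flink SQL rather than pushed down via read.query.
    result = t_env.sql_query("SELECT yldrate, pf_id, symbol_id FROM source_table")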

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Unable to instantiate java compiler

2020-12-23 Thread bigdata
flink 1.10.1: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Unable to instantiate java compiler at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335) at

proctime in yaml, sql-cli start throws exception

2020-12-23 Thread su_...@cjspd.com
Exception in thread "main" org.apache.flink.table.client.SqlClientException: Unexpected exception. This is a bug. Please consider filing an issue.

Just published connect-flink-with-kinesis-kinesalite-using-scala

2020-12-23 Thread Avi Levi
Hi, after stumbling a little with connecting to kinesis/kinesalite, I just published connect-flink-with-kinesis-kinesalite-using-scala; hopefully it will assist someone. Would love to get your input.

Re: Flink catalog+hive issue

2020-12-23 Thread Rui Li
Hello, the image you pasted does not come through. Could you share the link to the official docs you referenced? Hive supports at least three different authorization modes; when Flink integrates with Hive, currently only storage based authorization takes effect. On Thu, Dec 24, 2020 at 10:51 AM 19916726683 <19916726...@163.com> wrote: > Hive's official site documents ACLs and how permission relationships are inherited. The source code is in Hive's HDFSUtils class; the core code should be the part above. > > Original Message > *Sender:* Rui Li >

Re: Flink catalog+hive issue

2020-12-23 Thread r pp
Gmail may have some incompatibilities; the screenshot does not show up. 19916726683 <19916726...@163.com> wrote on Thursday, Dec 24, 2020 at 10:51 AM: > Hive's official site documents ACLs and how permission relationships are inherited. The source code is in Hive's HDFSUtils class; the core code should be the part above. > > Original Message > *Sender:* Rui Li > *Recipient:* user-zh > *Date:* Wednesday, Dec 23, 2020 19:41 > *Subject:* Re: Flink catalog+hive issue > >

Re: pyflink query fetches data very slowly; doesn't the where clause filter the data?

2020-12-23 Thread r pp
Where does table a appear in the SQL statement? Is filtering really the concern? If you know your business well and have confirmed that Flink 1.11 does not push the filter down, why not do the filtering yourself as an optimization (see the sketch below)? If it is not a filtering problem but a large-joins-small or large-joins-large problem, could broadcast distribution or parallelism be directions worth optimizing? It seems better to analyze the business problem first, and only then check whether Flink 1.12 solves it. 肖越 <18242988...@163.com> wrote on Thursday, Dec 24, 2020 at 11:16 AM: > The connector reads the whole table from the database and executes: > env.sql_query("select a , b,
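A sketch of that manual pre-filtering, reusing the question's env (its table environment) and the question's table and column names as assumptions about the real schema:

    # Filter table1 down before the join so only the matching rows take part.
    filtered = env.sql_query(
        "SELECT a, b, c FROM table1 "
        "WHERE b = '103' AND c = '203' "
        "AND a BETWEEN 20160701 AND 20170307")
    env.create_temporary_view("table1_small", filtered)

    # Join only the reduced table; the remaining predicate on e references table2.
    result = env.sql_query(
        "SELECT a, b, c FROM table1_small "
        "LEFT JOIN table2 ON a = d "
        "WHERE e = 'AC'")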

pyflink query fetches data very slowly; doesn't the where clause filter the data?

2020-12-23 Thread 肖越
The connector reads the entire table from the database and executes: env.sql_query("select a , b, c from table1 left join table2 on a = d where b = '103' and c = '203' and e = 'AC' and a between 20160701 and 20170307 order a") Table a has a very large volume of data, around 10 million rows, but only 250 rows match, and running it locally takes about 10 minutes. I understand that in Flink 1.11 the where clause does not filter the data first; does this problem still exist in Flink 1.12, and how can it be optimized?

Re: flink on yarn fails to start

2020-12-23 Thread Yang Wang
There is a problem with how your command is written: flink run -m yarn-cluster -ynm traffic -s 2 -p 2 -ytm 1024 -py traffic.py. It should be -ys rather than -s; -s restores from a savepoint, which is why the error complains that the savepoint directory cannot be found. Best, Yang magichuang wrote on Wednesday, Dec 23, 2020 at 8:29 PM: > Machine specs: three 32C64G CentOS 7.8 machines, with a CDH cluster deployed on them first > Flink version: 1.11.2, a cluster set up on the three machines > > The Hadoop cluster was built with CDH > >
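For reference, the corrected command implied by this fix (with -ys setting the slots per TaskManager) would be:

    flink run -m yarn-cluster -ynm traffic -ys 2 -p 2 -ytm 1024 -py traffic.py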

Re: Flink catalog+hive issue

2020-12-23 Thread 19916726683
Hive's official site documents ACLs and how permission relationships are inherited. The source code is in Hive's HDFSUtils class; the core code should be the part above. Original Message Sender: Rui Li Recipient: user-zh Date: Wednesday, Dec 23, 2020 19:41 Subject: Re: Flink catalog+hive issue Which kind of Hive ACL are you using? Flink currently has no dedicated ACL integration; only storage based authorization [1] on the HMS side takes effect [1]

Re: flink 1.10.1/1.11.1: state keeps growing after SQL group-by and time-window operations

2020-12-23 Thread Yun Tang
Hi @Storm, incremental checkpointing currently takes effect only for RocksDB, where "incremental" means uploading the newly produced DB sst files. RocksDB's full-snapshot mode serializes and writes out the DB's live key-value pairs; unless a large amount of data has not yet been cleaned up by compaction, it should not be possible for the incremental checkpoint size to grow without bound while full checkpoints stay normal. How large is the unbounded growth you are seeing? Best, Yun Tang From: Storm☀️ Sent: Tuesday, December 22, 2020 19:52 To:
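For reference, a minimal flink-conf.yaml sketch enabling RocksDB with incremental checkpoints; the checkpoint path is a placeholder:

    state.backend: rocksdb
    state.backend.incremental: true
    # Placeholder checkpoint location:
    state.checkpoints.dir: hdfs:///flink/checkpoints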

Re: Queryable state on task managers that are not running the job

2020-12-23 Thread Yun Tang
Hi Martin, which deployment mode did you choose? If you use per-job mode [1] to launch jobs, there might exist only idle slots instead of idle taskmanagers. Currently, queryable state is bound to a specific job, and if the idle taskmanager is not registered in the target's resource manager, no

RE: Distribute Parallelism/Tasks within RichOutputFormat?

2020-12-23 Thread Hailu, Andreas [Engineering]
Thanks Chesnay, Flavio – I believe Flavio’s first recommendation will work well enough. I agree that the second approach may be a bit finicky to use long-term. Cheers. // ah From: Chesnay Schepler Sent: Wednesday, December 23, 2020 4:07 AM To: Flavio Pompermaier ; Hailu, Andreas [Engineering]

Re: Re: checkpoint delay consume message

2020-12-23 Thread lec ssmi
Checkpointing can be done synchronously or asynchronously, the latter being the default. If you choose the synchronous way, it may cause this problem. nick toker wrote on Wednesday, Dec 23, 2020 at 3:53 PM: > Hi Yun, > > Sorry, but we didn't understand your questions. > The delay we are experiencing is on the
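For reference, snapshot synchrony can be steered through configuration; a minimal flink-conf.yaml sketch (state.backend.async already defaults to true):

    state.backend: filesystem
    # Heap-based backends honor this switch; true means asynchronous snapshots.
    state.backend.async: true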

Re: Issue in WordCount Example with DataStream API in BATCH RuntimeExecutionMode

2020-12-23 Thread David Anderson
I did a little experiment, and I was able to reproduce this if I use the sum aggregator on KeyedStream to do the counting. However, if I implement my own counting in a KeyedProcessFunction, or if I use the Table API, I get correct results with RuntimeExecutionMode.BATCH -- though the results are
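For context, a pyflink Table API batch word count of the kind David mentions; this is a minimal sketch assuming Flink 1.12 with the blink planner, not code from the thread:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    # Batch-mode table environment (blink planner, Flink 1.12).
    t_env = TableEnvironment.create(
        EnvironmentSettings.new_instance().in_batch_mode().use_blink_planner().build())

    # Tiny inline dataset standing in for the text file.
    words = t_env.from_elements(
        [("to",), ("be",), ("or",), ("not",), ("to",), ("be",)], ["word"])

    # Count occurrences per word; string expressions keep this 1.12-compatible.
    words.group_by("word").select("word, word.count as cnt").execute().print()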

Issue in WordCount Example with DataStream API in BATCH RuntimeExecutionMode

2020-12-23 Thread Derek Sheng
Hi team, recently I have been trying to explore the new features of Flink 1.12 with batch execution. I locally wrote a classic WordCount program to read from a text file and count the words (almost the same as the one in the Flink GitHub repo

Re: flink 1.11.2 writes to a Hive partitioned table but Hive does not recognize the partitions

2020-12-23 Thread admin
Hi, Hive's automatic partition adding relies on the 'metastore' partition commit policy, so the policy configuration must be added for it to take effect (see the sketch below). > On Dec 23, 2020, at 9:27 AM, kingdomad wrote: > > Yes, checkpointing is enabled. > We consume Kafka and register the stream as a TemporaryView via tableEnv. > Then execute SQL to write into the Hive table. > > -- > > kingdomad > > At 2020-12-23 09:22:48, "范瑞"
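Following that policy note, a sketch of such a table definition modeled on the Flink 1.11 Hive streaming docs; the table name, columns, and timestamp pattern are illustrative, not from the thread:

    -- In the SQL client, switch to the Hive dialect first:
    SET table.sql-dialect=hive;

    CREATE TABLE hive_table (
      user_id STRING,
      order_amount DOUBLE
    ) PARTITIONED BY (dt STRING, hr STRING) STORED AS parquet TBLPROPERTIES (
      'partition.time-extractor.timestamp-pattern'='$dt $hr:00:00',
      'sink.partition-commit.trigger'='partition-time',
      'sink.partition-commit.delay'='1 h',
      -- 'metastore' registers partitions with HMS; 'success-file' adds _SUCCESS
      'sink.partition-commit.policy.kind'='metastore,success-file'
    );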

How does Flink handle short-lived keyed streams

2020-12-23 Thread narasimha
Hi, below is the use case. I have a stream of transaction events; the success or failure of a transaction can be determined from those events. I partition the stream by transaction id and apply CEP to determine the success/failure of each transaction. Each transaction's keyed stream is valid only until the

Re: How to implement a WindowableTask(similar to samza) in Apache flink?

2020-12-23 Thread David Anderson
Please note that I responded to this question on Stack Overflow: https://stackoverflow.com/questions/65414125/how-to-implement-a-windowabletask-similar-to-samza-in-apache-flink Regards, David On Wed, Dec 23, 2020 at 7:08 AM Debraj Manna wrote: > I am new to flink and this is my first post in

flink on yarn fails to start

2020-12-23 Thread magichuang
Machine specs: three 32C64G CentOS 7.8 machines, with a CDH cluster deployed on them first. Flink version: 1.11.2, a cluster set up on the three machines. The Hadoop cluster was built with CDH. Launch command: flink run -m yarn-cluster -ynm traffic -s 2 -p 2 -ytm 1024 -py traffic.py The program is developed with pyflink: it reads from Kafka, aggregates each minute's data with a tumbling window, and writes back to Kafka. The program runs fine in local mode, but submitting it in per-job mode always fails. Testing the official example flink run -m yarn-cluster

Re: Some questions about operating Hive from Flink

2020-12-23 Thread Rui Li
Hi, do you mean the job writing the data is a streaming job and the job reading it is a batch job? On Tue, Dec 22, 2020 at 5:51 PM Jacob <17691150...@163.com> wrote: > Dear all, > > I currently have a Flink job that, after executing all of its business logic, produces some business data, writes it to HDFS in ORC format, and then calls the Hive API to > load the ORC files into a Hive table; at that point the Flink job's work is done. > > Afterwards, another scheduled Java program runs MapReduce to do follow-up processing on the data written into Hive in the previous step. > >

Re: Flink SQL continuous join checkpointing

2020-12-23 Thread Taras Moisiuk
Hi Leonard, thank you for the answer. In fact I had used a regular join because my interval condition was based on the wrong column. I extended my join with an attribute column condition and it solved the problem: ... FROM table_fx fx LEFT JOIN table_v v ON v.active = fx.instrument_active_id
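For comparison, an interval join needs a bound on the time attributes of both sides; a sketch with hypothetical row_time event-time columns and an assumed five-minute bound, not the thread's actual schema:

    SELECT fx.*, v.*
    FROM table_fx fx
    LEFT JOIN table_v v
      ON v.active = fx.instrument_active_id
     AND v.row_time BETWEEN fx.row_time - INTERVAL '5' MINUTE
                        AND fx.row_time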

Re: Re: flink 1.11.2 writes to a Hive partitioned table but Hive does not recognize the partitions

2020-12-23 Thread Rui Li
Writing streaming data to a Hive partitioned table requires additional parameter configuration. For Flink 1.11 the specific parameters are covered in these two docs: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/hive/hive_streaming.html#streaming-writing https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/filesystem.html#streaming-sink On Wed, Dec 23,

flink 1.10.1, DML in IDEA: Error during disposal of stream operator. java.lang.NullPointerException

2020-12-23 Thread bigdata
2020-12-23 19:43:01,588 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask - Error during disposal of stream operator. java.lang.NullPointerException at

Re: Problem creating Hive tables in flink 1.11.2

2020-12-23 Thread Rui Li
Is the exception just present in the log, or does the DDL actually fail? Also, please paste the exception stack trace from the log so we can see where it is printed. On Tue, Dec 22, 2020 at 2:41 PM 曹武 <14701319...@163.com> wrote: > Hi, when I create a Hive table with create table if not exists, for an already existing Hive table the Hive log throws AlreadyExistsException(message:Table > bm_tsk_001 already exists). Reading the source code I found that if not >

Re: Flink catalog+hive issue

2020-12-23 Thread Rui Li
Which kind of Hive ACL are you using? Flink currently has no dedicated ACL integration; only storage based authorization [1] on the HMS side takes effect. [1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization#LanguageManualAuthorization-1StorageBasedAuthorizationintheMetastoreServer On Wed, Dec 23, 2020 at 4:34 PM 19916726683

DML job fails with: Error during disposal of stream operator. java.lang.NullPointerException

2020-12-23 Thread bigdata
[message body lost to mis-encoding; only the tokens "ddl" and "dml" remain legible]

pyflink query fetches data very slowly; doesn't the where clause filter the data?

2020-12-23 Thread 肖越
The connector reads the entire table from the database and executes: env.sql_query("select a , b, c from table1 left join table2 on a = d where b = '103' and c = '203' and e = 'AC' and a between 20160701 and 20170307 order by biz_date") Table a has a very large volume of data, around 10 million rows, but only 250 rows match, and running it locally takes about 10 minutes! I understand that in Flink 1.11 the where clause does not filter the data first; does Flink 1.12

Re: Distribute Parallelism/Tasks within RichOutputFormat?

2020-12-23 Thread Chesnay Schepler
Essentially I see 2 options here: a) split your output format such that each format is its own sink, and then follow Flavio's suggestion to filter the stream and apply each sink to one of the streams, with the respective parallelism. This would be the recommended approach. b) modify your

flink sql CDC streams (1.11.2): after a dual-stream join has run for a long time, cancelling the job and restoring from the checkpoint throws OutOfMemoryError, with unchanged TM configuration and the RocksDB backend. What could the cause be? It never occurred while running, so why does it occur on restore?

2020-12-23 Thread jindy_liu
java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3181) at java.util.ArrayList.grow(ArrayList.java:261) at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235) at

Re: Distribute Parallelism/Tasks within RichOutputFormat?

2020-12-23 Thread Flavio Pompermaier
I'm not an expert on the streaming APIs, but you could try to do something like this: DataStream ds = null; DataStream ds1 = ds.filter(...).setParallelism(3); DataStream ds2 = ds.filter(...).setParallelism(7); Could it fit your needs? Best, Flavio On Wed, Dec 23, 2020 at 3:54 AM Hailu, Andreas

Re: Flink catalog+hive issue

2020-12-23 Thread 19916726683
Spark can be configured to decide whether it uses Hive's ACLs or its own; I am not sure whether Flink follows the same model. Original Message Sender: guaishushu1...@163.com Recipient: user-zh <user...@flink.apache.org> Date: Wednesday, Dec 23, 2020 15:53 Subject: Flink catalog+hive issue When using flink