Re: Install/Run Streaming Anomaly Detection R package in Flink

2021-02-23 Thread Wei Zhong
chatryan wrote: > Hi, > I'm pulling in Wei Zhong and Xingbo Huang, who know PyFlink better. > Regards, > Roman > On Mon, Feb 22, 2021 at 3:01 PM Robert Cullen <cinquate...@gmail.com> wrote: > My customer wants us to install thi…

Re: pyflink的py4j里是不是可以调用我自己写的java程序 ?

2021-02-05 Thread Wei Zhong
Try calling: get_gateway().jvm.Test2.Test2.main(None) > On Feb 5, 2021 at 18:27, 瞿叶奇 <389243...@qq.com> wrote: > Hi, with the list argument the error is gone, but the jar is still not loaded. >>> from pyflink.util.utils import add_jars_to_context_class_loader >>> add_jars_to_context_class_loader(['file:///root/Test2.jar']) >>> from …

Re: pyflink的py4j里是不是可以调用我自己写的java程序 ?

2021-02-05 Thread Wei Zhong
The image does not seem to load, but I guess it is an argument type problem? This function expects a List: add_jars_to_context_class_loader(["file:///xxx"]) > On Feb 5, 2021 at 17:48, 瞿叶奇 <389243...@qq.com> wrote: > Hi, I upgraded to flink 1.12.0, and this function now reports an error when loading the class. Is my URL wrong? Also, does pyflink have a way to connect to HDFS with Kerberos authentication? > -- Original message -- > From: …

Re: pyflink的py4j里是不是可以调用我自己写的java程序 ?

2021-02-05 Thread Wei Zhong
Hi, first you need to package your Java program into a jar, then hot-load that jar. PyFlink currently ships a util method you can call directly: from pyflink.util.utils import add_jars_to_context_class_loader; add_jars_to_context_class_loader("file:///xxx") # note: the path must be in URL format. After that you can call your class through the Java gateway: from pyflink.java_gateway import get_gateway …
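Putting the two replies in this thread together, a minimal sketch of the workflow. The jar path and the `Test2` class name come from the thread; the URL-building helper is an assumption added here, and the PyFlink calls are shown as comments since they need a running PyFlink installation:

```python
from pathlib import Path

def to_jar_url(path: str) -> str:
    # add_jars_to_context_class_loader expects URL-style paths such as
    # "file:///root/Test2.jar", passed as a *list* (per the thread, a list
    # argument avoided the earlier error a bare string caused).
    return Path(path).absolute().as_uri()

jar_urls = [to_jar_url("/root/Test2.jar")]
print(jar_urls)

# With PyFlink available, the hot-load and gateway call from the replies:
#   from pyflink.util.utils import add_jars_to_context_class_loader
#   from pyflink.java_gateway import get_gateway
#   add_jars_to_context_class_loader(jar_urls)
#   get_gateway().jvm.Test2.Test2.main(None)
```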

Re: pyflink连接kerberos的kafka问题

2021-02-02 Thread Wei Zhong
Hi, the first problem is likely that no matching KDC realm can be found with your current configuration; you can try setting it manually via System.setProperty, e.g. System.setProperty("java.security.krb5.realm", "..."); System.setProperty("java.security.krb5.kdc", "..."). As for the second problem, 'update-mode'='append' means the sink only accepts append messages from the upstream operator; it does not mean files are written in append mode. I suspect the property you actually want is 'format.write-mode'='OVERWRITE'.

Re: 请问pyflink如何跟kerberos认证的kafka对接呢

2021-01-31 Thread Wei Zhong
Hi, from your earlier mails, you have put the Kerberos settings into some flink-conf.yaml and then started a local-mode shell, right? The pyflink shell in local mode does not read any flink-conf.yaml on its own. You need to set the FLINK_HOME environment variable and write the settings into $FLINK_HOME/conf/flink-conf.yaml, and flink-conf.yaml is only read when a job is submitted (flink run, remote mode, or yarn mode). If you still want to try it in local mode, you can use the following code: from …
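For the submit-time path described above, the Kerberos settings would live in $FLINK_HOME/conf/flink-conf.yaml. A sketch with placeholder values (the keytab path and principal are assumptions, not from the thread):

```yaml
# $FLINK_HOME/conf/flink-conf.yaml -- read by `flink run`, remote and yarn modes
security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /path/to/user.keytab
security.kerberos.login.principal: user@EXAMPLE.COM
# expose the credentials to the Kafka connector's JAAS context as well
security.kerberos.login.contexts: Client,KafkaClient
```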

Re: [ANNOUNCE] Apache Flink 1.12.1 released

2021-01-19 Thread Wei Zhong
Thanks Xintong for the great work! Best, Wei > On Jan 19, 2021 at 18:00, Guowei Ma wrote: > Thanks Xintong's effort! > Best, > Guowei > On Tue, Jan 19, 2021 at 5:37 PM Yangze Guo wrote: > Thanks Xintong for the great work! > Best, > Yangze Guo > On Tue, Jan …

Re: PyFlink on Yarn, Per-Job模式,如何增加多个外部依赖jar包?

2021-01-05 Thread Wei Zhong
Hi Zhizhao, could you check whether 'file://' is followed by an absolute path? The error is raised because the corresponding path cannot be found on the local disk. > On Jan 6, 2021 at 10:23, Zhizhao Shangguan wrote: > Hi: > PyFlink on Yarn, per-job mode: how do I add multiple external dependency jars, such as flink-sql-connector-kafka and flink-connector-jdbc? > Environment: > Flink version: 1.11.0 > OS: mac > I tried the following and ran into problems: > 1. …
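One way to pass several dependency jars in per-job mode is the `pipeline.jars` option, which takes absolute `file://` URLs joined by semicolons. A sketch (the jar names are assumptions; the commented `t_env` call is the PyFlink API for setting it in 1.11+):

```python
from pathlib import Path

# Hypothetical per-job dependencies (names are placeholders, not from the thread)
jars = [
    "/opt/jars/flink-sql-connector-kafka_2.11-1.11.0.jar",
    "/opt/jars/flink-connector-jdbc_2.11-1.11.0.jar",
]

# pipeline.jars expects absolute file:// URLs separated by semicolons
pipeline_jars = ";".join(Path(j).absolute().as_uri() for j in jars)
print(pipeline_jars)

# With a TableEnvironment t_env, you would then set:
#   t_env.get_config().get_configuration().set_string("pipeline.jars", pipeline_jars)
```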

Re: pyflink 1.12 是不支持 通过sql 直接向数据库获取数据的操作么? 没看到相关接口

2020-12-22 Thread Wei Zhong
Hello, pyflink fetches data from a database by declaring a jdbc connector. > On Dec 22, 2020 at 17:40, 肖越 <18242988...@163.com> wrote: > For example, the equivalent of pandas.read_sql(), which returns the source data directly. PyFlink newbie here, hoping for an expert's answer.
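A sketch of such a declaration for the Flink 1.12 jdbc connector (the database URL, credentials, and column names are assumptions; with the jdbc connector jar on the classpath, `t_env.sql_query("SELECT ... FROM db_source")` then reads from the table):

```sql
CREATE TABLE db_source (
  id DECIMAL,
  pf_id VARCHAR
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/mydb',
  'table-name' = 'ts_pf_sec_yldrate',
  'username' = 'flink',
  'password' = 'secret'
)
```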

Re: pyflink1.12 进行多表关联后的结果类型是TableResult,如何转为Table类型

2020-12-22 Thread Wei Zhong
Hello, use t_env.sql_query() to execute the select statement and obtain a Table-typed result (execute_sql() returns a TableResult). > On Dec 22, 2020 at 13:25, 肖越 <18242988...@163.com> wrote: > Left join query via sql, the statement being: > sql = ''' Insert into print_sink select a.id, a.pf_id, b.symbol_id from a \ > left join b on b.day_id = a.biz_date where a.ccy_type = 'AC' and \ > …

Re: pyflink1.12 连接Mysql报错 : Missing required options

2020-12-21 Thread Wei Zhong
Hi, as the error message says, the with clause requires a "url" option. Try renaming connector.url to url and see whether the error goes away. > On Dec 21, 2020 at 13:44, 肖越 <18242988...@163.com> wrote: > I defined two source DDLs in the script, but the second one keeps complaining about a missing option. PyFlink newbie, asking for help. > #DDL definition > source_ddl2 = """CREATE TABLE ts_pf_sec_yldrate (id DECIMAL, pf_id VARCHAR, \ > symbol_id …

Re: Urgent help on S3 CSV file reader DataStream Job

2020-12-16 Thread Wei Zhong
Deep > On Mon, 14 Dec 2020 at 10:28, DEEP NARAYAN Singh <about.d...@gmail.com> wrote: > Hi Wei, > No problem at all. Thanks for your response. > Yes, it is just starting from the beginning, as if no checkpointing had finished. > Thanks, > -Deep

Re: Pandas UDF处理过的数据sink问题

2020-12-13 Thread Wei Zhong
Hi Lucas, the problem is this: the Pandas function's output type is a single Row column, while your sink expects one BIGINT column and one INT column. You can try rewriting the sql statement as: select orderCalc(code, amount).get(0), orderCalc(code, amount).get(1) from `some_source` group by TUMBLE(eventTime, INTERVAL '1' SECOND), code, amount. Also, this really looks like a Pandas UDAF use case; if so, replace "@udf" with "@udaf". Best, …
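The aggregation body itself can be prototyped in plain pandas before wiring it into a `@udaf`. A sketch (the name orderCalc comes from the thread, but its logic here is an assumption; only the two-scalars-instead-of-one-Row shape is the point):

```python
import pandas as pd

def order_calc(code: pd.Series, amount: pd.Series):
    # Per window, return two scalars (a BIGINT and an INT) rather than
    # one Row, so a sink with two separate columns can accept them.
    return int(amount.sum()), int(code.iloc[0])

result = order_calc(pd.Series([7, 7, 7]), pd.Series([10, 20, 30]))
print(result)
```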

Re: [ANNOUNCE] Apache Flink 1.12.0 released

2020-12-10 Thread Wei Zhong
Congratulations! Thanks Dian and Robert for the great work! Best, Wei > On Dec 10, 2020 at 20:26, Leonard Xu wrote: > Thanks Dian and Robert for the great work as release managers! > And thanks everyone who made the release possible! > Best, > Leonard > On Dec 10, 2020 at 20:17, Robert …

Re: Urgent help on S3 CSV file reader DataStream Job

2020-12-08 Thread Wei Zhong
…t()) { return true; } else { throw new ParseException("Row too short: " + new String(bytes, offset, numBytes, getCharset())); } } > Let me know if you need any details. > Thanks, > -Deep > On Tue, …

Re: Urgent help on S3 CSV file reader DataStream Job

2020-12-07 Thread Wei Zhong
Regards, > -Deep > On Mon, Dec 7, 2020 at 6:38 PM Till Rohrmann wrote: > Hi Deep, > Could you use the TextInputFormat, which reads a file line by line? That way you can do the JSON parsing as part of a mapper which consumes the file lines. > Cheers, …

Re: Urgent help on S3 CSV file reader DataStream Job

2020-12-07 Thread Wei Zhong
Hi Deep, (redirecting this to the user mailing list as this is not a dev question) You can try to set the line delimiter and field delimiter of the RowCsvInputFormat to a non-printing character (assuming there are no non-printing characters in the csv files). It will read all the content of a csv …
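The idea behind the non-printing delimiter can be illustrated in plain Python: splitting on a delimiter that never occurs in the data yields the entire file as a single record. This is only a sketch of the principle; RowCsvInputFormat itself is configured through its Flink/Java API:

```python
csv_content = "a,b,c\n1,2,3\n4,5,6\n"

# "\x01" is a non-printing character assumed to be absent from the file;
# using it as the record delimiter keeps the whole content in one piece.
records = csv_content.split("\x01")
print(len(records))  # the file arrives as a single record
```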

Re: SQL解析复杂JSON问题

2020-12-04 Thread Wei Zhong
Yes, in 1.11 custom JSON parsing and mapping can only be done outside the json format. > On Dec 4, 2020 at 17:19, 李轲 wrote: > So if we want custom parsing and mapping in 1.11, is a UDF the only option? > Sent from my iPhone > On Dec 4, 2020 at 16:52, Wei Zhong wrote: >> Hi, >> It depends on the flink version you use: 1.11 derives it automatically from the table schema, while in 1.10, if the table …

Re: SQL解析复杂JSON问题

2020-12-04 Thread Wei Zhong
Hi, it depends on the flink version you use: 1.11 parses it automatically from the table schema, while in 1.10, if the table schema and the json schema are not exactly the same, you have to write the json-schema by hand: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/formats/json.html#data-type-mapping

Re: 用代码执行flink sql 报错 Multiple factories for identifier 'jdbc' that implement

2020-12-03 Thread Wei Zhong
The error-reporting problem has been fixed in a later release; once the new version is out, upgrading will let you see directly from the error message which TableFactory implementations conflict: https://issues.apache.org/jira/browse/FLINK-20186 > On Dec 3, 2020 at 20:08, Wei Zhong wrote: > Hi, > The current TableFactory lookup code seems to have a problem in its error reporting: the real class names are not shown. You can first run the following code manually …

Re: 用代码执行flink sql 报错 Multiple factories for identifier 'jdbc' that implement

2020-12-03 Thread Wei Zhong
Hi, the TableFactory lookup code currently has an error-reporting problem and does not show the real class names. You can run the following code manually to see which classes are being treated as JDBC DynamicTableSinkFactory: List<Factory> result = new LinkedList<>(); ServiceLoader.load(Factory.class, Thread.currentThread().getContextClassLoader()).iterator().forEachRemaining(result::add); List jdbcResult = …

Re: Flink CEP 动态加载 pattern

2020-12-02 Thread Wei Zhong
Hi, Flink CEP does not support dynamically loading rules yet. The community has a JIRA tracking this request: https://issues.apache.org/jira/browse/FLINK-7129 You can follow that JIRA for the latest progress. > On Dec 2, 2020 at 17:48, huang botao wrote: > Hi, we often face rule changes in our project. How do we usually load these rules dynamically? Does Flink CEP have a native API that supports dynamically loading rules?

Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-11-18 Thread Wei Zhong
Type conversions working: > - scala.collection.immutable.Map[String,String] => org.apache.flink.types.Row => ROW > - scala.collection.immutable.Map[String,String] => java.util.Map[String,String] => MAP > Any hint for Map[String,Any]? > Best regards, …

Re: pyflink利用sql ddl连接hbase-1.4.x出错Configuring the input format (null) failed: Cannot create connection to HBase

2020-11-18 Thread Wei Zhong
Hi, the root cause is that the class io.netty.channel.EventLoopGroup cannot be found. Could you check whether the classpath contains a netty jar, or whether netty is shaded inside the related jars? > On Nov 16, 2020 at 17:02, ghostviper wrote: > Environment: > hbase-1.4.13 > flink-1.11.1 > python-3.6.1 > pyflink-1.0 > Configuration done so far: > 1. the hbase path has been added to the hadoop classpath …

Re: pyflink 1.11 运行pyflink作业时报错

2020-11-18 Thread Wei Zhong
Hi, the current error message alone is not enough to diagnose the problem. Could you post the source code of the failing part of the job? > On Nov 17, 2020 at 16:58, whh_960101 wrote: > Hi all, with pyflink 1.11, after submitting the job to the yarn cluster it fails while inserting the processed main_table into the kafka sink: File "/home/cdh272705/poc/T24_parse.py", line 179, in from_kafka_to_oracle_demo …

Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-11-17 Thread Wei Zhong
…aType [error] (x$1: org.apache.flink.table.api.DataTypes.Field*)org.apache.flink.table.types.DataType [error] cannot be applied to (org.apache.flink.table.types.DataType, org.apache.flink.table.types.DataType) [error] override def getResultType(signature: Array[Class[_]]): TypeInformation[_] = Da…

Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-11-17 Thread Wei Zhong
…larFunction { > def eval(): Row = { > Row.of(java.lang.String.valueOf("foo"), java.lang.String.valueOf("bar")) > } > } > Best regards, > On Tue, Nov 17, 2020 at 10:04, Wei Zhong <weizhong0...@gmail.com> wrote: > Hi Pierre, …

Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-11-17 Thread Wei Zhong
Hi Pierre, You can try to replace the '@DataTypeHint("ROW")' with '@FunctionHint(output = new DataTypeHint("ROW"))' Best, Wei > On Nov 17, 2020 at 15:45, Pierre Oberholzer wrote: > Hi Dian, Community, > (bringing the thread back to a wider audience) > As you suggested, I've tried to use …

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-28 Thread Wei Zhong
Congratulations Dian! > On Aug 28, 2020 at 14:29, Jingsong Li wrote: > Congratulations, Dian! > Best, Jingsong > On Fri, Aug 28, 2020 at 11:06 AM Walter Peng wrote: > congrats! > Yun Tang wrote: > Congratulations, Dian! > -- > Best, Jingsong Lee

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-05 Thread Wei Zhong
…improve the Python API doc. I have received many feedbacks from PyFlink beginners about the PyFlink doc, e.g. the materials are too few, the Python doc is mixed …

Re: PyFlink DDL UDTF join error

2020-07-29 Thread Wei Zhong
On Jul 28, 2020 at 21:19, Wei Zhong wrote: > Hi Manas, > It seems like a bug. You can try to replace the udtf sql call with such code as a workaround currently: > t_env.register_table("tmp_view", t_env.from_path(f"{INPUT_TABLE}").join_lateral("split(…

Re: PyFlink DDL UDTF join error

2020-07-28 Thread Wei Zhong
Hi Manas, It seems like a bug. You can try to replace the udtf sql call with the following code as a workaround for now: t_env.register_table("tmp_view", t_env.from_path(f"{INPUT_TABLE}").join_lateral("split(data) as (featureName, featureValue)")) This works for me. I'll try to find out what caused …

Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Wei Zhong
Congratulations! Thanks Dian for the great work! Best, Wei > On Jul 22, 2020 at 15:09, Leonard Xu wrote: > Congratulations! > Thanks Dian Fu for the great work as release manager, and thanks everyone involved! > Best > Leonard Xu > On Jul 22, 2020 at 14:52, Dian Fu wrote: >> The Apache Flink …

Re: windows用户使用pyflink问题

2020-04-27 Thread Wei Zhong
Hi Tao, Windows support for PyFlink is under development and is expected to ship in 1.11. That will solve developing PyFlink on Windows. > On Apr 28, 2020 at 10:23, tao siyuan wrote: > OK, I'll give it a try > Zhefu PENG wrote on Tue, Apr 28, 2020 at 10:16: >> You can try adding the contents of site-packages under external libs, which helps development efficiency. >> On Tue, Apr 28, 2020 at 10:13, tao siyuan wrote: >>> …

Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-10 Thread Wei Zhong
Hi, Thanks for driving this, Jincheng. +1 (non-binding) - Verified signatures and checksums. - Verified README.md and setup.py. - Run `pip install apache-flink-1.9.2.tar.gz` in Python 2.7.15 and Python 3.7.5 successfully. - Start local pyflink shell in Python 2.7.15 and Python 3.7.5 via

Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread Wei Zhong
Hi Jincheng, Thanks for bringing up this discussion! +1 for this proposal. Building from source takes a long time and requires a good network environment. Some users may not have such an environment. Uploading to PyPI will greatly improve the user experience. Best, Wei > jincheng sun wrote on Tue, Feb 4, 2020 …

Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Wei Zhong
Congrats Dian Fu! Well deserved! Best, Wei > On Jan 16, 2020 at 18:10, Hequn Cheng wrote: > Congratulations, Dian. > Well deserved! > Best, Hequn > On Thu, Jan 16, 2020 at 6:08 PM Leonard Xu wrote: > Congratulations! Dian Fu > Best, > Leonard > On …

Re: [ANNOUNCE] Apache Flink 1.8.3 released

2019-12-11 Thread Wei Zhong
Thanks Hequn for being the release manager. Great work! Best, Wei > On Dec 12, 2019 at 15:27, Jingsong Li wrote: > Thanks Hequn for your driving, 1.8.3 fixed a lot of issues and is very useful to users. > Great work! > Best, > Jingsong Lee > On Thu, Dec 12, 2019 at 3:25 PM jincheng sun …

Re: yarn-session模式通过python api消费kafka数据报错

2019-12-09 Thread Wei Zhong
Hi 改改, judging from the current error, it is probably a kafka version mismatch: the kafka connector jar in the lib directory needs to be the 0.11 version, i.e. flink-sql-connector-kafka-0.11_2.11-1.9.1.jar > On Dec 10, 2019 at 10:06, 改改 wrote: > HI Wei Zhong, > thanks for your reply. The kafka connector jar is already under flink's lib directory; the files in my flink/lib directory are as follows: > …

Re: yarn-session模式通过python api消费kafka数据报错

2019-12-09 Thread Wei Zhong
Hi 改改, the error alone gives too little information to be sure, but one fairly likely cause is that the kafka connector jar was not put into the lib directory. Could you check whether a kafka connector jar exists under your flink lib directory? > On Dec 6, 2019 at 14:36, 改改 wrote: > [root@hdp02 bin]# ./flink run -yid application_1575352295616_0014 -py /opt/tumble_window.py > 2019-12-06 14:15:48,262 INFO …