Hi Fei,
Could you share your SQL code? Which version and planner are you using?
Also, could you describe the symptom "the current row cannot be counted; it only gets counted the next time" in more detail? I don't quite understand it.
Best,
Jark
On Wed, 1 Apr 2020 at 18:00, Fei Han
wrote:
> Hi everyone:
> When doing windowed statistics with count over and sum over, the current row cannot be counted; it only gets counted when the next row arrives.
> Did I write a parameter wrong, or is there another function I should use? Incoming data should be treated like a closed interval, but right now it behaves like an open one. Any suggestions would be appreciated, thanks!
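For reference, a minimal sketch of an OVER aggregation whose bounds explicitly include the current row; in standard SQL, an upper bound of CURRENT ROW is a closed interval. Table and column names here are made up, tEnv is an existing table environment, and ts is assumed to be a time attribute:

    // Hypothetical table Orders(user_id, amount, ts).
    // CURRENT ROW as the upper bound means the current record is included in the aggregate.
    Table result = tEnv.sqlQuery(
        "SELECT user_id, " +
        "  COUNT(amount) OVER w AS cnt, " +
        "  SUM(amount) OVER w AS total " +
        "FROM Orders " +
        "WINDOW w AS (PARTITION BY user_id ORDER BY ts " +
        "  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)");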
Hi,
I want to perform some processing on events only when the watermark is
updated. Otherwise, for all other events, I want to keep buffering them
till the watermark arrives.
The main motivation behind doing this is that I have several operators that
emit events/messages to a downstream operator.
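A minimal sketch of one common way to achieve this (not the poster's actual code): buffer records in keyed state and register an event-time timer per element, so they are only processed once the watermark has passed their timestamps. It assumes a keyed stream of Strings with timestamps already assigned:

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class BufferUntilWatermark extends KeyedProcessFunction<String, String, String> {

        private transient ListState<String> buffer;

        @Override
        public void open(Configuration parameters) {
            buffer = getRuntimeContext().getListState(
                new ListStateDescriptor<>("buffer", Types.STRING));
        }

        @Override
        public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
            // Hold the record back and ask to be called when the watermark passes its timestamp.
            buffer.add(value);
            ctx.timerService().registerEventTimeTimer(ctx.timestamp());
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
            // The watermark has advanced past `timestamp`; flush everything buffered so far.
            for (String buffered : buffer.get()) {
                out.collect(buffered);
            }
            buffer.clear();
        }
    }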
Hi,
GenericUDTFExplode is a UDTF.
In Flink, UDTFs are used with standard SQL syntax:
"select x from db1.nested, lateral table(explode(a)) as T(x)"
Give that a try.
[1]
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/functions/udfs.html#table-functions
Best,
Jingsong Lee
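For reference, a minimal sketch of running this from the Table API, assuming Flink 1.10 with the HiveModule loaded so that Hive's built-in explode is resolvable; the Hive version string is an assumption, and the table/column names come from the thread:

    // org.apache.flink.table.module.hive.HiveModule
    // Make Hive built-in functions such as explode available in Flink SQL.
    tEnv.loadModule("hive", new HiveModule("2.3.4"));
    // Standard SQL lateral-table syntax for calling a UDTF.
    Table result = tEnv.sqlQuery(
        "SELECT x FROM db1.nested, LATERAL TABLE(explode(a)) AS T(x)");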
On Thu, Apr 2, 2020 at 11:22 AM Yaoting Gong
Hi all,
Our project currently integrates the HiveModule and has run into some issues. Asking everyone for advice.
Before integrating the Hive Module, substr and split could not be used at all. After integration, both check out fine.
For example: select split('1,2,2,4',',')
But when using the explode function, select explode(split('1,2,2,4',','));
we get the following error:
The main method caused an error: SQL validation failed. From line 1, column
8 to line 1, column 36: No
Hi,
Got it, now I know how to solve it. I'm using the sql-gateway here, so it looks like I'll have to add a table-definition syntax to the sql-gateway.
Many thanks
Best,
Xinghalo
On Apr 2, 2020 at 10:52, Benchao Li wrote:
What you wrote is not the dimension-table join syntax. Dimension-table joins are currently implemented via temporal tables [1] and require a special join syntax:
SELECT
  o.amount, o.currency, r.rate, o.amount * r.rate
FROM
  Orders AS o
  JOIN LatestRates FOR SYSTEM_TIME AS OF
Hi,
I think we could add a latency metric for each operator, which would reflect
each operator's consumption ability.
Best,
Dan Zou
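For reference, a sketch of enabling Flink's built-in latency markers, which already expose per-operator latency histograms; the 30-second interval is just an example value:

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Emit a latency marker from the sources every 30 seconds; downstream
    // operators then report latency histograms in their metric groups.
    env.getConfig().setLatencyTrackingInterval(30_000L);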
> On Mar 30, 2020, at 18:19, Guanghui Zhang wrote:
>
> Hi.
> At the flink source connector, you can send a $source_current_time - $event_time
> metric.
> In the meantime, at flink
What you wrote is not the dimension-table join syntax. Dimension-table joins are currently implemented via temporal tables [1] and require a special join syntax:
SELECT
  o.amount, o.currency, r.rate, o.amount * r.rate
FROM
  Orders AS o
  JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r
  ON r.currency = o.currency
Besides, the JDBCTableSource you looked at is a plain bounded source, or rather a batch
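For reference, a minimal sketch of declaring a JDBC dimension table in Flink 1.10 DDL and joining it with the temporal-table syntax above, so each input row triggers a keyed lookup instead of a full scan; the connection settings and field list are made up:

    tEnv.sqlUpdate(
        "CREATE TABLE LatestRates (" +
        "  currency STRING," +
        "  rate DOUBLE" +
        ") WITH (" +
        "  'connector.type' = 'jdbc'," +
        "  'connector.url' = 'jdbc:mysql://localhost:3306/db'," +
        "  'connector.table' = 'rates'" +
        ")");
    Table joined = tEnv.sqlQuery(
        "SELECT o.amount, o.currency, r.rate " +
        "FROM Orders AS o " +
        "JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r " +
        "ON r.currency = o.currency");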
Yes, it's related. The umbrella issue for this is FLIP-115.
Best,
Jingsong Lee
On Wed, Apr 1, 2020 at 10:57 PM 叶贤勋 wrote:
> Hi jingsong,
> I see you submitted two PRs on this issue [1] about supporting a hive streaming sink. Is that code related to FLIP-115?
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-14255
>
>
> 叶贤勋
> yxx_c...@163.com
Hi,
I tried it and it doesn't seem to work. My SQL:
select s.*, item.product_name
from ( select member_id, uuid, page, cast(SPLIT_INDEX(page, '10.pd.item-', 1)
as string) as item_id, `time` from tgs_topic_t1 where page like
'10.pd.item-%' ) s
inner join PROD_ITEM_MALL_ACTIVITY_PRODUCT item on cast(item.id as string) =
registerTableSource is marked @Deprecated in flink
1.10. In my situation, should I keep using the deprecated API (registerTableSource)?
Original message
From: deadwind4
To: user-zh
Date: Thu, Apr 2, 2020 10:30
Subject: Re: flink 1.10 createTemporaryTable loses proctime
Before the change:
tEnv.connect().withFormat().withSchema(
xxx.proctime()
).registerTableSource("foo");
After the change:
tEnv.connect().withFormat().withSchema(
xxx.proctime()
).createTemporaryTable("foo");
After the change, .proctime() no longer takes effect, so I can't use my proctime window anymore.
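For reference, a sketch of one possible workaround on 1.10: declare the processing-time attribute as a computed column in DDL, which the Blink planner supports. The schema, topic, and connection properties here are made up:

    tEnv.sqlUpdate(
        "CREATE TABLE foo (" +
        "  id BIGINT," +
        "  name STRING," +
        // Processing-time attribute declared as a computed column.
        "  proc AS PROCTIME()" +
        ") WITH (" +
        "  'connector.type' = 'kafka'," +
        "  'connector.version' = 'universal'," +
        "  'connector.topic' = 'foo_topic'," +
        "  'connector.properties.bootstrap.servers' = 'localhost:9092'," +
        "  'format.type' = 'json'" +
        ")");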
Original message
From: deadwind4
To: user-zh
Date: Thu, Apr 2, 2020 10:22
Subject:
tEnv.connect().withFormat().withSchema().registerTableSource("foo");
tEnv.connect().withFormat().withSchema().createTemporaryTable("foo");
Original message
From: Jark Wu
To: user-zh
Date: Thu, Apr 2, 2020 10:18
Subject: Re: flink 1.10 createTemporaryTable loses proctime
Hi, could you describe your code before and after the change? As far as I know, TableEnvironment
Hi benchao,
I see. On my side I only ran a plain query and didn't go through the join.
I'll add the join condition and try again.
Best,
Xinghalo
On Apr 2, 2020 at 10:11, Benchao Li wrote:
Hi,
Could you also post your SQL?
Normally, a dimension-table join uses the equality conditions of the join as the key to query mysql; any additional filters are then applied on top of the joined result. If you see a full-table scan every time, it's very likely that your dimension-table join condition is written incorrectly.
111 wrote on Thu, Apr 2, 2020 at 9:55:
Hi,
I'd like to confirm the MySQL JDBC
Hi,
Could you describe your code before and after the change? As far as I know, there is no
createTemporaryTable method on TableEnvironment, only createTemporaryView, and the
parameters of registerTableSource and createTemporaryView are different.
Best,
Jark
> On Apr 1, 2020 at 23:13, deadwind4 wrote:
>
> Actually I want to use a processing time window, but with the deprecated API
>
Hi,
Could you also post your SQL?
Normally, a dimension-table join uses the equality conditions of the join as the key to query mysql; any additional filters are then applied on top of the joined result. If you see a full-table scan every time, it's very likely that your dimension-table join condition is written incorrectly. See the sketch below.
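For reference, a sketch of what this means in practice; table and column names are made up. The equality condition becomes the lookup key sent to mysql, while the extra predicate is evaluated on the joined rows afterwards:

    Table t = tEnv.sqlQuery(
        "SELECT o.order_id, d.product_name " +
        "FROM Orders AS o " +
        "JOIN Products FOR SYSTEM_TIME AS OF o.proctime AS d " +
        // Equality condition: pushed to mysql as WHERE id = ?
        "  ON d.id = o.product_id " +
        // Additional filter: applied after the lookup, not inside the lookup query.
        "WHERE d.status = 'ACTIVE'");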
111 wrote on Thu, Apr 2, 2020 at 9:55:
> Hi,
> I'd like to confirm some implementation details of the MySQL JDBC Connector. What I understand so far is:
> 1 Extract the fields and table name from the query sql and assemble select a, b, c from t
> 2
Hi,
I'd like to confirm some implementation details of the MySQL JDBC Connector. What I understand so far is:
1 Extract the fields and table name from the query sql and assemble select a, b, c from t
2 Create parallel inputs; if a partition column is specified, where conditions are generated by splitting on the partition column and the parallelism, e.g. where id between xxx and xxx
3 Execute the sql and load the result into memory (not sure about the implementation details of the cache afterwards)
The problem we ran into: we use mysql as a dimension table (around 15 million rows, parallelism 10) and did not specify a partition condition (only one slot runs the query; the others have no query task).
As a result only one partition queries the data, and the query sql is select
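For reference, a sketch of the Flink 1.10 JDBC scan options that enable partitioned parallel reads, so that each split gets its own where id between xxx and xxx range; all names and bounds here are made up:

    tEnv.sqlUpdate(
        "CREATE TABLE t (" +
        "  id BIGINT, a STRING, b STRING, c STRING" +
        ") WITH (" +
        "  'connector.type' = 'jdbc'," +
        "  'connector.url' = 'jdbc:mysql://localhost:3306/db'," +
        "  'connector.table' = 't'," +
        // Column the source splits on; one BETWEEN range per split.
        "  'connector.read.partition.column' = 'id'," +
        "  'connector.read.partition.num' = '10'," +
        "  'connector.read.partition.lower-bound' = '1'," +
        "  'connector.read.partition.upper-bound' = '15000000'" +
        ")");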
All, it looks like the actual return structure from the API is:
1. Success
> {
>   "status": {
>     "id": "completed"
>   },
>   "operation": {
>     "location": "string"
>   }
> }
2. Failure
> {
>   "status": {
>     "id": "completed"
>   },
>   "operation": {
>     "failure-cause": {
>
Hello Robert,
Thank you for the help. "tried to access method
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvi"
was the root of the issue.
I managed to fix the "hadoop version is 2.4.1" issue; it was because the
previous flink version I was using referred to
Hi
I have attached a simple project with a test that reproduces the problem. The usual failure is a mixed string, but you can also get an EOF exception. Please let me know if you have any questions about the solution.
Med venlig hilsen / Best regards
Lasse Nedergaard
Thank you very much Chesnay!
On Tue, Mar 31, 2020 at 1:32 AM Chesnay Schepler wrote:
> flink-shaded-hadoop2 was released as part of Flink until 1.8 (hence why it
> followed the Flink version scheme), after which it was renamed to
> flink-shaded-hadoop-2 and is now being released separately
Hi jingsong,
I see you submitted two PRs on this issue [1] about supporting a hive streaming sink. Is that code related to FLIP-115?
[1] https://issues.apache.org/jira/browse/FLINK-14255
叶贤勋
yxx_c...@163.com
On Apr 1, 2020 at 16:28, 111 wrote:
Hi jingsong,
That's impressive, it's effectively a data-lake plugin built inside Flink.
Best,
Xinghalo
Hi,
Sorry I missed that. But yes, it looks like you are running two JobManagers :)
You can always check the yarn logs for more information what is being executed.
Piotrek
> On 1 Apr 2020, at 16:44, Antonio Martínez Carratalá
> wrote:
>
> Hi Piotr,
>
> I don't have 2 task managers, just one
Hi Piotr,
I don't have 2 task managers, just one with 2 slots. That would be ok
according to my calculations, but as Craig said I need one more instance
for the cluster master. I was guessing the job manager was running in the
master and the task manager in the slave, but both job manager and
@Marta
Thanks for the tip! I'll do that.
Best,
Yangze Guo
On Wed, Apr 1, 2020 at 8:05 PM Marta Paes Moreira wrote:
>
> Hey, Yangze.
>
> I'd like to suggest that you submit this tool to Flink Community Pages [1].
> That way it can get more exposure and it'll be easier for users to find it.
>
>
Hi Longfei,
Sorry, this really isn't supported at the moment, but the problem will be solved by the new TableSink interface from FLIP-95, hopefully in 1.11.
Best,
Jark
On Wed, 1 Apr 2020 at 19:12, Longfei Zhou wrote:
> Question:
> The SQL does a Group By on the time window and PRODUCT_ID, so the primary key of the PG table must be set to
> WINDOW_START/WINDOW_END and PRODUCT_ID, otherwise data cannot be written in upsert mode. But that cannot satisfy the business requirements; business-wise the key should be RANK_ID
>
Hi,
proctime means machine time; it is not equivalent to the now() or current_timestamp() functions. The field is only materialized (i.e. System.currentTimeMillis is read) when it is actually used.
Could you describe what you want to do with createTemporaryTable? What currently fails to meet your needs?
Best,
Jark
On Wed, 1 Apr 2020 at 18:56, deadwind4 wrote:
>
>
Hey, Yangze.
I'd like to suggest that you submit this tool to Flink Community Pages [1].
That way it can get more exposure and it'll be easier for users to find it.
Thanks for your contribution!
[1] https://flink-packages.org/
On Tue, Mar 31, 2020 at 9:09 AM Yangze Guo wrote:
> Hi, there.
>
Hi Laurent!
On 31.03.20 10:43, Laurent Exsteens wrote:
Yesterday I managed to find another solution: create the type information
outside of the class and pass it to the constructor. I can retrieve the
type information from DataStream.getType(). This works well, and is
acceptable in my
+1 to making Blink the default planner, we definitely don't want to
maintain two planners for much longer.
Best,
Aljoscha
Hey,
The thread you are referring to is about a DataStream API job and a long
checkpointing issue, while from your message it seems like you are using the Table
API (SQL) to process batch data? Or what exactly do you mean by:
> i notice that there are one or two subtasks that take too long to
Hi,
I haven't heard of a Flink-specific problem like that. Have you checked that
the records are not changing over time? That they are not, for example, twice as
large or twice as heavy to process? Especially since you are using a rate limiter
at 12MB/s: if your records grew to 60KB in size,
Hey,
Isn't the explanation of the problem in the logs that you posted? Not enough
memory? You have 2 EMR nodes with 8GB of memory each (16GB in total), while trying to
allocate 2 TaskManagers AND 1 JobManager with a 6GB heap size each, i.e. 3 x 6GB = 18GB?
Piotrek
> On 31 Mar 2020, at 17:01, Antonio Martínez Carratalá
> wrote:
>
> Hi, I'm
Question:
The SQL does a Group By on the time window and PRODUCT_ID, so the primary key of the PG table must be
WINDOW_START/WINDOW_END and PRODUCT_ID, otherwise data cannot be written via upsert. Business-wise the key should be RANK_ID
+ WINDOW_START/WINDOW_END. Does Flink support this?
The scenario is a Top3 ranking.
With the 1.10 createTemporaryTable I found the proctime field is always null, but switching back to the deprecated registerTableSource works.
What should I do if I want to use createTemporaryTable?
Also, I debugged the createTemporaryTable source code and found no handling of proctime.
Hi everyone:
When doing windowed statistics with count over and sum over, the current row cannot be counted; it only gets counted when the next row arrives.
Did I write a parameter wrong, or is there another function I should use? Incoming data should be treated like a closed interval, but right now it behaves like an open one. Any suggestions would be appreciated, thanks!
Hi, Congxian,
I checked HDFS:
/data/flink1_10/tmp/flink-io-01229972-48d4-4229-ac8c-33f0edfe5b7c/job_5ec178dc885a8f1a64c1925e182562e3_op_KeyedProcessOperator_da2b90ef97e5c844980791c8fe08b926__1_2__uuid_772b4663-f633-4ed5-a67a-d1904760a160 is empty, and there is no db directory level under it.
On 2020-04-01 16:50:13, "Congxian Qiu" wrote:
>Hi
Hi
Restore can be roughly divided into two parts: 1) downloading the files; 2) restoring from the downloaded files.
From the TM log it looks like the download failed. You can check whether the file
/data/flink1_10/tmp/flink-io-01229972-48d4-4229-ac8c-33f0edfe5b7c/job_5ec178dc885a8f1a64c1925e182562e3_op_KeyedProcessOperator_da2b90ef97e5c844980791c8fe08b926__1_2__uuid_772b4663-f633-4ed5-a67a-d1904760a160/db/001888.sst
exists; double
Hi jingsong,
That's impressive, it's effectively a data-lake plugin built inside Flink.
Best,
Xinghalo
Hi 111,
Although data lakes can extend some capabilities, streaming writes to Hive are also an important part of the Hive data warehouse.
On the number of files:
- It depends on the checkpoint interval: if a 128MB file can be written within one checkpoint interval, that is already a very suitable file size for HDFS.
- For streaming writes, features like file compaction can also be introduced; FLIP-115 discusses this as well.
Best,
Jingsong Lee
On Wed, Apr 1, 2020 at 4:06 PM 111 wrote:
>
>
> Hi,
> Streaming writes into hive actually fall under the data-lake concept.
>
Hi,
Streaming writes into hive actually fall under the data-lake concept.
Because streaming writes into hive produce many small fragment files, which hurt HDFS performance, one generally does not write directly into hive in streaming scenarios.
For details, look into Delta lake or hudi.
On Apr 1, 2020 at 15:05, sunfulin wrote:
Hi,
The scenario is quite simple: use Flink to sync kafka data into hive in real time. A partitioned table was created in hive.
I think this scenario is very common. I assumed it was supported, since you can create a kafka table through the hivecatalog; being able to create it but not write to it seems unreasonable.
Hi jingjing,
If seconds precision is OK for you, you can try
"to_timestamp(from_unixtime(your_time_seconds_long))".
Best,
Jingsong Lee
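For reference, a sketch of the suggested conversion in a query; the column and table names are made up:

    // Converts epoch seconds to a string and then to TIMESTAMP, with seconds precision.
    Table t = tEnv.sqlQuery(
        "SELECT TO_TIMESTAMP(FROM_UNIXTIME(event_time_seconds)) AS ts FROM my_source");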
On Wed, Apr 1, 2020 at 8:56 AM jingjing bai
wrote:
> Thanks a lot!
>
> Jark Wu 于2020年4月1日周三 上午1:13写道:
>
>> Hi Jing,
>>
>> I created
Hi Lasse
Never met this problem before, but could you share some exception stack traces so
that we can take a look? The simple project to reproduce it is also a good
choice.
Best
Yun Tang
From: Lasse Nedergaard
Sent: Tuesday, March 31, 2020 19:10
To: user
Unfortunately, Flink SQL has indeed never supported this...
Yes, this is one of the important goals for 1.11.
Best,
Jingsong Lee
On Wed, Apr 1, 2020 at 3:05 PM sunfulin wrote:
> Hi,
> The scenario is quite simple: use Flink to sync kafka data into hive in real time. A partitioned table was created in hive.
> I think this scenario is very common. I assumed it was supported, since you can create a kafka table through the hivecatalog; being able to create it but not write to it seems unreasonable.
> OK then. Which release is FLIP-115 planned for? 1.11?
>
>
>
>
>
>
> On
Hi,
The scenario is quite simple: use Flink to sync kafka data into hive in real time. A partitioned table was created in hive.
I think this scenario is very common. I assumed it was supported, since you can create a kafka table through the hivecatalog; being able to create it but not write to it seems unreasonable.
OK then. Which release is FLIP-115 planned for? 1.11?
On 2020-04-01 15:01:32, "Jingsong Li" wrote:
Hi,
Using batch mode to support Kafka -> Hive is not recommended either; only after FLIP-115 can such scenarios be supported in streaming mode.
The job uses rocksdb as the state backend. When the job restarts after an exception, it frequently fails with "Could not restore keyed state backend for
KeyedProcessOperator". How can this problem be solved?
Version: 1.10 standalone
Configuration:
state.backend: rocksdb
state.checkpoints.dir: hdfs://nameservice1/data/flink1_10/checkpoint
state.backend.incremental: true
Hi,
Using batch mode to support Kafka -> Hive is not recommended either; only after FLIP-115 can such scenarios be supported in streaming mode.
Could you describe the detailed stack trace, the application scenario, and the SQL?
Best,
Jingsong Lee
On Wed, Apr 1, 2020 at 2:56 PM sunfulin wrote:
>
> When I use batch mode, the following exception is thrown. It feels like one pitfall after another... sigh
>
> org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not
> enough rules to
When I use batch mode, the following exception is thrown. It feels like one pitfall after another... sigh
org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough
rules to produce a node with desired properties
On 2020-04-01 14:49:41, "Jingsong Li" wrote:
>Hi,
>
>The exception means that the hive sink does not support streaming mode yet and can only be used in batch mode. The feature is under development [1]
>
+1
In 1.10, we already set the default planner for the SQL Client to the Blink planner [1].
Looks good.
[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Set-default-planner-for-SQL-Client-to-Blink-planner-in-1-10-release-td36379.html
Best,
Jingsong Lee
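For reference, a sketch of explicitly selecting the Blink planner in a Table API program on 1.10:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Use the Blink planner instead of the legacy planner.
    EnvironmentSettings settings = EnvironmentSettings.newInstance()
        .useBlinkPlanner()
        .inStreamingMode()
        .build();
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);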
On Wed, Apr 1, 2020 at 11:39 AM
Hi,
The exception means that the hive sink does not support streaming mode yet and can only be used in batch mode. The feature is under development [1]
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-115%3A+Filesystem+connector+in+Table
Best,
Jingsong Lee
On Wed, Apr 1, 2020 at 2:32 PM sunfulin wrote:
> Hi,
> I'm using Flink to consume Kafka data and write it into hive. The connection configuration is all OK, but when actually executing insert into
>
Hi,
I'm using Flink to consume Kafka data and write it into hive. The connection configuration is all OK, but when actually executing insert into
xxx_table, the following exception is thrown. I can't figure out the cause; any guidance from the experts appreciated.
cc @Jingsong Li @Jark Wu
org.apache.flink.table.api.TableException: Stream Tables can only be emitted by
AppendStreamTableSink, RetractStreamTableSink, or UpsertStreamTableSink.
at