In the flink-table-runtime-blink module, see
org.apache.flink.table.runtime.functions.SqlFunctionUtils
for demo implementations.
----
From: "..."
Hi:
I often use Flink SQL's built-in functions, since the official functions are more robust than UDFs I write myself. I'd like to study how the official functions are written. Where in the code base can I find them?
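A quick way to connect the two is to call a built-in function from SQL and then read the matching method in the class named above. A minimal PyFlink sketch; the choice of REGEXP_REPLACE is my own illustration, not from the thread:

from pyflink.table import EnvironmentSettings, TableEnvironment

# Evaluate a built-in string function; per the reply above, the runtime
# implementations of such functions live in flink-table-runtime-blink under
# org.apache.flink.table.runtime.functions.SqlFunctionUtils.
t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())
t_env.execute_sql("SELECT REGEXP_REPLACE('flink-sql', '-', '_')").print()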
Is the data perhaps duplicated? If the table is stored as ORC, you can try running an
alter table table_name partition (pt_dt='2021-02-20') concatenate
statement to merge the small files.
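For reference, a sketch of issuing that compaction from PyFlink, assuming a Hive catalog is registered and assuming the Hive dialect accepts this DDL (otherwise run the same statement directly in beeline; the table and partition names are the placeholders from above):

from pyflink.table import EnvironmentSettings, TableEnvironment, SqlDialect

t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())
# Hive-specific DDL such as CONCATENATE requires the Hive dialect.
t_env.get_config().set_sql_dialect(SqlDialect.HIVE)
t_env.execute_sql(
    "ALTER TABLE table_name PARTITION (pt_dt='2021-02-20') CONCATENATE"
)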
--Original--
From: "RS";
Date: Tuesday, February 22, 2022, 9:36 AM
To: "user-zh";
Subject: hive files grow larger after compacting data with overwrite?
Hi,
Hello, Dan
> On Feb 21, 2022, at 9:11 PM, Dan Serb wrote:
> 1.Have a processor that uses Flink JDBC CDC Connector over the table that
> stores the information I need. (This is implemented currently - working)
You mean you’ve implemented a Flink JDBC Connector? Maybe the Flink CDC
Connectors[1] would help
Hi,
A Flink job writing to Hive was configured with a fairly short checkpoint interval, which produced a lot of small files, one directory per day.
I then ran a Flink SQL job to compact the older data. After it finished, I found the storage usage had grown. What could cause this?
Before compaction there were many small part files; after the overwrite there are fewer files, but storage grew from 274 MB to 2.9 GB?
The partition column of Hive table table1 is `date`:
insert overwrite aw_topic_compact select * from `table1` where
`date`='2022-02-21';
Before compaction:
514.0 M 1.5 G
Hey Marco,
There’s unfortunately no perfect fit here, at least none that I know of. A
Deployment will make it possible to upgrade the image, but does not support
container exits (e.g. if the Flink job completes, even successfully, K8s will
still restart the container). If you are only running long lived
Thanks a lot Yufei and Wong.
I was able to get a version working by combining both the aspects mentioned in
each of your responses.
1. Trying the sample code base that Wong mentioned below resulted in no
response from the JobManager. I had to use the non-sql connector jar in my
python
Hello flink community,
I am deploying a Flink application cluster using a Helm chart. The problem
is that the jobmanager component type is a "Job", and with Helm I can't
upgrade the chart to change the application image version,
because Helm is unable to upgrade the docker
Thanks Guowei and Francis for your references.
On Monday, February 21, 2022, 01:05:58 AM EST, Guowei Ma
wrote:
Hi,
You can try flink's cdc connector [1] to see if it meets your needs.
[1] https://github.com/ververica/flink-cdc-connectors
Best,
Guowei
On Mon, Feb 21, 2022 at 6:23
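For a concrete starting point with [1], a minimal mysql-cdc source table; all connection values below are placeholders, and the flink-sql-connector-mysql-cdc jar is assumed to be on the classpath:

from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
# Placeholder schema and credentials; the connector streams the MySQL table
# (initial snapshot + binlog changes) so other jobs can consume it as a
# continuously updating dynamic table.
t_env.execute_sql("""
    CREATE TABLE users_cdc (
        id BIGINT,
        name STRING,
        PRIMARY KEY (id) NOT ENFORCED
    ) WITH (
        'connector' = 'mysql-cdc',
        'hostname' = 'localhost',
        'port' = '3306',
        'username' = 'flink',
        'password' = 'secret',
        'database-name' = 'mydb',
        'table-name' = 'users'
    )
""")
t_env.execute_sql("SELECT * FROM users_cdc").print()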
Hello all,
I kind of need the community’s help with some ideas, as I’m quite new to
Flink and I feel like I need a little bit of guidance with regard to an
implementation I’m working on.
What I need to do, is to have a way to store a mysql table in Flink, and expose
that data to other jobs,
Luning Wong wrote on Mon, Feb 21, 2022 at 19:38:
> import logging
> import sys
>
> from pyflink.common import SimpleStringSchema, WatermarkStrategy
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.datastream.connectors import PulsarSource,
> PulsarDeserializationSchema,
Hello all,
I have to perform a join between two large CSV sets that do not fit in RAM. I
process these two files in batch mode. I also need a side output to catch CSV
processing errors.
So my question is: what is the best way to do this kind of join operation? I think
I should use a valueState
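If the Table API is an option, a batch join of two filesystem CSV tables will spill to disk rather than requiring everything in RAM. A rough sketch with made-up paths and schemas; note that 'csv.ignore-parse-errors' silently drops malformed rows, so capturing them separately would still need the DataStream side outputs you mention:

from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())
# Two placeholder CSV tables; batch mode lets the join spill to disk.
for name, path in [("a", "file:///data/a.csv"), ("b", "file:///data/b.csv")]:
    t_env.execute_sql(f"""
        CREATE TABLE {name} (
            k STRING,
            v STRING
        ) WITH (
            'connector' = 'filesystem',
            'path' = '{path}',
            'format' = 'csv',
            'csv.ignore-parse-errors' = 'true'
        )
    """)
t_env.execute_sql(
    "SELECT a.k, a.v AS av, b.v AS bv FROM a JOIN b ON a.k = b.k"
).print()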
Hi Ananth,
From the steps you described, the steps involved using
`flink-sql-connector-pulsar-1.15-SNAPSHOT.jar`; however, to my knowledge the
pulsar connector does not support the Table API yet, so would you mind
considering using the `flink-connector-pulsar-1.14.jar` (without sql,
though the classes
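For reference, one way to attach that non-sql connector jar to a PyFlink DataStream job; the jar path is a placeholder:

from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# Ship the DataStream (non-SQL) pulsar connector with the job.
env.add_jars("file:///path/to/flink-connector-pulsar-1.14.0.jar")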
Thanks Guowei.
A small correction to the telnet result command below. I had a typo in the
telnet command earlier (I did not separate the port from the host name, i.e.
`telnet host:port` instead of `telnet host port`). Issuing the proper telnet
command resolved the JobManager's host properly.
Regards,
Ananth
From: Guowei Ma
Date: Monday, 21 February
Thanks Ananth for your clarification. But I am not an expert on Pulsar.
I would cc the author of the connector to have a look. Would Yufei like to
give some insight?
Best,
Guowei
On Mon, Feb 21, 2022 at 2:10 PM Ananth Gundabattula <
agundabatt...@darwinium.com> wrote:
> Thanks for the response
Unsubscribe
Hi Ryan,
Thanks for bringing up this topic. Currently, your analysis is
correct, and reading parquet files outside the Table API is rather
difficult. The community started an effort in Flink 1.15 to
restructure some of the formats to make them better applicable to the
DataStream and Table API.
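In the meantime, a common workaround is to read parquet through the Table API and bridge into the DataStream API. A rough sketch; the schema and path are made-up placeholders, and the flink parquet format jar is assumed to be on the classpath:

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)
# Placeholder parquet-backed table; adjust schema and path to your files.
t_env.execute_sql("""
    CREATE TABLE parquet_in (
        id BIGINT,
        name STRING
    ) WITH (
        'connector' = 'filesystem',
        'path' = 'file:///tmp/input.parquet',
        'format' = 'parquet'
    )
""")
# Bridge from the Table API into a DataStream for non-SQL processing.
ds = t_env.to_data_stream(t_env.from_path("parquet_in"))
ds.print()
env.execute("parquet-bridge")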