Re: [DISCUSS] Flink 1.6 features

zhangminglei Mon, 18 Jun 2018 18:31:19 -0700

Hi, Sagar

Thank you for your review. I will fix it when I get a chance.


> 2. Will you be able to add more unit tests to the commit? E.g., writing some 
> example data with a simple schema, which initializes an OrcWriter object and 
> sinks it to a local HDFS node?


Ans: Yes. I will add more unit tests to it.

> 3. Are there plans to add support for other data types?

Ans: Yes. I have been busy these days, but in a couple of days I will add 
the remaining data types, along with more tests for them.

Cheers
Zhangminglei


> On Jun 19, 2018, at 9:10 AM, sagar loke <sagar...@gmail.com> wrote:
> 
> Thanks @zhangminglei for replying. 
> 
> I agree, Hive on Flink would be a big project. 
> 
> By the way, I looked at the JIRA ticket related to the ORC format that you 
> shared. 
> 
> A couple of comments/requests about the pull request in the ticket:
> 
> 1. Sorry for nitpicking, but meatSchema is misspelled. I think it should be 
> metaSchema. 
> 
> 2. Will you be able to add more unit tests to the commit? E.g., writing some 
> example data with a simple schema, which initializes an OrcWriter object and 
> sinks it to a local HDFS node?
> 
> 3. Are there plans to add support for other data types?
> 
> Thanks,
> Sagar
> 
> On Sun, Jun 17, 2018 at 6:45 AM zhangminglei <18717838...@163.com> wrote:
> But if we do Hive on Flink, I think it would be a very big project.
> 
> 
> > On Jun 17, 2018, at 9:36 PM, Will Du <will...@gmail.com> wrote:
> > 
> > Agreed. I think two missing pieces could make Flink more competitive against 
> > Spark SQL/Streaming and Kafka Streams:
> > 1. Flink over Hive, or a Flink SQL Hive table source and sink
> > 2. Flink ML on streams
> > 
> > 
> >> On Jun 17, 2018, at 8:34 AM, zhangminglei <18717838...@163.com> wrote:
> >> 
> >> Actually, I have had an idea: how about supporting Hive on Flink? A lot 
> >> of business logic is written in Hive SQL, and users want to migrate from 
> >> MapReduce to Flink without changing the SQL.
> >> 
> >> Zhangminglei
> >> 
> >> 
> >> 
> >>> On Jun 17, 2018, at 8:11 PM, zhangminglei <18717838...@163.com> wrote:
> >>> 
> >>> Hi, Sagar
> >>> 
> >>> There are already related JIRAs for ORC and Parquet; you can take a look 
> >>> here: 
> >>> 
> >>> https://issues.apache.org/jira/browse/FLINK-9407 and 
> >>> https://issues.apache.org/jira/browse/FLINK-9411
> >>> 
> >>> For the ORC format, only basic data types are currently supported, such as 
> >>> Long, Boolean, Short, Integer, Float, Double, and String. 
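To make "basic data types" concrete: ORC describes each row with a struct schema string such as `struct<id:bigint,name:string>`, which `org.apache.orc.TypeDescription.fromString(...)` can parse. The Java sketch below is illustrative only: the helper class `OrcSchemaSketch` and its methods are hypothetical and are not taken from the FLINK-9407 pull request. It maps the seven Java types listed above to ORC type names, builds such a schema, and rejects any other type, mirroring the current limitation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, NOT part of the FLINK-9407 PR: maps the basic Java
// types mentioned in the thread to ORC type names and builds a struct schema.
class OrcSchemaSketch {

    // Returns the ORC type name for one of the supported basic types.
    static String orcTypeName(Class<?> type) {
        if (type == Long.class) return "bigint";
        if (type == Boolean.class) return "boolean";
        if (type == Short.class) return "smallint";
        if (type == Integer.class) return "int";
        if (type == Float.class) return "float";
        if (type == Double.class) return "double";
        if (type == String.class) return "string";
        // Anything else (arrays, maps, nested rows, ...) is not handled here.
        throw new IllegalArgumentException("Unsupported type: " + type.getName());
    }

    // Builds e.g. "struct<id:bigint,name:string>"; field order follows the map.
    static String structSchema(Map<String, Class<?>> fields) {
        StringJoiner joiner = new StringJoiner(",", "struct<", ">");
        for (Map.Entry<String, Class<?>> field : fields.entrySet()) {
            joiner.add(field.getKey() + ":" + orcTypeName(field.getValue()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, Class<?>> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("id", Long.class);
        fields.put("name", String.class);
        fields.put("active", Boolean.class);
        System.out.println(structSchema(fields)); // struct<id:bigint,name:string,active:boolean>
    }
}
```

A `LinkedHashMap` is used so that the column order in the schema matches the order in which fields were declared.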
> >>> 
> >>> Best
> >>> Zhangminglei
> >>> 
> >>> 
> >>> 
> >>>> On Jun 17, 2018, at 11:11 AM, sagar loke <sagar...@gmail.com> wrote:
> >>>> 
> >>>> We are eagerly waiting for 
> >>>> 
> >>>> - Extend Streaming Sinks:
> >>>>   - Bucketing Sink should support S3 properly (compensate for eventual 
> >>>> consistency), work with Flink's shaded S3 file systems, and efficiently 
> >>>> support formats that compress/index across individual rows (Parquet, 
> >>>> ORC, ...)
> >>>> 
> >>>> Especially the ORC and Parquet sinks, since we are planning to use 
> >>>> Kafka-jdbc to move data from RDBMS to HDFS. 
> >>>> 
> >>>> Thanks,
> >>>> 
> >>>> On Sat, Jun 16, 2018 at 5:08 PM Elias Levy <fearsome.lucid...@gmail.com> wrote:
> >>>> One more, since we have to deal with it often:
> >>>> 
> >>>> - Idling sources (Kafka in particular) and proper watermark propagation: 
> >>>> FLINK-5018 / FLINK-5479
> >>>> 
> >>>> On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy <fearsome.lucid...@gmail.com> wrote:
> >>>> Since wishes are free:
> >>>> 
> >>>> - Standalone cluster job isolation: 
> >>>> https://issues.apache.org/jira/browse/FLINK-8886
> >>>> - Proper sliding window joins (not overlapping hopping window joins): 
> >>>> https://issues.apache.org/jira/browse/FLINK-6243
> >>>> - Sharing state across operators: 
> >>>> https://issues.apache.org/jira/browse/FLINK-6239
> >>>> - Synchronizing streams: 
> >>>> https://issues.apache.org/jira/browse/FLINK-4558
> >>>> 
> >>>> Seconded:
> >>>> - Atomic cancel-with-savepoint: 
> >>>> https://issues.apache.org/jira/browse/FLINK-7634
> >>>> - Support dynamically changing CEP patterns: 
> >>>> https://issues.apache.org/jira/browse/FLINK-7129
> >>>> 
> >>>> 
> >>>> On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen <se...@apache.org> wrote:
> >>>> Hi all!
> >>>> 
> >>>> Thanks for the discussion and good input. Many suggestions fit well with 
> >>>> the proposal above.
> >>>> 
> >>>> Please bear in mind that with a time-based release model, we would 
> >>>> release whatever is mature by the end of July.
> >>>> The good thing is that we could schedule the next release not too long 
> >>>> after that, so the features that did not quite make it will not be 
> >>>> delayed too much.
> >>>> In some sense, you could read this as a "what to do first" list, rather 
> >>>> than "this goes in, other things stay out".
> >>>> 
> >>>> Some thoughts on some of the suggestions:
> >>>> 
> >>>> Kubernetes integration: An opaque integration with Kubernetes should be 
> >>>> supported through the "as a library" mode. For a deeper integration, I 
> >>>> know that some committers have experimented with some PoC code. I will 
> >>>> let Till add some thoughts; he has worked the most on the deployment 
> >>>> parts recently.
> >>>> 
> >>>> Per partition watermarks with idleness: Good point, could one implement 
> >>>> that on the current interface, with a periodic watermark extractor?
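One possible reading of that question: keep the maximum event timestamp per partition, and let the periodically computed watermark skip partitions that have been silent for longer than some timeout. The Java sketch below illustrates only that idea and is framework-free. The class `IdleAwareWatermarkSketch`, its methods, and the `idleTimeoutMs` parameter are hypothetical; they are not Flink's `AssignerWithPeriodicWatermarks` API and not the FLINK-5479 implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, framework-free sketch; NOT Flink's watermark assigner API and
// NOT the FLINK-5479 fix. Idle partitions are excluded from the min().
class IdleAwareWatermarkSketch {

    private final long maxOutOfOrdernessMs; // bounded lateness subtracted from the watermark
    private final long idleTimeoutMs;       // assumed knob: silence longer than this means "idle"
    private final Map<Integer, Long> maxTimestamp = new HashMap<>(); // partition -> highest event time
    private final Map<Integer, Long> lastSeen = new HashMap<>();     // partition -> time of last record
    private long lastEmitted = Long.MIN_VALUE;                       // keeps the watermark monotonic

    IdleAwareWatermarkSketch(long maxOutOfOrdernessMs, long idleTimeoutMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        this.idleTimeoutMs = idleTimeoutMs;
    }

    // Called for every record read from a partition.
    void onEvent(int partition, long eventTimestamp, long nowMs) {
        maxTimestamp.merge(partition, eventTimestamp, Long::max);
        lastSeen.put(partition, nowMs);
    }

    // Called on a timer, in the spirit of a periodic watermark extractor.
    long currentWatermark(long nowMs) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (Map.Entry<Integer, Long> e : maxTimestamp.entrySet()) {
            if (nowMs - lastSeen.get(e.getKey()) > idleTimeoutMs) {
                continue; // idle partition: do not let it hold back the watermark
            }
            anyActive = true;
            min = Math.min(min, e.getValue());
        }
        if (anyActive) {
            lastEmitted = Math.max(lastEmitted, min - maxOutOfOrdernessMs);
        }
        return lastEmitted; // if every partition is idle, hold the previous watermark
    }

    public static void main(String[] args) {
        IdleAwareWatermarkSketch wm = new IdleAwareWatermarkSketch(0L, 1_000L);
        wm.onEvent(0, 100L, 0L);
        wm.onEvent(1, 50L, 0L);
        System.out.println(wm.currentWatermark(500L));   // 50: partition 1 holds it back
        wm.onEvent(0, 2_000L, 1_500L);
        System.out.println(wm.currentWatermark(1_600L)); // 2000: partition 1 idle for 1600 ms
    }
}
```

The trade-off of this approach is that when an idle partition resumes, its records may fall behind the already-advanced watermark and be treated as late. The `Math.max` guard keeps the watermark from moving backwards instead of trying to accommodate them.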
> >>>> 
> >>>> Atomic cancel-with-savepoint: Agreed, this is important. Making this 
> >>>> work with all sources needs a bit more work. We should have this in the 
> >>>> roadmap.
> >>>> 
> >>>> Elastic Bloomfilters: This seems like an interesting new feature - the 
> >>>> above suggested feature set was more about addressing some longer 
> >>>> standing issues/requests. However, nothing should prevent contributors 
> >>>> from working on that.
> >>>> 
> >>>> Best,
> >>>> Stephan
> >>>> 
> >>>> 
> >>>> On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science] 
> >>>> <yz...@coupang.com> wrote:
> >>>> +1 on https://issues.apache.org/jira/browse/FLINK-5479 
> >>>> 
> >>>> From: Rico Bergmann <i...@ricobergmann.de>
> >>>> Sent: Tuesday, June 5, 2018 9:12:00 PM
> >>>> To: Hao Sun
> >>>> Cc: d...@flink.apache.org; user
> >>>> Subject: Re: [DISCUSS] Flink 1.6 features
> >>>> 
> >>>> +1 on K8s integration 
> >>>> 
> >>>> 
> >>>> 
> >>>> On Jun 6, 2018, at 00:01, Hao Sun <ha...@zendesk.com> wrote:
> >>>> 
> >>>>> Adding my vote for K8s job mode; maybe it is this?
> >>>>>> Smoothen the integration in container environments, like "Flink as a 
> >>>>>> Library", and easier integration with Kubernetes services and other 
> >>>>>> proxies.
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> On Mon, Jun 4, 2018 at 11:01 PM Ben Yan <yan.xiao.bin.m...@gmail.com> wrote:
> >>>>> Hi Stephan,
> >>>>> 
> >>>>> Will https://issues.apache.org/jira/browse/FLINK-5479 (Per-partition 
> >>>>> watermarks in FlinkKafkaConsumer should consider idle partitions) be 
> >>>>> included in 1.6? We are seeing more and more users with this issue on the 
> >>>>> mailing lists.
> >>>>> 
> >>>>> Thanks.
> >>>>> Ben
> >>>>> 
> >>>>> 2018-06-05 5:29 GMT+08:00 Che Lui Shum <sh...@us.ibm.com>:
> >>>>> Hi Stephan,
> >>>>> 
> >>>>> Will FLINK-7129 (Support dynamically changing CEP patterns) be included 
> >>>>> in 1.6? There were discussions about possibly including it in 1.6: 
> >>>>> http://mail-archives.apache.org/mod_mbox/flink-user/201803.mbox/%3cCAMq=ou7gru2o9jtowxn1lc1f7nkcxayn6a3e58kxctb4b50...@mail.gmail.com%3e
> >>>>> 
> >>>>> Thanks,
> >>>>> Shirley Shum
> >>>>> 
> >>>>> 
> >>>>> From: Stephan Ewen <se...@apache.org>
> >>>>> To: d...@flink.apache.org, user <user@flink.apache.org>
> >>>>> Date: 06/04/2018 02:21 AM
> >>>>> Subject: [DISCUSS] Flink 1.6 features
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> Hi Flink Community!
> >>>>> 
> >>>>> The release of Apache Flink 1.5 has happened (yay!) - so it is a good 
> >>>>> time to start talking about what to do for release 1.6.
> >>>>> 
> >>>>> == Suggested release timeline ==
> >>>>> 
> >>>>> I would propose to release around the end of July (that is, 8-9 weeks 
> >>>>> from now).
> >>>>> 
> >>>>> The rationale behind that: There was a lot of effort in release testing 
> >>>>> automation (end-to-end tests, scripted stress tests) as part of release 
> >>>>> 1.5. You may have noticed the big set of new modules under 
> >>>>> "flink-end-to-end-tests" in the Flink repository. This effort delayed the 
> >>>>> 1.5 release a bit and needs to continue during the coming release cycle, 
> >>>>> but it should make releasing more lightweight from now on.
> >>>>> 
> >>>>> (Side note: There are also some nightly stress tests that we created 
> >>>>> and run at data Artisans; we are looking into whether, and in which 
> >>>>> way, it would make sense to contribute them to Flink.)
> >>>>> 
> >>>>> == Features and focus areas ==
> >>>>> 
> >>>>> We had a lot of big and heavy features in Flink 1.5, with FLIP-6, the 
> >>>>> new network stack, recovery, SQL joins and client, ... Following 
> >>>>> something like a "tick-tock model", I would suggest focusing the next 
> >>>>> release more on integrations, tooling, and reducing user friction. 
> >>>>> 
> >>>>> Of course, this does not mean that no other pull request gets reviewed 
> >>>>> and no other topic will be examined; it is simply meant to help people 
> >>>>> understand where to expect more activity during the next release cycle. 
> >>>>> Note that these are only the coarse focus areas; don't read this as 
> >>>>> a comprehensive list.
> >>>>> 
> >>>>> This list is my first suggestion, based on discussions with committers, 
> >>>>> users, and mailing list questions.
> >>>>> 
> >>>>> - Support Java 9 and Scala 2.12
> >>>>> 
> >>>>> - Smoothen the integration in container environments, like "Flink as a 
> >>>>> Library", and easier integration with Kubernetes services and other 
> >>>>> proxies.
> >>>>> 
> >>>>> - Polish the remaining parts of the FLIP-6 rewrite
> >>>>> 
> >>>>> - Improve state backends with asynchronous timer snapshots, efficient 
> >>>>> timer deletes, state TTL, and broadcast state support in RocksDB.
> >>>>> 
> >>>>> - Extend Streaming Sinks:
> >>>>>   - Bucketing Sink should support S3 properly (compensate for eventual 
> >>>>> consistency), work with Flink's shaded S3 file systems, and efficiently 
> >>>>> support formats that compress/index across individual rows (Parquet, 
> >>>>> ORC, ...)
> >>>>>   - Support Elasticsearch's new REST API
> >>>>> 
> >>>>> - Smoothen State Evolution to support type conversion on snapshot 
> >>>>> restore
> >>>>> 
> >>>>> - Enhance Stream SQL and CEP
> >>>>>   - Add support for "update by key" Table Sources
> >>>>>   - Add more table sources and sinks (Kafka, Kinesis, Files, K/V stores)
> >>>>>   - Expand SQL client
> >>>>>   - Integrate CEP and SQL, through MATCH_RECOGNIZE clause
> >>>>>   - Improve CEP Performance of SharedBuffer on RocksDB
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> -- 
> >>>> Cheers,
> >>>> Sagar
> >>> 
> >> 
> >> 
> 
> 
> -- 
> Cheers,
> Sagar
