Re: StreamingFileSink Avro batch size and compression

2019-01-25 Thread Taher Koitawala
Can someone please help with this? On Fri 25 Jan 2019, 1:47 PM, Taher Koitawala wrote: > Hi All, is there a way to specify *batch size* and *compression* properties when using StreamingFileSink just like we did in BucketingSink? The only parameters it accepts are the inactivity bucket

Re: Print table contents

2019-01-25 Thread Soheil Pourbafrani
Thanks a lot. On Sat, Jan 26, 2019 at 10:22 AM Hequn Cheng wrote: > Hi Soheil, there is no print() or show() method in Table. As a workaround, you can convert[1] the Table into a DataSet and perform print() or collect() on the DataSet. You have to pay attention to the differences

Re: Print table contents

2019-01-25 Thread Hequn Cheng
Hi Soheil, There is no print() or show() method in Table. As a workaround, you can convert[1] the Table into a DataSet and perform print() or collect() on the DataSet. You have to pay attention to the differences between DataSet.print() and DataSet.collect(): DataSet.print() prints the
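
A minimal sketch of this workaround against the Flink 1.7 batch Table API (the sample data and field names are made up for illustration, mirroring the original question):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple4;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.java.BatchTableEnvironment;
    import org.apache.flink.types.Row;

    public class PrintTableExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

            // Small in-memory "Orders" table, mirroring the original question.
            DataSet<Tuple4<Integer, String, Long, String>> raw = env.fromElements(
                    Tuple4.of(5, "NL", 1L, "about-a"),
                    Tuple4.of(42, "DE", 2L, "about-b"));
            tableEnv.registerDataSet("Orders", raw, "id, country, num, about");

            Table results = tableEnv.sqlQuery("SELECT id FROM Orders WHERE id > 10");

            // Convert the Table back into a DataSet of Rows; print() triggers
            // execution and writes the rows to stdout, while collect() would
            // instead return them as a local List.
            DataSet<Row> rows = tableEnv.toDataSet(results, Row.class);
            rows.print();
        }
    }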

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-25 Thread Hequn Cheng
Hi Chesnay, Thanks a lot for the proposal! +1 for a leaner flink-dist and for improving the "Download" page. I think a leaner flink-dist would be very helpful. If we bundle all jars into a single one, that will easily cause class conflict problems. Best, Hequn On Fri, Jan 25, 2019 at 2:48 PM

Re: Query on retract stream

2019-01-25 Thread Hequn Cheng
Hi Gagan, Time attribute fields will be materialized by the unbounded group by. Also, the window currently doesn't have the ability to handle retraction messages. I see two ways to solve the problem. - Use multi-window: the first window performs lastValue, the second performs the count. - Use two
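
A rough SQL sketch of the multi-window idea, assuming an event-time attribute named rowtime, an orders table with orderId and status fields, and a user-defined lastValue aggregate registered beforehand (none of these names come from the thread):

    // Inner window: reduce each order to its latest status per 1-minute window;
    // TUMBLE_ROWTIME propagates a rowtime attribute so the outer window can
    // aggregate again. lastValue(...) is assumed to be a registered UDAF.
    Table pendingPerWindow = tableEnv.sqlQuery(
        "SELECT TUMBLE_START(w_rowtime, INTERVAL '1' MINUTE) AS w_start, " +
        "       COUNT(*) AS pendingOrders " +
        "FROM ( " +
        "  SELECT TUMBLE_ROWTIME(rowtime, INTERVAL '1' MINUTE) AS w_rowtime, " +
        "         orderId, " +
        "         lastValue(status) AS status " +
        "  FROM orders " +
        "  GROUP BY TUMBLE(rowtime, INTERVAL '1' MINUTE), orderId " +
        ") " +
        "WHERE status = 'PENDING' " +
        "GROUP BY TUMBLE(w_rowtime, INTERVAL '1' MINUTE)");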

Re: TimeZone shift problem in Flink SQL

2019-01-25 Thread Rong Rong
Hi Henry, Unix epoch time values are always under GMT timezone, for example: - 1548162182001 <=> GMT: Tuesday, January 22, 2019 1:03:02.001 PM, or CST: Tuesday, January 22, 2019 9:03:02.001 PM. - 1548190982001 <=> GMT: Tuesday, January 22, 2019 9:03:02.001 PM, or CST: Wednesday, January 23, 2019
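
For reference, the same conversion spelled out with plain java.time (nothing Flink-specific; CST here is read as China Standard Time, UTC+8):

    import java.time.Instant;
    import java.time.ZoneId;

    public class EpochTimeZones {
        public static void main(String[] args) {
            long epochMillis = 1548162182001L;
            Instant instant = Instant.ofEpochMilli(epochMillis);

            // Epoch millis carry no timezone; the zone only matters when rendering.
            System.out.println(instant.atZone(ZoneId.of("UTC")));
            // 2019-01-22T13:03:02.001Z[UTC]
            System.out.println(instant.atZone(ZoneId.of("Asia/Shanghai")));
            // 2019-01-22T21:03:02.001+08:00[Asia/Shanghai]
        }
    }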

Re: Query on retract stream

2019-01-25 Thread Gagan Agrawal
Based on the suggestions in this mail thread, I tried out a few experiments on an upsert stream with Flink 1.7.1, and here is the issue I am facing with the windowed stream. *1. Global pending order count.* The following query works fine and is able to handle updates as per the original requirement: select

Re: How test and validate a data stream software?

2019-01-25 Thread Congxian Qiu
Hi Alexandre, maybe the blog post [1] can be helpful. [1] https://www.da-platform.com/blog/extending-the-yahoo-streaming-benchmark Alexandre Strapacao Guedes Vianna wrote on Wed, Jan 23, 2019 at 9:54 PM: > Hello People, I'm conducting a study for my PhD about applications using data stream processing,

Print table contents

2019-01-25 Thread Soheil Pourbafrani
Hi, using a Flink Table object, how can we print the table contents, something like Spark's show() method? For example, in the following: tableEnv.registerDataSet("Orders", raw, "id, country, num, about"); Table results = tableEnv.sqlQuery("SELECT id FROM Orders WHERE id > 10"); How can I print the

AssertionError: mismatched type $5 TIMESTAMP(3)

2019-01-25 Thread Chris Miller
I'm trying to group some data and then enrich it by joining with a temporal table function; however, my test code (attached) is failing with the error shown below. Can someone please give me a clue as to what I'm doing wrong? Exception in thread "main" java.lang.AssertionError: mismatched type

RE: [Flink 1.6] How to get current total number of processed events

2019-01-25 Thread Thanh-Nhan Vo
Hi Kien, Thank you so much for your answer! Regards, Nhan From: Kien Truong [mailto:duckientru...@gmail.com] Sent: Friday, 25 January 2019 13:47 To: Thanh-Nhan Vo; user@flink.apache.org Subject: Re: [Flink 1.6] How to get current total number of processed events Hi Nhan, To get a global

Re: [Flink 1.6] How to get current total number of processed events

2019-01-25 Thread Kien Truong
Hi Nhan, To get a global view over all events, you can use a non-keyed TumblingWindow and a ProcessAllWindowFunction. Inside the ProcessAllWindowFunction, you calculate the min/max/count of the elements of that window, compare them to the existing values in the global state, then update
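
A minimal sketch of that pattern, assuming a stream of Long values and processing-time tumbling windows (the class and state names are made up):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class RunningTotalCount extends ProcessAllWindowFunction<Long, Long, TimeWindow> {

        @Override
        public void process(Context ctx, Iterable<Long> elements, Collector<Long> out) throws Exception {
            // Count the elements that fell into this window.
            long windowCount = 0;
            for (Long ignored : elements) {
                windowCount++;
            }

            // Merge the window count into the cross-window total kept in global state.
            ValueState<Long> total = ctx.globalState()
                    .getState(new ValueStateDescriptor<>("total-count", Long.class));
            long newTotal = (total.value() == null ? 0L : total.value()) + windowCount;
            total.update(newTotal);

            // Emit the total number of events processed so far.
            out.collect(newTotal);
        }
    }

It would be wired in with something like stream.windowAll(TumblingProcessingTimeWindows.of(Time.minutes(1))).process(new RunningTotalCount()).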

Re: No resource available error while testing HA

2019-01-25 Thread Averell
Hi Gary, Yes, my problem mentioned in the original post has been resolved by correcting the ZooKeeper connection string. I have two other related questions; if you have time, please help: 1. Regarding JM high availability, when I shut down the host running the JM, YARN would detect that

Re: Is there a way to get all flink build-in SQL functions

2019-01-25 Thread Timo Walther
The problem right now is that Flink SQL has two stacks for defining functions. One is the built-in function stack that is based on Calcite, and the other is the registered UDFs. What you can do is use FunctionCatalog.withBuiltIns.getSqlOperatorTable() for listing the Calcite built-in
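
A rough sketch of how that listing might look from Java, under the assumption that this refers to the internal org.apache.flink.table.validate.FunctionCatalog Scala object in Flink 1.7, reached from Java through its generated companion FunctionCatalog$; this is internal API and may change:

    import org.apache.calcite.sql.SqlOperator;
    import org.apache.flink.table.validate.FunctionCatalog;
    import org.apache.flink.table.validate.FunctionCatalog$;

    public class ListBuiltInFunctions {
        public static void main(String[] args) {
            // Assumption: withBuiltIns returns a catalog pre-populated with the
            // Calcite-based built-in functions, as described in the mail above.
            FunctionCatalog catalog = FunctionCatalog$.MODULE$.withBuiltIns();
            for (SqlOperator op : catalog.getSqlOperatorTable().getOperatorList()) {
                System.out.println(op.getName());
            }
        }
    }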

Re: Is there a way to get all flink build-in SQL functions

2019-01-25 Thread Jeff Zhang
I believe it makes sense to list the available UDFs programmatically, e.g. users may want to see the available UDFs in the SQL client. It would also benefit other downstream projects that use Flink SQL. Besides that, I think Flink should also provide an API for querying the description of a UDF and how to use it.

Re: [Flink 1.6] How to get current total number of processed events

2019-01-25 Thread Congxian Qiu
Hi, Nhan There is only one way I know of to sum up all the parallel operator instances: set the parallelism to 1. Best, Congxian Thanh-Nhan Vo wrote on Fri, Jan 25, 2019 at 4:38 PM: > Hi Congxian Qiu, > Thank you for your answer. > If I understand correctly, each operator state is bound to one parallel > operator

Re: Is there a way to get all flink build-in SQL functions

2019-01-25 Thread yinhua.dai
Thanks guys. I was just wondering if there is another way besides hard-coding the list :) Thanks anyway.

RE: [Flink 1.6] How to get current total number of processed events

2019-01-25 Thread Thanh-Nhan Vo
Hi Congxian Qiu, Thank you for your answer. If I understand correctly, each operator state is bound to one parallel operator instance. Indeed, I expect to get the total across all parallel operator instances. Is there a way to sum up all these operator states, please? Best regards, Nhan From:

RE: [Flink 1.6] How to get current total number of processed events

2019-01-25 Thread Thanh-Nhan Vo
Hi Kien, Thank you for your answer. Please correct me if I'm wrong: if I understand correctly, when I store the max/min value using the value state of a KeyedProcessFunction, this max/min value is calculated per key? Note that in my case, I expect that at every instant I can obtain the

StreamingFileSink Avro batch size and compression

2019-01-25 Thread Taher Koitawala
Hi All, Is there a way to specify *batch size* and *compression* properties when using StreamingFileSink just like we did in BucketingSink? The only parameters it accepts are the inactivity bucket check interval and the Avro schema. We have numerous Flink jobs pulling data from
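
For comparison, the knobs that do exist on StreamingFileSink in Flink 1.7: a row-encoded sink can take a DefaultRollingPolicy to bound part-file size and age (the closest thing to BucketingSink's batch size), while bulk formats such as Avro/Parquet roll on every checkpoint and any compression has to happen inside the BulkWriter itself. A rough row-format sketch, with made-up paths and sizes:

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

    public class RollingSinkSketch {
        public static StreamingFileSink<String> build() {
            // Roll a new part file after 128 MB, after 15 minutes, or after
            // 5 minutes of inactivity -- the closest equivalent to the old
            // BucketingSink batch-size setting.
            DefaultRollingPolicy<String, String> rollingPolicy = DefaultRollingPolicy
                    .create()
                    .withMaxPartSize(128 * 1024 * 1024)
                    .withRolloverInterval(15 * 60 * 1000)
                    .withInactivityInterval(5 * 60 * 1000)
                    .build();

            return StreamingFileSink
                    .forRowFormat(new Path("hdfs:///tmp/output"), new SimpleStringEncoder<String>("UTF-8"))
                    .withRollingPolicy(rollingPolicy)
                    .withBucketCheckInterval(60 * 1000)
                    .build();
        }
    }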