Re: How to split tuple2 returned by UDAF into two columns in a result table

2019-03-19 Thread Dongwon Kim
Another, yet related question: Is there something like aggregate table function? In the above scenario, I have to apply an aggregate function and then apply a table function solely to flatten tuples, which seems quite inefficient. On Wed, Mar 20, 2019 at 1:09 PM Dongwon Kim wrote: > Hi Kurt,

Re: [VOTE] Release 1.8.0, release candidate #3

2019-03-19 Thread jincheng sun
Hi Aljoscha&All, When I did the `end-to-end` test for RC3 under Mac OS, I found the following two problems: 1. The verification returned for different `minikube status` is is not enough for the robustness. The strings returned by different versions of different platforms are different. the follow

Re: How to split tuple2 returned by UDAF into two columns in a result table

2019-03-19 Thread Dongwon Kim
Hi Kurt, You're right; It is table function like "mytablefunc(col1, col2, col3) as (col4, col5)". I've got to define a custom UDTF for that purpose. Thanks, - Dongwon On Wed, Mar 20, 2019 at 12:04 PM Kurt Young wrote: > Hi Dongwon, > > AFAIK, Flink doesn't support the usage like "myscalar(col1,

Re: [PROGRESS UPDATE] [DISCUSS] Flink-Hive Integration and Catalogs

2019-03-19 Thread Shaoxuan Wang
Hi Bowen, Thanks for driving this. I am CCing this email/survey to user-zh@ flink.apache.org as well. I heard there are lots of interests on Flink-Hive from the field. One of the biggest requests the hive users are raised is "the support of out-of-date hive version". A large amount of users are sti

Re: [DISCUSS] Create a Flink ecosystem website

2019-03-19 Thread Becket Qin
I agree. We can start with english-only and see how it goes. The comments and descriptions can always be multi-lingual but that is up to the package owners. On Tue, Mar 19, 2019 at 6:07 PM Robert Metzger wrote: > Thanks. > > Do we actually want this page to be multi-language? > > I propose to ma

Re: How to split tuple2 returned by UDAF into two columns in a result table

2019-03-19 Thread Kurt Young
Hi Dongwon, AFAIK, Flink doesn't support the usage like "myscalar(col1, col2, col3) as (col4, col5)". Am I missing something? If you want to split Tuple2 into two different columns, you can use UDTF. Best, Kurt On Wed, Mar 20, 2019 at 9:59 AM Dongwon Kim wrote: > Hi, > > I want to split Tupl

How to split tuple2 returned by UDAF into two columns in a result table

2019-03-19 Thread Dongwon Kim
Hi, I want to split Tuple2 returned by AggregateFunction.getValue into two different columns in a resultant table. Let's consider the following example where myudaf returns Tuple2: Table table2 = table1 .window(Slide.over("3.rows").every("1.rows").on("time").as("w")) .groupBy("w,

Re: Backoff strategies for async IO functions?

2019-03-19 Thread Rong Rong
Thanks for the feedback @Till. Yes I agree as well that opening up or changing the AsyncWaitOperator doesn't seem to be a necessity here. I think making "AsyncFunctionBase", making the current AsyncFunction as a extension of it with a some of the default behaviors like Shuyi suggested seems to be

[PROGRESS UPDATE] [DISCUSS] Flink-Hive Integration and Catalogs

2019-03-19 Thread Bowen Li
Hi Flink users and devs, We want to get your feedbacks on integrating Flink with Hive. Background: In Flink Forward in Beijing last December, the community announced to initiate efforts on integrating Flink and Hive. On Feb 21 Seattle Flink Meetup

Re: Schema Evolution on Dynamic Schema

2019-03-19 Thread Shahar Cizer Kobrinsky
My bad. it actually did work with Select a, map['sumB', sum(b) ,'sumC' , sum(c) ) as metric_map group by a do you think thats OK as a workaround? main schema should be changed that way - only keys in the map On Tue, Mar 19, 2019 at 11:50 AM Shahar Cizer Kobrinsky < shahar.kobrin...@gmail.com> wro

Re: Async Function Not Generating Backpressure

2019-03-19 Thread Ken Krugler
Hi Seed, I was assuming the Cassandra sink was separate from and after your async function. I was trying to come up for an explanation as to why adding the async function would improve your performance. The only very unlikely reason I thought of was that the async function somehow caused data

Re: Schema Evolution on Dynamic Schema

2019-03-19 Thread Shahar Cizer Kobrinsky
Thanks Fabian, Im thinking about how to work around that issue and one thing that came to my mind is to create a map that holds keys & values that can be edited without changing the schema, though im thinking how to implement it in Calcite. Considering the following original SQL in which "metrics"

Re: Async Function Not Generating Backpressure

2019-03-19 Thread Seed Zeng
Hi Ken and Andrey, Thanks for the response. I think there is a confusion that the writes to Cassandra are happening within the Async function. In my test, the async function is just a pass-through without doing any work. So any Cassandra related batching or buffering should not be the cause for t

Re: ingesting time for TimeCharacteristic.IngestionTime on unit test

2019-03-19 Thread Andrey Zagrebin
Hi Avi, do you use processing time timer (timerService().registerProcessingTimeTimer)? why do you need ingestion time? do you set TimeCharacteristic.IngestionTime? Best, Andrey On Tue, Mar 19, 2019 at 1:11 PM Avi Levi wrote: > Hi, > Our stream is not based on time sequence and we do not use ti

Re: Backoff strategies for async IO functions?

2019-03-19 Thread Till Rohrmann
Sorry for joining the discussion so late. I agree that we could add some more syntactic sugar for handling failure cases. Looking at the existing interfaces, I think it should be fairly easy to create an abstract class AsyncFunctionWithRetry which extends AsyncFunction and encapsulates the retry lo

Re: Status.JVM.Memory.Heap.Used metric shows only a few megabytes but profiling shows gigabytes (as expected)

2019-03-19 Thread Chesnay Schepler
Known issue, fixed in 1.7.3/1.8.0: https://issues.apache.org/jira/browse/FLINK-11183 On 19.03.2019 15:03, gerardg wrote: Hi, Before Flink 1.7.0 we were getting correct values in Status.JVM.Memory.Heap.Used metric. Since updating we just see a constant small value (just a few megabytes), did so

Re: Flink 1.7.2 extremely unstable and losing jobs in prod

2019-03-19 Thread Andrey Zagrebin
Hi Bruno, could you also share the job master logs? Thanks, Andrey On Tue, Mar 19, 2019 at 12:03 PM Bruno Aranda wrote: > Hi, > > This is causing serious instability and data loss in our production > environment. Any help figuring out what's going on here would be really > appreciated. > > We

Re: End-to-end exactly-once semantics in simple streaming app

2019-03-19 Thread Andrey Zagrebin
Hi Patrick, One approach, I would try, is to use Flink state and sync it with database in initializeState and CheckpointListener.notifyCheckpointComplete. Basically issue only idempotent updates to database but only when the last checkpoint is securely taken and records before it are not processed

Re: Task slot sharing: force reallocation

2019-03-19 Thread Till Rohrmann
Hi Le, at the moment Flink does not allow the user to specify how the individual task a being deployed (spread out across all TMs vs. filling up a TM before using another). At the moment, the current implementation uses the second strategy. That's why you see all sources being deployed to the same

Re: Async Function Not Generating Backpressure

2019-03-19 Thread Ken Krugler
Hi Seed, It’s a known issue that Flink doesn’t report back pressure properly for AsyncFunctions, due to how it monitors the output collector to gather back pressure statistics. But that wouldn’t explain how you get a faster processing with the AsyncFunction inserted into your workflow. I have

Re: Async Function Not Generating Backpressure

2019-03-19 Thread Andrey Zagrebin
AsyncFunctionHi Seed, I think the back pressure should emerge by blocking the AsyncFunction.asyncInvoke call. So it depends on how ResultFuture is generated from Cassandra client whether it blocks on submitting request or not when the number of pending requests is too big. Maybe, AsyncFunction.asy

Re: Re: Flink 1.7.2: All jobs are getting deployed on the same task manager

2019-03-19 Thread Andrey Zagrebin
Hi Kumar and Andrea, this is a known change in Flink behaviour from 1.4 to 1.5 (after FLIP-6). There is an issue to track progress on more fine-grained task distribution [1]. Best, Andrey [1] https://issues.apache.org/jira/browse/FLINK-11815 On Mon, Mar 18, 2019 at 1:28 PM Kumar Bolar, Harshith

Re: Iterator Data Sync

2019-03-19 Thread Andrey Zagrebin
Hi Mikhail, could you create a JIRA issue to discuss the change? Best, Andrey On Mon, Mar 18, 2019 at 3:10 PM Mikhail Pryakhin wrote: > Hello Flink community! > > I've come across of employing an "Iterator Data Sync"[1] approach to test > output from a streaming pipeline. The pipeline consists

Status.JVM.Memory.Heap.Used metric shows only a few megabytes but profiling shows gigabytes (as expected)

2019-03-19 Thread gerardg
Hi, Before Flink 1.7.0 we were getting correct values in Status.JVM.Memory.Heap.Used metric. Since updating we just see a constant small value (just a few megabytes), did something change? Gerard -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

ingesting time for TimeCharacteristic.IngestionTime on unit test

2019-03-19 Thread Avi Levi
Hi, Our stream is not based on time sequence and we do not use time based operations. we do want to clean the state after x days hence we fire timer event. My problem is that our unit test fires the event immediately (there is no ingestion time) how can I inject ingestion time ? Cheers Avi

Fwd: [VOTE] Release 1.8.0, release candidate #3

2019-03-19 Thread Aljoscha Krettek
Hi All, The release process for Flink 1.8.0 is currently ongoing. Please have a look at the thread, in case you’re interested in checking your applications against this next release of Apache Flink and participate in the process. Best, Aljoscha > Begin forwarded message: > > From: Aljoscha Kr

Flink 1.7.2 extremely unstable and losing jobs in prod

2019-03-19 Thread Bruno Aranda
Hi, This is causing serious instability and data loss in our production environment. Any help figuring out what's going on here would be really appreciated. We recently updated our two EMR clusters from flink 1.6.1 to flink 1.7.2 (running on AWS EMR). The road to the upgrade was fairly rocky, but

Re: [DISCUSS] Create a Flink ecosystem website

2019-03-19 Thread Robert Metzger
Thanks. Do we actually want this page to be multi-language? I propose to make the website english-only, but maybe consider allowing comments in different languages. If we would make it multi-language, then we might have problems with people submitting packages in non-english languages. On Tue,

End-to-end exactly-once semantics in simple streaming app

2019-03-19 Thread Patrick Fial
Hello, I am working on a streaming application with apache flink, which shall provide end-to-end exactly-once delivery guarantees. The application is roughly built like this: environment.addSource(consumer) .map(… idempotent transformations ...) .map(new DatabaseFunction) .map(… idempoten

Re: Schema Evolution on Dynamic Schema

2019-03-19 Thread Fabian Hueske
Hi, Restarting a changed query from a savepoint is currently not supported. In general this is a very difficult problem as new queries might result in completely different execution plans. The special case of adding and removing aggregates is easier to solve, but the schema of the stored state cha