Re: Number of parallel connections for Elasticsearch Connector

2021-01-16 Thread Rex Fenley
I found the following, indicating that there is no concurrency for the Elasticsearch Connector https://github.com/apache/flink/blob/97bfd049951f8d52a2e0aed14265074c4255ead0/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/Elastics

Re: Setting different timeouts for savepoints and checkpoints

2021-01-16 Thread Rex Fenley
Thanks for the quick response. Is this something that can be added as a feature request? Given that the time it takes to restore from either is different, and the semantics are slightly different, it seems like they should have completely separate configurable timeouts. Thanks! On Sat, Jan 16,

Re: Setting different timeouts for savepoints and checkpoints

2021-01-16 Thread Khachatryan Roman
Hi Rex, Unfortunately not: the same timeout value is used both for savepoints and checkpoints. Regards, Roman On Sat, Jan 16, 2021 at 9:42 AM Rex Fenley wrote: > Hello, > > I'm wondering if there's a way to set different timeouts for savepoints > and checkpoints. Our savepoints can take a num

Why use ListView?

2021-01-16 Thread Rex Fenley
Hello, In the recent version of Flink docs I read the following [1]: > If an accumulator needs to store large amounts of data, org.apache.flink.table.api.dataview.ListView and org.apache.flink.table.api.dataview.MapView provide advanced features for leveraging Flink’s state backends in unbounded d

Flink ID hashing

2021-01-16 Thread Rex Fenley
Hello, I'm wondering what sort of algorithm flink uses to map an Integer ID to a subtask when distributing data. Also, what operators from the TableAPI cause data to be redistributed? I know Joins will, what about Aggregates, Sources, Filters? Thanks! -- Rex Fenley | Software Engineer - Mobi

Number of parallel connections for Elasticsearch Connector

2021-01-16 Thread Rex Fenley
Hello, How many connections does the ES connector use to write to Elasticsearch? We have a single machine with 16 vCPUs and parallelism of 4 running our job, with -p 4 I'd expect there to be 4 parallel bulk request writers / connections to Elasticsearch. Is there a place in the code to confirm thi

Re: Flink sql interval join problem

2021-01-16 Thread Leonard Xu
Hi, >I only saw inner interval join on the official website, I don't see > outer interval join on the offical website, Is there an example of an OUTER > INTERVAL JOIN? >official website > link://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/sql/queries.html#join

Re: Pushing Down Filters

2021-01-16 Thread Leonard Xu
Hi, Shekhar > 1. Optimizer does not use both - ProjectableTableSource and > FilterableTableSource - in a single query even if the source implements both > interfaces. Each interface works correctly if implemented independently. I didn’t see your custom source implementation, but I think the tw

Re: Pushing Down Filters

2021-01-16 Thread Leonard Xu
Hi, Shekhar > 1. Optimizer does not use both - ProjectableTableSource and > FilterableTableSource - in a single query even if the source implements both > interfaces. Each interface works correctly if implemented independently. I didn’t your custom source implementation, but I think the two int

Re: Pyflink Join with versioned view / table

2021-01-16 Thread Leonard Xu
Hi, Torben > When implementing the join I get only updates when the right table changes The event-time temporal join versioned table is triggered watermark which calculated by both left and right table’s watermark, so you get only updated when the right table changes(which is the slower one in

Setting different timeouts for savepoints and checkpoints

2021-01-16 Thread Rex Fenley
Hello, I'm wondering if there's a way to set different timeouts for savepoints and checkpoints. Our savepoints can take a number of hours to complete, whereas incremental checkpoints at their slowest take around 10 min. We'd like to timeout a checkpoint on a significantly smaller duration than a s