Re: threading and distribution

2021-02-11 Thread Matthias Pohl
Hi Marco, sorry for the late reply. The documentation you found [1] is already a good start. You can define how many subtasks of an operator run in parallel using the operator's parallelism configuration [2]. Each operator's subtask will run in a separate task slot. There's the concept of slot shar
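
For reference, a minimal sketch of what per-operator parallelism and slot sharing groups look like in the DataStream API (operator, values and group name are illustrative, not from the original thread):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ParallelismSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(2); // default parallelism for every operator in this job

            env.fromElements("a", "b", "c")
                    .map(String::toUpperCase)
                    .setParallelism(4)             // this operator runs as 4 parallel subtasks
                    .slotSharingGroup("expensive") // subtasks in a different group do not share slots with the default group
                    .print();

            env.execute("parallelism-and-slot-sharing");
        }
    }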

How to debug flink serialization error?

2021-02-11 Thread Debraj Manna
Hi, I have a ProcessFunction like below which throws an error like below whenever I try to use it in an operator. My understanding is that when Flink initializes the operator DAG, it serializes things and sends them over to the taskmanagers. So I have marked the operator state transient, since t
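
For context, the usual fix for this kind of failure is to mark non-serializable members transient and create them in open() rather than in the constructor, since the function object itself is serialized and shipped to the taskmanagers. A minimal sketch with illustrative names (the stand-in client below is hypothetical, not from the original post):

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    public class EnrichFunction extends ProcessFunction<String, String> {

        // Not serializable, so it must not travel with the serialized function:
        // mark it transient and build it per subtask in open().
        private transient NonSerializableClient client;

        @Override
        public void open(Configuration parameters) {
            client = new NonSerializableClient();
        }

        @Override
        public void processElement(String value, Context ctx, Collector<String> out) {
            out.collect(client.enrich(value));
        }

        /** Stand-in for a non-serializable dependency (e.g. an HTTP or DB client). */
        static class NonSerializableClient {
            String enrich(String v) { return v + "-enriched"; }
        }
    }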

CDC for MS SQL Server

2021-02-11 Thread John Smith
I see you have native connectors for Postgres and MySQL. I also see a Debezium connector. 1- I'm guessing that if we want to use MS SQL Server CDC we need to use the Debezium connector? 2- If so, do we need to have the full Debezium service running, or is it as simple as the MySQL and Postgres connec
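
For context, a common setup for SQL Server CDC is to let Debezium (running in Kafka Connect) write change events to a Kafka topic and read that topic in Flink with the regular Kafka connector and the 'debezium-json' format. A rough sketch, with topic, columns and addresses as placeholders:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class SqlServerCdcOverKafka {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inStreamingMode().build());

            // Debezium captures SQL Server changes into this topic; Flink interprets the
            // change events (insert/update/delete) via the debezium-json format.
            tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  order_id BIGINT," +
                "  amount DECIMAL(10, 2)" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'sqlserver.dbo.orders'," +
                "  'properties.bootstrap.servers' = 'kafka:9092'," +
                "  'scan.startup.mode' = 'earliest-offset'," +
                "  'format' = 'debezium-json'" +
                ")");

            tEnv.executeSql("SELECT * FROM orders").print();
        }
    }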

Re: "upsert-kafka" connector not working with Avro confluent schema registry

2021-02-11 Thread Shamit
Hi Arvid, thanks for the response. I have tried without the serializer and am getting an error. With "avro-confluent" it reports a missing "schema-registry.url" although it is defined in the table definition. Below is the screenshot. Request you to please help.

Re: "upsert-kafka" connector not working with Avro confluent schema registry

2021-02-11 Thread Shamit
Hi Arvid, Thanks for the response. I have tried without serializer and getting error. With "avro-confluent" it shows missing "schema-registry.url" although it is defined in the definition. Below is the screen shot. Request you to please help.

Re: Any plans to make Flink configurable with pure data?

2021-02-11 Thread Arvid Heise
Hi Pilgrim, Thank you for clarifying. I solved a similar challenge in Spark (for batch) by doing the following (translated into Flink terms): - I created most of the application with the Table API - the programmatic interface to Flink SQL. Here it is quite easy to implement structural variance of

Re: Joining and Grouping Flink Tables with Java API

2021-02-11 Thread Arvid Heise
Hi Abdelilah, you are right that union does not work (well) in your case. I misunderstood the relation between the two streams. The ideal pattern would be a broadcast join imho. [1] I'm not sure how to do it in Table API/SQL though, but I hope Timo can help here as well. [1] https://ci.apache.or

Re: Optimizing Flink joins

2021-02-11 Thread Dan Hill
Hi Timo! I'm moving away from SQL to DataStream. On Thu, Feb 11, 2021 at 9:11 AM Timo Walther wrote: > Hi Dan, > > the order of all joins depends on the order in the SQL query by default. > > You can also check the following example (not interval joins though) and > swap e.g. b and c: > > env.c

Re: Joining and Grouping Flink Tables with Java API

2021-02-11 Thread Timo Walther
After thinking about this topic again, I think UNION ALL will not solve the problem because you would need to group by brandId and perform the joining within the aggregate function which could also be quite expensive. Regards, Timo On 11.02.21 17:16, Timo Walther wrote: Hi Abdelilah, at a fi

Re: Re: Re: Connecting Pyflink DDL to Azure Eventhubs using Kafka connector

2021-02-11 Thread joris.vanagtmaal
Hi Arvid, I'm currently running PyFlink locally in the JVM with a parallelism of 1, and the same file works fine if I direct it to a Kafka cluster (running in a local Docker instance). I assumed that the JAR pipeline definition in the Python file would make sure they are made available on the cl

Exception when writing part file to S3

2021-02-11 Thread Robert Cullen
I’m using a StreamingFileSink to write data to my S3 instance. When writing the part file this exception occurs: 2021-02-11 12:08:39 org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy at org.apache.flink.runtime.executiongraph.failover.flip1.Executio

Re: Native K8S HA Session Cluster Issue 1.12.1

2021-02-11 Thread Till Rohrmann
Hi Kevin, Unfortunately, the root cause for the error is missing. I can only guess but it could indeed be FLINK-20417 [1]. If this is the case, then the problem should be fixed with the upcoming Flink 1.12.2 version. It should be released next week hopefully. If it should be a different problem, t

Re: Optimizing Flink joins

2021-02-11 Thread Timo Walther
Hi Dan, the order of all joins depends on the order in the SQL query by default. You can also check the following example (not interval joins though) and swap e.g. b and c: env.createTemporaryView("a", env.fromValues(1, 2, 3)); env.createTemporaryView("b", env.fromValues(4, 5, 6)); env.create
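
A self-contained sketch along the lines of this example, using SQL views so the column names are explicit (values and names are illustrative); swapping b and c in the query changes the join order:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class JoinOrderSketch {
        public static void main(String[] args) {
            TableEnvironment env = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inStreamingMode().build());

            env.executeSql("CREATE TEMPORARY VIEW a AS SELECT * FROM (VALUES (1), (2), (3)) AS t(id)");
            env.executeSql("CREATE TEMPORARY VIEW b AS SELECT * FROM (VALUES (2), (3), (4)) AS t(id)");
            env.executeSql("CREATE TEMPORARY VIEW c AS SELECT * FROM (VALUES (3), (4), (5)) AS t(id)");

            // By default the joins are planned in the order they appear in the query,
            // which for large inputs affects the size of the intermediate results/state.
            System.out.println(env.explainSql(
                "SELECT a.id FROM a " +
                "JOIN b ON a.id = b.id " +
                "JOIN c ON a.id = c.id"));
        }
    }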

Re: Proctime consistency

2021-02-11 Thread Timo Walther
Hi Rex, sorry for replying so late. Yes, your summary should be correct. In many cases this processing time stress on restore is the reason why people select event time eventually. But if that is fine for your use case, that's great. Regards, Timo On 05.02.21 06:26, Rex Fenley wrote: So if

Re: Joining and Grouping Flink Tables with Java API

2021-02-11 Thread Timo Walther
Hi Abdelilah, at a first glance your logic seems to be correct. But Arvid is right that your pipeline might not have the optimal performance that Flink can offer due to the 3 groupBy operations. I'm wondering what the optimizer produces out of this plan. Maybe you can share it with us using `

Re: Should flink job manager crash during zookeeper upgrade?

2021-02-11 Thread Barisa Obradovic
Thank you Till, that's perfect. I increased the max retry attempts a bit, and now it works like a charm (no restarts).

Re: Flink’s Kubernetes HA services - NOT working

2021-02-11 Thread Matthias Pohl
One other thing: It looks like you've set high-availability.storageDir to a local path file:///opt/flink/recovery. You should use a storage path that is accessible from all Flink cluster components (e.g. using S3). Only references are stored in Kubernetes ConfigMaps [1]. Best, Matthias [1] https:
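
For illustration, a minimal sketch of pointing high-availability.storageDir at shared storage (the bucket name is a placeholder; in a real Kubernetes deployment this usually goes into flink-conf.yaml rather than being set programmatically):

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.HighAvailabilityOptions;

    public class HaStorageDirSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Must be reachable by all cluster components, e.g. an object store,
            // not a pod-local path such as file:///opt/flink/recovery.
            conf.set(HighAvailabilityOptions.HA_STORAGE_PATH, "s3://my-bucket/flink-ha");
            System.out.println(conf.get(HighAvailabilityOptions.HA_STORAGE_PATH));
        }
    }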

Re: Any plans to make Flink configurable with pure data?

2021-02-11 Thread Pilgrim Beart
Hi Arvid, thanks for the response. I don't think I was being very clear. This is a streaming application. Each customer will have a Flink job running in its own private cluster. The rough *shape* of the analysis that the Flink job is doing for each customer is always the same, because we do just o

Re: Question

2021-02-11 Thread Abu Bakar Siddiqur Rahman Rocky
Hi, is there anyone who can inform me how I can connect a Java program to Apache Flink (on a Mac)? Thank you! Regards, Abu Bakar Siddiqur Rahman On Thu, Feb 11, 2021 at 4:26 AM Abu Bakar Siddiqur Rahman Rocky < bakar121...@gmail.com> wrote: > Hi Chesnay, > > Could you please inform me how c

Re: error accessing S3 bucket 1.12

2021-02-11 Thread Billy Bain
Dawid, we found the issue. Our bucket has periods in the name (com.this.bucket.fails). Recreating the bucket with dashes instead of periods (com-this-bucket-succeeds) solved it. This seems crazy, but the bucket naming guidelines are clear. https://docs.aws.amazon.com/AmazonS3/latest/userguide/buc

Re: State Access Beyond RichCoFlatMapFunction

2021-02-11 Thread Sandeep khanzode
Hello, can you please share an example of CoGroupedStreams if you have one? Thanks! > On 10-Feb-2021, at 3:22 PM, Kezhu Wang wrote: > > > Actually, my use case is that I want to share the state of one stream in > > two other streams. Right now, I can think of connecting this stream > > independ

Re: What is the difference between RuntimeContext state and ProcessWindowFunction.Context state when using a ProcessWindowFunction?

2021-02-11 Thread Arvid Heise
Hi Marco, 1. RuntimeContext is available to all operators and is bound to the current key (if used in a keyed context). 2. is actually the same; not sure why it's exposed twice, so use either one. 3. is additionally bound to the current window. On Tue, Feb 9, 2021 at 10:46 PM Marco Villalobos
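
A small sketch illustrating point 3: Context.windowState() is scoped to the key and the current window, while Context.globalState() (like state obtained through the RuntimeContext) is scoped to the key only. Names are illustrative:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class CountingWindowFunction
            extends ProcessWindowFunction<Long, String, String, TimeWindow> {

        @Override
        public void process(String key, Context ctx, Iterable<Long> elements, Collector<String> out)
                throws Exception {
            // Scoped to key + this specific window (clean it up yourself in clear()).
            ValueState<Long> perWindow = ctx.windowState()
                    .getState(new ValueStateDescriptor<>("perWindow", Long.class));
            // Scoped to the key only; survives across windows, like RuntimeContext state.
            ValueState<Long> perKey = ctx.globalState()
                    .getState(new ValueStateDescriptor<>("perKey", Long.class));

            long count = 0;
            for (Long ignored : elements) {
                count++;
            }
            perWindow.update(count);
            perKey.update((perKey.value() == null ? 0L : perKey.value()) + count);

            out.collect(key + ": window=" + count + ", total=" + perKey.value());
        }
    }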

Re: S3 parquet files as Sink in the Table SQL API

2021-02-11 Thread Arvid Heise
Hi, if you just want to use s3a, you only need flink-s3-fs-hadoop-1.12.1.jar as a plugin. The format jar flink-sql-parquet_2.11-1.12.1.jar should be in lib. None of the other jars are needed, afaik. On Thu, Feb 11, 2021 at 9:00 AM meneldor wrote: > Well, i am not sure which of those actually helped

Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-11 Thread Arvid Heise
Hi Vishal, if you have the possibility, could you create a memory dump? It would be interesting to know why the TransferManager is never released. Note that it's impossible to release all objects/classes loaded through a particular ClassLoader; all we can do is make sure that the ClassLoader is not

Re: "upsert-kafka" connector not working with Avro confluent schema registry

2021-02-11 Thread Arvid Heise
Hi Shamit, why are you specifying the upsert-kafka table completely differently? In particular, why did you set the serializer explicitly? I would have assumed that just setting the format to 'format'='avro-confluent' should be enough (same as in the working source). On Tue, Feb 9, 2021 at 11:06 PM Shamit
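
For reference, a sketch of what such a definition might look like on Flink 1.12, with topic, columns and registry URL as placeholders; the format handles the (de)serialization, so no explicit serializer class is configured. (The exact registry option key changed in later releases, e.g. to 'avro-confluent.url', so check the docs for your version.)

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class UpsertKafkaAvroConfluentSketch {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inStreamingMode().build());

            tEnv.executeSql(
                "CREATE TABLE users_sink (" +
                "  user_id BIGINT," +
                "  name STRING," +
                "  PRIMARY KEY (user_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'upsert-kafka'," +
                "  'topic' = 'users'," +
                "  'properties.bootstrap.servers' = 'kafka:9092'," +
                "  'key.format' = 'avro-confluent'," +
                "  'key.avro-confluent.schema-registry.url' = 'http://registry:8081'," +
                "  'value.format' = 'avro-confluent'," +
                "  'value.avro-confluent.schema-registry.url' = 'http://registry:8081'" +
                ")");
        }
    }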

Re: Join two streams from Kafka

2021-02-11 Thread Arvid Heise
Hi Shamit, unless you have some temporal relationship between the records to be joined, you have to use a regular join over stream 1 and stream 2. Since you cannot define any window, all data will be held in Flink's state, which is not an issue for a few million records but probably means you have to use

Apache Flink Deployment in Multiple AZ

2021-02-11 Thread VINAYA KUMAR BENDI
Hi, Is Apache Flink (Job Managers, Task Managers) setup in multiple availability zones a recommended deployment model? Kind regards, Vinaya

Re: Custom source with multiple, differently typed outputs

2021-02-11 Thread Arvid Heise
Hi Roman, in general, the use of inconsistent types is discouraged, but there is little that you can do on your end. I think your approach with SourceFunction is good, but I'd probably not use Row already but rather some POJO or source format record. Note that I have never seen side-outputs in a s
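
Since side outputs are not available inside a SourceFunction, one workaround (a sketch, with illustrative types; the bounded source below merely stands in for the custom source) is to emit a single wrapper type from the source and split it into differently typed streams immediately afterwards with a ProcessFunction and an OutputTag:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class SplitAfterSource {

        /** Wrapper record emitted by the source; the concrete type is decided downstream. */
        public static class RawEvent {
            public String type;
            public String payload;
            public RawEvent() {}
            public RawEvent(String type, String payload) { this.type = type; this.payload = payload; }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<RawEvent> raw = env.fromElements(
                    new RawEvent("metric", "42"), new RawEvent("log", "started"));

            final OutputTag<String> logs = new OutputTag<String>("logs") {};

            SingleOutputStreamOperator<Long> metrics = raw.process(
                    new ProcessFunction<RawEvent, Long>() {
                        @Override
                        public void processElement(RawEvent e, Context ctx, Collector<Long> out) {
                            if ("metric".equals(e.type)) {
                                out.collect(Long.parseLong(e.payload)); // main output
                            } else {
                                ctx.output(logs, e.payload);            // side output
                            }
                        }
                    });

            metrics.print();
            metrics.getSideOutput(logs).print();
            env.execute("split-after-source");
        }
    }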

Re: Kafka connector doesn't support consuming update and delete changes in Table SQL API

2021-02-11 Thread meneldor
> > Are you sure that the null records are not actually tombstone records? If > you use upsert tables you usually want to have them + compaction. Or how > else will you deal with deletions? Yes, they are tombstone records, but I cannot avoid them because the deduplication query can't produce an appe

Re: statefun: Unable to find a source translation for ingress

2021-02-11 Thread Igal Shilman
Hello, I believe that your assembly plugin configuration doesn't merge files under META-INF/services. Can you unzip your jar and manually examine the contents of: META-INF/services/org.apache.flink.statefun.flink.io.spi.FlinkIoModule It should include at least the following lines: org.apache.flink

Re: Re: Re: Connecting Pyflink DDL to Azure Eventhubs using Kafka connector

2021-02-11 Thread Arvid Heise
Hi Joris, Are you sure that all nodes have access to these jars? Usually, shared resources reside on some kind of distributed file system or network directory. I'm pulling in Dian who can probably help better. Best, Arvid On Tue, Feb 9, 2021 at 1:50 PM joris.vanagtmaal < joris.vanagtm...@warts

Re: Any plans to make Flink configurable with pure data?

2021-02-11 Thread Arvid Heise
Hi Pilgrim, it sounds to me as if you are planning to use Flink for batch processing by having some centralized server to which you submit your queries. While you can use Flink SQL for that (and it's used in this way in larger companies), the original idea of Flink was to be used in streaming app

Re: Kafka connector doesn't support consuming update and delete changes in Table SQL API

2021-02-11 Thread Arvid Heise
Hi, are you sure that the null records are not actually tombstone records? If you use upsert tables you usually want to have them + compaction. Or how else will you deal with deletions? Is there anyone who is successfully deduplicating CDC records into either > a Kafka topic or S3 files (CSV/parquet
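
For context, the deduplication query discussed here is usually written with ROW_NUMBER over a partition; its result is an updating stream rather than an append-only one, which is why an upsert sink ends up writing tombstones/updates. A self-contained sketch with illustrative tables and columns (the VALUES view stands in for the real CDC input):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class DeduplicationSketch {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inStreamingMode().build());

            tEnv.executeSql(
                "CREATE TEMPORARY VIEW events AS SELECT * FROM (VALUES " +
                " (1, 'a', TIMESTAMP '2021-02-11 10:00:00')," +
                " (1, 'b', TIMESTAMP '2021-02-11 10:05:00')," +
                " (2, 'c', TIMESTAMP '2021-02-11 10:01:00')" +
                ") AS t(id, payload, ts)");

            // Keep only the latest row per id; for each id, older versions are retracted
            // as newer ones arrive, so the output is an updating changelog.
            tEnv.executeSql(
                "SELECT id, payload, ts FROM (" +
                "  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn FROM events" +
                ") WHERE rn = 1").print();
        }
    }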

Re: Joining and Grouping Flink Tables with Java API

2021-02-11 Thread Arvid Heise
Hi Abdelilah, I think your approach is overly complicated (and probably slow) but I might have misunderstood things. Naively, I'd assume that you just want to union stream 1 and stream 2 instead of joining. Note that for union the events must have the same schema, so you most likely want to have a
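
A sketch of the "map to a common schema, then union" idea with illustrative types; a tag field keeps the two inputs distinguishable after the union:

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class UnionWithCommonSchema {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<Integer> clicks = env.fromElements(1, 2, 3);
            DataStream<String> purchases = env.fromElements("4", "5");

            // Both inputs are mapped to the same type before the union.
            DataStream<Tuple2<String, Long>> unioned =
                    clicks.map(c -> Tuple2.of("click", c.longValue()))
                            .returns(Types.TUPLE(Types.STRING, Types.LONG))
                            .union(purchases
                                    .map(p -> Tuple2.of("purchase", Long.parseLong(p)))
                                    .returns(Types.TUPLE(Types.STRING, Types.LONG)));

            unioned.print();
            env.execute("union-with-common-schema");
        }
    }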

Re: Minicluster Flink tests, checkpoints and inprogress part files

2021-02-11 Thread Arvid Heise
Hi Dan, it's not entirely clear to me how you want to write your tests, but it's possible with your setup (we have a couple of thousand tests in Flink that do that). What you usually try to use is a test source that is finite (e.g. file source that is not scanning for new files), such that the st
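
One possible shape for such a test, as a sketch under the assumption that a bounded source is acceptable: fromElements terminates on its own, so the job (and the test) finishes, and the collected output can be asserted on afterwards. Names are illustrative:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.CloseableIterator;

    public class FiniteSourceTestSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(1);
            env.enableCheckpointing(100); // checkpointing can stay enabled; the job still finishes

            List<Integer> results = new ArrayList<>();
            // A bounded source: the pipeline ends once all elements are processed.
            try (CloseableIterator<Integer> it =
                     env.fromElements(1, 2, 3).map(i -> i * 2).executeAndCollect()) {
                it.forEachRemaining(results::add);
            }
            // In a real test this would be an assertion, e.g. assertEquals(Arrays.asList(2, 4, 6), results).
            System.out.println(results);
        }
    }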

Re: question on checkpointing

2021-02-11 Thread Arvid Heise
Hi Marco, Actually, perhaps I misworded it. This particular checkpoint seems to > occur in an operator that is flat mapping (it is actually a keyed > processing function) a single blob data-structure into several hundred > thousands elements (sometimes a million) that immediately flow into a sink

Flink Elasticseach success handler

2021-02-11 Thread Vignesh Ramesh
I use the Flink Elasticsearch sink to bulk insert records into ES. I want to do an operation after a record is successfully synced to Elasticsearch. There is a failureHandler by which we can retry failures. Is there a successHandler in the Flink Elasticsearch sink? *Note*: I couldn't do the operation

Re: clarification on backpressure metrics in Apache Flink Dashboard

2021-02-11 Thread Chesnay Schepler
Yes. Unless operator 2 is also back-pressured of course, then you should take a look at the sink. On 2/11/2021 4:50 AM, Marco Villalobos wrote: given: [source] -> [operator 1] -> [operator 2] -> [sink]. If within the dashboard, operator 1 shows that it has backpressure, does that mean I need

Re: Should flink job manager crash during zookeeper upgrade?

2021-02-11 Thread Till Rohrmann
Hi Barisa, could you give us the full logs of the run? It looks a bit like you exceeded the maximum retry attempts while you upgraded your ZooKeeper cluster. You can increase them via recovery.zookeeper.client.retry-wait and recovery.zookeeper.client.max-retry-attempts. From Flink's perspective it

Native K8S HA Session Cluster Issue 1.12.1

2021-02-11 Thread Bohinski, Kevin
Hi all, on long-lived session clusters we are seeing a k8s error `Error while watching the ConfigMap`. Good news is it looks like the `too old resource version` issue is fixed :). Logs are attached below. Any tips? Best, Kevin 2021-02-11 07:55:15,249 INFO org.apache.flink.runtime.checkpoint.Chec

Re: S3 parquet files as Sink in the Table SQL API

2021-02-11 Thread meneldor
Well, I am not sure which of those actually helped, but it works now. I downloaded the following jars into plugins/s3-fs-hadoop/ : > > flink-hadoop-compatibility_2.11-1.12.1.jar > flink-s3-fs-hadoop-1.12.1.jar > flink-sql-parquet_2.11-1.12.1.jar > force-shading-1.12.1.jar > hadoop-mapreduce-client-cor