Re: LEGACY('STRUCTURED_TYPE' to pojo

2020-10-30 Thread Rex Fenley
Correction: I'm using Scala case classes, not strictly Java POJOs, just to be clear. On Fri, Oct 30, 2020 at 7:56 PM Rex Fenley wrote: > Hello, > > I keep running into trouble moving between DataStream and SQL with POJOs > because my nested POJOs turn into LEGACY('STRUCTURED_TYPE', is there any >

LEGACY('STRUCTURED_TYPE' to pojo

2020-10-30 Thread Rex Fenley
Hello, I keep running into trouble moving between DataStream and SQL with POJOs because my nested POJOs turn into LEGACY('STRUCTURED_TYPE', is there any way to convert them back to POJOs in Flink when converting a SQL Table back to a DataStream? Thanks! -- Rex Fenley | Software Engineer - Mo
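For reference, a minimal sketch of that conversion path, assuming an existing StreamTableEnvironment `tEnv` and a hypothetical `VehicleEvent` POJO; whether nested fields come back as POJOs or stay LEGACY('STRUCTURED_TYPE') depends on how the types were registered, so this is illustrative only:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

    // the SQL result may contain updates, so use the retract variant
    Table result = tEnv.sqlQuery("SELECT * FROM vehicle_events");
    DataStream<Tuple2<Boolean, VehicleEvent>> stream =
            tEnv.toRetractStream(result, VehicleEvent.class);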

Re: Best way to test Table API and SQL

2020-10-30 Thread Rex Fenley
Hello, Thank you for these examples, they look great. However, I can't seem to import `import org.apache.flink.table.planner.runtime.utils.{StreamingTestBase, StringSink}`; is it because I'm using the Blink planner and not the regular one? Thanks On Fri, Oct 9, 2020 at 7:55 AM Timo Walther wrote:

Re: Running flink in a Local Execution Environment for Production Workloads

2020-10-30 Thread David Anderson
Another factor to be aware of is that there's no cluster configuration file to load (the mini-cluster doesn't know where to find flink-conf.yaml), and by default you won't have a REST API endpoint or web UI available. But you can do the configuration in the code, and/or provide configuration on the
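A minimal sketch of that in-code configuration (assuming flink-runtime-web is on the classpath so the UI can actually come up):

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.RestOptions;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    Configuration conf = new Configuration();
    conf.setInteger(RestOptions.PORT, 8081); // expose the REST endpoint / web UI of the mini-cluster
    StreamExecutionEnvironment env =
            StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);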

Re: Feature request: Removing state from operators

2020-10-30 Thread David Anderson
It seems like another option here would be to occasionally use the state processor API to purge a savepoint of all unnecessary state. On Fri, Oct 30, 2020 at 6:57 PM Steven Wu wrote: > not a solution, but a potential workaround. Maybe rename the operator uid > so that you can continue to leverag
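A sketch of that approach with the State Processor API (paths and the uid are placeholders; exact signatures vary a bit across versions):

    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.runtime.state.memory.MemoryStateBackend;
    import org.apache.flink.state.api.ExistingSavepoint;
    import org.apache.flink.state.api.Savepoint;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    ExistingSavepoint savepoint =
            Savepoint.load(env, "s3://bucket/savepoints/savepoint-old", new MemoryStateBackend());
    savepoint
            .removeOperator("uid-of-operator-whose-state-is-obsolete") // drop all state under this uid
            .write("s3://bucket/savepoints/savepoint-pruned");
    env.execute("prune savepoint");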

Re: Feature request: Removing state from operators

2020-10-30 Thread Steven Wu
not a solution, but a potential workaround. Maybe rename the operator uid so that you can continue to leverage allowNonRestoredState? On Thu, Oct 29, 2020 at 7:58 AM Peter Westermann wrote: > Does that actually allow removing a state completely (vs. just modifying > the values stored in state)?
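A sketch of that workaround, with hypothetical names: give the operator a fresh uid, then restore with non-restored state allowed:

    // previously .uid("my-operator-v1"); state under the old uid is simply not restored
    stream
        .keyBy(r -> r.getKey())
        .process(new MyStatefulFunction())
        .uid("my-operator-v2");

    // then restore with: flink run -s <savepointPath> --allowNonRestoredState ...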

Re: how to enable metrics in Flink 1.11

2020-10-30 Thread Diwakar Jha
Hello, I see that my classpath (below) has both log4j 1 and log4j-api 2. Is this why I'm not seeing any logs? If so, could someone suggest how to fix it? export CLASSPATH=":lib/flink-csv-1.11.0.jar:lib/flink-json-1.11.0.jar:lib/flink-shaded-zookeeper-3.4.14.jar:lib/flink-table-

Re: Can I get the filename as a column?

2020-10-30 Thread Dawid Wysakowicz
No, you would do:

    CREATE TABLE table1 (
      `text` VARCHAR,                  -- each CSV row is just a single text column
      `path` VARCHAR METADATA VIRTUAL, -- where path is a name declared by the filesystem source
      `size` VARCHAR METADATA VIRTUAL  -- where size is a name declared by the filesystem source

Re: Can I get the filename as a column?

2020-10-30 Thread Ruben Laguna
I've written [FLINK-19903][1]. I just read [FLIP-107][2] and [FLINK-15869][3] and I need to ask: assuming that FLIP-107 / FLINK-15869 is implemented and the Filesystem SQL connector is modified to expose metadata (including path, and possibly other stuff), then to use it I would need to write:

Re: Can I get the filename as a column?

2020-10-30 Thread Ruben Laguna
Sure, I’ll write the JIRA issue On Fri, 30 Oct 2020 at 13:27, Dawid Wysakowicz wrote: > I am afraid there is no such functionality available yet. > > I think though it is a valid request. I think we can use the upcoming > FLIP-107 metadata columns for this purpose and expose the file name as > m

Re: Can I get the filename as a column?

2020-10-30 Thread Dawid Wysakowicz
I am afraid there is no such functionality available yet. I think though it is a valid request. I think we can use the upcoming FLIP-107 metadata columns for this purpose and expose the file name as metadata column of a filesystem source. Would you like to create a JIRA issue for it? Best, Dawi

Can I get the filename as a column?

2020-10-30 Thread Ruben Laguna
I've asked this already on [stackoverflow][1] Is there anything equivalent to Spark's `f.input_file_name()` ? I don't see anything that could be used in [system functions][2] I have a dataset where they embedded some information in the filenames (200k files) and I need to extract that as a new c

Re: RestClusterClient and classpath

2020-10-30 Thread Flavio Pompermaier
You are right, Chesnay, but I'm doing this stuff in parallel with 2 other things and I messed up the jar name, sorry for that. For the code after env.execute I'll try to use the new JobListener interface in the next days.. I hope it will be sufficient (I just have to call an external service to update the
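A sketch of what that could look like (the external call is just a placeholder):

    import org.apache.flink.api.common.JobExecutionResult;
    import org.apache.flink.core.execution.JobClient;
    import org.apache.flink.core.execution.JobListener;

    env.registerJobListener(new JobListener() {
        @Override
        public void onJobSubmitted(JobClient jobClient, Throwable throwable) {
            // e.g. remember jobClient.getJobID() for later status updates
        }

        @Override
        public void onJobExecuted(JobExecutionResult result, Throwable throwable) {
            // e.g. call the external service to update the job status (success vs. throwable != null)
        }
    });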

Re: RestClusterClient and classpath

2020-10-30 Thread Chesnay Schepler
Yes, it is definitely way easier to upload&run jars instead of submitting JobGraphs. But I thought this was not viable for you because you cannot execute anything after env.execute()? I believe this limitation still exists. Or are you referring here to error-handling in case env.execute() thro

Re: RestClusterClient and classpath

2020-10-30 Thread Flavio Pompermaier
I just discovered that I was using the "slim" jar instead of the "fat" one...sorry. Now I'm able to successfully run the program on the remote cluster. However, generating the job graph on the client side is something I really don't like at all because it requires access both to flink

Re: Logging when building and testing Flink

2020-10-30 Thread Juha Mynttinen
Very nice Dawid, thanks! I'll give this a try next time I run the tests. Regards, Juha On Fri, Oct 30, 2020 at 13:13, Dawid Wysakowicz () wrote: > Small correction to my previous email. The previously mentioned problem > is actually not a problem. You can just pass the log4j.configuration

Re: Logging when building and testing Flink

2020-10-30 Thread Dawid Wysakowicz
Small correction to my previous email. The previously mentioned problem is actually not a problem. You can just pass the log4j.configurationFile explicitly: mvn '-Dlog4j.configurationFile=[path]/log4j2-on.properties' clean install Best, Dawid On 23/10/2020 09:48, Juha Mynttinen wrote: > Hey the

Re: Logging when building and testing Flink

2020-10-30 Thread Dawid Wysakowicz
You should be able to globally override the configuration file used by surefire plugin which executes tests like this: mvn '-Dlog4j.configuration=[path]/log4j2-on.properties' clean install Bear in mind there is a minor bug in our surefire configuration now: https://issues.apache.org/jira/browse/F

Re: RestClusterClient and classpath

2020-10-30 Thread Chesnay Schepler
Can you give me more information on your packaging setup / project structure? Is "it.test.MyOb" a test class? Does the dependency containing this class have a "test" scope? On 10/30/2020 11:34 AM, Chesnay Schepler wrote: It is irrelevant whether it contains transitive dependencies or not; that

Re: RestClusterClient and classpath

2020-10-30 Thread Chesnay Schepler
It is irrelevant whether it contains transitive dependencies or not; that's a maven concept, not a classloading one. The WordCount main class, which is only contained in that jar, could be found, so the classloading is working. If any other class that is supposed to be in the jar cannot be found,

Re: [Flink::Test] access registered accumulators via harness

2020-10-30 Thread Sharipov, Rinat
Hi Dawid, thanks for your reply and sorry for a question with a double interpretation! You understood me correctly, I would like to get counter values by their names after completing all operations with the harness component. I suppose that it should be useful because most of our functions are impl

Re: RestClusterClient and classpath

2020-10-30 Thread Flavio Pompermaier
Yes, with the WordCount it works but that jar is not a "fat" jar (it does not include transitive dependencies). Actually I was able to use the REST API without creating the JobGraph; you just have to tell the run API the jar id, the main class to run and the optional parameters. For this don't use
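A sketch of that run call against the standard REST API (host, jar id and program arguments are placeholders), e.g. with the JDK 11 HttpClient:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    String jarId = "<id returned by POST /jars/upload>";
    HttpRequest run = HttpRequest.newBuilder()
            .uri(URI.create("http://jobmanager:8081/jars/" + jarId + "/run"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(
                    "{\"entryClass\":\"it.test.MyOb\",\"programArgs\":\"--input /data/in\"}"))
            .build();
    HttpResponse<String> resp =
            HttpClient.newHttpClient().send(run, HttpResponse.BodyHandlers.ofString());
    System.out.println(resp.body()); // contains the jobid on success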

Re: RestClusterClient and classpath

2020-10-30 Thread Chesnay Schepler
If you aren't setting up the classpath correctly then you cannot create a JobGraph, and cannot use the REST API (outside of uploading jars). In other words, you _will_ have to solve this issue, one way or another. FYI, I just tried your code to submit a WordCount jar to a cluster (the one in th

Re: [Flink::Test] access registered accumulators via harness

2020-10-30 Thread Dawid Wysakowicz
Hi Rinat, First of all, sorry for some nitpicking in the beginning, but your message might be a bit misleading for some. If I understood your message correctly you are referring to Metrics, not accumulators, which are a different concept[1]. Or were you indeed referring to accumulators? Now on to

Re: RestClusterClient and classpath

2020-10-30 Thread Flavio Pompermaier
For "REST only client" I mean using only the REST API to interact with the Flink cluster, i.e. without creating any PackagedProgram and thus incurring into classpath problems. I've implemented a running job server that was using the REST API to upload the job jar and execute the run command but the

Re: RestClusterClient and classpath

2020-10-30 Thread Chesnay Schepler
To clarify, if the job creation fails on the JM side, in 1.11 the job submission will fail, in 1.12 it will succeed but the job will be in a failed state. On 10/30/2020 10:23 AM, Chesnay Schepler wrote: 1) the job still reaches a failed state, which you can poll for, see 2) 2) polling is your

Re: RestClusterClient and classpath

2020-10-30 Thread Chesnay Schepler
1) the job still reaches a failed state, which you can poll for, see 2) 2) polling is your only way. What do you mean with "REST only client"? Do you mean a plain http client, not something that Flink provides? On 10/30/2020 10:02 AM, Flavio Pompermaier wrote: Nothing to do also with IntelliJ.
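A sketch of such polling with the RestClusterClient (address, port and job id are placeholders):

    import org.apache.flink.api.common.JobID;
    import org.apache.flink.client.program.rest.RestClusterClient;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.RestOptions;

    Configuration conf = new Configuration();
    conf.setString(RestOptions.ADDRESS, "jobmanager");
    conf.setInteger(RestOptions.PORT, 8081);
    try (RestClusterClient<String> client = new RestClusterClient<>(conf, "standalone")) {
        // prints e.g. RUNNING, FAILED, FINISHED
        System.out.println(client.getJobStatus(JobID.fromHexString("<job-id>")).get());
    }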

Does Flink operators synchronize states?

2020-10-30 Thread Yuta Morisawa
Hello, I am wondering whether Flink operators synchronize their execution states like Apache Spark. In Apache Spark, the master decides everything, for example, it schedules jobs and assigns tasks to Executors so that each job is executed in a synchronized way. But Flink looks different. It a

Re: RestClusterClient and classpath

2020-10-30 Thread Flavio Pompermaier
No luck with IntelliJ either.. do you have any sample project I can reuse to test the job submission to a cluster? I can't really understand why the classes within the fat jar are not found when generating the PackagedProgram. Ideally, I'd prefer to use a REST-only client (so no need to build pack

Re: Table Print SQL Connector

2020-10-30 Thread Dawid Wysakowicz
Hi, The problem in your case is that you exit before anything is printed out. The method executeInsert executes the query, but it does not wait for the query to finish. Therefore your main/test method returns, bringing down the local cluster, before anything is printed out. You can e.g. add      
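One way to block until the insert job finishes, sketched for the 1.11 API (the sink name is a placeholder; newer versions also offer TableResult#await()):

    import org.apache.flink.table.api.TableResult;

    TableResult result = table.executeInsert("print_sink");
    result.getJobClient().ifPresent(
            c -> c.getJobExecutionResult(Thread.currentThread().getContextClassLoader()).join());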

Re: Working with bounded Datastreams - Flink 1.11.1

2020-10-30 Thread Arvid Heise
Hi Sunitha, you probably need to apply a non-windowed grouping:

    datastream
        .keyBy(Event::getVehicleId)
        .reduce((first, other) -> first);

This example will always throw away the second record. You may want to combine the records though by summing up the fuel. Best, Arvid On Wed, Oct 28,
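A sketch of that combining variant, assuming a hypothetical Event type with fuel accessors:

    DataStream<Event> perVehicle = datastream
            .keyBy(Event::getVehicleId)
            .reduce((first, other) -> {
                first.setFuel(first.getFuel() + other.getFuel()); // sum fuel across a vehicle's records
                return first;
            });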

Re: Insufficient number of network buffers for simple last_value aggregate

2020-10-30 Thread Arvid Heise
Hi Thilo, the number of required network buffers depends on your data exchanges and parallelism. For each shuffling data exchange (what you need for group by), you ideally have #slots-per-TM^2 * #TMs * 4 buffers. So I'm assuming you have 11 machines and 8 slots per machine. Then for best performa
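For example, under those assumed numbers (11 TMs with 8 slots each), one shuffling exchange needs roughly 8^2 * 11 * 4 = 2,816 buffers, i.e. about 88 MB at the default 32 KB segment size.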

Re: Insufficient number of network buffers for simple last_value aggregate

2020-10-30 Thread Xintong Song
Hi Schneider, The error message suggests that your task managers are not configured with enough network memory. You would need to increase the network memory configuration. See this doc [1] for more details. Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink/flink-docs-release-1.
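For reference, the relevant flink-conf.yaml knobs look roughly like this (values are illustrative only, not a recommendation):

    taskmanager.memory.network.fraction: 0.2
    taskmanager.memory.network.min: 256mb
    taskmanager.memory.network.max: 2gb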