Re: Flink + Consul as HA backend. What do you think?

2018-02-14 Thread Krzysztof Białek
I have very little experience with ZK and cannot explain the differences between ZK and Consul myself. However, there are some comparisons available:
* https://www.consul.io/intro/vs/zookeeper.html - written by Consul, so it may be biased
* https://www.slideshare.net/IvanGlushkov/zookeeper-vs-consul-41

Re: Deploying Flink with JobManager HA on Docker Swarm/Kubernetes

2018-02-14 Thread Chirag Dewan
Thanks a lot Aljoscha. I was making a silly mistake. TaskManagers can now register with the JobManager. One more thing: does Flink now store job graphs on ZK too? Regards, Chirag On Wednesday, 14 February, 2018, 8:06:14 PM IST, Aljoscha Krettek wrote: It should be roughly the same settings t

Re: Regarding BucketingSink

2018-02-14 Thread Vishal Santoshi
-rw-r--r-- 3 root hadoop 11 2018-02-14 18:48 /kraken/sessions_uuid_csid_v3/dt=2018-02-14/_part-0-18.valid-length
-rw-r--r-- 3 root hadoop 54053518 2018-02-14 19:15 /kraken/sessions_uuid_csid_v3/dt=2018-02-14/_part-0-19.pending
-rw-r--r-- 3 root hadoop 11 2018-02-14 21:17 /

Re: Support COUNT(DISTINCT 'field') Query Yet?

2018-02-14 Thread Fabian Hueske
Hi Tao, DISTINCT aggregates in group windows are not supported yet. There's currently a discussion on the dev mailing list about this feature [1]. Since we are only a few days before the feature freeze of Flink 1.5.0, it might be included in 1.6.0, about 4-5 months from now. As a workaround, you
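The workaround itself is cut off by the digest. One commonly suggested approach for missing DISTINCT aggregates in group windows (my sketch, not necessarily what was proposed in the truncated message) is a two-level aggregation: deduplicate in an inner windowed query, then count in the outer one. Table and column names below are taken from the question in this thread:

```sql
-- Hedged sketch: emulate COUNT(DISTINCT user_id) per tumbling hour by first
-- grouping on (window, user_id) to deduplicate, then counting per window.
SELECT w_start, COUNT(user_id)
FROM (
  SELECT
    TUMBLE_START(event_timestamp, INTERVAL '1' HOUR) AS w_start,
    user_id
  FROM unified_events
  GROUP BY TUMBLE(event_timestamp, INTERVAL '1' HOUR), user_id
)
GROUP BY w_start
```

Note that in streaming SQL the non-windowed outer GROUP BY produces an updating result rather than an append-only one, which may or may not fit the use case.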

Support COUNT(DISTINCT 'field') Query Yet?

2018-02-14 Thread xiatao123
SELECT TUMBLE_START(event_timestamp, INTERVAL '1' HOUR), COUNT(DISTINCT session), COUNT(DISTINCT user_id), SUM(duration), SUM(num_interactions) FROM unified_events GROUP BY TUMBLE(event_timestamp, INTERVAL '1' HOUR) I have the above statement in my Flink query running on Flink 1.3.2, but got the erro

How do I run SQL query on a dataStream that generates my custom type.

2018-02-14 Thread nikhilsimha
I have a stream of events of a custom type. I want to know how to convert these events into Rows that can be queried. More specifically how do I attach type information to the stream of rows that is generated? I have the following code ``` val execEnv = StreamExecutionEnvironment.getExecution
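A minimal sketch of one way to do this (the `Event` type is hypothetical; API names roughly as of Flink 1.4): map the custom events to `Row` and supply a `RowTypeInfo` as the implicit `TypeInformation`, since `Row` itself carries no field types or names.

```scala
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.api.java.typeutils.RowTypeInfo
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.types.Row

case class Event(user: String, duration: Long) // hypothetical custom type

val execEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = TableEnvironment.getTableEnvironment(execEnv)

val events: DataStream[Event] = execEnv.fromElements(Event("a", 1L))

// Row erases field types, so attach them explicitly via RowTypeInfo.
implicit val rowType: TypeInformation[Row] =
  new RowTypeInfo(
    Array[TypeInformation[_]](BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.LONG_TYPE_INFO),
    Array("user", "duration"))

val rows: DataStream[Row] = events.map(e => Row.of(e.user, Long.box(e.duration)))

tableEnv.registerDataStream("events", rows)
val result = tableEnv.sqlQuery("SELECT user, SUM(duration) FROM events GROUP BY user")
```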

Architecture question

2018-02-14 Thread robert
I need to grab Avro data from a Kafka topic and write it to the local file system. Inside the Avro record there is a date-time field. From that field I need to name the file accordingly (e.g. 20180103). I was thinking of using Flink to read and unpack this generic record, then put it to a sink tha
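For the naming-by-date part, a rough sketch of one possible setup (the field name `eventTime` and paths are placeholders I made up) using a `BucketingSink` with a custom `Bucketer` that derives the bucket directory from the record's date field:

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.flink.streaming.connectors.fs.Clock
import org.apache.flink.streaming.connectors.fs.bucketing.{Bucketer, BucketingSink}
import org.apache.hadoop.fs.Path

// Buckets records by a date string inside the Avro record, e.g. "20180103".
class DateFieldBucketer extends Bucketer[GenericRecord] {
  override def getBucketPath(clock: Clock, basePath: Path, record: GenericRecord): Path =
    // "eventTime" is a hypothetical field; adapt to the actual schema.
    new Path(basePath, record.get("eventTime").toString.take(8))
}

val sink = new BucketingSink[GenericRecord]("file:///data/out")
sink.setBucketer(new DateFieldBucketer)
```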

Re: Deep Copy in FLINK, Kryo Copy is used in the different operator

2018-02-14 Thread Gábor Gévay
Hello, You might also be able to make Flink use a better serializer than Kryo. Flink falls back to Kryo when it can't use its own serializers, see here: https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/types_serialization.html For example, it might help to make your type a POJO. Be
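As a sketch of that suggestion: in Scala, a plain case class with supported field types is handled by Flink's own case-class serializer instead of Kryo, and `disableGenericTypes()` can be used as a sanity check to fail fast whenever a type would fall back to Kryo.

```scala
import org.apache.flink.streaming.api.scala._

// Served by Flink's case-class serializer, not Kryo, because all
// fields are supported basic types.
case class Session(userId: String, start: Long, interactions: Int)

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Optional: throw at job-graph build time if any type would use Kryo.
env.getConfig.disableGenericTypes()
```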

Re: Deep Copy in FLINK, Kryo Copy is used in the different operator

2018-02-14 Thread Aljoscha Krettek
Hi, You can disable those copies via ExecutionConfig.enableObjectReuse(), which you can get from the StreamExecutionEnvironment via getConfig(). Best, Aljoscha > On 12. Feb 2018, at 04:00, chen wrote: > > Actually our team have our own Stream Engine, we tested our engine and flink, > find out
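As a sketch of the suggested setting, with the usual caveat that object reuse is only safe if no user function caches or mutates its input records:

```scala
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Skip the defensive deep copies between chained operators.
// Only safe if user functions neither hold on to nor mutate inputs.
env.getConfig.enableObjectReuse()
```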

Re: Deploying Flink with JobManager HA on Docker Swarm/Kubernetes

2018-02-14 Thread Aljoscha Krettek
It should be roughly the same settings that you use in your JobManager. They are described here: https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#zookeeper-based-ha-mode > On 14. Feb 2018, at 15:32, Chirag Dewan wrote: > > Thanks Aljoscha. > > I haven't checked that bit.
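For reference, a minimal flink-conf.yaml fragment for ZooKeeper-based HA (host names and paths are placeholders; key names as of Flink 1.4):

```yaml
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
high-availability.storageDir: hdfs:///flink/ha/
high-availability.zookeeper.path.root: /flink
```

Both JobManagers and TaskManagers read these settings, which is why the TaskManagers need the ZooKeeper quorum reachable as well.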

Re: Deploying Flink with JobManager HA on Docker Swarm/Kubernetes

2018-02-14 Thread Chirag Dewan
Thanks Aljoscha. I haven't checked that bit. Is there any configuration for TaskManagers to find ZK? Regards, Chirag Sent from Yahoo Mail on Android On Wed, 14 Feb 2018 at 7:43 PM, Aljoscha Krettek wrote: Do you see in the logs whether the TaskManager correctly connect to ZooKeeper as we

Unexpected hop start & end timestamps after stream SQL join

2018-02-14 Thread Juho Autio
I'm joining a tumbling & hopping window in Flink 1.5-SNAPSHOT. The result is unexpected. Am I doing something wrong? Maybe this is just not a supported join type at all? Anyway, here goes: I first register these two tables: 1. new_ids: a tumbling window of seen ids within the last 10 seconds: SE

Re: Deploying Flink with JobManager HA on Docker Swarm/Kubernetes

2018-02-14 Thread Aljoscha Krettek
Do you see in the logs whether the TaskManagers correctly connect to ZooKeeper as well? They need this in order to find the JobManager leader. Best, Aljoscha > On 14. Feb 2018, at 06:12, Chirag Dewan wrote: > > Hi, > > I am trying to deploy a Flink cluster (1 JM, 2 TM) on a Docker Swarm. For >

Re: NullPointerException when asking for batched TableEnvironment

2018-02-14 Thread André Schütz
Hi, to make the example concrete: we get the batch table environment, process some data, and get results. After that we obtain a stream table environment, and after executing the stream, the system throws the batch NullPointerException. This means the batch table environment already processed some dat

Re: Retrieve written records of a sink after job

2018-02-14 Thread Flavio Pompermaier
So, if I'm not wrong, the right way to do this is using accumulators. What do you think about my proposal to add an easy way to attach an accumulator for the written/output records to a sink? On Wed, Feb 14, 2018 at 1:08 PM, Chesnay Schepler wrote: > Technically yes, a subset of metrics is stored
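Until such an API exists, a hedged sketch of the accumulator workaround discussed in this thread: count records in a function chained right before the sink, then read the accumulator from the JobExecutionResult (accumulator name and output path are made up):

```scala
import org.apache.flink.api.common.accumulators.LongCounter
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration

// Counts every record passing through; chain it in front of the sink.
class CountRecords[T] extends RichMapFunction[T, T] {
  private val counter = new LongCounter()
  override def open(parameters: Configuration): Unit =
    getRuntimeContext.addAccumulator("records-to-sink", counter)
  override def map(value: T): T = { counter.add(1L); value }
}

val env = ExecutionEnvironment.getExecutionEnvironment
env.fromElements(1, 2, 3)
  .map(new CountRecords[Int])
  .writeAsText("/tmp/out") // placeholder sink

val result = env.execute("counting job")
val written = result.getAccumulatorResult[java.lang.Long]("records-to-sink")
```

One accumulator per sink (with distinct names) would give per-sink counts from a single job.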

Re: NullPointerException when asking for batched TableEnvironment

2018-02-14 Thread Timo Walther
Hi, by looking at the source code it seems that your "batchEnvironment" is null. Did you verify this? Regards, Timo Am 2/14/18 um 1:01 PM schrieb André Schütz: Hi, within the Flink Interpreter context, we try to get a Batch TableEnvironment with the following code. The code was executed wi
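For reference, a minimal sketch of obtaining a batch TableEnvironment in Scala (API names as of Flink 1.4); if `batchEnvironment` is null here, the cause is more likely classloading in the surrounding Zeppelin context than the Flink API itself:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.table.api.TableEnvironment

val batchEnvironment = ExecutionEnvironment.getExecutionEnvironment
// Returns a BatchTableEnvironment when given a batch ExecutionEnvironment.
val batchTableEnvironment = TableEnvironment.getTableEnvironment(batchEnvironment)
```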

Re: Flink + Consul as HA backend. What do you think?

2018-02-14 Thread Chesnay Schepler
Hello, I don't know anything about Consul but the prospect of having other options beside Zookeeper is very interesting. It's rather surprising how little you had to modify existing classes to get this to work. It may take a bit until someone provides proper feedback as the community is curr

Re: Sample project does not work ?

2018-02-14 Thread Chesnay Schepler
I agree that the quickstart docs/pom are lacking in this regard and have filed FLINK-8654. On 14.02.2018 13:11, Esa Heikkinen wrote: Hi Thank you. It works now :) Maybe the strict syntax of the "flink run" command could be added to the sample p

RE: Sample project does not work ?

2018-02-14 Thread Esa Heikkinen
Hi Thank you. It works now :) Maybe the strict syntax of the "flink run" command could be added to the sample project? Esa From: Chesnay Schepler [mailto:ches...@apache.org] Sent: Wednesday, February 14, 2018 12:27 PM To: user@flink.apache.org Subject: Re: Sample project does not work ? The qui

Re: Retrieve written records of a sink after job

2018-02-14 Thread Chesnay Schepler
Technically yes, a subset of metrics is stored in the ExecutionGraph when the job finishes. (This is for example where the webUI derives the values from for finished jobs). However these are on the task level, and will not contain the number of incoming records if your sink is chained to anothe

NullPointerException when asking for batched TableEnvironment

2018-02-14 Thread André Schütz
Hi, within the Flink Interpreter context, we try to get a batch TableEnvironment with the following code. The code was executed within an Apache Zeppelin paragraph. [code] import org.apache.flink.table.api._ import org.apache.flink.table.api.scala._ import org.apache.flink.table.sources._ val bat

Re: Retrieve written records of a sink after job

2018-02-14 Thread Flavio Pompermaier
The problem here is that I don't know the vertex id of the sink. Would it be possible to access the sink info by id? And couldn't all that info be attached to the JobExecutionResult (avoiding setting up the REST connection etc.)? On Wed, Feb 14, 2018 at 12:44 PM, Chesnay Schepler wrote: > The

Re: Retrieve written records of a sink after job

2018-02-14 Thread Chesnay Schepler
The only way to access this info from the client is the REST API or the Metrics REST API

Re: Retrieve written records of a sink after job

2018-02-14 Thread Flavio Pompermaier
Actually I'd like to get this number from my Java class in order to update some external dataset "catalog", so I'm asking if there's some programmatic way to access this info (from JobExecutionResult for example). On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler wrote: > Do you want to know ho

Re: Retrieve written records of a sink after job

2018-02-14 Thread Chesnay Schepler
Do you want to know how many records the sink received, or how many the sink wrote to the DB? If it's the first you're in luck because we measure that already, check out the metrics documentation. If it's the latter, then this issue is essentially covered by FLINK-7286 which aims at allowing fun
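The sink's received-record count can then be fetched from the metrics REST API, roughly like this (host, port, job id, and vertex id are placeholders; the subtask-index prefix on the metric name is my assumption for the pre-1.5 REST API):

```shell
# List the metrics available on the sink vertex.
curl "http://jobmanager:8081/jobs/<job-id>/vertices/<vertex-id>/metrics"

# Fetch numRecordsIn of subtask 0 of that vertex.
curl "http://jobmanager:8081/jobs/<job-id>/vertices/<vertex-id>/metrics?get=0.numRecordsIn"
```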

Retrieve written records of a sink after job

2018-02-14 Thread Flavio Pompermaier
Hi to all, I have a (batch) job that writes to 1 or more sinks. Is there a way to retrieve, once the job has terminated, the number of records written to each sink? Is there any better way than using an accumulator for each sink? If that is the only way to do it, the Sink API could be enrich

Re: Sample project does not work ?

2018-02-14 Thread Chesnay Schepler
The quickstarts do not generate separate jars, but only one that contains all the sample jobs, as described here. To select a specific job from the jar when submitting a job you can use th
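The truncated sentence presumably refers to selecting the job on the command line; as a sketch (class and jar names are placeholders for the ones in your project):

```shell
# Pick a specific job from the single quickstart jar via its main class.
./bin/flink run -c org.myorg.quickstart.WordCount target/quickstart-0.1.jar
```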

RE: Sample project does not work ?

2018-02-14 Thread Esa Heikkinen
Hi For example, I don't find WordCount.jar! Or is it included in some other jar? Esa From: Chesnay Schepler [mailto:ches...@apache.org] Sent: Wednesday, February 14, 2018 11:33 AM To: user@flink.apache.org Subject: Re: Sample project does not work ? Hello, What exactly is not working? What yo

Re: Sample project does not work ?

2018-02-14 Thread Chesnay Schepler
Hello, What exactly is not working? What you've shown looks fine to me. The following isn't due to your setup, and in this case can be safely ignored: [WARNING] com.data-artisans:flakka-slf4j_2.10:2.3-custom requires scala version: 2.10.4 [WARNING] org.clapper:grizzled-slf4j_2.10:1.0.2 req

Re: Python and Scala

2018-02-14 Thread Piotr Nowojski
Hi, I have never used it before, but it's described in the documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/scala_shell.html - the "Adding external dependencies" part. It should answe
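For reference, the shape of the documented invocation (as of Flink 1.4; the jar path is a placeholder) for adding an external jar to the Scala shell's classpath:

```shell
# Start a local Scala shell with an extra jar on the classpath.
./bin/start-scala-shell.sh local --addclasspath /path/to/my-code.jar
```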

RE: Python and Scala

2018-02-14 Thread Esa Heikkinen
Hi Good news. Is there a way to supply Scala code from a file to the REPL? Compiling seems to be too complicated an operation. Actually, I don't get it to work yet. Esa From: Piotr Nowojski [mailto:pi...@data-artisans.com] Sent: Wednesday, February 14, 2018 10:55 AM To: Esa Heikkinen Cc: Esa Heikkinen

Re: Python and Scala

2018-02-14 Thread Piotr Nowojski
Hi Scala REPL uses the same code as compiled library so they should work the same. Piotrek > On 13 Feb 2018, at 18:32, Esa Heikkinen wrote: > > Hi > > And what about the differences between Scala REPL and Scala (compiled) ? > Esa > > Piotr Nowojski kirjoitti 13.2.2018 klo 15:14: >> Hi, >> >

Sample project does not work ?

2018-02-14 Thread Esa Heikkinen
Hi I have tried sample project from: https://ci.apache.org/projects/flink/flink-docs-release-1.4/quickstart/scala_api_quickstart.html#maven Versions: Linux: DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS" Maven: Apache Maven 3.3.9 Java: openjdk version "1.8.0_151" OpenJDK Runtime Environment (build 1.8