Re: Submit Flink Jobs to YARN running on AWS

2016-04-26 Thread Robert Metzger
Hi Abhi, I'll try to reproduce the issue and come up with a solution. On Tue, Apr 26, 2016 at 1:13 AM, Bajaj, Abhinav wrote: > Hi Fabian, > > Thanks for your reply and the pointers to documentation. > > In these steps, I think the Flink client is installed on the master node, > referring to ste

Re: Job hangs

2016-04-26 Thread Robert Metzger
Hi Timur, thank you for sharing the source code of your job. That is helpful! Its a large pipeline with 7 joins and 2 co-groups. Maybe your job is much more IO heavy with the larger input data because all the joins start spilling? Our monitoring, in particular for batch jobs is really not very adv

Re: YARN terminating TaskNode

2016-04-25 Thread Robert Metzger
Hi Timur, The reason why we only allocate 570mb for the heap is because you are allocating most of the memory as off heap (direct byte buffers). In theory, the memory footprint of the JVM is limited to 570 (heap) + 1900 (direct mem) = 2470 MB (which is below 2500). But in practice thje JVM is all

Re: Getting java.lang.Exception when try to fetch data from Kafka

2016-04-25 Thread Robert Metzger
Hi Prateek, were the messages written to the Kafka topic by Flink, using the TypeInformationKeyValueSerializationSchema ? If not, maybe the Flink deserializers expect a different data format of the messages in the topic. How are the messages written into the topic? On Fri, Apr 22, 2016 at 10:21

Re: "No more bytes left" at deserialization

2016-04-24 Thread Robert Metzger
For the second exception, can you check the logs of the failing taskmanager (10.105.200.137)? I guess these logs some details on why the TM timed out. Are you on 1.0.x or on 1.1-SNAPSHOT? We recently changed something related to the ExecutionConfig which has lead to Kryo issues in the past. On

Re: How to fetch kafka Message have [KEY,VALUE] pair

2016-04-22 Thread Robert Metzger
If you've serialized your data with a custom format, you can also implement a custom deserializer using the KeyedDeserializationSchema. On Fri, Apr 22, 2016 at 2:35 PM, Till Rohrmann wrote: > Depending on how the key value pair is encoded, you could use the > TypeInformationKeyValueSerialization

Re: Flink + S3

2016-04-20 Thread Robert Metzger
Hi Michael-Keith, Welcome to the Flink community! Let me try answer your question regarding the "best" deployment options: >From what I see from the mailing list, most of our users are using one of the big hadoop distributions (including Amazon EMR) with YARN installed. Having YARN makes things q

Re: Leader not found

2016-04-20 Thread Robert Metzger
gt; Flink version : 1.0.0 > Kafka version : 0.8.2.1 > > Try to use a topic which has no message posted to it, at the time flink > starts. > > On Tue, Apr 19, 2016 at 5:41 PM, Robert Metzger > wrote: > >> Can you provide me with the exact Flink and Kafka versions you are

Re: ClasNotFound when submitting job from command line

2016-04-20 Thread Robert Metzger
Hi Flavio, in which class are you calling Class.forName()? Is the class where the Class.forName() call is loaded from the user jar or is it a class from the Flink distribution? I'm asking because Class.forName() is using the classloader of the class where the call is located. So if the class has b

Re: Leader not found

2016-04-19 Thread Robert Metzger
t; the flink application emits this error and bails, could this be missed use > case in the fix. > > On Tue, Apr 19, 2016 at 3:58 PM, Robert Metzger > wrote: > >> Hi, >> >> I'm sorry, the documentation in the JIRA issue is a bit incorrect. The >> issue has been

Re: Leader not found

2016-04-19 Thread Robert Metzger
Hi, I'm sorry, the documentation in the JIRA issue is a bit incorrect. The issue has been fixed in all versions including and after 1.0.0. Earlier releases (0.10, 0.9) will fail when the leader changes. However, you don't necessarily need to upgrade to Flink 1.0.0 to resolve the issue: With checkp

Re: Flink + Kafka + Scalabuff issue

2016-04-19 Thread Robert Metzger
Hi Alex, I suspect its a GC issue with the code generated by ScalaBuff. Can you maybe try to do something like a standalone test where use use a while(true) loop to see how fast you can deserialize elements from your Foo type? Maybe you'll find that the JVM is growing all the time. Then there's pro

Re: Compression - AvroOutputFormat and over network ?

2016-04-18 Thread Robert Metzger
Hi Tarandeep, I think for that you would need to set a codec factory on the DataFileWriter. Sadly we don't expose that method to the user. If you want, you can contribute this change to Flink. Otherwise, I can quickly fix it. Regards, Robert On Mon, Apr 18, 2016 at 2:36 PM, Ufuk Celebi wrote:

Re: throttled stream

2016-04-18 Thread Robert Metzger
Hi, I would also go for Niels approach. If the mapper has the same parallelism as the source and its right after it, it'll be chained to the source. The throttling then happens with almost no overhead. Regarding the ThrottledIterator: Afaik there is no iterator involved when reading data out of th

Re: Flink 0.10.2 and Kafka 0.8.1

2016-04-18 Thread Robert Metzger
ert-schmidtke/HiBench/blob/flink-streaming/src/streambench/flinkbench/pom.xml >> >> I would guess the POM is similar to the one in the sample project, yet >> when building it, the jar does not contain all indirect dependencies. >> >> Thanks!! >> >> Robert &g

Re: Flink 0.10.2 and Kafka 0.8.1

2016-04-18 Thread Robert Metzger
between the sample project and the one I'm working on. > > Thanks! Really quick help. > > Robert > > On Mon, Apr 18, 2016 at 4:02 PM, Robert Metzger > wrote: > >> Hi, >> the problem with the posted project is that it doesn't have the Flink >> kafka c

Re: Flink 0.10.2 and Kafka 0.8.1

2016-04-18 Thread Robert Metzger
package > -Pbuild-jar and mvn clean package) do not contain the org/apache/kafka/** > classes. Can you have a quick look at the pom? However, as I said, it's > verbatim Archetype+Flink Docs. > > Thanks a lot in advance! > > Robert > > > > On Mon, Apr 18, 2016 at 1

Re: Flink 0.10.2 and Kafka 0.8.1

2016-04-18 Thread Robert Metzger
Hi, did you check your user jar if it contains the Kafka classes? Are you building a fat jar? Are you manually excluding any dependencies? Flink's 0.10.2 Kafka connector depends on Kafka 0.8.2.0 [1] which in turn depends on kafka-clients 0.8.2.0 [2]. And the "kafka-clients" dependency also contain

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Metzger
Hi Robert, check out the tools/create_release_files.sh file in the source tree. There you can see how we are building the release binaries. It would be quite interesting to find out what caused the performance difference. On Wed, Apr 13, 2016 at 5:03 PM, Robert Schmidtke wrote: > Hi everyone, >

Re: Possible use case: Simulating iterative batch processing by rewinding source

2016-04-11 Thread Robert Metzger
Flink's DataStream API also allows reading files from disk (local, hdfs, etc.). So you don't have to set up Kafka to make this work (If you have it already, you can of course use it). On Mon, Apr 11, 2016 at 11:08 AM, Ufuk Celebi wrote: > On Mon, Apr 11, 2016 at 10:26 AM, Raul Kripalani wrote:

Re: modify solution set within the delta iteration

2016-04-09 Thread Robert Metzger
Hi Riccardo, I don't think you can use delta iterations when your solution set changes over time. For those cases, I would use the bulk iterations in Flink. On Mon, Mar 21, 2016 at 12:26 PM, Riccardo Diomedi < riccardo.diomed...@gmail.com> wrote: > I try to explain my situation > > I’m doing a d

Re: How to test serializability of a Flink job

2016-04-09 Thread Robert Metzger
Hi Simone, do you have a stack trace for the error? Usually the user code serialization is the same locally and on a cluster. On Tue, Apr 5, 2016 at 12:02 PM, Simone Robutti < simone.robu...@radicalbit.io> wrote: > Hello, > > last week I got a problem where my job worked in local mode but could

Re: Joda DateTimeSerializer

2016-04-08 Thread Robert Metzger
Hi Stefano, your fix is the right way to resolve the issue ;) If you want, give me your Confluence Wiki username and I give you edit permissions in our wiki. Otherwise, I'll quickly add a note to the migration guide. On Fri, Apr 8, 2016 at 11:28 AM, Stefano Bortoli wrote: > Hi to all, > we've

Re: Integrate Flink with S3 on EMR cluster

2016-04-08 Thread Robert Metzger
Hi Timur, the Flink optimizer runs on the client, so the exception is thrown from the JVM running the ./bin/flink client. Since the statistics sampling is an optional step, its surrounded by a try / catch block that just logs the error message. More answers inline below On Thu, Apr 7, 2016 at 1

Re: YARN High Availability

2016-04-07 Thread Robert Metzger
rote: > > Hi Robert, > > > > I tried several paths and rmr before. > > > > It stopped after 1-2 minutes. There was an exception on the shell. > > Sorry, should have attached to the last mail. > > > > Thanks, > > > > Konstnatin > > >

Re: Powered by Flink

2016-04-05 Thread Robert Metzger
Hi everyone, I would like to bring the "Powered by Flink" wiki page [1] to the attention of Flink user's who recently joined the Flink community. The list tracks which organizations are using Flink. If your company / university / research institute / ... is using Flink but the name is not yet list

Re: YARN High Availability

2016-04-05 Thread Robert Metzger
sked Max to > trigger a > >> >> new build of them. > >> >>>> > >> >>>> Regarding Aljoscha’s idea: I like it. It is essentially a > shortcut > >> >> for configuring the root path. > >> >>

Re: TopologyBuilder throws java.lang.ExceptionInInitializerError

2016-04-04 Thread Robert Metzger
Hi, I suspect that this dependency: storm storm-kafka 0.9.0-wip16a-scala292 pulls in a different storm version. Can you exclude the storm from that dependency? You can also run: mvn clean install and the

Re: scaling a flink streaming application on a single node

2016-04-04 Thread Robert Metzger
Hi, usually it doesn't make sense to run multiple task managers on a single machine to get more slots. Your machine has only 4 CPU cores, so you are just putting a lot of pressure on the cpu scheduler.. On Thu, Mar 31, 2016 at 7:16 PM, Shinhyung Yang wrote: > Thank you for replying! > > I am tr

Re: Stack overflow from self referencing Avro schema

2016-03-22 Thread Robert Metzger
Hey David, FLINK-3602 has been merged to master. On Fri, Mar 11, 2016 at 5:11 PM, David Kim wrote: > Thanks Stephan! :) > > On Thu, Mar 10, 2016 at 11:06 AM, Stephan Ewen wrote: > >> The following issue should track that. >> https://issues.apache.org/jira/browse/FLINK-3602 >> >> @Niels: Thanks

Re: Not enough free slots to run the job

2016-03-21 Thread Robert Metzger
> (for this case the ‘spare slots’ will not be of help right; losing a TM > means the job will fail, no recovery) > > Thanks! > > Best, > Ovidiu > > > On 21 Mar 2016, at 14:15, Robert Metzger wrote: > > Hi Ovidiu, > > right now the scheduler in Flink will n

Re: Not enough free slots to run the job

2016-03-21 Thread Robert Metzger
Hi Ovidiu, right now the scheduler in Flink will not use more slots than requested. To avoid issues on recovery, we usually recommend users to have some spare slots (run job with p=15 on a cluster with slots=20). I agree that it would make sense to add a flag which allows a job to grab more slots

Re: Access to S3 from YARN on EC2

2016-03-20 Thread Robert Metzger
Hi, did you check if the "org.apache.hadoop.fs.s3native.NativeS3FileSystem" class is in the flink-dist.jar in the lib/ folder? On Sun, Mar 20, 2016 at 10:19 AM, Ashutosh Kumar wrote: > I have setup a 3 node YARN based cluster on EC2. I am running flink in > cluster mode. I added these lines in

Re: S3 Timeouts with lots of Files Using Flink 0.10.2

2016-03-19 Thread Robert Metzger
The default timeout for opening a split is 5 minutes. You can set a higher value with "taskmanager.runtime.fs_timeout" (milliseconds), but I believe that 5 minutes is already way too long. It would be interesting to find out the root cause of this. On Thu, Mar 17, 2016 at 11:00 PM, Sourigna Phetsa

Re: Javadoc version

2016-03-19 Thread Robert Metzger
using. That's the main issue for me, not whether > the docs are slightly out of sync with the actual 1.0 release code. > > In any case, it's a minor issue. I thought originally someone just forgot > to update the version during the doc release process. > > Regards, &g

Re: Javadoc version

2016-03-19 Thread Robert Metzger
Hi Ken, we are building the docs for each version based on the "release-x.y" branch. This branch contains the snapshot version of the respective version. This allows us to fix documentation issues without releasing a new version. But you are right, it may happen that the docs / javadocs are slight

Re: Unable to Read from Kafka [ zookeeper.connect error]

2016-03-19 Thread Robert Metzger
Hi, you can set the property by passing it as an argument when starting the job: --zookeeper.connect localhost:2181 On Wed, Mar 16, 2016 at 3:17 PM, subash basnet wrote: > Hello all, > > I ran the WriteIntoKafka.java/WikipediaAnalysis.java from flink-streaming > examples and able to view the out

Re: JDBCInputFormat preparation with Flink 1.1-SNAPSHOT and Scala 2.11

2016-03-16 Thread Robert Metzger
Sorry for joining this discussion late. Maybe this is also interesting for you: http://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/ On Wed, Mar 9, 2016 at 1:47 PM, Prez Cannady wrote: > Thanks. Need to dive in a bit better, but I did clarify some things in

Re: Integration Alluxio and Flink

2016-03-15 Thread Robert Metzger
Hi Andrea, the filesystem class can not be in the job jar. You have to put it into the lib folder. On Tue, Mar 15, 2016 at 5:40 PM, Andrea Sella wrote: > Hi Till, > > I put the jar as dependency of my job on build.sbt. I need to do > somenthing else? > > val flinkDependencies = Seq( > "org.ap

Re: kafka.javaapi.consumer.SimpleConsumer class not found

2016-03-15 Thread Robert Metzger
PM, Balaji Rajagopalan < >> balaji.rajagopa...@olacabs.com> wrote: >> >>> Yes figured that out, thanks for point that, my bad. I have put back >>> 0.10.2 as flink version, will try to reproduce the problem again, this time >>> I have explicitly called out the

Re: Kafka integration error

2016-03-14 Thread Robert Metzger
Hi Stefanos, this looks like an issue with Kafka. Which version does your broker have? Can you check the logs of the broker you are trying to connect to? On Fri, Mar 11, 2016 at 5:27 PM, Stefanos Antaris < antaris.stefa...@gmail.com> wrote: > Hi to all, > > i am trying to make Flink to work with

Re: kafka.javaapi.consumer.SimpleConsumer class not found

2016-03-14 Thread Robert Metzger
icense"> >> AL2 >> Apache License >> 2.0 >> >> >> Copyright 2015 data Artisans GmbH >> Licensed under the Apache License, Version >> 2.0 (the "License"); >&g

Re: Log4j configuration on YARN

2016-03-14 Thread Robert Metzger
Hi Nick, the name of the "log4j-yarn-session.properties" file might be a bit misleading. The file is just used for the YARN session client, running locally. The Job- and TaskManager are going to use the log4j.properties on the cluster. On Fri, Mar 11, 2016 at 7:20 PM, Ufuk Celebi wrote: > Hey N

Re: Flink and YARN ship folder

2016-03-14 Thread Robert Metzger
Hi Andrea, You don't have to manually replicate any operations on the slaves. All files in the lib/ folder are transferred to all containers (Jobmanagers and TaskManagers). On Sat, Mar 12, 2016 at 3:25 PM, Andrea Sella wrote: > Hi Ufuk, > > I'm trying to execute the WordCount batch example wit

Re: kafka.javaapi.consumer.SimpleConsumer class not found

2016-03-14 Thread Robert Metzger
will try to see scala 2.10 makes any > difference. > > balaji > > On Fri, Mar 11, 2016 at 8:06 PM, Robert Metzger > wrote: > >> Hi, >> >> you have to use the same version for all dependencies from the >> "org.apache.flink" group. >> &g

Re: Passing two value to the ConvergenceCriterion function

2016-03-14 Thread Robert Metzger
Hi, take a look at the "Record" class. That one implements the Value interface and can have multiple values. On Fri, Mar 11, 2016 at 6:01 PM, Riccardo Diomedi < riccardo.diomed...@gmail.com> wrote: > Hi > > I want to send two value to the ConvergenceCriterion function, so i > decided to use an a

Re: Silly keyBy() error

2016-03-13 Thread Robert Metzger
Hey Ron, for accessing keys of a class by their field name (like you did: .keyBy("accountId", "agentId", "wideMetricId")), the class needs to be recognized as a POJO by Flink. >From the documentation [1] a class is recognized as a POJO when: - The class must be public. - It must ha

Re: Running Flink 1.0.0 on YARN

2016-03-11 Thread Robert Metzger
Hi, the first issue you are describing is expected. Flink is starting the web interface on the container running the JobManager, not on the resource manager. Also, the port is allocated dynamically, to avoid port collisions. So its not started on 8081. However, you can access the web interface fro

Re: protobuf messages from Kafka to elasticsearch using flink

2016-03-11 Thread Robert Metzger
Hi, I think what you have to do is the following: 1. Create your own DeserializationSchema. There, the deserialize() method gets a byte[] for each message in Kafka 2. Deserialize the byte[] using the generated classes from protobuf. 3. If your datatype is called "Foo", there should be a generated

Re: kafka.javaapi.consumer.SimpleConsumer class not found

2016-03-11 Thread Robert Metzger
Hi, you have to use the same version for all dependencies from the "org.apache.flink" group. You said these are the versions you are using: flink.version = 0.10.2 kafka.verison = 0.8.2 flink.kafka.connection.verion=0.9.1 For the connector, you also need to use 0.10.2. On Fri, Mar 11, 2016 at

Re: Flink streaming throughput

2016-03-11 Thread Robert Metzger
Hi Hironori, can you try with the kafka-console-consumer how many messages you can read in one minute? Maybe the broker's disk I/O is limited because everything is running in virtual machines (potentially sharing one hard disk?) I'm also not sure if running a Kafka 0.8 consumer against a 0.9 broke

Re: Checkpoint

2016-03-10 Thread Robert Metzger
Hi Vijay, regarding your other questions: 1) On the TaskManagers, the FlinkKafkaConsumers will write the partitions they are going to read in the log. There is currently no way of seeing the state of a checkpoint in Flink (which is the offsets). However, once a checkpoint is completed, the Kafka

Re: java.lang.NoClassDefFoundError for Keys Class

2016-03-04 Thread Robert Metzger
The problem is probably this dependency: org.apache.kafka kafka_2.9.1 0.8.2.0 Since it pulls Scala 2.9. You can probably remove this dependency because our kafka connector will also pull also that kafka dependency. On Fri, Mar 4, 2016 at 8:07 PM, Madhire, Naveen < naveen

Re: readTextFile is not working for StreamExecutionEnvironment

2016-03-02 Thread Robert Metzger
Hi, you need to explicitly trigger the execution by calling "env.execute()" On Wed, Mar 2, 2016 at 1:36 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > def main(args: Array[String]): Unit = { > > > > val env: StreamExecutionEnvironment = > StreamExecutionEnvironment.getExecut

Re: kafka and flink integration issue

2016-02-27 Thread Robert Metzger
Hi Pankaj, I suspect you are trying to start Flink on a cluster with Flink 0.10.1 installed? On Sat, Feb 27, 2016 at 9:20 AM, Pankaj Kumar wrote: > I am trying to integrate kafka and flink. > my pom file is where {flink.version} is 0.10.2 > > > org.apache.flink > flink-java > ${fli

Re: Graph with stream of updates

2016-02-26 Thread Robert Metzger
Hi Ankur, Can you provide a bit more information on what you are trying to achieve? Do you want to keep a graph build from an stream of events within Flink and query that? Or you you want to change the dataflow graph of Flink while a job is running? Regards, Robert On Thu, Feb 25, 2016 at 11:1

Re: Kafka issue

2016-02-26 Thread Robert Metzger
Are you building 1.0-SNAPSHOT yourself or are you relying on the snapshot repository? We had issues in the past that jars in the snapshot repo were incorrect On Fri, Feb 26, 2016 at 10:45 AM, Gyula Fóra wrote: > I am not sure what is happening. I tried running against a Flink cluster > that is

[VOTE] Release Apache Flink 1.0.0 (RC1)

2016-02-25 Thread Robert Metzger
Dear Flink community, Please vote on releasing the following candidate as Apache Flink version 1.0 .0. I've set user@flink.apache.org on CC because users are encouraged to help testing Flink 1.0.0 for their specific use cases. Please report issues (and successful tests!) on d...@flink.apache.org.

Re: Kafka partition alignment for event time

2016-02-25 Thread Robert Metzger
Hi Erdem, FLINK-3440 has been resolved. The fix is merged to master. 1.0-SNAPSHOT should already contain the fix and it'll be in 1.0.0 (for which I'll post a release candidate today) as well. On Thu, Feb 18, 2016 at 3:24 PM, Erdem Agaoglu wrote: > Thanks Stephan > > On Thu, Feb 18, 2016 at 3:00

Re: Problem with Kafka 0.9 Client

2016-02-23 Thread Robert Metzger
;, ".37:2181"); >> properties.setProperty("group.id", "test"); >> properties.setProperty("client.id", "flink_test"); >> properties.setProperty("auto.offset.reset", "earliest"); >> prop

Re: Flink HA

2016-02-22 Thread Robert Metzger
Hi Thomas, To avoid having jobs forever restarting, you have to cancel them manually (from the web interface or the /bin/flink client). Also, you can set an appropriate restart strategy (in 1.0-SNAPSHOT), which limits the number of retries. This way the retrying will eventually stop. On Fri, Feb

Re: state.backend.fs.checkpointdir setting

2016-02-22 Thread Robert Metzger
Hi, how is your cluster setup? Do you have multiple machines, or only one? Did you copy the configuration to all machines? On Fri, Feb 19, 2016 at 6:08 PM, Andrew Ge Wu wrote: > Hi All, > > I have been experiencing an error stopping my HA standalone setup. > > The cluster startup just fine, b

Re: Use jvm to run flink on single-node machine with many cores

2016-02-21 Thread Robert Metzger
Hi Ana, you can create a StreamExecutionEnvironment also by passing a configuration object. In the configuration, you can also configure the number of network buffers. // set up the execution environmentConfiguration conf = new Configuration(); conf.setBoolean("taskmanager.network.numberOfBuffer

Re: How to increase akka heartbeat?

2016-02-19 Thread Robert Metzger
om Actor[akka.tcp://flink@myip:6123/user/jobmanager#1144818256]. >> » >> >> I tried to set the heartbeat interval in the cluster but it didn't solve >> the problem, should I try to set it in the client (how can I do it)? I see >> no other errors or exceptions on the

Re: How to increase akka heartbeat?

2016-02-19 Thread Robert Metzger
Hi Saiph, are you sure that the jobs are cancelled because the client disconnects? For the different timeouts, check the configuration page: https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/config.html and search for "heartbeat". On Fri, Feb 19, 2016 at 8:04 PM, Saiph Kappa wr

Re: streaming hdfs sub folders

2016-02-19 Thread Robert Metzger
Hi Martin, where is the null pointer exception thrown? I think you didn't call the open() method of the AvroInputFormat. Maybe that's the issue. On Thu, Feb 18, 2016 at 5:01 PM, Martin Neumann wrote: > I tried to implement your idea but I'm getting NullPointer exceptions from > the AvroInputFor

Re: Problem with Kafka 0.9 Client

2016-02-19 Thread Robert Metzger
executions. We are > basically trying to read from our kafka cluster and then writing the data > to elasticsearch. > > Thanks for your help! > > On 18 February 2016 at 11:19, Robert Metzger wrote: > >> Hi Javier, >> >> sorry for the late response. In

Re: Problem with Kafka 0.9 Client

2016-02-18 Thread Robert Metzger
Hi Javier, sorry for the late response. In the Error Mapping of Kafka, it says that code 15 means: ConsumerCoordinatorNotAvailableCode. https://github.com/apache/kafka/blob/0.9.0/core/src/main/scala/kafka/common/ErrorMapping.scala How many brokers did you put into the list of bootstrap servers? C

Re: Availability for the ElasticSearch 2 streaming connector

2016-02-17 Thread Robert Metzger
Hi Mihail, It seems that nobody is actively working on the elasticsearch2 connector right now. The 1.0.0 release is already feature frozen, only bug fixes or (some) pending pull requests go in. What you can always do is copy the code from our current elasticsearch connector, set the dependency to

Flink 1.0.0 Release Candidate 0: Please help testing

2016-02-15 Thread Robert Metzger
Hi, I've now created a "preview RC" for the upcoming 1.0.0 release. There are still some blocking issues and important pull requests to be merged but nevertheless I would like to start testing Flink for the release. In past major releases, we needed to create many release candidates, often for fi

Re: consume kafka stream with flink

2016-02-12 Thread Robert Metzger
Hi Tanguy, I would recommend to refer to the documentation of the specific Flink version you are using. This is the documentation for 1.0-SNAPSHOT: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/kafka.html and this is the doc for 0.10.x: https://ci.apache.org/pro

Re: job manager timeout

2016-02-11 Thread Robert Metzger
Hi Radu, did you check the JobManager logs as well? Maybe there you can see why the JobManager is failing. The timeout is configurable through the "akka.client.timeout" variable. The default value is "60 s". On Wed, Feb 10, 2016 at 7:35 PM, Radu Tudoran wrote: > Hi, > > > > I am running a prog

Re: Simple Flink - Kafka Test

2016-02-11 Thread Robert Metzger
Quick clarification on Stephan's comment: In Flink 0.10, no suffix means scala 2.10, for Scala 2.11 you have to add the _2.11 suffix to ALL dependencies (including flink-java_2.11, flink-core_2.11 and so on). In Flink 1.0, all artifacts depending on scala have a version suffix. For example flink-cl

Re: Compilation error while instancing FlinkKafkaConsumer082

2016-02-11 Thread Robert Metzger
Hi, which build system are you using? Can you maybe post the configuration file of that build system ? (pom.xml / sbt file). I suspect that some of the dependencies are wrong. Maybe not all have the right scala version suffix or there is a version mix. On Wed, Feb 10, 2016 at 5:17 PM, Simone Rob

Re: pom file scala plugin error [flink-examples-batch project]

2016-02-09 Thread Robert Metzger
Hi Subash, I think the two errors are unrelated. Maven is failing because of the checkstyle plugin. It checks if the code follows our coding guidelines. If you are experienced with IntelliJ, I would suggest to use that IDE. Most Flink committers are using it because its difficult to get Eclipse s

Re: GC on TaskManagers stats

2016-02-09 Thread Robert Metzger
Hi Guido, sorry for the late reply. You were collecting the stats every 1 second. Afaik, Flink is internally collecting the stats with a frequency of 5 seconds, so you can either change your or Flink's polling interval (I think its taskmanager.heartbeat-interval) Regarding the details on PS-Scave

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

2016-02-08 Thread Robert Metzger
Pieter Hameete : > >> Matter of RTFM eh ;-) thx and sorry for the bother. >> >> 2016-02-08 17:06 GMT+01:00 Robert Metzger : >> >>> You said earlier that you are using Flink 0.10. The feature is only >>> available in 1.0-SNAPSHOT. >>> >>

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

2016-02-08 Thread Robert Metzger
> 2016-02-07 20:04 GMT+01:00 Pieter Hameete : > >> I found the relevant information on the website. Ill consult with the >> cluster admin tomorrow, thanks for the help :-) >> >> - Pieter >> >> 2016-02-07 19:31 GMT+01:00 Robert Metzger : >> >>> Hi

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

2016-02-07 Thread Robert Metzger
de that is on >>>> the same >>>> > network as the hadoop cluster there are no problems registering with >>>> the >>>> > JobManager. I did also notice the following message in the local >>>> console: >>>> > >>>

Re: Performance insights

2016-02-06 Thread Robert Metzger
You can count the number of elements per key. This allows you to see how they are distributed. On Sat, Feb 6, 2016 at 1:23 PM, Flavio Pompermaier wrote: > And what if I detect some skewness in some task? Do I have to try to call > rebalance()?is there a way to identify the keys causing the skewn

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

2016-02-06 Thread Robert Metzger
Hi, did you check the logs of the JobManager itself? Maybe it'll tell us already whats going on. On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete wrote: > Hi Guys! > > Im attempting to run Flink on YARN, but I run into an issue. Im starting > the Flink YARN session from an Ubuntu 14.04 VM. All g

Re: Flink_Kafka

2016-02-04 Thread Robert Metzger
Hi, I don't understand what you are trying to achieve. If you want to read the topic from the beginning, use a different group.id. Flink should consume data from the topic when you produce something into it. As you can see from the log statement, its at the "INFO" log level, hence its not an issue

Re: Stream conversion

2016-02-04 Thread Robert Metzger
I'm wondering which kind of transformations you want to apply to the window you cannot apply with the DataStream API? Would it be sufficient for you to have the windows as files in HDFS and then run batch jobs against the windows on disk? If so, you could use our filesystem sink, which creates fil

Re: Internal buffers supervision and yarn vCPUs

2016-02-04 Thread Robert Metzger
Hi Gwen, let me answer the second question: There is a JIRA to reintroduce the configuration parameter: https://issues.apache.org/jira/browse/FLINK-2213. I will try to get a fix for this into the 1.0 release. I think I removed back then because users were unable to define the number of vcores ind

Re: TaskManager unable to register with JobManager

2016-02-03 Thread Robert Metzger
Hi, the TaskManager is starting up, but its not able to register at the job manager. Did you check the JobManager log? Do you see anything suspicious there? Are the ports matching? On Wed, Feb 3, 2016 at 9:23 PM, Ravinder Kaur wrote: > Hello, > > Thank you for pointing it out. I had a little t

Re: Left join with unbalanced dataset

2016-02-02 Thread Robert Metzger
Hi Arnaud, you can retrieve the logs of a yarn application by calling "yarn logs -applicationId ". Its going to output you the logs of all Taskmanagers and the job manager in one stream. I would pipe the output into a file and then search for the position where the log for the failing taskmanager

Re: Flume source support

2016-01-28 Thread Robert Metzger
Hi Alex, I'm sorry that you found that class commented. I think so far nobody was interested in using the Flume source.It was commented out during a refactoring of the stream sources: https://github.com/apache/flink/pull/659#discussion-diff-29854032. I just looked a bit through the (commented) cod

Re: Kafka+Flink

2016-01-28 Thread Robert Metzger
Hi Vinaya, this blog post explains how to connect Flink and Kafka: http://data-artisans.com/kafka-flink-a-practical-how-to/ On Thu, Jan 28, 2016 at 1:28 AM, Vinaya M S wrote: > Hi, > > I have a 3 node kafka cluster. In server.properties file of each of them > I'm setting > advertisedhost.name:

Re: Maven artifacts scala 2.11 bug?

2016-01-27 Thread Robert Metzger
Hi David, can you post your SBT build file as well? On Wed, Jan 27, 2016 at 7:52 PM, David Kim wrote: > Hello again, > > I saw the recent change to flink 1.0-SNAPSHOT on explicitly adding the > scala version to the suffix. > > I have a sbt project that fails. I don't believe it's a misconfigura

Re: Flink 0.10.1 and HBase

2016-01-25 Thread Robert Metzger
Hi Christophe, I'm sorry that you ran into the issue. Right now, there is no better fix. For the next releases, I'll take care that this doesn't happen again. Maybe (you are the third user who (however implicitly) requested publicly for a flink 0.10.2 release), we'll do a 0.10.2 before 1.0.0. O

Re: DeserializationSchema isEndOfStream usage?

2016-01-22 Thread Robert Metzger
s://github.com/apache/flink/blob/81320c1c7ee98b9a663998df51cc4d5aa73d9b2a/flink-streaming-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducerBase.java#L192 > > Thanks! > David > > On Fri, Jan 22, 2016 at 2:17 AM, Robert Metzg

Re: JDBCInputFormat GC overhead limit exceeded error

2016-01-22 Thread Robert Metzger
ning Oracle JDBC memory: >> >> - >> http://stackoverflow.com/questions/2876895/oracle-t4cpreparedstatement-memory-leaks >> (bottom answer) >> - https://community.oracle.com/thread/2220078?tstart=0 >> >> >> On Tue, Jan 19, 2016 at 5:44 PM, Maximilian Bode

Re: Backpressure in the context of JDBCOutputFormat update

2016-01-22 Thread Robert Metzger
tsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke > Sitz: Unterföhring * Amtsgericht München * HRB 135082 > > Am 21.01.2016 um 15:57 schrieb Robert Metzger : > > Hi Max, > > is the distinct() operation reducing the size of the DataSet? If so, I > assume you have an

Re: Results of testing Flink quickstart against 0.10-SNAPSHOT and 1.0-SNAPSHOT (re. Dependency on non-existent org.scalamacros:quasiquotes_2.11:)

2016-01-22 Thread Robert Metzger
hrows IOException, ClassNotFoundException { >> return flinkAccumulators.deserializeValue( >> *ClassLoader.getSystemClassLoader()*); >> } >> >> *ClassLoader.getSystemClassLoader()* should be >> *getClass().getClassLoader().* >> Not sure why it’s not t

Re: DeserializationSchema isEndOfStream usage?

2016-01-22 Thread Robert Metzger
st likely due to > the non-deterministic ordering of the collection. Any guidance appreciated! > > I'm using 1.0-SNAPSHOT and my two sinks are FlinkKafkaProducer08. > > The flink code looks something like this: > > > val stream: DataStream[Foo] = ... > > val kafkaA =

Re: Backpressure in the context of JDBCOutputFormat update

2016-01-21 Thread Robert Metzger
Hi Max, is the distinct() operation reducing the size of the DataSet? If so, I assume you have an idempotent update and the job is faster because fewer updates are done? if the distinct() operator is not changing anything, then, the job might be faster because the INSERT is done while Flink is sti

Re: Could not upload the jar files to the job manager IOException

2016-01-21 Thread Robert Metzger
a.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) > at java.net.SocketOutputStream.write(SocketOutputStream.java:153) > at org.apache.flink.runtime.blob.BlobUtils.writeLength(BlobUtils.java:262) > at > org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.ja

Re: DeserializationSchema isEndOfStream usage?

2016-01-20 Thread Robert Metzger
Thanks Robert! I'll be keeping tabs on the PR. > > Cheers, > David > > On Mon, Jan 11, 2016 at 4:04 PM, Robert Metzger > wrote: > >> Hi David, >> >> In theory isEndOfStream() is absolutely the right way to go for stopping >> data sources in Flink. >

Re: groupBy(int) is undefined for the type SingleOutputStreamOperator while running streaming example provided on webpage

2016-01-20 Thread Robert Metzger
Thank you. > > On Wed, Jan 20, 2016 at 9:41 AM, Robert Metzger > wrote: > >> Hi. >> >> which Flink version are you using? >> Starting from Flink 0.10., "groupBy" has been renamed to "keyBy". >> >> Where did you find the "grou

Re: groupBy(int) is undefined for the type SingleOutputStreamOperator while running streaming example provided on webpage

2016-01-20 Thread Robert Metzger
Hi. which Flink version are you using? Starting from Flink 0.10., "groupBy" has been renamed to "keyBy". Where did you find the "groupBy" example? On Wed, Jan 20, 2016 at 3:37 PM, Vinaya M S wrote: > Hi, > > I'm new to Flink. Can anyone help me with the error below. > > Exception in thread "ma

<    3   4   5   6   7   8   9   10   11   >