Multiple windows with large number of partitions

2016-04-27 Thread Christopher Santiago
I've been working through the flink demo applications and started in on a prototype, but have run into an issue with how to approach the problem of getting a daily unique user count from a traffic stream. I'm using a time characteristic event time. Sample event stream (timestamp,userid):

Re: Discarding header from CSV file

2016-04-27 Thread Chiwan Park
It seems that type of `buildingManager` is not matched to CSV column. In source code, `buildingManager` is defined as `Int`. But in your CSV file, it starts with a character `M`. I succeeded in running the code with your CSV file after changing the type of `buildingManager` to `String`.

Re: Discarding header from CSV file

2016-04-27 Thread nsengupta
Hello Chiwan, Yes, that's an oversight on my part. In my hurry, I didn't even try to explore the source of that /Exception/. Thanks, again. However, I still don't know why I am not being able to read the CSV file. As the output shows, using standard IO routines, I can read the same file anyway.

Reducing parallelism leads to NoResourceAvailableException

2016-04-27 Thread Ken Krugler
Hi all, In trying out different settings for performance, I run into a job failure case that puzzles me. I’d done a run with a parallelism of 20 (-p 20 via CLI), and the job ran successfully, on a cluster with 40 slots. I then tried with -p 15, and it failed with:

Flink Iterations Ordering

2016-04-27 Thread David Kim
Hello all, I read the documentation at [1] on iterations and had a question on whether an assumption is safe to make. As partial solutions are continuously looping through the step function, when new elements are added as iteration inputs will the insertion order of all of the elements be

Re: Getting java.lang.Exception when try to fetch data from Kafka

2016-04-27 Thread prateekarora
Thanks for the response . can you please suggest some link or example to write own DeserializationSchema ? Regards Prateek On Tue, Apr 26, 2016 at 11:06 AM, rmetzger0 [via Apache Flink User Mailing List archive.] wrote: > Hi Prateek, > > sorry for the

Re: Join DataStream with dimension tables?

2016-04-27 Thread Srikanth
Aljoscha, Your thoughts on this? Srikanth On Mon, Apr 25, 2016 at 8:08 PM, Srikanth wrote: > Aljoscha, > > Looks like a potential solution. Feels a bit hacky though. > > Didn't quite understand why a list backed store is used to for static > input buffer? Join(inner)

Re: Need a working example to read/write avro data using FlinkKafkaProducer / Consumer

2016-04-27 Thread Maximilian Michels
Hi Kaniska, I've replied to your mail on the Beam user mailing list. Cheers, Max On Wed, Apr 27, 2016 at 4:56 PM, kaniska Mandal wrote: > I am facing some issues while reading / writing Avro data. > > Attached here the corresponding files and avro-generated pojo. > >

Re: Command line arguments getting munged with CLI?

2016-04-27 Thread Ken Krugler
Hi Timur, Thanks, using ‘--’ seems to work. I’ve filed https://issues.apache.org/jira/browse/FLINK-3838 to allow ‘-’ as well. — Ken > On Apr 27, 2016, at 12:20am, Timur Fayruzov wrote: > > Hi Ken, > > I have

Re: AvroWriter for Rolling sink

2016-04-27 Thread Igor Berman
Hi Aljoscha, avro-mapred jar contains different M/R output formats for avro, and their writers it's primary used in M/R jobs that produce avro output see some details here : https://avro.apache.org/docs/1.7.6/mr.html I have extracted(kind of copy-pasted+adjustments) some of the classes from

Re: Gelly CommunityDetection in scala example

2016-04-27 Thread Vasiliki Kalavri
Hi Trevor, note that the community detection algorithm returns a new graph where the vertex values correspond to the computed communities. Also, note that the current implementation expects a graph with java.lang.Long vertex values and java.lang.Double edge values. The following should work:

Re: Requesting the next InputSplit failed

2016-04-27 Thread Flavio Pompermaier
A precursor of the modified connector (since we started a long time ago). However the idea is the same, I compute the inputSplits and then I get the data split by split (similarly to what it happens in FLINK-3750 - https://github.com/apache/flink/pull/1941 ) Best, Flavio On Wed, Apr 27, 2016 at

Re: Gelly CommunityDetection in scala example

2016-04-27 Thread Suneel Marthi
Recall facing a similar issue while trying to contribute a gelly-scala example to flink-training. See https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/scala/com/dataartisans/flinktraining/exercises/gelly_scala/PageRankWithEdgeWeights.scala On Wed, Apr 27, 2016 at

Re: Requesting the next InputSplit failed

2016-04-27 Thread Chesnay Schepler
Are you using your modified connector or the currently available one? On 27.04.2016 17:35, Flavio Pompermaier wrote: Hi to all, I'm running a Flink Job on a JDBC datasource and I obtain the following exception: java.lang.RuntimeException: Requesting the next InputSplit failed. at

Gelly CommunityDetection in scala example

2016-04-27 Thread Trevor Grant
The following example in the scala shell worked as expected: import org.apache.flink.graph.library.LabelPropagation val verticesWithCommunity = graph.run(new LabelPropagation(30)) // print the result verticesWithCommunity.print I tried to extend the example to use CommunityDetection: import

Requesting the next InputSplit failed

2016-04-27 Thread Flavio Pompermaier
Hi to all, I'm running a Flink Job on a JDBC datasource and I obtain the following exception: java.lang.RuntimeException: Requesting the next InputSplit failed. at org.apache.flink.runtime.taskmanager.TaskInputSplitProvider.getNextInputSplit(TaskInputSplitProvider.java:91) at

Need a working example to read/write avro data using FlinkKafkaProducer / Consumer

2016-04-27 Thread kaniska Mandal
I am facing some issues while reading / writing Avro data. Attached here the corresponding files and avro-generated pojo. Any clues whats wrong here ? May be missing some simple step ! *A) << producer >> BeamKafkaFlinkAvroProducerTest * >> if I use KafkaProducer directly (i.e. call

Re: Simple GraphX example.

2016-04-27 Thread Trevor Grant
Ahh pro-tip. Thanks Aljoscha! Final solution in case any should stumble across this in the future: You have to load the flink-gelly-scala jar AND the flink-gelly jar (to get access to the Edge/Vertex). import org.apache.flink.api.scala._ import org.apache.flink.graph.scala._ import

Re: Flink Queue Scheduling (JobManager)

2016-04-27 Thread Vikram Saxena
Yes, I have tried using separate Yarn queues for this, but I have my doubts. Here is what I am trying to do: I have 2 Flink Jobs JobA : Regular Job running every x minutes. JobB : User requested adhoc job Tried1. Sol A : Have 2 queues on Yarn with 95 : 5 resource distribution Sol B: Have

Re: Simple GraphX example.

2016-04-27 Thread Aljoscha Krettek
Hi, I think you need to import the stuff from org.apache.flink.graph.scala.* instead of org.apache.flink.graph.*. Cheers, Aljoscha On Wed, 27 Apr 2016 at 16:07 Trevor Grant wrote: > Hi, I'm running Flink 1.0.2 from the Zeppelin/shell- trying to experiment > with some

Simple GraphX example.

2016-04-27 Thread Trevor Grant
Hi, I'm running Flink 1.0.2 from the Zeppelin/shell- trying to experiment with some graph stuff. Zeppelin has been known to add degrees of crazy to trouble shooting- but I intuitively feel like this is something I'm doing wrong on the Flink Side. The simplest example is not working for me.

About flink stream table API

2016-04-27 Thread Zhangrucong
Hello everybody: I want to learn the flink stream API. The stream sql is the same with calcite? In the flowing link, the examples of table api are dataset, where I can see the detail introduction of streaming table API.

Re: Flink Queue Scheduling (JobManager)

2016-04-27 Thread Flavio Pompermaier
That would be definitely interesting but I think that at the moment the only way to achieve that is to exploit YARN for that.. An integration with some job-workflow engine (like Apache Oozie and Apache Falcon) would also be very useful! I tried to wrote on Apache Oozie mailing list if there is any

Flink Queue Scheduling (JobManager)

2016-04-27 Thread Vikram Saxena
Hi I am reading and learning about Flink and I have tried to implement some Flink Jobs. In my application I have 2 Flink Jobs which I want to run in parallel. Of course, as I understand I can have the task slots divided so that each one can run concurrently. But, is there a possibility for

Re: Discarding header from CSV file

2016-04-27 Thread Chiwan Park
Hi, You don’t need to call execute() method after calling print() method. print() method triggers the execution. The exception is raised because you call execute() after print() method. Regards, Chiwan Park > On Apr 27, 2016, at 6:35 PM, nsengupta wrote: > >

Re: Discarding header from CSV file

2016-04-27 Thread nsengupta
Till, Thanks for looking into this. I have removed the toList() from the collect() function, to align the code with what I generally do in a Flink application. It throws an Exception, and I can't figure out why. *Here's my code (shortened for brevity):* case class

Re: "No more bytes left" at deserialization

2016-04-27 Thread Till Rohrmann
Hi Timur, could you try to exclude the older kryo dependency from twitter.carbonite via com.twitter carbonite 1.4.0 kryo com.esotericsoftware.kryo and try whether this solves your problem. If your problem should still persist,

Re: Job hangs

2016-04-27 Thread Fabian Hueske
Hi Timur, I had a look at the plan you shared. I could not find any flow that branches and merges again, a pattern which is prone to cause a deadlocks. However, I noticed that the plan performs a lot of partitioning steps. You might want to have a look at forwarded field annotations which can

Re: Tuning parallelism in cascading-flink planner

2016-04-27 Thread Fabian Hueske
Hi Ken, at the moment, there are just two parameters to control the parallelism of Flink operators generated by the Cascading-Flink connector. The parameters are: - flink.num.sourceTasks to specify the parallelism of source tasks. - flink.num.shuffleTasks to specify the parallelism of all

Re: Job hangs

2016-04-27 Thread Vasiliki Kalavri
Hi Timur, I've previously seen large batch jobs hang because of join deadlocks. We should have fixed those problems, but we might have missed some corner case. Did you check whether there was any cpu activity when the job hangs? Can you try running htop on the taskmanager machines and see if

Re: Wildcards with --classpath parameter in CLI

2016-04-27 Thread Ufuk Celebi
+1 :-) On Wed, Apr 27, 2016 at 10:49 AM, Till Rohrmann wrote: > Hi Ken, > > I think it would be a good addition to support wildcards for the classpath > option. That makes life much easier if you want to specify multiple jars to > be included in the classpath. If you want

Re: Wildcards with --classpath parameter in CLI

2016-04-27 Thread Till Rohrmann
Hi Ken, I think it would be a good addition to support wildcards for the classpath option. That makes life much easier if you want to specify multiple jars to be included in the classpath. If you want to take the lead here, then it would be great :-) Cheers, Till On Wed, Apr 27, 2016 at 6:43

Re: Discarding header from CSV file

2016-04-27 Thread Till Rohrmann
Hi Nirmalya, I tried to reproduce your problem but was not successful. For me it worked to read a csv file and file in the values in to case classes. Could you maybe compile an example program with sample input to reproduce your problem? Cheers, Till On Wed, Apr 27, 2016 at 5:51 AM, nsengupta

Re: AvroWriter for Rolling sink

2016-04-27 Thread Aljoscha Krettek
Hi, which code did you reuse from there? I asked Robert and I think it is somewhat problematic to add these somewhat bigger dependencies. Cheers, Aljoscha On Mon, 25 Apr 2016 at 21:24 Igor Berman wrote: > Hi, > it's not a problem, I'll find time to change it(I understand

Re: Flink Client use remote app jar

2016-04-27 Thread Till Rohrmann
At the moment, there is no concrete plan to introduce such a feature, because it cannot be guaranteed that you always have a distributed file system available. But we could maybe add it as a tool which we contribute to flink-contrib. Do you wanna take the lead? Cheers, Till On Wed, Apr 27, 2016

Re: Flink Client use remote app jar

2016-04-27 Thread Theofilos Kakantousis
Hi Till, Thank you for the quick reply. Do you think that would be a useful feature in the future, for the Client to automatically download an job jar from HDFS, or there are no plans to introduce it? Cheers, Theofilos On 2016-04-27 10:42, Till Rohrmann wrote: Hi Theofilos, I'm afraid,

Re: Flink Client use remote app jar

2016-04-27 Thread Till Rohrmann
Hi Theofilos, I'm afraid, but that is currently not possible with Flink. Flink expects the user code jar to be uploaded to its Blob server. That's what the client does prior to submitting the job. You would have to upload the jar with the BlobClient manually if you wanted to circumvent the

Re: Command line arguments getting munged with CLI?

2016-04-27 Thread Timur Fayruzov
Hi Ken, I have built parameter parser in my jar to work with '--' instead of '-' and it works fine (on 1.0.0 and on current master). After a cursory look at parameter parser Flink uses (http://commons.apache.org/proper/commons-cli/) it seems that double vs single dash could make a difference, so

Re: "No more bytes left" at deserialization

2016-04-27 Thread Timur Fayruzov
Hi Ken, Good point actually, thanks for pointing this out. In Flink project I see that there is dependency on 2.24 and then I see transitive dependencies through twitter.carbonite has a dependency on 2.21. Also, twitter.chill that is used to manipulate Kryo as far as I understand, shows up with