Testing library for Flink

2020-01-20 Thread Juan Rodríguez Hortalá
Hi all, Recently, my colleagues at Complutense University of Madrid and I developed a testing library for Flink. The library extends ScalaCheck to allow you to specify random generators of streams using temporal logic. You can also write assertions as temporal logic formulas. If you are

Re: Best way to compute the difference between 2 datasets

2019-09-16 Thread Juan Rodríguez Hortalá
> groups first, which typically means spilling to disk if the data set has > any significant size. > > — Ken > > PS - I assume that you’ve implemented a valid hashCode()/equals() for the > record. > > > On Jul 22, 2019, at 8:29 AM, Juan Rodríguez Hortalá < > juan.

Re: Re: MiniClusterResource class not found using AbstractTestBase

2019-07-23 Thread Juan Rodríguez Hortalá
% flinkVersion % Test classifier > "tests" > ) > > Best, > Haibo > > At 2019-07-23 17:51:23, "Fabian Hueske" wrote: > > Hi Juan, > > Which Flink version do you use? > > Best, Fabian > > Am Di., 23. Juli 2019 um 06:49 Uhr schri

Re: Execution environments for testing: local vs collection vs mini cluster

2019-07-23 Thread Juan Rodríguez Hortalá
t? > > 1. Do you want to write a unit test (or integration test) case for your > project or for Flink? Or just want to run your job locally? > 2. Which mode do you want to test? DataStream or DataSet? > > > > On Tue, Jul 23, 2019 at 1:12 PM, Juan Rodríguez Hortalá > wrote: > >>

Execution environments for testing: local vs collection vs mini cluster

2019-07-22 Thread Juan Rodríguez Hortalá
Hi, In https://ci.apache.org/projects/flink/flink-docs-stable/dev/local_execution.html and https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/runtime/minicluster/MiniCluster.html I see there are 3 ways to create an execution environment for testing: -

MiniClusterResource class not found using AbstractTestBase

2019-07-22 Thread Juan Rodríguez Hortalá
Hi, I'm trying to use AbstractTestBase in a test in order to use the mini cluster. I'm using specs2 with Scala, so I cannot extend AbstractTestBase because I also have to extend org.specs2.Specification, so I'm trying to access the mini cluster directly using Specs2 BeforeAll to initialize it as

Best way to compute the difference between 2 datasets

2019-07-22 Thread Juan Rodríguez Hortalá
Hi, I've been trying to write a function to compute the difference between 2 datasets. By that I mean computing a dataset that has all the elements of a dataset that are not present in another dataset. I first tried using coGroup, but it was very slow in a local execution environment, and
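To make the intended semantics concrete, here is a minimal sketch in plain Java, assuming both datasets fit in memory as ordinary lists. It does not use any Flink API; the class name DatasetDifference and the method difference are hypothetical names chosen for illustration. The hash-based lookup only behaves correctly if the record type has consistent hashCode()/equals() implementations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DatasetDifference {

    /**
     * Returns every element of left that does not occur in right,
     * keeping the order (and any duplicates) of left.
     * Hypothetical helper for illustration; not part of Flink.
     */
    public static <T> List<T> difference(List<T> left, List<T> right) {
        // Hash-based membership test: relies on T.hashCode()/equals().
        Set<T> exclude = new HashSet<>(right);
        List<T> result = new ArrayList<>();
        for (T element : left) {
            if (!exclude.contains(element)) {
                result.add(element);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, 4, 2);
        List<Integer> b = Arrays.asList(2, 4);
        System.out.println(difference(a, b)); // prints [1, 3]
    }
}
```

In a distributed setting the same idea has to be expressed with the framework's own grouping or join operators, which is where the performance question in this thread comes from.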

Re: Exceptions when launching counts on a Flink DataSet concurrently

2019-05-02 Thread Juan Rodríguez Hortalá
3Cdev.flink.apache.org%3E > > On Fri, 26 Apr 2019 at 17:03, Juan Rodríguez Hortalá < > juan.rodriguez.hort...@gmail.com> wrote: > >> Hi Timo, >> >> Thanks for your answer. I was surprised to have problems calling those >> methods concurrently, be

Re: Exceptions when launching counts on a Flink DataSet concurrently

2019-04-26 Thread Juan Rodríguez Hortalá
ht know more about the internals of the execution? > > Regards, > Timo > > > On 26.04.19 at 03:13, Juan Rodríguez Hortalá wrote: > > Any thoughts on this? > > On Sun, Apr 7, 2019, 6:56 PM Juan Rodríguez Hortalá < > juan.rodriguez.hort...@gmail.com> wrote: > >>

Re: Exceptions when launching counts on a Flink DataSet concurrently

2019-04-25 Thread Juan Rodríguez Hortalá
Any thoughts on this? On Sun, Apr 7, 2019, 6:56 PM Juan Rodríguez Hortalá < juan.rodriguez.hort...@gmail.com> wrote: > Hi, > > I have a very simple program using the local execution environment, that > throws NPE and other exceptions related to concurrent access when l

Exceptions when launching counts on a Flink DataSet concurrently

2019-04-07 Thread Juan Rodríguez Hortalá
Hi, I have a very simple program using the local execution environment, that throws NPE and other exceptions related to concurrent access when launching a count for a DataSet from different threads. The program is https://gist.github.com/juanrh/685a89039e866c1067a6efbfc22c753e which is basically

Re: IterativeStream seems to ignore maxWaitTimeMillis

2016-11-23 Thread Juan Rodríguez Hortalá
ck what happens. > > On Wed, 23 Nov 2016 at 05:08 Juan Rodríguez Hortalá < > juan.rodriguez.hort...@gmail.com> wrote: > >> Thanks for your answer Aljoscha, >> >> The source stops, when I comment all the transformed streams and just >> print the input,

Re: IterativeStream seems to ignore maxWaitTimeMillis

2016-11-22 Thread Juan Rodríguez Hortalá
2016 at 07:58 Juan Rodríguez Hortalá < > juan.rodriguez.hort...@gmail.com> wrote: > >> Hi, >> >> I wrote a proof of concept for a Java version of mapWithState with >> time-based state eviction >> https://github.com/juanrh/flink-state-eviction/blob/a6bb0d4ca

Re: Early events

2016-11-21 Thread Juan Rodríguez Hortalá
> On Sun, 20 Nov 2016 at 06:18 Juan Rodríguez Hortalá < > juan.rodriguez.hort...@gmail.com> wrote: > >> Hi, >> >> There was a bug in my code, I was assigning the timestamps wrong and that >> is why it looked like early events were assigned processing time. >>

IterativeStream seems to ignore maxWaitTimeMillis

2016-11-20 Thread Juan Rodríguez Hortalá
Hi, I wrote a proof of concept for a Java version of mapWithState with time-based state eviction https://github.com/juanrh/flink-state-eviction/blob/a6bb0d4ca0908d2f4350209a4a41e381e99c76c5/src/main/java/com/github/juanrh/streaming/MapWithStateIterPoC.java. The idea is: - Convert an input

Failure to download flink-contrib dependency

2016-11-20 Thread Juan Rodríguez Hortalá
Hi, I'm having problems downloading flink-contrib in my Java Maven project. The relevant part of the pom is: UTF-8 1.1.3 org.apache.flink flink-contrib ${flink.version} I see that in https://repo1.maven.org/maven2/org/apache/flink/flink-contrib/1.1.3/ there are no jar

Re: Early events

2016-11-19 Thread Juan Rodríguez Hortalá
/293fe1cf972b2e4bc6fb4e874eb8ba70c78f7894/src/test/java/com/github/juanrh/streaming/source/EventTimeDelayedElementsSourceTest.java ) Anyway, is this the expected behaviour for early events? Is Flink buffering early events until their future timestamp arrives? Thanks, Juan On Sat, Nov 19, 2016 at 8:31 PM, Juan Rodríguez

Early events

2016-11-19 Thread Juan Rodríguez Hortalá
Hi, Maybe this is already in the documentation, sorry if I'm asking something obvious. I was thinking that if you have event time then you can also have early events, which would be events whose extracted timestamp is in the future. This might happen in practice for example in sensors with a

An idea for a parallel AllWindowedStream

2016-11-08 Thread Juan Rodríguez Hortalá
Hi, As a self-training exercise I've defined a class extending WindowedStream to implement a proof of concept for a parallel version of AllWindowedStream /** * Tries to create a parallel version of an AllWindowedStream for a DataStream * by creating a KeyedStream by using as key the hash of the

Re: Testing DataStreams

2016-11-04 Thread Juan Rodríguez Hortalá
source code which contains many such tests. > > -Max > > > On Wed, Nov 2, 2016 at 4:58 PM, Juan Rodríguez Hortalá > <juan.rodriguez.hort...@gmail.com> wrote: > > Hi, > > > > I'm new to Flink, and I'm trying to write my first unit test for a > simple > >

Testing DataStreams

2016-11-02 Thread Juan Rodríguez Hortalá
Hi, I'm new to Flink, and I'm trying to write my first unit test for a simple DataStreams job. In https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/util/package-summary.html I see several promising classes, but for example I cannot import

Re: About stateful transformations

2016-10-27 Thread Juan Rodríguez Hortalá
ints can be > kept. > > Cheers, > Aljoscha > > On Tue, 25 Oct 2016 at 06:47 Juan Rodríguez Hortalá < > juan.rodriguez.hort...@gmail.com> wrote: > >> Hi Gyula, >> >> Thanks a lot for your response, it was very clear. I understand that >> there is no prob

Re: About stateful transformations

2016-10-24 Thread Juan Rodríguez Hortalá
nt directory. This means there is no data > fragmentation in the checkpoints. Similar applies to the FsStateBackend but > that keeps the local state strictly in memory. > > I think you should definitely give RocksDB + HDFS a try. It works > extremely well for very large state sizes g

About stateful transformations

2016-10-23 Thread Juan Rodríguez Hortalá
Hi all, I don't have much experience with Flink, so please forgive me if I ask some obvious questions. I was taking a look at the documentation on stateful transformations in Flink at https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/state.html. I'm mostly interested in Flink for

Re: How to create a stream of data batches

2015-09-07 Thread Juan Rodríguez Hortalá
Hi, I'm just a Flink newbie, but maybe I'd suggest using window operators with a Count policy for that https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/streaming_guide.html#window-operators Hope that helps. Greetings, Juan 2015-09-04 14:14 GMT+02:00 Stephan Ewen

Re: Hardware requirements and learning resources

2015-09-02 Thread Juan Rodríguez Hortalá
apply to the batch APIs; >>> the streaming API in Flink follows a true streaming paradigm, where you get >>> an unbounded stream of records and operators on these streams. >>> >>> Funny that you ask about a video for the DataStream slides. There is a >>>

Re: Hardware requirements and learning resources

2015-09-02 Thread Juan Rodríguez Hortalá
, and there are more lessons at http://dataartisans.github.io/flink-training, for stream processing and the table API for which I haven't found a video. Does anyone have pointers to the missing videos? Greetings, Juan 2015-09-02 12:50 GMT+02:00 Juan Rodríguez Hortalá < juan.rodriguez.hort...@gmail.