Re: Example - Reading Avro Generic records

2016-04-07 Thread Sourigna Phetsarath
was write the PR. -Gna On Thu, Apr 7, 2016 at 11:08 AM, Sourigna Phetsarath < gna.phetsar...@teamaol.com> wrote: > Tranadeep, > > Thanks for pasting your code! > > I have a PR ready that extends AvroInputFormat and will submit it soon. > > Still waiting for the l

Re: Example - Reading Avro Generic records

2016-04-07 Thread Sourigna Phetsarath
blic Tuple2<Long,String> map(GenericRecord record) { > Long id = (Long) record.get("id"); > String someString = record.get("somestring").toString(); > return new Tuple2<>(id, someString); > } > }).wr

Re: Example - Reading Avro Generic records

2016-04-01 Thread Sourigna Phetsarath
There is a way yet, but I am proposing to do one: https://issues.apache.org/jira/browse/FLINK-3691 On Fri, Apr 1, 2016 at 4:04 AM, Tarandeep Singh wrote: > Hi, > > Can someone please point me to an example of creating DataSet using Avro > Generic Records? > > I tried this

Re: Example - Reading Avro Generic records

2016-04-01 Thread Sourigna Phetsarath
Tarandeep, There isn't a way yet, but I am proposing to do one: https://issues.apache.org/jira/browse/FLINK-3691 -Gna On Fri, Apr 1, 2016 at 4:04 AM, Tarandeep Singh wrote: > Hi, > > Can someone please point me to an example of creating DataSet using Avro > Generic

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Sourigna Phetsarath
ally need a quick >> and dirty solution, it's not that hard to serialize the model into a file. >> >> 2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath < >> gna.phetsar...@teamaol.com>: >> >>> Flinksters, >>> >>> Is there an example of sa

Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-28 Thread Sourigna Phetsarath
Flinksters, Is there an example of saving a Trained Model, loading a Trained Model and then scoring one or more feature vectors using Flink ML? All of the examples I've seen have shown only sequential fit and predict. Thank you. -Gna -- *Gna Phetsarath*System Architect // AOL Platforms //

Re: DataSet.randomSplit()

2016-03-28 Thread Sourigna Phetsarath
rticular method is not part of Flink yet. > > Did you have a look at the sampling methods in DataSetUtils? Maybe they > can be helpful for what you are trying to achieve. > > – Ufuk > > On Wed, Mar 23, 2016 at 5:19 PM, Sourigna Phetsarath < > gna.phetsar...@teamaol.com> w

DataSet.randomSplit()

2016-03-23 Thread Sourigna Phetsarath
All: Does Flink DataSet have a randomSplit(weights:Array[Double], seed: Long): Array[DataSet[T]] function? There is this pull request: https://github.com/apache/flink/pull/921 Does anyone have an update of the progress of this? Thank you. -- *Gna Phetsarath*System Architect // AOL

Re: Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-23 Thread Sourigna Phetsarath
//flink.apache.org/contribute-code.html > http://flink.apache.org/how-to-contribute.html > > On Wed, Mar 23, 2016 at 12:00 AM, Fabian Hueske <fhue...@gmail.com> wrote: > >> Hi Gna, >> >> thanks for sharing the good news and opening the JIRA! >> >> C

Re: Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-22 Thread Sourigna Phetsarath
-Gna On Mon, Mar 21, 2016 at 10:04 AM, Sourigna Phetsarath < gna.phetsar...@teamaol.com> wrote: > Fabian, > > I'll try extending InputFormat as you suggested and will create a JIRA > issue as well. > > I also have an AvroGenericRecordInput format class that I would like to &

Re: Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-21 Thread Sourigna Phetsarath
InputFormat. >> Would you mind opening a JIRA issue with your suggestions? >> >> Until this is added to Flink, it can be implemented as a custom >> InputFormat based on FileInputFormat by overriding the createInputSplits() >> method. >> >> Best, Fabian >

Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-20 Thread Sourigna Phetsarath
All, Do any of the Flink Data Sources support comma separated directories with wildcards? For example: env.readFile("/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/* ") Thanks in advance for any help that you can provide. -- *Gna Phetsarath*System Architect // AOL Platforms

Setting taskmanager.network.numberOfBuffers does not seem to have an affect - Flink 0.10.2

2016-03-19 Thread Sourigna Phetsarath
All: Flink Version 0.10.2 The number that I set for *taskmanager.network.numberOfBuffers* doesn't seem to have any affect, even if I set it to a very high number. There might be a race condition here where the upper bound is not enforced or computer correctly. java.io.IOException: Insufficient

Re: S3 Timeouts with lots of Files Using Flink 0.10.2

2016-03-19 Thread Sourigna Phetsarath
ection setup timeout in milliseconds. > > > > fs.s3a.connection.timeout > 5 > Socket connection timeout in milliseconds. > > > > -- Ken > > > -- > > *From:* Sourigna Phetsarath > > *Sent:* March 17, 2016 2:05:39pm PDT > > *To:* user@flink

Setting taskmanager.network.numberOfBuffers and getting errors...

2016-03-03 Thread Sourigna Phetsarath
All: I'm running a Flink 0.10.2 App by submitting to YARN as an application. I'm using an AWS EMR cluster of 1 Master and 10 d2.8xlarge. When I submit the job using: bin/flink run \ -m yarn-cluster \ -yjm 20480 \ -yn 10 \ -ytm 80960 \ -ys 36 \ -yD

FlinkML 0.10.1 - Using SparseVectors with MLR does not work

2016-02-03 Thread Sourigna Phetsarath
All: I'm trying to use SparseVectors with FlinkML 0.10.1. It does not seem to be working. Here is a UnitTest that I created to recreate the problem: *package* com.aol.ds.arc.ml.poc.flink > *import* org.junit.After > *import* org.junit.Before > *import* org.slf4j.LoggerFactory > *import*