Hi,
as Timo said, you need a distributed filesystem such as HDFS.
Best regards,
Felix
On Aug 8, 2017 09:01, "P. Ramanjaneya Reddy" wrote:
Hi Timo,
How can I make the files accessible across TaskManagers?
Thanks & Regards,
Ramanji.
On Mon, Aug 7, 2017 at 7:45 PM, Timo Walther wrote:
> Flink is a distrib
Felix Neutatz created FLINK-7029:
Summary: Documentation for WindowFunction is confusing
Key: FLINK-7029
URL: https://issues.apache.org/jira/browse/FLINK-7029
Project: Flink
Issue Type
Great to have you Andrea :)
On Feb 17, 2017 15:21, "Aljoscha Krettek" wrote:
> Welcome to the community, Andrea! :-)
>
> On Fri, 17 Feb 2017 at 10:22 Fabian Hueske wrote:
>
> > Hi Andrea,
> >
> > welcome to the community!
> > I gave you Contributor permissions. You can now assign issues to
> yo
che/mahout/tree/master/viennacl-omp
> > > https://github.com/apache/mahout/tree/master/viennacl
> > >
> > > Best,
> > > tg
> > >
> > >
> > > Trevor Grant
> > > Data Scientist
> > > https://github.com/rawkintrevo
ve seen
> SystemML’s source code and would like to ask: why did you decide to
> implement your own integration with CUDA? Did you consider ND4J, or did
> you build your own implementation because ND4J is younger?
>
> On Tue, Feb 7, 2017 at 18:35, Felix Neutatz :
>
> > Hi Kath
Hi Katherin,
we are also working in a similar direction. We implemented a prototype to
integrate with SystemML:
https://github.com/apache/incubator-systemml/pull/119
SystemML provides many different matrix formats, operations, GPU support
and a couple of DL algorithms. Unfortunately, we realized t
2:00 Till Rohrmann :
>
>> Cool first version Felix :-)
>>
>> On Wed, Aug 10, 2016 at 6:48 PM, Stephan Ewen wrote:
>>
>> > Cool, nice results!
>> >
>> > For the iteration unspecialization - we probably should design this
>> hand in
>>
on - we probably should design this hand
> in
> > hand with the streaming fault tolerance, as they share the notion of
> > "intermediate result versions".
> >
> >
> > On Wed, Aug 10, 2016 at 6:09 PM, Felix Neutatz
> > wrote:
> >
> > > H
iate results
> in
> > iterations, then we have to make FLINK-1713 a blocker for this new issue.
> > But maybe you can also keep the current broadcasting mechanism to be used
> > within iterations only. Then we can address the iteration problem later.
> >
> > Cheers,
>
t also be interesting to have a blocking mode which schedules its
> consumers once the first result is there. Thus, having a mixture of
> pipelined and blocking mode.
>
> Cheers,
> Till
>
> On Tue, Aug 9, 2016 at 4:40 AM, Felix Neutatz
> wrote:
>
> > Hi Stephan,
> >
tings,
> Stephan
>
>
> On Wed, Jul 27, 2016 at 11:33 AM, Felix Neutatz
> wrote:
>
> > Hi Stephan,
> >
> > thanks for the great ideas. First I have some questions:
> >
> > 1.1) Does every task generate an intermediate result partition for every
>
; RecordReader, so the others do not even request the stream. That way, the
> TaskManager should pull only one stream from each producing task, which
> means the data is transferred once.
>
>
> That would also work perfectly with the current failure / recovery model.
>
> What
Hi everybody,
I want to improve the performance of broadcasts in Flink. Till therefore
suggested that I start a FLIP on this topic to discuss how to solve the
current issues with broadcasts.
The problem in a nutshell: instead of sending data to each TaskManager only
once, at the moment the
nQF9q-btBnGvrzzN3lX0Os6rzHcCOjA/edit?usp=sharing
I highly appreciate any idea or comment and I am looking forward to the
discussion to finally solve this issue :)
Best regards,
Felix
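To make the cost of the current behaviour concrete, here is a tiny back-of-the-envelope sketch in plain Java (the TaskManager and slot counts are made-up numbers, not measurements from a real cluster):

```java
public class BroadcastCost {
    public static void main(String[] args) {
        int taskManagers = 4;   // hypothetical cluster size
        int slotsPerTM = 8;     // hypothetical slots per TaskManager
        // current behaviour: every slot receives its own copy of the broadcast set
        int copiesPerSlot = taskManagers * slotsPerTM;
        // desired behaviour: one copy per TaskManager, shared by all its slots
        int copiesPerTM = taskManagers;
        System.out.println(copiesPerSlot); // 32
        System.out.println(copiesPerTM);   // 4
    }
}
```

With these invented numbers, the broadcast set crosses the network 32 times instead of 4, which is the factor-of-slots blow-up the FLIP aims to remove.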
2016-07-08 1:47 GMT+02:00 Felix Neutatz :
> Hi,
>
> I already started to work on this issue. Therefore I created a
weekend,
Felix
P.S. for super curious people:
https://github.com/FelixNeutatz/incubator-flink/commit/7d79d4dfe3f18208a73d6b692b3909f9c69a1da7
2016-06-09 11:50 GMT+02:00 Felix Neutatz :
> Hi everybody,
>
> could we use the org.apache.flink.api.common.cache.DistributedCache to
> work
Felix Neutatz created FLINK-4175:
Summary: Broadcast data sent increases with # slots per TM
Key: FLINK-4175
URL: https://issues.apache.org/jira/browse/FLINK-4175
Project: Flink
Issue Type
Hi everybody,
could we use the org.apache.flink.api.common.cache.DistributedCache to work
around this Broadcast issue for the moment, until we fixed it?
Or do you think it won't scale either?
Best regards,
Felix
2016-06-09 10:57 GMT+02:00 Stephan Ewen :
> Till is right. Broadcast joins currentl
Hi,
I also encountered the EOF exception for a delta iteration with "more
data". With less data it works ...
Best regards,
Felix
On Jul 27, 2015 10:25 AM, "Andra Lungu" wrote:
> Hi Stephan,
>
> I tried to debug a bit around the EOF Exception. It seems that I am pretty
> useless on my own. I h
Hi,
I want to use t-digest by Ted Dunning (
https://github.com/tdunning/t-digest/blob/master/src/main/java/com/tdunning/math/stats/ArrayDigest.java)
on Flink.
Locally that works perfectly. But on the cluster I get the following error:
java.lang.Exception: Call to registerInputOutput() of invokab
ollect the weights, it is executed.
>
> Cheers,
> Till
>
> On Wed, Jul 8, 2015 at 10:52 AM, Felix Neutatz
> wrote:
>
> > Thanks for the information Till :)
> >
> > So at the moment the iteration is the only way.
> >
> > Best regards,
> >
edDataSet as input. What you can
> do is to extract each group from the original DataSet by using a filter
> operation. Once you have done this, you can train the linear model on each
> sub part of the DataSet.
>
> Cheers,
> Till
>
>
> On Wed, Jul 8, 2015 at 10:37 AM,
iple models on partitions of
> my data with mapPartition and the model-parameters (weights) as
> broadcast variable. If I understood broadcast variables in Flink
> correctly, you should end up with one model on each TaskManager.
>
> Does that work?
>
> Felix
>
> Am 07.07.20
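The idea discussed above — one model trained per data partition via mapPartition — can be sketched without Flink in plain Java. The data and the "model" (here just the partition mean, standing in for real weights) are invented for illustration:

```java
import java.util.*;

public class PerPartitionModels {
    public static void main(String[] args) {
        // made-up data, split into two partitions as mapPartition would see it
        List<double[]> partitions = List.of(
            new double[]{1.0, 2.0, 3.0},
            new double[]{4.0, 5.0, 6.0});
        // "train" one model per partition; the mean stands in for real weights
        List<Double> models = new ArrayList<>();
        for (double[] part : partitions) {
            double sum = 0;
            for (double v : part) sum += v;
            models.add(sum / part.length);
        }
        System.out.println(models); // [2.0, 5.0]
    }
}
```

Each partition yields exactly one model, which matches the expectation in the thread that you end up with one model per parallel instance.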
ere anyway besides an iteration how to do this at the moment?
Thanks for your help,
Felix Neutatz
> On Mon, Jul 6, 2015 at 3:50 PM, Felix Neutatz
> wrote:
>
> > So do you know how to solve this issue apart from increasing the current
> > file-max (4748198)?
> >
> > 2015-07-06 15:35 GMT+02:00 Stephan Ewen :
> >
> > > I th
bf-964b-4883-8eee-12869b9476ab/
> 995a38a2c92536383d0057e3482999a9.000329.channel
> (Too many open files in system)
>
>
>
>
> On Mon, Jul 6, 2015 at 3:31 PM, Felix Neutatz
> wrote:
>
> > Hi,
> >
> > I want to do some simple aggregations o
Hi,
I want to do some simple aggregations on 727 gz files (68 GB total) from
HDFS. See code here:
https://github.com/FelixNeutatz/wikiTrends/blob/master/extraction/src/main/scala/io/sanfran/wikiTrends/extraction/flink/Stats.scala
We are using a Flink-0.9 SNAPSHOT.
I get the following error:
Ca
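Reading gzipped text is possible with the JDK alone; a minimal sketch of the kind of line-counting aggregation described above, using a temporary file with invented contents rather than the actual HDFS data:

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.*;

public class GzCount {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("stats", ".gz");
        // write a tiny made-up dataset as a gz file
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(p)))) {
            w.write("a 1\nb 2\nc 3\n");
        }
        long lines;
        // read it back through a GZIPInputStream and count the lines
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(p))))) {
            lines = r.lines().count();
        }
        System.out.println(lines); // 3
        Files.delete(p);
    }
}
```

Flink handles the gz decompression transparently for HDFS inputs, but note that gz files are not splittable, so each of the 727 files is read by a single task.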
Hi everybody,
does anybody know whether there is an implementation of HadoopInputFormats
for matrices in NumPy or .mat format? This would be really useful, since a
lot of machine learning data is stored in these formats.
Thanks for your help,
Felix
Hi,
when I want to access the documentation on the website I get the following
error:
http://ci.apache.org/projects/flink/flink-docs-release-0.9
Service Unavailable
The server is temporarily unable to service your request due to maintenance
downtime or capacity problems. Please try again later.
Hi Ronny,
I agree with you, and I would go even further and generalize it overall, so
that the movieID could be of type Long or Int and the userID of type String.
This would increase the usability of the ALS implementation :)
Best regards,
Felix
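The proposed generalization — making the ID types parameters instead of fixed Ints — can be sketched in plain Java. The `Rating` type and its fields are hypothetical, not the actual FlinkML ALS types:

```java
public class GenericRating {
    // hypothetical generalisation: user and item ID types become type parameters
    record Rating<U, I>(U userID, I itemID, double rating) {}

    public static void main(String[] args) {
        // userID as String, itemID (movieID) as Long
        Rating<String, Long> r = new Rating<>("alice", 42L, 4.5);
        // or both IDs as Int, matching the current behaviour
        Rating<Integer, Integer> s = new Rating<>(7, 99, 3.0);
        System.out.println(r.userID() + " -> " + r.itemID()); // alice -> 42
        System.out.println(s.rating());                       // 3.0
    }
}
```

Any ID type then works as long as it is serializable and usable as a grouping key.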
2015-06-10 11:28 GMT+02:00 Ronny Bräunlich :
> Hello
done: https://issues.apache.org/jira/browse/FLINK-2208
2015-06-12 0:50 GMT+02:00 Ufuk Celebi :
>
> On 12 Jun 2015, at 00:42, Felix Neutatz wrote:
>
> > Yes, it is on a IBM PowerPC machine. So we change that in the
> documentation
> > to all Java 7, 8 (except IBM Java
Felix Neutatz created FLINK-2208:
Summary: Build error for IBM Java
Key: FLINK-2208
URL: https://issues.apache.org/jira/browse/FLINK-2208
Project: Flink
Issue Type: Bug
Components
can be fixed w/o loading the MX beans depending on the JVM.
>
> For your JVM, the classes are located in "com.ibm.lang.management.*" and
> not "com.sun.management.*".
>
> On 12 Jun 2015, at 00:21, Felix Neutatz wrote:
>
> > Hi,
> >
> > the
Hi,
the documentation says: "It [the build of the 0.9 snapshot] works well with
OpenJDK 6 and all Java 7 and 8 compilers."
But I got the following error:
[INFO] --- scala-maven-plugin:3.1.4:compile (scala-compile-first) @
flink-runtime ---
[INFO]
/share/flink/flink-0.9-SNAPSHOT-wo-Yarn/flink-run
Thanks :) works like a charm.
2015-06-10 22:28 GMT+02:00 Fabian Hueske :
> Hi,
>
> use ./bin/flink run -c your.MainClass yourJar to specify the Main class.
> Check the documentation of the CLI client for details.
>
> Cheers, Fabian
> On Jun 10, 2015 22:24, "Felix
Hi,
I try to run this Scala program:
https://github.com/FelixNeutatz/wikiTrends/blob/master/extraction/src/main/scala/io/sanfran/wikiTrends/extraction/flink/DownloadTopKPages.scala
on a cluster.
I tried this command:
/share/flink/flink-0.9-SNAPSHOT/bin/flink run
/home/neutatz/jars/extraction-1.0
> > data, this might be prohibitive expensive. I guess the more efficient
> > > > solution would be to assign an ID and later join with the removed
> > feature
> > > > elements.
> > > >
> > > > Cheers,
> > > >
[SVM, (T, Int),
> (LabeledVector, Int)] value with T <: Vector in the scope where you call
> the predict operation.
> On Jun 6, 2015 8:14 AM, "Felix Neutatz" wrote:
>
> > That would be great. I like the special predict operation better because
> it
> > is only in some
the Vector type.
>
> Cheers,
> Till
> On Jun 4, 2015 7:30 PM, "Felix Neutatz" wrote:
>
> > Hi,
> >
> > I have the following use case: I want to do regression for a timeseries
> > dataset like:
> >
> > id, x1, x2, ..., xn, y
> >
> >
Felix,
> >
> > Passing a JoinHint to your function should help.
> > see:
> >
> >
> http://mail-archives.apache.org/mod_mbox/flink-user/201504.mbox/%3ccanc1h_vffbqyyiktzcdpihn09r4he4oluiursjnci_rwc+c...@mail.gmail.com%3E
> >
> > Cheers,
> > And
k-user/201504.mbox/%3ccanc1h_vffbqyyiktzcdpihn09r4he4oluiursjnci_rwc+c...@mail.gmail.com%3E
>
> Cheers,
> Andra
>
> On Thu, Jun 4, 2015 at 7:07 PM, Felix Neutatz
> wrote:
>
> > after bug fix:
> >
> > for 100 blocks and standard jvm heap space
> >
> > Caused by: java.lang.RuntimeExce
Hi,
I have the following use case: I want to do regression for a timeseries
dataset like:
id, x1, x2, ..., xn, y
id = point in time
x = features
y = target value
In the Flink frame work I would map this to a LabeledVector (y,
DenseVector(x)). (I don't want to use the id as a feature)
When I ap
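The row-to-LabeledVector mapping described above (keep y as the label, keep x1..xn as features, drop the id) can be sketched in plain Java with an invented row:

```java
import java.util.Arrays;

public class RowToLabeledVector {
    public static void main(String[] args) {
        // one made-up timeseries row: id, x1, x2, x3, y
        double[] row = {7.0, 0.5, 2.0, 1.5, 3.5};
        double label = row[row.length - 1];                             // y
        double[] features = Arrays.copyOfRange(row, 1, row.length - 1); // drop id and y
        System.out.println(label);                     // 3.5
        System.out.println(Arrays.toString(features)); // [0.5, 2.0, 1.5]
    }
}
```

In the Flink program this would be the body of the map from the parsed CSV row to `LabeledVector(y, DenseVector(x))`.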
org.apache.flink.runtime.operators.hash.HashPartition.spillPartition(HashPartition.java:310)
...
Best regards,
Felix
2015-06-04 10:19 GMT+02:00 Felix Neutatz :
> Yes, I will try it again with the newest update :)
>
> 2015-06-04 10:17 GMT+02:00 Till Rohrmann :
>
>> If the first error is not fixed by Chiwans PR, then
fixed by your recent change?
> > >
> > > @felix: if yes, can you try the 2nd run again with the changes?
> > >
> > > On Thursday, June 4, 2015, Felix Neutatz
> wrote:
> > >
> > >> Hi,
> > >>
> > >> I playe
Hi,
I played a bit with the ALS recommender algorithm. I used the movielens
dataset: http://files.grouplens.org/datasets/movielens/ml-latest-README.html
The rating matrix has 21.063.128 entries (ratings).
I run the algorithm with 3 configurations:
1. standard jvm heap space:
val als = ALS()
. SGD is very sensitive to the right step size
> > choice.
> > > > If the step size is too high, then the SGD algorithm does not
> converge.
> > > You
> > > > can find the parameter description here [1].
> > > >
> > > > Cheers,
> > > > T
Hi,
I want to use MultipleLinearRegression, but I got really strange results.
So I tested it with the housing price dataset:
http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data
And here I get negative house prices - even when I use the training set as
dataset:
LabeledVec
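As the thread later notes, SGD diverges when the step size is too large, which can produce nonsense such as negative house prices even on the training set. A useful sanity check is a closed-form fit on toy data, sketched here in plain Java (the data is invented, with true relation y = 2x, and the formula is ordinary least squares for one feature):

```java
public class OlsBaseline {
    public static void main(String[] args) {
        // toy data where the true relation is y = 2x
        double[] x = {1, 2, 3, 4}, y = {2, 4, 6, 8};
        double xm = 0, ym = 0;
        for (int i = 0; i < x.length; i++) { xm += x[i]; ym += y[i]; }
        xm /= x.length; ym /= y.length;
        // slope b = cov(x, y) / var(x), intercept a = ym - b * xm
        double cov = 0, var = 0;
        for (int i = 0; i < x.length; i++) {
            cov += (x[i] - xm) * (y[i] - ym);
            var += (x[i] - xm) * (x[i] - xm);
        }
        double b = cov / var, a = ym - b * xm;
        System.out.println(a + b * 3.0); // 6.0 -- a sane, non-negative prediction
    }
}
```

If a baseline like this fits the data well but the SGD-based MultipleLinearRegression does not, a smaller step size (or feature scaling) is the first thing to try.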
Hi Ufuk,
I really like the idea of redesigning the start page. But in my opinion
your page design looks more like a documentation page than a start page.
Personally, I like the current design better, since you get a really quick
overview with many fancy pictures. (So if you wanna
size settings.
> Do you think it is safe to keep defaults?
>
> On Fri, Apr 24, 2015 at 11:19 PM, Flavio Pompermaier >
> wrote:
>
> > Thanks Felix,
> > Thanks for the response!
> > I'm looking forward to use it!
> > On Apr 24, 2015 9:01
Felix Neutatz created FLINK-1939:
Summary: Add Parquet Documentation to Wiki
Key: FLINK-1939
URL: https://issues.apache.org/jira/browse/FLINK-1939
Project: Flink
Issue Type: Task
> > > wrote:
> > > >
> > > > > Very nice article!
> > > > > How about adding the full article to the wiki and having a shorter
> > > > version
> > > > > as a blog post (with a link to the wiki)?
> > > > > Adding the co
be very expensive. I will
> > add a jira ticket for adding a manual cross operation to the Table
> > API.
> >
> > On Thu, Apr 16, 2015 at 2:28 PM, Felix Neutatz
> > wrote:
> > > Hi,
> > >
> > > I want to join two tables in the following wa
Hi,
I want to join two tables in the following way:
case class WeightedEdge(src: Int, target: Int, weight: Double)
case class Community(communityID: Int, nodeID: Int)
case class CommunitySumTotal(communityID: Int, sumTotal: Double)
val communities: DataSet[Community]
val weightedEdges: DataSet[
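The snippet is cut off, but judging from the case classes the goal appears to be summing edge weights per community after joining edges with community assignments. A plain-Java sketch of that join, with invented community assignments and edge weights:

```java
import java.util.*;

public class CommunityWeights {
    record WeightedEdge(int src, int target, double weight) {}

    public static void main(String[] args) {
        // made-up Community pairs: nodeID -> communityID
        Map<Integer, Integer> nodeToCommunity = Map.of(1, 10, 2, 10, 3, 20);
        List<WeightedEdge> edges = List.of(
            new WeightedEdge(1, 2, 0.5),
            new WeightedEdge(2, 3, 1.5),
            new WeightedEdge(3, 1, 2.0));
        // join each edge with its source node's community and sum the weights
        Map<Integer, Double> sumTotal = new HashMap<>();
        for (WeightedEdge e : edges)
            sumTotal.merge(nodeToCommunity.get(e.src()), e.weight(), Double::sum);
        System.out.println(sumTotal.get(10)); // 2.0
        System.out.println(sumTotal.get(20)); // 2.0
    }
}
```

In the Table API this corresponds to a join on `nodeID === src` followed by a groupBy on `communityID` with a sum over `weight`.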
Felix Neutatz created FLINK-1899:
Summary: Table API Bug
Key: FLINK-1899
URL: https://issues.apache.org/jira/browse/FLINK-1899
Project: Flink
Issue Type: Bug
Components: Expression
@Till: Yes, it works without the parentheses :) Thanks :)
2015-04-14 16:52 GMT+02:00 Felix Neutatz :
> I don't know. I can only see the following:
>
> def collect : scala.collection.mutable.Buffer[T] = { /* compiled code */ }
>
> When do they update the latest snapshot on Mav
et.scala
> file? Does it contain parentheses or not?
>
> On Tue, Apr 14, 2015 at 3:48 PM, Felix Neutatz
> wrote:
>
> > I use the latest maven snapshot:
> >
> >
> > <dependency>
> >   <groupId>org.apache.flink</groupId>
> >   <artifactId>flink-scala</artifactId>
> >   <version>0.9-SNAPSHOT</version>
> > </dependency>
> >
I use the latest maven snapshot:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-scala</artifactId>
  <version>0.9-SNAPSHOT</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients</artifactId>
  <version>0.9-SNAPSHOT</version>
</dependency>
2015-04-14 15:45 GMT+02:00 Robert Metzger :
> Hi,
>
> which version of Flink are you using?
>
> On Tue, Apr 14, 2015 at 3:36 PM, Felix N
Hi,
I want to run the following example:
import org.apache.flink.api.scala._
case class EdgeType(src: Int, target: Int)
object Test {
def main(args: Array[String]) {
implicit val env = ExecutionEnvironment.getExecutionEnvironment
val graphEdges = readEdges("edges.csv")
gr
" ?
>
> Stephan
>
>
> On Sun, Apr 5, 2015 at 3:30 PM, Felix Neutatz
> wrote:
>
> > Hi everybody,
> >
> > I am working currently on a tutorial/article about how/when/why to use
> > Parquet on Flink.
> >
> > You can find the pdf version h
Hi everybody,
I am working currently on a tutorial/article about how/when/why to use
Parquet on Flink.
You can find the pdf version here:
https://github.com/FelixNeutatz/parquet-flinktacular/blob/master/tutorial/parquet_flinktacular.pdf
The git repository with all the code examples can be found
Felix Neutatz created FLINK-1820:
Summary: Bug in DoubleParser and FloatParser - empty String is not
casted to 0
Key: FLINK-1820
URL: https://issues.apache.org/jira/browse/FLINK-1820
Project: Flink
Hi,
I have run the following program:
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
List<Tuple1<Long>> l = Arrays.asList(new Tuple1<Long>(1L));
TypeInformation t = TypeInfoParser.parse("Tuple1<Long>");
DataSet<Tuple1<Long>> data = env.fromCollection(l, t);
long value = data.count();
System.out.prin
Hi Karim,
you can use a Hadoop Input Format and read the files
using flink-hadoop-compatibility classes like here:
http://flink.apache.org/docs/0.7-incubating/hadoop_compatibility.html
Have a nice Sunday,
Felix
2015-02-22 10:02 GMT+01:00 Karim Alaa :
> Hi All,
>
> I’m currently working with Fl
+1
+ Built from source
+ tested some Protobuf examples
+ tested some Parquet examples
Best regards,
Felix
2015-02-10 18:28 GMT+01:00 Márton Balassi :
> +1
>
> Built from source
> Run local examples
> Checked version numbers in the poms
> Validated check sums and signatures
>
> Minor cosmetic
b2527
>
> I would like to merge this to the "release-0.8" branch as well:
> https://github.com/apache/flink/pull/376
>
> After that, all important changes are in the branch. I'm going to create a
> release candidate once #376 is merged.
>
>
> On Mon, Feb 9, 2
Yes, that would be great :)
2015-02-09 14:37 GMT+01:00 Robert Metzger :
> Yes.
>
> Do you mean this? https://github.com/apache/flink/pull/287
>
> I guess you would like to have this for the Parquet blog post? If so, we
> can merge it in my opinion.
>
>
> On Mon, Fe
@Robert, does this also include the Kryo Protobuf support? If yes, we could
also ship my changes to the HadoopInputFormat :)
On Feb 9, 2015 14:27, "Robert Metzger" wrote:
> Cool.
>
> I'm currently also testing my last change (kryo serializers). I think I'll
> start creating the release candidate
rator.
>
> 2015-01-21 20:22 GMT+01:00 Chesnay Schepler >:
>
> > If I remember correctly, first() returns the first n values for every
> > group. The javadocs actually don't make this behaviour very clear.
> >
> >
> > On 21.01.2015 19:18, Felix Neutatz
Hi,
my use case is the following:
I have a Tuple2<String, Long>. I want to group by the String and sum up the
Long values accordingly. This works fine with these lines:
DataSet lineitems = getLineitemDataSet(env);
lineitems.project(new int []{3,0}).groupBy(0).aggregate(Aggregations.SUM,
1);
After the aggrega
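The group-and-sum semantics of the `groupBy(0).aggregate(Aggregations.SUM, 1)` call above can be sketched with plain Java streams, using invented (String, Long) pairs in place of the projected lineitem fields:

```java
import java.util.*;
import java.util.stream.*;

public class GroupSum {
    public static void main(String[] args) {
        // made-up (String, Long) pairs, analogous to the projected fields 3 and 0
        List<Map.Entry<String, Long>> lineitems = List.of(
            Map.entry("A", 2L), Map.entry("B", 5L), Map.entry("A", 3L));
        // group by the String key and sum the Long values,
        // mirroring groupBy(0).aggregate(Aggregations.SUM, 1)
        Map<String, Long> sums = lineitems.stream()
            .collect(Collectors.groupingBy(Map.Entry::getKey,
                     Collectors.summingLong(Map.Entry::getValue)));
        System.out.println(sums.get("A")); // 5
        System.out.println(sums.get("B")); // 5
    }
}
```

Each distinct String key ends up with exactly one summed Long, which is the per-group result the aggregation produces.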
Felix Neutatz created FLINK-1428:
Summary: Typos in Java code example for RichGroupReduceFunction
Key: FLINK-1428
URL: https://issues.apache.org/jira/browse/FLINK-1428
Project: Flink
Issue
Felix Neutatz created FLINK-1412:
Summary: Update Website Links - Optimizer Plan Visualization Tool
Key: FLINK-1412
URL: https://issues.apache.org/jira/browse/FLINK-1412
Project: Flink
Issue
Hi,
is there any example which shows how I can load several files with
different Hadoop input formats at once? My use case is that I want to load
two tables (in Parquet format) via Hadoop and join them within Flink.
Best regards,
Felix
Felix Neutatz created FLINK-1398:
Summary: A new DataSet function: extractElementFromTuple
Key: FLINK-1398
URL: https://issues.apache.org/jira/browse/FLINK-1398
Project: Flink
Issue Type
Felix Neutatz created FLINK-1393:
Summary: Serializing Protobuf - issue 2
Key: FLINK-1393
URL: https://issues.apache.org/jira/browse/FLINK-1393
Project: Flink
Issue Type: Bug
Felix Neutatz created FLINK-1392:
Summary: Serializing Protobuf - issue 1
Key: FLINK-1392
URL: https://issues.apache.org/jira/browse/FLINK-1392
Project: Flink
Issue Type: Bug
Felix Neutatz created FLINK-1382:
Summary: Void is not added to TypeInfoParser
Key: FLINK-1382
URL: https://issues.apache.org/jira/browse/FLINK-1382
Project: Flink
Issue Type: Bug