Re: XGBoost on DataFlow and Flink

2016-05-21 Thread Henry Saputra
Hi Tianqi, I am also impressed with XGBoost and look forward to its integration with Apache Flink. As for resource requirements, Apache Flink uses the abstraction of slots to express parallel execution in the TaskManager [1]. I suppose you were looking for some DSL or specification configur

Re: XGBoost on DataFlow and Flink

2016-03-14 Thread Tianqi Chen
Thanks! I was not aware of SrcOperator before. Then yes, things can be done. About the multi-threading issue, I am looking for a more principled API to specify the resource requirements, e.g. the slots in this stage need 4 GPU cores and 1 GPU, so the resource allocator can be aware of that. We have publ

Re: XGBoost on DataFlow and Flink

2016-03-14 Thread Till Rohrmann
Hi Tianqi, dmlc looks really cool and it would be great to integrate it with Flink. As far as I understood your requirements, I think that you can already implement most of it on Flink. For example, starting a special container which does not receive any input could be a specialized SourceOperato

Re: XGBoost on DataFlow and Flink

2016-03-12 Thread Simone Robutti
Thanks for the insight; what you're doing is really interesting. I will definitely spend some time looking at DMLC and MXNet.

Re: XGBoost on DataFlow and Flink

2016-03-12 Thread Tianqi Chen
Thanks for the reply. I am writing a long email to answer Simone and clarify what we do. I want to mention that *you can use the library already in Flink*. See the Flink example here: https://github.com/dmlc/xgboost/tree/master/jvm-packages#xgboost-flink I have not run a pressure test on

Re: XGBoost on DataFlow and Flink

2016-03-12 Thread Theodore Vasiloudis
Hello Tianqi, Yes, that definitely sounds interesting for us and we are looking forward to helping out with the implementation. Regards, Theodore

Re: XGBoost on DataFlow and Flink

2016-03-12 Thread Simone Robutti
This is a really interesting approach. The idea of an ML library over DataFlow is probably a winning move and I hope it will stop the proliferation of worthless reimplementations that is taking place in the big data world. Do you think that DataFlow posed specific problems to your work? Does it missi

XGBoost on DataFlow and Flink

2016-03-11 Thread Tianqi Chen
Hi Flink Developers, I am sending this email to let you know about XGBoost4J, a package that we are planning to announce next week. Here is the draft version of the post: https://github.com/dmlc/xgboost/blob/master/doc/jvm/xgboost4j-intro.md In short, XGBoost is a machine learning package t