Hi,
How can I convert this Python NumPy code into Spark RDD operations so that
they leverage Spark's distributed architecture for Big Data?
The code is as follows -
def gini(array):
    """Calculate the Gini coefficient of a numpy array."""
    array = array.flatten()  # all values are treated equally
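A minimal sketch of how the same computation could look on a Spark RDD, assuming the common sorted-rank form of the Gini formula, G = sum((2i - n - 1) * x_i) / (n * sum(x)) over sorted values x_1..x_n. `gini_rdd` and `gini_local` are hypothetical names; an existing SparkContext and an RDD of non-negative numbers are assumed:

```python
def gini_rdd(rdd):
    """Gini coefficient of an RDD of non-negative numbers (sketch)."""
    n = rdd.count()
    total = rdd.sum()
    # Global sort, then a stable 0-based rank per element
    ranked = rdd.sortBy(lambda x: x).zipWithIndex()
    # Apply the sorted-rank formula: sum((2i - n - 1) * x_i), i = 1..n
    weighted = ranked.map(lambda vi: (2 * (vi[1] + 1) - n - 1) * vi[0]).sum()
    return weighted / (n * total)

def gini_local(values):
    """Same formula on a plain Python list, for checking the RDD version."""
    xs = sorted(values)
    n = len(xs)
    return sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(xs)) / (n * sum(xs))
```

Note that `sortBy` plus `zipWithIndex` triggers a shuffle and an extra job to compute partition sizes, so this is only worthwhile when the array is too large for a single machine.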
Hi
One way to get it is to use the YARN configuration parameter
yarn.nodemanager.container-executor.class.
By default it is
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor;
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor gives you
the user who executes the script.
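For reference, the switch would be made in yarn-site.xml on the NodeManagers along these lines (LinuxContainerExecutor additionally needs container-executor.cfg set up; that part is omitted here):

```xml
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
```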
Br
At least one of their comparisons is flawed.
The Spark ML version of linear regression (*note* they use linear
regression and not logistic regression; it is not clear why) uses L-BFGS as
the solver, not SGD (as MLlib uses). Hence it is typically going to be
slower. However, it should in most cases
Hi Advisers,
When submitting a Spark job in YARN cluster mode, the job is executed by
the "yarn" user. Are there any parameters that can change the user? I tried
setting HADOOP_USER_NAME but it did not work. I'm using Spark 2.2.
Thanks for any help!
While MLlib performed favorably vs Flink, it *also* performed favorably vs
spark.ml .. and by an *order of magnitude*. The following is one of the
tables - it is for Logistic Regression. At that time spark.ml did not yet
support SVM.
From: https://bdataanalytics.biomedcentral.com/articles/10.118
The devices and device messages are retrieved using the APIs provided by
company X (not the company's real name), which owns the IoT network.
There is the option of setting HTTP POST callbacks for device messages, but
we want to be able to run analytics on messages of ALL the devices of the
network
Which device provides messages as thousands of HTTP pages? This is obviously
inefficient, and it will not help much to run them in parallel. Furthermore, with
paging you risk that messages get lost or that you get duplicate messages. I still
do not get why nowadays applications download a lot of data throu
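The duplicate-message risk mentioned above can be sketched concretely: if new messages arrive while you iterate, page boundaries shift and the same message can appear on two pages. Deduplicating by a stable message id is the usual mitigation (it does not help with lost messages, though). `dedupe_pages` and the page shape are made up for illustration:

```python
def dedupe_pages(pages):
    """Merge paged API results, keeping the first occurrence of each message id."""
    seen = set()
    merged = []
    for page in pages:
        for msg in page:
            if msg["id"] not in seen:
                seen.add(msg["id"])
                merged.append(msg)
    return merged

# Example: the page boundary shifted, so message 3 shows up on both pages.
pages = [
    [{"id": 1}, {"id": 2}, {"id": 3}],
    [{"id": 3}, {"id": 4}],
]
# dedupe_pages(pages) keeps ids 1, 2, 3, 4 exactly once each
```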
Hello,
I'm at an IoT company, and I have a use case for which I would like to know
if Apache Spark could be helpful. It's a very broad question, and sorry if
it's long-winded.
We have HTTP GET APIs to get two kinds of information:
1) The Device Messages API returns data about device messages (in
Hi,
Could anyone please share your thoughts on how to kill a Spark Streaming
application gracefully?
I followed the links
http://why-not-learn-something.blogspot.in/2016/05/apache-spark-streaming-how-to-do.html
https://github.com/lanjiang/streamingstopgraceful
I played around with having either p
Hi Susan
In general I can get what I need without Marathon, by configuring the
external shuffle service with puppet/ansible/chef, plus maybe some alerts for
checks.
I mean, in companies that don't have strong DevOps teams and want to install
services as simply as possible, just by config, Marathon might