SystemML Notebook docker image

2016-02-04 Thread Luciano Resende
I started experimenting with some nice ways to enable data scientists to get started with SystemML with the minimum setup and a pleasant user experience. Following the guide published in the SystemML project documentation page [1], I created a docker image containing the necessary infrastructure

Re: Fixed hadoop configuration to run dml on large dataset

2016-02-04 Thread Deron Eriksson
Ethan, thank you for posting the fix to the LZO configuration issue. Deron On Thu, Feb 4, 2016 at 9:45 AM, Ethan Xu wrote: > Thanks to help from the team, we fixed a hadoop classpath configuration so > dml successfully invokes MapReduce jobs. > > I'm carrying the

Re: Compatibility with MR1 Cloudera cdh4.2.1

2016-02-04 Thread Deron Eriksson
Hi Matthias, Glad to hear the fix is simple. Mixing jar versions sometimes is not very fun. Deron On Thu, Feb 4, 2016 at 11:10 PM, Matthias Boehm wrote: > well, let's not mix different hadoop versions in the class path or > client/server. If I'm not mistaken, cdh 4.x

Re: Compatibility with MR1 Cloudera cdh4.2.1

2016-02-04 Thread Shirish Tatikonda
Hi Ethan, The getDouble() method is actually part of org.apache.hadoop.conf.Configuration.java, which is part of hadoop-common but not hadoop-core -- see [1]. It seems it used to be part of hadoop-core a long time ago. Also, the pom.xml in the SystemML project does specify hadoop-common as the

Re: User friendly output of univariate statistics

2016-02-04 Thread Shirish Tatikonda
Just to clarify: the current output is actually a matrix, in which rows denote stats and columns denote input variables. So, the output you see is simply the univariate stats matrix in IJV format. In general, the primary data type for input/output and computations in SystemML is a *matrix
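
For readers unfamiliar with the IJV layout mentioned above, here is a minimal sketch of how (row, column, value) triples expand into a dense matrix whose rows are stats and whose columns are input variables. The function name `ijv_to_dense`, the 1-based indexing, and the sample numbers are assumptions made for illustration; they are not taken from the thread or from actual Univar-Stats output.

```python
def ijv_to_dense(triples, n_rows, n_cols):
    """Expand (i, j, v) triples into a dense row-major matrix.

    Assumes 1-based row and column indices; cells without a triple stay 0.0.
    """
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, v in triples:
        dense[i - 1][j - 1] = v
    return dense


# Hypothetical values: 2 stats (rows) for 2 input variables (columns).
triples = [(1, 1, 2.0), (1, 2, 5.0), (2, 1, 9.0), (2, 2, 7.5)]
for row in ijv_to_dense(triples, n_rows=2, n_cols=2):
    print(row)
```

Reading the result row by row gives one statistic across all input variables, which is the orientation described in the message.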

Compatibility with MR1 Cloudera cdh4.2.1

2016-02-04 Thread Ethan Xu
Hello, I got an error when running the systemML/scripts/Univar-Stats.dml script on a Hadoop cluster (Cloudera CDH4.2.1) on a 6GB data set. The error message is at the bottom of the email. The same script ran fine on a smaller sample (several MB) of the same data set, when MR was not invoked. The

Fixed hadoop configuration to run dml on large dataset

2016-02-04 Thread Ethan Xu
Thanks to help from the team, we fixed a hadoop classpath configuration so dml successfully invokes MapReduce jobs. I'm carrying the discussion here in case other people run into the same problem. Problem description: I was running a simple dml to carry out data transformation on a