Re: Specify "rows_in_block" and "cols_in_block" when writing out matrix

2017-05-08 Thread Matthias Boehm
you can copy SystemML-config.xml.template from ./conf to SystemML-config.xml (needs to be in the same directory as your SystemML.jar or explicitly referenced via -conf) and set defaultblocksize to your custom configuration. Regards, Matthias On Mon, May 8, 2017 at 5:26 PM, Mingyang Wang wrote:

Re: Sparse Matrix Storage Consumption Issue

2017-05-08 Thread Matthias Boehm
at 3:09 PM, Matthias Boehm wrote: > ok thanks for sharing - I'll have a look later this week. > > Regards, > Matthias > > On Mon, May 8, 2017 at 2:20 PM, Mingyang Wang wrote: > >> Hi Matthias, >> >> With a driver memory of 10GB, all operations were

Re: Sparse Matrix Storage Consumption Issue

2017-05-08 Thread Matthias Boehm
so far) > 17/05/08 13:20:20 INFO ExternalSorter: Thread 116 spilling in-memory > map of 31.2 GB to disk (1 time so far) > > ... > > 17/05/08 13:24:50 INFO ExternalAppendOnlyMap: Thread 116 spilling > in-memory map of 26.9 GB to disk (1 time so far) > 17/05/08 13:25:08 I

Re: Sparse Matrix Storage Consumption Issue

2017-05-06 Thread Matthias Boehm
k+ 92.597 sec 1 > > -- 2) sp_chkpoint 0.377 sec 1 > > -- 3) == 0.001 sec 1 > > -- 4) print 0.000 sec 1 > > -- 5) + 0.000 sec 1 > > -- 6) castdts 0.000 sec 1 > > -- 7) createvar 0.000 sec 3 > > -- 8) rmvar 0.000 sec 7 > > -- 9) assignvar 0.000 sec 1 > >

Re: Sparse Matrix Storage Consumption Issue

2017-05-03 Thread Matthias Boehm
to summarize, this was an issue of selecting serialized representations for large ultra-sparse matrices. Thanks again for sharing your feedback with us. 1) In-memory representation: In CSR every non-zero will require 12 bytes - this is 240MB in your case. The overall memory consumption, howeve
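The 12-bytes-per-non-zero figure can be checked with a little arithmetic. The sketch below is only an illustration: the split into an 8-byte double value plus a 4-byte column index, the decimal megabyte, and the non-zero count of 20 million (back-computed from 240MB / 12 bytes) are assumptions, not numbers stated in the thread.

```python
def csr_nonzero_bytes(nnz, value_bytes=8, index_bytes=4):
    """Bytes needed by the per-non-zero arrays (values + column indexes) of a CSR matrix."""
    return nnz * (value_bytes + index_bytes)


def csr_total_bytes(nnz, rows, pointer_bytes=4):
    """Per-non-zero arrays plus the row-pointer array (rows + 1 entries) of textbook CSR."""
    return csr_nonzero_bytes(nnz) + (rows + 1) * pointer_bytes


# Assumed non-zero count, chosen so that 12 bytes * nnz matches the 240MB above.
nnz = 20_000_000
print(csr_nonzero_bytes(nnz) / 1e6, "MB")  # 240.0 MB
```

The row-pointer term in `csr_total_bytes` is part of the generic CSR layout and grows with the number of rows, not with the number of non-zeros.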

Re: Standard code styles for DML and Java?

2017-05-02 Thread Matthias Boehm
thanks Deron for centralizing this discussion, as this could help to avoid redundancy spread across many individual JIRAs and PRs. Overall, I think it would be good to agree on individual style guides for DML and Java. I'm fine with using spaces for DML scripts because they are rarely changed

Re: [DISCUSS] Remove old MLContext API

2017-05-01 Thread Matthias Boehm
definitely +1 from me, although I think we already agreed upon that by properly deprecating this API in previous releases. Regards, Matthias On Mon, May 1, 2017 at 6:55 PM, Nakul Jindal wrote: > +1 > > Nakul > > On Mon, May 1, 2017 at 5:37 PM, wrote: > > > +1 > > > > -- > > > > Mike Dusenberry

Re: Randomly Selecting rows from a dataframe

2017-04-30 Thread Matthias Boehm
I'm looking for. > > We can use the for-loop in this case using "data_sample_matrix" matrix. > But want to avoid looping. > > Can anyone please help? > > Thank you! > Arijit > > > > > > From: arijit chakraborty

Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4)

2017-04-28 Thread Matthias Boehm
: Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4) > > > > +1 > > > > Successfully ran Linear Regression, Logistic Regression, Naive Bayes, > SVM in > > Python notebooks with Spark 2.0.2 (in cloud environment) and Spark 2.1 > (on local test cluster) after pip insta

Re: Build passed/failed messages for pull requests

2017-04-28 Thread Matthias Boehm
as I commented on one of these github comments, I'm strongly against these kinds of unnecessary messages because they distract from the actual discussions. I already had to change my notification settings accordingly - essentially I'm not watching SystemML's PR activity any more. Regards, Matt

Re: Updating A Vector

2017-04-27 Thread Matthias Boehm
if your values in matrix2 are aligned as in your example, then you can do the following (which works for arbitrary values in matrix1 but you could simplify it if you have just 1s): matrix1 = matrix1*(matrix2==0) + (matrix2!=0)*2; The only problematic case would be special values such as NaNs in matr
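The masking expression above can be traced element by element. This is a plain-Python sketch of the same arithmetic on flat lists, not DML; the function name `update_vector` and the sample inputs are made up for illustration.

```python
def update_vector(matrix1, matrix2, new_value=2):
    """Keep matrix1 where matrix2 is 0, write new_value where matrix2 is non-zero.

    Mirrors matrix1*(matrix2==0) + (matrix2!=0)*2 for aligned vectors.
    """
    result = []
    for a, b in zip(matrix1, matrix2):
        keep_mask = 1 if b == 0 else 0      # (matrix2 == 0)
        update_mask = 1 if b != 0 else 0    # (matrix2 != 0)
        result.append(a * keep_mask + update_mask * new_value)
    return result


print(update_vector([1, 1, 1, 1], [0, 3, 0, 5]))  # [1, 2, 1, 2]
print(update_vector([7, 8, 9], [0, 1, 0]))        # [7, 2, 9]
```

Because the two masks are complementary, exactly one of the two terms contributes to each cell.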

MLContext scratch space cleanup

2017-04-25 Thread Matthias Boehm
A recent issue, described in SYSTEMML-1466, made me think about the cleanup semantics of our temporary scratch_space when coming through the new MLContext API. For our main compilation chain (hadoop/spark_submit), the semantics are very clear: we delete the entire script specific directory before a

Re: Evaluate a scalar DAG during compilation

2017-04-24 Thread Matthias Boehm
yes, we already do constant folding - the details are in org.apache.sysml.hops.rewrite.RewriteConstantFolding In order to ensure consistency with our runtime, we actually generate instructions for these sub dags, execute them and finally replace the dag with the computed literal. Regards, Matthia
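The idea of replacing an all-literal sub-DAG with its computed value can be sketched on a toy expression tree. This is not the RewriteConstantFolding implementation; the tuple encoding, the `OPS` table, and the `fold` function are hypothetical, and the "execution" step here is a direct Python evaluation rather than generated runtime instructions.

```python
import operator

# Hypothetical operator table for the toy expression tree.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def fold(node):
    """Return the tree with every all-literal subtree replaced by its value.

    A node is a number (literal), a string (variable name),
    or a tuple (op, left, right).
    """
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        # Both inputs are literals: compute the result and keep only the literal.
        return OPS[op](left, right)
    return (op, left, right)


expr = ("*", "X", ("+", 2, ("*", 3, 4)))
print(fold(expr))  # ('*', 'X', 14)
```

Subtrees that reference a variable such as `X` are left untouched, so only the scalar part of the expression collapses.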

Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4)

2017-04-24 Thread Matthias Boehm
+1 I ran large-scale experiments on Spark 2.1 for L2SVM, GLM, MLogreg, LinregCG, LinregDS, and PCA over scaled versions of MNIST and ImageNet (up to 1TB, with uncompressed and compressed linear algebra) without any issues. Compared to previous experiments with SystemML 0.11 and Spark 1.6, I've se

Fwd: Questions about the Compositions of Execution Time

2017-04-22 Thread Matthias Boehm
-- Forwarded message -- From: Matthias Boehm Date: Sat, Apr 22, 2017 at 4:23 PM Subject: Re: Questions about the Compositions of Execution Time To: Mingyang Wang with the latest change from today there should not be much difference between the different storage formats. However

Re: function default parameters

2017-04-21 Thread Matthias Boehm
well, for arguments passed into dml scripts there is of course ifdef($b, 2) but for functions there is indeed no good support. At runtime level we still support default parameters for scalar arguments at the tail of the parameter list but I guess at one point the corresponding parser support was di

Re: Randomly Selecting rows from a dataframe

2017-04-21 Thread Matthias Boehm
you can take for example a 1% sample of rows via a permutation matrix (specifically selection matrix) as follows I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01); P = removeEmpty(target=diag(I), margin="rows"); Xsample = P %*% X; or via removeEmpty and selection vector I = (rand(rows=nrow
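The selection-matrix variant can be followed step by step in plain Python. This sketch only mirrors the three DML statements above on nested lists; `sample_rows`, its `seed` parameter, and the example data are assumptions for illustration, and the dense `P` is built explicitly only to show the idea.

```python
import random


def sample_rows(X, fraction, seed=None):
    """Sample rows of X via an indicator vector I and a selection matrix P."""
    rng = random.Random(seed)
    n = len(X)
    # I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= fraction)
    I = [1 if rng.random() <= fraction else 0 for _ in range(n)]
    # P = removeEmpty(target=diag(I), margin="rows"): one unit row per selected index
    P = [[1 if j == i else 0 for j in range(n)] for i in range(n) if I[i] == 1]
    # Xsample = P %*% X
    cols = len(X[0]) if X else 0
    return [[sum(P[r][k] * X[k][c] for k in range(n)) for c in range(cols)]
            for r in range(len(P))]


X = [[i, i * 10] for i in range(50)]
sample = sample_rows(X, 0.5, seed=1)
print(len(sample), "of", len(X), "rows selected")
```

The number of selected rows is random; only its expected value is `fraction * nrow(X)`, and the original row order is preserved.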

Re: Vector of Matrix

2017-04-21 Thread Matthias Boehm
no, right now, we don't support structs or complex objects. Regards, Matthias On 4/21/2017 4:17 AM, arijit chakraborty wrote: Hi, In R (as well as in python), we can store values list within list. Say I've 2 matrix with different dimensions, x <- matrix(1:10, ncol=2) y <- matrix(1:5, ncol=1

Re: Table

2017-04-21 Thread Matthias Boehm
The input vectors to table are interpreted as row indexes and column indexes, respectively. Without weights, we add 1, otherwise the corresponding weight value to the output cells. So in your example you have constant row indexes of 1 but a seq(1,10) for column indexes and hence you get a 1x10
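The index semantics described above can be sketched in a few lines of plain Python. This is an illustration, not the SystemML implementation; in particular, sizing the output by the maximum row and column index is an assumption made for the sketch.

```python
def table(row_indexes, col_indexes, weights=None):
    """Build a contingency table from 1-based row and column index vectors.

    Each (row, col) pair adds 1 to its output cell, or the matching weight
    if weights are given.
    """
    if weights is None:
        weights = [1] * len(row_indexes)
    # Assumption: output dimensions are the largest indexes seen.
    n_rows, n_cols = max(row_indexes), max(col_indexes)
    out = [[0] * n_cols for _ in range(n_rows)]
    for r, c, w in zip(row_indexes, col_indexes, weights):
        out[r - 1][c - 1] += w
    return out


# Constant row index 1 and column indexes 1..10 give a 1x10 result.
print(table([1] * 10, list(range(1, 11))))
# [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
```

Repeated index pairs accumulate in the same cell, which is what makes the operation usable for counting.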

Re: Questions about the Compositions of Execution Time

2017-04-21 Thread Matthias Boehm
On Thu, Apr 20, 2017 at 11:44 AM, Matthias Boehm wrote: > 1) Understanding execution plans: Our local bufferpool reads matrices in a > lazy manner on the first singlenode, i.e., CP, operation that tries to pin > the matrix into memory. Similarly, distributed matrices are read into >

Re: Questions about the Compositions of Execution Time

2017-04-20 Thread Matthias Boehm
ry format with a simple read/write script (it took quite a long time and failed). Regards, Mingyang On Thu, Apr 20, 2017 at 2:08 AM Matthias Boehm wrote: Hi Mingyang, thanks for the questions - this is very valuable feedback. I was able to reproduce your performance issue on scenario 1

Re: Experimental code generation

2017-04-20 Thread Matthias Boehm
. On Apr 20, 2017, at 8:32 AM, Berthold Reinwald wrote: This is awesome! Regards, Berthold Reinwald IBM Almaden Research Center office: (408) 927 2208; T/L: 457 2208 e-mail: reinw...@us.ibm.com From: Matthias Boehm To: dev@systemml.incubator.apache.org Date: 04/20/2

Experimental code generation

2017-04-20 Thread Matthias Boehm
Hi all, meanwhile our new code generation feature is sufficiently stable to enter broader testing with the goal of further improving its capabilities. If you're interested, you can enable this feature via true in your SystemML-config.xml file. The major advantages are fewer intermediates (read

Re: Questions about the Compositions of Execution Time

2017-04-20 Thread Matthias Boehm
Hi Mingyang, thanks for the questions - this is very valuable feedback. I was able to reproduce your performance issue on scenario 1 and I have a patch, which I'll push to master tomorrow after a more thorough testing. Below are the details and the answers to your questions: 1) Expected performan

Re: Loss of dimensionality info in transient reads

2017-04-18 Thread Matthias Boehm
In general, there are a couple of scenarios which make size propagation challenging. This includes: * Complex function call patterns (where functions are potentially called with different sizes) * External user-defined functions * Data-dependent operators (e.g., table, aggregate, removeEmpty) * C

Re: True/False flags in HOPs parameters

2017-04-18 Thread Matthias Boehm
These flags in the runtime plans (-explain runtime or recompile_runtime) are indicators if the given input operand is a literal or not. Without these flags we could not differentiate between literal strings and variable names. Regards, Matthias On Tue, Apr 18, 2017 at 12:20 PM, wrote: > Regardi

Re: SystemML query

2017-04-17 Thread Matthias Boehm
if your data X is already ordered you can do the following: I = rbind(matrix(1,1,1), (X[1:nrow(X)-1,]!=X[2:nrow(X),])); dX = removeEmpty(target=X, margin="rows", select=I); Regards, Matthias On 4/17/2017 8:40 AM, arijit chakraborty wrote: Hi, I've an issue regarding finding and removing the
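The indicator-vector trick above can be mirrored in plain Python. This sketch assumes X is a single ordered column, represented as a flat list; the function name is made up for illustration.

```python
def drop_adjacent_duplicates(X):
    """Keep the first row and every row that differs from its predecessor."""
    if not X:
        return []
    # I = rbind(1, X[1:n-1] != X[2:n]): the first row is always kept
    I = [1] + [1 if X[i] != X[i + 1] else 0 for i in range(len(X) - 1)]
    # removeEmpty(target=X, margin="rows", select=I)
    return [x for x, keep in zip(X, I) if keep]


print(drop_adjacent_duplicates([1, 1, 2, 2, 2, 3]))  # [1, 2, 3]
```

The approach relies on the ordering: equal values must be adjacent, otherwise later repeats survive.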

Re: [VOTE] Apache SystemML 0.14.0-incubating (RC3)

2017-04-15 Thread Matthias Boehm
I think SYSTEMML-1518 and SYSTEMML-1520 require a new RC and I agree that we should create a 0.14 branch along with it to unblock ongoing development. I'm happy to backport any additional fixes into this branch until we have a solid release candidate. Regards, Matthias On Thu, Apr 13, 2017 at 5:3

Re: GSoC : Getting started contributions

2017-04-15 Thread Matthias Boehm
A great issue to start with would be SYSTEMML-546, which aims to clean up and extend our existing application tests. This would get you in touch with DML and PyDML algorithm scripts as well as the R scripts for comparisons. Regards, Matthias On Sat, Apr 15, 2017 at 2:58 PM, Krishna Kalyan wrote:

Re: [VOTE] Apache SystemML 0.14.0-incubating (RC2)

2017-04-12 Thread Matthias Boehm
well it's been 7 days now, so how do we resolve this deadlock? I personally think that SYSTEMML-1467 and SYSTEMML-1474 are sufficient to cut a new release. However, apart from these known issues, I did not encounter any additional issues running large-scale experiments with L2SVM, GLM, MLogreg, L

Re: [VOTE] Apache SystemML 0.14.0-incubating (RC2)

2017-04-05 Thread Matthias Boehm
thanks for the new RC Arvind. Just as a quick update: meanwhile Imran has caught SYSTEMML-1467 but there is a workaround, so let's keep it open and collect some additional input from more people trying out the RC. Regards, Matthias On Wed, Apr 5, 2017 at 5:02 PM, Arvind Surve wrote: > Please vo

Re: [VOTE] Apache SystemML 0.14.0-incubating (RC1)

2017-04-04 Thread Matthias Boehm
sorry, but -1 due to SYSTEMML-1464 and SYSTEMML-1459. In detail, SYSTEMML-1464 is a blocker issue for me because it renders JMLC model scoring of text inputs with tokens that contain spaces almost unusable. Furthermore, SYSTEMML-1459 covers a rewrite issue that might corrupt hop dags for special o

Re: Java compiler for code generation

2017-03-31 Thread Matthias Boehm
a scoring environment w/o Spark and Hadoop, will > the dependency on Janino still be there (that question applies to JDK as > well), and what is the footprint? > > Regards, > Berthold Reinwald > IBM Almaden Research Center > office: (408) 927 2208; T/L: 457 2208 > e-mail: reinw...@us

Java compiler for code generation

2017-03-31 Thread Matthias Boehm
Hi all, currently, our new code generator for operator fusion uses the programmatic javax.tools.JavaCompiler, which is Java's standard API for compilation. Despite a plan cache that mitigates unnecessary compilation and recompilation overheads, we still see significant end-to-end overhead especia

Re: UDFs Within Expressions

2017-03-30 Thread Matthias Boehm
iPhone. > > > > On Mar 29, 2017, at 4:18 PM, Matthias Boehm > wrote: > > > > Well, this would indeed be a very useful extension - I've actually seen > > many use cases, where new users ran into issues with simple expressions > > like X[i,i] = foo(

Re: UDFs Within Expressions

2017-03-29 Thread Matthias Boehm
Well, this would indeed be a very useful extension - I've actually seen many use cases, where new users ran into issues with simple expressions like X[i,i] = foo(). In the general case, the problem with UDFs is that they can have - in contrast to builtin functions - multiple returns. These multiple

Re: [HELP] Undesired Benchmark Results

2017-03-24 Thread Matthias Boehm
ards, Mingyang On Thu, Mar 23, 2017 at 11:36 PM Matthias Boehm wrote: well, after thinking some more about this issue, I have to correct myself but the workarounds still apply. The problem is not the "in-memory reblock" but the collect of the reblocked RDD, which is similarly hand

Re: [HELP] Undesired Benchmark Results

2017-03-23 Thread Matthias Boehm
d classify the missing transitive operator selection for right indexing operations as a bug - we'll fix this in our upcoming 0.14 release. Thanks for catching it. Regards, Matthias On Thu, Mar 23, 2017 at 9:55 PM, Matthias Boehm wrote: > thanks for the feedback Mingyang. Let me quickly ex

Re: [HELP] Undesired Benchmark Results

2017-03-23 Thread Matthias Boehm
thanks for the feedback Mingyang. Let me quickly explain what happens here and subsequently give you a couple of workarounds. 1. Understanding the Bottleneck: For any text inputs, we will initially compile distributed reblock operations that convert the text representation into RDDs of matrix inde

Re: Release cadence

2017-03-23 Thread Matthias Boehm
us stage it in a way that back porting of > possible bug fixes will not be too difficult in the next few weeks or > small nbr of month. > > Regards, > Berthold Reinwald > IBM Almaden Research Center > office: (408) 927 2208; T/L: 457 2208 > e-mail: reinw...@us.ibm.com > >

Re: Next Steps in the graduation process

2017-03-16 Thread Matthias Boehm
. Regards, Matthias On Thu, Mar 16, 2017 at 12:48 PM, Luciano Resende wrote: > Has anyone been able to work on the Maturity Model? > I believe this is one of the last things we are waiting for. > > On Tue, Mar 7, 2017 at 10:48 PM, Matthias Boehm > wrote: > > > I could help doi

Re: Build failed in Jenkins: SystemML-DailyTest #870

2017-03-16 Thread Matthias Boehm
sorry for the issues - I'll fix it with the next change. Regards, Matthias On Thu, Mar 16, 2017 at 2:56 AM, wrote: > See <https://sparktc.ibmcloud.com/jenkins/job/SystemML- > DailyTest/870/changes> > > Changes: > > [Matthias Boehm] [SYSTEMML-1402] Fix reset numbe

Re: Release cadence

2017-03-12 Thread Matthias Boehm
> be able to correctly identify our next version in the online > documentation. > >> > >> > > How about just make SystemML Next and change the release name when we do > > the release ? > > > > > > > >> Deron > >> > >> >

Re: Next Steps in the graduation process

2017-03-07 Thread Matthias Boehm
I could help doing this assessment. Btw, here is a working link: https://community.apache.org/apache-way/apache-project-maturity-model.html Regards, Matthias On Tue, Mar 7, 2017 at 1:38 PM, Luciano Resende wrote: > On Tue, Mar 7, 2017 at 11:59 AM, Arvind Surve > wrote: > > > I will start looki

Dropping Java 6 and 7 support

2017-03-06 Thread Matthias Boehm
Hi all, I'd like to drop the support for Java 6 and 7 in our SystemML 1.0 release. Our build still refers to a java compliance level 6, which has not been changed for more than 5 years now. Spark >= 1.5 anyway requires Java 7 and there has been some discussion on removing Java 7 as well because it

Re: Release cadence

2017-03-04 Thread Matthias Boehm
> > On Thu, Jan 5, 2017 at 1:50 PM, wrote: > > > +1 for adopting a 1 month release cycle. > > > > -- > > > > Mike Dusenberry > > GitHub: github.com/dusenberrymw > > LinkedIn: linkedin.com/in/mikedusenberry > > > > Sent

Re: [DISCUSS] SystemML Graduation

2017-03-03 Thread Matthias Boehm
Thanks for starting this discussion Luciano. I think it's a good point in time to graduate SystemML as we have shown readiness by creating an open and positive community, and it would send a great signal to potential new users and developers. From my perspective, we should aim for a top-level proje

Re: [VOTE] Apache SystemML 0.13.0-incubating (RC2)

2017-02-23 Thread Matthias Boehm
+1, I ran our perftest 8GB and 80GB on this 0.13 release (a version earlier than the RC that included all relevant changes) and it passed without any failures or major performance issues. Regards, Matthias On Thu, Feb 23, 2017 at 1:46 PM, Nakul Jindal wrote: > +1 > > Basic sanity tests pass on

Re: incubator-systemml git commit: [maven-release-plugin] prepare for next development iteration

2017-02-22 Thread Matthias Boehm
Could we please change the target version to 1.0 instead of 0.14 to make clear that master is now open for 1.0 features? Regards, Matthias On Mon, Feb 20, 2017 at 12:08 PM, wrote: > Repository: incubator-systemml > Updated Branches: > refs/heads/master 07f26ca4e -> da5879f53 > > > [maven-rele

Re: Minimum required Spark version

2017-02-21 Thread Matthias Boehm
excellent - thanks for the quick fix Deron. Regards, Matthias On 2/21/2017 1:09 AM, Deron Eriksson wrote: Note that MLContext has been updated to log a warning rather than throw an exception to the user for Spark versions previous to 2.1.0. Deron On Mon, Feb 20, 2017 at 2:29 PM, Matthias Boeh

Re: [DISCUSS] Roadmap SystemML 1.0

2017-02-19 Thread Matthias Boehm
> >>>> > >> > >>>> 2) Updated Dependencies: > >> > >>>> * Spark 2.0 support > >> > >>>> * Matrix block library (isolated jar) > >> > >>>> > >> > >>>> 3) Compiler/Runtime Featur

Weighted Statistical Estimates

2017-02-18 Thread Matthias Boehm
Going toward our 1.0 release, I'd like to create consistency across our weighted statistics. Conceptually, these weights represent frequency counts, i.e., multiplicities of input values. So far, our documentation does not state any restrictions on these weights but some runtime operations requ

Re: Build failed in Jenkins: SystemML-DailyTest #816

2017-02-17 Thread Matthias Boehm
yes, this is one of the flaky tests with occasional errors - unfortunately, even with the exact seeds of a failed run, this behavior is not reproducible locally. Regards, Matthias On Fri, Feb 17, 2017 at 10:16 AM, wrote: > > Failed tests: > > > > FrameMatrixReblockTest.testFrameWriteMultiple

Re: Operators in HOP DAG

2017-02-17 Thread Matthias Boehm
ad 1: t(-*): ternary minus mult (for patterns like X-s*Y) ad 2: ua(+RC): unary aggregate with aggregation function + (at runtime level you will see k+ for Kahan plus) and direction RC, i.e., full aggregate over rows and columns. ad 3: lix: matrix or frame left indexing (for patterns like X[a:
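The first two operators can be illustrated in plain Python. This is a sketch of the semantics only: `minus_mult` evaluates X - s*Y cell by cell, and `full_sum` aggregates over all rows and columns with textbook Kahan (compensated) summation. The function names are hypothetical and the code is not taken from the SystemML runtime.

```python
def minus_mult(X, s, Y):
    """Cell-wise X - s*Y, the pattern covered by t(-*)."""
    return [[x - s * y for x, y in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]


def kahan_sum(values):
    """Compensated summation: carry the rounding error of each addition forward."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total


def full_sum(M):
    """Full aggregate over rows and columns, as in ua(+RC)."""
    return kahan_sum(v for row in M for v in row)


print(minus_mult([[1, 2], [3, 4]], 2, [[1, 1], [1, 1]]))  # [[-1, 0], [1, 2]]
print(full_sum([[1, 2], [3, 4]]))                          # 10.0
```

The compensation term matters when small values are added to a large running total, where a plain left-to-right floating-point sum would drop them.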

Re: Proposal to add 'accuracy test suite' before 1.0 release

2017-02-17 Thread Matthias Boehm
Yes, this has been discussed a couple of times now, most recently in SYSTEMML-546. It takes quite some effort though to create a sophisticated algorithm-level test suite as done for GLM. So by all means, please, go ahead and add these tests. However, I would not impose any constraints on the c

Corrupt SystemML 0.12 release download

2017-02-15 Thread Matthias Boehm
Just in case you have not seen the issue described in [1], the download of our 0.12 release is currently corrupted as the included SystemML.jar does not contain the antlr-runtime and wink-json libraries. Hence, without modifying the jar, SystemML fails with ClassNotFoundExceptions. I agree with Fe

Re: Removal of workaround flags

2017-02-13 Thread Matthias Boehm
ion of > SYSTEMML-1140. Specifically, what did you use to attempt to reproduce 1140? > > > -Mike > > -- > > Mike Dusenberry > GitHub: github.com/dusenberrymw > LinkedIn: linkedin.com/in/mikedusenberry > > Sent from my iPhone. > > > > On Feb 12, 2017, at 12:30 AM, Matthias Boehm > wrote: > > > > SYSTEMML-1140 >

Re: Namespace handling w/ imports

2017-02-13 Thread Matthias Boehm
amespace used. > It also helped simplify calling dml-bodied functions when a file was > imported by another. > > > > Thanks, > > Glenn > > > > > > Matthias Boehm ---02/12/2017 12:30:35 AM---While debugging our > mnist_lenet script, I encountered an issue

Re: Build failed in Jenkins: SystemML-DailyTest #805

2017-02-12 Thread Matthias Boehm
org.apache.sysml.test.integration.functions.transform.TransformCSVFrameEncodeReadTest Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.677 sec - in org.apache.sysml.test.integration.functions.transform.TransformCSVFrameEncodeReadTest On Sun, Feb 12, 2017 at 12:26 AM, Matthias Boehm wrote: could someone please test

Namespace handling w/ imports

2017-02-12 Thread Matthias Boehm
While debugging our mnist_lenet script, I encountered an issue with our namespace handling with imports. Here is the related function call graph (after inlining): FUNCTION CALL GRAPH --MAIN PROGRAM .\mnist_lenet.dml::train --.\nn/layers/dropout.dml::forward --.\m

Removal of workaround flags

2017-02-12 Thread Matthias Boehm
just a little heads-up: I intend to remove the recently added workaround flags DISABLE_SPARSE and DISABLE_CACHING because any underlying issues should be directly addressed. Furthermore, I was not able to reproduce the issues reported in SYSTEMML-1140, probably due to improvements that hav

Re: Build failed in Jenkins: SystemML-DailyTest #805

2017-02-12 Thread Matthias Boehm
ins/job/SystemML-DailyTest/805/changes> Changes: [Matthias Boehm] [SYSTEMML-1244] Fix robustness csv text read (quoted recoded maps) [Matthias Boehm] [SYSTEMML-1243] Fix size update wdivmm/wsigmoid/wumm on rewrite [Matthias Boehm] [SYSTEMML-1248] Fix loop rewrite update-in-place (exclude

Re: [DISCUSS] Enable Python Tests on Jenkins

2017-02-03 Thread Matthias Boehm
this is fine, but please make sure that it gets integrated into our existing testsuite which can be run through maven or junit. Regards, Matthias On 2/3/2017 9:10 PM, Deron Eriksson wrote: +1 for enabling the Python tests in the test suite. Since we use multiple languages and it's not always

Re: February Podling Report

2017-02-01 Thread Matthias Boehm
optionally, we could include the following paper that we presented at CIDR'17 in January. Tarek Elgamal, Shangyu Luo, Matthias Boehm, Alexandre V. Evfimievski, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen: SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learn

Re: HOP and LOP DAGs

2017-01-30 Thread Matthias Boehm
Hi Nantia, good question - so far the documentation of tools like explain and stats is indeed very sparse. However, there are some overview slides from a tutorial we gave last year at the BOSS workshop: http://boss.dima.tu-berlin.de/media/BOSS16-Tutorial-mboehm.pdf (slides 10-15) If you have

Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2)

2017-01-27 Thread Matthias Boehm
Thanks Glenn. Could you please also share the measurements (maybe in a jira). Furthermore, seeing that you ran only a subset of multinomial experiments, makes me wonder if you used the current default configuration of 150 classes? In the recent past, we usually ran this perftest with a reason

Re: Build failed in Jenkins: SystemML-DailyTest #761

2017-01-23 Thread Matthias Boehm
L-541 under: https://issues.apache.org/jira/browse/SYSTEMML-1188 Thanks, Glenn Matthias Boehm ---01/21/2017 02:21:04 AM---Let's keep the test, collect

Re: Build failed in Jenkins: SystemML-DailyTest #761

2017-01-21 Thread Matthias Boehm
Let's keep the test, collect the used seeds, and fix it. The issue is due to a randomly generated seed which hints at an underlying problem for certain special cases. Btw, FullReblockTest has a similar issue. Since I don't have access to the sparktc jenkins infrastructure, it would be great if

Re: SystemML optimizer design

2017-01-17 Thread Matthias Boehm
Hi Dylan, these are very interesting questions - let me answer them one by one: 0. SPOOF: We developed the SPOOF compiler framework in a separate fork that will be integrated back into SystemML master soon. Initially, we will add the code generation part as an experimental feature, likely in

Re: Time To Release 0.13

2017-01-17 Thread Matthias Boehm
I agree with Arvind here as the 8GB case would mostly run as singlenode, in-memory operations and not test the Spark 2.x integration. Regards, Matthias On 1/17/2017 5:33 AM, Arvind Surve wrote: We are planning to have 80GB testing for 0.13 release (to support Spark 2.0). It will add couple of

Re: Time To Merge Spark 2.0 Support PR

2017-01-06 Thread Matthias Boehm
+1 on moving to Spark 2.x - I think we delayed this way too long now and there will always be some awesome feature that we'd want to support on older Spark versions too. Regards, Matthias On 1/6/2017 9:41 PM, Mike Dusenberry wrote: Well to be fair, a user can still use the Python DSL with the

Re: Release cadence

2017-01-05 Thread Matthias Boehm
In general, I like the idea of aiming for consistent release cycles. However, every month is just too much, at least for me. There is a considerable overhead associated with each release for end-to-end performance tests, tests on different environments, code freeze for new features, etc. Hence,

Re: [DISCUSS] Roadmap SystemML 1.0

2017-01-03 Thread Matthias Boehm
, I'm fine with making (3), (4), and part of (5) optional and let the "must-have" features from (1) and (2) determine the timeline. Regards, Matthias On 1/3/2017 11:27 PM, Luciano Resende wrote: On Tue, Jan 3, 2017 at 11:50 AM, Matthias Boehm wrote: I'd like to initiate

[DISCUSS] Roadmap SystemML 1.0

2017-01-03 Thread Matthias Boehm
I'd like to initiate the discussion of a concrete roadmap for our next release. According to previous discussions, I'd think it's fair to say that we agree on calling it SystemML 1.0. We should carefully plan this release as it's an opportunity to change APIs and remove some older deprecated f

Re: Build and distribution related issues for GPU support

2016-12-02 Thread Matthias Boehm
listically, the PTX JIT compilation adds about <5 seconds of startup overhead (on the platforms I tested on), if the "-gpu" flag option is used. It can be argued that in a long running job, a constant cost is justified. -Nakul On Thu, Nov 24, 2016 at 12:53 AM, Matthias Boe

Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS

2016-12-01 Thread Matthias Boehm
eworks. To do this would definitely require setting up a more "scientific" benchmark suite than my little test here. Felix Am 01.12.2016 01:00 schrieb Matthias Boehm: ok, then let's sort this out one by one 1) Benchmarks: There are a couple of things we should be aware of for these

Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS

2016-11-30 Thread Matthias Boehm
false,1000,1000,100)x(false,1000,1000,100) in 251.290325 MM k=8 (false,1000,1000,100)x(false,1000,1000,100) in 265.851277 MM k=8 (false,1000,1000,100)x(false,1000,1000,100) in 240.902494 Am 01.12.2016 00:08 schrieb Matthias Boehm: Could you please make sure you're comparing

Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS

2016-11-30 Thread Matthias Boehm
Could you please make sure you're comparing the right thing. Even on old Sandy Bridge CPUs our matrix mult for 1kx1k usually takes 40-50ms. We also did the same experiments with larger matrices and SystemML was about 2x faster compared to Breeze. Please uncomment the timings in LibMatrixMult.ma

Re: Build and distribution related issues for GPU support

2016-11-24 Thread Matthias Boehm
l use the cuda compiler that ships with that version of the toolkit and compile the .cu files in the project and commit the resulting .ptx files. Thoughts, comments? -Nakul On Wed, Nov 23, 2016 at 2:43 PM, Matthias Boehm wrote: thanks for sharing Nakul. Could you please also comment on th

Re: Build and distribution related issues for GPU support

2016-11-23 Thread Matthias Boehm
thanks for sharing Nakul. Could you please also comment on the PTX story for custom kernels and different PTX versions? Regards, Matthias On 11/23/2016 10:13 PM, Nakul Jindal wrote: Hi, SystemML has experimental GPU support, which we are working to solidify. Currently, GPU is supported in CP

Re: Parfor semantics

2016-11-23 Thread Matthias Boehm
arate model over the full dataset using a mini-batch SGD approach. Has the `parfor` construct been used for this purpose before? -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. On Nov 22, 2016, at 2:01 PM, Matthias Boehm

Re: Parfor semantics

2016-11-22 Thread Matthias Boehm
wrote: The constrained optimizer doesn't seem to know about a REMOTE_SPARK execution mode and either sets CP or REMOTE_MR. I can open a jira for that and provide a fix. Felix Am 22.11.2016 02:07 schrieb Matthias Boehm: yes, this came up several times - initially we only supported opt=NONE w

Re: Parfor semantics

2016-11-21 Thread Matthias Boehm
yes, this came up several times - initially we only supported opt=NONE where users had to specify all other parameters. Meanwhile, there is a so-called "constrained optimizer" that does the same as the rule-based optimizer but respects any given parameters. Please try something like this: parf

Re: Build failed in Jenkins: SystemML-DailyTest #631

2016-11-17 Thread Matthias Boehm
:746) 14:37:58 at org.apache.sysml.test.integration.functions.data.FullReblockTest.runReblockTest (FullReblockTest.java:479) 14:37:58 at org.apache.sysml.test.integration.functions.data.FullReblockTest.testTextCellSingeMSparseSP (FullReblockTest.java:125) -Arvind From: Matthias Boehm To: dev@sy

Re: Build failed in Jenkins: SystemML-DailyTest #631

2016-11-17 Thread Matthias Boehm
could someone help me get the detailed test output from FullReblockTest? I instrumented it with the used seeds to resolve these occasional failures. Regards, Matthias On 11/17/2016 2:03 AM, jenk...@spark.tc wrote: See

Re: Collection project progress for SystemML report

2016-10-31 Thread Matthias Boehm
thanks Felix for taking this over. Here is a list of recent papers and tutorials: 1) VLDB'16 Papers: We presented the papers "Compressed Linear Algebra for Large-Scale Machine Learning" (research paper + poster) and "SystemML: Declarative Machine Learning on Spark" (industry paper) at VLDB'16

Re: [DISCUSS] Adding tensorboard-like functionality to SystemML

2016-10-28 Thread Matthias Boehm
Thanks for putting this together Niketan. However, could we please postpone this discussion until after our 1.0 release? Right now, I'm concerned to see that we're adding many experimental features without really getting them done. This includes, for example, the GPU backend, the new MLContext API, th

Re: Local versions of Linear Algebra Operators in DML

2016-10-24 Thread Matthias Boehm
math functions at compile time depending on what intermediates they produce ... Meaning you may still end up with java heap space OOM at runtime. Regards, Berthold Reinwald IBM Almaden Research Center office: (408) 927 2208; T/L: 457 2208 e-mail: reinw...@us.ibm.com From: Matthias Boehm To:

Re: Local versions of Linear Algebra Operators in DML

2016-10-24 Thread Matthias Boehm
m.com/researcher/view.php?person=us-npansar Matthias Boehm ---10/21/2016 01:00:51 PM---thanks Nakul for reaching out before starting work on this. Actually, the introduction of these CP- From: Matthias Boehm To: dev@systemml.incubator.apache.org Date: 10/21/2016 01:00 PM Subject: Re: Local versions of

Re: [Discuss] String requirements for data passed to SystemML Frames.

2016-10-22 Thread Matthias Boehm
ok let me clarify a couple of things and provide an easy solution that resolves this issue altogether. 1) Escaping: transformencode, transformdecode, and transformapply do not remove quotes to provide easy to understand semantics. If users want to match strings with different escaping policies

Re: Local versions of Linear Algebra Operators in DML

2016-10-21 Thread Matthias Boehm
thanks Nakul for reaching out before starting work on this. Actually, the introduction of these CP-only builtin functions was a big mistake because (as you already mentioned) they mistakenly suggest that we provide distributed operations for them too. The intent was to support them in later ver

Re: [VOTE] SystemML New Logo Ideas

2016-10-21 Thread Matthias Boehm
how strict this is. Deron On Fri, Oct 21, 2016 at 12:15 PM, Matthias Boehm wrote: Thanks for these proposals. For all the options, I'd prefer to remove the TM - it's just a little odd for an open source project with no intentions to register a trademark. I know, the new Spark logo has

Re: [VOTE] SystemML New Logo Ideas

2016-10-21 Thread Matthias Boehm
Thanks for these proposals. For all the options, I'd prefer to remove the TM - it's just a little odd for an open source project with no intentions to register a trademark. I know, the new Spark logo has it too but it's probably a different context, especially since there are discussions to add

Re: Running the bivar-stats example

2016-10-20 Thread Matthias Boehm
apart from the missing support for Spark 2.x, I would recommend double-checking your inputs. Do you have metadata files along with your inputs? If not, then these inputs are assumed to be in default format "text", i.e., matrix market ijv representation (not csv). You can either provide such js

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3)

2016-10-19 Thread Matthias Boehm
Glenn, all these issues were only caused by wrong tests that used an invalid ID schema or populated this column incorrectly, right? If so, then I think it's fine to release. However, if we touch it anyway, we should globally change the ID schema from double to long, which is more intuitive when

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC2)

2016-10-10 Thread Matthias Boehm
I hate to say it, but -1. There have been a couple of important fixes since we cut the RC and unfortunately, additional (so far unresolved) blocking issues showed up. In detail, the fixed issues are: * SYSTEMML-1023: Fix csv line parsing (the quote-aware column-splitting was hanging on a single

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

2016-10-07 Thread Matthias Boehm
with SYSTEMML-1014, 1018, and 1019 now being fixed, I think we're ready to cut RC2. Regards, Matthias From: Matthias Boehm/Almaden/IBM@IBMUS To: dev@systemml.incubator.apache.org Date: 10/06/2016 04:58 PM Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

2016-10-06 Thread Matthias Boehm
> LinkedIn: linkedin.com/in/mikedusenberry > > Sent from my iPhone. > > > > On Oct 5, 2016, at 1:17 PM, Matthias Boehm wrote: > > > > as the Python DSL is still in experimental status, I don't think that > SYSTEMML-1013 is blocking the release. However,
