Specify "rows_in_block" and "cols_in_block" when writing out matrix

2017-05-08 Thread Mingyang Wang
Hi all, Is there any way to specify "rows_in_block" and "cols_in_block" when writing out a matrix in binary format? I have tried to set these two attributes in write(...), but it clearly does not work. Regards, Mingyang

Re: Sparse Matrix Storage Consumption Issue

2017-05-08 Thread Mingyang Wang
gain (e.g., num executors, data distribution, etc). Thanks.
>
> Regards,
> Matthias
>
>
> On Thu, May 4, 2017 at 9:55 PM, Mingyang Wang wrote:
>
> > Out of curiosity, I increased the driver memory to 10GB, and then all
> > operations were executed on CP. It took 37.166s

Re: Sparse Matrix Storage Consumption Issue

2017-05-04 Thread Mingyang Wang
-- 3) + 0.000 sec 1
-- 4) print 0.000 sec 1
-- 5) rmvar 0.000 sec 5
-- 6) createvar 0.000 sec 1
-- 7) assignvar 0.000 sec 1
-- 8) cpvar 0.000 sec 1
Regards,
Mingyang
On Thu, May 4, 2017 at 9:48 PM Mingyang Wang wrote:
> Hi Matthias,
>
> Thanks for the patch.
>
> I have re-run the

Re: Sparse Matrix Storage Consumption Issue

2017-05-04 Thread Mingyang Wang
when introducing an improvement that stores sparse matrices in MCSR
> format in CSR format on checkpoints which eliminated the need to use a
> serialized storage level. I just deliver a fix. Now we store such
> ultra-sparse matrices again in serialized form which should
> significantl

Sparse Matrix Storage Consumption Issue

2017-05-03 Thread Mingyang Wang
Hi all, I was playing with a super sparse matrix FK, 2e7 by 1e6, with only one non-zero value on each row, that is 2e7 non-zero values in total. With driver memory of 1GB and executor memory of 100GB, I found the HOP "Spark chkpoint", which is used to pin the FK matrix in memory, is really expens
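For a sense of scale, the dimensions quoted above can be turned into a rough storage estimate. This is a back-of-the-envelope sketch, not SystemML's own memory accounting: it assumes 8-byte double values, 4-byte integer indices, a generic CSR layout, and no JVM object overhead. The function names are hypothetical.

```python
# Rough storage estimate for the FK matrix described above.
# Assumptions (not from the thread): 8-byte double values, 4-byte int
# column indices and row pointers, generic CSR layout, no JVM overhead.

ROWS = 20_000_000   # 2e7 rows
COLS = 1_000_000    # 1e6 columns
NNZ = ROWS          # one non-zero value per row


def dense_bytes(rows, cols, value_size=8):
    """Bytes needed if every cell were stored explicitly."""
    return rows * cols * value_size


def csr_bytes(rows, nnz, value_size=8, index_size=4):
    """Bytes for values + column indices + (rows + 1) row pointers."""
    return nnz * (value_size + index_size) + (rows + 1) * index_size


print(f"dense: {dense_bytes(ROWS, COLS) / 1e12:.0f} TB")
print(f"CSR:   {csr_bytes(ROWS, NNZ) / 1e6:.0f} MB")
```

Under these assumptions the dense form would need about 160 TB, while a CSR-style layout needs about 320 MB, so the representation chosen for the in-memory copy dominates the cost.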

Re: Questions about the Compositions of Execution Time

2017-04-21 Thread Mingyang Wang
ore it as a column vector with FK2 = rowIndexMax(FK) and
>> subsequently reconstruct it via FK = table(seq(1,nrow(FK2)), FK2,
>> nrow(FK2), N), for which we will compile a dedicated operator that does row
>> expansions. You don't necessarily need the last two arguments which only
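The round trip described in this message can be illustrated outside SystemML. The sketch below uses plain Python lists as stand-ins for the DML built-ins rowIndexMax and table; the helper names row_index_max and expand_rows are hypothetical, and the example assumes every row holds exactly one non-zero value equal to 1.

```python
# Pure-Python stand-ins for the DML built-ins named in the message.
# These helpers are illustrative only; they are not SystemML code.


def row_index_max(matrix):
    """Return the 1-based column index of the maximum in each row."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in matrix]


def expand_rows(fk2, n_cols):
    """Rebuild the one-hot matrix from the index vector (row expansion)."""
    matrix = [[0] * n_cols for _ in fk2]
    for i, j in enumerate(fk2):
        matrix[i][j - 1] = 1  # indices are 1-based, as in DML
    return matrix


FK = [[0, 1, 0, 0],
      [0, 0, 0, 1],
      [1, 0, 0, 0]]

FK2 = row_index_max(FK)           # compact column-vector form
assert expand_rows(FK2, 4) == FK  # reconstruction matches the original
```

The compact form stores one integer per row instead of a full sparse row, which is why it is attractive for a matrix with a single non-zero per row.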

Re: Questions about the Compositions of Execution Time

2017-04-20 Thread Mingyang Wang
e largest operation
> fits into 70% of the max heap. Additionally, memory configurations also
> impact operator selection - for example, we only compile broadcast-based
> matrix multiplications if the smaller input fits twice in the driver and in
> the broadcast budget of executors (w

Questions about the Compositions of Execution Time

2017-04-19 Thread Mingyang Wang
parately?
4. Any rule of thumb to estimate the memory needed for a program in SystemML?
I really appreciate your input!
Best,
Mingyang Wang

Re: [HELP] Undesired Benchmark Results

2017-03-24 Thread Mingyang Wang
> would however, not recommend this.
>
> Thanks again for the feedback. While writing this comment, I actually came
> to the conclusion that we could handle even the case with input csv better
> in order to avoid evictions in these scenarios.
>
>
> Regards,
> Matthias
>
>

[HELP] Undesired Benchmark Results

2017-03-23 Thread Mingyang Wang
ks in my case. Any suggestion about how to choose a better configuration or make some detours so I can obtain fair benchmarks on a wide range of data dimensions? If needed, I can attach the logs. I really appreciate your help! Regards, Mingyang Wang Graduate Student in UCSD CSE Dept.