[ 
https://issues.apache.org/jira/browse/SPARK-32061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng resolved SPARK-32061.
----------------------------------
    Resolution: Resolved

> potential regression if use memoryUsage instead of numRows
> ----------------------------------------------------------
>
>                 Key: SPARK-32061
>                 URL: https://issues.apache.org/jira/browse/SPARK-32061
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.1.0
>            Reporter: zhengruifeng
>            Priority: Major
>
> 1, if `memoryUsage` is improperly set, for example, too small to store even a 
> single instance;
> 2, the blockify+GMM implementation reuses two matrices whose shapes depend on 
> the current blockSize:
> {code:java}
> @transient private lazy val auxiliaryProbMat = DenseMatrix.zeros(blockSize, k)
> @transient private lazy val auxiliaryPDFMat = DenseMatrix.zeros(blockSize, numFeatures)
> {code}
> When implementing blockify+GMM, I found that if I do not pre-allocate those 
> matrices, there is a serious regression (maybe 3~4x slower, I forgot the 
> detailed numbers); a minimal sketch of this reuse pattern follows point 3 below;
> 3, in MLP, three pre-allocated objects are also related to numRows:
> {code:java}
> if (ones == null || ones.length != delta.cols) ones = BDV.ones[Double](delta.cols)
> 
> // TODO: allocate outputs as one big array and then create BDMs from it
> if (outputs == null || outputs(0).cols != currentBatchSize) {
>   ...
> 
> // TODO: allocate deltas as one big array and then create BDMs from it
> if (deltas == null || deltas(0).cols != currentBatchSize) {
>   deltas = new Array[BDM[Double]](layerModels.length)
>   ...
> {code}
> I am not very familiar with the MLP implementation and could not find any 
> documentation about this pre-allocation. But I guess there may be a regression 
> if we disable this pre-allocation, since those objects look relatively big 
> (see the second sketch below).
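> 
> Below is a minimal sketch (the class ScratchBuffers and method probMatFor are 
> made up for illustration, not taken from the GaussianMixture code) of why the 
> reused GMM scratch matrices from point 2 rely on every block having exactly 
> blockSize rows; if blocks were sized by memoryUsage instead, their row counts 
> could vary and the buffer could not simply be reused:
> {code:scala}
> import org.apache.spark.ml.linalg.DenseMatrix
> 
> // Illustrative only -- not the actual GaussianMixture aggregator.
> // A fixed-size scratch matrix is reusable only when every block has
> // exactly `blockSize` rows.
> class ScratchBuffers(blockSize: Int, k: Int) extends Serializable {
>   @transient private lazy val auxiliaryProbMat = DenseMatrix.zeros(blockSize, k)
> 
>   // With memoryUsage-based splitting, the number of rows per block may vary,
>   // so the pre-allocated buffer cannot always be reused and a fresh matrix
>   // has to be allocated for every mismatched block -- the suspected regression.
>   def probMatFor(blockRows: Int): DenseMatrix =
>     if (blockRows == blockSize) auxiliaryProbMat
>     else DenseMatrix.zeros(blockRows, k)
> }
> {code}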
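> 
> And a similar sketch for the MLP case in point 3 (the class FeedForwardBuffers 
> and method ensure are hypothetical; only the reuse-or-reallocate check mirrors 
> the quoted code): the buffers are rebuilt only when the batch width changes, so 
> a constant block size pays the allocation cost once, while varying block sizes 
> would pay it on every block:
> {code:scala}
> import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}
> 
> // Illustrative only -- mirrors the reuse-or-reallocate pattern quoted above.
> class FeedForwardBuffers(layerSizes: Array[Int]) {
>   private var ones: BDV[Double] = null
>   private var outputs: Array[BDM[Double]] = null
> 
>   def ensure(currentBatchSize: Int): Unit = {
>     if (ones == null || ones.length != currentBatchSize) {
>       ones = BDV.ones[Double](currentBatchSize)
>     }
>     if (outputs == null || outputs(0).cols != currentBatchSize) {
>       // With a fixed blockSize, this branch runs once and every later block
>       // reuses the matrices; if block sizes vary (e.g. memoryUsage-based
>       // splitting), these relatively big arrays are re-allocated per block.
>       outputs = Array.tabulate(layerSizes.length) { i =>
>         BDM.zeros[Double](layerSizes(i), currentBatchSize)
>       }
>     }
>   }
> }
> {code}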
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
