[ 
https://issues.apache.org/jira/browse/SPARK-21389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-21389.
----------------------------------
    Resolution: Incomplete

> ALS recommendForAll optimization uses Native BLAS
> -------------------------------------------------
>
>                 Key: SPARK-21389
>                 URL: https://issues.apache.org/jira/browse/SPARK-21389
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.3.0
>            Reporter: Peng Meng
>            Priority: Major
>              Labels: bulk-closed
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In Spark 2.2, we optimized ALS recommendForAll. That version uses a 
> hand-written matrix multiplication and takes the topK items for each 
> matrix. It effectively reduces the GC problem. However, with a native BLAS 
> GEMM, such as Intel MKL or OpenBLAS, matrix multiplication is about 10X 
> faster than the hand-written method. 
> I have rewritten recommendForAll with GEMM and got about a 50% 
> improvement compared with the recommendForAll method on master. 
> The key points of this optimization:
> 1) Use GEMM to replace the hand-written matrix multiplication.
> 2) Use a matrix to hold the temporary result, which largely reduces GC and 
> computing time. The master method creates many small objects, so using 
> GEMM directly in it does not give good performance.
> 3) Use sort and merge to get the topK items, which avoids calling the 
> priority queue twice.
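
The three points above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from the patch: the function names (`gemm_nt`, `top_k_of_row`, `merge_top_k`, `recommend_for_all`) are hypothetical, and `gemm_nt` is a pure-Python stand-in for the native BLAS GEMM call that a real implementation would delegate to. Each user block is multiplied against each item block into one dense score matrix, each row is sorted to get its block-local topK, and that sorted list is merged with the running topK for the user.

```python
import heapq
from itertools import islice


def gemm_nt(a, b):
    """Return a * b^T as a dense row-major matrix.

    Stand-in for a native GEMM call; a real implementation would hand this
    product to BLAS instead of looping in the host language.
    """
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def top_k_of_row(scores, item_offset, k):
    """Sort one row of block scores and keep the k best as (score, item_id)."""
    pairs = [(s, item_offset + j) for j, s in enumerate(scores)]
    pairs.sort(key=lambda p: (-p[0], p[1]))  # descending score, then item id
    return pairs[:k]


def merge_top_k(left, right, k):
    """Merge two lists already sorted by descending score and keep k entries."""
    merged = heapq.merge(left, right, key=lambda p: (-p[0], p[1]))
    return list(islice(merged, k))


def recommend_for_all(user_factors, item_factors, k, block_size):
    """Return the top-k (score, item_id) pairs per user, block by block."""
    best = [[] for _ in user_factors]
    for u0 in range(0, len(user_factors), block_size):
        user_block = user_factors[u0:u0 + block_size]
        for i0 in range(0, len(item_factors), block_size):
            item_block = item_factors[i0:i0 + block_size]
            # One dense temporary matrix per block pair, instead of one
            # small object per (user, item) score.
            scores = gemm_nt(user_block, item_block)
            for r, row in enumerate(scores):
                candidates = top_k_of_row(row, i0, k)
                best[u0 + r] = merge_top_k(best[u0 + r], candidates, k)
    return best


if __name__ == "__main__":
    # Toy factors with rank = 2; the values are made up for illustration.
    users = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    items = [[1.0, 2.0], [3.0, 0.0], [0.0, 4.0], [2.0, 2.0]]
    for user_id, recs in enumerate(recommend_for_all(users, items, 2, 2)):
        print(user_id, recs)
```

The result does not depend on the block size; the block size only controls how large each temporary score matrix is, which is the knob varied in the test results below.
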
> Test Result:
> 479818 users, 13727 products, rank = 10, topK = 20.
> 3 workers, each with 35 cores. Native BLAS is Intel MKL.
> Block Size:      1000     2000     4000     8000
> Master Method:   40s      39.4s    39.5s    39.1s
> This Method:     26.5s    25.9s    26s      27.1s
> Performance Improvement: (OldTime - NewTime)/NewTime = about 50%
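
As a quick check of the formula above, the following sketch applies (OldTime - NewTime)/NewTime to the four reported timings. The variable and function names are made up for this example; the numbers are the ones in the test results. The second function is an extra measure not used in the report, included only to show that the same timings correspond to a smaller figure when expressed as a reduction in wall-clock time.

```python
block_sizes = [1000, 2000, 4000, 8000]
master_seconds = [40.0, 39.4, 39.5, 39.1]
gemm_seconds = [26.5, 25.9, 26.0, 27.1]


def improvement(old, new):
    """Improvement as defined in the report: (OldTime - NewTime) / NewTime."""
    return (old - new) / new


def time_reduction(old, new):
    """Fraction of the old wall-clock time that was saved (not in the report)."""
    return (old - new) / old


improvements = [improvement(o, n) for o, n in zip(master_seconds, gemm_seconds)]
reductions = [time_reduction(o, n) for o, n in zip(master_seconds, gemm_seconds)]

for size, imp, red in zip(block_sizes, improvements, reductions):
    print(f"block size {size}: improvement {imp:.1%}, time saved {red:.1%}")
```

With these timings the improvement falls between roughly 44% and 52% depending on block size, which matches the "about 50%" stated above, while the wall-clock time saved is roughly 31% to 34%.
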



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
