strange usage of tempfile.mkdtemp() in PySpark mllib.recommendation doctest

Han-Cheol Cho Thu, 02 Mar 2017 00:31:10 -0800

Dear Spark user mailing list members,


In PySpark's mllib.recommendation doctest, I found a somewhat strange use of the
temporary directory creation function, tempfile.mkdtemp(), in the following
part:
    # https://github.com/apache/spark/blob/master/python/pyspark/mllib/recommendation.py

    ...
    >>> import os, tempfile
    >>> path = tempfile.mkdtemp()
    >>> model.save(sc, path)
    >>> sameModel = MatrixFactorizationModel.load(sc, path)
    >>> sameModel.predict(2, 2)
    0.4...
    >>> sameModel.predictAll(testset).collect()
    [Rating(...
    >>> from shutil import rmtree
    >>> try:
    ...     rmtree(path)
    ... except OSError:
    ...     pass

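As I understand it, calling tempfile.mkdtemp() creates a temporary
directory on the LOCAL machine.
However, model.save(sc, path) writes the model data through Hadoop's FileSystem
API, so on a cluster where HDFS is the default filesystem (fs.defaultFS), the
same path refers to a directory in HDFS.
In the end, though, the doctest removes only the LOCAL temporary directory,
using shutil.rmtree().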
Shouldn't we delete the temporary directory in HDFS too?
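For example, something like the following could clean up the saved model
wherever it was actually written. This is only a sketch: cleanup_model_dir is
a hypothetical helper name, and it relies on SparkContext's private _jvm and
_jsc attributes (the Py4J gateway), not on a public PySpark API. The idea is to
resolve the path through the same Hadoop configuration that model.save() used,
so a scheme-less path lands on HDFS or the local disk exactly as it did when
saving.

```python
import shutil


def cleanup_model_dir(sc, path):
    """Hypothetical helper: remove a directory written by model.save(sc, path).

    Assumption: uses SparkContext's private _jvm/_jsc attributes, which are
    not part of the public PySpark API and may change between versions.
    """
    hadoop_fs = sc._jvm.org.apache.hadoop.fs
    # Build a Hadoop Path from the same string passed to model.save();
    # a scheme-less path is resolved against fs.defaultFS, just as when saving.
    jpath = hadoop_fs.Path(path)
    fs = jpath.getFileSystem(sc._jsc.hadoopConfiguration())
    # Recursive delete; Hadoop returns False if nothing was deleted.
    deleted = bool(fs.delete(jpath, True))
    # Also remove the local directory created by tempfile.mkdtemp(),
    # ignoring the error if the Hadoop delete already removed it.
    shutil.rmtree(path, ignore_errors=True)
    return deleted
```

In the doctest, the final try/rmtree/except block could then be replaced by a
single call, cleanup_model_dir(sc, path).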


Best wishes,
HanCheol

Han-Cheol Cho, Data Laboratory / Data Scientist
〒160-0022 Shinjuku Eastside Square 13F, 6-27-30 Shinjuku, Shinjuku-ku, Tokyo
Email  hancheol....@nhn-techorus.com