Dear Spark user mailing list members,
In PySpark's mllib.recommendation doctest, I found a somewhat odd use of the temporary-directory creation function, tempfile.mkdtemp(), in the following part:

# https://github.com/apache/spark/blob/master/python/pyspark/mllib/recommendation.py
...
>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> model.save(sc, path)
>>> sameModel = MatrixFactorizationModel.load(sc, path)
>>> sameModel.predict(2, 2)
0.4...
>>> sameModel.predictAll(testset).collect()
[Rating(...
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass

As I understand it, calling tempfile.mkdtemp() creates a temporary directory on the LOCAL machine, whereas model.save(sc, path) saves the model data in HDFS. In the end, the doctest removes only the LOCAL temp directory with shutil.rmtree(). Shouldn't we delete the temporary directory in HDFS as well?

Best wishes,
HanCheol

Han-Cheol Cho
Data Laboratory / Data Scientist
〒160-0022 Shinjuku Eastside Square 13F, 6-27-30 Shinjuku, Shinjuku-ku, Tokyo
Email hancheol....@nhn-techorus.com
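P.S. To illustrate the point, here is a minimal standalone check (no Spark involved) showing that tempfile.mkdtemp() and shutil.rmtree() only ever touch the local filesystem:

```python
import os
import tempfile
from shutil import rmtree

# tempfile.mkdtemp() creates a directory on the LOCAL filesystem only;
# it knows nothing about HDFS.
path = tempfile.mkdtemp()
print(os.path.isdir(path))   # True: the directory exists locally

# shutil.rmtree() likewise removes only that local directory.
rmtree(path)
print(os.path.exists(path))  # False: local cleanup done; anything
                             # written to HDFS would be untouched
```

So if model.save(sc, path) resolves the path against HDFS (as it does when fs.defaultFS points at an HDFS cluster), removing the saved data would presumably need a separate step, e.g. via the Hadoop FileSystem API or `hdfs dfs -rm -r` — I have not verified what the doctest environment actually uses as its default filesystem, so this is an assumption on my part.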