The slowness in PySpark may be related to the search paths added by
PySpark. Could you show your sys.path?
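
For reference, here is a minimal sketch of one way to dump sys.path on
both the driver and the executors (assuming `sc` is an active
SparkContext; the helper name is mine):

```python
import sys

# Driver-side module search path
print(sys.path)

def worker_sys_path(_):
    # Imported inside the function so it runs on the executor.
    import sys
    return sys.path

# One task per default partition; collect each worker's sys.path.
rdd = sc.parallelize(range(sc.defaultParallelism), sc.defaultParallelism)
for path in rdd.map(worker_sys_path).collect():
    print(path)
```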

On Thu, Sep 3, 2015 at 1:38 PM, Priedhorsky, Reid <rei...@lanl.gov> wrote:
>
> On Sep 3, 2015, at 12:39 PM, Davies Liu <dav...@databricks.com> wrote:
>
> I think this is not a problem with PySpark; you would also see this
> if you profiled this script:
>
> ```
> list(map(map_, range(sc.defaultParallelism)))
> ```
>
> 81777/80874    0.086    0.000    0.360    0.000 <frozen importlib._bootstrap>:2264(_handle_fromlist)
>
>
> Thanks. Yes, I think you’re right; they seem to be coming from Pandas. Plain
> NumPy calculations do not generate the numerous import-related calls.
>
> That said, I’m still not sure why the time consumed in my real program is so
> much more (~20% rather than ~1%). I will see if I can figure out a better
> test program, or maybe try a different approach.
>
> Reid
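
For anyone trying to reproduce the comparison above outside Spark, a
minimal sketch (the two map functions are hypothetical stand-ins for
the real per-record work, and whether pandas shows the
_handle_fromlist overhead may depend on the pandas version):

```python
import cProfile

import numpy as np
import pandas as pd

def map_pandas(_):
    # pandas operations can trigger lazy internal imports, which show
    # up in the profile as <frozen importlib._bootstrap> calls.
    return pd.DataFrame({"x": range(100)}).sum().iloc[0]

def map_numpy(_):
    # Plain NumPy work does not generate the same import traffic.
    return np.arange(100).sum()

# Mirror the quoted snippet, with 8 standing in for sc.defaultParallelism.
cProfile.run("list(map(map_pandas, range(8)))", sort="cumtime")
cProfile.run("list(map(map_numpy, range(8)))", sort="cumtime")
```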
