This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 7220d134670 [SPARK-46059][INFRA][PYTHON] Install `six==1.16.0` explicitly for `pandas` in Python 3.12

7220d134670 is described below

commit 7220d134670efa2474f4581e5ae22786f85e6626
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Wed Nov 22 18:51:40 2023 -0800

[SPARK-46059][INFRA][PYTHON] Install `six==1.16.0` explicitly for `pandas` in Python 3.12

### What changes were proposed in this pull request?

This PR aims to make sure that `six==1.16.0` is installed for `pandas` in Python 3.12. The pin is added to every `pip install` line that installs `pandas` in `dev/infra/Dockerfile` (PyPy 3 and Python 3.9 through 3.12) and to `dev/requirements.txt`.

### Why are the changes needed?

`import pandas` fails as shown below when the installed `six` is older than `1.16.0`. The CI image already contains `six 1.14.0`, so Python 3.12 hits this error.

**BEFORE**

```python
$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-6955850829 bash
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root@39f78dbc0836:/# python3.12
Python 3.12.0 (main, Oct 21 2023, 17:44:38) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/pandas/__init__.py", line 46, in <module>
    from pandas.core.api import (
  File "/usr/local/lib/python3.12/dist-packages/pandas/core/api.py", line 1, in <module>
    from pandas._libs import (
  File "/usr/local/lib/python3.12/dist-packages/pandas/_libs/__init__.py", line 18, in <module>
    from pandas._libs.interval import Interval
  File "interval.pyx", line 1, in init pandas._libs.interval
  File "hashtable.pyx", line 1, in init pandas._libs.hashtable
  File "missing.pyx", line 1, in init pandas._libs.missing
  File "/usr/local/lib/python3.12/dist-packages/pandas/_libs/tslibs/__init__.py", line 39, in <module>
    from pandas._libs.tslibs.conversion import localize_pydatetime
  File "conversion.pyx", line 1, in init pandas._libs.tslibs.conversion
  File "offsets.pyx", line 1, in init pandas._libs.tslibs.offsets
  File "timestamps.pyx", line 1, in init pandas._libs.tslibs.timestamps
  File "timedeltas.pyx", line 1, in init pandas._libs.tslibs.timedeltas
  File "timezones.pyx", line 24, in init pandas._libs.tslibs.timezones
  File "/usr/local/lib/python3.12/dist-packages/dateutil/tz/__init__.py", line 2, in <module>
    from .tz import *
  File "/usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py", line 21, in <module>
    from six.moves import _thread
ModuleNotFoundError: No module named 'six.moves'
```

**AFTER**

```python
$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-6955850829 bash
root@35c02e3acdc1:/# python3.12 -m pip install six==1.16.0
Collecting six==1.16.0
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: six
  Attempting uninstall: six
    Found existing installation: six 1.14.0
    Uninstalling six-1.14.0:
      Successfully uninstalled six-1.14.0
Successfully installed six-1.16.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@35c02e3acdc1:/# python3.12
Python 3.12.0 (main, Oct 21 2023, 17:44:38) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>>
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43964 from dongjoon-hyun/SPARK-46059.

Authored-by: Dongjoon Hyun <dh...@apple.com>
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 dev/infra/Dockerfile | 10 +++++-----
 dev/requirements.txt |  1 +
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index acd3ac0ce90..225de5f9ed5 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -92,8 +92,8 @@ RUN Rscript -e "devtools::install_version('preferably', version='0.4', repos='ht
 # See more in SPARK-39735
 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
-RUN pypy3 -m pip install numpy 'pandas<=2.1.3' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.3' scipy coverage matplotlib
+RUN python3.9 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
@@ -110,7 +110,7 @@ RUN apt-get update && apt-get install -y \
     python3.10 python3.10-distutils \
     && rm -rf /var/lib/apt/lists/*
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
-RUN python3.10 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
+RUN python3.10 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
 RUN python3.10 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
 RUN python3.10 -m pip install 'torch<=2.0.1' torchvision --index-url https://download.pytorch.org/whl/cpu
 RUN python3.10 -m pip install torcheval
@@ -122,7 +122,7 @@ RUN apt-get update && apt-get install -y \
     python3.11 python3.11-distutils \
     && rm -rf /var/lib/apt/lists/*
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11
-RUN python3.11 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
+RUN python3.11 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
 RUN python3.11 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
 RUN python3.11 -m pip install 'torch<=2.0.1' torchvision --index-url https://download.pytorch.org/whl/cpu
 RUN python3.11 -m pip install torcheval
@@ -134,5 +134,5 @@ RUN apt-get update && apt-get install -y \
     python3.12 python3.12-distutils \
     && rm -rf /var/lib/apt/lists/*
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12
-RUN python3.12 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'scikit-learn>=1.3.2'
+RUN python3.12 -m pip install numpy 'pyarrow>=14.0.0' 'six==1.16.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'scikit-learn>=1.3.2'
 RUN python3.12 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 0b629a3b044..66a74471377 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -4,6 +4,7 @@ py4j
 # PySpark dependencies (optional)
 numpy
 pyarrow
+six==1.16.0
 pandas
 scipy
 plotly
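A quick way to confirm that an image really picked up the pin is a small smoke check run with each interpreter. The sketch below is hypothetical and not part of Spark: the helper names (`parse_release`, `six_is_new_enough`, `check_environment`) and the `MIN_SIX` constant are illustrative, and it assumes Python 3.8+ for `importlib.metadata`. It reports a problem if `six` is older than 1.16.0 or if `six.moves`, the module `dateutil` fails to import in the BEFORE traceback, cannot be imported. Our understanding, not stated in the PR, is that `six` releases before 1.16.0 serve `six.moves` through a meta path importer built on the legacy `find_module()` hook, which the Python 3.12 import system no longer calls, and that pip keeps the existing `six 1.14.0` because it already satisfies the loose `six` requirement of `python-dateutil`; that is why an explicit pin is needed rather than relying on dependency resolution.

```python
"""Hypothetical smoke check for the `six` pin (illustrative, not part of Spark)."""
from __future__ import annotations

import importlib
from importlib.metadata import PackageNotFoundError, version

# Minimum `six` release per this commit: older releases break
# `import pandas` on Python 3.12.
MIN_SIX = (1, 16, 0)


def parse_release(ver: str) -> tuple[int, ...]:
    """Turn a dotted release string such as '1.14.0' into (1, 14, 0).

    Only the leading digits of each component are kept, so '0rc1' becomes 0;
    parsing stops at the first component that does not start with a digit.
    """
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def six_is_new_enough(ver: str, minimum: tuple[int, ...] = MIN_SIX) -> bool:
    """Return True if `ver` is at least `minimum`, treating '1.16' as '1.16.0'."""
    found = parse_release(ver)
    width = max(len(found), len(minimum))
    # Pad both tuples with zeros so they compare component by component.
    found = found + (0,) * (width - len(found))
    minimum = minimum + (0,) * (width - len(minimum))
    return found >= minimum


def check_environment() -> list[str]:
    """Collect problems with the `six` visible to the running interpreter."""
    try:
        installed = version("six")
    except PackageNotFoundError:
        return ["six is not installed"]

    problems = []
    if not six_is_new_enough(installed):
        problems.append(f"six {installed} is older than 1.16.0")
    try:
        # The same import that fails inside dateutil in the traceback.
        importlib.import_module("six.moves")
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        problems.append(f"cannot import six.moves: {exc}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "six looks fine for this interpreter")
```

Running the script with `python3.12` inside the CI image (the file name is up to you) would be expected to print the "looks fine" message once `six==1.16.0` is installed, and to flag both problems against the original `six 1.14.0`. The version comparison is deliberately simple; a project that already depends on `packaging` could use `packaging.version.Version` instead.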