This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 2d5a77bbea4a [SPARK-48116][INFRA][3.4] Run `pyspark-pandas*` only in PR builder and Daily Python CIs

2d5a77bbea4a is described below

commit 2d5a77bbea4a96916525299d277f368790ccc602
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Wed May 8 13:48:12 2024 -0700

    [SPARK-48116][INFRA][3.4] Run `pyspark-pandas*` only in PR builder and Daily Python CIs

    ### What changes were proposed in this pull request?

    This PR aims to run `pyspark-pandas*` of `branch-3.4` only in the PR builder and the Daily Python CIs. In other words, only the commit builder will skip it by default. Please note that PR builders do not consume ASF resources, and they provide a lot of test coverage every day.

    The `branch-3.4` Python Daily CI runs all Python tests, including `pyspark-pandas`, as shown here:

    https://github.com/apache/spark/blob/21548a8cc5c527d4416a276a852f967b4410bd4b/.github/workflows/build_branch34_python.yml#L43-L44

    ### Why are the changes needed?

    To reduce GitHub Actions usage in order to meet the ASF INFRA policy.
    - https://infra.apache.org/github-actions-policy.html

    > All workflows MUST have a job concurrency level less than or equal to 20. This means a workflow cannot have more than 20 jobs running at the same time across all matrices.

    Although `pandas` is an **optional** package in PySpark, it is essential for PySpark users, and we have **6 test pipelines** which require a lot of resources. We need to optimize the job concurrency level to `less than or equal to 20` while keeping as much test coverage as possible.

    https://github.com/apache/spark/blob/da0c7cc81bb3d69d381dd0683e910eae4c80e9ae/dev/requirements.txt#L4-L7

    - pyspark-pandas
    - pyspark-pandas-slow

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual review.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #46483 from dongjoon-hyun/SPARK-48116-3.4.
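The patch below relies on a GitHub Actions matrix-`exclude` idiom: the expression `<flag> != 'true' && 'pyspark-pandas'` evaluates to the string `pyspark-pandas` when the flag is off (so that matrix row is excluded) and to `false` otherwise, which matches no real row, so nothing is excluded. A minimal Python sketch of that filtering behavior follows; the helper names are illustrative only, not part of the patch or of GitHub Actions itself:

```python
# Sketch of GitHub Actions-style matrix exclusion (illustrative, not Spark code).
# When the pandas flag is enabled, the short-circuit expression yields False,
# producing exclude entries that match no module, so every pipeline runs.

def build_excludes(pandas_enabled: bool) -> list[dict]:
    """Mimic `${{ flag != 'true' && 'pyspark-pandas' }}` for both pandas rows."""
    cond = not pandas_enabled
    return [
        {"modules": "pyspark-pandas" if cond else False},
        {"modules": "pyspark-pandas-slow" if cond else False},
    ]

def apply_matrix(modules: list[str], excludes: list[dict]) -> list[str]:
    """Drop matrix rows whose `modules` value appears in an exclude entry."""
    excluded = {e["modules"] for e in excludes}
    return [m for m in modules if m not in excluded]

modules = ["pyspark-sql", "pyspark-pandas", "pyspark-pandas-slow", "pyspark-connect"]
# Commit builder on apache/spark: pandas disabled, both pandas rows dropped.
print(apply_matrix(modules, build_excludes(pandas_enabled=False)))
# PR builder / Daily Python CI: pandas enabled, all rows kept.
print(apply_matrix(modules, build_excludes(pandas_enabled=True)))
```

With the flag disabled, only `pyspark-sql` and `pyspark-connect` survive the filter; with it enabled, the `False` exclude entries are inert and all four pipelines run.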
Authored-by: Dongjoon Hyun <dh...@apple.com>
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 .github/workflows/build_and_test.yml | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 2d2e8da80d46..825ad064d078 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -88,12 +88,18 @@ jobs:
             tpcds=`./dev/is-changed.py -m sql`
             docker=`./dev/is-changed.py -m docker-integration-tests`
           fi
+          if [ "${{ github.repository != 'apache/spark' }}" ]; then
+            pandas=$pyspark
+          else
+            pandas=false
+          fi
           # 'build', 'scala-213', and 'java-11-17' are always true for now.
           # It does not save significant time and most of PRs trigger the build.
           precondition="
             {
               \"build\": \"true\",
               \"pyspark\": \"$pyspark\",
+              \"pyspark-pandas\": \"$pandas\",
               \"sparkr\": \"$sparkr\",
               \"tpcds-1g\": \"$tpcds\",
               \"docker-integration-tests\": \"$docker\",
@@ -349,6 +355,12 @@ jobs:
             pyspark-pandas-slow
           - >-
             pyspark-connect
+        exclude:
+          # Always run if pyspark-pandas == 'true', even infra-image is skip (such as non-master job)
+          # In practice, the build will run in individual PR, but not against the individual commit
+          # in Apache Spark repository.
+          - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark-pandas != 'true' && 'pyspark-pandas' }}
+          - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark-pandas != 'true' && 'pyspark-pandas-slow' }}
     env:
       MODULES_TO_TEST: ${{ matrix.modules }}
       HADOOP_PROFILE: ${{ inputs.hadoop }}

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org