Re: Building Spark to run PySpark Tests?

2023-01-19 Thread Sean Owen
It's not clear what error you're facing from this info (ConnectionError could mean lots of things), so it's hard to generalize an answer. How much memory do you have on your Mac? -Xmx2g sounds low, but also probably doesn't matter much. Spark builds work on my Mac, FWIW. On Thu, Jan 19, 2023 at
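
For reference, the heap for the Maven build is typically raised via MAVEN_OPTS; a minimal sketch, assuming a bash shell (the values are illustrative; the Spark build docs suggest similar settings):

  # Give the Maven JVM more headroom before building (values illustrative)
  export MAVEN_OPTS="-Xss64m -Xmx4g -XX:ReservedCodeCacheSize=1g"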

Re: Building Spark to run PySpark Tests?

2023-01-19 Thread Adam Chhina
Hmm, would there be a list of common env issues that would interfere with builds? Looking up the error message, it seemed like the issue was often the JVM process running out of memory (OOM). I’m not sure if that’s what’s happening here, since during the build and setting up the tests the config should have

Re: Building Spark to run PySpark Tests?

2023-01-18 Thread Sean Owen
Release _branches_ are tested as commits arrive on the branch, yes. That's what you see at https://github.com/apache/spark/actions Released versions are fixed, they don't change, and were also manually tested before release, so no, they are not re-tested; there is no need. You presumably have some

Re: Building Spark to run PySpark Tests?

2023-01-18 Thread Adam Chhina
Hi Sean, That’s fair in regards to 3.3.x being the current release branch. I’m not familiar with the testing schedule, but I had assumed all currently supported release versions would have some nightly/weekly tests run; is that not the case? I only ask because when I’m seeing these test

Re: Building Spark to run PySpark Tests?

2023-01-18 Thread Sean Owen
That isn't the released version either, but rather the head of the 3.2 branch (which is beyond 3.2.3). You may want to check out the v3.2.3 tag instead: https://github.com/apache/spark/tree/v3.2.3 ... instead of 3.2.1. But note, of course, that 3.3.x is the current release branch anyway. Hard to say
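
For reference, checking out the fixed release point rather than a moving branch head looks roughly like this (v3.2.3 being the tag linked above):

  # Fetch tags and check out the released tag (leaves you on a detached HEAD)
  git fetch --tags
  git checkout v3.2.3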

Re: Building Spark to run PySpark Tests?

2023-01-18 Thread Adam Chhina
Oh, whoops, didn’t realize that wasn’t the release version, thanks! > git clone --branch branch-3.2 https://github.com/apache/spark.git Ah, so the old failing tests are passing now, but I am seeing failures in `pyspark.tests.test_broadcast` such as `test_broadcast_value_against_gc`, with a
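
For reference, a single PySpark test module can be targeted with the run-tests script in the repo; a sketch, assuming the --testnames flag works as described in the developer docs:

  # Run only the broadcast test module
  python/run-tests --testnames 'pyspark.tests.test_broadcast'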

Re: Building Spark to run PySpark Tests?

2023-01-18 Thread Bjørn Jørgensen
Replace > > git clone git@github.com:apache/spark.git > > git checkout -b spark-321 v3.2.1 with git clone --branch branch-3.2 https://github.com/apache/spark.git This will give you branch 3.2 as of today, which I suppose is what you call upstream https://github.com/apache/spark/commits/branch-3.2 and right

Re: Building Spark to run PySpark Tests?

2023-01-18 Thread Sean Owen
Never seen those, but it's probably a difference in pandas/numpy versions. You can see the current CI/CD test results in GitHub Actions. But you want to use release versions, not an RC. 3.2.1 is not the latest version, and it's possible the tests were actually failing in the RC. On Wed, Jan 18,
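
A quick way to compare the local pandas/numpy versions against what the CI workflow pins (a minimal one-liner sketch):

  # Print the installed pandas and numpy versions
  python -c 'import pandas as pd, numpy as np; print(pd.__version__, np.__version__)'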

Re: Building Spark to run PySpark Tests?

2023-01-18 Thread Adam Chhina
Bump. Just trying to see where I can find which tests are known to be failing for a particular release, to ensure I’m building upstream correctly following the build docs. I figured this would be the best place to ask, as it pertains to building and testing upstream (also more than happy to provide a

Building Spark to run PySpark Tests?

2022-12-27 Thread Adam Chhina
As part of an upgrade, I was looking to run upstream PySpark unit tests on v3.2.1-rc2 before applying some downstream patches and testing those. However, I’m running into some issues with failing unit tests, which I’m not sure are failing upstream or due to some steps I missed in the build. The
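
For reference, the basic build-then-test sequence from the Spark build docs looks roughly like this (flags vary by version and build profile):

  # Build Spark without running the JVM tests, then run the PySpark suite
  ./build/mvn -DskipTests clean package
  python/run-tests --python-executables=python3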