Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Holden Karau
Can you send me the output of those two commands? On Fri, May 1, 2020 at 1:46 PM Xiangyu Li wrote: > Hi Holden, > > Please check the second email of mine in this email chain. I did that > originally and to quote my email: > > > =

Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Xiangyu Li
Hi Holden, Please check the second email of mine in this email chain. I did that originally and to quote my email: === In the spark-2.4.5 src directory, I just did a simple: `./build/mvn -DskipTests clean pac

Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Xiangyu Li
I need the pip packaging; all these efforts are actually to get a pyspark pip package. On Fri, May 1, 2020 at 4:38 PM Sean Owen wrote: > I see, that makes more sense, though I have limited knowledge of how the > pip packaging works. You don't need pip packaging, do you? Just pyspark > itself rig

Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Holden Karau
Your problem isn't the missing license per se (that just happens to be the first error). I don't believe that is the way we expect users to pip install the Python library. pip will only install directories/targets underneath the directory where setup.py lives, hence the deps directory which is construct
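As a rough illustration of the point about what sits underneath setup.py, here is a minimal sketch that lists the immediate subdirectories of a directory containing setup.py. The function name `list_dirs_beside_setup` is hypothetical and not part of Spark or pip; it only inspects the filesystem and makes no claim about what pip actually packages.

```shell
# Hypothetical helper (not part of Spark or pip): show which directories
# sit directly beside setup.py in the given directory.
list_dirs_beside_setup() {
    # $1: directory expected to contain setup.py
    if [ ! -f "$1/setup.py" ]; then
        echo "no setup.py in $1" >&2
        return 1
    fi
    # Print immediate subdirectories, one per line, without the leading path.
    for d in "$1"/*/; do
        [ -d "$d" ] && basename "$d"
    done
    return 0
}
```

Running `list_dirs_beside_setup python` from the root of an extracted distribution would show whether a deps directory is present next to setup.py.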

Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Sean Owen
I see, that makes more sense, though I have limited knowledge of how the pip packaging works. You don't need pip packaging, do you? Just pyspark itself, right? Omit --pip? On Fri, May 1, 2020 at 3:32 PM Xiangyu Li wrote: > make-distribution.sh with --pip would run a `python setup.py sdist` within

Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Xiangyu Li
make-distribution.sh with --pip would run a `python setup.py sdist` within that make-distribution.sh script. I also tested `make-distribution.sh` without --pip, and the same error happens. Correct me if I'm wrong, but the pyspark binary has always been built successfully; it is the pyspark pip package

Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Sean Owen
Hm, others may have to chime in here. Either that's not how you create the pyspark binary from the source release (make-distribution.sh doesn't do that?) or there is a small but important issue here, that the source release doesn't contain one thing that the binary release script expects, which is

Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Xiangyu Li
To reproduce this, I just did:

    curl -O http://www.trieuvan.com/apache/spark/spark-2.4.5/spark-2.4.5.tgz
    tar xzf spark-2.4.5.tgz
    cd spark-2.4.5
    ./dev/make-distribution.sh --name custom-spark --pip --tgz -Phadoop-2.7
    mv spark-2.4.5-bin-custom-spark.tgz ../
    cd ..
    tar xzf spark-2.4.5-bin-custom-spark.t
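One way to check whether a tarball produced by these steps contains a license file is to list its entries before extracting it. The sketch below is an assumption-laden illustration: the function name `list_license_entries` is hypothetical, and it simply matches any archive entry whose final path component starts with `LICENSE`.

```shell
# Hypothetical helper (not part of Spark): list LICENSE-like files
# inside a gzip-compressed tarball without extracting it.
list_license_entries() {
    # $1: path to a .tgz archive
    # Match entries whose last path component begins with "LICENSE",
    # e.g. "LICENSE" or "LICENSE-binary"; print nothing if none match.
    tar tzf "$1" | grep -E '(^|/)LICENSE[^/]*$' || true
}
```

For example, `list_license_entries spark-2.4.5-bin-custom-spark.tgz` prints the matching paths, or nothing at all if the archive has no such file.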

Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Sean Owen
You wrote: " 2. On each machine, I can install pyspark by running `python setup.py install` inside the python directory. Step 2 would fail because of missing the licenses directory. " That shouldn't depend on the license file, and the script you showed does not fail when not present, so I am won

Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Xiangyu Li
Hmm, sorry, I don't get which part of my email you were referring to when you said "the build fails?". I am trying to build a custom spark binary distribution with, say, different Hadoop versions and R support. I then store this custom build on S3, so that as I build more machines I can just d

Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Sean Owen
Hm, the build fails? You can see this is just skipped if not present, for this reason. I'm not clear why you need the file for its own sake, for your own internal modification that you don't redistribute. On Fri, May 1, 2020 at 11:43 AM Xiangyu Li wrote: > Hi Sean, > > Thanks for the quick res

Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Xiangyu Li
Hi Sean, Thanks for the quick response! Yes, what you described about how the LICENSE file should be distributed makes sense. The reason I learned about this is that I was trying to build spark-2.4.5-bin-custom.tgz, then distribute this build to multiple machines, so that: 1. These machines can run

Re: No LICENSE file in spark custom build distribution

2020-05-01 Thread Sean Owen
The source distribution has the source LICENSE file. The binary distribution has the LICENSE-binary license file. The source release isn't supposed to have LICENSE-binary as it would not be accurate for that release; LICENSE is. If you're redistributing a build, you'll have your own process for mod

No LICENSE file in spark custom build distribution

2020-05-01 Thread Xiangyu Li
Hello, I downloaded the spark-2.4.5 source from https://mirrors.ocf.berkeley.edu/apache/spark/spark-2.4.5/spark-2.4.5.tgz. After extracting it and running: ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes It c