[ https://issues.apache.org/jira/browse/SPARK-28749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen reassigned SPARK-28749:
---------------------------------

    Assignee: Matt Foley

> Fix PySpark tests not to require kafka-0-8 in branch-2.4
> --------------------------------------------------------
>
>                 Key: SPARK-28749
>                 URL: https://issues.apache.org/jira/browse/SPARK-28749
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Tests
>    Affects Versions: 2.4.3
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>            Priority: Minor
>
> As noted in SPARK-27550 we want to encourage testing of Spark 2.4.x with
> Scala-2.12, and kafka-0-8 does not support Scala-2.12.
> Currently, the PySpark tests invoked by `python/run-tests` demand the
> presence of kafka-0-8 libraries. If not present, this failure message will be
> generated:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
>     exec code in run_globals
>   File "spark/python/pyspark/streaming/tests.py", line 1579, in <module>
>     kafka_assembly_jar = search_kafka_assembly_jar()
>   File "spark/python/pyspark/streaming/tests.py", line 1524, in
> search_kafka_assembly_jar
>     "You need to build Spark with "
> Exception: Failed to find Spark Streaming kafka assembly jar in
> spark/external/kafka-0-8-assembly. You need to build Spark with 'build/sbt
> -Pkafka-0-8 assembly/package streaming-kafka-0-8-assembly/assembly' or
> 'build/mvn -DskipTests -Pkafka-0-8 package' before running this test.
> Had test failures in pyspark.streaming.tests with
> spark/py_virtenv/bin/python; see logs.
> Process exited with code 255
> {code}
> This change is only targeted at branch-2.4, as most kafka-0-8 related
> materials have been removed in master and this problem no longer occurs there.
> PROPOSED SOLUTION
> The proposed solution is to make the kafka-0-8 stream testing optional for
> pyspark testing, exactly the same as the Kinesis stream testing currently is,
> in file `python/pyspark/streaming/tests.py`. This is only a few lines of
> change.
> Ideally it would be limited to when SPARK_SCALA_VERSION >= 2.12, but it turns
> out to be somewhat onerous to reliably obtain that value from within the
> python test env, and no other python test code currently does so. So my
> proposed solution simply makes the use of the kafka-0-8 profile optional, and
> leaves it to the tester to include it for Scala-2.11 test builds and exclude
> it for Scala-2.12 test builds.
> PR will be available in a day or so.
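
The opt-in behaviour described above could look roughly like the following. This is a minimal sketch, not the actual patch: the environment variable name `ENABLE_KAFKA_0_8_TESTS`, the helper names `find_kafka_assembly_jar` and `kafka_test_plan`, and the assumed jar locations under `target/` are all hypothetical and chosen for illustration. It targets Python 3, whereas the traceback above came from a Python 2.7 run.

```python
import glob
import os

# Hypothetical switch name, used here for illustration only.
KAFKA_TEST_ENV_VAR = "ENABLE_KAFKA_0_8_TESTS"


def find_kafka_assembly_jar(spark_home):
    """Return the kafka-0-8 assembly jar path, or None when it was not built."""
    assembly_dir = os.path.join(spark_home, "external", "kafka-0-8-assembly")
    # Assumed layout: the build writes the jar directly under target/ or
    # under target/scala-<version>/.
    patterns = [
        os.path.join(assembly_dir, "target", "*assembly*.jar"),
        os.path.join(assembly_dir, "target", "scala-*", "*assembly*.jar"),
    ]
    jars = sorted(path for pattern in patterns for path in glob.glob(pattern))
    if len(jars) > 1:
        raise RuntimeError(
            "Found multiple kafka-0-8 assembly jars: " + ", ".join(jars))
    return jars[0] if jars else None


def kafka_test_plan(spark_home, environ=None):
    """Decide whether the kafka-0-8 tests should run.

    Returns a tuple (run_tests, jar_path, message).
    """
    environ = os.environ if environ is None else environ
    if environ.get(KAFKA_TEST_ENV_VAR) != "1":
        # Default: the tests are optional, so a missing jar is not an error.
        message = ("Skipping kafka-0-8 tests; set %s=1 to enable them."
                   % KAFKA_TEST_ENV_VAR)
        return False, None, message
    jar = find_kafka_assembly_jar(spark_home)
    if jar is None:
        # The tester asked for the tests explicitly, so fail loudly.
        raise RuntimeError(
            "kafka-0-8 tests were requested but no assembly jar was found; "
            "build Spark with the kafka-0-8 profile first.")
    return True, jar, "Running kafka-0-8 tests with " + jar
```

Under this sketch, a tester with a Scala-2.11 build that includes the kafka-0-8 profile would export the variable before invoking `python/run-tests`, while a Scala-2.12 tester would leave it unset and the kafka-0-8 tests would be skipped instead of aborting the whole run.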