Re: [I] [PYTHON] Allow `PathLike` path objects as input to `readwriter` [spark]

2026-04-06 Thread via GitHub


gaogaotiantian commented on issue #55203:
URL: https://github.com/apache/spark/issues/55203#issuecomment-4195952338

   Do we have to provide a matching experience in Python and Scala? Path 
objects in Python and Scala are different anyway. Could we make this a 
PySpark-specific feature? We already have slightly different APIs for Python 
and Scala, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] [PYTHON] Allow `PathLike` path objects as input to `readwriter` [spark]

2026-04-06 Thread via GitHub


HyukjinKwon commented on issue #55203:
URL: https://github.com/apache/spark/issues/55203#issuecomment-4195934063

   We should probably look at the Scala/Java side together and support the 
similar Path instances there. I believe the work would be pretty large, and 
the benefit we get from it doesn't seem worth it to me. But I won't go 
against the idea.





Re: [I] [PYTHON] Allow `PathLike` path objects as input to `readwriter` [spark]

2026-04-06 Thread via GitHub


gaogaotiantian commented on issue #55203:
URL: https://github.com/apache/spark/issues/55203#issuecomment-4195890009

   I think it's a reasonable idea. @HyukjinKwon 





[I] [PYTHON] Allow `PathLike` path objects as input to `readwriter` [spark]

2026-04-05 Thread via GitHub


Ivernoerve opened a new issue, #55203:
URL: https://github.com/apache/spark/issues/55203

   ### Description
   
   Currently, the read and write functionality in [`pyspark.sql.readwriter`] 
supports paths of the form `PathOrPaths = Union[str, List[str]]`.
   
   `pathlib.Path` is a widely used way to manage path-like objects in 
Python and is heavily adopted by the community. Allowing the readers and 
writers to consume path objects makes PySpark more Python "native" by 
accepting commonly used standard-library data structures.
   
   Supporting `os.PathLike` objects would reduce friction between Spark and 
Python.
   
   ### Motivation
   
   pathlib is part of the Python standard library and is widely adopted across 
the ecosystem due to its improved readability and safety compared to raw 
strings. Many Python libraries already accept PathLike objects (via 
os.PathLike), making this a good inclusion for aligning PySpark with modern 
Python.
   Users working with PySpark often need to manually convert Path objects to 
strings before passing them into Spark APIs. This introduces unnecessary 
friction and deviates from commonly adopted Python practices.
   
   Supporting PathLike objects would:
   * Align PySpark with modern Python standards
   * Reduce boilerplate conversions such as `str(path)` or `os.fspath(path)`
   * Make PySpark feel more "native" in Python environments
   
   ### Proposed Change
   
   Extend the accepted input types for path arguments in pyspark.sql.readwriter 
from:
   
   `PathOrPaths = Union[str, List[str]]` 
   
   to `PathOrPaths = Union[str, os.PathLike, List[Union[str, os.PathLike]]]` 
   
   Internally, the `PathLike` objects would be normalized back to strings 
before being passed along to the `jreader`.
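
A minimal sketch of that normalization step, assuming a helper named `normalize_paths` (hypothetical, not an existing PySpark function) and the widened type alias from above. It only relies on `os.fspath` and `os.PathLike` from the standard library; how the result would be wired into the reader and writer is left open.

```python
import os
from pathlib import PurePosixPath
from typing import List, Union

# Assumed aliases mirroring the proposed widened signature.
PathLikeOrStr = Union[str, "os.PathLike[str]"]
PathOrPaths = Union[PathLikeOrStr, List[PathLikeOrStr]]


def normalize_paths(path: PathOrPaths) -> List[str]:
    """Hypothetical helper: turn a str / PathLike / list of either into List[str]."""
    # A single str or PathLike is treated as a one-element list.
    items = [path] if isinstance(path, (str, os.PathLike)) else list(path)

    normalized = []
    for item in items:
        # os.fspath returns str unchanged and calls __fspath__ on PathLike objects.
        value = os.fspath(item)
        if isinstance(value, bytes):
            # This sketch only supports text paths.
            raise TypeError("bytes paths are not supported")
        normalized.append(value)
    return normalized


single = normalize_paths(PurePosixPath("/data/events.parquet"))
mixed = normalize_paths(["/data/a.json", PurePosixPath("/data/b.json")])
```

Existing string inputs pass through unchanged, which is what keeps the change backward compatible. One caveat for this sketch: `pathlib` collapses repeated slashes, so a URI such as `s3://bucket/x` is best kept as a plain string rather than wrapped in a `Path`.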
   
   
   This change is fully backward compatible, as it only expands the accepted 
input types without altering existing behavior.
   The proposed change increases the public API's flexibility without breaking 
existing usage.

