Hi, just thought I'd chime in on this point:
> - In your case, the partitioning has the same name as one of the actual columns in the data files. I am not sure this corner case of duplicate fields is tested very well, or how the filtering will work?

I _think_ this is the default behaviour for pyspark writes, e.g. the column ends up both in the data files and in the partition path. I think this might actually make sense, though: putting the partition column in the schema means you'll know what type it should be when you read it back from disk (at least for data files that support schemas).

--
Kind regards,
Robin Kåveland
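PS: a stdlib-only sketch of the typing point, no Spark involved (the layout and values here are made up for illustration): a Hive-style partition value recovered from the directory name is just a string, while the same column duplicated inside a schema-carrying data file keeps its actual type.

```python
import json
import os
import tempfile

# Hypothetical Hive-style layout: a "year=2021" partition directory
# containing a data file that also carries the "year" column.
root = tempfile.mkdtemp()
part_dir = os.path.join(root, "year=2021")
os.makedirs(part_dir)
path = os.path.join(part_dir, "part-0.json")

# Duplicate the partition column inside the data file as well.
with open(path, "w") as f:
    f.write(json.dumps({"id": 1, "year": 2021}) + "\n")

# Value recovered from the directory name is untyped (a string):
from_path = os.path.basename(part_dir).split("=", 1)[1]

# Value read back from the data file keeps its type:
with open(path) as f:
    from_file = json.loads(f.readline())["year"]

print(type(from_path).__name__)  # str
print(type(from_file).__name__)  # int
```

Readers that only see the partition path have to guess or be told the type, which is why keeping the column in the file schema can be the less surprising choice.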
