Joris Van den Bossche created ARROW-10695:
---------------------------------------------

             Summary: [C++][Dataset] Allow to use a UUID in the 
basename_template when writing a dataset
                 Key: ARROW-10695
                 URL: https://issues.apache.org/jira/browse/ARROW-10695
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Joris Van den Bossche


Currently the user can specify a {{basename_template}}, which can include a 
{{"\{i\}"}} placeholder that is replaced with an automatically incremented 
integer (so that each file written to a single partition gets a unique name):

https://github.com/apache/arrow/blob/master/python/pyarrow/dataset.py#L713-L717

It _might_ be useful to also have the ability to use a UUID, to ensure the file 
name is unique in general (not only within a single write) and to mimic the 
behaviour of the old {{write_to_dataset}} implementation.

For example, we could look for a {{"\{uuid\}"}} in the template string and, if 
present, replace it with a new UUID for each file.
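
To make the idea concrete, here is a minimal sketch of how such a substitution 
might behave. It is written in plain Python for illustration only and is not 
the actual Arrow implementation: the helper name {{expand_basename_template}} 
is hypothetical, and the use of the hex form of a random (version 4) UUID is an 
assumption.

```python
import uuid


def expand_basename_template(template: str, index: int) -> str:
    """Hypothetical helper: expand the placeholders in a basename template.

    "{i}" is replaced with the per-write file counter (the existing behaviour
    described above); "{uuid}" is replaced with a freshly generated UUID for
    every file (the behaviour proposed in this issue).
    """
    name = template.replace("{i}", str(index))
    if "{uuid}" in name:
        # Assumption: a random UUID in hex form; a new one is drawn per call,
        # so every generated file name gets its own UUID.
        name = name.replace("{uuid}", uuid.uuid4().hex)
    return name


# Two files from the same write get different UUIDs and different counters.
first = expand_basename_template("part-{uuid}-{i}.parquet", 0)
second = expand_basename_template("part-{uuid}-{i}.parquet", 1)
```

With a template that only contains {{"\{i\}"}}, the sketch behaves like the 
current counter-based naming; adding {{"\{uuid\}"}} makes names from separate 
writes into the same directory very unlikely to collide.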



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
