Andy Grove created ARROW-11059:
----------------------------------

             Summary: [Rust] [DataFusion] Implement extensible configuration mechanism
                 Key: ARROW-11059
                 URL: https://issues.apache.org/jira/browse/ARROW-11059
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Rust - DataFusion
            Reporter: Andy Grove
            Assignee: Andy Grove
             Fix For: 3.0.0


We are getting to the point where there are multiple settings we could add to 
operators to fine-tune performance. Custom operators provided by crates that 
extend DataFusion may also need this capability.

I propose that we add support for key-value configuration options so that we 
don't need to plumb through each new configuration setting that we add.

For example, I am about to start on a "coalesce batches" operator and I would 
like a setting such as "coalesce.batch.size".
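
As a rough illustration of the proposal, a key-value store might look like the 
sketch below. The names ConfigOptions, set, get_or and coalesce_batch_size are 
hypothetical and not part of DataFusion, and the fallback value of 4096 is an 
assumed placeholder rather than a proposed default.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical key-value configuration store (not an existing DataFusion type).
#[derive(Debug, Default, Clone)]
pub struct ConfigOptions {
    settings: HashMap<String, String>,
}

impl ConfigOptions {
    /// Creates an empty set of options.
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style setter; values are stored as strings so that crates
    /// extending DataFusion can add their own keys without new plumbing.
    pub fn set(mut self, key: &str, value: &str) -> Self {
        self.settings.insert(key.to_string(), value.to_string());
        self
    }

    /// Returns the parsed value for `key`, or `default` when the key is
    /// absent or its value does not parse as `T`.
    pub fn get_or<T: FromStr>(&self, key: &str, default: T) -> T {
        self.settings
            .get(key)
            .and_then(|v| v.parse::<T>().ok())
            .unwrap_or(default)
    }
}

/// How an operator might read its setting. The 4096 fallback is an assumption.
pub fn coalesce_batch_size(config: &ConfigOptions) -> usize {
    config.get_or("coalesce.batch.size", 4096)
}
```

Storing values as strings keeps the store open to keys the core crate knows 
nothing about, at the cost of deferring type errors to the point of lookup.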

For built-in settings like this, we can attach metadata such as a description 
and a default value, and generate user documentation from that metadata.

For example, here is how Spark defines configs:
{code:scala}
val PARQUET_VECTORIZED_READER_ENABLED =
  buildConf("spark.sql.parquet.enableVectorizedReader")
    .doc("Enables vectorized parquet decoding.")
    .version("2.0.0")
    .booleanConf
    .createWithDefault(true)
{code}
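
A comparable builder in Rust could be sketched as follows. This is only an 
illustration under stated assumptions: ConfigDefinition, ConfigBuilder, 
build_conf, builtin_definitions and generate_docs are hypothetical names, and 
the description text and default of 4096 for "coalesce.batch.size" are 
placeholders.

```rust
/// Hypothetical description of one built-in setting.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigDefinition {
    pub key: &'static str,
    pub doc: &'static str,
    pub default_value: String,
}

/// Hypothetical builder, loosely modelled on Spark's `buildConf` chain.
pub struct ConfigBuilder {
    key: &'static str,
    doc: &'static str,
}

/// Starts the definition of a setting with the given key.
pub fn build_conf(key: &'static str) -> ConfigBuilder {
    ConfigBuilder { key, doc: "" }
}

impl ConfigBuilder {
    /// Attaches the human-readable description.
    pub fn doc(mut self, doc: &'static str) -> Self {
        self.doc = doc;
        self
    }

    /// Finishes the definition, recording the default as a string.
    pub fn create_with_default<T: ToString>(self, default: T) -> ConfigDefinition {
        ConfigDefinition {
            key: self.key,
            doc: self.doc,
            default_value: default.to_string(),
        }
    }
}

/// Built-in settings; the description and default here are placeholders.
pub fn builtin_definitions() -> Vec<ConfigDefinition> {
    vec![build_conf("coalesce.batch.size")
        .doc("Target number of rows per batch produced by the coalesce batches operator.")
        .create_with_default(4096)]
}

/// Renders one line per setting, preceded by a header line.
pub fn generate_docs(definitions: &[ConfigDefinition]) -> String {
    let mut out = String::from("key | default | description\n");
    for def in definitions {
        out.push_str(&format!("{} | {} | {}\n", def.key, def.default_value, def.doc));
    }
    out
}
```

Keeping the description and default next to the key means the documentation 
is generated from the same definitions the code uses, so the two are less 
likely to drift apart.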



