Hi, 




On 1,2. Yes, you are right, moving the getter to the component level Config 
class itself. 


On 3, HoodieWriteConfig can also set value through ConfigOption, small code 
snippets.
From the bellow snippets, we can see that clients need to know each component's 
builders 
and also call their "with" methods to override the default value in old version.


But, in new version, clients just need to know each component's public config 
options, just like constants.
So, these builders are redundant.
     
/---------------------------------------------------------------------------/


public class HoodieIndexConfigOptions {
  public static final ConfigOption<String> INDEX_TYPE = ConfigOption
          .key("hoodie.index.type")
          .defaultValue(HoodieIndex.IndexType.BLOOM.name());
}


public class HoodieWriteConfig {
  public void setString(ConfigOption<String> option, String value) {
    this.props.put(option.key(), value);
  }
}




/**
 * New version
 */
// set value overrite the default value
HoodieWriteConfig config = new HoodieWriteConfig();
config.set(HoodieIndexConfigOptions.INDEX_TYPE, 
HoodieIndex.IndexType.HBASE.name())




/**
 * Old version
 */
HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder()
builder.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build())


/---------------------------------------------------------------------------/


Another, users use hudi like bellow, here're all keys.
/---------------------------------------------------------------------------/


df.write.format("hudi").
    option("hoodie.insert.shuffle.parallelism", "10").
    option("hoodie.upsert.shuffle.parallelism", "10").
    option("hoodie.delete.shuffle.parallelism", "10").
    option("hoodie.bulkinsert.shuffle.parallelism", "10").
    option("hoodie.datasource.write.recordkey.field", "name").
    option("hoodie.datasource.write.partitionpath.field", "location").
    option("hoodie.datasource.write.precombine.field", "ts").
    option("hoodie.table.name", tableName).
    mode(Overwrite).
    save(basePath);


/---------------------------------------------------------------------------/




Last, as I responsed to @vino, it's reasonable to handle fallbackkeys. I think 
we need to do this step by step,
it's easy to integrate FallbackKey in the future, it is not what we need right 
now in my opinion.


If some places are still not very clear, feel free to feedback.




Best,
lamber-ken












At 2019-12-11 23:41:31, "Vinoth Chandar" <vin...@apache.org> wrote:
>Hi Lamber-ken,
>
>I looked at the sample PR you put up as well.
>
>On 1,2 => Seems your intent is to replace these with moving the getter to
>the component level Config class itself? I am fine with that (although I
>think its not that big of a hurdle really to use atm). But, once we do that
>we could pass just the specific component config into parts of code versus
>passing in the entire HoodieWriteConfig object. I am fine with moving the
>classes to a ConfigOption class as you suggested as well.
>
>On 3, I still we feel we will need the builder pattern going forward. to
>build the HoodieWriteConfig object. Like below? Cannot understand why we
>would want to change this. Could you please clarify?
>
>HoodieWriteConfig.Builder builder =
>    
> HoodieWriteConfig.newBuilder().withPath(cfg.targetBasePath).combineInput(cfg.filterDupes,
>true)
>        
> .withCompactionConfig(HoodieCompactionConfig.newBuilder().withPayloadClass(cfg.payloadClassName)
>            // Inline compaction is disabled for continuous mode.
>otherwise enabled for MOR
>            .withInlineCompaction(cfg.isInlineCompactionEnabled()).build())
>        .forTable(cfg.targetTableName)
>        
> .withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build())
>        .withAutoCommit(false).withProps(props);
>
>
>Typically, we write RFCs for large changes that breaks existing behavior or
>introduces significantly complex new features.. If you are just planning to
>do the refactoring into ConfigOption class, per se you don't need a RFC.
>But , if you plan to address the fallback keys (or) your changes are going
>to break/change existing jobs, we would need a RFC.
>
>>> It is not clear to me whether there is any external facing changes which
>changes this model.
>I am still unclear on this as well. can you please explicitly clarify?
>
>thanks
>vinoth
>
>
>On Tue, Dec 10, 2019 at 12:35 PM lamberken <lamber...@163.com> wrote:
>
>>
>> Hi, @Balaji @Vinoth
>>
>>
>> I'm sorry, some places are not very clear,
>>
>>
>> 1, We can see that HoodieMetricsConfig, HoodieStorageConfig, etc.. already
>> defined in project.
>>    But we get property value by methods which defined in
>> HoodieWriteConfig, like HoodieWriteConfig#getParquetMaxFileSize,
>>    HoodieWriteConfig#getParquetBlockSize, etc. It's means that
>> Hoodie*Config are redundant.
>>
>>
>> 2, These Hoodie*Config classes are used to set default value when call
>> their build method, nothing else.
>>
>>
>> 3, For current plan is keep the Builder pattern when configuring, when we
>> are familiar with the config framework,
>>    We will find that Hoodie*Config class are redundant and methods
>> prefixed with "get" in HoodieWriteConfig are also redundant.
>>
>>
>> In addition, I create a pr[1] for initializing with a demo. At this demo,
>> I create
>> MetricsGraphiteReporterOptions which contains HOST, PORT, PREFIX, and
>> remove getGraphiteServerHost,
>> getGraphiteServerPort, getGraphiteMetricPrefix in HoodieMetricsConfig.
>>
>>
>> https://github.com/apache/incubator-hudi/pull/1094
>>
>>
>> Best,
>> lamber-ken
>>
>>
>>
>>
>>
>>
>>
>> At 2019-12-11 02:35:30, "Balaji Varadarajan" <v.bal...@ymail.com.INVALID>
>> wrote:
>> > Hi Lamber-Ken,
>> >Thanks for the time writing the proposal and thinking about improving
>> Hudi usability.
>> >My preference would be to keep the Builder pattern when configuring. It
>> is something I find it natural when configuring. It is not clear to me
>> whether there is any external facing changes which changes this model.
>> Would you mind adding some more details on the RFC. It would save time to
>> read it in one place as opposed to checking out github repo :)
>> >Thanks,Balaji.V
>> >    On Tuesday, December 10, 2019, 07:55:01 AM PST, Vinoth Chandar <
>> vin...@apache.org> wrote:
>> >
>> > Hi ,
>> >
>> >Thanks for the proposal. Some parts I agree, some parts I don't and some
>> >parts are unclear
>> >
>> >Agree :
>> >- On introducing a class that binds key, default value, provided value,
>> and
>> >also may be a doc along with it (?).
>> >- Designing the framework to have fallback keys is good IMO. It helps us
>> do
>> >things like https://issues.apache.org/jira/browse/HUDI-89
>> >
>> >Disagree :
>> >- Not all configuration values are in HoodieWriteConfig, its not accurate.
>> >Configs are already split by components into HoodieIndexConfig,
>> >HoodieCompactionConfig etc..
>> >- There are helpers for all these conveniently located in
>> >HoodieWriteConfig. I think some of the claims of usability seem subjective
>> >to me, speaking from hands-on experience writing jobs. So, if you
>> proposing
>> >a large shake up (e.g not have a single properties file load all
>> >components), I would love to understand this at more depth. From my
>> >experience, well namespaced configs in a single properties file keeps it
>> >simple and understandable.
>> >
>> >Unclear :
>> >- What is impact on existing jobs - using  RDD/WriteClient API,
>> DataSource,
>> >DeltaStreamer level? Do you intend to change namespacing of configs?
>> >
>> >
>> >Thanks
>> >Vinoth
>> >
>> >On Tue, Dec 10, 2019 at 6:44 AM lamberken <lamber...@163.com> wrote:
>> >
>> >>
>> >>
>> >> Hi, vino
>> >>
>> >>
>> >> Reasonable,  we can refactor this step by step. The first step now is to
>> >> introduce the config framework.
>> >> When our community is familiar with the config framework mechanism, it's
>> >> easy to integrate FallbackKey in the future.
>> >>
>> >>
>> >> Best,
>> >> lamber-ken
>> >>
>> >>
>> >>
>> >> At 2019-12-10 11:51:22, "vino yang" <yanghua1...@gmail.com> wrote:
>> >> >Hi Lamber,
>> >> >
>> >> >Thanks for the proposal. +1 from my side.
>> >> >
>> >> >When it comes to configuration, it will involve how we handle
>> deprecated
>> >> >configuration items in the future. In my opinion, we need to take this
>> >> into
>> >> >consideration when designing. There are already some successful
>> practices
>> >> >for our reference. For example, Flink defines some deprecated
>> >> >configurations as FallbackKey[1]. Maybe we can learn from these
>> designs.
>> >> >
>> >> >WDYT?
>> >> >
>> >> >[1]:
>> >> >
>> >>
>> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/FallbackKey.java
>> >> >
>> >> >Best,
>> >> >Vino
>> >> >
>> >> >lamberken <lamber...@163.com> 于2019年12月9日周一 下午11:19写道:
>> >> >
>> >> >>
>> >> >>
>> >> >> Hi, all
>> >> >>
>> >> >>
>> >> >> Currently, many configuration items and their default values are
>> >> dispersed
>> >> >> in the config file like HoodieWriteConfig. It’s very confused for
>> >> >> developers, and it's easy for developers to use them in a wrong place
>> >> >> especially when there are more and more configuration items. If we
>> can
>> >> >> solve this, developers will benefit from it and the code structure
>> will
>> >> be
>> >> >> more concise.
>> >> >>
>> >> >>
>> >> >> I had create a JIRA[1] and a under discuss RFC[2] to explain how to
>> >> solve
>> >> >> the problem, if you are interested in this, you can visit jira and
>> RFC
>> >> for
>> >> >> detail. Any comments and feedback are welcome, WDYT?
>> >> >>
>> >> >>
>> >> >> Best,
>> >> >> lamber-ken
>> >> >>
>> >> >>
>> >> >> [1] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-375
>> >> >> [2]
>> >> >>
>> >>
>> https://cwiki.apache.org/confluence/display/HUDI/RFC-11+%3A+Refactor+of+the+configuration+framework+of+hudi+project
>> >>
>>

Reply via email to