Re: Request to document the direct relationship between other configurations

2020-02-14 Thread Hyukjin Kwon
Thanks Jungtaek! On Fri, Feb 14, 2020 at 3:57 PM, Jungtaek Lim wrote: > OK I agree this is going forward as we decided the final goal and it seems > someone starts to work on. In the meanwhile we agree about documenting the > direct relationship, and which style to use is "open". > > Thanks again to

Re: Request to document the direct relationship between other configurations

2020-02-13 Thread Jungtaek Lim
OK, I agree this is moving forward, since we have decided on the final goal and it seems someone has started working on it. In the meantime, we agree on documenting the direct relationship, and which style to use is "open". Thanks again for initiating the discussion thread - this thread led to the following thread for

Re: Request to document the direct relationship between other configurations

2020-02-13 Thread Hyukjin Kwon
It's okay to just follow one prevailing style. The main point I would like to make is that we should *document* the direct relationship of configurations. For this case specifically, I don't think there is much point in deciding on one hard requirement to follow for the mid-term. We

Re: Request to document the direct relationship between other configurations

2020-02-13 Thread Jungtaek Lim
Even spark.dynamicAllocation.* doesn't follow 2-2, right? It follows a mix of 1 and 2-1, though 1 is even broken there. It doesn't matter if we are just discussing a one-time decision - it may be OK not to be strict on consistency, though it's not ideal. The thing is that these kinds of

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Hyukjin Kwon
I think it’s just fine as long as we’re consistent with the instances having the description, for instance: When true and ‘spark.xx.xx’ is enabled, … I think this is 2-2 in most cases so far. I think we can reference other configuration keys in another configuration documentation by using

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Jungtaek Lim
I tend to agree that there should be a time to make things consistent (and I'm very happy to see the new discussion thread), and we may want to adopt some practice in the interim. But for me it is not clear what the practice in the interim is. I pointed out the problems of the existing style and

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Hyukjin Kwon
Adding that information is already the more prevailing style at this moment, and it is usual to follow the prevailing side if there isn't a specific reason not to. If there is confusion about this, I will explicitly add it to the guide (https://spark.apache.org/contributing.html). Let me know if this

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Hyukjin Kwon
Yes, our final goal is probably to revisit the configurations so that they are structured and the documentation is cleanly deduplicated. +1. One point I would like to add, though, is that we should add such information to the documentation until we actually reach our final goal, since it's probably going to take a

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Dongjoon Hyun
Thank you for raising the issue, Hyukjin. According to the current status of the discussion, it seems that we are able to agree on updating the non-structured configurations and keeping the structured configurations AS-IS. I'm +1 for revisiting the configurations if that is our direction. If

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Jules Damji
All are valid and valuable observations to put into practice: * structured and meaningful config names * explainable text or succinct description * easily accessible or searchable These are aspirational but gradually doable if we make them part of the dev and review cycle. Often

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Hyukjin Kwon
Yeah, that's one of my points on why I don't want to document this in the guide yet. I would like to make sure dev people are on the same page that documenting is a better practice. I don't intend to force it as a hard requirement; however, if that's pointed out, it would be better to address it. On Wed, 12

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Wenchen Fan
In general I think it's better to have more detailed documents, but we don't have to force everyone to do it if the config name is structured. I would +1 to document the relationship if we can't tell it from the config names, e.g. spark.shuffle.service.enabled and spark.dynamicAllocation.enabled.
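
To illustrate the kind of relationship discussed here, the following is a minimal sketch in Python, not Spark's actual configuration API. The `ConfigEntry` class, its `depends_on` field, and the `undocumented_dependencies` helper are all hypothetical names introduced for illustration; only the two configuration keys and the "This requires spark.shuffle.service.enabled to be set." wording come from this thread. The sketch checks whether a description mentions every key the entry is declared to depend on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigEntry:
    """Hypothetical stand-in for a configuration definition (not Spark's API)."""
    key: str
    doc: str
    depends_on: tuple = ()  # keys this entry directly relies on (assumed field)


def undocumented_dependencies(entry):
    """Return the declared dependencies that the doc text never mentions."""
    return [dep for dep in entry.depends_on if dep not in entry.doc]


# The relationship cannot be inferred from the key names alone,
# so the description spells it out.
documented = ConfigEntry(
    key="spark.dynamicAllocation.enabled",
    doc=("Whether to use dynamic resource allocation. "
         "This requires spark.shuffle.service.enabled to be set."),
    depends_on=("spark.shuffle.service.enabled",),
)

# Same entry, but the description omits the relationship.
undocumented = ConfigEntry(
    key="spark.dynamicAllocation.enabled",
    doc="Whether to use dynamic resource allocation.",
    depends_on=("spark.shuffle.service.enabled",),
)

print(undocumented_dependencies(documented))    # []
print(undocumented_dependencies(undocumented))  # ['spark.shuffle.service.enabled']
```

A check of this shape could in principle run during review, but whether and how to enforce it is exactly what the thread leaves open.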

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Hyukjin Kwon
Also, I would like to hear other people's thoughts here. Could I ask what you guys think about this in general? On Wed, Feb 12, 2020 at 12:02 PM, Hyukjin Kwon wrote: > To do that, we should explicitly document such structured configuration > and implicit effect, which is currently missing. > I would

Re: Request to document the direct relationship between other configurations

2020-02-11 Thread Hyukjin Kwon
To do that, we should explicitly document such structured configuration and its implicit effect, which is currently missing. I would be more than happy if we document such implied relationships, *and* if we are very sure all configurations are structured correctly and coherently. Until that point, I think

Re: Request to document the direct relationship between other configurations

2020-02-11 Thread Jungtaek Lim
I'm looking into the case of `spark.dynamicAllocation` and this seems to support my point. https://github.com/apache/spark/blob/master/docs/configuration.md#dynamic-allocation I don't disagree with adding "This requires spark.shuffle.service.enabled to be set." in the description

Re: Request to document the direct relationship between other configurations

2020-02-11 Thread Hyukjin Kwon
Sure, adding "[DISCUSS]" is a good practice to label it. I had to do it although it might be "redundant" :-) since anyone can give feedback to any thread in Spark dev mailing list, and discuss. This is actually more prevailing given my rough reading of configuration files. I would like to see

Re: Request to document the direct relationship between other configurations

2020-02-11 Thread Jungtaek Lim
I'm sorry if I'm missing something, but ideally this would have been started as [DISCUSS], as I haven't seen any reference showing consensus on this practice. For me it's just that there are two different practices co-existing in the codebase, meaning it's closer to the preference of the individual (with

Re: Request to document the direct relationship between other configurations

2020-02-11 Thread Hyukjin Kwon
> I don't plan to document this officially yet Just to prevent confusion, I meant I don't yet plan to document the fact that we should write the relationships in configurations as a code/review guideline in https://spark.apache.org/contributing.html On Wed, Feb 12, 2020 at 9:57 AM, Hyukjin Kwon wrote:

Request to document the direct relationship between other configurations

2020-02-11 Thread Hyukjin Kwon
Hi all, I happened to review some PRs and I noticed that some configurations lack some necessary information. To be explicit, I would like to make sure we document the direct relationship between other configurations in the documentation. For example,