acphile opened a new issue #18412:
URL: https://github.com/apache/incubator-mxnet/issues/18412


   ## Motivations
   Currently the implementation of mxnet.gluon.block is not very Pythonic and contains many redundancies.
   
   ### 1. overlaps between Block._params and Block._reg_params 
   When we define a custom model, we currently need to write code like the following:
   ```
   class Net(nn.HybridBlock):
       def __init__(self, **kwargs):
           super(Net, self).__init__(**kwargs)
           with self.name_scope():
               self.hidden1 = nn.Dense(256, activation='relu')
               self.a = self.params.get('a', shape=(1,))
   ```
   There are several shortcomings when using this form of registration:
   a. Adding parameter 'a' records it twice, in both self._params and self._reg_params, which is a redundancy. There is also a discrepancy inside Block:
         i. the method "collect_params" uses "_params" to get all parameters,
        ii. while the method "_collect_params_with_prefix" (and accordingly "load_parameters") uses "_reg_params" to get all parameters.
   b. Currently, if we do not use "with self.name_scope():" for child blocks, we get wrong name scopes. In the following example, we cannot actually get the parameters of self.hidden1 from the result of collect_params:
   ```
   class HybridNet(nn.HybridBlock):
       def __init__(self, **kwargs):
           super(HybridNet, self).__init__(**kwargs)
           self.hidden1 = nn.Dense(256, activation='relu')
           with self.name_scope():
               self.hidden2 = nn.Dense(10, activation='relu')
   
       def hybrid_forward(self, F, x):
           x = self.hidden2(self.hidden1(x))
           return x
       
   >>> net = HybridNet()
   >>> net.initialize()
   >>> print(net.collect_params())
   hybridnet0_ (
     Parameter dense0_weight (shape=(256, -1), dtype=float32)
     Parameter dense0_bias (shape=(256,), dtype=float32)
     Parameter hybridnet0_dense0_weight (shape=(10, -1), dtype=float32)
     Parameter hybridnet0_dense0_bias (shape=(10,), dtype=float32)
   )
   ```
   From the example above, we can also see that the parameter names are not related to the attribute names, which is not intuitive.
   
   In short, we find that using name_scope and ParameterDict is not user-friendly. We therefore plan to remove these redundancies and simplify the definition of child blocks and parameters, like:
   ```
   class Net(nn.HybridBlock):
       def __init__(self, **kwargs):
           super(Net, self).__init__(**kwargs)
           self.hidden1 = nn.Dense(256, activation='relu')
           self.a = gluon.parameter.Parameter(name="a", shape=(1,))
   ```
   We can then also get correct results from operations like "collect_params". For the above example, calling "collect_params" should return something of the form:
   ```
   {
       "a": xxx,
       "hidden1_weight": xxx,
       "hidden1_bias": xxx
   }
   ```
   which is like the original form used in _collect_params_with_prefix().
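
   For illustration, here is a minimal plain-Python sketch (not the actual Gluon implementation; ToyBlock and ToyParameter are made-up stand-ins) of how such structure-based keys can be derived by recursing over attribute names, in the same spirit as the existing _collect_params_with_prefix():
   ```
   # Toy sketch: derive structure-based parameter keys by walking child
   # blocks recursively, prefixing each key with the child's attribute name.
   class ToyParameter:
       def __init__(self, shape):
           self.shape = shape

   class ToyBlock:
       def __init__(self):
           self._reg_params = {}   # parameters registered directly on this block
           self._children = {}     # child blocks, keyed by attribute name

       def collect_params(self, prefix=''):
           ret = {prefix + name: p for name, p in self._reg_params.items()}
           for name, child in self._children.items():
               ret.update(child.collect_params(prefix + name + '_'))
           return ret

   net = ToyBlock()
   net._reg_params['a'] = ToyParameter((1,))
   hidden1 = ToyBlock()
   hidden1._reg_params['weight'] = ToyParameter((256, -1))
   hidden1._reg_params['bias'] = ToyParameter((256,))
   net._children['hidden1'] = hidden1
   print(list(net.collect_params()))  # ['a', 'hidden1_weight', 'hidden1_bias']
   ```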
   
   To handle naming for hybridization (in place of name_scope), we introduce a new method "set_prefix()" that recursively adds a prefix to all parameters inside the block, based on attribute names (a similar form to the keys in _collect_params_with_prefix()).
   For example:
   ```
   >>> net = Net()
   >>> print(net.collect_params())
   {
       "a": Parameter a (...)
       "hidden1_weight": Parameter weight (...)
       "hidden1_bias": Parameter bias (...)
   }
   >>> net.set_prefix()
   >>> print(net.collect_params())
   {
       "a": Parameter a (...)
       "hidden1_weight": Parameter hidden1_weight (...)
       "hidden1_bias": Parameter hidden1_bias (...)
   }
   ```
   A shared parameter would be given the prefix of the place where it first occurs.
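
   The following is a small plain-Python sketch (again with made-up stand-ins, assuming the proposed semantics) of that first-occurrence rule: when the same parameter object is reachable under two prefixes, it only receives a name from the first visit:
   ```
   # Toy sketch of the proposed first-occurrence rule for shared parameters.
   class ToyParameter:
       def __init__(self):
           self.name = None

   def set_prefix(params_by_key):
       # params_by_key maps structure-based keys to parameter objects,
       # possibly containing duplicates when parameters are shared.
       for key, param in params_by_key.items():
           if param.name is None:   # keep the name from the first occurrence
               param.name = key

   shared = ToyParameter()
   set_prefix({'hidden1_weight': shared, 'hidden2_weight': shared})
   print(shared.name)  # 'hidden1_weight'
   ```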
   
   
   ### 2. parameter sharing 
   Currently, we use the "params" argument in the definition of Block for parameter sharing. This means that before Block's __init__ runs, the shared parameters are already recorded in self._params.shared. Block also currently forbids overriding parameters.
   We think this is not convenient. The most common way to share parameters is what PyTorch does:
   ```
   self.hidden1.weight = self.hidden2.weight
   ```
   Note, however, that if a HybridBlock has already been hybridized, we should not allow overriding its parameters; instead, we should ask the user to un-hybridize the block first.
   To further allow sharing parameters recursively, we plan to add an API:
   ```
       def share_parameters(self, params: Dict):
   ```
   We plan to use the structure-based form (like what is used in "_collect_params_with_prefix()") to refer to each parameter recursively. For example, we denote "self.hidden1.weight" as "hidden1_weight".
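
   As a hypothetical usage sketch of the proposed API (share_parameters does not exist yet, and the exact key format may still change), sharing the dense layer of one network with another could look like:
   ```
   # Hypothetical usage of the proposed share_parameters API; the keys use
   # the structure-based form described above.
   shared_net = Net()
   net = Net()
   net.share_parameters({'hidden1_weight': shared_net.hidden1.weight,
                         'hidden1_bias': shared_net.hidden1.bias})
   ```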
   
   In summary, we plan to make the following improvements:
   
   1. remove the arguments "prefix" and "params" from the "\_\_init\_\_" function.
   2. remove the use of self._params (a ParameterDict) in Block
   3. allow overriding parameter attributes in the non-hybridized case.
   4. add the method "share_parameters" to recursively share parameters in child blocks.
   
   ## Detailed improvements
   
   ### For Class Block
   
   1. remove the attributes _empty_prefix, _prefix, _params, _profiler_scope_name, _name and their corresponding properties. The params property would then be backed by _reg_params:
   ```
   @property
   def params(self):
       """Returns this :py:class:`Block`'s parameter dictionary (does not 
include its
       children's parameters)."""
       return self._reg_params
   ```
   2. use the structured format for self._scope
         a. self._scope is only intended to be used in hybridize() internally
   3. implement collect_params with _collect_params_with_prefix
         a. def _collect_params_with_prefix(self, prefix='', select=None):
   4. change the implementation of save_params to save_parameters
   5. move the implementation of ParameterDict.initialize to Block.initialize
   6. move ParameterDict.zero_grad() to Block.zero_grad()
         a. call model.zero_grad() instead of model.collect_params().zero_grad() (see the usage sketch after this list)
   7. move ParameterDict.reset_ctx(ctx) to Block.reset_ctx(ctx)
   8. move ParameterDict.setattr to Block.setattr
   9. add method share_parameters(self, params: OrderedDict):
         a. return self
   10. add method set_prefix():
         a. return self
         b. when calling Block.initialize(), set_prefix() would be called first internally.
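
   As a usage sketch for items 6-8 (the "before" lines use the existing MXNet 1.x API; the "after" lines are the proposed, not yet implemented, Block-level methods):
   ```
   import mxnet as mx
   from mxnet.gluon import nn

   net = nn.Dense(10, in_units=20)
   net.initialize()

   # Current API: parameter-wide operations go through the ParameterDict
   net.collect_params().zero_grad()
   net.collect_params().reset_ctx(mx.cpu())
   net.collect_params().setattr('grad_req', 'write')

   # Proposed API: the same operations would live on Block directly
   # net.zero_grad()
   # net.reset_ctx(mx.cpu())
   # net.setattr('grad_req', 'write')
   ```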
   
   
   ### For Class HybridBlock:
   1. before hybridize() is called, set_prefix() would be called first, either internally via initialize() or explicitly:
   ```
   net = Net()
   net.initialize()
   net.hybridize()
   ```
   or
   ```
   net = Net().set_prefix()
   net.hybridize()
   ```
   
   ### For Parameter:
   1. add attribute self._prefix (which is set automatically by 
Block.set_prefix())
   2. use self._prefix + self._name for attribute self.name 
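
   A minimal sketch (plain Python stand-in, not the actual Parameter class) of how the name would be composed from the two attributes:
   ```
   # Toy sketch of the proposed naming: Parameter.name = prefix + bare name.
   class ToyParameter:
       def __init__(self, name):
           self._name = name
           self._prefix = ''        # would be filled in by Block.set_prefix()

       @property
       def name(self):
           return self._prefix + self._name

   w = ToyParameter('weight')
   print(w.name)                    # 'weight'
   w._prefix = 'hidden1_'           # what Block.set_prefix() would assign
   print(w.name)                    # 'hidden1_weight'
   ```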
   
   ### For other child classes:
   
   1. remove params and prefix
   2. use Parameter instead of self.params.get, like:
   ```
   # self.i2h_weight = self.params.get('i2h_weight', shape=(hidden_size, input_size),
   #                                   init=i2h_weight_initializer,
   #                                   allow_deferred_init=True)
   self.i2h_weight = Parameter(shape=(hidden_size, input_size),
                               init=i2h_weight_initializer,
                               allow_deferred_init=True)
   ```
   
   ### For _RNNLayer (gluon.rnn.rnn_layer._RNNLayer)
   1. remove the implementation of _unfuse()
   2. remove the implementation of _collect_params_with_prefix() (seemingly it exists only for backward compatibility)
   

