ZheyuYe opened a new issue #18717: URL: https://github.com/apache/incubator-mxnet/issues/18717
## Description

```python
import mxnet as mx
from mxnet.gluon import HybridBlock, nn
import tempfile
import os
mx.npx.set_np()

class Foo(HybridBlock):
    def __init__(self, use_mlm=False):
        super().__init__()
        self.use_mlm = use_mlm
        self.l1 = nn.Dense(units=64, flatten=False)
        if self.use_mlm:
            self.l2 = nn.Dense(units=64, flatten=False)
            self.l2.share_parameters(self.l1.collect_params())

    def hybrid_forward(self, F, x):
        x = self.l1(x)
        if self.use_mlm:
            x = self.l2(x)
        return x

foo = Foo(use_mlm=True)
foo.initialize()
foo(mx.np.ones((8, 64)))
foo2 = Foo(use_mlm=False)
with tempfile.TemporaryDirectory() as dir_path:
    foo.save_parameters(os.path.join(dir_path, 'test.params'), deduplicate=True)
    parameters = mx.npx.load(os.path.join(dir_path, 'test.params'))
    print(parameters.keys())
    foo2.load_parameters(os.path.join(dir_path, 'test.params'))
```

Output:

```bash
>>> dict_keys(['l2.weight', 'l2.bias'])
>>> AssertionError: Parameter 'l1.weight' is missing in 'file: /tmp/tmp3a6xslz2/test.params', which contains parameters: 'l2.weight', 'l2.bias'. Set allow_missing=True to ignore missing parameters.
```

Here `l1` and `l2` are shared, and thanks to the `deduplicate` flag the shared parameters are saved only once, keyed by the last parameter name, e.g. `dict_keys(['l2.weight', 'l2.bias'])`. There is nothing wrong with that unless we only want to load part of the parameters, as with `foo2 = Foo(use_mlm=False)`. Of course we can avoid this particular problem by calling `l1` repeatedly instead of creating a separate layer `l2` that shares weights with `l1`.

The following scenario is fairly common in pretrained models that use masked language modelling as the pretraining objective:

```python
import mxnet as mx
from mxnet.gluon import HybridBlock, nn
import tempfile
import os
mx.npx.set_np()

class Foo(HybridBlock):
    def __init__(self, use_mlm=False):
        super().__init__()
        self.use_mlm = use_mlm
        self.vocab_size = 30522
        self.word_embed = nn.Embedding(input_dim=self.vocab_size, output_dim=64)
        if self.use_mlm:
            self.mlm_decoder = nn.HybridSequential()
            self.mlm_decoder.add(nn.Dense(units=64, flatten=False))
            self.mlm_decoder.add(nn.Dense(units=self.vocab_size, flatten=False))
            self.mlm_decoder[-1].share_parameters(self.word_embed.collect_params())

    def hybrid_forward(self, F, x):
        x = self.word_embed(x)
        if self.use_mlm:
            x = self.mlm_decoder(x)
        return x

foo = Foo(use_mlm=True)
foo.initialize()
foo(mx.np.ones((8,)))
foo2 = Foo(use_mlm=False)
with tempfile.TemporaryDirectory() as dir_path:
    foo.save_parameters(os.path.join(dir_path, 'test.params'), deduplicate=True)
    parameters = mx.npx.load(os.path.join(dir_path, 'test.params'))
    print(parameters.keys())
    foo2.load_parameters(os.path.join(dir_path, 'test.params'))
```

```bash
>>> dict_keys(['mlm_decoder.1.weight', 'mlm_decoder.0.weight', 'mlm_decoder.0.bias', 'mlm_decoder.1.bias'])
```

Here `mlm_decoder` is only used in pretraining and would be discarded when fine-tuning on downstream tasks. Inside `mlm_decoder`, we usually need to predict the masked token by mapping back to the `vocab_index` through a dense layer whose parameters are shared with `word_embed`. However, saving in this way results in a parameter file without `word_embed.weight`.
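One possible interim workaround, sketched below, is to bypass `load_parameters` for the smaller network: load the saved dictionary with `mx.npx.load`, copy the value stored under the deduplicated alias back to the name the target network expects, and set the data manually. This is only a sketch; the helper `load_with_aliases` is hypothetical (not an existing Gluon API), and it assumes that `collect_params()` is keyed by the same dotted structural names that appear in the saved file. Simply passing `allow_missing=True` (plus `ignore_extra=True`) to `load_parameters` would silence the error, but `word_embed.weight` would then keep its random initialization.

```python
import os
import tempfile

import mxnet as mx
from mxnet.gluon import HybridBlock, nn

mx.npx.set_np()

class Foo(HybridBlock):
    # same model as in the second snippet above
    def __init__(self, use_mlm=False):
        super().__init__()
        self.use_mlm = use_mlm
        self.vocab_size = 30522
        self.word_embed = nn.Embedding(input_dim=self.vocab_size, output_dim=64)
        if self.use_mlm:
            self.mlm_decoder = nn.HybridSequential()
            self.mlm_decoder.add(nn.Dense(units=64, flatten=False))
            self.mlm_decoder.add(nn.Dense(units=self.vocab_size, flatten=False))
            self.mlm_decoder[-1].share_parameters(self.word_embed.collect_params())

    def hybrid_forward(self, F, x):
        x = self.word_embed(x)
        if self.use_mlm:
            x = self.mlm_decoder(x)
        return x

def load_with_aliases(net, param_path, aliases):
    """Hypothetical helper: load a deduplicated parameter file, copy values
    saved under alias names back to the names `net` expects, and set them
    manually.  Assumes `net.collect_params()` uses the same dotted structural
    names as the saved file."""
    loaded = mx.npx.load(param_path)
    for saved_name, wanted_name in aliases.items():
        if saved_name in loaded and wanted_name not in loaded:
            loaded[wanted_name] = loaded[saved_name]
    net.initialize()
    for name, param in net.collect_params().items():
        if name in loaded:
            param.set_data(loaded[name])
        # parameters absent from the file simply keep their fresh init;
        # extra keys in the file (e.g. the mlm_decoder ones) are ignored

foo = Foo(use_mlm=True)
foo.initialize()
foo(mx.np.ones((8,)))
foo2 = Foo(use_mlm=False)
with tempfile.TemporaryDirectory() as dir_path:
    path = os.path.join(dir_path, 'test.params')
    foo.save_parameters(path, deduplicate=True)
    # the shared embedding weight was saved under its last alias,
    # 'mlm_decoder.1.weight'; map it back to 'word_embed.weight'
    load_with_aliases(foo2, path, {'mlm_decoder.1.weight': 'word_embed.weight'})
    print(foo2.word_embed.weight.data().shape)
```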