anirudhacharya closed pull request #10662: [MXNET-347] Logical AND operator
URL: https://github.com/apache/incubator-mxnet/pull/10662
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/Jenkinsfile b/Jenkinsfile
index a4bf2c492af..5601c52df1c 100644
--- a/Jenkinsfile
+++ b/Jenkinsfile
@@ -107,6 +107,12 @@ def python3_ut(docker_container_name) {
   }
 }
 
+def python3_ut_mkldnn(docker_container_name) {
+  timeout(time: max_time, unit: 'MINUTES') {
+    sh "ci/build.py --build --platform ${docker_container_name} /work/runtime_functions.sh unittest_ubuntu_python3_cpu_mkldnn"
+  }
+}
+
 // GPU test has two parts. 1) run unittest on GPU, 2) compare the results on
 // both CPU and GPU
 // Python 2
@@ -478,7 +484,7 @@ try {
         ws('workspace/ut-python3-mkldnn-cpu') {
           init_git()
           unpack_lib('mkldnn_cpu', mx_mkldnn_lib)
-          python3_ut('ubuntu_cpu')
+          python3_ut_mkldnn('ubuntu_cpu')
         }
       }
     },
diff --git a/ci/docker/runtime_functions.sh b/ci/docker/runtime_functions.sh
index d6a1d6c6490..e8e17d9f1a8 100755
--- a/ci/docker/runtime_functions.sh
+++ b/ci/docker/runtime_functions.sh
@@ -376,6 +376,18 @@ unittest_ubuntu_python3_cpu() {
     nosetests-3.4 --verbose tests/python/quantization
 }
 
+unittest_ubuntu_python3_cpu_mkldnn() {
+    set -ex
+    export PYTHONPATH=./python/ 
+    # MXNET_MKLDNN_DEBUG is buggy and produces false positives
+    # https://github.com/apache/incubator-mxnet/issues/10026
+    #export MXNET_MKLDNN_DEBUG=1  # Ignored if not present
+    export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
+    nosetests-3.4 --verbose tests/python/unittest
+    nosetests-3.4 --verbose tests/python/quantization
+    nosetests-3.4 --verbose tests/python/mkl
+}
+
 unittest_ubuntu_python2_gpu() {
     set -ex
     export PYTHONPATH=./python/
diff --git a/docs/tutorials/gluon/custom_layer.md b/docs/tutorials/gluon/custom_layer.md
new file mode 100644
index 00000000000..97bdf05aff5
--- /dev/null
+++ b/docs/tutorials/gluon/custom_layer.md
@@ -0,0 +1,248 @@
+
+# How to write a custom layer in the Apache MXNet Gluon API
+
+While the Gluon API for Apache MXNet comes with [a decent number of pre-defined layers](https://mxnet.incubator.apache.org/api/python/gluon/nn.html), at some point one may find that a new layer is needed. Adding a new layer to the Gluon API is straightforward, yet there are a few things that one needs to keep in mind.
+
+In this article, I will cover how to create a new layer from scratch, how to use it, what the possible pitfalls are, and how to avoid them.
+
+## The simplest custom layer
+
+To create a new layer in the Gluon API, one must create a class that inherits from the [Block](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L123) class. This class provides the most basic functionality, and all pre-defined layers inherit from it directly or via other subclasses. Because each layer in Apache MXNet inherits from `Block`, the words "layer" and "block" are used interchangeably within the Apache MXNet community.
+
+The only instance method that needs to be implemented is [forward(self, x)](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L415), which defines what exactly your layer is going to do during forward propagation. Notice that you don't have to specify what the block should do during back propagation; the back propagation pass for blocks is done by Apache MXNet for you.
+
+In the example below, we define a new layer and implement the `forward()` method to normalize input data by rescaling it into the range [0, 1].
+
+
+```python
+# Do some initial imports used throughout this tutorial 
+from __future__ import print_function
+import mxnet as mx
+from mxnet import nd, gluon, autograd
+from mxnet.gluon.nn import Dense
+mx.random.seed(1)                      # Set seed for reproducible results
+```
+
+
+```python
+class NormalizationLayer(gluon.Block):
+    def __init__(self):
+        super(NormalizationLayer, self).__init__()
+
+    def forward(self, x):
+        return (x - nd.min(x)) / (nd.max(x) - nd.min(x))
+```
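+
+As a quick sanity check, a `Block` is callable, and calling it invokes `forward()`, so the layer above can be applied directly to an `NDArray`. A minimal usage sketch (the variable names below are just illustrative):
+
+
+```python
+layer = NormalizationLayer()
+data = nd.array([1, 2, 3, 4, 5])
+layer(data)                            # values rescaled into the [0, 1] range
+```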
+
+The rest of the methods of the `Block` class are already implemented, and the majority of them are used to work with the parameters of a block. There is one very special method named [hybridize()](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L384), though, which I am going to cover before moving to a more complex example of a custom layer.
+
+## Hybridization and the difference between Block and HybridBlock
+
+Looking into the implementation of [existing layers](https://mxnet.incubator.apache.org/api/python/gluon/nn.html), one may find that a block more often inherits from [HybridBlock](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L428) instead of directly inheriting from `Block`.
+
+The reason is that `HybridBlock` allows one to write custom layers that can be used in imperative programming as well as in symbolic programming. It is convenient to support both, because imperative programming eases debugging of the code and symbolic programming provides faster execution. You can learn more about the difference between symbolic and imperative programming from [this article](https://mxnet.incubator.apache.org/architecture/program_model.html).
+
+Hybridization is the process Apache MXNet uses to create a symbolic graph of a forward computation. Computation performance increases because the symbolic graph can be optimized. Once the symbolic graph is created, Apache MXNet caches and reuses it for subsequent computations.
+
+To simplify supporting both imperative and symbolic programming, Apache MXNet introduces the `HybridBlock` class. Compared to the `Block` class, `HybridBlock` already has its [forward()](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html#mxnet.gluon.HybridBlock.forward) method implemented, but it defines a [hybrid_forward()](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html#mxnet.gluon.HybridBlock.hybrid_forward) method that needs to be implemented.
+
+The main difference between `forward()` and `hybrid_forward()` is the `F` argument. This argument is sometimes referred to as a `backend` in the Apache MXNet community. Depending on whether hybridization has been done or not, `F` can refer either to the [mxnet.ndarray API](https://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html) or the [mxnet.symbol API](https://mxnet.incubator.apache.org/api/python/symbol/symbol.html). The former is used for imperative programming, and the latter for symbolic programming.
+
+To support hybridization, it is important to use only methods available directly from the `F` parameter. Usually, there are equivalent methods in both APIs, but sometimes there are mismatches or small variations. For example, by default, subtraction and division of NDArrays support broadcasting, while in the Symbol API broadcasting is provided by separate operators.
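+
+A minimal sketch of this mismatch, assuming two differently shaped arrays: NDArray arithmetic broadcasts implicitly, while the Symbol API uses the dedicated `broadcast_sub` operator:
+
+
+```python
+a = nd.ones((3, 1))
+b = nd.ones((1, 4))
+(a - b).shape                          # NDArray subtraction broadcasts to (3, 4)
+
+sa = mx.sym.Variable('a')
+sb = mx.sym.Variable('b')
+diff = mx.sym.broadcast_sub(sa, sb)    # Symbol API needs the explicit broadcast operator
+```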
+
+Knowing this, we can rewrite our example layer using `HybridBlock`:
+
+
+```python
+class NormalizationHybridLayer(gluon.HybridBlock):
+    def __init__(self):
+        super(NormalizationHybridLayer, self).__init__()
+
+    def hybrid_forward(self, F, x):
+        return F.broadcast_div(F.broadcast_sub(x, F.min(x)), (F.broadcast_sub(F.max(x), F.min(x))))
+```
+
+Thanks to inheriting from `HybridBlock`, one can easily run a forward pass on a given NDArray, either on CPU or GPU:
+
+
+```python
+layer = NormalizationHybridLayer()
+layer(nd.array([1, 2, 3], ctx=mx.cpu()))
+```
+
+
+
+
+    
+    [0.  0.5 1. ]
+    <NDArray 3 @cpu(0)>
+
+
+
+As a rule of thumb, one should always implement custom layers by inheriting from `HybridBlock`. This offers more flexibility, and doesn't affect execution speed once hybridization is done.
+
+Unfortunately, at the moment of writing this tutorial, NLP-related layers such as [RNN](https://mxnet.incubator.apache.org/api/python/gluon/rnn.html#mxnet.gluon.rnn.RNN), [GRU](https://mxnet.incubator.apache.org/api/python/gluon/rnn.html#mxnet.gluon.rnn.GRU) and [LSTM](https://mxnet.incubator.apache.org/api/python/gluon/rnn.html#mxnet.gluon.rnn.LSTM) inherit from the `Block` class via the common `_RNNLayer` class. That means that networks with such layers cannot be hybridized. But this might change in the future, so stay tuned.
+
+It is important to note that hybridization has nothing to do with computation on a GPU. One can train both hybridized and non-hybridized networks on both CPU and GPU, though hybridized networks will run faster. How much faster is hard to say in advance, though.
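+
+To get a feel for the difference on a particular model, one can simply time the forward pass before and after hybridization. A rough sketch (absolute numbers depend heavily on the model, the input size and the hardware):
+
+
+```python
+import time
+
+layer = NormalizationHybridLayer()
+x = nd.random_uniform(shape=(1000, 100))
+
+def time_forward(block, runs=1000):
+    start = time.time()
+    for _ in range(runs):
+        block(x)
+    nd.waitall()                       # wait for asynchronous execution to finish
+    return time.time() - start
+
+print("imperative:", time_forward(layer))
+layer.hybridize()
+print("hybridized:", time_forward(layer))
+```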
+
+## Adding a custom layer to a network
+
+While it is possible to use custom layers on their own, they are rarely used separately. Most often they are combined with predefined layers to create a neural network, where the output of one layer is used as the input of another.
+
+Depending on which class you used as a base, you can use either the [Sequential](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html#mxnet.gluon.nn.Sequential) or the [HybridSequential](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html#mxnet.gluon.nn.HybridSequential) container to form a sequential neural network. By adding layers one by one, you make each layer's input depend on the previous layer's output. It is worth noting that the `Sequential` and `HybridSequential` containers inherit from `Block` and `HybridBlock` respectively.
+
+Below is an example of how to create a simple neural network with a custom layer. In this example, `NormalizationHybridLayer` takes the output of the `Dense(5)` layer as its input and passes its own output as the input to the `Dense(1)` layer.
+
+
+```python
+net = gluon.nn.HybridSequential()                         # Define a neural network as a sequence of hybrid blocks
+with net.name_scope():                                    # Used to disambiguate saving and loading net parameters
+    net.add(Dense(5))                                     # Add a Dense layer with 5 neurons
+    net.add(NormalizationHybridLayer())                   # Add our custom layer
+    net.add(Dense(1))                                     # Add a Dense layer with 1 neuron
+
+
+net.initialize(mx.init.Xavier(magnitude=2.24))            # Initialize parameters of all layers
+net.hybridize()                                           # Create, optimize and cache the computational graph
+input = nd.random_uniform(low=-10, high=10, shape=(5, 2)) # Create 5 random examples with 2 features each in range [-10, 10]
+net(input)
+```
+
+
+
+
+    
+    [[-0.13601446]
+     [ 0.26103732]
+     [-0.05046433]
+     [-1.2375476 ]
+     [-0.15506986]]
+    <NDArray 5x1 @cpu(0)>
+
+
+
+## Parameters of a custom layer
+
+Usually, a layer has a set of associated parameters, sometimes also referred to as weights. This is the internal state of a layer. Most often, these parameters are the ones we want to learn during the backpropagation step, but sometimes these parameters might just be constants we want to use during the forward pass.
+
+All parameters of a block are stored and accessed via the [ParameterDict](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/parameter.py#L508) class. This class helps with initializing, updating, saving and loading the parameters. Each layer can have multiple sets of parameters, and all of them can be stored in a single instance of the `ParameterDict` class. At the block level, the instance of the `ParameterDict` class is accessible via the `self.params` field, and outside of a block one can access all parameters of the network via the [collect_params()](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Block.collect_params) method called on a container. `ParameterDict` uses the [Parameter](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Parameter) class to represent parameters inside an Apache MXNet neural network. If a parameter doesn't exist, trying to get it via `self.params` will create it automatically.
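+
+For example, one can inspect the `ParameterDict` of the network defined in the previous section (a minimal sketch; the exact parameter names depend on the automatically generated prefixes):
+
+
+```python
+params = net.collect_params()          # ParameterDict with parameters of all child blocks
+print(params)
+print(list(params.keys()))             # e.g. the weights and biases of the two Dense layers
+```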
+
+
+```python
+class NormalizationHybridLayer(gluon.HybridBlock):
+    def __init__(self, hidden_units, scales):
+        super(NormalizationHybridLayer, self).__init__()
+
+        with self.name_scope():
+            self.weights = self.params.get('weights',
+                                           shape=(hidden_units, 0),
+                                           allow_deferred_init=True)
+
+            self.scales = self.params.get('scales',
+                                      shape=scales.shape,
+                                      init=mx.init.Constant(scales.asnumpy().tolist()), # Convert to regular list to make this object serializable
+                                      differentiable=False)
+            
+    def hybrid_forward(self, F, x, weights, scales):
+        normalized_data = F.broadcast_div(F.broadcast_sub(x, F.min(x)), (F.broadcast_sub(F.max(x), F.min(x))))
+        weighted_data = F.FullyConnected(normalized_data, weights, num_hidden=self.weights.shape[0], no_bias=True)
+        scaled_data = F.broadcast_mul(scales, weighted_data)
+        return scaled_data
+```
+
+In the example above, two sets of parameters are defined:
+1. Parameter `weights` is trainable. Its shape is unknown during the construction phase and will be inferred on the first run of forward propagation.
+2. Parameter `scales` is a constant that doesn't change. Its shape is defined during construction.
+
+Notice a few aspects of this code:
+* The `name_scope()` method is used to add a prefix to parameter names during saving and loading
+* Shape is not provided when creating `weights`. Instead, it is going to be inferred from the shape of the input
+* The `scales` parameter is initialized and marked as `differentiable=False`.
+* The `F` backend is used for all calculations
+* The dot product is calculated using the `F.FullyConnected()` method instead of the `F.dot()` method. The former was chosen because it supports automatic shape inference for its inputs, while the latter doesn't. This is extremely important to know if one doesn't want to hard-code all the shapes. At the moment, the best way to learn which operators support automatic inference of input shapes is to browse the C++ implementation of the operators and see whether they use the macro `SHAPE_ASSIGN_CHECK(*in_shape, fullc::kWeight, Shape2(param.num_hidden, num_input));`
+* The `hybrid_forward()` method signature has changed. It accepts two new arguments: `weights` and `scales`.
+
+The last peculiarity is due to `HybridBlock`'s support of both imperative and symbolic programming. During the training phase, parameters are passed to the layer by the Apache MXNet framework as additional arguments to the method, because they might need to be converted to `Symbol`s depending on whether the layer was hybridized. One shouldn't use `self.weights` and `self.scales` or `self.params.get` in `hybrid_forward`, except to get the shapes of parameters.
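+
+To make this concrete, here is a short sketch of what to use and what to avoid inside `hybrid_forward()`, mirroring the layer defined above:
+
+
+```python
+    def hybrid_forward(self, F, x, weights, scales):
+        # Use the `weights` and `scales` arguments passed in by the framework:
+        # they are already NDArrays or Symbols, consistent with what `F` refers to.
+        y = F.FullyConnected(x, weights, num_hidden=self.weights.shape[0], no_bias=True)
+        # Reading the *shape* of a stored parameter (self.weights.shape) is fine,
+        # but avoid self.weights.data() or self.params.get(...) here - it breaks
+        # once the block is hybridized, because the parameter data is not a Symbol.
+        return F.broadcast_mul(scales, y)
+```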
+
+Running a forward pass on this network is very similar to the previous example, so instead of just doing one forward pass, let's run whole training for a few epochs to show that the `scales` parameter doesn't change during training while the `weights` parameter does.
+
+
+```python
+def print_params(title, net):
+    """
+    Helper function to print out the state of parameters of NormalizationHybridLayer
+    """
+    print(title)
+    hybridlayer_params = {k: v for k, v in net.collect_params().items() if 'normalizationhybridlayer' in k }
+
+    for key, value in hybridlayer_params.items():
+        print('{} = {}\n'.format(key, value.data()))
+
+net = gluon.nn.HybridSequential()                             # Define a neural network as a sequence of hybrid blocks
+with net.name_scope():                                        # Used to disambiguate saving and loading net parameters
+    net.add(Dense(5))                                         # Add a Dense layer with 5 neurons
+    net.add(NormalizationHybridLayer(hidden_units=5,
+                                     scales = nd.array([2]))) # Add our custom layer
+    net.add(Dense(1))                                         # Add a Dense layer with 1 neuron
+
+
+net.initialize(mx.init.Xavier(magnitude=2.24))                # Initialize parameters of all layers
+net.hybridize()                                               # Create, optimize and cache the computational graph
+
+input = nd.random_uniform(low=-10, high=10, shape=(5, 2))     # Create 5 random examples with 2 features each in range [-10, 10]
+label = nd.random_uniform(low=-1, high=1, shape=(5, 1))
+
+mse_loss = gluon.loss.L2Loss()                                # Mean squared error between output and label
+trainer = gluon.Trainer(net.collect_params(),                 # Init trainer with Stochastic Gradient Descent (sgd) optimization method and parameters for it
+                        'sgd',
+                        {'learning_rate': 0.1, 'momentum': 0.9 })
+
+with autograd.record():                                       # Autograd records computations done on NDArrays inside the "with" block
+    output = net(input)                                       # Run forward propagation
+
+    print_params("=========== Parameters after forward pass ===========\n", net)
+    loss = mse_loss(output, label)                            # Calculate MSE
+
+loss.backward()                                               # Backward computes gradients and stores them in the .grad field of each NDArray
+trainer.step(input.shape[0])                                  # Trainer updates the parameters of every block using the .grad fields and the optimization method (sgd in this example)
+                                                              # We provide the batch size, which is used as a divisor in the cost function formula
+print_params("=========== Parameters after backward pass ===========\n", net)
+
+```
+
+    =========== Parameters after forward pass ===========
+    
+    hybridsequential94_normalizationhybridlayer0_weights = 
+    [[-0.3983642  -0.505708   -0.02425683 -0.3133553  -0.35161012]
+     [ 0.6467543   0.3918715  -0.6154656  -0.20702496 -0.4243446 ]
+     [ 0.6077331   0.03922009  0.13425875  0.5729856  -0.14446527]
+     [-0.3572498   0.18545026 -0.09098256  0.5106366  -0.35151464]
+     [-0.39846328  0.22245121  0.13075739  0.33387476 -0.10088372]]
+    <NDArray 5x5 @cpu(0)>
+    
+    hybridsequential94_normalizationhybridlayer0_scales = 
+    [2.]
+    <NDArray 1 @cpu(0)>
+    
+    =========== Parameters after backward pass ===========
+    
+    hybridsequential94_normalizationhybridlayer0_weights = 
+    [[-0.29839832 -0.47213346  0.08348035 -0.2324698  -0.27368504]
+     [ 0.76268613  0.43080837 -0.49052125 -0.11322092 -0.3339738 ]
+     [ 0.48665082 -0.00144657  0.00376363  0.47501418 -0.23885089]
+     [-0.22626656  0.22944227  0.05018325  0.6166192  -0.24941102]
+     [-0.44946212  0.20532274  0.07579394  0.29261002 -0.14063817]]
+    <NDArray 5x5 @cpu(0)>
+    
+    hybridsequential94_normalizationhybridlayer0_scales = 
+    [2.]
+    <NDArray 1 @cpu(0)>
+    
+
+
+As can be seen from the output above, the `weights` parameter has been changed by the training, while `scales` has not.
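+
+Once trained, the parameters (including the constant `scales`) can be persisted and loaded back into an identically structured network. A minimal sketch, assuming the `save_params()`/`load_params()` API available in this version of Gluon:
+
+
+```python
+net.save_params('custom_layer.params')                        # Write all parameters to a file
+
+new_net = gluon.nn.HybridSequential()                         # Rebuild the same architecture
+with new_net.name_scope():
+    new_net.add(Dense(5))
+    new_net.add(NormalizationHybridLayer(hidden_units=5, scales=nd.array([2])))
+    new_net.add(Dense(1))
+new_net.load_params('custom_layer.params', ctx=mx.cpu())      # Restore the trained values
+```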
+
+## Conclusion
+
+One important quality of a deep learning framework is extensibility. Empowered by flexible abstractions like `Block` and `HybridBlock`, one can easily extend Apache MXNet's functionality to match one's needs.
+
+<!-- INSERT SOURCE DOWNLOAD BUTTONS -->
diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md
index 04b7893c619..94ea050b986 100644
--- a/docs/tutorials/index.md
+++ b/docs/tutorials/index.md
@@ -98,7 +98,7 @@ The Gluon and Module tutorials are in Python, but you can also find a variety of
 
 - [Plumbing: A look under the hood of gluon](http://gluon.mxnet.io/chapter03_deep-neural-networks/plumbing.html)
 
-- [Designing a custom layer with gluon](http://gluon.mxnet.io/chapter03_deep-neural-networks/custom-layer.html)
+- [Designing a custom layer with gluon](/tutorials/gluon/custom_layer.html)
 
 - [Block and Parameter naming](/tutorials/gluon/naming.html)
 
diff --git a/docs/tutorials/sparse/train.md b/docs/tutorials/sparse/train.md
index 0232281608e..ce7020553c2 100644
--- a/docs/tutorials/sparse/train.md
+++ b/docs/tutorials/sparse/train.md
@@ -36,8 +36,13 @@ We can specify the `stype` of a variable as "csr" or "row_sparse" to hold sparse
 
 ```python
 import mxnet as mx
+import numpy as np
+import random
 
-mx.random.seed(42) # set the seed for repeatability
+# set the seeds for repeatability
+random.seed(42)
+np.random.seed(42)
+mx.random.seed(42) 
 
 # Create a variable to hold an NDArray
 a = mx.sym.Variable('a')
diff --git a/python/mxnet/ndarray/sparse.py b/python/mxnet/ndarray/sparse.py
index 363ed9deb06..c7355c2e46d 100644
--- a/python/mxnet/ndarray/sparse.py
+++ b/python/mxnet/ndarray/sparse.py
@@ -400,7 +400,7 @@ def __setitem__(self, key, value):
                [ 0.,  0.,  0.],
                [ 0.,  0.,  0.]], dtype=float32)
         >>> # assign CSRNDArray with same storage type
-        >>> x = mx.nd.ones('row_sparse', (3,3)).tostype('csr')
+        >>> x = mx.nd.ones((3,3)).tostype('csr')
         >>> x[:] = src
         >>> x.asnumpy()
         array([[ 1.,  1.,  1.],
diff --git a/scala-package/core/src/main/scala/org/apache/mxnet/Visualization.scala b/scala-package/core/src/main/scala/org/apache/mxnet/Visualization.scala
index ee2f10c9dc7..6ecc3cadbc2 100644
--- a/scala-package/core/src/main/scala/org/apache/mxnet/Visualization.scala
+++ b/scala-package/core/src/main/scala/org/apache/mxnet/Visualization.scala
@@ -228,7 +228,7 @@ object Visualization {
       val op = params("op").asInstanceOf[String]
       val name = params("name").asInstanceOf[String]
       val attrs = {
-        if (params.contains("attr")) params("attr").asInstanceOf[Map[String, String]]
+        if (params.contains("attrs")) params("attrs").asInstanceOf[Map[String, String]]
         else Map[String, String]()
       }
       // input data
diff --git a/src/operator/mshadow_op.h b/src/operator/mshadow_op.h
index 2f5dd97d7b6..7748b04bd36 100644
--- a/src/operator/mshadow_op.h
+++ b/src/operator/mshadow_op.h
@@ -317,6 +317,8 @@ MXNET_BINARY_MATH_OP_NC(eq, a == b ? DType(1) : DType(0));
 
 MXNET_BINARY_MATH_OP_NC(ne, a != b ? DType(1) : DType(0));
 
+MXNET_BINARY_MATH_OP(logical_and, a && b ? DType(1) : DType(0));
+
 MXNET_UNARY_MATH_OP(square_root, math::sqrt(a));
 
 MXNET_UNARY_MATH_OP(square_root_grad, 0.5f / math::id(a));
diff --git a/src/operator/nn/mkldnn/mkldnn_pooling-inl.h b/src/operator/nn/mkldnn/mkldnn_pooling-inl.h
index 2097d57ba92..4b6235ec446 100644
--- a/src/operator/nn/mkldnn/mkldnn_pooling-inl.h
+++ b/src/operator/nn/mkldnn/mkldnn_pooling-inl.h
@@ -92,12 +92,18 @@ inline bool SupportMKLDNNPooling(const PoolingParam &param,
 
   if (param.pooling_convention == pool_enum::kValid)
     return true;
+  else
+    return false;
 
+// need to support pooling convention full
+// https://issues.apache.org/jira/browse/MXNET-33
+#if 0
  if (((dshape[2] + 2 * param.pad[0] - param.kernel[0]) % param.stride[0] == 0) &&
      ((dshape[3] + 2 * param.pad[1] - param.kernel[1]) % param.stride[1] == 0))
     return true;
   else
     return false;
+#endif
 }
 
 inline bool MKLDNNRequireWorkspace(const PoolingParam &param) {
diff --git a/src/operator/operator_tune.cc b/src/operator/operator_tune.cc
index 47db78bc188..dc17080b365 100644
--- a/src/operator/operator_tune.cc
+++ b/src/operator/operator_tune.cc
@@ -342,6 +342,8 @@ IMPLEMENT_BINARY_WORKLOAD_FWD(mxnet::op::mshadow_op::ne);  // NOLINT()
 IMPLEMENT_BINARY_WORKLOAD_BWD(mxnet::op::mshadow_op::ne);  // NOLINT()
 IMPLEMENT_BINARY_WORKLOAD_FWD(mxnet::op::mshadow_op::eq);  // NOLINT()
 IMPLEMENT_BINARY_WORKLOAD_BWD(mxnet::op::mshadow_op::eq);  // NOLINT()
+IMPLEMENT_BINARY_WORKLOAD_FWD(mxnet::op::mshadow_op::logical_and);  // NOLINT()
+IMPLEMENT_BINARY_WORKLOAD_BWD(mxnet::op::mshadow_op::logical_and);  // NOLINT()
 IMPLEMENT_BINARY_WORKLOAD_FWD(mxnet::op::mshadow_op::smooth_l1_loss);  // NOLINT()
 IMPLEMENT_BINARY_WORKLOAD_BWD(mxnet::op::mshadow_op::smooth_l1_gradient);  // NOLINT()
 IMPLEMENT_BLANK_WORKLOAD_FWD(mxnet::op::mxnet_op::set_to_int<0>);  // NOLINT()
diff --git a/src/operator/tensor/cast_storage-inl.cuh b/src/operator/tensor/cast_storage-inl.cuh
index c441341eafd..a5b4df53843 100644
--- a/src/operator/tensor/cast_storage-inl.cuh
+++ b/src/operator/tensor/cast_storage-inl.cuh
@@ -436,6 +436,8 @@ struct CastDnsCsrColIdxAndValsBlockKernel {
             nnz++;
           }
         }
+        // make sure k was updated using block_nnz in the previous iter
+        __syncthreads();
         if (threadIdx.x == kBaseThreadNum-1) {
           block_nnz = nnz;
         }
diff --git a/src/operator/tensor/elemwise_binary_broadcast_op_logic.cc b/src/operator/tensor/elemwise_binary_broadcast_op_logic.cc
index 31f34bbc28e..99bcb2864bf 100644
--- a/src/operator/tensor/elemwise_binary_broadcast_op_logic.cc
+++ b/src/operator/tensor/elemwise_binary_broadcast_op_logic.cc
@@ -137,5 +137,23 @@ Example::
 .set_attr<FCompute>("FCompute<cpu>", BinaryBroadcastCompute<cpu, mshadow_op::le>)
 .set_attr<nnvm::FGradient>("FGradient", MakeZeroGradNodes);
 
+MXNET_OPERATOR_REGISTER_BINARY_BROADCAST(broadcast_logical_and)
+.describe(R"code(Returns the result of element-wise **logical and** with broadcasting.
+
+Example::
+
+   x = [[ 1.,  1.,  1.],
+        [ 1.,  1.,  1.]]
+
+   y = [[ 0.],
+        [ 1.]]
+
+   broadcast_logical_and(x, y) = [[ 0.,  0.,  0.],
+                                   [ 1.,  1.,  1.]]
+
+)code" ADD_FILELINE)
+.set_attr<FCompute>("FCompute<cpu>", BinaryBroadcastCompute<cpu, mshadow_op::logical_and>)
+.set_attr<nnvm::FGradient>("FGradient", MakeZeroGradNodes);
+
 }  // namespace op
 }  // namespace mxnet
diff --git a/src/operator/tensor/elemwise_binary_broadcast_op_logic.cu b/src/operator/tensor/elemwise_binary_broadcast_op_logic.cu
index 4e80ae9572e..d547db7833b 100644
--- a/src/operator/tensor/elemwise_binary_broadcast_op_logic.cu
+++ b/src/operator/tensor/elemwise_binary_broadcast_op_logic.cu
@@ -47,5 +47,9 @@ NNVM_REGISTER_OP(broadcast_lesser)
 NNVM_REGISTER_OP(broadcast_lesser_equal)
 .set_attr<FCompute>("FCompute<gpu>", BinaryBroadcastCompute<gpu, mshadow_op::le>);
 
+// logical and
+NNVM_REGISTER_OP(broadcast_logical_and)
+.set_attr<FCompute>("FCompute<gpu>", BinaryBroadcastCompute<gpu, mshadow_op::logical_and>);
+
 }  // namespace op
 }  // namespace mxnet
diff --git a/tests/python/mkl/data/test_mkldnn_test_mkldnn_model_model1.json b/tests/python/mkl/data/test_mkldnn_test_mkldnn_model_model1.json
new file mode 100644
index 00000000000..ba822f57848
--- /dev/null
+++ b/tests/python/mkl/data/test_mkldnn_test_mkldnn_model_model1.json
@@ -0,0 +1,770 @@
+{
+  "nodes": [
+    {
+      "op": "null", 
+      "name": "data", 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv1_1_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "64", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv1_1_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "64", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv1_1", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "64", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[0, 0, 0], [1, 0, 0], [2, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu1_1", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[3, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "conv1_2_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "64", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv1_2_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "64", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv1_2", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "64", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[4, 0, 0], [5, 0, 0], [6, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu1_2", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[7, 0, 0]]
+    }, 
+    {
+      "op": "Pooling", 
+      "name": "pool1", 
+      "attrs": {
+        "kernel": "(2, 2)", 
+        "pool_type": "max", 
+        "stride": "(2, 2)"
+      }, 
+      "inputs": [[8, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "conv2_1_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "128", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv2_1_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "128", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv2_1", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "128", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[9, 0, 0], [10, 0, 0], [11, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu2_1", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[12, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "conv2_2_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "128", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv2_2_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "128", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv2_2", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "128", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[13, 0, 0], [14, 0, 0], [15, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu2_2", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[16, 0, 0]]
+    }, 
+    {
+      "op": "Pooling", 
+      "name": "pool2", 
+      "attrs": {
+        "kernel": "(2, 2)", 
+        "pool_type": "max", 
+        "stride": "(2, 2)"
+      }, 
+      "inputs": [[17, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "conv3_1_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "256", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv3_1_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "256", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv3_1", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "256", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[18, 0, 0], [19, 0, 0], [20, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu3_1", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[21, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "conv3_2_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "256", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv3_2_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "256", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv3_2", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "256", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[22, 0, 0], [23, 0, 0], [24, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu3_2", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[25, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "conv3_3_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "256", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv3_3_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "256", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv3_3", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "256", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[26, 0, 0], [27, 0, 0], [28, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu3_3", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[29, 0, 0]]
+    }, 
+    {
+      "op": "Pooling", 
+      "name": "pool3", 
+      "attrs": {
+        "kernel": "(2, 2)", 
+        "pool_type": "max", 
+        "pooling_convention": "full", 
+        "stride": "(2, 2)"
+      }, 
+      "inputs": [[30, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "conv4_1_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv4_1_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv4_1", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[31, 0, 0], [32, 0, 0], [33, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu4_1", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[34, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "conv4_2_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv4_2_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv4_2", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[35, 0, 0], [36, 0, 0], [37, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu4_2", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[38, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "conv4_3_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv4_3_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv4_3", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[39, 0, 0], [40, 0, 0], [41, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu4_3", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[42, 0, 0]]
+    }, 
+    {
+      "op": "Pooling", 
+      "name": "pool4", 
+      "attrs": {
+        "kernel": "(2, 2)", 
+        "pool_type": "max", 
+        "stride": "(2, 2)"
+      }, 
+      "inputs": [[43, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "conv5_1_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv5_1_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv5_1", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[44, 0, 0], [45, 0, 0], [46, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu5_1", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[47, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "conv5_2_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv5_2_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv5_2", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[48, 0, 0], [49, 0, 0], [50, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu5_2", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[51, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "conv5_3_weight", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "conv5_3_bias", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "conv5_3", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "num_filter": "512", 
+        "pad": "(1, 1)"
+      }, 
+      "inputs": [[52, 0, 0], [53, 0, 0], [54, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu5_3", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[55, 0, 0]]
+    }, 
+    {
+      "op": "Pooling", 
+      "name": "pool5", 
+      "attrs": {
+        "kernel": "(3, 3)", 
+        "pad": "(1, 1)", 
+        "pool_type": "max", 
+        "stride": "(1, 1)"
+      }, 
+      "inputs": [[56, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "fc6_weight", 
+      "attrs": {
+        "dilate": "(6, 6)", 
+        "kernel": "(3, 3)", 
+        "num_filter": "1024", 
+        "pad": "(6, 6)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "fc6_bias", 
+      "attrs": {
+        "dilate": "(6, 6)", 
+        "kernel": "(3, 3)", 
+        "num_filter": "1024", 
+        "pad": "(6, 6)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "fc6", 
+      "attrs": {
+        "dilate": "(6, 6)", 
+        "kernel": "(3, 3)", 
+        "num_filter": "1024", 
+        "pad": "(6, 6)"
+      }, 
+      "inputs": [[57, 0, 0], [58, 0, 0], [59, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu6", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[60, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "fc7_weight", 
+      "attrs": {
+        "kernel": "(1, 1)", 
+        "num_filter": "1024", 
+        "pad": "(0, 0)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "fc7_bias", 
+      "attrs": {
+        "kernel": "(1, 1)", 
+        "num_filter": "1024", 
+        "pad": "(0, 0)"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "fc7", 
+      "attrs": {
+        "kernel": "(1, 1)", 
+        "num_filter": "1024", 
+        "pad": "(0, 0)"
+      }, 
+      "inputs": [[61, 0, 0], [62, 0, 0], [63, 0, 0]]
+    }, 
+    {
+      "op": "Activation", 
+      "name": "relu7", 
+      "attrs": {"act_type": "relu"}, 
+      "inputs": [[64, 0, 0]]
+    }, 
+    {
+      "op": "Pooling", 
+      "name": "global_pool", 
+      "attrs": {
+        "global_pool": "True", 
+        "kernel": "(7, 7)", 
+        "pool_type": "avg"
+      }, 
+      "inputs": [[65, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "fc8_weight", 
+      "attrs": {
+        "kernel": "(1, 1)", 
+        "num_filter": "1000"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "null", 
+      "name": "fc8_bias", 
+      "attrs": {
+        "kernel": "(1, 1)", 
+        "num_filter": "1000"
+      }, 
+      "inputs": []
+    }, 
+    {
+      "op": "Convolution", 
+      "name": "fc8", 
+      "attrs": {
+        "kernel": "(1, 1)", 
+        "num_filter": "1000"
+      }, 
+      "inputs": [[66, 0, 0], [67, 0, 0], [68, 0, 0]]
+    }, 
+    {
+      "op": "Flatten", 
+      "name": "flatten0", 
+      "inputs": [[69, 0, 0]]
+    }, 
+    {
+      "op": "null", 
+      "name": "softmax_label", 
+      "inputs": []
+    }, 
+    {
+      "op": "SoftmaxOutput", 
+      "name": "softmax", 
+      "inputs": [[70, 0, 0], [71, 0, 0]]
+    }
+  ], 
+  "arg_nodes": [
+    0, 
+    1, 
+    2, 
+    5, 
+    6, 
+    10, 
+    11, 
+    14, 
+    15, 
+    19, 
+    20, 
+    23, 
+    24, 
+    27, 
+    28, 
+    32, 
+    33, 
+    36, 
+    37, 
+    40, 
+    41, 
+    45, 
+    46, 
+    49, 
+    50, 
+    53, 
+    54, 
+    58, 
+    59, 
+    62, 
+    63, 
+    67, 
+    68, 
+    71
+  ], 
+  "node_row_ptr": [
+    0, 
+    1, 
+    2, 
+    3, 
+    4, 
+    5, 
+    6, 
+    7, 
+    8, 
+    9, 
+    11, 
+    12, 
+    13, 
+    14, 
+    15, 
+    16, 
+    17, 
+    18, 
+    19, 
+    21, 
+    22, 
+    23, 
+    24, 
+    25, 
+    26, 
+    27, 
+    28, 
+    29, 
+    30, 
+    31, 
+    32, 
+    33, 
+    35, 
+    36, 
+    37, 
+    38, 
+    39, 
+    40, 
+    41, 
+    42, 
+    43, 
+    44, 
+    45, 
+    46, 
+    47, 
+    49, 
+    50, 
+    51, 
+    52, 
+    53, 
+    54, 
+    55, 
+    56, 
+    57, 
+    58, 
+    59, 
+    60, 
+    61, 
+    63, 
+    64, 
+    65, 
+    66, 
+    67, 
+    68, 
+    69, 
+    70, 
+    71, 
+    72, 
+    73, 
+    74, 
+    75, 
+    76, 
+    77, 
+    78
+  ], 
+  "heads": [[72, 0, 0]], 
+  "attrs": {"mxnet_version": ["int", 10200]}
+}
diff --git a/tests/python/mkl/test_mkldnn.py b/tests/python/mkl/test_mkldnn.py
new file mode 100644
index 00000000000..a4c9c4557a3
--- /dev/null
+++ b/tests/python/mkl/test_mkldnn.py
@@ -0,0 +1,94 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+MKL-DNN related test cases
+"""
+
+import logging
+import os
+from sys import platform
+
+
+def test_mkldnn_install():
+    """
+    This test will verify that MXNet is built/installed correctly when
+    compiled with Intel MKL-DNN library. The method will try to import
+    the mxnet module and see if the mkldnn library is mapped to this
+    process's address space.
+    """
+    logging.basicConfig(level=logging.INFO)
+
+    if not platform.startswith('linux'):
+        logging.info("Bypass mkldnn install test for non-Linux OS")
+        return
+
+    try:
+        #pylint: disable=unused-variable
+        import mxnet as mx
+    except (ImportError, OSError) as e:
+        assert 0, "Import mxnet error: %s. Please double check your build/" \
+            "install steps or environment variable settings" % str(e)
+
+    pid = os.getpid()
+    rc = os.system("cat /proc/" + str(pid) +
+                   "/maps | grep libmkldnn > /dev/null")
+
+    if rc == 0:
+        logging.info("MXNet is built/installed correctly with MKL-DNN")
+    else:
+        assert 0, "MXNet is built/installed incorrectly with MKL-DNN, please " \
+            "double check your build/install steps or environment " \
+            "variable settings"
+
+
+def test_mkldnn_model():
+    """
+    This test will run a sample model for couple of iterations.
+    """
+
+    import mxnet as mx
+    model = os.path.join(os.path.dirname(os.path.realpath(__file__)), "data",
+                         "test_mkldnn_test_mkldnn_model_model1.json")
+    shape = (32, 3, 300, 300)
+    ctx = mx.cpu()
+
+    sym = mx.sym.load(model)
+    args = sym.list_arguments()
+    shapes = sym.infer_shape(data=shape)
+
+    def get_tensors(args, shapes, ctx):
+        return {x: mx.nd.ones(y, ctx) for x, y in zip(args, shapes)}
+
+    inputs = get_tensors(args, shapes[0], ctx)
+    grads = get_tensors(args, shapes[0], ctx)
+
+    try:
+        exe = sym.bind(ctx, inputs, args_grad=grads)
+        for _ in range(2):
+            exe.forward(is_train=True)
+            for y in exe.outputs:
+                y.wait_to_read()
+            exe.backward()
+            for y in exe.grad_arrays:
+                y.wait_to_read()
+    except:  # pylint: disable=bare-except
+        assert 0, "test_mkldnn_model exception in bind and execution"
+
+
+if __name__ == '__main__':
+    test_mkldnn_install()
diff --git a/tests/python/unittest/test_operator.py b/tests/python/unittest/test_operator.py
index a581e32762e..af940fe7176 100644
--- a/tests/python/unittest/test_operator.py
+++ b/tests/python/unittest/test_operator.py
@@ -1580,6 +1580,13 @@ def test_bmin(a, b):
         # pass idx=200 to gen_broadcast_data so that generated ndarrays' sizes are not too big
         data = gen_broadcast_data(idx=200)
         check_bmaxmin_gradient(c, data[0], data[1], 0.001, 1e-2, 1e-3)
+
+    def test_band(a, b):
+        c = mx.sym.broadcast_logical_and(a, b)
+        check_binary_op_forward(c, lambda x, y: np.logical_and(x, y), gen_broadcast_data, mx_nd_func=mx.nd.broadcast_logical_and)
+        # pass idx=200 to gen_broadcast_data so that generated ndarrays' sizes are not too big
+        data = gen_broadcast_data(idx=200)
+        check_bmaxmin_gradient(c, data[0], data[1], 0.001, 1e-2, 1e-3)
 
     test_bplus(a, b)
     test_bminus(a, b)
@@ -1591,6 +1598,7 @@ def test_bmin(a, b):
     test_bequal(a, b)
     test_bmax(a, b)
     test_bmin(a, b)
+    test_band(a, b)
 
 
 @with_seed()
diff --git a/tests/python/unittest/test_sparse_operator.py b/tests/python/unittest/test_sparse_operator.py
index d6d607f115e..484c98643d9 100644
--- a/tests/python/unittest/test_sparse_operator.py
+++ b/tests/python/unittest/test_sparse_operator.py
@@ -1194,6 +1194,9 @@ def check_cast_storage(shape, density, from_stype, to_stype, check_numeric_grad=
             # test gpu block  kernel
             check_cast_storage((dim0, rnd.randint(512, 1024)), d, 'default', 'csr',
                                check_numeric_grad=False)
+            # check race condition in block kernel
+            check_cast_storage((200, 128 * 2 + 1), d, 'default', 'csr',
+                               check_numeric_grad=False)
             # test gpu thread kernel
             check_cast_storage((dim0, rnd.randint(  1,   32)), d, 'default', 'row_sparse')
             # test gpu warp   kernel
diff --git a/tests/tutorials/test_tutorials.py b/tests/tutorials/test_tutorials.py
index 0949da7cebf..da8d69051aa 100644
--- a/tests/tutorials/test_tutorials.py
+++ b/tests/tutorials/test_tutorials.py
@@ -121,6 +121,9 @@ def test_basic_data():
 def test_gluon_customop():
     assert _test_tutorial_nb('gluon/customop')
 
+def test_gluon_custom_layer():
+    assert _test_tutorial_nb('gluon/custom_layer')
+
 def test_gluon_data_augmentation():
     assert _test_tutorial_nb('gluon/data_augmentation')
 


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
