Some questions after playing a little with the new ml.Pipeline.

Jaonary Rabarisoa Fri, 27 Feb 2015 22:30:50 -0800

Dear all,


We mainly work on large-scale computer vision tasks (image classification,
retrieval, ...). The pipeline is really great stuff for that. We're trying
to reproduce the tutorial given on that topic during the latest Spark
summit (
http://ampcamp.berkeley.edu/5/exercises/image-classification-with-pipelines.html
)
using the master version of the Spark pipeline and DataFrame APIs. The tutorial
shows different examples of feature extraction stages before running
machine learning algorithms. Even though the tutorial is straightforward to
reproduce with this new API, we still have some questions:

   - Can one use external tools (e.g. via pipe) as a pipeline stage? An
   example use case is extracting features learned by a convolutional neural
   network. In our case, this corresponds to a pre-trained neural network built
   with the Caffe library (http://caffe.berkeleyvision.org/).


   - The second question is about the performance of the pipeline. Libraries
   such as Caffe process data in batches, and instantiating a Caffe network
   can be time-consuming when the network is very deep. So we can gain
   performance if we minimize the number of Caffe networks created and feed
   data to the network in batches. In the pipeline, this corresponds to running
   transformers that work on a per-partition basis and give the whole partition
   to a single Caffe network. How can we create such a transformer?
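
To make the second question concrete, here is a minimal sketch in plain Python of the behaviour we are after: the network is built once per partition and the rows are fed to it in batches. Everything in it is hypothetical and for illustration only: `FakeNetwork` is a stand-in for a real Caffe network (it is not the Caffe API), and `batched`, `extract_partition` and the `batch_size` parameter are names we made up.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


class FakeNetwork:
    """Hypothetical stand-in for a model that is expensive to build (e.g. a Caffe net)."""

    instances = 0  # counts how many times the "network" has been constructed

    def __init__(self) -> None:
        FakeNetwork.instances += 1

    def forward(self, batch: Sequence[float]) -> List[float]:
        # Placeholder "feature extraction": one output value per input value.
        return [2.0 * x for x in batch]


def batched(rows: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def extract_partition(rows: Iterable[float], batch_size: int = 4) -> Iterator[float]:
    """Build the network once, then feed it the whole partition batch by batch."""
    net = FakeNetwork()  # one construction per partition, not one per row
    for batch in batched(rows, batch_size):
        yield from net.forward(batch)


if __name__ == "__main__":
    # A plain range plays the role of one partition's iterator here.
    print(list(extract_partition(range(10), batch_size=4)))
```

On the RDD side, a function of this shape (iterator in, iterator out) is what `mapPartitions` accepts, e.g. `rdd.mapPartitions(extract_partition)`. What we do not see is how to wrap it as a pipeline transformer.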



Best,

Jao
