Jeremy Nixon will focus on the engineering and applications of a new algorithm built on top of MLlib. The presentation will cover the methods the algorithm uses to automatically generate features that capture nonlinear structure in data, as well as the process by which it is trained. Major aspects of that training are the compositional transformations over the data, convolution, and distributed backpropagation via SGD with adaptive gradients and an adaptive learning rate. The applications portion will show how to use convolutional neural networks to model data in computer vision, natural language, and signal processing. Details on optimal preprocessing, the types of structure that can be learned, and managing the model's ability to generalize will inform developers looking to apply nonlinear modeling tools to the problems they face.
Jeremy Nixon
1. Machine Learning Engineer at the Spark Technology Center
2. Contributor to MLlib, dedicated to scalable deep learning
3. Previously studied Applied Mathematics to Computer Science and Economics at Harvard
Future Work
1. Convolutional Neural Networks (layer types sketched after this list)
   a. Convolutional Layer Type
   b. Max Pooling Layer Type
2. Flexible Deep Learning API
3. More Modern Optimizers
   a. Adam
   b. Adadelta + Nesterov Momentum
4. More Modern Activations
5. Dropout / L2 Regularization
6. Batch Normalization
7. Tensor Support
8. Recurrent Neural Networks (LSTM)
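Since item 1 names the convolutional and max-pooling layer types as planned work, here is a hypothetical Scala sketch of what those two operations compute. All names are illustrative; none of this is existing MLlib API.

```scala
// Hypothetical sketch of the two planned layer types: a single-channel
// 2D convolution (valid padding, stride 1) and 2x2 max pooling.
// Illustrative names only, not MLlib API.
object ConvLayers {
  type Image = Array[Array[Double]]

  // Slide a k x k kernel over the image; output is (h-k+1) x (w-k+1).
  def conv2d(img: Image, kernel: Image): Image = {
    val k = kernel.length
    Array.tabulate(img.length - k + 1, img(0).length - k + 1) { (i, j) =>
      var s = 0.0
      for (di <- 0 until k; dj <- 0 until k)
        s += img(i + di)(j + dj) * kernel(di)(dj)
      s
    }
  }

  // Downsample by taking the max of each non-overlapping 2x2 window.
  def maxPool2x2(img: Image): Image =
    Array.tabulate(img.length / 2, img(0).length / 2) { (i, j) =>
      math.max(math.max(img(2 * i)(2 * j), img(2 * i)(2 * j + 1)),
               math.max(img(2 * i + 1)(2 * j), img(2 * i + 1)(2 * j + 1)))
    }
}
```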
Structural Assumptions: Combinatorial Flexibility
- Network depth creates an extraordinary range of possible models.
- That flexibility makes large datasets valuable for reducing variance.
Structural Assumption: The Model

X = normalized data; W1, W2 = weight matrices

Forward pass:
1. Multiply the data by the first-layer weights | X*W1
2. Pass the output through a non-linear activation | max(0, X*W1)
3. Multiply the output by the second-layer weights | max(0, X*W1) * W2
4. Return the predicted output
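To make the forward pass concrete, here is a minimal, self-contained Scala sketch of the two-layer computation above, max(0, X*W1) * W2. The object and function names are illustrative only; this is not MLlib's implementation.

```scala
// Minimal sketch of the two-layer forward pass above; illustrative names,
// not MLlib's implementation. Matrices are row-major Array[Array[Double]].
object MLPForward {
  type Matrix = Array[Array[Double]]

  // Plain matrix multiply: (n x k) * (k x m) -> (n x m).
  def matmul(a: Matrix, b: Matrix): Matrix = {
    val bt = b.transpose
    a.map(row => bt.map(col => row.zip(col).map { case (x, y) => x * y }.sum))
  }

  // Elementwise non-linear activation max(0, x), i.e. ReLU.
  def relu(m: Matrix): Matrix = m.map(_.map(v => math.max(0.0, v)))

  // Steps 1-4: X*W1, then max(0, X*W1), then max(0, X*W1) * W2.
  def forward(x: Matrix, w1: Matrix, w2: Matrix): Matrix =
    matmul(relu(matmul(x, w1)), w2)
}
```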
Structural Assumptions: Hierarchical Abstraction
- Pixels → Edges → Shapes → Parts → Objects
- Learn features that are optimized for the data
- Makes transfer learning feasible
Applications

1. CNNs - State of the Art
   a. Object Recognition
   b. Object Localization
   c. Image Segmentation
   d. Image Restoration
   e. Music Recommendation
2. RNNs (LSTM) - State of the Art
   a. Speech Recognition
   b. Question Answering
   c. Machine Translation
   d. Text Summarization
   e. Named Entity Recognition
   f. Natural Language Generation
   g. Word Sense Disambiguation
   h. Image / Video Captioning
   i. Sentiment Analysis
Distributed Optimization

Parallel implementation of backpropagation:
1. Each worker gets the weights from the master node.
2. Each worker computes a gradient on its data.
3. Each worker sends its gradient to the master.
4. The master averages the gradients and updates the weights.
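As a rough illustration of this four-step loop, here is a minimal Scala/Spark sketch of one synchronous gradient-averaging step. A linear model with squared loss stands in for the full network's backpropagation, and all names here are illustrative rather than the talk's actual implementation.

```scala
import org.apache.spark.rdd.RDD

// Sketch of one synchronous distributed SGD step, following the four
// steps above. A linear model with squared loss stands in for the full
// network's backpropagation; names are illustrative, not MLlib API.
object DistributedSGD {

  // Gradient of squared loss for a linear model over one worker's partition.
  def localGradient(w: Array[Double],
                    part: Iterator[(Array[Double], Double)]): Array[Double] = {
    val grad = Array.fill(w.length)(0.0)
    for ((x, y) <- part) {
      val pred = x.zip(w).map { case (xi, wi) => xi * wi }.sum
      val err = pred - y
      for (i <- w.indices) grad(i) += err * x(i)
    }
    grad
  }

  // One step: broadcast weights, compute per-worker gradients,
  // average them on the master (driver), and update the weights.
  def step(data: RDD[(Array[Double], Double)],
           w: Array[Double], lr: Double): Array[Double] = {
    val bw = data.sparkContext.broadcast(w)           // 1. master -> workers
    val (gradSum, numWorkers) = data
      .mapPartitions { part =>                        // 2. local gradients
        Iterator.single((localGradient(bw.value, part), 1L))
      }
      .reduce { case ((g1, n1), (g2, n2)) =>          // 3. workers -> master
        (g1.zip(g2).map { case (a, b) => a + b }, n1 + n2)
      }
    // 4. Average the worker gradients and take a gradient step.
    w.zip(gradSum).map { case (wi, gi) => wi - lr * gi / numWorkers }
  }
}
```

The key design point in this sketch is that communication happens only twice per step, once to broadcast the weights and once to reduce the gradients back to the driver, which is why communication cost grows with cluster size, as the Performance slide notes.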
Performance

● Parallel MLP on Spark with 7 nodes ~= Caffe with a GPU (single node).
● The advantages of parallelism diminish with additional nodes due to communication costs.
● Additional workers add value up to ~20 workers.
● See https://github.com/avulanov/ann-benchmark for more details.