Basics of RNNs and their applications, covered through the following papers:
- Generating Sequences With Recurrent Neural Networks, 2013
- Show and Tell: A Neural Image Caption Generator, 2014
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015
- DenseCap: Fully Convolutional Localization Networks for Dense Captioning, 2015
- Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks, 2016
- Robust Modeling and Prediction in Dynamic Environments Using Recurrent Flow Networks, 2016
- Social LSTM: Human Trajectory Prediction in Crowded Spaces, 2016
- DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents, 2017
- Predictive State Recurrent Neural Networks, 2017
7. Long Short Term Memory
Vanilla RNN
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
8. Long Short Term Memory
Long Short Term Memory (LSTM)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
9. Long Short Term Memory
Overall structure
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
(Figure: the LSTM cell, with input and output, the cell state flowing to the next cell state, the hidden state flowing to the next hidden state, and the forget, input, and output gates.)
10. Long Short Term Memory
Core idea
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
11. Long Short Term Memory
Core idea
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
12. Long Short Term Memory
Step-by-step
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Forget gate: decide which information to throw away from the cell state.
Input gate: decide which information to store in the cell state.
13. Long Short Term Memory
Step-by-step
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Update: update the cell state, scaled by the input and forget gates.
Output gate: output based on the updated cell state.
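These four steps can be written down compactly. Below is a minimal NumPy sketch of one LSTM step following the gate equations in the linked post; the weight names (W_f, W_i, W_c, W_o) act on the concatenation [h_{t-1}, x_t], and all shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM step: forget, input, update, output."""
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)            # forget gate: what to throw away from the cell state
    i_t = sigmoid(W_i @ z + b_i)            # input gate: what to store in the cell state
    c_tilde = np.tanh(W_c @ z + b_c)        # candidate cell content
    c_t = f_t * c_prev + i_t * c_tilde      # update the cell state, scaled by the two gates
    o_t = sigmoid(W_o @ z + b_o)            # output gate
    h_t = o_t * np.tanh(c_t)                # hidden state based on the updated cell state
    return h_t, c_t
```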
14. Long Short Term Memory
Reminder
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
(Figure: the LSTM cell again, with input and output, the cell state flowing to the next cell state, the hidden state flowing to the next hidden state, and the forget, input, and output gates.)
18. Training an RNN
How do we train an RNN?
$h_1 = \phi(W^\top h_0 + U^\top x_1)$
$h_2 = \phi(W^\top \phi(W^\top h_0 + U^\top x_1) + U^\top x_2)$
$h_3 = \phi(W^\top \phi(W^\top \phi(W^\top h_0 + U^\top x_1) + U^\top x_2) + U^\top x_3)$
$h_4 = \phi(W^\top \phi(W^\top \phi(W^\top \phi(W^\top h_0 + U^\top x_1) + U^\top x_2) + U^\top x_3) + U^\top x_4)$
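A minimal NumPy sketch of this unrolled recurrence, taking $\phi$ to be tanh (the function and variable names here are illustrative):

```python
import numpy as np

def rnn_forward(xs, h0, W, U):
    """Vanilla RNN unrolled in time: h_t = phi(W^T h_{t-1} + U^T x_t)."""
    h = h0
    hs = []
    for x_t in xs:                      # xs holds the input vectors x_1, ..., x_T
        h = np.tanh(W.T @ h + U.T @ x_t)
        hs.append(h)
    return hs                           # hidden states h_1, ..., h_T
```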
19. Training an RNN
How do we train an RNN?
Unrolling the recurrence repeats the multiplication by $W$ at every time step:
$h_4 = \phi(W^\top \phi(W^\top \phi(W^\top \phi(W^\top h_0 + U^\top x_1) + U^\top x_2) + U^\top x_3) + U^\top x_4)$
Vanishing / exploding gradients often happen.
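A small numerical illustration of why: backpropagating through the unrolled network multiplies the gradient by $W^\top$ once per time step, so it shrinks when the largest singular value of $W$ is below 1 and blows up when it is above 1. The matrices and scales below are made up purely for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
grad = rng.normal(size=8)                    # some gradient arriving at the last time step

for scale, label in [(0.5, "vanishing"), (1.5, "exploding")]:
    W = scale * np.eye(8)                    # toy recurrent weights with singular values = scale
    g = grad.copy()
    for _ in range(50):                      # 50 steps of backpropagation through time
        g = W.T @ g                          # each step multiplies the gradient by W^T
    print(label, np.linalg.norm(g))          # roughly 1e-15 vs. 1e+9
```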
28. Prediction Network
The prediction network is optimized by minimizing the following loss:
$-\sum_{t=1}^{T} \log P(x_t \mid x_{1:t-1})$
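A minimal sketch of this next-step negative log-likelihood, assuming the network has already produced a probability for each observed x_t given x_{1:t-1} (the function name is mine):

```python
import numpy as np

def prediction_loss(probs_of_targets):
    """Negative log-likelihood of the observed sequence.

    probs_of_targets[t] holds P(x_t | x_{1:t-1}) as produced by the prediction network.
    """
    probs = np.asarray(probs_of_targets)
    return -np.sum(np.log(probs + 1e-12))    # small epsilon guards against log(0)
```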
However, synthesis is NOT possible yet. Why?
31. Handwriting Prediction
Mixture density network
The outputs of a neural network parametrize a mixture distribution.
The inputs consist of pen offsets and an end-of-stroke flag.
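A rough sketch of such a mixture-density output layer over 2-D pen offsets: the network's raw output vector is carved into mixture weights, means, scales, and an end-of-stroke probability. The layout and the omission of the correlation term are simplifications of mine, not the paper's exact parametrization.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mdn_params(y_hat, K):
    """Split raw network outputs into a K-component Gaussian mixture over (dx, dy).

    Assumed layout: [K mixture logits | 2K means | 2K log-stds | 1 end-of-stroke logit].
    """
    pi = softmax(y_hat[:K])                           # mixture weights
    mu = y_hat[K:3 * K].reshape(K, 2)                 # component means for (dx, dy)
    sigma = np.exp(y_hat[3 * K:5 * K]).reshape(K, 2)  # positive standard deviations
    eos = 1.0 / (1.0 + np.exp(-y_hat[5 * K]))         # end-of-stroke probability
    return pi, mu, sigma, eos
```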
95. Social LSTM
This paper proposes an LSTM model that learns general human movement and predicts future trajectories.
The resulting model, called Social LSTM, can jointly predict the paths of all the people in a scene by taking into account common-sense rules and social conventions.
97. Social LSTM
A new social pooling method is presented: a social hidden-state tensor is built for each LSTM by pooling the hidden states of neighboring pedestrians.
(Figure: social pooling.)
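A crude sketch of the idea, assuming each pedestrian has a hidden state and a 2-D position, and that neighbors' hidden states are summed into a small spatial grid centered on the pedestrian of interest (the grid convention and names are mine):

```python
import numpy as np

def social_tensor(i, positions, hidden, grid=4, cell=0.5):
    """Pool neighbors' hidden states into a grid x grid social tensor for pedestrian i."""
    H = np.zeros((grid, grid, hidden.shape[1]))
    for j, (p, h) in enumerate(zip(positions, hidden)):
        if j == i:
            continue
        offset = (p - positions[i]) / cell + grid / 2.0   # which cell the neighbor falls into
        gx, gy = int(np.floor(offset[0])), int(np.floor(offset[1]))
        if 0 <= gx < grid and 0 <= gy < grid:             # ignore pedestrians outside the grid
            H[gx, gy] += h                                 # sum hidden states sharing a cell
    return H.reshape(-1)                                   # flattened tensor fed to the LSTM
```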
98. Social LSTM
Position estimation
As above, a social hidden-state tensor is built for each LSTM by pooling the hidden states of neighboring pedestrians.
The future position of a pedestrian is modeled via a bivariate Gaussian distribution:
$(x, y)_{t+1} \sim \mathcal{N}(\mu_{t+1}, \sigma_{t+1}, \rho_{t+1})$
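A minimal sketch of drawing a next position from such a bivariate Gaussian, given the five parameters the LSTM output is mapped to (function and variable names are mine):

```python
import numpy as np

def sample_next_position(mu_x, mu_y, sigma_x, sigma_y, rho, rng=np.random.default_rng()):
    """Sample (x, y)_{t+1} from a bivariate Gaussian with correlation rho."""
    mean = np.array([mu_x, mu_y])
    cov = np.array([[sigma_x ** 2,            rho * sigma_x * sigma_y],
                    [rho * sigma_x * sigma_y, sigma_y ** 2]])
    return rng.multivariate_normal(mean, cov)
```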
101. DESIRE
This paper introduces DESIRE, a Deep Stochastic IOC RNN Encoder-decoder framework for predicting the futures of multiple interacting agents in dynamic scenes.
It focuses on the following mechanisms:
Diverse Sample Generation
IOC-based Ranking and Refinement
Scene Context Fusion
103. Proposed Architecture
Sample Generation Module
Both the past and the future trajectories are encoded with RNN Encoder1 and RNN Encoder2.
A conditional VAE samples a latent vector, which is fed into RNN Decoder1 to generate a future trajectory whose reconstruction error is minimized.
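A compressed PyTorch sketch of that sample-generation path, with generic GRU encoders, a reparameterized CVAE latent, and RNN Decoder1 producing the future trajectory; every layer size and wiring detail here is a guess for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SampleGeneration(nn.Module):
    """Sketch of a DESIRE-style sample-generation module (sizes are illustrative)."""

    def __init__(self, dim=2, hidden=48, latent=16):
        super().__init__()
        self.encoder1 = nn.GRU(dim, hidden, batch_first=True)   # RNN Encoder1: past trajectory
        self.encoder2 = nn.GRU(dim, hidden, batch_first=True)   # RNN Encoder2: future trajectory
        self.to_mu = nn.Linear(2 * hidden, latent)
        self.to_logvar = nn.Linear(2 * hidden, latent)
        self.decoder1 = nn.GRU(hidden + latent, hidden, batch_first=True)  # RNN Decoder1
        self.out = nn.Linear(hidden, dim)

    def forward(self, past, future):
        _, h_past = self.encoder1(past)                          # (1, B, hidden)
        _, h_future = self.encoder2(future)
        enc = torch.cat([h_past[-1], h_future[-1]], dim=-1)
        mu, logvar = self.to_mu(enc), self.to_logvar(enc)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # CVAE latent sample
        T = future.size(1)
        dec_in = torch.cat([h_past[-1], z], dim=-1).unsqueeze(1).repeat(1, T, 1)
        y, _ = self.decoder1(dec_in)
        return self.out(y)   # reconstructed future trajectory; train by minimizing its error
```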
106. Predictive State Recurrent Neural Networks
Many time-series modeling methods can be categorized as either recursive Bayes filtering or recurrent neural networks (a toy comparison of the two state updates follows below).
Recursive Bayes Filtering
Hidden Markov models (HMMs) or Kalman filtering (KF).
A Predictive State Representation (PSR) is a variation on Bayes filters that represents the state as statistics of the distribution of features of future observations.
Recurrent Neural Network (RNN)
RNNs model sequential data via a parameterized internal state and
update function.
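Here is that toy side-by-side of the two state updates: a discrete HMM forward-filter step (a recursive Bayes filter) against a vanilla RNN step. Matrix names and shapes are illustrative.

```python
import numpy as np

def bayes_filter_step(belief, obs, T, O):
    """Recursive Bayes filter for a discrete HMM: predict with transition T, correct with O[:, obs]."""
    predicted = T.T @ belief                 # predict the next hidden-state distribution
    corrected = predicted * O[:, obs]        # weight by the likelihood of the observation
    return corrected / corrected.sum()       # renormalize to a probability distribution

def rnn_step(h, x, W, U):
    """RNN update: a learned, parameterized state update with no explicit probabilistic semantics."""
    return np.tanh(W.T @ h + U.T @ x)
```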