Day 3 at ICML 2017 — musical RNNs

Here are my notes from ICML Day 3 (Tuesday).

Lots of interesting tracks (running in parallel) to choose from: Fisher approximations, Continuous optimization, RNNs, Reinforcement learning, Probabilistic inference, Clustering, Deep learning analysis, Game theory, etc.

The day was kicked off with the “Test of Time Award” presentation. Each year the committee looks back ~10 years and chooses the paper that has proven most impactful. This time it was “Combining Online and Offline Knowledge in UCT” – the paper that laid the foundation for AlphaGo’s success. The original idea of MoGo was to combine Reinforcement Learning with Monte-Carlo Tree Search; AlphaGo added a Deep Learning kick to it. Back in 2007 the authors made bets/predictions about the future of their algorithm, and beating Go’s world champion within 10 years was one of them.

Reinforcement learning

Several policy evaluation approaches were discussed.

Data-Efficient Policy Evaluation Through Behavior Policy Search. The key idea is to adapt the behavior policy parameters via gradient descent on the MSE of the importance-sampling estimator. The approach shows especially good results for high-variance policies.
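
To make the estimator concrete, here is a minimal numpy sketch of plain importance-sampling evaluation in a bandit-style setting (my reconstruction, not the authors’ code; all names are mine). Behavior policy search then tunes the behavior policy so that the MSE of this estimate shrinks:

    import numpy as np

    def is_estimate(returns, pi_e_probs, pi_b_probs):
        # returns     -- observed returns for the logged actions
        # pi_e_probs  -- probability the evaluation policy assigns to each logged action
        # pi_b_probs  -- probability the behavior policy assigned to that action
        weights = pi_e_probs / pi_b_probs   # importance weights
        return np.mean(weights * returns)   # unbiased if the behavior policy covers the evaluation policy

    # Behavior policy search (informally): adjust the behavior policy's parameters
    # by gradient descent so that the empirical MSE/variance of this estimate goes down.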

Optimal and Adaptive Off-policy Evaluation in Contextual Bandits. A new SWITCH estimator is proposed. For each action there is literally a switch depending on the conditions: either the IPS (inverse propensity scoring) or the DR (doubly robust) estimator is used; otherwise an oracle estimator is used. Performs quite well in practice.
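
A rough sketch of the switching idea (my reconstruction, assuming the switch is on the size of the importance weight; variable names are mine):

    import numpy as np

    def switch_estimate(rewards, weights, fallback_values, tau):
        # rewards         -- logged rewards r_i
        # weights         -- importance weights pi_e(a_i | x_i) / pi_b(a_i | x_i)
        # fallback_values -- oracle / reward-model estimate of the target policy's value at x_i
        # tau             -- switching threshold on the importance weight
        use_ips = weights <= tau
        return np.mean(np.where(use_ips, weights * rewards, fallback_values))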

Relational learning

This track drives home the point that deep models can learn expressive features of nodes in a relational graph.

Know-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs. A deep RNN learns non-linearly evolving entity representations over time. The network ingests new facts and updates the embeddings of the affected entities. It can predict the time when a fact may occur and supports prediction over unseen entities.

Recurrent neural networks

Lots of generated music in this track.

Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control. A combination of RNNs with Reinforcement Learning (RL).

Results are demonstrated on generating new music melodies and molecules. RNN #1 is pre-trained on data, and the probability distribution over the next token in the sequence learned by this model is treated as a prior policy. Then RL is applied: RNN #2 is trained to generate new outputs while staying close to the prior policy of RNN #1.
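
In spirit, the KL-control objective rewards the RL policy for the task score while penalizing divergence from the pre-trained prior. A minimal sketch (the weighting c and all names are illustrative, not taken from the paper’s code):

    def kl_controlled_reward(task_reward, log_prob_rl, log_prob_prior, c=1.0):
        # task_reward    -- domain reward, e.g. a music-theory or molecule-validity score
        # log_prob_rl    -- log pi_RL(action | state) under the policy being fine-tuned (RNN #2)
        # log_prob_prior -- log p_prior(action | state) under the pre-trained model (RNN #1)
        # Penalizing (log pi_RL - log p_prior) keeps the fine-tuned policy close,
        # in KL terms, to the pre-trained sequence model.
        return task_reward - c * (log_prob_rl - log_prob_prior)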

Here is a sample of the generated music. Regarding molecule generation: 35.8% of the generated output was valid (so not really usable for prod). Code is here.

Deep Voice: Real-time Neural Text-to-Speech.

Five subcomponents were presented:

  1. Segmentation model – finds phoneme boundaries
  2. Grapheme-to-phoneme model – performs the conversion
  3. Phoneme duration prediction model – RNN sequence regression
  4. Fundamental frequency (F0) prediction model
  5. Audio synthesis model

400x speedup over the previous implementation (thus – real time).
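
Roughly, the components compose like this at inference time (a sketch with placeholder function names, not the authors’ API; the segmentation model is only needed during training to locate phoneme boundaries):

    def synthesize(text, g2p, duration_model, f0_model, vocoder):
        phonemes = g2p(text)                     # grapheme-to-phoneme model
        durations = duration_model(phonemes)     # per-phoneme durations (RNN sequence regression)
        f0 = f0_model(phonemes, durations)       # fundamental frequency (F0) contour
        return vocoder(phonemes, durations, f0)  # audio synthesis model produces the waveform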

Evaluation: by humans. If both F0 and duration are synthesized, the score given by people is around 2 (pretty bad, actually). If F0 and duration are taken from the ground truth, the result gets much more realistic, with a human score of 3.8+.

DeepBach: a Steerable Model for Bach Chorales Generation. Pretty impressive demos. About 50% of human voters (a mixed audience) could not tell whether a chorale was real Bach or generated.

For the music lovers out there: here is how they encoded a four-voice chorale for the NN to learn:

[Image: encoding of a four-voice chorale]
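
Roughly (my reconstruction, since the figure is not reproduced here): each voice is written out on a fixed time grid, with a hold symbol marking that the previous pitch is sustained, so the four voices line up step by step:

    HOLD = "__"  # "keep holding the previous note"

    soprano = ["G4", HOLD, "A4", HOLD, "B4", HOLD, HOLD, HOLD]
    alto    = ["D4", HOLD, "D4", HOLD, "D4", HOLD, HOLD, HOLD]
    tenor   = ["B3", HOLD, "A3", HOLD, "G3", HOLD, HOLD, HOLD]
    bass    = ["G2", HOLD, "F#2", HOLD, "G2", HOLD, HOLD, HOLD]

    chorale = list(zip(soprano, alto, tenor, bass))  # one 4-tuple of tokens per time step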

Deep learning analysis

A Closer Look at Memorization in Deep Networks. This work focuses on the difference between learning on noise vs. real data. The main takeaways are (nothing really eye-opening, I think):

  1. DNNs do not just memorize data (phew!)
  2. Regularization helps reduce memorization
  3. DNNs learn simple patterns first

Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study.

They took developmental psychology methods that explain how children learn word labels for objects and applied that analysis to DNNs.

Result: state-of-the-art one-shot learning models trained on ImageNet exhibit a similar bias to that observed in humans: they prefer to categorize objects according to shape rather than color.

Axiomatic Attribution for Deep Networks. The idea is to examine the gradients of inputs obtained by interpolating along a straight-line path between the input at hand and a baseline input (a black image), and then aggregate these gradients.

The application is not limited to image classification (though it would definitely be of great use in domains such as healthcare). Code.
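
A minimal sketch of that interpolate-and-aggregate step, assuming a differentiable PyTorch classifier and a single input with a batch dimension; the step count and all names are mine:

    import torch

    def integrated_gradients(model, x, target_class, baseline=None, steps=50):
        if baseline is None:
            baseline = torch.zeros_like(x)   # e.g. an all-black image
        total_grad = torch.zeros_like(x)
        for alpha in torch.linspace(0.0, 1.0, steps):
            # point on the straight-line path between the baseline and the input
            point = (baseline + alpha * (x - baseline)).clone().requires_grad_(True)
            score = model(point)[0, target_class]   # class score at this point
            score.backward()
            total_grad += point.grad
        # Average gradient along the path, scaled by the input difference.
        return (x - baseline) * total_grad / steps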

On Calibration of Modern Neural Networks. The authors highlight a problem with the latest DNNs: they are overconfident when misclassifying; earlier, simpler networks such as LeNet did not have this issue. Factors contributing to the miscalibration are increased network capacity, batch norm, and less regularization (weight decay).

Simple trick: temperature scaling. It works well and does not require architectural changes.
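
A minimal sketch of temperature scaling, assuming logits from an already trained classifier; the single scalar T would be fit on a held-out validation set by minimizing the NLL with the network weights frozen:

    import torch.nn.functional as F

    def calibrate(logits, T):
        # T > 1 softens the predicted probabilities (less overconfident),
        # T = 1 leaves them unchanged, T < 1 sharpens them.
        return F.softmax(logits / T, dim=-1)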

 
