The amount of content is astounding. I'm learning a lot and am truly impressed by the magnitude of highly promising research happening around the world.
Day 2 at ICML had a great variety of parallel tracks, with topics covering Online Learning, Probabilistic Learning, Deep Generative Models, Deep Learning Theory, Supervised Learning, Latent Feature Models, Reinforcement Learning, Continuous Optimization, Matrix Factorization, Meta-learning, and more.
Bernhard Schölkopf kicked off the day with a talk on Causal Learning (book) and how causal ideas could be exploited for classical machine learning problems.
Deep Generative Models
Lots of interest in this area (no surprise). Here is what a few memorable talks were about (no links to the papers, as they are easy to find using your fav search engine ;)… maybe I will add those later).
Pixel CNN (also mentioned @ Day 1), Pixel CNN with Auxiliary Variables — tries to solve the problem of the model not capturing global image statistics (which makes generated images look somewhat shapeless).
Grayscale Pixel CNN uses a 4-bit grayscale view of the input as an auxiliary variable and thus retains global image info.
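As a rough sketch of what such a coarse auxiliary view could look like (the function name and luma weights here are my own illustration, not taken from the paper):

```python
import numpy as np

def to_4bit_grayscale(rgb):
    """Reduce an RGB image (H, W, 3, uint8) to a 4-bit grayscale view.

    The coarse view keeps global structure (shapes, layout) while
    discarding fine detail -- the kind of information the auxiliary
    variable is meant to carry.
    """
    # Standard luma weights for grayscale conversion (an assumption here).
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # Quantize 0..255 down to 16 levels (4 bits).
    return (gray / 256.0 * 16).astype(np.uint8)

img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
aux = to_4bit_grayscale(img)
assert aux.shape == (32, 32) and aux.max() <= 15
```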
The next reincarnation of this idea is Pyramid Pixel CNN: start by generating low-resolution 8×8 images (using a Pixel CNN), then recursively apply a Pixel CNN to upscale to 128×128.
A Python notebook with a Face Generation demo is here (built on TensorFlow).
Parallel Multiscale Autoregressive Density Estimation (by the DeepMind team) is work concurrent with the Pixel CNN papers above.
The goal is to accelerate sampling from Pixel CNNs by rearranging pixels into a subsampled image pyramid. Result: 100× speedup!
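The pixel-grouping trick can be sketched roughly like this (function names are mine, and the actual paper uses a deeper pyramid with network-based upsampling — this only shows why groups enable parallelism):

```python
import numpy as np

def split_into_subgrids(img):
    """Rearrange an image into 4 interleaved subsampled grids.

    Group 0 is a 2x-downsampled view of the image; the other three
    groups hold the remaining pixels. All pixels within one group can
    be sampled in parallel, conditioned on previously generated groups,
    which is the source of the speedup over strictly sequential
    pixel-by-pixel sampling.
    """
    return [img[0::2, 0::2], img[0::2, 1::2],
            img[1::2, 0::2], img[1::2, 1::2]]

def merge_subgrids(groups, shape):
    """Inverse operation: interleave the 4 groups back into an image."""
    out = np.empty(shape, dtype=groups[0].dtype)
    out[0::2, 0::2], out[0::2, 1::2] = groups[0], groups[1]
    out[1::2, 0::2], out[1::2, 1::2] = groups[2], groups[3]
    return out

img = np.arange(64).reshape(8, 8)
assert np.array_equal(merge_subgrids(split_into_subgrids(img), img.shape), img)
```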
Video Pixel Networks (by the DeepMind team again) aim to capture dependencies across time, space and RGB channels. The network consists of:
- Resolution preserving CNN encoders
- Pixel CNN decoders
Demo of results (generated videos):
- Moving MNIST here.
- Google Robotic Pushing with seen objects here.
- Google Robotic Pushing with novel objects here.
Learning Texture Manifolds with the Periodic Spatial GAN (PSGAN). This one demonstrated really visually pleasing results: complex generated textures.
For example, periodic textures were generated (honeycomb), and manifolds of textures were learnt that smoothly blend to create new textures.
Based on DCGAN. Code is here (Theano).
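A minimal sketch of PSGAN-style spatial noise, assuming the local/global/periodic decomposition of the noise tensor described in the paper (all names, dimensions and the exact wave parameterization here are illustrative, not the paper's):

```python
import numpy as np

def psgan_noise(h, w, d_local=10, d_global=10, d_periodic=4, rng=None):
    """Build an (h, w, d) spatial noise tensor in the spirit of PSGAN.

    - local part: independent noise per spatial location (texture detail)
    - global part: one vector tiled over all locations (texture identity;
      interpolating it moves along the texture manifold)
    - periodic part: sinusoids over the spatial grid, giving the
      generator an explicit handle on periodic structure like honeycombs
    """
    rng = rng or np.random.default_rng(0)
    local = rng.standard_normal((h, w, d_local))
    glob = np.tile(rng.standard_normal((1, 1, d_global)), (h, w, 1))
    ys, xs = np.mgrid[0:h, 0:w]
    waves = []
    for _ in range(d_periodic):
        kx, ky = rng.uniform(0.1, 1.0, size=2)  # random wave vector
        waves.append(np.sin(kx * xs + ky * ys))
    periodic = np.stack(waves, axis=-1)
    return np.concatenate([local, glob, periodic], axis=-1)

z = psgan_noise(16, 16)
assert z.shape == (16, 16, 24)
```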
DiscoGAN is not about dancing but about having two datasets and training a GAN to transfer features from one dataset to the other. The scenario presented was the following (appealing to the lesser part of the audience): dataset A had portraits of ladies with blond hair, dataset B — ladies with dark hair; the goal — change hair color while preserving the face.
That's accomplished by having 2 generators, 2 discriminators and a cyclic loss between them. Sounds like CycleGAN, right? Indeed, the 2 papers were published at approximately the same time and use the same approach. For whatever reason the CycleGAN paper got more popular.
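The cyclic loss itself is simple; here is a minimal sketch with the two generators as plain functions (names are mine):

```python
import numpy as np

def cycle_loss(G_ab, G_ba, a_batch, b_batch):
    """Cycle-consistency loss used by DiscoGAN/CycleGAN-style models.

    Translating A -> B -> back to A (and B -> A -> B) should recover
    the original input; the L1 distance penalizes any drift. This term
    is added to the usual adversarial losses of the two discriminators.
    """
    loss_a = np.abs(G_ba(G_ab(a_batch)) - a_batch).mean()
    loss_b = np.abs(G_ab(G_ba(b_batch)) - b_batch).mean()
    return loss_a + loss_b

# With identity "generators" the cycle is perfect and the loss is zero.
a = np.random.randn(4, 8)
b = np.random.randn(4, 8)
assert cycle_loss(lambda x: x, lambda x: x, a, b) == 0.0
```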
Wasserstein GAN is referenced quite a bit in GAN-related papers and got a super speedy presentation by the author.
This reddit comment does a fabulous job summarizing what it is about.
Advantages — super stable (especially the newer WGAN-GP), no mode collapse! Disadvantages — slow (at least 6× slower than DCGAN).
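A minimal sketch of the WGAN objectives with a toy critic (function names are mine; the real thing also needs the critic kept approximately 1-Lipschitz, via weight clipping in the original WGAN or a gradient penalty in WGAN-GP):

```python
import numpy as np

def wgan_critic_loss(critic, real, fake):
    """WGAN critic objective: maximize E[critic(real)] - E[critic(fake)].

    Written as a loss to *minimize*, so the sign is flipped. The gap
    between the two expectations estimates the Wasserstein distance
    between real and generated distributions.
    """
    return -(critic(real).mean() - critic(fake).mean())

def wgan_generator_loss(critic, fake):
    """Generator tries to raise the critic's score on fake samples."""
    return -critic(fake).mean()

# Toy critic: scores each sample by its mean value.
critic = lambda x: x.mean(axis=1)
real = np.ones((8, 4))
fake = np.zeros((8, 4))
assert wgan_critic_loss(critic, real, fake) == -1.0
```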
- Dataset augmentation https://arxiv.org/pdf/1707.03124.pdf
- Detecting non-existent pedestrians http://sunw.csail.mit.edu/abstract/Detecting_Nonexistent_Pedestrians.pdf
- Secure Steganography https://arxiv.org/ftp/arxiv/papers/1707/1707.01613.pdf
A few interesting ideas on flexible, adaptive ML below.
MetaNet — a framework for building flexible models. Steps include:
- define your NN
- define the adaptable part of the NN
- define feedback in the form of meta info
- inject fast weights
- optimize the model end-to-end
Code is here.
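A toy sketch of the fast-weights idea, assuming the common additive combination of slow and fast weights (the names and the additive form are my simplification, not necessarily MetaNet's exact scheme):

```python
import numpy as np

def fast_weight_layer(x, W_slow, W_fast):
    """Apply a layer whose weights are the sum of slow and fast weights.

    Slow weights are learned by ordinary SGD over many tasks; fast
    weights are produced on the fly by a meta-learner from meta
    information (e.g. loss gradients on a support set), letting the
    model adapt per task without retraining the slow weights.
    """
    return x @ (W_slow + W_fast)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3))
W_slow = rng.standard_normal((3, 4))
W_fast = np.zeros((3, 4))  # no adaptation -> behaves like the base layer
assert np.allclose(fast_weight_layer(x, W_slow, W_fast), x @ W_slow)
```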
SplitNet — semantically splitting DNNs for parameter reduction and model parallelization.
Application: large-scale visual recognition. The goal is to overcome the inter-node communication overhead that model parallelization brings => SplitNet introduces a hierarchy of disjoint sub-NNs, with relevant classes and features grouped together.
Code is here (TensorFlow).
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (MAML). Goal: learn a new task from a small amount of data (e.g., given 1 training image per class, classify an unseen image).
The gist of the approach: MAML optimizes for a set of parameters such that when a gradient step is taken with respect to a particular task i, the parameters end up close to the optimal parameters θ*_i for that task.
A supporting blog post with links to code (TensorFlow) is here.
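A tiny first-order sketch of the meta-update on a scalar model (this drops MAML's second-order term and uses analytic gradients; it is purely illustrative, not the paper's algorithm):

```python
import numpy as np

def fomaml_step(w, tasks, alpha=0.1, beta=0.1):
    """One first-order MAML meta-update for the scalar model f(x) = w * x.

    For each task (x, y): take an inner gradient step on that task's
    squared loss, then accumulate the gradient evaluated at the adapted
    parameters. Full MAML would backprop through the inner step (a
    second-order term); the first-order variant simply ignores it.
    """
    meta_grad = 0.0
    for x, y in tasks:
        grad = 2 * (w * x - y) * x           # d/dw of (w*x - y)^2
        w_task = w - alpha * grad            # inner adaptation step
        meta_grad += 2 * (w_task * x - y) * x
    return w - beta * meta_grad / len(tasks)

# Two toy tasks whose optimal slopes are 2 and 3; the meta-learned
# initialization settles between them, at 2.5.
tasks = [(1.0, 2.0), (1.0, 3.0)]
w = 0.0
for _ in range(100):
    w = fomaml_step(w, tasks)
assert abs(w - 2.5) < 1e-6
```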
AdaNet — learns the NN architecture itself alongside the needed weights.
Construction of the NN is incremental, and the complexity of the resulting NN depends on how difficult the task is: the NN will have fewer nodes for the classification task “deer” vs. “truck” than for the more challenging task of classifying “cat” vs. “dog”.
The general techniques are applicable to CNNs, RNNs, ResNets, and more.
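Since the code is not public, here is only a conceptual sketch of greedy incremental growth in the AdaNet spirit — keep a candidate unit only if it improves validation loss. Everything below is my own simplification, not the actual algorithm:

```python
import numpy as np

def grow_network(X, y, X_val, y_val, candidates=20, rng=None):
    """Greedy incremental construction, AdaNet-style (simplified).

    Repeatedly propose a random ReLU unit; keep it only if adding it
    (with a least-squares refit of the output weights) improves the
    validation loss. Harder tasks naturally end up with more units.
    """
    rng = rng or np.random.default_rng(0)
    feats, val_feats = [np.ones(len(X))], [np.ones(len(X_val))]  # bias
    best = np.inf
    for _ in range(candidates):
        w = rng.standard_normal(X.shape[1])          # random ReLU unit
        f, fv = np.maximum(X @ w, 0), np.maximum(X_val @ w, 0)
        Phi = np.column_stack(feats + [f])
        coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        loss = np.mean((np.column_stack(val_feats + [fv]) @ coef - y_val) ** 2)
        if loss < best:                              # keep only if it helps
            best = loss
            feats.append(f); val_feats.append(fv)
    return len(feats) - 1, best                      # units kept, final loss

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5)); w_true = rng.standard_normal(5)
y = np.maximum(X @ w_true, 0)
X_val = rng.standard_normal((100, 5)); y_val = np.maximum(X_val @ w_true, 0)
units, loss = grow_network(X, y, X_val, y_val)
```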
The Google Research team does not yet have clear plans to open source this gem, so people concerned about data science job security can sleep OK (for now).