Brain endurance, or Day 2 at ICML 2017

The amount of content is astounding. I am learning a lot and am truly impressed by the magnitude of highly promising research happening around the world.

Day 2 at ICML had a great variety of parallel tracks, with topics covering Online Learning, Probabilistic Learning, Deep Generative Models, Deep Learning Theory, Supervised Learning, Latent Feature Models, Reinforcement Learning, Continuous Optimization, Matrix Factorization, Meta-learning, and more.

Bernhard Schölkopf kicked off the day with a talk on Causal Learning (book) and how causal ideas can be exploited for classical machine learning problems.

Deep Generative Models

Lots of interest in this area (no surprise). Here is what a few memorable talks were about (no links to papers, as they are easy to find with your favorite search engine ;)… maybe I will add them later).

Pixel CNN (also mentioned on Day 1), Pixel CNN with Auxiliary Variables — tries to solve the problem of the model not capturing global image statistics (which makes generated images look somewhat shapeless).

Grayscale Pixel CNN uses a 4-bit grayscale view of the input as an auxiliary variable and thus retains global image info.
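The auxiliary view is just a coarse quantization of the input; a minimal sketch of how such a 4-bit view could be computed (my own toy illustration, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8))   # 8-bit grayscale input

# 4-bit auxiliary view: keep only the top 4 bits of each pixel.
# This coarse version still carries the global shape of the image,
# which is what the model conditions on.
aux = (img >> 4) << 4
```

Every value in `aux` is a multiple of 16, i.e. one of 16 gray levels.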

The next reincarnation of this idea is the Pyramid Pixel CNN: start by generating low-resolution 8×8 images (using a Pixel CNN) and then recursively apply a Pixel CNN to upscale to 128×128.

A Python notebook with a Face Generation demo is here (built on TensorFlow).
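The recursive-upscaling loop can be sketched in a few lines. The `pixelcnn_upscale` function below is a hypothetical stand-in (a real conditional PixelCNN would be a deep net); the point is only the control flow from 8×8 up to 128×128:

```python
import numpy as np

def pixelcnn_upscale(img):
    # Stand-in for a conditional PixelCNN that doubles the resolution;
    # here just nearest-neighbour upsampling, to show the recursion only.
    return img.repeat(2, axis=0).repeat(2, axis=1)

img = np.zeros((8, 8))            # step 1: sample a low-res image
while img.shape[0] < 128:         # step 2: recursively upscale to 128x128
    img = pixelcnn_upscale(img)
```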


Parallel Multiscale Autoregressive Density Estimation (by the DeepMind team) is work concurrent with the PixelCNNs above.

The goal is to accelerate sampling from PixelCNNs by rearranging pixels into a subsampling image pyramid. Result: 100x speedup!
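A toy sketch of the rearrangement idea (my own illustration, under the assumption of a simple 2×2 interleaving; the paper uses a more elaborate pyramid):

```python
import numpy as np

img = np.arange(16).reshape(4, 4)

# Rearrange pixels into 4 interleaved groups (even/odd rows x even/odd cols).
# Groups are generated one after another; within a group, every pixel can be
# sampled in parallel given the previous groups -- the source of the speedup.
groups = [img[i::2, j::2] for i in (0, 1) for j in (0, 1)]
```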


Video Pixel Networks (by the DeepMind team again) aim to capture dependencies across time, space, and RGB channels. The network consists of

  1. Resolution-preserving CNN encoders
  2. Pixel CNN decoders

Demo of results (generated videos):

  1. Moving MNIST here.
  2. Google Robotic Pushing with seen objects here.
  3. Google Robotic Pushing with novel objects here.
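A shape-level sketch of how the two components above fit together (function names and channel counts are my own placeholders, not the paper's):

```python
import numpy as np

T, H, W, C = 5, 16, 16, 3
frames = np.zeros((T, H, W, C))

def encoder(frame):
    # Resolution preserving: spatial dims unchanged, channels expanded.
    return np.zeros((frame.shape[0], frame.shape[1], 32))

def pixelcnn_decoder(features):
    # Emits a 256-way distribution per pixel per RGB channel.
    return np.zeros((features.shape[0], features.shape[1], C, 256))

context = encoder(frames[0])       # encode the previous frame
logits = pixelcnn_decoder(context) # decode the next frame pixel by pixel
```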

Learning Texture Manifolds with the Periodic Spatial GAN (PSGAN). This one demonstrated really visually pleasing results: complex generated textures.

For example, periodic textures were generated (honeycomb), and manifolds of textures were learned that smoothly blend to create new textures.

Based on DCGAN. Code is here (Theano).


DiscoGAN is not about dancing but about having two datasets and training a GAN to transfer features from one dataset to another. The scenario presented was the following (appealing to the lesser part of the audience): dataset A had portraits of ladies with blond hair, dataset B — ladies with dark hair; the goal — change hair color while preserving the face.

That is accomplished by having 2 Generators, 2 Discriminators, and a cyclic loss between them. Sounds like CycleGAN, right? Indeed, the two papers were published at approximately the same time and use the same approach. For whatever reason the CycleGAN paper got more popular.
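The cyclic loss is the key piece: after translating A→B→A (and B→A→B), you should get your input back. A minimal sketch with toy linear "generators" (real ones are deep CNNs); since the two toy maps are exact inverses, the loss comes out zero:

```python
import numpy as np

W_ab = np.array([[0.0, 1.0], [1.0, 0.0]])   # toy generator: domain A -> B
W_ba = W_ab.T                               # toy generator: domain B -> A

def g_ab(x): return x @ W_ab
def g_ba(x): return x @ W_ba

def cycle_loss(x_a, x_b):
    # Penalize reconstruction error after full A->B->A and B->A->B trips;
    # this is what keeps the translated face recognizably the same person.
    return (np.abs(g_ba(g_ab(x_a)) - x_a).mean()
            + np.abs(g_ab(g_ba(x_b)) - x_b).mean())

x_a = np.random.rand(4, 2)
x_b = np.random.rand(4, 2)
```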


Wasserstein GAN is referenced quite a bit in GAN-related papers and got a super speedy presentation by the author.

This reddit comment does a fabulous job of summarizing what it is about.

Advantages — super stable (especially the newer WGAN-GP), no mode collapse! Disadvantages — slow (at least 6x slower than DCGAN).
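A toy sketch of the WGAN-GP critic objective, under the simplifying assumption of a linear critic (real critics are deep nets; for a linear critic the gradient is just its weight vector everywhere, which keeps the sketch runnable):

```python
import numpy as np

rng = np.random.default_rng(0)

w = rng.normal(size=3)                      # toy linear critic f(x) = x @ w
def critic(x): return x @ w

real = rng.normal(loc=1.0, size=(64, 3))
fake = rng.normal(loc=-1.0, size=(64, 3))

# Critic objective: maximize E[f(real)] - E[f(fake)] (an estimate of the
# Wasserstein distance), i.e. minimize the negated difference.
w_loss = critic(fake).mean() - critic(real).mean()

# WGAN-GP: penalize the critic's gradient norm away from 1 at points
# interpolated between real and fake samples.
eps = rng.uniform(size=(64, 1))
x_hat = eps * real + (1.0 - eps) * fake
grad_norm = np.full(64, np.linalg.norm(w))  # linear critic: grad is w
gp = ((grad_norm - 1.0) ** 2).mean()

critic_loss = w_loss + 10.0 * gp            # lambda = 10 in the WGAN-GP paper
```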

Interesting applications:

  1. Dataset augmentation
  2. Detecting non-existent pedestrians
  3. Secure Steganography



A few interesting ideas below on flexible, adaptive ML.

MetaNet — a framework for building flexible models. Steps include:

  1. define your NN
  2. define the adaptable part of the NN
  3. define feedback in the form of meta info
  4. inject fast weights
  5. optimize the model end-to-end

Code is here.
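The "inject fast weights" step can be sketched as follows. This is my own toy illustration: the `meta_learner` here is a hypothetical placeholder (in MetaNet it is itself a trained network consuming gradient-based meta information):

```python
import numpy as np

rng = np.random.default_rng(1)

# Slow weights: learned as usual by the base optimizer.
slow_w = rng.normal(size=(4, 2))

def meta_learner(meta_info):
    # Hypothetical: map meta info (e.g. loss gradients) to fast weights.
    return 0.1 * meta_info

def layer(x, fast_w):
    # Step 4 of the list above: fast weights injected alongside slow ones.
    return x @ (slow_w + fast_w)

x = rng.normal(size=(3, 4))
meta_info = rng.normal(size=(4, 2))   # stands in for gradient meta info
out = layer(x, meta_learner(meta_info))
```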


SplitNet — semantically splitting DNNs for parameter reduction and model parallelization.

Application: large-scale visual recognition. The goal is to overcome the node-communication overhead that Model Parallelization brings => SplitNet introduces a hierarchy of disjoint sub-NNs, where relevant classes and features are grouped together.

Code is here (Tensorflow).
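The "disjoint sub-NNs" idea can be pictured as a block-diagonal weight matrix (my own toy illustration of the structure, not the paper's code):

```python
import numpy as np

# Two disjoint sub-networks as a block-diagonal weight matrix:
# features 0-1 feed classes 0-1 and features 2-3 feed classes 2-3, so each
# half can sit on its own device with no cross-device traffic.
w_top = np.ones((2, 2))
w_bottom = np.ones((2, 2))
W = np.block([[w_top, np.zeros((2, 2))],
              [np.zeros((2, 2)), w_bottom]])

x = np.array([1.0, 2.0, 3.0, 4.0])
logits = x @ W          # each output only sees its own feature group
```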


Model-Agnostic Meta-Learning for Fast Adaptation of DNNs (MAML). Goal: learn a new task from a small amount of data (given 1 training image per class, classify an unseen image).

The gist of the approach: MAML optimizes for a set of parameters such that when a gradient step is taken with respect to a particular task i, the parameters are close to the optimal parameters θ∗i for that task.



A supporting blog post with links to code (TensorFlow) is here.
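The inner/outer loop structure can be shown on a toy problem. A minimal sketch, assuming scalar linear regression tasks y = a·x (the tasks, learning rates, and closed-form chain rule here are my own simplifications; real MAML differentiates through the inner step automatically):

```python
import numpy as np

# Find an initialization theta that is one inner gradient step away
# from doing well on every task y = a * x.
x = np.linspace(-1.0, 1.0, 10)
tasks = [1.0, 2.0, 3.0]                 # each task: a true slope a
inner_lr, outer_lr = 0.1, 0.05

def grad(theta, a):
    # d/dtheta of the squared error mean((theta*x - a*x)^2)
    return np.mean(2.0 * (theta - a) * x * x)

theta = 0.0
c = np.mean(2.0 * x * x)                # grad is linear in theta: grad' = c
for _ in range(500):
    meta_grad = 0.0
    for a in tasks:
        theta_i = theta - inner_lr * grad(theta, a)     # inner adaptation
        # Outer gradient via the chain rule through the inner step.
        meta_grad += (1.0 - inner_lr * c) * grad(theta_i, a)
    theta -= outer_lr * meta_grad / len(tasks)
```

With symmetric quadratic tasks the meta-learned init converges to the mean slope (2.0), from which any task is one cheap step away.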


AdaNet — learn the NN architecture itself alongside the needed weights.

Construction of the NN is incremental. The complexity of the resulting NN depends on how difficult the task is: the NN will have fewer nodes for the classification task “deer” \ “truck” than for the more challenging task of classifying “cat” \ “dog”.

The general techniques are applicable to CNNs, RNNs, ResNets, etc.
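A loose sketch of the incremental flavor (my own illustration, with linear models standing in for candidate subnetworks and ordinary least squares standing in for the paper's learning-theoretic selection criterion):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(32, 4))
y = x @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=32)

def mse(p):
    return float(np.mean((p - y) ** 2))

# Repeatedly fit a small "subnetwork" to the current residual and add it
# to the ensemble, so capacity grows only as the task demands.
subnets, preds = [], np.zeros(32)
start = mse(preds)
for _ in range(3):
    w, *_ = np.linalg.lstsq(x, y - preds, rcond=None)
    subnets.append(w)
    preds = preds + x @ w
```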

The Google Research team does not yet have clear plans to open-source this gem, so people concerned about data science job security can sleep OK (for now).
