Day 4 at ICML 2017 — more Adversarial NNs

The morning talk was about Deep Reinforcement Learning in Complex Environments by Raia Hadsell from DeepMind. Overall there were lots of great talks at the conference from DeepMind and Google Brain. The talk was generously sprinkled with newly published papers by DeepMind researchers in the Reinforcement Learning\Gaming space. Angry Birds is not yet solved, just FYI if somebody is up for a challenge.

Main algos\approaches covered in the talk were: hierarchical reinforcement learning, continual learning, continuous control, multimodal agents, auxiliary tasks. See the quite entertaining and nicely annotated demos here.

Deep learning & hardware

Main  theme: let’s use CPUs effectively and make NN computation effective on mobile devices.

Device Placement Optimization with Reinforcement Learning. Reward function: runtime of the program. Input: NN and available CPU\GPU devices.


  1. Policy finds non-trivial assignments, outperforms heuristics.
  2. Policy learns trade-offs between the computation and communication capabilities of devices.
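
To make the "reward = runtime" idea concrete, here is a minimal sketch of a REINFORCE loop over placements. Everything in it is an assumption for illustration: the toy cost model in `runtime`, the per-op table of logits standing in for the policy, and all names and hyperparameters. The paper's policy is a learned sequence model and its reward is measured on real hardware, so treat this only as the shape of the idea.

```python
import numpy as np

def runtime(placement, op_cost, device_speed, comm_cost):
    """Toy cost model (assumption): ops run one after another, and moving
    between devices for consecutive ops adds a fixed communication cost."""
    total = 0.0
    for i, dev in enumerate(placement):
        total += op_cost[i] / device_speed[dev]
        if i > 0 and placement[i - 1] != dev:
            total += comm_cost
    return total

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def train_placement(op_cost, device_speed, comm_cost, steps=200, lr=0.1, seed=0):
    """REINFORCE with a moving-average baseline; reward is negative runtime."""
    rng = np.random.default_rng(seed)
    n_ops, n_dev = len(op_cost), len(device_speed)
    logits = np.zeros((n_ops, n_dev))      # hypothetical policy: one softmax per op
    baseline = None
    for _ in range(steps):
        probs = softmax(logits)
        placement = np.array([rng.choice(n_dev, p=probs[i]) for i in range(n_ops)])
        reward = -runtime(placement, op_cost, device_speed, comm_cost)
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Gradient of log-probability of the sampled placement w.r.t. the logits.
        grad = -probs
        grad[np.arange(n_ops), placement] += 1.0
        logits += lr * advantage * grad
    return softmax(logits).argmax(axis=1)

op_cost = [1.0, 2.0, 3.0]
device_speed = [1.0, 2.0]
learned = train_placement(op_cost, device_speed, comm_cost=0.5)
```

The communication term is what forces a trade-off: a faster device is not worth it if getting there costs more than it saves.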

Deep Tensor Convolution on Multicores. The author makes the point that using CPUs (vs GPUs) is still a thing:

  1. You may not have access to GPUs. CPUs are generally more available.
  2. If you’re in an autoregressive domain (i.e. forced batch size == 1), it’s hard to utilize a GPU.
  3. GPUs have memory limitations.

The authors show the math they used to optimize convolution algorithms for CPU hardware. They maximize CPU utilization and multicore scalability by transforming data matrices to be cache-aware.

Result: 5-25x speedup on 2D ConvNets.

Code is not open sourced, paper has pseudocode  that should be sufficient to reimplement.

MEC: Memory-efficient Convolution for Deep Neural Network. Motivation: reduce the memory requirements of convolution layer computation so it can run efficiently on devices like cameras and smartphones.

MEC has a smaller memory footprint, removes redundant info, and leverages cache locality and explicit parallelism.
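
To see where the redundant info comes from, here is a small sketch of the conventional im2col lowering that memory-efficient schemes are usually compared against. This is the baseline, not MEC itself, and the function names are mine. Each input pixel is copied into every patch that covers it, so the lowered matrix is several times larger than the input.

```python
import numpy as np

def im2col(x, k):
    """Lower a 2D input (H, W) into rows of flattened k x k patches
    (stride 1, no padding)."""
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((out_h * out_w, k * k), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

def conv2d_im2col(x, kernel):
    """Convolution (cross-correlation) as a single matrix-vector product."""
    k = kernel.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    return (im2col(x, k) @ kernel.ravel()).reshape(out_h, out_w)

x = np.arange(49, dtype=np.float64).reshape(7, 7)
kernel = np.ones((3, 3))
cols = im2col(x, 3)
out = conv2d_im2col(x, kernel)
blowup = cols.size / x.size   # 225 / 49, roughly 4.6x more memory than the input
```

Even for a 7x7 input and a 3x3 kernel the lowered matrix holds 225 values for 49 inputs, which is the overhead a memory-efficient lowering tries to cut.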

Beyond Filters: Compact Feature Map for Portable Deep Model. Same motivation as paper above: make convnets effective on mobile devices.

Main point: DNN with lots of filters in a layer has redundant info in feature maps.

The authors demonstrate how a feature map can be transformed into a more compact representation via a matrix transformation. The resulting network has fewer params, thus needs less memory and is faster to compute.

Speedup: 4-5x.

Efficient softmax approximation for GPUs.  Application: language modelling.

Goal: to speed up training of large models (that  take days to train).

Zipf's law: a small part of the vocabulary accounts for most word occurrences, thus computation should be very efficient for frequent words. So the vocabulary is split up into clusters and an approximate hierarchical model is defined.
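
A minimal sketch of the cluster idea, assuming word ids are sorted by frequency and using just one rare-word cluster. The paper's model is more elaborate than this, and all names and sizes here are made up for illustration. Frequent words need only the small head softmax; the big tail softmax is computed only when a rare word shows up.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def word_log_prob(hidden, word, W_head, W_tail, n_frequent):
    """log p(word | hidden) with a frequent-word head and one tail cluster."""
    head = log_softmax(W_head @ hidden)      # n_frequent words + 1 "tail" slot
    if word < n_frequent:
        return head[word]                    # cheap path: head only
    tail = log_softmax(W_tail @ hidden)      # expensive path, rare words only
    return head[n_frequent] + tail[word - n_frequent]

rng = np.random.default_rng(0)
vocab, n_frequent, dim = 1000, 100, 16
W_head = rng.normal(size=(n_frequent + 1, dim))
W_tail = rng.normal(size=(vocab - n_frequent, dim))
hidden = rng.normal(size=dim)

# The two-level model is still a proper distribution over the full vocabulary.
total = sum(np.exp(word_log_prob(hidden, w, W_head, W_tail, n_frequent))
            for w in range(vocab))
```

With these sizes the common case multiplies by a 101-row matrix instead of a 1000-row one, and by Zipf's law the common case is most of the corpus.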

Result: 2-10x speedup

Code (Torch).


Interesting applications of Adversarial Networks to text generation.

Toward Controlled Generation of Text. Allows control of user-specified attributes such as sentiment and tense.

Semi-supervised learning: sentence & label pairs for training were synthesized.

Model: encoder, generator and multiple discriminators. Each controlled attribute has its own discriminator, e.g. for the sentiment attribute the discriminator is a sentiment classifier.

Some generated sentences are quite good.

Adversarial Feature Matching for Text Generation. Another attempt to generate text with GANs.

The author noted that mode collapse is an even bigger problem for text than for images, as transitions in text are less smooth than in images.

Generator: LSTM, translates latent vector into  synthetic sentence.

Discriminator: CNN. The sentence is represented as a matrix, followed by convolution + max-pooling.

D tries to detect informative text features and G tries to match those.
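
A rough sketch of both halves, under my own assumptions: a sentence is a (T, d) matrix of word embeddings, the discriminator features come from convolution filters followed by max-over-time pooling, and "matching" is approximated here by the squared distance between mean feature vectors of real and synthetic batches. That last part is a simple stand-in and not necessarily the objective used in the paper.

```python
import numpy as np

def sentence_features(sentence, filters):
    """sentence: (T, d) embeddings; filters: (n_filters, h, d).
    Returns one max-pooled activation per filter, shape (n_filters,)."""
    n_filters, h, d = filters.shape
    T = sentence.shape[0]
    windows = np.stack([sentence[t:t + h] for t in range(T - h + 1)])  # (T-h+1, h, d)
    conv = np.tensordot(windows, filters, axes=([1, 2], [1, 2]))       # (T-h+1, n_filters)
    return np.tanh(conv).max(axis=0)                                   # max over time

def feature_matching_loss(real_batch, fake_batch, filters):
    """Squared distance between mean features (a stand-in matching objective)."""
    real = np.mean([sentence_features(s, filters) for s in real_batch], axis=0)
    fake = np.mean([sentence_features(s, filters) for s in fake_batch], axis=0)
    return float(np.sum((real - fake) ** 2))

rng = np.random.default_rng(0)
filters = rng.normal(size=(8, 3, 16))                     # 8 filters, window of 3 words
real_batch = [rng.normal(size=(10, 16)) for _ in range(4)]
fake_batch = [rng.normal(size=(12, 16)) for _ in range(4)]
loss = feature_matching_loss(real_batch, fake_batch, filters)
```

Max-over-time pooling is what lets sentences of different lengths end up as feature vectors of the same size.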

For training the author combined the arXiv and BookCorpus datasets. The generated sentences are quite novel, for example “I waited alone in a gene expression dataset…”

Code (Theano).


This track showed some cool applications of DNNs.

Dance Dance Convolution. The authors produced a DNN that was taught to choreograph: it generates step charts from raw audio (to be used in the popular video game Dance Dance Revolution).

Task 1 is step placement. For that task model predicts steps at frame level.  Convolutional front end + LSTM RNN.

Task 2 is step selection (left, right, etc.). An LSTM is used.

Positive user reviews: 3.8

World of Bits: An Open-Domain Platform for Web-Based Agents. Use AI to automate web-based tasks like booking a flight via a website.

Data collection: pixel input from the screen, the DOM of the web page, and keyboard and mouse (pointer) events (captured from a session where a human navigates the web page).

Reward is defined through the POST request: the POST request from the AI should be similar to the one from the human session.
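
One way to picture such a reward, as a sketch only: compare the form fields the agent submitted with the fields from the recorded human session and score the overlap. The similarity measure and the field names below are hypothetical; the talk did not spell out the exact comparison.

```python
def post_reward(agent_fields, human_fields):
    """Fraction of the human's POST form fields that the agent reproduced
    exactly (hypothetical similarity measure)."""
    if not human_fields:
        return 0.0
    matched = sum(1 for key, value in human_fields.items()
                  if agent_fields.get(key) == value)
    return matched / len(human_fields)

# Hypothetical field names for a flight-booking form.
human = {"origin": "SFO", "destination": "SYD", "date": "2017-08-10"}
agent = {"origin": "SFO", "destination": "SYD", "date": "2017-08-11"}
reward = post_reward(agent, human)   # two of three fields match
```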

Current success rate on the United website: 30% correct bookings, which is far from being prod ready for booking flights.

Real-Time Adaptive Image Compression (paper). To my mind this one is a killer application of GANs (no open sourced code though).

Outperforms all existing codecs.

Discriminator is used for higher perceptual fidelity: ensure that decompressed image looks comparable to the original.

Result files are 2.5x smaller than JPEG.

Neural Message Passing for Quantum Chemistry. Molecules are represented as 3D graphs with node and edge features.
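
For readers new to the idea, here is a minimal sketch of one message passing step on such a graph, followed by a sum readout. The message and update functions (single tanh layers), the weight shapes and the toy three-atom graph are assumptions for illustration, not the architecture from the paper.

```python
import numpy as np

def message_passing_step(h, edges, edge_feat, W_msg, W_upd):
    """h: (n_nodes, d) node states; edges: list of directed (src, dst) pairs;
    edge_feat: (n_edges, e). Returns updated node states, shape (n_nodes, d)."""
    messages = np.zeros_like(h)
    for (src, dst), e in zip(edges, edge_feat):
        # Message depends on the sender's state and the edge (bond) features.
        messages[dst] += np.tanh(W_msg @ np.concatenate([h[src], e]))
    # Update each node from its old state and its aggregated incoming messages.
    return np.tanh(W_upd @ np.concatenate([h, messages], axis=1).T).T

def readout(h):
    """Order-independent graph-level vector: sum over nodes."""
    return h.sum(axis=0)

rng = np.random.default_rng(0)
d, e = 4, 2
h = rng.normal(size=(3, d))                    # three atoms
edges = [(0, 1), (1, 0), (0, 2), (2, 0)]       # two bonds, both directions
edge_feat = rng.normal(size=(len(edges), e))
W_msg = rng.normal(size=(d, d + e))
W_upd = rng.normal(size=(d, 2 * d))
out = readout(message_passing_step(h, edges, edge_feat, W_msg, W_upd))
```

Summing messages and summing in the readout means the result does not depend on how the atoms happen to be numbered, which is what you want for molecules.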

Current tools that chemists use take a long time (>1) to run DFT (density functional theory) calculations for big molecules.

Goal: predict the outcome of DFT calculations directly.

The QM9 dataset was used for training; it has a limited number of molecules (thus the resulting model could not really be used in real life).
