Days 5 & 6 at ICML. All done.

The last two days of the conference were workshops, and they actually had less rock-star content.

Overall, ICML this year was well organized (well, minus the pass holders that emitted a constant cowbell-like tinkling) and rich in content. I did not notice any breakthrough papers, though. Lots of RNNs, LSTMs, language/speech-related work, GANs and Reinforcement Learning.

Toolset-wise, it “feels” like mostly TensorFlow, Caffe and PyTorch; even Matlab was mentioned a few times.


Principled Approaches to Deep Learning

This track was about the theoretical understanding of DNN architectures.

Do GANs actually learn the distribution? I personally had higher expectations for this talk. The main point was that yes, it’s hard to quantify the success of the GAN training algorithm, and that mode collapse is a problem. That’s pretty much all there is to it.

Towards a deeper understanding of quantized networks. DNNs are big, and there is a subset of quantized/low-precision networks that are fast (no multiplications), need little storage and power, and thus could be used on low-power devices. In the talk, the author explores the nuances of training quantized networks.

Their experiments showed that floating-point networks move from exploration to exploitation as the learning rate shrinks: they become really focused on finding an optimum. Networks with Stochastic Rounding (SR), however, kind of get stuck in the exploration phase. That’s a problem because SGD relies on an exploitation phase, so networks trained with SR will surely run into issues.
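To get a feel for why SR behaves this way, here is a toy NumPy sketch of my own (not from the talk). The grid spacing `delta`, the weight and the step size are made-up values: a single weight sits on a low-precision grid and receives an update smaller than one grid cell. Deterministic rounding loses the update, while stochastic rounding (round up with probability equal to the fractional distance to the lower grid point) keeps it on average at the cost of whole-cell jumps.

```python
import numpy as np

def deterministic_round(x, delta):
    """Round to the nearest point of the grid {k * delta}."""
    return np.round(np.asarray(x) / delta) * delta

def stochastic_round(x, delta, rng):
    """Round to one of the two neighbouring grid points, rounding up with
    probability equal to the fractional distance from the lower point,
    so the result equals x in expectation."""
    scaled = np.asarray(x) / delta
    lower = np.floor(scaled)
    round_up = rng.random(scaled.shape) < (scaled - lower)
    return (lower + round_up) * delta

delta = 1 / 16   # hypothetical grid spacing of a low-precision weight
w = 0.5          # current weight, exactly on the grid
step = -0.01     # hypothetical lr * gradient, smaller than delta

# Deterministic rounding swallows the small update entirely.
print(deterministic_round(w + step, delta))   # 0.5

# Stochastic rounding keeps the update on average...
rng = np.random.default_rng(0)
updated = stochastic_round(np.full(100_000, w + step), delta, rng)
print(updated.mean())        # close to 0.49
# ...but every single step lands on a grid point, a full cell apart.
print(np.unique(updated))    # 0.4375 and 0.5
```

In this toy setup the spread of one SR step is roughly sqrt(delta * |step|), so relative to the step itself the noise grows like sqrt(delta / |step|) as the learning rate shrinks. That is my reading of why SR keeps exploring instead of settling down.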

SVCCA: Singular Vector Canonical Correlation Analysis for Deep Understanding and Improvement. A vector-based approach to understanding how a NN is trained: neurons are vectors (a neuron’s activations over a set of inputs), and layers are the subspaces they span. By applying SVD and then CCA, we can compare the representational similarity of layers.
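A minimal NumPy sketch of the SVCCA idea, assuming each layer’s activations come as a (neurons × datapoints) matrix over the same inputs. SVD keeps the directions explaining most of the variance (the 0.99 threshold is my assumption), then CCA finds maximally correlated directions between the two reduced subspaces; the mean canonical correlation serves as a similarity score. This is not the authors’ reference implementation.

```python
import numpy as np

def svd_reduce(acts, keep=0.99):
    """acts: (num_neurons, num_datapoints). Center each neuron, then keep the
    top singular directions explaining `keep` of the variance."""
    acts = acts - acts.mean(axis=1, keepdims=True)
    _, s, vt = np.linalg.svd(acts, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(explained, keep)) + 1, len(s))
    return s[:k, None] * vt[:k]          # (k, num_datapoints), still centered

def cca_correlations(x, y):
    """Canonical correlations between two centered (dims, num_datapoints)
    matrices: the singular values of Qx^T Qy, where Qx and Qy are
    orthonormal bases of their spans over the datapoints."""
    qx, _ = np.linalg.qr(x.T)
    qy, _ = np.linalg.qr(y.T)
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)

def svcca(acts1, acts2, keep=0.99):
    """Mean SVCCA correlation between two layers (1.0 = same subspace)."""
    return cca_correlations(svd_reduce(acts1, keep), svd_reduce(acts2, keep)).mean()
```

Because CCA is invariant to invertible linear transforms, a layer and any invertible linear mix of its neurons score about 1.0, while two unrelated random layers score much lower.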

Some observations:

  1. Layers converge bottom-up (the ones closest to the input solidify first) => lower layers can be frozen earlier in training, thus improving generalization
  2. Interpretability: by applying SVCCA to a layer and the output, we can measure how sensitive the network is to different classes throughout the network.


The Sum-Product Theorem: A Foundation for Learning Tractable Models. The 15-page masterpiece by A. Friesen and P. Domingos is here.

The paper describes a unifying framework for learning tractable models for a variety of problems, including optimization, satisfiability, constraint programming, inference, etc. The unification is achieved by considering sum-product functions over a semiring, i.e. a set equipped with a sum operator and a product operator. When the sum-product function is decomposable (the children of every product node have disjoint scopes), it is shown that any problem that boils down to summing out the variables of the function can be solved in time linear in the size of its representation.
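To make the “same algorithm, different semiring” idea concrete, here is a small Python sketch of my own (not code from the paper or from LibSPN). It evaluates a tiny tree-shaped sum-product function whose leaves are univariate tables; summing out a variable happens at the leaf. That is valid because every product node has children with disjoint scopes and, as an extra assumption of this sketch, every sum node has children with the same scope. Swapping the semiring turns the same pass into marginalization (sum-product), maximization over all assignments (max-product, where sum nodes also become max) or a satisfiability check (Boolean or/and). The brute-force function is only there as a reference.

```python
import functools
import itertools
import operator
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    lift: Callable[[float], Any] = lambda w: w   # maps a leaf weight into the semiring

SUM_PRODUCT = Semiring(operator.add, operator.mul, 0.0)                  # marginalization
MAX_PRODUCT = Semiring(max, operator.mul, 0.0)                           # maximization (weights >= 0)
BOOLEAN = Semiring(operator.or_, operator.and_, False, lambda w: w > 0)  # satisfiability

@dataclass
class Leaf:
    var: str
    table: dict      # value of `var` -> non-negative weight (a univariate function)

@dataclass
class Node:
    op: str          # "sum" or "prod"
    children: list

def evaluate(node, sr, assignment=None):
    """Evaluate in semiring `sr`. Variables missing from `assignment` are
    summed out at the leaves: one pass, linear in the size of the tree."""
    assignment = assignment or {}
    if isinstance(node, Leaf):
        if node.var in assignment:
            return sr.lift(node.table[assignment[node.var]])
        return functools.reduce(sr.plus, (sr.lift(w) for w in node.table.values()))
    op = sr.plus if node.op == "sum" else sr.times
    return functools.reduce(op, (evaluate(c, sr, assignment) for c in node.children))

def brute_force(node, sr, variables, domain=(0, 1)):
    """Reference answer: sum F over every full assignment (exponential time)."""
    total = sr.zero
    for values in itertools.product(domain, repeat=len(variables)):
        total = sr.plus(total, evaluate(node, sr, dict(zip(variables, values))))
    return total

# A hypothetical mixture over x1, x2; the weights 0.3 / 0.7 are folded into the x1 leaves.
spf = Node("sum", [
    Node("prod", [Leaf("x1", {0: 0.3 * 0.2, 1: 0.3 * 0.8}), Leaf("x2", {0: 0.5, 1: 0.5})]),
    Node("prod", [Leaf("x1", {0: 0.7 * 0.9, 1: 0.7 * 0.1}), Leaf("x2", {0: 0.4, 1: 0.6})]),
])
```

With this toy model, `evaluate(spf, SUM_PRODUCT)` sums to about 1.0 and `evaluate(spf, SUM_PRODUCT, {"x1": 1})` gives the marginal P(x1 = 1), about 0.31. For a DAG with shared sub-networks you would memoize node results to keep the pass linear.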

LibSPN. It’s a library for learning and inference with Sum-Product Networks (SPNs), based on TensorFlow (multi-GPU).

SPNs are seen as promising models that combine the benefits of Deep Learning and probabilistic modeling. SPNs learn from high-dimensional, noisy data.

Promising results: speech/language modeling, robotics, image classification and completion.

Visualization for Deep Learning

A few interesting gradient-based methods were discussed, mostly related to image classification/segmentation/captioning.

Interpreting Deep Visual Representations. Goal: quantify the interpretability of the latent representations of CNNs.

Solution: evaluate units on semantic segmentation (the paper and blog have really interesting charts). The authors created the Broden dataset, which is heavily annotated (63K+ images, 1K+ visual concepts); most examples are segmented down to the pixel level, except textures.

The Network Dissection method evaluates every individual convolutional unit in a CNN as a solution to a binary segmentation task for every visual concept in the Broden dataset.
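The scoring step can be sketched in a few lines of NumPy. Assumptions of this sketch: the unit’s activation maps are already upsampled to the resolution of the concept masks; the unit is binarized with a dataset-wide top-quantile threshold (0.995); and a unit is labeled with its best concept only if the IoU exceeds 0.04. Those two numbers reflect my reading of the paper, so treat them as tunable parameters. The function names are hypothetical.

```python
import numpy as np

def unit_concept_iou(acts, masks, quantile=0.995):
    """acts: (num_images, H, W) activations of ONE conv unit, assumed already
    upsampled to the mask resolution. masks: (num_images, H, W) boolean
    segmentation masks of ONE visual concept. Returns the dataset-wide IoU."""
    threshold = np.quantile(acts, quantile)    # assumed top-quantile threshold
    unit_mask = acts > threshold               # binary segmentation produced by the unit
    inter = np.logical_and(unit_mask, masks).sum()
    union = np.logical_or(unit_mask, masks).sum()
    return inter / union if union else 0.0

def label_unit(acts, concept_masks, iou_cutoff=0.04):
    """concept_masks: {concept name: masks}. Returns (best concept, IoU),
    with None as the concept if even the best match is below the cutoff."""
    scores = {name: unit_concept_iou(acts, m) for name, m in concept_masks.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > iou_cutoff else None), scores[best]
```

Running `label_unit` for every unit of every conv layer gives the per-layer counts of color, texture, part and object detectors that the paper’s charts are built from.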

The interpretability of units in all convolutional layers is analyzed: color and texture concepts dominate the lower layers, while more object and part detectors emerge in the higher ones.

Network architectures are analyzed:


The authors explore how training conditions, such as the number of iterations, dropout and batch norm, affect the representations learned by neural networks:


A nice video shows how concepts emerge while a model is being trained.

Code (Caffe).

Grad-CAM: Visual explanations from NNs (paper, blog, cool demos). Approach: use the gradients of the target concept flowing into the final convolutional layer to produce a coarse localization map highlighting the image regions that are important for predicting the concept. Overall it is a known approach (building on CAM and Guided Backprop), but with a more refined technique and extensive human studies.
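The core Grad-CAM computation is short enough to sketch in NumPy. This assumes you already have the final conv layer’s feature maps and the gradients of the class score with respect to them from your framework of choice; the function name and the (channels, H, W) layout are my own choices. Average-pool the gradients per channel to get channel weights, take the weighted sum of the feature maps, and apply a ReLU so that only positive evidence remains.

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """feature_maps, grads: (K, H, W) arrays for the final conv layer, where
    grads = d(class score) / d(feature_maps) (computed by your framework)."""
    alphas = grads.mean(axis=(1, 2))                  # one weight per channel (global average pooling)
    cam = np.tensordot(alphas, feature_maps, axes=1)  # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU: keep only positive influence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1] for display
    return cam
```

Passing negated gradients (`-grads`) gives the kind of negative explanation shown in the talk. The resulting map has the conv layer’s resolution and still needs to be upsampled to the image size before overlaying it.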

Image captioning explanations:


The attention map analysis for Visual Question Answering was interesting:


Negative explanations highlight the regions that would make the network predict a different class:


Code (Caffe)

Time Series

During that track it was emphasized that time series analysis is a hugely popular practical application of ML (finance, economics, climate, IoT, web, healthcare, energy, astrophysics, traffic, etc.).

Two types of approaches:

  1. Extensive feature engineering + classical time series ML algorithms
  2. With a huge number of homogeneous time series, DNNs can be applied (so the laborious feature engineering can be skipped)
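For the first type of approach, the simplest possible “features + classical algorithm” example is an autoregressive model: turn the last p values into lag features and fit a linear model by least squares. The NumPy sketch below is my own illustration with hypothetical function names, not something shown in the track; real pipelines would add calendar, rolling-window and other engineered features.

```python
import numpy as np

def lag_features(y, p):
    """Row for time t (t = p .. n-1) holds [y[t-1], ..., y[t-p]]; target is y[t]."""
    n = len(y)
    X = np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])
    return X, y[p:]

def fit_ar(y, p):
    """Least-squares AR(p) fit. Returns [intercept, w_lag1, ..., w_lagp]."""
    X, target = lag_features(y, p)
    X = np.column_stack([np.ones(len(X)), X])    # intercept column
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def forecast_next(y, coef):
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    lags = y[-1 : -p - 1 : -1]                   # [y[-1], y[-2], ..., y[-p]]
    return coef[0] + lags @ coef[1:]
```

Swapping the linear model for gradient boosting or a random forest on the same lag matrix is the usual next step; the DNN route replaces the hand-built matrix with learned representations.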


Visualizing and forecasting big time series data. The talk was presented by Rob Hyndman (website), the author of many time series related books and R packages. Time series visualization, automatic forecasting, and hierarchical and grouped time series were discussed.

The following R packages were used:

  1. forecast
  2. anomalous
  3. hts
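The idea behind hierarchical forecasting (what `hts` deals with) is that forecasts for the total, the groups and the individual series should add up. Here is a rough NumPy sketch, assuming a made-up hierarchy Total → {A, B}, A → {AA, AB}, B → {BA, BB}. The summing matrix S maps bottom-level series to every node. Bottom-up forecasting multiplies bottom-level forecasts by S, while the OLS form of the “optimal combination” approach from Hyndman and colleagues projects independent forecasts for all nodes onto the coherent ones. This re-derives the idea in Python and is not the `hts` API.

```python
import numpy as np

# Hypothetical hierarchy. Rows: Total, A, B, AA, AB, BA, BB; columns: bottom series.
S = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

def bottom_up(bottom_forecasts, S):
    """Aggregate bottom-level forecasts to every node of the hierarchy."""
    return S @ bottom_forecasts

def ols_reconcile(base_forecasts, S):
    """y_tilde = S (S'S)^{-1} S' y_hat: orthogonal projection of independent
    per-node forecasts onto the space of forecasts that add up."""
    bottom = np.linalg.solve(S.T @ S, S.T @ base_forecasts)
    return S @ bottom

# Made-up base forecasts that do NOT add up (e.g. A+B = 95, but Total = 100).
base = np.array([100.0, 55.0, 40.0, 30.0, 20.0, 22.0, 21.0])
reconciled = ols_reconcile(base, S)
```

After reconciliation the total equals the sum of the bottom-level forecasts, and forecasts that were already coherent come back unchanged, since the map is a projection.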

A secret link to v2 of the free book “Forecasting: Principles and Practice” is here.

PS: a baby kangaroo photo as a bonus 😉
