[ICLR 2019] Poster Paper Roundup

A roundup of ICLR 2019 poster papers, 478 papers in total

Convolutional Neural Networks on Non-uniform Geometrical Signals Using Euclidean Spectral Transformation
Keywords:Non-uniform Fourier Transform, 3D Learning, CNN, surface reconstruction
TL;DR:We use non-Euclidean Fourier Transformation of shapes defined by a simplicial complex for deep learning, achieving significantly better results than point-based sampling techniques used in current 3D learning literature.

Augmented Cyclic Adversarial Learning for Low Resource Domain Adaptation
Keywords:Domain adaptation, generative adversarial network, cyclic adversarial learning, speech
TL;DR:A new cyclic adversarial learning method, augmented with an auxiliary task model, that improves domain adaptation performance in low-resource supervised and unsupervised settings.

Variance Networks: When Expectation Does Not Meet Your Expectations
Keywords:deep learning, variational inference, variational dropout
TL;DR:It is possible to learn a zero-centered Gaussian distribution over the weights of a neural network by learning only variances, and it works surprisingly well.
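
The idea in this TL;DR can be illustrated with a minimal sketch, assuming a single fully connected layer whose weights are resampled from a zero-mean Gaussian on every forward pass. The function name `variance_layer_forward` and the `sigma` matrix layout are hypothetical and are not taken from the paper; only the standard deviations would be trainable in such a layer.

```python
import random


def variance_layer_forward(x, sigma, rng):
    """Forward pass of a toy zero-mean stochastic layer.

    Each weight is drawn as w_ij = sigma_ij * eps with eps ~ N(0, 1),
    so the weight mean is fixed at zero and sigma is the only parameter.
    (Hypothetical helper for illustration, not the authors' code.)
    """
    n_out = len(sigma[0])
    out = []
    for j in range(n_out):
        total = 0.0
        for i, xi in enumerate(x):
            w = sigma[i][j] * rng.gauss(0.0, 1.0)  # sample a zero-centered weight
            total += xi * w
        out.append(total)
    return out


# Two inputs, two outputs; sigma holds one standard deviation per weight.
sigma = [[0.5, 1.0], [2.0, 0.1]]
sample_a = variance_layer_forward([1.0, -1.0], sigma, random.Random(0))
sample_b = variance_layer_forward([1.0, -1.0], sigma, random.Random(0))
zero_out = variance_layer_forward([1.0, -1.0], [[0.0, 0.0], [0.0, 0.0]], random.Random(0))
```

Because the expected output of such a layer is zero, any information it passes on has to be carried by the spread of its outputs rather than their mean, which is the property the title alludes to.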

Initialized Equilibrium Propagation for Backprop-Free Training
Keywords:credit assignment, energy-based models, biologically plausible learning
TL;DR:We train a feedforward network without backprop by using an energy-based model to provide local targets

Explaining Image Classifiers by Counterfactual Generation
Keywords:Explainability, Interpretability, Generative Models, Saliency Map, Machine Learning, Deep Learning
TL;DR:We compute saliency by using a strong generative model to efficiently marginalize over plausible alternative inputs, revealing concentrated pixel areas that preserve label information.

SNIP: Single-Shot Network Pruning Based on Connection Sensitivity
Keywords:neural network pruning, connection sensitivity
TL;DR:We present a new approach, SNIP, that is simple, versatile and interpretable; it prunes irrelevant connections for a given task in a single shot prior to training and is applicable to a variety of neural network models without modifications.

Diagnosing and Enhancing VAE Models
Keywords:variational autoencoder, generative models
TL;DR:We closely analyze the VAE objective function and draw novel conclusions that lead to simple enhancements.

Disjoint Mapping Network for Cross-modal Matching of Voices and Faces
Keywords:None
TL;DR:None

Automatically Composing Representation Transformations as a Means for Generalization
Keywords:compositionality, deep learning, metareasoning
TL;DR:We explore the problem of compositional generalization and propose a means for endowing neural network architectures with the ability to compose themselves to solve these problems.

Visual Reasoning by Progressive Module Networks
Keywords:None
TL;DR:None

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
Keywords:Deep Convolutional Neural Networks, Gaussian Processes, Bayesian
TL;DR:Finite-width SGD trained CNNs vs. infinitely wide fully Bayesian CNNs. Who wins?

Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference
Keywords:None
TL;DR:None

Sparse Dictionary Learning by Dynamical Neural Networks
Keywords:None
TL;DR:None

Eidetic 3D LSTM: A Model for Video Prediction and Beyond
Keywords:None
TL;DR:None

ALISTA: Analytic Weights Are As Good As Learned Weights in LISTA
Keywords:None
TL;DR:None

Three Mechanisms of Weight Decay Regularization
Keywords:Generalization, Regularization, Optimization
TL;DR:We investigate weight decay regularization for different optimizers and identify three distinct mechanisms by which weight decay improves generalization.

Learning Multimodal Graph-to-Graph Translation for Molecule Optimization
Keywords:graph-to-graph translation, graph generation, molecular optimization
TL;DR:We introduce a graph-to-graph encoder-decoder framework for learning diverse graph translations.

A Data-Driven and Distributed Approach to Sparse Signal Representation and Recovery
Keywords:Sparsity, Compressive Sensing, Convolutional Network
TL;DR:We use deep learning techniques to solve the sparse signal representation and recovery problem.

On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data
Keywords:learning from only unlabeled data, empirical risk minimization, unbiased risk estimator
TL;DR:Three class priors are all you need to train deep models from only U data, while any two should not be enough.

Neural Logic Machines
Keywords:Neural-Symbolic Computation, Rule Induction, First-Order Logic
TL;DR:We propose the Neural Logic Machine (NLM), a neural-symbolic architecture for both inductive learning and logic reasoning.

Neural Speed Reading with Structural-Jump-LSTM
Keywords:natural language processing, speed reading, recurrent neural network, classification
TL;DR:We propose a new model for neural speed reading that utilizes the inherent punctuation structure of a text to define effective jumping and skipping behavior.

Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
Keywords:agent evaluation, adversarial examples, robustness, safety, reinforcement learning
TL;DR:We show that rare but catastrophic failures may be missed entirely by random testing, which poses issues for safe deployment. Our proposed approach for adversarial testing fixes this.

Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
Keywords:None
TL;DR:None

signSGD via Zeroth-Order Oracle
Keywords:nonconvex optimization, zeroth-order algorithm, black-box adversarial attack
TL;DR:We design and analyze a new zeroth-order stochastic optimization algorithm, ZO-signSGD, and demonstrate its connection and application to black-box adversarial attacks in robust deep learning

Preventing Posterior Collapse with delta-VAEs
Keywords:Posterior Collapse, VAE, Autoregressive Models
TL;DR:Avoid posterior collapse by lower bounding the rate.

Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees
Keywords:model-based reinforcement learning, sample efficiency, deep reinforcement learning
TL;DR:We design model-based reinforcement learning algorithms with theoretical guarantees and achieve state-of-the-art results on MuJoCo benchmark tasks when one million or fewer samples are permitted.

Knowledge Flow: Improve Upon Your Teachers
Keywords:Transfer Learning, Reinforcement Learning
TL;DR:‘Knowledge Flow’ trains a deep net (student) by injecting information from multiple nets (teachers). The student is independent upon training and performs very well on learned tasks irrespective of the setting (reinforcement or supervised learning).

Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information
Keywords:Imitation Learning, Reinforcement Learning, Deep Learning
TL;DR:Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information

A Max-Affine Spline Perspective of Recurrent Neural Networks
Keywords:RNN, max-affine spline operators
TL;DR:We provide new insights and interpretations of RNNs from a max-affine spline operators perspective.

Learning to Navigate the Web
Keywords:navigating web pages, reinforcement learning, q learning, curriculum learning, meta training
TL;DR:We train reinforcement learning policies using reward augmentation, curriculum learning, and meta-learning to successfully navigate web pages.

Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability
Keywords:verification, adversarial robustness, adversarial examples, stability, deep learning, regularization
TL;DR:We develop methods to train deep neural models that are both robust to adversarial perturbations and whose robustness is significantly easier to verify.

Learning to Learn with Conditional Class Dependencies
Keywords:meta-learning, learning to learn, few-shot learning
TL;DR:CAML is an instance of MAML with conditional class dependencies.

Hierarchical Visuomotor Control of Humanoids
Keywords:hierarchical reinforcement learning, motor control, motion capture
TL;DR:Solve tasks involving vision-guided humanoid locomotion, reusing locomotion behavior from motion capture data.

Unsupervised Adversarial Image Reconstruction
Keywords:None
TL;DR:None

Max-MIG: an Information Theoretic Approach for Joint Learning from Crowds
Keywords:None
TL;DR:None

AutoLoss: Learning Discrete Schedule for Alternate Optimization
Keywords:Meta Learning, AutoML, Optimization Schedule
TL;DR:We propose a unified formulation for iterative alternate optimization and develop AutoLoss, a framework to automatically learn and generate optimization schedules.

Learning what and where to attend
Keywords:Attention models, human feature importance, object recognition, cognitive science
TL;DR:A large-scale dataset for training attention models for object recognition leads to more accurate, interpretable, and human-like object recognition.

Robust Estimation via Generative Adversarial Networks
Keywords:robust statistics, neural networks, minimax rate, data depth, contamination model, Tukey median, GAN
TL;DR:GANs are shown to provide us a new effective robust mean estimate against agnostic contaminations with both statistical optimality and practical tractability.

INVASE: Instance-wise Variable Selection using Neural Networks
Keywords:None
TL;DR:None

Meta-Learning with Latent Embedding Optimization
Keywords:meta-learning, few-shot, miniImageNet, tieredImageNet, hypernetworks, generative, latent embedding, optimization
TL;DR:Latent Embedding Optimization (LEO) is a novel gradient-based meta-learner with state-of-the-art performance on the challenging 5-way 1-shot and 5-shot miniImageNet and tieredImageNet classification tasks.

Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach
Keywords:generalization, deep-learning, pac-bayes
TL;DR:We obtain non-vacuous generalization bounds on ImageNet-scale deep neural networks by combining an original PAC-Bayes bound and an off-the-shelf neural network compression method.

Learning to Represent Edits
Keywords:None
TL;DR:None

Neural Probabilistic Motor Primitives for Humanoid Control
Keywords:Motor Primitives, Distillation, Reinforcement Learning, Continuous Control, Humanoid Control, Motion Capture, One-Shot Imitation
TL;DR:Neural Probabilistic Motor Primitives compress motion capture tracking policies into one flexible model capable of one-shot imitation and reuse as a low-level controller.

Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder
Keywords:differentiable dynamic programming, variational auto-encoder, dependency parsing, semi-supervised learning
TL;DR:Differentiable dynamic programming over perturbed input weights with application to semi-supervised VAE

Janossy Pooling: Learning Deep Permutation-Invariant Functions for Variable-Size Inputs
Keywords:representation learning, permutation invariance, set functions, feature pooling
TL;DR:We propose Janossy pooling, a method for learning deep permutation invariant functions designed to exploit relationships within the input sequence and tractable inference strategies such as a stochastic optimization procedure we call piSGD

An Empirical Study of Example Forgetting during Deep Neural Network Learning
Keywords:catastrophic forgetting, sample weighting, deep generalization
TL;DR:We show that catastrophic forgetting occurs within what is considered to be a single task and find that examples that are not prone to forgetting can be removed from the training set without loss of generalization.

RNNs implicitly implement tensor-product representations
Keywords:tensor-product representations, compositionality, neural network interpretability, recurrent neural networks
TL;DR:RNNs implicitly implement tensor-product representations, a principled and interpretable method for representing symbolic structures in continuous space.

Learning To Solve Circuit-SAT: An Unsupervised Differentiable Approach
Keywords:Neuro-Symbolic Methods, Circuit Satisfiability, Neural SAT Solver, Graph Neural Networks
TL;DR:We propose a neural framework that can learn to solve the Circuit Satisfiability problem from (unlabeled) circuit instances.

Dynamic Channel Pruning: Feature Boosting and Suppression
Keywords:dynamic network, faster CNNs, channel pruning
TL;DR:We make convolutional layers run faster by dynamically boosting and suppressing channels in feature computation.

signSGD with Majority Vote is Communication Efficient and Fault Tolerant
Keywords:large-scale learning, distributed systems, communication efficiency, convergence rate analysis, robust optimisation
TL;DR:Workers send gradient signs to the server, and the update is decided by majority vote. We show that this algorithm is convergent, communication efficient and fault tolerant, both in theory and in practice.
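
The update rule described in this TL;DR can be sketched in a few lines, assuming plain Python lists for parameters and gradients. The function names `sign` and `majority_vote_step` and the learning rate are hypothetical choices for illustration, not the authors' implementation.

```python
def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def majority_vote_step(params, worker_grads, lr):
    """One step: workers send gradient signs, the server applies the majority sign.

    params: list of floats; worker_grads: one gradient list per worker.
    (Hypothetical helper for illustration.)
    """
    new_params = []
    for i, p in enumerate(params):
        # Each worker contributes only the sign of its gradient coordinate.
        votes = sum(sign(grad[i]) for grad in worker_grads)
        # The server moves against the sign that won the vote.
        new_params.append(p - lr * sign(votes))
    return new_params


params = [1.0, -2.0]
worker_grads = [[0.5, -0.1], [0.2, 0.3], [-0.4, -0.7]]
updated = majority_vote_step(params, worker_grads, lr=0.1)
```

In this toy run two of three workers vote for a positive first coordinate and a negative second coordinate, so the first parameter decreases by `lr` and the second increases by `lr`, regardless of the gradient magnitudes.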

Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces
Keywords:None
TL;DR:None

K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning
Keywords:deep learning, mobile, transfer learning, multi-task learning, computer vision, small models, imagenet, inception, batch normalization
TL;DR:A novel and practically effective method to adapt pretrained neural networks to new tasks by retraining a minimal (e.g., less than 2%) number of parameters

Towards Metamerism via Foveated Style Transfer
Keywords:Metamerism, foveation, perception, style transfer, psychophysics
TL;DR:We introduce a novel feed-forward framework to generate visual metamers

Post Selection Inference with Incomplete Maximum Mean Discrepancy Estimator
Keywords:None
TL;DR:None

Emergent Coordination Through Competition
Keywords:Multi-agent learning, Reinforcement Learning
TL;DR:We introduce a new MuJoCo soccer environment for continuous multi-agent reinforcement learning research, and show that population-based training of independent reinforcement learners can learn cooperative behaviors

Prior Convictions: Black-box Adversarial Attacks with Bandits and Priors
Keywords:adversarial examples, gradient estimation, black-box attacks, model-based optimization, bandit optimization
TL;DR:We present a unifying view on black-box adversarial attacks as a gradient estimation problem, and then present a framework (based on bandits optimization) to integrate priors into gradient estimation, leading to significantly increased performance.

Sample Efficient Imitation Learning for Continuous Control
Keywords:Imitation Learning, Continuous Control, Reinforcement Learning, Inverse Reinforcement Learning, Conditional Generative Adversarial Network
TL;DR:We propose a model-free, off-policy IL algorithm for continuous control. Experimental results show that our algorithm achieves competitive results with GAIL while significantly reducing the environment interactions.

Generative Code Modeling with Graphs
Keywords:Generative Model, Source Code, Graph Learning
TL;DR:Representing programs as graphs including semantics helps when generating programs

Critical Learning Periods in Deep Networks
Keywords:Critical Period, Deep Learning, Information Theory, Artificial Neuroscience, Information Plasticity
TL;DR:Sensory deficits in early training phases can lead to irreversible performance loss in both artificial and neuronal networks, suggesting information phenomena as the common cause and pointing to the importance of the initial transient and forgetting.

CEM-RL: Combining evolutionary and gradient-based methods for policy search
Keywords:evolution strategy, deep reinforcement learning
TL;DR:We propose a new combination of evolution strategy and deep reinforcement learning which takes the best of both worlds

LanczosNet: Multi-Scale Deep Graph Convolutional Networks
Keywords:None
TL;DR:None

Excessive Invariance Causes Adversarial Vulnerability
Keywords:Generalization, Adversarial Examples, Invariance, Information Theory, Invertible Networks
TL;DR:We show deep networks are not only too sensitive to task-irrelevant changes of their input, but also too invariant to a wide range of task-relevant changes, thus making vast regions in input space vulnerable to adversarial attacks.

Hindsight policy gradients
Keywords:reinforcement learning, policy gradients, multi-goal reinforcement learning
TL;DR:We extend policy gradient methods with the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended.

Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Keywords:Optimization, SGD, Adam, Generalization
TL;DR:Novel variants of optimization methods that combine the benefits of both adaptive and non-adaptive methods.

Decoupled Weight Decay Regularization
Keywords:None
TL;DR:None

Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile
Keywords:Mirror descent, extra-gradient, generative adversarial networks, saddle-point problems
TL;DR:We show how the inclusion of an extra-gradient step in first-order GAN training methods can improve stability and lead to improved convergence results.

DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder
Keywords:None
TL;DR:None

No Training Required: Exploring Random Encoders for Sentence Classification
Keywords:None
TL;DR:None

Neural Graph Evolution: Automatic Robot Design
Keywords:Reinforcement learning, graph neural networks, robotics, deep learning, transfer learning
TL;DR:Automatic robotic design search with graph neural networks

Function Space Particle Optimization for Bayesian Neural Networks
Keywords:None
TL;DR:None

Structured Adversarial Attack: Towards General Implementation and Better Interpretability
Keywords:None
TL;DR:None

Spherical CNNs on Unstructured Grids
Keywords:Spherical CNN, unstructured grid, panoramic, semantic segmentation, parameter efficiency
TL;DR:We present a new CNN kernel for unstructured grids for spherical signals, and show significant accuracy and parameter efficiency gain on tasks such as 3D classification and omnidirectional image segmentation.

Optimal Transport Maps For Distribution Preserving Operations on Latent Spaces of Generative Models
Keywords:generative models, optimal transport, distribution preserving operations
TL;DR:We propose a framework for modifying the latent space operations such that the distribution mismatch between the resulting outputs and the prior distribution the generative model was trained on is fully eliminated.

Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning
Keywords:Deep Model Learning, Robot Control
TL;DR:This paper introduces a physics prior for Deep Learning and applies the resulting network topology for model-based control.

Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks
Keywords:reduced precision floating-point, partial sum accumulation bit-width, deep learning, training
TL;DR:We present an analytical framework to determine accumulation bit-width requirements in all three deep learning training GEMMs and verify the validity and tightness of our method via benchmarking experiments.

Deep Convolutional Networks as shallow Gaussian Processes
Keywords:Gaussian process, CNN, ResNet, Bayesian
TL;DR:We show that CNNs and ResNets with appropriate priors on the parameters are Gaussian processes in the limit of infinitely many convolutional filters.

Unsupervised Domain Adaptation for Distance Metric Learning
Keywords:domain adaptation, distance metric learning, face recognition
TL;DR:A new theory of unsupervised domain adaptation for distance metric learning and its application to face recognition across diverse ethnicity variations.

A comprehensive, application-oriented study of catastrophic forgetting in DNNs
Keywords:incremental learning, deep neural networks, catastrophic forgetting, sequential learning
TL;DR:We check DNN models for catastrophic forgetting using a new evaluation scheme that reflects typical application conditions, with surprising results.

Posterior Attention Models for Sequence to Sequence Learning
Keywords:posterior inference, attention, seq2seq learning, translation
TL;DR:Computing attention based on posterior distribution leads to more meaningful attention and better performance

Generative Question Answering: Learning to Answer the Whole Question
Keywords:Question answering, question generation, reasoning, squad, clevr
TL;DR:Question answering models that model the joint distribution of questions and answers can learn more than discriminative models

Diversity and Depth in Per-Example Routing Models
Keywords:conditional computation, routing models, depth
TL;DR:Per-example routing models benefit from architectural diversity, but still struggle to scale to a large number of routing decisions.

Selfless Sequential Learning
Keywords:Lifelong learning, Continual Learning, Sequential learning, Regularization
TL;DR:A regularization strategy for improving the performance of sequential learning

M^3RL: Mind-aware Multi-agent Management Reinforcement Learning
Keywords:Multi-agent Reinforcement Learning, Deep Reinforcement Learning
TL;DR:We propose Mind-aware Multi-agent Management Reinforcement Learning (M^3RL) for training a manager to motivate self-interested workers to achieve optimal collaboration by assigning suitable contracts to them.

The Deep Weight Prior
Keywords:deep learning, variational inference, prior distributions
TL;DR:The deep weight prior learns a generative model for kernels of convolutional neural networks, that acts as a prior distribution while training on new datasets.

Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution
Keywords:Neural Architecture Search, AutoML, AutoDL, Deep Learning, Evolutionary Algorithms, Multi-Objective Optimization
TL;DR:We propose a method for efficient Multi-Objective Neural Architecture Search based on Lamarckian inheritance and evolutionary algorithms.

Quaternion Recurrent Neural Networks
Keywords:None
TL;DR:None

Adversarial Audio Synthesis
Keywords:audio, waveform, spectrogram, GAN, adversarial, WaveGAN, SpecGAN
TL;DR:Learning to synthesize raw waveform audio with GANs

Preconditioner on Matrix Lie Group for SGD
Keywords:preconditioner, stochastic gradient descent, Newton method, Fisher information, natural gradient, Lie group
TL;DR:We propose a new framework for preconditioner learning, derive new forms of preconditioners and learning methods, and reveal the relationship to methods like RMSProp, Adam, Adagrad, ESGD, KFAC, batch normalization, etc.

Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks
Keywords:None
TL;DR:None

Adaptive Posterior Learning: few-shot learning with a surprise-based memory module
Keywords:metalearning, memory, few-shot, relational, self-attention, classification, sequential, reasoning, working memory, episodic memory
TL;DR:We introduce a model which generalizes quickly from few observations by storing surprising information and attending over the most relevant data at each time point.

Probabilistic Planning with Sequential Monte Carlo methods
Keywords:control as inference, probabilistic planning, sequential monte carlo, model based reinforcement learning
TL;DR:Leveraging control as inference and Sequential Monte Carlo methods, we propose a probabilistic planning algorithm.

Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control
Keywords:deep reinforcement learning, exploration, model-based
TL;DR:We propose a framework that incorporates planning for efficient exploration and learning in complex environments.

DHER: Hindsight Experience Replay for Dynamic Goals
Keywords:None
TL;DR:None

FlowQA: Grasping Flow in History for Conversational Machine Comprehension
Keywords:Machine Comprehension, Conversational Agent, Natural Language Processing, Deep Learning
TL;DR:We propose the Flow mechanism and an end-to-end architecture, FlowQA, that achieves SotA on two conversational QA datasets and a sequential instruction understanding task.

Learning to Design RNA
Keywords:matter engineering, bioinformatics, rna design, reinforcement learning, meta learning, neural architecture search, hyperparameter optimization
TL;DR:We learn to solve the RNA Design problem with reinforcement learning using meta learning and autoML approaches.

Robust Conditional Generative Adversarial Networks
Keywords:conditional GAN, unsupervised pathway, autoencoder, robustness
TL;DR:We introduce a new type of conditional GAN, which aims to leverage structure in the target space of the generator. We augment the generator with a new, unsupervised pathway to learn the target structure.

Top-Down Neural Model For Formulae
Keywords:logic, formula, recursive neural networks, recurrent neural networks
TL;DR:A top-down approach for recursively representing propositional formulae with neural networks is presented.

Cost-Sensitive Robustness against Adversarial Examples
Keywords:Certified robustness, Adversarial examples, Cost-sensitive learning
TL;DR:A general method for training certified cost-sensitive robust classifier against adversarial perturbations

The role of over-parametrization in generalization of neural networks
Keywords:Generalization, Over-Parametrization, Neural Networks, Deep Learning
TL;DR:We suggest a generalization bound that could partly explain the improvement in generalization with over-parametrization.

Diffusion Scattering Transforms on Graphs
Keywords:graph neural networks, deep learning, stability, scattering transforms, convolutional neural networks
TL;DR:Stability of scattering transform representations of graph data to deformations of the underlying graph support.

Capsule Graph Neural Network
Keywords:CapsNet, Graph embedding, GNN
TL;DR:Inspired by CapsNet, we propose a novel architecture for graph embeddings on the basis of node features extracted from GNN.

Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking
Keywords:None
TL;DR:None

Emerging Disentanglement in Auto-Encoder Based Unsupervised Image Content Transfer
Keywords:Image-to-image Translation, Disentanglement, Autoencoders, Faces
TL;DR:An image to image translation method which adds to one image the content of another thereby creating a new image.

SGD Converges to Global Minimum in Deep Learning via Star-convex Path
Keywords:None
TL;DR:None

Toward Understanding the Impact of Staleness in Distributed Machine Learning
Keywords:None
TL;DR:None

Transfer Learning for Sequences via Learning to Collocate
Keywords:transfer learning, recurrent neural network, attention, natural language processing
TL;DR:Transfer learning for sequences via learning to align cell-level information across domains.

Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure
Keywords:None
TL;DR:None

Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching
Keywords:None
TL;DR:None

Adversarial Attacks on Graph Neural Networks via Meta Learning
Keywords:graph mining, adversarial attacks, meta learning, graph neural networks, node classification
TL;DR:We use meta-gradients to attack the training procedure of deep neural networks for graphs.

Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection
Keywords:Vulnerabilities Detection, Sequential Auto-Encoder, Separable Representation
TL;DR:We propose a novel method named Maximal Divergence Sequential Auto-Encoder that leverages Variational AutoEncoder representation for binary code vulnerability detection.

Neural Program Repair by Jointly Learning to Localize and Repair
Keywords:neural program repair, neural program embeddings, pointer networks
TL;DR:Multi-headed Pointer Networks for jointly learning to localize and repair Variable Misuse bugs

Information-Directed Exploration for Deep Reinforcement Learning
Keywords:reinforcement learning, exploration, information directed sampling
TL;DR:We develop a practical extension of Information-Directed Sampling for Reinforcement Learning, which accounts for parametric uncertainty and heteroscedasticity in the return distribution for exploration.

Attention, Learn to Solve Routing Problems!
Keywords:learning, routing problems, heuristics, attention, reinforce, travelling salesman problem, vehicle routing problem, orienteering problem, prize collecting travelling salesman problem
TL;DR:Attention based model trained with REINFORCE with greedy rollout baseline to learn heuristics with competitive results on TSP and other routing problems

L2-Nonexpansive Neural Networks
Keywords:None
TL;DR:None

Improving Generalization and Stability of Generative Adversarial Networks
Keywords:GAN, generalization, gradient penalty, zero centered, convergence
TL;DR:We propose a zero-centered gradient penalty for improving generalization and stability of GANs

Adaptive Input Representations for Neural Language Modeling
Keywords:Neural language modeling
TL;DR:Variable capacity input word embeddings and SOTA on WikiText-103, Billion Word benchmarks.

Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology
Keywords:Algebraic topology, persistent homology, network complexity, neural network
TL;DR:We develop a new topological complexity measure for deep neural networks and demonstrate that it captures their salient properties.

Efficient Augmentation via Data Subsampling
Keywords:data augmentation, invariance, subsampling, influence
TL;DR:Selectively augmenting difficult to classify points results in efficient training.

Neural TTS Stylization with Adversarial and Collaborative Games
Keywords:Text-To-Speech synthesis, GANs
TL;DR:A generative adversarial network for style modeling in a text-to-speech system

Optimal Control Via Neural Networks: A Convex Approach
Keywords:None
TL;DR:None

CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model
Keywords:Text representation learning, Sentence embedding, Efficient training scheme, word2vec
TL;DR:We present a novel training scheme for efficiently obtaining order-aware sentence representations.

Stochastic Optimization of Sorting Networks via Continuous Relaxations
Keywords:continuous relaxations, sorting, permutation, stochastic computation graphs, Plackett-Luce
TL;DR:We provide a continuous relaxation to the sorting operator, enabling end-to-end, gradient-based stochastic optimization.

Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality
Keywords:None
TL;DR:None

Generating Multiple Objects at Spatially Distinct Locations
Keywords:controllable image generation, text-to-image synthesis, generative model, generative adversarial network, gan
TL;DR:Extend GAN architecture to obtain control over locations and identities of multiple objects within generated images.

Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
Keywords:representation hierarchy reinforcement learning
TL;DR:We translate a bound on sub-optimality of representations to a practical training objective in the context of hierarchical reinforcement learning.

Understanding Composition of Word Embeddings via Tensor Decomposition
Keywords:word embeddings, semantic composition, tensor decomposition
TL;DR:We present a generative model for compositional word embeddings that captures syntactic relations, and provide empirical verification and evaluation.

Structured Neural Summarization
Keywords:Summarization, Graphs, Source Code
TL;DR:One simple trick to improve sequence models: Compose them with a graph model

Graph Wavelet Neural Network
Keywords:graph convolution, graph wavelet transform, graph Fourier transform, semi-supervised learning
TL;DR:We present graph wavelet neural network (GWNN), a novel graph convolutional neural network (CNN), leveraging graph wavelet transform to address the shortcoming of previous spectral graph CNN methods that depend on graph Fourier transform.

A rotation-equivariant convolutional neural network model of primary visual cortex
Keywords:rotation equivariance, equivariance, primary visual cortex, V1, neuroscience, system identification
TL;DR:A rotation-equivariant CNN model of V1 that outperforms previous models and suggests functional groupings of V1 neurons.

Supervised Community Detection with Line Graph Neural Networks
Keywords:community detection, graph neural networks, belief propagation, energy landscape, non-backtracking matrix
TL;DR:We propose a novel graph neural network architecture based on the non-backtracking matrix defined over the edge adjacencies and demonstrate its effectiveness in community detection tasks on graphs.

Multiple-Attribute Text Rewriting
Keywords:controllable text generation, generative models, conditional generative models, style transfer
TL;DR:A system for rewriting text conditioned on multiple controllable attributes

Wasserstein Barycenter Model Ensembling
Keywords:Wasserstein barycenter model ensembling
TL;DR:We propose to use Wasserstein barycenters for semantic model ensembling.

Policy Transfer with Strategy Optimization
Keywords:transfer learning, reinforcement learning, modeling error, strategy optimization
TL;DR:We propose a policy transfer algorithm that can overcome large and challenging discrepancies in the system dynamics such as latency, actuator modeling error, etc.

code2seq: Generating Sequences from Structured Representations of Code
Keywords:source code, programs, code2seq
TL;DR:We leverage the syntactic structure of source code to generate natural language sequences.

Predict then Propagate: Graph Neural Networks meet Personalized PageRank
Keywords:Graph, GCN, GNN, Neural network, Graph neural network, Message passing neural network, Semi-supervised classification, Semi-supervised learning, PageRank, Personalized PageRank
TL;DR:Personalized propagation of neural predictions (PPNP) improves graph neural networks by separating them into prediction and propagation via personalized PageRank.

Slimmable Neural Networks
Keywords:Slimmable neural networks, mobile deep learning, accuracy-efficiency trade-offs
TL;DR:We present a simple and general method to train a single neural network executable at different widths (number of channels in a layer), permitting instant and adaptive accuracy-efficiency trade-offs at runtime.

Analysing Mathematical Reasoning Abilities of Neural Models
Keywords:mathematics, dataset, algebraic, reasoning
TL;DR:A dataset for testing mathematical reasoning (and algebraic generalization), and results on current sequence-to-sequence models.

RotDCF: Decomposition of Convolutional Filters for Rotation-Equivariant Deep Networks
Keywords:None
TL;DR:None

Execution-Guided Neural Program Synthesis
Keywords:None
TL;DR:None

Dynamic Sparse Graph for Efficient Deep Learning
Keywords:Sparsity, compression, training, acceleration
TL;DR:We construct a dynamic sparse graph via dimension-reduction search to reduce compute and memory cost in both DNN training and inference.

Fixup Initialization: Residual Learning Without Normalization
Keywords:deep learning, residual networks, initialization, batch normalization, layer normalization
TL;DR:All you need to train deep residual networks is a good initialization; normalization layers are not necessary.

ProbGAN: Towards Probabilistic GAN with Theoretical Guarantees
Keywords:Generative Adversarial Networks, Bayesian Deep Learning, Mode Collapse, Inception Score, Generator, Discriminator, CIFAR-10, STL-10, ImageNet
TL;DR:A novel probabilistic treatment for GAN with theoretical guarantee.

Exploration by random network distillation
Keywords:reinforcement learning, exploration, curiosity
TL;DR:A simple exploration bonus is introduced and achieves state of the art performance in 3 hard exploration Atari games.

Unsupervised Learning of the Set of Local Maxima
Keywords:None
TL;DR:None

On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
Keywords:nonconvex optimization, Adam, convergence analysis
TL;DR:We analyze the convergence of Adam-type algorithms and provide mild sufficient conditions that guarantee it; we also show that violating these conditions can make an algorithm diverge.

Minimum Divergence vs. Maximum Margin: an Empirical Comparison on Seq2Seq Models
Keywords:None
TL;DR:None

GANSynth: Adversarial Neural Audio Synthesis
Keywords:GAN, Audio, WaveNet, NSynth, Music
TL;DR:High-quality audio synthesis with GANs

Sliced Wasserstein Auto-Encoders
Keywords:optimal transport, Wasserstein distances, auto-encoders, unsupervised learning
TL;DR:In this paper we use the sliced-Wasserstein distance to shape the latent distribution of an auto-encoder into any samplable prior distribution.

Learning Two-layer Neural Networks with Symmetric Inputs
Keywords:Neural Network, Optimization, Symmetric Inputs, Moment-of-moments
TL;DR:We give an algorithm for learning a two-layer neural network with symmetric input distribution.

Learning to Understand Goal Specifications by Modelling Reward
Keywords:instruction following, reward modelling, language understanding
TL;DR:We propose AGILE, a framework for training agents to perform instructions from examples of respective goal-states.

Do Deep Generative Models Know What They Don't Know?
Keywords:None
TL;DR:None

Identifying and Controlling Important Neurons in Neural Machine Translation
Keywords:neural machine translation, individual neurons, unsupervised, analysis, correlation, translation control, distributivity, localization
TL;DR:Unsupervised methods for finding, analyzing, and controlling important neurons in NMT

Representing Formal Languages: A Comparison Between Finite Automata and Recurrent Neural Networks
Keywords:Language recognition, Recurrent Neural Networks, Representation Learning, deterministic finite automaton, automaton
TL;DR:Finite automata can be linearly decoded from language-recognizing RNNs using low-coarseness abstraction functions and high-accuracy decoders.

Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks
Keywords:model explanation, model interpretation, explainable ai, evaluation
TL;DR:Interpretation by identifying model-learned features that serve as indicators for the task of interest. Explain model decisions by highlighting the response of these features in test data. Evaluate explanations objectively with a controlled dataset.

Don't let your Discriminator be fooled
Keywords:GAN, generative models, computer vision
TL;DR:A discriminator that is not easily fooled by adversarial examples makes GAN training more robust and leads to a smoother objective.

Latent Convolutional Models
Keywords:latent models, convolutional networks, unsupervised learning, deep learning, modeling natural images, image restoration
TL;DR:We present a new deep latent model of natural images that can be trained from unlabeled datasets and can be utilized to solve various image restoration tasks.

A Universal Music Translation Network
Keywords:None
TL;DR:None

How to train your MAML
Keywords:meta-learning, deep-learning, few-shot learning, supervised learning, neural-networks, stochastic optimization
TL;DR:MAML is great, but it has many problems. We solve many of them and, as a result, learn most hyperparameters end to end, speed up training and inference, and set a new SOTA in few-shot learning.

Learning a SAT Solver from Single-Bit Supervision
Keywords:sat, search, graph neural network, theorem proving, proof
TL;DR:We train a graph network to predict boolean satisfiability and show that it learns to search for solutions, and that the solutions it finds can be decoded from its activations.

Learning Representations of Sets through Optimized Permutations
Keywords:sets, representation learning, permutation invariance
TL;DR:Learn how to permute a set, then encode permuted set with RNN to obtain a set representation.

Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition
Keywords:None
TL;DR:None

Unsupervised Hyper-alignment for Multilingual Word Embeddings
Keywords:None
TL;DR:None

Visual Semantic Navigation using Scene Priors
Keywords:None
TL;DR:None

NOODL: Provable Online Dictionary Learning and Sparse Coding
Keywords:provable dictionary learning, sparse coding, support recovery, iterative hard thresholding, matrix factorization, neural architectures, noodl
TL;DR:We present a provable algorithm for exactly recovering both factors of the dictionary learning model.

Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization
Keywords:None
TL;DR:None

Active Learning with Partial Feedback
Keywords:None
TL;DR:None

Gradient descent aligns the layers of deep linear networks
Keywords:None
TL;DR:None

Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
Keywords:None
TL;DR:None

On the loss landscape of a class of deep neural networks with no bad local valleys
Keywords:None
TL;DR:None

DOM-Q-NET: Grounded RL on Structured Language
Keywords:Reinforcement Learning, Web Navigation, Graph Neural Networks
TL;DR:Graph-based Deep Q Network for Web Navigation

Boosting Robustness Certification of Neural Networks
Keywords:Robustness certification, Adversarial Attacks, Abstract Interpretation, MILP Solvers, Verification of Neural Networks
TL;DR:We refine the over-approximation results from incomplete verifiers using MILP solvers to prove more robustness properties than state-of-the-art.

Learning To Simulate
Keywords:Simulation in machine learning, reinforcement learning, policy gradients, image rendering
TL;DR:We propose an algorithm that automatically adjusts parameters of a simulation engine to generate training data for a neural network such that validation accuracy is maximized.

Towards Understanding Regularization in Batch Normalization
Keywords:None
TL;DR:None

The Laplacian in RL: Learning Representations with Efficient Approximations
Keywords:Laplacian, reinforcement learning, representation
TL;DR:We propose a scalable method to approximate the eigenvectors of the Laplacian in the reinforcement learning context and we show that the learned representations can improve the performance of an RL agent.

Predicting the Generalization Gap in Deep Networks with Margin Distributions
Keywords:Deep learning, large margin, generalization bounds, generalization gap.
TL;DR:We develop a new scheme to predict the generalization gap in deep networks with high accuracy.

Adversarial Imitation via Variational Inverse Reinforcement Learning
Keywords:Inverse Reinforcement Learning, Imitation learning, Variational Inference, Learning from demonstrations
TL;DR:Our method introduces empowerment-regularized maximum-entropy inverse reinforcement learning to learn near-optimal rewards and policies from expert demonstrations.

Reasoning About Physical Interactions with Object-Oriented Prediction and Planning
Keywords:structured scene representation, predictive models, intuitive physics, self-supervised learning
TL;DR:We present a framework for learning object-centric representations suitable for planning in tasks that require an understanding of physics.

LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators
Keywords:None
TL;DR:None

Learning Mixed-Curvature Representations in Product Spaces
Keywords:embeddings, non-Euclidean geometry, manifolds, geometry of data
TL;DR:Product manifold embedding spaces with heterogeneous curvature yield improved representations compared to traditional embedding spaces for a variety of structures.

StrokeNet: A Neural Painting Environment
Keywords:image generation, differentiable model, reinforcement learning, deep learning, model based
TL;DR:StrokeNet is a novel architecture where the agent is trained to draw by strokes on a differentiable simulation of the environment, which could effectively exploit the power of back-propagation.

Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation
Keywords:conditional GANs, conditional image generation, multimodal generation, reconstruction loss, maximum likelihood estimation, moment matching
TL;DR:We prove that the mode collapse in conditional GANs is largely attributed to a mismatch between reconstruction loss and GAN loss and introduce a set of novel loss functions as alternatives for reconstruction loss.

Measuring Compositionality in Representation Learning
Keywords:compositionality, representation learning, evaluation
TL;DR:This paper proposes a simple procedure for evaluating compositional structure in learned representations, and uses the procedure to explore the role of compositionality in four learning problems.

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
Keywords:robustness, benchmark, convnets, perturbations
TL;DR:We propose ImageNet-C to measure classifier corruption robustness and ImageNet-P to measure perturbation robustness

ADef: an Iterative Algorithm to Construct Adversarial Deformations
Keywords:Adversarial examples, deformations, deep neural networks, computer vision
TL;DR:We propose a new, efficient algorithm to construct adversarial examples by means of deformations, rather than additive perturbations.

Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning
Keywords:deep learning, reinforcement learning, imitation learning, adversarial learning
TL;DR:We address sample inefficiency and reward bias in adversarial imitation learning algorithms such as GAIL and AIRL.

Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives
Keywords:variational autoencoder, reparameterization trick, IWAE, VAE, RWS, JVI
TL;DR:Doubly reparameterized gradient estimators provide unbiased variance reduction which leads to improved performance.

Learning Recurrent Binary/Ternary Weights
Keywords:Quantized Recurrent Neural Network, Hardware Implementation, Deep Learning
TL;DR:We propose high-performance LSTMs with binary/ternary weights, that can greatly reduce implementation complexity

Learning concise representations for regression by evolving networks of trees
Keywords:regression, stochastic optimization, evolutionary computation, feature engineering
TL;DR:Representing the network architecture as a set of syntax trees and optimizing their structure leads to accurate and concise regression models.

Efficient Training on Very Large Corpora via Gramian Estimation
Keywords:similarity learning, pairwise learning, matrix factorization, Gramian estimation, variance reduction, neural embedding models, recommender systems
TL;DR:We develop efficient methods to train neural embedding models with a dot-product structure, by reformulating the objective function in terms of generalized Gram matrices, and maintaining estimates of those matrices.

MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders
Keywords:None
TL;DR:None

Residual Non-local Attention Networks for Image Restoration
Keywords:Non-local network, attention network, image restoration, residual learning
TL;DR:New state-of-the-art framework for image restoration

Meta-Learning For Stochastic Gradient MCMC
Keywords:Meta Learning, MCMC
TL;DR:This paper proposes a method to automate the design of stochastic gradient MCMC proposals using a meta-learning approach.

Systematic Generalization: What Is Required and Can It Be Learned?
Keywords:systematic generalization, language understanding, visual questions answering, neural module networks
TL;DR:We show that modular structured models are the best in terms of systematic generalization and that their end-to-end versions don't generalize as well.

Efficient Lifelong Learning with A-GEM
Keywords:Lifelong Learning, Continual Learning, Catastrophic Forgetting, Few-shot Transfer
TL;DR:An efficient lifelong learning algorithm that provides a better trade-off between accuracy and time/memory complexity compared to other algorithms.

Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering
Keywords:Open domain Question Answering, Reinforcement Learning, Query reformulation
TL;DR:A paragraph retriever and a machine reader interact with each other via reinforcement learning to yield large improvements on open-domain datasets.

Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network
Keywords:quantization, pruning, memory footprint, model compression, sparse matrix
TL;DR:We present a new weight encoding scheme which enables high compression ratio and fast sparse-to-dense matrix conversion.

Overcoming the Disentanglement vs Reconstruction Trade-off via Jacobian Supervision
Keywords:disentangling, autoencoders, jacobian, face manipulation
TL;DR:A method for learning image representations that are good for both disentangling factors of variation and obtaining faithful reconstructions.

RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space
Keywords:knowledge graph embedding, knowledge graph completion, adversarial sampling
TL;DR:A new state-of-the-art approach for knowledge graph embedding.

Guiding Policies with Language via Meta-Learning
Keywords:meta-learning, language grounding, interactive
TL;DR:We propose a meta-learning method for interactively correcting policies with natural language.

AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods
Keywords:optimizer, Adam, convergence, decorrelation
TL;DR:We analyze and solve the non-convergence issue of Adam.

AD-VAT: An Asymmetric Dueling mechanism for learning Visual Active Tracking
Keywords:Active tracking, reinforcement learning, adversarial learning, multi agent
TL;DR:We propose AD-VAT, where the tracker and the target object, viewed as two learnable agents, are opponents and can mutually enhance during training.

Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications
Keywords:None
TL;DR:None

On Self Modulation for Generative Adversarial Networks
Keywords:unsupervised learning, generative adversarial networks, deep generative modelling
TL;DR:A simple GAN modification that improves performance across many losses, architectures, regularization schemes, and datasets.

Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy
Keywords:None
TL;DR:None

Subgradient Descent Learns Orthogonal Dictionaries
Keywords:Dictionary learning, Sparse coding, Non-convex optimization, Theory
TL;DR:Efficient dictionary learning by L1 minimization via a novel analysis of the non-convex non-smooth geometry.

ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech
Keywords:None
TL;DR:None

MARGINALIZED AVERAGE ATTENTIONAL NETWORK FOR WEAKLY-SUPERVISED LEARNING
Keywords:feature aggregation, weakly supervised learning, temporal action localization
TL;DR:A novel marginalized average attentional network for weakly-supervised temporal action localization

Towards GAN Benchmarks Which Require Generalization
Keywords:evaluation, generative adversarial networks, adversarial divergences
TL;DR:We argue that GAN benchmarks must require a large sample from the model to penalize memorization and investigate whether neural network divergences have this property.

A Closer Look at Few-shot Classification
Keywords:few shot classification, meta-learning
TL;DR:A detailed empirical study of few-shot classification that reveals challenges in the standard evaluation setting and shows a new direction.

Meta-Learning Probabilistic Inference for Prediction
Keywords:probabilistic models, approximate inference, few-shot learning, meta-learning
TL;DR:Novel framework for meta-learning that unifies and extends a broad class of existing few-shot learning methods. Achieves strong performance on few-shot learning benchmarks without requiring iterative test-time inference.

Deep reinforcement learning with relational inductive biases
Keywords:relational reasoning, reinforcement learning, graph neural networks, starcraft, generalization, inductive bias
TL;DR:Relational inductive biases improve out-of-distribution generalization capacities in model-free reinforcement learning agents

Relaxed Quantization for Discretized Neural Networks
Keywords:Quantization, Compression, Neural Networks, Efficiency
TL;DR:We introduce a technique that allows for gradient based training of quantized neural networks.

Tree-Structured Recurrent Switching Linear Dynamical Systems for Multi-Scale Modeling
Keywords:None
TL;DR:None

STCN: Stochastic Temporal Convolutional Networks
Keywords:latent variables, variational inference, temporal convolutional networks, sequence modeling, auto-regressive modeling
TL;DR:We combine the computational advantages of temporal convolutional architectures with the expressiveness of stochastic latent variables.

Soft Q-Learning with Mutual-Information Regularization
Keywords:None
TL;DR:None

On the Turing Completeness of Modern Neural Network Architectures
Keywords:Transformer, NeuralGPU, Turing completeness
TL;DR:We show that the Transformer architecture and the Neural GPU are Turing complete.

Improving Differentiable Neural Computers Through Memory Masking, De-allocation, and Link Distribution Sharpness Control
Keywords:None
TL;DR:None

Evaluating Robustness of Neural Networks with Mixed Integer Programming
Keywords:verification, adversarial robustness, adversarial examples, deep learning
TL;DR:We efficiently verify the robustness of deep neural models with over 100,000 ReLUs, certifying more samples than the state-of-the-art and finding more adversarial examples than a strong first-order attack.

Random mesh projectors for inverse problems
Keywords:imaging, inverse problems, subspace projections, random Delaunay triangulations, CNN, geophysics, regularization
TL;DR:We solve ill-posed inverse problems with scarce ground truth examples by estimating an ensemble of random projections of the model instead of the model itself.

Multi-Agent Dual Learning
Keywords:None
TL;DR:None

Complement Objective Training
Keywords:optimization, entropy, image recognition, natural language understanding, adversarial attacks, deep learning
TL;DR:We propose Complement Objective Training (COT), a new training paradigm that optimizes both the primary and complement objectives for effectively learning the parameters of neural networks.

Mode Normalization
Keywords:Deep Learning, Expert Models, Normalization, Computer Vision
TL;DR:We present a novel normalization method for deep neural networks that is robust to multi-modalities in intermediate feature distributions.

Detecting Egregious Responses in Neural Sequence-to-sequence Models
Keywords:Deep Learning, Natural Language Processing, Adversarial Attacks, Dialogue Response Generation
TL;DR:This paper aims to provide an empirical answer to the question of whether a well-trained dialogue response model can output malicious responses.

Learning Actionable Representations with Goal Conditioned Policies
Keywords:Representation Learning, Reinforcement Learning
TL;DR:Learning state representations which capture factors necessary for control

Verification of Non-Linear Specifications for Neural Networks
Keywords:None
TL;DR:None

Generating Liquid Simulations with Deformation-aware Neural Networks
Keywords:deformation learning, spatial transformer networks, fluid simulation
TL;DR:Learning weighting and deformations of space-time data sets for highly efficient approximations of liquid behavior.

DyRep: Learning Representations over Dynamic Graphs
Keywords:Dynamic Graphs, Representation Learning, Dynamic Processes, Temporal Point Process, Attention, Latent Representation
TL;DR:Models representation learning over dynamic graphs as a latent hidden process bridging two observed processes: topological evolution of, and interactions on, dynamic graphs.

Trellis Networks for Sequence Modeling
Keywords:sequence modeling, language modeling, recurrent networks, convolutional networks, trellis networks
TL;DR:Trellis networks are a new sequence modeling architecture that bridges recurrent and convolutional models and sets a new state of the art on word- and character-level language modeling.

Scalable Unbalanced Optimal Transport using Generative Adversarial Networks
Keywords:unbalanced optimal transport, generative adversarial networks, population modeling
TL;DR:We propose a new methodology for unbalanced optimal transport using generative adversarial networks.

Solving the Rubik's Cube with Approximate Policy Iteration
Keywords:reinforcement learning, Rubik's Cube, approximate policy iteration, deep learning, deep reinforcement learning
TL;DR:We solve the Rubik's Cube with pure reinforcement learning

Variance Reduction for Reinforcement Learning in Input-Driven Environments
Keywords:reinforcement learning, policy gradient, input-driven environments, variance reduction, baseline
TL;DR:For environments dictated partially by external input processes, we derive an input-dependent baseline that provably reduces the variance for policy gradient methods and improves the policy performance in a wide range of RL tasks.

Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic
Keywords:model-based reinforcement learning, stochastic video prediction, autonomous driving
TL;DR:A model-based RL approach which uses a differentiable uncertainty penalty to learn driving policies from purely observational data.

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks
Keywords:GANs, representation, interpretability, causality
TL;DR:GAN representations are examined in detail, and sets of representation units are found that control the generation of semantic concepts in the output.

Improving MMD-GAN Training with Repulsive Loss Function
Keywords:generative adversarial nets, loss function, maximum mean discrepancy, image generation, unsupervised learning
TL;DR:Rearranging the terms in maximum mean discrepancy yields a much better loss function for the discriminator of generative adversarial nets

Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience
Keywords:generalization, PAC-Bayes, SGD, learning theory, implicit regularization
TL;DR:We provide a PAC-Bayes based generalization guarantee for uncompressed, deterministic deep networks by generalizing noise-resilience of the network on the training data to the test data.

Recall Traces: Backtracking Models for Efficient Reinforcement Learning
Keywords:Model free RL, Variational Inference
TL;DR:A backward model of previous (state, action) given the next state, i.e. P(s_t, a_t | s_{t+1}), can be used to simulate additional trajectories terminating at states of interest! Improves RL learning efficiency.

Stable Recurrent Models
Keywords:stability, gradient descent, non-convex optimization, recurrent neural networks
TL;DR:Stable recurrent models can be approximated by feed-forward networks and empirically perform as well as unstable models on benchmark tasks.

The Limitations of Adversarial Training and the Blind-Spot Attack
Keywords:Adversarial Examples, Adversarial Training, Blind-Spot Attack
TL;DR:We show that even the strongest adversarial training methods cannot defend against adversarial examples crafted on slightly scaled and shifted test images.

Efficiently testing local optimality and escaping saddles for ReLU networks
Keywords:local optimality, second-order stationary point, escaping saddle points, nondifferentiability, ReLU, empirical risk
TL;DR:A theoretical algorithm for testing local optimality and extracting descent directions at nondifferentiable points of empirical risks of one-hidden-layer ReLU networks.

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Keywords:Neural Architecture Search, Efficient Neural Networks
TL;DR:Proxy-less neural architecture search for directly learning architectures on a large-scale target task (ImageNet) while reducing the cost to the same level as normal training.

Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization
Keywords:Hierarchical reinforcement learning, Representation learning, Continuous control
TL;DR:This paper presents a hierarchical reinforcement learning framework based on deterministic option policies and mutual information maximization.

Generalizable Adversarial Training via Spectral Normalization
Keywords:None
TL;DR:None

Adversarial Domain Adaptation for Stable Brain-Machine Interfaces
Keywords:Brain-Machine Interfaces, Domain Adaptation, Adversarial Networks
TL;DR:We implement an adversarial domain adaptation network to stabilize a fixed Brain-Machine Interface against gradual changes in the recorded neural signals.

Deep Online Learning Via Meta-Learning: Continual Adaptation for Model-Based RL
Keywords:None
TL;DR:None

Deep Anomaly Detection with Outlier Exposure
Keywords:confidence, uncertainty, anomaly, robustness
TL;DR:OE teaches anomaly detectors to learn heuristics for detecting unseen anomalies; experiments are in classification, density estimation, and calibration in NLP and vision settings; we do not tune on test distribution samples, unlike previous work

Contingency-Aware Exploration in Reinforcement Learning
Keywords:Reinforcement Learning, Exploration, Contingency-Awareness
TL;DR:We investigate contingency-awareness and controllable aspects in exploration and achieve state-of-the-art performance on Montezuma's Revenge without expert demonstrations.

Context-adaptive Entropy Model for End-to-end Optimized Image Compression
Keywords:image compression, deep learning, entropy model
TL;DR:Context-adaptive entropy model for use in end-to-end optimized image compression, which significantly improves compression performance

Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
Keywords:reinforcement learning, generative adversarial networks, imitation learning, inverse reinforcement learning, information bottleneck
TL;DR:Regularizing adversarial learning with an information bottleneck, applied to imitation learning, inverse reinforcement learning, and generative adversarial networks.

Meta-learning with differentiable closed-form solvers
Keywords:few-shot learning, one-shot learning, meta-learning, deep learning, ridge regression, classification
TL;DR:We propose a meta-learning approach for few-shot classification that achieves strong performance at high speed by back-propagating through the solution of fast solvers, such as ridge regression or logistic regression.

Learning Self-Imitating Diverse Policies
Keywords:Reinforcement-learning, Imitation-learning, Ensemble-training
TL;DR:Policy optimization by using past good rollouts from the agent; learning shaped rewards via divergence minimization; SVPG with JS-kernel for population-based exploration.

ProxQuant: Quantized Neural Networks via Proximal Operators
Keywords:Model quantization, Optimization, Regularization
TL;DR:A principled framework for model quantization using the proximal gradient method, with empirical evaluation and theoretical convergence analyses.

Universal Transformers
Keywords:sequence-to-sequence, rnn, transformer, machine translation, language understanding, learning to execute
TL;DR:We introduce the Universal Transformer, a self-attentive parallel-in-time recurrent sequence model that outperforms Transformers and LSTMs on a wide range of sequence-to-sequence tasks, including machine translation.

Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning
Keywords:meta-learning, reinforcement learning, meta reinforcement learning, online adaptation
TL;DR:A model-based meta-RL algorithm that enables a real robot to adapt online in dynamic environments

L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data
Keywords:Model Interpretation, Feature Selection
TL;DR:We develop two linear-complexity algorithms for model-agnostic model interpretation based on the Shapley value, in the settings where the contribution of features to the target is well-approximated by a graph-structured factorization.

Discovery of Natural Language Concepts in Individual Units of CNNs
Keywords:interpretability of deep neural networks, natural language representation
TL;DR:We show that individual units in CNN representations learned in NLP tasks are selectively responsive to natural language concepts.

Towards the first adversarially robust neural network model on MNIST
Keywords:None
TL;DR:None

Discriminator Rejection Sampling
Keywords:GANs, rejection sampling
TL;DR:We use a GAN discriminator to perform an approximate rejection sampling scheme on the output of the GAN generator.
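
As a rough illustration of the idea in this TL;DR, here is a minimal sketch of classical rejection sampling driven by a discriminator-style score. It is not the paper's approximate scheme. It assumes an idealized discriminator whose logit equals log(p_data / p_gen) and a known upper bound on that logit. The toy generator, the toy logit, and all function names are hypothetical.

```python
import math
import random

def rejection_sample(generator, logit, max_logit, n, rng):
    """Keep generator samples with probability exp(logit(x) - max_logit)."""
    kept = []
    while len(kept) < n:
        x = generator(rng)
        # For an ideal discriminator, exp(logit) is the density ratio p_data / p_gen.
        if rng.random() < math.exp(logit(x) - max_logit):
            kept.append(x)
    return kept

# Toy setup (assumed): the generator is uniform on (0, 1], the data density is 2x.
rng = random.Random(0)
generator = lambda r: 1.0 - r.random()   # uniform on (0, 1], avoids log(0)
logit = lambda x: math.log(2.0 * x)      # log(p_data / p_gen) for this toy pair
samples = rejection_sample(generator, logit, math.log(2.0), 20000, rng)
mean = sum(samples) / len(samples)       # close to 2/3, the mean of density 2x
```

In practice the bound on the logit is not known in advance and the discriminator is imperfect, which is why the paper describes its scheme as approximate.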

Harmonic Unpaired Image-to-image Translation
Keywords:unpaired image-to-image translation, cyclegan, smoothness constraint
TL;DR:Smooth regularization over sample graph for unpaired image-to-image translation results in significantly improved consistency

Universal Successor Features Approximators
Keywords:None
TL;DR:None

Gradient Descent Provably Optimizes Over-parameterized Neural Networks
Keywords:theory, non-convex optimization, overparameterization, gradient descent
TL;DR:We prove gradient descent achieves zero training loss with a linear rate on over-parameterized neural networks.

Opportunistic Learning: Budgeted Cost-Sensitive Learning from Data Streams
Keywords:Cost-Aware Learning, Feature Acquisition, Reinforcement Learning, Stream Learning, Deep Q-Learning
TL;DR:An online algorithm for cost-aware feature acquisition and prediction

DARTS: Differentiable Architecture Search
Keywords:deep learning, autoML, neural architecture search, image classification, language modeling
TL;DR:We propose a differentiable architecture search algorithm for both convolutional and recurrent networks, achieving competitive performance with the state of the art using orders of magnitude less computation resources.

Feature-Wise Bias Amplification
Keywords:None
TL;DR:None

The relativistic discriminator: a key element missing from standard GAN
Keywords:AI, deep learning, generative models, GAN
TL;DR:Improving the quality and stability of GANs using a relativistic discriminator; IPM GANs (such as WGAN-GP) are a special case.

Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer
Keywords:autoencoders, interpolation, unsupervised learning, representation learning, adversarial learning
TL;DR:We propose a regularizer that improves interpolation in autoencoders and show that it also improves the learned representation for downstream tasks.

Quasi-hyperbolic momentum and Adam for deep learning
Keywords:sgd, momentum, nesterov, adam, qhm, qhadam, optimization
TL;DR:Mix plain SGD and momentum (or do something similar with Adam) for great profit.
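
The "mix" in this TL;DR can be sketched as follows: a minimal scalar version of the quasi-hyperbolic momentum update, which takes a weighted average of the plain gradient step and the momentum step. The function name, the default hyperparameters, and the toy quadratic objective are illustrative assumptions, not the authors' reference implementation.

```python
def qhm_minimize(grad, theta, lr=0.1, beta=0.9, nu=0.7, steps=200):
    """Run quasi-hyperbolic momentum on a scalar parameter."""
    g = 0.0  # momentum buffer: exponential moving average of gradients
    for _ in range(steps):
        d = grad(theta)
        g = beta * g + (1.0 - beta) * d
        # nu = 0 gives plain SGD; nu = 1 gives (EMA-style) momentum SGD.
        theta -= lr * ((1.0 - nu) * d + nu * g)
    return theta

# Toy objective (assumed): f(x) = (x - 3)^2, with gradient 2 * (x - 3).
theta_star = qhm_minimize(lambda x: 2.0 * (x - 3.0), theta=0.0)
```

The Adam analogue replaces the momentum buffer in the numerator of the Adam step with the same kind of mixture. It is not shown here.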

Local SGD Converges Fast and Communicates Little
Keywords:optimization, communication, theory, stochastic gradient descent, SGD, mini-batch, local SGD, parallel restart SGD, distributed training
TL;DR:We prove that parallel local SGD achieves linear speedup with much less communication than parallel mini-batch SGD.

Learning Finite State Representations of Recurrent Policy Networks
Keywords:recurrent neural networks, finite state machine, quantization, interpretability, autoencoder, moore machine, reinforcement learning, imitation learning, representation, Atari, Tomita
TL;DR:Extracting a finite state machine from a recurrent neural network via quantization for the purpose of interpretability with experiments on Atari.

Multilingual Neural Machine Translation with Knowledge Distillation
Keywords:NMT, Multilingual NMT, Knowledge Distillation
TL;DR:We propose a knowledge-distillation-based method to boost the accuracy of multilingual neural machine translation.

MisGAN: Learning from Incomplete Data with Generative Adversarial Networks
Keywords:generative models, missing data
TL;DR:This paper presents a GAN-based framework for learning the distribution from high-dimensional incomplete data.

A Direct Approach to Robust Deep Learning Using Adversarial Networks
Keywords:deep learning, adversarial learning, generative adversarial networks
TL;DR:Jointly train an adversarial noise generating network with a classification network to provide better robustness to adversarial attacks.

Combinatorial Attacks on Binarized Neural Networks
Keywords:binarized neural networks, combinatorial optimization, integer programming
TL;DR:Gradient-based attacks on binarized neural networks are not effective due to the non-differentiability of such networks; our IPROP algorithm solves this problem using integer optimization.

Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency
Keywords:image-to-image translation, image generation, domain adaptation
TL;DR:We propose the Exemplar Guided & Semantically Consistent Image-to-image Translation (EGSC-IT) network which conditions the translation process on an exemplar image in the target domain.

ARM: Augment-REINFORCE-Merge Gradient for Stochastic Binary Networks
Keywords:Antithetic sampling, variable augmentation, deep discrete latent variable models, variance reduction, variational auto-encoder
TL;DR:An unbiased and low-variance gradient estimator for discrete latent variable models

Building Dynamic Knowledge Graphs from Text using Machine Reading Comprehension
Keywords:None
TL;DR:None

Information asymmetry in KL-regularized RL
Keywords:Deep Reinforcement Learning, Continuous Control, RL as Inference
TL;DR:Limiting state information for the default policy can improve performance in a KL-regularized RL framework where both agent and default policy are optimized together.

TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer
Keywords:Generative models, Timbre Transfer, Wavenet, CycleGAN
TL;DR:We present TimbreTron, a pipeline for performing high-quality timbre transfer on musical waveforms using CQT-domain style transfer.

Whitening and Coloring Batch Transform for GANs
Keywords:None
TL;DR:None

Learnable Embedding Space for Efficient Neural Architecture Compression
Keywords:Network Compression, Neural Architecture Search, Bayesian Optimization, Architecture Embedding
TL;DR:We propose a method to incrementally learn an embedding space over the domain of network architectures, to enable the careful selection of architectures for evaluation during compressed architecture search.

On the Sensitivity of Adversarial Robustness to Input Data Distributions
Keywords:adversarial robustness, adversarial training, PGD training, adversarial perturbation, input data distribution
TL;DR:The robustness of PGD-trained models is sensitive to semantics-preserving transformations of image datasets, which makes evaluating robust learning algorithms tricky in practice.

Minimal Images in Deep Neural Networks: Fragile Object Recognition in Natural Images
Keywords:None
TL;DR:None

A Statistical Approach to Assessing Neural Network Robustness
Keywords:neural network verification, multi-level splitting, formal verification
TL;DR:We introduce a statistical approach to assessing neural network robustness that provides an informative notion of how robust a network is, rather than just the conventional binary assertion of whether or not a property is violated.

Improving Sequence-to-Sequence Learning via Optimal Transport
Keywords:None
TL;DR:None

PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees
Keywords:None
TL;DR:None

Integer Networks for Data Compression with Latent-Variable Models
Keywords:data compression, variational models, network quantization
TL;DR:We train variational models with quantized networks for computational determinism. This enables using them for cross-platform data compression.

Value Propagation Networks
Keywords:Reinforcement Learning, Value Iteration, Navigation, Convolutional Neural Networks, Learning to plan
TL;DR:We present planners based on convnets that are sample-efficient and that generalize to larger instances of navigation and pathfinding problems.

Bayesian Policy Optimization for Model Uncertainty
Keywords:Bayes-Adaptive Markov Decision Process, Model Uncertainty, Bayes Policy Optimization
TL;DR:We formulate model uncertainty in Reinforcement Learning as a continuous Bayes-Adaptive Markov Decision Process and present a method for practical and scalable Bayesian policy optimization.

Variational Bayesian Phylogenetic Inference
Keywords:Bayesian phylogenetic inference, Variational inference, Subsplit Bayesian networks
TL;DR:The first variational Bayes formulation of phylogenetic inference, a challenging inference problem over structures with intertwined discrete and continuous components

LEARNING FACTORIZED REPRESENTATIONS FOR OPEN-SET DOMAIN ADAPTATION
Keywords:None
TL;DR:None

On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks
Keywords:Quantized Neural Networks, Universal Approximability, Complexity Bounds, Optimal Bit-width
TL;DR:This paper proves the universal approximability of quantized ReLU neural networks and puts forward the complexity bound given arbitrary error.

Learning Localized Generative Models for 3D Point Clouds via Graph Convolution
Keywords:GAN, graph convolution, point clouds
TL;DR:A GAN using graph convolution operations with dynamically computed graphs from hidden features

ACCELERATING NONCONVEX LEARNING VIA REPLICA EXCHANGE LANGEVIN DIFFUSION
Keywords:None
TL;DR:None

Dynamically Unfolding Recurrent Restorer: A Moving Endpoint Control Method for Image Restoration
Keywords:image restoration, differential equation
TL;DR:We propose a novel method to handle image degradations of different levels by learning a diffusion terminal time. Our model can generalize to unseen degradation levels and different noise statistics.

Bias-Reduced Uncertainty Estimation for Deep Neural Classifiers
Keywords:Uncertainty estimation, Deep learning
TL;DR:We use snapshots from the training process to improve any uncertainty estimation method of a DNN classifier.

CAMOU: Learning Physical Vehicle Camouflages to Adversarially Attack Detectors in the Wild
Keywords:Adversarial Attack, Object Detection, Synthetic Simulation
TL;DR:We propose a method to learn physical vehicle camouflage to adversarially attack object detectors in the wild. We find our camouflage effective and transferable.

Learning Latent Superstructures in Variational Autoencoders for Deep Multidimensional Clustering
Keywords:latent tree model, variational autoencoder, deep learning, latent variable model, bayesian network, structure learning, stepwise em, message passing, graphical model, multidimensional clustering, unsupervised learning
TL;DR:We investigate a variant of variational autoencoders where there is a superstructure of discrete latent variables on top of the latent features.

Learning Programmatically Structured Representations with Perceptor Gradients
Keywords:None
TL;DR:None

Variational Autoencoders with Jointly Optimized Latent Dependency Structure
Keywords:deep generative models, structure learning
TL;DR:We propose a method for learning latent dependency structure in variational autoencoders.

The Unusual Effectiveness of Averaging in GAN Training
Keywords:None
TL;DR:None

Beyond Pixel Norm-Balls: Parametric Adversaries using an Analytically Differentiable Renderer
Keywords:adversarial examples, norm-balls, differentiable renderer
TL;DR:Enabled by a novel differentiable renderer, we propose a new metric that has real-world implications for evaluating adversarial machine learning algorithms, resolving the lack of realism of the existing metric based on pixel norms.

Diversity is All You Need: Learning Skills without a Reward Function
Keywords:reinforcement learning, unsupervised learning, skill discovery
TL;DR:We propose an algorithm for learning useful skills without a reward function, and show how these skills can be used to solve downstream tasks.

Supervised Policy Update for Deep Reinforcement Learning
Keywords:Deep Reinforcement Learning
TL;DR:first posing and solving the sample efficiency optimization problem in the non-parameterized policy space, and then solving a supervised regression problem to find a parameterized policy that is near the optimal non-parameterized policy.

Learning sparse relational transition models
Keywords:Deictic reference, relational model, rule-based transition model
TL;DR:A new approach that learns a representation for describing transition models in complex uncertain domains using relational rules.

Learning to Schedule Communication in Multi-agent Reinforcement Learning
Keywords:None
TL;DR:None

Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
Keywords:None
TL;DR:None

Multi-class classification without multi-class labels
Keywords:None
TL;DR:None

What do you learn from context? Probing for sentence structure in contextualized word representations
Keywords:natural language processing, word embeddings, transfer learning, interpretability
TL;DR:We probe for sentence structure in ELMo and related contextual embedding models. We find existing models efficiently encode syntax and show evidence of long-range dependencies, but only offer small improvements on semantic tasks.

Spectral Inference Networks: Unifying Deep and Spectral Learning
Keywords:spectral learning, unsupervised learning, manifold learning, dimensionality reduction
TL;DR:We show how to learn spectral decompositions of linear operators with deep learning, and use it for unsupervised learning without a generative model.

PeerNets: Exploiting Peer Wisdom Against Adversarial Attacks
Keywords:None
TL;DR:None

Attentive Neural Processes
Keywords:Neural Processes, Conditional Neural Processes, Stochastic Processes, Regression, Attention
TL;DR:A model for regression that learns conditional distributions of a stochastic process, by incorporating attention into Neural Processes.

Representation Degeneration Problem in Training Natural Language Generation Models
Keywords:None
TL;DR:None

Hierarchical interpretations for neural network predictions
Keywords:interpretability, natural language processing, computer vision
TL;DR:We introduce and validate hierarchical local interpretations, the first technique to automatically search for and display important interactions for individual predictions made by LSTMs and CNNs.

Spreading vectors for similarity search
Keywords:dimensionality reduction, similarity search, indexing, differential entropy
TL;DR:We learn a neural network that uniformizes the input distribution, which leads to competitive indexing performance in high-dimensional space

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
Keywords:Deep Learning, Learning Theory, Non-Convex Optimization
TL;DR:We analyze gradient descent for deep linear neural networks, providing a guarantee of convergence to global optimum at a linear rate.

Feed-forward Propagation in Probabilistic Neural Networks with Categorical and Max Layers
Keywords:probabilistic neural network, uncertainty, dropout, bayesian, softmax, argmax, logsumexp
TL;DR:Approximating mean and variance of the NN output over noisy input / dropout / uncertain parameters. Analytic approximations for argmax, softmax and max layers.

Measuring and regularizing networks in function space
Keywords:function space, Hilbert space, empirical characterization, multitask learning, catastrophic forgetting, optimization, natural gradient
TL;DR:We find movement in function space is not proportional to movement in parameter space during optimization. We propose a new natural-gradient style optimizer to address this.

Fluctuation-dissipation relations for stochastic gradient descent
Keywords:stochastic gradient descent, adaptive method, loss surface, Hessian
TL;DR:We prove fluctuation-dissipation relations for SGD, which can be used to (i) adaptively set learning rates and (ii) probe loss surfaces.

Poincaré GloVe: Hyperbolic Word Embeddings
Keywords:word embeddings, hyperbolic spaces, poincare ball, hypernymy, analogy, similarity, gaussian embeddings
TL;DR:We embed words in the hyperbolic space and make the connection with the Gaussian word embeddings.

Episodic Curiosity through Reachability
Keywords:deep learning, reinforcement learning, curiosity, exploration, episodic memory
TL;DR:We propose a novel model of curiosity based on episodic memory and the ideas of reachability which allows us to overcome the known "couch-potato" issues of prior work.

Phase-Aware Speech Enhancement with Deep Complex U-Net
Keywords:speech enhancement, deep learning, complex neural networks, phase estimation
TL;DR:This paper proposes a novel complex masking method for speech enhancement along with a loss function for efficient phase estimation.

Generative predecessor models for sample-efficient imitation learning
Keywords:None
TL;DR:None

Adaptive Estimators Show Information Compression in Deep Neural Networks
Keywords:deep neural networks, mutual information, information bottleneck, noise, L2 regularization
TL;DR:We developed robust mutual information estimates for DNNs and used them to observe compression in networks with non-saturating activation functions

Multilingual Neural Machine Translation With Soft Decoupled Encoding
Keywords:None
TL;DR:None

Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet
Keywords:interpretability, representation learning, bag of features, deep learning, object recognition
TL;DR:Aggregating class evidence from many small image patches suffices to solve ImageNet, yields more interpretable models and can explain aspects of the decision-making of popular DNNs.

Reward Constrained Policy Optimization
Keywords:reinforcement learning, markov decision process, constrained markov decision process, deep learning
TL;DR:For complex constraints in which it is not easy to estimate the gradient, we use the discounted penalty as a guiding signal. We prove that under certain assumptions it converges to a feasible solution.

On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length
Keywords:optimization, generalization, theory of deep learning, SGD, hessian
TL;DR:SGD is steered early on in training towards a region in which its step is too large compared to curvature, which impacts the rest of training.

Modeling the Long Term Future in Model-Based Reinforcement Learning
Keywords:model-based reinforcement learning, variational inference
TL;DR:incorporating, in the model, latent variables that encode future content improves the long-term prediction accuracy, which is critical for better planning in model-based RL.

Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets
Keywords:straight-through estimator, quantized activation, binary neuron
TL;DR:We make theoretical justification for the concept of straight-through estimator.

DISTRIBUTIONAL CONCAVITY REGULARIZATION FOR GANS
Keywords:None
TL;DR:None

LeMoNADe: Learned Motif and Neuronal Assembly Detection in calcium imaging videos
Keywords:VAE, unsupervised learning, neuronal assemblies, calcium imaging analysis
TL;DR:We present LeMoNADe, an end-to-end learned motif detection method directly operating on calcium imaging videos.

Competitive experience replay
Keywords:reinforcement learning, sparse reward, goal-based learning
TL;DR:a novel method to learn with sparse reward using adversarial reward re-labeling

Multi-Domain Adversarial Learning
Keywords:multi-domain learning, domain adaptation, adversarial learning, H-divergence, deep representation learning, high-content microscopy
TL;DR:Adversarial Domain adaptation and Multi-domain learning: a new loss to handle multi- and single-domain classes in the semi-supervised setting.

ProMP: Proximal Meta-Policy Search
Keywords:Meta-Reinforcement Learning, Meta-Learning, Reinforcement-Learning
TL;DR:A novel and theoretically grounded meta-reinforcement learning algorithm

Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors
Keywords:word vectors, sentence representations, distributed representations, fuzzy sets, bag-of-words, unsupervised learning, word vector compositionality, max-pooling, Jaccard index
TL;DR:Max-pooled word vectors with fuzzy Jaccard set similarity are an extremely competitive baseline for semantic similarity; we propose a simple dynamic variant that performs even better.

Stable Opponent Shaping in Differentiable Games
Keywords:multi-agent learning, multiple interacting losses, opponent shaping, exploitation, convergence
TL;DR:Opponent shaping is a powerful approach to multi-agent learning but can prevent convergence; our SOS algorithm fixes this with strong guarantees in all differentiable games.

A Mean Field Theory of Batch Normalization
Keywords:theory, batch normalization, mean field theory, trainability
TL;DR:Batch normalization causes exploding gradients in vanilla feedforward networks.

Learning Exploration Policies for Navigation
Keywords:None
TL;DR:None

Distribution-Interpolation Trade off in Generative Models
Keywords:generative models, latent distribution, Cauchy distribution, interpolations
TL;DR:We theoretically prove that linear interpolations are unsuitable for analysis of trained implicit generative models.

Learning to Describe Scenes with Programs
Keywords:Structured scene representations, program synthesis
TL;DR:We present scene programs, a structured scene representation that captures both low-level object appearance and high-level regularity in the scene.

Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards
Keywords:Reinforcement Learning, Simulation, Affective Computing
TL;DR:We present a novel approach to reinforcement learning that leverages a task-independent intrinsic reward function trained on peripheral pulse measurements that are correlated with human autonomic nervous system responses.

Deep Frank-Wolfe For Neural Network Optimization
Keywords:optimization, conditional gradient, Frank-Wolfe, SVM
TL;DR:We train neural networks by locally linearizing them and using a linear SVM solver (Frank-Wolfe) at each iteration.

LEARNING TO PROPAGATE LABELS: TRANSDUCTIVE PROPAGATION NETWORK FOR FEW-SHOT LEARNING
Keywords:few-shot learning, meta-learning, label propagation, manifold learning
TL;DR:We propose a novel meta-learning framework for transductive inference that classifies the entire test set at once to alleviate the low-data problem.

Improving the Generalization of Adversarial Training with Domain Adaptation
Keywords:adversarial training, domain adaptation, adversarial example, deep learning
TL;DR:We propose a novel adversarial training with domain adaptation method that significantly improves the generalization ability on adversarial examples from different attacks.

Dimensionality Reduction for Representing the Knowledge of Probabilistic Models
Keywords:metric learning, distance learning, dimensionality reduction, bound guarantees
TL;DR:dimensionality reduction for cases where examples can be represented as soft probability distributions

Learning protein sequence embeddings using information from structure
Keywords:sequence embedding, sequence alignment, RNN, LSTM, protein structure, amino acid sequence, contextual embeddings, transmembrane prediction
TL;DR:We present a method for learning protein sequence embedding models using structural information in the form of global structural similarity between proteins and within protein residue-residue contacts.

Variational Smoothing in Recurrent Neural Network Language Models
Keywords:None
TL;DR:None

Biologically-Plausible Learning Algorithms Can Scale to Large Datasets
Keywords:biologically plausible learning algorithm, ImageNet, sign-symmetry, feedback alignment
TL;DR:Biologically plausible learning algorithms, particularly sign-symmetry, work well on ImageNet

Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering
Keywords:question answering, reading comprehension, nlp, natural language processing, attention, representation learning
TL;DR:A new state-of-the-art model for multi-evidence question answering using coarse-grain fine-grain hierarchical attention.

Learning a Meta-Solver for Syntax-Guided Program Synthesis
Keywords:Syntax-guided Synthesis, Context Free Grammar, Logical Specification, Representation Learning, Meta Learning, Reinforcement Learning
TL;DR:We propose a meta-learning framework that learns a transferable policy from only weak supervision to solve synthesis tasks with different logical specifications and grammars.

Towards Robust, Locally Linear Deep Networks
Keywords:robust derivatives, transparency, interpretability
TL;DR:A scalable algorithm to establish robust derivatives of deep networks w.r.t. the inputs.

How Important is a Neuron
Keywords:None
TL;DR:None

Learning to Make Analogies by Contrasting Abstract Relational Structure
Keywords:cognitive science, analogy, psychology, cognitive theory, cognition, abstraction, generalization
TL;DR:The most robust capacity for analogical reasoning is induced when networks learn analogies by contrasting abstract relational structures in their input domains.

Learning what you can do before doing anything
Keywords:unsupervised learning, vision, motion, action space, video prediction, variational models
TL;DR:We learn a representation of an agent's action space from pure visual observations. We use a recurrent latent variable approach with a novel composability loss.

Learning Grid Cells as Vector Representation of Self-Position Coupled with Matrix Representation of Self-Motion
Keywords:None
TL;DR:None

Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions
Keywords:None
TL;DR:None

Invariant and Equivariant Graph Networks
Keywords:graph learning, equivariance, deep learning
TL;DR:The paper provides a full characterization of permutation invariant and equivariant linear layers for graph data.

Robustness May Be at Odds with Accuracy
Keywords:adversarial examples, robust machine learning, robust optimization, deep feature representations
TL;DR:We show that adversarial robustness might come at the cost of standard classification performance, but also yields unexpected benefits.

Feature Intertwiner for Object Detection
Keywords:feature learning, computer vision, deep learning
TL;DR:A feature intertwiner module to leverage features from one accurate set to help the learning of another less reliable set.

Adversarial Reprogramming of Neural Networks
Keywords:Adversarial, Neural Networks, Machine Learning Security
TL;DR:We introduce the first instance of adversarial attacks that reprogram the target model to perform a task chosen by the attacker---without the attacker needing to specify or compute the desired output for each test-time input.

G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space
Keywords:None
TL;DR:None

From Hard to Soft: Understanding Deep Network Nonlinearities via Vector Quantization and Statistical Inference
Keywords:Spline, Vector Quantization, Inference, Nonlinearities, Deep Network
TL;DR:Reformulate deep networks nonlinearities from a vector quantization scope and bridge most known nonlinearities together.

Aggregated Momentum: Stability Through Passive Damping
Keywords:momentum, optimization, deep learning, neural networks
TL;DR:We introduce a simple variant of momentum optimization which is able to outperform classical momentum, Nesterov, and Adam on deep learning tasks with minimal hyperparameter tuning.
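
One way to picture the variant described here is the sketch below, which keeps several velocity buffers with different damping coefficients and averages them in the parameter update. It is a scalar toy under stated assumptions: the function name, the choice of damping coefficients, the learning rate, and the quadratic objective are all illustrative rather than taken from the authors' code.

```python
def aggmo_minimize(grad, theta, lr=0.1, betas=(0.0, 0.9, 0.99), steps=500):
    """Aggregated-momentum-style update on a scalar parameter."""
    velocities = [0.0] * len(betas)  # one velocity buffer per damping coefficient
    for _ in range(steps):
        d = grad(theta)
        # Each buffer accumulates the same gradient with its own decay rate.
        velocities = [b * v - d for b, v in zip(betas, velocities)]
        # The parameter moves along the average of all velocities.
        theta += lr / len(betas) * sum(velocities)
    return theta

# Toy objective (assumed): f(x) = (x - 3)^2, with gradient 2 * (x - 3).
theta_star = aggmo_minimize(lambda x: 2.0 * (x - 3.0), theta=0.0)
```

With a single coefficient of 0 the loop reduces to plain gradient descent, which makes the role of the extra buffers easy to check.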

Variational Autoencoder with Arbitrary Conditioning
Keywords:unsupervised learning, generative models, conditional variational autoencoder, variational autoencoder, missing features multiple imputation, inpainting
TL;DR:We propose an extension of conditional variational autoencoder that allows conditioning on an arbitrary subset of the features and sampling the remaining ones.

Time-Agnostic Prediction: Predicting Predictable Video Frames
Keywords:visual prediction, subgoal generation, bottleneck states, time-agnostic
TL;DR:In visual prediction tasks, letting your predictive model choose which times to predict does two things: (i) improves prediction quality, and (ii) leads to semantically coherent "bottleneck state" predictions, which are useful for planning.

A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
Keywords:deep learning heuristics, learning rate restarts, learning rate warmup, knowledge distillation, mode connectivity, SVCCA
TL;DR:We use empirical tools of mode connectivity and SVCCA to investigate neural network training heuristics of learning rate restarts, warmup and knowledge distillation.

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
Keywords:visual grounding, textual grounding, instruction-following, navigation agent
TL;DR:We propose a self-monitoring agent for the Vision-and-Language Navigation task.

Kernel Change-point Detection with Auxiliary Deep Generative Models
Keywords:deep kernel learning, generative models, kernel two-sample test, time series change-point detection
TL;DR:In this paper, we propose KL-CPD, a novel kernel learning framework for time series CPD that optimizes a lower bound of test power via an auxiliary generative model as a surrogate to the abnormal distribution.

Unsupervised Learning via Meta-Learning
Keywords:unsupervised learning, meta-learning
TL;DR:An unsupervised learning method that uses meta-learning to enable efficient learning of downstream image classification tasks, outperforming state-of-the-art methods.

Auxiliary Variational MCMC
Keywords:None
TL;DR:None

Neural network gradient-based learning of black-box function interfaces
Keywords:neural networks, black box functions, gradient descent
TL;DR:Training DNNs to interface with black-box functions without intermediate labels by using an estimator sub-network that can be replaced with the black box after training

Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
Keywords:hyperparameter optimization, game theory, optimization
TL;DR:We use a hypernetwork to predict optimal weights given hyperparameters, and jointly train everything together.

Unsupervised Control Through Non-Parametric Discriminative Rewards
Keywords:deep reinforcement learning, goals, UVFA, mutual information
TL;DR:Unsupervised reinforcement learning method for learning a policy to robustly achieve perceptually specified goals.

Interpolation-Prediction Networks for Irregularly Sampled Time Series
Keywords:irregular sampling, multivariate time series, supervised learning, interpolation, missing data
TL;DR:This paper presents a new deep learning architecture for addressing the problem of supervised learning with sparse and irregularly sampled multivariate time series.

Riemannian Adaptive Optimization Methods
Keywords:Riemannian optimization, adaptive, hyperbolic, curvature, manifold, adam, amsgrad, adagrad, rsgd, convergence
TL;DR:Adapting Adam, Amsgrad, Adagrad to Riemannian manifolds.

Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters
Keywords:compression, neural networks, bits-back argument, Bayesian, Shannon, information theory
TL;DR:This paper proposes an effective method to compress neural networks based on recent results in information theory.

Characterizing Audio Adversarial Examples Using Temporal Dependency
Keywords:audio adversarial example, mitigation, detection, machine learning
TL;DR:Adversarial audio discrimination using temporal dependency

Equi-normalization of Neural Networks
Keywords:convolutional neural networks, Normalization, Sinkhorn, Regularization
TL;DR:Fast iterative algorithm to balance the energy of a network while staying in the same functional equivalence class

Generalized Tensor Models for Recurrent Neural Networks
Keywords:expressive power, recurrent neural networks, Tensor-Train decomposition
TL;DR:Analysis of expressivity and generality of recurrent neural networks with ReLU nonlinearities using Tensor-Train decomposition.

Wizard of Wikipedia: Knowledge-Powered Conversational Agents
Keywords:dialogue, knowledge, language, conversation
TL;DR:We build knowledgeable conversational agents by conditioning on Wikipedia + a new supervised task.

Are adversarial examples inevitable?
Keywords:adversarial examples, neural networks, security
TL;DR:This paper identifies classes of problems for which adversarial examples are inescapable, and derives fundamental bounds on the susceptibility of any classifier to adversarial examples.

A Variational Inequality Perspective on Generative Adversarial Networks
Keywords:optimization, variational inequality, games, saddle point, extrapolation, averaging, extragradient, generative modeling, generative adversarial network
TL;DR:We cast GANs in the variational inequality framework and import techniques from this literature to optimize GANs better; we give algorithmic extensions and empirically test their performance for training GANs.

Learning-Based Frequency Estimation Algorithms
Keywords:streaming algorithms, heavy-hitters, Count-Min, Count-Sketch
TL;DR:Data stream algorithms can be improved using deep learning, while retaining performance guarantees.

From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following
Keywords:inverse reinforcement learning, language grounding, instruction following, language-based learning
TL;DR:We ground language commands in a high-dimensional visual environment by learning language-conditioned rewards using inverse reinforcement learning.

Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity
Keywords:meta-learning, reinforcement learning, plasticity, neuromodulation, Hebbian learning, recurrent neural networks
TL;DR:Neural networks can be trained to modify their own connectivity, improving their online learning performance on challenging tasks.

Recurrent Experience Replay in Distributed Reinforcement Learning
Keywords:RNN, LSTM, experience replay, distributed training, reinforcement learning
TL;DR:Investigation of combining recurrent neural networks and experience replay, leading to a state-of-the-art agent on both Atari-57 and DMLab-30 using a single set of hyper-parameters.

A Generative Model For Electron Paths
Keywords:Molecules, Reaction Prediction, Graph Neural Networks, Deep Generative Models
TL;DR:A generative model for reaction prediction that learns the mechanistic electron steps of a reaction directly from raw reaction data.

Modeling Uncertainty with Hedged Instance Embeddings
Keywords:uncertainty, instance embedding, metric learning, probabilistic embedding
TL;DR:The paper proposes using probability distributions instead of points for instance embeddings tasks such as recognition and verification.

Beyond Greedy Ranking: Slate Optimization via List-CVAE
Keywords:CVAE, VAE, recommendation system, slate optimization, whole page optimization
TL;DR:We used a CVAE type model structure to learn to directly generate slates/whole pages for recommendation systems.

Stochastic Prediction of Multi-Agent Interactions from Partial Observations
Keywords:Dynamics modeling, partial observations, multi-agent interactions, predictive models
TL;DR:We present a method which learns to integrate temporal information and ambiguous visual information in the context of interacting agents.

GamePad: A Learning Environment for Theorem Proving
Keywords:Theorem proving, ITP, systems, neural embeddings
TL;DR:We introduce a system called GamePad to explore the application of machine learning methods to theorem proving in the Coq proof assistant.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Keywords:natural language understanding, multi-task learning, evaluation
TL;DR:We present a multi-task benchmark and analysis platform for evaluating generalization in natural language understanding systems.

On Computation and Generalization of Generative Adversarial Networks under Spectrum Control
Keywords:None
TL;DR:None

Large-Scale Study of Curiosity-Driven Learning
Keywords:exploration, curiosity, intrinsic reward, no extrinsic reward, unsupervised, no-reward, skills
TL;DR:An agent trained only with curiosity, and no extrinsic reward, does surprisingly well on 54 popular environments, including the suite of Atari games, Mario etc.

Unsupervised Discovery of Parts, Structure, and Dynamics
Keywords:Self-Supervised Learning, Visual Prediction, Hierarchical Models
TL;DR:Learning object parts, hierarchical structure, and dynamics by watching how they move

Music Transformer: Generating Music with Long-Term Structure
Keywords:music generation
TL;DR:We show the first successful use of Transformer in generating music that exhibits long-term structure.

BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning
Keywords:language, learning, efficiency, imitation learning, reinforcement learning
TL;DR:We present the BabyAI platform for studying data efficiency of language learning with a human in the loop

Analyzing Inverse Problems with Invertible Neural Networks
Keywords:Inverse problems, Neural Networks, Uncertainty, Invertible Neural Networks
TL;DR:To analyze inverse problems with Invertible Neural Networks

RelGAN: Relational Generative Adversarial Networks for Text Generation
Keywords:None
TL;DR:None

The Singular Values of Convolutional Layers
Keywords:singular values, operator norm, convolutional layers, regularization
TL;DR:We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation.

An Empirical study of Binary Neural Networks' Optimisation
Keywords:None
TL;DR:None

Approximability of Discriminators Implies Diversity in GANs
Keywords:Theory, Generative adversarial networks, Mode collapse, Generalization
TL;DR:GANs can in principle learn distributions sample-efficiently, if the discriminator class is compact and has strong distinguishing power against the particular generator class.

Learning Embeddings into Entropic Wasserstein Spaces
Keywords:Embedding, Wasserstein, Sinkhorn, Optimal Transport
TL;DR:We show that Wasserstein spaces are good targets for embedding data with complex semantic structure.

DeepOBS: A Deep Learning Optimizer Benchmark Suite
Keywords:deep learning, optimization
TL;DR:We provide a software package that drastically simplifies, automates, and improves the evaluation of deep learning optimizers.

InfoBot: Transfer and Exploration via the Information Bottleneck
Keywords:Information bottleneck, policy transfer, policy generalization, exploration
TL;DR:Training agents with goal-policy information bottlenecks promotes transfer and yields a powerful exploration bonus

The Comparative Power of ReLU Networks and Polynomial Kernels in the Presence of Sparse Latent Structure
Keywords:theory, representational power, universal approximators, polynomial kernels, latent sparsity, beyond worst case, separation result
TL;DR:Beyond-worst-case analysis of the representational power of ReLU nets & polynomial kernels -- in particular in the presence of sparse latent structure.

Learning Implicitly Recurrent CNNs Through Parameter Sharing
Keywords:deep learning, architecture search, computer vision
TL;DR:We propose a method that enables CNN folding to create recurrent connections

Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids
Keywords:Dynamics modeling, Control, Particle-Based Representation
TL;DR:Learning particle dynamics with dynamic interaction graphs for simulating and controlling rigid bodies, deformable objects, and fluids.

Regularized Learning for Domain Adaptation under Label Shifts
Keywords:Deep Learning, Domain Adaptation, Label Shift, Importance Weights, Generalization
TL;DR:A practical and provably guaranteed approach for efficiently training classifiers in the presence of label shifts between Source and Target data sets

Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs
Keywords:Language Generation, Regression, Word Embeddings, Machine Translation
TL;DR:Language generation using seq2seq models which produce word embeddings instead of a softmax based distribution over the vocabulary at each step enabling much faster training while maintaining generation quality

Relational Forward Models for Multi-Agent Learning
Keywords:multi-agent reinforcement learning, relational reasoning, forward models
TL;DR:Relational Forward Models for multi-agent learning make accurate predictions of agents' future behavior, produce interpretable representations, and can be used inside agents.

Imposing Category Trees Onto Word-Embeddings Using A Geometric Construction
Keywords:category tree, word-embeddings, geometry
TL;DR:We show a geometric method to perfectly encode category tree information into pre-trained word-embeddings.

Two-Timescale Networks for Nonlinear Value Function Approximation
Keywords:Reinforcement learning, policy evaluation, nonlinear function approximation
TL;DR:We propose an architecture for learning value functions which allows the use of any linear policy evaluation algorithm in tandem with nonlinear feature learning.

Diversity-Sensitive Conditional Generative Adversarial Networks
Keywords:Conditional Generative Adversarial Network, mode-collapse, multi-modal generation, image-to-image translation, image in-painting, video prediction
TL;DR:We propose a simple and general approach that avoids a mode collapse problem in various conditional GANs.

Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach
Keywords:None
TL;DR:None

Rethinking the Value of Network Pruning
Keywords:network pruning, network compression, architecture search, train from scratch
TL;DR:In structured network pruning, fine-tuning a pruned model only gives comparable performance with training it from scratch.

Hyperbolic Attention Networks
Keywords:Hyperbolic Geometry, Attention Methods, Reasoning on Graphs, Relation Learning, Scale Free Graphs, Transformers, Power Law
TL;DR:We propose to incorporate inductive biases and operations coming from hyperbolic geometry to improve the attention mechanism of the neural networks.

Learning from Positive and Unlabeled Data with a Selection Bias
Keywords:None
TL;DR:None

Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network
Keywords:None
TL;DR:None

Optimal Completion Distillation for Sequence Learning
Keywords:Sequence Learning, Edit Distance, Speech Recognition, Deep Reinforcement Learning
TL;DR:Optimal Completion Distillation (OCD) is a training procedure for optimizing sequence to sequence models based on edit distance which achieves state-of-the-art on end-to-end Speech Recognition tasks.

Caveats for information bottleneck in deterministic scenarios
Keywords:information bottleneck, supervised learning, deep learning, information theory
TL;DR:Information bottleneck behaves in surprising ways whenever the output is a deterministic function of the input.

Deep Learning 3D Shapes Using Alt-az Anisotropic 2-Sphere Convolution
Keywords:Spherical Convolution, Geometric deep learning, 3D shape analysis
TL;DR:A method for applying deep learning to 3D surfaces using their spherical descriptors and alt-az anisotropic convolution on 2-sphere.

Small nonlinearities in activation functions create bad local minima in neural networks
Keywords:spurious local minima, loss surface, optimization landscape, neural network
TL;DR:We constructively prove that even the slightest nonlinear activation functions introduce spurious local minima, for general datasets and activation functions.

Information Theoretic lower bounds on negative log likelihood
Keywords:latent variable modeling, rate-distortion theory, log likelihood bounds
TL;DR:Use rate-distortion theory to bound how much a latent variable model can be improved

Preferences Implicit in the State of the World
Keywords:Preference learning, Inverse reinforcement learning, Inverse optimal stochastic control, Maximum entropy reinforcement learning, Apprenticeship learning
TL;DR:When a robot is deployed in an environment that humans have been acting in, the state of the environment is already optimized for what humans want, and we can use this to infer human preferences.

A Kernel Random Matrix-Based Approach for Sparse PCA
Keywords:None
TL;DR:None

Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods
Keywords:bayesian inference, segmentation, anticipation, multi-modality
TL;DR:Dropout based Bayesian inference is extended to deal with multi-modality and is evaluated on scene anticipation tasks.

There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
Keywords:semi-supervised learning, computer vision, classification, consistency regularization, flatness, weight averaging, stochastic weight averaging
TL;DR:Consistency-based models for semi-supervised learning do not converge to a single point but continue to explore a diverse set of plausible solutions on the perimeter of a flat region. Weight averaging helps improve generalization performance.

Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation
Keywords:None
TL;DR:None

Graph HyperNetworks for Neural Architecture Search
Keywords:None
TL;DR:None

DELTA: DEEP LEARNING TRANSFER USING FEATURE MAP WITH ATTENTION FOR CONVOLUTIONAL NETWORKS
Keywords:transfer learning, deep learning, regularization, attention, cnn
TL;DR:improving deep transfer learning with regularization using attention based feature maps

textTOvec: DEEP CONTEXTUALIZED NEURAL AUTOREGRESSIVE TOPIC MODELS OF LANGUAGE WITH DISTRIBUTED COMPOSITIONAL PRIOR
Keywords:neural topic model, natural language processing, text representation, language modeling, information retrieval, deep learning
TL;DR:Unified neural model of topic and language modeling to introduce language structure in topic models for contextualized topic vectors

Amortized Bayesian Meta-Learning
Keywords:variational inference, meta-learning, few-shot learning, uncertainty quantification
TL;DR:We propose a meta-learning method which efficiently amortizes hierarchical variational inference across training episodes.

Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning
Keywords:Multi-agent Reinforcement Learning, Recursive Reasoning
TL;DR:We propose a novel probabilistic recursive reasoning (PR2) framework for multi-agent deep reinforcement learning tasks.

Learning Neural PDE Solvers with Convergence Guarantees
Keywords:Partial differential equation, deep learning
TL;DR:We learn a fast neural solver for PDEs that has convergence guarantees.

A new dog learns old tricks: RL finds classic optimization algorithms
Keywords:reinforcement learning, algorithms, adwords, knapsack, secretary
TL;DR:By combining ideas from traditional algorithms design and reinforcement learning, we introduce a novel framework for learning algorithms that solve online combinatorial optimization problems.

Deep Graph Infomax
Keywords:Unsupervised Learning, Graph Neural Networks, Graph Convolutions, Mutual Information, Infomax, Deep Learning
TL;DR:A new method for unsupervised representation learning on graphs, relying on maximizing mutual information between local and global representations in a graph. State-of-the-art results, competitive with supervised learning.

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization
Keywords:batch normalization, scale invariance, learning rate, stationary point
TL;DR:We give a theoretical analysis of the ability of batch normalization to automatically tune learning rates, in the context of finding stationary points for a deep learning objective.

Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm
Keywords:deep learning, reduced precision, fixed-point, quantization, back-propagation algorithm
TL;DR:We analyze and determine the precision requirements for training neural networks when all tensors, including back-propagated signals and weight accumulators, are quantized to fixed-point format.

FUNCTIONAL VARIATIONAL BAYESIAN NEURAL NETWORKS
Keywords:functional variational inference, Bayesian neural networks, stochastic processes
TL;DR:We perform functional variational inference on the stochastic processes defined by Bayesian neural networks.

NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning
Keywords:None
TL;DR:None

SPIGAN: Privileged Adversarial Learning from Simulation
Keywords:domain adaptation, GAN, semantic segmentation, simulation, privileged information
TL;DR:An unsupervised sim-to-real domain adaptation method for semantic segmentation using privileged information from a simulator with GAN-based image translation.

Generating Multi-Agent Trajectories using Programmatic Weak Supervision
Keywords:deep learning, generative models, imitation learning, hierarchical methods, data programming, weak supervision, spatiotemporal
TL;DR:We blend deep generative models with programmatic weak supervision to generate coordinated multi-agent trajectories of significantly higher quality than previous baselines.

Label super-resolution networks
Keywords:weakly supervised segmentation, land cover mapping, medical imaging
TL;DR:Super-resolving coarse labels into pixel-level labels, applied to aerial imagery and medical scans.

ANYTIME MINIBATCH: EXPLOITING STRAGGLERS IN ONLINE DISTRIBUTED OPTIMIZATION
Keywords:distributed optimization, gradient descent, minibatch, stragglers
TL;DR:Accelerate distributed optimization by exploiting stragglers.

Sample Efficient Adaptive Text-to-Speech
Keywords:few shot, meta learning, text to speech, wavenet
TL;DR:Sample-efficient algorithms to adapt a text-to-speech model to a new voice style with state-of-the-art performance.

Practical lossless compression with latent variables using bits back coding
Keywords:compression, variational auto-encoders, deep latent gaussian models, lossless compression, latent variables, approximate inference, variational inference
TL;DR:We perform lossless compression of large image datasets using a VAE and beat existing compression algorithms.

Kernel RNN Learning (KeRNL)
Keywords:RNNs, Biologically plausible learning rules, Algorithm, Neural Networks, Supervised Learning
TL;DR:A biologically plausible learning rule for training recurrent neural networks

Deep, Skinny Neural Networks are not Universal Approximators
Keywords:neural network, universality, expressability
TL;DR:This paper proves that skinny neural networks cannot approximate certain functions, no matter how deep they are.

Large Scale Graph Learning From Smooth Signals
Keywords:None
TL;DR:None

Overcoming Catastrophic Forgetting for Continual Learning via Model Adaptation
Keywords:None
TL;DR:None

Analysis of Quantized Models
Keywords:weight quantization, gradient quantization, distributed learning
TL;DR:In this paper, we studied efficient training of loss-aware weight-quantized networks with quantized gradient in a distributed environment, both theoretically and empirically.

Deep learning generalizes because the parameter-function map is biased towards simple functions
Keywords:generalization, deep learning theory, PAC-Bayes, Gaussian processes, parameter-function map, simplicity bias
TL;DR:The parameter-function map of deep networks is hugely biased; this can explain why they generalize. We use PAC-Bayes and Gaussian processes to obtain nonvacuous bounds.

Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks
Keywords:multiagent, communication, competitive, cooperative, continuous, emergent, reinforcement learning
TL;DR:We introduce IC3Net, a single network which can be used to train agents in cooperative, competitive and mixed scenarios. We also show that agents can learn when to communicate using our model.

Synthetic Datasets for Neural Program Synthesis
Keywords:None
TL;DR:None

DPSNet: End-to-end Deep Plane Sweep Stereo
Keywords:Deep Learning, Stereo, Depth, Geometry
TL;DR:A convolutional neural network for multi-view stereo matching whose design is inspired by best practices of traditional geometry-based approaches

Conditional Network Embeddings
Keywords:Network embedding, graph embedding, learning node representations, link prediction, multi-label classification of nodes
TL;DR:We introduce a network embedding method that accounts for prior information about the network, yielding superior empirical performance.

Defensive Quantization: When Efficiency Meets Robustness
Keywords:defensive quantization, model quantization, adversarial attack, efficiency, robustness
TL;DR:We designed a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models.

GO Gradient for Expectation-Based Objectives
Keywords:generalized reparameterization gradient, variance reduction, non-reparameterizable, discrete random variable, GO gradient, general and one-sample gradient, expectation-based objective, variable nabla, statistical back-propagation, hierarchical, graphical model
TL;DR:A Rep-like gradient for non-reparameterizable continuous/discrete distributions; further generalized to deep probabilistic models, yielding statistical back-propagation

h-detach: Modifying the LSTM Gradient Towards Better Optimization
Keywords:LSTM, Optimization, Long term dependencies, Back-propagation through time
TL;DR:A simple algorithm to improve optimization and handling of long term dependencies in LSTM

An analytic theory of generalization dynamics and transfer learning in deep linear networks
Keywords:Generalization, Theory, Transfer, Multi-task, Linear
TL;DR:We provide many insights into neural network generalization from the theoretically tractable linear case.

Differentiable Learning-to-Normalize via Switchable Normalization
Keywords:None
TL;DR:None

SOM-VAE: Interpretable Discrete Representation Learning on Time Series
Keywords:deep learning, self-organizing map, variational autoencoder, representation learning, time series, machine learning, interpretability
TL;DR:We present a method to learn interpretable representations on time series using ideas from variational autoencoders, self-organizing maps and probabilistic models.

Hierarchical Generative Modeling for Controllable Speech Synthesis
Keywords:speech synthesis, representation learning, deep generative model, sequence-to-sequence model
TL;DR:Building a TTS model with Gaussian Mixture VAEs enables fine-grained control of speaking style, noise condition, and more.

Learning Factorized Multimodal Representations
Keywords:multimodal learning, representation learning
TL;DR:We propose a model to learn factorized multimodal representations that are discriminative, generative, and interpretable.

Composing Complex Skills by Learning Transition Policies
Keywords:reinforcement learning, hierarchical reinforcement learning, continuous control, modular framework
TL;DR:Transition policies enable agents to compose complex skills by smoothly connecting previously acquired primitive skills.

Human-level Protein Localization with Convolutional Neural Networks
Keywords:None
TL;DR:None

Environment Probing Interaction Policies
Keywords:None
TL;DR:None

Lagging Inference Networks and Posterior Collapse in Variational Autoencoders
Keywords:variational autoencoders, posterior collapse, generative models
TL;DR:To address posterior collapse in VAEs, we propose a novel yet simple training procedure that aggressively optimizes the inference network with more updates. This new training procedure mitigates posterior collapse and leads to a better VAE model.

A2BCD: Asynchronous Acceleration with Optimal Complexity
Keywords:asynchronous, optimization, parallel, accelerated, complexity
TL;DR:We give the first-ever convergence proof of an asynchronous accelerated algorithm that attains a speedup.

Learning to Infer and Execute 3D Shape Programs
Keywords:Program Synthesis, 3D Shape Modeling, Self-supervised Learning
TL;DR:We propose 3D shape programs, a structured, compositional shape representation. Our model learns to infer and execute shape programs to explain 3D shapes.

Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks
Keywords:natural image model, image prior, under-determined neural networks, untrained network, non-convolutional network, denoising, inverse problem
TL;DR:We introduce an underparameterized, nonconvolutional, and simple deep neural network that can, without training, effectively represent natural images and solve image processing tasks like compression and denoising competitively.

SNAS: stochastic neural architecture search
Keywords:None
TL;DR:None

Revealing interpretable object representations from human behavior
Keywords:category representation, sparse coding, representation learning, interpretable representations
TL;DR:Human behavioral judgments are used to obtain sparse and interpretable representations of objects that generalize to other tasks

AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks
Keywords:None
TL;DR:None

Global-to-local Memory Pointer Networks for Task-Oriented Dialogue
Keywords:pointer networks, memory networks, task-oriented dialogue systems, natural language processing
TL;DR:GLMP: Global memory encoder (context RNN, global pointer) and local memory decoder (sketch RNN, local pointer) that share external knowledge (MemNN) are proposed to strengthen response generation in task-oriented dialogue.

InstaGAN: Instance-aware Image-to-Image Translation
Keywords:Image-to-Image Translation, Generative Adversarial Networks
TL;DR:We propose a novel method to incorporate the set of instance attributes for image-to-image translation.

Deep Layers as Stochastic Solvers
Keywords:deep networks, optimization
TL;DR:A framework that links deep network layers to stochastic optimization algorithms; can be used to improve model accuracy and inform network design.

Learning Multi-Level Hierarchies with Hindsight
Keywords:Hierarchical Reinforcement Learning, Reinforcement Learning, Deep Reinforcement Learning
TL;DR:We introduce the first Hierarchical RL approach to successfully learn 3-level hierarchies in parallel in tasks with continuous state and action spaces.