Title: SIST Seminar Series — Machine Learning and Artificial Intelligence

Time: Friday, Jan. 6, 2017, 2:00 p.m. - 5:00 p.m.

Venue: Room 1A-200, SIST Building

Speakers: Yi Wu, Shixiang (Shane) Gu, Junbo (Jake) Zhao

**#1:**

**Seminar Topic:** Value Iteration Network (NIPS 2016 Best Paper Award)

**Speaker:** Yi Wu

**Time:** Jan. 6, 2:00 p.m. - 3:00 p.m.

**Venue:** Room 1A-200, SIST Building

**Abstract:**

We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network and trained end-to-end using standard back-propagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.
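The core observation above is that one step of value iteration on a grid can be written as a convolution over the value map followed by a max over action channels. The toy sketch below illustrates that idea only; the grid size, kernels, and function names are my own assumptions, not the paper's architecture (which learns the kernels end-to-end).

```python
import numpy as np

def value_iteration_step(V, R, kernels, gamma=0.99):
    """One VI step: Q[a] = R + gamma * conv(V, kernels[a]); V = max_a Q[a]."""
    H, W = V.shape
    Q = np.empty((len(kernels), H, W))
    Vp = np.pad(V, 1, mode="edge")            # 3x3 neighborhoods at the borders
    for a, k in enumerate(kernels):           # each kernel encodes one action's transition
        for i in range(H):
            for j in range(W):
                Q[a, i, j] = R[i, j] + gamma * np.sum(k * Vp[i:i + 3, j:j + 3])
    return Q.max(axis=0)                      # max over action channels

# Toy 5x5 grid with reward 1 at the goal cell (4, 4).
R = np.zeros((5, 5))
R[4, 4] = 1.0

# Four hand-built "move" kernels (up/down/left/right), each selecting one neighbor;
# in a real VIN these weights would be learned by back-propagation.
kernels = []
for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
    k = np.zeros((3, 3))
    k[1 + di, 1 + dj] = 1.0
    kernels.append(k)

V = np.zeros((5, 5))
for _ in range(20):                           # K recurrent iterations, as in the paper
    V = value_iteration_step(V, R, kernels)
# value propagates outward from the goal, decaying with distance
```

Because every operation is a convolution, a max, and an addition, the whole K-step rollout is differentiable, which is what lets the planning module be trained end-to-end.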

**Biography:**

Yi Wu is a third-year Computer Science Ph.D. student at UC Berkeley, advised by Prof. Stuart Russell. He received his B.E. from the special pilot class (Yao Class) of the Institute for Interdisciplinary Information Sciences, Tsinghua University. Yi's research focuses on how to effectively incorporate human knowledge into AI models to produce solutions that are both interpretable and generalizable. He is currently working on a variety of projects, including probabilistic generative models, probabilistic programming, and hierarchical reinforcement learning.


**#2:**

**Seminar Topic:** Sample-Efficient and Stable Deep Reinforcement Learning for Robotics

**Speaker:** Shixiang (Shane) Gu

**Time:** Jan. 6, 3:00 p.m. - 4:00 p.m.

**Venue:** Room 1A-200, SIST Building

**Abstract:**

Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is the high sample complexity of such methods. We present two independent lines of work to address this fundamental problem. In the first part, we explore how off-policy deep RL methods based on normalized advantage functions (NAF) can learn real-world robotic manipulation skills, with multiple robots simultaneously pooling their experiences. Our results show that we can obtain faster training and, in some cases, converge to a better solution when training on multiple robots, and we show that we can learn a real-world door opening skill with deep neural network policies using about 2.5 hours of total training time with two robots. In the second part, we present Q-Prop, a novel model-free method that combines the stability of unbiased policy gradients with the efficiency of off-policy RL. We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation. We show that conservative Q-Prop provides substantial gains in sample efficiency over trust region policy optimization (TRPO) with generalized advantage estimation (GAE), and improves stability over deep deterministic policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods, on OpenAI Gym’s MuJoCo continuous control environments.
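Q-Prop's central statistical tool is the control variate: subtract from a Monte Carlo estimator a correlated quantity whose expectation is known analytically, and add that expectation back. The 1-D toy below illustrates only this variance-reduction idea, not the Q-Prop algorithm itself; the functions and distributions are my own example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)           # x ~ N(0, 1)

f = np.exp(0.5 * x)                    # target: estimate E[f(x)]
h = 1.0 + 0.5 * x                      # baseline: first-order Taylor of f; E[h] = 1 exactly

plain = f                              # naive Monte Carlo estimator samples
cv = f - h + 1.0                       # control-variate samples (E[h] = 1 added back)

# Both have the same expectation, but cv has much lower variance because
# h tracks f closely, so f - h fluctuates far less than f alone.
```

In Q-Prop the baseline is a first-order Taylor expansion of an off-policy critic, so the analytic correction term can be computed exactly, combining the unbiasedness of on-policy policy gradients with off-policy sample efficiency.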

**Biography:**

Shixiang (Shane) Gu began his Ph.D. in machine learning under the Cambridge-Tübingen Ph.D. Fellowship in fall 2014; he is co-supervised by Richard E. Turner and Zoubin Ghahramani at the University of Cambridge, and by Bernhard Schölkopf at the Max Planck Institute for Intelligent Systems in Tübingen. He also collaborates closely with Sergey Levine at UC Berkeley/Google Brain and Timothy Lillicrap at DeepMind. He obtained his B.ASc. in Engineering Science from the University of Toronto in 2013, where he completed his thesis with Geoffrey Hinton. He is funded by NSERC and a Google Focused Research Award.


**#3:**

**Seminar Topic:** Generative Adversarial Networks

**Speaker:** Junbo (Jake) Zhao

**Time:** Jan. 6, 4:00 p.m. - 5:00 p.m.

**Venue:** Room 1A-200, SIST Building

**Abstract:**

This talk will focus on the development and application of generative adversarial networks. I will mainly discuss our recent work: Energy-based Generative Adversarial Networks, and Disentangling Factors of Variation in Deep Representations Using Adversarial Training.
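For readers new to the topic, the standard GAN objective pits a discriminator D against a generator G via V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]. The toy sketch below only shows how the two expectations are estimated from samples with fixed, hand-picked D and G; it is my own illustration, not the energy-based formulation discussed in the talk.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, scale=1.0, size=10_000)   # samples from the "data" distribution
z = rng.normal(size=10_000)                          # latent noise
fake = 0.5 * z                                       # fixed toy generator G(z)

def D(x, w=1.0, b=-1.0):                             # fixed toy logistic discriminator
    return sigmoid(w * x + b)

# Monte Carlo estimate of the minimax value V(D, G):
# the discriminator ascends this, the generator descends the second term.
value = np.mean(np.log(D(real))) + np.mean(np.log(1.0 - D(fake)))
```

In training, the two players alternate gradient steps on this value; the energy-based variant replaces the probabilistic discriminator with an energy function, which the talk covers.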

**Biography:**

Junbo (Jake) Zhao is currently a Ph.D. student in the CILVR lab at NYU, under the supervision of Professor Yann LeCun. His main research interests are deep learning for computer vision and natural language processing. He previously earned a master's degree in data science from the NYU Center for Data Science and an engineering degree from Wuhan University.