Academic Posters

May 6: Academic Talk by Prof. Zhang Changshui (School of Physics and Electronic Engineering)

Venue: Small Conference Room 321, 3rd Floor, Building 7, School of Physics and Electronic Engineering

Posted by: Sun Chuanhong. Date posted: 2016-05-05

Speaker: Prof. Zhang Changshui

Title: Learning Sequences: image caption with region-based attention and scene factorization

Time: May 6, 9:00 a.m. (Friday)

Hosts: School of Physics and Electronic Engineering; Office of Science and Technology

About the speaker:

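Zhang Changshui was born in 1965. He graduated from the Department of Mathematics at Peking University in July 1986 with a bachelor's degree, and from the Department of Automation at Tsinghua University in July 1992 with a doctoral degree. He has worked in the Department of Automation at Tsinghua University since July 1992, where he is currently a professor and doctoral supervisor. His main research interests include machine learning, pattern recognition, and computer vision. He is a senior member of the Computer Society (计算机学会) and serves on the editorial boards of academic journals including "Pattern Recognition", "Chinese Journal of Computers" (计算机学报), and "Acta Automatica Sinica" (自动化学报). He has published more than 100 papers in international journals and more than 50 papers at top conferences.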

Abstract: Learning sequences is a challenging task. Recent progress on automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. In this talk, we introduce some models for sequence modeling. Then we introduce our image captioning system, which exploits the parallel structures between images and sentences. In our model, the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience, where the shifting of attention among visual regions imposes a thread of visual ordering. This alignment characterizes the flow of abstract meaning, encoding what is semantically shared by both the visual scene and the text description. Our system also makes another novel modeling contribution by introducing scene-specific contexts that capture higher-level semantic information encoded in an image. The contexts adapt language models for word generation to specific scene types. We benchmark our system and contrast it with published results on several popular datasets. We show that using either region-based attention or scene-specific contexts improves over systems without those components. Furthermore, combining these two modeling ingredients attains state-of-the-art performance.