Workshop programme
08:30 - 09:30 | Welcome, Recent advances in feature detectors & descriptors
09:30 - 10:10 | Sergey Zagoruyko, Learning to compare image patches
10:10 - 10:40 | Coffee break
10:40 - 11:20 | Iasonas Kokkinos, Metric learning techniques for convolutional patch descriptors and segmentation-aware neural networks
11:20 - 12:00 | Vincent Lepetit, LIFT: Learned Invariant Feature Transform
12:00 - 12:40 | New benchmark for descriptors & detectors, challenge results
12:40 - 13:00 | Coffee break
13:00 - 14:00 | Poster spotlight followed by poster session
14:00 | Closing
Abstracts and bios
Engineering and Learning Local Representations - Stefano Soatto
Abstract
Representations are functions of past (training) data that are useful for a class of tasks: the most useful are sufficient statistics (of the data, for the task) that are invariant to (task-dependent) nuisance factors. For low-level tasks such as correspondence, in the presence of occlusion nuisances, we show that sufficient invariants are computed locally, and relate to low-level descriptors traditionally used in Computer Vision, such as SIFT and its variants. For more general tasks, where intra-class variability is significant, a sufficient invariant can be computed hierarchically. I will define and characterize formally optimal representations, and establish relations with existing methods, including deep convolutional neural networks, for vision-based decision tasks, including object and scene detection, recognition, and categorization.
Speaker
Stefano Soatto is the founder and director of the UCLA Vision Lab (vision.ucla.edu). He received his Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 1996; he joined UCLA in 2000 after being Associate Professor of Electrical and Biomedical Engineering at Washington University, Research Associate in Applied Sciences at Harvard University, Assistant Professor in Mathematics and Computer Science at the University of Udine, Italy, and EAP Fellow at UC Berkeley. He received his D.Ing. degree (highest honors) from the University of Padova, Italy, in 1992. Dr. Soatto is the recipient of the David Marr Prize (with Y. Ma, J. Kosecka and S. Sastry) for work on Euclidean reconstruction and reprojection up to subgroups. He also received the Siemens Prize with the Outstanding Paper Award from the IEEE Computer Society for his work on optimal structure from motion (with R. Brockett). He received the National Science Foundation CAREER Award and the Okawa Foundation Grant, as well as the Best Conference Paper Award at ICRA 2015 for his work on visual-inertial sensor fusion. He is a Fellow of the IEEE and a Member of the Editorial Boards of the International Journal of Computer Vision (IJCV), the Journal of Mathematical Imaging and Vision (JMIV), Foundations and Trends in Computer Graphics and Vision, and the SIAM Journal on Imaging Sciences. He was Program Co-Chair of CVPR 2005, ICCV 2011, and SIAM 2016, and will be Program Co-Chair of ICCV 2017.
Metric learning techniques for convolutional patch descriptors and segmentation-aware neural networks - Iasonas Kokkinos
Abstract
In this talk we will describe recent advances in using metric learning in conjunction with convolutional neural networks (CNNs).
The first part of the talk will present a convolutional network trained to extract interest point descriptors. We train a Siamese network with a pairwise loss on the Euclidean distance between the CNN outputs; using the Euclidean distance allows the resulting descriptor to be used as a drop-in replacement for SIFT, or any other hand-crafted descriptor. Our descriptors systematically outperform a broad range of descriptors in geometric correspondence tasks, while also exhibiting surprisingly good generalization to other tasks.
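The Siamese setup with a pairwise Euclidean loss can be illustrated with a toy sketch. This is not the authors' architecture: a single linear map stands in for the CNN branch, and the margin, patch size, and descriptor dimension below are illustrative choices.

```python
import numpy as np

def embed(patch, W):
    """Toy 'network': a single linear map standing in for the CNN branch.
    Both patches of a pair go through the same weights (Siamese sharing)."""
    return W @ patch.ravel()

def pairwise_loss(d1, d2, label, margin=1.0):
    """Contrastive pairwise loss on the Euclidean distance between descriptors.
    label=1: matching pair (pulled together); label=0: non-matching (pushed
    apart until their distance exceeds the margin)."""
    dist = np.linalg.norm(d1 - d2)
    if label == 1:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, margin - dist) ** 2

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))              # 8x8 patch -> 16-D descriptor
p1 = rng.standard_normal((8, 8))
p2 = p1 + 0.01 * rng.standard_normal((8, 8))   # near-duplicate of p1
p3 = rng.standard_normal((8, 8))               # unrelated patch

d1, d2, d3 = (embed(p, W) for p in (p1, p2, p3))
print(pairwise_loss(d1, d2, 1))  # small: the matching pair is already close
print(pairwise_loss(d1, d3, 0))  # 0.0: the pair is already beyond the margin
```

Because the loss is defined directly on the Euclidean distance, any consumer that ranks matches by Euclidean distance between descriptors (as is done with SIFT) can use the learned embedding unchanged.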
In the second part of the talk we will build on our earlier work on learning segmentation-aware 'shallow' descriptors and present a recent CNN-based counterpart. For this we first show how one can use metric learning to train a CNN that 'softly' captures segmentation information through a vector-valued embedding. We then turn to using this embedding within a modified, 'segmentation-aware' CNN that relies on normalized convolution to eliminate the effects of background variation on neuron outputs. When evaluated on a semantic segmentation task, the learned CNN delivers systematically better results, in terms of both visual sharpness and accuracy.
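The normalized-convolution idea can be sketched in one dimension: each filter response is divided by the total affinity weight that actually contributed, so samples masked out as background stop bleeding into foreground responses. The signal, binary mask, and box filter below are illustrative stand-ins for the learned embedding-based affinities.

```python
import numpy as np

def normalized_conv(signal, weights, kernel):
    """Normalized convolution: weight each sample by its affinity, filter,
    then re-normalize by the filtered weights, so low-affinity (background)
    samples are discounted instead of polluting the output."""
    num = np.convolve(signal * weights, kernel, mode="same")
    den = np.convolve(weights, kernel, mode="same")
    return num / np.maximum(den, 1e-8)

signal = np.array([1., 1., 1., 9., 9., 9.])    # two regions, sharp edge between
weights = np.array([1., 1., 1., 0., 0., 0.])   # affinity mask: trust region one only
kernel = np.ones(3) / 3.0                      # plain box filter

print(np.convolve(signal, kernel, mode="same"))  # ordinary smoothing mixes the regions
print(normalized_conv(signal, weights, kernel))  # trusted region stays exactly 1.0
```

With an ordinary convolution the 9s leak across the boundary; with normalized convolution the responses inside the trusted region are unchanged, which is the sharpness benefit the abstract refers to.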
Speaker
Dr Iasonas Kokkinos received the Diploma and Ph.D. degrees in 2001 and 2006, respectively, from the School of Electrical and Computer Engineering at NTUA, working in the Computer Vision and Signal Processing group. After obtaining his Ph.D. he joined Prof. Yuille's group at UCLA as a postdoc between 2006 and 2008. In September 2008 he joined Ecole Centrale Paris as an Assistant Professor in the Department of Applied Mathematics, working in the Center for Visual Computing laboratory and the GALEN research team of INRIA-Saclay.
LIFT: Learned Invariant Feature Transform - Vincent Lepetit
Abstract
I will present a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description. While previous works have successfully tackled each of these problems individually, we show how to learn to do all three in a unified manner while preserving end-to-end differentiability. We then demonstrate that our deep pipeline outperforms state-of-the-art methods on a number of benchmark datasets, without the need for retraining. I will also argue that such an approach is very general and could be applied to other Computer Vision problems.
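One standard ingredient for making a detection stage differentiable, used in this spirit by learned pipelines, is to replace the hard argmax over a detector score map with a softmax-weighted average of pixel coordinates. A minimal sketch (the temperature beta and the toy score map are illustrative, not values from the paper):

```python
import numpy as np

def soft_argmax(score_map, beta=10.0):
    """Differentiable keypoint 'detection': instead of a hard argmax over the
    score map, take a softmax-weighted average of pixel coordinates, which is
    smooth in the scores and so admits end-to-end gradient training."""
    h, w = score_map.shape
    p = np.exp(beta * (score_map - score_map.max()))  # stabilized softmax
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((p * ys).sum()), float((p * xs).sum())

score = np.zeros((5, 5))
score[2, 3] = 5.0            # a single strong detector response
print(soft_argmax(score))    # close to (2.0, 3.0)
```

A hard argmax would return the same location here, but its gradient with respect to the scores is zero almost everywhere; the soft version lets the orientation and description stages pass training signal back into the detector.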
Speaker
Vincent Lepetit is a Professor at the Institute for Computer Graphics and Vision, TU Graz, and a Visiting Professor at the Computer Vision Laboratory, EPFL. He received his engineering and master's degrees in Computer Science from ESIAL in 1996. He received his PhD in Computer Vision in 2001 from the University of Nancy, France, after working in the ISA INRIA team. He then joined the Virtual Reality Lab at EPFL as a post-doctoral fellow and became a founding member of the Computer Vision Laboratory. His research interests include vision-based Augmented Reality, 3D camera tracking, object recognition, and 3D reconstruction.