Workshop programme
08:30 - 09:30 | Welcome, Recent advances in feature detectors & descriptors
09:30 - 10:10 | Sergey Zagoruyko, Learning to compare image patches
10:10 - 10:40 | Coffee break
10:40 - 11:20 | Iasonas Kokkinos, Metric learning techniques for convolutional patch descriptors and segmentation-aware neural networks
11:20 - 12:00 | Vincent Lepetit, LIFT: Learned Invariant Feature Transform
12:00 - 12:40 | New benchmark for descriptors & detectors, challenge results
12:40 - 13:00 | Coffee break
13:00 - 14:00 | Poster spotlight followed by poster session
14:00 | Closing
Abstracts and bios
Engineering and Learning Local Representations - Stefano Soatto
Abstract
Representations are functions of past (training) data that are useful for a class of tasks: the most useful are sufficient statistics (of the data, for the task) that are invariant to (task-dependent) nuisance factors. For low-level tasks such as correspondence, in the presence of occlusion nuisances, we show that sufficient invariants are computed locally, and relate to low-level descriptors traditionally used in Computer Vision, such as SIFT and its variants. For more general tasks, where intra-class variability is significant, a sufficient invariant can be computed hierarchically. I will define and characterize formally optimal representations, and establish relations with existing methods, including deep convolutional neural networks, for vision-based decision tasks, including object and scene detection, recognition, and categorization.
Speaker
Stefano Soatto is the founder and director of the UCLA Vision Lab (vision.ucla.edu). He received his Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 1996; he joined UCLA in 2000 after being Associate Professor of Electrical and Biomedical Engineering at Washington University, Research Associate in Applied Sciences at Harvard University, Assistant Professor in Mathematics and Computer Science at the University of Udine, Italy, and EAP Fellow at UC Berkeley. He received his D.Ing. degree (highest honors) from the University of Padova, Italy, in 1992. Dr. Soatto is the recipient of the David Marr Prize (with Y. Ma, J. Kosecka and S. Sastry) for work on Euclidean reconstruction and reprojection up to subgroups. He also received the Siemens Prize with the Outstanding Paper Award from the IEEE Computer Society for his work on optimal structure from motion (with R. Brockett). He received the National Science Foundation CAREER Award and the Okawa Foundation Grant, as well as the Best Conference Paper Award at ICRA 2015 for his work on visual-inertial sensor fusion. He is a Fellow of the IEEE and a Member of the Editorial Boards of the International Journal of Computer Vision (IJCV), the Journal of Mathematical Imaging and Vision (JMIV), Foundations and Trends in Computer Graphics and Vision, and the SIAM Journal on Imaging Sciences. He was Program Co-Chair of CVPR 2005, ICCV 2011, and SIAM 2016, and will be Program Co-Chair of ICCV 2017.
Metric learning techniques for convolutional patch descriptors and segmentation-aware neural networks - Iasonas Kokkinos
Abstract
In this talk we will describe recent advances in using metric learning in conjunction with convolutional neural networks (CNNs).
The first part of the talk will present a convolutional network trained to extract interest point descriptors. We train a Siamese network with a pairwise loss on the Euclidean distance between the CNN outputs; using the Euclidean distance allows the resulting descriptor to be used as a drop-in replacement for SIFT, or any other hand-crafted descriptor. Our descriptors systematically outperform a broad range of descriptors in geometric correspondence tasks, while also exhibiting surprisingly good generalization to other tasks.
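The Siamese setup with a pairwise Euclidean loss can be illustrated with a toy sketch. This is not the authors' architecture: a single linear map stands in for the CNN branch, and the margin, patch size, and descriptor dimension below are illustrative choices.

```python
import numpy as np

def embed(patch, W):
    """Toy 'network': a single linear map standing in for the CNN branch.
    Both patches of a pair go through the same weights (Siamese sharing)."""
    return W @ patch.ravel()

def pairwise_loss(d1, d2, label, margin=1.0):
    """Contrastive pairwise loss on the Euclidean distance between descriptors.
    label=1: matching pair (pulled together); label=0: non-matching (pushed
    apart until their distance exceeds the margin)."""
    dist = np.linalg.norm(d1 - d2)
    if label == 1:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, margin - dist) ** 2

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))              # 8x8 patch -> 16-D descriptor
p1 = rng.standard_normal((8, 8))
p2 = p1 + 0.01 * rng.standard_normal((8, 8))   # near-duplicate of p1
p3 = rng.standard_normal((8, 8))               # unrelated patch

d1, d2, d3 = (embed(p, W) for p in (p1, p2, p3))
print(pairwise_loss(d1, d2, 1))  # small: the matching pair is already close
print(pairwise_loss(d1, d3, 0))  # 0.0: the pair is already beyond the margin
```

Because the loss is defined directly on the Euclidean distance, any consumer that ranks matches by Euclidean distance between descriptors (as is done with SIFT) can use the learned embedding unchanged.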
In the second part of the talk we will build on our earlier work on learning segmentation-aware 'shallow' descriptors and present a recent CNN-based counterpart. For this we first show how one can use metric learning to train a CNN that 'softly' captures segmentation information through a vector-valued embedding. We then turn to using this embedding within a modified, 'segmentation-aware' CNN that relies on normalized convolution to eliminate the effects of background variation on neuron outputs. When evaluated on a semantic segmentation task, the learned CNN delivers systematically better results, in terms of both visual sharpness and accuracy.
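The normalized-convolution idea can be sketched in one dimension: each filter response is divided by the total affinity weight that actually contributed, so samples masked out as background stop bleeding into foreground responses. The signal, binary mask, and box filter below are illustrative stand-ins for the learned embedding-based affinities.

```python
import numpy as np

def normalized_conv(signal, weights, kernel):
    """Normalized convolution: weight each sample by its affinity, filter,
    then re-normalize by the filtered weights, so low-affinity (background)
    samples are discounted instead of polluting the output."""
    num = np.convolve(signal * weights, kernel, mode="same")
    den = np.convolve(weights, kernel, mode="same")
    return num / np.maximum(den, 1e-8)

signal = np.array([1., 1., 1., 9., 9., 9.])    # two regions, sharp edge between
weights = np.array([1., 1., 1., 0., 0., 0.])   # affinity mask: trust region one only
kernel = np.ones(3) / 3.0                      # plain box filter

print(np.convolve(signal, kernel, mode="same"))  # ordinary smoothing mixes the regions
print(normalized_conv(signal, weights, kernel))  # trusted region stays exactly 1.0
```

With an ordinary convolution the 9s leak across the boundary; with normalized convolution the responses inside the trusted region are unchanged, which is the sharpness benefit the abstract refers to.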
Speaker
Dr Iasonas Kokkinos received the Diploma and Ph.D. degrees in 2001 and 2006, respectively, from the School of Electrical and Computer Engineering at NTUA, working in the Computer Vision and Signal Processing group. After obtaining his Ph.D. he joined Prof. Yuille's group at UCLA as a postdoc between 2006 and 2008. In September 2008 he joined Ecole Centrale Paris as an Assistant Professor in the Department of Applied Mathematics, working in the Center for Visual Computing laboratory and the GALEN research team of INRIA-Saclay.
LIFT: Learned Invariant Feature Transform - Vincent Lepetit
Abstract
I will present a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description. While previous works have successfully tackled each of these problems individually, we show how to learn to do all three in a unified manner while preserving end-to-end differentiability. We then demonstrate that our deep pipeline outperforms state-of-the-art methods on a number of benchmark datasets, without the need for retraining. I will also argue that such an approach is very general and could be applied to other Computer Vision problems.
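One standard ingredient for making a detection stage differentiable, used in this spirit by learned pipelines, is to replace the hard argmax over a detector score map with a softmax-weighted average of pixel coordinates. A minimal sketch (the temperature beta and the toy score map are illustrative, not values from the paper):

```python
import numpy as np

def soft_argmax(score_map, beta=10.0):
    """Differentiable keypoint 'detection': instead of a hard argmax over the
    score map, take a softmax-weighted average of pixel coordinates, which is
    smooth in the scores and so admits end-to-end gradient training."""
    h, w = score_map.shape
    p = np.exp(beta * (score_map - score_map.max()))  # stabilized softmax
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((p * ys).sum()), float((p * xs).sum())

score = np.zeros((5, 5))
score[2, 3] = 5.0            # a single strong detector response
print(soft_argmax(score))    # close to (2.0, 3.0)
```

A hard argmax would return the same location here, but its gradient with respect to the scores is zero almost everywhere; the soft version lets the orientation and description stages pass training signal back into the detector.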
Speaker
Vincent Lepetit is a Professor at the Institute for Computer Graphics and Vision, TU Graz, and a Visiting Professor at the Computer Vision Laboratory, EPFL. He received his engineering and master's degrees in Computer Science from ESIAL in 1996. He received his PhD in Computer Vision in 2001 from the University of Nancy, France, after working in the ISA INRIA team. He then joined the Virtual Reality Lab at EPFL as a post-doctoral fellow and became a founding member of the Computer Vision Laboratory. His research interests include vision-based Augmented Reality, 3D camera tracking, object recognition, and 3D reconstruction.