VISUM: 2nd edition



Machine Learning for Computer Vision:

This introductory course on Machine Learning aims to give students the skills to understand and apply some of the following topics:

  • The abc of computer vision: we will briefly review some concepts from image processing and computer vision, with a focus on object detection and recognition. Most of the subsequent concepts will be motivated by object detection and recognition in visual data.
  • The abc of the learning process: we will briefly review some concepts from the machine learning field, with a focus on the classification task and the independent observations assumption. We will illustrate these concepts with some Matlab examples for object recognition in images. Finally, we will discuss the limitations of these assumptions in computer vision and pave the way to the articulation between objects and context for object recognition.
  • Context-dependent data modelling in computer vision: we will start by discussing the sources of context, from the most common local pixel context, which captures the basic notion that image pixels/patches around the region of interest carry useful information, to scene and cultural context. Then we will discuss how these sources of context could be used for improved object detection and recognition. This will motivate a review of some of the most common tools for modelling context: Hidden Markov Models, Conditional Random Fields, Martingales, Probabilistic Graphical Models, and Statistical Relational Learning. Some of the tools will be illustrated by applying them to object recognition, again using Matlab. Some of the limitations of these techniques will be stressed, and the difficulties in applying them to very large databases of visual information will be addressed.
  • Context-dependent data modelling in computer vision: are we there yet?
    Despite the quality of the research in this field and the significant recent advances, we will argue that existing data models cannot yet naturally and directly represent such context-dependent information in computer vision. We will highlight open questions and stimulate promising future research.
  • Take home messages: conclusion with a suggestion of a set of useful references in the field.

Professor Jaime S. Cardoso
INESC TEC, Faculdade de Engenharia, Universidade do Porto


  • Christopher M. Bishop; Pattern recognition and machine learning.
  • Forsyth and Ponce, Computer Vision: A Modern Approach, Prentice Hall, 2002.
  • Kevin Murphy, Machine Learning: A Probabilistic Perspective.

Image Segmentation:

In this lecture I will give an overview of the progress in the area of semantic segmentation, covering top-down and bottom-up approaches as well as different methodologies based on variational approaches and graph cuts. I will also review our group’s recent work on semantic visual interpretation based on image segmentation techniques. In contrast to existing bag-of-words or regular-grid description methods that bypass image segmentation entirely, and unlike methods that segment images and recognize objects by detecting known object parts or fusing superpixel maps by means of random field models, we will explore interpretation strategies based on multiple figure-ground segmentations. Central to our approach is a combinatorial parametric max flow methodology (CPMC) that can explore, exactly, a large space of object layout hypotheses constrained at different image locations and spatial scales, in polynomial time. Once a potentially large ensemble of such hypotheses is obtained, we show that it is possible to distill and diversify a pool of a few hundred elements, at minimal loss of accuracy, by training category-independent models to predict how well each segment hypothesis exhibits real-world regularities based on mid-level properties like boundary smoothness, Euler number or convexity. I will show that such a simple combinatorial strategy operating on only low-level and mid-level features can generate segments that cover entire objects or parts in images with high probability and good accuracy, as empirically measured on most existing segmentation benchmarks. Moreover, the figure-ground segment pool can now be used within a sliding-segment (as opposed to sliding-window) strategy, or compositionally, and in conjunction with second-order pooled region descriptions, for object detection, semantic segmentation, video processing or monocular 3D human pose reconstruction.
A proof of concept system based on such principles has been demonstrated in the PASCAL VOC semantic segmentation challenge where it was top-ranked over the past four editions.

Cristian Sminchisescu
Lund University


  • J. Carreira and C. Sminchisescu. CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.
  • F. Li, J. Carreira, and C. Sminchisescu. Object Recognition by Sequential Figure-Ground Ranking. In International Journal of Computer Vision, 2012.
  • C. Ionescu, F. Li, and C. Sminchisescu. Latent Structured Models for Human Pose Estimation. In IEEE International Conference on Computer Vision, November 2011.
  • A. Ion, J. Carreira, and C. Sminchisescu. Probabilistic Joint Image Segmentation and Labeling. In Advances in Neural Information Processing Systems, December 2011.
  • J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic Segmentation with Second-Order Pooling. In European Conference on Computer Vision, October 2012.

Feature Detectors:

I will give an overview of methods used in Computer Vision to extract and describe image features, from the ad hoc methods that were originally proposed, to the more modern methods based on Machine Learning techniques.

Professor Vincent Lepetit


  • David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
  • Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool, "SURF: Speeded Up Robust Features", Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346-359, 2008.
  • Jan Sochman, Jiri Matas. WaldBoost, Learning for Time Constrained Sequential Detection. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
  • T. Trzcinski, M. Christoudias, P. Fua, and V. Lepetit, Boosting Binary Keypoint Descriptors. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2013.


I will give a general introduction to user-centered design, and will focus on some elements of intelligent interaction that make the creation of effective interactive systems a challenge. I will mix practical work with lectures and other materials, so that participants gain experience in designing a system and learn how to understand user requirements.

Professor Russell Beale
School of Computer Science, University of Birmingham

RGB-D cameras:

The advent of the Microsoft Kinect and other RGB-D sensors has resulted in great progress in dense mapping, object recognition and SLAM in recent years. Given the low cost of the sensor coupled with the high resolution visual and depth information provided at video frame rate, methods relying on RGB-D sensors are becoming more popular in tackling some of the key perception problems in robotics and computer vision. This course will feature an overview of many of the recent advances in RGB-D camera-based research, including methods which specifically exploit the wide-scale availability of general-purpose computing on graphics processing units. The following topics will be covered in the course:

  • Sensor technology and calibration
  • Dense tracking & GPGPU approaches
  • 3D reconstruction
  • Large scale dense SLAM
  • Applications of RGB-D data
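
As a small illustration of how calibrated depth data feeds into 3D reconstruction, the sketch below back-projects depth pixels into 3D camera coordinates using the standard pinhole camera model. It is not taken from the course material: the function names are hypothetical, and the intrinsic parameters (fx, fy, cx, cy) are placeholder values that would in practice come from calibrating the actual sensor.

```python
def back_project(u, v, depth_m, fx, fy, cx, cy):
    """Map pixel (u, v) with metric depth to a 3D point in the camera frame.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def depth_image_to_points(depth, fx, fy, cx, cy):
    """Convert a depth image (list of rows, metres) into a list of 3D points.

    Assumption of this sketch: a depth of 0 marks a missing measurement.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d > 0:
                points.append(back_project(u, v, d, fx, fy, cx, cy))
    return points


# Placeholder intrinsics (assumed for illustration, not a calibration result).
FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5

tiny_depth = [
    [0.0, 1.5],
    [2.0, 0.0],
]
cloud = depth_image_to_points(tiny_depth, FX, FY, CX, CY)
```

Only the two pixels with valid depth produce points; a dense pipeline would apply the same mapping to every pixel of every frame, typically on the GPU.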

Thomas Whelan
Computer Vision Group, National University of Ireland Maynooth


  • "Kintinuous: Spatially Extended KinectFusion" by T. Whelan, M. Kaess, M.F. Fallon, H. Johannsson, J.J. Leonard and J.B. McDonald. In RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, (Sydney, Australia), July 2012.

  • "Robust Real-Time Visual Odometry for Dense RGB-D Mapping" by T. Whelan, H. Johannsson, M. Kaess, J.J. Leonard, and J.B. McDonald. In IEEE Intl. Conf. on Robotics and Automation, ICRA, (Karlsruhe, Germany), May 2013.

  • "Deformation-based Loop Closure for Large Scale Dense RGB-D SLAM" by T. Whelan, M. Kaess, J.J. Leonard, and J.B. McDonald. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, IROS, (Tokyo, Japan), November 2013.


  • Introduction to Biometric Recognition
    • Biometric Traits: Comparison and Critical Review
  • A Biometric Recognition System from the Pattern Recognition Perspective
    • Data Acquisition
    • Object detection
    • Object Segmentation
    • Feature Encoding
      • Gabor Filters
      • Multi Lobe Differential Filters
    • Matching
  • Biometric Systems Performance
    • Inter-class and Intra-class Variability
    • Performance Measures
      • Receiver Operating Characteristic Curves
      • Detection-Error Trade-off Curves
      • Area Under Curve
      • Equal Error Rate
      • False Rejection, given False Acceptance rate
  • Multimodal Biometrics
    • Fusion at Different Levels: Data, Features, Scores and Responses
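
The performance measures listed above can be sketched from two sets of matching scores. The example below is a minimal illustration, not course material: it assumes higher scores mean a better match, that a comparison is accepted when its score reaches the threshold, and it uses made-up scores and hypothetical function names. It computes the false acceptance rate (FAR), the false rejection rate (FRR), an approximate equal error rate (EER), and the FRR at a given FAR.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold.

    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def frr_at_far(genuine, impostor, max_far):
    """Lowest FRR among observed thresholds whose FAR does not exceed max_far."""
    thresholds = sorted(set(genuine) | set(impostor))
    rates = [far_frr(genuine, impostor, t) for t in thresholds]
    feasible = [frr for far, frr in rates if far <= max_far]
    # If no observed threshold is strict enough, rejecting everyone gives FRR = 1.
    return min(feasible) if feasible else 1.0


# Made-up similarity scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.5, 0.45, 0.3, 0.2, 0.1]

eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)
frr_zero_far = frr_at_far(genuine_scores, impostor_scores, max_far=0.0)
```

Sweeping the threshold and plotting FAR against 1 - FRR gives the ROC curve; plotting FAR against FRR gives the detection-error trade-off curve.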

Professor Hugo Proença
Department of Computer Science, University of Beira Interior


  • Hugo Proença; Iris Biometrics: Indexing and Retrieving Heavily Degraded Data, IEEE Transactions on Information Forensics and Security, volume 8, issue 12, pp. 1975-1985, ISSN 1556-6013, Digital Object Identifier 10.1109/TIFS.2013.2283458, 2013.
  • Hugo Proença, Luís A. Alexandre; Toward Covert Iris Biometric Recognition: Experimental Results From the NICE Contests, IEEE Transactions on Information Forensics and Security, volume 7, issue 2, pp. 798-808, ISSN 1556-6013, Digital Object Identifier 10.1109/TIFS.2011.2177659.


One of the major goals of image processing and computer vision is to extract meaningful high-level information from image data. Image classification can serve a variety of purposes, including the labelling of multi-spectral image pixels to produce thematic maps. This has been a major topic in processing Earth Observation Satellite (EOS) images since the 1970s.
In this talk, a brief overview of the various image classification approaches will be given. The evolution of EOS images will also be addressed, from the historic low-resolution multi-spectral images to the hyperspectral and very high spatial resolution datasets now available. The challenges posed by current and planned image datasets are huge, for a variety of reasons including the hyper-dimensionality and the sheer volume of the data itself. These issues will be addressed together with some of the most promising techniques to handle the image classification task.
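
To make the idea of labelling multi-spectral pixels concrete, the sketch below uses a minimum-distance-to-mean classifier, one simple supervised approach chosen here purely for illustration (the talk itself may cover different methods). The band values, class names and function names are all made up.

```python
def class_means(samples):
    """Compute the mean spectral vector of each class.

    samples: dict mapping a class label to a list of per-pixel band tuples.
    """
    means = {}
    for label, pixels in samples.items():
        n = len(pixels)
        means[label] = [sum(band) / n for band in zip(*pixels)]
    return means


def classify_pixel(pixel, means):
    """Assign the pixel to the class whose mean is nearest (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(means, key=lambda label: sq_dist(pixel, means[label]))


def thematic_map(image, means):
    """Label every pixel of an image given as rows of band tuples."""
    return [[classify_pixel(p, means) for p in row] for row in image]


# Made-up two-band reflectances (for example red and near-infrared).
training = {
    "water": [(0.05, 0.03), (0.07, 0.05)],
    "vegetation": [(0.04, 0.50), (0.06, 0.40)],
}
image = [[(0.06, 0.04), (0.05, 0.45)]]

labels = thematic_map(image, class_means(training))
```

With hyperspectral data each pixel carries hundreds of bands rather than two, which is where the dimensionality and data-volume issues mentioned above start to matter.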

Professor André Marçal
INESC TEC, Faculdade de Ciências, Universidade do Porto