VISUM: 3rd Edition


Randomised Decision Forests and Tree-structured Algorithms in Computer Vision

Many computer vision tasks can be cast as large-scale classification problems, which call for extremely efficient and powerful classification methods to achieve real-time performance. Randomised Decision Forests are an emerging technique in the field and have been highly successful in various real-time vision applications. The technique is rooted in tree and ensemble learning. A hierarchical tree structure yields many short paths, reducing evaluation time, while ensemble learning with randomisation ensures smooth decision regions for good generalisation to unseen data. Boosting, on the other hand, is a representative ensemble learning technique that has been a standard method for computationally demanding tasks, e.g. object detection. A boosting algorithm with simple weak learners can be seen as a flat structure, while many of its developments, including the Boosting cascade for time-efficient classification, can be seen as a tree structure. In this talk, we review Randomised Decision Forests and tree-structured methods with comparative and insightful discussions. Following the concepts and principles, their various applications are also presented. More information is found at

Professor Tae-Kyun Kim
Intelligent Systems and Networks Group, Imperial College London


  • T-K. Kim and R. Cipolla, MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features, In Advances in Neural Information Processing Systems (NIPS), Vancouver, Canada, 2008.
  • T-K. Kim, I. Budvytis, R. Cipolla, Making a Shallow Network Deep: Conversion of a Boosting Classifier into a Decision Tree by Boolean Optimisation, Int. Journal of Computer Vision, 100(2):203-215, 2012.
  • D. Tang, T.H. Yu and T-K. Kim, Real-time Articulated Hand Pose Estimation using Semi-supervised Transductive Regression Forests, Proc. of IEEE Int. Conf. on Computer Vision (ICCV), Sydney, Australia, 2013.
  • X. Zhao, T-K. Kim, W. Luo, Unified Face Analysis by Iterative Multi-Output Random Forests, Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Columbus, Ohio, USA, 2014.

Local Features Extraction and Description

Local features have been exploited in competitive or state-of-the-art approaches addressing a range of problems such as multi-view matching, 3D reconstruction, augmented reality, SLAM and texture recognition. In general, each approach places different requirements on the properties of the features - the speed of extraction, location accuracy, robustness to illumination changes, covariance to a certain group of geometric transformations, etc.

Local features will be reviewed from this perspective, focusing on methods that provide the best trade-off for certain classes of problems and applications.

Professor Jiri Matas
Center for Machine Perception, Czech Technical University, Prague

Document Image Analysis

Document Image Analysis and Recognition (DIAR) is an important field in Pattern Recognition that aims at the automatic analysis of the contents of document images, towards their recognition and understanding. Traditionally, DIAR has focused on the analysis of scanned document images, and has been instrumental in the development of key technologies such as Optical Character Recognition (OCR) engines, and in the introduction of key pattern recognition and machine learning concepts. In the last decades, the discipline has grown exponentially, extending to camera-based DIAR, on-line pen-based interfaces and contemporary text containers other than paper documents (real scenes, digital-born images, etc.). In parallel, DIAR has become a cornerstone technology for the preservation of cultural heritage, especially in Europe. The DIAR field is represented in the International Association for Pattern Recognition through two Technical Committees: TC10 (“Graphics Recognition”) and TC11 (“Reading Systems”). The main conference of the field is the biennial International Conference on Document Analysis and Recognition (ICDAR), which attracts about 400 participants.

The lecture will cover the typical DIAR processes, including document enhancement, layout analysis, Optical Character Recognition, handwriting recognition, document classification, information spotting and retrieval, graphics recognition and writer identification.

Doctor Alicia Fornes
Computer Vision Center, Universitat Autònoma de Barcelona


3D Scene understanding


Professor Martial Hebert
The Robotics Institute, School of Computer Science, Carnegie Mellon University

Automatic Facial Expression Recognition - A Practical Introduction

Facial Expression Recognition has reached a state where it can make its first tentative steps out in the wild, the most notable example being the smile detection on digital cameras. In this hands-on tutorial we will give a practical introduction to facial expression analysis, guiding you through a number of essential steps resulting in a working version of a smile detector. Resources for the tutorial can be downloaded from here.

Professor Michel Valstar
University of Nottingham

RGB-D Cameras

The advent of the Microsoft Kinect and other RGB-D sensors has resulted in great progress in dense mapping, object recognition and SLAM in recent years. Given the low cost of the sensor, coupled with the high-resolution visual and depth information provided at video frame rate, methods relying on RGB-D sensors are becoming more popular in tackling some of the key perception problems in robotics and computer vision. This course will feature an overview of many of the recent advances in RGB-D camera-based research, including methods which specifically exploit the wide-scale availability of general-purpose computing on graphics processing units. The following topics will be covered in the course:

  • Sensor technology and calibration
  • Dense tracking & GPGPU approaches
  • 3D reconstruction
  • Large scale dense SLAM
  • Applications of RGB-D data

Doctor Thomas Whelan
Dyson Robotics Lab, Imperial College London


  • "Kintinuous: Spatially Extended KinectFusion" by T. Whelan, M. Kaess, M.F. Fallon, H. Johannsson, J.J. Leonard and J.B. McDonald. In RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, (Sydney, Australia), July 2012.
  • "Robust Real-Time Visual Odometry for Dense RGB-D Mapping" by T. Whelan, H. Johannsson, M. Kaess, J.J. Leonard, and J.B. McDonald. In IEEE Intl. Conf. on Robotics and Automation, ICRA, (Karlsruhe, Germany), May 2013.
  • "Deformation-based Loop Closure for Large Scale Dense RGB-D SLAM" by T. Whelan, M. Kaess, J.J. Leonard, and J.B. McDonald. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, IROS, (Tokyo, Japan), November 2013.


João Paulo Costeira
Instituto de Sistemas e Robótica, Instituto Superior Técnico


Social Programme

  • Porto Tour
  • Palácio da Bolsa
  • Museu Romântico da Quinta da Macieirinha

Welcome drink

The welcome drink will take place at the Sheraton Hotel.


The VISUM 2015 dinner will be held at the Restaurant Além-Mar at Hotel Dom Henrique.