Selected Topics in Image Processing and Computer Vision (MRES.B.02.01)

Elias ZOIS

Description

Computer vision is one of the most exciting fields, combining concepts from data-driven machine learning with image processing. It appears in numerous applications, ranging from navigation (e.g., by any type of autonomous vehicle) to document analysis and understanding, mixed reality, and more. This course module covers selected topics in computer vision and pattern recognition.

Course Objectives/Goals

Upon successful completion of the course, students are expected to be able to:

  1. Identify basic concepts, terminology, theories, models and methods in the field of digital image processing and computer vision. 
  2. Describe basic computer vision methods related to multi-scale representation, edge detection, and the detection of other primitives for object recognition. 
  3. Suggest a design of a computer vision system for a specific problem. 
  4. Describe known principles of the human visual system by means of the Bag of Visual Words (BoW) model and sparse representation principles. 

Prerequisites/Prior Knowledge

Undergraduate coursework in digital signal and image processing, linear algebra, probability and statistics, and pattern recognition would be useful.

Furthermore, prospective students should have basic working knowledge of MATLAB and/or Python.

Assessment Methods

Student assessment is based on:

a) One mini-project regarding the use of typical image processing methods for object recognition.

b) One mini-project regarding the use of a bag-of-visual-words model for object recognition.

Bibliography

Textbooks

1. Computer Vision: A Modern Approach, D. Forsyth and J. Ponce, Prentice Hall, ISBN 978-0136085928.

2. Robot Vision, B. K. P. Horn, McGraw-Hill.

3. Digital Image Processing, 4th Edition, Rafael Gonzalez and Richard Woods, Pearson, ISBN 978-0133356724.

 

Reference Books

1. Richard Szeliski, Computer Vision: Algorithms and Applications (http://szeliski.org/Book/).

2. Multiple View Geometry in Computer Vision Second Edition, Richard Hartley and Andrew Zisserman, Cambridge University Press, March 2004 (https://www.robots.ox.ac.uk/~vgg/hzbook/).

3. Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Michael Elad, 2010 (https://elad.cs.technion.ac.il/publications/other/).

4. Machine Learning: A Bayesian and Optimization Perspective, 2nd Edition, Sergios Theodoridis, ISBN 978-0128188033.

 

Additional info

RESEARCH ARTICLES

1. An anthology of research papers offered by: 

A. Computer Vision Foundation (openaccess.thecvf.com/menu): research papers from top-tier conferences such as:

  • Computer Vision & Pattern Recognition (CVPR)
  • International Conference on Computer Vision (ICCV)
  • Winter Conference on Applications of Computer Vision (WACV)

B. European Computer Vision Association repository (www.ecva.net/papers.php): research papers from top-tier conferences such as:

  • European Conference on Computer Vision (ECCV)

C.  IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 

D. IEEE Signal Processing Society

 

2. Highly cited research papers.

  • Aharon, M., Elad, M. and Bruckstein, A., 2006. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), pp. 4311-4322.
  • Wright, J., Ma, Y., Mairal, J., Sapiro, G., Huang, T.S. and Yan, S., 2010. Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6), pp. 1031-1044.
  • LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444.
  • LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), pp. 2278-2324.
  • Fei-Fei, Li, Robert Fergus, and Pietro Perona. "One-shot learning of object categories." IEEE transactions on pattern analysis and machine intelligence 28, no. 4 (2006): 594-611.
  • Lowe, David G. "Distinctive image features from scale-invariant keypoints." International journal of computer vision 60 (2004): 91-110.
  • Perona, Pietro, and Jitendra Malik. "Scale-space and edge detection using anisotropic diffusion." IEEE Transactions on pattern analysis and machine intelligence 12, no. 7 (1990): 629-639.
  • Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
  • Olshausen, Bruno A., and David J. Field. "Emergence of simple-cell receptive field properties by learning a sparse code for natural images." Nature 381, no. 6583 (1996): 607-609.
  • Ng, Andrew, Michael Jordan, and Yair Weiss. "On spectral clustering: Analysis and an algorithm." Advances in neural information processing systems 14 (2001).
  • He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016.
  • Tuzel, O., Porikli, F., & Meer, P. (2008). Pedestrian detection via classification on Riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10), 1713-1727.
  • Deng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. "ImageNet: A large-scale hierarchical image database." In 2009 IEEE conference on computer vision and pattern recognition, pp. 248-255. IEEE, 2009.
  • Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories." In 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR'06), vol. 2, pp. 2169-2178. IEEE, 2006.
  • Jiang, Xingyu, Jiayi Ma, Guobao Xiao, Zhenfeng Shao, and Xiaojie Guo. "A review of multimodal image matching: Methods and applications." Information Fusion 73 (2021): 22-71.
  • Mikolajczyk, Krystian, Tinne Tuytelaars, Cordelia Schmid, Andrew Zisserman, Jiri Matas, Frederik Schaffalitzky, Timor Kadir, and L. Van Gool. "A comparison of affine region detectors." International journal of computer vision 65 (2005): 43-72.

 

TOOLS

VLFEAT: https://www.vlfeat.org/

SPAMS: SPArse Modeling Software 

TENSORFLOW: www.tensorflow.org/

OPENCV: opencv.org/

PyTorch: pytorch.org/

MANOPT: www.manopt.org/

 

WEBSITES

Google machine learning education: https://developers.google.com/machine-learning

Prof. M. Harandi website: https://sites.google.com/site/mehrtashharandi/ 

Prof. M. Elad website: https://elad.cs.technion.ac.il/

Prof. F. Porikli website: https://www.porikli.com/

Prof. A. Ng website: https://www.andrewng.org/

Prof. Y. LeCun website: http://yann.lecun.com/

Prof. A. Zisserman website: https://www.robots.ox.ac.uk/~az/

Prof. Fei-Fei Li website: http://vision.stanford.edu/

Stanford University CS231n (Deep Learning for Computer Vision): cs231n.stanford.edu/index.html

Units

Students can find sample evaluation material for this course module under "Documents" in the left-hand menu.

Topics of this unit include:

a) Image transforms: the 2D Fourier transform and the discrete cosine transform (DCT).

b) Image compression: the JPEG standard, a lossy compression method for digital images, particularly those produced by digital photography (see the Python sketch after this list).

c) Morphological transformations: Mathematical morphology (MM) is a theory and technique for the analysis and processing of geometrical structures, based on set theory, lattice theory, topology, and random functions. MM is most commonly applied to digital images, but it can also be employed on graphs, surface meshes, solids, and many other spatial structures.
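
As a concrete illustration of items (a) and (b) above, the following minimal Python sketch (assuming NumPy and SciPy are available) applies the 2D DCT to an 8x8 block and simulates the lossy effect of coefficient quantization. The constant quantization step of 16 is an illustrative simplification; the JPEG standard uses a full quantization table.

import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)  # stand-in for an 8x8 luminance block
coeffs = dctn(block - 128, norm="ortho")                  # 2D DCT-II, the transform used in the JPEG pipeline
quantized = np.round(coeffs / 16) * 16                    # crude uniform quantization (illustrative only)
recon = idctn(quantized, norm="ortho") + 128              # lossy reconstruction of the block
print(np.abs(recon - block).max())                        # distortion introduced by quantization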

Topics of this lecture mainly cover edge detection techniques (an OpenCV sketch in Python follows this list):

a) Lines, edges and ridges with the Sobel, Prewitt, Roberts and Canny operators.

b) Corner detection: the role of Hessian and the Harris operator.

c) Blob detection with Laplacian of Gaussian (LoG) and Difference of Gaussian (DoG). 
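
A minimal OpenCV sketch of these detectors is given below. The input file name is hypothetical, and the parameter values (thresholds, kernel sizes, Harris constant) are illustrative rather than tuned.

import cv2
import numpy as np

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)        # hypothetical grayscale input image
sobel_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)        # horizontal gradient via the Sobel operator
edges = cv2.Canny(img, 100, 200)                           # Canny edge map (two hysteresis thresholds)
harris = cv2.cornerHarris(np.float32(img), 2, 3, 0.04)     # Harris corner response (block size 2, aperture 3, k = 0.04)
blurred = cv2.GaussianBlur(img, (5, 5), 1.0)               # Gaussian smoothing (sigma = 1.0)
log = cv2.Laplacian(blurred, cv2.CV_64F)                   # Laplacian of Gaussian (LoG) blob response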

This lecture covers the identification and coding of an image through "regions of interest" (ROIs). This is achieved with keypoint detection and visual descriptor algorithms such as the popular Scale-Invariant Feature Transform (SIFT). In addition, we extend the use of SIFT by introducing the concept of local correspondence and the RANSAC algorithm.
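
The sketch below assumes OpenCV with SIFT support and hypothetical file names. It follows the pipeline described above: SIFT keypoints and descriptors, brute-force matching to establish local correspondences, and RANSAC-based homography estimation to reject outlier matches.

import cv2
import numpy as np

img1 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)       # hypothetical image pair
img2 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)              # keypoints and 128-D SIFT descriptors
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)  # local correspondences
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # RANSAC discards outlier matches (needs >= 4 matches)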

 

This topic mainly covers the image representation problem by means of the popular Bag of Visual Words (BoW) model and the dictionary learning (or "codebook") problem. The BoW model converts vector-represented image patches into "codewords", which in turn form a "codebook". A codeword can be considered a representative of several similar patches. One simple method is to perform k-means clustering over all descriptor vectors: the codewords are then defined as the centers of the learned clusters, and the number of clusters is the codebook size.
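
A minimal scikit-learn sketch of codebook construction and BoW encoding follows. The random vectors stand in for real local descriptors (e.g., 128-D SIFT) so that the example is self-contained, and the codebook size of 200 is an arbitrary choice.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((5000, 128))             # stand-in for local descriptors pooled from training images

codebook = KMeans(n_clusters=200, n_init=10, random_state=0).fit(descriptors)  # 200 visual words (cluster centers)

def bow_histogram(image_descriptors, codebook):
    """Encode one image as a normalized histogram of visual-word occurrences."""
    words = codebook.predict(image_descriptors)             # assign each local descriptor to its nearest codeword
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

print(bow_histogram(rng.standard_normal((300, 128)), codebook))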

Sparse representation (SR) is a neurologically inspired computer vision technique in which only a small number of visual elements (receptive fields) are activated. Sparse coding thus occurs when the image (object) under examination is encoded by the strong activation of a relatively small set of neurons, with a different subset of the available neurons activated for each image to be encoded.
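
The toy example below (NumPy and scikit-learn assumed) illustrates the principle: a signal synthesized from only five atoms of a random overcomplete dictionary is recovered by a greedy sparse coding method (Orthogonal Matching Pursuit, discussed below) that activates only a handful of coefficients.

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))                         # overcomplete dictionary: 64-dimensional signals, 256 atoms
D /= np.linalg.norm(D, axis=0)                             # unit-norm atoms
x_true = np.zeros(256)
active = rng.choice(256, size=5, replace=False)            # only five "neurons" (atoms) are active
x_true[active] = rng.standard_normal(5)
y = D @ x_true                                             # the observed signal (e.g., an image patch)

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, fit_intercept=False).fit(D, y)
print(sorted(np.nonzero(omp.coef_)[0]), sorted(active))    # recovered support vs. true support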

In computer vision, K-SVD is a dictionary (i.e., codebook) learning (DL) algorithm that creates a dictionary D employed in sparse representation coding. K-SVD merges the singular value decomposition (SVD) with a generalization of the k-means clustering method. The K-SVD algorithm has been applied to numerous problems such as image denoising and recognition, audio processing, biology, and document analysis.

K-SVD is typically accompanied by its Orthogonal Matching Pursuit (OMP) counterpart. OMP is a sparse coding algorithm that finds the "best matching" projections of multidimensional data onto the span of an overcomplete dictionary D.

Both K-SVD and OMP can be viewed as the two stages of a joint (alternating) optimization: OMP performs the sparse coding step for a fixed dictionary, while K-SVD updates the dictionary atoms.
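
A minimal, unoptimized sketch of this alternation is given below, assuming NumPy and scikit-learn's orthogonal_mp for the sparse coding stage. It is a didactic illustration rather than a production implementation (e.g., it does not replace unused or duplicate atoms).

import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms, sparsity, n_iter=10, seed=0):
    """Alternate OMP sparse coding with SVD-based, one-atom-at-a-time dictionary updates."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)                         # start from a random unit-norm dictionary
    for _ in range(n_iter):
        X = orthogonal_mp(D, Y, n_nonzero_coefs=sparsity)  # sparse coding step (OMP); X has shape (n_atoms, n_signals)
        for k in range(n_atoms):
            users = np.nonzero(X[k, :])[0]                 # signals that currently use atom k
            if users.size == 0:
                continue
            X[k, users] = 0.0
            E = Y[:, users] - D @ X[:, users]              # residual with atom k removed
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                              # rank-1 update: new atom
            X[k, users] = s[0] * Vt[0, :]                  # and its coefficients
    return D, X

# Example: learn 32 atoms with sparsity 3 from 500 random 16-dimensional signals.
Y = np.random.default_rng(1).standard_normal((16, 500))
D, X = ksvd(Y, n_atoms=32, sparsity=3)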
