Vlado Menkovski bio photo

Vlado Menkovski

Assistant Professor at Eindhoven University of Technology.

Email Google Scholar

Many real-world applications present data in the form of high dimensional sequences. Typical examples are speech recognition, handwriting recognition, and sequence transduction [1,2].

However, in these tasks, the formulation of the problem vary in consequential ways. The input can be from a fixed set of symbols or from a fixed vocabulary (as is typically in natural language processing tasks), or the input can be a high-frequency sample from a sensor as in speech recognition. The target sequence can also be specified in a number of ways. The target can be a sequence of different length and of a different character. In speech recognition, the input microphone recordings need to be mapped to words or phonemes. In this case, we do not necessarily need to align both sequences, we are just aiming to get the correct output sequence. A particular type of sequence classification is the aligned sequence classification or sequence transduction where we need to precisely determine a sequence of classification decisions for each point in the input sequence. This type of classification is analogous to the image segmentation task in image analysis, where for each pixel in the image we need to associate a class.

The overall goal of this project is to develop a model for aligned sequence classification.

The topic can be studied on a number of datasets such as:

http://www.fki.inf.unibe.ch/databases/iam-handwriting-database

https://edwin-de-jong.github.io/blog/mnist-sequence-data/

This overall goal can be broken up in specific research questions such as:

  • What are the limitations of CNN-RNN models for aligned sequence classifications?
  • Does attention mechanisms provide additional benefits in this context?
  • What are the advantages and disadvantages of using a CTC[3] loss function in this context?
  • What are the advantages/disadvantages of using a sequence to sequence model?
  • What are the advantages of the Neural Transducer model[2]?
  • Can autoregressive models such as Wavenet or transformer architecture be adapted to this task? What would be the advantages/disadvantages of such an approach?
  • Generative models, particularly latent variable models, would allow for the further utility of the models such as conditional generation and likelihood estimation. What are the possibilities to develop generative models such as sequential variational autoencoders [5] in this context?

Further context. Typically, an aligned sequence classification task proposes a supervised setting of {x, y} pairs, where x and y are aligned sequences (see example problem: [4]). However, in more realistic settings, you may consider scenarios, where there is a significantly much less unlabelled data {x} available and only a restricted amount of labeled data {x, y} available. Another realistic scenario is that there may be noise in the labels, both in the actual label sequence as well in the alignment with the input. You can furthermore, consider scenarios where there may be additional expert knowledge available about the statistical properties of the output sequence that may be valuable to achieve better performance. Such the target sequence {y} may represent a state of a system for which we have some knowledge e.g. the starting state and/or the ending state as well as the probability of transitions given previous states. This information may be available or may need to be computed from the available data.

Supervision: Vlado Menkovski

[1] Graves, Alex. “Sequence transduction with recurrent neural networks.” arXiv preprint arXiv:1211.3711 (2012). link

[2] Jaitly, Navdeep, et al. “A neural transducer.” arXiv preprint arXiv:1511.04868 (2015). link

[3] Graves, Alex, et al. “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks.” Proceedings of the 23rd international conference on Machine learning. ACM, 2006. link

[4] F. Matos, V. Menkovski, F. Felici, A. Pau, F. Jenko (the TCV Team and the EUROfusion MST1 Team). “Classification of tokamak plasma confinement states with convolutional recurrent neural networks” link

[5] Fraccaro, Marco, et al. “Sequential neural models with stochastic layers.” Advances in neural information processing systems. 2016. link