Kernel Dependency Estimation, Weston, Chapelle, Elisseeff, Schoelkopf, and Vapnik, 2003

From Cohen Courses
Revision as of 00:36, 1 December 2011 by Dkulkarn

Citation

Jason Weston, Olivier Chapelle, André Elisseeff, Bernhard Schölkopf, Vladimir Vapnik, Kernel Dependency Estimation, In NIPS (2002), pp. 873-880.

Online version

[1]

Summary

The paper introduces kernels on structured outputs. Traditionally, a structured prediction problem is defined as minimizing a risk functional with respect to a loss function on the outputs. This loss is a measure of similarity between outputs and can be viewed as an inner product in some Hilbert space. Applying the same intuition that is used on the input side, the loss function can be seen as inducing a kernelized representation of the output space (possibly of lower dimensionality).
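One standard way to make the inner-product view of the loss concrete: if the output kernel $k$ induces a feature map $\Phi$, the squared feature-space distance is a loss that expands entirely in kernel evaluations:

\[
L(y, y') \;=\; \|\Phi(y) - \Phi(y')\|^2 \;=\; k(y, y) \;-\; 2\,k(y, y') \;+\; k(y', y').
\]

So any kernel on the outputs yields a loss, and conversely a loss of this form can be manipulated through its kernel alone.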

The paper presents this idea as a framework called Kernel Dependency Estimation (KDE). The framework is evaluated on three tasks, and the paper concludes with a discussion.

Method

The method is divided into three steps.

  1. Decompose the outputs: Construct a kernel matrix over the training outputs and perform kernel PCA on it. The eigenvectors corresponding to the top n eigenvalues give the principal directions, and the outputs are projected onto this n-dimensional subspace.
  2. Learn a map: The output is estimated in this n-dimensional space using multivariate regression on the inputs.
  3. Solve the pre-image: The intended structured output is the pre-image, under the decomposition of step 1, of the estimate from step 2. This is known as the pre-image problem.
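The three steps above can be sketched numerically. This is a minimal toy version (hypothetical data, RBF kernels on both sides, kernel ridge regression as the multivariate regressor, and nearest-training-output as the pre-image), not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: inputs X and outputs Y represented as vectors.
X = rng.normal(size=(50, 5))
Y = np.hstack([np.sin(X[:, :2]), X[:, :1] ** 2])  # outputs depend on inputs

def rbf(A, B, gamma=0.5):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

Kx, Ky = rbf(X, X), rbf(Y, Y)
m = len(Y)

# Step 1: kernel PCA on the output kernel -- center the Gram matrix,
# eigendecompose, keep the top-n components as output coordinates.
H = np.eye(m) - np.ones((m, m)) / m
Kc = H @ Ky @ H
vals, vecs = np.linalg.eigh(Kc)          # ascending eigenvalues
n = 3
alphas = vecs[:, -n:] / np.sqrt(vals[-n:])
Z = Kc @ alphas                          # n-dim projection of each output

# Step 2: kernel ridge regression from inputs to each output component.
lam = 1e-2
W = np.linalg.solve(Kx + lam * np.eye(m), Z)   # dual coefficients

# Step 3: pre-image by returning the training output whose projection
# is closest to the regressed coordinates.
def predict(x_new):
    z = rbf(x_new[None, :], X) @ W             # estimated coordinates
    idx = np.argmin(((Z - z) ** 2).sum(-1))    # nearest training projection
    return Y[idx]

y_hat = predict(X[0])
```

With this nearest-neighbor pre-image, the prediction is always one of the training outputs, which mirrors how the toy experiments below resolve the pre-image.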

Testing

The method is tested on three tasks:

  1. Toy problem (string classification): Strings from three classes over the alphabet {a, b, c, d} are generated with random length (10-15); the output for each class is a string corrupted with noise. The outputs use the string subsequence kernel and the inputs a non-linear map via an RBF kernel. The pre-image of the estimated output is taken to be the closest training-example output. The method is compared with k-nearest neighbors (k-NN); KDE performs significantly better (with respect to standard error).
  2. Multi-class classification: The USPS handwritten digit database, using an RBF kernel on the inputs and the zero-one multi-class loss on the outputs. Compared with k-NN and one-vs-rest SVM.
  3. Image reconstruction: Reconstructing the bottom half of a handwritten digit given the top half. The kernel used is an RBF whose width is estimated by maximizing kernel alignment with a target kernel generated via k-means clustering (30 clusters). The pre-image problem is solved by choosing the closest training example.
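The kernel-alignment width selection used in the image reconstruction task can be sketched as follows. The data, candidate widths, and cluster labels here are hypothetical stand-ins for the paper's k-means-derived target kernel:

```python
import numpy as np

def alignment(K1, K2):
    """Empirical kernel alignment: normalized Frobenius inner product."""
    num = (K1 * K2).sum()
    return num / np.sqrt((K1 * K1).sum() * (K2 * K2).sum())

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 8))

# Hypothetical target kernel: 1 if two points share a cluster label, else 0
# (standing in for assignments produced by k-means in the paper).
labels = rng.integers(0, 3, size=40)
K_target = (labels[:, None] == labels[None, :]).astype(float)

def rbf(X, sigma):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

# Pick the RBF width maximizing alignment with the target kernel.
widths = [0.1, 0.5, 1.0, 2.0, 5.0]
best = max(widths, key=lambda s: alignment(rbf(X, s), K_target))
```

Alignment lies in [0, 1] here by Cauchy-Schwarz, so maximizing it over a grid of widths is a simple, label-driven model-selection criterion.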

Pre-image problem

One of the major issues with this method is the pre-image problem. Although the authors list several methods for solving it, they concede that complex problems require improved pre-image techniques.
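When the estimated point lies in the span of training outputs, searching a candidate set for the pre-image needs only kernel evaluations, because the squared feature-space distance expands via the kernel trick (the term depending only on the estimate is constant in the argmin). A minimal sketch with a hypothetical output kernel and candidate set:

```python
import numpy as np

def preimage_nn(k, candidates, beta, anchors):
    """Return the candidate y minimizing ||Phi(y) - sum_i beta_i Phi(anchor_i)||^2,
    expanded with the kernel trick; the candidate-independent term is dropped."""
    best, best_d = None, np.inf
    for y in candidates:
        d = k(y, y) - 2 * sum(b * k(y, a) for b, a in zip(beta, anchors))
        if d < best_d:
            best, best_d = y, d
    return best

# Hypothetical setup: vector-valued outputs with an RBF output kernel.
k = lambda u, v: float(np.exp(-np.sum((np.asarray(u) - np.asarray(v)) ** 2)))
anchors = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
beta = [0.9, 0.1]                 # estimate sits mostly on the first anchor
candidates = [np.array([0.1, 0.0]), np.array([1.0, 0.9])]
y_star = preimage_nn(k, candidates, beta, anchors)
```

Restricting candidates to the training outputs recovers the simple pre-image used in the experiments; richer output spaces need the improved search methods the authors allude to.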