Satpal and Sarawagi PKDD 2007

Citation

Satpal, S. and Sarawagi, S. Domain adaptation of conditional probability models via feature subsetting. In Proceedings of PKDD 2007.

Online Version

[1]

Summary

This paper introduces a transfer learning method that encourages a model trained on one domain to rely on features it shares with a target domain. This is useful when training data is abundant in one domain (such as newswire articles) but scarce in the domain of interest (such as blogs).

The key challenge in transfer learning is reconciling the differences between the two domains' distributions. This method does so by searching for a subset of features, present in both domains, whose expected values are closest across them. For instance, when recognizing names of people, capitalization might seem a useful, perhaps even essential, feature, but it is unreliable in informal blog text and should be ignored.
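
To make this concrete, the following is a minimal sketch (hypothetical, not the paper's code) of measuring how far apart each feature's average value is in two domains; the function name and toy data are invented for illustration.

    import numpy as np

    def expectation_gap(src_feats, tgt_feats):
        """Squared gap between mean feature values in two domains.
        Rows are examples, columns are (binary or real-valued) features.
        A large gap flags a feature, such as capitalization, whose
        behavior shifts between, say, newswire and blog text."""
        return np.square(src_feats.mean(axis=0) - tgt_feats.mean(axis=0))

    # Toy usage: feature 1 fires far more often in the source domain.
    src = np.array([[1, 1], [1, 1], [0, 1]], dtype=float)
    tgt = np.array([[1, 0], [1, 0], [1, 1]], dtype=float)
    print(expectation_gap(src, tgt))  # [0.111..., 0.444...]: feature 1 diverges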

Selecting the best feature subset is accomplished not by exploring the power set of all feature combinations but by converting the problem into soft selection: features whose expectations diverge greatly between the two domains are strongly down-weighted. The formulation closely resembles regularizing a standard CRF with a Gaussian prior whose variance for each feature depends on how much that feature differs between domains.
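
As a hypothetical sketch of this soft selection, using logistic regression as a stand-in for the paper's CRF, the usual Gaussian (L2) penalty can be scaled per feature by the cross-domain expectation gap, so that divergent features receive a tight prior and are pushed toward zero:

    import numpy as np

    def penalized_loglik(w, X, y, gap, gamma=1.0):
        """Log-likelihood minus a per-feature Gaussian penalty.
        `gap` holds each feature's cross-domain expectation distance;
        a divergent feature gets a large penalty weight (i.e. a small
        prior variance) and is softly deselected. Logistic regression
        stands in here for the paper's conditional model."""
        p = 1.0 / (1.0 + np.exp(-X @ w))
        loglik = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
        return loglik - gamma * np.sum(gap * w ** 2)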

The model can be trained with standard optimization approaches, but the objective function requires special treatment: it is non-convex, and its gradient takes quadratic time to evaluate. The authors work around this with a nested iterative approach, holding some expectations constant during each inner iteration. The method is tested on several combinations of training and target domains and is shown to improve over unadapted models.
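
Continuing the logistic stand-in above, a hypothetical sketch of such a nested scheme: the outer loop recomputes the target-domain model expectations (and with them the per-feature penalties), and the inner loop runs plain gradient ascent with those quantities frozen, keeping each inner objective cheap to differentiate.

    import numpy as np

    def train_nested(X_src, y_src, X_tgt, n_outer=5, n_inner=100,
                     gamma=1.0, lr=0.01):
        """Outer loop: freeze the cross-domain expectation gap computed
        under the current model. Inner loop: gradient ascent on the
        penalized likelihood with that gap held constant."""
        w = np.zeros(X_src.shape[1])
        for _ in range(n_outer):
            # Empirical source expectations vs. model expectations on
            # the unlabeled target; both held fixed in the inner loop.
            p_tgt = 1.0 / (1.0 + np.exp(-X_tgt @ w))
            gap = np.square((X_src * y_src[:, None]).mean(axis=0)
                            - (X_tgt * p_tgt[:, None]).mean(axis=0))
            for _ in range(n_inner):
                p = 1.0 / (1.0 + np.exp(-X_src @ w))
                grad = X_src.T @ (y_src - p) - 2.0 * gamma * gap * w
                w += lr * grad
        return w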

Related Papers

The authors compare their method to structural correspondence learning (Blitzer, EMNLP 2006) and find that theirs performs better on the majority of the tasks. The two methods are orthogonal, however, and can be combined to yield even stronger performance.

The method bears resemblance to generalized expectation criteria (Mann, ACL 2008), which also use unlabeled data (though from the same domain) and constrain feature expectations; there, however, the constraints are supplied by experts rather than derived from a labeled source domain.

Do and Ng, ANIPS 2006 present a transfer learning approach that uses softmax regression to train a meta-learner effective across domains.