Satpal and Sarawagi PKDD 2007
Citation
Satpal, S. and Sarawagi, S. Domain adaptation of conditional probability models via feature subsetting. Proceedings of PKDD’07 (2007).
Online Version

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.99.8784&rep=rep1&type=pdf
Summary
This paper introduces a method for transfer learning that encourages a model trained on one domain to rely on the features it shares with a target domain. This is useful when training data is abundant in one domain (such as newswire articles) but scarce in the domain of interest (such as blogs).
The key challenge in transfer learning is reconciling the differences between the source and target distributions. This method addresses it by searching for the subset of features, present in both domains, whose expected values are closest across the two. For instance, when recognizing names of people, capitalization might seem a useful, perhaps even essential, feature, but it is unreliable in informal blog text and should therefore be ignored.
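As a toy illustration of that idea (an invented example, not from the paper), one can compare each feature's empirical expectation across the two domains and discard the features that disagree strongly; the hypothetical capitalization feature below is the one that gets dropped:

 # Toy example: hard feature subsetting by comparing per-feature expectations across domains.
 import numpy as np
 
 # Hypothetical binary features: [is_capitalized, in_name_gazetteer, preceded_by_title]
 X_news = np.array([[1, 1, 0], [1, 0, 1], [1, 1, 1]])   # newswire tokens
 X_blog = np.array([[0, 1, 0], [0, 0, 1], [1, 1, 1]])   # blog tokens: capitalization is unreliable
 
 divergence = np.abs(X_news.mean(axis=0) - X_blog.mean(axis=0))
 keep = divergence < 0.5            # keep only features whose expectations roughly agree
 print(keep)                        # [False  True  True] -- the capitalization feature is dropped

In practice a hard threshold like this would be brittle, which motivates the soft version described next.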
Selecting the best feature subset is accomplished not by exploring the power set of all feature combinations but by converting the problem into a soft selection: features whose expectations diverge greatly between the two domains are strongly down-weighted. The formulation is quite similar to regularizing a standard CRF with a Gaussian prior whose per-feature variance depends on how differently that feature behaves across domains.
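A minimal sketch of such a divergence-weighted penalty (an illustrative simplification, not the authors' implementation: a binary logistic-regression stand-in for the CRF, divergence computed from raw empirical feature means rather than model expectations, and gamma an invented hyperparameter):

 # Sketch: source-domain log-loss plus a per-feature penalty that grows with
 # that feature's divergence between source and target domains.
 import numpy as np
 
 def weighted_l2_objective(w, X_src, y_src, X_tgt, gamma=1.0):
     div = (X_src.mean(axis=0) - X_tgt.mean(axis=0)) ** 2    # per-feature divergence
     penalty = gamma * div                                    # high divergence -> small prior variance -> strong shrinkage
     logits = X_src @ w
     nll = np.mean(np.logaddexp(0.0, logits) - y_src * logits)   # cross-entropy, labels y_src in {0, 1}
     return nll + np.sum(penalty * w ** 2)
 
 # e.g. w_opt = scipy.optimize.minimize(weighted_l2_objective, np.zeros(d),
 #                                      args=(X_src, y_src, X_tgt)).x

Features that behave similarly in both domains are penalized only lightly, while strongly diverging features are shrunk towards zero, which is the "soft" counterpart of dropping them from the subset.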
The model can be trained with standard optimization approaches, but the objective requires special treatment: it is non-convex, and its exact gradient takes quadratic time to evaluate. The authors work around this with a nested iterative scheme, holding some expectations constant during each inner optimization. The method is tested on several combinations of training and target domains and is shown to improve over unadapted models.
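A rough sketch of that nested scheme (again a logistic-regression stand-in rather than a CRF; for simplicity both domains' model expectations are frozen during the inner loop, and all constants are invented):

 # Sketch: the outer loop recomputes model expectations, the inner loop runs ordinary
 # regularized gradient descent with those expectations held constant.
 import numpy as np
 
 def sigmoid(z):
     return 1.0 / (1.0 + np.exp(-z))
 
 def train_nested(X_src, y_src, X_tgt, lam=0.1, outer_iters=5, inner_iters=200, lr=0.1):
     w = np.zeros(X_src.shape[1])                # y_src has labels in {0, 1}
     for _ in range(outer_iters):
         # Outer step: expected feature values under the current model, then frozen.
         src_expect = (X_src * sigmoid(X_src @ w)[:, None]).mean(axis=0)
         tgt_expect = (X_tgt * sigmoid(X_tgt @ w)[:, None]).mean(axis=0)
         penalty = lam * (src_expect - tgt_expect) ** 2
         for _ in range(inner_iters):
             # Inner step: with the penalty weights fixed, this is a standard
             # regularized logistic-regression gradient step.
             p_src = sigmoid(X_src @ w)
             grad = X_src.T @ (p_src - y_src) / len(y_src) + 2 * penalty * w
             w -= lr * grad
     return w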
Related Papers
The authors compare their method to Blitzer, EMNLP 2006 (structural correspondence learning) and find that it performs better on the majority of tasks. The two methods are orthogonal, however, and can be combined to yield even stronger performance.
The method resembles generalized expectation criteria (Mann, ACL 2008), which also use unlabeled data (though from the same domain) and constrain model expectations; there, however, the constraints are supplied by experts rather than derived from a source domain.
Do and Ng, ANIPS 2006 present a transfer learning approach that uses softmax regression to train a meta-learner effective across domains.