Search results

From Cohen Courses
  • To summarize this part of the course, we looked at four algorithms in some detail. ...e Bayes by starting with strong independence assumptions, which lead to an optimization problem that you can solve in closed form. This gives us a super-fast meth
    1 KB (225 words) - 16:52, 30 July 2013
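The snippet above notes that Naive Bayes's strong independence assumptions yield an optimization problem solvable in closed form: training reduces to smoothed counting, with no iterative optimizer. A minimal sketch of that idea (function and variable names are hypothetical, not from the course page):

```python
from collections import Counter, defaultdict
import math

def train_nb(docs, labels, alpha=1.0):
    """Closed-form Naive Bayes training: the MAP parameters are just
    smoothed counts, so no iterative optimization is needed."""
    label_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    vocab = {w for c in word_counts.values() for w in c}
    priors = {y: math.log(n / len(labels)) for y, n in label_counts.items()}
    likelihoods = {
        y: {w: math.log((word_counts[y][w] + alpha) /
                        (sum(word_counts[y].values()) + alpha * len(vocab)))
            for w in vocab}
        for y in label_counts}
    return priors, likelihoods, vocab

def predict(doc, priors, likelihoods, vocab):
    """Pick the label maximizing log-prior plus summed log-likelihoods."""
    return max(priors, key=lambda y: priors[y] +
               sum(likelihoods[y][w] for w in doc if w in vocab))

# Toy corpus (made up for illustration).
docs = [["spam", "money"], ["money", "now"], ["hi", "friend"], ["friend", "lunch"]]
labels = ["spam", "spam", "ham", "ham"]
priors, likelihoods, vocab = train_nb(docs, labels)
```

The "super-fast" claim in the snippet follows directly: training is a single pass over the data.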
  • * [http://dl.acm.org/citation.cfm?id=743935 Ensemble Methods in Machine Learning], Tom Dietterich .../papers/IntroToBoosting.pdf A Short Introduction to Boosting], Yoav Freund and Robert Schapire.
    1,006 bytes (139 words) - 10:18, 12 January 2016
  • == Citation and Online Link == ...e graph. In addition, the optimization problem during training is easier, and decoding is identical to normal CRFs.
    2 KB (327 words) - 03:08, 11 October 2011
  • * [http://dl.acm.org/citation.cfm?id=743935 Ensemble Methods in Machine Learning], Tom Dietterich .../papers/IntroToBoosting.pdf A Short Introduction to Boosting], Yoav Freund and Robert Schapire.
    1 KB (181 words) - 16:45, 6 January 2016
  • ...sochantaridis05a/tsochantaridis05a.pdf Large Margin Methods for Structured and Interdependent Output Variables]. Journal of Machine Learning Research 6:14 ...ith 2011]; also, A.5 (in the appendix) discusses "aggressive" optimization algorithms
    2 KB (261 words) - 19:23, 28 September 2011
  • * [http://dl.acm.org/citation.cfm?id=743935 Ensemble Methods in Machine Learning], Tom Dietterich .../papers/IntroToBoosting.pdf A Short Introduction to Boosting], Yoav Freund and Robert Schapire.
    1 KB (188 words) - 17:36, 21 July 2014
  • {{MyCiteconference | booktitle = Journal of Optimization Theory and Applications | coauthors = | date = 1985| first = D.F.| last = Shanno | tit ...ginal (non-limited memory variant) algorithm: Broyden, Fletcher, Goldfarb, and Shanno. Here is a picture of all of them:
    5 KB (788 words) - 17:55, 31 October 2011
  • Sha, F. and Pereira, F. 2003. Shallow Parsing with Conditional Random Fields. In Procee ...hor presents the comparison results with previous methods on both accuracy and time efficiency.
    4 KB (576 words) - 22:36, 30 November 2010
  • == Citation and Link == ...iel Marcu. 2005. Learning as search optimization: approximate large margin methods for structured prediction. In ''Proceedings of the 22nd international confe
    3 KB (468 words) - 03:08, 11 October 2011
  • Smith, Noah A. and Jason Eisner (2005). Contrastive estimation: Training log-linear models on [http://www.cs.jhu.edu/~jason/papers/smith+eisner.acl05.pdf Smith and Eisner 2005]
    6 KB (978 words) - 17:16, 13 October 2011
  • ...ation of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure", in Proceedings of AISTATS, 2011. ...AddressesProblem::Word Alignment]], [[AddressesProblem::Shallow Parsing]], and [[AddressesProblem::Constituent Parsing]]). The paper formulates the appro
    8 KB (1,106 words) - 02:23, 3 November 2011
  • ...eskovec, Kevin J. Lang, and Michael Mahoney. 2010. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th international c ...istics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest.
    6 KB (870 words) - 21:54, 26 September 2012
  • This is an optimization [[Category::method]], used in many algorithms such as [[AddressesProblem::Support Vector Machines]] or [[AddressesProblem Given a function to be minimized <math>F(\mathbf{x})</math> and a point <math>\mathbf{x}=\mathbf{a}</math>, let <math>\mathbf{b}=\gamma\nab
    3 KB (510 words) - 19:08, 30 September 2011
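The snippet above is cut off mid-formula, but it describes the standard gradient-descent update: from the current point x = a, step against the gradient, b = a − γ∇F(a). A minimal numeric sketch of that rule (the helper name and toy objective are illustrative, not from the wiki page):

```python
import numpy as np

def gradient_descent(grad_F, a, gamma=0.1, steps=200):
    """Minimize F by repeated steps against its gradient:
    x <- x - gamma * grad_F(x)."""
    x = np.asarray(a, dtype=float)
    for _ in range(steps):
        x = x - gamma * grad_F(x)
    return x

# Minimize F(x) = (x - 3)^2, whose gradient is 2(x - 3);
# the iterates converge toward the minimizer x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), a=[0.0])
```

The step size γ controls the trade-off between convergence speed and stability, which is why methods like the (L-)BFGS algorithms mentioned elsewhere in these results adapt the step using curvature information.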
  • Mark Hopkins and Jonathan May. 2011. Tuning as Ranking. In ''Proceedings of EMNLP-2011''. ...[RelatedPaper::Och, 2003]] as it is not limited to a handful of parameters and can easily handle systems with thousands of features. In addition, unlike r
    8 KB (1,132 words) - 20:40, 29 November 2011
  • ...d learn hyperplanes that maximize the distance separation between positive and negative instances. For cases where the training data isn't perfectly separ We want to choose the <math>{\mathbf{w}}</math> and <math>b</math> to maximize the margin, or distance between the parallel hyp
    3 KB (450 words) - 23:25, 31 March 2011
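The snippet above describes choosing w and b to maximize the margin between the parallel hyperplanes separating positive and negative instances, with slack for imperfectly separable data. A soft-margin linear SVM can be sketched via subgradient descent on the hinge-loss objective (this toy solver and its data are illustrative, not the method from the wiki page):

```python
import numpy as np

def svm_sgd(X, y, lam=0.01, lr=0.1, epochs=500):
    """Soft-margin linear SVM by subgradient descent on
    lam/2*||w||^2 + (1/n)*sum(max(0, 1 - y_i*(w.x_i + b))).
    Labels must be +1/-1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1          # margin violators
        w -= lr * (lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n)
        b -= lr * (-y[viol].sum() / n)
    return w, b

# Toy separable data: class +1 near (2, 2), class -1 near (-2, -2).
X = np.array([[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5]])
y = np.array([1, 1, -1, -1])
w, b = svm_sgd(X, y)
margin = 2 / np.linalg.norm(w)              # width between the two hyperplanes
```

The quantity 2/||w|| is the distance between the hyperplanes w·x + b = ±1, which is what the formulation in the snippet maximizes.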
  • ...edings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing'', pp. 523-530, Vancouver, October 2005. ...ature vector depending on the words <math>x_i</math> and <math>x_j</math>, and <math>\mathbf{w}</math> is a weight factor.
    6 KB (1,005 words) - 21:11, 22 November 2011
  • title = {Modularity and community structure in networks}, [http://www.pnas.org/content/103/23/8577.abstract Modularity and community structure in networks]
    14 KB (2,287 words) - 01:21, 4 October 2012
  • ...eeting for 10-605 2013 01 16|Review of probabilities, joint-distributions, and naive Bayes]] ...ing for 10-605 2013 01 23|Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.]]
    7 KB (1,005 words) - 17:20, 10 January 2014
  • * Lecture notes and/or slides will be (re)posted around the time of the lectures. ...nal research questions (eg parameter sensitivity), or reimplementations of methods based on their published description. An acceptable result might confirm s
    9 KB (1,220 words) - 12:06, 28 November 2017
  • Sparse Additive Generative Models of Text. Eisenstein, Ahmed and Xing. Proceedings of ICML 2011. ...proach to [[UsesMethod::Latent Dirichlet Allocation]] (LDA) where sparsity and log-space additive modeling are NOT considered or introduced.
    6 KB (796 words) - 22:55, 28 November 2011
