Search results
From Cohen Courses
- To summarize this part of the course, we looked at four algorithms in some detail. ...e Bayes by starting with strong independence assumptions, which lead to an optimization problem that you can solve in closed form. This gives us a super-fast meth... 1 KB (225 words) - 15:52, 30 July 2013
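The "closed form" point above means Naive Bayes needs no iterative optimizer: maximum-likelihood parameters are just smoothed counts. A minimal sketch of that idea, with a made-up toy dataset (the documents and labels are invented for illustration, not from the wiki page):

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (words, label) pairs.
docs = [
    (["free", "win", "cash"], "spam"),
    (["win", "cash", "now"], "spam"),
    (["meeting", "notes", "attached"], "ham"),
    (["project", "meeting", "today"], "ham"),
]

# Closed-form MLE with add-one smoothing: parameters are counts,
# so "training" is a single pass with no iterative optimization.
label_counts = Counter(label for _, label in docs)
word_counts = defaultdict(Counter)
vocab = set()
for words, label in docs:
    word_counts[label].update(words)
    vocab.update(words)

def log_posterior(words, label):
    # log P(label) + sum of log P(word | label), with Laplace smoothing.
    lp = math.log(label_counts[label] / len(docs))
    total = sum(word_counts[label].values())
    for w in words:
        lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return lp

def classify(words):
    return max(label_counts, key=lambda y: log_posterior(words, y))

print(classify(["win", "cash"]))  # -> spam
```

The independence assumption is what makes the likelihood factor into per-word terms, each maximized by its own count ratio.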
- * [http://dl.acm.org/citation.cfm?id=743935 Ensemble Methods in Machine Learning], Tom Dietterich .../papers/IntroToBoosting.pdf A Short Introduction to Boosting], Yoav Freund and Robert Schapire. 1,006 bytes (139 words) - 09:18, 12 January 2016
- == Citation and Online Link == ...e graph. In addition, the optimization problem during training is easier, and decoding is identical to normal CRFs. 2 KB (327 words) - 02:08, 11 October 2011
- * [http://dl.acm.org/citation.cfm?id=743935 Ensemble Methods in Machine Learning], Tom Dietterich .../papers/IntroToBoosting.pdf A Short Introduction to Boosting], Yoav Freund and Robert Schapire. 1 KB (181 words) - 15:45, 6 January 2016
- ...sochantaridis05a/tsochantaridis05a.pdf Large Margin Methods for Structured and Interdependent Output Variables]. Journal of Machine Learning Research 6:14 ...ith 2011]; also, A.5 (in the appendix) discusses "aggressive" optimization algorithms. 2 KB (261 words) - 18:23, 28 September 2011
- * [http://dl.acm.org/citation.cfm?id=743935 Ensemble Methods in Machine Learning], Tom Dietterich .../papers/IntroToBoosting.pdf A Short Introduction to Boosting], Yoav Freund and Robert Schapire. 1 KB (188 words) - 16:36, 21 July 2014
- {{MyCiteconference | booktitle = Journal of Optimization Theory and Applications | coauthors = | date = 1985| first = D.F.| last = Shanno | tit ...ginal (non-limited memory variant) algorithm: Broyden, Fletcher, Goldfarb, and Shanno. Here is a picture of all of them: ... 5 KB (788 words) - 16:55, 31 October 2011
- Sha, F. and Pereira, F. 2003. Shallow Parsing with Conditional Random Fields. In Procee ...hor presents the comparison results with previous methods on both accuracy and time efficiency. 4 KB (576 words) - 21:36, 30 November 2010
- == Citation and Link == ...iel Marcu. 2005. Learning as search optimization: approximate large margin methods for structured prediction. In ''Proceedings of the 22nd international confe... 3 KB (468 words) - 02:08, 11 October 2011
- Smith, Noah A. and Jason Eisner (2005). Contrastive estimation: Training log-linear models on ... [http://www.cs.jhu.edu/~jason/papers/smith+eisner.acl05.pdf Smith and Eisner 2005] 6 KB (978 words) - 16:16, 13 October 2011
- ...ation of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure", in Proceedings of AISTATS, 2011. ...AddressesProblem::Word Alignment]], [[AddressesProblem::Shallow Parsing]], and [[AddressesProblem::Constituent Parsing]]). The paper formulates the appro... 8 KB (1,106 words) - 01:23, 3 November 2011
- ...eskovec, Kevin J. Lang, and Michael Mahoney. 2010. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th international c ...istics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest. 6 KB (870 words) - 20:54, 26 September 2012
- This is an optimization [[Category::method]], used in many algorithms such as [[AddressesProblem::Support Vector Machines]] or [[AddressesProblem... Given a function to be minimized <math>F(\mathbf{x})</math> and a point <math>\mathbf{x}=\mathbf{a}</math>, let <math>\mathbf{b}=\gamma\nab... 3 KB (510 words) - 18:08, 30 September 2011
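The snippet above is cut off mid-formula; the standard gradient descent update it appears to describe is <math>\mathbf{b} = \mathbf{a} - \gamma\nabla F(\mathbf{a})</math>. A minimal sketch of that update on an invented one-dimensional objective (the function, step size, and iteration count are illustrative choices, not from the wiki page):

```python
# Minimize F(x) = (x - 3)^2 by repeating b = a - gamma * F'(a),
# the standard gradient descent step for a minimization problem.

def grad_descent(grad, a, gamma=0.1, steps=100):
    for _ in range(steps):
        a = a - gamma * grad(a)  # move against the gradient
    return a

# F'(x) = 2 * (x - 3); the minimizer is x = 3.
x_min = grad_descent(lambda x: 2 * (x - 3), a=0.0)
print(round(x_min, 4))  # -> 3.0
```

With a fixed step size the iterates here contract toward the minimizer geometrically; in practice the step size is often chosen by a line search instead.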
- Mark Hopkins and Jonathan May. 2011. Tuning as Ranking. In ''Proceedings of EMNLP-2011''. ...[RelatedPaper::Och, 2003]] as it is not limited to a handful of parameters and can easily handle systems with thousands of features. In addition, unlike r... 8 KB (1,132 words) - 19:40, 29 November 2011
- ...d learn hyperplanes that maximize the distance separation between positive and negative instances. For cases where the training data isn't perfectly separ... We want to choose the <math>{\mathbf{w}}</math> and <math>b</math> to maximize the margin, or distance between the parallel hyp... 3 KB (450 words) - 22:25, 31 March 2011
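For a separating hyperplane <math>\mathbf{w}\cdot\mathbf{x} + b = 0</math> whose functional margin on every training point is at least 1, the distance between the two parallel supporting hyperplanes is <math>2/\|\mathbf{w}\|</math>, which is the quantity an SVM maximizes. A small numeric check with an invented 2-D hyperplane and points (all values are made up for illustration):

```python
import math

# Hypothetical fixed hyperplane w.x + b = 0 in 2-D.
w = (1.0, 1.0)
b = -3.0
pos = [(3.0, 2.0), (4.0, 3.0)]  # label +1
neg = [(0.0, 1.0), (1.0, 0.0)]  # label -1

def functional_margin(x, y):
    # y * (w.x + b); >= 1 means x lies on or beyond its supporting hyperplane.
    return y * (w[0] * x[0] + w[1] * x[1] + b)

assert all(functional_margin(x, +1) >= 1 for x in pos)
assert all(functional_margin(x, -1) >= 1 for x in neg)

norm_w = math.hypot(*w)
# Width between w.x + b = +1 and w.x + b = -1 is 2 / ||w||.
print(2 / norm_w)  # -> 1.4142...
```

Maximizing <math>2/\|\mathbf{w}\|</math> is equivalent to minimizing <math>\|\mathbf{w}\|^2/2</math> subject to the margin constraints, which is the usual quadratic program.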
- ...edings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing'', pp. 523-530, Vancouver, October 2005. ...ature vector depending on the words <math>x_i</math> and <math>x_j</math>, and <math>\mathbf{w}</math> is a weight factor. 6 KB (1,005 words) - 20:11, 22 November 2011
- title = {Modularity and community structure in networks}, [http://www.pnas.org/content/103/23/8577.abstract Modularity and community structure in networks] 14 KB (2,287 words) - 00:21, 4 October 2012
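The modularity from Newman's paper above is <math>Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)</math>: the fraction of edges inside communities minus the fraction expected at random given the degrees. A direct-from-definition sketch on an invented toy graph (the edges and partition are made up for illustration):

```python
# Two triangles joined by a bridge edge; each triangle is one community.
edges = [(0, 1), (1, 2), (0, 2),   # community "A"
         (3, 4), (4, 5), (3, 5),   # community "B"
         (2, 3)]                   # bridge between them
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

n = 6
A = [[0] * n for _ in range(n)]    # adjacency matrix
for u, v in edges:
    A[u][v] = A[v][u] = 1

k = [sum(row) for row in A]        # node degrees
two_m = sum(k)                     # 2m = total degree

# Q = (1/2m) * sum over same-community pairs of [A_ij - k_i*k_j/(2m)].
Q = sum(
    A[i][j] - k[i] * k[j] / two_m
    for i in range(n) for j in range(n)
    if community[i] == community[j]
) / two_m
print(round(Q, 4))  # -> 0.3571
```

For this partition Q = 5/14: positive, as expected when most edges fall within communities.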
- ...eeting for 10-605 2013 01 16|Review of probabilities, joint-distributions, and naive Bayes]] ...ing for 10-605 2013 01 23|Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.]] 7 KB (1,005 words) - 16:20, 10 January 2014
- * Lecture notes and/or slides will be (re)posted around the time of the lectures. ...nal research questions (e.g. parameter sensitivity), or reimplementations of methods based on their published description. An acceptable result might confirm s... 9 KB (1,220 words) - 11:06, 28 November 2017
- Sparse Additive Generative Models of Text. Eisenstein, Ahmed and Xing. Proceedings of ICML 2011. ...proach to [[UsesMethod::Latent Dirichlet Allocation]] (LDA) where sparsity and log-space additive modeling are NOT considered or introduced. 6 KB (796 words) - 21:55, 28 November 2011