Search results

From Cohen Courses
  • To summarize this part of the course, we looked at four algorithms in some detail. ...e Bayes by starting with strong independence assumptions, which lead to an optimization problem that you can solve in closed form. This gives us a super-fast meth
    1 KB (225 words) - 16:52, 30 July 2013
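The snippet above notes that Naive Bayes's strong independence assumptions yield an optimization problem solvable in closed form: training reduces to smoothed counting, with no iterative optimizer. A minimal sketch of that idea (function and variable names are hypothetical, not from the course page):

```python
from collections import Counter, defaultdict
import math

def train_nb(docs, labels, alpha=1.0):
    """Closed-form Naive Bayes training: the MAP parameters are just
    smoothed counts, so no iterative optimization is needed."""
    label_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    vocab = {w for c in word_counts.values() for w in c}
    priors = {y: math.log(n / len(labels)) for y, n in label_counts.items()}
    likelihoods = {
        y: {w: math.log((word_counts[y][w] + alpha) /
                        (sum(word_counts[y].values()) + alpha * len(vocab)))
            for w in vocab}
        for y in label_counts}
    return priors, likelihoods, vocab

def predict(doc, priors, likelihoods, vocab):
    """Pick the label maximizing log-prior plus summed log-likelihoods."""
    return max(priors, key=lambda y: priors[y] +
               sum(likelihoods[y][w] for w in doc if w in vocab))

# Toy corpus (made up for illustration).
docs = [["spam", "money"], ["money", "now"], ["hi", "friend"], ["friend", "lunch"]]
labels = ["spam", "spam", "ham", "ham"]
priors, likelihoods, vocab = train_nb(docs, labels)
```

The "super-fast" claim in the snippet follows directly: training is a single pass over the data.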
  • * [http://dl.acm.org/citation.cfm?id=743935 Ensemble Methods in Machine Learning], Tom Dietterich .../papers/IntroToBoosting.pdf A Short Introduction to Boosting], Yoav Freund and Robert Schapire.
    1,006 bytes (139 words) - 10:18, 12 January 2016
  • == Citation and Online Link == ...e graph. In addition, the optimization problem during training is easier, and decoding is identical to normal CRFs.
    2 KB (327 words) - 03:08, 11 October 2011
  • * [http://dl.acm.org/citation.cfm?id=743935 Ensemble Methods in Machine Learning], Tom Dietterich .../papers/IntroToBoosting.pdf A Short Introduction to Boosting], Yoav Freund and Robert Schapire.
    1 KB (181 words) - 16:45, 6 January 2016
  • ...sochantaridis05a/tsochantaridis05a.pdf Large Margin Methods for Structured and Interdependent Output Variables]. Journal of Machine Learning Research 6:14 ...ith 2011]; also, A.5 (in the appendix) discusses "aggressive" optimization algorithms
    2 KB (261 words) - 19:23, 28 September 2011
  • * [http://dl.acm.org/citation.cfm?id=743935 Ensemble Methods in Machine Learning], Tom Dietterich .../papers/IntroToBoosting.pdf A Short Introduction to Boosting], Yoav Freund and Robert Schapire.
    1 KB (188 words) - 17:36, 21 July 2014
  • {{MyCiteconference | booktitle = Journal of Optimization Theory and Applications | coauthors = | date = 1985| first = D.F.| last = Shanno | tit ...ginal (non-limited memory variant) algorithm: Broyden, Fletcher, Goldfarb, and Shanno. Here is a picture of all of them:
    5 KB (788 words) - 17:55, 31 October 2011
  • Sha, F. and Pereira, F. 2003. Shallow Parsing with Conditional Random Fields. In Procee ...hor presents the comparison results with previous methods on both accuracy and time efficiency.
    4 KB (576 words) - 22:36, 30 November 2010
  • == Citation and Link == ...iel Marcu. 2005. Learning as search optimization: approximate large margin methods for structured prediction. In ''Proceedings of the 22nd international confe
    3 KB (468 words) - 03:08, 11 October 2011
  • Smith, Noah A. and Jason Eisner (2005). Contrastive estimation: Training log-linear models on [http://www.cs.jhu.edu/~jason/papers/smith+eisner.acl05.pdf Smith and Eisner 2005]
    6 KB (978 words) - 17:16, 13 October 2011
  • ...ation of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure", in Proceedings of AISTATS, 2011. ...AddressesProblem::Word Alignment]], [[AddressesProblem::Shallow Parsing]], and [[AddressesProblem::Constituent Parsing]]). The paper formulates the appro
    8 KB (1,106 words) - 02:23, 3 November 2011
  • ...eskovec, Kevin J. Lang, and Michael Mahoney. 2010. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th international c ...istics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest.
    6 KB (870 words) - 21:54, 26 September 2012
  • This is an optimization [[Category::method]], used in many algorithms such as [[AddressesProblem::Support Vector Machines]] or [[AddressesProblem Given a function to be minimized <math>F(\mathbf{x})</math> and a point <math>\mathbf{x}=\mathbf{a}</math>, let <math>\mathbf{b}=\gamma\nab
    3 KB (510 words) - 19:08, 30 September 2011
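The snippet above is cut off mid-formula, but it describes the standard gradient-descent update: from the current point x = a, step against the gradient, b = a − γ∇F(a). A minimal numeric sketch of that rule (the helper name and toy objective are illustrative, not from the wiki page):

```python
import numpy as np

def gradient_descent(grad_F, a, gamma=0.1, steps=200):
    """Minimize F by repeated steps against its gradient:
    x <- x - gamma * grad_F(x)."""
    x = np.asarray(a, dtype=float)
    for _ in range(steps):
        x = x - gamma * grad_F(x)
    return x

# Minimize F(x) = (x - 3)^2, whose gradient is 2(x - 3);
# the iterates converge toward the minimizer x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), a=[0.0])
```

The step size γ controls the trade-off between convergence speed and stability, which is why methods like the (L-)BFGS algorithms mentioned elsewhere in these results adapt the step using curvature information.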
  • Mark Hopkins and Jonathan May. 2011. Tuning as Ranking. In ''Proceedings of EMNLP-2011''. ...[RelatedPaper::Och, 2003]] as it is not limited to a handful of parameters and can easily handle systems with thousands of features. In addition, unlike r
    8 KB (1,132 words) - 20:40, 29 November 2011
  • ...d learn hyperplanes that maximize the distance separation between positive and negative instances. For cases where the training data isn't perfectly separ We want to choose the <math>{\mathbf{w}}</math> and <math>b</math> to maximize the margin, or distance between the parallel hyp
    3 KB (450 words) - 23:25, 31 March 2011
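The snippet above describes choosing w and b to maximize the margin between the parallel hyperplanes separating positive and negative instances, with slack for imperfectly separable data. A soft-margin linear SVM can be sketched via subgradient descent on the hinge-loss objective (this toy solver and its data are illustrative, not the method from the wiki page):

```python
import numpy as np

def svm_sgd(X, y, lam=0.01, lr=0.1, epochs=500):
    """Soft-margin linear SVM by subgradient descent on
    lam/2*||w||^2 + (1/n)*sum(max(0, 1 - y_i*(w.x_i + b))).
    Labels must be +1/-1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1          # margin violators
        w -= lr * (lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n)
        b -= lr * (-y[viol].sum() / n)
    return w, b

# Toy separable data: class +1 near (2, 2), class -1 near (-2, -2).
X = np.array([[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5]])
y = np.array([1, 1, -1, -1])
w, b = svm_sgd(X, y)
margin = 2 / np.linalg.norm(w)              # width between the two hyperplanes
```

The quantity 2/||w|| is the distance between the hyperplanes w·x + b = ±1, which is what the formulation in the snippet maximizes.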
  • ...edings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing'', pp. 523-530, Vancouver, October 2005. ...ature vector depending on the words <math>x_i</math> and <math>x_j</math>, and <math>\mathbf{w}</math> is a weight factor.
    6 KB (1,005 words) - 21:11, 22 November 2011
  • title = {Modularity and community structure in networks}, [http://www.pnas.org/content/103/23/8577.abstract Modularity and community structure in networks]
    14 KB (2,287 words) - 01:21, 4 October 2012
  • ...eeting for 10-605 2013 01 16|Review of probabilities, joint-distributions, and naive Bayes]] ...ing for 10-605 2013 01 23|Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.]]
    7 KB (1,005 words) - 17:20, 10 January 2014
  • * Lecture notes and/or slides will be (re)posted around the time of the lectures. ...nal research questions (eg parameter sensitivity), or reimplementations of methods based on their published description. An acceptable result might confirm s
    9 KB (1,220 words) - 12:06, 28 November 2017
  • Sparse Additive Generative Models of Text. Eisenstein, Ahmed and Xing. Proceedings of ICML 2011. ...proach to [[UsesMethod::Latent Dirichlet Allocation]] (LDA) where sparsity and log-space additive modeling are NOT considered or introduced.
    6 KB (796 words) - 22:55, 28 November 2011
