Difference between revisions of "Class meeting for 10-605 Randomized Algorithms"

From Cohen Courses
(Created page with "This is one of the class meetings on the schedule for the course Machine Learning with Large Datase...")
 
Line 6: Line 6:
  
  
=== Readings ===
+
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2016]].
  
* [http://dl.acm.org/citation.cfm?id=1150479 Sampling from Large Graphs], Jure Leskovec and Christos Faloutsos, KDD 2006.
+
 
* [http://www.math.ucsd.edu/~fan/wp/localpartition.pdf Local Graph Partitioning using PageRank Vectors], Andersen, Chung, Lang, FOCS 2006
+
=== Slides ===
* [http://link.springer.com/chapter/10.1007/978-3-540-77004-6_13#page-1 Andersen, Reid, Fan Chung, and Kevin Lang. "Local partitioning for directed graphs using PageRank." Algorithms and Models for the Web-Graph. Springer Berlin Heidelberg, 2007. 166-178.]
+
 
 +
* TBD
 +
 
 +
Supplement:
 +
 
 +
* [http://www.cs.cmu.edu/~wcohen/10-605/bloomfilter.py Python demo code for Bloom filter]
 +
 
 +
=== Optional Readings ===
 +
 
 +
* [http://dl.acm.org/citation.cfm?id=1219840.1219917 Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering] Deepak Ravichandran, Patrick Pantel, and Eduard Hovy
 +
* [http://www.cs.jhu.edu/~vandurme/papers/VanDurmeLallACL10.pdf Online Generation of Locality Sensitive Hash Signatures]. Benjamin Van Durme and Ashwin Lall.  ACL Short. 2010
 +
* [http://www.umiacs.umd.edu/~amit/Papers/goyalPointQueryEMNLP12.pdf Sketch Algorithms for Estimating Point Queries in NLP.]  Amit Goyal, Hal Daume III, and Graham Cormode, EMNLP 2012
 +
 
 +
=== Key things to remember ===
 +
 
 +
* The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and specifically, when you would use which technique.
 +
* The relationship between hash kernels and CM sketches.
 +
* The key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and which sorts of errors each algorithm makes (e.g., overestimates vs. underestimates, false positives vs. false negatives).
 +
* What guarantees are possible, and how space grows as you require more accuracy.
 +
* Which algorithms allow one to combine sketches easily.
  
 
=== Key things to remember ===
 

Revision as of 16:29, 11 August 2016

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall_2016.

Slides

  • TBD


Supplement:

Optional Readings

Key things to remember

  • The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and specifically, when you would use which technique.
  • The relationship between hash kernels and CM sketches.
  • The key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and which sorts of errors each algorithm makes (e.g., overestimates vs. underestimates, false positives vs. false negatives).
  • What guarantees are possible, and how space grows as you require more accuracy.
  • Which algorithms allow one to combine sketches easily.
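
As a rough illustration of the points above, here is a minimal Python sketch of two of the methods, a Bloom filter and a Count-Min (CM) sketch. It is not the course's linked demo code: the class names, the `_hash` helper, the salted SHA-1 hashing, and the default sizes are all assumptions chosen for readability. It is meant to show the kind of error each structure makes and how two sketches of the same shape can be combined.

```python
import hashlib


def _hash(item, seed, modulus):
    """Hypothetical helper: deterministic hash of (seed, item) into range(modulus)."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).hexdigest()
    return int(digest, 16) % modulus


class BloomFilter:
    """Approximate set: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = [False] * num_bits

    def add(self, item):
        for seed in range(self.num_hashes):
            self.bits[_hash(item, seed, self.num_bits)] = True

    def __contains__(self, item):
        return all(self.bits[_hash(item, seed, self.num_bits)]
                   for seed in range(self.num_hashes))

    def merge(self, other):
        # Same size and hashes assumed: the union of two filters is a bitwise OR.
        assert (self.num_bits, self.num_hashes) == (other.num_bits, other.num_hashes)
        self.bits = [a or b for a, b in zip(self.bits, other.bits)]


class CountMinSketch:
    """Approximate counter: estimates never fall below the true count."""

    def __init__(self, width=272, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item, count=1):
        # Assumes count >= 0; negative updates would break the overestimate property.
        for row in range(self.depth):
            self.table[row][_hash(item, row, self.width)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum over rows is the best bound.
        return min(self.table[row][_hash(item, row, self.width)]
                   for row in range(self.depth))

    def merge(self, other):
        # Same shape and hashes assumed: sketches combine by cell-wise addition.
        assert (self.width, self.depth) == (other.width, other.depth)
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
```

In the standard Count-Min analysis, a width of ⌈e/ε⌉ and a depth of ⌈ln(1/δ)⌉ keep the estimate within εN of the true count with probability at least 1 − δ, where N is the total count inserted, so space grows linearly in 1/ε. The defaults above correspond to ε = δ = 0.01. That analysis assumes pairwise-independent hash functions; the salted SHA-1 used here is only a convenient stand-in.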

Key things to remember

  • How to implement graph algorithms like PageRank by streaming through a graph, under various conditions:
    • Vertex weights fit in memory
    • Vertex weights do not fit in memory
  • The meaning of various graph statistics: degree distribution, clustering coefficient, ...
  • Why sampling from a graph is non-trivial if you want to preserve properties of the graph such as:
    • Degree distribution
    • Homophily, as measured by clustering coefficient
  • What local graph partitioning is and how the PageRank-Nibble algorithm, together with sweeps to optimize conductance, can be used to approximately solve it.
  • The implications of the analysis of PageRank-Nibble.
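
For the first item above, the case where vertex weights fit in memory can be sketched roughly as follows. This is an illustrative sketch, not the course's implementation: the function name `pagerank_streaming`, the `read_edges` callable (which must return a fresh iterator over `(src, dst)` pairs on every call, as re-reading an edge file would), the reset probability of 0.15, and the fixed iteration count are all assumptions. Only the per-vertex dictionaries are held in memory; the edges are re-streamed on each pass.

```python
from collections import Counter


def pagerank_streaming(read_edges, reset=0.15, iterations=30):
    """PageRank with in-memory vertex weights and streamed edges.

    read_edges is a hypothetical callable returning a fresh iterator of
    (src, dst) pairs each time it is called.
    """
    # First pass over the stream: collect vertices and out-degrees.
    out_degree = Counter()
    nodes = set()
    for src, dst in read_edges():
        out_degree[src] += 1
        nodes.update((src, dst))

    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Every vertex starts with its share of the reset mass.
        nxt = dict.fromkeys(nodes, reset / n)
        # One streaming pass: each edge forwards a share of its source's weight.
        for src, dst in read_edges():
            nxt[dst] += (1 - reset) * rank[src] / out_degree[src]
        rank = nxt
    return rank


if __name__ == "__main__":
    edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
    print(pagerank_streaming(lambda: iter(edges)))
```

This version makes no special provision for vertices without out-edges, so their weight leaks out of the total on each pass; the small example graph avoids that case. When the vertex weights do not fit in memory, the dictionaries themselves would have to be kept on disk and processed in sorted or partitioned passes, which this sketch does not attempt.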