Difference between revisions of "Class meeting for 10-605 Randomized Algorithms"

From Cohen Courses
Jump to navigationJump to search
Line 22: Line 22:
 
* What guarantees are possible, and how space grows as you require more accuracy.
 
* What guarantees are possible, and how space grows as you require more accuracy.
 
* Which algorithms allow one to combine sketches easily.
 
* Which algorithms allow one to combine sketches easily.
 
=== Key things to remember ===
 
 
* How to implement graph algorithms like PageRank by streaming through a graph, under various conditions:
 
** Vertex weights fit in memory
 
** Vertex weights do not fit in memory
 
* The meaning of various graph statistics: degree distribution, clustering coefficient, ...
 
* Why sampling from a graph is non-trivial if you want to preserve properties of the graph like
 
** Degree distribution
 
** Homophily as measured by clustering coefficient,
 
* What local graph partitioning is and how the PageRank-Nibble algorithm, together with sweeps to optimize conductance, can be used to approximately solve it.
 
* The implications of the analysis of PageRank-Nibble.
 

Revision as of 16:30, 11 August 2016

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall_2016.

Slides

  • TBD

Supplement:

Optional Readings

Key things to remember

  • The API for the randomized methods we studied: Bloom filters, LSH, CM sketches, and specifically, when you would use which technique.
  • The relationship between hash kernels and CM sketches.
  • What are the key tradeoffs associated with these methods, in terms of space/time efficiency and accuracy, and what sorts of errors are made by which algorithms (e.g., if they give over/under estimates, false positives/false negatives, etc).
  • What guarantees are possible, and how space grows as you require more accuracy.
  • Which algorithms allow one to combine sketches easily.