Class meeting for 10-405 Workflows For Hadoop

From Cohen Courses
Revision as of 14:41, 5 February 2018

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.

Slides

  • First lecture: slides in Powerpoint (http://www.cs.cmu.edu/~wcohen/10-405/workflows-1.pptx) and in PDF (http://www.cs.cmu.edu/~wcohen/10-405/workflows-1.pdf).
  • Second lecture: slides (draft) in Powerpoint (http://www.cs.cmu.edu/~wcohen/10-405/workflows-2.pptx) and in PDF (http://www.cs.cmu.edu/~wcohen/10-405/workflows-2.pdf).
  • Third lecture: slides (draft) in Powerpoint (http://www.cs.cmu.edu/~wcohen/10-405/workflows-3.pptx) and in PDF (http://www.cs.cmu.edu/~wcohen/10-405/workflows-3.pdf).

Quizzes

Readings

Also discussed

Things to Remember

  • The TFIDF representation for documents.
  • What dataflow languages are, what sort of abstract operations they use, and what the complexity of these operations is.
  • How joins are implemented in dataflow languages (and the difference between map-side and reduce-side joins).
  • What the PageRank algorithm is.
  • Common ways of representing graphs in map-reduce systems:
    • A list of edges
    • A list of nodes with outlinks
  • Why iteration is often expensive in pure dataflow algorithms.
  • How Spark differs from and/or is similar to other dataflow languages and systems:
    • Actions/transformations
    • RDDs
    • Caching
  • Definition of a similarity join/soft join.
  • Why inverted indices make TFIDF representations useful for similarity joins:
    • e.g., whether high-IDF words have shorter or longer inverted-index lists, and more or less impact on a similarity measure
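As a refresher on the TFIDF and similarity-join items, here is a minimal Python sketch of TFIDF vectors and a similarity join driven by an inverted index. It assumes whitespace tokenization, weights of tf × log(N/df), and L2-normalized vectors, so that a dot product is a cosine similarity. These weighting choices and the function names `tfidf_vectors` and `similarity_join` are illustrative assumptions, not necessarily the exact scheme used in lecture.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return {doc_id: {term: weight}} with weight = tf * log(N / df), L2-normalized.

    Assumed weighting scheme for illustration; other TFIDF variants exist.
    """
    n_docs = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(text.split()))  # document frequency: count each term once per doc
    vectors = {}
    for doc_id, text in docs.items():
        tf = Counter(text.split())
        vec = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        vec = {t: w for t, w in vec.items() if w > 0}  # drop terms in every doc (IDF 0)
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors[doc_id] = {t: w / norm for t, w in vec.items()} if norm else {}
    return vectors


def similarity_join(left, right, threshold):
    """Return sorted (a, b, cosine) triples for all pairs with cosine >= threshold."""
    # Inverted index over the right relation: term -> list of (doc_id, weight).
    # Rare (high-IDF) terms have short lists but large weights.
    index = defaultdict(list)
    for doc_id, vec in right.items():
        for term, w in vec.items():
            index[term].append((doc_id, w))
    results = []
    for a, vec in left.items():
        # Only documents sharing at least one term with `a` are ever touched.
        scores = defaultdict(float)
        for term, w in vec.items():
            for b, w_b in index.get(term, []):
                scores[b] += w * w_b  # dot product of unit vectors = cosine
        results.extend((a, b, s) for b, s in scores.items() if s >= threshold)
    return sorted(results)


if __name__ == "__main__":
    docs = {
        "a1": "acme widget corp", "a2": "globex industries inc",
        "b1": "acme widgets corp", "b2": "globex inc", "b3": "initech corp",
    }
    vecs = tfidf_vectors(docs)  # IDF computed over both relations together
    left = {k: vecs[k] for k in ("a1", "a2")}
    right = {k: vecs[k] for k in ("b1", "b2", "b3")}
    for a, b, s in similarity_join(left, right, threshold=0.25):
        print(a, b, round(s, 3))
```

Running it prints `a1 b1 0.298` and `a2 b2 0.627`. The pair a1/b3 shares only `corp`, which occurs in three of the five names, so its low IDF weight keeps the score under the threshold. The rare, high-IDF terms have the shortest index lists yet contribute the most to the score.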
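The map-side/reduce-side join distinction can be simulated in plain Python, with in-memory lists standing in for partitioned input files. This is a hypothetical sketch with made-up function names. A reduce-side join tags both relations and shuffles them by key. The map-side variant shown here is the replicated form, which assumes one relation is small enough to be copied to every mapper; other map-side variants instead rely on inputs already partitioned and sorted by the join key.

```python
from collections import defaultdict


def reduce_side_join(left, right):
    """Simulated reduce-side join of two lists of (key, value) records."""
    # Map phase: tag every record with the relation it came from.
    mapped = [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]
    # Shuffle/sort phase: records from both relations are grouped by join key.
    groups = defaultdict(list)
    for k, tagged in mapped:
        groups[k].append(tagged)
    # Reduce phase: per key, pair every left value with every right value.
    out = []
    for k in sorted(groups):
        lefts = [v for tag, v in groups[k] if tag == "L"]
        rights = [v for tag, v in groups[k] if tag == "R"]
        out.extend((k, lv, rv) for lv in lefts for rv in rights)
    return out


def map_side_join(big, small):
    """Simulated map-side (replicated) join: `small` is held in memory by every mapper."""
    lookup = defaultdict(list)
    for k, v in small:
        lookup[k].append(v)
    # Each mapper streams its share of `big` and probes the in-memory table; no shuffle.
    return [(k, bv, sv) for k, bv in big for sv in lookup.get(k, [])]


if __name__ == "__main__":
    clicks = [("u1", "page-a"), ("u2", "page-b"), ("u1", "page-c"), ("u3", "page-d")]
    users = [("u1", "alice"), ("u2", "bob")]
    print(reduce_side_join(clicks, users))
    print(map_side_join(clicks, users))
```

Both functions return the same three joined records (user `u3` has no match and is dropped, as in an inner join). The difference is cost. The reduce-side version moves every record of both inputs through the shuffle. The map-side version avoids the shuffle entirely, at the price of holding `small` in memory.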
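PageRank illustrates both graph representations and the cost of iteration. The minimal sketch below, with hypothetical function names, first converts a list of edges into a list of nodes with outlinks. It then runs a fixed number of power-iteration steps. The damping factor of 0.85, the uniform redistribution of rank from nodes with no outlinks, and the fixed iteration count are common conventions assumed here rather than details taken from the slides.

```python
from collections import defaultdict


def edges_to_adjacency(edges):
    """Convert an edge list [(src, dst), ...] into {node: [outlinks]}."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
        adj.setdefault(dst, [])  # keep nodes that have no outlinks
    return dict(adj)


def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank over an adjacency list; ranks sum to 1."""
    n = len(adj)
    ranks = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        # "Map": each node sends rank / out-degree along every outlink.
        contribs = defaultdict(float)
        dangling = 0.0
        for node, outlinks in adj.items():
            if outlinks:
                share = ranks[node] / len(outlinks)
                for dst in outlinks:
                    contribs[dst] += share
            else:
                dangling += ranks[node]  # rank mass held by nodes with no outlinks
        # "Reduce": sum incoming contributions and add the teleport term;
        # dangling mass is spread uniformly over all nodes (assumed convention).
        ranks = {
            node: (1 - damping) / n + damping * (contribs[node] + dangling / n)
            for node in adj
        }
    return ranks


if __name__ == "__main__":
    graph = edges_to_adjacency([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
    for node, r in sorted(pagerank(graph).items()):
        print(node, round(r, 3))
```

In this small graph, `c` (linked from both `a` and `b`) ends up with the highest rank and `b` with the lowest. Each pass of the loop corresponds to a full map-shuffle-reduce round in a pure dataflow system. That is why iteration is expensive there: the graph is reread and reshuffled every time. Spark, by contrast, can cache the adjacency RDD in memory across iterations.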