Difference between revisions of "Class meeting for 10-605 Overview"

From Cohen Courses
 
(3 intermediate revisions by the same user not shown)
Line 1: Line 1:
  
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall 2017]].
+
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall 2017]].
  
 
=== Slides ===
 
Line 15: Line 15:
  
 
And after each lecture in this class there will be a quiz.
 
* Today's quiz: [https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgIDQqdaqCQw]
+
* Today's quiz: [https://qna.cs.cmu.edu/#/pages/view/149]
  
 
=== Readings for the Class ===
 

Latest revision as of 11:09, 29 August 2017

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2017.

Slides

Homework

The slides used in these lectures are posted here, along with some review notes for what is covered.

And after each lecture in this class there will be a quiz.

  • Today's quiz: https://qna.cs.cmu.edu/#/pages/view/149

Readings for the Class

Also discussed

Things to remember

  • Why use big data?
    • Simple learning methods with large datasets can outperform complex learners with smaller datasets
    • The ordering of learning methods, best-to-worst, can differ between small and large datasets
    • The best way to improve performance for a learning system is often to collect more data
    • Large datasets often imply large classifiers
  • Asymptotic analysis
    • It measures the number of operations as a function of problem size
    • Different operations (e.g. disk seeks, sequential scans, memory accesses) can have very different costs
    • Disk access is cheapest when you scan sequentially
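
The last two points can be illustrated with a toy cost model. This is only a sketch: the constants SEEK_SECONDS and SCAN_BYTES_PER_SECOND and both function names are hypothetical, chosen for illustration rather than taken from the lecture or measured on real hardware. Both access patterns perform a number of operations that grows linearly with the number of records, yet the estimated time differs greatly because a seek is assumed to cost far more than reading one record in sequence.

```python
# Toy cost model for reading n fixed-size records from disk.
# The two constants are assumed, illustrative values, not measurements.
SEEK_SECONDS = 0.01             # assumed cost of one disk seek
SCAN_BYTES_PER_SECOND = 100e6   # assumed sequential read throughput


def sequential_scan_seconds(n_records, record_bytes):
    """Estimate the time for one seek followed by an in-order read of all records."""
    return SEEK_SECONDS + n_records * record_bytes / SCAN_BYTES_PER_SECOND


def random_access_seconds(n_records, record_bytes):
    """Estimate the time when every record needs its own seek before it is read."""
    return n_records * (SEEK_SECONDS + record_bytes / SCAN_BYTES_PER_SECOND)


if __name__ == "__main__":
    # Both estimates grow linearly in n, but with very different constants.
    for n in (1_000, 1_000_000):
        seq = sequential_scan_seconds(n, record_bytes=100)
        rnd = random_access_seconds(n, record_bytes=100)
        print(f"n={n}: sequential {seq:.3f}s, random {rnd:.1f}s")
```

Under these assumed constants the random-access estimate is dominated by seek time, which is why an asymptotic operation count alone can hide large practical differences.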