Difference between revisions of "Class meeting for 10-605 in Fall 2016 Overview"

From Cohen Courses
 
=== Homework ===
  
* Before the next class: watch [https://mediatech-stream.andrew.cmu.edu/Mediasite/Catalog/Full/4e86c44694a14b9fbe1ea7653f553ac621 My overview lecture from 10-601 ] (lecture 1, and a little of lecture 2) if you need it.
* Before the next class: review your probabilities!  You should be familiar with the material in these lectures:
** [https://mediatech-stream.andrew.cmu.edu/Mediasite/Play/9e04feebd4bb4900a8c828388be620d91d?catalog=81e613d0-fda8-47a4-8340-86b96d5a3cbb my overview lecture from 10-601] (lecture from 1-13-2016)
** [https://mediatech-stream.andrew.cmu.edu/Mediasite/Play/e99b074dadb24a11a68b6dae418ac9a91d?catalog=81e613d0-fda8-47a4-8340-86b96d5a3cbb first 20 minutes of the second overview lecture for 10-601] (lecture from 1-16-2016, up to the 'joint distribution' section)
The slides used in these lectures are [[10-601_Introduction_to_Probability|posted here]], along with some review notes for what is covered.
 
  
 
* Today's quiz: [https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgIDQqdaqCQw]
 

Revision as of 11:05, 9 August 2017

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.

Slides

Homework

The slides used in these lectures are posted here, along with some review notes for what is covered.


  • Today's quiz: [1]

Readings for the Class

Also discussed

Things to remember

  • Why use big data?
    • Simple learning methods with large data sets can outperform complex learners with smaller datasets
    • The ordering of learning methods, best-to-worst, can be different for small datasets than for large datasets
    • The best way to improve performance for a learning system is often to collect more data
    • Large datasets often imply large classifiers
  • Asymptotic analysis
    • It measures the number of operations as a function of problem size
    • Different operations (e.g., disk seeks, sequential scans, memory accesses) can have very different costs
    • Disk access is cheapest when you scan sequentially
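
The point about operation costs can be illustrated with a rough cost model. The sketch below is not from the lecture: the function names are hypothetical, and the two constants (10 ms per seek, 100 MB/s sequential throughput) are assumed round numbers chosen only for illustration, not measurements. Both access patterns touch the same number of records, so they have the same asymptotic operation count, yet the estimated running times differ by orders of magnitude.

```python
# Illustrative cost model. The constants are assumed round numbers,
# not measurements of any particular disk.
SEEK_SECONDS = 0.010            # assumed cost of one disk seek
SCAN_BYTES_PER_SECOND = 100e6   # assumed sequential read throughput


def random_access_seconds(n_records, record_bytes):
    """Estimated time when every record needs its own seek plus a short read."""
    return n_records * (SEEK_SECONDS + record_bytes / SCAN_BYTES_PER_SECOND)


def sequential_scan_seconds(n_records, record_bytes):
    """Estimated time for one seek followed by a single pass over all records."""
    return SEEK_SECONDS + n_records * record_bytes / SCAN_BYTES_PER_SECOND


if __name__ == "__main__":
    # Compare the two access patterns for 100-byte records.
    for n in (10**3, 10**6):
        print(n, random_access_seconds(n, 100), sequential_scan_seconds(n, 100))
```

Under these assumed constants the per-record seek dominates the random-access estimate, which is why algorithms for large datasets are often designed around sequential scans.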