Class meeting for 10-405 Overview

From Cohen Courses

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.

Slides

Homework

The slides used in these lectures are posted here, along with some review notes for what is covered.

There will be a quiz after each lecture in this class.

  • Today's quiz: [1]

Readings for the Class

Also discussed

Things to remember

  • Why use big data?
    • Simple learning methods with large data sets can outperform complex learners with smaller datasets
    • The ordering of learning methods, best-to-worst, can differ between small and large datasets
    • The best way to improve performance for a learning system is often to collect more data
    • Large datasets often imply large classifiers
  • Asymptotic analysis
    • It measures the number of operations as a function of problem size
    • Different operations (e.g., disk seeking, scanning, memory access) can have very different costs
    • Disk access is cheapest when you scan sequentially
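
The asymptotic-analysis points above can be illustrated with a small cost model. This is a minimal sketch, not material from the lecture: the constants SEEK_COST, SCAN_COST_PER_BYTE, and RECORD_BYTES are assumed ballpark values chosen only for illustration, and the two functions are hypothetical helpers. Both access patterns touch every record once, so both are linear in the number of records, yet the per-operation costs make them behave very differently.

```python
# Assumed, illustrative per-operation costs in seconds (not measured values).
SEEK_COST = 1e-2            # one random disk seek
SCAN_COST_PER_BYTE = 1e-8   # sequential disk read, per byte
RECORD_BYTES = 100          # assumed size of one record


def sequential_scan_cost(n_records):
    """One seek to the start of the file, then read all records in order."""
    return SEEK_COST + n_records * RECORD_BYTES * SCAN_COST_PER_BYTE


def random_access_cost(n_records):
    """One seek per record, plus the bytes read for that record."""
    return n_records * (SEEK_COST + RECORD_BYTES * SCAN_COST_PER_BYTE)


if __name__ == "__main__":
    # Same operation count in both cases; only the cost per operation differs.
    for n in (1_000, 1_000_000):
        seq = sequential_scan_cost(n)
        rnd = random_access_cost(n)
        print(f"n={n:,}: sequential {seq:.3f}s, random {rnd:.1f}s")
```

Under these assumed constants, the seek term dominates the random-access pattern, which is why counting operations alone can mislead and why sequential scans are preferred for disk-resident data.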