Difference between revisions of "10-601 PAC"

Latest revision as of 16:46, 6 January 2016

This a lecture used in the Syllabus for Machine Learning 10-601B in Spring 2016

Slides

William's lecture: Slides in pdf, Slides in Powerpoint,

Readings

Mitchell Chapter 7

What you should remember

Definition of pac-learnability.
Definition of sample complexity vs time complexity
How sample complexity grows with 1/epsilon, 1/delta, and |H|
- in the noise-free case.
- in the "agnostic" setting, where noise is present and the learner outputs the smallest-error-rate hypothesis.
The definition of VC-dimension and shattering
How VC dimension relates to sample complexity

@@ Line 1: / Line 1: @@
+This a lecture used in the [[Syllabus for Machine Learning 10-601B in Spring 2016]]
 === Slides ===
-[http://curtis.ml.cmu.edu/w/courses/images/8/8d/Lecture10-pac.pdf Slides in PDF]
+* William's lecture: [http://www.cs.cmu.edu/~wcohen/10-601/pac-learning.pdf Slides in pdf], [http://www.cs.cmu.edu/~wcohen/10-601/pac-learning.pptx Slides in Powerpoint],
-[http://curtis.ml.cmu.edu/w/courses/images/0/0f/Lecture10-pac-annotated.pdf Annotated Slides in PDF]
 === Readings ===
-the PAC learning, VC dimension, etc. in Tom Mitchell 's Chapter 7
+* Mitchell Chapter 7
-=== An example of PAC learnability of Boolean literals : Learning a Boolean Conjunction ===
-(from http://www.cis.temple.edu/~giorgio/cis587/readings/pac.html)
-We can efficiently PAC learn concepts that are represented as the conjunction of boolean literals (i.e. positive or negative boolean variables). Here is a learning algorithm:
-. Start with an hypothesis h which is the conjunction of each variable and its negation x1 & ~x1 & x2 & ~x2 & .. & xn & ~xn.
-. Do nothing with negative instances.
-. On a positive instance a, eliminate in h ~xi if ai is positive, eliminate xi if ai is negative. For example if a positive instance is 01100 then eliminate x1, ~x2, ~x3, x4, and x5.
-In this algorithm h, as domain, is non-decreasing and at all times contained in the set denoted by c. [By induction: certainly true initially, ..]. We will have an error when h contains a literal z which is not in c.
-We compute first the probability that a literal z is deleted from h because of one specific positive example. Clearly this probability is 0 if z occurs in c, and if ~z is in c the probability is 1. At issue are the z where neither z nor ~z is in c.
-We would like to eliminate both of them from h. If one of the two remains, we have an error for an instance a that is positive for c and negative for h. Let's call these literals, '''free''' literals.
-We have:
-'''error(h) is less than or equal to the Sum of the probabilities of the free literals z in h not to be eliminated by one positive example.'''
-Since there are at most 2*n literals in h, if h is a bad hypothesis, i.e. an hypothesis with error greater than epsilon, we will have
-'''Probability[free literal z is eliminated from h by one positive example] > epsilon/(2*n)'''
-From this we obtain :
-'''Probability[free literal z survives one positive example] = 1 - Probability[free literal z is eliminated from h by one positive example] < (1 - epsilon/(2*n))'''
-'''Probability[free literal z survives m positive examples] < (1 - epsilon/(2*n))^m'''
-'''Probability[some free literal z survives m positive examples] < 2n*(1 - epsilon/(2*n))^m < 2n*(e^(-epsilon/(2*n)))^m = 2n*e^(-(m*epsilon)/(2*n))'''
-That is
-'''m > (2*n/epsilon)*(ln(1/delta)+n*ln(2))'''
 === What you should remember ===
-* Relationships between sample complexity, error bound, and “capacity” of chosen hypothesis space
+* Definition of pac-learnability.
-* Within the PAC learning setting, we can bound the probability that learner will output hypothesis with given error
+* Definition of sample complexity vs time complexity
-** For ANY consistent learner (case where c in H)
+* How sample complexity grows with 1/epsilon, 1/delta, and |H|
-** For ANY “best fit” hypothesis (agnostic learning, where perhaps c not in H)
+**  in the noise-free case.
+** in the "agnostic" setting, where noise is present and the learner outputs the smallest-error-rate hypothesis.
-* VC dimension as measure of complexity of H
+* The definition of VC-dimension and shattering
+* How VC dimension relates to sample complexity

Difference between revisions of "10-601 PAC"

Latest revision as of 16:46, 6 January 2016

Slides

Readings

What you should remember

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools