10-601 PAC

This is a lecture used in the [[Syllabus for Machine Learning 10-601B in Spring 2016]].

=== Slides ===

* William's lecture: [http://www.cs.cmu.edu/~wcohen/10-601/pac-learning.pdf Slides in pdf], [http://www.cs.cmu.edu/~wcohen/10-601/pac-learning.pptx Slides in Powerpoint]

=== Readings ===

* Mitchell, Chapter 7 (PAC learning, VC dimension, and related material)

== Learning a Boolean Conjunction ==

The following example of the PAC learnability of Boolean literals is adapted from http://www.cis.temple.edu/~giorgio/cis587/readings/pac.html.

We can efficiently PAC-learn concepts that are represented as conjunctions of Boolean literals (i.e., positive or negated Boolean variables). Here is a learning algorithm (a sketch in code follows the list):

* Start with a hypothesis h that is the conjunction of every variable and its negation: x1 & ~x1 & x2 & ~x2 & ... & xn & ~xn.
* Do nothing with negative instances.
* On a positive instance a, eliminate ~xi from h if ai is 1, and eliminate xi from h if ai is 0. For example, if the positive instance is 01100, eliminate x1, ~x2, ~x3, x4, and x5.
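To make the elimination procedure concrete, here is a minimal sketch in Python. It is not part of the original notes: the representation (instances as bit-strings such as "01100", literals as signed integers with +i for xi and -i for ~xi) and the function names are assumptions made for illustration.

<pre>
# A minimal sketch of the literal-elimination learner described above.
# Assumptions (not from the notes): an instance is a bit-string like "01100",
# and a literal is a signed integer: +i stands for xi, -i stands for ~xi.

def learn_conjunction(n, examples):
    """examples: iterable of (instance, label) pairs, e.g. ("01100", True)."""
    # Start with every variable and its negation: x1 & ~x1 & ... & xn & ~xn.
    h = set(range(1, n + 1)) | set(-i for i in range(1, n + 1))
    for a, label in examples:
        if not label:
            continue  # do nothing with negative instances
        for i, bit in enumerate(a, start=1):
            # A positive example with ai = 1 falsifies ~xi; with ai = 0 it falsifies xi.
            h.discard(-i if bit == "1" else i)
    return h

def predict(h, a):
    """h labels a as positive iff every remaining literal is satisfied by a."""
    return all((a[abs(z) - 1] == "1") == (z > 0) for z in h)

# Tiny worked example: suppose the target concept is c = x2 & x3 (n = 5).
examples = [("01100", True), ("11110", True), ("01110", True), ("10001", False)]
h = learn_conjunction(5, examples)
print(sorted(h, key=abs))   # [2, 3, -5]: h is x2 & x3 & ~x5
print(predict(h, "01100"))  # True
print(predict(h, "01101"))  # False: ~x5 was never eliminated, so h makes a
                            # false-negative error here -- exactly the one-sided
                            # error that the analysis below bounds.
</pre>

Note the one-sided behavior: h never accepts an instance that c rejects, so all of its errors are false negatives on positive instances.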
 
 
In this algorithm the set of instances satisfying h is non-decreasing, and it is at all times contained in the set denoted by c. (By induction: this certainly holds initially, when h contains every variable and its negation and so is satisfied by no instance, and each later elimination is forced by a positive example of c.) Errors therefore occur only on positive instances: we misclassify a positive instance a exactly when a falsifies some literal z that remains in h, and any such z is not in c.

We first compute the probability that a literal z is deleted from h by one specific positive example. Clearly this probability is 0 if z occurs in c, and it is 1 if ~z occurs in c. At issue are the literals z for which neither z nor ~z occurs in c; call these the free literals. (For example, if c = x1 & ~x2 over n = 3 variables, the free literals are x3 and ~x3.) We would like both members of each free pair to be eliminated from h: if one of them remains, h errs on any instance that is positive for c but falsifies that literal. We have:

error(h) is less than or equal to the sum, over the free literals z in h, of the probability that z is eliminated by one positive example.
 
 
 
Since there are at most 2*n literals in h, if h is a bad hypothesis, i.e. a hypothesis with error greater than epsilon, then for at least one free literal z in h we must have

Probability[free literal z is eliminated from h by one positive example] > epsilon/(2*n)
 
 
 
From this we obtain, for such a literal:

Probability[free literal z survives one positive example] = 1 - Probability[free literal z is eliminated from h by one positive example] < (1 - epsilon/(2*n))
 
 
 
Probability[free literal z survives m positive examples] < (1 - epsilon/(2*n))^m
 
 
 
By the union bound over the at most 2*n literals in h, and using 1 - x <= e^(-x):

Probability[some free literal z survives m positive examples] < 2n*(1 - epsilon/(2*n))^m < 2n*(e^(-epsilon/(2*n)))^m = 2n*e^(-(m*epsilon)/(2*n))
 
 
 
To make this failure probability at most delta, set 2n*e^(-(m*epsilon)/(2*n)) <= delta and solve for m. That is, it suffices to draw

m > (2*n/epsilon)*(ln(2*n) + ln(1/delta))

positive examples.
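As a sanity check on how this bound scales, the snippet below (not from the original notes; the parameter settings are made up for illustration) evaluates it for a few values of n, epsilon, and delta. For n = 10, epsilon = 0.1, delta = 0.05 it gives roughly 1,200 positive examples.

<pre>
import math

def sample_bound(n, epsilon, delta):
    """Positive examples sufficient per m > (2n/epsilon) * (ln(2n) + ln(1/delta))."""
    return math.ceil((2 * n / epsilon) * (math.log(2 * n) + math.log(1 / delta)))

# The bound is linear in n and 1/epsilon (up to a log factor in n),
# but only logarithmic in 1/delta.
for n, epsilon, delta in [(10, 0.1, 0.05), (10, 0.01, 0.05), (100, 0.1, 0.05)]:
    print(n, epsilon, delta, sample_bound(n, epsilon, delta))
</pre>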
 
 
 
 
 
=== What you should remember ===

* The definition of PAC-learnability.
* The definition of sample complexity vs. time complexity.
* How sample complexity grows with 1/epsilon, 1/delta, and |H| (the two standard bounds are restated after this list):
** in the noise-free case, for ANY consistent learner (the case where c is in H);
** in the "agnostic" setting, where noise is present, c is perhaps not in H, and the learner outputs the smallest-error-rate ("best fit") hypothesis.
* The relationship between sample complexity, error bound, and the "capacity" of the chosen hypothesis space: within the PAC setting, we can bound the probability that the learner outputs a hypothesis with a given error.
* The definition of VC-dimension and shattering, and VC dimension as a measure of the complexity of H that relates to sample complexity.
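For reference, the two standard sample-complexity bounds behind these bullets (restated here from Mitchell, Chapter 7; the page itself does not spell them out):

<math>
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right) \qquad \text{(consistent learner, noise-free case, } c \in H\text{)}
</math>

<math>
m \;\ge\; \frac{1}{2\epsilon^{2}}\left(\ln|H| + \ln\frac{1}{\delta}\right) \qquad \text{(agnostic setting, best-fit hypothesis)}
</math>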
 
