10-601 PAC
Slides
Slides in PDF: http://curtis.ml.cmu.edu/w/courses/images/8/8d/Lecture10-pac.pdf
Readings
PAC learning, VC dimension, and related topics in Chapter 7 of Tom Mitchell's Machine Learning.
An example of PAC learnability of Boolean literals: Learning a Boolean Conjunction
(from http://www.cis.temple.edu/~giorgio/cis587/readings/pac.html)
We can efficiently PAC-learn concepts represented as a conjunction of Boolean literals (i.e., positive or negated Boolean variables). Here is a learning algorithm:
1. Start with a hypothesis h that is the conjunction of every variable and its negation: x1 & ~x1 & x2 & ~x2 & ... & xn & ~xn.
2. Do nothing with negative instances.
3. On a positive instance a, eliminate ~xi from h if ai is 1, and eliminate xi from h if ai is 0. For example, if a positive instance is 01100, then eliminate x1, ~x2, ~x3, x4, and x5.
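To make the elimination step concrete, here is a minimal Python sketch of the algorithm (the function names, the (i, value) encoding of literals, and the bit-tuple input format are illustrative choices, not part of the original notes):

```python
def learn_conjunction(n, examples):
    """Learn a conjunction of Boolean literals from labeled bit-vector examples.

    n        : number of Boolean variables
    examples : iterable of (instance, label) pairs; instance is a tuple of n
               bits (0/1) and label is True for positive examples of c
    A literal is encoded as a pair (i, v): "variable x_(i+1) must equal v".
    """
    # 1. Start with every literal and its negation: x_i and ~x_i for all i.
    h = {(i, v) for i in range(n) for v in (0, 1)}
    for instance, label in examples:
        # 2. Do nothing with negative instances.
        if not label:
            continue
        # 3. A positive instance contradicts, at each position i, the literal
        #    requiring the opposite bit value; eliminate that literal from h.
        for i, bit in enumerate(instance):
            h.discard((i, 1 - bit))
    return h


def predict(h, instance):
    """h labels an instance positive iff every remaining literal is satisfied."""
    return all(instance[i] == v for i, v in h)


# The example from the text: the positive instance 01100 eliminates
# x1, ~x2, ~x3, x4, x5, leaving ~x1, x2, x3, ~x4, ~x5.
h = learn_conjunction(5, [((0, 1, 1, 0, 0), True)])
print(sorted(h))                    # [(0, 0), (1, 1), (2, 1), (3, 0), (4, 0)]
print(predict(h, (0, 1, 1, 0, 0)))  # True
```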
In this algorithm the set of instances accepted by h is non-decreasing and at all times contained in the set denoted by c. [By induction: this is certainly true initially, when h accepts no instance at all; and each step only deletes literals contradicted by a positive example of c, so every literal of c stays in h, and any instance satisfying h must also satisfy c.] We make an error only when h contains a literal z which is not in c.
We first compute the probability that a literal z is deleted from h by one specific positive example. Clearly this probability is 0 if z occurs in c, and it is 1 if ~z is in c. At issue are the literals z for which neither z nor ~z is in c.
We would like to eliminate both z and ~z from h. If one of the two remains, h can err on an instance a that is positive for c and negative for h. Let's call these literals free literals.
We have:
error(h) is less than or equal to the sum, over the free literals z in h, of the probability that z is eliminated by one random positive example.
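Written out explicitly (the shorthand p(z) is introduced here for readability and is not in the original notes):

```latex
\[
\mathrm{error}(h) \;\le\; \sum_{\text{free } z \in h} p(z),
\qquad
p(z) \;:=\; \Pr\big[\text{one random positive example of } c \text{ violates, and hence eliminates, } z\big].
\]
```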
Since there are at most 2*n literals in h, if h is a bad hypothesis, i.e. a hypothesis with error greater than epsilon, then at least one free literal z in h must satisfy
Probability[free literal z is eliminated from h by one positive example] > epsilon/(2*n)
From this we obtain, for such a literal z:
Probability[free literal z survives one positive example] = 1 - Probability[free literal z is eliminated from h by one positive example] < (1 - epsilon/(2*n))
Probability[free literal z survives m positive examples] < (1 - epsilon/(2*n))^m   (the m examples are drawn independently)
Probability[some free literal z survives m positive examples] < 2n*(1 - epsilon/(2*n))^m < 2n*(e^(-epsilon/(2*n)))^m = 2n*e^(-(m*epsilon)/(2*n))   (union bound over the at most 2*n literals, then 1 - x <= e^(-x))
Requiring this failure probability to be at most delta, i.e. 2n*e^(-(m*epsilon)/(2*n)) <= delta, and solving for m gives m > (2*n/epsilon)*(ln(2*n) + ln(1/delta)). Since ln(2*n) <= n*ln(2), it is in particular sufficient to take
m > (2*n/epsilon)*(ln(1/delta)+n*ln(2))
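As a quick numerical illustration (the values n = 10, epsilon = 0.1, delta = 0.05 are arbitrary examples, not taken from the notes), the bound can be evaluated directly:

```python
import math

def sample_bound(n, epsilon, delta):
    """Number of positive examples sufficient by the bound above:
    m > (2n/epsilon) * (ln(1/delta) + n*ln(2))."""
    return math.ceil((2 * n / epsilon) * (math.log(1 / delta) + n * math.log(2)))

# 10 Boolean variables, at most 10% error, with probability at least 95%.
print(sample_bound(n=10, epsilon=0.1, delta=0.05))  # 1986
```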
What you should remember
- Relationships between sample complexity, error bound, and “capacity” of the chosen hypothesis space
- Within the PAC learning setting, we can bound the probability that the learner outputs a hypothesis with more than a given error (see the reference bounds below)
  - For ANY consistent learner (the case where c is in H)
  - For ANY “best fit” hypothesis (agnostic learning, where perhaps c is not in H)
- VC dimension as a measure of the complexity of H
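For reference, the two standard finite-|H| sample-complexity bounds from Mitchell's Chapter 7 that these bullets point to (quoted from the textbook, not derived in these notes) are:

```latex
% Consistent learner (c in H): with probability at least 1 - delta,
% every h in H consistent with the m examples has true error at most epsilon when
\[ m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right) \]
% Agnostic setting (perhaps c not in H): with probability at least 1 - delta,
% every h in H has true error within epsilon of its training error when
\[ m \;\ge\; \frac{1}{2\epsilon^{2}}\left(\ln|H| + \ln\frac{1}{\delta}\right) \]
```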