Difference between revisions of "CYK Parsing"
From Cohen Courses
Jump to navigationJump to searchm |
m |
||
Line 14: | Line 14: | ||
for each production A -> a | for each production A -> a | ||
if a == X[i] | if a == X[i] | ||
− | add A -> a to C[i,i] | + | add (A -> a, 0) to C[i,i] |
for each L = 2 to n | for each L = 2 to n | ||
Line 21: | Line 21: | ||
for each production A -> A1 A2 | for each production A -> A1 A2 | ||
if A1 in C[i,i+j] and A2 in C[i+j+1,i+L] | if A1 in C[i,i+j] and A2 in C[i+j+1,i+L] | ||
− | add A -> A1 A2 to C[i,i+L] | + | add (A -> A1 A2, i+j) to C[i,i+L] |
</pre> | </pre> | ||
+ | |||
+ | We can recover the parse trees by traversing through the chart, which contains the production rule used and the point where the subtrees are split. The probability of a parse tree is simply the product of the probability of each production rule that we have used. |
Revision as of 17:53, 30 October 2011
This is a Method page for CYK Parsing.
The Cocke–Younger–Kasami (CYK) algorithm is an algorithm for parsing PCFG. It is bottom up and makes use of dynamic programming.
Description
The input to the algorithm is a grammar in Chomsky Normal Form, and a sentence . The idea behind the parsing algorithm is to recursively build parses from bottom up, and mantaining a chart C[i, j] which contains productions that generates X[i] ... X[j].
Pseudocode
for each i = 1 to n for each production A -> a if a == X[i] add (A -> a, 0) to C[i,i] for each L = 2 to n for each i = 1 to n-L+1 for each j = 1 to L - 1 for each production A -> A1 A2 if A1 in C[i,i+j] and A2 in C[i+j+1,i+L] add (A -> A1 A2, i+j) to C[i,i+L]
We can recover the parse trees by traversing through the chart, which contains the production rule used and the point where the subtrees are split. The probability of a parse tree is simply the product of the probability of each production rule that we have used.