The Voted Perceptron for Ranking and Structured Classification William Cohen 3-6-2007
A few critique questions

• Why use a non-convergent method for computing expectations (for skip-CRFs)? Was that the only choice?
  – Sadly, the choice is: provably fast or provably convergent -- pick only one.
• Does it matter that the structure is different at different nodes in the skip-chain CRF?
  – Does it matter that some linear-chain nodes have only one neighbor?
  – Does it matter that some documents have 100 words and some have 1000?
• What is all the loopy BP stuff about, anyway?
  – See chapter 8 of Bishop's textbook for an introduction.

The voted perceptron

A sends B an instance xi. B computes ŷi = vk . xi and returns its sign as the guess; A replies with the true label yi. If B made a mistake: vk+1 = vk + yi xi.

[Figures: (1) a target u, with margin 2γ around it; (2) the guess v1 after one positive example +x1; (3a) the guess v2 after two positive examples, v2 = v1 + x2; (3b) the guess v2 after one positive and one negative example, v2 = v1 - x2. In (3a), the component of v2 along u is >γ.]

On-line to batch learning

1. Pick a vk at random according to mk/m, the fraction of examples it was used for.
2. Predict using the vk you just picked.
3. (Actually, use some sort of deterministic approximation to this.)

The voted perceptron for ranking

A sends B the instances x1, x2, x3, x4, …. B computes ŷi = vk . xi for each one and returns the index b* of the "best" xi; A replies with the correct index b. If B made a mistake: vk+1 = vk + xb - xb*.

[Figures: ranking some x's with the target vector u, which puts the top-ranked x above the rest by a margin γ; then ranking the same x's with some guess vector v, part 1 and part 2.]
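The mistake-driven update and the on-line-to-batch trick above can be sketched in a few lines of NumPy. This is a minimal illustration, not Freund & Schapire's exact implementation; the function names are made up for this sketch, and `predict_voted` is the deterministic approximation mentioned in step 3 (a vote weighted by each hypothesis's survival count mk, instead of sampling vk with probability mk/m):

```python
import numpy as np

def train_voted_perceptron(examples, epochs=1):
    """Train on (x, y) pairs with y in {-1, +1}; keep every hypothesis
    v_k together with its survival count m_k."""
    dim = len(examples[0][0])
    v = np.zeros(dim)
    survived = 0
    hypotheses = []                            # list of (v_k, m_k)
    for _ in range(epochs):
        for x, y in examples:
            if y * np.dot(v, x) <= 0:          # mistake
                hypotheses.append((v.copy(), survived))
                v = v + y * x                  # v_{k+1} = v_k + y_i x_i
                survived = 1
            else:
                survived += 1
    hypotheses.append((v.copy(), survived))    # keep the final hypothesis too
    return hypotheses

def predict_voted(hypotheses, x):
    """Deterministic stand-in for 'pick v_k with probability m_k/m':
    a majority vote weighted by each hypothesis's survival count."""
    total = sum(m * np.sign(np.dot(v, x)) for v, m in hypotheses)
    return 1 if total >= 0 else -1
```

Note that hypotheses that survived many examples dominate the vote, which is what sampling by mk/m achieves in expectation.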
[Figure: ranking some x's with some guess vector v, part 2. The purple-circled x is xb*; the green one is xb, the one A has chosen to rank highest.]

Correcting v by adding xb - xb*

[Figures: vk is corrected to vk+1 by adding xb - xb*. As in the binary case (panel 3a, v2 = v1 + x2), each correction moves v's component along u up by more than γ. Notice this doesn't depend at all on the number of x's being ranked.]

The voted perceptron for NER

Change number one: replace x with z. A sends B the instances z1, z2, z3, z4, …. B computes ŷi = vk . zi and returns the index b* of the "best" zi. If B made a mistake: vk+1 = vk + zb - zb*.

1. A sends B the Sha & Pereira paper and instructions for creating the instances:
   • A sends a word vector xi. Then B could create the instances F(xi, y) for each candidate label sequence y…
   • but instead B just returns the y* that gives the best score for the dot product vk . F(xi, y*), by using Viterbi.
2. A sends B the correct label sequence yi.
3. On errors, B sets vk+1 = vk + zb - zb* = vk + F(xi, yi) - F(xi, y*).

Collins' results
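The NER protocol above is Collins' structured perceptron: B decodes y* = argmax_y vk . F(xi, y) with Viterbi, and an error triggers v += F(xi, yi) - F(xi, y*). The following toy sketch makes one assumption not in the slides: a hypothetical feature set consisting of emission (tag, word) and transition (tag, tag) counts, stored sparsely in a `Counter`:

```python
from collections import Counter

def features(words, tags):
    """Global feature vector F(x, y): counts of emission (tag, word)
    and transition (prev_tag, tag) events, as a sparse Counter."""
    f = Counter()
    prev = "<s>"
    for w, t in zip(words, tags):
        f[("emit", t, w)] += 1
        f[("trans", prev, t)] += 1
        prev = t
    return f

def viterbi(words, tagset, v):
    """argmax over tag sequences of v . F(x, y), by dynamic programming."""
    # best[t] = (score, best tag sequence ending in t)
    best = {t: (v[("emit", t, words[0])] + v[("trans", "<s>", t)], [t])
            for t in tagset}
    for w in words[1:]:
        new = {}
        for t in tagset:
            score, seq = max((best[p][0] + v[("trans", p, t)], best[p][1])
                             for p in tagset)
            new[t] = (score + v[("emit", t, w)], seq + [t])
        best = new
    return max(best.values())[1]

def train_structured_perceptron(data, tagset, epochs=5):
    """Collins-style loop: decode with the current weights and, on an
    error, add the gold features and subtract the guessed ones."""
    v = Counter()
    for _ in range(epochs):
        for words, gold in data:
            guess = viterbi(words, tagset, v)
            if guess != gold:
                v.update(features(words, gold))     # + F(x_i, y_i)
                v.subtract(features(words, guess))  # - F(x_i, y*)
    return v
```

Because only F(xi, yi) and F(xi, y*) are ever touched, the update never enumerates the exponentially many candidate sequences -- the point of the "doesn't depend on the number of x's being ranked" observation. (Averaging the weights over updates would give the voted/averaged variant.)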