The Voted Perceptron for Ranking and Structured Classification William Cohen 3-6-2007

A few critique questions

• Why use a non-convergent method for computing expectations (for skip-CRFs)? Was that the only choice?
  – Sadly, the choice is: provably fast or provably convergent -- pick only one.
• Does it matter that the structure is different at different nodes in the skip-chain CRF?
  – Does it matter that some linear-chain nodes have only one neighbor?
  – Does it matter that some documents have 100 words and some have 1000?
• What is all the loopy BP stuff about, anyway?
  – See Chapter 8 of Bishop's textbook for an introduction.
The voted perceptron

A sends B an instance xi; B replies with a prediction ŷi; A then reveals the true label yi.

Compute: ŷi = sign(vk · xi)
If mistake: vk+1 = vk + yi xi
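The update rule above can be sketched in code, keeping each weight vector together with its survival count for the voting step later (a minimal sketch; all names are illustrative, and the prediction is taken as the sign of vk · xi):

```python
# A minimal sketch of the voted-perceptron update from the slide,
# using plain Python lists as vectors (names are illustrative).

def dot(u, x):
    return sum(ui * xi for ui, xi in zip(u, x))

def train_voted_perceptron(examples, epochs=1):
    """examples: list of (x, y) with y in {-1, +1}.
    Returns the list of (v_k, m_k): each weight vector and the
    number of examples it survived (used later for voting)."""
    dim = len(examples[0][0])
    v = [0.0] * dim          # v_1 = 0
    survivors = []           # [(v_k, m_k), ...]
    m = 0
    for _ in range(epochs):
        for x, y in examples:
            # Compute: y_hat = sign(v_k . x_i)
            y_hat = 1 if dot(v, x) >= 0 else -1
            if y_hat != y:
                # If mistake: v_{k+1} = v_k + y_i x_i
                survivors.append((v[:], m))
                v = [vi + y * xi for vi, xi in zip(v, x)]
                m = 0
            else:
                m += 1
    survivors.append((v[:], m))
    return survivors
```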
[Diagram: (1) a target vector u, with a margin of 2γ between u and −u. (2) The guess v1 after one positive example, +x1. (3a) The guess v2 after the two positive examples: v2 = v1 + x2. (3b) The guess v2 after the one positive and one negative example: v2 = v1 − x2. The >γ annotation marks the guess's progress along u after each mistake.]
On-line to batch learning

1. Pick a vk at random according to mk/m, the fraction of examples it was used for.
2. Predict using the vk you just picked.
3. (In practice, use some sort of deterministic approximation to this, e.g. letting every vk cast mk votes.)
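Both the randomized rule and its deterministic approximation can be sketched as follows, assuming `survivors` is a list of `(v_k, m_k)` pairs collected during training (function names are illustrative):

```python
# Two ways to turn the stored (v_k, m_k) pairs into a batch predictor.
import random

def dot(u, x):
    return sum(ui * xi for ui, xi in zip(u, x))

def predict_randomized(survivors, x, rng=random):
    # Steps 1-2: pick a v_k with probability m_k / m, predict with it.
    weights = [m for _, m in survivors]
    v, _ = rng.choices(survivors, weights=weights)[0]
    return 1 if dot(v, x) >= 0 else -1

def predict_voted(survivors, x):
    # Step 3: the deterministic approximation -- each v_k casts m_k votes.
    s = sum(m * (1 if dot(v, x) >= 0 else -1) for v, m in survivors)
    return 1 if s >= 0 else -1
```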
The voted perceptron for ranking

A sends B the instances x1, x2, x3, x4, …; B replies with the index b* of its top-ranked instance; A then reveals the index b of the correct "best" one.

Compute: ŷi = vk · xi
Return: the index b* of the "best" xi
If mistake: vk+1 = vk + xb − xb*
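One round of this A/B exchange can be sketched as (a minimal sketch; names are illustrative):

```python
# One round of the ranking perceptron: B ranks the candidates x_1..x_n
# by v_k . x_i and returns the argmax b*; if A's correct index b
# differs, the weight vector is corrected.

def dot(u, x):
    return sum(ui * xi for ui, xi in zip(u, x))

def rank_and_update(v, xs, b):
    """xs: list of candidate vectors; b: index of the correct 'best' one.
    Returns (b_star, new_v)."""
    # Compute y_i = v_k . x_i and return the index of the best x_i
    b_star = max(range(len(xs)), key=lambda i: dot(v, xs[i]))
    if b_star != b:
        # If mistake: v_{k+1} = v_k + x_b - x_{b*}
        v = [vi + xb - xbs for vi, xb, xbs in zip(v, xs[b], xs[b_star])]
    return b_star, v
```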
[Diagram: ranking some x's with the target vector u, which separates the top-ranked x from the rest by a margin γ.]

[Diagram: ranking some x's with some guess vector v (parts 1 and 2). The purple-circled x is xb*; the green one is xb, the one A has chosen to rank highest.]

[Diagram: correcting v by adding xb − xb*, giving vk+1.]
[Diagram: the same argument as in the classification case -- the guess v2 after the two positive examples, v2 = v1 + x2, makes more than γ of progress along u per mistake.]

Notice this doesn't depend at all on the number of x's being ranked.
Change number one: replace x with z
The voted perceptron for NER

A sends B the instances z1, z2, z3, z4, …; B replies with the index b* of its top-ranked instance; A then reveals the index b of the correct one.

Compute: ŷi = vk · zi
Return: the index b* of the "best" zi
If mistake: vk+1 = vk + zb − zb*
1. A sends B the Sha & Pereira paper and instructions for creating the instances:
   • A sends a word vector xi. Then B could create the instances F(xi,y), …
   • but instead B just returns the y* that gives the best score for the dot product vk · F(xi,y*), found using Viterbi.
2. A sends B the correct label sequence yi.
3. On errors, B sets vk+1 = vk + zb − zb* = vk + F(xi,yi) − F(xi,y*)
1. A sends a word vector xi.
2. B just returns the y* that gives the best score for vk · F(xi,y*).
3. A sends B the correct label sequence yi.
4. On errors, B sets vk+1 = vk + zb − zb* = vk + F(xi,yi) − F(xi,y*)
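The protocol above can be sketched end-to-end with a tiny Viterbi decoder. Here F(x,y) counts (word, tag) emissions and (tag, tag) transitions; the tag set, feature template, and all names are illustrative assumptions, not from the slides:

```python
# Sketch of one structured-perceptron step: B decodes
# y* = argmax_y v . F(x, y) with Viterbi; on a mistake,
# v <- v + F(x, y_i) - F(x, y*).  Tag set and features are
# illustrative assumptions.
from collections import Counter

TAGS = ["O", "NAME"]

def F(words, tags):
    # Feature map: counts of (word, tag) emissions and (tag, tag) transitions.
    feats = Counter()
    prev = "<s>"
    for w, t in zip(words, tags):
        feats[("emit", w, t)] += 1
        feats[("trans", prev, t)] += 1
        prev = t
    return feats

def viterbi(v, words):
    # Best tag sequence under v . F(words, .), by dynamic programming.
    best = {t: (v.get(("emit", words[0], t), 0.0)
                + v.get(("trans", "<s>", t), 0.0), [t]) for t in TAGS}
    for w in words[1:]:
        nxt = {}
        for t in TAGS:
            s, path = max(
                (best[p][0] + v.get(("trans", p, t), 0.0), best[p][1])
                for p in TAGS)
            nxt[t] = (s + v.get(("emit", w, t), 0.0), path + [t])
        best = nxt
    return max(best.values())[1]

def perceptron_step(v, words, gold_tags):
    # B returns y*; on error, v_{k+1} = v_k + F(x, y_i) - F(x, y*).
    y_star = viterbi(v, words)
    if y_star != gold_tags:
        for f, c in F(words, gold_tags).items():
            v[f] = v.get(f, 0.0) + c
        for f, c in F(words, y_star).items():
            v[f] = v.get(f, 0.0) - c
    return y_star
```

After one corrective step on a sentence, the decoder recovers its gold tagging, mirroring the mistake-driven updates above.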
Collins’ results