MWE
Re-examination of Association Measures for identifying Verb Particle and Light Verb Constructions
Supervisors: Dr. Kan Min-Yen, Dr. Su Nam Kim, (Dr. Timothy Baldwin)
Advisor: Lin Ziheng
Student: Hoang Huu Hung
25-Apr-20
Outline
Verb Particle and Light Verb Constructions
Motivations
Association Measures
MI and PMI
The high-frequency constituents
Context measures
Conclusions & Future work
Multiword Expressions
Multiple simplex words
Idiosyncratic
Lexically: ad hoc (ad?, hoc?)
Syntactically: by and large (prep. + conj. + adj.?)
Semantically: spill the beans
Statistically: strong coffee (powerful coffee?)
Obstacles to language understanding, translation, generation, etc.
Verb Particle and Light Verb Constructions
Verb Particle Constructions (VPCs): Verb + Particle(s)
bolster up, put off, put up with, get on with, cut short, let go
Light Verb Constructions (LVCs): Light verb + Complement
Light verbs: do, get, give, have, make, put, take
make a speech, give a demo
Syntactically flexible: inflections, passive, internal modifications
Extensive research has been carried out on this topic.
He has given many excellent speeches in his career.
Semantically:
Non-compositional: carry out, give up
(Semi-?)compositional: walk off, finish up
Compositional (meaning from the de-verbal noun): give a demo
Subtle meaning deviations: have a read vs. read
Identification of VPCs and LVCs
≠ Free combinations
VPCs: free verb + preposition (Leave it to me.)
LVCs: free light verb + noun (make a decision ≠ make drugs)
VPCs ≠ Prepositional verbs: look after, search for
VPCs:
Joint & split configurations: Look up the word / Look the word up / Look it up
No intervening manner adverb: Look up the word carefully / *Look carefully up the word
Prepositional verbs:
Only joint configuration: Look after your mum / Look after her
Flexible adverb positions: Look after your mum carefully / Look carefully after your mum
Outline
Verb Particle and Light Verb Constructions
Motivations
Association Measures
MI and PMI
The high-frequency constituents
Context measures
Conclusion & Future work
Motivation
Pecina, P. and Schlesinger, P.: Combining Association Measures for Collocation Extraction (COLING/ACL 2006)
An exhaustive list of 82 lexical association measures for bigrams
Lexical Association Measures (AM)
Mathematical formulae devised to capture the degree of association between the words of a phrase
• Association ~ dependence
• Degree ~ score
Input: statistical information. Output: scores
A comparison
Pecina and Schlesinger vs. our project:
Data: Czech bigram "collocations" (mixed idiomatic exps, LVCs, terminologies, stock phrases, ...) vs. English bigram VPCs and LVCs (separately and mixed)
Task: ranking extracted bigrams vs. ranking VPC & LVC candidates
Corpus: Prague Dependency Treebank (1.5 million words) vs. Wall Street Journal corpus (1 million words)
Metric: Average Precision (AP) for both
Approach: machine-learning based combination vs. analysis of AMs (categorization, modifications)
Gold-standard Evaluation data
Bigrams with frequency ≥ 6
413 WSJ-attested VPC candidates: (verb + particle) pairs; annotations from Baldwin, T. (2005)
100 WSJ-attested LVC candidates: (light verb + noun) pairs; annotations from Tan, Y. F., Kan, M. Y. and Cui, H. (2006)

Evaluation set            VPC      LVC     Mixed
Size                      413      100     513
Negative instances        296      72      368
Positive instances        117      28      145
% of positive instances   28.33%   28%     28.26%

A random ranker has an AP ~ 0.28
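Average Precision, the metric used throughout, can be sketched as follows (the list-of-labels input format is an assumption of this illustration):

```python
def average_precision(ranked_labels):
    """AP of a ranked candidate list: the mean of precision@k taken
    at each rank k where the candidate is a true positive."""
    hits, precisions = 0, []
    for k, is_positive in enumerate(ranked_labels, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# A ranker that places all positives first scores 1.0:
print(average_precision([True, True, False, False]))  # 1.0
```

A ranker that shuffles candidates at random on a set with ~28% positives has an expected AP close to 0.28, which is the baseline quoted above.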
Rank-equivalence
Idea: refer to AMs that have the same AP
Uses: simplification, categorization
Rank-equivalence over a set C: "ranking all members of C in the same way"
Notation: M1 =r M2; M(c): score assigned by M to instance c
Property: a strictly increasing transform preserves ranking, e.g. ad/bc =r ad/(ad + bc)
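Rank-equivalence can be checked directly by comparing the orderings two measures induce. The odds-ratio pair below (with its strictly increasing log transform) is an illustration, not necessarily the slide's original example:

```python
import math

def rank_equivalent(m1, m2, candidates):
    """M1 =r M2 over a set C iff both measures order C identically."""
    return (sorted(candidates, key=m1, reverse=True)
            == sorted(candidates, key=m2, reverse=True))

def odds_ratio(t):
    a, b, c, d = t          # contingency-table counts of a bigram
    return (a * d) / (b * c)

def log_odds_ratio(t):
    return math.log(odds_ratio(t))  # log is strictly increasing

tables = [(10, 2, 3, 40), (5, 5, 5, 5), (8, 1, 2, 20)]
print(rank_equivalent(odds_ratio, log_odds_ratio, tables))  # True
```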
Example of AMs
Contingency table of bigram (x y):
a = f(xy), b = f(x ¬y), c = f(¬x y), d = f(¬x ¬y)
¬w: any word except w; *: any word; f(.): frequency of (.); N: total number of bigrams
Null hypothesis of independence: expected frequency f̂(xy) = f(x*) f(*y) / N
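A sketch of how the contingency counts and the expected frequency under independence are computed (the toy corpus is invented for illustration):

```python
from collections import Counter

def contingency(bigrams, x, y):
    """Counts a = f(xy), b = f(x ¬y), c = f(¬x y), d = f(¬x ¬y)
    from a list of (first word, second word) bigram tokens."""
    counts = Counter(bigrams)
    N = len(bigrams)
    a = counts[(x, y)]
    b = sum(n for (u, v), n in counts.items() if u == x and v != y)
    c = sum(n for (u, v), n in counts.items() if u != x and v == y)
    d = N - a - b - c
    # Expected frequency under the null hypothesis of independence:
    # f̂(xy) = f(x*) f(*y) / N
    expected = (a + b) * (a + c) / N
    return a, b, c, d, expected

corpus = [("look", "up"), ("look", "up"), ("look", "after"), ("give", "up")]
print(contingency(corpus, "look", "up"))  # (2, 1, 1, 0, 2.25)
```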
Outline
Verb Particle and Light Verb Constructions
Motivations
Association Measures
MI and PMI
The high-frequency constituents
Context measures
Conclusion & Future work
Categorization: 4 main groups
Group 1: Dependence ~ Reduction in uncertainty
MI and PMI, Salience, etc.
Group 2: Dependence ~ set similarity; ↑↓ marginal frequencies
Dice, Minimum Sensitivity, Laplace, Sokal-Michiner, Odds ratio, etc.
Group 3: Compare observed frequency with expected frequency
Null hypothesis of independence: t-test, z-test, Pearson's chi-squared, Fisher's exact test, etc.
Group 4: Dependence ~ Non-compositionality
Non-compositionality ~ context similarity
• Cosine similarity in tf.idf space, idf space, etc.
Non-compositionality ↑↓ context entropy
MI and PMI
Mutual Information (MI)
MI(U; V) = Σ_u Σ_v P(uv) log [ P(uv) / (P(u) P(v)) ]
Reduction in uncertainty of U given knowledge of V
Empirical estimate: MI = (1/N) Σ_ij f_ij log ( f_ij / f̂_ij )
Point-wise MI (PMI): "MI at a specific point"
PMI(x, y) = log [ P(xy) / (P(x) P(y)) ]
2 known drawbacks
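PMI from contingency counts, as a minimal sketch (base-2 logs assumed):

```python
import math

def pmi(a, b, c, d):
    """PMI(x, y) = log2 [ P(xy) / (P(x) P(y)) ] from contingency counts:
    P(xy) = a/N, P(x) = (a+b)/N, P(y) = (a+c)/N."""
    N = a + b + c + d
    return math.log2((a / N) / (((a + b) / N) * ((a + c) / N)))

# Perfect dependence (x and y always co-occur) gives a high score;
# independence gives 0:
print(pmi(10, 0, 0, 90))  # log2(10) ≈ 3.32
print(pmi(1, 9, 9, 81))   # 0.0
```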
The first drawback
PMI with the joint probability raised to a power k: log [ P(xy)^k / (P(x) P(y)) ]
Higher performance for k in [0, 1] than for k in [2, 100]
[Table: AP of this PMI^k family for several values of k on VPCs, LVCs and Mixed; the extracted numbers could not be realigned]
The second drawback
Besides the degree of dependence:
MI grows with entropy
PMI varies with frequency
Mathematically, under perfect dependence P(xy) = P(x) = P(y):
PMI(x, y) = log ( 1 / P(x) ) and MI = P(x) log ( 1 / P(x) )
Comparing such scores directly is therefore not appropriate!
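The frequency sensitivity of PMI can be seen numerically (a small illustration, assuming base-2 logs):

```python
import math

# Under perfect dependence, P(xy) = P(x) = P(y) = p, so
# PMI = log2(p / p^2) = log2(1 / p): the score is driven by frequency
# alone, even though the degree of dependence is identical.
def pmi_perfect(p):
    return math.log2(p / (p * p))

print(pmi_perfect(0.1))    # ~3.32: frequent pair
print(pmi_perfect(0.001))  # ~9.97: rare pair scores much higher
```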
Proposed Solution
Normalizing scores so that AMs share the same unit
Proposed normalization factor (NF): a function of P(x) and P(y) that maps scores into [0, 1]
Variant: NF-α, parameterized on P(x) and P(y)
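As a sketch of the normalization idea, one standard choice (an assumption for illustration, not necessarily the slide's NF) divides PMI by -log P(xy), which pins perfectly dependent pairs at 1 regardless of their frequency:

```python
import math

def normalized_pmi(p_xy, p_x, p_y):
    """A common normalization (illustrative; not necessarily the
    slide's NF): PMI / -log P(xy), bounded in [-1, 1]."""
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)

# Perfect dependence now scores 1.0 at any frequency:
print(normalized_pmi(0.1, 0.1, 0.1))        # 1.0
print(normalized_pmi(0.001, 0.001, 0.001))  # 1.0
```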
Against high-frequency constituents
[M35] Simpson: a / (a + min(b, c))
Modified variant: a / (a + max(b, c))
Insight: penalizing the more productive constituent
Confirmed by [M49] Laplace and [M41] S cost
[Table: AP of these AMs on VPCs, LVCs and Mixed; extracted values 0.478, 0.249, 0.578, 0.382, 0.486, 0.260, 0.577, 0.493, 0.241, 0.388, 0.254; row/column alignment not recoverable]
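A sketch of Simpson-style scores with min vs. max in the denominator (the exact slide formulas are partly garbled in extraction, so treat these as assumptions consistent with the stated insight):

```python
def simpson_min(a, b, c):
    """[M35] Simpson: a / (a + min(b, c))."""
    return a / (a + min(b, c))

def simpson_max(a, b, c):
    """Modified variant: a / (a + max(b, c)), penalizing the more
    productive (higher marginal frequency) constituent."""
    return a / (a + max(b, c))

# A pair whose second word is very productive (large c) keeps a high
# min-based score but is penalized by the max-based variant:
print(simpson_min(10, 2, 50))  # ~0.833
print(simpson_max(10, 2, 50))  # ~0.167
```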
Against high-frequency constituents
[M18] Sokal-Michiner: (a + d) / (a + b + c + d)
Variant penalizing one constituent: a + d - max(b, c)
Better to penalize both constituents?
Proposed modification involving both b and c
[Table: AP on VPCs, LVCs and Mixed; extracted values 0.565, 0.540, 0.433, 0.546, 0.519; alignment not recoverable]
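A sketch of the "penalize both constituents" idea; subtracting both b and c is an assumed reading of the garbled formula, not a confirmed reconstruction:

```python
def sokal_michiner(a, b, c, d):
    """[M18] Sokal-Michiner: (a + d) / (a + b + c + d)."""
    return (a + d) / (a + b + c + d)

def penalize_both(a, b, c, d):
    """Assumed variant penalizing both marginal frequencies:
    (a + d - b - c) / (a + b + c + d)."""
    return (a + d - b - c) / (a + b + c + d)

print(sokal_michiner(5, 1, 1, 5))  # 10/12
print(penalize_both(5, 1, 1, 5))   # 8/12
```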
Context-based Measures
Non-compositionality of (x y):
Context of (x y) ≠ context of x and y
Context of x ≠ context of y
E.g.: Dutch courage, hot dog
Context as a distribution of words:
Relative entropy (KL divergence), Jensen-Shannon divergence, Dice similarity
Context as a point/vector in R^N:
Euclidean, Manhattan, etc. distances; angle distance (cosine similarity)
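Cosine similarity of two context vectors, as a minimal sketch:

```python
import math

def cosine(u, v):
    """Angle-based similarity between two context vectors in R^N."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0

print(cosine([1, 2], [2, 4]))  # 1.0: same direction
print(cosine([1, 0], [0, 1]))  # 0.0: orthogonal contexts
```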
Representation of context
Context of z: C_z = (w_1, w_2, ..., w_n)
Common representation schemes:
Tf.idf: tf(w_i) . idf(w_i), with idf(w_i) = log [ N / df(w_i) ]
Scaled tf: tf rescaled into [0.5, 1] (Salton and Buckley, 1987)
Dice similarity: dice(c_x, c_y) = 2 Σ_i x_i y_i / ( Σ_i x_i² + Σ_i y_i² )
Dice in (scaled tf).idf space: AP 0.568 (VPCs), 0.488 (LVCs), 0.553 (Mixed)
[Two further extracted AP values, 0.367 and 0.374, could not be aligned to a row]
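A sketch of the (scaled tf).idf weighting and Dice similarity (function names are illustrative):

```python
import math

def scaled_tf_idf(tf, max_tf, N, df):
    """Weight in (scaled tf).idf space: tf is rescaled into [0.5, 1]
    (after Salton and Buckley, 1987), then multiplied by idf = log(N/df)."""
    return (0.5 + 0.5 * tf / max_tf) * math.log(N / df)

def dice(u, v):
    """Dice similarity: 2 * sum(u_i v_i) / (sum(u_i^2) + sum(v_i^2))."""
    num = 2 * sum(x * y for x, y in zip(u, v))
    den = sum(x * x for x in u) + sum(y * y for y in v)
    return num / den if den else 0.0

print(dice([1, 1], [1, 1]))  # 1.0: identical context vectors
```

The scaling keeps even a single occurrence of a context word at weight ≥ 0.5, which damps the dominance of very frequent context words.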
Outline
Verb Particle and Light Verb Constructions
Motivations
Association Measures
MI and PMI
The high-frequency constituents
Context measures
Conclusion & Future work
Conclusions
The 82 AMs: 4 main groups, by meaning and rank-equivalence
Group 1: Dependence ~ Reduction in uncertainty: effective
Group 2: Dependence ~ set similarity, marginal frequencies: simple but most effective
Group 3: Compare observed frequency with expected frequency: not effective
Group 4: Non-compositionality ~ context similarity, entropy: compromised by the ubiquity of particles and light verbs
Conclusions
Co-occurrence frequency f(xy):
Not useful for VPCs (AP 0.13); OK for LVCs (AP 0.85)
Marginal frequencies f(x*) and f(*y):
Effective to discriminate against high-frequency constituents
Useful discriminative units: VPCs: -b, -c; LVCs: 1/(bc)
MI and PMI:
An indicator of independence, not of dependence (Manning and Schutze, 1999, p. 67)
As an indicator of dependence: PMI normalized for VPCs; MI normalized for VPCs and LVCs
Tf.idf:
Effective to normalize tf to [0.5, 1] (Salton and Buckley, 1987)
Future work
More types of VPC particles:
Adjective: cut short, put straight
Verb: let go, make do
Trigram models:
Phrasal-prepositional verbs (verb + adverb + preposition): look forward to, get away with
Idioms: kick the bucket, spill the beans
Adaptation of bigram AMs
A larger corpus and evaluation data set
References
Baldwin, T. (2005). The deep lexical acquisition of English verb-particle constructions. Computer Speech and Language, Special Issue on Multiword Expressions, 19(4):398-414.
Evert, S. (2004). The Statistics of Word Cooccurrences: Word Pairs and Collocations. Ph.D. dissertation, University of Stuttgart.
Kim, S. N. (2008). Statistical Modeling of Multiword Expressions. Ph.D. thesis, University of Melbourne, Australia.
Lin, D. (1999). Automatic identification of non-compositional phrases. In Proc. of the 37th Annual Meeting of the ACL.
Manning, C. D. and Schutze, H. (1999). Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts.
Pecina, P. and Schlesinger, P. (2006). Combining association measures for collocation extraction. COLING/ACL 2006.
Tan, Y. F., Kan, M. Y. and Cui, H. (2006). Extending corpus-based identification of light verb constructions using a supervised learning framework. EACL 2006 Workshop on Multiword Expressions in a Multilingual Context.
Zhai, C. (1997). Exploiting context to identify lexical atoms: A statistical view of linguistic context. In International and Interdisciplinary Conference on Modelling and Using Context (CONTEXT-97).
Q&A
Bigram idioms: in all, after all, later on, as such
Types of MWEs
MWEs: Lexicalized phrases (Fixed exps, Semi-fixed exps, Syntactically-flexible exps) and Institutionalized phrases
Subtypes include Idioms, NCs, PP-Ds, LVCs, VPCs
Pecina (2005, 2006, 2008)
Combination of 82 association measures
Statistical tests: mutual information, statistical independence, likelihood measures
Semantic tests:
Entropy of immediate context
• Immediate context: the immediately preceding/following words
Diversity of empirical context
• Empirical context: words within a certain specified window
Pecina (2005, 2006, 2008)
Combination of 82 association measures
Result:
MAP: 80.81%
"Equivalent" measures reduced to 17
Issues:
Possible conflicting predictions
• Is combining all measures best?
Unclear linguistic and/or statistical significance
Other linguistics-driven tests
Are MWEs lexically fixed?
Substitutability test (Lin, 1999)
Are MWEs order-specific?
Permutation entropy (PE) (Zhang et al., 2006)
Entropy of Permutation and Insertion (EPI) (Aline et al., 2008)
• Not all permutations are valid
• "Permutation" ~ "syntactic variants"
Others?