MWE

Re-examination of Association Measures for Identifying Verb Particle and Light Verb Constructions

Supervisors: Dr. Kan Min-Yen, Dr. Su Nam Kim, (Dr. Timothy Baldwin)
Advisor: Lin Ziheng
Student: Hoang Huu Hung
25-Apr-20

Outline

Verb Particle and Light Verb Constructions

Motivations

Association Measures

MI and PMI
The high-frequency constituents
Context measures

Conclusions & Future work


Multiword Expressions

Multiple simplex words

Idiosyncratic:
 Lexically: ad hoc (ad?, hoc?)
 Syntactically: by and large (prep. + conj. + adj.?)
 Semantically: spill the beans
 Statistically: strong coffee (powerful coffee?)

Obstacles to language understanding, translation, generation, etc.


Verb Particle and Light Verb Constructions

Verb Particle Constructions: verb + particle(s)
 bolster up, put off
 put up with, get on with
 cut short, let go

Light Verb Constructions: light verb + complement
 Light verbs: do, get, give, have, make, put, take
 make a speech, give a demo

Syntactically flexible: inflections, passive, internal modifications
 Extensive research has been carried out on this topic.
 He has given many excellent speeches in his career.

Semantically:
 Non-compositional: carry out, give up
 (Semi-?)compositional: walk off, finish up
 Compositional: meaning from the de-verbal noun
  give a demo
 Subtle meaning deviations: have a read vs. read


Identification of VPCs and LVCs

≠ Free combinations
 VPCs: free verb + preposition: Leave it to me.
 LVCs: free light verb + noun: make a decision ≠ make drugs

VPCs ≠ prepositional verbs (look after, search for)

VPCs
 Joint and split configurations: Look up the word / Look the word up / Look it up
 No intervening manner adverb: Look up the word carefully / *Look carefully up the word

Prepositional verbs
 Only the joint configuration: Look after your mum / Look after her
 Flexible adverb positions: Look after your mum carefully / Look carefully after your mum


Outline

Verb Particle and Light Verb Constructions

Motivations

Association Measures

MI and PMI
The high-frequency constituents
Context measures

Conclusion & Future work


Motivation

Pecina, P. and Schlesinger, P. Combining Association Measures for Collocation Extraction (COLING/ACL 2006)
 An exhaustive list of 82 lexical association measures for bigrams

Lexical Association Measures (AMs)
 Mathematical formulae devised to capture the degree of association between the words of a phrase
  Association ~ dependence
  Degree ~ score
 Input: statistical information. Output: scores

A comparison

Pecina and Schlesinger                          | Our project
Czech bigram "collocations"                     | English bigram VPCs and LVCs
 (mixed idiomatic exps, LVCs, terminologies,    |  (separately and mixed)
 stock phrases, …)                              |
Ranking extracted bigrams                       | Ranking VPC and LVC candidates
Prague Dependency Treebank (1.5 million words)  | Wall Street Journal corpus (1 million words)
Average Precision (AP)                          | AP
Machine-learning based combination              | Analysis of AMs: categorization, modifications

Gold-standard Evaluation data

  

413 WSJ-attested VPC candidates (bigrams with frequency ≥ 6)
 Candidates: (verb + particle) pairs
 Annotations from Baldwin, T. (2005)

100 WSJ-attested LVC candidates
 Candidates: (light verb + noun) pairs
 Annotations from Tan, Y. F., Kan, M. Y. and Cui, H. (2006)

Evaluation set             VPC      LVC    Mixed
 Size                      413      100    513
 Negative instances        296      72     368
 Positive instances        117      28     145
 % of positive instances   28.33%   28%    28.26%

A random ranker has an AP ≈ 0.28
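The evaluation metric used throughout, Average Precision over a ranked candidate list, can be sketched as follows; a random ranker's expected AP is roughly the positive rate (≈ 0.28 here). The label lists below are invented for illustration, not the WSJ gold standard.

```python
# Sketch: Average Precision (AP) of a ranked list of candidates,
# where each entry is 1 (true VPC/LVC) or 0 (negative instance).

def average_precision(ranked_labels):
    """AP = mean of precision@k over the ranks k that hold a positive."""
    hits, precision_sum = 0, 0.0
    for k, is_positive in enumerate(ranked_labels, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0

# A perfect ranker puts all positives first; a poor one puts them last.
assert average_precision([1, 1, 0, 0]) == 1.0
assert abs(average_precision([0, 0, 1, 1]) - 5 / 12) < 1e-9
```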


Rank-equivalence

Idea: group AMs that have the same AP because they rank candidates identically
 Simplification
 Categorization

Rank-equivalence over a set C
 "Ranking all members of C in the same way"
 Notation: M1 =r M2
 M(c): score assigned by measure M to instance c

Property: a strictly increasing transform of a measure is rank-equivalent to it,
 e.g. ad/bc =r log(ad/bc)
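Rank-equivalence can be checked mechanically: two measures are rank-equivalent over a candidate set if sorting by either score produces the same order. A minimal sketch with made-up contingency counts, using the odds ratio ad/bc and its logarithm:

```python
# Sketch: the odds ratio ad/bc and log(ad/bc) rank candidates identically,
# because log is strictly increasing.  The (a, b, c, d) tables are invented.
import math

tables = [(30, 5, 4, 961), (12, 40, 33, 915), (7, 2, 90, 901)]

odds    = [a * d / (b * c) for a, b, c, d in tables]
log_odd = [math.log(x) for x in odds]

def ranking(scores):
    # indices of candidates, best score first
    return sorted(range(len(scores)), key=lambda i: -scores[i])

# Same ranking -> same Average Precision on any gold standard.
assert ranking(odds) == ranking(log_odd)
```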


Example of AMs

Contingency table of bigram (x y):

             y          ¬y
  x        f(xy)      f(x¬y)     f(x*)
  ¬x       f(¬xy)     f(¬x¬y)    f(¬x*)
           f(*y)      f(*¬y)     N

¬w: any word except w; *: any word; f(·): frequency of (·); N: total number of bigrams
Null hypothesis of independence: f̂(xy) = f(x*) f(*y) / N
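A minimal sketch of deriving the table cells (a, b, c, d) and the expected joint frequency under the independence null hypothesis from marginal counts; the counts below are illustrative, not from the WSJ data:

```python
# Sketch: bigram contingency table (a, b, c, d) from corpus counts, and
# the expected joint frequency under independence, f_hat(xy) = f(x*) f(*y) / N.

def contingency(f_xy, f_x_any, f_any_y, n):
    a = f_xy              # f(x y)
    b = f_x_any - f_xy    # f(x ¬y)
    c = f_any_y - f_xy    # f(¬x y)
    d = n - a - b - c     # f(¬x ¬y)
    return a, b, c, d

def expected_joint(f_x_any, f_any_y, n):
    return f_x_any * f_any_y / n

a, b, c, d = contingency(f_xy=30, f_x_any=120, f_any_y=80, n=10_000)
assert (a, b, c, d) == (30, 90, 50, 9830)
# Observed 30 vs. expected 0.96: far more co-occurrence than chance.
assert expected_joint(120, 80, 10_000) == 0.96
```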


Outline

Verb Particle and Light Verb Constructions

Motivations

Association Measures

MI and PMI
The high-frequency constituents
Context measures

Conclusion & Future work


Categorization: 4 main groups

 

Group 1: Dependence ~ reduction in uncertainty
 MI and PMI, Salience, etc.

Group 2: Dependence ~ set similarity ↑↓ marginal frequencies
 Dice, Minimum Sensitivity, Laplace, Sokal-Michener, Odds ratio, etc.

Group 3: Compare observed frequency with expected frequency under the null hypothesis of independence
 t, z tests; Pearson's chi-squared, Fisher's exact tests, etc.

Group 4: Dependence ~ non-compositionality
 Non-compositionality ~ context similarity (cosine similarity in tf.idf space, idf space, etc.)
 Non-compositionality ↑↓ context entropy

MI and PMI

Mutual Information (MI)
 MI(U; V) = Σ_u Σ_v P(u, v) log [ P(u, v) / (P(u) P(v)) ]
 Reduction in uncertainty of U given knowledge of V
 Empirical estimate: MI = (1/N) Σ_ij f_ij log (f_ij / f̂_ij)

Pointwise MI (PMI)
 "MI at a specific point"
 PMI(x, y) = log [ P(xy) / (P(x*) P(*y)) ] = log [ N f(xy) / (f(x*) f(*y)) ]

2 known drawbacks
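The pointwise form can be computed directly from the contingency counts; a small sketch with invented counts:

```python
# Sketch: PMI(x, y) = log [ N f(xy) / (f(x*) f(*y)) ] from bigram counts.
import math

def pmi(f_xy, f_x_any, f_any_y, n):
    return math.log((n * f_xy) / (f_x_any * f_any_y))

# A "carry out"-style pair co-occurs far more often than chance: PMI > 0.
assert pmi(30, 120, 80, 10_000) > 0
# A pair whose joint count equals the expected count scores PMI ≈ 0.
assert abs(pmi(1, 100, 100, 10_000)) < 1e-9
```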


The first drawback

PMI is known to over-score low-frequency bigrams. Raising the joint probability to a power k damps this bias:
 PMI_k(x, y) = log [ P(xy)^k / (P(x*) P(*y)) ]
Higher performance with k ∈ [0, 1] than with k ∈ [2, 100]
(AP comparison over VPCs, LVCs and Mixed for several values of k and for the joint probability alone; reported scores range from about 0.17 to 0.55)


The second drawback

Besides the degree of dependence,
 MI grows with entropy
 PMI grows with frequency

Mathematically, the attainable maxima depend on the marginals:
 MI(U; V) ≤ min(H(U), H(V))
 PMI(x, y) ≤ min( log 1/P(x*), log 1/P(*y) )

Comparing these raw scores is therefore not appropriate


Proposed Solution

Normalizing scores
 So that all scores share the same unit

Proposed normalization factor (NF)
 NF built from the marginal probabilities P(x*) and P(*y), so that the normalized score falls in [0, 1]
 NF-α variant: weighting the two marginal terms by a parameter α ∈ [0, 1]
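The deck's exact NF formula is not fully legible in this transcript; as an assumed stand-in, the sketch below divides PMI by min(-log P(x*), -log P(*y)), which is the maximum PMI can reach when x and y always co-occur, so a perfectly dependent pair scores 1 and an independent pair scores 0:

```python
# Sketch: one way to normalize PMI onto a shared scale (an assumed NF,
# not necessarily the presentation's exact formula).
import math

def normalized_pmi(p_xy, p_x, p_y):
    pmi = math.log(p_xy / (p_x * p_y))
    nf = min(-math.log(p_x), -math.log(p_y))  # upper bound on PMI
    return pmi / nf

# Perfectly dependent pair (p_xy == p_x == p_y) -> score 1.
assert abs(normalized_pmi(0.01, 0.01, 0.01) - 1.0) < 1e-9
# Independent pair (p_xy == p_x * p_y) -> score 0.
assert abs(normalized_pmi(0.0004, 0.02, 0.02)) < 1e-9
```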


Against high-frequency constituents

[M35] Simpson: a / (a + min(b, c))
 Modified variant: a / (a + max(b, c))
 (AP comparison over VPCs, LVCs and Mixed; reported scores include 0.478, 0.578, 0.486, 0.382, 0.260, 0.249)

Insight: penalizing the more productive constituent
 Confirmed by [M49] Laplace and [M41] S cost

[M49] Laplace: (a + 1) / (a + b + 2) and the analogous variant over c
 (AP over VPCs, LVCs and Mixed; reported scores include 0.577, 0.493, 0.241, 0.388, 0.254)

Against high-frequency constituents (cont.)

[M18] Sokal-Michener: (a + d) / N =r −(b + c): penalizes both constituents
 Compare −max(b, c), which penalizes only the more frequent constituent (AP 0.540, 0.433, …)

Better to penalize both constituents?
 Proposed modification along this line
 (AP over VPCs, LVCs and Mixed; reported scores include 0.565, 0.546, 0.519)
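Several of these "group 2" measures are one-liners over the contingency counts; a sketch with illustrative counts, using the standard definitions:

```python
# Sketch: a few group-2 association measures computed from (a, b, c, d).

def dice_am(a, b, c, d):          # Dice: 2a / (2a + b + c)
    return 2 * a / (2 * a + b + c)

def sokal_michener(a, b, c, d):   # Sokal-Michener: (a + d) / N
    return (a + d) / (a + b + c + d)

def simpson(a, b, c, d):          # Simpson: a / (a + min(b, c))
    return a / (a + min(b, c))

a, b, c, d = 30, 90, 50, 9830
assert abs(dice_am(a, b, c, d) - 60 / 200) < 1e-9
assert sokal_michener(a, b, c, d) == 9860 / 10_000
assert simpson(a, b, c, d) == 30 / 80
```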


Context-based Measures

Non-compositionality of (x y):
 Context of (x y) ≠ context of x and of y
 Context of x ≠ context of y
 E.g. Dutch courage, hot dog

Context as a distribution of words
 Relative entropy (KL divergence), Jensen-Shannon divergence
 Dice similarity

Context as a point/vector in R^N
 Euclidean, Manhattan, etc. distances
 Angle distance (cosine similarity)
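Treating contexts as word distributions, the Jensen-Shannon divergence compares them symmetrically; a sketch with tiny invented distributions (a non-compositional pair like "hot dog" should have a context far from that of "dog"):

```python
# Sketch: Jensen-Shannon divergence between two context word distributions.
import math

def kl(p, q):
    # relative entropy; terms with p_i == 0 contribute nothing
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

ctx_bigram = [0.7, 0.2, 0.1]   # toy context distribution of the bigram
ctx_head   = [0.1, 0.2, 0.7]   # toy context distribution of the head word
assert js(ctx_bigram, ctx_bigram) == 0.0   # identical contexts diverge by 0
assert js(ctx_bigram, ctx_head) > 0.1      # divergent contexts score high
```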

Representation of context

  

Context of z: C_z = (w_1, w_2, …, w_n)

Common representation schemes
 tf.idf: weight(w_i) = tf(w_i) · idf(w_i), with idf(w_i) = log [ N / df(w_i) ]
 Dice similarity: dice(c_x, c_y) = 2 Σ_i x_i y_i / (Σ_i x_i² + Σ_i y_i²)

AP of Dice similarity (VPCs / LVCs / Mixed)
 Plain space: 0.367, 0.374, …
 (Scaled tf).idf space (Salton and Buckley, 1987): 0.568, 0.488, 0.553
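The vector form of Dice similarity above can be sketched directly; the vectors below are invented stand-ins for tf.idf-weighted context vectors:

```python
# Sketch: Dice similarity of two context vectors,
# dice(c_x, c_y) = 2 * sum(x_i * y_i) / (sum(x_i^2) + sum(y_i^2)).

def dice(x, y):
    num = 2 * sum(xi * yi for xi, yi in zip(x, y))
    den = sum(xi * xi for xi in x) + sum(yi * yi for yi in y)
    return num / den

assert dice([1.0, 2.0], [1.0, 2.0]) == 1.0   # identical contexts
assert dice([1.0, 0.0], [0.0, 1.0]) == 0.0   # disjoint contexts
```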

Outline

Verb Particle and Light Verb Constructions

Motivations

Association Measures

MI and PMI
The high-frequency constituents
Context measures

Conclusion & Future work


Conclusions

 

The 82 AMs: 4 main groups
 Grouped by meaning and rank-equivalence

Group 1: Dependence ~ reduction in uncertainty
 Effective
Group 2: Dependence ~ set similarity, marginal frequencies
 Simple but most effective
Group 3: Compare observed frequency with expected frequency
 Not effective
Group 4: Non-compositionality ~ context similarity, entropy
 Compromised by the ubiquity of particles and light verbs

Conclusions

Co-occurrence frequency f(xy)
 Not useful for VPCs: AP 0.13
 OK for LVCs: AP 0.85

Marginal frequencies f(x*) and f(*y)
 Effective to discriminate against high-frequency constituents
 Useful discriminative units: VPCs: −b, −c; LVCs: 1/(bc)

MI and PMI
 An indicator of independence, not of dependence (Manning and Schütze, 1999, p. 67)
 As an indicator of dependence: PMI normalized for VPCs; MI normalized for VPCs and LVCs

tf.idf
 Effective to normalize tf to [0.5, 1] (Salton and Buckley, 1987)
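The [0.5, 1] normalization above is the augmented ("scaled") term frequency of Salton and Buckley, tf_scaled = 0.5 + 0.5 · tf / max_tf, applied before the idf weight; a sketch:

```python
# Sketch: scaled-tf.idf weighting, mapping raw term frequency into [0.5, 1]
# before multiplying by idf = log(N / df).
import math

def scaled_tf_idf(tf, max_tf, n_docs, df):
    return (0.5 + 0.5 * tf / max_tf) * math.log(n_docs / df)

# The tf component stays in the [0.5, 1] band:
assert 0.5 + 0.5 * 0 / 10 == 0.5
assert 0.5 + 0.5 * 10 / 10 == 1.0
# A term appearing in every document gets idf 0, hence weight 0:
assert scaled_tf_idf(5, 10, 100, 100) == 0.0
```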

Future work

More types of particles for VPCs
 Adjective: cut short, put straight
 Verb: let go, make do

Trigram model
 Phrasal-prepositional verbs (verb + adverb + preposition): look forward to, get away with
 Idioms: kick the bucket, spill the beans
 Adaptation of bigram AMs

A larger corpus and evaluation data set

References

      

Baldwin, T. (2005). The deep lexical acquisition of English verb-particle constructions. Computer Speech and Language, Special Issue on Multiword Expressions, 19(4):398–414.
Evert, S. (2004). The Statistics of Word Cooccurrences: Word Pairs and Collocations. Ph.D. dissertation, University of Stuttgart.
Kim, S. N. (2008). Statistical Modeling of Multiword Expressions. Ph.D. thesis, University of Melbourne, Australia.
Lin, D. (1999). Automatic identification of non-compositional phrases. In Proc. of the 37th Annual Meeting of the ACL.
Manning, C. D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts.
Pecina, P. and Schlesinger, P. (2006). Combining association measures for collocation extraction. COLING/ACL 2006.
Tan, Y. F., Kan, M. Y. and Cui, H. (2006). Extending corpus-based identification of light verb constructions using a supervised learning framework. EACL 2006 Workshop on Multiword Expressions in a Multilingual Context.
Zhai, C. (1997). Exploiting context to identify lexical atoms: A statistical view of linguistic context. In International and Interdisciplinary Conference on Modeling and Using Context (CONTEXT-97).


MWE


Q&A

Bigram idioms: in all, after all, later on, as such


Types of MWEs

MWEs
 Lexicalized phrases: fixed exps, semi-fixed exps, syntactically-flexible exps
  (idioms, NCs, PP-Ds, LVCs, VPCs)
 Institutionalized phrases

Pavel (2005, 2006, 2008)

Combination of 82 association measures

Statistical tests:

 Mutual information, statistical independence
 Likelihood measures

Semantic tests:
 Entropy of immediate context (the immediately preceding/following words)
 Diversity of empirical context (words within a certain specified window)

Pavel (2005, 2006, 2008)

Combination of 82 association measures

Result:
 MAP: 80.81%
 "Equivalent" measures → 17 measures

Issues:
 Possible conflicting predictions: is combining all of them really best?
 Unclear linguistic and/or statistical significance

Other linguistics-driven tests

MWEs are lexically fixed?
 Substitutability test (Lin, 1999)

MWEs are order-specific?
 Permutation entropy (PE) (Yi Zhang et al., 2006)
 Entropy of Permutation and Insertion (EPI) (Aline et al., 2008)
  Not all permutations are valid: "permutation" → "syntactic variants"

Others…?

