CS60057 Speech & Natural Language Processing, Autumn 2007
Lecture 14b, 24 August 2007
Based on LING 180 / SYMBSYS 138, Intro to Computer Speech and Language Processing, Lecture 9: Machine Translation (I), November 7, 2006, Dan Jurafsky. Thanks to Bonnie Dorr for some of these slides!

Outline for MT Week
- Intro and a little history
- Language similarities and divergences
- Three classic MT approaches: transfer, interlingua, direct
- Modern statistical MT
- Evaluation

What is MT?
- Translating a text from one language to another automatically.

Machine Translation
- Chinese (pinyin): "dai yu zi zai chuang shang gan nian bao chai ... you ting jian chuang wai zhu shao xiang ye zhe shang, yu sheng xi li, qing han tou mu, bu jue you di xia lei lai."
- Gloss: Dai-yu alone on bed top think-of-with-gratitude Bao-chai / again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop / clear cold penetrate curtain / not feeling again fall down tears come
- English: "As she lay there alone, Dai-yu's thoughts turned to Bao-chai... Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry."

Machine Translation
- From The Story of the Stone (= The Dream of the Red Chamber, Cao Xueqin, 1792). Issues:
- Word segmentation
- Sentence segmentation: 4 English sentences to 1 Chinese
- Grammatical differences
  - Chinese rarely marks tense: "as", "turned to", "had begun"; "tou" -> "penetrated"
  - Zero anaphora
  - No articles
- Stylistic and cultural differences
  - "bamboo tip plantain leaf" -> "bamboos and plantains"
  - "ma" ('curtain') -> "curtains of her bed"
  - "rain sound sigh drop" -> "insistent rustle of the rain"

Not just literature
- Hansards: Canadian parliamentary proceedings

What is MT not good for?
- Really hard stuff: literature, natural spoken speech (meetings, court reporting)
- Really important stuff: medical translation in hospitals, 911 calls

What is MT good for?
- Tasks for which a rough translation is fine: web pages, email
- Tasks for which MT can be post-edited: MT as a first pass, "computer-aided human translation"
- Tasks in sublanguage domains where high-quality MT is possible: FAHQT (fully automatic high-quality translation)

Sublanguage domain
- Weather forecasting: "Cloudy with a chance of showers today and Thursday", "Low tonight 4"
- The domain can be modeled completely enough to use raw MT output (see the sketch below)
- Word classes and semantic features like MONTH, PLACE, DIRECTION, TIME POINT
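To make the sublanguage idea concrete, here is a minimal sketch of template-based translation for weather bulletins, in the spirit of Meteo-style systems. The two templates and the tiny lexicon are invented for illustration; they are not taken from any real system.

```python
import re

# A toy template-based translator for a weather sublanguage (English -> French).
# The templates and lexicon below are invented for illustration only.

LEX = {"showers": "averses", "today": "aujourd'hui",
       "thursday": "jeudi", "tonight": "ce soir"}

def translate(sentence: str) -> str:
    s = sentence.strip().lower()
    # Template: "cloudy with a chance of <WEATHER> <TIME> and <TIME>"
    m = re.fullmatch(r"cloudy with a chance of (\w+) (\w+) and (\w+)", s)
    if m:
        weather, t1, t2 = (LEX[g] for g in m.groups())
        return f"Nuageux avec possibilité d'{weather} {t1} et {t2}"
    # Template: "low <TIME> <TEMP>" -- the temperature is a slot copied through
    m = re.fullmatch(r"low (\w+) (-?\d+)", s)
    if m:
        return f"Minimum {LEX[m.group(1)]} {m.group(2)}"
    raise ValueError(f"outside the sublanguage: {sentence!r}")

print(translate("Cloudy with a chance of showers today and Thursday"))
print(translate("Low tonight 4"))
```

Because the sublanguage is closed, every input either matches a template and translates reliably, or is rejected; that is what makes raw MT output usable in this domain.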
MT History
- 1946: Booth and Weaver discuss MT at the Rockefeller Foundation in New York
- 1947-48: idea of dictionary-based direct translation
- 1949: the Weaver memorandum popularizes the idea
- 1952: all 18 MT researchers in the world meet at MIT
- 1954: IBM/Georgetown demo of Russian-English MT
- 1955-65: lots of labs take up MT

History of MT: Pessimism
- 1959/1960: Bar-Hillel, "Report on the state of MT in US and GB"
  - Argued FAHQT is too hard (semantic ambiguity, etc.)
  - We should work on semi-automatic instead of automatic translation
- His argument: "Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy."
  - Only human knowledge lets us know that playpens are bigger than boxes, but writing pens are smaller
  - His claim: we would have to encode all of human knowledge

History of MT: Pessimism
- The ALPAC report, headed by John R. Pierce of Bell Labs
- Conclusions:
  - The supply of human translators exceeds demand
  - All the Soviet literature is already being translated
  - MT has been a failure: all current MT work had to be post-edited
  - Sponsored evaluations showed that intelligibility and informativeness were worse than in human translations
- Results: MT research suffered
  - Funding loss
  - The number of research labs declined
  - The Association for Machine Translation and Computational Linguistics dropped MT from its name

History of MT
- 1976: Meteo, weather forecasts from English to French
- Systran (Babelfish) has been used for 40 years
- 1970s: European focus in MT; mainly ignored in the US
- 1980s: ideas of using AI techniques in MT (KBMT, CMU)
- 1990s: commercial MT systems; statistical MT; speech-to-speech translation

Language Similarities and Divergences
- Some aspects of human language are universal or near-universal; others diverge greatly
- Typology: the study of systematic cross-linguistic similarities and differences
- What are the dimensions along which human languages vary?

Morphological Variation
- Isolating languages (Cantonese, Vietnamese): each word generally has one morpheme
- vs. polysynthetic languages (Siberian Yupik, 'Eskimo'): a single word may have very many morphemes
- Agglutinative languages (Turkish): morphemes have clean boundaries
- vs. fusional languages (Russian): a single affix may express many morphemes

Syntactic Variation
- SVO (Subject-Verb-Object) languages: English, German, French, Mandarin
- SOV languages: Japanese, Hindi
- VSO languages: Irish, Classical Arabic
- SVO languages generally have prepositions: "to Yuriko"
- SOV languages generally have postpositions: "Yuriko ni"

Segmentation Variation
- Not every writing system marks word boundaries: Chinese, Japanese, Thai, Vietnamese
- Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences: Modern Standard Arabic, Chinese

Inferential Load: cold vs. hot languages
- Some 'cold' languages require the hearer to do more "figuring out" of who the various actors in the various events are: Japanese, Chinese
- Other 'hot' languages are pretty explicit about saying who did what to whom: English

Inferential Load (2)
- (Example figure omitted.) The highlighted noun phrases do not appear in the Chinese text, but they are needed for a good English translation

Lexical Divergences
- Word to phrase: English "computer science" = French "informatique"
- POS divergences:
  - English "she likes/VERB to sing" vs. German "Sie singt gerne/ADV"
  - English "I'm hungry/ADJ" vs. Spanish "tengo hambre/NOUN"

Lexical Divergences: Specificity
- Grammatical constraints
  - English has gender on pronouns; Mandarin does not
  - So when translating a third-person pronoun from Chinese to English, we need to figure out the gender of the person!
  - Similarly from English "they" to French "ils/elles"
- Semantic constraints
  - English 'brother': Mandarin 'gege' (older) vs. 'didi' (younger)
  - English 'wall': German 'Wand' (inside) vs. 'Mauer' (outside)
  - German 'Berg': English 'hill' or 'mountain'

Lexical Divergence: many-to-many
- (Mapping figure omitted.)

Lexical Divergence: lexical gaps
- Japanese: no word for 'privacy'
- English: no word for Cantonese 'haauseun' or Japanese 'oyakoko' (something like 'filial piety')
- English distinguishes 'cow' and 'beef'; Cantonese 'ngau' covers both

Event-to-argument divergences
- English: "The bottle floated out."
- Spanish: "La botella salió flotando." ('the bottle exited floating')
- Verb-framed languages mark the direction of motion on the verb: Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian, Mayan, and Bantu families
- Satellite-framed languages mark the direction of motion on the satellite ("crawl out", "float off", "jump down", "walk over to", "run after"): the rest of Indo-European, Hungarian, Finnish, Chinese

Structural divergences
- German: "Wir treffen uns am Mittwoch" / English: "We'll meet on Wednesday"

Head Swapping
- English: "X swims across Y" / Spanish: "X cruza Y nadando" ('X crosses Y swimming')
- English: "I like to eat" / German: "Ich esse gern"
- English: "I'd prefer vanilla" / German: "Mir wäre Vanille lieber"

Thematic divergence
- Spanish: "Me gusta Y" / English: "I like Y"
- German: "Mir fällt der Termin ein" (lit. 'the date occurs to me') / English: "I remember the date"

Divergence counts from Bonnie Dorr
32% of sentences in a UN Spanish/English corpus (5K sentences) contain divergences:
  Categorial     X tener hambre       X have hunger     98%
  Conflational   X dar puñaladas a Z  X stab Z          83%
  Structural     X entrar en Y        X enter Y         35%
  Head swapping  X cruzar Y nadando   X swim across Y    8%
  Thematic       X gustar a Y         Y likes X          6%

MT on the web
- Babelfish: http://babelfish.altavista.com/
- Google: http://www.google.com/search?hl=en&lr=&client=safari&rls=en&q="1+taza+de+jugo"+%28zumo%29+de+naranja+5+cucharadas+de+azucar+morena&btnG=Search

Three MT approaches: direct, transfer, interlingua
- (Diagram omitted.)

Direct Translation
- Proceed word-by-word through the text, translating each word
- No intermediate structures except morphology
- Knowledge is in the form of a huge bilingual dictionary with word-to-word translation information
- After word translation, simple reordering can be applied, e.g., adjective ordering for English -> French/Spanish (see the sketch below)
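Here is a minimal sketch of the direct architecture, assuming a toy English-to-Spanish dictionary with crude POS tags; the dictionary entries and tags are invented for illustration.

```python
# Toy direct MT (English -> Spanish): word-for-word dictionary lookup,
# then one local reordering pass. The dictionary, with its crude POS
# tags, is invented for illustration.

DICT = {
    "the":   ("el",    "DET"),
    "green": ("verde", "ADJ"),
    "witch": ("bruja", "NOUN"),
    "sings": ("canta", "VERB"),
}

def translate_direct(sentence: str) -> str:
    # Step 1: word-for-word lexical substitution.
    words = [DICT[w.lower()] for w in sentence.split()]
    # Step 2: simple local reordering, English ADJ NOUN -> Spanish NOUN ADJ.
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "ADJ" and words[i + 1][1] == "NOUN":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    return " ".join(form for form, pos in words)

print(translate_direct("the green witch sings"))  # -> "el bruja verde canta"
```

Even this tiny example exposes the weakness noted on the next slides: with no syntactic structure, nothing repairs the article's gender agreement, so we get "el bruja" instead of "la bruja".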
Direct MT: dictionary entry
- (Example dictionary entry and translation figures omitted.)

Problems with direct MT
- German and Chinese examples (figures omitted)

The Transfer Model
- Idea: apply contrastive knowledge, i.e., knowledge about the differences between the two languages
- Steps:
  - Analysis: syntactically parse the source language
  - Transfer: rules turn the source-language parse into a target-language parse
  - Generation: generate the target sentence from the parse tree

English to French
- Generally: English Adjective Noun, French Noun Adjective
- Note: not always true
  - "route mauvaise": 'bad road, badly-paved road'
  - "mauvaise route": 'wrong road'
- But it is a reasonable first approximation
- Rule: NP(Adjective Noun) -> NP(Noun Adjective), as in the sketch below
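A minimal sketch of such a structural transfer rule, assuming a toy tuple-based parse tree with lexical transfer already applied; the tree representation and the rule's coverage are simplified inventions for illustration.

```python
# Toy structural transfer: English "Adjective Noun" order -> French "Noun Adjective".
# A tree node is (label, [children]); a leaf is (POS, word) with word a string.
# Lexical transfer (English words -> French words) is assumed already done.

def transfer(node):
    label, children = node
    if isinstance(children, str):        # leaf: (POS, word)
        return node
    children = [transfer(child) for child in children]
    # The transfer rule: inside an NP, swap ADJ NOUN -> NOUN ADJ.
    if label == "NP":
        labels = [c[0] for c in children]
        if labels == ["DET", "ADJ", "NOUN"]:
            det, adj, noun = children
            children = [det, noun, adj]
        elif labels == ["ADJ", "NOUN"]:
            adj, noun = children
            children = [noun, adj]
    return (label, children)

def leaves(node):
    label, children = node
    if isinstance(children, str):
        return [children]
    return [word for child in children for word in leaves(child)]

# Parse of "the green house" after lexical transfer: the/la, green/verte, house/maison
source = ("NP", [("DET", "la"), ("ADJ", "verte"), ("NOUN", "maison")])
print(" ".join(leaves(transfer(source))))   # -> "la maison verte"
```

Because the rule operates on the parse tree, it fires wherever an NP of the right shape occurs, rather than at fixed word positions as in direct MT.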
Transfer rules
- (Example transfer rules figure omitted.)

Lexical transfer
- Transfer-based systems also need lexical transfer rules: a bilingual dictionary (as for direct MT)
- English "home" has several German translations:
  - "nach Hause" (going home)
  - "Heim" (home game)
  - "Heimat" (homeland, home country)
  - "zu Hause" (at home)
- Can list multiword correspondences such as "at home <-> zu Hause", or do word sense disambiguation

Systran: combining direct and transfer
- Analysis: morphological analysis, POS tagging; chunking of NPs, PPs, and phrases; shallow dependency parsing
- Transfer: translation of idioms; word sense disambiguation; assigning prepositions based on governing verbs
- Synthesis: apply a rich bilingual dictionary; deal with reordering; morphological generation

Transfer: some problems
- N^2 sets of transfer rules (one per language pair)!
- Grammar and lexicon are full of language-specific material
- Hard to build, hard to maintain

Interlingua
- Intuition: instead of language-to-language knowledge rules, use the meaning of the sentence
- Steps:
  1) translate the source sentence into a meaning representation
  2) generate the target sentence from the meaning

Interlingua for "Mary did not slap the green witch"
- (Meaning-representation figure omitted.)

Interlingua
- Idea: some of the MT work we need to do is shared with other NLP tasks
- E.g., disambiguating English "book" between Spanish 'libro' and 'reservar'
- So we could have concepts like BOOKVOLUME and RESERVE and solve this disambiguation problem once for each language

Direct MT: pros and cons (Bonnie Dorr)
- Pros: fast; simple; cheap; no translation rules hidden in the lexicon
- Cons: unreliable; not powerful; rule proliferation; requires lots of context; major restructuring needed after lexical substitution

Interlingual MT: pros and cons (B. Dorr)
- Pros: avoids the N^2 problem; easier to write rules
- Cons: semantics is HARD; useful information is lost (paraphrase)

The impossibility of translation
- Hebrew "adonoi roi" ('the Lord is my shepherd') for a culture without sheep or shepherds
- Something fluent and understandable, but not faithful: "The Lord will look after me"
- Something faithful, but not fluent and natural: "The Lord is for me like somebody who looks after animals with cotton-like hair"

What makes a good translation
- Translators often talk about two factors we want to maximize:
- Faithfulness or fidelity: how close the meaning of the translation is to the meaning of the original (even better: does the translation cause the reader to draw the same inferences as the original would have?)
- Fluency or naturalness: how natural the translation is, considering only its fluency in the target language

Statistical MT: faithfulness and fluency, formalized!
- Best translation T-hat of a source sentence S:
      T-hat = argmax_T fluency(T) * faithfulness(T, S)
- Developed by researchers who were originally in speech recognition at IBM
- Called the IBM model

The IBM model
- Hmm, those two factors might look familiar...
      T-hat = argmax_T fluency(T) * faithfulness(T, S)
- Yup, it's Bayes' rule:
      T-hat = argmax_T P(T) P(S | T)

More formally
- Assume we are translating from a foreign-language sentence F = f1, f2, f3, ..., fm to an English sentence E
- We want to find the best English sentence E-hat = e1, e2, e3, ..., en:
      E-hat = argmax_E P(E | F)
            = argmax_E P(F | E) P(E) / P(F)
            = argmax_E P(F | E) P(E)
- P(F | E) is the translation model; P(E) is the language model (see the sketch below)
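Here is a minimal sketch of this decision rule, assuming a bigram language model for P(E) and an IBM-Model-1-style word translation table for P(F | E). All probabilities are invented for illustration, and the two hand-picked candidates stand in for the enormous search space a real decoder explores.

```python
import math

# Noisy-channel MT in miniature: choose the English sentence E that
# maximizes P(F | E) * P(E). All probabilities below are invented for
# illustration; real systems estimate them from parallel and monolingual
# corpora and search a vast candidate space.

# Bigram language model P(E), with sentence-boundary symbols.
LM = {
    ("<s>", "that"): 0.2, ("that", "pleases"): 0.05,
    ("pleases", "me"): 0.4, ("me", "</s>"): 0.3,
    ("<s>", "me"): 0.001, ("me", "pleases"): 0.001,
    ("pleases", "that"): 0.01, ("that", "</s>"): 0.1,
}

# Word translation table t(f | e) for French "ça me plaît".
T = {("ça", "that"): 0.6, ("me", "me"): 0.7, ("plaît", "pleases"): 0.5}

def p_lm(e):
    toks = ["<s>"] + e + ["</s>"]
    return math.prod(LM.get(bigram, 1e-6) for bigram in zip(toks, toks[1:]))

def p_tm(f, e):
    # IBM-Model-1 style: each foreign word may arise from any English word.
    return math.prod(sum(T.get((fw, ew), 1e-6) for ew in e) / len(e) for fw in f)

french = "ça me plaît".split()
candidates = [["that", "pleases", "me"], ["me", "pleases", "that"]]
best = max(candidates, key=lambda e: p_tm(french, e) * p_lm(e))
print(" ".join(best))   # -> "that pleases me"
```

Note that p_tm assigns both candidate orderings the same score: the translation model behaves like a bag-of-words model, and putting the words in the right order is left entirely to the language model, exactly the "big point" made below.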
The noisy channel model for MT
- (Noisy channel diagram omitted.)

Fluency: P(T)
- How do we measure that "That car was almost crash onto me" is less fluent than "That car almost hit me"?
- Answer: language models (N-grams!); for example, P(hit | almost) > P(was | almost)
- But we can use any other, more sophisticated model of grammar
- Advantage: this is monolingual knowledge!

Faithfulness: P(S|T)
- French: "ça me plaît" [that me pleases]
- English: "that pleases me" (most fluent), "I like it", "I'll take that one"
- How to quantify this? Intuition: the degree to which the words in one sentence are plausible translations of the words in the other
- Product of the probabilities that each word in the target sentence would generate each word in the source sentence

Faithfulness: P(S|T)
- We need to know, for every target-language word, the probability of it mapping to every source-language word
- How do we learn these probabilities? Parallel texts!
- Often we have two texts that are translations of each other
- If we knew which word in the source text mapped to which word in the target text, we could just count!

Faithfulness: P(S|T)
- Sentence alignment: figuring out which source-language sentence maps to which target-language sentence
- Word alignment: figuring out which source-language word maps to which target-language word

Big point about faithfulness and fluency
- The job of the faithfulness model P(S|T) is just to model the "bag of words": which words come from, say, English to Spanish
- P(S|T) doesn't have to worry about facts internal to Spanish, such as word order: that's the job of P(T)
- P(T) can do bag generation: put the following words in order (from Kevin Knight):
  - have programming a seen never I language better
  - actual the hashing is since not collision-free usually the is less perfectly the of somewhat capacity table

P(T) and bag generation: the answer
- "Usually the actual capacity of the table is somewhat less, since the hashing is not perfectly collision-free."
- How about: "loves Mary John"?

Summary
- Intro and a little history
- Language similarities and divergences
- Three classic MT approaches: transfer, interlingua, direct
- Modern statistical MT
- Evaluation