WORDS Lab CSC 9010: Special Topics. Natural Language Processing. Paula Matuszek, Mary-Angela Papalaskari Spring, 2005 Examples taken from the Bird, Klein and Loper: NLTK.
Source of examples: Bird, Klein and Loper, NLTK Tutorial, Tagging, nltk.sourceforge.net/tutorial/tagging/index.html
CSC 9010: Special Topics, Natural Language Processing. Spring, 2005. Matuszek & Papalaskari

Words, Words, Words
• So far we have covered methods that largely operate on tokens.
  – Tokenizing text
  – Stemming words and determining lemmas
  – POS-tagging
  – Language models based on n-gram frequencies

Every time I fire a linguist, my performance goes up¹
• None of this involves much of what could be considered "linguistic" knowledge or "understanding".
  – No parsing
  – Not much domain knowledge or "meaning"
• For the next two sections of the course we will talk extensively about syntax and semantics.
1. Hirschberg, Julia. 1998. "Every time I fire a linguist, my performance goes up," and other myths of the statistical natural language processing revolution. Invited talk, Fifteenth National Conference on Artificial Intelligence (AAAI-98).

What's In a Word?
• For this lab, we will focus on some of the things that can be done by applying the techniques we have already studied.
• The format will be:
  – Try a demo
  – Discuss what techniques were needed to implement it
  – Discuss some of what would be needed to improve it

Gender Genie
• www.bookblog.net/gender/genie.html
• Techniques:
• How good is it? What might improve it?
• Reference:
  – www.cs.biu.ac.il/~koppel/papers/male-female-textfinal.pdf

Pearson Knowledge Technologies Text Classification Demo
• www.k-a-t.com:8080/classify/
• Techniques:
• How good is it? What might improve it?
• Reference: www.k-a-t.com/publications.shtml

Google Sets
• labs.google.com/sets
• Techniques:
• How good is it? What might improve it?
• Reference: if you find one, let me know. Possibly something like this: www.arxiv.org/pdf/cs.CL/0412098

AT&T Text to Speech
• www.research.att.com/projects/tts/demo.html
• Techniques:
• How good is it? What might improve it?
• Reference: www.research.att.com/projects/tts/pubs.html
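
When discussing which techniques a demo might need, it can help to have the token-level methods from "Words, Words, Words" in concrete form. The following is a minimal sketch, using only the Python standard library, of tokenizing text and estimating a bigram probability from n-gram counts. The function names (tokenize, ngrams, bigram_probability) and the sample sentence are illustrative assumptions, not part of the NLTK tutorial or of any of the demos above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(tokens, first, second):
    """Maximum-likelihood estimate of P(second | first) from raw counts.

    No smoothing is applied, so unseen bigrams get probability 0.
    """
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(ngrams(tokens, 2))
    if unigram_counts[first] == 0:
        return 0.0
    return bigram_counts[(first, second)] / unigram_counts[first]


# Hypothetical sample text, chosen only for illustration.
sample = "Words, words, words. So far we have covered words and more words."
tokens = tokenize(sample)
print(tokens)
print(Counter(ngrams(tokens, 2)).most_common(3))
print(bigram_probability(tokens, "words", "words"))
```

A sketch like this covers only tokenizing and n-gram counting; stemming and POS-tagging need language-specific resources, which is one thing to keep in mind when asking what might improve each demo.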