Transcript Slide 1
Work-Package 5: Multimodal Processing and Interaction
E-TEAMS Overview
Leaders:
  Petros Maragos, ICCS-NTUA
  Alexandros Potamianos, TSI-TUC

WP5 Outline: Description of Work in JPA3
T1. Book on Multimodal Processing and Interaction
T2. Audio-Visual Speech Analysis and Recognition
  T2.1: Audio-Visual Feature Extraction and Fusion
  T2.2: Dynamic Models for AV-ASR, Evaluation
  T2.3: Audio-Visual to Articulatory Speech Inversion
T3. Multimodal Integration for MM Analysis & Recognition
  T3.1: Video Analysis & Integration of Asynchronous Time-evolving Modalities
  T3.2: Multimodal Saliency
  T3.3: Integrated Multimedia Content Analysis
T4. Interfaces to Multimedia
  T4.1: Multimodal Dialogue Interfaces
  T4.2: Eye-tracking Interfaces for Information Retrieval
  T4.3: Mobile Interfaces
T5. Coordination of Research and Dissemination of Results

WP5-e-Teams MUSCLE Plenary, Dec. 2006, France

e-Teams: Goals & Objectives
E-Team 10: Audio-Visual Speech Analysis & Recognition
  AV Feature Extraction and Feature Fusion
  Dynamical Models for AV-ASR, Evaluation
  Audio-Visual to Articulatory Speech Inversion
E-Team 11: Multimodal Processing & Multimedia Understanding
  Video Analysis and Integration of Asynchronous Time-evolving Modalities
  Audio-Visual Attention Modeling and Salient Event Detection
  Integrated Multimedia Content Analysis
E-Team 12: Multimodal Interfaces
  Multimodal Recognition and Dialogue Systems
  Mobile Services
  Novel Interfaces (Eye-tracking)

e-Team 10: AV Speech Analysis & Recognition
Partners
  P. Maragos, G. Papandreou, A. Katsamanis, V. Pitsikalis (ICCS-NTUA)
  Alex Potamianos (TSI-TUC)
  Khalid Daoudi, Eduardo Sanchez-Soto (IRIT)
  Yves Laprie (INRIA-Parole)
  Guillaume Gravier, Patrick Gros (INRIA-Texmex)
  Costas Kotropoulos, N. Nikolaidis, I. Pitas (AUTH)
  Ron Kimmel (Technion)

e-Team 10: AV Speech Analysis & Recognition
Research areas include:
  Active-Appearance (and other Deformable) Models and Statistical Approaches for Face (or only mouth area) detection, modelling and feature extraction
  Nonlinear Speech Modelling for better audio & articulatory feature extraction
  A-V Feature Fusion
  Audio-Visual to Articulatory Speech Inversion
Application areas include:
  Audio-Visual Automatic Speech Recognition (including Lip-Reading)
  Collection of AV Databases and Evaluations
  Applications of AV Articulatory Speech Inversion

e-Team 10: AV Speech Analysis & Recognition
The main goals of e-team 10 are:
  Goal 1: Contribute to the Update of the State-of-Art Surveys of the WP5 MUSCLE Book.
  Goal 2: Co-Author New Research Chapters of the WP5 MUSCLE Book.
  Goal 3: Co-author conference and journal Papers on some focus theme with multiple MUSCLE partners (improve integration).
  Goal 4: Collaboration on common research agendas for AV-ASR and AV speech inversion.

e-Team 10: AV Speech Analysis & Recognition
Recent Work
  Audio-Visual Speech Recognition (TUC, NTUA)
  Multimodal Feature Fusion (TUC, IRIT, NTUA)
  Audio-Visual Speech Inversion (INRIA-Parole, NTUA, KTHSpeech)
  Contribution to MUSCLE Book
  AV-ASR showcase proposal
Future Plans
  Continued collaboration in aforementioned research areas
  Book project: first draft by June
  Workshop in Athens: April 2007 (joint with e-team 11,12)

e-Team 11: Multimodal Processing & Understanding
Partners
  P. Maragos, G. Evangelopoulos, K. Rapantzikos, S. Kollias (NTUA)
  Patrick Gros, Ewa Kijak, Guillaume Gravier (INRIA-Texmex)
  Costas Kotropoulos, N. Nikolaidis, I. Pitas (AUTH)
  Andreas Rauber (TU Wien)
  Alex Potamianos (TUC)
  Sanni Siltanen (VTT)
  Fred Stentiford, Wole Oyekoya (UCL)
  Enis Cetin (Bilkent)

e-Team 11: Multimodal Processing & Understanding
Research areas include:
  Stochastic modeling with several data streams / several temporal rates / weakly synchronized data
  Audio-Visual Cooperative Feature Extraction and Salient Event Detection
  Audio-Visual Dialogue Understanding
  Image + Text Integration
  Audio + Text Integration
Application areas include:
  Understand (= structure) TV and other MM documents, and prepare these documents for applications (repurposing, archiving)
  Event Detection and Segmentation in Sports Videos
  Salient Event Detection and Dialogue Detection in Movie Videos
  Speech Transcription and NLP
  Music genre analysis and music retrieval

e-Team 11: Multimodal Processing & Understanding
The main goals of e-team 11 are:
  Goal 1: Contribute to the Update of the State-of-Art Surveys of the WP5 MUSCLE Book.
  Goal 2: Co-Author New Research Chapters of the WP5 MUSCLE Book.
  Goal 3: Co-author conference and journal Papers on some focus theme with multiple MUSCLE partners (improve integration).
  Goal 4: Collaboration on a common research agenda for multimodal feature fusion, saliency detection and multimodal processing.

e-Team 11: Multimodal Processing & Understanding
Recent Work
  Annotated Movie Information Database (AUTH)
  Audio-Visual Saliency Detection (AUTH, INRIA-Texmex, NTUA, TUC)
  Contribution to MUSCLE Book (NTUA, TUC, AUTH, INRIA-Texmex, TU Wien, Bilkent)
  Movie summarization showcase proposal
Future Plans
  Closer collaboration between partners on common movie DB
  Book project: first draft by June
  Workshop in Athens: April 2007 (joint with e-team 11,12)

e-Team 12: Multimodal Interfaces
Partners:
  Alex Potamianos, Manolis Perakakis, Michalis Toutoudakis, TUC
  Petros Maragos, Nassos Katsamanis, George Papandreou, NTUA
  Sanni Siltanen, Santtu Toivonen, VTT
  Fred Stentiford, UCL
  Ugur Gudukbay, Ozgur Ulusoy, Enis Cetin, Yigithan Dedeoglu, Serkan Genc, Bilkent University
  Costas Kotropoulos, AUTH
  Andreas Rauber, TU Wien

e-Team 12: Multimodal Interfaces
Research areas:
  multimodality annotation of multimedia databases
  search interface efficiency
  eye-tracking interfaces
  speech interfaces
  mobile interfaces
Application areas:
  search/information retrieval on image and video databases
  search/information retrieval on the web
  information-seeking spoken dialogue systems
  mobile services portal/applications
  search/information retrieval for audio data

e-Team 12: Multimodal Interfaces
The main goals of e-team 12 are:
  Goal 1: Contribute to the Update of the State-of-Art Surveys of the WP5 MUSCLE Book.
  Goal 2: Co-Author New Research Chapters of the WP5 MUSCLE Book.
  Goal 3: Co-author conference and journal Papers on some focus theme with multiple MUSCLE partners (improve integration).
  Goal 4: Collaboration on a common research agenda for multimodal feature fusion, saliency detection and multimodal processing.

e-Team 12: Multimodal Interfaces
Recent Work
  Multimodal Spoken Interfaces (TUC, NTUA)
  Mobile Interfaces (TUC, VTT)
  Contribution to MUSCLE Book (TUC, UCL, VTT)
  "Augmented assembly using a multimodal interface" showcase proposal
Future Plans
  Improve integration/collaboration between partners
  Book project: first draft by June
  Workshop in Athens: April 2007 (joint with e-team 11,12)

BOOK
Title: Multimodal Processing and Interaction: Audio, Video, Text
Contents:
  State-of-Art Reviews of WP6 + WP10 (updated)
  Contributed Research Chapters: New Work
Agenda:
  Scope and Thematic Areas discussed during Audio-Conf & Meetings
  Each interested participant emails preliminary title + abstract
  Table-of-Contents of selected chapters is discussed with all participants
  Publisher is contacted

Multimodal Processing and Interaction: Audio, Video, Text
PART I: Review of the State-of-the-Art
  Cross-Modal Integration for Performance Improving in Multimedia: State-of-the-Art Review
  Human-Computer Interfaces for Multimedia Retrieval: State-of-the-Art Review
PART II: New Research Directions
Integrated Multimedia Analysis and Recognition
  1. Stochastic Models for Multimodal Video Analysis
  2. Adaptive Multimodal Fusion by Uncertainty Compensation with Application to Audiovisual Speech Recognition
  3. Movie Analysis with Emphasis to Dialogue Detections
  4. Using HMM for Action Recognition in Audio-Visual Streams
  5. Surveillance Using Both Video and Audio
  6. Audiovisual Attention Modeling and Salient Event Detection

Multimodal Processing and Interaction: Audio, Video, Text
PART II (cont.): New Research Directions
Searching Multimedia Content
  1. Interactive Image Retrieval using a Hybrid Visual and Conceptual Content Representation
  2. Multi-Modal Analysis of Text and Audio Features for Music Information Retrieval
  3. Toward the Integration of NLP and ASR: POS Tagging and Transcription
Interfaces to Multimedia Content
  1. Design Principles for Multimodal Spoken Dialogue Systems
  2. Eye Tracking for Image Retrieval
  3. Natural/Novel User Interfaces for Mobile Devices

WP5 e-Team Scientific Talks
  WP5 e-team 10 scientific talk: "Stream weight computation for Audio-Visual Speech Recognition", by Eduardo Sanchez-Soto, IRIT (duration 15')
  WP5 e-team 11 scientific talk: "Dialogue Detection in Movies", by D. Ververidis, AUTH (duration 15')
  WP5 e-team 12 scientific talk: "Augmented reality visualization: Constructing the mobile user interface", by Sanni Siltanen, VTT (duration 15')

WP5 Scientific Talks (FRIDAY)
  WP5 scientific talk: "Multimodal Fusion: Application to AV-ASR and AV Speech Inversion", by George Papandreou, NTUA (duration 15')
  WP5 scientific talk: "A Natural Language Interface for a Video Database Management System", by Ugur Gudukbay, Bilkent U. (duration 15')
  WP5 scientific talk: "Modality selection in Multimodal Dialogue Systems", by Alex Potamianos, TUC (duration 15')