Overview of research at HP Labs India
© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

HP Labs around the world
• 7 locations: Palo Alto, Beijing, Tokyo, Bristol, St. Petersburg, Haifa, Bangalore
• 600 researchers in 23 labs
• 20-30 large projects in 8 high-impact areas

High-Impact Research Areas
The next technology challenges and opportunities
• Digital Commercial Print
• Intelligent Infrastructure
• Content Transformation
• Sustainability
• Immersive Interaction
• Cloud
• Analytics
• Information Management

Digital Commercial Print
End state: Flexible, customized, on-demand printing that replaces the traditional distribution of mass-produced materials
HP Labs’ research contribution: Breakthrough technology to accelerate the transformation to digital commercial printing
• Printing Process: commercial-grade throughput, cost and quality
• Data Path: efficient processing of massive data streams
• Color: self-calibration, intuitive rendering
• Job Creation: automated content generation

Content Transformation
End state: Complete convergence of physical and digital information
HP Labs’ research contribution: Technologies to transfer content seamlessly from paper to digital and to access digital content wherever paper is used today
• Displays/Materials: unbreakable, conformable, ultra-thin and lightweight; digital with the look and feel of paper
• Content Management: intuitive, personalized organization; intelligent content extraction; live, interactive documents

Immersive Interaction
End state: Intuitive human interaction through and with technology
HP Labs’ research contribution: Radically simplify the user experience to make technology more useful, intuitive and pervasive
• Intuitive Interfaces: natural, multimodal computer-human interactions
• Seamless Collaboration: immersive multimedia communication, anytime and anywhere, with no physical barriers
• Contextual Services: delivering “the right thing at the right time”; personal paradigms to simplify Web interaction

Information Management
End state: The vast universe of enterprise information transformed into immediate, business-relevant insight
HP Labs’ research contribution: Redefine the twin tasks of taming and exploiting information to revolutionize enterprise decision making
• Management: superior analysis, extraction and delivery of massive enterprise content
• Intelligence: capabilities to transform massive-scale, real-time data into transactional, operational business intelligence

Analytics
End state: Application of mathematical and scientific methodologies to create better-run businesses
HP Labs’ research contribution: Drive secure, informed, highly effective decision making
• Solutions: predictive customer behavior; individual profile learning
• Software: enhance automation and business processes
• Services: analytics that address and transform operational efficiency and security

Cloud
End state: Everything-as-a-Service: billions of users, millions of services, thousands of service providers, millions of servers, exabytes of data, terabytes of traffic
HP Labs’ research contribution: Develop an integrated cloud stack, from infrastructure to services
• Infrastructure: enterprise-grade security, capacity and management
• Services: disrupt traditional industries and offer rich, dynamic experiences

Intelligent Infrastructure
End state: Capture more value via dramatic computing performance and cost improvements
HP Labs’ research contribution: Radical new approaches for collecting, storing and transmitting data to feed the exascale data center
• Nanotechnology: memristors, sensors, photonic interconnect
• Intelligent Storage: cloud-scale, dynamic, enterprise-grade
• Data Center: cost and power efficient; manageable, reliable; easily programmable
• Networks: programmable, scalable, energy-efficient

Sustainability
End state: An IT industry with a light carbon footprint that drives the reduction of carbon emissions throughout the global economy
HP Labs’ research contribution: Displace conventional supply chains with sustainable IT ecosystems
• Data Centers: integrated, end-to-end management of compute, power & cooling resources from cradle to cradle
• Tools & Methodologies: reengineer existing value chains using IT to lower the environmental footprint

2008 HP Labs Innovation Research Awards
41 awards, 34 universities, 14 countries
Americas
• Stanford University
• University of California, Berkeley
• University of California, Davis
• University of California, San Diego
• University of California, Santa Barbara
• University of Southern California
• University of Toronto
• Carnegie Mellon University
• Massachusetts Institute of Technology
• State University of New York at Buffalo
• Rochester Institute of Technology
• University of Illinois at Urbana-Champaign
• University of Michigan
• University of Wisconsin-Madison
• Purdue University
• Georgia Institute of Technology
EMEA (Europe, Middle East & Africa)
• University of Edinburgh, Scotland
• University of Bath, England
• University of Leeds, England
• University of Bristol, England
• Konstanz University, Germany
• Technische Universitaet Muenchen, Germany
• Vrije Universiteit Amsterdam, Netherlands
• Universidade do Minho, Portugal
• Russian Academy of Sciences, Russia
• University of Saint-Petersburg, Russia
• Bilkent University, Turkey
• Technion, Israel Institute of Technology, Israel
APJ (Asia-Pacific & Japan)
• National Institute of Informatics, Japan
• Peking University, China
• Tsinghua University, China
• Nanyang Technological University, Singapore
• Indian Institute of Technology, Madras, India
• Indian Institute of Technology, Bombay, India
17 July 2015

Open cloud computing research test bed
• A loose federation of “Centers of Excellence” (CoE) around the globe
  − UIUC, Singapore IDA, KIT: 3 initial CoE
  − HP, Intel, Yahoo: 3 initial sponsors with CoE
• Research objectives
  − Multi-datacenter, multi-geography, multi-tenancy, secure, massive-scale, open test bed
• Each center: 1000-4000 cores and up to PB-scale storage
  − Base service: PRS (physical resource set)
  − Required services: open EC2-like, S3, and Hadoop-on-demand
  − Plus additional local extensions/variants/service types

HP Labs India

Gesture-based keyboard (GKB)

PrintCast
[Figure: PrintCast system diagram with an uplink side and a downlink side; components shown include data from PC, AV signal, inserter, encoder, modulator, up converter, solid state power amplifier, uplink dish, receiver dish & LNBC, set top box, PrintCast decoder, television and printer.]

Paper & IT convergence

Secure AiO

HP Labs India
• Three ongoing projects
  − Simplifying web consumption for the next billion (SWAN): the remainder of this talk
  − Intuitive multimodal and gestural interaction (IMAGIN)
  − Paper in the digital enterprise (PRIDE)

SWAN project: Motivation
Simplifying web consumption for all
• The Web is useful but complex to use for people who are not tech-savvy
• The Web has to be useful in the mobile context as well

Why is web consumption complex?
• Each web site forces its own cognitive model on the user
  − The website decides the interaction model; the user has to learn it and remember it
  − Different websites of the same genre impose their own models
• The Web requires very “low-level” instructions
  − Information access follows a query-and-manual-filtering approach
  − Content adaptation, e.g. translation, requires a lot of technical skill
• Mobile web consumption is challenging
  − The user’s frame of mind is different (limited attention span, distracted)
  − Devices are resource-constrained
• Broken web experience across different access methods
  − Experience continuity is needed across broadband, mobile and disconnected operation

State of the art
[Figure: landscape of existing tools (chumby, web widgets, browser scripting, Pipes, mashups, personalized web pages, personalized web content alerts) with labels for web simplification, passive consumption and mobile environments.]
The gap: we need to simplify personal web interactions, especially for mobile environments

Technical goals
• Users set their own preferred interaction pattern
  − Enabling users to easily express their own web interaction patterns
  − Providing a familiar interface to all personal actions on the web
• Higher-level intent while interacting with services
  − Implicit web content consumption based on higher-level user intent expression, user feedback and user profile
  − Understanding and translating user intent into web actions
• Always-responsive interactions
  − Providing continual interaction across multiple devices and connectivity situations
  − Providing ‘responsive behavior’ despite disconnections

Approach
Create simple interactions for long-term and exploratory information needs
[Figure: the Intent → Query → Goal cycle, supported by user profiles, query expansion, aggregation and ranking, and summarization; the figure also shows Google, YouTube and Digg/Delicious.]
End-user value: simplify the “Intent → Query → Goal” cycle

Using user profiles to personalize services
[Figure: explicit and implicit information about the user is collected; a profile constructor builds the user profile, which applications use to offer personalized services (search, news, video, shopping).]

Aren’t online portals already doing this?
• Online portals and search engines build user profiles using cookies and other stored data (search keywords, web pages accessed)
  − However, they don’t see all of the user’s data
  − Users have no way to aggregate and reuse the profiles that different websites (Google, Yahoo, ...) build from their data
  − Privacy is a big problem

Implicit profile construction: prior approaches and their limitations
• Word-based approach
  − Use words in user documents to represent user interests
  − Problems
    • Words appear independently of page content (“Home”, “page”)
    • Polysemy and synonymy
    • Large profile sizes
• DMOZ approach
  − Use an existing ontology that is maintained for free
  − Problems
    • Too large (about 6 lakh, i.e. 600,000, DMOZ nodes); the ontology has to be drastically pruned for use
    • A classifier must be built for each DMOZ node

Our approach
• Use Wikipedia as the language of profile representation: map user documents to Wikipedia concepts
  − Lower bias than DMOZ and lower variance than word-based profiles
• Build a hierarchical profile based on Wikipedia
• Tag the profile concepts as transactional or recreational
• Compute the recency of user interest in a particular topic

Mapping documents (web pages) to Wikipedia concepts
• Item: “Sony to slash PlayStation3 price”
  Term vector representation: <sony:1>, <slash:1>, <playstation3:1>, <price:1>
• Item: “Jittery Sony Knocks $100 Off PS3 Price Tag”
  Term vector representation: <jittery:1>, <sony:1>, <knocks:1>, <ps3:1>, <price:1>, <tag:1>
• The item is used as a query against an index of a Wikipedia dump; the titles of the retrieved articles become additional features
• Articles retrieved for the query “Sony to slash PlayStation3 price”:
  1. PlayStation Network Platform
  2. PlayStation 2
  3. Ducks demo
  4. PlayStation 3
  5. PlayStation
  6. Ken Kutaragi
  7. PlayStation Portable
  8. Console manufacturer
  9. Sony Group
  10. Crystal Dynamics
  11. PlayStation 3 accessories
  …

Term vector vs Wikipedia profiles
• Words in a TF*IDF-based user profile: Search, Home, Help, News, Privacy, Google, Terms, New, Page, Use, Web, View, Results, Information, Account
• Concepts in a Wikipedia-based user profile: Text Retrieval Conference, HTML element, Bank of America, Google search, ICICI Bank, IDBI Bank, Bank fraud, Artificial neural network, Web crawler, Web design, Debit card, Extensible Markup Language, Hewlett-Packard, Microsoft, XHTML, Demand account

Constructing the hierarchical profile
• Algorithm of Xu et al. [WWW 2007]
• Support = number of pages mapped to a concept
• Example:
  Photography (15)
    − Wildlife photography (5)
    − Nature photography (10)
  Here the parent’s support (15) equals the sum of its children’s (5 + 10)

Tagging concepts in user profiles
• Two types of tags
  − Whether the concept is of commercial or recreational interest
  − Recency of interest
• Tagging commercial interest
  − Crawl shopping-site pages, map the pages to concepts and label these concepts as commercial interests
• Tagging recreational interest
  − Use topics in Wikipedia’s recreation/hobby categories
• Recency of interest
  − $\text{Recency}(c) = \sum_{p \in P(c)} e^{-(t_{\text{today}} - t_p)}$, where P(c) is the set of pages supporting topic c and t_p is the time page p was last accessed

Wikipedia-based profile: evaluation results
• Profiles are stable (Figure 1)
• Profile elements with high support have high precision (Figure 2)
• Profile elements at all levels of the hierarchy have similar precision (Figure 3)
• Bookmarks are not a good data source for profiles
[Figure 1: stability (Stability_alpha and Stability_date) versus number of web pages in cache.]
[Figure 2: precision for Support > 5, 3 < Support < 5 and Support < 3.]
[Figure 3: precision and percentage in profile for hierarchy levels 1 to 6.]

Query expansion: personalized video
• Approach
  − Create three additional queries (based on terms with high TF in the title, tags and description)
  − Evaluate which expansion is better
• Example: query on YouTube for “trains”; expansion using
  − Title: train+osbourne+midnight+bullet+rollin+mystery+maglev
  − Description: train+runaway+record+version+video+http+track
  − Tags: train+railroad+guitar+osbourne+railway+bullet
• Cross-lingual expansions
  − Baba Ramdev → Baba+ramdev+yoga+swami+pranayam+liye+ram+disease+dev+india+dhyan

Query expansion: “Find similar”
• Problem: can we construct queries that make getting “similar content” easier?
• Approach: identify key phrases for a text document, query a standard search engine, rank the results
• Example query document: Ed Lazowska’s talk
  − Query built from the original document: capture restart+capture random+random walk+page rank+capture random walk+restart
  − This query retrieves Hopcroft’s talk at rank 1 in Google
  − Result: Hopcroft’s talk

Query expansion: “Find similar” (second example)
• Key phrases: economic growth, global development, economic history, economic governance, adam smith, good governance, economic growth process, modern technology
• Query: economic+growth+global+development+history+governance+adam+smith+process+rich+good+new+knowledge+cgd+brief+world+property+rights+productivity+labor+human+capital+getting+use+modern+technology+trade+barriers+public+goods+poor+countries+machine+natural+resources+research+intellectua

Aggregating search results
• Current search interfaces are geared to immediate gratification; there is no way to trade off search latency for more relevant results
• Different search engines have different coverage; there is no way to benefit from this
• Navigating results requires clicking back and forth through the search results
  − Search-result snippets are often misleading

Our solution
• An aggregated and personalized Information Retrieval (IR) system that
  − compiles and consolidates the most relevant information on particular topic(s) from the web
  − automatically creates a PDF document on the topic

Ranking results
• Content-Based Ranking (CBR), based on TF, IDF, document boost and field boost
• Delicious Vector Cosine Similarity (DVCS)
• $\text{Rank}(URL) = d \cdot \text{CBR} + (1 - d) \cdot \text{DVCS}$, where d weights the two scores

User interface

Results

User study

Document summarization using Wikipedia: Algorithm 1
• Document sentences are mapped to Wikipedia concepts
[Figure: as on the concept-mapping slide, the query “Sony to slash PlayStation3 price” is run against an index of Wikipedia content, and the titles of the retrieved articles (PlayStation Network Platform, PlayStation 2, Ducks demo, PlayStation 3, …) serve as additional features.]
• Uses the in-degree of the concept–sentence bipartite graph for sentence selection
• Tested on DUC 2002 data from NIST: would have come in 3rd in the NIST challenge
• Limitations
  − Controlling the size of the summary
  − General concepts (e.g. Sports) may win over specific concepts (e.g. Soccer)
• Example concept–sentence matrix (1 = the sentence maps to the concept):
        C1  C2  C3  C4
    S1   1   0   1   0
    S2   0   1   1   0
    S3   0   0   0   1
  e.g. C3 has in-degree 2

Document summarization: Algorithm 2
• Intuition: important sentences in the document map to important concepts, and vice versa
• Propagate sentence importance to concepts and concept importance to sentences over multiple iterations: $x_n^{t+1} = f(x^t, G)$, computed in two steps
  − Accumulate step: $y_m^{t+1} = \sum_{n \in N(m)} x_n^t$
  − Broadcast step: $x_n^{t+1} = \sum_{m \in M(n)} y_m^{t+1}$
  where x_n is the importance of sentence n, y_m the importance of concept m, N(m) the sentences mapped to concept m, M(n) the concepts that sentence n maps to, and G the concept–sentence graph
• Future work: size of summary, multi-document summaries, Indian-language summaries

Challenge 1
• Better intent expression
• Multilingual query reformulation
  − Baba Ramdev → Baba+ramdev+yoga+swami+pranayam+liye+ram+disease+dev+india+dhyan
• Interfaces that simplify feedback for query reformulation

Challenge 2
• Long-standing queries
• Queries spread over time
  − Learning photography
  − Information delivery needs to be incremental and non-repetitive
  − Video retrieval
    • Channels
    • Create initial stickiness
    • Ensure ongoing interest
  − Caching: utility models
• What are good evaluation measures for such systems?

Challenge 3
• Document summarization
  − Extracting leads
  − Compression versus missed information
  − Cross-lingual summarization
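The “Mapping documents (web pages) to Wikipedia concepts” slide represents each item as a term vector and runs it as a query against an index of a Wikipedia dump, keeping the titles of the retrieved articles as concepts. The sketch below imitates that retrieval with plain TF-IDF cosine similarity over a tiny in-memory dictionary; the article texts, the IDF smoothing and the function names (`term_vector`, `map_to_concepts`) are illustrative assumptions, not the SWAN implementation, which queried a full-text index.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for Wikipedia article texts (not real article content).
ARTICLES = {
    "PlayStation 3": "sony playstation3 video game console price",
    "Ken Kutaragi": "sony executive known as the father of the playstation",
    "Photography": "camera lens light image",
}

def term_vector(text: str) -> Counter:
    """Bag-of-words term vector, e.g. <sony:1>, <price:1>."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def tfidf(vec: Counter, idf: dict) -> dict:
    # Terms never seen in the article collection get weight 0.
    return {t: tf * idf.get(t, 0.0) for t, tf in vec.items()}

def cosine(a: dict, b: dict) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def map_to_concepts(item: str, articles: dict, k: int = 3) -> list:
    """Return the titles of up to k articles most similar to the item."""
    docs = {title: term_vector(title + " " + body) for title, body in articles.items()}
    df = Counter(term for vec in docs.values() for term in vec)
    idf = {t: math.log(len(docs) / n) + 1.0 for t, n in df.items()}
    query = tfidf(term_vector(item), idf)
    doc_vecs = {title: tfidf(vec, idf) for title, vec in docs.items()}
    ranked = sorted(doc_vecs, key=lambda t: cosine(query, doc_vecs[t]), reverse=True)
    # Drop articles that share no weighted terms with the item.
    return [t for t in ranked[:k] if cosine(query, doc_vecs[t]) > 0]
```

For “Sony to slash PlayStation3 price”, the toy article “PlayStation 3” ranks first and “Photography” is filtered out because it shares no terms with the item.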
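The “Constructing the hierarchical profile” slide relies on the algorithm of Xu et al. [WWW 2007], which is not reproduced here. As a simpler illustration of the support numbers on that slide (Photography (15) above Wildlife photography (5) and Nature photography (10)), the sketch below adds each concept’s page count to the concept and all of its ancestors. It assumes a hypothetical `parent` map (for example, one derived from Wikipedia categories) and that each page is counted under exactly one concept.

```python
def rollup_support(direct: dict, parent: dict) -> dict:
    """Add each concept's page count to itself and every ancestor.

    direct: number of pages mapped directly to each concept.
    parent: hypothetical child -> parent concept map.
    """
    support = {}
    for concept, count in direct.items():
        node, seen = concept, set()
        while node is not None and node not in seen:  # `seen` guards against cycles
            seen.add(node)
            support[node] = support.get(node, 0) + count
            node = parent.get(node)
    return support

direct = {"Wildlife photography": 5, "Nature photography": 10}
parent = {"Wildlife photography": "Photography", "Nature photography": "Photography"}
print(rollup_support(direct, parent))
# {'Wildlife photography': 5, 'Photography': 15, 'Nature photography': 10}
```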
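The recency-of-interest tag sums 1/e^(today − last access) over the pages that support a topic, so recently read pages contribute close to 1 and older pages decay quickly toward 0. A minimal sketch, assuming the time difference is measured in whole days (the slide does not state a unit) and that last-access dates are available per page:

```python
import math
from datetime import date

def recency(today: date, last_accessed: list) -> float:
    """Sum of 1 / e^(today - last access) over pages supporting a topic.

    Assumes the time difference is measured in days.
    """
    return sum(math.exp(-(today - d).days) for d in last_accessed)

today = date(2008, 1, 10)
fresh = recency(today, [date(2008, 1, 10), date(2008, 1, 9)])  # about 1.37
stale = recency(today, [date(2007, 12, 1), date(2007, 12, 5)])  # close to 0
```

Because the decay is exponential in days, a topic read only a month ago scores almost nothing; a slower decay (for example, dividing the age by a time constant) would be a design choice the slide does not specify.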
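The personalized-video query expansion builds each extra query from terms with high TF in the titles, tags or descriptions of the results. A rough sketch, assuming the texts of one field (say, the titles of the top results for “trains”) are already available as strings; the stopword list, the cut-off `k` and the absence of stemming are illustrative choices, not details from the slides.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "with"}

def expand_query(query: str, texts: list, k: int = 6) -> str:
    """Append the k most frequent new terms in `texts` to the query."""
    q_terms = re.findall(r"[a-z0-9]+", query.lower())
    counts = Counter()
    for text in texts:
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            if term not in STOPWORDS and term not in q_terms:
                counts[term] += 1
    extra = [term for term, _ in counts.most_common(k)]
    return "+".join(q_terms + extra)

titles = ["bullet train maglev", "midnight train bullet", "train"]
print(expand_query("trains", titles, k=2))  # trains+train+bullet
```

Calling it once per field (titles, tags, descriptions) yields the three candidate queries that the approach then compares.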
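The “Ranking results” slide combines a content-based score with a Delicious vector cosine similarity as Rank(URL) = d·CBR + (1 − d)·DVCS. The sketch below assumes that CBR is already normalized to [0, 1], that DVCS compares a URL’s Delicious tag vector with a tag vector for the user or topic, and that d defaults to 0.7; all three are assumptions, since the slide does not specify them.

```python
import math

def cosine(a: dict, b: dict) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_urls(cbr: dict, tag_vectors: dict, profile: dict, d: float = 0.7) -> list:
    """Rank URLs by d * CBR + (1 - d) * DVCS, highest first.

    cbr: assumed normalized content-based score per URL.
    tag_vectors: hypothetical Delicious tag weights per URL.
    profile: hypothetical tag weights describing the user or topic.
    """
    scores = {
        url: d * cbr[url] + (1 - d) * cosine(tag_vectors.get(url, {}), profile)
        for url in cbr
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

cbr = {"a": 0.9, "b": 0.5}
tags = {"a": {"cooking": 1.0}, "b": {"photography": 1.0}}
ranked = rank_urls(cbr, tags, {"photography": 1.0}, d=0.5)
```

With d = 0.5, URL “b” overtakes “a” because its tags match the profile, even though its content score is lower; raising d shifts the weight back toward content.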
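Summarization Algorithm 1 selects sentences using the in-degree of the concept–sentence bipartite graph. One plausible reading, sketched below under that assumption, scores each concept by its in-degree (the number of sentences mapped to it) and each sentence by the sum of its concepts’ in-degrees, then keeps the top `k` sentences in document order. The matrix is the slide’s example (rows S1–S3, columns C1–C4); the function names are hypothetical.

```python
def concept_indegree(matrix: list) -> list:
    """In-degree of each concept: number of sentences mapped to it (column sums)."""
    return [sum(row[c] for row in matrix) for c in range(len(matrix[0]))]

def indegree_summary(matrix: list, k: int) -> list:
    """Indices of the k highest-scoring sentences, returned in document order.

    Assumed scoring: a sentence's score is the sum of its concepts' in-degrees.
    """
    indegree = concept_indegree(matrix)
    scores = [sum(indegree[c] for c, edge in enumerate(row) if edge) for row in matrix]
    best = sorted(range(len(matrix)), key=lambda s: scores[s], reverse=True)[:k]
    return sorted(best)

# Example from the slide: rows S1-S3, columns C1-C4.
M = [
    [1, 0, 1, 0],  # S1
    [0, 1, 1, 0],  # S2
    [0, 0, 0, 1],  # S3
]
print(concept_indegree(M))     # [1, 1, 2, 1]  (C3 has in-degree 2)
print(indegree_summary(M, 2))  # [0, 1]  (S1 and S2)
```

This scoring also shows the slide’s stated limitation: a broad concept shared by many sentences accumulates a high in-degree and can outweigh a more specific one.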
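Algorithm 2 alternates an accumulate step, in which each concept sums the importance of its sentences, and a broadcast step, in which each sentence sums the importance of its concepts. The sketch below runs those two sums for a fixed number of iterations. The uniform starting scores, the L2 normalization after each round and the iteration count are assumptions added so the scores stay bounded, in the spirit of HITS-style mutual reinforcement; this is not a reproduction of the authors’ method.

```python
import math

def propagate(matrix: list, iterations: int = 20) -> list:
    """Iteratively score sentences; matrix[n][m] = 1 if sentence n maps to concept m."""
    n_sent, n_conc = len(matrix), len(matrix[0])
    x = [1.0] * n_sent  # sentence importance x_n (assumed uniform start)
    for _ in range(iterations):
        # Accumulate: y_m = sum of x_n over sentences n in N(m)
        y = [sum(x[n] for n in range(n_sent) if matrix[n][m]) for m in range(n_conc)]
        # Broadcast: x_n = sum of y_m over concepts m in M(n)
        x = [sum(y[m] for m in range(n_conc) if matrix[n][m]) for n in range(n_sent)]
        # Assumed L2 normalization keeps the scores bounded.
        norm = math.sqrt(sum(v * v for v in x)) or 1.0
        x = [v / norm for v in x]
    return x

M = [[1, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]]  # slide example, rows S1-S3
print([round(v, 3) for v in propagate(M)])  # [0.707, 0.707, 0.0]
```

On the example matrix, S1 and S2 reinforce each other through the shared concept C3 and end up with equal, dominant scores, while S3 (linked only to C4) fades toward zero.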