Data-driven Modeling and Design of Networked Mobile Societies: A Paradigm Shift for Future Social Networking
Ahmed Helmy, Computer and Information Science and Engineering (CISE)
Department, University of Florida
http://www.cise.ufl.edu/~helmy
Founder & Director: Wireless Mobile Networking Lab, http://nile.cise.ufl.edu

Networked Mobile Societies: Everywhere, Anytime
• Transportation/vehicular networks
• Sensor networks
• Disaster & emergency alerts
• Mobile ad hoc, sensor and delay tolerant networks

Emerging Behavior-Aware Services
• Tight coupling between users and devices
– Devices can infer user preferences and behavior
– Capabilities: communication, computation, storage, sensing
• New generation of behavior-aware protocols
– Behavior: mobility, interest, trust, friendship, …
– Apps: interest-cast, participatory sensing, crowdsourcing, mobile social nets, alert systems, …
• New paradigms of communication?!

Paradigm Shift in Protocol Design
Used to:
• Design general-purpose protocols
• Evaluate them using models (random mobility, traffic, …)
• Deploy in context, then modify them to improve performance or fix failures in that specific context
– May end up with suboptimal performance or failures due to the lack of context in the design
Propose to:
• Analyze and model the deployment context
• Design ‘application class’-specific, parameterized protocols
• Use insights from the context analysis to fine-tune protocol parameters

Problem Statement
• How can we gain insight into the deployment context?
• How can we use that insight to design future services?

Approach
• Extensive trace-based analysis to identify dominant trends & characteristics
• Analysis of user behavioral patterns
– Individual user behavior and mobility
– Collective user behavior: grouping, encounters
• Integration of the findings into modeling and protocol design
– I. User mobility modeling
– II. Behavioral grouping
– III. Information dissemination in mobile societies: profile-cast

The TRACE Framework
[Figure: the TRACE framework with MobiLib: Trace → Represent (as a matrix $X = [x_{i,j}]$, rows $1..t$, columns $1..n$) → Analyze (characterize, cluster) → Employ (modeling, protocol design)]

Community-wide Wireless/Mobility Library
• Library of
– Measurements from universities and vehicular networks
– Realistic models of behavior (mobility, traffic, encounters)
– Simulation benchmarks
– Tools for trace data mining
• Available libraries:
– CRAWDAD (Dartmouth, ’05–): crawdad.cs.dartmouth.edu
– MobiLib (USC & UFL, ’04–): nile.cise.ufl.edu/MobiLib
• 60+ traces from USC, Dartmouth, MIT, UCSD, UCSB, UNC, UMass, GATech, Cambridge, UFL, …
• Tools for mobility modeling (IMPORTANT, TVC) and data mining
• Types of traces:
– Campus WLANs, conference AP and encounter traces
– Municipal (off-campus) wireless APs, bus & vehicular traces

IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis*
• 4 major campuses; 30-day traces studied out of 2+ years of traces
• Total users: > 12,000
• Total access points: > 1,300

Trace source | Trace duration      | User type           | Environment           | Collection method | Analyzed part
MIT          | 7/20/02 – 8/17/02   | Generic             | 3 corporate buildings | Polling           | Whole trace
Dartmouth    | 4/01/01 – 6/30/04   | Generic w/ subgroup | University campus     | Event-based       | July ’03, April ’04
UCSD         | 9/22/02 – 12/8/02   | PDA only            | University campus     | Polling           | 09/22/02 – 10/21/02
USC          | 4/20/05 – 3/31/06   | Generic             | University campus     | Event-based       | 04/20/05 – 05/19/05 (Bldg)

* W. Hsu, A. Helmy, “IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis”, two papers: IEEE Wireless Network Measurements (WiNMee), April 2006, and IEEE Transactions on Mobile Computing, 2010 (to appear).
Case Study I – Individual Mobility
[Figure: study roadmap. Traces → Observations → Applications. Observations: individual user mobility (microscopic behavior); user groups in the population and encounter patterns in the network (macroscopic behavior). Applications: mobility model, profile-cast protocol, SmallWorld-based message dissemination.]

Classification of Mobility Models
[Figure: the mobility-model space, classified by geographic restriction, temporal correlation and spatial correlation*]
* F. Bai, A. Helmy, “A Survey of Mobility Modeling and Analysis in Wireless Adhoc Networks”, book chapter in “Wireless Ad Hoc and Sensor Networks”, Kluwer Academic Publishers, June 2004.

Spatio-temporal Mobility in WLANs
• Simple existing models differ markedly from the spatio-temporal characteristics observed in WLANs
• Characteristics:
– Skewed location preference: 95% of online time is spent at the 5 most visited APs [Figure: CCDF, Prob.(online time fraction > x)]
– On/off activity pattern
– Periodic re-appearance: repetition peaks at daily/weekly intervals

The TVC Model: Reproducing Mobility Characteristics
Time-Variant Community (TVC) model:
1. Assigns communities (locations) to users to reproduce the location-visiting preference
2. Varies the temporal assignment of communities to reproduce periodic re-appearance
[Figure: skewed location-visiting preference; average fraction of online time associated with each AP, APs sorted by total time associated (log scale); MIT trace vs. Model-simplified and Model-complex]
[Figure: periodic re-appearance; Prob.(node re-appears at the same AP after the time gap) vs. time gap (0 to 8 days); MIT trace vs. Model-simplified and Model-complex]
IEEE INFOCOM 2007; IEEE/ACM Trans. on Networking 2009
* Model-simplified: single community per node.
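The two WLAN characteristics that the TVC model is built to reproduce can be measured directly from a user's association log. This is an illustrative sketch only. It assumes a hypothetical per-user input of (day, ap, seconds_online) tuples. The re-appearance estimator, the share of a day's AP visits that repeat exactly g days later, is one plausible reading of the slide's metric and not necessarily the definition used in the INFOCOM 2007 paper.

```python
from collections import defaultdict


def top_k_online_fraction(sessions, k=5):
    """Share of a user's total online time spent at the user's k most-visited APs.

    sessions: list of (day, ap, seconds_online) for one user (hypothetical format).
    """
    per_ap = defaultdict(float)
    for _day, ap, secs in sessions:
        per_ap[ap] += secs
    total = sum(per_ap.values())
    if total == 0:
        return 0.0
    return sum(sorted(per_ap.values(), reverse=True)[:k]) / total


def reappearance_probability(sessions, gap_days, num_days):
    """Estimated probability that a visit to an AP recurs exactly gap_days later.

    Every (day, AP) visit whose later day still falls inside the num_days-long
    trace is one trial; the trial succeeds if the user is seen at the same AP
    again gap_days later.
    """
    visited = defaultdict(set)                  # day -> APs visited that day
    for day, ap, _secs in sessions:
        visited[day].add(ap)
    trials = hits = 0
    for day, aps in visited.items():
        if day + gap_days >= num_days:
            continue                            # later day is outside the trace
        later = visited.get(day + gap_days, set())
        for ap in aps:
            trials += 1
            if ap in later:
                hits += 1
    return hits / trials if trials else 0.0
```

For a user with a weekday office routine, the 7-day re-appearance probability comes out higher than the 1-day value. That is the kind of weekly peak the slide describes.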
Model-complex: multiple communities per node.
** Similar matches were achieved for the USC and Dartmouth traces.

Case Study II – Encounter Patterns

Case Study II: Goal
• Understand inter-node encounter patterns from a global perspective
– How do we represent encounter patterns?
– How do encounter patterns influence network connectivity and communication protocols?
• Encounter definition:
– In a WLAN: two mobile nodes that access the same AP at the same time have an ‘encounter’
– In a DTN: two mobile nodes that move within communication range of each other have an ‘encounter’

Observations: Nodal Encounters
[Figure: CCDF of unique encounter count, Prob.(unique encounter fraction > x) vs. fraction of user population (x); CCDF of total encounter count, Prob.(total encounter events > x); traces: Cambridge, UCSD, MIT, USC, Dart-03, Dart-04]
• In all the traces, mobile nodes (MNs) encounter only a small fraction of the user population
• On average, a user encounters 1.8%–6% of the user population
• The total number of encounters per user follows a BiPareto distribution
W. Hsu, A. Helmy, “On Nodal Encounter Patterns in Wireless LAN Traces”, IEEE Transactions on Mobile Computing (TMC), to appear

The Encounter Graph
• Vertices: mobile nodes; edges: node encounters
[Figure: daily encounter graphs for the MIT trace]

Small Worlds of Encounters
• Encounter graph: nodes as vertices, with an edge linking every pair of vertices that have an encounter
[Figure: normalized clustering coefficient (CC) and average path length (PL) of the encounter graph, compared with a regular graph, a small world and a random graph]
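The WLAN encounter definition above can be turned into an encounter graph, and the graph's two small-world metrics can then be computed from it. This is a minimal sketch under stated assumptions. The (user, ap, start, end) session format is hypothetical. Only users with at least one encounter become vertices. The study's normalization against regular and random graphs is not reproduced.

```python
from collections import defaultdict, deque
from itertools import combinations


def encounter_graph(sessions):
    """Undirected encounter graph: vertex = user, edge = at least one encounter.

    sessions: iterable of (user, ap, start, end) with start < end (hypothetical format).
    Two users encounter when associated with the same AP at overlapping times.
    """
    by_ap = defaultdict(list)
    for user, ap, start, end in sessions:
        by_ap[ap].append((user, start, end))
    graph = defaultdict(set)
    for visits in by_ap.values():
        for (u, s1, e1), (v, s2, e2) in combinations(visits, 2):
            if u != v and s1 < e2 and s2 < e1:      # the two intervals overlap
                graph[u].add(v)
                graph[v].add(u)
    return graph


def clustering_coefficient(graph):
    """Average local clustering coefficient; vertices of degree < 2 contribute 0."""
    if not graph:
        return 0.0
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count the edges that exist among this vertex's neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean shortest-path hop count over all ordered pairs of reachable vertices (BFS)."""
    hops = pairs = 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in graph[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        hops += sum(dist.values())
        pairs += len(dist) - 1
    return hops / pairs if pairs else 0.0
```

To classify a graph as a small world (high CC, low PL), compare these two values with those of a random graph of the same size and mean degree.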
• The encounter graph is a Small World graph (high CC, low PL)
• Even for a short time period (1 day), its metrics (CC, PL) almost saturate

Information Diffusion in DTNs via Encounters
• Epidemic routing (spatio-temporal broadcast) achieves almost complete delivery
– Trace duration = 15 days [Figure: unreachable ratio, USC]
– Robust to the removal of short encounters
– Robust to selfish nodes (up to ~40%)

Encounter Graphs using Friends
• The distribution of the friendship index (FI) is exponential for all the traces
• Friendship between MNs is highly asymmetric
• Among all node pairs: < 5% have FI > 0.01, and < 1% have FI > 0.4
• Top-ranked friends form cliques; low-ranked friends provide the random links (shortcuts) that reduce the degree of separation in the encounter graph

Case Study III – Groups in WLAN

Case Study III: Goal
• Identify similar users (in terms of long-run mobility preferences) in the diverse WLAN user population
– Understand the constituents of the population
– Identify potential groups for group-aware services
• Classify users based on their mobility trends and location-visiting preferences
– Traces studied: semester-long USC trace (spring 2006, 94 days) and quarter-long Dartmouth trace (spring 2004, 61 days)

Representation of User Association Patterns
W. Hsu, D. Dutta, A. Helmy, “Mining Behavioral Groups in WLANs”, ACM MobiCom ’07
• Summarize a user's association per day by a vector
– a = {a_j : fraction of online time user i spends at AP_j on day d}
– Example: office 10AM–12PM, library 3PM–4PM, class 6PM–8PM gives the association vector (library, office, class) = (0.2, 0.4, 0.4)
• Summarize long-run mobility in an “association matrix” $M = [x_{i,j}]$ ($t$ days × $n$ locations)
– Each row is one day's association vector, i.e., the percentage of time spent at each location that day
– Each column corresponds to a location
– Entry $x_{i,j}$: percentage of online time during day $i$ at location $j$
[Figure: example association matrix describing a given user's location-visiting preference, with columns such as Office and Dorm]

Eigen-behaviors & Behavioral Similarity Distance
• Eigen-behaviors (EBs): vectors describing the maximum remaining power in the association matrix M, obtained through SVD, which yields the eigen-vectors, their eigen-values and their relative importance
• Eigen-behavior distance: weighted inner products of the EBs of users U and V:
$\mathrm{Sim}(U,V) = \sum_{i,j} w_i w_j \, u_i \cdot v_j$
• Association patterns can be reconstructed with low rank and low error
• For over 99% of users, fewer than 7 vectors capture more than 90% of M's power

Similarity-based User Classification
• Hierarchical clustering into groups of behaviorally similar users
• High-quality clustering:
– Inter-group vs. intra-group distance (AMVD*): 0.93 vs. 0.46 (USC), 0.91 vs. 0.42 (Dartmouth)
– Significance vs. random groups
[Figure: CDF of eigen-behavior distance between users, inter-group vs. intra-group (Dartmouth)]
*AMVD = Average Minimum Vector Distance
– Groups are unique based on eigen-behaviors; significance score of the top eigen-behavior:

                 | USC   | Dartmouth
Its own group    | 0.779 | 0.727
Other groups     | 0.005 | 0.004

User Groups in WLAN – Observations
• Identified hundreds of distinct groups of similar users
• Skewed group-size distribution
– The largest 10 groups account for more than 30% of the on-campus population
– Group sizes are power-law distributed [Figure: group size vs. user group size rank, log-log; fits: Dartmouth 540·x^-0.67, USC 500·x^-0.75]
• Most groups can be described by a list of locations with a clear ordering of importance
• Some groups visit multiple locations with similar importance
– Taking only the most important location for each user is not sufficient

Behavioral Similarity: The Missing Link
[Figure: traces vs. models]
• Existing models produce behaviorally homogeneous users and lack the richness of behavioral structure found in real traces. Richer models are needed!
• Behavioral similarity graphs: random and community models produce fully connected similarity graphs
G. Thakur, A. Helmy, W. Hsu, “Similarity analysis and modeling of similarity in mobile societies: The missing link”, UF Tech Report, Jun 2010

Profile-cast: A New Communication Paradigm
W. Hsu, D. Dutta, A. Helmy, ACM MobiCom 2007, WCNC 2008, Trans. Networking (to appear)
[Figure: a conventional packet carries a payload and a destination address; a profile-cast packet carries a payload and a target profile]
• Sending messages to others with similar behavior, without knowing their identities
– Announcements to users with a specific behavioral profile V
– Interest-based ads, similarity-based resource discovery
• Designed for Delay Tolerant Networks (DTNs)
[Figure: nodes A to E; each node carrying the message asks whether the node it meets (B, C, D, E) is similar to V]
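The association-vector, eigen-behavior and similarity steps described earlier can be sketched with NumPy. Several choices here are assumptions, not details taken from the slides:

• "Power" is measured as squared singular values.
• The weights w_i are the kept singular values, normalized to sum to 1.
• The default 90% cut-off mirrors the statistic quoted above.
• An absolute value is taken in the inner product to cancel the arbitrary sign of SVD vectors.

All users' matrices must share the same location columns.

```python
import numpy as np


def association_vector(visits, locations):
    """Fraction of one day's online time spent at each location.

    visits:    list of (location, start_hour, end_hour) for one day.
    locations: fixed column order shared by all users.
    """
    totals = dict.fromkeys(locations, 0.0)
    for loc, start, end in visits:
        totals[loc] += end - start
    online = sum(totals.values())
    return np.array([totals[loc] / online for loc in locations])


def eigen_behaviors(assoc, power=0.9):
    """Top eigen-behaviors of a t x n association matrix (rows = days).

    Returns (vectors, weights): the first r right-singular vectors, where r is
    the smallest rank whose squared singular values reach `power` of the
    total (assumed definition of power), and their singular values
    normalized to sum to 1 (assumed definition of relative importance).
    """
    _, s, vt = np.linalg.svd(np.asarray(assoc, dtype=float), full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    rank = min(int(np.searchsorted(energy, power)) + 1, len(s))
    weights = s[:rank] / np.sum(s[:rank])
    return vt[:rank], weights


def similarity(u_vecs, u_weights, v_vecs, v_weights):
    """Sum over i, j of w_i * w_j * |u_i . v_j| for the two users' eigen-behaviors.

    The absolute value cancels the arbitrary sign of SVD vectors (an assumption).
    """
    dots = np.abs(u_vecs @ v_vecs.T)        # |u_i . v_j| for every pair (i, j)
    return float(u_weights @ dots @ v_weights)
```

`association_vector` reproduces the slide's office/library/class example as (0.2, 0.4, 0.4). Two users who split their days between the same two buildings score close to 1. A user who stays mostly in a third location scores close to 0 against them.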
Profile-cast Use Cases
• Mobility-based profile-cast (target mode)
– Targets a group of users who move in a particular pattern (lost-and-found, context-aware messages, moviegoers)
– Approach: use a “similarity metric” between users
[Figure: scoped message spread in the mobility space; source S forwards through neighbors N toward destinations D]
• Mobility-independent profile-cast (dissemination mode)
– Targets people with certain characteristics, independent of mobility (e.g., classical music lovers)
– Approach: use “Small World” encounter patterns

Profile-cast Operation
1. Profiling: determining user similarity
– S sends the eigen-behaviors of the virtual (target) profile to N
– N evaluates the similarity by weighted inner products of eigen-behaviors: $\mathrm{Sim}(U,V) = \sum_{i,j} w_i w_j \, u_i \cdot v_j$
2. Forwarding decision
– The message is forwarded if Sim(U, V) is high (the goal is to deliver messages to nodes with a similar profile)
– Privacy-preserving: N and S do not send information about their own behavior

Profile-cast CSI Protocol: Target Mode
• Sim(BP(A), P(T)) = similarity of node A's behavioral profile to the target profile
[Figure: mobility profile-cast, intra-group: goal vs. epidemic, group-spread, single long random walk and multiple short random walks from source S]
[Figure: mobility profile-cast, inter-group toward the target profile (T.P.): goal vs. epidemic, gradient-ascend, single long random walk, group-spread and multiple short random walks]

Profile-cast Evaluation
* Results are presented as a ratio to epidemic routing
– Over 96% delivery ratio
– Over 98% reduction in overhead w.r.t. epidemic
– Random walk (RW): < 45% delivery
– Strikes a near-optimal balance between delivery, overhead and delay
– Other variants (e.g., multi-copy, simulated annealing) are under investigation

Extending Interest and Behavior Beyond Mobility
• In addition to mobility, users' web access and traffic patterns and the applications they use (among others) represent other dimensions of interest and behavior
• Further analysis of network measurements (e.g., NetFlow) can reveal behavioral characteristics in these dimensions
• NetFlow traces are 3 orders of magnitude larger than WLAN traces (WLANs: dozens of millions; NetFlow: dozens of billions)
• New challenges in mining ‘big data’ to extract information
S. Moghaddam, A. Helmy, S. Ranka, M. Somaya, “Data-driven Co-clustering Model of Internet Usage in Large Mobile Societies”, UF Tech Report, May 2010

Web-usage Spatio-temporal Multi-dimensional Clustering
[Figure: clustering of locations based on web access; similar locations are coded with the same color]
– Users can be consistently modeled using few (~10) clusters with disjoint profiles
– Access patterns from multiple locations show clustered, distinct behavior

Gender-based Feature Analysis in Campus-wide WLANs
U. Kumar, N. Yadav, A. Helmy, MobiCom 2007, CRAWDAD 2007
[Figure: percentage of male vs. female users by device manufacturer (e.g., Intel, Apple, D-Link); average visit duration (sec) of male vs. female visitors by campus area]
– Users could be classified by gender using knowledge of the campus map
– Users exhibit distinct online behavior, device preference and mobility depending on gender
– On-going work:
• How much more can we know?
• What is the “information-privacy trade-off”?
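The profile-cast forwarding decision and its comparison with epidemic routing can be illustrated on a toy contact trace. This is a hypothetical, much-simplified threshold rule, not the CSI protocol. It assumes each node's similarity to the target profile is already known. A copy is handed over only to encountered nodes at or above the threshold. Those same nodes are treated as the intended recipients.

```python
def disseminate(contacts, source, similarity, threshold=None):
    """Spread one message over a time-ordered contact trace.

    contacts:   list of (time, node_a, node_b) encounters, sorted by time.
    similarity: dict node -> similarity of its profile to the target profile.
    threshold:  None means epidemic routing (copy to every node met);
                otherwise a copy is handed over only to nodes whose
                similarity is at least `threshold` (hypothetical rule).
    Returns (nodes holding a copy, number of transmissions).
    """
    holders = {source}
    transmissions = 0
    for _time, a, b in contacts:
        # Check both directions: either endpoint may be the current carrier.
        for carrier, peer in ((a, b), (b, a)):
            if carrier in holders and peer not in holders:
                if threshold is None or similarity[peer] >= threshold:
                    holders.add(peer)
                    transmissions += 1
    return holders, transmissions


def compare_to_epidemic(contacts, source, similarity, threshold):
    """Delivery and overhead of the threshold rule as ratios to epidemic routing.

    Targets are assumed to be the non-source nodes at or above the threshold.
    """
    targets = {n for n, s in similarity.items() if s >= threshold and n != source}
    epi_nodes, epi_tx = disseminate(contacts, source, similarity)
    pc_nodes, pc_tx = disseminate(contacts, source, similarity, threshold)
    epi_hits = len(targets & epi_nodes)
    delivery = len(targets & pc_nodes) / epi_hits if epi_hits else 0.0
    overhead = pc_tx / epi_tx if epi_tx else 0.0
    return delivery, overhead
```

In a small trace where the dissimilar nodes only bridge to targets that are also reachable through similar nodes, the rule reaches every target with fewer transmissions than epidemic. With sparser contacts, refusing dissimilar relays can also reduce delivery.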
Future Directions (Applications)
• Behavior-aware push/caching services (targeted ads, events of interest, announcements)
• Caching based on behavioral prediction
• Detecting abnormal user behavior & access patterns based on previous profiles
• Can we extend this paradigm to include social aspects (trust, friendship, cooperation)?
• Privacy issues and mobile k-anonymity
• Participatory sensing, deputizing the community

Disaster Relief (Self-Configuring) Networks
[Figure: field of sensor nodes]

On-going and Future Directions: Utilizing Mobility
• Controlled mobility scenarios
– DakNet, Message Ferries, Info Station
• Mobility-assisted protocols
– Mobility-assisted information diffusion: EASE, FRESH, DTN, $100 laptop
• Context-aware networking
– Mobility-aware protocols: self-configuring, mobility-adaptive protocols
– Socially-aware protocols: security, trust, friendship, associations, small worlds
• On-going projects
– Next Generation (Boundless) Classroom
– Disaster Relief Self-configuring Survivable Networks

The Next Generation (Boundless) Classroom
[Figure: students and an instructor connected through an embedded sensor network, sensor-adhoc and WLAN/adhoc links, with multi-party conferencing and tele-collaboration tools]
Challenges
– Integration of the wired Internet, WLANs, ad hoc mobile and sensor networks
– Will this paradigm provide a better learning experience for the students?
– Real-world group experiments (structural health monitoring)

Future Directions: Technology-Human Interaction
[Figure: The Next Generation Classroom as multi-disciplinary research linking emerging wireless & multimedia technologies (protocols, applications, services) with human behavior (mobility, load dynamics), drawing on engineering, human computer interaction (HCI) & user interface, social sciences, cognitive sciences, education and psychology. Open questions: How to capture? How to design? How to evaluate? Related elements: measurements, mobility models, traffic models, protocol design, context-aware networking, application development, service provisioning, educational/learning experience.]

Thank you!
Ahmed Helmy
URL: www.cise.ufl.edu/~helmy
MobiLib: nile.cise.ufl.edu/MobiLib

Outline
• Ad hoc, sensor networks & DTNs
– The paradigm shift: trace-driven design
• The TRACE framework
• Small worlds of encounters
• Mining the mobile society: similarity analysis
• Profile-cast
• Future directions

Background: Delay Tolerant Networks (DTN)
• DTNs are mobile networks with sparse, intermittent nodal connectivity
• Encounter events provide the communication opportunities among nodes
• Messages are stored and carried across the network by nodal mobility
[Figure: nodes A, B, C]

Graphs, Path Length and Clustering
• Regular graph: high path length, high clustering
• Random graph: low path length, low clustering
• Small World graph: low path length, high clustering
[Figure: normalized clustering and path length vs. probability of re-wiring p (0.0001 to 1, log scale)] [Helmy’03]
– In Small Worlds, a few shortcuts contract the diameter (i.e., path length) of a regular graph until it resembles that of a random graph, without affecting the graph structure (i.e., clustering)

On Mobility & Predictability of VoIP & WLAN Users
J. Kim, Y. Du, M. Chen, A. Helmy, CRAWDAD 2007 (work in progress)
[Figure: Markov O(2) predictor accuracy; VoIP user prediction accuracy]
– VoIP users are highly mobile and behave dramatically differently from WLAN users
– Prediction accuracy drops from ~62% on average for WLAN users to below 25% for VoIP users
– This motivates revisiting both mobility modeling and mobility prediction

Profile-cast Operation
1. Profiling
– Profiling user mobility provides a summary of the node's mobility
– The mobility of a node is represented by an association matrix
• Each row represents an association vector for one time slot
• An entry $x_{i,j}$ represents the percentage of online time during time slot $i$ at location $j$
– Singular value decomposition: a few eigen-behavior vectors are sufficient (e.g., for 99% of users, at most 7 vectors describe 90% of the power in the association matrix)

Mobility-Independent Profile-cast
[Figure: goal vs. flooding, SmallWorld-based, single long random walk and multiple short random walks from source S]

Implementation Details (in progress)

Future Work – N-copy-per-clique in the “mobility space”
[Figure: interest space, mobility space and physical space with source S; different legends represent nodes with different mobility trends; white nodes denote the target recipients]
– We expect this to work because similarity in mobility leads to frequent encounters
[Figure: encounter ratio vs. user pair similarity (0 to 1)]
– Challenge: moving from mobility to interest and other classifications

NetFlow Trace Sample
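The Markov O(2) predictor named in the VoIP/WLAN predictability study can be sketched as follows. The slides give no details, so this makes two assumptions: the input is a per-user sequence of visited locations (e.g., APs), and evaluation is online (predict, then update), with unseen contexts counted as misses.

```python
from collections import Counter, defaultdict


def order2_markov_accuracy(locations):
    """Online accuracy of an order-2 Markov next-location predictor.

    At each step i >= 2 the predictor guesses the successor seen most often so
    far after the context (locations[i-2], locations[i-1]); it then learns the
    true next location. Steps with an unseen context count as misses.
    """
    table = defaultdict(Counter)        # context pair -> counts of next location
    hits = trials = 0
    for i in range(2, len(locations)):
        context = (locations[i - 2], locations[i - 1])
        trials += 1
        if table[context]:
            guess, _count = table[context].most_common(1)[0]
            if guess == locations[i]:
                hits += 1
        table[context][locations[i]] += 1
    return hits / trials if trials else 0.0
```

A regular dorm/class/library routine becomes predictable after one cycle. An irregular sequence rarely repeats a context. This illustrates, in miniature, why less regular mobility lowers prediction accuracy.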