Transcript Slide 1
CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
7/18/2015 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Networks with positive and negative relationships
Our basic unit of investigation will be signed triangles. First we will talk about undirected networks, then directed ones.
Plan for today:
- Model: consider two social theories of signed networks.
- Data: reason about them in large online networks.
- Application: predict whether A and B are linked with + or −.

Networks with positive and negative relationships
Consider an undirected complete graph. Label each edge as either:
- Positive: friendship, trust, positive sentiment, …
- Negative: enemy, distrust, negative sentiment, …
Examine triples of connected nodes A, B, C.

Start with the intuition [Heider '46]:
- Friend of my friend is my friend.
- Enemy of my enemy is my friend.
- Enemy of my friend is my enemy.
Look at connected triples of nodes:
- Balanced (+ + + and + − −): consistent with the "friend of a friend" or "enemy of the enemy" intuition.
- Unbalanced (+ + − and − − −): inconsistent with the "friend of a friend" or "enemy of the enemy" intuition.

A graph is balanced if every connected triple of nodes has either all 3 edges labeled + or exactly 1 edge labeled +. (Figure: a balanced and an unbalanced example graph.)

Balance implies global coalitions [Cartwright-Harary]
If all triangles are balanced, then either:
- the network contains only positive edges, or
- the nodes can be split into 2 sets, L and R, where negative edges only point between the sets.
In that case every node in L is an enemy of every node in R, any 2 nodes in L are friends, and any 2 nodes in R are friends. (Figure: nodes A to E split into L, the friends of A, and R, the enemies of A.)

International relations
Positive edge: alliance. Negative edge: animosity.
Separation of Bangladesh from Pakistan in 1971: the US supports Pakistan. Why?
- USSR was an enemy of China.
- China was an enemy of India.
- India was an enemy of Pakistan.
- US was friendly with China.
- China vetoed Bangladesh from the U.N.
(Figure: signed graph over Bangladesh (B), Pakistan (P), US (U), China (C), India (I) and USSR (R); some edges are marked "?".)

Fill in the missing edges to achieve balance.

Balanced?
- Def 1, local view: every triangle is balanced.
- Def 2, global view: the graph can be divided into two coalitions, with + edges inside each coalition and − edges between them.
The two definitions are equivalent!

For a general (not necessarily complete) graph: the graph is balanced if and only if it contains no cycle with an odd number of negative edges. How to compute this?
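Below is a minimal Python sketch of one answer. It assumes the signed graph is given as a list of (u, v, sign) triples with sign +1 or −1, and the function name `is_balanced` is hypothetical. A BFS assigns each node a side. The endpoints of a + edge stay on the same side, and the endpoints of a − edge go to opposite sides. A conflict arises exactly when some cycle has an odd number of negative edges. This variant folds the super-node contraction of the procedure below into a single traversal.

```python
from collections import defaultdict, deque


def is_balanced(edges):
    """Hypothetical helper: test balance of an undirected signed graph.

    edges: iterable of (u, v, sign) with sign in {+1, -1} (assumed input format).
    Returns (True, side) with side[node] in {0, 1} (L or R) if balanced,
    otherwise (False, None).
    """
    adj = defaultdict(list)
    for u, v, s in edges:
        adj[u].append((v, s))
        adj[v].append((u, s))

    side = {}
    for start in adj:
        if start in side:
            continue
        side[start] = 0  # each connected component starts on side L
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                # A + edge keeps v on u's side; a - edge puts v on the other side.
                want = side[u] if s > 0 else 1 - side[u]
                if v not in side:
                    side[v] = want
                    queue.append(v)
                elif side[v] != want:
                    # Some cycle through this edge has an odd number of - edges.
                    return False, None
    return True, side


print(is_balanced([("A", "B", +1), ("B", "C", +1), ("A", "C", -1)])[0])
print(is_balanced([("A", "B", +1), ("B", "C", -1), ("A", "C", -1)])[0])
```

The first triangle (+ + −) is reported as unbalanced. The second (+ − −) is balanced, with A and B on one side and C on the other. A negative self-loop is also reported as unbalanced, which matches the case of a negative edge inside a single + component.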
Find the connected components of the graph restricted to + edges.
For each component, create a super-node.
Connect super-nodes A and B if there is a negative edge between their members.
Assign super-nodes to sides using BFS.

Using BFS, assign each super-node a side (L or R), switching sides across every negative edge. The graph is unbalanced if any two super-nodes joined by a negative edge are assigned the same side. This includes the case of a negative edge inside a single super-node. (Figure: super-nodes labeled L and R; example is unbalanced!)

[CHI '10] Each link A→B is explicitly tagged with a sign:
- Epinions: Trust/Distrust. Does A trust B's product reviews? (Only positive links are visible.)
- Wikipedia: Support/Oppose. Does A support B to become a Wikipedia administrator?
- Slashdot: Friend/Foe. Does A like B's comments?
Other examples: online multiplayer games.

[CHI '10] Does structural balance hold?

Triad    Balanced   Epinions P(T)   Epinions P0(T)   Wikipedia P(T)   Wikipedia P0(T)
+ + +    yes        0.87            0.62             0.70             0.49
+ − −    yes        0.07            0.05             0.21             0.10
+ + −    no         0.05            0.32             0.08             0.49
− − −    no         0.007           0.003            0.011            0.010

P(T): probability of triad T. P0(T): probability of triad T if the signs were assigned at random.
(Figure: real data vs. shuffled data.)

Intuitive picture of a social network: densely linked clusters. How does structure interact with links?
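The random baseline P0(T) in the table can be computed from a simple model. Shuffling the signs approximates a process in which each edge of a triangle is independently positive with probability p, the network's overall fraction of positive edges. Under that assumption, a triangle with i positive edges has probability C(3, i) p^i (1 − p)^(3 − i). The value p = 0.85 in the sketch is purely illustrative and is not a figure from the lecture. The function names are hypothetical.

```python
from collections import Counter
from math import comb


def random_baseline(p):
    """P0(T), keyed by the number of positive edges (3, 2, 1, 0) in a triangle.

    Assumption: each of the 3 edges is independently positive with probability p.
    """
    return {i: comb(3, i) * p**i * (1 - p) ** (3 - i) for i in (3, 2, 1, 0)}


def observed_distribution(triangles):
    """P(T), keyed the same way, from a list of 3-tuples of signs (+1 / -1)."""
    counts = Counter(sum(1 for s in tri if s > 0) for tri in triangles)
    return {i: counts[i] / len(triangles) for i in (3, 2, 1, 0)}


p = 0.85  # hypothetical fraction of positive edges, for illustration only
for n_pos, prob in random_baseline(p).items():
    balanced = n_pos in (3, 1)  # "+ + +" and "+ - -" are the balanced triads
    print(f"{n_pos} positive edges, balanced={balanced}, P0={prob:.3f}")
```

Computing `observed_distribution` on a network's triangles and setting it beside `random_baseline(p)` for that network's own p reproduces the comparison between the P(T) and P0(T) columns.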
Embeddedness of link (A,B): the number of shared neighbors of A and B.

[CHI '10] Embeddedness of ties (Epinions, Wikipedia):
- Positive ties tend to be more embedded.
- Positive ties tend to be more clumped together.
- Public display of signs (votes) in Wikipedia further attenuates this.

[CHI '10]
- Clustering: the +network has more clustering than the baseline; the −network has less clustering than the baseline.
- Size of connected component: both the +network and the −network are smaller than the baseline.

[CHI '10] New setting: links are directed and created over time.
There are now 16 × 2 signed directed triads. How many of them are explained by balance? Only half (8 out of 16).
Is there a better explanation? Yes: status.

[CHI '10] Links are directed and created over time.
Status theory [Davis-Leinhardt '68, Guha et al. '04, Leskovec et al. '10]:
- A positive link A→B means: B has higher status than A.
- A negative link A→B means: B has lower status than A.
Status and balance give different predictions:
- Triad A, B, X with two negative edges through X: balance predicts +, status predicts −.
- Triad A, B, X with two positive edges through X: balance predicts +, status predicts −.

[CHI '10]
- Edges are directed.
- Edges are created over time.
X has links to A and B. Now A links to B (triad A-B-X). How does the sign of A→B depend on the signs of X's edges?
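The sketch below makes the contrast concrete by computing what each theory predicts for the new edge A→B. It assumes edges are (src, dst, sign) triples. Balance ignores directions and predicts the sign that makes the triangle balanced. For status, X gets status 0, and A and B get +1 or −1 from their edge with X, following the node-status rule used later in the lecture. The sketch then predicts + when B's status exceeds A's. This direct comparison is a simplification; the lecture's own test is based on surprise. The function names and the edge directions in the examples are assumptions.

```python
def status_vs_x(edge, node, x="X"):
    """Hypothetical helper: status of `node` relative to X (status 0) from one signed edge.

    A positive edge src->dst means dst has higher status than src;
    a negative edge src->dst means dst has lower status than src.
    """
    src, dst, sign = edge
    if (src, dst) == (x, node):  # X -> node: node is above X iff the sign is +
        return sign
    if (src, dst) == (node, x):  # node -> X: node is above X iff the sign is -
        return -sign
    raise ValueError(f"edge {edge} does not connect {x} and {node}")


def predict_sign(edge_ax, edge_bx, a="A", b="B", x="X"):
    """Hypothetical helper: predicted sign of new edge a->b as (balance, status).

    A status prediction of 0 means the statuses tie and status makes no prediction.
    """
    # Balance ignores direction: choose the sign that makes the product of all three +.
    balance = edge_ax[2] * edge_bx[2]
    status_a = status_vs_x(edge_ax, a, x)
    status_b = status_vs_x(edge_bx, b, x)
    # Status: a positive link points up, so predict + iff b ranks above a.
    if status_b > status_a:
        status = +1
    elif status_b < status_a:
        status = -1
    else:
        status = 0
    return balance, status


# Two negative edges through X (directions assumed): A -> X (-), X -> B (-)
print(predict_sign(("A", "X", -1), ("X", "B", -1)))  # (1, -1)
# Two positive edges through X (directions assumed): X -> A (+), B -> X (+)
print(predict_sign(("X", "A", +1), ("B", "X", +1)))  # (1, -1)
```

With these assumed directions, both examples reproduce the slide: balance predicts + and status predicts −.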
We need to formalize:
- Links are embedded in triads, which provide context for signs.
- Users are heterogeneous in their linking behavior.

[CHI '10] Link (A,B) appears in the context (A,B; X). There are 16 different contextualized links. (Figure: the 16 contexts.)

[CHI '10] Surprise: how much the behavior of a user deviates from the baseline in context t.
- (A_1, B_1; X_1), …, (A_n, B_n; X_n): instances of contextualized link t, k of which are closed with a plus.
- p_g(A_i): generative baseline of A_i, the empirical probability that A_i gives a plus.
Then the generative surprise of triad type t, a standardized random variable, is

$$s_g(t) = \frac{k - \sum_{i=1}^{n} p_g(A_i)}{\sqrt{\sum_{i=1}^{n} p_g(A_i)\,\bigl(1 - p_g(A_i)\bigr)}}$$

The receptive surprise s_r(t) is defined analogously. It uses the receptive baseline p_r(B_i), the empirical probability that B_i receives a plus.
In practice: 1) for every node, compute the baseline; 2) identify all the edges that close the same type of triad; 3) compute the surprise.

Two basic examples:
- Triad with two negative edges through X: gen. surprise of A: ___; rec. surprise of B: ___.
- Triad with two positive edges through X: gen. surprise of A: ___; rec. surprise of B: ___.

[Instructor note: END here when 15 min were spent finishing up the previous lecture.]

[CHI '10] Determine node status:
- Assign X status 0.
- Based on the signs and directions of the edges, set the status of A and B. (Figure: X with status 0; A and B with status +1.)
Surprise is status-consistent if:
- Gen. surprise has the same sign as the status of B.
- Rec. surprise has the opposite sign from the status of A.
In the example, status-consistent means gen. surprise > 0 and rec. surprise < 0.
Surprise is balance-consistent if it completes a balanced triad.

[CHI '10] Predictions. (Table: for triad types t3, t15, t2, t14, t16, the generative and receptive surprise s_g(t_i) and s_r(t_i), and whether each agrees with balance (B_g, B_r) and with status (S_g, S_r).)

[WWW '10] Both theories make predictions about the global structure of the network:
- Structural balance: factions. Find coalitions.
- Status theory: global status. Flip the direction and sign of minus edges, then assign each node a unique status so that edges point from low to high. (Figure: nodes with status 3, 2, 1.)

[WWW '10] What fraction of the network's edges satisfy balance and status?
Observations:
- No evidence for global balance beyond the random baselines: real data is 80% consistent, and the random baseline is also 80% consistent.
- Evidence for global status beyond the random baselines: real data is 80% consistent, but the random baseline is only 50% consistent.

[WWW '10] Edge sign prediction problem: given a network and signs on all but one edge, predict the missing sign.
Machine learning formulation:
- Predict the sign of edge (u,v).
- Class label: +1 for a positive edge, −1 for a negative edge.
- Learning method: logistic regression.
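The generative surprise s_g(t) defined earlier can be computed directly from its formula. This sketch assumes that the generative baseline p_g(A) is the fraction of A's outgoing links that are positive. It also assumes that each instance of context t is given as (A_i, sign of the closing edge A_i→B_i). Both helper names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def generative_baselines(edges):
    """Hypothetical helper: p_g(A) = fraction of A's outgoing links that are positive.

    edges: iterable of (src, dst, sign) with sign in {+1, -1} (assumed format).
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    for src, _dst, sign in edges:
        total[src] += 1
        if sign > 0:
            positive[src] += 1
    return {u: positive[u] / total[u] for u in total}


def generative_surprise(instances, p_g):
    """s_g(t) = (k - sum p_g(A_i)) / sqrt(sum p_g(A_i) * (1 - p_g(A_i))).

    instances: list of (A_i, sign_i), one per edge A_i -> B_i closing a triad of
    type t (assumed format); k is the number of instances closed with a plus.
    """
    k = sum(1 for _a, sign in instances if sign > 0)
    expected = sum(p_g[a] for a, _sign in instances)
    variance = sum(p_g[a] * (1 - p_g[a]) for a, _sign in instances)
    if variance == 0:
        raise ValueError("surprise is undefined when every baseline is 0 or 1")
    return (k - expected) / sqrt(variance)


edges = [("a", "x", +1), ("a", "y", -1),
         ("b", "x", +1), ("b", "y", +1), ("b", "z", -1), ("b", "w", -1)]
p_g = generative_baselines(edges)     # both a and b give a plus half the time
instances = [("a", +1), ("b", +1)]    # two instances of some context t, both closed with +
print(generative_surprise(instances, p_g))  # about 1.41
```

A positive value means the A_i close this context with a plus more often than their own baselines predict. The value is measured in standard deviations.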
Dataset:
- Original: 80% +edges.
- Balanced: 50% +edges.
Evaluation: accuracy and ROC curves.
Features for learning: next slide.

[WWW '10] For each edge (u,v), create features:
- Triad counts (16): counts of the signed triads that edge uv takes part in.
- Node degree (7 features):
  - Signed degree: d+out(u), d−out(u), d+in(v), d−in(v).
  - Total degree: dout(u), din(v).
  - Embeddedness of edge (u,v).

[WWW '10] Classification accuracy:
- Epinions: 93.5%
- Slashdot: 94.4%
- Wikipedia: 81%
Signs can be modeled from local network structure alone.
- The trust propagation model of [Guha et al. '04] has 14% error on Epinions.
- Triad features perform less well for less embedded edges.
- Wikipedia is harder to model: votes are publicly visible.

Do people use these very different linking systems by obeying the same principles? How generalizable are the results across the datasets?
- Train on the row dataset, predict on the column dataset.
- The models generalize nearly perfectly, even though the networks come from very different applications.

Signed networks provide insight into how social computing systems are used:
- Status vs. balance.
- Different role of reciprocated links.
- Role of embeddedness and public display.
The sign of a relationship can be reliably predicted from the local network context: ~90% accuracy on the sign of the edge.
More evidence that networks are globally organized based on status.
People use signed edges consistently regardless of the particular application: models generalize nearly perfectly across datasets.

Many further directions:
- Status difference of nodes A and B [ICWSM '10]: A<B, A=B, A>B. (Figure: results plotted against the status difference (A−B).)
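The per-edge features of the sign-prediction setup are 16 signed triad counts plus 7 degree and embeddedness features. The sketch below computes them and assumes a directed signed graph stored as a dict from (src, dst) to sign. The 16 triad types are indexed by an encoding chosen for illustration. For each shared neighbor w, the edge between u and w and the edge between v and w each get one of four codes: outgoing +, outgoing −, incoming +, or incoming −. The edge (u, v) is excluded from every count because its sign is the unknown label; this is also an assumption. The resulting 23-dimensional vectors could be passed to any logistic regression implementation, for example scikit-learn's `LogisticRegression`.

```python
from collections import defaultdict


def edge_features(signs, u, v):
    """Hypothetical helper: 23 features for predicting the sign of edge (u, v).

    signs: dict mapping directed edge (src, dst) -> +1 or -1 (assumed format).
    The edge (u, v) itself is ignored, since its sign is the label.
    """
    out_edges = defaultdict(dict)  # node -> {neighbor: sign} for outgoing edges
    in_edges = defaultdict(dict)   # node -> {neighbor: sign} for incoming edges
    for (a, b), s in signs.items():
        if (a, b) != (u, v):
            out_edges[a][b] = s
            in_edges[b][a] = s

    def neighbors(x):
        return set(out_edges[x]) | set(in_edges[x])

    def link_codes(x, w):
        """Codes for every edge between x and w, seen from x (illustrative encoding):
        0 = x->w (+), 1 = x->w (-), 2 = w->x (+), 3 = w->x (-)."""
        codes = []
        if w in out_edges[x]:
            codes.append(0 if out_edges[x][w] > 0 else 1)
        if w in in_edges[x]:
            codes.append(2 if in_edges[x][w] > 0 else 3)
        return codes

    common = (neighbors(u) & neighbors(v)) - {u, v}
    triads = [0] * 16
    for w in common:
        for cu in link_codes(u, w):
            for cv in link_codes(v, w):
                triads[4 * cu + cv] += 1  # one of 16 signed directed triad types

    out_u = list(out_edges[u].values())
    in_v = list(in_edges[v].values())
    degrees = [
        sum(1 for s in out_u if s > 0),  # d+out(u)
        sum(1 for s in out_u if s < 0),  # d-out(u)
        sum(1 for s in in_v if s > 0),   # d+in(v)
        sum(1 for s in in_v if s < 0),   # d-in(v)
        len(out_u),                      # dout(u)
        len(in_v),                       # din(v)
        len(common),                     # embeddedness of (u, v)
    ]
    return triads + degrees


signs = {("u", "v"): +1, ("u", "w"): +1, ("w", "v"): -1, ("u", "z"): -1, ("x", "v"): +1}
features = edge_features(signs, "u", "v")
print(len(features), features[16:])  # 23 [1, 1, 1, 1, 2, 2, 1]
```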