WWW 2008 / Poster Paper April 21-25, 2008 · Beijing, China Guanxi in the Chinese Web - a Study of Mutual Linking Valerie King Dept. of Computer Science University of Victoria Victoria, BC, CANADA Louis Lei Yu Dept. of Computer Science University of Victoria Victoria, BC, CANADA Yan Zhuang val@uvic.ca yul@uvic.ca Dept. of Computer Science University of Victoria Victoria, BC, CANADA yzhuang@uvic.ca ABSTRACT Guanxi is a typ e of dyadic social interaction based on feelings ("qing") and trust ("xin"). Long studied by scholars of Chinese origin, it has recently drawn the attention of researchers outside of China. We define the concept of guanxi as applied to the interaction b etween web sites. We explore methods to identify guanxi in the Chinese web, show the unique characteristics of the Chinese web which result from it, and introduce a mechanism for simulating guanxi in a web graph model. Categories and Sub ject Descriptors: H.4.m [Information Systems]: Information Systems Applications; J.4 [Computer Appllications]: Social and Behavioral Sciences; General Terms: Algorithms, Theory Keywords: web graph, social networks, stochastic graph modeling, web measurement guanxi relationship, or two web sites can establish a guanxi base through common interests or through a third web site. We consider link exchange schemes, where only a phone call or an email is all that is required to establish the guanxi base and linking is done for the sole purp ose of promoting one's own web site, a weaker form of guanxi which we call cheap guanxi. 1 After establishing a guanxi base, two web sites will reach a mutual agreement to exchange resources; in this case, these resources take the form of links. Distinguishing b etween strong and cheap guanxi is one goal of our work. High degree nodes: As establishing strong guanxi takes effort, mutual links incident to nodes with many mutual links are more likely to b e weak guanxi. In some of our studies, we filter such edges out when considering strong guanxi. Triangles: If two web sites A and B establish guanxi via a third web site C, mutual links may form b etween each pairs of the web sites. We identify two structures: a Type 1 triangle, comp osed of two mutual links and one uni-directional link and a Type 2 triangle in which all three sides are mutual links, to b e good indications of two websites establishing guanxi via a third website. Over time, we exp ect some Typ e 1 triangles to turn into Typ e 2 triangles. We take the numb er of triangles involving a mutual link to b e one indication of the strength of its guanxi. Textual clues: Chinese web sites often have a sp ecially titled section of links lab eled "friendly links" or sometimes in the case of commercial web sites "partnership links". These links are likely to indicate either the existence of guanxi or the desire to establish guanxi with the other web sites. 1. GUANXI The Chinese web is notable for a large numb er of mutually linking web sites. We hyp othesize that this is in part a manifestation of a social construct known as guanxi, which can b e widely observed in Chinese culture. Guanxi has b een describ ed as "an informal ... p ersonal connection b etween two individuals who are b ounded by an implicit psychological contract to [maintain] a long term relationship, mutual commitment, loyalty and obligation" [2]. Dyadic relationships are the fundamental units of guanxi networks [2]. To establish guanxi, two parties must first establish a guanxi base: a tie b etween two individuals [2], e.g., same birthplace, same workplace, same family, close friendship. Also, two individuals can claim to have guanxi by acquaintance through a third party with whom they b oth have guanxi. Once a guanxi base is formed, guanxi can b e develop ed through the exchange of resources ranging from moral supp ort and friendship to favors and material goods [2]. 3. STRUCTURAL ANALYSIS OF GUANXI IN THE CHINESE WEB We use a web graph data set which is representative of the Chinese web [4]: CWT200G collected by Peking University in May 2006 and construct a digraph as follows: each web site is represented by a node. There is a single directed edge from node A to node B in the site graph iff there is at least one link from a web page at web site A to a web page at web site B . We refer to the resulting digraph as the Chinese site graph. It has 11,570 nodes and 475,880 edges. We randomly sampled 30,000 web sites from the data obe.g., from a travel agency web page: "Our website welcomes friendly links... .Each website's PageRank score has to b e 4, ranking in Alexa 50, 000." 1 2. GUANXI APPLIED TO THE WEB We regard a web site as representing a company, a p erson or a news source. Two web sites may exhibit guanxi by mutual linking. Their linking may reflect a prior existing Authors listed alphab etically. This work was supp orted by NSERC. Copyright is held by the author/owner(s). WWW 2008, April 21­25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04. 1161 WWW 2008 / Poster Paper tained from a general web crawl conducted by Microsoft in 2006 and constructed a general site graph of 30,000 nodes and 654,240 edges. 2 Directly comparing these two site graphs can b e misleading since they are of different sizes and densities. So, we use the hostgraph model [1] (where links are created by copying links of a randomly chosen prototyp e node as in[3]) to generate random graphs with prop erties similar to the Chinese web. That is, by tuning the parameters of the hostgraph model, we randomly generate graphs comparable in size, density, and in-degree distribution to that of the Chinese site graph. We found that the hostgraph model cannot explain the unusual numb er of mutual links in the Chinese site graph. A detailed comparison is illustrated in Figure 1. April 21-25, 2008 · Beijing, China if owner of b oth sites have established guanxi outside the web. Overall, the guanxi model can b e describ ed as follows: at each time step, dep ending on the density of the graph, either a new node with k edges is added or k edges are added to an existing node chosen uniformly at random. The k edges are added as follows: (1) With probability , we add k edges to destinations using the hostgraph model; (2) With probability 1- , we add k guanxi edges to destinations using the guanxi mechanism. We use this new model to generate a random graph with similar prop erties of the Chinese site graph extracted from CWT200G. The results are summarized in Figure 2. By changing the parameters, we can control the p ercentage of nodes and links involved in mutual links, Typ e 1 and Typ e 2 triangles resp ectively. 725 724 723 7 126 125 1234567869 6785 127 1263 126 1253 125 1243 124 1213 1 89 9 9 9 124 123 1 89 9 9 89 9 9 89 9 9 89 9 9 9 9 9 897 9 893 99 9 9 12345657638579 2 9 4375 98 9 89 9 99 Figure 1: Fraction of nodes in mutual links in the Chinese site graph, general site graph and hostgraph model graph Figure 2: Simulation results 5. ONGOING WORK AND APPLICATIONS 4. A GUANXI MODEL OF THE WEB We prop ose a mechanism to model the evolution of the guanxi structure on the web and we inject this mechanism into the hostgraph model to produce a new model for the Chinese web. The guanxi mechanism is defined as follows: in each time step, we add k guanxi edges to a node A. The destinations of the k guanxi edges are decided as follows: we first choose a prototyp e uniformly at random from the existing nodes. 1. With probability q , we add k edges with a method similar to the hostgraph model [1]. Once each edge is established, there is a probability f that the destination will link back to A. 2. With probability 1 - q , the node A first links to the prototyp e and then copies the remaining k - 1 edges from the guanxi links of the prototyp e randomly. Once each link is established, there is a probability g that the destination will link back to A. The copying process in (1) simulates web site A's attempt to form cheap guanxi links with p opular web sites in order to promote his/her own web site. We set the probability f to b e prop ortional to the relative p opularity (as determined by in-degree) of A and inversely prop ortional to the p opularity of destination B. In (2), we simulate the creation of guanxi links through a third party. Here g may b e a fixed constant 2 The crawl from Microsoft was done in a breadth-first-search fashion, after pre-processing and spam filtering, they successfully retrieved 463,685,607 HTML pages. These pages contain 17,672,011,890 hyp erlinks. Currently, we are conducting exp eriments to refine our ability to distinguish b etween strong and cheap guanxi, by analyzing textual indications of guanxi in the Chinese web and studying mutual links and related graph structures as they evolve over time. We are examining our findings in light of studies of social networks and the economics of link exchange schemes. To understand guanxi on the web as a cultural phenomon, we intend to examine site graphs of other nationalities. We b elieve this work may have applications to tasks such as producing p ersonally tailored recommendations, filtering out web spam, and understanding social networks. 6. REFERENCES [1] K. Bharat, B.-W. Chang, M. R. Henzinger, and M. Ruhl. Who links to whom: Mining linkage b etween web sites. In ICDM '01: Proceedings of the 2001 IEEE International Conference on Data Mining, pages 51­58, Washington, DC, USA, 2001. IEEE Computer Society. [2] X.-P. Chen and C. C. Chen. On the intricacies of the chinese guanxi: A process model of guanxi development. Asia Pacific Journal of Management, 21(3):305­324, 09 2004. [3] J. M. Kleinb erg, R. Kumar, P. Raghavan, S. Ra jagopalan, and A. S. Tomkins. The Web as a graph: Measurements, models and methods. Lecture Notes in Computer Science, 1627:1­17, 1999. [4] Q. Qi. On the construction of the document set of a large scale collection ­ cwt200g. Master's thesis, Peking University, 2006. 1162