WWW 2008 / Poster Paper April 21-25, 2008 ˇ Beijing, China Personalized Tag Suggestion for Flickr Nikhil Garg Ecole Polytechnique Fédérale de Lausanne Ingmar Weber Ecole Polytechnique Fédérale de Lausanne nikhil.garg@epfl.ch ingmar.weber@epfl.ch ABSTRACT We present a system for p ersonalized tag suggestion for Flickr: While the user is entering/selecting new tags for a particular picture, the system is suggesting related tags to her, based on the tags that she or other p eople have used in the past along with (some of ) the tags already entered. The suggested tags are dynamically up dated with every additional tag entered/selected. We describ e three algorithms which can b e applied to this problem. In exp eriments, our b est-p erforming method yields an improvement in precision of 10-15% over a baseline method very similar to the system currently used by Flickr. Our system is accessible at http://ltaa5.epfl.ch/flickr-tags/. To the b est of our knowledge, this is the first study on tag suggestion in a setting where (i) no full text information is available, such as for blogs, (ii) no item has b een tagged by more than one p erson, such as for social b ookmarking sites, and (iii) suggestions are dynamically up dated, requiring efficient yet effective algorithms. others, as tags tend to capture the relevant keywords. Encouraging users to add more quality tags thus increases the overall value of the system. In this work, we describ e algorithms which help to semi-automate the tagging process by suggesting relevant tags to the user, who can then choose to add them (by clicking) or to ignore them (by adding a different tag manually). More concretely, we prop ose algorithms for the following personalized tag suggestion problem : Given (i) the identity of a user, (ii) an initial (small or even empty) set of tags, as well as (iii) the tagging history of all (or many) users, suggest a ranked list of related tags to the user. The initial set of tags we refer to as a "query". This problem is indep endent of any particular application, but we only evaluated our algorithms in the context of Flickr. We consider it most applicable in an interactive setting, where the user adds/selects tags one after another, and the system presents an up dated choice to the user at each step. The p ersonalization comp onent helps to (i) find sp ecialized tags, such as names of friends or places, and to (ii) disambiguate the query, by suggesting "car" to a car lover, when the query is "jaguar". Flickr itself offers a simple service which tries to suggest tags: when a user wants to tag a picture, she can select "Choose from your tags" and is then presented with a subset of tags she used in the past. This list is sorted lexicographical ly and the items displayed seem to b e a mix of tags used b oth very recently and very frequently by the user. The set is independent of the query, i.e., the tags already entered by the user for the given picture. This approach is similar to our Method 1. Categories and Subject Descriptors H.5 [Information Interfaces and Presentation]: H.5.2 User Interfaces General Terms Algorithms, Human Factors, Languages, Measurement Keywords tagging systems, tag suggestions, exp eriments, flickr 1. INTRODUCTION 2. RELATED WORK Automatically assigning/suggesting tags to blog posts [3, 4] is related to our problem. But as blog p osts come with full text, finding similar p osts and hence related tags is considerably easier than for pictures. Similarly, automatic tagging of web pages can b e improved/p ersonalized, if the system has access to the surfer's desktop [1]. In settings where several users collab oratively tag the same items, such as for social b ookmarking sites, yet other techniques can b e applied [5]. Algorithms for mining association rules [2] are also relevant. But they (i) are comparably slow, (ii) would not give any result if the initial set of tags does not occur in any picture, and (iii) would not generate a ranked result list. Still, we might explore adaptations of those techniques in the future. Flickr (http://www.flickr.com) is a p opular photo sharing site which hosts ab out 2 billion images. Users can upload and share their pictures, and they can also search among the public pictures. Search is not content-based, e.g., you cannot ask for "all pictures showing a jaguar". Rather Flickr offers keyword search: "find all pictures containing the term `jaguar' in the title or in the description or which have this tag". Titles are often not very useful, as they are typically very short and/or not descriptive, e.g., "freedom" or "love". Similarly, the descriptions usually do not describ e the picture, but tend to b e more of a commentary or p oetic nature, e.g., praising the b eauty of something or containing a related p oem. Tags are usually the b est resource for finding or organizing pictures, b oth p ersonal ones and pictures taken by Copyright is held by the author/owner(s). WWW 2008, April 21­25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04. 3. THE ALGORITHMS All of the following algorithms take as input an instance of the aforementioned problem and output a ranked list of (a ) tags, along with a weight wt for each tag t. Weights wt 1063 WWW 2008 / Poster Paper and wt can thus b e easily linearly combined, assuming that weights are normalized to make them comparable. (b) April 21-25, 2008 ˇ Beijing, China related tags. For each of the 400 pictures, we also asked a volunteer to evaluate the relevance of ab out 50 tags, obtained by combining the first 20 results of the three methods, excluding tags present in the original picture. The volunteer used a conservative approach and, e.g., delib erately marked "ears" as irrelevant for a standard p ortrait or "Europ e" for an arbitrary picture taken in a Europ ean country. Results for 200 pictures with 4-8 tags Method 1 Method 2 Method 3 .85ˇM2+.15ˇM3 .30 (.36) .42 (.51) .24 (.41) .41 (.56) .14 (.20) .19 (.24) .12 (.25) .19 (.29) .09 (.13) .11 (.15) .08 (.17) .11 (.19) .05 (.08) .06 (.09) .05 (.11) .06 (.12) Results for 200 pictures with 10+ tags Method 1 Method 2 Method 3 .85ˇM2+.15ˇM3 .55 (.60) .67 (.73) .43 (.57) .68 (.73) .35 (.40) .48 (.54) .30 (.40) .49 (.57) .25 (.29) .33 (.39) .21 (.30) .35 (.42) .17 (.22) .21 (.26) .15 (.22) .22 (.29) 3.1 Method 1: Local & Query-Independent Let u b e the user asking for tag suggestions. Let nu (t) b e the numb er of times a user u has used tag t in the past. (t (1) nu (t)2 ), gives us our first Simply using wt = nu (t)/ method, which we will use as a baseline. If a user has used "sky" 9 times and "john" 4 times b efore, it suggests "sky" b efore "john", regardless of what the user has already entered. Method 1 (or variants using the recency of a tag), which is similar to Flickr's approach, is a natural candidate to use at the b eginning when T (q ), the set of tags present in the query, is still empty. P@1 P@5 P@10 P@20 3.2 Method 2: Local & Query-Dependent Let T (p) b e the set of tags of a picture p. Let P (u) b e the set of pictures for user u. Method 2 first computes a weight for each picture p P (u) and each tag t T (p) as / / wp (t) = |T (p) T (q )| (and = 0 if p P (u) or t T (p)). The p (2) weight for a tag t is then given by w t = P (u) wp (t), ( t (2) 2 2) (2) normalized to wt = w ( / w t ). This method t gives a higher weight to "john", if the user has already entered "man", and if "man" and "john" were used together in the past. P@1 P@5 P@10 P@20 3.3 Method 3: Non-Local & Query-Dependent Due to space constraints, this method can only b e outlined. It proceeds in two phases. First, it identifies a set of promising groups1 . It does this by building a group-tag matrix and by using cosine similarities b oth on the user and the group profiles directly (for a query indep endent user-group similarity) and on the original query which is expanded by Method 2 (for a query-group similarity). Once a small set of promising groups has b een identified, it then uses Method 2 for each of these groups to get a list of tags to suggest. Table 1: Precision at tag position 1, 5, 10 and 20 for all three methods and a combination of two schemes. The numbers in parentheses include the additional relevance information from the user study. Numbers in bold indicate the best performing method. A combination of Method 2 and 3 gives an improvement of 10-15% for P@1 over Method 1. The fact that the absolute numb ers are higher for the second set has two reasons. First, there are more relevant tags to b e found. Even with the user evaluation, certain tags such as "vacation" or "friends" were, lacking background information, conservatively chosen as irrelevant.Second, users who add more tags to an individual picture usually have a b etter tagging history, so that all three methods automatically work b etter. Overall, Method 3 b enefits the most from the user study, as it suggests relevant tags outside the user's vocabulary. Although the improvements of the combined scheme over Method 2 are not dramatic, they are certainly noticable. Currently, we are working on an improved use of the "external knowledge" from groups. 4. THE EVALUATION SETUP Given a set of pictures, which have b een tagged by users with r > k tags, we do the following for each picture in the set. First, we remove it from the tag history, so that the system does not know ab out this combination of tags b eing used by the particular user. Then we give the first k tags (along with the identity of the user) to the algorithm to evaluate, which has to produce a ranked list of tags. To evaluate the quality of the result list, one can then compute precision-recall measures/graphs, where the remaining r - k tags are treated as the only relevant tags. This setup gives an underestimate of the quality of a system, as it assumes that each user has exhaustively tagged her pictures, so that any additional tag is considered irrelevant by her. 6. REFERENCES [1] P. A. Chirita, S. Costache, W. Nejdl, and S. Handschuh. P-tag: large scale automatic generation of p ersonalized annotation tags for the web. In The 16th international conference on World Wide Web (WWW), pages 845­854, 2007. [2] J. Hipp, U. Guntzer, and G. Nakhaeizadeh. Algorithms ¨ for association rule mining ° a general survey and U comparison. SIGKDD Explorations Newsletter, 2(1):58­64, 2000. [3] G. Mishne. Autotag: a collab orative approach to automated tag assignment for weblog p osts. In The 15th international conference on World Wide Web (WWW), pages 953­954, 2006. [4] S. C. Sood, K. J. Hammond, S. H. Owsley, and L. Birnbaum. TagAssist: Automatic Tag Suggestion for Blog Posts. In The 1st International Conference on Weblogs and Social Media (ICWSM 2007), 2007. [5] Z. Xu, Y. Fu, J. Mao, and D. Su. Towards the semantic web: Collab orative tag suggestions. Col laborative Web Tagging Workshop at WWW2006, 2006. 5. EXPERIMENTS We downloaded roughly 60M publicly accessible pictures for 80K groups. To filter out noise and to reduce the size of the data, we removed tags used less than 50 times. These tags were, however, not removed from the pictures on which we ran the evaluation, so that a p erfect recall was imp ossible. We tested the three algorithms on two sets of pictures: (i) 200 pictures with 4-8 given tags and (ii) 200 pictures with 10+ given tags each. Each algorithm was only given the first two tags of each picture, and then produced a ranked list of 1 Groups on flickr are more focused topic-wise than individual users, and groups gave b etter exp erimental results. 1064