WWW 2008 / Poster Paper    April 21-25, 2008 · Beijing, China

Feature Weighting in Content Based Recommendation System Using Social Network Analysis

Souvik Debnath, Indian Institute of Technology, Kharagpur, India - 721302, cs_souvik@yahoo.co.in
Niloy Ganguly, Indian Institute of Technology, Kharagpur, India - 721302, niloy@cse.iitkgp.ernet.in
Pabitra Mitra, Indian Institute of Technology, Kharagpur, India - 721302, pabitra@cse.iitkgp.ernet.in

ABSTRACT
We propose a hybridization of collaborative filtering and content based recommendation. Attributes used for content based recommendation are assigned weights depending on their importance to users. The weight values are estimated from a set of linear regression equations obtained from a social network graph which captures human judgment about the similarity of items. We demonstrate the effectiveness of the proposed system for recommending movies in the Internet Movie Database (IMDB) [1]. The results show that our recommendations are largely in agreement with the IMDB recommendations.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval--information filtering

General Terms
Algorithms, Design, Experimentation

Keywords
Recommender System, Social Network, Feature Similarity

Copyright is held by the author/owner(s).
WWW 2008, April 21-25, 2008, Beijing, China.
ACM 978-1-60558-085-2/08/04.

1. INTRODUCTION
Recommendation systems produce a ranked list of items in which a user might be interested, in the context of her current choice of an item. Recommendation systems are built for movies, books, communities, news, articles etc. There are two main approaches to building a recommendation system: collaborative filtering and content based [3]. Collaborative filtering computes the similarity between two users based on their rating profiles, and recommends items which are highly rated by similar users. However, the quality of collaborative filtering suffers when the preference database is sparse. A content based system, on the other hand, does not use any preference data and provides recommendations directly based on the similarity of items. Similarity is computed from item attributes using appropriate distance measures.

We attempt to hybridize collaborative filtering and content based recommendation to circumvent the difficulties of the individual approaches. The item similarity measure used in content based recommendation is learned from a collaborative social network of users. Some previous attempts at integrating collaborative filtering and the content based approach include content boosted collaborative filtering [3] and the weighted, mixed, switching and feature combination hybrids of different types of recommender systems [2]. But none of these addresses producing recommendations for a user without obtaining her preferences.

2. FEATURE WEIGHTING IN CONTENT BASED RECOMMENDATION
In content based recommendation every item is represented by a feature vector or an attribute profile. The features hold numeric or nominal values representing certain aspects of the item such as color, price etc. A variety of distance measures between the feature vectors may be used to compute the similarity of two items. The similarity values are then used to obtain a ranked list of recommended items. If one uses Euclidean or cosine similarity, equal importance is implicitly assigned to all features.

However, human judgment of similarity between two items often gives different weights to different attributes. For example, when choosing a camera, its price may be more important than its body color. It may be stated that users base their judgments on some latent criterion which is a weighted linear combination of the differences in individual attributes. Accordingly, we define the similarity $S$ between objects $O_i$ and $O_j$ as

$$S(O_i, O_j) = \omega_1 f(A_{1i}, A_{1j}) + \omega_2 f(A_{2i}, A_{2j}) + \cdots + \omega_n f(A_{ni}, A_{nj}) \qquad (1)$$

where $\omega_n$ is the weight given to the difference in value of attribute $A_n$ between objects $O_i$ and $O_j$, the difference being given by $f(A_{ni}, A_{nj})$. The definition of $f$ depends on the type of attribute (numeric, nominal, boolean). We normalize the $f$'s to take values in $[0, 1]$. In general the weights $\omega_1, \omega_2, \ldots, \omega_n$ are unknown. In the next section we describe a method of determining these weights from a social collaborative network.

We have used the above methodology for recommending movies in the IMDB database. A set of 13 features is considered. The features along with their type, domain and distance measures are shown in Table 1. All these feature values can be obtained from the IMDB database.

Table 1: Features Used in Movie Recommendation

Feature        Type       Domain          Distance Measure
Release Year   YYYY                       (300 - |Y1 - Y2|) / 300
Type           String     Movie, TV etc.  T1 = T2 ? 1 : 0
Rating         Integer    (0-10)          (10 - |R1 - R2|) / 10
Vote           Integer    (≥ 5)           (Vmax - |V1 - V2|) / Vmax
Director       String                     D1 = D2 ? 1 : 0
Writer         String                     W1 = W2 ? 1 : 0
Genre          (String)*  Drama etc.      |G1 ∩ G2| / Gmax
Keyword        (String)*  College etc.    |K1 ∩ K2| / Kmax
Cast           (String)*  ()*             |C1 ∩ C2| / Cmax
Country        (String)*  France etc.     |C1 ∩ C2| / Cmax
Language       (String)*  English etc.    |L1 ∩ L2| / Lmax
Color          String     Color, B/W      C1 = C2 ? 1 : 0
Company        String                     C1 = C2 ? 1 : 0

3. DETERMINING FEATURE WEIGHTS
We estimate the feature weights from a social network graph of items. The underlying principle is to use existing recommendations by users to construct a social network graph with items as nodes. The graph represents human judgment of similarity between items aggregated over a large population of users. Optimal feature weights are considered to be those which induce a similarity measure between items best conforming to this social network graph. We describe below a linear regression framework for determining the optimal feature weights.
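As an illustration of Eq. (1) with feature functions of the kind listed in Table 1, the following minimal Python sketch computes the weighted similarity of two movies. The movie records, the helper names (`f_equal`, `f_numeric`, `f_overlap`, `similarity`) and the value of `G_MAX` are hypothetical and chosen only for this example; the three weights are the mean values reported in Table 2. It is not the authors' implementation.

```python
from typing import Dict, Set


def f_equal(a: str, b: str) -> float:
    """Nominal features (Type, Director, Writer, Color, Company): 1 if equal, else 0."""
    return 1.0 if a == b else 0.0


def f_numeric(a: float, b: float, span: float) -> float:
    """Numeric features: (span - |a - b|) / span, e.g. span = 300 for Release Year."""
    return (span - abs(a - b)) / span


def f_overlap(a: Set[str], b: Set[str], max_size: int) -> float:
    """Set-valued features (Genre, Keyword, Cast, ...): shared entries / max_size."""
    return len(a & b) / max_size


def similarity(f_values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Eq. (1): weighted sum of the per-feature similarity values."""
    return sum(weights[name] * f_values[name] for name in weights)


# Two hypothetical movie records, invented for illustration only.
movie_a = {"type": "Movie", "writer": "W. Example", "genre": {"Drama", "Crime"}}
movie_b = {"type": "Movie", "writer": "A. Sample", "genre": {"Drama", "Comedy"}}

G_MAX = 4  # assumption: largest genre-set size in the collection

f_values = {
    "type": f_equal(movie_a["type"], movie_b["type"]),
    "writer": f_equal(movie_a["writer"], movie_b["writer"]),
    "genre": f_overlap(movie_a["genre"], movie_b["genre"], G_MAX),
}

# Mean weights for these three features as reported in Table 2.
weights = {"type": 0.18, "writer": 0.36, "genre": 0.04}

score = similarity(f_values, weights)
print(round(score, 4))
```

Here only the Type feature matches fully and one of the genres is shared, so the score is dominated by the Type weight. Ranking candidate movies by this score against a chosen movie would give the recommendation list.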
Let the items under consideration be denoted by $O_1, O_2, \ldots, O_l$; they correspond to the vertices of our social network. The weight of the edge between vertices $O_i$ and $O_j$ is $E(O_i, O_j)$ = number of users who are interested in both $O_i$ and $O_j$. $E(O_i, O_j)$, suitably normalized, may be considered as the human judgment of similarity between $O_i$ and $O_j$. Recall that the feature vector (content based) similarity between $O_i$ and $O_j$ has been defined as $S(O_i, O_j)$ in Eq. (1). Equating $E(O_i, O_j)$ with $S(O_i, O_j)$ leads to the following set of regression equations:

$$\forall\, i, j = 1, \ldots, l,\ i \neq j: \quad \omega_0 + \omega_1 f(A_{1i}, A_{1j}) + \omega_2 f(A_{2i}, A_{2j}) + \cdots + \omega_n f(A_{ni}, A_{nj}) = E(O_i, O_j) \qquad (2)$$

The values of $f(A_{1i}, A_{1j}), f(A_{2i}, A_{2j}), \ldots, f(A_{ni}, A_{nj})$ are known from the data, as are the values of $E(O_i, O_j)$. Solving the above regression equations provides estimates for the values of $\omega_1, \omega_2, \ldots, \omega_n$. If there are $l$ objects under consideration, it is possible to have ${}^{l}C_{2}$ regression equations of the above form.

In the case of movie recommendation we have considered movies as nodes in the social network. The edge weight between two movies is the number of IMDB reviewers who have reviewed both movies.

4. EXPERIMENTAL RESULTS
The movie database used in our recommendation system consists of $3 \times 10^5$ random movies downloaded from the IMDB. Movies voted on by fewer than 5 people, and movies that have not been reviewed by a single person, are filtered out. The data is then divided into three equal sets. Each movie is described by 13 features (Table 1).

4.1 Stability of Feature Weights
Our recommendation system is based on the presumption that feature weights are almost universal for different sets of users and movies. To test this presumption we consider different sets of regression equations and solve for the weights. We consider the following varieties of regression equations.

I. Equations using only edge weights ≥ 1 (i.e. movie pairs having at least one co-reviewer).
II. Equations using only edge weights ≥ 2. (Note that this gives a graph which is a sub-graph of the previous graph.)

For the above graphs we construct a set of equations for each of the three (partitioned) datasets having $10^5$ movies. Thus we get six sets of regression equations, which we solve using the SPSS package. The weight values obtained from the six sets show that some features have stable weight values, while features such as Director, Rating, Vote, Year and Color have unstable or negative weights. We remove the features with unstable or negative weights from our regression equations and obtain the set of stable weights for eight features shown in Table 2. Note also that, of the eight, three features, namely Type, Writer and Company, are particularly important. These features along with their weights are used to obtain the recommendations.

Table 2: Feature Weight Values

Feature    Mean   Variance
Type       0.18   0.0023
Writer     0.36   0.0048
Genre      0.04   0.0001
Keyword    0.03   0.0011
Cast       0.01   0.0003
Country    0.07   0.0013
Language   0.09   0.0004
Company    0.21   0.0110

4.2 Performance of the Recommender System
The proposed algorithm is compared with a pure content based method (considering equal weights for all features) and with the IMDB recommendations. Performance is measured using the classical recall measure, considering the IMDB recommendation as the benchmark. The experiment has been done on 10 different movies. The proposed method achieves an average recall of 0.29, whereas the pure content based method achieves a recall of 0.24 with IMDB. Thus the proposed method agrees with the IMDB recommendation more closely than the pure content based method does. This demonstrates the effectiveness of feature weighting.

5. CONCLUSION
A hybridization of content based and collaborative filtering based recommendation is proposed. The weights of different attributes of an item are computed from the collaborative social network using regression analysis. Further studies on other weight estimation techniques such as sparse regression and isometric projection are being considered. A more rigorous performance evaluation based on human judgment will also be undertaken.

6. REFERENCES
[1] Internet Movie Database. http://www.imdb.com.
[2] R. Burke. Hybrid recommender systems: survey and experiments. User Modeling and User-Adapted Interaction 12 (2002) 331-370.
[3] P. Melville, R. J. Mooney, R. Nagarajan. Content-Boosted Collaborative Filtering for Improved Recommendations. Proceedings of the 18th National Conference on Artificial Intelligence (AAAI-2002), July 2002, Edmonton, Canada.
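The weight estimation of Section 3 can be sketched in a few lines of Python. The sketch counts co-reviewers to obtain the edge weights $E(O_i, O_j)$ and then fits $\omega_0, \omega_1, \ldots, \omega_n$ of Eq. (2) by ordinary least squares via the normal equations. The paper reports solving the equations with the SPSS package, so the least-squares solver here is an assumption standing in for it; the reviewer sets, the feature similarity rows and the function names (`edge_weights`, `solve_linear`, `fit_weights`) are likewise hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def edge_weights(reviewers: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """E(Oi, Oj): number of users who reviewed both movies, for every movie pair."""
    return {
        (a, b): len(reviewers[a] & reviewers[b])
        for a, b in combinations(sorted(reviewers), 2)
    }


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no zero pivot is handled).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_weights(rows: List[List[float]], targets: List[float]) -> List[float]:
    """Least-squares estimate of [w0, w1, ..., wn] for Eq. (2).

    Each row holds f(A_1i, A_1j), ..., f(A_ni, A_nj) for one movie pair;
    a constant 1 is prepended so that w0 acts as the intercept.
    """
    x = [[1.0] + row for row in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical reviewer sets for three movies.
reviewers = {
    "m1": {"u1", "u2", "u3"},
    "m2": {"u2", "u3"},
    "m3": {"u3", "u4"},
}
edges = edge_weights(reviewers)
pairs = sorted(edges)  # (m1, m2), (m1, m3), (m2, m3)

# Hypothetical per-pair feature similarities f(.) for two features.
f_rows = {
    ("m1", "m2"): [1.0, 0.5],
    ("m1", "m3"): [0.0, 0.5],
    ("m2", "m3"): [0.0, 0.25],
}

weights = fit_weights([f_rows[p] for p in pairs], [float(edges[p]) for p in pairs])
print(edges)
print([round(w, 6) for w in weights])
```

With only three pairs and three unknowns the fit is exact; in the setting of the paper there are far more pairs than weights, which is where a regression in the proper sense applies. Restricting `pairs` to those with edge weight at least 1 or at least 2 would correspond to the two varieties of equations considered in Section 4.1.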