Hidden Sentiment Association in Chinese Web Opinion Mining

Qi Su1, Xinying Xu1, Honglei Guo2, Zhili Guo2, Xian Wu2, Xiaoxun Zhang2, Bin Swen1 and Zhong Su2
Peking University1, Beijing, 100871, China
{sukia, xuxinying, bswen}@pku.edu.cn
IBM China Research Lab2, Beijing, 100094, China
{guohl, guozhili, wuxian, zhangxx, suzhong}@cn.ibm.com

ABSTRACT
The boom of product review websites, blogs and forums on the web has attracted many research efforts on opinion mining. Recently, there has been growing interest in finer-grained opinion mining, which detects opinions on individual review features as opposed to the whole review level. Research on feature-level opinion mining mainly relies on identifying explicit relatedness between product feature words and opinion words in reviews. However, the sentiment relatedness between the two objects is usually complicated: in many cases, product feature words are only implied by the opinion words in reviews. The detection of such hidden sentiment associations is still a big challenge in opinion mining, and feature-level opinion mining on Chinese reviews is an even harder task due to the nature of the Chinese language. In this paper, we propose a novel mutual reinforcement approach to the feature-level opinion mining problem. More specifically, 1) the approach clusters product features and opinion words simultaneously and iteratively by fusing both their content information and their sentiment link information; 2) under the same framework, based on the product feature categories and opinion word groups, we construct the sentiment association set between the two groups of data objects by identifying their strongest n sentiment links. Moreover, multi-source knowledge is incorporated to enhance clustering in the procedure. Based on the pre-constructed association set, our approach can largely predict opinions relating to different product features, even when product feature words do not appear explicitly in reviews. Thus it provides a more accurate opinion evaluation. The experimental results demonstrate that our method outperforms state-of-the-art algorithms.

Categories and Subject Descriptors
I.2.7 [Artificial Intelligence]: Natural language processing--text analysis

General Terms
Algorithms

Keywords
opinion mining, product feature, opinion word, association, mutual reinforcement

Part of this work was done while Qi Su and Xinying Xu were interns at IBM China Research Lab.

1. INTRODUCTION
With the dramatic growth of the web's popularity, the number of freely available online reviews is increasing at high speed. A significant number of websites, blogs and forums allow users to post reviews for various products or services (e.g., amazon.com). Such reviews are valuable resources that help potential customers make their purchase decisions. This situation is also notable in Chinese web services. In the past few years, mining the opinions expressed in web reviews has attracted extensive research [3, 10, 13, 19]. Based on a collection of customer reviews, the task of opinion mining is to extract customers' opinions and predict the sentiment orientation. It can usually be integrated into search engines to satisfy users' search needs related to opinions, such as comparative web search (CWS) [17] and opinion question answering [9, 22]. In recent years, the Text REtrieval Conference (TREC) also held a task of finding opinion sentences relevant to a given topic in the Novelty track [9]. The task of opinion mining has usually been approached as a classification of a review or its snippet as either positive or negative.
However, for many applications, simply judging the sentiment orientation of a review unit is not sufficient. Researchers [7, 8, 10, 15] began to work on finer-grained opinion mining, which predicts the sentiment orientation related to different review features. The task is known as feature-level opinion mining. Taking product reviews as an example, a reviewer may praise some features of a product while bemoaning other features. So it is important to find out reviewers' opinions toward different product features instead of the overall opinion of those reviews. In feature-level opinion mining, most of the existing research associates product features and opinions by their explicit co-occurrence. For example, for a product feature that appears explicitly in reviews, we can judge the attitude towards it by its nearest adjacent opinion words. Alternatively, we can conduct syntactic parsing to judge the modification relationship between opinion words and the product feature within a review unit. However, these approaches are either crude or inefficient in time cost, and thus not well suited to real-time online web applications. Moreover, real reviews from customers are usually complicated, and the approaches are not effective in many cases. Look at the following automobile review sentences:

1. The MiniCooper Convertible . . . lovely, pretty but too expensive.
2. The car has a smooth line and stylish feeling.
3. The first feeling which the Phaeton arouses is that it's a typical Germany car, ordinary and plain.
4. The Citroen C5's front design is impressive.
5. The 05 Eastar has good performance, . . .
6. The Corvette C6 is beautiful, but commonplace.
7. The NSX is truly one of the world's finest cars.
8. Parts are expensive and the car is far more complicated.
9. It is superbly sporty, yet elegant and understated.
10. Transmission is clunky and suspect on many older A8s.

(1) The Chinese parts of these examples are taken from http://auto.sohu.com; the English parts are taken from http://www.carreview.com.

Taking the Chinese sentences (1-5) above as examples,(1) Figure 1 shows the complicated relationship between the product features and the opinion words in the sentences. Sometimes the product feature is explicit in reviews, such as the product feature "front design" in sentence 4. But in many cases, product feature words are implicit in review sentences. "MiniCooper Convertible is expensive" has the same meaning as "MiniCooper Convertible's price is expensive", so the real product feature "price" is considered to be left out of the review sentence. A similar situation occurs in sentence 3. Although the product feature may not appear explicitly in reviews, it is usually implied by the opinion words in its context. For example, from the opinion words "lovely, pretty, . . .",
we can deduce that the product feature being evaluated should be the "appearance" or "design" of the car (see the related hollow circles and dashed lines between product features and opinion words in Figure 1). So, hidden sentiment association essentially exists between the product feature category and the group of opinion words. There is no doubt that neither the explicit adjacency approach nor syntactic analysis is the way to deal with this kind of problem.

Figure 1: Complicated Relationship Between Product Features and Opinion Words in Real Reviews (solid circle/solid line represents an explicit word/relationship; hollow circle/dashed line represents an implicit word/relationship)

The basic purpose of our approach in this paper is to mine the hidden sentiment links between the groups of product feature words and opinion words, and then build the association set. Using the pre-constructed association set, we can identify the feature-oriented sentiment orientation of opinions more conveniently and accurately. The major contributions of our approach are as follows:

· Product feature words and opinion words are organized into categories; thus we can provide a non-trivial and more sound opinion evaluation than the existing word-based approaches.
· We develop a mutual reinforcement principle to mine the associations between product feature categories and opinion word groups.
· We propose to enhance clustering quality by both multi-source knowledge and the mutual reinforcement principle.

Aiming at Chinese applications, we develop the system architecture based on the specialties of the Chinese language, and verify the performance on Chinese web reviews. However, the main approach proposed in this paper is language-independent in essence. With our approach, we can get an association set for the above sentences as in Table 1. It shows the advantage of our approach over the existing approaches in identifying the hidden sentiment links between product features and opinion words. Since the association set is pre-constructed, our approach is well suited to online applications.

Table 1: Identified Product Features and the Related Opinion Words Using the Existing Approaches and Our Approach (with sentence numbers in brackets)

Existing Approaches:
  Identified Feature              Opinion Word
  MiniCooper Convertible (1)      lovely; pretty; expensive
  line (2)                        smooth
  feeling (2)                     stylish
  it/Phaeton (3)                  ordinary; plain
  front design (4)                impressive
  performance (5)                 good

Our Approach:
  Identified Feature                                   Opinion Word
  appearance; line; design; front design (1, 2, 3, 4)  lovely; pretty; smooth; plain; stylish; ordinary; impressive
  price (1)                                            expensive

The remainder of the paper is organized as follows. In section 2, we introduce related work. Our sentiment association approach based on the mutual reinforcement principle is proposed in section 3; in addition, we present the strategy of clustering optimization for the mutual reinforcement based on a combination of multi-source knowledge. Section 4 overviews the system architecture, and also concerns the extraction, pruning and representation of product features. Experiments and evaluations are reported in section 5. We conclude the paper in section 6 with future research directions.
2. RELATED WORKS
Opinion mining has been extensively studied in recent years. The majority of this research has focused on identifying the polarity expressed in various opinion units such as words, phrases, sentences or review documents, while not much work has been done on feature-level opinion mining, especially for Chinese reviews [16]. The works of Liu [10] and Hu [7, 8] may be the most representative research in this area. The notion of an implicit product feature was first presented in their papers, based on English data. Obviously, if a product feature appears explicitly in review units, it is an explicit product feature. In contrast, we consider that an implicit product feature should satisfy the following two conditions: 1) the related product feature word does not occur explicitly; 2) the feature can be deduced from its surrounding opinion words in the review. Our definition of an implicit product feature is a little different from the definition in [10]. In that paper, the authors gave an example of an implicit product feature in a digital camera review: "included 16MB is stingy". They considered "16MB" a value of the product feature "memory"; since the feature word "memory" does not appear in the sentence segment, it is an implicit feature. In our approach, we only take product features implied by opinion words as implicit ones.
Words like "16MB " are treated as clustering ob jects to build product feature categories. The association rule mining approach in [10] did a good job in identifying product features, but it can not deal with the identification of implicit features effectively. They also noted the cases of synonyms and granularity of features. Different words may b e used to mean the same product feature. In addition, some product features may b e too sp ecific and fragment the opinion evaluation. They deal with the problems by the synonym set in WordNet and the semiautomated tagging of reviews. Our approach groups product feature words (including those which are considered to express the values of some product features in [10]) into categories. It's an unsup ervised method and easy to b e adapted to new domains. Our approach associates product feature categories and opinion word groups by their interrelationship. The idea of mutual reinforcement for multi-typ e interrelated data objects is utilized in some applications, such as web mining and collab orative filtering [21]. We develop the idea to identify the association b etween product feature categories and opinion word groups, and simultaneously enhance clustering under the uniform framework. 3.2 Associate Product Feature Categories and Opinion Word Groups By Mutual Reinforcement We first consider two sets of association ob jects: the set of product feature words F = {f1 , f2 , . . . , fm } and the set of opinion words O = {o1 , o2 , . . . , on }. A weighted bipartite graph from F and O can b e built, denoted by G (F , O, R). Here R = [rij ] is the m × n link weight matrix containing all the pairwise weights b etween set F and O. The weight can b e calculated with different weighting schemes. For example, if a product feature word fi and an opinion word oj co-occur in a sentence, we set the weight rij = 1, otherwise rij = 0. In this pap er, we set rij by the co-app earance frequency of fi and oj in clause level. The main idea of association approach is shown as figure 2. The co-app earance of product feature words and opinion words may b e incidental in review corpus and without essential semantic relatedness. Meanwhile, for the real se- 3. ASSOCIATION APPROACH TO FIND HIDDEN LINKS BETWEEN PRODUCT FEATURES AND OPINION WORDS In this section, we first illustrate the problem of feature level opinion mining. Then an association approach based on mutual reinforcement b etween product feature categories and opinion word groups is prop osed. Under the framework, for improving the p erformance of association and product 961 WWW 2008 / Alternate Track: WWW in China - Mining the Chinese Web April 21-25, 2008 · Beijing, China (i) (i) (i) Xi is represented as Ri = [r1 , r2 , . . . , rn ]T . And the interrelated relationship feature of Xj (Xj F ) is represented (j ) (j ) (j ) as Rj = [r1 , r2 , . . . , rn ]T . r (i) and r (j ) are the entries in the link weight matrix R of product feature set F and opinion word set O. Then, the inter similarity b etween Xi and Xj can b e calculated by: Sinter (Xi , Xj ) = cos(Ri , Rj ) (2) Figure 2: Association Approach Using the set of Product Feature Words F and the set of Opinion Words O mantic relatedness b etween product feature words and opinion words, the co-app earance may b e quantitatively sparse. Statistics based on word-occurrence loses semantic related information. While by clustering, we can organize product feature words or opinion words if they have similar meaning or refer to the same concept. 
So the judgement of association can b e more effective if it is applied with product feature categories and opinion word groups. The association set b etween product features and opinion word groups will b e constructed according to the interrelated pairwise weight b etween the two typ es of ob ject groups. To form the two kinds of groups, the general approach is to cluster the ob jects in F and O separately. However, the two typ es of ob jects are highly interrelated. It is obvious that surrounding opinion words play an imp ortant role in clustering product feature words. Similarly, when clustering opinion words, the product feature words co-occurred should also b e imp ortant. So we consider b oth intra relationship from single typ e homogeneous data ob jects and inter relationship from different typ e interrelated data ob jects. This up dated relationship space is utilized to p erform clustering on those related typ es of ob jects in set F and O. The purp ose of clustering data ob jects is to partition each ob ject into one cluster so that ob jects in the same cluster have high similarity, and ob jects from different clusters are dissimilar. Using the up dated relationship space, the similarity b etween two ob jects of the same typ e is defined as: S (Xi , Xj ) = Sintra (Xi , Xj ) + (1 - )Sinter (Xi , Xj ) (1) wher e, {Xi F Xj F } {Xi O Xj O} In equation 1, the similarity b etween two data ob jects Xi and Xj is denoted as a linear combination of intra similarity and inter similarity. The parameter reflects the weight of different relationship spaces. Sintra (Xi , Xj ) is the similarity of homogeneous data ob ject Xi and Xj calculated by traditional approach. This kind of similarity can b e considered based on the content information b etween two data ob jects. While Sinter (Xi , Xj ) determines the similarity of homogeneous data ob ject Xi and Xj by their resp ective heterogeneous relationships, which are based on the degree of interrelated association b etween product features and opinion words. It can b e considered based on the link information b etween two data ob jects. For example, supp ose Xi F , the interrelated relationship feature of data ob ject The basic idea of the mutual reinforcement principle is to propagate the clustered results b etween different typ e data ob jects by up dating their inter- relationship spaces, that is, the link information b etween two data ob ject groups. The clustering process can b egin from an arbitrary typ e of data ob ject. The clustering results of one data ob ject typ e update the link information thus reinforce the data ob ject categorization of another typ e. The process is iterative until clustering results of b oth ob ject typ es converge. Supp ose we b egin the clustering process from data ob jects in set F , then the steps can b e expressed as follows. ---------------------------------------------------- step 1. Cluster the data ob jects in set F into k clusters according to the intra relationship; step 2. Up date the interrelated relationship space of data ob jects in set O. Xi (Xi O), the interrelated rela(i) (i) (i) tionship feature is replaced with Ri = [u1 , u2 , . . . , uk ]. Where ux (x [1, k]) is an up dated pairwise weight with each comp onent in the vector corresp onding to one of the k clusters of F layer; step 3. Cluster the data ob jects in set O into l clusters based on the up dated inter-typ e relationship space; step 4. Up date the interrelated relationship space of data ob jects in set F . 
3.3 Product Feature Category Optimization Based on Semantic and Textual Structural Knowledge
In the process of mutual reinforcement, any traditional clustering algorithm can easily be embedded into the iterative process, such as the K-Means algorithm [12] and other state-of-the-art algorithms. Take the plain K-Means algorithm as an example: it is an unsupervised learner based on iterative relocation that partitions a dataset into k clusters of similar datapoints, typically by minimizing an objective function of average squared distance. The algorithm utilizes the constructed instance representation to conduct the clustering process. As unsupervised learning, its performance is usually not comparable with supervised learning. However, the performance of mutual reinforcement over multi-type data objects is affected by the embedded clustering. Background knowledge about the application is usually useful in clustering: if we add more background knowledge to the clustering algorithm, we may expect a better clustering result. Our basic idea of clustering enhancement by background knowledge comes from COP-KMeans. COP-KMeans [20] is a semi-supervised variant of K-Means in which background knowledge, provided in the form of constraints between data objects, is used to generate the partition in the clustering process. Two types of constraints are used in COP-KMeans:

Compatibility. Two data objects have to be in the same cluster.
Incompatibility. Two data objects must not be in the same cluster.

The compatibility and incompatibility constraints in COP-KMeans are checked by a human labeler. Here we employ a clustering optimization method in which background knowledge is extracted automatically from several knowledge resources. We then construct the knowledge-based constraints to improve the primary clustering similarity measure based on content information.
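For reference, a small sketch in the spirit of COP-KMeans [20] of how such constraints gate cluster assignment: before placing an object into a cluster, compatibility (must-link) and incompatibility (cannot-link) pairs are checked. The constraint sets and the partial assignment map are hypothetical inputs.

def violates_constraints(obj, cluster, assignment, must_link, cannot_link):
    # assignment: object -> cluster id for objects already placed
    for a, b in must_link:
        other = b if a == obj else (a if b == obj else None)
        if other is not None and other in assignment and assignment[other] != cluster:
            return True          # a must-link partner sits in another cluster
    for a, b in cannot_link:
        other = b if a == obj else (a if b == obj else None)
        if other is not None and assignment.get(other) == cluster:
            return True          # a cannot-link partner is already in this cluster
    return False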
· Semantic Class. We use a WordNet-like semantic lexicon, the Chinese Concept Dictionary (CCD) [11], to obtain coarse semantic class information for each data object. Generally speaking, the noun network is the most richly developed part of electronic lexicons like WordNet, and compared with nouns, research on semantic relatedness using WordNet has performed far worse for words with other parts of speech [1]. In our work, the extracted product feature words are all nouns and noun phrases (see section 4.2), so we only generate constraints based on the semantic relatedness of nouns. There are in total 25 semantic class tags in CCD; we use the noun part to provide semantic class constraints for clustering enhancement. Two automatic constraint generation strategies are proposed. First, some words may belong to multiple semantic classes simultaneously. For each such word, a set A is generated by pairing any two of the elements in its semantic class set, and the union of the sets A over all words with multiple semantic classes is taken. By pairing any two of all the semantic classes, we get a set B. The incompatibility table is then constructed as the difference between B and that union. In addition, we utilize information about the common father node of two instances: if we cannot find their common father node in the semantic lexicon, or the level of their common father node is too low (e.g., in the first level of the lexicon), we consider the two instances incompatible.

· Textual Structure. The semantic class information of a data object is context-independent. However, context-dependent information is also useful for constructing constraints. In general, a paragraph is a collection of related sentences with a single focus, which marks a rough semantic boundary; semantic coherence can usually be assessed within a paragraph. Our observation of the product review corpus largely bears this out. For example, in an editor review of an automobile, reviewers usually present their opinions on the power of the automobile in one paragraph, followed by their opinions on the appearance in another paragraph. That is a common pattern in reviews of all kinds of products. So we propose to calculate the textual-structure-based similarity between product feature words X1 and X2 by their paragraphic co-occurrence, as in equation 3:

sim(X1, X2) = [N * pf(X1, X2) / (pf(X1) * pf(X2))] * [Np * pfdoc(X1, X2) / (N * df(X1, X2))]    (3)

Here pf(w) is the paragraphic frequency of word w, counting the number of paragraphs in the corpus containing w; pfdoc(w) is the paragraphic frequency of w within a document; N denotes the total number of documents in the corpus; and Np is the number of paragraphs in a document. The equation indicates the similarity between two words according to their positional relationship in the paragraph structure. Utilizing this similarity, we augment the distance metric between two data objects with a weighting function according to equation 4:

dist'(X1, X2) = incom(X1, X2) * (-log sim(X1, X2)) * dist(X1, X2)    (4)

The first two terms denote the constraints which incorporate prior knowledge from both universal language resources and the corpus; they alter the original distance measure dist(X1, X2) for the embedded clustering algorithm. incom(X1, X2) = 0 represents the semantic-class-based incompatibility of the two data objects X1 and X2; since such objects are incompatible, we can rule these impossible matches out first. The sim(X1, X2) term increases or decreases the similarity measure of the original vector distance according to the words' paragraphic distribution features.
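A hedged sketch of equations 3 and 4 as reconstructed above follows; the corpus statistics (pf, pfdoc, df, N, Np) are assumed to be precomputed helpers, and the clamping of sim is a sketch assumption to keep the logarithm defined.

import math

def paragraph_sim(x1, x2, pf, pf_pair, pf_doc_pair, df_pair, N, Np):
    # Equation 3: corpus-level and document-level paragraph co-occurrence
    denom = pf(x1) * pf(x2) * df_pair(x1, x2)
    if denom == 0:
        return 0.0
    return (N * pf_pair(x1, x2) / (pf(x1) * pf(x2))) * \
           (Np * pf_doc_pair(x1, x2) / (N * df_pair(x1, x2)))

def augmented_dist(x1, x2, base_dist, sim, incom):
    # Equation 4: incompatible pairs (incom == 0) are ruled out up front;
    # otherwise -log(sim) rescales the base distance, pulling together
    # words that share paragraphs and pushing apart ones that do not
    if incom(x1, x2) == 0:
        return float("inf")      # impossible match, excluded before clustering
    s = min(max(sim(x1, x2), 1e-9), 1.0 - 1e-9)   # clamp: sketch assumption
    return -math.log(s) * base_dist(x1, x2)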
4. SYSTEM ARCHITECTURE
Based on the approach proposed in section 3, we construct a feature-level opinion mining system to conduct sentiment analysis on Chinese web reviews. Some modules in our system are designed around the specialties of the Chinese language, including product feature extraction & filtering, named entity identification, etc. In essence, though, the main approach proposed in this paper is language-independent, and it is easy to adapt our system to different applications.

4.1 Architecture
The architecture of our approach is illustrated in Figure 3. Given a specific product topic, the system first crawls the related reviews and puts them in the review database. Then parsing is conducted, including splitting review texts into sentences/clauses, Chinese word segmentation and part-of-speech tagging. After that, candidate product feature words and opinion words are extracted from the reviews. We then prune the candidates to generate the set of product feature words. The product feature words and opinion words are represented by a Vector Space Model. Based on this representation, we conduct the mutual reinforcement approach to construct product feature categories and realize the sentiment association. Using the pre-constructed sentiment association set, we can then deal with feature-level opinion mining effectively.

Figure 3: Architecture of the System

Below, we discuss the steps of candidate product feature extraction and pruning, followed by their representation for the mutual reinforcement based sentiment association.

4.2 Candidate Product Feature Word Extraction and Pruning
Adjectives are normally used to express opinions in reviews [6]; therefore, most existing research takes adjectives as opinion words. Hu and Liu [7, 8] proposed that components of a sentence other than nouns and noun phrases are unlikely to be product features. In [4], the authors targeted nouns, noun phrases and verb phrases; adding verb phrases identified more possible product features but brought in a lot of noise. So in this paper we follow [7, 8] and extract nouns and noun phrases as candidate product feature words. In grammatical theory, a noun phrase consists of a pronoun or noun with any associated modifiers, including adjectives, adjective phrases, adjective clauses, and other nouns. Since many adjectives are evaluative indicators, we do not want to include those components in our candidate product features. For our extraction, two or more adjacent nouns are identified as candidate "noun phrases". The strategy is effective in some cases, but it may bring in much noise, from two sources: 1) some candidates may not be complete phrases; 2) obviously, not all nouns or noun phrases can be product feature words. We propose methods to prune the candidate product feature words from both aspects. The BD (boundary dependency) algorithm is proposed to verify the phrase boundary of candidates. The definition of BD is shown in equation 5:

BD(w1 . . . wn) = f(bdw + w1 . . . wn) * f(w1 . . . wn + bdw) / f(w1 . . . wn)^2    (5)

In the equation, w1 . . . wn denotes an extracted adjacent-noun sequence in the specific product reviews, and f(w1 . . . wn) is its frequency. To avoid data sparseness and get a more reliable frequency statistic, we use the number returned by a search engine query to estimate the frequency, instead of our existing corpus (we use Google in our experiments).
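A minimal sketch of equation 5 follows; `freq` is an assumed callable returning an occurrence estimate for a string, for example a hit count from a search engine query as the text describes, and `bdw` is a boundary indicator string.

def boundary_dependency(phrase, bdw, freq):
    # Equation 5: f(bdw + phrase) * f(phrase + bdw) / f(phrase)^2
    base = freq(phrase)
    if base == 0:
        return 0.0
    return freq(bdw + phrase) * freq(phrase + bdw) / (base * base)

A candidate would then be accepted as a noun phrase when its score exceeds a chosen threshold, e.g. boundary_dependency(candidate, "de", web_hits) > threshold, where "de" stands for the Chinese possessive particle mentioned below.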
The BD method is based on the following consideration: some specific adjacent words or characters indicate a rough phrasal boundary, such as "de" ('s) in Chinese. We call these words boundary indicators (bdw). In addition, some words usually cannot be the prefix word or suffix word of a noun phrase. These two points help to determine phrasal boundaries. If boundary indicators appear on the left and right of a noun phrase and its BD is higher than a pre-set threshold, we consider it a correct noun phrase. Conversely, if impossible prefix or suffix words/characters appear, we judge that the extracted phrase is not a complete one.

Some complete noun phrases and nouns may still not be real product features, such as "car", "BMW", "driver", . . . in automobile reviews. We filter out part of the non-product-feature words by their sense. Named Entity Recognition (NER) is utilized in this process; we use an NER system developed by IBM [5]. The system can recognize four types of NEs: person (PER), location (LOC), organization (ORG), and miscellaneous NEs (MISC) that do not belong to the previous three groups (e.g. products, brands, conferences, etc.). Since NEs have little probability of being product features, we prune the candidate nouns or noun phrases which carry the above NE tags. After pruning the candidate product feature words, we get the set of product feature words F. The set of opinion words O is composed of all the adjectives in the reviews.

4.3 Representation of Product Features and Opinion Words for Sentiment Association
Product feature words and opinion words are clustered respectively in the iterative process of mutual reinforcement. To conduct the procedure, we represent each data object instance (whether product feature word or opinion word) by a feature vector, and then conduct the clusterings and the mutual reinforcement. Data from online customer product reviews are preprocessed in several steps, including sentence segmentation, stop word elimination, etc. Then we take the second-order substantival context of each product feature instance and opinion word instance in the reviews, i.e., the [-2, +2] substantival window around the instance after stop word elimination. The context is required to be in the same clause as the instance. We represent an instance by the following features:

· Pointwise mutual information (PMI) between the instance and its context.
· For phrases, the inner-word PMI within the phrase.
· The part-of-speech tags of the context.

For example, for the noun phrase "battery life" in the sentence "The battery life of this camera is too short.", the instance's [-2, +2] substantival window is [NULL, NULL, camera, short], and the inner words are "battery" and "life". Let w1, w2 be two words or phrases. The pointwise mutual information [18] between w1 and w2 is defined as:

PMI(w1, w2) = log [ P(w1, w2) / (P(w1) * P(w2)) ]    (6)

where P(w1) and P(w2) are the frequencies of w1 and w2 in the corpus, and P(w1, w2) is the co-occurrence frequency of w1 and w2 in a certain position. For example, when calculating the inner-word PMI within a phrase, P(w1, w2) denotes the co-occurrence frequency of w1 and w2 within the phrase's range.
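A direct sketch of equation 6 over corpus counts; `count` and `cooccur` are assumed helpers over the preprocessed review corpus, and zero counts are treated as no association for illustration.

import math

def pmi(w1, w2, count, cooccur, total):
    # Equation 6: log( P(w1, w2) / (P(w1) * P(w2)) )
    p1, p2 = count(w1) / total, count(w2) / total
    p12 = cooccur(w1, w2) / total
    if p12 == 0 or p1 == 0 or p2 == 0:
        return 0.0               # undefined PMI treated as no association
    return math.log(p12 / (p1 * p2))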
Although the mutual information weight is biased towards infrequent words [14], it captures more relatedness and restriction than other weight settings such as an instance's document frequency (DF). So we represent the instances by the PMI weight in this research.

5. EXPERIMENT AND EVALUATION
We evaluate our approach from three perspectives: 1) effectiveness of product feature category construction by mutual reinforcement based clustering; 2) precision of sentiment association between product feature categories and opinion word groups; 3) performance of our association approach when applied to feature-level opinion mining.

5.1 Data
Our experiments take automobile reviews (in Chinese) as an example. The corpus used in the experiments is composed of 300 editor reviews on automobiles, comprising 806,923 Chinese characters, extracted from several specialized auto review websites. Editor reviews are usually long, so a complete editor review may be distributed over multiple web pages; in our corpus, the largest such distribution is 14 web pages. The numbers of candidate product feature words and opinion words extracted from the corpus are shown in Table 2.

Table 2: Number of Candidate Product Features and Opinion Words in Our Corpus
  Extracted Instance           Total    Non-Repetitive
  Candidate Product Feature    89,542   18,867
  Opinion Word                 27,812   1,343

We use both the BD algorithm and the NER based method to prune the candidate product features. Precision and recall for the pruning strategies are shown in Table 3.

Table 3: Results of Candidate Product Feature Filtering Using Different Pruning Strategies
  Pruning Strategy   Precision (P)   Recall (R)   Number of Remaining Product Features
  BDnp               78.94%          90.73%       9,389
  BDfeature          47.11%          88.57%       9,389
  +NER               52.49%          86.48%       7,660

We use two pruning strategies on the candidate product feature words. The BD algorithm is effective at locating phrase boundaries, and thus at identifying correct noun phrases; its performance in identifying noun phrases is shown in Table 3 as BDnp. However, it cannot be used to judge whether a noun phrase is a product feature (shown as BDfeature). Named entity tagging helps to filter out noisy candidates, but does not show significant improvement. In fact, finding real product feature words in reviews is still an open issue in related research; most existing work simply uses nouns and noun phrases as candidate product features and then conducts frequency based filtering. This problem will be studied in our future research. In practice, since we choose the strongest n links between product feature categories and opinion word groups to construct the sentiment association set, some of the noisy candidates are excluded in that process.

To conduct the evaluations, we pre-construct an evaluation set. The extracted product feature words and opinion words are checked manually: if a word satisfies the specification of some automobile review category, we give it the relevant labels. A word may have multiple labels; for example, the word "color" may be associated with both "exterior" and "interior". In our labeled set, the average number of labels per product feature word is 1.135, and the average number of labels per opinion word is 1.556. We utilize this set to conduct evaluations of both product feature categorization and sentiment association.

5.2 Evaluation of Product Feature Category Construction
The performance of product feature categorization is evaluated using the Rand index [2, 20]. In equation 7, P1 and P2 respectively represent the partition produced by an algorithm and by manual labeling. The agreement of P1 and P2 is checked over their n(n - 1)/2 pairs of instances, where n is the size of the data set D. For each pair of instances in D, P1 and P2 each either assign them to the same cluster or to different clusters. Let a be the number of pairs assigned to the same cluster in both partitions, and let b be the number of pairs assigned to different clusters in both partitions. The Rand index is then the proportion of total agreement:

Rand(P1, P2) = 2(a + b) / (n * (n - 1))    (7)
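Equation 7 admits a direct computation over two label assignments; the following sketch assumes each partition is given as a list of cluster labels, one per instance.

from itertools import combinations

def rand_index(p1, p2):
    # p1, p2: cluster label per instance; a pair agrees when both partitions
    # place it together (counted by a) or both place it apart (counted by b)
    n = len(p1)
    agree = sum(1 for i, j in combinations(range(n), 2)
                if (p1[i] == p1[j]) == (p2[i] == p2[j]))
    return 2.0 * agree / (n * (n - 1))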
The product feature words in the pre-constructed evaluation set are used as the data set D. Partition agreements between the pairs of any two words in this set and in the clustering results are checked automatically. Our mutual reinforcement approach can easily integrate any traditional clustering algorithm. The parameter a reflects the relative importance of content information and link information in the iteration: a = 0 denotes that only link information is utilized, while with a = 1 the approach reduces to traditional content-based clustering. These two settings can be taken as the baselines. We fix several values of the parameter a (a in [0, 1], stepped by 0.2) to conduct the experiments.

Figure 4: Performance of Product Feature Categorization by the Iterative Mutual Reinforcement Approach

Figure 4 shows the clustering results for different values of a. We find that the iterative mutual reinforcement achieves higher performance than both the content-based (a = 1) and link-based (a = 0) approaches. The reason for the improvement lies in the fact that the mutual reinforcement approach can fully exploit the relationship between product features and opinion words. Comparing the two baselines, the content-based (a = 1) method performs better than the link-based (a = 0) method; the former utilizes more context information than the latter.

A comparative experiment was conducted to show the impact of background knowledge on clustering quality.

Figure 5: Performance of Product Feature Categorization by the Iterative Mutual Reinforcement Approach (Without Background Knowledge)

Figure 5 shows the performance under similar experiment settings as in Figure 4 but without introducing background knowledge. As seen from Figures 4 and 5, the approach utilizing background knowledge obtains higher precision than the approach without it. In addition, the proposed approach of combining both content information and link information between two data objects always outperforms the two baselines, which use either content information (a = 1) or link information (a = 0), in both experiment settings.
The two groups of experimental results achieve their best performance at a > 0.5. So, although link information can help to improve clustering performance, content information is still an important factor in the clustering process.

5.3 Evaluation of Sentiment Association
In the process of iterative mutual reinforcement between product features and opinion words, the clusterings of both data object types converge over the iterations. Simultaneously, the interrelationship information between product feature categories and opinion word groups tends to become stable. We evaluate the association set by a precision measurement. A comparable evaluation is made on the original extracted pairs based on explicit adjacency. Since our purpose is to find the association between opinion words and product feature categories, both evaluations utilize the product feature categories generated by the same grouping method. For a detected pair of product feature word and opinion word, produced by either explicit adjacency or our association approach, we first judge the category to which the product feature word belongs. The headword (the word with the highest frequency) of a product feature category is used to represent the category. We check the labels of the headword and of the detected opinion word in the pre-constructed evaluation set. If the two labels are different, we judge the detected pair illegal; otherwise, there is a logical sentiment association in the pair. We define the precision as:

precision = number of correctly associated pairs / number of detected pairs    (8)

Precision measures the proportion of correct sentiment associations among the detected pairs. Since we use the same product feature grouping result in both the explicit adjacency evaluation and our association approach, this does not skew the comparison. The precisions are calculated on the pre-constructed evaluation set, so we did not check all the detected association pairs: only the pairs whose product feature word part and opinion word part are both in the evaluation set are checked.

Table 4: Impact of Sentiment Association by Explicit Adjacency and the Mutual Reinforcement Approach
  Approach               Detected Number   Precision
  Explicit Adjacency     28,976            68.91%
  Association Approach   294,965           81.90%

Table 4 shows the advantage of our association approach over extraction by explicit adjacency. Using the same product feature categorization, our sentiment association approach obtains a more accurate pair set than direct extraction based on explicit adjacency: the precision obtained by the mutual reinforcement approach is 81.90%, almost 13 points higher than the adjacency approach. The number of associations detected by our approach shows its ability to find hidden sentiment associations.
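An illustrative version of the check behind equation 8 follows. Because a word may carry multiple labels (as "color" does above), a detected pair is counted as correct here when the headword and the opinion word share at least one label; this interpretation, and the `headword_of` and `labels` helpers, are assumptions for the sketch.

def association_precision(pairs, headword_of, labels):
    # pairs: detected (feature word, opinion word) pairs; labels maps a word
    # to its set of manual category labels
    checked = correct = 0
    for feature, opinion in pairs:
        hw = headword_of(feature)
        if hw not in labels or opinion not in labels:
            continue             # only pairs inside the evaluation set count
        checked += 1
        if labels[hw] & labels[opinion]:   # shared label: legal association
            correct += 1
    return correct / checked if checked else 0.0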
5.4 Evaluation of Opinion Mining Relating to Different Product Features
We use a new test corpus to evaluate the ability of our association approach in feature-level opinion mining. The corpus is composed of 50 automobile reviews (161,205 characters). In the reviews, automobile review features are each rated on a 5-star scale (in half-star increments). There is usually large variation in the sentiment scoring criteria across automobile websites, so we extract the automobile reviews from the same websites to keep a consistent scoring system. To validate the usefulness of hidden sentiment link identification in feature-level opinion mining, we design the following experiments to predict sentiment on different product features in our test corpus:

· By Explicit Adjacency: for a product feature word in the reviews, we first find its nearest neighboring opinion word within the clause. The distance between a product feature word and its nearest opinion word may be equal for left adjacency and right adjacency, so we try two sentiment attachment strategies for this case.
  - Left Adjacency First: we attach the nearest opinion word in the left context to the product feature word. This setting is denoted "adjacency(L)" in Figure 6.
  - Right Adjacency First: we attach the nearest opinion word in the right context to the product feature word. This setting is denoted "adjacency(R)" in Figure 6.
We check a polarity lexicon for the sentiment polarity of the opinion word, and attach the sentiment polarity to the feature category to which the product feature word belongs. The sentiment strength for a feature category is obtained by summing up all the sentiment orientations attached to the category.

· By the Pre-constructed Association Between Product Feature Categories and Opinion Word Groups: through our association approach, we have constructed a sentiment association set between product feature categories and opinion word groups. In this experiment setting, we evaluate feature-oriented sentiment without using the product feature words in the reviews. Utilizing the association set, we directly attach the sentiment orientation of an opinion word to its related product feature category. The sentiment strength for a feature category is obtained by summing up the values of each related sentiment orientation. If an opinion word is associated with several product feature categories, we attach its sentiment orientation to all the related product feature categories. This approach is denoted "our approach" in the experimental figure.

· By the Combination of Explicit Adjacency and the Pre-constructed Association: we evaluate the combination of both explicit adjacency and the pre-constructed association set. If there are opinion words but no product feature words in a clause, we attach the sentiment orientation to the related product feature category via our association set. Otherwise, if a clause includes both kinds of words, we attach the sentiment orientation to the related product feature category by adjacency. Again, we try two strategies to identify the nearest neighbors of opinion words and product feature words:
  - Left Adjacency First, denoted "combination(L)";
  - Right Adjacency First, denoted "combination(R)".

As mentioned in section 3.1, the three key points in feature-level opinion mining are the opinion word list, the product feature categories and their association. In this paper, we deal with the latter two tasks. To evaluate the sentiment orientation of opinions, we constructed a polarity lexicon consisting of 1,000 opinion words with polarity labels of 1 (positive) or -1 (negative). We predict the sentiment strength for different product features in reviews by adding the polarities of the related opinion words; the semantic relatedness between product features and opinion words is judged with the methods described above.
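A minimal sketch of the "our approach" scoring just described, under stated assumptions: `assoc` maps an opinion word to its associated product feature categories from the pre-constructed association set, and `polarity` is the +1/-1 lexicon lookup.

from collections import defaultdict

def feature_sentiment(review_clauses, assoc, polarity):
    # Sum each opinion word's polarity into every category it is
    # associated with, yielding a sentiment strength per feature category
    strength = defaultdict(int)
    for clause in review_clauses:
        for w in clause:
            if w in polarity and w in assoc:
                for category in assoc[w]:
                    strength[category] += polarity[w]
    return dict(strength)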
In our test corpus, the product features involved in each review are rated on a 1-5 star scale; 1 star is the lowest positive-sentiment rating and 5 stars is the highest in the rating system. We compare the relative rankings produced by the different scoring methods against the standard answer set. For each product feature, we rank the 50 reviews according to their sentiment evaluation of the product feature; the corresponding ranking is then extracted from the standard evaluation set, and we check the coincidence between a generated ranking and the standard ranking. Given a reviewed product feature fj and a review set X composed of n product reviews, we can get a ranked review sequence Ranking(fj, X), ordered by sentiment strength for fj. We use Ranking(fj, X)i (i < n) to denote the i-th position in the ranking. If a generated ranking has the same member as the standard ranking at position Ranking(fj, X)i, it is considered to have a correct output at that position. We measure the ratio of correct outputs over the length of the ranking. In our experiments, the ranking length equals the number of product reviews in the test corpus.
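The ranking agreement measure reduces to a position-wise comparison, as in this sketch, where each ranking is assumed to be a list of review identifiers ordered by sentiment strength.

def ranking_precision(generated, standard):
    # Fraction of positions at which the generated ranking and the
    # standard ranking hold the same review
    n = min(len(generated), len(standard))
    hits = sum(1 for i in range(n) if generated[i] == standard[i])
    return hits / n if n else 0.0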
Figure 6 shows the ranking precision of the 50 reviews on different product features.

Figure 6: Ranked Sentiment Strength of Reviews on Different Product Feature Categories

From the figure, we see a remarkable effect of our association set in identifying sentiment related to the product feature "exterior"; a similar but less significant effect can be seen for the product feature "power". For the product feature "exterior", we get a considerably more accurate ranking with our pre-constructed association set than with either the adjacency method or the combination method. We believe this advantage comes from the ability of our approach to identify implicit product features. Product feature words with meanings similar to "exterior" are usually expressed only implicitly in reviews: people seldom explicitly use words like "exterior" to comment on the appearance aspect of a product; they simply comment on the exterior of a thing by saying "it's beautiful, elegant..." or something similar. So our association approach performs strikingly well in the sentiment evaluation of this kind of product feature.

For the automobile feature "interior", our association approach performs a little worse than the adjacency based shallow extraction. Checking the corpus, we find it is common for people to review this product feature with many explicit feature words, such as "seat", "acoustics" and so on. When the related opinion is expressed in a sentence like "The acoustics is excellent.", our approach is less effective than the explicit adjacency approach.

For all the product features, the combination approach always performs better than both adjacency methods. This shows the contribution of our pre-constructed association set: it provides hidden sentiment identification that helps to produce more accurate feature-level opinion mining.

6. CONCLUSION AND FUTURE WORK
In this paper, we propose a novel algorithm for the feature-level product opinion mining problem: an unsupervised approach based on the mutual reinforcement principle. The approach clusters product features and opinion words simultaneously and iteratively by fusing both their content information and their link information. Based on the clusterings of the two interrelated data object types, we construct an association set between product feature categories and opinion word groups by identifying the strongest n sentiment links. Thus we can exploit the sentiment association hidden in reviews. Moreover, multi-source knowledge is used to enhance clustering in the procedure. Our approach can largely predict opinions relating to different product features, even when product feature words do not appear explicitly in reviews. Experimental results on real Chinese web reviews demonstrate that our method outperforms state-of-the-art algorithms. Although our methods of candidate product feature extraction and filtering can partly identify real product features, they may lose some data and retain some noise. We will conduct deeper research in this area in future work.

7. ACKNOWLEDGMENTS
The work described here was supported in part by the PKU-IBM Innovation Institute and the National Natural Science Foundation of China (Project No. 60435020).

8. REFERENCES
[1] A. Budanitsky and G. Hirst. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1):13-47, 2006.
[2] C. Cardie, K. Wagstaff, et al. Noun phrase coreference as clustering. In Proceedings of the Joint Conference on Empirical Methods in NLP and Very Large Corpora, pages 82-89, 1999.
[3] K. Dave, S. Lawrence, and D. M. Pennock. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of the 12th International Conference on the World Wide Web (WWW'03), pages 519-528, 2003.
[4] A. Fujii and T. Ishikawa. A system for summarizing and visualizing arguments in subjective documents: Toward supporting decision making. In Proceedings of the Workshop on Sentiment and Subjectivity in Text, ACL 2006, pages 15-22, 2006.
[5] H. Guo, J. Jiang, G. Hu, and T. Zhang. Chinese named entity recognition based on multilevel linguistic features. Lecture Notes in Artificial Intelligence, 3248:90-99, 2005.
[6] V. Hatzivassiloglou and K. McKeown. Predicting the semantic orientation of adjectives. In Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL, pages 174-181, 1997.
[7] M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-2004), pages 761-769, 2004.
[8] M. Hu and B. Liu. Mining opinion features in customer reviews. In Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-2004), pages 755-760, 2004.
[9] S.-M. Kim and E. Hovy. Identifying opinion holders for question answering in opinion texts. In Proceedings of the AAAI Workshop on Question Answering in Restricted Domains, 2005.
[10] B. Liu, M. Hu, and J. Cheng. Opinion observer: analyzing and comparing opinions on the web. In Proceedings of the 14th International Conference on World Wide Web (WWW'05), pages 1024-1025, 2005.
[11] Y. Liu et al. The CCD construction model & its auxiliary tool VACOL. Applied Linguistics, 45(1):83-88, 2003.
[12] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pages 281-297, 1967.
[13] B. Pang and L. Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), 2005.
[14] P. Pantel and D. Lin. Discovering word senses from text. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 613-619, 2002.
[15] A.-M. Popescu and O. Etzioni. Extracting product features and opinions from reviews. In Proceedings of the Human Language Technology Conference/Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP 05), Vancouver, CA, 2005.
[16] Q. Su, Y. Zhu, B. Swen, and S.-W. Yu. Mining feature-based opinion expressions by a mutual information approach. International Journal of Computer Processing of Oriental Languages, 20(2/3):137-150.
[17] J.-T. Sun, X. Wang, D. Shen, H.-J. Zeng, and Z. Chen. CWS: a comparative web search system. In Proceedings of the 15th International Conference on World Wide Web, 2006.
[18] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.
[19] P. Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), pages 417-424, 2002.
[20] K. Wagstaff, C. Cardie, S. Rogers, and S. Schroedl. Constrained k-means clustering with background knowledge. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 577-584, 2001.
[21] G.-R. Xue, Y. Yu, D. Shen, Q. Yang, H.-J. Zeng, and Z. Chen. Reinforcing web-object categorization through interrelationships. Data Mining and Knowledge Discovery, 12(2-3), 2006.
[22] H. Yu and V. Hatzivassiloglou. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of EMNLP 2003, 2003.
[23] H.-J. Zeng, Z. Chen, and W.-Y. Ma. A unified framework for clustering heterogeneous web objects. In Proceedings of the 3rd International Conference on Web Information Systems Engineering, pages 161-172, 2002.