TY  - CONF
T1  - Selecting hierarchical clustering cut points for web person-name disambiguation
T2  - Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Y1  - 2009
A1  - Gong, Jun
A1  - Oard, Douglas
KW  - clustering
KW  - person-name disambiguation
AB  - Hierarchical clustering is often used to cluster person-names referring to the same entities. Since the correct number of clusters for a given person-name is not known a priori, some way of deciding where to cut the resulting dendrogram to balance risks of over- or under-clustering is needed. This paper reports on experiments in which outcome-specific and result-set measures are used to learn a global similarity threshold. Results on the Web People Search (WePS)-2 task indicate that approximately 85% of the optimal F1 measure can be achieved on held-out data.
JA  - Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
T3  - SIGIR '09
PB  - ACM
CY  - New York, NY, USA
SN  - 978-1-60558-483-6
UR  - http://doi.acm.org/10.1145/1571941.1572124
M3  - 10.1145/1571941.1572124
ER  -