%0 Journal Article %J Interacting with Computers %D 2012 %T Querying event sequences by exact match or similarity search: Design and empirical evaluation %A Wongsuphasawat,Krist %A Plaisant, Catherine %A Taieb-Maimon,Meirav %A Shneiderman, Ben %K Event sequence %K Similan %K similarity measure %K Similarity Search %K temporal categorical data %K Temporal query interface %X Specifying event sequence queries is challenging even for skilled computer professionals familiar with SQL. Most graphical user interfaces for database search use an exact match approach, which is often effective, but near misses may also be of interest. We describe a new similarity search interface, in which users specify a query by simply placing events on a blank timeline and retrieve a similarity-ranked list of results. Behind this user interface is a new similarity measure for event sequences which the users can customize by four decision criteria, enabling them to adjust the impact of missing, extra, or swapped events or the impact of time shifts. We describe a use case with Electronic Health Records based on our ongoing collaboration with hospital physicians. A controlled experiment with 18 participants compared exact match and similarity search interfaces. We report on the advantages and disadvantages of each interface and suggest a hybrid interface combining the best of both. %B Interacting with Computers %V 24 %P 55 - 68 %8 2012/03// %@ 0953-5438 %G eng %U http://www.sciencedirect.com/science/article/pii/S0953543812000124 %N 2 %R 10.1016/j.intcom.2012.01.003 %0 Conference Paper %B IEEE Symposium on Visual Analytics Science and Technology, 2009. VAST 2009 %D 2009 %T Finding comparable temporal categorical records: A similarity measure with an interactive visualization %A Wongsuphasawat,K. 
%A Shneiderman, Ben %K data visualisation %K Educational institutions %K Feedback %K Information retrieval %K interactive search tool %K interactive systems %K interactive visualization tool %K large databases %K M&M Measure %K Match & Mismatch measure %K Medical services %K numerical time series %K parameters customization %K Particle measurements %K Similan %K similarity measure %K Similarity Search %K temporal categorical databases %K Temporal Categorical Records %K temporal databases %K Testing %K Time measurement %K time series %K transportation %K usability %K very large databases %K visual databases %K Visualization %X An increasing number of temporal categorical databases are being collected: Electronic Health Records in healthcare organizations, traffic incident logs in transportation systems, or student records in universities. Finding similar records within these large databases requires effective similarity measures that capture the searcher's intent. Many similarity measures exist for numerical time series, but temporal categorical records are different. We propose a temporal categorical similarity measure, the M&M (Match & Mismatch) measure, which is based on the concept of aligning records by sentinel events, then matching events between the target and the compared records. The M&M measure combines the time differences between pairs of events and the number of mismatches. To accommodate customization of parameters in the M&M measure and results interpretation, we implemented Similan, an interactive search and visualization tool for temporal categorical records. A usability study with 8 participants demonstrated that Similan was easy to learn and enabled them to find similar records, but users had difficulty understanding the M&M measure. The usability study feedback led to an improved version with a continuous timeline, which was tested in a pilot study with 5 participants. %B IEEE Symposium on Visual Analytics Science and Technology, 2009. 
VAST 2009 %I IEEE %P 27 - 34 %8 2009/10/12/13 %@ 978-1-4244-5283-5 %G eng %R 10.1109/VAST.2009.5332595 %0 Conference Paper %B Proceedings of the twenty-fourth annual symposium on Computational geometry %D 2008 %T Embedding and similarity search for point sets under translation %A Cho,Minkyoung %A Mount, Dave %K EMBEDDING %K point pattern matching %K randomized algorithms %K Similarity Search %X Pattern matching in point sets is a well studied problem with numerous applications. We assume that the point sets may contain outliers (missing or spurious points) and are subject to an unknown translation. We define the distance between any two point sets to be the minimum size of their symmetric difference over all translations of one set relative to the other. We consider the problem in the context of similarity search. We assume that a large database of point sets is to be preprocessed so that given any query point set, the closest matches in the database can be computed efficiently. Our approach is based on showing that there is a randomized algorithm that computes a translation-invariant embedding of any point set of size at most n into the L_1 metric, so that with high probability, distances are subject to a distortion that is O(log^2 n). %B Proceedings of the twenty-fourth annual symposium on Computational geometry %S SCG '08 %I ACM %C New York, NY, USA %P 320 - 327 %8 2008/// %@ 978-1-60558-071-5 %G eng %U http://doi.acm.org/10.1145/1377676.1377731 %R 10.1145/1377676.1377731
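
Note on the distance defined in the Cho and Mount abstract above: the minimum size of the symmetric difference over all translations can be computed directly for small inputs. The following is a minimal brute-force sketch, assuming 2D points with integer coordinates; the name `translation_distance` is hypothetical, and this is not the randomized embedding the paper describes, only an illustration of the distance it approximates.

```python
from itertools import product

Point = tuple[int, int]


def translation_distance(a: set[Point], b: set[Point]) -> int:
    """Minimum size of (a + t) symmetric-difference b over all translations t.

    Brute force: only translations that send some point of `a` onto some
    point of `b` can produce any overlap, so it suffices to try those.
    """
    if not a or not b:
        # With an empty set there is never any overlap to gain.
        return len(a) + len(b)

    best_overlap = 0
    for (ax, ay), (bx, by) in product(a, b):
        tx, ty = bx - ax, by - ay
        # Count points of the translated `a` that coincide with points of `b`.
        overlap = sum((x + tx, y + ty) in b for x, y in a)
        best_overlap = max(best_overlap, overlap)

    # |A delta B| = |A| + |B| - 2 * |A intersect B|
    return len(a) + len(b) - 2 * best_overlap


if __name__ == "__main__":
    a = {(0, 0), (1, 0), (0, 1)}
    b = {(5, 5), (6, 5), (5, 6), (9, 9)}  # a shifted by (5, 5) plus one spurious point
    print(translation_distance(a, b))
```

This sketch tries one translation per pair of points and checks every point of the first set each time, so it is only practical for small sets; the abstract's embedding approach targets the large-database setting where such pairwise comparison is too slow.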