TY - JOUR
T1 - Querying event sequences by exact match or similarity search: Design and empirical evaluation
JF - Interacting with Computers
Y1 - 2012
A1 - Wongsuphasawat, Krist
A1 - Plaisant, Catherine
A1 - Taieb-Maimon, Meirav
A1 - Shneiderman, Ben
KW - Event sequence
KW - Similan
KW - similarity measure
KW - Similarity Search
KW - temporal categorical data
KW - Temporal query interface
AB - Specifying event sequence queries is challenging even for skilled computer professionals familiar with SQL. Most graphical user interfaces for database search use an exact match approach, which is often effective, but near misses may also be of interest. We describe a new similarity search interface, in which users specify a query by simply placing events on a blank timeline and retrieve a similarity-ranked list of results. Behind this user interface is a new similarity measure for event sequences which the users can customize by four decision criteria, enabling them to adjust the impact of missing, extra, or swapped events or the impact of time shifts. We describe a use case with Electronic Health Records based on our ongoing collaboration with hospital physicians. A controlled experiment with 18 participants compared exact match and similarity search interfaces. We report on the advantages and disadvantages of each interface and suggest a hybrid interface combining the best of both.
VL - 24
SN - 0953-5438
UR - http://www.sciencedirect.com/science/article/pii/S0953543812000124
CP - 2
M3 - 10.1016/j.intcom.2012.01.003
ER -

TY - CONF
T1 - Finding comparable temporal categorical records: A similarity measure with an interactive visualization
T2 - IEEE Symposium on Visual Analytics Science and Technology, 2009. VAST 2009
Y1 - 2009
A1 - Wongsuphasawat, K.
A1 - Shneiderman, Ben
KW - data visualisation
KW - Educational institutions
KW - Feedback
KW - Information retrieval
KW - interactive search tool
KW - interactive systems
KW - interactive visualization tool
KW - large databases
KW - M&M Measure
KW - Match & Mismatch measure
KW - Medical services
KW - numerical time series
KW - parameters customization
KW - Particle measurements
KW - Similan
KW - similarity measure
KW - Similarity Search
KW - temporal categorical databases
KW - Temporal Categorical Records
KW - temporal databases
KW - Testing
KW - Time measurement
KW - time series
KW - transportation
KW - usability
KW - very large databases
KW - visual databases
KW - Visualization
AB - An increasing number of temporal categorical databases are being collected: Electronic Health Records in healthcare organizations, traffic incident logs in transportation systems, or student records in universities. Finding similar records within these large databases requires effective similarity measures that capture the searcher's intent. Many similarity measures exist for numerical time series, but temporal categorical records are different. We propose a temporal categorical similarity measure, the M&M (Match & Mismatch) measure, which is based on the concept of aligning records by sentinel events, then matching events between the target and the compared records. The M&M measure combines the time differences between pairs of events and the number of mismatches. To accommodate customization of parameters in the M&M measure and results interpretation, we implemented Similan, an interactive search and visualization tool for temporal categorical records. A usability study with 8 participants demonstrated that Similan was easy to learn and enabled them to find similar records, but users had difficulty understanding the M&M measure. The usability study feedback led to an improved version with a continuous timeline, which was tested in a pilot study with 5 participants.
JA - IEEE Symposium on Visual Analytics Science and Technology, 2009. VAST 2009
PB - IEEE
SN - 978-1-4244-5283-5
M3 - 10.1109/VAST.2009.5332595
ER -

TY - CONF
T1 - Embedding and similarity search for point sets under translation
T2 - Proceedings of the twenty-fourth annual symposium on Computational geometry
Y1 - 2008
A1 - Cho, Minkyoung
A1 - Mount, Dave
KW - EMBEDDING
KW - point pattern matching
KW - randomized algorithms
KW - Similarity Search
AB - Pattern matching in point sets is a well studied problem with numerous applications. We assume that the point sets may contain outliers (missing or spurious points) and are subject to an unknown translation. We define the distance between any two point sets to be the minimum size of their symmetric difference over all translations of one set relative to the other. We consider the problem in the context of similarity search. We assume that a large database of point sets is to be preprocessed so that given any query point set, the closest matches in the database can be computed efficiently. Our approach is based on showing that there is a randomized algorithm that computes a translation-invariant embedding of any point set of size at most n into the L_1 metric, so that with high probability, distances are subject to a distortion that is O(log^2 n).
JA - Proceedings of the twenty-fourth annual symposium on Computational geometry
T3 - SCG '08
PB - ACM
CY - New York, NY, USA
SN - 978-1-60558-071-5
UR - http://doi.acm.org/10.1145/1377676.1377731
M3 - 10.1145/1377676.1377731
ER -