%0 Conference Paper %B 2011 IEEE 27th International Conference on Data Engineering Workshops (ICDEW) %D 2011 %T Declarative analysis of noisy information networks %A Moustafa,W. E %A Namata,G. %A Deshpande, Amol %A Getoor, Lise %K Cleaning %K Data analysis %K data cleaning operations %K data management system %K data mining %K Databases %K Datalog %K declarative analysis %K graph structure %K information networks %K Noise measurement %K noisy information networks %K Prediction algorithms %K semantics %K Syntactics %X There is a growing interest in methods for analyzing data describing networks of all types, including information, biological, physical, and social networks. Typically the data describing these networks is observational, and thus noisy and incomplete; it is often at the wrong level of fidelity and abstraction for meaningful data analysis. This has resulted in a growing body of work on extracting, cleaning, and annotating network data. Unfortunately, much of this work is ad hoc and domain-specific. In this paper, we present the architecture of a data management system that enables efficient, declarative analysis of large-scale information networks. We identify a set of primitives to support the extraction and inference of a network from observational data, and describe a framework that enables a network analyst to easily implement and combine new extraction and analysis techniques, and efficiently apply them to large observation networks. The key insight behind our approach is to decouple, to the extent possible, (a) the operations that require traversing the graph structure (typically the computationally expensive step), from (b) the operations that do the modification and update of the extracted network. We present an analysis language based on Datalog, and show how to use it to cleanly achieve such decoupling. We briefly describe our prototype system that supports these abstractions. We include a preliminary performance evaluation of the system and show that our approach scales well and can efficiently handle a wide spectrum of data cleaning operations on network data. %B 2011 IEEE 27th International Conference on Data Engineering Workshops (ICDEW) %I IEEE %P 106 - 111 %8 2011/04/11/16 %@ 978-1-4244-9195-7 %G eng %R 10.1109/ICDEW.2011.5767619 %0 Conference Paper %B 2011 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) %D 2011 %T Evaluating visual and statistical exploration of scientific literature networks %A Gove,R. %A Dunne,C. %A Shneiderman, Ben %A Klavans,J. %A Dorr, Bonnie J %K abstracting %K academic literature %K action science explorer %K automatic clustering %K citation analysis %K citation network visualization %K Communities %K Context %K custom exploration goal %K Data visualization %K Databases %K Document filtering %K document handling %K document ranking %K easy-to-understand metrics %K empirical evaluation %K Google %K Graphical user interfaces %K Information filtering %K Information Visualization %K Libraries %K literature exploration %K network statistics %K paper filtering %K paper ranking %K scientific literature network %K statistical exploration %K summarization technique %K user-defined tasks %K visual exploration %K Visualization %X Action Science Explorer (ASE) is a tool designed to support users in rapidly generating readily consumable summaries of academic literature. It uses citation network visualization, ranking and filtering papers by network statistics, and automatic clustering and summarization techniques. 
We describe how early formative evaluations of ASE led to a mature system evaluation, consisting of an in-depth empirical evaluation with four domain experts. The evaluation tasks were of two types: predefined tasks to test system performance in common scenarios, and user-defined tasks to test the system's usefulness for custom exploration goals. The primary contribution of this paper is a validation of the ASE design and recommendations to provide: easy-to-understand metrics for ranking and filtering documents, user control over which document sets to explore, and overviews of the document set in coordinated views along with details-on-demand of specific papers. We contribute a taxonomy of features for literature search and exploration tools and describe exploration goals identified by our participants. %B 2011 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) %I IEEE %P 217 - 224 %8 2011/09/18/22 %@ 978-1-4577-1246-3 %G eng %R 10.1109/VLHCC.2011.6070403 %0 Conference Paper %B 2011 18th IEEE International Conference on Image Processing (ICIP) %D 2011 %T Illumination robust dictionary-based face recognition %A Patel, Vishal M. %A Tao Wu %A Biswas,S. %A Phillips,P.J. %A Chellappa, Rama %K albedo %K approximation theory %K classification %K competitive face recognition algorithms %K Databases %K Dictionaries %K Face %K face recognition %K face recognition method %K filtering theory %K human face recognition %K illumination robust dictionary-based face recognition %K illumination variation %K image representation %K learned dictionary %K learning (artificial intelligence) %K lighting %K lighting conditions %K multiple images %K nonstationary stochastic filter %K publicly available databases %K relighting %K relighting approach %K representation error %K residual vectors %K Robustness %K simultaneous sparse approximations %K simultaneous sparse signal representation %K sparseness constraint %K Training %K varying illumination %K vectors %X In this paper, we present a face recognition method based on simultaneous sparse approximations under varying illumination. Our method consists of two main stages. In the first stage, a dictionary is learned for each face class based on given training examples which minimizes the representation error with a sparseness constraint. In the second stage, a test image is projected onto the span of the atoms in each learned dictionary. The resulting residual vectors are then used for classification. Furthermore, to handle changes in lighting conditions, we use a relighting approach based on a non-stationary stochastic filter to generate multiple images of the same person with different lighting. As a result, our algorithm has the ability to recognize human faces with good accuracy even when only a single or a very few images are provided for training. The effectiveness of the proposed method is demonstrated on publicly available databases and it is shown that this method is efficient and can perform significantly better than many competitive face recognition algorithms. %B 2011 18th IEEE International Conference on Image Processing (ICIP) %I IEEE %P 777 - 780 %8 2011/09/11/14 %@ 978-1-4577-1304-0 %G eng %R 10.1109/ICIP.2011.6116670 %0 Conference Paper %B 2011 IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011) %D 2011 %T Towards view-invariant expression analysis using analytic shape manifolds %A Taheri, S. %A Turaga,P.
%A Chellappa, Rama %K Databases %K Deformable models %K Face %K face recognition %K facial expression analysis %K Geometry %K Gold %K Human-computer interaction %K Manifolds %K projective transformation %K Riemannian interpretation %K SHAPE %K view invariant expression analysis %X Facial expression analysis is one of the important components for effective human-computer interaction. However, to develop robust and generalizable models for expression analysis one needs to break the dependence of the models on the choice of the coordinate frame of the camera, i.e., expression models should generalize across facial poses. To perform this systematically, one needs to understand the space of observed images subject to projective transformations. However, since the projective shape-space is cumbersome to work with, we address this problem by deriving models for expressions on the affine shape-space as an approximation to the projective shape-space by using a Riemannian interpretation of deformations that facial expressions cause on different parts of the face. We use landmark configurations to represent facial deformations and exploit the fact that the affine shape-space can be studied using the Grassmann manifold. This representation enables us to perform various expression analysis and recognition algorithms without the need for normalization as a preprocessing step. We extend some of the available approaches for expression analysis to the Grassmann manifold and experimentally show promising results, paving the way for a more general theory of view-invariant expression analysis. %B 2011 IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011) %I IEEE %P 306 - 313 %8 2011/03/21/25 %@ 978-1-4244-9140-7 %G eng %R 10.1109/FG.2011.5771415 %0 Conference Paper %B 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) %D 2010 %T MetaPhyler: Taxonomic profiling for metagenomic sequences %A Liu,Bo %A Gibbons,T. %A Ghodsi,M. %A Pop, Mihai %K Bioinformatics %K CARMA comparison %K Databases %K Genomics %K Linear regression %K marker genes %K matching length %K Megan comparison %K metagenomic sequences %K metagenomics %K MetaPhyler %K microbial diversity %K microorganisms %K molecular biophysics %K molecular configurations %K Pattern classification %K pattern matching %K phylogenetic classification %K Phylogeny %K PhymmBL comparison %K reference gene database %K Sensitivity %K sequence matching %K taxonomic classifier %K taxonomic level %K taxonomic profiling %K whole metagenome sequencing data %X A major goal of metagenomics is to characterize the microbial diversity of an environment. The most popular approach relies on 16S rRNA sequencing; however, this approach can generate biased estimates due to differences in the copy number of the 16S rRNA gene between even closely related organisms, and due to PCR artifacts. The taxonomic composition can also be determined from whole-metagenome sequencing data by matching individual sequences against a database of reference genes. One major limitation of prior methods used for this purpose is the use of a universal classification threshold for all genes at all taxonomic levels. We propose that better classification results can be obtained by tuning the taxonomic classifier to each matching length, reference gene, and taxonomic level. We present a novel taxonomic profiler, MetaPhyler, which uses marker genes as a taxonomic reference.
Results on simulated datasets demonstrate that MetaPhyler outperforms other tools commonly used in this context (CARMA, Megan and PhymmBL). We also present interesting results obtained by applying MetaPhyler to a real metagenomic dataset. %B 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) %I IEEE %P 95 - 100 %8 2010/12/18/21 %@ 978-1-4244-8306-8 %G eng %R 10.1109/BIBM.2010.5706544 %0 Conference Paper %B IEEE 25th International Conference on Data Engineering, 2009. ICDE '09 %D 2009 %T Efficient Query Evaluation over Temporally Correlated Probabilistic Streams %A Kanagal,B. %A Deshpande, Amol %K Birds %K Computerized monitoring %K correlated probabilistic streams %K correlation structure %K Data engineering %K data mining %K Databases %K Event detection %K Graphical models %K inference mechanisms %K Markov processes %K polynomial time %K probabilistic database %K probabilistic graphical model %K probabilistic query evaluation %K query evaluation %K query planning algorithm %K query plans %K Query processing %K Random variables %K stream processing operator %K Streaming media %X In this paper, we address the problem of efficient query evaluation over highly correlated probabilistic streams. We observe that although probabilistic streams tend to be strongly correlated in space and time, the correlations are usually quite structured (i.e., the same set of dependencies and independences repeat across time) and Markovian (i.e., the state at time "t+1" is independent of the states at previous times given the state at time "t"). We exploit this observation to compactly encode probabilistic streams by decoupling the correlation structure (the set of dependencies) from the actual probability values. We develop novel stream processing operators that can efficiently and incrementally process new data items; our operators are based on the previously proposed framework of viewing probabilistic query evaluation as inference over probabilistic graphical models (PGMs) [P. Sen and A. Deshpande, 2007]. We develop a query planning algorithm that constructs efficient query plans that are executable in polynomial-time whenever possible, and we characterize queries for which such plans are not possible. Finally we conduct an extensive experimental evaluation that illustrates the advantages of exploiting the structured nature of correlations in probabilistic streams. %B IEEE 25th International Conference on Data Engineering, 2009. ICDE '09 %I IEEE %P 1315 - 1318 %8 2009/04/29/March %@ 978-1-4244-3422-0 %G eng %R 10.1109/ICDE.2009.229 %0 Conference Paper %B IEEE INFOCOM 2009 %D 2009 %T Fighting Spam with the NeighborhoodWatch DHT %A Bender,A. %A Sherwood,R. %A Monner,D. %A Goergen,N. %A Spring, Neil %A Bhattacharjee, Bobby %K Communications Society %K computer crime %K cryptography %K Databases %K IP addresses %K IP networks %K on-line trusted authority %K Peer to peer computing %K peer-to-peer computing %K peer-to-peer distributed hash table %K Postal services %K Relays %K Resilience %K Routing %K Security %K table size routing %K Unsolicited electronic mail %X In this paper, we present DHTBL, an anti-spam blacklist built upon a novel secure distributed hash table (DHT). We show how DHTBL can be used to replace existing DNS-based blacklists (DNSBLs) of IP addresses of mail relays that forward spam. Implementing a blacklist on a DHT improves resilience to DoS attacks and secures message delivery, when compared to DNSBLs.
However, due to the sensitive nature of the blacklist, storing the data in a peer-to-peer DHT would invite attackers to infiltrate the system. Typical DHTs can withstand fail-stop failures, but malicious nodes may provide incorrect routing information, refuse to return published items, or simply ignore certain queries. The NeighborhoodWatch DHT is resilient to malicious nodes and maintains the O(log N) bounds on routing table size and expected lookup time. NeighborhoodWatch depends on two assumptions in order to make these guarantees: (1) the existence of an on-line trusted authority that periodically contacts and issues signed certificates to each node, and (2) for every sequence of k + 1 consecutive nodes in the ID space, at least one is alive and non-malicious. We show how NeighborhoodWatch maintains many of its security properties even when the second assumption is violated. Honest nodes in NeighborhoodWatch can detect malicious behavior and expel the responsible nodes from the DHT. %B IEEE INFOCOM 2009 %I IEEE %P 1755 - 1763 %8 2009/04/19/25 %@ 978-1-4244-3512-8 %G eng %R 10.1109/INFCOM.2009.5062095 %0 Conference Paper %B IEEE 24th International Conference on Data Engineering, 2008. ICDE 2008 %D 2008 %T Flow Algorithms for Parallel Query Optimization %A Deshpande, Amol %A Hellerstein,L. %K Casting %K computational complexity %K Cost function %K Databases %K Delay %K distributed environment %K Educational institutions %K flow maximization algorithm %K Interleaved codes %K interoperator parallelism %K minimisation %K multiway join query response time minimization problem %K parallel database %K Parallel databases %K parallel query optimization %K Partitioning algorithms %K pipeline processing %K pipelined parallelism %K polynomial-time algorithm %K query planning problem %K Query processing %K Web service %K Web services %X We address the problem of minimizing the response time of a multi-way join query using pipelined (inter-operator) parallelism, in a parallel or a distributed environment. We observe that in order to fully exploit the parallelism in the system, we must consider a new class of "interleaving" plans, where multiple query plans are used simultaneously to minimize the response time of a query (or to maximize the tuple-throughput of the system). We cast the query planning problem in this environment as a "flow maximization problem", and present polynomial-time algorithms that (statically) find the optimal set of plans to use for a given query, for a large class of multi-way join queries. Our proposed algorithms also naturally extend to query optimization over web services. Finally we present an extensive experimental evaluation that demonstrates both the need to consider such plans in parallel query processing and the effectiveness of our algorithms. %B IEEE 24th International Conference on Data Engineering, 2008. ICDE 2008 %I IEEE %P 754 - 763 %8 2008/04/07/12 %@ 978-1-4244-1836-7 %G eng %R 10.1109/ICDE.2008.4497484 %0 Journal Article %J Computer %D 2007 %T A Language for Human Action %A Guerra-Filho,G. %A Aloimonos, J.
%K anthropocentric system %K artificial agent %K Artificial intelligence %K cognitive model %K Concrete %K Databases %K human action %K human activity language %K Human computer interaction %K human factors %K human sensory-motor skill %K human-centered computing %K human-machine interaction %K HUMANS %K Intelligent sensors %K linguistic framework %K linguistics %K Mirrors %K Morphology %K natural language %K natural languages %K Neurons %K Power system modeling %K user interface %K User interfaces %X Human-centered computing (HCC) involves conforming computer technology to humans while naturally achieving human-machine interaction. In a human-centered system, the interaction focuses on human requirements, capabilities, and limitations. These anthropocentric systems also focus on the consideration of human sensory-motor skills in a wide range of activities. This ensures that the interface between artificial agents and human users accounts for perception and action in a novel interaction paradigm. In turn, this leads to behavior understanding through cognitive models that allow content description and, ultimately, the integration of real and virtual worlds. Our work focuses on building a language that maps to the lower-level sensory and motor languages and to the higher-level natural language. An empirically demonstrated human activity language provides sensory-motor-grounded representations for understanding human actions. A linguistic framework allows the analysis and synthesis of these actions. %B Computer %V 40 %P 42 - 51 %8 2007/05// %@ 0018-9162 %G eng %N 5 %R 10.1109/MC.2007.154 %0 Conference Paper %B Proceedings of the 22nd International Conference on Data Engineering, 2006. ICDE '06 %D 2006 %T Approximate Data Collection in Sensor Networks using Probabilistic Models %A Chu,D. %A Deshpande, Amol %A Hellerstein,J. M %A Wei Hong %K Batteries %K Biological system modeling %K Biosensors %K data mining %K Databases %K Energy consumption %K Intelligent networks %K Intelligent sensors %K Robustness %K Wireless sensor networks %X Wireless sensor networks are proving to be useful in a variety of settings. A core challenge in these networks is to minimize energy consumption. Prior database research has proposed to achieve this by pushing data-reducing operators like aggregation and selection down into the network. This approach has proven unpopular with early adopters of sensor network technology, who typically want to extract complete "dumps" of the sensor readings, i.e., to run "SELECT *" queries. Unfortunately, because these queries do no data reduction, they consume significant energy in current sensornet query processors. In this paper we attack the "SELECT *" problem for sensor networks. We propose a robust approximate technique called Ken that uses replicated dynamic probabilistic models to minimize communication from sensor nodes to the network’s PC base station. In addition to data collection, we show that Ken is well suited to anomaly- and event-detection applications. A key challenge in this work is to intelligently exploit spatial correlations across sensor nodes without imposing undue sensor-to-sensor communication burdens to maintain the models. Using traces from two real-world sensor network deployments, we demonstrate that relatively simple models can provide significant communication (and hence energy) savings without undue sacrifice in result quality or frequency.
Choosing optimally among even our simple models is NP-hard, but our experiments show that a greedy heuristic performs nearly as well as an exhaustive algorithm. %B Proceedings of the 22nd International Conference on Data Engineering, 2006. ICDE '06 %I IEEE %P 48 - 48 %8 2006/04/03/07 %@ 0-7695-2570-9 %G eng %R 10.1109/ICDE.2006.21 %0 Conference Paper %B 16th International Conference on Scientific and Statistical Database Management, 2004. Proceedings %D 2004 %T Exploiting multiple paths to express scientific queries %A Lacroix,Z. %A Moths,T. %A Parekh,K. %A Raschid, Louiqa %A Vidal,M. -E %K access protocols %K biology computing %K BioNavigation system %K complex queries %K Costs %K Data analysis %K data handling %K Data visualization %K data warehouse %K Data warehouses %K Databases %K diseases %K distributed databases %K hard-coded scripts %K information resources %K Information retrieval %K mediation-based data integration system %K multiple paths %K query evaluation %K Query processing %K scientific data collection %K scientific discovery %K scientific information %K scientific information systems %K scientific object of interest %K scientific queries %K sequences %K Web resources %X The purpose of this demonstration is to present the main features of the BioNavigation system. Scientific data collection needed in various stages of scientific discovery is typically performed manually. For each scientific object of interest (e.g., a gene, a sequence), scientists query a succession of Web resources following links between retrieved entries. Each of the steps provides part of the intended characterization of the scientific object. This process is sometimes partially supported by hard-coded scripts or complex queries that will be evaluated by a mediation-based data integration system or against a data warehouse. These approaches fail in guiding the scientists during the collection process. In contrast, the BioNavigation approach presented in the paper provides the scientists with information on the available alternative resources, their provenance, and the costs of data collection. The BioNavigation system enhances a mediation-based integration system and provides scientists with support for the following: to ask queries at a high conceptual level; to visualize the multiple alternative resources that may be exploited to execute their data collection queries; to choose the final execution path to evaluate their queries. %B 16th International Conference on Scientific and Statistical Database Management, 2004. Proceedings %I IEEE %P 357 - 360 %8 2004/06/21/23 %@ 0-7695-2146-0 %G eng %R 10.1109/SSDM.2004.1311231 %0 Conference Paper %B 19th International Conference on Data Engineering, 2003. Proceedings %D 2003 %T Using state modules for adaptive query processing %A Vijayshankar Raman %A Deshpande, Amol %A Hellerstein,J.
M %K adaptive query processing %K Bandwidth %K Calibration %K data encapsulation %K data structure %K Data structures %K Databases %K Dictionaries %K eddy routing %K eddy routing operator %K Encapsulation %K join operator %K multiple algorithm automatic hybridization %K multiple competing join algorithm %K query architecture %K Query processing %K query spanning tree %K Routing %K routing policy %K Runtime %K shared materialization point %K State Module %K SteMs %K Telegraph dataflow system %K Telegraphy %K Tree data structures %X We present a query architecture in which join operators are decomposed into their constituent data structures (State Modules, or SteMs), and dataflow among these SteMs is managed adaptively by an eddy routing operator [R. Avnur et al., (2000)]. Breaking the encapsulation of joins serves two purposes. First, it allows the eddy to observe multiple physical operations embedded in a join algorithm, allowing for better calibration and control of these operations. Second, the SteM on a relation serves as a shared materialization point, enabling multiple competing access methods to share results, which can be leveraged by multiple competing join algorithms. Our architecture extends prior work significantly, allowing continuously adaptive decisions for most major aspects of traditional query optimization: choice of access methods and join algorithms, ordering of operators, and choice of a query spanning tree. SteMs introduce significant routing flexibility to the eddy, enabling more opportunities for adaptation, but also introducing the possibility of incorrect query results. We present constraints on eddy routing through SteMs that ensure correctness while preserving a great deal of flexibility. We also demonstrate the benefits of our architecture via experiments in the Telegraph dataflow system. We show that even a simple routing policy allows significant flexibility in adaptation, including novel effects like automatic "hybridization" of multiple algorithms for a single join. %B 19th International Conference on Data Engineering, 2003. Proceedings %I IEEE %P 353 - 364 %8 2003/03/05/8 %@ 0-7803-7665-X %G eng %R 10.1109/ICDE.2003.1260805 %0 Conference Paper %B IEEE/RSJ International Conference on Intelligent Robots and Systems, 2002 %D 2002 %T Contour migration: solving object ambiguity with shape-space visual guidance %A Abd-Almageed, Wael %A Smith,C.E. %K Artificial intelligence %K camera motion %K CAMERAS %K Computer vision %K contour migration %K Databases %K edge detection %K Intelligent robots %K Laboratories %K Machine vision %K object ambiguity %K Object recognition %K pattern matching %K Robot vision systems %K servomechanisms %K SHAPE %K shape matching %K shape-space visual guidance %K silhouette matching %K visual servoing %X A fundamental problem in computer vision is the issue of shape ambiguity. Simply stated, a silhouette cannot uniquely identify an object or an object's classification since many unique objects can present identical occluding contours. This problem has no solution in the general case for a monocular vision system. This paper presents a method for disambiguating objects during silhouette matching using a visual servoing system. This method identifies the camera motion(s) that gives disambiguating views of the objects. These motions are identified through a new technique called contour migration. The occluding contour's shape is used to identify objects or object classes that are potential matches for that shape.
A contour migration is then determined that disambiguates the possible matches by purposive viewpoint adjustment. The technique is demonstrated using an example set of objects. %B IEEE/RSJ International Conference on Intelligent Robots and Systems, 2002 %I IEEE %V 1 %P 330 - 335 %8 2002/// %@ 0-7803-7398-7 %G eng %R 10.1109/IRDS.2002.1041410 %0 Conference Paper %B Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002. Proceedings %D 2002 %T Smiling faces are better for face recognition %A Yacoob,Yaser %A Davis, Larry S. %K between-class scatter matrices %K Databases %K discrimination power measure %K dynamic scenarios %K expressive faces %K face recognition %K facial expressions %K performance %K performance differences %K smiling faces %K software performance evaluation %K Training %K visual databases %K within-class scatter matrices %X This paper investigates face recognition during facial expressions. While face expressions have been treated as an adverse factor in standard face recognition approaches, our research suggests that, if a system has a choice in the selection of faces to use in training and recognition, its best performance would be obtained on faces displaying expressions. Naturally, smiling faces are the most prevalent (among expressive faces) for both training and recognition in dynamic scenarios. We employ a measure of discrimination power that is computed from between-class and within-class scatter matrices. Two databases are used to show the performance differences on different sets of faces. %B Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002. Proceedings %I IEEE %P 52 - 57 %8 2002/05// %@ 0-7695-1602-5 %G eng %R 10.1109/AFGR.2002.1004132 %0 Conference Paper %B Thirteenth International Conference on Scientific and Statistical Database Management, 2001. SSDBM 2001. Proceedings %D 2001 %T Integrating distributed scientific data sources with MOCHA and XRoaster %A Rodriguez-Martinez,M. %A Roussopoulos, Nick %A McGann,J. M %A Kelley,S. %A Mokwa,J. %A White,B. %A Jala,J. %K client-server systems %K data sets %K data sites %K Databases %K Distributed computing %K distributed databases %K distributed scientific data source integration %K Educational institutions %K graphical tool %K hypermedia markup languages %K IP networks %K java %K Large-scale systems %K Maintenance engineering %K meta data %K metadata %K Middleware %K middleware system %K MOCHA %K Query processing %K remote sites %K scientific information systems %K user-defined types %K visual programming %K XML %K XML metadata elements %K XML-based framework %K XRoaster %X MOCHA is a novel middleware system for integrating distributed data sources that we have developed at the University of Maryland. MOCHA is based on the idea that the code that implements user-defined types and functions should be automatically deployed to remote sites by the middleware system itself. To this end, we have developed an XML-based framework to specify metadata about data sites, data sets, and user-defined types and functions. XRoaster is a graphical tool that we have developed to help the user create all the XML metadata elements to be used in MOCHA. %B Thirteenth International Conference on Scientific and Statistical Database Management, 2001. SSDBM 2001. Proceedings %I IEEE %P 263 - 266 %8 2001/// %@ 0-7695-1218-6 %G eng %R 10.1109/SSDM.2001.938560 %0 Conference Paper %B IEEE International Conference on Information Visualization, 2000.
Proceedings %D 2000 %T Direct annotation: a drag-and-drop strategy for labeling photos %A Shneiderman, Ben %A Kang,H. %K Biomedical imaging %K Cities and towns %K Computer errors %K Computer science %K database indexing %K database schema %K Databases %K digital libraries %K direct annotation %K direct manipulation %K drag-and-drop strategy %K Educational institutions %K graphical user interface %K Graphical user interfaces %K History %K hobby computing %K image searching %K label placement %K Labeling %K Laboratories %K Libraries %K personal information systems %K personal names database %K PhotoFinder prototype %K photograph labelling %K photographic libraries %K Photography %K scrolling list %K user interface design %K visual databases %X Annotating photographs is such a time-consuming, tedious and error-prone data entry task that it discourages most owners of personal photo libraries. By allowing the user to drag labels, such as personal names, from a scrolling list and drop them onto a photo, we believe we can make the task faster, easier and more appealing. Since the names are entered in a database, searching for all photos of a friend or family member is dramatically simplified. We describe the user interface design and the database schema to support direct annotation, as implemented in our PhotoFinder prototype. %B IEEE International Conference on Information Visualization, 2000. Proceedings %I IEEE %P 88 - 95 %8 2000/// %@ 0-7695-0743-3 %G eng %R 10.1109/IV.2000.859742 %0 Conference Paper %B 3rd IFCIS International Conference on Cooperative Information Systems, 1998. Proceedings %D 1998 %T Wrapper generation for Web accessible data sources %A Gruser,J. %A Raschid, Louiqa %A Vidal,M. E %A Bright,L. %K application program interfaces %K data mining %K Databases %K Educational institutions %K Electrical capacitance tomography %K HTML %K HTML documents %K Internet %K Query processing %K Read only memory %K Search engines %K Specification languages %K Uniform resource locators %K World Wide Web %K wrapper generation toolkit %K WWW %X There is an increase in the number of data sources that can be queried across the WWW. Such sources typically support HTML forms-based interfaces, and search engines query collections of suitably indexed data. The data is displayed via a browser. One drawback to these sources is that there is no standard programming interface suitable for applications to submit queries. Second, the output (answer to a query) is not well structured. Structured objects have to be extracted from the HTML documents which contain irrelevant data and which may be volatile. Third, domain knowledge about the data source is also embedded in HTML documents and must be extracted. To solve these problems, we present technology to define and (automatically) generate wrappers for Web accessible sources. Our contributions are as follows: (1) Defining a wrapper interface to specify the capability of Web accessible data sources. (2) Developing a wrapper generation toolkit of graphical interfaces and specification languages to specify the capability of sources and the functionality of the wrapper. (3) Developing the technology to automatically generate a wrapper appropriate to the Web accessible source, from the specifications. %B 3rd IFCIS International Conference on Cooperative Information Systems, 1998.
Proceedings %I IEEE %P 14 - 23 %8 1998/08/22/22 %@ 0-8186-8380-5 %G eng %R 10.1109/COOPIS.1998.706180 %0 Journal Article %J IEEE Software %D 1997 %T A framework for search interfaces %A Shneiderman, Ben %K Abstracts %K Cities and towns %K Databases %K Delay %K design practice %K four-phase framework %K frequent users %K Information retrieval %K Information services %K Libraries %K multimedia libraries %K online front-ends %K popular search systems %K predictable design %K Protection %K relevance ranked list %K search interfaces %K search results %K stand alone systems %K textual database searching %K Thesauri %K User interfaces %K user performance %K word processing %K World Wide Web %X Searching textual databases can be confusing for users. Popular search systems for the World Wide Web and stand alone systems typically provide a simple interface: users type in keywords and receive a relevance ranked list of 10 results. This is appealing in its simplicity, but users are often frustrated because search results are confusing or aspects of the search are out of their control. If we are to improve user performance, reduce mistaken assumptions, and increase successful searches, we need more predictable design. To coordinate design practice, we suggest a four-phase framework that would satisfy first time, intermittent, and frequent users accessing a variety of textual and multimedia libraries. %B IEEE Software %V 14 %P 18 - 20 %8 1997/04//Mar %@ 0740-7459 %G eng %N 2 %R 10.1109/52.582969 %0 Conference Paper %B Proceedings of the Fourth Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises, 1995 %D 1995 %T Organization overviews and role management: inspiration for future desktop environments %A Plaisant, Catherine %A Shneiderman, Ben %K Asia %K bank data processing %K Databases %K Environmental economics %K Environmental management %K future desktop environments %K Graphical user interfaces %K human resource management %K management information systems %K Management training %K multiple personal roles %K office automation %K organization overviews %K personnel %K Project management %K Prototypes %K role management %K role-centered approach %K scheduling %K semi-automated searches %K User interfaces %K Utility programs %K World Bank %X In our exploration of future work environments for the World Bank, we proposed two concepts. First, organization overviews provide consistent support to present the results of a variety of manual or semi-automated searches. Second, this view can be adapted or expanded for each class of users to finally map the multiple personal roles an individual has in an organization. After command line interfaces, graphical user interfaces, and the current “docu-centric” designs, a natural direction is towards a role-centered approach where we believe the emphasis is on the management of those multiple roles. Large visual overviews of the organization can be rapidly manipulated and zoomed in on to reveal the multiple roles each individual plays.
Each role involves coordination with groups of people and accomplishment of tasks within a schedule. %B Proceedings of the Fourth Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises, 1995 %I IEEE %P 14 - 22 %8 1995/04/20/22 %@ 0-8186-7019-3 %G eng %R 10.1109/ENABL.1995.484544 %0 Journal Article %J IEEE Software %D 1994 %T Dynamic queries for visual information seeking %A Shneiderman, Ben %K Algorithm design and analysis %K animated results %K animation %K Application software %K Command languages %K complex queries %K database management systems %K Databases %K display algorithms %K Displays %K dynamic queries %K Educational institutions %K Games %K Graphical user interfaces %K human factors %K Query processing %K retrieval %K Runtime %K Technological innovation %K user-interface design %K visual databases %K visual information seeking %K visual interfaces %K widgets %X Considers how dynamic queries allow users to "fly through" databases by adjusting widgets and viewing the animated results. In studies, users reacted to this approach with an enthusiasm more commonly associated with video games. Adoption requires research into retrieval and display algorithms and user-interface design. The author discusses how experts may benefit from visual interfaces because they will be able to formulate more complex queries and interpret intricate results. %B IEEE Software %V 11 %P 70 - 77 %8 1994/11// %@ 0740-7459 %G eng %N 6 %R 10.1109/52.329404 %0 Conference Paper %B International Conference on Application Specific Array Processors, 1994. Proceedings %D 1994 %T A SIMD solution to the sequence comparison problem on the MGAP %A Borah,M. %A Bajwa,R. S %A Hannenhalli, Sridhar %A Irwin,M. J %K AT-optimal algorithm %K Biological information theory %K biology computing %K biosequence comparison problem %K computational complexity %K Computer science %K Costs %K database size %K Databases %K DNA computing %K dynamic programming %K dynamic programming algorithms %K fine-grained massively parallel processor array %K Genetics %K Heuristic algorithms %K maximally similar sequence %K MGAP parallel computer %K Micro-Grain Array Processor %K Military computing %K molecular biology %K molecular biophysics %K Nearest neighbor searches %K nearest-neighbor connections %K Parallel algorithms %K pipeline processing %K pipelined SIMD solution %K sequence alignment problem %K sequences %X Molecular biologists frequently compare an unknown biosequence with a set of other known biosequences to find the sequence which is maximally similar, with the hope that what is true of one sequence, either physically or functionally, could be true of its analogue. Even though efficient dynamic programming algorithms exist for the problem, when the size of the database is large, the time required is quite long, even for moderate length sequences. In this paper, we present an efficient pipelined SIMD solution to the sequence alignment problem on the Micro-Grain Array Processor (MGAP), a fine-grained massively parallel array of processors with nearest-neighbor connections. The algorithm compares K sequences of length O(M) with the actual sequence of length N, in O(M+N+K) time with O(MN) processors, which is AT-optimal. The implementation on the MGAP computes at the rate of about 0.1 million comparisons per second for sequences of length 128. %B International Conference on Application Specific Array Processors, 1994.
Proceedings %I IEEE %P 336 - 345 %8 1994/08/22/24 %@ 0-8186-6517-3 %G eng %R 10.1109/ASAP.1994.331791 %0 Journal Article %J IEEE Transactions on Software Engineering %D 1991 %T Trie hashing with controlled load %A Litwin,W. A %A Roussopoulos, Nick %A Levy,G. %A Hong,W. %K B-tree file %K Computer science %K controlled load %K Databases %K disk access %K dynamic files %K file organisation %K high load factor %K information retrieval systems %K key search %K load factor %K Military computing %K ordered insertions %K Predictive models %K primary key access method %K Protocols %K random insertions %K TH file %K THCL %K Tree data structures %K trees (mathematics) %K trie hashing %X Trie hashing (TH), a primary key access method for storing and accessing records of dynamic files, is discussed. The key address is computed through a trie. A key search usually requires only one disk access when the trie is in core and two disk accesses for very large files when the trie must be on disk. A refinement to trie hashing, trie hashing with controlled load (THCL), is presented. It is designed to control the load factor of a TH file as tightly as that of a B-tree file, allows a high load factor of up to 100% for ordered insertions, and increases the load factor for random insertions from 70% to over 85%. It is shown that these properties make trie hashing preferable to a B-tree. %B IEEE Transactions on Software Engineering %V 17 %P 678 - 691 %8 1991/07// %@ 0098-5589 %G eng %N 7 %R 10.1109/32.83904 %0 Journal Article %J IEEE Transactions on Software Engineering %D 1982 %T The Logical Access Path Schema of a Database %A Roussopoulos, Nick %K Aggregation hierarchy %K Calculus %K Computer science %K Data structures %K Databases %K Design optimization %K external logical subschema %K generalization hierarchy %K Information retrieval %K Joining processes %K logical access path %K propositional calculus %K views %X A new schema which models the usage of the logical access paths of the database is proposed. The schema models all database activities (i.e., retrievals and updates), and integrates their logical access paths by recognizing common subpaths and increasing the "weight" of the shared subpaths. The logical access path schema provides a comprehensive picture of the logical access paths, and the cumulative usage of the shared subpaths and/or intermediate results. The schema serves a dual purpose. Firstly, it is used as a model of the access requirements during the database design, and secondly, as the basis for optimization during the operation of the database. %B IEEE Transactions on Software Engineering %V SE-8 %P 563 - 573 %8 1982/11// %@ 0098-5589 %G eng %N 6 %R 10.1109/TSE.1982.235886 %0 Conference Paper %B Fifth International Conference on Very Large Data Bases, 1979 %D 1979 %T Database Program Conversion: A Framework For Research %A Taylor,R. W %A Fry,J. P %A Shneiderman, Ben %A Smith,D. C.P %A Su,S. Y.W %K Application software %K Costs %K Data conversion %K Data structures %K Databases %K Delay %K Prototypes %K Technology planning %K US Government %K Writing %B Fifth International Conference on Very Large Data Bases, 1979 %I IEEE %P 299 - 312 %8 1979/10/03/5 %G eng %R 10.1109/VLDB.1979.718145