TY - CONF T1 - Decentralized, accurate, and low-cost network bandwidth prediction T2 - 2011 Proceedings IEEE INFOCOM Y1 - 2011 A1 - Sukhyun Song A1 - Keleher,P. A1 - Bhattacharjee, Bobby A1 - Sussman, Alan KW - accuracy KW - approximate tree metric space KW - Bandwidth KW - bandwidth allocation KW - bandwidth measurement KW - decentralized low cost system KW - distributed tree KW - end-to-end prediction KW - Extraterrestrial measurements KW - Internet KW - low-cost network bandwidth prediction KW - Measurement uncertainty KW - pairwise bandwidth KW - Peer to peer computing KW - Prediction algorithms KW - trees (mathematics) AB - The distributed nature of modern computing makes end-to-end prediction of network bandwidth increasingly important. Our work is inspired by prior work that treats the Internet and bandwidth as an approximate tree metric space. This paper presents a decentralized, accurate, and low cost system that predicts pairwise bandwidth between hosts. We describe an algorithm to construct a distributed tree that embeds bandwidth measurements. The correctness of the algorithm is provable when driven by precise measurements. We then describe three novel heuristics that achieve high accuracy for predicting bandwidth even with imprecise input data. Simulation experiments with a real-world dataset confirm that our approach shows high accuracy with low cost. JA - 2011 Proceedings IEEE INFOCOM PB - IEEE SN - 978-1-4244-9919-9 M3 - 10.1109/INFCOM.2011.5935251 ER - TY - CONF T1 - TreeVersity: Comparing tree structures by topology and node's attributes differences T2 - 2011 IEEE Conference on Visual Analytics Science and Technology (VAST) Y1 - 2011 A1 - Gomez,J.A.G. A1 - Buck-Coleman,A. A1 - Plaisant, Catherine A1 - Shneiderman, Ben KW - Computer science KW - data classification KW - Data visualization KW - Educational institutions KW - hierarchy KW - Image color analysis KW - LifeFlow KW - node attributes differences KW - Pattern classification KW - structural changes KW - Topology KW - topology attributes differences KW - traffic agencies KW - tree structures comparison KW - trees (mathematics) KW - TreeVersity KW - Vegetation KW - Visualization AB - It is common to classify data in hierarchies, they provide a comprehensible way of understanding big amounts of data. From budgets to organizational charts or even the stock market, trees are everywhere and people find them easy to use. However when analysts need to compare two versions of the same tree structure, or two related taxonomies, the task is not so easy. Much work has been done on this topic, but almost all of it has been restricted to either compare the trees by topology, or by the node attribute values. With this project we are proposing TreeVersity, a framework for comparing tree structures, both by structural changes and by differences in the node attributes. This paper is based on our previous work on comparing traffic agencies using LifeFlow [1, 2] and on a first prototype of TreeVersity. JA - 2011 IEEE Conference on Visual Analytics Science and Technology (VAST) PB - IEEE SN - 978-1-4673-0015-5 M3 - 10.1109/VAST.2011.6102471 ER - TY - CONF T1 - On Computing Compression Trees for Data Collection in Wireless Sensor Networks T2 - 2010 Proceedings IEEE INFOCOM Y1 - 2010 A1 - Li,Jian A1 - Deshpande, Amol A1 - Khuller, Samir KW - Approximation algorithms KW - Base stations KW - Communications Society KW - Computer networks KW - Computer science KW - computing compression trees KW - Costs KW - data collection KW - Data communication KW - data compression KW - designing algorithms KW - Educational institutions KW - Entropy KW - graph concept KW - Monitoring KW - Protocols KW - trees (mathematics) KW - weakly connected dominating sets KW - Wireless sensor networks AB - We address the problem of efficiently gathering correlated data from a wireless sensor network, with the aim of designing algorithms with provable optimality guarantees, and understanding how close we can get to the known theoretical lower bounds. Our proposed approach is based on finding an optimal or a near-optimal compression tree for a given sensor network: a compression tree is a directed tree over the sensor network nodes such that the value of a node is compressed using the value of its parent. We focus on broadcast communication model in this paper, but our results are more generally applicable to a unicast communication model as well. We draw connections between the data collection problem and a previously studied graph concept called weakly connected dominating sets, and we use this to develop novel approximation algorithms for the problem. We present comparative results on several synthetic and real-world datasets showing that our algorithms construct near-optimal compression trees that yield a significant reduction in the data collection cost. JA - 2010 Proceedings IEEE INFOCOM PB - IEEE SN - 978-1-4244-5836-3 M3 - 10.1109/INFCOM.2010.5462035 ER - TY - CONF T1 - Minimizing Communication Cost in Distributed Multi-query Processing T2 - IEEE 25th International Conference on Data Engineering, 2009. ICDE '09 Y1 - 2009 A1 - Li,Jian A1 - Deshpande, Amol A1 - Khuller, Samir KW - Approximation algorithms KW - Communication networks KW - Computer science KW - Cost function KW - Data engineering KW - distributed communication network KW - distributed databases KW - distributed multi-query processing KW - grid computing KW - Large-scale systems KW - NP-hard KW - optimisation KW - Polynomials KW - Publish-subscribe KW - publish-subscribe systems KW - Query optimization KW - Query processing KW - sensor networks KW - Steiner tree problem KW - Tree graphs KW - trees (mathematics) AB - Increasing prevalence of large-scale distributed monitoring and computing environments such as sensor networks, scientific federations, Grids etc., has led to a renewed interest in the area of distributed query processing and optimization. In this paper we address a general, distributed multi-query processing problem motivated by the need to minimize the communication cost in these environments. Specifically we address the problem of optimally sharing data movement across the communication edges in a distributed communication network given a set of overlapping queries and query plans for them (specifying the operations to be executed). Most of the problem variations of our general problem can be shown to be NP-Hard by a reduction from the Steiner tree problem. However, we show that the problem can be solved optimally if the communication network is a tree, and present a novel algorithm for finding an optimal data movement plan. For general communication networks, we present efficient approximation algorithms for several variations of the problem. Finally, we present an experimental study over synthetic datasets showing both the need for exploiting the sharing of data movement and the effectiveness of our algorithms at finding such plans. JA - IEEE 25th International Conference on Data Engineering, 2009. ICDE '09 PB - IEEE SN - 978-1-4244-3422-0 M3 - 10.1109/ICDE.2009.85 ER - TY - JOUR T1 - Applications of SHOP and SHOP2 JF - Intelligent Systems, IEEE Y1 - 2005 A1 - Nau, Dana S. A1 - Au,T.-C. A1 - Ilghami,O. A1 - Kuter,U. A1 - Wu,D. A1 - Yaman,F. A1 - Munoz-Avila,H. A1 - Murdock,J. W. KW - automated planning KW - hierarchical task network planning KW - ordered task decomposition KW - planning (artificial intelligence) KW - problem solving KW - search-control strategy KW - simple hierarchical ordered planner KW - trees (mathematics) KW - uncertainty handling AB - We design the simple hierarchical ordered planner (SHOP) and its successor, SHOP2, with two goals in mind: to investigate research issues in automated planning and to provide some simple, practical planning tools. SHOP and SHOP2 are based on a planning formalism called hierarchical task network planning. SHOP and SHOP2 use a search-control strategy called ordered task decomposition, which breaks tasks into subtasks and generates the plan's actions in the same order that the plan executor executes them. So, throughout the planning process, the planner can tell what the state of the world at each step of the plan. VL - 20 SN - 1541-1672 CP - 2 M3 - 10.1109/MIS.2005.20 ER - TY - CONF T1 - Distributions on level-sets with applications to approximation algorithms T2 - 42nd IEEE Symposium on Foundations of Computer Science, 2001. Proceedings Y1 - 2001 A1 - Srinivasan, Aravind KW - Approximation algorithms KW - approximation guarantee KW - approximation theory KW - bounded-degree graphs KW - computational geometry KW - Concrete KW - fixed-weight vectors KW - group Steiner tree problem KW - level-sets KW - linear-time algorithm KW - low-congestion multi-path routing KW - marginal distributions KW - maximum coverage versions KW - negative correlation properties KW - partial vertex cover problems KW - trees (mathematics) AB - We consider a family of distributions on fixed-weight vectors in {0, 1}t; these distributions enjoy certain negative correlation properties and also satisfy pre-specified conditions on their marginal distributions. We show the existence of such families, and present a linear-time algorithm to sample from them. This yields improved approximation algorithms for the following problems: (a) low-congestion multi-path routing; (b) maximum coverage versions of set cover; (c) partial vertex cover problems for bounded-degree graphs; and (d) the Group Steiner Tree problem. For (a) and (b), the improvement is in the approximation ratio; for (c), we show how to speedup existing approximation algorithms while preserving the best-known approximation ratio; we also improve the approximation ratio for certain families of instances of unbounded degree. For (d), we derive an approximation algorithm whose approximation guarantee is at least as good as what is known; our algorithm is shown to have a better approximation guarantee for the worst known input families for existing algorithms. JA - 42nd IEEE Symposium on Foundations of Computer Science, 2001. Proceedings PB - IEEE SN - 0-7695-1116-3 M3 - 10.1109/SFCS.2001.959935 ER - TY - CONF T1 - Nearly optimal expected-case planar point location T2 - Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on Y1 - 2000 A1 - Arya,S. A1 - Malamatos,T. A1 - Mount, Dave KW - computational geometry KW - convex cells KW - data structure KW - expected search time KW - nearly optimal expected-case planar point location KW - optimal binary search tree KW - planar point location KW - planar polygonal subdivision KW - polygonal cells KW - polygonal subdivision KW - probability KW - search problems KW - search structure KW - subdivision KW - trees (mathematics) AB - We consider the planar point location problem from the perspective of expected search time. We are given a planar polygonal subdivision S and for each polygon of the subdivision the probability that a query point lies within this polygon. The goal is to compute a search structure to determine which cell of the subdivision contains a given query point, so as to minimize the expected search time. This is a generalization of the classical problem of computing an optimal binary search tree for one-dimensional keys. In the one-dimensional case it has long been known that the entropy H of the distribution is the dominant term in the lower bound on the expected-case search time, and further there exist search trees achieving expected search times of at most H+2. Prior to this work, there has been no known structure for planar point location with an expected search time better than 2H, and this result required strong assumptions on the nature of the query point distribution. Here we present a data structure whose expected search time is nearly equal to the entropy lower bound, namely H+o(H). The result holds for any polygonal subdivision in which the number of sides of each of the polygonal cells is bounded, and there are no assumptions on the query distribution within each cell. We extend these results to subdivisions with convex cells, assuming a uniform query distribution within each cell JA - Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on M3 - 10.1109/SFCS.2000.892108 ER - TY - CONF T1 - Tree-maps: a space-filling approach to the visualization of hierarchical information structures T2 - , IEEE Conference on Visualization, 1991. Visualization '91, Proceedings Y1 - 1991 A1 - Johnson,B. A1 - Shneiderman, Ben KW - Computer displays KW - Computer Graphics KW - Computer science KW - Data analysis KW - display space KW - Educational institutions KW - Feedback KW - hierarchical information structures KW - HUMANS KW - Laboratories KW - Libraries KW - Marine vehicles KW - rectangular region KW - semantic information KW - space-filling approach KW - tree-map visualization technique KW - trees (mathematics) KW - Two dimensional displays KW - Visualization AB - A method for visualizing hierarchically structured information is described. The tree-map visualization technique makes 100% use of the available display space, mapping the full hierarchy onto a rectangular region in a space-filling manner. This efficient use of space allows very large hierarchies to be displayed in their entirety and facilitates the presentation of semantic information. Tree-maps can depict both the structure and content of the hierarchy. However, the approach is best suited to hierarchies in which the content of the leaf nodes and the structure of the hierarchy are of primary importance, and the content information associated with internal nodes is largely derived from their children JA - , IEEE Conference on Visualization, 1991. Visualization '91, Proceedings PB - IEEE SN - 0-8186-2245-8 M3 - 10.1109/VISUAL.1991.175815 ER - TY - JOUR T1 - Trie hashing with controlled load JF - IEEE Transactions on Software Engineering Y1 - 1991 A1 - Litwin,W. A A1 - Roussopoulos, Nick A1 - Levy,G. A1 - Hong,W. KW - B-tree file KW - Computer science KW - controlled load KW - Databases KW - disk access KW - dynamic files KW - file organisation KW - high load factor KW - information retrieval systems KW - key search KW - load factor KW - Military computing KW - ordered insertions KW - Predictive models KW - primary key access method KW - Protocols KW - random insertions KW - TH file KW - THCL KW - Tree data structures KW - trees (mathematics) KW - trie hashing AB - Trie hashing (TH), a primary key access method for storing and accessing records of dynamic files, is discussed. The key address is computed through a trie. A key search usually requires only one disk access when the trie is in core and two disk accesses for very large files when the trie must be on disk. A refinement to trie hashing, trie hashing with controlled load (THCL), is presented. It is designed to control the load factor of a TH file as tightly as that of a B-tree file, allows high load factor of up to 100% for ordered insertions, and increases the load factor for random insertions from 70% to over 85%. It is shown that these properties make trie hashing preferable to a B-tree VL - 17 SN - 0098-5589 CP - 7 M3 - 10.1109/32.83904 ER - TY - CONF T1 - Software metric classification trees help guide the maintenance of large-scale systems T2 - , Conference on Software Maintenance, 1989., Proceedings Y1 - 1989 A1 - Selby,R. W A1 - Porter, Adam KW - automated method KW - automatic programming KW - classification KW - Classification tree analysis KW - classification trees KW - Computer errors KW - empirically-based models KW - error-prone software objects KW - Fault diagnosis KW - feasibility study KW - high development effort KW - Large-scale systems KW - multivalued functions KW - NASA KW - NASA projects KW - recursive algorithm KW - Software algorithms KW - software engineering KW - Software maintenance KW - Software measurement KW - software metrics KW - software modules KW - Software systems KW - trees (mathematics) AB - The 80:20 rule states that approximately 20% of a software system is responsible for 80% of its errors. The authors propose an automated method for generating empirically-based models of error-prone software objects. These models are intended to help localize the troublesome 20%. The method uses a recursive algorithm to automatically generate classification trees whose nodes are multivalued functions based on software metrics. The purpose of the classification trees is to identify components that are likely to be error prone or costly, so that developers can focus their resources accordingly. A feasibility study was conducted using 16 NASA projects. On average, the classification trees correctly identified 79.3% of the software modules that had high development effort or faults JA - , Conference on Software Maintenance, 1989., Proceedings PB - IEEE SN - 0-8186-1965-1 M3 - 10.1109/ICSM.1989.65202 ER - TY - JOUR T1 - Learning from examples: generation and evaluation of decision trees for software resource analysis JF - IEEE Transactions on Software Engineering Y1 - 1988 A1 - Selby,R. W A1 - Porter, Adam KW - Analysis of variance KW - Artificial intelligence KW - Classification tree analysis KW - Data analysis KW - decision theory KW - Decision trees KW - Fault diagnosis KW - Information analysis KW - machine learning KW - metrics KW - NASA KW - production environment KW - software engineering KW - software modules KW - software resource analysis KW - Software systems KW - Termination of employment KW - trees (mathematics) AB - A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for one problem domain, specifically, that of software resource data analysis. The purpose of the decision trees is to identify classes of objects (software modules) that had high development effort, i.e. in the uppermost quartile relative to past data. Sixteen software systems ranging from 3000 to 112000 source lines have been selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4700 objects, capture a multitude of information about the objects: development effort, faults, changes, design style, and implementation style. A total of 9600 decision trees are automatically generated and evaluated. The analysis focuses on the characterization and evaluation of decision tree accuracy, complexity, and composition. The decision trees correctly identified 79.3% of the software modules that had high development effort or faults, on the average across all 9600 trees. The decision trees generated from the best parameter combinations correctly identified 88.4% of the modules on the average. Visualization of the results is emphasized, and sample decision trees are included VL - 14 SN - 0098-5589 CP - 12 M3 - 10.1109/32.9061 ER -