%0 Conference Paper %B 2011 Proceedings IEEE INFOCOM %D 2011 %T Decentralized, accurate, and low-cost network bandwidth prediction %A Sukhyun Song %A Keleher,P. %A Bhattacharjee, Bobby %A Sussman, Alan %K accuracy %K approximate tree metric space %K Bandwidth %K bandwidth allocation %K bandwidth measurement %K decentralized low cost system %K distributed tree %K end-to-end prediction %K Extraterrestrial measurements %K Internet %K low-cost network bandwidth prediction %K Measurement uncertainty %K pairwise bandwidth %K Peer to peer computing %K Prediction algorithms %K trees (mathematics) %X The distributed nature of modern computing makes end-to-end prediction of network bandwidth increasingly important. Our work is inspired by prior work that treats the Internet and bandwidth as an approximate tree metric space. This paper presents a decentralized, accurate, and low cost system that predicts pairwise bandwidth between hosts. We describe an algorithm to construct a distributed tree that embeds bandwidth measurements. The correctness of the algorithm is provable when driven by precise measurements. We then describe three novel heuristics that achieve high accuracy for predicting bandwidth even with imprecise input data. Simulation experiments with a real-world dataset confirm that our approach shows high accuracy with low cost. %B 2011 Proceedings IEEE INFOCOM %I IEEE %P 6 - 10 %8 2011/04/10/15 %@ 978-1-4244-9919-9 %G eng %R 10.1109/INFCOM.2011.5935251 %0 Conference Paper %B 2011 IEEE Conference on Visual Analytics Science and Technology (VAST) %D 2011 %T TreeVersity: Comparing tree structures by topology and node's attributes differences %A Gomez,J.A.G. %A Buck-Coleman,A. %A Plaisant, Catherine %A Shneiderman, Ben %K Computer science %K data classification %K Data visualization %K Educational institutions %K hierarchy %K Image color analysis %K LifeFlow %K node attributes differences %K Pattern classification %K structural changes %K Topology %K topology attributes differences %K traffic agencies %K tree structures comparison %K trees (mathematics) %K TreeVersity %K Vegetation %K Visualization %X It is common to classify data in hierarchies, they provide a comprehensible way of understanding big amounts of data. From budgets to organizational charts or even the stock market, trees are everywhere and people find them easy to use. However when analysts need to compare two versions of the same tree structure, or two related taxonomies, the task is not so easy. Much work has been done on this topic, but almost all of it has been restricted to either compare the trees by topology, or by the node attribute values. With this project we are proposing TreeVersity, a framework for comparing tree structures, both by structural changes and by differences in the node attributes. This paper is based on our previous work on comparing traffic agencies using LifeFlow [1, 2] and on a first prototype of TreeVersity. %B 2011 IEEE Conference on Visual Analytics Science and Technology (VAST) %I IEEE %P 275 - 276 %8 2011/10/23/28 %@ 978-1-4673-0015-5 %G eng %R 10.1109/VAST.2011.6102471 %0 Conference Paper %B 2010 Proceedings IEEE INFOCOM %D 2010 %T On Computing Compression Trees for Data Collection in Wireless Sensor Networks %A Li,Jian %A Deshpande, Amol %A Khuller, Samir %K Approximation algorithms %K Base stations %K Communications Society %K Computer networks %K Computer science %K computing compression trees %K Costs %K data collection %K Data communication %K data compression %K designing algorithms %K Educational institutions %K Entropy %K graph concept %K Monitoring %K Protocols %K trees (mathematics) %K weakly connected dominating sets %K Wireless sensor networks %X We address the problem of efficiently gathering correlated data from a wireless sensor network, with the aim of designing algorithms with provable optimality guarantees, and understanding how close we can get to the known theoretical lower bounds. Our proposed approach is based on finding an optimal or a near-optimal compression tree for a given sensor network: a compression tree is a directed tree over the sensor network nodes such that the value of a node is compressed using the value of its parent. We focus on broadcast communication model in this paper, but our results are more generally applicable to a unicast communication model as well. We draw connections between the data collection problem and a previously studied graph concept called weakly connected dominating sets, and we use this to develop novel approximation algorithms for the problem. We present comparative results on several synthetic and real-world datasets showing that our algorithms construct near-optimal compression trees that yield a significant reduction in the data collection cost. %B 2010 Proceedings IEEE INFOCOM %I IEEE %P 1 - 9 %8 2010/03/14/19 %@ 978-1-4244-5836-3 %G eng %R 10.1109/INFCOM.2010.5462035 %0 Conference Paper %B IEEE 25th International Conference on Data Engineering, 2009. ICDE '09 %D 2009 %T Minimizing Communication Cost in Distributed Multi-query Processing %A Li,Jian %A Deshpande, Amol %A Khuller, Samir %K Approximation algorithms %K Communication networks %K Computer science %K Cost function %K Data engineering %K distributed communication network %K distributed databases %K distributed multi-query processing %K grid computing %K Large-scale systems %K NP-hard %K optimisation %K Polynomials %K Publish-subscribe %K publish-subscribe systems %K Query optimization %K Query processing %K sensor networks %K Steiner tree problem %K Tree graphs %K trees (mathematics) %X Increasing prevalence of large-scale distributed monitoring and computing environments such as sensor networks, scientific federations, Grids etc., has led to a renewed interest in the area of distributed query processing and optimization. In this paper we address a general, distributed multi-query processing problem motivated by the need to minimize the communication cost in these environments. Specifically we address the problem of optimally sharing data movement across the communication edges in a distributed communication network given a set of overlapping queries and query plans for them (specifying the operations to be executed). Most of the problem variations of our general problem can be shown to be NP-Hard by a reduction from the Steiner tree problem. However, we show that the problem can be solved optimally if the communication network is a tree, and present a novel algorithm for finding an optimal data movement plan. For general communication networks, we present efficient approximation algorithms for several variations of the problem. Finally, we present an experimental study over synthetic datasets showing both the need for exploiting the sharing of data movement and the effectiveness of our algorithms at finding such plans. %B IEEE 25th International Conference on Data Engineering, 2009. ICDE '09 %I IEEE %P 772 - 783 %8 2009/04/29/March %@ 978-1-4244-3422-0 %G eng %R 10.1109/ICDE.2009.85 %0 Journal Article %J Intelligent Systems, IEEE %D 2005 %T Applications of SHOP and SHOP2 %A Nau, Dana S. %A Au,T.-C. %A Ilghami,O. %A Kuter,U. %A Wu,D. %A Yaman,F. %A Munoz-Avila,H. %A Murdock,J. W. %K automated planning %K hierarchical task network planning %K ordered task decomposition %K planning (artificial intelligence) %K problem solving %K search-control strategy %K simple hierarchical ordered planner %K trees (mathematics) %K uncertainty handling %X We design the simple hierarchical ordered planner (SHOP) and its successor, SHOP2, with two goals in mind: to investigate research issues in automated planning and to provide some simple, practical planning tools. SHOP and SHOP2 are based on a planning formalism called hierarchical task network planning. SHOP and SHOP2 use a search-control strategy called ordered task decomposition, which breaks tasks into subtasks and generates the plan's actions in the same order that the plan executor executes them. So, throughout the planning process, the planner can tell what the state of the world at each step of the plan. %B Intelligent Systems, IEEE %V 20 %P 34 - 41 %8 2005/04//march %@ 1541-1672 %G eng %N 2 %R 10.1109/MIS.2005.20 %0 Conference Paper %B 42nd IEEE Symposium on Foundations of Computer Science, 2001. Proceedings %D 2001 %T Distributions on level-sets with applications to approximation algorithms %A Srinivasan, Aravind %K Approximation algorithms %K approximation guarantee %K approximation theory %K bounded-degree graphs %K computational geometry %K Concrete %K fixed-weight vectors %K group Steiner tree problem %K level-sets %K linear-time algorithm %K low-congestion multi-path routing %K marginal distributions %K maximum coverage versions %K negative correlation properties %K partial vertex cover problems %K trees (mathematics) %X We consider a family of distributions on fixed-weight vectors in {0, 1}t; these distributions enjoy certain negative correlation properties and also satisfy pre-specified conditions on their marginal distributions. We show the existence of such families, and present a linear-time algorithm to sample from them. This yields improved approximation algorithms for the following problems: (a) low-congestion multi-path routing; (b) maximum coverage versions of set cover; (c) partial vertex cover problems for bounded-degree graphs; and (d) the Group Steiner Tree problem. For (a) and (b), the improvement is in the approximation ratio; for (c), we show how to speedup existing approximation algorithms while preserving the best-known approximation ratio; we also improve the approximation ratio for certain families of instances of unbounded degree. For (d), we derive an approximation algorithm whose approximation guarantee is at least as good as what is known; our algorithm is shown to have a better approximation guarantee for the worst known input families for existing algorithms. %B 42nd IEEE Symposium on Foundations of Computer Science, 2001. Proceedings %I IEEE %P 588 - 597 %8 2001/10/08/11 %@ 0-7695-1116-3 %G eng %R 10.1109/SFCS.2001.959935 %0 Conference Paper %B Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on %D 2000 %T Nearly optimal expected-case planar point location %A Arya,S. %A Malamatos,T. %A Mount, Dave %K computational geometry %K convex cells %K data structure %K expected search time %K nearly optimal expected-case planar point location %K optimal binary search tree %K planar point location %K planar polygonal subdivision %K polygonal cells %K polygonal subdivision %K probability %K search problems %K search structure %K subdivision %K trees (mathematics) %X We consider the planar point location problem from the perspective of expected search time. We are given a planar polygonal subdivision S and for each polygon of the subdivision the probability that a query point lies within this polygon. The goal is to compute a search structure to determine which cell of the subdivision contains a given query point, so as to minimize the expected search time. This is a generalization of the classical problem of computing an optimal binary search tree for one-dimensional keys. In the one-dimensional case it has long been known that the entropy H of the distribution is the dominant term in the lower bound on the expected-case search time, and further there exist search trees achieving expected search times of at most H+2. Prior to this work, there has been no known structure for planar point location with an expected search time better than 2H, and this result required strong assumptions on the nature of the query point distribution. Here we present a data structure whose expected search time is nearly equal to the entropy lower bound, namely H+o(H). The result holds for any polygonal subdivision in which the number of sides of each of the polygonal cells is bounded, and there are no assumptions on the query distribution within each cell. We extend these results to subdivisions with convex cells, assuming a uniform query distribution within each cell %B Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on %P 208 - 218 %8 2000/// %G eng %R 10.1109/SFCS.2000.892108 %0 Conference Paper %B , IEEE Conference on Visualization, 1991. Visualization '91, Proceedings %D 1991 %T Tree-maps: a space-filling approach to the visualization of hierarchical information structures %A Johnson,B. %A Shneiderman, Ben %K Computer displays %K Computer Graphics %K Computer science %K Data analysis %K display space %K Educational institutions %K Feedback %K hierarchical information structures %K HUMANS %K Laboratories %K Libraries %K Marine vehicles %K rectangular region %K semantic information %K space-filling approach %K tree-map visualization technique %K trees (mathematics) %K Two dimensional displays %K Visualization %X A method for visualizing hierarchically structured information is described. The tree-map visualization technique makes 100% use of the available display space, mapping the full hierarchy onto a rectangular region in a space-filling manner. This efficient use of space allows very large hierarchies to be displayed in their entirety and facilitates the presentation of semantic information. Tree-maps can depict both the structure and content of the hierarchy. However, the approach is best suited to hierarchies in which the content of the leaf nodes and the structure of the hierarchy are of primary importance, and the content information associated with internal nodes is largely derived from their children %B , IEEE Conference on Visualization, 1991. Visualization '91, Proceedings %I IEEE %P 284 - 291 %8 1991/10/22/25 %@ 0-8186-2245-8 %G eng %R 10.1109/VISUAL.1991.175815 %0 Journal Article %J IEEE Transactions on Software Engineering %D 1991 %T Trie hashing with controlled load %A Litwin,W. A %A Roussopoulos, Nick %A Levy,G. %A Hong,W. %K B-tree file %K Computer science %K controlled load %K Databases %K disk access %K dynamic files %K file organisation %K high load factor %K information retrieval systems %K key search %K load factor %K Military computing %K ordered insertions %K Predictive models %K primary key access method %K Protocols %K random insertions %K TH file %K THCL %K Tree data structures %K trees (mathematics) %K trie hashing %X Trie hashing (TH), a primary key access method for storing and accessing records of dynamic files, is discussed. The key address is computed through a trie. A key search usually requires only one disk access when the trie is in core and two disk accesses for very large files when the trie must be on disk. A refinement to trie hashing, trie hashing with controlled load (THCL), is presented. It is designed to control the load factor of a TH file as tightly as that of a B-tree file, allows high load factor of up to 100% for ordered insertions, and increases the load factor for random insertions from 70% to over 85%. It is shown that these properties make trie hashing preferable to a B-tree %B IEEE Transactions on Software Engineering %V 17 %P 678 - 691 %8 1991/07// %@ 0098-5589 %G eng %N 7 %R 10.1109/32.83904 %0 Conference Paper %B , Conference on Software Maintenance, 1989., Proceedings %D 1989 %T Software metric classification trees help guide the maintenance of large-scale systems %A Selby,R. W %A Porter, Adam %K automated method %K automatic programming %K classification %K Classification tree analysis %K classification trees %K Computer errors %K empirically-based models %K error-prone software objects %K Fault diagnosis %K feasibility study %K high development effort %K Large-scale systems %K multivalued functions %K NASA %K NASA projects %K recursive algorithm %K Software algorithms %K software engineering %K Software maintenance %K Software measurement %K software metrics %K software modules %K Software systems %K trees (mathematics) %X The 80:20 rule states that approximately 20% of a software system is responsible for 80% of its errors. The authors propose an automated method for generating empirically-based models of error-prone software objects. These models are intended to help localize the troublesome 20%. The method uses a recursive algorithm to automatically generate classification trees whose nodes are multivalued functions based on software metrics. The purpose of the classification trees is to identify components that are likely to be error prone or costly, so that developers can focus their resources accordingly. A feasibility study was conducted using 16 NASA projects. On average, the classification trees correctly identified 79.3% of the software modules that had high development effort or faults %B , Conference on Software Maintenance, 1989., Proceedings %I IEEE %P 116 - 123 %8 1989/10/16/19 %@ 0-8186-1965-1 %G eng %R 10.1109/ICSM.1989.65202 %0 Journal Article %J IEEE Transactions on Software Engineering %D 1988 %T Learning from examples: generation and evaluation of decision trees for software resource analysis %A Selby,R. W %A Porter, Adam %K Analysis of variance %K Artificial intelligence %K Classification tree analysis %K Data analysis %K decision theory %K Decision trees %K Fault diagnosis %K Information analysis %K machine learning %K metrics %K NASA %K production environment %K software engineering %K software modules %K software resource analysis %K Software systems %K Termination of employment %K trees (mathematics) %X A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for one problem domain, specifically, that of software resource data analysis. The purpose of the decision trees is to identify classes of objects (software modules) that had high development effort, i.e. in the uppermost quartile relative to past data. Sixteen software systems ranging from 3000 to 112000 source lines have been selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4700 objects, capture a multitude of information about the objects: development effort, faults, changes, design style, and implementation style. A total of 9600 decision trees are automatically generated and evaluated. The analysis focuses on the characterization and evaluation of decision tree accuracy, complexity, and composition. The decision trees correctly identified 79.3% of the software modules that had high development effort or faults, on the average across all 9600 trees. The decision trees generated from the best parameter combinations correctly identified 88.4% of the modules on the average. Visualization of the results is emphasized, and sample decision trees are included %B IEEE Transactions on Software Engineering %V 14 %P 1743 - 1757 %8 1988/12// %@ 0098-5589 %G eng %N 12 %R 10.1109/32.9061