TY  - CONF
T1  - Evaluating Files to Audit for Detecting Intrusions in FileSystem Data
Y1  - 2009
A1  - Molina,J.
A1  - Michel Cukier
KW  - authorisation
KW  - Bayes methods
KW  - Bayesian metric
KW  - data auditing
KW  - empirical SSH compromise data
KW  - Entropy
KW  - entropy-based metric
KW  - file evaluation
KW  - file organisation
KW  - filesystem attack activity
KW  - filesystem data monitoring
KW  - honeypot
KW  - information theory
KW  - intrusion detection system
KW  - invasive software
KW  - malware download
KW  - meta data
KW  - optimisation
KW  - optimization problem
KW  - password modification
KW  - probability
KW  - reconnaissance action
KW  - unauthorized user
AB  - Monitoring filesystem data is a common method used to detect intrusions. Once a computer is compromised, an attacker may alter files, add new files or delete existing files. The changes that attackers make may target any part of the filesystem, including metadata along with files (e.g., permissions, ownerships and inodes). The accuracy of detecting an intrusion depends on the data audited: if an intrusion does not manifest in the data, the intrusion will not be detected. Moreover, not all files, which contain filesystem activity, are suitable to detect intrusions, as some may fail to provide useful information. In this paper, we describe an empirical study that focused on filesystem attack activity after a SSH compromise. Three types of attacker action are considered: reconnaissance, password modification, and malware download. For each type of action, we evaluated the files to audit using metrics derived from the field of information theory and estimated with the empirical SSH compromise data.
M3  - 10.1109/NCA.2009.38
ER  -

TY  - CONF
T1  - An empirical study of filesystem activity following a SSH compromise
Y1  - 2007
A1  - Molina,J.
A1  - Gordon,J.
A1  - Chorin,X.
A1  - Michel Cukier
KW  - attack activity
KW  - filesystem activity
KW  - filesystem data monitoring
KW  - intrusion detection systems evaluation
KW  - meta data
KW  - metadata
KW  - security of data
KW  - SSH compromised attacks
AB  - Monitoring filesystem data is a common method used to detect attacks. Once a computer is compromised, attackers will likely alter files, add new files or delete existing files. The changes that attackers make may target any part of the filesystem, including metadata along with files (e.g., permissions, ownerships and inodes). In this paper, we describe an empirical study that focused on SSH compromised attacks. First statistical data on the number of files targeted and the associated activity (e.g., read, write, delete, ownership and rights) is reported. Then, we refine the analysis to identify and understand patterns in the attack activity.
M3  - 10.1109/ICICS.2007.4449675
ER  -

TY  - CONF
T1  - Reranking for Sentence Boundary Detection in Conversational Speech
T2  - 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings
Y1  - 2006
A1  - Roark,B.
A1  - Liu,Yang
A1  - Harper,M.
A1  - Stewart,R.
A1  - Lease,M.
A1  - Snover,M.
A1  - Shafran,I.
A1  - Dorr, Bonnie J
A1  - Hale,J.
A1  - Krasnyanskaya,A.
A1  - Yung,L.
KW  - Automatic speech recognition
KW  - conversational speech
KW  - data mining
KW  - Ear
KW  - EARS metadata extraction tasks
KW  - Feature extraction
KW  - hidden Markov models
KW  - meta data
KW  - Model driven engineering
KW  - NIST
KW  - NIST RT-04F community evaluation
KW  - oracle accuracy
KW  - performance evaluation
KW  - reranking
KW  - sentence-like unit boundary detection
KW  - Speech processing
KW  - Speech recognition
KW  - Telephony
AB  - We present a reranking approach to sentence-like unit (SU) boundary detection, one of the EARS metadata extraction tasks. Techniques for generating relatively small n-best lists with high oracle accuracy are presented. For each candidate, features are derived from a range of information sources, including the output of a number of parsers. Our approach yields significant improvements over the best performing system from the NIST RT-04F community evaluation.
JA  - 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings
PB  - IEEE
VL  - 1
SN  - 1-4244-0469-X
M3  - 10.1109/ICASSP.2006.1660078
ER  -

TY  - CONF
T1  - Improving access to multi-dimensional self-describing scientific datasets
T2  - 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings. CCGrid 2003
Y1  - 2003
A1  - Nam,B.
A1  - Sussman, Alan
KW  - Application software
KW  - application-specific semantic metadata
KW  - Bandwidth
KW  - Computer science
KW  - database indexing
KW  - disk I/O bandwidth
KW  - distributed databases
KW  - Educational institutions
KW  - Indexing
KW  - indexing structures
KW  - Libraries
KW  - meta data
KW  - Middleware
KW  - multidimensional arrays
KW  - multidimensional datasets
KW  - Multidimensional systems
KW  - NASA
KW  - NASA remote sensing data
KW  - Navigation
KW  - query formulation
KW  - self-describing scientific data file formats
KW  - structural metadata
KW  - very large databases
AB  - Applications that query into very large multidimensional datasets are becoming more common. Many self-describing scientific data file formats have also emerged, which have structural metadata to help navigate the multi-dimensional arrays that are stored in the files. The files may also contain application-specific semantic metadata. In this paper, we discuss efficient methods for performing searches for subsets of multi-dimensional data objects, using semantic information to build multidimensional indexes, and group data items into properly sized chunks to maximize disk I/O bandwidth. This work is the first step in the design and implementation of a generic indexing library that will work with various high-dimension scientific data file formats containing semantic information about the stored data. To validate the approach, we have implemented indexing structures for NASA remote sensing data stored in the HDF format with a specific schema (HDF-EOS), and show the performance improvements that are gained from indexing the datasets, compared to using the existing HDF library for accessing the data.
JA  - 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings. CCGrid 2003
PB  - IEEE
SN  - 0-7695-1919-9
M3  - 10.1109/CCGRID.2003.1199366
ER  -

TY  - CONF
T1  - Integrating distributed scientific data sources with MOCHA and XRoaster
T2  - Thirteenth International Conference on Scientific and Statistical Database Management, 2001. SSDBM 2001. Proceedings
Y1  - 2001
A1  - Rodriguez-Martinez,M.
A1  - Roussopoulos, Nick
A1  - McGann,J. M
A1  - Kelley,S.
A1  - Mokwa,J.
A1  - White,B.
A1  - Jala,J.
KW  - client-server systems
KW  - data sets
KW  - data sites
KW  - Databases
KW  - Distributed computing
KW  - distributed databases
KW  - distributed scientific data source integration
KW  - Educational institutions
KW  - graphical tool
KW  - hypermedia markup languages
KW  - IP networks
KW  - java
KW  - Large-scale systems
KW  - Maintenance engineering
KW  - meta data
KW  - metadata
KW  - Middleware
KW  - middleware system
KW  - MOCHA
KW  - Query processing
KW  - remote sites
KW  - scientific information systems
KW  - user-defined types
KW  - visual programming
KW  - XML
KW  - XML metadata elements
KW  - XML-based framework
KW  - XRoaster
AB  - MOCHA is a novel middleware system for integrating distributed data sources that we have developed at the University of Maryland. MOCHA is based on the idea that the code that implements user-defined types and functions should be automatically deployed to remote sites by the middleware system itself. To this end, we have developed an XML-based framework to specify metadata about data sites, data sets, and user-defined types and functions. XRoaster is a graphical tool that we have developed to help the user create all the XML metadata elements to be used in MOCHA.
JA  - Thirteenth International Conference on Scientific and Statistical Database Management, 2001. SSDBM 2001. Proceedings
PB  - IEEE
SN  - 0-7695-1218-6
M3  - 10.1109/SSDM.2001.938560
ER  -

TY  - CONF
T1  - Optimized seamless integration of biomolecular data
T2  - Proceedings of the IEEE 2nd International Symposium on Bioinformatics and Bioengineering Conference, 2001
Y1  - 2001
A1  - Eckman,B. A
A1  - Lacroix,Z.
A1  - Raschid, Louiqa
KW  - analysis
KW  - Bioinformatics
KW  - biology computing
KW  - cost based knowledge
KW  - Costs
KW  - Data analysis
KW  - data mining
KW  - data visualisation
KW  - Data visualization
KW  - Data warehouses
KW  - decision support
KW  - digital library
KW  - Educational institutions
KW  - information resources
KW  - Internet
KW  - low cost query evaluation plans
KW  - Mediation
KW  - meta data
KW  - metadata
KW  - molecular biophysics
KW  - multiple local heterogeneous data sources
KW  - multiple remote heterogeneous data sources
KW  - optimized seamless biomolecular data integration
KW  - scientific discovery
KW  - scientific information systems
KW  - semantic knowledge
KW  - software libraries
KW  - visual databases
KW  - Visualization
AB  - Today, scientific data is inevitably digitized, stored in a variety of heterogeneous formats, and is accessible over the Internet. Scientists need to access an integrated view of multiple remote or local heterogeneous data sources. They then integrate the results of complex queries and apply further analysis and visualization to support the task of scientific discovery. Building a digital library for scientific discovery requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, as well as data that is locally materialized in warehouses or is generated by software. We consider several tasks to provide optimized and seamless integration of biomolecular data. Challenges to be addressed include capturing and representing source capabilities; developing a methodology to acquire and represent metadata about source contents and access costs; and decision support to select sources and capabilities using cost based and semantic knowledge, and generating low cost query evaluation plans.
JA  - Proceedings of the IEEE 2nd International Symposium on Bioinformatics and Bioengineering Conference, 2001
PB  - IEEE
SN  - 0-7695-1423-5
M3  - 10.1109/BIBE.2001.974408
ER  -