A feature generation algorithm for sequences with application to splice-site prediction

Title: A feature generation algorithm for sequences with application to splice-site prediction
Publication Type: Journal Articles
Year of Publication: 2006
Authors: Islamaj R, Getoor L, Wilbur W
Journal: Knowledge Discovery in Databases: PKDD 2006
Pagination: 553 - 560
Date Published: 2006
Abstract

In this paper we present a new approach to feature selection for sequence data. We identify general feature categories and give construction algorithms for each of them. We show how they can be integrated in a system that tightly couples feature construction and feature selection. This integrated process, which we refer to as feature generation, allows us to systematically search a large space of potential features. We demonstrate the effectiveness of our approach for an important component of the gene finding problem, splice-site prediction. We show that predictive models built using our feature generation algorithm achieve a significant improvement in accuracy over existing, state-of-the-art approaches.
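
As a rough illustration of what coupling feature construction with feature selection can look like, the sketch below builds k-mer presence features one category (one value of k) at a time and keeps only the top-scoring ones before moving on. This is not the authors' algorithm: the k-mer categories, the information-gain score, and the names `kmer_features`, `information_gain`, `generate_features`, `max_k`, and `keep` are all assumptions chosen for illustration, and the toy sequences are made up.

```python
from math import log2


def kmer_features(seq, k):
    """Construction step (assumed category): the set of k-mers present in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def entropy(pos, neg):
    """Binary entropy, in bits, of a set with pos positive and neg negative labels."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(feature, feature_sets, labels):
    """Selection score (assumed): entropy reduction from knowing feature presence."""
    n = len(labels)
    present = [y for f, y in zip(feature_sets, labels) if feature in f]
    absent = [y for f, y in zip(feature_sets, labels) if feature not in f]
    conditional = sum(
        len(part) / n * entropy(sum(part), len(part) - sum(part))
        for part in (present, absent)
    )
    return entropy(sum(labels), n - sum(labels)) - conditional


def generate_features(sequences, labels, max_k=3, keep=5):
    """Alternate construction and selection, one k-mer length at a time."""
    selected = []
    for k in range(1, max_k + 1):
        feature_sets = [kmer_features(s, k) for s in sequences]
        candidates = set().union(*feature_sets)
        # Rank by score (highest first); break ties alphabetically for determinism.
        ranked = sorted(
            candidates,
            key=lambda f: (-information_gain(f, feature_sets, labels), f),
        )
        selected.extend(ranked[:keep])
    return selected


# Toy data (hypothetical): label 1 marks sequences that contain the "GT" dinucleotide.
sequences = ["AAGTAA", "CCGTCC", "AACCAA", "CCAACC"]
labels = [1, 1, 0, 0]
features = generate_features(sequences, labels, max_k=2, keep=3)
```

Selecting within each category before constructing the next keeps the candidate pool small, which is one simple way to search a large feature space without enumerating it all at once.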

DOI: 10.1007/11871637_55