%0 Conference Paper
%B 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP)
%D 2010
%T FPGA-based design and implementation of the 3GPP-LTE physical layer using parameterized synchronous dataflow techniques
%A Kee, Hojin
%A Bhattacharyya, Shuvra S.
%A Wong, I.
%A Yong Rao
%K 3G mobile communication
%K 3GPP-long term evolution
%K 3GPP-LTE physical layer
%K 4G communication systems
%K Computational modeling
%K data flow analysis
%K data flow graphs
%K Dataflow modeling
%K Digital signal processing
%K DSP applications
%K Field programmable gate arrays
%K FPGA architecture framework
%K FPGA implementation
%K FPGA-based design
%K Hardware
%K hardware synthesis
%K Instruments
%K LabVIEW FPGA
%K Logic Design
%K LTE
%K next generation cellular standard
%K parameterized synchronous data flow technique
%K Pervasive computing
%K Physical layer
%K Physics computing
%K Production
%K PSDF graph
%K reconfigurable hardware implementation
%K Runtime
%K software synthesis
%K Ubiquitous Computing
%K ubiquitous data flow model
%X Synchronous dataflow (SDF) is a ubiquitous dataflow model of computation that has been studied extensively for efficient simulation and software synthesis of DSP applications. In recent years, parameterized SDF (PSDF) has evolved as a useful framework for modeling SDF graphs in which arbitrary parameters can be changed dynamically. However, the potential to enable efficient hardware synthesis has been treated relatively sparsely in the literature for SDF, and even more so for the newer, more general PSDF model. This paper investigates efficient FPGA-based design and implementation of the physical layer for 3GPP-Long Term Evolution (LTE), a next generation cellular standard. To capture the SDF behavior of the functional core of LTE along with higher level dynamics in the standard, we use a novel PSDF-based FPGA architecture framework.
We implement our PSDF-based LTE design framework using National Instruments' LabVIEW FPGA, a recently introduced commercial platform for reconfigurable hardware implementation. We show that our framework can effectively model the dynamics of the LTE protocol, while also providing a synthesis framework for efficient FPGA implementation.
%B 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP)
%P 1510 - 1513
%8 2010
%G eng
%0 Journal Article
%J IEEE Transactions on Signal Processing
%D 2006
%T Contention-conscious transaction ordering in multiprocessor DSP systems
%A Khandelia, M.
%A Bambha, N. K.
%A Bhattacharyya, Shuvra S.
%K contention-conscious transaction ordering
%K Costs
%K data flow graphs
%K Dataflow
%K Delay
%K Digital signal processing
%K digital signal processing chips
%K Embedded system
%K graph-theoretic analysis
%K Instruments
%K Internet telephony
%K interprocessor communication
%K iterative dataflow graphs
%K iterative methods
%K Message passing
%K multiprocessor
%K multiprocessor DSP systems
%K NP-complete problem
%K Processor scheduling
%K scheduling
%K Signal processing
%K synchronization
%K Throughput
%X This paper explores the problem of efficiently ordering interprocessor communication (IPC) operations in statically scheduled multiprocessors for iterative dataflow graphs. In most digital signal processing (DSP) applications, the throughput of the system is significantly affected by communication costs. By explicitly modeling these costs within an effective graph-theoretic analysis framework, we show that ordered transaction schedules can significantly outperform self-timed schedules even when synchronization costs are low. However, we also show that when communication latencies are nonnegligible, finding an optimal transaction order given a static schedule is an NP-complete problem, and that this intractability holds under both iterative and noniterative execution.
We develop new heuristics for finding efficient transaction orders, and perform an extensive experimental comparison to gauge the performance of these heuristics.
%B IEEE Transactions on Signal Processing
%V 54
%P 556 - 569
%8 2006/02//
%@ 1053-587X
%G eng
%N 2
%R 10.1109/TSP.2005.861074
%0 Journal Article
%J IEEE Transactions on Parallel and Distributed Systems
%D 1998
%T Critical path profiling of message passing and shared-memory programs
%A Hollingsworth, Jeffrey K
%K Computer Society
%K Concurrent computing
%K critical path computation
%K critical path profile
%K critical path zeroing
%K distributed processing
%K distributed shared memory systems
%K Instruments
%K Message passing
%K Monitoring
%K online algorithm
%K online critical path profiling
%K Parallel algorithms
%K program bottlenecks
%K Runtime
%K runtime nontrace-based algorithm
%K runtime overhead
%K shared-memory programs
%K system monitoring
%K Time measurement
%K Yarn
%X We introduce a runtime, nontrace-based algorithm to compute the critical path profile of the execution of message passing and shared-memory parallel programs. Our algorithm permits starting or stopping the critical path computation during program execution and reporting intermediate values. We also present an online algorithm to compute a variant of critical path, called critical path zeroing, that measures the reduction in application execution time that improving a selected procedure would yield. Finally, we present a brief case study to quantify the runtime overhead of our algorithm and to show that online critical path profiling can be used to find program bottlenecks.
%B IEEE Transactions on Parallel and Distributed Systems
%V 9
%P 1029 - 1040
%8 1998/10//
%@ 1045-9219
%G eng
%N 10
%R 10.1109/71.730530
%0 Journal Article
%J IEEE Transactions on Software Engineering
%D 1998
%T Modeling and evaluating design alternatives for an on-line instrumentation system: a case study
%A Waheed, A.
%A Rover, D. T
%A Hollingsworth, Jeffrey K
%K alternative system configurations
%K Application software
%K batch-and-forward
%K collect-and-forward
%K Computer aided software engineering
%K design alternatives
%K design decisions
%K Feedback
%K IBM SP-2 platform
%K Instruments
%K massively parallel processing
%K model-based evaluation approach
%K Monitoring
%K multiprocessing programs
%K on-line instrumentation system
%K Paradyn parallel performance measurement tool
%K PARALLEL PROCESSING
%K Real time systems
%K scalability characteristics
%K software metrics
%K software tools
%K Space technology
%K symmetric multiprocessors
%K system architectures
%K system monitoring
%K System testing
%K task scheduling policies
%K tool developers
%K tree forwarding configuration
%K Workstations
%X This paper demonstrates the use of a model-based evaluation approach for instrumentation systems (ISs). The overall objective of this study is to provide early feedback to tool developers regarding IS overhead and performance; such feedback helps developers make appropriate design decisions about alternative system configurations and task scheduling policies. We consider three types of system architectures: network of workstations (NOW), symmetric multiprocessors (SMP), and massively parallel processing (MPP) systems. We develop a Resource OCCupancy (ROCC) model for an on-line IS for an existing tool and parameterize it for an IBM SP-2 platform. This model is simulated to answer several “what if” questions regarding two policies to schedule instrumentation data forwarding: collect-and-forward (CF) and batch-and-forward (BF). In addition, this study investigates two alternatives for forwarding the instrumentation data: direct and binary tree forwarding for an MPP system. Simulation results indicate that the BF policy can significantly reduce the overhead and that the tree forwarding configuration exhibits desirable scalability characteristics for MPP systems.
Initial measurement-based testing results indicate more than a 60 percent reduction in the direct IS overhead when the BF policy was added to the Paradyn parallel performance measurement tool.
%B IEEE Transactions on Software Engineering
%V 24
%P 451 - 470
%8 1998/06//
%@ 0098-5589
%G eng
%N 6
%R 10.1109/32.689402
%0 Conference Paper
%B Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques (PACT)
%D 1997
%T MDL: a language and compiler for dynamic program instrumentation
%A Hollingsworth, Jeffrey K
%A Naim, O.
%A Miller, B. P
%A Zhichen Xu
%A Goncalves, M. J. R.
%A Ling Zheng
%K Alpha architecture
%K application program
%K application program interfaces
%K Application software
%K compiler generators
%K Computer science
%K dynamic code generation
%K Dynamic compiler
%K dynamic program instrumentation
%K Educational institutions
%K files
%K instrumentation code
%K Instruments
%K MDL
%K measurement
%K message channels
%K Message passing
%K Metric Description Language
%K modules
%K nodes
%K Operating systems
%K optimising compilers
%K PA-RISC
%K Paradyn Parallel Performance Tools
%K Parallel architectures
%K parallel programming
%K performance data
%K platform independent descriptions
%K Power 2 architecture
%K Power generation
%K procedures
%K program debugging
%K Program processors
%K running programs
%K Runtime
%K software metrics
%K SPARC
%K Specification languages
%K x86 architecture
%X We use a form of dynamic code generation, called dynamic instrumentation, to collect data about the execution of an application program. Dynamic instrumentation allows us to instrument running programs to collect performance and other types of information. The instrumentation code is generated incrementally and can be inserted and removed at any time. Our instrumentation currently runs on the SPARC, PA-RISC, Power 2, Alpha, and x86 architectures.
Specifications of what data to collect are written in a specialized language called the Metric Description Language, which is part of the Paradyn Parallel Performance Tools. This language allows platform-independent descriptions of how to collect performance data. It also provides a concise way to specify how to constrain performance data to particular resources such as modules, procedures, nodes, files, or message channels (or combinations of these resources). We also describe the details of how we weave instrumentation into a running program.
%B Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques (PACT)
%I IEEE
%P 201 - 212
%8 1997/11/10/14
%@ 0-8186-8090-3
%G eng
%R 10.1109/PACT.1997.644016
%0 Conference Paper
%B Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996
%D 1996
%T Modeling, Evaluation, and Testing of Paradyn Instrumentation System
%A Waheed, A.
%A Rover, D. T
%A Hollingsworth, Jeffrey K
%K Distributed control
%K Feedback
%K High performance computing
%K Instruments
%K Monitoring
%K Real time systems
%K Software measurement
%K Software systems
%K Software testing
%K System testing
%X This paper presents a case study of modeling, evaluating, and testing the data collection services (called an instrumentation system) of the Paradyn parallel performance measurement tool using well-known performance evaluation and experiment design techniques. The overall objective of the study is to use modeling- and simulation-based evaluation to provide feedback to the tool developers to help them choose system configurations and task scheduling policies that can significantly reduce the data collection overheads. We develop and parameterize a resource occupancy model for the Paradyn instrumentation system (IS) for an IBM SP-2 platform.
This model is parameterized with a measurement-based workload characterization and subsequently used to answer several "what if" questions regarding configuration options and two policies for scheduling instrumentation system tasks: the collect-and-forward (CF) and batch-and-forward (BF) policies. Simulation results indicate that the BF policy can significantly reduce the overheads. Based on this feedback, the BF policy was implemented in the Paradyn IS as an option for managing data collection. Measurement-based testing results obtained from this enhanced version of the Paradyn IS are reported in this paper and indicate more than a 60% reduction in the direct IS overheads when the BF policy is used.
%B Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996
%I IEEE
%P 18 - 18
%8 1996///
%@ 0-89791-854-1
%G eng
%R 10.1109/SUPERC.1996.183524
%0 Journal Article
%J Computer
%D 1995
%T The Paradyn parallel performance measurement tool
%A Miller, B. P
%A Callaghan, M. D
%A Cargille, J. M
%A Hollingsworth, Jeffrey K
%A Irvin, R. B
%A Karavanic, K. L
%A Kunchithapadam, K.
%A Newhall, T.
%K Aerodynamics
%K Automatic control
%K automatic instrumentation control
%K Debugging
%K dynamic instrumentation
%K flexible performance information
%K high level languages
%K insertion
%K Instruments
%K large-scale parallel program
%K Large-scale systems
%K measurement
%K Paradyn parallel performance measurement tool
%K Parallel machines
%K parallel programming
%K Performance Consultant
%K Programming profession
%K scalability
%K software performance evaluation
%K software tools
%X Paradyn is a tool for measuring the performance of large-scale parallel programs. Our goal in designing a new performance tool was to provide detailed, flexible performance information without incurring the space (and time) overhead typically associated with trace-based tools.
Paradyn achieves this goal by dynamically instrumenting the application and automatically controlling this instrumentation in search of performance problems. Dynamic instrumentation lets us defer insertion until the moment it is needed (and remove it when it is no longer needed); Paradyn's Performance Consultant decides when and where to insert instrumentation.
%B Computer
%V 28
%P 37 - 46
%8 1995/11//
%@ 0018-9162
%G eng
%N 11
%R 10.1109/2.471178
%0 Conference Paper
%B Proceedings of the Scalable High-Performance Computing Conference, 1994
%D 1994
%T Dynamic program instrumentation for scalable performance tools
%A Hollingsworth, Jeffrey K
%A Miller, B. P
%A Cargille, J.
%K Application software
%K binary image
%K compiler writing
%K Computer architecture
%K Computer displays
%K Computerized monitoring
%K Concurrent computing
%K data acquisition
%K data collection
%K data visualisation
%K Data visualization
%K dynamic program instrumentation
%K efficient monitoring
%K executing program
%K Instruments
%K large-scale parallel applications
%K Large-scale systems
%K operating system design
%K Operating systems
%K parallel programming
%K program analysis
%K program diagnostics
%K program visualization
%K Programming profession
%K Sampling methods
%K scalable performance tools
%K software tools
%X Presents a new technique called "dynamic instrumentation" that provides efficient, scalable, yet detailed data collection for large-scale parallel applications. Our approach is unique because it defers inserting any instrumentation until the application is in execution. We can insert or change instrumentation at any time during execution by modifying the application's binary image. Only the instrumentation required for the currently selected analysis or visualization is inserted. As a result, our technique collects several orders of magnitude less data than traditional data collection approaches.
We have implemented a prototype of our dynamic instrumentation on the CM-5, and present results for several real applications. In addition, we include recommendations to operating system designers, compiler writers, and computer architects about the features necessary to permit efficient monitoring of large-scale parallel systems.
%B Proceedings of the Scalable High-Performance Computing Conference, 1994
%I IEEE
%P 841 - 850
%8 1994/05//
%@ 0-8186-5680-8
%G eng
%R 10.1109/SHPCC.1994.296728
%0 Journal Article
%J IEEE Transactions on Parallel and Distributed Systems
%D 1990
%T IPS-2: the second generation of a parallel program measurement system
%A Miller, B. P
%A Clark, M.
%A Hollingsworth, Jeffrey K
%A Kierstead, S.
%A Lim, S.-S.
%A Torzewski, T.
%K 4.3BSD UNIX systems
%K automatic guidance techniques
%K Automatic testing
%K Charlotte distributed operating system
%K CPA
%K DECstation
%K design concepts
%K distributed programs
%K graphical user interface
%K Graphical user interfaces
%K Instruments
%K interactive program analysis
%K IPS-2
%K measurement
%K message systems
%K network operating systems
%K Operating systems
%K parallel program measurement system
%K parallel programming
%K parallel programs
%K Performance analysis
%K performance analysis techniques
%K performance evaluation
%K performance measurement system
%K Power system modeling
%K program bottlenecks
%K program diagnostics
%K Programming profession
%K semantics
%K Sequent Symmetry multiprocessor
%K shared-memory systems
%K software tools
%K Springs
%K Sun
%K Sun 4
%K Unix
%K VAX
%X IPS, a performance measurement system for parallel and distributed programs, is currently running on its second implementation. IPS's model of parallel programs uses knowledge about the semantics of a program's structure to provide two important features.
First, IPS provides a large amount of performance data about the execution of a parallel program, and this information is organized so that access to it is easy and intuitive. Second, IPS provides performance analysis techniques that help guide the programmer automatically to the location of program bottlenecks. The first implementation of IPS was a testbed for the basic design concepts, providing experience with a hierarchical program and measurement model, interactive program analysis, and automatic guidance techniques. It was built on the Charlotte distributed operating system. The second implementation, IPS-2, extends the basic system with new instrumentation techniques, an interactive and graphical user interface, and new automatic guidance analysis techniques. This implementation runs on 4.3BSD UNIX systems on the VAX, DECstation, Sun 4, and Sequent Symmetry multiprocessor.
%B IEEE Transactions on Parallel and Distributed Systems
%V 1
%P 206 - 217
%8 1990/04//
%@ 1045-9219
%G eng
%N 2
%R 10.1109/71.80132