%0 Conference Paper %B Person-Oriented Vision (POV), 2011 IEEE Workshop on %D 2011 %T Active inference for retrieval in camera networks %A Daozheng Chen %A Bilgic,M. %A Getoor, Lise %A Jacobs, David W. %A Mihalkova,L. %A Tom Yeh %K active %K annotation;probabilistic %K frame;cameras;inference %K inference;camera %K mechanisms;probability;search %K model;human %K model;retrieval %K network;graphical %K problem;video %K problems;video %K processing; %K retrieval;video %K signal %K system;searching %X We address the problem of searching camera network videos to retrieve frames containing specified individuals. We show the benefit of utilizing a learned probabilistic model that captures dependencies among the cameras. In addition, we develop an active inference framework that can request human input at inference time, directing human attention to the portions of the videos whose correct annotation would provide the biggest performance improvements. Our primary contribution is to show that by mapping video frames in a camera network onto a graphical model, we can apply collective classification and active inference algorithms to significantly increase the performance of the retrieval system, while minimizing the number of human annotations required. %B Person-Oriented Vision (POV), 2011 IEEE Workshop on %P 13 - 20 %8 2011/01// %G eng %R 10.1109/POV.2011.5712363 %0 Journal Article %J Pattern Analysis and Machine Intelligence, IEEE Transactions on %D 2011 %T Dynamic Processing Allocation in Video %A Daozheng Chen %A Bilgic,M. %A Getoor, Lise %A Jacobs, David W. %K algorithms;digital %K allocation;video %K analysis;computer %K background %K detection;graphical %K detection;resource %K graphics;face %K model;resource %K processing; %K processing;face %K recognition;object %K signal %K subtraction;baseline %K video %X Large stores of digital video pose severe computational challenges to existing video analysis algorithms. In applying these algorithms, users must often trade off processing speed for accuracy, as many sophisticated and effective algorithms require large computational resources that make it impractical to apply them throughout long videos. One can save considerable effort by applying these expensive algorithms sparingly, directing their application using the results of more limited processing. We show how to do this for retrospective video analysis by modeling a video using a chain graphical model and performing inference both to analyze the video and to direct processing. We apply our method to problems in background subtraction and face detection, and show in experiments that this leads to significant improvements over baseline algorithms. %B Pattern Analysis and Machine Intelligence, IEEE Transactions on %V 33 %P 2174 - 2187 %8 2011/11// %@ 0162-8828 %G eng %N 11 %R 10.1109/TPAMI.2011.55 %0 Conference Paper %B Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on %D 2011 %T Modeling temporal correlations in content fingerprints %A Varna,A.L. %A M. Wu %K chain %K correlations;video %K database;Markov %K databases; %K databases;video %K detection;fingerprint %K detector;certain %K fingerprints;hybrid %K identification;temporal %K Markov %K model;adaptive %K model;temporal %K processes;adaptive %K regime;content %K signal %X Previous analysis of content fingerprints has mainly focused on the case of independent and identically distributed fingerprints.
Practical fingerprints, however, exhibit correlations between components computed from successive frames. In this paper, a Markov chain based model is used to capture the temporal correlations, and the suitability of this model is evaluated through experiments on a video database. The results indicate that the Markov chain model is a good fit only in a certain regime. A hybrid model is then developed to account for this behavior and a corresponding adaptive detector is derived. The adaptive detector achieves better identification accuracy at a small computational expense. %B Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on %P 1860 - 1863 %8 2011/05// %G eng %R 10.1109/ICASSP.2011.5946868 %0 Conference Paper %B Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on %D 2011 %T Multi-agent event recognition in structured scenarios %A Morariu,V.I. %A Davis, Larry S. %K Allen %K analysis;Markov %K descriptions;video %K event %K grounding %K inference;semantic %K interval %K Logic %K logic;interval-based %K logic;Markov %K logic;multi-agent %K logical %K networks;bottom-up %K processes;formal %K processing; %K reasoning;multiagent %K reasoning;video %K recognition;probabilistic %K recognition;temporal %K scheme;first-order %K signal %K spatio-temporal %K systems;object %K temporal %X We present a framework for the automatic recognition of complex multi-agent events in settings where structure is imposed by rules that agents must follow while performing activities. Given semantic spatio-temporal descriptions of what generally happens (i.e., rules, event descriptions, physical constraints), and based on video analysis, we determine the events that occurred. Knowledge about spatio-temporal structure is encoded using first-order logic using an approach based on Allen's Interval Logic, and robustness to low-level observation uncertainty is provided by Markov Logic Networks (MLN). Our main contribution is that we integrate interval-based temporal reasoning with probabilistic logical inference, relying on an efficient bottom-up grounding scheme to avoid combinatorial explosion. Applied to one-on-one basketball, our framework detects and tracks players, their hands and feet, and the ball, generates event observations from the resulting trajectories, and performs probabilistic logical inference to determine the most consistent sequence of events. We demonstrate our approach on 1hr (100,000 frames) of outdoor videos. %B Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on %P 3289 - 3296 %8 2011/06// %G eng %R 10.1109/CVPR.2011.5995386 %0 Conference Paper %B Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on %D 2011 %T Secure video processing: Problems and challenges %A Lu,Wenjun %A Varna,A. %A M. Wu %K data;video %K fashion;secure %K management;secure %K of %K online %K privacy-preserving %K processing; %K processing;security %K signal %K video %X Secure signal processing is an emerging technology to enable signal processing tasks in a secure and privacy-preserving fashion. It has attracted a great amount of research attention due to the increasing demand to enable rich functionalities for private data stored online. Desirable functionalities may include search, analysis, clustering, etc. In this paper, we discuss the research issues and challenges in secure video processing with focus on the application of secure online video management. 
Video is different from text due to its large data volume and rich content diversity. To be practical, secure video processing requires efficient solutions that may involve a trade-off between security and complexity. We look at three representative video processing tasks and review existing techniques that can be applied. Many of the tasks do not have efficient solutions yet, and we discuss the challenges and research questions that need to be addressed. %B Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on %P 5856 - 5859 %8 2011/05// %G eng %R 10.1109/ICASSP.2011.5947693 %0 Conference Paper %B Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on %D 2010 %T Automatic matched filter recovery via the audio camera %A O'Donovan,A.E. %A Duraiswami, Ramani %A Zotkin,Dmitry N %K acoustic %K array;real-time %K arrays;transient %K audio %K camera;automatic %K constraints;microphone %K filter %K filters;microphone %K images;room %K impulse %K Matched %K positions;acoustic %K processing;cameras;matched %K radiators;array %K recovery;beamforming;geometric %K response; %K response;sound %K sensor;acoustic %K signal %K source;source/receiver %K sources;audio %X The sound reaching the acoustic sensor in a realistic environment contains not only the part arriving directly from the sound source but also a number of environmental reflections. The effect of those on the sound is equivalent to a convolution with the room impulse response and can be undone via deconvolution - a technique known as matched filter processing. However, the filter is usually pre-computed in advance using known room geometry and source/receiver positions, and any deviations from those cause the performance to degrade significantly. In this work, an algorithm is proposed to compute the matched filter automatically using an audio camera - a microphone array based system that provides real-time audio images (essentially plots of steered response power in various directions) of the environment. Acoustic sources, as well as their significant reflections, are revealed as peaks in the audio image. The reflections are associated with sound source(s) using an acoustic similarity metric, and an approximate matched filter is computed to align the reflections in time with the direct arrival. Preliminary experimental evaluation of the method is performed. It is shown that in the case of two sources the reflections are identified correctly, the time delays recovered agree well with those computed from geometric constraints, and that the output SNR improves when the reflections are added coherently to the signal obtained by beamforming directly at the source. %B Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on %P 2826 - 2829 %8 2010/03// %G eng %R 10.1109/ICASSP.2010.5496187 %0 Conference Paper %B Image Processing (ICIP), 2010 17th IEEE International Conference on %D 2010 %T Automatic target recognition based on simultaneous sparse representation %A Patel, Vishal M. %A Nasrabadi,N.M.
%A Chellapa, Rama %K (artificial %K algorithm;feature %K based %K classification;iterative %K classification;learning %K Comanche %K data %K dictionary;matching %K extraction;image %K forward-looking %K infrared %K intelligence);military %K learning %K MATCHING %K matrix;dictionary %K measure;military %K methods;learning %K orthogonal %K pursuit %K pursuit;confusion %K recognition;class %K recognition;target %K representation;feature %K representation;sparse %K set;automatic %K signal %K similarity %K simultaneous %K sparse %K supervised %K systems;object %K target %K target;simultaneous %K tracking; %X In this paper, an automatic target recognition algorithm is presented based on a framework for learning dictionaries for simultaneous sparse signal representation and feature extraction. The dictionary learning algorithm is based on class supervised simultaneous orthogonal matching pursuit, while a matching pursuit-based similarity measure is used for classification. We show how the proposed framework can be helpful for efficient utilization of data, with the possibility of developing real-time, robust target classification. We verify the efficacy of the proposed algorithm using confusion matrices on the well-known Comanche forward-looking infrared data set consisting of ten different military targets at different orientations. %B Image Processing (ICIP), 2010 17th IEEE International Conference on %P 1377 - 1380 %8 2010/09// %G eng %R 10.1109/ICIP.2010.5652306 %0 Conference Paper %B Image Processing (ICIP), 2010 17th IEEE International Conference on %D 2010 %T Recognizing offensive strategies from football videos %A Ruonan Li %A Chellapa, Rama %K American %K analysis;video %K changes;image %K errors;variables %K football %K identification;tracking %K models;geometric %K models;view %K play %K players %K processing; %K properties;nonlinear %K recognition;offensive %K recognition;statistical %K signal %K spaces;offensive %K statistical %K strategies %K technique;design %K videos;analysis-by-synthesis %X We address the problem of recognizing offensive play strategies from American football play videos. Specifically, we propose a probabilistic model which describes the generative process of an observed football play and takes into account practical issues in real football videos, such as difficulty in identifying offensive players, view changes, and tracking errors. In particular, we exploit the geometric properties of nonlinear spaces of involved variables and design statistical models on these manifolds. Recognition is then performed via an 'analysis-by-synthesis' technique. Experiments on a newly established dataset of American football videos demonstrate the effectiveness of the approach. %B Image Processing (ICIP), 2010 17th IEEE International Conference on %P 4585 - 4588 %8 2010/09// %G eng %R 10.1109/ICIP.2010.5652192 %0 Journal Article %J Pattern Analysis and Machine Intelligence, IEEE Transactions on %D 2010 %T Video Metrology Using a Single Camera %A Guo,Feng %A Chellapa, Rama %K camera;vanishing %K circles;uncalibrated %K concentric %K image-based %K information;vehicle %K line %K measurement;video %K metrology;cameras;image %K processing; %K segment;multiple %K segmentation;video %K signal %K single %K techniques;line %K wheelbase %X This paper presents a video metrology approach using an uncalibrated single camera that is either stationary or in planar motion. Although theoretically simple, measuring the length of even a line segment in a given video is often a difficult problem.
Most existing techniques for this task are extensions of single image-based techniques and do not achieve the desired accuracy, especially in noisy environments. In contrast, the proposed algorithm moves line segments on the reference plane to share a common endpoint using the vanishing line information, followed by fitting multiple concentric circles on the image plane. A fully automated real-time system based on this algorithm has been developed to measure vehicle wheelbases using an uncalibrated stationary camera. The system estimates the vanishing line using invariant lengths on the reference plane from multiple frames rather than the given parallel lines, which may not exist in videos. It is further extended to a camera undergoing a planar motion by automatically selecting frames with similar vanishing lines from the video. Experimental results show that the measurement results are accurate enough to classify moving vehicles based on their size. %B Pattern Analysis and Machine Intelligence, IEEE Transactions on %V 32 %P 1329 - 1335 %8 2010/07// %@ 0162-8828 %G eng %N 7 %R 10.1109/TPAMI.2010.26 %0 Conference Paper %B Image Processing (ICIP), 2009 16th IEEE International Conference on %D 2009 %T Concurrent transition and shot detection in football videos using Fuzzy Logic %A Refaey,M.A. %A Elsayed,K.M. %A Hanafy,S.M. %A Davis, Larry S. %K analysis;inference %K boundary;shot %K Color %K colour %K detection;sports %K functions;shot %K histogram;concurrent %K logic;image %K logic;inference %K mechanism;intensity %K mechanisms;sport;video %K processing; %K processing;videonanalysis;fuzzy %K signal %K transition;edgeness;football %K variance;membership %K video;video %K videos;fuzzy %X Shot detection is a fundamental step in video processing and analysis that should be achieved with a high degree of accuracy. In this paper, we introduce a unified algorithm for shot detection in sports video using fuzzy logic as a powerful inference mechanism. Fuzzy logic overcomes the problems of hard cut thresholds and the need for large training data used in previous work. The proposed algorithm integrates many features, such as color histogram, edgeness, and intensity variance. Membership functions to represent different features and transitions between shots have been developed to detect different shot boundary and transition types. We address the detection of cut, fade, dissolve, and wipe shot transitions. The results show that our algorithm achieves a high degree of accuracy. %B Image Processing (ICIP), 2009 16th IEEE International Conference on %P 4341 - 4344 %8 2009/11// %G eng %R 10.1109/ICIP.2009.5413648 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on %D 2009 %T Learning multi-modal densities on Discriminative Temporal Interaction Manifold for group activity recognition %A Ruonan Li %A Chellapa, Rama %A Zhou,S. K %K a %K activities;parametric %K activity %K analysis;belief %K analysis;pattern %K Bayesian %K classification;image %K classifier;multimodal %K complex %K constraints;data-driven %K density %K equipment;video %K function;multiobject %K interaction %K manifold;discriminative %K matrix;football %K MOTION %K network;video-based %K networks;image %K play %K posteriori %K processing; %K recognition %K recognition;group %K recognition;maximum %K signal %K spatial %K strategy;discriminative %K temporal %X While video-based activity analysis and recognition has received much attention, the existing body of work mostly deals with the single object/person case.
Coordinated multi-object activities, or group activities, present in a variety of applications such as surveillance, sports, and biological monitoring records, etc., are the main focus of this paper. Unlike earlier attempts which model the complex spatial temporal constraints among multiple objects with a parametric Bayesian network, we propose a Discriminative Temporal Interaction Manifold (DTIM) framework as a data-driven strategy to characterize the group motion pattern without employing specific domain knowledge. In particular, we establish probability densities on the DTIM, whose element, the discriminative temporal interaction matrix, compactly describes the coordination and interaction among multiple objects in a group activity. For each class of group activity we learn a multi-modal density function on the DTIM. A Maximum a Posteriori (MAP) classifier on the manifold is then designed for recognizing new activities. Experiments on football play recognition demonstrate the effectiveness of the approach. %B Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on %P 2450 - 2457 %8 2009/06// %G eng %R 10.1109/CVPR.2009.5206676 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on %D 2009 %T Plane-wave decomposition of a sound scene using a cylindrical microphone array %A Zotkin,Dmitry N %A Duraiswami, Ramani %K array;plane %K arrays; %K baffle;cylindrical %K beamforming;cylindrical %K decomposition;sound-hard %K localization;array %K microphone %K plane-wave %K processing;microphone %K scene %K signal %K spherical;source %K waves;sound %X The analysis for microphone arrays formed by mounting microphones on a sound-hard spherical or cylindrical baffle is typically performed using a decomposition of the sound field in terms of orthogonal basis functions. An alternative representation in terms of plane waves and a method for obtaining the coefficients of such a representation directly from measurements was proposed recently for the case of a spherical array. It was shown that representing the field as a collection of plane waves arriving from various directions simplifies both source localization and beamforming. In this paper, these results are extended to the case of the cylindrical array. Similarly to the spherical array case, localization and beamforming based on plane-wave decomposition perform as well as the traditional orthogonal function based methods while being numerically more stable. Both simulated and experimental results are presented. %B Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on %P 85 - 88 %8 2009/04// %G eng %R 10.1109/ICASSP.2009.4959526 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. 
IEEE International Conference on %D 2009 %T Recognizing coordinated multi-object activities using a dynamic event ensemble model %A Ruonan Li %A Chellapa, Rama %K activities %K activity %K analysis;Bayes %K Bayesian %K classification;video %K description %K ensemble %K event %K framework;dynamic %K functions;ensemble %K geometric %K manifold;football %K methods;geometry;image %K model;ensemble %K multiobject %K network;video-based %K play %K processing; %K property;classifier;coordinated %K recognition;data-driven %K recognition;parametric %K Riemannian %K signal %K strategy;dynamic %X While video-based activity analysis and recognition has received broad attention, the existing body of work mostly deals with the single object/person case. Modeling involving multiple objects and recognition of coordinated group activities, present in a variety of applications such as surveillance, sports, and biological records, is the main focus of this paper. Unlike earlier attempts which model the complex spatial temporal constraints among different activities of multiple objects with a parametric Bayesian network, we propose a dynamic 'event ensemble' framework as a data-driven strategy to characterize the group motion pattern without employing any specific domain knowledge. In particular, we exploit the Riemannian geometric property of the set of ensemble description functions and develop a compact representation for group activities on the ensemble manifold. An appropriate classifier on the manifold is then designed for recognizing new activities. Experiments on football play recognition demonstrate the effectiveness of the framework. %B Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on %P 3541 - 3544 %8 2009/04// %G eng %R 10.1109/ICASSP.2009.4960390 %0 Journal Article %J Signal Processing, IEEE Transactions on %D 2009 %T Vehicle Speed Estimation Using Acoustic Wave Patterns %A Cevher, V. %A Chellapa, Rama %A McClellan, J.H. %K acoustic %K Doppler %K estimation;passive %K estimation;road %K estimation;wheelbase %K factor;acoustic %K identification;vehicle %K length;Doppler %K length;vehicle %K likelihood %K patterns;envelope %K processing;maximum %K profile %K sensor;tire %K shape;maximum %K shift %K shift;acoustic %K signal %K speed %K track %K vector;vehicle %K vehicles; %K wave %X We estimate a vehicle's speed, its wheelbase length, and tire track length by jointly estimating its acoustic wave pattern with a single passive acoustic sensor that records the vehicle's drive-by noise. The acoustic wave pattern is determined using the vehicle's speed, the Doppler shift factor, the sensor's distance to the vehicle's closest-point-of-approach, and three envelope shape (ES) components, which approximate the shape variations of the received signal's power envelope. We incorporate the parameters of the ES components along with estimates of the vehicle engine RPM, the number of cylinders, and the vehicle's initial bearing, loudness and speed to form a vehicle profile vector. This vector provides a fingerprint that can be used for vehicle identification and classification. We also provide possible reasons why some of the existing methods are unable to provide unbiased vehicle speed estimates using the same framework. The approach is illustrated using vehicle speed estimation and classification results obtained with field data.
%B Signal Processing, IEEE Transactions on %V 57 %P 30 - 47 %8 2009/01// %@ 1053-587X %G eng %N 1 %R 10.1109/TSP.2008.2005750 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on %D 2008 %T Action recognition using ballistic dynamics %A Vitaladevuni,S.N. %A Kellokumpu,V. %A Davis, Larry S. %K analysis;image %K Bayesian %K dynamics;gesture %K feature;person-centric %K framework;action %K History %K image %K labels;psycho-kinesiological %K morphological %K MOTION %K Movement %K movements;motion %K planning;interactive %K processing; %K recognition %K recognition;ballistic %K recognition;image %K segmentation;video %K signal %K studies;image %K task;human %X We present a Bayesian framework for action recognition through ballistic dynamics. Psycho-kinesiological studies indicate that ballistic movements form the natural units for human movement planning. The framework leads to an efficient and robust algorithm for temporally segmenting videos into atomic movements. Individual movements are annotated with person-centric morphological labels called ballistic verbs. This is tested on a dataset of interactive movements, achieving high recognition rates. The approach is also applied to a gesture recognition task, improving a previously reported recognition rate from 84% to 92%. Consideration of ballistic dynamics enhances the performance of the popular Motion History Image feature. We also illustrate the approach's general utility on real-world videos. Experiments indicate that the method is robust to view, style and appearance variations. %B Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on %P 1 - 8 %8 2008/06// %G eng %R 10.1109/CVPR.2008.4587806 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on %D 2008 %T Compressive wireless arrays for bearing estimation %A Cevher, V. %A Gurbuz, A.C. %A McClellan, J.H. %A Chellapa, Rama %K acoustic %K algorithms;sparse %K arrays;acoustic %K arrays;hypothesis %K bandwidth;compressive %K bearing %K Estimation %K estimation;signal %K estimation;sparse %K matrices;vectors;wireless %K minimization %K networks; %K problem;communication %K problem;microphone %K problems;joint %K PROCESSING %K processing;array %K processing;direction-of-arrival %K processing;l1-norm %K sensor %K signal %K signals;parameter %K Testing %K vector;wireless %K wireless %X Joint processing of sensor array outputs improves the performance of parameter estimation and hypothesis testing problems beyond the sum of the individual sensor processing results. When the sensors have high data sampling rates, arrays are tethered, creating a disadvantage for their deployment and also limiting their aperture size. In this paper, we develop the signal processing algorithms for randomly deployable wireless sensor arrays that are severely constrained in communication bandwidth. We focus on the acoustic bearing estimation problem and show that when the target bearings are modeled as a sparse vector in the angle space, low dimensional random projections of the microphone signals can be used to determine multiple source bearings by solving an l1-norm minimization problem. Field data results are shown where only 10 bits of information are passed from each microphone to estimate multiple target bearings. %B Acoustics, Speech and Signal Processing, 2008. ICASSP 2008.
IEEE International Conference on %P 2497 - 2500 %8 2008/04/31/4 %G eng %R 10.1109/ICASSP.2008.4518155 %0 Journal Article %J Multimedia, IEEE Transactions on %D 2008 %T A Constrained Probabilistic Petri Net Framework for Human Activity Detection in Video %A Albanese, M. %A Chellapa, Rama %A Moscato, V. %A Picariello, A. %A V.S. Subrahmanian %A Turaga,P. %A Udrea,O. %K activity %K dataset;automated %K detection;human %K interactions;security %K net;human %K nets;image %K Petri %K probabilistic %K processing;multiagent %K processing;video %K representation;low-level %K representation;video %K signal %K Surveillance %K surveillance; %K systems;constrained %K systems;surveillance %K tarmac %K TSA %K videos;Petri %X Recognition of human activities in restricted settings such as airports, parking lots and banks is of significant interest in security and automated surveillance systems. In such settings, data is usually in the form of surveillance videos with wide variation in quality and granularity. Interpretation and identification of human activities requires an activity model that a) is rich enough to handle complex multi-agent interactions, b) is robust to uncertainty in low-level processing and c) can handle ambiguities in the unfolding of activities. We present a computational framework for human activity representation based on Petri nets. We propose an extension, Probabilistic Petri Nets (PPN), and show how this model is well suited to address each of the above requirements in a wide variety of settings. We then focus on answering two types of questions: (i) what are the minimal sub-videos in which a given activity is identified with a probability above a certain threshold, and (ii) for a given video, which activity from a given set occurred with the highest probability? We provide the PPN-MPS algorithm for the first problem, as well as two different algorithms (naive PPN-MPA and PPN-MPA) to solve the second. Our experimental results on a dataset consisting of bank surveillance videos and an unconstrained TSA tarmac surveillance dataset show that our algorithms are both fast and provide high quality results. %B Multimedia, IEEE Transactions on %V 10 %P 982 - 996 %8 2008/10// %@ 1520-9210 %G eng %N 6 %R 10.1109/TMM.2008.2001369 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on %D 2008 %T Factorized variational approximations for acoustic multi source localization %A Cevher, V. %A Sankaranarayanan,A. C %A Chellapa, Rama %K approximation;acoustic %K approximation;Bayes %K approximation;multisensor %K approximation;variational %K Bayes %K Chi %K detection;wireless %K distribution;Gaussian %K localization;factorized %K localization;sensor %K methods;Gaussian %K multisource %K networks; %K networks;stochastic %K processes;acoustic %K sensor %K signal %K strength;sensor %K systems;object %K tracking;received %K variational %X Estimation based on received signal strength (RSS) is crucial in sensor networks for sensor localization, target tracking, etc. In this paper, we present a Gaussian approximation of the Chi distribution that is applicable to general RSS source localization problems in sensor networks. Using our Gaussian approximation, we provide a factorized variational Bayes (VB) approximation to the location and power posterior of multiple sources using a sensor network.
When the source signal and the sensor noise have uncorrelated Gaussian distributions, we demonstrate that the envelope of the sensor output can be accurately modeled with a multiplicative Gaussian noise model. In turn, our factorized VB approximations decrease the computational complexity and provide computational robustness as the number of targets increases. Simulations are provided to demonstrate the effectiveness of the proposed approximations. %B Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on %P 2409 - 2412 %8 2008/04/31/4 %G eng %R 10.1109/ICASSP.2008.4518133 %0 Conference Paper %B Semantic Media Adaptation and Personalization, 2008. SMAP '08. Third International Workshop on %D 2008 %T A Logic Framework for Sports Video Summarization Using Text-Based Semantic Annotation %A Refaey,M.A. %A Abd-Almageed, Wael %A Davis, Larry S. %K (mathematics);video %K analysis;trees %K annotation;Internet;broadcasting;sport;text %K AUTOMATIC %K detection;logic %K engine;parse %K event %K PROCESSING %K processing; %K semantic %K signal %K summarization;text %K trees;sports %K video %K Webcasting;text-based %X Detection of semantic events in sports videos is an essential step towards video summarization. A large volume of research has been conducted for automatic semantic event detection and summarization of sports videos. In this paper we present a novel sports video summarization framework using a combination of text, video and logic analysis. Parse trees are used to analyze structured and free-style text webcasting of sports games and extract the game's semantic events, such as goals and penalties in soccer games. Semantic events are then hierarchically arranged before being passed to a logic processing engine. The logic engine receives the summary preferences from the user and subsequently parses the event hierarchy to generate the game's summary according to the user's preferences. The proposed framework was applied to both soccer and basketball videos. We achieved an average accuracy of 98.6% and 100% on soccer and basketball videos, respectively. %B Semantic Media Adaptation and Personalization, 2008. SMAP '08. Third International Workshop on %P 69 - 75 %8 2008/12// %G eng %R 10.1109/SMAP.2008.25 %0 Conference Paper %B Applications of Computer Vision, 2008. WACV 2008. IEEE Workshop on %D 2008 %T Tracking Down Under: Following the Satin Bowerbird %A Kembhavi,A. %A Farrell,R. %A Luo,Yuancheng %A Jacobs, David W. %A Duraiswami, Ramani %A Davis, Larry S. %K analysis %K behavior;animal %K Bowerbird;animal %K computing;feature %K detection;animal %K extraction;tracking;video %K processing;zoology; %K Satin %K sciences %K selection;behavioural %K signal %K tool;feature %K tracking;automated %K video %X Sociobiologists collect huge volumes of video to study animal behavior (our collaborators work with 30,000 hours of video). The scale of these datasets demands the development of automated video analysis tools. Detecting and tracking animals is a critical first step in this process. However, off-the-shelf methods prove incapable of handling videos characterized by poor quality, drastic illumination changes, non-stationary scenery and foreground objects that become motionless for long stretches of time.
We improve on existing approaches by taking advantage of specific aspects of this problem: by using information from the entire video we are able to find animals that become motionless for long intervals of time; we make robust decisions based on regional features; for different parts of the image, we tailor the selection of model features, choosing the features most helpful in differentiating the target animal from the background in that part of the image. We evaluate our method, achieving almost 83% tracking accuracy on a more than 200,000 frame dataset of Satin Bowerbird courtship videos. %B Applications of Computer Vision, 2008. WACV 2008. IEEE Workshop on %P 1 - 7 %8 2008/01// %G eng %R 10.1109/WACV.2008.4544004 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %D 2007 %T Bilattice-based Logical Reasoning for Human Detection %A Shet,V. D %A Neumann, J. %A Ramesh,V. %A Davis, Larry S. %K automated %K detection;parts-based %K detectors;static %K histograms;human %K images;formal %K interactions;gradient %K logic;image %K logical %K mechanisms;surveillance;video %K processing; %K reasoning;complex %K recognition;inference %K signal %K Surveillance %K systems;bilattice-based %K visual %X The capacity to robustly detect humans in video is a critical component of automated visual surveillance systems. This paper describes a bilattice based logical reasoning approach that exploits contextual information and knowledge about interactions between humans, and augments it with the output of different low level detectors for human detection. Detections from low level parts-based detectors are treated as logical facts and used to reason explicitly about the presence or absence of humans in the scene. Positive and negative information from different sources, as well as uncertainties from detections and logical rules, are integrated within the bilattice framework. This approach also generates proofs or justifications for each hypothesis it proposes. These justifications (or lack thereof) are further employed by the system to explain and validate, or reject potential hypotheses. This allows the system to explicitly reason about complex interactions between humans and handle occlusions. These proofs are also available to the end user as an explanation of why the system thinks a particular hypothesis is actually a human. We employ a boosted cascade of gradient histograms based detector to detect individual body parts. We have applied this framework to analyze the presence of humans in static images from different datasets. %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %P 1 - 8 %8 2007/06// %G eng %R 10.1109/CVPR.2007.383133 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %D 2007 %T Coarse-to-Fine Event Model for Human Activities %A Cuntoor, N.P. %A Chellapa, Rama %K action %K activities;spatial %K airport %K browsing;video %K dataset;activity %K dataset;UCF %K event %K framework;human %K human %K indoor %K Markov %K model %K model;event %K models;image %K probabilities;hidden %K processing; %K recognition;coarse-to-fine %K reduction;video %K representation;image %K resolution %K resolution;image %K sequences;hidden %K sequences;stability;video %K signal %K Surveillance %K tarmac %K TSA %X We analyze coarse-to-fine hierarchical representation of human activities in video sequences. It can be used for efficient video browsing and activity recognition. 
Activities are modeled using a sequence of instantaneous events. Events in activities can be represented in a coarse-to-fine hierarchy in several ways, i.e., there may not be a unique hierarchical structure. We present five criteria and quantitative measures for evaluating their effectiveness. The criteria are minimalism, stability, consistency, accessibility and applicability. It is desirable to develop activity models that rank highly on these criteria at all levels of hierarchy. In this paper, activities are represented as sequence of event probabilities computed using the hidden Markov model framework. Two aspects of hierarchies are analyzed: the effect of reduced frame rate on the accuracy of events detected at a finer scale; and the effect of reduced spatial resolution on activity recognition. Experiments using the UCF indoor human action dataset and the TSA airport tarmac surveillance dataset show encouraging results %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %V 1 %P I-813 -I-816 - I-813 -I-816 %8 2007/04// %G eng %R 10.1109/ICASSP.2007.366032 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %D 2007 %T Colluding Fingerprinted Video using the Gradient Attack %A He,Shan %A Kirovski,D. %A M. Wu %K attack;multimedia %K attacks;digital %K content %K data;video %K distribution;fingerprint %K effort;gradient %K fingerprinted %K fingerprinting;disproportional %K fingerprints;colluding %K fingerprints;Laplace %K Gaussian %K identification;multimedia %K of %K processing; %K protection;unauthorized %K signal %K spectrum %K spread %K systems;security %K video;collusion %X Digital fingerprinting is an emerging tool to protect multimedia content from unauthorized distribution by embedding a unique fingerprint into each user's copy. Although several fingerprinting schemes have been proposed in related work, disproportional effort has been targeted towards identifying effective collusion attacks on fingerprinting schemes. Recent introduction of the gradient attack has refined the definition of an optimal attack and demonstrated strong effect on direct-sequence, uniformly distributed, and Gaussian spread spectrum fingerprints when applied to synthetic signals. In this paper, we apply the gradient attack on an existing well-engineered video fingerprinting scheme, refine the attack procedure, and demonstrate that the gradient attack is effective on Laplace fingerprints. Finally, we explore an improvement on fingerprint design to thwart the gradient attack. Results suggest that Laplace fingerprint should be avoided. However, we show that a signal mixed of Laplace and Gaussian fingerprints may serve as a design strategy to disable the gradient attack and force pirates into averaging as a form of adversary collusion. %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %V 2 %P II-161 -II-164 - II-161 -II-164 %8 2007/04// %G eng %R 10.1109/ICASSP.2007.366197 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %D 2007 %T Epitomic Representation of Human Activities %A Cuntoor, N.P. 
%A Chellapa, Rama %K action %K activities %K airport %K dataset;epitomic %K dataset;UCF %K decomposition;modelling;statistics;video %K decomposition;TSA %K dynamical %K human %K indoor %K Iwasawa %K matrix %K matrix;human %K modeling;input %K processing; %K representation;estimated %K sequences;image %K sequences;matrix %K signal %K statistics;linear %K Surveillance %K system %K systems;video %X We introduce an epitomic representation for modeling human activities in video sequences. A video sequence is divided into segments within which the dynamics of objects is assumed to be linear and modeled using linear dynamical systems. The tuple consisting of the estimated system matrix, statistics of the input signal and the initial state value is said to form an epitome. The system matrices are decomposed using the Iwasawa matrix decomposition to isolate the effect of rotation, scaling and projective action on the state vector. We demonstrate the usefulness of the proposed representation and decomposition for activity recognition using the TSA airport surveillance dataset and the UCF indoor human action dataset. %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %P 1 - 8 %8 2007/06// %G eng %R 10.1109/CVPR.2007.383135 %0 Journal Article %J Audio, Speech, and Language Processing, IEEE Transactions on %D 2007 %T Flexible and Optimal Design of Spherical Microphone Arrays for Beamforming %A Li,Z. %A Duraiswami, Ramani %K adaptive %K algorithm;beamforming;beampattern;frequency %K arrays; %K arrays;acoustic %K directivity;optimal %K microphone %K processing;array %K processing;microphone %K response;maximum %K signal %K spherical %X This paper describes a methodology for designing a flexible and optimal spherical microphone array for beamforming. Using the approach presented, a spherical microphone array can have very flexible layouts of microphones on the spherical surface, yet optimally approximate a desired beampattern of higher order within a specified robustness constraint. Depending on the specified beampattern order, our approach automatically achieves optimal performances in two cases: when the specified beampattern order is reachable within the robustness constraint we achieve a beamformer with optimal approximation of the desired beampattern; otherwise we achieve a beamformer with maximum directivity, both robustly. For efficient implementation, we also developed an adaptive algorithm for computing the beamformer weights. It converges to the optimal performance quickly while exactly satisfying the specified frequency response and robustness constraint in each step. One application of the method is to allow the building of a real-world system, where microphones may not be placeable on regions, such as near cable outlets and/or a mounting base, while having a minimal effect on the performance. Simulation results are presented. %B Audio, Speech, and Language Processing, IEEE Transactions on %V 15 %P 702 - 714 %8 2007/02// %@ 1558-7916 %G eng %N 2 %R 10.1109/TASL.2006.876764 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %D 2007 %T From Videos to Verbs: Mining Videos for Activities using a Cascade of Dynamical Systems %A Turaga, P.K. %A Veeraraghavan,A.
%A Chellapa, Rama %K activities %K clustering;image %K clustering;video %K extraction;dynamical %K mining;video %K processing; %K sequence %K sequences;pattern %K signal %K stream;video %K systems;single %K video %X Clustering video sequences in order to infer and extract activities from a single video stream is an extremely important problem and has significant potential in video indexing, surveillance, activity discovery and event recognition. Clustering a video sequence into activities requires one to simultaneously recognize activity boundaries (activity consistent subsequences) and cluster these activity subsequences. In order to do this, we build a generative model for activities (in video) using a cascade of dynamical systems and show that this model is able to capture and represent a diverse class of activities. We then derive algorithms to learn the model parameters from a video stream and also show how a single video sequence may be clustered into different clusters where each cluster represents an activity. We also propose a novel technique to build affine, view, rate invariance of the activity into the distance metric for clustering. Experiments show that the clusters found by the algorithm correspond to semantically meaningful activities. %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %P 1 - 8 %8 2007/06// %G eng %R 10.1109/CVPR.2007.383170 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %D 2007 %T Human Identification using Gait and Face %A Chellapa, Rama %A Roy-Chowdhury, A.K. %A Kale, A. %K algorithm;visual-hull %K analysis;image %K approach;cameras;face %K approximation;probabilistic %K database;camera;face %K databases; %K fusion;face %K fusion;human %K fusion;probability;video %K Gait %K identification;planar %K NIST %K processing;visual %K recognition %K recognition;gait %K recognition;view-invariant %K signal %X In general the visual-hull approach for performing integrated face and gait recognition requires at least two cameras. In this paper we present experimental results for fusion of face and gait for the single camera case. We considered the NIST database which contains outdoor face and gait data for 30 subjects. In the NIST database, subjects walk along an inverted Sigma pattern. In (A. Kale, et al., 2003), we presented a view-invariant gait recognition algorithm for the single camera case along with some experimental evaluations. In this chapter we present the results of our view-invariant gait recognition algorithm in (A. Kale, et al., 2003) on the NIST database. The algorithm is based on the planar approximation of the person which is valid when the person walks far away from the camera. In (S. Zhou et al., 2003), an algorithm for probabilistic recognition of human faces from video was proposed and the results were demonstrated on the NIST database. Details of these methods can be found in the respective papers. We give an outline of the fusion strategy here. %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %P 1 - 2 %8 2007/06// %G eng %R 10.1109/CVPR.2007.383523 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %D 2007 %T Joint Acoustic-Video Fingerprinting of Vehicles, Part I %A Cevher, V. %A Chellapa, Rama %A McClellan, J.H. 
%K acoustic %K acoustic-video %K components;joint %K detection;acoustic %K estimation;video %K fingerprinting;passive %K processing; %K processing;acoustic %K sensor;vehicle %K sensors;acoustic %K sensors;wheel %K SHAPE %K signal %K speed %K transducers;video %K wave-pattern;envelope %X We address vehicle classification and measurement problems using acoustic and video sensors. In this paper, we show how to estimate a vehicle's speed, width, and length by jointly estimating its acoustic wave-pattern using a single passive sensor that records the vehicle's drive-by noise. The acoustic wave-pattern is approximated using three envelope shape (ES) components, which approximate the shape of the received signal's power envelope. We incorporate the parameters of the ES components along with the estimates of the vehicle engine RPM and number of cylinders to create a vehicle profile vector that forms an intuitive discriminatory feature space. In the companion paper, we discuss vehicle classification and mensuration based on silhouette extraction and wheel detection, using a video sensor. Vehicle speed estimation and classification results are provided using field data. %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %V 2 %P II-745 -II-748 - II-745 -II-748 %8 2007/04// %G eng %R 10.1109/ICASSP.2007.366343 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %D 2007 %T Joint Acoustic-Video Fingerprinting of Vehicles, Part II %A Cevher, V. %A Guo, F. %A Sankaranarayanan,A. C %A Chellapa, Rama %K acoustic-video %K analysis;video %K approximations;acoustic %K Bayesian %K colour %K density %K efficiency;joint %K estimation;Bayes %K fingerprinting;metrology %K framework;Laplacian %K functions;performance %K fusion;color %K identification;image %K invariants;computational %K methods;acoustic %K metrology;acoustic %K processing; %K processing;fingerprint %K signal %K video %X In this second paper, we first show how to estimate the wheelbase length of a vehicle using line metrology in video. We then address the vehicle fingerprinting problem using vehicle silhouettes and color invariants. We combine the acoustic metrology and classification results discussed in Part I with the video results to improve estimation performance and robustness. The acoustic video fusion is achieved in a Bayesian framework by assuming conditional independence of the observations of each modality. For the metrology density functions, Laplacian approximations are used for computational efficiency. Experimental results are given using field data %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %V 2 %P II-749 -II-752 - II-749 -II-752 %8 2007/04// %G eng %R 10.1109/ICASSP.2007.366344 %0 Conference Paper %B Geoscience and Remote Sensing Symposium, 2007. IGARSS 2007. IEEE International %D 2007 %T Kernel fully constrained least squares abundance estimates %A Broadwater, J. %A Chellapa, Rama %A Banerjee, A. %A Burlina, P. 
%K abundance %K algorithm;kernel %K analysis; %K AVIRIS %K based %K constrained %K constraint;feature %K constraint;spectral %K estimates;linear %K extraction;geophysical %K feature %K fully %K image;hyperspectral %K imagery;kernel %K least %K mixing %K model;nonnegativity %K processing;geophysical %K processing;multidimensional %K processing;spectral %K signal %K space;kernel %K squares %K techniques;image %K unmixing;sum-to-one %X A critical step for fitting a linear mixing model to hyperspectral imagery is the estimation of the abundances. The abundances are the percentage of each end member within a given pixel; therefore, they should be non-negative and sum to one. With the advent of kernel based algorithms for hyperspectral imagery, kernel based abundance estimates have become necessary. This paper presents such an algorithm that estimates the abundances in the kernel feature space while maintaining the non-negativity and sum-to-one constraints. The usefulness of the algorithm is shown using the AVIRIS Cuprite, Nevada image. %B Geoscience and Remote Sensing Symposium, 2007. IGARSS 2007. IEEE International %P 4041 - 4044 %8 2007/07// %G eng %R 10.1109/IGARSS.2007.4423736 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %D 2007 %T Microphone Arrays as Generalized Cameras for Integrated Audio Visual Processing %A O'Donovan,A. %A Duraiswami, Ramani %A Neumann, J. %K algorithms;generalized %K arrays;scene %K arrays;sensor %K audio %K audio-visual %K cameras;geometrical %K fusion; %K fusion;computer-vision %K geometry;sound %K information %K information;geometry %K inspired %K location;acoustic %K processing;array %K processing;audio-visual %K processing;microphone %K sensors;integrated %K signal %K sources;source %K systems;cameras;computer %K vision;geometry;microphone %K visual %X Combinations of microphones and cameras allow the joint audio visual sensing of a scene. Such arrangements of sensors are common in biological organisms and in applications such as meeting recording and surveillance where both modalities are necessary to provide scene understanding. Microphone arrays provide geometrical information on the source location, and allow the sound sources in the scene to be separated and the noise suppressed, while cameras allow the scene geometry and the location and motion of people and other objects to be estimated. In most previous work the fusion of the audio-visual information occurs at a relatively late stage. In contrast, we take the viewpoint that both cameras and microphone arrays are geometry sensors, and treat the microphone arrays as generalized cameras. We employ computer-vision inspired algorithms to treat the combined system of arrays and cameras. In particular, we consider the geometry introduced by a general microphone array and spherical microphone arrays. The latter show a geometry that is very close to central projection cameras, and we show how standard vision based calibration algorithms can be profitably applied to them. Experiments are presented that demonstrate the usefulness of the considered approach. %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %P 1 - 8 %8 2007/06// %G eng %R 10.1109/CVPR.2007.383345 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %D 2007 %T Multimodal Tracking for Smart Videoconferencing and Video Surveillance %A Zotkin,Dmitry N %A Raykar,V.C. %A Duraiswami, Ramani %A Davis, Larry S. 
%K (numerical %K 3D %K algorithm;smart %K analysis;least %K approximations;particle %K arrays;nonlinear %K cameras;multiple %K Carlo %K estimator;multimodal %K filter;self-calibration %K Filtering %K least %K likelihood %K methods);teleconferencing;video %K methods;image %K microphone %K MOTION %K motion;Monte-Carlo %K problem;particle %K processing;video %K signal %K simulations;maximum %K squares %K surveillance; %K surveillance;Monte %K tracking;multiple %K videoconferencing;video %X Many applications require the ability to track the 3-D motion of the subjects. We build a particle filter based framework for multimodal tracking using multiple cameras and multiple microphone arrays. In order to calibrate the resulting system, we propose a method to determine the locations of all microphones using at least five loudspeakers and under the assumption that for each loudspeaker there exists a microphone very close to it. We derive the maximum likelihood (ML) estimator, which reduces to the solution of the non-linear least squares problem. We verify the correctness and robustness of the multimodal tracker and of the self-calibration algorithm both with Monte-Carlo simulations and on real data from three experimental setups. %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %P 1 - 2 %8 2007/06// %G eng %R 10.1109/CVPR.2007.383525 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %D 2007 %T Objects in Action: An Approach for Combining Action Understanding and Object Perception %A Gupta,A. %A Davis, Larry S. %K analysis;image %K approach;action %K Bayesian %K classification;image %K classification;object %K framework;Bayes %K interactions;inference %K interpretation %K localization;object %K methods;gesture %K MOTION %K movements;human %K perception; %K perception;human-object %K perception;object %K process;object %K processing;visual %K recognition;image %K recognition;video %K segmentation;action %K segmentation;object %K signal %K understanding;human %X Analysis of videos of human-object interactions involves understanding human movements, locating and recognizing objects and observing the effects of human movements on those objects. While each of these can be conducted independently, recognition improves when interactions between these elements are considered. Motivated by psychological studies of human perception, we present a Bayesian approach which unifies the inference processes involved in object classification and localization, action understanding and perception of object reaction. Traditional approaches for object classification and action understanding have relied on shape features and movement analysis respectively. By placing object classification and localization in a video interpretation framework, we can localize and classify objects which are either hard to localize due to clutter or hard to recognize due to lack of discriminative features. Similarly, by applying context on human movements from the objects on which these movements impinge and the effects of these movements, we can segment and recognize actions which are either too subtle to perceive or too hard to recognize using motion features alone. %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %P 1 - 8 %8 2007/06// %G eng %R 10.1109/CVPR.2007.383331 %0 Conference Paper %B Computer Vision, 2007. ICCV 2007.
IEEE 11th International Conference on %D 2007 %T Probabilistic Fusion Tracking Using Mixture Kernel-Based Bayesian Filtering %A Han,Bohyung %A Joo,Seong-Wook %A Davis, Larry S. %K (numerical %K adaptive %K arrangement %K Bayesian %K Filtering %K filtering;multiple %K filters;probabilistic %K fusion %K fusion;tracking; %K integration;mixture %K kernel-based %K methods);sensor %K methods;array %K particle %K processing;particle %K sensors;object %K signal %K system;blind %K techniques;visual %K tracking;Bayes %K tracking;particle %K tracking;sensor %X Even though sensor fusion techniques based on particle filters have been applied to object tracking, their implementations have been limited to combining measurements from multiple sensors by the simple product of individual likelihoods. Therefore, the number of observations is increased as many times as the number of sensors, and the combined observation may become unreliable through blind integration of sensor observations - especially if some sensors are too noisy and non-discriminative. We describe a methodology to model interactions between multiple sensors and to estimate the current state by using a mixture of Bayesian filters - one filter for each sensor, where each filter makes a different level of contribution to estimate the combined posterior in a reliable manner. In this framework, an adaptive particle arrangement system is constructed in which each particle is allocated to only one of the sensors for observation and a different number of samples is assigned to each sensor using prior distribution and partial observations. We apply this technique to visual tracking in logical and physical sensor fusion frameworks, and demonstrate its effectiveness through tracking results. %B Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on %P 1 - 8 %8 2007/10// %G eng %R 10.1109/ICCV.2007.4408938 %0 Conference Paper %B Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on %D 2007 %T Robust Visual Tracking Using the Time-Reversibility Constraint %A Wu,Hao %A Chellapa, Rama %A Sankaranarayanan,A. C %A Zhou,S. K %K backward %K constraint;video %K criterion;state %K forward %K frame %K KLT %K processing; %K processing;video %K processing;visual %K signal %K tracker;minimization %K tracking;minimisation;video %K vectors;time-reversibility %X Visual tracking is a very important front-end to many vision applications. We present a new framework for robust visual tracking in this paper. Instead of just looking forward in the time domain, we incorporate both forward and backward processing of video frames using a novel time-reversibility constraint. This leads to a new minimization criterion that combines the forward and backward similarity functions and the distances of the state vectors between the forward and backward states of the tracker. The new framework reduces the possibility of the tracker getting stuck in local minima and significantly improves the tracking robustness and accuracy. Our approach is general enough to be incorporated into most of the current tracking algorithms. We illustrate the improvements due to the proposed approach for the popular KLT tracker and a search based tracker. The experimental results show that the improved KLT tracker significantly outperforms the original KLT tracker. The time-reversibility constraint used for tracking can be incorporated to improve the performance of optical flow, mean shift tracking and other algorithms. %B Computer Vision, 2007. ICCV 2007. 
IEEE 11th International Conference on %P 1 - 8 %8 2007/10// %G eng %R 10.1109/ICCV.2007.4408956 %0 Journal Article %J Multimedia, IEEE Transactions on %D 2007 %T Target Tracking Using a Joint Acoustic Video System %A Cevher, V. %A Sankaranarayanan,A. C %A McClellan, J.H. %A Chellapa, Rama %K (numerical %K acoustic %K adaptive %K appearance %K approach;synchronization;time-delay %K data %K delay;acoustic %K divergence;acoustic %K estimate;joint %K estimation;hidden %K feature %K filter;sliding %K Filtering %K fusion;multitarget %K fusion;synchronisation;target %K highways;direction-of-arrival %K Kullback-Leibler %K methods);sensor %K model;particle %K processing; %K processing;automated %K propagation %K removal;optical %K signal %K system;multimodal %K tracking;acoustic %K tracking;direction-of-arrival %K tracking;occlusion;online %K tracking;particle %K tracking;video %K variable;visual %K video %K window;state-space %X In this paper, a multitarget tracking system for collocated video and acoustic sensors is presented. We formulate the tracking problem using a particle filter based on a state-space approach. We first discuss the acoustic state-space formulation whose observations use a sliding window of direction-of-arrival estimates. We then present the video state space that tracks a target's position on the image plane based on online adaptive appearance models. For the joint operation of the filter, we combine the state vectors of the individual modalities and also introduce a time-delay variable to handle the acoustic-video data synchronization issue, caused by acoustic propagation delays. A novel particle filter proposal strategy for joint state-space tracking is introduced, which places the random support of the joint filter where the final posterior is likely to lie. By using the Kullback-Leibler divergence measure, it is shown that the joint operation of the filter decreases the worst case divergence of the individual modalities. The resulting joint tracking filter is quite robust against video and acoustic occlusions due to our proposal strategy. Computer simulations are presented with synthetic and field data to demonstrate the filter's performance. %B Multimedia, IEEE Transactions on %V 9 %P 715 - 727 %8 2007/06// %@ 1520-9210 %G eng %N 4 %R 10.1109/TMM.2007.893340 %0 Conference Paper %B Image Analysis and Processing, 2007. ICIAP 2007. 14th International Conference on %D 2007 %T Video Biometrics %A Chellapa, Rama %A Aggarwal,G. %K (access %K analysis;video %K biometrics;biometrics %K control);face %K dynamics;ofbiometric %K images;surveillance %K inherent %K MOTION %K processing; %K recognition;image %K recognition;still %K scenarios;unconstrained %K scenarios;video %K signal %X A strong requirement to come up with secure and user-friendly ways to authenticate and identify people, to safeguard their rights and interests, has probably been the main guiding force behind biometrics research. Though a vast amount of research has been done to recognize humans based on still images, the problem is still far from solved for unconstrained scenarios. This has led to an increased interest in using video for the task of biometric recognition. Not only does video provide more information, but it is also more suitable for recognizing humans in general surveillance scenarios. Other than the multitude of still frames, video makes it possible to characterize biometrics based on inherent dynamics, such as gait, which is not possible with still images.
In this paper, we describe several recent algorithms to illustrate the usefulness of video in identifying humans. A brief discussion on remaining challenges is also included. %B Image Analysis and Processing, 2007. ICIAP 2007. 14th International Conference on %P 363 - 370 %8 2007/09// %G eng %R 10.1109/ICIAP.2007.4362805 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on %D 2006 %T An Adaptive Threshold Method for Hyperspectral Target Detection %A Broadwater, J. %A Chellapa, Rama %K adaptive %K blind %K detection; %K detection;inverse %K importance %K method;background %K processing;importance %K sampling;geophysical %K sampling;object %K signal %K statistics;hyperspectral %K target %K threshold %X In this paper, we present a new approach to automatically determine a detector threshold. This research problem is especially important in hyperspectral target detection as targets are typically very similar to the background. While a number of methods exist to determine the threshold, these methods require either large amounts of data or make simplifying assumptions about the background distribution. We use a method called inverse blind importance sampling which requires few samples and makes no a priori assumptions about the background statistics. Results show the promise of this algorithm to determine thresholds for fixed false alarm densities in hyperspectral detectors. %B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on %V 5 %P V - V %8 2006/05// %G eng %R 10.1109/ICASSP.2006.1661497 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on %D 2006 %T Headphone-Based Reproduction of 3D Auditory Scenes Captured by Spherical/Hemispherical Microphone Arrays %A Zhiyun Li %A Duraiswami, Ramani %K 3D %K analysis;headphones;microphone %K arrays;orthogonal %K arrays;spatial %K auditory %K beam-space;spatial %K filter;spherical %K filters; %K function;headphone-based %K harmonics;array %K microphone %K processing;audio %K processing;harmonic %K related %K reproduction;hemispherical %K scenes;head %K signal %K transfer %X We propose a method to reproduce 3D auditory scenes captured by spherical microphone arrays over headphones. This algorithm employs expansions of the captured sound and the head related transfer function over the sphere and uses the orthonormality of the spherical harmonics. Using a spherical microphone array, we first record the 3D auditory scene; the recordings are then spatially filtered and reproduced through headphones in the orthogonal beam-space of the head related transfer functions (HRTFs). We use the KEMAR HRTF measurements to verify our algorithm. %B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on %V 5 %P V - V %8 2006/05// %G eng %R 10.1109/ICASSP.2006.1661281 %0 Conference Paper %B Image Processing, 2006 IEEE International Conference on %D 2006 %T Invariant Geometric Representation of 3D Point Clouds for Registration and Matching %A Biswas,S. %A Aggarwal,G.
%A Chellapa, Rama %K 3D %K cloud;computer %K function %K geometric %K graphics;geophysical %K graphics;image %K Interpolation %K matching;image %K point %K processing;image %K reconstruction;image %K registration;image %K registration;implicit %K representation;interpolation; %K representation;variational %K signal %K technique;clouds;computer %K value;invariant %X Though implicit representations of surfaces have often been used for various computer graphics tasks like modeling and morphing of objects, they have rarely been used for registration and matching of 3D point clouds. Unlike in graphics, where the goal is precise reconstruction, we use isosurfaces to derive a smooth and approximate representation of the underlying point cloud which helps in generalization. Implicit surfaces are generated using a variational interpolation technique. Implicit function values on a set of concentric spheres around the 3D point cloud of the object are used as features for matching. Geometric invariance is achieved by decomposing the implicit-value-based feature set into various spherical harmonics. The decomposition provides a compact representation of 3D point clouds while achieving rotation invariance. %B Image Processing, 2006 IEEE International Conference on %P 1209 - 1212 %8 2006/10// %G eng %R 10.1109/ICIP.2006.312542 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on %D 2006 %T Motion Based Correspondence for 3D Tracking of Multiple Dim Objects %A Veeraraghavan,A. %A Srinivasan, M. %A Chellapa, Rama %A Baird, E. %A Lamont, R. %K 3D %K analysis;motion %K analysis;video %K based %K cameras;feature %K correspondence;motion %K dim %K extraction;image %K extraction;multiple %K features %K MOTION %K objects;video %K processing; %K signal %K tracking;motion %X Tracking multiple objects in a video is a demanding task that is frequently encountered in several systems such as surveillance and motion analysis. The ability to track objects in 3D requires the use of multiple cameras. While tracking multiple objects using multiple video cameras, establishing correspondence between objects in the various cameras is a nontrivial task. Specifically, when the targets are dim or are very far away from the camera, appearance cannot be used in order to establish this correspondence. Here, we propose a technique to establish correspondence across cameras using the motion features extracted from the targets, even when the relative position of the cameras is unknown. Experimental results are provided for the problem of tracking multiple bees in natural flight using two cameras. The reconstructed 3D flight paths of the bees show some interesting flight patterns. %B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on %V 2 %P II - II %8 2006/05// %G eng %R 10.1109/ICASSP.2006.1660431 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on %D 2006 %T Non-Intrusive Forensic Analysis of Visual Sensors Using Output Images %A Swaminathan,A. %A M. Wu %A Liu,K.
J.R %K algorithms;interpolation %K analysis;image %K analysis;output %K array %K cameras;forensic %K Color %K colour %K engineering;forensic %K forensic %K images;visual %K methods;nonintrusive %K PROCESSING %K sensor;digital %K sensors;cameras;image %K sensors;interpolation; %K signal %X This paper considers the problem of non-intrusive forensic analysis of the individual components in visual sensors and its implementation. As a new addition to the emerging area of forensic engineering, we present a framework for analyzing technologies employed inside digital cameras based on output images, and develop a set of forensic signal processing algorithms for visual sensors based on color array sensor and interpolation methods. We show through simulations that the proposed method is robust against compression and noise, and can help identify various processing components inside the camera. Such a non-intrusive forensic framework would provide useful evidence for analyzing technology infringement and evolution for visual sensors. %B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on %V 5 %P V - V %8 2006/05// %G eng %R 10.1109/ICASSP.2006.1661297 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on %D 2006 %T Security Issues in Cooperative Communications: Tracing Adversarial Relays %A Mao,Yinian %A M. Wu %K adaptive %K Communication %K communications;adaptive %K construction;multiple %K detection;adversarial %K detection;decoding;radiocommunication;telecommunication %K issues;strategic %K nodes;pseudo-random %K relay %K relay;decode-and-forward %K relays;cooperative %K relays;wireless %K security; %K sequence %K signal %K spectrum %K spread %K strategy;direct %K symbol %K symbols;security %K system;cooperative %K tracing %X Cooperative communication systems explore a new dimension of diversity in wireless communications to combat unfriendly wireless environments through strategic relays. While this emerging technology is promising in improving communication quality, some security problems inherent to cooperative relay also arise. In this paper we investigate the security issues in cooperative communications that consist of multiple relay nodes using a decode-and-forward strategy. In particular, we consider the situation where one of the relay nodes is adversarial and tries to corrupt the communications by sending garbled signals. We show that conventional physical-layer signal detection will not be effective in such a scenario, and application-layer cryptography alone is not sufficient to identify the adversarial relay. To combat an adversarial relay, we propose a cross-layer scheme that uses pseudo-random tracing symbols, with an adaptive signal detection rule at the physical layer, and direct sequence spread spectrum symbol construction at the application layer for tracing and identifying the adversarial relay. Our experimental simulations show that the proposed tracing scheme is effective and efficient. %B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings.
2006 IEEE International Conference on %V 4 %P IV - IV %8 2006/05// %G eng %R 10.1109/ICASSP.2006.1660907 %0 Journal Article %J Image Processing, IEEE Transactions on %D 2006 %T Structure From Planar Motion %A Li,Jian %A Chellapa, Rama %K algebra;road %K analysis;matrix %K camera;surveillance %K directional %K matrix;planar %K MOTION %K motion;stationary %K perspective %K processing; %K reconstruction %K signal %K system;image %K uncertainty;measurement %K vehicles;surveillance;video %K videos;vehicle %X Planar motion is arguably the most dominant type of motion in surveillance videos. The constraints on motion lead to a simplified factorization method for structure from planar motion when using a stationary perspective camera. Compared with methods for general motion, our approach has two major advantages: a measurement matrix that fully exploits the motion constraints is formed such that the new measurement matrix has a rank of at most 3, instead of 4; the measurement matrix needs similar scalings, but the estimation of fundamental matrices or epipoles is not needed. Experimental results show that the algorithm is accurate and fairly robust to noise and inaccurate calibration. As the new measurement matrix is a nonlinear function of the observed variables, a different method is introduced to deal with the directional uncertainty in the observed variables. Differences and the dual relationship between planar motion and a planar object are also clarified. Based on our method, a fully automated vehicle reconstruction system has been designed. %B Image Processing, IEEE Transactions on %V 15 %P 3466 - 3477 %8 2006/11// %@ 1057-7149 %G eng %N 11 %R 10.1109/TIP.2006.881943 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %D 2005 %T Approximate expressions for the mean and the covariance of the maximum likelihood estimator for acoustic source localization %A Raykar,V.C. %A Duraiswami, Ramani %K (mathematics); %K acoustic %K approximate %K approximation %K array %K array; %K covariance %K estimation; %K expansion; %K expressions; %K function; %K likelihood %K localization; %K matrices; %K matrix; %K maximum %K mean %K microphone %K objective %K processing; %K series %K signal %K source %K Taylor %K theory; %K vector; %K vectors; %X Acoustic source localization using multiple microphones can be formulated as a maximum likelihood estimation problem. The estimator is implicitly defined as the minimum of a certain objective function. As a result, we cannot get explicit expressions for the mean and the covariance of the estimator. We derive approximate expressions for the mean vector and covariance matrix of the estimator using a Taylor series expansion of the implicitly defined estimator. The validity of our expressions is verified by Monte-Carlo simulations. We also study the performance of the estimator for different microphone array configurations. %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %V 3 %P iii/73 - iii/76 Vol. 3 - iii/73 - iii/76 Vol. 3 %8 2005/03// %G eng %R 10.1109/ICASSP.2005.1415649 %0 Journal Article %J Signal Processing Magazine, IEEE %D 2005 %T An interactive and team approach to multimedia design curriculum %A M. Wu %A Liu,K.
J.R %K approach; %K communication; %K courses; %K curriculum %K curriculum; %K design %K development; %K digital %K education; %K educational %K interactive %K learning; %K multimedia %K multimedia; %K processing; %K signal %K team %X Over the past decade, increasingly powerful technologies have made it easier to compress, distribute, and store multimedia content. The merger of computing and communications has created a ubiquitous infrastructure that brings digital multimedia closer to the users and opens up tremendous educational and commercial opportunities in multimedia content creation, delivery, rendering, and archiving for millions of users worldwide. Multimedia has become a basic skill demanded by an increasing number of potential jobs for electrical engineering/computer science graduates. In this article, the authors intend to share their experiences and new ways of thinking about curriculum development. It is beneficial for colleagues in the multimedia signal processing areas for use in developing or revising the curriculum to fit the needs and resources of their own programs. %B Signal Processing Magazine, IEEE %V 22 %P 14 - 19 %8 2005/11// %@ 1053-5888 %G eng %N 6 %R 10.1109/MSP.2005.1550186 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %D 2005 %T Moving Object Segmentation and Dynamic Scene Reconstruction Using Two Frames %A Agrawala, Ashok K. %A Chellapa, Rama %K 3D %K analysis; %K constraints; %K dynamic %K ego-motion %K estimation; %K flow %K image %K images; %K independent %K INTENSITY %K least %K mean %K median %K method; %K methods; %K model; %K MOTION %K motion; %K moving %K object %K of %K parallax %K parallax; %K parametric %K processing; %K reconstruction; %K scene %K segmentation; %K signal %K squares %K squares; %K static %K structure; %K subspace %K surface %K translational %K two-frame %K unconstrained %K video %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %V 2 %P 705 - 708 %8 2005//18/23 %G eng %R 10.1109/ICASSP.2005.1415502 %0 Conference Paper %B Information Fusion, 2005 8th International Conference on %D 2005 %T A new approach to image fusion based on cokriging %A Memarsadeghi,N. %A Le Moigne,J. %A Mount, Dave %A Morisette,J. %K ALI; %K analysis; %K based %K cokriging; %K component %K data; %K forecasting %K fusion %K fusion; %K geophysical %K geostatistical %K Hyperion %K image %K Interpolation %K interpolation; %K invasive %K ISFS %K method; %K metrics; %K PCA; %K principal %K processing; %K project; %K QUALITY %K quantitative %K remote %K remotely %K sensed %K sensing; %K sensor %K sensors; %K signal %K species %K system; %K techniques; %K transforms; %K wavelet %K wavelet-based %X We consider the image fusion problem involving remotely sensed data. We introduce cokriging as a method to perform fusion. We investigate the advantages of fusing Hyperion with ALI. This evaluation is performed by comparing the classification of the fused data with that of input images and by calculating well-chosen quantitative fusion quality metrics. We consider the invasive species forecasting system (ISFS) project as our fusion application. The fusion of ALI with Hyperion data is studied using PCA and wavelet-based fusion. We then propose utilizing a geostatistical based interpolation method called cokriging as a new approach for image fusion. %B Information Fusion, 2005 8th International Conference on %V 1 %P 8 pp. - 8 pp. 
%8 2005/07// %G eng %R 10.1109/ICIF.2005.1591912 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %D 2005 %T A robust and self-reconfigurable design of spherical microphone array for multi-resolution beamforming %A Zhiyun Li %A Duraiswami, Ramani %K 3D %K anti-terrorism; %K array %K array; %K arrays; %K audio %K beam %K beamforming; %K beampattern %K directivity %K Frequency %K microphone %K multiresolution %K omnidirectional %K optimisation; %K optimization; %K processing; %K reorganization %K response; %K robustness; %K sampling; %K self-reconfigurable %K signal %K soundfield %K spherical %K steering; %X We describe a robust and self-reconfigurable design of a spherical microphone array for beamforming. Our approach achieves a multi-resolution spherical beamformer with performance that is optimal either in approximating the desired beampattern or in the directivity achieved, robustly in both cases. Our implementation converges to the optimal performance quickly while exactly satisfying the specified frequency response and robustness constraint in each iteration step without accumulated round-off errors. The advantage of this design lies in its robustness and self-reconfiguration in microphone array reorganization, such as microphone failure, which is highly desirable in online maintenance and anti-terrorism. Design examples and simulation results are presented. %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %V 4 %P iv/1137 - iv/1140 Vol. 4 - iv/1137 - iv/1140 Vol. 4 %8 2005/03// %G eng %R 10.1109/ICASSP.2005.1416214 %0 Conference Paper %B Image Processing, 2005. ICIP 2005. IEEE International Conference on %D 2005 %T Tracking objects in video using motion and appearance models %A Sankaranarayanan,A. C %A Chellapa, Rama %A Qinfen Zheng %K algorithm; %K analysis; %K appearance %K background %K estimation; %K image %K likelihood %K maximum %K model; %K models; %K MOTION %K object %K processing; %K signal %K target %K tracking %K tracking; %K video %K visual %X This paper proposes a visual tracking algorithm that combines motion and appearance in a statistical framework. It is assumed that image observations are generated simultaneously from a background model and a target appearance model. This is different from conventional appearance-based tracking, which does not use motion information. The proposed algorithm attempts to maximize the likelihood ratio of the tracked region, derived from appearance and background models. Incorporation of motion in appearance based tracking provides robust tracking, even when the target violates the appearance model. We show that the proposed algorithm performs well in tracking targets efficiently over long time intervals. %B Image Processing, 2005. ICIP 2005. IEEE International Conference on %V 2 %P II - 394-7 - II - 394-7 %8 2005/09// %G eng %R 10.1109/ICIP.2005.1530075 %0 Conference Paper %B Advanced Video and Signal Based Surveillance, 2005. AVSS 2005. IEEE Conference on %D 2005 %T VidMAP: video monitoring of activity with Prolog %A Shet,V. D %A Harwood,D. %A Davis, Larry S.
%K activities %K algorithms; %K based %K Computer %K computerised %K engine; %K higher %K image %K level %K Logic %K monitoring; %K multicamera %K processing; %K programming; %K Prolog %K PROLOG; %K reasoning %K recognition; %K scenario; %K signal %K streaming; %K streams; %K Surveillance %K surveillance; %K system; %K video %K VISION %K vision; %K visual %X This paper describes the architecture of a visual surveillance system that combines real time computer vision algorithms with logic programming to represent and recognize activities involving interactions amongst people, packages and the environments through which they move. The low level computer vision algorithms log primitive events of interest as observed facts, while the higher level Prolog based reasoning engine uses these facts in conjunction with predefined rules to recognize various activities in the input video streams. The system is illustrated in action on a multi-camera surveillance scenario that includes both security and safety violations. %B Advanced Video and Signal Based Surveillance, 2005. AVSS 2005. IEEE Conference on %P 224 - 229 %8 2005/09// %G eng %R 10.1109/AVSS.2005.1577271 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %D 2004 %T Appearance-based tracking and recognition using the 3D trilinear tensor %A Jie Shao %A Zhou,S. K %A Chellapa, Rama %K 3D %K adaptive %K affine-transformation %K airborne %K algorithm; %K appearance %K appearance-based %K based %K estimation; %K geometrical %K image %K mathematical %K novel %K object %K operator; %K operators; %K perspective %K prediction; %K processing; %K recognition; %K representation; %K signal %K structure %K synthesis; %K template %K tensor %K tensor; %K tensors; %K tracking; %K transformation; %K trilinear %K updating; %K video %K video-based %K video; %K view %X The paper presents an appearance-based adaptive algorithm for simultaneous tracking and recognition by generalizing the transformation model to 3D perspective transformation. A trilinear tensor operator is used to represent the 3D geometrical structure. The tensor is estimated by predicting the corresponding points using the existing affine-transformation based algorithm. The estimated tensor is used to synthesize novel views to update the appearance templates. Some experimental results using airborne video are presented. %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %V 3 %P iii - 613-16 vol.3 - iii - 613-16 vol.3 %8 2004/05// %G eng %R 10.1109/ICASSP.2004.1326619 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %D 2004 %T Automatic position calibration of multiple microphones %A Raykar,V.C. %A Duraiswami, Ramani %K approximations; %K array %K audio %K AUTOMATIC %K calibration; %K closed %K covariance; %K dimensional %K estimation; %K form %K function %K implicit %K least %K likelihood %K loudspeakers; %K maximum %K microphone %K microphones; %K minimisation; %K minimization; %K multiple %K nonlinear %K position %K positions; %K problem; %K processing; %K signal %K solution; %K squares %K theorem; %K three %X We describe a method to determine automatically the relative three dimensional positions of multiple microphones using at least five loudspeakers in unknown positions. The only assumption we make is that there is a microphone which is very close to a loudspeaker. 
In our experimental setup, we attach one microphone to each loudspeaker. We derive the maximum likelihood estimator and the solution turns out to be a non-linear least squares problem. A closed form solution which can be used as the initial guess for the minimization routine is derived. We also derive an approximate expression for the covariance of the estimator using the implicit function theorem. Using this, we analyze the performance of the estimator with respect to the positions of the loudspeakers. The algorithm is validated using both Monte-Carlo simulations and a real-time experimental setup. %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %V 4 %P iv-69 - iv-72 vol.4 - iv-69 - iv-72 vol.4 %8 2004/05// %G eng %R 10.1109/ICASSP.2004.1326765 %0 Conference Paper %B Image Processing, 2004. ICIP '04. 2004 International Conference on %D 2004 %T A fine-structure image/video quality measure using local statistics %A Kim,Kyungnam %A Davis, Larry S. %K algorithm; %K background %K degradation; %K detection; %K foreground %K image %K line-structure %K local %K measure; %K modeling; %K no-reference %K object %K objective %K processing; %K QUALITY %K signal %K statistics; %K subtraction %K surveillance; %K video %X An objective no-reference measure is presented to assess line-structure image/video quality. It was designed to measure image/video quality for video surveillance applications, especially for background modeling and foreground object detection. The proposed measure using local statistics reflects image degradation well in terms of noise and blur. The experimental results on a background subtraction algorithm validate the usefulness of the proposed measure, by showing its correlation with the algorithm's performance. %B Image Processing, 2004. ICIP '04. 2004 International Conference on %V 5 %P 3535 - 3538 Vol. 5 - 3535 - 3538 Vol. 5 %8 2004/10// %G eng %R 10.1109/ICIP.2004.1421879 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %D 2004 %T Flexible layout and optimal cancellation of the orthonormality error for spherical microphone arrays %A Zhiyun Li %A Duraiswami, Ramani %A Grassi,E. %A Davis, Larry S. %K array %K audio %K base; %K beamforming; %K cable %K cancellation %K correction; %K error %K flexible %K harmonics; %K higher %K layout; %K microphone %K microphones; %K mounting %K optimization; %K order %K orthonormality %K outlets; %K processing; %K signal %K spherical %K surface; %X This paper describes an approach to achieving a flexible layout of microphones on the surface of a spherical microphone array for beamforming. Our approach achieves orthonormality of spherical harmonics to higher order for relatively distributed layouts. This gives great flexibility in microphone layout on the spherical surface. One direct advantage is that it makes it much easier to build a real world system, such as those with cable outlets and a mounting base, with minimal effects on the performance. Simulation results are presented. %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %V 4 %P iv-41 - iv-44 vol.4 - iv-41 - iv-44 vol.4 %8 2004/05// %G eng %R 10.1109/ICASSP.2004.1326758 %0 Conference Paper %B Geoscience and Remote Sensing Symposium, 2004. IGARSS '04. Proceedings. 2004 IEEE International %D 2004 %T A hybrid algorithm for subpixel detection in hyperspectral imagery %A Broadwater, J. 
%A Meth, R. %A Chellapa, Rama %K alarm %K algorithm;abundance %K algorithm;Fully %K algorithm;hybrid %K algorithm;statistical %K AMSD;Adaptive %K analysis;structured %K approximations;maximum %K backgrounds;subpixel %K constrained %K detection;alarm %K detection;target %K Detector;FCLS %K detector;hyperspectral %K estimation; %K estimation;emittance %K identification;adaptive %K imagery;reflectance %K least %K likelihood %K Matched %K matching;least %K processing;geophysical %K rate;generalized %K ratio %K signal %K signatures;spectral %K spectra;false %K spectra;spectral %K squares %K subspace %K systems;geophysical %K techniques;image %K tests;hybrid %K unmixing %X Numerous subpixel detection algorithms utilizing structured backgrounds have been developed over the past few years. These range from detection schemes based on spectral unmixing to generalized likelihood ratio tests. Spectral unmixing algorithms such as the Fully Constrained Least Squares (FCLS) algorithm have the advantage of physically modeling the interactions of spectral signatures based on reflectance/emittance spectroscopy. Generalized likelihood ratio tests like the Adaptive Matched Subspace Detector (AMSD) have the advantage of identifying targets that are statistically different from the background. Therefore, a hybrid detector based on both AMSD and FCLS was developed to take advantage of each detector's strengths. Results demonstrate that the hybrid detector achieved the lowest false alarm rates while also producing meaningful abundance estimates. %B Geoscience and Remote Sensing Symposium, 2004. IGARSS '04. Proceedings. 2004 IEEE International %V 3 %P 1601 -1604 vol.3 - 1601 -1604 vol.3 %8 2004/09// %G eng %R 10.1109/IGARSS.2004.1370633 %0 Conference Paper %B Automatic Face and Gesture Recognition, 2004. Proceedings. Sixth IEEE International Conference on %D 2004 %T Illuminating light field: image-based face recognition across illuminations and poses %A Zhou,Shaohua %A Chellapa, Rama %K Face %K field; %K illuminating %K image-based %K Lambertain %K light %K lighting; %K model; %K multidimensional %K poses; %K processing; %K recognition; %K reflectance %K reflectivity; %K signal %X We present an image-based method for face recognition across different illuminations and different poses, where the term 'image-based' means that only 2D images are used and no explicit 3D models are needed. As face recognition across illuminations and poses involves three factors, namely identity, illumination, and pose, generalizations from known identities to novel identities, from known illuminations to novel illuminations, and from known poses to unknown poses are desired. Our approach, called the illuminating light field, derives an identity signature that is invariant to illuminations and poses, where a subspace encoding is assumed for the identity, a Lambertian reflectance model for the illumination, and a light field model for the poses. Experimental results using the PIE database demonstrate the effectiveness of the proposed approach. %B Automatic Face and Gesture Recognition, 2004. Proceedings. Sixth IEEE International Conference on %P 229 - 234 %8 2004/05// %G eng %R 10.1109/AFGR.2004.1301536 %0 Conference Paper %B Image Processing, 2004. ICIP '04. 2004 International Conference on %D 2004 %T Multiple view tracking of humans modelled by kinematic chains %A Sundaresan, A. %A Chellapa, Rama %A RoyChowdhury, R.
%K 3D %K algorithm; %K analysis; %K body %K calibrated %K cameras; %K chain %K displacement; %K error %K estimation; %K human %K image %K iterative %K kinematic %K kinematics; %K methods; %K model; %K MOTION %K motion; %K multiple %K parameters; %K perspective %K Pixel %K processing; %K projection %K sequences; %K signal %K tracking; %K video %K view %X We use a kinematic chain to model human body motion. We estimate the kinematic chain motion parameters using pixel displacements calculated from video sequences obtained from multiple calibrated cameras to perform tracking. We derive a linear relation between the 2D motion of pixels and the 3D motion parameters of various body parts using a perspective projection model for the cameras, a rigid body motion model for the base body and the kinematic chain model for the body parts. An error analysis of the estimator is provided, leading to an iterative algorithm for calculating the motion parameters from the pixel displacements. We provide experimental results to demonstrate the accuracy of our formulation. We also compare our iterative algorithm to the noniterative algorithm and discuss its robustness in the presence of noise. %B Image Processing, 2004. ICIP '04. 2004 International Conference on %V 2 %P 1009 - 1012 Vol.2 - 1009 - 1012 Vol.2 %8 2004/10// %G eng %R 10.1109/ICIP.2004.1419472 %0 Conference Paper %B Mixed and Augmented Reality, 2004. ISMAR 2004. Third IEEE and ACM International Symposium on %D 2004 %T Recording and reproducing high order surround auditory scenes for mixed and augmented reality %A Zhiyun Li %A Duraiswami, Ramani %A Davis, Larry S. %K array; %K audio %K auditory %K augmented %K Computer %K graphics; %K high %K loudspeaker %K microphone %K mixed %K order %K processing; %K reality %K reality; %K scene; %K signal %K surround %K system; %K technology; %K virtual %K VISION %K vision; %X Virtual reality systems are largely based on computer graphics and vision technologies. However, sound also plays an important role in humans' interaction with the surrounding environment, especially for visually impaired people. In this paper, we develop the theory of recording and reproducing real-world surround auditory scenes in high orders using specially designed microphone and loudspeaker arrays. It is complementary to vision-based technologies in creating mixed and augmented realities. Design examples and simulations are presented. %B Mixed and Augmented Reality, 2004. ISMAR 2004. Third IEEE and ACM International Symposium on %P 240 - 249 %8 2004/11// %G eng %R 10.1109/ISMAR.2004.51 %0 Journal Article %J Multimedia, IEEE Transactions on %D 2004 %T Rendering localized spatial audio in a virtual auditory space %A Zotkin,Dmitry N %A Duraiswami, Ramani %A Davis, Larry S. %K (computer %K 3-D %K audio %K audio; %K auditory %K augmented %K data %K environments; %K functions; %K graphics); %K Head %K interfaces; %K perceptual %K processing; %K reality %K reality; %K related %K rendering %K rendering; %K scene %K signal %K sonification; %K spaces; %K spatial %K transfer %K user %K virtual %X High-quality virtual audio scene rendering is required for emerging virtual and augmented reality applications, perceptual user interfaces, and sonification of data. We describe algorithms for creation of virtual auditory spaces by rendering cues that arise from anatomical scattering, environmental scattering, and dynamical effects.
We use a novel way of personalizing the head related transfer functions (HRTFs) from a database, based on anatomical measurements. Details of algorithms for HRTF interpolation, room impulse response creation, HRTF selection from a database, and audio scene presentation are presented. Our system runs in real time on an office PC without specialized DSP hardware. %B Multimedia, IEEE Transactions on %V 6 %P 553 - 564 %8 2004/08// %@ 1520-9210 %G eng %N 4 %R 10.1109/TMM.2004.827516 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %D 2004 %T Robust two-camera tracking using homography %A Yue,Zhanfeng %A Zhou,S. K %A Chellapa, Rama %K Carlo %K filter; %K filters; %K frame %K framework; %K homography; %K image %K method; %K methods; %K Monte %K nonlinear %K occlusions; %K optical %K particle %K processing; %K robust %K sequences; %K sequential %K signal %K statistics; %K tracking %K tracking; %K two %K two-camera %K video %K view %K visual %X The paper introduces a two view tracking method which uses the homography relation between the two views to handle occlusions. An adaptive appearance-based model is incorporated in a particle filter to realize robust visual tracking. Occlusion is detected using robust statistics. When there is occlusion in one view, the homography from this view to other views is estimated from previous tracking results and used to infer the correct transformation for the occluded view. Experimental results show the robustness of the two view tracker. %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %V 3 %P iii - 1-4 vol.3 - iii - 1-4 vol.3 %8 2004/05// %G eng %R 10.1109/ICASSP.2004.1326466 %0 Conference Paper %B Image Processing, 2004. ICIP '04. 2004 International Conference on %D 2004 %T Security evaluation for communication-friendly encryption of multimedia %A Mao,Yinian %A M. Wu %K access %K approximation %K atomic %K attacks; %K bitrate %K coding; %K communication-friendly %K communication; %K control; %K cryptography; %K data %K encryption %K encryption; %K generic %K joint %K method; %K metrics; %K multimedia %K multimedia-specific %K overhead; %K primitives; %K processing/cryptographic %K Security %K security; %K signal %K system; %K Telecommunication %K video %X This paper addresses the access control issues unique to multimedia, by using a joint signal processing and cryptographic approach to multimedia encryption. Based on three atomic encryption primitives, we present a systematic study on how to strategically integrate different atomic operations to build a video encryption system. We also propose a set of multimedia-specific security metrics to quantify the security against approximation attacks and to complement the existing notion of generic data security. The resulting system can provide superior performance to both generic encryption and its simple adaptation to video in terms of a joint consideration of security, bitrate overhead, and communication friendliness. %B Image Processing, 2004. ICIP '04. 2004 International Conference on %V 1 %P 569 - 572 Vol. 1 - 569 - 572 Vol. 1 %8 2004/10// %G eng %R 10.1109/ICIP.2004.1418818 %0 Conference Paper %B Image Processing, 2004. ICIP '04. 2004 International Conference on %D 2004 %T Simultaneous background and foreground modeling for tracking in surveillance video %A Shao, J. %A Zhou,S. 
K %A Chellapa, Rama %K algorithm; %K analysis; %K background-foreground %K displacement %K estimation; %K image %K information; %K INTENSITY %K modeling; %K MOTION %K processes; %K processing; %K resolution; %K sequences; %K signal %K Stochastic %K Surveillance %K surveillance; %K tracking %K tracking; %K video %X We present a stochastic tracking algorithm for surveillance video where targets are dim and at low resolution. The algorithm builds motion models for both background and foreground by integrating motion and intensity information. Some other merits of the algorithm include adaptive selection of feature points for scene description and defining proper cost functions for displacement estimation. The experimental results show tracking robustness and precision in challenging video sequences. %B Image Processing, 2004. ICIP '04. 2004 International Conference on %V 2 %P 1053 - 1056 Vol.2 - 1053 - 1056 Vol.2 %8 2004/10// %G eng %R 10.1109/ICIP.2004.1419483 %0 Conference Paper %B Intelligent Transportation Systems, 2004. Proceedings. The 7th International IEEE Conference on %D 2004 %T A spherical microphone array system for traffic scene analysis %A Zhiyun Li %A Duraiswami, Ramani %A Grassi,E. %A Davis, Larry S. %K -6 %K 3D %K analysis; %K array %K arrays; %K audio; %K auditory %K beamformer; %K capture; %K dB; %K environment; %K gain; %K microphone %K NOISE %K noise; %K processing; %K real %K robust %K scene %K signal %K spherical %K system; %K traffic %K traffic; %K virtual %K white %K World %X This paper describes a practical spherical microphone array system for traffic auditory scene capture and analysis. Our system uses 60 microphones positioned on the rigid surface of a sphere. We then propose an optimal design of a robust spherical beamformer with minimum white noise gain (WNG) of -6 dB. We test this system in a real-world traffic environment. Some preliminary simulation and experimental results are presented to demonstrate its performance. This system may also find applications in broader areas such as 3D audio, virtual environments, etc. %B Intelligent Transportation Systems, 2004. Proceedings. The 7th International IEEE Conference on %P 338 - 342 %8 2004/10// %G eng %R 10.1109/ITSC.2004.1398921 %0 Conference Paper %B Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on %D 2004 %T A system identification approach for video-based face recognition %A Aggarwal,G. %A Chowdhury, A.K.R. %A Chellapa, Rama %K and %K autoregressive %K average %K dynamical %K Face %K gallery %K identification; %K image %K linear %K model; %K moving %K processes; %K processing; %K recognition; %K sequences; %K signal %K system %K system; %K video %K video-based %X The paper poses video-to-video face recognition as a dynamical system identification and classification problem. We model a moving face as a linear dynamical system whose appearance changes with pose. An autoregressive and moving average (ARMA) model is used to represent such a system. The choice of an ARMA model is based on its ability to handle changes in appearance while modeling the dynamics of pose, expression, etc. Recognition is performed using the concept of subspace angles to compute distances between probe and gallery video sequences. The results obtained are very promising given the extent of pose, expression and illumination variation in the video data used for experiments. %B Pattern Recognition, 2004. ICPR 2004.
Proceedings of the 17th International Conference on %V 4 %P 175 - 178 Vol.4 - 175 - 178 Vol.4 %8 2004/08// %G eng %R 10.1109/ICPR.2004.1333732 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %D 2004 %T Vehicle detection and tracking using acoustic and video sensors %A Chellapa, Rama %A Qian,Gang %A Qinfen Zheng %K acoustic %K applications; %K audio %K audio-visual %K beam-forming %K Carlo %K chain %K density %K detection; %K direction-of-arrival %K DOA %K empirical %K estimation; %K framework; %K functions; %K fusion %K fusion; %K joint %K Markov %K methods; %K Monte %K moving %K multimodal %K object %K optical %K posterior %K probability %K probability; %K processes; %K processing; %K sensing; %K sensor %K sensors; %K signal %K Surveillance %K surveillance; %K systems; %K target %K techniques; %K tracking; %K vehicle %K video %X Multimodal sensing has attracted much attention in solving a wide range of problems, including target detection, tracking, classification, activity understanding, speech recognition, etc. In surveillance applications, different types of sensors, such as video and acoustic sensors, provide distinct observations of ongoing activities. We present a fusion framework using both video and acoustic sensors for vehicle detection and tracking. In the detection phase, a rough estimate of target direction-of-arrival (DOA) is first obtained using acoustic data through beam-forming techniques. This initial DOA estimate designates the approximate target location in video. Given the initial target position, the DOA is refined by moving target detection using the video data. Markov chain Monte Carlo techniques are then used for joint audio-visual tracking. A novel fusion approach has been proposed for tracking, based on different characteristics of audio and visual trackers. Experimental results using both synthetic and real data are presented. Improved tracking performance has been observed by fusing the empirical posterior probability density functions obtained using both types of sensors. %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %V 3 %P iii - 793-6 vol.3 - iii - 793-6 vol.3 %8 2004/05// %G eng %R 10.1109/ICASSP.2004.1326664 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on %D 2003 %T Activity recognition using the dynamics of the configuration of interacting objects %A Vaswani, N. %A RoyChowdhury, A. %A Chellapa, Rama %K 2D %K abnormal %K abnormality %K abnormality; %K acoustic %K activity %K analysis; %K change; %K Computer %K configuration %K configuration; %K data; %K DETECTION %K detection; %K distribution; %K drastic %K dynamics; %K event; %K filter; %K hand-picked %K image %K infrared %K interacting %K learning; %K location %K low %K mean %K model; %K monitoring; %K MOTION %K moving %K noise; %K noisy %K object %K object; %K observation %K observation; %K particle %K pattern %K plane; %K point %K polygonal %K probability %K probability; %K problem; %K processing; %K radar %K recognition; %K resolution %K sensor; %K sensors; %K sequence; %K SHAPE %K shape; %K signal %K slow %K statistic; %K strategy; %K Surveillance %K surveillance; %K target %K test %K tracking; %K video %K video; %K visible %K vision; %X Monitoring activities using video data is an important surveillance problem. 
A special scenario is to learn the pattern of normal activities and detect abnormal events from a very low resolution video where the moving objects are small enough to be modeled as point objects in a 2D plane. Instead of tracking each point separately, we propose to model an activity by the polygonal 'shape' of the configuration of these point masses at any time t, and its deformation over time. We learn the mean shape and the dynamics of the shape change using hand-picked location data (no observation noise) and define an abnormality detection statistic for the simple case of a test sequence with negligible observation noise. For the more practical case where observation (point locations) noise is large and cannot be ignored, we use a particle filter to estimate the probability distribution of the shape given the noisy observations up to the current time. Abnormality detection in this case is formulated as a change detection problem. We propose a detection strategy that can detect both 'drastic' and 'slow' abnormalities. Our framework can be directly applied for object location data obtained using any type of sensors - visible, radar, infrared or acoustic. %B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on %V 2 %P II - 633-40 vol.2 - II - 633-40 vol.2 %8 2003/06// %G eng %R 10.1109/CVPR.2003.1211526 %0 Conference Paper %B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on %D 2003 %T Adaptive visual tracking and recognition using particle filters %A Zhou,Shaohua %A Chellapa, Rama %A Moghaddam, B. %K adaptive %K adaptive-velocity %K appearance %K extra-personal %K Filtering %K filters; %K image %K intra-personal %K model; %K MOTION %K particle %K processing; %K recognition; %K sequence; %K sequences; %K series %K signal %K spaces; %K theory; %K TIME %K tracking; %K video %K visual %X This paper presents an improved method for simultaneous tracking and recognition of human faces from video, where a time series model is used to resolve the uncertainties in tracking and recognition. The improvements mainly arise from three aspects: (i) modeling the inter-frame appearance changes within the video sequence using an adaptive appearance model and an adaptive-velocity motion model; (ii) modeling the appearance changes between the video frames and gallery images by constructing intra- and extra-personal spaces; and (iii) utilization of the fact that the gallery images are in frontal views. By embedding them in a particle filter, we are able to achieve a stabilized tracker and an accurate recognizer when confronted by pose and illumination variations. %B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on %V 2 %P II - 349-52 vol.2 - II - 349-52 vol.2 %8 2003/07// %G eng %R 10.1109/ICME.2003.1221625 %0 Journal Article %J Signal Processing, IEEE Transactions on %D 2003 %T Anti-collusion fingerprinting for multimedia %A Trappe,W. %A M. Wu %A Wang,Z.J. %A Liu,K. 
J.R %K (mathematics); %K additive %K algorithm; %K and %K anti-collusion %K attack; %K averaging %K binary %K code %K codes; %K codevectors; %K coding; %K colluders %K collusion; %K combinatorial %K communication; %K compression; %K correlation; %K cost-effective %K data %K data; %K design %K DETECTION %K detection; %K digital %K embedding; %K fingerprinting; %K Gaussian %K identification; %K image %K images; %K keying; %K logical %K mathematics; %K Modulation %K modulation; %K multimedia %K multimedia; %K of %K on-off %K operation; %K orthogonal %K processes; %K real %K redistribution; %K Security %K signal %K signals; %K theory; %K tree-structured %K TREES %K watermarking; %X Digital fingerprinting is a technique for identifying users who use multimedia content for unintended purposes, such as redistribution. These fingerprints are typically embedded into the content using watermarking techniques that are designed to be robust to a variety of attacks. A cost-effective attack against such digital fingerprints is collusion, where several differently marked copies of the same content are combined to disrupt the underlying fingerprints. We investigate the problem of designing fingerprints that can withstand collusion and allow for the identification of colluders. We begin by introducing the collusion problem for additive embedding. We then study the effect that averaging collusion has on orthogonal modulation. We introduce a tree-structured detection algorithm for identifying the fingerprints associated with K colluders that requires O(K log(n/K)) correlations for a group of n users. We next develop a fingerprinting scheme based on code modulation that does not require as many basis signals as orthogonal modulation. We propose a new class of codes, called anti-collusion codes (ACCs), which have the property that the composition of any subset of K or fewer codevectors is unique. Using this property, we can therefore identify groups of K or fewer colluders. We present a construction of binary-valued ACC under the logical AND operation that uses the theory of combinatorial designs and is suitable for both the on-off keying and antipodal form of binary code modulation. In order to accommodate n users, our code construction requires only O(√n) orthogonal signals for a given number of colluders. We introduce three different detection strategies that can be used with our ACC for identifying a suspect set of colluders. We demonstrate the performance of our ACC for fingerprinting multimedia and identifying colluders through experiments using Gaussian signals and real images. %B Signal Processing, IEEE Transactions on %V 51 %P 1069 - 1087 %8 2003/04// %@ 1053-587X %G eng %N 4 %R 10.1109/TSP.2003.809378 %0 Conference Paper %B Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on %D 2003 %T An appearance based approach for human and object tracking %A Capellades,M. B %A David Doermann %A DeMenthon,D. %A Chellapa, Rama %K algorithm; %K analysis; %K background %K basis; %K by %K Color %K colour %K correlogram %K detection; %K distributions; %K frame %K histogram %K human %K image %K information; %K object %K processing; %K segmentation; %K sequences; %K signal %K subtraction %K tracking; %K video %X A system for tracking humans and detecting human-object interactions in indoor environments is described. A combination of correlogram and histogram information is used to model object and human color distributions.
Humans and objects are detected using a background subtraction algorithm. The models are built on the fly and used to track them on a frame by frame basis. The system is able to detect when people merge into groups and segment them during occlusion. Identities are preserved during the sequence, even if a person enters and leaves the scene. The system is also able to detect when a person deposits or removes an object from the scene. In the first case the models are used to track the object retroactively in time. In the second case the objects are tracked for the rest of the sequence. Experimental results using indoor video sequences are presented. %B Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on %V 2 %P II - 85-8 vol.3 %8 2003/09// %G eng %R 10.1109/ICIP.2003.1246622 %0 Journal Article %J Image Processing, IEEE Transactions on %D 2003 %T Data hiding in image and video .I. Fundamental issues and solutions %A M. Wu %A Liu,Bede %K adaptive %K analysis; %K bits; %K colour %K condition; %K constant %K CONTROL %K data %K embedded %K EMBEDDING %K embedding; %K encapsulation; %K extractable %K hiding; %K image %K Modulation %K modulation; %K multilevel %K multiplexing %K multiplexing; %K NOISE %K nonstationary %K processing; %K rate; %K reviews; %K shuffling; %K signal %K signals; %K simulation; %K solution; %K techniques; %K variable %K video %K visual %X We address a number of fundamental issues of data hiding in image and video and propose general solutions to them. We begin with a review of two major types of embedding, based on which we propose a new multilevel embedding framework to allow the amount of extractable data to be adaptive according to the actual noise condition. We then study the issues of hiding multiple bits through a comparison of various modulation and multiplexing techniques. Finally, the nonstationary nature of visual signals leads to highly uneven distribution of embedding capacity and causes difficulty in data hiding. We propose an adaptive solution switching between using constant embedding rate with shuffling and using variable embedding rate with embedded control bits. We verify the effectiveness of our proposed solutions through analysis and simulation. %B Image Processing, IEEE Transactions on %V 12 %P 685 - 695 %8 2003/06// %@ 1057-7149 %G eng %N 6 %R 10.1109/TIP.2003.810588 %0 Journal Article %J Image Processing, IEEE Transactions on %D 2003 %T Data hiding in image and video .II. Designs and applications %A M. Wu %A Yu,H. %A Liu,Bede %K access %K annotation; %K authentication; %K capacity; %K conditions; %K content-based %K CONTROL %K control; %K copy %K data %K distortions; %K EMBEDDING %K embedding; %K encapsulation; %K extraction; %K frame %K hiding; %K image %K information; %K jitter; %K message %K multilevel %K NOISE %K noise; %K payload %K processing; %K robust %K signal %K uneven %K user %K video %X For pt. I see ibid., vol.12, no.6, p.685-95 (2003). This paper applies the solutions to the fundamental issues addressed in Part I to specific design problems of embedding data in image and video. We apply multilevel embedding to allow the amount of embedded information that can be reliably extracted to be adaptive with respect to the actual noise conditions. When extending the multilevel embedding to video, we propose strategies for handling uneven embedding capacity from region to region within a frame as well as from frame to frame.
We also embed control information to facilitate the accurate extraction of the user data payload and to combat such distortions as frame jitter. The proposed algorithm can be used for a variety of applications such as copy control, access control, robust annotation, and content-based authentication. %B Image Processing, IEEE Transactions on %V 12 %P 696 - 705 %8 2003/06// %@ 1057-7149 %G eng %N 6 %R 10.1109/TIP.2003.810589 %0 Conference Paper %B Applications of Signal Processing to Audio and Acoustics, 2003 IEEE Workshop on. %D 2003 %T HRTF personalization using anthropometric measurements %A Zotkin,Dmitry N %A Hwang,J. %A Duraiswami, Ramani %A Davis, Larry S. %K acoustic %K anthropometric %K audio %K audio; %K auditory %K Ear %K functions; %K Head %K head-and-torso %K HRTF %K individualized %K localization; %K measurements; %K model; %K models; %K parameters; %K perception; %K personalization; %K physiological %K processing; %K related %K scattering; %K scene; %K signal %K sound %K spatial %K subjective %K transfer %K virtual %K wave %X Individualized head related transfer functions (HRTFs) are needed for accurate rendering of spatial audio, which is important in many applications. Since these are relatively tedious to acquire, they may not be acceptable for some applications. A number of studies have sought to perform simple customization of the HRTF. We propose and test a strategy for HRTF personalization, based on matching certain anthropometric ear parameters with the HRTF database, and the incorporation of a low-frequency "head-and-torso" model. We present preliminary tests aimed at evaluation of this customization. Results show that the approach improves both the accuracy of the localization and subjective perception of the virtual auditory scene. %B Applications of Signal Processing to Audio and Acoustics, 2003 IEEE Workshop on. %P 157 - 160 %8 2003/10// %G eng %R 10.1109/ASPAA.2003.1285855 %0 Conference Paper %B Proceedings. IEEE Conference on Advanced Video and Signal Based Surveillance, 2003. %D 2003 %T Scalable image-based multi-camera visual surveillance system %A Lim,Ser-Nam %A Davis, Larry S. %A Elgammal,A. %K ACQUISITION %K algorithm; %K camera; %K constraints; %K feature %K hidden %K image-based %K MATCHING %K maximum %K multi-camera %K occlusion %K pan-tilt-zoom %K PLAN %K prediction; %K processing; %K removal; %K scalable %K scheduling; %K signal %K Surveillance %K surveillance; %K system; %K task %K video %K view; %K visibility %K visual %K weight %X We describe the design of a scalable and wide coverage visual surveillance system. Scalability (the ability to add and remove cameras easily during system operation with minimal overhead and system degradation) is achieved by utilizing only image-based information for camera control. We show that when a pan-tilt-zoom camera pans and tilts, a given image point moves in a circular and a linear trajectory, respectively. We create a scene model using a plan view of the scene. The scene model makes it easy for us to handle occlusion prediction and schedule video acquisition tasks subject to visibility constraints. We describe a maximum weight matching algorithm to assign cameras to tasks that meet the visibility constraints. The system is illustrated both through simulations and real video from a 6-camera configuration. %B Proceedings. IEEE Conference on Advanced Video and Signal Based Surveillance, 2003. %P 205 - 212 %8 2003/07// %G eng %R 10.1109/AVSS.2003.1217923 %0 Conference Paper %B Multimedia and Expo, 2003. ICME '03. 
Proceedings. 2003 International Conference on %D 2003 %T Shape and motion driven particle filtering for human body tracking %A Yamamoto, T. %A Chellappa, Rama %K 3D %K body %K broadcast %K camera; %K cameras; %K estimation; %K Filtering %K framework; %K human %K image %K MOTION %K motion; %K particle %K processing; %K rotational %K sequence; %K sequences; %K signal %K single %K static %K theory; %K tracking; %K TV %K video %X In this paper, we propose a method to recover 3D human body motion from a video acquired by a single static camera. In order to estimate the complex state distribution of a human body, we adopt the particle filtering framework. We represent the human body using several layers of representation and compose the whole body step by step. In this way, more effective particles are generated and ineffective particles are removed as we process each layer. In order to deal with the rotational motion, the frequency of rotation is obtained using a preprocessing operation. In the preprocessing step, the variance of the motion field at each image is computed, and the frequency of rotation is estimated. The estimated frequency is used for the state update in the algorithm. We successfully track the movement of figure skaters in a TV broadcast image sequence, and recover the 3D shape and motion of the skater. %B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on %V 3 %P III - 61-4 vol.3 %8 2003/07// %G eng %R 10.1109/ICME.2003.1221248 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on %D 2003 %T Simultaneous tracking and recognition of human faces from video %A Zhou,Shaohua %A Chellappa, Rama %K appearance %K changes; %K density; %K Face %K human %K illumination %K Laplacian %K model; %K optical %K pose %K processing; %K recognition; %K series %K series; %K signal %K TIME %K tracking; %K variations; %K video %K video; %X The paper investigates the interaction between tracking and recognition of human faces from video under a framework proposed earlier (Shaohua Zhou et al., Proc. 5th Int. Conf. on Face and Gesture Recog., 2002; Shaohua Zhou and Chellappa, R., Proc. European Conf. on Computer Vision, 2002), where a time series model is used to resolve the uncertainties in both tracking and recognition. However, our earlier efforts employed only a simple likelihood measurement in the form of a Laplacian density to deal with appearance changes between frames and between the observation and gallery images, yielding poor accuracies in both tracking and recognition when confronted by pose and illumination variations. The interaction between tracking and recognition was not well understood. We address the interdependence between tracking and recognition using a series of experiments and quantify the interacting nature of tracking and recognition. %B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on %V 3 %P III - 225-8 vol.3 %8 2003/04// %G eng %R 10.1109/ICASSP.2003.1199148 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on %D 2003 %T Statistical shape theory for activity modeling %A Vaswani, N. %A Chowdhury, A.R.
%A Chellappa, Rama %K abnormal %K activities %K activity %K analysis; %K behavior; %K classification; %K data; %K image %K mass; %K matching; %K modeling; %K monitoring; %K moving %K normal %K particle; %K pattern %K pattern; %K point %K polygonal %K probability; %K problem; %K processing; %K sequence; %K sequences; %K SHAPE %K shape; %K signal %K statistical %K Surveillance %K surveillance; %K theory; %K video %X Monitoring activities in a certain region from video data is an important surveillance problem. The goal is to learn the pattern of normal activities and detect unusual ones by identifying activities that deviate appreciably from the typical ones. We propose an approach using statistical shape theory based on the shape model of D.G. Kendall et al. (see "Shape and Shape Theory", John Wiley and Sons, 1999). In a low resolution video, each moving object is best represented as a moving point mass or particle. In this case, an activity can be defined by the interactions of all or some of these moving particles over time. We model this configuration of the particles by a polygonal shape formed from the locations of the points in a frame and the activity by the deformation of the polygons in time. These parameters are learned for each typical activity. Given a test video sequence, an activity is classified as abnormal if the probability for the sequence (represented by the mean shape and the dynamics of the deviations), given the model, is below a certain threshold. The approach gives very encouraging results in surveillance applications using a single camera and is able to identify various kinds of abnormal behavior. %B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on %V 3 %P III - 493-6 vol.3 %8 2003/04// %G eng %R 10.1109/ICASSP.2003.1199519 %0 Conference Paper %B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on %D 2003 %T Video based rendering of planar dynamic scenes %A Kale, A. %A Chowdhury, A.K.R. %A Chellappa, Rama %K (computer %K 3D %K analysis; %K approximation; %K based %K camera; %K cameras; %K direction; %K dynamic %K graphics); %K image %K monocular %K MOTION %K perspective %K planar %K processing; %K rendering %K rendering; %K scenes; %K sequence; %K sequences; %K signal %K video %K weak %X In this paper, we propose a method to synthesize arbitrary views of a planar scene from a monocular video sequence of it. The 3-D direction of motion of the object is robustly estimated from the video sequence. Given this direction, any other view of the object can be synthesized through a perspective projection approach, under assumptions of planarity. If the distance of the object from the camera is large, a planar approximation is reasonable even for non-planar scenes. Such a method has many important applications, one of them being gait recognition where a side view of the person is required. Our method can be used to synthesize the side-view of the person in case he/she does not present a side view to the camera. Since the planarity assumption is often an approximation, the effects of non-planarity can lead to inaccuracies in rendering and need to be corrected for. Regions where this happens are examined and a simple technique based on weak perspective approximation is proposed to offset rendering inaccuracies. Examples of synthesized views using our method and performance evaluation are presented. %B Multimedia and Expo, 2003. ICME '03. Proceedings.
2003 International Conference on %V 1 %P I - 477-80 vol.1 %8 2003/07// %G eng %R 10.1109/ICME.2003.1220958 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on %D 2003 %T Video synthesis of arbitrary views for approximately planar scenes %A Chowdhury, A.K.R. %A Kale, A. %A Chellappa, Rama %K (access %K 3D %K applications; %K approach; %K approximately %K approximation; %K arbitrary %K Biometrics %K control); %K data; %K direction %K estimation; %K evaluation; %K Gait %K image %K monocular %K MOTION %K performance %K perspective %K planar %K processing; %K projection %K recognition; %K recovery; %K scenes; %K sequence; %K sequences; %K side %K signal %K structure; %K Surveillance %K surveillance; %K synthesis; %K synthesized %K video %K view %K views; %X In this paper, we propose a method to synthesize arbitrary views of a planar scene, given a monocular video sequence. The method is based on the availability of knowledge of the angle between the original and synthesized views. Such a method has many important applications, one of them being gait recognition. Gait recognition algorithms rely on the availability of an approximate side-view of the person. From a realistic viewpoint, such an assumption is impractical in surveillance applications and it is of interest to develop methods to synthesize a side view of the person, given an arbitrary view. For large distances from the camera, a planar approximation for the individual can be assumed. In this paper, we propose a perspective projection approach for recovering the direction of motion of the person purely from the video data, followed by synthesis of a new video sequence at a different angle. The algorithm works purely in the image and video domain, though 3D structure plays an implicit role in its theoretical justification. Examples of synthesized views using our method and performance evaluation are presented. %B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on %V 3 %P III - 497-500 vol.3 %8 2003/04// %G eng %R 10.1109/ICASSP.2003.1199520 %0 Conference Paper %B Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on %D 2002 %T 3D face reconstruction from video using a generic model %A Chowdhury, A.R. %A Chellappa, Rama %A Krishnamurthy, S. %A Vo, T. %K 3D %K algorithm; %K algorithms; %K analysis; %K Carlo %K chain %K Computer %K Face %K from %K function; %K generic %K human %K image %K Markov %K MCMC %K methods; %K model; %K Monte %K MOTION %K optimisation; %K OPTIMIZATION %K processes; %K processing; %K recognition; %K reconstruction %K reconstruction; %K sampling; %K sequence; %K sequences; %K SfM %K signal %K structure %K surveillance; %K video %K vision; %X Reconstructing a 3D model of a human face from a video sequence is an important problem in computer vision, with applications to recognition, surveillance, multimedia etc. However, the quality of 3D reconstructions using structure from motion (SfM) algorithms is often not satisfactory. One common method of overcoming this problem is to use a generic model of a face. Existing work using this approach initializes the reconstruction algorithm with this generic model.
The problem with this approach is that the algorithm can converge to a solution very close to this initial value, resulting in a reconstruction which resembles the generic model rather than the particular face in the video which needs to be modeled. We propose a method of 3D reconstruction of a human face from video in which the 3D reconstruction algorithm and the generic model are handled separately. A 3D estimate is obtained purely from the video sequence using SfM algorithms without use of the generic model. The final 3D model is obtained after combining the SfM estimate and the generic model using an energy function that corrects for the errors in the estimate by comparing local regions in the two models. The optimization is done using a Markov chain Monte Carlo (MCMC) sampling strategy. The main advantage of our algorithm over others is that it is able to retain the specific features of the face in the video sequence even when these features are different from those of the generic model. The evolution of the 3D model through the various stages of the algorithm is presented. %B Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on %V 1 %P 449 - 452 vol.1 %8 2002/// %G eng %R 10.1109/ICME.2002.1035815 %0 Conference Paper %B Image Processing. 2002. Proceedings. 2002 International Conference on %D 2002 %T Bayesian structure from motion using inertial information %A Qian,Gang %A Chellappa, Rama %A Qinfen Zheng %K 3D %K analysis; %K Bayes %K Bayesian %K camera %K estimation; %K image %K images; %K importance %K inertial %K information; %K methods; %K MOTION %K motion; %K parameter %K processing; %K real %K reconstruction; %K sampling; %K scene %K sensors; %K sequence; %K sequences; %K sequential %K signal %K structure-from-motion; %K synthetic %K systems; %K video %X A novel approach to Bayesian structure from motion (SfM) using inertial information and sequential importance sampling (SIS) is presented. The inertial information is obtained from camera-mounted inertial sensors and is used in the Bayesian SfM approach as prior knowledge of the camera motion in the sampling algorithm. Experimental results using both synthetic and real images show that, when inertial information is used, more accurate results can be obtained or the same estimation accuracy can be obtained at a lower cost. %B Image Processing. 2002. Proceedings. 2002 International Conference on %V 3 %P III-425 - III-428 vol.3 %8 2002/// %G eng %R 10.1109/ICIP.2002.1038996 %0 Journal Article %J Image Processing, IEEE Transactions on %D 2002 %T A generic approach to simultaneous tracking and verification in video %A Li,Baoxin %A Chellappa, Rama %K approach; %K Carlo %K configuration; %K correspondence %K data; %K density %K density; %K estimated %K estimation; %K evaluation; %K extraction; %K Face %K facial %K feature %K generic %K human %K hypothesis %K image %K measurement %K methods; %K Monte %K object %K performance %K posterior %K probability %K probability; %K problem; %K processing; %K propagation; %K recognition; %K road %K sequence %K sequences; %K sequential %K signal %K space; %K stabilization; %K state %K synthetic %K temporal %K testing; %K tracking; %K vector; %K vehicle %K vehicles; %K verification; %K video %K visual %X A generic approach to simultaneous tracking and verification in video data is presented. The approach is based on posterior density estimation using sequential Monte Carlo methods.
Visual tracking, which is in essence a temporal correspondence problem, is solved through probability density propagation, with the density being defined over a proper state space characterizing the object configuration. Verification is realized through hypothesis testing using the estimated posterior density. In its most basic form, verification can be performed as follows. Given a measurement vector Z and two hypotheses H1 and H0, we first estimate posterior probabilities P(H0|Z) and P(H1|Z), and then choose the one with the larger posterior probability as the true hypothesis. Several applications of the approach are illustrated by experiments devised to evaluate its performance. The idea is first tested on synthetic data, and then experiments with real video sequences are presented, illustrating vehicle tracking and verification, human (face) tracking and verification, facial feature tracking, and image sequence stabilization. %B Image Processing, IEEE Transactions on %V 11 %P 530 - 544 %8 2002/05// %@ 1057-7149 %G eng %N 5 %R 10.1109/TIP.2002.1006400 %0 Conference Paper %B Motion and Video Computing, 2002. Proceedings. Workshop on %D 2002 %T A hierarchical approach for obtaining structure from two-frame optical flow %A Liu,Haiying %A Chellappa, Rama %A Rosenfeld, A. %K algorithm; %K aliasing; %K analysis; %K computer-rendered %K depth %K depth; %K error %K estimation; %K extraction; %K Face %K feature %K flow; %K gesture %K hierarchical %K image %K images; %K inverse %K iterative %K methods; %K MOTION %K nonlinear %K optical %K parameter %K processing; %K real %K recognition; %K sequences; %K signal %K structure-from-motion; %K system; %K systems; %K TIME %K two-frame %K variation; %K video %X A hierarchical iterative algorithm is proposed for extracting structure from two-frame optical flow. The algorithm exploits two facts: one is that in many applications, such as face and gesture recognition, the depth variation of the visible surface of an object in a scene is small compared to the distance between the optical center and the object; the other is that the time aliasing problem is alleviated at the coarse level for any two-frame optical flow estimate so that the estimate tends to be more accurate. A hierarchical representation for the relationship between the optical flow, depth, and the motion parameters is derived, and the resulting non-linear system is iteratively solved through two linear subsystems. At the coarsest level, the surface of the object tends to be flat, so that the inverse depth tends to be a constant, which is used as the initial depth map. Inverse depth and motion parameters are estimated by the two linear subsystems at each level and the results are propagated to finer levels. Error analysis and experiments using both computer-rendered images and real images demonstrate the correctness and effectiveness of our algorithm. %B Motion and Video Computing, 2002. Proceedings. Workshop on %P 214 - 219 %8 2002/12// %G eng %R 10.1109/MOTION.2002.1182239 %0 Conference Paper %B Image Processing. 2002. Proceedings. 2002 International Conference on %D 2002 %T Probabilistic recognition of human faces from video %A Chellappa, Rama %A Kruger, V.
%A Zhou,Shaohua %K Bayes %K Bayesian %K CMU; %K distribution; %K Face %K faces; %K gallery; %K handling; %K human %K image %K images; %K importance %K likelihood; %K methods; %K NIST/USF; %K observation %K posterior %K probabilistic %K probability; %K processing; %K propagation; %K recognition; %K sampling; %K sequential %K signal %K still %K Still-to-video %K Uncertainty %K video %K Video-to-video %X Most present face recognition approaches recognize faces based on still images. We present a novel approach to recognize faces in video. In that scenario, the face gallery may consist of still images or may be derived from videos. For evidence integration we use classical Bayesian propagation over time and compute the posterior distribution using sequential importance sampling. The probabilistic approach allows us to handle uncertainties in a systematic manner. Experimental results using videos collected by NIST/USF and CMU illustrate the effectiveness of this approach in both still-to-video and video-to-video scenarios with appropriate model choices. %B Image Processing. 2002. Proceedings. 2002 International Conference on %V 1 %P I-41 - I-44 vol.1 %8 2002/// %G eng %R 10.1109/ICIP.2002.1037954 %0 Conference Paper %B Multimedia Signal Processing, 2002 IEEE Workshop on %D 2002 %T Wide baseline image registration using prior information %A Chowdhury, A.M. %A Chellappa, Rama %A Keaton, T. %K 2D %K 3D %K algorithm; %K alignment; %K angles; %K baseline %K Computer %K configuration; %K constellation; %K correspondence %K creation; %K doubly %K error %K extraction; %K Face %K feature %K global %K holistic %K image %K images; %K matching; %K matrix; %K model %K models; %K normalization %K panoramic %K probability; %K procedure; %K processing; %K registration; %K robust %K sequences; %K SHAPE %K signal %K Sinkhorn %K spatial %K statistics; %K stereo; %K Stochastic %K video %K view %K viewing %K vision; %K wide %X Establishing correspondence between features in two images of the same scene taken from different viewing angles is a challenging problem in image processing and computer vision. However, its solution is an important step in many applications like wide baseline stereo, 3D model alignment, creation of panoramic views etc. In this paper, we propose a technique for registration of two images of a face obtained from different viewing angles. We show that prior information about the general characteristics of a face obtained from video sequences of different faces can be used to design a robust correspondence algorithm. The method works by matching 2D shapes of the different features of the face. A doubly stochastic matrix, representing the probability of match between the features, is derived using the Sinkhorn normalization procedure. The final correspondence is obtained by minimizing the probability of error of a match between the entire constellations of features in the two sets, thus taking into account the global spatial configuration of the features. The method is applied for creating holistic 3D models of a face from partial representations. Although this paper focuses primarily on faces, the algorithm can also be used for other objects with small modifications. %B Multimedia Signal Processing, 2002 IEEE Workshop on %P 37 - 40 %8 2002/12// %G eng %R 10.1109/MMSP.2002.1203242 %0 Conference Paper %B Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on %D 2000 %T Video access control via multi-level data hiding %A M.
Wu %A Yu,Hong Heather %K access %K adaptive %K algorithms;hidden %K bits;high %K conditions;robustness;robustness-capacity %K control;adaptive %K data %K data;video %K design;user %K digital %K embedding;noise %K encapsulation;multimedia %K hiding %K hiding;multi-level %K information;data %K processing; %K QUALITY %K signal %K systems;authorisation;data %K systems;video %K technique;control %K tradeoff;system %K user %K video;multi-level %X The paper proposes novel data hiding algorithms and system design for high quality digital video. Instead of targeting a single degree of robustness, which results in overestimation and/or underestimation of the noise conditions, we apply multi-level embedding to digital video to achieve more than one level of robustness-capacity tradeoff. In addition, an adaptive technique is proposed to determine how many bits are embedded in each part of the video. Besides user data, control information such as synchronization and the number of hidden user bits is embedded as well. The algorithm can be used for applications such as access control. %B Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on %V 1 %P 381 - 384 vol.1 %8 2000/// %G eng %R 10.1109/ICME.2000.869620 %0 Conference Paper %B Geoscience and Remote Sensing Symposium, 2000. Proceedings. IGARSS 2000. IEEE 2000 International %D 2000 %T Web based progressive transmission for browsing remotely sensed imagery %A Mareboyana,M. %A Srivastava,S. %A JaJa, Joseph F. %K based %K decomposition;geophysical %K image;model-based %K interest;remote %K interest;vector %K mapping;user %K mapping;vector %K measurement %K of %K processing;geophysical %K processing;image %K Progressive %K quantisation;wavelet %K quantization;wavelet %K refinement;region %K regions %K representation;land %K representation;remote %K sensing;scalar;terrain %K sensing;terrain %K signal %K specified %K specified;user %K surface;large %K technique;image %K techniques;image %K transforms; %K transmission;browsing;geophysical %K VQ;progressive %K Web %X This paper describes an image representation technique that entails progressive refinement of user specified regions of interest (ROI) of large images. Progressive refinement to original quality can be accomplished in theory. However, due to the heavy burden on storage resources in the authors' applications, they restrict the refinement to about 25% of the original data resolution. A wavelet decomposition of the data combined with scalar and vector quantization (VQ) of the high frequency components and JPEG/DCT compression of the low frequency component is used as the representation framework. Their software will reconstruct the region selected by the user from its wavelet decomposition such that it fills up the preview window with the appropriate subimages at the desired resolution, including the full resolution stored for preview. Further refinement from the first preview can be obtained progressively by transmitting high frequency coefficients from low resolution to high resolution, which are compressed by a variant of vector quantization called model-based VQ (MVQ). The user will have an option for progressive build-up of the ROIs until the stored full resolution is reached, or to terminate the transmission at any time during the progressive refinement. %B Geoscience and Remote Sensing Symposium, 2000. Proceedings. IGARSS 2000.
IEEE 2000 International %V 2 %P 591 - 593 vol.2 %8 2000/// %G eng %R 10.1109/IGARSS.2000.861640 %0 Conference Paper %B Geoscience and Remote Sensing Symposium, 1999. IGARSS '99 Proceedings. IEEE 1999 International %D 1999 %T A hierarchical data archiving and processing system to generate custom tailored products from AVHRR data %A Kalluri, SNV %A Zhang,Z. %A JaJa, Joseph F. %A Bader, D.A. %A Song,H. %A El Saleous,N. %A Vermote,E. %A Townshend,J.R.G. %K archiving;image %K AVHRR;GIS;PACS;custom %K data %K image;land %K image;remote %K mapping; %K mapping;PACS;geophysical %K measurement %K PROCESSING %K processing;geophysical %K product;data %K remote %K scheme;infrared %K sensing;optical %K sensing;terrain %K signal %K surface;multispectral %K system;indexing %K tailored %K technique;hierarchical %K techniques;remote %X A novel indexing scheme is described to catalogue satellite data on a pixel basis. The objective of this research is to develop an efficient methodology to archive, retrieve and process satellite data, so that data products can be generated to meet the specific needs of individual scientists. When requesting data, users can specify the spatial and temporal resolution, geographic projection, choice of atmospheric correction, and the data selection methodology. The data processing is done in two stages. Satellite data is calibrated, navigated and quality flags are appended in the initial processing. This processed data is then indexed and stored. Secondary processing such as atmospheric correction and projection are done after a user requests the data to create custom made products. Dividing the processing into two stages saves time, since the basic processing tasks such as navigation and calibration which are common to all requests are not repeated when different users request satellite data. The indexing scheme described can be extended to allow fusion of data sets from different sensors. %B Geoscience and Remote Sensing Symposium, 1999. IGARSS '99 Proceedings. IEEE 1999 International %V 5 %P 2374 - 2376 vol.5 %8 1999/// %G eng %R 10.1109/IGARSS.1999.771514 %0 Conference Paper %B Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on %D 1998 %T An algorithm for wipe detection %A M. Wu %A Wolf,W. %A Liu,B. %K algorithm;image %K analysis;video %K content;video %K DC %K DETECTION %K detection;statistical %K domain;object %K information;structural %K information;video %K motion;compressed %K motion;shot %K processing; %K programs;camera %K programs;wipe %K sequence;MPEG %K sequences;signal %K signal %K stream;TV %K transitions %X The detection of transitions between shots in video programs is an important first step in analyzing video content. The wipe is a frequently used transitional form between shots. Wipe detection is more involved than the detection of abrupt and other gradual transitions because a wipe may take various patterns and because of the difficulty in discriminating a wipe from object and camera motion. In this paper, we propose an algorithm for detecting wipes using both structural and statistical information. The algorithm can effectively detect most wipes used in current TV programs. It uses the DC sequence, which can be easily extracted from the MPEG stream without full decompression. %B Image Processing, 1998. ICIP 98. Proceedings.
1998 International Conference on %V 1 %P 893 - 897 vol.1 %8 1998/10// %G eng %R 10.1109/ICIP.1998.723664 %0 Journal Article %J Computational Science & Engineering, IEEE %D 1998 %T Models and high-performance algorithms for global BRDF retrieval %A Zengyan Zhang %A Kalluri, SNV %A JaJa, Joseph F. %A Liang,Shunlin %A Townshend,J.R.G. %K algorithms; %K BRDF %K Earth %K geomorphology; %K geophysical %K global %K high-performance %K IBM %K information %K light %K machines; %K models; %K Parallel %K processing; %K reflectivity; %K retrieval %K retrieval; %K scattering; %K signal %K SP2; %K surface; %X The authors describe three models for retrieving information related to the scattering of light on the Earth's surface. Using these models, they've developed algorithms for the IBM SP2 that efficiently retrieve this information. %B Computational Science & Engineering, IEEE %V 5 %P 16 - 29 %8 1998/12// %@ 1070-9924 %G eng %N 4 %R 10.1109/99.735892 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on %D 1993 %T An adaptive ESPRIT based on URV decomposition %A Liu,K. J.R. %A O'Leary, Dianne P. %A Stewart, G.W. %A Wu,Y.-J.J. %K algorithm;array %K complexity;parameter %K decomposition;real-time %K ESPRIT %K estimation;real-time %K filters;array %K of %K processing;adaptive %K processing;computational %K savings;performance;rank-revealing %K sensors;computational %K signal %K systems; %K systems;time-varying %K URV %K URV-based %X ESPRIT is an algorithm for determining the fixed directions of arrival of a set of narrowband signals at an array of sensors. Its computational burden makes it unsuitable for real-time processing of signals with time-varying directions of arrival. The authors develop a new implementation of ESPRIT that has potential for real-time processing. It is based on a rank-revealing URV decomposition, rather than the eigendecomposition or singular value decomposition (SVD) used in previous ESPRIT algorithms. Its performance is demonstrated on simulated data representing both constant and time-varying signals. It is shown that the URV-based ESPRIT algorithm is effective for estimating time-varying directions-of-arrival at considerable computational savings over the SVD-based algorithm. %B Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on %V 4 %P 37 - 40 vol.4 %8 1993/04// %G eng %R 10.1109/ICASSP.1993.319588 %0 Journal Article %J Signal Processing, IEEE Transactions on %D 1993 %T VLSI implementation of a tree searched vector quantizer %A Kolagotla,R. K. %A Yu,S.-S. %A JaJa, Joseph F. %K (mathematics); %K 2 %K 20 %K chips; %K coding; %K compression; %K data %K design; %K digital %K image %K implementation; %K MHz; %K micron; %K PROCESSING %K quantisation; %K quantizer; %K searched %K signal %K tree %K TREES %K vector %K VLSI %K VLSI; %X The VLSI design and implementation of a tree-searched vector quantizer is presented. The number of processors needed is equal to the depth of the tree. All processors are identical, and data flow between processors is regular. No global control signals are needed. The processors have been fabricated using a 2 μm N-well process on a 7.9 × 9.2 mm die. Each processor chip contains 25000 transistors and has 84 pins. The processors have been thoroughly tested at a clock frequency of 20 MHz. %B Signal Processing, IEEE Transactions on %V 41 %P 901 - 905 %8 1993/02// %@ 1053-587X %G eng %N 2 %R 10.1109/78.193225