%0 Conference Paper %B Person-Oriented Vision (POV), 2011 IEEE Workshop on %D 2011 %T Active inference for retrieval in camera networks %A Daozheng Chen %A Bilgic, M. %A Getoor, Lise %A Jacobs, David W. %A Mihalkova, L. %A Tom Yeh %K active inference %K camera network %K graphical model %K human annotation %K probabilistic model %K retrieval system %K searching problem %K video frame %K cameras %K inference mechanisms %K probability %K search problems %K video retrieval %K video signal processing %X We address the problem of searching camera network videos to retrieve frames containing specified individuals. We show the benefit of utilizing a learned probabilistic model that captures dependencies among the cameras. In addition, we develop an active inference framework that can request human input at inference time, directing human attention to the portions of the videos whose correct annotation would provide the biggest performance improvements. Our primary contribution is to show that by mapping video frames in a camera network onto a graphical model, we can apply collective classification and active inference algorithms to significantly increase the performance of the retrieval system, while minimizing the number of human annotations required. %B Person-Oriented Vision (POV), 2011 IEEE Workshop on %P 13 - 20 %8 2011/01// %G eng %R 10.1109/POV.2011.5712363 %0 Journal Article %J Pattern Analysis and Machine Intelligence, IEEE Transactions on %D 2011 %T Dynamic Processing Allocation in Video %A Daozheng Chen %A Bilgic, M. %A Getoor, Lise %A Jacobs, David W. %K background subtraction %K baseline algorithms %K digital video analysis %K computer graphics %K face detection %K face recognition %K graphical model %K object detection %K resource allocation %K resource processing %K video signal processing %X Large stores of digital video pose severe computational challenges to existing video analysis algorithms.
In applying these algorithms, users must often trade off processing speed for accuracy, as many sophisticated and effective algorithms require large computational resources that make it impractical to apply them throughout long videos. One can save considerable effort by applying these expensive algorithms sparingly, directing their application using the results of more limited processing. We show how to do this for retrospective video analysis by modeling a video using a chain graphical model and performing inference both to analyze the video and to direct processing. We apply our method to problems in background subtraction and face detection, and show in experiments that this leads to significant improvements over baseline algorithms. %B Pattern Analysis and Machine Intelligence, IEEE Transactions on %V 33 %P 2174 - 2187 %8 2011/11// %@ 0162-8828 %G eng %N 11 %R 10.1109/TPAMI.2011.55 %0 Conference Paper %B Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on %D 2011 %T Multi-agent event recognition in structured scenarios %A Morariu, V. I. %A Davis, Larry S. %K Allen interval logic %K Markov logic networks %K bottom-up grounding scheme %K first-order logic %K interval-based temporal reasoning %K multi-agent event recognition %K probabilistic logical inference %K semantic descriptions %K spatio-temporal reasoning %K Markov processes %K formal logic %K multiagent systems %K object recognition %K temporal reasoning %K video analysis %K video signal processing %X We present a framework for the automatic recognition of complex multi-agent events in settings where structure is imposed by rules that agents must follow while performing activities. Given semantic spatio-temporal descriptions of what generally happens (i.e., rules, event descriptions, physical constraints), and based on video analysis, we determine the events that occurred.
Knowledge about spatio-temporal structure is encoded in first-order logic using an approach based on Allen's Interval Logic, and robustness to low-level observation uncertainty is provided by Markov Logic Networks (MLNs). Our main contribution is that we integrate interval-based temporal reasoning with probabilistic logical inference, relying on an efficient bottom-up grounding scheme to avoid combinatorial explosion. Applied to one-on-one basketball, our framework detects and tracks players, their hands and feet, and the ball, generates event observations from the resulting trajectories, and performs probabilistic logical inference to determine the most consistent sequence of events. We demonstrate our approach on 1 hr (100,000 frames) of outdoor videos. %B Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on %P 3289 - 3296 %8 2011/06// %G eng %R 10.1109/CVPR.2011.5995386 %0 Conference Paper %B Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on %D 2011 %T Secure video processing: Problems and challenges %A Lu, Wenjun %A Varna, A. %A M. Wu %K privacy-preserving fashion %K secure online video management %K secure signal processing %K security of data %K video processing %X Secure signal processing is an emerging technology to enable signal processing tasks in a secure and privacy-preserving fashion. It has attracted a great amount of research attention due to the increasing demand to enable rich functionalities for private data stored online. Desirable functionalities may include search, analysis, clustering, etc. In this paper, we discuss the research issues and challenges in secure video processing with a focus on the application of secure online video management. Video is different from text due to its large data volume and rich content diversity. To be practical, secure video processing requires efficient solutions that may involve a trade-off between security and complexity.
We look at three representative video processing tasks and review existing techniques that can be applied. Many of the tasks do not yet have efficient solutions, and we discuss the challenges and research questions that need to be addressed. %B Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on %P 5856 - 5859 %8 2011/05// %G eng %R 10.1109/ICASSP.2011.5947693 %0 Conference Paper %B Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on %D 2011 %T Trainable 3D recognition using stereo matching %A Castillo, C. D. %A Jacobs, David W. %K 2D image %K 3D image %K 3D object classification %K 3D class dataset %K CMU PIE dataset %K face descriptor %K face recognition %K image classification %K image matching %K occlusion %K pose estimation %K pose variation %K solid modelling %K stereo matching %K trainable classification %K stereo image processing %X Stereo matching has been used for face recognition in the presence of pose variation. In this approach, stereo matching is used to compare two 2-D images based on correspondences that reflect the effects of viewpoint variation and allow for occlusion. We show how to use stereo matching to derive image descriptors that can be used to train a classifier. This improves face recognition performance, producing the best published results on the CMU PIE dataset. We also demonstrate that classification based on stereo matching can be used for general object classification in the presence of pose variation. In preliminary experiments we show promising results on the 3D object class dataset, a standard, challenging 3D classification data set.
%B Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on %P 625 - 631 %8 2011/// %G eng %R 10.1109/ICCVW.2011.6130301 %0 Conference Paper %B Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on %D 2011 %T Wide-baseline stereo for face recognition with large pose variation %A Castillo, C. D. %A Jacobs, David W. %K 2D face recognition %K CMU PIE dataset %K dynamic programming stereo algorithm %K frontal image %K near profile image %K pose estimation %K pose variation %K recognition performance %K stereo matching %K surface slant %K wide-baseline stereo %K window-based matching method %K dynamic programming %K face recognition %K image matching %K stereo image processing %X 2-D face recognition in the presence of large pose variations presents a significant challenge. When comparing a frontal image of a face to a near profile image, one must cope with large occlusions, non-linear correspondences, and significant changes in appearance due to viewpoint. Stereo matching has been used to handle these problems, but performance of this approach degrades with large pose changes. We show that some of this difficulty is due to the effect that foreshortening of slanted surfaces has on window-based matching methods, which are needed to provide robustness to lighting change. We address this problem by designing a new, dynamic programming stereo algorithm that accounts for surface slant. We show that on the CMU PIE dataset this method results in significant improvements in recognition performance.
%B Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on %P 537 - 544 %8 2011/06// %G eng %R 10.1109/CVPR.2011.5995559 %0 Conference Paper %B Image Processing (ICIP), 2010 17th IEEE International Conference on %D 2010 %T Recognizing offensive strategies from football videos %A Ruonan Li %A Chellappa, Rama %K American football play videos %K analysis-by-synthesis technique %K design variables %K geometric properties %K nonlinear spaces %K offensive play strategies %K offensive strategies recognition %K players identification %K statistical analysis %K statistical models %K tracking errors %K view changes %K image recognition %K video signal processing %X We address the problem of recognizing offensive play strategies from American football play videos. Specifically, we propose a probabilistic model which describes the generative process of an observed football play and takes into account practical issues in real football videos, such as difficulty in identifying offensive players, view changes, and tracking errors. In particular, we exploit the geometric properties of nonlinear spaces of involved variables and design statistical models on these manifolds. Recognition is then performed via an 'analysis-by-synthesis' technique. Experiments on a newly established dataset of American football videos demonstrate the effectiveness of the approach.
%B Image Processing (ICIP), 2010 17th IEEE International Conference on %P 4585 - 4588 %8 2010/09// %G eng %R 10.1109/ICIP.2010.5652192 %0 Conference Paper %B Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on %D 2010 %T Scaling Populations of a Genetic Algorithm for Job Shop Scheduling Problems Using MapReduce %A Di-Wei Huang %A Jimmy Lin %K MapReduce %K cloud computing %K genetic algorithms %K job shop scheduling problems %K large-scale data processing %K parallel processing %X Inspired by Darwinian evolution, a genetic algorithm (GA) approach is one popular heuristic method for solving hard problems such as the Job Shop Scheduling Problem (JSSP), which is one of the hardest problems lacking efficient exact solutions today. It is intuitive that the population size of a GA may greatly affect the quality of the solution, but it is unclear what the effects are of population sizes significantly greater than those in typical experiments. The emergence of MapReduce, a framework running on a cluster of computers that aims to provide large-scale data processing, offers great opportunities to investigate this issue. In this paper, a GA is implemented to scale the population using MapReduce. Experiments are conducted on a large cluster, and population sizes up to 10^7 are inspected. It is shown that larger population sizes not only tend to yield better solutions, but also require fewer generations. Therefore, it is clear that when dealing with a hard problem such as JSSP, an existing GA can be improved by massively scaling up populations with MapReduce, so that the solution can be parallelized and completed in reasonable time.
%B Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on %P 780 - 785 %8 2010/// %G eng %R 10.1109/CloudCom.2010.18 %0 Journal Article %J Pattern Analysis and Machine Intelligence, IEEE Transactions on %D 2010 %T Video Metrology Using a Single Camera %A Guo, Feng %A Chellappa, Rama %K image-based techniques %K line segment %K multiple concentric circles %K uncalibrated single camera %K vanishing line information %K vehicle wheelbase measurement %K video metrology %K cameras %K image segmentation %K video signal processing %X This paper presents a video metrology approach using an uncalibrated single camera that is either stationary or in planar motion. Although theoretically simple, measuring the length of even a line segment in a given video is often a difficult problem. Most existing techniques for this task are extensions of single image-based techniques and do not achieve the desired accuracy, especially in noisy environments. In contrast, the proposed algorithm moves line segments on the reference plane to share a common endpoint using the vanishing line information, followed by fitting multiple concentric circles on the image plane. A fully automated real-time system based on this algorithm has been developed to measure vehicle wheelbases using an uncalibrated stationary camera. The system estimates the vanishing line using invariant lengths on the reference plane from multiple frames rather than the given parallel lines, which may not exist in videos. It is further extended to a camera undergoing a planar motion by automatically selecting frames with similar vanishing lines from the video. Experimental results show that the measurement results are accurate enough to classify moving vehicles based on their size.
%B Pattern Analysis and Machine Intelligence, IEEE Transactions on %V 32 %P 1329 - 1335 %8 2010/07// %@ 0162-8828 %G eng %N 7 %R 10.1109/TPAMI.2010.26 %0 Conference Paper %B Image Processing (ICIP), 2009 16th IEEE International Conference on %D 2009 %T Concurrent transition and shot detection in football videos using Fuzzy Logic %A Refaey, M. A. %A Elsayed, K. M. %A Hanafy, S. M. %A Davis, Larry S. %K colour histogram %K concurrent transition %K edgeness %K football videos %K fuzzy logic %K image analysis %K inference mechanisms %K intensity variance %K membership functions %K shot boundary %K shot detection %K sports video %K video analysis %K video signal processing %X Shot detection is a fundamental step in video processing and analysis that should be achieved with a high degree of accuracy. In this paper, we introduce a unified algorithm for shot detection in sports video using fuzzy logic as a powerful inference mechanism. Fuzzy logic overcomes the problems of hard cut thresholds and the need for large training data in previous work. The proposed algorithm integrates many features like color histogram, edgeness, intensity variance, etc. Membership functions to represent different features and transitions between shots have been developed to detect different shot boundary and transition types. We address the detection of cut, fade, dissolve, and wipe shot transitions. The results show that our algorithm achieves a high degree of accuracy. %B Image Processing (ICIP), 2009 16th IEEE International Conference on %P 4341 - 4344 %8 2009/11// %G eng %R 10.1109/ICIP.2009.5413648 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on %D 2009 %T Learning multi-modal densities on Discriminative Temporal Interaction Manifold for group activity recognition %A Ruonan Li %A Chellappa, Rama %A Zhou, S. K. %K maximum a posteriori classifier %K multimodal density function %K multiobject interaction %K discriminative temporal interaction manifold %K discriminative temporal interaction matrix %K football play recognition %K group activity recognition %K parametric Bayesian network %K video-based activity analysis %K belief networks %K complex spatial temporal constraints %K data-driven strategy %K image classification %K image motion analysis %K pattern recognition %K sports equipment %K video signal processing %X While video-based activity analysis and recognition has received much attention, the existing body of work mostly deals with the single object/person case. Coordinated multi-object activities, or group activities, present in a variety of applications such as surveillance, sports, and biological monitoring records, are the main focus of this paper. Unlike earlier attempts which model the complex spatial temporal constraints among multiple objects with a parametric Bayesian network, we propose a Discriminative Temporal Interaction Manifold (DTIM) framework as a data-driven strategy to characterize the group motion pattern without employing specific domain knowledge. In particular, we establish probability densities on the DTIM, whose element, the discriminative temporal interaction matrix, compactly describes the coordination and interaction among multiple objects in a group activity. For each class of group activity we learn a multi-modal density function on the DTIM. A Maximum a Posteriori (MAP) classifier on the manifold is then designed for recognizing new activities. Experiments on football play recognition demonstrate the effectiveness of the approach. %B Computer Vision and Pattern Recognition, 2009. CVPR 2009.
IEEE Conference on %P 2450 - 2457 %8 2009/06// %G eng %R 10.1109/CVPR.2009.5206676 %0 Journal Article %J Very Large Scale Integration (VLSI) Systems, IEEE Transactions on %D 2009 %T Mesh-of-Trees and Alternative Interconnection Networks for Single-Chip Parallelism %A Balkan, A. O. %A Gang Qu %A Vishkin, Uzi %K 90 nm %K wire complexity %K mesh-of-trees %K network topologies %K network-on-chip %K on-chip interconnection network %K memory units %K shared first-level cache %K single switch delay %K single-chip parallel processor %K single-chip parallelism %K high-throughput low-latency network %K multiprocessor interconnection networks %K parallel processing %X In single-chip parallel processors, it is crucial to implement a high-throughput low-latency interconnection network to connect the on-chip components, especially the processing units and the memory units. In this paper, we propose a new mesh of trees (MoT) implementation of the interconnection network and evaluate it relative to metrics such as wire complexity, total register count, single switch delay, maximum throughput, tradeoffs between throughput and latency, and post-layout performance. We show that on-chip interconnection networks can provide higher bandwidth between processors and shared first-level cache than previously considered possible, facilitating greater scalability of memory architectures that require that. MoT is also compared, both analytically and experimentally, to other traditional network topologies, such as hypercube, butterfly, fat trees and butterfly fat trees. When we evaluate a 64-terminal MoT network at 90-nm technology, concrete results show that MoT provides higher throughput and lower latency, especially when the input traffic (or the on-chip parallelism) is high, at comparable area.
A recurring problem in networking and communication is that of achieving good sustained throughput, in contrast to just high theoretical peak performance that does not materialize for typical workloads. Our quantitative results demonstrate a clear advantage of the proposed MoT network in the context of single-chip parallel processing. %B Very Large Scale Integration (VLSI) Systems, IEEE Transactions on %V 17 %P 1419 - 1432 %8 2009/10// %@ 1063-8210 %G eng %N 10 %R 10.1109/TVLSI.2008.2003999 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on %D 2009 %T Recognizing coordinated multi-object activities using a dynamic event ensemble model %A Ruonan Li %A Chellappa, Rama %K dynamic event ensemble model %K ensemble description functions %K ensemble manifold %K Riemannian geometric property %K classifier %K coordinated multiobject activities %K football play recognition %K data-driven strategy %K parametric Bayesian network %K video-based activity analysis %K Bayes methods %K geometry %K image classification %K video signal processing %X While video-based activity analysis and recognition has received broad attention, the existing body of work mostly deals with the single object/person case. Modeling involving multiple objects and recognition of coordinated group activities, present in a variety of applications such as surveillance, sports, biological records, and so on, is the main focus of this paper. Unlike earlier attempts which model the complex spatial temporal constraints among different activities of multiple objects with a parametric Bayesian network, we propose a dynamic 'event ensemble' framework as a data-driven strategy to characterize the group motion pattern without employing any specific domain knowledge.
In particular, we exploit the Riemannian geometric property of the set of ensemble description functions and develop a compact representation for group activities on the ensemble manifold. An appropriate classifier on the manifold is then designed for recognizing new activities. Experiments on football play recognition demonstrate the effectiveness of the framework. %B Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on %P 3541 - 3544 %8 2009/04// %G eng %R 10.1109/ICASSP.2009.4960390 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on %D 2008 %T Action recognition using ballistic dynamics %A Vitaladevuni, S. N. %A Kellokumpu, V. %A Davis, Larry S. %K action recognition %K ballistic dynamics %K Bayesian framework %K gesture recognition task %K human movement planning %K interactive movements %K Motion History Image feature %K person-centric morphological labels %K psycho-kinesiological studies %K image motion analysis %K image recognition %K image segmentation %K video signal processing %X We present a Bayesian framework for action recognition through ballistic dynamics. Psycho-kinesiological studies indicate that ballistic movements form the natural units for human movement planning. The framework leads to an efficient and robust algorithm for temporally segmenting videos into atomic movements. Individual movements are annotated with person-centric morphological labels called ballistic verbs. This is tested on a dataset of interactive movements, achieving high recognition rates. The approach is also applied to a gesture recognition task, improving a previously reported recognition rate from 84% to 92%. Consideration of ballistic dynamics enhances the performance of the popular Motion History Image feature. We also illustrate the approach's general utility on real-world videos.
Experiments indicate that the method is robust to view, style and appearance variations. %B Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on %P 1 - 8 %8 2008/06// %G eng %R 10.1109/CVPR.2008.4587806 %0 Conference Paper %B Similarity Search and Applications, 2008. SISAP 2008. First International Workshop on %D 2008 %T High-Dimensional Similarity Retrieval Using Dimensional Choice %A Tahmoush, D. %A Samet, Hanan %K database management systems %K dimension reduction %K distance function %K high-dimensional similarity retrieval %K query processing %K sequential search %K data reduction %K similarity search %X There are several pieces of information that can be utilized in order to improve the efficiency of similarity searches on high-dimensional data. The most commonly used information is the distribution of the data itself, but the use of dimensional choice based on the information in the query as well as the parameters of the distribution can provide an effective improvement in the query processing speed and storage. The use of this method can produce dimension reduction by as much as a factor of n, the number of data points in the database, over sequential search. We demonstrate that the curse of dimensionality is not based on the dimension of the data itself, but primarily upon the effective dimension of the distance function. We also introduce a new distance function that utilizes fewer dimensions of the higher dimensional space to produce a maximal lower bound distance in order to approximate the full distance function. This work has demonstrated significant dimension reduction, up to 70% reduction with an improvement in accuracy, or over 99% with only a 6% loss in accuracy on a prostate cancer data set. %B Similarity Search and Applications, 2008. SISAP 2008.
First International Workshop on %P 35 - 42 %8 2008/04// %G eng %R 10.1109/SISAP.2008.20 %0 Conference Paper %B Semantic Media Adaptation and Personalization, 2008. SMAP '08. Third International Workshop on %D 2008 %T A Logic Framework for Sports Video Summarization Using Text-Based Semantic Annotation %A Refaey, M. A. %A Abd-Almageed, Wael %A Davis, Larry S. %K automatic semantic event detection %K logic processing engine %K parse trees %K sports video summarization %K text analysis %K text-based semantic annotation %K text Webcasting %K trees (mathematics) %K video annotation %K Internet %K broadcasting %K sport %K video signal processing %X Detection of semantic events in sports videos is an essential step towards video summarization. A large volume of research has been conducted for automatic semantic event detection and summarization of sports videos. In this paper we present a novel sports video summarization framework using a combination of text, video and logic analysis. Parse trees are used to analyze structured and free-style text webcasting of sports games and extract the game's semantic events, such as goals and penalties in soccer games. Semantic events are then hierarchically arranged before being passed to a logic processing engine. The logic engine receives the summary preferences from the user and subsequently parses the event hierarchy to generate the game's summary according to the user's preferences. The proposed framework was applied to both soccer and basketball videos. We achieved an average accuracy of 98.6% and 100% on soccer and basketball videos, respectively. %B Semantic Media Adaptation and Personalization, 2008. SMAP '08. Third International Workshop on %P 69 - 75 %8 2008/12// %G eng %R 10.1109/SMAP.2008.25 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %D 2007 %T Bilattice-based Logical Reasoning for Human Detection %A Shet, V. D. %A Neumann, J. %A Ramesh, V. %A Davis, Larry S.
%K automated visual surveillance systems %K bilattice-based logical reasoning %K complex interactions %K gradient histograms %K human detection %K parts-based detectors %K static images %K formal logic %K image recognition %K inference mechanisms %K surveillance %K video signal processing %X The capacity to robustly detect humans in video is a critical component of automated visual surveillance systems. This paper describes a bilattice based logical reasoning approach that exploits contextual information and knowledge about interactions between humans, and augments it with the output of different low level detectors for human detection. Detections from low level parts-based detectors are treated as logical facts and used to reason explicitly about the presence or absence of humans in the scene. Positive and negative information from different sources, as well as uncertainties from detections and logical rules, are integrated within the bilattice framework. This approach also generates proofs or justifications for each hypothesis it proposes. These justifications (or lack thereof) are further employed by the system to explain and validate, or reject potential hypotheses. This allows the system to explicitly reason about complex interactions between humans and handle occlusions. These proofs are also available to the end user as an explanation of why the system thinks a particular hypothesis is actually a human. We employ a boosted cascade of gradient histograms based detector to detect individual body parts. We have applied this framework to analyze the presence of humans in static images from different datasets. %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %P 1 - 8 %8 2007/06// %G eng %R 10.1109/CVPR.2007.383133 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %D 2007 %T Coarse-to-Fine Event Model for Human Activities %A Cuntoor, N. P.
%A Chellappa, Rama %K activity recognition %K coarse-to-fine representation %K event model %K event probabilities %K hidden Markov model framework %K human activities %K spatial resolution reduction %K video browsing %K video sequences %K hidden Markov models %K image resolution %K image sequences %K stability %K TSA airport tarmac surveillance dataset %K UCF indoor human action dataset %K video signal processing %X We analyze coarse-to-fine hierarchical representation of human activities in video sequences. It can be used for efficient video browsing and activity recognition. Activities are modeled using a sequence of instantaneous events. Events in activities can be represented in a coarse-to-fine hierarchy in several ways, i.e., there may not be a unique hierarchical structure. We present five criteria and quantitative measures for evaluating their effectiveness. The criteria are minimalism, stability, consistency, accessibility and applicability. It is desirable to develop activity models that rank highly on these criteria at all levels of hierarchy. In this paper, activities are represented as sequences of event probabilities computed using the hidden Markov model framework. Two aspects of hierarchies are analyzed: the effect of reduced frame rate on the accuracy of events detected at a finer scale, and the effect of reduced spatial resolution on activity recognition. Experiments using the UCF indoor human action dataset and the TSA airport tarmac surveillance dataset show encouraging results. %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %V 1 %P I-813 - I-816 %8 2007/04// %G eng %R 10.1109/ICASSP.2007.366032 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %D 2007 %T Colluding Fingerprinted Video using the Gradient Attack %A He, Shan %A Kirovski, D. %A M. Wu %K gradient attack %K multimedia content protection %K unauthorized distribution %K fingerprint identification %K digital fingerprinting %K disproportional effort %K Gaussian spread spectrum fingerprints %K Laplace fingerprints %K colluding fingerprinted video %K collusion attacks %K multimedia systems %K security of data %K video signal processing %X Digital fingerprinting is an emerging tool to protect multimedia content from unauthorized distribution by embedding a unique fingerprint into each user's copy. Although several fingerprinting schemes have been proposed in related work, disproportional effort has been targeted towards identifying effective collusion attacks on fingerprinting schemes. The recent introduction of the gradient attack has refined the definition of an optimal attack and demonstrated a strong effect on direct-sequence, uniformly distributed, and Gaussian spread spectrum fingerprints when applied to synthetic signals. In this paper, we apply the gradient attack on an existing well-engineered video fingerprinting scheme, refine the attack procedure, and demonstrate that the gradient attack is effective on Laplace fingerprints. Finally, we explore an improvement on fingerprint design to thwart the gradient attack. Results suggest that Laplace fingerprints should be avoided. However, we show that a signal mixed of Laplace and Gaussian fingerprints may serve as a design strategy to disable the gradient attack and force pirates into averaging as a form of adversary collusion. %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007.
IEEE International Conference on %V 2 %P II-161 - II-164 %8 2007/04// %G eng %R 10.1109/ICASSP.2007.366197 %0 Journal Article %J Annals of the History of Computing, IEEE %D 2007 %T Developing a Computer Science Department at the University of Maryland %A Minker, Jack %K administrative %K data %K department;educational %K MARYLAND %K processing; %K Science %K University;computer %X This article describes the first six years of the Computer Science Department, established in 1973 at the University of Maryland. The department evolved out of the Computer Science Center, which had been instituted in February 1962. In 1980, the National Academy of Sciences judged the department as being among the leading computer science departments in the US. %B Annals of the History of Computing, IEEE %V 29 %P 64 - 75 %8 2007/12//oct %@ 1058-6180 %G eng %N 4 %R 10.1109/MAHC.2007.4407446 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %D 2007 %T Epitomic Representation of Human Activities %A Cuntoor, N.P. %A Chellapa, Rama %K action %K activities %K airport %K dataset;epitomic %K dataset;UCF %K decomposition;modelling;statistics;video %K decomposition;TSA %K dynamical %K human %K indoor %K Iwasawa %K matrix %K matrix;human %K modeling;input %K processing; %K representation;estimated %K sequences;image %K sequences;matrix %K signal %K statistics;linear %K Surveillance %K system %K systems;video %X We introduce an epitomic representation for modeling human activities in video sequences. A video sequence is divided into segments within which the dynamics of objects is assumed to be linear and modeled using linear dynamical systems. The tuple consisting of the estimated system matrix, statistics of the input signal and the initial state value is said to form an epitome.
The system matrices are decomposed using the Iwasawa matrix decomposition to isolate the effect of rotation, scaling and projective action on the state vector. We demonstrate the usefulness of the proposed representation and decomposition for activity recognition using the TSA airport surveillance dataset and the UCF indoor human action dataset. %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %P 1 - 8 %8 2007/06// %G eng %R 10.1109/CVPR.2007.383135 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %D 2007 %T From Videos to Verbs: Mining Videos for Activities using a Cascade of Dynamical Systems %A Turaga, P.K. %A Veeraraghavan,A. %A Chellapa, Rama %K activities %K clustering;image %K clustering;video %K extraction;dynamical %K mining;video %K processing; %K sequence %K sequences;pattern %K signal %K stream;video %K systems;single %K video %X Clustering video sequences in order to infer and extract activities from a single video stream is an extremely important problem and has significant potential in video indexing, surveillance, activity discovery and event recognition. Clustering a video sequence into activities requires one to simultaneously recognize activity boundaries (activity consistent subsequences) and cluster these activity subsequences. In order to do this, we build a generative model for activities (in video) using a cascade of dynamical systems and show that this model is able to capture and represent a diverse class of activities. We then derive algorithms to learn the model parameters from a video stream and also show how a single video sequence may be clustered into different clusters where each cluster represents an activity. We also propose a novel technique to build affine, view, rate invariance of the activity into the distance metric for clustering. Experiments show that the clusters found by the algorithm correspond to semantically meaningful activities.
%B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %P 1 - 8 %8 2007/06// %G eng %R 10.1109/CVPR.2007.383170 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %D 2007 %T Joint Acoustic-Video Fingerprinting of Vehicles, Part I %A Cevher, V. %A Chellapa, Rama %A McClellan, J.H. %K acoustic %K acoustic-video %K components;joint %K detection;acoustic %K estimation;video %K fingerprinting;passive %K processing; %K processing;acoustic %K sensor;vehicle %K sensors;acoustic %K sensors;wheel %K SHAPE %K signal %K speed %K transducers;video %K wave-pattern;envelope %X We address vehicle classification and measurement problems using acoustic and video sensors. In this paper, we show how to estimate a vehicle's speed, width, and length by jointly estimating its acoustic wave-pattern using a single passive sensor that records the vehicle's drive-by noise. The acoustic wave-pattern is approximated using three envelope shape (ES) components, which approximate the shape of the received signal's power envelope. We incorporate the parameters of the ES components along with the estimates of the vehicle engine RPM and number of cylinders to create a vehicle profile vector that forms an intuitive discriminatory feature space. In the companion paper, we discuss vehicle classification and mensuration based on silhouette extraction and wheel detection, using a video sensor. Vehicle speed estimation and classification results are provided using field data. %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %V 2 %P II-745 - II-748 %8 2007/04// %G eng %R 10.1109/ICASSP.2007.366343 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %D 2007 %T Joint Acoustic-Video Fingerprinting of Vehicles, Part II %A Cevher, V. %A Guo, F. %A Sankaranarayanan,A.
C %A Chellapa, Rama %K acoustic-video %K analysis;video %K approximations;acoustic %K Bayesian %K colour %K density %K efficiency;joint %K estimation;Bayes %K fingerprinting;metrology %K framework;Laplacian %K functions;performance %K fusion;color %K identification;image %K invariants;computational %K methods;acoustic %K metrology;acoustic %K processing; %K processing;fingerprint %K signal %K video %X In this second paper, we first show how to estimate the wheelbase length of a vehicle using line metrology in video. We then address the vehicle fingerprinting problem using vehicle silhouettes and color invariants. We combine the acoustic metrology and classification results discussed in Part I with the video results to improve estimation performance and robustness. The acoustic video fusion is achieved in a Bayesian framework by assuming conditional independence of the observations of each modality. For the metrology density functions, Laplacian approximations are used for computational efficiency. Experimental results are given using field data %B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on %V 2 %P II-749 - II-752 %8 2007/04// %G eng %R 10.1109/ICASSP.2007.366344 %0 Conference Paper %B High-Performance Interconnects, 2007. HOTI 2007. 15th Annual IEEE Symposium on %D 2007 %T Layout-Accurate Design and Implementation of a High-Throughput Interconnection Network for Single-Chip Parallel Processing %A Balkan,A.O. %A Horak,M.N.
%A Gang Qu %A Vishkin, Uzi %K description %K design;mesh %K interconnection %K languages;multi-threading;multiprocessor %K MoT %K multi-threading;layout-accurate %K network;on-chip %K network;Verilog %K networks;parallel %K of %K on-chip %K Parallel %K processing; %K processing;hardware %K processor;parallel %K processors;on-chip %K programming;pipeline %K registers;single-chip %K simulations;eXplicit %K TREES %K XMT %X A mesh of trees (MoT) on-chip interconnection network has been proposed recently to provide high throughput between memory units and processors for single-chip parallel processing (Balkan et al., 2006). In this paper, we report our findings in bringing this concept to silicon. Specifically, we conduct cycle-accurate Verilog simulations to verify the analytical results claimed in (Balkan et al., 2006). We synthesize and obtain the layout of the MoT interconnection networks of various sizes. To further improve throughput, we investigate different arbitration primitives to handle load and store, the two most common memory operations. We also study the use of pipeline registers in large networks when there are long wires. Simulation based on full network layout demonstrates that significant throughput improvement can be achieved over the original proposed MoT interconnection network. The importance of this work lies in its validation of performance features of the MoT interconnection network, as they were previously shown to be competitive with traditional network solutions. The MoT network is currently used in an eXplicit multi-threading (XMT) on-chip parallel processor, which is engineered to support parallel programming. In that context, a 32-terminal MoT network could support up to 512 on-chip XMT processors. Our 8-terminal network that could serve 8 processor clusters (or 128 total processors), was also accepted recently for fabrication. %B High-Performance Interconnects, 2007. HOTI 2007. 
15th Annual IEEE Symposium on %P 21 - 28 %8 2007/08// %G eng %R 10.1109/HOTI.2007.11 %0 Conference Paper %B Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on %D 2007 %T Robust Visual Tracking Using the Time-Reversibility Constraint %A Wu,Hao %A Chellapa, Rama %A Sankaranarayanan,A. C %A Zhou,S. K %K backward %K constraint;video %K criterion;state %K forward %K frame %K KLT %K processing; %K processing;video %K processing;visual %K signal %K tracker;minimization %K tracking;minimisation;video %K vectors;time-reversibility %X Visual tracking is a very important front-end to many vision applications. We present a new framework for robust visual tracking in this paper. Instead of just looking forward in the time domain, we incorporate both forward and backward processing of video frames using a novel time-reversibility constraint. This leads to a new minimization criterion that combines the forward and backward similarity functions and the distances of the state vectors between the forward and backward states of the tracker. The new framework reduces the possibility of the tracker getting stuck in local minima and significantly improves the tracking robustness and accuracy. Our approach is general enough to be incorporated into most of the current tracking algorithms. We illustrate the improvements due to the proposed approach for the popular KLT tracker and a search based tracker. The experimental results show that the improved KLT tracker significantly outperforms the original KLT tracker. The time-reversibility constraint used for tracking can be incorporated to improve the performance of optical flow, mean shift tracking and other algorithms. %B Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on %P 1 - 8 %8 2007/10// %G eng %R 10.1109/ICCV.2007.4408956 %0 Journal Article %J Signal Processing Magazine, IEEE %D 2007 %T Signal Processing for Biometric Systems [DSP Forum] %A Jain, A.K. %A Chellapa, Rama %A Draper, S.C. 
%A Memon, N. %A Phillips,P.J. %A Vetro, A. %K (access %K biometric %K control);security;signal %K forum;signal %K magazine %K PROCESSING %K processing; %K security;biometric %K standardization;fusion %K systems %K technique;multibiometric %K technique;signal %K technology;biometrics %X This IEEE Signal Processing Magazine (SPM) forum discusses signal processing applications, technologies, requirements, and standardization of biometric systems. The forum members bring their expert insights into issues such as biometric security, privacy, and multibiometric and fusion techniques. The invited forum members are Prof. Anil K. Jain of Michigan State University, Prof. Rama Chellappa of the University of Maryland, Dr. Stark C. Draper of the University of Wisconsin in Madison, Prof. Nasir Memon of Polytechnic University, and Dr. P. Jonathon Phillips of the National Institute of Standards and Technology. The moderator of the forum is Dr. Anthony Vetro of Mitsubishi Electric Research Labs, and associate editor of SPM. %B Signal Processing Magazine, IEEE %V 24 %P 146 - 152 %8 2007/11// %@ 1053-5888 %G eng %N 6 %R 10.1109/MSP.2007.905886 %0 Journal Article %J Multimedia, IEEE Transactions on %D 2007 %T Target Tracking Using a Joint Acoustic Video System %A Cevher, V. %A Sankaranarayanan,A. C %A McClellan, J.H.
%A Chellapa, Rama %K (numerical %K acoustic %K adaptive %K appearance %K approach;synchronization;time-delay %K data %K delay;acoustic %K divergence;acoustic %K estimate;joint %K estimation;hidden %K feature %K filter;sliding %K Filtering %K fusion;multitarget %K fusion;synchronisation;target %K highways;direction-of-arrival %K Kullback-Leibler %K methods);sensor %K model;particle %K processing; %K processing;automated %K propagation %K removal;optical %K signal %K system;multimodal %K tracking;acoustic %K tracking;direction-of-arrival %K tracking;occlusion;online %K tracking;particle %K tracking;video %K variable;visual %K video %K window;state-space %X In this paper, a multitarget tracking system for collocated video and acoustic sensors is presented. We formulate the tracking problem using a particle filter based on a state-space approach. We first discuss the acoustic state-space formulation whose observations use a sliding window of direction-of-arrival estimates. We then present the video state space that tracks a target's position on the image plane based on online adaptive appearance models. For the joint operation of the filter, we combine the state vectors of the individual modalities and also introduce a time-delay variable to handle the acoustic-video data synchronization issue, caused by acoustic propagation delays. A novel particle filter proposal strategy for joint state-space tracking is introduced, which places the random support of the joint filter where the final posterior is likely to lie. By using the Kullback-Leibler divergence measure, it is shown that the joint operation of the filter decreases the worst case divergence of the individual modalities. The resulting joint tracking filter is quite robust against video and acoustic occlusions due to our proposal strategy. 
Computer simulations are presented with synthetic and field data to demonstrate the filter's performance %B Multimedia, IEEE Transactions on %V 9 %P 715 - 727 %8 2007/06// %@ 1520-9210 %G eng %N 4 %R 10.1109/TMM.2007.893340 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %D 2007 %T Using Stereo Matching for 2-D Face Recognition Across Pose %A Castillo,C. D %A Jacobs, David W. %K 2D %K estimation;stereo %K Face %K gallery %K image %K image;2D %K image;dynamic %K matching;dynamic %K matching;pose %K processing; %K programming;face %K programming;pose %K query %K recognition;2D %K recognition;image %K variation;stereo %X We propose using stereo matching for 2-D face recognition across pose. We match one 2-D query image to one 2-D gallery image without performing 3-D reconstruction. Then the cost of this matching is used to evaluate the similarity of the two images. We show that this cost is robust to pose variations. To illustrate this idea we built a face recognition system on top of a dynamic programming stereo matching algorithm. The method works well even when the epipolar lines we use do not exactly fit the viewpoints. We have tested our approach on the PIE dataset. In all the experiments, our method demonstrates effective performance compared with other algorithms. %B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on %P 1 - 8 %8 2007/06// %G eng %R 10.1109/CVPR.2007.383111 %0 Conference Paper %B Image Analysis and Processing, 2007. ICIAP 2007. 14th International Conference on %D 2007 %T Video Biometrics %A Chellapa, Rama %A Aggarwal,G. 
%K (access %K analysis;video %K biometrics;biometrics %K control);face %K dynamics;ofbiometric %K images;surveillance %K inherent %K MOTION %K processing; %K recognition;image %K recognition;still %K scenarios;unconstrained %K scenarios;video %K signal %X A strong requirement to come up with secure and user-friendly ways to authenticate and identify people, to safeguard their rights and interests, has probably been the main guiding force behind biometrics research. Though a vast amount of research has been done to recognize humans based on still images, the problem is still far from solved for unconstrained scenarios. This has led to an increased interest in using video for the task of biometric recognition. Not only does video provide more information, but it is also more suitable for recognizing humans in general surveillance scenarios. Other than the multitude of still frames, video makes it possible to characterize biometrics based on inherent dynamics like gait, which is not possible with still images. In this paper, we describe several recent algorithms to illustrate the usefulness of videos to identify humans. A brief discussion on remaining challenges is also included. %B Image Analysis and Processing, 2007. ICIAP 2007. 14th International Conference on %P 363 - 370 %8 2007/09// %G eng %R 10.1109/ICIAP.2007.4362805 %0 Conference Paper %B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on %D 2006 %T Motion Based Correspondence for 3D Tracking of Multiple Dim Objects %A Veeraraghavan,A. %A Srinivasan, M. %A Chellapa, Rama %A Baird, E. %A Lamont, R.
%K 3D %K analysis;motion %K analysis;video %K based %K cameras;feature %K correspondence;motion %K dim %K extraction;image %K extraction;multiple %K features %K MOTION %K objects;video %K processing; %K signal %K tracking;motion %X Tracking multiple objects in a video is a demanding task that is frequently encountered in several systems such as surveillance and motion analysis. The ability to track objects in 3D requires the use of multiple cameras. While tracking multiple objects using multiple video cameras, establishing correspondence between objects in the various cameras is a nontrivial task. Specifically, when the targets are dim or are very far away from the camera, appearance cannot be used in order to establish this correspondence. Here, we propose a technique to establish correspondence across cameras using the motion features extracted from the targets, even when the relative position of the cameras is unknown. Experimental results are provided for the problem of tracking multiple bees in natural flight using two cameras. The reconstructed 3D flight paths of the bees show some interesting flight patterns %B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on %V 2 %P II - II %8 2006/05// %G eng %R 10.1109/ICASSP.2006.1660431 %0 Journal Article %J Information Forensics and Security, IEEE Transactions on %D 2006 %T Robust and secure image hashing %A Swaminathan,A. %A Mao,Yinian %A M. Wu %K content-preserving %K cryptography; %K differential %K distortions; %K entropy; %K Filtering %K Fourier %K functions; %K hash %K hashing; %K image %K modifications; %K processing; %K secure %K theory; %K transform; %K transforms; %X Image hash functions find extensive applications in content authentication, database search, and watermarking. This paper develops a novel algorithm for generating an image hash based on Fourier transform features and controlled randomization.
We formulate the robustness of image hashing as a hypothesis testing problem and evaluate the performance under various image processing operations. We show that the proposed hash function is resilient to content-preserving modifications, such as moderate geometric and filtering distortions. We introduce a general framework to study and evaluate the security of image hashing systems. Under this new framework, we model the hash values as random variables and quantify their uncertainty in terms of differential entropy. Using this security framework, we analyze the security of the proposed schemes and several existing representative methods for image hashing. We then examine the security versus robustness tradeoff and show that the proposed hashing methods can provide excellent security and robustness. %B Information Forensics and Security, IEEE Transactions on %V 1 %P 215 - 230 %8 2006/06// %@ 1556-6013 %G eng %N 2 %R 10.1109/TIFS.2006.873601 %0 Journal Article %J Image Processing, IEEE Transactions on %D 2006 %T Structure From Planar Motion %A Li,Jian %A Chellapa, Rama %K algebra;road %K analysis;matrix %K camera;surveillance %K directional %K matrix;planar %K MOTION %K motion;stationary %K perspective %K processing; %K reconstruction %K signal %K system;image %K uncertainty;measurement %K vehicles;surveillance;video %K videos;vehicle %X Planar motion is arguably the most dominant type of motion in surveillance videos. The constraints on motion lead to a simplified factorization method for structure from planar motion when using a stationary perspective camera. Compared with methods for general motion, our approach has two major advantages: a measurement matrix that fully exploits the motion constraints is formed such that the new measurement matrix has a rank of at most 3, instead of 4; the measurement matrix needs similar scalings, but the estimation of fundamental matrices or epipoles is not needed.
Experimental results show that the algorithm is accurate and fairly robust to noise and inaccurate calibration. As the new measurement matrix is a nonlinear function of the observed variables, a different method is introduced to deal with the directional uncertainty in the observed variables. Differences and the dual relationship between planar motion and planar object are also clarified. Based on our method, a fully automated vehicle reconstruction system has been designed %B Image Processing, IEEE Transactions on %V 15 %P 3466 - 3477 %8 2006/11// %@ 1057-7149 %G eng %N 11 %R 10.1109/TIP.2006.881943 %0 Conference Paper %B Computer Design: VLSI in Computers and Processors, 2005. ICCD 2005. Proceedings. 2005 IEEE International Conference on %D 2005 %T Algorithmic and architectural design methodology for particle filters in hardware %A Sankaranarayanan,A. C %A Chellapa, Rama %A Srivastava, A. %K (numerical %K algorithmic %K architectural %K architectures; %K bearing %K complexity; %K computational %K design %K digital %K evolution; %K Filtering %K filtering; %K filters; %K implementation; %K methodology; %K methods); %K nonGaussian %K nonlinear %K only %K Parallel %K particle %K pipeline %K pipelined %K problem; %K processing; %K state %K tracking %K VLSI %K VLSI; %X In this paper, we present algorithmic and architectural methodology for building particle filters in hardware. Particle filtering is a new paradigm for filtering in the presence of non-Gaussian nonlinear state evolution and observation models. This technique has found widespread application in tracking, navigation, and detection problems, especially in a sensing environment. So far most particle filtering implementations are not lucrative for real time problems due to the excessive computational complexity involved. In this paper, we re-derive the particle filtering theory to make it more amenable to simplified VLSI implementations.
Furthermore, we present and analyze pipelined architectural methodology for designing these computational blocks. Finally, we present an application using the bearing only tracking problem and evaluate the proposed architecture and algorithmic methodology. %B Computer Design: VLSI in Computers and Processors, 2005. ICCD 2005. Proceedings. 2005 IEEE International Conference on %P 275 - 280 %8 2005/10// %G eng %R 10.1109/ICCD.2005.20 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %D 2005 %T Approximate expressions for the mean and the covariance of the maximum likelihood estimator for acoustic source localization %A Raykar,V.C. %A Duraiswami, Ramani %K (mathematics); %K acoustic %K approximate %K approximation %K array %K array; %K covariance %K estimation; %K expansion; %K expressions; %K function; %K likelihood %K localization; %K matrices; %K matrix; %K maximum %K mean %K microphone %K objective %K processing; %K series %K signal %K source %K Taylor %K theory; %K vector; %K vectors; %X Acoustic source localization using multiple microphones can be formulated as a maximum likelihood estimation problem. The estimator is implicitly defined as the minimum of a certain objective function. As a result, we cannot get explicit expressions for the mean and the covariance of the estimator. We derive approximate expressions for the mean vector and covariance matrix of the estimator using Taylor's series expansion of the implicitly defined estimator. The validity of our expressions is verified by Monte-Carlo simulations. We also study the performance of the estimator for different microphone array configurations. %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %V 3 %P iii/73 - iii/76 Vol. 3 %8 2005/03// %G eng %R 10.1109/ICASSP.2005.1415649 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on %D 2005 %T Fast illumination-invariant background subtraction using two views: error analysis, sensor placement and applications %A Lim,Ser-Nam %A Mittal,A. %A Davis, Larry S. %A Paragios,N. %K analysis; %K application; %K background %K cameras; %K configuration; %K DETECTION %K detection; %K error %K error; %K extraction; %K false %K feature %K handling; %K illumination-invariance; %K image %K intelligent %K matching; %K modeling; %K object %K placement; %K processing; %K sensor %K sensors; %K shadow %K stereo %K subtraction; %K video %X Background modeling and subtraction to detect new or moving objects in a scene is an important component of many intelligent video applications. Compared to a single camera, the use of multiple cameras leads to better handling of shadows, specularities and illumination changes due to the utilization of geometric information. Although the result of stereo matching can be used as the feature for detection, it has been shown that the detection process can be made much faster by a simple subtraction of the intensities observed at stereo-generated conjugate pairs in the two views. The methodology, however, suffers from false and missed detections due to some geometric considerations. In this paper, we perform a detailed analysis of such errors. Then, we propose a sensor configuration that eliminates false detections. Algorithms are also proposed that effectively eliminate most detection errors due to missed detections, specular reflections and objects being geometrically close to the background. Experiments on several scenes illustrate the utility and enhanced performance of the proposed approach compared to existing techniques. %B Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on %V 1 %P 1071 - 1078 vol.
1 %8 2005/06// %G eng %R 10.1109/CVPR.2005.155 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on %D 2005 %T Flattening curved documents in images %A Liang,Jian %A DeMenthon,D. %A David Doermann %K calibration; %K camera %K character %K content; %K curved %K distortion; %K document %K document; %K image %K images; %K OCR %K optical %K page %K pictures; %K printed %K processing; %K recognition; %K restoration; %K scanned %K techniques; %K textual %K warping; %X Compared to scanned images, document pictures captured by camera can suffer from distortions due to perspective and page warping. It is necessary to restore a frontal planar view of the page before other OCR techniques can be applied. In this paper we describe a novel approach for flattening a curved document in a single picture captured by an uncalibrated camera. To our knowledge this is the first reported method able to process general curved documents in images without camera calibration. We propose to model the page surface by a developable surface, and exploit the properties (parallelism and equal line spacing) of the printed textual content on the page to recover the surface shape. Experiments show that the output images are much more OCR friendly than the original ones. While our method is designed to work with any general developable surfaces, it can be adapted for typical special cases including planar pages, scans of thick books, and opened books. %B Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on %V 2 %P 338 - 345 vol. 2 %8 2005/06// %G eng %R 10.1109/CVPR.2005.163 %0 Journal Article %J Signal Processing Magazine, IEEE %D 2005 %T An interactive and team approach to multimedia design curriculum %A M. Wu %A Liu,K.
J.R %K approach; %K communication; %K courses; %K curriculum %K curriculum; %K design %K development; %K digital %K education; %K educational %K interactive %K learning; %K multimedia %K multimedia; %K processing; %K signal %K team %X Over the past decade, increasingly powerful technologies have made it easier to compress, distribute, and store multimedia content. The merger of computing and communications has created a ubiquitous infrastructure that brings digital multimedia closer to the users and opens up tremendous educational and commercial opportunities in multimedia content creation, delivery, rendering, and archiving for millions of users worldwide. Multimedia has become a basic skill demanded by an increasing number of potential jobs for electrical engineering/computer science graduates. In this article, the authors intend to share their experiences and new ways of thinking about curriculum development. It is beneficial for colleagues in the multimedia signal processing areas for use in developing or revising the curriculum to fit the needs and resources of their own programs. %B Signal Processing Magazine, IEEE %V 22 %P 14 - 19 %8 2005/11// %@ 1053-5888 %G eng %N 6 %R 10.1109/MSP.2005.1550186 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %D 2005 %T Moving Object Segmentation and Dynamic Scene Reconstruction Using Two Frames %A Agrawala, Ashok K. 
%A Chellapa, Rama %K 3D %K analysis; %K constraints; %K dynamic %K ego-motion %K estimation; %K flow %K image %K images; %K independent %K INTENSITY %K least %K mean %K median %K method; %K methods; %K model; %K MOTION %K motion; %K moving %K object %K of %K parallax %K parallax; %K parametric %K processing; %K reconstruction; %K scene %K segmentation; %K signal %K squares %K squares; %K static %K structure; %K subspace %K surface %K translational %K two-frame %K unconstrained %K video %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %V 2 %P 705 - 708 %8 2005//18/23 %G eng %R 10.1109/ICASSP.2005.1415502 %0 Conference Paper %B Information Fusion, 2005 8th International Conference on %D 2005 %T A new approach to image fusion based on cokriging %A Memarsadeghi,N. %A Le Moigne,J. %A Mount, Dave %A Morisette,J. %K ALI; %K analysis; %K based %K cokriging; %K component %K data; %K forecasting %K fusion %K fusion; %K geophysical %K geostatistical %K Hyperion %K image %K Interpolation %K interpolation; %K invasive %K ISFS %K method; %K metrics; %K PCA; %K principal %K processing; %K project; %K QUALITY %K quantitative %K remote %K remotely %K sensed %K sensing; %K sensor %K sensors; %K signal %K species %K system; %K techniques; %K transforms; %K wavelet %K wavelet-based %X We consider the image fusion problem involving remotely sensed data. We introduce cokriging as a method to perform fusion. We investigate the advantages of fusing Hyperion with ALI. This evaluation is performed by comparing the classification of the fused data with that of input images and by calculating well-chosen quantitative fusion quality metrics. We consider the invasive species forecasting system (ISFS) project as our fusion application. The fusion of ALI with Hyperion data is studied using PCA and wavelet-based fusion. 
We then propose utilizing a geostatistics-based interpolation method called cokriging as a new approach to image fusion. %B Information Fusion, 2005 8th International Conference on %V 1 %P 8 pp. %8 2005/07// %G eng %R 10.1109/ICIF.2005.1591912 %0 Conference Paper %B Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on %D 2005 %T RDF aggregate queries and views %A Hung,E. %A Deng,Yu %A V.S. Subrahmanian %K aggregate %K databases; %K DBMS; %K description %K framework; %K languages; %K Maintenance %K methods; %K processing; %K queries; %K query %K RDF %K relational %K resource %K standard; %K standards; %K view %K Web %X Resource description framework (RDF) is a rapidly expanding Web standard. RDF databases attempt to track the massive amounts of Web data and services available. In this paper, we study the problem of aggregate queries. We develop an algorithm to compute answers to aggregate queries over RDF databases and algorithms to maintain views involving those aggregates. Though RDF data can be stored in a standard relational DBMS (and hence we can execute standard relational aggregate queries and view maintenance methods on them), we show experimentally that our algorithms that operate directly on the RDF representation exhibit significantly superior performance. %B Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on %P 717 - 728 %8 2005/04// %G eng %R 10.1109/ICDE.2005.121 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05).
IEEE International Conference on %D 2005 %T A robust and self-reconfigurable design of spherical microphone array for multi-resolution beamforming %A Zhiyun Li %A Duraiswami, Ramani %K 3D %K anti-terrorism; %K array %K array; %K arrays; %K audio %K beam %K beamforming; %K beampattern %K directivity %K Frequency %K microphone %K multiresolution %K omnidirectional %K optimisation; %K optimization; %K processing; %K reorganization %K response; %K robustness; %K sampling; %K self-reconfigurable %K signal %K soundfield %K spherical %K steering; %X We describe a robust and self-reconfigurable design of a spherical microphone array for beamforming. Our approach achieves a multi-resolution spherical beamformer with performance that is either optimal in the approximation of desired beampattern or is optimal in the directivity achieved, both robustly. Our implementation converges to the optimal performance quickly while exactly satisfying the specified frequency response and robustness constraint in each iteration step without accumulated round-off errors. The advantage of this design lies in its robustness and self-reconfiguration in microphone array reorganization, such as microphone failure, which is highly desirable in online maintenance and anti-terrorism. Design examples and simulation results are presented. %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %V 4 %P iv/1137 - iv/1140 Vol. 4 %8 2005/03// %G eng %R 10.1109/ICASSP.2005.1416214 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %D 2005 %T Security of feature extraction in image hashing %A Swaminathan,A. %A Mao,Yinian %A M.
Wu %K cryptography; %K differential %K digital %K entropy; %K extraction; %K feature %K functions; %K hash %K hashing; %K image %K metric; %K processing; %K randomness; %K robustness; %K Security %K signature; %K signatures; %X Security and robustness are two important requirements for image hash functions. We introduce "differential entropy" as a metric to quantify the amount of randomness in image hash functions and to study their security. We present a mathematical framework and derive expressions for the proposed security metric for various common image hashing schemes. Using the proposed security metric, we discuss the trade-offs between security and robustness in image hashing. %B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on %V 2 %P ii/1041 - ii/1044 Vol. 2 %8 2005/03// %G eng %R 10.1109/ICASSP.2005.1415586 %0 Journal Article %J Speech and Audio Processing, IEEE Transactions on %D 2005 %T Speaker Localization Using Excitation Source Information in Speech %A Raykar,V.C. %A Yegnanarayana,B. %A Prasanna,S. R.M %A Duraiswami, Ramani %K correlation %K correlation; %K cross %K Delay %K error %K error; %K estimation; %K excitation %K generalized %K information; %K localization; %K mean %K methods; %K processing; %K production; %K root %K source %K speaker %K speech %K square %K TIME %X This paper presents the results of simulation and real room studies for localization of a moving speaker using information about the excitation source of speech production. The first step in localization is the estimation of time-delay from speech collected by a pair of microphones. Methods for time-delay estimation generally use spectral features that correspond mostly to the shape of the vocal tract during speech production. Spectral features are affected by degradations due to noise and reverberation.
This paper proposes a method for localizing a speaker using features that arise from the excitation source during speech production. Experiments were conducted by simulating different noise and reverberation conditions to compare the time-delay estimation and source localization performance of the proposed method with that of the spectrum-based generalized cross correlation (GCC) methods. The results show that the proposed method produces fewer discrepancies in the estimated time-delays. The bias, variance, and root mean square error (RMSE) of the proposed method are consistently equal to or lower than those of the GCC methods. The locations of a moving speaker estimated using the time-delays obtained by the proposed method are closer to the actual values than those obtained by the GCC method. %B Speech and Audio Processing, IEEE Transactions on %V 13 %P 751 - 761 %8 2005/09// %@ 1063-6676 %G eng %N 5 %R 10.1109/TSA.2005.851907 %0 Conference Paper %B Image Processing, 2005. ICIP 2005. IEEE International Conference on %D 2005 %T Tracking objects in video using motion and appearance models %A Sankaranarayanan,A. C %A Chellapa, Rama %A Qinfen Zheng %K algorithm; %K analysis; %K appearance %K background %K estimation; %K image %K likelihood %K maximum %K model; %K models; %K MOTION %K object %K processing; %K signal %K target %K tracking %K tracking; %K video %K visual %X This paper proposes a visual tracking algorithm that combines motion and appearance in a statistical framework. It is assumed that image observations are generated simultaneously from a background model and a target appearance model. This is different from conventional appearance-based tracking, which does not use motion information. The proposed algorithm attempts to maximize the likelihood ratio of the tracked region, derived from appearance and background models.
Incorporation of motion in appearance-based tracking provides robust tracking, even when the target violates the appearance model. We show that the proposed algorithm tracks targets effectively and efficiently over long time intervals. %B Image Processing, 2005. ICIP 2005. IEEE International Conference on %V 2 %P II - 394-7 %8 2005/09// %G eng %R 10.1109/ICIP.2005.1530075 %0 Conference Paper %B Advanced Video and Signal Based Surveillance, 2005. AVSS 2005. IEEE Conference on %D 2005 %T VidMAP: video monitoring of activity with Prolog %A Shet,V. D %A Harwood,D. %A Davis, Larry S. %K activities %K algorithms; %K based %K Computer %K computerised %K engine; %K higher %K image %K level %K Logic %K monitoring; %K multicamera %K processing; %K programming; %K Prolog %K PROLOG; %K reasoning %K recognition; %K scenario; %K signal %K streaming; %K streams; %K Surveillance %K surveillance; %K system; %K video %K VISION %K vision; %K visual %X This paper describes the architecture of a visual surveillance system that combines real-time computer vision algorithms with logic programming to represent and recognize activities involving interactions amongst people, packages and the environments through which they move. The low-level computer vision algorithms log primitive events of interest as observed facts, while the higher-level Prolog-based reasoning engine uses these facts in conjunction with predefined rules to recognize various activities in the input video streams. The system is illustrated in action on a multi-camera surveillance scenario that includes both security and safety violations. %B Advanced Video and Signal Based Surveillance, 2005. AVSS 2005. IEEE Conference on %P 224 - 229 %8 2005/09// %G eng %R 10.1109/AVSS.2005.1577271 %0 Conference Paper %B Distributed Computing Systems, 2004. Proceedings. 24th International Conference on %D 2004 %T Adaptive replication in peer-to-peer systems %A Gopalakrishnan,V. %A Silaghi,B.
%A Bhattacharjee, Bobby %A Keleher,P. %K adaptive %K allocation; %K data %K databases; %K decentralized %K delivery %K distributed %K LAR %K low-latency %K peer-to-peer %K processing; %K protocol; %K replicated %K replication %K resource %K strategies; %K structured %K system-neutral %K system; %K systems; %X Peer-to-peer systems can be used to form a low-latency decentralized data delivery system. Structured peer-to-peer systems provide both low latency and excellent load balance with uniform query and data distributions. Under the more common skewed access distributions, however, individual nodes are easily overloaded, resulting in poor global performance and lost messages. This paper describes a lightweight, adaptive, and system-neutral replication protocol, called LAR, that maintains low access latencies and good load balance even under highly skewed demand. We apply LAR to Chord and show that it has lower overhead and better performance than existing replication strategies. %B Distributed Computing Systems, 2004. Proceedings. 24th International Conference on %P 360 - 369 %8 2004/// %G eng %R 10.1109/ICDCS.2004.1281601 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %D 2004 %T Appearance-based tracking and recognition using the 3D trilinear tensor %A Jie Shao %A Zhou,S. 
K %A Chellapa, Rama %K 3D %K adaptive %K affine-transformation %K airborne %K algorithm; %K appearance %K appearance-based %K based %K estimation; %K geometrical %K image %K mathematical %K novel %K object %K operator; %K operators; %K perspective %K prediction; %K processing; %K recognition; %K representation; %K signal %K structure %K synthesis; %K template %K tensor %K tensor; %K tensors; %K tracking; %K transformation; %K trilinear %K updating; %K video %K video-based %K video; %K view %X The paper presents an appearance-based adaptive algorithm for simultaneous tracking and recognition by generalizing the transformation model to 3D perspective transformation. A trilinear tensor operator is used to represent the 3D geometrical structure. The tensor is estimated by predicting the corresponding points using the existing affine-transformation-based algorithm. The estimated tensor is used to synthesize novel views to update the appearance templates. Some experimental results using airborne video are presented. %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %V 3 %P iii - 613-16 vol.3 %8 2004/05// %G eng %R 10.1109/ICASSP.2004.1326619 %0 Conference Paper %B Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on %D 2004 %T Arbitrate-and-move primitives for high throughput on-chip interconnection networks %A Balkan,A.O.
%A Gang Qu %A Vishkin, Uzi %K 8 %K arbiter %K arbitrate-and-move %K architecture; %K asynchronous %K balanced %K binary %K circuit %K circuit; %K circuits; %K consumption; %K data %K explicit %K interconnection %K interconnections; %K leaf %K mesh-of-trees %K multi-threading; %K Multithreading %K n-leaf %K network; %K pipeline %K pipelined %K power %K primitive %K processing; %K reduced %K simulation; %K structures; %K synchronous %K synchrony %K system-on-chip; %K tree %K tree; %X An n-leaf pipelined balanced binary tree is used for arbitration of order and movement of data from n input ports to one output port. A novel arbitrate-and-move primitive circuit for every node of the tree, which is based on a concept of reduced synchrony that benefits from attractive features of both asynchronous and synchronous designs, is presented. The design objective of the pipelined binary tree is to provide a key building block in a high-throughput mesh-of-trees interconnection network for the Explicit Multi-Threading (XMT) architecture, a recently introduced parallel computation framework. The proposed reduced synchrony circuit was compared with asynchronous and synchronous designs of arbitrate-and-move primitives. Simulations with 0.18 μm technology show that compared to an asynchronous design, the proposed reduced synchrony implementation achieves a higher throughput, up to 2 Giga-Requests per second on an 8-leaf binary tree. Our circuit also consumes less power than the synchronous design, and requires less silicon area than both the synchronous and asynchronous designs. %B Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on %V 2 %P II - 441-4 Vol.2 %8 2004/05// %G eng %R 10.1109/ISCAS.2004.1329303 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %D 2004 %T Automatic position calibration of multiple microphones %A Raykar,V.C.
%A Duraiswami, Ramani %K approximations; %K array %K audio %K AUTOMATIC %K calibration; %K closed %K covariance; %K dimensional %K estimation; %K form %K function %K implicit %K least %K likelihood %K loudspeakers; %K maximum %K microphone %K microphones; %K minimisation; %K minimization; %K multiple %K nonlinear %K position %K positions; %K problem; %K processing; %K signal %K solution; %K squares %K theorem; %K three %X We describe a method to determine automatically the relative three dimensional positions of multiple microphones using at least five loudspeakers in unknown positions. The only assumption we make is that there is a microphone which is very close to a loudspeaker. In our experimental setup, we attach one microphone to each loudspeaker. We derive the maximum likelihood estimator and the solution turns out to be a non-linear least squares problem. A closed form solution which can be used as the initial guess for the minimization routine is derived. We also derive an approximate expression for the covariance of the estimator using the implicit function theorem. Using this, we analyze the performance of the estimator with respect to the positions of the loudspeakers. The algorithm is validated using both Monte-Carlo simulations and a real-time experimental setup. %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %V 4 %P iv-69 - iv-72 vol.4 %8 2004/05// %G eng %R 10.1109/ICASSP.2004.1326765 %0 Conference Paper %B Image Processing, 2004. ICIP '04. 2004 International Conference on %D 2004 %T Data hiding in curves for collusion-resistant digital fingerprinting %A Gou,Hongmei %A M.
Wu %K (mathematics); %K B-spline %K coding; %K collusion-resistant %K CONTROL %K data %K devices; %K digital %K document %K encapsulation; %K extraction; %K feature %K fingerprinting; %K hiding; %K image %K INPUT %K maps; %K model; %K pen-based %K points; %K printing-and-scanning %K processing; %K robustness; %K sequence; %K spectrum %K splines %K spread %K topographic %K watermarking; %X This paper presents a new data hiding method for curves. The proposed algorithm parameterizes a curve using the B-spline model and adds a spread spectrum sequence in the coordinates of the B-spline control points. We demonstrate through experiments the robustness of the proposed data hiding algorithm against printing-and-scanning and collusions, and show its feasibility for collusion-resistant fingerprinting of topographic maps as well as writings/drawings from pen-based input devices. %B Image Processing, 2004. ICIP '04. 2004 International Conference on %V 1 %P 51 - 54 Vol. 1 %8 2004/10// %G eng %R 10.1109/ICIP.2004.1418687 %0 Conference Paper %B Image Processing, 2004. ICIP '04. 2004 International Conference on %D 2004 %T A fine-structure image/video quality measure using local statistics %A Kim,Kyungnam %A Davis, Larry S. %K algorithm; %K background %K degradation; %K detection; %K foreground %K image %K line-structure %K local %K measure; %K modeling; %K no-reference %K object %K objective %K processing; %K QUALITY %K signal %K statistics; %K subtraction %K surveillance; %K video %X An objective no-reference measure is presented to assess line-structure image/video quality. It was designed to measure image/video quality for video surveillance applications, especially for background modeling and foreground object detection. The proposed measure using local statistics reflects image degradation well in terms of noise and blur.
The experimental results on a background subtraction algorithm validate the usefulness of the proposed measure by showing its correlation with the algorithm's performance. %B Image Processing, 2004. ICIP '04. 2004 International Conference on %V 5 %P 3535 - 3538 Vol. 5 %8 2004/10// %G eng %R 10.1109/ICIP.2004.1421879 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %D 2004 %T Flexible layout and optimal cancellation of the orthonormality error for spherical microphone arrays %A Zhiyun Li %A Duraiswami, Ramani %A Grassi,E. %A Davis, Larry S. %K array %K audio %K base; %K beamforming; %K cable %K cancellation %K correction; %K error %K flexible %K harmonics; %K higher %K layout; %K microphone %K microphones; %K mounting %K optimization; %K order %K orthonormality %K outlets; %K processing; %K signal %K spherical %K surface; %X This paper describes an approach to achieving a flexible layout of microphones on the surface of a spherical microphone array for beamforming. Our approach achieves orthonormality of spherical harmonics to higher order for relatively distributed layouts. This gives great flexibility in microphone layout on the spherical surface. One direct advantage is that it makes it much easier to build a real-world system, such as those with cable outlets and a mounting base, with minimal effects on the performance. Simulation results are presented. %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %V 4 %P iv-41 - iv-44 vol.4 %8 2004/05// %G eng %R 10.1109/ICASSP.2004.1326758 %0 Conference Paper %B Automatic Face and Gesture Recognition, 2004. Proceedings.
Sixth IEEE International Conference on %D 2004 %T Illuminating light field: image-based face recognition across illuminations and poses %A Zhou,Shaohua %A Chellapa, Rama %K Face %K field; %K illuminating %K image-based %K Lambertian %K light %K lighting; %K model; %K multidimensional %K poses; %K processing; %K recognition; %K reflectance %K reflectivity; %K signal %X We present an image-based method for face recognition across different illuminations and different poses, where the term 'image-based' means that only 2D images are used and no explicit 3D models are needed. As face recognition across illuminations and poses involves three factors, namely identity, illumination, and pose, generalizations from known identities to novel identities, from known illuminations to novel illuminations, and from known poses to unknown poses are desired. Our approach, called the illuminating light field, derives an identity signature that is invariant to illuminations and poses, where a subspace encoding is assumed for the identity, a Lambertian reflectance model for the illumination, and a light field model for the poses. Experimental results using the PIE database demonstrate the effectiveness of the proposed approach. %B Automatic Face and Gesture Recognition, 2004. Proceedings. Sixth IEEE International Conference on %P 229 - 234 %8 2004/05// %G eng %R 10.1109/AFGR.2004.1301536 %0 Conference Paper %B Cluster Computing and the Grid, 2004. CCGrid 2004. IEEE International Symposium on %D 2004 %T Multi-dimensional quorum sets for read-few write-many replica control protocols %A Silaghi,B. %A Keleher,P.
%A Bhattacharjee, Bobby %K accesses; %K availability; %K cache %K caching; %K Communication %K complexity; %K CONTROL %K d-spaces; %K distributed %K efficiency; %K lightweight %K logical %K multi-dimensional %K processing; %K protocols; %K quorum %K read %K read-few %K reconfiguration; %K replica %K sets; %K storage; %K structures; %K update %K write-many %X We describe d-spaces, a replica control protocol defined in terms of quorum sets on multi-dimensional logical structures. Our work is motivated by asymmetrical access patterns, where the number of read accesses to data is dominant relative to update accesses, i.e. where the protocols should be read-few write-many. D-spaces are optimal with respect to quorum group sizes. The quality of the tradeoff between read efficiency and update availability is not matched by existing quorum protocols. We also propose a novel scheme for implementing d-spaces that combines caching and local information to provide a best-effort form of global views. This allows quorum reconfiguration to be lightweight without impacting access latencies, even when the rate of membership changes is very high. %B Cluster Computing and the Grid, 2004. CCGrid 2004. IEEE International Symposium on %P 355 - 362 %8 2004/04// %G eng %R 10.1109/CCGrid.2004.1336588 %0 Conference Paper %B Image Processing, 2004. ICIP '04. 2004 International Conference on %D 2004 %T Multi-level fast multipole method for thin plate spline evaluation %A Zandifar,A. %A Lim,S. %A Duraiswami, Ramani %A Gumerov, Nail A. %A Davis, Larry S. %K (mathematics); %K Computer %K deformation; %K evaluation; %K fast %K image %K MATCHING %K matching; %K metal %K method; %K multilevel %K multipole %K nonrigid %K pixel; %K plate %K plate; %K processing; %K registration; %K resolution; %K spline %K splines %K thin %K vision; %X Image registration is an important problem in image processing and computer vision. Much recent work in image registration is on matching non-rigid deformations.
Thin plate splines are an effective image registration method when the deformation between two images can be modeled as the bending of a thin metal plate on point constraints such that the topology is preserved (non-rigid deformation). However, evaluating the computed TPS model at all the image pixels is computationally expensive. To speed it up, we introduce the multi-level fast multipole method (MLFMM) for this purpose. Our contribution lies in the presentation of a clear and concise MLFMM framework for TPS, which will be useful for future application developments. The achieved speedup using MLFMM is an improvement from O(N²) to O(N log N). We show that the fast evaluation outperforms the brute force method while maintaining an acceptable error bound. %B Image Processing, 2004. ICIP '04. 2004 International Conference on %V 3 %P 1683 - 1686 Vol. 3 %8 2004/10// %G eng %R 10.1109/ICIP.2004.1421395 %0 Conference Paper %B Image Processing, 2004. ICIP '04. 2004 International Conference on %D 2004 %T Multiple view tracking of humans modelled by kinematic chains %A Sundaresan, A. %A Chellapa, Rama %A RoyChowdhury, R. %K 3D %K algorithm; %K analysis; %K body %K calibrated %K cameras; %K chain %K displacement; %K error %K estimation; %K human %K image %K iterative %K kinematic %K kinematics; %K methods; %K model; %K MOTION %K motion; %K multiple %K parameters; %K perspective %K Pixel %K processing; %K projection %K sequences; %K signal %K tracking; %K video %K view %X We use a kinematic chain to model human body motion. We estimate the kinematic chain motion parameters using pixel displacements calculated from video sequences obtained from multiple calibrated cameras to perform tracking.
We derive a linear relation between the 2D motion of pixels and the 3D motion parameters of various body parts using a perspective projection model for the cameras, a rigid body motion model for the base body and the kinematic chain model for the body parts. An error analysis of the estimator is provided, leading to an iterative algorithm for calculating the motion parameters from the pixel displacements. We provide experimental results to demonstrate the accuracy of our formulation. We also compare our iterative algorithm to the noniterative algorithm and discuss its robustness in the presence of noise. %B Image Processing, 2004. ICIP '04. 2004 International Conference on %V 2 %P 1009 - 1012 Vol.2 %8 2004/10// %G eng %R 10.1109/ICIP.2004.1419472 %0 Conference Paper %B Mixed and Augmented Reality, 2004. ISMAR 2004. Third IEEE and ACM International Symposium on %D 2004 %T Recording and reproducing high order surround auditory scenes for mixed and augmented reality %A Zhiyun Li %A Duraiswami, Ramani %A Davis, Larry S. %K array; %K audio %K auditory %K augmented %K Computer %K graphics; %K high %K loudspeaker %K microphone %K mixed %K order %K processing; %K reality %K reality; %K scene; %K signal %K surround %K system; %K technology; %K virtual %K VISION %K vision; %X Virtual reality systems are largely based on computer graphics and vision technologies. However, sound also plays an important role in humans' interaction with the surrounding environment, especially for visually impaired people. In this paper, we develop the theory of recording and reproducing real-world surround auditory scenes in high orders using specially designed microphone and loudspeaker arrays. It is complementary to vision-based technologies in creating mixed and augmented realities. Design examples and simulations are presented. %B Mixed and Augmented Reality, 2004. ISMAR 2004.
Third IEEE and ACM International Symposium on %P 240 - 249 %8 2004/11// %G eng %R 10.1109/ISMAR.2004.51 %0 Journal Article %J Multimedia, IEEE Transactions on %D 2004 %T Rendering localized spatial audio in a virtual auditory space %A Zotkin,Dmitry N %A Duraiswami, Ramani %A Davis, Larry S. %K (computer %K 3-D %K audio %K audio; %K auditory %K augmented %K data %K environments; %K functions; %K graphics); %K Head %K interfaces; %K perceptual %K processing; %K reality %K reality; %K related %K rendering %K rendering; %K scene %K signal %K sonification; %K spaces; %K spatial %K transfer %K user %K virtual %X High-quality virtual audio scene rendering is required for emerging virtual and augmented reality applications, perceptual user interfaces, and sonification of data. We describe algorithms for creation of virtual auditory spaces by rendering cues that arise from anatomical scattering, environmental scattering, and dynamical effects. We use a novel way of personalizing the head related transfer functions (HRTFs) from a database, based on anatomical measurements. Details of algorithms for HRTF interpolation, room impulse response creation, HRTF selection from a database, and audio scene presentation are presented. Our system runs in real time on an office PC without specialized DSP hardware. %B Multimedia, IEEE Transactions on %V 6 %P 553 - 564 %8 2004/08// %@ 1520-9210 %G eng %N 4 %R 10.1109/TMM.2004.827516 %0 Conference Paper %B Image Processing, 2004. ICIP '04. 2004 International Conference on %D 2004 %T Robust Bayesian cameras motion estimation using random sampling %A Qian, G. 
%A Chellapa, Rama %A Qinfen Zheng %K 3D %K baseline %K Bayesian %K CAMERAS %K cameras; %K coarse-to-fine %K consensus %K density %K estimation; %K feature %K function; %K hierarchy %K image %K images; %K importance %K matching; %K MOTION %K posterior %K probability %K probability; %K processing; %K random %K RANSAC; %K real %K realistic %K sample %K sampling; %K scheme; %K sequences; %K stereo %K strategy; %K synthetic %K wide %X In this paper, we propose an algorithm for robust 3D motion estimation of wide baseline cameras from noisy feature correspondences. The posterior probability density function of the camera motion parameters is represented by weighted samples. The algorithm employs a hierarchical coarse-to-fine strategy. First, a coarse prior distribution of camera motion parameters is estimated using the random sample consensus scheme (RANSAC). Based on this estimate, a refined posterior distribution of camera motion parameters can then be obtained through importance sampling. Experimental results using both synthetic and real image sequences indicate the efficacy of the proposed algorithm. %B Image Processing, 2004. ICIP '04. 2004 International Conference on %V 2 %P 1361 - 1364 Vol.2 %8 2004/10// %G eng %R 10.1109/ICIP.2004.1419754 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %D 2004 %T Robust two-camera tracking using homography %A Yue,Zhanfeng %A Zhou,S. K %A Chellapa, Rama %K Carlo %K filter; %K filters; %K frame %K framework; %K homography; %K image %K method; %K methods; %K Monte %K nonlinear %K occlusions; %K optical %K particle %K processing; %K robust %K sequences; %K sequential %K signal %K statistics; %K tracking %K tracking; %K two %K two-camera %K video %K view %K visual %X The paper introduces a two-view tracking method which uses the homography relation between the two views to handle occlusions.
An adaptive appearance-based model is incorporated in a particle filter to realize robust visual tracking. Occlusion is detected using robust statistics. When there is occlusion in one view, the homography from this view to other views is estimated from previous tracking results and used to infer the correct transformation for the occluded view. Experimental results show the robustness of the two-view tracker. %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %V 3 %P iii - 1-4 vol.3 %8 2004/05// %G eng %R 10.1109/ICASSP.2004.1326466 %0 Conference Paper %B Image Processing, 2004. ICIP '04. 2004 International Conference on %D 2004 %T Simultaneous background and foreground modeling for tracking in surveillance video %A Shao, J. %A Zhou,S. K %A Chellapa, Rama %K algorithm; %K analysis; %K background-foreground %K displacement %K estimation; %K image %K information; %K INTENSITY %K modeling; %K MOTION %K processes; %K processing; %K resolution; %K sequences; %K signal %K Stochastic %K Surveillance %K surveillance; %K tracking %K tracking; %K video %X We present a stochastic tracking algorithm for surveillance video where targets are dim and at low resolution. The algorithm builds motion models for both background and foreground by integrating motion and intensity information. Some other merits of the algorithm include adaptive selection of feature points for scene description and defining proper cost functions for displacement estimation. The experimental results show tracking robustness and precision in challenging video sequences. %B Image Processing, 2004. ICIP '04. 2004 International Conference on %V 2 %P 1053 - 1056 Vol.2 %8 2004/10// %G eng %R 10.1109/ICIP.2004.1419483 %0 Conference Paper %B Intelligent Transportation Systems, 2004. Proceedings.
The 7th International IEEE Conference on %D 2004 %T A spherical microphone array system for traffic scene analysis %A Zhiyun Li %A Duraiswami, Ramani %A Grassi,E. %A Davis, Larry S. %K -6 %K 3D %K analysis; %K array %K arrays; %K audio; %K auditory %K beamformer; %K capture; %K dB; %K environment; %K gain; %K microphone %K NOISE %K noise; %K processing; %K real %K robust %K scene %K signal %K spherical %K system; %K traffic %K traffic; %K virtual %K white %K World %X This paper describes a practical spherical microphone array system for traffic auditory scene capture and analysis. Our system uses 60 microphones positioned on the rigid surface of a sphere. We then propose an optimal design of a robust spherical beamformer with minimum white noise gain (WNG) of -6 dB. We test this system in a real-world traffic environment. Some preliminary simulation and experimental results are presented to demonstrate its performance. This system may also find applications in broader areas such as 3D audio and virtual environments. %B Intelligent Transportation Systems, 2004. Proceedings. The 7th International IEEE Conference on %P 338 - 342 %8 2004/10// %G eng %R 10.1109/ITSC.2004.1398921 %0 Conference Paper %B Parallel Architectures, Algorithms and Networks, 2004. Proceedings. 7th International Symposium on %D 2004 %T Strategies for exploring large scale data %A JaJa, Joseph F.
%K algorithms; %K association; %K asymptotic %K bounds; %K business %K data %K data; %K database %K databases; %K demographic %K discovery; %K Indexing %K indexing; %K information %K knowledge %K large %K linear %K mining; %K multidimensional %K objects; %K optimal %K Parallel %K pattern %K processing; %K query %K querying; %K range %K scale %K scientific %K search %K serial %K series %K series; %K simulation %K space %K structure; %K structures; %K techniques; %K temporal %K TIME %K value %K very %K window; %X We consider the problem of querying large scale multidimensional time series data to discover events of interest, test and validate hypotheses, or to associate temporal patterns with specific events. This type of data currently dominates most other types of available data, and will very likely become even more prevalent in the future given the current trends in collecting time series of business, scientific, demographic, and simulation data. The ability to explore such collections interactively, even at a coarse level, will be critical in discovering the information and knowledge embedded in such collections. We develop indexing techniques and search algorithms to efficiently handle temporal range value querying of multidimensional time series data. Our indexing uses linear space data structures that enable the handling of queries in I/O time that is essentially the same as that of handling a single time slice, assuming the availability of a logarithmic number of processors as a function of the temporal window. A data structure with provably almost optimal asymptotic bounds is also presented for the case when the number of multidimensional objects is relatively small. These techniques improve significantly over standard techniques for either serial or parallel processing, and are evaluated by extensive experimental results that confirm their superior performance. %B Parallel Architectures, Algorithms and Networks, 2004. Proceedings. 
7th International Symposium on %P 2 - 2 %8 2004/05// %G eng %R 10.1109/ISPAN.2004.1300447 %0 Conference Paper %B Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on %D 2004 %T A system identification approach for video-based face recognition %A Aggarwal,G. %A Chowdhury, A.K.R. %A Chellapa, Rama %K and %K autoregressive %K average %K dynamical %K Face %K gallery %K identification; %K image %K linear %K model; %K moving %K processes; %K processing; %K recognition; %K sequences; %K signal %K system %K system; %K video %K video-based %X The paper poses video-to-video face recognition as a dynamical system identification and classification problem. We model a moving face as a linear dynamical system whose appearance changes with pose. An autoregressive and moving average (ARMA) model is used to represent such a system. The choice of the ARMA model is based on its ability to account for changes in appearance while modeling the dynamics of pose, expression, etc. Recognition is performed using the concept of subspace angles to compute distances between probe and gallery video sequences. The results obtained are very promising given the extent of pose, expression, and illumination variation in the video data used for experiments. %B Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on %V 4 %P 175 - 178 Vol.4 %8 2004/08// %G eng %R 10.1109/ICPR.2004.1333732 %0 Conference Paper %B Image Processing, 2004. ICIP '04. 2004 International Conference on %D 2004 %T Uncalibrated stereo rectification for automatic 3D surveillance %A Lim,S.-N. %A Mittal,A. %A Davis, Larry S. %A Paragios,N. %K 3D %K AUTOMATIC %K conjugate %K epipolar %K image %K lines; %K matching; %K method; %K processing; %K rectification %K scene; %K stereo %K surveillance; %K uncalibrated %K urban %X We describe a stereo rectification method suitable for automatic 3D surveillance. 
We take advantage of the fact that in a typical urban scene, there is ordinarily a small number of dominant planes. Given two views of the scene, we align a dominant plane in one view with the other. Conjugate epipolar lines between the reference view and plane-aligned image become geometrically identical and can be added to the rectified image pair line by line. Selecting conjugate epipolar lines to cover the whole image is simplified since they are geometrically identical. In addition, the polarities of conjugate epipolar lines are automatically preserved by plane alignment, which simplifies stereo matching. %B Image Processing, 2004. ICIP '04. 2004 International Conference on %V 2 %P 1357 - 1360 Vol.2 %8 2004/10// %G eng %R 10.1109/ICIP.2004.1419753 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %D 2004 %T Vehicle detection and tracking using acoustic and video sensors %A Chellapa, Rama %A Qian,Gang %A Qinfen Zheng %K acoustic %K applications; %K audio %K audio-visual %K beam-forming %K Carlo %K chain %K density %K detection; %K direction-of-arrival %K DOA %K empirical %K estimation; %K framework; %K functions; %K fusion %K fusion; %K joint %K Markov %K methods; %K Monte %K moving %K multimodal %K object %K optical %K posterior %K probability %K probability; %K processes; %K processing; %K sensing; %K sensor %K sensors; %K signal %K Surveillance %K surveillance; %K systems; %K target %K techniques; %K tracking; %K vehicle %K video %X Multimodal sensing has attracted much attention in solving a wide range of problems, including target detection, tracking, classification, activity understanding, speech recognition, etc. In surveillance applications, different types of sensors, such as video and acoustic sensors, provide distinct observations of ongoing activities. 
We present a fusion framework using both video and acoustic sensors for vehicle detection and tracking. In the detection phase, a rough estimate of target direction-of-arrival (DOA) is first obtained using acoustic data through beam-forming techniques. This initial DOA estimate designates the approximate target location in video. Given the initial target position, the DOA is refined by moving target detection using the video data. Markov chain Monte Carlo techniques are then used for joint audio-visual tracking. A novel fusion approach is proposed for tracking, based on the different characteristics of the audio and visual trackers. Experimental results using both synthetic and real data are presented. Improved tracking performance is observed when the empirical posterior probability density functions obtained using both types of sensors are fused. %B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on %V 3 %P iii - 793-6 vol.3 %8 2004/05// %G eng %R 10.1109/ICASSP.2004.1326664 %0 Journal Article %J Multimedia, IEEE Transactions on %D 2004 %T Wide baseline image registration with application to 3-D face modeling %A Roy-Chowdhury, A.K. %A Chellapa, Rama %A Keaton, T. %K 2D %K 3D %K algorithm; %K baseline %K biometrics; %K Computer %K configuration; %K correspondence %K doubly %K error %K extraction; %K Face %K feature %K holistic %K image %K matching; %K matrix; %K minimization; %K modeling; %K models; %K normalization %K probability %K probability; %K procedure; %K processes; %K processing; %K recognition; %K registration; %K representation; %K sequences; %K shapes; %K Sinkhorn %K spatial %K statistics; %K Stochastic %K video %K vision; %K wide %X Establishing correspondence between features in two images of the same scene taken from different viewing angles is a challenging problem in image processing and computer vision. 
However, its solution is an important step in many applications like wide baseline stereo, three-dimensional (3-D) model alignment, creation of panoramic views, etc. In this paper, we propose a technique for registration of two images of a face obtained from different viewing angles. We show that prior information about the general characteristics of a face obtained from video sequences of different faces can be used to design a robust correspondence algorithm. The method works by matching two-dimensional (2-D) shapes of the different features of the face (e.g., eyes, nose etc.). A doubly stochastic matrix, representing the probability of match between the features, is derived using the Sinkhorn normalization procedure. The final correspondence is obtained by minimizing the probability of error of a match between the entire constellation of features in the two sets, thus taking into account the global spatial configuration of the features. The method is applied for creating holistic 3-D models of a face from partial representations. Although this paper focuses primarily on faces, the algorithm can also be used for other objects with small modifications. %B Multimedia, IEEE Transactions on %V 6 %P 423 - 434 %8 2004/06// %@ 1520-9210 %G eng %N 3 %R 10.1109/TMM.2004.827511 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on %D 2004 %T Window-based, discontinuity preserving stereo %A Agrawal,M. %A Davis, Larry S. %K algorithm; %K approach; %K based %K cuts; %K dense %K discontinuity %K global %K graph %K image %K local %K MATCHING %K matching; %K minimisation; %K optimization; %K Pixel %K preserving %K processing; %K stereo %K theory; %K window %X Traditionally, the problem of stereo matching has been addressed either by a local window-based approach or a dense pixel-based approach using global optimization. 
In this paper we present an algorithm that combines window-based local matching into a global optimization framework. Our local matching algorithm assumes that local windows can have at most two disparities. Under this assumption, the local matching can be performed very efficiently using graph cuts. The global matching is formulated as the minimization of an energy term that takes into account the matching constraints induced by the local stereo algorithm. Fast, approximate minimization of this energy is achieved through graph cuts. The key feature of our algorithm is that it preserves discontinuities during both the local and the global matching phases. %B Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on %V 1 %P I-66 - I-73 Vol.1 %8 2004/07/02/june %G eng %R 10.1109/CVPR.2004.1315015 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on %D 2003 %T Activity recognition using the dynamics of the configuration of interacting objects %A Vaswani, N. %A RoyChowdhury, A. 
%A Chellapa, Rama %K 2D %K abnormal %K abnormality %K abnormality; %K acoustic %K activity %K analysis; %K change; %K Computer %K configuration %K configuration; %K data; %K DETECTION %K detection; %K distribution; %K drastic %K dynamics; %K event; %K filter; %K hand-picked %K image %K infrared %K interacting %K learning; %K location %K low %K mean %K model; %K monitoring; %K MOTION %K moving %K noise; %K noisy %K object %K object; %K observation %K observation; %K particle %K pattern %K plane; %K point %K polygonal %K probability %K probability; %K problem; %K processing; %K radar %K recognition; %K resolution %K sensor; %K sensors; %K sequence; %K SHAPE %K shape; %K signal %K slow %K statistic; %K strategy; %K Surveillance %K surveillance; %K target %K test %K tracking; %K video %K video; %K visible %K vision; %X Monitoring activities using video data is an important surveillance problem. A special scenario is to learn the pattern of normal activities and detect abnormal events from a very low resolution video where the moving objects are small enough to be modeled as point objects in a 2D plane. Instead of tracking each point separately, we propose to model an activity by the polygonal 'shape' of the configuration of these point masses at any time t, and its deformation over time. We learn the mean shape and the dynamics of the shape change using hand-picked location data (no observation noise) and define an abnormality detection statistic for the simple case of a test sequence with negligible observation noise. For the more practical case where observation (point locations) noise is large and cannot be ignored, we use a particle filter to estimate the probability distribution of the shape given the noisy observations up to the current time. Abnormality detection in this case is formulated as a change detection problem. We propose a detection strategy that can detect both 'drastic' and 'slow' abnormalities. 
Our framework can be applied directly to object location data obtained using any type of sensor - visible, radar, infrared, or acoustic. %B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on %V 2 %P II - 633-40 vol.2 %8 2003/06// %G eng %R 10.1109/CVPR.2003.1211526 %0 Conference Paper %B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on %D 2003 %T Adaptive visual tracking and recognition using particle filters %A Zhou,Shaohua %A Chellapa, Rama %A Moghaddam, B. %K adaptive %K adaptive-velocity %K appearance %K extra-personal %K Filtering %K filters; %K image %K intra-personal %K model; %K MOTION %K particle %K processing; %K recognition; %K sequence; %K sequences; %K series %K signal %K spaces; %K theory; %K TIME %K tracking; %K video %K visual %X This paper presents an improved method for simultaneous tracking and recognition of human faces from video, where a time series model is used to resolve the uncertainties in tracking and recognition. The improvements mainly arise from three aspects: (i) modeling the inter-frame appearance changes within the video sequence using an adaptive appearance model and an adaptive-velocity motion model; (ii) modeling the appearance changes between the video frames and gallery images by constructing intra- and extra-personal spaces; and (iii) utilization of the fact that the gallery images are in frontal views. By embedding them in a particle filter, we are able to achieve a stabilized tracker and an accurate recognizer when confronted by pose and illumination variations. %B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on %V 2 %P II - 349-52 vol.2 %8 2003/07// %G eng %R 10.1109/ICME.2003.1221625 %0 Conference Paper %B Image Processing, 2003. ICIP 2003. Proceedings. 
2003 International Conference on %D 2003 %T An appearance based approach for human and object tracking %A Capellades,M. B %A David Doermann %A DeMenthon,D. %A Chellapa, Rama %K algorithm; %K analysis; %K background %K basis; %K by %K Color %K colour %K correlogram %K detection; %K distributions; %K frame %K histogram %K human %K image %K information; %K object %K processing; %K segmentation; %K sequences; %K signal %K subtraction %K tracking; %K video %X A system for tracking humans and detecting human-object interactions in indoor environments is described. A combination of correlogram and histogram information is used to model object and human color distributions. Humans and objects are detected using a background subtraction algorithm. The models are built on the fly and used to track them on a frame by frame basis. The system is able to detect when people merge into groups and segment them during occlusion. Identities are preserved during the sequence, even if a person enters and leaves the scene. The system is also able to detect when a person deposits or removes an object from the scene. In the first case the models are used to track the object retroactively in time. In the second case the objects are tracked for the rest of the sequence. Experimental results using indoor video sequences are presented. %B Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on %V 2 %P II - 85-8 vol.3 %8 2003/09// %G eng %R 10.1109/ICIP.2003.1246622 %0 Journal Article %J Image Processing, IEEE Transactions on %D 2003 %T Data hiding in image and video .I. Fundamental issues and solutions %A M. 
Wu %A Liu,Bede %K adaptive %K analysis; %K bits; %K colour %K condition; %K constant %K CONTROL %K data %K embedded %K EMBEDDING %K embedding; %K encapsulation; %K extractable %K hiding; %K image %K Modulation %K modulation; %K multilevel %K multiplexing %K multiplexing; %K NOISE %K nonstationary %K processing; %K rate; %K reviews; %K shuffling; %K signal %K signals; %K simulation; %K solution; %K techniques; %K variable %K video %K visual %X We address a number of fundamental issues of data hiding in image and video and propose general solutions to them. We begin with a review of two major types of embedding, based on which we propose a new multilevel embedding framework to allow the amount of extractable data to be adaptive according to the actual noise condition. We then study the issues of hiding multiple bits through a comparison of various modulation and multiplexing techniques. Finally, the nonstationary nature of visual signals leads to highly uneven distribution of embedding capacity and causes difficulty in data hiding. We propose an adaptive solution switching between using constant embedding rate with shuffling and using variable embedding rate with embedded control bits. We verify the effectiveness of our proposed solutions through analysis and simulation. %B Image Processing, IEEE Transactions on %V 12 %P 685 - 695 %8 2003/06// %@ 1057-7149 %G eng %N 6 %R 10.1109/TIP.2003.810588 %0 Journal Article %J Image Processing, IEEE Transactions on %D 2003 %T Data hiding in image and video .II. Designs and applications %A M. Wu %A Yu,H. %A Liu,Bede %K access %K annotation; %K authentication; %K capacity; %K conditions; %K content-based %K CONTROL %K control; %K copy %K data %K distortions; %K EMBEDDING %K embedding; %K encapsulation; %K extraction; %K frame %K hiding; %K image %K information; %K jitter; %K message %K multilevel %K NOISE %K noise; %K payload %K processing; %K robust %K signal %K uneven %K user %K video %X For pt. 
I see ibid., vol.12, no.6, p.685-95 (2003). This paper applies the solutions to the fundamental issues addressed in Part I to specific design problems of embedding data in image and video. We apply multilevel embedding to allow the amount of embedded information that can be reliably extracted to be adaptive with respect to the actual noise conditions. When extending the multilevel embedding to video, we propose strategies for handling uneven embedding capacity from region to region within a frame as well as from frame to frame. We also embed control information to facilitate the accurate extraction of the user data payload and to combat such distortions as frame jitter. The proposed algorithm can be used for a variety of applications such as copy control, access control, robust annotation, and content-based authentication. %B Image Processing, IEEE Transactions on %V 12 %P 696 - 705 %8 2003/06// %@ 1057-7149 %G eng %N 6 %R 10.1109/TIP.2003.810589 %0 Conference Paper %B Image Analysis and Processing, 2003.Proceedings. 12th International Conference on %D 2003 %T Depth-first k-nearest neighbor finding using the MaxNearestDist estimator %A Samet, Hanan %K branch-and-bound %K data %K depth-first %K distance; %K DNA %K documents; %K estimation; %K estimator; %K finding; %K images; %K k-nearest %K matching; %K maximum %K MaxNearestDist %K mining; %K neighbor %K parameter %K pattern %K possible %K process; %K processing; %K query %K search %K searching; %K sequences; %K series; %K similarity %K text %K TIME %K tree %K video; %X Similarity searching is an important task when trying to find patterns in applications which involve mining different types of data such as images, video, time series, text documents, DNA sequences, etc. Similarity searching often reduces to finding the k nearest neighbors to a query object. 
A description is given of how to use an estimate of the maximum possible distance at which a nearest neighbor can be found to prune the search process in a depth-first branch-and-bound k-nearest neighbor finding algorithm. Using the MaxNearestDist estimator (Larsen, S. and Kanal, L.N., 1986) in the depth-first k-nearest neighbor algorithm provides a middle ground between a pure depth-first and a best-first k-nearest neighbor algorithm. %B Image Analysis and Processing, 2003.Proceedings. 12th International Conference on %P 486 - 491 %8 2003/09// %G eng %R 10.1109/ICIAP.2003.1234097 %0 Conference Paper %B Applications of Signal Processing to Audio and Acoustics, 2003 IEEE Workshop on. %D 2003 %T HRTF personalization using anthropometric measurements %A Zotkin,Dmitry N %A Hwang,J. %A Duraiswami, Ramani %A Davis, Larry S. %K acoustic %K anthropometric %K audio %K audio; %K auditory %K Ear %K functions; %K Head %K head-and-torso %K HRTF %K individualized %K localization; %K measurements; %K model; %K models; %K parameters; %K perception; %K personalization; %K physiological %K processing; %K related %K scattering; %K scene; %K signal %K sound %K spatial %K subjective %K transfer %K virtual %K wave %X Individualized head related transfer functions (HRTFs) are needed for accurate rendering of spatial audio, which is important in many applications. Since these are relatively tedious to acquire, they may not be acceptable for some applications. A number of studies have sought to perform simple customization of the HRTF. We propose and test a strategy for HRTF personalization, based on matching certain anthropometric ear parameters with the HRTF database, and the incorporation of a low-frequency "head-and-torso" model. We present preliminary tests aimed at evaluation of this customization. Results show that the approach improves both the accuracy of the localization and subjective perception of the virtual auditory scene. 
%B Applications of Signal Processing to Audio and Acoustics, 2003 IEEE Workshop on. %P 157 - 160 %8 2003/10// %G eng %R 10.1109/ASPAA.2003.1285855 %0 Conference Paper %B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on %D 2003 %T Image-based pan-tilt camera control in a multi-camera surveillance environment %A Lim,Ser-Nam %A Elgammal,A. %A Davis, Larry S. %K automated %K camera %K cameras; %K control; %K detection; %K environment; %K image %K image-based %K information; %K multicamera %K object %K pan-tilt %K position; %K processing; %K sensors; %K Surveillance %K surveillance; %K systems; %K vantage %K zero-position; %K zero-positions; %X In automated surveillance systems with multiple cameras, the system must be able to position the cameras accurately. Each camera must be able to pan-tilt such that an object detected in the scene is in a vantage position in the camera's image plane and subsequently capture images of that object. Typically, camera calibration is required. We propose an approach that uses only image-based information. Each camera is assigned a pan-tilt zero-position. The position of an object detected in one camera is related to the other cameras by homographies between the zero-positions, while different pan-tilt positions of the same camera are related in the form of projective rotations. We then derive that the trajectories in the image plane corresponding to these projective rotations are approximately circular for pan and linear for tilt. The camera control technique is subsequently tested in a working prototype. %B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on %V 1 %P I - 645-8 vol.1 %8 2003/07// %G eng %R 10.1109/ICME.2003.1221000 %0 Journal Article %J Pattern Analysis and Machine Intelligence, IEEE Transactions on %D 2003 %T Properties of embedding methods for similarity searching in metric spaces %A Hjaltason,G. 
R %A Samet, Hanan %K complex %K contractiveness; %K data %K databases; %K decomposition; %K dimension %K distance %K distortion; %K DNA %K documents; %K EMBEDDING %K embeddings; %K Euclidean %K evaluations; %K FastMap; %K images; %K Lipschitz %K methods; %K metric %K MetricMap; %K multimedia %K processing; %K query %K reduction %K search; %K searching; %K sequences; %K similarity %K singular %K spaces; %K SparseMap; %K types; %K value %X Complex data types-such as images, documents, DNA sequences, etc.-are becoming increasingly important in modern database applications. A typical query in many of these applications seeks to find objects that are similar to some target object, where (dis)similarity is defined by some distance function. Often, the cost of evaluating the distance between two objects is very high. Thus, the number of distance evaluations should be kept at a minimum, while (ideally) maintaining the quality of the result. One way to approach this goal is to embed the data objects in a vector space so that the distances of the embedded objects approximates the actual distances. Thus, queries can be performed (for the most part) on the embedded objects. We are especially interested in examining the issue of whether or not the embedding methods will ensure that no relevant objects are left out. Particular attention is paid to the SparseMap, FastMap, and MetricMap embedding methods. SparseMap is a variant of Lipschitz embeddings, while FastMap and MetricMap are inspired by dimension reduction methods for Euclidean spaces. We show that, in general, none of these embedding methods guarantee that queries on the embedded objects have no false dismissals, while also demonstrating the limited cases in which the guarantee does hold. Moreover, we describe a variant of SparseMap that allows queries with no false dismissals. In addition, we show that with FastMap and MetricMap, the distances of the embedded objects can be much greater than the actual distances. 
This makes it impossible (or at least impractical) to modify FastMap and MetricMap to guarantee no false dismissals. %B Pattern Analysis and Machine Intelligence, IEEE Transactions on %V 25 %P 530 - 549 %8 2003/05// %@ 0162-8828 %G eng %N 5 %R 10.1109/TPAMI.2003.1195989 %0 Conference Paper %B Data Engineering, 2003. Proceedings. 19th International Conference on %D 2003 %T PXML: a probabilistic semistructured data model and algebra %A Hung,E. %A Getoor, Lise %A V.S. Subrahmanian %K algebra; %K data %K databases; %K instances; %K model; %K models; %K probabilistic %K processing; %K PXML; %K query %K relational %K semistructured %K structures; %K tree %K XML; %X Despite the recent proliferation of work on semistructured data models, there has been little work to date on supporting uncertainty in these models. We propose a model for probabilistic semistructured data (PSD). The advantage of our approach is that it supports a flexible representation that allows the specification of a wide class of distributions over semistructured instances. We provide two semantics for the model and show that the semantics are probabilistically coherent. Next, we develop an extension of the relational algebra to handle probabilistic semistructured data and describe efficient algorithms for answering queries that use this algebra. Finally, we present experimental results showing the efficiency of our algorithms. %B Data Engineering, 2003. Proceedings. 19th International Conference on %P 467 - 478 %8 2003/03// %G eng %R 10.1109/ICDE.2003.1260814 %0 Conference Paper %B Proceedings. IEEE Conference on Advanced Video and Signal Based Surveillance, 2003. %D 2003 %T Scalable image-based multi-camera visual surveillance system %A Lim,Ser-Nam %A Davis, Larry S. %A Elgammal,A. 
%K ACQUISITION %K algorithm; %K camera; %K constraints; %K feature %K hidden %K image-based %K MATCHING %K maximum %K multi-camera %K occlusion %K pan-tilt-zoom %K PLAN %K prediction; %K processing; %K removal; %K scalable %K scheduling; %K signal %K Surveillance %K surveillance; %K system; %K task %K video %K view; %K visibility %K visual %K weight %X We describe the design of a scalable and wide coverage visual surveillance system. Scalability (the ability to add and remove cameras easily during system operation with minimal overhead and system degradation) is achieved by utilizing only image-based information for camera control. We show that when a pan-tilt-zoom camera pans and tilts, a given image point moves in a circular and a linear trajectory, respectively. We create a scene model using a plan view of the scene. The scene model makes it easy for us to handle occlusion prediction and schedule video acquisition tasks subject to visibility constraints. We describe a maximum weight matching algorithm to assign cameras to tasks that meet the visibility constraints. The system is illustrated both through simulations and real video from a 6-camera configuration. %B Proceedings. IEEE Conference on Advanced Video and Signal Based Surveillance, 2003. %P 205 - 212 %8 2003/07// %G eng %R 10.1109/AVSS.2003.1217923 %0 Conference Paper %B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on %D 2003 %T Shape and motion driven particle filtering for human body tracking %A Yamamoto, T. %A Chellapa, Rama %K 3D %K body %K broadcast %K camera; %K cameras; %K estimation; %K Filtering %K framework; %K human %K image %K MOTION %K motion; %K particle %K processing; %K rotational %K sequence; %K sequences; %K signal %K single %K static %K theory; %K tracking; %K TV %K video %X In this paper, we propose a method to recover 3D human body motion from a video acquired by a single static camera. 
In order to estimate the complex state distribution of a human body, we adopt the particle filtering framework. We represent the human body using several layers of representation and compose the whole body step by step. In this way, more effective particles are generated and ineffective particles are removed as we process each layer. In order to deal with the rotational motion, the frequency of rotation is obtained using a preprocessing operation. In the preprocessing step, the variance of the motion field at each image is computed, and the frequency of rotation is estimated. The estimated frequency is used for the state update in the algorithm. We successfully track the movement of figure skaters in a TV broadcast image sequence, and recover the 3D shape and motion of the skater. %B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on %V 3 %P III - 61-4 vol.3 %8 2003/07// %G eng %R 10.1109/ICME.2003.1221248 %0 Conference Paper %B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on %D 2003 %T Simultaneous pose and correspondence determination using line features %A David,P. %A DeMenthon,D. %A Duraiswami, Ramani %A Samet, Hanan %K algorithm; %K algorithms; %K annealing; %K clutter; %K cluttered %K Computer %K correspondence %K detection; %K determination; %K deterministic %K environment; %K extraction; %K feature %K feature; %K image %K image; %K imagery; %K images; %K joint %K line %K local %K man-made %K MATCHING %K matching; %K measurement; %K model-to-image %K noise; %K occlusion; %K optimum; %K perspective %K point %K pose %K position %K problem; %K processing; %K real %K realistic %K registration %K simulated %K softassign; %K SoftPOSIT %K stereo %K synthetic %K vision; %X We present a new robust line matching algorithm for solving the model-to-image registration problem. 
Given a model consisting of 3D lines and a cluttered perspective image of this model, the algorithm simultaneously estimates the pose of the model and the correspondences of model lines to image lines. The algorithm combines softassign for determining correspondences and POSIT for determining pose. Integrating these algorithms into a deterministic annealing procedure allows the correspondence and pose to evolve from initially uncertain values to a joint local optimum. This research extends to line features the SoftPOSIT algorithm proposed recently for point features. Lines detected in images are typically more stable than points and are less likely to be produced by clutter and noise, especially in man-made environments. Experiments on synthetic and real imagery with high levels of clutter, occlusion, and noise demonstrate the robustness of the algorithm. %B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on %V 2 %P II-424 - II-431 vol.2 %8 2003/06// %G eng %R 10.1109/CVPR.2003.1211499 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on %D 2003 %T Simultaneous tracking and recognition of human faces from video %A Zhou,Shaohua %A Chellapa, Rama %K appearance %K changes; %K density; %K Face %K human %K illumination %K Laplacian %K model; %K optical %K pose %K processing; %K recognition; %K series %K series; %K signal %K TIME %K tracking; %K variations; %K video %K video; %X The paper investigates the interaction between tracking and recognition of human faces from video under a framework proposed earlier (Shaohua Zhou et al., Proc. 5th Int. Conf. on Face and Gesture Recog., 2002; Shaohua Zhou and Chellappa, R., Proc. European Conf. on Computer Vision, 2002), where a time series model is used to resolve the uncertainties in both tracking and recognition. 
However, our earlier efforts employed only a simple likelihood measurement in the form of a Laplacian density to deal with appearance changes between frames and between the observation and gallery images, yielding poor accuracies in both tracking and recognition when confronted by pose and illumination variations. The interaction between tracking and recognition was not well understood. We address this interdependence through a series of experiments that quantify the interaction between tracking and recognition. %B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on %V 3 %P III - 225-8 vol.3 %8 2003/04// %G eng %R 10.1109/ICASSP.2003.1199148 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on %D 2003 %T Statistical shape theory for activity modeling %A Vaswani, N. %A Chowdhury, A.R. %A Chellapa, Rama %K abnormal %K activities %K activity %K analysis; %K behavior; %K classification; %K data; %K image %K mass; %K matching; %K modeling; %K monitoring; %K moving %K normal %K particle; %K pattern %K pattern; %K point %K polygonal %K probability; %K problem; %K processing; %K sequence; %K sequences; %K SHAPE %K shape; %K signal %K statistical %K Surveillance %K surveillance; %K theory; %K video %X Monitoring activities in a certain region from video data is an important surveillance problem. The goal is to learn the pattern of normal activities and detect unusual ones by identifying activities that deviate appreciably from the typical ones. We propose an approach using statistical shape theory based on the shape model of D.G. Kendall et al. (see "Shape and Shape Theory", John Wiley and Sons, 1999). In a low resolution video, each moving object is best represented as a moving point mass or particle.
In this case, an activity can be defined by the interactions of all or some of these moving particles over time. We model this configuration of the particles by a polygonal shape formed from the locations of the points in a frame and the activity by the deformation of the polygons in time. These parameters are learned for each typical activity. Given a test video sequence, an activity is classified as abnormal if the probability for the sequence (represented by the mean shape and the dynamics of the deviations), given the model, is below a certain threshold. The approach gives very encouraging results in surveillance applications using a single camera and is able to identify various kinds of abnormal behavior. %B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on %V 3 %P III - 493-6 vol.3 %8 2003/04// %G eng %R 10.1109/ICASSP.2003.1199519 %0 Conference Paper %B Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on %D 2003 %T Using specularities for recognition %A Osadchy,M. %A Jacobs, David W. %A Ramamoorthi,R. %K 3D %K formation;object %K glass;computer %K image %K information;specular %K light %K measurement;reflection;stereo %K objects;specular %K objects;wine %K processing; %K property;pottery;recognition %K recognition;object %K recognition;position %K reflectance %K reflection;compact %K reflection;transparent %K shape;Lambertian %K source;highlight %K systems;shiny %K vision;lighting;object %X Recognition systems have generally treated specular highlights as noise. We show how to use these highlights as a positive source of information that improves recognition of shiny objects. This also enables us to recognize very challenging shiny transparent objects, such as wine glasses. Specifically, we show how to find highlights that are consistent with a hypothesized pose of an object of known 3D shape.
We do this using only a qualitative description of highlight formation that is consistent with most models of specular reflection, so no specific knowledge of an object's reflectance properties is needed. We first present a method that finds highlights produced by a dominant compact light source, whose position is roughly known. We then show how to estimate the lighting automatically for objects whose reflection is part specular and part Lambertian. We demonstrate this method for two classes of objects. First, we show that specular information alone can suffice to identify objects with no Lambertian reflectance, such as transparent wine glasses. Second, we use our complete system to recognize shiny objects, such as pottery. %B Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on %P 1512 -1519 vol.2 %8 2003/10// %G eng %R 10.1109/ICCV.2003.1238669 %0 Conference Paper %B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on %D 2003 %T Video based rendering of planar dynamic scenes %A Kale, A. %A Chowdhury, A.K.R. %A Chellapa, Rama %K (computer %K 3D %K analysis; %K approximation; %K based %K camera; %K cameras; %K direction; %K dynamic %K graphics); %K image %K monocular %K MOTION %K perspective %K planar %K processing; %K rendering %K rendering; %K scenes; %K sequence; %K sequences; %K signal %K video %K weak %X In this paper, we propose a method to synthesize arbitrary views of a planar scene from a monocular video sequence of it. The 3-D direction of motion of the object is robustly estimated from the video sequence. Given this direction, any other view of the object can be synthesized through a perspective projection approach, under assumptions of planarity. If the distance of the object from the camera is large, a planar approximation is reasonable even for non-planar scenes.
Such a method has many important applications, one of them being gait recognition, where a side view of the person is required. Our method can be used to synthesize the side-view of the person in case he/she does not present a side view to the camera. Since the planarity assumption is often an approximation, the effects of non-planarity can lead to inaccuracies in rendering and need to be corrected. Regions where this happens are examined and a simple technique based on weak perspective approximation is proposed to offset rendering inaccuracies. Examples of synthesized views using our method and performance evaluation are presented. %B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on %V 1 %P I - 477-80 vol.1 %8 2003/07// %G eng %R 10.1109/ICME.2003.1220958 %0 Conference Paper %B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on %D 2003 %T Video synthesis of arbitrary views for approximately planar scenes %A Chowdhury, A.K.R. %A Kale, A. %A Chellapa, Rama %K (access %K 3D %K applications; %K approach; %K approximately %K approximation; %K arbitrary %K Biometrics %K control); %K data; %K direction %K estimation; %K evaluation; %K Gait %K image %K monocular %K MOTION %K performance %K perspective %K planar %K processing; %K projection %K recognition; %K recovery; %K scenes; %K sequence; %K sequences; %K side %K signal %K structure; %K Surveillance %K surveillance; %K synthesis; %K synthesized %K video %K view %K views; %X In this paper, we propose a method to synthesize arbitrary views of a planar scene, given a monocular video sequence. The method is based on the availability of knowledge of the angle between the original and synthesized views. Such a method has many important applications, one of them being gait recognition. Gait recognition algorithms rely on the availability of an approximate side-view of the person.
From a realistic viewpoint, such an assumption is impractical in surveillance applications, and it is of interest to develop methods to synthesize a side view of the person, given an arbitrary view. For large distances from the camera, a planar approximation for the individual can be assumed. In this paper, we propose a perspective projection approach for recovering the direction of motion of the person purely from the video data, followed by synthesis of a new video sequence at a different angle. The algorithm works purely in the image and video domain, though 3D structure plays an implicit role in its theoretical justification. Examples of synthesized views using our method and performance evaluation are presented. %B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on %V 3 %P III - 497-500 vol.3 %8 2003/04// %G eng %R 10.1109/ICASSP.2003.1199520 %0 Conference Paper %B Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on %D 2002 %T 3D face reconstruction from video using a generic model %A Chowdhury, A.R. %A Chellapa, Rama %A Krishnamurthy, S. %A Vo, T. %K 3D %K algorithm; %K algorithms; %K analysis; %K Carlo %K chain %K Computer %K Face %K from %K function; %K generic %K human %K image %K Markov %K MCMC %K methods; %K model; %K Monte %K MOTION %K optimisation; %K OPTIMIZATION %K processes; %K processing; %K recognition; %K reconstruction %K reconstruction; %K sampling; %K sequence; %K sequences; %K SfM %K signal %K structure %K surveillance; %K video %K vision; %X Reconstructing a 3D model of a human face from a video sequence is an important problem in computer vision, with applications to recognition, surveillance, multimedia, etc. However, the quality of 3D reconstructions using structure from motion (SfM) algorithms is often not satisfactory. One common method of overcoming this problem is to use a generic model of a face.
Existing work using this approach initializes the reconstruction algorithm with this generic model. The problem with this approach is that the algorithm can converge to a solution very close to this initial value, resulting in a reconstruction which resembles the generic model rather than the particular face in the video which needs to be modeled. We propose a method of 3D reconstruction of a human face from video in which the 3D reconstruction algorithm and the generic model are handled separately. A 3D estimate is obtained purely from the video sequence using SfM algorithms without use of the generic model. The final 3D model is obtained after combining the SfM estimate and the generic model using an energy function that corrects for the errors in the estimate by comparing local regions in the two models. The optimization is done using a Markov chain Monte Carlo (MCMC) sampling strategy. The main advantage of our algorithm over others is that it is able to retain the specific features of the face in the video sequence even when these features are different from those of the generic model. The evolution of the 3D model through the various stages of the algorithm is presented. %B Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on %V 1 %P 449 - 452 vol.1 %8 2002/// %G eng %R 10.1109/ICME.2002.1035815 %0 Conference Paper %B Image Processing. 2002. Proceedings.
2002 International Conference on %D 2002 %T Bayesian structure from motion using inertial information %A Qian,Gang %A Chellapa, Rama %A Qinfen Zheng %K 3D %K analysis; %K Bayes %K Bayesian %K camera %K estimation; %K image %K images; %K importance %K inertial %K information; %K methods; %K MOTION %K motion; %K parameter %K processing; %K real %K reconstruction; %K sampling; %K scene %K sensors; %K sequence; %K sequences; %K sequential %K signal %K structure-from-motion; %K synthetic %K systems; %K video %X A novel approach to Bayesian structure from motion (SfM) using inertial information and sequential importance sampling (SIS) is presented. The inertial information is obtained from camera-mounted inertial sensors and is used in the Bayesian SfM approach as prior knowledge of the camera motion in the sampling algorithm. Experimental results using both synthetic and real images show that, when inertial information is used, more accurate results can be obtained or the same estimation accuracy can be obtained at a lower cost. %B Image Processing. 2002. Proceedings. 2002 International Conference on %V 3 %P III-425 - III-428 vol.3 %8 2002/// %G eng %R 10.1109/ICIP.2002.1038996 %0 Conference Paper %B Pattern Recognition, 2002. Proceedings. 16th International Conference on %D 2002 %T Binarization of low quality text using a Markov random field model %A Wolf,C. %A David Doermann %K analysis; %K annealing; %K Bayes %K Bayesian %K binarization; %K computing; %K distributions; %K document %K documents; %K field; %K Gibbs %K image %K low %K Markov %K method; %K methods; %K multimedia %K optimization; %K probability; %K processes; %K processing; %K QUALITY %K random %K simulated %K text %X Binarization techniques have been developed in the document analysis community for over 30 years and many algorithms have been used successfully.
On the other hand, document analysis tasks are more and more frequently being applied to multimedia documents such as video sequences. Due to low resolution and lossy compression, the binarization of text included in the frames is a non-trivial task. Existing techniques work without a model of the spatial relationships in the image, which makes them less powerful. We introduce a new technique based on a Markov random field model of the document. The model parameters (clique potentials) are learned from training data and the binary image is estimated in a Bayesian framework. The performance is evaluated using commercial OCR software. %B Pattern Recognition, 2002. Proceedings. 16th International Conference on %V 3 %P 160 - 163 vol.3 %8 2002/// %G eng %R 10.1109/ICPR.2002.1047819 %0 Conference Paper %B Pattern Recognition, 2002. Proceedings. 16th International Conference on %D 2002 %T Content-based image retrieval using Fourier descriptors on a logo database %A Folkers,A. %A Samet, Hanan %K abstraction; %K analysis; %K constraints; %K content-based %K contour %K database %K database; %K databases; %K descriptors; %K detection; %K edge %K Fourier %K image %K logos; %K pictorial %K processing; %K query %K retrieval; %K SHAPE %K spatial %K specification; %K theory; %K visual %X A system that enables the pictorial specification of queries in an image database is described. The queries consist of rectangle, polygon, ellipse, and B-spline shapes. The queries specify which shapes should appear in the target image as well as spatial constraints on the distance between them and their relative position. The retrieval process makes use of an abstraction of the contour of the shape, based on Fourier descriptors, which is invariant to translation, scale, rotation, and starting point. These abstractions are used in a system to locate logos in an image database. The utility of this approach is illustrated using some sample queries.
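The Fourier-descriptor abstraction just described can be sketched in a few lines. The following is an illustrative sketch, not the authors' implementation (the function name, coefficient count, and normalization choices are mine): zeroing the DC term gives translation invariance, taking coefficient magnitudes removes rotation and starting-point dependence, and dividing by the first harmonic removes scale.

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=8):
    """Translation/scale/rotation/start-point invariant shape signature.

    contour: (N, 2) array of boundary points ordered along the contour.
    Returns magnitudes of the first n_coeffs non-DC Fourier coefficients,
    normalized by the first harmonic.
    """
    z = contour[:, 0] + 1j * contour[:, 1]  # complex boundary representation
    F = np.fft.fft(z)
    F[0] = 0.0                  # drop DC term -> translation invariance
    mags = np.abs(F)            # magnitudes -> rotation/start-point invariance
    mags = mags / mags[1]       # normalize by first harmonic -> scale invariance
    return mags[1:1 + n_coeffs]

# A circle sampled at 64 points: all energy in the first harmonic,
# and a scaled, translated copy yields the same signature.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
sig = fourier_descriptors(circle)
shifted = fourier_descriptors(3 * circle + 10)
assert np.allclose(sig, shifted, atol=1e-9)
```

In a retrieval setting, signatures like `sig` would be compared by a simple distance (e.g. Euclidean) against the precomputed signatures of the logo database.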
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on %V 3 %P 521 - 524 vol.3 %8 2002/// %G eng %R 10.1109/ICPR.2002.1047991 %0 Conference Paper %B Scientific and Statistical Database Management, 2002. Proceedings. 14th International Conference on %D 2002 %T Efficient techniques for range search queries on earth science data %A Shi,Qingmin %A JaJa, Joseph F. %K based %K computing; %K content %K data %K data; %K databases; %K Earth %K factors; %K large %K mining %K mining; %K natural %K processing; %K queries; %K query %K range %K raster %K retrieval; %K scale %K Science %K sciences %K search %K spatial %K structures; %K tasks; %K temporal %K tree %K tree-of-regions; %K visual %X We consider the problem of organizing large scale earth science raster data to efficiently handle queries for identifying regions whose parameters fall within certain range values specified by the queries. This problem seems to be critical to enabling basic data mining tasks such as determining associations between physical phenomena and spatial factors, detecting changes and trends, and content based retrieval. We assume that the input is too large to fit in internal memory and hence focus on data structures and algorithms that minimize the I/O bounds. A new data structure, called a tree-of-regions (ToR), is introduced and involves a combination of an R-tree and efficient representation of regions. It is shown that such a data structure enables the handling of range queries in an optimal I/O time, under certain reasonable assumptions. We also show that updates to the ToR can be handled efficiently. Experimental results for a variety of multi-valued earth science data illustrate the fast execution times of a wide range of queries, as predicted by our theoretical analysis. %B Scientific and Statistical Database Management, 2002. Proceedings.
14th International Conference on %P 142 - 151 %8 2002/// %G eng %R 10.1109/SSDM.2002.1029714 %0 Journal Article %J Image Processing, IEEE Transactions on %D 2002 %T A generic approach to simultaneous tracking and verification in video %A Li,Baoxin %A Chellapa, Rama %K approach; %K Carlo %K configuration; %K correspondence %K data; %K density %K density; %K estimated %K estimation; %K evaluation; %K extraction; %K Face %K facial %K feature %K generic %K human %K hypothesis %K image %K measurement %K methods; %K Monte %K object %K performance %K posterior %K probability %K probability; %K problem; %K processing; %K propagation; %K recognition; %K road %K sequence %K sequences; %K sequential %K signal %K space; %K stabilization; %K state %K synthetic %K temporal %K testing; %K tracking; %K vector; %K vehicle %K vehicles; %K verification; %K video %K visual %X A generic approach to simultaneous tracking and verification in video data is presented. The approach is based on posterior density estimation using sequential Monte Carlo methods. Visual tracking, which is in essence a temporal correspondence problem, is solved through probability density propagation, with the density being defined over a proper state space characterizing the object configuration. Verification is realized through hypothesis testing using the estimated posterior density. In its most basic form, verification can be performed as follows. Given a measurement vector Z and two hypotheses H1 and H0, we first estimate posterior probabilities P(H0|Z) and P(H1|Z), and then choose the one with the larger posterior probability as the true hypothesis. Several applications of the approach are illustrated by experiments devised to evaluate its performance. The idea is first tested on synthetic data, and then experiments with real video sequences are presented, illustrating vehicle tracking and verification, human (face) tracking and verification, facial feature tracking, and image sequence stabilization. 
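The verification step in the abstract above is a Bayes decision between two hypotheses given their estimated posteriors. A toy sketch of that decision (mine, not the paper's code; the Gaussian state distributions and measurement model are illustrative assumptions) using a Monte Carlo approximation of each hypothesis' evidence:

```python
import numpy as np

rng = np.random.default_rng(0)

def verify(z, hypotheses, prior=(0.5, 0.5), n_particles=2000):
    """Toy verification: approximate P(H_i | Z) by averaging the measurement
    likelihood over particles drawn from each hypothesis' state distribution,
    then choose the hypothesis with the larger posterior."""
    evidences = []
    for (mean, std), p in zip(hypotheses, prior):
        particles = rng.normal(mean, std, n_particles)  # sample the state space
        lik = np.exp(-0.5 * (z - particles) ** 2)       # Gaussian measurement model
        evidences.append(p * lik.mean())                # ~ p(Z | H_i) * P(H_i)
    evidences = np.array(evidences)
    posteriors = evidences / evidences.sum()            # Bayes normalization
    return int(posteriors.argmax()), posteriors

# H0: state near 0; H1: state near 5. A measurement z = 4.8 favors H1.
choice, post = verify(4.8, hypotheses=[(0.0, 1.0), (5.0, 1.0)])
assert choice == 1 and post[1] > 0.9
```

The paper's sequential Monte Carlo machinery propagates the particle set over time before this decision; the sketch only shows the hypothesis-testing step on one measurement.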
%B Image Processing, IEEE Transactions on %V 11 %P 530 - 544 %8 2002/05// %@ 1057-7149 %G eng %N 5 %R 10.1109/TIP.2002.1006400 %0 Conference Paper %B Motion and Video Computing, 2002. Proceedings. Workshop on %D 2002 %T A hierarchical approach for obtaining structure from two-frame optical flow %A Liu,Haiying %A Chellapa, Rama %A Rosenfeld, A. %K algorithm; %K aliasing; %K analysis; %K computer-rendered %K depth %K depth; %K error %K estimation; %K extraction; %K Face %K feature %K flow; %K gesture %K hierarchical %K image %K images; %K inverse %K iterative %K methods; %K MOTION %K nonlinear %K optical %K parameter %K processing; %K real %K recognition; %K sequences; %K signal %K structure-from-motion; %K system; %K systems; %K TIME %K two-frame %K variation; %K video %X A hierarchical iterative algorithm is proposed for extracting structure from two-frame optical flow. The algorithm exploits two facts: one is that in many applications, such as face and gesture recognition, the depth variation of the visible surface of an object in a scene is small compared to the distance between the optical center and the object; the other is that the time aliasing problem is alleviated at the coarse level for any two-frame optical flow estimate so that the estimate tends to be more accurate. A hierarchical representation for the relationship between the optical flow, depth, and the motion parameters is derived, and the resulting non-linear system is iteratively solved through two linear subsystems. At the coarsest level, the surface of the object tends to be flat, so that the inverse depth tends to be a constant, which is used as the initial depth map. Inverse depth and motion parameters are estimated by the two linear subsystems at each level and the results are propagated to finer levels. Error analysis and experiments using both computer-rendered images and real images demonstrate the correctness and effectiveness of our algorithm. %B Motion and Video Computing, 2002. 
Proceedings. Workshop on %P 214 - 219 %8 2002/12// %G eng %R 10.1109/MOTION.2002.1182239 %0 Conference Paper %B Pattern Recognition, 2002. Proceedings. 16th International Conference on %D 2002 %T Page classification through logical labelling %A Liang,Jian %A David Doermann %A Ma,M. %A Guo,J. K %K article %K attributed %K base; %K character %K classification; %K constraints; %K document %K document; %K experimental %K global %K graph %K graph; %K hierarchical %K image %K images; %K labelling; %K logical %K model %K noise; %K OCR; %K optical %K page %K pages; %K processing; %K recognition; %K relational %K results; %K technical %K theory; %K title %K unknown %X We propose an integrated approach to page classification and logical labelling. Layout is represented by a fully connected attributed relational graph that is matched to the graph of an unknown document, achieving classification and labelling simultaneously. By incorporating global constraints in an integrated fashion, ambiguity at the zone level can be reduced, providing robustness to noise and variation. Models are automatically trained from sample documents. Experimental results show promise for the classification and labelling of technical article title pages, and support the idea of a hierarchical model base. %B Pattern Recognition, 2002. Proceedings. 16th International Conference on %V 3 %P 477 - 480 vol.3 %8 2002/// %G eng %R 10.1109/ICPR.2002.1047980 %0 Conference Paper %B Image Processing. 2002. Proceedings. 2002 International Conference on %D 2002 %T Probabilistic recognition of human faces from video %A Chellapa, Rama %A Kruger, V.
%A Zhou,Shaohua %K Bayes %K Bayesian %K CMU; %K distribution; %K Face %K faces; %K gallery; %K handling; %K human %K image %K images; %K importance %K likelihood; %K methods; %K NIST/USF; %K observation %K posterior %K probabilistic %K probability; %K processing; %K propagation; %K recognition; %K sampling; %K sequential %K signal %K still %K Still-to-video %K Uncertainty %K video %K Video-to-video %X Most present face recognition approaches recognize faces based on still images. We present a novel approach to recognize faces in video. In that scenario, the face gallery may consist of still images or may be derived from videos. For evidence integration we use classical Bayesian propagation over time and compute the posterior distribution using sequential importance sampling. The probabilistic approach allows us to handle uncertainties in a systematic manner. Experimental results using videos collected by NIST/USF and CMU illustrate the effectiveness of this approach in both still-to-video and video-to-video scenarios with appropriate model choices. %B Image Processing. 2002. Proceedings. 2002 International Conference on %V 1 %P I-41 - I-44 vol.1 %8 2002/// %G eng %R 10.1109/ICIP.2002.1037954 %0 Journal Article %J Computer %D 2002 %T Rover: scalable location-aware computing %A Banerjee,S. %A Agarwal,S. %A Kamel,K. %A Kochut, A. %A Kommareddy,C. %A Nadeem,T. %A Thakkar,P. %A Trinh,Bao %A Youssef,A. %A Youssef, M. %A Larsen,R.L. %A Udaya Shankar,A. %A Agrawala, Ashok K.
%K amusement %K application-specific %K architecture; %K automation; %K business %K business; %K computing; %K data %K entertainment; %K handheld %K humanities; %K integration %K LAN; %K location-aware %K malls; %K mobile %K museums; %K office %K parks; %K processing; %K resource %K Rover; %K scalability; %K scalable %K scheduling; %K shopping %K software %K system %K theme %K units; %K user; %K wireless %X All the components necessary for realizing location-aware computing are available in the marketplace today. What has hindered the widespread deployment of location-based systems is the lack of an integration architecture that scales with user populations. The authors have completed the initial implementation of Rover, a system designed to achieve this sort of integration and to automatically tailor information and services to a mobile user's location. Their studies have validated Rover's underlying software architecture, which achieves system scalability through high-resolution, application-specific resource scheduling at the servers and network. The authors believe that this technology will greatly enhance the user experience in many places, including museums, amusement and theme parks, shopping malls, game fields, offices, and business centers. They designed the system specifically to scale to large user populations and expect its benefits to increase with them. %B Computer %V 35 %P 46 - 53 %8 2002/10// %@ 0018-9162 %G eng %N 10 %R 10.1109/MC.2002.1039517 %0 Conference Paper %B Multimedia Signal Processing, 2002 IEEE Workshop on %D 2002 %T Wide baseline image registration using prior information %A Chowdhury, AM %A Chellapa, Rama %A Keaton, T. 
%K 2D %K 3D %K algorithm; %K alignment; %K angles; %K baseline %K Computer %K configuration; %K constellation; %K correspondence %K creation; %K doubly %K error %K extraction; %K Face %K feature %K global %K holistic %K image %K images; %K matching; %K matrix; %K model %K models; %K normalization %K panoramic %K probability; %K procedure; %K processes; %K processing; %K registration; %K robust %K sequences; %K SHAPE %K signal %K Sinkhorn %K spatial %K statistics; %K stereo; %K Stochastic %K video %K view %K viewing %K vision; %K wide %X Establishing correspondence between features in two images of the same scene taken from different viewing angles is a challenging problem in image processing and computer vision. However, its solution is an important step in many applications like wide baseline stereo, 3D model alignment, creation of panoramic views, etc. In this paper, we propose a technique for registration of two images of a face obtained from different viewing angles. We show that prior information about the general characteristics of a face obtained from video sequences of different faces can be used to design a robust correspondence algorithm. The method works by matching 2D shapes of the different features of the face. A doubly stochastic matrix, representing the probability of match between the features, is derived using the Sinkhorn normalization procedure. The final correspondence is obtained by minimizing the probability of error of a match between the entire constellations of features in the two sets, thus taking into account the global spatial configuration of the features. The method is applied for creating holistic 3D models of a face from partial representations. Although this paper focuses primarily on faces, the algorithm can also be used for other objects with small modifications. %B Multimedia Signal Processing, 2002 IEEE Workshop on %P 37 - 40 %8 2002/12// %G eng %R 10.1109/MMSP.2002.1203242 %0 Conference Paper %B Multimedia and Expo, 2000.
ICME 2000. 2000 IEEE International Conference on %D 2000 %T Video access control via multi-level data hiding %A M. Wu %A Yu,Hong Heather %K access %K adaptive %K algorithms;hidden %K bits;high %K conditions;robustness;robustness-capacity %K control;adaptive %K data %K data;video %K design;user %K digital %K embedding;noise %K encapsulation;multimedia %K hiding %K hiding;multi-level %K information;data %K processing; %K QUALITY %K signal %K systems;authorisation;data %K systems;video %K technique;control %K tradeoff;system %K user %K video;multi-level %X The paper proposes novel data hiding algorithms and system design for high quality digital video. Instead of targeting a single degree of robustness, which results in overestimation and/or underestimation of the noise conditions, we apply multi-level embedding to digital video to achieve more than one level of robustness-capacity tradeoff. In addition, an adaptive technique is proposed to determine how many bits are embedded in each part of the video. Besides user data, control information such as synchronization and the number of hidden user bits are embedded as well. The algorithm can be used for applications such as access control. %B Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on %V 1 %P 381 -384 vol.1 %8 2000/// %G eng %R 10.1109/ICME.2000.869620 %0 Conference Paper %B Research and Technology Advances in Digital Libraries, 1999. ADL '99. Proceedings. IEEE Forum on %D 1999 %T Refining query previews techniques for data with multivalued attributes: the case of NASA EOSDIS %A Plaisant, Catherine %A Venkatraman,M. %A Ngamkajorwiwat,K. %A Barth,R. %A Harberts,B.
%A Feng,Wenlan %K attribute %K attributes;processing %K collection;memory %K computing;meta %K data %K data;abstracted %K data;digital %K data;multivalued %K data;query %K Earth %K EOSDIS;NASA %K libraries;geophysics %K metadata;dataset;digital %K NASA %K previews %K processing; %K requirements;multi-valued %K Science %K techniques;undesired %K time;query %X Query Previews allow users to rapidly gain an understanding of the content and scope of a digital data collection. These previews present overviews of abstracted metadata, enabling users to rapidly and dynamically avoid undesired data. We present our recent work on developing query previews for a variety of NASA EOSDIS situations. We focus on approaches that successfully address the challenge of multi-valued attribute data. Memory requirements and processing time associated with running these new solutions remain independent of the number of records in the dataset. We describe two techniques and their respective prototypes used to preview NASA Earth science data. %B Research and Technology Advances in Digital Libraries, 1999. ADL '99. Proceedings. IEEE Forum on %P 50 - 59 %8 1999/// %G eng %R 10.1109/ADL.1999.777690 %0 Conference Paper %B Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on %D 1998 %T An algorithm for wipe detection %A M. Wu %A Wolf,W. %A Liu,B. %K algorithm;image %K analysis;video %K content;video %K DC %K DETECTION %K detection;statistical %K domain;object %K information;structural %K information;video %K motion;compressed %K motion;shot %K processing; %K programs;camera %K programs;wipe %K sequence;MPEG %K sequences;signal %K signal %K stream;TV %K transitions %X The detection of transitions between shots in video programs is an important first step in analyzing video content. The wipe is a frequently used transitional form between shots.
Wipe detection is more involved than the detection of abrupt and other gradual transitions because a wipe may take various patterns and because of the difficulty in discriminating a wipe from object and camera motion. In this paper, we propose an algorithm for detecting wipes using both structural and statistical information. The algorithm can effectively detect most wipes used in current TV programs. It uses the DC sequence, which can be easily extracted from the MPEG stream without full decompression. %B Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on %V 1 %P 893 -897 vol.1 %8 1998/10// %G eng %R 10.1109/ICIP.1998.723664 %0 Journal Article %J Computational Science Engineering, IEEE %D 1998 %T Models and high-performance algorithms for global BRDF retrieval %A Zengyan Zhang %A Kalluri, SNV %A JaJa, Joseph F. %A Liang,Shunlin %A Townshend,J.R.G. %K algorithms; %K BRDF %K Earth %K geomorphology; %K geophysical %K global %K high-performance %K IBM %K information %K light %K machines; %K models; %K Parallel %K processing; %K reflectivity; %K retrieval %K retrieval; %K scattering; %K signal %K SP2; %K surface; %X The authors describe three models for retrieving information related to the scattering of light on the Earth's surface. Using these models, they've developed algorithms for the IBM SP2 that efficiently retrieve this information. %B Computational Science Engineering, IEEE %V 5 %P 16 - 29 %8 1998/12//oct %@ 1070-9924 %G eng %N 4 %R 10.1109/99.735892 %0 Conference Paper %B Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on %D 1997 %T Local correspondence for detecting random forgeries %A Guo,J. K %A David Doermann %A Rosenfeld, A.
%K signature verification %K forgery detection %K random forgeries %K skilled forgeries %K forged signatures %K questioned signature %K reference signature %K offline applications %K online applications %K local correspondence %K stroke level %K stroke segmentation %K stroke features %K stylistically meaningful segments %K feature extraction %K handwriting recognition %K image segmentation %K invariant properties %X Progress on the problem of signature verification has advanced more rapidly in online applications than in offline applications, in part because information which can easily be recorded in online environments, such as pen position and velocity, is lost in static offline data. In offline applications, valuable information which can be used to discriminate between genuine and forged signatures is embedded at the stroke level. We present an approach to segmenting strokes into stylistically meaningful segments and establish a local correspondence between a questioned signature and a reference signature to enable the analysis and comparison of stroke features. Questioned signatures which do not conform to the reference signature are identified as random forgeries. Most simple forgeries can also be identified, as they do not conform to the reference signature's invariant properties, such as connections between letters. Since we have access to both local and global information, our approach also shows promise for extension to the identification of skilled forgeries. %B Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on %V 1 %P 319 - 323 vol.1 %8 1997/08// %G eng %R 10.1109/ICDAR.1997.619864 %0 Conference Paper %B Supercomputing, 1995. Proceedings of the IEEE/ACM SC95 Conference %D 1995 %T Efficient Algorithms for Atmospheric Correction of Remotely Sensed Data %A Fallah-Adl,H. %A JaJa, Joseph F. %A Liang,Shunlin %A Kaufman,Y.J. %A Townshend,J.
%K atmospheric correction %K remote sensing %K AVHRR %K TM %K high performance computing %K parallel processing %K scalable I/O %X Remotely sensed imagery has been used for developing and validating various studies regarding land cover dynamics. However, the large amounts of imagery collected by the satellites are largely contaminated by the effects of atmospheric particles. The objective of atmospheric correction is to retrieve the surface reflectance from remotely sensed imagery by removing the atmospheric effects. We introduce a number of computational techniques that lead to a substantial speedup of an atmospheric correction algorithm based on using look-up tables. Excluding I/O time, the previously known implementation processes one pixel at a time and requires about 2.63 seconds per pixel on a SPARC-10 machine, while our implementation is based on processing the whole image and takes about 4-20 microseconds per pixel on the same machine. We also develop a parallel version of our algorithm that is scalable in terms of both computation and I/O. Experimental results show that a Thematic Mapper (TM) image (36 MB per band, 5 bands need to be corrected) can be handled in less than 4.3 minutes on a 32-node CM-5 machine, including I/O time. %B Supercomputing, 1995. Proceedings of the IEEE/ACM SC95 Conference %P 12 %8 1995/// %G eng %R 10.1109/SUPERC.1995.242453 %0 Conference Paper %B Computer Vision and Pattern Recognition, 1993. Proceedings CVPR '93., 1993 IEEE Computer Society Conference on %D 1993 %T 2D images of 3-D oriented points %A Jacobs, David W. %K 2D images %K 3-D oriented points %K database indexing %K model derivation %K image processing %K nonrigid linear transformation %K structure-from-motion recovery %X A number of vision problems have been shown to become simpler when one models projection from 3-D to 2-D as a nonrigid linear transformation.
These results have been largely restricted to models and scenes that consist only of 3-D points. It is shown that, with this projection model, several vision tasks become fundamentally more complex in the somewhat more complicated domain of oriented points. More space is required for indexing models in a database, more images are required to derive structure from motion, and new views of an object cannot be synthesized linearly from old views. %B Computer Vision and Pattern Recognition, 1993. Proceedings CVPR '93., 1993 IEEE Computer Society Conference on %P 226 - 232 %8 1993/06// %G eng %R 10.1109/CVPR.1993.340985 %0 Conference Paper %B Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on %D 1993 %T Image based typographic analysis of documents %A David Doermann %A Furuta,R. %K image based typographic analysis %K document understanding %K document layout %K page synthesis %K page description language %K formatting commands %K DVI file %K 2D spatial relationships %K hierarchical representation %K read-order %K component attributes %K margins %K line and character spacing %K figure placement %K feature extraction %K data structures %K syntax %X An approach to image based typographic analysis of documents is presented. The problem requires a spatial understanding of the document layout as well as knowledge of the proper syntax. The system performs a page synthesis from the stream of formatting commands defined in a DVI file. Since the two-dimensional relationships between document components are not explicit in the page language, the authors develop a representation which preserves the two-dimensional layout, the read-order, and the attributes of document components.
From this hierarchical representation of the page layout, we extract and analyze relevant typographic features such as margins, line and character spacing, and figure placement. %B Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on %P 769 - 773 %8 1993/10// %G eng %R 10.1109/ICDAR.1993.395624 %0 Journal Article %J Parallel and Distributed Systems, IEEE Transactions on %D 1993 %T Optimal algorithms on the pipelined hypercube and related networks %A JaJa, Joseph F. %A Ryu,K. W. %K parallel algorithms %K pipeline processing %K pipelined hypercube %K shuffle-exchange %K cube-connected-cycles %K combinatorial mathematics %K combinatorial problems %K computational geometry %K line packing %K monotone polygon %X Parallel algorithms for several important combinatorial problems such as the all nearest smaller values problem, triangulating a monotone polygon, and line packing are presented. These algorithms achieve linear speedups on the pipelined hypercube, and provably optimal speedups on the shuffle-exchange and the cube-connected-cycles for any number p of processors satisfying 1 ≤ p ≤ n/((log^3 n)(log log n)^2), where n is the input size. The lower bound results are established under no restriction on how the input is mapped into the local memories of the different processors. %B Parallel and Distributed Systems, IEEE Transactions on %V 4 %P 582 - 591 %8 1993/05// %@ 1045-9219 %G eng %N 5 %R 10.1109/71.224210 %0 Conference Paper %B Computer Vision and Pattern Recognition, 1991. Proceedings CVPR '91., IEEE Computer Society Conference on %D 1991 %T Optimal matching of planar models in 3D scenes %A Jacobs, David W.
%K 3D scenes %K planar models %K flat object %K point features %K model features %K optimal matching %K bounded sensing error %K maximum sensing error %K close approximation %K image %K computerised pattern recognition %K computerised picture processing %X The problem of matching a model consisting of the point features of a flat object to point features found in an image that contains the object in an arbitrary three-dimensional pose is addressed. Once three points are matched, it is possible to determine the pose of the object. Assuming bounded sensing error, the author presents a solution to the problem of determining the range of possible locations in the image at which any additional model points may appear. This solution leads to an algorithm for determining the largest possible matching between image and model features that includes this initial hypothesis. The author implements a close approximation to this algorithm, which is O(nmε^6), where n is the number of image points, m is the number of model points, and ε is the maximum sensing error. This algorithm is compared to existing methods, and it is shown that it produces more accurate results. %B Computer Vision and Pattern Recognition, 1991. Proceedings CVPR '91., IEEE Computer Society Conference on %P 269 - 274 %8 1991/06// %G eng %R 10.1109/CVPR.1991.139700 %0 Journal Article %J Pattern Analysis and Machine Intelligence, IEEE Transactions on %D 1991 %T Space and time bounds on indexing 3D models from 2D images %A Clemens,D. T %A Jacobs, David W. %K 2D images %K 3D model indexing %K feature extraction %K grouping operation %K image features %K model features %K model-based recognition %K space bounds %K time bounds %K visual recognition systems %K computerised pattern recognition %K computerised picture processing %X Model-based visual recognition systems often match groups of image features to groups of model features to form initial hypotheses, which are then verified.
In order to accelerate recognition considerably, the model groups can be arranged in an index space (hashed) offline such that feasible matches are found by indexing into this space. For the case of 2D images and 3D models consisting of point features, bounds on the space required for indexing and on the speedup that such indexing can achieve are demonstrated. It is proved that, even in the absence of image error, each model must be represented by a 2D surface in the index space. This places an unexpected lower bound on the space required to implement indexing and proves that no quantity is invariant for all projections of a model into the image. Theoretical bounds on the speedup achieved by indexing in the presence of image error are also determined, and an implementation of indexing for measuring this speedup empirically is presented. It is found that indexing can produce only a minimal speedup on its own. However, when accompanied by a grouping operation, indexing can provide significant speedups that grow exponentially with the number of features in the groups. %B Pattern Analysis and Machine Intelligence, IEEE Transactions on %V 13 %P 1007 - 1017 %8 1991/10// %@ 0162-8828 %G eng %N 10 %R 10.1109/34.99235