Research Interests

Robert C. Maher, Ph.D.

Link to professional curriculum vitae.

Link to electronic publication list.


Audio Forensics and Gunshot Acoustics:

R.C. Maher, "Audio forensic examination: authenticity, enhancement, and interpretation," IEEE Signal Processing Magazine, vol. 26, no. 2, pp. 84-94, 2009.

[Photo: recording audio from a gunshot]

Bozeman Daily Chronicle feature article on audio forensics by Ms. Whitney Bermes (2013 April)
(link to corresponding photo gallery and newspaper page image pdf)

MSU News Service article on audio forensics research by Mr. Sepp Jannotta (2013 March)

"The American Private Investigator" podcast (.mp3), audio forensics interview by Mr. Paul Jaeb (2013 February).

Forensic Magazine interview article on gunshot acoustical analysis by Mr. Douglas Page (2012 October)

R.C. Maher, "A method for enhancement of background sounds in forensic audio recordings," Preprint 8731, Proc. 133rd Audio Engineering Society Convention, San Francisco, CA, October, 2012.

R.C. Maher and J. Studniarz, “Automatic search and classification of sound sources in long-term surveillance recordings,” Proc. Audio Engineering Society 46th Conference, Audio Forensics—Recording, Recovery, Analysis, and Interpretation, Denver, CO, June 2012.

R.C. Maher, "Acoustical modeling of gunshots including directional information and reflections," Preprint 8494, Proc. 131st Audio Engineering Society Convention, New York, NY, October, 2011.

R.C. Maher and S.R. Shaw, "Directional aspects of forensic gunshot recordings," Proc. Audio Engineering Society 39th Conference, Audio Forensics—Practices and Challenges, Hillerød, Denmark, June 2010.

Electronic Design interview article on gunshot analysis by Mr. John Edwards (2007 September)

R.C. Maher, "Acoustical characterization of gunshots," Proc. IEEE SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics, Washington, DC, pp. 109-113, April, 2007.

"Focus on Technology" radio interview (.mp3) by Ms. Ann Thompson, WVXU Cincinnati (2007 March)

Scienceline article on gunshot analysis by Mr. Jeremy Hsu (2007 January); reprinted in LiveScience (2007 March)

MSU News Service article on audio research by Mr. Tracy Ellig (2006 November); reprinted in ScienceDaily, EurekAlert!, Innovations Report, Medical News Today, and Environmental Protection Online

R.C. Maher, "Modeling and signal processing of acoustic gunshot recordings," Proc. IEEE Signal Processing Society 12th DSP Workshop, Jackson Lake, WY, pp. 257-261, September, 2006.

Example gunshot observation raw data.


Ecological Sound Monitoring and Interpretation:

Baseline soundscape analysis project for Grant-Kohrs Ranch National Historic Site (2009-2010)

R.C. Maher, "Acoustics of national parks and historic sites: the 8,760 hour MP3 file," Preprint 7893, Proc. 127th Audio Engineering Society Convention, New York, NY, October, 2009.

Z. Chen and R.C. Maher, "Semi-automatic classification of bird vocalizations using spectral peak tracks," J. Acoust. Soc. Am., vol. 120, no. 5, pp. 2974-2984, November, 2006.

S.M. Pascarelle, B. Stewart, T.A. Kelly, A. Smith, and R.C. Maher, "An Acoustic / Radar System for Automated Detection, Localization, and Classification of Birds in the Vicinity of Airfields," 8th Joint Annual Meeting of Bird Strike Committee USA/Canada, St. Louis, MO, August, 2006.

R.C. Maher, J. Gregoire, and Z. Chen, "Acoustical monitoring research for national parks and wilderness areas," Preprint 6609, Proc. 119th Audio Engineering Society Convention, New York, NY, October, 2005.

G. Sanchez, R.C. Maher, and S. Gage, "Ecological and environmental acoustic remote sensor (EcoEARS) application for long-term monitoring and assessment of wildlife," U.S. Department of Defense Threatened, Endangered and at-Risk Species Research Symposium and Workshop, Baltimore, MD, June, 2005.

White paper on National Park sound inventories (2004 January)

The project involves two phases.  In the first phase, a set of algorithms is developed to identify and classify a variety of acoustical sound sources, such as jet aircraft, propeller aircraft, helicopters, snowmobiles, automobiles, and bioacoustic sounds, captured in high-quality digital recordings.  The identification and classification are carried out using a fast time-frequency decomposition of the input signal followed by a maximum likelihood matching procedure.

In the second phase a rugged and self-contained monitoring platform is designed and constructed.  The platform contains a digital signal processor (DSP), a microphone and data acquisition subsystem, a memory subsystem, and a solar power supply.  The platform is intended to be deployed in a remote location for perhaps several weeks at a time, while continuously monitoring and classifying the acoustical environment.  The data is then downloaded to a computer for further analysis and the preparation of an acoustical profile.
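The phase-one classification step can be sketched as follows.  This is a minimal illustration, not the project's actual algorithm: averaged FFT magnitudes stand in for the time-frequency decomposition, a diagonal-Gaussian maximum-likelihood matcher does the classification, and synthetic tones stand in for real source recordings.

```python
import numpy as np

def spectral_features(x, n_fft=256):
    """Average short-time Fourier magnitude spectrum: a simple
    time-frequency feature vector for a recorded sound."""
    hop = n_fft // 2
    win = np.hanning(n_fft)
    frames = [x[i:i+n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.mean([np.abs(np.fft.rfft(f)) for f in frames], axis=0)

def classify_ml(feature, models):
    """Maximum-likelihood matching: choose the class whose diagonal
    Gaussian model (per-bin mean and variance) best explains the feature."""
    def loglik(mu, var):
        return -0.5 * np.sum((feature - mu)**2 / var + np.log(2*np.pi*var))
    return max(models, key=lambda name: loglik(*models[name]))
```

In a deployed system the class models would be trained from labeled field recordings rather than the synthetic tones used here.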

Automatic decomposition and classification of sounds in audio recordings

It is very common in many applications to have an audio recording that contains a desired signal source (the target) and one or more competing sources (the jammers).  In this situation it is necessary to enhance the desired source and attenuate the competing sources.  For example, the desired source may be a person talking on a cell phone, while the competing sources may be other talkers, ambient wind and traffic noise, or electromagnetic interference in the communications channel.
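One standard illustration of this kind of enhancement, offered here as a sketch and not necessarily the method used in the publications below, is spectral subtraction: the jammer's average magnitude spectrum is estimated from a reference segment and subtracted from each analysis frame in the time-frequency domain.

```python
import numpy as np

def spectral_subtract(x, noise_ref, n_fft=512, floor=0.05):
    """Attenuate a stationary jammer by subtracting its average magnitude
    spectrum from each frame (classic spectral subtraction with a floor
    to limit musical-noise artifacts)."""
    hop = n_fft // 2
    win = np.hanning(n_fft)
    # Average jammer magnitude spectrum from the reference segment.
    noise_mag = np.mean([np.abs(np.fft.rfft(noise_ref[i:i+n_fft] * win))
                         for i in range(0, len(noise_ref) - n_fft + 1, hop)],
                        axis=0)
    out = np.zeros(len(x))
    for i in range(0, len(x) - n_fft + 1, hop):
        X = np.fft.rfft(x[i:i+n_fft] * win)
        mag = np.maximum(np.abs(X) - noise_mag, floor * np.abs(X))
        # Keep the original phase; overlap-add the modified frames.
        out[i:i+n_fft] += np.fft.irfft(mag * np.exp(1j*np.angle(X)), n_fft)
    return out
```

The method assumes the jammer is approximately stationary; nonstationary competing sources require the more elaborate time-frequency techniques treated in the papers below.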

R.C. Maher, "A method for enhancement of background sounds in forensic audio recordings," Preprint 8731, Proc. 133rd Audio Engineering Society Convention, San Francisco, CA, October, 2012.

R.C. Maher and J. Studniarz, “Automatic search and classification of sound sources in long-term surveillance recordings,” Proc. Audio Engineering Society 46th Conference, Audio Forensics—Recording, Recovery, Analysis, and Interpretation, Denver, CO, June 2012.

B.J. Gregoire and R.C. Maher, "Map seeking circuits: a novel method of detecting auditory events using iterative template mapping," Proc. IEEE Signal Processing Society 12th DSP Workshop, Jackson Lake, WY, pp. 511-515, September, 2006.

R.C. Maher, "Audio enhancement using nonlinear time-frequency filtering," Proc. Audio Engineering Society 26th Conference, Audio Forensics in the Digital Age, Denver, CO, July 2005.

B.J. Gregoire and R.C. Maher, "Harmonic Envelope Detection and Amplitude Estimation Using Map Seeking Circuits," Proc. IEEE International Conference on Electro Information Technology (EIT2005), Lincoln, NE, May, 2005.

J. Gregoire and R.C. Maher, "Map seeking circuits for audio pattern recognition," Music Information Processing Workshop, Whistler, British Columbia, Canada, December, 2004.

Acoustical modeling of environments and architectural spaces

As would be expected, much of the work on architectural auralization is achieved using digital computer simulations.  Like computer graphics, many auralization systems use a ray tracing model or an image model in which the computer program computes a series of reflections from a particular sound source location to the various surfaces in the room and then to a particular listener location.  If the room is of a simple geometry, such as a rectangular floor plan, the computation of the various sound ray reflections can be accomplished in a straightforward manner.  The interaction of sound waves with objects and surfaces encountered in a room has been treated using a variety of modeling techniques.  The current scheme used in many acoustical modeling systems consists of an early reflection model coupled with a late reverberation model.
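For a rectangular room the image model reduces to simple geometry; a minimal sketch (not the simulation code used in the papers below) computes the direct path and the six first-order wall reflections, with higher orders obtained by reflecting the images recursively:

```python
import numpy as np

def image_sources_1st(src, room):
    """First-order image sources for a rectangular room with one corner
    at the origin: reflect the source across each of the six walls."""
    images = []
    for axis in range(3):
        # Wall at coordinate 0 on this axis, then the opposite wall.
        images.append(tuple(-s if a == axis else s
                            for a, s in enumerate(src)))
        images.append(tuple(2*room[axis] - s if a == axis else s
                            for a, s in enumerate(src)))
    return images

def arrival_times(src, listener, room, c=343.0):
    """Propagation delays (s) of the direct path and first reflections,
    sorted by arrival time."""
    paths = [src] + image_sources_1st(src, room)
    return sorted(np.linalg.norm(np.subtract(p, listener)) / c for p in paths)
```

Each delay maps to a tap in the simulated impulse response; the rapidly growing number of higher-order images is what makes the full computation expensive.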

D. Reed and R.C. Maher, "An investigation of early reflection’s effect on front-back localization in spatial audio," Preprint 7884, Proc. 127th Audio Engineering Society Convention, New York, NY, October, 2009.

Z. Chen and R.C. Maher, "Analytical expression for impulse response between two nodes in 2-D rectangular digital waveguide mesh," IEEE Signal Processing Letters, vol. 15, pp. 221-224, 2008.

Z. Chen and R.C. Maher, "Addressing the Discrepancy Between Measured and Modeled Impulse Responses for Small Rooms," Preprint 7239, Proc. 123rd Audio Engineering Society Convention, New York, NY, October, 2007.

Z. Chen and R.C. Maher, "Modeling room impulse response by incorporating speaker polar response into image source method," J. Acoust. Soc. Am., vol. 121, no. 5, part 2, p. 3174 (abstract), 2007.

The simple image model of the early reflections is known to be inadequate to describe the diffuse reflections that occur when sound waves strike typical surfaces, particularly for frequencies above 200 Hz.  These compromises are usually accepted because the computation required for higher-order modeling has been too great for practical implementation in real-time systems.  Even with fast computers, the complexity of the image model is of the order n^r, where n is the number of surfaces and r is the number of reflections calculated, and this makes simulating anything but the simplest of listening spaces intractable.

This research project involves two approaches in an attempt to improve the accuracy of the acoustical simulation while avoiding the excessive computational complexity.  The first approach is to pre-calculate an estimate of the acoustical transfer function from the source to the vicinity of the listener, and then to convolve this pre-calculated response with the desired sound source signal.  The second approach is less “brute force,” and will require research to discover the most efficient means of representation.  Rather than treating the acoustical wave propagation using a ray model in which the angle of incidence equals the angle of reflection at each surface, a wavefront model is proposed to account for the diffuse sound energy that reflects in directions other than the specular angle.

Pitch transitions in vocal vibrato

A common characteristic of singing by trained vocalists is vibrato.  Typical vocal vibrato is a roughly sinusoidal variation in frequency of +/- 2%, with a repetition rate of about 5 Hz.  Vibrato is caused by a periodic change in the tension of the vocal folds and glottis.  The resulting frequency sweep of the fundamental and harmonics tends to enhance the richness of the singer’s voice as the spectral partials interact with the resonances of the throat, mouth, and sinuses.  In particular, the frequency modulation (FM) of the glottal excitation can induce amplitude modulation (AM) effects for partials that coincide with peaks, troughs, or shoulders of the fixed vocal tract resonances.

In addition to the physical acoustics of vibrato, there are a variety of common singing practices that are learned indirectly during vocal instruction but are usually not described explicitly.  One of these behaviors is how vibrato is handled at the transition from one sung pitch to another during legato (no break in the sound) singing.  In this investigation a set of recordings of trained singers performing simple legato arpeggios has been obtained so that the instantaneous fundamental frequency can be analyzed before, during, and after each pitch transition.  This work is intended to provide both a phenomenological description and a set of parameters suitable for realistic vocal synthesis.
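The nominal vibrato parameters above (roughly +/- 2% depth at about 5 Hz) translate directly into a synthesis sketch: the instantaneous frequency is modulated sinusoidally and integrated to obtain the phase.

```python
import numpy as np

def vibrato_tone(f0, depth=0.02, rate=5.0, dur=1.0, fs=8000):
    """Synthesize a tone with sinusoidal vibrato: the instantaneous
    frequency swings +/- depth*f0 at `rate` Hz, and the phase is the
    running integral of instantaneous frequency."""
    t = np.arange(int(dur * fs)) / fs
    f_inst = f0 * (1.0 + depth * np.sin(2*np.pi*rate*t))
    phase = 2*np.pi * np.cumsum(f_inst) / fs
    return np.sin(phase), f_inst
```

A realistic vocal model would modulate all harmonics through the vocal-tract resonances (producing the FM-induced AM noted above); this sketch shows only the fundamental-frequency trajectory.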

R.C. Maher, "Control of synthesized vibrato during portamento musical pitch transitions," J. Audio Eng. Soc., vol. 56, no. 1/2, pp. 18-27, 2008.

Evaluation of numerical precision in the MDCT and other transforms

The use of fast numerical algorithms in waveform and image coding often involves the modified discrete cosine transform (MDCT) or similar procedures.  However, the effects of round-off errors and coefficient quantization are not well understood in the MDCT, and so in practical systems it is difficult to choose the required number of bits to represent the coefficients and to store the intermediate results.  In the proposed project a theoretical analysis of the numerical precision issues for the MDCT will be conducted.  The results will help guide future designers in the optimal implementation of signal coding schemes.
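For reference, a direct (matrix-form) MDCT and its inverse with the sine window: in double precision the time-domain alias cancellation reconstructs the overlapped signal essentially exactly, and the precision question is how this degrades when the coefficients and intermediate results are quantized.  This is the textbook formulation, not code from the project.

```python
import numpy as np

def _basis(M):
    """MDCT cosine basis: M coefficients from a 2M-sample frame."""
    n = np.arange(2*M)
    k = np.arange(M)[:, None]
    return np.cos(np.pi/M * (n + 0.5 + M/2) * (k + 0.5))

def mdct(frame, M):
    """MDCT of a 2M-sample frame with a sine (Princen-Bradley) window."""
    w = np.sin(np.pi * (np.arange(2*M) + 0.5) / (2*M))
    return _basis(M) @ (w * frame)

def imdct(X, M):
    """Inverse MDCT; overlap-adding the halves of consecutive frames
    cancels the time-domain aliasing and reconstructs the input."""
    w = np.sin(np.pi * (np.arange(2*M) + 0.5) / (2*M))
    return (2.0 / M) * w * (_basis(M).T @ X)
```

Production coders replace the O(M^2) matrix products with FFT-based fast algorithms, which is exactly where the round-off behavior becomes nontrivial.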

Efficient architectures for audio digital signal processing

Modern audio signal processing systems are noted for concurrent processing requirements:  simultaneous mixing, music synthesis, sample rate conversion, audio effects, and decoding of compressed audio.  It has been common to design either a special-purpose hardware processor for each function or to utilize software on a general-purpose processor in order to accomplish the required concurrency.  However, the resulting system is typically unsatisfactory either due to the cost of designing and fabricating the special circuitry, or due to the reduced performance of a totally software-based implementation.  This research will develop a scalable architecture in which the available hardware resources are assigned specific processing tasks according to the current user requirements.  By assigning the resources dynamically and monitoring the actual system loading, the architecture provides a more efficient and economical system than is obtained by conventional methods.
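The dynamic-assignment idea can be caricatured with a greedy load balancer.  In this toy sketch (the task names and costs are invented for illustration), each concurrent audio task is assigned to whichever processing resource is currently least loaded:

```python
def assign_tasks(tasks, n_procs):
    """Greedy dynamic assignment: each (name, cost) task goes to the
    currently least-loaded processor, largest tasks first (LPT rule)."""
    loads = [0.0] * n_procs
    assignment = {}
    for name, cost in sorted(tasks, key=lambda t: -t[1]):
        p = loads.index(min(loads))   # least-loaded resource
        assignment[name] = p
        loads[p] += cost
    return assignment, loads
```

A real scalable architecture would re-run such an allocation as the user's processing mix changes, using measured rather than assumed task costs.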

R.C. Maher and J. Barish, "Scalable Audio Processing on a Heterogeneous Processor Array," U.S. Patent Number 6,301,603, October 9, 2001.

Lossless compression of audio signals

Audio recordings generally do not allow significant lossless data compression using conventional techniques such as Huffman, Arithmetic Coding, or Lempel-Ziv.  A 50% compressed size would be considered useful, but typical compressed audio files are still 70-90% of the original file size.  Lossy compression, and in particular perceptual coding, can achieve compressed files that are as little as 10% of the original file size, but the original data is not recovered exactly.

If the lossy compression is perceptually lossless, the decoded data cannot be distinguished by a human listener and it might seem that the problem is solved.  However, there are many applications in which lossless compression is essential or highly desirable, such as archiving data that will subsequently be mixed or processed, preparation of temporary or permanent backup copies, and transmission of data through channels that will require a series of compression/decompression steps (tandeming) in which the distortion due to lossy compression would build up.

In this investigation a signal segmentation process is employed to separate the audio data into segments with specific signal properties, followed by a dynamic codebook adaptive coder to represent each segment.  The process is asymmetrical:  the encoder complexity is generally much greater than the decoder complexity.
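A toy sketch in the same spirit, though far simpler than the segmentation/codebook scheme described above, shows how exact reconstruction and compression can coexist: a first-order predictor shrinks the sample values, and Rice coding represents the small residuals compactly.

```python
def zigzag(r):
    """Map signed residuals to non-negative integers: 0,-1,1,-2,... -> 0,1,2,3,..."""
    return 2*r if r >= 0 else -2*r - 1

def unzigzag(u):
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def rice_encode(res, k):
    """Rice code each residual: unary quotient, '0' marker, k remainder bits."""
    out = []
    for r in res:
        u = zigzag(r)
        out.append("1" * (u >> k) + "0" + format(u & ((1 << k) - 1), f"0{k}b"))
    return "".join(out)

def rice_decode(bits, n, k):
    res, pos = [], 0
    for _ in range(n):
        q = 0
        while bits[pos] == "1":
            q += 1; pos += 1
        pos += 1                      # skip the terminating '0'
        rem = int(bits[pos:pos+k], 2) if k else 0
        pos += k
        res.append(unzigzag((q << k) | rem))
    return res

def encode(samples, k=4):
    """First-order prediction, then Rice-code the (smaller) residuals."""
    res = [samples[0]] + [samples[i] - samples[i-1]
                          for i in range(1, len(samples))]
    return rice_encode(res, k), len(samples)

def decode(bits, n, k=4):
    res = rice_decode(bits, n, k)
    out = [res[0]]
    for r in res[1:]:
        out.append(out[-1] + r)       # undo the prediction exactly
    return out
```

A practical lossless coder additionally adapts the predictor order and Rice parameter per segment, which is where the encoder-heavy asymmetry described above comes from.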

R.C. Maher, "Lossless Audio Coding," book chapter, Lossless Compression Handbook, K. Sayood, ed., San Diego: Academic Press, 2003.