Disguising a Vocal Sample to Prevent Voiceprint Identification


Text communications can be made in such a way as to conceal the identity of the sender, whether by pasting letters cut from magazines onto paper or by using an anonymous email remailer. But there are many situations where printed material or electronic text does not meet the requirements of the task, and vocal communication is necessary. Where an individual needs to conceal his or her identity, voice communication presents a vulnerability: law enforcement agencies and corporations have access to specialised sound-analysis laboratories which can match speech collected from 911 calls or other recordings against voice samples taken from suspects.

Law Enforcement Use of Voice Identification

Courts have repeatedly held that requiring the accused to submit voice samples for the purpose of comparison and identification does not violate a suspect's Fifth Amendment rights. Many court orders will specify that the suspect give a sample of his or her voice, repeating phrases from the questioned recording in a conversational tone similar to that of the original sample. Lab technicians will attempt to replicate the recording environment as closely as possible, sometimes using the exact telephone, or the same model, used by the original speaker, and applying disguising techniques similar to those the original speaker employed. The court orders will specify that the samples be given to the satisfaction of the investigators.

A Federal Bureau of Investigation survey of its own performance in the examination of 2,000 forensic cases revealed an error rate of 0.31 percent for false identifications and 0.53 percent for false eliminations. If you do not disguise your voice carefully and effectively, voiceprint identification is incredibly reliable. A cold, or anything less severe than laryngitis, does not affect all of the components that make up your voiceprint. A well-planned strategy must be used that disguises all aspects of a voice that can be analysed.

Principles of Voice Identification

The underlying premise is that every voice is individual enough to be distinguished through voiceprint analysis. Each person has a unique vocal cavity (size and shape of the throat, nose, and mouth), unique vocal cords, and distinct learned speech patterns. On this premise, voiceprint analysis is treated as the equivalent of fingerprinting, and possibly more accurate.

Visual comparison of voice samples uses spectrograms, which represent the sound as a graphic image. This allows scientific analysis of features like the timing, frequencies, and amplitudes of the sounds. The subject's mouth and throat structures, muscles, and learned patterns produce distinct artefacts that can be seen in consonants, vowels and semi-vowels in isolation or in combination (co-articulation); the pitch, bandwidth, mean frequency, trajectory of vowel formants, distribution of formant energy, nasal resonance, stops, plosives, fricatives, pauses, inter-formant features and other idiosyncratic and pathological features are analysed.
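To make the idea of a spectrogram concrete, here is a minimal sketch of how one is computed: the signal is cut into short overlapping frames, each frame is windowed, and the magnitude of its discrete Fourier transform gives the energy per frequency band at that moment. This is an illustration only, not the method of any forensic laboratory. The function names, the frame length of 64 samples, the hop of 32 samples, and the synthetic 1000 Hz test tone are all assumptions chosen for the example; real analysis software uses a fast Fourier transform and real speech.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to reduce spectral leakage at frame edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames; each frame lists the magnitude of every frequency bin."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + i] * window[i] for i in range(frame_len)]
        mags = []
        # Bins run from 0 Hz up to the Nyquist frequency (half the sample rate).
        for k in range(frame_len // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def peak_frequency(frame, sample_rate, frame_len):
    """Frequency in Hz of the strongest bin in one spectrogram frame."""
    k = max(range(len(frame)), key=lambda i: frame[i])
    return k * sample_rate / frame_len


# Synthetic 1000 Hz tone standing in for a voiced sound (an assumption, not real speech).
SAMPLE_RATE = 8000
FRAME_LEN = 64
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]
spec = spectrogram(tone, frame_len=FRAME_LEN, hop=32)
```

Each row of `spec` is one time slice and each column one frequency band, which is exactly the grid that is drawn as an image. Calling `peak_frequency(spec[0], SAMPLE_RATE, FRAME_LEN)` locates the band holding the most energy in the first slice; formant tracking works on the same grid, following several such energy peaks over time.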

Aural cues compared include resonance quality, pitch, temporal factors, inflection, dialect, articulation, syllable grouping, breath pattern, disguise, pathologies and other peculiar speech characteristics. The more of these aural and technical cues that are intentionally altered, whether by the speaker or through software, the more difficult positive identification becomes.

Artificial Voices

In cases where it is not necessary for the voice to sound naturally human, computer text-to-speech programs can read text in a somewhat humanlike computerized voice. The output can also be processed to make it less obviously computer-generated, but it will always sound inhuman.

Voiceprint Concealment

Background noise, vocal technique (learning new ways of speaking), electronic devices, and digital signal processing can mask the voiceprint of a sample. It is best to try to alter an original message in advance rather than try to fake it when a clean sample is demanded by an investigative or corporate entity; at that point it is too late.

When it is not possible to use a pre-recorded message, background noise, vocal technique (speaking in an intentionally unnatural way), and electronic processing should be used. Electronic devices such as guitar effects pedals, vocal processors, vocoders, and toys can mask some aspects of a voice, but sound unnatural and are not good enough to prevent a determined researcher from performing a reasonable analysis. Nevertheless, if they are all that is available, they can often be rigged to a phone in an effective way; special adapters and audio equipment might be necessary to utilize many of these devices and interface them to a phone.

When it is possible to use a pre-recorded voice message to impersonate a live telephone call, additional voice disguising can be accomplished using a digital editor such as Sony Sound Forge; any two-track audio editing program with pro-audio features can be used and there are freeware applications which can do much of this processing. There is a balance between the naturalness of the sound of the recording and the need to disguise the voice; the more processing that is used the less natural it sounds. During intense situations, unnatural tones of voice will go largely unnoticed and the recording will still have the desired effect; but in situations where the mark has adequate time to study the recording, it will be noticed as being altered.

Tapes cannot be quickly or precisely cued and are not very good for interacting in live conversations. For additional flexibility in reacting to changing conversation and situation, many generic responses can be recorded in advance on a CD; track markers could be used to cue each response. This is better than a tape, but the best solution is to use a digital sampler or computer, and trigger different samples with keyboard shortcuts or a MIDI controller.

Recording the Original Sample

When recording the original to be processed, the recording should be made with an unnatural tone of voice, abnormal phrasing and speech rhythms, and either slower or faster than normal speech. These will be "corrected" when the recording is modified and made to sound more normal, but the processing will subvert attempts to take a voiceprint based on many common methods of voiceprint analysis. The more unnatural sounding the original sample is, the more unnatural the processed sound will be, but the less identifiable.

Phrasing and Timing

The phrasing and timing of words should be varied by inserting and deleting space between words and phrases. The entire sample can be sped up or slowed down to change a person's normal phrasing. When the original recording is made, these changes can be anticipated by having the speaker do the reverse of what will be done with the software; if the intent is to speed up the recording, the original speech should be delivered at a slow pace, for instance.
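The compensation described above is simple arithmetic, sketched here under stated assumptions. The function names and the figure of 150 words per minute for ordinary speech are illustrative and do not come from the original text; `speed_factor` is the playback-speed multiplier to be applied later in software (1.25 means 25 percent faster).

```python
def delivery_rate(target_wpm, speed_factor):
    """Speaking rate to use while recording so that playback at
    speed_factor comes out at target_wpm words per minute."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return target_wpm / speed_factor


def processed_duration(original_seconds, speed_factor):
    """Length of the recording after it has been sped up or slowed down."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return original_seconds / speed_factor


# Assumed example: the result should land near 150 words per minute
# after the recording is sped up by a factor of 1.25.
slow_take = delivery_rate(150, 1.25)
final_length = processed_duration(10, 1.25)
```

With these numbers the speaker would aim for a slower take, and a ten-second recording becomes shorter once processed. A factor below 1.0 works the other way round: the original is delivered faster than normal and then slowed down.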

Pitch and Frequency Concealment

Changing pitch alone will make the voice sound very unnatural, like an Alvin and the Chipmunks record, and will not be effective at concealing the voice; investigators can simply shift the pitch back to analyse it. Likewise, using an equalizer to change the frequency balance (more lows, less midrange, for instance) will not help; frequency content is very dependent on the recording quality and transducers (microphones and earpieces), and varies widely anyway.

Some vocal processors (software or rack-mountable) are able to change the pitch and formant to make a male voice sound female or vice versa. This is an excellent technique for concealing identity, especially if a high-quality effect is used. These processors can leave distinctive artefacts in the signal as the waveforms are processed and shaped; the artefacts may reveal that the recording has been processed, but the original characteristics are concealed very effectively. These digital-audio vocal effects are essential in concealing identity.

Advanced Processing Techniques

The formant can also be directly adjusted with many pro-audio vocal effects that plug in to popular audio editing programs, and many other creative digital-audio effects can be used. A familiarity with common recording studio equipment is helpful in choosing effective tools that can obscure aspects of a voice while still sounding intelligible and reasonably human. This includes vocal harmonizers (when used to control formant but not used for harmony), pitch-correction plugins, chorusing effects, and many other popular pro-audio effects by vendors like Waves. Experience adjusting the formant on various samples will allow you to speak with an unnatural tone of voice and then adjust the formant to make it sound more normal. Because the formant has been unnaturally adjusted, the result will help to frustrate identification of the voiceprint by spectral analysis, and it will sound obviously different from the subject's normal voice even though it can sound somewhat natural.

Pitch-correction effects have become popular in recent years, and can be heard featured on many fine recordings by Britney Spears and 'N Sync; the best example is Antares Auto-Tune. The signal processor analyses the pitch of a singer's voice and aligns it with a musical scale so that out-of-tune notes are corrected. This involves heavy DSP, and it can be abused to create very twisted-sounding effects, especially when applied to speech. Although it will make the voice very disjointed and unnatural, it will also obscure phrase beginnings and endings by pitch-shifting phrases. This processing will not disguise the frequencies of the voice nor the formants, but will disguise phrasing and pitch.

More bizarre forms of processing like flangers or the Fromage effect from Ohm Force can also be used, with the expectation that the voice will sound very processed, or at best, low-fidelity.

Background Noise

Background noise is the most obvious way to frustrate voice analysis; it adds random noises and frequencies that cloud a visual analysis of the recording. The volume should be as high as possible while ensuring that the resulting speech can be recognised even after being broadcast over a phone or radio.

Many libraries have sound effect CDs in their collections which can be used to put a voice in the context of a noisy bar, battleground, swimming pool, park, or office; music can also be used. Good background noise will not impede understanding of the words but can cover certain frequencies of the voice and prevent good spectral analysis of the sample. High frequencies are important to leave intact, since they are what make words intelligible; low- and mid-range sounds can be masked to a greater degree. Adding background noise should be done last, using a multi-track sound editor like Cakewalk Sonar, Cubase, Logic Pro, or Pro Tools. Many freeware applications can also do this, as can professional audio recording equipment.

Be careful when mixing in background noises; although the louder the noise is the better the voiceprint is concealed, the background noise can also make it difficult to understand the words being spoken. Mix it generously, but always remember that if the final recording will be played back on possibly sub-standard equipment, a PA system, or a telephone, the voice should be a little louder so it can still be understood.
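The trade-off between noise level and intelligibility is usually expressed as a signal-to-noise ratio (SNR) in decibels. The sketch below shows the arithmetic for scaling a noise track so that the voice sits a chosen number of decibels above it before the two are summed. It is a minimal illustration under assumptions: the function names, the 6 dB figure, and the synthetic sine wave and random noise standing in for real recordings are not from the original text, and a real mix would also need to guard against clipping.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(voice, noise):
    """Voice level relative to noise level, in decibels."""
    return 20 * math.log10(rms(voice) / rms(noise))


def mix_at_snr(voice, noise, target_snr_db):
    """Scale the noise so the voice sits target_snr_db above it, then sum.

    Returns the mixed track and the scaled noise track. Both inputs are
    assumed to have the same length and non-zero level.
    """
    gain = rms(voice) / (rms(noise) * 10 ** (target_snr_db / 20))
    scaled = [gain * n for n in noise]
    mixed = [v + n for v, n in zip(voice, scaled)]
    return mixed, scaled


# Synthetic stand-ins (assumptions): a 200 Hz sine for the voice, random noise for the bar.
random.seed(1)
voice = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(800)]
noise = [random.uniform(-1.0, 1.0) for _ in range(800)]

# Keep the voice 6 dB above the noise, for example for telephone playback.
mixed, scaled_noise = mix_at_snr(voice, noise, 6.0)
```

A lower target SNR buries the voice more deeply; a higher one favours intelligibility. Checking `snr_db(voice, scaled_noise)` after mixing confirms that the chosen ratio was reached.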

Permanent Alterations of Voice

In extreme cases where a true sample of a person's voice is already available in the wild or to law enforcement agencies, voices can be surgically modified or retrained with speech therapy. Common surgical procedures can involve a scalpel or laser to modify the vocal cords, similar to a surgery that will remove vocal nodules or change the voice of someone undergoing gender reassignment surgery. This type of investment would obviously only be warranted in cases where the consequences of identification are severe; the procedures can be time-consuming, expensive, and difficult. If you are in this situation, you need more than this short paper, but then you probably have a sizable recent windfall of liquid assets and a team of plastic surgeons standing by, right?

Resources and Citations

Formant explanation http://hyperphysics.phy-astr.gsu.edu/hbase/music/vowel.html

Text to Speech http://www.research.att.com/projects/tts/demo.html

Audio Editing Applications http://www.sonymediasoftware.com/Products/ShowProduct.asp?PID=961

Effect Plug-ins http://www.arboretum.com/support/manuals/manual_hmmp/Files/hppc_proc_misc.html#formant_pitch_shifter

Hardware Effects Processors http://www.sweetwater.com/store/detail/VT1Boss/

Voiceprint Forensics Software http://www.spectral-design.com/foenics/index.html

Voiceprint Identification http://expertpages.com/news/voiceprint_identification.htm

The author's background is in audio production as well as anarchism and sometimes sarcasm. Although the author is very well-versed in digital audio techniques, she does not have professional experience in the field of voiceprint identification; nor have these techniques been independently tested in a controlled environment. Although this information is well-researched and based on a solid understanding of the principles of audio analysis, the reader assumes all risk in using the techniques described for any purpose.

This document and the information contained herein are for entertainment purposes only; actually it's intended as really dry humour which I hope the reader can appreciate as such. The author does not advocate breaking the law or using the techniques described in the paper to conceal illegal activity or to obstruct justice.

This transmission may contain corporate espionage, death threats, or other neo-nationalist propagandas of mass destruction that is privileged, confidential, and/or exempt from interception with Echelon or Carnivore under applicable home bomb-making recipes. If you are not the intended payload's target, co-conspirator, or arch-nemesis, you are hereby notified that any such modification, alteration, decryption, or misuse/disuse of the information contained heroin vis-à-vis heretofore (including any reliance thereoninasmuch) is STRICTLY PROHIBITABLE by subchapter 4, sections 5-853, subparagraph 18. If you surveilled / reconnoitred this transmission in error, please immediately contact the FBI, NSA, CIA, KGB, HAMAS, or IRA and destroy the sender in entirety, whether in electronic or large-calibre format. Thank you.

