Digital Signal Processing Mini-Project:

An Automatic Speaker Recognition System


Overview

Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.

The goal of this project is to build a simple yet complete and representative automatic speaker recognition system. Because of limited space, we will test our system on a small (but already non-trivial) speech database. There are 8 speakers, labeled S1 to S8. Each speaker uttered the same single digit, "zero", once in a training session and once in a testing session. The sessions are at least 6 months apart to simulate variation in the voice over time. A vocabulary of digits is commonly used in speaker recognition systems. For example, users may have to speak a code to gain access to a laboratory, or speak their credit card number to verify their identity over a telephone line. By checking the voice characteristics of the input utterance, an automatic speaker recognition system similar to the one we will develop adds an extra level of security.


How to build an automatic speaker recognition system?

  • HTML format
  • Microsoft Word format

  • Supplied code


    Speech data

    After unzipping the file correctly, you will find two folders, TRAIN and TEST, each containing 8 files named S1.WAV, S2.WAV, ..., S8.WAV; each file is named after the ID of its speaker. These files were recorded in Microsoft WAV format. On Windows systems, you can listen to the recordings by double-clicking the files.
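
As a rough illustration of the folder layout described above, the following Python sketch lists the eight expected file paths for a session and reads one recording using only the standard library. It is not part of the supplied code: the helper names `expected_files` and `read_wav` are hypothetical, and the sketch assumes the recordings are 16-bit PCM, which the text does not state.

```python
import struct
import wave
from pathlib import Path

# Speaker IDs S1 .. S8, as named in the text.
SPEAKERS = [f"S{i}" for i in range(1, 9)]


def expected_files(root, session):
    """Return the eight expected paths for one session ('TRAIN' or 'TEST')."""
    return [Path(root) / session / f"{spk}.WAV" for spk in SPEAKERS]


def read_wav(path):
    """Read a WAV file and return (sample_rate, samples of the first channel).

    Assumption: 16-bit PCM samples. Other sample widths are rejected.
    """
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # Unpack little-endian signed 16-bit integers, then keep channel 0 only.
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, list(values[::n_channels])
```

Calling `expected_files(root, "TRAIN")` with the unzipped folder as `root` gives the paths to pass to `read_wav` one at a time.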

    Your task is to train a voice model for each speaker S1 - S8 using the corresponding sound file in the TRAIN folder. After this training step, the system will have knowledge of the voice characteristics of each (known) speaker. Next, in the testing phase, the system should identify the (assumed unknown) speaker of each sound file in the TEST folder.
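
The two phases described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's supplied code: it assumes each recording has already been turned into a list of feature vectors, it represents each speaker model as a codebook (a list of vectors), and it identifies a speaker by the lowest average distortion. The function names are hypothetical, and `mean_codebook` is a deliberately trivial one-entry stand-in for a real codebook trainer.

```python
def mean_codebook(frames):
    """Stand-in model: a one-entry codebook holding the mean feature vector."""
    dim = len(frames[0])
    return [[sum(f[d] for f in frames) / len(frames) for d in range(dim)]]


def train(train_features, build_model):
    """Training phase: build one model per known speaker."""
    return {spk: build_model(frames) for spk, frames in train_features.items()}


def distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    total = 0.0
    for frame in frames:
        total += min(
            sum((a - b) ** 2 for a, b in zip(frame, code)) for code in codebook
        )
    return total / len(frames)


def identify(frames, models):
    """Testing phase: return the speaker ID whose model fits the frames best."""
    return min(models, key=lambda spk: distortion(frames, models[spk]))


# Toy two-dimensional features for two speakers (made-up numbers, not real data).
toy_train = {
    "S1": [[0.0, 0.0], [0.2, 0.0]],
    "S2": [[5.0, 5.0], [5.2, 5.0]],
}
toy_models = train(toy_train, mean_codebook)
toy_guess = identify([[4.9, 5.1]], toy_models)  # nearest to the S2 codebook
```

In the full system there would be one entry per speaker S1 to S8, and `mean_codebook` would be replaced by whatever model-building routine you choose.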

  • PC WinZip format
  • UNIX tar format

  • Illustration


    Hints

  • For function mfcc.m
  • For function vqlbg.m

  • Related links