Speaker specific information present in the excitation signal is mostly viewed from sub-segmental, segmental and supra-segmental levels. In this work, the supra-segmental level information is explored for recognizing speakers. Earlier study has shown that, combined use of pitch and epoch strength vectors provides useful supra-segmental information. However, the speaker recognition accuracy achieved by supra-segmental level feature is relatively poor than other levels source information. May be the modulation information present at the supra-segmental level of the excitation signal is not manifested properly in pith and epoch strength vectors. We propose a method to model the supra-segmental level modulation information from residual mel frequency cepstral coefficient (R-MFCC) trajectories. The evidences from R-MFCC trajectories combined with pitch and epoch strength vectors are proposed to represent supra-segmental information. Experimental results show that compared to pitch and epoch strength vectors, the proposed approach provides relatively improved performance. Further, the proposed supra-segmental level information is relatively more complimentary to other levels information.
Pati, Debadatta and Prasanna, S. R. Mahadeva
"Speaker Recognition using Supra-segmental Level Excitation Information,"
International Journal of Computer and Communication Technology: Vol. 4:
1, Article 5.
Available at: https://www.interscience.in/ijcct/vol4/iss1/5