FPGA Implementation of CNN-LSTM Classifier in Speech Emotion Recognition System

Gao, Zhaogang; Xiao, Wan'ang; Zhou, Weixin; Yang, Zhenghong. Source: 2023 International Conference on High Performance Big Data and Intelligent Systems, HDIS 2023, pp. 47-52, 2023.

Abstract:

Speech emotion recognition is a key technology in the field of human-computer interaction. It equips computers with the ability to recognize and understand human emotions by establishing emotional associations between computers and speech information. However, speech emotion recognition technology remains at the laboratory stage and has not yet been applied on a large scale. We design an FPGA-based speech emotion recognition system that deploys a CNN-LSTM neural network model. The model is designed using high-level synthesis (HLS). The network is constructed on the programmable logic (PL) side, and its scheduling and implementation are managed on the processing system (PS) side. The system captures speech and analyzes emotions in real time, and could be used in future wearables, smart homes, and smart robots to improve the human-computer interaction experience. We conducted experiments using the Toronto Emotional Speech Set (TESS) dataset, achieving an accuracy of 97.86%.

Institute of Semiconductors

Chinese Academy of Sciences
