ORIGAMI – Oration to Physiognomy

Mansi Pandya et al.


More than 400,000 people die from homicide and other crimes each year, and despite the best efforts of law enforcement these numbers keep rising. To help address this situation, this paper explains how deep learning can be used to learn correlations between people's faces and the sound of their voices, which can in turn be used to help track down criminals. We analyze how a person's face can be reconstructed from a short audio clip of his or her voice. A deep neural network is designed and trained on a dataset of people speaking. During training, the network learns voice-face correlations in a self-supervised manner and thereby captures physical attributes of the speaker such as age, gender and race, without explicit modelling of these features. The numerical differences between the faces reconstructed from audio and the original images are compared, both to train the model and to evaluate how well it works. The output is a canonical face of the speaker generated from the audio clip. The reconstructed image will not match the true image exactly but will retain its most prominent features.
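
The comparison between a reconstructed face and the original image can be illustrated with a minimal sketch. The abstract does not name the metric used, so the mean absolute per-pixel difference below is an assumption chosen only for illustration, and the function name, the `Image` type alias, and the toy 2x2 images are all hypothetical.

```python
from typing import Sequence

# Hypothetical representation: a grayscale image as rows of pixel values in [0, 1].
Image = Sequence[Sequence[float]]


def mean_absolute_error(reconstructed: Image, original: Image) -> float:
    """Average per-pixel absolute difference between two same-sized images.

    Assumption: the abstract only says that numeric differences are compared;
    this L1-style measure is one common choice, not necessarily the authors'.
    """
    if len(reconstructed) != len(original):
        raise ValueError("images must have the same number of rows")
    total = 0.0
    count = 0
    for rec_row, orig_row in zip(reconstructed, original):
        if len(rec_row) != len(orig_row):
            raise ValueError("image rows must have the same length")
        for rec_px, orig_px in zip(rec_row, orig_row):
            total += abs(rec_px - orig_px)
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


# Toy 2x2 "images" standing in for a true face and a face reconstructed from audio.
original_face = [[0.0, 0.5], [1.0, 0.25]]
reconstructed_face = [[0.0, 0.25], [0.5, 0.25]]
error = mean_absolute_error(reconstructed_face, original_face)
```

A lower value means the reconstruction is numerically closer to the original. In a training setting, a quantity of this kind would typically serve as the loss to be minimized, and the same quantity computed on held-out speakers would give a rough measure of how well the model works.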
