ORIGAMI – Oration to Physiognomy
Abstract
More than 400,000 people die from homicide and other crimes each year, and despite the best efforts of law enforcement these numbers keep rising. To help address this situation, this paper explains how deep learning can be used to learn correlations between a person's face and the voice it produces, which can in turn be used to help track down criminals. Here, we analyze how a person's face can be reconstructed from a short audio clip of his or her voice. A deep neural network is designed and trained on a dataset of people speaking. Physical attributes of the speaker, such as age, gender, and race, are captured during training by learning the voice–face correlation in a self-supervised manner, without explicit modelling of these features. To train and evaluate the model, the numerical differences between the features reconstructed from audio and those extracted from the original images are compared. The result is a canonical face of the person generated from the audio clip: the reconstructed image will not match the true image exactly, but it will exhibit the true image's most prominent features.
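The training signal described above (comparing numeric differences between the face features predicted from audio and those extracted from the original photograph) can be sketched as follows. This is a minimal, illustrative NumPy sketch, not the paper's implementation: the feature dimension, the L1 loss choice, and the stand-in vectors are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimension for a face embedding; illustrative only.
FACE_DIM = 4096

def face_feature_loss(pred_features, true_features):
    """Mean L1 distance between the face features predicted from
    audio and those extracted from the real photograph."""
    return float(np.mean(np.abs(pred_features - true_features)))

# Toy stand-ins for the two pipelines: features from the face image,
# and a noisy reconstruction of them from the voice clip.
true_features = rng.normal(size=FACE_DIM)
pred_features = true_features + rng.normal(scale=0.1, size=FACE_DIM)

loss = face_feature_loss(pred_features, true_features)
print(loss)
```

During training, this scalar would be minimized by gradient descent, pushing the audio-derived features toward the image-derived ones without ever labelling age, gender, or race explicitly.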
Article Details
Licensing
TURCOMAT publishes articles under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This licensing allows for any use of the work, provided the original author(s) and source are credited, thereby facilitating the free exchange and use of research for the advancement of knowledge.
Detailed Licensing Terms
Attribution (BY): Users must give appropriate credit, provide a link to the license, and indicate if changes were made. Users may do so in any reasonable manner, but not in any way that suggests the licensor endorses them or their use.
No Additional Restrictions: Users may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.