Investigating the Use of Eye Fixation Data for Emotion Classification in VR

1 Graduate Researcher, Evolutionary Computing Laboratory, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, 88400, Kota Kinabalu, Sabah, Malaysia
2 Senior Lecturer, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, 88400, Kota Kinabalu, Sabah, Malaysia
3 Professor, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, 88400, Kota Kinabalu, Sabah, Malaysia
E-mail: 3* jtwteo@ums.edu.my


Introduction
The study of emotion detection is a fascinating area of research and, with recent technological developments, the amount of work in this area has increased significantly. Emotion recognition can contribute to many domains such as medicine, education, psychology, and entertainment. Emotions are significant psychological phenomena that describe many human affective states. The study of emotional states is useful both for understanding human behavior and for integrating human factors into artificial systems. Therefore, studies focusing on human-computer interaction (HCI) (Lalitha et al., 2015; Jamshidnejad & Jamshidined, 2009) are also on an increasing trend, aiming to improve and enhance the interaction between users and machines based on an understanding of human behavior (Pantic et al., 2007). Many applications with emotion detection have been developed and applied in areas such as marketing (Consoli, 2010) and healthcare (Guo et al., 2013). Emotion can be detected using non-physiological signals such as facial gestures (Gunes & Piccardi, 2007) and speech (Han, Yu, & Tashev, 2014), or physiological signals such as the electrocardiogram (ECG) (Ferdinando, Seppänen, & Alasaarela, 2017), electrooculogram (EOG) (Soundariya & Renuga, 2017), electromyogram (EMG) (Yang & Yang, 2011), electroencephalogram (EEG) (Qing et al., 2019), and galvanic skin response (GSR) (Goshvarpour, Abbasi, & Goshvarpour, 2017). However, compared to non-physiological signals, researchers are more likely to use physiological signals due to their reliability and usability. The EEG signal is the most commonly used since it is directly connected to the nervous system and human brain activity. Recently, eye features from eye-tracking data have also been widely used by researchers in emotion studies. Emotion classification can be done using several types of eye signals such as pupil size, pupil position, and saccades. A recent report presented the use of eye-tracking for emotion recognition, covering multiple eye features and current challenges (Lim, Mountstephens, & Teo, 2020).
Moreover, to stimulate an individual's emotional states, an emotion stimulation tool is required. There are several ways to evoke a user's emotions, such as watching a film, looking at an image, or listening to music. For example, emotions can be classified from EEG data recorded while watching a movie (Nie et al., 2011). Virtual Reality (VR), an emerging technology in recent years, can create a 3D virtual world or an immersive, computer-generated environment. Using VR as the stimulus, users are fully immersed in a realistic experience, so there are fewer distractions and influences from the outside environment; therefore, more realistic responses and reactions can be obtained from the user. Furthermore, VR headsets are nowadays integrated with eye-tracking technology, which makes recording and collecting eye-tracking data more convenient.
In this paper, we utilized eye-tracking data collected in VR to detect emotions and classify them into four distinct classes. Fixation data were extracted and used for emotion recognition, and a Support Vector Machine (SVM) served as the machine learning classifier for the classification task. The rest of the paper is organized as follows. The background section presents the theory of emotions and eye-tracking together with related work. The methodology section covers the experimental setup, procedure, data collection, and classification method. The results and discussion are presented in the following section, and the last section summarizes this paper and discusses future work.

Background
Emotions are behavioral states associated with the human brain, triggered by neurophysiological changes related to an individual's feelings or behavioral responses. Emotions can also be described in terms of a degree of pleasure or displeasure: an emotion is positive when one is happy and negative when one is upset. In emotion classification, it is possible to distinguish between emotional dispositions and emotional episodes. Six basic emotions, namely happiness, anger, sadness, surprise, disgust, and fear, were examined in Ekman's work (Ekman, 1999). This work is supported by Plutchik's "wheel of emotions", which adds two primary emotions, trust and anticipation, to the previous six (Plutchik, 2000). The emotion wheel depicts these eight fundamental emotions and illustrates various relationships between emotions within each emotion cluster. The Circumplex Model of Affect (Russell, 1980), a well-known model for emotion classification, is generally used to distinguish emotions. Its dimensional model consists of four quadrants defined by the valence (positive/negative) and arousal (high/low) dimensions.
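As an illustration of how this dimensional model partitions emotions, the following minimal Python sketch maps a (valence, arousal) pair to one of the four quadrants; the example emotion names are illustrative assumptions, not labels taken from this study.

def circumplex_quadrant(valence: float, arousal: float) -> str:
    # Map a (valence, arousal) pair to a quadrant of Russell's
    # Circumplex Model. Values are assumed to be centred at zero
    # (negative = low/unpleasant, positive = high/pleasant).
    if valence >= 0 and arousal >= 0:
        return "Q1: positive valence, high arousal (e.g., happy)"
    if valence < 0 and arousal >= 0:
        return "Q2: negative valence, high arousal (e.g., angry)"
    if valence < 0 and arousal < 0:
        return "Q3: negative valence, low arousal (e.g., sad)"
    return "Q4: positive valence, low arousal (e.g., calm)"

print(circumplex_quadrant(0.7, 0.4))  # -> Q1: positive valence, high arousal (e.g., happy)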
Eye-tracking is the method of assessing the eye movements or the focus point of a user. It calculates the point of view or the eye's location, gathers an individual's eye properties, and records them as data. Recently, research on eye-tracking has increased significantly owing to its usability, and it can be applied in many fields such as neuroinformatics, education, medical health, and gaming. Eye movement data can be collected using an eye-tracker or a simple camera, and the collected data can be used to detect emotions. Eye properties such as pupil size, pupil position, and fixation duration can serve as indicators for emotion recognition. A fixation is the maintenance of an individual's visual gaze on a single position. Several features can be extracted from fixation data, such as the number of fixations, fixation duration, and the first fixation; a sketch of these features is given at the end of this section. There have been studies on emotion recognition using eye-tracking data such as pupil size.

VR is a recent advanced technology that can create a simulated, computer-generated environment. The user is immersed in a virtual world within the VR headset, so distraction from the outside environment is reduced to a minimum. Therefore, authentic reactions and responses can be obtained from the user under VR stimuli, making it a good stimulation method for eliciting emotions. VR headsets are nowadays integrated with eye-tracking technology, so obtaining eye-tracking data is convenient and easy. Thus, the objective of this paper is to investigate the use of fixation data for emotion recognition based on the four distinct quadrants of Russell's emotion model under VR stimuli, using an SVM machine learning classifier.
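To make the fixation features mentioned above concrete, the following Python sketch computes the fixation count, total and mean fixation duration, and the first fixation from a list of fixation events; the Fixation record layout is an assumption for illustration, not the export format used in this study.

from dataclasses import dataclass

@dataclass
class Fixation:
    start_ts: float   # fixation onset in seconds (assumed field)
    duration: float   # fixation duration in seconds (assumed field)
    x: float          # normalised horizontal gaze position (assumed field)
    y: float          # normalised vertical gaze position (assumed field)

def fixation_features(fixations: list[Fixation]) -> dict:
    # Derive simple fixation-based features: count, duration
    # statistics, and the first fixation's position and onset.
    if not fixations:
        return {"count": 0}
    ordered = sorted(fixations, key=lambda f: f.start_ts)
    durations = [f.duration for f in ordered]
    first = ordered[0]
    return {
        "count": len(ordered),
        "total_duration": sum(durations),
        "mean_duration": sum(durations) / len(durations),
        "first_fixation": (first.x, first.y, first.start_ts),
    }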

Experiment setup
In this emotional experiment, VR is used as our stimulation tool. A 360° video is presented using an HTC Vive VR headset with a pair of earphones to stimulate the emotions of the user. The contents of the emotional video clips are presented in four sessions according to the four distinct classes based on the valence and arousal dimensions, as shown in Figure 1.

Procedure
A total of 10 subjects (all male, aged 22-29) volunteered in this investigation. Every participant was given an explanation of the investigation and its procedure before the experiment began. For recording and collecting eye-tracking data, the hardware used was an add-on Pupil Labs eye-tracker attached inside the VR headset. Figure 2 illustrates the flow of the video presentation during the experiment. The total duration of the video presentation was approximately 6 minutes. A 5-second preparation period was given for VR startup before the video clips started. There was a 10-second rest between video sessions until the video ended, and each stimulation session lasted about 80 seconds. Figure 3 shows the VR stimuli used by the participants in the experiment. Participants sat in front of the computer screen monitor. They were allowed to turn their head and body 360 degrees at their seated position while watching the video, but were not allowed to stand up or take large steps due to the limited length of the wire attached to the VR headset.

Data collection and classifications
For data collection, the recording and capturing of eye-tracking data were done using Pupil Capture, the software accompanying the Pupil Core headset. The collected data then underwent visualization and export using Pupil Player, and were saved in CSV file format. Classification was done using the fixation data. The Support Vector Machine (SVM) machine learning algorithm with a Radial Basis Function (RBF) kernel was used for the emotion classification task. The processing scripts were written in the Python programming language.
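As a minimal sketch of this classification step, assuming the fixation export has been consolidated into a single labelled CSV: the file name and column names below (norm_pos_x and norm_pos_y for the fixation centroid, quadrant for the class) are illustrative assumptions, not the exact layout used in this study.

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed layout: one row per fixation, with a quadrant label (1-4)
# attached during preprocessing.
df = pd.read_csv("fixations_labelled.csv")
X = df[["norm_pos_x", "norm_pos_y"]].values  # fixation centroid position
y = df["quadrant"].values                    # one label per Circumplex quadrant

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# SVM with an RBF kernel, as used in this study; standardising the
# inputs first is common practice for RBF-kernel SVMs.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2%}")

Scaling the two centroid coordinates before applying the RBF kernel keeps the kernel bandwidth from being dominated by whichever axis has the larger numeric range.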

Results and Discussion
This section presents the results obtained from the emotion classification, given in chart and table form. All classification results were obtained using the SVM machine learning algorithm with RBF kernel for each subject in the experiment. First, the user's eye data were recorded while they watched the video stimuli. The eye data were exported together with timestamps. The collected eye data were then matched to the video presentation time according to each session's quadrant, after which the unused data columns were removed. The data were then saved and imported into the SVM machine learning algorithm to run the classification process; a sketch of the session-matching step follows below.
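The session-matching step can be sketched as follows: each fixation timestamp is assigned to the stimulation session whose time window contains it, and rows falling outside every session (e.g., rest periods) are dropped. The session boundaries and the start_timestamp column name below are placeholder assumptions; the real windows come from the presentation timeline in Figure 2.

import pandas as pd

# Placeholder session windows: (start_s, end_s, quadrant_label), based
# on a 5 s startup, ~80 s sessions, and 10 s rests between sessions.
SESSIONS = [(5, 85, 1), (95, 175, 2), (185, 265, 3), (275, 355, 4)]

def label_by_session(df: pd.DataFrame, t0: float) -> pd.DataFrame:
    # Attach a quadrant label to each fixation row based on elapsed
    # time since recording started; unmatched rows are dropped.
    elapsed = df["start_timestamp"] - t0
    df = df.assign(quadrant=pd.NA)
    for start, end, quadrant in SESSIONS:
        mask = (elapsed >= start) & (elapsed < end)
        df.loc[mask, "quadrant"] = quadrant
    return df.dropna(subset=["quadrant"])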
The classification accuracy of each subject is presented in the chart and table for comparison. Fixation data were used as a single modality for emotion classification; we extracted the position of the fixation's centroid to classify the emotions. According to the results, the highest accuracy obtained was 69.23%, while the lowest was 33.84%. The findings show that emotion classification using fixation data yields promising prediction accuracy, approaching 70% on a four-class task where random classification would achieve only 25%. Compared to a previous study (Zheng, Mountstephens, & Teo, 2020) that used pupil diameter to classify emotions, emotion recognition using fixation data performed better.

Conclusion and Future Work
The main purpose of this paper was to investigate the use of fixation data for emotion classification based on the four distinct quadrants of Russell's emotion model under VR stimuli. We extracted the fixation data and used it as the single emotion-relevant eye feature in this investigation. The machine learning classifier used in the experiment was SVM with an RBF kernel, and the highest accuracy obtained was 69.23%. The findings show promising results for emotion recognition, since the accuracy achieved is almost 70%, well above the 25% chance level of a random four-class classifier. For future work, more features will be extracted from the fixation data, such as fixation duration and the total distance traveled between fixation positions, and used for emotion classification in a future experiment to compare with the results reported here.