Color Interactive Contents System using Kinect Camera Calibration

Media content that interacts with its audience in real time has been increasing in recent years. In this paper, we introduce a real-time color extraction content system based on the Kinect camera, as used in the media art work ‘COLOR’. The Kinect camera detects and tracks the joints of visitors who enter the exhibition space. The data detected by the Kinect is mapped through color-depth calibration in a Unity environment to generate a point cloud video. The pixel color at the spine shoulder joint coordinates of each visitor is then read from the point cloud image. This color data is output on the screen as a color circle that moves along with the visitor. The color circle shrinks as the distance between the visitor and the Kinect increases, and grows as the distance decreases. When several visitors come in and their color circles overlap, the color of the overlapping part takes an intermediate value between the two color circles. The work shows a form of people's social movement through the color that each person has and the mixture of those colors. The technology used in this work differs from other media art in that it extracts colors from the calibrated image separately and thereby advances interactive media art. In future work, we will improve the accuracy of the point cloud that registers the color image to the depth image, and improve the accuracy of the color extracted from visitors.


Introduction
Real-time interactive works have recently become more common in exhibitions as fusion art develops [1][2][3][4][5]. In interactive media art exhibition spaces in particular, a variety of digital content is being developed rapidly on the basis of digital technology [3]. The spatial act of interactive media art allows the audience, who are the subject of the interactivity, to proactively alter the media environment in a dynamic and bi-directional manner, promoting various sensory and perceptive experiences of the body [4]. The Microsoft Kinect has been recognized as the most well-implemented and widely applied device for realizing a gesture interface, using human tracking as a new form of input. Production of the Kinect was discontinued in October 2017, but its core technology, RGB and depth-based motion capture, has been utilized in the Microsoft HoloLens [5]. Using the Kinect V2 library, the Kinect sensor analyzes and recognizes people through a skeleton model with a total of 25 joint points [6]. When several people are present, each person can be recognized separately [7]. The real-time response of the Kinect sensor can therefore be used effectively in interactive content that changes according to the behavior of the audience.
In this paper, we introduce the media art work 'COLOR', which uses a color extraction technique based on the Kinect sensor, and we present the interactive content system built for the work. 'COLOR' is designed so that the Kinect sensor extracts the color of a visitor's clothes when the visitor enters the exhibition space. A color circle in that color is then projected by a beam projector. The closer the viewer is to the work, the larger the color circle; the farther away, the smaller the circle. After describing the symbolism of 'COLOR', we discuss the skeleton-based color extraction system used in the work. Finally, we describe the installation structure of the work in the actual exhibition environment and the points where the interactive system should be improved in the future.

2. Media art 'COLOR'
In this chapter, we introduce interactive media art with a similar method of expression, and then present the meaning of the 'COLOR' work and the design of its interaction system. Figure 1 shows Rafael Lozano-Hemmer's 'Body Movies' [8]. 'Body Movies' projects previously taken images of people and also projects the shadows of the audience onto the screen. This work is his signature public media art project and illustrates the "relationships" of our society. 'Body Movies' turns a shared public space into an interactive projection, in which a silhouette appears inside the shadow cast by a person passing in front of the projector. These interactive media art works use Kinect sensors that can recognize multiple people, so that visitors can experience interaction with each other. Connecting visitors with one another through fluid movements on a single screen interface may lead to a relational architecture of "being together" [10]. Interaction space design centered on the visitor's experience allows spectators to participate, use, observe, and generate valuable experiences through relationships of mutual trust [3].

Symbolism of the work
'COLOR' is a work produced in 2020. It is designed as a real-time interactive media art system in which the video changes according to the clothes the viewers wear, and it makes use of the characteristics of color mixing. People live with different personalities and thoughts. 'COLOR' depicts each individual as a 'circular color'. When one visitor meets another, their circular colors mix where they intersect and create a new color. A visitor can mix colors with multiple people while moving the color that corresponds to them. The mixture of colors visualizes the "sociality" that humans form as they live together. The work lets visitors feel its message by experiencing the media art directly.
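The mixing of two circular colors can be illustrated with a minimal sketch. It assumes that the "intermediate value" is the per-channel midpoint of the two RGB colors and that circles are described by a screen-space center and radius; the function names `mix_colors` and `circles_overlap` are hypothetical and are not taken from the work's Unity implementation.

```python
def mix_colors(color_a, color_b):
    """Return the per-channel midpoint of two RGB colors (ints in 0..255).

    Assumption: the 'intermediate value' is a simple average of each channel.
    """
    return tuple((a + b) // 2 for a, b in zip(color_a, color_b))


def circles_overlap(center_a, radius_a, center_b, radius_b):
    """Return True when two circles on the screen share at least one point."""
    dx = center_a[0] - center_b[0]
    dy = center_a[1] - center_b[1]
    # Compare squared distances to avoid a square root.
    return dx * dx + dy * dy <= (radius_a + radius_b) ** 2


if __name__ == "__main__":
    red, blue = (255, 0, 0), (0, 0, 255)
    if circles_overlap((100, 100), 60, (180, 100), 60):
        print(mix_colors(red, blue))
```

Other blending rules, such as additive or alpha blending in a shader, would also yield a color between the two; the midpoint is only the simplest choice.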

3. Interactive contents system
The media art work 'COLOR' uses Microsoft's Kinect for Xbox One (Kinect V2) to implement an effective exhibition through color extraction. The Kinect can acquire the spatial 3D coordinates of 25 human joints in real time [11]. When multiple visitors come in, the Kinect sensor can also recognize each visitor separately.
In this work, the Kinect V2 library is used to bring an image with calibrated color and depth values into the Unity environment. Unity reads the color of the pixel at the visitor's spine shoulder joint and outputs it as a color circle. The beam projector then outputs a video of the color circle, whose size changes in real time according to the visitor's viewing position.
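Reading the color at the joint position can be sketched as follows. This is an illustration under stated assumptions, not the work's Unity code: the image is modeled as rows of RGB tuples, the joint is already given in pixel coordinates, and the optional patch averaging (`patch_radius`) is an added idea for reducing the effect of a single noisy pixel that the source does not describe. The name `sample_joint_color` is hypothetical.

```python
def sample_joint_color(image, x, y, patch_radius=0):
    """Average the RGB color in a square patch centered on pixel (x, y).

    image: list of rows, each row a list of (r, g, b) tuples.
    patch_radius=0 reads exactly one pixel; larger values average a
    (2 * patch_radius + 1)-wide square, clipped to the image borders.
    """
    height = len(image)
    width = len(image[0])
    # Clamp the joint position so an off-screen joint still reads a valid pixel.
    cx = min(max(int(x), 0), width - 1)
    cy = min(max(int(y), 0), height - 1)

    totals = [0, 0, 0]
    count = 0
    for row in range(max(cy - patch_radius, 0), min(cy + patch_radius, height - 1) + 1):
        for col in range(max(cx - patch_radius, 0), min(cx + patch_radius, width - 1) + 1):
            r, g, b = image[row][col]
            totals[0] += r
            totals[1] += g
            totals[2] += b
            count += 1
    return tuple(t // count for t in totals)
```

With `patch_radius=0` the function returns the single pixel under the joint, which matches the behavior described above; a small patch trades a little color fidelity for stability.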

Kinect color-depth calibration
To implement the 'COLOR' work, it is necessary to extract the color pixel value that corresponds to the position of the skeleton. The Kinect camera is composed of an RGB color camera that acquires color images and an infrared (IR) camera that creates depth images. However, the color camera and the IR camera are physically separated from each other in the Kinect. Therefore, a calibration between the two cameras is needed to find the depth value that corresponds to a particular pixel in the color image.

Figure 3. Kinect Teardown [12]
Because the two cameras are mounted at different positions, the color and depth images of the Kinect do not share the same coordinates. The Kinect SDK function MapColorFrameToDepthSpace is therefore used to map the color image into the depth image space [13]. As shown in Figure 4, the 3D coordinate data generated from the depth map is merged with the RGB color information to create a colorized point cloud [14].
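The merging of depth and color into a colorized point cloud can be sketched with a generic pinhole camera model. The sketch does not call the Kinect SDK: it assumes the color image has already been registered to the depth image (for example by the mapping step described above), and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are placeholders that would come from calibration. The function names are hypothetical.

```python
import math


def focal_length_px(image_size_px, fov_degrees):
    """Approximate focal length in pixels from an image size and field of view."""
    return (image_size_px / 2.0) / math.tan(math.radians(fov_degrees) / 2.0)


def colorized_point_cloud(depth_m, registered_color, fx, fy, cx, cy):
    """Back-project a depth image and attach the registered color to each point.

    depth_m: rows of depth values in meters (0 or less means no measurement).
    registered_color: rows of (r, g, b) tuples with the same shape as depth_m.
    Returns a list of ((x, y, z), (r, g, b)) pairs in camera coordinates.
    """
    points = []
    for v, row in enumerate(depth_m):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels without a valid depth measurement
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((x, y, z), registered_color[v][u]))
    return points
```

For a depth image 512 pixels wide and a horizontal field of view of 70 degrees, `focal_length_px(512, 70)` gives roughly 366 pixels; this is only an approximation and does not replace calibrated intrinsics.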

System overview
As shown in Figure 6, the interactive content system proposed in this paper consists of a Kinect camera for human body recognition, a laptop for data processing, and a beam projector for real-time output of the interactive content images. All equipment is placed near the ceiling or walls, where it does not interfere with viewing the exhibition. This arrangement provides an environment in which visitors can immerse themselves in the exhibition. The Kinect can sense up to 8 m, but human joints can be tracked stably only within 4.5 m [16]. Therefore, to build a stable media art environment, the exhibition space is configured within 3 m in width and 4 m in height. When a visitor enters, a color circle is generated according to the color of the visitor's top. The color circle grows as the visitor approaches the projected image and shrinks as the visitor moves away. As a result, the audience naturally moves back and forth while experiencing the media art.
Figure 7 shows the interactive media art 'COLOR'. The Kinect tracks the spine shoulder joint as spectators enter. The color of the pixel at the spine shoulder coordinates is then detected, and a color circle is output to the screen. When the visitor is far from the Kinect, as in the photo on the left, the color circle becomes smaller. Conversely, as in the photo on the right, the closer the visitor is to the Kinect, the larger the color circle.
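The relation between distance and circle size can be sketched as a clamped linear mapping. The source states only that the circle grows as the visitor approaches and shrinks as the visitor moves away; the linear form, the function name `circle_radius`, and the pixel radii below are assumptions chosen for illustration. The 0.5 m and 4 m bounds follow the depth range and exhibition depth given in this paper.

```python
def circle_radius(distance_m, near_m=0.5, far_m=4.0,
                  max_radius_px=300.0, min_radius_px=50.0):
    """Map the visitor's distance from the sensor to a circle radius in pixels.

    The radius is largest at near_m, smallest at far_m, and changes
    linearly in between. Distances outside the range are clamped.
    The radius limits are hypothetical values, not those of the exhibition.
    """
    d = min(max(distance_m, near_m), far_m)
    t = (d - near_m) / (far_m - near_m)  # 0.0 at near_m, 1.0 at far_m
    return max_radius_px + t * (min_radius_px - max_radius_px)


if __name__ == "__main__":
    for d in (0.5, 2.25, 4.0):
        print(d, circle_radius(d))
```

A non-linear mapping, for example one inversely proportional to distance, would mimic perspective more closely; the linear version is simply easier to tune on site.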

Interaction data processing
The Kinect v2 has a field of view (FOV) of 70 degrees horizontally and 60 degrees vertically, and its depth acquisition range is 0.5 m to 8 m [17]. In the process proposed in Figure 8, the sensing range is set to 0 m to 4 m of the exhibition space after the Kinect camera calibration. If no visitor is within the sensing range, the video does not appear. If a visitor's body is detected within the range, the system tracks the spine shoulder joint and then detects the color of the pixel at the coordinates of that joint. Once the pixel color is detected, a color circle is output to the screen. The maximum distance of the exhibition space is set to 4 m because human joints can be tracked stably only within 4.5 m.
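The decision flow described for Figure 8 can be sketched as a per-frame function. The data layout is hypothetical: each tracked body is modeled as a dictionary with an `id` and an optional `spine_shoulder` entry holding the joint's distance in meters and its pixel position in the calibrated image. Treating a distance of 0 as "no measurement" is also an assumption of this sketch.

```python
def process_frame(bodies, color_image, max_range_m=4.0):
    """Return one color circle description per visitor inside the sensing range.

    bodies: list of dicts such as
        {"id": 1, "spine_shoulder": {"distance_m": 2.0, "pixel": (x, y)}}
    color_image: rows of (r, g, b) tuples registered to the joint pixels.
    """
    circles = []
    for body in bodies:
        joint = body.get("spine_shoulder")
        if joint is None:
            continue  # joint not tracked in this frame
        distance = joint["distance_m"]
        if not (0.0 < distance <= max_range_m):
            continue  # outside the sensing range: nothing is drawn
        x, y = joint["pixel"]
        if not (0 <= y < len(color_image) and 0 <= x < len(color_image[0])):
            continue  # joint projects outside the image
        circles.append({
            "id": body["id"],
            "color": color_image[y][x],
            "distance_m": distance,
        })
    return circles
```

An empty result corresponds to the case in which no video appears; each returned entry carries the color and distance needed to draw one circle.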

Conclusions
In this paper, technology and art were combined with a focus on 'COLOR', an interactive media art work. We described the symbolism of 'COLOR' and discussed its technical production and implementation. The color of the pixel on the visitor's top is detected in real time through the Kinect camera and output as a color circle. We also studied an interaction content system that measures the distance between the visitor and the Kinect and changes the size of the color circle according to that distance. The study also identified several points to improve in the interaction content system. In the next study, we will improve the accuracy of the point cloud that registers the color image to the depth image, and improve the accuracy of the color extracted from visitors. Because this work extracts color from the Kinect point cloud, well-controlled lighting is needed to extract and output the correct color. Technical analysis using Kinect sensors will serve as a basis for continued research on immersive content. In the future, connecting the technology of this research to other works that use color is expected to lead to better research results.