Real Distance Measurement Using Object Detection of Artificial Intelligence

Abstract: Artificial intelligence technology has developed rapidly in recent years. The purpose of this paper is to use it to measure the distance to an object. To measure the distance, two separate pictures of the object are taken from the same angle, and the size of the object is extracted from each picture. To do this in real time, object detection technology running on a mobile phone is used. A method for calculating the distance from the two pictures is presented, and it was implemented as a prototype on iOS. To evaluate the performance of the distance measurement, experiments were conducted in various environments. In the experiments, the measured distances showed some discrepancies from the actual distances. These discrepancies result from errors in the object detection process, in which the size of the object is calculated. Despite these discrepancies, the method may be widely used where accurate measurements are not strictly required, such as guidance systems for the visually impaired.


Introduction
Artificial Intelligence has been developing rapidly, and it is already used in diverse industries to facilitate real work, because it yields actual, tangible results. Deep Learning, an Artificial Intelligence technology that uses artificial neural networks, is leading the field. Much research has already been conducted on application software that applies Deep Learning. One of the most representative researches [...] errors, the convenience and real-time measurements may very well be applied in various fields of study. These may include guidance systems for the visually impaired.

Related studies
Artificial Intelligence has advanced rapidly in recent years, and research on how to apply it is being conducted every day. Machine Learning is one of the most representative Artificial Intelligence technologies. Just as humans grow more knowledgeable as they study new information, Machine Learning enables a computer to learn from data and accumulate knowledge so that it makes better-informed decisions. Numerous studies have been conducted in this area, including the development of Deep Learning. Deep Learning grew out of the development of Artificial Neural Networks, which are modelled loosely on the human brain.
CNN and RNN are among the best-known Deep Learning models [1,2]. CNN is used primarily to identify images. To a computer, even a minor change in the direction or location of an object turns an image into a completely different set of pixel values. CNN is capable of recognizing images regardless of these alterations. It repeatedly applies Convolution and Pooling to extract abstract information about the image. RNN is used primarily to recognize sequential information such as text and speech. In an RNN, a previously entered value affects the interpretation of the values entered afterwards. This matters because, for instance, when a sentence is analyzed, each word must be interpreted based on the words before and after it. Thus, RNN is applied in technologies such as voice recognition, machine translation, and image description.
Conventionally, CNN has been used for image classification [5]. That is, when a certain image is given, the CNN classifies whether the object in the image is a dog, a cat, and so on. Rather than producing a result directly from all the pixels in the image, a classifier works on certain key features extracted from the image. In conventional systems, it is up to the user to determine which features are used for classification. For instance, one may choose color, length, and shape as key features to differentiate a banana from an apple. CNN automates this process by learning the features itself.
Among the many CNN models, the object detection model is used most widely. This model detects objects within an image, as shown in figure 1. It can also identify and classify the detected objects. In figure 1, it has identified several people and a basket. The numbers in the figure indicate the confidence of each classification. Numerous libraries for object detection are available, including TensorFlow, YOLO, ImageAI, and Detectron2, a PyTorch-based modular object detection library [5,7]. TensorFlow is the most widely used [5].
Autonomous vehicles, multi-drone systems, and robots are all technologies experiencing a boom in light of recent developments in Artificial Intelligence [11,12]. All of them require the computer to recognize its surroundings. Among the technologies that enable this, one of the most pivotal is measuring the distance to surrounding objects. Currently, Radar and Lidar are most commonly used to achieve this. Radar calculates distances from reflected electromagnetic waves. It measures distances fairly accurately even at night, when there is little light. However, the further away objects are, the larger the errors in Radar measurements. Lidar determines the distance, shape, and material of its surroundings through the reflection of its beams [13][14][15][16]. Lidar is resistant to harsh weather conditions and also has a shorter wavelength than Radar. Depending on the application, the required accuracy and speed of measurement may differ greatly. This paper aims to analyze images from the camera, with the help of Artificial Intelligence technology, to measure the distance between objects and the camera.

Algorithm and implementation
Recently, there have been many attempts to use the camera as a means of measuring the size of an object or the distance between the camera and the object. This is because most mobile phones now come with a camera, so cameras have become widely accessible. This paper proposes a method for calculating the distance to an object using the camera of a mobile phone and reports several experiments conducted to verify its effectiveness. Artificial Intelligence is used to identify the objects in front of the camera so that their distances can be calculated.

Distance measurement algorithms
The distance between the camera and a given object can be calculated from the principles of how a camera works. If the actual size of the object is known, calculating the distance between that object and the camera is straightforward. Figure 2 is a simple explanation of how a camera works. The left side of figure 2 shows the actual object, and the right side shows how that object appears in the camera. Point O in figure 2 is where the camera's lens is. AB is where the actual object is located, and ab is the image of the object in the camera. In this arrangement, triangles ABO and abO are similar. Thus, the following equation can be inferred.
$$\frac{W}{D} = \frac{P}{F} \quad (1)$$
W is the actual size (height or width) of the object. D is the distance between the object and the lens. P is the size of the image of the object formed in the camera. F is the focal length of the camera. Since ABO and abO are similar, equation 1 holds. Equation 2 follows directly from equation 1.
$$D = \frac{F \times W}{P} \quad (2)$$
The value of F in equation 2 is usually a constant. If P and W are given, D can be calculated directly. This paper, under the assumption that F and P can be measured reliably, aims to measure the distance between the camera and the object.
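As a minimal sketch of the similar-triangles relation, D = F × W / P, the function below computes the distance when the actual size is known. The function name, parameter names, and example numbers are illustrative and are not taken from the prototype. The sketch assumes F and P are expressed in the same unit (pixels here), so D comes out in the unit of W.

```python
def distance_from_known_size(focal_length_px: float,
                             real_size: float,
                             image_size_px: float) -> float:
    """Pinhole-camera distance D = F * W / P.

    focal_length_px and image_size_px must share one unit (assumed: pixels);
    the result is in the unit of real_size.
    """
    if focal_length_px <= 0 or real_size <= 0 or image_size_px <= 0:
        raise ValueError("focal length, real size and image size must be positive")
    return focal_length_px * real_size / image_size_px


# Hypothetical numbers: a 0.5 m tall object that appears 250 px tall
# through a camera whose focal length is 1000 px.
print(distance_from_known_size(1000.0, 0.5, 250.0))  # 2.0
```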
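The problem with equation 2 is that, in order to measure distance in real time, the actual size of the object must always be known. This is not possible in a real-life environment. Since the actual size of the object is unknown, additional information about the object is required. This paper proposes the camera's movement as that information. That is, rather than a single picture of the object, two pictures taken from positions a known distance apart are used to calculate the distance between the camera and the object. Let d be the distance the camera moves toward the object between the two pictures, and let P' be the size of the image of the object in the second picture. The distance at the second position is D - d, so applying equation 1 there gives
$$F \times W = (D - d) \times P' \quad (5)$$
Equation 2 gives F × W = D × P. Using equations 2 and 5, the following equation can also be inferred.
$$D = \frac{d \times P'}{P' - P} \quad (6)$$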
Equation 6 does not require W, the actual size of the object. Thus, even without the value of W, the system can still calculate the distance between the object and the camera. This does, however, require two additional measurements, d and P'.
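The two-picture calculation can be sketched as follows. The sketch assumes the second picture is taken after moving the camera a distance d toward the object, so that the pinhole model gives D × P = (D - d) × P' and therefore D = d × P' / (P' - P), where D is the distance at the first position. The function name and the example values are hypothetical.

```python
def distance_from_two_shots(p_first: float, p_second: float, moved: float) -> float:
    """Distance at the first camera position, D = d * P' / (P' - P).

    p_first  -- image size P before moving (any unit, e.g. pixels)
    p_second -- image size P' after moving toward the object (same unit as P)
    moved    -- camera displacement d toward the object; D is in this unit
    """
    if moved <= 0:
        raise ValueError("the camera displacement d must be positive")
    if p_first <= 0 or p_second <= 0:
        raise ValueError("image sizes must be positive")
    if p_second <= p_first:
        # Moving closer must enlarge the image; otherwise the detections
        # are too noisy (or the object changed) and D is undefined.
        raise ValueError("P' must be larger than P after moving toward the object")
    return moved * p_second / (p_second - p_first)


# Hypothetical example: the object is 200 px tall, the camera moves
# 0.2 m forward, and the object then appears 250 px tall.
print(distance_from_two_shots(200.0, 250.0, 0.2))  # approximately 1.0 (metres)
```

Only the ratio of P' to P' - P enters the result, so the unit of the image sizes cancels out as long as both are measured in the same unit.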
It is fairly easy to measure the size of an object in an image. This can be done by measuring the object within the image and expressing the value in pixels. As can be seen from (6), the unit in which P and P' are measured is not important, as long as both use a single common unit such as pixels or inches. Measuring objects in images and videos was generally understood to be extremely difficult because of the various other objects that also appear in the image or video. This has changed, however, as recent developments in Artificial Intelligence have achieved great success in image classification. Image classification is the technology of recognizing objects within images and comparing them with pre-learnt data to classify what the objects are. This technology extracts object information such as size and name from images using pre-learnt data. This usually takes around 300 msec per image. Since no preprocessing is required to analyze the image, this process can be considered real time. Figure 1 is a screenshot of object detection. This paper aims to measure P and P' in figure 3 using such object detection technologies.
To measure D in equation 6, the value of d must be measured. This can be done simply by moving the camera forward or backward by the desired amount. For the purposes of this paper, d (the distance the camera moves) is set to 20 cm. That is, the object is identified at the current location to measure P. Then the camera is moved 20 cm forward, and the object is identified again at that location to measure P'.

Implementation
Measuring software was developed to evaluate the accuracy of the proposed method. An iPhone 8 Plus running iOS 13.1 was used as the platform, and the software was written in Swift using Xcode. Google's TensorFlow Lite was used for the object detection model. In this experiment, it is important to select the right object. As can be seen in figure 1, object detection software tries to identify as many objects as possible at any given moment. This test aims to measure the distance between the camera and the object closest to it.
The general algorithm of the developed software is shown in figure 4. It starts by taking a photo and then identifies objects through the object detection module. This module identifies all objects within the picture and calculates and displays the rectangle surrounding each object. The analysis module then attempts to pick out the target object. In this paper, objects in the front are selected as target objects. If the selection of the target object is unclear, the software requests another photo or another object detection pass. Once a target is selected, the software requests that the camera be moved by the amount 'd' from (6). After the camera has moved by 'd', another picture is captured and the size of the target object is measured again. Figure 5 is a screenshot of the software in operation. As can be seen in the figure, the large rectangle on the left is tagged "potted plant", with an indication that the software is 63% sure that the object is in fact a potted plant. The object has a height of 425 and a width of 217, both in pixels. There are various methods for selecting a target object, but none of them is easy. The output of the object detection module currently in use fluctuates considerably, so the target object can change at any moment even when the camera is standing still.
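One possible way to pick the target object can be sketched as follows. The text does not state how "objects in the front" are determined, so this sketch assumes a simple heuristic: among detections above a confidence threshold, the one with the largest bounding box is treated as the closest object. The `Detection` type, the `select_target` function, and the 0.5 threshold are hypothetical and are not part of the prototype or of TensorFlow Lite.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Detection:
    """One detected object: label, confidence in [0, 1], box size in pixels."""
    label: str
    score: float
    width_px: float
    height_px: float


def select_target(detections: Sequence[Detection],
                  min_score: float = 0.5) -> Optional[Detection]:
    """Return the confident detection with the largest box area, or None.

    Assumption: the largest box belongs to the object in front. None means
    the selection is unclear and another photo should be requested.
    """
    candidates = [d for d in detections if d.score >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.width_px * d.height_px)


detections = [
    Detection("potted plant", 0.63, 217, 425),  # values reported for figure 5
    Detection("chair", 0.40, 300, 500),         # hypothetical, below threshold
    Detection("vase", 0.81, 60, 90),            # hypothetical, small box
]
target = select_target(detections)
print(target.label if target else "no target")  # potted plant
```

Because detections fluctuate from frame to frame, a practical version would probably also check that the selected label stays the same across the two pictures before P and P' are compared.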

Performance measurements
To evaluate performance, photos of various real-life settings were taken with the developed app while it measured P and P'. The application takes a photo every 300 msec and detects the objects within it. Because detection runs this frequently, even the smallest change between images can produce noticeably different detection results. Thus, errors are more likely if object detection is conducted only once. Table 1 shows the measured distances for each value of d at each actual distance.
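Since a single detection is noisy, one way to stabilise the measured size is to combine several consecutive detections before using it. The sketch below averages the most recent readings. The function name, the default of 10 readings, and the sample values are illustrative assumptions, not measured data.

```python
from statistics import mean
from typing import Sequence


def averaged_size(readings_px: Sequence[float], required: int = 10) -> float:
    """Mean of the most recent `required` size readings, in pixels.

    Raises ValueError when fewer than `required` readings are available,
    so a caller can keep capturing frames instead of trusting one detection.
    """
    if required <= 0:
        raise ValueError("required must be positive")
    if len(readings_px) < required:
        raise ValueError(f"need at least {required} readings, got {len(readings_px)}")
    return mean(readings_px[-required:])


# Hypothetical heights (px) of the same object in ten consecutive frames.
heights = [425, 431, 419, 428, 422, 430, 424, 427, 421, 423]
print(averaged_size(heights))  # 425
```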
The d values were fixed at 10 cm, 20 cm, and 30 cm. Measurements were taken at actual distances of 1 m, 1.5 m, and 2 m. To measure the size in pixels of the object displayed on the screen, the procedure in figure 4 was repeated 10 times and the results were averaged.
Table 1. Different measurement sizes for object at the same distance
As shown in the table, the measured distances are not accurate. Measurements at short distances are relatively accurate, but measurements at long distances are inaccurate. This is because object detection is more accurate at short distances. When the value of d is larger, the measured distance is more accurate. This is considered to be because the relative error decreases when the value of P'-P in (6) is larger. These experiments show that distance measurements using object detection modules are not yet accurate. Compared with the conventional methods of Radar and Lidar, they are significantly less accurate. One of the main reasons is that object detection technology cannot yet measure the size of detected objects accurately. When the values of P and P' in (6) are inaccurate, the measured distance is bound to be inaccurate as well. However, Artificial Intelligence is developing rapidly. Object detection technology still holds great potential, and more accurate measurements using it are expected to become possible in the near future.
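The effect of P'-P on the error can be illustrated with a small simulation. This is a sketch under the pinhole model with hypothetical values, not a reproduction of the measurements in table 1. It assumes the camera moves d toward the object, that D = d × P' / (P' - P), and that the detector overestimates P' by a fixed 2 pixels. The product F × W = 300 (for example, F = 1000 px and W = 0.3 m) is also an assumption.

```python
def two_shot_distance(p_first: float, p_second: float, moved: float) -> float:
    """D = d * P' / (P' - P), assuming the camera moved d toward the object."""
    return moved * p_second / (p_second - p_first)


def relative_error(true_distance: float, moved: float,
                   pixel_error: float = 2.0,
                   focal_times_size: float = 300.0) -> float:
    """|D_est - D| / D when P' is overestimated by pixel_error pixels.

    focal_times_size is the assumed product F * W (pixels * metres), so the
    ideal image sizes are P = F*W / D and P' = F*W / (D - d).
    """
    p_first = focal_times_size / true_distance
    p_second = focal_times_size / (true_distance - moved) + pixel_error
    estimate = two_shot_distance(p_first, p_second, moved)
    return abs(estimate - true_distance) / true_distance


# Same distances and displacements as in the experiments (in metres).
for true_distance in (1.0, 1.5, 2.0):
    for moved in (0.1, 0.2, 0.3):
        err = relative_error(true_distance, moved)
        print(f"D = {true_distance} m, d = {moved} m: relative error {err:.1%}")
```

Under these assumptions the same pixel error costs more at larger D, where P' - P is small, and less at larger d, where P' - P is large. This matches the trend described above.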
Despite this inaccuracy, the results of this paper may be used in various areas. Because the method detects objects in real time, it can be used to warn people of objects approaching them and as a guidance system for the visually impaired. As mobile smart devices become increasingly accessible, most people, including the visually impaired, will carry at least one smart device at all times. The proposed system can be of great help if it runs on such a device and continuously monitors obstacles, alerting the user to each obstacle and its distance. This paper is aimed partially at developing such assistance systems for the visually impaired. Accuracy is not the primary focus in such cases. All the user needs is a rough measurement of distance that allows him or her to move away from obstacles in the way.

Conclusion
This paper proposes a new method of distance measurement using cameras and Artificial Intelligence technology. Calculating the distance from a camera to an object is straightforward if the actual size of the object and the size of its image in the camera are given. Based on this, we have developed a system in which photos of an object are taken from two separate locations and object detection, a form of Artificial Intelligence technology, is used to measure the distance between the object and the camera.
Several equations were derived mathematically as the basis of the proposed method. An iPhone 8 Plus was used so that the process ran in real time, with Google's TensorFlow Lite object detection model. Several experiments were conducted in various environments. The empirical data showed that distance measurements using object detection modules are not yet accurate. One of the main reasons is that object detection technology cannot yet measure the size of detected objects accurately. This issue is expected to be resolved relatively easily as Artificial Intelligence advances in the near future.
The proposed method and the developed system can be used to aid the visually impaired. If the system is incorporated in guidance systems for the visually impaired, it could effectively alert its user to obstacles in the way. The system may not provide an accurate measurement of distance, but the information it provides should be enough for the user to avoid colliding with obstacles. The results of this paper will be used as one component in developing guidance systems for the visually impaired.

Acknowledgements
This research was financially supported by Hansung University.