The workflow of human target detection based on thermal infrared images is shown in Figure 13. It mainly includes image acquisition, image preprocessing, image segmentation, region of interest (ROI) target selection, human target feature extraction and recognition, human target tracking, and advanced application analysis (such as action recognition, abnormal behavior recognition, and group analysis). In recent years, with the development of machine learning theory, especially deep learning, several of the independent steps boxed in the figure may be merged into a single algorithm.
1) Image acquisition
Using an infrared thermal imager (optionally combined with imaging devices such as CCD/CMOS cameras), a monocular, binocular, or multi-camera vision system can be built to acquire single-, dual-, or multi-channel thermal infrared images or video. The collected images and videos serve as the data source for the human target detection system.
2) Image preprocessing
Thermal infrared images are generally of poor quality, which shows up as strong spatial correlation, high noise, low contrast, blurred edges, and a lack of detail. Image preprocessing enhances edges and details, improves contrast, and suppresses background noise, improving both the usability of the image data and its visual quality.
Thermal infrared image preprocessing algorithms mainly include traditional algorithms such as unsharp masking, histogram processing, high dynamic range processing, and homomorphic filtering, as well as newer algorithms such as the Retinex algorithm, pseudocolorization, the wavelet transform, post-wavelet transforms, morphological filtering, and deep learning.
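As a rough illustration of two of the traditional algorithms listed above, the following Python/NumPy sketch applies unsharp masking and global histogram equalization to an 8-bit grayscale thermal frame. To keep it self-contained, a simple box blur stands in for the Gaussian low-pass filter usually used in unsharp masking; the function names and default parameters are illustrative assumptions, not taken from the source.

```python
import numpy as np


def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Mean filter over a k x k window (k odd), with edge-replicated borders."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def unsharp_mask(img: np.ndarray, amount: float = 1.0, k: int = 3) -> np.ndarray:
    """Add back `amount` times the high-frequency detail (image minus its blur)."""
    f = img.astype(np.float64)
    detail = f - box_blur(f, k)
    return np.clip(f + amount * detail, 0, 255).astype(np.uint8)


def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization of an 8-bit image via its cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]          # count at the darkest occupied gray level
    if cdf_min == img.size:            # single gray level: nothing to stretch
        return img.copy()
    lut = np.round((cdf - cdf_min) / (img.size - cdf_min) * 255.0)
    return np.clip(lut, 0, 255).astype(np.uint8)[img]
```

Equalization stretches a narrow band of occupied gray levels (typical of low-contrast thermal frames) over the full 0 to 255 range, while unsharp masking leaves flat regions unchanged and only amplifies local intensity differences such as edges.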
3) Image segmentation
Image segmentation is an important basis for image analysis and understanding. As shown in Figure 14, segmentation exploits the discontinuity between the features of the human body region and those of the background to separate the human target region from the background; the extracted region is called the ROI. Because of the characteristics of infrared imaging, the grayscale range of the human body in thermal infrared images is narrower than in visible-light images, the boundary between the ROI and the surrounding background is very blurred, and less information is available. Effective separation is therefore difficult, and accurate localization of the ROI is hard to achieve.
Representative thermal infrared image segmentation methods mainly include the threshold method, region growing, fuzzy logic methods, rough set theory methods, evolutionary algorithms, swarm intelligence methods, active contour models, and neural network methods. The choice of method depends heavily on the application scenario.
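The threshold method, the first in the list above, is the simplest to illustrate. The sketch below uses Otsu's method, one common rule for choosing the threshold automatically; the source does not name a specific rule, so this is a swapped-in example. It assumes a white-hot 8-bit image in which people are brighter than the background, and picks the gray level that maximizes the between-class variance of the histogram.

```python
import numpy as np


def otsu_threshold(img: np.ndarray) -> int:
    """Gray level t maximizing between-class variance; foreground is img > t."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    omega = np.cumsum(p)                    # probability of class {<= t}
    mu = np.cumsum(p * np.arange(256))      # first moment of class {<= t}
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0    # empty class: not a valid split
    return int(np.argmax(sigma_b))


def segment_hot_regions(img: np.ndarray) -> np.ndarray:
    """Binary foreground mask, assuming warm bodies appear bright (white-hot)."""
    return img > otsu_threshold(img)
```

A single global threshold like this works only when the body and background histograms are reasonably separated, which is exactly what the blurred, low-contrast boundaries described above often prevent.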
4) ROI target selection
Due to the poor quality of thermal infrared images and the complexity of human poses, the binarized foreground produced by image segmentation usually contains many regions that do not belong to a human body. In addition, a human region that is too small cannot provide enough target information for subsequent recognition. The blobs produced by segmentation can therefore be filtered according to certain criteria to reduce the computational load of subsequent processing.
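The filtering criteria are not specified above. As one hypothetical set, the sketch below labels 4-connected foreground blobs and keeps only those whose pixel area and height-to-width ratio fall within ranges plausible for an upright person. The thresholds `min_area`, `min_aspect`, and `max_aspect` are illustrative assumptions, not values from the source.

```python
from collections import deque

import numpy as np


def connected_components(mask: np.ndarray):
    """4-connected labeling by breadth-first search; returns one pixel list per blob."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    blobs = []
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                queue, pixels = deque([(y, x)]), []
                seen[y, x] = True
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                blobs.append(pixels)
    return blobs


def select_rois(mask, min_area=50, min_aspect=1.2, max_aspect=5.0):
    """Keep blobs that are large enough and taller than wide (hypothetical criteria).

    Returns bounding boxes as (top, left, height, width).
    """
    rois = []
    for pixels in connected_components(mask):
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        top, left = min(ys), min(xs)
        height, width = max(ys) - top + 1, max(xs) - left + 1
        aspect = height / width
        if len(pixels) >= min_area and min_aspect <= aspect <= max_aspect:
            rois.append((top, left, height, width))
    return rois
```

In practice a library routine (for example OpenCV's connected-component labeling with statistics) would replace the hand-written BFS; the filtering step on area and shape is the part that matters here.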
5) Human target feature extraction and recognition
The task of human target recognition is to distinguish human targets in an image from non-human objects. This task has two main difficulties. First, the recognition categories are poorly defined: human targets must be distinguished from other interfering targets in the image, and the commonality among different human targets must be summarized even though the features of human targets are neither very stable nor well clustered. Second, the target has a complex structure that is difficult to localize: human motion is highly flexible, and the joints of the human body cannot be accurately located on the two-dimensional projection plane. Both difficulties complicate the design of a recognition scheme.
At present, human target recognition methods can be classified into four categories: the human body model method, the template matching method, the motion detection method, and the statistical classification method.
The human body model method uses simple two-dimensional shapes or line drawings to approximate the contour, skeleton, or individual parts of the human body, and then matches detected objects against these models. Because it expresses body shape intuitively, this kind of method makes human posture and movement easy to analyze and understand.
The template matching method first obtains templates of the whole body or of body parts and organizes them into a template set. During recognition, each candidate target is matched against the template set using a distance measure. Template types include grayscale templates, probability templates, and so on.
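A minimal sketch of grayscale template matching, assuming candidates have already been resized to the template size. The distance measure here (mean squared difference after normalizing each patch to zero mean and unit variance) is one possible choice among many; for such normalized patches it equals 2(1 - r), where r is the Pearson correlation, so it ignores the overall brightness and contrast of a thermal target. The template names and the `max_distance` rejection threshold are hypothetical.

```python
import numpy as np


def normalize(patch: np.ndarray) -> np.ndarray:
    """Zero-mean, unit-variance version of a patch (flat patches are only centred)."""
    p = patch.astype(np.float64)
    std = p.std()
    return (p - p.mean()) / std if std > 0 else p - p.mean()


def match_templates(candidate, templates, max_distance=0.5):
    """Return (best template name, distance), or (None, distance) if nothing is close.

    Distance = mean squared difference of normalized patches = 2 * (1 - Pearson r).
    """
    c = normalize(candidate)
    best_name, best_d = None, float("inf")
    for name, tmpl in templates.items():
        d = float(np.mean((c - normalize(tmpl)) ** 2))
        if d < best_d:
            best_name, best_d = name, d
    if best_d > max_distance:
        return None, best_d          # reject: candidate resembles no template
    return best_name, best_d
```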
Motion detection methods try to detect human targets from motion information, usually by detecting the periodicity of human gait. Such methods require temporal information and therefore cannot handle stationary people or atypical gaits; they also require the legs or feet to be visible. Real-time performance suffers because the computation spans multiple frames.
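The periodicity test can be sketched as follows, under the assumption (not from the source) that each frame has already been reduced to one scalar, such as the width of the segmented silhouette. The function returns the first autocorrelation peak above a correlation threshold as the estimated gait period in frames, or `None` when the signal is flat, as it would be for a stationary person. The need for a window spanning several gait cycles is what costs these methods real-time performance.

```python
import numpy as np


def gait_period(signal, min_lag=4, max_lag=40, min_corr=0.5):
    """Estimate the period (in frames) of a per-frame gait signal, or None.

    Uses the normalized autocorrelation r(k) and returns the first lag that is a
    local maximum with r(k) >= min_corr.
    """
    x = np.asarray(signal, dtype=np.float64)
    x = x - x.mean()
    var = np.mean(x * x)
    if var < 1e-9:                       # flat signal: no periodic motion
        return None
    max_lag = min(max_lag, len(x) // 2)  # keep enough overlap for each lag
    lags = np.arange(min_lag, max_lag + 1)
    r = np.array([np.mean(x[:-k] * x[k:]) / var for k in lags])
    for i in range(1, len(r) - 1):
        if r[i] >= min_corr and r[i] >= r[i - 1] and r[i] >= r[i + 1]:
            return int(lags[i])
    return None
```

Returning the first peak rather than the global maximum avoids reporting a multiple of the true period, since lags of two or three cycles correlate almost as strongly as one.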
Statistical classification works within the framework of pattern recognition theory, and its workflow can be summarized as "target feature extraction + classifier decision". Target features include global features and local features. Global features describe the contour or region of the entire human target as a single object, so their effectiveness depends on obtaining the complete human target. Because of the poor quality of thermal infrared images and the complexity of human body shape, candidate targets that meet this requirement are often difficult to obtain. Local features are generally obtained by densely scanning the entire target region. They do not require an explicit target model and are robust to target occlusion, but they can be computationally expensive. For classification, the support vector machine (SVM), relevance vector machine (RVM), artificial neural network (ANN), and boosting are widely used. Among them, ANNs perform well for human detection, but their parameters are difficult to choose. The SVM relies on a kernel function transformation and generalizes well, but when an image contains a large number of candidate targets, its computational load can be heavy.
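The "feature extraction + classifier decision" pipeline can be sketched with two toy global features (bounding-box aspect ratio and fill ratio of the ROI mask) and a linear SVM trained with the Pegasos stochastic sub-gradient method. Both the features and the training routine are illustrative assumptions: a practical system would use richer global or local features, and possibly a kernel SVM rather than this linear one.

```python
import numpy as np


def shape_features(mask: np.ndarray) -> np.ndarray:
    """Toy global features of a binary ROI: aspect ratio, fill ratio, bias term."""
    ys, xs = np.nonzero(mask)
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    return np.array([h / w, len(ys) / (h * w), 1.0])


def train_linear_svm(X, y, lam=0.1, epochs=300, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    The bias is folded in as a constant feature, so it is regularized too
    (a simplification).
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos step size
            margin = y[i] * X[i].dot(w)        # evaluated with the current weights
            w *= 1.0 - eta * lam               # shrink: gradient of the regularizer
            if margin < 1.0:                   # hinge loss active: move toward y_i * x_i
                w += eta * y[i] * X[i]
    return w


def predict(w, X):
    """Label +1 (human) when the decision value is non-negative, else -1."""
    return np.where(X @ w >= 0.0, 1, -1)
```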
6) Human target tracking
Human target tracking continuously locates a recognized human target within the field of view, yielding time-domain information such as the speed, direction, and path of its movement. For human targets, traditional rigid-target tracking methods, such as the widely used optical flow method, lack the basic conditions required for their application. Tracking methods that assume a motion law, such as Kalman filtering, also struggle to work effectively. Likewise, the non-rigid tracking methods developed for visible-light images, which rely on color models and texture similarity, cannot be used [9].
Infrared human target tracking methods fall into two types: generative and discriminative. A generative method describes the appearance of the target with a model and, during tracking, searches the image for the region most similar to the model, which is taken as the tracking result. Representative methods include principal component analysis and sparse coding. Generative methods focus on building an effective appearance model of the target and do not exploit background information. A discriminative method separates the target from the background by training a classifier; typical methods include random forests and mean shift. Because discriminative methods make full use of background information, a large number of negative samples can be obtained for classifier training, which gives them a relative advantage in tracking performance.
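The generative approach can be sketched with the simplest possible appearance model, an intensity template. This is a stand-in chosen for brevity, not the PCA or sparse-coding models named above. In each frame, the sketch searches a window around the previous position for the patch with the smallest sum of squared differences to the template, then slowly blends the matched patch into the template. `search_radius` and the update rate `alpha` are hypothetical parameters.

```python
import numpy as np


def track_step(frame, template, prev_pos, search_radius=8, alpha=0.1):
    """One generative tracking step: best SSD match near prev_pos, then model update.

    prev_pos and the returned position are the (row, col) of the patch's top-left corner.
    """
    th, tw = template.shape
    py, px = prev_pos
    best_d, best_pos = np.inf, prev_pos
    for y in range(max(0, py - search_radius), min(frame.shape[0] - th, py + search_radius) + 1):
        for x in range(max(0, px - search_radius), min(frame.shape[1] - tw, px + search_radius) + 1):
            patch = frame[y:y + th, x:x + tw].astype(np.float64)
            d = np.sum((patch - template) ** 2)   # dissimilarity to the appearance model
            if d < best_d:
                best_d, best_pos = d, (y, x)
    y, x = best_pos
    # Running-average update lets the model follow gradual appearance changes.
    new_template = (1.0 - alpha) * template + alpha * frame[y:y + th, x:x + tw]
    return best_pos, new_template
```

Note that nothing here looks at the background, which is the limitation of generative methods the text describes; a discriminative tracker would instead train a classifier on target and background patches.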
7) Advanced application analysis
On the basis of recognizing and tracking human targets, advanced applications related to human target detection can be realized. For example, in intelligent surveillance applications, the identity of a human target can be verified through face recognition, gait analysis, and similar techniques; combined with prior knowledge of the scene, action recognition and trajectory analysis can be used to detect abnormal behavior of human targets.
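Trajectory analysis of this kind can be sketched as below. The sketch assumes tracked positions have already been mapped to ground-plane coordinates in metres, and it encodes scene prior knowledge as a speed limit and a rectangular restricted zone. Both rules and their thresholds are hypothetical examples of what a deployment might specify, not rules from the source.

```python
import math


def trajectory_stats(points, fps):
    """Per-step speed (m/s) and heading (degrees) from (x, y) positions in metres."""
    speeds, headings = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        speeds.append(math.hypot(dx, dy) * fps)
        headings.append(math.degrees(math.atan2(dy, dx)))
    return speeds, headings


def flag_abnormal(points, fps, max_speed=3.0, restricted=None):
    """Return (frame index, event) pairs for hypothetical scene rules.

    restricted: optional zone (xmin, ymin, xmax, ymax) the person should not enter.
    """
    speeds, _ = trajectory_stats(points, fps)
    events = []
    for i, v in enumerate(speeds, start=1):   # speed i is measured arriving at frame i
        if v > max_speed:
            events.append((i, "running"))
    if restricted is not None:
        xmin, ymin, xmax, ymax = restricted
        for i, (x, y) in enumerate(points):
            if xmin <= x <= xmax and ymin <= y <= ymax:
                events.append((i, "intrusion"))
    return events
```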