DETECTION AND TRACKING OF PEDESTRIAN CROSSWALKS USING RECURRENT NEURAL NETWORK CLASSIFIER AND HISTOGRAMS OF ORIENTED GRADIENTS IN THE INFRARED IMAGE SEQUENCE

Detection of pedestrian crosswalks has recently been identified as an important problem in image processing and statistical identification. Simultaneous detection and tracking of pedestrians is important, but it faces challenges such as long computation time and uncertainty in determining a person's position. Previously proposed automated methods often have low accuracy and do not reliably reach an optimal response; they also lack generality and suffer from overfitting. In this paper, an algorithm is proposed for automatically detecting and tracking pedestrian positions in infrared images, based on an efficient approach that combines recurrent neural networks with histograms of oriented gradients, and which is more accurate and faster than similar methods. In the first step, the best features are selected by the histogram of oriented gradients algorithm; detection and tracking are then performed by the recurrent neural network. K-fold validation with K = 10 is used to divide the data into training and test sets. The proposed algorithm achieves acceptable performance, with an error of less than 5% in detecting and tracking pedestrians.


INTRODUCTION
Infrared light is a radiating, non-visible part of the electromagnetic spectrum, consisting of electromagnetic radiation with wavelengths longer than those of visible light [1,2]. This part of the spectrum starts at the red edge of the visible spectrum and extends from a few hundred nm (430 THz) to 1 mm (300 GHz) [3]. Some authors, however, take its maximum spectral width to be 1000 nm [4,5]. Most of the thermal radiation emitted by objects at room temperature is infrared [6,7], and the largest source of emission in this part of the spectrum is the sun.
Infrared waves are employed in various applications, including infrared detectors, telescopes, night vision devices, airplanes, food heating, and driving [8]. One of their main applications is imaging, in which objects at lower temperatures appear pale; this property can be used to track moving humans, whose body temperature is above the ambient temperature [9]. Previous research has used this property to track pedestrians [10]. However, several problems in this area make human detection difficult. In general, in infrared images the human form is lighter and brighter than fixed objects in the environment such as trees and streets. Even so, detecting people in these images is a complicated and difficult process, and people cannot be detected and separated simply. For example, objects such as transformers, animals, electric boxes, and cars complicate human tracking, especially in the summer. Another problem is that if the tracking data must be transmitted elsewhere, the large amount of data makes this inefficient. In addition, if the objects or targets are not large in the video frames, the tracking process becomes much more complicated. Another major problem is the inability to track multiple targets or humans in an infrared image sequence, which has received less attention in the literature. For this reason, an efficient and robust pedestrian tracking algorithm for thermal and infrared image sequences is important. Considering the importance of images in human video surveillance, one or more of the following main aims may be pursued [11]: protection and safety; control, management, and surveillance; and training and research. Today, human video surveillance systems are used to protect and monitor important, high-risk, and crowded places.
Museums, stores, prisons, airports, metro and railway stations [12,13], and even hospitals and schools [14,15] are other places equipped with these systems to track people. With such a system in place, specific people can be detected and tracked more easily [16].

PREVIOUS METHODS
Research in this field has pursued various goals. The methods for detecting pedestrians can be categorized according to the tracking technique they use:
1. Kalman filter
2. Time graph
3. Particle filter
4. Gaussian mixture model [4]
5. Maximal mathematical algorithm [18]
Elguebaly et al. (2013) [4] studied the use of asymmetric Gaussian mixture models to detect targets in infrared images.
In 2014, Teutsch et al. [5] performed pedestrian detection in infrared images using hot spot classification. The authors in [6] tracked pedestrians through fusion of infrared and visible-spectrum images. In 2015, Soundrapandiyan et al. [7] tracked targets by adaptive thresholding and background subtraction on sample infrared images. Also in 2015, Rajkumar et al. [8] suggested the use of local thresholding for pedestrian detection. In the same year, Berg et al. [9] proposed tracking objects at different temperatures, with temperature taken as the basic factor. Recently, extensive research has been carried out in this field, and most researchers have been looking for new methods with suitable accuracy and performance [17][18][19][20][21][22][23].

PROPOSED METHOD
The proposed method combines feature extraction by histograms of oriented gradients with a recurrent neural network for classification, as shown in Fig. 1. Tracking is based on the Kalman filter, which is described in the following sections.

Kalman Filter
The Kalman filter predicts a new state of the system, compares the predicted value with the measured value, weights the difference between the two, and produces a new estimate. The estimate at time $t_i$ is given by equation (1). In this equation, $A(t_i)$ is called the system matrix, and the gain matrix is obtained from the covariance matrix of the error. The Kalman filter algorithm is recursive: each estimate incorporates all past information about the system without storing all the measured values. The intensity of each pixel is treated as the state of the system and is estimated under the assumption that the pixel belongs to the background. A threshold is then applied based on these equations: if a pixel in the image follows the equations, it is part of the background; otherwise it is part of the foreground.
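As an illustration of the recursive predict, compare, and correct cycle described above, the following is a minimal Python sketch of per-pixel background estimation. It assumes a scalar state per pixel with an identity system matrix; the function name `kalman_background_step`, the noise variances, and the threshold are illustrative assumptions and are not values taken from this paper.

```python
import numpy as np

def kalman_background_step(bg_est, var_est, frame,
                           process_var=1.0, meas_var=25.0, thresh=30.0):
    """One recursive per-pixel update of a scalar Kalman background model.

    Assumptions (not from the paper): the system matrix is the identity,
    every pixel is an independent scalar state, and the noise variances
    and threshold are illustrative values.
    """
    # Predict: with an identity system matrix the predicted intensity is
    # the previous estimate; the error variance grows by the process noise.
    pred = bg_est
    pred_var = var_est + process_var

    # Compare the prediction with the measured frame.
    innovation = frame - pred

    # Pixels that deviate strongly from the prediction are foreground.
    foreground = np.abs(innovation) > thresh

    # Gain derived from the error variance; foreground pixels get zero gain
    # so that moving objects are not absorbed into the background.
    gain = pred_var / (pred_var + meas_var)
    gain = np.where(foreground, 0.0, gain)

    # Correct: weight the difference between measurement and prediction.
    new_bg = pred + gain * innovation
    new_var = (1.0 - gain) * pred_var
    return new_bg, new_var, foreground
```

Calling the function once per frame, and feeding the returned estimate and variance back in, keeps the whole history in two arrays, which is the recursive property noted above.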

Feature extraction
The main idea of the histograms of oriented gradients method is that the distribution of local gradients or edge orientations can describe an image well, even without information about the exact position of the gradients or the corresponding edges. The feature captures the orientation of the image in local neighbourhoods, each of which is called a cell. Depending on whether the gradient is unsigned or signed, the range 0-180 degrees or 0-360 degrees is divided into n equal parts, where n is the number of gradient orientations (histogram intervals), and each interval is a histogram channel. In each cell, the histogram of oriented gradients is calculated over the pixels within the cell. Then, for robustness to intensity changes, the cell histograms are grouped into blocks and normalized. The normalized histograms form the histogram of oriented gradients descriptor. Blocks can be chosen to be overlapping or non-overlapping, and they are usually chosen to overlap. Overlapping increases the length of the feature vector, but it also improves accuracy. For colour images, the gradient is calculated for each colour channel individually, and at each pixel the gradient with the largest magnitude is taken as the gradient vector.
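The cell, channel, and block steps described above can be sketched as follows. This is a simplified illustration for a single-channel image, assuming hard assignment of each pixel to one orientation channel and L2 block normalization; the function names, the 8-pixel cell, the 9 channels, and the 2x2 block are assumptions, not settings reported in this paper.

```python
import numpy as np

def hog_cell_histograms(image, cell_size=8, n_bins=9, signed=False):
    """Per-cell histograms of gradient orientation, weighted by magnitude."""
    img = image.astype(np.float64)
    gy, gx = np.gradient(img)          # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    span = 360.0 if signed else 180.0  # signed or unsigned orientation range
    angle = np.degrees(np.arctan2(gy, gx)) % span
    # Index of the histogram channel for every pixel (clamped for safety).
    bins = np.minimum((angle / (span / n_bins)).astype(int), n_bins - 1)

    n_rows = img.shape[0] // cell_size
    n_cols = img.shape[1] // cell_size
    hist = np.zeros((n_rows, n_cols, n_bins))
    for r in range(n_rows):
        for c in range(n_cols):
            rs, cs = r * cell_size, c * cell_size
            b = bins[rs:rs + cell_size, cs:cs + cell_size].ravel()
            m = magnitude[rs:rs + cell_size, cs:cs + cell_size].ravel()
            hist[r, c] = np.bincount(b, weights=m, minlength=n_bins)
    return hist

def hog_descriptor(image, cell_size=8, n_bins=9, block_size=2, eps=1e-6):
    """Concatenate L2-normalized histograms of overlapping blocks of cells."""
    hist = hog_cell_histograms(image, cell_size, n_bins)
    n_rows, n_cols, _ = hist.shape
    blocks = []
    # A stride of one cell makes neighbouring blocks overlap.
    for r in range(n_rows - block_size + 1):
        for c in range(n_cols - block_size + 1):
            v = hist[r:r + block_size, c:c + block_size].ravel()
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(blocks)
```

For a 32x32 image with these assumed settings there are 4x4 cells and 3x3 overlapping blocks, so the descriptor has 3 * 3 * 2 * 2 * 9 = 324 entries; non-overlapping blocks would give a shorter vector, which is the trade-off mentioned above.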

Recurrent neural network
The best dynamic model that can adapt to new types of data is the neural network model. The reason for this choice is that efficient models can be obtained by adjusting the weights. Notably, apart from the number of layers and the number of neurons in each layer, which depend on the problem, no exact parameter settings can be assumed to be effective in advance. To this end, we use a repetition process as one of the methods for obtaining the neural network classifier model with the highest accuracy. The proposed model addresses the following:
• Solving the overfitting problem: an overfitted model is built only from the training patterns, so if other data, even slightly different from the training set, are applied to it, the model cannot respond correctly and classifies them with a large error.
• Solving the underfitting problem: an underfitted algorithm obtains only a general model of the training set and cannot decide correctly even on the data from which the decision model was built. The structure of neural networks usually avoids underfitting.
• Possibility of saving the best model: because testing is needed after the best model is found, we save the model so that it can make decisions on new data.
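The repetition process and the saving of the best model can be sketched as below. This is a minimal illustration, assuming K-fold splitting with K = 10 as stated in the abstract and selection by held-out accuracy; the classifier itself is left abstract behind the hypothetical callables `train_fn` and `score_fn`, and the helper names and the use of `pickle` are assumptions rather than the authors' implementation.

```python
import pickle
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Split shuffled sample indices into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def select_best_model(X, y, train_fn, score_fn, k=10, seed=0):
    """Train once per fold and keep the model with the best held-out score.

    train_fn(X_train, y_train) -> model and
    score_fn(model, X_test, y_test) -> float are hypothetical callables
    supplied by the caller (for example, a recurrent network trainer).
    """
    folds = k_fold_indices(len(X), k, seed)
    best_model, best_score = None, -np.inf
    for i, test_idx in enumerate(folds):
        # All folds except the i-th form the training set.
        train_idx = np.concatenate(
            [f for j, f in enumerate(folds) if j != i])
        model = train_fn(X[train_idx], y[train_idx])
        # Scoring on unseen data exposes both over- and underfitting.
        score = score_fn(model, X[test_idx], y[test_idx])
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score

def save_model(model, path):
    """Persist the selected model so it can later decide on new data."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
```

Because each candidate is scored only on data it was not trained on, a model that merely memorizes the training patterns receives a low score and is not the one that gets saved.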

EXPERIMENTAL RESULTS
The simulation results cover the tracking step and the classification step, which are discussed separately. In the tracking step, the Kalman filter is used; in the second step, detection is performed using histograms of oriented gradients and the recurrent neural network.

Tracking step
In the tracking step, performed by the Kalman algorithm, the pedestrian location is found, and for each template mapped onto the person's body, the desired features are extracted. Figure 2 shows an example of the primary tracking, and Figure 3 shows the tracking result beside the color frames. All videos were randomly converted into 20 clips in AVI format with a resolution of 320x340 pixels at 15 frames per second. The videos that include pedestrians and thermal objects are in 10 clips, and the videos of objects similar to pedestrians were placed in the 10 other clips. Together, these clips run for about 30 minutes. Approximately 26438 frames were selected; videos filmed in the same environment (pedestrians or objects similar to the human form) were processed together. For example, Clip 1 and Clip 10 were filmed in the same environment; Clip 1 included pedestrians, and the Clip 2 frames contained objects similar to the human form. Table 1 shows the results. In some responses the cost function is zero, indicating that the proposed neural network performs well as a classifier, together with extracting the optimal features and finding the best position of the moving individuals in the thermal images.