Autonomous vehicle surveillance through fuzzy C-means segmentation and DeepSORT on aerial images

PeerJ Computer Science

Introduction

The number of vehicles on the road rises exponentially with rapid economic and population growth. Road traffic monitoring is essential for analyzing traffic data and optimizing roadway operations, making it a key component of intelligent transportation systems (Dikbayir & Bulbul, 2020). It helps identify congestion hotspots, track vehicles, and analyze parking (Weng, Kuo & Tu, 2006; Wu & Yang, 2007; Luo et al., 2021; Puertas et al., 2022).

Recently, uncrewed aerial vehicles (UAVs) have gained popularity for capturing visual data using mounted cameras and computer vision-based object detection (Najiya & Archana, 2018). Aerial imaging applications include road inspection, crowd management, agriculture monitoring, and land use analysis (Schreuder et al., 2003; Ke et al., 2017; Bozcan & Kayacan, 2020; Omar et al., 2021). The portability, affordability, and adaptability of UAVs make them effective for traffic data collection and emergency response (Ringwald et al., 2019; De Moraes & De Freitas, 2020).

Aerial images cover vast areas, including complex backgrounds like trees, highways, and buildings, making object detection challenging. However, advancements in computer vision and deep learning enable efficient detection even in such conditions (Ahmed, Jalal & Rafique, 2019). This article focuses on vehicle detection and tracking for intelligent traffic monitoring using aerial images. Our proposed model segments images to reduce background complexity, detects vehicles, and tracks them across multiple frames.

Segmentation is crucial in traffic monitoring via aerial imagery, as it determines how well vehicles are distinguished from complex backgrounds. We used two techniques: random forest (RF) segmentation and Fuzzy C-means (FCM) clustering, each chosen for its strengths. RF, a supervised method, classifies pixels based on predefined features, excelling in structured environments but requiring training data and high computation. FCM, an unsupervised approach, adapts to background variations, enhancing generalizability. Compared to methods like Otsu’s thresholding, K-means, or graph-based segmentation, RF and FCM offer greater adaptability and accuracy in varying lighting and occlusions. Unlike fixed-threshold methods, FCM allows smooth transitions between object and background, while RF ensures context-aware foreground extraction, outperforming purely statistical techniques like K-means.

The RGB images from video sequences are first extracted and preprocessed. Then, segmentation is performed using the FCM technique (Hashemi, Gholian-Jouybari & Hajiaghaei-Keshteli, 2023). The segmented images are fed into the vehicle detection module, where YOLOv4 (Yusuf, Hanzla & Jalal, 2024) detects small objects. To track multiple vehicles across frames, each detected vehicle is assigned an ID using Speeded-Up Robust Features (SURF) (Du, Su & Cai, 2009). A vehicle count is maintained across frames to estimate traffic density. Tracking is achieved via the DeepSORT Kalman filter (Kejriwal et al., 2022). The proposed traffic monitoring system is validated using the UAVDT (Yu et al., 2020) and KIT-AIS (Beheim, 2021) datasets, demonstrating superior detection and tracking precision compared to state-of-the-art (SOTA) methods. The key contributions of this work include:

  • A hybrid model for the detection and tracking of vehicles on roadways for effective monitoring of transportation networks in aerial images having different complexity levels is presented.

  • A comparison of the supervised random forest classifier (Aroef, Rivan & Rustam, 2020) and an unsupervised segmentation technique, the Fuzzy C-means algorithm, is presented, and the better-performing method is used for segmentation.

  • We have greatly improved the precision, recall, F1 score, and quality performance measures for vehicle detection and tracking when compared to previous methods.

  • Vehicle tracking using the DeepSORT algorithm and an ID assignment and recovery module based on SURF have been implemented.

The proposed system’s accuracy is validated on two public datasets. The article is structured as follows: “Related Work” covers related work, “Materials and Methods” details the methodology and system architecture, “Datasets Description” describes the datasets, “Results” presents the experiments, results, and discussion, and “Conclusion and Future Work” concludes with future directions.

Related work

Many researchers have used machine learning for intelligent traffic monitoring, while others focused on deep learning with aerial images. For detection, various hand-crafted features were extracted, including SIFT (Battiato et al., 2007), Histogram of Oriented Gradient (HOG) (Kong et al., 2019; López-Sastre et al., 2019; Jalal, Khalid & Kim, 2020; Rizwan et al., 2023), and Haar-like features (Tang et al., 2017; Shahzad & Jalal, 2021). These feature vectors trained different classifiers but suffered from the curse of dimensionality, making them computationally expensive. Recently, deep learning models have excelled in object detection, especially in complex scenarios. To summarize current techniques, related work is categorized into machine learning- and deep learning-based traffic analysis.

Machine learning-based traffic scene analysis

Machine learning has long been used in computer vision, particularly for traffic monitoring. Rafique et al. (2023) developed a vehicle detection model using Haar-like features and an AdaBoost classifier. Tang et al. (2017) performed detection and classification via background subtraction and SIFT extraction, training neural networks and support vector machines (SVMs). Jabri et al. (2018) used Haar features and local binary patterns (LBP) with an AdaBoost classifier, which, when combined, improved results but increased energy consumption (Akhter, Jalal & Kim, 2021). Charouh, Ghogho & Guennoun (2019) proposed a moving vehicle detection method using background subtraction and morphological corrections. Yu et al. (2003) classified and tracked vehicles via image differencing and a Kalman filter. Mu, Hui & Zhao (2016) identified moving vehicles by selecting high sum of absolute differences (SAD) regions and applying SIFT for matching. Chen, Ellis & Velastin (2012) introduced an urban vehicle detection method using SVM and HOG features. However, these approaches are computationally intensive and struggle with complex scenes, affecting model generalizability.

Deep learning-based traffic scene analysis

Traditionally, traffic monitoring relies on manual methods and in-vehicle technologies. However, deep learning-based image processing has outperformed conventional approaches. Lin & Jhang (2022) proposed a car detection method using the YOLOv4 algorithm. Similarly, Al-qaness et al. (2021) developed a model that processes traffic video sequences with a CNN before applying YOLO for detection, though missing vehicles remain a challenge. Muchtar, Afdhal & Nasaruddin (2020) introduced a YOLOv3-based vehicle detection and classification approach, segmenting images with MOG2 before detection. Ammour et al. (2017) presented a vehicle detection and counting method combining a linear SVM classifier with a pre-trained CNN on high-resolution UAV imagery, but its high computational time limits real-time application.

This research aims to enhance modern computer vision technologies. Our system efficiently detects, tracks, and monitors traffic for intelligent, environmentally aware surveillance. By integrating multiple deep learning methodologies, it surpasses existing car detection and autonomous traffic surveillance systems in performance and accuracy.

Innovation over existing approaches

Our model advances vehicle detection and tracking in accuracy, scalability, and applicability. Unlike handcrafted feature-based methods like HOG-SVM and SIFT, which struggle with occlusions and complex backgrounds, we use YOLOv4, optimized for small object detection. Integrating FCM segmentation before detection reduces background interference, improving precision, recall, and F1-score in aerial imagery.

Many existing methods rely on computationally expensive region-based detectors like Faster region-based convolutional neural network (R-CNN), unsuitable for real-time UAV traffic monitoring. Our pipeline, using YOLOv4 and DeepSORT, is optimized for large-scale, real-time processing, ensuring efficient tracking in dynamic conditions. Unlike ground-based CCTV models, our UAV-specific approach integrates DeepSORT with ID assignment and SURF-based recovery for robust multi-frame tracking. Additionally, our vehicle trajectory estimation module enables a comprehensive analysis of traffic flow patterns, often missing in conventional detection models.

Materials and Methods

This article proposes a multi-stage vehicle recognition and tracking system for aerial images using semantic segmentation. After extracting and preprocessing frames, vehicles are segmented from the background using random forest (RF) and FCM clustering, improving detection by YOLOv4. RF, a supervised method, relies on labeled data, while FCM adapts to background variations without prior training. YOLOv4’s CSPDarknet53 backbone and PANet neck enhance feature extraction for reliable small-vehicle detection. Tracking is achieved using SURF extraction, a burst particle filter for position prediction, and DeepSORT with Kalman filtering for identity consistency in dense traffic. Finally, trajectory estimation maps vehicle movements for traffic analysis. Figure 1 illustrates the proposed system, with subsequent sections detailing each module.


Figure 1: The architecture of the proposed autonomous vehicle surveillance algorithm.

Preprocessing

In the preprocessing step, we extract a series of images from the video to achieve better outcomes for vehicle detection and tracking. Afterwards, each extracted frame is resized to 768 × 768 dimensions. To denoise the image, a nonlinear adaptation known as gamma correction (Xu et al., 2009) is applied to every pixel value using

$O = I^{1/G}$   (1)

where $I$ is the input image, $G$ represents the gamma value, which can be obtained using Eq. (2), and $O$ denotes the output image scaled back to [0, 255].

$G = \log_{S}(0.5 \times 255)$   (2)

where $S$ is the mean of the image after converting it to grayscale. Preprocessing plays a crucial role in enhancing image quality before segmentation, detection, and tracking. In our method, we remove Gaussian noise and motion blur common in UAV-captured imagery using median filtering to preserve edges while eliminating noise artifacts. Additionally, gamma correction is applied to adjust brightness variations caused by aerial lighting conditions. The gamma value (G) is selected adaptively using the formula G = mean(Igray)/128, where Igray is the grayscale-converted image. This ensures that contrast enhancement is proportional to the image’s overall brightness, preventing overexposure or underexposure.
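A minimal sketch of this preprocessing stage is given below, assuming OpenCV and NumPy; the 768 × 768 target size and the adaptive rule G = mean(Igray)/128 follow the description above, while the video file name and the median kernel size are illustrative.

```python
import cv2
import numpy as np

def preprocess_frame(frame_bgr):
    """Resize, denoise, and gamma-correct one frame extracted from a UAV video."""
    # Resize every extracted frame to 768 x 768.
    frame = cv2.resize(frame_bgr, (768, 768))

    # Median filtering suppresses noise while preserving edges.
    frame = cv2.medianBlur(frame, 3)

    # Adaptive gamma: G = mean(I_gray) / 128, so brighter frames receive a larger gamma.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gamma = max(gray.mean() / 128.0, 1e-3)

    # O = I^(1/G), applied on the [0, 1] scale and rescaled back to [0, 255].
    normalized = frame.astype(np.float32) / 255.0
    corrected = np.power(normalized, 1.0 / gamma)
    return (corrected * 255.0).astype(np.uint8)

# Example usage with frames pulled from a video (file name is a placeholder):
# cap = cv2.VideoCapture("uav_traffic.mp4")
# ok, frame = cap.read()
# if ok:
#     processed = preprocess_frame(frame)
```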

Semantic segmentation

Segmentation is a crucial preprocessing step in aerial image analysis, isolating vehicles from complex backgrounds to improve object detection efficiency. Aerial images often include irrelevant elements like buildings, vegetation, and shadows, leading to false detections and higher computational costs (Rafique et al., 2022). Using RF and FCM clustering reduces background noise, allowing the model to focus on vehicles. This enhances detection accuracy, minimizes misclassifications, and improves tracking by ensuring consistent vehicle identification across frames.

Random forest-based semantic segmentation

A random forest is a supervised classifier. The first step in segmentation involves extracting a feature vector from images and labels for model training. Feature extraction uses filters such as Canny edge detection (Yuan & Xu, 2016), Prewitt (Chaple, Daruwala & Gofane, 2015), Scharr (Kumar, Lal & Kumar, 2021), and a median filter. RF combines multiple decision trees; each tree outputs a class label, and the label with the most votes becomes the predicted output, as shown in Fig. 2.

Figure 2: Working methodology of random forest classifier for segmentation.
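As an illustration of this pipeline, the sketch below builds a per-pixel feature stack from the filters listed above and trains a random forest to label vehicle versus background pixels; it assumes scikit-learn, OpenCV, SciPy, and NumPy, and the image and mask file names are hypothetical.

```python
import cv2
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier

def pixel_features(gray):
    """Stack per-pixel responses of Canny, Prewitt, Scharr, and median filters."""
    g = gray.astype(np.float32)
    feats = [
        g,                                            # raw intensity
        cv2.Canny(gray, 100, 200).astype(np.float32), # edge map
        ndimage.prewitt(g),                           # Prewitt gradient
        cv2.Scharr(g, cv2.CV_32F, 1, 0),              # horizontal Scharr gradient
        cv2.Scharr(g, cv2.CV_32F, 0, 1),              # vertical Scharr gradient
        cv2.medianBlur(gray, 5).astype(np.float32),   # median-filtered intensity
    ]
    return np.stack(feats, axis=-1).reshape(-1, len(feats))

# Train on a labeled frame (mask: 1 = vehicle, 0 = background) and predict on another.
train_gray = cv2.imread("train_frame.png", cv2.IMREAD_GRAYSCALE)
train_mask = cv2.imread("train_mask.png", cv2.IMREAD_GRAYSCALE) > 0
test_gray = cv2.imread("test_frame.png", cv2.IMREAD_GRAYSCALE)

rf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
rf.fit(pixel_features(train_gray), train_mask.ravel())
pred_mask = rf.predict(pixel_features(test_gray)).reshape(test_gray.shape)
```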

Each feature’s importance on the decision tree is calculated by Eq. (3).

$fi_i = \dfrac{\sum_{j \,:\, \text{node } j \text{ splits on feature } i} ni_j}{\sum_{k \,\in\, \text{all nodes}} ni_k}$   (3)

where $fi_i$ represents the importance score of feature $i$, $ni_j$ denotes the importance contribution of node $j$ that splits on feature $i$, and the denominator sums the importance contributions $ni_k$ over all nodes $k$ in the tree. Each calculated feature importance is normalized using Eq. (4).

$normfi_i = \dfrac{fi_i}{\sum_{j \,\in\, \text{all features}} fi_j}$   (4)

Also, the overall importance of each input feature has been calculated by averaging it over all the Random Forest trees as given in Eq. (5).

$RFfi_i = \dfrac{\sum_{j \,\in\, \text{all trees}} normfi_{ij}}{T}$   (5)

where $RFfi_i$ is the overall importance score of feature $i$ in the random forest, $normfi_{ij}$ represents the normalized importance of feature $i$ in tree $j$, and $T$ is the total number of trees in the forest.

We trained the model using the leave one subject out (LOSO) validation technique (Pauli, Pohl & Golz, 2021). Figure 3 visualizes the segmentation results: Fig. 3A displays the original frames, while Fig. 3B shows the segmentation using RF.

Figure 3: Segmentation result using random forest classifier.
(A) Original image frames (B) segmentation results.

Segmentation using fuzzy C-means algorithm

FCM is a segmentation algorithm where pixels are linked to multiple clusters (Miao, Zhou & Huang, 2020), reflecting fuzzy logic based on co-occurring elements. The objective function is optimized over iterations to complete the segmentation process (Ahmed, Jalal & Kim, 2020). During this process, clustering centers and membership degrees are updated. The performance index K_FCM calculates the weighted sum of distances between elements and cluster centers, as shown in Eq. (6).

$K_{FCM}(S,T) = \sum_{i=1}^{c} \sum_{k=1}^{N} t_{ik}^{p} \, \lVert u_k - v_i \rVert^{2}, \quad 1 < p < \infty$   (6)

where $K_{FCM}(S,T)$ represents the objective function of the FCM clustering algorithm, $t_{ik}$ is the membership degree of data point $k$ belonging to cluster $i$, $u_k$ is the data point, $v_i$ is the center of cluster $i$, and $p$ is the fuzziness parameter (with $1 < p < \infty$). The degree of membership function satisfies the conditions given in Eq. (7).

$K_{FCM} = \left\{ T \in \mathbb{R}^{c \times N} \,\middle|\, t_{ik} \in [0,1],\ 1 \le i \le c,\ 1 \le k \le N;\ \sum_{i=1}^{c} t_{ik} = 1;\ 0 < \sum_{k=1}^{N} t_{ik} < N \right\}$   (7)

KFCM is the feasible membership matrix in the Fuzzy C-Means algorithm, where tik denotes the membership degree of data point k to cluster i, c is the number of clusters, and N is the total number of data points. The conditions ensure membership values are between 0 and 1, the sum of memberships for each data point across all clusters equals 1, and each cluster has non-zero total membership.

Equations (8) and (9) are used to update the membership matrix and the cluster centres.

$t_{ik}^{p} = \dfrac{1}{\sum_{j=1}^{c} \left( \dfrac{d_{ik}^{2}}{d_{jk}^{2}} \right)^{\frac{1}{p-1}}}$   (8)

where $t_{ik}^{p}$ represents the membership degree of data point $k$ to cluster $i$ raised to the power of $p$, $d_{ik}$ is the distance between data point $k$ and cluster center $i$, and $d_{jk}$ is the distance between data point $k$ and the other cluster centers $j$. The equation calculates the fuzzy membership based on the relative distances, where $p$ is the fuzziness parameter.

$v_j = \dfrac{\sum_{k=1}^{N} t_{ik}^{p} \, u_k}{\sum_{k=1}^{N} t_{ik}^{p}}$   (9)

where $v_j$ represents the center of cluster $j$, $t_{ik}^{p}$ is the membership degree of data point $k$ to cluster $i$ raised to the power of $p$, and $u_k$ is the data point. The equation calculates the weighted average of the data points, with the membership degrees serving as the weights, to determine the new center of cluster $j$.

Pixels are given high membership values when they are near the centroid of their respective class and low membership values when they are far from it; in this way, the FCM objective function is minimized. The output of the FCM segmentation is shown in Fig. 4. Figure 4A presents the original images, whereas the segmented images are shown in Fig. 4B.

Figure 4: Segmentation result using FCM algorithm.
(A) Original image frames (B) segmentation results.
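The update rules in Eqs. (8) and (9) can be implemented directly in NumPy. The sketch below is a simplified two-cluster (vehicle/background) version that clusters pixel intensities; the fuzziness parameter, iteration count, and initialization are illustrative choices, not the exact settings used in the experiments.

```python
import numpy as np

def fcm_segment(gray, c=2, p=2.0, n_iter=50, eps=1e-9):
    """Fuzzy C-means over pixel intensities; returns a hard label map and the centers."""
    u = gray.astype(np.float64).ravel()               # data points u_k
    v = np.linspace(u.min(), u.max(), c)              # initial cluster centers v_i

    for _ in range(n_iter):
        # Distances d_ik between every pixel and every cluster center, shape (c, N).
        d = np.abs(u[None, :] - v[:, None]) + eps

        # Membership update (cf. Eq. (8)): relative inverse-distance weighting.
        ratio = (d[:, None, :] ** 2) / (d[None, :, :] ** 2)
        t = 1.0 / np.sum(ratio ** (1.0 / (p - 1.0)), axis=1)

        # Center update (cf. Eq. (9)): weighted mean with weights t_ik^p.
        w = t ** p
        v = (w @ u) / w.sum(axis=1)

    labels = np.argmax(t, axis=0).reshape(gray.shape)  # hard assignment per pixel
    return labels, v

# labels, centers = fcm_segment(cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE))
```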

For segmentation, we use RF and FCM to separate vehicles from the background. RF segmentation employs supervised learning with features extracted via Canny edge detection, Prewitt, Scharr, and median filters, classified by an ensemble of decision trees trained using LOSO validation. FCM clustering assigns membership values to pixels, enabling soft clustering that minimizes errors, especially in occluded areas. The computational cost and error rates of RF and FCM are compared using Eq. (10).

$Error\,Rate = 1 - Accuracy$   (10)

The RF classifier requires training on the ground truth, increasing the model’s computational complexity. In contrast, FCM can be applied without explicit training, enhancing the algorithm’s generalizability. Table 1 shows the error rates for image segmentation using RF and FCM on the UAVDT and KIT-AIS datasets. Considering both computational time and error rates, FCM proves to be more effective and accurate. Therefore, FCM results are used for further processing, such as vehicle detection, ID allocation, recovery, counting, and tracking. Table 2 compares the segmentation accuracies of the two techniques.

Table 1:
Error rate comparison of random forest and FCM segmentation.
Datasets    Error rate (random forest)    Error rate (FCM)
UAVDT       0.37                          0.19
KIT-AIS     0.43                          0.24
DOI: 10.7717/peerj-cs.2835/table-1
Table 2:
Accuracies comparison of random forest and FCM segmentation.
Datasets    Segmentation accuracy (random forest)    Segmentation accuracy (FCM)
UAVDT       0.63                                     0.81
KIT-AIS     0.57                                     0.76
DOI: 10.7717/peerj-cs.2835/table-2

Comparison of FCM and RF segmentation

To evaluate the effectiveness of FCM and RF segmentation, we compared their computational efficiency, training dependency, adaptability to varying backgrounds, and robustness against noise. The following Table 3 presents a quantitative comparison based on segmentation performance on the UAVDT and KIT-AIS datasets.

Table 3:
Measurements of precision, recall, and F1 score for vehicle detection.
Datasets Precision Recall F1 score Quality
UAVDT 0.82 0.80 0.81 0.68
KIT-AIS 0.80 0.81 0.80 0.67
DOI: 10.7717/peerj-cs.2835/table-3

RF segmentation uses hard classification, leading to over-segmentation or misclassification, especially in aerial imagery with shadows and varying lighting. FCM’s soft clustering ensures smoother boundaries, crucial for UAV-based traffic monitoring where motion blur and occlusions affect vehicle edges. Unlike RF, FCM adapts to varying road conditions and shadows, offering greater robustness and reducing segmentation artifacts, thus improving vehicle detection accuracy.

Vehicle detection

After image segmentation, vehicle detection is performed using the YOLOv4 deep learning algorithm, which detects objects with high accuracy and speed. YOLOv4’s architecture includes CSPDarknet53, PANet, and the YOLOv3 head, with CNNs splitting images into regions to predict class probabilities. Weight sharing in the convolutional layers keeps the parameter count low and reduces overfitting; the convolution operation is given in Eq. (11), and the overall architecture is shown in Fig. 5.

$G[i,j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} h[u,v] \, F[i+u, j+v]$   (11)


Figure 5: The architecture of YOLOv4 for vehicle detection.

G[i, j] represents the output at position (i, j), h[u, v] is the filter value at (u, v), and F[i + u, j + v] is the input at the shifted position. The equation performs a convolution by summing the product of the filter and input values over a kernel-defined neighborhood. The CNN uses a pooling layer for downsampling, as defined in Eq. (12).

$f^{(m)} = [f_1^{m}, \ldots, f_k^{m}, \ldots, f_K^{m}]^{T}, \qquad f_k^{m} = \max_{x \in X_k} x$   (12)

where $f_k^{m}$ is the maximum pixel value within the $k$-th pooling region. Convolutional and pooling layers work together to facilitate effective feature learning and selection. Finally, the Mish activation function is applied to obtain the output, as given in Eq. (13).

$f(x) = x \tanh(\varsigma(x))$   (13)

where $\varsigma(x) = \ln(1 + e^{x})$ is the softplus activation function.

When the performance was compared, YOLOv4 was twice as fast as EfficientDet (a competitive recognition model). For vehicle detection, we utilize YOLOv4, which consists of a CSPDarknet53 backbone, a PANet path-aggregation neck, and a YOLOv3 head. We configure YOLOv4 with batch size = 16, momentum = 0.937, learning rate = 0.001, and IoU threshold = 0.5 to balance accuracy and computational efficiency in UAV imagery. Table 4 shows the comparison of YOLOv4 with other methods.
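A hedged sketch of running a trained YOLOv4 detector on the segmented frames with OpenCV’s DNN module is given below; the configuration and weight file names are placeholders, and the 0.5 confidence threshold mirrors the setting above.

```python
import cv2

# Load a YOLOv4 network trained for vehicle classes (file names are placeholders).
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(608, 608), scale=1 / 255.0, swapRB=True)

def detect_vehicles(frame, conf_thresh=0.5, nms_thresh=0.4):
    """Return (class_id, confidence, box) triples for one preprocessed frame."""
    class_ids, confidences, boxes = model.detect(
        frame, confThreshold=conf_thresh, nmsThreshold=nms_thresh
    )
    if len(class_ids) == 0:
        return []
    return list(zip(class_ids.flatten(), confidences.flatten(), boxes))

# detections = detect_vehicles(processed_frame)
# for cls, score, (x, y, w, h) in detections:
#     cv2.rectangle(processed_frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```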

Table 4:
Measurements of precision, recall, and F1 score for vehicle tracking algorithm.
Datasets Precision Recall F1 score Quality
UAVDT 0.87 0.85 0.86 0.76
KIT-AIS 0.83 0.80 0.81 0.69
DOI: 10.7717/peerj-cs.2835/table-4

ID allocation and recovery

Each detected vehicle is assigned a unique ID based on SURF features, which provide reliable, scale-invariant, and efficient image comparison (Bay et al., 2008). SURF features are fast to compute, making them suitable for real-time object recognition. The method includes an interest point detector and descriptor (Du, Su & Cai, 2009), with interest points identified using an integral computed by Eq. (14).

$I_{\Sigma}(x) = \sum_{i=0}^{x} \sum_{j=0}^{y} I(i,j)$   (14)

where $I_{\Sigma}(x)$ is the integral image at the location $x = (x, y)^{T}$ and $I(i,j)$ is the input image. Moreover, blob-like structures in the image are detected using the Hessian matrix $H$ given in Eq. (15).

$H(x,\sigma) = \begin{bmatrix} L_{xx}(x,\sigma) & L_{xy}(x,\sigma) \\ L_{xy}(x,\sigma) & L_{yy}(x,\sigma) \end{bmatrix}$   (15)

where $H(x,\sigma)$ represents the Hessian matrix at point $x$ with scale $\sigma$, $L_{xx}(x,\sigma)$ is the second derivative of the image with respect to $x$, $L_{xy}(x,\sigma)$ is the mixed second derivative with respect to $x$ and $y$, and $L_{yy}(x,\sigma)$ is the second derivative with respect to $y$. The Hessian matrix captures the curvature of the image, which is used in various image processing tasks like edge detection and feature extraction. The assigned IDs can be recovered based on the feature-matching score in the succeeding frame detections.

For ID assignment and recovery, we employ SURF to extract key descriptors for each detected vehicle. An ID is successfully recovered if the number of feature matches between frames exceeds a predefined threshold (set to eight in our experiments). If fewer than eight matches are found, the vehicle is registered as a new entry, ensuring a balance between continuity and robustness in tracking.
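A sketch of this ID assignment and recovery logic is shown below. SURF lives in OpenCV’s contrib “nonfree” module, so the snippet assumes an opencv-contrib build with xfeatures2d enabled; the gallery structure and helper names are illustrative, while the threshold of eight matches follows the text.

```python
import cv2

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)   # requires the nonfree module
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
MATCH_THRESHOLD = 8   # minimum feature matches needed to recover an existing ID

def recover_or_assign_id(vehicle_patch, id_gallery, next_id):
    """Match a detected vehicle crop against stored descriptors; reuse or create an ID."""
    gray = cv2.cvtColor(vehicle_patch, cv2.COLOR_BGR2GRAY)
    _, descriptors = surf.detectAndCompute(gray, None)
    if descriptors is None:                      # no usable keypoints in this crop
        return next_id, next_id + 1

    for vid, stored in id_gallery.items():       # compare against previously seen vehicles
        if len(matcher.match(descriptors, stored)) > MATCH_THRESHOLD:
            return vid, next_id                  # enough matches: recover the old ID

    id_gallery[next_id] = descriptors            # otherwise register the vehicle as new
    return next_id, next_id + 1

# id_gallery = {}   # maps vehicle ID -> SURF descriptors from earlier frames
# vid, next_id = recover_or_assign_id(crop, id_gallery, next_id=0)
```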

Vehicle counting

For traffic situation analysis, vehicle counting is performed in each image frame based on vehicle detections. All the detections made by YOLOv4 are recorded using a counter, as given in Eq. (16). Counting the vehicles in each frame estimates the traffic density on the road at different time intervals, which can further be used to anticipate responses to traffic jams and other adverse incidents.

$Vehicle\,Count = \sum_{i=1}^{N} T_i$   (16)

where $T_i$ is the $i$-th vehicle detection in a frame and $N$ is the number of detections.
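Since Eq. (16) simply accumulates detections per frame, the density estimate reduces to a running log of counts; a minimal sketch, assuming the hypothetical detect_vehicles helper above:

```python
def count_vehicles_per_frame(frames, detector):
    """Return the number of detected vehicles in every frame (Eq. 16)."""
    counts = []
    for frame in frames:
        counts.append(len(detector(frame)))   # VehicleCount for this frame
    return counts

# density_over_time = count_vehicles_per_frame(video_frames, detect_vehicles)
```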

Vehicle tracking

To track the movement of vehicles frame by frame, we implemented DeepSORT (Wojke, Bewley & Paulus, 2017; Hou, Wang & Chau, 2019; Kapania et al., 2020; Pramanik et al., 2022). DeepSORT is a tracking algorithm that combines the Kalman filter with deep learning to track objects not only based on their motion and velocity but also on their appearance. The motion information is incorporated using the Mahalanobis distance between the Kalman state and the newly arrived measurement by using Eq. (17).

$d^{(1)}(i,j) = (d_j - y_i)^{T} S_i^{-1} (d_j - y_i)$   (17)

where $d^{(1)}(i,j)$ represents the Mahalanobis distance between detection $d_j$ and the predicted mean $y_i$ of track $i$, $S_i^{-1}$ is the inverse of the covariance matrix of track $i$, and $(d_j - y_i)$ is the difference between the detection and the track mean. The equation computes the squared distance, weighted by the inverse covariance, which measures how far $d_j$ lies from the track mean relative to the spread of the state distribution.

The appearance information is calculated using the minimum cosine distance between the $i$-th track and the $j$-th detection in appearance space as

$d^{(2)}(i,j) = \min\left\{ 1 - r_j^{T} r_k^{(i)} \,\middle|\, r_k^{(i)} \in R_i \right\}$   (18)

where $d^{(2)}(i,j)$ represents the minimum cosine distance between detection $j$ and the stored appearance descriptors of track $i$, $r_j$ is the appearance feature vector of detection $j$, and $r_k^{(i)}$ is a stored feature vector of track $i$. The equation takes the minimum of $1 - r_j^{T} r_k^{(i)}$, where $r_j^{T} r_k^{(i)}$ is the dot product and $R_i$ is the set of descriptors stored for track $i$, ensuring the closest match in appearance space.

$c_{i,j} = \lambda \, d^{(1)}(i,j) + (1 - \lambda) \, d^{(2)}(i,j)$   (19)

where $c_{i,j}$ represents the combined association cost between detection $j$ and track $i$, $d^{(1)}(i,j)$ is the Mahalanobis distance (Eq. (17)), and $d^{(2)}(i,j)$ is the minimum cosine distance (Eq. (18)). The parameter $\lambda$ is a weight that balances the two terms, with $\lambda$ controlling the influence of $d^{(1)}$ and $(1-\lambda)$ the influence of $d^{(2)}$.
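A NumPy sketch of this combined cost under the definitions of Eqs. (17)–(19) is given below; the track mean and covariance are assumed to come from the Kalman filter, the appearance vectors are assumed L2-normalized, and λ = 0.5 is an arbitrary illustrative weight.

```python
import numpy as np

def association_cost(det_xy, det_feat, track_mean, track_cov_inv, track_feats, lam=0.5):
    """Combined cost c_ij = lam * Mahalanobis term + (1 - lam) * min cosine distance."""
    # Eq. (17): squared Mahalanobis distance between detection and track state.
    diff = det_xy - track_mean
    d_motion = float(diff @ track_cov_inv @ diff)

    # Eq. (18): smallest cosine distance to any stored appearance vector of the track.
    # With L2-normalized features, the dot product equals the cosine similarity.
    d_appearance = float(np.min(1.0 - track_feats @ det_feat))

    # Eq. (19): weighted combination of the motion and appearance terms.
    return lam * d_motion + (1.0 - lam) * d_appearance

# cost = association_cost(det_xy=np.array([410.0, 220.0]), det_feat=feat_j,
#                         track_mean=kalman_mean, track_cov_inv=np.linalg.inv(kalman_cov),
#                         track_feats=gallery_feats)   # gallery_feats: (K, d) matrix R_i
```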

We use a pre-trained CNN model with two convolution layers, one max pooling layer, six residual layers connected to a dense layer, and L2 normalization to calculate the appearance features. Figure 6 shows the tracking results in each succeeding frame.


Figure 6: Tracking of multiple vehicles across the image frames extracted from the video.

We integrate DeepSORT, an extension of SORT, by adding deep appearance-based feature extraction to Kalman filtering. The Mahalanobis distance associates new detections with tracklets, while a cosine similarity-based re-identification model maintains tracking consistency when objects temporarily exit the frame. Table 5 shows the comparison of DeepSORT with other techniques.

Table 5:
The performance metrics of the ID recovery module.
The vehicle similarity threshold was set to 8.
Datasets AID rate (%) Recovery rate (%)
UAVDT 65 62
KIT-AIS 63 59
DOI: 10.7717/peerj-cs.2835/table-5

Vehicle trajectories estimation

We track vehicle movement by estimating and plotting each vehicle’s trajectory using the geometric coordinates of detected boxes. The DeepSORT algorithm provides the location data, from which we calculate and mark center points. The algorithm inputs detection coordinates into the DeepSORT tracker, which predicts vehicle locations in subsequent frames. SURF features are extracted and matched to recover vehicle IDs. If matches exceed a threshold, the ID is retrieved; otherwise, a new ID is assigned. Rectangular coordinates and midpoints are used to mark vehicle trajectories. Detailed steps are provided in Algorithm 1.

Algorithm 1:
Trajectory estimation of tracked vehicles.
Input: V = {H1, H2, ..., Hz} // list of rectangular coordinates of the form {x1, x2, y1, y2}
Output: {Pr}, r = 1, ..., R // vehicle trajectories
T = {T1, T2, ..., Tz} ← DeepSORT(V) // estimated positions of vehicles from the tracker
feature_vector ← []
For i = 1 to length of T
 new_feature ← SURF(Ti)
 If feature_vector == []
  feature_vector ← new_feature
 Else
  matches ← Match(new_feature, feature_vector)
  If matches > 8
   AssignID(Ti)
  Else
   feature_vector ← new_feature
 x1i, x2i, y1i, y2i ← ExtractRectangularCoordinatesOfVehicle(Ti)
 xcenter ← (x1i + x2i) / 2
 ycenter ← (y1i + y2i) / 2
 Pr ← [xcenter, ycenter]
End for
return vehicle trajectories
DOI: 10.7717/peerj-cs.2835/table-101

Datasets description

For vehicle surveillance, two datasets were used: UAVDT and KIT-AIS datasets. A detailed description of each dataset is mentioned in the following subsections.

UAVDT dataset

The UAVDT dataset (Yu et al., 2020) consists of more than 10 h of video captured using a UAV platform in various urban settings. These traffic scenes include intersections, toll plazas, motorways, crossings, and arterial roadways. The .jpg images have a resolution of 1,080 × 540 pixels, while the videos were shot at a frame rate of 30 frames per second (fps).

KIT-AIS dataset

The Karlsruher Institut für Technologie Aerial Image Sequences (KIT-AIS) dataset (Beheim, 2021) consists of 299 image frames in .jpg format extracted from video sequences. The dataset is provided by the German Aerospace Center. It covers five classes: car, truck, bus, minibus, and cyclist.

Dataset partitioning and augmentation strategies

Our study uses two publicly available aerial image datasets: UAVDT and KIT-AIS, each containing varied traffic scenarios. Both datasets were split into training, validation, and test sets as follows:

  • UAVDT dataset: With its large number of images, we applied an 80-10-10 split (80% for training, 10% for validation, and 10% for testing), ensuring broad training conditions and unbiased performance evaluation.

  • KIT-AIS dataset: Due to its smaller size, we enhanced the dataset with random rotation (±15°), horizontal flipping, brightness adjustment, and Gaussian noise addition. The augmented dataset was then divided into 70% training, 15% validation, and 15% test splits.

We augmented the KIT-AIS dataset with random rotation (±15°), horizontal flipping, brightness adjustment, and Gaussian noise to simulate real-world variations like lighting, vehicle orientations, and image noise. This improved model generalizability, ensuring robust training and evaluation across diverse aerial traffic scenarios.
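The augmentations listed above can be reproduced with basic OpenCV/NumPy operations; in the sketch below, the ±15° rotation limit follows the text, while the brightness gain range, noise level, and random seed are illustrative choices.

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Random rotation (±15°), horizontal flip, brightness shift, and Gaussian noise."""
    h, w = image.shape[:2]

    # Random rotation within ±15 degrees around the image center.
    angle = rng.uniform(-15, 15)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(image, M, (w, h))

    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = cv2.flip(out, 1)

    # Brightness adjustment by a random gain.
    out = cv2.convertScaleAbs(out, alpha=rng.uniform(0.8, 1.2), beta=0)

    # Additive Gaussian noise.
    noise = rng.normal(0, 8, out.shape)
    return np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```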

Results

Experiments were conducted on a Windows 10 (64-bit) system with an Intel Core i5-7200U processor, 8 GB RAM, and an NVIDIA GeForce GTX 1650 GPU (4 GB VRAM) for deep learning acceleration. The model was implemented in Python 3.7 using TensorFlow 2.4.1 (YOLOv4), OpenCV 4.5.1 (image preprocessing), Scikit-learn 0.24.2 (Random Forest segmentation), SciPy 1.6.2 (FCM clustering), and a custom DeepSORT implementation for multi-object tracking.

We employed seven evaluation parameters to assess the performance of our presented methodology. To compare the performance of random forest-based segmentation and FCM segmentation, we used accuracy as given in Eq. (20).

$Accuracy = \dfrac{TP + TN}{TP + FP + FN}$   (20)

where (TP + FP + FN) is the total area of ground truth and prediction, and TP and TN indicate the area of intersection. The effectiveness of vehicle detection and tracking is assessed using four evaluation metrics, namely precision, recall, quality, and F1 score, as calculated using Eqs. (21), (22), (23) and (24).

$Precision = \dfrac{TP}{TP + FP}$   (21)

Precision measures the proportion of correctly identified vehicles among all detected instances. A high precision score indicates that the system effectively minimizes false positives, ensuring that non-vehicle objects are not mistakenly classified as vehicles.

$Recall = \dfrac{TP}{TP + FN}$   (22)

Recall represents the proportion of correctly identified vehicles relative to the total number of actual vehicles present in the dataset. A higher recall value indicates that the system successfully captures a greater percentage of vehicles, minimizing false negatives.

$Quality = \dfrac{TP}{TP + FP + FN}$   (23)

Quality is a holistic performance measure that incorporates the correctness of detections and tracking stability. This metric is particularly relevant to our system as it evaluates how well the detected vehicles are maintained across frames, ensuring consistency in tracking results.

$F1\,score = \dfrac{2 \times (Precision \times Recall)}{Precision + Recall}$   (24)

F1-score is the harmonic mean of precision and recall, providing a balanced measure of the system’s accuracy. Since precision and recall often exhibit a trade-off, F1-score ensures that both aspects are considered, making it a robust evaluation metric for vehicle detection and tracking.

Here, TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.

We utilized two new measures to evaluate the ID assignment and recovery module as given in Eqs. (25) and (26). AID is the accurate ID rate, which denotes the percentage of accurate ID numbers allotted to vehicles.

$AID\,Rate = \dfrac{\sum_{i=1}^{N} AIDs_i}{\sum_{i=1}^{N} ID_i}$   (25)

where $N$ is the total number of vehicles, $AIDs_i$ denotes the number of correct ID assignments made to vehicle $i$, and $ID_i$ denotes all ID assignments made to vehicle $i$. The RecoveryRate represents the percentage of true IDs recovered.

$Recovery\,Rate = \dfrac{\sum_{i=1}^{N} TReCovers_i}{ReCovers}$   (26)

where $N$ represents the total number of distinct vehicles, $TReCovers_i$ is the number of true ID recoveries for vehicle $i$, and $ReCovers$ is the total number of existing recoveries.
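Given accumulated TP/FP/FN/TN counts over a test set, the metrics in Eqs. (20)–(24) reduce to a few ratios; a small helper is sketched below (the accuracy term follows the overlap form written in Eq. (20), and the example counts in the comment are illustrative only).

```python
def evaluation_metrics(tp, fp, fn, tn=0):
    """Compute the metrics of Eqs. (20)-(24) from confusion counts."""
    denom = tp + fp + fn
    accuracy = (tp + tn) / denom if denom else 0.0                      # Eq. (20)
    precision = tp / (tp + fp) if tp + fp else 0.0                      # Eq. (21)
    recall = tp / (tp + fn) if tp + fn else 0.0                         # Eq. (22)
    quality = tp / denom if denom else 0.0                              # Eq. (23)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                               # Eq. (24)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "quality": quality, "f1": f1}

# evaluation_metrics(tp=820, fp=180, fn=205)   # illustrative counts only
```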

We compared the random forest and FCM segmentation methods based on accuracy and computational time. While random forest requires custom training, increasing its computational cost, FCM yields better segmentation results. Therefore, FCM was selected for our algorithm. Accuracy results for both methods are shown in Table 6.

Table 6:
Precision comparison of the proposed approach with State-of-the-art vehicle detection techniques.
Datasets Models Precision
UAVDT NDFT (Cao et al., 2020) 0.520
SpotNet (Perreault et al., 2020) 0.528
Our method 0.82
KIT-AIS Beheim (2021) 0.79
Our method 0.80
DOI: 10.7717/peerj-cs.2835/table-6

In this experiment, precision, recall, F1 score, and quality across the two datasets were used to assess how well our vehicle recognition and tracking system performed. The performance metrics for the proposed vehicle detection algorithm are displayed in Table 7. Table 8 shows the performance evaluation for the DeepSORT-based tracking algorithm. Table 9 shows the performance evaluation of the ID recovery module.

Table 7:
Precision comparison of the proposed approach with existing methods for vehicle tracking.
Datasets Model Precision
UAVDT ASRDCT (Ge et al., 2023) 0.76
Shape-based matching (Leitloff et al., 2014) 0.32
Our method 0.87
KIT-AIS AerialMPTNet (Azimi et al., 2020) 0.71
SIFT features (Mu, Hui & Zhao, 2016) 0.62
Our method 0.83
DOI: 10.7717/peerj-cs.2835/table-7
Table 8:
Measurements of precision, recall, and F1 score for vehicle tracking algorithm.
Datasets Precision Recall F1 score Quality
UAVDT 0.87 0.85 0.86 0.76
KIT-AIS 0.83 0.80 0.81 0.69
DOI: 10.7717/peerj-cs.2835/table-8
Table 9:
The performance metrics of the ID recovery module.
The vehicle similarity threshold was set to 8.
Datasets AID rate (%) Recovery rate (%)
UAVDT 65 62
KIT-AIS 63 59
DOI: 10.7717/peerj-cs.2835/table-9

In this experiment, we have drawn a comparison of our proposed model with other popular algorithms. Table 10 represents a comparison between our presented detection algorithm and other methods.

Table 10:
Precision comparison of the proposed approach with State-of-the-art vehicle detection techniques.
Datasets Models Precision
UAVDT NDFT (Cao et al., 2020) 0.520
SpotNet (Perreault et al., 2020) 0.528
Our method 0.82
KIT-AIS Beheim (2021) 0.79
Our method 0.80
DOI: 10.7717/peerj-cs.2835/table-10

Table 11 depicts the comparison of our proposed tracking algorithm. As can be seen, our model performs better than other state-of-the-art methods.

Table 11:
Precision comparison of the proposed approach with state-of-the-art vehicle tracking methods.
Datasets Model Precision
UAVDT ASRDCT (Ge et al., 2023) 0.76
Shape-based matching (Leitloff et al., 2014) 0.32
Our method 0.87
KIT-AIS AerialMPTNet (Azimi et al., 2020) 0.71
SIFT features (Mu, Hui & Zhao, 2016) 0.62
Our method 0.83
DOI: 10.7717/peerj-cs.2835/table-11

Discussion on error propagation and mitigation strategies

Our model demonstrates high accuracy in vehicle detection and tracking but has limitations. Error propagation, due to incorrect segmentation, can lead to missed or misclassified vehicles. This can be mitigated with adaptive error correction, such as feedback loops and motion-based outlier rejection. Scalability is also a concern in high-density urban traffic, where occlusions and overlapping detections complicate ID assignment. Future improvements may include multi-frame aggregation and attention-based deep learning models to reduce ID switches. Additionally, computational efficiency must be optimized for large-scale surveillance, with lighter architectures like YOLOv5 or YOLOv8 and distributed computing to enhance real-time processing. Addressing these challenges will improve model robustness, scalability, and efficiency.

Sensitivity analysis

We analyzed key parameters affecting segmentation, detection, and tracking performance, focusing on gamma correction in preprocessing and the similarity threshold in ID recovery. Gamma correction enhances image contrast, influencing segmentation accuracy. Testing values between 0.6 and 2.0 showed that gamma values of 1.2 to 1.5 yield the best results, avoiding under- or over-enhancement. For ID recovery in DeepSORT, the similarity threshold affects identity reassignment across frames. Testing thresholds from 0.4 to 0.8 revealed that 0.6 provides the highest tracking stability, minimizing identity switches. Fine-tuning these parameters improves system robustness and generalization across diverse environments.

Conclusion and future work

This article presents an autonomous vehicle surveillance system using aerial images, integrating segmentation, detection, tracking, and trajectory estimation for traffic monitoring. Fuzzy C-means segmentation distinguishes vehicles from the background, followed by YOLOv4 for detection. SURF assigns unique IDs for consistent tracking, with DeepSORT and Kalman filtering ensuring robust multi-object tracking. Experimental results on UAVDT and KIT-AIS datasets show our method outperforms conventional models in precision, recall, F1-score, and tracking quality. Future research will focus on expanding to diverse urban datasets, incorporating advanced deep learning (e.g., DETR, YOLOv8), and optimizing the system for edge deployment to improve scalability and real-time performance.

Supplemental Information

Computing infrastructure.

DOI: 10.7717/peerj-cs.2835/supp-1

Preprocessing stage (a) traffic video (b) extracted image from the video (c) original frame (d) filtered frame after gamma correction.

DOI: 10.7717/peerj-cs.2835/supp-5

Vehicle detection output using YOLOv4 over UAVDT and KIT-AIS.

DOI: 10.7717/peerj-cs.2835/supp-6

ID allocation and recovery (a) ID assigned to each detected vehicle in the frame (b) SURF feature extraction and matching for each vehicle (c) IDs recovered across frames.

DOI: 10.7717/peerj-cs.2835/supp-7

Density estimation using vehicle count across image frames.

DOI: 10.7717/peerj-cs.2835/supp-8

Estimation of vehicle trajectories (a) plotting and joining the center points of possible location of the tracked vehicle in each frame (b) resulting trajectories.

DOI: 10.7717/peerj-cs.2835/supp-9