The intelligent decision model for determine the best path of transportation on smart city using random forest algorithm and bayesian optimization (RF-BO)

This study investigated approaches and algorithms for object detection and best-path determination to manage vehicular traffic in an urban environment, specifically Palembang City, as a step towards the development of the smart city concept. For object detection, we applied the YOLOv3 method to video footage to identify vehicles, obtaining average precision values ranging from 72.7 % to 79.3 % across the motorcycle and car categories. The overall detection accuracy (mAP) of the model reached 76.0 %. Next, we adopted the Random Forest algorithm to classify traffic conditions into three classes: smooth, moderate, and congested. After tuning the algorithm with Bayesian optimization, the model accuracy increased from 89 % to 92 %, while the classification accuracy increased from 91.6 % to 92.3 %. Careful parameter tuning with Bayesian optimization also helps prevent overfitting, in which the model fits the training data too closely and performs worse on test data. The A* heuristic search algorithm then processed the classification results of the optimized Random Forest, considering road conditions and travel distance. Path 5, which runs from SMK PGRI 1 Palembang to Bom Baru Jl Perintis Kemerdekaan Arah Charitas (STMIK MBC), was selected most frequently, in nine out of twelve trials. This path was chosen because it had the shortest travel distance of the available options and tended to have "smooth" traffic. The choice of the optimal path also considered road width, since wider roads can reduce traffic density and the risk of congestion.


I. INTRODUCTION
Traffic congestion is one of the most common problems of daily life, hampering mobility and often delaying arrival at destinations. Continued population growth exacerbates the situation, putting more motorists on the roads and increasing congestion. When faced with congestion, drivers often look for the shortest alternative route in order to lose as little time as possible. However, a lack of information about road conditions along a candidate route can hinder the search for the best alternative when the main route is obstructed [1].
The use of AI technology continues to grow rapidly, especially in the field of computer vision. Computer vision refers to technology capable of making intelligent decisions by analyzing real-world objects through visual imagery. It replicates human visual capabilities so that a system can "see" the objects around it and convert that information into meaningful decisions. An Intelligent Transportation System (ITS) is one implementation of advanced electronics, computer, and telecommunications technology used to improve the overall efficiency and effectiveness of transportation [2]. Therefore, the focus of this research is the application of intelligent transportation within the scope of smart cities.
A smart city is a city that manages its resources intelligently to provide better services to the community, both through efficient management and through anticipation of unexpected situations [3], [4]. CCTV is becoming common in many cities in Indonesia, including Palembang, both as a source of evidence of traffic violations and as a traffic flow monitoring tool [5], [6].
This research addresses the following problems. First, how can classification with a Random Forest optimized by Bayesian optimization best be used to determine the optimal path, given the object detection data that YOLOv3 produces for each path? Second, how can the optimal path be found with the A* heuristic search algorithm while accounting for ever-changing road conditions? For drivers to reach their destination swiftly and safely, the main problem to be solved is how to avoid obstacles or traffic on the main route.
In this context, traffic CCTV footage can be put to further use by applying YOLO (You Only Look Once) version 3 to detect objects in real time, including counting vehicles with a good level of accuracy [7], [8]. The Random Forest algorithm, an effective machine learning method, can classify road conditions based on key parameters such as the number of vehicles and road width [9]. Bayesian optimization can improve the performance of Random Forest so that it provides more accurate predictions [10]. The resulting model can then be used to determine whether traffic is smooth, congested, or jammed, which in turn informs the selection of the best path. This research also applies the A* (A Star) algorithm to find the best path based on road condition and distance traveled. The A* algorithm can provide an efficient path-finding solution by taking into account the estimated remaining distance to the destination [11], [12].

II. RESEARCH METHOD
With population growth continuing, especially in urban areas, the surge in the number of private vehicles and the resulting congestion have become a serious challenge. This calls for an effective approach to determining the best path that focuses on key factors such as the varying road conditions and distances faced by road users [3], [13]. To identify the optimal path, the methodology of this study combines a machine learning algorithm with a heuristic search algorithm: a Random Forest tuned with Bayesian optimization classifies the data, and the A* heuristic search algorithm finds the optimal path.
As shown in Fig. 1, the process begins with dataset collection and proceeds to data preprocessing, which consists of four steps. The first is data cleaning, which filters out unnecessary data, for example by deleting missing or incomplete records. The second is data integration, in which the vehicle images are labeled: motorcycle images with label 0 and car images with label 1. The third is data transformation, which combines the data into a single dataset. The fourth is data reduction, namely the division of the data into training and testing sets in an 80:20 ratio. After preprocessing is complete, the next step is training. In this process, the program is trained to recognize objects according to the dataset and the labels provided, using the YOLOv3 model. Training involves forward and backward propagation: the model makes initial predictions, compares them with the actual labels, and adjusts its weights accordingly. This research comprises several main steps. It starts with vehicle detection using the YOLOv3 model, followed by road condition classification using the Random Forest algorithm. The classifier is then refined through Bayesian optimization to obtain the optimal configuration [14]. Next, the best path is determined based on the two critical parameters identified: road condition and distance. Together, these steps are expected to make a meaningful contribution to reducing traffic congestion and improving travel efficiency amid increasingly dynamic urban development [15].
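The 80:20 division described above can be sketched as follows. This is a minimal illustration only: the function name `split_dataset`, the seed, and the file names are hypothetical, and the source does not say how the split was actually performed.

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=42):
    """Shuffle a copy of `samples` and split it into train and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical labeled images: label 0 = motorcycle, label 1 = car.
images = [(f"img_{i:03d}.jpg", i % 2) for i in range(100)]
train_set, test_set = split_dataset(images)
print(len(train_set), len(test_set))
```

Shuffling before the cut avoids a split in which one class dominates the test set simply because of the order in which the images were stored.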

A. YOLO (You Only Look Once) version 3
The feature extractor in the YOLOv3 architecture uses Darknet-53 as the backbone network. Darknet-53 consists of various convolution layers and residual blocks, which together help in generating more robust and complex features from the input image. In the context of object detection, YOLOv3 utilizes a Convolutional Neural Network (CNN) architecture to predict the class and location of objects in the image [7], [16]. The applications of YOLOv3 have extended to various fields, including security systems with object detection, traffic analysis, industrial monitoring, human identification in computer vision systems, and various other purposes. YOLOv3's high speed together with its good object detection capabilities have made it one of the popular architectures in the development of real-time object detection systems. Its ability to detect objects of different classes simultaneously is also one of its main advantages. YOLOv3 models are trained to recognize several predefined object classes, including but not limited to humans, cars, cats, and so on [8].
YOLOv3 is a computer vision model used to detect objects in images in real time. Its architecture consists of two main components: Darknet-53 as the backbone network and a detection head that generates the predictions. As Fig. 2 shows, operations such as concatenation and addition combine information from earlier layers with that of later ones so that the model can better understand the spatial context. Residual blocks are used to improve the performance and stability of model training.
YOLOv3 makes detections at three scales, obtained by down-sampling the input image by factors of 32, 16, and 8, respectively. The first detection is made at layer 82. The first 81 layers down-sample the image, so that layer 81 has a stride of 32. For an input image of 416 × 416, the resulting feature map measures 13 × 13. A 1 × 1 detection kernel is applied here, producing a detection feature map of 13 × 13 × 255. The feature map from layer 79 then passes through several convolutional layers and is up-sampled by 2× to 26 × 26. This feature map is depth-concatenated with the feature map from layer 61. The combined feature map then passes through several 1 × 1 convolutional layers that fuse the features from the earlier layer (61). The second detection is made at layer 94, producing a detection feature map of 26 × 26 × 255. The same procedure is repeated: the feature map from layer 91 passes through several convolutional layers and is then depth-concatenated with the feature map from layer 36. Several 1 × 1 convolutional layers again fuse the information from the earlier layer (36). The third and final detection is made at layer 106, producing a feature map of 52 × 52 × 255.
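The three feature-map sizes quoted above follow directly from the input size and the strides. The sketch below reproduces that arithmetic. It assumes the standard YOLOv3 layout of 3 anchor boxes per scale, each predicting 4 box coordinates, 1 objectness score, and one score per class. The depth of 255 then corresponds to 80 classes, and the function name `yolo_output_shapes` is hypothetical.

```python
def yolo_output_shapes(input_size=416, num_classes=80,
                       anchors_per_scale=3, strides=(32, 16, 8)):
    """Return the (height, width, depth) of each detection feature map."""
    # Each anchor predicts 4 box coordinates + 1 objectness score + class scores.
    depth = anchors_per_scale * (4 + 1 + num_classes)
    return [(input_size // s, input_size // s, depth) for s in strides]

shapes = yolo_output_shapes()
for stride, shape in zip((32, 16, 8), shapes):
    print(f"stride {stride:2d}: {shape}")
```

Under the same assumption, a model with only two classes (motorcycle and car) would have a depth of 3 × (5 + 2) = 21 at each scale. The source does not state the depth of the trained model.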

B. Random Forest
Random Forest is an algorithm for classification and regression on large datasets. It uses a set of decision trees, called classifier trees, each of which casts a vote, given input x, for the most popular class [17]. The results of the individual trees are combined by voting for classification and by averaging for regression [10], [18].
The Random Forest machine learning algorithm combines the outputs of multiple decision trees to produce a single result. As the name implies, the forest is formed from many trees obtained through bagging (bootstrap aggregating). As shown in Fig. 3, each tree in the Random Forest outputs a class prediction, and the class with the most votes becomes the model's prediction. Increasing the number of trees generally improves accuracy and helps prevent overfitting.
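The two ingredients named above, bootstrap sampling and majority voting, can be sketched in a few lines. This is an illustration, not the authors' implementation: the helper names `bootstrap_sample` and `majority_vote` and the example votes are hypothetical, and the individual trees are represented only by their predicted labels.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from `data` with replacement (one bag per tree)."""
    return [rng.choice(data) for _ in data]

def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
rows = list(range(10))             # stand-in for training rows
bag = bootstrap_sample(rows, rng)  # some rows repeat, others are left out

# Hypothetical predictions from five trees for a single road segment.
votes = ["smooth", "congested", "smooth", "moderate", "smooth"]
prediction = majority_vote(votes)
print(prediction)
```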
In the usage stage, once the numbers of motorcycles and cars are known, the next step is to predict road congestion conditions with the Random Forest algorithm. The previously collected data, namely the number of motorcycles, the number of cars, the number of lanes, and the travel distance at a given time, serve as input to the algorithm. These data inform the algorithm about the various traffic-related factors at the time of observation, as shown in (1).
where H(x) is the prediction of the Random Forest model for input x. The model consists of several decision trees (h_i), each of which votes for a value Y as the prediction. The Y value with the most votes (arg max) is taken as the final prediction of the Random Forest ensemble [19].
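Equation (1) itself does not appear in this text. A majority-vote form consistent with the symbols described above is given here as a reconstruction, not as the authors' exact typesetting. The number of trees n and the indicator function I (equal to 1 when its argument is true and 0 otherwise) are notation assumed for this sketch.

```latex
H(x) = \arg\max_{Y} \sum_{i=1}^{n} I\bigl(h_i(x) = Y\bigr) \tag{1}
```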

C. Bayesian Optimization
Bayesian optimization is a method for automatically tuning hyperparameters in machine learning algorithms. Its goal is to find the hyperparameter configuration that gives the best model performance. As shown in Fig. 4, Bayesian optimization uses a probabilistic model to estimate the objective function. This model serves as a "surrogate", or substitute, for the real objective function. At each evaluation point, the model typically yields an estimate of the mean and of the uncertainty (variance), which indicates the level of confidence in the estimate.
ISSN: 2085-3688; e-ISSN: 2460-0997
The next step is to use the information from the probabilistic model to decide where to evaluate the objective function next. Bayesian optimization uses an "acquisition function" to identify this next evaluation point. The objective function is then evaluated at the chosen point, and the probabilistic model is updated with the outcome using Bayesian inference. In other words, the model estimates the properties of the objective function better as it "learns" from each new evaluation. By repeating this process iteratively, the aim is to search the space efficiently and identify the optimum of the objective function with as few evaluations as possible.
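To illustrate how an acquisition function turns the surrogate's mean and uncertainty into a choice of the next evaluation point, the sketch below uses expected improvement (EI), one common acquisition function. The source does not say which acquisition function was used, so EI is an assumption here, as are the candidate values, the surrogate estimates, and the exploration margin `xi`.

```python
import math

def expected_improvement(mean, std, best_so_far, xi=0.01):
    """Expected improvement (maximization) at a point with the given
    surrogate mean and standard deviation."""
    improvement = mean - best_so_far - xi
    if std <= 0.0:
        return max(improvement, 0.0)  # no uncertainty: improvement is certain or zero
    z = improvement / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return improvement * cdf + std * pdf

# Hypothetical surrogate estimates (mean accuracy, std) for three candidate
# numbers of trees; 0.89 is the best accuracy observed so far.
candidates = {50: (0.88, 0.01), 100: (0.90, 0.02), 200: (0.89, 0.05)}
best_so_far = 0.89
scores = {n: expected_improvement(m, s, best_so_far)
          for n, (m, s) in candidates.items()}
next_point = max(scores, key=scores.get)
print(next_point)
```

In this made-up example the candidate with the largest uncertainty scores highest even though its mean is not the largest, which shows how the acquisition function balances exploration against exploitation.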
After the Random Forest algorithm has been applied to predict road congestion conditions from variables such as the number of motorcycles, the number of cars, the number of lanes, and travel distance, the next step is to optimize the Random Forest model using Bayesian optimization [20], [21], as shown in (2).
where f(x) is the objective function, whose value at x is modeled by a Gaussian process (GP) based on previously evaluated data. The function m(x) gives the mean of f(x) at x, and the function k(x, x′) describes the extent to which the value at point x correlates with the value at point x′.
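Equation (2) is not reproduced in this text. The standard Gaussian process prior that matches the symbols described above is given here as a reconstruction under that assumption.

```latex
f(x) \sim \mathcal{GP}\bigl(m(x),\, k(x, x')\bigr) \tag{2}
```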
This optimization process aims to strike the right balance between two important aspects: avoiding overfitting and improving the accuracy of the Random Forest model. With Bayesian optimization, the goal is to find the hyperparameter configuration for which the model gives accurate and consistent predictions on new data [19], [22].

D. A* Algorithm (Heuristic Search)
The A* algorithm is one of the most popular best-first search algorithms. It evaluates each node by combining the values of G(n) and H(n) [11]. Heuristic search finds solutions to problems by exploring various conditions in a way that increases the variety and speed of the search. Because A* is a heuristic method, its pathfinding is more efficient: it uses an estimated weight, or estimated remaining distance, to the destination. The best path is decided by considering both road congestion conditions and the distance to the destination point. In this way, the method produces an optimal path that avoids congestion and optimizes the overall journey [12]. The projected total cost of the path through node n is given by F(n), as shown in (3).
where G(n) is the cost from the origin node to the current node n, and H(n) is the projected cost of the shortest path from node n to the destination node.
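Equation (3) is missing from this text. The standard A* evaluation function that matches the definitions of F(n), G(n), and H(n) above is:

```latex
F(n) = G(n) + H(n) \tag{3}
```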

III. RESULT
In this section, we present the results of training YOLOv3 to count vehicles, the road condition predictions of the Random Forest algorithm before and after Bayesian optimization, and the best path obtained with the A* heuristic search algorithm.

A. Count the Number of Vehicles
The model generated using YOLOv3 achieved an average precision (AP) of 72.72 % for motorcycles and 79.35 % for cars. The mean average precision (mAP), which is the average of the AP values for motorcycles and cars, was 76.03 % for this model. The YOLOv3 model testing result is shown in Table 1. After the new YOLOv3 model was obtained, namely the weight and cfg files, the next step was to count the number of vehicles in the CCTV video recordings. Table 2 shows the counts of motorcycles and cars over four days (January 2, 3, 4, and 5, 2023) at three times of day (morning, afternoon, and evening), with an average accuracy of 93.17 % for motorcycles and 96.13 % for cars.
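As a check on the reported figures, the mAP can be recomputed as the mean of the two per-class AP values. The class count N = 2 is taken from the two categories named in the text.

```latex
\mathrm{mAP} = \frac{1}{N}\sum_{c=1}^{N} \mathrm{AP}_c
             = \frac{72.72\,\% + 79.35\,\%}{2}
             = 76.035\,\%
```

This agrees with the reported value of 76.03 % to within rounding.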

B. Predicted Road Congestion Condition
After the vehicle data were collected, the Random Forest algorithm was used to predict road congestion conditions from the number of cars and motorcycles, the number of lanes, and the travel distance, with the parameters defined as in Table 3 to Table 7.
The Random Forest algorithm built a prediction model by combining the results of many separate decision trees grown on features such as the number of motorcycles, the number of cars, the number of lanes, and the travel distance, as shown in Fig. 5. To estimate road conditions, each decision tree in the Random Forest took several of these factors into account. The trained model could also be used to predict traffic congestion for new, previously unseen data.
The model's accuracy on the test data was calculated to assess the model's quality, by comparing the model's predictions with the actual class labels in the test data. The 'accuracy_score' function was used to measure accuracy.
Based on the results in Fig. 6, the model's accuracy on the test data was 89 %. This shows that the model predicted road conditions with a good level of accuracy based on the parameters described previously. Table 8, which reports the accuracy of the readings, shows an average accuracy of 91.66 %, which differs from the model accuracy by 2.66 percentage points. The accuracy of the Random Forest model was therefore quite close to the accuracy of the actual readings. Overall, the individual readings showed varying levels of accuracy. Of the 144 vehicle count records, 132 density condition results complied with the logic rules for road density conditions and 12 did not.
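The accuracy measure used above is simply the fraction of predictions that match the true labels. The sketch below shows that calculation in plain Python and applies it to the counts quoted in the text. The function name `accuracy` and the two-label toy example are hypothetical, and reading the reported 91.66 % as the ratio 132/144 is an inference from the numbers, not a statement made in the source.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that equal the corresponding true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)

# Toy example: one of two predictions is correct.
toy = accuracy(["smooth", "medium"], ["smooth", "traffic jam"])

# Counts quoted in the text: 132 of 144 records met the logic rules.
reading_accuracy = 132 / 144
print(f"{reading_accuracy:.4%}")
```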
After the Random Forest algorithm was applied to predict road density conditions, the next stage was to optimize the Random Forest model using Bayesian optimization. This optimization aimed to ensure a proper trade-off between avoiding overfitting and achieving high accuracy. The optimization process automatically explored and exploited the hyperparameter search space to find the optimal configuration. After the model was built, the best parameters were obtained, as shown in Fig. 7: (bootstrap: False), (max_depth: 10), (max_features: 0.9181940453820531), (min_samples_leaf: 1), (min_samples_split: 7), (n_estimators: 194).
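For reference, the reported configuration can be written as a Python dictionary. The underscore spellings assume that these are the parameter names of scikit-learn's `RandomForestClassifier`. The text does not name the library, so this mapping is an assumption.

```python
# Best hyperparameters as reported in the text (Fig. 7).
# Assumption: the keys follow scikit-learn's RandomForestClassifier naming.
best_params = {
    "bootstrap": False,                  # trees are not trained on bootstrap samples
    "max_depth": 10,                     # maximum depth of each tree
    "max_features": 0.9181940453820531,  # fraction of features tried per split
    "min_samples_leaf": 1,               # minimum samples required at a leaf
    "min_samples_split": 7,              # minimum samples required to split a node
    "n_estimators": 194,                 # number of trees in the forest
}

# With scikit-learn installed, a model could then be built as
#   RandomForestClassifier(**best_params)
for name, value in best_params.items():
    print(f"{name}: {value}")
```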
As Fig. 8 shows, the optimized model reached an accuracy of 92 % with a loss value of 0.078125, an increase of 3 percentage points over the 89 % of the unoptimized model.

C. Determination of the Best Path
The A* algorithm processed the data using a heuristic method to determine the best path. The best path was selected after taking into account the road density conditions and the required travel time. The Euclidean distance was used as the heuristic estimate of the remaining cost from the current node to the target node. Based on the road weights and the conditions encountered, the algorithm sought the path with the lowest overall weight. The six path options are shown in Fig. 9.
    def calculate_weight(distance, condition):
        # Scale the road length by a penalty that grows with congestion.
        if condition == "smooth":
            return distance * 1
        elif condition == "medium":
            return distance * 2
        elif condition == "traffic jam":
            return distance * 3
        # Fail loudly instead of silently returning None for an unknown label.
        raise ValueError(f"unknown road condition: {condition!r}")

Based on the distance and condition of the road, the calculate_weight function determines the weight of a road segment. The A* algorithm uses this weight when determining the best path. The function takes two inputs:
• Distance: The distance between the current node and a neighboring node.
• Condition: The state of the road, which may be "smooth," "medium," or "traffic jam." This condition affects the resulting weight.
The algorithm iterates over the current node's neighbors with the loop for neighbor, distance, condition in graph[current][1:]. For each neighbor, a new G(n) value, tentative_g_score, is computed, where the function calculate_weight(distance, condition) determines the weight of the road from its length and condition. If the newly computed G(n) value is less than the previous G(n) value for that neighbor, the information for that neighbor is updated by changing came_from, g_score, and F(n) (f_score).
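The update loop described above can be placed in a complete search as in the sketch below. It is an illustration under stated assumptions, not the authors' code. The graph layout, with each node's (x, y) coordinates at index 0 followed by (neighbor, distance, condition) tuples, is inferred from the `graph[current][1:]` slice. The four-node graph is invented, and `calculate_weight` is rewritten as a dictionary lookup with the same multipliers.

```python
import heapq
import math

CONDITION_FACTOR = {"smooth": 1, "medium": 2, "traffic jam": 3}

def calculate_weight(distance, condition):
    """Road length scaled by a congestion penalty."""
    return distance * CONDITION_FACTOR[condition]

def heuristic(graph, node, goal):
    """H(n): Euclidean distance between node and goal coordinates."""
    (x1, y1), (x2, y2) = graph[node][0], graph[goal][0]
    return math.hypot(x2 - x1, y2 - y1)

def a_star(graph, start, goal):
    """Return (path, total weight) of the lowest-weight route, or (None, inf)."""
    open_heap = [(heuristic(graph, start, goal), start)]  # entries are (F(n), node)
    came_from = {}
    g_score = {start: 0.0}
    while open_heap:
        _, current = heapq.heappop(open_heap)
        if current == goal:
            path = [current]
            while current in came_from:  # walk back to the start node
                current = came_from[current]
                path.append(current)
            return path[::-1], g_score[goal]
        for neighbor, distance, condition in graph[current][1:]:
            tentative_g_score = g_score[current] + calculate_weight(distance, condition)
            if tentative_g_score < g_score.get(neighbor, math.inf):
                came_from[neighbor] = current
                g_score[neighbor] = tentative_g_score
                f_score = tentative_g_score + heuristic(graph, neighbor, goal)
                heapq.heappush(open_heap, (f_score, neighbor))
    return None, math.inf

# Invented example: the route via B is shorter but jammed on its first edge.
graph = {
    "A": [(0, 0), ("B", 1.0, "traffic jam"), ("C", 1.5, "smooth")],
    "B": [(1, 0), ("D", 1.0, "smooth")],
    "C": [(1, 1), ("D", 1.5, "smooth")],
    "D": [(2, 0)],
}
path, cost = a_star(graph, "A", "D")
print(path, cost)
```

In this example the longer route through C is selected because the congestion penalty makes the shorter route through B more expensive. The Euclidean heuristic never overestimates the remaining weight here, because every edge weight is at least the straight-line distance between its endpoints.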
The algorithm then produced the best path according to the minimum-weight criterion, reporting the total weight of the road edges traveled between the start and end points. The output also showed the optimal path in detail, including the order in which the nodes must be traversed. To make the A* results easier to understand, the road graph was also drawn, with the edges on the optimal path shown in green and the edges of other roads shown in red. Path 5 was the most frequently selected path, chosen in 9 out of 12 instances. Its road conditions were often "smooth" and it was the shortest of the routes, which explains this selection. However, the shortest route is not necessarily the best route: if its road conditions tend to be "congested," it will not be selected as the best route. Road width also influenced the selection of the ideal path. Because wider roads can accommodate more vehicles, they lower the chance of congestion, which leads the A* algorithm to prefer them.

Table 1. YOLOv3 Model Testing Result

Table 2. Average Accuracy of Motorcycle and Car Readings

Table 3. Motorcycle Vehicle Count Parameter (column headers: Number of Motor Vehicles, Quantity)

Table 4. Car Vehicle Count Parameter (column headers: Number of Car Vehicles, Quantity, Numeric Value)

Table 5. Number of Lanes Parameter

Table 6. Travel Distance Parameter

Table 7. Road Condition Parameter

Table 8. Random Forest Reading Accuracy Results

Table 9. Bayesian Optimization Reading Accuracy Results (average accuracy: 92.36 %)