Disaster Mitigation Efforts Using K-Medoids Algorithm and Bayesian Network

Disaster mitigation is a series of efforts to reduce disaster risk. One such effort is supervising the implementation of spatial planning, and knowing the level of damage to buildings in a region in the event of a disaster supports that supervision. To predict the level of damage to buildings in an area, we can use the Bayesian network model. The Bayesian network is an extension of Naive Bayes. Based on the variable type, there are three kinds of Bayesian networks: discrete, continuous, and hybrid. A discrete Bayesian network is a Bayesian network model in which all the variables involved are discrete. Therefore, any continuous variable must first be discretized. In this paper, an algorithm commonly used for clustering is modified for use in the discretization process. The algorithm is K-Medoids, which uses existing data points as the representatives of the cluster centers. The Bayesian network model and the K-Medoids algorithm were then used to determine the level of damage to buildings due to the earthquake that occurred in West Sumatra in 2009. From 25,000 house damage data used in this study, we


Introduction
There are numerous barriers and limits to determining the exact moment of a natural disaster. At the same time, natural disasters frequently cause extensive material and non-material losses. By implementing disaster mitigation, we can lessen the impact of disaster-related losses. Disaster mitigation is a set of strategies for reducing disaster risk through physical development, disaster awareness, and disaster capacity building (Article 1 paragraph 6 PP No. 21 of 2008 concerning the Implementation of Disaster Management). Disaster mitigation aims to lessen the impact of disasters, particularly on the population, to serve as a foundation (guideline) for development planning, and to enhance public awareness of how to deal with and reduce disaster impact and risk so that people may live and work securely. Supervising the implementation of spatial planning is one of the disaster mitigation strategies. Knowing the extent of damage to buildings in a disaster area can support this supervision.
We can use the concept of probability to forecast the extent of damage to buildings in a given location. Any discussion of probability in this setting leads to Bayes' theorem. Bayes' theorem describes the link between the conditional probabilities of two events and has important applications in statistics. Naive Bayes, Hidden Naive Bayes, and Bayesian networks apply the principles of Bayes' theorem to classification. The Bayesian network is an extension of Naive Bayes. Based on the variable type, there are three forms of Bayesian networks: discrete, continuous, and hybrid. A discrete Bayesian network is a Bayesian network model in which all of the variables are discrete.
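For reference, Bayes' theorem for two events can be written as below. The event names $A$ and $B$ are generic symbols chosen here for illustration and are not notation from this study.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```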
In this paper, the authors predict the level of damage to buildings using the discrete Bayesian network. In the discrete Bayesian network, all variables must be discrete. Therefore, any continuous variable must first be discretized. Discretization converts a continuous variable into a discrete one by partitioning the range of values that the variable takes. A mapping is then made between each interval in the partition and a discrete value. Once the discretization is performed, the new variable can be treated as ordinal. Discretization can be seen as one of the possible data preprocessing techniques. These techniques can significantly improve the overall quality of the relationships extracted from the data and reduce the time required for analysis [1]. Discretization can or must be applied before using many statistical models. In fact, many models are designed primarily for processing categorical data, such as Naive Bayes (NB) [2] and the Bayesian network (BN) [3]-[7]. Both models examine the relationships between the variables of interest and allow the coexistence of discrete and continuous variables in the dataset under investigation.
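The mapping from intervals to ordinal values described above can be sketched as follows. This is a minimal illustration, not the procedure used in this study: the function name `discretize`, the cut points, and the sample values are all hypothetical.

```python
from bisect import bisect_right

def discretize(values, cut_points):
    """Map each continuous value to an ordinal label 1..len(cut_points)+1.

    Intervals are left-closed: a value equal to a cut point falls in the
    interval that starts at that cut point.
    """
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) + 1 for v in values]

# Hypothetical continuous values and cut points (not taken from the study data).
sample_values = [0.12, 0.31, 0.27, 0.45]
labels = discretize(sample_values, cut_points=[0.25, 0.40])
print(labels)
```

Two cut points produce three ordered categories, so the resulting variable can be treated as ordinal.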
Nevertheless, in the case of BN, a hybrid dataset imposes constraints on the parent-child relationships between variables. Discrete variables can only have discrete parents [8], which can be an unrealistic constraint in many applications. Probabilities need to be estimated for both BN and NB, which makes continuous variables challenging to handle. To avoid this problem, continuous variables are generally assumed to be normally distributed, but this assumption does not always reflect their nature. Moreover, even if the model can handle continuous variables, the learning process is less efficient and effective [9].
After the discretization process, the level of damage to buildings is classified using the Bayesian network. The Bayesian network (BN) is a probabilistic graphical model for expressing information about uncertain domains. Each node represents a random variable, and each edge represents a conditional dependence between the random variables it connects [10]. Bayesian networks have been widely applied in various fields, including mining, finance, health, and disaster mitigation.
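The graphical structure corresponds to a factorization of the joint distribution: each variable depends only on its parents in the graph. In the standard form below, $X_1, \dots, X_n$ and $\mathrm{Pa}(X_i)$ (the set of parents of $X_i$) are generic symbols assumed here for illustration, not the variable names of this study.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```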
In this study, we modified an algorithm commonly used for clustering so that it can be used in the discretization process. The algorithm is K-Medoids, which uses existing data points as the representatives of the cluster centers. The Bayesian network model and the K-Medoids algorithm were then used to determine the level of damage to buildings due to the earthquake in West Sumatra in 2009.
ISSN : 1411 3724 Eksakta : Berkala Ilmiah Bidang MIPA

Method

Procedure
This study began with a literature review of the theories relevant to the problem. The development carried out here is the discretization of the exogenous and endogenous variables that determine the level of damage to buildings, using the K-Medoids method. The level of damage to buildings is then classified using the Bayesian network model. Details of the research method can be seen in Figure 1.

K-Medoids Algorithm
K-Medoids is a partitional clustering approach that minimizes the total distance between the objects assigned to a cluster and the object designated as that cluster's center. In the K-Medoids or PAM algorithm, each cluster is centered on an actual object (the medoid). The K-Medoids approach overcomes the K-Means algorithm's sensitivity to noise and outliers, where objects with extreme values can deviate substantially from the data distribution. Another benefit is that the clustering result is independent of the order in which the records are entered. Procedure for the K-Medoids algorithm [1], [11]:
a) Initialize the cluster centers (medoids), one for each of the chosen number of clusters.
b) Assign all data (objects) to the nearest cluster using the Euclidean distance.
c) Choose one object from each cluster at random as a candidate for a new medoid.
d) Calculate the distance between each object in each cluster and the new candidate medoid.
e) Calculate the total deviation (S) as the new total distance minus the old total distance. If S < 0, swap the candidate objects with the current medoids to form the new set of medoids.
f) Repeat steps c to e until the medoids no longer change, which yields the clusters and their respective members.
The K-Medoids technique can be used not only to group objects but also to discretize continuous variables. For discretization, the number of features is limited to two. The first feature is the variable to be discretized, and the second is an auxiliary feature assumed to be constant.
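Steps a) to f) can be sketched in Python as below. This is an illustrative variant under stated assumptions, not the implementation used in this study: the first k objects serve as the initial medoids, and the random choice of candidates in step c) is replaced by a scan over all non-medoid objects so that the result is deterministic. All function names and the sample points are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def total_distance(points, medoids):
    """Sum of the distances from every object to its nearest medoid."""
    return sum(min(euclidean(p, points[m]) for m in medoids) for p in points)

def assign(points, medoids):
    """For every object, the position (in `medoids`) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: euclidean(p, points[medoids[j]]))
            for p in points]

def k_medoids(points, k):
    medoids = list(range(k))  # step a (assumption: first k objects)
    old_cost = total_distance(points, medoids)  # step b, expressed as a cost
    improved = True
    while improved:  # step f: stop when no swap changes the medoids
        improved = False
        for pos in range(k):
            for cand in range(len(points)):  # step c (scan instead of random draw)
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                new_cost = total_distance(points, trial)  # step d
                s = new_cost - old_cost  # step e: total deviation S
                if s < 0:  # the candidate becomes a medoid
                    medoids, old_cost, improved = trial, new_cost, True
    return medoids, assign(points, medoids)

# Hypothetical two-dimensional objects forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
medoids, labels = k_medoids(points, k=2)
print(medoids, labels)
```

Because the total distance strictly decreases with every accepted swap, the loop terminates. The scan over all candidates costs more per iteration than a random draw, which is a trade-off accepted here for reproducibility.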

Research Object
Data on damage to houses in Padang City due to the earthquake that hit West Sumatra on September 30, 2009, were obtained from the Regional Disaster Management Authority (RDMA) of Padang, and historical earthquake data were obtained from the Meteorological, Climatological, and Geophysical Agency (MCGA). Data on 25,000 individual houses were used to build the Bayesian network. Five exogenous variables and three endogenous variables were employed in this investigation. Exogenous variables are not influenced by other variables, whereas endogenous variables are. The variables were chosen based on past studies, including those by Bayraktarli et al. [13], [14] and Li et al. [15], [16]. Table 1 shows that the research data contain three continuous variables: Peak Ground Acceleration (variable 2), epicentral distance (variable 3), and distance to fault (variable 7). The K-Medoids algorithm is used to discretize these variables before the Bayesian network is used to classify the level of damage to buildings.

K-Medoids Algorithm for Variable Discretization
In addition to being used for object clustering, the K-Medoids algorithm can also be used to discretize continuous data. The number of features in the discretization process is limited to two. The first feature is the variable to be discretized, and the second is an auxiliary feature assumed to be constant. If there are $n$ objects in a set of objects $O = \{o_1, \dots, o_i, \dots, o_n\}$, then the set of variables is $V = \{V_1, V_2\}$, where $V_1 = \{v_{11}, v_{21}, \dots, v_{n1}\}$ and $V_2$ is constant. The subsequent stages of the procedure are the same as the steps of the K-Medoids method. Objects in the same cluster receive the same category.
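A minimal sketch of this two-feature construction is given below. It is an illustration under assumptions, not the code used in this study: the compact `pam` helper takes the first k objects as initial medoids and accepts any swap that lowers the total distance, the categories are ordered by medoid value so that 1 is the lowest, and the sample distances are hypothetical. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math

def pam(points, k):
    """Compact K-Medoids: accept swaps until the total distance stops decreasing."""
    def cost(meds):
        return sum(min(math.dist(p, points[m]) for m in meds) for p in points)

    medoids = list(range(k))  # assumption: first k objects as initial medoids
    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = cost(trial)
                if c < best:
                    medoids, best, improved = trial, c, True
    return medoids

def discretize_with_k_medoids(values, k, constant=0.0):
    # Feature 1 is the variable to discretize; feature 2 is the constant auxiliary feature.
    points = [(v, constant) for v in values]
    medoids = pam(points, k)
    # Order the medoids by value so that the categories are ordinal (1 = lowest).
    ordered = sorted(medoids, key=lambda m: values[m])
    return [1 + min(range(k), key=lambda j: abs(v - values[ordered[j]]))
            for v in values]

# Hypothetical distances in km (not taken from the study data).
distances_km = [12.0, 15.5, 14.1, 61.0, 58.3, 63.7]
categories = discretize_with_k_medoids(distances_km, k=2)
print(categories)
```

Because the second feature is identical for every object, it contributes nothing to the Euclidean distance, so the clustering is driven entirely by the variable being discretized.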
However, before discretizing the variables, the best number of clusters for each variable must be determined. The elbow method was used for this purpose in this investigation. In the elbow method, the sum of squared errors (SSE) is computed for each candidate number of clusters K, and the optimal K is taken at the point where the SSE shows a significant decline [17]. Figure 2 shows that, according to the elbow method, the optimal number of clusters for each variable is two. After determining the best number of clusters for each variable, the variables are discretized using the K-Medoids algorithm. Figure 3 shows the discretization results for each variable. After the grouping in Figure 3 is complete, cluster validation is performed, which is referred to as discretization validation in this case. The discretization validation level of a variable is the ratio of the number of objects with a positive silhouette coefficient to the total number of objects (Figure 4). The discretization validation level in this study is 98 percent for Peak Ground Acceleration (variable 2) and epicentral distance (variable 3), and 96 percent for distance to fault (variable 7).
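The validation level described above can be sketched as follows for a single discretized variable. This is an illustration under assumptions rather than the computation behind Figure 4: it uses the standard silhouette coefficient s = (b - a) / max(a, b) with absolute difference as the one-dimensional distance, assigns 0 to objects in singleton clusters, and expects at least two clusters. The function names, values, and labels are hypothetical.

```python
def silhouette_values(values, labels):
    """Silhouette coefficient of every object in one-dimensional data."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)

    result = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            result.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the object's own cluster
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(abs(v - u) for u in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        result.append((b - a) / denom if denom > 0 else 0.0)
    return result

def validation_level(values, labels):
    """Share of objects whose silhouette coefficient is positive."""
    s = silhouette_values(values, labels)
    return sum(1 for x in s if x > 0) / len(s)

# Hypothetical values; the last object is deliberately placed in the wrong category.
values = [12.0, 15.5, 14.1, 61.0, 58.3, 63.7, 40.0]
labels = [1, 1, 1, 2, 2, 2, 1]
level = validation_level(values, labels)
print(round(level, 3))
```

An object with a negative coefficient lies closer, on average, to another category than to its own, so a level near 100 percent indicates that few objects sit on the wrong side of a category boundary.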

Formation of Bayesian Network Structure
Forming the BN structure is the next stage. In this study, the BN structure was formed using expert knowledge drawn from earlier studies, including Bayraktarli et al. [13], [14] and Li et al. [15], [16]. Two key elements, exposure factors and system resilience factors, determine the amount of damage to buildings caused by earthquakes [12]. Exposure factors include magnitude, depth, epicentral distance, hypocentral distance, and other earthquake-related characteristics. System resilience factors relate to the environmental conditions that cause disasters and to the attributes of the building.
In 2012, Li focused his research on the extent of damage caused by earthquakes from a human perspective, which is influenced by exposure factors and system resilience factors. Meanwhile, the research conducted by Bayraktarli [13], [14] considers the level of damage only in terms of the exposure factor. The relationships between the variables and the probabilities for each variable can be seen in Figure 5. From the probability of each level of damage, it can be concluded that the West Sumatra earthquake in 2009 caused damage to houses in the city of Padang mostly at level two, which is moderate damage.
Next, the model's performance is assessed. The first stage of the evaluation is compiling a confusion matrix for the level of damage. The level of damage variable has three states: mild (state 1), moderate (state 2), and severe (state 3). The confusion matrix compares the predicted and actual results, and complete details can be seen in Table 2. The calculation results show that the model's accuracy is 95.17%.
The BN model provides a probabilistic approach to inference. Inference in a BN is obtained from the relationships between the nodes in the structure. Suppose the values of all the variables determining the level of damage are known, as shown in Table 3. Then, using the GeNIe software, the Bayesian network structure and the probability table for each variable are obtained, as shown in Figure 6. With the variable values shown in Table 3, the level of damage is most likely in the low (mild) category, with a probability of 100%.
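The accuracy derived from a confusion matrix is the share of objects on the diagonal, that is, the objects whose predicted state equals the actual state. The sketch below illustrates this calculation for three states. The counts are hypothetical and are not the values of Table 2.

```python
def accuracy(confusion):
    """Accuracy = correctly classified objects / all objects.

    confusion[i][j] is the number of objects with actual state i
    that were predicted as state j.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Hypothetical counts for the states mild, moderate, and severe.
confusion = [
    [50, 3, 0],
    [2, 120, 4],
    [0, 1, 20],
]
acc = accuracy(confusion)
print(f"{acc:.2%}")
```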