The Analysis of Factors Influencing Incidence rates of Toddler Pneumonia in Purwakarta Districts Using Panel Data Spatial Regression

Pneumonia is an acute respiratory infection that attacks the lungs and inflames the air sacs, because the alveoli fill with pus and fluid. This research aims to identify factors influencing pneumonia and to map its incidence rate for toddlers in Purwakarta Regency. Many factors influence pneumonia, but because of limited data or information, some factors cannot be included in the model; these are called omitted variables. The incidence rates of toddler pneumonia in the sub-districts of Purwakarta Regency are assumed to be related to one another, that is, to have spatial dependency. Therefore, pneumonia is modeled with the Fixed Effect Spatial Model, which can accommodate spatial aspects. The results show that MR2 measles immunization, low birth weight, exclusive breastfeeding

Based on Figure 1, the incidence rates of toddler pneumonia in the Purwakarta District over the last six years fluctuate, and their values are far above the target set by the 2020 work plan (Renja) of the Dinas Kesehatan Kabupaten Purwakarta, which is less than 245 per 100,000 residents. To suppress the incidence rate of and deaths from toddler pneumonia, the Purwakarta Health Office has carried out various prevention efforts, including early detection of pneumonia. To support these efforts, we propose a method for modeling the factors suspected of affecting the incidence rate of pneumonia, which are called risk factors. The risk factors for pneumonia can be classified into individual, behavioral, and environmental factors. However, because of limited data availability, not all of these risk factors are included in the analysis. Variables that should be included in the model but are left out are called omitted variables. Failing to take these variables into account may lead to biased and inconsistent estimators [12]. To overcome this issue, we use panel data regression, which accounts for omitted variables.
Pneumonia is an infectious disease transmitted by pneumonia sufferers, who spread the virus through airborne droplets when coughing or sneezing. Consequently, pneumonia may also develop in adjacent areas. The analysis should therefore consider spatial dependence in order to determine how an observation in one area relates to those in adjacent areas. Research that accounts for spatial dependence, because it can affect the result of the analysis, includes [13][14][15]. In this paper, we fit a model of the factors that affect the incidence rate of toddler pneumonia using panel data spatial regression. In addition, to identify areas prone to toddler pneumonia, we create an incidence rate map of the sub-districts in Purwakarta Regency.

Methods
The method used in this study is Spatial Panel Data Regression, which is an extension of Panel Data Regression. Spatial panel data regression has two types: the spatial lag fixed effect model and the spatial error fixed effect model. Each model is explained below.

Panel Data Regression
Panel data regression combines cross-section and time series data so that the number of observations is large. According to [16], the panel data regression model is expressed as $y_{it} = \alpha + x_{it}'\beta + u_{it}$, where i indexes the cross-section unit, i = 1, 2, …, N; t indexes the time period, t = 1, 2, …, T; $x_{it}$ is the vector of independent variables, $\beta$ is the vector of regression coefficients, and $u_{it}$ denotes the error. In this study, we select one of two approaches to the panel data regression model: the fixed effect model (a model with individual effects) and the random effect model (a model with error effects). The fixed effect model is a model with different effects between individuals. The slope coefficient in the fixed effect model is constant over time, but the intercept varies between cross-section units. The general form of the fixed effect model according to [16] is $y_{it} = \alpha + x_{it}'\beta + \mu_i + \varepsilon_{it}$, where $\mu_i$ is the specific effect of individual i and $\varepsilon_{it}$ denotes the error. The differing intercepts in the fixed effect case can be interpreted as representing information missing from the model because of omitted variables.
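As an illustration of how the fixed effect model absorbs unit-specific intercepts, the following minimal sketch applies the within (demeaning) transformation to a toy panel with a single regressor. The data, unit names, and the function `within_estimator` are hypothetical and are not taken from the study; the estimator shown is the standard non-spatial within estimator, not the spatial maximum likelihood estimator used later in the paper.

```python
# Within (demeaning) transformation for a fixed effect panel model.
# Toy data only: three units observed over four periods, one regressor.

def within_estimator(y, x):
    """y, x: dict mapping unit -> list of observations over time."""
    num = 0.0
    den = 0.0
    for unit in y:
        y_bar = sum(y[unit]) / len(y[unit])
        x_bar = sum(x[unit]) / len(x[unit])
        # Subtracting unit means removes the unit-specific intercept.
        for y_it, x_it in zip(y[unit], x[unit]):
            num += (x_it - x_bar) * (y_it - y_bar)
            den += (x_it - x_bar) ** 2
    beta = num / den
    # Unit-specific intercepts are recovered from the unit means.
    effects = {
        u: sum(y[u]) / len(y[u]) - beta * sum(x[u]) / len(x[u]) for u in y
    }
    return beta, effects

x = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 3.0, 5.0, 6.0],
    "C": [0.0, 1.0, 1.0, 2.0],
}
true_beta = 2.0
mu = {"A": 10.0, "B": -5.0, "C": 3.0}  # hypothetical unit effects
# Noise-free outcomes so the recovered values can be checked exactly.
y = {u: [mu[u] + true_beta * v for v in x[u]] for u in x}

beta_hat, effects_hat = within_estimator(y, x)
print(beta_hat, effects_hat)
```

Because the toy outcomes contain no noise, the slope and the three unit effects are recovered exactly, which shows that the intercept differences do not contaminate the slope estimate.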
On the other hand, the random effect model is a panel data model in which differences in individual characteristics are treated as part of the error component. Thus, individual characteristics are absorbed into the error of the model. The slope and intercept coefficients in the random effect model are constant. The general form of the random effect model is $y_{it} = \alpha + x_{it}'\beta + \mu_i + \varepsilon_{it}$, where $\mu_i$ is the cross-section error component.
Spatial regression is an analysis that evaluates the relationship between variables while considering spatial effects, that is, the presence of an area or location effect. The general model of spatial regression according to [17] is $y = \lambda W y + X\beta + u$, $u = \rho W u + \varepsilon$, where y is the dependent variable vector, X is the independent variable matrix, $\lambda$ is the spatial lag autoregression coefficient, $\rho$ is the spatial error autoregression coefficient, $\beta$ is the vector of regression coefficients, W is the spatial weight matrix, u is the error vector containing autocorrelation, and $\varepsilon$ is an autocorrelation-free error vector. The specification of a spatial model might be unstable across sample periods. To test the stability of the model, a simple test has been proposed by [18].
There are three model approaches in spatial regression: Spatial Autoregressive (SAR), Spatial Error Model (SEM), and Spatial Lag X (SLX). SAR applies when the dependent variable depends on the dependent variable in neighboring units. SLX applies when the dependent variable depends on the independent variables in neighboring units [12]. SEM applies when the error in one unit depends on the errors in neighboring units.

Eksakta: Berkala Ilmiah Bidang MIPA, ISSN: 1411-3724

Parameter estimation for a spatial panel model uses the maximum likelihood estimator. Estimation techniques have been developed based on an asymptotic approach or for handling time-varying covariates and spatially correlated errors [19][20]. A robust inference for the linear panel regression model was proposed by [21]. In the case of skewed spatial data, a Bayesian approach is employed to perform the analysis [22][23], while if the spatial outcomes are multivariate, one can use the method introduced by [24].

Spatial Panel Data Regression
Spatial panel data regression analyzes data combined across individuals and time (panel data) while accounting for spatial influences, in order to determine the relationship between the dependent variable and several independent variables. Research that uses panel data methods and their extensions can be found in [25][26][27][28][29]. There are two types of fixed effect models in Spatial Panel Data Regression: the Spatial Lag Fixed Effect Model and the Spatial Error Fixed Effect Model.

Spatial Lag Fixed Effect Model (SAR-FEM)
The spatial lag fixed effect model applies when the dependent variable depends on the observed independent variables and on the dependent variable in neighboring units. The spatial lag fixed effect model is stated as follows: $y_{it} = \lambda \sum_{j=1}^{N} w_{ij} y_{jt} + x_{it}'\beta + \mu_i + \varepsilon_{it}$ (1) where $y_{it}$ is the dependent variable, $x_{it}$ is the vector of independent variables, $\lambda$ is the spatial lag autoregression coefficient, $\beta$ is the vector of regression coefficients, $w_{ij}$ is an element of the standardized spatial weight matrix, $\mu_i$ is the specific effect of individual i, and $\varepsilon_{it}$ is the error. Extensions of model (1) using quantile regression can be found in [30][31].
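The spatial lag term in (1) is a weighted sum of the dependent variable over neighboring units. A minimal sketch of that computation for a single period follows; the three-unit weight matrix and the values are made up for illustration, and `spatial_lag` is a hypothetical helper name.

```python
# Spatial lag for one period: for each unit i, sum over j of w_ij * y_jt.

def spatial_lag(W, y_t):
    """W: row-standardized weight matrix (list of rows); y_t: values at time t."""
    return [sum(w_ij * y_j for w_ij, y_j in zip(row, y_t)) for row in W]

# Hypothetical layout: unit 0 borders units 1 and 2; units 1 and 2 border only unit 0.
W = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
y_t = [10.0, 20.0, 40.0]

lagged = spatial_lag(W, y_t)
print(lagged)
```

With row-standardized weights, each lagged value is simply the average of the neighbors' values, so unit 0 receives the mean of units 1 and 2.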

Spatial Error Fixed Effect Model (SEM-FEM)
In the spatial error fixed effect model, the error in one region depends on the errors in neighboring regions. Parameter estimation in this model uses the Maximum Likelihood Estimator (MLE). According to [17], the spatial error panel model is $y_{it} = x_{it}'\beta + \mu_i + u_{it}$, $u_{it} = \rho \sum_{j=1}^{N} w_{ij} u_{jt} + \varepsilon_{it}$ (2) where $y_{it}$ is the dependent variable, $x_{it}$ is the vector of independent variables, $\rho$ is the spatial error autoregression coefficient, $\beta$ is the vector of regression coefficients, $w_{ij}$ is an element of the standardized spatial weight matrix, $\mu_i$ is the specific effect of individual i, $u_{it}$ is the spatially autocorrelated error, and $\varepsilon_{it}$ is the error.
Using the concepts presented above, we now present the steps for data analysis. There are seven steps, beginning with the determination of the spatial weight matrix and ending with a goodness of fit test of the model. The steps for data analysis are also shown in Figure 2.

Results and Discussions
This research produces an incidence rate map and the results of several hypothesis tests: the spatial dependency test, the Panel Data Spatial Regression model test, the Hausman test, the spatiotemporal non-autocorrelation test, the homoscedasticity test, the normality test, the parameter significance test, and the goodness of fit test. Figure 3 presents the incidence rate map of pneumonia in Purwakarta Regency in 2021. The incidence rate for each sub-district is indicated by a different color. The brighter the color, the smaller the incidence rate. Dark regions, on the other hand, indicate a high incidence rate of pneumonia, which means a high risk of toddlers contracting pneumonia. In 2021, the sub-district at the highest risk for toddler pneumonia is Wanayasa, while Campaka is at the lowest risk.

Spatial Weighting Matrix
A spatial weighting matrix is created to indicate which sub-districts are neighbors. In our analysis, we created a spatial weighting matrix (C) based on Queen Contiguity, with dimensions 17 × 17. The values in matrix C are then standardized to produce the spatial weight matrix W. The spatial weighting matrix should instead be estimated in the case of unknown heteroscedasticity [32], a sparse spatial dependence structure [33], or when structural breaks exist [34]. In the first case, an M-estimation strategy is used and extended to handle heteroscedasticity. In the second case, the adaptive least absolute shrinkage and selection operator (LASSO) is used to select and estimate the individual nonzero connections of the spatial weight matrix. In the final case, a two-stage LASSO is employed to estimate a full spatial weight matrix.
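The standardization of C into W can be sketched as follows, assuming the usual row standardization in which each row is divided by its sum so that the weights of every unit add up to one. The 4 × 4 contiguity matrix is a made-up example, not the 17 × 17 matrix of the study, and `row_standardize` is a hypothetical helper name.

```python
# Row-standardize a binary contiguity matrix C into a weight matrix W.

def row_standardize(C):
    W = []
    for row in C:
        total = sum(row)
        # A unit with no neighbors keeps a zero row to avoid division by zero.
        W.append([c / total if total > 0 else 0.0 for c in row])
    return W

# Hypothetical contiguity: 1 means the two regions share a border or a corner.
C = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]

W = row_standardize(C)
for row in W:
    print(row)
```

After standardization the matrix is generally no longer symmetric: region 1 gives region 3 a weight of one third, while region 3 gives region 1 a weight of one.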

Spatial Dependency Test
The spatial dependency test was performed in R version 3.6.1 with the "Moran ST" package. The Moran ST index is 0.5 with a p-value of 0.01. The index lies in the range 0 < I ≤ 1, meaning that there is a positive spatial dependency between sub-districts. The p-value of 0.01 is less than α = 0.05, so H0 is rejected. In conclusion, with a 95% confidence level, there is a spatial dependency between sub-districts in Purwakarta Regency.
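To make the sign interpretation of the index concrete, the sketch below computes the ordinary cross-sectional Moran's I for a single period. This is a simplification: the study uses the spatiotemporal Moran ST index, which is not reproduced here. The four-region chain and the values are hypothetical.

```python
# Cross-sectional Moran's I (a simplification of the spatiotemporal index).
# I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z the centered values.

def morans_i(values, W):
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in W)  # sum of all weights
    cross = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Four regions in a row; each region neighbors the one next to it.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

i_trend = morans_i([1.0, 2.0, 3.0, 4.0], W)        # similar neighbors
i_alternating = morans_i([1.0, 4.0, 1.0, 4.0], W)  # dissimilar neighbors
print(i_trend, i_alternating)
```

Neighboring regions with similar values give a positive index, while an alternating pattern gives a negative one, which matches the interpretation of a positive index as positive spatial dependency.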

Panel Data Spatial Regression Model test
A Lagrange Multiplier (LM) test is performed to determine whether there is spatial dependence in the lag or in the error. The results of the LM test are shown in Table 1. For the spatial error, the p-value of 0.0497 is less than α = 0.05, so H0 is rejected. That is, with a 95% confidence level, it can be concluded that there is spatial dependence in the error of the regression model. In contrast, the test for the Spatial Lag Model is not significant. The next test is the Hausman test, which is used to choose between the spatial random effect model and the spatial fixed effect model. The Hausman statistic is 12.454 with a p-value of 0.029. Since the p-value is less than α = 0.05, H0 is rejected, which means that the Fixed Effect Model is appropriate.
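The reported Hausman p-value can be checked with a short calculation. The sketch below assumes that the statistic is compared with a chi-square distribution with 5 degrees of freedom, one per predictor; the source does not state the degrees of freedom, so this value is an assumption. The function `chi2_sf` is a hand-written helper for integer degrees of freedom, not a library call.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if df % 2 == 0:
        q, k = math.exp(-x / 2.0), 2               # exact value for df = 2
    else:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1    # exact value for df = 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

hausman_stat = 12.454   # value reported in the text
assumed_df = 5          # assumption: one degree of freedom per predictor
p_value = chi2_sf(hausman_stat, assumed_df)
print(round(p_value, 3))
```

Under this assumption the computed p-value is close to the reported 0.029 and falls below α = 0.05, consistent with rejecting the random effect specification.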

Parameter estimation
Parameter estimation of the spatial error fixed effect model (SEM-FEM) was carried out using the maximum likelihood estimator (MLE). The purpose is to fit the model and to determine the contribution of the factors suspected of influencing toddler pneumonia in Purwakarta Regency over three years (2019-2021). The results are shown in Table 2, which gives the estimates of the Fixed Effect Model parameters. According to Table 2, MR2 measles immunization, low birth weight, exclusive breastfeeding, and clean and healthy living behavior influence the incidence rate of toddler pneumonia, while nutritional status does not. Based on the signs of the regression coefficients, the percentages of MR2 measles immunization, exclusive breastfeeding, and clean and healthy living behavior have a negative relationship with the incidence rate. On the other hand, low birth weight has a positive relationship with the incidence rate. The fitted model based on the parameter estimates in Table 2 is given in (3). Table 3 presents the specific effect for each sub-district. Wanayasa sub-district has the highest effect, while Bojong sub-district has the smallest. The effect of each sub-district is added to (3).

Spatiotemporal Non-Autocorrelation Test
Spatiotemporal non-autocorrelation can be tested using the Moran ST index. A Moran ST value close to one indicates strong positive spatiotemporal autocorrelation in the model error. The Moran ST statistic is -0.152 with a p-value of 0.911, meaning that the test is not significant. Therefore, with a 95% confidence level, it can be concluded that there is no spatiotemporal autocorrelation in the model error.

Homoscedasticity Test
If the error variance is not constant, then the assumption of homoscedasticity is violated. Homoscedasticity testing can be done using the Breusch-Pagan test statistic. Obtained = 0.663 <

Non-Multicollinearity Detection
Multicollinearity can be detected by examining the determinant of the matrix X'X. If the determinant is positive, then there is no perfect multicollinearity. The determinant obtained is $\det(X'X) = 7.369 \times 10^{24}$. Thus, it can be concluded that there is no perfect multicollinearity. Furthermore, the Variance Inflation Factor (VIF) is computed to check whether multicollinearity exists between the predictor variables, and the results are shown in Table 4. Based on Table 4, none of the independent variables exhibits multicollinearity.
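Both checks can be sketched with NumPy as below. The data are simulated and do not come from the study; the sample size of 51 merely mirrors 17 sub-districts over three years. The threshold of 10 for the VIF is a common rule of thumb and is an assumption here, since the source does not state which cutoff was applied. The function `vif` is a hypothetical helper.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (n x p, predictors without an intercept column)."""
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept and all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
X = rng.normal(size=(51, 3))                 # simulated, unrelated predictors
det_xtx = np.linalg.det(X.T @ X)             # positive when columns are independent

# Add a fourth column that nearly duplicates the first one.
X_collinear = np.column_stack([X, X[:, 0] + 0.01 * rng.normal(size=51)])

print(det_xtx > 0, vif(X), vif(X_collinear))
```

The unrelated predictors give VIF values near one, while the nearly duplicated column pushes the VIF of both affected predictors far above the rule-of-thumb cutoff.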

Normality Distribution Test
The normality assumption is tested to see whether the errors from the regression model are normally distributed. This study uses the Jarque-Bera (JB) test statistic. The JB statistic obtained from the analysis is 1.09 with a p-value of 0.58. Since the p-value is greater than α = 0.05, the hypothesis of normality is not rejected, which means that the errors can be regarded as normally distributed.
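The JB statistic combines the sample skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4) and is compared with a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(−JB/2). The sketch below implements this formula on made-up residuals and also recomputes the p-value for the reported statistic of 1.09; `jarque_bera` is a hand-written helper, and the residual values are hypothetical.

```python
import math

def jarque_bera(residuals):
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments with divisor n, as in the standard JB definition.
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square(2) upper-tail probability
    return jb, p_value

jb_toy, p_toy = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])  # hypothetical residuals

# p-value implied by the JB statistic reported in the text.
reported_p = math.exp(-1.09 / 2.0)
print(jb_toy, p_toy, round(reported_p, 2))
```

The p-value implied by JB = 1.09 rounds to the 0.58 reported in the text.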

Parameter Significance Test
To test the significance of the individual parameters in the model, namely the coefficient of spatial error autocorrelation ($\rho$) and the coefficients of the independent variables, one can use the Wald test. The results of the test are given in Table 5. Table 5 shows that the coefficients of MR2 measles immunization ($\beta_1$), low birth weight ($\beta_2$), exclusive breastfeeding ($\beta_3$), and clean and healthy living behavior ($\beta_5$) affect the incidence rate (IR) of toddler pneumonia at the 5% significance level. Meanwhile, malnutrition status ($\beta_4$) does not significantly affect the IR of toddler pneumonia.

Goodness of Fit
The goodness of fit test of the pneumonia incidence rate model yields a coefficient of determination ($R^2$) of 82.77%. This value indicates that the independent variables in the regression model explain 82.77% of the variation in pneumonia incidence rates. The remaining 17.23% is explained by other variables that are not included in the model.
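As a reminder of how this percentage is read, the sketch below computes the basic coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares. The observed and fitted values are made up, and the source does not say which of the several R² definitions available for fixed effect panel models was used, so this shows only the basic form.

```python
# Basic coefficient of determination: R^2 = 1 - SSE / SST.

def r_squared(observed, fitted):
    mean = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))  # unexplained
    sst = sum((o - mean) ** 2 for o in observed)               # total variation
    return 1.0 - sse / sst

observed = [1.0, 2.0, 3.0, 4.0]   # hypothetical incidence rates
fitted = [1.1, 1.9, 3.2, 3.8]     # hypothetical model predictions

r2 = r_squared(observed, fitted)
print(round(r2, 4))
```

In this toy case the model leaves 2% of the variation unexplained, in the same way that the study's model leaves 17.23% unexplained.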

Conclusions
Based on the analysis, the Spatial Error Fixed Effect Model is more appropriate for this case study than the Spatial Error Random Effect Model. The predictors MR2 measles immunization (X1), low birth weight (X2), exclusive breastfeeding (X3), and clean and healthy living behavior (X5) have significant effects on the incidence rate (IR). In addition, the map suggests that Campaka and Darangdan sub-districts have low incidence rates. In contrast, Wanayasa sub-district has the highest toddler pneumonia incidence rate.