Multiple comparisons of precipitation variations in different areas using simultaneous confidence intervals for all possible ratios of variances of several zero-inflated lognormal models

Environmental Science

Introduction and Motivation

In early 2021, approximately 186,300 people in lower southern Thailand were affected by heavy rainfall resulting in flash flooding, landslides, and windstorms, as reported by Thailand’s Department of Disaster Prevention and Mitigation (DDPM) (Thailand, 2021). Four provinces in the lower southern region of Thailand were affected by flooding: Songkhla (60 households), Pattani (2,810 households), Yala (12,082 households), and Narathiwat (22,308 households). Meanwhile, landslides in Yala and Narathiwat affected approximately 57 households (Thailand, 2021). Unfortunately, these natural disasters resulted in deaths and injuries (David, 2021).

It would be possible to reduce the impact of natural disasters if governmental organizations had an early warning system that could warn people in high-risk areas in advance of impending catastrophes. Analyzing the dispersion of historical precipitation data can provide essential information, since high variation may indicate imminent flooding. Importantly, it could also be used to predict precipitation variation in each area. In the historical records associated with flooding in lower southern Thailand, the precipitation data in the four areas contain many zero observations, while the non-zero precipitation records are log-normally distributed, as can be seen in the section on the empirical application. These properties indicate that the precipitation data satisfy the assumptions of a zero-inflated lognormal (ZILN) distribution and can be modeled accordingly.

The ZILN model, also referred to as the delta-lognormal model, is appropriate for modeling right-skewed data with a proportion of zero (Aitchison & Brown, 1963; Fletcher, 2008; Wu & Hsieh, 2014; Hasan & Krishnamoorthy, 2018; Maneerat, Niwitpong & Niwitpong, 2019). Variance is a dispersion measure of probability used in statistical inference for both point and interval (e.g., confidence interval: CI) estimation. Several researchers have formulated point and interval estimates via various approaches. For example, Burdick & Graybill (1984) established CIs for linear combinations of the variance components using the unbalanced one-way classification model and the Graybill-Wang procedure by considering the inequality of the design (Graybill & Wang, 1980). Ciach & Krajewski (1999) estimated the radar-raingauge difference variances which can be separated into the area-point ground raingauge originating from resolution difference between them, and the error of the radar area-average rainfall estimate. Another important approach for variance estimation is bootstrapping based on t-statistics to formulate nonparametric CIs for a single variance and the difference between variances, which was used to estimate the variance in insurance data for properties (Cojbasic & Tomovic, 2007). Bebu & Mathew (2008) used a modified single log-likelihood ratio procedure to construct CIs for the ratio of bivariate lognormal variances and applied it to compare variation in health care costs. Cojbasic & Loncar (2011) suggested Hall’s bootstrapped-t method for constructing one-sided CIs (lower and upper endpoint CIs) for the variances of skewed distributions and illustrated the efficacy of their method by analyzing revenue variability within the food retail industry.

Later, Herbert et al. (2011) suggested an analytical method for the difference between two independent variances that performed well even with small unequal sample sizes and highly skewed leptokurtic data; they used data from a randomized trial for a cholesterol-lowering drug to portray the efficacies of their proposed methods. Harvey & Merwe (2012) revealed that a Bayesian CI based on the highest posterior density outperformed one based on the equal-tailed interval for the variance of lognormal distribution with zero observations. Maneerat, Niwitpong & Niwitpong (2020) showed that the highest posterior density interval based on a probability matching prior produced the narrowest interval with correct coverage for comparing delta-lognormal variances; they applied it to estimate the difference between rainfall variability in the lower and upper northern regions of Thailand. Recently, Bayesian credible intervals based on a non-informative prior were presented by Maneerat, Niwitpong & Niwitpong (2021a) for the single variance of a delta-lognormal model that was used on daily rainfall records.

Nevertheless, no studies have yet been conducted on simultaneous CIs (SCIs) for pairwise comparisons of the variances of several ZILN models, and so we directed our research toward filling this gap. Hence, we estimated all possible ratios of variances of several ZILN models by using SCIs based on Bayesian, parametric bootstrap (PB), and generalized pivotal quantity (GPQ) approaches. The reasons for choosing them are that the Bayesian and PB approaches can be used to construct CIs capable of handling situations with large differences in the variances and a high proportion of zero values of delta-lognormal models, respectively (Maneerat, Niwitpong & Niwitpong, 2020), while the CI based on the GPQ approach performs quite well when the variance is large (maneeratEstimatingFishDispersal2020). Their efficacies were determined via simulation studies and precipitation data from four areas of the lower southern region of Thailand in terms of the coverage rate (CR), the lower error rate (LER), the upper error rate (UER), and the average width (AW).

Model and methods

Model

For $h$ groups, $d_i$; $i = 1, 2, \ldots, h$, denotes the probability of having zero observations, while the non-zero observations, with the remaining probability $d_i' = 1 - d_i$, follow a lognormal distribution denoted as $LN(\mu_i, \sigma_i^2)$ with mean $\mu_i$ and variance $\sigma_i^2$. For random samples from the groups, let $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})$ denote a ZILN variate based on $n_i$ observations from group $i$ with the probability density function given by

$$g(y_i; d_i, \mu_i, \sigma_i^2) = d_i + d_i' \frac{1}{y_i (2\pi\sigma_i^2)^{1/2}} \exp\left\{-\frac{(\ln y_i - \mu_i)^2}{2\sigma_i^2}\right\}.$$
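As a minimal sketch of this mixture, the following draws ZILN observations with Python's standard library. The function name `sample_ziln`, the seed, and the parameter values are illustrative assumptions, not part of the original study.

```python
import math
import random


def sample_ziln(n, d, mu, sigma2, rng):
    """Draw n values: 0 with probability d, otherwise lognormal(mu, sigma2)."""
    sigma = math.sqrt(sigma2)  # lognormvariate expects the standard deviation
    return [0.0 if rng.random() < d else rng.lognormvariate(mu, sigma)
            for _ in range(n)]


rng = random.Random(2021)  # arbitrary seed, used only for reproducibility
y = sample_ziln(n=2000, d=0.3, mu=1.0, sigma2=2.0, rng=rng)
zero_fraction = sum(1 for v in y if v == 0.0) / len(y)
print(round(zero_fraction, 3))  # should be close to d = 0.3
```

The zero fraction fluctuates around $d_i$, and the logarithms of the positive draws behave like a normal sample with mean $\mu_i$ and variance $\sigma_i^2$.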

For $Y_i = 0$, the number of zero observations $n_{i0}$ follows a binomial distribution with sample size $n_i$ and the probability of having zero observations $d_i$, where $n_i = n_{i0} + n_{i1}$, $n_{i0} = \#\{j: Y_{ij} = 0\}$ and $n_{i1} = \#\{j: Y_{ij} > 0\}$; $j = 1, 2, \ldots, n_i$. For $Y_i > 0$, $W_i = \ln Y_i$ are normally distributed with mean $\mu_i$ and variance $\sigma_i^2$. For a ZILN model, the maximum likelihood estimates of $d_i$, $\mu_i$ and $\sigma_i^2$ are $\hat d_i = n_{i0}/n_i$, $\hat\mu_i = \sum_{j: Y_{ij} > 0} \ln Y_{ij} / n_{i1}$ and $\hat\sigma_{i,mle}^2 = \sum_{j: Y_{ij} > 0} (\ln Y_{ij} - \hat\mu_i)^2 / n_{i1}$, respectively. For the $i$th group, the population variance of $Y_i$ is given by

$$V_i = d_i' \exp(2\mu_i + \sigma_i^2)\left[\exp(\sigma_i^2) - d_i'\right],$$

which can be log-transformed as $T_i = \ln V_i = \ln d_i' + 2(\mu_i + \sigma_i^2) + \ln\left[1 - d_i' \exp(-\sigma_i^2)\right]$. Considering the third term of $T_i$ leads to $\lim_{\sigma_i^2 \to \infty} \ln\left[1 - d_i' \exp(-\sigma_i^2)\right] = 0$, so this term is negligible when $\sigma_i^2$ is large. Thus, the log-transformed variance of $V_i$ can be approximated as $T_i \approx \ln d_i' + 2(\mu_i + \sigma_i^2)$.
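The quality of this approximation can be checked numerically. The sketch below compares the exact log-variance with the approximation for increasing $\sigma_i^2$; the function names and the parameter values ($d_i = 0.3$, $\mu_i = 1$) are illustrative assumptions.

```python
import math


def ziln_variance(d, mu, sigma2):
    """Population variance of a ZILN variate with zero probability d."""
    dp = 1.0 - d  # probability of a non-zero observation
    return dp * math.exp(2 * mu + sigma2) * (math.exp(sigma2) - dp)


def log_variance_exact(d, mu, sigma2):
    dp = 1.0 - d
    return math.log(dp) + 2 * (mu + sigma2) + math.log1p(-dp * math.exp(-sigma2))


def log_variance_approx(d, mu, sigma2):
    """Approximation that drops the third term of the exact expression."""
    return math.log(1.0 - d) + 2 * (mu + sigma2)


gaps = []
for sigma2 in (1.0, 3.0, 5.0, 7.0):
    exact = log_variance_exact(0.3, 1.0, sigma2)
    approx = log_variance_approx(0.3, 1.0, sigma2)
    gaps.append(abs(exact - approx))
    print(sigma2, round(exact, 4), round(approx, 4))
```

The gap between the two expressions shrinks quickly as $\sigma_i^2$ grows, which is why the approximation is used for the large-variance settings considered here.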

Given $\hat d_i'$, $\hat\mu_i$ and $\hat\sigma_i^2$ from the observations, the estimates of $T_i$ can be written as $\hat T_i \approx \ln \hat d_i' + 2(\hat\mu_i + \hat\sigma_i^2)$; $\hat\sigma_i^2 = \sum_{j: y_{ij} > 0} (\ln Y_{ij} - \hat\mu_i)^2 / (n_{i1} - 1)$. Using the delta method, the variance of $\hat T_i$ becomes

$$Var(\hat T_i) = \frac{1 - d_i'}{n_i d_i'} + 4\left[\frac{\sigma_i^2}{n_{i1}} + \frac{2\sigma_i^4}{n_{i1} - 1}\right].$$
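A minimal sketch of these estimates for a single group is given below. The function name `ziln_estimates` and the toy data are hypothetical, and the last variance term is taken as the usual normal-theory variance of a sample variance, $2\sigma_i^4/(n_{i1} - 1)$, which is an assumption of this sketch.

```python
import math


def ziln_estimates(y):
    """Return (d_prime, mu, s2, T_hat, var_T_hat) for one ZILN sample."""
    n = len(y)
    logs = [math.log(v) for v in y if v > 0]  # log of non-zero observations
    n1 = len(logs)
    d_prime = n1 / n                          # estimated non-zero probability
    mu = sum(logs) / n1
    s2 = sum((w - mu) ** 2 for w in logs) / (n1 - 1)
    t_hat = math.log(d_prime) + 2 * (mu + s2)
    var_t = (1 - d_prime) / (n * d_prime) + 4 * (s2 / n1 + 2 * s2 ** 2 / (n1 - 1))
    return d_prime, mu, s2, t_hat, var_t


# Toy data: two zeros and three positive values whose logs are 0, 1 and 2.
toy = [0.0, 0.0, 1.0, math.e, math.e ** 2]
d_prime, mu, s2, t_hat, var_t = ziln_estimates(toy)
print(d_prime, round(mu, 3), round(s2, 3), round(t_hat, 3), round(var_t, 3))
```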

In the present study, the parameters of interest are all pairwise ratios among the variances of several ZILN models, expressed on the log scale as $\lambda_{ik} = \ln(V_i / V_k) = T_i - T_k$.

Its estimates can be obtained as $\hat\lambda_{ik} = \hat T_i - \hat T_k$; $\forall i \neq k$ and $i, k = 1, 2, \ldots, h$. From Eq. (4), the variance of $\hat\lambda_{ik}$ can be expressed as $Var(\hat\lambda_{ik}) = Var(\hat T_i) + Var(\hat T_k)$,

where the covariance between $\hat T_i$ and $\hat T_k$ is $Cov(\hat T_i, \hat T_k) = 0$ because each $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})$ comprises an independent and identically distributed (iid) random vector from a ZILN model and the groups are sampled independently. Thus, the estimates $\hat T_i$ are independent random variables. Using the estimates $(\hat d_i', \hat\mu_i, \hat\sigma_i^2)$ and $(\hat d_k', \hat\mu_k, \hat\sigma_k^2)$ from the samples gives the estimated variance of $\hat\lambda_{ik}$ as

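$$\widehat{Var}(\hat\lambda_{ik}) = \frac{1 - \hat d_i'}{n_i \hat d_i'} + \frac{1 - \hat d_k'}{n_k \hat d_k'} + 4\left[\frac{\hat\sigma_i^2}{n_{i1}} + \frac{2\hat\sigma_i^4}{n_{i1} - 1} + \frac{\hat\sigma_k^2}{n_{k1}} + \frac{2\hat\sigma_k^4}{n_{k1} - 1}\right],$$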

where $(\hat d_i', \hat\mu_i, \hat\sigma_i^2)$ and $(\hat d_k', \hat\mu_k, \hat\sigma_k^2)$ denote the estimated parameters of $(d_i', \mu_i, \sigma_i^2)$ and $(d_k', \mu_k, \sigma_k^2)$, respectively.
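The pairwise estimates and their variances can be sketched as follows. The group summaries are made-up illustrative values (total sample size, number of non-zero observations, log-scale mean and log-scale sample variance), and the helper names are assumptions of this sketch.

```python
import itertools
import math


def t_and_var(n, n1, mu, s2):
    """Approximate log-variance estimate and its delta-method variance."""
    d_prime = n1 / n
    t_hat = math.log(d_prime) + 2 * (mu + s2)
    var_t = (1 - d_prime) / (n * d_prime) + 4 * (s2 / n1 + 2 * s2 ** 2 / (n1 - 1))
    return t_hat, var_t


def pairwise_log_ratios(groups):
    """Map each pair (i, k), i < k, to (lambda_hat, estimated variance)."""
    stats = [t_and_var(*g) for g in groups]
    out = {}
    for i, k in itertools.combinations(range(len(groups)), 2):
        lam = stats[i][0] - stats[k][0]
        var = stats[i][1] + stats[k][1]  # groups are independent, so variances add
        out[(i + 1, k + 1)] = (lam, var)
    return out


# Hypothetical summaries (n, n1, mu_hat, s2_hat) for three groups.
groups = [(50, 40, 1.0, 3.0), (50, 35, 1.2, 5.0), (50, 30, 0.8, 7.0)]
result = pairwise_log_ratios(groups)
for pair, (lam, var) in result.items():
    print(pair, round(lam, 4), round(var, 4))
```

With $h$ groups this yields $h(h-1)/2$ pairs; the reversed pair $(k, i)$ has the same variance and the opposite sign of $\hat\lambda_{ik}$.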

Methods

To estimate λik, the SCIs are formulated based on Bayesian, GPQ and PB approaches.

The Bayesian approach

The essential feature of the Bayesian approach is the use of a situation-specific prior distribution that reflects knowledge or subjective belief about the parameter of interest; this is updated in accordance with Bayes' theorem to yield the posterior distribution. Thus, CIs based on the Bayesian approach are derived from the posterior distribution. In Bayesian theory, the CI is referred to as a credible interval, and it is not unique for a given posterior distribution. The following methods are used to define suitable credible intervals: the narrowest interval for a univariate distribution (the highest posterior density interval) (Box & Tiao, 1973); the interval when the probability of being below is the same as being above, which is sometimes referred to as the equal-tailed interval (Gelman et al., 2014); or the interval with the mean as the central point (assuming that it exists). In the present study, the SCIs based on the Bayesian approach were constructed based on the equal-tailed interval. Motivated by Maneerat, Niwitpong & Niwitpong (2020), the probability-matching-beta (PMB) and reference-beta (RB) priors were our choice for the parameters $(d_i', \mu_i, \sigma_i^2)$. Thus, Bayesian SCIs for $\lambda_{ik}$ were established as follows:

The PMB prior:

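The probability-matching prior for $(\mu_i, \sigma_i^2)$ is $P(\mu_i, \sigma_i^2)_{pm} \propto \sigma_i^{-2}\sqrt{2 + \sigma_i^2}$, combined with the prior of $d_i'$ as a beta distribution with $a_i = b_i = 1/2$. Thus, the PMB prior for $(d', \mu_i, \sigma_i^2)$ can be defined as

$$P(d', \mu_i, \sigma_i^2)_{pmb} \propto \prod_{i=1}^{h} \sigma_i^{-2}\sqrt{2 + \sigma_i^2}\,\left[(1 - d_i')\,d_i'\right]^{-1/2}.$$

This prior is updated with the likelihood

$$P(y \mid \lambda) \propto \prod_{i=1}^{h} (1 - d_i')^{n_{i0}} (d_i')^{n_{i1}} (\sigma_i^2)^{-n_{i1}/2} \exp\left\{-\frac{1}{2\sigma_i^2} \sum_{j: y_{ij} > 0} (\ln y_{ij} - \mu_i)^2\right\}.$$

The respective marginal posterior distributions of $(d_i', \mu_i, \sigma_i^2)$ are

$$P(d_i' \mid y_i)_{pmb} \propto (1 - d_i')^{n_{i0} - 1/2} (d_i')^{n_{i1} - 1/2}$$

$$P(\mu_i \mid y_i, \sigma_i^2)_{pmb} \propto \exp\left\{-\frac{1}{2\sigma_{i,pmb}^2} \sum_{j: y_{ij} > 0} (\ln y_{ij} - \mu_i)^2\right\}$$

$$P(\sigma_i^2 \mid y_i)_{pmb} \propto (\sigma_i^2)^{-(n_{i1} + 1)/2} \sqrt{2 + \sigma_i^2}\, \exp\left\{-\frac{(n_{i1} - 1)\hat\sigma_i^2}{2\sigma_i^2}\right\}$$

which are denoted as $d_{i,pmb}^{post} \mid y_i \sim beta(n_{i0} + 1/2, n_{i1} + 1/2)$ for the zero probability (with $d_{i,pmb}^{\prime\,post} = 1 - d_{i,pmb}^{post}$), $\mu_{i,pmb}^{post} \mid y_i \sim N(\hat\mu_{i,pmb}, \sigma_{i,pmb}^{2\,post})$, and $\sigma_{i,pmb}^{2\,post} \sim (\sigma_i^2)^{-(n_{i1} + 1)/2} \sqrt{2 + \sigma_i^2}\, \exp\left\{-\frac{(n_{i1} - 1)\hat\sigma_i^2}{2\sigma_i^2}\right\}$, respectively. Thus, the posterior of $\lambda_{ik}$ becomes $\lambda_{ik,pmb}^{post} = T_{i,pmb}^{post} - T_{k,pmb}^{post}$,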

where $T_{i,pmb}^{post} \approx \ln d_{i,pmb}^{\prime\,post} + 2(\mu_{i,pmb}^{post} + \sigma_{i,pmb}^{2\,post})$ and $T_{k,pmb}^{post} \approx \ln d_{k,pmb}^{\prime\,post} + 2(\mu_{k,pmb}^{post} + \sigma_{k,pmb}^{2\,post})$. In agreement with Ganesh (2009), the $100(1 - \alpha)\%$ Bayesian-based SCI with the PMB prior for $\lambda_{ik}$ is $[L_{\lambda_{ik}}, U_{\lambda_{ik}}]_{pmb} = \lambda_{ik,pmb}^{post} \pm v_{\alpha}^{pmb}$,

where $v_{\alpha}^{pmb}$ stands for the $(1 - \alpha)$th percentile of the distribution of $V_{pmb} = \max_h(\lambda_{ik,pmb}^{post}) - \min_h(\lambda_{ik,pmb}^{post})$.
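To illustrate the equal-tailed construction on the one component whose posterior has a standard form, the sketch below draws from the beta posterior of the zero probability and reads off the 2.5th and 97.5th percentiles. The counts, the seed, and the helper name `equal_tailed_interval` are illustrative assumptions; sampling the non-standard posterior of $\sigma_i^2$ is not attempted here.

```python
import random


def equal_tailed_interval(draws, alpha=0.05):
    """Interval cutting off alpha/2 of the draws in each tail."""
    s = sorted(draws)
    lower = s[int((alpha / 2) * len(s))]
    upper = s[int((1 - alpha / 2) * len(s)) - 1]
    return lower, upper


rng = random.Random(1)  # arbitrary seed
n0, n1 = 23, 39         # illustrative counts of zero and non-zero observations
draws = [rng.betavariate(n0 + 0.5, n1 + 0.5) for _ in range(20000)]
lower, upper = equal_tailed_interval(draws)
print(round(lower, 3), round(upper, 3))
```

The same percentile step applies to posterior draws of any scalar quantity, including the log-ratio draws once draws of all three components are available.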

The RB prior:

This is a non-informative prior derived from the Fisher information matrix (Maneerat, Niwitpong & Niwitpong, 2020). The RB prior of $(d', \mu_i, \sigma_i^2)$ is defined as

$$P(d', \mu_i, \sigma_i^2)_{rfb} \propto \prod_{i=1}^{h} \sigma_i^{-1} \sqrt{1 + 2(\sigma_i^2)^{-1}}\, \frac{1}{\sqrt{(1 - d_i')\,d_i'}}$$

in which the prior of $d'$ is a beta distribution. When combined with its likelihood Eq. (9), the posterior of $(\mu_i, \sigma_i^2)$ differs from that under the PMB prior as follows:

$$P(\mu_i \mid y_i, \sigma_i^2)_{rfb} \propto \exp\left\{-\frac{1}{2\sigma_{i,rfb}^2} \sum_{j: y_{ij} > 0} (\ln y_{ij} - \mu_i)^2\right\}$$

$$P(\sigma_i^2 \mid y_i)_{rfb} \propto (\sigma_i^2)^{-n_{i1}/2} \sqrt{1 + 2(\sigma_i^2)^{-1}}\, \exp\left\{-\frac{(n_{i1} - 1)\hat\sigma_i^2}{2\sigma_i^2}\right\}.$$

Moreover, the posteriors can be similarly denoted as $d_{i,rfb}^{post} \mid y_i \sim beta(n_{i0} + 1/2, n_{i1} + 1/2)$, $\mu_{i,rfb}^{post} \mid y_i \sim N(\hat\mu_{i,rfb}, \sigma_{i,rfb}^{2\,post})$ and $\sigma_{i,rfb}^{2\,post} \sim (\sigma_i^2)^{-n_{i1}/2} \sqrt{1 + 2(\sigma_i^2)^{-1}}\, \exp\left\{-\frac{(n_{i1} - 1)\hat\sigma_i^2}{2\sigma_i^2}\right\}$, respectively. The posterior of $\lambda_{ik}$ is $\lambda_{ik,rfb}^{post} = T_{i,rfb}^{post} - T_{k,rfb}^{post}$, where $T_{i,rfb}^{post} \approx \ln d_{i,rfb}^{\prime\,post} + 2(\mu_{i,rfb}^{post} + \sigma_{i,rfb}^{2\,post})$ and $T_{k,rfb}^{post} \approx \ln d_{k,rfb}^{\prime\,post} + 2(\mu_{k,rfb}^{post} + \sigma_{k,rfb}^{2\,post})$. According to Ganesh (2009), the $100(1 - \alpha)\%$ Bayesian-based SCI with the RB prior for $\lambda_{ik}$ is $[L_{\lambda_{ik}}, U_{\lambda_{ik}}]_{rfb} = \lambda_{ik,rfb}^{post} \pm v_{\alpha}^{rfb}$,

where $v_{\alpha}^{rfb}$ stands for the $(1 - \alpha)$th percentile of the distribution of $V_{rfb} = \max_h(\lambda_{ik,rfb}^{post}) - \min_h(\lambda_{ik,rfb}^{post})$.

The GPQ approach

Motivated by Wu & Hsieh (2014), the GPQ of $d_i$ is formulated using the variance-stabilizing arcsine square-root transformation. Moreover, the GPQs for $(\mu_i, \sigma_i^2)$ are obtained from the normal approximation by using the central limit theorem (Tian, 2005; Hasan & Krishnamoorthy, 2017). The GPQ for $T_i$ can be written as

$$G_{T_i} = \ln\left\{1 - \sin^2\left[\sin^{-1}\sqrt{\hat d_i} - \frac{R_i}{2\sqrt{n_i}}\right]\right\} + 2\left[\hat\mu_i - S_i\sqrt{\frac{G_{\sigma_i^2}}{n_{i1}}} + G_{\sigma_i^2}\right],$$

where $G_{\sigma_i^2} = (n_{i1} - 1)\hat\sigma_i^2 / U_i$. The random variables $R_i = 2\sqrt{n_i}\left(\sin^{-1}\sqrt{\hat d_i} - \sin^{-1}\sqrt{d_i}\right)$, $S_i = (\hat\mu_i - \mu_i)/\sqrt{\sigma_i^2 / n_{i1}}$ and $U_i = (n_{i1} - 1)\hat\sigma_i^2 / \sigma_i^2$ are independent, with standard normal, standard normal and $\chi^2_{n_{i1} - 1}$ distributions, respectively. Thus, the corresponding GPQ of $\lambda_{ik}$ can be expressed as $G_{\lambda_{ik}} = G_{T_i} - G_{T_k}$.

Similarly, $G_{T_k} = \ln(1 - G_{d_k}) + G_{2\mu_k} + G_{2\sigma_k^2}$ denotes the GPQ of $T_k$; $G_{d_k} = \sin^2\left[\sin^{-1}\sqrt{\hat d_k} - \frac{R_k}{2\sqrt{n_k}}\right]$, $G_{2\mu_k} = 2\left[\hat\mu_k - S_k\sqrt{G_{\sigma_k^2} / n_{k1}}\right]$, and $G_{2\sigma_k^2} = 2(n_{k1} - 1)\hat\sigma_k^2 / U_k$. Therefore, the $100(1 - \alpha)\%$ SCI for $\lambda_{ik}$ based on the GPQ approach is given by

$$[L_{\lambda_{ik}}, U_{\lambda_{ik}}]_{gpq} = \hat\lambda_{ik} \pm q_{\alpha}^{GPQ} \sqrt{\widehat{Var}(\hat\lambda_{ik})},$$

where $q_{\alpha}^{GPQ}$ denotes the $(1 - \alpha)$th percentile of the $Q_{GPQ}$ distribution; $Q_{GPQ}$ is derived as

$$Q_{GPQ} = \max_{i \neq k} \left| \frac{\hat\lambda_{ik} - G_{\lambda_{ik}}(Y, y, d, \mu, \sigma^2)}{\sqrt{\widehat{Var}(\hat\lambda_{ik})}} \right|.$$
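The GPQ construction can be sketched with the standard library alone. The group summaries are made-up illustrative values, the function names are assumptions of this sketch, and the absolute value inside the maximum is assumed so that the interval is symmetric; it is not the authors' supplemental R code.

```python
import itertools
import math
import random


def t_hat(n, n1, mu, s2):
    return math.log(n1 / n) + 2 * (mu + s2)


def var_t(n, n1, mu, s2):
    d_prime = n1 / n
    return (1 - d_prime) / (n * d_prime) + 4 * (s2 / n1 + 2 * s2 ** 2 / (n1 - 1))


def gpq_t(n, n1, mu, s2, rng):
    """One GPQ draw for the approximate log-variance of a group."""
    d_hat = 1 - n1 / n                        # observed zero proportion
    r = rng.gauss(0, 1)
    s = rng.gauss(0, 1)
    u = rng.gammavariate((n1 - 1) / 2, 2.0)   # chi-square with n1 - 1 df
    g_s2 = (n1 - 1) * s2 / u
    g_d = math.sin(math.asin(math.sqrt(d_hat)) - r / (2 * math.sqrt(n))) ** 2
    g_d = min(g_d, 1 - 1e-12)                 # guard against log(0)
    return math.log(1 - g_d) + 2 * (mu - s * math.sqrt(g_s2 / n1) + g_s2)


def gpq_sci(groups, alpha=0.05, draws=2500, seed=0):
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(len(groups)), 2))
    lam = {p: t_hat(*groups[p[0]]) - t_hat(*groups[p[1]]) for p in pairs}
    sd = {p: math.sqrt(var_t(*groups[p[0]]) + var_t(*groups[p[1]])) for p in pairs}
    q_draws = []
    for _ in range(draws):
        g = [gpq_t(*grp, rng) for grp in groups]
        q_draws.append(max(abs(lam[p] - (g[p[0]] - g[p[1]])) / sd[p] for p in pairs))
    q_draws.sort()
    q = q_draws[math.ceil((1 - alpha) * draws) - 1]  # empirical (1 - alpha) percentile
    return {p: (lam[p] - q * sd[p], lam[p] + q * sd[p]) for p in pairs}, q


# Hypothetical summaries (n, n1, mu_hat, s2_hat) for three groups.
groups = [(50, 40, 1.0, 3.0), (50, 35, 1.2, 5.0), (50, 30, 0.8, 7.0)]
intervals, q = gpq_sci(groups)
for pair, (lo, hi) in intervals.items():
    print(pair, round(lo, 3), round(hi, 3))
```

Because a single critical value is shared by all pairs, every interval has half-width proportional to the estimated standard deviation of its own pair.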

In agreement with Hannig et al. (2006) and Kharrati-Kopaei & Eftekhar (2017), the asymptotic coverage probability of the SCI for $\lambda_{ik}$ based on the GPQ is slightly modified from that in Maneerat, Niwitpong & Niwitpong (2021b) (see the proof of Theorem 1 in the Appendix).

Theorem 1

Let $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i}) \overset{iid}{\sim} ZILN(d_i, \mu_i, \sigma_i^2)$. For $Y_i = 0$, $n_{i0}$ is binomially distributed with the proportion of zero inflation $d_i = E(n_{i0}/n_i)$. For $Y_i > 0$, $\ln Y_i$ is normally distributed with mean $\mu_i = E(\ln Y_i)$ and variance $\sigma_i^2 = Var(\ln Y_i)$. Moreover, let $\lambda_{ik} = T_i - T_k$, where $T_i \approx \ln d_i' + 2(\mu_i + \sigma_i^2)$ is the log-transformed variance of the ZILN model for group $i$. Given $y_i = (y_{i1}, y_{i2}, \ldots, y_{in_i})$, let $\widehat{Var}(\hat\lambda_{ik})$ be an approximated variance of $\hat\lambda_{ik} = \hat T_i - \hat T_k$, where $(\hat T_i, \hat T_k)$ are the estimates of $(T_i, T_k)$. Suppose that $n_i/n \to \varphi_i \in (0, 1)$ as $n = \sum_{i=1}^{h} n_i \to \infty$; it then follows that the asymptotic coverage probability of the $100(1 - \alpha)\%$ SCI for $\lambda_{ik}$ based on the GPQ approach is given by

$$P\left(\lambda_{ik} \in \hat\lambda_{ik} \pm q_{\alpha}^{GPQ} \sqrt{\widehat{Var}(\hat\lambda_{ik})}\right) \to 1 - \alpha$$

for all $i \neq k$ and $i, k = 1, \ldots, h$.

The PB approach

Here, we assume that the data come from a known distribution with unknown parameters that are estimated by using samples simulated from the estimated distribution. In the present study, the PB approach is adjusted to suit our particular situation. Let $\hat d_i$, $\hat\mu_i$ and $\hat\sigma_i^2$ be the observed values of the estimators of the parameters $d_i$, $\mu_i$, and $\sigma_i^2$, respectively. Thus, we can obtain the empirical distribution of $T$ based on the PB approach. In accordance with Sadooghi-Alvandi & Malekzadeh (2014), the respective sampling distributions of $(\hat d_i, \hat\mu_i, \hat\sigma_i^2)$ are

$$\hat d_i^{pboot} \sim beta(n_{i0} + 1/2, n_{i1} + 1/2), \qquad \hat\mu_i^{pboot} = \hat\mu_i + D_i \sqrt{\frac{\hat\sigma_i^2}{n_{i1}}}, \qquad \hat\sigma_i^{2\,pboot} = \frac{\hat\sigma_i^2 U_i}{n_{i1} - 1},$$

where $D_i = (\hat\mu_i^{pboot} - \hat\mu_i)/\sqrt{\hat\sigma_i^2 / n_{i1}} \sim N(0, 1)$ and $U_i = (n_{i1} - 1)\hat\sigma_i^{2\,pboot} / \hat\sigma_i^2 \sim \chi^2_{n_{i1} - 1}$ are independent random variables with standard normal and Chi-square distributions, respectively. The PB variable-based pivotal quantity is expressed as $M_{PB} = (\hat\lambda_{ik}^{pboot} - \hat\lambda_{ik}) / \sqrt{\widehat{Var}(\hat\lambda_{ik})}$,

where $\hat\lambda_{ik}^{pboot} = \hat T_i^{pboot} - \hat T_k^{pboot}$ and $\hat\lambda_{ik} = \hat T_i - \hat T_k$. By replacing the observed values $(\hat d_i, \hat\mu_i, \hat\sigma_i^2)$ from the samples, we respectively obtain

$$\hat\lambda_{ik} = \ln\left(\frac{\hat d_i'}{\hat d_k'}\right) + 2\left[(\hat\mu_i - \hat\mu_k) + (\hat\sigma_i^2 - \hat\sigma_k^2)\right]$$

$$\hat\lambda_{ik}^{pboot} = \ln\left(\frac{\hat d_i^{\prime\,pboot}}{\hat d_k^{\prime\,pboot}}\right) + 2\left[(\hat\mu_i^{pboot} - \hat\mu_k^{pboot}) + (\hat\sigma_i^{2\,pboot} - \hat\sigma_k^{2\,pboot})\right]$$

$$\widehat{Var}(\hat\lambda_{ik}) = \frac{\hat d_i}{n_i \hat d_i'} + \frac{\hat d_k}{n_k \hat d_k'} + 4\left[\frac{\hat\sigma_i^2}{n_{i1}} + \frac{2\hat\sigma_i^4}{n_{i1} - 1} + \frac{\hat\sigma_k^2}{n_{k1}} + \frac{2\hat\sigma_k^4}{n_{k1} - 1}\right],$$

where $\hat d_i' = 1 - \hat d_i$ and $n_{i1} = n_i \hat d_i'$. Hence, the $100(1 - \alpha)\%$ SCI for $\lambda_{ik}$ based on the PB approach is $[L_{\lambda_{ik}}, U_{\lambda_{ik}}]_{PB} = \hat\lambda_{ik} \pm m_{\alpha}^{PB} \sqrt{\widehat{Var}(\hat\lambda_{ik})}$,

where $m_{\alpha}^{PB}$ is the $(1 - \alpha)$th percentile of the distribution of $M_{PB}$. Theorem 2 shows the asymptotic coverage probability of the $100(1 - \alpha)\%$ SCI for $\lambda_{ik}$ based on the PB approach (see the proof in the Appendix).
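A minimal sketch of the PB construction follows. The group summaries are made-up illustrative values and the function names are assumptions; taking the maximum absolute pivotal quantity over all pairs is an assumption made here to obtain one simultaneous critical value, and the sketch is not the authors' supplemental R code.

```python
import itertools
import math
import random


def t_hat(n, n1, mu, s2):
    return math.log(n1 / n) + 2 * (mu + s2)


def var_t(n, n1, mu, s2):
    d_prime = n1 / n
    return (1 - d_prime) / (n * d_prime) + 4 * (s2 / n1 + 2 * s2 ** 2 / (n1 - 1))


def pb_t(n, n1, mu, s2, rng):
    """One parametric-bootstrap draw of the approximate log-variance."""
    d_boot = rng.betavariate((n - n1) + 0.5, n1 + 0.5)  # zero probability
    d_norm = rng.gauss(0, 1)
    u = rng.gammavariate((n1 - 1) / 2, 2.0)             # chi-square with n1 - 1 df
    mu_boot = mu + d_norm * math.sqrt(s2 / n1)
    s2_boot = s2 * u / (n1 - 1)
    return math.log(1 - d_boot) + 2 * (mu_boot + s2_boot)


def pb_sci(groups, alpha=0.05, draws=2500, seed=0):
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(len(groups)), 2))
    lam = {p: t_hat(*groups[p[0]]) - t_hat(*groups[p[1]]) for p in pairs}
    sd = {p: math.sqrt(var_t(*groups[p[0]]) + var_t(*groups[p[1]])) for p in pairs}
    m_draws = []
    for _ in range(draws):
        t = [pb_t(*grp, rng) for grp in groups]
        m_draws.append(max(abs((t[p[0]] - t[p[1]]) - lam[p]) / sd[p] for p in pairs))
    m_draws.sort()
    m = m_draws[math.ceil((1 - alpha) * draws) - 1]      # empirical (1 - alpha) percentile
    return {p: (lam[p] - m * sd[p], lam[p] + m * sd[p]) for p in pairs}, m


# Hypothetical summaries (n, n1, mu_hat, s2_hat) for three groups.
groups = [(50, 40, 1.0, 3.0), (50, 35, 1.2, 5.0), (50, 30, 0.8, 7.0)]
intervals, m = pb_sci(groups)
for pair, (lo, hi) in intervals.items():
    print(pair, round(lo, 3), round(hi, 3))
```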

Theorem 2

Suppose that $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})$ comprises an iid random vector from a ZILN model based on $n_i$ observations from population group $i$. Let $\hat\lambda_{ik} = \hat T_i - \hat T_k$ be the estimate of $\lambda_{ik}$, where $\hat T_i$ and $\hat T_k$ are the estimates of the approximate log-transformed variances $T_i$ and $T_k$ of the $i$th and $k$th population groups, respectively. Hence, $P\left(\lambda_{ik} \in \hat\lambda_{ik} \pm m_{\alpha}^{PB} \sqrt{\widehat{Var}(\hat\lambda_{ik})}\right) \to 1 - \alpha$, where $\widehat{Var}(\hat\lambda_{ik})$ is the estimated variance of $\hat\lambda_{ik}$; $i \neq k$ and $i, k = 1, 2, \ldots, h$.

Simulation studies and results

Simulation studies were conducted to assess the performances of the SCIs based on the Bayesian, GPQ, and PB approaches for all pairwise ratios of variances of several ZILN distributions: Bayesian SCIs based on the PMB and RB priors (Maneerat, Niwitpong & Niwitpong, 2020), the GPQ-based SCI (Wu & Hsieh, 2014), and the PB-based SCI (Sadooghi-Alvandi & Malekzadeh, 2014; Li, Song & Shi, 2015; Kharrati-Kopaei & Eftekhar, 2017). The CRs, LERs, UERs, and AWs of the SCIs were determined with the number of population groups ($h$) fixed at 3 and 5; the optimal values of CR, LER, UER, and AW are 95%, 5%, 5% and 0, respectively, which were used to judge the best-performing SCI. The critical values $v_{\alpha}^{pmb}$, $v_{\alpha}^{rfb}$, $q_{\alpha}^{GPQ}$ and $m_{\alpha}^{PB}$ for the Bayesian SCIs based on the PMB and RB priors, GPQ and PB, respectively, were also assessed. Throughout the simulation studies, the procedure to estimate the CRs, LERs, and UERs was as follows:

  (i) Generate random samples $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})$ from $ZILN(d_i, \mu_i, \sigma_i^2)$, and compute $(\hat d_i, \hat\mu_i, \hat\sigma_i^2)$; $i = 1, 2, \ldots, h$ from the samples.

  (ii) Compute the critical values for each method using 2500 Monte Carlo simulations.

  (iii) Apply the SCIs based on the Bayesian approach with the PMB and RB priors, GPQ, and PB approaches given in Eqs. (14), (18), (21) and (31), respectively, and record whether or not the values of $\lambda_{ik}$ ($i \neq k$) fall within their corresponding confidence intervals.

  (iv) Repeat steps (i)-(iii) M = 5000 times.

  (v) For each method, obtain the number of times that all $\lambda_{ik}$ ($i \neq k$) are in their corresponding SCIs to estimate the CR.

  (vi) Obtain the number of times that the $\lambda_{ik}$ ($i \neq k$) are less than or greater than their corresponding SCIs to estimate the LER and UER, respectively.
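The loop structure of this procedure can be sketched as below. To keep the example short and fast, step (ii) is replaced by a stand-in: a Bonferroni-adjusted normal critical value, which is not one of the four methods studied here. The log-scale means, the reduced number of replications, and the convention of counting a replication as a lower (upper) error when at least one true value falls below (above) its interval are assumptions of this sketch.

```python
import itertools
import math
import random
from statistics import NormalDist


def sample_ziln(n, d, mu, s2, rng):
    return [0.0 if rng.random() < d else rng.lognormvariate(mu, math.sqrt(s2))
            for _ in range(n)]


def estimate(y):
    """Return (T_hat, estimated variance of T_hat) for one sample."""
    logs = [math.log(v) for v in y if v > 0]
    n, n1 = len(y), len(logs)
    mu = sum(logs) / n1
    s2 = sum((w - mu) ** 2 for w in logs) / (n1 - 1)
    d_prime = n1 / n
    t = math.log(d_prime) + 2 * (mu + s2)
    v = (1 - d_prime) / (n * d_prime) + 4 * (s2 / n1 + 2 * s2 ** 2 / (n1 - 1))
    return t, v


def coverage(params, n, reps=300, alpha=0.05, seed=0):
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(len(params)), 2))
    true_t = [math.log(1 - d) + 2 * (mu + s2) for d, mu, s2 in params]
    # Stand-in critical value (Bonferroni over all pairs), replacing step (ii).
    z = NormalDist().inv_cdf(1 - alpha / (2 * len(pairs)))
    covered = lower_err = upper_err = 0
    for _ in range(reps):                                   # step (iv)
        est = [estimate(sample_ziln(n, d, mu, s2, rng))     # step (i)
               for d, mu, s2 in params]
        below = above = False
        for i, k in pairs:                                  # step (iii)
            lam_true = true_t[i] - true_t[k]
            lam_hat = est[i][0] - est[k][0]
            half = z * math.sqrt(est[i][1] + est[k][1])
            below = below or lam_true < lam_hat - half
            above = above or lam_true > lam_hat + half
        covered += not (below or above)                     # step (v)
        lower_err += below                                  # step (vi)
        upper_err += above
    return covered / reps, lower_err / reps, upper_err / reps


# (zero probability, log-scale mean, log-scale variance); the means are hypothetical.
params = [(0.1, 1.0, 3.0), (0.2, 1.0, 5.0), (0.3, 1.0, 7.0)]
cr, ler, uer = coverage(params, n=50)
print(cr, ler, uer)
```

Swapping the stand-in critical value for one computed by Monte Carlo inside each replication gives the nested structure described in steps (i) to (iv), at a correspondingly higher computational cost.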

For the three-group comparison, the following parameter combinations were used: large variances $(\sigma_1^2, \sigma_2^2, \sigma_3^2) = (3, 5, 7)$; small (30, 30, 30), moderate (50, 50, 50), large [(100, 100, 100) and (100, 100, 200)], small-to-large (30, 50, 100) and medium-to-large (50, 100, 200) sample sizes; and zero-inflation percentages of (10, 20, 30), (10, 30, 50) and (30, 50, 50). For the five-group comparison, the following parameter combinations were used: large variances $(\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2, \sigma_5^2) = (1, 1, 2, 2, 3)$; small-to-large (30, 50, 50, 100, 200), medium-to-large (50, 50, 50, 100, 100), and large (70, 100, 100, 200, 200) sample sizes; and zero-inflation percentages of (10, 10, 20, 20, 20), (20, 20, 30, 30, 50), (20, 30, 50, 50, 70) and (50, 50, 50, 70, 70). The results are reported in Table 1.

Table 1:
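Performance measures of the SCIs based on the different approaches.
Columns: n_i; d_i (%); LER, CR, UER (%) for B-PMB; LER, CR, UER (%) for B-RB; LER, CR, UER (%) for GPQ; LER, CR, UER (%) for PB; AW for B-PMB, B-RB, GPQ, PB. Rows without n_i share the sample sizes of the row above.
3 sample groups and $(\sigma_1^2, \sigma_2^2, \sigma_3^2) = (3, 5, 7)$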
(30,30,30) (10,20,30) 1.993 97.973 0.033 1.307 98.693 0.000 0.707 99.293 0.000 2.460 97.540 0.000 22.961 25.493 27.764 22.942
(10,30,50) 1.880 98.113 0.007 1.200 98.800 0.000 0.967 99.033 0.000 2.900 97.100 0.000 28.702 33.323 33.300 27.254
(30,50,50) 1.120 98.873 0.007 0.427 99.573 0.000 0.620 99.380 0.000 2.520 97.480 0.000 30.764 35.737 36.813 29.113
(50,50,50) (10,20,30) 2.833 96.800 0.367 2.300 97.567 0.133 1.107 98.887 0.007 2.347 97.627 0.027 15.521 16.544 19.078 16.893
(10,30,50) 2.887 97.027 0.087 2.173 97.800 0.027 1.253 98.747 0.000 2.607 97.393 0.000 18.848 20.654 22.403 19.733
(30,50,50) 2.087 97.840 0.073 1.413 98.567 0.020 0.973 99.027 0.000 2.320 97.673 0.007 20.104 21.996 24.567 21.096
(100,100,100) (10,20,30) 3.480 95.140 1.380 3.200 95.693 1.107 1.273 98.527 0.200 1.960 97.767 0.273 10.015 10.325 12.448 11.681
(10,30,50) 3.660 95.627 0.713 3.200 96.420 0.380 1.427 98.540 0.033 2.087 97.833 0.080 11.866 12.410 14.327 13.422
(30,50,50) 3.220 96.040 0.740 2.780 96.747 0.473 1.167 98.813 0.020 2.073 97.853 0.073 12.389 12.948 15.408 14.202
(30,50,100) (10,20,30) 1.787 96.753 1.460 1.367 97.453 1.180 0.380 99.480 0.140 1.127 98.467 0.407 12.846 13.402 16.552 14.152
(10,30,50) 1.853 97.127 1.020 1.387 97.993 0.620 0.420 99.553 0.027 1.420 98.353 0.227 14.348 15.042 18.368 15.604
(30,50,50) 1.013 97.947 1.040 0.547 98.687 0.767 0.260 99.653 0.087 1.053 98.627 0.320 16.343 17.452 20.826 17.181
(50,100,200) (10,20,30) 2.580 94.773 2.647 2.247 95.293 2.460 0.467 99.047 0.487 0.847 98.307 0.847 8.637 8.822 11.261 10.230
(10,30,50) 2.847 95.073 2.080 2.593 95.560 1.847 0.667 99.093 0.240 1.313 98.140 0.547 9.522 9.725 12.334 11.166
(30,50,50) 2.173 95.880 1.947 1.793 96.533 1.673 0.380 99.380 0.240 1.020 98.460 0.520 10.618 10.939 13.751 12.189
(100,100,200) (10,20,30) 3.253 94.213 2.533 2.953 94.693 2.353 0.967 98.673 0.360 1.507 97.920 0.573 7.952 8.090 10.266 9.647
(10,30,50) 2.940 95.013 2.047 2.620 95.533 1.847 0.980 98.793 0.227 1.460 98.127 0.413 8.985 9.184 11.489 10.773
(30,50,50) 2.567 95.387 2.047 2.227 96.007 1.767 0.900 98.893 0.207 1.547 98.047 0.407 9.888 10.197 12.666 11.709
5 sample groups and $(\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2, \sigma_5^2) = (1, 1, 2, 2, 3)$
(30,50,50,100,200) (10,10,20,20,20) 0.326 99.504 0.170 0.232 99.626 0.142 0.344 99.568 0.088 0.756 99.002 0.242 6.224 6.471 6.310 5.600
(20,20,30,30,50) 0.244 99.620 0.136 0.154 99.754 0.092 0.244 99.694 0.062 0.666 99.164 0.170 6.952 7.250 7.067 6.201
(20,30,50,50,70) 0.154 99.738 0.108 0.092 99.828 0.080 0.322 99.642 0.036 0.788 99.084 0.128 8.510 8.971 8.513 7.369
(50,50,50,70,70) 0.062 99.882 0.056 0.026 99.942 0.032 0.116 99.872 0.012 0.426 99.490 0.084 9.572 10.226 9.861 8.223
(50,50,50,100,100) (10,10,20,20,20) 0.398 99.504 0.098 0.338 99.582 0.080 0.558 99.414 0.028 1.122 98.788 0.090 6.614 6.826 6.557 5.914
(20,20,30,30,50) 0.392 99.512 0.096 0.312 99.618 0.070 0.526 99.448 0.026 1.092 98.810 0.098 7.791 8.100 7.567 6.768
(20,30,50,50,70) 0.358 99.618 0.024 0.244 99.748 0.008 0.582 99.398 0.020 1.196 98.754 0.050 10.067 10.737 9.488 8.354
(50,50,50,70,70) 0.204 99.766 0.030 0.136 99.850 0.014 0.254 99.746 0.000 0.822 99.166 0.012 10.687 11.352 10.571 9.039
(70,100,100,200,200) (10,10,20,20,20) 0.784 99.038 0.178 0.710 99.140 0.150 0.810 99.120 0.080 1.232 98.640 0.128 4.499 4.565 4.507 4.237
(20,20,30,30,50) 0.666 99.174 0.160 0.580 99.280 0.140 0.620 99.310 0.070 1.058 98.826 0.116 5.218 5.321 5.116 4.783
(20,30,50,50,70) 0.620 99.290 0.090 0.550 99.380 0.060 0.750 99.200 0.060 1.158 98.744 0.098 6.546 6.743 6.202 5.753
(50,50,50,70,70) 0.374 99.548 0.078 0.310 99.630 0.060 0.370 99.600 0.030 0.680 99.258 0.062 6.938 7.139 6.892 6.249
DOI: 10.7717/peerj.12659/table-1

Notes:

Note: (100^3, 200^2) = (100, 100, 100, 200, 200). Bold denotes the best-performing method.

For h = 3 with large variances, Table 1 and Fig. 1 reveal that all of the methods provided CRs close to or greater than the nominal confidence level (95%). Meanwhile, the SCIs based on the Bayesian approach with the PMB prior and on the GPQ approach maintained a good balance between LER and UER. Importantly, the AW of the PB approach was narrower than those of the other methods for small sample sizes, while the AWs of the Bayesian approach with the PMB prior were slightly narrower than the others for the other sample sizes. For the five-group comparison (h = 5; Table 1 and Fig. 2), the PB approach provided the best CRs and narrowest AWs for all scenarios tested.

Figure 1: The CR and AW performance measures for three sample groups: (A) CR (B) AW.

Figure 2: The CR and AW performance measures for five sample groups: (A) CR (B) AW.

An empirical application of the four methods to daily precipitation data

Daily precipitation records comprise publicly available data from the Thailand Meteorological Department (Department, 2021). Flash floods, landslides, and windstorms caused by heavy rainfall occurred in four provinces in the lower southern area of Thailand (Songkhla, Yala, Narathiwat, and Pattani) during January 2021, as reported by Thailand’s Department of Disaster Prevention and Mitigation (Thailand, 2021). According to the automatic weather system (Department, 2021), Songkhla has two weather stations, in the Songkhla and Sadao districts, which means that we could simultaneously estimate variations in precipitation at five weather stations.

Daily precipitation data from December 2020 to January 2021 (Table 2) were used in the analysis. Figure 3 shows histograms along with normal quantile-quantile (Q-Q), cumulative distribution function (CDF) and probability-probability (P-P) plots. Furthermore, the Akaike information criterion (AIC) and Bayesian information criterion (BIC) values of five models (normal, logistic, lognormal, exponential, and Cauchy) fitted to the non-zero precipitation data were compared to check the appropriateness of each model (Table 3). The AIC and BIC values for the lognormal model were the lowest, and thus it provided the best fit. The data from all of the stations also contained zero observations, supporting the assumptions of the ZILN model.

Table 2:
Daily precipitation data at five stations in southern Thailand.
Columns: date; December 2020 at Songkhla, Songkhla (Sadao district), Yala, Narathiwat, Pattani; date; January 2021 at Songkhla, Songkhla (Sadao district), Yala, Narathiwat, Pattani.
1 160.0 56.4 46.4 38.6 82.0 1 0.8 4.2 6.6 31.2 0.8
2 14.6 85.8 46.6 70.0 0.0 2 1.4 8.2 5.6 6.4 2.0
3 20.8 4.2 55.8 74.2 0.0 3 2.6 42.6 49.6 38.6 49.8
4 8.8 0.2 27.0 0.4 7.2 4 21.4 8.4 28.6 10.4 4.4
5 0.0 0.0 0.2 0.0 0.0 5 9.2 70.2 137.8 62.8 49.0
6 0.0 0.0 0.0 0.0 0.2 6 0.2 2.8 84.8 13.2 0.2
7 0.0 0.0 0.0 0.0 0.0 7 0.0 0.0 1.8 9.2 2.8
8 0.2 0.0 1.6 0.0 0.0 8 0.4 0.0 0.4 0.0 1.2
9 52.0 0.0 0.0 0.0 0.0 9 0.8 0.0 0.0 1.4 0.0
10 39.4 0.0 0.0 0.8 3.6 10 29.0 15.6 2.8 12.6 22.8
11 0.6 0.0 2.8 9.2 9.8 11 23.0 0.6 0.2 0.2 0.0
12 12.2 4.2 17.2 0.0 8.0 12 5.0 0.2 0.6 3.6 1.2
13 5.4 37.2 2.0 8.2 12.8 13 0.0 0.0 2.4 3.0 1.0
14 9.4 0.0 0.0 0.0 3.4 14 5.4 0.0 0.0 0.0 0.0
15 7.0 2.4 12.4 78.4 7.2 15 1.8 0.0 0.0 0.0 0.0
16 19.2 25.6 43.8 43.0 62.8 16 0.8 0.0 0.0 0.0 0.0
17 84.4 97.4 126.4 162.0 164.8 17 0.0 0.0 0.0 0.0 0.0
18 97.2 9.2 113.8 141.2 46.4 18 0.0 0.0 0.0 1.2 0.0
19 92.0 19.2 39.8 43.6 26.2 19 0.0 0.0 0.0 0.0 0.0
20 19.8 7.2 27.8 20.4 7.0 20 0.0 0.0 0.0 0.0 0.0
21 5.4 0.4 0.0 0.2 3.4 21 0.0 0.0 0.0 0.0 0.0
22 0.0 0.0 1.2 1.0 3.4 22 0.0 0.0 0.0 0.0 0.0
23 23.8 0.0 31.0 61.4 12.6 23 0.0 0.0 0.0 0.0 0.0
24 23.4 0.0 19.6 6.6 0.0 24 0.0 0.0 2.2 0.0 0.0
25 2.2 0.0 46.6 39.8 6.8 25 0.0 0.0 0.0 0.0 0.0
26 1.0 10.0 27.6 84.0 2.8 26 0.0 0.0 0.0 0.0 0.0
27 0.0 0.0 1.0 0.0 0.2 27 0.0 0.0 2.0 0.2 0.0
28 0.0 0.0 0.0 0.0 0.0 28 0.0 0.0 0.0 0.0 0.0
29 0.0 0.0 0.0 0.0 0.0 29 0.0 0.0 0.0 0.0 0.0
30 0.0 0.0 0.0 0.0 0.0 30 4.4 0.0 0.0 0.0 0.0
31 6.2 0.4 11.2 89.2 3.2 31 9.6 2.0 3.0 0.4 1.6
DOI: 10.7717/peerj.12659/table-2

Notes:

Source: Thailand Meteorological Department Automatic Weather System.

Figure 3: Histogram, normal Q-Q, CDF and P-P plots of nonzero precipitation records in five stations of southern Thailand: (A) Songkhla (B) Songkhla-Sadao (C) Yala (D) Narathiwat (E) Pattani.

The results in Table 4 reveal that the variance $\hat\sigma_i^2$ was greater than the mean $\hat\mu_i$ at every station, indicating quite large precipitation variation in the present study. When applying the four methods to the daily precipitation data, the 95% SCIs based on the Bayesian, GPQ and PB approaches for all pairwise comparisons among the five weather stations cover their point estimates (Table 5). In agreement with the simulation results for n1 = n2 = n3 = 50 and n4 = n5 = 100, the PB approach provided the best SCI performance, with the narrowest intervals, for the ratios of variances of several ZILN models. The results can be interpreted as showing that Narathiwat has the highest variation in precipitation, followed by Yala. These results are in line with the Asia Disaster Monitoring and Response System (Thailand, 2021), which reported that both areas were affected by flooding and landslides damaging 22,308 households in Narathiwat and 12,082 households in Yala during the time period covered by the data used in the study.

Table 3:
The AIC and BIC results for five associated models.
Stations Criterion Models
Normal Lognormal Logistic Exponential Cauchy
Songkhla AIC 387.611 305.171 373.337 317.644 345.549
BIC 390.938 308.498 376.664 319.308 348.876
Songkhla-Sadao district AIC 241.141 196.707 238.534 203.226 225.198
BIC 243.579 199.145 240.971 204.445 227.635
Yala AIC 373.538 313.718 368.171 322.168 365.426
BIC 376.760 316.940 371.393 323.779 368.648
Narathiwat AIC 362.209 310.600 359.299 317.455 358.947
BIC 365.320 313.711 362.410 319.010 362.058
Pattani AIC 328.067 242.474 313.959 260.584 273.318
BIC 331.060 245.467 316.952 262.080 276.311
DOI: 10.7717/peerj.12659/table-3
Table 4:
Summary statistics for five stations.
Weather stations | i | n_i1 (non-zero days) | n_i0 (zero days) | $\hat d_i$ (%) | $\hat\mu_i$ | $\hat\sigma_i^2$ | $\hat T_i$
Songkhla 1 39 23 37.097 1.909 2.982 9.317
Songkhla-Sadao district 2 25 37 59.677 1.828 3.509 9.766
Yala 3 37 25 40.323 2.155 3.490 10.774
Narathiwat 4 35 27 43.548 2.253 4.238 12.411
Pattani 5 33 29 46.774 1.669 2.950 8.607
DOI: 10.7717/peerj.12659/table-4
Table 5:
95% SCIs of all pairwise log-ratios of precipitation variabilities among five weather stations in lower southern Thailand.
Methods Limits All pairwise log-ratios of precipitation variabilities among weather stations
Songkhla/Songkhla-Sadao Songkhla/Yala Songkhla/Narathiwat Songkhla/Pattani Songkhla-Sadao/Yala
Point estimate −0.4489 −1.4568 −3.0939 0.71043 −1.0079
Bayesian SCIs -based PMB prior Lower −8.7881 −9.796 −11.4331 −7.6287 −9.3471
Upper 7.8903 6.8824 5.2452 9.0496 7.3313
Width 16.6783 16.6783 16.6783 16.6783 16.6783
Bayesian SCIs -based RB prior Lower −9.4711 −10.479 −12.1161 −8.3117 −10.0301
Upper 8.5733 7.5654 5.9283 9.7326 8.0143
Width 18.0444 18.0444 18.0444 18.0444 18.0444
SCI-based GPQ Lower −9.3037 −9.2166 −11.9695 −6.6362 −10.4292
Upper 8.4059 6.303 5.7816 8.0571 8.4134
Width 17.7096 15.5196 17.7511 14.6932 18.8426
SCI-based PB Lower −7.4257 −7.5709 −10.0871 −5.0781 −8.4311
Upper 6.5279 4.6573 3.8992 6.4989 6.4153
Width 13.9536 12.2281 13.9863 11.577 14.8464
Methods Limits Songkhla-Sadao/Narathiwat Songkhla-Sadao/Pattani Yala/Narathiwat Yala/Pattani Narathiwat/Pattani
Point estimate −2.645 1.1593 −1.6371 2.1672 3.8043
Bayesian SCIs -based PMB prior Lower −10.9842 −7.1798 −9.9763 −6.1719 −4.5348
Upper 5.6941 9.4985 6.702 10.5064 12.1435
Width 16.6783 16.6783 16.6783 16.6783 16.6783
Bayesian SCIs -based RB prior Lower −11.6672 −7.8629 −10.6593 −6.855 −5.2178
Upper 6.3771 10.1815 7.385 11.1894 12.8266
Width 18.0444 18.0444 18.0444 18.0444 18.0444
SCI-based GPQ Lower −13.0047 −7.9247 −11.078 −5.8532 −5.2999
Upper 7.7146 10.2433 7.8037 10.1876 12.9086
Width 20.7193 18.168 18.8817 16.0408 18.2085
SCI-based PB Lower −10.8075 −5.9981 −9.0757 −4.1522 −3.369
Upper 5.5175 8.3168 5.8014 8.4866 10.9777
Width 16.325 14.3149 14.8771 12.6388 14.3467
DOI: 10.7717/peerj.12659/table-5
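As a consistency check, the last column of Table 4 and the point estimates in Table 5 can be approximately reproduced from the tabulated zero percentages, log-scale means and log-scale variances. The sketch below assumes the approximation $\ln(1 - \hat d_i) + 2(\hat\mu_i + \hat\sigma_i^2)$ for each station; small discrepancies in the third decimal place are expected because the inputs are rounded.

```python
import math

# (zero percentage, log-scale mean, log-scale variance) as listed in Table 4.
stations = {
    "Songkhla": (37.097, 1.909, 2.982),
    "Songkhla-Sadao": (59.677, 1.828, 3.509),
    "Yala": (40.323, 2.155, 3.490),
    "Narathiwat": (43.548, 2.253, 4.238),
    "Pattani": (46.774, 1.669, 2.950),
}


def log_variance(zero_pct, mu, s2):
    """Approximate log-variance from the zero percentage and log-scale moments."""
    return math.log(1 - zero_pct / 100) + 2 * (mu + s2)


T = {name: log_variance(*vals) for name, vals in stations.items()}
lam_songkhla_yala = T["Songkhla"] - T["Yala"]
lam_narathiwat_pattani = T["Narathiwat"] - T["Pattani"]

for name, value in T.items():
    print(name, round(value, 3))
print(round(lam_songkhla_yala, 3), round(lam_narathiwat_pattani, 3))
```

Ordering the stations by these values places Narathiwat first and Yala second, matching the interpretation given in the text.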

Discussion

From the above numerical results, it can be seen that the SCIs based on PB and the Bayesian approach with the PMB prior dealt with large variations in the data better than the other approaches. The PB-based SCI has some strong points for small sample sizes due to random samples being obtained via bootstrap resampling. Furthermore, the performance of the Bayesian SCI based on the PMB prior declined as the number of populations increased and the sample size decreased. Although the GPQ method provided appropriate CRs, its AWs were wider than those of the other methods, possibly because the GPQ of $d_i$ is limited for cases with unequal zero-inflation percentages, given that it has performed quite well for a single population group (Wu & Hsieh, 2014; Maneerat, Niwitpong & Niwitpong, 2021a). Further research could be conducted to explore subjective or prior beliefs about parameters when using the Bayesian approach for parameter estimation.

Conclusions

SCIs for the comparison of the variance ratios among several ZILN models were formulated by applying Bayesian approaches based on the PMB and RB priors, along with the GPQ and PB approaches. In practice, the daily precipitation data for each of the weather stations considered were overdispersed (i.e., the variance was greater than the mean) and zero-inflated (Table 4). Thus, the ZILN distribution is an appropriate model for estimating parameters in the construction of SCIs for multiple comparisons between their variances.

For three populations, all of the methods produced 95% SCIs for all pairwise comparisons among variances covering the true parameter. Meanwhile, the SCI constructed via the Bayesian approach based on the PMB prior maintained a good balance between LER and UER and provided the narrowest AWs except for small sample sizes. On the other hand, the PB-based SCI could handle extreme cases when the sample sizes were small with large variances. For five populations, the PB-based SCI performed the best overall, with the Bayesian approach based on the RB prior for small-to-large sample sizes and the GPQ approach for medium-to-large and large sample sizes providing acceptable results, and thus can be recommended as alternative SCIs.

Supplemental Information

R code for computing all results in this paper

DOI: 10.7717/peerj.12659/supp-1