Bayesian estimation of rainfall dispersion in Thailand using gamma distribution with excess zeros

Wansiri Khooriphan; Sa-Aat Niwitpong; Suparat Niwitpong

doi:10.7717/peerj.14023

Bayesian estimation of rainfall dispersion in Thailand using gamma distribution with excess zeros

Wansiri Khooriphan, Sa-Aat Niwitpong , Suparat Niwitpong

Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangk, Bangkok, Thailand

DOI: 10.7717/peerj.14023

Published: 2022-09-16
Accepted: 2022-08-16
Received: 2022-06-01

Academic Editor: Alban Kuriqi

Subject Areas: Statistics, Computational Science, Natural Resource Management, Environmental Impacts
Keywords: Bayesian estimation, Variance of a gamma distribution with excess zeros, Jeffrey’s prior, Uniform prior, Normal-gamma-beta prior, Rainfall dispersion, Fiducial quantity

Copyright: © 2022 Khooriphan et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

Cite this article: Khooriphan W, Niwitpong S, Niwitpong S. 2022. Bayesian estimation of rainfall dispersion in Thailand using gamma distribution with excess zeros. PeerJ 10:e14023 https://doi.org/10.7717/peerj.14023

The authors have chosen to make the review history of this article public.

Abstract

The gamma distribution is commonly used to model environmental data. However, rainfall data often contain zero observations, which violates the assumption that all observations must be positive in a gamma distribution, and so a gamma model with excess zeros treated as a binary random variable is required. Rainfall dispersion is important and interesting, the confidence intervals for the variance of a gamma distribution with excess zeros help to examine rainfall intensity, which may be high or low risk. Herein, we propose confidence intervals for the variance of a gamma distribution with excess zeros by using fiducial quantities and parametric bootstrapping, as well as Bayesian credible intervals and highest posterior density intervals based on the Jeffreys’, uniform, or normal-gamma-beta prior. The performances of the proposed confidence interval were evaluated by establishing their coverage probabilities and average lengths via Monte Carlo simulations. The fiducial quantity confidence interval performed the best for a small probability of the sample containing zero observations (δ) whereas the Bayesian credible interval based on the normal-gamma-beta prior performed the best for large δ. Rainfall data from the Kiew Lom Dam in Lampang province, Thailand, are used to illustrate the efficacies of the proposed methods in practice.

Introduction

Thailand is a mainly agrarian country, with the largest agricultural area being in the north of the country due to its cooler climate making it the best place for cultivation. Rainfall is an important factor for cultivation. The rainy season begins in mid-May and ends in mid-October, the southwest monsoon predominate over Thailand to bring abundant annual rainfall. August to September is the wettest period of the year for most of the country, whereas January and December are very dry. Fluctuating rainfall makes it difficult to predict heavy precipitation that could lead to crop loss or damage. Since environmental data, meteorology, climatology and pollution studies are often right-skewed, the gamma distribution is commonly used to model these data (Piao & Zhi-Sheng, 2015; Pradhan & Kundu, 2011; Son & Oh, 2006; Wang et al., 2019). Many researchers have developed confidence intervals for the parameters of a gamma distribution by using various methods. For example, Krishnamoorthy & León-Novelo (2014) proposed the parametric bootstrap (PB) confidence interval for the mean of a gamma distribution that performed satisfactorily even for small samples. Sangnawakij, Niwitpong & Niwitpong (2015) proposed the method of variance estimates recovery (MOVER) and score and Wald intervals to construct confidence intervals for the ratio of the coefficients of variation (CVs) of gamma distributions that performed better than classical estimators in terms of the expected length. Krishnamoorthy & Wang (2016) developed approximate fiducial quantities (FQs) for constructing the confidence interval for the mean of a gamma distribution that performed satisfactorily when the shape parameter was around 0.5 or larger. FQs can be used to establish approximate solutions to many statistical problems and can be readily applied to handle both uncensored and censored samples. Wang et al. (2019) proposed FQs for the differences between the shape parameters, scale parameters, and means of two independent gamma distributions and found that the performances of the FQ-based confidence intervals were more accurate than other comparable methods.

Rainfall data often contain zero observations at certain times of the year and so this must be taken into account when studying precipitation in Thailand. Aitchison (1955) investigated situations where data contain zero observations with the probability of 0¡ δ¡1 while the positive observations have a residual probability of 1- δ. Aitchison & Brown (1963) introduced the delta-lognormal distribution (a lognormal distribution with an excess of zero observations) for which the number of zero observations comprises a random variable with a binomial distribution and the positive observations comprise a random variable from a lognormal distribution. Many researchers have developed methods to construct confidence intervals for the parameters of a delta-lognormal distribution by using various methods. For example, Yosboonruang, Niwitpong & Niwitpong (2019) proposed new confidence intervals for the CV of a delta-lognormal distribution by using Bayesian methods based on the independent Jeffreys’, Jeffreys’ rule, or uniform prior and compared them with the fiducial generalized confidence interval (FGCI); the Bayesian confidence interval based on the independent Jeffreys’ prior performed better than the other methods in all situations studied. Maneerat & Niwitpong (2021) proposed confidence intervals for the common mean of several delta-lognormal distributions based on FGCI, the large-sample (LS) approach, MOVER, PB, and highest posterior density intervals (HPD) based on the Jeffreys’ rule (HPD-JR) or normal-gamma-beta (HPD-NGB) prior; those based on MOVER and PB outperformed the others in a variety of situations. Several researchers have examined methods for constructing confidence intervals for a gamma distribution with excess zeros. Ren, Liu & Pu (2021) proposed simultaneous confidence intervals for the difference between the means of multiple zero-inflated gamma distributions by using three fiducial methods and applied them to precipitation data. Muralidharan & Kale (2002) defined a modified gamma distribution with a singularity at zero and produced confidence intervals for the mean of a mixed distribution. Lecomte et al. (2013) provided compound Poisson-gamma and delta-gamma distributions to handle zero-inflated continuous data under variable sampling volume. Kaewprasert, Niwitpong & Niwitpong (2022) proposed Bayesian estimation for the mean of delta-gamma distributions with application to rainfall data in Thailand.

In statistics, the variance, which gives a measure of the spread or variability of a distribution, is the second central moment, and the positive square root of the variance is the standard deviation (Casella & Berger, 2001). It is one of the most popular parameters of interest for probability and statistical inference.

We are interested to study the confidence interval for the variance of gamma distribution because it is commonly used to model environmental data such as a rainfall dispersion. Rainfall dispersion data can help to examine rainfall intensity, which may be high or low risk. We have studied many research related to constructing the confidence interval for rainfall data, such as Yosboonruang, Niwitpong & Niwitpong (2019) and Maneerat & Niwitpong (2021). We have found several interesting priors, including: Jeffreys’, uniform, or normal-gamma-beta prior. Therefore, we applied to this study.

Since no publications have yet been forthcoming on constructing confidence intervals for the variance of a gamma distribution with excess zeros, the objective of the present study is to construct the confidence interval for the variance of a gamma distribution with excess zeros based on FQ, PB, and six Bayesian-based methods: three Bayesian confidence intervals based on the Jeffreys’ (BAY-J), uniform (BAY-U), or normal-gamma-beta (BAY-NGB) prior and three highest posterior density intervals based on the Jeffreys’ (HPD-J), uniform (HPD-U), or normal-gamma-beta (HPD-NGB) prior.

Methods

Let X_i be a random variable following gamma (α, β) distribution with shape parameter α and scale parameter β. The probability density function can be derived as follows (1) $f (x; α, β) = \{\begin{matrix} \frac{1}{Γ (a) β^{α}} x^{α - 1} e^{- x / β}; & x > 0, \\ 0; & otherwise . \end{matrix}$

Suppose that the population of interest contains both zero and non-zero observations; the zero observations follow a binomial distribution while the non-zero observations follow a gamma distribution. The numbers of zero and non-zero observations are defined as n₍₀₎ and n₍₁₎ respectively, where n = n₍₀₎ + n₍₁₎. Let X = (X₁, X₂, …, X_n) be a random sample from a gamma distribution with excess zeros denoted as Δ(δ, α, β). The distribution function for the confidence interval can be derived as (2) $G (x_{i}; δ, α, β) = \{\begin{matrix} δ; & x = 0, \\ δ + (1 - δ) F (x; α, β); & x > 0 \end{matrix}$ where F(x; α, β) is the gamma cumulative distribution function.

The maximum likelihood estimator of δ is $\hat{δ} = n_{(0)} / n$ . The population mean and variance of X are respectively given by (3) $E (X) = (1 - δ) \cdot (α β)$ (4) $V a r (X) = τ = (1 - δ) \cdot (α β^{2}) + δ (1 - δ) \cdot {(α β)}^{2} .$

The approches used to construct the confidence intervals are in the following subsections.

The FQ confidence interval

Krishnamoorthy, Mathew & Mukherjee (2008) suggested that a gamma distribution can be approximated by applying the cubic transformation of a Gaussian distribution. Let Y₁, …, Y_n be a sample from a gamma (α, β) distribution. When $X_{i} = Y_{i}^{\frac{1}{3}},$ i=1 , …, n then X_i are approximately normally distributed with mean µand variance σ² respectively given by (5) $μ = {(b a)}^{\frac{1}{3}} (1 - \frac{1}{9 a}) and σ^{2} = \frac{b^{\frac{2}{3}}}{9 a^{\frac{1}{3}}}$ where shape parameter a and scale parameter b. The FQs for µand σ² are, respectively, (6) $Q_{μ} = \bar{x} + \frac{Z \sqrt{n - 1}}{\sqrt{χ_{n - 1}^{2}}} \cdot \frac{s}{\sqrt{n}} and Q_{σ^{2}} = \frac{(n - 1) s^{2}}{χ_{n - 1}^{2}}$ where $\bar{x}$ and s are the observed values of $\bar{X}$ and S, respectively; Z and $χ_{n - 1}^{2}$ represent independent random variable of standard normal and chi-squared distribution, respectively; and n is the sample size. The FQs for the parameters of a gamma distribution can thus be derived as (7) $Q_{a} = \frac{1}{9} \{(1 + 0.5 \frac{Q_{μ}^{2}}{Q_{σ^{2}}}) + {[{(1 + 0.5 \frac{Q_{μ}^{2}}{Q_{σ^{2}}})}^{2} - 1]}^{\frac{1}{2}}\}$ (8) $Q_{b} = 27 {(Q_{a})}^{\frac{1}{2}} {(Q_{σ^{2}})}^{\frac{3}{2}} .$

Krishnamoorthy & Wang (2016) proposed the FQs for the mean of gamma distribution as follows: (9) $Q_{M} = {\{\frac{Q_{μ}}{2} + \sqrt{{(\frac{Q_{μ}}{2})}^{2} + Q_{σ^{2}}}\}}^{3}$ where Q_μ and Q_σ² are defined in Eq. (6). Li, Zhou & Tian (2013) proposed the FQ for δ as (10) $Q_{δ} \sim \frac{1}{2} Beta (n_{(1)}, n_{(0)} + 1) + \frac{1}{2} Beta (n_{(1)} + 1, n_{(0)}) .$

We can express the FQ for the variance as follows:

If V = ab², then we can write Eq. (5) as (11) $μ = V^{\frac{1}{3}} b^{- \frac{1}{3}} (1 - \frac{b^{2}}{9 V}) and σ^{2} = \frac{b^{\frac{4}{3}}}{9 V^{\frac{1}{3}}} .$

By solving the above equations for V, we obtain $V = {((μ + \sqrt{μ^{2} + 4 σ^{2}}) / (2 (9^{- 1 / 4}) {(σ^{2})}^{- 1 / 4}))}^{4}$ . Thus, the FQ for gamma variance can be obtained as (12) $Q_{V} = {\{\frac{Q_{μ} + \sqrt{Q_{μ}^{2} + 4 Q_{σ^{2}}}}{2 (9^{- 1 / 4}) {(Q_{σ^{2}})}^{- 1 / 4}}\}}^{4}$ where Q_μ and Q_σ² are defined in Eq. (6). Thus, the FQ for τ is in the form (13) $Q_{τ} = (1 - Q_{δ}) \cdot Q_{V} + Q_{δ} (1 - Q_{δ}) \cdot Q_{M}^{2} .$

Therefore, the 100(1 − α)% confidence interval for τ is (14) $C I_{F Q} = [Q_{τ} (α / 2), Q_{τ} (1 - α / 2)]$ where Q_τ(α/2) and Q_τ(1 − α/2) are the (α/2)-th and (1 − α/2)-th percentiles of Q_τ, respectively.

The confidence intervals for τ can be obtained by using Algorithm 1.

 
_______________________ 
Algorithm 1 FQ_____________________________________________________________________ 
 1:  Generate x from a gamma distribution with excess zeros, compute ˉ x, and 
     s2 of the cube root transformed sample. 
  2:  Generate a standard normal variate Z and a chi-square variate χ2n−1. 
  3:  Generate Beta(n(1),n(0) + 1) and Beta(n(1) + 1,n(0). 
  4:  Compute Qμ, Qσ2  and Qδ from Eqs. (6) and (10). 
  5:  Compute the FQs for mean (QM) and variance (QV ) of gamma distribution 
     from Eqs. (9) and (12). 
  6:  Compute Qτ from Eq. (13). 
  7:  Repeat Steps 2–6 5,000 times and obtain an array of Qτ. 
  8:  Compute the 95% confidence intervals for τ from Eq. (14). 
  9:  Repeat Steps 1–8 10,000 times to compute the coverage probabilities (CPs) 
     and the average lengths (ALs).__________________________________________________________

The PB confidence interval

The log-likelihood function for the vector of shape α and scale β parameters in gamma distribution is given by Saulo et al. (2018). (15) $L (α, β) = n \{α log (β) - log [Γ (α)]\} + (α - 1) \sum_{i = 1}^{n} log (X_{i}) - β \sum_{i = 1}^{n} X_{i} .$

Then, the maximum likelihood estimators (MLE) of α and β can be derived as (16) $\hat{α} = \frac{0.5}{log \bar{x} - \bar{log x}}$ (17) $\hat{β} = \frac{\hat{α}}{\bar{x}} .$ The PB for variance of gamma distribution with excess zeros can be written as (18) ${\hat{τ}}^{*} = (1 - {\hat{δ}}^{*}) \cdot () + {\hat{δ}}^{*} (1 - {\hat{δ}}^{*}) \cdot {(\frac{{\hat{α}}^{*}}{{\hat{β}}^{*}})}^{2} .$

The 100(1 − α)% confidence interval for τ is (19) $C I_{P B} = [{\hat{τ}}^{*} (α / 2), {\hat{τ}}^{*} (1 - α / 2)] .$

The Bayesian confidence intervals

For this study, let Y₁, …, Y_n be a sample from a gamma (α, β) distribution, then for $X_{i} = Y_{i}^{\frac{1}{3}},$ i=1 , …, n then X_i are approximately normally distributed with mean µand variance σ² (Krishnamoorthy, Mathew & Mukherjee, 2008). From the law of large numbers, we know that $μ \sim N (\bar{x}, σ^{2} / n)$ (Casella & Berger, 2001). Thus, the marginal posterior distribution of µis $μ | σ^{2}, x \sim N (\bar{x}, σ^{2} / n_{(1)})$

 
_______________________________________________________________________________________________________ 
Algorithm 2 PB_____________________________________________________________________ 
 1:  Generate x from a gamma distribution with excess zeros, compute ˉ x, ^ δ , ^ α 
    and ^ β . 
  2:  Generate x∗ from x. 
  3:  Compute ˉ x∗, ^ δ∗, ^ α∗ and ^ β∗. 
  4:  Compute ^ τ∗ from Eq. (18). 
  5:  Repeat Steps 2–4 5,000 times and obtain an array of ^ τ∗. 
  6:  Compute the 95% confidence intervals for ^ τ∗ from Eq. (19). 
  7:  Repeat Steps 1–6 10,000 times to compute the CPs and ALs.__________________

HPD intervals are constructed from the posterior distribution based on the Bayesian approach. The HPD consists of the values of the parameter for which the posterior density is highest (Casella & Berger, 2001), while the HPD interval is the narrowest possible interval for the parameter of interest at probability 100(1 − α)% (Maneerat, Niwitpong & Niwitpong, 2020).

In this section, the Bayesian confidence interval is constructed upon the Jeffreys’ priors, uniform priors and normal-gamma-beta prior.

The BAY-J and HPD-J intervals

The Jeffreys’ prior for δ in a binomial distribution is $p (δ) \propto {(δ)}^{- \frac{1}{2}} {(1 - δ)}^{\frac{1}{2}}$ (Bolstad & Curran, 2016). This leads to obtaining the marginal posterior distribution of δ as (20) $δ_{j e f} | x \sim B e t a (n_{(0)} + \frac{1}{2}, n_{(1)} + \frac{3}{2}) .$

Jeffreys’ prior for σ² in a lognormal distribution is p(σ²) ∝ σ⁻². Therefore, the marginal posterior distribution of σ² becomes (21) $σ_{j e f}^{2} | x \sim I G (\frac{n_{(1)}}{2}, \frac{\sum_{i = 1}^{n} {(x_{i} - μ)}^{2}}{2}) .$

The marginal posterior distribution of µis (22) $μ_{j e f} | σ^{2}, x \sim N (\bar{x}, σ_{j e f}^{2} / n_{(1)}) .$

We compute the mean and variance of gamma by using μ_jef|σ², x and $σ_{j e f}^{2} | x$ as follows: (23) $M_{B A Y - J} = {\{\frac{μ_{j e f}}{2} + \sqrt{{(\frac{μ_{j e f}}{2})}^{2} + σ_{j e f}^{2}}\}}^{3}$ (24) $V_{B A Y - J} = {\{\frac{μ_{j e f} + \sqrt{μ_{j e f}^{2} + 4 σ_{j e f}^{2}}}{2 (9^{- 1 / 4}) {(σ_{j e f}^{2})}^{- 1 / 4}}\}}^{4} .$

So that (25) ${\hat{τ}}_{B A Y - J} = (1 - δ_{j e f}) \cdot V_{B A Y - J} + δ_{j e f} (1 - δ_{j e f}) \cdot M_{B A Y - J}^{2} .$

The confidence interval and HPD interval of τ based on the Jeffreys’ prior are obtained as (26) $C I_{B A Y - J} = [{\hat{τ}}_{B A Y - J} (α / 2), {\hat{τ}}_{B A Y - J} (1 - α / 2)] .$

The BAY-U and HPD-U intervals

The uniform prior for δ in a binomial distribution is p(δ) ∝ 1 (Bolstad & Curran, 2016). This leads to obtaining the marginal posterior distribution of δ as (27) $δ_{u n i f} | x \sim B e t a (n_{(0)} + 1, n_{(1)} + 1) .$

The uniform prior for σ² is σ² ∝ 1 (Kalkur & Rao, 2017). Subsequently, the marginal posterior distribution of σ² becomes (28) $σ_{u n i f}^{2} | x \sim I G (\frac{n_{(1)} - 2}{2}, \frac{\sum_{i = 1}^{n} {(x_{i} - μ)}^{2}}{2}) .$

The marginal posterior distribution of µas (29) $μ_{u n i f} | σ^{2}, x \sim N (\bar{x}, σ_{u n i f}^{2} / n_{(1)}) .$

We compute the mean and variance of a gamma distribution using μ_unif|σ², x and $σ_{u n i f}^{2} | x$ as follows: (30) $M_{B A Y - U} = {\{\frac{μ_{u n i f}}{2} + \sqrt{{(\frac{μ_{u n i f}}{2})}^{2} + σ_{u n i f}^{2}}\}}^{3}$ (31) $V_{B A Y - U} = \{\frac{μ_{u n i f} + \sqrt{μ_{u n i f}^{2} + 4 σ_{u n i f}^{2}}}{2 (9^{- 1 / 4}) {(σ_{u n i f}^{2})}^{- 1 / 4}}\} .^{4}$

So that (32) ${\hat{τ}}_{B A Y - U} = (1 - δ_{u n i f}) \cdot V_{B A Y - U} + δ_{u n i f} (1 - δ_{u n i f}) \cdot M_{B A Y - U}^{2} .$

The confidence interval and HPD interval of τ based on the uniform prior are respectively obtained as (33) $C I_{B A Y - U} = [{\hat{τ}}_{B A Y - U} (α / 2), {\hat{τ}}_{B A Y - U} (1 - α / 2)] .$

The BAY-NGB and HPD-NGB intervals

Maneerat & Niwitpong (2021) defined the normal-gamma-beta prior as (34) $p (τ) \propto λ^{- 1} {[(δ) (1 - δ)]}^{- 1 / 2}$ where λ = σ⁻², (μ, λ) follows a normal-gamma distribution and δ follows a beta distribution (Maneerat & Niwitpong, 2021). Thus, the marginal posterior distributions of δ, σ² and µrespectively become (35) $δ_{N G B} | x \sim B e t a (n_{(0)} + \frac{1}{2}, n_{(1)} + \frac{1}{2})$ (36) $σ_{N G B}^{2} | x \sim I G (\frac{n_{(1)} - 1}{2}, \frac{\sum_{i = 1}^{n_{(1)}} {(x_{i} - μ)}^{2}}{2})$ (37) $μ_{N G B} | x \sim t_{2 (n_{(1)} - 1)} (\bar{x}, \frac{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}}{n_{(1)} (n_{(1)} - 1)}) .$ We compute the mean and variance of a gamma distribution by using μ_NGB|x and $σ_{N G B}^{2} | x$ as follows: (38) $M_{B A Y - N G B} = {\{\frac{μ_{N G B}}{2} + \sqrt{{(\frac{μ_{N G B}}{2})}^{2} + σ_{N G B}^{2}}\}}^{3}$ (39) $V_{B A Y - N G B} = {\{\frac{μ_{N G B} + \sqrt{μ_{N G B}^{2} + 4 σ_{N G B}^{2}}}{2 (9^{- 1 / 4}) {(σ_{N G B}^{2})}^{- 1 / 4}}\}}^{4} .$

So that (40) ${\hat{τ}}_{B A Y - N G B} = (1 - δ_{N G B}) \cdot V_{B A Y - N G B} + δ_{N G B} (1 - δ_{N G B}) \cdot M_{B A Y - N G B}^{2} .$

The confidence interval and HPD interval of τ based on the normal-gamma-beta prior are respectively obtained as (41) $C I_{B A Y - N G B} = [{\hat{τ}}_{B A Y - N G B} (α / 2), {\hat{τ}}_{B A Y - N G B} (1 - α / 2)] .$

 
_______________________________________________________________________________________________________ 
Algorithm 3 Bayesian interval___________________________________________________ 
 1:  Generate x from a gamma distribution with excess zeros, compute ^ δ , ^ μ , and 
     ^ σ2. 
  2:  Generate δ|x from Eqs. (20), (27) and (35). 
  3:  Generate σ2|x from Eqs. (21), (28) and (36). 
  4:  Given σ2|x generate μ|σ2,x. 
  5:  Compute mean and variance of gamma distribution from Eqs.  (23), (24), 
     (30), (31), (38) and (39). 
  6:  Compute ^ τ from Eqs. (25), (32) and (40). 
  7:  Compute the 95% confidence intervals and HPD for ^ τ from Eqs. (26), (33) 
     and (41). 
  8:  Repeat Steps 1–7 10,000 times to compute the CPs and ALs.__________________

Simulation studies and Results

A Monte Carlo simulation study with 10,000 replications (M) and 5,000 repetitions (m) for FQ and PB, was conducted at a nominal confidence level of 0.95. We set sample size n as 30, 50, 100 or 200 and probability of zeros δ as 0.2, 0.5 or 0.8, for which we set shape parameter α as 7.00, 7.50 or 7.75; 2.00, 2.50 or 2.75; and 1.25, 1.50 or 1.75, respectively. We set rate parameter β as 1 for all cases. The performances of the confidence intervals were assessed by comparing their coverage probabilities (CPs) and average lengths (ALs); the best-performing confidence interval for a particular situation was identified as having a CP close or greater than 0.95 and the shortest AL. The confidence intervals for the variance of gamma distribution with excess zeros constructed using FQ, PB, BAY-J, HPD-J, BAY-U, HPD-U, BAY-NGB and HPD-NGB.

We report the coverage probabilities and the average lengths of nominal 95% two-sided confidence intervals for variance of gamma distribution with excess zeros in Table 1 and Figs. 1, 2 and 3.

Table 1:

The coverage probabilities and (average lengths) of nominal 95% two-sided confidence intervals for variance of gamma distribution with excess zeros.

n	δ	α	Coverage probability (Average length)
			PB	FQ	BAY-J	HPD-J	BAY-U	HPD-U	BAY-NGB	HPD-NGB
30	0.2	7.00	0.9444	0.9686	0.9226	0.9184	0.9324	0.9444	0.9802	0.9771
			(11.6924)	(12.6084)	(10.0751)	(9.8624)	(10.9472)	(10.6335)	(12.7642)	(12.4067)
		7.50	0.9480	0.9728	0.9317	0.9293	0.9420	0.9522	0.9826	0.9789
			(12.8866)	(13.6348)	(11.0851)	(10.8819)	(11.9674)	(11.6665)	(13.9564)	(13.6000)
		7.75	0.9541	0.9731	0.9378	0.9334	0.9482	0.9569	0.9827	0.9807
			(13.5974)	(14.3134)	(11.7150)	(11.5094)	(12.6155)	(12.3114)	(14.6896)	(14.3333)
	0.5	2.00	0.8616	0.9521	0.8004	0.7817	0.8487	0.8557	0.9578	0.9391
			(2.9978)	(4.1962)	(2.3918)	(2.0896)	(3.2420)	(2.7330)	(3.8034)	(3.3989)
		2.50	0.8629	0.9500	0.7903	0.7788	0.8354	0.8502	0.9556	0.9440
			(3.7780)	(5.3509)	(3.0529)	(2.7002)	(4.0796)	(3.4959)	(4.8638)	(4.4099)
		2.75	0.8601	0.9467	0.7850	0.7767	0.8308	0.8433	0.9543	0.9454
			(4.1300)	(5.8440)	(3.3308)	(2.9635)	(4.4162)	(3.8167)	(5.3407)	(4.8762)
	0.8	1.25	0.7784	0.9564	0.8347	0.8479	0.8874	0.9569	0.9711	0.9616
			(1.3763)	(12.5762)	(3.5742)	(2.0919)	(63.4518)	(15.9813)	(10.0543)	(4.5952)
		1.50	0.7932	0.9615	0.8403	0.8577	0.8897	0.9603	0.9754	0.9671
			(1.6638)	(13.2999)	(4.0138)	(2.4576)	(63.9289)	(6.6673)	(10.6502)	(5.1506)
		1.75	0.8048	0.9621	0.8489	0.8647	0.8937	0.9637	0.9793	0.9725
			(1.9395)	(12.9027)	(4.1595)	(2.7093)	(53.5498)	(15.2379)	(10.3024)	(5.4055)
50	0.2	7.00	0.9621	0.9704	0.9275	0.9243	0.9411	0.9461	0.9814	0.9789
			(9.2009)	(9.0634)	(7.5400)	(7.4561)	(7.8353)	(7.7315)	(9.4418)	(9.2934)
		7.50	0.9625	0.9704	0.9338	0.9296	0.9447	0.9506	0.9807	0.9779
			(10.1651)	(9.9058)	(8.3868)	(8.3065)	(8.6808)	(8.5812)	(10.4194)	(10.2715)
		7.75	0.9655	0.9729	0.9374	0.9367	0.9463	0.9498	0.9844	0.9826
			(10.6812)	(10.3530)	(8.8378)	(8.7599)	(9.1334)	(9.0356)	(10.9368)	(10.7863)
	0.5	2.00	0.9054	0.9478	0.7868	0.7573	0.8238	0.8155	0.9505	0.9285
			(2.4797)	(2.6883)	(1.6201)	(1.4938)	(1.8801)	(1.7160)	(2.5473)	(2.3981)
		2.50	0.9010	0.9475	0.7890	0.7693	0.8228	0.8202	0.9514	0.9341
			(3.0615)	(3.4346)	(2.0567)	(1.9090)	(2.3755)	(2.1861)	(3.2687)	(3.1047)
		2.75	0.9039	0.9515	0.7892	0.7674	0.8223	0.8193	0.9538	0.9417
			(3.3850)	(3.8265)	(2.2825)	(2.1243)	(2.6329)	(2.4295)	(3.6435)	(3.4714)
	0.8	1.25	0.8435	0.9559	0.8337	0.8262	0.8882	0.9116	0.9688	0.9476
			(1.1826)	(2.4727)	(1.2830)	(1.0168)	(2.5640)	(1.7317)	(2.1219)	(1.6296)
		1.50	0.8550	0.9569	0.8384	0.8402	0.8860	0.9161	0.9699	0.9538
			(1.4185)	(2.8663)	(1.5275)	(1.2448)	(2.8649)	(2.0242)	(2.4800)	(1.9666)
		1.75	0.8675	0.9602	0.8515	0.8537	0.8930	0.9239	0.9736	0.9622
			(1.6807)	(3.2832)	(1.7911)	(1.4943)	(3.1990)	(2.3387)	(2.8757)	(2.3349)
100	0.2	7.00	0.9685	0.9652	0.9270	0.9238	0.9372	0.9366	0.9758	0.9729
			(6.6077)	(6.1266)	(5.2494)	(5.2171)	(5.3267)	(5.2916)	(6.5195)	(6.4617)
		7.50	0.9732	0.9682	0.9321	0.9311	0.9394	0.9407	0.9801	0.9785
			(7.3264)	(6.7357)	(5.8730)	(5.8403)	(5.9473)	(5.9122)	(7.2284)	(7.1679)
		7.75	0.9760	0.9702	0.9437	0.9426	0.9501	0.9512	0.9841	0.9817
			(7.6438)	(7.0104)	(6.1733)	(6.1407)	(6.2465)	(6.2117)	(7.5695)	(7.5095)
	0.5	2.00	0.9332	0.9292	0.7597	0.7285	0.7931	0.7636	0.9316	0.9120
			(1.8169)	(1.6738)	(1.0400)	(0.9967)	(1.1103)	(1.0616)	(1.6412)	(1.5930)
		2.50	0.9360	0.9420	0.7703	0.7434	0.7995	0.7783	0.9436	0.9306
			(2.2541)	(2.1692)	(1.3337)	(1.2817)	(1.4222)	(1.3641)	(2.1292)	(2.0751)
		2.75	0.9301	0.9392	0.7763	0.7528	0.7995	0.7875	0.9425	0.9295
			(2.4789)	(2.4163)	(1.4761)	(1.4200)	(1.5739)	(1.5114)	(2.3735)	(2.3171)
	0.8	1.25	0.9076	0.9439	0.8161	0.7969	0.8573	0.8475	0.9550	0.9302
			(0.9191)	(1.0226)	(0.6335)	(0.5746)	(0.7685)	(0.6809)	(0.9717)	(0.8766)
		1.50	0.9159	0.9526	0.8333	0.8141	0.8667	0.8630	0.9624	0.9427
			(1.0920)	(1.2439)	(0.7821)	(0.7184)	(0.9349)	(0.8409)	(1.1916)	(1.0887)
		1.75	0.9123	0.9544	0.8394	0.8267	0.8696	0.8697	0.9678	0.9482
			(1.2881)	(1.4819)	(0.9445)	(0.8765)	(1.1158)	(1.0159)	(1.4312)	(1.3194)
200	0.2	7.00	0.9761	0.9634	0.9225	0.9199	0.9339	0.9330	0.9751	0.9715
			(4.7169)	(4.2392)	(3.6845)	(3.6666)	(3.7070)	(3.6888)	(4.5589)	(4.5303)
		7.50	0.9785	0.9665	0.9317	0.9304	0.9442	0.9428	0.9775	0.9755
			(5.1932)	(4.6428)	(4.1173)	(4.0987)	(4.1390)	(4.1201)	(5.0479)	(5.0179)
		7.75	0.9822	0.9692	0.9403	0.9384	0.9483	0.9469	0.9817	0.9799
			(5.4485)	(4.8598)	(4.3497)	(4.3307)	(4.3699)	(4.3503)	(5.3069)	(5.2765)
	0.5	2.00	0.9477	0.8978	0.6997	0.6659	0.7285	0.6938	0.9000	0.8774
			(1.3034)	(1.1146)	(0.7016)	(0.6854)	(0.7237)	(0.7066)	(1.1126)	(1.0944)
		2.50	0.9463	0.9201	0.7363	0.7060	0.7590	0.7326	0.9209	0.9059
			(1.6261)	(1.4556)	(0.9051)	(0.8852)	(0.9330)	(0.9121)	(1.4510)	(1.4304)
		2.75	0.9470	0.9297	0.7443	0.7145	0.7667	0.7419	0.9302	0.9162
			(1.7859)	(1.6291)	(1.0051)	(0.9835)	(1.0358)	(1.0131)	(1.6250)	(1.6031)
	0.8	1.25	0.9426	0.9268	0.7829	0.7506	0.8168	0.7872	0.9383	0.9111
			(0.6664)	(0.5858)	(0.3827)	(0.3651)	(0.4131)	(0.3922)	(0.5860)	(0.5575)
		1.50	0.9476	0.9450	0.8173	0.7885	0.8451	0.8224	0.9553	0.9334
			(0.8008)	(0.7339)	(0.4859)	(0.4665)	(0.5214)	(0.4983)	(0.7377)	(0.7060)
		1.75	0.9462	0.9480	0.8317	0.8124	0.8542	0.8393	0.9594	0.9419
			(0.9469)	(0.8890)	(0.5953)	(0.5742)	(0.6363)	(0.6114)	(0.8986)	(0.8637)

DOI: 10.7717/peerj.14023/table-1

Notes:

aThe coverage probabilities greater than the nominal confidence level of 0.95 are in bold and the shortest average lengths are in italics.

The CPs of the PB, FQ, HPD-U, BAY-NGB, and HPD-NGB confidence intervals were greater than or close to the nominal confidence level of 0.95 in all situations studied. For a small-to-moderate sample size, FQ and the HPD-U performed well for small δ whereas BAY-NGB and HPD-NGB performed well for large δ. For a large sample size, FQ performed well for small δ whereas BAY-NGB performed well for large δ. Although the expected lengths of the HPD-J were shorter than the other methods, the CPs of BAY-J and HPD-J were lower than the nominal confidence level in all cases.

The findings show that although FQ, HPD-U, BAY-NGB, and HPD-NGB attained acceptable CPs, the ALs of BAY-NGB and the HPD-NGB were shorter than the other methods, and so they can be recommended for constructing the confidence interval for the variance of a gamma distribution with excess zeros. It can be seen that for HPD-NGB developed from the study of Maneerat & Niwitpong (2021), the simulation results are similar to these studies. For small-to-large sample size, HPD-NGB performed well. BAY-NGB and HPD-NGB are the best because BAY-NGB and HPD-NGB attained stable CPs and ALs were shorter than the other methods for all sample sizes. A referee suggested to check the validity and robustness of the model for smaller sample sizes with moderate number of zeros. We, therefore, simulated a study with 10,000 replications (M) and 5,000 repetitions (m) for FQ and PB, was conducted at a nominal confidence level of 0.95. We set sample size n as 10 or 20 and probability of zeros δ as 0.2, or 0.5, for which we set shape parameter α as 7.00, 7.50 or 7.75; and 2.00, 2.50 or 2.75, respectively. We set rate parameter β as one for all cases. The results (not shown here) show that the CPs of the FQ, HPD-U, BAY-NGB, and HPD-NGB confidence intervals were greater than or close to the nominal confidence level of 0.95 in all situations studied. The findings show that although FQ, HPD-U, BAY-NGB, and HPD-NGB attained acceptable CPs, the ALs of HPD-NGB were shorter than the other methods. Although the sample sizes are small (n = 10, n = 20), our findings show that BAY-NGB and HPD-NGB can be recommended for constructing the confidence interval for the variance of a gamma distribution with excess zeros.

Figure 1: Line graphs of (A) coverage probabilities and (B) average lengths of all methods in the case of the different sample sizes.

Download full-size image

DOI: 10.7717/peerj.14023/fig-1

Figure 2: Line graphs of (A) coverage probabilities and (B) average lengths of all methods in the case of the different probabilities of zero values.

Download full-size image

DOI: 10.7717/peerj.14023/fig-2

Figure 3: Line graphs of (A) coverage probabilities and (B) average lengths of all methods in the case of the different shape parameters.

Download full-size image

DOI: 10.7717/peerj.14023/fig-3

Empirical application of the proposed confidence intervals

The confidence interval performances were compared by using real-world datasets comprising monthly rainfall data reported by the Upper Northern Region Irrigation Hydrology for January and February 1993 to 2021 at the Kiew Lom Dam, Lampang province, Thailand.

First, the best fit for the positive rainfall data among normal, lognormal, Cauchy, and gamma models was examined by calculating their Akaike information criterion (AIC) and Bayesian information criterion (BIC) values (Table 2). The results show that the lowest AIC and BIC values (207.7139 and 210.2301, respectively) were for the gamma distribution, indicating that it was the best fit for the data.

The summary statistics for the rainfall data in Kiew Lom Dam Lampang province are $\bar{x} = 18.6461$ , n = 58 , n₍₁₎ = 26, n₍₀₎ = 32, while the maximum likelihood estimators for δ, α, β and τ are $\hat{δ} = 0.5517, \hat{α} = 0.7297, \hat{β} = 0.0391$ and $\hat{τ} = 299.5542$ , respectively. The calculated two-sided confidence intervals for τ are reported in Table 3.

Table 2:

AIC and BIC results of positive rainfall data.

Models	Normal	Lognormal	Cauchy	Gamma
AIC	224.9317	216.186	230.4221	207.7139
BIC	227.4479	218.7022	232.9383	210.2301

DOI: 10.7717/peerj.14023/table-2

Table 3:

The 95% two-sided confidence intervals for variance of rainfall data in Kiew Lom Dam in Lampang province.

Methods	Confidence intervals for θ		Length of intervals
	Lower	Upper
PB	115.6468	543.4372	427.7903
FQ	115.3533	974.3039	858.9506
BAY-J	138.7433	764.2119	625.4687
HPD-J	107.4391	613.0399	505.6008
BAY-U	146.4527	1078.71	932.2578
HPD-U	111.2196	809.1257	697.9061
BAY-NGB	135.4990	885.4536	749.9546
HPD-NGB	102.1386	685.1513	583.0128

DOI: 10.7717/peerj.14023/table-3

For n = 50 and δ = 0.5, FQ and BAY-NGB obtained CPs close to the nominal confidence level of 0.95, but BAY-NGB bay obtained the shortest length method. Thus, the BAY-NGB method is recommended for constructing the confidence interval for the variance in rainfall data in January and February at the Kiew Lom Dam in Lampang province.

Conclusions

We constructed confidence intervals for the variance of a gamma distribution with excess zeros by using the PB, FQ, BAY-J, HPD-J, BAY-U, HPD-U, BAY-NGB, and HPD-NGB approaches. The CPs and ALs of the methods were assessed by Monte Carlo simulation for various situations and by using real precipitation data following a gamma distribution with excess zeros. Our findings show that BAY-NGB and HPD-NGB can be recommended for constructing the confidence interval for the variance of a gamma distribution with excess zeros. In future research, we will investigate constructing confidence intervals for the difference between the variances of gamma distributions with excess zeros.

Supplemental Information

R code

This code computed coverrage probabilities and average lengths for all confidence intervals.

DOI: 10.7717/peerj.14023/supp-1

Download

R code to compute the data set

This R code is computed all confidence intervals

DOI: 10.7717/peerj.14023/supp-2

Download

The monthly rainfall data (mm) from Kiew Lom Dam, Lampang province, Thailand in January and February, comprising 58 observations from 1993–2021

DOI: 10.7717/peerj.14023/supp-3

Download

[1] Aitchison J. 1955. On the distribution of a positive random variable having a discrete probability mass at the origin. Journal of the American Statistical Association 50(271):901-908

[2] Aitchison J, Brown JAC. 1963. The lognormal distribution: with special reference to its uses in economics. London: Cambridge University Press.

[3] Bolstad WM, Curran JM. 2016. Introduction to bayesian statistics (Third Edition). Hoboken: Wiley.

[4] Casella G, Berger RL. 2001. Statistical inference (Second Edition). Boston: Cengage Learning.

[5] Kaewprasert T, Niwitpong SA, Niwitpong S. 2022. Bayesian estimation for the mean of delta-gamma distributions with application to rainfall data in Thailand. PeerJ 10:e13465

[6] Kalkur TA, Rao A. 2017. Bayes estimator for coefficient of variation and inverse coefficient of variation for the normal distribution. International Journal of Statistics and Systems 12(4):721-732

[7] Krishnamoorthy K, León-Novelo L. 2014. Small sample inference for gamma parameters: one-sample and two-sample problems. Environmetrics 25(2):107-126

[8] Krishnamoorthy K, Mathew T, Mukherjee S. 2008. Normal-based methods for a gamma distribution. Technometrics 50(1):69-78

[9] Krishnamoorthy K, Wang X. 2016. Fiducial confidence limits and prediction limits for a gamma distribution: censored and uncensored cases. Environmetrics 27(8):479-493

[10] Lecomte JB, Benoît HP, Ancelet S, Etienne MP, Bel L, Parent E. 2013. Compound poisson-gamma vs. delta-gamma to handle zero-inflated continuous data under a variable sampling volume. In: O’Hara RB, ed. Methods in ecology and evolution. Hoboken: Wiley. 1159-1166

[11] Li X, Zhou X, Tian L. 2013. Interval estimation for the mean of lognormal data with excess zeros. Statistics & Probability Letters 83(11):2447-2453

[12] Maneerat P, Niwitpong SA. 2021. Estimating the average daily rainfall in Thailand using confidence intervals for the common mean of several delta-lognormal distributions. PeerJ 9:e10758

[13] Maneerat P, Niwitpong S, Niwitpong S. 2020. Bayesian confidence intervals for the difference between variances of deltalognormal distributions. Biometrical Journal 62(7):1769-1790

[14] Muralidharan K, Kale BK. 2002. Modified gamma distributions with singularity at zero. Communications in Statistics 31(1):143-158

[15] Piao C, Zhi-Sheng Y. 2015. Tolerance limits for gamma distribution based on generalized fiducial method. In: 2015 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM).

[16] Pradhan B, Kundu D. 2011. Bayes estimation and prediction of the two-parameter gamma distribution. Journal of Statistical Computation and Simulation 81(9):1187-1198

[17] Ren P, Liu G, Pu X. 2021. Simultaneous confidence intervals for mean differences of multiple zero-inflated gamma distributions with applications to precipitation. Communications in Statistics - Simulation and Computation

[18] Sangnawakij P, Niwitpong SA, Niwitpong S. 2015. Confidence intervals for the ratio of coefficients of variation of the gamma distributions. In: Hyunh VN, Inuiguchi M, Demoeux T, eds. Integrated Uncertainty in Knowledge Modeling and Decision Making 2015, Lecture Notes in Computer Science, vol. 9376. Cham: Springer. 193-203