Micro Data Functions
md_functions.Rmd
Overview
This vignette gives an overview of the pipster package functions for
microdata. Microdata consist of detailed records of individual welfare
measures, such as consumption, expenditure, or income, where each
observation corresponds to a unique individual and is accompanied by a
sample weight that represents the individual’s share of the overall
population. pipster provides a series of functions to estimate poverty
and inequality measures using microdata:
- pipmd_pov_headcount() (FGT0)
- pipmd_pov_gap() (FGT1)
- pipmd_pov_severity() (FGT2)
It also provides a series of functions to calculate welfare shares, cumulative welfare shares, and income thresholds for each quantile:
- pipmd_quantile_welfare_share()
- pipmd_welfare_share_at()
- pipmd_quantile()
Micro Data Sample
In this vignette, we will explore several typical scenarios in which
the pipster package can be effectively utilized. In each of these
scenarios, we will use a sample dataset with 1000 observations, pip_md,
available with this package. The variables are the following:
- welfare: welfare (income or consumption).
- weight: population weights.
Here is a preview of the first 10 observations:
#> welfare weight
#> 1 81.5864216 7941
#> 2 61.4004171 7672
#> 3 304.4441509 2617
#> 4 1267.9985109 7912
#> 5 3.9202884 8371
#> 6 0.6881794 6819
#> 7 1176.7986957 4229
#> 8 5.5824495 4204
#> 9 8.1514308 7863
#> 10 1.3944173 4762
Case 1: Poverty Profiling
pipster allows the user to estimate poverty measures quickly and
accurately. To demonstrate its use, we can manually calculate FGT(0),
FGT(1), and FGT(2), and then replicate the results using only pipster
functions. The Foster-Greer-Thorbecke indices are a family of poverty
metrics which can be derived by substituting different values of the
parameter \(\alpha\) into the following equation:
\[F G T_\alpha=\frac{1}{N} \sum_{i=1}^H\left(\frac{z-y_i}{z}\right)^\alpha\] where \(z\) is the poverty line, \(N\) the total population, and \(H\) the number of poor individuals (those with income \(y_i \le z\)).
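The formula above can be sketched in base R with sample weights, as the manual calculations in this vignette do. The helper fgt_sketch() and the toy vectors are hypothetical and are not part of pipster; this is only an illustration of the definition.

```r
# Hypothetical helper (not part of pipster): weighted FGT index of order alpha.
fgt_sketch <- function(welfare, weight, povline, alpha = 0) {
  poor <- welfare <= povline                               # poor: at or below the line
  gap  <- ifelse(poor, (povline - welfare) / povline, 0)   # relative shortfall, 0 if non-poor
  # weight * poor keeps only the poor, so alpha = 0 reduces to H / N
  sum(weight * poor * gap^alpha) / sum(weight)
}

# Toy data, made up for illustration
y <- c(0.5, 1, 2, 4)
w <- c(1, 1, 1, 1)

fgt0 <- fgt_sketch(y, w, povline = 2, alpha = 0)  # headcount: 0.75
fgt1 <- fgt_sketch(y, w, povline = 2, alpha = 1)  # poverty gap: 0.3125
fgt2 <- fgt_sketch(y, w, povline = 2, alpha = 2)  # poverty severity: 0.203125
```

Higher values of alpha put more weight on the poorest individuals, which is why the three indices decrease here as alpha increases.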
1.1 Poverty Headcount
The poverty headcount, or FGT(0), can be calculated as follows: \[F G T_0=\frac{1}{N} \sum_{i=1}^H\left(\frac{z-y_i}{z}\right)^0 = \frac{H}{N}\]
z = 1.4 # set the poverty line
N = sum(pip_md$weight)
H = sum(pip_md$weight[pip_md$welfare <= z])
FGT0 = H/N
print(paste0("The poverty headcount index is ", round(FGT0*100,2), "%"))
#> [1] "The poverty headcount index is 37.15%"
In pipster, we can simply use the pipmd_pov_headcount() function:
pip_FGT0 <- pipmd_pov_headcount(welfare = pip_md$welfare,
weight = pip_md$weight,
povline = z)
#> Warning: replacing previous import 'collapse::fdroplevels' by
#> 'data.table::fdroplevels' when loading 'wbpip'
print(paste0("The poverty headcount index is ", round(pip_FGT0$pov_headcount*100,2), "%"))
#> [1] "The poverty headcount index is 37.15%"
1.2 Poverty Gap
The poverty gap, or FGT(1), can be calculated as follows: \[F G T_1=\frac{1}{N} \sum_{i=1}^H\left(\frac{z-y_i}{z}\right)\]
# Calculate the shortfall: the distance between the poverty line and the income of the poor, for each poor.
shortfall <- sum((z - pip_md$welfare[pip_md$welfare <= z]) * pip_md$weight[pip_md$welfare <= z])/z
FGT1 <- (1/N)*(shortfall)
print(paste0("The poverty gap index is ", round(FGT1*100,2), "%"))
#> [1] "The poverty gap index is 27.48%"
In pipster, we can simply use the pipmd_pov_gap() function:
pip_FGT1 <- pipmd_pov_gap(welfare = pip_md$welfare,
weight = pip_md$weight,
povline = z)
print(paste0("The poverty gap index is ", round(pip_FGT1$pov_gap*100,2), "%"))
#> [1] "The poverty gap index is 27.48%"
1.3 Poverty Severity
The poverty severity, or FGT(2), can be calculated as follows: \[F G T_2=\frac{1}{N} \sum_{i=1}^H\left(\frac{z-y_i}{z}\right)^2\]
shortfall_squared <- sum(((z - pip_md$welfare[pip_md$welfare <= z]) / z)^2 * pip_md$weight[pip_md$welfare <= z])
FGT2 <- (1/N)*shortfall_squared
print(paste0("The poverty severity index is ", round(FGT2*100,2), "%"))
#> [1] "The poverty severity index is 23.29%"
In pipster, we can simply use the pipmd_pov_severity() function:
pip_FGT2 <- pipmd_pov_severity(welfare = pip_md$welfare,
weight = pip_md$weight,
povline = z)
print(paste0("The poverty severity index is ", round(pip_FGT2$pov_severity*100,2), "%"))
#> [1] "The poverty severity index is 23.29%"
1.4 Alternatives
For each of the previous poverty index functions, the user can specify
the mean or a factor (times_mean) from which the function will
calculate the poverty line. For example:
pip_FGT1_alt <- pipmd_pov_gap(welfare = pip_md$welfare,
weight = pip_md$weight,
times_mean = 0.2)
print(paste0("The poverty gap index is ", round(pip_FGT1_alt$pov_gap*100,2), "%"))
#> [1] "The poverty gap index is 76.82%"
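To make the idea of a relative poverty line concrete, here is a minimal base-R sketch. It assumes the line is the factor multiplied by the weighted mean of welfare; whether pipster uses the weighted or unweighted mean internally is not confirmed here, and the toy data are made up.

```r
# Toy data, made up for illustration
y <- c(1, 2, 3, 10)   # welfare
w <- c(2, 1, 1, 1)    # sample weights
times_mean <- 0.5     # factor applied to the mean

# Assumption: the relative line is a multiple of the weighted mean
weighted_mean <- sum(y * w) / sum(w)          # (2 + 2 + 3 + 10) / 5 = 3.4
povline_rel   <- times_mean * weighted_mean   # 1.7

# Headcount at the derived line: only y = 1 (weight 2) is at or below 1.7
headcount_rel <- sum(w[y <= povline_rel]) / sum(w)   # 0.4
```

Because the line moves with the mean, a few very large welfare values can push it well above an absolute line such as 1.4.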
Finally, povline can also take multiple values:
pip_FGT1_multiple <- pipmd_pov_gap(welfare = pip_md$welfare,
weight = pip_md$weight,
povline = c(1.4, 3, 4))
print(pip_FGT1_multiple)
#> povline pov_gap
#> <num> <num>
#> 1: 1.4 0.2747608
#> 2: 3.0 0.3571872
#> 3: 4.0 0.3955473
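The same kind of table can be reproduced by hand by looping over poverty lines. The sketch below uses base R and made-up toy data; it illustrates the calculation only and says nothing about how pipmd_pov_gap() vectorizes internally.

```r
# Toy data, made up for illustration
y <- c(0.5, 1, 2, 4)
w <- c(1, 1, 1, 1)
povlines <- c(1, 2, 4)

# Poverty gap for each line: weighted mean of the relative shortfall,
# where pmax() sets the shortfall of the non-poor to zero
pov_gap <- vapply(
  povlines,
  function(z) sum(w * pmax(z - y, 0) / z) / sum(w),
  numeric(1)
)

gap_table <- data.frame(povline = povlines, pov_gap = pov_gap)
print(gap_table)
```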
Case 2: Additional Inequality and Poverty Measures
pipster can also be used to easily calculate additional inequality
measures. The Gini coefficient can be calculated using pipmd_gini()
like so:
gini <- pipmd_gini(welfare = pip_md$welfare,
weight = pip_md$weight)
print((paste0("The Gini index is ", round(gini$value, 2))))
#> [1] "The Gini index is -0.29"
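For readers who want to see what a Gini calculation involves, here is a minimal base-R sketch that applies the trapezoid rule to the weighted Lorenz curve. gini_sketch() and the toy data are hypothetical; this is one common formulation and is not claimed to be how pipmd_gini() computes its result.

```r
# Hypothetical helper (not part of pipster): Gini from the weighted Lorenz curve.
gini_sketch <- function(welfare, weight) {
  ord <- order(welfare)                  # sort from poorest to richest
  y <- welfare[ord]
  w <- weight[ord]
  p <- c(0, cumsum(w) / sum(w))          # cumulative population share
  l <- c(0, cumsum(w * y) / sum(w * y))  # cumulative welfare share (Lorenz curve)
  k <- length(p)
  # One minus twice the area under the Lorenz curve (trapezoid rule)
  1 - sum(diff(p) * (l[-k] + l[-1]))
}

g_unequal <- gini_sketch(c(1, 2, 3, 4), rep(1, 4))   # 0.25
g_equal   <- gini_sketch(c(5, 5, 5), c(1, 2, 3))     # 0: everyone has the same welfare
```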
The MLD (Mean Logarithmic Deviation) can be calculated using
pipmd_mld() like so:
mld <- pipmd_mld(welfare = pip_md$welfare,
weight = pip_md$weight)
print((paste0("The MLD is ", round(mld$value,2))))
#> [1] "The MLD is 4.57"
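The MLD is the weighted average of the log of the ratio between mean welfare and each individual's welfare. A minimal base-R sketch follows; mld_sketch() and the toy data are hypothetical, and the sketch assumes strictly positive welfare values (how pipmd_mld() treats zeros is not covered here).

```r
# Hypothetical helper (not part of pipster): weighted mean log deviation.
# Assumes all welfare values are strictly positive.
mld_sketch <- function(welfare, weight) {
  mu <- sum(weight * welfare) / sum(weight)        # weighted mean
  sum(weight * log(mu / welfare)) / sum(weight)    # average log deviation from the mean
}

m_unequal <- mld_sketch(c(1, 4), c(1, 1))     # equals log(1.25), about 0.223
m_equal   <- mld_sketch(c(3, 3, 3), c(1, 2, 1))   # 0 under perfect equality
```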
And finally, the Watts Index can be calculated using pipmd_watts(),
specifying the poverty line (povline) like so:
z <- 3 # set the poverty line
watts <- pipmd_watts(welfare = pip_md$welfare,
weight = pip_md$weight,
povline = z)
print((paste0("The Watts index is ", round(watts$watts, 2))))
#> [1] "The Watts index is 0.74"
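The Watts index averages the log shortfall of the poor over the whole population. Below is a minimal base-R sketch of that definition; watts_sketch() and the toy data are hypothetical, and excluding non-positive welfare values is an assumption made here so that the logarithm is defined.

```r
# Hypothetical helper (not part of pipster): weighted Watts index.
watts_sketch <- function(welfare, weight, povline) {
  # Assumption: non-positive welfare is excluded so that log() is defined
  poor <- welfare <= povline & welfare > 0
  sum(weight[poor] * (log(povline) - log(welfare[poor]))) / sum(weight)
}

# Toy data: two poor individuals (1 and 2) and one non-poor (8, weight 2)
watts_toy <- watts_sketch(c(1, 2, 8), c(1, 1, 2), povline = 4)
# (log(4) + log(2)) / 4 = log(8) / 4, about 0.52
```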
Alternatively, the user can specify the times_mean parameter. In this
case, the poverty line is calculated as the mean of the welfare vector
multiplied by times_mean:
watts <- pipmd_watts(welfare = pip_md$welfare,
weight = pip_md$weight,
times_mean = 0.8)
print((paste0("The Watts index is ", round(watts$watts, 2))))
#> [1] "The Watts index is 3.96"
Case 3: Welfare Shares
3.1 Welfare share for a specific number of quantiles
One simple use case is calculating the welfare share of each quantile,
or the cumulative welfare share up to each quantile, by specifying the
number of quantiles n:
quantiles <- 5
quantile_welfare_share <- pipmd_quantile_welfare_share(welfare = pip_md$welfare,
weight = pip_md$weight,
n = quantiles)
quantile_welfare_share_at <- pipmd_welfare_share_at(welfare = pip_md$welfare,
weight = pip_md$weight,
n = quantiles)
quantile_threshold <- pipmd_quantile(welfare = pip_md$welfare,
weight = pip_md$weight,
n = quantiles)
# Combine into a dataframe for practicality
df_combined <- data.frame(
popshare = quantile_welfare_share$quantile,
quantile_share = quantile_welfare_share$share,
cumulative_share = quantile_welfare_share_at$share_at,
income_threshold = quantile_threshold$values
)
# View the combined dataframe
print(df_combined)
#> popshare quantile_share cumulative_share income_threshold
#> 1 q_20% 3.168918e-05 0.2010610 2.523813e-01
#> 2 q_40% 4.331949e-04 0.4015349 1.794095e+00
#> 3 q_60% 1.833667e-03 0.6013513 6.789944e+00
#> 4 q_80% 9.426906e-03 0.8004222 4.445397e+01
#> 5 q_100% 9.882745e-01 1.0000000 1.233918e+05
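The link between quantile shares and cumulative shares can be sketched in base R with a weighted Lorenz curve: the cumulative share is the Lorenz curve evaluated at each population cutoff, and the quantile share is the difference between consecutive cumulative shares. lorenz_at_sketch() and the toy data are hypothetical, and the linear interpolation used here is an assumption that may differ from pipster's method.

```r
# Hypothetical helper (not part of pipster): Lorenz curve at given population shares.
lorenz_at_sketch <- function(welfare, weight, popshare) {
  ord <- order(welfare)                  # sort from poorest to richest
  y <- welfare[ord]
  w <- weight[ord]
  p <- c(0, cumsum(w) / sum(w))          # cumulative population share
  l <- c(0, cumsum(w * y) / sum(w * y))  # cumulative welfare share
  approx(x = p, y = l, xout = popshare)$y   # linear interpolation (assumption)
}

# Toy data, made up for illustration
y <- c(1, 2, 3, 4)
w <- rep(1, 4)
n <- 2                                   # number of quantiles
cutoffs <- seq_len(n) / n                # 0.5, 1.0

cumulative_share <- lorenz_at_sketch(y, w, cutoffs)   # 0.3, 1.0
quantile_share   <- diff(c(0, cumulative_share))      # 0.3, 0.7
```

The quantile shares always sum to one, and the last cumulative share is always one.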
3.2 Welfare share at a given population share
Another use case is calculating the welfare share of a specific share
of the population, which can be achieved using pipmd_welfare_share_at()
by setting n = NULL and specifying the popshare:
selected_popshare <- 0.8
# Named after the selected population share rather than a fixed percentage
welfare_at_popshare <- pipmd_welfare_share_at(welfare = pip_md$welfare,
                                              weight = pip_md$weight,
                                              n = NULL,
                                              popshare = selected_popshare)
# Format the string with the given values
formatted_message <- sprintf("The bottom %.0f%% of the population owns %.0f%% of welfare.",
                             selected_popshare * 100,
                             welfare_at_popshare$share_at * 100)
print(formatted_message)
#> [1] "The bottom 80% of the population owns 80% of welfare."