Micro Data Functions

library(pipster)

Overview

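This vignette gives an overview of the pipster functions for microdata. Microdata are detailed records of individual welfare measures, such as consumption, expenditure, or income. Each observation corresponds to a single individual and carries a sample weight indicating how much of the overall population that observation represents. pipster provides a series of functions to estimate poverty and inequality measures using microdata. Those used in this vignette are:

pipmd_pov_headcount(): poverty headcount, FGT(0).
pipmd_pov_gap(): poverty gap, FGT(1).
pipmd_pov_severity(): poverty severity, FGT(2).
pipmd_watts(): Watts index.
pipmd_gini(): Gini coefficient.
pipmd_mld(): Mean Logarithmic Deviation.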

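It also provides a series of functions to calculate welfare shares, cumulative welfare shares and income thresholds for each quantile. Those used in this vignette are:

pipmd_quantile_welfare_share(): welfare share of each quantile.
pipmd_welfare_share_at(): cumulative welfare share up to each quantile, or up to a given population share.
pipmd_quantile(): income threshold of each quantile.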

Micro Data Sample

In this vignette, we explore several typical scenarios in which the pipster package is useful. Each scenario uses pip_md, a sample dataset with 1000 observations that ships with the package. Its variables are the following:

welfare: welfare (income or consumption).
weight: population weights.

Here is a preview of the first 10 observations:

#>         welfare weight
#> 1    81.5864216   7941
#> 2    61.4004171   7672
#> 3   304.4441509   2617
#> 4  1267.9985109   7912
#> 5     3.9202884   8371
#> 6     0.6881794   6819
#> 7  1176.7986957   4229
#> 8     5.5824495   4204
#> 9     8.1514308   7863
#> 10    1.3944173   4762

Case 1: Poverty Profiling

pipster allows the user to estimate poverty measures quickly and accurately. To demonstrate its use, we calculate FGT(0), FGT(1), and FGT(2) manually, and then replicate each result using only pipster functions. The Foster-Greer-Thorbecke indices are a family of poverty metrics obtained by substituting different values of the parameter \(\alpha\) into the following equation:

\[F G T_\alpha=\frac{1}{N} \sum_{i=1}^H\left(\frac{z-y_i}{z}\right)^\alpha\] where \(z\) is the poverty line, \(N\) the total population, and \(H\) the number of poor individuals (those with income \(y_i \le z\)). With microdata, each observation enters the sum with its sample weight, and \(N\) is the sum of the weights.
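The equation above can be sketched as a single weighted function in base R. This is only an illustration: fgt_index() is a hypothetical helper, not a pipster function, and the four-person dataset is made up for the example.

```r
# Hypothetical helper (not part of pipster): weighted FGT index of order alpha.
fgt_index <- function(welfare, weight, povline, alpha) {
  poor <- welfare <= povline                    # poor if at or below the line
  gap  <- (povline - welfare[poor]) / povline   # relative shortfall of each poor person
  sum(weight[poor] * gap^alpha) / sum(weight)   # weighted sum over the poor, divided by N
}

# Toy data, made up for illustration
welfare <- c(1, 2, 3, 4)
weight  <- c(1, 1, 1, 1)

fgt0 <- fgt_index(welfare, weight, povline = 2, alpha = 0)  # headcount
fgt1 <- fgt_index(welfare, weight, povline = 2, alpha = 1)  # poverty gap
fgt2 <- fgt_index(welfare, weight, povline = 2, alpha = 2)  # poverty severity
```

In R, 0^0 evaluates to 1, so a person exactly on the poverty line is counted in the headcount, which matches the \(y_i \le z\) convention used here.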

1.1 Poverty Headcount

The poverty headcount, or FGT(0), can be calculated as follows: \[F G T_0=\frac{1}{N} \sum_{i=1}^H\left(\frac{z-y_i}{z}\right)^0 = \frac{H}{N}\]

z <- 1.4 # set the poverty line
N <- sum(pip_md$weight)                        # total (weighted) population
H <- sum(pip_md$weight[pip_md$welfare <= z])   # (weighted) number of poor

FGT0 <- H/N

print(paste0("The poverty headcount index is ", round(FGT0*100,2), "%"))
#> [1] "The poverty headcount index is 37.15%"

In pipster, we can simply use the pipmd_pov_headcount() function:

pip_FGT0 <- pipmd_pov_headcount(welfare = pip_md$welfare,
                                weight = pip_md$weight,
                                povline = z)

print(paste0("The poverty headcount index is ", round(pip_FGT0$pov_headcount*100,2), "%"))
#> [1] "The poverty headcount index is 37.15%"

1.2 Poverty Gap

The poverty gap, or FGT(1), can be calculated as follows: \[F G T_1=\frac{1}{N} \sum_{i=1}^H\left(\frac{z-y_i}{z}\right)\]

# Calculate the weighted shortfall of the poor: the distance between the poverty line and each poor person's income, relative to the poverty line.
shortfall <- sum((z - pip_md$welfare[pip_md$welfare <= z]) * pip_md$weight[pip_md$welfare <= z])/z

FGT1 <- (1/N)*(shortfall)

print(paste0("The poverty gap index is ", round(FGT1*100,2), "%"))
#> [1] "The poverty gap index is 27.48%"

In pipster, we can simply use the pipmd_pov_gap() function:

pip_FGT1 <- pipmd_pov_gap(welfare = pip_md$welfare,
                          weight = pip_md$weight,
                          povline = z)

print(paste0("The poverty gap index is ", round(pip_FGT1$pov_gap*100,2), "%"))
#> [1] "The poverty gap index is 27.48%"

1.3 Poverty Severity

The poverty severity, or FGT(2), can be calculated as follows: \[F G T_2=\frac{1}{N} \sum_{i=1}^H\left(\frac{z-y_i}{z}\right)^2\]

shortfall_squared <- sum(((z - pip_md$welfare[pip_md$welfare <= z]) / z)^2 * pip_md$weight[pip_md$welfare <= z])

FGT2 <- (1/N)*shortfall_squared

print(paste0("The poverty severity index is ", round(FGT2*100,2), "%"))
#> [1] "The poverty severity index is 23.29%"

In pipster, we can simply use the pipmd_pov_severity() function:

pip_FGT2 <- pipmd_pov_severity(welfare = pip_md$welfare,
                               weight = pip_md$weight,
                               povline = z)


print(paste0("The poverty severity index is ", round(pip_FGT2$pov_severity*100,2), "%"))
#> [1] "The poverty severity index is 23.29%"

1.4 Alternatives

For each of the poverty index functions above, the user can replace povline with times_mean, a factor by which the mean of the welfare vector is multiplied to obtain the poverty line. For example:

pip_FGT1_alt <- pipmd_pov_gap(welfare = pip_md$welfare,
                              weight = pip_md$weight,
                              times_mean = 0.2)

print(paste0("The poverty gap index is ", round(pip_FGT1_alt$pov_gap*100,2), "%"))
#> [1] "The poverty gap index is 76.82%"

Finally, povline can also take multiple values:

pip_FGT1_multiple <- pipmd_pov_gap(welfare = pip_md$welfare,
                                   weight = pip_md$weight,
                                   povline = c(1.4, 3, 4))

print(pip_FGT1_multiple)
#>    povline   pov_gap
#>      <num>     <num>
#> 1:     1.4 0.2747608
#> 2:     3.0 0.3571872
#> 3:     4.0 0.3955473
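Both alternatives can be illustrated by hand in base R. The sketch below assumes that times_mean scales the weighted mean of welfare (the vignette only says "the mean"), and it uses a hypothetical helper pov_gap() and made-up toy data rather than any pipster internals.

```r
# Hypothetical helper (not part of pipster): weighted poverty gap, FGT(1).
pov_gap <- function(welfare, weight, povline) {
  poor <- welfare <= povline
  sum(weight[poor] * (povline - welfare[poor]) / povline) / sum(weight)
}

# Toy data, made up for illustration
welfare <- c(1, 2, 3, 10)
weight  <- c(2, 1, 1, 1)

# Assumption: the poverty line is times_mean times the weighted mean of welfare
times_mean   <- 0.5
povline_mean <- times_mean * weighted.mean(welfare, weight)

# Evaluate the same index at several poverty lines, one row per line
povlines <- c(povline_mean, 2, 4)
gaps <- data.frame(
  povline = povlines,
  pov_gap = vapply(povlines, function(z) pov_gap(welfare, weight, z), numeric(1))
)
print(gaps)
```

Because a higher poverty line both adds poor individuals and widens each shortfall, the poverty gap in the resulting table rises with the poverty line, as it does in the pipster output above.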

Case 2: Additional Inequality and Poverty Measures

pipster can also be used to easily calculate additional inequality measures. The Gini coefficient can be calculated using pipmd_gini() like so:

gini <- pipmd_gini(welfare = pip_md$welfare,
                   weight = pip_md$weight)

print((paste0("The Gini index is ", round(gini$value, 2))))
#> [1] "The Gini index is -0.29"
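For intuition, one common way to obtain a weighted Gini coefficient is as one minus twice the area under the Lorenz curve, approximated with trapezoids. The sketch below follows that approach in base R. gini_lorenz() is a hypothetical helper, and the method pipmd_gini() uses internally may differ.

```r
# Hypothetical helper (not part of pipster): weighted Gini from the Lorenz curve.
gini_lorenz <- function(welfare, weight) {
  ord <- order(welfare)                      # sort from poorest to richest
  y   <- welfare[ord]
  w   <- weight[ord]
  p   <- c(0, cumsum(w) / sum(w))            # cumulative population share
  l   <- c(0, cumsum(w * y) / sum(w * y))    # cumulative welfare share (Lorenz ordinates)
  # One minus twice the trapezoid area under the Lorenz curve
  1 - sum(diff(p) * (head(l, -1) + tail(l, -1)))
}

# Toy data, made up for illustration
gini_equal   <- gini_lorenz(c(5, 5, 5, 5), c(1, 1, 1, 1))  # everyone has the same welfare
gini_unequal <- gini_lorenz(c(0, 0, 0, 1), c(1, 1, 1, 1))  # one person holds everything
```

With four equally weighted people, perfect equality gives 0 and full concentration gives 0.75, the maximum attainable with this formula for a sample of that size.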

The MLD (Mean Logarithmic Deviation) can be calculated using pipmd_mld() like so:

mld <- pipmd_mld(welfare = pip_md$welfare,
                 weight = pip_md$weight)

print((paste0("The MLD is ", round(mld$value,2))))
#> [1] "The MLD is 4.57"
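The MLD is usually defined as the weighted average of \(\ln(\mu / y_i)\), where \(\mu\) is the mean of welfare. A minimal base R sketch of that definition follows. mld_weighted() is a hypothetical helper, it assumes strictly positive welfare values, and it does not reproduce whatever treatment of zeros pipmd_mld() may apply.

```r
# Hypothetical helper (not part of pipster): weighted Mean Logarithmic Deviation.
# Assumes all welfare values are strictly positive.
mld_weighted <- function(welfare, weight) {
  mu <- weighted.mean(welfare, weight)       # weighted mean of welfare
  weighted.mean(log(mu / welfare), weight)   # weighted average log deviation from the mean
}

# Toy data, made up for illustration
mld_equal <- mld_weighted(c(3, 3, 3), c(1, 2, 3))  # no inequality
mld_two   <- mld_weighted(c(1, 4), c(1, 1))        # two people, unequal welfare
```

When everyone has the same welfare, every log deviation is zero and so is the index. For the two-person example, the mean is 2.5 and the result equals log(1.25).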

And finally, the Watts index can be calculated using pipmd_watts(), specifying the poverty line (povline), like so:

z <- 3 # set the poverty line
watts <- pipmd_watts(welfare = pip_md$welfare,
                     weight = pip_md$weight,
                     povline = z)

print((paste0("The Watts index is ", round(watts$watts, 2))))
#> [1] "The Watts index is 0.74"

Alternatively, the user can specify the parameter times_mean. In this case, the poverty line is calculated as the mean of the welfare vector multiplied by times_mean:

watts <- pipmd_watts(welfare = pip_md$welfare,
                     weight = pip_md$weight,
                     times_mean = 0.8)

print((paste0("The Watts index is ", round(watts$watts, 2))))
#> [1] "The Watts index is 3.96"
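The Watts index is commonly defined as the population average of \(\ln(z / y_i)\) taken over the poor. The base R sketch below implements that definition on made-up data. watts_index() is a hypothetical helper that assumes strictly positive welfare, and it is not the pipster implementation.

```r
# Hypothetical helper (not part of pipster): weighted Watts index.
# Assumes all welfare values are strictly positive.
watts_index <- function(welfare, weight, povline) {
  poor <- welfare <= povline                                       # poor if at or below the line
  sum(weight[poor] * log(povline / welfare[poor])) / sum(weight)   # log shortfalls, averaged over N
}

# Toy data, made up for illustration: three people, poverty line of 2
watts_toy <- watts_index(c(1, 2, 4), c(1, 1, 1), povline = 2)
```

Only the first person contributes here: the second sits exactly on the line, so the log shortfall is zero, and the third is not poor. The result is therefore log(2) / 3.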

Case 3: Welfare Shares

One simple use case is calculating, for each of n quantiles, the welfare share, the cumulative welfare share, and the income threshold. The number of quantiles is set with n:

quantiles <- 5
quantile_welfare_share <- pipmd_quantile_welfare_share(welfare = pip_md$welfare,
                                                       weight  = pip_md$weight,
                                                       n = quantiles)

quantile_welfare_share_at <- pipmd_welfare_share_at(welfare = pip_md$welfare,
                                                    weight  = pip_md$weight,
                                                    n = quantiles)

quantile_threshold <- pipmd_quantile(welfare = pip_md$welfare,
                                     weight  = pip_md$weight,
                                     n = quantiles)
  
# Combine into a dataframe for practicality
df_combined <- data.frame(
  popshare = quantile_welfare_share$quantile,
  quantile_share = quantile_welfare_share$share,
  cumulative_share = quantile_welfare_share_at$share_at,
  income_threshold = quantile_threshold$values
)

# View the combined dataframe
print(df_combined)
#>   popshare quantile_share cumulative_share income_threshold
#> 1    q_20%   3.168918e-05        0.2010610     2.523813e-01
#> 2    q_40%   4.331949e-04        0.4015349     1.794095e+00
#> 3    q_60%   1.833667e-03        0.6013513     6.789944e+00
#> 4    q_80%   9.426906e-03        0.8004222     4.445397e+01
#> 5   q_100%   9.882745e-01        1.0000000     1.233918e+05
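To make the relationship between the two kinds of shares concrete, the sketch below computes them by hand in base R from a Lorenz curve with linear interpolation. lorenz_at() is a hypothetical helper, the data are made up, weights are assumed to be strictly positive, and the interpolation used by pipster may differ.

```r
# Hypothetical helper (not part of pipster): cumulative welfare share held by
# the bottom `popshare` of the population, by linear interpolation of the Lorenz curve.
lorenz_at <- function(welfare, weight, popshare) {
  ord <- order(welfare)                      # sort from poorest to richest
  y   <- welfare[ord]
  w   <- weight[ord]
  p   <- c(0, cumsum(w) / sum(w))            # cumulative population share
  l   <- c(0, cumsum(w * y) / sum(w * y))    # cumulative welfare share
  approx(x = p, y = l, xout = popshare)$y
}

# Toy data, made up for illustration
welfare <- c(1, 2, 3, 4)
weight  <- rep(1, 4)

n                <- 2                         # number of quantiles
popshare         <- seq_len(n) / n            # 0.5, 1.0
cumulative_share <- lorenz_at(welfare, weight, popshare)
quantile_share   <- diff(c(0, cumulative_share))  # share held within each quantile
```

In this example the bottom half holds 3 of the 10 units of welfare, so the cumulative shares are 0.3 and 1, and the per-quantile shares are 0.3 and 0.7. The per-quantile shares always sum to 1, and their running total gives the cumulative shares.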

Another use case is calculating the welfare share of a specific share of the population, which can be achieved using pipmd_welfare_share_at() by setting n = NULL and specifying the popshare:

selected_popshare <- 0.8
welfare_at_80 <- pipmd_welfare_share_at(welfare = pip_md$welfare,
                                        weight  = pip_md$weight,
                                        n = NULL,
                                        popshare = selected_popshare)

# Format the string with the given values
formatted_message <- sprintf("The bottom %.0f%% of the population owns %.0f%% of welfare.",
                             selected_popshare * 100,
                             welfare_at_80$share_at * 100)

print(formatted_message)
#> [1] "The bottom 80% of the population owns 80% of welfare."