This article explains how the pd_* functions in {pipdata} work together. The data processing pipeline uses these functions to turn both welfare data from datalibweb and auxiliary data from other sources into the files used for calculations in the estimations pipeline. The process is summarized in the

Process Data function order

Load relevant data

All data in the PIP process must be loaded with the {pipload} package. The function pip_load_dlw() loads any welfare data available in DatalibWeb. It does not use the datalibweb Stata command; instead, it reads directly from a flat folder provided by the Poverty GP. The function pip_load_aux() loads any auxiliary data available in the PIP project. Auxiliary data is any dataset that is not welfare data. The most important auxiliary dataset is the Price FrameWork (pfw), since it contains all the metadata relevant for data processing and estimations.

df  <- pipload::pip_load_dlw(country = "PHL", year = 2012)
pfw <- pipload::pip_load_aux("pfw")

Country Price FrameWork (cpfw)

The country price framework (cpfw) is a subset of the pfw data, but it is not simply a filtered copy. A survey sometimes contains two welfare types, so a separate set of information is needed for each welfare type. This is why pfw must be filtered with the get_country_pfw() function.

cpfw <- get_country_pfw(df, pfw)
# names(cpfw)
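
To illustrate why a plain filter is not enough, here is a minimal base R sketch with a made-up price framework table. The column names (country_code, year, welfare_type, alt_welfare_type) and the helper toy_country_pfw() are hypothetical. They are not the actual pfw layout or the get_country_pfw() implementation. The sketch only shows how one survey row can expand into one set of metadata per welfare type.

```r
# Toy price framework: one row per survey (hypothetical columns, made-up values)
toy_pfw <- data.frame(
  country_code     = c("PHL", "PHL", "COL"),
  year             = c(2012, 2015, 2012),
  welfare_type     = c("consumption", "consumption", "income"),
  alt_welfare_type = c("income", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns a list with one element per welfare type
toy_country_pfw <- function(pfw, country, yr) {
  # Plain filter: keeps the single row for the survey
  row <- pfw[pfw$country_code == country & pfw$year == yr, , drop = FALSE]

  # Collect main and alternative welfare types, dropping missing ones
  types <- c(row$welfare_type, row$alt_welfare_type)
  types <- types[!is.na(types)]

  # One copy of the survey metadata per welfare type
  lapply(types, function(wt) {
    out <- row
    out$welfare_type     <- wt
    out$alt_welfare_type <- NULL
    out
  })
}

toy_cpfw <- toy_country_pfw(toy_pfw, "PHL", 2012)
length(toy_cpfw)  # two elements: one for consumption, one for income
```

A survey with a single welfare type, such as the toy COL row, yields a list of length one, so downstream steps can treat both cases the same way.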

pd_* functions

All functions prefixed with pd_ are for Processing Data. They are used only to convert data from datalibweb format to PIP format. The resulting data format is known as input cache data. All the pd_ functions are intended to be used at a high level, but they depend on lower-level S3 methods that vary with the type of welfare data.
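
The pattern of a high-level function delegating to S3 methods can be sketched in base R as follows. The generic toy_clean(), the classes toy_micro and toy_group, and the columns are all hypothetical. They stand in for whatever classes and methods {pipdata} actually defines, which this article does not list.

```r
# Hypothetical generic: dispatches on the class of the welfare data
toy_clean <- function(x, ...) UseMethod("toy_clean")

# Method for (hypothetical) microdata: drop rows with missing welfare
toy_clean.toy_micro <- function(x, ...) {
  x[!is.na(x$welfare), , drop = FALSE]
}

# Method for (hypothetical) grouped data: sort rows by welfare
toy_clean.toy_group <- function(x, ...) {
  x[order(x$welfare), , drop = FALSE]
}

# Fallback for data types without a dedicated method
toy_clean.default <- function(x, ...) {
  stop("No cleaning method for class: ", paste(class(x), collapse = ", "))
}

micro <- structure(
  data.frame(welfare = c(10, NA, 30), weight = c(1, 1, 2)),
  class = c("toy_micro", "data.frame")
)
group <- structure(
  data.frame(welfare = c(0.5, 0.2, 0.3), weight = c(1, 1, 1)),
  class = c("toy_group", "data.frame")
)

# The same high-level call works for both data types
nrow(toy_clean(micro))
toy_clean(group)$welfare
```

The caller never chooses a method explicitly. The class attached to the data decides which cleaning logic runs, which is what lets a single high-level function cover several kinds of welfare data.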

Split alternative welfare

lf <- pd_split_alt_welfare(df, cpfw)
# names(lf)

Clean from DLW format

lf_dlw    <- pd_dlw_clean(lf, cpfw)
# names(lf_dlw)

Clean to be used by wbpip

lf_wbpip <- pd_wbpip_clean(lf_dlw)
# names(lf_wbpip)

Add variables for PIP process

ppp  <- pipload::pip_load_aux("ppp")
cpi  <- pipload::pip_load_aux("cpi")
pop  <- pipload::pip_load_aux("pop")

lf_pip_vars <- 
  pd_add_pip_vars(lf = lf_wbpip, 
                  cpfw = cpfw, 
                  cpi = cpi, 
                  ppp = ppp, 
                  pop = pop)
# names(lf_pip_vars[[1]])
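
The article does not list the variables that pd_add_pip_vars() creates. As a rough illustration of why CPI and PPP tables are passed alongside the welfare data, the base R sketch below merges made-up CPI and PPP values into a toy welfare table and converts welfare to PPP terms by dividing by both. All column names and values are hypothetical, and the conversion shown is an assumption about the kind of variable that gets added, not the package's implementation.

```r
# Toy welfare data in local currency (made-up values)
toy_welfare <- data.frame(
  country_code = "PHL",
  year         = 2012,
  welfare      = c(100, 250, 400),
  weight       = c(1, 2, 1)
)

# Toy auxiliary tables (made-up values, hypothetical columns)
toy_cpi <- data.frame(country_code = "PHL", year = 2012, cpi = 1.25)
toy_ppp <- data.frame(country_code = "PHL", ppp = 20)

# Attach the auxiliary values to every welfare record
out <- merge(toy_welfare, toy_cpi, by = c("country_code", "year"))
out <- merge(out, toy_ppp, by = "country_code")

# Assumed conversion: deflate by CPI, then convert with PPP
out$welfare_ppp <- out$welfare / out$cpi / out$ppp
out$welfare_ppp
```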