#devtools::load_all()

library(pipaux)

# Initialize log to make it available in this vignette
pipfun::log_init("pipaux_update_log", overwrite = TRUE)
#pipfun::setup_working_release(release = "20250203")

Overview

  • Objectives: Update auxiliary data files and compare them both across different releases and within a single release.

  • Key Functions: update_all_aux(), compare_aux_releases(), and compare_aux_vintages().

  • Key Outputs:

    • Updated GitHub branches and synchronized auxiliary data files saved to the Y drive. The folder that stores files from the new pipeline is set by the option pipaux.working_dir (retrieve it with getOption("pipaux.working_dir")). It currently points to “Y:/PIP_ingestion_pipeline_v2”.

    • Data tables highlighting changes in files either within a release or between different releases.

  • Key Steps:

  1. Set up the working release.
  2. Update auxiliary data in both GitHub and the Y drive.
  3. Compare auxiliary data files to identify and review changes.

1️⃣ STEP 1 - SETUP WORKING RELEASE

In the new pipeline framework, we always work with a reference version called the “working release.” Therefore, we need to set up the working release at the beginning of each new R session.

In this example, I use a “TEST” release with the release date “20250203”. If no release exists yet for the desired date, create one first by calling pipfun::new_pip_release() with the appropriate arguments.

# Set up the working release 
pipfun::setup_working_release(release  = "20250203",
                              identity = "TEST")

2️⃣ STEP 2 - UPDATE ALL (OR SOME) AUXILIARY DATA

Update

In this example, we update a selection of auxiliary data measures. The repositories used are hosted on my GitHub account, so we specify the owner as “RossanaTat”.


update <- update_all_aux(
  measures = c("maddison", "cpi"),
  owner    = "RossanaTat",
  log      = TRUE,   # enable logging
  log_save = FALSE   # set to TRUE to save the log as a .qs file in the current release folder
)

update

The output of update_all_aux() is a named list, with each element corresponding to a measure. For each measure, the list contains:

  • success: a logical flag (TRUE or FALSE) indicating whether the update completed successfully.

  • error: either NULL if there was no error, or the error message if something went wrong.
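Because the result is a plain named list, it is easy to turn into a table of outcomes. Below is a minimal base-R sketch. update_example is a hand-written mock with the structure described above, not real output. summarize_update() is a hypothetical helper, not part of {pipaux}. The sketch also assumes that error holds either a character message or a condition object.

```r
# Hypothetical mock shaped like the update_all_aux() output described above
update_example <- list(
  maddison = list(success = TRUE,  error = NULL),
  cpi      = list(success = FALSE, error = "could not update the DEV branch")
)

# Hypothetical helper: one row per measure with its status and error text
summarize_update <- function(res) {
  error_text <- function(e) {
    if (is.null(e)) return(NA_character_)                     # no error
    if (inherits(e, "condition")) return(conditionMessage(e)) # condition object
    paste(as.character(e), collapse = " ")                    # plain message
  }
  data.frame(
    measure = names(res),
    success = vapply(res, function(x) isTRUE(x$success), logical(1)),
    error   = vapply(res, function(x) error_text(x$error), character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

summarize_update(update_example)

# Names of the measures whose update failed
names(Filter(function(x) !isTRUE(x$success), update_example))
```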

IMPORTANT:

As explained in the “update_workflow” vignette, the update process keeps each auxiliary data measure up to date with the latest changes. The GitHub branches are updated with the most recent version of the DEV branch, and the corresponding files are saved to the Y drive.

To understand what happened during the update process, inspect the log. The log is stored as a data.table. You can access it with {pipfun} functions or, if log_save = TRUE, read it directly from the .qs file saved in the current release folder. Note: the logging functions of the pipfun package are explained in the Log vignette.

Inspect result 👀


# Read the entire log into memory
#pipfun::log_get(name = "pipaux_update_log")[]

# Filter log entries 

## Show info messages
pipfun::log_get(name = "pipaux_update_log")[event == "info"]

## Show all update-related events
pipfun::log_get(name = "pipaux_update_log")[event == "update"]
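You can also filter on several event types at once. Because the log is a data.table, an expression like pipfun::log_get(name = "pipaux_update_log")[event %in% c("info", "update")] should return both kinds of entries. The base-R sketch below shows the idea on a hypothetical mock log. Only the event column comes from this vignette. The measure and msg columns and the "error" event are invented for illustration.

```r
# Hypothetical mock log: only `event` mirrors the real log; the rest is invented
log_mock <- data.frame(
  event   = c("info", "update", "info", "error"),
  measure = c("cpi", "cpi", "maddison", "maddison"),
  msg     = c("start", "branch updated", "start", "sync failed"),
  stringsAsFactors = FALSE
)

# Keep several event types at once with %in%
subset(log_mock, event %in% c("update", "error"))

# Count entries by measure (rows) and event type (columns)
table(log_mock$measure, log_mock$event)
```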

Under the hood

The update_all_aux() function internally calls an auxiliary function named aux_fun() for each specified measure. This function manages the full update workflow, from checking the synchronization status between GitHub and the network drive (Y:) to updating all interdependencies among auxiliary data measures.

To learn more about this process and the logic behind it, see the dedicated article “Aux Data: Updating GitHub and the Y drive”.

Notes (for your reference or future modifications):

  • At this stage, I am working with auxiliary data repositories under my GitHub account. Once testing is complete, we will run the functions on the actual repositories owned by the “PIP-Technical-Team” organization.

  • Additional functions to facilitate log interaction are under development. For now, we use those already available in the pipfun package.

3️⃣ STEP 3 - COMPARE AUXILIARY DATA FILES (ACROSS RELEASES AND VERSIONS)

In this section, we explore how to identify changes in auxiliary data files. We focus on the following measures: cpi, pop, ppp, gdp, and pfw.

The current release used here is "20250101_TEST".

Compare between releases

The compare_aux_releases() function compares the contents of auxiliary data files between the current release and a specified earlier release. This allows you to detect any changes in values, row structure (e.g., new countries or years), or column structure (e.g., added or removed variables).

You can run the function for a single measure or for all available measures. If old_release = NULL, the function automatically uses the most recent available release that shares the same identity (e.g., TEST) as the current working release.
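The rule for old_release = NULL can be sketched as follows. This is a hypothetical helper, not the code that compare_aux_releases() uses. It assumes that release names follow the pattern of "20250101_TEST" (a YYYYMMDD date, an underscore, then the identity). It also assumes that only releases dated strictly before the current one qualify.

```r
# Hypothetical helper (not part of {pipaux}): split "<YYYYMMDD>_<IDENTITY>"
parse_release <- function(x) {
  data.frame(
    name     = x,
    date     = as.Date(substr(x, 1, 8), format = "%Y%m%d"),
    identity = sub("^[0-9]{8}_", "", x),
    stringsAsFactors = FALSE
  )
}

# Most recent earlier release with the same identity, or NULL if none exists
pick_previous_release <- function(current, available) {
  cur  <- parse_release(current)
  rels <- parse_release(available)
  # which() drops NA comparisons caused by names that do not parse
  keep <- which(rels$identity == cur$identity & rels$date < cur$date)
  if (length(keep) == 0) return(NULL)
  cand <- rels[keep, ]
  cand$name[which.max(as.numeric(cand$date))]
}

available <- c("20240601_TEST", "20241001_PROD", "20241101_TEST", "20250101_TEST")
pick_previous_release("20250101_TEST", available)  # "20241101_TEST"
```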

# Compare pop data between the current release and a previous one
changes_pop <- compare_aux_releases(
  measure     = "pop"  # one or more measures can be passed as a character vector
)

# Inspect structure of the output
names(changes_pop$pop)

changes_pop$pop$diff_values
# changes_pop$pop$diff_rows

Each element of the returned list corresponds to one of the measures passed to the measure argument (e.g., “gdp”, “pop”, “ppp”). For each measure, the output is itself a list with three named elements:

  1. diff_values: Changes in Data Values
  2. diff_rows: Row Additions or Removals
  3. diff_cols: Column Additions or Removals

If no differences are found for a given measure, all three elements (diff_values, diff_rows, diff_cols) will be NULL or empty.
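When several measures are compared at once, a small helper can flag which of them actually changed. The sketch below assumes that each of the three elements is either NULL or a table-like object, so that NROW() counts its rows. has_changes() and changes_mock are hypothetical and not part of {pipaux}.

```r
# Hypothetical helper: TRUE if any of the three diff elements has content
has_changes <- function(measure_diff) {
  parts <- measure_diff[c("diff_values", "diff_rows", "diff_cols")]
  any(vapply(parts, function(d) !is.null(d) && NROW(d) > 0, logical(1)))
}

# Hypothetical mock of a multi-measure result
changes_mock <- list(
  pop = list(diff_values = data.frame(country_code = "AAA", year = 2020),
             diff_rows = NULL, diff_cols = NULL),
  gdp = list(diff_values = NULL, diff_rows = NULL, diff_cols = NULL)
)

names(Filter(has_changes, changes_mock))  # returns "pop"
```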

Note: compare_aux_releases() uses the {myrror} package to detect changes. The key variables used for comparison are stored as attributes in the .qs files on the Y drive. The function reads these attributes to ensure consistency across releases.
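The following minimal base-R sketch shows how key variables drive a comparison. It is not the {myrror} implementation. The attribute name "keys", the function compare_by_keys(), and the layout of its output (one table per kind of difference, with a status column for rows and columns) are assumptions made for illustration.

```r
# Minimal sketch of a key-based comparison; NOT the {myrror} API.
compare_by_keys <- function(old, new) {
  keys <- attr(new, "keys")  # assumed attribute name
  stopifnot(!is.null(keys), identical(keys, attr(old, "keys")))

  # Add a status column (also works on zero-row data frames)
  tag <- function(d, status) {
    d$status <- rep(status, nrow(d))
    d
  }

  # Columns present in only one version
  added_cols   <- setdiff(names(new), names(old))
  removed_cols <- setdiff(names(old), names(new))
  diff_cols <- data.frame(
    column = c(added_cols, removed_cols),
    status = rep(c("added", "removed"), c(length(added_cols), length(removed_cols))),
    stringsAsFactors = FALSE
  )

  # Rows (key combinations) present in only one version
  id_old <- do.call(paste, c(old[keys], sep = "|"))
  id_new <- do.call(paste, c(new[keys], sep = "|"))
  diff_rows <- rbind(
    tag(new[!id_new %in% id_old, keys, drop = FALSE], "added"),
    tag(old[!id_old %in% id_new, keys, drop = FALSE], "removed")
  )
  rownames(diff_rows) <- NULL

  # Changed values, for shared keys and shared non-key columns
  value_cols <- setdiff(intersect(names(old), names(new)), keys)
  m <- merge(old, new, by = keys, suffixes = c(".old", ".new"))
  changed <- lapply(value_cols, function(col) {
    o <- m[[paste0(col, ".old")]]
    n <- m[[paste0(col, ".new")]]
    differs <- xor(is.na(o), is.na(n)) | (!is.na(o) & !is.na(n) & o != n)
    if (!any(differs)) return(NULL)
    data.frame(m[differs, keys, drop = FALSE], variable = col,
               old = o[differs], new = n[differs],
               row.names = NULL, stringsAsFactors = FALSE)
  })
  diff_values <- do.call(rbind, changed)  # NULL when nothing changed

  list(diff_values = diff_values, diff_rows = diff_rows, diff_cols = diff_cols)
}

# Usage with hypothetical data
old <- data.frame(country_code = c("AAA", "BBB"), year = 2020, pop = c(10, 20),
                  stringsAsFactors = FALSE)
new <- data.frame(country_code = c("AAA", "CCC"), year = 2020, pop = c(11, 30),
                  source = "WDI", stringsAsFactors = FALSE)
attr(old, "keys") <- c("country_code", "year")
attr(new, "keys") <- c("country_code", "year")
compare_by_keys(old, new)
```

Storing the keys alongside the data, rather than passing them at call time, means every comparison of the same measure uses the same definition of a row.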

Compare within releases

The compare_aux_vintages() function helps you detect changes within the same release cycle by comparing the latest version of each auxiliary data file with a previous version stored in the same directory. Previous versions are saved in the vintage folder.

By default, the function compares the current version with the immediately previous one (version = -1). You can specify a different relative version using the version argument (e.g., -2 to compare with two versions back).

See the documentation of the version argument (?compare_aux_vintages) for more details.
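One way to think of the relative version argument is as an index into the vintage files, newest first. The following minimal sketch makes three assumptions. First, vintage file names embed a sortable timestamp (the names below are hypothetical). Second, the current file itself is not stored in the vintage folder, so version = -1 maps to the most recent vintage. Third, resolve_vintage() is a hypothetical helper.

```r
# Hypothetical helper: map a negative relative version to a vintage file name
resolve_vintage <- function(vintages, version = -1L) {
  stopifnot(length(version) == 1, version < 0, version == round(version))
  newest_first <- sort(vintages, decreasing = TRUE)
  idx <- -version  # -1 -> most recent vintage, -2 -> the one before, ...
  if (idx > length(newest_first)) return(NULL)  # not enough previous versions
  newest_first[idx]
}

vintages <- c("pop_20250110103000.qs", "pop_20250120170000.qs", "pop_20250115090000.qs")
resolve_vintage(vintages, -1)  # "pop_20250120170000.qs"
resolve_vintage(vintages, -2)  # "pop_20250115090000.qs"
resolve_vintage(vintages, -4)  # NULL
```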


# Compare the current POP file with its previous version
vintage_changes_pop <- compare_aux_vintages(
  measures = "pop",
  version = -1
)

names(vintage_changes_pop)

vintage_changes_pop

Each element in the returned (invisible) list corresponds to a measure provided in the measures argument. The result for each measure includes the differences in values, rows, and columns (as returned by the internal function compare_vintage_versions()).

If no previous version is found for a given measure, or if an error occurs during comparison, the function will return NULL for that measure and optionally print a message (if verbose = TRUE).
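Returning NULL for a failed measure, instead of stopping, is a common R pattern: wrap each per-measure call in tryCatch() and return NULL from the error handler. The sketch below shows one way such a loop could look. It is not the actual code of compare_aux_vintages(). compare_each() and fake_compare() are hypothetical.

```r
# Hypothetical loop: run a comparison per measure, NULL on failure
compare_each <- function(measures, compare_one, verbose = TRUE) {
  out <- lapply(measures, function(m) {
    tryCatch(
      compare_one(m),
      error = function(e) {
        if (verbose) {
          message("Comparison failed for '", m, "': ", conditionMessage(e))
        }
        NULL  # the failed measure becomes NULL; the loop continues
      }
    )
  })
  names(out) <- measures
  invisible(out)
}

# Usage with a hypothetical stand-in that fails for "cpi"
fake_compare <- function(m) {
  if (m == "cpi") stop("no vintage versions found")
  list(diff_values = NULL, diff_rows = NULL, diff_cols = NULL)
}
res <- compare_each(c("pop", "cpi"), fake_compare)
is.null(res$cpi)  # TRUE
```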

In the example below, no vintage versions are found for the CPI file, so the function issues a warning and returns NULL:

# Compare the current CPI file with its previous version
vintage_changes_cpi <- compare_aux_vintages(
  measures = "cpi",
  version  = -1
)

Notes (for your reference or future modifications):

  • These functions compare auxiliary data files either between releases or within the same release (i.e., across versions).

  • They are intended for use with files structured and saved through the new pipeline framework.

  • Since we haven’t run the full pipeline yet, I simulated changes (e.g., modified values, new rows or columns) in older files for testing purposes, using helper functions such as simulate_old_release() and simulate_file_changes(). As a result, the output shown here is not real data: it reflects the simulated changes.