| Title: | Analysis Blinding Tools |
|---|---|
| Description: | Provides tools for analysis blinding in confirmatory research contexts by masking and scrambling test-relevant aspects of data. Vector-, data frame-, and row-wise operations support blinding for hierarchical and repeated-measures designs. For more details see MacCoun and Perlmutter (2015) <doi:10.1038/526187a> and Dutilh, Sarafoglou, and Wagenmakers (2019) <doi:10.1007/s11229-019-02456-7>. |
| Authors: | Tamás Nagy [aut, cre] (ORCID: <https://orcid.org/0000-0001-5244-0356>), Alexandra Sarafoglou [aut, dtc] (ORCID: <https://orcid.org/0000-0003-0031-685X>), Márton Kovács [aut] (ORCID: <https://orcid.org/0000-0002-8142-8492>) |
| Maintainer: | Tamás Nagy <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-06-06 06:44:06 UTC |
| Source: | https://github.com/nthun/vazul |
A cross-cultural dataset from the Many-Analysts Religion Project (MARP), which investigated the relationship between religiosity and well-being across 24 countries and diverse religious traditions.
data(marp)
A data frame with 10,535 rows (participants) and 48 variables:
Unique subject identifier (integer).
Country of residence (character string).
Importance of religion in daily life (0–10 scale).
Frequency of religious service attendance (ordinal).
Self-rated religiosity (0–10 scale).
Belief in God (binary: yes/no).
Prayer frequency (ordinal).
Bible/study frequency (ordinal).
Religious upbringing (binary: yes/no).
Current religious denomination (categorical).
Change in religiosity over lifetime (ordinal).
Perceived cultural norm: importance of religious lifestyle for average person in country (0–10).
Perceived cultural norm: importance of belief in God for average person in country (0–10).
Overall life satisfaction (1–5 Likert).
Overall happiness (1–5 Likert).
Energy level (1–5).
Sleep quality (1–5).
Appetite (1–5).
Physical pain/discomfort (1–5).
General health (1–5).
Exercise frequency (1–5).
Illness burden (1–5).
Positive affect (1–5).
Negative affect (reverse coded; 1–5).
Meaning in life (1–5).
Purpose in life (1–5).
Hopefulness (1–5).
Anxiety (reverse coded; 1–5).
Social support (1–5).
Loneliness (reverse coded; 1–5).
Community belonging (1–5).
Mean of all well-being items (numeric).
Mean of physical well-being items (numeric).
Mean of psychological well-being items (numeric).
Mean of social well-being items (numeric).
Age in years (integer).
Self-reported gender (character: e.g., "Male", "Female", "Other").
Socioeconomic status composite (numeric).
Highest education level completed (ordinal integer).
Self-reported ethnicity (character).
Religious denomination (character).
GDP per capita (PPP, USD) for country (numeric).
Scaled GDP (mean = 0, sd = 1) used in analyses (numeric).
Recruitment method: e.g., "online panel", "student sample" (character).
Type of compensation: e.g., "monetary", "entry into lottery" (character).
Score on embedded attention check task (integer).
Hoogeveen, S., Sarafoglou, A., Aczel, B., et al. (2022). A many-analysts approach to the relation between religiosity and well-being. Religion, Brain & Behavior. doi:10.1080/2153599X.2023.2254980
    data(marp)

    # Dimensions
    dim(marp)

    # Quick overview (runs only if dplyr is installed)
    if (requireNamespace("dplyr", quietly = TRUE)) {
      library(dplyr)
      marp |>
        group_by(country) |>
        summarise(
          mean_wb = mean(wb_overall_mean, na.rm = TRUE),
          .groups = "drop"
        )
    }
Assigns random new labels to each unique value in a character or factor vector. The purpose is to blind data so analysts are not aware of treatment allocation or categorical outcomes. Each unique original value gets a random new label, and the assignment order is randomized to prevent correspondence with the original order.
mask_labels(x, prefix = "masked_group_")
| x | A character or factor vector. |
|---|---|
| prefix | Character string to use as prefix for masked labels. Default is "masked_group_". |
a vector of the same type as input with masked labels
mask_variables for masking multiple variables in a data frame,
mask_names for masking variable names.
    # Example with character vector
    set.seed(123)
    treatment <- c("control", "treatment", "control", "treatment")
    mask_labels(treatment)

    # Example with custom prefix
    set.seed(456)
    condition <- c("A", "B", "C", "A", "B", "C")
    mask_labels(condition, prefix = "group_")

    # Example with factor vector
    set.seed(789)
    ecology <- factor(c("Desperate", "Hopeful", "Desperate", "Hopeful"))
    mask_labels(ecology)

    # Using with dataset column
    data(williams)
    set.seed(123)
    williams$ecology_masked <- mask_labels(williams$ecology)
    head(williams[c("ecology", "ecology_masked")])
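The masking idea described for mask_labels() can be illustrated with a minimal base-R sketch. This is not the package's implementation: `mask_labels_sketch` is a hypothetical helper, and the zero-padded numeric suffix is an assumption made for readability. It only shows the two properties the description names: every unique value gets one new label, and the assignment order is shuffled.

```r
# Hypothetical helper (not part of the package): a base-R sketch of label masking.
mask_labels_sketch <- function(x, prefix = "masked_group_") {
  stopifnot(is.character(x) || is.factor(x))
  values <- as.character(x)
  originals <- unique(values[!is.na(values)])
  # Assumption: labels are the prefix plus a zero-padded counter.
  new_labels <- sprintf("%s%02d", prefix, seq_along(originals))
  # Shuffle the labels so their order says nothing about the original order.
  lookup <- setNames(sample(new_labels), originals)
  masked <- unname(lookup[values])
  # Return the same type as the input.
  if (is.factor(x)) factor(masked) else masked
}

set.seed(1)
treatment <- c("control", "treatment", "control", "treatment")
masked <- mask_labels_sketch(treatment)
masked
```

Identical original values always map to the same masked label, so group structure is preserved while group identity is hidden.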
Assigns new masked names to selected variables in a data frame. All selected variables are combined into a single set and renamed with a common prefix. To mask different variable groups with different prefixes, call the function separately for each group.
mask_names(data, ..., prefix)
data |
A data frame. |
... |
Columns to mask using tidyselect semantics. All arguments are combined into a single set. Each can be:
|
prefix |
character string to use as prefix for masked names.
This becomes the base prefix, with numeric suffixes appended (e.g.,
|
A data frame with the specified variables renamed to masked names.
mask_labels for masking values in a vector,
mask_variables for masking values in multiple variables.
    df <- data.frame(
      treat_1 = c(1, 2, 3),
      treat_2 = c(4, 5, 6),
      outcome_a = c(7, 8, 9),
      outcome_b = c(10, 11, 12),
      id = 1:3
    )

    # Mask one set of variables
    library(dplyr)
    mask_names(df, starts_with("treat_"), prefix = "A_")

    # Using character vectors
    mask_names(df, c("treat_1", "treat_2"), prefix = "A_")

    # Mask multiple sets separately
    # Note that the order of masking matters
    # Try to mix up the order of prefixes
    # for different sets to ensure proper masking.
    df |>
      mask_names(starts_with("treat_"), prefix = "B_") |>
      mask_names(starts_with("outcome_"), prefix = "A_")

    # Example with the 'williams' dataset
    data(williams)
    set.seed(42)
    williams |>
      mask_names(starts_with("SexUnres"), prefix = "A_") |>
      mask_names(starts_with("Impul"), prefix = "B_") |>
      colnames()
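A minimal base-R sketch of the renaming step may make the behaviour of mask_names() easier to picture. It is an illustration under assumptions, not the package's code: `mask_names_sketch` is a hypothetical helper, the two-digit suffix format is assumed, and the random assignment of names to columns is inferred from the use of set.seed() in the examples.

```r
# Hypothetical helper (not part of the package): rename selected columns
# to a common prefix plus a numeric suffix.
mask_names_sketch <- function(data, cols, prefix) {
  stopifnot(is.data.frame(data), all(cols %in% names(data)))
  # Assumption: suffixes are zero-padded counters.
  new_names <- sprintf("%s%02d", prefix, seq_along(cols))
  # Assumption: masked names are assigned to the columns in random order.
  new_names <- sample(new_names)
  names(data)[match(cols, names(data))] <- new_names
  data
}

df <- data.frame(treat_1 = 1:3, treat_2 = 4:6, id = 1:3)
set.seed(1)
masked_df <- mask_names_sketch(df, c("treat_1", "treat_2"), prefix = "A_")
names(masked_df)
```

Columns that are not selected (here `id`) keep their names and positions; only the selected set is renamed.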
Applies masked labels to multiple categorical variables in a data frame using
the mask_labels() function. Each variable gets independent random
masked labels by default, or can optionally use the same masked labels
across all selected variables.
mask_variables(data, ..., .across_variables = FALSE)
data |
a data frame |
... |
Columns to mask using tidyselect semantics. Each can be:
Only character and factor columns will be processed. |
.across_variables |
logical. If |
A data frame with the specified categorical columns masked. Only character and factor columns can be processed.
mask_labels for masking a single vector,
mask_names for masking variable names.
    # Create example data
    df <- data.frame(
      treatment = c("control", "intervention", "control"),
      outcome = c("success", "failure", "success"),
      score = c(1, 2, 3)  # numeric, won't be masked
    )

    set.seed(123)
    # Independent masking for each variable
    # (default - uses column names as prefixes)
    # Using bare names
    mask_variables(df, treatment, outcome)
    # Or using character vector
    mask_variables(df, c("treatment", "outcome"))

    set.seed(456)
    # Shared masking across variables
    mask_variables(df, c("treatment", "outcome"), .across_variables = TRUE)

    # Using tidyselect helpers
    mask_variables(df, where(is.character))

    # Example with multiple categorical columns
    df2 <- data.frame(
      group = c("A", "B", "A", "B"),
      condition = c("ctrl", "test", "ctrl", "test")
    )
    set.seed(123)
    result <- mask_variables(df2, c("group", "condition"))
    print(result)

    # Example with williams dataset (multiple categorical columns)
    data(williams)
    set.seed(456)
    # Using bare names (recommended for interactive use)
    williams_masked <- mask_variables(williams, subject, ecology)
    head(williams_masked[c("subject", "ecology")])
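The difference between independent masking and shared masking across variables can be sketched in base R. The helper `mask_one` and the label formats below are hypothetical and chosen only for illustration; the sketch does not reproduce the package's internals.

```r
# Hypothetical helper: mask one character vector with its own random lookup.
mask_one <- function(x, prefix) {
  originals <- unique(x)
  labels <- sprintf("%s%02d", prefix, seq_along(originals))
  lookup <- setNames(sample(labels), originals)
  unname(lookup[x])
}

df <- data.frame(
  a = c("yes", "no", "yes"),
  b = c("no", "yes", "maybe"),
  stringsAsFactors = FALSE
)
set.seed(1)

# Independent masking: every column gets its own lookup table, so the same
# original value may receive different masked labels in different columns.
independent <- df
for (col in names(df)) {
  independent[[col]] <- mask_one(df[[col]], paste0(col, "_group_"))
}

# Shared masking: one lookup table built from the pooled values, so "yes"
# maps to the same masked label in every column.
pooled <- unique(unlist(df, use.names = FALSE))
shared_lookup <- setNames(
  sample(sprintf("masked_group_%02d", seq_along(pooled))),
  pooled
)
shared <- df
shared[] <- lapply(df, function(x) unname(shared_lookup[x]))
```

Shared masking is the natural choice when several columns use the same response categories and the analysis needs to compare them across columns.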
Scrambles a vector by randomly reordering its values.
scramble_values(x)
| x | A vector. |
|---|---|
the scrambled vector
scramble_variables for scrambling multiple variables in a data frame.
    # Example with character vector
    set.seed(123)
    x <- letters[1:10]
    scramble_values(x)

    # Example with numeric vector
    nums <- 1:5
    scramble_values(nums)

    # Scramble a column in the 'williams' dataset
    data(williams)

    # Simple scrambling of a single column
    set.seed(123)
    williams$ecology_scrambled <- scramble_values(williams$ecology)
    head(williams[c("ecology", "ecology_scrambled")])
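Scrambling a vector amounts to applying a random permutation to it. The base-R sketch below assumes nothing about how scramble_values() is implemented; `scramble_sketch` is a hypothetical helper. It permutes by index because `sample(x)` treats a single number `x` as the range `1:x`, which would silently change the data.

```r
# Hypothetical helper (not part of the package): permute a vector by index.
scramble_sketch <- function(x) {
  x[sample.int(length(x))]
}

set.seed(1)
x <- letters[1:10]
scrambled <- scramble_sketch(x)

# A length-one numeric vector is returned unchanged, whereas sample(10)
# would return a permutation of 1:10.
single <- scramble_sketch(10)
```

The scrambled vector contains exactly the same values as the original, so marginal summaries (counts, means, ranges) are unchanged while the link to other columns is broken.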
Scramble the values of several selected variables in a data frame simultaneously. Supports independent scrambling, joint scrambling, and within-group scrambling.
scramble_variables(data, ..., .groups = NULL, .together = FALSE, .byrow = FALSE)
data |
a data frame |
... |
Columns to scramble using tidyselect semantics. Each can be:
|
.groups |
Optional grouping columns. Scrambling will be done within each group.
Supports the same tidyselect syntax as column selection. Grouping columns must not overlap with
the columns selected in |
.together |
logical. If |
.byrow |
logical. If |
A data frame with the specified columns scrambled. If grouping is specified, scrambling is done within each group.
scramble_values for scrambling a single vector.
    df <- data.frame(
      x = 1:6,
      y = letters[1:6],
      group = c("A", "A", "A", "B", "B", "B")
    )

    set.seed(123)
    # Example without grouping. Variables scrambled across the entire data frame.
    # Using bare names
    df |> scramble_variables(x, y)
    # Or using character vector
    df |> scramble_variables(c("x", "y"))

    # Example with .together = TRUE. Variables scrambled together as a unit per row.
    df |> scramble_variables(c("x", "y"), .together = TRUE)

    # Example with grouping. Variable only scrambled within groups.
    df |> scramble_variables("y", .groups = "group")

    # Example combining grouping and together parameters
    df |> scramble_variables(c("x", "y"), .groups = "group", .together = TRUE)

    # Example with tidyselect helpers
    library(dplyr)
    df |> scramble_variables(starts_with("x"))
    df |> scramble_variables(where(is.numeric), .groups = "group")

    # Example with the 'williams' dataset
    data(williams)
    williams |> scramble_variables(c("ecology", "age"))
    williams |> scramble_variables(1:5)
    williams |> scramble_variables(c("ecology", "age"), .groups = "gender")
    williams |> scramble_variables(c(1, 2), .groups = 3)
    williams |> scramble_variables(c("ecology", "age"), .together = TRUE)
    williams |> scramble_variables(c("ecology", "age"), .groups = "gender", .together = TRUE)

    # Rowwise scrambling
    df_row <- data.frame(a = 1:3, b = 4:6, c = 7:9)
    df_row |> scramble_variables(a, b, c, .byrow = TRUE)
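The scrambling modes of scramble_variables() (independent, together, within groups, and by row) differ only in which permutation is applied to which values. The following base-R sketch illustrates each mode on toy data. It is an assumption-laden illustration rather than the package's implementation, and all object names are hypothetical.

```r
# Hypothetical base-R sketch of the four scrambling modes; not the package's code.
df <- data.frame(
  x = 1:6,
  y = letters[1:6],
  group = rep(c("A", "B"), each = 3),
  stringsAsFactors = FALSE
)
n <- nrow(df)
set.seed(1)

# 1. Independent: each column gets its own permutation.
indep <- df
indep$x <- df$x[sample.int(n)]
indep$y <- df$y[sample.int(n)]

# 2. Together: one permutation shared by both columns keeps (x, y) pairs intact.
perm <- sample.int(n)
together <- df
together$x <- df$x[perm]
together$y <- df$y[perm]

# 3. Within groups: permute row indices separately inside each group.
idx <- ave(seq_len(n), df$group, FUN = function(i) i[sample.int(length(i))])
within_groups <- df
within_groups$y <- df$y[idx]

# 4. By row: shuffle the values across the selected columns within each row.
df_row <- data.frame(a = 1:3, b = 4:6, c = 7:9)
shuffled <- t(apply(as.matrix(df_row), 1, function(r) {
  unname(r)[sample.int(length(r))]
}))
by_row <- as.data.frame(shuffled)
names(by_row) <- names(df_row)
```

Joint scrambling preserves the relationship between the selected columns while breaking their link to the rest of the data; within-group scrambling preserves group-level distributions, which matters for hierarchical and repeated-measures designs.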
Data from a study by Williams et al. testing whether high-wealth individuals are perceived as having faster life history strategies (e.g., more impulsive, less invested) when associated with "desperate" ecological conditions compared to "hopeful" ones.
data(williams)
A data frame with 224 rows (one per participant) and 25 variables:
Unique subject identifier (integer).
Experimental condition: "Desperate" or "Hopeful" (character).
Participant's age in years (numeric).
Self-reported gender: 1 = Male, 2 = Female (numeric); may be recoded as factor.
Time taken to complete the survey (numeric).
First attention check response: 1 = correct, 0 = incorrect (numeric).
Second attention check response: 1 = correct, 0 = incorrect (numeric).
Perceived sexual unrestrictedness: "likely to have short-term relationships" (1–7 Likert).
"likely to engage in casual sex" (1–7).
"not interested in long-term commitment" (1–7).
"faithful to romantic partners" — reverse-coded (1–7).
"committed in relationships" — reverse-coded (1–7).
"acts without thinking" (1–7).
"thinks carefully before acting" — reverse-coded (1–7).
"plans ahead" — reverse-coded (1–7).
"opportunities for long-term planning exist" (1–7).
"can save money for the future" (1–7).
"can make career plans" (1–7).
"can plan for retirement" (1–7).
"has control over future outcomes" (1–7).
"life is unpredictable" — reverse-coded (1–7).
"invests in education" — reverse-coded (1–7).
"values academic achievement" — reverse-coded (1–7).
"invests time and resources in children" (1–7).
"neglects parental responsibilities" — reverse-coded (1–7).
Williams, S. A., Galak, J., & Kruger, D. J. (2019). The influence of ecology on social perceptions: When wealth signals faster life history strategies. Evolutionary Behavioral Sciences, 13(4), 313–325. doi:10.1037/ebs0000148
Data based on materials available at: https://osf.io/xyz12 (replace with real link if known)
    data(williams)
    str(williams)
    table(williams$ecology)

    # Compute composite scores (example)
    if (requireNamespace("dplyr", quietly = TRUE)) {
      library(dplyr)
      williams_composites <- williams |>
        rowwise() |>
        mutate(
          sexual_unrestrictedness = mean(c(SexUnres_1, SexUnres_2, SexUnres_3,
                                           8 - SexUnres_4_r, 8 - SexUnres_5_r),
                                         na.rm = TRUE),
          impulsivity = mean(c(Impuls_1, 8 - Impuls_2_r, 8 - Impul_3_r),
                             na.rm = TRUE),
          opportunity = mean(c(Opport_1, Opport_2, Opport_3, Opport_4, Opport_5,
                               8 - Opport_6_r),
                             na.rm = TRUE),
          investment = mean(c(8 - InvEdu_1_r, 8 - InvEdu_2_r, InvChild_1,
                              8 - InvChild_2_r),
                            na.rm = TRUE)
        ) |>
        ungroup()
      summary(williams_composites[, c("sexual_unrestrictedness", "impulsivity")])
    }