Calculating Study Day in R for CDISC Compliance: A Step-by-Step Guide

 

Calculating the Study Day (--DY) is a crucial step in clinical trial data analysis. In this guide, we’ll show you how to create a reusable function in R that handles this task, even when dealing with partial dates.

Function Overview

The calculate_study_day function calculates the Study Day for clinical trial events relative to a reference date (typically the date of first dose). It handles partial dates by setting the Study Day to NA when dates are incomplete.
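To make the rule concrete, here is a minimal base R sketch of the Study Day arithmetic for complete dates only. The helper name study_day_rule is our own illustration and is not part of calculate_study_day. It assumes the usual CDISC convention that the reference date itself is Day 1 and that there is no Day 0.

```r
# Minimal sketch of the Study Day rule for complete dates (base R only).
# study_day_rule is a hypothetical helper used for illustration.
study_day_rule <- function(event_date, ref_date) {
  # Difference in whole days between the event and the reference date
  diff_days <- as.numeric(as.Date(event_date) - as.Date(ref_date))
  # On or after the reference date: add 1 so the reference date is Day 1.
  # Before the reference date: keep the negative difference (no Day 0).
  ifelse(diff_days >= 0, diff_days + 1, diff_days)
}

study_day_rule("2024-01-01", "2024-01-01")  # 1
study_day_rule("2024-01-05", "2024-01-01")  # 5
study_day_rule("2023-12-31", "2024-01-01")  # -1
```

The day before the reference date is Day -1 and the reference date is Day 1, which is why the function needs two branches rather than a single subtraction.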

Function Definition

Here’s the complete function definition:

 

R

# Load necessary package
library(dplyr)

# Function to calculate Study Day (--DY) relative to a reference date
calculate_study_day <- function(data, subject_col, date_col, ref_date_col, study_day_col) {
  # Temporary columns that hold the parsed dates
  temp_date_col <- paste0(date_col, "_temp")
  temp_ref_date_col <- paste0(ref_date_col, "_temp")

  data <- data %>%
    mutate(
      # Values shorter than 10 characters (YYYY-MM-DD) are partial dates: set them to NA
      !!temp_date_col := as.Date(
        ifelse(nchar(.data[[date_col]]) < 10, NA, .data[[date_col]]),
        format = "%Y-%m-%d"
      ),
      !!temp_ref_date_col := as.Date(
        ifelse(nchar(.data[[ref_date_col]]) < 10, NA, .data[[ref_date_col]]),
        format = "%Y-%m-%d"
      )
    ) %>%
    group_by(!!sym(subject_col)) %>%
    mutate(
      # NA if either date is missing; otherwise event date minus reference date,
      # plus 1 when the event is on or after the reference date (there is no Day 0)
      !!study_day_col := ifelse(
        is.na(.data[[temp_date_col]]) | is.na(.data[[temp_ref_date_col]]),
        NA,
        ifelse(
          .data[[temp_date_col]] < first(.data[[temp_ref_date_col]]),
          as.numeric(.data[[temp_date_col]] - first(.data[[temp_ref_date_col]])),
          as.numeric(.data[[temp_date_col]] - first(.data[[temp_ref_date_col]])) + 1
        )
      )
    ) %>%
    ungroup() %>%
    # Drop the temporary columns
    select(-all_of(c(temp_date_col, temp_ref_date_col)))

  return(data)
}

 

Explanation

  1. Load Necessary Package: We use the dplyr package for data manipulation.
  2. Define the Function: The calculate_study_day function takes five arguments:
    • data: The dataset containing the clinical trial data.
    • subject_col: The column name of the subject identifier.
    • date_col: The column name of the event date for which the Study Day is to be calculated.
    • ref_date_col: The column name of the reference date (e.g., date of first dose).
    • study_day_col: The name of the output column where the Study Day will be stored.

Inside the function:

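    • We set partial dates to NA: any value shorter than the 10 characters of the full YYYY-MM-DD format is treated as incomplete.
    • We convert the remaining date_col and ref_date_col values to Date objects using as.Date().
    • We group the data by the subject identifier (subject_col) and use the first reference date within each subject.
    • We calculate the Study Day as the event date minus the reference date, adding 1 when the event falls on or after the reference date, so that the reference date itself is Day 1 and there is no Day 0. The result is NA if either date is NA.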
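The partial-date step can be tried on its own. The sketch below uses base R only, and the helper name to_complete_date is hypothetical. It mirrors the length check described above rather than reproducing the function's internals exactly.

```r
# Hypothetical helper illustrating the partial-date check (base R only)
to_complete_date <- function(x) {
  x <- as.character(x)
  # Fewer than 10 characters cannot hold a full YYYY-MM-DD date
  x[is.na(x) | nchar(x) < 10] <- NA_character_
  # With an explicit format, as.Date() reads the date part and
  # ignores any trailing characters such as a time component
  as.Date(x, format = "%Y-%m-%d")
}

dates <- to_complete_date(c("2024-01-05", "2024-01", "2024", "", "2024-01-12T08:30"))
print(dates)
```

Only the first and last values survive as dates. The year-month, year-only, and empty values become NA, so no Study Day would be derived for them.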

Example Usage

Let’s see how to use this function with a sample dataset:

R

# Sample data
sample_data <- data.frame(
  USUBJID = c("01", "01", "01", "02", "02"),
  RFSTDTC = c("2024-01-01", "2024-01-01", "2024-01-01", "2024-01-01", "2024-01-01"),
  AESTDTC = c("2024-01-05", "2024-01-10", "2024-01-15", "2024-01", "2024-01-12")
)

# Calculate Study Day for Adverse Events (AE)
sample_data <- calculate_study_day(sample_data, "USUBJID", "AESTDTC", "RFSTDTC", "AESTDY")

# AESTDY is 5, 10, 15, NA, 12 (the fourth AESTDTC is a partial date)
print(sample_data)

 

Handling Partial Dates

In the example above, the fourth record has an AESTDTC of "2024-01", which has no day component, so the function sets AESTDY to NA for that record rather than guessing a day. This follows the usual SDTM practice of leaving --DY empty unless both the event date and the reference date are complete.
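After deriving the Study Day, it can be useful to list the records that were left as NA so they can be reviewed. The sketch below is a base R illustration on a small hand-built data frame that stands in for the function's output. The name needs_review is our own, and the values are illustrative.

```r
# Hand-built stand-in for the function's output (values are illustrative)
ae <- data.frame(
  USUBJID = c("01", "02", "02"),
  AESTDTC = c("2024-01-05", "2024-01", "2024-01-12"),
  AESTDY  = c(5, NA, 12),
  stringsAsFactors = FALSE
)

# Records that have a start date but no Study Day
needs_review <- ae[!is.na(ae$AESTDTC) & nzchar(ae$AESTDTC) & is.na(ae$AESTDY), ]
print(needs_review)
```

A missing Study Day can also come from an incomplete reference date, so a review like this would normally look at both date columns.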

Conclusion

The calculate_study_day function gives you a reusable way to derive --DY variables in line with CDISC conventions. Because the column names are passed as arguments, the same function works across domains, and incomplete dates yield NA instead of a misleading Study Day.

 
