Calculating Study Day in R for CDISC Compliance: A Step-by-Step Guide
Calculating the Study Day (--DY) is a crucial step in clinical trial data analysis.
This guide shows how to write a reusable R function that handles the task,
even when the data contain partial dates.
Function Overview
The calculate_study_day function calculates the Study Day for clinical trial
events relative to a reference date (typically the date of first dose). It
handles partial dates by setting the Study Day to NA when dates are incomplete.
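The SDTM convention for --DY has no Day 0: an event on or after the reference date gets the date difference plus 1, while an event before it gets the plain (negative) difference. The snippet below is a minimal base R sketch of that arithmetic only; study_day is a hypothetical helper name used for illustration and is not part of the function defined in this guide.

```r
# Hypothetical helper illustrating the --DY rule (there is no Day 0).
# Both arguments are assumed to be complete dates in YYYY-MM-DD form.
study_day <- function(event_date, ref_date) {
  diff_days <- as.integer(as.Date(event_date) - as.Date(ref_date))
  # On or after the reference date: add 1; before it: keep the negative difference
  ifelse(diff_days >= 0, diff_days + 1L, diff_days)
}

study_day("2024-01-01", "2024-01-01")  # 1: the reference date itself is Day 1
study_day("2024-01-05", "2024-01-01")  # 5
study_day("2023-12-31", "2024-01-01")  # -1: the day before the reference date
```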
Function Definition
Here’s the complete function definition:
R
# Load necessary package
library(dplyr)

# Function to calculate Study Day (--DY) relative to a reference date
calculate_study_day <- function(data, subject_col, date_col, ref_date_col, study_day_col) {
  # Names of the temporary columns that hold the parsed dates
  temp_date_col <- paste0(date_col, "_temp")
  temp_ref_date_col <- paste0(ref_date_col, "_temp")

  data <- data %>%
    mutate(
      # Values shorter than 10 characters (YYYY-MM-DD) are partial dates: set to NA
      !!temp_date_col := as.Date(
        ifelse(nchar(.data[[date_col]]) < 10, NA, .data[[date_col]]),
        format = "%Y-%m-%d"
      ),
      !!temp_ref_date_col := as.Date(
        ifelse(nchar(.data[[ref_date_col]]) < 10, NA, .data[[ref_date_col]]),
        format = "%Y-%m-%d"
      )
    ) %>%
    group_by(!!sym(subject_col)) %>%
    mutate(
      !!study_day_col := ifelse(
        is.na(.data[[temp_date_col]]) | is.na(.data[[temp_ref_date_col]]),
        NA,
        ifelse(
          .data[[temp_date_col]] < first(.data[[temp_ref_date_col]]),
          # Event before the reference date: plain difference (negative day)
          as.numeric(.data[[temp_date_col]] - first(.data[[temp_ref_date_col]])),
          # Event on or after the reference date: difference plus 1 (no Day 0)
          as.numeric(.data[[temp_date_col]] - first(.data[[temp_ref_date_col]]) + 1)
        )
      )
    ) %>%
    ungroup() %>%
    # Drop the temporary date columns
    select(-!!sym(temp_date_col), -!!sym(temp_ref_date_col))

  return(data)
}
Explanation
- Load Necessary Package: We use the dplyr package for data manipulation.
- Define the Function: The calculate_study_day function takes five arguments:
  - data: The dataset containing the clinical trial data.
  - subject_col: The column name of the subject identifier.
  - date_col: The column name of the event date for which the Study Day is to be calculated.
  - ref_date_col: The column name of the reference date (e.g., date of first dose).
  - study_day_col: The name of the output column where the Study Day will be stored.
Inside the function:
- We treat any date_col or ref_date_col value shorter than the 10 characters of YYYY-MM-DD as a partial date and set it to NA.
- We convert the remaining values to Date objects using as.Date().
- We group the data by the subject identifier (subject_col) and take the first reference date within each subject.
- We calculate the Study Day as the event date minus the reference date, adding 1 when the event falls on or after the reference date so that there is no Day 0, and setting the result to NA if either date is NA.
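The partial-date step above can be sketched on its own in base R. This is an illustrative variant rather than the function's own logic: it checks the pattern of the first 10 characters instead of only the string length, so values with a time part such as "2024-01-12T08:30" still yield a date. The name parse_complete_date is hypothetical.

```r
# Hypothetical helper: parse ISO 8601 --DTC strings, returning NA for partial dates.
parse_complete_date <- function(dtc) {
  dtc <- as.character(dtc)
  # Keep only the date portion (YYYY-MM-DD) of a date or datetime string
  date_part <- substr(dtc, 1, 10)
  is_complete <- !is.na(dtc) &
    grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", date_part)
  # Start from all-NA dates and fill in only the complete ones
  out <- rep(as.Date(NA), length(dtc))
  out[is_complete] <- as.Date(date_part[is_complete], format = "%Y-%m-%d")
  out
}

# Complete date, year-month only, year only, datetime, and missing value
parse_complete_date(c("2024-01-12", "2024-01", "2024", "2024-01-12T08:30", NA))
```

Only the first and fourth values come back as dates; the rest are NA, which is what the Study Day calculation then propagates.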
Example Usage
Let’s see how to use this function with a sample dataset:
R
# Sample data (dates kept as character strings, as in SDTM --DTC variables)
sample_data <- data.frame(
  USUBJID = c("01", "01", "01", "02", "02"),
  RFSTDTC = c("2024-01-01", "2024-01-01", "2024-01-01", "2024-01-01", "2024-01-01"),
  AESTDTC = c("2024-01-05", "2024-01-10", "2024-01-15", "2024-01", "2024-01-12"),
  stringsAsFactors = FALSE  # needed on R < 4.0 so that nchar() receives characters
)

# Calculate Study Day for Adverse Events (AE)
sample_data <- calculate_study_day(sample_data, "USUBJID", "AESTDTC", "RFSTDTC", "AESTDY")
print(sample_data)
Handling Partial Dates
In the example above, the fourth record has the partial date "2024-01" in AESTDTC,
so the function sets AESTDY to NA for that row. The other rows get Study Days of
5, 10, 15, and 12. No day is imputed for the partial date, so the Study Day is only
populated when both the event date and the reference date are complete.
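A quick sanity check on the derived column can catch the most common mistakes. The sketch below is one possible check, not an official CDISC validation rule: check_study_day is a hypothetical helper, the result data frame is typed in by hand to mirror the example above, and the third check assumes every reference date is complete.

```r
# Result of the example above, entered by hand for illustration
result <- data.frame(
  USUBJID = c("01", "01", "01", "02", "02"),
  RFSTDTC = rep("2024-01-01", 5),
  AESTDTC = c("2024-01-05", "2024-01-10", "2024-01-15", "2024-01", "2024-01-12"),
  AESTDY  = c(5, 10, 15, NA, 12),
  stringsAsFactors = FALSE
)

# Hypothetical QC helper: returns a named logical vector of check results
check_study_day <- function(data, date_col, study_day_col) {
  dy <- data[[study_day_col]]
  partial <- is.na(data[[date_col]]) | nchar(data[[date_col]]) < 10
  c(
    no_day_zero        = !any(dy == 0, na.rm = TRUE),  # Study Day is never 0
    partial_is_na      = all(is.na(dy[partial])),      # partial dates give NA
    complete_has_value = all(!is.na(dy[!partial]))     # assumes complete reference dates
  )
}

check_study_day(result, "AESTDTC", "AESTDY")
```

All three checks should come back TRUE for this data.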
Conclusion
The calculate_study_day function gives you a reusable way to derive --DY
variables in line with the CDISC convention. Because the column names are passed
as arguments, the same function works for any event date, and partial dates
yield NA instead of a misleading Study Day.