Unlocking Data Consistency: Tackling Missing Variables in SDTM and ADaM Programming
In the world of clinical data programming, maintaining data integrity is essential. A common hurdle is permissible variables in SDTM (Study Data Tabulation Model) datasets that are null on every record. Such variables are usually dropped from the final SDTM dataset, yet ADaM (Analysis Data Model) programs may still reference them. The problem isn't limited to AE (Adverse Events) datasets: it also affects SUPPxx datasets, where a QNAM that was never collected simply has no records, and other SDTM domains.
Why This Matters
When permissible variables are missing from SDTM datasets, several issues can arise:
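- Programming Errors: Code that references an absent variable can fail outright, for example with a "variable not found" error in a WHERE clause, KEEP= list, or PROC SQL query. In a DATA step it can instead fail quietly, because SAS creates the name as a new, uninitialized numeric variable. Either way, the workflow halts or needs additional debugging time.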
- Data Loss: Important variables might be unintentionally omitted, leading to incomplete datasets that miss crucial information.
- Inconsistencies: The absence of these variables can create discrepancies between SDTM and ADaM datasets, complicating the data analysis process.
- Impact on Analysis: Missing variables can affect the accuracy and reliability of the analysis, potentially leading to incorrect conclusions.
The Challenge
Permissible variables are those that are allowed but not required to be present in the dataset. In practice, these variables might be entirely missing from the SDTM dataset if they have no values. However, for ADaM datasets, which are used for analysis, these variables might still be necessary. This creates a challenge for data programmers who need to ensure that all required variables are available for analysis, even if they were not present in the original SDTM dataset.
Our Approach
In this blog post, we’ll explore how to dynamically handle missing permissible variables to ensure they are available for ADaM programming. We will demonstrate a method to check for the existence of these variables in the SDTM dataset and create them with missing values if they do not exist. This approach ensures that the ADaM datasets are complete and consistent, facilitating accurate and reliable analysis.
By using a dynamic macro, we can automate this process, making it efficient and reducing the risk of errors. This method can be applied to any SDTM domain, ensuring that all necessary variables are included in the ADaM datasets, regardless of their presence in the original SDTM datasets.
Why Handling Missing Permissible Variables is Important
Missing permissible variables can lead to several issues:
- Data Loss: Important variables may be dropped, leading to incomplete datasets.
- Inconsistencies: Missing variables can cause inconsistencies between SDTM and ADaM datasets.
- Analysis Impact: Missing variables can affect the accuracy and reliability of the analysis.
Step-by-Step Solution
To address this issue, we can create a dynamic macro that checks for the existence of permissible variables in the SDTM dataset and assigns missing values if they do not exist. This ensures that all necessary variables are available for ADaM programming.
1. Check for Variable Existence
First, we need to check if each permissible variable exists in the SDTM dataset. If a variable does not exist, we will create it and assign missing values.
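The existence check on its own can be sketched as a small function-style macro before it is folded into the full macro. The name %var_exists is hypothetical, used only for illustration; it relies on the same OPEN, VARNUM, and CLOSE functions as the macro in the next step, and it runs against SASHELP.CLASS so that it needs no setup.

```sas
/* Hypothetical helper: returns the position of VAR in DS, or 0 when VAR
   (or DS itself) does not exist. It emits only that number, so it can be
   used inside %IF conditions or %PUT statements. */
%macro var_exists(ds=, var=);
  %local dsid pos rc;
  %let pos = 0;
  %let dsid = %sysfunc(open(&ds));
  %if &dsid %then %do;
    %let pos = %sysfunc(varnum(&dsid, &var));
    %let rc = %sysfunc(close(&dsid));
  %end;
  &pos
%mend var_exists;

/* SASHELP.CLASS holds Name, Sex, Age, Height, and Weight */
%put Age is variable number %var_exists(ds=sashelp.class, var=age);
%put AESDTH is variable number %var_exists(ds=sashelp.class, var=aedth);
```

Against an AE dataset the call would look like %var_exists(ds=ae, var=aesdth), and a result of 0 is the signal to create the variable.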
2. Create a Macro to Handle Variable Checks and Assignments
We can create a macro to dynamically check for the existence of each variable and assign missing values if the variable does not exist. The output is written to the dataset named in OUTDS; a one-level name such as AE1 places it in the WORK library.
Example Macro:
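%macro check_and_assign_missing(inds=, outds=, varlist=, len=1);
  /* INDS    = input dataset; OUTDS = output dataset
     VARLIST = space-separated names that must exist in OUTDS
     LEN     = length for newly created character variables (default 1) */
  %local dsid rc i var missvars;
  %let dsid = %sysfunc(open(&inds));
  %if &dsid = 0 %then %do;
    %put ERROR: Dataset &inds does not exist or could not be opened.;
    %return;
  %end;
  /* Collect the names in VARLIST that are not already in INDS */
  %do i = 1 %to %sysfunc(countw(&varlist, %str( )));
    %let var = %scan(&varlist, &i, %str( ));
    %if %sysfunc(varnum(&dsid, &var)) = 0 %then %let missvars = &missvars &var;
  %end;
  /* Close INDS before the DATA step so that OUTDS can safely equal INDS */
  %let rc = %sysfunc(close(&dsid));
  data &outds;
    set &inds;
    %if %length(&missvars) %then %do;
      /* Add each absent variable as character, blank on every record */
      length &missvars $&len;
      call missing(of &missvars);
    %end;
  run;
%mend check_and_assign_missing;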
Practical Example: AE Dataset
Let's see how this works with an example AE dataset (standing in for sdtm.ae) that includes two variables, AESCONG and AESDISAB, with values set to 'Y':
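/* Create the example AE dataset */
data ae;
  /* Declare lengths up front: list input defaults character variables
     to $8, which would truncate "Dizziness" */
  length STUDYID $8 USUBJID $3 AESEQ 8 AETERM $40 AESCONG AESDISAB $1;
  /* Comma-delimited so that "Abdominal Pain" is read as one value */
  infile datalines dlm=',';
  input STUDYID $ USUBJID $ AESEQ AETERM $ AESCONG $ AESDISAB $;
  datalines;
STUDY01,001,1,Headache,Y,Y
STUDY01,001,2,Nausea,Y,Y
STUDY01,002,1,Dizziness,Y,Y
STUDY01,003,1,Fatigue,Y,Y
STUDY01,003,2,Abdominal Pain,Y,Y
;
run;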
/* Define the list of permissible variables */
%let permissible_vars = AESCONG AESDISAB AESDTH AESHOSP AESLIFE AESOD AESMIE;
/* Call the macro for the example AE dataset (standing in for SDTM.AE) */
%check_and_assign_missing(inds=ae, outds=ae1, varlist=&permissible_vars);
Explanation of AE Example
The AE dataset includes the variables STUDYID, USUBJID, AESEQ, AETERM, AESCONG, and AESDISAB.
- Defining Permissible Variables: We define a list of permissible variables (AESCONG, AESDISAB, AESDTH, AESHOSP, AESLIFE, AESOD, AESMIE) that we need to check and ensure are present in the dataset.
- Calling the Macro: We call the check_and_assign_missing macro with the example AE dataset and the list of permissible variables, writing the result to WORK.AE1. The macro checks whether each permissible variable exists in AE. Any variable that does not exist is created as a character variable with a length of 1 and assigned a missing (blank) value.
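To confirm what the macro added, one option is to query DICTIONARY.COLUMNS, the PROC SQL metadata view that describes every variable in every assigned library. A minimal sketch, assuming the call above has created WORK.AE1:

```sas
/* List WORK.AE1's variables in storage order with their type and length.
   Library and member names are stored in uppercase in DICTIONARY tables. */
proc sql;
  select name, type, length
    from dictionary.columns
    where libname = 'WORK' and memname = 'AE1'
    order by varnum;
quit;
```

AESDTH, AESHOSP, AESLIFE, AESOD, and AESMIE should appear after the original variables, each with type char and length 1.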
Input AE Dataset Preview:
STUDYID | USUBJID | AESEQ | AETERM | AESCONG | AESDISAB |
---|---|---|---|---|---|
STUDY01 | 001 | 1 | Headache | Y | Y |
STUDY01 | 001 | 2 | Nausea | Y | Y |
STUDY01 | 002 | 1 | Dizziness | Y | Y |
STUDY01 | 003 | 1 | Fatigue | Y | Y |
STUDY01 | 003 | 2 | Abdominal Pain | Y | Y |
Output AE Dataset Preview:
After running the macro, the output dataset WORK.AE1 contains the five permissible variables that were absent from AE (AESDTH, AESHOSP, AESLIFE, AESOD, and AESMIE), each blank on every record; AESCONG and AESDISAB keep their original values.
STUDYID | USUBJID | AESEQ | AETERM | AESCONG | AESDISAB | AESDTH | AESHOSP | AESLIFE | AESOD | AESMIE |
---|---|---|---|---|---|---|---|---|---|---|
STUDY01 | 001 | 1 | Headache | Y | Y | |||||
STUDY01 | 001 | 2 | Nausea | Y | Y | |||||
STUDY01 | 002 | 1 | Dizziness | Y | Y | |||||
STUDY01 | 003 | 1 | Fatigue | Y | Y | |||||
STUDY01 | 003 | 2 | Abdominal Pain | Y | Y | |||||
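With all seven flags guaranteed to exist as character variables, downstream ADaM code can reference them directly. The sketch below derives a hypothetical flag, ANYSFL (an illustrative name, not a CDISC-defined variable), set to 'Y' when any serious-event criterion is 'Y'. It assumes WORK.AE1 from the call above.

```sas
/* Hypothetical ADaM-style derivation on the macro output */
data adae_sketch;
  set ae1;
  length ANYSFL $1;
  /* WHICHC returns the position of the first argument equal to 'Y', or 0.
     Every name below exists in AE1 (five were added blank by the macro),
     so none refers to a variable missing from the input. */
  if whichc('Y', AESCONG, AESDISAB, AESDTH, AESHOSP, AESLIFE, AESOD, AESMIE) > 0
    then ANYSFL = 'Y';
  else ANYSFL = 'N';
run;
```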
Practical Example: SUPPxx Dataset
Let's see how this works with an example SUPPxx dataset, including some QNAM variables:
/* Create the example SUPPAE dataset */
data suppae;
input STUDYID $ USUBJID $ IDVAR $ IDVARVAL $ QNAM $ QVAL $;
datalines;
STUDY01 001 AESEQ 1 QNAM1 Value1
STUDY01 001 AESEQ 2 QNAM2 Value2
STUDY01 002 AESEQ 1 QNAM1 Value3
STUDY01 003 AESEQ 1 QNAM2 Value4
STUDY01 003 AESEQ 2 QNAM3 Value5
;
run;
/* Transpose the SUPPAE dataset (BY requires input sorted by these variables; this example already is) */
proc transpose data=suppae out=suppae_t(drop=_name_);
by STUDYID USUBJID IDVAR IDVARVAL;
id QNAM;
var QVAL;
run;
/* Define the list of QNAM variables */
%let qnam_vars = QNAM1 QNAM2 QNAM3 QNAM4 QNAM5;
/* Call the macro for the transposed SUPPAE dataset */
%check_and_assign_missing(inds=suppae_t, outds=suppae1, varlist=&qnam_vars);
Explanation of SUPPxx Example
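The SUPPAE dataset includes the variables STUDYID, USUBJID, IDVAR, IDVARVAL, QNAM, and QVAL. The QNAM variable contains three distinct values: QNAM1, QNAM2, and QNAM3.
- Transposing the SUPPAE Dataset: We transpose the SUPPAE dataset so that each QNAM value becomes a variable, with one row per USUBJID and IDVARVAL. The resulting dataset has QNAM1, QNAM2, and QNAM3 as columns; a QNAM that never occurs in SUPPAE produces no column at all, which is why the macro is needed.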
- Defining QNAM Variables: We define a list of QNAM variables (QNAM1, QNAM2, QNAM3, QNAM4, QNAM5) that we need to check and ensure are present in the dataset.
- Calling the Macro: We call the check_and_assign_missing macro with the transposed SUPPAE dataset and the list of QNAM variables, writing the result to WORK.SUPPAE1. The macro checks whether each QNAM variable exists in the transposed dataset. Any variable that does not exist (here, QNAM4 and QNAM5) is created as a character variable with a length of 1 and assigned a missing (blank) value. Because QVAL in SDTM can hold up to 200 characters, adjust the length in the macro if these variables will later receive real values.
Input SUPPxx Dataset Preview:
STUDYID | USUBJID | IDVAR | IDVARVAL | QNAM | QVAL |
---|---|---|---|---|---|
STUDY01 | 001 | AESEQ | 1 | QNAM1 | Value1 |
STUDY01 | 001 | AESEQ | 2 | QNAM2 | Value2 |
STUDY01 | 002 | AESEQ | 1 | QNAM1 | Value3 |
STUDY01 | 003 | AESEQ | 1 | QNAM2 | Value4 |
STUDY01 | 003 | AESEQ | 2 | QNAM3 | Value5 |
Transposed SUPPxx Dataset Preview:
STUDYID | USUBJID | IDVAR | IDVARVAL | QNAM1 | QNAM2 | QNAM3 |
---|---|---|---|---|---|---|
STUDY01 | 001 | AESEQ | 1 | Value1 | | |
STUDY01 | 001 | AESEQ | 2 | | Value2 | |
STUDY01 | 002 | AESEQ | 1 | Value3 | | |
STUDY01 | 003 | AESEQ | 1 | | Value4 | |
STUDY01 | 003 | AESEQ | 2 | | | Value5 |
Output SUPPxx Dataset Preview:
After running the macro, the output dataset WORK.SUPPAE1 contains QNAM4 and QNAM5, which did not exist in the transposed dataset, as blank character variables; QNAM1 through QNAM3 are unchanged.
STUDYID | USUBJID | IDVAR | IDVARVAL | QNAM1 | QNAM2 | QNAM3 | QNAM4 | QNAM5 |
---|---|---|---|---|---|---|---|---|
STUDY01 | 001 | AESEQ | 1 | Value1 | | | | |
STUDY01 | 001 | AESEQ | 2 | | Value2 | | | |
STUDY01 | 002 | AESEQ | 1 | Value3 | | | | |
STUDY01 | 003 | AESEQ | 1 | | Value4 | | | |
STUDY01 | 003 | AESEQ | 2 | | | Value5 | | |
Conclusion
Handling permissible variables that may be missing in the SDTM dataset is crucial for ensuring the integrity of ADaM datasets. By using a dynamic macro to check for variable existence and assign missing values, you can streamline your programming process and avoid issues with missing variables. This approach can be applied to any SDTM domain, ensuring consistent and accurate data management across your clinical trials.
Key Takeaways
Importance of Data Integrity: Ensuring that every permissible variable an ADaM program references exists in its working copy of the SDTM data is crucial for maintaining integrity and consistency between SDTM and ADaM datasets.
Dynamic Macro Solution: A dynamic macro can be used to check for the existence of permissible variables and assign missing values if they do not exist, streamlining the data preparation process.
Practical Application: The macro can be applied to various datasets, such as AE and SUPPxx, to ensure all necessary variables are included, even if they were missing in the original dataset.
Flexibility and Adaptability: The approach can be adapted to any SDTM domain, ensuring consistent and accurate data management across different clinical trials.
Enhanced Analysis Accuracy: By handling missing permissible variables effectively, the accuracy and reliability of the analysis are improved, leading to better decision-making in clinical research.