Unlocking Data Consistency: Tackling Missing Variables in SDTM and ADaM Programming

In the world of clinical data programming, maintaining data integrity is absolutely essential. One of the common hurdles we face is dealing with permissible variables in SDTM (Study Data Tabulation Model) datasets that might have missing values across all records. These variables often get dropped in the final SDTM dataset but are still needed for ADaM (Analysis Data Model) datasets. This isn't just a problem for AE (Adverse Events) datasets; it extends to all SUPPxx domains and other SDTM domains as well.

Why This Matters

When permissible variables are missing from SDTM datasets, several issues can arise:

  • Programming Errors: When expected variables are not present, it can cause errors in the programming code, such as variable not found errors, which can halt the data processing workflow and require additional debugging time.
  • Data Loss: Important variables might be unintentionally omitted, leading to incomplete datasets that miss crucial information.
  • Inconsistencies: The absence of these variables can create discrepancies between SDTM and ADaM datasets, complicating the data analysis process.
  • Impact on Analysis: Missing variables can affect the accuracy and reliability of the analysis, potentially leading to incorrect conclusions.

The Challenge

Permissible variables are those that are allowed but not required to be present in the dataset. In practice, these variables might be entirely missing from the SDTM dataset if they have no values. However, for ADaM datasets, which are used for analysis, these variables might still be necessary. This creates a challenge for data programmers who need to ensure that all required variables are available for analysis, even if they were not present in the original SDTM dataset.
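
The failure mode described above can be reproduced with a small sketch. The dataset and variable choices below (ae_demo, adae_demo, and AESDTH as the dropped variable) are illustrative assumptions, not taken from a real study. The second step is expected to fail, because PROC SQL cannot resolve a column that is absent from the input table.

```sas
/* Hypothetical AE extract: AESDTH was never populated, so it was dropped */
data work.ae_demo;
    length USUBJID $3 AETERM $20;
    USUBJID = '001';
    AESEQ   = 1;
    AETERM  = 'Headache';
run;

/* ADaM-style step that assumes AESDTH is available.
   PROC SQL is expected to stop with a column-not-found error here,
   so WORK.ADAE_DEMO is not created. */
proc sql;
    create table work.adae_demo as
    select USUBJID, AESEQ, AETERM, AESDTH
    from work.ae_demo;
quit;
```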

Our Approach

In this blog post, we’ll explore how to dynamically handle missing permissible variables to ensure they are available for ADaM programming. We will demonstrate a method to check for the existence of these variables in the SDTM dataset and create them with missing values if they do not exist. This approach ensures that the ADaM datasets are complete and consistent, facilitating accurate and reliable analysis.

By using a dynamic macro, we can automate this process, making it efficient and reducing the risk of errors. This method can be applied to any SDTM domain, ensuring that all necessary variables are included in the ADaM datasets, regardless of their presence in the original SDTM datasets.


Step-by-Step Solution

To address this issue, we can create a dynamic macro that checks for the existence of permissible variables in the SDTM dataset and assigns missing values if they do not exist. This ensures that all necessary variables are available for ADaM programming.

1. Check for Variable Existence

First, we need to check if each permissible variable exists in the SDTM dataset. If a variable does not exist, we will create it and assign missing values.
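
As a minimal sketch of the existence check on its own, the helper below wraps the OPEN, VARNUM, and CLOSE functions. The macro name var_exists and the demo dataset ae_check are hypothetical. VARNUM returns the variable's position in the dataset, or 0 when the variable is not there.

```sas
/* Small demo dataset that has AESCONG but not AESDTH (illustrative only) */
data work.ae_check;
    length USUBJID $3 AESCONG $1;
    USUBJID = '001';
    AESCONG = 'Y';
run;

/* Hypothetical helper: resolves to the variable's position, or 0 if absent */
%macro var_exists(ds=, var=);
    %local dsid rc pos;
    %let pos = 0;
    %let dsid = %sysfunc(open(&ds));
    %if &dsid %then %do;
        %let pos = %sysfunc(varnum(&dsid, &var));
        %let rc = %sysfunc(close(&dsid));
    %end;
    &pos
%mend var_exists;

%put NOTE: AESCONG position = %var_exists(ds=work.ae_check, var=AESCONG);
%put NOTE: AESDTH position = %var_exists(ds=work.ae_check, var=AESDTH);
```

A nonzero result means the variable exists and nothing needs to be done. A result of 0 is the signal to create the variable.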

2. Create a Macro to Handle Variable Checks and Assignments

We can create a macro that checks for each variable in a list and creates any that is absent, with a missing value. The output dataset is written to the WORK library when outds is given as a one-level name, as in the examples in this post.

Example Macro:

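%macro check_and_assign_missing(inds=, outds=, varlist=);
    /* Keep working macro variables local so callers are not affected */
    %local dsid rc i var missing;
    %let dsid = %sysfunc(open(&inds));
    %if &dsid %then %do;
        /* Collect the requested variables that are absent from the input */
        %do i = 1 %to %sysfunc(countw(&varlist, %str( )));
            %let var = %scan(&varlist, &i, %str( ));
            /* VARNUM returns 0 when the variable is not in the dataset */
            %if %sysfunc(varnum(&dsid, &var)) = 0 %then %let missing = &missing &var;
        %end;
        /* Close the handle before the DATA step so that outds may equal inds */
        %let rc = %sysfunc(close(&dsid));
        data &outds;
            set &inds;
            %if %length(&missing) %then %do;
                length &missing $1; /* Adjust length as needed */
                call missing(of &missing);
            %end;
        run;
    %end;
    %else %put ERROR: Dataset &inds could not be opened.;
%mend check_and_assign_missing;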
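
The macro creates every missing variable as character with a length of 1. Where numeric variables or other lengths are needed, one alternative (a different technique from the macro, sketched here under assumptions) is to concatenate the input with a zero-row "shell" dataset that carries the desired attributes. The dataset names ae_in_demo, ae_shell, and ae_padded are hypothetical, and AESTDY is used only as an example of a numeric permissible variable.

```sas
/* Illustrative input that lacks AESDTH, AESHOSP and AESTDY */
data work.ae_in_demo;
    length USUBJID $3 AETERM $20;
    USUBJID = '001';
    AESEQ   = 1;
    AETERM  = 'Headache';
run;

/* Hypothetical shell: zero rows, but each variable has the wanted type and length */
data work.ae_shell;
    length AESDTH $1 AESHOSP $1 AESTDY 8;
    call missing(AESDTH, AESHOSP, AESTDY);
    stop;   /* no observation is written */
run;

/* Concatenation adds the shell's variables; the shell contributes no rows */
data work.ae_padded;
    set work.ae_in_demo work.ae_shell;
run;
```

Because the input dataset is listed first, its attributes are used for any variable that appears in both datasets, so existing values are not truncated by the shell.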

Practical Example: AE Dataset

Let's see how this works with an example AE dataset (a stand-in for sdtm.ae) that already contains two of the permissible variables, AESCONG and AESDISAB, both set to 'Y':

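/* Create the example AE dataset */
data AE;
    /* Comma-delimited input so that "Abdominal Pain" is read as one value */
    infile datalines dlm=',';
    /* Explicit lengths prevent truncation of terms longer than 8 characters */
    length STUDYID $8 USUBJID $3 AESEQ 8 AETERM $20 AESCONG $1 AESDISAB $1;
    input STUDYID $ USUBJID $ AESEQ AETERM $ AESCONG $ AESDISAB $;
    datalines;
STUDY01,001,1,Headache,Y,Y
STUDY01,001,2,Nausea,Y,Y
STUDY01,002,1,Dizziness,Y,Y
STUDY01,003,1,Fatigue,Y,Y
STUDY01,003,2,Abdominal Pain,Y,Y
;
run;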

/* Define the list of permissible variables */
%let permissible_vars = AESCONG AESDISAB AESDTH AESHOSP AESLIFE AESOD AESMIE;

/* Call the macro for the example AE dataset (WORK.AE) */
%check_and_assign_missing(inds=ae, outds=ae1, varlist=&permissible_vars);

Explanation of AE Example

The AE dataset includes the variables STUDYID, USUBJID, AESEQ, AETERM, AESCONG, and AESDISAB.

  • Defining Permissible Variables: We define a list of permissible variables (AESCONG, AESDISAB, AESDTH, AESHOSP, AESLIFE, AESOD, AESMIE) that we need to check and ensure are present in the dataset.
  • Calling the Macro: We call the check_and_assign_missing macro with the example AE dataset (WORK.AE, standing in for sdtm.ae) and the list of permissible variables. The macro checks whether each permissible variable exists in the input dataset. Any variable that does not exist is created as a character variable with a length of 1 and assigned a missing value (a blank space). AESCONG and AESDISAB already exist, so they are left untouched, and the other five variables are added.
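
A quick way to confirm the result is to count how many of the expected variables are present in the output, using the DICTIONARY.COLUMNS view in PROC SQL. The sketch below uses a hypothetical dataset, ae_out_demo, as a stand-in for the macro output, and deliberately leaves out AESHOSP so that the count falls short of the three names checked. In practice the memname would be that of the real output dataset.

```sas
/* Illustrative stand-in for the macro output; AESHOSP is deliberately absent */
data work.ae_out_demo;
    length USUBJID $3 AESCONG $1 AESDTH $1;
    USUBJID = '001';
    AESCONG = 'Y';
    AESDTH  = ' ';
run;

/* Count how many of the expected variables exist in the dataset.
   LIBNAME and MEMNAME are stored in uppercase in DICTIONARY.COLUMNS. */
proc sql noprint;
    select count(*) into :n_present trimmed
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'AE_OUT_DEMO'
      and upcase(name) in ('AESCONG', 'AESDTH', 'AESHOSP');
quit;

%put NOTE: &n_present of 3 expected variables are present.;
```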

Input AE Dataset Preview:

STUDYID  USUBJID  AESEQ  AETERM          AESCONG  AESDISAB
STUDY01  001      1      Headache        Y        Y
STUDY01  001      2      Nausea          Y        Y
STUDY01  002      1      Dizziness       Y        Y
STUDY01  003      1      Fatigue         Y        Y
STUDY01  003      2      Abdominal Pain  Y        Y

Output AE Dataset Preview:

After running the macro, the resulting dataset in the work library will include the permissible variables with missing values (blank spaces) if they were not present in the original dataset.

STUDYID  USUBJID  AESEQ  AETERM          AESCONG  AESDISAB  AESDTH  AESHOSP  AESLIFE  AESOD  AESMIE
STUDY01  001      1      Headache        Y        Y
STUDY01  001      2      Nausea          Y        Y
STUDY01  002      1      Dizziness       Y        Y
STUDY01  003      1      Fatigue         Y        Y
STUDY01  003      2      Abdominal Pain  Y        Y

Practical Example: SUPPxx Dataset

Let's see how this works with an example SUPPxx dataset, including some QNAM variables:

/* Create the example SUPPAE dataset */
data suppae;
    input STUDYID $ USUBJID $ IDVAR $ IDVARVAL $ QNAM $ QVAL $;
    datalines;
STUDY01 001 AESEQ 1 QNAM1 Value1
STUDY01 001 AESEQ 2 QNAM2 Value2
STUDY01 002 AESEQ 1 QNAM1 Value3
STUDY01 003 AESEQ 1 QNAM2 Value4
STUDY01 003 AESEQ 2 QNAM3 Value5
;
run;

/* Transpose the SUPPAE dataset */
proc transpose data=suppae out=suppae_t(drop=_name_);
    by STUDYID USUBJID IDVAR IDVARVAL;
    id QNAM;
    var QVAL;
run;

/* Define the list of QNAM variables */
%let qnam_vars = QNAM1 QNAM2 QNAM3 QNAM4 QNAM5;

/* Call the macro for the transposed SUPPAE dataset */
%check_and_assign_missing(inds=suppae_t, outds=suppae1, varlist=&qnam_vars);

Explanation of SUPPxx Example

The SUPPAE dataset includes the variables STUDYID, USUBJID, IDVAR, IDVARVAL, QNAM, and QVAL. The QNAM values present are QNAM1, QNAM2, and QNAM3.

  • Transposing the SUPPAE Dataset: We transpose the SUPPAE dataset to get QNAMs as variables. The resulting dataset will have QNAM1, QNAM2, and QNAM3 as columns.
  • Defining QNAM Variables: We define a list of QNAM variables (QNAM1, QNAM2, QNAM3, QNAM4, QNAM5) that we need to check and ensure are present in the dataset.
  • Calling the Macro: We call the check_and_assign_missing macro with the transposed SUPPAE dataset and the list of QNAM variables. The macro checks whether each QNAM variable exists in the transposed dataset. Any variable that does not exist is created as a character variable with a length of 1 (the length coded in the macro, which can be adjusted as needed) and assigned a missing value (a blank space).

Input SUPPxx Dataset Preview:

STUDYID  USUBJID  IDVAR  IDVARVAL  QNAM   QVAL
STUDY01  001      AESEQ  1         QNAM1  Value1
STUDY01  001      AESEQ  2         QNAM2  Value2
STUDY01  002      AESEQ  1         QNAM1  Value3
STUDY01  003      AESEQ  1         QNAM2  Value4
STUDY01  003      AESEQ  2         QNAM3  Value5

Transposed SUPPxx Dataset Preview:

STUDYID  USUBJID  IDVAR  IDVARVAL  QNAM1   QNAM2   QNAM3
STUDY01  001      AESEQ  1         Value1
STUDY01  001      AESEQ  2                 Value2
STUDY01  002      AESEQ  1         Value3
STUDY01  003      AESEQ  1                 Value4
STUDY01  003      AESEQ  2                         Value5

Output SUPPxx Dataset Preview:

After running the macro, the resulting dataset in the work library will include the QNAM variables with missing values (blank spaces) if they were not present in the original dataset.

STUDYID  USUBJID  IDVAR  IDVARVAL  QNAM1   QNAM2   QNAM3   QNAM4  QNAM5
STUDY01  001      AESEQ  1         Value1
STUDY01  001      AESEQ  2                 Value2
STUDY01  002      AESEQ  1         Value3
STUDY01  003      AESEQ  1                 Value4
STUDY01  003      AESEQ  2                         Value5

Conclusion

Handling permissible variables that may be missing in the SDTM dataset is crucial for ensuring the integrity of ADaM datasets. By using a dynamic macro to check for variable existence and assign missing values, you can streamline your programming process and avoid issues with missing variables. This approach can be applied to any SDTM domain, ensuring consistent and accurate data management across your clinical trials.

Key Takeaways

  1. Importance of Data Integrity: Ensuring all permissible variables are present in SDTM datasets is crucial for maintaining data integrity and consistency in ADaM datasets.

  2. Dynamic Macro Solution: A dynamic macro can be used to check for the existence of permissible variables and assign missing values if they do not exist, streamlining the data preparation process.

  3. Practical Application: The macro can be applied to various datasets, such as AE and SUPPxx, to ensure all necessary variables are included, even if they were missing in the original dataset.

  4. Flexibility and Adaptability: The approach can be adapted to any SDTM domain, ensuring consistent and accurate data management across different clinical trials.

  5. Enhanced Analysis Accuracy: By handling missing permissible variables effectively, the accuracy and reliability of the analysis are improved, leading to better decision-making in clinical research.
