IDA Framework

Initial Data Analysis (IDA) consists of all steps performed on the data of a study, typically between the end of data collection/entry and the start of the statistical analyses that address the research questions. Ideally, IDA is already performed while data collection is ongoing, so that data issues are detected and dealt with as early as possible.

IDA consists of six elements.

  1. Metadata setup summarizes the background information needed to conduct all subsequent IDA steps properly. Beyond technical metadata such as labels or plausibility limits, this covers conceptual metadata, which combines information from the study protocol, secondary information sources, and information about the actual study conduct.

  2. Data cleaning is performed to identify and correct technical data errors. Many errors may not be directly observable, so a proper metadata setup is crucial for carrying out this step correctly and efficiently.

  3. Data screening examines data properties to inform decisions about the realizability of the intended analyses. In contrast to the data cleaning step, the focus is on data properties, not technical errors. However, data screening may reveal structural errors that occurred during the data collection process.

  4. Initial data reporting documents all insights obtained from the previous steps and communicates them to the research body.

  5. Refining and updating the analysis plan adapts the plan, where necessary, to account for findings from the previous IDA steps.

  6. Reporting IDA in research papers is necessary to ensure transparency regarding key findings and actions in the IDA steps that affected the analysis or the interpretation of results. This reporting step builds on the initial data reporting but focuses on the specific paper and on what has actually been done, whereas the initial data reporting provides a general overview of IDA findings and suggests ways to handle potential conflicts with the analysis plan.
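
As a rough illustration of how the first three elements fit together, the following Python sketch uses technical metadata (labels and plausibility limits) to drive a simple cleaning check and a basic screening summary. It is not taken from the framework itself: the variable names, limits, records, and the helper functions find_implausible and screen are all hypothetical and only meant to make the idea concrete.

```python
from statistics import median

# Hypothetical technical metadata: variable labels and plausibility limits.
METADATA = {
    "age": {"label": "Age at baseline (years)", "min": 18, "max": 100},
    "sbp": {"label": "Systolic blood pressure (mmHg)", "min": 60, "max": 260},
}

# Made-up records for illustration only; None marks a missing value.
RECORDS = [
    {"id": 1, "age": 34, "sbp": 121},
    {"id": 2, "age": 250, "sbp": 135},
    {"id": 3, "age": 58, "sbp": None},
    {"id": 4, "age": 71, "sbp": 148},
]


def find_implausible(records, metadata):
    """Data cleaning: flag observed values outside the plausibility limits."""
    flags = []
    for rec in records:
        for var, meta in metadata.items():
            value = rec.get(var)
            if value is not None and not (meta["min"] <= value <= meta["max"]):
                flags.append((rec["id"], var, value))
    return flags


def screen(records, metadata, flags):
    """Data screening: summarize missingness and the median per variable.

    Flagged values are excluded from the median but counted separately.
    """
    flagged = {(rec_id, var) for rec_id, var, _ in flags}
    summary = {}
    for var in metadata:
        values = [
            r[var]
            for r in records
            if r.get(var) is not None and (r["id"], var) not in flagged
        ]
        summary[var] = {
            "n_missing": sum(r.get(var) is None for r in records),
            "n_flagged": sum(1 for _, v in flagged if v == var),
            "median": median(values) if values else None,
        }
    return summary


flags = find_implausible(RECORDS, METADATA)
summary = screen(RECORDS, METADATA, flags)
print(flags)
print(summary)
```

In this sketch the flagged values are only reported, not altered; whether and how to correct them is a decision for the data cleaning step, and the resulting summary is the kind of material that would feed into the initial data reporting.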

Huebner M, le Cessie S, Schmidt CO, Vach W. A contemporary conceptual framework for initial data analysis. Observational Studies 2018; 4: 171-192.