Bachelor Thesis: Data Smells Detection in Large Clinical Trial Databases

"Data smells" are patterns or characteristics in data that may compromise its quality in the future. They are context-independent and can occur across diverse domains. However, some domains, such as medicine, are more sensitive to data smells and the harm they can cause, because data accuracy is critical and analyses must be precise. This bachelor's thesis aims to identify data smells and their causes in clinical research databases from small interventional and large observational studies, and to estimate the threat they pose to data quality.

Furthermore, the thesis aims to investigate the root causes of the identified data smells and to propose methods for detecting them, together with targeted corrective actions. The new methods will be implemented by extending a Software Library and Web Application for Rule-based Data Smell Detection with a group of data smells.
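
To give a rough idea of what rule-based smell detection can look like, here is a minimal Python sketch. It is not the interface of the library mentioned above. The rule functions, the placeholder list, the column names, and the sample records are all hypothetical and chosen only for illustration. Each rule is a simple predicate on a single value, and the detector reports every (row, column, smell) combination for which a rule fires.

```python
import re

# Hypothetical placeholder values; a real rule set would be configurable
# and agreed on with the clinical data managers.
DUMMY_VALUES = {"", "n/a", "na", "unknown", "-", "999", "-1"}


def is_dummy_value(value):
    """Flag values that look like placeholders rather than measurements."""
    return str(value).strip().lower() in DUMMY_VALUES


def is_integer_as_string(value):
    """Flag integers stored as text, e.g. "42" instead of 42."""
    return isinstance(value, str) and re.fullmatch(r"[+-]?\d+", value.strip()) is not None


# Illustrative rule registry: smell name -> predicate on a single value.
RULES = {
    "dummy value": is_dummy_value,
    "integer as string": is_integer_as_string,
}


def detect_smells(rows, rules=RULES):
    """Return a list of (row_index, column, smell_name) for every rule that fires."""
    findings = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            for smell_name, rule in rules.items():
                if rule(value):
                    findings.append((index, column, smell_name))
    return findings


# Made-up example records, not real trial data.
records = [
    {"patient_id": "17", "age": 54, "weight_kg": "unknown"},
    {"patient_id": 18, "age": -1, "weight_kg": 71.5},
]

for finding in detect_smells(records):
    print(finding)
```

Keeping each smell as an independent predicate means a new group of smells can be added by registering further rules without touching the detection loop. Whether the existing library is organised this way is an assumption.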

Supervisor: Valentina.Golendukhina@uibk.ac.at