Hospitals and life-science institutes produce a tremendous amount of data every day in the course of healthcare delivery and ordinary scientific activity. Such data are highly valuable: they can be used to improve care delivery and prevention, and they can play a pivotal role in prospective clinical research. However, clinical, biological and imaging data are usually gathered through diverse collection channels and procedures with varying degrees of reliability and trustworthiness. As a consequence, the collected data are usually scattered over heterogeneous data sources and suffer from quality problems that hamper their use for analysis. Classical data quality issues can be observed, including missing or erroneous data, along with more complex problems, for example those caused by secondary use of the data in contexts other than the ones for which they were collected. Additionally, the distribution of the data can evolve over time, creating “data-glitches” that can cause severe interpretation errors. Today, no system is able to assist clinicians and researchers in a quality-aware exploration of their data. Overall, the lack of quality indicators strongly limits in-depth use of healthcare data in translational research. We argue that more analyses of increasing complexity and more interactions between clinical and pre-clinical medical research would be feasible if the available data were annotated with quality indicators, and if such quality indicators were also employed in the querying and analysis of the available data. This research proposal is geared toward a system capable of capturing and formalizing the knowledge of data quality from domain experts, enriching the available data with this knowledge, and exploiting it in subsequent quality-aware medical research studies.
We expect a quality-certified collection of medical and biological datasets, on which quality-certified analytical queries can be formulated. We envision the design and implementation of a quality-aware query engine with query enrichment and answering capabilities. To reach these ambitious objectives, the following concrete scientific goals must be fulfilled. First, an innovative research approach, which starts from concrete datasets and expert practices and knowledge and works toward formal models and theoretical solutions, will be employed to elicit innovative quality dimensions and to identify, formalize, verify and finally construct quality indicators able to capture the variety and complexity of medical data. Second, those indicators have to be composed, normalized and aggregated when queries involve data of different granularities (e.g., accuracy indications on pieces of information at the patient level have to be composed when one queries a cohort) and of different quality dimensions (e.g., mixing incomplete and inaccurate data). Third, those complex aggregated indicators have to be used to provide new quality-driven techniques for query answering, refinement, enrichment and data analytics. A key novelty of this project is the handling of data that are not rectified in the original database but sanitized in a query-driven fashion: queries will be modified, rewritten and extended to integrate quality parameters in a flexible and automatic way. The adequacy of our declarative specification of quality indicators, and the efficiency of query refinement, query answering, and the analytical tasks leveraging such indicators, will be assessed by domain experts on real representative datasets collected by the project consortium.