Use case 3 (Barco / VITO)
AI-based clinical decision support
When a medical device is developed, its safety and effectiveness need to be guaranteed for all subgroups of the target population. Manually identifying subgroups and evaluating performance on each of them is not always possible, especially for shards (i.e. subpopulations) that combine more than two fields, e.g. a shard combination of gender, age and patient skin type. Inspecting all possible data shards is a combinatorial problem: the number of shards to inspect grows exponentially with the number of fields and quickly becomes unmanageable.
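To make this growth concrete, the following minimal sketch counts how many shards exist when up to a given number of fields are combined. The field names and the number of distinct values per field are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import prod

# Hypothetical metadata fields and their number of distinct values.
field_cardinalities = {"gender": 2, "age_group": 5, "skin_type": 6, "body_site": 8}

def count_slices(cardinalities, max_order):
    """Count the shards obtained by combining 1..max_order fields."""
    total = 0
    for order in range(1, max_order + 1):
        for fields in combinations(cardinalities, order):
            # Each combination of fields contributes one shard per value combination.
            total += prod(cardinalities[f] for f in fields)
    return total

for order in range(1, len(field_cardinalities) + 1):
    print(order, count_slices(field_cardinalities, order))
# 1 21
# 2 177
# 3 653
# 4 1133
```

Combining all n fields gives (c_1 + 1)(c_2 + 1)...(c_n + 1) - 1 shards, where c_i is the number of values of field i, so each added field multiplies the total.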
We present a novel toolbox for automatic slicing and performance assessment, built in a generic and modular way, that helps find the subpopulations where the AI models are underperforming. This helps in understanding the detailed performance characteristics of the AI component and helps ensure the safety and effectiveness of a potential update, an essential requirement for any deployment of a medical device. With this tool the analysis can be done faster and in more detail, resulting in a faster update cycle.
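The idea of automatic slicing can be sketched in a few lines. This is not the toolbox itself, only an illustration under stated assumptions: the record layout, the field names, the `correct` flag, and the `margin` and `min_support` parameters are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def slice_accuracy(records, fields, max_order=2, min_support=2):
    """Return {slice: (accuracy, n)} for every value combination of up to max_order fields."""
    counts = defaultdict(lambda: [0, 0])  # slice key -> [n_correct, n_total]
    for rec in records:
        for order in range(1, max_order + 1):
            for combo in combinations(fields, order):
                key = tuple((f, rec[f]) for f in combo)
                counts[key][0] += int(rec["correct"])
                counts[key][1] += 1
    # Drop slices that are too small to say anything meaningful about.
    return {k: (c / n, n) for k, (c, n) in counts.items() if n >= min_support}

def underperforming(records, fields, margin=0.1, **kwargs):
    """Return overall accuracy and the slices scoring more than `margin` below it."""
    overall = sum(r["correct"] for r in records) / len(records)
    slices = slice_accuracy(records, fields, **kwargs)
    flagged = [(k, acc, n) for k, (acc, n) in slices.items() if acc < overall - margin]
    return overall, sorted(flagged, key=lambda item: item[1])  # worst slice first

# Toy data: the model fails only for one gender / skin type combination.
records = [
    {"gender": "F", "skin_type": "I", "correct": True},
    {"gender": "F", "skin_type": "I", "correct": True},
    {"gender": "F", "skin_type": "VI", "correct": False},
    {"gender": "F", "skin_type": "VI", "correct": False},
    {"gender": "M", "skin_type": "I", "correct": True},
    {"gender": "M", "skin_type": "I", "correct": True},
    {"gender": "M", "skin_type": "VI", "correct": True},
    {"gender": "M", "skin_type": "VI", "correct": True},
]

overall, flagged = underperforming(records, ["gender", "skin_type"])
print("overall accuracy:", overall)
for key, acc, n in flagged:
    print(dict(key), "accuracy:", acc, "n:", n)
```

In this toy example the single-field slices only show a moderate drop, while the two-field slice isolates the failing subpopulation, which is why higher-order combinations are worth inspecting.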
Labeled healthcare data is scarce, making it challenging to evaluate an AI-based medical device. By using the more readily available unlabeled data, some aspects such as generalizability, outlier detection and drift detection can already be partially evaluated.
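As an illustration of what can be checked without labels, the sketch below compares the distribution of a model output (for example a confidence score) between a reference set and newly collected data. The text does not state which drift test the toolbox uses; the two-sample Kolmogorov-Smirnov statistic here is a stand-in chosen for simplicity, and the score values are made up.

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    largest_gap = 0.0
    # Empirical CDFs only change at sample points, so checking those is sufficient.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap

# Hypothetical confidence scores on unlabeled images.
reference_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
shifted_scores = [0.3, 0.4, 0.5, 0.6, 0.7]

print(ks_statistic(reference_scores, reference_scores))  # identical data: 0.0
print(ks_statistic(reference_scores, shifted_scores))    # larger values indicate stronger drift
```

A statistic near 0 means the two sets look alike, a value near 1 means they barely overlap. In practice the threshold for raising a drift alarm would have to be chosen and validated for the device in question.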
Identifying a subset where the presence of non-clinical features correlates with lower performance can indicate unwanted bias. For example, the presence of a blue marker in a dermoscopic image has been identified as a root cause of bias for existing public AI components for skin cancer analysis. To evaluate whether this is the case for our internal AI update, we can simulate the presence of such markers, e.g. by overlaying segmented marker areas on a regular dermoscopic image (see example figure below). Using the dashboard, we can then compare the performance on the images with the marker against the performance on the original images without it, not only overall but also at subpopulation level (see final figure). In summary, we show how this toolbox, in combination with simulation, can also be used to distinguish causation from correlation in such a situation.
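The simulation and the paired comparison can be sketched as follows. This is a simplified illustration, not the actual pipeline: images are represented as small nested lists of RGB tuples, and the names `overlay_marker`, `accuracy_shift`, `correct_clean` and `correct_marked` are hypothetical.

```python
from collections import defaultdict

def overlay_marker(image, marker, mask):
    """Paste marker pixels onto the image wherever the mask is True.

    image and marker are equally sized lists of rows of (R, G, B) tuples;
    mask is a list of rows of booleans (the segmented marker area).
    """
    return [
        [m_px if keep else i_px for i_px, m_px, keep in zip(i_row, m_row, k_row)]
        for i_row, m_row, k_row in zip(image, marker, mask)
    ]

def accuracy_shift(cases, group_field):
    """Accuracy with marker minus accuracy without, overall and per subgroup."""
    tallies = defaultdict(lambda: [0, 0, 0])  # group -> [correct_clean, correct_marked, n]
    for case in cases:
        for group in ("overall", case[group_field]):
            tally = tallies[group]
            tally[0] += int(case["correct_clean"])
            tally[1] += int(case["correct_marked"])
            tally[2] += 1
    return {g: (marked - clean) / n for g, (clean, marked, n) in tallies.items()}

# Tiny 2x2 example: one pixel is covered by the blue marker.
skin, blue = (210, 160, 140), (20, 40, 200)
image = [[skin, skin], [skin, skin]]
marker = [[blue, blue], [blue, blue]]
mask = [[True, False], [False, False]]
marked_image = overlay_marker(image, marker, mask)

# Hypothetical paired results: the same image is classified without and with the marker.
cases = [
    {"skin_type": "I", "correct_clean": True, "correct_marked": True},
    {"skin_type": "I", "correct_clean": True, "correct_marked": True},
    {"skin_type": "VI", "correct_clean": True, "correct_marked": False},
    {"skin_type": "VI", "correct_clean": True, "correct_marked": True},
]
shift = accuracy_shift(cases, "skin_type")
print(shift)
```

Because each image is evaluated twice and differs only in the overlaid marker, a drop in accuracy can be attributed to the marker itself rather than to some other property that happens to co-occur with markers in the data.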
This video explains use case 3: AI-based clinical decision support for radiology, surgical and dermatology systems.
Disclaimer: the performance results presented in the above figures come from internal (dedicated) experiments on AI components and do not represent the performance of any product.