Medical Out-of-Distribution Analysis Challenge
‘Will you be able to spot a Gorilla in a CT scan ???’
Despite overwhelming successes in recent years, progress in the field of biomedical image computing still largely depends on the availability of annotated training examples. This annotation process is often prohibitively expensive because it requires the valuable time of domain experts.
Furthermore, many algorithms used in image analysis are vulnerable to out-of-distribution samples, resulting in wrong and overconfident decisions.
However, even humans are vulnerable to ‘inattentional blindness’: more than 50% of trained radiologists did not notice a gorilla image rendered into a lung CT scan while assessing lung nodules.
With the Medical Out-of-Distribution Analysis Challenge (MOOD) we want to tackle these challenges!
The MOOD Challenge provides a standardized dataset and benchmark for anomaly detection. We propose two tasks. The first is a sample-level (i.e., patient-level) analysis that detects out-of-distribution samples, for example scans showing a pathological condition or any other condition not seen in the training set. Such samples can pose a problem for classical supervised algorithms, and detecting them can also help physicians prioritize patients. The second is an object-level analysis, i.e., a score for each voxel that highlights abnormal regions and can potentially guide the physician.
The challenge spans two datasets with more than 500 scans each, one brain MRI dataset and one abdominal CT dataset, so that the generalizability of the approaches can be compared. The training set consists of hand-selected scans in which no anomalies were identified.
To prevent overfitting on the (types of) anomalies in our test set, the test set will be kept confidential at all times. As in reality, the types of anomalies should not be known beforehand, which prevents a bias towards certain anomalies in the evaluation. Some scans in the test set contain no anomalies, whilst others contain naturally occurring anomalies. In addition to the natural anomalies, we will add synthetic anomalies to cover a broad and unpredictable variety of anomalies and to allow an analysis of the strengths and weaknesses of the methods along different factors.
We believe that this will allow for a controlled and fair comparison of different algorithms.
Number of training samples:
- Brain: 800 scans (256 x 256 x 256)
- Abdominal: 550 scans (512 x 512 x 512)
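As a rough planning aid, the scan sizes above translate into the in-memory sizes per volume computed below. This is our own arithmetic, not an official requirement. nibabel's `get_fdata()` returns float64 by default, so a loaded scan takes twice its float32 footprint; passing `dtype=np.float32` to `get_fdata()` halves it.

```python
# Approximate in-memory size of one training volume (our own arithmetic).
BYTES_PER_VOXEL = {"float32": 4, "float64": 8}  # get_fdata() defaults to float64


def volume_mib(shape, dtype="float64"):
    """Return the size in MiB of a dense array with the given shape and dtype."""
    n_voxels = 1
    for dim in shape:
        n_voxels *= dim
    return n_voxels * BYTES_PER_VOXEL[dtype] / 2**20


brain = volume_mib((256, 256, 256))      # 128.0 MiB as float64
abdominal = volume_mib((512, 512, 512))  # 1024.0 MiB as float64
print(f"brain: {brain} MiB, abdominal: {abdominal} MiB")
```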
We provide four toy test cases for both datasets so that participants can test their final algorithm. After a solution is submitted, we report the scores on these test cases back to the participants so they can check whether their algorithm ran successfully.
To get access to the data go to the Submission-Site.
The MOOD Challenge has two tasks:
Sample-level: Analyse different scans/samples and report a score for each sample. The algorithm should process a single sample and give a “probability” for this sample being abnormal/ out-of-distribution. The scores must be in [0-1], where 0 indicates no abnormality and 1 indicates the most abnormal input.
In summary: One score per sample.
Object-/ Pixel-level: Analyse different scans and report a score for each voxel of the sample. The algorithm should process a single sample and give a “probability” for each voxel being abnormal/ out-of-distribution. The scores must be in [0-1], where 0 indicates no abnormality and 1 indicates the most abnormal input.
Participants can decide whether they want to submit a binary prediction (0 or 1) or a continuous score (0-1). In the case of a continuous score, we use a toy dataset (similar to the toy dataset presented to the participants) to determine the best binarization threshold with respect to the presented metric on this toy dataset. We then use this threshold to binarize the continuous score. On the binarized score, we apply connected component analysis to find individual objects. We discard all objects that are smaller than half the size of the smallest anomaly in our test set and all objects that are more than two times bigger than the biggest anomaly in our test set (to sort out low-frequency noise). This procedure outputs the prediction objects. We then match the GT objects and the prediction objects in the following way to obtain TP, FP and FN:
If the center of mass of the prediction object falls within the convex hull of the GT object and the sizes differ by at most a factor of two (0.5 * GT-object size < prediction-object size < 2 * GT-object size), we consider the GT object detected, resulting in a TP. If a GT object is not matched/detected by any prediction object, we count it as an FN. If a prediction object cannot be matched to any GT object, we count it as an FP. We iterate over all images and objects in the dataset to obtain the TPs, FPs and FNs for the whole dataset. The final score is the F1 score computed from these TPs, FPs and FNs.
In summary: X * Y * Z scores per sample (where X * Y * Z is the dimensionality of the data sample).
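The object-level evaluation described above can be sketched as follows. This is a minimal illustration using NumPy and SciPy, not the official evaluation code, which is on the github-page. Several details are our own assumptions:
- the threshold and the reference anomaly sizes are passed in as parameters;
- connected components use the face connectivity that is the default of `scipy.ndimage.label`;
- the size filter treats its bounds as inclusive;
- flat or tiny GT objects fall back to a voxel-membership test;
- a GT object counts as detected if any prediction matches it.

The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import Delaunay


def extract_objects(score, threshold, smallest_gt_size, biggest_gt_size):
    """Binarize a voxel score map and keep the size-filtered connected components."""
    labels, n_objects = ndimage.label(score >= threshold)  # face connectivity (SciPy default)
    sizes = np.bincount(labels.ravel())  # sizes[0] counts background voxels
    return [
        labels == i
        for i in range(1, n_objects + 1)
        if 0.5 * smallest_gt_size <= sizes[i] <= 2 * biggest_gt_size  # inclusive bounds (assumption)
    ]


def in_convex_hull(point, mask):
    """Check whether a point lies in the convex hull of a mask's voxel coordinates."""
    coords = np.argwhere(mask).astype(float)
    if len(coords) < 4 or np.linalg.matrix_rank(coords - coords.mean(axis=0)) < 3:
        # Assumption: flat or tiny objects have no 3D hull, so test voxel membership instead.
        return bool(mask[tuple(np.round(point).astype(int))])
    # find_simplex returns -1 for points outside the triangulated hull.
    return bool(Delaunay(coords).find_simplex(point[np.newaxis, :])[0] >= 0)


def object_f1(gt_objects_per_image, pred_objects_per_image):
    """Match prediction and GT objects per image and return the dataset-level F1."""
    tp = fp = fn = 0
    for gt_objects, pred_objects in zip(gt_objects_per_image, pred_objects_per_image):
        pred_matched = [False] * len(pred_objects)
        for gt in gt_objects:
            gt_size = gt.sum()
            detected = False
            for i, pred in enumerate(pred_objects):
                center = np.argwhere(pred).mean(axis=0)  # center of mass of the binary object
                if 0.5 * gt_size < pred.sum() < 2 * gt_size and in_convex_hull(center, gt):
                    detected = True
                    pred_matched[i] = True
            if detected:
                tp += 1
            else:
                fn += 1
        fp += pred_matched.count(False)  # predictions that matched no GT object
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN); 0.0 if empty (assumption)
```

Counting all TPs, FPs and FNs over the whole dataset before computing F1 follows the description above. Per-image F1 values would weight images differently.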
- For evaluation, please submit a docker for the task on Synapse.
- The docker must be able to process a directory with nifti files and…
- … for the Sample-level task output a text file with a single score per sample
- … for the Pixel-level task output a nifti (with the same dimensions as the input) per sample.
- The scores should be in [0-1] (scores above and below the interval will be clamped to [0-1]).
- If a case is missing or fails we will assign it the lowest given anomaly score (= 0).
- During the evaluation, a runtime of 600 sec/case is allowed (you will get a report of the runtime along with your toy example scores).
- No internet connection will be available during the evaluation.
- Only the provided output directory will have write access.
- We will use the reported scores together with the ground truth labels to calculate the AP over the whole dataset (for more information regarding the AP, see scikit-learn).
- We combine the two datasets using a consolidated ranking scheme.
- Teams are allowed 3 submissions; however, only the latest submission will be considered (we increased the number of submissions due to technical issues).
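The scoring rules above can be illustrated with a minimal NumPy sketch. `collect_scores` is a hypothetical helper that applies the clamping and missing-case rules. `average_precision` is our own re-implementation of the step-wise definition behind scikit-learn's `average_precision_score`. Please use the official evaluation code on the github-page for reference numbers.

```python
import numpy as np


def collect_scores(predictions, case_ids):
    """Clamp reported scores to [0, 1] and assign 0 to missing or failed cases."""
    return np.array([min(max(float(predictions.get(c, 0.0)), 0.0), 1.0) for c in case_ids])


def average_precision(y_true, y_score):
    """AP = sum_n (R_n - R_{n-1}) * P_n over the distinct score thresholds.

    Assumes that y_true contains at least one positive (abnormal) case.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score, kind="mergesort")  # highest score first, stable for ties
    y_true, y_score = y_true[order], y_score[order]
    # The last index of each group of tied scores gives one threshold per distinct score.
    threshold_idx = np.r_[np.flatnonzero(np.diff(y_score)), y_true.size - 1]
    tps = np.cumsum(y_true)[threshold_idx]
    fps = threshold_idx + 1 - tps
    precision = tps / (tps + fps)
    recall = tps / tps[-1]
    return float(np.sum(np.diff(np.r_[0.0, recall]) * precision))
```

For example, `average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 5/6 ≈ 0.833. Because AP depends only on the ranking of the scores, any monotonic rescaling into [0, 1] leaves it unchanged.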
To see our evaluation code and reproduce the results on the toy cases visit our github-page.
- Only fully automatic algorithms are allowed.
- It is only allowed to use the provided training data. No other data and data sources are allowed.
- After receiving their challenge rank, teams may decide whether only their username or their team name including the team members will appear on the leaderboard.
- By default (i.e. if they don’t decline) the winning team will be announced publicly. The remaining teams may decide if they choose to appear on the public Leaderboard.
- Teams that reveal their identity can nominate members of their team as co-authors for the challenge paper.
- We reserve the right to exclude teams and team members if they do not adhere to the challenge rules and guidelines.
- The method description submitted by the authors may be used in the publication of the challenge results. Personal data of the authors can include their name, affiliation and contact addresses.
- Participating teams may publish their own results separately.
- All participating teams can choose whether they appear on the leaderboard. However, to prevent misconduct, only teams that open-source their code will be eligible to win the challenge and receive any prizes.
12 Oct 2023
|Time (PDT)|Session|
|---|---|
|13:30 - 13:35|Welcome|
|13:35 - 14:05|Keynote: Latha Pemula (Applied Scientist, AWS AI Labs)|
|14:05 - 14:15|Challenge Intro|
|14:15 - 15:45|Participants Presentations|
|15:45 - 16:00|Coffee Break|
|16:00 - 16:30|Results|
|Milestone|Date|
|---|---|
|Challenge Data Release:|01 April 2023|
|Registration Until:|30 August 2023|
|Submission of Dockers closes:| |
|Submit abstract before:| |
|Announcement of Results:|MICCAI 2023|
Register for the Challenge
Go to <synapse.org> and click on Join.
Get access to the data
Go to <synapse.org> and go to the Data tab. In the mood folder, you can access the data.
We suggest the following folder structure (to work with our github examples):
Load the scans
In python you can load and write the nifti files using nibabel:
pip install nibabel
Load the image data and affine matrix:
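```python
import nibabel as nib

source_file = "input1.nii.gz"  # path to an existing nifti file (example name)
nifti = nib.load(source_file)
data_array = nifti.get_fdata()  # voxel data as a float64 NumPy array
affine_matrix = nifti.affine  # 4x4 voxel-to-world transform
```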
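Save an array as a nifti file:
```python
import nibabel as nib

target_file = "output1.nii.gz"  # where to write the new file (example name)
# data_array and affine_matrix as obtained in the loading example above
new_nifti = nib.Nifti1Image(data_array, affine=affine_matrix)
nib.save(new_nifti, target_file)
```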
Build a docker
Please install and use docker for submission: https://www.docker.com/get-started
You can build on any docker base image you like. Good base images already exist for:
- Pytorch: https://hub.docker.com/r/pytorch/pytorch/
- Tensorflow: https://www.tensorflow.org/install/docker
For GPU support you may need to install the NVIDIA Container Toolkit: https://github.com/NVIDIA/nvidia-docker
Docker Setup
For the different tasks, the docker needs the following scripts (each accepting the parameters input_folder and output_folder):
- /workspace/run_sample_brain.sh input_folder output_folder (Sample-level, brain dataset)
- /workspace/run_sample_abdom.sh input_folder output_folder (Sample-level, abdominal dataset)
- /workspace/run_pixel_brain.sh input_folder output_folder (Pixel-level, brain dataset)
- /workspace/run_pixel_abdom.sh input_folder output_folder (Pixel-level, abdominal dataset)
The docker has to allow mounting the input folder to /mnt/data and the output folder to /mnt/pred. We will mount the input and output folders and pass them to the run scripts.
You will only have write access to /mnt/pred, so please also use it for temporary results if needed.
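In Python, one way to keep temporary files inside the writable output folder is the standard `tempfile` module with an explicit `dir` argument. The minimal sketch below assumes that `output_dir` is `/mnt/pred` inside the container; `write_intermediate` and the file name are hypothetical.

```python
import tempfile
from pathlib import Path


def write_intermediate(output_dir, payload):
    """Write a temporary file inside output_dir and return its contents.

    The temporary directory is created under output_dir (e.g. /mnt/pred) and is
    deleted again when the with-block exits, so no scratch files are left behind.
    """
    with tempfile.TemporaryDirectory(dir=output_dir) as tmp_dir:
        tmp_file = Path(tmp_dir) / "intermediate.txt"  # hypothetical file name
        tmp_file.write_text(payload)
        return tmp_file.read_text()
```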
During testing, the docker image will be run with the following commands:
Your model should process all scans and write the outputs to the given output directory.
Please be aware of the different output formats depending on your task:
- Sample-level: For each input file (e.g. input1.nii.gz) create a txt file with the same name and an appended “.txt” file-ending (e.g. input1.nii.gz.txt) in the output directory and write out a single (float) score.
- Pixel-level: For each input file (e.g. input1.nii.gz) create a nifti file with the same dimensions and save it under the same name (e.g. input1.nii.gz) in the output directory.
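The sample-level naming convention above can be sketched in Python, assuming the input folder contains `.nii.gz` files. `write_sample_scores` and `score_fn` are hypothetical stand-ins for your own model code. For the pixel-level task, you would instead save a score volume with the input's shape under the input's file name, e.g. with `nib.Nifti1Image` as in ‘Load the scans’.

```python
from pathlib import Path


def write_sample_scores(input_dir, output_dir, score_fn):
    """Write one '<input name>.txt' file with a single float score per input scan.

    score_fn is a hypothetical callable that maps a scan path to an anomaly score.
    """
    written = []
    for scan in sorted(Path(input_dir).glob("*.nii.gz")):
        score = min(max(float(score_fn(scan)), 0.0), 1.0)  # keep the score in [0, 1]
        target = Path(output_dir) / (scan.name + ".txt")  # input1.nii.gz -> input1.nii.gz.txt
        target.write_text(str(score))
        written.append(target)
    return written
```

A run script such as /workspace/run_sample_brain.sh could call a function like this with its input_folder and output_folder arguments.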
For more information have a look at our github example: https://github.com/MIC-DKFZ/mood
Test a docker locally
You can test your docker locally using the toy cases. After you submit your docker, we will also report the toy-test scores back to you, so you can check whether your submission was successful and the scores match.
To run the docker locally you can:
- Either run the docker manually and use the evalresults.py file (not recommended)
- Or use the
Clone the github repo:
git clone https://github.com/MIC-DKFZ/mood.git
Install the requirements:
pip install -r requirements.txt
With -d you can pass your docker image name, with -i you pass the path to your base input directory, which must contain a brain and an abdom folder (see above -> ‘Get access to the data’), and with -t you can define the task, either
Test a docker on our submission system
You can test your docker on our submission system, but only for the toy cases. After you submit your docker, we will report the toy-test scores back to you, so you can check whether your docker runs successfully and the scores match those on your local system.
To test your docker please submit to the tasks Toy Examples - Pixel-Level or Toy Examples - Sample-Level.
The next section, “Submit a docker”, shows how to submit a docker. For testing, submit only to the tasks Toy Examples - Pixel-Level or Toy Examples - Sample-Level.
Please note that this will not count as a MOOD 2020 submission. Submitting to the Toy Examples tasks will not increase your submission count and will not include your submission in the challenge.
Submit a docker
Becoming a certified Synapse user:
Important: In order to use all docker functionality, you must be a certified user: https://www.synapse.org/#!Quiz:
Create a new project on synapse:
To submit a docker image, you first need to create a new project on the Synapse platform (e.g. MOOD_submission_<task>_<Your team name>).
Note the Synapse Project ID (e.g. syn20482334).
The organizing team of the challenge (Medical Out of Distribution Analysis Challenge 2020 Organizers) must be given download permissions to your project (under Project Settings -> Project Sharing Settings).
Upload docker-image to synapse:
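```shell
# Log in to the Synapse docker registry with your Synapse credentials
docker login docker.synapse.org
# Replace synapse_project_ID with your project ID (e.g. syn20482334)
docker tag docker_imagename docker.synapse.org/synapse_project_ID/docker_imagename
docker push docker.synapse.org/synapse_project_ID/docker_imagename:latest
```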
Submit the docker image:
- Go to your Project Page → Docker and click on the docker image.
- Click on Docker Repository Tools → Submit Docker Repository to Challenge.
- Choose the ‘tag’ you want to submit.
- Choose the task you want to submit to (either ‘Sample-level’ or ‘Pixel-level’). If you plan to participate in multiple tasks of the challenge, you need to submit your corresponding docker to each queue individually.
- Specify if you are entering alone or as a team.
Read-only file system:
OSError: [Errno 30] Read-only file system: '/workspace/XXXX'
Please make sure to write your (temp) data only to /mnt/pred and perhaps set
Python site packages permission error:
File "/opt/venv/lib/python3.X/site-packages/pkg_resources/__init__.py", line XXXX, in get_metadata with io.open(self.path, encoding='utf-8') as f: PermissionError: [Errno 13] Permission denied
Please make sure you use virtual environments and use the virtualenv path as the first place to check for packages.
FileNotFoundError: [Errno 2] No such file or directory: './XXX'
Please only use absolute paths, i.e.
Any other errors? Feel free to contact us =).
Submit the abstract
To officially take part in the challenge, participants are required to provide a short abstract describing their submission. You can find the abstract templates on Synapse or Overleaf. Please fill out the provided template and send it as a PDF to email@example.com before 07 September 2022.
Cite the Challenge
Please be sure to include the following citations in your work if you use the challenge data:
For more information see https://zenodo.org/record/6362313.
David Zimmerer, Jens Petersen, Gregor Köhler, Paul Jäger, Peter Full, Klaus Maier-Hein
Div. Medical Image Computing (MIC), German Cancer Research Center (DKFZ)
Tobias Roß, Tim Adler, Annika Reinke, Lena Maier-Hein
Div. Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ)
- Official Website/ Submission system: https://synapse.org/mood
- Github-Repository: https://github.com/MIC-DKFZ/mood
- Challenge-Document: https://zenodo.org/record/6362313
- MICCAI 2023: https://conferences.miccai.org/2023/en/
- Some Anomaly Detection/ OoD papers: https://paperswithcode.com/task/anomaly-detection , https://paperswithcode.com/task/out-of-distribution-detection
- Previous editions: http://medicalood.dkfz.de/web/2020/ , http://medicalood.dkfz.de/web/2021/, http://medicalood.dkfz.de/web/2022/
- Inattentional blindness in expert observers paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964612/
- NVIDIA: https://www.nvidia.com/