- Babier, Aaron;
- Mahmood, Rafid;
- Zhang, Binghao;
- Alves, Victor;
- Barragán-Montero, Ana;
- Beaudry, Joel;
- Cardenas, Carlos;
- Chang, Yankui;
- Chen, Zijie;
- Chun, Jaehee;
- Diaz, Kelly;
- David Eraso, Harold;
- Faustmann, Erik;
- Gaj, Sibaji;
- Gay, Skylar;
- Gronberg, Mary;
- Guo, Bingqi;
- He, Junjun;
- Heilemann, Gerd;
- Hira, Sanchit;
- Huang, Yuliang;
- Ji, Fuxin;
- Jiang, Dashan;
- Carlo Jimenez Giraldo, Jean;
- Lee, Hoyeon;
- Lian, Jun;
- Liu, Shuolin;
- Liu, Keng-Chi;
- Marrugo, José;
- Miki, Kentaro;
- Nakamura, Kunio;
- Netherton, Tucker;
- Nguyen, Dan;
- Nourzadeh, Hamidreza;
- Osman, Alexander;
- Peng, Zhao;
- Darío Quinto Muñoz, José;
- Ramsl, Christian;
- Joo Rhee, Dong;
- David Rodriguez, Juan;
- Shan, Hongming;
- Siebers, Jeffrey;
- Soomro, Mumtaz;
- Sun, Kay;
- Usuga Hoyos, Andrés;
- Valderrama, Carlos;
- Verbeek, Rob;
- Wang, Enpei;
- Willems, Siri;
- Wu, Qi;
- Xu, Xuanang;
- Yang, Sen;
- Yuan, Lulin;
- Zhu, Simeng;
- Zimmermann, Lukas;
- Moore, Kevin;
- Purdie, Thomas;
- McNiven, Andrea;
- Chan, Theodore
Objective. To establish an open framework for developing plan optimization models for knowledge-based planning (KBP). Approach. Our framework includes radiotherapy treatment data (i.e. reference plans) for 100 patients with head-and-neck cancer who were treated with intensity-modulated radiotherapy. The data also include high-quality dose predictions from 19 KBP models that were developed by different research groups using out-of-sample data during the OpenKBP Grand Challenge. The dose predictions were input to four fluence-based dose mimicking models to form 76 unique KBP pipelines that generated 7600 plans (76 pipelines × 100 patients). The predictions and KBP-generated plans were compared to the reference plans via: the dose score, which is the average mean absolute voxel-by-voxel difference in dose; the deviation in dose-volume histogram (DVH) points; and the frequency of clinical planning criteria satisfaction. We also performed a theoretical investigation to justify our dose mimicking models. Main results. The rank order correlation of the dose score between predictions and their KBP pipelines ranged from 0.50 to 0.62, which indicates that the quality of the predictions was generally positively correlated with the quality of the plans. Additionally, compared to the input predictions, the KBP-generated plans performed significantly better (P < 0.05; one-sided Wilcoxon test) on 18 of 23 DVH points. Similarly, each optimization model generated plans that satisfied a higher percentage of criteria than the reference plans, which satisfied 3.5% more criteria than the set of all dose predictions. Lastly, our theoretical investigation demonstrated that the dose mimicking models generated plans that are also optimal for an inverse planning model. Significance. This was the largest international effort to date for evaluating the combination of KBP prediction and optimization models. 
We found that the best-performing models significantly outperformed the reference dose and dose predictions. In the interest of reproducibility, our data and code are freely available.
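As an illustration of the dose score described in the abstract (the average, over patients, of the mean absolute voxel-by-voxel dose difference from the reference plan), the following is a minimal sketch and not the authors' released code. The function name `dose_score` and the representation of each dose distribution as a flat list of voxel doses in Gy are assumptions made for this example; any masking of voxels is assumed to have been applied beforehand.

```python
from statistics import fmean


def dose_score(reference_doses, evaluated_doses):
    """Average over patients of the mean absolute voxel-wise dose difference.

    Hypothetical helper: each argument is a list with one entry per patient,
    and each entry is a flat sequence of voxel doses (assumed to be in Gy).
    """
    if len(reference_doses) != len(evaluated_doses):
        raise ValueError("Both inputs must cover the same patients.")

    per_patient_mae = []
    for reference, evaluated in zip(reference_doses, evaluated_doses):
        if len(reference) != len(evaluated):
            raise ValueError("Dose arrays for a patient must have equal length.")
        # Mean absolute error across the voxels of a single patient.
        per_patient_mae.append(
            fmean(abs(r - e) for r, e in zip(reference, evaluated))
        )

    # Average the per-patient errors to obtain one score for the model.
    return fmean(per_patient_mae)


# Toy data for two patients (values are illustrative, not from the study).
reference = [[60.0, 30.0, 0.0, 10.0], [70.0, 20.0]]
predicted = [[58.0, 33.0, 0.0, 11.0], [70.0, 24.0]]
print(dose_score(reference, predicted))
```

With the toy data, the per-patient errors are 1.5 Gy and 2.0 Gy, so the sketch reports a score of 1.75. A lower score means the evaluated doses are closer to the reference plans.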