J Nucl Med. 2012; 53 (Supplement 1):506
Oncology: Clinical Diagnosis
Hodgkin Disease and Myeloma
Analysis of the Deauville criteria for the assessment of interim PET in advanced stage Hodgkin lymphoma patients enrolled in the IVS study part II: Reliability of score and concordance between reviewers
Stephane Chauvie1 and
1 S. Croce and Carle Hospital, Cuneo, Italy
2 CHU H. Mondor, Paris, France
3 Mount Sinai Medical Center, New York, NY
4 Rigshospitalet Copenhagen, Copenhagen, Denmark
5 St. Thomas Hospital, London, United Kingdom
6 Oncology Institute of Veneto, Padova, Italy
Abstract No. 506
Objectives: In the Deauville criteria, FDG uptake is scored according to its intensity. The choice of threshold used to define a positive scan depends on whether high sensitivity or high specificity is desired. The aim of this study was to test whether the level of agreement between reporters is also influenced by the threshold.
Methods: An international validation study (IVS) was performed to measure progression-free survival (PFS) in Hodgkin lymphoma (HL) according to interim PET-CT. Paired scans were reported by 6 independent reviewers, and uptake was scored as: (1) none; (2) ≤ mediastinum; (3) > mediastinum but ≤ liver; (4) moderately increased uptake > liver; (5) markedly increased uptake > liver. For IVS, a score of 4 or 5 was regarded as positive and a score of 1, 2 or 3 as negative. Levels of agreement between reporters were measured for IVS (2 categories: positive vs negative), then for 3 categories using score 1 or 2 (complete metabolic response, CMR) vs score 3 (equivocal) vs score 4 or 5 (disease), and for individual scores of 1 vs 2 vs 3 vs 4 vs 5. Agreement was calculated using Krippendorff's alpha coefficient.
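The grouping and agreement calculation described in the Methods can be illustrated with a minimal sketch. It assumes complete data and the nominal form of Krippendorff's alpha; the abstract does not state which distance metric was used, so the nominal choice is an assumption, and the function names and the toy ratings are hypothetical, not study data.

```python
from collections import Counter
from itertools import permutations


def collapse_ivs(score):
    """IVS reading: score 4 or 5 is positive, score 1, 2 or 3 is negative."""
    return "positive" if score >= 4 else "negative"


def collapse_three(score):
    """Three categories: CMR (1-2), equivocal (3), disease (4-5)."""
    if score <= 2:
        return "CMR"
    if score == 3:
        return "equivocal"
    return "disease"


def krippendorff_alpha_nominal(units):
    """Nominal Krippendorff's alpha; `units` is a list of per-scan rating lists."""
    # Coincidence matrix: each ordered pair of ratings from different
    # reviewers of the same scan contributes 1 / (m - 1).
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # a scan rated by a single reviewer carries no information
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # undefined when only one category was ever used
    return 1.0 - observed / expected


# Hypothetical ratings: rows are scans, columns are reviewers (not study data).
toy_scores = [[4, 5, 4], [1, 2, 2], [3, 3, 2], [5, 5, 5]]
alpha_scores = krippendorff_alpha_nominal(toy_scores)
alpha_three = krippendorff_alpha_nominal(
    [[collapse_three(s) for s in row] for row in toy_scores]
)
alpha_ivs = krippendorff_alpha_nominal(
    [[collapse_ivs(s) for s in row] for row in toy_scores]
)
```

In this toy data the reviewers only ever disagree within a collapsed IVS category, so the binary grouping yields perfect agreement while the individual scores do not.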
Results: 261 paired scans were evaluated. There was concordance between the majority of reporters (> 4 agreed on the identical score) in 97% of cases for IVS, in 87% using the CMR vs equivocal vs disease categories, and in 65% using individual scores of 1-5. The level of agreement was α = 0.758 (IVS), α = 0.542 (3 categories) and α = 0.352 (scores 1-5).
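The majority-concordance percentages can be understood as the share of scans on which enough reviewers gave the identical category. The sketch below reads "> 4 agreed" as at least 5 of the 6 reviewers, which is an interpretation of the abstract's wording; the function name and the ratings are hypothetical, not study data.

```python
from collections import Counter


def majority_concordance(units, min_agree=5):
    """Fraction of scans where at least `min_agree` reviewers gave the same score."""
    hits = 0
    for ratings in units:
        # Size of the largest group of identical ratings for this scan.
        top_count = Counter(ratings).most_common(1)[0][1]
        if top_count >= min_agree:
            hits += 1
    return hits / len(units)


# Hypothetical ratings from 6 reviewers on 4 scans (not study data).
toy_units = [
    [4, 4, 4, 4, 4, 5],
    [1, 1, 2, 2, 3, 3],
    [2, 2, 2, 2, 2, 2],
    [3, 3, 3, 3, 4, 4],
]
concordance = majority_concordance(toy_units)
```

Applying the same function to collapsed categories rather than raw scores would correspond to the IVS and 3-category figures.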
Conclusions: Agreement between reporters was fair using scores 1-5 and moderate or good using fewer but more clinically relevant categories. Levels of agreement are better using the liver rather than the mediastinum as a reference region.