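How to Write Up a Diagnostic Accuracy Analysis: Lessons from STARD 2015

Yoshitake Takebayashi, Project Assistant Professor
Risk Analysis Research Center, The Institute of Statistical Mathematics
9 July 2016, 26th REQUIRE Workshop, "Diagnostic accuracy analysis: STARD 2015"
Tokyo Medical and Dental University, Yushima Campus, 14:30-17:45
ytake2 [at] ism.ac.jp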

Outline
• Overview of diagnostic accuracy analysis (5 min)
• Risk of bias in diagnostic accuracy analysis (10 min)
• How to write up a diagnostic accuracy analysis (25 min)

Features of diagnostic accuracy studies
• Definition
• Accuracy measures
• Research question (PIRATE)
• Study design

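• Definition
A diagnostic accuracy study evaluates the accuracy of an index test of interest against a reference standard (gold standard), that is, the most accurate diagnostic method currently available.

                 Reference standard (true status)
Index test       Disease (+)           No disease (-)         Total
Positive (+)     True positive (TP)    False positive (FP)    Test positives (TP+FP)
Negative (-)     False negative (FN)   True negative (TN)     Test negatives (FN+TN)
Total            Diseased (TP+FN)      Non-diseased (FP+TN)   Everyone (TP+FP+FN+TN)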

Diagnostic accuracy: measures (binary test)
• Sensitivity: the proportion of index-test positives among reference-standard positives
  Sensitivity = TP / (TP+FN)
• Specificity: the proportion of index-test negatives among reference-standard negatives
  Specificity = TN / (FP+TN)
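As a minimal sketch, the two definitions above can be computed directly from the four cell counts. The helper name `sens_spec` is illustrative and does not come from the original slides; the counts are those of the unbiased table in the spectrum bias example later in this deck.

```r
# Sensitivity and specificity from the cells of a 2x2 table.
# sens_spec() is an illustrative helper, not part of any package.
sens_spec <- function(tp, fp, fn, tn) {
  c(sensitivity = tp / (tp + fn),   # index test + among reference standard +
    specificity = tn / (fp + tn))   # index test - among reference standard -
}

acc <- sens_spec(tp = 670, fp = 202, fn = 74, tn = 640)
print(round(acc, 3))
```

Both measures condition on the reference standard (the columns of the table), which is why they can still be estimated when diseased and non-diseased people are sampled separately.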

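• Positive likelihood ratio (LR+): sensitivity / (1 - specificity)
  Numerator: index-test positives among reference-standard positives (sensitivity)
  Denominator: index-test positives among reference-standard negatives (1 - specificity)
How many times more likely is a positive result in people who truly have the disease than in people who truly do not?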

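• Negative likelihood ratio (LR-): (1 - sensitivity) / specificity
  Numerator: index-test negatives among reference-standard positives (1 - sensitivity)
  Denominator: index-test negatives among reference-standard negatives (specificity)
How many times more likely is a negative result in people who truly have the disease than in people who truly do not?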

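• Diagnostic odds ratio (DOR): LR+ / LR- = (TP × TN) / (FP × FN)
  Numerator: positive likelihood ratio
  Denominator: negative likelihood ratio
The larger the value, the higher the diagnostic accuracy.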
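The likelihood ratios and the diagnostic odds ratio follow mechanically from sensitivity and specificity. The sketch below is illustrative only; it reuses the unbiased counts from the spectrum bias example in this deck, and the variable names are not from the original slides.

```r
# Likelihood ratios and diagnostic odds ratio from a 2x2 table.
# Counts: the unbiased table of the spectrum bias example in this deck.
tp <- 670; fp <- 202; fn <- 74; tn <- 640

sens <- tp / (tp + fn)
spec <- tn / (fp + tn)

lr_pos <- sens / (1 - spec)   # positive likelihood ratio
lr_neg <- (1 - sens) / spec   # negative likelihood ratio
dor    <- lr_pos / lr_neg     # diagnostic odds ratio

print(round(c(LR_pos = lr_pos, LR_neg = lr_neg, DOR = dor), 2))
```

The ratio LR+ / LR- simplifies to (TP × TN) / (FP × FN), so the DOR can also be read straight off the table.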

Diagnostic accuracy: measures (binary test)
• Positive predictive value (PPV): the proportion of reference-standard positives among index-test positives
  PPV = TP / (TP+FP)
• Negative predictive value (NPV): the proportion of reference-standard negatives among index-test negatives
  NPV = TN / (FN+TN)
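Unlike sensitivity and specificity, the predictive values depend on prevalence. The following sketch shows this through Bayes' theorem. The function name `predictive_values` and the input values (sensitivity 0.90, specificity 0.76, three prevalences) are hypothetical choices made for illustration.

```r
# PPV and NPV as functions of prevalence (Bayes' theorem).
# predictive_values() is an illustrative helper; the inputs are hypothetical.
predictive_values <- function(sens, spec, prev) {
  ppv <- sens * prev / (sens * prev + (1 - spec) * (1 - prev))
  npv <- spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
  data.frame(prevalence = prev, PPV = ppv, NPV = npv)
}

pv <- predictive_values(sens = 0.90, spec = 0.76, prev = c(0.01, 0.10, 0.50))
print(round(pv, 3))
```

With the test held fixed, PPV rises and NPV falls as prevalence increases, which is why predictive values from a sample with an artificial disease proportion do not transfer to practice.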

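Diagnostic accuracy: measures (continuous test)
• ROC curve
– Area under the curve (AUC; overall diagnostic accuracy)
– Optimal cutoff
[Figure: overlapping score distributions of D+ and D- (x-axis: index test score, y-axis: frequency). Moving the cut-off (positions ①, ②, ③) changes the TP, FP, FN, and TN cells of the 2×2 table of index test result by disease status.]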
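To make the link between the cut-off and the ROC curve concrete, here is a small base-R sketch on simulated scores. Everything in it is an assumption for illustration: the normal distributions, the sample sizes, and the use of the Youden index (sensitivity + specificity - 1) as the criterion for the "optimal" cutoff.

```r
# ROC curve by sweeping the cut-off over simulated (hypothetical) scores.
set.seed(1)
score_dis <- rnorm(100, mean = 1)   # index test scores, diseased (D+)
score_non <- rnorm(100, mean = 0)   # index test scores, non-diseased (D-)

# A score at or above the cut-off counts as test positive.
cutoffs <- sort(unique(c(score_dis, score_non)))
sens <- sapply(cutoffs, function(k) mean(score_dis >= k))
spec <- sapply(cutoffs, function(k) mean(score_non <  k))

# AUC: probability that a random D+ score exceeds a random D- score
# (ties count one half).
auc <- mean(outer(score_dis, score_non, ">")) +
  0.5 * mean(outer(score_dis, score_non, "=="))

# One common choice of optimal cutoff: maximise the Youden index.
youden <- sens + spec - 1
best   <- cutoffs[which.max(youden)]

print(c(AUC = auc, best_cutoff = best))
```

Plotting `1 - spec` against `sens` (for example with `plot(1 - spec, sens, type = "s")`) draws the empirical ROC curve; each point on it corresponds to one cut-off and therefore to one 2×2 table.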

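Diagnostic accuracy analysis: features
• Formulating the research question
P  Population: the target population
I  Index test: the test under evaluation
R  Reference test: the reference standard
A  Accuracy methods: the accuracy measures examined
T  Test cutoff point: how the result categories are chosen
E  Expected use: the intended use of the test
The question can also be formulated with PICO, but PIRATE is a better fit for diagnostic accuracy studies.
http://doctorvoodoocartoons.com/diagnostic-challenge/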

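P  What is the population of interest? What is the patients' condition?
I  What is the test of interest?
R  What reference standard is used to evaluate the index test? What is currently the best available test?
A  Which accuracy measures are used (sensitivity, specificity, likelihood ratios)?
T  How does the test classify people? How is the cutoff determined?
E  What is the intended use of the index test?
"Chapter 4: Planning a systematic review of diagnostic test accuracy evidence", Synthesizing Evidence of Diagnostic Accuracy, Lippincott Williams & Wilkins, 2011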

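P  Women with suspected pregnancy
I  Double decidual sac sign (DDSS) at days 32-34
R  Transvaginal ultrasonography (TVS) at 7 weeks' gestation
A  Sensitivity, specificity, positive and negative likelihood ratios, positive and negative predictive values
T  Index test: whether the DDSS is visible on ultrasound. Reference standard: an expert's judgement based on ultrasound.
E  Triage: intrauterine pregnancy can be diagnosed accurately before the confirmatory test, so ectopic pregnancy can be ruled out efficiently.
Richardson, A., Hopkisson, J., Campbell, B., & Raine‐Fenning, N. (2016). Use of the double decidual sac sign to confirm intra‐uterine pregnancy location prior to ultrasonographic visualisation of embryonic contents: a diagnostic accuracy study. Ultrasound in Obstetrics & Gynecology.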

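• Study design
– Essentially cross-sectional. Depending on how participants are recruited, designs are classified as single-gate or two-gate.
[Figure: two 2×2 tables of index test result (+ / -) by disease status (+D / -D), each with cells TP, FP, FN, TN. The cross-sectional (single-gate) study draws one sample; the case-control (two-gate) study draws separate samples of diseased and non-diseased people.]
In case-control studies, the positive and negative predictive values (PPV, NPV) and the prevalence (apparent or true) cannot be calculated correctly, so interpret with caution the results of any study that reports them.
Kohn, M. A., Carpenter, C. R., & Newman, T. B. (2013). Understanding the direction of bias in studies of diagnostic test accuracy. Academic Emergency Medicine, 20(11), 1194-1206.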


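Study planning steps and biases
Step                                            Biases to watch for
1   Set the study objective
2   Specify the target population
3   Choose the sampling plan                    selection / spectrum bias
4   Choose the reference standard               imperfect gold standard bias, incorporation bias, treatment paradox, disease progression bias, work-up bias, differential verification bias, verification bias
5   Choose the accuracy measures                location bias
6   Specify the target population of readers
7   Choose the sampling plan for readers
8   Plan the data collection                    diagnostic review bias, test review bias, reading order bias, context bias
9   Plan the data analysis
10  Determine the sample size
How the reference standard is chosen and measured is critically important.
Obuchowski, N. A., & McClish, D. K. (2011). Statistical methods in diagnostic medicine. Wiley.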

Main biases in diagnostic accuracy studies
• Incorporation bias
• Verification bias 1 (partial verification bias)
• Verification bias 2 (differential verification bias)
• Misclassification bias (imperfect gold standard bias)
• Spectrum bias 1 (disease and non-disease)
• Spectrum bias 2 (ambiguous test results)

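Incorporation bias
Bias that arises when the index test (or some of its items) is part of the reference standard.
• If the reference standard consists of several test items: is the index test excluded from those items?
• If the reference standard is an expert's clinical assessment: was the expert blinded to the index test result?
→ Are the reference standard and the index test independent?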

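Verification bias ①: partial verification bias
People who test positive on the index test are more likely to receive the reference standard. When only those who received the reference standard are included in the study, true negatives and false negatives go missing.

True cross-tabulation (everyone tested):
Index test      Disease (+D)       No disease (-D)
Positive (+)    TP 23              FP 87
Negative (-)    FN 14 + FN' 13     TN 55 + TN' 182
Sensitivity = TP / (TP+FN+FN') = 23 / 50 = 46%
Specificity = (TN+TN') / (FP+TN+TN') = 237 / 324 = 73%

Biased cross-tabulation (unverified FN' and TN' excluded):
Index test      Disease (+D)       No disease (-D)
Positive (+)    TP 23              FP 87
Negative (-)    FN 14              TN 55
Sensitivity = TP / (TP+FN) = 23 / 37 = 62%
Specificity = TN / (FP+TN) = 55 / 142 = 39%

Direction of bias: sensitivity ↑ up, specificity ↓ down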
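The arithmetic of the partial verification example can be replayed in a few lines. This is only a sketch that reuses the counts shown on the slide; the variable names (`fn_ver`, `tn_unv`, and so on) are illustrative.

```r
# Partial verification bias: counts from the slide.
tp <- 23; fp <- 87             # index test positive, all verified
fn_ver <- 14; tn_ver <- 55     # index test negative, verified
fn_unv <- 13; tn_unv <- 182    # index test negative, not verified

# Everyone tested (what the study should have estimated).
true_sens <- tp / (tp + fn_ver + fn_unv)
true_spec <- (tn_ver + tn_unv) / (fp + tn_ver + tn_unv)

# Verified participants only (what the biased study reports).
biased_sens <- tp / (tp + fn_ver)
biased_spec <- tn_ver / (fp + tn_ver)

print(round(100 * c(true_sens = true_sens, biased_sens = biased_sens,
                    true_spec = true_spec, biased_spec = biased_spec)))
```

Dropping unverified test-negatives removes false negatives from the sensitivity denominator and true negatives from the specificity numerator, which produces the opposite directions of bias.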

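Critical appraisal check (partial verification bias)
When inclusion in the study depends on the result of a single reference standard:
is the decision to perform the reference standard independent of the index test result?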

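Verification bias ②: differential verification bias
When there is a long interval between the index test and the reference standard, some people who were false negative on the index test recover in the meantime, so the false negative count falls and the true negative count rises.

True cross-tabulation:
Index test      Disease (+D)        No disease (-D)
Positive (+)    TP 311              FP 336
Negative (-)    FN' 5 + 6 = 11      TN' 300
Sensitivity = TP / (TP+FN') = 311 / 322 = 96.6%
Specificity = TN' / (FP+TN') = 300 / 636 = 47.2%

Biased cross-tabulation (6 false negatives recover and are counted as true negatives):
Index test      Disease (+D)        No disease (-D)
Positive (+)    TP 311              FP 336
Negative (-)    FN 5                TN 300 + 6 = 306
Sensitivity = TP / (TP+FN) = 311 / 316 = 98.4%
Specificity = TN / (FP+TN) = 306 / 642 = 47.7%

Direction of bias: sensitivity ↑ up, specificity ↑ up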

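Critical appraisal check (differential verification bias)
Did most index-test positives receive the reference standard immediately, while index-test negatives were followed up instead?
If so, did follow-up give the same result as an immediately performed reference standard would have?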

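Misclassification bias: imperfect gold standard bias
Bias that arises because the reference standard classifies people inaccurately.
Cx(+) / Cx(-): classified positive / negative by the imperfect reference standard. A prime (') marks truly diseased people whom the reference standard classified as negative.

Cross-tabulation against true disease status:
Index test      Disease (+D)                             No disease (-D)
Positive (+)    TP Cx(+) 5 + TP' Cx(-) 5 = 10            FP 0
Negative (-)    FN Cx(+) 3 + FN' Cx(-) 14 = 17           TN 185
Sensitivity = (TP+TP') / (TP+TP'+FN+FN') = 10 / 27 = 37%
Specificity = TN / (FP+TN) = 185 / 185 = 100%

Cross-tabulation against the imperfect reference standard:
Index test      Reference (+)      Reference (-)
Positive (+)    5                  5
Negative (-)    3                  14 + 185 = 199
Sensitivity = 5 / 8 = 62.5%
Specificity = 199 / 204 = 97.5%

Direction of bias (in this example): sensitivity ↑ up, specificity ↓ down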
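One way to see where the two sets of estimates come from is to keep the index test result, the imperfect reference result, and the true status side by side, then score the index test against each in turn. The sketch below does this with the counts from the slide; the data frame layout and the helper `accuracy_against` are illustrative assumptions.

```r
# Imperfect gold standard bias: one row per combination of index test result,
# imperfect reference result, and true disease status (counts from the slide).
d <- data.frame(
  index = c("+", "+", "-", "-", "-"),
  ref   = c("+", "-", "+", "-", "-"),   # imperfect reference standard
  truth = c("+", "+", "+", "+", "-"),   # true disease status
  n     = c(5, 5, 3, 14, 185)
)

# Illustrative helper: accuracy of the index test against a chosen standard.
accuracy_against <- function(d, standard) {
  pos <- d[[standard]] == "+"
  c(sensitivity = sum(d$n[pos & d$index == "+"]) / sum(d$n[pos]),
    specificity = sum(d$n[!pos & d$index == "-"]) / sum(d$n[!pos]))
}

vs_truth <- accuracy_against(d, "truth")
vs_ref   <- accuracy_against(d, "ref")
print(round(100 * rbind(vs_truth, vs_ref), 1))
```

Here the reference standard misses many truly diseased people, so index-test false negatives are relabelled as true negatives and index-test true positives as false positives.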

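Critical appraisal check (imperfect gold standard bias)
Does the reference standard always classify the target condition correctly?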

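Spectrum bias: disease and non-disease
Reference-standard negatives with mild symptoms are excluded (only extremely healthy people are kept as Negative (-)).

True cross-tabulation:
Index test      Disease (+D)     No disease (-D)
Positive (+)    TP 670           FP 142 + FP' 60
Negative (-)    FN 74            TN 628 + TN' 12
Sensitivity = TP / (TP+FN) = 670 / 744 = 90%
Specificity = (TN'+TN) / (FP'+FP+TN'+TN) = 640 / 842 = 76%

Biased cross-tabulation (FP' and TN' excluded):
Index test      Disease (+D)     No disease (-D)
Positive (+)    TP 670           FP 142
Negative (-)    FN 74            TN 628
Sensitivity = TP / (TP+FN) = 670 / 744 = 90%
Specificity = TN / (FP+TN) = 628 / 770 = 82%

Direction of bias (in this example): specificity ↑ up (sensitivity unchanged)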

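Critical appraisal check (spectrum bias: disease and non-disease)
• Were D+ and D- sampled separately?
• Is the spectrum of D+ appropriate? Are moderate cases included in D+?
• Is the spectrum of D- appropriate? Was sampling broad enough to include people in whom D+ is suspected?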

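Spectrum bias: removing ambiguous test results
Ambiguous index test results are excluded.

True cross-tabulation (high and intermediate probability count as positive):
Index test result             Disease (+D)     No disease (-D)
High probability              TP 89            FP 13
Intermediate probability      TP' 47           FP' 105
Low and very low probability  FN' 30           TN' 474
Normal                        FN 2             TN 150
Sensitivity = (TP+TP') / (TP+TP'+FN+FN') = 136 / 168 = 81%
Specificity = (TN'+TN) / (FP'+FP+TN'+TN) = 624 / 742 = 84%

Biased cross-tabulation (intermediate and low / very low probability excluded):
Index test result             Disease (+D)     No disease (-D)
High probability              TP 89            FP 13
Normal                        FN 2             TN 150
Sensitivity = TP / (TP+FN) = 89 / 91 = 98%
Specificity = TN / (FP+TN) = 150 / 163 = 92%

Direction of bias: sensitivity ↑ up, specificity ↑ up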
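The effect of discarding ambiguous categories can be checked by computing accuracy once on all four result categories and once on the two clear-cut ones. This is a sketch using the slide's counts; treating "high" and "intermediate" as test positive follows the formulas on the slide, and the helper `acc` is illustrative.

```r
# Removing ambiguous test results: counts per result category (from the slide).
res <- data.frame(
  category = c("high", "intermediate", "low", "normal"),
  positive = c(TRUE, TRUE, FALSE, FALSE),   # counted as index test positive
  dis      = c(89, 47, 30, 2),              # disease (+D)
  non      = c(13, 105, 474, 150)           # no disease (-D)
)

# Illustrative helper: sensitivity and specificity over the rows supplied.
acc <- function(x) {
  c(sensitivity = sum(x$dis[x$positive]) / sum(x$dis),
    specificity = sum(x$non[!x$positive]) / sum(x$non))
}

all_results <- acc(res)
clear_only  <- acc(res[res$category %in% c("high", "normal"), ])
print(round(100 * rbind(all_results, clear_only)))
```

Keeping only the extremes of the result scale makes the test look better on both measures, because the cases that are hardest to classify never enter the table.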

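Critical appraisal check (spectrum bias: ambiguous test results)
Are people with ambiguous index test results (moderate or mild cases) included in the study?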

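QUADAS-2
• Domains for assessing the risk of bias in diagnostic accuracy studies
A  Patient selection (STARD 2015 items 5, 6, 7, 9)
B  Index test (STARD 2015 items 12a, 13a)
C  Reference standard (STARD 2015 items 12b, 13b)
D  Flow and timing (STARD 2015 items 8, 22)

QUADAS-2
A. Patient selection (STARD 2015 items 5, 6, 7, 9)
– Were participants sampled consecutively or at random?
– Was a case-control design avoided?
– Were inappropriate exclusions avoided?

QUADAS-2
B. Index test (STARD 2015 items 12a, 13a)
– Were the index test results interpreted blind to the reference standard results?
– If a threshold was used, was it pre-specified?
C. Reference standard (STARD 2015 items 12b, 13b)
– Were the reference standard results interpreted blind to the index test results?
– Can the reference standard be assumed to classify the target condition correctly?

QUADAS-2
D. Flow and timing (STARD 2015 items 8, 22)
– Was the interval between the index test and the reference standard appropriate?
– Did all participants receive the reference standard?
– Were all participants classified with the same reference standard?
– Were all participants included in the analysis?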

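Risk of bias: summary
• In diagnostic accuracy research, study design is everything.
• Understand the sources and directions of bias, and plan the study in advance with countermeasures in mind.
• The interest lies in the accuracy of the new test, but raising study quality requires attention to the quality of the reference standard, a broad selection of participants, and the timing of the measurements.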


How to write up a diagnostic accuracy analysis
Guidelines for diagnostic accuracy studies
• Survey of reporting quality
Korevaar, D. A., van Enst, W. A., Spijker, R., Bossuyt, P. M., & Hooft, L. (2014). Reporting quality of diagnostic accuracy studies: a systematic review and meta-analysis of investigations on adherence to STARD. Evidence Based Medicine, 19(2), 47-54.
• Reporting guideline: STARD 2015
Bossuyt, P. M., Reitsma, J. B., Bruns, D. E., Gatsonis, C. A., Glasziou, P. P., Irwig, L., ... & Kressel, H. Y. (2015). STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. Radiology, 277(3), 826-832.
The key is to identify the points where bias can enter and to explain and describe each of them.

STARD 2003 and QUADAS-2
• Explanations of STARD 2003 and QUADAS-2
http://www.slideshare.net/shinjiyamagata/v14-17600186
http://www.slideshare.net/YoshihikoKunisato/ss-40713224?qid=45212543-1b92-4b13-9958-391e479aee76&v=&b=&from_search=3

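STARD 2003 → 2015
• The STARD reporting standard was revised in 2015.
Main revisions and additions
Section         Revision or addition
Title           Structured abstract based on STARD for Abstracts
Introduction    Intended use of the index test; study hypotheses
Methods         Whether positivity cutoffs or categories were pre-specified or exploratory; pre-specification of variability analyses; sample size determination
Results         Flow diagram of recruitment through to the analysed sample
Discussion      Discussion of potential biases; clinical implications
Other           Study registration, protocol availability, sources of funding
Korevaar et al. Research Integrity and Peer Review (2016) 1:7. DOI 10.1186/s41073-016-0014-7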

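Title and abstract (STARD items 1, 2)
• Make clear that the article is a diagnostic accuracy study by using at least one accuracy measure (sensitivity, specificity, predictive values, or AUC).
• Write a structured abstract following STARD for Abstracts.
Note: STARD for Abstracts has not been published yet, but a draft can be seen in the following document.
Bossuyt P. Draft STARD for abstracts (personal communication). 2016.
https://www.ruor.uottawa.ca/bitstream/10393/34253/1/Protocol%20for%20Registration%201-3-2016.pdf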

STARD for Abstracts (draft)
Structure of the structured abstract
Title
Background and objectives
Study design
Participants
Test methods
Flow diagram
Test results
Discussion
Bossuyt P. Draft STARD for abstracts (personal communication). 2016.

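Introduction (STARD item 3)
• State the study objectives and hypotheses clearly, and specify the intended use of the index test (its clinical usefulness).
[Figure: testing pathways. Current practice: screening test followed by the existing test. Three possible roles for a new test are shown alongside it: replacement, triage, and add-on.]
Also state the advantages of the index test (faster? cheaper? more accurate?).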

Introduction (STARD item 3)
• Example: intended use (triage)
A gestation sac is the first ultrasonographic sign of an intrauterine pregnancy
(IUP). It appears as a uniformly round, hypoechoic structure with an
echogenic rim. Initially it does not contain any internal echoes and can
therefore be difficult to differentiate from a ‘pseudosac’, that is, an
endometrial fluid collection that occurs in up to 15% of ectopic pregnancies
(EPs) (1). It is clinically important not to confuse these two structures and
hence several different ultrasonographic signs have been proposed to help
differentiate between them prior to visualisation of any embryonic contents.
The double decidual sac sign (DDSS) is one such sign.
• Example: hypothesis
our hypothesis being that all intrauterine fluid collections that
exhibit the DDSS represent a true gestation sac.
Richardson, A., Hopkisson, J., Campbell, B., & Raine‐Fenning, N. (2016). Use of the double decidual sac sign to confirm intra‐uterine pregnancy location prior to ultrasonographic visualisation of embryonic contents: a diagnostic accuracy study. Ultrasound in Obstetrics & Gynecology.

STARD 2015: Methods and Results sections
• Participants (QUADAS-2: A, D)
• Test methods (QUADAS-2: B, C, D)
• Analysis
• Other

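Participants
• Methods section
– Details of the eligibility criteria
– Recruitment flow (setting, location, dates)
– Composition of the participant sample
• Results section
– Participant flow diagram
– Baseline demographic and clinical characteristics
– Interval between the index test and the reference standard, and any clinical interventions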

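Participants
• Use a diagram to show clearly the flow by which the analysed sample was selected.
– Potentially eligible participants
– Eligible participants
– Number who received the index test
– Number who received the reference standard
– Final analysed sample
Report the number excluded at each phase and the reasons, so that the flow is clear.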

Participants
• Results section
– Participant flow diagram
– Example
Richardson, A., Hopkisson, J., Campbell, B., & Raine‐Fenning, N. (2016). Use of the double decidual sac sign to confirm
intra‐uterine pregnancy location prior to ultrasonographic visualisation of embryonic contents: a diagnostic accuracy
study. Ultrasound in Obstetrics & Gynecology.

Participants
• Results section
– Participant flow diagram
• Example
Between 1st January and 31st October 2015, 620 IVF/ICSI cycles were undertaken
within the unit. Of these, 124 (20%) women agreed to participate in the study. In
addition to these, a further six women were approached by one of the authors at
the time of embryo transfer and declined to participate in the study due to
various reasons, namely work commitments (n=3), reluctance to have a TVS
(n=2) and distance to travel to the clinic (n=1). 45 (36.3%) of the 124 women
were subsequently excluded as they had a negative urinary pregnancy test. Of
the 79 women who had a positive pregnancy test, two (2.53%) did not attend
for the index test and nine (11.39%) of those that did attend did not have an
intrauterine fluid collection present on TVS and were therefore excluded. 77
intrauterine fluid collections were observed in the remaining 68 women (nine of
the women had two intrauterine fluid collections detected).

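Participants
• Methods section
– Criteria for identifying potentially eligible participants
• Symptoms
• Results of previous tests
• Registry
– Where and when were potentially eligible participants identified?
• Setting, location, dates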

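Participants
• Methods section
– Study design
– Criteria for identifying potentially eligible participants
– Where and when potentially eligible participants were identified
• Example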
Participants were recruited prospectively from Nurture Fertility,
Nottingham, United Kingdom between 1st January and 31st October
2015. Women were aged between 18 and 45 years of age and had
undergone IVF/ICSI treatment using a standard long agonist or antagonist
protocol depending on ovarian reserve tests as previously described (13).
The study was well advertised within the IVF unit using posters and
patient information leaflets. Whenever possible, one of the authors (AR)
was also present to discuss the study with women following their embryo
transfer procedure. All women were invited to participate in the study.

Participants
• Methods section
– Details of the inclusion (or exclusion) criteria
• Example
Women were excluded from the study if they had a negative urinary
pregnancy test (performed 18 days after oocyte retrieval in a fresh cycle or
13-16 days after embryo transfer in a frozen embryo replacement cycle
depending on the stage of embryo development at the time of transfer) or if,
at the time of the index test, there was either no ultrasonographic evidence
of an intrauterine fluid collection, or a yolk sac and/or fetal pole was clearly
visible within the intrauterine fluid collection. Women were also excluded if
no outcome data were available or if, following the reference standard, the
final diagnosis was not known (for example resolving or persistent
pregnancies of unknown location).

Participants
• Results section
– State the interval between the index test and the reference standard, and any clinical interventions carried out in between.
• Example
Index test:
If the urinary pregnancy test was positive, an early ultrasound scan was
scheduled for either 19 or 20 days after oocyte retrieval corresponding to a
gestational age of 33 or 34 days. This range was specifically chosen to
optimize the chances of a gestation sac being present but a yolk sac or fetal
pole being absent (14, 15).
Reference standard:
All women were scheduled to have a routine viability ultrasound scan at
between 6 and 7 weeks gestation (between 8 and 16 days after the index
test) as per the fertility unit’s standard practice.
Use this information to check for verification bias ② (differential verification bias).
Note: no clinical intervention took place in this study.

Participants
• Results section
– Baseline demographic and clinical characteristics
• Example (Table 1)
The baseline characteristics of
study participants are illustrated in
Table 1 (values refer to mean
±standard deviation). These were
not significantly different from
the baseline characteristics of the
general population attending the
IVF unit during the same time
period.

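Test methods
• Methods section (index test and reference standard alike)
– Details of the tests (enough to allow replication)
– Definition and rationale of positivity cutoffs and result categories
– Blinding of the tests
– Rationale for choosing the reference standard (reference standard only)
• Results section
– Cross-tabulation
– Accuracy estimates and their precision (confidence intervals)
– Adverse events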

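Test methods
• Methods section
– Details of the (index and reference) tests
• Describe the test materials, the specifications and use of the equipment, and the specific measurement procedure in enough detail to allow replication (including references).
• Definition and rationale of positivity cutoffs and result categories
– Was the cutoff pre-specified, or were the classification criteria adjusted post hoc to make accuracy look higher?
• Rationale for choosing the reference standard
Always give enough information, including references, for readers to replicate the tests.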

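Test methods
• Methods section
– Blinding of the tests: examples (these are risk-of-bias assessment items)
Reference standard:
Interpretation of the reference standard was performed by
an experienced gynaecologist without knowledge of the
findings from the index test.
Index test:
The findings from the early scan were interpreted
immediately and recorded separate to the main clinical
notes.
Note: the index test was performed before the reference standard, so the reference standard result could not have been known.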

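Test methods
• Results section
– Cross-tabulation
• Example
Of the six intrauterine fluid collections that did not display
the DDSS, four were subsequently proven to have an IUP and
two were found to have an EP (Table 2).
Although this study does not report it, the number of people whose diagnosis could not be established by the reference standard or the index test also matters for judging test performance and should be reported.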

Test methods
• Results section
– Accuracy estimates and their precision (confidence intervals)
• Example (binary test)
The DDSS therefore has a sensitivity of 93.9% (95% CI 85.0%-98.3%), specificity of 100% (95% CI
15.8%-100%) and overall diagnostic accuracy of 94.0% (95% CI 88.3%-99.7%) for predicting an
IUP. The positive and negative predictive values are 100% (95% CI 94.1%-100%) and 33.3% (95%
CI 4.3%-77.7%) respectively whilst the positive likelihood ratio was infinite and the negative
likelihood ratio was 0.06 (95% CI 0.02-0.16).
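Confidence intervals like those in the example can be obtained in base R with `binom.test()`, which returns an exact (Clopper-Pearson) interval. The quoted passage does not say which method the authors used, so the choice of an exact interval is an assumption. The counts come from the cross-tabulation example quoted earlier (six collections without the DDSS: four IUP, two EP), together with the reported specificity of 100%, which implies no false positives.

```r
# Exact (Clopper-Pearson) 95% confidence intervals for two of the proportions.
# Assumption: the exact method is used; the source does not name its method.
tn <- 2   # DDSS absent, ectopic pregnancy (true negatives)
fn <- 4   # DDSS absent, intrauterine pregnancy (false negatives)
fp <- 0   # implied by the reported specificity of 100%

ci_spec <- binom.test(tn, tn + fp, conf.level = 0.95)$conf.int   # specificity 2/2
ci_npv  <- binom.test(tn, tn + fn, conf.level = 0.95)$conf.int   # NPV 2/6

print(round(100 * as.numeric(ci_spec), 1))
print(round(100 * as.numeric(ci_npv), 1))
```

With only two non-diseased cases the interval for specificity is extremely wide, which illustrates why the precision of each estimate should be reported alongside the point estimate.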

Test methods
• Results section
– Adverse events
• Report any adverse events caused by performing the (index or reference) tests.
• Example
No adverse events from performing the index
test or reference standard were reported.

Analysis
• Methods section
– Analysis of variability
– Sample size determination
– Handling of missing data

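Analysis
• Examining variability
• Report the analyses of variability in diagnostic accuracy that were performed, distinguishing pre-specified analyses from exploratory ones (a pre-specified analysis plan is preferable).
• Subgroup analyses, etc.
Does diagnostic accuracy vary with particular factors?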

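Analysis
• Sources of variability
1. Patient covariates
   Demographics, type of symptoms, comorbidity, study site, etc.
2. Factors related to the target condition
   Severity, region, etc.
3. Factors related to the test device or modality
   Change in accuracy as the equipment ages, etc.
4. Reader factors
   Level of expertise, etc.
Obuchowski, N. A., & McClish, D. K. (2011). Statistical methods in diagnostic medicine. Wiley.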

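Analysis
• Sample size determination
– Describe concretely how the sample size was justified.
• Among diagnostic accuracy studies of depression, only 3% state how the sample size was determined.
• Among diagnostic accuracy studies of depression, only 8% had a confidence interval for sensitivity of 10% or narrower, and in 62% the 95% confidence interval was 21% or wider.
Sample size planning should consider not only the point estimate of accuracy but also its precision (confidence interval width).
Thombs, B. D., & Rice, D. B. (2016). Sample sizes and precision of estimates of sensitivity and specificity from primary studies on the diagnostic accuracy of depression screening tools: a survey of recently published studies. International journal of methods in psychiatric research, 25(2), 145-152.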

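Analysis
• Sample size methods
– Accuracy of a single test
• Sensitivity and specificity of a binary test
• ROC of a continuous test
– Comparison of the accuracy of two tests
• Sensitivity and specificity of a binary test
• ROC of a continuous test
Hajian-Tilaki, K. (2014). Sample size estimation in diagnostic test studies of biomedical informatics. Journal of biomedical informatics, 48, 193-204.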

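Analysis
• Sample size: sensitivity and specificity of a binary test

$$ n_{Se} = \frac{Z_{\alpha/2}^{2}\, P(1-P)}{d^{2} \times Prev}, \qquad n_{Sp} = \frac{Z_{\alpha/2}^{2}\, P(1-P)}{d^{2} \times (1-Prev)} $$

P = anticipated sensitivity (for n_Se) or specificity (for n_Sp)
Z_{α/2} = 1.96 (α = 0.05)
d = precision (margin of error)
Prev = prevalence

With significance level = 0.05, sensitivity = 0.90, specificity = 0.70, precision = 0.07, and prevalence = 0.10, the required sample sizes are:

Sensitivity:
$$ \frac{1.96^{2} \times 0.90 \times 0.10}{0.07^{2} \times 0.10} \approx 705.6 \;\Rightarrow\; 706 $$
Specificity:
$$ \frac{1.96^{2} \times 0.70 \times 0.30}{0.07^{2} \times (1-0.10)} \approx 182.9 \;\Rightarrow\; 183 $$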
感度特異度

Analysis
• Sample size: sensitivity and specificity of a binary test
– Example
Our sample size calculation was based on the following formula as described by
Karimollah16. …(中略)…As for our study, the predetermined values of sensitivity
and specificity were 99% and 98%, respectively, Zα/2=1.96, and the margin of
error (d) was set as ±5%, which yielded results that would be accurate to
within ±5 percentage points. Based on the formula, the sample sizes for
sensitivity and specificity were 15 and 30, respectively. Subsequently, the overall
sample sizes for sensitivity and specificity were calculated using the following
formulae, respectively:…(中略)… Prev denotes the prevalence of disease in the
population. The prevalence of disease in the population was 40% in our present
study, and thus the overall sample sizes calculated based on sensitivity and
specificity were 38.0 and 50.2, respectively. The maximum total number of
participants based on sensitivity and specificity was 50.2, and thus a sample size
of 51 was finally selected in our study.
Gao, J., Wu, H., Wang, L., Zhang, H., Duan, H., Lu, J., & Liang, Z. (2016). Validation of targeted next-generation sequencing
for RAS mutation detection in FFPE colorectal cancer tissues: comparison with Sanger sequencing and ARMS-Scorpion
real-time PCR. BMJ open, 6(1), e009532.

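Analysis
• Sample size: continuous test, ROC AUC

$$ n = \frac{Z_{\alpha/2}^{2}\, V(AUC)}{d^{2}} $$
$$ V(AUC) = \left(0.0099 \times e^{-a^{2}/2}\right) \times \left(6a^{2} + 16\right) $$
$$ a = \varphi^{-1}(AUC) \times 1.414 $$

where φ^{-1} is the inverse of the standard normal cumulative distribution function.

For AUC = 0.70 and precision d = 0.07, the required sample size is:

$$ a = \varphi^{-1}(0.70) \times 1.414 = 0.741502 $$
$$ V(AUC) = \left(0.0099 \times e^{-0.741502^{2}/2}\right) \times \left(6 \times 0.741502^{2} + 16\right) = 0.145136 $$
$$ n = \frac{1.96^{2} \times 0.145136}{0.07^{2}} = 114 $$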

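Analysis
• Sample size: binary test, comparison of two tests

$$ n = \frac{\left[ Z_{\alpha/2}\sqrt{2\bar{P}(1-\bar{P})} + Z_{\beta}\sqrt{P_{1}(1-P_{1}) + P_{2}(1-P_{2})} \right]^{2}}{(P_{1}-P_{2})^{2}} $$

\bar{P} = mean of the two tests' sensitivities (or specificities)
P_1 = sensitivity (or specificity) of one test
P_2 = sensitivity (or specificity) of the other test
Z_{α/2} = 1.96 (α = 0.05), Z_β = 0.84 (power = 0.80)

Example with P_1 = 0.70 and P_2 = 0.80 (so \bar{P} = 0.75):

$$ \frac{\left[ 1.96\sqrt{2 \times 0.75 \times (1-0.75)} + 0.84\sqrt{0.70(1-0.70) + 0.80(1-0.80)} \right]^{2}}{(0.10)^{2}} = 293 $$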

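Analysis
• Sample size: continuous test, ROC AUC, comparison of two tests

$$ n = \frac{\left[ Z_{\alpha/2}\sqrt{2V_{H_0}(AUC)} + Z_{\beta}\sqrt{V(AUC_{1}) + V(AUC_{2})} \right]^{2}}{(AUC_{1} - AUC_{2})^{2}} $$

The AUC in V_{H_0}(AUC) is the mean of the AUCs of the two tests being compared.
V(AUC_1), V(AUC_2), and V_{H_0}(AUC) are obtained in the same way as for a single test.

Required sample size to detect a difference of AUC_2 - AUC_1 = 0.10 from AUC_1 = 0.70 with power 0.80 at a two-sided significance level of 0.05:

$$ n = \frac{\left[ 1.96\sqrt{2 \times 0.1348} + 0.84\sqrt{0.14514 + 0.11946} \right]^{2}}{(0.80 - 0.70)^{2}} = 211 $$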

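Analysis
• Missing data
– It is important to report the reasons for missing data and their proportion.
Handling missing data
– Verification bias
• Correction by the Begg-Greenes (BG) method
• Multiple imputation
– Differential verification bias
• Bayesian methods
Methods tailored to each type of missingness have been developed, but they are not yet widely used.
de Groot, J. A., Bossuyt, P. M., Reitsma, J. B., Rutjes, A. W., Dendukuri, N., Janssen, K. J., & Moons, K. G. (2011). Verification problems in diagnostic accuracy studies: consequences and solutions. BMJ, 343, d4770.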
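As a rough illustration of the Begg-Greenes idea, the sketch below estimates disease probability within each index test result from the verified participants, then applies those probabilities to everyone tested. It assumes that, given the index test result, verification is unrelated to true disease status. The function name `begg_greenes` and all counts are hypothetical and do not come from the slides.

```r
# Begg-Greenes style correction for partial verification (illustrative sketch).
# Assumption: within each index test result, verified and unverified people
# have the same probability of disease.
begg_greenes <- function(n_pos, n_neg,
                         ver_pos_dis, ver_pos_non,
                         ver_neg_dis, ver_neg_non) {
  p_dis_pos <- ver_pos_dis / (ver_pos_dis + ver_pos_non)   # P(D+ | T+)
  p_dis_neg <- ver_neg_dis / (ver_neg_dis + ver_neg_non)   # P(D+ | T-)
  # Expected cells of the full 2x2 table for everyone tested.
  tp <- n_pos * p_dis_pos; fp <- n_pos * (1 - p_dis_pos)
  fn <- n_neg * p_dis_neg; tn <- n_neg * (1 - p_dis_neg)
  c(sensitivity = tp / (tp + fn), specificity = tn / (fp + tn))
}

# Hypothetical study: 100 test positives (all verified: 80 D+, 20 D-) and
# 400 test negatives, of whom only 100 were verified (10 D+, 90 D-).
naive <- c(sensitivity = 80 / (80 + 10), specificity = 90 / (20 + 90))
bg    <- begg_greenes(n_pos = 100, n_neg = 400,
                      ver_pos_dis = 80, ver_pos_non = 20,
                      ver_neg_dis = 10, ver_neg_non = 90)
print(round(100 * rbind(naive, bg), 1))
```

The correction only helps when its assumption holds. If unverified test-negatives differ systematically from verified ones, the corrected estimates remain biased.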

Study registration and protocol publication
• Only about 15% of diagnostic accuracy studies are prospectively registered [1].
• Diagnostic accuracy studies with favourable results are published sooner [2].
Prospective registration and protocol publication are essential for diagnostic accuracy studies too.
[1] Korevaar DA, Bossuyt PM, Hooft L. Infrequent and incomplete registration of test accuracy studies: analysis of recent study reports. BMJ Open. 2014;4(1):e004596.
[2] Korevaar, D. A., van Es, N., Zwinderman, A. H., Cohen, J. F., & Bossuyt, P. M. (2016). Time to publication among completed diagnostic accuracy studies: associated with reported accuracy estimates. BMC medical research methodology, 16(1), 1.

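Important additions in STARD 2015
• Registration number and name of registry (item 28)
• Where the full study protocol can be accessed (item 29)
• Sources of funding (item 30)
Example: Methods section and end of the article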
The study was registered with www.clinicaltrials.gov
(NCT02700789) and conducted following STARD guidelines
(12). The full study protocol can be accessed by contacting
the corresponding author.
FUNDING
University of Nottingham and Nurture Fertility

Protocol publication
• Publish the study protocol as a paper.
• Cite it in the article.
Ethical approval was given by the
National Research Ethics Service
Committee North-West (Cheshire)
on February 19, 2013
(13/NW/0010; 118638), and the
study protocol was published
(Macey et al. 2013).
Macey, R., Glenny, A., Walsh, T., Tickle, M., Worthington, H., Ashley, J., & Brocklehurst, P. (2015). The efficacy of screening
for common dental diseases by hygiene-therapists a diagnostic test accuracy study. Journal of dental research,
0022034514567335.

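Take Home Message
• Use STARD 2015 as a guide to planning transparent studies.
• Diagnostic accuracy measures are descriptive and are directly affected by who is measured, so study design is extremely important.
• Design the study with the risk of bias in mind.
• Prospective registration and sample size determination are essential not only for RCTs but also for diagnostic accuracy studies.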

Analysis code
• Cross-tabulation and diagnostic accuracy
• Plotting ROC curves
• Sample size determination

Accuracy measures from a cross-tabulation
• epiR
# Cross-tabulation: rows are index test results, columns are disease status
dat <- as.table(matrix(c(670, 202, 74, 640), nrow = 2, byrow = TRUE))
colnames(dat) <- c("Dis+", "Dis-")
rownames(dat) <- c("Test+", "Test-")
# Estimate diagnostic accuracy with epiR
library(epiR)
rval <- epi.tests(dat, conf.level = 0.95)
print(rval)

ROC curve
• pROC
library(pROC)
data(aSAH)
# Empirical ROC curve with AUC and its confidence interval
roc1 <- roc(outcome ~ s100b, aSAH,
            plot = TRUE, ci = TRUE, print.auc = TRUE, grid = TRUE,
            show.thres = TRUE, auc.polygon = TRUE)
# Smoothed ROC curve
roc2 <- roc(outcome ~ s100b, aSAH,
            plot = TRUE, ci = TRUE, print.auc = TRUE, smooth = TRUE,
            grid = TRUE, show.thres = TRUE, auc.polygon = TRUE)
[Figure: two ROC curves (x-axis: specificity, y-axis: sensitivity), one annotated "AUC: 0.740 (0.630...0.824)" and the other "AUC: 0.731 (0.630...0.833)".]
# Optimal cutoff
> coords(roc1, "best", ret = c("threshold", "specificity", "1-npv"))
threshold specificity 1-npv
0.2050000 0.8055556 0.2054795

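Sample size determination
• Binary test: sensitivity and specificity
# Function
# sn, sp = anticipated sensitivity and specificity, w = precision (margin of error),
# p = prevalence, sig.level = significance level
precision.sn.sp <- function(sn = NULL, sp = NULL, w = NULL, p = NULL, sig.level = .05) {
  z.value <- qnorm(sig.level / 2, lower.tail = FALSE)
  # Total sample size for sensitivity: diseased cases needed, divided by prevalence
  tp.fn <- z.value^2 * (sn * (1 - sn))
  n.sn <- ceiling(tp.fn / ((w^2) * p))
  # Total sample size for specificity: non-diseased cases needed, divided by 1 - prevalence
  fp.tn <- z.value^2 * (sp * (1 - sp))
  n.sp <- ceiling(fp.tn / ((w^2) * (1 - p)))
  return(list(n.sensitivity = n.sn, n.specificity = n.sp))
}
# Run (prevalence = 0.10)
precision.sn.sp(sn = .90, sp = .70, w = .07, p = .10)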

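Sample size determination
• Binary test: sensitivity or specificity, comparison of two tests
# Function
# p1, p2 = sensitivity (or specificity) of the two tests,
# beta = desired power (0.80 gives a z value of about 0.84)
precision.comp <- function(p1 = NULL, p2 = NULL, sig.level = .05, beta = 0.80) {
  z.alpha <- qnorm(sig.level / 2, lower.tail = FALSE)
  z.beta <- qnorm(beta)
  mean.p <- (p1 + p2) / 2
  d.p <- (p1 - p2)^2
  tp.fn <- (z.alpha * sqrt(2 * mean.p * (1 - mean.p)) +
              z.beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2
  n.p <- ceiling(tp.fn / d.p)
  return(list(n.compare = n.p))
}
# Run
precision.comp(p1 = .70, p2 = .90)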

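Sample size determination
• Continuous test: AUC
# Function
# auc = anticipated AUC, w = precision (margin of error)
precision.auc <- function(auc = NULL, w = NULL, sig.level = 0.05) {
  # Use sig.level rather than a hard-coded 0.05
  z.value <- qnorm(sig.level / 2, lower.tail = FALSE)
  a <- qnorm(auc) * 1.414
  v_auc <- (0.0099 * exp((-a^2) / 2)) * (6 * a^2 + 16)
  n.auc <- ceiling(((z.value^2) * v_auc) / w^2)
  return(list(n.auc = n.auc))
}
# Run
precision.auc(auc = .70, w = 0.07)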

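Sample size determination
• Continuous test: AUC, comparison of two tests
# Function
# auc1, auc2 = anticipated AUCs of the two tests, beta = desired power
precision.auc.comp <- function(auc1 = NULL, auc2 = NULL, sig.level = 0.05, beta = 0.80) {
  z.alpha <- qnorm(sig.level / 2, lower.tail = FALSE)
  z.beta <- qnorm(beta)
  # Variance function of the AUC for each test
  a1 <- qnorm(auc1) * 1.414
  v_auc1 <- (0.0099 * exp((-a1^2) / 2)) * (6 * a1^2 + 16)
  a2 <- qnorm(auc2) * 1.414
  v_auc2 <- (0.0099 * exp((-a2^2) / 2)) * (6 * a2^2 + 16)
  # Variance function under H0, evaluated at the mean of the two AUCs
  a_H0 <- qnorm(mean(c(auc2, auc1))) * 1.414
  v_auc_H0 <- (0.0099 * exp((-a_H0^2) / 2)) * (6 * a_H0^2 + 16)
  n.auc.comp <- ceiling((z.alpha * sqrt(2 * v_auc_H0) +
                           z.beta * sqrt(v_auc1 + v_auc2))^2 / (auc2 - auc1)^2)
  return(list(n.auc.compare = n.auc.comp))
}
# Run
precision.auc.comp(auc1 = 0.50, auc2 = 0.80)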

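Further reading
• Includes a Japanese translation of the STARD 2003 explanatory paper.
中山健夫, & 津谷喜一郎. (2008). 臨床研究と疫学研究のための国際ルール集 [International rules for clinical and epidemiological research].
• Explains diagnostic accuracy study design and bias in Japanese.
Hulley, S. B., et al. (2004). 医学的研究のデザイン [Designing clinical research] (木原雅子 & 木原正博, Trans.).
• Introduction to diagnostic accuracy research
– Knottnerus, J. A., & Buntinx, F. (2009). The evidence base of clinical diagnosis. Theory and methods of diagnostic research.
• Overview of diagnostic accuracy analysis: study planning and statistical methods (strong on statistical methods)
– Zhou, X. H., McClish, D. K., & Obuchowski, N. A. (2009). Statistical methods in diagnostic medicine (Vol. 569). John Wiley & Sons.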
