# References

- Abu-Arafeh A, Jordan H, Drummond G (2016) Reporting of method comparison studies: a review of advice, an assessment of current practice, and specific suggestions for future reports. British Journal of Anaesthesia 117:569-575.
- Altman DG (1980) Statistics and ethics in medical research. VI - Presentation of results. British Medical Journal 281:1542-1544.
- Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
- Altman DG (1993) Construction of age-related reference centiles using absolute residuals. Statistics in Medicine 12:917-924.
- Altman DG (1998) Confidence intervals for the number needed to treat. British Medical Journal 317: 1309-1312.
- Altman DG, Chitty LS (1993) Design and analysis of studies to derive charts of fetal size. Ultrasound in Obstetrics and Gynecology 3:378-384.
- Altman DG, Chitty LS (1994) Charts of fetal size: 1. Methodology. British Journal of Obstetrics and Gynaecology 101:29-34.
- Altman DG, Gardner MJ (1988) Calculating confidence intervals for regression and correlation. British Medical Journal 296:1238-1242.
- Altman DG, Gore SM, Gardner MJ, Pocock SJ (1983) Statistical guidelines for contributors to medical journals. British Medical Journal 286:1489-1493.
- Altman DG, Machin D, Bryant TN, Gardner MJ (Eds) (2000) Statistics with confidence, 2nd ed. BMJ Books.
- Armitage P (1955) Tests for linear trends in proportions and frequencies. Biometrics 11:375-386.
- Armitage P, Berry G, Matthews JNS (2002) Statistical methods in medical research. 4th ed. Blackwell Science.
- Bablok W, Passing H (1985) Application of statistical procedures in analytical instrument testing. Journal of Automatic Chemistry 7:74-79.
- Barnhart HX, Barboriak DP (2009) Applications of the repeatability of quantitative imaging biomarkers: a review of statistical analysis of repeat data sets. Translational Oncology 2:231-235.
- Begg CB, Mazumdar M (1994) Operating characteristics of a rank correlation test for publication bias. Biometrics 50:1088–1101.
- Bellera CA, Hanley JA (2007) A method is presented to plan the required sample size when estimating regression-based reference limits. Journal of Clinical Epidemiology 60:610-615.
- Bewick V, Cheek L, Ball J (2004) Statistics review 10: further nonparametric methods. Critical Care 8:196-199.
- Bland M (2000) An introduction to medical statistics, 3rd ed. Oxford: Oxford University Press.
- Bland M (2006) How should I calculate a within-subject coefficient of variation? https://www-users.york.ac.uk/~mb55/meas/cv.htm
- Bland JM, Altman DG (1986) Statistical method for assessing agreement between two methods of clinical measurement. The Lancet i:307-310.
- Bland JM, Altman DG (1995) Comparing methods of measurement: why plotting difference against standard method is misleading. The Lancet 346:1085-1087.
- Bland M, Altman DG (1996) Statistics Notes: Measurement error proportional to the mean. British Medical Journal 313:106.
- Bland JM, Altman DG (1997) Statistics notes: Cronbach's alpha. British Medical Journal 314:572.
- Bland JM, Altman DG (1999) Measuring agreement in method comparison studies. Statistical Methods in Medical Research 8:135-160.
- Bland JM, Altman DG (2007) Agreement between methods of measurement with multiple observations per individual. Journal of Biopharmaceutical Statistics 17:571-582.
- Borenstein M, Hedges LV, Higgins JPT, Rothstein HR (2009) Introduction to meta-analysis. Chichester, UK: Wiley.
- Box GEP, Cox DR (1964) An analysis of transformations. Journal of the Royal Statistical Society, Series B 26: 211–252.
- Boyd K, Eng KH, Page CD (2013) Area under the Precision-Recall Curve: Point Estimates and Confidence Intervals. In: Blockeel H, Kersting K, Nijssen S, Železný F (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8190. Springer, Berlin, Heidelberg.
- Brookmeyer R, Crowley JA (1982) A confidence interval for the median survival time. Biometrics 38:29-41.
- Campbell I (2007) Chi-squared and Fisher-Irwin tests of two-by-two tables with small sample recommendations. Statistics in Medicine 26:3661-3675.
- Campbell MJ, Gardner MJ (1988) Calculating confidence intervals for some non-parametric analyses. British Medical Journal 296:1454-1456.
- Chitty LS, Altman DG, Henderson A, Campbell S (1994) Charts of fetal size: 2. Head Measurements. British Journal of Obstetrics and Gynaecology 101: 35-43.
- Christensen E (1987) Multivariate survival analysis using Cox's regression model. Hepatology 7:1346-1358.
- Clopper C, Pearson ES (1934) The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26:404–413.
- CLSI (2003) Estimation of Total Analytical Error for Clinical Laboratory Methods; Approved Guideline. CLSI Document EP21-A. Wayne, PA: Clinical and Laboratory Standards Institute.
- CLSI (2008) Defining, establishing, and verifying reference intervals in the clinical laboratory: approved guideline - 3rd edition. CLSI Document C28-A3. Wayne, PA: Clinical and Laboratory Standards Institute.
- CLSI (2012) Evaluation of detection capability for clinical laboratory measurement procedures; Approved guideline - 2nd edition. CLSI document EP17-A2. Wayne, PA: Clinical and Laboratory Standards Institute.
- CLSI (2018) Measurement procedure comparison and bias estimation using patient samples. 3rd ed. CLSI guideline EP09c. Wayne, PA: Clinical and Laboratory Standards Institute.
- Cochran WG (1954) Some methods for strengthening the common χ² tests. Biometrics 10:417-451.
- Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:37-46.
- Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin 70:213-220.
- Conover WJ (1999) Practical nonparametric statistics, 3rd edition. New York: John Wiley & Sons.
- Cornbleet PJ, Gochman N (1979) Incorrect least-squares regression coefficients in method-comparison analysis. Clinical Chemistry 25:432-438.
- Cronbach LJ (1951) Coefficient alpha and the internal structure of tests. Psychometrika 16:297-334.
- Dallal GE, Wilkinson L (1986) An analytic approximation to the distribution of Lilliefors' test for normality. The American Statistician 40:294-296.
- Daly LE (1998) Confidence limits made easy: interval estimation using a substitution method. American Journal of Epidemiology 147:783-790.
- Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006.
- DeLong ER, DeLong DM, Clarke-Pearson DL (1988) Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44:837-845.
- DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Controlled Clinical Trials 7:177-188.
- Dunn OJ (1964) Multiple comparisons using rank sums. Technometrics 6:241-252.
- Efron B (1987) Better Bootstrap Confidence Intervals. Journal of the American Statistical Association 82:171-185.
- Efron B, Tibshirani RJ (1993) An introduction to the Bootstrap. Chapman & Hall/CRC.
- Egger M, Smith GD, Schneider M, Minder C (1997) Bias in meta-analysis detected by a simple, graphical test. BMJ 315:629–634.
- Eisenhauer JG (2003) Regression through the origin. Teaching Statistics 25:76-80.
- Feldt LS (1965) The approximate sampling distribution of Kuder-Richardson reliability coefficient twenty. Psychometrika 30:357-371.
- Finney DJ (1947) Probit Analysis. A statistical treatment of the sigmoid response curve. Cambridge: Cambridge University Press.
- Fleiss JL (1981) Statistical methods for rates and proportions, 2nd ed. New York: John Wiley & Sons.
- Fleiss JL, Levin B, Paik MC (2003) Statistical methods for rates and proportions, 3rd ed. Hoboken: John Wiley & Sons.
- Forkman J (2009) Estimator and tests for common coefficients of variation in normal distributions. Communications in Statistics - Theory and Methods 38:233-251.
- Gardner IA, Greiner M (2006) Receiver-operating characteristic curves and likelihood ratios: improvements over traditional methods for the evaluation and application of veterinary clinical pathology tests. Veterinary Clinical Pathology 35:8-17.
- Gardner MJ, Altman DG (1986) Confidence intervals rather than P values: estimation rather than hypothesis testing. British Medical Journal 292:746-750.
- Girden ER (1992) ANOVA: repeated measures. Sage University Papers Series on Quantitative Applications in the Social Sciences, 84. Thousand Oaks, CA: Sage.
- Glantz SA, Slinker BK (2001) Primer of applied regression & analysis of variance. 2nd ed. McGraw-Hill.
- Greenhouse SW, Geisser S (1959) On methods in the analysis of profile data. Psychometrika 24:95-112.
- Greiner M, Pfeiffer D, Smith RD (2000) Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests. Preventive Veterinary Medicine 45:23-41.
- Griner PF, Mayewski RJ, Mushlin AI, Greenland P (1981) Selection and interpretation of diagnostic tests and procedures. Annals of Internal Medicine 94:555-600.
- Grubbs FE (1969) Procedures for detecting outlying observations in samples. Technometrics 11:1-21.
- Hanley H (1986) Analysis of Crude Data. In: Modern Epidemiology, ed Rothman KJ. Boston: Little, Brown & Co.
- Hanley JA, Hajian-Tilaki KO (1997) Sampling variability of nonparametric estimates of the areas under receiver operating characteristic curves: an update. Academic Radiology 4:49-58.
- Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29-36.
- Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.
- Hanneman SK (2008) Design, analysis, and interpretation of method-comparison studies. AACN Advanced Critical Care 19:223-234.
- Harrell FE Jr, Lee KL, Mark DB (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 15:361-387.
- Hedges LV, Olkin I (1985) Statistical methods for meta-analysis. London: Academic Press.
- Hintze JL, Nelson RD (1998) Violin Plots: A Box Plot-Density Trace Synergism. The American Statistician 52:181-184.
- Higgins JPT, Thomas J (editors) (2021) Cochrane Handbook for Systematic Reviews of Interventions Version 6.2. The Cochrane Collaboration, 2021. Available from https://training.cochrane.org
- Higgins JP, Thompson SG, Deeks JJ, Altman DG (2003) Measuring inconsistency in meta-analyses. BMJ 327:557-560.
- Hilgers RA (1991) Distribution-free confidence bounds for ROC curves. Methods of Information in Medicine 30:96-101.
- Hinkle DE, Wiersma W, Jurs SG (1988) Applied statistics for the behavioral sciences. 2nd ed. Boston: Houghton Mifflin Company.
- Hosmer DW Jr, Lemeshow S, Sturdivant RX (2013) Applied Logistic Regression. Third Edition. New Jersey: John Wiley & Sons.
- Huitema BE (1980) The analysis of covariance and alternatives. Wiley-Interscience.
- Husted JA, Cook RJ, Farewell VT, Gladman DD (2000) Methods for assessing responsiveness: a critical review and recommendations. Journal of Clinical Epidemiology 53:459-468.
- Huynh H, Feldt LS (1976) Estimation of the Box correction for degrees of freedom from sample data in randomised block and split-plot designs. Journal of Educational Statistics 1:69-82.
- Hyslop NP, White WH (2009) Estimating precision using duplicate measurements. Journal of the Air & Waste Management Association 59:1032-1039.
- Jensen AL, Kjelgaard-Hansen M (2010) Diagnostic test validation. In: Weiss D, Wardrop KJ, editors. Schalm's Veterinary Hematology, 6th ed. Ames: Wiley-Blackwell; p. 1027-1033.
- Jones R, Payne B (1997) Clinical investigation and statistics in laboratory medicine. London: ACB Venture Publications.
- Kirkwood BR, Sterne JAC (2003) Essential medical statistics, 2nd ed. Oxford: Blackwell Science.
- Klein JP, Moeschberger ML (2003) Survival Analysis. Techniques for censored and truncated data, 2nd ed. New York: Springer Publishers.
- Koehler KJ, Larntz K (1980) An empirical investigation of goodness-of-fit statistics for sparse multinomials. Journal of the American Statistical Association 75:336-344.
- Krouwer JS (1987) Cumulative distribution analysis graphs - An alternative to ROC curves. Clinical Chemistry 33:2305-2306.
- Krouwer JS (2008) Why Bland-Altman plots should use *X*, not (*Y*+*X*)/2 when *X* is a reference method. Statistics in Medicine 27:778-780.
- Krouwer JS, Monti KL (1995) A simple, graphical method to evaluate laboratory assays. European Journal of Clinical Chemistry & Clinical Biochemistry 33:525-527.
- Lecoutre B (1991) A correction for the ε̃ approximate test in repeated measures designs with two or more independent groups. Journal of Educational Statistics 16:371-372.
- Lentner C (ed) (1982) Geigy Scientific Tables, 8th edition, Volume 2. Basle: Ciba-Geigy Limited.
- Lin L.I-K (1989) A concordance correlation coefficient to evaluate reproducibility. Biometrics 45:255-268.
- Lin L.I-K (2000) A note on the concordance correlation coefficient. Biometrics 56:324-325.
- Linnet K, Boyd JC (2012) Selection and analytical evaluation of methods - with statistical techniques. In Burtis CA, Ashwood ER, Bruns DE (eds). Tietz Textbook of Clinical Chemistry and Molecular Diagnostics (5th ed). Elsevier Saunders, St Louis, MO, pp. 201-228.
- Long JS (1997) Regression Models for categorical and limited dependent variables. Thousand Oaks, CA: Sage Publications.
- Lu MJ, Zhong WH, Liu YX, Miao HZ, Li YC, Ji MH (2016) Sample size for assessing agreement between two methods of measurement by Bland-Altman method. The International Journal of Biostatistics 12: issue 2 (8 pp).
- Ludbrook J (2010) Linear regression analysis for comparing two measures or methods of measurement: but which regression? Clinical and Experimental Pharmacology & Physiology 37:692-699.
- Machin D, Campbell MJ, Tan SB, Tan SH (2009) Sample size tables for clinical studies. 3rd ed. Chichester: Wiley-Blackwell.
- Matsumoto M, Nishimura T (1998) Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation 8:3-30.
- Matthews JN, Altman DG, Campbell MJ, Royston P (1990) Analysis of serial measurements in medical research. British Medical Journal 300:230-235.
- Mayer B, Gaus W, Braisch U (2016) The fallacy of the Passing-Bablok-regression. Jökull Journal 66:95-106.
- McClish DK (1989) Analyzing a Portion of the ROC Curve. Medical Decision Making 9:190-195.
- McGill R, Tukey JW, Larsen WA (1978) Variations of box plots. The American Statistician 32:12-16.
- McGraw KO, Wong SP (1996) Forming inferences about some intraclass correlation coefficients. Psychological Methods 1:30-46. (Correction: 1:390).
- Mercaldo ND, Lau KF, Zhou XH (2007) Confidence intervals for predictive values with an emphasis to case-control studies. Statistics in Medicine 26:2170-2183.
- Metz CE (1978) Basic principles of ROC analysis. Seminars in Nuclear Medicine 8:283-298.
- Monahan JF (2001) Numerical methods of statistics. Cambridge University Press.
- NCCLS (2000) How to define and determine reference intervals in the clinical laboratory: approved guideline - second edition. NCCLS document C28-A2. Wayne, PA: NCCLS.
- Neely JG, Karni RJ, Engel SH, Fraley PL, Nussenbaum B, Paniello RC (2007) Practical guides to understanding sample size and minimal clinically important difference (MCID). Otolaryngology - Head and Neck Surgery 136:14-18.
- Neter J, Wasserman W, Whitmore GA (1988) Applied statistics. 3rd ed. Boston: Allyn and Bacon, Inc.
- Neter J, Kutner MH, Nachtsheim CJ, Wasserman W (1996) Applied linear statistical models. 4th ed. McGraw-Hill.
- Norman GR, Wyrwich KW, Patrick DL (2007) The mathematical relationship among different forms of responsiveness coefficients. Quality of Life Research 16:815-822.
- Obuchowski NA (2003) Receiver Operating Characteristic Curves and their use in radiology. Radiology 229:3-8.
- Pagano M, Gauvreau K (2000) Principles of biostatistics, 2nd ed. Brooks/Cole, Cengage Learning.
- Pampel FC (2000) Logistic regression: A primer. Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-132. Thousand Oaks, CA: Sage.
- Park SY, Park JE, Kim H, Park SH (2021) Review of statistical methods for evaluating the performance of survival or other time-to-event prediction models (from conventional to deep learning approaches). Korean Journal of Radiology.
- Passing H, Bablok W (1983) A new biometrical procedure for testing the equality of measurements from two different analytical methods. Application of linear regression procedures for method comparison studies in Clinical Chemistry, Part I. Journal of Clinical Chemistry & Clinical Biochemistry 21:709-720.
- Passing H, Bablok W (1984) Comparison of several regression procedures for method comparison studies and determination of sample sizes. Application of linear regression procedures for method comparison studies in Clinical Chemistry, Part II. Journal of Clinical Chemistry & Clinical Biochemistry 22:431-445.
- Peduzzi P, Concato J, Feinstein AR, Holford TR (1995) Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates. Journal of Clinical Epidemiology 48:1503-1510.
- Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR (1996) A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology 49:1373-1379.
- Pencina MJ, D'Agostino RB (2004) Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Statistics in Medicine 23:2109-2123.
- Petrie A, Bulman JS, Osborn JF (2003) Further statistics in dentistry. Part 8: systematic reviews and meta-analyses. British Dental Journal 194:73-78.
- Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical Recipes. The Art of Scientific Computing. Third Edition. New York: Cambridge University Press.
- Reed AH, Henry RJ, Mason WB (1971) Influence of statistical method used on the resulting estimate of normal range. Clinical Chemistry 17:275-284.
- Richardson JTE (2011) The analysis of 2 x 2 contingency tables - Yet again. Statistics in Medicine 30:890.
- Rosner B (1983) Percentage points for a generalized ESD many-outlier procedure. Technometrics 25:165-172.
- Rosner B (2006) Fundamentals of Biostatistics. 6th ed. Pacific Grove: Duxbury.
- Royston P (1993a) A pocket-calculator algorithm for the Shapiro-Francia test for non-normality: an application to medicine. Statistics in Medicine 12:181-184.
- Royston P (1993b) A Toolkit for Testing for Non-Normality in Complete and Censored Samples. Journal of the Royal Statistical Society. Series D (The Statistician) 42: 37-43.
- Royston P (1995) A Remark on Algorithm AS 181: The W-test for Normality. Journal of the Royal Statistical Society. Series C (Applied Statistics) 44: 547-551.
- Royston P, Parmar MKB (2013) Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Medical Research Methodology 13:152.
- Sahai H, Khurshid A (1996) Statistics in epidemiology: methods, techniques, and applications. Boca Raton, FL: CRC Press, Inc.
- Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. Plos One 10:e0118432.
- Schoonjans F, De Bacquer D, Schmid P (2011) Estimation of population percentiles. Epidemiology 22: 750-751.
- Schuirmann DJ (1987) A comparison of the Two One-Sided Tests Procedure and the Power Approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics 15:657-680.
- Shapiro SS, Francia RS (1972) An approximate analysis of variance test for normality. Journal of the American Statistical Association 67: 215-216.
- Shapiro SS, Wilk MB (1965) An analysis of variance test for normality (complete samples). Biometrika 52:591-611.
- Sheskin DJ (2004) Handbook of parametric and nonparametric statistical procedures. 3rd ed. Boca Raton: Chapman & Hall/CRC.
- Sheskin DJ (2011) Handbook of parametric and nonparametric statistical procedures. 5th ed. Boca Raton: Chapman & Hall/CRC.
- Shrout PE, Fleiss JL (1979) Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin 86:420-428.
- Smith RJ (2009) Use and misuse of the reduced major axis for line-fitting. American Journal of Physical Anthropology 140:476-486.
- Snedecor GW, Cochran WG (1989) Statistical methods, 8th edition. Ames, Iowa: Iowa State University Press.
- Spiegel MR (1961) Theory and problems of statistics. New York: McGraw-Hill Book Company.
- Sterne JA, Egger M (2001) Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis. Journal of Clinical Epidemiology 54:1046–1055.
- Sterne JA, Sutton AJ, Ioannidis JP et al. (2011) Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ 343:d4002.
- Stöckl D, Rodríguez Cabaleiro D, Van Uytfanghe K, Thienpont LM (2004) Interpreting method comparison studies by use of the Bland-Altman plot: reflecting the importance of sample size by incorporating confidence limits and predefined error limits in the graphic. Clinical Chemistry 50:2216-2218.
- Synek V (2008) Evaluation of the standard deviation from duplicate results. Accreditation and Quality Assurance 13:335-337.
- Tukey JW (1977) Exploratory data analysis. Reading, Mass: Addison-Wesley Publishing Company.
- Walter SD (2005) The partial area under the summary ROC curve. Statistics in Medicine 24:2025–40.
- Westfall PH (2014) Kurtosis as Peakedness, 1905 - 2014. R.I.P. The American Statistician 68:191-195.
- Westgard JO, Barry PL, Hunt MR, Groth T (1981) A multi-rule Shewhart chart for Quality Control in Clinical Chemistry. Clinical Chemistry 27:493-501.
- Westgard JO (2008) Basic method validation. 3rd ed. Madison: Westgard QC, Inc.
- Wildt AR, Ahtola OT (1978) Analysis of covariance. Sage Publications.
- Wong B (2011) Points of view: Color blindness. Nature Methods 8:441.
- Wright EM, Royston P (1997) Simplified estimation of age-specific reference intervals for skewed data. Statistics in Medicine 16:2785-2803.
- Wright EM, Royston P (1997) A comparison of statistical methods for age-related reference intervals. Journal of the Royal Statistical Society, A 160:47-69.
- Youden WJ (1950) An index for rating diagnostic tests. Cancer 3:32-35.
- Youden WJ (1959) Graphical diagnosis of interlaboratory test results. Industrial Quality Control 15:24-28.
- Zhou XH, Obuchowski NA, McClish DK (2002) Statistical methods in diagnostic medicine. New York: Wiley.
- Zou GY (2013) Confidence interval estimation for the Bland-Altman limits of agreement with multiple observations per individual. Statistical Methods in Medical Research 22:630-642.
- Zweig MH, Campbell G (1993) Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical Chemistry 39:561-577.