Swiss general internal medicine board examination: quantitative effects of publicly available and unavailable questions on question difficulty and test performance

Petra Ferrari Pedrini; Christoph Berendonk; Anne Ehle Roussy; Luca Gabutti; Thomas Hugentobler; Lilian Küng; Franco Muggli; Florian Neubauer; Simon Ritter; Alexandre Ronga; Andreas Rothenbühler; Monique Savopol; Hansueli Späth; Daniel Stricker; Daniel Widmer; Ulrich Stoller; Jürg Hans Beer

doi:10.4414/SMW.2022.w30118

Original article

Vol. 152 No. 0910 (2022)

Cite this as: Swiss Med Wkly. 2022;152:w30118
Published: 08.03.2022

Summary

BACKGROUND: Formerly, a substantial number of the 120 multiple-choice questions of the Swiss Society of General Internal Medicine (SSGIM) board examination were derived from publicly available questions of the Medical Knowledge Self-Assessment Program® (MKSAP). The opportunity to memorise publicly available questions may unduly influence candidates’ examination performance. The examination board therefore raised concerns that the examination did not meet its objective of evaluating the application of knowledge, and the society decided to develop new, “Helvetic” questions to improve the examination. The aim of the present study was to quantitatively assess the degree of difficulty of the Helvetic questions (HQ) compared with publicly available and unavailable MKSAP questions, and to investigate whether the degree of difficulty of MKSAP questions changed over time as their status changed from publicly available to unavailable.

METHODS: The November 2019 examination consisted of 40 Helvetic questions, 40 publicly available questions from MKSAP edition 17 (MKSAP-17) and 40 questions from MKSAP-15/16, which were no longer publicly available at the time of the examination. A one-factorial univariate analysis of variance (ANOVA) compared question difficulty (lower values mean higher difficulty) between these three question sets. A repeated measures ANOVA compared the difficulty of the MKSAP-15/16 questions in the November 2019 examination with the difficulty of the exact same questions in former examinations, when these questions belonged to the publicly available MKSAP edition. The publicly available MKSAP-17 questions and the publicly unavailable Helvetic questions served as controls.
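The comparison described above can be illustrated with a minimal sketch. It assumes that question difficulty is the proportion of candidates answering an item correctly, which is the usual convention and consistent with "lower values mean higher difficulty", and it computes the F statistic of a one-way ANOVA by hand. The function names and the item values are hypothetical and chosen for illustration only; they are not the study data, and the authors' actual software and procedure are not described in this summary.

```python
from statistics import mean


def item_difficulty(responses):
    """Proportion of correct answers for one item (1 = correct, 0 = wrong)."""
    return sum(responses) / len(responses)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of item difficulties per question set.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the items around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical item difficulties, three items per set (NOT the study data)
helvetic = [0.60, 0.70, 0.80]
mksap_15_16 = [0.65, 0.70, 0.75]
mksap_17 = [0.80, 0.85, 0.90]

f_stat, df_b, df_w = one_way_anova_f([helvetic, mksap_15_16, mksap_17])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A p-value would follow from the F distribution with the returned degrees of freedom; a statistics package would normally supply it along with post hoc pairwise comparisons.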

RESULTS: The analysis of the November 2019 exam showed a significant difference in average item difficulty between Helvetic and MKSAP-17 questions (71% vs 86%, p <0.001) and between MKSAP-15/16 and MKSAP-17 questions (70% vs 86%, p <0.001). There was no significant difference in item difficulty between Helvetic and MKSAP-15/16 questions (71% vs 70%, p = 0.993). The repeated measures ANOVA on question use and the three question categories showed a significant interaction (p <0.001, partial eta-squared = 0.422). The change in the availability of MKSAP-15/16 questions had a strong effect on difficulty. Questions became on average 21.9% more difficult when they were no longer publicly available. In contrast, the difficulty of the MKSAP-17 and Helvetic questions did not change significantly across administrations.
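For readers unfamiliar with the effect size reported above, partial eta-squared is conventionally defined as the share of variance attributable to an effect relative to that effect plus its error term. The sums of squares are not reported in this summary, so the expression below is the general definition only, not a recomputation:

```latex
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
```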

DISCUSSION: This study provides quantitative evidence that the public availability of questions has a decisive influence on question difficulty and thus on SSGIM board examination performance. Reducing the number of publicly available questions in the examination by introducing confidential, high-quality Helvetic questions contributes to the validity of the board examination by addressing higher-order cognitive skills and making rote-learning strategies less effective.


How to Cite

Ferrari Pedrini P, Berendonk C, Ehle Roussy A, Gabutti L, Hugentobler T, Küng L, Muggli F, Neubauer F, Ritter S, Ronga A, Rothenbühler A, Savopol M, Späth H, Stricker D, Widmer D, Stoller U, Hans Beer J. Swiss general internal medicine board examination: quantitative effects of publicly available and unavailable questions on question difficulty and test performance. Swiss Med Wkly [Internet]. 2022 Mar. 8 [cited 2024 Sep. 22];152(0910):w30118. Available from: https://smw.ch/index.php/smw/article/view/3165

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

