
Original article

Vol. 152 No. 0910 (2022)

Swiss general internal medicine board examination: quantitative effects of publicly available and unavailable questions on question difficulty and test performance

  • Petra Ferrari Pedrini
  • Christoph Berendonk
  • Anne Ehle Roussy
  • Luca Gabutti
  • Thomas Hugentobler
  • Lilian Küng
  • Franco Muggli
  • Florian Neubauer
  • Simon Ritter
  • Alexandre Ronga
  • Andreas Rothenbühler
  • Monique Savopol
  • Hansueli Späth
  • Daniel Stricker
  • Daniel Widmer
  • Ulrich Stoller
  • Jürg Hans Beer
Cite this as:
Swiss Med Wkly. 2022;152:w30118


BACKGROUND: Formerly, a substantial number of the 120 multiple-choice questions of the Swiss Society of General Internal Medicine (SSGIM) board examination were derived from publicly available MKSAP questions (Medical Knowledge Self-Assessment Program®). The possibility of memorising publicly available questions may unduly influence candidates’ examination performance. The examination board therefore raised concerns that the examination did not meet the objective of evaluating the application of knowledge. The society decided to develop new, “Helvetic” questions to improve the examination. The aim of the present study was to quantitatively assess the degree of difficulty of the Helvetic questions (HQ) compared with publicly available and unavailable MKSAP questions, and to investigate whether the degree of difficulty of MKSAP questions changed over time as their status changed from publicly available to unavailable.

METHODS: The November 2019 examination consisted of 40 Helvetic questions, 40 publicly available questions from MKSAP edition 17 (MKSAP-17) and 40 questions from MKSAP-15/16, which were no longer publicly available at the time of the examination. A one-factorial univariate analysis of variance (ANOVA) compared question difficulty (lower values mean higher difficulty) between these three question sets. A repeated-measures ANOVA compared the difficulty of MKSAP-15/16 questions in the November 2019 examination with the difficulty of the exact same questions in former examinations, when these questions belonged to the publicly available MKSAP edition. The publicly available MKSAP-17 and the publicly unavailable Helvetic questions served as controls.
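The first comparison described above can be sketched as follows. This is a minimal illustration only: the item-difficulty values are simulated (roughly matching the group means reported in the results), and the use of SciPy's `f_oneway` is an assumption about tooling, not the authors' actual analysis pipeline.

```python
# Sketch of a one-way ANOVA comparing item difficulty (percent correct
# per question) across the three question sets. Values are simulated;
# the real study used the observed difficulties from the examination.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
helvetic  = rng.normal(71, 10, 40)   # 40 Helvetic questions
mksap17   = rng.normal(86, 10, 40)   # 40 publicly available MKSAP-17 questions
mksap1516 = rng.normal(70, 10, 40)   # 40 no-longer-available MKSAP-15/16 questions

f_stat, p_value = f_oneway(helvetic, mksap17, mksap1516)
print(f"F = {f_stat:.2f}, p = {p_value:.2g}")
```

With group means this far apart relative to the spread, the omnibus test is expected to be highly significant; pairwise post-hoc comparisons (as reported in the results) would then localise the difference to the MKSAP-17 set.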

RESULTS: The analysis of the November 2019 examination showed a significant difference in average item difficulty between Helvetic and MKSAP-17 questions (71% vs 86%, p <0.001) and between MKSAP-15/16 and MKSAP-17 questions (70% vs 86%, p <0.001). There was no significant difference in item difficulty between Helvetic and MKSAP-15/16 questions (71% vs 70%, p = 0.993). The repeated-measures ANOVA on question use and the three question categories showed a significant interaction (p <0.001, partial eta-squared = 0.422). The change in the availability of MKSAP-15/16 questions had a strong effect on difficulty: questions became on average 21.9% more difficult when they were no longer publicly available. In contrast, the difficulty of the MKSAP-17 and Helvetic questions did not change significantly across administrations.
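The effect size reported for the interaction, partial eta-squared, is the ratio of the effect sum of squares to the sum of effect and error sums of squares. The sketch below shows the computation; the sums of squares used are hypothetical values chosen only to reproduce the reported figure of 0.422, not the study's actual ANOVA table.

```python
# Partial eta-squared: proportion of variance attributable to an effect,
# relative to that effect plus its error term.
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    return ss_effect / (ss_effect + ss_error)

# Hypothetical sums of squares that yield the reported value:
eta_p2 = partial_eta_squared(42.2, 57.8)
print(round(eta_p2, 3))
```

By Cohen's conventional benchmarks, a partial eta-squared of 0.422 is a large effect, consistent with the strong availability effect described in the text.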

DISCUSSION: This study provides quantitative evidence that the public availability of questions has a decisive influence on question difficulty and thus on SSGIM board examination performance. Reducing the number of publicly available questions in the examination by introducing confidential, high-quality Helvetic questions contributes to the validity of the board examination by addressing higher-order cognitive skills and making rote-learning strategies less effective.


