Results showed that gains in performance were larger in the open-ended condition than in the multiple-choice condition.
Initial test scores for participants in the standard multiple-choice condition and the confidence-weighted multiple-choice condition are not directly comparable, as they are calculated in different ways.
Data from one participant in the confidence-weighted multiple-choice condition and one participant in the standard multiple-choice condition were excluded from analysis because these individuals did not follow instructions.
In the standard multiple-choice condition, performance on the test of the first passage (M = 7, SD = 2.05) did not differ significantly from that on the second passage [M = 7.43, SD = 1.83; t(36) = −1.37, p = .18].
Additionally, participants in the confidence-weighted multiple-choice condition were shown a series of statements (see Appendix for a complete list) regarding their opinions of the initial confidence-weighted tests.
Similarly, for participants in the confidence-weighted multiple-choice condition, the average score obtained on the test of the first passage (M = 4.82, SD = 17.11) did not differ significantly from that obtained on the test of the second passage [M = 4.16, SD = 18.29; t(49) = 0.22, p = .83].
Finally, in the confidence-weighted multiple-choice condition, average scores on the test of the first passage (M = −6.05, SD = 20.23) did not differ significantly from those on the test of the second passage [M = 1.08, SD = 16.95; t(36) = −1.82, p = .08], suggesting, as in Experiment 1, that participants did not change their test-taking strategies from their first test to their second.
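Because each participant took a test on both passages, these first-versus-second comparisons look like paired-samples t-tests (an assumption; the excerpt does not name the test). Under that assumption the degrees of freedom are n − 1, so t(36) would correspond to 37 participants with both scores. The minimal sketch below shows how such a statistic is formed from per-participant differences. The function name `paired_t` and the five-participant scores are hypothetical illustrations, not data from the study. Note that the reported t values cannot be reproduced from the per-passage SDs alone, because a paired test depends on the SD of the differences.

```python
import math
import statistics

def paired_t(first: list[float], second: list[float]) -> tuple[float, int]:
    """Hypothetical helper: paired-samples t statistic for (first - second)
    and its degrees of freedom (n - 1)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]  # per-participant differences
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))  # mean difference over its standard error
    return t, n - 1

# Hypothetical scores for five participants (illustration only, not study data).
first_passage = [7, 8, 6, 9, 5]
second_passage = [8, 8, 7, 9, 7]
t, df = paired_t(first_passage, second_passage)
print(f"t({df}) = {t:.2f}")  # prints "t(4) = -2.14"
```

A negative t means the first-passage mean was lower, which matches the sign convention of the t values reported above. In practice, `scipy.stats.ttest_rel(first, second)` returns the same statistic together with its p value.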
Finally, although initial test performance in the confidence-weighted multiple-choice condition in Experiment 2 is numerically lower than that observed in Experiment 1 (Ms = 4.82 and 4.16), we believe this difference reflects the large variability in these scores, which can be shifted substantially by even a single highly confident but incorrect answer.
Average correct performance on the final cued-recall test, calculated on the basis of 20 items (10 from each passage), was 4.92 items (SD = 2.81; 24.5%) for participants in the study-only group, 7.02 items (SD = 2.59; 35.1%) for those in the standard multiple-choice group, and 8.4 items (SD = 2.5; 42%) for those in the confidence-weighted multiple-choice condition (Fig. 3).
For the confidence-weighted multiple-choice and standard multiple-choice conditions, the procedure remained the same as in Experiment 1.
Average correct performance on the final cued-recall test, based on a total of 20 items (10 items from each passage), for participants in the standard multiple-choice, standard multiple-choice plus confidence judgment, and confidence-weighted multiple-choice conditions, respectively, was 6.54 items (SD = 2.14; 32.7%), 6.41 items (SD = 2.85; 32.1%), and 8.11 items (SD = 2.34; 40.6%) (Fig. 4).
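The percentages in the sentence above are the mean number of items recalled divided by the 20 test items. The sketch below, offered as an illustration rather than the authors' procedure, reproduces that conversion for the three reported means. The helper name `percent_correct` is hypothetical, and it assumes half-up rounding to one decimal place. It uses `Decimal` because a value such as 32.05 is stored in binary floating point just below the halfway point, so `round(32.05, 1)` would not round it up.

```python
from decimal import Decimal, ROUND_HALF_UP

TOTAL_ITEMS = 20  # 10 cued-recall items from each of the two passages

def percent_correct(mean_items: str, total: int = TOTAL_ITEMS) -> Decimal:
    """Hypothetical helper: express a mean number of items recalled as a
    percentage of the total, rounded half-up to one decimal place."""
    pct = Decimal(mean_items) * 100 / total  # exact decimal arithmetic
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# Means as reported in the sentence above (Fig. 4), passed as strings to keep them exact.
means = {
    "standard multiple-choice": "6.54",
    "standard multiple-choice plus confidence judgment": "6.41",
    "confidence-weighted multiple-choice": "8.11",
}
for condition, m in means.items():
    print(f"{condition}: {m} items = {percent_correct(m)}%")
```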