Statistics used in Practique

Practique uses a range of standard statistics for reporting and standard setting. Below are some descriptions and some useful links.

Candidate feedback report

Internal ID: candidate_feedback

Example:

Item	Description	Useful links
Overall score	Overall score in percentage for the whole exam / candidate
Cohort average	Student performance against cohort group (average of students)
Pass/Fail	Whether the student passed or failed

Result analysis per station

Item	Description	Useful links
Your score	Score calculated as the sum of marks in the observation criteria
Pass mark	The pass mark is the score that candidates must achieve in order to pass. It is calculated by adding the Standard Error of Measurement term to the station cut score (which depends on the standard setting method chosen): cut score + SEm × multiplier. NOTE: The pass mark can be entered manually per station, which overrides the pass mark calculated by Practique.
Class average	Cohort group average
Results by station	Passing or failing question
Results analysis graph	Visual representation of candidate score, cut score, and average score
Feedback from examiner	Any text feedback given by the Examiner for that candidate for that station.
Items breakdown	Breakdown per station
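
The pass mark rule described in the table above can be sketched as follows. This is a minimal illustration and not Practique's own code: the function name `station_pass_mark` and its parameters are hypothetical, and it assumes that the calculated pass mark is the cut score plus SEm × multiplier.

```python
from typing import Optional


def station_pass_mark(cut_score: float,
                      sem: float,
                      sem_multiplier: float,
                      manual_pass_mark: Optional[float] = None) -> float:
    """Return the pass mark for one station (hypothetical helper)."""
    # A manually entered pass mark overrides the calculated one.
    if manual_pass_mark is not None:
        return manual_pass_mark
    # Assumed formula: cut score plus the SEm error term.
    return cut_score + sem * sem_multiplier


# Calculated pass mark, then the same station with a manual override.
print(station_pass_mark(12.0, 1.5, 1.0))
print(station_pass_mark(12.0, 1.5, 1.0, manual_pass_mark=14.0))
```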

Station cut score

Internal ID: item_cut_score

Example:

Item	Description	Useful links
Mean score	Average score
Cut score	Calculated as (max score of station × standard setting method value) / 100
Max score	Max score of the question/station (OSCE: sum of observation criteria scores)
Standard deviation	scored.standard_deviation() --> numpy.std()	https://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html
Alpha (if station deleted)	Cronbach’s alpha is a measure used to assess the reliability, or internal consistency, of a set of scale or test items. In other words, the reliability of any given measurement refers to the extent to which it is a consistent measure of a concept, and Cronbach’s alpha is one way of measuring the strength of that consistency. Cronbach’s alpha is computed by correlating the score for each scale item with the total score for each observation (usually individual survey respondents or test takers), and then comparing that to the variance for all individual item scores. The resulting coefficient of reliability (α) ranges from 0 to 1 in providing this overall assessment of a measure’s reliability. If all of the scale items are entirely independent from one another (i.e., are not correlated or share no covariance), then α = 0; and, if all of the items have high covariances, then α will approach 1 as the number of items in the scale approaches infinity. In other words, the higher the α coefficient, the more the items have shared covariance and probably measure the same underlying concept.	https://data.library.virginia.edu/using-and-interpreting-cronbachs-alpha/
Passes	Number of passes per criterion
Fails	Number of fails per criterion
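
As a rough illustration of the cut score and "alpha if station deleted" rows above, the sketch below uses the common variance-based formula for Cronbach's alpha. Whether Practique computes alpha in exactly this way is an assumption, and the function names are hypothetical.

```python
import numpy as np


def station_cut_score(max_score, method_value):
    """Cut score = (max score x standard setting method value) / 100."""
    return max_score * method_value / 100.0


def cronbach_alpha(scores):
    """scores: 2-D array, rows = candidates, columns = stations."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    # Variance of each station, and variance of the candidates' totals.
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)


def alpha_if_deleted(scores):
    """Alpha recomputed with each station left out in turn.

    Needs at least three stations, so that two remain after deletion.
    """
    scores = np.asarray(scores, dtype=float)
    return [cronbach_alpha(np.delete(scores, i, axis=1))
            for i in range(scores.shape[1])]


# Made-up scores for four candidates on three stations.
example = [[4, 5, 3], [6, 7, 6], [8, 8, 9], [5, 4, 5]]
print(cronbach_alpha(example))
print(alpha_if_deleted(example))
```

A station whose removal raises alpha noticeably is a candidate for review, which is how the "if station deleted" column is typically read.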

Station statistic analysis

Internal ID: item_stat_analysis

Supported for Written items except SAQ, VSAQ and EMQ

Example

Item	Description	Useful links
33% Discrimination	Item discrimination is the degree to which students with high overall exam scores also got a particular item correct. The Station statistic analysis uses 33% of the cohort to calculate discrimination: the results are sorted in order, the correct answers in the top third and in the bottom third are selected, and the bottom is subtracted from the top.
Discrimination (point-biserial)	The item discrimination index is a point-biserial correlation coefficient. Its possible range is -1.00 to 1.00. A positive result indicates that higher-performing candidates were more likely to give a correct response to the item.	https://en.wikipedia.org/wiki/Point-biserial_correlation_coefficient
Facility (difficulty) of correct answer	Facility is a measure of how easy or difficult a question is for candidates. It is calculated as FI = Xaverage / Xmax, where Xaverage is the mean score obtained by all candidates attempting the item and Xmax is the maximum score achievable for that item.
Frequency	Frequency of answers
Quintile Graph	For SBA type items: all candidates, sorted by score from highest to lowest, are split into 5 groups, and the graph shows the % of candidates in each group who answered the question correctly. The graph should usually show "steps down", because most of the top-scoring candidates should get the question right. For CPQ item type it shows ... something different
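
The facility, 33% discrimination and quintile calculations in the table above can be sketched as below. The function names are hypothetical. The sketch assumes the 33% discrimination is the proportion correct in the top third minus the proportion correct in the bottom third, which the description does not state explicitly.

```python
import numpy as np


def facility(item_scores, max_score):
    """FI = mean item score / maximum achievable item score."""
    return float(np.mean(item_scores)) / max_score


def discrimination_33(item_correct, total_scores):
    """Top-third minus bottom-third proportion correct (assumed form)."""
    item_correct = np.asarray(item_correct, dtype=float)
    order = np.argsort(total_scores)[::-1]  # best candidates first
    n_group = len(order) // 3
    if n_group == 0:
        raise ValueError("need at least three candidates")
    top = item_correct[order[:n_group]]
    bottom = item_correct[order[-n_group:]]
    return float(top.mean() - bottom.mean())


def quintile_percent_correct(item_correct, total_scores):
    """% correct in each of 5 groups, best group first.

    Assumes at least five candidates, so that no group is empty.
    """
    item_correct = np.asarray(item_correct, dtype=float)
    order = np.argsort(total_scores)[::-1]
    groups = np.array_split(item_correct[order], 5)
    return [100.0 * float(g.mean()) for g in groups]


# Made-up cohort: ten candidates, the top five answered the item correctly.
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
correct = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(facility(correct, 1))
print(discrimination_33(correct, totals))
print(quintile_percent_correct(correct, totals))
```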

Item response model

Internal ID: item_responses

Example

Item Response Theory

Item	Description	Useful links
Difficulty	3PL model	https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html https://en.wikipedia.org/wiki/Item_response_theory
Discrimination	3PL model	https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html https://en.wikipedia.org/wiki/Item_response_theory
Pseudo-guess	3PL model. This is only shown if it is more than 1.	https://en.wikipedia.org/wiki/Item_response_theory
Chi-squared test		https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html https://en.wikipedia.org/wiki/Chi-squared_test

In addition to the SciPy links, the Wikipedia page on item response theory linked in the table above describes the three IRT parameters.
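
For orientation, the textbook 3PL item characteristic curve can be fitted with scipy.optimize.curve_fit as sketched below. The data are synthetic and noise-free. The starting values, the bounds, and the way ability values are obtained are assumptions made for illustration, not a description of Practique's implementation.

```python
import numpy as np
from scipy.optimize import curve_fit


def three_pl(theta, a, b, c):
    """3PL curve: a = discrimination, b = difficulty, c = pseudo-guess."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))


# Synthetic ability values and the matching probabilities of a correct
# response, generated from known parameters for illustration only.
theta = np.linspace(-4.0, 4.0, 41)
true_params = (1.2, 0.5, 0.2)
observed = three_pl(theta, *true_params)

# Fit the three parameters; the bounds keep the pseudo-guess in [0, 1].
fitted, _covariance = curve_fit(
    three_pl, theta, observed,
    p0=(1.0, 0.0, 0.1),
    bounds=([0.0, -4.0, 0.0], [5.0, 4.0, 1.0]),
)
a_hat, b_hat, c_hat = fitted
print(a_hat, b_hat, c_hat)
```

At theta = b the curve sits halfway between the pseudo-guess level c and 1, which is a quick sanity check on any fitted item.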

Classical Test Theory

Item	Description	Useful links
Facility	facility = mean_score of the station / max_score of the station
Discrimination (point-biserial)	The item discrimination index is a point-biserial correlation coefficient. Its possible range is -1.00 to 1.00. A positive result indicates that higher-performing candidates were more likely to give a correct response to the item.	https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html https://en.wikipedia.org/wiki/Point-biserial_correlation_coefficient
Frequency	For SBA item types, the frequency of each answer is calculated. If a candidate has not responded, that is included in the calculation. The Facility and the Frequency of the most chosen answer should be the same. From Practique 5.4.0 onwards, a No Response column is shown beside the answer-letter columns for Frequency, to show the whole picture.
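
The point-biserial discrimination and the answer frequency described above can be sketched as follows. The point-biserial coefficient is the Pearson correlation between a 0/1 "correct" flag and the total score, which is why scipy.stats.pearsonr is sufficient. The function names are hypothetical, and representing a missing response as None is an assumption.

```python
from collections import Counter

from scipy import stats


def point_biserial(item_correct, total_scores):
    """Pearson correlation between a 0/1 item flag and the total score."""
    r, _p_value = stats.pearsonr(item_correct, total_scores)
    return float(r)


def answer_frequency(responses):
    """Share of candidates per answer; None is counted as 'No Response'."""
    labels = ["No Response" if r is None else r for r in responses]
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Made-up data: six candidates for the correlation, four SBA responses.
print(point_biserial([1, 1, 1, 0, 0, 0], [9, 8, 7, 4, 3, 2]))
print(answer_frequency(["A", "B", "A", None]))
```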

Item characteristic curve (Passing probability over Ability):

[Figures: item characteristic curve; passing percentage]

Examiner report

Internal ID: examiner_control

Example:

Item	Description	Useful links
Z-score	How many standard deviations the examiner is from the mean	http://www.statisticshowto.com/probability-and-statistics/z-score/
Mean score	Average score given by all examiners for one station
Standard deviation	scored.standard_deviation() --> numpy.std()	https://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html
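
One way to read the Z-score row is sketched below: each examiner's mean score at a station is compared with the mean across all examiners at that station. That reading, the function name, and the input shape are all assumptions. The standard deviation uses numpy.std with its default settings, as referenced in the table.

```python
import numpy as np


def examiner_z_scores(examiner_means):
    """examiner_means: dict of examiner name -> mean score at one station."""
    names = list(examiner_means)
    values = np.array([examiner_means[n] for n in names], dtype=float)
    mean = values.mean()
    sd = np.std(values)  # numpy default: population SD (ddof=0)
    # If all examiners give identical means, sd is 0 and z is undefined.
    return {n: float((v - mean) / sd) for n, v in zip(names, values)}


# Made-up examiner means for one station.
print(examiner_z_scores({"Examiner A": 10, "Examiner B": 12, "Examiner C": 14}))
```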

Exam analysis report

Internal ID: diet_score

Example:

It is possible to select one category which will be used when computing data for the report.

Cumulative Percentage curve

Represents score frequency distribution from the minimal exam score to the maximal exam score.
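
A cumulative percentage curve of this kind can be built from a list of exam scores as in the sketch below. It assumes that each point is the percentage of candidates scoring at or below that value, and the function name is hypothetical.

```python
import numpy as np


def cumulative_percentage(scores):
    """Return (distinct scores, % of candidates at or below each score)."""
    values, counts = np.unique(scores, return_counts=True)
    cumulative = 100.0 * np.cumsum(counts) / counts.sum()
    return values, cumulative


# Made-up exam scores for four candidates.
values, cumulative = cumulative_percentage([50, 60, 60, 70])
for value, percent in zip(values, cumulative):
    print(value, percent)
```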

Item analysis

Statistics

Item	Description	Useful links
Number of candidates	Number of candidates that sat the exam. Candidates that are excluded from exam are not included in the calculations.
Number of items	Number of items in the exam. Items that are excluded are not included in the calculations.
Minimum score	Smallest score achieved on exam.
Maximum score	Largest score achieved on exam.
Median	The median value is the score value in the middle of the sorted score array.	https://docs.scipy.org/doc/numpy/reference/generated/numpy.median.html
Mode	mode = scored.mode() --> scipy.stats.mode(): the mode (modal value) is the most common score value in the list of scores. If more than one value is most common, the smallest is returned. If there is no most common value, the smallest score in the exam is returned.	https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.mode.html
Mean	The sum of all scores over the number of scores.	https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
Standard error of mean
Standard deviation	First the mean score of the exam is calculated. Then (x - mean)^2 is calculated for each score. The sum of these squared differences is divided by (number of scores - 1); the -1 is standard statistical practice for a better estimate. Finally, the square root of that result is taken.	https://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html
Skew	Indicates how far the score distribution departs from the symmetry of a normal distribution. If > 0, the scores are squeezed to the left (longer right tail); if < 0, they are squeezed to the right (longer left tail).	https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html
Kurtosis	Describes the sharpness of the peak of the score distribution. Practique uses the Pearson definition.	https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html; https://en.wikipedia.org/wiki/Kurtosis https://www.spcforexcel.com/knowledge/basic-statistics/are-skewness-and-kurtosis-useful-statistics#kurtosis
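
The statistics in the table above map onto numpy and scipy calls roughly as follows. This is a sketch, not Practique's code: the function name is hypothetical, the mode is computed with numpy.unique so that ties resolve to the smallest value as described, and ddof=1 is passed to numpy.std because the description divides by (number of scores - 1), whereas numpy's default divides by the number of scores.

```python
import numpy as np
from scipy import stats


def exam_statistics(scores):
    """Descriptive statistics for a list of exam scores."""
    scores = np.asarray(scores, dtype=float)
    # np.unique returns sorted values, and argmax picks the first maximum,
    # so ties resolve to the smallest score.
    values, counts = np.unique(scores, return_counts=True)
    return {
        "minimum": float(scores.min()),
        "maximum": float(scores.max()),
        "median": float(np.median(scores)),
        "mode": float(values[counts.argmax()]),
        "mean": float(scores.mean()),
        "standard_error_of_mean": float(stats.sem(scores)),
        # Sample standard deviation: divides by (n - 1).
        "standard_deviation": float(np.std(scores, ddof=1)),
        "skew": float(stats.skew(scores)),
        # fisher=False selects the Pearson definition (normal = 3.0).
        "kurtosis": float(stats.kurtosis(scores, fisher=False)),
    }


# Made-up exam scores for five candidates.
for name, value in exam_statistics([50, 60, 60, 70, 80]).items():
    print(name, value)
```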

Classical Test Theory

Item	Description	Useful links
Cut Score	scored.exam_cut_score() --> sum(self.get_cut_scores().values()) --> get_scored_cases() --> returns instances of Scored cases (set by standard method): sum of the cut scores of all stations divided by the number of stations/questions.
Cronbach	Cronbach’s Alpha. For each of the standard setting methods, the Cronbach’s Alpha reliability metric is also calculated for the exam. It is given for the whole exam, as well as what it would be if each item in turn were omitted from the analysis. This allows items that are lowering the reliability of the exam to be excluded from the results.	Standard Setting Terminology
SE of measurement	The Standard Error of Measurement (not to be confused with the Standard Error of the Mean) gives an indication of the spread of the measurement errors when estimating candidates' true scores from the observed scores. It is calculated from the reliability coefficient (Practique uses Cronbach's alpha). It is assumed that the sampling errors are normally distributed. The SEM is calculated as SEM = S × (1 - r_xx)^0.5, where S is the standard deviation of the exam and r_xx is the reliability coefficient (Cronbach's alpha). The key application of SEM in Practique is to apply a confidence interval to the cut score. For example, if you would like to be 68% sure of the pass/fail decision, the SEM indicates that candidates within 1 SEM of the cut score may fluctuate to the other side of the cut score should they take the exam again. Similarly, if you wanted to be 95% sure of your decision on outcomes, an SEM multiplier of 1.96 can be applied. These figures are based on the Normal Distribution. Practique applies this on the positive side for most Standard Setting methods, as we are dealing with competency exams. In practice, this means that you are 95% certain that the passing candidates' scores represent their true scores.	Standard Setting Terminology
SEm multiplier	See above	Standard Setting Terminology
Error (SEm * multiplier)
Pass Score rounded
Pass Rate
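
The SEM formula and the error band around the cut score can be worked through as below. The numbers are made up, and the function names and the "borderline" helper are hypothetical illustrations of the description above rather than Practique functions.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = S * (1 - r_xx) ** 0.5, with r_xx taken as Cronbach's alpha."""
    return sd * math.sqrt(1.0 - reliability)


def borderline_candidates(scores, cut_score, sem, multiplier=1.0):
    """Candidates whose score lies within SEM * multiplier of the cut score."""
    band = sem * multiplier
    return [name for name, score in scores.items()
            if abs(score - cut_score) <= band]


# Made-up exam: SD of 10 marks and an alpha of 0.84 give an SEM near 4.
sem = standard_error_of_measurement(10.0, 0.84)
error = sem * 1.96  # error term for roughly 95% confidence
print(sem, error)

# Candidates close enough to a cut score of 60 that a resit might flip them.
scores = {"c1": 55, "c2": 62, "c3": 70, "c4": 50}
print(borderline_candidates(scores, 60, sem, multiplier=1.96))
```

Adding the error term to the cut score, rather than centring a band on it, corresponds to applying the SEM on the positive side only.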