*What does that even mean?*

We certainly treat GPA like it means something. The statistic is used as a measure of Satisfactory Academic Progress (SAP), which has the weight of federal regulation behind it (34 CFR § 668.34). The regulation doesn't specify *how* GPA is to be calculated, however (or even that GPA must be the measure of progress).

Student grade averages are also used to determine financial aid and program admittance, and they create informal barriers to experiences like study abroad and internships. Employers may use GPA as a filter for screening applicants.

Such uses suggest a measurement study, where we work backwards from the uses of GPA to assess its predictive validity and other measurement properties. For example, if employers are using GPA > 3.0 as a screen, is that meaningful for students from a particular institution/program? Is it valid *across* institutions? Such studies are routinely done by institutional researchers, for example in using GPA to predict retention, but they probably don't often consider alternative definitions of GPA.

I'll take a different approach here, to focus on a single measurement question that may seem abstruse: the existence of a latent variable.

## Latent Traits

The question here is not *what* the scale measures, but rather how we expect those values to be distributed. If we imagine that we've tapped into a latent variable that is the result of the contributions of many factors, then it is reasonable to assume by the Central Limit Theorem that the distribution of values is normal (the bell curve common to statistics).

__Research question:__ What is the grade-point assignment that gives the best approximation of a normal GPA distribution?

Instead of assuming that a B is 3 grade points for GPA purposes, maybe the "best" value is 2.9 or 3.5, where "best" means the value that leads most closely to a normal distribution of values.

The simplest way to induce grade point values is to use the ideas I wrote about here, here, and here.

*Figure 1. Assuming a log-likelihood latent scale for course grades, and that the scale is normally distributed, red lines denote induced scalar values on the 0-4 grade point scale, where 4 = A, 3 = B, etc.*

In this method, the lowest (F = 0) and highest (A = 4) values are fixed, and the intermediate ones are allowed to vary. The results are D = 0.8, C = 1.6, and B = 2.7. One can see in Figure 1 that the gap between the induced values (red lines) is larger between A and B than the others, suggesting the need for a "super-A." At my institution, the A+ grade is four grade points, just like a regular A. Imagine that we let A+ = 5 instead. Then the induced scale shifts the undecorated A to the left.

*Figure 2. A+ grades are assigned five points in this variation. The induced values are F = 0, D = 0.7, C = 1.5, B = 2.4, and A = 3.5. On a four-point scale it's equivalent to F = 0, D = 0.6, C = 1.2, B = 1.9, A = 2.8, and A+ = 4.*
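The conversion between the two scales in the Figure 2 caption is a linear rescaling by 4/5. A minimal sketch of that arithmetic, where the function name `rescale` is only illustrative:

```python
# Induced values on the 0-5 scale, copied from the Figure 2 caption.
five_point = {"F": 0.0, "D": 0.7, "C": 1.5, "B": 2.4, "A": 3.5, "A+": 5.0}

def rescale(weights, new_max=4.0):
    """Linearly rescale weights so the top grade equals new_max (zero stays zero)."""
    old_max = max(weights.values())
    return {grade: round(w * new_max / old_max, 1) for grade, w in weights.items()}

four_point = rescale(five_point)
print(four_point)
```

Rounded to one decimal place, this reproduces the four-point values quoted in the caption.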

Notice the large gap in Figure 2 between A+ and A grades. The A+ version emphasizes the need for a grade to the right of A to better calibrate GPA. Alternatively, grading styles could be recalibrated on the usual scale by pushing the distribution leftward (more difficult-to-earn grades, making As less frequent).
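To make the idea of inducing grade points from a latent normal scale concrete, here is one simple stand-in. It is not necessarily the log-likelihood method behind Figures 1 and 2. It assumes that grades slice a standard normal into intervals sized by the grade shares, represents each grade by the mean of its interval, and then rescales so that the lowest grade is 0 and the highest is 4. The grade shares are hypothetical, not my institution's data.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def density(x):
    """Standard normal density, with the tails at +/- infinity treated as 0."""
    return 0.0 if math.isinf(x) else STD_NORMAL.pdf(x)

def induced_grade_points(shares, top=4.0):
    """shares: grade proportions from lowest to highest grade, all > 0, summing to 1."""
    # Cumulative shares give the cut points on the latent scale.
    cuts = [float("-inf")]
    running = 0.0
    for p in shares[:-1]:
        running += p
        cuts.append(STD_NORMAL.inv_cdf(running))
    cuts.append(float("inf"))
    # Mean of a standard normal truncated to (lo, hi) is
    # (density(lo) - density(hi)) / P(lo < Z < hi), and that probability is the share.
    means = [(density(lo) - density(hi)) / p
             for lo, hi, p in zip(cuts, cuts[1:], shares)]
    # Linear rescaling so the lowest grade maps to 0 and the highest to `top`.
    lowest, highest = means[0], means[-1]
    return [top * (m - lowest) / (highest - lowest) for m in means]

# Hypothetical shares for F, D, C, B, A.
grades = ["F", "D", "C", "B", "A"]
shares = [0.03, 0.05, 0.17, 0.35, 0.40]
points = induced_grade_points(shares)
for grade, value in zip(grades, points):
    print(f"{grade}: {value:.2f}")
```

Treating A+ as its own category only requires adding a sixth share; by Claim-1-style rescaling, the choice of 4 or 5 for the top value does not change the relative spacing.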

## Course Difficulty

The Central Limit Theorem assumes *independent* samples, and the courses students take (hence grades) are not randomly selected in most cases. Some courses are more difficult to earn grades in than others, and students can pick their way through these in many different ways. We can't make the selections random, but we can try to account for the variance in course difficulty. There are multiple ways to do that, and I'll just describe one here.

*Table 1. R output showing student statistics, one row per student. The S_ columns are frequencies of grades in the courses that student took, and the plain A-F grades are the proportions for that student.*
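The columns described in the Table 1 caption could be assembled from enrollment records roughly as follows. This is a sketch in Python rather than the R used for the original output. The records are invented, and it assumes that each S_ column is the average share of that grade across the courses a student took.

```python
from collections import Counter, defaultdict

GRADES = ["A", "B", "C", "D", "F"]

# Hypothetical enrollment records: (student, course, grade).
records = [
    ("s1", "MATH101", "A"), ("s1", "ENGL101", "B"),
    ("s2", "MATH101", "C"), ("s2", "ENGL101", "A"),
    ("s3", "MATH101", "B"), ("s3", "HIST101", "A"),
]

def proportions(grade_list):
    """Share of each letter grade in a list of grades."""
    counts = Counter(grade_list)
    return {g: counts[g] / len(grade_list) for g in GRADES}

by_course = defaultdict(list)
by_student = defaultdict(list)
for student, course, grade in records:
    by_course[course].append(grade)
    by_student[student].append((course, grade))

# Grade distribution within each course, over everyone enrolled in it.
course_dist = {course: proportions(gs) for course, gs in by_course.items()}

table = {}
for student, taken in by_student.items():
    # Plain A-F columns: the student's own grade proportions.
    row = proportions([grade for _, grade in taken])
    # S_ columns (assumed meaning): mean course-level share of each grade
    # across the courses this student took.
    for g in GRADES:
        row["S_" + g] = sum(course_dist[c][g] for c, _ in taken) / len(taken)
    table[student] = row

print(table["s1"])
```

A student whose S_A value is high took courses where A grades are common, which is one way to quantify how easy that student's schedule was.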

**Claim 1:** Any linear assignment of grade point weights results in the same distribution of GPA.

That is, *any* weights of the form \( w_i = a + bi \) with \( b > 0 \) preserve the distribution of GPAs after centering and scaling. A consequence is that we can always assume F = 0 and A = 4, and let the intermediate values vary without losing generality.
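Claim 1 can be checked numerically. Because each student's grade proportions sum to 1, a GPA computed with weights \( a + bi \) equals \( a \) plus \( b \) times the integer-weighted GPA, so centering and scaling removes both constants when \( b > 0 \). In the sketch below the students are simulated and the values of `a` and `b` are arbitrary.

```python
import random
from statistics import mean, pstdev

def gpa(proportions, weights):
    """Weighted average of grade points, given one student's grade proportions."""
    return sum(p * w for p, w in zip(proportions, weights))

def standardize(values):
    """Center to mean 0 and scale to standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

random.seed(1)
# Simulated students: random proportions over F, D, C, B, A that sum to 1.
students = []
for _ in range(200):
    raw = [random.random() for _ in range(5)]
    total = sum(raw)
    students.append([r / total for r in raw])

integer_weights = [0, 1, 2, 3, 4]
a, b = 1.5, 0.25  # arbitrary intercept and positive slope
linear_weights = [a + b * i for i in integer_weights]

z_integer = standardize([gpa(s, integer_weights) for s in students])
z_linear = standardize([gpa(s, linear_weights) for s in students])

# Largest disagreement between the two standardized GPA lists.
max_gap = max(abs(x - y) for x, y in zip(z_integer, z_linear))
print(max_gap)
```

The gap is zero up to floating-point error, so only non-linear changes to the weights can alter the shape of the distribution.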

*Figure 3. Q-Q plots for (left) the usual integer (A= 4, B = 3...) weights, (middle) the optimal scale with just A, B, C, D, and F, and (right) the optimal scale including A+.*

*Figure 4. A visualization of the extended grade weights to produce a near-perfect normal distribution of grade averages. The black line is the usual integer scale, with vertical displacements of the labels to show differences.*


Given Claim 1, the improvement in the GPA distribution's shape to near-normality is the result of the non-linear displacements (grades higher or lower than the dark line in Figure 4) and the addition of the A+ as a separate grade.

The optimization engine is sensitive to the initial guess for the parameters, and can converge to local minima that are not very good. The initial parameters for the output in Figure 4 were D = C = B = A = 3 points.
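Sensitivity to starting values can be probed with a multi-start search. The sketch below is not the optimization engine used for Figure 4. It swaps in a simple random hill climb and uses one minus the Q-Q correlation as an assumed measure of non-normality. The student data are simulated and the function names are hypothetical. F = 0 and A+ = 4 are held fixed while D, C, B, and A are free, and the first start is the D = C = B = A = 3 guess.

```python
import random
from functools import lru_cache
from statistics import NormalDist

@lru_cache(maxsize=None)
def normal_quantiles(n):
    """Standard-normal quantiles at plotting positions (i + 0.5) / n."""
    return tuple(NormalDist().inv_cdf((i + 0.5) / n) for i in range(n))

def qq_loss(values):
    """1 - correlation of sorted values with normal quantiles (0 = straight Q-Q line)."""
    xs, qs = sorted(values), normal_quantiles(len(values))
    mx = sum(xs) / len(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 1.0  # all GPAs identical: treat as the worst case
    # The quantiles are symmetric about 0, so their mean is taken as 0.
    sqq = sum(q * q for q in qs)
    sxq = sum((x - mx) * q for x, q in zip(xs, qs))
    return 1.0 - sxq / (sxx * sqq) ** 0.5

def loss(free, students):
    """free = weights for D, C, B, A; F = 0 and A+ = 4 are fixed."""
    weights = [0.0] + list(free) + [4.0]
    return qq_loss([sum(p * w for p, w in zip(s, weights)) for s in students])

def hill_climb(start, students, rng, steps=200, scale=0.2):
    """Accept a random perturbation only when it lowers the loss."""
    best, best_loss = list(start), loss(start, students)
    for _ in range(steps):
        cand = [min(4.0, max(0.0, w + rng.gauss(0, scale))) for w in best]
        cand_loss = loss(cand, students)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best, best_loss

def multi_start(students, n_starts=4, seed=0):
    """Run the hill climb from several starts and keep the best result."""
    rng = random.Random(seed)
    starts = [[3.0, 3.0, 3.0, 3.0]]
    starts += [[rng.uniform(0, 4) for _ in range(4)] for _ in range(n_starts - 1)]
    results = [hill_climb(s, students, rng) for s in starts]
    return min(results, key=lambda r: r[1])

# Simulated students: proportions over F, D, C, B, A, A+ that sum to 1.
data_rng = random.Random(42)
students = []
for _ in range(300):
    raw = [data_rng.gammavariate(k, 1.0) for k in (0.3, 0.4, 1.0, 2.0, 3.0, 0.8)]
    total = sum(raw)
    students.append([r / total for r in raw])

best_weights, best_loss = multi_start(students)
print([round(w, 2) for w in best_weights], best_loss)
```

The search places no ordering constraint on the weights, so whether the result comes out with A > B > C > D is something to inspect rather than something the code enforces.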

## Discussion

The weights in Figure 4 suggest the need to weight A+ grades differently than A grades. The gaps between A, B, and C are nearly equal, and since these are the most numerous grades, this explains why the usual 0-to-4 scale has a nearly normal distribution. The results suggest that there's not much difference between C and D grades, however.

Results from the difficulty-adjusted weights in Figure 4 compare favorably to the latent-variable approach in Figure 2:

Difficulty-adjusted:

*F = 0, D = 1.2, C = 1.3, B = 1.9, A = 2.6, and A+ = 4.*

Latent Scale:

*F = 0, D = 0.6, C = 1.2, B = 1.9, A = 2.8, and A+ = 4.*

The only substantial difference between the two is the weight for the D grade.

This analysis is predicated on the assumption that student abilities measured by GPA should be normally distributed. There are usually selection effects during the admissions process that could challenge the symmetric distribution theory. However, the similarity of the integer-weighted GPA to a normal distribution is close enough to think it's a reasonable assumption. Additionally, the fact that the optimal grade points end up in the right order (A > B, etc.) is a sign of validity.

## Source Code

## Predictive Validity

*Figure 5. Smoothed rates of graduate school admittance by GPA metric over about 8000 students sorted from low to high. GPA = the usual GPA, wGPA is the usual weights after accounting for course difficulty, and wpGPA is the custom difficulty-adjusted weights found in the Discussion.*