We certainly treat GPA like it means something. The statistic is used as a measure of Satisfactory Academic Progress (SAP), which has the weight of federal regulation behind it (34 CFR § 668.34). The regulation doesn't specify how GPA is to be calculated, however (or even that GPA must be the measure of progress).
Student grade averages are also used to determine financial aid and program admittance, and they create informal barriers to experiences like study abroad and internships. Employers may also use GPA as a filter for screening applicants.
Such uses suggest a measurement study, in which we work backwards from the uses of GPA to assess its predictive validity and other measurement properties. For example, if employers use GPA > 3.0 as a screen, is that threshold meaningful for students from a particular institution or program? Is it valid across institutions? Institutional researchers routinely do such studies, for example by using GPA to predict retention, but those studies probably don't often consider alternative definitions of GPA.
I'll take a different approach here, to focus on a single measurement question that may seem abstruse: the existence of a latent variable.
Instead of assuming that a B is 3 grade points for GPA purposes, maybe the "best" value is 2.9 or 3.5, where "best" means the value that brings the resulting distribution of values closest to normal.
In this method, the lowest (F = 0) and highest (A = 4) values are fixed, and the intermediate ones are allowed to vary. The results are D = 0.8, C = 1.6, and B = 2.7. One can see in Figure 1 that the gap between A and B is larger than the other gaps between induced values (red lines), suggesting the need for a "super-A." At my institution, the A+ grade is worth four grade points, just like a regular A. Imagine that we let A+ = 5 instead. Then the induced scale shifts the undecorated A to the left.
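One standard way to induce grade points from a normal latent variable can be sketched as follows. This is an illustration under stated assumptions, not necessarily the procedure behind Figure 1: each grade is treated as an interval of a standard normal variable, the cut points come from the cumulative grade proportions, and a grade's score is the mean of the latent variable within its interval, rescaled so the endpoints are fixed. The function name `induced_grade_points` and the grade counts are hypothetical.

```python
from statistics import NormalDist


def induced_grade_points(counts, low=0.0, high=4.0):
    """Map ordered grade counts (lowest grade first) to latent-normal scores.

    The lowest grade is pinned to `low` and the highest to `high`;
    the intermediate grades land wherever the grade frequencies put them.
    """
    std = NormalDist()
    total = sum(counts)
    if total <= 0 or any(c <= 0 for c in counts):
        raise ValueError("every grade needs a positive count")
    props = [c / total for c in counts]

    # Normal density at each cut point; it is 0 at the -inf and +inf ends.
    densities = [0.0]
    cumulative = 0.0
    for p in props[:-1]:
        cumulative += p
        densities.append(std.pdf(std.inv_cdf(cumulative)))
    densities.append(0.0)

    # Mean of a standard normal truncated to (a, b):
    # (pdf(a) - pdf(b)) / P(a < Z < b)
    means = [(densities[i] - densities[i + 1]) / props[i]
             for i in range(len(props))]

    # Linear rescaling so the endpoints match the fixed values.
    span = means[-1] - means[0]
    return [low + (high - low) * (m - means[0]) / span for m in means]


# Hypothetical counts for F, D, C, B, A (not data from this post).
example_counts = [5, 10, 20, 35, 30]
for grade, value in zip("FDCBA", induced_grade_points(example_counts)):
    print(grade, round(value, 2))
```

Because only the endpoints are pinned, the intermediate values move with the grade frequencies, which is the sense in which the scale is "induced" rather than assumed.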
Notice the large gap in Figure 2 between A+ and A grades. The A+ version emphasizes the need for a grade to the right of A to better calibrate GPA. Alternatively, grading styles could be recalibrated on the usual scale by pushing the distribution leftward (more difficult-to-earn grades, making As less frequent).
Figure 4. A visualization of the extended grade weights that produce a near-perfect normal distribution of grade averages. The black line is the usual integer scale, with vertical displacements of the labels to show differences.
Given Claim 1, the improvement in the GPA distribution's shape to near-normality is the result of the non-linear displacements (grades higher or lower than the dark line in Figure 4) and the addition of the A+ as a separate grade.
The optimization engine is sensitive to the initial guess for the parameters, and it can converge to poor local minima. The initial parameters for the output in Figure 4 were D = C = B = A = 3 points.
The weights in Figure 4 suggest the need to weight A+ grades differently than A grades. The gaps between A, B, and C are nearly equal, and since these are the most numerous grades, this explains why the usual 0 to 4 scale has a nearly normal distribution. The results suggest that there's not much difference between C and D grades, however.
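A common hedge against that sensitivity is to restart a local optimizer from several starting points and keep the best result. The sketch below illustrates the idea and is not the optimization engine used for Figure 4: the objective (squared skewness plus squared excess kurtosis of the GPAs), the coordinate search, the function names, and the synthetic transcripts are all assumptions made for illustration.

```python
import random
from statistics import fmean


def shape_penalty(values):
    """Squared skewness plus squared excess kurtosis (0 for a normal shape)."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    if m2 == 0:
        return float("inf")  # all GPAs identical: nothing to assess
    skew = fmean((v - m) ** 3 for v in values) / m2 ** 1.5
    excess_kurt = fmean((v - m) ** 4 for v in values) / m2 ** 2 - 3.0
    return skew ** 2 + excess_kurt ** 2


def objective(free_weights, transcripts):
    """Penalty for GPAs with F = 0 and A+ = 4 fixed and D, C, B, A free."""
    points = [0.0, *free_weights, 4.0]  # index 0 = F ... index 5 = A+
    gpas = [fmean(points[g] for g in grades) for grades in transcripts]
    return shape_penalty(gpas)


def coordinate_search(f, start, step=0.5, tol=1e-3, max_sweeps=500):
    """Derivative-free local search: nudge one weight at a time."""
    x, best = list(start), f(start)
    for _ in range(max_sweeps):
        if step <= tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] = min(4.0, max(0.0, trial[i] + delta))
                value = f(trial)
                if value < best:
                    x, best, improved = trial, value, True
        if not improved:
            step /= 2  # refine only when no neighbor is better
    return x, best


def multi_start(f, starts):
    """Run the local search from every start and keep the lowest penalty."""
    return min((coordinate_search(f, s) for s in starts), key=lambda r: r[1])


# Synthetic transcripts: 150 students, 8 grades each, coded 0 (F) to 5 (A+).
rng = random.Random(0)
transcripts = [rng.choices(range(6), weights=[1, 2, 5, 9, 8, 3], k=8)
               for _ in range(150)]


def f(w):
    return objective(w, transcripts)


# The first start is the D = C = B = A = 3 guess; the rest are random.
starts = [[3.0, 3.0, 3.0, 3.0]]
starts += [[rng.uniform(0, 4) for _ in range(4)] for _ in range(3)]
weights, penalty = multi_start(f, starts)
print([round(w, 2) for w in weights], round(penalty, 4))
```

Including the original guess among the starts guarantees the multi-start result is never worse than the single-start one, at the cost of a few extra optimizer runs.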
Results from the difficulty-adjusted weights in Figure 4 compare favorably to the latent-variable approach in Figure 2:
Difficulty-adjusted weights (Figure 4): F = 0, D = 1.2, C = 1.3, B = 1.9, A = 2.6, and A+ = 4.
Latent-variable approach (Figure 2): F = 0, D = 0.6, C = 1.2, B = 1.9, A = 2.8, and A+ = 4.
The only significant difference between the two is the weight for the D grade.
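The grade-by-grade gaps between the two weight sets listed above can be checked with a few lines of arithmetic. The variable names are only for illustration, and the values are copied from the two lists in the order they appear.

```python
grades = ["F", "D", "C", "B", "A", "A+"]
first_listed = [0.0, 1.2, 1.3, 1.9, 2.6, 4.0]
second_listed = [0.0, 0.6, 1.2, 1.9, 2.8, 4.0]

# Absolute difference in grade points for each letter grade.
gaps = {g: round(abs(a - b), 2)
        for g, a, b in zip(grades, first_listed, second_listed)}
print(gaps)
# {'F': 0.0, 'D': 0.6, 'C': 0.1, 'B': 0.0, 'A': 0.2, 'A+': 0.0}

largest = max(gaps, key=gaps.get)
print(largest)  # D
```

The D gap of 0.6 points is three times the next largest gap (0.2 for the A), while the other grades agree to within 0.1.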
This analysis is predicated on the assumption that student abilities measured by GPA should be normally distributed. Selection effects during the admissions process could undermine the assumption of a symmetric distribution. However, the integer-weighted GPA distribution is close enough to normal that the assumption seems reasonable. Additionally, the fact that the optimal grade points end up in the right order (A > B, etc.) is a sign of validity.
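Both checks mentioned here can be made concrete. The sketch below is one possible way to do it, not the measure used in this analysis: it scores closeness to normality by the correlation between sorted values and normal quantiles (the quantity behind a normal Q-Q plot), and it tests whether a set of grade points is strictly increasing. The function names and the choice of Blom plotting positions are assumptions.

```python
from statistics import NormalDist, fmean


def qq_correlation(values):
    """Correlation of sorted values with normal quantiles (near 1 = normal-like)."""
    xs = sorted(values)
    n = len(xs)
    std = NormalDist()
    # Blom plotting positions give the expected normal quantile for each rank.
    qs = [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mq = fmean(xs), fmean(qs)
    cov = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_q = sum((q - mq) ** 2 for q in qs)
    return cov / (var_x * var_q) ** 0.5


def in_grade_order(points):
    """True if the grade points increase strictly from lowest to highest grade."""
    return all(a < b for a, b in zip(points, points[1:]))


# Grade points listed lowest grade first (F, D, C, B, A, A+).
print(in_grade_order([0.0, 0.6, 1.2, 1.9, 2.8, 4.0]))
```

Applied to a list of student GPAs, `qq_correlation` gives a single number for comparing how normal-like the integer-weighted and reweighted distributions are.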