## Thursday, April 22, 2010

### Validity and Measurement

My wife is in charge of an "applied research" center here at the university, and ordered some books on research to satisfy her curiosity about how the other half lives (she's a lit geek).  I browsed them, looking for definitions of validity and measurement.

From The Research Methods Knowledge Base by William M. K. Trochim and James P. Donnelly:

"When people think about validity in research, they tend to think in terms of research components.  You might say that a measure is a valid one, that a valid sample was drawn, or that the design had strong validity, but all of those statements are technically incorrect.  Measures, samples, and designs don't have validity--only propositions can be said to be valid.  Technically, you should say that a measure leads to valid conclusions or that a sample enables valid inferences, and so on.  It is a proposition, inference, or conclusion that can have validity."

This is the usual definition, which most people seem to ignore.  In my experience, people in assessment land and in psychology routinely say things like "use a valid measurement" in casual conversation.  More problematic is the usual linkage between validity and reliability.  Reliability is repeatability of results, or as Wikipedia puts it nicely, "the consistency of a set of measurements or measuring instrument [...]"

Do you notice anything illogical here?  If validity is about propositions (about some instrument), and reliability is a property of the test itself, we can't justify the common assertion that validity requires reliability.  It's like saying that a thermometer needs someone to read it correctly--there are two issues conflated.

Example:  Jim Bob steps onto the balcony of his hotel room on the first evening of an all-expenses-paid trip to Paris, his prize for winning the Bugtussel Bowling Championship.  He looks at the thermometer nailed to the door frame, and is amazed to see that it is 10 degrees.  It's chilly out, but he didn't think it was that cold!

Clearly Jim Bob mistook Celsius for Fahrenheit, and reached an invalid conclusion.  This has nothing to do with the reliability of the thermometer.  You may rightfully object that of course we can always reach invalid conclusions--the trick is to find valid ones, and for that we require reliable instruments.  Not so.
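
The arithmetic behind Jim Bob's mistake can be sketched in a few lines of Python. This is only an illustration: the function and variable names are made up here, and the only fact used is the standard Celsius-to-Fahrenheit conversion. The thermometer reports the same number either way; only the interpretation changes.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius reading to Fahrenheit."""
    return celsius * 9 / 5 + 32


# The thermometer shows the same number every time Jim Bob looks at it.
reading = 10.0

# Two interpretations of that one reading, both expressed in Fahrenheit.
as_jim_bob_reads_it = reading  # he assumes the scale is Fahrenheit
as_the_thermometer_means_it = celsius_to_fahrenheit(reading)  # scale is Celsius

print(as_jim_bob_reads_it)          # 10.0
print(as_the_thermometer_means_it)  # 50.0
```

The instrument did nothing wrong in either case; the 40-degree gap lives entirely in the proposition Jim Bob attached to the reading.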

Example: A standardized test is given to 145 students.  The assessment director gets a list of the students who took the test.  Is this list valid?  Yes.  In what sense is it reliable?

Here, the proposition is valid--it reflects reality--but isn't reliable longitudinally.  It's reliable in the trivial sense that if we look at the same instance of testing, we'd see the same roster of students, but almost anything is reliable by that standard.  The next time we give the instrument, we won't have the same roster.  Does this unreliability mean that the roster is invalid?  Of course not.

If reliability is a sine qua non for validity, then no unique observation can be valid.  Your impressions and conclusions about a movie on first viewing, or about a first date, are invalid because they can't be repeated.  When you read a book you really like, finish it and tell your friend "I enjoyed it," this is invalid according to testing standards.  You would have to first repeatedly read it for the first time and then assess the resulting statistics.

To underline the illogic of validity => reliability, consider the proposition "The inter-rater statistics on this method of assessment show it to be unreliable."  This is a common enough thing.  So how would we evaluate the validity of that statement?  If it is true that validity requires reliability, there must be an underlying reliability that is a prerequisite for the validity of the conclusion.  Does that mean that we have to show that the inter-rater statistics are reliably unreliable?  That makes no sense: one instance of unreliability is all that is required to demonstrate unreliability.  A drunk driver may be only occasionally unreliable, but still absolutely unreliable, right?
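
For concreteness, here is a minimal sketch of one common inter-rater statistic, Cohen's kappa, computed on a single set of ratings. The post does not say which statistic is meant, so the choice of kappa is an assumption, and the ratings below are invented purely for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight student papers by two raters.
rater_a = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "pass", "fail", "fail", "pass"]

kappa = cohens_kappa(rater_a, rater_b)
print(kappa)  # 0.0, i.e. agreement no better than chance
```

The statement "these two raters did not agree beyond chance on these eight papers" follows from this one calculation. Nobody has to re-run the rating session to make that statement valid.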

If I make the statement "no cow is ever brown," its validity can be negated by a single instance of a brown cow.  I don't have to be able to reliably find brown cows, just one.  Yes, I have to be sure that the cow I saw really was brown, but this is a very low bar for reliability, and not what we're talking about.  Therefore, some kinds of propositions do not require reliability in order to be valid.
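
The logic of the brown cow is just the logic of a counterexample, and it can be sketched directly. The function name and the list of observed colors are hypothetical, chosen only to mirror the example above.

```python
def refutes_no_cow_is_brown(observed_colors):
    """Return True if any single observation contradicts 'no cow is ever brown'."""
    return any(color == "brown" for color in observed_colors)


# Hypothetical observations: only one brown cow among them.
herd = ["black", "white", "brown", "black"]

print(refutes_no_cow_is_brown(herd))                # True
print(refutes_no_cow_is_brown(["black", "white"]))  # False
```

How often brown shows up in the list never enters the calculation; a single occurrence settles the matter.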

If I hear on the news that the Dow went up 15 points today, how should I evaluate the validity of this statement?  I could check other sources, but this will just establish the one fact.  I cannot repeat the day over and over to see if the Dow repeats its performance each time.  There is no way to establish longitudinal reliability.  Does that negate the validity of the statement?

The role of reliability is to allow us to make an inductive leap: if X has happened consistently in the past, maybe X is a feature of the universe, and will continue to happen.  Every time we eat a hamburger (or anything else), we make such an inductive leap.  Sometimes we have to be really clever to find out where the reliable parts are--like measuring gravity.  So rather than a strong requirement about reliability and validity, we should say something like "we assume that reliability implies something persistent about the objective reality of the subject."

So, on to measurement.  I looked up that chapter in the book and found this:

"Measurement is the process of observing and recording the observations that are collected as part of a research effort." (p. 56)

Contrast that with the nice Wikipedia definition:

"In science, measurement is the process of obtaining the magnitude of a quantity, such as length or mass, relative to a unit of measurement, such as a meter or a kilogram."

From dictionary.com, a measurement is the "extent, size, etc., ascertained by measuring".

The scientific definitions lend themselves to units and physical dimensions.  The first definition, from the research methods book, is much more general.  Let's parse it.  There are three parts.  First, measurement is a process of observing.  Then we note that we should record the observation.  I would argue that this provision is unnecessary--we can only talk about a datum if it's recorded somewhere, even if only in our minds.  If it's not recorded, it's nowhere to be found and irrelevant.  We can, however, glean from this that 'measurement' does double duty: as a verb of sorts (the act of measuring) and as a noun (the result).

The last provision--that it must be part of a research effort--can also be discarded.  The purpose of the observer may not be known when the measurements are later used.  Perhaps an ancient astronomer made star charts for religious reasons.  Does that negate them as measurements, solely for that reason?  No.  So we are left simply with the first part: a measurement is a process of observing (verb), or the product of the same (noun).  This is much weaker than the scientific version, because we aren't tied to standard units of measurement.

My main objection to this is that there's no need to use another word.  If we mean observation, why don't we just say observation?

Edits: Fixed a vanished sentence fragment, and crossed out 'dogma', which is a silly word to use.  I hadn't had enough coffee yet.