When comparing the mean weights of primary school students in a public and a private school, one might hypothesize that students attending government schools are nutritionally poorer and therefore have lower body weights. The null hypothesis states that there is no difference in students' weights between the two schools. Because of sampling error, a study can show a difference even when none truly exists. This is known as a type 1 error, and by convention its probability is fixed at 5% or below (the p value is the probability of seeing a difference at least this large by chance alone if the null hypothesis were true). On the other hand, a type 2 error is when we fail to observe a difference when there really is one, due to, say, an inadequate sample size. [1]
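The two error types can be made concrete with a small simulation. The sketch below is only an illustration under assumed numbers: the mean weight of 30 kg, the standard deviation of 5 kg, the 2 kg true difference, and the helper names `two_sample_p_value` and `rejection_rate` are all hypothetical and do not come from the study described above. It uses a normal approximation for the p value rather than an exact t-test.

```python
import math
import random


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


def two_sample_p_value(a, b):
    """Two-sided p value for a difference in means (normal approximation)."""
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    z = (mean_a - mean_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(true_diff, n, trials=1000, alpha=0.05, seed=0):
    """Share of simulated studies that reject the null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Assumed weights in kg: public school lower by true_diff, SD 5 kg.
        public = [rng.gauss(30.0 - true_diff, 5.0) for _ in range(n)]
        private = [rng.gauss(30.0, 5.0) for _ in range(n)]
        if two_sample_p_value(public, private) < alpha:
            rejections += 1
    return rejections / trials


# No true difference: every rejection is a type 1 error.
type1_rate = rejection_rate(true_diff=0.0, n=50)

# A true 2 kg difference: every non-rejection is a type 2 error.
type2_small_sample = 1 - rejection_rate(true_diff=2.0, n=20)
type2_large_sample = 1 - rejection_rate(true_diff=2.0, n=200)

print("type 1 error rate:", type1_rate)
print("type 2 error rate, n=20 per school:", type2_small_sample)
print("type 2 error rate, n=200 per school:", type2_large_sample)
```

With no true difference, the rejection rate should sit close to the chosen 5% level, while the type 2 error rate should shrink as the sample per school grows.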
As you can see, even when subjects' exposures and outcomes are accurately categorized, bias can be introduced through differential selection or retention. The estimate of association can also be biased if subjects are wrongly classified with regard to outcome or exposure. This is called misclassification. Depending on the mechanism responsible for these mistakes, the misclassification can be either differential or non-differential. Ken Rothman distinguishes these as follows:
Image #2
Misclassification of outcomes can introduce bias in a study. However, it is usually less of a problem than misclassification of exposure. First of all, misclassification most often involves exposure status, because exposures are more challenging to categorize and assess. While we often label people as smokers or non-smokers, what exactly do those terms mean? It is important to look at how much they smoked, for how long, when and how often, and whether they have ever been exposed to smoke from the environment. As you can see, exposure can be misclassified through a variety of mechanisms. By contrast, the majority of outcomes are definitive, and few mechanisms can introduce error in their classification.
Image #3
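To see how exposure misclassification can distort an association, here is a worked sketch using expected counts in a case-control table. Everything in it is assumed for illustration: the true counts, the sensitivity and specificity values, and the helper names `odds_ratio` and `misclassify` are hypothetical. Non-differential means the same error rates apply to cases and controls. Differential means they differ, here with cases assumed to report exposure more completely than controls.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio for a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure classification."""
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed


# Hypothetical true counts (exposed, unexposed).
true_cases = (200, 300)
true_controls = (100, 400)
true_or = odds_ratio(*true_cases, *true_controls)

# Non-differential: same sensitivity and specificity in cases and controls.
cases_nd = misclassify(*true_cases, sensitivity=0.8, specificity=0.9)
controls_nd = misclassify(*true_controls, sensitivity=0.8, specificity=0.9)
nondifferential_or = odds_ratio(*cases_nd, *controls_nd)

# Differential: cases are assumed to report exposure more completely.
cases_d = misclassify(*true_cases, sensitivity=0.95, specificity=0.9)
controls_d = misclassify(*true_controls, sensitivity=0.7, specificity=0.9)
differential_or = odds_ratio(*cases_d, *controls_d)

print("true OR:", round(true_or, 2))
print("OR with non-differential misclassification:", round(nondifferential_or, 2))
print("OR with differential misclassification:", round(differential_or, 2))
```

With these particular numbers, the non-differential errors pull the odds ratio toward 1, while the differential errors push it above the true value. Other choices of sensitivity and specificity could move the differential estimate in either direction.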
Let's say I measure a variable X with error as X' and then put people into categories based on their value of X'. People whose true value of X is close to the top of a category are more likely to be misclassified into the next higher category than people whose value is close to the middle of the category. People whose value is close to the bottom are more likely to be misclassified into the next lower category. Now let's say that X predicts a FUTURE outcome, with risk rising as X rises. Then people with a value close to the top of the category are more likely to develop the outcome than people with a value close to the bottom. So people near the top of the category are both more likely to be misclassified upward and more likely to develop the outcome. Even though the measurement error was not differential, the resulting misclassification is differential, and the resulting bias does not always point towards the null. This can happen even in prospective studies where X was measured at baseline, before the outcome occurred.
Image #4
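The argument about categorizing a mismeasured variable can be checked with a small simulation. This is a rough sketch under assumed settings: the cutpoints at -0.5 and 0.5, the error standard deviation of 0.3, the logistic risk model, and the function names `category`, `simulate` and `share` are all hypothetical. The measurement error is drawn independently of the outcome, so it is non-differential by construction. The sketch then looks only at people whose true X lies in the middle category and asks how they were classified by X', separately for those who did and did not develop the outcome.

```python
import math
import random


def category(value, cutpoints=(-0.5, 0.5)):
    """Return 0 (low), 1 (middle) or 2 (high) for a continuous value."""
    return sum(value >= c for c in cutpoints)


def simulate(n=100_000, error_sd=0.3, seed=1):
    """Count observed categories among people truly in the middle category."""
    rng = random.Random(seed)
    # counts[outcome][observed category], restricted to true category 1.
    counts = {0: [0, 0, 0], 1: [0, 0, 0]}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                # true exposure X
        x_obs = x + rng.gauss(0.0, error_sd)   # measured X', error ignores outcome
        risk = 1.0 / (1.0 + math.exp(-(-1.0 + 1.5 * x)))  # risk rises with true X
        outcome = 1 if rng.random() < risk else 0
        if category(x) == 1:
            counts[outcome][category(x_obs)] += 1
    return counts


counts = simulate()


def share(outcome, observed):
    """Share of true-middle people with this outcome placed in `observed`."""
    return counts[outcome][observed] / sum(counts[outcome])


up_with_outcome = share(1, 2)
up_without_outcome = share(0, 2)
down_with_outcome = share(1, 0)
down_without_outcome = share(0, 0)

print("moved up, outcome vs no outcome:", up_with_outcome, up_without_outcome)
print("moved down, outcome vs no outcome:", down_with_outcome, down_without_outcome)
```

In this setup, people who go on to develop the outcome should be moved into the higher category more often, and into the lower category less often, than people who do not. The classification error therefore depends on outcome status even though the underlying measurement error does not.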

