What term is used to refer to how likely a study can be replicated with the same results in sociology?

External validity is the extent to which you can generalize the findings of a study to other situations, people, settings and measures. In other words, can you apply the findings of your study to a broader context?

Table of Contents

Types of external validity
Population validity
Ecological validity
Trade-off between external and internal validity
Threats to external validity and how to counter them
How to counter threats to external validity
Frequently asked questions about external validity

The aim of scientific research is to produce generalizable knowledge about the real world. Without high external validity, you cannot apply results from the laboratory to other people or the real world.

In qualitative studies, external validity is referred to as transferability.

Types of external validity

There are two main types of external validity: population validity and ecological validity.

Population validity

Population validity refers to whether you can reasonably generalize the findings from your sample to a larger group of people (the population).

Population validity depends on the choice of population and on the extent to which the study sample mirrors that population. Non-probability sampling methods are often used for convenience. With this type of sampling, the generalizability of results is limited to populations that share similar characteristics with the sample.
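A minimal Python sketch of why this matters. It uses an entirely hypothetical population of 10,000 students, 30% of whom are STEM majors. As a simplifying assumption, the convenience sample recruits only STEM majors (for instance, by advertising in the science building). The simple random sample gives every student the same chance of selection. The names `population` and `stem_share`, and the 30% figure, are illustrative assumptions, not data.

```python
import random

# Hypothetical population of 10,000 undergraduates (invented numbers):
# 3,000 STEM majors and 7,000 students in other fields.
population = ["STEM"] * 3_000 + ["other"] * 7_000


def stem_share(sample):
    """Return the proportion of STEM majors in a sample."""
    return sum(1 for major in sample if major == "STEM") / len(sample)


random.seed(42)  # fixed seed so the sketch is reproducible

# Convenience sample (assumption): only STEM majors are reachable,
# e.g. because recruitment happens in the science building.
stem_students = [s for s in population if s == "STEM"]
convenience_sample = random.sample(stem_students, 200)

# Simple random sample: every student has the same chance of selection.
random_sample = random.sample(population, 200)

print(f"Population STEM share:         {stem_share(population):.2f}")          # 0.30
print(f"Convenience sample STEM share: {stem_share(convenience_sample):.2f}")  # 1.00
print(f"Random sample STEM share:      {stem_share(random_sample):.2f}")
```

The convenience sample says nothing about non-STEM students. The random sample's STEM share lands near the population value, though the exact figure varies with the seed.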

Example: low population validity
You want to test the hypothesis that people tend to perceive themselves as smarter than others in terms of academic abilities. Your target population is the 10,000 undergraduate students at your university.

You recruit over 200 participants. They are science and engineering majors; most of them are American, male, 18–20 years old and from a high socioeconomic background. In a laboratory setting, you administer a mathematics and science test and then ask them to rate how well they think they performed. You find that the average participant believes they are smarter than 66% of their peers.

Can you conclude that most people believe themselves to be much better than others at maths and science?

Here, your sample is not representative of the whole population of students at your university. The findings can only reasonably be generalized to populations that share characteristics with the participants, e.g. college-educated men and STEM majors.

For higher population validity, your sample would need to include people with different characteristics (e.g. women, non-binary people, and students from different majors, countries, and socioeconomic backgrounds).

Samples like this one, from Western, Educated, Industrialized, Rich and Democratic (WEIRD) countries, are used in an estimated 96% of psychology studies, even though they represent only 12% of the world’s population. Since they are outliers in terms of visual perception, moral reasoning and categorization (among many other topics), WEIRD samples limit broad population validity in the social sciences.

Ecological validity

Ecological validity refers to whether you can reasonably generalize the findings of a study to other situations and settings in the ‘real world’.

Example: low ecological validity
You want to test the hypothesis that driving reaction times become slower when people pay attention to others talking.

In a laboratory setting, you set up a simple computer-based task to measure reaction times. Participants are told to imagine themselves driving around a racetrack and to double-click the mouse whenever they see an orange cat on the screen. In one round, participants listen to a podcast. In the other round, they do not need to listen to anything. After assessing the results, you find that reaction times are much slower when participants listen to the podcast.

Can you conclude that driving reaction times are slower when people listen to others talking?

In the example above, it is difficult to generalize the findings to real-life driving conditions. A computer-based task using a mouse does not resemble driving a car with a steering wheel. Additionally, a static image of an orange cat may not represent the hazards drivers commonly encounter on real roads.

To improve ecological validity in a lab setting, you could use an immersive driving simulator with a steering wheel and foot pedals instead of a computer and mouse. This increases psychological realism by more closely mirroring the experience of driving in the real world.

Alternatively, for higher ecological validity, you could conduct the experiment using a real driving course.

Trade-off between external and internal validity

Internal validity is the extent to which you can be confident that the causal relationship established in your experiment cannot be explained by other factors.

There is an inherent trade-off between external and internal validity; the more applicable you make your study to a broader context, the less you can control extraneous factors in your study.

Internal vs. external validity example
In the driving reaction times study, you are able to control the conditions of the experiment and minimize extraneous factors that could explain the outcome. Because the experiment has high internal validity, you can confidently conclude that listening to the podcast causes slower reaction times.

Moving the experiment to a real-life driving course significantly increases external validity at the expense of internal validity. That’s because you risk introducing extraneous and confounding factors (e.g. weather or visibility conditions) that affect the outcome.


Threats to external validity and how to counter them

To produce a robust study, you need to recognize threats to external validity and counter them in your research design.

Research example
A researcher wants to test the hypothesis that people with clinical diagnoses of mental disorders can benefit from practicing mindfulness daily in just two months' time. They recruit people who have been diagnosed with depression for at least a year, are aged 20–29, and live locally.

Participants are given a pre-test and a post-test measuring how often they experienced anxiety in the past week. During the study, all participants receive individual mindfulness training and are asked to practice mindfulness daily for 15 minutes in the morning.

Since the levels of anxiety decreased between the pre- and post-test, the researcher concludes that all clinical populations can benefit from mindfulness.

Threats to external validity

Threat	Meaning	Example
Sampling bias	The sample is not representative of the population.	The sample includes only people with depression. They have characteristics (e.g., negative thought patterns) that may make them very different from other clinical populations, like people with personality disorders or schizophrenia.
History	An unrelated event influences the outcomes.	Right before the pre-test, a natural disaster takes place in a neighbouring state. As a result, pre-test anxiety scores are higher than they might be otherwise.
Experimenter effect	The characteristics or behaviors of the experimenter(s) unintentionally influence the outcomes.	The trainer of the mindfulness sessions unintentionally stresses the importance of this study for the research department's funding. As a result, participants work extra hard to reduce their anxiety levels during the study.
Hawthorne effect	The tendency for participants to change their behaviors simply because they know they are being studied.	The participants actively avoid anxiety-inducing situations for the period of the study because they are conscious of their participation in the research.
Testing effect	The administration of a pre- or post-test affects the outcomes.	Because participants become familiar with the pre-test format and questions, they are less anxious during the post-test and, as a result, report less anxiety over the past week.
Aptitude-treatment interaction	Characteristics of the group interact with other variables (e.g., the treatment) to influence the dependent variable.	Interactions between certain characteristics of the participants with depression (e.g., negative thought patterns) and the mindfulness exercises (e.g., focus on the present) improve anxiety levels. The findings are not replicated with people with personality disorders or schizophrenia.
Situation effect	Factors like the setting, time of day, location, researchers' characteristics, etc. limit generalizability of the findings.	The study is repeated with one change: the participants practice mindfulness at night rather than in the morning. The outcomes do not show any improvement this time.

How to counter threats to external validity

There are several ways to counter threats to external validity:

Replications counter almost all threats by enhancing generalizability to other settings, populations and conditions.
Field experiments counter testing and situation effects by using natural contexts.
Probability sampling counters selection bias by giving every member of the population a known, non-zero chance of being selected for the study sample (in simple random sampling, an equal chance).
Recalibration or reprocessing also counters selection bias by using algorithms to correct the weighting of factors (e.g., age) within study samples.
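One common form of reweighting is post-stratification. Each respondent is weighted by their group's share of the population divided by that group's share of the sample, so over-represented groups count for less. The Python sketch below is not a prescribed procedure. It assumes a hypothetical sample of 10 self-ratings that over-represents 18–20-year-olds and uses invented population shares. The function names are illustrative and do not come from any specific library.

```python
from collections import Counter

# Invented population shares for each age group (e.g. from enrolment records).
population_share = {"18-20": 0.40, "21-24": 0.45, "25+": 0.15}

# Hypothetical sample of (age group, self-rated percentile); 18-20s are over-represented.
sample = [
    ("18-20", 70), ("18-20", 68), ("18-20", 72),
    ("18-20", 66), ("18-20", 64), ("18-20", 70),
    ("21-24", 60), ("21-24", 58), ("21-24", 62),
    ("25+", 50),
]


def poststratification_weights(groups, population_share):
    """Weight per group = population share / sample share."""
    counts = Counter(groups)
    n = len(groups)
    return {g: population_share[g] / (counts[g] / n) for g in counts}


def weighted_mean(values, weights):
    """Mean of values, each counted in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


groups = [g for g, _ in sample]
scores = [s for _, s in sample]
weights_by_group = poststratification_weights(groups, population_share)
row_weights = [weights_by_group[g] for g in groups]

print(f"Unweighted mean: {sum(scores) / len(scores):.1f}")           # 64.0
print(f"Weighted mean:   {weighted_mean(scores, row_weights):.1f}")  # 61.8
```

Weighting only corrects for the factors you weight on (here, age group). It cannot fix differences in characteristics that were not measured, and it cannot recover a population group that is missing from the sample entirely.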