Definition
A scatter diagram is a plot of paired data points on X-Y axes that reveals whether two variables are related. One variable is plotted on the horizontal axis, the other on the vertical axis, and each pair of measurements becomes a single point on the chart. The resulting pattern of points shows whether the variables are correlated and, if so, whether the relationship is positive or negative and whether it is strong or weak.
In the QC tool sequence, the scatter diagram is the hypothesis-testing tool. The cause-and-effect diagram generates potential causes. The scatter diagram tests whether a suspected cause is actually related to the effect. Does furnace temperature correlate with hardness? Does humidity correlate with paint defects? Does machine speed correlate with dimensional variation?
Japanese Origin
散布図 (sanpu zu) combines:
- 散布 (sanpu) — scatter, disperse, spread out (散 = scatter, 布 = spread/distribute)
- 図 (zu) — diagram, chart
The name describes the visual result: data points scattered across the chart area. The pattern of the scatter tells the story.
History
Scatter plots originated in the statistical sciences of the 19th century. Francis Galton used scatter diagrams to study hereditary traits in the 1880s, and Karl Pearson developed the correlation coefficient to quantify the relationships scatter plots revealed.
Ishikawa included the scatter diagram in the 7 QC Tools because it provides a simple, visual way for frontline workers to test whether two variables are related — without requiring them to calculate correlation coefficients. The visual pattern is sufficient for practical decision-making on the shop floor.
At Toyota — Scatter diagrams appear in QC circle reports and A3 problem-solving activities where the team needs to test a hypothesis about what is causing a quality problem. After a cause-and-effect diagram identifies potential causes, the team collects paired data (the cause variable and the effect variable, recorded at the same time for each observation) and plots a scatter diagram to see whether the relationship exists.
How to Construct
- Identify the two variables — One is the suspected cause (plotted on the X axis), the other is the effect (plotted on the Y axis).
- Collect paired data — For each observation, record both variables simultaneously. Collect at least 30 pairs, and preferably 50, so that the pattern is reliable.
- Scale the axes — Choose scales that allow the full range of both variables to be displayed.
- Plot the points — Each pair of values becomes a single point on the chart.
- Observe the pattern — Look at the overall shape of the point cloud.
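The construction steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed method: the function names `axis_limits` and `ascii_scatter` are hypothetical, the 5% axis margin is an assumption, and the furnace temperature and hardness pairs are invented purely so there is something to plot. It draws a rough text chart; in practice graph paper or any plotting tool does the same job.

```python
def axis_limits(values, margin=0.05):
    """Return (low, high) covering every value plus a small margin on each side."""
    lo, hi = min(values), max(values)
    pad = (hi - lo) * margin or 1.0  # fall back to 1.0 if all values are equal
    return lo - pad, hi + pad


def ascii_scatter(pairs, width=40, height=12):
    """Plot (x, y) pairs as '*' characters on a text grid and return it as a string."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_lo, x_hi = axis_limits(xs)  # scale the axes to the full data range
    y_lo, y_hi = axis_limits(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in pairs:  # each pair becomes a single point
        col = int((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = int((y - y_lo) / (y_hi - y_lo) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the chart
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


# Invented data: 30 pairs of (furnace temperature, hardness), for illustration only.
pairs = [(800 + 5 * i, 50 + 0.4 * i + ((i * 7) % 5 - 2)) for i in range(30)]

print(ascii_scatter(pairs))
```

Because the invented hardness values rise with temperature, the points should drift from lower-left to upper-right, which is the shape to look for in the final step.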
How to Read a Scatter Diagram
Strong positive correlation — Points cluster along a line from lower-left to upper-right. As X increases, Y increases. Example: as furnace temperature increases, part hardness increases.
Strong negative correlation — Points cluster along a line from upper-left to lower-right. As X increases, Y decreases. Example: as tool wear increases, surface finish quality decreases.
Weak correlation — Points show a general trend but with wide scatter. The relationship exists but other factors are also significant.
No correlation — Points are scattered randomly across the chart with no discernible pattern. The data give no evidence that the two variables are related over the range observed. This is valuable information because it eliminates a suspected cause and redirects the investigation.
Non-linear relationship — Points follow a curve rather than a straight line. The variables are related, but not in a simple proportional way. Example: defect rate may decrease as operator experience increases, but the improvement rate slows after a certain experience level.
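When the visual reading needs a number behind it, the patterns above can be approximated with the Pearson correlation coefficient. The sketch below is an illustration under stated assumptions: `describe_pattern` is a hypothetical helper, and its cutoffs (0.7 for "strong", 0.3 for "weak") are arbitrary choices for this example rather than values from any standard. The coefficient measures only straight-line association, so a curved pattern can score near zero even though the variables are clearly related.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


def describe_pattern(r, strong=0.7, weak=0.3):
    """Map r to a coarse label. The cutoffs are assumptions made for this sketch."""
    if abs(r) < weak:
        return "no clear linear correlation"
    strength = "strong" if abs(r) >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


rising = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
falling = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x squared

print(describe_pattern(rising))
print(describe_pattern(falling))
print(describe_pattern(curved))  # a U-shaped curve gives r of zero
```

The last case is the reason to look at the chart first: the label says there is no linear correlation, yet the points lie exactly on a curve.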
Common Mistakes
Confusing correlation with causation. Two variables can be correlated without one causing the other. Both may be caused by a third variable. The scatter diagram shows correlation — proving causation requires further investigation, experimentation, or controlled tests.
Not collecting enough data pairs. A scatter diagram with 10 points can appear to show a relationship that is actually random. Collect at least 30 pairs, preferably 50, for a reliable visual pattern.
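A small simulation can make the sample-size point concrete. The sketch below is an illustration, not a formal power calculation: it draws two completely unrelated random variables many times and counts how often they appear at least moderately correlated. The function name `share_of_false_alarms`, the cutoff of |r| >= 0.5, the trial count, and the seed are all assumptions chosen for this example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def share_of_false_alarms(n_pairs, trials=2000, threshold=0.5, seed=1):
    """Fraction of trials in which two unrelated variables reach |r| >= threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n_pairs)]
        ys = [rng.gauss(0, 1) for _ in range(n_pairs)]  # drawn independently of xs
        if abs(pearson_r(xs, ys)) >= threshold:
            hits += 1
    return hits / trials


with_10_pairs = share_of_false_alarms(10)
with_50_pairs = share_of_false_alarms(50)
print(f"10 pairs: {with_10_pairs:.1%} of random samples look correlated")
print(f"50 pairs: {with_50_pairs:.1%} of random samples look correlated")
```

With 10 pairs, a noticeable share of purely random samples clears the cutoff; with 50 pairs, almost none do.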
Plotting data collected at different times without checking stability. If the process was unstable during data collection (out of control on a control chart), the scatter diagram may show a false relationship or mask a real one. Verify process stability first.
Not stratifying. Mixing data from different conditions (machines, shifts, materials) can create misleading patterns. A scatter diagram may show no correlation in mixed data, when stratification would reveal a strong correlation within each group (or vice versa).
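The effect of stratification can be shown with a small synthetic data set. In the sketch below, the two machines, their readings, and the helper name `r_by_group` are all invented for illustration. The numbers are deliberately constructed so that the relationship is perfect within each machine but cancels out when the two machines are mixed.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_by_group(records):
    """Compute r separately for each group label in (group, x, y) records."""
    groups = defaultdict(list)
    for group, x, y in records:
        groups[group].append((x, y))
    return {
        group: pearson_r([x for x, _ in pts], [y for _, y in pts])
        for group, pts in groups.items()
    }


# Synthetic readings: machine A runs at low settings with a high offset,
# machine B runs at high settings with no offset.
records = [("A", x, 10 * x + 66) for x in range(1, 6)]
records += [("B", x, 10 * x) for x in range(6, 11)]

mixed = pearson_r([x for _, x, _ in records], [y for _, _, y in records])
stratified = r_by_group(records)

print(f"all machines mixed: r = {mixed:.2f}")
for machine, r in sorted(stratified.items()):
    print(f"machine {machine} only: r = {r:.2f}")
```

Real shop-floor data will rarely be this clean, but the same check applies: plot or compute each machine, shift, or material lot separately before concluding that no relationship exists.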
Over-interpreting weak patterns. A slight visual trend in a noisy scatter plot may not represent a real relationship. When the pattern is ambiguous, calculate the correlation coefficient for a quantitative assessment, or collect more data.