Linking to ImageMagick 6.9.12.98
Enabled features: fontconfig, freetype, fftw, heic, lcms, pango, raw, webp, x11
Disabled features: cairo, ghostscript, rsvg
Using 4 threads
img <- image_read("bird.jpg")2026-03-11
2026-03-11
Original Japanese version: Rで大津の二値化を行う方法
Otsu’s thresholding is one method for binarizing grayscale images in image processing. It binarizes an image by analyzing the image histogram and automatically determining the optimal threshold. Otsu’s thresholding is especially effective when image contrast is high, and it can clearly separate background and foreground.
Because Otsu’s thresholding is simple yet effective, it is widely used in image processing.
I remember being impressed when I first learned about Otsu’s thresholding. It is simple, effective, and beautiful. It is still one of the methods I use often.
The basic idea is to calculate within-class variance and between-class variance for all possible thresholds, then choose the threshold that maximizes between-class variance. The classes refer to the two classes in the binarized image: black and white. The threshold that maximizes between-class variance is the point that divides the image histogram most effectively, and the idea is that it separates background and foreground most clearly.
The classes may be called class 0 and class 1, or background class and foreground class.
A very simple summary of the flow is:
Within-class variance, between-class variance, and total variance are important concepts for analyzing image histograms. These variances have the following relationship.
\[ \sigma_T^2 = \sigma_W^2 + \sigma_B^2 \]
Here, \(\sigma_T^2\) represents total variance, \(\sigma_W^2\) represents within-class variance, and \(\sigma_B^2\) represents between-class variance.
Therefore, maximizing between-class variance means minimizing within-class variance. It can also be understood as making each class tightly grouped.
The image to be binarized here is the following.

This is a photo of a budgerigar that I used to keep at home. Her name was Aiko. She was very cute.
If you use an image processing package such as imager, Otsu’s thresholding can be implemented easily in R. However, to understand the algorithm, I will implement it manually here.
First, load the image. Manually implementing image loading is difficult, so I use the magick package for everything except the thresholding itself.
magick
magick is a package for image processing in R. It provides various functions for loading, converting, and editing images. magick itself is not originally an R image processing library; it is an R binding for ImageMagick, an image processing library written in C and C++.
Linking to ImageMagick 6.9.12.98
Enabled features: fontconfig, freetype, fftw, heic, lcms, pango, raw, webp, x11
Disabled features: cairo, ghostscript, rsvg
Using 4 threads
img <- image_read("bird.jpg")Use the image_convert() function to convert the image to grayscale.
img_gray <- image_convert(img, colorspace = "gray")
plot(img_gray)
Next, obtain the pixel values of the grayscale image.
dat <- image_data(img_gray, channels = "gray") # obtain pixel values
vals <- as.integer(dat) # convert pixel values to integersSummarize the frequency of pixel values as a histogram. Because pixel values range from 0 to 255, prepare 256 bins.
h <- hist(vals, breaks = (-0.5):(255.5))
Implement the Otsu thresholding algorithm. Otsu’s thresholding calculates within-class variance and between-class variance for all possible thresholds and selects the threshold that maximizes between-class variance.
The following code calculates within-class variance and between-class variance while changing the threshold from 0 to 255.
list_otsu <- lapply(seq_along(h$counts), function(i) {
t <- i - 1 # 0..L-1
p <- h$counts / sum(h$counts) # convert histogram to probability distribution
L <- length(p)
idx0 <- if (t >= 0) 0:t else integer(0) # pixel-value indices, zero-based
idx1 <- if (t + 1 <= L - 1) (t + 1):(L - 1) else integer(0)
# p is one-based in R, so add 1 when indexing
omega_0 <- sum(p[idx0 + 1])
omega_1 <- 1 - omega_0
# total mean
mu_t <- sum((0:(L - 1)) * p)
# class means; do not calculate for empty classes
mu_0 <- if (omega_0 > 0) sum(idx0 * p[idx0 + 1]) / omega_0 else 0
mu_1 <- if (omega_1 > 0) sum(idx1 * p[idx1 + 1]) / omega_1 else 0
# within-class variance; empty classes contribute 0
var_0 <- if (omega_0 > 0) sum((idx0 - mu_0)^2 * p[idx0 + 1]) / omega_0 else 0
var_1 <- if (omega_1 > 0) sum((idx1 - mu_1)^2 * p[idx1 + 1]) / omega_1 else 0
var_w <- omega_0 * var_0 + omega_1 * var_1
# between-class variance; terms for empty classes are 0
term0 <- if (omega_0 > 0) omega_0 * (mu_0 - mu_t)^2 else 0
term1 <- if (omega_1 > 0) omega_1 * (mu_1 - mu_t)^2 else 0
var_b <- term0 + term1
# total variance
var_T <- sum(p * ((0:(L - 1)) - mu_t)^2)
c(
t = t,
omega_0 = omega_0, # probability of class 0
omega_1 = omega_1, # probability of class 1
mu_0 = mu_0, # mean of class 0
mu_1 = mu_1, # mean of class 1
mu_t = mu_t, # total mean
var_0 = var_0, # variance of class 0
var_1 = var_1, # variance of class 1
var_w = var_w, # within-class variance
var_b = var_b, # between-class variance
var_T = var_T # total variance
)
})
df_otsu <- as.data.frame(do.call(rbind, list_otsu))
head(df_otsu) t omega_0 omega_1 mu_0 mu_1 mu_t var_0 var_1 var_w var_b
1 0 0 1 0 191.9755 191.9755 0 4692.817 4692.817 0
2 1 0 1 0 191.9755 191.9755 0 4692.817 4692.817 0
3 2 0 1 0 191.9755 191.9755 0 4692.817 4692.817 0
4 3 0 1 0 191.9755 191.9755 0 4692.817 4692.817 0
5 4 0 1 0 191.9755 191.9755 0 4692.817 4692.817 0
6 5 0 1 0 191.9755 191.9755 0 4692.817 4692.817 0
var_T
1 4692.817
2 4692.817
3 4692.817
4 4692.817
5 4692.817
6 4692.817
The optimal threshold is selected at the point where between-class variance is maximized.
t_opt <- df_otsu$t[which.max(df_otsu$var_b)]
t_opt[1] 164
For this image, the optimal threshold is 164.
To check visually, draw graphs of within-class variance and between-class variance.
plot(
df_otsu$t,
df_otsu$var_w,
ylim = c(0, max(df_otsu$var_w, df_otsu$var_b) * 1.1),
type = "l",
xlab = "Threshold",
ylab = "Within-class variance",
col = 2,
lwd = 3
)
lines(
df_otsu$t,
df_otsu$var_b,
type = "l",
xlab = "Threshold",
ylab = "Between-class variance",
col = 4,
lwd = 3
)
abline(v = t_opt, col = adjustcolor(7, alpha.f = 0.7), lwd = 3)
legend(
"topleft",
legend = c(
"Within-class variance",
"Between-class variance",
"Optimal threshold"
),
col = c(2, 4, 7),
lwd = 3,
bg = adjustcolor("white", alpha.f = 0.7)
)
You can see that the point where within-class variance is minimized and between-class variance is maximized is the optimal threshold.
Now binarize the image with this threshold.

The binarization looks quite clean.