Exploring statistics: Detecting objects in Images

I wanted to identify moving objects in video (or still) images - preferably using R as that's my tool of choice. I thought this might be challenging, but it proved to be relatively easy.

The basic steps are:

Import the images from the video, discard the 'empty' ones and save the interesting ones. [R, with ImageMagick]
Run the interesting images through a neural net to locate and identify objects. Save the locations for some later stage of the project. [R, yolo]
Stitch the results back together. [R, ImageMagick]

There's an example here, applied to video from a 'camera trap' in the garden.

The rest of the blog gives the full details on how this is done.

1. Import and Filter Images

I'm using the R packages tidyr, dplyr, magick and image.darknet.

We want to save just those images from the video where there is movement (change from one to the next), but to maintain the image index which indicates the original frame ID (and hence the time).

On my machine, yolo is fast - about 1.5 seconds per image - but filtering out images with no movement in them using 'image_compare' is about 40 times faster, so well worth the trouble.

1a. Faff around with names and directories

mov_dir <- "image/movies/" # for output

mov <- "~/Movies/Garden/2020Q4/12240009 dog.mp4"

fps <- 5 # use a low value, eg 5 during testing then up to the full frames-per-second later

no_change_threshold <- 1000

fuzz_threshold <- 15 # use 10 for night vision, 15 otherwise

mov_img_name <- stringr::str_remove_all(mov, "(.*/)|(.mov)|(.mp4)|(.MOV") %>%

stringr::str_replace_all(" ", "_") %>%

stringr::str_c(paste0("_", fps, "fps"))

mov_img_dir <- paste0(mov_dir, mov_img_name)

dir.create(mov_img_dir, showWarnings = FALSE)

1b. Read in the images from the video

imgs <- image_read_video(mov, fps = fps)

1c. Nothing is moving if two images are the same

#check for no movement

system.time(

diff <- unlist(lapply(1:(length(imgs) - 1),

function(x) {image_compare_dist(imgs[x], imgs[x+1],

metric = "AE",

fuzz = fuzz_threshold)}))

)

# apply threshold

diff <- (diff < no_change_threshold)

# if TRUE, then apply to image and the next one

keep_movement <- !(c(diff, FALSE) | c(FALSE, diff))

1d. Save the interesting ones

# 5 digits should be find unless you're making very long videos

z <- lapply(1:length(imgs),

function(x){

if (keep_movement[x])

image_write(imgs[x],

file.path(mov_img_dir,

paste0(mov_img_name, "_frame",

stringr::str_pad(x, 5, pad = "0"),

".jpg")),

format = "jpg", quality = 50)

}

)

2. Run the neural net

I'm using Joseph Redmon's YOLO because it's super quick, and it's available through the 'image.darknet' package in R, from BNOSAC.

In fact, I wanted to extract the coordinates of the rectangles from YOLO. This isn't built in to 'image.darknet', perhaps because YOLO - while very impressive - has pushed for speed over precision rectangles. Whether there's really enough precision for my end purpose remains to be seen, but in any case, I forked it to make a very small amendment: adding the rectangle coordinates to the output, using the advice of Brian O'Donnell.

If you prefer to run the 'official' version (on github, but not yet on CRAN) then it will work, but you won't be able run xxx successfully, to parse the log.

The steps for running yolo owe a lot to Duncan Golicher and Tomasz Weiss. I've tweaked the parsing of the log quite a lot, to pick up details of the image - and there's more faffing around with filenames and directories, because I want the code to manage the files in a tidy fashion.

2a. Install the package

# original version...

# devtools::install_github("bnosac/image", subdir = "image.darknet", build_vignettes = TRUE)

# my version with added reporting...

# devtools::install_github("david6marsh/image", subdir = "image.darknet", build_vignettes = TRUE)

2b. Setup the neural net

detect <- image_darknet_model(type = 'detect',

model = 'tiny-yolo-voc.cfg',

# model = 'yolo.cfg',

weights = system.file(package='image.darknet', 'models', 'tiny-yolo-voc.weights'),

labels = system.file(package='image.darknet', 'include', 'darknet', 'data', 'voc.names'))

# a function to apply yolo to a bunch of images

yolo <- function(x,

thresh = 0.3,

config = detect,

source_path = "image", pred_path = "pred"){

fl <- paste(source_path, x, sep="/")

pred <- image_darknet_detect(file = fl,

object = config,

threshold = thresh)

# output is png so save as png but in pred_path

file.rename("predictions.png",

paste0(pred_path, "/",

paste0(stringr::str_remove_all(x, "(.jpg)|(.png)|( )"), ".png")

))

pred

}

## Redirecting stdout to give text details of capture- from D Golcher

# redirect to file

Rcpp::cppFunction('void redir(){FILE* F=freopen("capture.txt","w+",stdout);}')

redir();

# and for flushing - to get the final text

Rcpp::cppFunction('void fflush(){fflush(stdout);}')

2c. Parse the log to extract details, into a data frame

yolo_to_df <- function(cap = "capture.txt",

path = "image"){

# Read in the output file

d <- data.frame(txt = unlist(readLines(cap))) %>%

mutate(id = row_number())

# split the 3 types of lines separately

path_ = paste0(path,"/")

#filenames

d_file <- d %>%

filter(grepl("/", txt)) %>%

separate(txt, into = c("path", "file", "time_taken_s"),

sep = paste0("(",

path_,

")|(: Predicted in )|( seconds.)"), extra = "drop") %>%

mutate(path = path)

d_box <- d %>%

filter(grepl("Boxes", txt)) %>%

separate(txt, into = c("b", "boxes", "sig_boxes", "other"),

sep = "(:)|(of which)|(above the threshold.)", extra = "drop") %>%

select(-b, -other)

d_boxes <- d %>%

filter(grepl("Bounding", txt)) %>%

separate(txt, into = c("object", "prob", "left", "top", "right", "bottom"),

sep = "(Bounding Box: Left=)|(:)|(, Top=)|(, Right=)|(, Bottom=)",

extra = "drop")

# recombine, and convert to correct types

df <- d_file %>%

bind_rows(d_box, d_boxes) %>%

arrange(id) %>%

fill(path:time_taken_s, .direction = "down") %>%

# grouping by time is strange, but serves as identifier of run

group_by(path, file, time_taken_s) %>%

mutate(id = min(id)) %>%

# fill in the gaps and compress

fill(boxes:sig_boxes, .direction = "updown") %>%

filter(sig_boxes == 0 | !is.na(object)) %>%

mutate(across(c(boxes, sig_boxes, left:bottom), as.integer)) %>%

mutate(time_taken_s = as.double(time_taken_s),

prob = as.numeric(sub("%", "", prob)) / 100) %>%

arrange(path, file, time_taken_s, desc(prob)) %>%

ungroup()

}

2d. Detect and label objects in the images

imgs <- c(dir(path=mov_img_dir, pattern="jpg"))

pred_img_dir <- stringr::str_replace(mov_img_dir, "image", "pred")

dir.create(pred_img_dir, recursive = TRUE, showWarnings = FALSE)

z <- file.remove("capture.txt")

redir()

system.time(

d <- lapply(imgs, yolo, thresh = 0.25,

source_path = mov_img_dir, pred_path = pred_img_dir)

)

fflush() # to get result from final image

2e. Parse the text results

# and save the results as a dataframe

df <- yolo_to_df("capture.txt", path = mov_img_dir)

save(df, file = paste0("data/boxes_",

stringr::str_replace(Sys.time(), " ", "_"),

".RDS"))

3. Stitch the images back together

pred_imgs <- paste0(pred_img_dir, "/",

stringr::str_replace(imgs, ".jpg", ".png"))

m_imgs <- image_read(pred_imgs) %>%

image_crop(geometry = "1280x670") # trim off the bottom - camera logo

image_write_video(m_imgs,

path = paste0(pred_img_dir, "/", "sheep_dog.mp4"),

framerate = 5)

Exploring statistics

Links to my Tableau public dashboards:

Monday, 4 January 2021

Detecting objects in Images - with R, yolo and ImageMagick