How to use fastai to train an image sequence to image sequence job.
 

Fix an issue on fastai 2.3, this PR should fix it

Learner.one_batch[source]

Learner.one_batch(i, b)

UCF101 Action Recognition

UCF101 is an action recognition data set of realistic action videos, collected from YouTube, having 101 action categories. This data set is an extension of UCF50 data set which has 50 action categories.

"With 13320 videos from 101 action categories, UCF101 gives the largest diversity in terms of actions and with the presence of large variations in camera motion, object appearance and pose, object scale, viewpoint, cluttered background, illumination conditions, etc, it is the most challenging data set to date. As most of the available action recognition data sets are not realistic and are staged by actors, UCF101 aims to encourage further research into action recognition by learning and exploring new realistic action categories"

setup

We have to download the UCF101 dataset from their website. It is a big dataset (6.5GB), if your connection is slow you may want to do this at night or in a terminal (to avoid blocking the notebook). fastai's untar_data is not capable of downloading this dataset, so we will use wget and then unrar the files using rarfile.

fastai's datasets are located inside ~/.fastai/archive, we will download UFC101 there.

# !wget -P ~/.fastai/archive/ --no-check-certificate  https://www.crcv.ucf.edu/data/UCF101/UCF101.rar 

you can run this command on a terminal to avoid blocking the notebook

Let's make a function tounrar the downloaded dataset. This function is very similar to untar_data, but handles .rar files.

unrar[source]

unrar(fname, dest)

Extract fname to dest using rarfile

To be consistent, we will extract UCF dataset in ~/.fasta/data. This is where fastai stores decompressed datasets.

ucf_fname = Path.home()/'.fastai/archive/UCF101.rar'
dest = Path.home()/'.fastai/data/UCF101'

unraring a large file like this one is very slow.

path = unrar(ucf_fname, dest)
extracting to: /home/tcapelle/.fastai/data/UCF101

The file structure of the dataset after extraction is one folder per action:

path.ls()
(#101) [Path('/home/tcapelle/.fastai/data/UCF101/Hammering'),Path('/home/tcapelle/.fastai/data/UCF101/HandstandPushups'),Path('/home/tcapelle/.fastai/data/UCF101/HorseRace'),Path('/home/tcapelle/.fastai/data/UCF101/FrontCrawl'),Path('/home/tcapelle/.fastai/data/UCF101/LongJump'),Path('/home/tcapelle/.fastai/data/UCF101/GolfSwing'),Path('/home/tcapelle/.fastai/data/UCF101/ApplyEyeMakeup'),Path('/home/tcapelle/.fastai/data/UCF101/UnevenBars'),Path('/home/tcapelle/.fastai/data/UCF101/HeadMassage'),Path('/home/tcapelle/.fastai/data/UCF101/Kayaking')...]

inside, you will find one video per instance, the videos are in .avi format. We will need to convert each video to a sequence of images to able to work with our fastai vision toolset.

UCF101-frames

├── ApplyEyeMakeup
|   |── v_ApplyEyeMakeup_g01_c01.avi
|   ├── v_ApplyEyeMakeup_g01_c02.avi
|   |   ...
├── Hammering
|   ├── v_Hammering_g01_c01.avi
|   ├── v_Hammering_g01_c02.avi
|   ├── v_Hammering_g01_c03.avi
|   |   ...
...
├── YoYo
    ├── v_YoYo_g01_c01.avi
    ...
    ├── v_YoYo_g25_c03.avi

we can grab all videos at one using get_files and passing the '.avi extension

video_paths = get_files(path, extensions='.avi')
video_paths[0:4]
(#4) [Path('/home/tcapelle/.fastai/data/UCF101/Hammering/v_Hammering_g22_c05.avi'),Path('/home/tcapelle/.fastai/data/UCF101/Hammering/v_Hammering_g21_c05.avi'),Path('/home/tcapelle/.fastai/data/UCF101/Hammering/v_Hammering_g03_c03.avi'),Path('/home/tcapelle/.fastai/data/UCF101/Hammering/v_Hammering_g18_c02.avi')]

We can convert the videos to frames using av:

extract_frames[source]

extract_frames(video_path)

convert video to PIL images

frames = list(extract_frames(video_paths[0]))
frames[0:4]
[<PIL.Image.Image image mode=RGB size=320x240 at 0x7F2C87D161F0>,
 <PIL.Image.Image image mode=RGB size=320x240 at 0x7F2C87D0CBE0>,
 <PIL.Image.Image image mode=RGB size=320x240 at 0x7F2C87D0CD30>,
 <PIL.Image.Image image mode=RGB size=320x240 at 0x7F2C87D0CD90>]

We havePIL.Image objects, so we can directly show them using fastai's show_images method

show_images(frames[0:5])

let's grab one video path

video_path = video_paths[0]
video_path
Path('/home/tcapelle/.fastai/data/UCF101/Hammering/v_Hammering_g22_c05.avi')

We want to export all videos to frames, les't built a function that is capable of exporting one video to frames, and stores the resulting frames on a folder of the same name.

Let's grab de folder name:

video_path.relative_to(video_path.parent.parent).with_suffix('')
Path('Hammering/v_Hammering_g22_c05')

we will also create a new directory for our frames version of UCF. You will need at least 7GB to do this, afterwards you can erase the original UCF101 folder containing the videos.

path_frames = path.parent/'UCF101-frames'
if not path_frames.exists(): path_frames.mkdir()

we will make a function that takes a video path, and extracts the frames to our new UCF-frames dataset with the same folder structure.

avi2frames[source]

avi2frames(video_path, path_frames, force=False)

Extract frames from avi file to jpgs

avi2frames(video_path, path_frames)
(path_frames/video_path.relative_to(video_path.parent.parent).with_suffix('')).ls()
(#161) [Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g22_c05/63.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g22_c05/90.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g22_c05/19.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g22_c05/111.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g22_c05/132.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g22_c05/59.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g22_c05/46.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g22_c05/130.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g22_c05/142.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g22_c05/39.jpg')...]

Now we can batch process the whole dataset using fastcore's parallel. This could be slow on a low CPU count machine. On a 12 core machine it takes 4 minutes.

#parallel(avi2frames, video_paths)

after this you get a folder hierarchy that looks like this

UCF101-frames

├── ApplyEyeMakeup
|   |── v_ApplyEyeMakeup_g01_c01
|   │   ├── 0.jpg
|   │   ├── 100.jpg
|   │   ├── 101.jpg
|   |   ...
|   ├── v_ApplyEyeMakeup_g01_c02
|   │   ├── 0.jpg
|   │   ├── 100.jpg
|   │   ├── 101.jpg
|   |   ...
├── Hammering
|   ├── v_Hammering_g01_c01
|   │   ├── 0.jpg
|   │   ├── 1.jpg
|   │   ├── 2.jpg
|   |   ...
|   ├── v_Hammering_g01_c02
|   │   ├── 0.jpg
|   │   ├── 1.jpg
|   │   ├── 2.jpg
|   |   ...
|   ├── v_Hammering_g01_c03
|   │   ├── 0.jpg
|   │   ├── 1.jpg
|   │   ├── 2.jpg
|   |   ...
...
├── YoYo
    ├── v_YoYo_g01_c01
    │   ├── 0.jpg
    │   ├── 1.jpg
    │   ├── 2.jpg
    |   ...
    ├── v_YoYo_g25_c03
        ├── 0.jpg
        ├── 1.jpg
        ├── 2.jpg
        ...
        ├── 136.jpg
        ├── 137.jpg

Data pipeline

we have converted all the videos to images, we are ready to start building our fastai data pieline

data_path = Path.home()/'.fastai/data/UCF101-frames'
data_path.ls()[0:3]
(#3) [Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering'),Path('/home/tcapelle/.fastai/data/UCF101-frames/HandstandPushups'),Path('/home/tcapelle/.fastai/data/UCF101-frames/HorseRace')]

we have one folder per action category, and inside one folder per instance of the action.

get_instances[source]

get_instances(path)

gets all instances folders paths

with this function we get individual instances of each action, these are the image sequences that we need to clasiffy.. We will build a pipeline that takes as input instance path's.

instances_path = get_instances(data_path)
instances_path[0:3]
(#3) [Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g14_c02'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g07_c03'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g13_c07')]

we have to sort the video frames numerically. We will patch pathlib's Path class to return a list of files conttaines on a folde sorted numerically. It could be a good idea to modify fastcore's ls method with an optiional argument sort_func.

Path.ls_sorted[source]

Path.ls_sorted()

ls but sorts files by name numerically

instances_path[0].ls_sorted()
(#187) [Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g14_c02/0.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g14_c02/1.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g14_c02/2.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g14_c02/3.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g14_c02/4.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g14_c02/5.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g14_c02/6.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g14_c02/7.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g14_c02/8.jpg'),Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g14_c02/9.jpg')...]

let's grab the first 5 frames

frames = instances_path[0].ls_sorted()[0:5]
show_images([Image.open(img) for img in frames])

We will build a tuple that contains individual frames and that can show themself. We will use the same idea that on the siamese_tutorial. As a video can have many frames, and we don't want to display them all, the show method will only display the 1st, middle and last images.

class ImageTuple[source]

ImageTuple(x=None, *rest) :: fastuple

A tuple of PILImages

ImageTuple(PILImage.create(fn) for fn in frames).show();

we will use the mid-level API to create our Dataloader from a transformed list.

class ImageTupleTfm[source]

ImageTupleTfm(seq_len=20, step=1) :: Transform

A wrapper to hold the data on path format

tfm = ImageTupleTfm(seq_len=5, step=1)
hammering_instance = instances_path[0]
hammering_instance
Path('/home/tcapelle/.fastai/data/UCF101-frames/Hammering/v_Hammering_g14_c02')
tfm(hammering_instance).show()
<AxesSubplot:>

with this setup, we can use the parent_label as our labelleing function

parent_label(hammering_instance)
'Hammering'
splits = RandomSplitter()(instances_path)

We will use fastaiDatasets class, we have to pass a list of transforms. The first list [ImageTupleTfm(5)] is how we grab the x's and the second list [parent_label, Categorize]] is how we grab the y's.' So, from each instance path, we grab the first 5 images to construct an ImageTuple and we grad the label of the action from the parent folder using parent_label and the we Categorize the labels.

ds = Datasets(instances_path, tfms=[[ImageTupleTfm(5)], [parent_label, Categorize]], splits=splits)
len(ds)
13320
dls = ds.dataloaders(bs=4, after_item=[Resize(128), ToTensor], 
                      after_batch=[IntToFloatTensor, Normalize.from_stats(*imagenet_stats)])

refactoring

get_action_dataloaders[source]

get_action_dataloaders(files, bs=8, image_size=64, seq_len=20, step=1, val_idxs=None, **kwargs)

Create a dataloader with val_idxs splits

dls = get_action_dataloaders(instances_path, bs=32, image_size=64, seq_len=5)
dls.show_batch()