Regions with Convolutional Neural Network Features
This project is maintained by rbgirshick
Created by Ross Girshick, Jeff Donahue, Trevor Darrell and Jitendra Malik at UC Berkeley EECS.
Acknowledgements: a huge thanks to Yangqing Jia for creating Caffe and the BVLC team, with a special shoutout to Evan Shelhamer, for maintaining Caffe and helping to merge the R-CNN fine-tuning code into Caffe.
R-CNN is a state-of-the-art visual object detection system that combines bottom-up region proposals with rich features computed by a convolutional neural network. At the time of its release, R-CNN improved the previous best detection performance on PASCAL VOC 2012 by 30% relative, going from 40.9% to 53.3% mean average precision. Unlike the previous best results, R-CNN achieves this performance without using contextual rescoring or an ensemble of feature types.
R-CNN was initially described in an arXiv tech report and will appear in a forthcoming CVPR 2014 paper.
If you find R-CNN useful in your research, please consider citing:
```
@inproceedings{girshick14CVPR,
    Author = {Girshick, Ross and Donahue, Jeff and Darrell, Trevor and Malik, Jitendra},
    Title = {Rich feature hierarchies for accurate object detection and semantic segmentation},
    Booktitle = {Computer Vision and Pattern Recognition},
    Year = {2014}
}
```
R-CNN is released under the Simplified BSD License (refer to the LICENSE file for details).
| Method | VOC 2007 mAP | VOC 2010 mAP | VOC 2012 mAP |
|---|---|---|---|
| R-CNN | 54.2% | 50.2% | 49.6% |
| R-CNN bbox reg | 58.5% | 53.7% | 53.3% |
Build Caffe's MATLAB wrapper and record the Caffe installation path in `$CAFFE_HOME`:

```
$ make matcaffe
$ export CAFFE_HOME=$(pwd)
```

Get the R-CNN source code and change into the `rcnn` folder:

```
$ git clone https://github.com/rbgirshick/rcnn.git
$ cd rcnn
```

R-CNN expects to find Caffe in `external/caffe`, so create a symlink:

```
$ ln -sf $CAFFE_HOME external/caffe
```

Start MATLAB (from within the `rcnn` folder):

```
$ matlab
```

You should see the message `R-CNN startup done` followed by the MATLAB prompt `>>`. Next, build liblinear and Selective Search:

```
>> rcnn_build()
```

Don't worry if you see compiler warnings while building liblinear; this is normal. Then check your Caffe and MATLAB wrapper setup:

```
>> key = caffe('get_init_key');
```

The expected output is `key = -2`.

Common issues: You may need to set an `LD_LIBRARY_PATH` before you start MATLAB. If you see a message like "Invalid MEX-file '/path/to/rcnn/external/caffe/matlab/caffe/caffe.mexa64': libmkl_rt.so: cannot open shared object file: No such file or directory", then make sure that CUDA and MKL are in your `LD_LIBRARY_PATH`. On my system, I use:

```
$ export LD_LIBRARY_PATH=/opt/intel/mkl/lib/intel64:/usr/local/cuda/lib64
```
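A safe way to extend `LD_LIBRARY_PATH` without clobbering or duplicating entries is a small prepend helper. This is a sketch, not part of R-CNN; `prepend_ld_path` is a hypothetical helper, and the two paths are the ones from my setup above, which you should adjust to your own install locations.

```shell
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
prepend_ld_path() {
  case ":${LD_LIBRARY_PATH}:" in
    *":$1:"*) ;;  # already present; leave LD_LIBRARY_PATH unchanged
    *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
  esac
}

# Assumed install locations -- adjust to your system.
prepend_ld_path /usr/local/cuda/lib64
prepend_ld_path /opt/intel/mkl/lib/intel64
export LD_LIBRARY_PATH
```

Source this in the shell you launch MATLAB from, so the Caffe MEX file can resolve libmkl_rt.so and the CUDA runtime.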
The quickest way to get started is to download precomputed R-CNN detectors. Currently we have detectors trained on PASCAL VOC 2007 train+val and 2012 train. Unfortunately the download is large (1.5GB), so brew some coffee or take a walk while waiting.
From the `rcnn` folder, run the data fetch script:

```
$ ./data/fetch_data.sh
```
This will populate the rcnn/data folder with caffe_nets, rcnn_models and selective_search_data. See rcnn/data/README.md for details.
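If you want to confirm the download unpacked correctly, a quick check for the three folders named above can be scripted. This is a sketch; `check_rcnn_data` is a hypothetical helper, and the folder names are taken from the text.

```shell
# Verify that data/caffe_nets, data/rcnn_models, and data/selective_search_data
# all exist under the given rcnn checkout directory.
check_rcnn_data() {
  for d in caffe_nets rcnn_models selective_search_data; do
    if [ ! -d "$1/data/$d" ]; then
      echo "missing: data/$d"
      return 1
    fi
  done
  echo "all data folders present"
}

# Example: check the current directory (run from the rcnn folder).
check_rcnn_data . || true
```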
Let's assume that you've downloaded the precomputed detectors. Now:
Change to where you installed R-CNN and start MATLAB:

```
$ cd rcnn
$ matlab
```

Important: if you don't see the message `R-CNN startup done` when MATLAB starts, then you probably didn't start MATLAB in the `rcnn` directory. Run the demo:

```
>> rcnn_demo
```
Let's use PASCAL VOC 2007 as an example. The basic pipeline is:
extract features to disk -> train SVMs -> test
You'll need about 200GB of disk space free for the feature cache (which is stored in rcnn/feat_cache by default; symlink rcnn/feat_cache elsewhere if needed). It's best if the feature cache is on a fast, local disk. Before running the pipeline, we first need to install the PASCAL VOC 2007 dataset.
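The feat_cache symlink trick can be scripted. This is a sketch; `relocate_feat_cache` is a hypothetical helper and both example paths are placeholders.

```shell
# Point rcnn/feat_cache at a directory on a larger/faster disk.
#   $1 = destination directory on the big disk
#   $2 = path to the rcnn checkout
relocate_feat_cache() {
  mkdir -p "$1" &&
  ln -sfn "$1" "$2/feat_cache"
}

# Example (placeholder paths):
# relocate_feat_cache /big/fast/disk/rcnn-feat-cache /path/to/rcnn
```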
Download the training, validation, test data and VOCdevkit:
```
$ wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
$ wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCtest_06-Nov-2007.tar
$ wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
```
Extract all of these tars into one directory named `VOCdevkit`:

```
$ tar xvf VOCtrainval_06-Nov-2007.tar
$ tar xvf VOCtest_06-Nov-2007.tar
$ tar xvf VOCdevkit_08-Jun-2007.tar
```
It should have this basic structure:
```
VOCdevkit/                             % development kit
VOCdevkit/VOCcode/                     % VOC utility code
VOCdevkit/VOC2007                      % image sets, annotations, etc.
... and several other directories ...
```
I use a symlink to hook the R-CNN codebase to the PASCAL VOC dataset:
```
$ ln -sf /your/path/to/voc2007/VOCdevkit /path/to/rcnn/datasets/VOCdevkit2007
```
Extract and cache the features (in MATLAB, from the `rcnn` folder):

```
>> rcnn_exp_cache_features('train');   % chunk1
>> rcnn_exp_cache_features('val');     % chunk2
>> rcnn_exp_cache_features('test_1');  % chunk3
>> rcnn_exp_cache_features('test_2');  % chunk4
```
Pro tip: on a machine with one hefty GPU (e.g., K20, K40, Titan) and a six-core processor, I start two MATLAB sessions, each with a three-worker matlabpool, and run chunk1 and chunk2 in parallel on that machine. In this setup, completing chunk1 and chunk2 takes about 8-9 hours (depending on your CPU/GPU combo and disk). Obviously, if you have more machines you can hack this function to split the workload.
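One way to script the two-session setup is to launch background MATLAB processes using the standard `-nodisplay` and `-r` command-line options. The sketch below only prints the commands (a dry run) rather than launching MATLAB; the chunk-to-session split mirrors the pro tip above, and `print_chunk_cmds` is a hypothetical helper.

```shell
# Print (dry run) one background MATLAB invocation per feature-extraction chunk.
print_chunk_cmds() {
  for chunk in "$@"; do
    echo "matlab -nodisplay -r \"rcnn_exp_cache_features('$chunk'); exit\" &"
  done
}

# Example: the two chunks run in parallel on one machine.
print_chunk_cmds train val
```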
Now to run the training and testing code, use the following experiments script:
```
>> test_results = rcnn_exp_train_and_test()
```
Note: The training and testing procedures save models and results under rcnn/cachedir by default. You can customize this by creating a local config file named rcnn_config_local.m and defining the experiment directory variable EXP_DIR. Look at rcnn_config_local.example.m for an example.
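For instance, a minimal local config could be generated like this. This is a sketch: the `EXP_DIR` path is a placeholder, and you should confirm the exact expected contents against rcnn_config_local.example.m.

```shell
# Write a minimal local config that redirects experiment output to a fast disk.
cat > rcnn_config_local.m <<'EOF'
% rcnn_config_local.m -- local overrides (see rcnn_config_local.example.m)
EXP_DIR = '/path/to/fast/disk/rcnn-experiments';
EOF
```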
It should be easy to train an R-CNN detector using another detection dataset as long as that dataset has complete bounding box annotations (i.e., all instances of all classes are labeled).
To support a new dataset, you define three functions: (1) one that returns a structure describing the class labels and list of images; (2) one that returns a region of interest (roi) structure describing the bounding box annotations; and (3) one that provides a test evaluation function.
You can follow the PASCAL VOC implementation as your guide:
- imdb/imdb_from_voc.m (list of images and classes)
- imdb/roidb_from_voc.m (region of interest database)
- imdb/imdb_eval_voc.m (evaluation)

As an example, let's see how you would fine-tune a CNN for detection on PASCAL VOC 2012.
Create window files for VOC 2012 train and val (in MATLAB, from the `rcnn` directory):

```
>> imdb_train = imdb_from_voc('datasets/VOCdevkit2012', 'train', '2012');
>> imdb_val = imdb_from_voc('datasets/VOCdevkit2012', 'val', '2012');
>> rcnn_make_window_file(imdb_train, 'external/caffe/examples/pascal-finetuning');
>> rcnn_make_window_file(imdb_val, 'external/caffe/examples/pascal-finetuning');
```
Run fine-tuning with Caffe
Copy the fine-tuning prototxt files into place:

```
$ cp finetuning/voc_2012_prototxt/pascal_finetune_* external/caffe/examples/pascal-finetuning/
```

Change to the fine-tuning directory:

```
$ cd external/caffe/examples/pascal-finetuning
```

Start fine-tuning (replace /path/to/rcnn with the actual path to where R-CNN is installed):

```
GLOG_logtostderr=1 ../../build/tools/finetune_net.bin \
    pascal_finetune_solver.prototxt \
    /path/to/rcnn/data/caffe_nets/ilsvrc_2012_train_iter_310k 2>&1 | tee log.txt
```
Note: In my experiments, I let fine-tuning run for 70k iterations, although in hindsight the improvement in mAP appears to saturate at around 40k iterations.