Ross Girshick (rbg)
Research Scientist
Facebook AI Research (FAIR)
r...@gmail.com
arXiv / Google scholar / cv

Research

I'm interested in algorithms for visual perception (object recognition, localization, segmentation, pose estimation, ...) and visual reasoning (answering complex queries, often in natural language, about images). My work explores topics in computer vision and machine/deep/statistical learning.

About me / bio

Ross Girshick is a research scientist at Facebook AI Research (FAIR), working on computer vision and machine learning. He received a PhD in computer science from the University of Chicago under the supervision of Pedro Felzenszwalb in 2012. Prior to joining FAIR, Ross was a researcher at Microsoft Research, Redmond and a postdoc at the University of California, Berkeley, where he was advised by Jitendra Malik and Trevor Darrell. His interests include instance-level object understanding and visual reasoning challenges that combine natural language processing with computer vision. He received the 2017 PAMI Young Researcher Award and is well known for developing the R-CNN (Region-based Convolutional Neural Network) approach to object detection. In 2017, Ross also received the Marr Prize at ICCV for "Mask R-CNN".

Selected tech reports

Non-Local Neural Networks
Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He
arXiv preprint 2017 / bibtex
  @article{xiaolongwang2017nonlocal,
    Author    = {Xiaolong Wang and Ross Girshick and Abhinav Gupta and Kaiming He},
    Title     = {{Non-Local Neural Networks}},
    Journal   = {arXiv preprint arXiv:1711.07971},
    Year      = {2017}}
      
Learning to Segment Every Thing
Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick
arXiv preprint 2017 / bibtex
  @article{hu2017everything,
    Author    = {Ronghang Hu and Piotr Doll\'{a}r and Kaiming He and
                 Trevor Darrell and Ross Girshick},
    Title     = {{Learning to Segment Every Thing}},
    Journal   = {arXiv preprint arXiv:1711.10370},
    Year      = {2017}}
      
Learning by Asking Questions
Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta, Laurens van der Maaten
arXiv preprint 2017 / bibtex
  @article{misra2017lba,
    Author    = {Ishan Misra and Ross Girshick and Rob Fergus and
                 Martial Hebert and Abhinav Gupta and Laurens van der Maaten},
    Title     = {{Learning by Asking Questions}},
    Journal   = {arXiv preprint arXiv:1712.01238},
    Year      = {2017}}
      
Data Distillation
Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He
arXiv preprint 2017 / bibtex
  @article{radosavovic2017dd,
    Author    = {Ilija Radosavovic and Piotr Doll\'{a}r and Ross Girshick and
                 Georgia Gkioxari and Kaiming He},
    Title     = {{Data Distillation: Towards Omni-Supervised Learning}},
    Journal   = {arXiv preprint arXiv:1712.04440},
    Year      = {2017}}
      
Panoptic Segmentation
Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár
arXiv preprint 2018 / bibtex
  @article{kirillov2017panoptic,
    Author    = {Alexander Kirillov and Kaiming He and Ross Girshick and
                 Carsten Rother and Piotr Doll\'{a}r},
    Title     = {{Panoptic Segmentation}},
    Journal   = {arXiv preprint arXiv:1801.00868},
    Year      = {2018}}
      
Low-Shot Learning from Imaginary Data
Yu-Xiong Wang, Ross Girshick, Martial Hebert, Bharath Hariharan
arXiv preprint 2018 / bibtex
  @article{yuxiongwang2017imaginary,
    Author    = {Yu-Xiong Wang and Ross Girshick and
                 Martial Hebert and Bharath Hariharan},
    Title     = {{Low-Shot Learning from Imaginary Data}},
    Journal   = {arXiv preprint arXiv:1801.05401},
    Year      = {2018}}
      
Detecting and Recognizing Human-Object Interactions
Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He
arXiv preprint 2017 / bibtex
  @article{gkioxari2017,
    Author    = {Georgia Gkioxari and Ross Girshick and
                 Piotr Doll\'{a}r and Kaiming He},
    Title     = {Detecting and Recognizing Human-Object Interactions},
    Journal   = {arXiv preprint arXiv:1704.07333},
    Year      = {2017}}
      
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
arXiv preprint 2017 / bibtex
  @article{goyal2017imagenet1hr,
    Author    = {Priya Goyal and Piotr Doll\'{a}r and Ross Girshick and
                 Pieter Noordhuis and Lukasz Wesolowski and Aapo Kyrola and
                 Andrew Tulloch and Yangqing Jia and Kaiming He},
    Title     = {Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour},
    Journal   = {arXiv preprint arXiv:1706.02677},
    Year      = {2017}}
      

ICCV 2017

Mask R-CNN
Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick
IEEE International Conference on Computer Vision (ICCV), 2017 / bibtex
oral presentation
Marr prize winner (ICCV best paper award)
  @inproceedings{he2017maskrcnn,
    Author    = {Kaiming He and Georgia Gkioxari and
                 Piotr Doll\'{a}r and Ross Girshick},
    Title     = {{Mask R-CNN}},
    Booktitle = {Proceedings of the International
                 Conference on Computer Vision ({ICCV})},
    Year = {2017}}
      
Focal Loss for Dense Object Detection
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár
IEEE International Conference on Computer Vision (ICCV), 2017 / bibtex
oral presentation
ICCV best student paper award
  @inproceedings{lin2017focal,
    Author    = {Tsung-Yi Lin and Priya Goyal and Ross Girshick and
                 Kaiming He and Piotr Doll\'{a}r},
    Title     = {{Focal Loss for Dense Object Detection}},
    Booktitle = {Proceedings of the International
                 Conference on Computer Vision ({ICCV})},
    Year = {2017}}
      
Inferring and Executing Programs for Visual Reasoning
Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
IEEE International Conference on Computer Vision (ICCV), 2017 / pytorch code / bibtex
oral presentation
  @inproceedings{johnson2017iep,
    Author    = {Justin Johnson and Bharath Hariharan and Laurens van der Maaten and
                 Judy Hoffman and Li Fei-Fei and C. Lawrence Zitnick and
                 Ross Girshick},
    Title     = {{Inferring and Executing Programs for Visual Reasoning}},
    Booktitle = {Proceedings of the International
                 Conference on Computer Vision ({ICCV})},
    Year = {2017}}
      
Low-shot Visual Recognition by Shrinking and Hallucinating Features
Bharath Hariharan, Ross Girshick
IEEE International Conference on Computer Vision (ICCV), 2017 / bibtex
  @inproceedings{hariharan2017lowshot,
    Author    = {Bharath Hariharan and Ross Girshick},
    Title     = {Low-shot Visual Recognition by Shrinking and
                 Hallucinating Features},
    Booktitle = {Proceedings of the International
                 Conference on Computer Vision ({ICCV})},
    Year      = {2017}}
      

CVPR 2017

Learning Features by Watching Objects Move
Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan
To appear in CVPR 2017 / bibtex
  @inproceedings{pathak2016motion,
    Author    = {Deepak Pathak and Ross Girshick and
                 Piotr Doll\'{a}r and Trevor Darrell and
                 Bharath Hariharan},
    Title     = {Learning Features by Watching Objects Move},
    Booktitle = {{CVPR}},
    Year      = {2017}}
      
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie
To appear in CVPR 2017 / bibtex
  @inproceedings{lin2016fpn,
    Author    = {Tsung-Yi Lin and Piotr Doll\'{a}r and
                 Ross Girshick and Kaiming He and
                 Bharath Hariharan and Serge Belongie},
    Title     = {Feature Pyramid Networks for Object Detection},
    Booktitle = {{CVPR}},
    Year      = {2017}}
      
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
To appear in CVPR 2017 / bibtex / github (code)
  @inproceedings{xie2016groups,
    Author    = {Saining Xie and Ross Girshick and
                 Piotr Doll\'{a}r and Zhuowen Tu and
                 Kaiming He},
    Title     = {Aggregated Residual Transformations for
                 Deep Neural Networks},
    Booktitle = {{CVPR}},
    Year      = {2017}}
      

Selected older publications

All publications and tech reports (Google scholar)
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
Neural Information Processing Systems (NIPS), 2015
Python code / Matlab code / bibtex
@inproceedings{ren2015faster,
  Author = {Shaoqing Ren and Kaiming He and
            Ross Girshick and Jian Sun},
  Title = {Faster {R-CNN}: Towards Real-Time Object Detection
           with Region Proposal Networks},
  Booktitle = {Neural Information Processing Systems ({NIPS})},
  Year = {2015}
}
    
Fast R-CNN
Ross Girshick
IEEE International Conference on Computer Vision (ICCV), 2015
oral presentation
code / slides / bibtex
@inproceedings{girshick15fastrcnn,
  Author = {Ross Girshick},
  Title = {Fast {R-CNN}},
  Booktitle = {Proceedings of the International
               Conference on Computer Vision ({ICCV})},
  Year = {2015}
}
    
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
R. Girshick, J. Donahue, T. Darrell, J. Malik
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
oral presentation
arXiv tech report / supplement / code / poster / slides / bibtex
@inproceedings{girshick2014rcnn,
  Author    = {Ross Girshick and
               Jeff Donahue and
               Trevor Darrell and
               Jitendra Malik},
  Title     = {Rich feature hierarchies for accurate
               object detection and semantic segmentation},
  Booktitle = {Proceedings of the IEEE Conference on
               Computer Vision and Pattern Recognition ({CVPR})},
  Year      = {2014}}
    
This paper proposes R-CNN, a state-of-the-art visual object detection system that combines bottom-up region proposals with rich features computed by a convolutional neural network. At the time of its release, R-CNN improved the previous best detection performance on PASCAL VOC 2012 by 30% relative, going from 40.9% to 53.3% mean average precision.
Efficient Human Pose Estimation from Single Depth Images
J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, A. Kipman, A. Blake
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 12, Dec. 2013
abstract / bibtex
@article{shotton2013kinect,
  Author    = {J. Shotton and
               R. Girshick and
               A. Fitzgibbon and
               T. Sharp and
               M. Cook and
               M. Finocchio and
               R. Moore and
               P. Kohli and
               A. Criminisi and
               A. Kipman and
               A. Blake},
  Title     = {Efficient Human Pose Estimation
               from Single Depth Images},
  Volume    = {35},
  Number    = {12},
  Journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  Year      = {2013}}
    
An integrated description of the original Kinect pose estimation algorithm and our ICCV 2011 algorithm.
Object Detection with Discriminatively Trained Part Based Models
P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, Sep. 2010
abstract / PAMI code / latest code (voc-release5) / bibtex
@article{felzenszwalb2010dpm,
  Author    = {P. Felzenszwalb and
               R. Girshick and
               D. McAllester and
               D. Ramanan},
  Title     = {Object Detection with Discriminatively
               Trained Part Based Models},
  Volume    = {32},
  Number    = {9},
  Journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  Year      = {2010}}
    
Deformable part models (DPM).

See also, CACM Research Highlight:
Visual Object Detection with Deformable Part Models
P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan
Communications of the ACM, no. 9 (2013): 97-105

Erdős = 3 (via two paths)


I like this website