Advanced Robot Vision with Deep Learning in ROS

Course Overview

This advanced course applies deep learning to complex robot vision problems. It builds on the fundamentals of computer vision and robotics and focuses on state-of-the-art deep learning models and their integration with robotic systems.

​

Course Objectives

Upon completion of this course, students will be able to:

  • Understand deep learning fundamentals:

    • Neural network architectures (CNNs, RNNs, Transformers)

    • Training and optimization techniques (backpropagation, gradient descent)

  • Apply deep learning to robot vision tasks:

    • Object detection and instance segmentation

    • Semantic and instance segmentation

    • Optical flow estimation and motion tracking

    • 3D object detection and pose estimation

  • Integrate deep learning with robot vision systems:

    • Design and implement end-to-end vision systems for autonomous robots

    • Evaluate and improve the performance of vision systems

    • Address challenges in real-world robot vision applications

​

Course Curriculum

Module 1: Deep Learning Fundamentals for Computer Vision

  • Convolutional Neural Networks (CNNs):

    • Architecture and components

    • Feature extraction and classification

    • Transfer learning and fine-tuning

  • Recurrent Neural Networks (RNNs):

    • Sequence modeling and time series data

    • Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU)

  • Transformers:

    • Self-attention mechanism and encoder-decoder architecture

    • Vision Transformer (ViT) and its variants
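
As a small illustration of the "feature extraction" idea behind the CNN topics above, the sketch below slides a kernel over an image and applies a ReLU. It is a teaching sketch in plain Python, not course material: the function names `conv2d_valid` and `relu`, the toy image, and the hand-picked edge kernel are all assumptions made for this example. In a trained CNN the kernel weights are learned rather than chosen by hand.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (cross-correlation,
    which is what deep learning frameworks call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise ReLU: negative responses are clipped to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy 4x4 image with a vertical edge between columns 1 and 2 (hypothetical data).
image = [[0, 0, 1, 1] for _ in range(4)]

# Hand-picked kernel that responds to a dark-to-bright horizontal transition.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = relu(conv2d_valid(image, kernel))
# The middle column (where the edge is) responds with 2; flat regions give 0.
```

Stacking several such filters, each followed by a nonlinearity, is what produces the feature hierarchy that the later classification or detection layers consume.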

​

Module 2: Object Detection and Instance Segmentation

  • Two-stage detectors:

    • Region-based Convolutional Neural Networks (R-CNN)

    • Faster R-CNN

    • Mask R-CNN

  • One-stage detectors:

    • YOLO

    • EfficientDet

  • Instance segmentation:

    • Mask R-CNN

    • Detectron2 (library that provides Mask R-CNN and related models)
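
Both the two-stage and one-stage detectors listed above typically rely on two small building blocks: intersection over union (IoU) between boxes, and non-maximum suppression (NMS) to discard duplicate detections. The following is a minimal sketch in plain Python, assuming boxes are `(x1, y1, x2, y2)` tuples; the function names, the example boxes, and the 0.5 threshold are illustrative choices, not values taken from the course.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap
    an already kept box by more than `iou_threshold`, and repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes cover the same object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

kept = nms(boxes, scores)  # indices of the surviving boxes: [0, 2]
```

Production pipelines usually call a vectorized, per-class NMS from their framework instead of a Python loop; the greedy logic is the same.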

​

Module 3: Semantic and Instance Segmentation

  • Fully Convolutional Networks (FCNs):

    • Pixel-wise classification

  • Encoder-Decoder architectures:

    • U-Net

    • DeepLabv3+

  • Instance segmentation:

    • Mask R-CNN
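
The "pixel-wise classification" step above can be pictured as taking, for every pixel, the class with the highest score. The sketch below assumes the network has already produced a score map per class, indexed as `scores[class][row][column]`; the function name `segment` and the numbers are made up for illustration.

```python
def segment(scores):
    """Turn per-class score maps scores[c][y][x] into a label map
    labels[y][x] by picking the best-scoring class at each pixel."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [
            max(range(num_classes), key=lambda c: scores[c][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical scores for 3 classes on a 2x2 image.
scores = [
    [[0.9, 0.2], [0.1, 0.3]],  # class 0
    [[0.1, 0.8], [0.2, 0.3]],  # class 1
    [[0.0, 0.0], [0.7, 0.4]],  # class 2
]

labels = segment(scores)  # [[0, 1], [2, 2]]
```

Semantic segmentation stops at this label map. Instance segmentation additionally separates individual objects of the same class, which is why Mask R-CNN predicts one mask per detected box.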

​

Module 4: Optical Flow and Motion Tracking

  • Optical flow estimation:

    • Traditional methods (Lucas-Kanade)

    • Deep learning-based methods (RAFT, PWC-Net)

  • Motion tracking:

    • Tracking-by-detection

    • DeepSORT
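
To make the traditional Lucas-Kanade method above concrete, the sketch below estimates a single flow vector for one window by solving the 2x2 least-squares system built from the brightness-constancy equation `Ix*u + Iy*v + It = 0`. It is a single-scale, single-window sketch in plain Python, assuming small motion; the function names and the synthetic quadratic image are hypothetical, and real implementations add image pyramids and feature selection.

```python
def lucas_kanade_window(img1, img2, x0, x1, y0, y1):
    """Estimate one (u, v) flow vector for the window [x0, x1) x [y0, y1).

    Gradients use central differences, so the window must stay at least
    one pixel away from the image border.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(y0, y1):
        for x in range(x0, x1):
            ix = (img1[y][x + 1] - img1[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (img1[y + 1][x] - img1[y - 1][x]) / 2.0  # spatial gradient in y
            it = img2[y][x] - img1[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture (aperture problem)")
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = [-sxt, -syt].
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def make_image(shift_x, shift_y, size=7):
    """Synthetic smooth image, translated by (shift_x, shift_y) pixels."""
    return [
        [(x - shift_x) ** 2 + (y - shift_y) ** 2 for x in range(size)]
        for y in range(size)
    ]


img1 = make_image(0.0, 0.0)
img2 = make_image(0.2, -0.1)  # scene moved by (0.2, -0.1) pixels

u, v = lucas_kanade_window(img1, img2, 1, 6, 1, 6)
# u and v come out close to the true shift of (0.2, -0.1).
```

The estimate is only approximate because the method linearizes the image around each pixel. Learned methods such as RAFT and PWC-Net target the large displacements where this approximation breaks down.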

​

Module 5: 3D Object Detection and Pose Estimation

  • Monocular 3D object detection:

    • Deep learning-based methods (Mono3D)

  • Stereo 3D object detection:

    • Stereo R-CNN

  • LiDAR-based 3D object detection:

    • PointPillars, PointRCNN, PV-RCNN

​

Module 6: Integration with Robotics (Stretch Goals)

  • Object detection and tracking: Implement a system that detects and tracks objects in real-time video streams, using traditional or deep learning methods.

  • Semantic segmentation: Assign a class label to every pixel so that objects and scene regions are separated in each image.

  • 3D object detection and pose estimation: Detect 3D objects and estimate their pose from RGB-D or LiDAR data.

  • Robot Vision Application: Integrate a vision system with a robot to perform tasks like object grasping or autonomous navigation.
