Het Patel - Robotics Engineer

Hello, I'm

Het Patel

Robotics Engineer & AI Researcher

Building intelligent autonomous systems at UIUC

About Me

I'm a graduate student pursuing my Master's in Autonomy and Robotics at the University of Illinois Urbana-Champaign with a perfect 4.0 GPA. My passion lies in building intelligent autonomous systems that can navigate and interact with complex, unstructured environments.

With hands-on experience at leading organizations including ISRO, Samsung R&D, and Vinayak Technology, I've architected and deployed cutting-edge robotics solutions ranging from 300kg AMRs for construction sites to embedded systems for space missions.

I specialize in computer vision, sensor fusion, ROS/ROS2, and edge AI deployment, with a track record of delivering real-world impact through innovative robotics and AI solutions.

Education

University of Illinois Urbana-Champaign

Aug 2025 - Present

Master's in Autonomy and Robotics

GPA: 4.0/4.0

Relevant Coursework: Computer Vision, Principles of Safe Autonomy, Humanoid Robotics, Mobile Robotics


Vellore Institute of Technology

Sept 2020 - May 2024

B.Tech in Electronics and Communication Engineering

GPA: 3.8/4.0

Relevant Coursework: Analog and Digital Electronics, Communication Systems, Control Systems, Algorithms, Machine Learning


Work Experience

Robotics Engineer

Vinayak Technology

Ahmedabad, GJ

July 2024 – August 2025
  • Architected and deployed a ROS2-based software stack for a 300kg payload Autonomous Mobile Robot (AMR) designed for unstructured construction environments
  • Implemented a real-time multi-sensor fusion pipeline integrating LiDAR, IMU, and RGB-D camera data, enabling robust Hector SLAM and GMapping performance for precise localization and mapping
  • Developed and optimized path planning algorithms for construction sites, improving material delivery efficiency by 25% through dynamic obstacle avoidance and intelligent route management

Embedded Software Engineer

Indian Space Research Organization (ISRO)

Ahmedabad, GJ

May 2023 – Jan 2024
  • Engineered the core firmware for a Synthetic Aperture Radar's data storage subsystem, enabling high-speed, reliable data acquisition from the Solid State Recorder, critical for earth observation payloads used in the NASA-ISRO SAR (NISAR) mission
  • Contributed to the Gaganyaan human spaceflight mission by developing embedded software for the cabin display system, a key safety-critical HMI, leveraging OpenGL-ES and an embedded PetaLinux stack on a Xilinx-based board

Research and Development Intern

Samsung Research

Bangalore, KA

Dec 2022 – June 2023
  • Architected a computer vision system for a Samsung smart-refrigerator IoT edge device, deploying a suite of models (YOLOv4, R-CNN, BiT) optimized with TensorRT to achieve 96% accuracy in real-time food identification with <100ms latency
  • Engineered the supporting data pipeline to sync on-device detections with a cloud database, enabling automated inventory management and delivering a feature projected to reduce household food waste by up to 25%

Featured Projects

C.A.R.E. — Companion Autonomous Robotic Entity

Overview

C.A.R.E. bridges augmented reality and robotics to create an assistive robotic companion for individuals with mobility challenges. The system integrates Snap AR Spectacles with the Booster K1 robot, enabling intuitive human-robot interaction through an AR control interface. Built at Cal Hacks 12.0, this project demonstrates how cutting-edge AR and robotics technologies can work together to assist elderly and mobility-limited individuals.

What It Does

  • AR Control Interface: Users view live robot camera feeds through AR glasses with joystick and HUD overlay for intuitive control
  • Dual Operation Modes: Supports both autonomous patrol mode and manual control via head movements
  • AI-Powered Detection: Computer vision identifies and tracks people in real-time using Google Gemini AI
  • Low-Latency Communication: ROS2, WebSockets, and ngrok tunneling maintain responsive control with dual channels (low-bandwidth control, high-bandwidth video)
  • Head Tracking Navigation: User head orientation maps directly to robot movement commands for natural, hands-free control

Target Applications

  • Elderly and mobility-limited assistance - helping individuals navigate and interact with their environment
  • Security and patrol operations - autonomous monitoring with human oversight
  • Search and rescue missions - remote exploration in hazardous environments
  • Construction and infrastructure inspection - safe inspection of dangerous or hard-to-reach areas

Technologies Used

ROS2 Snap AR Spectacles Booster K1 Robot Google Gemini AI OpenCV WebSockets ngrok Snap Lens Studio Python

Technical Challenges & Solutions

  • Latency Reduction: Implemented dual WebSocket channels - one for low-bandwidth control signals and another for high-bandwidth video streaming - to minimize control lag while maintaining video quality (see the sketch after this list)
  • Real-time Tracking: Integrated Gemini detection with ROS2 navigation stack for smooth human-following behavior with predictive movement
  • AR Interface Design: Created minimal, intuitive HUD elements that provide essential information without overwhelming the user's field of view
  • Sensor Calibration: Developed mapping system to accurately translate AR camera rotation data to robot motor angle commands for precise head-tracking control
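
The dual-channel split can be illustrated with a short sketch. This is a minimal illustration only, assuming a `websockets`-based bridge in Python; the endpoint URLs, message schema, and the head-pose and frame callbacks are hypothetical placeholders, not the project's actual code.

```python
# Minimal sketch: separate sockets for control and video so large video
# frames never queue behind (or delay) the small control packets.
import asyncio
import json

import websockets  # pip install websockets


async def control_channel(uri: str, get_head_pose) -> None:
    """Send small, frequent control packets (low bandwidth, low latency)."""
    async with websockets.connect(uri) as ws:
        while True:
            yaw, pitch = get_head_pose()  # AR glasses head orientation
            # Map head orientation directly to velocity commands.
            cmd = {"linear": max(min(pitch * 0.5, 1.0), -1.0),
                   "angular": max(min(-yaw * 0.8, 1.0), -1.0)}
            await ws.send(json.dumps(cmd))
            await asyncio.sleep(0.02)  # ~50 Hz control loop


async def video_channel(uri: str, show_frame) -> None:
    """Receive large video frames on their own socket."""
    async with websockets.connect(uri) as ws:
        async for frame_bytes in ws:
            show_frame(frame_bytes)  # render in the AR HUD


async def main() -> None:
    # Hypothetical endpoints; stub callbacks stand in for the AR glasses.
    await asyncio.gather(
        control_channel("ws://robot.local:9000/control", lambda: (0.0, 0.0)),
        video_channel("ws://robot.local:9001/video", lambda f: None),
    )

if __name__ == "__main__":
    asyncio.run(main())
```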

Team

Built at Cal Hacks 12.0 by:

  • Het Patel - Booster Robot Control, Snap AR Integration, Vision Language Models
  • Sunny Deshpande - SLAM and Navigation
  • Atharv Mungale - Snap AR Software and Communication
  • Vetrivel Balaji - Snap AR Software and Communication

Language-Guided Humanoid Loco-Manipulation via Vision-Language-Action Models

Humanoid Loco-Manipulation Demo

Overview

Developed an advanced framework for humanoid loco-manipulation using Vision-Language-Action (VLA) models within the OmniGibson simulation environment. The system integrates state-of-the-art VLA models with SLAM-based navigation to enable language-guided task execution in realistic household environments, with a focus on waste sorting and object manipulation tasks across 20+ diverse household scenes from the BEHAVIOR-1K benchmark.

Problem Statement

Traditional robotic manipulation systems require extensive task-specific programming and struggle to generalize across different environments and tasks. Humanoid robots need to seamlessly combine locomotion (navigation) and manipulation (grasping, sorting) while understanding natural language commands. Existing systems lack the ability to perform zero-shot task execution in novel household environments with language-based instructions.

Key Features

  • Vision-Language-Action Models: Integration of GR00T N1 and OpenVLA models for end-to-end language-to-action translation
  • Loco-Manipulation: Unified framework combining SLAM-based navigation with precise object manipulation
  • Zero-Shot Task Execution: Ability to execute novel tasks without task-specific training
  • Semantic Scene Understanding: 85%+ accuracy in language-guided semantic reasoning
  • Realistic Simulation: 20+ photorealistic household scenes from BEHAVIOR-1K benchmark
  • Multi-Modal Perception: Integration of RGB-D cameras, proprioceptive sensors, and language inputs
  • Dynamic Object Interaction: Robust manipulation of various household objects with different properties

Technologies Used

GR00T N1 OpenVLA OmniGibson SLAM BEHAVIOR-1K ROS2 PyTorch Python

Technical Architecture

Vision-Language-Action Pipeline

  • Natural language command parsing and semantic understanding
  • RGB-D image processing for scene understanding and object detection
  • VLA model generates low-level robot actions from high-level commands
  • Real-time action execution with feedback control (closed loop sketched below)
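
A minimal sketch of this observe-predict-act loop, assuming a generic policy wrapper: `VLAPolicy`, its `predict_action` interface, and the `env` object are illustrative stand-ins for the OpenVLA/GR00T N1 integration, not the actual implementation.

```python
# Hedged sketch of the language-to-action loop in OmniGibson-style setups.
import numpy as np


class VLAPolicy:
    """Placeholder for a pretrained vision-language-action model."""

    def predict_action(self, rgb: np.ndarray, depth: np.ndarray,
                       proprio: np.ndarray, instruction: str) -> np.ndarray:
        # A real model returns a low-level action (e.g., end-effector
        # deltas plus a gripper command) conditioned on images + language.
        raise NotImplementedError


def run_task(env, policy: VLAPolicy, instruction: str, max_steps: int = 500):
    """Closed-loop execution: observe, query the VLA model, act, repeat."""
    obs = env.reset()
    for _ in range(max_steps):
        action = policy.predict_action(obs["rgb"], obs["depth"],
                                       obs["proprio"], instruction)
        obs, done = env.step(action)  # feedback control closes the loop
        if done:
            break
```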

SLAM-Based Navigation Stack

  • Continuous robot pose estimation in household environments
  • Dynamic map building of 3D environment structure
  • Collision-free trajectory generation for locomotion
  • Seamless coordination between navigation and manipulation tasks

Manipulation Framework

  • Intelligent grasp pose generation for diverse object geometries
  • Precise end-effector control for object manipulation
  • Compliant manipulation with force/torque sensing
  • Multi-step manipulation sequences (pick, place, sort)

Results & Performance

  • Semantic Reasoning Accuracy: 85%+ in language-guided task understanding
  • Navigation Success Rate: High success in SLAM-based navigation across 20+ scenes
  • Manipulation Precision: Robust grasping and sorting of various household objects
  • Zero-Shot Generalization: Effective task execution without scene-specific training
  • Scene Diversity: Tested across kitchen, living room, bedroom, and bathroom environments
  • Object Variety: Successfully manipulated 50+ different object types

Challenges & Solutions

  • Challenge: Language-to-action translation - Mapping high-level natural language to low-level robot actions
    Solution: Leveraged pre-trained VLA models (GR00T N1, OpenVLA) with transfer learning for household domain
  • Challenge: Loco-manipulation coordination - Simultaneous control of locomotion and manipulation systems
    Solution: Developed hierarchical control architecture with SLAM for navigation and VLA for manipulation, coordinated through ROS2
  • Challenge: Sim-to-real gap - Simulation behavior differs from real-world physics
    Solution: Used high-fidelity OmniGibson simulation with realistic object physics and rendering

Team Members

Het Patel • Vardhan Dongre • Sunny Deshpande

Automated Solar Panel Cleaning Robot: Design, Implementation and Software Control System

Team

Author: Het Patel (20BEC1165)

Advisor: Dr. Sheena Christabel Pravin, Assistant Professor Senior Grade

Institution: School of Electronics Engineering, Vellore Institute of Technology, Chennai

Date: April 2024

Overview

Engineered and deployed an autonomous robotic system for grid-aware solar panel maintenance, leveraging an ESP32 microcontroller for edge computation and the Qt6 framework for cross-platform control software. The project addresses the critical challenge of dust accumulation on solar panels, which causes average energy losses of 4.4% annually and up to 20% during prolonged dry periods.

The robotic system features a precision-engineered stainless steel chassis with caterpillar track drive system, specialized roller brush cleaning mechanism, intelligent water delivery system, and automated wiper assembly, all controlled through an intuitive cross-platform dashboard providing real-time telemetry, performance monitoring, and autonomous path planning capabilities.

Problem Statement

Solar photovoltaic panels face significant efficiency degradation due to dust accumulation, bird droppings, and environmental debris. Studies show performance reductions of up to 32% within eight months in similar climatic regions. Traditional cleaning methods are inadequate:

  • Manual Cleaning: High labor costs, inconsistent schedules, safety risks at height
  • High-Pressure Water Spray: Requires 16-meter water head, excessive water consumption
  • Mechanical Methods: Can damage panel surface, reduce lifespan
  • Electrostatic Methods: Cannot remove particles >0.2mm, affected by panel tilt angle

With solar panel systems having 25-year lifespans and 6-year energy payback periods, maintaining peak efficiency is economically critical.

Key Features

Hardware Design

  • Robust Chassis: Stainless steel 304 Grade (1mm sheet) construction with CNC laser cutting, CNC bending, and TIG welding
  • Caterpillar Track Drive: Four 12V geared DC motors (50 RPM, 346.8 N-cm torque) with 40mm width track belts
  • Active Cleaning Mechanism: Roller brush assembly with 12V DC motor (100 RPM, 103 N-cm torque) in 3D-printed housing
  • Water Delivery System: Centrifugal pump (8W, 10 L/min) with four-nozzle distribution achieving 166.68 cm/s velocity
  • Automated Wiper: Rack-and-pinion mechanism with servo motor for surface drying
  • Braking System: Linear slider-crank mechanism with servo control for stable positioning on 10-15° inclines
  • Environmental Sensors: BME680 sensor for temperature, humidity, pressure, and air quality monitoring

Software & Control

  • Qt6 Cross-Platform Dashboard: Runs on Windows, macOS, and Linux with rich GUI components
  • UDP Communication: Sub-100ms latency with 99% packet delivery rate
  • Monitor Screen: Real-time energy generation charts, weather data visualization, battery tracking
  • Performance Analytics: Individual panel visualization with color-coded indicators and maintenance logs
  • Interactive Control: Direct robot control for motors, brake, pump, wipers, and brush
  • Autonomous Path Planning: Pre-defined waypoint navigation with serpentine pattern optimization
  • Firebase Integration: REST API for live sensor data streaming and cloud monitoring

Performance Achievements

  • 45% Water Consumption Reduction: Optimized nozzle flow rates versus manual methods
  • 15% Energy Yield Improvement: Through regular automated cleaning maintenance
  • 10-13 Minute Battery Runtime: 4200mAh LiPo battery (11.1V, 3S configuration)
  • 99% Communication Reliability: UDP protocol with acknowledgment achieving <100ms latency
  • Incline Navigation: Tested and verified operation on 10-15° solar panel inclinations

Technologies Used

Hardware

ESP32 Johnson DC Motors BTS7960 Driver L293D Driver BME680 Sensor LiPo Battery Stainless Steel 304 3D Printing (FDM)

Software

Qt6 Framework C++ Arduino/C UDP Protocol Firebase REST API WiFi

Manufacturing

CNC Laser Cutting CNC Bending TIG Welding CAD Design

Results & Performance

Quantitative Achievements

  • Water Efficiency: 45% reduction compared to manual/traditional automated methods
  • Energy Yield: 15% annual increase through consistent cleaning maintenance
  • Communication Latency: <100ms UDP round-trip time with 99% packet delivery
  • Battery Runtime: 10-13 minutes continuous operation on 4200mAh LiPo
  • Flow Rate: 10 L/min distributed across 4 nozzles (2.5 L/min per nozzle)
  • Incline Capability: Successfully tested on 10-15° solar panel inclinations
  • Brush Speed: 100 RPM with 103 N-cm torque for effective dirt removal
  • Drive Speed: 50 RPM with 346.8 N-cm torque per motor pair

Real-World Impact

  • Energy loss prevention: Mitigates the 4.4% average annual degradation caused by dust accumulation
  • Peak period protection: Prevents >20% efficiency drops during extended dry periods
  • Lifespan optimization: Maintains performance throughout 25-year panel lifetime
  • ROI improvement: Accelerates 6-year energy payback period through enhanced output

Technical Architecture

Hardware System

Chassis & Structure: Stainless steel 304 Grade (1mm sheet) with CNC laser cutting → CNC bending → TIG welding process chain for outdoor durability and corrosion resistance.

Drive Assembly: Four 12V DC motors in paired sets with 40mm caterpillar track belts and idler pulley tensioning. BTS7960 motor drivers (43A max current) provide power.

Cleaning System: Roller brush (12V, 100 RPM, 103 N-cm) with centrifugal pump (10 L/min) distributing water through 4 nozzles at 41.67 cm³/s each.

Auxiliary Mechanisms: Rack-and-pinion wiper system and slider-crank braking mechanism, both servo-actuated.

Software Architecture

Qt6 Dashboard: Multi-screen interface including Monitor (energy charts, weather data), Performance (panel analytics), Control (robot operation), and Automate (path planning) screens.

ESP32 Firmware: Main control loop receives UDP commands, executes motor/servo control, and sends status updates. Processes commands for MOVE_FORWARD, MOVE_BACKWARD, TURN_LEFT, TURN_RIGHT, BRUSH_ON, PUMP_ON, WIPER_ACTIVATE, and BRAKE_APPLY.

Path Planning: Serpentine pattern optimization with alternating left-to-right, right-to-left row traversal for minimum distance cleaning.
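
A minimal sketch of the serpentine pattern described above; the panel-array dimensions and lane width here are illustrative, not the deployed parameters.

```python
# Serpentine (boustrophedon) waypoint generation over a rectangular array.
def serpentine_waypoints(width_m: float, height_m: float, lane_m: float):
    """Alternate left-to-right / right-to-left rows so the robot covers
    the array with minimum back-tracking."""
    waypoints, y, left_to_right = [], 0.0, True
    while y <= height_m:
        row = [(0.0, y), (width_m, y)]
        waypoints.extend(row if left_to_right else row[::-1])
        left_to_right = not left_to_right
        y += lane_m  # shift one brush width per pass
    return waypoints


# Example: a 4 m x 2 m array cleaned in 0.5 m lanes.
print(serpentine_waypoints(4.0, 2.0, 0.5))
```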

Challenges and Solutions

Challenge 1: Traction on Inclined Panels

Problem: Standard wheels slip on smooth, tilted solar panel surfaces (10-15° inclination).

Solution: Implemented caterpillar track belt system with soft rubber compound material and idler pulleys for tension maintenance.

Result: Achieved reliable operation on 10-15° inclinations with four 12V motors providing sufficient climbing force.

Challenge 2: Uniform Water Distribution

Problem: Single-point water delivery creates uneven coverage and wastes water.

Solution: Designed four-nozzle distribution system with calculated flow rates and 2x velocity amplification.

Result: Uniform coverage enabling 45% water consumption reduction while maintaining cleaning efficacy.

Challenge 3: Real-Time Communication

Problem: Wireless control can experience packet loss and delays, compromising safety.

Solution: Implemented UDP protocol with acknowledgment system (500ms timeout) and 3-attempt retry logic.

Result: Achieved <100ms latency with 99% packet delivery rate for safe real-time control.
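
A dashboard-side sketch of this acknowledged-UDP scheme (500ms timeout, up to 3 attempts), written in Python for brevity rather than the project's Qt6/C++; the robot address and the ACK payload are assumptions, while the command strings follow the firmware vocabulary above.

```python
# Acknowledged UDP: send a command, wait for an ACK, retry on packet loss.
import socket

ROBOT_ADDR = ("192.168.1.50", 4210)  # hypothetical ESP32 address


def send_command(cmd: str, retries: int = 3, timeout_s: float = 0.5) -> bool:
    """Return True once the robot acknowledges the command."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)  # 500 ms acknowledgment window
        for _ in range(retries):
            sock.sendto(cmd.encode(), ROBOT_ADDR)
            try:
                reply, _ = sock.recvfrom(64)
                if reply == b"ACK":
                    return True
            except socket.timeout:
                continue  # lost packet: retry
    return False


if __name__ == "__main__":
    if not send_command("MOVE_FORWARD"):
        send_command("BRAKE_APPLY")  # fail safe if the link drops
```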

Challenge 4: Cross-Platform Deployment

Problem: Solar installations use diverse operating systems requiring universal compatibility.

Solution: Selected Qt6 framework for native cross-platform C++ development.

Result: Successfully deployed on Windows and Linux with identical features and reliable UDP communication.

Future Work

  • Enhanced Sensors: Wind speed and particulate matter sensors for environmental monitoring
  • AI Integration: Machine learning for predictive cleaning schedule optimization
  • Computer Vision: Camera-based dirt detection for targeted cleaning verification
  • Solar-Powered Operation: Self-charging capability for extended autonomous operation
  • Multi-Robot Coordination: Fleet management for large-scale solar farm deployments
  • Edge Detection: Ultrasonic/IR sensors for panel boundary detection and fall prevention
  • Weather Integration: Automatic scheduling based on weather API forecasts
  • Cloud Platform: Web-based monitoring dashboard for remote access

Medication and Multipurpose Drone for Wildlife Conservation

Overview

Engineered a hexacopter drone platform designed specifically for Kaziranga National Park to track and protect endangered one-horned rhinos. The drone combines autonomous flight capabilities, computer vision for wildlife detection, and bio-inspired mechanisms for extended surveillance operations.

Problem Statement

Kaziranga National Park houses 66.7% (2,413 out of ~3,600) of the world's one-horned rhinos. Despite conservation efforts, rhinos face critical threats:

  • Poaching for horns despite government anti-poaching measures
  • Seasonal floods trapping rhinos without food for extended periods
  • Injuries requiring medical attention in remote, inaccessible areas
  • Limited ground-based surveillance capabilities across vast park areas

Key Features

Autonomous Flight System

  • Flight Endurance: 40.54 minutes continuous operation
  • Coverage Area: 500 hectares per mission
  • Flight Controller: Pixhawk 2.4.8 for precise autonomous control
  • Navigation: GPS-based waypoint navigation with SLAM capabilities

Bio-Inspired Design Features

  • Falcon-Like Claw Mechanism: Enables perching on tree branches to conserve battery during stationary surveillance
  • Bat-Like Sonar: 6x ultrasonic sensors (HC-SR04) for 360° obstacle avoidance in dense forest environments
  • Power Management: Perching extends surveillance time by 3x through intelligent power conservation

Computer Vision & AI

  • Detection Model: YOLOv3 trained for rhino detection and tracking
  • Injury Assessment: Real-time computer vision to identify wounded rhinos
  • Surveillance: Continuous monitoring with automated alert system

Multiple Operation Modes

  • Medical Surveillance: Track injured rhinos and guide rescue teams to their location
  • Anti-Poaching Security: Continuous surveillance to deter poaching activities
  • Safari Assistance: Live video feed for tourists and location services for tour guides
  • Population Tracking: Automated rhino counting and movement pattern analysis

Mobile Application

  • Live video streaming from drone camera
  • Real-time rhino location tracking on map
  • User location services for safari groups
  • Remote drone control and mission planning

Technical Specifications

Hardware Components

  • Frame: Custom hexacopter design (8.4 kg total weight)
  • Motors: 6x TITAN T5010 300KV BLDC motors
  • ESC: 6x 30A Electronic Speed Controllers
  • Propellers: 18-inch with 6.5 pitch
  • Battery: TATTU 30,000mAh 6S 25C LiPo
  • Flight Controller: Pixhawk with buzzer and arming switch
  • Onboard Computer: Raspberry Pi 4B+ (4GB RAM)
  • Camera: FPV camera with real-time video transmission
  • Sensors: GPS (Ublox Neo M8N), 6x Ultrasonic sensors, LiDAR
  • Communication: VTX (10km range), 5.8GHz FPV antenna
  • Gripper: Custom servo-controlled claw mechanism

Performance Metrics

  • Thrust-to-Weight Ratio: 1.2:1
  • Total Thrust: 10,088.4 grams
  • Current Draw: 44.4A at cruising speed
  • Flight Time: 40.54 minutes calculated endurance
  • Project Cost: ₹116,444.22 (~$1,400 USD)

Software Architecture

Mission Planning

  • ArduPilot for autonomous waypoint navigation
  • Python-based mission planner for coverage optimization
  • Integration with Kaziranga population density heat maps
  • Automated path planning for maximum coverage efficiency

Computer Vision Pipeline

  1. Real-time video capture from FPV camera
  2. YOLOv3 object detection for rhino identification (sketched after this list)
  3. Horn detection for injury assessment
  4. Automated alert generation for park officials
  5. GPS coordinate logging for rescue operations
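
A hedged sketch of the detection step (item 2 above) using OpenCV's DNN module; the weight/config filenames and the class index for "rhino" are assumptions tied to the custom-trained model, not the deployed code.

```python
# YOLOv3 inference over the FPV stream via OpenCV's DNN module.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3-rhino.cfg", "yolov3-rhino.weights")
layers = net.getUnconnectedOutLayersNames()

cap = cv2.VideoCapture(0)  # FPV camera stream
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True)
    net.setInput(blob)
    for out in net.forward(layers):
        for det in out:
            scores = det[5:]
            cls, conf = int(np.argmax(scores)), float(np.max(scores))
            if cls == 0 and conf > 0.5:  # class 0 = rhino (assumed)
                # det[0:4] holds normalized centre x/y, width, height;
                # log GPS and raise the park-official alert here.
                print(f"rhino detected, confidence {conf:.2f}")
cap.release()
```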

ROS2 Integration

  • Sensor data fusion (LiDAR + GPS + IMU)
  • SLAM for real-time localization and mapping
  • Nav2 for path planning and obstacle avoidance
  • Multi-sensor coordination for autonomous flight

Technical Challenges & Solutions

  • Limited Flight Endurance: Implemented bio-inspired falcon claw perching mechanism allowing the drone to land on tree branches, conserving battery while maintaining surveillance. Extended effective operation time by 3x.
  • Dense Forest Navigation: Deployed bat-inspired ultrasonic sensor array (6 sensors) for 360-degree obstacle detection, enabling safe autonomous flight through dense vegetation.
  • Wildlife Detection Accuracy: Custom-trained YOLOv3 model on rhino dataset with specific focus on horn detection for injury assessment, achieving high detection rates in varied lighting conditions.
  • Remote Medical Assistance: Integrated GPS tracking with real-time video feed, enabling park officials to locate injured rhinos and dispatch medical teams with precise coordinates.

Impact & Results

  • Anti-Poaching: Continuous drone surveillance acts as deterrent for poaching activities
  • Faster Response: Real-time injured rhino detection enables immediate medical intervention
  • Conservation Data: Automated population tracking and movement pattern analysis
  • Tourism Enhancement: Live wildlife feed improves safari experience without disturbing animals
  • Cost-Effective: ₹116,444 system provides capabilities of much more expensive commercial drones

Future Enhancements

  • Extended Coverage: Upgrade to higher capacity batteries for longer flight times and larger area coverage
  • Advanced LiDAR: 3D terrain mapping and enhanced obstacle avoidance
  • Multi-Drone Coordination: Swarm-based surveillance for complete park coverage
  • Thermal Imaging: Night vision capabilities for 24/7 monitoring
  • Automated Medication Delivery: Payload system for remote medicine administration
  • AI-Based Analysis: Automated rhino counting and health assessment
  • Weather Resistance: Waterproofing for monsoon season operations
  • Expansion: Adapt system for other wildlife sanctuaries and endangered species

Technologies Used

ROS2 Python YOLOv3 OpenCV ArduPilot Pixhawk Raspberry Pi SLAM LiDAR GPS Navigation Ultrasonic Sensors Computer Vision Bio-inspired Robotics Embedded Systems Mobile App Development

Academic Context

Developed as a Control Systems course project (November 2022) demonstrating practical application of:

  • PID control for stable hexacopter flight (sketched below)
  • Sensor fusion and state estimation
  • Autonomous navigation and path planning
  • Real-world robotics system integration for wildlife conservation
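
A textbook PID sketch of the kind used for attitude stabilization; the gains and the 100 Hz loop rate are illustrative, not the flight controller's actual tuning.

```python
# Classic PID: proportional + integral + derivative terms on the error.
class PID:
    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_err = 0.0, 0.0

    def update(self, setpoint: float, measured: float) -> float:
        err = setpoint - measured
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv


roll_pid = PID(kp=4.5, ki=0.05, kd=0.8, dt=0.01)  # 100 Hz attitude loop
motor_correction = roll_pid.update(setpoint=0.0, measured=0.03)
```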

Adaptive Vehicle Control Based on Pedestrian Behavior

Overview

Developed a predictive autonomous vehicle control framework that dynamically adjusts vehicle speed and behavior in real-time based on pedestrian behavioral cues, moving beyond traditional reactive obstacle avoidance systems. The system addresses the fundamental gap between reactive and predictive autonomous navigation by anticipating pedestrian intent rather than simply reacting to proximity.

Technologies Used

ROS2 GEM e2 Vehicle LiDAR (Ouster) RGB-D Camera (OAK-D) YOLOv11 DBSCAN Stanley Controller PID Control GNSS Python Sensor Fusion

Problem Statement

Traditional AV navigation systems treat pedestrians outside the roadway as static obstacles while cruising, relying on simple reactive braking once they begin to cross. This reactive approach cannot handle complex pedestrian interactions or anticipate human intent. Our project developed a control framework that dynamically adjusts vehicle speed and control in real-time based on pedestrian behavior cues, rather than proximity alone.

Key Features

  • Multi-Sensor Perception: LiDAR and RGB-D camera fusion for robust pedestrian detection
  • Pedestrian Behavior Prediction: Trajectory prediction, motion forecasting, and Time-to-Collision (TTC) calculation
  • Intelligent State Machine: Multi-phase decision system with CRUISE, STOP_YIELD, SLOW_CAUTION, and CREEP_PASS states
  • Real-time Adaptation: Dynamic speed and path adjustment based on predicted pedestrian behavior
  • Safety Controller: Emergency braking and velocity control with PID feedback
  • Stanley Controller: Precise lateral control for path following
  • Sensor Fusion: Weighted fusion of LiDAR (0.8 distance, 0.3 direction) and Camera (0.2 distance, 0.7 direction) data

Technical Architecture

Perception Stack

  • LiDAR Processing: Voxelization, ground filtering, outlier removal, DBSCAN clustering, tracking with EMA smoothing, geometric and motion-based human detection
  • RGB-D Processing: YOLOv11 object detection, depth extraction, pedestrian pose transformation to ego frame
  • Sensor Fusion: Time synchronization, data association with Euclidean distance matching (2.0m threshold), weighted averaging (see the sketch after this list)
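
A minimal sketch of the weighted fusion step with the stated weights and 2.0m association gate; the detection format and function names are illustrative.

```python
# LiDAR dominates range (0.8) and the camera dominates bearing (0.7),
# applied after a 2.0 m Euclidean association gate.
import math

GATE_M = 2.0
W_LIDAR_DIST, W_CAM_DIST = 0.8, 0.2
W_LIDAR_DIR, W_CAM_DIR = 0.3, 0.7


def fuse(lidar_det, cam_det):
    """Each detection is (distance_m, bearing_rad) in the ego frame."""
    dl, bl = lidar_det
    dc, bc = cam_det
    # Association gate: reject pairs whose positions disagree by > 2.0 m.
    dx = dl * math.cos(bl) - dc * math.cos(bc)
    dy = dl * math.sin(bl) - dc * math.sin(bc)
    if math.hypot(dx, dy) > GATE_M:
        return None
    dist = W_LIDAR_DIST * dl + W_CAM_DIST * dc
    bearing = W_LIDAR_DIR * bl + W_CAM_DIR * bc  # naive for small angles
    return dist, bearing


print(fuse((6.1, 0.10), (6.6, 0.12)))  # -> fused (distance, bearing)
```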

Prediction Module

  • Pedestrian trajectory buffering and smoothing
  • Motion prediction using historical trajectory data
  • Ego vehicle trajectory prediction
  • Time-to-Collision (TTC) calculation (sketched below)
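
A constant-velocity TTC sketch consistent with the module above; the actual prediction works on buffered, smoothed trajectories, so this is a simplified illustration.

```python
# TTC from relative position and velocity under a constant-velocity model.
import numpy as np


def time_to_collision(p_ped, v_ped, p_ego, v_ego):
    """Positions in m, velocities in m/s (2D ego-frame vectors).
    Returns TTC in seconds, or inf if the gap is not closing."""
    rel_p = np.asarray(p_ped, float) - np.asarray(p_ego, float)
    rel_v = np.asarray(v_ped, float) - np.asarray(v_ego, float)
    closing_speed = -np.dot(rel_p, rel_v) / (np.linalg.norm(rel_p) + 1e-6)
    if closing_speed <= 0.0:
        return float("inf")  # diverging: no predicted collision
    return float(np.linalg.norm(rel_p) / closing_speed)


# Pedestrian 10 m ahead, stepping toward a vehicle moving at 5 m/s: ~2 s.
print(time_to_collision((10, 2), (0, -1), (0, 0), (5, 0)))
```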

Planning & Control

  • High-Level Decision: Three-phase state machine (Critical Checks, Context, Recovery)
  • Safety Controller: Speed mapping (CRUISE → 5 m/s, SLOW → 2.5 m/s) and emergency braking
  • Stanley Controller: Minimize heading and cross-track error for lateral control (control law sketched below)
  • Velocity PID: Smooth acceleration/deceleration for longitudinal control
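
A compact sketch of the lateral/longitudinal split: the Stanley law combines heading error with cross-track error, while the state machine selects the target speed. The gain and the CREEP_PASS speed are assumptions; the CRUISE and SLOW speeds follow the mapping above.

```python
# Stanley steering plus the state-to-speed mapping.
import math

SPEED_MAP = {"CRUISE": 5.0, "SLOW_CAUTION": 2.5,
             "CREEP_PASS": 1.0, "STOP_YIELD": 0.0}  # CREEP speed assumed


def stanley_steering(heading_err: float, cross_track_err: float,
                     speed: float, k: float = 0.5) -> float:
    """delta = heading error + atan(k * e_ct / v)."""
    return heading_err + math.atan2(k * cross_track_err, max(speed, 0.1))


def target_speed(state: str) -> float:
    return SPEED_MAP.get(state, 0.0)  # unknown state: default to a stop


print(stanley_steering(0.05, 0.3, target_speed("SLOW_CAUTION")))
```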

Results & Performance

Experiment Type                  Experiments    Success Rate
Cruise Mode                      5              100% (5/5)
No Pedestrian w/ Sign            10             100% (10/10)
Crossing Pedestrian w/ Sign      10             90% (9/10)
Stationary Pedestrian            5              100% (5/5)
Crossing Pedestrian              10             90% (9/10)
Pedestrian Walking Along Road    10             80% (8/10)
Vehicle Stanley Control          8              87.5% (7/8)

Overall System Success: ~91% across all scenarios

Challenges & Solutions

  • Challenge: Human movement is inherently uncertain and unpredictable
    Solution: Implemented probabilistic trajectory prediction with motion smoothing and TTC-based early warning
  • Challenge: Sensor fusion with different modalities (LiDAR vs Camera)
    Solution: Developed weighted fusion approach leveraging LiDAR's distance accuracy and Camera's directional precision
  • Challenge: Real-time decision making with safety constraints
    Solution: Designed hierarchical state machine with critical safety checks, context-aware behavior, and recovery mechanisms

Team Members

Het Patel (hcp4) • Sunny Deshpande (sunnynd2) • Ansh Bhansali (anshb3) • Keisuke Ogawa (ogawa3)

Open-World Semantic-Based Zero-Shot 6D Pose Estimation Using SAM-3 and FoundationPose

Open-World Zero-Shot 6D Pose Estimation Demo

Overview

Developed a novel open-vocabulary 6D object pose tracking framework that extends NVIDIA's FoundationPose architecture to enable language-guided, zero-shot tracking of arbitrary objects without pre-registered CAD models. By integrating the Moondream2 vision-language model, SAM-3 segmentation, and on-the-fly 3D mesh generation from Objaverse-XL, the system achieves real-time, occlusion-robust pose estimation with dynamic target switching via natural language prompts. This enables robotic manipulators to seamlessly transition between tracking different objects (e.g., "grasp the red bottle" to "now grasp the blue cup") in unstructured environments without reinitialization.

Technologies Used

FoundationPose (CVPR 2024) SAM-3 Moondream2 VLM Objaverse-XL TripoSR YOLOv8 PyTorch 2.0/2.7 CUDA 11.8/12.6 NVDiffRast Gemini API Python

Problem Statement

Traditional 6D pose estimation methods face critical limitations that restrict their deployment in real-world robotic manipulation scenarios. NVIDIA's FoundationPose, while achieving zero-shot inference for unseen objects, requires pre-provided CAD models and manual mask annotation in the initial frame. It often fails in heavily occluded scenes (especially on the LineMOD dataset), where errors propagate through the mesh-matching and refinement stages. Furthermore, no existing system supports real-time, language-guided, multi-object pose estimation with dynamic target switching, a crucial capability for responsive robotic manipulation in novel environments.

Key Features

  • Open-Vocabulary Detection: Lightweight Moondream2 VLM for edge-compatible semantic scene understanding
  • Zero-Shot Mesh Generation: On-the-fly 3D proxy generation via Objaverse-XL retrieval (10M+ assets) and TripoSR
  • Language-Driven Segmentation: SAM-3 integration for text-prompt-based, occlusion-robust target segmentation
  • Dynamic Target Switching: Seamless mid-task object switching via natural language without reinitialization
  • Hierarchical Mesh Acquisition: Three-tier system (benchmark CAD → Objaverse retrieval → TripoSR generation)
  • Real-Time Tracking: Render-and-compare architecture with transformer-based pose refinement
  • Mesh Dictionary Caching: Asynchronous mesh fetching and caching to eliminate redundant queries
  • Multi-Object Support: Concurrent tracking of multiple objects with individual semantic labels

Technical Architecture

Stage 1: Semantic Scene Analysis

  • Moondream2 VLM generates comprehensive object inventory from RGB stream
  • Produces discrete candidate list (e.g., "red bottle," "blue cup," "black keyboard")
  • Gemini API fallback for semantic query enhancement when detection fails
  • Enables prompt-based object specification even for BOP unseen objects

Stage 2: On-the-Fly 3D Mesh Generation

  • Primary: Load ground-truth CAD models when available (benchmark datasets)
  • Retrieval: Query Objaverse-XL database via language-guided similarity search
  • Generation: TripoSR generates candidate mesh from single observed image
  • Selection: Mesh manager scores candidates based on silhouette, depth, and IoU alignment
  • Asynchronous fetching and caching eliminates redundant queries during multi-object tracking

Stage 3: Language-Driven Segmentation

  • SAM-3 accepts natural language prompts (e.g., "the red apple") to output pixel-level masks
  • Outperforms traditional R-CNN detectors in heavy clutter and occlusion
  • Temporal consistency for video tracking with frame-to-frame coherence
  • Simultaneous multi-object segmentation via distinct text prompts

Stage 4: Unified 6D Pose Estimation & Tracking

  • Render-and-compare: FoundationPose aligns retrieved mesh with video observation
  • Pose scoring: Uniform sampling with composite scoring (IoU + Depth + Silhouette; sketched below)
  • Iterative refinement: Transformer-based pose refinement for sub-frame accuracy
  • Dynamic switching: Instant mask + mesh updates enable seamless target transitions
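
A hedged sketch of the composite hypothesis score; the weights and the depth-error scale are illustrative assumptions, and the renderer that produces `rendered_mask`/`rendered_depth` for each pose hypothesis sits outside this sketch.

```python
# Score one pose hypothesis by comparing its rendering to the frame.
import numpy as np


def composite_score(rendered_mask, observed_mask,
                    rendered_depth, observed_depth,
                    w_iou=0.4, w_depth=0.3, w_sil=0.3):
    inter = np.logical_and(rendered_mask, observed_mask).sum()
    union = np.logical_or(rendered_mask, observed_mask).sum()
    iou = inter / max(union, 1)  # mask overlap term

    overlap = np.logical_and(rendered_mask, observed_mask)
    if overlap.any():
        err = np.abs(rendered_depth[overlap] - observed_depth[overlap])
        depth_term = float(np.exp(-err.mean() / 0.02))  # 2 cm scale, assumed
    else:
        depth_term = 0.0

    # Silhouette agreement: fraction of the rendered outline observed.
    sil = inter / max(rendered_mask.sum(), 1)
    return w_iou * iou + w_depth * depth_term + w_sil * sil
```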

Results & Performance

Overall Performance (44 Evaluations, 12 Scenes, 18 Objects, 787 Frames)

Metric               Mean ± Std
ADD AUC              71.61% ± 39.10%
ADD-S AUC            88.31% ± 28.56%
Rotation Error       44.23° ± 58.38°
Translation Error    2.44 cm ± 6.13 cm
Processing Time      13.08 s ± 0.15 s

Best Performing Objects (Top 5 by ADD-S AUC = 100%)

Object                        ADD AUC    Rotation (°)     Translation (cm)
Power Drill (4 scenes)        100.0%     2.0° ± 0.4°      0.24
Bleach Cleanser (3 scenes)    100.0%     2.4° ± 0.8°      0.34
Banana (2 scenes)             100.0%     5.3° ± 0.4°      0.40
Mustard Bottle (2 scenes)     90.0%      19.7° ± 25.6°    0.22
Pudding Box (1 scene)         100.0%     2.7°             0.24

Symmetric Object Performance (Demonstrating ADD-S Effectiveness)

Object                        ADD AUC    ADD-S AUC    Translation (cm)
Master Chef Can (4 scenes)    83.0%      99.0%        0.55
Bowl (2 scenes)               8.0%       100.0%       0.38
Mug (2 scenes)                98.0%      100.0%       0.48
Tuna Fish Can (4 scenes)      98.0%      98.0%        0.57

Key Insight: ADD-S metric successfully handles symmetric objects, converting Bowl's 8% ADD → 100% ADD-S, demonstrating robust pose recovery despite rotational ambiguity

Runtime Performance Breakdown

  • Average Processing Time: 13.08 seconds per frame
  • SAM-3 Segmentation: ~12.5 seconds (95.6% of total time)
  • FoundationPose Tracking: ~0.58 seconds (near real-time after initialization)
  • First Frame Registration: 13-16 seconds (includes mask generation + initial pose alignment)

Comparison with State-of-the-Art

Method            Zero-Shot Objects    Text Prompt    ADD-S AUC (%)
PoseCNN           No                   No             75.4
DenseFusion       No                   No             82.3
FoundationPose    Yes                  No             89.2
Ours (SAM3+FP)    Yes                  Yes            88.31 ± 28.56

Unique Contribution: Only method combining zero-shot capability with text-prompted segmentation for dynamic, interactive pose tracking. Achieves competitive 88.31% ADD-S AUC across 18 diverse objects without CAD model pre-registration.

Challenges & Solutions

  • Challenge: Direct 3D reconstruction (YOLOv8 + SAM + TripoSR) produced low-quality meshes with artifacts
    Solution: Shifted to retrieval-based strategy leveraging Objaverse-XL's 10M+ professionally designed meshes for clean, artifact-free geometry
  • Challenge: Symmetric objects (cylindrical items) exhibited rotational ambiguity
    Solution: Adopted ADD-S metric accounting for pose equivalence, achieving 76.5% ADD-S AUC despite 111° rotation error
  • Challenge: SAM-3 mask generation dominated processing time (~12.5 sec/frame)
    Solution: Modular architecture with FoundationPose achieving near real-time after initialization; future work targets GPU-accelerated SAM-3
  • Challenge: Retrieved meshes may differ in exact proportions from real instances
    Solution: Depth-based scale estimator adjusts mesh dimensions; composite scoring (IoU + Depth + Silhouette) selects best candidate

Team Members

Het Patel (hcp4) • Sunny Deshpande (sunnynd2) • Ansh Bhansali (anshb3) • Keisuke Ogawa (ogawa3)

Course: CS543 Computer Vision, University of Illinois Urbana-Champaign

Date: November 2024

Mobile Manipulator for Mars Missions

Overview

Designed and simulated a mobile manipulator robot for Mars exploration missions, focusing on sample collection and terrain navigation in challenging environments.

Technologies Used

ROS2 Simulation Mobile Manipulation Gazebo

Constructa-1 Construction Robot

Overview

Architected and deployed a ROS2-based software stack for a 300kg payload Autonomous Mobile Robot (AMR) designed for unstructured construction environments, built on LiDAR and multi-sensor fusion.

Technologies Used

ROS2 SLAM AMR LiDAR Sensor Fusion

Solid State Recorder for SAR Missions

Overview

Engineered the core firmware for a Synthetic Aperture Radar data storage subsystem at ISRO, enabling high-speed, reliable data acquisition critical for earth observation payloads.

Technologies Used

Embedded Systems Firmware C/C++ SAR ISRO

Gaganyaan Cabin Display System

Overview

Contributed to ISRO's human spaceflight mission by developing embedded software for the cabin display system, a key safety-critical HMI, using OpenGL-ES and PetaLinux.

Technologies Used

OpenGL-ES Embedded Linux HMI Petalinux ISRO

Detecting Food Item and Quantity

Overview

Architected a computer vision system for a Samsung IoT edge device, deploying YOLOv4, R-CNN, and BiT models optimized with TensorRT to achieve 96% accuracy in real-time food identification with <100ms latency.

Technologies Used

YOLOv4 TensorRT IoT Computer Vision Samsung

Regular Class Aircraft

Regular Class Aircraft Demo

Team

Team: Aviators International Team of VIT

Competition: SAE INDIA Aero Design Competition

Period: 2022-2023

Overview

As part of the Aviators International Team of VIT, I contributed to the design and development of a Regular Class Aircraft for the prestigious SAE INDIA Aero Design Competition. This project challenged us to create an aircraft capable of meeting strict performance requirements while maintaining stability and safety throughout its flight envelope.

Our team successfully engineered a parasol wing configuration aircraft optimized for maximum lift generation, constructed from lightweight yet robust materials including plywood and aluminum, enhanced through additive manufacturing techniques. The aircraft demonstrated exceptional performance characteristics including stable flight, precise control, and efficient power management.

Design Objectives

The competition requirements demanded strict adherence to several critical performance parameters:

  • Takeoff Performance: Complete takeoff sequence within 100 feet of runway
  • Flight Stability: Achieve and maintain stable flight within 400 feet altitude
  • Maneuverability: Demonstrate precise turning capability with controlled banking
  • Landing Safety: Execute safe landing procedures with minimal ground roll

Technical Specifications

Wing Configuration

  • Design Type: Parasol wing configuration for optimal lift-to-drag ratio
  • Aerodynamic Optimization: High-lift airfoil selection for low-speed performance
  • Wing Placement: Elevated mounting above fuselage for improved ground clearance and stability
  • Structural Integration: Efficient strut-braced design minimizing weight while maximizing strength

Materials & Construction

  • Primary Structure: Plywood for main structural components offering excellent strength-to-weight ratio
  • Reinforcement Elements: Aluminum components at critical stress points and connection interfaces
  • Advanced Manufacturing: Additive manufacturing (3D printing) for complex geometries and custom fittings
  • Surface Finish: Lightweight covering material for aerodynamic smoothness

Power System

  • Battery Configuration: Lithium Polymer (Li-Po) battery pack for high energy density
  • Propulsion: Single electric motor with optimized propeller selection
  • Flight Endurance: Approximately 14 minutes of continuous flight operation
  • Thrust Performance: Motor and propeller combination providing sufficient thrust for all competition requirements
  • Power Management: Electronic speed controller (ESC) for efficient motor control and battery protection

Key Features

  • Parasol Wing Advantage: Provides superior visibility from cockpit and improved stability compared to low-wing designs
  • Lightweight Construction: Optimized material selection achieving minimum weight without compromising structural integrity
  • Electric Propulsion: Clean, quiet operation with excellent throttle response and controllability
  • Modular Design: Easy assembly and disassembly for transport and maintenance
  • Competition Ready: Designed to meet all SAE INDIA Aero Design Competition specifications

Performance Achievements

  • Successfully met takeoff distance requirement of 100 feet
  • Achieved stable flight within 400 feet altitude envelope
  • Demonstrated precise turning and maneuvering capabilities
  • Completed safe landing procedures consistently
  • 14-minute flight endurance exceeding minimum mission requirements

Technologies & Skills

Aircraft Design Aerodynamics CAD Modeling Additive Manufacturing Structural Analysis Flight Dynamics Electric Propulsion Composite Materials SAE Competition Team Collaboration

Learning Outcomes

This project provided invaluable experience in:

  • Aerospace Engineering Fundamentals: Practical application of aerodynamic principles, structural mechanics, and flight dynamics
  • Design Process: Complete aircraft design lifecycle from conceptual design through detailed engineering to flight testing
  • Manufacturing Techniques: Hands-on experience with traditional woodworking, metalworking, and modern additive manufacturing
  • System Integration: Coordinating multiple subsystems (structure, propulsion, control surfaces) into cohesive aircraft design
  • Competition Experience: Working under strict requirements, timelines, and performance specifications
  • Team Dynamics: Collaborating with multidisciplinary team members to achieve common goals

Bank Cheque Processing System

Overview

Built an automated bank cheque processing system using OCR and image processing techniques for efficient cheque verification and data extraction.
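
A minimal sketch of the extraction step, assuming OpenCV plus Tesseract via `pytesseract`; the fixed crop coordinates depend on the cheque template and are placeholders, not the system's actual layout.

```python
# Binarize the scan, crop template fields, and OCR each region.
import cv2
import pytesseract  # pip install pytesseract (requires the tesseract binary)

img = cv2.imread("cheque.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Otsu thresholding suppresses the background pattern before OCR.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Crop fixed fields (MICR line, amount box) from an assumed template layout.
h, w = binary.shape
micr_line = binary[int(0.88 * h):, :]                      # bottom strip
amount_box = binary[int(0.35 * h):int(0.5 * h), int(0.65 * w):]

print(pytesseract.image_to_string(micr_line))
print(pytesseract.image_to_string(amount_box))
```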

Technologies Used

OCR Image Processing OpenCV Python

Cozmo Clench

Cozmo Clench Robot Demo

Competition Details

Event: Cozmo Clench - Techfest, IIT Bombay

Year: 2022

Team: VIT Robotics Team

Overview

Developed an Arduino-based manually controlled rover robot for the Cozmo Clench robotics competition at Techfest, IIT Bombay. The robot was designed to navigate an arena, grip and manipulate blocks, and place them in designated target zones while overcoming various obstacles and challenges.

The competition challenged teams to design and build a manually controlled robot capable of navigating a 3000mm x 2500mm arena, gripping and lifting colored blocks, placing blocks in specific target zones, and operating within strict size and power constraints.

Key Features

Robot Design

  • Compact Dimensions: Robot designed within 300mm x 200mm x 300mm size constraints
  • Gripper Mechanism: Custom-designed claw mechanism for secure block manipulation
  • Sturdy Chassis: Robust frame construction for stability during block transport
  • Omnidirectional Movement: Four-wheel drive system for precise maneuvering

Control System

  • Arduino-based Control: Microcontroller-based architecture for motor control and sensor integration
  • Wireless Operation: Remote control system for manual robot operation
  • Power Management: Efficient 24V onboard power supply system
  • Motor Controllers: PWM-based motor drivers for smooth speed control

Mechanical Components

  • Gripper Assembly: Servo-controlled claw with adjustable grip strength
  • Drive Train: DC geared motors providing adequate torque for block manipulation
  • Structural Materials: Combination of metal and 3D-printed components
  • Sensor Integration: IR/ultrasonic sensors for obstacle detection

Technical Specifications

  • Maximum Dimensions: 300mm x 200mm x 300mm
  • Power Supply: 24V DC onboard battery system
  • Control: Manual wireless control with Arduino-based receiver
  • Gripper: Servo-controlled claw mechanism
  • Sensors: IR/Ultrasonic for obstacle detection
  • Motors: 4x DC geared motors for drive, 1-2 servos for gripper

Challenges and Solutions

Precise Block Gripping

Challenge: Achieving consistent and reliable grip on blocks of varying sizes

Solution: Designed adaptive gripper with rubber padding and adjustable servo angles for optimal grip force

Stability During Block Transport

Challenge: Robot tipping when lifting blocks due to center of gravity shift

Solution: Implemented low center of gravity design with counterweight and wide wheelbase for enhanced stability

Accurate Zone Placement

Challenge: Positioning blocks precisely in target zones under time pressure

Solution: Developed intuitive control mapping and practiced maneuvering patterns for efficient placement

Power Efficiency

Challenge: Battery drain during extended competition runs

Solution: Optimized power consumption through efficient motor selection and smart power management code

Competition Performance

  • Successfully completed block manipulation tasks
  • Demonstrated reliable gripper operation
  • Achieved consistent navigation and obstacle avoidance
  • Showcased robust mechanical and electrical design

Technologies & Skills

Arduino Embedded Systems C/C++ Robotics CAD Design 3D Printing Motor Control Wireless Communication Techfest IIT Bombay Mechanical Design

Learning Outcomes

This project provided hands-on experience in:

  • Robotics Design: End-to-end robot development from concept to competition
  • Embedded Systems: Arduino programming and hardware interfacing
  • Mechanical Engineering: CAD design, 3D printing, and mechanism development
  • Control Systems: Manual control interface and motor control algorithms
  • Team Collaboration: Working with cross-functional team on tight deadlines
  • Problem Solving: Rapid prototyping and iterative design improvements
  • Competition Experience: Performing under pressure in competitive environment

Home Automation Using Augmented Reality

Overview

Developed an AR-based home automation system enabling intuitive control of IoT devices through an augmented reality interface for seamless smart home management.

Technologies Used

Augmented Reality IoT Unity C#

Weather Prediction Using Extended Kalman Filter

Overview

Implemented an Extended Kalman Filter for accurate weather prediction and state estimation from noisy sensor data, improving forecasting accuracy.
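
A minimal scalar sketch of the EKF predict/update cycle; the process model (temperature relaxing toward a daily mean) and the noise values are illustrative assumptions, not the project's actual model.

```python
# Scalar EKF: linearize the process model, then predict and update.
def f(x: float) -> float:
    """Process model: temperature relaxes toward a 20 °C daily mean."""
    return x + 0.1 * (20.0 - x)


def F_jac(x: float) -> float:
    """df/dx, the linearization the EKF uses in the covariance predict."""
    return 1.0 - 0.1


def ekf_step(x, P, z, Q=0.05, R=1.0):
    # Predict
    x_pred = f(x)
    P_pred = F_jac(x) * P * F_jac(x) + Q
    # Update (measurement model h(x) = x, Jacobian H = 1)
    K = P_pred / (P_pred + R)          # Kalman gain
    x_new = x_pred + K * (z - x_pred)  # correct with the noisy reading
    P_new = (1.0 - K) * P_pred
    return x_new, P_new


x, P = 15.0, 4.0
for z in [16.2, 17.1, 18.4, 18.9]:  # noisy sensor readings
    x, P = ekf_step(x, P, z)
print(round(x, 2))
```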

Technologies Used

Kalman Filter State Estimation Python MATLAB

University of Illinois Urbana-Champaign

Master's in Autonomy and Robotics

August 2025 - Present

Program Overview

Pursuing a Master's degree in Autonomy and Robotics at one of the world's leading engineering institutions. The program focuses on advanced robotics systems, autonomous navigation, computer vision, and safe AI deployment in real-world environments.

Academic Performance

4.0
Current GPA

Maintaining perfect academic standing while engaging in cutting-edge research and coursework.

Relevant Coursework

Computer Vision
Principles of Safe Autonomy
Humanoid Robotics
Mobile Robotics

Request Academic Transcripts

For official academic transcripts and records, please send a request via email.

Request UIUC Transcripts

Vellore Institute of Technology

Bachelor of Technology in Electronics and Communication Engineering

September 2020 - May 2024

Program Overview

Completed a comprehensive undergraduate program in Electronics and Communication Engineering, with a strong focus on embedded systems, control systems, and machine learning applications. Gained hands-on experience through multiple internships at leading organizations including ISRO and Samsung R&D.

Academic Performance

3.8
Final GPA

Graduated with distinction, demonstrating excellence across core engineering and advanced elective courses.

Relevant Coursework

Analog and Digital Electronics
Communication Systems
Control Systems
Algorithms
Machine Learning

Request Academic Transcripts

For official academic transcripts and records from VIT, please send a request via email.

Request VIT Transcripts

Get In Touch

I'm always open to discussing new opportunities, collaborations, or exciting robotics and AI projects.

Location

Urbana, IL

Schedule a Meeting

Book a time slot