FlexiCup: Wireless Multimodal Suction Cup with Dual-Zone Vision-Tactile Sensing

Tsinghua University, Pengcheng Laboratory

System Overview


Conventional suction cups lack the sensing needed for contact-aware manipulation in unstructured environments. FlexiCup is a multimodal suction cup with wireless electronics that integrates dual-zone vision-tactile sensing within a single optical system. The central zone dynamically switches between vision and tactile modalities via LED illumination control, while the peripheral zone provides continuous spatial awareness. The modular mechanical design supports both vacuum (sustained-contact adhesion) and Bernoulli (contactless lifting) actuation while maintaining the identical dual-zone sensing architecture, demonstrating sensing-actuation decoupling: the sensing and actuation principles can be varied independently. Key specifications include a 41.5 N maximum normal force and 640×480 image streaming at 30 Hz over Wi-Fi.

Hardware Design

System Architecture and Integration

The system adopts a modular layered architecture integrating pneumatic actuation, electronic control, and optical sensing. The bottom housing provides the pneumatic interface and contact surface with a central groove for the PDMS membrane; its airway geometry differs between vacuum and Bernoulli configurations while the membrane mounting interface remains consistent. The top housing mounts a PCB assembly centered on an ESP32S3 microcontroller (3.7 V, 300 mAh battery, 12.5 μH wireless charging coil), streaming 640×480 images at 30 Hz over Wi-Fi. The electronic module remains identical across both suction modes, enabling wireless operation with electrical decoupling from the robot arm.
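The 640×480 stream at 30 Hz can be consumed on the host with a small receiver loop. The sketch below assumes a length-prefixed JPEG framing (4-byte big-endian byte count per frame); the actual FlexiCup wire format is not specified, so treat this as illustrative.

```python
import io
import struct

def read_frame(stream):
    """Read one length-prefixed JPEG frame from a binary stream.

    Assumes each frame is sent as a 4-byte big-endian byte count followed
    by the JPEG payload (an assumed framing, not FlexiCup's documented
    protocol). Returns the payload bytes, or None at end of stream.
    """
    header = stream.read(4)
    if len(header) < 4:
        return None                      # clean end of stream
    (length,) = struct.unpack(">I", header)
    payload = stream.read(length)
    return payload if len(payload) == length else None

# Example: parse a frame from an in-memory stream standing in for the socket.
buf = io.BytesIO(struct.pack(">I", 3) + b"\xff\xd8\xff")
frame = read_frame(buf)
```

In practice the stream object would wrap the Wi-Fi socket; the parsing logic is independent of the transport.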


Vision-Tactile Sensing System

A single 180° fisheye camera captures two functional zones: the central zone enables switchable vision-tactile sensing via illumination control, while the peripheral zone maintains continuous spatial awareness. In vision mode, LEDs remain inactive, allowing ambient light through the membrane for object detection. In tactile mode, LEDs illuminate the membrane internally, imaging deformations induced by contact forces. Multimodal fusion with multi-head attention (8 heads, 512-d) achieves 100% object recognition accuracy across 13 categories, outperforming vision-only (82.5%) and tactile-only (46.7%) approaches.
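A minimal NumPy sketch of the fusion step, assuming standard scaled dot-product multi-head attention over the two zone features (matching the paper's 8-head, 512-d configuration; the random matrices below stand in for trained projection weights):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_zones(central, peripheral, n_heads=8, d_model=512, seed=0):
    """Mix central (tactile) and peripheral (vision) features with
    scaled dot-product multi-head attention. Inputs are (d_model,)
    feature vectors; random projections stand in for learned weights."""
    rng = np.random.default_rng(seed)
    d_head = d_model // n_heads                       # 512 / 8 = 64 per head
    x = np.stack([central, peripheral])               # (2, d_model) token sequence
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    def split(y):                                     # (2, d_model) -> (heads, 2, d_head)
        return y.reshape(2, n_heads, d_head).transpose(1, 0, 2)
    Q, K, V = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_head))   # (heads, 2, 2)
    heads = attn @ V                                  # (heads, 2, d_head)
    return heads.transpose(1, 0, 2).reshape(2, d_model) @ Wo     # (2, d_model)

fused = fuse_zones(np.ones(512), np.zeros(512))       # one fused token per zone
```

Each zone's fused token attends over both zones, which is how contact details and spatial context get correlated before classification.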

Dual-Zone Sensing


The peripheral zone maintains environmental awareness through continuous vision for approach planning and obstacle detection, while the central zone provides high-resolution contact imaging via LED-controlled illumination switching. The system switches modalities in real-time through the ESP32S3 microcontroller with dynamic camera exposure and gain adjustments, enabling simultaneous spatial context and precise contact detection for robust manipulation.

PDMS Membrane Design


The dual-layer membrane employs a PDMS base layer (30:1 ratio, 70°C × 4h) providing mechanical compliance that captures surface details, and a semitransparent reflective layer (Ag:PDMS 100:1, 70°C × 0.5h) providing the reflective surface for photometric tactile imaging. Four modular bottom configurations (I–IV) implement varying membrane diameters: Configurations I–II are optimized for vacuum operation with deformable objects and tactile sensitivity, while III–IV support Bernoulli operation. Adhesion force ranges from sub-Newton to over 40 N across configurations.

Reconfigurable Dual-Mode Suction Mechanisms

Dual-Mode Suction Mechanisms and CFD Simulation Results

Mechanical configurations: vacuum (left) and Bernoulli (right) suction mechanisms

FlexiCup supports vacuum and Bernoulli suction through modular bottom housing reconfiguration. Vacuum mode generates adhesion through negative pressure, requiring sustained contact that induces continuous membrane deformation for dense tactile feedback. Bernoulli mode creates contactless lifting through outward airflow, supporting visual perception and tactile verification during positioning. Both modes share the identical dual-zone sensing architecture and control pipelines, confirming sensing-actuation decoupling where mode selection depends on task requirements rather than sensing constraints. The pneumatic system employs a vacuum pump (750 W, 140 L/min, −90 kPa max) and an air compressor (800 W, 65 L/min, 0.8 MPa) for the respective modes.

System Characterization

Sensor-Based Force Characterization

To characterize suction performance, we conducted automated force measurements using a calibrated 6-axis force/torque sensor on a smooth acrylic surface at −80 kPa vacuum pressure. The test protocol included normal pull-off tests measuring detachment force and horizontal drag tests measuring shear resistance, with each test repeated 20 times for statistical reliability.


Force measurements at −80 kPa: normal force (pull-off) and tangential force (drag) averaged over 20 trials. Inset: experimental setup with 6-axis force/torque sensor.

Results: The averaged force profiles reveal transient behaviors during attachment and detachment. Measurements demonstrate a mean maximum normal force of 41.5 N and shear force of 8.34 N, exceeding theoretical predictions (F = P × A ≈ 33.2 N for nominal diameter 23 mm) due to structural compliance increasing the effective sealing area under vacuum.
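The theoretical baseline follows directly from the pressure-area product:

```python
import math

P = 80e3                      # vacuum pressure difference, Pa (-80 kPa gauge)
d = 23e-3                     # nominal membrane contact diameter, m
A = math.pi * (d / 2) ** 2    # sealed area, ~4.15e-4 m^2
F = P * A                     # ideal normal holding force
print(f"F = {F:.1f} N")       # F = 33.2 N, vs. 41.5 N measured
```

The measured 41.5 N exceeds this because the compliant membrane spreads under vacuum, enlarging the effective sealed area beyond the nominal 23 mm diameter.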

Non-Contact Verification: Wafer Handling

To experimentally verify non-contact handling capability, we conducted semiconductor wafer pick-up experiments comparing vacuum and Bernoulli suction modes. The two modes serve complementary roles: vacuum provides strong sustained-contact adhesion, while Bernoulli enables contactless lifting essential for delicate surfaces.

Bernoulli Mode

Vacuum Mode

Wafer Pick-up Comparison

Surface condition after manipulation: vacuum leaves visible contact smudging, Bernoulli maintains pristine surface

Force Measurement

Measured vertical force: Bernoulli ≈ 0 N (non-contact), Vacuum ≈ 3.5 N (contact required)

The vacuum-picked wafer exhibited visible smudging (~3.5 N peak contact force), while the Bernoulli-picked wafer remained pristine with near-zero contact force. This confirms Bernoulli's contactless handling capability for delicate surfaces, complementing vacuum's stronger adhesion for general manipulation.

Dynamic Performance Evaluation

Dynamic stress tests at 3.0 rad/s joint speed and 3.0 rad/s² acceleration demonstrate stable suction and sensing performance while holding objects with varying mass distributions. Demonstrations include an orange and a water-filled bottle, where water sloshing causes significant dynamic mass redistribution, yet the suction grasp remains stable with no sensing degradation.

Fast Motion with Orange

Fast Motion with Water Bottle

The demonstrations confirm stable grasping during aggressive motions, robustness to dynamic mass redistribution, and consistent sensing performance without image degradation or modality switching issues.

Applications

Modular Perception-Driven Grasping

Experimental Setup and Framework

To validate that the dual-zone sensing architecture operates effectively across both actuation principles, we conducted modular perception-driven grasping experiments using identical sensing and control pipelines, with only the pneumatic parameters adjusted between modes. The pipeline integrates YOLOv8n for peripheral target detection and ResNet-34 for central tactile verification. On LEGO board manipulation with 25%, 50%, and 75% obstacle coverage (180 total trials), vacuum mode achieves a 90.0% mean success rate and Bernoulli mode 86.7%. The comparable success rates validate sensing-actuation decoupling: performance differences stem from the actuation mechanisms rather than sensing limitations.
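The mode-agnostic pipeline can be sketched as a single grasp loop with the detector and verifier injected as callables (the stubs below stand in for YOLOv8n and ResNet-34; the motion and pneumatic hooks are hypothetical interface placeholders):

```python
def perception_driven_grasp(peripheral_img, central_img_fn, detect, verify,
                            move_above, engage_suction, lift):
    """One grasp attempt, identical for vacuum and Bernoulli modes.

    detect: peripheral-zone target detector (YOLOv8n in the paper).
    verify: central-zone tactile contact classifier (ResNet-34 in the paper).
    The remaining callables are placeholder robot/pneumatic hooks.
    """
    target = detect(peripheral_img)      # spatial awareness: locate the target
    if target is None:
        return False                     # nothing to grasp
    move_above(target)                   # approach using peripheral context
    engage_suction()                     # only pneumatic parameters differ per mode
    if not verify(central_img_fn()):     # tactile verification before lifting
        return False
    lift()
    return True
```

Because actuation enters only through `engage_suction`, swapping vacuum for Bernoulli leaves the sensing and control logic untouched, which is the decoupling claim being tested.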

End-to-End Contact-Aware Manipulation

Learning Framework

A diffusion policy framework processes multimodal observations (workspace camera, peripheral view, central view, system state) through parallel ResNet-18 encoders. A multi-head attention mechanism (8 heads, 512 dimensions) coordinates central and peripheral features, correlating contact details with spatial context during approach-to-manipulation transitions. The action chunking mechanism (8-step history, 48-step horizon) generates coordinated sequences controlling robot joints, illumination switching, and pneumatic valve state. Training uses AdamW optimizer with cosine annealing over 500 epochs.
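The history/horizon bookkeeping can be sketched as a receding-horizon loop: keep the last 8 observations, predict a 48-step action chunk, execute a prefix, then replan. The paper specifies the 8-step history and 48-step horizon; the replan interval and callables below are illustrative assumptions.

```python
from collections import deque

def chunked_control_loop(policy, get_obs, apply_action,
                         history_len=8, replan_every=8, total_steps=96):
    """Receding-horizon executor in the style of action chunking.

    `policy` maps the last `history_len` observations to a 48-step action
    chunk; only the first `replan_every` actions are executed before
    replanning. All three callables are placeholders for the trained
    diffusion policy and the joint/illumination/valve interface.
    """
    history = deque(maxlen=history_len)      # 8-step observation history
    executed = 0
    while executed < total_steps:
        history.append(get_obs())
        chunk = policy(list(history))        # predicted 48-step action sequence
        for action in chunk[:replan_every]:
            apply_action(action)             # joints, LED switching, valve state
            executed += 1
            if executed >= total_steps:
                break
    return executed
```

Executing only a prefix of each predicted chunk keeps the policy responsive to new observations while preserving the temporal coherence that chunked prediction provides.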

Task Demonstrations Flowchart

Inclined Transport Task

Inclined transport involves positioning above the surface, searching for suitable contact regions, adjusting tilt angle guided by tactile feedback to match inclined surfaces (5°, 10°, or 15°), verifying contact, and performing secure lifting. The system achieves 73.3% success rate (150 demonstrations, 30 evaluation trials), with multi-head attention providing 13% improvement over the configuration without attention by coordinating dual-zone information during vision-tactile transitions.

Orange Extraction Task

Orange extraction consists of transparent cover removal in vision mode, realignment above the orange, and tactile-aware grasping with LED-enabled contact detection. The system achieves a 66.7% success rate (100 demonstrations, 30 evaluation trials). Ablation studies confirm that central-zone sensing is critical for contact detection (its removal reduces success to 33.3%), while the peripheral view contributes essential spatial context for approach planning (its removal reduces success to 36.7%). The workspace camera alone achieves 0% on this task, confirming that intimate sensing is necessary for contact-critical manipulation.

A BC-RNN (Behavioral Cloning with RNN) baseline trained on the same demonstration data failed both tasks (0% success), frequently becoming stuck or failing to coordinate modality switching. This is consistent with findings from the original diffusion policy work, where recurrent baselines exhibited similar stuck behaviors with multimodal action distributions, confirming the advantage of diffusion policy's action chunking mechanism for coordinated suction manipulation.

BC-RNN baseline on inclined transport task: stuck behavior near target

BC-RNN baseline on orange extraction task: failed modality coordination