Using Novel Visuo-Tactile Sensors to Recognize Contact in Robotic Manipulation



We humans can easily perform day-to-day tasks, such as picking up objects and repositioning them in confined places, thanks to the advanced dexterity of our hands. From childhood, we unknowingly develop the skills to manipulate objects of various shapes, sizes, and materials. Robotic manipulation, however, is not that simple. In cluttered environments like homes, it requires stable grasps, precise placement and robustness against external contact. A growing trend is the development of soft hands that can conform to an object's shape, absorb unexpected forces at contact and compensate for load changes during manipulation, in contrast to the existing and widely used GelSight and GelSlim sensors on grippers; I explore this trend in the coming sections. My primary focus is on the types of visual, tactile, and visuo-tactile sensors used in robotic manipulation that enable contact-rich manipulation. I conducted an experiment to show that the Soft-bubble visuo-tactile sensor can be used to determine whether a robot is in contact with its environment. The Soft-bubble sensor was chosen because it can deform around a contacting object more freely and drastically than other sensors. The dataset was prepared by bringing a Soft-bubble sensor, mounted on a manipulator at a fixed position, into contact with a 3-D printed hex tool, replicating the conditions of a robot in contact with tight spaces in a home.

Keywords: Robotic manipulation, Visuo-tactile sensing, Machine learning, Binary classification


The first robotic manipulator was constructed in the 1960s. In earlier days, robotic manipulation consisted of carefully prescribed movements that a robot would execute with no ability to adapt to a changing environment. For example, in early factory settings, robot arms followed predetermined trajectories and assumed that objects would always appear at the same place. From the 1990s onwards, researchers aimed to increase the robustness of object manipulation at all levels.

Now, robots can automatically generate movement sequences, drawing on artificial intelligence and automated reasoning. They can handle errors and uncertainty in sensing at runtime, can adapt their trajectory to retrieve objects at different locations, and are skilled in picking up and manipulating objects in repetitive and familiar settings.

To enable multi-purpose manipulation, roboticists are designing human-like hands capable of using tools. Learning to manipulate in a real-world setting is expensive, time-consuming and laborious, so researchers use simulation environments. However, simulators are not yet advanced enough to simulate realistic robot environments. As a result, researchers are pursuing two different approaches:

  • Robots learn skills by observing humans perform complex manipulation tasks.
  • Researchers construct databases of real object manipulation, with the goal of better informing simulators and generating examples that are as realistic as possible.

Robotic manipulation is still a poor proxy for human dexterity. To date, no robot can easily hand-wash dishes, button a shirt or peel a potato. Robots can adapt to only some variations in object properties; for example, they cannot pick the right key out of a bunch of keys. Thus, unlike industrial robots, which can operate with certainty about their tasks and surroundings, robots designed for homes and other unstructured environments must cope with large imprecision in their knowledge of the surrounding environment, such as clutter, occlusions, variable lighting conditions and never-before-seen objects. As a result, the sensors used in robots (tactile, visuo-tactile and visual sensors) are becoming more advanced, and roboticists are testing their algorithms in simulation.
Fig 1: Soft-bubble sensor output (top right) is used to stack wine glasses (left)1

Recent approaches to tactile sensing use a camera that captures the deformations of a reflective soft surface as it contacts the world. However, light from the external world does not reach the camera because the sensor's skin is opaque, which prevents its use as a traditional vision sensor. Thus, we build on the approach of novel visuo-tactile sensors.

Visuo-tactile sensors convert the contact deformation into images. Obtaining a close-up 3D view of the location where manipulation contacts occur can otherwise be challenging, particularly in confined spaces and cluttered environments. A vision-based approach is practical because such sensors are comparatively easy to manufacture and to adapt to different robots. The Soft-bubble sensor addresses the need for flexibility, as it can work with a large range of free-form membrane shapes and can withstand rough treatment. Its fabrication process is simple, it is lightweight, and its components are easily replaced, which makes it a suitable sensor for low-cost, low-payload robots.

Literature Review

The Soft-bubble is a highly compliant, dense-geometry tactile sensor for robot manipulation. It is a new kind of tactile sensor that combines the advantages of a highly compliant elastomeric structure with the ability to sense the detailed geometric features of contacting objects. The sensor captures the deformation of a thin, flexible, air-filled membrane using an off-the-shelf depth sensor. It consists of three main functional components: an elastic membrane sensing surface, an airtight hull that allows pressurization of the membrane, and an internal depth sensor2. The resulting sensor is highly compliant, lightweight and robust to continued contact, and it outputs a high-resolution depth image that is ideal for manipulation applications.

High-resolution tactile sensors, such as GelSight, GelSlim and FingerVision, use cameras to gather large amounts of data over relatively small contact areas. GelSight uses precise internal lighting and photometric stereo algorithms to generate height maps of the contacting geometry3.

The Soft-bubble draws inspiration from these camera-based tactile sensors, particularly in its use of an off-the-shelf depth camera and an opaque membrane that drapes sensed object surfaces in consistent color and reflectance properties. Mechanically, the Soft-bubble can deform around a contacting object more freely and drastically than the gel-based sensors above. Because it uses a self-contained depth sensor, precisely placed illumination and 3D reconstruction algorithms are not needed to capture deformation. This allows the sensor to work with a large range of free-form membrane shapes. However, the sensor currently senses geometry only, i.e., extracting contact forces requires additional modeling and analysis. This means that it can neither be used for human-scale object manipulation nor compensate for externally induced shear4.

The fabrication process is cheap, simple and repeatable, and the use of air rather than gel makes the sensor lightweight, so it is suitable for low-cost, low-payload robots5. Thanks to the resilience of latex, the sensor membrane can withstand rough treatment, and worn components are easily replaced. It is also well suited for contact-heavy manipulation, as the compliant, high-friction membrane surface offers large contact patches and form closure via deformation around an object. The quality and resolution of the pair of images produced by the Soft-bubbles on contact are more than sufficient to enable object tracking and pose estimation6.

Grasping is the most basic and popular application of tactile sensing. Grasps depend not only on the object but also on the robot (as shown by Fig. 1). As the number of degrees of freedom of the hand increases, so does the complexity of its control. Tactile sensing aims to increase the success rate and the stability of a grasp by controlling grasp parameters such as the grasp pose, width and force used to hold an object7. Before planning how to grasp an object, roboticists first had to understand the purpose of the grasp, because the appropriate grasp differs between scenarios. For example, a knife may be held one way normally, but the grasp changes when it is used for cutting or is being cleaned.

Such events would also be difficult to detect with external vision due to occlusion. Learning-based grasping requires data for training, and a common approach is to generate the data from trial-and-error experiments8. The algorithm is therefore tested in simulation first and then refined on a real platform; otherwise, the trials may damage the robot and the process becomes tedious.

Fig 2: Dimensioned sensor assembly of the depth sensor, PMD pico flexx. All dimensions in mm.4

The Soft-bubble sensor integrates multiple tactile perception capabilities to enable robust manipulation in tightly constrained environments9. Mechanically, Soft-bubble grippers achieve robust grasps because they are malleable, easy to build thanks to their air-filled membrane design, and durable (as seen in Fig 2). They are also closely related to other visuo-tactile sensors, with the key difference that the generated depth maps are measured directly by the internal imaging sensor rather than inferred.

The earlier Soft-bubble prototype was a single sensor used as an end effector. Soft-bubble grippers, however, have been designed to meet both task-based and perceptual requirements. To achieve tasks in constrained domestic environments, the Soft-bubbles are designed to attach to a standard parallel gripper, interact with human-scale objects, and fit into tight household spaces, e.g., a sink or dishwasher. Perceptually driven improvements include the use of a shorter-range depth sensor, as depicted in Fig 3.

Fig 3: Soft-bubble gripper used to manipulate household objects in tight spaces. All dimensions in mm.2

A batch of Soft-bubble sensor fingers can be inexpensively assembled in as little as two hours with no more than an FDM printer, a laser cutter, scissors, glue and a paintbrush10. The time-of-flight (ToF) sensors in the bubbles provide both depth and IR images over independent channels. Whether a grasp is stable is determined using thresholds on the bubble pressure differential as well as the finger velocity. Requiring the fingers to have stopped moving eliminates confounding issues like image blur.
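The pressure-and-velocity stability check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `FingerState` fields, the units and both threshold values are hypothetical and are not taken from the Soft-bubble gripper papers, where such values would be tuned for each gripper.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would have to be tuned for each gripper.
MIN_PRESSURE_DIFF_KPA = 0.2    # bubble must be squeezed at least this much
MAX_FINGER_SPEED_M_S = 0.001   # fingers must be (nearly) stationary


@dataclass
class FingerState:
    pressure_kpa: float           # current internal bubble pressure
    baseline_pressure_kpa: float  # bubble pressure recorded before contact
    velocity_m_s: float           # finger closing speed


def grasp_is_stable(fingers, min_dp=MIN_PRESSURE_DIFF_KPA, max_speed=MAX_FINGER_SPEED_M_S):
    """Return True when every bubble is pressed and every finger has stopped moving."""
    return all(
        (f.pressure_kpa - f.baseline_pressure_kpa) >= min_dp  # pressure differential check
        and abs(f.velocity_m_s) <= max_speed                  # velocity check (avoids motion blur)
        for f in fingers
    )


left = FingerState(pressure_kpa=101.6, baseline_pressure_kpa=101.3, velocity_m_s=0.0)
right = FingerState(pressure_kpa=101.7, baseline_pressure_kpa=101.3, velocity_m_s=0.0005)
print(grasp_is_stable([left, right]))  # True
```

Requiring both conditions on every finger is a conservative design choice: a single moving or unloaded finger is enough to report the grasp as not yet stable.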
Tactile sensing still appears to be largely experimental in robotics for the following reasons:

  • Installation is difficult: robot hands vary in finger surface (flat or curved), but the sensor must be placed in a limited space. The sensor is also expensive and requires capabilities that are hard to achieve.
  • Installing the sensor increases the complexity of the wiring and power supply, along with the processing circuits. As a result, programming becomes complicated, and implementing human-quality tactile sensors is impossible with the current state of the art.
  • The sensor can sometimes be broken by external forces during interaction. Maintenance then becomes complicated, as periodic repairs are needed and tactile sensors are not easy to repair.
  • Many tactile sensors are not compatible with one another, as they vary in modality and in spatial and temporal resolution. Their sensing principles also differ, which reduces the reusability of software.
  • Many robotic manipulations can be implemented without tactile sensors.

For these reasons, researchers have been unable to adopt tactile sensors widely and to accumulate knowledge about them.

Fundamental difficulties and open questions in modeling compliant contact mechanics have limited the adoption and deployment of soft tactile sensors. How high a spatial resolution is necessary for tactile sensing? Is geometry sensing or force sensing more important?

Although data-driven methods have been employed to overcome the modeling difficulties, there remains a lack of highly compliant mechanisms that also incorporate high-resolution contact sensing. However, because they directly capture interactions at the contacting surface, tactile sensors have the potential to be predominant when vision and other exteroceptive modalities are occluded or cannot sense because of a lack of sufficiently salient features10. For example, Fig. 4 shows how the depth images from a Soft-bubble gripper can be used to estimate the pose of an object held in the hand.

Fig 4: The various stages of the in-hand pose estimation pipeline.

a) A plastic mug being grasped in the Soft bubble gripper system.
b) The concatenated point-cloud produced from the depth images from each Soft-bubble sensor computed in the gripper frame.
c) The contact-patch filtered concatenated point-cloud.
d) The estimated in-hand mug pose from the proximity pose estimator.
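Steps (b) and (c) of this pipeline can be illustrated with a short NumPy sketch. It assumes a pinhole camera model with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), a known 4×4 transform from each camera to the gripper frame, and a simple contact-patch rule that is an assumption rather than the published method: a pixel belongs to the patch when the membrane has moved towards the camera by more than a tolerance compared with a reference depth image captured without contact. Step (d), the proximity pose estimator, is not attempted here.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth image in metres to (H*W, 3) camera-frame points (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def contact_patch(depth, reference_depth, fx, fy, cx, cy, T_gripper_camera, tol=0.002):
    """Keep points where the membrane moved towards the camera by more than `tol` metres,
    expressed in the gripper frame via the 4x4 homogeneous transform T_gripper_camera."""
    points = depth_to_points(depth, fx, fy, cx, cy)
    pressed = (reference_depth - depth).reshape(-1) > tol  # contact pushes the membrane inwards
    patch = points[pressed]
    homog = np.hstack([patch, np.ones((len(patch), 1))])   # (N, 4) homogeneous coordinates
    return (homog @ T_gripper_camera.T)[:, :3]


# Synthetic example: flat membrane 5 cm away, with a 2x2 region pressed 5 mm closer.
reference = np.full((8, 8), 0.05)
depth = reference.copy()
depth[3:5, 3:5] = 0.045
T = np.eye(4)
T[:3, 3] = [0.0, 0.0, 0.1]  # hypothetical camera-to-gripper offset of 10 cm along z
patch = contact_patch(depth, reference, fx=100.0, fy=100.0, cx=3.5, cy=3.5, T_gripper_camera=T)
print(patch.shape)  # (4, 3)
```

For a two-finger gripper, calling `contact_patch` once per Soft-bubble (each with its own transform) and stacking the results with `np.vstack` gives a concatenated, filtered cloud of the kind shown in panel (c).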

As we have already seen, the fingertip GelSight sensor measures 3D geometry and contact force information with high spatial resolution; however, its arduous fabrication has severely restricted its application. Similarly, GelSight and GelSlim, while cost-effective, have been difficult to manufacture and assemble in large quantities or by inexperienced users, because they rely on complex by-hand fabrication techniques that are incompatible with retail fabrication services. Furthermore, the GelSight sensor does not detect the force distribution, whereas the human tactile system can measure both the surface geometry and the contact force distribution.

Soft-bubble sensors, on the other hand, come with perception methods that are computationally efficient enough to enable closed-loop, real-time control of complex tasks. The proximity field method used in the Soft-bubble gripper's pose estimation framework is a novel contribution that is promising for achieving tractable depth-based tracking.

My focus is on the Soft-bubble design used in grippers, which enables multiple forms of perception, and in particular on using it to determine whether a robot is in contact with its environment.



Python 3 was the programming language used for this project. It was chosen because it is a popular and powerful language in robotics. Python also has several cloud- and web-based implementations, so all the programming could be done from a web browser instead of installing additional software on the computer, which could run into hardware or dependency issues.

The web-based implementation of Python 3 used for this project is Google Colaboratory. This tool allows Google Drive to be connected to a cloud-based integrated development environment (IDE) that is free and easy to use. Having my IDE connected to my Google Drive was important because a large amount of data had to be stored and Google Drive was a great way to store that data.

Google Colaboratory also comes with all the dependencies needed for this project pre-installed. NumPy1, Matplotlib2, and Scikit-Learn3 are the main libraries used in this project. NumPy is an open-source library that allows users to perform numerical calculations efficiently in Python. Matplotlib is another open-source library, for creating plots from data in Python. NumPy and Matplotlib are usually used together because Matplotlib supports data held in NumPy arrays. Finally, Scikit-Learn is the backbone of my machine learning methods. It is built on NumPy and Matplotlib, as well as other scientific Python libraries (such as SciPy) that were not used directly in this project. Scikit-Learn provides simple, efficient and open-source tools for predictive data analysis. These three libraries were used heavily throughout the project.

The choice of these tools contributes to the scientific rigor of the study: Google Colaboratory is transparent and allowed me to collaborate with my mentor, track necessary edits and investigate how the code evolved. Python also has powerful data visualization libraries such as Matplotlib, which produced accurate plots; NumPy helped me understand the data patterns; and Scikit-Learn provided accurate data analysis. In this way, the code was examined step by step.

Alternative tools like Jupyter Notebook were also considered. Jupyter Notebook requires no internet connection when run locally, and each cell executes quickly. However, notebooks are stored in JSON format, so tracking changes and collaborating with version control tools like Git often becomes complicated and leads to errors.

Data Acquisition

The dataset was prepared by testing the sensor in confined spaces constructed in the lab and recording images both in contact and not in contact. A Soft-bubble sensor, a highly compliant, easy-to-build and lightweight tactile sensor, was brought into contact with a 3-D printed hex tool. Steps such as adjusting the lighting and keeping the camera at a proper angle were taken to ensure the dataset's quality and representativeness. To mitigate possible bias, the variety of the dataset was kept in mind, and care was taken that neither class of sensor readings was overrepresented. A data path was established so that the image files could be located from the notebook. The setup was intended to replicate how a robot comes into contact with a flexible environment like the tight spaces in a home.

Fig 5: Contact and non-contact images from the Soft-bubble sensor with respect to its environment.
Fig 6: Visualizing the train-test split with a test size parameter of 0.5.
Fig 7: Visualizing the train-test split with a test size parameter of 0.98.

Data Loading

All images are grayscale, 140 × 175 pixels (width × height), and were captured by the Soft-bubble sensor. A counter showed that there were 100 contact images and 100 non-contact images. Each image is an array of pixels that can be displayed with various color schemes (RGB, RGBA, HSV, grayscale, etc.). The images were labeled by pre-sorting them into contact and non-contact folders and then using the folder name as the label. The .png files were read into Python in Google Colaboratory.
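A minimal sketch of the folder-based labeling described above is shown below. The folder names `contact` and `non_contact`, the use of `matplotlib.image.imread`, and the luminance weights used to collapse RGB or RGBA files to grayscale are assumptions; the original project's code is not reproduced here.

```python
from pathlib import Path

import numpy as np
import matplotlib.image as mpimg

# Hypothetical folder layout: <root>/contact/*.png and <root>/non_contact/*.png
LABELS = {"contact": 1, "non_contact": 0}
LUMA = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 weights for R, G, B


def to_gray(img):
    """Collapse an (H, W, 3) RGB or (H, W, 4) RGBA image to (H, W); grayscale input is returned as-is."""
    return img if img.ndim == 2 else img[..., :3] @ LUMA


def load_dataset(root):
    """Return X (one flattened grayscale image per row) and y (1 = contact, 0 = non-contact)."""
    features, labels = [], []
    for folder, label in LABELS.items():
        for path in sorted(Path(root, folder).glob("*.png")):
            img = mpimg.imread(str(path))  # PNG files are read as floats in [0, 1]
            features.append(to_gray(img).ravel())
            labels.append(label)
    return np.array(features), np.array(labels)
```

Calling `X, y = load_dataset(DATA_PATH)`, where `DATA_PATH` is a hypothetical path to the dataset in Google Drive, yields one row of 175 × 140 = 24,500 pixel values per image; `collections.Counter(y.tolist())` then reports the per-class counts.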

Data Preprocessing

Code was written to find the unique instances of each label. A grid of subplots was then made, with each image reshaped to its height and width; one set of axes showed contact images and the other showed non-contact images. This visualization helped in identifying outliers. Before deciding whether to exclude an outlier, the reason it occurred was investigated and its impact on the analysis was considered.
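Counting the unique labels and drawing a grid of example images might look like the sketch below. The function names, the two-row layout (contact on top, non-contact below) and the default image size of 175 × 140 pixels are assumptions based on the description in this section.

```python
import numpy as np
import matplotlib.pyplot as plt


def label_counts(y):
    """Map each unique label to how many times it occurs."""
    values, counts = np.unique(y, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))


def plot_examples(X, y, height=175, width=140, n_per_class=4):
    """Top row: contact examples; bottom row: non-contact examples."""
    y = np.asarray(y)
    # squeeze=False keeps `axes` 2-D even when n_per_class == 1
    fig, axes = plt.subplots(2, n_per_class, figsize=(2 * n_per_class, 5), squeeze=False)
    for row, (label, name) in enumerate([(1, "contact"), (0, "non-contact")]):
        idx = np.flatnonzero(y == label)[:n_per_class]
        for col, ax in enumerate(axes[row]):
            ax.axis("off")
            if col < len(idx):
                ax.imshow(X[idx[col]].reshape(height, width), cmap="gray", vmin=0, vmax=1)
                ax.set_title(name)
    return fig
```

`print(label_counts(y))` followed by `plot_examples(X, y)` gives the kind of side-by-side overview that makes unusual images easy to spot.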

Machine Learning Methods

An SGD (stochastic gradient descent) classifier was used to turn these representations into a binary prediction: 1 (contact) or 0 (non-contact). The image data were stored in the variable X and the labels in Y. The SGD classifier was chosen because it is efficient on large datasets: it updates the model from one sample (or small batch) at a time, so with incremental training it needs less memory than classifiers that process the entire dataset at once. It can also make the model more robust to noisy data and adapt quickly to new patterns in the data, which may improve the model's generalizability.
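The memory argument above relies on incremental training. In scikit-learn, `SGDClassifier.partial_fit` updates the model from one mini-batch at a time, so the full image set never has to be processed in a single call. The sketch below assumes flattened images in `X`, labels 0/1 in `y`, and an arbitrary batch size and epoch count; a plain `fit` call on in-memory data, which is likely closer to what was done in this project, would also work.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def train_streaming(X, y, batch_size=20, epochs=5, seed=0):
    """Fit an SGDClassifier incrementally, one shuffled mini-batch at a time."""
    X, y = np.asarray(X), np.asarray(y)
    clf = SGDClassifier(random_state=seed)  # default hinge loss: a linear SVM-style classifier
    rng = np.random.default_rng(seed)
    classes = np.array([0, 1])              # all labels must be declared on the first partial_fit call
    for _ in range(epochs):
        order = rng.permutation(len(y))     # reshuffle every epoch
        for start in range(0, len(y), batch_size):
            batch = order[start:start + batch_size]
            clf.partial_fit(X[batch], y[batch], classes=classes)
    return clf
```

In a true streaming setting, each mini-batch would be loaded from disk just before its `partial_fit` call rather than sliced from an in-memory array.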

Visualizing the Train and Test Split

The random seed was fixed so that the split was the same every time, which made it possible to check the impact of the test-size parameter alone. In this way, it could be determined whether changing the test-size parameter affected the results.
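A fixed `random_state` makes scikit-learn's `train_test_split` return the same split on every run, so only the test-size parameter changes between experiments. The sketch below counts the labels on each side of the split. Passing `stratify=y` and the seed value 42 are added assumptions; stratification forces equal class proportions, whereas without it the proportions in each split can drift slightly, which may explain the uneven percentages reported for Chart 1.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split


def split_label_counts(y, test_size, seed=42):
    """Count contact (1) and non-contact (0) labels on each side of a reproducible split."""
    y = np.asarray(y)
    train_idx, test_idx = train_test_split(
        np.arange(len(y)), test_size=test_size, random_state=seed, stratify=y
    )
    return Counter(y[train_idx].tolist()), Counter(y[test_idx].tolist())


y = np.array([1] * 100 + [0] * 100)  # 100 contact and 100 non-contact labels, as in this dataset
for test_size in (0.5, 0.98):
    train_counts, test_counts = split_label_counts(y, test_size)
    print(test_size, dict(train_counts), dict(test_counts))
```

With 100 images per class, a test size of 0.98 leaves two contact and two non-contact images for training and 98 of each for testing.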

The images were stored in RGBA and, to train on them efficiently, they had to be converted to grayscale. The classifier then maps the grayscale pixel values of each image to a binary prediction.


A standard scaler was used to standardize the data, and the shape of the data was printed to show how much data was in the training set. A classification accuracy of around 98% was obtained, which indicated that the classifier could reliably tell from the sensor images whether the sensor was in contact with the environment.
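Standardization and classification can be chained so that the scaler is fitted on the training images only and then applied unchanged to the test images. The pipeline below is a hedged sketch: the seed, the stratified split and the default `SGDClassifier` settings (hinge loss) are assumptions, and it prints the array shapes as described above.

```python
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_and_score(X, y, test_size=0.5, seed=42):
    """Split, standardize (fit on training data only), train an SGD classifier and return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    print("training set:", X_train.shape, "test set:", X_test.shape)
    model = make_pipeline(StandardScaler(), SGDClassifier(random_state=seed))
    model.fit(X_train, y_train)  # the scaler's mean and variance come from X_train only
    return accuracy_score(y_test, model.predict(X_test))
```

With 200 images of 175 × 140 pixels and a test size of 0.5, both printed shapes would be (100, 24500).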

Overfitting was checked with k-fold cross-validation, in which the training set is divided evenly into k subsets; in each round, one of the k subsets is held out as test data and the model is trained on the remaining subsets. Cross-validation does not change the model itself, but it gives a more reliable estimate of how well the model generalizes to unseen data.
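k-fold cross-validation of the same kind of model can be sketched with scikit-learn's `cross_val_score`. Using `StratifiedKFold` with k = 5 and shuffling is an assumption rather than the project's documented setting; stratification keeps the contact/non-contact ratio of the full set in every fold.

```python
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cross_validated_accuracy(X, y, k=5, seed=0):
    """Mean and standard deviation of accuracy over k stratified folds."""
    model = make_pipeline(StandardScaler(), SGDClassifier(random_state=seed))
    folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")  # one score per held-out fold
    return scores.mean(), scores.std()
```

Keeping the scaler inside the pipeline means it is re-fitted on each training fold, so no statistics leak from the held-out fold into training.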

A 50-50 split between the training and test sets was used because it balances the amount of data available for training and for evaluation, and it allowed the data to be analyzed quickly.

If the training set contained all of the contact images and the test set none of them, the classifier could never be evaluated on identifying contact. Similarly, if all of the non-contact images were in the test set and none in the training set, the classifier would never learn to identify non-contact. Hence, a 50-50 split between the training and test sets was needed so that both classes were represented in training and in testing (shown in Table 1).

Test size | # Contact training samples | # Non-contact training samples | Classification accuracy
Table 1: Classification results for varying test size parameters

Chart 1 shows the train-test split with a test size parameter of 0.5. The training set (shown in blue) and the test set (shown in orange) each contain 100 images. In the training set, about 47% of the counts are contact images and around 50% are non-contact images; in the test set, more than 50% of the counts are contact images and 48.7% are non-contact images.

Chart 2 shows the train-test split with a test size parameter of 0.98. The training set (shown in blue) contains 4 images and the test set (shown in orange) contains 196 images. In both sets, 50% of the counts are contact images and 50% are non-contact images.


The machine learning method classified the images with 100% accuracy from so few training samples, showing that the sensor images reliably indicate whether the sensor is in contact with the environment. When there was no contact with the environment, the measured depth of the bubble membrane was large and the image was black. With contact, the measured depth was much smaller, and the outline of the hex tool could be seen. This suggested that color was the distinguishing factor: with contact the image was white, and with no contact the image was black.

However, under different lighting conditions and object textures, this conclusion might not be as robust. Shadows, reflections, and changes in ambient lighting may influence the perceived color, and textures might introduce patterns that the model could misinterpret. The model's accuracy might therefore decrease under such conditions, and in more complex scenarios it could fail to generalize, which limits the reliability of these results.
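One way to probe the lighting concern is to re-evaluate a trained model on test images whose brightness has been shifted artificially. The sketch below is only a crude proxy under stated assumptions: a uniform offset clipped to [0, 1] does not reproduce shadows, reflections or textures, and the default offset values are arbitrary. A sharp accuracy drop as the offset grows would indicate that the model relies on overall brightness.

```python
import numpy as np
from sklearn.metrics import accuracy_score


def accuracy_under_brightness_shift(model, X_test, y_test, offsets=(0.0, 0.1, 0.25, 0.5)):
    """Accuracy of a fitted classifier after adding a uniform brightness offset to every pixel."""
    results = {}
    for offset in offsets:
        shifted = np.clip(X_test + offset, 0.0, 1.0)  # keep pixel values in the valid [0, 1] range
        results[offset] = accuracy_score(y_test, model.predict(shifted))
    return results
```

Any fitted scikit-learn classifier with a `predict` method can be passed as `model`, for example a standardization-plus-SGD pipeline.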

Thresholding could also be a good way to solve this problem with this dataset. In thresholding, pixel values are compared against a chosen threshold value; on this dataset, this gave 100% accuracy. However, if the data looked different, the thresholding method would be more prone to error and might fail, since color might no longer be the distinguishing factor and the dataset would not be as clean.
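A brightness threshold of the kind described above can be sketched as follows. Choosing the threshold halfway between the mean brightness of the two classes in the training set is an assumed rule, and classifying each image by its mean pixel value (rather than thresholding individual pixels) is a simplification.

```python
import numpy as np


def fit_brightness_threshold(X_train, y_train):
    """Threshold halfway between the mean image brightness of the two classes (an assumed rule)."""
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    brightness = X_train.mean(axis=1)  # mean pixel value of each image
    return 0.5 * (brightness[y_train == 1].mean() + brightness[y_train == 0].mean())


def predict_contact(X, threshold):
    """1 (contact) for images brighter than the threshold, else 0 (non-contact)."""
    return (np.asarray(X).mean(axis=1) > threshold).astype(int)
```

Because the decision depends on a single number per image, this rule breaks down as soon as brightness stops separating the classes, which is the weakness discussed above.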


The experiments had some limitations:

During coding, multiple errors were encountered that obstructed the conversion of the data into binary predictions. For instance, obtaining a proper data reading was sometimes a challenge because of a few missing logic steps in the code and miscalculations. It was also difficult to bridge the gap between the input (a single array or arrays) and the output (1 for contact or 0 for non-contact) and to obtain the 50-50 train-test split. Some steps, such as transferring the dataset into a private Google Drive, were also a problem.

These errors often produced no image, which led to contradictory and ambiguous results. However, rewriting the code and checking the sensor outputs gave reliable data.


This study highlighted the effectiveness of a visuo-tactile sensor like the Soft Bubble for robotic manipulation. The steps that were carried out in the experiment were:

  • Importing some of the tools
  • Accessing and loading the data
  • Processing the data and analyzing the labels
  • Plotting examples of my data on the X and Y axes
  • Splitting the train and test data

In conclusion, an accuracy of 98% was found when there were two training sets and two test sets of data. The machine learning method was able to distinguish, based on color, whether an image showed contact or no contact. It can therefore be deduced that visuo-tactile sensors like the Soft-bubble can be used to determine whether a robot is in contact with its environment, keeping in mind that the lighting was kept normal under all conditions and the texture was smooth and regular.

To understand what limits the Soft-bubble's performance and hinders its transition from laboratory to real-world conditions, future studies should focus on the principles behind the design and operation of soft robots. They can build on this research through:

  • Static and dynamic modeling of the soft bubble sensor so that contact location and pressure on the membrane may be estimated based on sensed deformation and membrane physics.
  • The addition of dots or other trackable features on the inner surface of the membrane, so that shear forces and moments can be estimated as well.
  • Modeling will also allow the sensor contact mechanics and output to be simulated.
  • Methods for calibrating the tactile sensor’s depth output, as well as for quantifying measurement error and sensor noise, are actively under development.
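As a starting point for quantifying sensor noise and measurement error, repeated depth frames of a static, unloaded membrane can be compared against a known reference depth. The sketch below is an assumption about how such a calibration might be scored (per-pixel temporal standard deviation as noise, mean offset as bias, and overall RMSE); it is not a published Soft-bubble calibration procedure.

```python
import numpy as np


def depth_error_stats(frames, true_depth):
    """Noise and error summary for N repeated (H, W) depth frames of a static scene at a known depth."""
    frames = np.asarray(frames, dtype=float)
    temporal_std = frames.std(axis=0, ddof=1)  # noise: frame-to-frame spread at each pixel
    bias = frames.mean(axis=0) - true_depth    # error: systematic offset at each pixel
    rmse = np.sqrt(np.mean((frames - true_depth) ** 2))
    return {
        "mean_noise_std": float(temporal_std.mean()),
        "mean_bias": float(bias.mean()),
        "rmse": float(rmse),
    }
```

`true_depth` may be a single number for a flat target or an (H, W) array when the reference surface is not flat; NumPy broadcasting handles both.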


Thank you to Andrea Sipos for guidance in the development of this research paper.


  1. N. Kuppuswamy, A. Alspach, A. Uttamchandani, S. Creasey, T. Ikeda, and R. Tedrake, "Soft-bubble grippers for robust and perceptive manipulation," pp. 9917–9918, October 25–29, 2020.
  2. W. Yuan, S. Dong, and E. H. Adelson, "GelSight: High-resolution robot tactile sensors for estimating geometry and force," 2/21, 13/21, 19/21, published 29 November 2017.
  3. I. H. Taylor, S. Dong, and A. Rodriguez, "GelSlim 3.0: High-resolution measurement of shape, force and slip in a compact tactile-sensing finger," pp. 1–4, 23 March 2021.
  4. C. R. Harris, K. J. Millman, S. J. van der Walt, et al., "Array programming with NumPy," Nature, vol. 585, pp. 357–362, 2020. DOI: 10.1038/s41586-020-2649-2.
  5. J. D. Hunter, "Matplotlib: A 2D graphics environment," Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007.
  6. Y. Bekiroglu, J. Laaksonen, J. A. Jorgensen, V. Kyrki, and D. Kragic, "Assessing grasp stability based on learning and haptic data," IEEE Transactions on Robotics, vol. 27, no. 3, pp. 616–629, 2011.
  7. J. Tegin and J. Wikander, "Tactile sensing in intelligent robotic manipulation: A review," Industrial Robot: An International Journal, vol. 32, no. 1, pp. 64–70, 2005.
  8. R. S. Dahiya, G. Metta, M. Valle, and G. Sandini, "Tactile sensing: From humans to humanoids," IEEE Transactions on Robotics, vol. 26, no. 1, pp. 1–20, 2010.
  9. A. Yamaguchi and C. G. Atkeson, "Implementing tactile behaviors using FingerVision," in 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), Nov. 2017, pp. 241–248.
  10. L. Zhang and J. C. Trinkle, "The application of particle filtering to grasping acquisition with visual occlusion and tactile sensing," in 2012 IEEE International Conference on Robotics and Automation (ICRA), 2012, pp. 3805–3812.

