Abstract:
We describe a new and effective real-time solution for detecting video segments that show an instrument in use during diagnostic or therapeutic operations in endoscopic procedures. We also present a new method for creating training data: similarity-based data augmentation. This method automates most of the creation of a large training dataset and avoids the extensive manual effort that domain experts would otherwise spend collecting and annotating training data. Convolutional Neural Network (CNN) analysis using training data collected with similarity-based data augmentation yields an average F1 score very close to that of CNN analysis using a large manually collected training dataset.
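
As a rough illustration only, the idea of growing a training set by similarity could be sketched as follows. The abstract does not specify the procedure, so everything here is an assumption: frames are represented by precomputed feature vectors, similarity is cosine similarity, and an unlabeled frame inherits the label of its most similar expert-labeled seed frame when that similarity reaches a threshold. The names `expand_training_set`, `cosine_similarity`, `f1_score`, and the threshold value are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_training_set(
    seeds: List[Tuple[Sequence[float], str]],
    candidates: List[Sequence[float]],
    threshold: float = 0.9,  # assumed value, not from the paper
) -> List[Tuple[int, str]]:
    """Assign each candidate the label of its most similar seed.

    Candidates whose best similarity is below `threshold` are skipped.
    Returns (candidate_index, label) pairs for the accepted candidates.
    """
    selected = []
    for idx, features in enumerate(candidates):
        best_label, best_sim = None, threshold
        for seed_features, label in seeds:
            sim = cosine_similarity(features, seed_features)
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is not None:
            selected.append((idx, best_label))
    return selected


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: two expert-labeled seed frames, three unlabeled frames.
seeds = [([1.0, 0.0], "instrument"), ([0.0, 1.0], "no_instrument")]
candidates = [[0.9, 0.1], [0.5, 0.5], [0.05, 1.0]]
auto_labeled = expand_training_set(seeds, candidates)
```

In this toy run the ambiguous middle frame is left unlabeled, which reflects the intent of such a scheme: only frames that closely resemble an expert-labeled example are added automatically, and the rest remain for manual review.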