A new approach forces machine learning models to focus on more of the data when learning a task, which can lead to more reliable predictions. If your Uber driver takes a shortcut, you may reach your destination faster. But if a machine learning model takes a shortcut, it may fail in unexpected ways.
In machine learning, a shortcut arises when a model relies on a simple, superficial feature of a data set to make decisions instead of learning the true nature of the data, which can lead to inaccurate predictions. For example, a model might learn to recognize images of cows by focusing on the green grass that appears in the photos, rather than on the more complex shapes and patterns of the cows.
A new study by MIT researchers explores the problem of shortcuts in a popular machine learning method and proposes a fix that prevents shortcuts by forcing the model to use more of the data in its decisions.
By removing the simpler features the model focuses on, the researchers force it to attend to more complex features of the data that it had not been considering. Then, by asking the model to solve the same task in two ways, once using those simpler features and again using the complex features it has now learned to recognize, they reduce the tendency toward shortcut solutions and improve the model's performance.
In short, the technique reduces the tendency of contrastive learning models to rely on shortcuts by forcing the model to focus on features in the data it had not considered before.
One potential application of this work is to improve the reliability of machine learning models used to identify diseases in medical images. Shortcut solutions in this setting could lead to incorrect diagnoses, with dangerous consequences for patients.
The long road to understanding shortcuts
The researchers focused their study on contrastive learning, a powerful form of self-supervised machine learning. In self-supervised learning, a model is trained on raw data without labels provided by humans, so it can be applied successfully to a much wider range of data.
Self-supervised learning models learn useful representations of the data, which are used as inputs for downstream tasks such as image classification. But if the model takes shortcuts and fails to capture important information, those downstream tasks will not be able to use information that was never captured.
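To make the idea of reusing learned representations concrete, here is a minimal, hypothetical PyTorch sketch that trains a linear classifier (a "linear probe") on top of a frozen self-supervised encoder. The encoder architecture, feature dimension, and class count are placeholder assumptions, not details from the study.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an encoder pretrained without human labels.
pretrained_encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 128),  # placeholder architecture
)
pretrained_encoder.requires_grad_(False)  # freeze: only the probe is trained

linear_probe = nn.Linear(128, 10)         # e.g., 10 downstream classes
optimizer = torch.optim.Adam(linear_probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 32, 32)        # stand-in batch of images
labels = torch.randint(0, 10, (8,))       # stand-in downstream labels

with torch.no_grad():
    features = pretrained_encoder(images)  # representations learned without labels

loss = criterion(linear_probe(features), labels)
loss.backward()
optimizer.step()
```

If the frozen representations missed an important feature because of a shortcut, no amount of training the probe can recover it, which is why shortcuts in the encoder hurt every downstream task.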
For example, suppose a self-supervised learning model is trained to classify pneumonia in X-rays from several hospitals, but it learns to make predictions based on a tag that identifies which hospital a scan came from (because some hospitals have more pneumonia cases than others). The model will then perform poorly when it is given data from a new hospital.
In contrastive learning, an encoder is trained to distinguish between pairs of similar inputs and pairs of dissimilar inputs. This process encodes rich, complex data, such as images, in a way the contrastive learning model can interpret.
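A common objective for training such an encoder is an InfoNCE-style contrastive loss; the article does not specify the researchers' exact loss, so the sketch below is an illustrative assumption, with the temperature value and function name chosen for clarity.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Minimal InfoNCE sketch: pull the anchor embedding toward its
    positive and push it away from negatives. Shapes: anchor and
    positive are (d,), negatives is (n, d); all assumed L2-normalized."""
    pos_sim = (anchor @ positive / temperature).unsqueeze(0)  # (1,)
    neg_sim = negatives @ anchor / temperature                # (n,)
    logits = torch.cat([pos_sim, neg_sim]).unsqueeze(0)       # (1, n+1)
    # The positive sits at index 0; cross-entropy against that index.
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))

# Toy usage with random unit vectors as stand-ins for encoder outputs.
d, n = 128, 16
anchor = F.normalize(torch.randn(d), dim=0)
positive = F.normalize(torch.randn(d), dim=0)
negatives = F.normalize(torch.randn(n, d), dim=1)
print(info_nce_loss(anchor, positive, negatives))
```

In this framing, a shortcut means the encoder can drive this loss down using only the single easiest discriminative feature, ignoring everything else.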
The researchers tested contrastive learning encoders on a series of images and found that, during training, the encoders also fall into shortcut solutions. They tend to focus on the simplest features of an image to decide which pairs of inputs are similar and which are dissimilar, when ideally an encoder should attend to all the useful features of the data when making a decision.
So the team made it harder to tell similar and dissimilar data pairs apart, and found that doing so changes which features the encoder looks at when making a decision.
If you make the task of discriminating between similar and dissimilar items harder and harder, the system is forced to learn more meaningful information from the data, because the task cannot be solved otherwise. However, increasing the difficulty directly creates a trade-off: the encoder gets better at focusing on some features of the data but worse at focusing on others, seeming to almost forget the simpler features.
To avoid this trade-off, the researchers asked the encoder to discriminate between the pairs the same way it had originally, using the simpler features, and also after they removed the information it had already learned. Solving the task both ways simultaneously caused the encoder to improve across all features.
Their method, called implicit feature modification, adaptively modifies samples to remove the simpler features the encoder uses to distinguish the pairs. The technique does not rely on human input, which is important because real-world data sets can contain hundreds of different features that combine in complex ways.
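The sketch below shows one way such an adaptive modification could look in PyTorch, consistent with published descriptions of implicit feature modification: the encoded positive is pushed away from the anchor and the negatives are pulled toward it by a small budget epsilon, and the loss is computed on both the original and the modified pairs. The epsilon value, temperature, and function names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def nce_logits(anchor, positive, negatives, temperature=0.1):
    # Similarity logits with the positive pair placed at index 0.
    pos = (anchor @ positive / temperature).unsqueeze(0)  # (1,)
    neg = negatives @ anchor / temperature                # (n,)
    return torch.cat([pos, neg]).unsqueeze(0)             # (1, n+1)

def ifm_loss(anchor, positive, negatives, epsilon=0.1, temperature=0.1):
    target = torch.zeros(1, dtype=torch.long)  # the positive is "class" 0
    # Task one: the original pairs, solvable with the simpler features
    # the encoder already uses.
    loss_orig = F.cross_entropy(
        nce_logits(anchor, positive, negatives, temperature), target)
    # Task two: modified pairs. Pushing the positive away from the anchor
    # and pulling negatives toward it suppresses the features currently
    # relied on, forcing the encoder to learn new ones.
    loss_mod = F.cross_entropy(
        nce_logits(anchor, positive - epsilon * anchor,
                   negatives + epsilon * anchor, temperature), target)
    # Solving both tasks at once improves the encoder across all features.
    return loss_orig + loss_mod

# Toy usage with random unit-norm embeddings as stand-ins for encoder outputs.
d, n = 128, 16
anchor = F.normalize(torch.randn(d), dim=0)
positive = F.normalize(torch.randn(d), dim=0)
negatives = F.normalize(torch.randn(n, d), dim=1)
print(ifm_loss(anchor, positive, negatives))
```

Because the modification is computed directly in embedding space, no human has to name the features being removed, which matches the article's point that the technique works without human input even when features combine in complex ways.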
From cars to chronic obstructive pulmonary disease
The researchers tested the method with images of vehicles. They used implicit feature modification to adjust the color, orientation, and vehicle type, making it harder for the encoder to discriminate between similar and dissimilar pairs of images. The encoder improved its accuracy on all three features simultaneously.
To see whether the method could withstand more complex data, the researchers also tested it with samples from a medical image database of chronic obstructive pulmonary disease (COPD). Again, the method yielded simultaneous improvements across all the features they evaluated.
Although this work takes important steps toward understanding the causes of shortcut solutions and working to solve them, the researchers say that continuing to refine these methods and extending them to other types of self-supervised learning will be essential future work.