Transfer Learning - ML Strategy

Posted by GUPTA, Gagan
Published: July 22, 2021

What is Transfer Learning?

Transfer learning, in machine learning, is the reuse of a pre-trained model on a new problem: the machine exploits knowledge gained from a previous task to improve generalization on another. For example, a classifier trained to predict whether an image contains a backpack can reuse much of what it learned to recognize other objects. In short, transfer learning means storing the knowledge gained while solving one problem and applying it to a different, but related, problem.

Unfortunately, when created from scratch, deep learning models require access to vast amounts of data and compute resources. This is a luxury that many can't afford. Moreover, it takes a long time to train deep learning models to perform tasks, which is not suitable for use cases that have a short time budget.

Fortunately, transfer learning, the practice of transferring knowledge gained by one trained AI model to another, can help solve these problems. It is a technique that has risen to prominence in the AI and machine learning community in recent years. Prominent computer scientist Andrew Ng said in 2016 that transfer learning would be one of the major drivers of machine learning's commercial success.

"After supervised learning - Transfer Learning will be the next driver of ML commercial success." - Andrew NG, one of the world's foremost data scientists

Traditional vs Transfer Learning

As opposed to traditional machine learning, which occurs on specific tasks and datasets, transfer learning leverages features and weights (among other variables) from previously trained models to train new models. Features are information extracted from a dataset to simplify a model's learning process, like the edges, shapes, and corners of signature boxes and typefaces in documents. On the other hand, weights determine how a given piece of input data will influence the output data.

Models are trained in two stages in transfer learning. First, there's pretraining, where the model is trained on a large benchmark dataset representing a range of categories. Next is fine-tuning, where the model is further trained on the target task of interest. The pretraining step helps the model learn general features that can be reused on the target task, boosting its accuracy.
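Below is a minimal sketch, in Keras, of what these two stages can look like. The dataset variables (benchmark_images, target_images, and so on) are placeholders for your own data, and the layer sizes are arbitrary.

```python
from tensorflow import keras

def build_backbone():
    # Small convolutional feature extractor shared by both stages.
    return keras.Sequential([
        keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(64, 3, activation="relu"),
        keras.layers.GlobalAveragePooling2D(),
    ])

# Stage 1: pretraining on a large benchmark dataset with many categories.
backbone = build_backbone()
pretrain_model = keras.Sequential([
    backbone,
    keras.layers.Dense(100, activation="softmax"),  # e.g. 100 benchmark classes
])
pretrain_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# pretrain_model.fit(benchmark_images, benchmark_labels, epochs=10)

# Stage 2: fine-tuning on the target task, reusing the pretrained backbone.
finetune_model = keras.Sequential([
    backbone,
    keras.layers.Dense(5, activation="softmax"),  # e.g. 5 target classes
])
finetune_model.compile(optimizer=keras.optimizers.Adam(1e-4),  # lower learning rate for fine-tuning
                       loss="sparse_categorical_crossentropy")
# finetune_model.fit(target_images, target_labels, epochs=5)
```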

Types of Transfer Learning

There are several kinds of transfer learning, each with its own upsides: inductive, unsupervised, and transductive transfer learning. With inductive transfer learning, the source and target domains are the same, yet the source and target tasks are different. Unsupervised transfer learning involves different tasks in similar - but not identical - source and target domains, without labeled data. As for transductive transfer learning, the source and target tasks are similar, but the domains differ and labeled data is available only in the source domain.




Approaches to Transfer Learning

Training a Model to Reuse it

Imagine you want to solve task A but don't have enough data to train a deep neural network. One way around this is to find a related task B with an abundance of data. Train the deep neural network on task B and use the model as a starting point for solving task A. Whether you'll need to use the whole model or only a few layers depends heavily on the problem you're trying to solve.

If both tasks take the same kind of input, one option is to reuse the model as-is and make predictions on your new inputs. Alternatively, you can change and retrain the task-specific layers and the output layer while keeping the earlier layers, as sketched below.
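Here is a hedged sketch of the second option in Keras (the layer sizes, class counts, and dataset names are all hypothetical): keep every layer of the task-B model except its output head, freeze them, and train a new head for task A.

```python
from tensorflow import keras

# A model previously trained on the data-rich task B (20 classes here,
# flat 784-dimensional inputs; all of this is hypothetical).
task_b_model = keras.Sequential([
    keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(20, activation="softmax", name="task_b_head"),
])
# task_b_model.fit(task_b_x, task_b_y, epochs=10)  # trained on task B data

# Reuse everything except the task-B-specific output layer, freeze those
# layers, and attach a fresh head for task A.
shared_layers = task_b_model.layers[:-1]
for layer in shared_layers:
    layer.trainable = False

task_a_model = keras.Sequential(shared_layers + [
    keras.layers.Dense(3, activation="softmax", name="task_a_head"),  # 3 task-A classes
])
task_a_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# task_a_model.fit(task_a_x, task_a_y, epochs=5)
```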

Using a Pre-Trained Model

The second approach is to use an already pre-trained model. There are a lot of these models out there, so make sure to do a little research. How many layers to reuse and how many to retrain depends on the problem.

Keras, for example, provides a number of pre-trained models that can be used for transfer learning, prediction, feature extraction and fine-tuning. You can find these models, along with brief tutorials on how to use them, on the Keras Applications page (https://keras.io/applications/). Many research institutions also release trained models. This type of transfer learning is the one most commonly used throughout deep learning.
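For instance, a minimal sketch of loading one of these pre-trained Keras models and using it directly for prediction (the image file name here is just a placeholder) looks like this:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

# Downloads the ImageNet weights on first use.
model = ResNet50(weights="imagenet")

img = image.load_img("elephant.jpg", target_size=(224, 224))  # placeholder file
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])  # top-3 (class_id, name, score) tuples
```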

Feature Extraction

Another approach is to use deep learning to discover the best representation of your problem, which means finding the most important features. This approach is also known as representation learning, and it can often yield much better performance than hand-designed representations.

In machine learning, features are usually hand-crafted by researchers and domain experts. Fortunately, deep learning can extract features automatically. Of course, this doesn't mean feature engineering and domain knowledge aren't important anymore - you still have to decide which features you feed into your network. That said, neural networks can learn which features are really important and which ones aren't. A representation learning algorithm can discover a good combination of features within a very short time frame, even for complex tasks which would otherwise require a lot of human effort.

The learned representation can then be used for other problems as well. Simply use the early layers to extract the right representation of features, but don't use the final output of the network, because it is too task-specific. Instead, feed data into your network and take the output of one of its intermediate layers. This layer's activations can then be interpreted as a representation of the raw data.
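A small sketch of this idea with Keras: take a network pretrained on ImageNet and build a new model whose output is an intermediate layer rather than the task-specific softmax. VGG16 and its "fc1" layer are just one common choice; the random batch stands in for real images.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet")
# New model whose output is an intermediate layer ("fc1"), not the
# task-specific 1000-way softmax at the very end.
feature_extractor = Model(inputs=base.input, outputs=base.get_layer("fc1").output)

batch = preprocess_input(np.random.rand(8, 224, 224, 3) * 255.0)  # stand-in for real images
features = feature_extractor.predict(batch)
print(features.shape)  # (8, 4096): one 4096-dimensional representation per image
# These vectors can now feed a smaller model or a traditional algorithm
# (SVM, logistic regression, ...) trained on your own task.
```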

This approach is mostly used in computer vision because it can reduce the dimensionality of your data, which decreases computation time and also makes the extracted features suitable for traditional machine learning algorithms.

Popular Pre-Trained Models

Keras itself provides several successful image-processing neural networks pretrained on ImageNet (https://keras.io/applications/). Other deep learning libraries also offer pretrained models, notably:

TensorFlow: https://github.com/tensorflow/models
Caffe: https://github.com/BVLC/caffe/wiki/Model-Zoo
Caffe2: https://github.com/caffe2/caffe2/wiki/Model-Zoo
PyTorch: https://github.com/Cadene/pretrained-models.pytorch
Lasagne: https://github.com/Lasagne/Recipes

The two most popular sets of pre-trained word vector embeddings can be found at these links:
GloVe: https://nlp.stanford.edu/projects/glove/
word2vec: https://code.google.com/archive/p/word2vec/

There are also a couple of less popular and/or more recent ones:
LexVec: https://github.com/alexandres/lexvec
FastText: https://github.com/icoxfog417/fastTextJapaneseTutorial
Meta-Embeddings: http://cistern.cis.lmu.de/meta-emb/
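As one example of putting these to work, here is a hedged sketch of loading the GloVe vectors linked above into a frozen Keras Embedding layer. The file name and the toy vocabulary are assumptions; adapt them to your own setup.

```python
import numpy as np
from tensorflow import keras

embedding_dim = 100
word_index = {"transfer": 1, "learning": 2}  # toy vocabulary: word -> integer id

# Parse the GloVe text file: each line is a word followed by its vector.
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        glove[values[0]] = np.asarray(values[1:], dtype="float32")

# Build the embedding matrix for our vocabulary (row 0 reserved for padding).
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, idx in word_index.items():
    if word in glove:
        embedding_matrix[idx] = glove[word]

embedding_layer = keras.layers.Embedding(
    input_dim=len(word_index) + 1,
    output_dim=embedding_dim,
    weights=[embedding_matrix],
    trainable=False,  # keep the transferred vectors frozen
)
```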


Limitations of Transfer Learning

- Currently, one of the biggest limitations to transfer learning is the problem of negative transfer. Transfer learning only works if the initial and target problems are similar enough for the first round of training to be relevant. Developers can draw reasonable conclusions about what type of training counts as 'similar enough' to the target, but the algorithm doesn't have to agree. If the first round of training is too far off the mark, the model may actually perform worse than if it had never been trained at all. Right now, there are still no clear standards on what types of training are sufficiently related, or how this should be measured.

- Transfer learning in a modern context requires a very large, general dataset. It's extremely important that you not build your base model from domain-specific data, and certainly not from data in your training or test sets; otherwise your entire experiment is invalid. Even if you don't present labels to the base model, by letting it see the 'correct' base data you have given it information it shouldn't have access to.

- In transfer learning, developers cannot confidently remove network layers in search of an optimal model. Removing the first layers affects the dense layers, because the number of trainable parameters changes. Dense layers can be a good point for reducing layers, but analyzing how many layers and neurons to remove without making the model overfit is time-consuming and challenging. In the context of transfer learning, overfitting happens when the new model learns details and noise from the training data that negatively impact its outputs.

- The biggest negative of transfer learning is that it's very hard to do right and very easy to mess up. Especially in NLP this kind of approach has only been mainstream for about a year, which just isn't enough time when model runs take weeks.

Use Cases of Transfer Learning

- Real-world simulations: training a model in a digital simulation and then transferring it to the real world is cheaper and safer than building physical prototypes.
- Gaming: the adoption of artificial intelligence has taken gaming to an altogether new level.
- Image classification: transfer learning reduces the time to train a model by pre-training it on ImageNet, which contains millions of images from different categories.
- Zero-shot translation: Google's Neural Machine Translation model (GNMT) allows effective cross-lingual translation, including between language pairs it was not explicitly trained on.
- Sentiment classification: building a sentiment analyzer for a new text corpus is hard, because training models to detect different emotions requires a lot of labeled data; transfer learning offers a solution.

Final Thoughts

Transfer learning has brought in a new wave of learning in machines by reusing algorithms and applied logic, speeding up the learning process by transferring 'knowledge'. This directly reduces both the capital investment and the time needed to train a model, which is why many organizations are looking to apply this kind of learning to their own machine learning models. Transfer learning has already been carried out successfully in image processing, simulations, gaming, and so on. How transfer learning will shape the learning curve of machines in other sectors is worth watching.

Do you want to see transfer learning in action? Our team of experts at Vyom Data Sciences can assist you with transfer learning. Do contact us if you'd like to become an expert in the field. It would be fun!

Support our effort by subscribing to our YouTube channel, and keep yourself updated with our latest videos on Data Science.

Looking forward to seeing you soon. Till then, keep learning!
