- What is the difference between AI, Machine Learning, and Deep Learning?
Ans. AI (Artificial Intelligence) refers to the study, development, and application of computer techniques that allow computers to acquire certain capabilities of human intelligence.
ML (Machine Learning) is a subset of Artificial Intelligence in which machines are "trained" to recognize patterns in data and make predictions. ML algorithms are mathematical algorithms that allow machines to learn by imitating the way humans learn.
DL (Deep Learning) is a subset of ML in which the machine is able to reason and draw its own conclusions, learning by itself. Deep Learning uses algorithms that mimic human perception, inspired by the brain and the connections between neurons. Most deep learning methods use neural network architectures.
- What is a neural network?
Ans. A neural network is a system of programs and data structures that approximates the functioning of the human brain. A neural network usually involves a large number of processors operating in parallel, each having its own small sphere of knowledge and access to data in its local memory.
A neural network is initially "trained" by being fed large amounts of data and rules about relationships (e.g. "a grandparent is older than a person's father"). A program can then tell the network how to behave in response to an external stimulus (e.g. input from a computer user interacting with the network), or the network can initiate activity on its own, within the limits of its access to the external world.
Deep learning uses neural networks to learn useful representations of features directly from data. For example, you can use a pre-trained neural network to identify and remove artifacts such as noise from images.
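As an illustration (not from the original answer), here is a minimal sketch of a tiny feedforward network in NumPy: each layer is just a weight matrix plus an activation function. The layer sizes and random weights are arbitrary assumptions, and only the forward pass is shown, not training.

```python
import numpy as np

def relu(x):
    # ReLU activation: pass positive values through, zero out negatives
    return np.maximum(0, x)

def forward(x, W1, b1, W2, b2):
    # One hidden layer followed by a linear output layer
    h = relu(x @ W1 + b1)   # hidden representation
    return h @ W2 + b2      # network output

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                      # one input sample with 4 features
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)    # input -> hidden weights
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)    # hidden -> output weights
print(forward(x, W1, b1, W2, b2))                # a single scalar prediction
```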
- What is the idea behind GANs?
Ans. A Generative Adversarial Network (GAN) is a very popular architecture in machine learning that has showcased its potential to create realistic-looking images and videos. GANs consist of two networks (D and G), where –
D = the "discriminator" network
G = the "generator" network
The goal is to generate data (images, for example) that cannot be distinguished from real data. Suppose we want to generate images of cats. Network G generates candidate images, and Network D classifies each image according to whether or not it is a cat. The cost function of G is constructed in such a way that G tries to "trick" D into always classifying its output as a cat.
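Below is a minimal training-loop sketch of this two-network setup, written in PyTorch as an assumption; the layer sizes, learning rates, and the random noise used in place of a real cat-image dataset are illustrative choices, not part of the original answer.

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 16, 64  # illustrative sizes, not from the original answer

# G maps random noise to a fake "image"; D outputs the probability its input is real
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(100):
    real = torch.randn(32, img_dim)            # stand-in for a batch of real cat images
    noise = torch.randn(32, latent_dim)
    fake = G(noise)

    # Train D: classify real images as 1 and generated images as 0
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Train G: its loss rewards "tricking" D into labelling fakes as real (1)
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```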
- In a neural network, if the hidden layer has a sufficient number of units, can it approximate any continuous function?
Ans. The Universal Approximation Theorem states that a simple feedforward neural network (i.e. a multilayer perceptron) with a single hidden layer and a standard activation function can approximate any continuous function to arbitrary accuracy, provided the hidden layer has a sufficient number of units. The theorem only covers continuous functions; a function that jumps around or has large gaps (discontinuities) is not guaranteed to be approximated.
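As an illustration of the theorem (not from the original answer), a single-hidden-layer network can be fit to a smooth continuous function such as sin(x); the use of scikit-learn's MLPRegressor, the hidden-layer width of 50 units, and the training settings are all arbitrary assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Continuous target function on a bounded interval
X = np.linspace(-3, 3, 500).reshape(-1, 1)
y = np.sin(X).ravel()

# One hidden layer with "enough" units, in the spirit of the Universal Approximation Theorem
net = MLPRegressor(hidden_layer_sizes=(50,), activation="tanh",
                   max_iter=5000, random_state=0)
net.fit(X, y)

print("max absolute error:", np.max(np.abs(net.predict(X) - y)))
```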
- How will you solve the gradient explosion problem?
Ans. There are several ways to address the gradient explosion problem. Some of the most effective practical methods are –
Redesign the network model – In deep neural networks, gradient explosion can be addressed by redesigning the network with fewer layers. Using a smaller batch size can also help during training. In recurrent neural networks, updating over fewer previous time steps during training (truncated backpropagation through time) can alleviate the gradient explosion problem.
Use the ReLU activation function – In deep multilayer perceptron networks, gradient explosion can occur because of activation functions such as the previously popular sigmoid and tanh functions. Using the ReLU activation function can reduce gradient explosion, and adopting ReLU in the hidden layers is one of the most popular practices.
Use Long Short-Term Memory (LSTM) networks – In recurrent neural networks, gradient explosion may be caused by the inherent instability of the training procedure; for example, backpropagation through time essentially unrolls the recurrent network into a very deep multilayer perceptron. Using LSTM units and related gated neural structures can reduce the gradient explosion problem, and LSTM units are a current best practice for sequence prediction with recurrent neural networks.
Use gradient clipping – Gradient explosion can still occur in very deep multilayer perceptron networks trained with large batches and in LSTMs trained on long input sequences. In that case, you can check and limit the size of the gradients during training. This process is called gradient clipping: if the norm (or value) of the error gradient exceeds a threshold, the gradient is clipped back to that threshold before the gradient descent step. Gradient clipping can alleviate the gradient explosion problem to a large extent; a minimal sketch appears after this list.
Use weight regularization – If gradient explosion still occurs, another option is to check the size of the network weights and add a penalty to the loss function for large weight values. This process is called weight regularization and generally uses either an L1 penalty (the absolute value of the weights) or an L2 penalty (the square of the weights). Applying an L1 or L2 penalty to the recurrent weights can help alleviate gradient explosion.
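Below is a minimal gradient-clipping sketch in PyTorch, referenced from the gradient clipping item above; the toy model, the random stand-in batch, and the clipping threshold of 1.0 are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)   # stand-in training batch

opt.zero_grad()
loss_fn(model(x), y).backward()

# Clip the global gradient norm to a threshold before the gradient descent step
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```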
- What is the difference between Stochastic Gradient Descent (SGD) and Batch Gradient Descent (BGD)?
Ans. Gradient Descent and Stochastic Gradient Descent are optimization algorithms used, for example in linear regression, to find the set of parameters that minimize a loss function.
Batch Gradient Descent – BGD computes the gradient over the FULL training set at each step. This makes it slow and expensive when the training data is very large, but it works well on convex or relatively smooth error surfaces.
Stochastic Gradient Descent – SGD picks a single RANDOM training instance at each step and computes the gradient from it. This makes SGD a much faster process than BGD, at the cost of noisier updates.
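A minimal sketch contrasting the two update rules on a toy linear regression problem; the synthetic data, learning rate, and number of steps are illustrative assumptions, not from the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                    # 1000 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)      # noisy linear targets

lr = 0.1
w_bgd = np.zeros(3)
w_sgd = np.zeros(3)

for step in range(100):
    # Batch Gradient Descent: gradient of the MSE loss over the FULL training set
    grad = 2 * X.T @ (X @ w_bgd - y) / len(y)
    w_bgd -= lr * grad

    # Stochastic Gradient Descent: gradient from ONE random training instance
    i = rng.integers(len(y))
    grad_i = 2 * X[i] * (X[i] @ w_sgd - y[i])
    w_sgd -= lr * grad_i

print("BGD estimate:", w_bgd)
print("SGD estimate:", w_sgd)
```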
- What is a confusion matrix and why do you need it?
Ans. A confusion matrix is a tool for visualizing the performance of a supervised learning algorithm. Each column of the matrix represents the number of predictions for each class, while each row represents the instances of the actual class. In practical terms, it lets us see what kinds of correct and incorrect decisions our model is making as it learns from the data, and it allows us to check whether the algorithm is confusing classes and to what extent.
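A small example, assuming scikit-learn is available; the true and predicted labels below are made up for illustration. Following the convention described above, rows correspond to the actual classes and columns to the predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for a binary cat / not-cat classifier
y_true = ["cat", "cat", "not_cat", "cat", "not_cat", "not_cat"]
y_pred = ["cat", "not_cat", "not_cat", "cat", "cat", "not_cat"]

# Rows = actual class, columns = predicted class
cm = confusion_matrix(y_true, y_pred, labels=["cat", "not_cat"])
print(cm)
# [[2 1]   -> 2 cats correctly predicted, 1 cat misclassified as not_cat
#  [1 2]]  -> 1 not_cat misclassified as cat, 2 not_cat correctly predicted
```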
- What is a Fourier transform?
Ans. The Fourier Transform is a mathematical technique that transforms a function of time into a function of frequency. It takes a time-based signal as input and computes the offset, rotation speed, and strength of every possible cycle that makes up the signal. The Fourier transform is typically applied to waveforms, which are functions of time or space, and when applied it decomposes a waveform into a sum of sinusoids.
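A small numerical illustration using NumPy's FFT; the example signal, a mixture of a 5 Hz and a 12 Hz sinusoid sampled at 100 Hz, is an assumption made purely for this sketch.

```python
import numpy as np

fs = 100                                   # sampling rate in Hz
t = np.arange(0, 1, 1 / fs)                # 1 second of samples
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# Transform the time-domain waveform into the frequency domain
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two largest peaks appear at the 5 Hz and 12 Hz components of the waveform
peaks = freqs[np.argsort(np.abs(spectrum))[-2:]]
print(sorted(peaks))                       # [5.0, 12.0]
```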
- What is the difference between overfitting and underfitting?
Ans. Overfitting – In overfitting, a statistical model describes random error or noise instead of the underlying relationship, which typically happens when a model is overly complex. An overfit model has poor predictive performance because it overreacts to minor fluctuations in the training data. Overfitting shows good performance on the training data but poor generalization to other data.
Underfitting – In underfitting, a statistical model is unable to capture the underlying trend of the data. This type of model also shows poor predictive performance. Underfitting shows poor performance on the training data and poor generalization to other data.
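As an illustration (not part of the original answer), fitting polynomials of different degrees to a small noisy sample of a sine curve shows both behaviours: a degree-1 fit underfits the curved data, while a very high-degree fit chases the noise and overfits.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.normal(size=15)   # noisy training sample
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)                                  # clean underlying trend

for degree in (1, 3, 14):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # degree 1: high error on both sets (underfitting)
    # degree 14: near-zero training error but larger test error (overfitting)
    print(degree, round(train_err, 4), round(test_err, 4))
```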