Question Mapping

Session 1 (2023-24)

Mapping of Questions to Syllabus Topics:

Group-A (Very Short Answer Type Question)

(i) Fraud Detection, Image Classification, Diagnostic, and Customer Retention are applications in ________ learning algorithm.
- Syllabus Topic: 1. Supervised Learning (Regression/Classification)
(ii) A machine learning algorithm builds a model based on sample data, which is known as ________.
- Syllabus Topic: General Concept of Machine Learning (Implicitly covered across all topics)
(iii) The prior goal of unsupervised learning model is to determine the ________.
- Syllabus Topic: 2. Unsupervised Learning
(iv) Real-Time decisions, Game AI, Learning Tasks, Skill Acquisition, and Robot Navigation are applications in ________.
- Syllabus Topic: 5. Scalable Machine Learning (Online and Distributed Learning) - Reinforcement Learning
(v) Deep learning algorithms are ________ more accurate than machine learning algorithm in image classification.
- Syllabus Topic: 4. Sparse Modeling and Estimation, Modeling Sequence/Time-Series Data, Deep Learning and Feature Representation Learning
(vi) True or False: Overfitting is more likely when you have a huge amount of data to train on.
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection
(vii) If you use an ensemble of different base models, is it necessary to tune the hyper parameters of all base models to improve the ensemble performance?
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection - Ensemble Methods (Boosting, Bagging, Random Forests)
(viii) The Bayes rule can be used in answering ________.
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Naive Bayes
(ix) True or False: K-means clustering aims to partition n observations into k clusters.
- Syllabus Topic: 2. Unsupervised Learning - Clustering: K-means/Kernel K-means
(x) Which ensemble model helps in reducing variance?
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection - Ensemble Methods (Boosting, Bagging, Random Forests)
(xi) Write one common feature about neural network and linear regression models.
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Linear models
(xii) In a Decision Tree Leaf Node represents ________.
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Basic methods: Decision Trees

Group-B (Short Answer Type Question)

2. How to Tackle Overfitting and Underfitting?
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection
3. Explain Pool-Based sampling in Active learning.
- Syllabus Topic: 5. Scalable Machine Learning (Online and Distributed Learning) - Active Learning
4. Differentiate Between Machine Learning and Deep Learning.
- Syllabus Topic: 4. Sparse Modeling and Estimation, Modeling Sequence/Time-Series Data, Deep Learning and Feature Representation Learning
5. Explain How a system can play a Game of Chess using Reinforcement Learning.
- Syllabus Topic: 5. Scalable Machine Learning (Online and Distributed Learning) - Reinforcement Learning
6. Explain ‘Naive’ in a Naive Bayes Algorithm.
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Basic methods: Naive Bayes

Group-C (Long Answer Type Question)

7. (a) What are False Positives and False Negatives, and why are they significant? Give an example. (b) Why do you need a Confusion matrix? (c) For the given dataset, apply the Naive Bayes Algorithm and predict the outcome for a car = {Red, Domestic, SUV}.
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Basic methods: Naive Bayes and 3. Evaluating Machine Learning algorithms and Model Selection
8. Determine x(n) using the time-shifting property: X(z) = (1 + (1/2)z⁻¹) / (1 - (1/2)z⁻¹). (a) Explain: Sparse Modelling and estimation. (b) Explain: Time series data in machine learning. (c) Explain how deep learning and feature representation learning are related.
- Syllabus Topic: 4. Sparse Modeling and Estimation, Modeling Sequence/Time-Series Data, Deep Learning and Feature Representation Learning
9. (a) Explain the ID3 algorithm for decision tree learning. (b) What is the procedure of building Decision tree using ID3 with Gain and Entropy. (c) What do you mean by Information gain and Entropy and its relation with ID3.
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Basic methods: Decision Trees
10. (a) Why is KNN non-parametric? (b) Write down three Pros and Cons of the KNN algorithm. (c) What is Euclidean distance in terms of machine learning? Using Euclidean distance, find the distance between points P(3, 2) and Q(4, 1).
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Basic methods: Distance-based methods, Nearest-Neighbours
11. (a) What is model selection? Write down the types of data and based on that which model is used? (b) Explain Cross validation with an example. (c) Explain Statistical Learning theory.
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection - Introduction to Statistical Learning Theory
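
The numerical part of question 10(c) can be checked with a short sketch. The function name `euclidean_distance` is an illustrative choice, not something taken from the paper; only the points P(3, 2) and Q(4, 1) come from the question.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

P = (3, 2)
Q = (4, 1)

# sqrt((3 - 4)^2 + (2 - 1)^2) = sqrt(1 + 1) = sqrt(2)
d = euclidean_distance(P, Q)
print(round(d, 4))  # 1.4142
```

In a written answer, showing the squared differences (1 and 1) before taking the square root is usually what earns the marks.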

Analysis of Session 1:

This session’s paper provides a broad assessment of fundamental machine learning concepts. Here’s a breakdown of the focus:

Module 1: Supervised Learning (Regression/Classification) is the most heavily weighted, with approximately 7 questions. This includes detailed questions on Naive Bayes, Decision Trees (specifically ID3), and KNN.
Module 3: Evaluating Machine Learning algorithms and Model Selection is also very important, with around 5 questions covering overfitting, underfitting, ensemble methods, and evaluation metrics like the confusion matrix.
Module 4: Sparse Modeling, Time-Series, and Deep Learning has a significant presence with about 3 questions, including a long answer question.
Module 5: Scalable Machine Learning is touched upon with 3 questions related to Reinforcement Learning and Active Learning.
Module 2: Unsupervised Learning has a lighter representation with 2 questions on clustering.
Module 6: Recent trends does not appear to be explicitly covered.

In summary, for Session 1, a strong understanding of supervised learning algorithms, model evaluation techniques, and the basics of deep learning and reinforcement learning was crucial.

Session 2 (2024-25)

Mapping of Questions to Syllabus Topics:

Group-A (Very Short Answer Type Question)

(i) What is unsupervised learning?
- Syllabus Topic: 2. Unsupervised Learning
(ii) What is an ensemble model in machine learning?
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection - Ensemble Methods (Boosting, Bagging, Random Forests)
(iii) Define deep learning.
- Syllabus Topic: 4. Sparse Modeling and Estimation, Modeling Sequence/Time-Series Data, Deep Learning and Feature Representation Learning
(iv) Define Reinforcement learning.
- Syllabus Topic: 5. Scalable Machine Learning (Online and Distributed Learning) - Reinforcement Learning
(v) What are support vectors in the context of SVMs?
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Support Vector Machines
(vi) What is PCA?
- Syllabus Topic: 2. Unsupervised Learning - Dimensionality Reduction: PCA and kernel PCA
(vii) ______ machine learning algorithm is based upon the idea of bagging.
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection - Ensemble Methods (Boosting, Bagging, Random Forests)
(viii) Write down the difference between under fitting and over fitting.
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection
(ix) What is semi-supervised learning in machine learning?
- Syllabus Topic: 5. Scalable Machine Learning (Online and Distributed Learning) - Semi-supervised Learning
(x) What do you mean by Clustering?
- Syllabus Topic: 2. Unsupervised Learning - Clustering: K-means/Kernel K-means
(xi) Give one example of Dimensionality reduction technique.
- Syllabus Topic: 2. Unsupervised Learning - Dimensionality Reduction: PCA and kernel PCA
(xii) Give one real life example of deep learning.
- Syllabus Topic: 4. Sparse Modeling and Estimation, Modeling Sequence/Time-Series Data, Deep Learning and Feature Representation Learning

Group-B (Short Answer Type Question)

2. Explain Precision, Recall and F1 Score.
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection
3. Discuss the working of a Neural network. State two differences between Deep learning and Machine learning.
- Syllabus Topic: 4. Sparse Modeling and Estimation, Modeling Sequence/Time-Series Data, Deep Learning and Feature Representation Learning
4. Define conditional probability. Discuss Bayes theorem with suitable example.
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Basic methods: Naive Bayes
5. Compare linear regression with logistic regression. Write two applications of logistic regression.
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Linear models: Linear Regression, Logistic Regression
6. How does the choice of distance metric affect the performance of a KNN classifier?
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Basic methods: Distance-based methods, Nearest-Neighbours

Group-C (Long Answer Type Question)

7. (a) Define: Support Vectors, Margins and Hyperplane. (b) What is the difference between a linear and non-linear SVM? (c) Discuss cost function in regression.
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Support Vector Machines and Linear models
8. (a) State the requirement of clustering algorithm? (b) How does the clustering differ from classification in machine learning? (c) Describe the strengths and weaknesses of the K-Means clustering algorithm. When is it suitable for use?
- Syllabus Topic: 2. Unsupervised Learning - Clustering: K-means/Kernel K-means
9. (a) Compare between Bagging & Boosting technique. (b) State the Difference Between Bias and variance in ML. (c) Write short note on Random Forest algorithm.
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection - Ensemble Methods (Boosting, Bagging, Random Forests)
10. (a) Write a short note on Content Based filtering and Collaborative filtering. (b) How does the system make recommendations? (c) Discuss the mathematics behind the recommendations made using content-based filtering.
- Syllabus Topic: 2. Unsupervised Learning - Matrix Factorization and Matrix Completion (as recommendation systems often use these techniques) and potentially 6. Recent trends.
11. (a) Discuss the Naive Bayes Classifier algorithm. (b) Why is it called Naive Bayes? (c) Explain the KNN algorithm with a suitable example. Using these probabilities, estimate the probability values for the new instance (Color=Green, Legs=2, Height=Tall, and Smelly=No).
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Basic methods: Naive Bayes, Nearest-Neighbours
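
For question 10(c), one common way to express the mathematics of content-based filtering is cosine similarity between a user profile vector and item feature vectors. This is a minimal sketch under that assumption; the paper does not specify a similarity measure, and the item names and feature vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|) for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical item features: [action, romance, sci-fi]
items = {
    "Movie A": [1, 0, 1],
    "Movie B": [0, 1, 0],
    "Movie C": [1, 1, 0],
}
# Hypothetical user profile built from items the user liked.
user_profile = [1, 0, 1]

# Recommend items in decreasing order of similarity to the profile.
ranked = sorted(
    items,
    key=lambda name: cosine_similarity(user_profile, items[name]),
    reverse=True,
)
print(ranked)  # ['Movie A', 'Movie C', 'Movie B']
```

Collaborative filtering would instead compare users (or items) through their rating patterns rather than through item features.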

Analysis of Session 2:

This paper also covers a wide range of topics, with a clear emphasis on both supervised and unsupervised learning, as well as model evaluation.

Module 1: Supervised Learning (Regression/Classification) is again very prominent, with approximately 6 questions. These questions cover SVMs, Naive Bayes, KNN, and regression models.
Module 3: Evaluating Machine Learning algorithms and Model Selection is also heavily tested with about 4 questions on ensemble methods, evaluation metrics, and the bias-variance tradeoff.
Module 2: Unsupervised Learning has a strong showing with roughly 5 questions, including a full long-answer question on clustering and questions on PCA and recommendation systems.
Module 4: Deep Learning is introduced with about 3 questions, focusing on the definition and comparison with machine learning.
Module 5: Scalable Machine Learning is touched upon with 2 questions on reinforcement learning and semi-supervised learning.
Module 6: Recent trends could be considered to be addressed through the question on recommendation systems.

Overall, Session 2 required a balanced knowledge across supervised and unsupervised techniques, with a solid understanding of how to evaluate and select models.

Session 3 (2022-23)

Mapping of Questions to Syllabus Topics:

Group-A (Very Short Answer Type Question)

(i) ______ is a classification algorithm used to assign observations to a discrete set of classes.
- Syllabus Topic: 1. Supervised Learning (Regression/Classification)
(ii) The number of nodes in the input layer is 10 and the hidden layer is 5. The maximum number of connections from the input layer to the hidden layer are ______.
- Syllabus Topic: 4. Sparse Modeling and Estimation, Modeling Sequence/Time-Series Data, Deep Learning and Feature Representation Learning
(iii) True or False: Hierarchical clustering is slower than non-hierarchical clustering?
- Syllabus Topic: 2. Unsupervised Learning - Clustering: K-means/Kernel K-means
(iv) True or False: Ensemble learning can only be applied to supervised learning methods.
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection - Ensemble Methods (Boosting, Bagging, Random Forests)
(v) A collection of individual models that learn to predict a target by combining their strengths and avoiding the weaknesses of each is called ______.
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection - Ensemble Methods (Boosting, Bagging, Random Forests)
(vi) Semi-supervised learning algorithm deals with which types of data?
- Syllabus Topic: 5. Scalable Machine Learning (Online and Distributed Learning) - Semi-supervised Learning
(vii) In an election, N candidates are competing against each other and people are voting for either of the candidates. Voters don’t communicate with each other while casting their votes. Which of the following ensemble method works similar to above-discussed election procedure?
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection - Ensemble Methods (Boosting, Bagging, Random Forests)
(viii) A feature F1 can take certain value: A, B, C, D, E, & F and represents grade of students from a college. Feature F1 is an example of ______ variable.
- Syllabus Topic: General concept, implicit in all modules.
(ix) Imagine a Newly-Born starts to learn walking. It will try to find a suitable policy to learn walking after repeated falling and getting up. Specify what type of machine learning is best suited?
- Syllabus Topic: 5. Scalable Machine Learning (Online and Distributed Learning) - Reinforcement Learning
(x) The selling price of a house depends on many factors. For example, it depends on the number of bedrooms, number of kitchens, number of bathrooms, the year the house was built, and the square footage of the lot. Given these factors, predicting the selling price of the house is an example of which type of linear regression?
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Linear models: Linear Regression
(xi) Targeted marketing, Recommender Systems, and Customer Segmentation are applications in which algorithm?
- Syllabus Topic: 2. Unsupervised Learning - Clustering: K-means/Kernel K-means
(xii) The ______ is the difference between a sample statistic used to estimate a population parameter and the actual but unknown value of the parameter.
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection - Introduction to Statistical Learning Theory

Group-B (Short Answer Type Question)

2. Explain Matrix Factorization and where it is used.
- Syllabus Topic: 2. Unsupervised Learning - Matrix Factorization and Matrix Completion
3. Why is ensemble learning used? What is the general principle of an ensemble method, and what are bagging and boosting in ensemble methods?
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection - Ensemble Methods (Boosting, Bagging, Random Forests)
4. Explain the Difference Between Classification and Regression?
- Syllabus Topic: 1. Supervised Learning (Regression/Classification)
5. Compare K-means and KNN Algorithms.
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Basic methods: Nearest-Neighbours and 2. Unsupervised Learning - Clustering: K-means/Kernel K-means
6. How do we decide the value of “K” in KNN algorithm? Why is the odd value of “K” preferable in KNN algorithm?
- Syllabus Topic: 1. Supervised Learning (Regression/Classification) - Basic methods: Nearest-Neighbours

Group-C (Long Answer Type Question)

7. (a) Discuss the different types of Machine Learning. (b) What are parametric and non-parametric models? (c) How is machine learning related to AI?
- Syllabus Topic: This covers the breadth of the syllabus, touching on concepts from all modules.
8. (a) Explain Generative Mixture model (b) With a proper diagram explain the steps of a generative mixture model (c) Write down the steps of PCA (Principal Component Analysis)
- Syllabus Topic: 2. Unsupervised Learning - Generative Models (mixture models and latent factor models) and Dimensionality Reduction: PCA and kernel PCA
9. (a) Explain the Confusion Matrix with respect to Machine Learning Algorithms with a suitable example. (b) Calculate the accuracy percentage for the given Confusion Matrix. (c) Explain True Positive, True Negative, False Positive, and False Negative in a Confusion Matrix with an example.
- Syllabus Topic: 3. Evaluating Machine Learning algorithms and Model Selection
10. (a) Explain the three techniques under supervised feature Selection (b) Explain the benefits of using feature selection in machine learning (c) Explain the curse of dimensionality
- Syllabus Topic: 2. Unsupervised Learning - Dimensionality Reduction (related concepts) and 3. Evaluating Machine Learning algorithms and Model Selection.
11. (a) What is Artificial Intelligence and why do we need it? (b) What is Deep Learning? Give some examples of its real-world use. (c) Differentiate between Artificial Intelligence, Machine Learning, and Deep Learning.
- Syllabus Topic: 4. Sparse Modeling and Estimation, Modeling Sequence/Time-Series Data, Deep Learning and Feature Representation Learning

Analysis of Session 3:

This session’s paper also demonstrates a comprehensive evaluation across the syllabus, with a notable emphasis on both fundamental concepts and evaluation methods.

Module 3: Evaluating Machine Learning algorithms and Model Selection is heavily featured, with approximately 6 questions covering ensemble methods, the confusion matrix, and feature selection.
Module 1: Supervised Learning (Regression/Classification) has a strong presence with about 5 questions focusing on classification vs. regression, linear regression, and KNN.
Module 2: Unsupervised Learning is well-represented with roughly 5 questions, including a detailed long-answer question on generative models and PCA, as well as questions on clustering and matrix factorization.
Module 4: Deep Learning is assessed with about 3 questions, including a long-answer question differentiating it from AI and Machine Learning.
Module 5: Scalable Machine Learning is covered with 2 questions on semi-supervised learning and reinforcement learning.
Module 6: Recent trends does not appear to be explicitly questioned.

In conclusion, Session 3 places a significant premium on understanding the principles of model evaluation and the distinctions between different learning paradigms, in addition to knowledge of specific supervised and unsupervised algorithms.

Overall Analysis

Highest Priority Topics (Featured Heavily in All Papers, Often in Long Answer Questions):

Supervised Learning Fundamentals: This is the most critical area.
- Core Algorithms: Have a deep understanding of Naive Bayes, K-Nearest Neighbors (KNN), and Decision Trees (specifically ID3). Be prepared to explain how they work, their pros and cons, and solve numerical problems with them.
- Linear Models: Know the difference between Linear and Logistic Regression.
- Support Vector Machines (SVM): Understand the concepts of hyperplanes, margins, and support vectors, and the difference between linear and non-linear SVMs.
Model Evaluation & Selection: This is as important as knowing the algorithms themselves.
- Confusion Matrix: You must be able to define, explain, and calculate metrics from a confusion matrix (Accuracy, Precision, Recall, F1-Score, True/False Positives/Negatives). This is a recurring long-answer question.
- Overfitting vs. Underfitting & Bias vs. Variance: Expect short answer questions defining and differentiating these concepts and how to tackle them.
- Ensemble Methods: Understand Bagging, Boosting, and Random Forests. Be able to compare them and explain why they are used (e.g., to reduce variance).
Unsupervised Learning: This is a major topic, frequently appearing in all sections.
- Clustering: K-Means is a favorite. Know its algorithm, strengths, weaknesses, and how it differs from a classification algorithm like KNN.
- Dimensionality Reduction: Understand the purpose and steps of Principal Component Analysis (PCA).
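
The confusion-matrix metrics listed under Model Evaluation & Selection can be practised with a small sketch. The counts below (TP, FP, FN, TN) are hypothetical, since the confusion matrices from the papers are not reproduced in these notes; the formulas themselves are the standard definitions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics derived from a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of predicted positives, how many are correct
    recall = tp / (tp + fn)      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.850
# precision: 0.800
# recall: 0.889
# f1: 0.842
```

Accuracy as a percentage is the accuracy value multiplied by 100, so these counts give 85%.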

Medium Priority Topics (Consistent but Less Frequent):

Deep Learning Fundamentals: Questions on this topic are becoming standard.
- Core Concepts: Be ready to define Deep Learning and differentiate it from Machine Learning and Artificial Intelligence.
- Applications: Know some real-world examples of deep learning.
Advanced Topics: These are often tested in short answer or objective questions.
- Reinforcement Learning: Understand the basic definition and be able to identify scenarios where it applies (e.g., game playing, robotics).
- Semi-supervised and Active Learning: Know the basic definitions.

Final Actionable Advice

Don’t Skip the Basics: Master the foundational concepts of Supervised, Unsupervised, and Reinforcement Learning as they form the basis for many questions.
Practice Numerical Problems: Be prepared to solve small numerical problems for algorithms like Naive Bayes, KNN (calculating Euclidean distance), and calculating metrics from a confusion matrix.
Focus on “Compare and Contrast”: Many questions ask you to differentiate between concepts (e.g., Machine Learning vs. Deep Learning, Classification vs. Regression, Bagging vs. Boosting, K-Means vs. KNN). Make summary notes for these.
Review All Groups: The short answer questions in Group A and B often cover the definitions and core ideas needed to answer the more detailed questions in Group C.
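
As a practice aid for the Naive Bayes numericals, the sketch below works through the usual steps (prior times per-feature likelihoods, then pick the larger score). The eight-row dataset is hypothetical, not the one from the exam, and no Laplace smoothing is applied, so a feature value never seen with a class would give that class a score of zero.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical training data: (color, body_type) -> stolen?
rows = [
    (("Red", "Sports"), "Yes"),
    (("Red", "SUV"), "Yes"),
    (("Red", "SUV"), "Yes"),
    (("Yellow", "Sports"), "Yes"),
    (("Red", "SUV"), "No"),
    (("Yellow", "SUV"), "No"),
    (("Yellow", "SUV"), "No"),
    (("Yellow", "Sports"), "No"),
]

def naive_bayes_scores(rows, query):
    """Return P(class) * product of P(feature_i = value | class) per class."""
    class_counts = Counter(label for _, label in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, len(rows))  # prior P(class)
        for i, value in enumerate(query):
            matches = sum(
                1 for feats, lab in rows if lab == label and feats[i] == value
            )
            score *= Fraction(matches, n_label)  # likelihood, no smoothing
        scores[label] = score
    return scores

scores = naive_bayes_scores(rows, ("Red", "SUV"))
prediction = max(scores, key=scores.get)
posterior_yes = scores["Yes"] / sum(scores.values())

print(scores["Yes"], scores["No"])  # 3/16 3/32
print(prediction, posterior_yes)    # Yes 2/3
```

The "naive" step is the multiplication of per-feature likelihoods, which treats the features as conditionally independent given the class.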

By focusing your preparation on these key areas, you will be well-equipped to handle the majority of the questions and perform well in your exam. Good luck!

Augmented Syllabus

Unit 1: Supervised Learning (Regression/Classification)

Basic methods:
- Distance-based methods ✅
- Nearest-Neighbours ✅
  - K-NN Algorithm: Including the significance of the value of “K” ✅
- Decision Trees ✅
  - ID3 algorithm: Including information gain and entropy ✅
- Naive Bayes ✅
  - Naive Bayes Classifier Algorithm ✅
Linear models:
- Linear Regression ✅
- Logistic Regression ⚠️
- Generalized Linear Models ⚠️
- Comparison of Linear Regression and Logistic Regression ✅
Support Vector Machines ✅
- Linear vs. Non-linear SVM ✅
- Support Vectors ✅
- Nonlinearity and Kernel Methods ✅
Beyond Binary Classification:
- Multi-class/Structured Outputs ✅
- Ranking ❌
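
For the ID3 item in this unit, entropy and information gain can be sketched as follows. The four-row dataset and the attribute meanings are hypothetical; the formulas are the standard ones, H(S) = -sum(p * log2 p) and Gain(S, A) = H(S) minus the weighted entropy of the subsets produced by splitting on A.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(rows, attr_index):
    """Entropy of the whole set minus the weighted entropy after the split."""
    labels = [label for _, label in rows]
    groups = {}
    for feats, label in rows:
        groups.setdefault(feats[attr_index], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical data: (outlook, windy) -> play?
rows = [
    (("Sunny", "No"), "Yes"),
    (("Sunny", "Yes"), "Yes"),
    (("Rainy", "No"), "No"),
    (("Rainy", "Yes"), "No"),
]

gain_outlook = information_gain(rows, 0)  # outlook separates the classes perfectly
gain_windy = information_gain(rows, 1)    # windy tells us nothing about the class
print(gain_outlook, gain_windy)
```

ID3 would split on the attribute with the highest gain (here, outlook) and then repeat the same calculation on each resulting subset.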

Unit 2: Unsupervised Learning

Clustering:
- Hierarchical vs. Non-hierarchical clustering ✅
- K-means/Kernel K-means ✅
- Applications of Clustering ✅
Dimensionality Reduction:
- Principal Component Analysis (PCA): Steps and applications ✅
- kernel PCA ✅
- Curse of Dimensionality ✅
Matrix Factorization and Matrix Completion ✅
Generative Models (mixture models and latent factor models) ✅
- Generative Mixture Models ✅

Unit 3: Evaluating Machine Learning Algorithms and Model Selection

Introduction to Statistical Learning Theory ❌
Ensemble Methods
- Boosting ✅
- Bagging ✅
- Random Forests ✅
- General Principles of Ensemble Methods ✅
- Bagging vs. Boosting ✅
Model Evaluation:
- Confusion Matrix: True Positive, True Negative, False Positive, False Negative ✅
- Metrics: Precision, Recall, F1-Score, and Accuracy ✅
- Overfitting and Underfitting ✅
- Cross-Validation ✅
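
For the cross-validation item, the splitting logic behind k-fold can be sketched without any library. The helper name `k_fold_indices` is hypothetical, and the sketch uses contiguous folds without shuffling; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs over range(n_samples)."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train_idx, test_idx))
        start += size
    return folds

for train_idx, test_idx in k_fold_indices(n_samples=6, k=3):
    print("train:", train_idx, "test:", test_idx)
# train: [2, 3, 4, 5] test: [0, 1]
# train: [0, 1, 4, 5] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
```

Each sample is used for testing exactly once, and the model's score is typically reported as the average over the k folds.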

Unit 4: Advanced Topics

Sparse Modeling and Estimation
Modeling Sequence/Time-Series Data
Deep Learning and Feature Representation Learning
- Deep Learning: Real-world examples
- Artificial Intelligence vs. Machine Learning vs. Deep Learning
- Neural Networks: Comparison with Linear Regression

Unit 5: Scalable Machine Learning

A selection from some other advanced topics, e.g.,
- Semi-supervised Learning
- Active Learning
  - Pool-Based Sampling
- Reinforcement Learning
  - Applications of Reinforcement Learning
- Inference in Graphical Models
- Introduction to Bayesian Learning and Inference

Topics Asked in PYQs But Not Explicitly in Your Syllabus

Here are the topics that appeared in the question papers but were not explicitly mentioned in the syllabus you provided:

Feature Selection:
- Techniques in supervised feature selection and its benefits.
Parametric vs. Non-parametric Models:
- This was a specific question asked in one of the papers.
Collaborative and Content-Based Filtering:
- These are specific types of recommendation systems that were asked about.
Bayes Theorem and Conditional Probability:
- While Naive Bayes is in the syllabus, a deeper understanding of the underlying Bayes’ theorem and conditional probability was expected in some questions.

Avi's Notes
