Glossary of Terms

Predictive Modelling

Keyword Definition
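Cross-Validation A technique for assessing how well a predictive model generalizes to unseen data by splitting the data into several subsets (folds), repeatedly training the model on all but one fold and evaluating it on the held-out fold, then averaging the results.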
Data Cleaning The process of detecting and correcting (or removing) errors, inconsistencies, and missing values in the collected data before further analysis.
Data Collection The first step in the predictive modelling process, in which relevant data is gathered from various sources.
Data Preparation The process of cleaning, transforming, and organizing data before building a predictive model.
Decision Tree A flowchart-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class label (or, in a regression tree, a predicted numeric value).
Deployment The stage at which the trained model is put into practical use to make predictions on new data.
Ensemble Learning A machine learning technique that combines multiple models to improve the overall performance and accuracy of the prediction.
Feature Selection The process of choosing the most relevant input variables to be used in the predictive model.
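Gradient Boosting An ensemble learning method that builds a model in a stage-wise manner: each new model, typically a shallow decision tree, is fitted to the errors of the ensemble built so far (more precisely, to the negative gradient of the loss function) and added to it.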
K-Nearest Neighbors A non-parametric method for classification and regression that predicts a data point's value from its k nearest neighbors in the training data: by majority vote of their classes for classification, or by averaging their values for regression.
Logistic Regression A statistical method for modelling binary outcomes that estimates the probability that a given outcome is present by applying the logistic (sigmoid) function to a linear combination of the input variables.
Model Evaluation The process of assessing the performance of the predictive model on validation or test data that was not used for training.
Model Training The stage where the predictive model is developed using the selected features and training data.
Model Tuning The step in which the model's hyperparameters (for example, tree depth, the number of neighbors k, or the learning rate) are adjusted to improve its predictive accuracy and performance.
Monitoring The ongoing process of evaluating the model's performance and making updates as needed to maintain accuracy.
Neural Network A model made up of layers of interconnected nodes (artificial neurons), loosely inspired by neurons in the brain, in which each node computes a weighted sum of its inputs and passes the result through a nonlinear activation function.
Predictive Modelling Process The process of using data and statistical algorithms to create models that predict future outcomes or trends.
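Random Forest An ensemble learning method that constructs a multitude of decision trees at training time, each on a random sample of the data and features, and outputs the mode of the classes predicted by the individual trees (for classification) or their mean prediction (for regression).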
Regression Analysis A statistical method used to examine the relationship between one dependent variable and one or more independent variables.
Support Vector Machine A supervised machine learning algorithm that classifies data by finding the hyperplane that separates the classes with the largest margin, that is, the greatest distance to the nearest data points of each class.
Time Series Forecasting A technique for predicting future values of a variable from its past observations recorded in time order.
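The Cross-Validation entry can be made concrete with a minimal k-fold sketch in plain Python. The function names `k_fold_indices` and `cross_validate`, and the trivial majority-class model used to exercise them, are hypothetical illustrations rather than any library's API; the sketch assumes the caller supplies `fit` and `predict` functions and that 2 <= k <= number of samples.

```python
from collections import Counter
from statistics import mean


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    # Assumes 2 <= k <= n so that every fold is non-empty.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(X, y, fit, predict, k=5):
    """Return the mean accuracy over k train/validation splits."""
    scores = []
    for val_idx in k_fold_indices(len(X), k):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Train only on the folds that are not held out.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        # Score the model on the held-out fold.
        correct = sum(predict(model, X[i]) == y[i] for i in val_idx)
        scores.append(correct / len(val_idx))
    return mean(scores)


# Hypothetical baseline model: always predict the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model
```

For example, `cross_validate(list(range(10)), [0] * 7 + [1] * 3, fit_majority, predict_majority, k=5)` returns 0.7. Contiguous folds keep the sketch short; practical implementations usually shuffle, and often stratify, the data before splitting.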
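The K-Nearest Neighbors entry describes two prediction rules, and both fit in a few lines. The following is a minimal sketch assuming numeric feature tuples and Euclidean distance (the distance metric is an assumption; others are common); `knn_predict` and `knn_regress` are hypothetical names, not a library API.

```python
import math
from collections import Counter


def _nearest(train_X, train_y, query, k):
    """Return the k (point, target) pairs closest to `query` by Euclidean distance."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    return ranked[:k]


def knn_predict(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    labels = [label for _, label in _nearest(train_X, train_y, query, k)]
    # On a tie, most_common keeps first-encountered order, favoring closer neighbors.
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Regression: average target value of the k nearest neighbors."""
    nearest = _nearest(train_X, train_y, query, k)
    return sum(value for _, value in nearest) / len(nearest)
```

With training points `(0, 0), (0, 1), (1, 0)` labelled `"a"` and `(5, 5), (5, 6), (6, 5)` labelled `"b"`, the query `(0.5, 0.5)` is classified as `"a"`. There is no training step: the method simply stores the data and does all its work at prediction time, which is why it is called non-parametric.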
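The Logistic Regression entry can be illustrated by computing the estimated probability as the sigmoid of a linear combination of the inputs. The sketch below assumes the weights are fitted by plain batch gradient descent on the average log loss, one simple choice among several; `fit_logistic`, `predict_proba`, and the default `lr` and `epochs` values are hypothetical illustrations, not a particular library's method.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real z into (0, 1), computed stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Estimated probability that the outcome is present (label 1) for input x."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the average log loss."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # For log loss, the derivative with respect to z is (p - y).
            error = predict_proba(weights, bias, xi) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias
```

Fitted to one-feature inputs 0 to 5 labelled `[0, 0, 0, 1, 1, 1]`, the model assigns a probability below 0.5 to input 0 and above 0.5 to input 5. A probability threshold (commonly 0.5) turns these estimates into a binary prediction.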