Data Mining | The process of discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems. |
Pattern Recognition | The automated recognition of patterns and regularities in data. |
Clustering | Grouping a set of objects so that objects in the same group are more similar to each other than to those in other groups. |
Classification | Predicting the class or category of an object based on its attributes. |
Association Rule Mining | Discovering interesting relationships or association patterns among sets of items in large data sets. |
Machine Learning | Enabling computers to learn and make decisions from data without being explicitly programmed. |
Data Preprocessing | Cleaning and transforming raw data into a suitable format before performing analysis or mining. |
Decision Tree | A flowchart-like structure in which each internal node represents a test on a feature, each branch represents a decision rule, and each leaf node represents the outcome. |
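The association rule mining entry above can be made concrete with the two standard rule measures, support and confidence. The sketch below is only illustrative: the transactions are invented, and the helper names `support` and `confidence` are hypothetical rather than taken from any particular library.

```python
# Hypothetical market-basket transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the combined itemset divided by support of the antecedent."""
    combined = set(antecedent) | set(consequent)
    return support(combined, transactions) / support(antecedent, transactions)


# Rule under test: {bread} -> {milk}
bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk_confidence = confidence({"bread"}, {"milk"}, transactions)
print(bread_milk_support, bread_to_milk_confidence)
```

A rule is usually considered interesting only when both measures exceed thresholds chosen by the analyst; the thresholds themselves are a judgment call and are not fixed by the definition.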
Feature Selection | Selecting a subset of relevant features from the original data set to improve the performance of a machine learning model. |
Big Data | Extremely large and complex data sets that cannot be easily managed or analyzed using traditional data processing techniques. |
Predictive Modelling Process | The process of using data and statistical algorithms to create models that predict future outcomes or trends. |
Data Collection | The first step in the predictive modelling process: gathering relevant data from various sources. |
Data Cleaning | The process of removing errors and inconsistencies from the collected data before further analysis. |
Model Training | The stage where the predictive model is fitted to the training data using the selected features. |
Model Evaluation | The process of assessing the performance of the predictive model on validation data sets. |
Model Tuning | The step where the model's settings are adjusted to improve its predictive accuracy and performance. |
Deployment | The stage where the model is put into practical use for making predictions. |
Monitoring | The ongoing process of evaluating the model's performance and making updates as needed to maintain accuracy. |
Data Preparation | The process of cleaning, transforming, and organizing data before building a predictive model. |
Cross-Validation | A technique for assessing a predictive model by splitting the data into multiple subsets and, in turn, holding each subset out for validation while training on the rest. |
Regression Analysis | A statistical method used to examine the relationship between one dependent variable and one or more independent variables. |
Random Forest | An ensemble learning method that constructs a multitude of decision trees at training time and outputs the mode of their predicted classes (or, for regression, the mean of their predictions). |
Support Vector Machine | A supervised machine learning algorithm that classifies data by finding the hyperplane that separates the classes with the largest margin. |
Neural Network | A network of interconnected nodes, loosely modelled on neurons in the brain, that processes information by passing signals through weighted connections. |
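The cross-validation entry above describes splitting the data into multiple subsets. One common way to do this is k-fold splitting, sketched below in plain Python. The function name `k_fold_indices` is hypothetical, and the sketch assumes the samples are already in a suitable order (it does not shuffle).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Folds are contiguous and as equal in size as possible; the first
    n_samples % k folds receive one extra sample.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Example: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample appears in exactly one validation fold, so averaging a model's score over the folds uses every sample for validation once and for training in the remaining folds.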
Logistic Regression | A statistical method used to model binary outcomes by estimating the probability that a given outcome occurs. |
K-Nearest Neighbors | A non-parametric method used for classification and regression that predicts a data point's class from the majority class of its k nearest neighbors (or, for regression, the average of their values). |
Gradient Boosting | An ensemble learning method that builds a model in a stage-wise manner, with each new model correcting the errors of the previous models. |
Time Series Forecasting | A technique used to forecast future values from past data points ordered in time. |
Ensemble Learning | A machine learning technique that combines multiple models to improve the overall performance and accuracy of the prediction. |
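The k-nearest neighbors entry above (majority class among the k closest points) can be sketched in a few lines of standard-library Python. The data points, labels, and the function name `knn_predict` are invented for illustration, and Euclidean distance is an assumption; other distance measures are equally valid.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-cluster data set.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(train_points, train_labels, (1.5, 1.5), k=3)
prediction_near_b = knn_predict(train_points, train_labels, (6.5, 6.5), k=3)
print(prediction_near_a, prediction_near_b)
```

An odd k is often chosen for two-class problems so that the vote cannot tie.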