Printable Flashcards

Data Mining	Pattern Recognition
Clustering	Classification
Association Rule Mining	Machine Learning
Data Preprocessing	Decision Tree

The automated recognition of patterns and regularities in data.	The process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Predicting the class or category of an object based on its attributes.	Grouping a set of objects in such a way that objects in the same group are more similar to each other than those in other groups.
Enabling computers to learn and make decisions from data without being explicitly programmed.	Discovering interesting relationships or association patterns among a set of items in large datasets.
A flowchart-like structure in which each internal node represents a test on a feature, each branch represents an outcome of that test, and each leaf node represents a predicted class or value.	Cleaning and transforming raw data into a suitable format before performing analysis or mining.

Feature Selection	Big Data
Predictive Modelling Process	Data Collection
Data Cleaning	Model Training
Model Evaluation	Model Tuning

Extremely large and complex data sets that cannot be easily managed or analyzed using traditional data processing techniques.	Selecting a subset of relevant features from the original dataset to improve the performance of a machine learning model.
The first step in the predictive modelling process, involving gathering relevant data from various sources.	The process of using data and statistical algorithms to create models that predict future outcomes or trends.
The stage where the predictive model is developed using the selected features and training data.	The process of removing errors and inconsistencies from the collected data before further analysis.
The step where adjustments are made to the model to improve its predictive accuracy and performance.	The process of assessing the performance of the predictive model using validation data sets.

Deployment	Monitoring
Data Preparation	Cross-Validation
Regression Analysis	Random Forest
Support Vector Machine	Neural Network

The ongoing process of evaluating the model's performance and making updates as needed to maintain accuracy.	The stage where the model is put into practical use for making predictions.
A technique used to assess the performance of a predictive model by splitting the data into multiple subsets, then training on some subsets and validating on the held-out remainder in turn.	The process of cleaning, transforming, and organizing data before building a predictive model.
An ensemble learning method that constructs a multitude of decision trees at training time and outputs the mode of their predicted classes (for classification) or the mean of their predictions (for regression).	A statistical method used to examine the relationship between one dependent variable and one or more independent variables.
A network of interconnected nodes, loosely modelled on neurons in the brain, that learns to map inputs to outputs by adjusting the weights on its connections.	A supervised machine learning algorithm that classifies data into different classes by finding the hyperplane that separates the classes with the largest margin.
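
Worked example for the Cross-Validation card: a minimal sketch, in plain Python, of how data can be split into k folds so that each fold serves once as the validation set. The function name `k_fold_indices` and the sample sizes are illustrative assumptions, not part of the deck, and the sketch does not shuffle the data first, which a real workflow usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Hypothetical helper for illustration only; no shuffling is applied.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Example: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample appears in exactly one validation fold, so a model trained and scored once per fold is evaluated on every sample without ever being scored on data it was trained on.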

Logistic Regression	K-Nearest Neighbors
Gradient Boosting	Time Series Forecasting
Ensemble Learning

A non-parametric method used for classification and regression that predicts from a data point's k nearest neighbors: the majority class for classification, or the average value for regression.	A statistical method used to model binary outcomes by estimating the probability that a given outcome is present.
A technique used to forecast future values based on past observations recorded in time order.	An ensemble learning method that builds a model in a stage-wise manner, with each new model trained to correct the errors of the previous models.
	A machine learning technique that combines multiple models to improve the overall performance and accuracy of the prediction.
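
Worked example for the K-Nearest Neighbors card: a minimal sketch of majority-vote classification in plain Python. The function name `knn_predict`, the Euclidean distance metric, and the toy points and labels are assumptions made for illustration; the deck itself does not specify a metric or a data set.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by the majority label among its k nearest training points.

    Hypothetical helper for illustration; uses Euclidean distance.
    """
    # Pair each training label with its distance to the query, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: three points near (1, 1) labeled "a", two near (5, 5) labeled "b".
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
labels = ["a", "a", "a", "b", "b"]

prediction = knn_predict(points, labels, (1.1, 1.0), k=3)
print(prediction)
```

Because the method stores the training data and defers all computation to prediction time, there is no separate model-fitting step in this sketch.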