| Term | Definition |
| --- | --- |
| Tokenization | The process of breaking text into smaller pieces, called tokens, which can be words or subwords. |
| Training Corpus | A large set of texts used to train a model to understand and generate language. |
| Parameters | The internal variables of a model that are adjusted during training to minimize prediction error. |
| Inference | The process of using a trained model to generate predictions or outputs based on new input data. |
| Fine-Tuning | A method of further training a pre-trained model on a specific dataset to improve performance on a particular task. |
| Natural Language Processing | A field of artificial intelligence that focuses on the interaction between computers and human language. |
| Transformer | An architecture that uses self-attention mechanisms to process and generate sequences of data. |
| Overfitting | A modeling error that occurs when a model learns the training data too well, failing to generalize to new data. |
| Semantic Analysis | The process of interpreting the meaning of words and phrases in context. |
| Bag of Words | A representation of text as an unordered collection of words, ignoring grammar and word order. |
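A few of these entries are concrete enough to illustrate in code; the short Python sketches below do that for Tokenization, Parameters and Inference, the Transformer's self-attention, and Bag of Words.

First, Tokenization: a minimal word-level tokenizer sketch using only Python's standard library. The regular expression and the `tokenize` function name are illustrative choices, not any particular library's API; production models usually tokenize into subwords (for example with Byte-Pair Encoding), but the word-level version shows the core idea.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase, then pull out runs of word characters; punctuation
    # becomes its own token instead of sticking to a word.
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(tokenize("Tokenization breaks text into smaller pieces!"))
# ['tokenization', 'breaks', 'text', 'into', 'smaller', 'pieces', '!']
```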
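Parameters and Inference fit in one sketch: training repeatedly adjusts a parameter to shrink prediction error, and inference then runs the trained model on new input. The toy data, the single-parameter model, and the learning rate below are all assumptions made for illustration.

```python
# Fit y ≈ w * x with a single parameter w, via plain gradient descent.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # toy (x, y) pairs

w = 0.0      # the model's one parameter, adjusted during training
lr = 0.01    # learning rate
for _ in range(500):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # training step: nudge w to reduce the error

print(round(w, 2))        # learned parameter, roughly 2.0
print(round(w * 4.0, 1))  # inference: prediction for unseen input x = 4.0
```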
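The Transformer entry hinges on self-attention. Here is a hedged sketch of a single attention step using NumPy: each token's output is a weighted mix of every token's embedding, with weights given by a softmax over pairwise similarity. To keep it short, the query, key, and value projections are the identity, whereas a real Transformer learns separate weight matrices for each.

```python
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    # X has shape (sequence_length, d): one embedding row per token.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # scaled pairwise similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ X                              # mix embeddings by attention

X = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, 8-dim embeddings
print(self_attention(X).shape)                    # (4, 8)
```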
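Finally, Bag of Words: the sketch below represents each text as an unordered count of its words, so two sentences with the same words in different orders produce the same bag. `collections.Counter` is from the standard library; the `bag_of_words` helper name is hypothetical.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Word order is discarded; only how often each word occurs survives.
    return Counter(text.lower().split())

a = bag_of_words("the cat sat on the mat")
b = bag_of_words("on the mat the cat sat")
print(a == b)  # True: different order, identical bag of words
```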