One-hot encoding
One-hot encoding converts a categorical variable into binary vectors so that machine learning models can process it. Each category gets its own column, which holds 1 when a row belongs to that category and 0 otherwise.
Example
Imagine a dataset of fruit in which one variable records each fruit's color: 'red', 'green', or 'yellow'. Most machine learning models require numerical input, so one-hot encoding replaces the color variable with one binary column per unique category. For each row, a '1' is placed in the column matching its category and a '0' in all the others. With the columns ordered red, green, yellow, 'red' becomes [1, 0, 0], 'green' becomes [0, 1, 0], and 'yellow' becomes [0, 0, 1]. This lets the model use the categorical information without implying any numerical order among the colors.
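The color example can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name one_hot_encode and the fixed column order (red, green, yellow) are assumptions made for this sketch, and real projects would more likely rely on a library encoder.

```python
def one_hot_encode(values, categories):
    """Return one binary vector per value, with a 1 in the matching category's position."""
    # Map each category to its column index, e.g. {"red": 0, "green": 1, "yellow": 2}.
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1  # raises KeyError if the value is not a known category
        encoded.append(vector)
    return encoded


# Column order is an assumption of this sketch; any fixed order works.
categories = ["red", "green", "yellow"]
colors = ["red", "green", "yellow", "green"]

for color, vector in zip(colors, one_hot_encode(colors, categories)):
    print(color, vector)
```

Each vector contains exactly one 1, and its position depends only on the chosen column order, so the same order must be used whenever new data is encoded.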