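In machine learning, categorical features usually have to be converted to numbers before a model can use them. Several encoding techniques (often loosely grouped under the name "label encoding") are available, and the right one depends on the nature of the data. Here are a few commonly used ones: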
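1. Ordinal Encoding: In ordinal encoding, categories are assigned integer values based on their order or rank. For example, if we have a feature with categories "low," "medium," and "high," they can be encoded as 0, 1, and 2, respectively. Use it only when the categories have a natural order, because many models will interpret the integers as quantities with meaningful distances between them.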
```python
from sklearn.preprocessing import OrdinalEncoder
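data = [['low'], ['medium'], ['high']]
# Pass the order explicitly: with the default categories='auto', the
# categories are sorted alphabetically ("high" < "low" < "medium").
encoder = OrdinalEncoder(categories=[['low', 'medium', 'high']])
encoded_categories = encoder.fit_transform(data)
print(encoded_categories)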
```
Output:
```
[[0.]
 [1.]
 [2.]]
```
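To make the mechanics explicit, here is a minimal from-scratch sketch of ordinal encoding in plain Python. The `ordinal_encode` helper is hypothetical and is not how scikit-learn implements `OrdinalEncoder`. It also shows why the order has to be supplied: scikit-learn's `OrdinalEncoder` with its default `categories='auto'` sorts the categories, and alphabetical order ranks "high" below "low".

```python
# Hypothetical helper: a from-scratch sketch, not scikit-learn's implementation.
def ordinal_encode(values, order):
    """Map each value to its position in `order` (0, 1, 2, ...)."""
    rank = {category: i for i, category in enumerate(order)}
    # An unseen value raises KeyError instead of being silently encoded.
    return [rank[value] for value in values]


sizes = ["medium", "low", "high", "low"]

# Explicit order: low < medium < high.
print(ordinal_encode(sizes, order=["low", "medium", "high"]))  # [1, 0, 2, 0]

# Alphabetical order (high, low, medium) loses the intended ranking.
print(ordinal_encode(sizes, order=sorted(set(sizes))))  # [2, 1, 0, 1]
```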
2. One-Hot Encoding: One-hot encoding creates one binary column per category, marking the presence or absence of that category. Each category is transformed into a vector of 0s and 1s. For example, if we have categories "red," "blue," and "green," they can be encoded as [1, 0, 0], [0, 1, 0], and [0, 0, 1], respectively. Because no order is implied, one-hot encoding suits nominal categories, but the number of columns grows with the number of distinct categories.
```python
from sklearn.preprocessing import OneHotEncoder
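data = [['red'], ['blue'], ['green']]
# Fix the column order; by default the categories would be sorted
# alphabetically (blue, green, red), which reorders the columns.
encoder = OneHotEncoder(categories=[['red', 'blue', 'green']])
# fit_transform returns a sparse matrix by default; toarray() makes it dense.
encoded_categories = encoder.fit_transform(data).toarray()
print(encoded_categories)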
```
Output:
```
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
```
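The same idea can be written without scikit-learn. The sketch below, built around a hypothetical `one_hot_encode` helper, gives each category its own column and sets a single 1 per row; it assumes every value appears in the supplied category list.

```python
# Hypothetical helper: a from-scratch sketch, not scikit-learn's implementation.
def one_hot_encode(values, categories):
    """Return one 0/1 row per value, with a 1 in the column of its category."""
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # KeyError if the value is not a known category
        rows.append(row)
    return rows


colors = ["red", "blue", "green", "blue"]
print(one_hot_encode(colors, categories=["red", "blue", "green"]))
# [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

A real encoder also has to decide what to do with categories it did not see during fitting; scikit-learn's `OneHotEncoder` exposes this choice through its `handle_unknown` parameter.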
3. Binary Encoding: Binary encoding first maps each category to an integer and then writes that integer in base 2, placing each binary digit in its own column. Because k categories need only about log2(k) columns instead of the k columns required by one-hot encoding, this method is particularly useful for high-cardinality categorical variables.
```python
import category_encoders as ce
import pandas as pd
categories = ['red', 'blue', 'green', 'red', 'blue']
data = pd.DataFrame({'categories': categories})
# Replace the 'categories' column with one 0/1 column per binary digit.
encoder = ce.BinaryEncoder(cols=['categories'])
encoded_data = encoder.fit_transform(data)
print(encoded_data)
```
Output:
```
   categories_0  categories_1  categories_2
0             0             0             1
1             0             1             0
2             0             1             1
3             0             0             1
4             0             1             0
```
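Binary encoding is essentially ordinal encoding followed by a base-2 expansion. The minimal sketch below (the `binary_encode` helper is hypothetical) numbers categories from 1 in order of first appearance, as in the output above, and uses the fewest bits that can hold the largest code. The column count chosen by `category_encoders` can differ from this minimal width, as the three-column output above shows for three categories.

```python
# Hypothetical helper: a from-scratch sketch, not category_encoders' implementation.
def binary_encode(values):
    """Encode each value as the bits of an integer code (1, 2, 3, ...)."""
    codes = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1  # codes follow first appearance
    width = len(codes).bit_length()  # bits needed for the largest code
    return [[int(bit) for bit in format(codes[value], f"0{width}b")] for value in values]


colors = ["red", "blue", "green", "red", "blue"]
for row in binary_encode(colors):
    print(row)
# [0, 1]
# [1, 0]
# [1, 1]
# [0, 1]
# [1, 0]

# Column counts for a feature with 1000 distinct categories:
print("one-hot:", 1000, "binary:", (1000).bit_length())  # one-hot: 1000 binary: 10
```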
These are just a few of the encoding techniques used in machine learning. The right choice depends on whether the categories have a natural order, how many distinct values the feature has, and which machine learning algorithm you plan to use.