Saturday, July 8, 2023

What are the different label encodings in machine learning? (With examples)

In machine learning, categorical values have to be converted to numbers before most algorithms can use them, and the right encoding technique depends on the nature of the data. Here are a few commonly used encoding techniques:


1. Ordinal Encoding: In ordinal encoding, categories are assigned integer values based on their order or rank. For example, if we have a feature with categories "low," "medium," and "high," they can be encoded as 0, 1, and 2, respectively.


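```python
from sklearn.preprocessing import OrdinalEncoder

# One sample per row, one feature per column
categories = [['low'], ['medium'], ['high']]

# State the order explicitly; by default the encoder sorts categories
# alphabetically, which would give high=0, low=1, medium=2
encoder = OrdinalEncoder(categories=[['low', 'medium', 'high']])
encoded_categories = encoder.fit_transform(categories)
print(encoded_categories)
```

Output:

```
[[0.]
 [1.]
 [2.]]
```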

2. One-Hot Encoding: One-hot encoding creates binary columns for each category, representing the presence or absence of a category. Each category is transformed into a vector of 0s and 1s. For example, if we have categories "red," "blue," and "green," they can be encoded as [1, 0, 0], [0, 1, 0], and [0, 0, 1], respectively.


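```python
from sklearn.preprocessing import OneHotEncoder

# One sample per row, one feature per column
categories = [['red'], ['blue'], ['green']]

# Fix the column order to red, blue, green; by default the columns follow
# alphabetical order (blue, green, red)
encoder = OneHotEncoder(categories=[['red', 'blue', 'green']])
# fit_transform returns a sparse matrix; convert it to a dense array for printing
encoded_categories = encoder.fit_transform(categories).toarray()
print(encoded_categories)
```

Output:

```
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
```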


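3. Binary Encoding: Binary encoding first assigns each category an integer and then writes that integer in binary, with one column per binary digit. Because n categories need only about log2(n) columns instead of the n columns that one-hot encoding produces, this encoding is particularly useful when dealing with high-cardinality categorical variables.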


```python
import category_encoders as ce
import pandas as pd

categories = ['red', 'blue', 'green', 'red', 'blue']
data = pd.DataFrame({'categories': categories})

encoder = ce.BinaryEncoder(cols=['categories'])
encoded_data = encoder.fit_transform(data)

print(encoded_data)
```

Output:

```
   categories_0  categories_1  categories_2
0             0             0             1
1             0             1             0
2             0             1             1
3             0             0             1
4             0             1             0
```
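
To make the idea behind binary encoding more concrete, here is a minimal pure-Python sketch of the two steps involved: numbering the categories, then writing each number in binary. The function `binary_encode` is a hypothetical helper written for illustration, not part of `category_encoders`. It assumes categories are numbered from 1 in order of first appearance and uses only as many digits as the largest number needs, so its column count can differ from what the library produces.

```python
def binary_encode(values):
    """Illustrative helper (not a library API): encode categories as binary digits."""
    # Step 1: give each category an integer code, starting at 1,
    # in order of first appearance.
    codes = {}
    for value in values:
        codes.setdefault(value, len(codes) + 1)

    # Step 2: find how many binary digits the largest code needs.
    n_bits = max(codes.values()).bit_length()

    # Step 3: write each code in binary, most significant digit first.
    return [
        [(codes[value] >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]
        for value in values
    ]


encoded = binary_encode(['red', 'blue', 'green', 'red', 'blue'])
for row in encoded:
    print(row)
```

Here "red" becomes 1 (binary 01), "blue" becomes 2 (binary 10), and "green" becomes 3 (binary 11), so three categories fit into two columns.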


These are just a few examples of label encoding techniques in machine learning. The choice of encoding method depends on the specific requirements of your dataset and the machine learning algorithm you plan to use.
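
As one possible illustration of matching the encoding to the data, the sketch below applies ordinal encoding to an ordered feature and one-hot encoding to an unordered one in a single step, using scikit-learn's `ColumnTransformer`. The column names `priority` and `color` and the toy data are assumptions made up for this example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data: 'priority' has a natural order, 'color' does not.
data = pd.DataFrame({
    'priority': ['low', 'high', 'medium'],
    'color': ['red', 'blue', 'green'],
})

transformer = ColumnTransformer(
    [
        # Ordered feature: a single integer column, order stated explicitly.
        ('priority', OrdinalEncoder(categories=[['low', 'medium', 'high']]), ['priority']),
        # Unordered feature: one binary column per category.
        ('color', OneHotEncoder(categories=[['red', 'blue', 'green']]), ['color']),
    ],
    sparse_threshold=0,  # always return a dense array
)

encoded = transformer.fit_transform(data)
print(encoded)
```

The first output column holds the ordinal code for `priority`, and the remaining three columns hold the one-hot vector for `color`.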
