Class: OneHotEncoder
Encode categorical features as a one-hot numeric array.
The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka ‘one-of-K’ or ‘dummy’) encoding scheme. This creates a binary column for each category and returns a sparse matrix or dense array (depending on the sparse_output parameter).
By default, the encoder derives the categories based on the unique values in each feature. Alternatively, you can specify the categories manually.
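To illustrate what fitting and transforming compute, here is a minimal pure-TypeScript sketch that does not call the Python bridge. It assumes every value is a string and mimics the "auto" categories by sorting each feature's unique values; fitCategories and oneHot are hypothetical helpers, not part of this API.

```typescript
type Row = string[];

// "auto" categories: the sorted unique values of each column.
// Assumes string values; numeric features would need a numeric sort.
function fitCategories(X: Row[]): string[][] {
  const nFeatures = X.length > 0 ? X[0].length : 0;
  const categories: string[][] = [];
  for (let j = 0; j < nFeatures; j++) {
    const unique = new Set<string>();
    for (const row of X) unique.add(row[j]);
    categories.push(Array.from(unique).sort());
  }
  return categories;
}

// One binary column per (feature, category) pair, laid out feature by feature.
function oneHot(X: Row[], categories: string[][]): number[][] {
  return X.map((row) => {
    const encoded: number[] = [];
    row.forEach((value, j) => {
      const idx = categories[j].indexOf(value);
      if (idx === -1) {
        // Mirrors handle_unknown="error", the scikit-learn default.
        throw new Error(`Unknown category "${value}" in feature ${j}`);
      }
      categories[j].forEach((_, k) => encoded.push(k === idx ? 1 : 0));
    });
    return encoded;
  });
}

const X: Row[] = [
  ["Male", "1"],
  ["Female", "3"],
  ["Female", "2"],
];
const categories = fitCategories(X); // [["Female", "Male"], ["1", "2", "3"]]
const encoded = oneHot(X, categories);
// [[0, 1, 1, 0, 0], [1, 0, 0, 0, 1], [1, 0, 0, 1, 0]]
```

Each output row holds exactly one 1 per input feature, which is why the result is usually stored sparsely.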
This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels.
Note: a one-hot encoding of y labels should use a LabelBinarizer instead.
Read more in the User Guide. For a comparison of different encoders, refer to: Comparing Target Encoder with Other Encoders.
Constructors
new OneHotEncoder()
new OneHotEncoder(opts?): OneHotEncoder
Parameters
Parameter | Type | Description |
---|---|---|
opts ? | object | - |
opts.categories ? | "auto" | Categories (unique values) per feature. "auto" (the default) determines the categories automatically from the training data; in scikit-learn, a list may be passed instead, where categories[i] holds the categories expected in the ith column. |
opts.drop ? | any[] \| "first" \| "if_binary" | Specifies a methodology to use to drop one of the categories per feature. This is useful in situations where perfectly collinear features cause problems, such as when feeding the resulting data into an unregularized linear regression model. However, dropping one category breaks the symmetry of the original representation and can therefore induce a bias in downstream models, for instance for penalized linear classification or regression models. By default, all categories are retained. |
opts.dtype ? | any | Desired dtype of output. |
opts.feature_name_combiner ? | "concat" | Callable with signature def callable(input_feature, category) that returns a string. This is used to create feature names to be returned by get_feature_names_out. "concat" concatenates the encoded feature name and category as feature + "_" + str(category); e.g. feature X with values 1, 6, 7 creates feature names X_1, X_6, X_7. |
opts.handle_unknown ? | "ignore" \| "error" \| "infrequent_if_exist" | Specifies how unknown categories are handled during transform. "error" (the default) raises an error; "ignore" encodes the unknown category as all zeros; "infrequent_if_exist" maps it to the infrequent category if the feature has one, and otherwise behaves like "ignore". |
opts.max_categories ? | number | Specifies an upper limit to the number of output features for each input feature when considering infrequent categories. If there are infrequent categories, max_categories includes the category representing the infrequent categories along with the frequent categories. If undefined, there is no limit to the number of output features. |
opts.min_frequency ? | number | Specifies the minimum frequency below which a category will be considered infrequent. In scikit-learn, an integer is an absolute count and a float is a fraction of the number of samples. |
opts.sparse_output ? | boolean | When true (the default), it returns a scipy.sparse.csr_matrix, i.e. a sparse matrix in “Compressed Sparse Row” (CSR) format; when false, it returns a dense array. |
Returns OneHotEncoder
Defined in generated/preprocessing/OneHotEncoder.ts:31
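The three handle_unknown strategies can be sketched for a single feature as follows. This is a pure-TypeScript illustration, not the wrapper's code; the encodeValue helper and its column map (frequent categories to their own column, infrequent categories to one shared column) are assumptions made for the sketch.

```typescript
type HandleUnknown = "error" | "ignore" | "infrequent_if_exist";

// Encodes one value of a single feature (hypothetical helper).
// columnOf maps each known category to its output column; infrequentCol is the
// shared infrequent column, or null if this feature has no infrequent categories.
function encodeValue(
  value: string,
  columnOf: Map<string, number>,
  nColumns: number,
  infrequentCol: number | null,
  handleUnknown: HandleUnknown
): number[] {
  const row: number[] = new Array<number>(nColumns).fill(0);
  const col = columnOf.get(value);
  if (col !== undefined) {
    row[col] = 1;
    return row;
  }
  if (handleUnknown === "error") {
    throw new Error(`Found unknown category "${value}" during transform`);
  }
  if (handleUnknown === "infrequent_if_exist" && infrequentCol !== null) {
    row[infrequentCol] = 1; // the unknown value joins the infrequent group
  }
  // "ignore", or no infrequent column: the row stays all zeros.
  return row;
}

// "a" and "b" are frequent; "c" was grouped as infrequent into column 2.
const columnOf = new Map<string, number>([["a", 0], ["b", 1], ["c", 2]]);
encodeValue("z", columnOf, 3, 2, "ignore"); // [0, 0, 0]
encodeValue("z", columnOf, 3, 2, "infrequent_if_exist"); // [0, 0, 1]

// Without an infrequent column, "infrequent_if_exist" falls back to "ignore".
const noInfrequent = new Map<string, number>([["a", 0], ["b", 1]]);
encodeValue("z", noInfrequent, 2, null, "infrequent_if_exist"); // [0, 0]
```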
Properties
Property | Type | Default value | Defined in |
---|---|---|---|
_isDisposed | boolean | false | generated/preprocessing/OneHotEncoder.ts:29 |
_isInitialized | boolean | false | generated/preprocessing/OneHotEncoder.ts:28 |
_py | PythonBridge | undefined | generated/preprocessing/OneHotEncoder.ts:27 |
id | string | undefined | generated/preprocessing/OneHotEncoder.ts:24 |
opts | any | undefined | generated/preprocessing/OneHotEncoder.ts:25 |
Accessors
categories_
Get Signature
get categories_(): Promise<any>
The categories of each feature determined during fitting (in order of the features in X and corresponding with the output of transform). This includes the category specified in drop (if any).
Returns Promise<any>
Defined in generated/preprocessing/OneHotEncoder.ts:416
drop_idx_
Get Signature
get drop_idx_(): Promise<any[]>
drop_idx_[i] is the index in categories_[i] of the category to be dropped for each feature.
Returns Promise<any[]>
Defined in generated/preprocessing/OneHotEncoder.ts:441
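For the two string options of drop, the indices can be sketched as below. In scikit-learn, "first" drops index 0 of every feature, "if_binary" drops index 0 only for features with exactly two categories (other entries are None), and drop_idx_ itself is None when nothing is dropped; this sketch uses null for None. The explicit-array form of drop is not covered, and computeDropIdx and keptCategories are hypothetical helpers.

```typescript
type Drop = "first" | "if_binary" | null;

// Per feature: index into categories[i] of the dropped category, or null when
// nothing is dropped for that feature. A null result stands in for None.
function computeDropIdx(categories: string[][], drop: Drop): (number | null)[] | null {
  if (drop === null) return null;
  return categories.map((cats) => {
    if (drop === "first") return 0;
    return cats.length === 2 ? 0 : null; // "if_binary": two-category features only
  });
}

// Categories that still get an output column after dropping.
function keptCategories(categories: string[][], dropIdx: (number | null)[] | null): string[][] {
  return categories.map((cats, i) => {
    const d = dropIdx === null ? null : dropIdx[i];
    return d === null ? cats : cats.filter((_, k) => k !== d);
  });
}

const cats = [["Female", "Male"], ["1", "2", "3"]];
const first = computeDropIdx(cats, "first"); // [0, 0]
keptCategories(cats, first); // [["Male"], ["2", "3"]]
const ifBinary = computeDropIdx(cats, "if_binary"); // [0, null]
keptCategories(cats, ifBinary); // [["Male"], ["1", "2", "3"]]
```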
feature_name_combiner
Get Signature
get feature_name_combiner(): Promise<any>
Callable with signature def callable(input_feature, category) that returns a string. This is used to create feature names to be returned by get_feature_names_out.
Returns Promise<any>
Defined in generated/preprocessing/OneHotEncoder.ts:516
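The combiner contract can be mirrored in TypeScript to show what "concat" produces. This is only an illustration of the Python-side logic: the constructor's type lists just "concat", so whether this wrapper accepts a JavaScript function here is not documented. The equals combiner is a hypothetical example, and String() matches Python's str() for strings and integers but not for floats such as 1.0.

```typescript
type Combiner = (inputFeature: string, category: string | number) => string;

// "concat": feature + "_" + category.
const concat: Combiner = (inputFeature, category) => `${inputFeature}_${String(category)}`;

// Hypothetical custom combiner producing names like "X=1".
const equals: Combiner = (inputFeature, category) => `${inputFeature}=${String(category)}`;

function namesFor(inputFeature: string, categories: (string | number)[], combine: Combiner): string[] {
  return categories.map((category) => combine(inputFeature, category));
}

namesFor("X", [1, 6, 7], concat); // ["X_1", "X_6", "X_7"]
namesFor("X", [1, 6, 7], equals); // ["X=1", "X=6", "X=7"]
```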
feature_names_in_
Get Signature
get feature_names_in_(): Promise<ArrayLike>
Names of features seen during fit. Defined only when X has feature names that are all strings.
Returns Promise<ArrayLike>
Defined in generated/preprocessing/OneHotEncoder.ts:491
n_features_in_
Get Signature
get n_features_in_(): Promise<number>
Number of features seen during fit.
Returns Promise<number>
Defined in generated/preprocessing/OneHotEncoder.ts:466
py
Get Signature
get py(): PythonBridge
Returns PythonBridge
Set Signature
set py(pythonBridge): void
Parameters
Parameter | Type |
---|---|
pythonBridge | PythonBridge |
Returns void
Defined in generated/preprocessing/OneHotEncoder.ts:88
Methods
dispose()
dispose(): Promise<void>
Disposes of the underlying Python resources.
Once dispose() is called, the instance is no longer usable.
Returns Promise<void>
Defined in generated/preprocessing/OneHotEncoder.ts:140
fit()
fit(opts): Promise<any>
Fit OneHotEncoder to X.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.X ? | ArrayLike[] | The data to determine the categories of each feature. |
opts.y ? | any | Ignored. This parameter exists only for compatibility with Pipeline. |
Returns Promise<any>
Defined in generated/preprocessing/OneHotEncoder.ts:157
fit_transform()
fit_transform(opts): Promise<any[]>
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.fit_params ? | any | Additional fit parameters. |
opts.X ? | ArrayLike[] | Input samples. |
opts.y ? | ArrayLike | Target values (undefined for unsupervised transformations). |
Returns Promise<any[]>
Defined in generated/preprocessing/OneHotEncoder.ts:196
get_feature_names_out()
get_feature_names_out(opts): Promise<any>
Get output feature names for transformation.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
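opts.input_features ? | any | Input feature names. If undefined, feature_names_in_ is used when it is defined; otherwise the names x0, x1, ..., x(n_features_in_ - 1) are generated. |
Returns Promise<any>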
Defined in generated/preprocessing/OneHotEncoder.ts:238
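The naming rule can be sketched as follows, assuming the default "concat" combiner. featureNamesOut is a hypothetical helper: it falls back to x0, x1, ... when no names are given (scikit-learn would first try feature_names_in_, which this sketch omits) and skips any dropped category, since it has no output column.

```typescript
// Output names for each kept category, using "concat" naming.
function featureNamesOut(
  categories: string[][],
  inputFeatures?: string[],
  dropIdx?: (number | null)[]
): string[] {
  const inputs = inputFeatures ?? categories.map((_, i) => `x${i}`);
  if (inputs.length !== categories.length) {
    throw new Error("input_features must have one name per input feature");
  }
  const names: string[] = [];
  categories.forEach((cats, i) => {
    cats.forEach((category, k) => {
      if (dropIdx !== undefined && dropIdx[i] === k) return; // dropped: no column
      names.push(`${inputs[i]}_${category}`);
    });
  });
  return names;
}

const cats = [["Female", "Male"], ["1", "2", "3"]];
featureNamesOut(cats);
// ["x0_Female", "x0_Male", "x1_1", "x1_2", "x1_3"]
featureNamesOut(cats, ["gender", "group"], [0, 0]);
// ["gender_Male", "group_2", "group_3"]
```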
get_metadata_routing()
get_metadata_routing(opts): Promise<any>
Get metadata routing of this object.
See the User Guide for how the routing mechanism works.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.routing ? | any | A MetadataRequest encapsulating routing information. |
Returns Promise<any>
Defined in generated/preprocessing/OneHotEncoder.ts:274
init()
init(py): Promise<void>
Initializes the underlying Python resources.
This instance is not usable until the Promise returned by init() resolves.
Parameters
Parameter | Type |
---|---|
py | PythonBridge |
Returns Promise<void>
Defined in generated/preprocessing/OneHotEncoder.ts:101
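Because an instance is unusable before init() resolves and after dispose() is called, a try/finally wrapper keeps the two calls paired. withInitialized and the Lifecycle interface are hypothetical helpers typed structurally against the init/dispose signatures documented here; the commented usage assumes an already connected PythonBridge named py.

```typescript
// Structural type covering only the two documented lifecycle methods.
interface Lifecycle<B> {
  init(py: B): Promise<void>;
  dispose(): Promise<void>;
}

// Initializes the instance, runs the work, and always disposes afterwards.
async function withInitialized<B, T extends Lifecycle<B>, R>(
  instance: T,
  py: B,
  work: (instance: T) => Promise<R>
): Promise<R> {
  await instance.init(py); // the instance is not usable before this resolves
  try {
    return await work(instance);
  } finally {
    await instance.dispose(); // runs even if work throws
  }
}

// Usage sketch (requires a connected PythonBridge `py`):
// const out = await withInitialized(new OneHotEncoder({ sparse_output: false }), py,
//   (enc) => enc.fit_transform({ X: [["a"], ["b"]] }));
```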
inverse_transform()
inverse_transform(opts): Promise<ArrayLike[]>
Convert the data back to the original representation.
When unknown categories are encountered (all zeros in the one-hot encoding), undefined is used to represent this category. If the feature with the unknown category has a dropped category, the dropped category will be its inverse.
For a given input feature, if there is an infrequent category, ‘infrequent_sklearn’ will be used to represent the infrequent category.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.X ? | ArrayLike | The transformed data. |
Returns Promise<ArrayLike[]>
Defined in generated/preprocessing/OneHotEncoder.ts:312
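The decoding rules above can be sketched for dense input as follows. inverseOneHot is a hypothetical helper that uses null where this API reports undefined; dropIdx[i], if given, is the index of feature i's dropped category, which has no column of its own. Infrequent categories are not handled here.

```typescript
// Decodes dense one-hot rows back to category values.
function inverseOneHot(
  encoded: number[][],
  categories: string[][],
  dropIdx?: (number | null)[]
): (string | null)[][] {
  return encoded.map((row) => {
    const decoded: (string | null)[] = [];
    let offset = 0;
    categories.forEach((cats, i) => {
      const d = dropIdx === undefined ? null : dropIdx[i];
      const kept = d === null ? cats : cats.filter((_, k) => k !== d);
      const block = row.slice(offset, offset + kept.length);
      offset += kept.length;
      const hot = block.indexOf(1);
      if (hot !== -1) {
        decoded.push(kept[hot]);
      } else {
        // All zeros: the dropped category if there is one, otherwise unknown.
        decoded.push(d === null ? null : cats[d]);
      }
    });
    return decoded;
  });
}

const cats = [["Female", "Male"], ["1", "2", "3"]];
inverseOneHot([[0, 1, 1, 0, 0], [0, 0, 0, 0, 1]], cats);
// [["Male", "1"], [null, "3"]]
inverseOneHot([[1, 0, 0]], cats, [0, 0]);
// [["Male", "1"]]
```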
set_output()
set_output(opts): Promise<any>
Set output container.
See Introducing the set_output API for an example on how to use the API.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.transform ? | "default" \| "pandas" \| "polars" | Configure output of transform and fit_transform. |
Returns Promise<any>
Defined in generated/preprocessing/OneHotEncoder.ts:348
transform()
transform(opts): Promise<ArrayLike>
Transform X using one-hot encoding.
If sparse_output is true (the default), it returns an instance of scipy.sparse._csr.csr_matrix (CSR format).
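The CSR layout stores only the nonzero entries: their values (data), their column indices (indices), and per-row offsets (indptr), the same three arrays a scipy csr_matrix exposes. The following sketch converts a dense one-hot matrix into that layout in TypeScript; denseToCsr and the Csr interface are hypothetical, and how the wrapper serializes a sparse result back to JavaScript is not documented here.

```typescript
// Row r's entries are data.slice(indptr[r], indptr[r + 1]),
// with matching column numbers in indices.slice(indptr[r], indptr[r + 1]).
interface Csr {
  data: number[];
  indices: number[];
  indptr: number[];
  shape: [number, number];
}

function denseToCsr(dense: number[][]): Csr {
  const data: number[] = [];
  const indices: number[] = [];
  const indptr: number[] = [0];
  for (const row of dense) {
    row.forEach((value, j) => {
      if (value !== 0) {
        data.push(value);
        indices.push(j);
      }
    });
    indptr.push(data.length); // end offset of this row
  }
  return { data, indices, indptr, shape: [dense.length, dense.length > 0 ? dense[0].length : 0] };
}

const csr = denseToCsr([[0, 1, 1, 0, 0], [1, 0, 0, 0, 1], [1, 0, 0, 1, 0]]);
// csr.indices: [1, 2, 0, 4, 0, 3], csr.indptr: [0, 2, 4, 6]
```

With no dropped or unknown categories, every row stores exactly one entry per input feature, so memory grows with the number of input features rather than with the number of categories.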
If there are infrequent categories for a feature, set by specifying max_categories or min_frequency, the infrequent categories are grouped into a single category.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.X ? | ArrayLike[] | The data to encode. |
Returns Promise<ArrayLike>
Defined in generated/preprocessing/OneHotEncoder.ts:384
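The infrequent-category grouping used by transform can be sketched per feature from the training counts. This sketch assumes an integer min_frequency (scikit-learn treats a float as a fraction of n_samples), reserves one of the max_categories slots for the grouped column, and breaks ties between equal counts with a stable sort, which is an assumption of the sketch. infrequentMask and outputCategories are hypothetical helpers; the grouped column is labeled "infrequent_sklearn", as in inverse_transform.

```typescript
// Marks which categories of one feature are infrequent, given their counts.
function infrequentMask(counts: number[], minFrequency?: number, maxCategories?: number): boolean[] {
  let mask = counts.map((c) => minFrequency !== undefined && c < minFrequency);
  const nOutputs = counts.length - mask.filter((m) => m).length + 1;
  if (maxCategories !== undefined && maxCategories < nOutputs) {
    const frequentCount = maxCategories - 1; // one slot is the infrequent column
    if (frequentCount === 0) {
      mask = counts.map(() => true);
    } else {
      // Stable ascending sort by count; all but the top frequentCount are infrequent.
      const order = counts.map((_, i) => i).sort((a, b) => counts[a] - counts[b]);
      for (const i of order.slice(0, order.length - frequentCount)) mask[i] = true;
    }
  }
  return mask;
}

// Output categories: frequent ones in order, then one grouped column.
function outputCategories(categories: string[], mask: boolean[]): string[] {
  const frequent = categories.filter((_, i) => !mask[i]);
  return mask.some((m) => m) ? frequent.concat(["infrequent_sklearn"]) : frequent;
}

const cats = ["a", "b", "c", "d"];
const counts = [5, 20, 10, 3];
const byMinFrequency = infrequentMask(counts, 6); // [true, false, false, true]
outputCategories(cats, byMinFrequency); // ["b", "c", "infrequent_sklearn"]
const byMaxCategories = infrequentMask(counts, undefined, 3); // [true, false, false, true]
```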