Class: DictVectorizer
Transforms lists of feature-value mappings to vectors.
This transformer turns lists of mappings (dict-like objects) of feature names to feature values into NumPy arrays or scipy.sparse matrices for use with scikit-learn estimators.
When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on. For instance, a feature “f” that can take on the values “ham” and “spam” will become two features in the output, one signifying “f=ham”, the other “f=spam”.
If a feature value is a sequence or set of strings, this transformer will iterate over the values and will count the occurrences of each string value.
However, note that this transformer will only do a binary one-hot encoding when feature values are of type string. If categorical features are represented as numeric values such as int or iterables of strings, the DictVectorizer can be followed by OneHotEncoder to complete binary one-hot encoding.
Features that do not occur in a sample (mapping) will have a zero value in the resulting array/matrix.
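The encoding rules described above can be sketched in plain TypeScript. This is only an illustration of the behaviour, not the scikit-learn implementation and not part of this class's API: the names `SketchValue`, `SketchSample`, `columnsFor` and `sketchFitTransform` are hypothetical, output is dense only, and column names are sorted with JavaScript's default string ordering.

```typescript
// Illustrative sketch only; not the scikit-learn implementation.
type SketchValue = string | number | string[];
type SketchSample = Record<string, SketchValue>;

// Names of the output columns that one (key, value) pair contributes to.
function columnsFor(key: string, value: SketchValue, separator: string): string[] {
  if (typeof value === "number") return [key]; // numeric: one column named after the key
  const parts = typeof value === "string" ? [value] : value;
  return parts.map((part) => key + separator + part); // e.g. "f=ham"
}

function sketchFitTransform(
  samples: SketchSample[],
  separator: string = "=",
): { featureNames: string[]; rows: number[][] } {
  // "fit": collect every column name seen in the data, then sort them.
  const seen = new Set<string>();
  for (const sample of samples) {
    for (const key of Object.keys(sample)) {
      const value = sample[key];
      if (value === undefined) continue;
      columnsFor(key, value, separator).forEach((column) => seen.add(column));
    }
  }
  const featureNames = Array.from(seen).sort();
  const columnIndex = new Map<string, number>();
  featureNames.forEach((featureName, i) => columnIndex.set(featureName, i));

  // "transform": every row starts at zero, so absent features stay 0.
  const rows = samples.map((sample) => {
    const row: number[] = featureNames.map(() => 0);
    for (const key of Object.keys(sample)) {
      const value = sample[key];
      if (value === undefined) continue;
      for (const column of columnsFor(key, value, separator)) {
        const j = columnIndex.get(column);
        if (j === undefined) continue;
        // Numbers are copied; each string occurrence adds 1, so repeats are counted.
        row[j] = typeof value === "number" ? value : (row[j] ?? 0) + 1;
      }
    }
    return row;
  });
  return { featureNames, rows };
}

const sketchResult = sketchFitTransform([
  { f: "ham", n: 2 },
  { f: "spam", tags: ["a", "b", "a"] },
]);
// sketchResult.featureNames -> ["f=ham", "f=spam", "n", "tags=a", "tags=b"]
// sketchResult.rows         -> [[1, 0, 2, 0, 0], [0, 1, 0, 2, 1]]
```

In the second sample, "a" appears twice in `tags`, so the `tags=a` column holds a count of 2 rather than a boolean flag.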
For an efficiency comparison of the different feature extractors, see FeatureHasher and DictVectorizer Comparison.
Read more in the User Guide.
Constructors
new DictVectorizer()
new DictVectorizer(opts?): DictVectorizer
Parameters
Parameter | Type | Description |
---|---|---|
opts ? | object | - |
opts.dtype ? | any | The type of feature values. Passed to NumPy array/scipy.sparse matrix constructors as the dtype argument. |
opts.separator ? | string | Separator string used when constructing new features for one-hot coding. |
opts.sort ? | boolean | Whether feature_names_ and vocabulary_ should be sorted when fitting. |
opts.sparse ? | boolean | Whether transform should produce scipy.sparse matrices. |
Returns DictVectorizer
Defined in generated/feature_extraction/DictVectorizer.ts:35
Properties
Property | Type | Default value | Defined in |
---|---|---|---|
_isDisposed | boolean | false | generated/feature_extraction/DictVectorizer.ts:33 |
_isInitialized | boolean | false | generated/feature_extraction/DictVectorizer.ts:32 |
_py | PythonBridge | undefined | generated/feature_extraction/DictVectorizer.ts:31 |
id | string | undefined | generated/feature_extraction/DictVectorizer.ts:28 |
opts | any | undefined | generated/feature_extraction/DictVectorizer.ts:29 |
Accessors
feature_names_
Get Signature
get feature_names_(): Promise<any[]>
A list of length n_features containing the feature names (e.g., “f=ham” and “f=spam”).
Returns Promise<any[]>
Defined in generated/feature_extraction/DictVectorizer.ts:496
py
Get Signature
get py(): PythonBridge
Returns PythonBridge
Set Signature
set py(pythonBridge): void
Parameters
Parameter | Type |
---|---|
pythonBridge | PythonBridge |
Returns void
Defined in generated/feature_extraction/DictVectorizer.ts:66
vocabulary_
Get Signature
get vocabulary_(): Promise<any>
A dictionary mapping feature names to feature indices.
Returns Promise<any>
Defined in generated/feature_extraction/DictVectorizer.ts:471
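As a rough illustration of how the two fitted attributes relate, the sketch below derives a name-to-index mapping from an ordered list of feature names. The helper `buildVocabularySketch` is hypothetical and only mirrors the descriptions above; the real accessors return whatever the Python side produces.

```typescript
// Hypothetical helper: a vocabulary maps each feature name to its column index.
function buildVocabularySketch(featureNames: string[]): Record<string, number> {
  const vocabulary: Record<string, number> = {};
  featureNames.forEach((featureName, column) => {
    vocabulary[featureName] = column;
  });
  return vocabulary;
}

const vocabularySketch = buildVocabularySketch(["f=ham", "f=spam", "n"]);
// vocabularySketch -> { "f=ham": 0, "f=spam": 1, "n": 2 }
```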
Methods
dispose()
dispose(): Promise<void>
Disposes of the underlying Python resources.
Once dispose() is called, the instance is no longer usable.
Returns Promise<void>
Defined in generated/feature_extraction/DictVectorizer.ts:118
fit()
fit(opts): Promise<any>
Learn a list of feature name -> indices mappings.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.X ? | any | Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype). |
opts.y ? | any | Ignored parameter. |
Returns Promise<any>
Defined in generated/feature_extraction/DictVectorizer.ts:135
fit_transform()
fit_transform(opts): Promise<any>
Learn a list of feature name -> indices mappings and transform X.
Like fit(X) followed by transform(X), but does not require materializing X in memory.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.X ? | any | Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype). |
opts.y ? | any | Ignored parameter. |
Returns Promise<any>
Defined in generated/feature_extraction/DictVectorizer.ts:174
get_feature_names_out()
get_feature_names_out(opts): Promise<any>
Get output feature names for transformation.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.input_features ? | any | Not used, present here for API consistency by convention. |
Returns Promise<any>
Defined in generated/feature_extraction/DictVectorizer.ts:211
get_metadata_routing()
get_metadata_routing(opts): Promise<any>
Get metadata routing of this object.
See the User Guide for how the routing mechanism works.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.routing ? | any | A MetadataRequest encapsulating routing information. |
Returns Promise<any>
Defined in generated/feature_extraction/DictVectorizer.ts:247
init()
init(py): Promise<void>
Initializes the underlying Python resources.
This instance is not usable until the Promise returned by init() resolves.
Parameters
Parameter | Type |
---|---|
py | PythonBridge |
Returns Promise<void>
Defined in generated/feature_extraction/DictVectorizer.ts:79
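Because an instance is unusable before init() resolves and after dispose() is called, it can help to wrap the whole lifecycle in one helper. The sketch below is one possible pattern, not something this library provides: `VectorizerLifecycle` and `fitTransformOnce` are hypothetical names, the interface lists only the three methods documented on this page, and the demonstration uses a stand-in object because a real DictVectorizer needs a live PythonBridge.

```typescript
// Hypothetical structural type: only the three methods this sketch relies on.
interface VectorizerLifecycle<Bridge> {
  init(py: Bridge): Promise<void>;
  fit_transform(opts: { X?: any; y?: any }): Promise<any>;
  dispose(): Promise<void>;
}

// Initialise, run fit_transform once, and always release the Python resources.
async function fitTransformOnce<Bridge>(
  vectorizer: VectorizerLifecycle<Bridge>,
  py: Bridge,
  X: Array<Record<string, unknown>>,
): Promise<any> {
  await vectorizer.init(py);
  try {
    return await vectorizer.fit_transform({ X });
  } finally {
    // Runs even if fit_transform rejects.
    await vectorizer.dispose();
  }
}

// Stand-in object for illustration only; it just records the call order.
const lifecycleCalls: string[] = [];
const fakeVectorizer: VectorizerLifecycle<string> = {
  async init() {
    lifecycleCalls.push("init");
  },
  async fit_transform(opts) {
    lifecycleCalls.push("fit_transform");
    return opts.X.length;
  },
  async dispose() {
    lifecycleCalls.push("dispose");
  },
};

const lifecycleResult = fitTransformOnce(fakeVectorizer, "fake-bridge", [{ f: "ham" }]);
```

The try/finally block is the design choice here: the helper disposes of the instance whether or not the transformation succeeds, so the caller cannot accidentally reuse a disposed instance or leak the underlying resources.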
inverse_transform()
inverse_transform(opts): Promise<any[]>
Transform array or sparse matrix X back to feature mappings.
X must have been produced by this DictVectorizer’s transform or fit_transform method; it may only have passed through transformers that preserve the number of features and their order.
In the case of one-hot/one-of-K coding, the constructed feature names and values are returned rather than the original ones.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.dict_type ? | any | Constructor for feature mappings. Must conform to the collections.abc.Mapping API. |
opts.X ? | ArrayLike | Sample matrix. |
Returns Promise<any[]>
Defined in generated/feature_extraction/DictVectorizer.ts:285
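The note about constructed feature names can be illustrated with a small dense-only sketch. `sketchInverseTransform` is a hypothetical helper, not the library's code; it assumes each row is turned back into a mapping of its non-zero columns, keyed by the column's feature name.

```typescript
// Hypothetical helper: rebuild one mapping per row from the non-zero columns.
function sketchInverseTransform(
  featureNames: string[],
  rows: number[][],
): Array<Record<string, number>> {
  return rows.map((row) => {
    const mapping: Record<string, number> = {};
    row.forEach((value, column) => {
      const featureName = featureNames[column];
      if (value !== 0 && featureName !== undefined) {
        mapping[featureName] = value;
      }
    });
    return mapping;
  });
}

const restoredSketch = sketchInverseTransform(
  ["f=ham", "f=spam", "n"],
  [
    [1, 0, 2],
    [0, 1, 0],
  ],
);
// restoredSketch -> [{ "f=ham": 1, "n": 2 }, { "f=spam": 1 }]
```

The first row comes back as `{ "f=ham": 1, "n": 2 }` rather than `{ f: "ham", n: 2 }`: the one-hot column name and its value are returned, not the original string feature.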
restrict()
restrict(opts): Promise<any>
Restrict the features to those in support using feature selection.
This function modifies the estimator in-place.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.indices ? | boolean | Whether support is a list of indices. |
opts.support ? | ArrayLike | Boolean mask or list of indices (as returned by the get_support member of feature selectors). |
Returns Promise<any>
Defined in generated/feature_extraction/DictVectorizer.ts:326
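A minimal sketch of what restricting to a support might look like, under the assumption that the kept features are renumbered from zero in the order given by the support. `sketchRestrict` is a hypothetical helper that returns new values instead of modifying an estimator in place.

```typescript
// Hypothetical helper: keep only the supported features and renumber them.
function sketchRestrict(
  featureNames: string[],
  support: boolean[] | number[],
  indices: boolean = false,
): { featureNames: string[]; vocabulary: Record<string, number> } {
  const entries: Array<boolean | number> = support;
  const kept: number[] = [];
  entries.forEach((entry, position) => {
    if (indices) {
      kept.push(entry as number); // support already lists column indices
    } else if (entry) {
      kept.push(position); // support is a boolean mask over the columns
    }
  });

  const restrictedNames: string[] = [];
  const vocabulary: Record<string, number> = {};
  for (const column of kept) {
    const featureName = featureNames[column];
    if (featureName === undefined) continue;
    vocabulary[featureName] = restrictedNames.length;
    restrictedNames.push(featureName);
  }
  return { featureNames: restrictedNames, vocabulary };
}

const maskedSketch = sketchRestrict(["f=ham", "f=spam", "n"], [true, false, true]);
// maskedSketch.featureNames -> ["f=ham", "n"]
const indexedSketch = sketchRestrict(["f=ham", "f=spam", "n"], [2, 0], true);
// indexedSketch.featureNames -> ["n", "f=ham"]
```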
set_inverse_transform_request()
set_inverse_transform_request(opts): Promise<any>
Request metadata passed to the inverse_transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.dict_type ? | string \| boolean | Metadata routing for dict_type parameter in inverse_transform. |
Returns Promise<any>
Defined in generated/feature_extraction/DictVectorizer.ts:369
set_output()
set_output(opts): Promise<any>
Set output container.
See Introducing the set_output API for an example on how to use the API.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.transform ? | "default" \| "pandas" \| "polars" | Configure output of transform and fit_transform. |
Returns Promise<any>
Defined in generated/feature_extraction/DictVectorizer.ts:405
transform()
transform(opts): Promise<any>
Transform feature->value dicts to array or sparse matrix.
Named features not encountered during fit or fit_transform will be silently ignored.
Parameters
Parameter | Type | Description |
---|---|---|
opts | object | - |
opts.X ? | any[] | Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype). |
Returns Promise<any>
Defined in generated/feature_extraction/DictVectorizer.ts:439
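To make the "silently ignored" behaviour concrete, here is a small sketch that transforms samples against an already fitted vocabulary. `sketchTransform` is a hypothetical helper handling only string and numeric values with dense output; it is not how the library itself is implemented.

```typescript
// Hypothetical helper: encode samples against a fixed vocabulary.
function sketchTransform(
  vocabulary: Map<string, number>,
  samples: Array<Record<string, string | number>>,
  separator: string = "=",
): number[][] {
  return samples.map((sample) => {
    const row: number[] = [];
    for (let i = 0; i < vocabulary.size; i++) row.push(0);
    for (const key of Object.keys(sample)) {
      const value = sample[key];
      if (value === undefined) continue;
      const featureName = typeof value === "string" ? key + separator + value : key;
      const column = vocabulary.get(featureName);
      if (column === undefined) continue; // not seen during fit: skipped without error
      row[column] = typeof value === "string" ? 1 : value;
    }
    return row;
  });
}

const fittedVocabulary = new Map<string, number>([
  ["f=ham", 0],
  ["f=spam", 1],
  ["n", 2],
]);
const transformedSketch = sketchTransform(fittedVocabulary, [
  { f: "ham", n: 3 },
  { f: "eggs", unseen: 7 },
]);
// transformedSketch -> [[1, 0, 3], [0, 0, 0]]
```

Neither "f=eggs" nor "unseen" is in the vocabulary, so the second row is all zeros and no error is raised.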