hoi.metrics.DTC
- class hoi.metrics.DTC(x, y=None, multiplets=None, verbose=None)
Dual total correlation.
Dual total correlation is another extension of mutual information to an arbitrary number of variables. It can be understood as the difference between the total entropy in a set of variables \(X^{n}\) and the sum, over elements \(X_{j}\), of the entropy that is intrinsic to each element and not shared with any other part. It is sensitive to both shared redundancies and synergies. Writing \(X_{-j}^{n}\) for all variables except \(X_{j}\), it is defined as:
\[\begin{split}DTC(X^{n}) &= H(X^{n}) - \sum_{j=1}^{n} H(X_j|X_{-j}^{n}) \\ &= \sum_{j=1}^{n} H(X_{-j}^{n}) - (n-1)H(X^{n})\end{split}\]
- Parameters:
- x : array_like
Standard NumPy array of shape (n_samples, n_features) or (n_samples, n_features, n_variables).
- y : array_like
The feature of shape (n_samples,) for estimating task-related DTC.
- multiplets : list | None
List of multiplets to compute, for example [(0, 1, 2), (2, 7, 8, 9)]. By default, all multiplets are computed.
References
Te Sun, 1978 [23]
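The definition can be checked numerically on small discrete examples. The sketch below is not hoi's implementation: it uses a plug-in (empirical frequency) entropy estimate in plain NumPy, and the helper names `entropy_bits`, `dtc_conditional` and `dtc_leave_one_out` are hypothetical. It relies on the chain-rule identity H(X_j | X_-j) = H(X) - H(X_-j) to evaluate DTC in two equivalent ways.

```python
import numpy as np


def entropy_bits(x):
    """Plug-in entropy (in bits) of the rows of a discrete array."""
    x = np.asarray(x)
    if x.ndim == 1:
        x = x[:, None]
    # Count how often each distinct row (joint state) occurs.
    _, counts = np.unique(x, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def dtc_conditional(x):
    """DTC as H(X) - sum_j H(X_j | X_-j), using H(A | B) = H(A, B) - H(B)."""
    x = np.asarray(x)
    h_joint = entropy_bits(x)
    h_cond = [
        h_joint - entropy_bits(np.delete(x, j, axis=1))
        for j in range(x.shape[1])
    ]
    return h_joint - sum(h_cond)


def dtc_leave_one_out(x):
    """DTC as sum_j H(X_-j) - (n - 1) H(X)."""
    x = np.asarray(x)
    n = x.shape[1]
    h_rest = [entropy_bits(np.delete(x, j, axis=1)) for j in range(n)]
    return sum(h_rest) - (n - 1) * entropy_bits(x)


# Three binary variables where the third is the XOR of the first two;
# the four rows are the equiprobable joint states.
xor = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])

# Three independent fair bits: all eight joint states, equiprobable.
independent = np.array(
    [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
)

print(dtc_conditional(xor), dtc_leave_one_out(xor))
print(dtc_conditional(independent), dtc_leave_one_out(independent))
```

Both forms give 2 bits for the XOR triplet and 0 bits for the independent bits, which is the behaviour expected from the definition.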
- Attributes:
entropies
Entropies of shape (n_mult,)
multiplets
Indices of the multiplets of shape (n_mult, maxsize).
order
Order of each multiplet of shape (n_mult,).
undersampling
Under-sampling threshold.
Methods
compute_entropies([method, minsize, ...])
Compute entropies for all multiplets.
fit([minsize, maxsize, method])
Compute the dual total correlation.
get_combinations(minsize[, maxsize, astype])
Get combinations of features.
- __iter__()
Iteration over orders.
- compute_entropies(method='gcmi', minsize=1, maxsize=None, fill_value=-1, **kwargs)
Compute entropies for all multiplets.
- Parameters:
- method : {‘gcmi’, ‘binning’, ‘knn’, ‘kernel’}
Name of the method to compute entropy. Use either:
‘gcmi’: Gaussian copula entropy [default]. See hoi.core.entropy_gcmi()
‘binning’: binning-based estimator of entropy. Note that to use this estimator, the data have to be discretized. See hoi.core.entropy_bin()
‘knn’: k-nearest neighbor estimator. See hoi.core.entropy_knn()
‘kernel’: kernel-based estimator of entropy. See hoi.core.entropy_kernel()
- minsize : int, optional
Minimum size of the multiplets. Default is 1.
- maxsize : int, optional
Maximum size of the multiplets. Default is None.
- fill_value : int, optional
Value to fill the multiplet indices with. Default is -1.
- kwargs : dict, optional
Additional arguments to pass to the entropy function.
- Returns:
- h_x : array_like
Entropies of shape (n_mult, n_variables)
- h_idx : array_like
Indices of the multiplets of shape (n_mult, maxsize)
- order : array_like
Order of each multiplet of shape (n_mult,)
- property entropies
Entropies of shape (n_mult,)
- fit(minsize=2, maxsize=None, method='gcmi', **kwargs)
Compute the dual total correlation.
- Parameters:
- minsize, maxsize : int | 2, None
Minimum and maximum size of the multiplets.
- method : {‘gcmi’, ‘binning’, ‘knn’, ‘kernel’}
Name of the method to compute entropy. Use either:
‘gcmi’: Gaussian copula entropy [default]. See hoi.core.entropy_gcmi()
‘binning’: binning-based estimator of entropy. Note that to use this estimator, the data have to be discretized. See hoi.core.entropy_bin()
‘knn’: k-nearest neighbor estimator. See hoi.core.entropy_knn()
‘kernel’: kernel-based estimator of entropy. See hoi.core.entropy_kernel()
- kwargs : dict | {}
Additional arguments passed to each entropy function.
- Returns:
- hoi : array_like
NumPy array containing the values of higher-order interactions, of shape (n_multiplets, n_variables).
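To illustrate how an entropy estimator plugs into the DTC computation, here is a minimal sketch under the assumption that the variables are jointly Gaussian, in which case entropy has a closed form in terms of the covariance matrix. This is not hoi's code and is only loosely related to the ‘gcmi’ option: it omits the copula transform and any bias correction. The function names are hypothetical, and DTC is evaluated as the sum over j of H(X_-j) minus (n - 1) H(X), which follows from H(X_j | X_-j) = H(X) - H(X_-j).

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance `cov`."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    k = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * (k * np.log(2 * np.pi * np.e) + logdet)


def gaussian_dtc(cov):
    """DTC (nats) of jointly Gaussian variables with covariance `cov`."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    h_joint = gaussian_entropy(cov)
    h_rest = []
    for j in range(n):
        # Covariance of all variables except the j-th one.
        keep = [i for i in range(n) if i != j]
        h_rest.append(gaussian_entropy(cov[np.ix_(keep, keep)]))
    return sum(h_rest) - (n - 1) * h_joint


# Independent variables: DTC is zero up to floating-point error.
dtc_independent = gaussian_dtc(np.eye(3))

# Two variables with correlation r: DTC reduces to the mutual
# information, -0.5 * log(1 - r**2).
r = 0.5
dtc_pair = gaussian_dtc([[1.0, r], [r, 1.0]])
```

With sampled data of shape (n_samples, n_features), the same function could be applied to `np.cov(x, rowvar=False)`, keeping in mind that the estimate is only meaningful to the extent the Gaussian assumption holds.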
- get_combinations(minsize, maxsize=None, astype='jax')
Get combinations of features.
- Parameters:
- minsize : int
Minimum size of the multiplets.
- maxsize : int | None
Maximum size of the multiplets. If None, minsize is used.
- astype : {‘jax’, ‘numpy’, ‘iterator’}
Specify the output type. Use ‘jax’ to get the data as a JAX array [default], ‘numpy’ for a NumPy array, or ‘iterator’ for an iterator.
- Returns:
- combinations : array_like
Combinations of features.
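As a rough picture of what enumerating feature combinations between minsize and maxsize involves, the following standard-library sketch may help. It is an illustration only, not the library's implementation; `feature_combinations` and its `n_features` argument are hypothetical, and the fallback of maxsize to minsize mirrors the behaviour documented above.

```python
from itertools import combinations
from math import comb


def feature_combinations(n_features, minsize, maxsize=None):
    """Yield index tuples of every size from minsize to maxsize inclusive."""
    if maxsize is None:
        maxsize = minsize  # documented default: only multiplets of minsize
    for size in range(minsize, maxsize + 1):
        yield from combinations(range(n_features), size)


# All pairs and triplets among 4 features.
combos = list(feature_combinations(4, minsize=2, maxsize=3))

# The count matches the binomial coefficients C(4, 2) + C(4, 3).
expected_count = comb(4, 2) + comb(4, 3)

# With maxsize left to None, only multiplets of size minsize are produced.
pairs_only = list(feature_combinations(4, minsize=2))
```

The number of multiplets grows combinatorially with the number of features and with maxsize, which is one reason to bound maxsize on larger datasets.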
- property multiplets
Indices of the multiplets of shape (n_mult, maxsize).
By convention, -1 indicates that a feature has been ignored.
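The padding convention can be pictured with a small NumPy sketch. It is an assumption about the layout based on the description here, not code taken from the library, and `pad_multiplets` is a hypothetical helper: multiplets of different sizes are stored as rows of one rectangular array, with unused slots set to -1.

```python
import numpy as np


def pad_multiplets(multiplets, fill_value=-1):
    """Stack multiplets of different sizes into one (n_mult, maxsize) array."""
    maxsize = max(len(m) for m in multiplets)
    out = np.full((len(multiplets), maxsize), fill_value, dtype=int)
    for row, m in enumerate(multiplets):
        out[row, :len(m)] = m  # remaining slots keep the fill value
    return out


padded = pad_multiplets([(0, 1, 2), (2, 7, 8, 9)])

# The order of each multiplet is the number of entries that are not padding.
order = (padded != -1).sum(axis=1)
```

Here `padded` has shape (2, 4), its first row ends with a single -1, and `order` is `[3, 4]`.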
- property order
Order of each multiplet of shape (n_mult,).
- property undersampling
Under-sampling threshold.