hoi.metrics.TC

class hoi.metrics.TC(x, y=None, multiplets=None, verbose=None)

Total correlation.

Total correlation is the oldest extension of mutual information to an arbitrary number of variables. It is defined as:

\[\begin{split}TC(X^{n}) &= \sum_{j=1}^{n} H(X_{j}) - H(X^{n}) \\\end{split}\]

The total correlation is equivalent to the Kullback-Leibler divergence between the joint distribution \(P(X)\) and the product of the marginals. The total correlation is largely a measure of redundancy, sensitive to information shared between single elements.
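The definition and its Kullback-Leibler form can be checked numerically on small discrete distributions. The following is a minimal standard-library sketch, not the estimator used by hoi; the helper names (`entropy`, `marginal`, `total_correlation`, `kl_to_product_of_marginals`) are hypothetical and exist only for this illustration.

```python
import math
from itertools import product


def entropy(p):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def marginal(joint, j):
    """Marginal distribution of the j-th variable of a joint distribution."""
    out = {}
    for outcome, q in joint.items():
        out[outcome[j]] = out.get(outcome[j], 0.0) + q
    return out


def total_correlation(joint):
    """TC = sum of the marginal entropies minus the joint entropy."""
    n = len(next(iter(joint)))
    return sum(entropy(marginal(joint, j)) for j in range(n)) - entropy(joint)


def kl_to_product_of_marginals(joint):
    """KL divergence (bits) between the joint and the product of its marginals."""
    n = len(next(iter(joint)))
    margs = [marginal(joint, j) for j in range(n)]
    kl = 0.0
    for outcome, q in joint.items():
        if q > 0:
            prod = math.prod(margs[j][outcome[j]] for j in range(n))
            kl += q * math.log2(q / prod)
    return kl


# Three perfectly redundant binary variables: X1 = X2 = X3.
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# Three independent fair coins.
independent = {o: 1 / 8 for o in product((0, 1), repeat=3)}

tc_copies = total_correlation(copies)
tc_independent = total_correlation(independent)
```

For the redundant triplet the sum of marginal entropies is 3 bits and the joint entropy is 1 bit, so `tc_copies` is 2 bits and matches `kl_to_product_of_marginals(copies)`; for independent variables both quantities are zero.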

Parameters:
x : array_like

Standard NumPy arrays of shape (n_samples, n_features) or (n_samples, n_features, n_variables)

y : array_like

The feature of shape (n_samples,) for estimating task-related TC

multiplets : list | None

List of multiplets to compute, given as tuples of feature indices, for example [(0, 1, 2), (2, 7, 8, 9)]. By default, all multiplets are computed.

References

Watanabe, 1960 [26]

Attributes:
entropies

Entropies of shape (n_mult,)

multiplets

Indices of the multiplets of shape (n_mult, maxsize).

order

Order of each multiplet of shape (n_mult,).

undersampling

Under-sampling threshold.

Methods

compute_entropies([method, minsize, ...])

Compute entropies for all multiplets.

fit([minsize, maxsize, method])

Compute the Total correlation.

get_combinations(minsize[, maxsize, astype])

Get combinations of features.

__iter__()

Iteration over orders.

compute_entropies(method='gcmi', minsize=1, maxsize=None, fill_value=-1, **kwargs)

Compute entropies for all multiplets.

Parameters:
method : {‘gcmi’, ‘binning’, ‘knn’, ‘kernel’}

Name of the method used to compute entropy: one of ‘gcmi’ (default), ‘binning’, ‘knn’ or ‘kernel’.

minsize : int, optional

Minimum size of the multiplets. Default is 1.

maxsize : int, optional

Maximum size of the multiplets. Default is None.

fill_value : int, optional

Value to fill the multiplet indices with. Default is -1.

kwargs : dict, optional

Additional arguments to pass to the entropy function.

Returns:
h_x : array_like

Entropies of shape (n_mult, n_variables)

h_idx : array_like

Indices of the multiplets of shape (n_mult, maxsize)

order : array_like

Order of each multiplet of shape (n_mult,)

property entropies

Entropies of shape (n_mult,)

fit(minsize=2, maxsize=None, method='gcmi', **kwargs)

Compute the Total correlation.

Parameters:
minsize, maxsize : int | 2, None

Minimum and maximum size of the multiplets

method : {‘gcmi’, ‘binning’, ‘knn’, ‘kernel’}

Name of the method used to compute entropy: one of ‘gcmi’ (default), ‘binning’, ‘knn’ or ‘kernel’.

kwargs : dict | {}

Additional arguments are sent to each entropy function

Returns:
hoi : array_like

The NumPy array containing values of higher-order interactions of shape (n_multiplets, n_variables)
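A typical call sequence, based only on the signatures documented on this page, might look like the sketch below. The wrapper `tc_on_random_data` and the synthetic data are hypothetical and serve only as an illustration; the imports are placed inside the function so the snippet can be loaded even where hoi is not installed.

```python
def tc_on_random_data(n_samples=200, n_features=5, seed=0):
    """Fit TC on synthetic data and return (hoi values, multiplets, orders)."""
    # Lazy imports: NumPy and hoi are only needed when the function is called.
    import numpy as np
    from hoi.metrics import TC

    rng = np.random.default_rng(seed)
    # Data of shape (n_samples, n_features), as expected for `x`.
    x = rng.standard_normal((n_samples, n_features))
    # Make feature 1 a noisy copy of feature 0 to inject some redundancy.
    x[:, 1] = x[:, 0] + 0.1 * rng.standard_normal(n_samples)

    model = TC(x)
    # Multiplets of size 2 and 3, entropies estimated with the default method.
    hoi = model.fit(minsize=2, maxsize=3, method="gcmi")
    return hoi, model.multiplets, model.order
```

According to the Returns description above, `hoi` should contain one row per multiplet, and `model.multiplets` and `model.order` identify which features each row refers to. Multiplets containing both feature 0 and feature 1 would be expected to show the largest values, since total correlation is sensitive to redundancy.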

get_combinations(minsize, maxsize=None, astype='jax')

Get combinations of features.

Parameters:
minsize : int

Minimum size of the multiplets

maxsize : int | None

Maximum size of the multiplets. If None, minsize is used.

astype : {‘jax’, ‘numpy’, ‘iterator’}

Specify the output type. Use either ‘jax’ to get the data as a JAX array [default], ‘numpy’ for a NumPy array or ‘iterator’.

Returns:
combinations : array_like

Combinations of features.

property multiplets

Indices of the multiplets of shape (n_mult, maxsize).

By convention, we use -1 to indicate that a feature has been ignored.
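The padding convention can be illustrated with a plain-Python sketch. This is not hoi's implementation: `padded_combinations` is a hypothetical helper that enumerates feature combinations with `itertools.combinations` and pads the shorter ones with the fill value so that every row has length maxsize.

```python
from itertools import combinations


def padded_combinations(n_features, minsize, maxsize=None, fill_value=-1):
    """Enumerate feature combinations, padding short ones to length maxsize."""
    if maxsize is None:
        maxsize = minsize  # same fallback as documented for get_combinations
    rows, orders = [], []
    for size in range(minsize, maxsize + 1):
        for combo in combinations(range(n_features), size):
            # Pad with fill_value so that all rows share the same length.
            rows.append(list(combo) + [fill_value] * (maxsize - size))
            orders.append(size)
    return rows, orders


rows, orders = padded_combinations(n_features=4, minsize=2, maxsize=3)
```

With 4 features and sizes 2 to 3 this gives 10 rows: the pairs come first and are padded, for example `[0, 1, -1]`, followed by the triplets such as `[1, 2, 3]`, while `orders` records the true size of each row.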

property order

Order of each multiplet of shape (n_mult,).

property undersampling

Under-sampling threshold.