API Reference


class mlsim.bias.Demographic(rho_a=0.5, rho_z=0.5)

Base class for sampling demographics (a = protected attribute, z = true target value).

ParamCreator

alias of DemParams

get_rho_a()

Get P(A=1).

Returns:

rho_a – probability of being in the disadvantaged group, A=1

Return type:

float

get_rho_z()

Return P(Z=1|A).

Returns:

rho_z – probability of the favorable outcome (Z=1) for A=0 and A=1, in that order

Return type:

NumPy array of floats

sample(N)

Sample P(A,Z) = P(Z|A)P(A).

Parameters:

N (integer) – number of samples to return

Returns:

a_z_tuple – a tuple of length 2 with elements a and z as column NumPy arrays, each of length N

Return type:

Tuple
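The factorization P(A,Z) = P(Z|A)P(A) used by sample(N) can be illustrated with a minimal standalone sketch. This is not the mlsim implementation: the function name sample_demographics and the seed argument are hypothetical, and plain Python lists stand in for the column NumPy arrays so that only the standard library is needed.

```python
import random


def sample_demographics(n, rho_a=0.5, rho_z=(0.5, 0.5), seed=None):
    """Draw n pairs (a, z) from P(A, Z) = P(Z | A) P(A).

    rho_a is P(A=1); rho_z[k] is P(Z=1 | A=k) for k = 0, 1.
    Hypothetical helper for illustration only, not part of mlsim.
    """
    rng = random.Random(seed)
    # First draw the protected attribute from P(A).
    a = [1 if rng.random() < rho_a else 0 for _ in range(n)]
    # Then draw the true target from P(Z | A), using each row's group rate.
    z = [1 if rng.random() < rho_z[a_i] else 0 for a_i in a]
    return a, z


a, z = sample_demographics(1000, rho_a=0.5, rho_z=(0.5, 0.3), seed=0)
```

Passing the same rate twice in rho_z makes Z independent of A; passing two different rates makes them correlated.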

class mlsim.bias.DemographicCorrelated(rho_a=0.5, rho_z=[0.5, 0.3])

ParamCreator

alias of DemParams

get_rho_a()

Get P(A=1).

Returns:

rho_a – probability of being in the disadvantaged group, A=1

Return type:

float

get_rho_z()

Return P(Z=1|A).

Returns:

rho_z – probability of the favorable outcome (Z=1) for A=0 and A=1, in that order

Return type:

NumPy array of floats

sample(N)

Sample P(A,Z) = P(Z|A)P(A).

Parameters:

N (integer) – number of samples to return

Returns:

a_z_tuple – a tuple of length 2 with elements a and z as column NumPy arrays, each of length N

Return type:

Tuple

class mlsim.bias.DemographicIndependent(rho_a=0.2, rho_z=0.1)

ParamCreator

alias of DemParams

get_rho_a()

Get P(A=1).

Returns:

rho_a – probability of being in the disadvantaged group, A=1

Return type:

float

get_rho_z()

Return P(Z=1|A).

Returns:

rho_z – probability of the favorable outcome (Z=1) for A=0 and A=1, in that order

Return type:

NumPy array of floats

sample(N)

Sample P(A,Z) = P(Z|A)P(A).

Parameters:

N (integer) – number of samples to return

Returns:

a_z_tuple – a tuple of length 2 with elements a and z as column NumPy arrays, each of length N

Return type:

Tuple

class mlsim.bias.Feature(dist=<function <lambda>>, mu=[[5, 2], [2, 5]], param_tuple=None, N_a=2)

Base class for all feature samplers P(X|A,Z,Y). By default, it creates two-dimensional features with parameters shared across groups and good separability of classes.

dist

function to sample X given parameters, where the parameters depend on Z, A, and Y

Type:

function handle

theta

parameters of dist, one per value of z, a, y

Type:

list-like or list of tuples

ParamCreator

alias of FeatureParams

sample(a, z, y)

Sample P(X|A,Z,Y) using the distribution and parameters initialized for each a, z, y. The vectors a, z, and y must have the same shape.

Parameters:
  • a (list-like, length n) – demographic variables

  • z (list-like, length n) – true target

  • y (list-like, length n) – proxy target

Returns:

x – features, same shape as a, z, y

Return type:

list-like, length n
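A minimal sketch of class-conditional feature sampling, assuming a Gaussian dist whose mean is selected by z alone. The default mu matches the class signature above; the helper name sample_features, the sigma and seed arguments, and the choice to index mu by z are illustrative assumptions, not mlsim's actual behavior.

```python
import random


def sample_features(a, z, y, mu=((5, 2), (2, 5)), sigma=1.0, seed=None):
    """Draw one 2-D feature vector per row from a Gaussian centred on mu[z].

    Assumption: only z selects the parameters here; a and y are accepted
    to mirror the sample(a, z, y) signature but are otherwise unused.
    """
    if not (len(a) == len(z) == len(y)):
        raise ValueError("a, z, and y must have the same length")
    rng = random.Random(seed)
    # One independent Gaussian draw per feature dimension, per row.
    return [[rng.gauss(m, sigma) for m in mu[z_i]] for z_i in z]


x = sample_features(a=[0, 1, 0, 1], z=[0, 0, 1, 1], y=[0, 0, 1, 1], seed=0)
```

With the default means, rows with z=0 cluster around (5, 2) and rows with z=1 around (2, 5), which is what gives the two classes good separability.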

class mlsim.bias.FeatureMeasurementQualityProxy(dist, loc, spread)

The measurement locations vary with the true target value z, and the measurement spreads vary with the measured target value y, allowing error to be present in both the features and the measurements. The parameters may also vary with the protected attribute.

ParamCreator

alias of FeatureParams

sample(a, z, y)

Sample P(X|A,Z,Y) using the distribution and parameters initialized for each a, z, y. The vectors a, z, and y must have the same shape.

Parameters:
  • a (list-like, length n) – demographic variables

  • z (list-like, length n) – true target

  • y (list-like, length n) – proxy target

Returns:

x – features, same shape as a, z, y

Return type:

list-like, length n
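The idea described for FeatureMeasurementQualityProxy, a location chosen by z and a spread chosen by y, can be sketched for a single scalar feature as follows. The helper name, the example loc and spread values, and the Gaussian choice are all assumptions made for illustration; the dependence on the protected attribute is left out for brevity.

```python
import random


def sample_proxy_quality_feature(z, y, loc=(0.0, 3.0), spread=(0.5, 2.0), seed=None):
    """Draw one scalar feature per row from N(loc[z], spread[y]).

    Hypothetical helper: the mean follows the true target z, while the
    standard deviation follows the measured (proxy) target y.
    """
    if len(z) != len(y):
        raise ValueError("z and y must have the same length")
    rng = random.Random(seed)
    return [rng.gauss(loc[z_i], spread[y_i]) for z_i, y_i in zip(z, y)]


x = sample_proxy_quality_feature(z=[0, 0, 1, 1], y=[0, 1, 0, 1], seed=0)
```

Rows where y differs from z still sit near the location for their true class, but with the spread associated with their recorded label.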

class mlsim.bias.FeatureNoise(dist=<function <lambda>>, sig=1.0, N_a=2)

Base class for adding noise to features.

ParamCreator

alias of NoiseParams

sample(a, z, y, x)

Add noise to the features, conditioned on a, z, y: groupwise noise is added so that one group's feature vectors are noisier than the other's.
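A minimal sketch of groupwise feature noise, assuming zero-mean Gaussian noise whose scale depends only on the group a. The helper name add_group_noise, the per-group sig tuple, and the choice of group 1 as the noisier group are illustrative assumptions rather than mlsim's API.

```python
import random


def add_group_noise(a, x, sig=(0.0, 1.0), seed=None):
    """Return a noisy copy of x where the noise scale depends on the group.

    sig[k] is the standard deviation of zero-mean Gaussian noise added to
    every feature of rows with a == k. Hypothetical helper, not mlsim code.
    """
    if len(a) != len(x):
        raise ValueError("a and x must have the same number of rows")
    rng = random.Random(seed)
    return [
        [value + rng.gauss(0.0, sig[a_i]) for value in row]
        for a_i, row in zip(a, x)
    ]


clean = [[5.0, 2.0], [2.0, 5.0]]
noisy = add_group_noise(a=[0, 1], x=clean, seed=1)
```

With sig=(0.0, 1.0), rows from group 0 pass through unchanged and only rows from group 1 are perturbed.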

class mlsim.bias.FeatureNoiseReplace(dist, mu=[0, 0, 0], cov=[[1, 0, 0], [0, 1, 0], [0, 0, 1]], d_shared=1)

Feature noise that replaces some of the features with noise drawn according to the mean and covariance attributes.

ParamCreator

alias of NoiseParams

sample(a, z, y, x)

Add noise to the features, conditioned on a, z, y: groupwise noise is added so that one group's feature vectors are noisier than the other's.

class mlsim.bias.FeaturePerGroupSharedParamAcrossGroups(dist, loc, spread)

ParamCreator

alias of FeatureParams

sample(a, z, y)

Sample P(X|A,Z,Y) using the distribution and parameters initialized for each a, z, y. The vectors a, z, and y must have the same shape.

Parameters:
  • a (list-like, length n) – demographic variables

  • z (list-like, length n) – true target

  • y (list-like, length n) – proxy target

Returns:

x – features, same shape as a, z, y

Return type:

list-like, length n

class mlsim.bias.FeaturePerGroupSharedParamWithinGroup(dist, loc, spread)

ParamCreator

alias of FeatureParams

sample(a, z, y)

Sample P(X|A,Z,Y) using the distribution and parameters initialized for each a, z, y. The vectors a, z, and y must have the same shape.

Parameters:
  • a (list-like, length n) – demographic variables

  • z (list-like, length n) – true target

  • y (list-like, length n) – proxy target

Returns:

x – features, same shape as a, z, y

Return type:

list-like, length n

class mlsim.bias.FeaturePerGroupTwoParam(dist, loc, spread)

Feature sampler with two parameters that vary per group.

ParamCreator

alias of FeatureParams

sample(a, z, y)

Sample P(X|A,Z,Y) using the distribution and parameters initialized for each a, z, y. The vectors a, z, and y must have the same shape.

Parameters:
  • a (list-like, length n) – demographic variables

  • z (list-like, length n) – true target

  • y (list-like, length n) – proxy target

Returns:

x – features, same shape as a, z, y

Return type:

list-like, length n

class mlsim.bias.FeatureSharedParam(loc, spread, dist=<function <lambda>>, N_a=2)

Feature sampler with two parameters in total, one of which is shared across Z (e.g. a shared spread). A and Y have no impact on X.

ParamCreator

alias of FeatureParams

sample(a, z, y)

Sample P(X|A,Z,Y) using the distribution and parameters initialized for each a, z, y. The vectors a, z, and y must have the same shape.

Parameters:
  • a (list-like, length n) – demographic variables

  • z (list-like, length n) – true target

  • y (list-like, length n) – proxy target

Returns:

x – features, same shape as a, z, y

Return type:

list-like, length n

class mlsim.bias.FeatureTwoParams(loc, spread, dist=<function <lambda>>, N_a=2)

Feature sampler with two unique parameters per class.

ParamCreator

alias of FeatureParams

sample(a, z, y)

Sample P(X|A,Z,Y) using the distribution and parameters initialized for each a, z, y. The vectors a, z, and y must have the same shape.

Parameters:
  • a (list-like, length n) – demographic variables

  • z (list-like, length n) – true target

  • y (list-like, length n) – proxy target

Returns:

x – features, same shape as a, z, y

Return type:

list-like, length n

class mlsim.bias.Population(demographic_sampler=<class 'mlsim.bias.bias_components.Demographic'>, target_sampler=<class 'mlsim.bias.bias_components.Target'>, feature_sampler=<class 'mlsim.bias.bias_components.Feature'>, feature_noise_sampler=<class 'mlsim.bias.bias_components.FeatureNoise'>, parameter_dictionary={})

Object describing a population, built from sampler types and a parameter dictionary, so that both samples from the population and biased samples can be drawn.

get_parameter_description()

Build a string that describes this object.

Returns:

description – the value of each parameter, grouped by sampler

Return type:

string

make_DataFrame(a, z, y, x)

Combine a, z, y, and x into a data frame with labels.

Parameters:

a (list)

make_StructuredDataset(a, z, y, x)

Converts a data frame created by one of the above functions into a dataset usable in the IBM AIF360 package.

Parameters:
  • df (pandas dataframe)

  • label_names (optional, a list of strings describing each label)

  • protected_attribute_names (optional, a list of strings describing features corresponding to protected attributes)

Return type:

aif360.datasets.StructuredDataset containing the data with y as the target and a as protected attribute.

sample(N, return_as='DataFrame')

Sample N members of the population according to its underlying distribution.

Parameters:
  • N (int) – number of samples

  • return_as (string, default 'DataFrame') – type to return as; can be pandas ‘DataFrame’ or IBM AIF360 ‘structuredDataset’
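How the four samplers might chain together when drawing a population can be sketched without mlsim. The step order (demographics, then target, then features, then feature noise) is inferred from the sampler signatures documented on this page; the function name, the dictionary keys, and the reading of beta as a label-error probability are assumptions, and rows are returned as plain dictionaries rather than a pandas DataFrame.

```python
import random


def sample_population(n, rho_a=0.5, rho_z=(0.5, 0.5), beta=0.05,
                      mu=((5, 2), (2, 5)), sig=1.0, seed=None):
    """Sample n rows by chaining four steps: (a, z) -> y -> x -> noisy x.

    Hypothetical stand-in for a population sampler, not mlsim code.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Demographics: draw A from P(A), then Z from P(Z | A).
        a = int(rng.random() < rho_a)
        z = int(rng.random() < rho_z[a])
        # Target: the proxy label y disagrees with z with probability beta.
        y = 1 - z if rng.random() < beta else z
        # Features: unit-variance Gaussian centred on the mean for this z.
        x = [rng.gauss(m, 1.0) for m in mu[z]]
        # Feature noise: extra zero-mean Gaussian noise with scale sig.
        x = [v + rng.gauss(0.0, sig) for v in x]
        rows.append({"a": a, "z": z, "y": y, "x0": x[0], "x1": x[1]})
    return rows


rows = sample_population(20, seed=0)
```

Each row keeps both the true target z and the proxy target y, so label bias can be measured directly on the sampled data.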

sample_unfavorable_outcomes(N, rho_z_scale)

Sample so that the disadvantaged group (a=1) gets the favorable outcome (y=1) less often, as controlled by rho_z_scale.
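This page does not state how rho_z_scale is applied. One possible reading, sketched below purely as an assumption, is that it multiplies the favorable-outcome rate of the disadvantaged group by a factor between 0 and 1. The helper name and all of its arguments are hypothetical.

```python
import random


def sample_scaled_outcomes(n, rho_a=0.5, rho_z=(0.5, 0.5), rho_z_scale=0.5, seed=None):
    """Draw (a, outcome) pairs with a reduced favorable rate for group a=1.

    Assumed rule (not confirmed by the mlsim docs): the favorable rate of
    the disadvantaged group becomes rho_z[1] * rho_z_scale.
    """
    if not 0.0 <= rho_z_scale <= 1.0:
        raise ValueError("rho_z_scale is assumed to lie in [0, 1]")
    rng = random.Random(seed)
    # Shrink only the disadvantaged group's favorable-outcome rate.
    scaled = (rho_z[0], rho_z[1] * rho_z_scale)
    a = [int(rng.random() < rho_a) for _ in range(n)]
    outcome = [int(rng.random() < scaled[a_i]) for a_i in a]
    return a, outcome


a, outcome = sample_scaled_outcomes(1000, rho_z_scale=0.5, seed=0)
```

Under this reading, a scale of 1 leaves both groups with their original rates and a scale of 0 removes the favorable outcome from the disadvantaged group entirely.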

class mlsim.bias.PopulationInstantiated(demographic_sampler=<mlsim.bias.bias_components.Demographic object>, target_sampler=<mlsim.bias.bias_components.Target object>, feature_sampler=<mlsim.bias.bias_components.Feature object>, feature_noise_sampler=<mlsim.bias.bias_components.FeatureNoise object>)

Population that is instantiated with either default parameters or already-instantiated sampler objects.

get_parameter_description()

Build a string that describes this object.

Returns:

description – the value of each parameter, grouped by sampler

Return type:

string

make_DataFrame(a, z, y, x)

Combine a, z, y, and x into a data frame with labels.

Parameters:

a (list)

make_StructuredDataset(a, z, y, x)

Converts a data frame created by one of the above functions into a dataset usable in the IBM AIF360 package.

Parameters:
  • df (pandas dataframe)

  • label_names (optional, a list of strings describing each label)

  • protected_attribute_names (optional, a list of strings describing features corresponding to protected attributes)

Return type:

aif360.datasets.StructuredDataset containing the data with y as the target and a as protected attribute.

sample(N, return_as='DataFrame')

Sample N members of the population according to its underlying distribution.

Parameters:
  • N (int) – number of samples

  • return_as (string, default 'DataFrame') – type to return as; can be pandas ‘DataFrame’ or IBM AIF360 ‘structuredDataset’

sample_unfavorable_outcomes(N, rho_z_scale)

Sample so that the disadvantaged group (a=1) gets the favorable outcome (y=1) less often, as controlled by rho_z_scale.

class mlsim.bias.Target(beta=0.05, N_a=2)

ParamCreator

alias of TargetParams

sample(a, z)

Sample P(Y|A,Z) via P(Y=Z|A,Z).

Parameters:
  • a

  • z

  • beta (float)
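Sampling P(Y|A,Z) via P(Y=Z|A,Z) amounts to copying z into y and flipping it with some error probability. The sketch below assumes that beta is that error probability, so P(Y=Z|A,Z) = 1 - beta for every row; the helper name and the seed argument are hypothetical, and the default beta is taken from the class signature above.

```python
import random


def sample_target(a, z, beta=0.05, seed=None):
    """Draw proxy labels y so that P(Y = Z | A, Z) = 1 - beta.

    Assumption: beta is a single label-error probability shared by all
    groups, so a is accepted only to mirror the sample(a, z) signature.
    """
    if len(a) != len(z):
        raise ValueError("a and z must have the same length")
    rng = random.Random(seed)
    # Flip the binary label with probability beta, otherwise keep it.
    return [1 - z_i if rng.random() < beta else z_i for z_i in z]


y = sample_target(a=[0, 0, 1, 1], z=[0, 1, 0, 1], beta=0.05, seed=0)
```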

class mlsim.bias.TargetDisadvantagedError(beta=0.1, N_a=2)

ParamCreator

alias of TargetParams

sample(a, z)

Sample P(Y|A,Z) via P(Y=Z|A,Z).

Parameters:
  • a

  • z

  • beta (float)

class mlsim.bias.TargetTwoError(beta=[0, 0.1])

ParamCreator

alias of TargetParams

sample(a, z)

Sample P(Y|A,Z) via P(Y=Z|A,Z).

Parameters:
  • a

  • z

  • beta (float)
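The two-element default beta=[0, 0.1] suggests one error rate per group. A minimal sketch under that assumption, where beta[k] is the probability that y differs from z for rows with a == k, is shown below. The helper name and the indexing of beta by group are guesses made for illustration, not documented mlsim behavior.

```python
import random


def sample_target_per_group(a, z, beta=(0.0, 0.1), seed=None):
    """Draw proxy labels y with a group-specific label-error probability.

    Assumption: beta[k] = P(Y != Z | A = k), so with the default values
    only the disadvantaged group (a=1) ever receives a flipped label.
    """
    if len(a) != len(z):
        raise ValueError("a and z must have the same length")
    rng = random.Random(seed)
    return [
        1 - z_i if rng.random() < beta[a_i] else z_i
        for a_i, z_i in zip(a, z)
    ]


y = sample_target_per_group(a=[0, 0, 1, 1], z=[0, 1, 0, 1], seed=0)
```

Rows from group 0 always keep y equal to z under the default rates, which makes any measured label error attributable to group 1.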