API Reference

class mlsim.bias.Demographic(rho_a=0.5, rho_z=0.5)

Base class for sampling demographics (a = protected attribute, z = true target value).


alias of mlsim.bias.bias_components.DemParams


Get P(A=1).


Returns

rho_a – probability of being in the disadvantaged group (A = 1)

Return type



Return P(Z=1|A).


Returns

rho_z – probability of the favorable outcome (z = 1) for A = 0 and A = 1, in that order

Return type

NumPy array of floats


Sample P(A,Z) = P(Z|A)P(A)


Parameters

N (integer) – number of samples to return


Returns

a_z_tuple – a tuple of length 2 with elements a and z, each a column NumPy array of length N

Return type


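The sampling order described above, first A from P(A) and then Z from P(Z|A), can be sketched with NumPy as follows. This is a minimal sketch, not mlsim's implementation: `sample_demographics` and its `seed` argument are hypothetical, and treating a scalar rho_z as one probability shared by both groups is an assumption.

```python
import numpy as np


def sample_demographics(n, rho_a=0.5, rho_z=(0.5, 0.3), seed=None):
    """Draw n (a, z) pairs from P(A, Z) = P(Z | A) P(A).

    Hypothetical helper, not part of mlsim. Assumes rho_a = P(A=1) and
    rho_z = [P(Z=1 | A=0), P(Z=1 | A=1)]; a scalar rho_z is assumed to
    apply to both groups, which makes Z independent of A.
    """
    rng = np.random.default_rng(seed)
    # One favorable-outcome probability per group, indexed by A
    rho_z = np.broadcast_to(np.asarray(rho_z, dtype=float), (2,))
    # P(A): column array of group labels, A ~ Bernoulli(rho_a)
    a = rng.binomial(1, rho_a, size=(n, 1))
    # P(Z | A): each row uses the probability of its own group
    z = rng.binomial(1, rho_z[a])
    return a, z


a, z = sample_demographics(10_000, rho_a=0.5, rho_z=[0.5, 0.3], seed=0)
print(z[a == 0].mean(), z[a == 1].mean())  # roughly 0.5 and 0.3
```

With the DemographicCorrelated defaults (rho_z=[0.5, 0.3]) the two groups receive the favorable outcome at different rates; under this sketch's assumption, a scalar rho_z such as the DemographicIndependent default of 0.1 would make Z independent of A.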
class mlsim.bias.DemographicCorrelated(rho_a=0.5, rho_z=[0.5, 0.3])
class mlsim.bias.DemographicIndependent(rho_a=0.2, rho_z=0.1)
class mlsim.bias.Feature(dist=<function <lambda>>, mu=[[5, 2], [2, 5]], param_tuple=None)

Base class for all feature samplers, P(X|A,Z,Y). By default, it creates two-dimensional features with parameters shared across groups and good separability between classes.

Function to sample X given its parameters, where the parameters are dependent on



function handle


parameters of dist, one per value of z


list-like or list of tuples


alias of mlsim.bias.bias_components.FeatureParams

sample(a, z, y)

Sample P(X|A,Z,Y) using the distribution and parameters initialized for each a, z, y. The vectors a, z, and y must have the same shape.

Parameters

  • a (list-like, length n) – demographic variables

  • z (list-like, length n) – true target

  • y (list-like, length n) – proxy target


Returns

x – features, same shape as a, z, y

Return type

list-like, length n
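A minimal NumPy sketch of P(X|A,Z,Y) for the default settings follows. The default dist appears above only as a lambda, so a Gaussian is assumed here; `sample_features`, `scale`, and `seed` are hypothetical names, and mu[k] is assumed to be the mean vector used for z == k.

```python
import numpy as np


def sample_features(z, mu=((5, 2), (2, 5)), scale=1.0, seed=None):
    """Draw one 2-D feature vector per sample, with the mean chosen by z.

    Hypothetical helper, not part of mlsim. Assumes a Gaussian dist,
    mu[k] as the mean vector for z == k, and one spread shared by all
    classes; a and y are ignored here.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z).ravel()
    # Row i of loc is the mean vector for sample i's class, shape (n, 2)
    loc = np.asarray(mu, dtype=float)[z]
    # Independent Gaussian scatter around each class mean
    return rng.normal(loc=loc, scale=scale)
```

With the default means [5, 2] and [2, 5], the class centers are about 4.2 units apart against a unit spread, which is one reading of "good separability of classes".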

class mlsim.bias.FeatureMeasurementQualityProxy(dist, loc, spread)

The measurement locations vary with the true target value z, and the measurement spread varies with the measured target value y, allowing error to be present in both the features and the measurements. Both may also vary with the protected attribute.
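The split between location and spread can be sketched as below. This is an illustration under stated assumptions, not the class's code: `sample_measurement_proxy` is hypothetical, features are one-dimensional and Gaussian, loc is indexed by z, spread is indexed by y, and the default values are illustrative only. Dependence on the protected attribute is left out; indexing the tables by (a, z) and (a, y) would add it.

```python
import numpy as np


def sample_measurement_proxy(z, y, loc=(0.0, 2.0), spread=(0.5, 1.5), seed=None):
    """One Gaussian feature per sample: mean set by z, spread set by y.

    Hypothetical helper, not part of mlsim. loc[k] is the mean used when
    z == k; spread[k] is the standard deviation used when y == k.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z).ravel()
    y = np.asarray(y).ravel()
    # The true target fixes where the measurement is centered
    mean = np.asarray(loc, dtype=float)[z]
    # The measured (proxy) target fixes how noisy the measurement is
    std = np.asarray(spread, dtype=float)[y]
    return rng.normal(loc=mean, scale=std)
```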

class mlsim.bias.FeatureNoise(dist=<function <lambda>>, sig=1.0)

Base class for adding noise to features


alias of mlsim.bias.bias_components.NoiseParams

sample(a, z, y, x)

Add noise to the features, conditioned on a, z, and y; the noise is added group-wise, so one group's feature vectors can be made noisier than the other's.
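Group-wise additive noise can be sketched as follows, assuming zero-mean Gaussian noise whose standard deviation is looked up by group. `add_group_noise` and the per-group form of sig are hypothetical; a scalar sig, like the documented default of 1.0, applies the same noise level to both groups.

```python
import numpy as np


def add_group_noise(a, x, sig=(0.0, 1.0), seed=None):
    """Add zero-mean Gaussian noise to x whose size depends on group a.

    Hypothetical helper, not part of mlsim. x has shape (n, d); sig is one
    standard deviation for both groups or a pair [sig for a=0, sig for a=1].
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a).ravel()
    x = np.asarray(x, dtype=float)
    sig = np.broadcast_to(np.asarray(sig, dtype=float), (2,))
    # One standard deviation per row, broadcast across the d feature columns
    scale = sig[a][:, np.newaxis]
    return x + rng.normal(size=x.shape) * scale
```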

class mlsim.bias.FeatureNoiseReplace(dist, mu=[0, 0, 0], cov=[[1, 0, 0], [0, 1, 0], [0, 0, 1]], d_shared=1)

Feature noise that replaces some of the features with noise drawn according to the mean (mu) and covariance (cov) attributes.
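Replacing features with noise can be sketched with NumPy's multivariate normal sampler. `replace_with_noise` and its `cols` argument are hypothetical; how FeatureNoiseReplace maps mu, cov, and d_shared onto feature columns is not documented here, so d_shared is not modeled.

```python
import numpy as np


def replace_with_noise(x, cols, mu, cov, seed=None):
    """Overwrite the chosen feature columns with multivariate normal noise.

    Hypothetical helper, not part of mlsim. `cols` selects which features
    are replaced; mu and cov describe the noise for exactly those columns.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x, dtype=float)  # copy so the caller's array is untouched
    # One noise vector of length len(mu) per sample (row)
    noise = rng.multivariate_normal(mean=mu, cov=cov, size=x.shape[0])
    x[:, cols] = noise
    return x
```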

class mlsim.bias.FeaturePerGroupSharedParamAcrossGroups(dist, loc, spread)
class mlsim.bias.FeaturePerGroupSharedParamWithinGroup(dist, loc, spread)
class mlsim.bias.FeaturePerGroupTwoParam(dist, loc, spread)

Feature sampler with two parameters that vary per group.

class mlsim.bias.FeatureSharedParam(dist, loc, spread)

Feature sampler with one parameter shared across Z (e.g., a shared spread); A and Y have no impact on X.

class mlsim.bias.FeatureTwoParams(dist, loc, spread)

Feature sampler with two unique parameters per class.
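The feature-sampler variants above differ mainly in which parameters are shared. One way to picture this, assuming Gaussian features and 2x2 loc/spread tables indexed by [a][z], is below; `sample_by_table` and the three example tables are hypothetical and only loosely mirror the descriptions of FeatureSharedParam, FeatureTwoParams, and FeaturePerGroupTwoParam.

```python
import numpy as np


def sample_by_table(a, z, loc, spread, seed=None):
    """Gaussian features whose parameters are looked up per (a, z) cell.

    Hypothetical helper, not part of mlsim. loc and spread are 2x2 tables
    indexed [a][z]; repeated entries express which parameters are shared.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a).ravel()
    z = np.asarray(z).ravel()
    loc = np.asarray(loc, dtype=float)
    spread = np.asarray(spread, dtype=float)
    return rng.normal(loc=loc[a, z], scale=spread[a, z])


# Spread shared across Z, rows identical so A has no effect (cf. FeatureSharedParam)
shared = dict(loc=[[0, 3], [0, 3]], spread=[[1, 1], [1, 1]])
# Both parameters unique per class, still ignoring A (cf. FeatureTwoParams)
two_param = dict(loc=[[0, 3], [0, 3]], spread=[[1, 2], [1, 2]])
# Both parameters also differ between groups (cf. FeaturePerGroupTwoParam)
per_group = dict(loc=[[0, 3], [1, 4]], spread=[[1, 2], [0.5, 1]])
```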

class mlsim.bias.Population(demographic_sampler=<class 'mlsim.bias.bias_components.Demographic'>, target_sampler=<class 'mlsim.bias.bias_components.Target'>, feature_sampler=<class 'mlsim.bias.bias_components.Feature'>, feature_noise_sampler=<class 'mlsim.bias.bias_components.FeatureNoise'>, parameter_dictionary={})

Object describing a population, built from sampler types and a parameter dictionary, so that both samples from the population and biased samples can be drawn.


Build a string that describes this object.


Returns

description – the value of each parameter, grouped by sampler

Return type


make_DataFrame(a, z, y, x)

Combine a, z, y, and x into a DataFrame with labels.


Parameters

a (list) –

make_StructuredDataset(a, z, y, x)

Converts a DataFrame created by one of the above functions into a dataset usable in the IBM AIF360 package.

Parameters

  • df (pandas DataFrame) –

  • label_names (optional, a list of strings describing each label) –

  • protected_attribute_names (optional, a list of strings describing features corresponding to protected attributes) –

Return type


sample(N, return_as='DataFrame')

Sample N members of the population according to its underlying distribution.

Parameters

  • N (int) – number of samples

  • return_as (string, default 'DataFrame') – type to return: either a pandas ‘DataFrame’ or an IBM AIF360 ‘structuredDataset’
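Conceptually, sampling a population chains the four samplers passed to the constructor. The sketch below strings simple stand-ins together: Bernoulli demographics, label flips with probability beta, Gaussian features with class-dependent means, and additive Gaussian noise. `sample_population` is hypothetical, each stage's distribution is an assumption, and it returns a plain dict rather than a DataFrame or AIF360 dataset.

```python
import numpy as np


def sample_population(n, rho_a=0.5, rho_z=(0.5, 0.3), beta=0.05,
                      mu=((5, 2), (2, 5)), sig=1.0, seed=None):
    """Chain the stages P(A,Z) -> P(Y|A,Z) -> P(X|A,Z,Y) -> feature noise.

    Hypothetical helper, not part of mlsim; every distribution is assumed.
    """
    rng = np.random.default_rng(seed)
    # Demographics: A ~ Bernoulli(rho_a), Z | A ~ Bernoulli(rho_z[A])
    a = rng.binomial(1, rho_a, size=n)
    z = rng.binomial(1, np.asarray(rho_z, dtype=float)[a])
    # Target: Y copies Z except for a flip with probability beta
    flip = rng.random(n) < beta
    y = np.where(flip, 1 - z, z)
    # Features: 2-D Gaussian whose mean is chosen by Z
    x = rng.normal(loc=np.asarray(mu, dtype=float)[z], scale=1.0)
    # Feature noise: additive Gaussian with standard deviation sig
    x = x + rng.normal(scale=sig, size=x.shape)
    return {"a": a, "z": z, "y": y, "x": x}
```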

sample_unfavorable_outcomes(N, rho_z_scale)

Sample so that the disadvantaged group (a = 1) receives the favorable outcome (y = 1) less often, by an amount controlled by rho_z_scale.
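One plausible reading of sample_unfavorable_outcomes, suggested by the parameter name, is that rho_z_scale shrinks P(Z=1|A=1) before sampling. The sketch below assumes exactly that; `sample_with_scaled_rho_z` is hypothetical, and since the docstring refers to y = 1, the actual method may act on the proxy target instead.

```python
import numpy as np


def sample_with_scaled_rho_z(n, rho_z_scale, rho_a=0.5, rho_z=(0.5, 0.5), seed=None):
    """Sample (a, z) after shrinking the favorable-outcome rate of group a=1.

    Hypothetical helper, not part of mlsim. Assumes rho_z lists
    P(Z=1 | A=0) and P(Z=1 | A=1) and that rho_z_scale multiplies the latter.
    """
    rng = np.random.default_rng(seed)
    rho = np.array(rho_z, dtype=float)  # copy, so the caller's values are kept
    # Only the disadvantaged group's probability is scaled down
    rho[1] *= rho_z_scale
    a = rng.binomial(1, rho_a, size=n)
    z = rng.binomial(1, rho[a])
    return a, z
```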

class mlsim.bias.PopulationInstantiated(demographic_sampler=<mlsim.bias.bias_components.Demographic object>, target_sampler=<mlsim.bias.bias_components.Target object>, feature_sampler=<mlsim.bias.bias_components.Feature object>, feature_noise_sampler=<mlsim.bias.bias_components.FeatureNoise object>)

Instantiate a population from either default parameters or already-instantiated sampler objects.

class mlsim.bias.Target(beta=0.05)

alias of mlsim.bias.bias_components.TargetParams

sample(a, z)

Sample P(Y|A,Z) via P(Y=Z|A,Z).

Parameters

  • a –

  • z –

  • beta (float) –
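Sampling P(Y|A,Z) via P(Y=Z|A,Z) amounts to copying z into y and flipping it with some error probability. The sketch assumes beta is that error probability, P(Y != Z), given either as one value or as a per-group pair [beta for a = 0, beta for a = 1]; the pair form is a guess based on the TargetTwoError default beta=[0, 0.1]. `sample_target` is hypothetical.

```python
import numpy as np


def sample_target(a, z, beta=0.05, seed=None):
    """Sample a proxy label Y from the true label Z with group-dependent error.

    Hypothetical helper, not part of mlsim. beta is P(Y != Z), either one
    value for everyone or a pair [beta for a=0, beta for a=1].
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a).ravel()
    z = np.asarray(z).ravel()
    beta = np.broadcast_to(np.asarray(beta, dtype=float), (2,))
    # Per-sample error event, with its probability chosen by group membership
    err = rng.random(z.shape) < beta[a]
    return np.where(err, 1 - z, z)
```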

class mlsim.bias.TargetDisadvantagedError(beta=0.1)
class mlsim.bias.TargetTwoError(beta=[0, 0.1])