API Reference#

class mlsim.bias.Demographic(rho_a=0.5, rho_z=0.5)[source]#

Base class for sampling demographics (a = protected attribute, z = true target value).

ParamCreator#

alias of mlsim.bias.bias_components.DemParams

get_rho_a()[source]#

get P(A=1)

Returns

rho_a – probability of being in the disadvantaged group, A=1

Return type

float

get_rho_z()[source]#

return P(Z=1|A)

Returns

rho_z – probability of the favorable outcome (z=1) for A=0 and A=1, in that order

Return type

NumPy array of floats

sample(N)[source]#

Sample P(A,Z) = P(Z|A)P(A)

Parameters

N (integer) – number of samples to return

Returns

a_z_tuple – a tuple of length 2 with elements a and z as column np arrays, each of length N

Return type

Tuple
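The factorization P(A,Z) = P(Z|A)P(A) can be illustrated with a minimal standard-library sketch. It is not the mlsim implementation: the function name sample_demographics and the seed argument are hypothetical, and plain Python lists stand in for the column arrays that sample returns.

```python
import random


def sample_demographics(n, rho_a=0.5, rho_z=(0.5, 0.3), seed=None):
    """Draw n pairs (a, z) from P(A, Z) = P(Z | A) P(A).

    rho_a is P(A=1); rho_z[k] is P(Z=1 | A=k).
    Hypothetical helper for illustration, not part of mlsim.
    """
    rng = random.Random(seed)
    # Step 1: draw the protected attribute, A ~ Bernoulli(rho_a).
    a = [1 if rng.random() < rho_a else 0 for _ in range(n)]
    # Step 2: draw the true target given A, Z | A=k ~ Bernoulli(rho_z[k]).
    z = [1 if rng.random() < rho_z[a_i] else 0 for a_i in a]
    return a, z


a, z = sample_demographics(1000, rho_a=0.5, rho_z=(0.5, 0.3), seed=0)
```

Drawing a first and then z conditioned on a keeps the two-element rho_z readable as one favorable-outcome rate per group.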

class mlsim.bias.DemographicCorrelated(rho_a=0.5, rho_z=[0.5, 0.3])[source]#
class mlsim.bias.DemographicIndependent(rho_a=0.2, rho_z=0.1)[source]#
class mlsim.bias.Feature(dist=<function <lambda>>, mu=[[5, 2], [2, 5]], param_tuple=None)[source]#

Base class for all feature samplers: P(X|A,Z,Y). By default, creates two-dimensional features with shared parameters across groups and good separability of classes.

dist#

function to sample X|parameters, where the parameters depend on Z, A, Y

Type

function handle

theta#

params of dist, one per value of z

Type

list-like or list of tuples

ParamCreator#

alias of mlsim.bias.bias_components.FeatureParams

sample(a, z, y)[source]#

Sample P(X|A,Z,Y) using the distribution and parameters initialized for each a, z, y. The vectors a, z, y must have the same shape.

Parameters
  • a (list-like length n) – demographic variables

  • z (list-like length n) – true target

  • y (list-like length n) – proxy target

Returns

x – features, same shape as a,z,y

Return type

list-like, length n
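As a rough illustration of class-conditional feature sampling, the sketch below draws each feature vector from a Gaussian centered at mu[z], using the default mu=[[5, 2], [2, 5]] from the signature above. It assumes the mean depends only on z and that all dimensions share a single standard deviation; the name sample_features and the spread and seed arguments are hypothetical and not part of mlsim.

```python
import random


def sample_features(z, mu=((5, 2), (2, 5)), spread=1.0, seed=None):
    """Draw one feature vector per sample from a Gaussian centered at mu[z].

    Hypothetical helper for illustration, not part of mlsim. Assumes the
    mean depends only on z and that every dimension shares one standard
    deviation, `spread`.
    """
    rng = random.Random(seed)
    # One row per sample; each coordinate is drawn around the class mean.
    return [[rng.gauss(m, spread) for m in mu[z_i]] for z_i in z]


x = sample_features([0, 1, 1, 0], seed=0)
```

With the two class means mirrored across the diagonal and unit spread, the classes overlap little, which matches the "good separability" described for the default.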

class mlsim.bias.FeatureMeasurementQualityProxy(dist, loc, spread)[source]#

The measurement locations vary with the true target value z and the measurement spreads vary with the measured target value y, allowing error to be present in both the features and the measurements. Both may also vary with the protected attribute.

class mlsim.bias.FeatureNoise(dist=<function <lambda>>, sig=1.0)[source]#

Base class for adding noise to features

ParamCreator#

alias of mlsim.bias.bias_components.NoiseParams

sample(a, z, y, x)[source]#

Add noise to the features, conditioned on a, z, y. The noise is groupwise: one group's feature vectors receive more noise than the other's.
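One way to picture groupwise noise is the sketch below, which adds zero-mean Gaussian noise whose standard deviation is looked up by group. The helper add_groupwise_noise and its sig_by_group argument are hypothetical; which group receives more noise, and whether z and y also affect it, are assumptions here rather than documented mlsim behavior.

```python
import random


def add_groupwise_noise(a, x, sig_by_group=(0.0, 1.0), seed=None):
    """Return a noisy copy of x where the noise scale depends on a.

    sig_by_group[k] is the Gaussian standard deviation applied to samples
    with a == k. Hypothetical helper for illustration, not part of mlsim.
    """
    rng = random.Random(seed)
    return [
        [x_ij + rng.gauss(0.0, sig_by_group[a_i]) for x_ij in x_i]
        for a_i, x_i in zip(a, x)
    ]


x = [[5.0, 2.0], [2.0, 5.0]]
x_noisy = add_groupwise_noise([0, 1], x, sig_by_group=(0.0, 1.0), seed=0)
```

Here the a=0 row is returned unchanged because its noise scale is zero, while the a=1 row is perturbed; the input list is not modified.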

class mlsim.bias.FeatureNoiseReplace(dist, mu=[0, 0, 0], cov=[[1, 0, 0], [0, 1, 0], [0, 0, 1]], d_shared=1)[source]#

Feature noise that replaces some of the features with noise according to the mean and covariance attributes.

class mlsim.bias.FeaturePerGroupSharedParamAcrossGroups(dist, loc, spread)[source]#
class mlsim.bias.FeaturePerGroupSharedParamWithinGroup(dist, loc, spread)[source]#
class mlsim.bias.FeaturePerGroupTwoParam(dist, loc, spread)[source]#

feature sampler with two parameters that vary per group

class mlsim.bias.FeatureSharedParam(dist, loc, spread)[source]#

Feature sampler with one parameter shared across Z (e.g., shared spread). A and Y have no impact on X.

class mlsim.bias.FeatureTwoParams(dist, loc, spread)[source]#

feature sampler with two unique parameters per class

class mlsim.bias.Population(demographic_sampler=<class 'mlsim.bias.bias_components.Demographic'>, target_sampler=<class 'mlsim.bias.bias_components.Target'>, feature_sampler=<class 'mlsim.bias.bias_components.Feature'>, feature_noise_sampler=<class 'mlsim.bias.bias_components.FeatureNoise'>, parameter_dictionary={})[source]#

Object describing a population, built from sampler types and a parameter dictionary, so that both samples from the population and biased samples can be drawn.

get_parameter_description()[source]#

Build a string output that describes this object

Returns

description – the value of each parameter, grouped by sampler

Return type

string

make_DataFrame(a, z, y, x)[source]#

combine into data frame with labels

Parameters

a (list) –

make_StructuredDataset(a, z, y, x)[source]#

Converts a dataframe created by one of the above functions into a dataset usable in the IBM AIF360 package

Parameters
  • df (pandas dataframe) –

  • label_names (optional, a list of strings describing each label) –

  • protected_attribute_names (optional, a list of strings describing features corresponding to protected attributes) –

Return type

aif360.datasets.StructuredDataset

sample(N, return_as='DataFrame')[source]#

sample N members of the population, according to its underlying distribution

Parameters
  • N (int) – number of samples

  • return_as (string, default 'DataFrame') – type to return as, can be pandas ‘DataFrame’ or IBM AIF360 ‘structuredDataset’

sample_unfavorable_outcomes(N, rho_z_scale)[source]#

Sample so that the disadvantaged group (a=1) gets the favorable outcome (y=1) less often, based on rho_z_scale.

class mlsim.bias.PopulationInstantiated(demographic_sampler=<mlsim.bias.bias_components.Demographic object>, target_sampler=<mlsim.bias.bias_components.Target object>, feature_sampler=<mlsim.bias.bias_components.Feature object>, feature_noise_sampler=<mlsim.bias.bias_components.FeatureNoise object>)[source]#

Population that is instantiated with either default parameters or already-instantiated sampler objects.

class mlsim.bias.Target(beta=0.05)[source]#
ParamCreator#

alias of mlsim.bias.bias_components.TargetParams

sample(a, z)[source]#

Sample P(Y|A,Z) via P(Y=Z|A,Z)

Parameters
  • a –

  • z –

  • beta (float) –
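Sampling Y through P(Y=Z|A,Z) can be sketched as copying z and flipping it with some error probability. The sketch below assumes beta is that flip probability, so P(Y=Z|A,Z) = 1 - beta, and that a two-element beta is indexed by group a; both readings are assumptions suggested by the signatures on this page. The name sample_target and the seed argument are hypothetical and not part of mlsim.

```python
import random


def sample_target(a, z, beta=0.05, seed=None):
    """Draw the proxy target y so that P(Y = Z | A, Z) = 1 - beta.

    beta may be a single float or a per-group pair indexed by a.
    Hypothetical helper for illustration, not part of mlsim.
    """
    rng = random.Random(seed)
    y = []
    for a_i, z_i in zip(a, z):
        # Use the group's own error rate when beta is given per group.
        b = beta[a_i] if isinstance(beta, (list, tuple)) else beta
        # Flip the true label with probability b, otherwise copy it.
        y.append(1 - z_i if rng.random() < b else z_i)
    return y


y = sample_target([0, 1, 0, 1], [1, 0, 0, 1], beta=0.05, seed=0)
```

Under these assumptions, a per-group beta such as [0, 0.1] leaves labels for a=0 untouched and mislabels a=1 about one time in ten.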

class mlsim.bias.TargetDisadvantagedError(beta=0.1)[source]#
class mlsim.bias.TargetTwoError(beta=[0, 0.1])[source]#