API Reference#
- class mlsim.bias.Demographic(rho_a=0.5, rho_z=0.5)[source]#
Base class for sampling demographics (a = protected attribute, z = true target value)
- ParamCreator#
alias of
DemParams
- get_rho_a()[source]#
Get P(A=1)
- Returns:
rho_a – probability of being in the disadvantaged group, A=1
- Return type:
float
- get_rho_z()[source]#
Return P(Z=1|A)
- Returns:
rho_z – probability of the favorable outcome (Z=1) for A=0 and A=1, in that order
- Return type:
NumPy array of floats
- sample(N)[source]#
Sample P(A,Z) = P(Z|A)P(A)
- Parameters:
N (integer) – number of samples to return
- Returns:
a_z_tuple – a tuple of length 2 with elements a and z as column NumPy arrays, each of length N
- Return type:
Tuple
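The factorization P(A,Z) = P(Z|A)P(A) used by sample can be illustrated with a small standalone sketch. This is not the mlsim implementation: the function name sample_demographic is hypothetical, it uses only the standard library, and it returns plain lists rather than column NumPy arrays.

```python
import random


def sample_demographic(n, rho_a=0.5, rho_z=(0.5, 0.5), seed=None):
    """Draw n pairs (a, z) from P(A, Z) = P(Z | A) P(A).

    rho_a is P(A=1); rho_z[g] is P(Z=1 | A=g).
    All names here are illustrative assumptions, not the mlsim API.
    """
    rng = random.Random(seed)
    # Step 1: draw the protected attribute, A ~ Bernoulli(rho_a).
    a = [1 if rng.random() < rho_a else 0 for _ in range(n)]
    # Step 2: draw the true target given the group, Z | A=g ~ Bernoulli(rho_z[g]).
    z = [1 if rng.random() < rho_z[g] else 0 for g in a]
    return a, z


a, z = sample_demographic(10_000, rho_a=0.2, rho_z=(0.6, 0.3), seed=0)
```

Drawing a first and then z conditionally on a is what lets the favorable-outcome rate differ between the two groups.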
- class mlsim.bias.DemographicIndependent(rho_a=0.2, rho_z=0.1)[source]#
- ParamCreator#
alias of
DemParams
- get_rho_a()#
get P(A=1)
- Returns:
rho_a – probability of being in the disadvantaged group, A=1
- Return type:
float
- get_rho_z()#
Return P(Z=1|A)
- Returns:
rho_z – probability of the favorable outcome (Z=1) for A=0 and A=1, in that order
- Return type:
NumPy array of floats
- sample(N)#
Sample P(A,Z) = P(Z|A)P(A)
- Parameters:
N (integer) – number of samples to return
- Returns:
a_z_tuple – a tuple of length 2 with elements a and z as column NumPy arrays, each of length N
- Return type:
Tuple
- class mlsim.bias.Feature(dist=<function <lambda>>, mu=[[5, 2], [2, 5]], param_tuple=None, N_a=2)[source]#
Base class for all feature samplers: P(X|A,Z,Y). By default it creates two-dimensional features with parameters shared across groups and good separability of classes.
- dist#
function that samples X given its parameters, where the parameters depend on Z, A, and Y
- Type:
function handle
- theta#
parameters of dist, one per value of z, a, y
- Type:
list-like or list of tuples
- ParamCreator#
alias of
FeatureParams
- sample(a, z, y)[source]#
sample P(X|A,Z,Y) using distribution and parameters initialized for each a,z,y. The vectors a,z,y must be the same shape
- Parameters:
a (list-like length n) – demographic variables
z (list-like length n) – true target
y (list-like length n) – proxy target
- Returns:
x – features, same shape as a,z,y
- Return type:
list-like, length n
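As a rough illustration of sampling P(X|A,Z,Y), the sketch below draws two-dimensional Gaussian features whose mean is selected by z alone, using the default mu=[[5, 2], [2, 5]] from the signature above. That the rows of mu are indexed by z, the unit spread, and the function name sample_features are all assumptions made for this example; it is not the library's code.

```python
import random

# Per-class means copied from the default mu=[[5, 2], [2, 5]];
# indexing them by z is an assumption of this sketch.
MU = [[5, 2], [2, 5]]


def sample_features(a, z, y, mu=MU, sigma=1.0, seed=None):
    """Draw one 2-D feature vector per row; here only z selects the mean."""
    if not (len(a) == len(z) == len(y)):
        raise ValueError("a, z, y must have the same length")
    rng = random.Random(seed)
    # Each coordinate is an independent Gaussian centred on the class mean.
    return [[rng.gauss(m, sigma) for m in mu[z_i]] for z_i in z]


x = sample_features(a=[0, 1, 0, 1], z=[0, 0, 1, 1], y=[0, 0, 1, 1], seed=1)
```

Because a and y are accepted but unused, both groups share the same parameters, which matches the "shared parameters across groups" default described for this class.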
- class mlsim.bias.FeatureMeasurementQualityProxy(dist, loc, spread)[source]#
The measurement locations vary with the true target value z and the measurement spreads vary with the measured target value y, allowing error to be present in both the features and the measurements. Both may also vary with the protected attribute.
- ParamCreator#
alias of
FeatureParams
- sample(a, z, y)#
sample P(X|A,Z,Y) using distribution and parameters initialized for each a,z,y. The vectors a,z,y must be the same shape
- Parameters:
a (list-like length n) – demographic variables
z (list-like length n) – true target
y (list-like length n) – proxy target
- Returns:
x – features, same shape as a,z,y
- Return type:
list-like, length n
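The idea that the location follows z while the spread follows y can be sketched in one dimension as follows. The function name, the particular loc and spread values, and the choice of a Gaussian are assumptions for illustration only; the sketch also ignores the protected attribute.

```python
import random
import statistics

LOC = (0.0, 3.0)     # hypothetical location for z=0 and z=1
SPREAD = (0.5, 2.0)  # hypothetical spread for y=0 and y=1


def sample_measurement_quality(z, y, loc=LOC, spread=SPREAD, seed=None):
    """Draw x_i ~ Normal(loc[z_i], spread[y_i]) for each row."""
    rng = random.Random(seed)
    return [rng.gauss(loc[z_i], spread[y_i]) for z_i, y_i in zip(z, y)]


z = [0, 1] * 2000
y = [0] * 2000 + [1] * 2000
x = sample_measurement_quality(z, y, seed=0)

# Residuals around the z-dependent location, split by the proxy target y.
resid_y0 = [xi - LOC[zi] for xi, zi, yi in zip(x, z, y) if yi == 0]
resid_y1 = [xi - LOC[zi] for xi, zi, yi in zip(x, z, y) if yi == 1]
```

Comparing statistics.pstdev(resid_y0) with statistics.pstdev(resid_y1) shows the measurement noise tracking y, independently of where z places the center.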
- class mlsim.bias.FeatureNoise(dist=<function <lambda>>, sig=1.0, N_a=2)[source]#
Base class for adding noise to features
- ParamCreator#
alias of
NoiseParams
- class mlsim.bias.FeatureNoiseReplace(dist, mu=[0, 0, 0], cov=[[1, 0, 0], [0, 1, 0], [0, 0, 1]], d_shared=1)[source]#
Feature noise that replaces some of the features with noise according to the mean and covariance attributes
- ParamCreator#
alias of
NoiseParams
- sample(a, z, y, x)#
Add noise to the features, conditioned on a, z, y. The noise is applied groupwise, so the feature vectors of one group can receive different noise than those of the other.
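One way to picture replacing features with groupwise noise is the sketch below. It is a guess at the general idea, not the library's behavior: which columns are kept (the first d_keep), the per-group noise scale sigma_by_group, and the use of independent zero-mean Gaussians instead of a full mean vector and covariance matrix are all assumptions of this example.

```python
import random


def replace_with_noise(a, x, d_keep=1, sigma_by_group=(1.0, 1.0), seed=None):
    """Keep the first d_keep columns of each row; overwrite the rest with noise.

    The noise scale is chosen per row from sigma_by_group[a_i], so the two
    groups can be given different amounts of noise.
    """
    rng = random.Random(seed)
    out = []
    for a_i, row in zip(a, x):
        noise = [rng.gauss(0.0, sigma_by_group[a_i]) for _ in row[d_keep:]]
        out.append(list(row[:d_keep]) + noise)
    return out


a = [0, 1]
x = [[5.1, 2.0, 4.8], [2.2, 5.3, 1.9]]
x_noisy = replace_with_noise(a, x, d_keep=1, sigma_by_group=(0.5, 2.0), seed=0)
```

The input rows are left untouched; x_noisy has the same shape as x, with only the first column carried over.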
- class mlsim.bias.FeaturePerGroupTwoParam(dist, loc, spread)[source]#
feature sampler with two parameters that vary per group
- ParamCreator#
alias of
FeatureParams
- sample(a, z, y)#
sample P(X|A,Z,Y) using distribution and parameters initialized for each a,z,y. The vectors a,z,y must be the same shape
- Parameters:
a (list-like length n) – demographic variables
z (list-like length n) – true target
y (list-like length n) – proxy target
- Returns:
x – features, same shape as a,z,y
- Return type:
list-like, length n
Feature sampler with two total parameters, one of which is shared across Z (e.g. a shared spread). A and Y have no impact on X.
- class mlsim.bias.FeatureTwoParams(loc, spread, dist=<function <lambda>>, N_a=2)[source]#
feature sampler with two unique parameters per class
- ParamCreator#
alias of
FeatureParams
- sample(a, z, y)#
sample P(X|A,Z,Y) using distribution and parameters initialized for each a,z,y. The vectors a,z,y must be the same shape
- Parameters:
a (list-like length n) – demographic variables
z (list-like length n) – true target
y (list-like length n) – proxy target
- Returns:
x – features, same shape as a,z,y
- Return type:
list-like, length n
- class mlsim.bias.Population(demographic_sampler=<class 'mlsim.bias.bias_components.Demographic'>, target_sampler=<class 'mlsim.bias.bias_components.Target'>, feature_sampler=<class 'mlsim.bias.bias_components.Feature'>, feature_noise_sampler=<class 'mlsim.bias.bias_components.FeatureNoise'>, parameter_dictionary={})[source]#
Object describing a population, built from sampler types and a parameter dictionary, so that both population samples and biased samples can be drawn from it
- get_parameter_description()[source]#
Build a string output that describes this object
- Returns:
description – the value of each parameter, grouped by sampler
- Return type:
string
- make_StructuredDataset(a, z, y, x)[source]#
Converts a dataframe created by one of the above functions into a dataset usable in the IBM AIF360 package
- Parameters:
df (pandas dataframe)
label_names (optional, a list of strings describing each label)
protected_attribute_names (optional, a list of strings describing the features corresponding to protected attributes)
- Return type:
aif360.datasets.StructuredDataset containing the data with y as the target and a as protected attribute.
- class mlsim.bias.PopulationInstantiated(demographic_sampler=<mlsim.bias.bias_components.Demographic object>, target_sampler=<mlsim.bias.bias_components.Target object>, feature_sampler=<mlsim.bias.bias_components.Feature object>, feature_noise_sampler=<mlsim.bias.bias_components.FeatureNoise object>)[source]#
Population to be instantiated with either default parameters or already-instantiated sampler objects
- get_parameter_description()#
Build a string output that describes this object
- Returns:
description – the value of each parameter, grouped by sampler
- Return type:
string
- make_DataFrame(a, z, y, x)#
combine into data frame with labels
- Parameters:
a (list)
- make_StructuredDataset(a, z, y, x)#
Converts a dataframe created by one of the above functions into a dataset usable in the IBM AIF360 package
- Parameters:
df (pandas dataframe)
label_names (optional, a list of strings describing each label)
protected_attribute_names (optional, a list of strings describing the features corresponding to protected attributes)
- Return type:
aif360.datasets.StructuredDataset containing the data with y as the target and a as protected attribute.
- sample(N, return_as='DataFrame')#
sample N members of the population, according to its underlying distribution
- Parameters:
N (int) – number of samples
return_as (string, default 'DataFrame') – type to return as, can be pandas ‘DataFrame’ or IBM AIF360 ‘structuredDataset’
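To show how a population sample might chain the demographic, target, feature steps together, here is a standalone sketch that returns one dictionary per sampled member. The function name sample_population, the label-flip model for the proxy target y, and the column names are assumptions for illustration; the real samplers and their parameters may differ.

```python
import random


def sample_population(n, rho_a=0.5, rho_z=(0.5, 0.5), flip=0.1,
                      mu=((5, 2), (2, 5)), seed=None):
    """Sample n members as rows with keys a, z, y, x0, x1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = int(rng.random() < rho_a)            # demographic: A ~ Bernoulli(rho_a)
        z = int(rng.random() < rho_z[a])         # true target: Z | A
        y = 1 - z if rng.random() < flip else z  # proxy target: z with label noise (assumed)
        x0, x1 = (rng.gauss(m, 1.0) for m in mu[z])  # features: X | Z
        rows.append({"a": a, "z": z, "y": y, "x0": x0, "x1": x1})
    return rows


rows = sample_population(5, seed=0)
```

A list of row dictionaries like this can be passed directly to a tabular container if one is wanted; the sketch stays with built-in types to remain self-contained.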
- sample_unfavorable_outcomes(N, rho_z_scale)#
sample so that the disadvantaged group (a=1) gets the favorable outcome (y=1) less often based on the rho_z_scale
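The effect described for rho_z_scale can be sketched as a multiplicative reduction of the favorable-outcome rate for the group a=1. That the scale multiplies the base rate, and the names used below, are assumptions of this example and not confirmed behavior of sample_unfavorable_outcomes.

```python
import random


def sample_scaled_outcomes(n, rho_a=0.5, rho_z=0.5, rho_z_scale=0.5, seed=None):
    """Draw (a, y) where P(Y=1 | A=1) = rho_z * rho_z_scale (assumed form)."""
    if not 0.0 <= rho_z_scale <= 1.0:
        raise ValueError("rho_z_scale must lie in [0, 1]")
    rng = random.Random(seed)
    a = [int(rng.random() < rho_a) for _ in range(n)]
    # Favorable-outcome probability per group: unchanged for a=0, scaled for a=1.
    p = (rho_z, rho_z * rho_z_scale)
    y = [int(rng.random() < p[g]) for g in a]
    return a, y


a, y = sample_scaled_outcomes(20_000, seed=0)
# Empirical favorable-outcome rate for each group, a=0 then a=1.
rate = [sum(y_i for a_i, y_i in zip(a, y) if a_i == g) / a.count(g) for g in (0, 1)]
```

With rho_z_scale below 1, the second entry of rate falls below the first, which is the disparity this method is meant to produce.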