API Reference

mlsim.anomaly.geometric_2d_gmm_sp(r_clusters, cluster_size, cluster_spread, p_sp_clusters, domain_range, k, N, p_clusters=None)

Sample from a Gaussian mixture model with Simpson’s Paradox (SP) and spread-out cluster means; return the data in a data frame.

r_clusters : scalar in [0, 1]

correlation coefficient of clusters

cluster_size : 2-vector

variance in each direction of each cluster

cluster_spread : scalar in [0, 1]

Pearson correlation of means

p_sp_clusters : scalar in [0, 1]

proportion of clusters with SP

p_clusters : vector in [0,1)^k, optional

probability of membership of a sample in each cluster (controls relative size of clusters); default is [1.0/k]*k for uniform

domain_range : [xmin, xmax, ymin, ymax]

planned region for points to be in; means will be in the middle 80%

k : integer

number of clusters

N : scalar

number of points
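
To make the roles of these parameters concrete, here is a minimal standard-library sketch of data with Simpson’s Paradox. It is not mlsim’s implementation: `sketch_sp_mixture` and `pearson` are hypothetical helpers, the cluster means are simply spaced along the diagonal of the middle 80% of `domain_range` (rather than drawn with a `cluster_spread` correlation), membership is uniform, and every cluster is given the reversed trend (as if `p_sp_clusters` were 1).

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sketch_sp_mixture(r_clusters, cluster_size, domain_range, k, N, seed=0):
    """Hypothetical stand-in, not the mlsim function.

    Returns N rows (x, y, cluster). Cluster means rise along the diagonal,
    while each cluster has within-cluster correlation -r_clusters.
    """
    rng = random.Random(seed)
    xmin, xmax, ymin, ymax = domain_range
    sx, sy = math.sqrt(cluster_size[0]), math.sqrt(cluster_size[1])
    rows = []
    for _ in range(N):
        c = rng.randrange(k)  # uniform membership, like [1.0/k]*k
        # Assumption: means are evenly spaced over the middle 80% of the domain.
        frac = 0.1 + 0.8 * (c + 0.5) / k
        mean_x = xmin + frac * (xmax - xmin)
        mean_y = ymin + frac * (ymax - ymin)
        # Correlated Gaussian noise built from two independent normals.
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean_x + sx * z1
        y = mean_y + sy * (-r_clusters * z1 + math.sqrt(1 - r_clusters ** 2) * z2)
        rows.append((x, y, c))
    return rows


rows = sketch_sp_mixture(0.8, [1.0, 1.0], [0, 20, 0, 20], k=5, N=1000)
overall = pearson([r[0] for r in rows], [r[1] for r in rows])
within = [
    pearson([r[0] for r in rows if r[2] == c], [r[1] for r in rows if r[2] == c])
    for c in range(5)
]
print(f"overall r = {overall:.2f}")
print("within-cluster r =", [round(w, 2) for w in within])
```

The overall correlation comes out positive because the means trend upward, while each within-cluster correlation is negative; that sign reversal is the paradox these sampling functions are meant to produce.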

mlsim.anomaly.geometric_indep_views_gmm_sp(d, r_clusters, cluster_size, cluster_spread, p_sp_clusters, domain_range, k, N, p_clusters=None, numeric_categorical=False)

Sample from a Gaussian mixture model with Simpson’s Paradox (SP) and spread-out cluster means; return the data in a data frame.

d : integer

number of independent views; each view is a group of 3 columns with SP

r_clusters : scalar in [0, 1] or list of d

correlation coefficient of clusters

cluster_size : 2-vector or list of d

variance in each direction of each cluster

cluster_spread : scalar in [0, 1] or list of d

Pearson correlation of means

p_sp_clusters : scalar in [0, 1] or list of d

proportion of clusters with SP

p_clusters : vector in [0,1)^k or list of d vectors, optional

probability of membership of a sample in each cluster (controls relative size of clusters); default is [1.0/k]*k for uniform

domain_range : [xmin, xmax, ymin, ymax] or list of d

planned region for points to be in; means will be in the middle 80%

k : integer or list of d

number of clusters

N : scalar

number of points, shared across all views

numeric_categorical : bool, default False

use numerical (ordinal) values instead of letters
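
Several of the parameters above accept either one value for all views or a list of d per-view values. One way such an argument could be expanded is sketched below. `per_view` is a hypothetical helper, not part of mlsim, and it only covers the scalar-valued parameters (`r_clusters`, `cluster_spread`, `p_sp_clusters`, `k`); vector-valued ones such as `cluster_size` would need a different check, because a single 2-vector is itself a list.

```python
def per_view(value, d):
    """Expand a scalar to d copies, or validate a list of d per-view values.

    Hypothetical helper for illustration; mlsim may handle this differently.
    """
    if isinstance(value, (list, tuple)):
        if len(value) != d:
            raise ValueError(f"expected {d} per-view values, got {len(value)}")
        return list(value)
    return [value] * d


d = 3
print(per_view(0.8, d))        # [0.8, 0.8, 0.8]
print(per_view([2, 3, 4], d))  # [2, 3, 4]
```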

mlsim.anomaly.plot_clustermat(z, fmt=None)

Black-and-white matshow for clustering and feature allocation matrices.

Parameters:
  • z (nparray, square matrix to be plotted)

  • fmt (str, optional; if z is not a square matrix, names what it is)

fmt options:
  • ‘crplist’ : a list of values from zero to k
  • ‘ibplist’ : a list of lists of varying lengths
  • ‘list’ : a list, not an nparray, but otherwise ready to plot
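
The list formats can be pictured as compact encodings of a binary matrix. The sketch below shows one plausible conversion, under two assumptions that the reference does not confirm: a ‘crplist’ holds one cluster label per item, and each inner list of an ‘ibplist’ holds the feature indices assigned to that item. Both function names are hypothetical and are not part of mlsim.

```python
def crplist_to_matrix(z, k=None):
    """One row per item, one column per cluster; 1 marks the item's cluster."""
    n_cols = (max(z) + 1) if k is None else k
    return [[1 if c == label else 0 for c in range(n_cols)] for label in z]


def ibplist_to_matrix(z):
    """One row per item, one column per feature; 1 marks an assigned feature."""
    # Items with no features contribute an all-zero row.
    n_cols = max((max(feats) for feats in z if feats), default=-1) + 1
    return [[1 if f in feats else 0 for f in range(n_cols)] for feats in z]


print(crplist_to_matrix([0, 2, 1, 0]))
# [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(ibplist_to_matrix([[0, 1], [], [2]]))
# [[1, 1, 0], [0, 0, 0], [0, 0, 1]]
```

Either result is a plain nested list that could then be drawn with matplotlib’s `matshow` and a gray colormap.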

mlsim.anomaly.sp_plot(df, x_col, y_col, color_col, ci=None, domain_range=[0, 20, 0, 20], ax=None, aggplot=True, x_jitter=0, height=3, legend=True)

Create an SP visualization plot from 2 columns of a df.
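
An SP visualization typically contrasts the trend within each group with the trend of the pooled data. As a rough illustration of the quantities such a plot conveys, the sketch below fits least-squares slopes per group and in aggregate on a tiny hand-made data set. It assumes nothing about how `sp_plot` draws its figure; `slope` and the data are hypothetical.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx


# Two groups, standing in for the values of a color column.
groups = {
    "a": [(1, 3), (2, 2), (3, 1)],
    "b": [(4, 6), (5, 5), (6, 4)],
}

group_slopes = {name: slope(pts) for name, pts in groups.items()}
aggregate_slope = slope([p for pts in groups.values() for p in pts])

print(group_slopes)               # {'a': -1.0, 'b': -1.0}
print(round(aggregate_slope, 3))  # 0.543
```

Each group trends downward while the pooled data trends upward, which is the pattern an SP plot is meant to make visible.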