hypehd.visualization

Module Contents

Functions

cluster_3d(df, cols[, c_type, number, min_sample, ...])

Plots a three-dimensional graph of clusters for the three specified numerical columns.

cluster_2d(df, cols[, c_type, number, min_sample, ...])

Plots a two-dimensional graph of clusters for the two specified numerical columns.

graph_3d(df, ax1, ax2, ax3[, lab1, lab2, lab3, path, name])

Plots a three-dimensional graph based on the three columns entered.

f_test(group1, group2)

Return the p-value of an F-test given two groups' data.

demo_graph(var, input_data[, group])

Show the count of categorical characteristic variables in each group and combine with a summary table.

longitudinal_graph(outcome, time, group, input_data)

Show the scatter plot of outcome means over time in each group and combine with a summary table. Function for longitudinal data analysis.

relation(df[, gtype, path, name_chi, name_cor])

Plots a heat-map of the relationship between features of the same type. If type of feature (numerical or categorical) is not specified, both heat-maps will be drawn.

survival_analysis(time, censor_status, group, input_data)

Show the Kaplan-Meier curve and combine with a median survival time summary.

boxplot_grid(df[, col1, col2, col3])

A function for creating a grid of box plots with two options.

pie(df, col[, path, name])

Draws a pie chart of the specified column.

hypehd.visualization.cluster_3d(df, cols, c_type='k-means', number=None, min_sample=3, eps=0.5, lab1=None, lab2=None, lab3=None, legend=False, path=None, name='cluster3d')

Plots a three-dimensional graph of clusters for the three specified numerical columns. The clustering type and some minor fine-tuning of the clustering model can be specified.

df: pd.DataFrame, mandatory

The dataset that contains the features that will be plotted.

cols: list, mandatory

A list of three str elements, which are the column names.

c_type: str, optional

The type of clustering model. The available options are DBSCAN, OPTICS and k-means. If no type is specified, k-means will be performed.

number: int, optional

The number of clusters. This parameter is only used for k-means. The default value is the smallest number of clusters with an inertia value less than 50.

min_sample: int, optional

The minimum number of samples in a cluster. This parameter is only used for DBSCAN and OPTICS. The default value is 3.

eps: float, optional

The maximum distance between samples for them to fall into the same cluster. This parameter is only used for DBSCAN. The default value is 0.5.

lab1: str, optional

The label for axis 1. The default option is the name of the column in the data frame.

lab2: str, optional

The label for axis 2. The default option is the name of the column in the data frame.

lab3: str, optional

The label for axis 3. The default option is the name of the column in the data frame.

legend: bool, optional

Whether to show the legend. The default is False.

path: str, optional

The directory path to save the plot in. The plot will not be saved if no path is specified.

name: str, optional

Name of the plot. The default is cluster3d.

fig: plt.figure, ax: axes.Axes

The figure and axes of the plot.

> cluster_3d(df = data, cols = ['age','height','BMI'], c_type = "DBSCAN", lab1 = "Age", lab2 = "Height")
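The default for number (the smallest cluster count whose inertia is below 50) can be illustrated with a minimal sketch. This is not hypehd's source code: kmeans and smallest_k_below are hypothetical helper names, the farthest-point initialisation is an assumption made only to keep the run deterministic, and a real implementation would more likely rely on a library k-means.

```python
import math


def kmeans(points, k, max_iter=100):
    """Tiny k-means returning (centers, inertia). Hypothetical helper."""
    # Deterministic farthest-point initialisation (an assumption for this sketch).
    centers = [tuple(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(tuple(farthest))
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    # Inertia: sum of squared distances from each point to its nearest center.
    inertia = sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
    return centers, inertia


def smallest_k_below(points, threshold=50.0):
    """Smallest k whose inertia is below the threshold (50 in the text above)."""
    for k in range(1, len(points) + 1):
        if kmeans(points, k)[1] < threshold:
            return k
    return len(points)


# Three well-separated groups of four points each (made-up data).
blob = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
points = (
    blob
    + [(x + 10, y + 10, z + 10) for x, y, z in blob]
    + [(x + 20, y, z) for x, y, z in blob]
)
chosen_k = smallest_k_below(points)
```

Because inertia depends on the scale of the data, a fixed threshold of 50 will select very different cluster counts for standardized and raw columns, so passing number explicitly may be preferable when the columns have large ranges.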

hypehd.visualization.cluster_2d(df, cols, c_type='k-means', number=None, min_sample=3, eps=0.5, lab1=None, lab2=None, path=None, name='cluster2d')

Plots a two-dimensional graph of clusters for the two specified numerical columns. The clustering type and some minor fine-tuning of the clustering model can be specified.

df: pd.DataFrame, mandatory

The dataset that contains the features that will be plotted.

cols: list, mandatory

A list of two str elements, which are the column names.

c_type: str, optional

The type of clustering model. The available options are DBSCAN, OPTICS and k-means. If no type is specified, k-means will be performed.

number: int, optional

The number of clusters. This parameter is only used for k-means. The default value is the smallest number of clusters with an inertia value less than 50.

min_sample: int, optional

The minimum number of samples in a cluster. This parameter is only used for DBSCAN and OPTICS. The default value is 3.

eps: float, optional

The maximum distance between samples for them to fall into the same cluster. This parameter is only used for DBSCAN. The default value is 0.5.

lab1: str, optional

The label for axis 1. The default option is the name of the column in the data frame.

lab2: str, optional

The label for axis 2. The default option is the name of the column in the data frame.

path: str, optional

The directory path to save the plot in. The plot will not be saved if no path is specified.

name: str, optional

Name of the plot. The default is cluster2d.

fig: plt.figure, ax: axes.Axes

The figure and axes of the plot.

> cluster_2d(df = data, cols = ['age','height'], c_type = "OPTICS", lab1 = "Age", lab2 = "Height")
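To show how eps and min_sample interact in density-based clustering, here is a minimal DBSCAN sketch in plain Python. It follows the textbook algorithm, in which a point is a core point when at least min_samples points (itself included) lie within eps of it. The function name dbscan_labels is hypothetical, and this is not the code hypehd runs.

```python
import math


def dbscan_labels(points, eps=0.5, min_samples=3):
    """Textbook DBSCAN; returns one label per point, -1 meaning noise."""
    noise, unseen = -1, None
    labels = [unseen] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not unseen:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = noise  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not unseen:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:  # core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels


# Two tight groups and one isolated point (made-up data).
sample = [(0, 0), (0, 0.3), (0.3, 0), (5, 5), (5, 5.3), (5.3, 5), (10, 10)]
labels = dbscan_labels(sample, eps=0.5, min_samples=3)
```

Raising eps merges nearby groups, while raising min_samples turns sparse groups into noise, which is why the two values are usually tuned together.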

hypehd.visualization.graph_3d(df, ax1: str, ax2: str, ax3: str, lab1=None, lab2=None, lab3=None, path=None, name='graph3d')

Plots a three-dimensional graph based on the three columns entered.

df: pd.DataFrame, mandatory

The dataset that contains the features that will be plotted.

ax1: str, mandatory

The name of the column for axis 1 containing the feature to be plotted.

ax2: str, mandatory

The name of the column for axis 2 containing the feature to be plotted.

ax3: str, mandatory

The name of the column for axis 3 containing the feature to be plotted.

lab1: str, optional

The label for axis 1. The default option is the name of the column in the data frame.

lab2: str, optional

The label for axis 2. The default option is the name of the column in the data frame.

lab3: str, optional

The label for axis 3. The default option is the name of the column in the data frame.

path: str, optional

The directory path to save the plot in. The plot will not be saved if no path is specified.

name: str, optional

Name of the plot. The default is graph3d.

fig: plt.figure, ax: axes.Axes

The figure and axes of the plot.

> graph_3d(df = health_data, ax1 = "sbp", ax2 = "dbp", ax3 = "chd", lab1 = "SBP", lab2 = "DBP", lab3 = "CHD")

hypehd.visualization.f_test(group1, group2)

Return the p-value of an F-test given two groups' data.

group1: series or list, mandatory

Continuous values from group 1 for the F-test.

group2: series or list, mandatory

Continuous values from group 2 for the F-test.

p_value: float

The p-value, rounded to 3 decimal places.

> a = [0.28, 0.2, 0.26, 0.28, 0.5]
> b = [0.2, 0.23, 0.26, 0.21, 0.23]
> f_test(a, b)
0.004
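The documentation does not say which F-test is used. The example above is consistent with a one-sided variance-ratio test (sample variance of group 1 divided by that of group 2, compared against an F distribution), so the sketch below assumes that. The names f_test_sketch, reg_inc_beta and _beta_cf are hypothetical, and the incomplete beta function is evaluated with a standard continued fraction so that only the standard library is needed.

```python
import math
from statistics import variance


def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-30
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log(1.0 - x)
    )
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_test_sketch(group1, group2):
    """One-sided variance-ratio F-test p-value, rounded to 3 decimals."""
    f_stat = variance(group1) / variance(group2)
    dfn, dfd = len(group1) - 1, len(group2) - 1
    # P(F > f_stat) for an F(dfn, dfd) distribution.
    p_value = reg_inc_beta(dfd / 2, dfn / 2, dfd / (dfd + dfn * f_stat))
    return round(p_value, 3)


a = [0.28, 0.2, 0.26, 0.28, 0.5]
b = [0.2, 0.23, 0.26, 0.21, 0.23]
p = f_test_sketch(a, b)  # 0.004, matching the documented example
```

Under this assumption the argument order matters: swapping the groups inverts the variance ratio and gives a different p-value.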

hypehd.visualization.demo_graph(var: list, input_data: pandas.DataFrame, group=None)

Show the count of categorical characteristic variables in each group and combine with a summary table. For continuous characteristic variables, show a box plot for each group and combine with a summary table.

var: list, mandatory

List of the characteristic variables. The list can include both categorical and continuous variables. The function detects the type of each variable automatically and uses the appropriate plot.

input_data: pd.DataFrame, mandatory

The input dataset.

group: names of variables in input_data, optional

Grouping variables that will produce plots and summary tables with different colors (e.g. treatment group).

(fig_list, ax_list): tuple of list of Figure, list of axes.Axes

The matplotlib figures and axes containing the plots and summary tables.

See also: longitudinal_graph

> demo_graph(var=['gender','age'], input_data=data, group="treatment")

hypehd.visualization.longitudinal_graph(outcome: list, time, group, input_data: pandas.DataFrame)

Show the scatter plot of outcome means over time in each group and combine with a summary table. Function for longitudinal data analysis.

outcome: list, mandatory

List of the continuous outcome (y) variables to be plotted.

time: names of variables in input_data, mandatory

Time variable (x), e.g. visit number.

group: names of variables in input_data, mandatory

Grouping variables that will produce plots and summary tables with different colors (e.g. treatment group).

input_data: pd.DataFrame, mandatory

The input dataset.

(fig_list, ax_list): tuple of list of Figure, list of axes.Axes

The matplotlib figures and axes containing the plots and summary tables.

See also: demo_graph

> longitudinal_graph(outcome=["change_from_baseline"], time="visit", group="treatment", input_data=data)
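The quantity plotted here is the mean of each outcome per group and time point. A minimal sketch of that aggregation follows, using plain dictionaries as rows so that it runs without pandas. The function name outcome_means and the sample rows are made up for illustration, and this is not hypehd's own code.

```python
from collections import defaultdict
from statistics import mean


def outcome_means(rows, outcome, time, group):
    """Mean of `outcome` for every (group, time) pair; rows are dicts."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row[group], row[time])].append(row[outcome])
    return {key: mean(values) for key, values in sorted(cells.items())}


# Made-up records: two treatment arms observed over two visits.
rows = [
    {"treatment": "A", "visit": 1, "change_from_baseline": 1.0},
    {"treatment": "A", "visit": 1, "change_from_baseline": 3.0},
    {"treatment": "A", "visit": 2, "change_from_baseline": 4.0},
    {"treatment": "B", "visit": 1, "change_from_baseline": 0.5},
]
means = outcome_means(rows, "change_from_baseline", "visit", "treatment")
```

With a pandas DataFrame, the same table would typically come from grouping on the group and time columns and taking the mean of the outcome column.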

hypehd.visualization.relation(df, gtype=3, path=None, name_chi='chiheatmap', name_cor='corheatmap')

Plots a heat-map of the relationship between features of the same type. If type of feature (numerical or categorical) is not specified, both heat-maps will be drawn. The measure used is chi-squared for categorical and correlation for numerical types. This function does not calculate the relationship between categorical and numerical values.

df: pd.DataFrame, mandatory

The dataset.

gtype: int, optional

Type of feature: 1 for categorical (chi-squared), 2 for numerical (correlation), any other number for both. The default is 3 (both).

path: str, optional

The directory path to save the plot in. The plot will not be saved if no path is specified.

name_chi: str, optional

Name of the plot for the categorical features. The default is chiheatmap.

name_cor: str, optional

Name of the plot for the numerical features. The default is corheatmap.

list: plt.figure, axes.Axes, p <Object containing both figure and axes>

A list containing the figure and axes of each drawn plot.

> relation(df = health_data, gtype = 2, path = "/Users/Person/Documents", name_cor = "numerical_heatmap")
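For the numerical case, the heat-map is built from pairwise correlations between columns. The sketch below computes such a matrix, assuming Pearson correlation (the documentation only says "correlation"). The names pearson and correlation_matrix and the sample columns are hypothetical, and the code is not taken from hypehd.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations; `columns` maps name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up numeric columns.
columns = {
    "sbp": [1.0, 2.0, 3.0, 4.0],
    "dbp": [2.0, 4.0, 6.0, 8.0],
    "chd": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(columns)
```

Each cell of the resulting matrix corresponds to one cell of the heat-map, with values near 1 or -1 marking strongly related columns.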

hypehd.visualization.survival_analysis(time, censor_status, group, input_data: pandas.DataFrame)

Show the Kaplan-Meier curve and combine with a median survival time summary. Function for survival data analysis.

time: names of time variables in input_data, mandatory

Time to event of interest.

censor_status: names of variables in input_data, mandatory

True (1) if the event of interest was observed, False (0) if it was not observed (right-censored).

group: names of variables in input_data, mandatory

Grouping variables that will produce plots and summary tables with different colors (e.g. treatment group).

input_data: pd.DataFrame, mandatory

The input dataset.

fig, ax: Figure, axes.Axes

The matplotlib figure and axes containing the plot and summary table.

> survival_analysis(time="time_to_event", censor_status="censor", group="treatment", input_data=data)
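As a reminder of what the curve and the median summary represent, here is a minimal Kaplan-Meier sketch for a single group, using the same coding as censor_status (1 when the event was observed, 0 when right-censored). The function names and the sample data are hypothetical. This illustrates the estimator itself and is not hypehd's implementation, which also splits the data by group.

```python
from collections import Counter


def kaplan_meier(times, observed):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    events = Counter(t for t, seen in zip(times, observed) if seen)
    leaving = Counter(times)  # events and censored subjects leaving at t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        deaths = events.get(t, 0)
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival_time(curve):
    """Smallest time at which S(t) drops to 0.5 or below, else None."""
    for t, survival in curve:
        if survival <= 0.5:
            return t
    return None


# Made-up data: the subject at time 3 is right-censored.
times = [1, 2, 3, 4, 5]
observed = [1, 1, 0, 1, 1]
curve = kaplan_meier(times, observed)
median = median_survival_time(curve)
```

A censored subject lowers the number at risk without producing a drop in the curve, which is why the step at time 4 is larger than the earlier ones. When the curve never reaches 0.5, the median is undefined and the sketch returns None.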

hypehd.visualization.boxplot_grid(df, col1=None, col2=None, col3=None)

A function for creating a grid of box plots with two options. In the first, no column is specified and the grid contains box plots of all numeric features in the dataset. In the second, three columns are specified, one numeric and the other two categorical, and the grid shows the numeric values broken down by the two categorical columns. Unless all three columns are specified, the first option will be performed.

df: pd.DataFrame, mandatory

The dataset holding the data that will be plotted.

col1: str, optional

The categorical column that the grid will be split based on.

col2: str, optional

The categorical column on the x-axis of each plot.

col3: str, optional

The numerical column on the y-axis of each plot.

fig, ax: Figure, array of axes.Axes

The matplotlib figure and axes containing the plots.

> boxplot_grid(df=health_data, col1="months", col2="sex", col3="BMI")

hypehd.visualization.pie(df, col, path=None, name='pie_chart')

Draws a pie chart of the specified column. If a path is given, a PNG file of the chart is saved at that path, by default under the name pie_chart.png. This chart should be used for columns with discrete values.

df: pd.DataFrame, mandatory

The dataset that contains the feature that will be plotted.

col: str, mandatory

The name of the column containing the feature to be plotted.

path: str, optional

Path to the directory that the PNG file of the chart will be saved in. If left empty, the file will not be saved.

name: str, optional

Name of the PNG image of the chart. The default is pie_chart.

ax: axes.Axes

The axes of the plot.

> pie(df = demographic_data, col = "sex", path = "/Users/Person/Documents", name = "demo_pie")
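A pie chart of a discrete column reduces to the share of each distinct value. The sketch below computes those shares and shows how path and name would combine into a file location, assuming the file is simply name plus ".png" inside the given directory, as the description above suggests. Both helper names are hypothetical, and the code is not hypehd's.

```python
from collections import Counter
from pathlib import PurePosixPath


def pie_shares(values):
    """Fraction of the total taken by each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def output_file(path, name="pie_chart"):
    """Assumed location of the saved image: <path>/<name>.png."""
    return str(PurePosixPath(path) / f"{name}.png")


# Made-up column values.
shares = pie_shares(["F", "M", "F", "F"])
target = output_file("/Users/Person/Documents", name="demo_pie")
```

Columns with many distinct values produce many thin slices, which is one reason the chart is recommended only for discrete columns.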