hypehd.visualization
Module Contents
Functions
|
Plots a three-dimensional graph of clusters for the three specified numerical columns. |
|
Plots a two-dimensional graph of clusters for the two specified numercial columns. |
|
Plots a three-dimensional graph based on the three columns entered. |
|
Return the p-value given two groups' data. |
|
Show the count of categorical characteristic variables in each group and combine with a summary table. |
|
Show the scatter plot of outcome means over time in each group and combine with a summary table. Function for |
|
Plots a heat-map of the relationship between features of the same type. If type of feature |
|
Show the kaplan-meier curve and combine with a median survival time summary. |
|
A function for creating a grid of box plots with two options. One being that |
|
Draws a pie chart of the specified column. If the path is given the png file of |
- hypehd.visualization.cluster_3d(df, cols, c_type='k-means', number=None, min_sample=3, eps=0.5, lab1=None, lab2=None, lab3=None, legend=False, path=None, name='cluster3d')
Plots a three-dimensional graph of clusters for the three specified numerical columns. The type of clustering as well as some minor fine-tuning of clustering model are available.
- dfpd.DataFrame, mandatory
The dataset that contains the feature that will be plotted.
- colslist, mandatory
A list with three element of type str which are the column names.
- c_typestr, optional
The type of clustering model. The available options are DBSCAN, OPTICS and k-means. If no type is specified, k-means will be preformed.
- numberint, optional
The number of clusters. This parameter is only used for k-means. The default value is the smallest number of clusters with an inertia value less than 50.
- min_sampleint, optional
The minimum number of samples in a cluster. This parameter is only used for DBSCAN and OPTICS. The default value is 3.
- eps: float, optional
The maximum distance between samples for them to fall into the same cluster. This parameter is only used for DBSCAN. The default value is 0.5.
- lab1str, optional
The label for axis 1. The default option is the name of the column in the data frame.
- lab2str, optional
The label for axis 2. The default option is the name of the column in the data frame.
- lab3str, optional
The label for axis 3. The default option is the name of the column in the data frame.
- legendbool, optional
Show legend or not. default is False.
- pathstr, optional
The directory path to save the plot in. Plot will not be saved if not specified.
- namestr, optional
Name of the plot. The default is cluster3d.
fig : plt.figure ax : axes.Axes The figure and axes of the plot.
> cluster_3d(df = data, cols = [‘age’,’height’,’BMI’], c_type = “DBSCAN”, lab1 = “Age”, lab2 = “Height”)
- hypehd.visualization.cluster_2d(df, cols, c_type='k-means', number=None, min_sample=3, eps=0.5, lab1=None, lab2=None, path=None, name='cluster2d')
Plots a two-dimensional graph of clusters for the two specified numercial columns. The type of clustering as well as some minor fine-tuning of clustering model are available.
- dfpd.DataFrame, mandatory
The dataset that contains the feature that will be plotted.
- colslist, mandatory
A list with two element of type str which are the column names.
- c_typestr, optional
The type of clustering model. The available options are DBSCAN, OPTICS and k-means. If no type is specified, k-means will be preformed.
- numberint, optional
The number of clusters. This parameter is only used for k-means. The default value is the smallest number of clusters with an inertia value less than 50.
- min_sampleint, optional
The minimum number of samples in a cluster. This parameter is only used for DBSCAN and OPTICS. The default value is 3.
- eps: float, optional
The maximum distance between samples for them to fall into the same cluster. This parameter is only used for DBSCAN. The default value is 0.5.
- lab1str, optional
The label for axis 1. The default option is the name of the column in the data frame.
- lab2str, optional
The label for axis 2. The default option is the name of the column in the data frame.
- pathstr, optional
The directory path to save the plot in. Plot will not be saved if not specified.
- namestr, optional
Name of the plot. The default is cluster2d.
fig : plt.figure ax : axes.Axes The figure and axes of the plot.
> cluster_2d(df = data, cols = [‘age’,’height’], c_type = “OPTICS”, lab1 = “Age”, lab2 = “Height”)
- hypehd.visualization.graph_3d(df, ax1: str, ax2: str, ax3: str, lab1=None, lab2=None, lab3=None, path=None, name='graph3d')
Plots a three-dimensional graph based on the three columns entered.
- dfpd.DataFrame, mandatory
The dataset that contains the feature that will be plotted.
- ax1str, mandatory
The name of the column for axis 1 containing the feature to be plotted.
- ax2str, mandatory
The name of the column for axis 2 containing the feature to be plotted.
- ax3str, mandatory
The name of the column for axis 3 containing the feature to be plotted.
- lab1str, optional
The label for axis 1. The default option is the name of the column in the data frame.
- lab2str, optional
The label for axis 2. The default option is the name of the column in the data frame.
- lab3str, optional
The label for axis 3. The default option is the name of the column in the data frame.
- pathstr, optional
The directory path to save the plot in. Plot will not be saved if not specified.
- namestr, optional
Name of the plot. The default is graph3d.
fig : plt.figure ax : axes.Axes The figure and axes of the plot.
> graph_3d(df = health_data, ax1 = “sbp”, ax2 = “dbp”, ax3 = “chd”, lab1 = “SBP”, lab2 = “DBP”, lab3 = “CHD”)
- hypehd.visualization.f_test(group1, group2)
Return the p-value given two groups’ data.
- group1series or list, mandatory
Containing continuous numbers for F-test from group 1.
- group2series or list, mandatory
Containing continuous numbers for F-test from group 2.
- p_valuefloat
A number round to 3 decimal places.
> a = [0.28, 0.2, 0.26, 0.28, 0.5] > b = [0.2, 0.23, 0.26, 0.21, 0.23] > f_test(a, b) 0.004
- hypehd.visualization.demo_graph(var: list, input_data: pandas.DataFrame, group=None)
Show the count of categorical characteristic variables in each group and combine with a summary table. Show the boxplot of countinous characteristic variables in each group and combine with a smmary table.
- varlist, mandatory
List of the characteristic variables. The list can include both categorical and countinous variables. The function can automatically detect its type and then use proper plot.
- input_datapd.DataFrame, mandatory
Input dataset name.
- groupnames of variables in input_data, optional
Grouping variables that will produce plottings and summary tables with different colors (e.g. treatment group).
tuple (fig_list, ax_list) : list of Figure, list of axes.Axes The matplotlib figures and axes containing the plots and summary tables.
longitudinal_graph
> demo_graph(var=[‘gender’,’age’], input_data=data, group=”treatment”)
- hypehd.visualization.longitudinal_graph(outcome: list, time, group, input_data: pandas.DataFrame)
Show the scatter plot of outcome means over time in each group and combine with a summary table. Function for longitudinal data analysis.
- outcomelist, mandatory
List of the continuous outcome(y) variables need to be plotted.
- timenames of variables in input_data, mandatory
Time variables(x)(e.g. visit number).
- groupnames of time variables in input_data, mandatory
Grouping variables that will produce plottings and summary tables with different colors (e.g. treatment group).
- input_datapd.DataFrame, mandatory
Input dataset name.
tuple (fig_list, ax_list) : list of Figure, list of axes.Axes The matplotlib figures and axes containing the plots and summary tables.
demo_graph
> longitudinal_graph(outcome=[“change_from_baseline”], time=”visit”, group=”treatment”, input_data=data)
- hypehd.visualization.relation(df, gtype=3, path=None, name_chi='chiheatmap', name_cor='corheatmap')
Plots a heat-map of the relationship between features of the same type. If type of feature (numerical or categorical) is not specified, both heat-maps will be drawn. The measure used is chi-squared for categorical and correlation for numerical types. This function does not calculate the relationship between categorical and numerical values.
- dfpd.DataFrame, mandatory
The dataset.
- gtypeint, optional
Type of feature. 1 for categorical (chi-squared), 2 for numerical (correlation) any other number for both. The default is 3 (both).
- pathstr, optional
The directory path to save the plot in. Plot will not be saved if not specified.
- name_chistr, optional
Name of the plot for the categorical features. The default is chiheatmap.
- name_cor: str, optional
Name of the plot for the numerical features. The default is corheatmap.
list : plt.figure, axes.Axes, p <Object containing both figure and axes> A list containing the figure and axes of each drawn plot.
> relation(df = health_data, gtype = 2, path = “/Users/Person/Documents”, name_cor = “numerical_heatmap”)
- hypehd.visualization.survival_analysis(time, censor_status, group, input_data: pandas.DataFrame)
Show the kaplan-meier curve and combine with a median survival time summary. Function for survival data analysis.
- timenames of time variables in input_data, mandatory
Time to event of interest.
- censor_statusnames of variables in input_data, mandaory
True(1) if the event of interest was observed, False(0) if the event was lost (right-censored).
- groupnames of time variables in input_data, mandatory
Grouping variables that will produce plottings and summary tables with different colors (e.g. treatment group).
- input_datapd.DataFrame, mandatory
Input dataset name.
fig, ax : Figure, axes.Axes The matplotlib figure and ax containing the plot and summary table.
> survival_analysis(time=”time_to_event”, censor_status=”censor”, group=”treatment”, input_data=data)
- hypehd.visualization.boxplot_grid(df, col1=None, col2=None, col3=None)
A function for creating a grid of box plots with two options. One being that no column was specified, in this case the grid will be box plots of all numeric features in the dataset. The other being that three columns were specified, with one column being numeric and the others being categorical. In this case the grid will be of the numeric value on the basis of the two categorical values. If all three columns are specified, then the first case will be preformed.
- dfpd.DataFrame, mandatory
The dataset holding the data that will be plotted.
- col1str, optional
The categorical column that the grid will be split based on.
- col2str, optional
The categorical column on the x-axis of each plot.
- col3str, optional
The numerical column on the y-axis of each plot.
fig, ax : Figure, array of axes.Axes The matplotlib figure and axes containing the plots.
> boxplot_grid(df=health_data, col1=”months”, col2=”sex”, col3=”BMI”)
- hypehd.visualization.pie(df, col, path=None, name='pie_chart')
Draws a pie chart of the specified column. If the path is given the png file of the chart will be saved under the name of pie_chart.png at the specified path. This chart should be used for columns with discrete values.
- dfpd.DataFrame, mandatory
The dataset that contains the feature that will be plotted.
- colstr, mandatory
The name of the column containing the feature to be plotted.
- pathstr, optional
Path to the directory that the png file of the chart will be saved in. If left empty, the file will not be saved.
- namestr, optional
Name of the png image of the chart. The default is pie_chart.
ax : axes.Axes The axes of the plot.
> pie(df = demographic_data, col = “sex”, path = “/Users/Person/Documents”, name = “demo_pie”)