make_3d_plots.py – Make 3D PCoA plots
Description:
This script automates the construction of 3D plots (kinemage format) from the PCoA output file generated by principal_coordinates.py (e.g. P1 vs. P2 vs. P3, P2 vs. P3 vs. P4, etc., where P1 is the first component).
Usage: make_3d_plots.py [options]
Input Arguments:
[REQUIRED]
- -i, --coord_fname
- Input principal coordinates filepath (i.e., resulting file from principal_coordinates.py). Alternatively, a directory containing multiple principal coordinates files for jackknifed PCoA results.
- -m, --map_fname
- Input metadata mapping filepath
[OPTIONAL]
- -b, --colorby
- Comma-separated list of metadata categories (column headers) to color by in the plots. Each category must exactly match a column header in the mapping file. Multiple categories can be listed by separating them with commas, without spaces. The user can also combine columns in the mapping file by separating the categories with “&&”, without spaces. [default=color by all]
- -s, --scaling_method
- Comma-separated list of scaling methods (i.e. scaled or unscaled) [default=unscaled]
- -a, --custom_axes
- This is the category from the metadata mapping file to use as a custom axis in the plot. For instance, if there is a pH category and you would like to see the samples plotted on that axis instead of PC1, PC2, etc., one can use this option. It is also useful for plotting time-series data. Note: if there is any non-numeric data in the column, it will not be plotted [default: None]
- -p, --prefs_path
- Input user-generated preferences filepath. NOTE: This is a file with a dictionary containing preferences for the analysis. [default: None]
- -k, --background_color
- Background color to use in the plots. [default: black]
- --ellipsoid_smoothness
- Used only when plotting ellipsoids for jackknifed beta diversity (i.e. using a directory of coord files instead of a single coord file). Valid choices are 0-3. A value of 0 produces very coarse “ellipsoids” but is fast to render. If you encounter a memory error when generating or displaying the plots, try including just one metadata column in your plot. If you still have trouble, reduce the smoothness level to 0. [default: 1]
- --ellipsoid_opacity
- Used only when plotting ellipsoids for jackknifed beta diversity (i.e. using a directory of coord files instead of a single coord file). The valid range is between 0-1. 0 produces completely transparent (invisible) ellipsoids and 1 produces completely opaque ellipsoids. [default=0.33]
- --ellipsoid_method
- Used only when plotting ellipsoids for jackknifed beta diversity (i.e. using a directory of coord files instead of a single coord file). Valid values are “IQR” and “sdev”. [default=IQR]
- --master_pcoa
- Used only when plotting ellipsoids for jackknifed beta diversity (i.e. using a directory of coord files instead of a single coord file). These coordinates will be the center of each ellipsoid. [default: None; arbitrarily chosen PC matrix will define the center point]
- -t, --taxa_fname
- Used only when generating BiPlots. Input summarized taxa filepath (i.e., from summarize_taxa.py). Taxa will be plotted with the samples. [default=None]
- --n_taxa_keep
- Used only when generating BiPlots. This is the number of taxa to display. Use -1 to display all. [default: 10]
- --biplot_output_file
- Used only when generating BiPlots. Output coordinates filepath when generating a biplot. [default: None]
- --output_format
- Output format. If this option is set to invue you will need to also use the option -b to define which column(s) from the metadata file the script should use when writing an output file. [default: king]
- -n, --interpolation_points
- Used only when generating inVUE plots. Number of points between samples for interpolation. [default: 0]
- --polyhedron_points
- Used only when generating inVUE plots. The number of points to be generated when creating a frame around the PCoA plots. [default: 4]
- --polyhedron_offset
- Used only when generating inVUE plots (i.e. with the invue output_format). The offset to be added to each point created when using the --polyhedron_points option. [default: 1.5]
- --add_vectors
- Create vectors based on a column of the mapping file. This parameter accepts up to 2 columns: (1) the column used to create the vectors, (2) the column used to sort them. To group by Species and order by SampleID, pass --add_vectors=Species; to group by Species but order by DOB, pass --add_vectors=Species,DOB. This is useful in combination with the --custom_axes parameter. [default: None]
- --vectors_algorithm
- The algorithm used to create the vectors. The method can be RMS (using either ‘avg’ or ‘trajectory’), the first difference (‘diff’), or a modified first difference (‘wdiff’; see --window_size). All of these methods use all the dimensions, weight them by their percentage explained, and return the norm of the created vectors together with their confidence using ANOVA. The vectors are created as follows: ‘avg’ calculates the average at each timepoint (averaging within a group), then calculates the norm of each point; ‘trajectory’ calculates the norm for the 1st-2nd, 2nd-3rd, etc. pairs of points; ‘diff’ calculates the norm for all the timepoints and then calculates the first difference for each resulting point; ‘wdiff’ uses the same procedure as ‘diff’, but the subtraction is between the mean of the next --window_size elements and the current element. Both ‘diff’ and ‘wdiff’ also report the mean and the standard deviation of the calculations. [default: None]
- --vectors_axes
- The number of axes to account for in the vector-specific calculations. We suggest using 3, because those are the axes displayed in the plots, but you can use any number between 1 and the number of samples - 1. To use all of them, pass 0. [default: 3]
- --vectors_path
- Name of the file in which to save the first difference, or the root mean square (RMS), of the vectors grouped by the column used with the --add_vectors option. Note that this option only works with --add_vectors. The file is created inside the output_dir and its name starts with the word ‘Vectors’. [default: vectors_output.txt]
- -w, --weight_by_vector
- Use -w when you want the output written to --vectors_path to be weighted by the spacing between samples in the sorting column given to --add_vectors (e.g. days between samples). [default: False]
- --window_size
- Use --window_size when selecting the modified first difference (‘wdiff’) option for --vectors_algorithm. This integer determines the number of elements averaged for each element subtraction when building the resulting vector. [default: None]
- -o, --output_dir
- Path to the output directory
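The --vectors_algorithm descriptions above can be read as the following minimal Python sketch. It is one interpretation of the option text, not the script's actual implementation: the function names are hypothetical, the weighting by percentage explained and the ANOVA step are omitted, and the windowed difference is computed only where a full window is available (an assumption about edge handling).

```python
import math

def norm(point):
    """Euclidean norm of one point (a sequence of coordinates)."""
    return math.sqrt(sum(x * x for x in point))

def trajectory(points):
    """Norm of each step between consecutive points (1st-2nd, 2nd-3rd, ...)."""
    return [norm([b - a for a, b in zip(p, q)])
            for p, q in zip(points, points[1:])]

def first_difference(values):
    """'diff' style: difference between each value and the next one."""
    return [b - a for a, b in zip(values, values[1:])]

def windowed_difference(values, window_size):
    """'wdiff' style: mean of the next `window_size` values minus the current one.

    Assumption: positions without a full window ahead of them are skipped.
    """
    result = []
    for i in range(len(values) - window_size):
        window = values[i + 1:i + 1 + window_size]
        result.append(sum(window) / window_size - values[i])
    return result

# Three timepoints of one group, already sorted by the ordering column.
points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
point_norms = [norm(p) for p in points]
```

With these example points, `trajectory(points)` gives the two step lengths, while `first_difference(point_norms)` and `windowed_difference(point_norms, 2)` operate on the per-timepoint norms.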
Output:
By default, the script will plot the first three dimensions in your file. Other combinations can be viewed using the “Views:Choose viewing axes” option in the KiNG viewer (Chen, Davis, & Richardson, 2009), which may require the installation of kinemage software. The first 10 components can be viewed using the “Views:Parallel coordinates” option or by typing “/”. The mouse can be used to modify display parameters, to click and rotate the viewing axes, to select specific points (clicking on a point shows the sample identity in the lower left corner), or to select different analyses (upper right window). Although samples are most easily viewed in 2D, the third dimension is indicated by coloring each sample (dot/label) along a gradient corresponding to the depth along the third component (bright colors indicate points close to the viewer).
Default Usage:
If you just want to use the default output, you can supply the principal coordinates file (i.e., resulting file from principal_coordinates.py) and a user-generated mapping file, where the default coloring will be based on the SampleID as follows:
make_3d_plots.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt
Mapping File Usage by Category:
Additionally, the user can supply their mapping file (‘-m’) and a specific category to color by (‘-b’) or any combination of categories. When using the -b option, the user can specify the coloring for multiple mapping labels, where each mapping label is separated by a comma, for example: -b ‘mapping_column1,mapping_column2’. The user can also combine mapping labels and color by the combined label that is created by inserting an ‘&&’ between the input columns, for example: -b ‘mapping_column1&&mapping_column2’.
make_3d_plots.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -b 'Treatment&&DOB'
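To make the -b syntax concrete, here is a small Python sketch of how such a value could be split into categories and how a combined label could be built for one sample. The helper names and the way the column values are joined (plain concatenation) are assumptions for illustration only; the script's own handling may differ.

```python
def parse_colorby(colorby):
    """Split a -b value into categories; '&&' marks columns to combine."""
    return [entry.split('&&') for entry in colorby.split(',')]

def combined_label(sample_row, columns, joiner=''):
    """Build the label for one sample from one or more mapping columns.

    Assumption: values are concatenated with `joiner` (empty by default).
    """
    return joiner.join(sample_row[column] for column in columns)

# One row of a mapping file, keyed by column header.
row = {'SampleID': 'PC.354', 'Treatment': 'Control', 'DOB': '20061218'}
categories = parse_colorby('Treatment&&DOB,Treatment')
labels = [combined_label(row, columns) for columns in categories]
```

Here the first category combines two columns into a single label and the second uses the Treatment column on its own.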
Color All Categories:
If the user would like to color all categories in their metadata mapping file, they should omit -b (the default is to color by all categories):
make_3d_plots.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt
Prefs File Example:
As an alternative, the user can supply a preferences (prefs) file, using the -p option. The prefs file allows the user to give specific samples their own colors within a given mapping column. This file also allows the user to apply a color gradient for a specific mapping column. If the user wants to color by using the prefs file (e.g. prefs.txt), they can use the following code:
make_3d_plots.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -p prefs.txt
Output Directory:
If you want to give a specific output directory (e.g. ‘3d_plots’), use the following code:
make_3d_plots.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -o 3d_plots
Background Color Example:
If the user would like to color the background white they can use the ‘-k’ option as follows:
make_3d_plots.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -k white
Jackknifed Principal Coordinates (w/ confidence intervals):
If you have created jackknifed PCoA files, you can pass the folder containing those files instead of a single file. The user can also specify the opacity of the ellipsoids around each point with ‘--ellipsoid_opacity’, which is a value from 0-1. Two methods are currently available through ‘--ellipsoid_method’ for generating the ellipsoids: ‘IQR’ and ‘sdev’. The user can specify all of these options as follows:
make_3d_plots.py -i pcoa -m Fasting_Map.txt -b 'Treatment&&DOB' --ellipsoid_opacity=0.5 --ellipsoid_method=IQR
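As a rough illustration of the difference between the ‘IQR’ and ‘sdev’ methods, the Python sketch below computes a per-axis spread for one sample across its jackknife replicates. Everything here is an assumption made for illustration: the function names are hypothetical, the spread is taken independently on each axis, the quartiles use the default method of Python's statistics.quantiles, and the first replicate stands in for the "arbitrarily chosen" center when no master coordinates are given.

```python
import statistics

def axis_spread(values, method="IQR"):
    """Spread of one axis across jackknife replicates."""
    if method == "IQR":
        q1, _, q3 = statistics.quantiles(values, n=4)
        return q3 - q1
    if method == "sdev":
        return statistics.stdev(values)
    raise ValueError("method must be 'IQR' or 'sdev'")

def ellipsoid_for_sample(replicates, method="IQR", center=None):
    """Return (center, radii) for one sample.

    replicates: list of (x, y, z) tuples, one per jackknife replicate.
    center: master coordinates if available; otherwise the first
            replicate is used (an arbitrary choice for this sketch).
    """
    if center is None:
        center = replicates[0]
    axes = list(zip(*replicates))  # regroup the coordinates by axis
    radii = tuple(axis_spread(axis, method) for axis in axes)
    return center, radii

# Seven replicates of one sample; the second axis does not vary at all.
replicates = [(float(i), 0.0, 2.0 * i) for i in range(1, 8)]
center, radii = ellipsoid_for_sample(replicates, method="IQR")
```

An axis with no variation across replicates gets a radius of zero under either method, so the ellipsoid is flat in that direction.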
Bi-Plots:
If the user would like to see which taxa are more prevalent in different areas of the PCoA plot, they can generate Bi-Plots by passing a principal coordinates file or folder (‘-i’), a mapping file (‘-m’), and a summarized taxa file (‘-t’) from summarize_taxa.py. Bi-Plots can be combined with jackknifed principal coordinates.
make_3d_plots.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -t otu_table_L3.txt
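The idea behind a biplot can be sketched in a few lines of Python. This is not the script's implementation: the function names are hypothetical, ranking taxa by their summed abundance is an assumed reading of --n_taxa_keep, and placing each taxon at the abundance-weighted average of the sample coordinates is a common biplot convention assumed here for illustration.

```python
def top_taxa(abundances, n_taxa_keep=10):
    """Rank taxa by summed abundance; -1 keeps all of them."""
    ranked = sorted(abundances, key=lambda taxon: sum(abundances[taxon]),
                    reverse=True)
    return ranked if n_taxa_keep == -1 else ranked[:n_taxa_keep]

def weighted_centroid(sample_coords, weights):
    """Average of the sample coordinates, weighted by taxon abundance."""
    total = sum(weights)
    n_axes = len(sample_coords[0])
    return tuple(
        sum(w * coords[axis] for w, coords in zip(weights, sample_coords)) / total
        for axis in range(n_axes)
    )

# Relative abundance of each taxon in two samples (illustrative values).
abundances = {
    'Bacteroidetes': [0.5, 0.25],
    'Firmicutes': [0.375, 0.75],
    'Proteobacteria': [0.125, 0.0],
}
sample_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
kept = top_taxa(abundances, n_taxa_keep=2)
position = weighted_centroid(sample_coords, [1.0, 3.0])
```

A taxon that is three times more abundant in the second sample ends up three quarters of the way toward that sample along the first axis.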