The STK has been split so that its main functionality can be imported into your own Python scripts and used programmatically. If you have downloaded the source package, a number of example scripts are provided in the stk/scripts directory. If you know Python, simply import the stk.supertree_toolkit module. If you don’t, go and learn Python: it’s very useful.

Below is a description of the functions that are available in the API.

stk.supertree_toolkit.add_historical_event(XML, event_description)

Add a historical event element to the XML. The element contains a description of the event, and the current date will be added.

stk.supertree_toolkit.all_sourcenames(XML, trees=False)

Create a sensible source name for all sources in the current dataset. This includes appending a, b, etc. to duplicate names.
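The duplicate handling can be pictured with a small stand-alone sketch. dedupe_source_names is a hypothetical helper, not part of STK, and it assumes every copy of a duplicated name (including the first) receives a suffix; check the real output of all_sourcenames before relying on that.

```python
from collections import Counter
from string import ascii_lowercase


def dedupe_source_names(names):
    """Append a, b, c, ... to each copy of a name that occurs more than once.

    Hypothetical helper that mimics the documented behaviour; not STK code.
    Assumes every copy (including the first) gets a suffix and that no name
    occurs more than 26 times.
    """
    counts = Counter(names)
    used = Counter()  # how many suffixes each duplicated name has consumed
    result = []
    for name in names:
        if counts[name] > 1:
            result.append(name + ascii_lowercase[used[name]])
            used[name] += 1
        else:
            result.append(name)
    return result


print(dedupe_source_names(["Smith_2009", "Jones_2010", "Smith_2009"]))
# ['Smith_2009a', 'Jones_2010', 'Smith_2009b']
```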

stk.supertree_toolkit.amalgamate_trees(XML, format='nexus', anonymous=False, ignoreWarnings=False)

Create a string containing all trees in the XML. The string can be formatted as Nexus, Newick or TNT. Only Nexus formatting takes the anonymous flag into account; the other two formats are anonymous anyway. If any errors occur, None is returned: checking for this is the caller’s responsibility.
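A minimal sketch of the Nexus case, together with the None check the caller is expected to make. trees_to_nexus is hypothetical, the layout is a plain Nexus TREES block rather than STK’s exact output, and reading anonymous as "replace tree names with tree_1, tree_2, ..." is an assumption.

```python
def trees_to_nexus(trees, anonymous=False):
    """Format {tree_name: newick} as a minimal Nexus TREES block.

    Hypothetical sketch; STK's real output layout may differ.
    Returns None on bad input, mirroring the convention described above.
    """
    if not trees:
        return None
    lines = ["#NEXUS", "", "Begin trees;"]
    for i, (name, newick) in enumerate(trees.items(), start=1):
        label = "tree_%d" % i if anonymous else name  # assumed meaning of anonymous
        newick = newick.strip()
        if not newick.endswith(";"):
            newick += ";"
        lines.append("\ttree %s = %s" % (label, newick))
    lines.append("End;")
    return "\n".join(lines) + "\n"


output = trees_to_nexus({"Smith_2009_1": "(A,(B,C))"})
if output is None:  # the caller, not the function, decides what to do
    raise SystemExit("Could not amalgamate trees")
print(output)
```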

stk.supertree_toolkit.check_subs(XML, new_taxa)

Check a subs file and issue a warning if any of the incoming taxa are not already in the dataset. This is often what is wanted, but sometimes it is not. We run this before doing the subs to alert the user, but they may continue.
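The check amounts to a set difference between the incoming taxa and the taxa already in the dataset. warn_unknown_taxa is a hypothetical stand-in that works on plain Python lists rather than PHYML XML, and it uses the standard warnings module, whereas STK may report the problem differently.

```python
import warnings


def warn_unknown_taxa(dataset_taxa, new_taxa):
    """Warn about incoming taxa that are not already in the dataset.

    Hypothetical helper. new_taxa is a flat list; None entries (deletions)
    are ignored. Returns the unknown taxa so the caller can decide whether
    to carry on with the substitution.
    """
    known = set(dataset_taxa)
    unknown = sorted({t for t in new_taxa if t is not None and t not in known})
    if unknown:
        warnings.warn("Taxa not already in the dataset: " + ", ".join(unknown))
    return unknown


unknown = warn_unknown_taxa(
    ["Gallus_gallus", "Anas_platyrhynchos"],
    ["Gallus_gallus", "Gallus_varius", None],
)
print(unknown)  # ['Gallus_varius']
```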


Cleans up (i.e. deletes) non-informative trees and empty sources. This is the same function as check data, but instead of raising a message it simply fixes the problems.

stk.supertree_toolkit.create_matrix(XML, format='hennig', quote=False, taxonomy=None, outgroups=False, ignoreWarnings=False)

From all trees in the XML, create a matrix

stk.supertree_toolkit.create_matrix_from_trees(trees, format='hennig')

Given a dictionary of trees, create a matrix
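Assuming the matrix is a standard MRP (matrix representation with parsimony) coding, the idea can be sketched in plain Python: every informative clade in every tree becomes one column. clades and mrp_matrix are hypothetical helpers; they ignore branch lengths and internal labels, do not handle quoted names, omit the all-zero outgroup row that MRP matrices often carry, and stop at the 0/1/? coding rather than writing a Hennig or Nexus file.

```python
import re


def clades(newick):
    """Return (leaves, clades) for a Newick string.

    Each clade is a frozenset of leaf names. Branch lengths and internal
    node labels are dropped; quoted names are not handled (assumption).
    """
    tokens = re.findall(r"[(),;]|[^(),;\s]+", newick)
    stack, leaves, found = [], [], []
    prev = None
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            members = stack.pop()
            found.append(frozenset(members))
            if stack:
                stack[-1].extend(members)  # pass leaves up to the parent clade
        elif tok in (",", ";"):
            pass
        elif prev in (None, "(", ","):
            name = tok.split(":")[0]  # strip any branch length
            leaves.append(name)
            if stack:
                stack[-1].append(name)
        prev = tok
    return leaves, found


def mrp_matrix(trees):
    """Baum/Ragan MRP coding: one column per informative clade.

    1 = taxon in the clade, 0 = in the tree but not the clade,
    ? = taxon absent from that tree.
    """
    columns, all_taxa = [], set()
    for newick in trees.values():
        leaves, found = clades(newick)
        tree_taxa = set(leaves)
        all_taxa |= tree_taxa
        for clade in found:
            # skip the root clade and trivial single-taxon clades
            if 1 < len(clade) < len(tree_taxa):
                columns.append((clade, tree_taxa))
    matrix = {}
    for taxon in sorted(all_taxa):
        row = []
        for clade, tree_taxa in columns:
            if taxon in clade:
                row.append("1")
            elif taxon in tree_taxa:
                row.append("0")
            else:
                row.append("?")
        matrix[taxon] = "".join(row)
    return matrix


for taxon, row in mrp_matrix({"t1": "(A,(B,C));", "t2": "((A,B),D);"}).items():
    print(taxon, row)
# A 01
# B 11
# C 1?
# D ?0
```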

stk.supertree_toolkit.create_name(authors, year, append='')

Construct a sensible source name from a list of authors and a year.

Input: authors - list of last (family, sur) names (strings). year - the year (string). append - something to append onto the end of the name.

Output: source_name (string)
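A sketch of one plausible naming rule. make_source_name is hypothetical, and the rule it encodes (one author gives Smith_2009, two give Smith_Jones_2009, three or more give Smith_etal_2009, spaces become underscores) is an assumption rather than a description of what create_name actually does.

```python
def make_source_name(authors, year, append=""):
    """Hypothetical source-name rule (an assumption, not STK's code)."""
    if not authors:
        raise ValueError("at least one author is needed")
    if len(authors) == 1:
        stem = authors[0]
    elif len(authors) == 2:
        stem = authors[0] + "_" + authors[1]
    else:
        stem = authors[0] + "_etal"
    # assumption: spaces in surnames (e.g. "de Queiroz") become underscores
    return (stem + "_" + year + append).replace(" ", "_")


print(make_source_name(["Smith"], "2009"))                         # Smith_2009
print(make_source_name(["Smith", "Jones", "Brown"], "2011", "a"))  # Smith_etal_2011a
```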

stk.supertree_toolkit.create_subset(XML, search_terms, andSearch=True, includeMultiple=True, ignoreWarnings=False)

Create a new dataset which is a subset of the incoming one. search_terms is a dict with the following keys:

years - list of the years to include. An entry can contain two years separated by -, in which case that range is used.
characters - list of characters to include.
character_types - list of character types to include (Molecular, Morphological, Behavioural or Other).
analyses - list of analyses to include (MRP, etc.).
taxa - list of taxa that must be in a source tree.
fossil - all_fossil or all_extant.

Multiple search terms are combined with AND (so a source must be from 2000-2010 and Molecular and contain Gallus gallus) unless andSearch is False, in which case an OR search is used: the example would then match sources from 2000-2010 or Molecular or containing Gallus gallus.

includeMultiple means that a source containing both Molecular and Morphological characters will match a search for Molecular (or, indeed, Morphological). Set it to False if only Molecular is what you’re after (i.e. trees with mixed character sets will be ignored). This applies to characters and character_types only, as the other terms don’t make sense with this option off.

Note: this function is not (yet) taxonomically aware, so Galliformes will only return trees that actually have a leaf called Galliformes. Gallus gallus will not match.

Also note: the tree strings are searched for taxa, not the taxa elements (which are optional).

A new PHYML file will be produced. It is up to the calling function to do something sensible with it.
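The AND/OR logic, year ranges and includeMultiple can be illustrated on a plain dictionary standing in for one source. year_matches and source_matches are hypothetical helpers covering only the years, character_types and taxa keys; treating a taxa list as "all of these must be present" is an assumption based on the wording above.

```python
def year_matches(year, entries):
    """True if year equals an entry or falls in an inclusive "YYYY-YYYY" range."""
    for entry in entries:
        entry = str(entry)
        if "-" in entry:
            start, end = (int(part) for part in entry.split("-"))
            if start <= year <= end:
                return True
        elif int(entry) == year:
            return True
    return False


def source_matches(source, search_terms, and_search=True, include_multiple=True):
    """Hypothetical matcher for one source record (a plain dict, not PHYML)."""
    tests = []
    if "years" in search_terms:
        tests.append(year_matches(source["year"], search_terms["years"]))
    if "character_types" in search_terms:
        wanted = set(search_terms["character_types"])
        have = set(source["character_types"])
        if include_multiple:
            tests.append(bool(wanted & have))  # any overlap is enough
        else:
            tests.append(bool(have) and have <= wanted)  # mixed sets rejected
    if "taxa" in search_terms:
        # plain string comparison: no taxonomic awareness
        tests.append(all(t in source["taxa"] for t in search_terms["taxa"]))
    if not tests:
        return True
    return all(tests) if and_search else any(tests)


source = {
    "year": 2005,
    "character_types": ["Molecular", "Morphological"],
    "taxa": ["Gallus_gallus", "Anas_platyrhynchos"],
}
terms = {"years": ["2000-2010"], "character_types": ["Molecular"], "taxa": ["Gallus_gallus"]}
print(source_matches(source, terms))                          # True
print(source_matches(source, terms, include_multiple=False))  # False
```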

stk.supertree_toolkit.create_tree_name(XML, source_tree_element)

Creates a tree name for a given source: simply the source_name with a number added. source_tree_element is the element that contains the source tree, i.e. sources/source/source_tree.

stk.supertree_toolkit.data_independence(XML, make_new_xml=False, ignoreWarnings=False)

Return a list of sources that are not independent. This is decided based on the source data and the characters.

stk.supertree_toolkit.data_overlap(XML, overlap_amount=2, filename=None, detailed=False, show=False, verbose=False, ignoreWarnings=False)

Calculate the amount of taxonomic overlap between source trees. By default the output is True/False, but you can specify an optional filename, in which case a graphic is saved. For the GUI, the output can also be a PNG graphic to display (and then save).

If filename is None, no graphic is generated. Otherwise a simple graphic is generated showing the number of clusters. If detailed is set to True, a graphic is generated showing all trees. For data containing more than 200 source trees this could be very big and take a long time; more likely, you’ll run out of memory.
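One way to picture the clusters: treat trees as nodes, link two trees when they share at least overlap_amount taxa, then take connected components. overlap_clusters is a hypothetical union-find sketch over plain taxon lists; interpreting the True/False result as "all trees fall into a single cluster" is an assumption, not something stated above.

```python
from itertools import combinations


def overlap_clusters(tree_taxa, overlap_amount=2):
    """Group trees that are linked, directly or indirectly, by shared taxa.

    tree_taxa maps tree name -> list of taxa. Returns a list of sets of
    tree names (the connected components).
    """
    parent = {name: name for name in tree_taxa}

    def find(x):
        # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(tree_taxa, 2):
        if len(set(tree_taxa[a]) & set(tree_taxa[b])) >= overlap_amount:
            parent[find(a)] = find(b)

    clusters = {}
    for name in tree_taxa:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


trees = {"t1": ["A", "B", "C"], "t2": ["B", "C", "D"], "t3": ["E", "F"]}
clusters = overlap_clusters(trees)
sufficient = len(clusters) == 1  # assumed meaning of the True/False output
print(len(clusters), sufficient)  # 2 False
```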

stk.supertree_toolkit.data_summary(XML, detailed=False, ignoreWarnings=False)

Creates a text string that summarises the current data set via a number of statistics such as the number of character types, distribution of years of publication, etc.

It is up to the calling function to display the string nicely.

stk.supertree_toolkit.export_bibliography(XML, filename, format='bibtex')

Export all source papers as a bibliography in bibtex, xml, html, short or long format.

stk.supertree_toolkit.get_all_characters(XML, ignoreErrors=False)

Returns a dictionary containing a list of characters within each character type


From a full XML-PHYML string, extract all source names.

stk.supertree_toolkit.get_all_taxa(XML, pretty=False, ignoreErrors=False)

Produce a taxa list by scanning all trees within a PHYML file.

The list is returned sorted (alphabetically).

Setting pretty=True means all underscores will be replaced by spaces.
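The scan can be sketched over a dict of Newick strings. taxa_from_newick and all_taxa are hypothetical helpers that skip branch lengths and internal labels but do not handle quoted names; sorting after the underscore replacement is a design choice here, since an underscore and a space sort differently.

```python
import re

_TOKENS = re.compile(r"[(),;]|[^(),;\s]+")


def taxa_from_newick(newick):
    """Leaf names in a Newick string, with branch lengths stripped."""
    taxa, prev = [], None
    for tok in _TOKENS.findall(newick):
        # a name is a leaf only if it follows "(", "," or starts the string;
        # after ")" it is an internal label or branch length
        if tok not in {"(", ")", ",", ";"} and prev in (None, "(", ","):
            taxa.append(tok.split(":")[0])
        prev = tok
    return taxa


def all_taxa(trees, pretty=False):
    """Sorted, unique taxa across a dict of {tree_name: newick}."""
    taxa = set()
    for newick in trees.values():
        taxa.update(taxa_from_newick(newick))
    if pretty:
        taxa = {t.replace("_", " ") for t in taxa}
    return sorted(taxa)


trees = {
    "Smith_2009_1": "(Gallus_gallus,(Anas_platyrhynchos:0.2,Struthio_camelus:0.3)90);",
    "Jones_2010_1": "(Gallus_gallus,Meleagris_gallopavo);",
}
print(all_taxa(trees, pretty=True))
# ['Anas platyrhynchos', 'Gallus gallus', 'Meleagris gallopavo', 'Struthio camelus']
```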

stk.supertree_toolkit.get_analyses_used(XML, ignoreErrors=False)

Return a sorted, unique array of all analysis types used in this dataset

stk.supertree_toolkit.get_character_numbers(XML, ignoreErrors=False)

Return the number of trees that use each character

stk.supertree_toolkit.get_character_types_from_tree(XML, name, sort=False)

Get the character types that were used in a particular tree

stk.supertree_toolkit.get_characters_from_tree(XML, name, sort=False)

Get the characters that were used in a particular tree


Return a sorted, unique array of all character names used in this dataset


Return a list of fossil taxa


For each tree, get the outgroup defined in the schema

stk.supertree_toolkit.get_publication_year_tree(XML, name)

Return a dictionary of years and the number of publications within that year


Return a dictionary of years and the number of publications within that year

stk.supertree_toolkit.get_taxa_from_tree(XML, tree_name, sort=False)

Return taxa from a single tree based on name


Get weights for each tree. Returns dictionary of tree name (key) and weights (value)

stk.supertree_toolkit.import_bibliography(XML, bibfile, skip=False)

Create a bunch of sources from a bibtex file. This includes setting the sourcenames for each source.

stk.supertree_toolkit.import_tree(filename, gui=False, tree_no=-1)

Takes a NEXUS formatted file and returns a list containing the tree strings


Return an array of all trees in a file. All formats we’ve come across are supported, but please submit a bug report if a (common-ish) tree file shows up that can’t be parsed.


Super simple function that returns XML string from PHYML file


Parse the XML and obtain all tree strings. Output: dictionary of tree strings, with the key indicating the (unique) tree name.


Reads in a subs file and returns two arrays: new_taxa and the corresponding old_taxa

None is used to indicate deleted taxa.
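A sketch of the parsing, assuming a subs file has one substitution per line in the form old_taxon = new_taxon_1, new_taxon_2, with an empty right-hand side meaning the old taxon is deleted. That format, the # comment lines and the helper name parse_subs_text are all assumptions; check them against a real subs file before relying on them.

```python
def parse_subs_text(text):
    """Return (new_taxa, old_taxa) from subs-file text.

    Assumed format: "old = new1, new2" per line; an empty right-hand side
    means delete, recorded as None. Blank lines and "#" lines are skipped.
    """
    new_taxa, old_taxa = [], []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        old, sep, new = line.partition("=")
        if not sep:
            raise ValueError("line %d has no '=': %r" % (number, raw))
        old_taxa.append(old.strip())
        names = [n.strip() for n in new.split(",") if n.strip()]
        new_taxa.append(names or None)
    return new_taxa, old_taxa


new_taxa, old_taxa = parse_subs_text("Gallus = Gallus_gallus, Gallus_varius\nAnas_sp =\n")
print(old_taxa)  # ['Gallus', 'Anas_sp']
print(new_taxa)  # [['Gallus_gallus', 'Gallus_varius'], None]
```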

stk.supertree_toolkit.permute_tree(tree, matrix='hennig', treefile=None, verbose=False)

Permute a tree where there is uncertainty in taxa location. Output either a tree file or a matrix file of all possible permutations.

Note this is a recursive algorithm.


Read a Nexus or Hennig formatted matrix file. Returns the matrix and taxa.

stk.supertree_toolkit.replace_genera(XML, dry_run=False, ignoreWarnings=False)

Remove all generic taxa by replacing them with a polytomy of all species in the dataset belonging to that genus.

stk.supertree_toolkit.safe_taxonomic_reduction(XML, matrix=None, taxa=None, verbose=False, queue=None, ignoreWarnings=False)

Perform STR (safe taxonomic reduction) on the data to remove taxa that provide no useful additional information. Based on PerlEQ (Jeffery and Wilkinson, unpublished).

stk.supertree_toolkit.set_all_tree_names(XML, overwrite=False)

Set all unset tree names


Ensures all sources have unique names.

stk.supertree_toolkit.single_sourcename(XML, append='')

Create a sensible source name based on the bibliographic data. XML should contain only the xml_root for the source that is to be altered. NOTE: it is the responsibility of the calling process of this function to check for name uniqueness.

From the textual output from STR (safe_taxonomic_reduction), create the subs file to put the C category taxa back into the dataset. We work with the text output as it’s the same as PerlEQ’s, which means this might also work with output from PerlEQ.


Create taxonomic subs from a CSV file, where the first column is the old taxon and all other columns are the new taxa to be subbed in-place
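Reading the CSV layout described above needs only the standard csv module. subs_from_csv_text is hypothetical, and treating a row with no replacement columns as a deletion (None) is an assumption borrowed from the subs file convention, not something stated for the CSV format.

```python
import csv
import io


def subs_from_csv_text(text):
    """Return (old_taxa, new_taxa) from CSV text.

    Column 1 is the old taxon; any further non-empty columns are the new
    taxa. A row with no replacements is treated as a deletion (None).
    """
    old_taxa, new_taxa = [], []
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue  # skip blank rows
        old_taxa.append(cells[0])
        new_taxa.append([c for c in cells[1:] if c] or None)
    return old_taxa, new_taxa


old, new = subs_from_csv_text("Gallus,Gallus_gallus,Gallus_varius\nAnas_sp\n")
print(old)  # ['Gallus', 'Anas_sp']
print(new)  # [['Gallus_gallus', 'Gallus_varius'], None]
```

For a file on disk, the same loop works with csv.reader(open(path, newline="")) in place of the StringIO wrapper.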

stk.supertree_toolkit.substitute_taxa(XML, old_taxa, new_taxa=None, only_existing=False, ignoreWarnings=False, verbose=False, skip_existing=False, generic_match=False)

Swap the taxa in the old_taxa array for the ones in the new_taxa array

If the new_taxa array is missing, simply delete the old_taxa

only_existing will ensure that the new_taxa are already in the dataset

Returns a new XML with the taxa swapped in each tree and any taxon elements for those taxa removed. It’s up to the calling function to do something sensible with this information.

stk.supertree_toolkit.substitute_taxa_in_trees(trees, old_taxa, new_taxa=None, only_existing=False, ignoreWarnings=False, verbose=False, generic_match=False)

Swap the taxa in the old_taxa array for the ones in the new_taxa array

If the new_taxa array is missing, simply delete the old_taxa

only_existing will ensure only taxa in the dataset are subbed in.

Returns a new list of trees with the taxa swapped in each tree. It’s up to the calling function to do something sensible with this information.
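To see what a substitution does to a single tree, here is a sketch that parses a Newick string into nested lists, swaps or deletes leaves, and writes the tree back out, collapsing any node left with only one child. All helper names are hypothetical; turning a one-to-many substitution into a polytomy is an assumption, and branch lengths and internal labels are dropped.

```python
import re

_TOKENS = re.compile(r"[(),;]|[^(),;\s]+")


def parse_newick(newick):
    """Nested lists of leaf names; branch lengths and internal labels dropped."""
    root = []
    stack, prev = [root], None
    for tok in _TOKENS.findall(newick):
        if tok == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif tok == ")":
            stack.pop()
        elif tok not in (",", ";") and prev in (None, "(", ","):
            stack[-1].append(tok.split(":")[0])
        prev = tok
    return root[0]


def substitute(node, subs):
    """Apply {old: [new, ...] or None}; return None if the node vanishes."""
    if isinstance(node, str):
        if node not in subs:
            return node
        new = subs[node]
        if not new:
            return None  # deletion
        return new[0] if len(new) == 1 else list(new)  # many -> polytomy
    children = [c for c in (substitute(child, subs) for child in node) if c is not None]
    if not children:
        return None
    return children[0] if len(children) == 1 else children  # collapse unary nodes


def to_newick(node):
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def substitute_in_tree(newick, old_taxa, new_taxa=None):
    """Hypothetical single-tree version: new_taxa None deletes every old taxon."""
    if new_taxa is None:
        new_taxa = [None] * len(old_taxa)
    result = substitute(parse_newick(newick), dict(zip(old_taxa, new_taxa)))
    return None if result is None else to_newick(result) + ";"


print(substitute_in_tree("(A,(B,C));", ["B", "C"], [["B1", "B2"], None]))  # (A,(B1,B2));
print(substitute_in_tree("(A,(B,C));", ["A"]))  # (B,C);
```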
