Making maps with ggplot2 and sf

Recently, the newest version of the popular ggplot2 graphics package was announced, and it has some nifty mapping features that I was keen to try out. Continue reading


Further Testing Of quickRareCurve

After my post yesterday, documenting a faster parallelised version of the rarecurve function (quickRareCurve), I realised it’d be good to show a real-world example using it on a reasonably large OTU table, to prove that it is indeed quicker than the original function. So, here we go. Continue reading

Speeding Up Rarefaction Curves For Microbial Community Ecology

When beginning analyses on microbial community data, it is often helpful to compute rarefaction curves. A rarefaction curve tells you about the rate at which new species/OTUs are detected as you increase the number of individuals/sequences sampled. It does this by taking random subsamples of increasing size, from 1 up to the size of your sample, and computing the number of species present in each subsample. Ideally, you want your rarefaction curves to be relatively flat at the full sample size, as this indicates that additional sampling would be unlikely to yield further species.
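To make the subsampling idea concrete, here is a minimal sketch in base R. It is not how vegan computes its curves; the abundance vector, the `mean_richness` helper, and the depth step of 9 are all made up for illustration.

```r
# Toy abundance vector for one sample: counts per OTU (made-up numbers).
otu_counts <- c(otu1 = 50, otu2 = 30, otu3 = 10, otu4 = 5,
                otu5 = 3, otu6 = 1, otu7 = 1)

# Expand the counts into one label per sequence, so that we subsample
# individual sequences rather than OTUs.
individuals <- rep(names(otu_counts), times = otu_counts)

# Hypothetical helper: mean number of distinct OTUs observed across
# n_reps random subsamples (without replacement) of a given depth.
mean_richness <- function(depth, pool, n_reps = 100) {
  mean(replicate(n_reps, length(unique(sample(pool, size = depth)))))
}

set.seed(42)
# Subsample depths from 1 up to the full sample size (100 sequences here).
depths <- seq(1, length(individuals), by = 9)
richness <- vapply(depths, mean_richness, numeric(1), pool = individuals)

# The curve flattens as the rarer OTUs are eventually picked up.
plot(depths, richness, type = "b",
     xlab = "Sequences sampled", ylab = "OTUs observed")
```

Repeating the random draw at every depth is what makes this approach slow on large OTU tables, since the work grows with both the number of depths and the number of samples.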

The vegan package in R has a nice function for computing rarefaction curves for species by site abundance tables. However, for microbial datasets this function is often prohibitively slow. Continue reading

The future of USEARCH; a closed-source software in an open-source world

Let me start off by stating that I have enormous respect for Rob Edgar (creator of USEARCH, UPARSE etc.). His contributions to the field of bioinformatics, and indirectly to the fields of molecular and microbial ecology, have been huge; you only need to look at his citation rates to see that! So this post is not intended as a criticism of him or his work in any way.

That said, I’ve recently been thinking about the deluge of new algorithms for picking Operational Taxonomic Units (OTUs) from molecular sequence datasets and wondering where and how USEARCH, UCLUST et al. will fit in. Continue reading

ggplot2; From default to delightful

If you use R to analyse and plot your data, then you’ve probably heard of and used the ggplot2 package, written by Hadley Wickham. ggplot2 is a highly flexible plotting package allowing you to create just about any kind of plot you can think of, and customise just about any aspect of your plot.

However, ggplot2 is also known for its somewhat strange choice of default options (at least, they seem strange to me!). Therefore, it can seem like a lot of work to go from a basic plot to something that is approaching publication quality. Continue reading

Merging Taxonomy With Non-Qiime OTU Tables

This is a quick post, mainly to document some useful code. When conducting bioinformatic analyses using Qiime, one of the last steps is to cluster sequences into OTUs (operational taxonomic units) and assign taxonomy to them. You can then make an OTU table which contains all your OTUs and their associated taxonomy. Bam, easy!

But what to do if you haven’t/don’t want to use Qiime to cluster OTUs? Continue reading

Know Your Fungi: A Brief Demo Of The New Fungal Classifier

As a microbial ecologist, part of my job is to try and assign taxonomy to all of the microbial critters living in the habitats I study. For archaeal or bacterial 16S rRNA gene sequences this is relatively easy. The Ribosomal Database Project have a naive Bayesian classifier which is trained on a large curated database of archaeal and bacterial 16S rRNA gene sequences. Best of all, it is implemented in the popular bioinformatics pipeline Qiime, making it nice and easy to apply to your own data.

But what if you are dealing with fungal ITS sequences instead? Continue reading

Barking up the wrong tree? Leaf your troubles behind with dendextend

I was recently asked by one of my PhD supervisors to help out on a paper by doing some metagenomic analyses. My mission was essentially to perform some taxonomic analyses of metagenomes and show how a metagenome generated in our lab related to these.

So, naturally, I said yes, carried out the necessary analyses and proceeded to design a figure to show the result. I figured a dendrogram would be a nice way of showing compositional similarity between the community we studied and other communities. Continue reading