@@ -42,6 +42,7 @@ your home directory: `~/ICoVeR/`.
...
## Quick start
The easiest way to get started with ICoVeR is by installing the package with the pre-loaded CSTR data set.
The CSTR data set was generated by us and consists of multiple samples from an anaerobic digester.
This data is already in the repository under [R.ICoVeR/data](https://github.com/bbroeksema/ICoVeR/tree/master/R.ICoVeR/data).
To install ICoVeR open [R.ICoVeR/ICoVeR.Rproj](https://github.com/bbroeksema/ICoVeR/blob/master/R.ICoVeR/ICoVeR.Rproj) in RStudio and use the "build and reload" button in the build tab (top right of the screen).
Alternatively, use an R session to install the ICoVeR package:
In order to get you started quickly with the tool, we provide two prepared data sets.
The first data set is the one generated by [Wrighton et al.](http://www.sciencemag.org/content/337/6102/1661).
The second data set is the CSTR data set, which we generated from an anaerobic digester.
To demonstrate how you can prepare your own data sets for ICoVeR, we have added the required input files for the Wrighton et al. data set.
To prepare the data files read by ICoVeR, you have to provide the following files:
* [REQ] A [fasta file](https://github.com/bbroeksema/ICoVeR/blob/master/data/wrighton_assembly.fasta.gz) with sequences for all the contigs (may be gzip compressed)
* [REQ] A [coverage file](https://github.com/bbroeksema/ICoVeR/blob/master/data/wrighton_avg_cov.csv) with coverage levels of contigs for each of the samples
* [REQ] An [essential single copy gene file](https://github.com/bbroeksema/ICoVeR/blob/master/data/wrighton_escg.csv) with contig-gene pairs
* [OPT] A [clusterings file](https://github.com/bbroeksema/ICoVeR/blob/master/data/wrighton_clusterings.csv) with binning results from one or more automated binning tools such as MetaBAT. Although optional, it is highly recommended to start with an automated approach to speed up the verification and refinement process.
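
Before preprocessing, it can help to confirm that the files listed above are actually present. The following is a minimal sketch, not part of ICoVeR: the function name `check_input_files` and the `prefix` argument are hypothetical, and the file names simply follow the `wrighton_*` naming used in the links above.

```r
# Hypothetical helper (not part of ICoVeR): check which input files are present.
# Assumes the file naming used for the Wrighton example data set.
check_input_files <- function(data_dir, prefix = "wrighton") {
  required <- c(
    fasta    = paste0(prefix, "_assembly.fasta.gz"),
    coverage = paste0(prefix, "_avg_cov.csv"),
    escg     = paste0(prefix, "_escg.csv")
  )
  optional <- c(clusterings = paste0(prefix, "_clusterings.csv"))

  files <- c(required, optional)
  found <- file.exists(file.path(data_dir, files))
  names(found) <- names(files)

  # Stop early when a required file is missing
  missing_required <- names(required)[!found[names(required)]]
  if (length(missing_required) > 0) {
    stop("Missing required input file(s): ",
         paste(missing_required, collapse = ", "))
  }

  # The clusterings file is optional, so only inform the user
  if (!found[["clusterings"]]) {
    message("No clusterings file found; continuing without automated binning results.")
  }
  invisible(found)
}
```

With a checkout in your home directory, this could be called as `check_input_files("~/ICoVeR/data")`.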
**TODO:** Add some notes on what tools we used to create those files.
## Preprocessing
In order to load the data into our interactive contig binning system, we need to pre-process the aforementioned files, in particular the fasta and abundance level files.
This is done by using the scripts provided in the R.preprocessing directory.
In this example we will prepare the wrighton data set for ICoVeR.
The `PrepareDataForInteractiveBinning` function performs the following steps:
1. Extract gc_content and contig length for each contig from the fasta file
1. Extract tetra nucleotide frequencies for each contig from the fasta file
1. Read optional binning results from automated methods
1. Combine the extracted information with sample abundance levels into the files which are expected by ICoVeR.
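
To make the steps above concrete, here is a small base R sketch of what extracting GC content, contig length, and tetranucleotide frequencies, and then combining them with abundance levels, could look like. It is an illustration under simplifying assumptions, not the code used by `PrepareDataForInteractiveBinning`: the function names, the toy contigs, and the `sample_a` column are hypothetical, and reverse-complement k-mers are not collapsed.

```r
# Illustrative helpers (hypothetical names, not the ICoVeR implementation).

# Fraction of G and C bases in a single sequence
gc_content <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  sum(bases %in% c("G", "C")) / length(bases)
}

# Relative frequency of each observed 4-mer in a single sequence
tetra_frequencies <- function(dna) {
  dna <- toupper(dna)
  n <- nchar(dna)
  if (n < 4) return(numeric(0))
  starts <- seq_len(n - 3)
  kmers <- substring(dna, starts, starts + 3)
  # Keep only 4-mers that consist of unambiguous bases
  kmers <- kmers[grepl("^[ACGT]{4}$", kmers)]
  counts <- table(kmers)
  freqs <- as.numeric(counts) / sum(counts)
  names(freqs) <- names(counts)
  freqs
}

# Toy contigs standing in for the sequences read from the fasta file
contigs <- c(contig_1 = "ACGTACGT", contig_2 = "GGGCCCAT")

features <- data.frame(
  contig     = names(contigs),
  length     = unname(nchar(contigs)),
  gc_content = unname(vapply(contigs, gc_content, numeric(1)))
)

# Toy abundance table; the contig ids are the join key
abundance <- data.frame(
  contig   = c("contig_2", "contig_1"),
  sample_a = c(3.5, 12.0)
)

combined <- merge(features, abundance, by = "contig")
```

Because the tables are joined on the contig id, a contig whose id is spelled differently in the two inputs would silently drop out of `combined`.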
**NOTE:** The ids in the fasta file must match the names in the abundance file.
Thus, if a contig in the fasta file starts with a line `>contig_123`, there must
...
@@ -99,28 +105,34 @@ Start R studio and type the following commands.
# Set the working directory (change path to your local checkout location)
> setwd("~/ICoVeR")
# Source the required R files for preprocessing. This will load all the required
# dependencies as well.
# Alternatively, have a look at R.preprocessing/preprocessing.R from which
Or by directly putting the right link into your browser window:
http://localhost:8000/ocpu/library/ICoVeR/www/
**NOTE 1:** The port number (8000 in this example) must match the output of OpenCPU in your RStudio session.
This can differ every time you start the application.
To control this, use the commands listed above: first stopping OpenCPU, next restarting it with a fixed port.
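
Since only the port changes between sessions, the URL can also be built from whatever port OpenCPU reports. The helper below is a hypothetical convenience, not something ICoVeR provides; it assumes the URL pattern shown above stays the same.

```r
# Hypothetical helper: build the ICoVeR URL for the port reported by OpenCPU.
icover_url <- function(port, host = "localhost") {
  sprintf("http://%s:%d/ocpu/library/ICoVeR/www/", host, as.integer(port))
}

url <- icover_url(8000)
```

The result can be passed to `browseURL(url)` to open the application in your default browser.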
**NOTE 2:** The application stores the most important data resulting from your analysis (i.e. clustering results and tags).
An analysis can therefore be split into several sessions.
The initial preparations do not have to be repeated each time you want to continue working on the same data set.
To continue a previously stopped session, just start R(Studio), load the OpenCPU library as shown above, and point your browser to the correct URL.