Commit 8105a89b authored by Bertjan Broeksema

Upgrade the long installation guide.

parent eafa185d
...@@ -9,7 +9,7 @@ Table of Contents ...@@ -9,7 +9,7 @@ Table of Contents
* [Prerequisites](#prerequisites ) * [Prerequisites](#prerequisites )
* [Quick start](#quick-start ) * [Quick start](#quick-start )
* [Data preparation](#data-preparation ) * [Prepare your own data](#prepare-your-own-data)
## Prerequisites ## Prerequisites
...@@ -42,6 +42,7 @@ your home directory: `~/ICoVeR/`. ...@@ -42,6 +42,7 @@ your home directory: `~/ICoVeR/`.
## Quick start ## Quick start
The easiest way to get started with ICoVeR is by installing the package with the pre-loaded CSTR data set. The easiest way to get started with ICoVeR is by installing the package with the pre-loaded CSTR data set.
The CSTR data set was generated by us and consists of multiple samples from an anaerobic digester.
This data is already in the repository under [R.ICoVeR/data](https://github.com/bbroeksema/ICoVeR/tree/master/R.ICoVeR/data). This data is already in the repository under [R.ICoVeR/data](https://github.com/bbroeksema/ICoVeR/tree/master/R.ICoVeR/data).
To install ICoVeR open [R.ICoVeR/ICoVeR.Rproj](https://github.com/bbroeksema/ICoVeR/blob/master/R.ICoVeR/ICoVeR.Rproj) in RStudio and use the "build and reload" button in the build tab (top right of the screen). To install ICoVeR open [R.ICoVeR/ICoVeR.Rproj](https://github.com/bbroeksema/ICoVeR/blob/master/R.ICoVeR/ICoVeR.Rproj) in RStudio and use the "build and reload" button in the build tab (top right of the screen).
Alternatively, use an R session to install the ICoVeR package: Alternatively, use an R session to install the ICoVeR package:
...@@ -67,25 +68,30 @@ browseURL("http://localhost:8000/ocpu/library/ICoVeR/www/", browser = getOption( ...@@ -67,25 +68,30 @@ browseURL("http://localhost:8000/ocpu/library/ICoVeR/www/", browser = getOption(
{% endhighlight %} {% endhighlight %}
## Data preparation ## Prepare your own data
In order to get you started quickly with the tool, we provide two prepared datasets. To demonstrate how you can prepare your own data sets for ICoVeR, we have added the required files for the data set published by [Wrighton et al](http://www.sciencemag.org/content/337/6102/1661).
The first data set is the one generated by [Wrighton et al](http://www.sciencemag.org/content/337/6102/1661). To prepare the data files read by ICoVeR, you have to provide the following files:
The second data set, is the CSTR data set, which is generated by us from an anaerobic digester.
* [REQ] A [fasta file](https://github.com/bbroeksema/ICoVeR/blob/master/data/wrighton_assembly.fasta.gz) with sequences for all the contigs (may be gzip compressed)
* [REQ] A [coverage file](https://github.com/bbroeksema/ICoVeR/blob/master/data/wrighton_avg_cov.csv) with coverage levels of contigs for each of the samples
* [REQ] An [essential single copy gene file](https://github.com/bbroeksema/ICoVeR/blob/master/data/wrighton_escg.csv) with contig - gene pairs
* [OPT] A [clusterings file](https://github.com/bbroeksema/ICoVeR/blob/master/data/wrighton_clusterings.csv) with binning results from one or more automated binning tools such as metabat. Although optional, it is highly recommended to start with an automated approach to speed up the verification and refinement process.
**TODO:** Add some notes on what tools we used to create those files.
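Before preprocessing, it can help to confirm that the files listed above are actually present. The following is a minimal sketch, not part of ICoVeR: `missing_inputs` is a hypothetical helper, and the relative paths assume that the working directory is the root of your local checkout.

```r
# Hypothetical helper (not part of ICoVeR): returns the paths that do not exist.
missing_inputs <- function(paths) {
  paths[!file.exists(paths)]
}

# Paths are assumed to be relative to the root of the local checkout.
required <- c(
  fasta    = "data/wrighton_assembly.fasta.gz",
  coverage = "data/wrighton_avg_cov.csv",
  escg     = "data/wrighton_escg.csv"
)
optional <- c(clusterings = "data/wrighton_clusterings.csv")

absent <- missing_inputs(required)
if (length(absent) > 0) {
  message("Missing required input files: ", paste(absent, collapse = ", "))
}
if (length(missing_inputs(optional)) > 0) {
  message("No clusterings file found; automated binning results will not be available.")
}
```

The check only reports what is missing and does not stop the session, so you can still inspect the `data` directory before deciding how to proceed.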
## Preprocessing ## Preprocessing
In order to load the data into our interactive contig binning system, we need to In order to load the data into our interactive contig binning system, we need to pre-process the aforementioned files.
pre-process the fasta and abundance level files. This is done by using the This is done by using the scripts provided in the R.preprocessing directory.
scripts provided in the R.preprocessing directory. In this example we will In this example we will prepare the wrighton data set for ICoVeR.
prepare the cstr data set for interactive contig binning.
The `PrepareDataForInteractiveBinning` function performs the following steps: The `PrepareDataForInteractiveBinning` function performs the following steps:
1. Extract gc_content and contig length for each contig from the fasta file 1. Extract gc_content and contig length for each contig from the fasta file
1. Extract tetra nucleotide frequencies for each contig from the fasta file 1. Extract tetra nucleotide frequencies for each contig from the fasta file
1. Combine extracted information with sample abundance levels into the files 1. Reads optional binning results from automated methods
which are expected by the interactive contig binning tool. 1. Combines extracted information with sample abundance levels into the files which are expected by ICoVeR.
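To make the first two steps more concrete, the sketch below computes contig length, GC content and tetranucleotide counts for a single sequence in base R. It is an illustration only: `gc_content` and `tetra_counts` are hypothetical helpers, and the scripts in R.preprocessing may compute these values differently (for example, by symmetrizing or normalizing the signatures).

```r
# Hypothetical helpers for illustration; not the functions used by ICoVeR.

# Fraction of G and C bases in a single DNA sequence.
gc_content <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  sum(bases %in% c("G", "C")) / length(bases)
}

# Counts of all overlapping 4-mers that occur in the sequence.
tetra_counts <- function(dna) {
  dna <- toupper(dna)
  n <- nchar(dna)
  if (n < 4) return(integer(0))
  starts <- seq_len(n - 3)
  table(substring(dna, starts, starts + 3))
}

contig <- "ACGTACGTGG"
nchar(contig)         # contig length
gc_content(contig)    # fraction of G and C bases
tetra_counts(contig)  # counts per observed tetranucleotide
```

Only tetranucleotides that actually occur in the sequence appear in the result; a full signature would normally include all 256 possible 4-mers, with zeros for the absent ones.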
**NOTE:** The ids in the fasta file must match the names in the abundance file. **NOTE:** The ids in the fasta file must match the names in the abundance file.
Thus, if a contig in the fasta file starts with a line `>contig_123`, there must Thus, if a contig in the fasta file starts with a line `>contig_123`, there must
...@@ -99,28 +105,34 @@ Start R studio and type the following commands. ...@@ -99,28 +105,34 @@ Start R studio and type the following commands.
# Set the working directory (change path to your local checkout location) # Set the working directory (change path to your local checkout location)
> setwd("~/ICoVeR") > setwd("~/ICoVeR")
# Source the required R file for preprocessing. This will load all the required # Source the required R files for preprocessing.
# dependencies as well. # Alternatively, have a look at R.preprocessing/preprocessing.R from which
> source("R.preprocessing/preprocessing.R") # below commands are taken.
> source("R.preprocessing/FrequenciesSignatures.R")
> source("R.preprocessing/SymmetrizedSignatures.R")
> source("R.preprocessing/ExtractESCG.R")
> source("R.preprocessing/PrepareDataForInteractiveBinning.R")
> PrepareDataForInteractiveBinning( > PrepareDataForInteractiveBinning(
dataset.name = "cstr", dataset.name = "wrighton",
file.fasta = "data//cstr_assembled.fasta", file.fasta = "data//wrighton_assembly.fasta.gz",
file.abundance = "data//cstr_avg_coverage.csv", file.abundance = "data//wrighton_avg_cov.csv",
dir.result = "data//prepared" file.escg = "data//wrighton_escg.csv",
file.clusterings = "data//wrighton_clusterings.csv",
dir.result = "R.ICoVeR//data"
) )
## Installation ## Installation
# Check if you find cstr.rda and cstr.schema.rda in: R.ICoVeR/data. If so, we # Check if you find wrighton.rda, wrighton.schema.rda and wrighton.escg.rda in: R.ICoVeR/data. If so, we
# continue installing the interactive binning application. # continue installing the interactive binning application.
> library(devtools) > library(devtools)
# NOTE: Before installing you should check the file R.ICoBiRe/R/sqlite.R. At the # NOTE: Before installing you should check the file R.ICoBiRe/R/sqlite.R. At the
# top there is a variable declared named: p.db.dataset. The value for this # top there is a variable declared named: p.db.dataset. The value for this
# variable must match the data set you want to analyze (i.e. "cstr" in # variable must match the data set you want to analyze (i.e. "wrighton" in
# this case, which is the default). # this case).
# #
# If you want to bin a different contig set, you must change this value # If you want to bin a different data set, you must change this value
# **before** installing the R.ICoBiRe package. # **before** installing the R.ICoBiRe package.
> install_local(file.path(getwd(), "R.ICoVeR")) > install_local(file.path(getwd(), "R.ICoVeR"))
{% endhighlight %} {% endhighlight %}
...@@ -137,17 +149,27 @@ Using config: ~/.opencpu.conf ...@@ -137,17 +149,27 @@ Using config: ~/.opencpu.conf
OpenCPU started. OpenCPU started.
[httpuv] http://localhost:2110/ocpu [httpuv] http://localhost:2110/ocpu
OpenCPU single-user server ready. OpenCPU single-user server ready.
opencpu$stop() # It starts at a random port, which is annoying.
opencpu$start(8000)
{% endhighlight %} {% endhighlight %}
If OpenCPU started without errors, the interactive application can now be accessed If OpenCPU started without errors, the interactive application can now be accessed
in your browser at the following url: Open it in your browser using:
{% highlight r %}
browseURL("http://localhost:8000/ocpu/library/ICoVeR/www/", browser = getOption("browser"), encodeIfNeeded = FALSE)
{% endhighlight %}
Or by directly putting the right link into your browser window:
http://localhost:2110/ocpu/library/ICoVeR/www/ http://localhost:8000/ocpu/library/ICoVeR/www/
**NOTE 1:** The port number (i.e. 2110) must match with the output of OpenCPU in your RStudio session. **NOTE 1:** The port number (i.e. 8000) must match with the output of OpenCPU in your RStudio session.
This can differ every time you start the application. This can differ every time you start the application.
To control this, use the commands listed above: first stopping OpenCPU, next restarting it with a fixed port.
**NOTE 2:** The application stores the most important data resulting from your analysis (i.e. clustering results and tags). **NOTE 2:** The application stores the most important data resulting from your analysis (i.e. clustering results and tags).
An analysis can therefore be split into several sessions. An analysis can therefore be split into several sessions.
The initial preparations are not required after the first (assuming binning of the same data set is continued). The initial preparations do not have to be repeated each time you want to continue working on the data set (assuming binning of the same data set is continued).
To continue a previously stopped session, just start R(Studio), load the OpenCPU library as shown above, and point your browser to the correct url. To continue a previously stopped session, just start R(Studio), load the OpenCPU library as shown above, and point your browser to the correct url.