# blizaar-ce

Copyright 2019-2020 Luxembourg Institute of Science and Technology (LIST - http://www.list.lu/). 

Any use of this software constitutes full acceptance of all terms of the [software's license](./LICENSE.txt).


## Overview
This repository stores the functionality developed by LIST as part of the [BLIZAAR project](https://blizaar.list.lu/). 
BLIZAAR is an international collaborative project (PRCI) proposal that fits in the “Information and Communication Society” challenge. It involves French and Luxembourgish partners working in collaboration to craft novel ways of exploring and analyzing dynamic multilayer networks.
It is an early-stage research prototype that was used to visualize in-house data sets provided by project team members.
For a description of the functionality offered by the LIST tool, see the relevant [BLIZAAR project wiki page](https://blizaar.list.lu/doku.php?id=list_tool).

A list of publications related to the project can be found [here](https://blizaar.list.lu/doku.php?id=publications).


## Contact

Any questions? Please contact [Fintan McGee](mailto:fintan.mcgee@list.lu) or visit the [LIST website](https://www.list.lu/en/contact/).


## System Architecture 

The system is centred around a Node.js middleware web server and a Neo4j back end.
For details about the versions of each piece of software, see the BLIZAAR toolchain setup section.

### Back End Components 
#### Node.js Server

This server acts as middleware for the entire project. It serves all web pages to the front end and is also responsible for user access rights. All requests from the front end pass through this server to their target back end components, whether for data retrieval from the graph database or for processing of the graph data. The Node.js middleware stores an instance of the current graph for each logged-in user, which can be passed to the various back end engines for processing and updated from the Neo4j database.
The Node.js server contains several modules, each of which is a separate file and provides different functionality.
##### app.js
The app.js file provides the interface to the server. All REST API calls are received here and passed to the relevant module.
##### system.js 
This module interfaces with MongoDB and is responsible for user access control. All API calls are authenticated via this module (as MongoDB stores user information). MongoDB is also used for serializing (storing) user graph data between sessions and for caching data (such as the node and edge types for each graph type, to speed up queries), so that functionality also lives in this module.
##### graph.js 
This module stores and processes the graph on the server side. It is updated via queries to the Neo4j database and by graphs loaded from MongoDB via system.js.
##### graphDB.js 
This module is responsible for interacting with and querying the Neo4j database. It takes requests via app.js and transforms them into Cypher queries.
##### rServer.js 
This module handles requests concerning graph processing and passes them to an Rserve instance. It transforms requests into a suitable format and updates the graph with the returned data. It also queries the R server for available processing functions, e.g. clustering and layout.
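
As a rough illustration of the two transformations mentioned above, the sketch below flattens a graph's links into parallel columns (a shape that maps naturally onto an R data frame) and copies a returned clustering result back onto the nodes. The function names, the column names `from`/`to`, the `cluster` property and the shape of the returned result are all assumptions made for this example; the actual format used by rServer.js is not documented here.

```javascript
// Hypothetical: flatten links into parallel columns for transfer to R.
function toEdgeColumns(graph) {
  return {
    from: graph.links.map(l => l.source),
    to: graph.links.map(l => l.target),
  };
}

// Hypothetical: copy a { nodeId: clusterIndex } result onto the nodes.
function applyClusters(graph, membership) {
  for (const n of graph.nodes) {
    if (Object.prototype.hasOwnProperty.call(membership, n.id)) {
      n.cluster = membership[n.id];
    }
  }
  return graph;
}

const g = {
  nodes: [{ id: 1 }, { id: 2 }, { id: 3 }],
  links: [
    { id: 10, source: 1, target: 2 },
    { id: 11, source: 2, target: 3 },
  ],
};

const columns = toEdgeColumns(g);
// Stand-in for a clustering result that would come back from R.
applyClusters(g, { 1: 0, 2: 0, 3: 1 });
console.log(columns, g.nodes);
```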
#### Neo4j Server
The Neo4j server stores all project master data sources. It is accessed via its built-in REST API from the Node.js middleware server. All requests for data from the front end or other components should be made to the middleware, which then queries the Neo4j database. The different data sets are distinguished in the database using Neo4j node labels.
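
To make the REST access more concrete, here is a minimal sketch of how a Cypher statement could be packaged for Neo4j 3.x. It assumes the transactional HTTP endpoint (`/db/data/transaction/commit`) on the default port 7474 of a local install; the project may well use a different endpoint or a client library. `buildCypherRequest` is a hypothetical helper, and authentication is left out.

```javascript
// Assumed defaults for a local Neo4j 3.x install; adjust to your setup.
const NEO4J_HOST = 'localhost';
const NEO4J_PORT = 7474;

// Hypothetical helper: wraps one Cypher statement in the JSON payload
// expected by the transactional endpoint.
function buildCypherRequest(statement) {
  const body = JSON.stringify({ statements: [{ statement: statement }] });
  return {
    options: {
      host: NEO4J_HOST,
      port: NEO4J_PORT,
      method: 'POST',
      path: '/db/data/transaction/commit',
      headers: {
        'Content-Type': 'application/json',
        'Accept': 'application/json',
        'Content-Length': Buffer.byteLength(body),
      },
    },
    body: body,
  };
}

const request = buildCypherRequest('MATCH (n:histograph) RETURN count(n)');
console.log(request.options.method, request.options.path);
console.log(request.body);
```

The `options` object has the shape accepted by Node's built-in `http.request`, so sending the query would amount to creating the request with those options and writing `request.body` to it.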
#### R Server
The back end R server is used to process graph data. R scripts stored within the R engine are remotely invoked from the Node.js server's R server component. Access to a running R instance is provided by Rserve, an R server package that can be installed into any R installation.
#### MongoDB
This database stores information related to user profiles and access rights. We have chosen this DB rather than storing this information in the graph DB as it integrates easily into the Neo4j software stack with minimal overhead, and user information does not require the use of a graph DB. Additionally, we aim to keep a clean common master data DB that can be shared across all users.
MongoDB is also used as a cache for graph data. Graphs can be saved by the user and are stored in MongoDB for quick retrieval. The node and edge types for each graph are also cached here, to allow for quick access.

### Front End Components
#### Angular.js
Angular.js is the framework used for front end development. It offers a robust, proven framework for website design. For more information see section

#### Bootstrap
Bootstrap is a CSS library that gives all front end web pages a consistent look and feel. It is frequently used in Angular projects.
#### D3.js / WebGL / Visualisation development
Development of front end visualizations is not constrained to any specific technology. Project researchers are free to use any available tools, such as d3.js, WebGL, sigma.js or any technology of their choice. Currently d3.js, sigma.js and WebGL have been tested and shown not to have any significant issues interacting with the rest of the framework.

### Graph Structure and Storage
#### Graph Data Structure
We use a very basic graph data structure modelled on the common types of structure used by d3.js when processing graphs. At its most basic, a graph object consists of a simple JavaScript object with two properties, nodes and links. Each link contains the id of its source and target rather than a reference to the node object. This allows easier transmission of graphs via JSON, as a reference cannot be properly encoded in a JSON message without duplicating objects.
To allow fast lookups of nodes and links, a lookup table (simply a JavaScript object used as a property map) for each is calculated in the middleware. For nodes it is called node, and the key is the node id. For links it is called link, and the key is the link id.
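
A minimal sketch of this structure follows. The property names `nodes`, `links`, `source`, `target`, `node` and `link` are taken from the description above; the helper name `indexGraph` and the `name` property on the sample nodes are made up for illustration.

```javascript
// Hypothetical helper: builds the two lookup tables described above.
function indexGraph(graph) {
  const node = {};
  const link = {};
  for (const n of graph.nodes) node[n.id] = n;
  for (const l of graph.links) link[l.id] = l;
  return { nodes: graph.nodes, links: graph.links, node: node, link: link };
}

// Links refer to their end points by id, so the graph survives JSON as-is.
const graph = {
  nodes: [{ id: 1, name: 'Alice' }, { id: 2, name: 'Paris' }],
  links: [{ id: 10, source: 1, target: 2 }],
};

const indexed = indexGraph(graph);

// Resolve a link's source node through the lookup table.
const firstLink = indexed.link[10];
const sourceNode = indexed.node[firstLink.source];
console.log(sourceNode.name);
```

Because the lookup tables can be rebuilt from `nodes` and `links` at any time, only those two arrays need to be sent over the wire.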
#### Master Graph Data
As part of this project, all primary input data sets (e.g. the histograph data set, protein interaction data sets, metabolite interaction data sets etc.) for both application domains are stored in the Neo4j back end graph database. This master graph data is the source from which users build their own graphs.
Neo4j uses the field “id” to identify nodes uniquely. We also use this as a unique identifier for all of our nodes throughout the framework.
#### User Graph Data
A user builds a graph by making queries from the front end to the back end master graph data sets via the middleware. To avoid the front end having to pass the full graph to the middleware for every query, we store a copy of each user's current graph on the server. As well as reducing load between the front end and back end, this simplifies saving graphs and work in progress on a per-user basis.
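
The idea can be sketched as follows, assuming an in-memory map keyed by user id and query results that arrive in the nodes/links shape described under Graph Data Structure. `userGraphs`, `getUserGraph`, `addQueryResult` and the user id `'alice'` are hypothetical; the actual middleware may organise this differently.

```javascript
// Hypothetical in-memory store: one graph per logged-in user.
const userGraphs = new Map();

function getUserGraph(userId) {
  if (!userGraphs.has(userId)) {
    userGraphs.set(userId, { nodes: [], links: [] });
  }
  return userGraphs.get(userId);
}

// Merge a query result into the user's graph, skipping ids already present.
function addQueryResult(userId, result) {
  const graph = getUserGraph(userId);
  const nodeIds = new Set(graph.nodes.map(n => n.id));
  const linkIds = new Set(graph.links.map(l => l.id));
  for (const n of result.nodes) {
    if (!nodeIds.has(n.id)) { graph.nodes.push(n); nodeIds.add(n.id); }
  }
  for (const l of result.links) {
    if (!linkIds.has(l.id)) { graph.links.push(l); linkIds.add(l.id); }
  }
  return graph;
}

// Two successive queries by the same user overlap on node 2.
addQueryResult('alice', {
  nodes: [{ id: 1 }, { id: 2 }],
  links: [{ id: 10, source: 1, target: 2 }],
});
const current = addQueryResult('alice', {
  nodes: [{ id: 2 }, { id: 3 }],
  links: [{ id: 11, source: 2, target: 3 }],
});
console.log(current.nodes.length, current.links.length); // 3 2
```

With the graph held server-side, the front end only needs to send the new query, not everything it has already retrieved.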
#### Neo4j Graph DB Structure and Terminology
We store all master graph data in Neo4j. Regardless of the application domain, all input graph data is stored there. Within Neo4j, node labels are used to identify sets of nodes. A node can have multiple labels. Each back end graph is distinguished by a different label. Additionally, labels are used to distinguish different types of nodes.
For example, within the histograph data sets, all nodes have the label “histograph”, and nodes describing people have the label “person” as well as “histograph”. Histograph nodes describing places have the label “place” as well as “histograph”.
Edges can also have a type specified, which can be used to restrict the edges returned with nodes in a query. Querying by edge type and label has specific semantics in Cypher (the Neo4j query language); however, all of this should be invisible at the front end. We merely describe the label convention here to help explain how queries for graph data are formulated, passed from the front end to the middleware, translated into Cypher in the middleware and passed as a query to the back end.
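
As an illustration of the label convention, the sketch below turns a list of labels and an optional edge type into a Cypher pattern. It is not the project's actual query builder: `buildNeighbourQuery` is a hypothetical function and `mentions` is a made-up edge type. Labels and edge types are validated before being placed in the query text, since they are spliced in as identifiers rather than supplied as values.

```javascript
// Hypothetical builder: match nodes carrying all given labels, plus their
// neighbours, optionally restricted to one edge type.
function buildNeighbourQuery(labels, edgeType) {
  const identifier = /^[A-Za-z_][A-Za-z0-9_]*$/;
  const names = labels.concat(edgeType ? [edgeType] : []);
  for (const name of names) {
    if (!identifier.test(name)) {
      throw new Error('Invalid label or edge type: ' + name);
    }
  }
  const labelPart = labels.map(l => ':' + l).join('');
  const edgePart = edgeType ? '[r:' + edgeType + ']' : '[r]';
  return 'MATCH (n' + labelPart + ')-' + edgePart + '-(m) RETURN n, r, m';
}

// People in the histograph data set, following a made-up edge type.
const query = buildNeighbourQuery(['histograph', 'person'], 'mentions');
console.log(query);
// MATCH (n:histograph:person)-[r:mentions]-(m) RETURN n, r, m
```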

## BLIZAAR Toolchain Setup
### Application Framework Tools
The following components / tools are necessary to set up your BLIZAAR project so that the applications will run. The tested version of each is specified. Using a version from a different major release may cause issues; minor version differences should generally be fine.
For Windows users, installers for Neo4j, MongoDB and Node.js are available from the project repository and can be downloaded from http://blizaar.list.lu:5001/mcgee/blizaar_windows_tools


#### Neo4j (3.0.3)
Neo4j is the chosen graph database. As it is a graph database, it does not use SQL but rather its own query language, Cypher.
#### MongoDB (3.2.7)
MongoDB is a NoSQL database that we use to store system administration settings and information.
#### Node.js (6.5.0)
Node.js is our application server, web server and middleware router for all messages from front end to back end.
#### R (3.3.x)
We use R for some back end computations via the Rserve package, which provides the interface to the Node.js server. Setting up R for the project requires some additional steps, described later.
#### RStudio
RStudio is an environment for working with R and is much easier to use than the default interface.
#### Google Chrome
As this is a web tool, a browser is necessary. Many people already use Chrome, and it has been shown to work well with the existing visualizations. Firefox is acceptable too; however, Internet Explorer will most likely not work.

#### JavaScript Development Libraries
There are many development libraries available for JavaScript. All JavaScript development libraries should be checked in to the Git repository so that all developers use the same version.


### Installation Prerequisites
Download each of the preceding applications, and ensure that you have access to the project repository with your Git username and password.
#### Setup Procedure

##### 1. Install Git

Git also installs Git Bash, a bash-based shell that is very useful not only for using Git from the command line but also for scripting. Adding the “Git Bash Here” command to the right-click menu (an option available during the install) is a useful feature. It is best to use only Git Bash for command line control of Git (so there is no need to enable it for use from the Windows command prompt).

Once Git has been installed, it is worth setting up an SSH key to simplify using Git and checking out data. An SSH key allows a user on a specific machine to run Git operations without having to enter a username and password every time. See the section for more details.

##### 2. Clone the repository
Get the project files as follows: open a Git Bash window, navigate to the directory in which you would like to store the project, and type the following:
  `git clone --recursive git@git.list.lu:eScience/blizaar_ce.git`

##### 3. Install Neo4j
Extract the zip file to your chosen Neo4j directory and note the path.
Edit the `neo4j_start.bat` and `neo4j_stop.bat` files in the project's root directory to point at your installation.

##### 4. Install MongoDB
Install MongoDB using the downloaded installer and edit the `mongo_start.bat` file in the project's root directory to point at your installation.

##### 5. Install Neo4j
Use the Neo4j installer to install Neo4j.

##### 6. Clone the master Neo4j back end graph DB
Clone the Blizaar_neo4j_DB project and copy the blizaar.graphdb subfolder of the project's blizaar_data subfolder into the databases subdirectory of your Neo4j installation's “data” folder.

##### 7. Configure Neo4j
Edit the Neo4j config file (conf/neo4j.conf in your Neo4j installation) to point at blizaar.graphdb,
i.e. set the following parameter:
  `dbms.active_database=blizaar.graphdb`
  
##### 8. Install R & RStudio
Install R and RStudio, and then open the R.rproj file in the R subfolder of the BLIZAAR platform installation directory.
Run the script firstTimeSetup.r with the following command:
  `source('./firstTimeSetup.r')`

##### 9. Install Required Node Packages
The Node.js middleware requires packages to be installed; fortunately, Node.js provides a package manager (npm).
To install all required packages, run the following command at the command prompt in the BLIZAAR installation directory (you can use the regular Windows command prompt or Git Bash):
  `npm install`
  
#### Running the Platform
To run the platform, each of the components needs to be started: Neo4j, Rserve, MongoDB, and the Node.js middleware.

##### 1. Start Neo4j
Start Neo4j by running the `start_neo4J.bat` batch file in the platform home directory. This file **MUST** be run as administrator.

##### 2. Start MongoDB
Start MongoDB by running the `mongo_start.bat` batch file in the platform home directory. This file must **NOT** be run as administrator.

##### 3. Start Rserve
Start RStudio by opening the `R.rproj` file in the R subfolder of the platform home directory.
Run the following command to start Rserve:
  `source('./startRserve.R')`
  
##### 4. Start the Node.js server
Open a command prompt and navigate to the BLIZAAR platform home directory. Type the following to start the server:
  `node app.js`
The first time the server is run, it will automatically create the MongoDB database file with a default user and username.
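
Before logging in, it can be handy to confirm that all four components are actually listening. The sketch below is not part of the platform. It assumes each service runs locally on its usual default port (7474 for Neo4j's HTTP interface, 27017 for MongoDB, 6311 for Rserve) and uses 3333 for the Node.js server, as given in the login step. `SERVICES`, `isListening` and `checkAll` are hypothetical names.

```javascript
const net = require('net');

// Assumed default ports; change these if your installation differs.
const SERVICES = [
  { name: 'Neo4j', port: 7474 },
  { name: 'MongoDB', port: 27017 },
  { name: 'Rserve', port: 6311 },
  { name: 'Node.js middleware', port: 3333 },
];

// Resolves to true if a TCP connection to the port succeeds, false otherwise.
function isListening(port, host, timeoutMs) {
  return new Promise(resolve => {
    const socket = net.connect({ host: host || '127.0.0.1', port: port });
    const finish = up => { socket.destroy(); resolve(up); };
    socket.setTimeout(timeoutMs || 1000);
    socket.once('connect', () => finish(true));
    socket.once('timeout', () => finish(false));
    socket.once('error', () => finish(false));
  });
}

// Check every service in parallel and collect the results.
function checkAll() {
  return Promise.all(SERVICES.map(s =>
    isListening(s.port).then(up => ({ name: s.name, port: s.port, up: up }))
  ));
}

checkAll().then(results => {
  for (const r of results) {
    console.log(r.name + ' (port ' + r.port + '): ' +
      (r.up ? 'listening' : 'not reachable'));
  }
});
```

A service reported as not reachable has either not been started or is configured to use a different port.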
##### 5. Log in to the application
Open the Chrome browser and enter `localhost:3333` in the address bar.