Databank VM Setup - 0.3rc2
This document details installing Databank from source.
To install Databank from a Debian package instead, visit http://apt-repo.bodleian.ox.ac.uk/databank/
and follow the instructions under 'Using the repository' and 'Installing Databank'
------------------------------------------------------------------------------------------------------
I. Virtual machine details
------------------------------------------------------------------------------------------------------
Ubuntu 11.10 server i386
512MB RAM
8GB hard drive (the full disk space is not allocated at the time of creation)
Network: NAT
1 processor
hostname: databank
Partition disk - guided - use entire disk and set up LVM
Full name: Databank Admin
username: demoSystemUser
password: xxxxxxx
NO encryption of home dir
No proxy
No automatic updates
No predefined software
Install Grub boot loader to master boot record
Installing VMware tools
Select 'Install VMware Tools' from the VMware console
sudo mkdir /mnt/cdrom
sudo mount /dev/cdrom /mnt/cdrom
cd /tmp
ls -l
tar zxpf /mnt/cdrom/VMwareTools-7.7.6-203138.tar.gz vmware-tools-distrib/
ls -l
sudo umount /dev/cdrom
sudo apt-get install linux-headers-virtual
sudo apt-get install psmisc
cd vmware-tools-distrib/
sudo ./vmware-install.pl
Accept all of the default options
------------------------------------------------------------------------------------------------------
II. A. Packages to be Installed
------------------------------------------------------------------------------------------------------
sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install openssh-server
sudo apt-get install python-dev
sudo apt-get install python-setuptools
sudo apt-get install python-virtualenv
sudo apt-get install curl
sudo apt-get install links2
sudo apt-get install unzip
sudo apt-get install libxml2-dev
sudo apt-get install libxslt-dev
sudo apt-get install libxml2
sudo apt-get install libxslt1.1
sudo apt-get install redis-server
------------------------------------------------------------------------------------------------------
III. Create mysql user and database for Databank
------------------------------------------------------------------------------------------------------
# If you don't have mysql installed, run the following command
sudo apt-get install mysql-server libmysql++-dev
# Create mysql user and database for Databank
# Create Database databankauth and user databanksqladmin. Give user databanksqladmin access to databankauth
# Set the password for user databanksqladmin - replace 'password' in the command below
mysql -u root -p
mysql> use mysql;
mysql> CREATE DATABASE databankauth DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;
mysql> GRANT ALL ON databankauth.* TO databanksqladmin@localhost IDENTIFIED BY 'password';
mysql> exit
# Test that the user and database were created correctly.
# You should be able to log in as user databanksqladmin and use the database databankauth.
# The database will be populated with the required tables when the databank application is set up
mysql -h localhost -u databanksqladmin -p
mysql> use databankauth;
mysql> show tables;
mysql> exit
------------------------------------------------------------------------------------------------------
IV. Install Databank, Sword server and python dependencies
------------------------------------------------------------------------------------------------------
Databank's root folder is now /var/lib/databank
# Create all of the folders needed for Databank and set the permissions and owner
sudo mkdir /var/lib/databank
sudo mkdir /var/log/databank
sudo mkdir /var/cache/databank
sudo mkdir /etc/default/databank
sudo mkdir /silos
sudo chown -R databankadmin:www-data /var/lib/databank/
sudo chown -R databankadmin:www-data /var/log/databank/
sudo chown -R databankadmin:www-data /var/cache/databank/
sudo chown -R databankadmin:www-data /etc/default/databank/
sudo chown -R databankadmin:www-data /silos/
sudo chmod -R 775 /var/lib/databank/
sudo chmod -R 775 /var/log/databank/
sudo chmod -R 775 /var/cache/databank/
sudo chmod -R 775 /etc/default/databank/
sudo chmod -R 775 /silos/
# Pull databank source code from GitHub into /var/lib/databank
sudo apt-get install git-core git-doc
git clone git://github.com/dataflow/RDFDatabank /var/lib/databank
# Copy all of the config files into /etc/default/databank so you don't overwrite them by mistake when updating the source code
cd /var/lib/databank
cp production.ini /etc/default/databank/
cp development.ini /etc/default/databank/
cp -r docs/apache_config/*_wsgi /etc/default/databank/
cp docs/solr_config/conf/schema.xml /etc/default/databank/
# Set up a virtual environment for python and install all the python packages
virtualenv --no-site-packages /var/lib/databank/
cd /var/lib/databank/
source bin/activate
easy_install python-dateutil==1.5
easy_install pairtree==0.7.1-T
easy_install https://github.com/anusharanganathan/RecordSilo/raw/master/dist/RecordSilo-0.4.15-py2.7.egg
easy_install solrpy==0.9.5
easy_install rdflib==2.4.2
easy_install redis==2.4.11
easy_install MySQL-python
easy_install pylons==1.0
easy_install lxml==2.3.4
easy_install web.py
easy_install sqlalchemy==0.7.6
easy_install repoze.what-pylons
easy_install repoze.what-quickstart
# Repoze.what installs repoze.who version 1.0.19 while Databank uses repoze.who 2.0a4. So delete repoze.who 1.0.19
rm -r lib/python2.7/site-packages/repoze.who-1.0.19-py2.7.egg/
# Pylons installs the latest version of WebOb, which expects all requests in utf-8, while earlier versions of WebOb (up to 1.0.8) didn't insist on utf-8.
# So remove the latest version of WebOb, which currently is 1.2b3
rm -r lib/python2.7/site-packages/WebOb-1.2b3-py2.7.egg/
# Install the particular version of repoze.who and WebOb needed for Databank
easy_install repoze.who==2.0a4
easy_install webob==1.0.8
# Pull the sword server from SourceForge and copy the folder sss within sword server into databank
cd ~
wget http://sword-app.svn.sourceforge.net/viewvc/sword-app/sss/branches/sss-2/?view=tar
mv index.html\?view\=tar sword-server-2.tar.gz
tar xzvf sword-server-2.tar.gz
cd /var/lib/databank
cp -r ~/sss-2/sss/ ./
Installing profilers in python and pylons to obtain run time performance and other stats
Note: These packages are OPTIONAL and are only needed on development machines.
See the note below about running Pylons in debug mode (section XV)
easy_install profiler
easy_install repoze.profile
------------------------------------------------------------------------------------------------------
V. Customizing Databank to your environment
------------------------------------------------------------------------------------------------------
All of Databank's configuration settings are placed in the file production.ini or development.ini
* development.ini is configured to work in debug mode with all of the logs written to the console.
* production.ini is configured to not work in debug mode with all of the logs written to log files
The following settings need to be configured
1. Administrator email and smtp server for emails
The databank will email errors to the administrator
Edit the field 'email_to' for the email address
Edit the field 'smtp_server' for the smtp server to be used. The default value is 'localhost'.
2. The location where all of Databank's data is to be stored
Edit the field 'granary.store'
The default value is '/silos'
3^. The url where Databank will be available.
Examples for this are:
The server name like http://example.com/databank/ or
the IP address of the machine, if it has no CNAME http://192.168.23.131/ or
just using localhost (development / evaluation) http://localhost/
Edit the field 'granary.uri_root'
The default value is 'http://databank/'
4. The mysql database connection string for databank
The format of the connection string is mysql://username:password@localhost:3306/database_name
Replace username, password and database_name with the correct values.
The default username is databanksqladmin
The default database name is databankauth
Edit the field 'sqlalchemy.url'
The default value is 'mysql://databanksqladmin:d6sqL4dm;n@localhost:3306/databankauth'
5. The SOLR end point
Should point to the databank solr instance
Edit the field 'solr.host'
The default value is http://localhost:8080/solr
6. Default metadata values
The value of publisher and the default values of rights and license can be modified
These are treated as text strings and are currently used in the manifest.rdf
^ This setting will also need to be modified at /var/lib/databank/rdfdatabank/tests/RDFDatabankConfig.py
Change 'granary_uri_root'.
See section XVI for the significance of the base URI
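The settings listed in this section can be checked before starting Databank.
The following is a minimal sketch and not part of Databank. It assumes the settings
live in the [app:main] section of the ini file (email_to and smtp_server may be
inherited from [DEFAULT]) and it uses only the Python 3 standard library, so run it
outside Databank's virtual environment. The names check_settings and split_db_url
are illustrative.

```python
import configparser
from urllib.parse import urlsplit

# Fields named in this section; the [app:main] section name is an assumption.
REQUIRED = ["email_to", "smtp_server", "granary.store",
            "granary.uri_root", "sqlalchemy.url", "solr.host"]


def check_settings(ini_text, section="app:main"):
    """Return the required fields that are missing or empty."""
    # interpolation=None stops values such as %(here)s from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return list(REQUIRED)
    return [key for key in REQUIRED
            if not parser.get(section, key, fallback="").strip()]


def split_db_url(url):
    """Split mysql://username:password@host:port/database_name into its parts."""
    parts = urlsplit(url)
    return {
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    with open("/etc/default/databank/production.ini") as handle:
        text = handle.read()
    for field in check_settings(text):
        print("missing or empty:", field)
```

Splitting the connection string this way makes a mistyped username or database
name easier to spot than reading the whole string.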
------------------------------------------------------------------------------------------------------
VI. Customizing Databank Sword to your environment
------------------------------------------------------------------------------------------------------
The sword configuration settings are placed in the file sss.conf.json
The url where Databank will be available needs to be set
Without this, a sword client cannot talk to Databank through the sword interface
Edit the field 'base_url'
The default value is http://localhost:5000/swordv2/
Replace http://localhost:5000/ with the correct base url
Examples for this are:
The server name like http://example.com/databank/ or
the IP address of the machine, if it has no CNAME http://192.168.23.131/ or
just using localhost (development / evaluation) http://localhost/
Edit the field 'db_base_url'
The default value is http://192.168.23.133/
Replace with the correct base url
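The two edits above can also be scripted. This is a sketch under assumptions: it
treats 'base_url' and 'db_base_url' as top-level keys of sss.conf.json, and it keeps
the swordv2/ suffix seen in the default base_url. The function name set_sword_urls is
illustrative and the code uses only the Python 3 standard library.

```python
import json


def set_sword_urls(conf, base_url):
    """Return a copy of the parsed sss.conf.json with both URL fields updated."""
    if not base_url.endswith("/"):
        base_url += "/"
    updated = dict(conf)
    # Keep the swordv2/ suffix that appears in the default base_url.
    updated["base_url"] = base_url + "swordv2/"
    updated["db_base_url"] = base_url
    return updated


if __name__ == "__main__":
    defaults = json.loads(
        '{"base_url": "http://localhost:5000/swordv2/",'
        ' "db_base_url": "http://192.168.23.133/"}'
    )
    new_conf = set_sword_urls(defaults, "http://example.com/databank")
    print(json.dumps(new_conf, indent=4))
```

All other keys in the file are carried over unchanged.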
------------------------------------------------------------------------------------------------------
VII. Initialize databank and create the main admin user to access Databank
------------------------------------------------------------------------------------------------------
paster setup-app production.ini
python add_user.py admin password dataflow-devel@googlegroups.com
The second command is used to create the administrator user for databank.
* The administrator's default username is 'admin'.
* This user is the root administrator for Databank and has access to all the silos in Databank.
* Please choose a strong password for the user and replace the string 'password' with the password.
------------------------------------------------------------------------------------------------------
VIII. Installing SOLR with Tomcat and customizing SOLR for Databank
* If you already have an existing SOLR installation and would like to use that, see section XVIII
------------------------------------------------------------------------------------------------------
# Install solr with tomcat
sudo apt-get install openjdk-6-jre
sudo apt-get install solr-tomcat
This will install Solr from Ubuntu's repositories as well as install and configure Tomcat.
Tomcat is installed with CATALINA_HOME in /usr/share/tomcat6 and CATALINA_BASE in /var/lib/tomcat6,
following the rules from /usr/share/doc/tomcat6-common/RUNNING.txt.gz.
The Catalina configuration files are in /etc/tomcat6/
Solr itself lives in three spots, /usr/share/solr, /var/lib/solr/ and /etc/solr.
These directories contain the solr home directory, data directory and configuration data respectively.
You can visit the url http://localhost:8080 and http://localhost:8080/solr to make sure Tomcat and SOLR are working fine
# Stop tomcat before customizing solr
sudo /etc/init.d/tomcat6 stop
# Backup the current solr schema
sudo cp /etc/solr/conf/schema.xml /etc/solr/conf/schema.xml.bak
# Copy (sym link) the Databank SOLR Schema into Solr
sudo ln -sf /etc/default/databank/schema.xml /etc/solr/conf/schema.xml
# Start tomcat and test solr is working fine by visiting http://localhost:8080/solr
sudo /etc/init.d/tomcat6 start
------------------------------------------------------------------------------------------------------
IX. Setting up Supervisor to manage the message workers
------------------------------------------------------------------------------------------------------
Items are indexed in SOLR from Databank, through redis using message queues
The workers that run on these message queues are managed using supervisor
# If you do not already have supervisor, install it
sudo apt-get install supervisor
# Configuring Supervisor for Databank
# Stop supervisor
sudo /etc/init.d/supervisor stop
# Copy (sym link) the supervisor configuration files for the message workers
sudo ln -sf /var/lib/databank/message_workers/workers_available/worker_broker.conf /etc/supervisor/conf.d/worker_broker.conf
sudo ln -sf /var/lib/databank/message_workers/workers_available/worker_solr.conf /etc/supervisor/conf.d/worker_solr.conf
sudo /etc/init.d/supervisor start
# The controller for supervisor can be invoked with the command 'supervisorctl'
sudo supervisorctl
This will list all of the jobs managed by supervisor and their current status.
You can start / stop / restart jobs from within the controller.
For more info on supervisor, read http://supervisord.org/index.html
------------------------------------------------------------------------------------------------------
X. Integrate Databank with Datacite, for minting DOIs (this section is optional)
------------------------------------------------------------------------------------------------------
If you want to integrate Databank with Datacite for minting DOIs for each of the data-packages, then you would need to do the following:
Create a file called doi_config.py which has all of the authentication information given to you by Datacite.
Copy the lines below (starting from the line #-*- coding: utf-8 -*-).
Edit the values for each of the fields in "#Details pertaining to account with datacite" and
"#Datacite api endpoint" if it is different
Save it in a file called doi_config.py and copy it to /var/lib/databank/rdfdatabank/config/
By default, this file is placed in /var/lib/databank/rdfdatabank/config/doi_config.py.
If you want to place the file in a different location, make sure Databank knows where to find the file.
The field 'doi.config' in section [app:main] in production.ini and development.ini has this setting.
#-*- coding: utf-8 -*-
from pylons import config

class OxDataciteDoi():
    def __init__(self):
        """
        DOI service provided by the British Library on behalf of Datacite.org
        API Doc: https://api.datacite.org/
        Metadata requirements: http://datacite.org/schema/DataCite-MetadataKernel_v2.0.pdf
        """
        #Details pertaining to account with datacite
        self.account = "BL.xxxx"
        self.description = "Oxford University Library Service Databank"
        self.contact = "Contact Name of person in your organisation"
        self.email = "email of contact person in your organisation"
        self.password = "password as given by DataCite"
        self.domain = "ox.ac.uk"
        self.prefix = "the prefix as given by DataCite"
        self.quota = 500
        if config.has_key("doi.count"):
            self.doi_count_file = config['doi.count']

        #Datacite api endpoint
        self.endpoint_host = "api.datacite.org"
        self.endpoint_path_doi = "/doi"
        self.endpoint_path_metadata = "/metadata"
------------------------------------------------------------------------------------------------------
XI. Integrate Databank with Apache
------------------------------------------------------------------------------------------------------
1. Install Apache and the required libraries
sudo apt-get install apache2 apache2-utils libapache2-mod-wsgi
2. Stop Apache before making any modification
sudo /etc/init.d/apache2 stop
3. Add a new site in apache sites-available called 'databank_ve27_wsgi'
sudo ln -sf /etc/default/databank/databank_ve27_wsgi /etc/apache2/sites-available/databank_ve27_wsgi
4. Disable the default sites
# Check what default sites you have
sudo ls -l /etc/apache2/sites-available
sudo a2dissite default
sudo a2dissite default-ssl
sudo a2dissite 000-default
5. Enable the site 'databank_ve27_wsgi'
sudo a2ensite databank_ve27_wsgi
6. Reload apache and start it
sudo /etc/init.d/apache2 reload
sudo /etc/init.d/apache2 start
------------------------------------------------------------------------------------------------------
XII. Making sure all of the needed folders are available and apache has access to all the needed parts
------------------------------------------------------------------------------------------------------
Apache runs as user www-data. Make sure the user www-data is able to read from and write to the following locations
/var/lib/databank
/silos
/var/log/databank
/var/cache/databank
Change the permissions so www-data has access to each location (replace path_to_dir with the directory)
sudo chgrp -R www-data path_to_dir
sudo chmod -R 775 path_to_dir
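The access requirement above can be checked with a short script. This is a minimal
sketch, not part of Databank, using only the Python 3 standard library. os.access
tests the user running the script, so it would need to be run as www-data, for
example with sudo -u www-data. The function name inaccessible is illustrative.

```python
import os

# Locations listed in this section.
DATABANK_DIRS = ["/var/lib/databank", "/silos",
                 "/var/log/databank", "/var/cache/databank"]


def inaccessible(paths):
    """Return the paths the current user cannot read, write and enter."""
    needed = os.R_OK | os.W_OK | os.X_OK
    return [path for path in paths
            if not (os.path.isdir(path) and os.access(path, needed))]


if __name__ == "__main__":
    for path in inaccessible(DATABANK_DIRS):
        print("no read/write access:", path)
```

The check covers only the top-level directories; files below them may still have
different permissions, which is why the commands above use the -R option.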
------------------------------------------------------------------------------------------------------
XIII. Test your Pylons installation
------------------------------------------------------------------------------------------------------
Visit the page http://localhost/
If you see an error message look at the logs at /var/log/apache2/databank-error.log and /var/log/databank/
------------------------------------------------------------------------------------------------------
XIV. Run the test code and make sure all the tests pass
------------------------------------------------------------------------------------------------------
The test code is located at /var/lib/databank/rdfdatabank/tests
The tests use the configuration file RDFDatabankConfig.py, which you may need to modify
granary_uri_root="http://databank"
This needs to be the same value as granary.uri_root in the production.ini file (or development.ini file if using that instead)
endpointhost="localhost"
This should point to the url where the databank instance is running.
If it is running on http://localhost/, it should be localhost. If it is running on http://example.org it should be example.org.
If it is running on a non-standard port like port 5000 at http://localhost:5000, this would be localhost:5000
endpointpath="/sandbox/" and endpointpath2="/sandbox2/"
The silos that are going to be used for testing. Currently only the silo defined in endpointpath is used.
The silos will be created by the test if they don't exist.
The rest of the file lists the credentials of the different users used for testing
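The value of endpointhost is the host (and port, if any) part of the Databank url.
As a small illustration using the Python 3 standard library, with endpoint_host as
an illustrative helper name that is not part of the test code:

```python
from urllib.parse import urlsplit


def endpoint_host(url):
    """Return the host[:port] part of the url where Databank is running."""
    return urlsplit(url).netloc


if __name__ == "__main__":
    for url in ("http://localhost/", "http://example.org", "http://localhost:5000"):
        print(url, "->", endpoint_host(url))
```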
To run the tests
Make sure databank is running (see section XI)
cd /var/lib/databank
. bin/activate
cd rdfdatabank/tests
python TestSubmission.py
-----------------------------------------------------------------------------------------------------
XV. Running Pylons from the command line in debug mode and dumping logs to stdout
-----------------------------------------------------------------------------------------------------
If you would like to run Pylons in debug mode from the command line and dump all of the log messages to stdout, stop apache and start paster from the command line.
The configuration file development.ini has been setup to do just that.
Make sure the user running paster has access to all the directories.
When running Pylons on port 80 (host=0.0.0.0 and port=80 in development.ini),
you are likely to be running databank as the super user rather than as user 'www-data', so revisit section XII and
give the super user running paster access to the different directories.
The commands to run pylons from the command line
sudo /etc/init.d/apache2 stop
sudo ./bin/paster serve development.ini
To stop paster, press Ctrl+C
To run paster on another port, modify the fields host and port in development.ini.
For example, to run on port 5000, the settings would be
host = 127.0.0.1
port = 5000
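Whether paster needs to be started with sudo depends on the port, because ports
below 1024 are privileged on Linux. The sketch below reads host and port from the
ini file. It assumes they sit in a [server:main] section, which is the usual Paste
layout but is not stated in this document, and it uses only the Python 3 standard
library. The function names are illustrative.

```python
import configparser


def server_binding(ini_text, section="server:main"):
    """Return (host, port) from the server section of a Paste ini file."""
    # interpolation=None stops values such as %(here)s from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get(section, "host"), parser.getint(section, "port")


def needs_root(port):
    """Ports below 1024 are privileged on Linux, so binding them needs root."""
    return port < 1024


if __name__ == "__main__":
    with open("development.ini") as handle:
        host, port = server_binding(handle.read())
    print(host, port, "needs sudo" if needs_root(port) else "no sudo needed")
```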
-----------------------------------------------------------------------------------------------------
XVI. The Base URI setting (granary.uri_root) for Databank and its significance
-----------------------------------------------------------------------------------------------------
One of the configuration options available in Databank is the 'granary.uri_root' which is the base uri for Databank.
This value is used in the following:
* Each of the silos created in Databank will be initialized with this base URI
* In each of the data packages, the metadata (held in the manifest.rdf) will use this base URI in creating the URI for the data package
* The links to each data item in the package will be created using this base uri (aggregate map for each data package)
If this base uri doesn't resolve, the links for each of the items in the data package will not resolve
This base uri is regarded as permanent. If you change it later, new silos and the data packages created in them will use the new base uri, but the existing silos and data packages will keep the old one.
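The effect of the base uri on package URIs can be illustrated as follows. This is a
sketch only: the path layout silo/datasets/package and the function name package_uri
are assumptions made for the example, not taken from this document, and the code uses
the Python 3 standard library.

```python
from urllib.parse import urljoin


def package_uri(base_uri, silo, package):
    """Build a data package URI under the base URI (hypothetical layout)."""
    if not base_uri.endswith("/"):
        base_uri += "/"
    return urljoin(base_uri, "{}/datasets/{}".format(silo, package))


if __name__ == "__main__":
    # The same package gets a different URI under a different base uri.
    print(package_uri("http://databank/", "sandbox", "pkg1"))
    print(package_uri("http://example.com/databank", "sandbox", "pkg1"))
```

A URI already written into a manifest.rdf under the first base uri does not change
when the setting is switched to the second.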
-----------------------------------------------------------------------------------------------------
XVII. Recap of the services running in Databank
-----------------------------------------------------------------------------------------------------
Apache2
Runs the databank web server (powered by Pylons)
at http://localhost or http://ip_address from your host machine
Apache should start automatically on startup of the VM.
The apache log files are at
/var/log/apache2/
The commands to stop, start and restart apache are
sudo /etc/init.d/apache2 [ stop | start | restart ]
Tomcat
Tomcat runs the SOLR webservice. Tomcat should start automatically on startup of the VM.
Tomcat should be available at http://localhost:8080 and
SOLR should be available at http://localhost:8080/solr
Tomcat is installed with
CATALINA_HOME in /usr/share/tomcat6,
CATALINA_BASE in /var/lib/tomcat6 and
configuration files in /etc/tomcat6/
SOLR itself lives in three spots,
/usr/share/solr - contains the SOLR home directory,
/var/lib/solr/ - contains the data directory and
/etc/solr - contains the configuration data
The commands to stop, start and restart tomcat are
sudo /etc/init.d/tomcat6 [ stop | start | restart ]
Redis
Runs a basic messaging queue used by the API for indexing items into SOLR
and storing information that needs to be accessed quickly (like embargo information)
Redis should start automatically on startup of the VM.
The data directory is at /var/lib/redis and the configuration is at /etc/redis
The commands to stop, start and restart redis are
sudo /etc/init.d/redis-server [ stop | start | restart ]
Supervisor
Supervisor maintains the message workers run by Databank.
Run the supervisor controller to manage processes maintained by supervisor
sudo supervisorctl
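The three HTTP endpoints in this recap can be probed with a short script. This is a
minimal sketch using only the Python 3 standard library, with illustrative function
names. Redis and Supervisor are not HTTP services and are not covered by it.

```python
import http.client
import urllib.request


def service_urls(host="localhost"):
    """URLs of the HTTP services listed in this recap."""
    return {
        "databank": "http://{}/".format(host),
        "tomcat": "http://{}:8080/".format(host),
        "solr": "http://{}:8080/solr".format(host),
    }


def is_up(url, timeout=5):
    """Return True if the url answers with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error codes and bad urls.
        return False


if __name__ == "__main__":
    for name, url in service_urls().items():
        print(name, url, "up" if is_up(url) else "DOWN")
```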
------------------------------------------------------------------------------------------------------
XVIII. Integrating SOLR for Databank with an existing SOLR installation
------------------------------------------------------------------------------------------------------
If you already have a SOLR instance running and would like to add databank to it
- either by creating a new core (https://wiki.apache.org/solr/CoreAdmin)
- or by creating a new instance
http://wiki.apache.org/solr/SolrTomcat#Multiple_Solr_Webapps
http://wiki.apache.org/solr/SolrJetty#Running_multiple_instances
you can do so.
Once you have created a new core or new instance, and verified it is working,
stop SOLR,
replace the example schema file for that core / instance with Databank's schema file.
It is available at /etc/default/databank/schema.xml
Start SOLR
Stop Databank web server (stop apache) and the solr worker (using supervisorctl)
You need to configure the solr end point in the config file production.ini or development.ini
(as mentioned in section V).
In the case of multiple cores, the solr end point would be something like http://localhost:8080/solr/core_databank
if you have called the databank core 'core_databank'
In the case of multiple SOLR instances, the solr end point would be something like http://localhost:8080/solr_databank
if you have called the databank instance 'solr_databank'
Edit the field 'solr.host'.
Replace the default value with your solr endpoint
You need to configure the solr end point in the config file loglines.cfg
located at /var/lib/databank/message_workers/ and used by the solr worker for indexing items into SOLR
Edit the field 'solrurl' in the section [worker_solr].
Replace the default value with your solr endpoint
Start the solr worker (using supervisorctl) and the Databank web server (start apache)
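The endpoint forms described above, and the edit to loglines.cfg, can be sketched in
Python. This assumes loglines.cfg is an ini-style file that the Python 3 configparser
module can read. Writing the file back through configparser drops any comments it
contained, so treat this as an illustration rather than a replacement for editing the
file by hand. The function names are illustrative.

```python
import configparser
import io


def solr_endpoint(base="http://localhost:8080", core=None, instance="solr"):
    """Build the endpoint for a core (multi-core) or a separate instance."""
    url = "{}/{}".format(base.rstrip("/"), instance)
    return "{}/{}".format(url, core) if core else url


def set_worker_solr_url(cfg_text, endpoint):
    """Return cfg_text with solrurl in [worker_solr] set to endpoint."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("worker_solr"):
        parser.add_section("worker_solr")
    parser.set("worker_solr", "solrurl", endpoint)
    out = io.StringIO()
    parser.write(out)  # comments in the original text are not preserved
    return out.getvalue()


if __name__ == "__main__":
    print(solr_endpoint(core="core_databank"))
    print(solr_endpoint(instance="solr_databank"))
```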
-----------------------------------------------------------------------------------------------------