Table of Contents

...

Introduction

The Data Repository is a software component that manages an RDF database, makes it available to other applications through a REST API, and gives end users specific views of the data. It adds role-based access control of varying granularity, transactional editing, custom treatment of ontologies and minimal/fast inference, and various administrative functions on top of the RDF database.

This page explains how the repository works on a host computer system and how to install and maintain it; it serves as an application administrator's manual for the development cycle.


Components and Layout

The data repository is installed in two intentionally separate places on the host operating system:

...

Another advantage to the separate location is that it gives the system administrator more flexibility to assign that directory to a location with appropriate capacity, reliability, and performance.


Command-Line Tools

The installed repository includes a set of command-line tools you will use for many of the administrative tasks. They are found in the etc/ subdirectory of the repository home directory. All of them respond to these two options:

...

Code Block
bash ${REPO_HOME}/etc/upgrade.sh --version
upgrade.sh from release 1.1-MS4.00 SCM revision 5422


Installation


Platform Requirements

  • This application requires Sun's JRE version 1.6, e.g. "Java HotSpot(TM) SE Runtime Environment".
  • The repository is a pure Java webapp and ought to run on any Java Servlet container conforming to version 2.5 of the specification. However, it has only been thoroughly tested on Apache Tomcat 6.0 and smoke-tested on Apache Tomcat 7.0.
  • The supporting utility scripts and tools require a Unix environment such as MacOS or Linux. MS Windows is NOT supported.
  • Aside from the Java Servlet environment, the webapp requires a separate "home" directory, located outside of the servlet container hierarchy, to which the container's JVM has read/write access.

See the Installation section below for more specific and detailed requirements.


Scalability Limits

Note that only one instance of a Repository webapp may be run on a given home directory. This means that only one JVM and Servlet Container may access that home directory and RDF dataset at any one time. This is a restriction imposed by the Sesame triplestore.

It is not possible to "scale" performance of the repository by sharing the online RDF database among multiple machines or processes. It is, however, possible to make periodic read-only snapshots of a database and serve them from separate machines, so long as you do not allow them to be changed.


Install and Configure Repository


Prerequisites

  1. Unix-like operating system. This procedure is only valid for Unix variants like Linux, Solaris, MacOSX. To run some of the scripts you will need to have these commands installed:
    • bash
    • perl
    • curl
    • awk (surely anything that calls itself unix must have awk)
    • tr (seriously, is tr missing? if you are running Gentoo, install an operating system)
  2. Sun's Java JDK 1.6.0.18 (though any 1.6 version ought to work just as well).
  3. Apache Tomcat web servlet container, version 6.0 or 7.0, configured to run with the Java JDK in #2.
    • Tomcat must be configured to use SSL, see for example: Apache Tomcat 6.0 SSL Configuration HOW-TO
    • See the Procedures section if using Ubuntu's tomcat6 package.
    • It may be necessary to download Tomcat directly and install it manually if the version supplied by the host OS's package system is not usable. Don't hesitate to do this if it is expedient; Tomcat can run as a pure Java application in a single file hierarchy, so a manual download can work just as well (if not better) than the packaged version.
  4. Optional: Apache Derby RDBMS installed in your Tomcat servlet container.
    • A copy of Derby is provided if you need to install it.


Step 1. Get Repository Distribution

The repository is distributed as a single Zip file. It contains a file README which identifies the software release it was built from. It is the artifact produced by the Maven project:

Code Block
org.eagle-i:eagle-i-repository-dist


Step 2. Establish the Repository Home Directory

You need to determine the repository's home directory. It may be anywhere on the system so long as it satisfies these criteria:

...

Code Block
REPO_HOME=/opt/eaglei/repo
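
For example, you might create the directory and give the servlet container's account read/write access to it. This is only a sketch, assuming Tomcat runs as the user and group 'tomcat'; adjust the account names and path for your platform.

Code Block
# a sketch: create the home directory and hand it to the servlet container's user (names assumed)
mkdir -p ${REPO_HOME}
chown -R tomcat:tomcat ${REPO_HOME}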


Step 3. Populate the Repository Home Directory from the Distribution

Unpack the distribution Zip archive in a directory under /tmp:

...

It should contain the subdirectories etc/, lib/, and webapps/.
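
A hedged sketch of this step, mirroring the commands used in the upgrade procedure later on this page; the unpacked directory name varies by release and is shown here as a placeholder.

Code Block
# a sketch: unpack the distribution and copy it into the repository home directory
cd /tmp
unzip repository-dist.zip
cp -rp /tmp/repository-<version>/* ${REPO_HOME}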


Step 4. Locate the Servlet Container (Apache Tomcat)

Determine the home directory of your Java Servlet container (e.g. Tomcat), which is usually dictated by your host OS. For example, it may be the 'tomcat' user's home directory, ~tomcat.

...

Code Block
CATALINA_HOME=/opt/tomcat


Step 5. Configure Tomcat: JAVA_OPTS and System Properties

Ensure that your Tomcat server is run with the following options on its JVM. The simplest way to accomplish this is to have the environment variable JAVA_OPTS include those options, but each platform, distro, package etc. of Tomcat has its own mechanism for setting this variable. For example, on Fedora 14, it should be in the file /etc/tomcat6/tomcat6.conf.

...

Code Block
# example
org.eaglei.repository.home = /opt/eaglei/repo
derby.system.home = /opt/eaglei/repo
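
As a sketch of one way to pass the two system properties shown above via JAVA_OPTS; the file you edit and any additional JVM options your release requires are platform-specific, so treat this only as an illustration.

Code Block
# e.g. appended to /etc/tomcat6/tomcat6.conf or your platform's equivalent startup file
JAVA_OPTS="${JAVA_OPTS} -Dorg.eaglei.repository.home=/opt/eaglei/repo -Dderby.system.home=/opt/eaglei/repo"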


Step 6. Install Apache Derby jars if necessary

Look in your Tomcat installation's main lib directory. If there are no files named derby.jar or derby-<version>.jar, you must install the Derby jars from the "scripts" distribution, e.g.

Code Block
cp ${REPO_HOME}/lib/derby-* ${CATALINA_HOME}/lib/


Step 7. (OPTIONAL) Choose alternate Apache Derby implementation

Are you already running applications which use a certain Apache Derby in your servlet container? If so, set the environment variable DERBY_HOME as documented by Apache; if not, leave it unset and the script will use its own version of Derby (the jars in its lib/ subdirectory):

...

NOTE: You must use the same version of Derby to create this initial user database as the version installed in Tomcat, so if Tomcat is already running a version of Derby, set DERBY_HOME to use that.
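
For example, if your Tomcat already hosts a Derby installation, you might point the scripts at it like this; the path is purely a placeholder.

Code Block
# only if a Derby installation is already in use by your servlet container (path is a placeholder)
export DERBY_HOME=/opt/derby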


Step 8. Install the Repository

Follow this step-by-step procedure. Before you start, make sure the Tomcat server is not running.

...

  1. Code Block
    bash ${REPO_HOME}/etc/finish-install.sh \
    -f firstname \
    -l lastname \
    -m admin@ei.edu \
    USERNAME PASSWORD https://localhost:8443
  2. Confirm it is running by visiting the admin page (login with USERNAME and PASSWORD):
    Code Block
    https://localhost:8443/repository/admin


Upgrades

This is the procedure to upgrade an existing repository instance to a new release of the software. All existing configurations, data, and user accounts are preserved. However, if the upgrade includes ontology changes there may also be an extra procedure to transform the existing data to reconcile it with ontology changes. Always consult the release notes.


Before Upgrading


Get the Repository Distribution

The repository release is distributed as a single Zip file. It contains a file README which identifies the software release it was built from. It is the artifact produced by the Maven project:

Code Block
org.eagle-i:eagle-i-repository-dist


Back up

It would be a wise precaution to make a backup of the current repository state so you can roll back to it in case of fatal problems with the upgrade. Follow the Backup Procedure in the Procedures section to get a snapshot of the current repository contents.
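
A hedged example of such a pre-upgrade snapshot, following the make-snapshot.sh usage described under Administrator Tools; the credentials, URL, and target directory are placeholders.

Code Block
# a sketch: snapshot the current repository contents before upgrading
bash ${REPO_HOME}/etc/make-snapshot.sh ADMIN PASSWORD https://localhost:8443 /opt/eaglei/backups/pre-upgrade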


Step By Step Upgrade Procedure

Note that the directory macros ${CATALINA_HOME} and ${REPO_HOME} are used in the examples here; see the Install Procedure above for a description of what they mean.

  1. Unpack the distribution Zip archive in a directory e.g. under /tmp:
    Code Block
    cd /tmp
    unzip repository-dist.zip
    
  2. Shut down your Tomcat Java servlet container.
  3. Delete the old repo webapp subdirectory and WAR file, since there should not be any local modifications there. For example:
    Code Block
    rm -rf ${CATALINA_HOME}/webapps/ROOT*
    
  4. Save the current release files in case you have to roll back:
    Code Block
    cd ${REPO_HOME}
    mv etc etc.old
    mv lib lib.old
    mv webapps webapps.old
    
  5. Copy the distribution into place (in this example the distribution is version 1.1-MS1.00-SNAPSHOT) -- note that there are two commands:
    Code Block
    cp -f -rp /tmp/repository-1.1-MS1.00-SNAPSHOT/* ${REPO_HOME}
    cp webapps/* ${CATALINA_HOME}/webapps
    
  6. Start up your Tomcat Java servlet container.
  7. Run the upgrade script, substituting your admin's username and password:
    Code Block
    bash ${REPO_HOME}/etc/upgrade.sh USERNAME PASSWORD https://localhost:8443
    Watch the output of upgrade.sh very carefully! Pay particular attention to the final status and to any messages beginning with "WARN"; they indicate problems you MUST resolve.
  8. Confirm that the upgrade worked: visit the repo admin page, check for the new version, and then follow the link to Show Data Model Ontology versions to confirm that the "loaded" and "available" versions of the ontology are the same.
    When running the upgrade script, there may be messages about out-of-date NG_Internal and NG_Query graphs. Most likely these are nothing to worry about -- check the release notes. These graphs are only initialized from static files when the repository is created, and afterward they accumulate statements, so reloading a new copy of the original data is not practical. Some releases may include instructions for making changes in these graphs when upgrading from previous versions.


Configuration


URIs for Creating New Roles, Transitions, and Workspaces

When you create a new Role or Workflow Transition, you have the option of assigning your own URI to the new resource. When should you make up a URI, and when should you just let the system create one?

...

For Workspace (aka Named Graph) URIs, you have to assign them in the process of creating a new Named Graph. Follow the rules below to create a reasonable URI.


Rules of Creating Your Own URIs

Note that these URIs do not need to be resolvable. They are purely symbolic names for instances buried within the repository, which are virtually guaranteed never to appear in the outside world. So don't worry about whether the URI actually resolves; most of the existing URIs for these types of things are not resolvable anyway.

...

  • http://dartmouse.edu/repo/Role_LabRat
  • http://dartmouse.edu/repo/WFT_13_2
  • http://eagle-i.org/ont/repo/1.0/DARTMOUSE_ROLE_PI
  • http://eagle-i.org/ont/repo/1.0/DARTMOUSE_WFT_TRASH
Exception: The URI of a named graph representing an ontology is usually the same as the URI of the ontology itself, i.e. the subject of its owl:versionInfo statement. If you should happen to add a new ontology named graph to the repository, use that URI for its name. However, this should be a very rare occurrence; usually new ontological information is simply added to the existing eagle-i data model ontology graph.


Managing Access Controls on Contact & "Hidden" Properties

The repository has a mechanism for restricting access to some of the properties of resource instances, deemed "hidden" and "contact" properties - these are two distinct sets of properties, configured independently but by an identical mechanism. See the Resource Property Hiding and Access Control sections under Concepts in the Repository Design Specification / API Manual for more details about how this works.

...

Once you have set up a single repository to your liking, you can export and re-import the grants to other repositories. See the Procedure: Exporting and Importing Property Access Controls section below.


Configuration Reference

This section lists everything that can be configured, so you can get familiar with it before installing anything.


System Properties

The repository requires these system properties to be defined in the JVM environment running your servlet container:

...

Code Block
org.eaglei.repository.home = /opt/eaglei/repo
derby.system.home = /opt/eaglei/repo


Repository Home Directory

The repository has a notion of a home directory, the root of a hierarchy of other runtime files.

...

  • configuration.properties - Java properties file with repository and log4j configuration properties. This file is optional and is not created automatically; the administrator must create it if needed.
  • logs/ - Default subdirectory for log files, see configuration. Created automatically by default.
  • sesame/ - Default Sesame RDF database files - DO NOT TOUCH. Created automatically by default.
  • etc/ - Contains scripts and tools for the repo administrator.
  • db/ - Default subdirectory Derby RDBMS files - DO NOT TOUCH. Created automatically by default.


The Repository Configuration Properties File

The configuration file is read by Apache Commons Configuration, which recognizes interpolated property and system property values. See its documentation for more information about features in the configuration file.

...

Code Block
log4j.logger.org.eaglei.repository=DEBUG, repository
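
As a hedged illustration of interpolation: the log.dir property is described under Configuring Logging below, and the system-property lookup syntax shown is Commons Configuration's; whether you want to derive it this way is entirely up to you.

Code Block
# a sketch: derive the log directory from the repository home system property
log.dir = ${sys:org.eaglei.repository.home}/logs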


Configuring Logging

The repository uses Apache log4j for its logging. Any properties starting with log4j. in the repository configuration properties are simply passed through to configure log4j. The Loggers (aka Categories) are all descendants of the repository root Logger, org.eaglei.repository, so you should configure the log level and appenders for that Logger.

...

Code Block
log4j.logger.org.eaglei.repository=INFO, repository
log4j.additivity.org.eaglei.repository=false
log4j.logger.org.eaglei.repository.servlet.RepositoryServlet=DEBUG, repository
log4j.additivity.org.eaglei.repository.servlet.RepositoryServlet=false
log4j.appender.repository.BufferedIO=false
log4j.appender.repository.ImmediateFlush=true


Monitoring and Troubleshooting


Version Information

It's often helpful to know exactly what version of the repository you're dealing with, especially in a hectic development and/or testing environment when many versions are available. The release version appears in these places:

  1. Dissemination HTML pages: the head element contains a meta tag with the name eaglei.version, e.g.
    Code Block
    <meta name="eaglei.version" content="1.1-MS5.00-SNAPSHOT" />
  2. The repository admin home page /repository/admin lists application version info in a human-readable format.
  3. The page /repository/version gives a complete breakdown of component versions, including the repo source and the version of the OpenRDF Sesame database. It is XHTML and includes meta tags that make it easy to scrape or transform, as sketched below.
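
If you need to check the version from a script, a hedged sketch along these lines should work; the URL and login are placeholders, and whether credentials are required for this page may depend on your access-control setup.

Code Block
# a sketch: pull the version meta tags from the /repository/version page
curl -s -u USERNAME:PASSWORD https://localhost:8443/repository/version | grep '<meta'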


Log Files

Since the repository is mainly accessed by the REST service API it provides to other applications, you should get used to monitoring it by watching the log file. This is a text file (UTF-8 encoding) maintained by the log4j library under the control of the repository's configuration properties. See the description of the log.dir property above to learn the directory where logfiles are created; they are automatically rotated when the logfile grows too large.

...

To troubleshoot problems with the logging system itself (e.g. log4j config that isn't working as expected), look for where your Java Servlet container writes the standard output stream. For Tomcat 6, this is typically the catalina.out file in some log directory.
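
For routine monitoring of the repository's own log, something as simple as the following works; the path assumes the default logs/ subdirectory and the release 1.1 filename noted elsewhere on this page.

Code Block
# a sketch: follow the repository log in real time
tail -f ${REPO_HOME}/logs/repository.log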


Performance Monitoring

As of release 1.1MS5 the repo can log the elapsed time (in milliseconds) for each service request. You must enable DEBUG level logging for the RepositoryServlet, as in this configuration example.

...

See the eaglei.repository.slow.query configuration property for more details. Note that this only applies to queries made through the SPARQL Protocol endpoint, not to SPARQL queries generated internally by the repo code.


Tuning

The performance of Sesame's NativeStore implementation is extremely sensitive to its index configuration. There is a major benefit to configuring indexes that help resolve the triple patterns used by the most frequent and/or voluminous SPARQL queries. A knowledgeable repository administrator should adjust the setting of the eaglei.repository.sesame.indexes property so the NativeStore builds the most necessary indexes. See the documentation for that configuration property for more details.
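
Purely as a hypothetical illustration of the property just mentioned: the index list below is not a recommendation, and the right choice depends on your actual query patterns.

Code Block
# each index is a permutation of s, p, o, c (subject, predicate, object, context); values shown are illustrative
eaglei.repository.sesame.indexes = spoc,posc,cspo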


Administrator Tools


make-snapshot.sh Script

The make-snapshot script creates a complete backup copy of a data repository, in a designated directory. It has to be given a directory because the backup consists of multiple files. It is packaged with the repository distribution, under the etc/ directory.

...

NO MESSAGE is printed upon success, which lets it run under cron.


Usage

Synopsis:

Code Block
make-snapshot.sh username password repo-URL directory

...

  • username - username with which to authenticate to the repo
  • password - password with which to authenticate to the repo
  • repo-URL - prefix of repository URL, e.g. "https://localhost/"
  • directory - directory in which to write the dump, will be created if necessary


Restoring Dumps made by make-snapshot

Given a dump created in e.g. ${DUMPDIR}, use these commands to restore it to a newly-created, empty repository (where ${REPOSITORY} is the URL prefix of the repo):

...

Code Block
curl -s -S -D - -u ADMIN:PASSWORD -F action=replace -F all= \
-F "content=@${DUMPDIR}/resources.trig;type=application/x-trig" \
${REPOSITORY}/repository/graph


Examples

For example, your crontab might invoke this command to write a daily snapshot:

...

Code Block
make-snapshot.sh ADMIN PASSWORD https://localhost:8443 "daily_cron_`date +%u`"
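
If you place the command directly in a crontab, remember that cron treats '%' specially and it must be written as '\%'. A hypothetical entry might look like the following; the schedule, paths, and credentials are all placeholders.

Code Block
# a sketch of a crontab line running the snapshot at 02:00 every day
0 2 * * * /opt/eaglei/repo/etc/make-snapshot.sh ADMIN PASSWORD https://localhost:8443 "/opt/eaglei/backups/daily_cron_`date +\%u`"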


move-everything.sh: Copying Everything Between Repositories or Files

The move-everything.sh script replicates all of a repository's contents - including resources, users and metadata - from one repository to a different one, or from a static file dump to a live repository. It transforms all resource (and user) URIs to match the URI prefix of the destination repository.

...

Since resource URIs have to be resolvable, this effectively creates new resources in the destination repository with URIs that resolve there. It does this by substituting the target's default prefix into all URIs that used to resolve at the source repository.


This Is Inherently Not A Good Idea

Before you start copying resources around, be sure you understand why this is not a good idea! Reasons include:

...

Given all of these limitations, move-everything can still be an effective way of populating a repository for testing and demonstrations. Just stay aware of what doesn't work, and only use it when the results are temporary and will be discarded.


Restoring from Backups

There is one other legitimate use of move-everything: restoring a backup copy made with make-snapshot. In this case you don't really have to transform the URIs, and the whole intent is to re-create the original state of the repo so the side effects are all desired.
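
A hedged sketch of such a restore from a snapshot directory; the prefix, credentials, and paths are placeholders, and the option meanings are described under Using the Script below.

Code Block
# a sketch: restore a make-snapshot dump into the repository it came from
bash ${REPO_HOME}/etc/move-everything.sh -f \
    --from-snapshot /opt/eaglei/backups/daily_cron_1 \
    --from-prefix https://localhost:8443/i/ \
    ADMIN PASSWORD https://localhost:8443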


Using the Script

The resource copying script is installed under etc/ in the repository home directory. Its name is move-everything.sh. It only runs on a Unix-based operating system such as Linux or MacOS X. It requires bash, perl 5, and the curl executable.

...

Code Block
Usage: move-everything.sh [--version] [ -f | --force ] \
[-exclude-users user,user,..] [-nousers]
--from-snapshot directory --from-prefix from-prefix \
to-username to-password to-repo-URL


Options

The --force option: Normally the script starts up with a dialog explaining how dangerous it is and how the destination repo will be completely obliterated, and asks if you want to continue. Adding this option (abbreviated -f) bypasses the question and runs without asking, which is necessary when embedding it in another script. Only specify --force when you are very sure you're doing the right thing. When prompted with the "Danger!" message, take time to actually read it before agreeing. You may be surprised.

...

The --from-snapshot and --from-prefix options must be specified together. They select the input data from a directory of serialized files, in the same format as produced by the make-snapshot script. The value of --from-snapshot is the path to the directory containing the RDF serialization files. The value of --from-prefix is the exact and complete URI prefix (including the trailing '/') of the repo that generated the dump in that directory. The prefix is needed because the script does not have access to that repository to query it for its prefix.


Fixed Arguments

The fixed command arguments are either one or two triplets of repository access information, i.e. the username, password, and URL of each repo.

...

Code Block
make-snapshot bigbird PASSWORD https://harvard.eagle-i.net \
harvard.monday
move-everything.sh -f \
--from-snapshot harvard.monday \
--from-prefix http://harvard.eagle-i.net/i/ \
bigbird PASSWORD https://localhost:8443


Hints

We strongly recommend you avoid using the Superuser (administrator) login on the source repository, to prevent accidentally obliterating it by getting the argument order wrong. Use an account that has read access to every graph (e.g. the Admin-Read-Only role). This restricts you to using the --nousers version of the command but in most cases that is adequate. See the Procedures section for recommendations on how to maintain copies of repositories this way.


move-resources.sh - Copying Only Resource Instances

The goal of this procedure is to copy all of the resource instances in one Named Graph from one repository to another, along with their relevant provenance and administrative metadata.

Since resource URIs have to be resolvable, this effectively creates new resources in the destination repository with URIs that resolve there. The hostname portion of the URI matches the new repository server, and even the local name is allocated by the destination repository -- so there is no predictable way to relate new URIs to the old ones.


This Is Inherently Not A Good Idea

Before you start copying resources around, be sure you understand why this is not a good idea! Reasons include:

...

Given all of these limitations, the resource-mover script can still be an effective way of populating a repository for testing and demonstrations. Just stay aware of what doesn't work.


Using the Script

The resource copying script is installed under etc/ in the repository home directory. Its name is move-resources. It only runs on a Unix-based operating system such as Linux or MacOS X. It requires perl 5 and the curl executable.

...

Code Block
move-resources -s https://qa.harvard.eagle-i.net:8443 -u bert:ernie \
-g http://eagle-i.org/ont/repo/1.0/NG_Published https://localhost:8443 \
root:password http://eagle-i.org/ont/repo/1.0/NG_Experimental
Moved 4694 data statements and 322 metadata statements.


Procedures


Procedure: Upgrading Packaged Tomcat

Warning: IMPORTANT

If you are using the Tomcat server from e.g. a Linux distro's package system, you must be aware of the following serious pitfall that can affect the repository when you upgrade Tomcat through the package system:

...

Finally, delete the entire ${CATALINA_HOME}/work directory. Tomcat rebuilds it on startup anyway, but it can contain stale caches that do not get updated. Now you can start up Tomcat as usual.
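
For example:

Code Block
# remove Tomcat's work directory; it is rebuilt automatically on the next startup
rm -rf ${CATALINA_HOME}/work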


Procedure: Installing Repo on Ubuntu 10's packaged Tomcat6

See also: The Procedure to redirect Port 80 so your URLs are simplified.

...

  1. Shut down tomcat. This is major surgery, and tomcats don't like to be vivisected no matter how much more satisfying you may find it.
  2. Disable Java Security -- alternatively, you could try to configure all the authorization grants to give the repository webapp access to the filesystem and property resources it needs, but I found it much easier to just disable Java security. DO NOT RUN THE TOMCAT PROCESS AS ROOT if you do this, but you should not be running it as root in any case. That's just insane.
    1. Edit the file /etc/init.d/tomcat6 and change the following variable to look like this:
      Code Block
      TOMCAT6_SECURITY=no
  3. Install Derby jars: ONLY IF DERBY IS NOT ALREADY INSTALLED IN THE COMMON AREA OF YOUR TOMCAT. If another webapp is already using Derby, they should share that version.
    1. Find the Derby jars in the lib/ subdirectory under where you installed the create-user.sh script.
    2. Copy them to the Tomcat common library directory:
      Code Block
      cp ${REPO-ZIP-DIR}/lib/derby* /usr/share/tomcat6/lib/
  4. Install the webapp: First, get rid of any existing root webapp, then copy in the webapp (ROOT.war file from your installation kit) and be sure it is readable by the tomcat6 user:
    Code Block
    rm -rf /var/lib/tomcat6/webapps/ROOT*
    cp ROOT.war /var/lib/tomcat6/webapps/ROOT.war
  5. Install cached webapp context: This is VERY IMPORTANT, and the Tomcat docs do not even mention it, but without it your server will be mysteriously broken. The file /etc/tomcat6/Catalina/localhost/ROOT.xml must be a copy of your app's context.xml. Redo this command after installing every new ROOT.war:
    Code Block
    mkdir -p /etc/tomcat6/Catalina/localhost
    unzip -p /var/lib/tomcat6/webapps/ROOT.war META-INF/context.xml > /etc/tomcat6/Catalina/localhost/ROOT.xml
  6. Add System Properties: Be sure you have added system properties to the file /etc/tomcat6/catalina.properties e.g.
    Code Block
    org.eaglei.repository.home = /opt/eaglei/repo
    derby.system.home = /opt/eaglei/repo
    ...of course, the value of these properties will be your Repository Home Directory path.
  7. Start up Tomcat:
    Code Block
    sudo /etc/init.d/tomcat6 start
  8. Troubleshooting: If there are problems, check the following places for logs (because packaged apps make everything so much easier):
    • /var/log/daemon.log - really dire tomcat problems and stdout/stderr go to syslog
    • /var/log/tomcat6/* - normal catalina logging
    • ${REPOSITORY_HOME}/logs/repository.log - default repo log file in release 1.1; under 1.0 the filename was default.log.


Procedure: Run Tomcat on Port 80 (and 443)

We want the repository (and other Web tools) to have a simple URL, without the ugly port number after the hostname, e.g. NOT http://dev.harvard.eagle-i.net:8080/..., but just http://dev.harvard.eagle-i.net/ (because, really, that's already enough to remember). This procedure uses IP port redirection to let your Tomcat server appear to be running on the canonical HTTP port, 80. It is the simplest and safest method to accomplish this under Linux.

The sanest alternative, running an Apache httpd server as an AJP forwarder, is much more effort and adds another point of failure. We will not even discuss running Tomcat as root so it has access to port 80, since that is simply unacceptable.


Ubuntu

This step-by-step procedure has only been tested under Ubuntu Linux 9.10 (krazy kitten). For Fedora, see the next section.

...

  1. Discover your machine's primary IP address and set the ADDR shell variable (note that this assumes eth0 is your primary network interface -- use ifconfig -a to see them all):
    Code Block
    ADDR=`ifconfig eth0 | perl -ne 'print "$1\n" if m/\sinet addr:(\d+\.\d+\.\d+\.\d+)\s/;'`
  2. Run these iptables commands to redirect all port 80 requests to port 8080.
    Code Block
    iptables -t nat -A OUTPUT -d localhost -p tcp --dport 80 -j REDIRECT --to-ports 8080
    iptables -t nat -A OUTPUT -d $ADDR -p tcp --dport 80 -j REDIRECT --to-ports 8080
    iptables -t nat -A PREROUTING -d $ADDR -p tcp --dport 80 -j REDIRECT --to-ports 8080
  3. (If using SSL) Run these iptables commands to redirect all port 443 requests to port 8443.
    Code Block
    iptables -t nat -A OUTPUT -d localhost -p tcp --dport 443 -j REDIRECT --to-ports 8443
    iptables -t nat -A OUTPUT -d $ADDR -p tcp --dport 443 -j REDIRECT --to-ports 8443
    iptables -t nat -A PREROUTING -d $ADDR -p tcp --dport 443 -j REDIRECT --to-ports 8443
  4. Save the rules in the canonical place to be reloaded on boot:
    Code Block
    iptables-save > /etc/iptables.rules
  5. Create a script to be run by the network startup infrastructure that will reload the iptables whenever the network is configured on:
    Code Block
    cat << EOF > /etc/network/if-pre-up.d/iptablesload
    #!/bin/sh
    iptables-restore < /etc/iptables.rules
    exit 0
    EOF
  6. Test by accessing your server both locally and remotely by the port-80 URL. Then reboot the machine and try it again to be sure the iptables commands are run correctly on boot.


Fedora

Several of the same assumptions/caveats as Ubuntu (above) apply:

...

  1. Run this iptables command to redirect all port 80 requests to port 8080.
    Code Block
    /sbin/iptables -t nat -I PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080
  2. Save the rules in the canonical place to be reloaded on boot:
    Code Block
    /sbin/iptables-save
  3. Update the startup settings so iptables will run upon reboot:
    Code Block
    chkconfig --level 35 iptables on
  4. Test by accessing your server both locally and remotely by the port-80 URL. Then reboot the machine and try it again to be sure the iptables commands are run correctly on boot.


Procedure: Dump and Restore the RDF Resource Data

The recommended way to dump out the RDF resource data content of the repository is to export it as serialized RDF. If you are exporting the entire contents of the repository, it is essential to preserve the mapping of statements to named graphs, so you must use one of the formats that encodes RDF as quads (statement plus graph-name/context).

...

This is a complex manual procedure with many options -- for a simpler semi-automated backup snapshot procedure, see the section on using the make-snapshot script.


Make Backup Dump (obsolete - see make-snapshot)

A typical command to make a backup in TriG format to a file, e.g. all-dump.trig, from a server running locally on port 80 follows. In practice, you'll probably need to change the site-specific parts, such as the output filename, the username:password login credentials, and the hostname in the target URL if not running locally.

...

Code Block
status=200, 13.283sec


Restore Repository from Backup

NOTE: This form of the procedure is a bit obsolete, since the new move-everything.sh script can also restore the state of a repository from its own backup -- effectively moving data to itself. See that command for details.

...

Code Block
status=201, 13.283sec


Procedure: Saving and Restoring User Accounts

As of the MS6 release, you can use the new Export/Import service to create user accounts automatically (e.g. on a newly-created repository). This is NOT the same thing as true backup and restore; rather, it is intended more for setting up a test environment. The export and import services are very complex and powerful. This only gives one small example of what they can do. For all the details, see their entry in the API Manual.


Step 0. Create Prototype Accounts and Export Them

You only need to do this once. Once you create a user file you like, you can use it over and over, on as many sites and tiers as you like.

...

Code Block
.... -d 'exclude=frankenstein moreau lizardo' ....


Step 1. Import Accounts on Destination Sites

You can start with a newly-created repository which needs to have user accounts added. It only has the initial administrator login, e.g. bigbird. Use the import service to add users from the file you created in step 0. The following command adds all of the accounts except bigbird (since it already exists), and aborts without changing anything if there are already duplicates of any of the users on the destination repo. It will print "status=200" on success.

...

Note that the transform=yes argument means the import will translate the instance URIs of the new users to newly-created URIs in the repository's default namespace. This is usually what you want. If you are restoring users that are already in the correct namespace and want to preserve their old URIs, substitute transform=no.


Step 2. Testing Users

The easiest way to test the existence and details of a user is with the /whoami service. It does not show roles, however; you'll have to go to the repository administrative UI for that (or take it on faith). For example, after restoring users including curator, this is how you'd check that curator exists:

...

It's probably only necessary to test one user like this, and to make sure the output includes a URI, as a check that the whole import succeeded.


Procedure: Exporting and Importing Property Access Controls

This is only relevant to release 1.5MS1 and later, when resource properties have access controls.

...