
...

Introduction

The Data Repository is a software component that manages an RDF database, makes it available to other applications through a REST API, and gives end users specific views of the data. It adds role-based access control of varying granularity, transactional editing, custom treatment of ontologies and minimal/fast inference, and various administrative functions on top of the RDF database.

This page explains how the repository works on a host computer system and how to install and maintain it; it serves as an application administrator's manual for the development cycle.

 

Components and Layout

The data repository is installed in two intentionally separate places on the host operating system:

...

Another advantage to the separate location is that it gives the system administrator more flexibility to assign that directory to a location with appropriate capacity, reliability, and performance.

 

Command-Line Tools

The installed repository includes a set of command-line tools you will use for many of the administrative tasks. They are found in the etc/ subdirectory of the repository home directory. All of them respond to these two options:

  • --version - display which release version the tool came from
  • --help - display a synopsis of command args and switches

For example:

No Format

bash ${REPO_HOME}/etc/upgrade.sh --version
upgrade.sh from release 1.1-MS4.00 SCM revision 5422

 

Installation

 

Platform Requirements

  • This application requires Sun's JRE version 1.6, e.g. "Java HotSpot(TM) SE Runtime Environment".
  • The repository is a pure Java webapp and ought to run on any Java Servlet container conforming to the 2.5 version of the specification. It has only been thoroughly tested on Apache Tomcat 6.0 and Apache Tomcat 7.0, however.
  • The supporting utility scripts and tools require a Unix environment such as MacOS or Linux. MS Windows is NOT supported.
  • Aside from the Java Servlet environment, the webapp requires a separate "home" directory, located outside of the servlet container hierarchy, to which the container's JVM has read/write access.
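
A quick way to confirm which Java runtime the servlet container will use (the output should identify a HotSpot 1.6 runtime; exact wording varies by platform):

No Format

java -version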

...

Install and Configure Repository

 

Step 1. Get Repository Distribution

The repository is distributed as a single Zip file. It contains a file README which identifies the software release it was built from. It is the artifact produced by the Maven project:

No Format

org.eagle-i:eagle-i-repository-dist

 

Step 2. Establish the Repository Home Directory

...

We will call this directory REPO_HOME and it will appear in commands and scripts below as ${REPO_HOME}.
Create the repository home directory in your file system. It is useful to have a base eagle-i directory to place data and configuration used by other eagle-i applications. For example,

No Format

mkdir /opt/eaglei
mkdir /opt/eaglei/repo

If necessary (i.e. if you created it using your own user-id), change ownership of the directory to the user-id under which Tomcat is running. If you followed the example above, change the ownership of the two directories using the -R option. For example, if the user-id under which Tomcat executes is tomcat:

No Format

chown -R tomcat /opt/eaglei

Initialize it as a variable in your shell environment. In this example (Bourne/bash shell) the repository home directory is /opt/eaglei/repo:

No Format

REPO_HOME=/opt/eaglei/repo
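
If your login shell is csh/tcsh rather than Bourne/bash, the equivalent (using the same example path) is:

No Format

setenv REPO_HOME /opt/eaglei/repo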

 

Step 3. Populate the Repository Home Directory from the Distribution

Unpack the distribution Zip archive in a directory under /tmp:

No Format

cd /tmp
unzip repository-dist.zip

Move the contents of the unzipped directory to your repository home directory. In this example the distribution is version 1.1-MS1.00-SNAPSHOT:

No Format

mv /tmp/repository-1.1-MS1.00-SNAPSHOT/* ${REPO_HOME}/.

List the contents of the home directory:

No Format

cd ${REPO_HOME}
ls

It should contain the subdirectories etc/, lib/, and webapps/.

 

Step 4. Locate the Servlet Container (Apache Tomcat)

...

We will call this directory CATALINA_HOME and it will appear in commands and scripts below as

No Format

${CATALINA_HOME}

Initialize it as a variable in your shell environment. In this example (Bourne/bash shell) the Tomcat home directory is /opt/tomcat:

No Format

CATALINA_HOME=/opt/tomcat

...

Ensure that your Tomcat server is run with the following options on its JVM. The simplest way to accomplish this is to have the environment variable JAVA_OPTS include those options, but each platform, distro, package, etc. of Tomcat has its own mechanism for setting this variable. For example, on Fedora 14, it should be in the file /etc/tomcat6/tomcat6.conf. If you can't find your distribution's configuration file, you may create a file setenv.sh in Tomcat's bin directory to add the environment variable:

No Format

...(ONLY DO THIS if you can't find your distribution's config file)
cd ${CATALINA_HOME}/bin
touch setenv.sh

Edit the configuration file (tomcat6.conf, setenv.sh or whatever your distribution uses) and add the following line:

No Format

JAVA_OPTS="-XX:PermSize=64M -XX:MaxPermSize=256M -Xmx1024m"

Add the following two system properties to file conf/catalina.properties under the CATALINA_HOME directory -- the same directory where you'll find server.xml. The value for both of these properties is the absolute path of the repository home directory. In this example, it is /opt/eaglei/repo:

No Format

# example
org.eaglei.repository.home = /opt/eaglei/repo
derby.system.home= /opt/eaglei/repo

...

Look in your Tomcat installation's main lib directory. If there are no files named derby.jar or derby-<version>.jar, you must install the Derby jars from the "scripts" distribution, e.g.

No Format

cp ${REPO_HOME}/lib/derby-* ${CATALINA_HOME}/lib/

...

Bourne/bash shell version:

No Format

....(ONLY DO THIS when ALREADY running Apache Derby!) 
export DERBY_HOME=my-derby-installation-toplevel 

C Shell/csh version:

No Format

....(ONLY DO THIS when ALREADY running Apache Derby!)
setenv DERBY_HOME my-derby-installation-toplevel 

NOTE: You must use the same version of Derby to create this initial user database as the version installed in Tomcat, so if Tomcat is already running a version of Derby, set  DERBY_HOME to use that.
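
A quick way to check whether, and at which version, Derby is already installed in Tomcat (a sketch; the jar name and manifest fields vary between Derby releases):

No Format

ls ${CATALINA_HOME}/lib/derby*.jar
# the release number is usually recorded in the jar manifest
unzip -p ${CATALINA_HOME}/lib/derby.jar META-INF/MANIFEST.MF | grep -i version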

 

Step 8. Install the Repository

Follow this step-by-step procedure. Before you start, make sure the Tomcat server is not running.

     

  1. Navigate to Tomcat's webapps directory. If a directory named ROOT exists, move it aside; the eagle-i repository must be the ROOT application.

    No Format
    
    cd ${CATALINA_HOME}/webapps
    mv ROOT ROOT.original
    
  2. Copy the repository webapp to the Tomcat webapps directory:

    No Format
    
    cp ${REPO_HOME}/webapps/ROOT.war ${CATALINA_HOME}/webapps/.
    
  3. Create your initial administrative user login. Think of a USERNAME and PASSWORD and substitute them for the upper case words in this command:

    No Format
    
    bash ${REPO_HOME}/etc/prepare-install.sh USERNAME PASSWORD ${REPO_HOME}
    
  4. Start up Tomcat.
  5. Run the finish-install script, which loads the data model ontology among other things. Note that you can also give it additional options to specify a personal name and email box for the initial admin user.

    No Format
    bash ${REPO_HOME}/etc/finish-install.sh USERNAME PASSWORD https://localhost:8443

    ...or, with username metadata included:

    No Format
    bash ${REPO_HOME}/etc/finish-install.sh \
    -f firstname \
    -l lastname \
    -m admin@ei.edu \
    USERNAME PASSWORD https://localhost:8443
  6. Run the upgrade.sh script, which performs additional configuration.

    No Format
    bash ${REPO_HOME}/etc/upgrade.sh USERNAME PASSWORD https://localhost:8443
    
  7. Copy the file default.configuration.properties located in ${REPO_HOME} into a file named configuration.properties and edit the latter to reflect your installation. See the #Configuration section below for details on the property definitions and expected values.
  8. Restart Tomcat to pick up these configuration changes. Confirm that the eagle-i repository is running by visiting the admin page (login with USERNAME and PASSWORD):

    No Format
    
    https://localhost:8443/repository/admin
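
    If you prefer a command-line check, the whoami service (described later in this page) confirms that the webapp is up and your admin login works. Substitute your own credentials, and add -k if your server uses a self-signed SSL certificate:

    No Format
    
    curl -k -s -S -u USERNAME:PASSWORD -G -d format=text/plain https://localhost:8443/repository/whoami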
    

 

Upgrade

This is the procedure to upgrade an existing repository instance to a new release of the software. All existing configurations, data, and user accounts are preserved. However, if the upgrade includes ontology changes there will also be an extra procedure to transform the existing data to reconcile it with ontology changes. 

 

Before Upgrading

 

Get the Repository Distribution

The repository release is distributed as a single Zip file. It contains a file README which identifies the software release it was built from. It is the artifact produced by the Maven project:

No Format

org.eagle-i:eagle-i-repository-dist

...

It would be a wise precaution to make a backup of the current repository state so you can roll back to it in case of fatal problems with the upgrade. Follow the Backup Procedure in the #Procedures section to get a snapshot of the current repository contents.

 

Step By Step Upgrade Procedure

Note that the directory macros ${CATALINA_HOME} and ${REPO_HOME} are used in the examples here; see the Install Procedure above for a description of what they mean.

  1. Unpack the distribution Zip archive in a directory e.g. under /tmp:

    No Format
    
    cd /tmp
    unzip repository-dist.zip
    
  2. Shut down your Tomcat java servlet container.
  3. Delete the old repo webapp subdirectory and WAR file, since there should not be any local modifications there. For example:

    No Format
    
    rm -rf ${CATALINA_HOME}/webapps/ROOT*
    
  4. Save the current release files in case you have to roll back:

    No Format
    
    cd ${REPO_HOME}
    mv etc etc.old
    mv lib lib.old
    mv webapps webapps.old
    
  5. Copy the distribution into place (in this example the distribution is version 1.7-MS1.01) -- note there are 2 steps:

    No Format
    
    cp -f -rp /tmp/repository-1.7-MS1.01/* ${REPO_HOME}
    cp ${REPO_HOME}/webapps/ROOT.war ${CATALINA_HOME}/webapps/.
    
  6. Start up your tomcat java servlet container.
  7. Run the upgrade script, substituting your admin's username and password:

    No Format
    bash ${REPO_HOME}/etc/upgrade.sh USERNAME PASSWORD https://localhost:8443

    Watch the output of upgrade.sh very carefully! Pay particular attention to the final status and any messages beginning with "WARN"; they indicate problems you MUST resolve.

  8. Confirm that it worked: visit the repo admin page, check for the new version, and then follow the link to Show Data Model Ontology versions to confirm that the "loaded" and "available" versions of the ontology are the same. When running the upgrade script, there may be messages about out-of-date NG_Internal and NG_Query graphs. Most likely these are nothing to worry about -- check the release notes. These graphs are only initialized from static files when the repository is created, and afterward they accumulate statements, so reloading a new copy of the original data is not practical. Some releases may include instructions for making changes in these graphs when upgrading from previous versions.
  9. Download the data migration toolkit that corresponds to your repository version (in this example, version 1.7-MS1.02) and run the data migration script, substituting your admin's username and password:

    No Format
    
    wget -O ${REPO_HOME}/etc/eagle-i-datatools-datamanagement.jar \
    http://infra.search.eagle-i.net:8081/nexus/content/repositories/\
    releases/org/eagle-i/eagle-i-datatools-datamanagement/1.7-MS1.02/\
    eagle-i-datatools-datamanagement-1.7-MS1.02.jar
    
    bash ${REPO_HOME}/etc/data-migration.sh -u USERNAME -p PASSWORD -r https://localhost:8443
    

    Watch the output of data-migration.sh very carefully! Pay particular attention to the final status and any messages beginning with "WARN"; they indicate problems you MUST resolve. In addition to the output on screen, the data-migration script places a data migration report in a logs directory directly under etc/.

 

Configuration

 

URIs for Creating New Roles, Transitions, and Workspaces

...

For Workspace (aka Named Graph) URIs, you have to assign them in the process of creating a new Named Graph. Follow the rules below to create a reasonable URI.

 

Rules of Creating Your Own URIs

...

  • http://dartmouse.edu/repo/Role_LabRat
  • http://dartmouse.edu/repo/WFT_13_2
  • http://eagle-i.org/ont/repo/1.0/DARTMOUSE_ROLE_PI
  • http://eagle-i.org/ont/repo/1.0/DARTMOUSE_WFT_TRASH
Exception: The URI of a named graph representing an ontology is usually the same as the URI of the ontology itself, i.e. the subject of its owl:versionInfo statement. If you should happen to add a new ontology named graph to the repository, use that URI for its name. However this should be a very rare occurrence; usually new ontological information is simply added to the existing eagle-i data model ontology graph.

 

Managing Access Controls on Contact & "Hidden" Properties

...

Once you have set up a single repository to your liking, you can export and re-import the grants to other repositories. See the Procedure: Exporting and Importing Property Access Controls section below.

 

Configuration Reference

This section lists everything that can be configured, so you can get familiar with it before installing anything.

 

System Properties

The repository requires these system properties to be defined in the JVM environment running your servlet container:

...

If you are using the Apache Tomcat version 6 container (which is recommended), you can add these system properties to file conf/catalina.properties - add lines like these: (note that the path /opt/eaglei/repo is just shown as an example)

No Format

org.eaglei.repository.home = /opt/eaglei/repo
derby.system.home= /opt/eaglei/repo

...

  • configuration.properties - Java properties file with repository and log4j configuration props. This is optional; it must be created by the administrator.
  • logs/ - Default subdirectory for log files, see configuration. Created automatically by default.
  • sesame/ - Default Sesame RDF database files - DO NOT TOUCH. Created automatically by default.
  • etc/ - Contains scripts and tools for the repo administrator.
  • db/ - Default subdirectory Derby RDBMS files - DO NOT TOUCH. Created automatically by default.

 

The Repository Configuration Properties File

...

  • eaglei.repository.namespace - The namespace URI prefix for Eagle-I resource instances created in the repository.
    • Every administrator should set this to a reasonable value for his/her site, because the default is NOT desirable.
    • The value must be a fully qualified, resolvable, HTTP URL.
    • For example, http://foo.bar.edu/i/
    • Use the http scheme, NOT https, since the container will redirect to https if necessary, but it is not possible to direct back if it becomes preferable to use http later.
    • The system-generated default is the hostname followed by /i/ -- but this is often wrong, since Java's determination of hostnames in a servlet container environment is not reliable.
  • eaglei.repository.title - the decorative title for UI pages, should be set for cosmetic reasons.
    • Set this to the name of your site, e.g. "Miskatonic University School of Medicine".
  • eaglei.repository.logo - URL of the logo image for your site; may be either a relative URL (to refer to an image embedded in the webapp) or an absolute URL to use an image hosted elsewhere. It should be about 50 pixels high and a suitable width given the proportions.
  • eaglei.repository.index.url - Set this to the URL to which you want the site's "root" (top-level index) page redirected. Although the repository is installed as the root webapp to have control over resolving Semantic Web URIs, it does not need the root page so this allows you to configure your site as you like.
  • eaglei.repository.admin.backgroundColor - Lets you change the background color for admin web UI pages, to give admins an obvious cue when they are operating on e.g. the production vs. test repos. Value is CSS color expression, e.g. crayon name like "bisque" or hex #CCFFCC (Added in Release 1.2MS2 or 3)
  • eaglei.repository.instance.xslt - path to XSL stylesheet used to transform the HTML output of the instance dissemination service. A value for this key is required to produce XHTML in the dissemination service; without it, the service returns the internal XML document describing the instance.
    • If it is a relative path, it is resolved relative to the root of the web application; if absolute, it refers to a file in the filesystem at large.
    • The advantage of keeping your stylesheets external to the webapp is that you can change them easily, and don't have to modify the webapp from its default installation.
    • An example is provided at repository/styles/example.xsl which creates very simple HTML, as a demonstration of how to write an XSL stylesheet.
  • eaglei.repository.instance.css - URI of the CSS stylesheet resource to be used to style instance dissemination pages. It must be an absolute path or absolute URL. The default is:

    No Format
    eaglei.repository.instance.css = /repository/styles/i.css
  • eaglei.repository.tbox.graphs - a comma-separated list of graph URIs making up the "TBox".
    You should never have to set this! It is configurable "just in case", and for testing/experimenting. For more information, see the section on inferencing in the API Manual.
    By default, the TBox consists of:
    • The repository's internal ontology, http://eagle-i.org/ont/repo/1.0/
    • The eagle-i data model ontology, http://purl.obolibrary.org/obo/ero.owl
  • eaglei.repository.datamodel.source - the full name of a resource within the webapp which is itself a properties file describing the RDF data model ontology. You should not need to set this; the default is adequate for the eagle-i application. Default is eaglei-datamodel.properties, which is a built-in resource file.
    For a description of the contents of this properties file, see the separate document Guide to Data Model Configuration Properties
  • eaglei.repository.sesame.dir - directory where Sesame RDF database files are created.
    • Defaults to sesame subdirectory of home dir.
  • eaglei.repository.log.dir - Directory where log files are created.
    • Defaults to logs subdirectory of the home dir. 
    • You can also configure log4j explicitly by adding log4j properties to this file.
  • eaglei.repository.sesame.indexes - index configuration for Sesame triple store. Must be a comma-separated list of index specifiers, see Sesame NativeStore configuration documentation for details. Use this to change the internal indexes Sesame maintains to process queries. It takes effect on next servlet container (tomcat) restart.

    WARNING: If you have a configured value and wish to go back to the default, do NOT just delete this configuration property. If you do, Sesame will simply keep the existing indexes. You must change it to the original default value, which is documented in the default configuration file.

  • eaglei.repository.slow.query - Value in seconds of time after which a SPARQL query should be considered "slow" and logged as such. Only affects the SPARQL Protocol endpoint service. Default is 0, which never logs. Use this to check for performance problems, since it logs the full text of the query and time of occurrence in the regular log at INFO level.
  • eaglei.repository.sparqlprotocol.max.time - Time limit, in seconds, of the maximum time allowed for a query invoked by the SPARQL Protocol endpoint. Note that this does not affect any internally-generated SPARQL queries.
    • Any user can override this setting to impose a shorter timeout by giving a value for the nonstandard time argument.
    • Only the Administrator can override with a longer timeout.
    • The built-in default is 600 seconds (10 min) if nothing is configured.
    • If a SPARQL Protocol request cannot be completed within the timeout, it returns an HTTP 413 status (result too large - it was the standard response code that comes closest to the concept).
  • eaglei.repository.anonymous.user - This is a hack, only intended for testing the Anonymous role. Its value is a username, e.g. "nobody". If configured, when the designated user logs in, their session is downgraded to the Anonymous role; this allows explicit testing of Anonymous (vs. Authenticated) access even when the webapp configuration does not allow unauthenticated access. ONLY TESTERS SHOULD EVER NEED TO SET THIS.
  • Configuring Contact Hiding: The following properties control the contact hiding extension, which restricts the display of "contact location" properties of instances and instead offers an anonymous email option. Apart from eaglei.repository.hideContacts itself, these properties are required only if you enable contact hiding:
    • eaglei.repository.hideContacts - true|false, enables the contact hiding function. When it is false, none of the other properties are used.
    • eaglei.repository.postmaster - email address of repository administrator(s). User-generated messages about resources without a contact email address get sent here, as well as diagnostic messages. We recommend using an email list or alias so it can be changed or directed to multiple people.
    • eaglei.repository.mail.host - hostname of SMTP server for outgoing mail, defaults to localhost.
    • eaglei.repository.mail.port - TCP port number of SMTP server for outgoing mail, only necessary if using a non-default port for your chosen type of service.
    • eaglei.repository.mail.ssl - Use SSL for connection to SMTP server for outgoing mail, value is true or false.
    • eaglei.repository.mail.username - Username with which to authenticate to SMTP server for outgoing mail, default is unauthenticated.
    • eaglei.repository.mail.password - password with which to authenticate to SMTP server for outgoing mail, default is none.
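
Putting several of these together, a minimal configuration.properties might look like the following sketch. All values here are placeholders to be replaced with ones appropriate for your site; omit any property whose default already suits you:

No Format

# ${REPO_HOME}/configuration.properties -- example values only
eaglei.repository.namespace = http://foo.bar.edu/i/
eaglei.repository.title = Miskatonic University School of Medicine
eaglei.repository.logo = https://foo.bar.edu/images/logo.png
eaglei.repository.slow.query = 5
eaglei.repository.hideContacts = false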

...

Note that the properties file may also contain Log4J configuration properties. For example you can turn on debugging log output by adding this line:

No Format

log4j.logger.org.eaglei.repository=DEBUG, repository

...

The default log4j configuration sets up an appender named repository with buffered I/O for efficiency. Note that this means log messages will not appear in the log file immediately, but only after the logging volume fills a buffer. This is useless for interactive debugging through the logs. If you are doing interactive debugging and want to see more log detail, along with immediate results, you should add the properties:

No Format

log4j.logger.org.eaglei.repository=DEBUG, repository
log4j.appender.repository.BufferedIO=false
log4j.appender.repository.ImmediateFlush=true

Also note that the default configuration turns off additivity in the repo root Logger; this means its log events do not propagate up to e.g. the root logger. If you wish to turn it back on, add this to your configuration:

No Format

log4j.additivity.org.eaglei.repository=true

Here are all of the default log4j configuration properties:

No Format

log4j.logger.org.eaglei.repository=INFO, repository
log4j.additivity.org.eaglei.repository=false
log4j.appender.repository=org.apache.log4j.RollingFileAppender
log4j.appender.repository.File=${eaglei.repository.log.dir}/repository.log
log4j.appender.repository.ImmediateFlush=false
log4j.appender.repository.BufferedIO=true
log4j.appender.repository.Append=true
log4j.appender.repository.Encoding=UTF-8
log4j.appender.repository.layout=org.apache.log4j.PatternLayout
log4j.appender.repository.layout.ConversionPattern=%d{ISO8601} %p %c - %m%n

IMPORTANT NOTE: If you add logger configurations to tweak the level of a subset of the repo log hierarchy, you must add an additivity configuration to prevent log4j from applying the ancestor logger as well, which would result in double log entries. For example, this fragment shows a default log level of INFO but adds DEBUG logging of RepositoryServlet to get elapsed time messages:

No Format

log4j.logger.org.eaglei.repository=INFO, repository
log4j.additivity.org.eaglei.repository=false
log4j.logger.org.eaglei.repository.servlet.RepositoryServlet=DEBUG, repository
log4j.additivity.org.eaglei.repository.servlet.RepositoryServlet=false
log4j.appender.repository.BufferedIO=false
log4j.appender.repository.ImmediateFlush=true

Monitoring and Troubleshooting

 

Version Information

It's often helpful to know exactly what version of the repository you're dealing with, especially in a hectic development and/or testing environment when many versions are available. The release version appears in these places:

  1. Dissemination HTML pages: the head element contains a meta tag with the name eaglei.version, e.g.

    No Format
    <meta name="eaglei.version" content="1.1-MS5.00-SNAPSHOT" />
  2. The repository admin home page /repository/admin lists application version info in a human-readable format.
  3. The page /repository/version gives a complete breakdown of component versions, including repo source and the version of the OpenRDF Sesame database. It is XHTML, and it includes meta tags to be easy to scrape or transform.
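
For example, you can pull the version page from the command line and pick out the meta tags (-k allows a self-signed certificate; adjust the host, and add -u USERNAME:PASSWORD if your installation requires a login for this page):

No Format

curl -s -k https://localhost:8443/repository/version | grep '<meta'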

 

Log Files

Since the repository is mainly accessed by the REST service API it provides to other applications, you should get used to monitoring it by watching the log file. This is a text file (UTF-8 encoding) maintained by the log4j library under the control of the repository's configuration properties. See the description of the log.dir property above to learn the directory where logfiles are created; they are automatically rotated when the logfile grows too large.
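
For routine monitoring, simply tail the log as requests come in (this assumes the default logs/ subdirectory under the repository home; adjust the path if you changed eaglei.repository.log.dir):

No Format

tail -f ${REPO_HOME}/logs/repository.log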

...

To troubleshoot problems with the logging system itself (e.g. log4j config that isn't working as expected), look for where your Java Servlet container writes the standard output stream. For Tomcat 6, this is typically the catalina.out file in some log directory.

 

Performance Monitoring

As of release 1.1MS5 the repo can log the elapsed time (in milliseconds) for each service request. You must enable DEBUG level logging for the RepositoryServlet, as in this configuration example.

No Format

log4j.logger.org.eaglei.repository=INFO, repository
log4j.additivity.org.eaglei.repository=false
log4j.logger.org.eaglei.repository.servlet.RepositoryServlet=DEBUG, repository
log4j.additivity.org.eaglei.repository.servlet.RepositoryServlet=false
log4j.appender.repository.BufferedIO=false
log4j.appender.repository.ImmediateFlush=true

As of release 1.2MS3 the repo will also show the time spent on internal SPARQL queries, which can be useful when tuning Sesame indexes. Add these log4j configuration lines to see just the query log messages:

No Format

log4j.logger.org.eaglei.repository.util.SPARQL = DEBUG, repository
log4j.additivity.org.eaglei.repository.util.SPARQL = false

Then, you'll see log entries like this which you can correlate to requests from your application:

No Format

...service invocation examples:

2011-01-27 14:28:06,483 T=http-8443-1 DEBUG org.eaglei.repository.servlet.RepositoryServlet - 
============== Ending Request /repository/update (2,159 mSec elapsed)

2011-01-27 14:27:58,023 T=http-8443-1 DEBUG org.eaglei.repository.servlet.RepositoryServlet - 
============== Ending Request /repository/workflow/push (1,763 mSec elapsed)

... (internal query example:)

2011-04-15 14:13:28,383 T=http-8443-1 DEBUG org.eaglei.repository.util.SPARQL - 
SPARQL Query executed by 
org.eaglei.repository.model.User:findAll at line 227 in elapsed time (mSec) 15

...

See the eaglei.repository.slow.query configuration property for more details. Note that this only applies to queries made through the SPARQL Protocol endpoint, not the SPARQL queries generated internally by the repo code.

 

Tuning

The performance of Sesame's NativeStore implementation is extremely sensitive to its index configuration. There is a major benefit to configuring indexes that help resolve triple patterns used by the most frequent and/or voluminous SPARQL queries. A knowledgeable repository administrator should adjust the setting of the eaglei.repository.sesame.indexes property to get the NativeStore to build the most necessary indexes. See doc on that configuration for more details.
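
For example, an index setting in configuration.properties looks like this. The specifiers shown are illustrative only -- Sesame NativeStore index specifiers are four-letter permutations of s, p, o, and c (subject, predicate, object, context) -- so choose them based on your own query patterns and the NativeStore documentation:

No Format

eaglei.repository.sesame.indexes = spoc,posc,cosp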

 

Administrator Tools

 

make-snapshot.sh Script

The make-snapshot script creates a complete backup copy of a data repository, in a designated directory. It has to be given a directory because the backup consists of multiple files. It is packaged with the repository distribution, under the etc/ directory.

...

NO MESSAGE is printed upon success, which lets it run under cron.

 

Usage

Synopsis:

No Format

make-snapshot.sh username password repo-URL directory

...

  • username - username with which to authenticate to the repo
  • password - password with which to authenticate to the repo
  • repo-URL - prefix of repository URL, e.g. "https://localhost/"
  • directory - directory in which to write the dump, will be created if necessary
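
For example, an interactive run with placeholder credentials, writing the snapshot to a hypothetical dated directory (created if it does not already exist):

No Format

bash ${REPO_HOME}/etc/make-snapshot.sh ADMIN PASSWORD https://localhost:8443 /opt/eaglei/backups/snapshot-2011-05-01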

 

Restoring Dumps made by make-snapshot

Given a dump created in e.g. ${DUMPDIR}, to restore this dump on a newly-created, empty repository, use these commands (where ${REPOSITORY} is the URL prefix of the repo):

No Format

curl -D - -s -S -u ADMIN:PASSWORD -F type=user -F format=application/x-trig \
-F content=@${DUMPDIR}/users.trig -F duplicate=replace \
-F transform=no ${REPOSITORY}/repository/import
No Format

curl -s -S -D - -u ADMIN:PASSWORD -F action=replace -F all= \
-F "content=@${DUMPDIR}/resources.trig;type=application/x-trig" \
${REPOSITORY}/repository/graph

...

in a differently-named directory each day, rotating through a week:

No Format

make-snapshot.sh ADMIN PASSWORD https://localhost:8443 "daily_cron_`date +%u`"
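
To schedule that nightly under cron, an entry along these lines would work (the path, schedule, credentials, and backup directory are illustrative; note that % must be escaped as \% inside a crontab):

No Format

# snapshot every night at 02:30
30 2 * * * /opt/eaglei/repo/etc/make-snapshot.sh ADMIN PASSWORD https://localhost:8443 "/opt/eaglei/backups/daily_cron_$(date +\%u)"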

...

Since resource URIs have to be resolvable, this effectively creates new resources in the destination repository with URIs that resolve there. It does this by substituting the target's default prefix into all URIs that used to resolve at the source repository.

 

This Is Inherently Not A Good Idea

...

Given all of these limitations, move-everything can still be an effective way of populating a repository for testing and demonstrations. Just stay aware of what doesn't work, and only use it when the results are temporary and will be discarded.

 

Restoring from Backups

There is one other legitimate use of move-everything: restoring a backup copy made with make-snapshot. In this case you don't really have to transform the URIs, and the whole intent is to re-create the original state of the repo so the side effects are all desired.

 

Using the Script

The resource copying script is installed under etc/ in the repository home directory. Its name is move-everything.sh. It only runs on a Unix-based operating system such as Linux or MacOS X. It requires bash, perl 5, and the curl executable.

The synopsis for copying from repository to repository:

No Format

Usage: move-everything.sh [--version] [ -f | --force ]
[-exclude-users user,user,..] [-nousers]
from-username from-password from-repo-URL
to-username to-password to-repo-URL

The synopsis for copying from file to repository:

No Format

Usage: move-everything.sh [--version] [ -f | --force ]
[-exclude-users user,user,..] [-nousers]
--from-snapshot directory --from-prefix from-prefix
to-username to-password to-repo-URL

...

The --from-snapshot and --from-prefix options must be specified together. They select the input data from a directory of serialized files, in the same format as produced by the make-snapshot script. The value of --from-snapshot is the path to the directory containing the RDF serialization files. The value of --from-prefix is the exact and complete URI prefix (including the trailing '/') of the repo that generated the dump in the directory. This is necessary because the script does not have access to that repository to query it for its prefix.

 

Fixed Arguments

The fixed command arguments are either one or two triplets of repository access information, i.e. the username, password, and URL of each repo.

...

Here is an example that copies from the production Harvard repo to a local one:

No Format

move-everything.sh bigbird PASSWORD https://harvard.eagle-i.net \
bigbird PASSWORD https://localhost:8443

Here is an example that copies a snapshot of the production Harvard repo to a local one:

No Format

make-snapshot bigbird PASSWORD https://harvard.eagle-i.net \
harvard.monday


move-everything.sh -f \
--from-snapshot harvard.monday \
--from-prefix http://harvard.eagle-i.net/i/ \
bigbird PASSWORD https://localhost:8443

...

We strongly recommend you avoid using the Superuser (administrator) login on the source repository, to prevent accidentally obliterating it by getting the argument order wrong. Use an account that has read access to every graph (e.g. the Admin-Read-Only role). This restricts you to using the --nousers version of the command but in most cases that is adequate. See the #Procedures section for recommendations on how to maintain copies of repositories this way.

 

move-resources.sh - Copying Only Resource Instances

...

Since resource URIs have to be resolvable, this effectively creates new resources in the destination repository with URIs that resolve there. The hostname portion of the URI matches the new repository server, and even the local name is allocated by the destination repository -- so there is no predictable way to relate new URIs to the old ones.

 

This Is Inherently Not A Good Idea

...

Given all of these limitations, the resource-mover script can still be an effective way of populating a repository for testing and demonstrations. Just stay aware of what doesn't work.

 

Using the Script

The resource copying script is installed under etc/ in the repository home directory. Its name is move-resources. It only runs on a Unix-based operating system such as Linux or MacOS X. It requires perl 5 and the curl executable.

Run it with -h to get the synopsis:

No Format

Usage: move-resources [-verbose] [-replace] [--type published|workspace]
{ --file source-file --prefix uri-prefix |
  --source source-repo-url --user login:password --graph src-graph-URI }
dest-repo-url dest-login:dest-password dest-graph-URI



(options may be abbreviated to first letter, e.g. -f)

...

Here is an example command; it copies from the Published graph on qa.harvard to an "Experimental" graph on the local repo (on https://localhost:8443):

No Format

move-resources -s https://qa.harvard.eagle-i.net:8443 -u bert:ernie \
-g http://eagle-i.org/ont/repo/1.0/NG_Published https://localhost:8443 \
root:password http://eagle-i.org/ont/repo/1.0/NG_Experimental



Moved 4694 data statements and 322 metadata statements.

Procedures

 

Procedure: Upgrading Packaged Tomcat

...

  1. Shut down tomcat. This is major surgery, and tomcats don't like to be vivisected no matter how much more satisfying you may find it.
  2. Disable Java Security -- alternately, you could try to configure all the authorization grants to give the repository webapp access to the filesystem and property resources it needs, but I found it much easier to just disable java security. DO NOT RUN THE TOMCAT PROCESS AS ROOT if you do this, but you should not be running it as root in any case. That's just insane.
    1. Edit the file /etc/init.d/tomcat6 and change the following variable to look like this:

      No Format
      TOMCAT6_SECURITY=no
  3. Install Derby jars: ONLY IF DERBY IS NOT ALREADY INSTALLED IN THE COMMON AREA OF YOUR TOMCAT. If another webapp is already using Derby, they should share that version.
    1. Find the Derby jars in the lib/ subdirectory under where you installed the create-user.sh script.
    2. Copy them to the Tomcat common library directory:

      No Format
      cp ${REPO-ZIP-DIR}/lib/derby* /usr/share/tomcat6/lib/
  4. Install the webapp: First, get rid of any existing root webapp, then copy in the webapp (ROOT.war file from your installation kit) and be sure it is readable by the tomcat6 user:

    No Format
    rm -rf /var/lib/tomcat6/webapps/ROOT*
    cp ROOT.war /var/lib/tomcat6/webapps/ROOT.war
  5. Install cached webapp context: This is VERY IMPORTANT, and the Tomcat docs do not even mention it, but without it your server will be mysteriously broken. The file /etc/tomcat6/Catalina/localhost/ROOT.xml must be a copy of your app's context.xml. Redo this command after installing every new ROOT.war:

    No Format
    mkdir -p /etc/tomcat6/Catalina/localhost
    unzip -p /var/lib/tomcat6/webapps/ROOT.war META-INF/context.xml > /etc/tomcat6/Catalina/localhost/ROOT.xml
  6. Add System Properties: Be sure you have added system properties to the file /etc/tomcat6/catalina.properties e.g.

    No Format
    org.eaglei.repository.home = /opt/eaglei/repo
    derby.system.home = /opt/eaglei/repo

    ...of course, the value of these properties will be your Repository Home Directory path.

  7. Start up Tomcat:

    No Format
    sudo /etc/init.d/tomcat6 start
  8. Troubleshooting: If there are problems, check the following places for logs (because packaged apps make everything so much easier):
    • /var/log/daemon.log - really dire tomcat problems and stdout/stderr go to syslog
    • /var/log/tomcat6/* - normal catalina logging
    • ${REPOSITORY_HOME}/logs/repository.log - default repo log file in release 1.1; under 1.0 the filename was default.log.

...

  • have been tested under Ubuntu Linux 9.10 (krazy kitten), Fedora 12 and 14, and CentOS 6.03
  • assume you are running Tomcat on port 8080. To redirect the HTTPS (HTTP on SSL) port, also run the 3 additional iptables commands (assuming port 443) below.
  • require root privileges
  • assume the Bourne shell (/bin/sh)

 

  1. To check what rules are running:

    No Format
    iptables -t nat -n -L
  2. Discover your machine's primary IP address and set the ADDR shell variable. (Note that this assumes eth0 is your primary network interface -- use ifconfig -a to see them all.)

    No Format
    ADDR=`ifconfig eth0 | perl -ne 'print "$1\n" if m/\sinet addr\:(\d+\.\d+\.\d+\.\d+)\s/;'`
  3. Run these iptables commands to redirect all port 80 requests to port 8080.

    No Format
    iptables -t nat -A OUTPUT -d localhost -p tcp --dport 80 -j REDIRECT --to-ports 8080
    iptables -t nat -A OUTPUT -d $ADDR -p tcp --dport 80 -j REDIRECT --to-ports 8080
    iptables -t nat -A PREROUTING -d $ADDR -p tcp --dport 80 -j REDIRECT --to-ports 8080
  4. (If using SSL) Run these iptables commands to redirect all port 443 requests to port 8443.

    No Format
    iptables -t nat -A OUTPUT -d localhost -p tcp --dport 443 -j REDIRECT --to-ports 8443
    iptables -t nat -A OUTPUT -d $ADDR -p tcp --dport 443 -j REDIRECT --to-ports 8443
    iptables -t nat -A PREROUTING -d $ADDR -p tcp --dport 443 -j REDIRECT --to-ports 8443
  5. Check that your new rules are running (use the command above)
  6. Additional configuration
    1. Ubuntu
      1. Save the rules in the canonical place to be reloaded on boot:

        No Format
        iptables-save > /etc/iptables.rules
      2. Create a script to be run by the network startup infrastructure that will reload the iptables whenever the network is configured on:

        No Format
        cat << EOF > /etc/network/if-pre-up.d/iptablesload
        #!/bin/sh
        iptables-restore < /etc/iptables.rules
        exit 0
        EOF
    2. Fedora
      1. Save the rules to be reloaded on boot:
        1. The cleaner/preferable method, but apparently not working:

          No Format
          /sbin/iptables-save
        2. Hacky, but works: manually edit /etc/sysconfig/iptables
      2. Update the startup settings so iptables will run upon reboot:

        No Format
        chkconfig --level 35 iptables on
  7. Test by accessing your server both locally and remotely by the port-80 URL. Then reboot the machine and try it again to be sure the iptables commands are run correctly on boot.

 

Procedure: Dump and Restore the RDF Resource Data

...

This is a complex manual procedure with many options -- for a simpler semi-automated backup snapshot procedure, see the section on using the make-snapshot script.

 

Make Backup Dump (obsolete - see make-snapshot)

Typical command to make a backup, in TriG format, to a file, e.g. all-dump.trig, from a server running locally. In practice, you'll probably need to change the example values, such as the username:password login credentials, and the hostname in the target URL if not running locally.

No Format

curl -G -X GET -s -S -u username:password -o all-dump.trig -d all \
--write-out 'status=%{http_code}, %{time_total}sec\n' \
-d format=application/x-trig https://localhost:8443/repository/graph

Be sure the output shows a successful status code (namely 200), as shown here, since curl will return a successful status even if the HTTP service did not succeed; curl only reports on the success of the network request-and-response transaction.

No Format

status=200, 13.283sec

Restore Repository from Backup

...

WARNING: this replaces the entire contents of the repository!

No Format

curl -s -S -u username:password -F action=replace -F all= \
--write-out 'status=%{http_code}, %{time_total}sec\n' \
-F 'content=@all-dump.trig;type=application/x-trig' https://localhost:8443/repository/graph

Be sure the output shows a successful status code (namely 201, since it created graphs), as shown here, since curl will return a successful status even if the HTTP service did not succeed; curl only reports on the success of the network request-and-response transaction.

No Format

status=201, 13.283sec

Procedure: Saving and Restoring User Accounts

As of the MS6 release, you can use the new Export/Import service to create user accounts automatically (e.g. on a newly-created repository). This is NOT the same thing as true backup and restore; rather, it is intended more for setting up a test environment. The export and import services are very complex and powerful. This only gives one small example of what they can do. For all the details, see their entry in the API Manual.

 

Step 0. Create Prototype Accounts and Export Them

...

Now run a command like this to export the accounts into the file all-users.trig

No Format

curl -s -S -u username:password -G -d type=user -d format=application/x-trig \
--write-out 'status=%{http_code}\n' \
-o all-users.trig https://hostname:8443/repository/export

Note that you have to change the hostname and possibly the login. If there are accounts you do not want in the export, add an exclude argument to filter them out, with a space-separated list, e.g.

No Format

.... -d 'exclude=frankenstein moreau lizardo' ....

...

You can start with a newly-created repository which needs to have user accounts added. It only has the initial administrator login, e.g. bigbird. Use the import service to add users from the file you created in step 0. The following command adds all of the accounts except bigbird (since it already exists), and aborts without changing anything if there are already duplicates of any of the users on the destination repo. It will print "status=200" on success.

No Format

curl -s -S -u username:password -F type=user -F format=application/x-trig \
-F transform=yes --write-out 'status=%{http_code}\n' \
-F exclude=bigbird \
-F content=@all-users.trig https://hostname:8443/repository/import

Note that the transform=yes argument means import will translate the instance URIs of the new users to newly-created URIs in the repository's default namespace. This is usually what you want. If you are positively restoring users already in the correct namespace and you want to preserve the old URIs, substitute transform=no.

 

Step 2. Testing Users

The easiest way to test the existence and details of a user is with the /whoami service. It does not show roles, however; you'll have to go to the repository administrative UI for that (or take it on faith). For example, after restoring users including curator, this is how you'd check that curator exists:

No Format

curl -s -S -u curator:password -G -d format=text/plain https://hostname:8443/repository/whoami

It's probably only necessary to test one user like this, and to make sure the output includes a URI, as a check that the whole import succeeded.

 

Procedure: Exporting and Importing Property Access Controls

...

To export property grants, plug those URIs into the following command (you need to replace the placeholder values, such as ADMIN:PASSWORD and the hostname):

No Format

curl -G -k -u ADMIN:PASSWORD -d type=grant -d "include=HIDE,CONTACT" \
-d format=application/x-trig https://localhost:8443/repository/export

This writes a record of grants to the standard output. Since the URIs are the same between other repositories running the same data model, you should be able to import them with the following command (which reads the grant data from standard input):

No Format

curl -k -u ADMIN:PASSWORD -F type=grant \
-F duplicate=abort -F transform=no -F content=@- \
-F format=application/x-trig https://localhost:8443/repository/import