This document is intended to be a thorough description of the data repository design, concentrating on its external interfaces. Its intended audience is implementors of other components that depend on the repository, as well as the repository coders. It should be kept up to date with any API changes so it can serve as a reference manual.
The repository is essentially an RDF triple-store with some extra features dictated by the needs of data entry tools, search, dissemination, and the data collection and curation process. Here are some of them:
See the Repository Requirements page to review the original requirements list and correlate it with what this design document provides.
This section describes the internal conceptual data model.
The primary purpose of the repository is to store, edit, and retrieve resource instances for its clients. The term comes from the eagle-i data model: a resource is an indivisible unit of something described in the eagle-i database. A resource instance in the repository is the corresponding graph of RDF statements that represents this resource as abstract data. It is rooted at one subject URI.
A resource instance is defined as the subject URI, and the collection of statements in which it is the subject. Furthermore, there must be one statement whose predicate is rdf:type and whose object is a URI (preferably of type owl:Class, but this is not checked).
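A resource instance can be modeled as a set of triples sharing one subject URI. The Python sketch below (hypothetical names, illustrating the pre-Cycle-3 definition only, without embedded instances) checks the two defining conditions: every statement has the root URI as its subject, and at least one statement asserts an rdf:type whose object is a URI.

```python
RDF_TYPE = "rdf:type"  # stands in for the full rdf:type predicate URI

def is_resource_instance(subject_uri, statements):
    """Check that a list of (subject, predicate, object) triples forms a
    resource instance: all statements are rooted at subject_uri, and at
    least one asserts an rdf:type whose object is a URI (approximated
    here by an http:// prefix test)."""
    if not statements:
        return False
    for s, p, o in statements:
        if s != subject_uri:
            return False  # the instance holds only statements about its root
    return any(p == RDF_TYPE and o.startswith("http://")
               for _, p, o in statements)
```

Note that, per the definition above, the object of the rdf:type statement is preferably an owl:Class, but that is not checked.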
The significance of resource instances in the repository architecture is:
In Cycle 3, the definition of a resource instance was extended to encompass embedded instances. This only changes the rules governing what statements belong in the instance's graph, although it has profound effects on the behavior of the repository. Embedded instances are essentially regular resource instances which are considered part of a "parent" resource instance. The precise definition of an embedded instance is as follows:
http://eagle-i.org/ont/app/1.0/ClassGroup_embedded_class
Here is how EIs behave in repository operations:
Creation: An EI is created by adding a new URI with an appropriate rdf:type statement to a modification (including creation) of its parent. The type must belong to the embedded class group.
Modification, Deletion: Any modification of an EI must be done as a modification of its parent. The EI's properties, including type, may be changed; it may be deleted. These changes are recorded as a modification of the parent. The changes to the parent and its EIs may be driven by one HTTP request to the /update service, and will be performed in a single transaction.
Dissemination: A dissemination request on the parent instance will include all of the statements about its EIs. The EIs will be filtered of hidden properties (e.g. admin data and contact hiding) by the same rules as the parent, and returned in the same serialized RDF graph.
Dissemination requests on EIs are not supported and not recommended; the results are undefined.
Metadata Harvest:
See the description of the /harvest service for full details. Essentially, since EIs do not have an independent presence in the "instance" model of the repository, they are not reported on individually when the harvest service reports changes. A change to an EI, even deletion of the EI, is reported as a change to its parent. Likewise, creation of an EI is also reported as a change to its parent.
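As a sketch of this reporting rule, the hypothetical function below maps a list of changed URIs onto what /harvest would report, assuming a lookup table from EI URI to parent URI (both the function and the table are illustrative, not the repository's actual implementation):

```python
def harvest_changes(changed_uris, parent_of):
    """Mirror how /harvest reports changes: any change to an embedded
    instance (creation, modification, or deletion) surfaces as a change
    to its parent. parent_of maps EI URI -> parent URI; URIs absent from
    the map are ordinary (non-embedded) instances."""
    reported = []
    for uri in changed_uris:
        target = parent_of.get(uri, uri)
        if target not in reported:  # report each instance at most once
            reported.append(target)
    return reported
```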
The repository is required to hide statements with certain predicates when exporting resource instances in these contexts:
The set of predicates to be hidden is defined by the data model ontology, and identified by the data model configuration properties:
The hidden predicates are themselves subjects of statements whose predicate is the hiding predicate, and whose object is the hiding object, e.g.
For example, this configuration would hide dm:someStupidProperty:

    datamodel.hideProperty.predicate = dm:hasSpecialAttribute
    datamodel.hideProperty.object = dm:cantSeeMe

...and then later on, in the ontology (shown in N3):

    dm:someStupidProperty dm:hasSpecialAttribute dm:cantSeeMe .
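A minimal Python sketch of how the hidden-predicate set could be derived from such ontology statements, and then applied when exporting an instance. The triple representation and function names are illustrative, not the actual implementation:

```python
def hidden_predicates(ontology, hiding_predicate, hiding_object):
    """Collect the predicates the data model marks as hidden: every
    subject of a statement (subject, hiding_predicate, hiding_object),
    mirroring datamodel.hideProperty.predicate / .object."""
    return {s for s, p, o in ontology
            if p == hiding_predicate and o == hiding_object}

def filter_hidden(statements, hidden):
    """Drop statements whose predicate is hidden before export."""
    return [(s, p, o) for s, p, o in statements if p not in hidden]
```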
The mechanism of hidden-property hiding is implemented through access controls. See that section for more details.
This has to be implemented in the repository in order to enforce a consistent security model, which would not be possible if content hiding were left up to each client application.
Properties are "hidden" for various reasons, such as:
The "contact" issue is closely related to property hiding. The essential problem is that for every resource instance, it is desired to have a means for the agent viewing that resource (e.g. a user in a Web browser viewing a Semantic Web-style dissemination page of the resource instance) to contact the agent who "owns" (or is otherwise responsible for) the resource. Email is one means of implementing this contact, but certainly not the only one. The contact could be in the form of a telephone number, street address, or even a redirect to another Web site which might include indirect contact info of its own. The purpose is to put a potential consumer of the resource in touch with its owner.
The repository only gets involved to mediate this contact process because it is also responsible for hiding all contact information from the agents who would use it. It must therefore implement some means of accepting a contact request or message from the outside agent, and forward it to the determined owner of the resource.
Contact properties are identified in the same way as hidden properties, only the relevant data model configuration keys are:
datamodel.contactProperty.predicate
datamodel.contactProperty.object
The mechanism of contact-property hiding is implemented through access controls - see that section for more details.
There is a separate ontology document describing the repository's internal data model and administrative metadata. It is an attachment to this page. Note that some statements described by that ontology appear as publicly-readable metadata statements, while others are private and never exposed outside of the repository codebase.
The "ontology" graph is considered read-only in normal operation. All internal metadata (i.e. administrative metadata) is stored in a separate, distinct, named graph which should only be available to the repository's internal operations.
The repository design takes full advantage of the named graph abstraction provided by most modern RDF database engines (Sesame, Virtuoso, Anzo, AllegroGraph). Every statement in the RDF database belongs to exactly one named graph. Since this is typically implemented by adding a fourth column to each triple for the named graph's URI, these databases are often called quad-stores instead of triple-stores. The data repository design takes advantage of named graphs:
Internally, we collect some metadata about each named graph: access control rules, of course, and a type that documents the purpose of the named graph.
The repository is created with a few fixed named graphs for specific purposes (e.g. internal metadata statements). Other named graphs are created as needed. Even the repository's own ontology is not a fixed graph since it can be managed like any other ontology loaded from a serialization.
Relationships - it would be helpful to record metadata about related named graphs, though the most compelling case for this is ontologies that use owl:imports to embed other ontology graphs. Since Sesame does not interpret OWL by itself, and we have no plans to add this sort of functionality for the initial repository implementation, this will be considered later.
The repository provides views to give clients an easier way to query over a useful set of named graphs. A view is just a way of describing a dataset (a collection of named graphs). The repository server has a built-in set of views, each named by a simple keyword. You can use a view with the SPARQL Protocol and with a resource dissemination request. It is equivalent to building up a dataset out of named graphs, but it is a lot less trouble and guaranteed to be stable, whereas graph names might change. The views are:
Important Note: You may have noticed that according to the definition, the user view is the same as the all view for an administrator user, so why bother creating an all view? It is intended to be specified when you have a query that really must cover all of the named graphs to work properly; if a non-administrator attempts it, it will fail with a permission error, instead of misleadingly returning a subset of the graphs.
A workspace is just another way to describe a dataset, by starting with a named graph. It is effectively a special kind of view. The name of a workspace is the URI of its base named graph, which must be of type workspace or published. When you specify that as the workspace, the repository server automatically adds these other graphs to the dataset:
You can specify a workspace instead of a view in SPARQL Protocol requests, and in resource dissemination requests.
As of the Version 1, MS5 release, the repository supports inferencing in some very specific cases. Since the repository's RDF data is very frequently modified, it does only the minimal inferencing needed by its users in order to keep the performance bearable.
Many inferencing schemes require inferencing to be re-done over the entire RDF database after every change, because tracing the effects of a change through various rules would be at least as much computational effort as simply re-running the inferencing from scratch. We have chosen a select subset of RDFS and OWL inference rules that makes incremental changes easy and efficient to re-compute.
See the RDF Semantics page for an overview of the greater body of inference rules (of which we only implement a small subset). The repository implements two different kinds of inferencing:
- rdfs:subClassOf relationships are created as direct statements.
- rdfs:subPropertyOf relationships are created as direct statements.
- inferred rdf:type properties are added.

The TBox graphs are configurable. You can set the configuration property eaglei.repository.tbox.graphs to a comma-separated list of graph URIs. By default, the TBox consists of:
This inferencing scheme ensures very fast performance by assuming the TBox graphs never change under normal operations, which ought to be true. The data model ontology graph is only modified when a new version of the ontology is released. Likewise, the repository's internal ontology graph remains unchanged once the repository is installed. When the TBox graphs are changed, be aware that you will probably see a delay of many seconds or perhaps minutes, as all the TBox and ABox inferencing is re-done.
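The TBox closure over rdfs:subClassOf, and the resulting ABox rdf:type inference, can be sketched as a naive fixed-point computation (an illustration of the rule, not the repository's actual algorithm):

```python
def subclass_closure(subclass_pairs):
    """Transitive closure of rdfs:subClassOf over the TBox.
    Returns a dict: class -> set of all superclasses (direct + inferred).
    A naive fixed point is fine for ontology-sized TBoxes."""
    supers = {}
    for sub, sup in subclass_pairs:
        supers.setdefault(sub, set()).add(sup)
    changed = True
    while changed:
        changed = False
        for sub, direct in supers.items():
            for sup in list(direct):
                for indirect in supers.get(sup, ()):
                    if indirect not in direct:
                        direct.add(indirect)
                        changed = True
    return supers

def inferred_types(asserted_type, supers):
    """ABox rule: an instance of class C is also an instance of every
    superclass of C, so those rdf:type statements are materialized."""
    return supers.get(asserted_type, set())
```

Because the closure is precomputed from the (normally static) TBox, an instance edit only requires looking up the superclasses of its asserted types, which is why incremental updates stay cheap.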
Inferred statements are not normally written when an entire graph is dumped. See the /graph service for details.
Authentication is managed entirely by the Java Servlet container. We rely on the container to supply an authenticated user name (a short text string) and whether that user has the "superuser" role. The container's role is only used for bootstrapping; normally roles are recorded in the RDF database and they take precedence over the container's role map.
Each login user is (ideally) recorded in both the RDBMS used by the servlet container (or possibly some other external DB) and the RDF database. This is necessary because the servlet container, which is doing the authentication, only has access to the RDBMS through a plugin, but the repository authorization mechanism expects an RDF expression of the user as well. All of the services that modify users keep the RDBMS and RDF databases synchronized, and can cope with users found in one and not the other.
The RDBMS description of a user contains:
The RDF description of a user contains:
When a user is present in RDF but not in the RDBMS, they are considered disabled and cannot login. They can be reinstated through the Admin UI.
When a user is present in the RDBMS but not in the RDF, they are considered undocumented. Upon login, an undocumented user is given the URI corresponding to his/her highest known role:
:Role_Superuser if the RDBMS indicates that role, or :Role_Anonymous otherwise. (Arguably the default role could also be :Role_Authenticated, but without RDF data for the user they are not fully authenticated, and this is incentive to fix the discrepancy.)
To fix an undocumented user:
An Administrator (superuser) can become documented by logging in, and either running the /whoami service with create=true or using the Admin UI to edit their own user info and saving it. An Administrator can fix an ordinary undocumented user by using the Admin UI to save their descriptive metadata; even if it is all blank, a user record will be created. Importing users also straightens out the mapping automatically.
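The default-role rule for an undocumented user amounts to a one-line decision; this sketch uses the role URIs named above:

```python
ROLE_SUPERUSER = ":Role_Superuser"
ROLE_ANONYMOUS = ":Role_Anonymous"

def default_role_uri(has_superuser_role):
    """URI assigned at login to an 'undocumented' user (present in the
    RDBMS but absent from RDF): the Superuser role if the container
    asserts it, otherwise Anonymous."""
    return ROLE_SUPERUSER if has_superuser_role else ROLE_ANONYMOUS
```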
Roles are a way to characterize a group of users, for example, to grant them some access rights in the access-control system. Functionally, the role is part of a user's authentication, i.e., "who" they are.
A role is defined by a URI, the subject of a :Role instance. It should also have a locally unique, short text-string name (the rdfs:label of its :Role instance).
Each Role is independent of other Roles; Roles cannot be "nested". This is a necessary limitation that simplified the implementation considerably.
The Superuser role is built into the system because its privileges are hardcoded.
There are a couple of special Roles whose membership is implicit, that is, it never needs to be granted explicitly:
A repository user is identified uniquely (within the scope of ONE repository instance) by a short symbolic username. This is a character string composed of characters from a certain restricted subset of the ASCII character set, in order to avoid problems of character translation and metacharacter interpretation in both the protocol layer (HTTP) and OS tools such as command shells. The password, which is paired with a username to serve as login credentials, is likewise restricted to the same range of characters as the username.
Character restrictions: The username and password MUST NOT include the character ':' (colon). They MAY only include:
Note that although the HTTP protocol allows any graphic characters in the ISO-8859-1 codeset (modulo ':'), linear whitespace, and even characters with special MIME RFC-2047 encoding rules, these are often implemented wrongly by HTTP clients and also invite encoding and metacharacter problems with OS and scripting tools. To avoid these troubles we simply restrict the available characters.
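A sketch of credential validation under these rules. CAUTION: the exact character class used here (ASCII letters, digits, '.', '_', '@', '-') is an assumption for illustration only; the authoritative allowed set is the list given above.

```python
import re

# ASSUMPTION: this character class is illustrative, not the official list.
CREDENTIAL_RE = re.compile(r"^[A-Za-z0-9._@-]+$")

def valid_credential(s):
    """Reject ':' (which HTTP Basic authentication reserves to separate
    user:password) and anything outside a conservative ASCII subset."""
    return ":" not in s and bool(CREDENTIAL_RE.match(s))
```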
All of the servlet URLs in this interface except the public dissemination request (/i) require authentication. The dissemination request makes use of an authenticated user and roles when they are available, to access data that would be invisible to an anonymous request, but it is never required.
This is just an outline of the access control system. It is implemented as statements stored in the internal metadata graph. The access controls applying to an instance or other object are not directly visible in the repository API, except through the administrative UI.
These types of access can be granted:
On the following types of resources:
- /graph --- usually reserved for admins.

Access control statements grant access to either a specific user, or to a Role, which applies to all users holding that role.
Any user asserting the Superuser role is always granted access, bypassing all controls. This lets us bootstrap the system when there is no RDF database yet to describe grants. Repository administrators should always have the Administrator role, since most of the Admin UI and API requires it.
Access control is implemented by statements of the form:
The resource is the URI of the instance, named graph, or workflow transition of interest. The access-type names one of the four types of access described above: read, add, remove, admin. Finally, the accessor is the URI of the Principal to be granted the access, either a Role or an Agent (user).
We anticipate having a relatively small number of these access grants. Although named graphs and workflow transitions need elaborate access descriptions, there are only a few of those -- on the order of dozens. Resource instances are of course more numerous but most of them have no access grants, deriving their read/query access from the named graph they reside in. The workflow claim service adds temporary grants to give the claim owner read/write access to be able to edit the instance while it is claimed.
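The access check itself reduces to matching (resource, access-type, accessor) grants against the user and their Roles, with the Superuser bypass described above. A sketch with illustrative data structures, not the internal ones:

```python
ROLE_SUPERUSER = ":Role_Superuser"

def has_access(grants, resource, access_type, user, roles):
    """Evaluate grants of the form (resource, access-type, accessor):
    access is granted when some grant names the user directly, or names
    one of the user's Roles. A user asserting the Superuser role is
    always granted access, bypassing all controls."""
    if ROLE_SUPERUSER in roles:
        return True
    accessors = {user} | set(roles)
    return any(r == resource and t == access_type and a in accessors
               for r, t, a in grants)
```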
The repository automatically records provenance metadata about objects when they are created and modified by users' actions. Provenance means information about the history and origin of the data, in this case the authenticated identity responsible and time of the latest change. The following properties are recorded for these types of objects, and can be obtained by querying with a view or dataset that includes the named graph containing public administrative metadata.
Note that there is at most one value of any of these properties for each subject. That means the "modified" properties are updated whenever a subject is modified, and the record of the previous modification is lost. This is a simplification that may be remedied at some point in the future if we add versioning of data to the repository.
Named Graphs:
- dcterms:modified --- literal date of last modification, encoded as xsd:dateTime
- dcterms:contributor --- the URI of the Agent (authenticated user) who last modified it
- dcterms:source --- description of the file or URI last loaded into this named graph, if that is how it was created. This record is compared against the source later to decide whether an update is necessary. It is a node (possibly a blank node) with the following properties:
  - dcterms:identifier --- the resolvable URI of the resource loaded, most likely a URL in either the file or http scheme.
  - dcterms:modified --- last-modification date of the resource, a literal xsd:dateTime, for later comparison when deciding whether to decache the repository copy of an external file.

Resource Instances:

- dcterms:created --- literal date when the resource was created, encoded as xsd:dateTime
- dcterms:creator --- the URI of the Agent (should be an authorized user) who created the instance. For imported data, dcterms:creator comes from the uploaded data.
- dcterms:mediator --- ONLY when dcterms:creator does not refer to the authenticated user who created the data, this is the URI of the Agent (authenticated user) who created this instance in the Repository.
- dcterms:modified --- literal date when the resource was last modified, encoded as xsd:dateTime
- dcterms:contributor --- the URI of the Agent (authenticated user) who last modified this instance.
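Because these provenance properties are single-valued, recording a modification replaces the previous record rather than appending to it. A sketch over an illustrative triple list (not the repository's internal storage):

```python
def record_modification(admin_graph, subject, agent_uri, when):
    """Replace the single-valued dcterms:modified and dcterms:contributor
    statements for subject in the admin metadata graph; the record of the
    previous modification is intentionally lost."""
    kept = [(s, p, o) for s, p, o in admin_graph
            if not (s == subject and p in ("dcterms:modified",
                                           "dcterms:contributor"))]
    kept.append((subject, "dcterms:modified", when))
    kept.append((subject, "dcterms:contributor", agent_uri))
    return kept
```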
Some repository features are implemented as extensions to the Sesame RDF database (aka triplestore). This means they are available both internally to the repository implementation and externally whenever an API to Sesame, its SPARQL query engine, is exposed.
1. Output formats
Additional output formats for both RDF serialization and SPARQL tuple query results allow output in:
- text/html
- text/plain (for SPARQL)

2. SPARQL Query Function
The repository adds a custom function to Sesame's query engine: repo:upperCaseStr. It returns the toUpperCase() version of the string value of an RDF value. Use it to sort values ignoring (a) differences in character case and (b) whether they are datatyped literals or untyped literals (or other terms).
To invoke it you must have the repository's URI namespace defined as a prefix. For example,
PREFIX repo: <http://eagle-i.org/ont/repo/1.0/> ...query text... ORDER BY repo:upperCaseStr(?label)
The repository includes a "workflow" control system that directs the complete life cycle of each resource instance, and mediates access by users at each life cycle stage. The word "workflow" is often used to describe process-management and administration systems, but in this case it is really just a minimal implementation of states and extended access control.
It was implemented in the repository because it depends on persistent data and access control which are already available in the repository. It is also closely integrated with the access control system, which is easier to accomplish securely from within the repository codebase.
Workflow is manifested in RDF statements (of course) in the internal metadata graph. Although the Web API exposes some URIs and names of workflow objects, the ontology and access control details are intentionally hidden. There is no need for applications using workflow to see the model, all their access is through the API.
The model is a state map, with nodes and transitions between them. Elements of workflow are:
Bootstrapping refers to how a new repository node first starts up. The process is not trivial, since so much of its operation depends on the RDF database (triple-store, actually a quad-store) which is completely empty when a new repository is launched.
The repository must be simple (ideally "foolproof", although that only breeds more destructive fools) to install and manage, since it is intended to be deployed at dozens or hundreds of sites, managed by administrators with varying experience levels. All the while, it must still maintain adequate security and data integrity.
The bootstrap process:
The configuration properties are loaded from the file configuration.properties
in the repository home directory. It is read by Apache Commons Configuration, which allows system properties and other variables to be interpolated into the values. See the Apache Web site for complete documentation.
See the Configuration section in the Administrator Guide for a complete list of configuration properties.
These are the possible HTTP requests in the repository API.
The repository's webapp must be mounted at the web server's root, so that it can resolve the canonical URI of resource data instances.
This section lists the formats that the repo can use for output and, in some cases, input, of data. The MIME type is how you describe it in the API, whether through explicit args or HTTP headers.
In a request, you can usually specify input format two ways:
(except /update, which has multiple text entities, each with a possible content-type)

You can ask for an output format in two ways as well:
Note that the tabular (tuple) and boolean query result formats are output-only. There are no requests that take them as input formats.
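Output-format selection (an explicit format arg wins, then the first recognized type in the Accept header, then a service default) might be sketched like this. This illustrates the negotiation rule only; it is not the repository's actual code:

```python
def choose_output_format(format_param, accept_header, known, default):
    """Pick an output MIME type: an explicit format argument wins;
    otherwise the first known type in the HTTP Accept header; otherwise
    the service's default format."""
    if format_param:
        return format_param
    for item in (accept_header or "").split(","):
        mime = item.split(";")[0].strip()  # ignore q-values
        if mime in known:
            return mime
    return default
```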
RDF Serialization formats:
| Name | Symbol | Default MIME type | Additional MIME types |
|---|---|---|---|
| RDF/XML | RDFXML | application/rdf+xml | application/xml |
| N3 | N3 | text/rdf+n3 | |
| N-Triples | NTRIPLES | text/plain | |
| TriG | TRIG | application/x-trig | |
| TriX | TRIX | application/trix | |
| NTriples With Context 1,2 | Context-NTriples | text/x-context-ntriples | |
| HTML 2,3 | RDFHTML | text/html | application/xhtml+xml |
1 The Context-NTriples format is not an official RDF serialization; it was added for this repository, as a convenient way to export quads for testing. Note that it was formerly named NQuads, but there is already a different unofficial format known to the RDF community that is called "NQuads".
2 This format only supports output, it cannot be read by the repository.
3 HTML is for interactive viewing only, it cannot be parsed.
Tuple (Tabular, SPARQL SELECT) Query Result Serialization formats:
| Name | Symbol | Default MIME type | Additional MIME types |
|---|---|---|---|
| SPARQL/XML | SPARQL | application/sparql-results+xml | application/xml |
| SPARQL/JSON | JSON | application/sparql-results+json | |
| TEXT | TEXT | text/plain | |
| HTML | HTML | text/html | application/xhtml+xml |
Boolean Query Result Serialization formats:
| Name | Symbol | Default MIME type | Additional MIME types |
|---|---|---|---|
| SPARQL/XML | SPARQL | application/sparql-results+xml | |
| TEXT | TEXT | text/boolean | |
This request returns a tabular report on the current authenticated user's identity, to compose a friendly display in a Web UI. This is simply a SPARQL query wrapped in a servlet to hide the details of the internal data.
This request can also function as a "login" mechanism to establish a session and cache authentication, while at the same time getting the user's displayable name to show in the UI.
Subtle Alternate Function: Note that the POST form of this request has a separate function when create=true: it creates the RDF metadata for a login user account. Normally this is only done as part of the initial bootstrap procedure, by the finish-install.sh script. When users are created through the Admin UI or /import, their RDF metadata is created automatically.
NOTE: Perhaps it would be better to implement this as a redirect to the resolvable URI for the person, which would then yield a description compatible with the FOAF standard. That's how the Semantic Web wants us to manage it.
URL: /repository/whoami (GET, POST)
Args:
- format --- same as for SPARQL result format, same default (SPARQL XML)
- create=(true|false) --- when true, invokes the alternate function of this service to create RDF metadata for the current user (see explanation above).
- firstname=text --- ONLY when create=true, the first name value of the created User instance (optional).
- lastname=text --- ONLY when create=true, the last name value of the created User instance (optional).
- mbox=text --- ONLY when create=true, the mbox value of the created User instance (optional).
GET Result:
Response document is a SPARQL tuple result, format determined by the same protocols as for /sparql. It contains the following columns:
Note that the last 3 fields may be empty if that data is not available.
If there is no :Agent instance for the logged-in user, the URI will revert to their implicitly asserted Role, for example, :Role_Superuser for an administrator. This is the same URI that appears in provenance metadata entries like dc:creator.
POST Result:
When create=true, the result document is empty, and the status code indicates success:
Access:
Open to authenticated users.
This call creates one or more globally unique, resolvable, URIs for new resource instances. It does not add any data to the repository; the instances will not exist until a user inserts some statements about them. The URI namespace is the default namespace from the configuration properties, followed immediately by the unique identifier. Note that ETL tools may request thousands of URIs at once so the mechanism to produce unique IDs must be able to handle that.
URL: /repository/new (POST only)
Args:
- count --- number of URIs to return; optional, default is 1.
- format --- same as for SPARQL result format, same default (SPARQL XML)

Result:

The requested number of new URIs are returned, packaged as a SPARQL query result for a field named "new". Its encoding is determined by the format parameter or, if none specified, by the Accept header of the HTTP request. Default is SPARQL/XML.
Access: Requires an authenticated user.
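The URI-minting contract (configured default namespace followed immediately by a unique identifier, scalable to thousands of IDs per request) can be illustrated as follows, with uuid4 standing in for the repository's actual ID generator:

```python
import uuid

def mint_uris(namespace, count=1):
    """Mint globally unique, resolvable instance URIs: the default
    namespace followed immediately by a unique identifier. No data is
    added to the repository; the instances do not exist until statements
    about them are inserted."""
    return [namespace + uuid.uuid4().hex for _ in range(count)]
```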
The disseminate service returns the RDF content of an instance; it is how the URI is resolved to implement the Linked Open Data paradigm of the Semantic Web. Note that there are actually three valid ways to construct a request for any given data instance:
- /i/instance-ID --- assumes that the URI prefix matches the Web server's DNS address, in other words, the configured default URI namespace.
- /i?uri=instance-URI --- retrieves any instance URI whether the prefix matches the default namespace or not. This allows one repository to resolve multiple domains.
- /repository/resource?uri=instance-URI --- just like the /i form, only with authentication required. This is the recommended URL for programs accessing resource contents through the REST API, since /i might not require or make use of authentication credentials.

URL: (GET or POST method)

- /i/instance-ID
- /i
- /repository/resource
Args:
- uri=uri --- optional, only if a URI is not specified as the tail of the request URI; an alternate way to explicitly specify the complete URI of the resource to disseminate. Allows any URI to be accessed, instead of assuming that the URI's namespace matches the hostname, context, and servlet path ("/i") to which the repository's webserver responds.
- format=mimetype --- optionally override the dissemination format that would be chosen by HTTP content negotiation. Note that choosing text/html results in a special human-readable result.
- view=view --- optionally choose a different view dataset from which to select the graph for dissemination. Mutually exclusive with workspace.
- workspace=uri --- URI of the workspace named graph to take the place of the default graph. Relevant metadata and ontology graphs are included automatically. Mutually exclusive with view.
- noinferred (boolean) --- excludes all inferred statements from the generated results. This only applies to rdf:type statements; if the noinferred query argument is present (it need not have any value), then inferred types are left out of the results.
- forceRDF (boolean) --- forces a result of serialized RDF data. When the negotiated result format is text/html, the usual choice is to generate the human-readable view; this instead forces an HTML rendering of the RDF statements, which can be handy for troubleshooting, especially when combined with noinferred. NOTE: This is the only way to see the Embedded Instance statements in an interactive HTML view, for example, in a web browser, so it is especially handy for generating a clean view for debugging EIs. Default false.
- forceXML (boolean) --- when an HTML format would be generated, output the intermediate XML document instead of transforming it to XHTML. This is mainly useful for obtaining examples of the intermediate XML for developing new XSLT stylesheets and testing/debugging. Default is false.
Result:
Returns a serialization or, optionally, a human-readable HTML rendering of the graph of RDF statements describing the indicated resource instance. Note the deliberate choice of words: this graph includes not only the statements of which the URI is the subject, but also:
Note that depending on the authenticated roles of the requesting user and the configured access controls, some properties may be excluded from this result. For example, in some cases, unauthenticated users will not see certain properties which may contain confidential information.
About HTML dissemination:
When the negotiated format is text/html, and unless either of the forceRDF or forceXML args was given, the dissemination process creates an intermediate XML document and transforms it into XHTML with the configured XSLT stylesheet. See description of the eaglei.repository.instance.xslt in the Repository Administrator Guide.
If no XSLT stylesheet is configured, the intermediate XML document is delivered instead, with a media content type of application/xml. Note that this means, to obtain correct XHTML output, you MUST configure an XSLT stylesheet.
The content of the intermediate XML format is described in a W3C XML Schema document that may be downloaded from a running repository, for example at https://localhost:8443/repository/schemas/instance.xsd
We provide an example transformation stylesheet that produces very simple HTML, intended to be the basis of custom stylesheets. It is available for download at, for example, https://localhost:8443/repository/styles/example.xsl
We manage the transformation within the repository, instead of adding an xml-stylesheet processing instruction to the XML, for compelling reasons:
The transformation stylesheet is supplied with these parameters when it is invoked. They should be declared with <xsl:param name="..."/>
directives in the XSL. Be sure your stylesheet can cope with parameters that are not set, by supplying default values.
- __repo_version --- string containing the Maven version spec of the running repository code. This is always set.
- __repo_css --- configured value of eaglei.repository.instance.css, may not be set.
- __repo_logo --- configured value of eaglei.repository.logo, may not be set.
- __repo_title --- configured value of eaglei.repository.title, may not be set.

Property Filtering
The set of properties returned in the HTML view is based on the same result as RDF disseminations, which is automatically filtered as necessary for the requesting user's access level.
Access:
Requires read permission on all named graphs in the query's chosen view. Note that this is the ONLY service available to unauthenticated users, so it must be able to gather a useful result from named graphs readable by the Anonymous role. If you do access this service with credentials, you will be able to see instances and properties that would be invisible to an unauthenticated user, for example, instances in private workspaces that are still in unpublished workflow states.
Note that when the requesting user does not have read access to the requested instance, it will appear to him/her that it does not exist; the error returned is identical to one for a nonexistent resource, since it is essentially the same case.
There is also access control on some individual properties of the resource: those properties identified as hidden and contact properties by the data model ontology (and its configuration; see that separate document). The access controls on the resource URI configured as datamodel.hideProperty.object regulate hidden properties, and those on datamodel.contactProperty.object regulate contact properties.
Anyone with READ access on the URI gets to see the properties. To expose them to the world you'd give access to the Anonymous role. Normally only Curator, RNAV, and Lab User roles would be granted access to hidden and contact properties since they have to see and manipulate them through the data tools.
This service actually implements three different kinds of requests:
The update operation does all its work in the instance's home named graph. For an existing instance being modified, it is computed as the named graph in which the asserted rdf:type
statement(s) are found.
When creating new resources: Since the create operation doesn't have an instance from which to derive its home graph, its home graph must be specified as the workspace arg.
Workflow implications of creating new resources: Since the /update
action that creates a new resource instance is effectively performing a transition from the New workflow state, the current user must have permission to make such a transition to the destination workspace; if there are multiple transitions, one is chosen arbitrarily.
Acquiring and use of edit tokens: The edit token is intended to "protect" the read-only copy of the instance that you (presumably) download as a basis for edits. The correct sequence of operations when modifying an instance is:
1. /update
request with action=gettoken to obtain the current edit token, creating one if necessary.
2. Fetch the instance's current properties, as the basis for your edits.
3. /update
request to modify the resource instance, with the token from (1).
This ensures that no matter how much time passes between (2) and (3), if, for example, a user dawdles over an interactive edit or forgets and leaves it overnight, the edit token is already in place to indicate his/her intention to make a change. It does not prevent another user from coming along and grabbing the token to make a change, but it will indicate that there is an edit in progress, and it will prevent a stale copy from being checked in.
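This check-out/check-in discipline can be illustrated with a toy in-memory model. This is a sketch only: the class, the token URN scheme, and the method names are invented for illustration and are not the repository's API.

```python
import uuid
from datetime import datetime, timezone

class StaleToken(Exception):
    """Corresponds to the HTTP 409 (Conflict) response of the real service."""

class Instance:
    """Toy model of the edit-token protocol, NOT the repository implementation."""

    def __init__(self, properties):
        self.properties = set(properties)
        self._token = None          # outstanding edit token, if any

    def gettoken(self, user):
        """action=gettoken: reuse the outstanding token or mint a new one."""
        if self._token is None:
            self._token = {
                "token": "urn:edit-token:" + uuid.uuid4().hex,
                "creator": user,
                "created": datetime.now(timezone.utc),
                "new": True,
            }
            return self._token
        # Token already existed: another edit MAY be in progress.
        return {**self._token, "new": False}

    def update(self, token_uri, delete=(), insert=()):
        """action=update: reject a stale/unknown token, then apply the change."""
        if self._token is None or self._token["token"] != token_uri:
            raise StaleToken("409 Conflict: get a fresh token and redo the edit")
        self.properties -= set(delete)
        self.properties |= set(insert)
        self._token = None          # token is "used up" by a successful update

inst = Instance({("ex:label", "Old name")})
tok = inst.gettoken("ex:users/alice")          # step (1): acquire the token
snapshot = set(inst.properties)                # step (2): read current state
inst.update(tok["token"],                      # step (3): edit based on (2)
            delete=[("ex:label", "Old name")],
            insert=[("ex:label", "New name")])
```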
Comparison with SPARQL/UPDATE: In case you are wondering why we chose to implement this complex special-purpose service instead of a general protocol like SPARQL/UPDATE - there were some compelling reasons:
When action=create
, there must be no existing statements in the repository with the given URI as a subject. The request must include an insert arg containing one or more statements whose predicate is rdf:type
. (All of the subjects must match the request URI). It is an error to specify a delete arg.
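A client-side pre-check of these create preconditions might look like the following sketch. The function and the triple representation are invented for illustration; the authoritative check against existing statements happens server-side.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def validate_create(instance_uri, insert, delete=None,
                    existing_subjects=frozenset()):
    """Check the documented preconditions for action=create.

    `insert` and `delete` are iterables of (subject, predicate, object)
    triples; `existing_subjects` stands in for a query against the live
    repository. Returns a list of violations (empty means acceptable).
    """
    errors = []
    if instance_uri in existing_subjects:
        errors.append("statements with this URI as subject already exist")
    if delete:
        errors.append("a delete arg is an error when action=create")
    if not any(p == RDF_TYPE for (_, p, _) in insert):
        errors.append("insert must contain at least one rdf:type statement")
    if any(s != instance_uri for (s, _, _) in insert):
        errors.append("all subjects must match the request URI")
    return errors
```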
When action=gettoken
, an edit token is created if necessary, and returned along with the user who created it, the time it was created, and a boolean value that is true if it was newly created by this request. This information is intended to help a UI service advise the user that another edit may be in progress: that is the case when the boolean is false and the timestamp on the token is recent.
When action=update
, there must be an existing instance matching the URI of the request. DO NOT specify the workspace arg, since the repository automatically finds the instance's home graph and makes all changes there.
Updating a resource instance's properties requires an edit token. The token lets the server check that your edits are based on the current valid state of the resource; if another update occurs before yours, its changes could be corrupted or lost. To update, first, run the /repository/update
request with the action=gettoken arg
to obtain the current edit token, creating one if necessary. Then, get the content properties as before. When calling /repository/update
again with action=update
add the token=token-uri
arg.
Note on file format and character set: The request specifies the file format and/or character set of the serialized RDF data as a Content-Type header value in the entity bodies of insert and delete args, for example, text/rdf+n3; charset="ISO-8859-1"
. The character set defaults to Unicode UTF-8, so if your source data is not in that character set you must declare it. The content-type can be provided in two different places – they are searched in this order of priority, and the first one found is the only one considered:
Content-Type
---header on the value of the content entity in a POST request. This takes precedence because it allows for different content-types in insert and delete args.
format
---query argument value.
*URL:* {{/repository/update \[ /instance-ID \]}} (POST only)
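The content-type priority search can be sketched as a small resolver. This helper is hypothetical; only the UTF-8 fallback comes from the text above, and the default media type here is an assumption for illustration.

```python
def resolve_content_type(part_content_type=None, format_arg=None,
                         default='application/rdf+xml; charset="UTF-8"'):
    """Pick the effective content-type for an insert/delete entity body.

    Priority order, per the documented search: the part's own Content-Type
    header wins, then the format query argument, then a default. The
    character set falls back to UTF-8 when the winner doesn't declare one.
    """
    chosen = part_content_type or format_arg or default
    if "charset=" not in chosen:
        chosen += '; charset="UTF-8"'
    return chosen
```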
Args:
uri
---optional way to explicitly specify the complete URI, instead of assuming that the URI's namespace matches the hostname, context, and servlet path ("/i") of this webserver.
format
---the default expected format for insert and delete graphs. If the args specify a content-type header, that overrides this value. Only recognizes triples even if the format supports quads.
action=(update|create|gettoken)
---update modifies an existing instance; create adds a new one. See below for details about gettoken.
token=uri
---When action is update or create, this must be supplied. The value is the URI returned by the last gettoken request.
workspace=uri
---Choose workspace named graph where new instance is created. Only necessary when action=create. Optional, default is the default workspace. DO NOT specify a workspace when action=update.
delete
---graph of statements to remove from the instance; subject must be the instance URI. Deletes are done before inserts. Graph may include wildcard URIs in predicate and/or object to match all values in that part of a statement.
insert
---graph of statements to add to instance; subject must be the instance URI.
bypassSanity
---(boolean, default false, deprecated) NOTE: It is best if you pretend this option does not exist. When true, it skips some of the sanity tests on the resulting instance graph, mostly the ones checking the integrity of Embedded instances. Requires Administrator privilege. This was added to make the data migration from broken old EI data possible, it should rarely if ever be needed.
The delete wildcard URI is http://eagle-i.org/ont/repo/1.0/MatchAnything
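For example, to replace every value of one property in a single update, a client might build its delete graph with the wildcard in the object position. This is a sketch: the helper and the example URIs are invented, and real literal values would need proper N3 escaping.

```python
MATCH_ANY = "http://eagle-i.org/ont/repo/1.0/MatchAnything"

def replace_all_values(instance_uri, predicate, new_values):
    """Build N3 bodies for a delete+insert pair that replaces every value
    of one property: the delete uses the wildcard URI in object position
    to match all existing values, then the insert adds the new ones."""
    delete = "<%s> <%s> <%s> ." % (instance_uri, predicate, MATCH_ANY)
    insert = "\n".join(
        '<%s> <%s> "%s" .' % (instance_uri, predicate, v)
        for v in new_values)
    return delete, insert

d, i = replace_all_values("http://example.org/i/123",
                          "http://www.w3.org/2000/01/rdf-schema#label",
                          ["New label"])
```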
Result:
HTTP status indicates success or failure. Any modifications are transactional; on success the entire change was made, and upon failure nothing gets changed.
When action=update is used to effect a change to a resource instance, this service automatically optimizes the requested change so the fewest actual statements are modified. For example, if the request deletes all statements by using the wildcard URI in the position of predicate and value, and then inserts all of the statements that were there before along with an additional new statement, the only change actually made is to add that new statement. Since a gratuitous change to an rdf:type statement results in extra time spent inferencing, it is best to avoid it when possible.
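The net-change computation described above can be sketched as follows. This is a standalone illustration, not the repository's code; the function name and the triple representation are invented.

```python
MATCH_ANY = "http://eagle-i.org/ont/repo/1.0/MatchAnything"

def minimize_change(current, delete, insert):
    """Expand wildcard delete patterns against the current statements, then
    cancel statements that are both deleted and re-inserted, so only the
    net difference would actually be applied."""
    def matches(pattern, stmt):
        return all(p == MATCH_ANY or p == s for p, s in zip(pattern, stmt))
    expanded_delete = {s for s in current
                       if any(matches(pat, s) for pat in delete)}
    net_delete = expanded_delete - set(insert)   # deletes not re-inserted
    net_insert = set(insert) - set(current)      # inserts not already present
    return net_delete, net_insert
```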
When an update fails because the edit token is stale, the HTTP status is always 409 (Conflict). If this occurs, the only solution is to get a fresh token and re-do the update. It is NOT advisable to have a client simply retry the update with the same data – at least inform the user that there has been an intervening edit and updating now would destroy somebody else's changes.
When action=gettoken, the response includes a document formatted (according to the chosen format) as a SPARQL tuple result. It includes the columns:
token
---URI of the edit token. It has no meaning other than its use in an update transaction. This is the last edit token created (and not yet "used up" by an update) on this instance; or else a new one that was created if there wasn't one available.
created
---literal timestamp at which the token was created. It may be useful to display the date to the user if there was an existing token of recent vintage.
creator
---URI of the user who created the token
new
---boolean literal that is true if this gettoken operation created a new token. When false, that means the token already existed, which indicates there MAY already be another user's update in progress that might conflict with yours (see the created and creator values).
creatorLabel
---rdfs:label
of the creator, if available.
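Assuming the chosen format serializes the tuple result as SPARQL JSON results (a format assumption, not stated above), a client might inspect the gettoken response like this. The sample document below is fabricated for illustration.

```python
import json

# Fabricated gettoken response in SPARQL JSON results form.
sample = json.loads("""{
  "head": {"vars": ["token", "created", "creator", "new", "creatorLabel"]},
  "results": {"bindings": [{
     "token":   {"type": "uri",     "value": "urn:edit-token:abc"},
     "created": {"type": "literal", "value": "2013-04-01T12:00:00Z"},
     "creator": {"type": "uri",     "value": "http://example.org/users/alice"},
     "new":     {"type": "literal", "value": "false"},
     "creatorLabel": {"type": "literal", "value": "Alice"}
  }]}
}""")

def edit_may_be_in_progress(result):
    """True when the token pre-existed (new == false), i.e. another user's
    edit MAY already be in progress; check created/creator before warning."""
    row = result["results"]["bindings"][0]
    return row["new"]["value"] == "false"
```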
Access:
Requires ADD access to either the instance itself or its home named graph if the insert argument was given, and REMOVE access on the instance or the graph if the delete argument was given. When action=create, requires READ access on an appropriate Workflow Transition out of the New state.
We may eventually decide to implement a quota on the count of statements that may be added or deleted, as a protection against DOS attacks and runaway clients.