Securing a GraphDB EE cluster

Ellis Pritchard
Dec 19, 2019

In a previous article, we set up a GraphDB EE cluster with separate master and worker nodes, but we didn’t mention security!

Importantly, security is completely OFF by default, i.e. anyone can connect to any instance without authenticating.

Unfortunately, the documentation (also for clusters) is a bit of a tease, and leaves too many questions unanswered, so here’s something that is hopefully more useful!

Overview

There are two separate security concerns we’ll cover:

  • Securing inter-cluster (i.e. master to worker) communication.
  • Authenticating with the Workbench, and with the API on master nodes to make queries, or use the management facilities (repository creation etc.)

I’m not going to cover SSL setup or LDAP authentication here, since both are reasonably well covered by the official documentation.

This guide will cover securing inter-cluster communication, and local authentication and authorization.

Securing inter-cluster communication

To secure inter-cluster communication you need to set a secret on each of the instances; this goes in the conf/graphdb.properties file:

graphdb.auth.token.secret = some-long-random-secret-here

Set this to the same value on all instances of your cluster, and restart them.
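
If you need to come up with that secret, here’s a minimal sketch of one way to generate it and write it into the properties file. The function names are mine, not GraphDB’s, the example path is hypothetical, and it assumes /dev/urandom, od and mktemp are available (they are on most Linux boxes):

```shell
# Generate a 64-character hex secret from 32 random bytes.
generate_secret() {
    head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Set graphdb.auth.token.secret in a properties file, replacing any existing value.
# $1 = path to graphdb.properties, $2 = the secret
set_token_secret() {
    props_file=$1
    secret=$2
    tmp_file=$(mktemp)
    # Copy every line except an existing secret setting, then append the new one.
    if [ -f "$props_file" ]; then
        grep -v '^graphdb\.auth\.token\.secret[ =]' "$props_file" > "$tmp_file" || true
    fi
    printf 'graphdb.auth.token.secret = %s\n' "$secret" >> "$tmp_file"
    mv "$tmp_file" "$props_file"
}

# Example (hypothetical install path; use the SAME secret on every instance):
#   secret=$(generate_secret)
#   set_token_secret /opt/graphdb/conf/graphdb.properties "$secret"
```

Generate the secret once and distribute that one value, rather than running generate_secret separately on each instance.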

What this apparently does is allow GraphDB to use a special internal cluster user when talking between master and worker instances (and also between masters), while preventing spoof connections from outside the cluster.

Note that the documentation says that you don’t actually need to do this if using the Workbench to connect repositories. Regardless, it is probably best not to rely on this, since the Workbench is not the only way to connect instances.

Local Authentication Capabilities

Each instance in the cluster has its own database of user accounts which can authenticate with the API. Yes, its own database: without setting up LDAP, instances do not share a user database!

This means that every instance you want to connect to (or want APIs to connect to) has to have user accounts set up on it individually.

The database is not a repository; it is part of the run-time configuration file stored in work/workbench/settings.js (relative to graphdb.home). This file is relevant to all instance types, whether or not they run the Workbench, and is created on start-up if it does not exist.

By default, the settings.js file has one user configured: admin, with the default password root. The relevant part of the file looks like this (it’s actually JSON):

{
  "users" : {
    "admin" : {
      "username" : "admin",
      "password" : "{bcrypt}$2a$10$503Ebaadf00",
      "grantedAuthorities" : [ "ROLE_ADMIN" ],
      "appSettings" : {
        ...
      },
      "dateCreated" : 1575392449518
    },
    ...
  }

Since you need to be able to authenticate with both master and worker instances in order to manage repositories, you will need at least the admin account (but should change the password).

Oh, that’s awkward

Unfortunately, just setting the Active Location in the Workbench does not route the users created in the Users and Access panel to that instance: all users created in the Workbench stay in the Workbench. 😞

Furthermore, your Linux OS almost certainly won’t have a command-line tool suitable for generating the encrypted password, or an easy to install package to do so, due to some kind of long-lived feud between Linux developers.

Command-line tools for hashing a password with bcrypt do exist (in Go and Python), but check that the output is actually compatible with the exact algorithm used by GraphDB: it should start with $2a$10$, i.e. version 2a, with 2^10 key expansion rounds.
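
Whichever tool you end up with, the check is easy to script. This is only a sketch: it does not generate the hash, it just tests the prefix described above and adds the {bcrypt} marker seen in the settings.js excerpt. Both function names are made up for illustration:

```shell
# Succeed only if the hash starts with $2a$10$ (version 2a, cost 10).
is_compatible_hash() {
    case $1 in
        '$2a$10$'*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Print the value for the "password" field in settings.js, i.e. the hash
# with the {bcrypt} marker in front, or fail if the hash looks incompatible.
settings_password_value() {
    if ! is_compatible_hash "$1"; then
        echo 'hash does not start with $2a$10$' >&2
        return 1
    fi
    printf '{bcrypt}%s\n' "$1"
}
```

Remember to single-quote the hash on the command line, otherwise the shell will try to expand the $ signs.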

You could create all the users you need in the Workbench UI, and then copy them (as appropriate) to the settings.js file on the other instances. However, the easiest way is probably to use the REST API against each instance once it is running.

For dev-opsy automated set-up, you could install standard password file(s) on all instances, perhaps varying by instance type, and manage passwords in, for example, Ansible vaults. Having a bcrypt command line tool might then come in handy.

Using the REST API to create user accounts

Basically, put the new username on the end of the URL path, pass the new password in an X-GraphDB-Password header, and POST the access rights (roles) in the body (change the host/port as required):

curl -X POST -H'X-GraphDB-Password: a-really-good-password' -H'content-type: application/json' 'http://localhost:8080/rest/security/user/alice' -d '{"grantedAuthorities": [ "READ_REPO_repo-1", "ROLE_USER" ]}'

Maybe that’s easier than editing JSON in the settings.js file?!
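
Since the accounts have to exist on each instance separately, it can help to wrap the call in a function and loop over your hosts. A rough sketch, assuming security is still off (so no Authorization header is needed) and that role names contain no quote characters; the function names and the host names in the example are hypothetical:

```shell
# Build the JSON body from a list of authorities.
authorities_json() {
    list=
    for role in "$@"; do
        # Append a quoted role, with a ", " separator after the first one.
        list="${list:+$list, }\"$role\""
    done
    printf '{"grantedAuthorities": [%s]}\n' "$list"
}

# Create one user on one instance.
# $1 = base URL, $2 = username, $3 = password, remaining arguments = authorities
create_user_on() {
    base_url=$1
    username=$2
    password=$3
    shift 3
    curl -sS -X POST \
        -H "X-GraphDB-Password: $password" \
        -H 'content-type: application/json' \
        -d "$(authorities_json "$@")" \
        "$base_url/rest/security/user/$username"
}

# Example: the same account on two (hypothetical) master instances:
#   for url in http://master-1:8080 http://master-2:8080; do
#       create_user_on "$url" alice a-really-good-password READ_REPO_repo-1 ROLE_USER
#   done
```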

Rights and Roles

The REST example above grants two roles to alice using the grantedAuthorities array: ROLE_USER, a basic role that doesn’t include repository management, and READ_REPO_repo-1, which allows read-only access to repository repo-1.

Here are details of all the supported roles (also explained somewhat in the docs):

  • Admin (API: ROLE_ADMIN) — can do anything. In particular, this is the only role that can manage users and locations.
  • Repository Manager (API: ROLE_REPO_MANAGER) — can manage repositories, including creation and deletion, but cannot manage users or locations, and cannot set the Workbench active location.
  • User (API: ROLE_USER) — can read, and possibly also write, on all, or a restricted set of repositories.

Only ROLE_USER can be restricted to read or read/write on individual repositories, using pseudo-roles WRITE_REPO_repo-name and READ_REPO_repo-name. To grant read or read/write access on all repositories, use the special roles READ_REPO_* or WRITE_REPO_*.
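
To make the naming pattern concrete, here is a small hypothetical helper that prints the authorities for a ROLE_USER account. One assumption to flag loudly: for read/write access it grants READ_REPO_ as well as WRITE_REPO_, which is my guess at the safe combination rather than something confirmed here; if write already implies read on your version, the extra entry should be harmless.

```shell
# Print the authorities for a ROLE_USER account, one per line.
# $1 = access level: "read" or "write"
# $2 = repository name, or * (quoted!) for all repositories
repo_authorities() {
    access=$1
    repo=$2
    case $access in
        read)
            printf 'READ_REPO_%s\n' "$repo" ;;
        write)
            # ASSUMPTION: read/write is expressed by granting both pseudo-roles.
            printf 'READ_REPO_%s\nWRITE_REPO_%s\n' "$repo" "$repo" ;;
        *)
            echo "unknown access level: $access" >&2
            return 1 ;;
    esac
    printf 'ROLE_USER\n'
}

# Example:
#   repo_authorities read repo-1    # READ_REPO_repo-1, ROLE_USER
#   repo_authorities write '*'      # READ_REPO_*, WRITE_REPO_*, ROLE_USER
```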

You might consider restricting ROLE_ADMIN to a few trusted Ops folk, but grant ROLE_REPO_MANAGER to anyone who might reasonably need to create or delete repositories. Applications (and people) using Query APIs should only need one of the read or read/write roles, potentially restricted to a subset of repositories.

Since users and APIs will connect only to the master instance(s), you only need to create accounts for those purposes on the master instance(s): the workers only require an account with ROLE_ADMIN, e.g. the admin user.

Security — not yet!

You might notice at this point that, despite all our hard work, you can still connect to all the instances without authenticating. That’s because security is off by default.

Now is the time to actually enable security (but see Workbench caveat)!

⚠️ After you enable security, the Workbench doesn’t like being pointed at an active location it can no longer connect to, and will not let you log in, even with valid credentials (you may see brief flashes of an error message about locations before it kicks you back to the login screen). Add admin authentication details to your Workbench locations before turning security on for the instances.

There are three ways of enabling security, as shown in the documentation:

  • for the Workbench itself, just turn it on using the Users and Access panel; you’ll immediately need to log in.
  • call the REST API on every instance.
  • edit the settings.js file on all instances, and restart.

Using the REST API, we can just call each instance with curl to switch security on (change the host/port as required):

curl -XPOST -H'content-type: application/json' 'http://localhost:8080/rest/security' -d 'true'

Since security was not on, we don’t need to authenticate to do this!
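
With more than a couple of instances, a loop saves some typing. A minimal sketch, with made-up function names and hypothetical host names in the example; it stops at the first instance that curl cannot reach:

```shell
# Switch security on for one instance. $1 = base URL
enable_security() {
    curl -sS -X POST -H 'content-type: application/json' -d 'true' "$1/rest/security"
}

# Switch security on for every base URL given as an argument.
enable_security_everywhere() {
    for base_url in "$@"; do
        enable_security "$base_url" || {
            echo "failed to enable security on $base_url" >&2
            return 1
        }
    done
}

# Example (hypothetical hosts):
#   enable_security_everywhere http://worker-1:8080 http://worker-2:8080 http://master-1:8080
```

This only works unauthenticated the first time round: once an instance has security on, repeating the call needs admin credentials.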

Otherwise, we can change the properties section in the work/workbench/settings.js file and restart the instance:

"properties" : {
  "current.location" : "",
  "security.enabled" : "true"
},

Now security is enabled!

Uh oh, I broke it!

If you already created your locations in the Workbench, when you enabled security, you may have broken it! Don’t panic! 😫

The Workbench can’t connect to the locations any more, because they now require authentication, and will show everything as broken. Even worse, due to a bug, if authentication fails for the active location you may not even be able to log in to the Workbench to fix it!

To avoid this bug in the first place, set the active location to Local before enabling security; alternatively, add authentication details to each of the locations before turning on security for the instances.

If you’ve already locked yourself out, try editing the Workbench’s work/workbench/settings.js file to set current.location (under properties) to an empty string, then restart the Workbench instance. You should now be able to fix the locations.

If you get utterly stuck, just delete the Workbench settings.js file, restart the Workbench instance, and the Workbench will recreate it with the default admin password, no locations and no security; now add your users and connect to locations again, with the appropriate authentication details, before turning security back on for the Workbench.

Note that you can only manage locations from the Workbench when logged in as a user with ROLE_ADMIN and, importantly, the credentials you supply for a location must also belong to a user with ROLE_ADMIN, else you will get the somewhat unhelpful error “Cannot connect to location (GraphDBWorkbenchException: Not a valid GraphDB instance.)”.

Using secured APIs

Now we’ve secured the cluster, how do APIs connect?

There are two methods:

  • Basic auth using username and password.
  • Token based auth, using a JWT generated by a login API.

The latter is recommended, because the bcrypt algorithm used to perform the authentication can be expensive (it’s intended to be expensive to slow down brute-force attacks), but the former is easier to use. Both are insecure over plain HTTP on public networks, but are fine on private internal networks, or over TLS.

To authenticate using Basic auth, join the username and password with a colon, base64-encode the result, and send it in the Authorization header with every request, e.g.

# create Basic auth token
PASSWORD=$(echo -n 'username:password' | base64)
# send a SPARQL query to repo-1 with the Authorization header
curl -H"Authorization: Basic $PASSWORD" 'http://localhost:8080/repositories/repo-1?query=CONSTRUCT+%7B%3Fs+%3Fp+%3Fo%7D+WHERE+%7B%3Fs+%3Fp+%3Fo%7D+LIMIT+10'
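
If you do this a lot, the encoding step fits in a one-line function. A small sketch (the function name is mine); it uses printf rather than echo -n, and strips the line breaks that some base64 implementations insert into long output:

```shell
# Print a Basic auth header line. $1 = username, $2 = password
basic_auth_header() {
    printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Example: let curl URL-encode the query instead of doing it by hand
#   curl -H "$(basic_auth_header username password)" -G \
#        --data-urlencode 'query=CONSTRUCT {?s ?p ?o} WHERE {?s ?p ?o} LIMIT 10' \
#        'http://localhost:8080/repositories/repo-1'
```

curl can also do the encoding for you: passing -u 'username:password' sends the same Basic Authorization header.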

To authenticate using a JWT, first call the /rest/login endpoint with your username on the end of the URL, and the password in an X-GraphDB-Password header, then use the token returned in the Authorization header to call the API:

# authenticate: the JWT is returned in the Authorization header
# (tr strips the carriage return that ends each HTTP header line)
AUTH=$(curl -Is -XPOST -H'X-GraphDB-Password: password' 'http://localhost:8080/rest/login/username' | grep Authorization | tr -d '\r')
echo "$AUTH" # will be something like:
# Authorization: GDB eyJ1xxxxxxxxxxxxxx
# send a query with the (captured) Authorization header:
curl -H"$AUTH" 'http://localhost:8080/repositories/repo-1?query=CONSTRUCT+%7B%3Fs+%3Fp+%3Fo%7D+WHERE+%7B%3Fs+%3Fp+%3Fo%7D+LIMIT+10'

You can re-use the JWT until it expires (after 24 hours), saving the expense of the bcrypt every time.
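
For scripts that run repeatedly, one way to get that re-use is to cache the header in a file and only log in again when the file is getting old. This is a sketch under several assumptions: the cache location and function names are hypothetical, find must support -mmin (GNU and BSD find do), and I refresh after 23 hours to leave some margin before the 24-hour expiry:

```shell
# Hypothetical cache location; set TOKEN_CACHE beforehand to override it.
TOKEN_CACHE=${TOKEN_CACHE:-$HOME/.graphdb-auth-header}

# Read raw HTTP response headers on stdin and print the first Authorization
# header, without the carriage return that ends each header line.
extract_auth_header() {
    tr -d '\r' | grep -i '^authorization:' | head -n 1
}

# Log in and print a line such as "Authorization: GDB eyJ1...".
# $1 = base URL, $2 = username, $3 = password
login() {
    curl -sS -D - -o /dev/null -X POST \
        -H "X-GraphDB-Password: $3" "$1/rest/login/$2" | extract_auth_header
}

# Print the cached header, logging in first if the cache is missing or stale.
cached_login() {
    # find prints the file name only if it was modified under 1380 minutes ago.
    if [ -z "$(find "$TOKEN_CACHE" -mmin -1380 2>/dev/null)" ]; then
        header=$(login "$@")
        [ -n "$header" ] || { echo 'login failed: no Authorization header' >&2; return 1; }
        # Write the token with owner-only permissions.
        (umask 077; printf '%s\n' "$header" > "$TOKEN_CACHE")
    fi
    cat "$TOKEN_CACHE"
}

# Example:
#   curl -H "$(cached_login http://localhost:8080 username password)" \
#        'http://localhost:8080/repositories/repo-1?query=...'
```

The cached token is as sensitive as the password for as long as it is valid, hence the restrictive umask.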

Using the security management API with security enabled

To save any confusion: once security is enabled, creating or updating users through the /rest/security API always requires an Authorization header for a user with ROLE_ADMIN. If the operation itself needs a user’s password, you also send an X-GraphDB-Password header containing the password to be used in the operation. For example, to change a user’s password:

curl -v -X PATCH -H"Authorization: Basic $(echo -n 'admin:admin-password' | base64)" -H'X-GraphDB-Password: new-user-password' -H'content-type: application/json' 'http://localhost:8080/rest/security/user/username' -d '{}'

The -v is helpful here, because you don’t always seem to get an error message if your request is wrong, and sometimes you get a 501 rather than a more appropriate status code.

Using the RDF4J console with security enabled

The connect command takes plain-text username and password parameters following the GraphDB instance URL; note that the connection is not tested until you issue a command that requires authorization:

> connect http://localhost:8080
Connected to http://localhost:8080
> show repositories
Failed to get repository list: org.eclipse.rdf4j.http.protocol.UnauthorizedException
> disconnect
Disconnecting from http://localhost:8080
> connect http://localhost:8080 some-username some-password
Connected to http://localhost:8080
> show repositories
+----------
|repo-1 ("Description of repo-1...")
|repo-2 ("Description of repo-2...")
+----------

Summary

  • Security is not enabled by default.
  • Unless you want to configure LDAP, user accounts need to be managed independently on each instance.
  • All instances need an account with admin rights, such as the default admin user — change the password!
  • Configure any users you need for Workbench and API use as necessary on the Workbench and master instances, assigning them appropriate roles.
  • Worker instances only really need an admin account; however, you should probably also add a more restrictive account with the Repository Manager (ROLE_REPO_MANAGER) role.
  • Configure a shared secret on all master and worker nodes to secure inter-cluster communication.

Hope that helps!
