Storage configuration • examinr

Exam data needs to be stored somewhere. Examinr does not make any assumption on how or where your data will be stored. You must set the storage provider in the exam configuration via exam_config(storage_provider=).

Security and Privacy

Please note that examinr stores sensitive information through the storage provider. It is your responsibility to know and understand the privacy legislation governing the data in your jurisdiction. The dbi_storage_provider() provides a simple workaround by hashing the user id, but this adds another step for you when exporting grades (matching user ids to hashed user ids).

It is recommended that you securely store the data using encryption, but this generally requires more knowledge about the database system in use.

DBI Storage Provider

With dbi_storage_provider(), examinr supports DBI-backed storage out of the box. You can leverage this storage provider by calling exam_config(storage_provider = dbi_storage_provider(...)). You are responsible for the initial setup of the database and establishing a connection with the database, which is passed on to dbi_storage_provider(). In addition to the connection object, the dbi_storage_provider() needs to be told the name of the database tables for storing attempts data and section data (i.e., users’ answers).

The built-in storage provider assumes the following two database tables are read- and writable:

Attempts table

Field	SQL data type	Storage requirements
`attempt_id`	CHAR (or UUID)	36 characters (on database systems which support a UUID type, it should be stored as such)
`user_id`	CHAR	either 64 characters (if hashing is enabled), or variable length (depending on the authentication provider)
`exam_id`	CHAR	variable length
`exam_version`	CHAR	variable length
`started_at`	DOUBLE PRECISION	seconds since 1970-01-01 00:00 UTC (UNIX timestamp)
`seed`	INTEGER	a signed integer of 4 bytes
`finished_at`	DOUBLE PRECISION	seconds since 1970-01-01 00:00 UTC (UNIX timestamp)
`user_obj`	CHAR	variable length (must be able to hold a serialized R object of arbitrary size)
`points`	CHAR	variable length (must be able to hold a serialized R object of arbitrary size)

Some database systems (e.g., PostgreSQL) have native support for a UUID type which should be preferred over character data for the attempts_id field. The attempt_id field should be unique and as such can be used as primary key on the table. The table is mainly queried for a given user_id, exam_id and exam_version. Therefore, it makes sense to add an index (not unique!) spanning these 3 fields to speed up the queries.

Both user_obj and points are serialized R objects and can thus be of arbitrary size. The database should not make any assumption on the size of these fields. Internally, examinr serializes the R objects with serialize() and stores the base64 encoded character string.

An example table definition for PostgreSQL may look like

CREATE TABLE attempts (
  attempt_id   uuid              PRIMARY KEY,
  user_id      character varying NOT NULL,
  exam_id      character varying NOT NULL,
  exam_version character varying NOT NULL,
  started_at   double precision  NOT NULL,
  seed         integer           NOT NULL,
  user_obj     text              NOT NULL,
  finished_at  double precision,
  points       text
);

CREATE INDEX attempts_index ON attempts (user_id, exam_id, exam_version);

Section data table

Field	SQL data type	Storage requirements
`attempt_id`	CHAR (or UUID)	same as for `attempt_id` of the attempts table
`section`	CHAR	variable length
`saved_at`	DOUBLE PRECISION	seconds since 1970-01-01 00:00 UTC (UNIX timestamp)
`section_data`	CHAR	variable length (must be able to hold a serialized R object of arbitrary size)

This definition does not have a natural candidate for a primary key, so it is up to you and your database system what kind of primary key you want to use.

The attempt_id field is a reference to the attempt stored in the attempts table. The table is mainly queried for a given attempt_id and section. Therefore, it makes sense to add an index (not unique!) spanning these two fields to speed up the queries.

An example table definition for PostgreSQL may look like:

CREATE TABLE section_data (
  id           serial            PRIMARY KEY,
  attempt_id   uuid              NOT NULL REFERENCES attempts (attempt_id)
                                          ON DELETE CASCADE ON UPDATE CASCADE,
  section      character varying NOT NULL,
  saved_at     double precision  NOT NULL,
  section_data text
);

CREATE INDEX section_data_index ON section_data (attempt_id, section);

User ID hashing

User IDs can often be used for identifying individuals and in turn their personal information. In many jurisdictions you must ensure that this information is properly protected. If you don’t have a database with transparent encryption, it is sensible to store a hashed variant of the user id instead of the actual value. The dbi_storage_provider() by default hashes the user id using digest::hmac(key, user_id, algo = "sha256") where key is by default the string {exam_id}_{exam_version}. While this ensures the data cannot be traced back to the actual user id by unauthorized third parties, it also prevents you from associating the data (in particular the grading information) with the actual user.

Currently, it is your responsibility to match the hashed user id’s back to the actual users by computing the HMAC of the known user ids and comparing to the hashed user ids in the grading data.

Example usages

Ephemeral database for testing

If you don’t have access to a database server and want to test an examinr exam, an ephemeral in-memory SQLite database is very useful. This database only exists as long as the R server process lives. It can only be used by a single process at a time and as such is not suited for actual exams. For this example you need the RSQLite package.

In the server-start code you have to set up the database and configure the storage provider:

#! context="server-start"
library(RSQLite)
# Create an ephemeral in-memory RSQLite database
db_con <- dbConnect(SQLite(), ":memory:")

# Create the attempts table (SQLite doesn't have a UUID type)
dbExecute(db_con, 'CREATE TABLE attempts (
  attempt_id   varchar(36) PRIMARY KEY,
  user_id      varchar(64) NOT NULL,
  exam_id      varchar(64) NOT NULL,
  exam_version varchar(64) NOT NULL,
  user_obj     text        NOT NULL,
  seed         integer     NOT NULL,
  started_at   timestamp   NOT NULL DEFAULT CURRENT_TIMESTAMP,
  finished_at  timestamp,
  points       text
)')
# Create an index on the attempts table
dbExecute(db_con, 'CREATE INDEX attempts_index ON attempts (user_id, exam_id, exam_version)')

# Create the section data table
dbExecute(db_con, 'CREATE TABLE section_data (
  id           integer      PRIMARY KEY,
  attempt_id   varchar(36)  NOT NULL,
  section      varchar(64)  NOT NULL,
  saved_at     timestamp    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  section_data text
)')
# Create an index on the section data table
dbExecute(db_con, 'CREATE INDEX section_data_index ON section_data (attempt_id, section)')

# Disconnect from the DB when shiny stops (this also destroys the ephemeral database!)
shiny::onStop(function () { dbDisconnect(db_con) })

# Use this database as storage for the exam
exam_config(storage_provider = dbi_storage_provider(db_con, 'attempts', 'section_data'))

PostgreSQL database

If you have access to a PostgreSQL database it can be used via the RPostgres package. For easy handling of dropped connections, using a pool of database connections via the pool package is generally recommended. First, the database needs to be set up and the tables created according to the table definitions above. The pool is then created in a server-start context and passed on to dbi_storage_provider().

#! context="server-start"
library(pool)

# Create a pool of connections to your PostgreSQL database
db_pool <- dbPool(drv = RPostgres::Postgres(),
                  dbname = "{database name}",
                  host = "{database host}",
                  username = "{username}",
                  password = "{password}",
                  minSize = 1,
                  maxSize = 2)

# Close all connections in the pool shiny stops
shiny::onStop(function () { poolClose(db_pool) })

# Use this database as storage for the exam
exam_config(storage_provider = dbi_storage_provider(db_pool, 'attempts', 'section_data'))

Note that you won’t need a large pool of connections as each Shiny process can only use one connection at a time anyways. The pool is mostly useful for automatically handling dropped connections.

Writing your own storage provider

If the DBI-backed storage provider does not suit your needs, you can write your own storage provider and configure it via exam_config(storage_provider=) A storage provider is a list of functions invoked by examinr to read and write all exam data.

The list of functions must include all of the following:

`create_attempt(user, exam_id, exam_version, seed, started_at, ...)`

This function is invoked when a user tries to start a new attempt. The function must return a unique identifier for the attempt, or FALSE in case of an error.

Argument	Description
`user`	user object as returned by the authentication provider
`exam_id`	exam identifier (scalar character)
`exam_version`	version string associated with the exam (scalar character)
`seed`	integer used to seed the attempt (scalar integer)
`started_at`	the date-time the attempt was started, including timezone (object of type `POSIXct`)
`...`	currently not used

`finish_attempt(attempt_id, finished_at, ...)`

This function should mark an attempt as “finished” and is invoked when a user finishes an attempt (or the time elapsed). The function should return TRUE if the data is saved successfully and FALSE in case of an error.

Argument	Description
`attempt_id`	the attempt identifier (as returned by `create_attempt()`)
`finished_at`	the date-time the attempt was finished, including timezone (object of type `POSIXct`)
`...`	currently not used

`grade_attempt(attempt_id, points, ...)`

Grade the given attempt, assigning the given points object to the attempt. The function should return TRUE if the points are saved successfully and FALSE in case of an error.

Argument	Description
`attempt_id`	the attempt identifier (as returned by `create_attempt()`)
`points`	an R list object which holds all grading information for the attempt
`...`	currently not used

`get_attempts(user, exam_id, exam_version, ...)`

Query all attempts filtered by the given arguments. The argument user may be NULL, in which case the function should return the attempts for all users. The function must return a list of attempt objects (of the form described below), or NULL in case of an error.

Argument	Description
`user`	user object as returned by the authentication provider
`exam_id`	exam identifier (scalar character)
`exam_version`	version string associated with the exam (scalar character)
`...`	currently not used

The function must return a list of attempt objects. An attempt object is a list containing the following information

Element	Description
`id`	attempt identifier
`user`	user object associated with this attempt (as given to `create_attempt()`)
`started_at`	time (as `POSIXct` object in the system’s timezone) the attempt was started
`finished_at`	time (as `POSIXct` object in the system’s timezone) the attempt was finished (may be `NULL` if the attempt is not yet finished)
`points`	R list object as given to `grade_attempt()` (may be `NULL` if the attempt is not yet graded)

`save_section_data(attempt_id, section, section_data, ...)`

Save section data for the given attempt. The function should return TRUE if the data is saved successfully and FALSE in case of an error.

Argument	Description
`attempt_id`	the attempt identifier (as returned by `create_attempt()`)
`section`	section identifier
`section_data`	an R list object with section-specific data
`...`	currently not used

`get_section_data(attempt_id, section, ...)`

Query the most recently saved section data for the given attempt. If section is NULL, the function should not filter by section, but return the latest data for all sections. The function must return a list of section data objects (of the form described below), or NULL in case of an error.

Argument	Description
`attempt_id`	the attempt identifier (as returned by `create_attempt()`)
`section`	section identifier
`...`	currently not used

The function must return a list of section data objects. A section data object is a list containing the following information.

Element	Description
`section`	section identifier
`timestamp`	time (as `POSIXct` object in the system’s timezone) when the data was saved
`section_data`	an R list object with section-specific data, exactly as given to `save_section_data()`

`get_last_section(attempt_id, ...)`

Get the identifier of the section for which the most recent data is available (i.e., the section the attempt was showing last). The function should return a character string with the section identifier of the section for which data was saved most recently. In case of an error, or if there is no data available for the given filter, the function should return NULL.

Argument	Description
`attempt_id`	the attempt identifier (as returned by `create_attempt()`)
`...`	currently not used