Exam data needs to be stored somewhere. Examinr does not make any assumption on how or where your data will be stored. You must set the storage provider in the exam configuration via exam_config(storage_provider=)
.
Please note that examinr stores sensitive information through the storage provider. It is your responsibility to know and understand the privacy legislation governing the data in your jurisdiction. The dbi_storage_provider()
provides a simple workaround by hashing the user id, but this adds another step for you when exporting grades (matching user ids to hashed user ids).
It is recommended that you securely store the data using encryption, but this generally requires more knowledge about the database system in use.
With dbi_storage_provider()
, examinr supports DBI-backed storage out of the box. You can leverage this storage provider by calling exam_config(storage_provider = dbi_storage_provider(...))
. You are responsible for the initial setup of the database and establishing a connection with the database, which is passed on to dbi_storage_provider()
. In addition to the connection object, the dbi_storage_provider()
needs to be told the name of the database tables for storing attempts data and section data (i.e., users’ answers).
The built-in storage provider assumes the following two database tables are read- and writable:
Field | SQL data type | Storage requirements |
---|---|---|
attempt_id |
CHAR (or UUID) | 36 characters (on database systems which support a UUID type, it should be stored as such) |
user_id |
CHAR | either 64 characters (if hashing is enabled), or variable length (depending on the authentication provider) |
exam_id |
CHAR | variable length |
exam_version |
CHAR | variable length |
started_at |
DOUBLE PRECISION | seconds since 1970-01-01 00:00 UTC (UNIX timestamp) |
seed |
INTEGER | a signed integer of 4 bytes |
finished_at |
DOUBLE PRECISION | seconds since 1970-01-01 00:00 UTC (UNIX timestamp) |
user_obj |
CHAR | variable length (must be able to hold a serialized R object of arbitrary size) |
points |
CHAR | variable length (must be able to hold a serialized R object of arbitrary size) |
Some database systems (e.g., PostgreSQL) have native support for a UUID type which should be preferred over character data for the attempts_id
field. The attempt_id
field should be unique and as such can be used as primary key on the table. The table is mainly queried for a given user_id
, exam_id
and exam_version
. Therefore, it makes sense to add an index (not unique!) spanning these 3 fields to speed up the queries.
Both user_obj
and points
are serialized R objects and can thus be of arbitrary size. The database should not make any assumption on the size of these fields. Internally, examinr serializes the R objects with serialize()
and stores the base64 encoded character string.
An example table definition for PostgreSQL may look like
CREATE TABLE attempts (
attempt_id uuid PRIMARY KEY,
user_id character varying NOT NULL,
exam_id character varying NOT NULL,
exam_version character varying NOT NULL,
started_at double precision NOT NULL,
seed integer NOT NULL,
user_obj text NOT NULL,
finished_at double precision,
points text
);
CREATE INDEX attempts_index ON attempts (user_id, exam_id, exam_version);
Field | SQL data type | Storage requirements |
---|---|---|
attempt_id |
CHAR (or UUID) | same as for attempt_id of the attempts table |
section |
CHAR | variable length |
saved_at |
DOUBLE PRECISION | seconds since 1970-01-01 00:00 UTC (UNIX timestamp) |
section_data |
CHAR | variable length (must be able to hold a serialized R object of arbitrary size) |
This definition does not have a natural candidate for a primary key, so it is up to you and your database system what kind of primary key you want to use.
The attempt_id
field is a reference to the attempt stored in the attempts table. The table is mainly queried for a given attempt_id
and section
. Therefore, it makes sense to add an index (not unique!) spanning these two fields to speed up the queries.
An example table definition for PostgreSQL may look like:
CREATE TABLE section_data (
id serial PRIMARY KEY,
attempt_id uuid NOT NULL REFERENCES attempts (attempt_id)
ON DELETE CASCADE ON UPDATE CASCADE,
section character varying NOT NULL,
saved_at double precision NOT NULL,
section_data text
);
CREATE INDEX section_data_index ON section_data (attempt_id, section);
User IDs can often be used for identifying individuals and in turn their personal information. In many jurisdictions you must ensure that this information is properly protected. If you don’t have a database with transparent encryption, it is sensible to store a hashed variant of the user id instead of the actual value. The dbi_storage_provider()
by default hashes the user id using digest::hmac(key, user_id, algo = "sha256")
where key
is by default the string {exam_id}_{exam_version}
. While this ensures the data cannot be traced back to the actual user id by unauthorized third parties, it also prevents you from associating the data (in particular the grading information) with the actual user.
Currently, it is your responsibility to match the hashed user id’s back to the actual users by computing the HMAC of the known user ids and comparing to the hashed user ids in the grading data.
If you don’t have access to a database server and want to test an examinr exam, an ephemeral in-memory SQLite database is very useful. This database only exists as long as the R server process lives. It can only be used by a single process at a time and as such is not suited for actual exams. For this example you need the RSQLite package.
In the server-start code you have to set up the database and configure the storage provider:
#! context="server-start"
library(RSQLite)
# Create an ephemeral in-memory RSQLite database
db_con <- dbConnect(SQLite(), ":memory:")
# Create the attempts table (SQLite doesn't have a UUID type)
dbExecute(db_con, 'CREATE TABLE attempts (
attempt_id varchar(36) PRIMARY KEY,
user_id varchar(64) NOT NULL,
exam_id varchar(64) NOT NULL,
exam_version varchar(64) NOT NULL,
user_obj text NOT NULL,
seed integer NOT NULL,
started_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
finished_at timestamp,
points text
)')
# Create an index on the attempts table
dbExecute(db_con, 'CREATE INDEX attempts_index ON attempts (user_id, exam_id, exam_version)')
# Create the section data table
dbExecute(db_con, 'CREATE TABLE section_data (
id integer PRIMARY KEY,
attempt_id varchar(36) NOT NULL,
section varchar(64) NOT NULL,
saved_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
section_data text
)')
# Create an index on the section data table
dbExecute(db_con, 'CREATE INDEX section_data_index ON section_data (attempt_id, section)')
# Disconnect from the DB when shiny stops (this also destroys the ephemeral database!)
shiny::onStop(function () { dbDisconnect(db_con) })
# Use this database as storage for the exam
exam_config(storage_provider = dbi_storage_provider(db_con, 'attempts', 'section_data'))
If you have access to a PostgreSQL database it can be used via the RPostgres package. For easy handling of dropped connections, using a pool of database connections via the pool package is generally recommended. First, the database needs to be set up and the tables created according to the table definitions above. The pool is then created in a server-start context and passed on to dbi_storage_provider()
.
#! context="server-start"
library(pool)
# Create a pool of connections to your PostgreSQL database
db_pool <- dbPool(drv = RPostgres::Postgres(),
dbname = "{database name}",
host = "{database host}",
username = "{username}",
password = "{password}",
minSize = 1,
maxSize = 2)
# Close all connections in the pool shiny stops
shiny::onStop(function () { poolClose(db_pool) })
# Use this database as storage for the exam
exam_config(storage_provider = dbi_storage_provider(db_pool, 'attempts', 'section_data'))
Note that you won’t need a large pool of connections as each Shiny process can only use one connection at a time anyways. The pool is mostly useful for automatically handling dropped connections.
If the DBI-backed storage provider does not suit your needs, you can write your own storage provider and configure it via exam_config(storage_provider=)
A storage provider is a list of functions invoked by examinr to read and write all exam data.
The list of functions must include all of the following:
create_attempt(user, exam_id, exam_version, seed, started_at, ...)
This function is invoked when a user tries to start a new attempt. The function must return a unique identifier for the attempt, or FALSE
in case of an error.
Argument | Description |
---|---|
user |
user object as returned by the authentication provider |
exam_id |
exam identifier (scalar character) |
exam_version |
version string associated with the exam (scalar character) |
seed |
integer used to seed the attempt (scalar integer) |
started_at |
the date-time the attempt was started, including timezone (object of type POSIXct ) |
... |
currently not used |
finish_attempt(attempt_id, finished_at, ...)
This function should mark an attempt as “finished” and is invoked when a user finishes an attempt (or the time elapsed). The function should return TRUE
if the data is saved successfully and FALSE
in case of an error.
Argument | Description |
---|---|
attempt_id |
the attempt identifier (as returned by create_attempt() ) |
finished_at |
the date-time the attempt was finished, including timezone (object of type POSIXct ) |
... |
currently not used |
grade_attempt(attempt_id, points, ...)
Grade the given attempt, assigning the given points
object to the attempt. The function should return TRUE
if the points are saved successfully and FALSE
in case of an error.
Argument | Description |
---|---|
attempt_id |
the attempt identifier (as returned by create_attempt() ) |
points |
an R list object which holds all grading information for the attempt |
... |
currently not used |
get_attempts(user, exam_id, exam_version, ...)
Query all attempts filtered by the given arguments. The argument user
may be NULL
, in which case the function should return the attempts for all users. The function must return a list of attempt objects (of the form described below), or NULL
in case of an error.
Argument | Description |
---|---|
user |
user object as returned by the authentication provider |
exam_id |
exam identifier (scalar character) |
exam_version |
version string associated with the exam (scalar character) |
... |
currently not used |
The function must return a list of attempt objects. An attempt object is a list containing the following information
Element | Description |
---|---|
id |
attempt identifier |
user |
user object associated with this attempt (as given to create_attempt() ) |
started_at |
time (as POSIXct object in the system’s timezone) the attempt was started |
finished_at |
time (as POSIXct object in the system’s timezone) the attempt was finished (may be NULL if the attempt is not yet finished) |
points |
R list object as given to grade_attempt() (may be NULL if the attempt is not yet graded) |
save_section_data(attempt_id, section, section_data, ...)
Save section data for the given attempt. The function should return TRUE
if the data is saved successfully and FALSE
in case of an error.
Argument | Description |
---|---|
attempt_id |
the attempt identifier (as returned by create_attempt() ) |
section |
section identifier |
section_data |
an R list object with section-specific data |
... |
currently not used |
get_section_data(attempt_id, section, ...)
Query the most recently saved section data for the given attempt. If section
is NULL
, the function should not filter by section, but return the latest data for all sections. The function must return a list of section data objects (of the form described below), or NULL
in case of an error.
Argument | Description |
---|---|
attempt_id |
the attempt identifier (as returned by create_attempt() ) |
section |
section identifier |
... |
currently not used |
The function must return a list of section data objects. A section data object is a list containing the following information.
Element | Description |
---|---|
section |
section identifier |
timestamp |
time (as POSIXct object in the system’s timezone) when the data was saved |
section_data |
an R list object with section-specific data, exactly as given to save_section_data()
|
get_last_section(attempt_id, ...)
Get the identifier of the section for which the most recent data is available (i.e., the section the attempt was showing last). The function should return a character string with the section identifier of the section for which data was saved most recently. In case of an error, or if there is no data available for the given filter, the function should return NULL
.
Argument | Description |
---|---|
attempt_id |
the attempt identifier (as returned by create_attempt() ) |
... |
currently not used |