Exam data needs to be stored somewhere. Examinr does not make any
assumption on how or where your data will be stored. You must set the
storage provider in the exam configuration via
exam_config(storage_provider=)
.
Please note that examinr stores sensitive information through the
storage provider. It is your responsibility to know and understand the
privacy legislation governing the data in your jurisdiction. The
dbi_storage_provider()
provides a simple workaround by hashing the user id, but this adds another
step for you when exporting grades (matching user ids to hashed user
ids).
It is recommended that you securely store the data using encryption, but this generally requires more knowledge about the database system in use.
With dbi_storage_provider()
, examinr supports DBI-backed storage out of the box. You
can leverage this storage provider by calling
exam_config(storage_provider = dbi_storage_provider(...))
.
You are responsible for the initial setup of the database and
establishing a connection with the database, which is passed on to
dbi_storage_provider()
. In addition to the connection
object, the dbi_storage_provider()
needs to be told the
name of the database tables for storing attempts data and section data
(i.e., users’ answers).
The built-in storage provider assumes the following two database tables are read- and writable:
Field | SQL data type | Storage requirements |
---|---|---|
attempt_id |
CHAR (or UUID) | 36 characters (on database systems which support a UUID type, it should be stored as such) |
user_id |
CHAR | either 64 characters (if hashing is enabled), or variable length (depending on the authentication provider) |
exam_id |
CHAR | variable length |
exam_version |
CHAR | variable length |
started_at |
DOUBLE PRECISION | seconds since 1970-01-01 00:00 UTC (UNIX timestamp) |
seed |
INTEGER | a signed integer of 4 bytes |
finished_at |
DOUBLE PRECISION | seconds since 1970-01-01 00:00 UTC (UNIX timestamp) |
user_obj |
CHAR | variable length (must be able to hold a serialized R object of arbitrary size) |
points |
CHAR | variable length (must be able to hold a serialized R object of arbitrary size) |
Some database systems (e.g., PostgreSQL) have native support for a
UUID type which should be preferred over character data for the
attempts_id
field. The attempt_id
field should
be unique and as such can be used as primary key on the table. The table
is mainly queried for a given user_id
, exam_id
and exam_version
. Therefore, it makes sense to add an index
(not unique!) spanning these 3 fields to speed up the
queries.
Both user_obj
and points
are serialized R
objects and can thus be of arbitrary size. The database should not make
any assumption on the size of these fields. Internally, examinr
serializes the R objects with serialize()
and stores the
base64 encoded character string.
An example table definition for PostgreSQL may look like
CREATE TABLE attempts (
attempt_id uuid PRIMARY KEY,
user_id character varying NOT NULL,
exam_id character varying NOT NULL,
exam_version character varying NOT NULL,
started_at double precision NOT NULL,
seed integer NOT NULL,
user_obj text NOT NULL,
finished_at double precision,
points text
);
CREATE INDEX attempts_index ON attempts (user_id, exam_id, exam_version);
Field | SQL data type | Storage requirements |
---|---|---|
attempt_id |
CHAR (or UUID) | same as for attempt_id of the attempts
table |
section |
CHAR | variable length |
saved_at |
DOUBLE PRECISION | seconds since 1970-01-01 00:00 UTC (UNIX timestamp) |
section_data |
CHAR | variable length (must be able to hold a serialized R object of arbitrary size) |
This definition does not have a natural candidate for a primary key, so it is up to you and your database system what kind of primary key you want to use.
The attempt_id
field is a reference to the attempt
stored in the attempts table. The table is mainly queried for a given
attempt_id
and section
. Therefore, it makes
sense to add an index (not unique!) spanning these two fields
to speed up the queries.
An example table definition for PostgreSQL may look like:
CREATE TABLE section_data (
id serial PRIMARY KEY,
attempt_id uuid NOT NULL REFERENCES attempts (attempt_id)
ON DELETE CASCADE ON UPDATE CASCADE,
section character varying NOT NULL,
saved_at double precision NOT NULL,
section_data text
);
CREATE INDEX section_data_index ON section_data (attempt_id, section);
User IDs can often be used for identifying individuals and in turn
their personal information. In many jurisdictions you must ensure that
this information is properly protected. If you don’t have a database
with transparent encryption, it is sensible to store a hashed variant of
the user id instead of the actual value. The
dbi_storage_provider()
by default hashes the user id using
digest::hmac(key, user_id, algo = "sha256")
where
key
is by default the string
{exam_id}_{exam_version}
. While this ensures the data
cannot be traced back to the actual user id by unauthorized third
parties, it also prevents you from associating the data (in particular
the grading information) with the actual user.
Currently, it is your responsibility to match the hashed user id’s back to the actual users by computing the HMAC of the known user ids and comparing to the hashed user ids in the grading data.
If you don’t have access to a database server and want to test an examinr exam, an ephemeral in-memory SQLite database is very useful. This database only exists as long as the R server process lives. It can only be used by a single process at a time and as such is not suited for actual exams. For this example you need the RSQLite package.
In the server-start code you have to set up the database and configure the storage provider:
#! context="server-start"
library(RSQLite)
# Create an ephemeral in-memory RSQLite database
db_con <- dbConnect(SQLite(), ":memory:")
# Create the attempts table (SQLite doesn't have a UUID type)
dbExecute(db_con, 'CREATE TABLE attempts (
attempt_id varchar(36) PRIMARY KEY,
user_id varchar(64) NOT NULL,
exam_id varchar(64) NOT NULL,
exam_version varchar(64) NOT NULL,
user_obj text NOT NULL,
seed integer NOT NULL,
started_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
finished_at timestamp,
points text
)')
# Create an index on the attempts table
dbExecute(db_con, 'CREATE INDEX attempts_index ON attempts (user_id, exam_id, exam_version)')
# Create the section data table
dbExecute(db_con, 'CREATE TABLE section_data (
id integer PRIMARY KEY,
attempt_id varchar(36) NOT NULL,
section varchar(64) NOT NULL,
saved_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
section_data text
)')
# Create an index on the section data table
dbExecute(db_con, 'CREATE INDEX section_data_index ON section_data (attempt_id, section)')
# Disconnect from the DB when shiny stops (this also destroys the ephemeral database!)
shiny::onStop(function () { dbDisconnect(db_con) })
# Use this database as storage for the exam
exam_config(storage_provider = dbi_storage_provider(db_con, 'attempts', 'section_data'))
If you have access to a PostgreSQL database it can be used via the
RPostgres package. For easy handling of dropped connections,
using a pool of database connections via the pool package is
generally recommended. First, the database needs to be set up and the
tables created according to the table definitions above. The pool is
then created in a server-start context and passed on to
dbi_storage_provider()
.
#! context="server-start"
library(pool)
# Create a pool of connections to your PostgreSQL database
db_pool <- dbPool(drv = RPostgres::Postgres(),
dbname = "{database name}",
host = "{database host}",
username = "{username}",
password = "{password}",
minSize = 1,
maxSize = 2)
# Close all connections in the pool shiny stops
shiny::onStop(function () { poolClose(db_pool) })
# Use this database as storage for the exam
exam_config(storage_provider = dbi_storage_provider(db_pool, 'attempts', 'section_data'))
Note that you won’t need a large pool of connections as each Shiny process can only use one connection at a time anyways. The pool is mostly useful for automatically handling dropped connections.
If the DBI-backed storage provider does not suit your needs, you can
write your own storage provider and configure it via
exam_config(storage_provider=)
A storage provider is a list
of functions invoked by examinr to read and write all exam data.
The list of functions must include all of the following:
create_attempt(user, exam_id, exam_version, seed, started_at, ...)
This function is invoked when a user tries to start a new attempt.
The function must return a unique identifier for the attempt, or
FALSE
in case of an error.
Argument | Description |
---|---|
user |
user object as returned by the authentication provider |
exam_id |
exam identifier (scalar character) |
exam_version |
version string associated with the exam (scalar character) |
seed |
integer used to seed the attempt (scalar integer) |
started_at |
the date-time the attempt was started, including
timezone (object of type POSIXct ) |
... |
currently not used |
finish_attempt(attempt_id, finished_at, ...)
This function should mark an attempt as “finished” and is invoked
when a user finishes an attempt (or the time elapsed). The function
should return TRUE
if the data is saved successfully and
FALSE
in case of an error.
Argument | Description |
---|---|
attempt_id |
the attempt identifier (as returned by
create_attempt() ) |
finished_at |
the date-time the attempt was finished, including
timezone (object of type POSIXct ) |
... |
currently not used |
grade_attempt(attempt_id, points, ...)
Grade the given attempt, assigning the given points
object to the attempt. The function should return TRUE
if
the points are saved successfully and FALSE
in case of an
error.
Argument | Description |
---|---|
attempt_id |
the attempt identifier (as returned by
create_attempt() ) |
points |
an R list object which holds all grading information for the attempt |
... |
currently not used |
get_attempts(user, exam_id, exam_version, ...)
Query all attempts filtered by the given arguments. The argument
user
may be NULL
, in which case the function
should return the attempts for all users. The function must
return a list of attempt objects (of the form described below), or
NULL
in case of an error.
Argument | Description |
---|---|
user |
user object as returned by the authentication provider |
exam_id |
exam identifier (scalar character) |
exam_version |
version string associated with the exam (scalar character) |
... |
currently not used |
The function must return a list of attempt objects. An attempt object is a list containing the following information
Element | Description |
---|---|
id |
attempt identifier |
user |
user object associated with this attempt (as given to
create_attempt() ) |
started_at |
time (as POSIXct object in the
system’s timezone) the attempt was started |
finished_at |
time (as POSIXct object in the
system’s timezone) the attempt was finished (may be
NULL if the attempt is not yet finished) |
points |
R list object as given to grade_attempt()
(may be NULL if the attempt is not yet graded) |
save_section_data(attempt_id, section, section_data, ...)
Save section data for the given attempt. The function should return
TRUE
if the data is saved successfully and
FALSE
in case of an error.
Argument | Description |
---|---|
attempt_id |
the attempt identifier (as returned by
create_attempt() ) |
section |
section identifier |
section_data |
an R list object with section-specific data |
... |
currently not used |
get_section_data(attempt_id, section, ...)
Query the most recently saved section data for the given
attempt. If section
is NULL
, the function
should not filter by section, but return the latest data for all
sections. The function must return a list of section data objects (of
the form described below), or NULL
in case of an error.
Argument | Description |
---|---|
attempt_id |
the attempt identifier (as returned by
create_attempt() ) |
section |
section identifier |
... |
currently not used |
The function must return a list of section data objects. A section data object is a list containing the following information.
Element | Description |
---|---|
section |
section identifier |
timestamp |
time (as POSIXct object in the
system’s timezone) when the data was saved |
section_data |
an R list object with section-specific data, exactly as
given to save_section_data()
|
get_last_section(attempt_id, ...)
Get the identifier of the section for which the most recent data is
available (i.e., the section the attempt was showing last). The function
should return a character string with the section identifier of the
section for which data was saved most recently. In case of an error, or
if there is no data available for the given filter, the function should
return NULL
.
Argument | Description |
---|---|
attempt_id |
the attempt identifier (as returned by
create_attempt() ) |
... |
currently not used |