Controlled outputs, full data: A privacy-protecting infrastructure for MOOC data
Abstract
Learning analytics research presents challenges for researchers embracing the principles of open science. Protecting student privacy is paramount, but progress in increasing scientific understanding and improving educational outcomes depends upon open, scalable and replicable research. Findings have repeatedly been shown to be contextually dependent on personal and demographic variables, so how can we use these data in a manner that is ethical and secure for all involved? This paper presents ongoing work on the MOOC Replication Framework (MORF), a big data repository and analysis environment for Massive Open Online Courses (MOOCs). We discuss MORF's approach to protecting student privacy, which allows researchers to use data without having direct access to it. Through an open API, documentation and tightly controlled outputs, this framework gives researchers the opportunity to perform secure, scalable research and facilitates collaboration, replication, and novel research. We also highlight ways in which MORF offers a template for addressing privacy and security issues in the age of big data in education, and identify key challenges still to be tackled.
Practitioner notes
What is already known about this topic
- Personally Identifiable Information (PII) has many valid and important research uses in education.
- The ability to replicate or build on analyses is important to modern educational research, and is usually enabled through sharing data.
- To protect student privacy, shared data generally exclude PII.
- MOOCs present a rich data source for education researchers to better understand online learning.
What this paper adds
- The MOOC Replication Framework (MORF) 2.1 is a new infrastructure that enables researchers to conduct analyses on student data without having direct access to the data, thus protecting student privacy.
- A detailed description of the MORF 2.1 structure and workflow.
Implications for practice and/or policy
- MORF 2.1 is available for use by practitioners and for research with policy implications.
- The infrastructure and approach in MORF could be applied to other types of educational data.
CONFLICT OF INTEREST
No conflict of interest (financial or non-financial) has been declared by the authors.
Open Research
DATA AVAILABILITY STATEMENT
The data available in the MOOC Replication Framework can be used by external researchers through the framework, under a data use agreement with the university that oversees the data being accessed and with evidence that the research has been approved by an Institutional Review Board (or a comparable ethical oversight organization, for example when non-US researchers use MORF with non-US data).