The Datafication of Student Life and the Consequences for Student Data Privacy

Author: Kyle M. L. Jones (MLIS, PhD), Indiana University-Indianapolis (IUPUI)

The COVID-19 pandemic changed American higher education in more ways than many people realize: beyond forcing schools to transition overnight to fully online learning, the health crisis has indirectly fueled institutions’ desire to datafy students in order to track, measure, and intervene in their lives. Higher education institutions now collect enormous amounts of student data by tracking students’ performance and behaviors through learning management systems, learning analytics systems, keystrokes and clicks, radio frequency identification, and card swipes at locations across campus. How do institutions use all this data, and what are the implications for student data privacy? Are the technologies as effective as institutions claim? This blog post explores these questions and calls for higher education institutions to better protect students, their individuality, and their power to make the best choices for their education and lives.

When the pandemic prevented faculty and students from accessing their common campus haunts, including offices and classrooms, they relied on technologies to fill their information, communication, and education needs. Higher education was arguably better prepared than other organizations and institutions for immersive online education. For decades, universities and colleges have invested significant resources in networking infrastructures and applications to support constant communication and information sharing. Educational technologies, such as learning management systems (LMSs) licensed by Instructure (Canvas) and Blackboard, and productivity tools such as Microsoft’s Office365 are ubiquitous in higher education. So, while the transition to online education was difficult for some in pedagogical terms, the technological ability to do so was not: higher education was prepared.

Datafication Explained: How Institutions Quantify Students

The same technological ubiquity that has helped higher education succeed during the pandemic has also fueled institutions’ growing desire to datafy students for the purposes of observing, measuring, and intervening in their lives. These practices are not new to universities and colleges, which have long held that creating education records about students supports administrative record keeping and instruction. But data and informational conditions today are very different from those of just 10 to 20 years ago: it is now possible to track, capture, and analyze a student’s online information behaviors, communications, and system actions (e.g., clicks, keystrokes, facial movements), not to mention their granular academic history.

In non-pandemic times, when students are immersed in campus life, myriad sensors (e.g., WiFi, RFID) and systems (e.g., building and transactional card swipes) associated with a specific location also make it possible to analyze a student’s physical movements. These data points enable institutions to track where a student has been and with whom that student has associated, by examining similar patterns in the data.

How are institutions and the educational technology (edtech) companies they rely on using their growing stores of data? The University of Arizona’s “Smart Campus research” aims to “repurpose the data already being captured from student ID cards to identify those most at risk for not returning after their first year of college.” The university used student ID card data to track and measure social interactions through time-stamp and geolocation metadata. The analysis enabled the university to map student interactions and their social networks, all for the purpose of predicting a student’s likelihood of being retained.
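To make the mechanics concrete, the following is a minimal sketch of how time-stamp and location metadata from card swipes could be turned into inferred social ties. It is not the University of Arizona’s method, which is not described here; the records are synthetic, and the field names, the one-minute window, and the `co_swipe_counts` helper are all illustrative assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical, synthetic swipe records: (student_id, location, timestamp).
SWIPES = [
    ("s1", "library", datetime(2020, 2, 3, 9, 0, 0)),
    ("s2", "library", datetime(2020, 2, 3, 9, 0, 40)),
    ("s3", "library", datetime(2020, 2, 3, 9, 30, 0)),
    ("s1", "dining_hall", datetime(2020, 2, 3, 12, 0, 0)),
    ("s2", "dining_hall", datetime(2020, 2, 3, 12, 0, 30)),
    ("s3", "gym", datetime(2020, 2, 3, 17, 0, 0)),
]


def co_swipe_counts(swipes, window=timedelta(minutes=1)):
    """Count how often each pair of students swipes at the same
    location within `window` of each other (an assumed threshold)."""
    # Group swipe events by location.
    by_location = {}
    for student, location, ts in swipes:
        by_location.setdefault(location, []).append((ts, student))

    counts = Counter()
    for events in by_location.values():
        events.sort()  # chronological order within each location
        for (t1, a), (t2, b) in combinations(events, 2):
            if a != b and t2 - t1 <= window:
                # Store each pair in a canonical (sorted) order.
                counts[tuple(sorted((a, b)))] += 1
    return counts


PAIR_COUNTS = co_swipe_counts(SWIPES)
for pair, n in PAIR_COUNTS.items():
    print(pair, n)
```

Even this toy version shows why the practice raises privacy questions: a handful of routine swipes is enough to suggest who spends time with whom, without either student doing anything beyond entering a building.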

Edtech has also invested heavily in descriptive and predictive analytic capabilities, sometimes referred to as learning analytics. Common LMSs often record and share descriptive statistics with instructors concerning which pages and resources (e.g., PDFs, quizzes) a student has clicked on; some instructors use the data to create visualizations that make students aware of their engagement levels in comparison to their peers in a course. Other companies use their access to real-time system data, and to the students who create that data, to run experiments. Pearson gained attention for its use of social-psychological interventions on over 9,000 students at 165 institutions to test “whether students who received the messages attempted and completed more problems than their counterparts at other institutions.” While some characterize Pearson’s efforts as simple A/B testing, often used to examine interface tweaks on websites and applications, Pearson conducted the interventions based on its own ethical review, without input from any of the 165 institutions and without students’ consent.

Is Datafication Worth It? Privacy Considerations

The higher education data ecosystem, and the paths it opens for universities, edtech, and other third-party actors to use it, raise significant questions about the effects on students’ privacy. The datafication of student life may lead institutions to improve student learning as well as retention and graduation rates. Maybe studying student life at a granular, identifiable level, or even at broader subgroup levels, improves institutional decision making and an institution’s financial situation. But what are the costs of these gains? The examples above, many of which I have more comprehensively summarized and analyzed elsewhere, point to clear issues.

Chief among them is privacy. It is not normative for institutions, or the companies they contract with for services, to expose a student’s life, regardless of the purposes or justifications. Yet universities and colleges continue to push the point that they can do so, and are often justified in doing so, if it improves student success. But student success is a broad term. Whose success matters and warrants the intrusion? Often an analytic, especially a predictive measure, requires historical data, meaning that one student’s life is made analyzable only for another student downstream to benefit months or years later. And how do institutions define success? Student success may be learning gains, but education institutions often construe it as retention and graduation, which are just proxies for learning.

When institutions datafy student life for some purpose other than to directly help students, they treat students as objects, not human beings with unique interests, goals, and autonomy over their lives. Institutions and others can use data and related artifacts to guide, nudge, and even manipulate student choices with an invisible hand, since students are rarely aware of the full reach of an institution’s data infrastructure. Students trust that institutions will protect identifiable data and information, but that trust is misplaced if institutions 1) are not transparent about their data practices and 2) do not enable students to make their own privacy choices to the greatest extent possible. Student privacy policies are often difficult for students to locate and understand.

Moreover, institutions need to justify their analytic practices. They should provide an overview of the intention behind each practice and explain the empirical support for that justification. If the practice is experimental, institutions must communicate that they have no clear evidence that the practice will produce benefits for students. If science supports the practice, institutions should summarize that science and make it available for students to review.

Many other policy and practice recommendations are relevant, as the literature outlines ethics codes, philosophical arguments, and useful principles for practice. The key point here is that the datafication of student life and the privacy problems it creates are justified only if higher education institutions protect students and put their interests first, treat students as humans, and respect their choices about their lives.
