What Matters the Most with Education Data
Education agencies nationwide have collectively spent hundreds of millions of dollars over the last decade building state longitudinal data systems (SLDS) that collect the information dispersed across countless source systems throughout their enterprise and integrate it into a consolidated education data warehouse. The goals of these systems include streamlining and simplifying federal and state reporting, but increasingly the integrated data warehouse is sought after for use by educators on the ground. SLDS data can power data-driven education dashboards that offer teachers up-to-date information about their classrooms, insight into how different groups of students are performing, and a view of which areas of focus might improve each individual student’s outcome. Guidance counselors and school principals can access charts and visualizations of discipline and behavioral trends, truancy, and test performance to identify students at risk of dropping out, so they can intervene more effectively to help turn those students around.
The success of SLDS initiatives hinges on many factors, among them strong sponsorship and project management, effective stakeholder engagement, robust system performance, comprehensive security, and data governance plans. I would argue, however, that nothing matters more than data quality.
Low-quality, inaccurate data can erode ROI quickly and, in some cases, irretrievably. Poor data quality results in misleading analysis. It can lead educators to faulty conclusions and suboptimal decisions. And once stakeholders lose confidence in the accuracy and quality of the data, the system is, perhaps forever, perceived as unreliable. Stakeholders abandon its use, and ROI is destroyed.
There’s No Room for Poor Data Quality in Education
Many industries can afford some level of inaccurate data – a degree of fault tolerance is acceptable. This is often the case in manufacturing and even retail. Education, however, exists to further the improvement of individual minds. There can be no tolerance for faulty data. That the data is mostly accurate brings no comfort to students or their parents. Furthermore, inaccurate data may be challenged, and must ultimately be corrected. This can entail an expensive dispute-resolution process that may include hearings, reviews, judgments, and the time and expense of countless resources, both administrative and technical. So it pays to get the data right before using it.
How, then, are education data quality issues detected and managed? Traditionally, the many data quality rules governing collection and integration of education data are embedded in the extract, transform, and load (ETL) process that integrates the data. That means the data quality rules are hardcoded into the software. This approach is problematic because whenever rules change, or rules must be added or removed, the ETL needs to change too. ETL developers are expensive, and changing business rules that are hardcoded into the ETL requires a full software development life cycle (SDLC) of design, development, and regression testing of that ETL. In the interim the business loses time and money.
Another approach is to find the data quality issues after the data has been loaded, using post-ETL reports designed to expose data quality issues. That approach, however, has its own drawbacks – data quality issues are only identified after the data has already been loaded, and, once identified, erroneous records may need to be deleted and the source data, once corrected, needs to be loaded again.
There is a better way. In my next blog post I’ll discuss a third approach, one eScholar has developed, that allows customers to create, update, and enforce their business rules before data is loaded to the SLDS and without the need for expensive ETL development cycles.
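To make the general idea concrete, here is a minimal sketch of what "rules kept outside the ETL and enforced before load" can look like. It is an illustration only, not eScholar's product or method: the `Rule` class, the three sample rules, and the field names (`student_id`, `grade_level`, `attendance_rate`) are all hypothetical and chosen just for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A single data quality rule, kept as data rather than buried in ETL code."""
    name: str
    field: str
    check: Callable[[object], bool]


# Hypothetical rules for illustration; a real agency would define its own.
RULES = [
    Rule("student_id_present", "student_id",
         lambda v: isinstance(v, str) and v.strip() != ""),
    Rule("grade_level_in_range", "grade_level",
         lambda v: isinstance(v, int) and 0 <= v <= 12),
    Rule("attendance_rate_is_fraction", "attendance_rate",
         lambda v: isinstance(v, (int, float)) and 0.0 <= v <= 1.0),
]


def validate(record, rules=RULES):
    """Return the names of every rule the record fails (empty list if clean)."""
    return [rule.name for rule in rules if not rule.check(record.get(rule.field))]


def partition(records, rules=RULES):
    """Split records into those safe to load and those rejected before load."""
    accepted, rejected = [], []
    for record in records:
        failures = validate(record, rules)
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected
```

Because the rules live in a plain list, adding, removing, or changing one is an edit to that list rather than a change to the load logic itself. Only the `accepted` records would move on to the load step, while each rejected record carries the names of the rules it failed so the source data can be corrected first.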
Join the discussion!
In the meantime, I’d love to hear your take on the challenge of education data quality. How are you addressing those challenges today? What are your pain points, and how have you approached overcoming them? Let us know in the comment section below or on Twitter. I hope you join the discussion – there is more than one way to skin a cat (who does that anyway?), and I look forward to hearing how all of you are tackling the challenges of education data quality!