UNC Libraries Utilize Machine Learning To Uncover Racist Laws In South’s History

Civil rights demonstrators stand in front of Chapel Hill's Long Meadow Dairy Store in February, 1960. This photo is #P0033 of the Roland Giduz Photographic Collection, North Carolina Collection Photographic Archives, The Wilson Library, UNC-Chapel Hill.

Machine learning doesn’t just plunge us headfirst into what the future could hold. It can also offer greater insight into the past.

UNC Libraries made this discovery when they began using machine learning to uncover racist laws in North Carolina’s history, from Reconstruction through the start of the Civil Rights Movement, as part of the On the Books project.

In 2019, On the Books: Jim Crow and Algorithms of Resistance began when a social studies teacher asked UNC librarian Sarah Carrier whether a comprehensive list of all Jim Crow laws in North Carolina existed. At the time, the most recent publication on the subject was Pauli Murray’s States’ Laws on Race and Color, from the 1950s.

Since then, very little research had attempted to compile a truly comprehensive directory of these laws. So a UNC Libraries team, led by co-principal investigators Amanda Henley and Matt Jansen, identified the relevant volumes of laws and, with the help of historical and legal scholars, trained an algorithm to find racist language in them. Using the algorithm, they found nearly 2,000 Jim Crow laws passed in North Carolina from 1866 to 1967.

“We think that it’s an important project because we believe that understanding the past is really critical in shaping a more equitable and inclusive future,” Henley said. “We think it’s really important that people understand the pervasiveness of the Jim Crow laws, and also when they were enacted, to better understand the inequities that still exist.”

Jansen said it remained vital to keep the materials accessible to K-12 students, not just legal scholars and university students, since the teacher who originally posed the question taught at a primary school. At the same time, On the Books drew on the expertise of disciplines across UNC, including the Law Library.

Said Jansen, “It really speaks to the age-old idea of the university as people who have these different areas of expertise coming together in one place to collaborate on things.”

Mellon Foundation grant

Last month, UNC Libraries received a $400,000 grant from the Andrew W. Mellon Foundation to extend their work to two more states. (The selection of those two states is ongoing.) This will help researchers better understand how racist legislative practices compared across states, Henley and Jansen said.

“We need people in-state who know a lot about the history of whichever state they’re coming from,” Jansen said, “and have all those components of the legal expertise, the technical expertise, that kind of stuff, to really suss through all the different little challenges and local oddities that will definitely arise in this type of project.”

The grant will also fund research and teaching fellowships built on the project’s techniques, enabling more researchers to apply the project’s outputs to questions about North Carolina’s legal history, Jim Crow and more.

Research fellows will pursue their own projects incorporating On the Books, while teaching fellows will develop instruction modules from the materials.

Jansen said that through the research process, they found many laws at the edges of traditional definitions of Jim Crow law, including laws targeting Native Americans. Their list is still not entirely comprehensive, but it adds historical depth to Murray’s work from the 1950s.

“None of this stuff is perfect,” Jansen said. “It’s all better, so I think everything will continue to get better through all these various projects this year. We’ll better understand what’s going on. We’ll better understand the limitations of what we’ve done so far and what the paths forward might be. I think that’s exciting.”

Surprisingly large volume of laws

Another discovery, Jansen said, is that many counties passed laws with similar language in an effort to copy the Jim Crow regulations of their neighbors. Both PIs said they were surprised by the sheer volume of Jim Crow laws that proliferated in North Carolina. The result was a complex structure that included the laws that initiated segregation as well as the multitudes created to uphold and administer the practice, Henley said.

“The number of laws that had to be passed to uphold all this stuff over time was a really big part of what we found,” Jansen added.

While the scope of this project was specific to North Carolina’s Jim Crow-era laws, libraries are interested in using machine learning to uncover more in the collections they already hold. After all, there’s a limit to how much material individuals can go through and painstakingly catalog.

While a scholar’s expertise can be integral to a research project, humans are not well suited to analyzing large volumes of data, Jansen said. Machine learning, by contrast, can reveal patterns across vast amounts of data.

“We wouldn’t be able to identify these laws without the close understanding and the close learning that the historians bring to the table,” Jansen said. “But it is scaling that up and looking for any surprising trends or differences at scale that we can really only do through a machine learning or data-driven type of approach.”