Kay Dickersin knew she was leaping to the front lines of scholarly publication when she joined The Online Journal of Current Clinical Trials. Scientific print-publishing was—and still is—slow and cumbersome, and reading its results sometimes required researchers to go to the library. But as associate editor at this electronic peer-reviewed journal—one of the very first, launched in the summer of 1992—Dickersin was poised to help bring scientists into the new digital age.
Dickersin, an epidemiologist, helped researchers publish their work there. But the OJCCT was a bit ahead of its time. The journal was sold in 1994 to a publisher that eventually became part of Taylor & Francis and that stopped the e-presses just a couple of years later. After that, its papers (reports, reviews, and meta-analyses of clinical trials) all disappeared. Dickersin wasn’t just sad to lose her editing gig: She was dismayed that the scientific community was losing those archives. “One of my important studies was in there,” she says, “and no one could get it.”
Couldn’t, that is, until Dickersin decided to go spelunking for science.
For more than a decade, Dickersin’s paper was missing along with about 80 others. Sometimes, the ex-editors would try to find out who had the rights to the articles, and whether they could just take copies and put them on their own website. “We don’t want to do that,” they’d always conclude. “We don’t want to get in trouble.” Finally, Dickersin went to the librarians at Johns Hopkins University, where she is a professor, for help, and that’s how she found Portico.
Portico is like a Wayback Machine for scholarly publications. The digital preservation service ingests, meta-tags, preserves, manages, and updates content for publishers and libraries, and then provides access to those archives. The company soon signed on to the project and got permission from Taylor & Francis to make the future archives open-access.
Then came the trial of actually getting the articles. Edward Huth, the journal’s onetime editor-in-chief, had a CD-ROM with some. Dickersin and librarian Mariyam Thohira searched catalogs for article titles and locations, and requested some scattered copies via interlibrary loan. Dickersin scanned in her own files.
A copy of her important paper, a report on publication bias, appeared in the records they uncovered. In the article, Dickersin had looked at 293 clinical trials funded by the National Institutes of Health to find out whether the trials’ characteristics and findings affected their publication. See, scientists tend to publish positive findings and leave negative or null results in their desk drawers (or on their desktops). It’s a well-understood gap today, but when Dickersin published in 1993, “reproducibility crisis” wasn’t yet a buzzword. Her research was already there, though: While 93 percent of the completed clinical trials did publish results, most of the 7 percent that stayed mum had negative conclusions.
She and Thohira placed this paper, along with the rest of the spoils, into a Dropbox folder that they shared with Portico. They managed to turn up more than 50 of the articles, but a subset of papers was—is—missing.
Portico has asked the medical community to dig around and send along the papers they have that perhaps no one else does. “This is a good test,” says Kate Wittenberg, Portico’s managing director. “It’s an experiment for us. I don’t think we’ve ever turned to crowd-sourcing.” In the quest to create a universally accessible online archive, individual humans’ downloads and printouts, hoarded offline, are the only things that can complete the catalog.
Whether it’s cancer screenings or supernova specifics or fossil interpretation, preserving that history is both important and getting harder. Tech changes fast; data files change fast; graphics packages change fast; software changes fast. At Portico, preservationists are trying to forecast what publishers will be doing in a decade, and how to keep the datasets and the analysis algorithms safe, all while staying in the background. “If we’re doing our job very well, no one notices us,” says Wittenberg.
Portico is not the only player in this invisible game. Leslie Johnston, director of digital preservation at the National Archives and Records Administration, is the person in charge of figuring out how to cache and maintain digital government and historical records for the US in perpetuity—emails, census and topographic maps, photos from the shuttle or old National Science Foundation events, aerial images of Earth, and the datasets from federal organizations. “There are a lot of federal agencies that do research,” says Johnston. And the government funds a lot of research. The federal archivists try to make sure that governmental data and software of lasting import last.
Johnston got her start training as an archaeologist (go figure), and in the late ’90s, she worked for the Harvard Design School as head of instructional technology and library systems. There, faculty members sometimes wanted access to files from a previous course. “The catch was that we hadn’t kept any of it,” says Johnston. “Every term, we overwrote what was on the server.” It was kind of a volta for her, philosophically and professionally. “It suddenly dawned on me that what we had was the history of school, and we’d been throwing it away,” she says.
At the National Archives, Johnston’s team makes sure files are uncorrupted and then preserves them in their original forms, like keeping the WordPerfect for DOS version of your fifth-grade book report even after you’ve converted it to a .docx. Then, they try to determine what file format is and will stay most accessible, and make another copy (and, as formats change again, different kinds of copies). Add metadata. Index them. Voila (if we elide difficulties and details): preserved. That way, people can find, for example, 2001’s Storm and Unusual Weather Phenomena Data and make something of them—in 2101.
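For the curious, those steps (check that a file is uncorrupted, keep the original next to a more accessible copy, add metadata, index) can be roughed out in a few lines of Python. This is only a minimal sketch under stated assumptions, not a description of the National Archives’ actual tooling: the function names, the record layout, and the choice of SHA-256 checksums are all hypothetical and used here purely for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_uncorrupted(path: Path, expected_digest: str) -> bool:
    """Check that a file still matches the checksum recorded for it earlier."""
    return sha256_of(path) == expected_digest


def make_record(original: Path, access_copy: Path, title: str) -> dict:
    """Describe one preserved item: the original file plus an accessible copy.

    The record layout is hypothetical, chosen only for this example.
    """
    return {
        "title": title,
        "original": {"file": original.name, "sha256": sha256_of(original)},
        "access_copy": {"file": access_copy.name, "sha256": sha256_of(access_copy)},
    }


def build_index(records: list) -> dict:
    """Index records by lowercased title so they can be looked up later."""
    return {record["title"].lower(): record for record in records}
```

Calling `make_record` on the old WordPerfect file and its converted copy, then passing the result to `build_index`, yields a small catalog that can be searched by title. Rerunning `is_uncorrupted` against the stored checksum years later would show whether the bits have silently changed.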
A lot can go on in a hundred years, or a thousand, or however long we keep existing and trying to understand this universe. File formats can become obsolete and unreadable, nuclear bombs can explode, tidal waves can inundate. Johnston thinks about all of it, and how to make sure the records, scientific and otherwise, survive and stay searchable. “My job is to worry about the worst thing that can happen,” she says. Because if or when it does, you want to make sure science doesn’t go the way of the Library of Alexandria.
- Now that almost all scientific journals put their print versions online, the debates about where and how science should appear are different.
- People in the open-access movement think research should be both easily accessible and free.
- Another hot topic in scientific publishing involves preprints: Articles freely posted online before peer review. Biologists preprint quite a bit, but they’re more conflicted about it than astronomers and physicists, who founded the arXiv around the same time the OJCCT came online.