January 2018 – ResearchScapes

Despite the fact that Wikipedia was born almost two decades ago, despite the fact that many libraries (mine included) have cancelled all other print and digital general encyclopedias and use it by preference, despite the fact that an increasing number of academics have actually found interesting uses for it within their classrooms – Wikipedia remains controversial. There are of course questions about bias and accuracy in any crowd-sourced site. But a short look into the history of encyclopedic works should alleviate some fears.

Wikipedia first came into being in 2001. The Internet itself had already grown beyond the “primordial swamp” that Paul Evans Peters called it in 1990 (Discussion at Institute on Collection Development for the Electronic Library, April 29-May 2, 1990), but it was still a place that held a wild mix of legitimate, questionable, and not-so-legitimate sources. Graphical user interfaces were relatively new, search engines were unsophisticated, and there was little consistency in who was making digital materials available or in what they were offering the public.

To complicate things, the wiki platform confused many people in the academic world. Wikipedia was created by what seemed to be a world-wide group of interested readers, readers that might or might not have any recognized authority about what they wrote. This made Wikipedia seem amateurish and intellectually suspect.

To put it very simply, Wikipedia seemed to have little claim to any intellectual authority. The term “crowdsourcing” had not yet been coined; to the serious eye, Wikipedia was based on unvetted volunteerism. It was a kind of “stone soup,” where people were adding, trading off, editing each other, reporting inappropriate posts, and always creating something with no obvious recipe.

Wikipedia’s main competition, of course, was the venerable Encyclopedia Britannica.


Between 1768 and 1771, the first edition of the Encyclopedia Britannica was compiled in Edinburgh and published in three volumes. As one of the earliest English-language encyclopedias, it quickly became an important title in the ever-increasing number of published reference works. It was heavily edited, and articles came to be written and signed by well-known scholars. As the scope of scholarship expanded rapidly, so did the Britannica’s size. When the 11th edition was published in 1910, it had increased to a whopping twenty-nine volumes. With that edition, its publication passed to the United States.

Society had come to look at encyclopedias in two ways. First, they were a convenient way of holding large amounts of information, paper cans to put facts and knowledge in. But an equally significant characteristic was that they were also a way of talking about that knowledge in an authoritative way.

So our crowd-sourced, stone-soup encyclopedia, Wikipedia, was born into a world that, on the surface, already had a hugely historic and effective title dominating the encyclopedic landscape.

But did it really?

The value of a reference work lies in its timeliness, its accuracy, and its authority. By 2001, even Britannica’s conservative editorship had allowed digital publication. But they maintained tight control over authorship and editing, leading, of course, to an issue with timeliness. Wikipedia, although its sourcing and authorship were distributed, was able to add, update, and correct entries very quickly, literally on an hourly basis.

And that leads to the second important aspect of the value of a reference work: accuracy. The founders and serious participants of Wikipedia quickly developed mechanisms by which entered articles could be flagged, corrected, and objected to. Pieces of missing information could be added, explanations could be expanded, and articles could be removed. And although all of that remained the basis for the greatest objections to Wikipedia, the organization and its world-wide community soldiered on. Finally, in 2005, the highly respected journal Nature published an article in which the two titles were put head to head on the question of accuracy. And although Wikipedia was found to have a few more errors in the selected articles, it was determined that both Britannica and Wikipedia had errors. (Nature 438, 900–901 (15 December 2005))

There is also the ever-important question of “authority.” Although Britannica’s reputation had been diminished somewhat when its editorship moved to the United States, that could be regarded as an issue of intellectual snobbery. The editors remained committed to finding the best possible authors for articles. Wikipedia, of course, was dependent on the intellectual efforts of unvetted volunteers.

But, against our belief in authority, we must place cultural and temporal bias. So, in the 11th edition of the Britannica, in the article on “The Negro,” the scholar Thomas Joyce writes “Mentally the negro is inferior to the white.” Clearly such a statement would never appear in the current edition of any decent encyclopedia. But I put it here to suggest that at the time that anything is published, an author and a few editors might not be in a good position to have the cultural distance to see bias.

So what can be our conclusion on Wikipedia?

Crowdsourcing clearly has its dangers, and therefore its detractors. But faith in unseen authority in edited reference works also has its dangers. Both types of sources inevitably reflect cultural biases and, frequently, have factual errors.

How do we teach students to use Wikipedia? We teach it the way we teach them to use any kind of reference work: read entries carefully and critically, and examine them for bias. Use their bibliographies and added links to other materials and collections. Use them as jumping-off points to more scholarly works. Use them (carefully) for a general orientation to a subject. And, of course, never use them as a citable source.

In short, as we all know, thoughtful, analytic reading of any source, at any time, is central to a researcher’s successful process. And don’t forget: the stone soup of fable turned out to be really tasty.

On Wednesday January 10, I had the privilege of presenting the following poster at the annual CTW* Retreat:

Harvesting Gov Docs Locally for Preservation & Discovery. Poster presented at CTW Retreat 10 Jan. 2018.

The chart featured prominently in the center of the poster offers an easy way to understand the nature of the problem. It is copied from James A. Jacobs’ report “Born-Digital U.S. Federal Government Information: Preservation and Access,” and it was re-presented in his October 2017 presentation with James R. Jacobs, “Government Information: Everywhere and Nowhere.”

Scope of the Preservation Challenge. Source: Jacobs, 2014.

The first column represents the number of items distributed by the Government Publishing Office (GPO) to Federal Depository Library Program (FDLP) libraries in 2011 (approx. 10,200 items). The second column represents the total number of items distributed by GPO to FDLP libraries over the program’s entire 200-year history (approx. 2-3 million items). The third column is the number of URLs harvested by the 2008 End of Term crawl (approx. 160 million URLs).

Clearly, the scope of government information produced outside of the GPO and FDLP is very large. So large, in fact, that what is produced online each year makes the entire 200-year history of the Depository Library Program look like a drop in the bucket. This vast array of online government information can be called fugitive. No one knows how much born-digital government information has been created or where it all is.
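To make the comparison concrete, here is a minimal sketch of the arithmetic behind the chart, using only the approximate figures quoted above. The variable names are illustrative, and a harvested URL is not equivalent to a distributed publication, so the ratios give only a rough sense of scale.

```python
# Approximate figures quoted above (Jacobs, 2014); variable names are illustrative.
fdlp_items_2011 = 10_200                      # items distributed to FDLP libraries in 2011
fdlp_items_all_time = (2_000_000, 3_000_000)  # low and high estimates for the full history
eot_2008_urls = 160_000_000                   # URLs harvested by the 2008 End of Term crawl

# How many harvested URLs there are for each item distributed.
ratio_vs_2011 = eot_2008_urls / fdlp_items_2011
ratio_vs_history = tuple(eot_2008_urls / n for n in fdlp_items_all_time)

print(f"2008 crawl vs. 2011 distribution: about {ratio_vs_2011:,.0f} to 1")
print(
    "2008 crawl vs. full FDLP history: between "
    f"{ratio_vs_history[1]:.0f} and {ratio_vs_history[0]:.0f} to 1"
)
```

Even on the higher estimate of three million items, the single 2008 crawl is roughly fifty times larger than everything the program has distributed.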

At Connecticut College, Lori Looney and I are exploring ways of being proactive about this situation through our role in the FDLP. While we are unable to participate in large-scale digitization projects, we have adopted a proactive stance drawn from ideas sketched out in Peter Hernon and Laura Saunders’ College & Research Libraries article “The Federal Depository Library Program in 2023: One Perspective on the Transition to the Future.” We see their proactive approach as preferable to withdrawing from the program altogether or assuming a more passive role within it that would maintain the status quo. We describe our adoption of this approach in our essay “Experience of a New Government Documents Librarian,” published in Susanne Caro’s book Government Information Essentials.

Our latest activity, addressed by the poster, consists of several easy steps that librarians everywhere can take in their own libraries:

Keep track of your favorite websites and online publications, and make sure their URLs are captured in the Internet Archive’s Wayback Machine
Add rare, hard-to-find, and/or local government documents to your library catalog, digitize those that are not already available online, and upload them to the Internet Archive, ideally with as much catalog metadata as possible
Advocate for the long-term value of seemingly obscure government information and help spread the word that short-term ease of accessibility actually masks the major problems associated with long-term preservation, access, and usability
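
The first step above can be partly automated. The following is a minimal sketch, not a description of our own workflow. It assumes the Internet Archive's public Wayback Machine Availability API (archive.org/wayback/available) and the Save Page Now address pattern (web.archive.org/save/) behave as publicly documented. The function names and the example.gov addresses are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoints; check the Internet Archive's documentation before relying on them.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_PREFIX = "https://web.archive.org/save/"


def availability_query(url):
    """Build the Availability API request address for one URL."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(payload):
    """Return the closest snapshot URL from a parsed API response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def fetch_json(query):
    """Fetch and parse one API response (requires network access)."""
    with urlopen(query, timeout=30) as response:
        return json.load(response)


def capture_report(urls, fetch=fetch_json):
    """Map each URL to ("archived", snapshot_url) or ("missing", save_link)."""
    report = {}
    for url in urls:
        snapshot = closest_snapshot(fetch(availability_query(url)))
        if snapshot:
            report[url] = ("archived", snapshot)
        else:
            # Opening this link in a browser asks the Wayback Machine to capture the page.
            report[url] = ("missing", SAVE_PREFIX + url)
    return report
```

Calling `capture_report(["https://example.gov/annual-report.pdf"])` would query the live service. Passing a different `fetch` function lets you try the logic offline, and anything reported as missing can then be submitted by hand through the returned link.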

Some of the documents we harvested in this capacity (see a few examples below) are local government publications that may not be easy to find online and which may not be accessible through any other library catalog anywhere. By finding them, adding them to Internet Archive, downloading them, physically adding them to our collection, and adding records to OCLC/WorldCat we are actively supporting preservation and discovery.

This is a very small way of responding to the very large problem of web preservation in general. However, for a small institution with a selective collection of government publications, it is a practical strategy for contributing to the efforts of larger institutions involved with fascinating and complex projects like the End of Term (EOT) Web Archive.

—Andrew Lopez

_____

Works Consulted

Hernon, Peter, and Laura Saunders. “The Federal Depository Library Program in 2023: One Perspective on the Transition to the Future.” College & Research Libraries 70, no. 4 (2009): 351–70.

Jacobs, James A. “Born-Digital U.S. Federal Government Information: Preservation and Access.” Center for Research Libraries: Global Resources Collections Forum, 17 Mar. 2014.

Jacobs, James A., and James R. Jacobs. “Government Information: Everywhere and Nowhere.” Livestream web-based presentation to Government Publications Librarians of New England (GPLNE), 24 Oct. 2017.

Lopez, Andrew, and Lori Looney. “Experience of a New Government Documents Librarian.” Government Information Essentials. Ed. Susanne Caro. Chicago: ALA Editions, 2018. 13-20.

Seneca, Tracy, Abbie Grotke, Cathy Nelson Hartman, and Kris Carpenter. “It Takes a Village to Save the Web: The End of Term Web Archive.” DttP: Documents to the People (Spring 2012): 16-23.

_____

*CTW is the library consortium between Connecticut College, Trinity College, and Wesleyan University

ResearchScapes

Discussions on the art and craft of research

Month: January 2018

The Rise and Fall of Authority, or, is Wikipedia an Encyclopedia, or Stone Soup?

Harvesting Gov Docs Locally for Preservation and Discovery
