Bulk Access to Primary Legal Materials

Works of the U.S. Government Investigation

1. Introduction and Packing List for the Data Set

Under U.S. law, works of the U.S. Government are not subject to copyright. Works of the U.S. Government are those authored by federal employees or officers in the course of their official duties. See 17 USC § 105 and 17 USC § 403. This section of the U.S. Copyright Act is only applicable to the federal government, not state or local jurisdictions.

Works of the U.S. Government are typically thought of as those directly published by the federal government, often through the facilities of the Government Publishing Office. However, another example is scholarly journal articles authored by federal employees or officers in the course of their official duties and published by private publishers. Under the law, those journal articles are not subject to copyright and, under § 403 of the act, should be explicitly labeled as not being subject to copyright. The overall issue of a journal is of course properly subject to copyright, but the individual articles that fall under § 105 are not. Public Resource undertook an investigation of the scholarly literature to determine if these provisions of the law are being observed. Our conclusion is that the law is being widely flouted by private publishers who assert copyright and restrict access to works that are properly in the public domain.

Our study includes not only works by federal agencies or officers, but also those created in the National Laboratories. Though those works are not covered by § 105 of the U.S. Copyright Act, the Department of Energy requires, as a condition of its contract with each national laboratory, that the U.S. government be granted a perpetual global license to use the works freely. We have thus included the national laboratories in our study.

This page contains information about the data produced in the course of this investigation. The directories are as follows:

The tree is walkable with wget -m -np and will yield a 9.74 gigabyte data set. Each of the areas is described in the following sections. Public Resource asserts no copyright on any of these materials and you may use them freely, with or without attribution.

2. The American Bar Association Study

[aba/] The American Bar Association, in addition to its role as a fraternal and professional association for lawyers, is also one of the largest publishers of legal journals. A large subset of its publications was manually scanned to determine which articles were authored by federal employees or officers and for any evidence of whether or not the article was written in the course of the author's official duties. The investigation pulled 552 articles. The results of that investigation were then tabulated. [pdf ; html]

Under ABA rules, a member may present a REPORT AND RESOLUTION for consideration by the House of Delegates, which then votes on the text of the resolution. Resolutions which receive an affirmative vote are considered policy statements of the American Bar Association. Such a report and resolution on the subject of Works of the U.S. Government was prepared by Carl Malamud of Public Resource. A number of law students and lawyers joined as co-sponsors of the resolution. The report attached to the resolution outlines the history of the Works of Government clause of the U.S. Copyright Act and presents results of the audit of ABA publications, law review articles, and preliminary results on the examination of the broader scholarly corpus. The Report and Resolution were then submitted for calendaring to the Committee on Rules and Calendar of the House of Delegates.

As a non-lawyer, Malamud was considered to be an associate member of the ABA and thus not eligible to prepare and submit such a resolution. Law students are likewise not considered to be full members of the ABA. After much back and forth, the names of all the non-lawyers were scrubbed from the documents and Ed Walters and Tim Stanley, both lawyers and long-time members of the Public Resource board of directors, became the official sponsors of the resolution. While the document went through a number of drafts, the May 23, 2017 draft is a good place to start.

The Report and Resolution was approved for calendaring before the House of Delegates and Carl Malamud was granted special privileges of the floor to present the issue. During this period, however, repeated attempts to discuss the matter with ABA Sections such as the Section on Intellectual Property Law and the Section on Administrative Law and Regulatory Practice, and with ABA Committees such as the Standing Committee on Publishing Oversight, were all rebuffed.

One week before the August 2017 ABA mid-year meeting where the Report and Resolution had been calendared, Public Resource was summoned to a conference call with eight ABA representatives of sections, including the Sections on Intellectual Property, Administrative Law, Antitrust Law, and Science. Representing Public Resource were Carl Malamud, Tim Stanley, and Misha Guttentag, a Public Resource Fellow. On that call, we were informed that after an examination of our audit results, the section representatives believed that none of the articles in question had been written in the course of the author's official duties and that, furthermore, most section delegates to the House of Delegates had been instructed in advance to oppose the resolution. The representatives made clear that a full assault on the resolution would be made, calling into question the legal analysis (which had been carefully vetted by Public Resource with a distinguished advisory board of legal experts), and would continue to include personal attacks on the motivations of the presenters. Faced with a predetermined conclusion, Public Resource informed the Committee on Rules and Calendar that it would not be presenting the resolution for consideration.

3. Phase 1 Bibliographic Searches

[Phase1Searches/] A broad search of the bibliographic literature was conducted by Jeremy Sutton Frye, a graduate student at the University of North Carolina School of Information and Library Science. Mr. Frye was appointed a Public Resource Fellow and worked directly with Carl Malamud on this investigation. Professors Paul Jones and Denise Anthony provided additional advice and assistance. Additional technical support was provided by UNC's

The investigation consisted of examining the U.S. Government Manual and other sources to create a large number of search terms that could be used in searching bibliographic databases for articles that were authored by employees of various branches of the federal government. [Phase1Searches/Vocabulary/]

This vocabulary was then used on three of the major commercial bibliographic search services used by libraries: EBSCO, ProQuest, and Web of Science. Additional searches were conducted using the National Academies of Sciences, Engineering and Medicine's TRID database of transportation research literature and the IEEE's Xplore database of computer science literature. [Phase1Searches/Data/]

The results were exported using the standard RIS bibliographic exchange file format. A number of iterations on the vocabulary were conducted, using random checks of search results to eliminate false positives and false negatives. For example, a search for Centers for Disease Control yielded not only the U.S. agency but also its Chinese counterpart, which uses the same designation when publishing articles in English. After the searches were concluded, a list of the Digital Object Identifiers (DOIs) was extracted. Note that we chose to work only with articles that had a DOI assigned, which left out journal articles by publishers that had not adopted this convention. In addition to those scientific publications that don't use DOIs, that of course eliminates the vast majority of the scholarly literature for the legal profession, which has not kept pace with the best practices adopted by other scholars.
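The DOI extraction step can be illustrated with a minimal Python sketch. It is not the script used in the investigation. It assumes the exports follow the common RIS layout, in which each line is a two-character tag followed by two spaces, a hyphen, and a value, and in which the DOI sits in the DO field. The function names and the sample records are hypothetical.

```python
import re

# Assumed RIS line layout: two-character tag, two spaces, hyphen, optional value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

# Resolver prefixes that some exports put in front of the bare DOI.
RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value):
    """Lowercase a DOI and strip any resolver prefix (DOIs are case-insensitive)."""
    doi = value.strip().lower()
    for prefix in RESOLVER_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi.strip()


def extract_dois(ris_text):
    """Return the unique DOIs found in DO fields, in first-seen order.

    Records without a DO field contribute nothing, which mirrors the
    decision to work only with articles that have a DOI assigned.
    """
    seen = set()
    dois = []
    for line in ris_text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match or match.group(1) != "DO":
            continue
        doi = normalize_doi(match.group(2))
        if doi and doi not in seen:
            seen.add(doi)
            dois.append(doi)
    return dois


# Hypothetical records, for illustration only.
SAMPLE = """TY  - JOUR
AU  - Doe, Jane
TI  - A hypothetical article
DO  - 10.1016/j.example.2017.01.001
ER  - 
TY  - JOUR
TI  - A record with no DOI
ER  - 
TY  - JOUR
DO  - https://doi.org/10.1002/EXAMPLE.123
ER  - 
"""

if __name__ == "__main__":
    for doi in extract_dois(SAMPLE):
        print(doi)
```

Normalizing to lowercase before de-duplicating avoids counting the same article twice when two search services report its DOI with different capitalization.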

The resulting set of DOIs was a list of 1,196,153 articles that appeared to be by federal employees or officers. A number of simple scripts were then used to sort the list by publisher and by federal agency. Note that some of the DOIs on the list are not valid, others might be false positives (not by the federal agency we thought we were pulling up), and the list does not make a determination of whether or not the article was written in the course of the employee or officer's official duties. [Phase1Searches/Lists/]
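A sort by publisher can be sketched from the DOI prefix, the part of the DOI before the first slash. The Python below is an illustrative assumption, not the scripts actually used. The three prefix-to-publisher pairs are taken from the DOI tree description in this document, and the function names are hypothetical.

```python
from collections import defaultdict

# A few prefix-to-publisher pairs named in this document. A complete
# table would come from a registry lookup rather than a hand-typed dict.
PUBLISHERS = {
    "10.1001": "American Medical Association",
    "10.1002": "Wiley Blackwell",
    "10.1016": "Elsevier",
}


def doi_prefix(doi):
    """Return the registrant prefix, i.e. everything before the first slash."""
    return doi.split("/", 1)[0]


def group_by_publisher(dois):
    """Map each publisher name to the list of DOIs carrying its prefix."""
    groups = defaultdict(list)
    for doi in dois:
        prefix = doi_prefix(doi)
        name = PUBLISHERS.get(prefix, "unknown (" + prefix + ")")
        groups[name].append(doi)
    return dict(groups)


def publisher_counts(dois):
    """Return (publisher, count) pairs, largest count first, ties by name."""
    groups = group_by_publisher(dois)
    return sorted(
        ((name, len(items)) for name, items in groups.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


if __name__ == "__main__":
    # Hypothetical DOIs, for illustration only.
    demo = ["10.1016/a", "10.1001/b", "10.1016/c", "10.9999/d"]
    for name, count in publisher_counts(demo):
        print(count, name)
```

Prefixes missing from the table are kept under an "unknown" label rather than dropped, so invalid or unrecognized DOIs remain visible in the tally.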

4. Phase 2 Audits

[Phase2Audits/] In Phase 2 of the investigation, a statistically valid random sample of DOIs was pulled for each of 38 government agencies and 29 publishers for manual examination. A few of the articles were unavailable on the University of North Carolina campus, a sad commentary on the state of access to the scholarly literature at major universities around the world. For those articles that we were able to pull, a spreadsheet was created for each that listed the authors and their affiliations, any copyright assertions by the publishers, the presence or absence of the disclaimers required under § 403 of the U.S. Copyright Act, and any evidence that the article was or was not written in the course of the authors' official duties.
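The source does not state how the samples were drawn or how large they were. As one possible approach, the Python sketch below draws a simple random sample without replacement from each group of DOIs, using a fixed seed so the draw can be repeated. The function name, the sample size, and the seed are all assumptions made for illustration.

```python
import random


def sample_per_group(groups, size, seed=0):
    """Draw up to `size` DOIs at random, without replacement, from each group.

    `groups` maps a group name (an agency or a publisher) to a list of DOIs.
    Groups and DOIs are sorted first so that a given seed always yields the
    same sample regardless of input order.
    """
    rng = random.Random(seed)
    sampled = {}
    for name in sorted(groups):
        dois = sorted(groups[name])
        k = min(size, len(dois))  # small groups are taken whole
        sampled[name] = rng.sample(dois, k)
    return sampled


if __name__ == "__main__":
    # Hypothetical groups and DOIs, for illustration only.
    demo = {
        "Agency A": ["10.1016/a%d" % i for i in range(20)],
        "Agency B": ["10.1001/b1"],
    }
    for name, dois in sample_per_group(demo, size=5, seed=1).items():
        print(name, len(dois))
```

Recording the seed alongside the audit spreadsheets would let a later reader regenerate exactly the same list of articles to examine.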

5. Identification of Publishers

[Publishers/] A utility directory is provided that maps DOI prefixes to publishers, with a JSON file for each publisher obtained from CrossRef. In addition, a CSV file provides summary results.

6. DOI Trees from CrossRef and Unpaywall

[DOItrees/] Each of the DOIs on the master list was fed into the APIs of two major services that work with the scholarly literature. CrossRef is a nonprofit organization that serves as a registry of DOIs for publishers. Their API was used to pull bibliographic information on the articles. Unpaywall is a wonderful nonprofit service that looks for articles that are available freely “in the wild” and their API provides results in the JSON format, including the location of articles that are available. Each of the DOIs was sent into these services and the results saved as a directory tree (including error results where the DOI we retrieved from the commercial bibliographic search services was not valid). Those results were tarred up into two .tgz files. In addition, the first 500 JSON results are provided for browsing for 7 of the publishers: American Medical Association (DOI prefix 10.1001), Wiley Blackwell (10.1002), Elsevier - Academic Press (10.1006), Springer Verlag (10.1007), Elsevier (10.1016), Cambridge University Press (10.1017), and American Chemical Society (10.1021).
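A lookup of this kind can be sketched in Python as follows. It is a rough illustration under stated assumptions, not the code that produced the trees. The endpoints are the public CrossRef REST API (api.crossref.org/works/) and the Unpaywall v2 API (api.unpaywall.org/v2/, which expects an email parameter). The directory layout, the function names, and the shape of the saved error record are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from pathlib import Path

CROSSREF = "https://api.crossref.org/works/"
UNPAYWALL = "https://api.unpaywall.org/v2/"


def crossref_url(doi):
    """Build the CrossRef lookup URL for one DOI."""
    return CROSSREF + urllib.parse.quote(doi, safe="/")


def unpaywall_url(doi, email):
    """Build the Unpaywall lookup URL; the service asks for a contact email."""
    query = urllib.parse.urlencode({"email": email})
    return UNPAYWALL + urllib.parse.quote(doi, safe="/") + "?" + query


def tree_path(root, doi):
    """Hypothetical layout: <root>/<prefix>/<percent-encoded suffix>.json"""
    prefix, _, suffix = doi.partition("/")
    return Path(root) / prefix / (urllib.parse.quote(suffix, safe="") + ".json")


def fetch_and_save(url, path):
    """Save the JSON response, or a small error record on an HTTP error.

    Only HTTP-level errors (such as a 404 for an invalid DOI) are recorded;
    network failures are left to propagate to the caller.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            body = response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        body = json.dumps({"error": err.code, "url": url})
    path.write_text(body, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical DOI and email; no request is made here.
    doi = "10.1016/j.example.2017.01.001"
    print(crossref_url(doi), "->", tree_path("crossref", doi))
    print(unpaywall_url(doi, "you@example.org"), "->", tree_path("unpaywall", doi))
```

Keeping an error record on disk for each failed lookup, instead of skipping it, preserves a one-to-one match between the master list and the saved tree.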

7. Conclusion

While the results vary somewhat by publisher and by agency, the overall conclusion is striking: the vast majority of the scholarly literature authored by federal employees does not contain the disclaimer required under the law. In addition, for the vast majority of the scholarly literature, publishers improperly and illegally assert copyright, reserve all rights, and deny access to materials that were produced with public funds and for a public purpose. This misappropriation of the public domain for private gain is widespread within the community of publishers, and is prevalent not only among commercial publishers whose primary purpose is pecuniary gain but also among nonprofit scholarly societies that are chartered for public purposes.