If you use RegData, please cite:

Al-Ubaydli, O. and McLaughlin, P.A. (2015) “RegData: A numerical database on industry-specific regulations for all United States industries and federal regulations, 1997-2012.” Regulation & Governance, doi: 10.1111/rego.12107.

What is a regulatory restriction?

Is it better to look at a regulator’s word counts or restriction counts?

Can I compare the levels of regulation in different industries?

How is industry regulation determined?

Can a parent industry have less regulation than a child industry?

Why graph by industry?

Can I print or save the graphs?

Why are certain industries unavailable?

RegData says an industry has “no significant regulation.” What does that mean?

Why can I only view term count for a single industry at a time?

What’s the difference between 2-digit, 3-digit, and 4-digit industries?

Does RegData cover 5-digit and 6-digit industries?

What is the industry search term count?

What data series are available?

Does RegData include appendices, supplements, and guidance documents that are printed in the CFR?

Can I access the raw data?

What is a regulatory restriction?

A regulatory restriction is an occurrence of one of the following strings: “shall,” “must,” “may not,” “prohibited,” and “required.”
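
As a rough illustration of this definition, the sketch below counts occurrences of the five strings in a piece of text. It is not RegData's actual program: the function name `count_restrictions` is hypothetical, and the choices to match case-insensitively and on whole words only are assumptions made for the example.

```python
import re

# The five restriction strings listed in the answer above.
RESTRICTION_TERMS = ["shall", "must", "may not", "prohibited", "required"]

# Assumption: matching is case-insensitive and limited to whole words,
# so that "marshall" does not count as "shall".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in RESTRICTION_TERMS) + r")\b",
    re.IGNORECASE,
)


def count_restrictions(text: str) -> int:
    """Return the number of restriction-string occurrences in text."""
    return len(_PATTERN.findall(text))


sample = "The operator shall file a report. Discharge is prohibited and may not resume."
print(count_restrictions(sample))  # prints 3
```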

Is it better to look at a regulator’s word counts or restriction counts?

It depends. At an aggregate level, word counts and restriction counts are highly correlated. However, when you break the text of the CFR into smaller chunks, such as those that correspond to specific regulators, you start to see that some regulators use more restrictions relative to word counts than others. If you simply want to know how much text a regulator has created, use word counts. If you want to know how many restrictions a regulator has created, use restriction counts.

Can I compare the levels of regulation in different industries?

You can compare the growth of regulation across industries, but the RegData 2.0 interface does not support comparing levels of regulation across industries. This is because the search terms for some industries include words that are much more common than the search terms for other industries. You can still download the data and compare levels yourself, but be aware that such comparisons may not yield accurate results.

How is industry regulation determined?

RegData uses text analysis to measure how frequently a part of the CFR targets each specific industry in the economy. By combining this information with the number of restrictions in the same part of the CFR, RegData offers a metric of how heavily regulated each industry is in each year that its data covers.

The Growth of Industry Regulation dataset indicates how regulated an industry is at a given point in time compared to how regulated it was in 1997. See the details on the methodology page.
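
One way to picture a series expressed relative to 1997 is to divide each year's value by the 1997 value. The sketch below does that. It is an illustration only: the function name `regulation_index` is hypothetical, the numbers are invented, and scaling the base year to 1.0 is an assumption (the methodology page is the authority on how RegData actually scales the index).

```python
BASE_YEAR = 1997  # the comparison year named in the answer above


def regulation_index(series: dict, base_year: int = BASE_YEAR) -> dict:
    """Express each year's value as a multiple of the base-year value.

    Assumption: the base year is scaled to 1.0.
    """
    base = series[base_year]
    if base == 0:
        raise ValueError("base-year value is zero; the index is undefined")
    return {year: value / base for year, value in sorted(series.items())}


# Invented industry regulation values, for illustration only.
example_series = {1997: 200.0, 2005: 250.0, 2012: 300.0}
index = regulation_index(example_series)
print(index)
```

Because every industry's index starts from the same base-year value, growth can be compared across industries even when the underlying levels cannot.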

Can a parent industry have less regulation than a child industry?

Yes, but with a caveat: this does not necessarily mean the parent is less regulated than the child. It is better to look at the industry regulation indexes for each industry and compare growth rates of regulation rather than the levels of regulation shown in the industry regulation series.

Industry regulation comprises two components: the restrictions in a CFR part and the industry relevance of the part containing those restrictions. Industry relevance equals the number of times an industry is mentioned in a CFR part, divided by the number of words in that part. RegData measures how often an industry is mentioned by looking for industry-specific search terms in the text of the CFR. Some search terms are more likely to be used by regulators than others. This is because the search terms are generated from the North American Industry Classification System (NAICS) industry descriptions using a predetermined set of rules.

In some cases, this led to a set of search terms that matches fairly well with the terms that regulators tend to use when they target particular industries with regulation. In other cases, the NAICS description was either too vague or too specific, resulting in a set of search terms that did not have much chance of being used in the CFR, or that were too ambiguous. For example, one 3-digit NAICS industry is “Transportation Equipment Manufacturing.” This includes automakers, train manufacturers, airplane makers, etc. While all of these manufacturers are regulated at the federal level, they are usually regulated at a more specific level (e.g., the 2011 Volume 6 of Title 49 uses the term “automobile manufacturer” 30 times, while it uses the term “transportation equipment manufacturer” 0 times). For this reason, RegData does not easily lend itself to comparisons of the absolute level of regulation across industries.
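
The sketch below works through the two components described in this answer. Industry relevance follows the definition given here (mentions divided by words). How relevance and restrictions are combined is an assumption: the example multiplies them and sums over parts, which may differ from the exact formula on the methodology page. The class and function names are hypothetical, and the word and restriction counts are invented. Only the mention counts (30 and 0) come from the example above.

```python
from dataclasses import dataclass


@dataclass
class CfrPart:
    words: int         # total words in the CFR part
    restrictions: int  # restriction count in the part
    mentions: int      # occurrences of one industry's search terms


def industry_relevance(part: CfrPart) -> float:
    """Mentions of the industry divided by the number of words in the part."""
    return part.mentions / part.words if part.words else 0.0


def industry_regulation(parts: list) -> float:
    """Assumption: relevance times restrictions, summed over all parts."""
    return sum(industry_relevance(p) * p.restrictions for p in parts)


# Same hypothetical text, seen through two different sets of search terms.
# Word and restriction counts are invented; mention counts are from the example.
child = [CfrPart(words=100_000, restrictions=1_000, mentions=30)]  # "automobile manufacturer"
parent = [CfrPart(words=100_000, restrictions=1_000, mentions=0)]  # "transportation equipment manufacturer"

print(industry_regulation(child), industry_regulation(parent))
```

Under these assumptions the parent industry scores zero even though the text plainly regulates its members, which is why levels are a poor basis for comparison.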

Why graph by industry?

Industry refers to our metric of how heavily regulated each industry in the economy is. This can be used to examine changes in the quantity of regulation targeting specific industries over time. You can select 2-digit, 3-digit, or 4-digit industries, and the corresponding graphs will show how targeted the selected industries are by the restrictions in the CFR.

There are several potential uses of a measure of how heavily regulated specific industries are. Both the causes and consequences of regulation are likely to differ from one industry to the next, and by quantifying regulations for all industries, individuals can test whether industry characteristics, such as dynamism, unionization, or a penchant for lobbying, are correlated with industry-specific regulation levels. The variety of industry-specific regulatory outcomes offered by RegData permits researchers to compare effects across industries with greater statistical certainty. For example, if someone wanted to know whether high unionization rates are correlated with heavy regulation, she could compare RegData’s measure of industry-specific regulation for highly unionized industries to industries with little to no unionization.

Can I print or save the graphs?

Yes. In the upper right-hand corner of each graph you’ll see an export button, which gives you options to print or save the graph.

Why are certain industries unavailable?

The regulation index is constructed using two components: restrictions and industry relevance. For some industries, it is more difficult to construct a convincing industry relevance metric than for others. This is because of the search terms RegData uses.

When constructing the industry relevance metric, RegData searches each part of the CFR for the number of times that each industry is mentioned. Specifically, RegData follows a set of rules to develop a set of search terms for each industry based on the North American Industry Classification System’s (NAICS) descriptions of all industries. In some cases, this leads to a set of search terms that matches fairly well with the terms that regulators tend to use when they target particular industries with regulation. In other cases, the NAICS description is either too vague or too specific, resulting in a set of search terms that has little chance of being used in the CFR, or that is too ambiguous.

For example, one 3-digit NAICS industry is “Transportation Equipment Manufacturing.” This includes automakers, train manufacturers, airplane makers, etc. While all of these manufacturers are regulated at the federal level, they are usually regulated at a more specific level (e.g., the 2011 Volume 6 of Title 49 uses the term “automobile manufacturer” 30 times, while it uses the term “transportation equipment manufacturer” 0 times). The website does not allow you to graph industries that had an average industry relevance less than a minimum threshold.

In the next version of RegData, industry descriptions at the 5- and 6-digit levels will be included, which will create hundreds of more specific search terms, such as “automobile manufacturer,” that are likely to be used by regulators.

RegData says an industry has “no significant regulation.” What does that mean?

RegData shows the “no significant regulation” message when the selected regulator’s text does not include any of the industry’s search terms, or when the level of the regulation index is so low that showing industry regulation growth rates might be misleading.

Why can I only view term count for a single industry at a time?

The industry search term count equals the number of times the search terms of a particular industry were found in the relevant text of the CFR. The search terms themselves are based on the industry description used by the North American Industry Classification System (NAICS). For example, the search terms for the “Oil and Gas Extraction” industry (a 3-digit industry) include the strings “oil,” “gas,” “oil extraction,” and “gas extraction,” among others.

Some of these search terms are more likely to be used by regulators and in natural language than others. The terms “oil” and “gas” are much more common, for example, than the terms “oil extraction” or “gas extraction.” Similarly, some industries have search terms that are much more common than other industries. As a result, unless a user controls for the probability of a search term occurring in natural or legal language, it can be misleading to compare search term counts across industries. On the other hand, because the search terms within an industry remain the same from year to year, a user can examine the changes in the search term count series within one industry over time.

What’s the difference between 2-digit, 3-digit, and 4-digit industries?

2-digit data uses very broad classifications (e.g., manufacturing) to describe the sectors of the economy. 3-digit data is made up of more precise subdivisions of the 2-digit classifications (e.g., chemical manufacturing), while 4-digit data contains more precise subdivisions of the 3-digit classifications (e.g., pharmaceutical and medicine manufacturing). Another way of thinking of these divisions is as parents and children: manufacturing is the parent of chemical manufacturing (and several other types of manufacturing in the 3-digit category), which in turn is the parent of pharmaceutical and medicine manufacturing (and other types of manufacturing in the 4-digit category).
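
In NAICS, the parent and child relationship is visible in the codes themselves: a child's code begins with its parent's code. The sketch below recovers the parent chain by dropping trailing digits. The function names are hypothetical. The codes used are the NAICS codes for the examples above: 3254 (pharmaceutical and medicine manufacturing) and 325 (chemical manufacturing). Note that manufacturing is one of the sectors that spans several 2-digit codes (31 through 33), so the 2-digit ancestor here is 32.

```python
from typing import List, Optional


def parent_code(code: str) -> Optional[str]:
    """Return the code one level up, or None for a 2-digit code."""
    if not code.isdigit() or len(code) < 2:
        raise ValueError(f"not a NAICS-style code: {code!r}")
    return code[:-1] if len(code) > 2 else None


def ancestors(code: str) -> List[str]:
    """Return all parent codes, nearest parent first."""
    chain = []
    parent = parent_code(code)
    while parent is not None:
        chain.append(parent)
        parent = parent_code(parent)
    return chain


print(ancestors("3254"))  # prints ['325', '32']
```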

Does RegData cover 5-digit and 6-digit industries?

Not yet. We plan to expand RegData again in the future, and at that point we will likely include 5- and 6-digit industries.

What is the industry search term count?

Our program searches each part of the CFR for industry-specific search terms. The search term count that is generated for a regulator and industry indicates how often that industry’s search terms appeared in the parts of the CFR where that regulator published in each year.
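
A minimal sketch of such a count is shown below, using the oil and gas search terms quoted elsewhere in this FAQ. It is not RegData's program. The function names and sample texts are hypothetical, and two matching rules are assumptions: matching is case-insensitive on whole words, and each term is counted independently, so “oil extraction” also counts once toward “oil.”

```python
import re

# Search terms quoted in this FAQ for "Oil and Gas Extraction".
OIL_GAS_TERMS = ["oil", "gas", "oil extraction", "gas extraction"]


def term_count(text: str, terms: list) -> int:
    """Count occurrences of every term in text.

    Assumption: each term is counted independently, so overlapping
    terms (e.g. "oil" inside "oil extraction") both contribute.
    """
    total = 0
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


def term_count_by_year(parts_by_year: dict, terms: list) -> dict:
    """Sum the term count over one regulator's CFR parts for each year."""
    return {
        year: sum(term_count(text, terms) for text in texts)
        for year, texts in sorted(parts_by_year.items())
    }


# Invented texts standing in for the parts one regulator published each year.
parts_by_year = {
    1997: ["Oil extraction permits cover oil and gas wells."],
    2012: [
        "Oil extraction permits cover oil and gas wells.",
        "Gas extraction requires a permit.",
    ],
}
counts = term_count_by_year(parts_by_year, OIL_GAS_TERMS)
print(counts)
```

Because the term list stays fixed from year to year, the resulting series can be followed over time within one industry, even though it is not comparable across industries.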

What data series are available?

RegData 2.0 includes several components, each of which is available as its own data series: word count (by regulator), restriction count (by regulator), search term count (by regulator and industry), industry relevance (by regulator and industry), industry regulation (by regulator and industry), and the industry regulation index (by regulator and industry).

Does RegData include appendices, supplements, and guidance documents that are printed in the CFR?

Partly. RegData 2.0 relies on XML files that are published by the United States Government Printing Office. These XML files allow RegData to attribute regulatory text to the regulatory agency that created the regulations. However, the structure of the XML files does not allow RegData to attribute any text outside of CFR parts to the regulator that created it. As a result, appendices, supplements, and guidance documents that are not printed within CFR parts had to be excluded from the regulator-level series.

RegData still measures these appendices, supplements, guidance documents, and any other text that is in the CFR but not in a CFR part to create a total count data series. If you look at the total restrictions graph in all industries by all regulators, you will see all restrictions attributed to regulators plus restrictions found in the unattributable text.

Can I access the raw data?

Yes! You can download the full dataset. See the data page for details.