On the Contents and Utility of IoT Cybersecurity Guidelines

James Davis
11 min read · Jul 8, 2024

--

This is a brief for the research paper “On the Contents and Utility of IoT Cybersecurity Guidelines,” published at ACM FSE 2024 (PACSME). This work was led by Jesse Chen, and represents a collaboration between the University of Arizona (Chen, Rahaman) and Purdue University (Anandayuvaraj, Davis). The full paper is here.

Summary

Internet of Things (IoT) systems control both minor and major systems, from smart coffee machines and driers (it’s OK if these are a little dumb sometimes) to smart traffic lights and dams (these have to be pretty dam reliable). In recognition of this need, many different organizations have put out guidelines on how to improve the cybersecurity of IoT devices. There are over 100 official (officious?) documents titled things like “Guidelines for IoT Cybersecurity”. These documents provide no evidence to support their assertions. Their content and merits have not been critically examined. We do not know what topics and recommendations they cover, nor their effectiveness at preventing real-world IoT failures.

To understand what’s inside these documents, and whether engineers should pay any attention to them, we collected 142 IoT cybersecurity guidelines and then examined a random sample. Most of them have concrete advice: 87.2% of recommendations are actionable and 38.7% of recommendations can prevent specific threats. We identified 958 unique recommendations. Comparing across guidelines, we found that each one has gaps in its topic coverage and comprehensiveness. We “hand-checked” some major IoT failures from the news media and from cybersecurity reports, and found that (1) the union of the guidelines mitigated all 17 of the failures from our news stories corpus, (2) 21% of the CVEs evade the guidelines, and (3) individually, no guideline performed particularly well. In summary, we found substantial shortcomings in each guideline’s depth and breadth, but as a whole they can address major security issues.

Background

IoT: Definition and engineering implications

Internet of Things (IoT) systems are everywhere in your daily life. We used to call these systems “embedded systems” and then we started connecting them to the Internet. These systems control aspects of our homes (refrigerators, ovens, clothes driers), our businesses (badge systems, timecards), our transportation (stoplights, dynamic speed limit signs), our power generation (dams), and much more. Engineers developing IoT systems are thus responsible for economic progress as well as safety. A lot of the intelligence built into IoT systems is software-based. We need hardware to give the devices the capability of reading sensors, processing the results, and communicating with each other and with remote intelligence on-premises or in the Cloud. But it’s software that determines what they do. (On a personal note, this is why I got into software engineering in the first place!)

The software industry does not have a great track record of correctness and cybersecurity. The traditional adage has been to Move Fast and Break Things, but this is ethical only when the consequences are small (e.g., a game stops working). When the consequences are large — the clothes drier bursts into flame, the dam monitoring fails, the stoplights show green everywhere — we must Move Slow and Not Kill People.

IoT Security Guidelines

Since the IoT connects traditional embedded systems to the Internet, for the purpose of this paper we will assume that the embedded systems folks remember how to keep things safe. We focus instead on cybersecurity — that Internet connection means that untrusted folks may still be able to see the device, and we need to ensure they cannot interact with or control it inappropriately.

To promote IoT cybersecurity, many organizations have published IoT security guidelines. These organizations include government bodies such as NIST, for-profit companies such as Microsoft, non-profit trade groups such as the Cloud Security Alliance (CSA), and professional organizations such as the Institute of Electrical and Electronics Engineers (IEEE).

In this paper, we defined an IoT cybersecurity guideline as a set of “best practice” recommendations to secure IoT production and use, i.e. the security capabilities to incorporate into the product, and the engineering and operational processes for security. These guidelines can be general or domain-specific (e.g. healthcare). Domain-specific guidelines usually extend general guidelines. These general guidelines typically define IoT as a device or system of devices that are connected via some network (e.g. internet, Bluetooth). Thus, their threat models are similar, consisting of network threats (e.g. man-in-the-middle attacks) and device threats (e.g. unauthorized access).

Several governments are moving towards mandating these guidelines at different levels. In the United States, there has been state-level legislation (e.g. California, Oregon, Virginia), as well as national-level legislation to regulate IoT security. Other governments have also enacted legislation to regulate IoT security, such as the UK, Singapore, and the European Union. Individual companies are also adopting certain guidelines — for example, in 2023, GE Gas Power adopted the NIST Framework for Improving Critical Infrastructure Cybersecurity.

What do we want to find out?

Although IoT cybersecurity guidelines are being widely adopted, there are just a few research works that examine them critically. Those prior works do not fully explain the guidelines’ contents (topics covered and level of detail) nor their utility (mitigation of real-world IoT security failures).

In this study, we specifically looked at general guidelines, and we asked four questions:

Theme I: Guideline Contents.

  1. How comprehensive (breadth and depth of coverage) are individual IoT guidelines?
  2. What topics are covered collectively? How does coverage vary by publisher type?
  3. To what extent are the recommendations concrete enough to be actionable, and specific enough to address distinct security threats, vulnerabilities, or attack surfaces?
Theme II: Guideline Utility.

  4. To what extent could the guidelines have prevented real-world security problems?

Approach

The next figure illustrates our approach to understanding the guidelines’ contents. We start on the left with a big pile of IoT Cybersecurity Guidelines. Now what?

  • We had a pile of 142 guidelines documents, or 3,804 pages of content. Clearly we couldn’t read all of them!
  • At the time we performed the study, Large Language Models (LLMs) were not yet available, so we couldn’t try to automate this process with them.
  • So we decided to repeatedly pick one, pull out the recommendations it makes, and insert each recommendation into a tree of knowledge.
  • We proceeded one-at-a-time and stopped when we stopped seeing new recommendations. This approach is called sampling until saturation.
Illustration of our approach to analyzing these guidelines. Figure reads left-to-right.
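In code terms, the sampling loop above might look like the sketch below. This is a minimal illustration only: the `extract_recs` callback and the `patience` cutoff are our inventions for the sketch, not the paper’s protocol — the actual study judged saturation qualitatively.

```python
import random

def sample_until_saturation(guidelines, extract_recs, patience=3):
    """Pick guidelines one at a time, merge their recommendations,
    and stop once several consecutive picks add nothing new."""
    pool = list(guidelines)
    random.shuffle(pool)
    seen = set()       # unique recommendations collected so far
    stale = 0          # consecutive picks that added nothing new
    analyzed = []
    for g in pool:
        new = set(extract_recs(g)) - seen
        seen |= new
        analyzed.append(g)
        stale = 0 if new else stale + 1
        if stale >= patience:  # saturation reached
            break
    return analyzed, seen
```

Here `patience` controls how many consecutive “nothing new” guidelines we tolerate before declaring saturation; a larger value trades more reading effort for more confidence that the tree is complete.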

When we finished, we ended up with a tree of knowledge that looks like the next figure.

Excerpt of our tree of knowledge. It starts very general at the top level (“IoT Security Recommendations” describes everything!) and gets more specific as you go down. The figure shows sub-categories up to level 3. The % is the fraction of all recommendations that fit into the indicated node.

By making the full tree, we forced all of the different guidelines to use equivalent language and structure. The full tree shows the complete content across all guidelines, with all the repetition eliminated. Each of the guidelines we analyzed can be understood as containing a sub-tree of this tree — all the guidelines have IoT Security Recommendations, some of them describe organizational processes, some of those describe processes for IoT Infrastructure, some of those talk about Network configuration, and so on.

How comprehensive are the guidelines?

So, by comparing each of the per-guideline trees of knowledge to the unified version we created, we can understand the strengths and weaknesses of each guideline, as well as the potential benefit of reading a bunch of guidelines (do they all just say the same thing?).

Then, considering the full tree, we can look at specific failures (from news articles and from cybersecurity reports) and see if the causes of those failures would be mitigated if the engineers followed the full set of recommendations from the tree.

Most of the guidelines do not cover very many categories (breadth) and they do not go into much detail (depth). However, they do cover a lot of different topics, and when we put them all together we get some nice knowledge! This is depicted in the next figure, where our unified tree of knowledge is far more broad and deep than any individual guideline.

Two-dimensional assessment of comprehensiveness. The x-axis depicts depth (the level of detail on a given subject). The y-axis depicts breadth (the number of topics covered). Higher is better on both axes, so guidelines want to be in the upper right. The star in the upper right represents our tree of knowledge.

Remember, we didn’t add any new knowledge, we just organized the existing knowledge. So the fact that our tree is so much deeper and broader than each individual tree shows that there is a lot of knowledge out there, if only someone (ahem) would go to the trouble of organizing it.
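One simple way to picture this two-dimensional scoring (a hypothetical formulation, not the paper’s exact metric): represent each of a guideline’s recommendations as a path of taxonomy nodes from the root, then take breadth as the number of distinct top-level categories it touches and depth as the deepest level it reaches.

```python
def comprehensiveness(rec_paths):
    """Score a guideline from its recommendations, each given as a
    tuple of taxonomy nodes from the root, e.g.
    ("IoT Security Recommendations", "Organizational processes",
     "IoT Infrastructure", "Network configuration").
    Returns (breadth, depth)."""
    # breadth: distinct categories one level below the root
    breadth = len({path[1] for path in rec_paths if len(path) > 1})
    # depth: deepest taxonomy level any recommendation reaches
    depth = max((len(path) for path in rec_paths), default=0)
    return breadth, depth
```

Under this scoring, the unified tree dominates every individual guideline by construction — it contains every guideline’s paths — which is exactly the “star in the upper right” in the figure above.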

How well can they prevent failures?

We answer this by analyzing past events. A retrospective study is a form of observational study used in many other research fields (e.g. healthcare, medicine, psychology) when conducting a prospective study is deemed infeasible or unnecessary. For this work, the past events are security vulnerabilities recorded in Common Vulnerabilities and Exposures (CVEs) and real-world failures reported in the news media. For each CVE or failure, we look for recommendations from our tree of knowledge that could have prevented it.

We need some CVEs

We wanted to understand whether IoT-related cybersecurity failures (these are called CVEs) could have been prevented by following the recommendations in our tree of knowledge. This was a little tricky to estimate because we don’t want to assume that humans never make mistakes. Did the cybersecurity vulnerability result because the engineers had the wrong plan (design mistake), or because they had the right plan but slipped up (implementation mistake)?

  • We think that CVEs originating from implementation slips/errors like buffer overflows can only be minimized, not completely avoided, even by following recommendations such as conducting static analysis or using safer programming languages.
  • On the other hand, CVEs caused by design flaws, e.g. a device having hard-coded system passwords that provide shell access, can be prevented if a corresponding recommendation exists.

So we only wanted to look at design flaws. To this end, one author labeled each CVE as a design error or an implementation error, and then discussed the labels with another author until they agreed. We started with 158 CVEs and filtered out 61 implementation errors, leaving 97 design-error CVEs.

We need some failures in the news

This was easier — we used a keyword search in Google News and picked out articles that had lots of detail about a cybersecurity failure in an IoT system. We found 17 news articles (13 from Wired and 4 from The New York Times). Across the 17 articles, there were 33 sources of failure and 29 repair recommendations; most articles listed multiple sources of failure and multiple repair recommendations. These articles do not usually provide enough detail to distinguish design errors from implementation errors. On the other hand, they do include process-level errors, while CVEs only discuss technical aspects.

Results

Our approach is retrospective, so we want to see if a given CVE or news article describes causes that would be prevented if engineers followed our tree of knowledge. For each subject, we choose the category at each level of the taxonomy that is most related to it, traversing down until we reach recommendations (leaf nodes). This step helps us find the set of most relevant recommendations. If there is no such category, then there is no matching recommendation. Note that we matched only device recommendations to the design-mistake CVEs, while both device and process recommendations are matched to news articles.
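As a rough sketch of this matching step: the greedy keyword scoring below is our simplification for illustration (the actual matching was done by hand), and the `node` dictionary layout is an assumption.

```python
def match_recommendations(node, description):
    """Walk the taxonomy from the root, at each level following the
    child category most related to the failure description, until we
    reach leaf recommendations. Returns [] if no category relates.
    `node` is a dict: {"name": str, "children": [...], "recs": [...]}."""
    words = set(description.lower().split())
    while node.get("children"):
        # score each child by keyword overlap with the description
        best = max(node["children"],
                   key=lambda c: len(words & set(c["name"].lower().split())))
        if not words & set(best["name"].lower().split()):
            return []  # no related category -> no matching recommendation
        node = best
    return node.get("recs", [])
```

If the traversal finds no related category at some level (as for the brute-force pairing CVE below), the subject counts as uncovered.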

For CVEs, we were able to pair one or more recommendations from our tree of knowledge to 77 of the 97 design-error CVEs, or 79%.

  • Example of a covered CVE: Consider CVE-2021-33218 [16]: “…There are Hard-coded System Passwords that provide shell access”. This CVE could have been mitigated by the following recommendation: “Security parameters and passwords should not be hard-coded into source code or stored in a local file”.
  • Example of an uncovered CVE: Consider CVE-2021-27943 [17]: “The pairing procedure … is vulnerable to a brute-force attack (against only 10000 possibilities), allowing a threat actor to forcefully pair the device, leading to remote control of the TV settings and configurations”. While there were recommendations suggesting to prevent brute-force account logins, there were none for brute-force device pairing. Although not all IoT devices may require such pairing capabilities, pairing is one way of connecting to networks — a capability that, by definition, all IoT devices have.

Although our tree of knowledge performed well, the individual guidelines did not. Most of the guidelines (24/25) have individual CVE coverage of less than 40% — the best performer had 62% coverage.

For each of the 17 studied news articles, we found at least one recommendation that would have mitigated or prevented the described failure.

  • Example of a covered article: An article from Wired described security vulnerabilities in two GPS tracking apps, iTrack and ProTrack. These apps are used to track fleets equipped with GPS trackers, such as GT06N by Concox. The article reports that these apps were exploited to gain unauthorized access to thousands of vehicles. The exploit exposed access to safety-critical functions of the vehicles, such as the remote engine power toggle. This article described two sources of failure: (1) A default password was enabled across all accounts, and (2) The GPS tracking device access was not isolated from a safety-critical function (engine on/off). We found 8 recommendations that could have mitigated these failure causes, coming from the Authentication and Network Segmentation categories. Two of these recommendations were, from Authentication: “Each device should have a unique default password”; and from Network Segmentation: “[Split] network elements into separate components to help isolate security breaches and minimize overall risk”.

Again, our tree of knowledge performed well, but each individual guideline did poorly. 19 of 25 had coverage <40%, and 6/25 had coverage of 0%. The best performance was under 60%.

Are we doomed? (“Discussion”)

Here’s what we learned:

  • IoT is too big for any one guideline: Our comprehensiveness analysis showed weaknesses in most of the IoT guidelines. Even after combining all guidelines into a unified taxonomy, our usefulness analysis shows that it does not mitigate 21% of the CVEs in our corpus. We suggest that IoT technologies are too diverse to be captured by general guidelines with a broad definition of IoT, which is even acknowledged by a guideline we reviewed. For example, six guidelines define IoT as anything that connects to the Internet or a network and interacts with the physical world and data. We found that individual guidelines were not comprehensive even for general topics such as passwords. We therefore suggest that guideline publishers develop more domain-specific guidelines with better comprehensiveness. Defining security requires context; security guidelines must define a context, specific and concrete enough to be actionable.
  • Guideline authors should help organizations prioritize: Security engineering is not free. Following security recommendations is hindered by its associated costs. Our unified tree of knowledge contains 958 unique recommendations — this is a lot for an organization to track, even after they go to the trouble of studying and integrating multiple guidelines. We suggest that guideline providers incorporate a priority score for their recommendations based on their importance. Only IOTSF-2021 and IIC-2019 provided metrics for which recommendations to follow in order to meet certain security statuses, thereby implying a prioritization. Clearer prioritization would facilitate a better cost-vs-security tradeoff for companies with constrained compliance budgets.
  • Legislators should not assume that these guidelines are good: As mentioned earlier, legislators are considering laws that regulate IoT engineering, and IoT cybersecurity in particular. The guidelines that policymakers review may determine the comprehensiveness of legislation! Given that only 4/25 of the guidelines we studied were reasonably comprehensive and that 19/25 guidelines had coverage less than 40% to prevent newsworthy failures, the current reality gives us pause.
  • Practitioners should consult multiple guidelines: Given that most guidelines are not comprehensive, practitioners should consult multiple guidelines depending on their security priority. Generally, the most comprehensive guidelines are from government and industry non-profits, so practitioners can prioritize these over other publishers. Comprehensiveness is correlated with the number of recommendations and pages. Engineers must also assess which security capabilities or processes would be most impactful to them, as guidelines currently do not help here.

Thanks for reading!

The full paper is here: https://arxiv.org/pdf/2310.01653. Let us know if you want the tree of knowledge — we’re trying to get it up on the web.

--


James Davis

I am a professor in ECE@Purdue. My research assistants and I blog here about research findings and engineering tips.