IMC '21: Proceedings of the 21st ACM Internet Measurement Conference


SESSION: Characterizing and measuring networks

Examination of WAN traffic characteristics in a large-scale data center network

Large cloud service providers have built an increasing number of geo-distributed data centers (DCs) connected by wide-area networks (WANs) to host their diverse services. While we have seen a large body of work on WAN traffic engineering, the WAN traffic characteristics of production DC networks remain poorly understood. In this paper, we report on the network traffic observed in Baidu's DC network (DCN), which consists of tens of geo-distributed DCs. Baidu hosts both traditional services, such as Web and Computing, and emerging services, such as Analytics, AI, and Map. We analyze WAN traffic characteristics in Baidu's DCN from the perspectives of traffic demands, traffic communication among DCs, and traffic characteristics of diverse services. In particular, we focus on the disparities that may exist among different types of services. We also discuss the implications of our findings for WAN traffic engineering, fabric design, and service deployment.

AutoSens: inferring latency sensitivity of user activity through natural experiments

We consider the problem of inferring the latency sensitivity of user activity in the context of interactive online services. Our method relies on natural experiments, i.e., leveraging the variation in user-experienced latency that arises in the normal course of operation. At its core, our technique, dubbed AutoSens, compares the distribution of latency of the user actions actually performed with the underlying distribution of latency independent of whether users choose to perform any action. This then yields a normalized, latency-based measure of user preference. We discuss ways of mitigating various confounders and then present our findings in the context of a large online email service, Microsoft Outlook Web Access (OWA).

Federated infrastructure: usage, patterns, and insights from "the people's network"

In this paper, we provide the first broad measurement study of the operation, adoption, performance, and efficacy of Helium. The Helium network aims to provide low-power, wide-area network wireless coverage for Internet of Things-class devices. In contrast to traditional infrastructure, "hotspots" (base stations) are owned and operated by individuals who are paid by the network for providing coverage and are paid directly by users for ferrying data.

As of May 2021, Helium has over 40,000 active hotspots, with 1,000 new hotspots coming online every day. This deployment is decentralized: 84% of users own at most three hotspots. Some supporting infrastructure remains highly centralized, however, with over 99% of data traffic routed through one cloud endpoint, and multiple cities in which all hotspots rely on a single ISP for backhaul. Helium is largely speculative today, with more hotspot activity than user activity. Crowdsourced, incentive-guided infrastructure deployment largely works, but shows evidence of gamification and apathy. As Helium lacks clear, radio-oriented coverage maps, we develop and test coverage models based on network incentives. Finally, empirical testing with IoT devices finds basic success but uncovers numerous reliability issues.

SESSION: Cloud

From cloud to edge: a first look at public edge platforms

Public edge platforms have drawn increasing attention from both academia and industry. In this study, we perform a first-of-its-kind measurement study on a leading public edge platform that has been densely deployed in China. Based on this measurement, we quantitatively answer two critical yet unexplored questions. First, from end users' perspective, what is the performance of commodity edge platforms compared to the cloud, in terms of end-to-end network delay, throughput, and application QoE? Second, from the edge service provider's perspective, how do edge workloads differ from cloud workloads, in terms of VM subscription, monetary cost, and resource usage? Our study quantitatively reveals the status quo of today's public edge platforms, and provides crucial insights towards developing and operating future edge services.

Measuring the network performance of Google cloud platform

Public cloud platforms are vital in supporting online applications for remote learning and telecommuting during the COVID-19 pandemic. The network performance between cloud regions and access networks directly impacts application performance and users' quality of experience (QoE). However, the location and network connectivity of vantage points often limit the visibility of edge-based measurement platforms (e.g., RIPE Atlas).

We designed and implemented the CLoud-based Applications Speed Platform (CLASP) to measure performance from virtual machines in cloud regions to various access networks, using speed test servers that have been widely deployed on the Internet. In our five-month longitudinal measurements on Google Cloud Platform (GCP), we found that 30-70% of the ISPs we measured showed severe throughput degradation from the peak throughput of the day.

Cloudy with a chance of short RTTs: analyzing cloud connectivity in the internet

Cloud computing has seen continuous growth over the last decade. The recent rise in popularity of next-generation applications brings forth the question: "Can current cloud infrastructure support the low latency requirements of such apps?" Specifically, the contribution of the wireless last mile, and of cloud operators' investments in direct peering agreements with ISPs worldwide, to current cloud reachability and latency has remained largely unexplored.

This paper investigates the state of end-user to cloud connectivity over wireless media through extensive measurements over six months. We leverage 115,000 wireless probes on the Speedchecker platform and 195 cloud regions from 9 well-established cloud providers. We evaluate the suitability of current cloud infrastructure to meet the needs of emerging applications and highlight various hindering pressure points. We also compare our results to a previous study over RIPE Atlas. Our key findings are: (i) geographical distance to the datacenter has the largest impact on latency; (ii) the choice of a measurement platform can significantly influence the results; (iii) wireless last-mile access contributes significantly to the overall latency, almost surpassing the impact of geographical distance in many cases. We also observe that cloud providers with their own private network backbone and direct peering agreements with serving ISPs offer noticeable improvements in latency, especially in its consistency over longer distances.

SESSION: Measurement and modeling

Unbiased experiments in congested networks

When developing a new networking algorithm, it is established practice to run a randomized experiment, or A/B test, to evaluate its performance. In an A/B test, traffic is randomly allocated between a treatment group, which uses the new algorithm, and a control group, which uses the existing algorithm. However, because networks are congested, both treatment and control traffic compete against each other for resources in a way that biases the outcome of these tests. This bias can have a surprisingly large effect; for example, in lab A/B tests with two widely used congestion control algorithms, the treatment appeared to deliver 150% higher throughput when used by a few flows, and 75% lower throughput when used by most flows---despite the fact that the two algorithms have identical throughput when used by all traffic.

Beyond the lab, we show that A/B tests can also be biased at scale. In an experiment run in cooperation with Netflix, estimates from A/B tests mistake the direction of change of some metrics, miss changes in other metrics, and overestimate the size of effects. We propose alternative experiment designs, previously used in online platforms, to more accurately evaluate new algorithms and allow experimenters to better understand the impact of congestion on their tests.
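The competition bias described above can be illustrated with a toy bandwidth-sharing model (the weights and numbers below are illustrative assumptions, not the algorithms or results from the paper): two schemes deliver identical throughput when deployed alone, yet a mixed A/B test makes the more aggressive one look twice as fast.

```python
# Toy model of A/B test bias at a shared, congested bottleneck.
# A flow's throughput is proportional to its "aggressiveness" weight,
# a stand-in for how hard its congestion control pushes. (Hypothetical
# weights for illustration only.)

def per_flow_throughput(capacity, n_treatment, n_control,
                        w_treatment=2.0, w_control=1.0):
    """Per-flow throughput for each group under weighted sharing."""
    total = n_treatment * w_treatment + n_control * w_control
    return (capacity * w_treatment / total if n_treatment else None,
            capacity * w_control / total if n_control else None)

# Pure deployments: identical per-flow throughput (100/10 = 10 each).
t_all, _ = per_flow_throughput(100, 10, 0)
_, c_all = per_flow_throughput(100, 0, 10)
assert t_all == c_all == 10.0

# 50/50 A/B test: treatment appears 2x faster, purely from competition.
t_ab, c_ab = per_flow_throughput(100, 5, 5)
print(t_ab / c_ab)  # → 2.0
```

The bias arises because treatment and control flows compete for the same queue, so the measured difference reflects relative aggressiveness rather than standalone performance.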

Revisiting TCP congestion control throughput models & fairness properties at scale

Much of our understanding of congestion control algorithm (CCA) throughput and fairness is derived from models and measurements that (implicitly) assume congestion occurs in the last mile. That is, these studies evaluated CCAs in "small scale" edge settings at the scale of tens of flows and up to a few hundred Mbps bandwidths. However, recent measurements show that congestion can also occur at the core of the Internet on inter-provider links, where thousands of flows share high bandwidth links. Hence, a natural question is: Does our understanding of CCA throughput and fairness continue to hold at the scale found in the core of the Internet, with 1000s of flows and Gbps bandwidths?

Our preliminary experimental study finds that some expectations derived in the edge setting do not hold at scale. For example, using loss rate as a parameter to the Mathis model to estimate TCP NewReno throughput works well in edge settings, but does not provide accurate throughput estimates when thousands of flows compete at high bandwidths. In addition, BBR, which achieves good fairness at the edge when competing solely with other BBR flows, can become very unfair to other BBR flows at the scale of the core of the Internet. In this paper, we discuss these results and others, as well as key implications for future CCA analysis and evaluation.
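For reference, the Mathis model mentioned above estimates steady-state TCP Reno/NewReno throughput from packet size, round-trip time, and loss rate as BW ≈ (MSS/RTT) · sqrt(3/2) / sqrt(p). A minimal sketch (example parameter values are illustrative):

```python
import math

def mathis_throughput(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. steady-state TCP Reno throughput estimate:
    BW ≈ (MSS / RTT) * sqrt(3/2) / sqrt(p), in bytes per second."""
    return (mss_bytes / rtt_s) * math.sqrt(1.5 / loss_rate)

# e.g. 1460-byte MSS, 50 ms RTT, 1% loss:
bw = mathis_throughput(1460, 0.05, 0.01)
print(f"{bw * 8 / 1e6:.1f} Mbps")  # → 2.9 Mbps
```

The paper's point is that feeding an observed loss rate into this formula yields accurate estimates in edge settings but not when thousands of flows share Gbps links.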

Precise error estimation for sketch-based flow measurement

As a class of approximate measurement approaches, sketching algorithms have significantly improved the estimation of network flow information using limited resources. While these algorithms enjoy sound error-bound analysis under worst-case scenarios, their actual errors can vary significantly with the incoming flow distribution, making their traditional error bounds too "loose" to be useful in practice. In this paper, we propose a simple yet rigorous error estimation method to more precisely analyze the errors for posterior sketch queries by leveraging the knowledge from the sketch counters. This approach will enable network operators to understand how accurate the current measurements are and make appropriate decisions accordingly (e.g., identify potential heavy users or answer "what-if" questions to better provision resources). Theoretical analysis and trace-driven experiments show that our estimated bounds on sketch errors are much tighter than previous ones and match the actual error bounds in most cases.
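As a concrete instance of this class of algorithms, a minimal Count-Min sketch (one common sketch family, not necessarily the specific sketches studied) illustrates why worst-case bounds can be loose in practice: the classic guarantee allows an overestimate of up to (e/width) · N, but the actual error depends on the incoming flow distribution and is typically far smaller.

```python
import math
import random
from collections import Counter

class CountMin:
    """Minimal Count-Min sketch: depth rows of width counters."""
    def __init__(self, width, depth, seed=0):
        self.width, self.depth = width, depth
        rng = random.Random(seed)
        self.salts = [rng.getrandbits(32) for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]

    def add(self, key, count=1):
        for row, salt in zip(self.rows, self.salts):
            row[hash((salt, key)) % self.width] += count

    def query(self, key):
        # Take the minimum across rows; never underestimates.
        return min(row[hash((salt, key)) % self.width]
                   for row, salt in zip(self.rows, self.salts))

rng = random.Random(1)
stream = [rng.randint(1, 500) for _ in range(20000)]  # toy flow IDs
cm = CountMin(width=256, depth=4)
for flow in stream:
    cm.add(flow)

truth = Counter(stream)
bound = math.e / 256 * len(stream)  # classic (e/w)*N worst-case bound
avg_error = sum(cm.query(f) - truth[f] for f in truth) / len(truth)
print(f"avg overestimate {avg_error:.1f} vs worst-case bound {bound:.1f}")
```

Estimating the posterior error from the observed counter values themselves, as the paper proposes, closes exactly this gap between the worst-case bound and the error actually incurred.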

SESSION: Protocols

Who's got your mail?: characterizing mail service provider usage

E-mail has long been a critical component of daily communication and the core medium for modern business correspondence. While traditionally e-mail service was provisioned and implemented independently by each Internet-connected organization, increasingly this function has been outsourced to third-party services. As with many pieces of key communications infrastructure, such centralization can bring both economies of scale and shared failure risk. In this paper, we investigate this issue empirically --- providing a large-scale measurement and analysis of modern Internet e-mail service provisioning. We develop a reliable methodology to better map domains to mail service providers. We then use this approach to document the dominant and increasing role played by a handful of mail service providers and hosting companies over the past four years. Finally, we briefly explore the extent to which nationality (and hence legal jurisdiction) plays a role in such mail provisioning decisions.

Characterising the IETF through the lens of RFC deployment

Protocol standards, defined by the Internet Engineering Task Force (IETF), are crucial to the successful operation of the Internet. This paper presents a large-scale empirical study of IETF activities, with a focus on understanding collaborative activities, and how these underpin the publication of standards documents (RFCs). Using a unique dataset of 2.4 million emails, 8,711 RFCs and 4,512 authors, we examine the shifts and trends within the standards development process, showing how protocol complexity and the time to produce standards have increased. With these observations in mind, we develop statistical models to understand the factors that lead to successful uptake and deployment of protocols, deriving insights to improve the standardisation process.

Third time's not a charm: exploiting SNMPv3 for router fingerprinting

In this paper, we show that adoption of the SNMPv3 network management protocol standard offers a unique---but likely unintended---opportunity for remotely fingerprinting network infrastructure in the wild. Specifically, by sending unsolicited and unauthenticated SNMPv3 requests, we obtain detailed information about the configuration and status of network devices including vendor, uptime, and the number of restarts. More importantly, the reply contains a persistent and strong identifier that allows for lightweight Internet-scale alias resolution and dual-stack association. By launching active Internet-wide SNMPv3 scan campaigns, we show that our technique can fingerprint more than 4.6 million devices, of which around 350k are network routers. Not only is our technique lightweight and accurate, it is also complementary to existing alias resolution, dual-stack inference, and device fingerprinting approaches. Our analysis not only provides fresh insights into the router deployment strategies of network operators worldwide, but also highlights potential vulnerabilities of SNMPv3 as currently deployed.

SESSION: IOT and TLS

IoTLS: understanding TLS usage in consumer IoT devices

Consumer IoT devices are becoming increasingly popular, with most leveraging TLS to provide connection security. In this work, we study a large number of TLS-enabled consumer IoT devices to shed light on how effectively they use TLS, in terms of establishing secure connections and correctly validating certificates, and how observed behavior changes over time. To this end, we gather more than two years of TLS network traffic from IoT devices, conduct active probing to test for vulnerabilities, and develop a novel blackbox technique for exploring the trusted root stores in IoT devices by exploiting a side-channel through TLS Alert Messages. We find a wide range of behaviors across devices, with some adopting best security practices but most being vulnerable in one or more of the following ways: use of old/insecure protocol versions and/or ciphersuites, lack of certificate validation, and poor maintenance of root stores. Specifically, we find that at least 8 IoT devices still include distrusted certificates in their root stores, 11/32 devices are vulnerable to TLS interception attacks, and that many devices fail to adopt modern protocol features over time. Our findings motivate the need for IoT manufacturers to audit, upgrade, and maintain their devices' TLS implementations in a consistent and uniform way that safeguards all of their network traffic.

Tracing your roots: exploring the TLS trust anchor ecosystem

Secure TLS server authentication depends on reliable trust anchors. The fault intolerant design of today's system---where a single compromised trust anchor can impersonate nearly all web entities---necessitates the careful assessment of each trust anchor found in a root store. In this work, we present a first look at the root store ecosystem that underlies the accelerating deployment of TLS. Our broad collection of TLS user agents, libraries, and operating systems reveals a surprisingly condensed root store ecosystem, with nearly all user agents ultimately deriving their roots from one of three root programs: Apple, Microsoft, and NSS. This inverted pyramid structure further magnifies the importance of judicious root store management by these foundational root programs.

Our analysis of root store management presents evidence of NSS's relative operational agility, transparency, and rigorous inclusion policies. Unsurprisingly, all derivative root stores in our dataset (e.g., Linux distributions, Android, NodeJS) draw their roots from NSS. Despite this solid footing, derivative root stores display lax update routines and often customize their root stores in questionable ways. By scrutinizing these practices, we highlight two fundamental obstacles to existing NSS-derived root stores: rigid on-or-off trust and multi-purpose root stores. Taken together, our study highlights the concentration of root store trust in TLS server authentication, exposes questionable root management practices, and proposes improvements for future TLS root stores.

Open for hire: attack trends and misconfiguration pitfalls of IoT devices

Mirai and its variants have demonstrated the ease and devastating effects of exploiting vulnerable Internet of Things (IoT) devices. In many cases, the exploitation vector is not sophisticated; rather, adversaries exploit misconfigured devices (e.g., unauthenticated protocol settings or weak/default passwords). Our work aims at unveiling the state of IoT devices along with an exploration of the current attack landscape. In this paper, we perform an Internet-wide IPv4 scan to unveil 1.8 million misconfigured IoT devices that may be exploited to perform large-scale attacks. These results are filtered to exclude a total of 8,192 devices that we identify as honeypots during our scan. To study current attack trends, we deploy six state-of-the-art IoT honeypots for a period of one month. We gather a total of 200,209 attacks and investigate how adversaries leverage misconfigured IoT devices. In particular, we study different attack types, including denial of service, multistage attacks, and attacks from infected online hosts. Furthermore, we analyze data from a /8 network telescope covering a total of 81 billion requests towards IoT protocols (e.g., CoAP, UPnP). Combining knowledge from the aforementioned experiments, we identify 11,118 IP addresses (among the detected misconfigured IoT devices) that attacked our honeypot setup and the network telescope.

SESSION: Video

Can you see me now?: a measurement study of Zoom, Webex, and Meet

Since the outbreak of the COVID-19 pandemic, videoconferencing has become the default mode of communication in our daily lives at homes, workplaces and schools, and it is likely to remain an important part of our lives in the post-pandemic world. Despite its significance, there has not been any systematic study characterizing the user-perceived performance of existing videoconferencing systems beyond anecdotal reports. In this paper, we present a detailed measurement study that compares three major videoconferencing systems: Zoom, Webex and Google Meet. Our study is based on 48 hours' worth of more than 700 videoconferencing sessions, which were created with a mix of emulated videoconferencing clients deployed in the cloud, as well as real mobile devices running from a residential network. We find that the existing videoconferencing systems vary in terms of geographic scope, which in turn determines the streaming lag experienced by users. We also observe that streaming rate can change under different conditions (e.g., number of users in a session, mobile device status, etc.), which affects user-perceived streaming quality. Beyond these findings, our measurement methodology can enable reproducible benchmark analysis for any type of comparative or longitudinal study on available videoconferencing systems.

Measuring the performance and network utilization of popular video conferencing applications

Video conferencing applications (VCAs) have become a critical Internet application during the COVID-19 pandemic, as users worldwide now rely on them for work, school, and telehealth. It is thus increasingly important to understand the resource requirements of different VCAs and how they perform under different network conditions, including: how do application-layer performance metrics (e.g., resolution or frames per second) vary under different link capacities; how do VCAs perform under temporary reductions in available capacity; how do they compete with themselves, with each other, and with other applications; and how does usage modality (e.g., gallery vs. speaker mode) affect utilization. We study three modern VCAs: Zoom, Google Meet, and Microsoft Teams. Answers to these questions differ substantially depending on the VCA. First, the average utilization on an unconstrained link varies between 0.8 Mbps and 1.9 Mbps. Given a temporary reduction in capacity, some VCAs can take as long as 50 seconds to recover to steady state. Differences in proprietary congestion control algorithms also result in unfair bandwidth allocations: in constrained bandwidth settings, one Zoom video conference can consume more than 75% of the available bandwidth when competing with another VCA (e.g., Meet, Teams). For some VCAs, client utilization can decrease as the number of participants increases, due to the reduced video resolution of each participant's video stream given a larger number of participants. Finally, one participant's viewing mode (e.g., pinning a speaker) can affect the upstream utilization of other participants.

The shape of view: an alert system for video viewership anomalies

Internet video providers rely on alerting workflows to identify and remedy incidents that can impact users (e.g., outages or buggy players). There is growing evidence of the need for viewership-based analytics---detecting and diagnosing incidents that manifest through changes in viewership patterns but not in other (e.g., QoE) metrics. However, both detection and diagnosis of viewership anomalies are challenging due to the contextual nature of anomalies, the non-stationarity of viewership, and complex dependencies between the structure of events and how they impact different subpopulations of viewers. We present Proteas, an alerting framework for video viewership anomalies that tackles these challenges. Proteas builds on key spatiotemporal structural insights. First, across different sub-populations of viewers and days of the week, we find that the shape of the viewership curve remains invariant over multiple weeks, thus enabling anomaly detection. Second, we use the hierarchy of viewership groups to produce compact alerts. Finally, we find that common anomalies manifest with spatiotemporal signatures, which enables us to classify anomalies to produce actionable alerts. We evaluate Proteas using 3 months of real viewership data (including the onset of the COVID-19 pandemic) and show that Proteas is accurate, with a true positive rate of over 80% and an average precision of over 86% (i.e., few false positives), and that it does not miss any major events. In addition, we find that approximately half of Proteas's alerts refer to events not caught by other alerting workflows, thus adding value to operators' existing toolkit.

SESSION: HTTP and QUIC

It's over 9000: analyzing early QUIC deployments with the standardization on the horizon

After nearly five years and 34 draft versions, standardization of the new connection-oriented transport protocol QUIC was finalized in May 2021. Designed as a fundamental network protocol with increased complexity due to the combination of functionality from multiple network stack layers, it has the potential to drastically influence the Internet ecosystem. Nevertheless, even in its early stages, the protocol attracted a variety of parties including large providers. Our study shows that more than 2.3M IPv4 and 300k IPv6 addresses support QUIC, hosting more than 30M domains.

Using our newly implemented stateful QUIC scanner (QScanner), we successfully scan 26M targets. We show that TLS, as an integral part of QUIC, is configured similarly between QUIC and TLS-over-TCP stacks for the same target. In contrast, we identify 45 widely varying transport parameter configurations, e.g., with differences of orders of magnitude in performance-relevant parameters. Combining these configurations with HTTP Server header values and associated domains reveals two large edge deployments from Facebook and Google. Thus, while the QUIC deployments we found are located in 4,667 autonomous systems, many of these are again operated by large providers.

In our experience, IETF QUIC already enjoys an advanced deployment status, driven mainly by large providers. We argue that the current deployment state and the diversity of existing implementations and observed configurations underscore the importance of QUIC as a future research topic. In this work, we provide and evaluate a versatile tool set to identify QUIC-capable hosts and their properties.

Besides the stateful QScanner, we present and analyze a newly implemented IPv4 and IPv6 ZMap module. We compare it to additional detection methods based on HTTP Alternative Service header values from HTTP handshakes and on DNS scans of the newly drafted HTTPS DNS resource record. While each method reveals unique deployments, the latter would allow lightweight scans to detect QUIC-capable targets but is drastically biased towards Cloudflare.

Web censorship measurements of HTTP/3 over QUIC

Web traffic censorship limits free access to information, making it a global human rights issue. The introduction of HTTP/3 (HTTP over QUIC) yields promising expectations to counteract such interference, due to its novelty, built-in encryption, and faster connection establishment. To evaluate this hypothesis and analyze the current state of HTTP/3 blocking, we extended the open-source censorship measurement tool OONI with an HTTP/3 module. Using an input list of possibly-blocked websites, real-world measurements with HTTPS and HTTP/3 were conducted in selected Autonomous Systems in China, Iran, India, and Kazakhstan. The presented evaluation assesses the blocking methodologies employed for TCP/TLS versus those employed for QUIC. The results reveal dedicated UDP blocking in Iran and major IP blocklisting affecting QUIC in China and India.

QUICsand: quantifying QUIC reconnaissance scans and DoS flooding events

In this paper, we present first measurements of Internet background radiation originating from the emerging transport protocol QUIC. Our analysis is based on the UCSD network telescope, correlated with active measurements. We find that research projects dominate the QUIC scanning ecosystem but also discover traffic from non-benign sources. We argue that although QUIC has been carefully designed to restrict reflective amplification attacks, the QUIC handshake is prone to resource exhaustion attacks, similar to TCP SYN floods. We confirm this conjecture by showing how this attack vector is already exploited in multi-vector attacks: on average, the Internet is exposed to four QUIC floods per hour, and half of these attacks occur concurrently with other common attack types such as TCP/ICMP floods.

Sharding and HTTP/2 connection reuse revisited: why are there still redundant connections?

HTTP/2 and HTTP/3 avoid concurrent connections and instead multiplex requests over a single connection. Besides enabling new features, this reduces overhead and enables fair bandwidth sharing. Redundant connections should hence be a story of the past with HTTP/2. However, they still exist, potentially hindering innovation and performance. Thus, we measure their spread and analyze their causes in this paper. We find that 36%-72% of the 6.24M HTTP Archive websites and 78% of the Alexa Top 100k websites cause Chromium-based web browsers to open superfluous connections. We mainly attribute these to domain sharding, despite HTTP/2 efforts to revert it, and to DNS load balancing, but also to the Fetch Standard.

SESSION: Blockchain and DeFi

TopoShot: uncovering Ethereum's network topology leveraging replacement transactions

Ethereum relies on a peer-to-peer overlay network to propagate information. Knowledge of the Ethereum network topology holds the key to understanding Ethereum's security, availability, and user anonymity. However, an Ethereum network's topology is stored in individual nodes' internal routing tables; measuring it poses challenges and remains an open research problem in the existing literature.

This paper presents TopoShot, a new method that uniquely repurposes Ethereum's transaction replacement/eviction policies for topology measurement. TopoShot can be configured to support Geth, Parity, and other major Ethereum clients. As validated on local nodes, TopoShot achieves 100% measurement precision and high recall (88% to 97%). To efficiently measure the large Ethereum networks in the wild, we propose a non-trivial schedule to run pair-wise measurements in parallel. To enable ethical measurement on the Ethereum mainnet, we propose workload-adaptive configurations of TopoShot to minimize the service interruption to target nodes and networks.

We systematically measure a variety of Ethereum networks and obtain new knowledge including the full-network topology in major testnets (Ropsten, Rinkeby and Goerli) and critical sub-network topology in the mainnet. The results on testnets show interesting graph-theoretic properties; for example, all testnets exhibit graph modularity significantly lower than that of random graphs, implying resilience to network partitions. The mainnet results show biased neighbor selection strategies adopted by critical Ethereum services such as mining pools and transaction relays, implying a degree of centralization in real Ethereum networks.

Selfish & opaque transaction ordering in the Bitcoin blockchain: the case for chain neutrality

Most public blockchain protocols, including the popular Bitcoin and Ethereum blockchains, do not formally specify the order in which miners should select transactions from the pool of pending (or uncommitted) transactions for inclusion in the blockchain. Over the years, informal conventions or "norms" for transaction ordering have, however, emerged via the use of shared software by miners, e.g., the GetBlockTemplate (GBT) mining protocol in Bitcoin Core. Today, a widely held view is that Bitcoin miners prioritize transactions based on their offered "transaction fee-per-byte." Bitcoin users are, consequently, encouraged to increase the fees to accelerate the commitment of their transactions, particularly during periods of congestion. In this paper, we audit the Bitcoin blockchain and present statistically significant evidence of mining pools deviating from the norms to accelerate the commitment of transactions for which they have (i) a selfish or vested interest, or (ii) received dark-fee payments via opaque (non-public) side-channels. As blockchains are increasingly being used as a record-keeping substrate for a variety of decentralized (financial technology) systems, our findings call for an urgent discussion on defining neutrality norms that miners must adhere to when ordering transactions in the chains. Finally, we make our data sets and scripts publicly available.
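The fee-per-byte "norm" against which the paper measures deviations amounts to a greedy ordering of pending transactions by offered fee rate. A minimal sketch (the txids, fees, and sizes below are made-up illustrative values):

```python
# The informal ordering "norm": prioritize pending transactions by
# offered fee-per-byte, highest first. Hypothetical toy mempool.

mempool = [
    {"txid": "a1", "fee_sats": 5000,  "size_bytes": 250},   # 20 sat/B
    {"txid": "b2", "fee_sats": 2000,  "size_bytes": 400},   #  5 sat/B
    {"txid": "c3", "fee_sats": 12000, "size_bytes": 300},   # 40 sat/B
]

def norm_order(txs):
    """Order transactions by fee rate, highest first."""
    return sorted(txs, key=lambda t: t["fee_sats"] / t["size_bytes"],
                  reverse=True)

print([t["txid"] for t in norm_order(mempool)])  # → ['c3', 'a1', 'b2']
```

The paper's audit looks for blocks whose actual transaction order deviates from this expected ordering in ways that favor the mining pool's own or dark-fee-paying transactions.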

An empirical study of DeFi liquidations: incentives, risks, and instabilities

Financial speculators often seek to increase their potential gains with leverage. Debt is a popular form of leverage, and with over 39.88B USD of total value locked (TVL), the Decentralized Finance (DeFi) lending markets are thriving. Debts, however, entail the risks of liquidation, the process of selling the debt collateral at a discount to liquidators. Nevertheless, few quantitative insights are known about the existing liquidation mechanisms.

In this paper, to the best of our knowledge, we are the first to study the breadth of the borrowing and lending markets of the Ethereum DeFi ecosystem. We focus on Aave, Compound, MakerDAO, and dYdX, which collectively represent over 85% of the lending market on Ethereum. Given extensive liquidation data measurements and insights, we systematize the prevalent liquidation mechanisms and are the first to provide a methodology to compare them objectively. We find that the existing liquidation designs incentivize liquidators well but sell excessive amounts of discounted collateral at the borrowers' expense. We measure various risks that liquidation participants are exposed to and quantify the instabilities of existing lending protocols. Moreover, we propose an optimal strategy that allows liquidators to increase their liquidation profit, which may aggravate the loss of borrowers.

SESSION: Performance and tools

Measuring DNS-over-HTTPS performance around the world

In recent years, DNS-over-HTTPS (DoH) has gained significant traction as a privacy-preserving alternative to unencrypted DNS. While several studies have measured DoH performance relative to traditional DNS and other encrypted DNS schemes, they are often incomplete, either conducting measurements from a single country or failing to compare encrypted DNS to default client behavior. To expand on existing research, we use the BrightData proxy network to gather a dataset consisting of 22,052 unique clients across 224 countries and territories. Our data shows that the performance impact of a switch to DoH is mixed, with a median slowdown of 65ms per query across a 10-query connection, but with 28% of clients receiving a speedup over that same interval. We compare four public DoH providers, noting that Cloudflare excels in both DoH resolution time (265ms) and global points-of-presence (146). Furthermore, we analyze geographic differences between DoH and Do53 resolution times, and provide analysis on possible causes, finding that clients from countries with low Internet infrastructure investment are almost twice as likely to experience a slowdown when switching to DoH as those with high Internet infrastructure investment. We conclude with possible improvements to the DoH ecosystem. We hope that our findings can help to inform continuing DoH deployments.

TRAGEN: a synthetic trace generator for realistic cache simulations

Traces from production caching systems of users accessing content are seldom made available to the public as they are considered private and proprietary. The dearth of realistic trace data makes it difficult for system designers and researchers to test and validate new caching algorithms and architectures. To address this key problem, we present TRAGEN, a tool that can generate a synthetic trace that is "similar" to an original trace from the production system in the sense that the two traces would result in similar hit rates in a cache simulation. We validate TRAGEN by first proving that the synthetic trace is similar to the original trace for caches of arbitrary size when the Least-Recently-Used (LRU) policy is used. Next, we empirically validate the similarity of the synthetic trace and original trace for caches that use a broad set of commonly-used caching policies that include LRU, SLRU, FIFO, RANDOM, MARKERS, CLOCK and PLRU. For our empirical validation, we use original request traces drawn from four different traffic classes from the world's largest CDN, each trace consisting of hundreds of millions of requests for tens of millions of objects. TRAGEN is publicly available and can be used to generate synthetic traces that are similar to actual production traces for a number of traffic classes such as videos, social media, web, and software downloads. Since the synthetic traces are similar to the original production ones, cache simulations performed using the synthetic traces will yield similar results to what might be attained in a production setting, making TRAGEN a key tool for cache system developers and researchers.
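The similarity criterion above can be made concrete with a minimal LRU simulator: two traces are "similar" in TRAGEN's sense if a function like the one below returns close hit rates for both traces across cache sizes. This is an illustrative sketch, not TRAGEN's implementation:

```python
from collections import OrderedDict

def lru_hit_rate(trace, cache_size):
    """Simulate an LRU cache over a request trace and return the hit rate.

    trace is a sequence of object identifiers; cache_size is the number
    of objects the cache can hold (for simplicity, all objects are
    treated as equal-sized).
    """
    cache = OrderedDict()  # insertion order doubles as recency order
    hits = 0
    for obj in trace:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)  # mark as most recently used
        else:
            cache[obj] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)
```

For example, `lru_hit_rate(["a", "b", "a", "c", "a", "b"], 2)` yields 2 hits out of 6 requests; a synthetic trace would be judged similar to the original if it produced a comparable hit rate at every cache size.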

HLISA: towards a more reliable measurement tool

Automated browsers (web bots) are an invaluable tool for studying the web. However, research has shown that web bots can be distinguished from regular browsers and that they may be served different content as a consequence. This undermines their utility as a measurement tool. So far, three methods have been used to detect web bots: the browser fingerprint, the order of site traversal, and aspects of page interaction.

While site traversal depends on the study being executed, the other two aspects can be controlled in a generic fashion. Whereas the identifiability of web bot fingerprints has been studied in the past, how to alter the fingerprint has received less attention. In this paper, we study which method of altering the fingerprint incurs the fewest side effects. Secondly, we provide an initial investigation of how the interaction API of Selenium differs from human interaction. We incorporate the latter results into HLISA, an API that simulates human-like interaction. Finally, we discuss the conceptual arms race between simulators and detectors and find that, conceptually, detecting HLISA requires modelling human interaction.

SESSION: DNS and attacks

Home is where the hijacking is: understanding DNS interception by residential routers

DNS interception --- when a user's DNS queries to a target resolver are intercepted en route and forwarded to a different resolver --- is a phenomenon of concern to both researchers and Internet users because of its implications for security and privacy. While the prevalence of DNS interception has received some attention, less is known about where in the network interception takes place. We introduce methods to identify where DNS interception occurs and who the interceptors may be. We identify when interception is performed before the query exits the ISP, and even when it is performed by the Customer Premises Equipment (CPE) in the user's own home. We believe that these techniques are vital in the light of the ongoing debate concerning the value of privacy-enhancing DNS transport.

TsuNAME: exploiting misconfiguration and vulnerability to DDoS DNS

The Internet's Domain Name System (DNS) is a part of every web request and e-mail exchange, so DNS failures can be catastrophic, taking out major websites and services. This paper identifies TsuNAME, a vulnerability where some recursive resolvers can greatly amplify queries, potentially resulting in a denial-of-service to DNS services. TsuNAME is caused by cyclic dependencies in DNS records: a recursive resolver repeatedly follows these cycles, and, coupled with insufficient caching and application-level retries, greatly amplifies an initial query, stressing authoritative servers. Although issues with cyclic dependencies are not new, the scale of amplification has not previously been understood. We document real-world events in .nz (a country-level domain), where two misconfigured domains resulted in a 50% increase in overall traffic. We reproduce and document root causes of this event through experiments, and demonstrate a 500× amplification factor. In response to our disclosure, several DNS software vendors have documented their mitigations, including Google Public DNS and Cisco OpenDNS. For operators of authoritative DNS services we have developed and released CycleHunter, an open-source tool that detects cyclic dependencies and prevents attacks. We use CycleHunter to evaluate roughly 184 million domain names in 7 large, top-level domains (TLDs), finding 44 cyclic dependent NS records used by 1.4k domain names. The TsuNAME vulnerability is weaponizable, since an adversary can easily create cycles to attack the infrastructure of a parent domain. Documenting this threat and its solutions is an important step toward ensuring it is fully addressed.
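The cyclic dependencies at the heart of TsuNAME can be detected with a simple graph-cycle search over NS delegations. The sketch below is a hypothetical minimal version of what a tool like CycleHunter does; the real tool builds the delegation map from live DNS data rather than a dictionary:

```python
def find_ns_cycles(delegations):
    """Return the set of zones that participate in an NS-delegation cycle.

    delegations maps a zone to the set of zones that host its NS records,
    e.g. {"example.com": {"example.org"}, "example.org": {"example.com"}}.
    A cycle means resolving either zone requires first resolving the other,
    which is the condition TsuNAME amplification exploits.
    """
    cycles = set()

    def visit(zone, stack):
        if zone in stack:
            # Everything from the first occurrence onward is on the cycle.
            cycles.update(stack[stack.index(zone):])
            return
        for dep in delegations.get(zone, ()):
            visit(dep, stack + [zone])

    for zone in delegations:
        visit(zone, [])
    return cycles
```

Running it on the two-zone example above flags both `example.com` and `example.org`, while a zone whose NS records live in an independently resolvable zone is left alone.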

The far side of DNS amplification: tracing the DDoS attack ecosystem from the internet core

In this paper, we shed new light on the DNS amplification ecosystem, by studying complementary data sources, bolstered by orthogonal methodologies. First, we introduce a passive attack detection method for the Internet core, i.e., at Internet eXchange Points (IXPs). Surprisingly, IXPs and honeypots observe mostly disjoint sets of attacks: 96% of IXP-inferred attacks were invisible to a sizable honeypot platform. Second, we assess the effectiveness of observed DNS attacks by studying IXP traces jointly with diverse data from independent measurement infrastructures. We find that attackers efficiently detect new reflectors and purposefully rotate between them. At the same time, we reveal that attackers are a small step away from bringing about significantly higher amplification factors (14×). Third, we identify and fingerprint a major attack entity by studying patterns in attack traces. We show that this entity dominates the DNS amplification ecosystem by carrying out 59% of the attacks, and provide an in-depth analysis of its behavior over time. Finally, our results reveal that operators of various .gov names do not adhere to DNSSEC key rollover best practices, which exacerbates amplification potential. We can verifiably connect this operational behavior to misuses and attacker decision-making.

SESSION: Information and misinformation

Throttling Twitter: an emerging censorship technique in Russia

In March 2021, the Russian government started to throttle Twitter on a national level, marking the first ever use of large-scale, targeted throttling for censorship purposes. The slowdown was intended to pressure Twitter to comply with content removal requests from the Russian government.

In this paper, we take a first look at this emerging censorship technique. We work with local activists in Russia to detect and measure the throttling and reverse engineer the throttler from in-country vantage points. We find that the throttling is triggered by Twitter domains in the TLS SNI extension, and the throttling limits both upstream and downstream traffic to a value between 130 kbps and 150 kbps by dropping packets that exceed this rate. We also find that the throttling devices appear to be located close to end-users, and that the throttling behaviors are consistent across different ISPs suggesting that they are centrally coordinated. Notably, this deployment marks a departure from Russia's previously decentralized model to a more centralized one that gives significant power to the authority to impose desired restrictions unilaterally. Russia's throttling of Twitter serves as a wake-up call to censorship researchers, and we hope to encourage future work in detecting and circumventing this emerging censorship technique.

Understanding engagement with U.S. (mis)information news sources on Facebook

Facebook has become an important platform for news publishers to promote their work and engage with their readers. Some news pages on Facebook have a reputation for consistently low factualness in their reporting, and there is concern that Facebook allows their misinformation to reach large audiences. To date, there is remarkably little empirical data about how often users "like," comment on, and share content from news pages on Facebook, how user engagement compares between sources that have a reputation for misinformation and those that do not, and how the political leaning of the source impacts the equation. In this work, we propose a methodology to generate a list of news publishers' official Facebook pages annotated with their partisanship and (mis)information status based on third-party evaluations, and collect engagement data for the 7.5M posts that 2,551 U.S. news publishers made on their pages during the 2020 U.S. presidential election. We propose three metrics to study engagement: (1) across the Facebook news ecosystem, (2) between (mis)information providers and their audiences, and (3) with individual pieces of content from (mis)information providers. Our results show that misinformation news sources receive widespread engagement on Facebook, accounting for 68.1% of all engagement with far-right news providers, followed by 37.7% on the far left. Individual posts from misinformation news providers receive consistently higher median engagement than non-misinformation posts in every partisanship group. While most prevalent on the far right, misinformation appears to be an issue across the political spectrum.

Unique on Facebook: formulation and evidence of (nano)targeting individual users with non-PII data

The privacy of an individual is bounded by the ability of a third party to reveal their identity. Certain data items such as a passport ID or a mobile phone number may be used to uniquely identify a person. These are referred to as Personally Identifiable Information (PII) items. Previous literature has also reported that, in datasets including millions of users, a combination of several non-PII items (which alone are not enough to identify an individual) can uniquely identify an individual within the dataset. In this paper, we define a data-driven model to quantify the number of interests from a user that make them unique on Facebook. To the best of our knowledge, this represents the first study of individuals' uniqueness at the world population scale. Moreover, users' interests are actionable non-PII items that can be used to define ad campaigns and deliver tailored ads to Facebook users. We run an experiment through 21 Facebook ad campaigns that target three of the authors of this paper to prove that, if an advertiser knows enough interests from a user, the Facebook Advertising Platform can be systematically exploited to deliver ads exclusively to a specific user. We refer to this practice as nanotargeting. Finally, we discuss the harmful risks associated with nanotargeting, such as psychological persuasion, user manipulation, or blackmailing, and provide easily implementable countermeasures to preclude attacks based on nanotargeting campaigns on Facebook.

SESSION: COVID and current affairs

Locked-in during lock-down: undergraduate life on the internet in a pandemic

Governments around the world enacted stay-at-home orders in response to the COVID-19 pandemic, which changed many aspects of life, including how people interacted with the Internet. These draconian restrictions on in-person social interactions were perhaps most acutely felt by people living alone. We study the changes in network traffic of one such population, students remaining in the (single-occupancy) on-campus dormitories at a large residential educational institution during the onset and initial few months of the lock-down. Specifically, we analyze how students shifted their online work and leisure behaviors at an application level. Further, we segment the population into domestic and international students, and find that even within these two broad sub-populations, there are significant differences in Internet-based behavior. Our work provides a focused lens on pandemic Internet usage, examining both 1) a concentrated user population and 2) the differing impacts of a global pandemic on disparate sub-populations.

Networked systems as witnesses: association between content demand, human mobility and an infection spread

While non-pharmaceutical interventions (NPIs) such as stay-at-home, shelter-in-place, and school closures are considered the most effective ways to limit the spread of infectious diseases, their use is generally controversial given the political, ethical, and socioeconomic issues they raise. Part of the challenge is the non-obvious link between the level of compliance with such measures and their effectiveness.

In this paper, we argue that users' demand on networked services can act as a proxy for the social distancing behavior of communities, offering a new approach to evaluate these measures' effectiveness. We leverage the vantage point of one of the largest worldwide CDNs together with publicly available datasets of mobile users' behavior, to examine the relationship between changes in user demand on the CDN and different interventions including stay-at-home/shelter-in-place, mask mandates, and school closures. As networked systems become integral parts of our everyday lives, they can act as witnesses of our individual and collective actions. Our study illustrates the potential value of this new role.

Polls, clickbait, and commemorative $2 bills: problematic political advertising on news and media websites around the 2020 U.S. elections

Online advertising can be used to mislead, deceive, and manipulate Internet users, and political advertising is no exception. In this paper, we present a measurement study of online advertising around the 2020 United States elections, with a focus on identifying dark patterns and other potentially problematic content in political advertising. We scraped ad content on 745 news and media websites from six geographic locations in the U.S. from September 2020 to January 2021, collecting 1.4 million ads. We perform a systematic qualitative analysis of political content in these ads, as well as a quantitative analysis of the distribution of political ads on different types of websites. Our findings reveal the widespread use of problematic tactics in political ads, such as bait-and-switch ads formatted as opinion polls to entice users to click, the use of political controversy by content farms for clickbait, and the more frequent occurrence of political ads on highly partisan news websites. We make policy recommendations for online political advertising, including greater scrutiny of non-official political ads and comprehensive standards across advertising platforms.

SESSION: Web

Who you gonna call?: an empirical evaluation of website security.txt deployment

The security.txt proposed standard allows organizations to define how security researchers should disclose security issues. While it is still proceeding through the final stages of standardization, major online services have already adopted the standard (such as Google, Facebook, LinkedIn, and GitHub). In this work, we conduct an empirical investigation into how websites are deploying security.txt. We first monitor security.txt adoption over a 15-month period, identifying the level of deployment for top websites. We also characterize the information being provided through security.txt and issues present in the provided data. Ultimately, our analysis sheds light on how the security.txt mechanism manifests in practice and its implications for vulnerability reporting, particularly for large-scale automated notification campaigns.

Understanding the performance of WebAssembly applications

WebAssembly is the newest language to arrive on the web. It features a compact binary format, making it fast to load and decode. While WebAssembly is generally expected to be faster than JavaScript, there have been mixed results as to which is faster, and little research has been done to understand WebAssembly's performance benefits. In this paper, we conduct a systematic study to understand the performance of WebAssembly applications and compare it with JavaScript. Our measurements were performed on three sets of subject programs with diverse settings. Among others, our findings include: (1) WebAssembly compilers are commonly built atop LLVM, whose optimizations are not tailored for WebAssembly. We show that these optimizations often become ineffective for WebAssembly, leading to counter-intuitive results. (2) JIT optimization has a significant impact on JavaScript performance. However, no substantial performance increase was observed for WebAssembly with JIT. (3) The performance of WebAssembly and JavaScript varies substantially depending on the execution environment. (4) WebAssembly uses significantly more memory than its JavaScript counterparts. We hope that our findings can help WebAssembly tooling developers identify optimization opportunities. We also report the challenges encountered when compiling C benchmarks to WebAssembly and discuss our solutions.

Knock and talk: investigating local network communications on websites

Modern webpages are amalgamations of resources requested from various public Internet services. In principle though, webpages can also request resources from localhost and devices in the LAN, providing a degree of internal network access to external entities. Prior work has demonstrated how this access can be used for supporting web attacks, particularly for profiling and fingerprinting users.

In this paper, we empirically investigate if and how popular websites are interacting with their visitors' localhost and LAN resources, and compare the behavior observed to that from known malicious websites. We crawl and monitor the network requests made by the landing pages of domains in the Tranco top 100K domains as well as ~145K websites that are known to be related to malware, phishing, or abuse. For both popular and malicious sites, we detect over 100 sites in each category making requests to internal network destinations, including several highly-ranked sites (within the top 10K). Investigating these sites in-depth, we identify that over 40% of the ones from the top 100K list do so to conduct host profiling, purportedly for fraud and bot detection. We also uncover cases of legitimate native application communication and likely developer errors. For malicious sites, we do not detect cases of internal network attacks. Rather, we believe that the malicious sites generating local network traffic are compromised or cloned phishing websites and that the traffic results from the corresponding benign sites. We observe significantly more local activity when on the Windows OS, compared to Linux or Mac OS X, as well as extensive use of WebSockets, which are not bound by the Same-Origin Policy. Ultimately, our exploration provides empirical grounding on the localhost and LAN network activities of websites, revealing both intentional and unintentional behavior.
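Flagging which request targets count as "internal network destinations" can be sketched with Python's standard `ipaddress` module. The helper name and the `.local` heuristic below are illustrative assumptions, not the paper's implementation:

```python
import ipaddress

def is_internal(host):
    """Classify a request target as localhost/LAN rather than public Internet.

    Covers loopback (127.0.0.0/8), RFC 1918 private ranges, and link-local
    addresses; falls back to simple name heuristics for non-IP hosts.
    """
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: treat localhost and mDNS-style names as internal.
        return host == "localhost" or host.endswith(".local")
    return ip.is_private or ip.is_loopback or ip.is_link_local
```

A crawler monitoring a page's network requests would run each request's host through a check like this to separate LAN/localhost traffic from ordinary web resources.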

TrackerSift: untangling mixed tracking and functional web resources

Trackers have recently started to mix tracking and functional resources to circumvent privacy-enhancing content blocking tools. Such mixed web resources put content blockers in a bind: they risk breaking legitimate functionality if they act and risk missing privacy-invasive advertising and tracking if they do not. In this paper, we propose TrackerSift to progressively classify and untangle mixed web resources (that combine tracking and legitimate functionality) at multiple granularities of analysis (domain, hostname, script, and method). Using TrackerSift, we conduct a large-scale measurement study of such mixed resources on 100K websites. We find that more than 17% of domains, 48% of hostnames, 6% of scripts, and 9% of methods observed in our crawls combine tracking and legitimate functionality. While mixed web resources are prevalent across all granularities, TrackerSift is able to attribute 98% of the script-initiated network requests to either tracking or functional resources at the finest method-level granularity. Our analysis shows that mixed resources at different granularities are typically served from CDNs or as in-lined and bundled scripts, and that blocking them indeed results in breakage of legitimate functionality. Our results highlight opportunities for finer-grained content blocking to remove mixed resources without breaking legitimate functionality.

SESSION: Autonomous systems and BGP

AS-level BGP community usage classification

BGP communities are a popular mechanism used by network operators for traffic engineering, blackholing, and to realize network policies and business strategies. In recent years, many research works have contributed to our understanding of how BGP communities are utilized, as well as how they can reveal secondary insights into real-world events such as outages and security attacks. However, one fundamental question remains unanswered: "Which ASes tag announcements with BGP communities and which remove communities in the announcements they receive?" A grounded understanding of where BGP communities are added or removed can help better model and predict BGP-based actions in the Internet and characterize the strategies of network operators.

In this paper we develop, validate, and share data from the first algorithm that can infer BGP community tagging and cleaning behavior at the AS-level. The algorithm is entirely passive and uses BGP update messages and snapshots, e.g. from public route collectors, as input. First, we quantify the correctness and accuracy of the algorithm in controlled experiments with simulated topologies. To validate in the wild, we announce prefixes with communities and confirm that more than 90% of the ASes that we classify behave as our algorithm predicts. Finally, we apply the algorithm to data from four sets of BGP collectors: RIPE, RouteViews, Isolario, and PCH. Tuned conservatively, our algorithm ascribes community tagging and cleaning behaviors to more than 13k ASes, the majority of which are large networks and providers. We make our algorithm and inferences available as a public resource to the BGP research community.

The parallel lives of autonomous systems: ASN allocations vs. BGP

Autonomous Systems (ASes) exist in two dimensions on the Internet: the administrative and the operational one. Regional Internet Registries (RIRs) govern the former, while BGP governs the latter. In this work, we reconstruct the lives of the ASes on both dimensions, performing a joint analysis that covers 17 years of data. For the administrative dimension, we leverage delegation files published by RIRs to report the daily status of Internet resources they allocate. For the operational dimension, we characterize the temporal activity of ASNs in the Internet control plane using BGP data collected by the RouteViews and RIPE RIS projects. We present a methodology to extract insights about AS life cycles, including dealing with pitfalls affecting authoritative public datasets. We then perform a joint analysis to establish the relationship (or lack thereof) between these two dimensions for all allocated ASNs and all ASNs visible in BGP. We characterize the usual behaviors, specific differences between RIRs and historical resources, as well as measure the discrepancies between the two "parallel" lives. We find discrepancies and misalignments that reveal useful insights, and we highlight through examples the potential of this new lens to help pinpoint malicious BGP activity and various types of misconfigurations. This study illuminates a largely unexplored aspect of the Internet global routing system and provides methods and data to support broader studies that relate to security, policy, and network management.

How biased is our validation (data) for AS relationships?

The business relationships between Autonomous Systems (ASes) can provide fundamental insights into the Internet's routing ecosystem. Throughout the last two decades, many works have focused on how to improve the inference of those relationships. Yet, it has proven difficult to assemble extensive ground-truth data sets for validation. Therefore, more recent works rely entirely on relationships extracted from BGP communities to serve as "best-effort" ground-truth. In this paper, we highlight the shortcomings of this trend. We show that the best-effort validation data does not cover relationships between ASes within the Latin American (LACNIC) service region even though ~14% of all inferred relationships are from that region. We further show that the overall precision of 96-98% for peering relationships achieved by three of the most prominent algorithms can drop by 14-25% when considering only peering relationships between Tier-1 and other transit providers. Finally, we discuss potential ways to overcome the presented challenges in the future.

SESSION: Analyzing platforms and applications

A large-scale characterization of online incitements to harassment across platforms

Attack strategies used by online harassers have evolved over time to inflict increasing harm to their targets. In addition to scaling harassment through incitement and coordination, online communities that commonly engage in harassment are likely a source of "innovation" for harassment attack strategies. We use the incitements or calls to harassment posted by members of these communities as a lens through which to holistically measure and understand this ecosystem. We create a filtering pipeline to discover 14,679 incitements to harassment within four large-scale data sets of messages and posts that span multiple platforms.

Our approach studies the coordination itself, detecting inciting language, rather than individual attack types, to understand a broad range of harassment strategies. In particular, this approach allows us to create a taxonomy of attack strategies. We use this taxonomy to categorize the preferred approaches of coordinated attackers and the proportion of incitements for various types of harassment on different platforms. We find that over 50% of the incitements to harassment included calls to report the target to authorities or their respective platforms. Finally, we provide suggestions for actions and future research that could be performed by researchers, platforms, authorities, and anti-harassment groups.

RacketStore: measurements of ASO deception in Google play via mobile and app usage

Online app search optimization (ASO) platforms, which provide bulk installs and fake reviews to paying app developers in order to fraudulently boost their search rank in app stores, have been shown to employ diverse and complex strategies that successfully evade state-of-the-art detection methods. In this paper we introduce RacketStore, a platform to collect data from the Android devices of participating ASO providers and regular users, on their interactions with apps which they install from the Google Play Store. We present measurements from a study of 943 installs of RacketStore on 803 unique devices controlled by ASO providers and regular users, consisting of 58,362,249 data snapshots collected from these devices, the 12,341 apps installed on them, and their 110,511,637 Google Play reviews. We reveal significant differences between ASO providers and regular users in terms of the number and types of user accounts registered on their devices, the number of apps they review, and the intervals between the installation times of apps and their review times. We leverage these insights to introduce features that model the usage of apps and devices, and show that they can train supervised learning algorithms to detect paid app installs and fake reviews with an F1-measure of 99.72% (AUC above 0.99), and detect devices controlled by ASO providers with an F1-measure of 95.29% (AUC = 0.95). We discuss the costs associated with evading detection by our classifiers and also the potential for app stores to use our approach to detect ASO work in a privacy-preserving manner.

Smart at what cost?: characterising mobile deep neural networks in the wild

With smartphones' omnipresence in people's pockets, Machine Learning (ML) on mobile is gaining traction as devices become more powerful. With applications ranging from visual filters to voice assistants, intelligence on mobile comes in many forms and facets. However, Deep Neural Network (DNN) inference remains a compute intensive workload, with devices struggling to support intelligence at the cost of responsiveness. On the one hand, there is significant research on reducing model runtime requirements and supporting deployment on embedded devices. On the other hand, the strive to maximise the accuracy of a task is supported by deeper and wider neural networks, making mobile deployment of state-of-the-art DNNs a moving target.

In this paper, we perform the first holistic study of DNN usage in the wild in an attempt to track deployed models and measure how they run on widely deployed devices. To this end, we analyse over 16k of the most popular apps in the Google Play Store to characterise their DNN usage and performance across devices of different capabilities, both across tiers and generations. Simultaneously, we measure the models' energy footprint, as a core cost dimension of any mobile deployment. To streamline the process, we have developed gaugeNN, a tool that automates the deployment, measurement and analysis of DNNs on devices, with support for different frameworks and platforms. Results from our study paint the landscape of deep learning deployments on smartphones and indicate their popularity across app developers. Furthermore, our study shows the gap between bespoke techniques and real-world deployments and the need for optimised deployment of deep learning models in a highly dynamic and heterogeneous ecosystem.

SESSION: Autonomous systems 2 and name management

Risky BIZness: risks derived from registrar name management

In this paper, we explore a domain hijacking risk that is an accidental byproduct of undocumented operational practices between domain registrars and registries. We show how, over the last nine years, more than 512K domains have been implicitly exposed to the risk of hijacking, affecting names in most popular TLDs (including .com and .net) as well as legacy TLDs with tight registration control (such as .edu and .gov). Moreover, we show that this weakness has been actively exploited by multiple parties who, over the years, have assumed control over 163K domains without having any ownership interest in those names. In addition to characterizing the nature and size of this problem, we also report on the efficacy of the remediation in response to our outreach with registrars.

Identifying ASes of state-owned internet operators

In this paper we present and apply a methodology to accurately identify state-owned Internet operators worldwide and their Autonomous System Numbers (ASNs). Obtaining an accurate dataset of ASNs of state-owned Internet operators enables studies where state ownership is an important dimension, including research related to Internet censorship and surveillance, cyber-warfare and international relations, ICT development and digital divide, critical infrastructure protection, and public policy. Our approach is based on a multi-stage, in-depth manual analysis of datasets that are highly diverse in nature. We find that each of these datasets contributes in different ways to the classification process, and we identify limitations and shortcomings of these data sources. We obtain the first dataset of this type, make it available to the research community together with several lessons we learned in the process, and perform a preliminary analysis based on our data. We find that 53% (i.e., 123) of the world's countries are majority owners of Internet operators, highlighting that this is a widespread phenomenon. We also find and document the existence of subsidiaries of state-owned operators operating in foreign countries, an aspect that touches every continent and particularly affects Africa. We hope that this work and the associated dataset will inspire and enable a broad set of Internet measurement studies and interdisciplinary research.

ASdb: a system for classifying owners of autonomous systems

While Autonomous Systems (ASes) are crucial for routing Internet traffic, the organizations that own them are poorly understood. Regional Internet Registries (RIRs) inconsistently collect, release, and update basic AS organization information (e.g., website), and prior work provides only coarse-grained classification. Bootstrapping from RIR WHOIS data, we build ASdb, a system that uses data from established business intelligence databases and machine learning to accurately categorize ASes at scale. ASdb achieves 96% coverage of ASes, and 93% and 75% accuracy on 17 industry categories and 95 sub-categories, respectively. ASdb creates a richer, more accurate, comprehensive, and maintainable dataset cataloging AS-owning organizations. This system, and the resulting dataset, will allow researchers to better understand who owns the Internet, and to perform new forms of meaningful analysis and interpretation at scale.

SESSION: Characterizing and measuring networks 2

Inferring regional access network topologies: methods and applications

Using a toolbox of Internet cartography methods, and new ways of applying them, we have undertaken a comprehensive active measurement-driven study of the topology of U.S. regional access ISPs. We used state-of-the-art approaches in various combinations to accommodate the geographic scope, scale, and architectural richness of U.S. regional access ISPs. In addition to vantage points from research platforms, we used mobile devices at public WiFi hotspots and on public transit to acquire the visibility needed to thoroughly map access networks across regions. We observed many different approaches to aggregation and redundancy, across links, nodes, buildings, and at different levels of the hierarchy. One result is substantial disparity in latency from some Edge COs to their backbone COs, with implications for end users of cloud services. Our methods and results can inform future analysis of critical infrastructure, including resilience to disasters, persistence of the digital divide, and challenges for the future of 5G and edge computing.

Follow the scent: defeating IPv6 prefix rotation privacy

IPv6's large address space allows ample freedom for choosing and assigning addresses. To improve client privacy and resist IP-based tracking, standardized techniques leverage this large address space, including privacy extensions and provider prefix rotation. Ephemeral and dynamic IPv6 addresses confound not only tracking and traffic correlation attempts, but also traditional network measurements, logging, and defense mechanisms. We show that the intended anti-tracking capability of these widely deployed mechanisms is unwittingly subverted by edge routers using legacy IPv6 addressing schemes that embed unique identifiers.

We develop measurement techniques that exploit these legacy devices to make tracking such moving IPv6 clients feasible by combining intelligent search space reduction with modern high-speed active probing. Via an Internet-wide measurement campaign, we discover more than 9M affected edge routers and approximately 13k /48 prefixes employing prefix rotation in hundreds of ASes worldwide. We mount a six-week campaign to characterize the size and dynamics of these deployed IPv6 rotation pools, and demonstrate via a case study the ability to remotely track client address movements over time. We responsibly disclosed our findings to equipment manufacturers, at least one of which subsequently changed its default addressing logic.
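
A common legacy addressing scheme of the kind described above is modified EUI-64 (RFC 4291), in which the low 64 bits of an IPv6 address embed the interface's MAC address, with ff:fe inserted in the middle and the universal/local bit flipped. Because that identifier stays constant while the provider rotates the prefix, it can link a client across rotations. A minimal sketch of recovering such an identifier (the address below is an illustrative documentation-prefix example, not data from the paper):

```python
import ipaddress

def eui64_to_mac(addr: str):
    """If the IPv6 interface identifier follows modified EUI-64,
    recover the embedded MAC address; otherwise return None."""
    iid = ipaddress.IPv6Address(addr).packed[8:]  # low 64 bits
    if iid[3] != 0xFF or iid[4] != 0xFE:          # EUI-64 marker bytes
        return None
    first = iid[0] ^ 0x02                         # flip universal/local bit back
    mac = bytes([first]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)

# The same identifier persists even when the provider rotates the /64 prefix:
print(eui64_to_mac("2001:db8:1::211:22ff:fe33:4455"))  # 00:11:22:33:44:55
```

Matching on this stable suffix is what makes the search space reduction tractable: a prober need only sweep candidate prefixes for a known interface identifier rather than the full 2^64 identifier space.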

Towards identifying networks with internet clients using public data

Does an outage impact any users? Can a geolocation database known to be good at locating users and bad at infrastructure be trusted for a particular prefix? Is a content-heavy network likely to peer with a particular network? For these questions and many more, knowing which prefixes contain Internet users aids in interpreting Internet analysis. However, existing datasets of Internet activity are out of date, unvalidated, based on privileged data, or too coarse. As a step towards identifying which IP prefixes contain users, we present multiple novel techniques to identify which IP prefixes host web clients without relying on privileged data. Our techniques identify client activity in ASes responsible for 98.8% of Microsoft CDN traffic and in prefixes responsible for 95.2% of Microsoft CDN traffic. Less than 1% of prefixes identified by our technique as active do not contact Microsoft at all. We present measurements of Internet usage worldwide and sketch future directions for extending the techniques to measure relative activity levels across prefixes.

SESSION: Extra

Corrigendum: cloud provider connectivity in the flat internet

This corrigendum corrects and extends our results on the benefit of peer locking in mitigating the propagation of route leaks on the Internet, originally published in [2]. The updated results show even higher benefits of peer locking than originally reported, and an extended analysis covering additional peer locking deployment scenarios shows partial deployments also yield significant reduction in propagation of leaked routes. The original paper can be found at https://dl.acm.org/doi/10.1145/3419394.3423613.