CODASPY '22: Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy

SESSION: Keynote Talk 1

Session details: Keynote Talk 1

How (Not) to Deploy Cryptography on the Internet

The core protocols in the Internet infrastructure play a central role in delivering packets to their destination: inter-domain routing with BGP (Border Gateway Protocol) computes paths through the global Internet, and DNS (Domain Name System) looks up destination addresses. Due to their critical function, these protocols are often attacked: adversaries redirect victims to malicious servers or networks by making traffic traverse incorrect routes or reach incorrect destinations, e.g., for cyber-espionage, spam distribution, theft of crypto-currency, or censorship [1, 4-6]. Such attacks are relatively stealthy and cannot be immediately detected and prevented [2, 3]; by the time they are detected, the damage has already been done.

The frequent attacks, along with the devastating damage they incur, motivate the deployment of cryptographic defences to secure the Internet infrastructure. Multiple efforts are devoted to protecting the core Internet protocols with cryptographic mechanisms: BGP with RPKI and DNS with DNSSEC. Recently, the deployment of these defences has taken off, and many networks and DNS servers in the Internet have already adopted them. We review the deployed defences and show that the tradeoffs made by operators and developers can be exploited to disable the cryptographic protections. We also provide mitigations and discuss the challenges in their adoption.

SESSION: Keynote Talk 2

Session details: Keynote Talk 2

Predicting Asymptotic Behavior of Network Covert Channels: Experimental Results

The problem of covert communication via computer systems is almost as old as the problem of computer security itself. In the earliest years, covert communication was seen mainly as a theoretical problem. But as computer systems have become more complex and ubiquitous, covert communication has begun to see practical use, particularly in the last two decades (see, e.g., Mazurczyk et al. [MW19]). In this talk I will be reporting on the work we have been doing at NRL on evaluating the impact that existing research on the asymptotic behavior of covert channels has on embeddings in real-world channels.

SESSION: Session 1: Machine Learning and Security

Session details: Session 1: Machine Learning and Security

GINN: Fast GPU-TEE Based Integrity for Neural Network Training

Machine learning models based on Deep Neural Networks (DNNs) are increasingly deployed in a wide variety of applications, ranging from self-driving cars to COVID-19 diagnosis. To support the computational power necessary to train a DNN, cloud environments with dedicated Graphics Processing Unit (GPU) hardware support have emerged as critical infrastructure. However, there are many integrity challenges associated with outsourcing the computation to use GPU power, due to the GPU's inherent lack of safeguards to ensure computational integrity. Various approaches have been developed to address these challenges, building on trusted execution environments (TEE). Yet, no existing approach scales up to support realistic integrity-preserving DNN model training for heavy workloads (e.g., deep architectures and millions of training examples) without sustaining a significant performance hit. To mitigate the running time difference between pure TEE (i.e., full integrity) and pure GPU (i.e., no integrity), we combine random verification of selected computation steps with systematic adjustments of DNN hyperparameters (e.g., a narrow gradient clipping range), which limits the attacker's ability to shift the model parameters arbitrarily. Experimental analysis shows that the new approach can achieve a 2X to 20X performance improvement over a pure TEE-based solution while guaranteeing an extremely high probability of integrity (e.g., 0.999) with respect to state-of-the-art DNN backdoor attacks.
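The integrity guarantee rests on a simple sampling argument: if the attacker must tamper with many training steps to move the model parameters meaningfully (which the clipped hyperparameters enforce), then verifying even a modest random subset of steps catches the tampering with high probability. A minimal sketch of that calculation, with hypothetical step counts (the paper's exact analysis may differ); note that with these made-up numbers, verifying 7% of steps already lands in the 0.999 regime quoted above:

```python
# Probability that random verification catches at least one tampered step,
# sampling without replacement (hypergeometric tail). Parameters hypothetical.

def detection_probability(total_steps: int, tampered_steps: int, verified_steps: int) -> float:
    """P(at least one tampered step is in the verified sample)."""
    p_miss = 1.0
    for i in range(verified_steps):
        p_miss *= (total_steps - tampered_steps - i) / (total_steps - i)
    return 1.0 - p_miss

# Example: 10,000 training steps, attacker must tamper with 100 of them.
for sample in (100, 300, 700):
    p = detection_probability(10_000, 100, sample)
    print(f"verify {sample:>4} steps -> detection probability {p:.4f}")
```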

EG-Booster: Explanation-Guided Booster of ML Evasion Attacks

The widespread usage of machine learning (ML) in a myriad of domains has raised questions about its trustworthiness in high-stakes environments. Part of the quest for trustworthy ML is assessing robustness to test-time adversarial examples. In line with the trustworthy ML goal, a useful input to potentially aid robustness evaluation is feature-based explanations of model predictions. In this paper, we present a novel approach, called EG-Booster, that leverages techniques from explainable ML to guide adversarial example crafting for improved robustness evaluation of ML models. The key insight in EG-Booster is the use of feature-based explanations of model predictions to guide adversarial example crafting by adding consequential perturbations (likely to result in model evasion) and avoiding non-consequential perturbations (unlikely to contribute to evasion). EG-Booster is agnostic to model architecture and threat model, and supports diverse distance metrics used in the literature. We evaluate EG-Booster using image classification benchmark datasets: MNIST and CIFAR10. Our findings suggest that EG-Booster significantly improves the evasion rate of state-of-the-art attacks while performing fewer perturbations. Through extensive experiments that cover four white-box and three black-box attacks, we demonstrate the effectiveness of EG-Booster against two undefended neural networks trained on MNIST and CIFAR10, and an adversarially-trained ResNet model trained on CIFAR10. Furthermore, we introduce a stability assessment metric and evaluate the reliability of our explanation-based attack boosting approach by tracking the similarity between the model's predictions across multiple runs of EG-Booster. Our results over 10 separate runs suggest that EG-Booster's output is stable across distinct runs. Combined with state-of-the-art attacks, we hope EG-Booster will be used towards improved robustness assessment of ML models against evasion attacks.
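As a toy illustration of the key insight (ours, not the authors' code), consider a linear model whose weights serve as a global feature attribution, standing in for the model-agnostic explanations EG-Booster actually uses: keeping only the perturbation components whose direction agrees with the explanation achieves more evasion while touching fewer features.

```python
# Toy sketch of explanation-guided perturbation filtering (not the authors' code).
# A linear model's weights act as per-feature attributions; we keep only the
# perturbation components that push the score away from the true class.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=20)                    # "explanation": per-feature attribution
x = rng.normal(size=20)                    # benign input
y = np.sign(w @ x)                         # true label = sign of the score

delta = 0.3 * rng.normal(size=20)          # baseline attack perturbation
consequential = (np.sign(delta) * np.sign(w)) != y   # component helps evasion
guided_delta = np.where(consequential, delta, 0.0)   # drop non-consequential parts

print("baseline score:", w @ (x + delta))
print("guided score  :", w @ (x + guided_delta),
      f"({int(consequential.sum())} of {len(delta)} features perturbed)")
```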

Leveraging Synthetic Data and PU Learning For Phishing Email Detection

Imbalanced data classification has always been one of the most challenging problems in data science, especially in the cybersecurity field, where security datasets exhibit a heavily skewed ratio of benign to phishing examples. Even though there are many phishing detection methods in the literature, most of them neglect the imbalanced nature of phishing email datasets. In this paper, we examine the imbalance property by varying legitimate-to-phishing class ratios. We generate new synthetic instances using a generative adversarial network model for long sentences (LeakGAN) to balance out the training process and ameliorate its impact on classification. These synthetic instances are labeled by positive-unlabeled learning and added to the initial imbalanced training set. The resulting dataset is given to the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification. We compare several state-of-the-art methods from the literature against our approach, which achieves high performance across all imbalance ratios, reaching an F1-score of 99.6% for the most extreme imbalance ratio and an F1-score of 99.8% for balanced cases.

DP-UTIL: Comprehensive Utility Analysis of Differential Privacy in Machine Learning

Differential Privacy (DP) has emerged as a rigorous formalism to quantify the privacy protection provided by an algorithm that operates on privacy-sensitive data. In machine learning (ML), DP has been employed to limit inference/disclosure of training examples. Prior work leveraged DP across the ML pipeline, albeit in isolation, often focusing on mechanisms such as gradient perturbation.

In this paper, we present DP-UTIL, a holistic utility analysis framework of DP across the ML pipeline with a focus on input perturbation, objective perturbation, gradient perturbation, output perturbation, and prediction perturbation. Given an ML task on privacy-sensitive data, DP-UTIL enables an ML privacy practitioner to perform holistic comparative analysis on the impact of DP in these five perturbation spots, measured in terms of model utility loss, privacy leakage, and the number of truly revealed training samples.

We evaluate DP-UTIL over classification tasks on vision, medical, and financial datasets, using two representative learning algorithms (logistic regression and deep neural network) against the membership inference attack as a case study. One of the highlights of our results is that prediction perturbation consistently achieves the lowest utility loss on all models across all datasets. In logistic regression models, objective perturbation results in the lowest privacy leakage compared to other perturbation techniques. For deep neural networks, gradient perturbation results in the lowest privacy leakage. Moreover, our results on truly revealed records suggest that as privacy leakage increases, a differentially private model reveals a greater number of member samples. Overall, our findings suggest that to make informed decisions as to the choice of perturbation mechanisms, an ML privacy practitioner needs to examine the dynamics among optimization techniques (convex vs. non-convex), number of classes, and privacy budget.
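Of the five perturbation spots, prediction perturbation is the simplest to illustrate: noise is added only at inference time, to the model's output scores. A minimal, hypothetical sketch in the report-noisy-max style (not DP-UTIL's implementation, and assuming scores with bounded sensitivity):

```python
# Illustrative prediction perturbation: add Laplace noise to output scores
# before releasing the predicted label (not DP-UTIL's implementation).
import numpy as np

def perturb_prediction(scores: np.ndarray, epsilon: float, sensitivity: float = 1.0) -> int:
    """Release the argmax over Laplace-noised scores (report-noisy-max style)."""
    noise = np.random.laplace(scale=sensitivity / epsilon, size=scores.shape)
    return int(np.argmax(scores + noise))

scores = np.array([0.1, 0.7, 0.2])         # e.g., softmax output of a trained model
for eps in (0.1, 1.0, 10.0):
    labels = [perturb_prediction(scores, eps) for _ in range(1000)]
    print(f"epsilon={eps:>4}: fraction predicting class 1 =", labels.count(1) / 1000)
```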

SESSION: Session 2: Privacy

Session details: Session 2: Privacy

Privacy-Preserving Maximum Matching on General Graphs and its Application to Enable Privacy-Preserving Kidney Exchange

To this day, there are still some countries where the exchange of kidneys between multiple incompatible patient-donor pairs is restricted by law. Typically, legal regulations in this context are put in place to prohibit coercion and manipulation in order to prevent a market for organ trade. Yet, in countries where kidney exchange is practiced, existing platforms to facilitate such exchanges generally lack sufficient privacy mechanisms. In this paper, we propose a privacy-preserving protocol for kidney exchange that not only addresses the privacy problem of existing platforms but is also geared to lead the way in overcoming legal issues in those countries where kidney exchange is still not practiced. In our approach, we use the concept of secret sharing to distribute the medical data of patients and donors among a set of computing peers in a privacy-preserving fashion. These computing peers then execute our new Secure Multi-Party Computation (SMPC) protocol among each other to determine an optimal set of kidney exchanges. As part of our new protocol, we devise a privacy-preserving solution to the maximum matching problem on general graphs. We have implemented the protocol in the SMPC benchmarking framework MP-SPDZ and provide a comprehensive performance evaluation. Furthermore, we analyze the practicality of our protocol when used in a dynamic setting (where patients and donors arrive and depart over time) based on a data set from the United Network for Organ Sharing.
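The combinatorial core of the protocol is maximum matching on a general (non-bipartite) graph. In the clear, this is solved by Edmonds' blossom algorithm; the paper's contribution is evaluating it obliviously inside SMPC. A plaintext baseline with hypothetical compatibility data, using networkx, only shows the underlying problem:

```python
# Plaintext baseline of the underlying problem: maximum matching on a general
# compatibility graph (Edmonds' blossom algorithm via networkx). The paper's
# contribution is computing this *privately* inside an SMPC protocol.
import networkx as nx

G = nx.Graph()
# Nodes are incompatible patient-donor pairs; an edge means a mutual exchange
# between the two pairs is medically possible (hypothetical data).
G.add_edges_from([("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P4"), ("P4", "P5")])

matching = nx.max_weight_matching(G, maxcardinality=True)
print("exchanged pairs:", sorted(tuple(sorted(e)) for e in matching))
```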

Towards Automated Content-based Photo Privacy Control in User-Centered Social Networks

A large number of photos shared online often contain private user information, which can cause serious privacy breaches when viewed by unauthorized users. Thus, there is a need for more efficient privacy control that requires automatic detection of users' private photos. However, the automatic detection of users' private photos is a challenging task, since different users may have different privacy concerns and a generalized one-size-fits-all approach for private photo detection would not be suitable for most users. User-specific detection of private photos should, therefore, be investigated. Furthermore, for effective privacy control, the exact sensitive regions in private photos need to be pinpointed, so that sensitive content can be protected via different privacy control methods. In this paper, we propose a novel system, AutoPri, to enable automatic and user-specific content-based photo privacy control in online social networks. We collect a large dataset of 31,566 private and public photos from real-world users and present important observations on photo privacy concerns. Our system can automatically detect private photos in a user-specific manner using a detection model based on a multimodal variational autoencoder and pinpoint sensitive regions in private photos with an explainable deep learning-based approach. Our evaluations show that AutoPri can effectively determine user-specific private photos with high accuracy (94.32%) and pinpoint exact sensitive regions in them to enable effective privacy control in user-centered online social networks.

Genomic Data Sharing under Dependent Local Differential Privacy

Privacy-preserving genomic data sharing is prominent to increase the pace of genomic research, and hence to pave the way towards personalized genomic medicine. In this paper, we introduce (ε, T)-dependent local differential privacy (LDP) for privacy-preserving sharing of correlated data and propose a genomic data sharing mechanism under this privacy definition. We first show that the original definition of LDP is not suitable for genomic data sharing, and then we propose a new mechanism to share genomic data. The proposed mechanism considers the correlations in data during data sharing, eliminates statistically unlikely data values beforehand, and adjusts the probability distributions for each shared data point accordingly. By doing so, we show that we can prevent an attacker from inferring the correct values of the shared data points by utilizing the correlations in the data. By adjusting the probability distributions of the shared states of each data point, we also improve the utility of shared data for the data collector. Furthermore, we develop a greedy algorithm that strategically identifies the processing order of the shared data points with the aim of maximizing the utility of the shared data. Our evaluation results on a real-life genomic dataset show the superiority of the proposed mechanism compared to the randomized response mechanism (a widely used technique to achieve LDP).
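For reference, the randomized response baseline works as follows for a categorical data point with k states (e.g., a genotype in {0, 1, 2}): report the true value with probability e^ε/(e^ε + k - 1), and otherwise a uniformly random other value. A minimal sketch of that baseline (the paper's mechanism additionally prunes correlation-inconsistent states and reweights the remaining ones):

```python
# Minimal k-ary randomized response (the baseline the paper compares against).
# Reporting the true value with probability e^eps / (e^eps + k - 1) satisfies
# eps-LDP for a single data point.
import math, random

def randomized_response(true_value: int, k: int, eps: float) -> int:
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    if random.random() < p_true:
        return true_value
    return random.choice([v for v in range(k) if v != true_value])

random.seed(1)
reports = [randomized_response(2, k=3, eps=1.0) for _ in range(10_000)]
print("empirical distribution:", {v: reports.count(v) / 10_000 for v in range(3)})
```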

Prediction of Mobile App Privacy Preferences with User Profiles via Federated Learning

Permission managers in mobile devices allow users to control permission requests, by granting or denying applications' access to data and sensors. However, existing managers are ineffective at both protecting and warning users of the privacy risks of their permission decisions. Recent research proposes privacy protection mechanisms through user profiles to automate privacy decisions, taking personal privacy preferences into consideration. While promising, these proposals usually resort to a centralized server to train the automation model, thus requiring users to trust this central entity. In this paper, we propose a methodology to build privacy profiles and train neural networks for the prediction of privacy decisions, while guaranteeing user privacy, even against a centralized server. Specifically, we resort to privacy-preserving clustering techniques to build the privacy profiles; that is, the server computes the centroids (profiles) without access to the underlying data. Then, using federated learning, the model to predict permission decisions is learnt in a distributed fashion while all data remains locally on the users' devices. Experiments following our methodology show the feasibility of building a personalized and automated permission manager guaranteeing user privacy, while also reaching a performance comparable to the centralized state of the art, with an F1-score of 0.9.
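The training loop follows the usual federated averaging pattern: devices fit the model locally on their own permission decisions and send only weight updates, which the server averages. A minimal FedAvg sketch with synthetic data (illustrative only; the paper additionally uses privacy-preserving clustering to build the profiles):

```python
# Minimal federated averaging (FedAvg) sketch: each device trains locally and
# only model weights leave the device; raw permission decisions stay local.
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=10):
    w = weights.copy()
    for _ in range(steps):                     # plain logistic-regression SGD
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(0)
global_w = np.zeros(5)
for rnd in range(3):                           # three federated rounds
    client_ws = []
    for _ in range(4):                         # four simulated devices
        X = rng.normal(size=(50, 5))
        y = (X @ np.array([1, -1, 0.5, 0, 2]) > 0).astype(float)
        client_ws.append(local_update(global_w, X, y))
    global_w = np.mean(client_ws, axis=0)      # server averages, never sees data
    print(f"round {rnd}: |w| = {np.linalg.norm(global_w):.3f}")
```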

SESSION: Session 3: Software Security

Session details: Session 3: Software Security

Building a Commit-level Dataset of Real-world Vulnerabilities

While CVEs have become a de facto standard for publishing advisories on vulnerabilities, the state of current CVE databases is lackluster: CVE advisories alone are insufficient to bridge the gap to the vulnerability artifacts in the impacted program. The community thus lacks a public dataset of real-world vulnerabilities providing such an association. In this paper, we present a method restoring this missing link by analyzing the vulnerabilities from the AOSP, an aggregate of more than 1,800 projects. It is the perfect target for building a representative dataset of vulnerabilities, as it covers the full spectrum that may be encountered in a modern system where a variety of low-level and higher-level components interact. More specifically, our main contribution is a dataset of more than 1,900 vulnerabilities, associating generic metadata (e.g., vulnerability type, impact level) with their respective patches at the commit granularity (e.g., fix commit-id, affected files, source code language). Finally, we augment this dataset by providing precompiled binaries for a subset of the vulnerabilities. These binaries open up various usages, both for binary-only analysis and at the interface between source and binary. In addition to providing a common baseline benchmark, our dataset release supports the community in data-driven software security research.

ReSIL: Revivifying Function Signature Inference using Deep Learning with Domain-Specific Knowledge

Function signature recovery is important for binary analysis and security enhancement, such as bug finding and control-flow integrity enforcement. However, binary executables typically have crucial information vital for function signature recovery stripped off during compilation. To make things worse, recent studies show that many compiler optimization strategies further complicate the recovery of function signatures with intended violations to function calling conventions.

In this paper, we first perform a systematic study to quantify the extent to which compiler optimizations (negatively) impact the accuracy of existing deep learning techniques for function signature recovery. Our experiments show that a state-of-the-art deep learning technique has its accuracy drop from 98.7% to 87.7% when training and testing on optimized binaries. We further identify specific weaknesses in existing approaches and propose an enhanced deep learning approach named ReSIL (Revivifying Function Signature Inference using Deep Learning) to incorporate compiler-optimization-specific domain knowledge into the learning process. Our experimental results show that ReSIL significantly improves the accuracy and F1 score in inferring function signatures, e.g., improving the accuracy in inferring the number of arguments for callees compiled with optimization flag O1 from 84.8% to 92.67%. We also demonstrate the security implications of ReSIL in Control-Flow Integrity enforcement in stopping potential Counterfeit Object-Oriented Programming (COOP) attacks.

A Modular and Extensible Framework for Securing TLS

While being both extremely powerful and popular, TLS is a protocol that is hard to securely deploy. On the one hand, system administrators are required to grasp several security concepts to fully understand the impact of each option and avoid misconfigurations. On the other hand, app developers should use cryptographic libraries in a secure way avoiding dangerous default settings or other subtleties (e.g., padding or modes of operations). To help secure TLS, we propose a modular framework, extensible with new features and capable of streamlining the mitigation process of known and newly discovered TLS attacks even for non-expert users.

Recovering Structure of Input of a Binary Program

This paper presents an algorithm to automatically infer a recursive state machine (RSM) describing the space of acceptable input of an arbitrary binary program by executing that program with one or more valid inputs. The algorithm automatically identifies atomic fields of fixed and variable lengths and syntactic elements, such as separators and terminators, and generalizes them into regular expression tokens. It constructs an RSM of tokens to represent structures such as arrays and records. Further, it constructs nested states in the RSM to represent complex, nested structures. The RSM may serve as an independent parser for the program's acceptable inputs. A controlled experiment was performed using a prototype implementation of the algorithm and a set of synthetic programs with input formats that mimic characteristics of conventional data formats such as CSV, PNG, and PE. The experiment demonstrates that the inferred RSMs correctly identify the syntactic elements and their grammatical orderings. When used as generators, the RSMs also produced syntactically correct data for the formats that use terminators to end a sequence of elements, but not so when the format maintains a count of elements for variable length fields instead of a terminator. Experiments with real-world programs produced similar results.

Hardening with Scapolite: A DevOps-based Approach for Improved Authoring and Testing of Security-Configuration Guides in Large-Scale Organizations

Security Hardening is the process of configuring IT systems to ensure the security of the systems' components and the data they process or store. In many cases, so-called security-configuration guides are used as a basis for security hardening. These guides describe secure configuration settings for components such as operating systems and standard applications. Rigorous testing of security-configuration guides and automated mechanisms for their implementation and validation are necessary since erroneous implementations or checks of hardening guides may severely impact systems' security and functionality. At Siemens, centrally maintained security-configuration guides carry machine-readable information specifying both the implementation and validation of each required configuration step. The guides are maintained within git repositories; automated pipelines generate the artifacts for implementation and checking, e.g., PowerShell scripts for Windows, and carry out testing of these artifacts on AWS images. This paper describes our experiences with our DevOps-inspired approach for authoring, maintaining, and testing security-configuration guides. We want to share these experiences to help other organizations with their security hardening and increase their systems' security.

SESSION: Session 4: Access Control and Privacy

Session details: Session 4: Access Control and Privacy

Toward Deep Learning Based Access Control

A common trait of current access control approaches is the challenging need to engineer abstract and intuitive access control models. This entails designing access control information in the form of roles (RBAC), attributes (ABAC), or relationships (ReBAC) as the case may be, and subsequently, designing access control rules. This framework has its benefits but has significant limitations in the context of modern systems that are dynamic, complex, and large-scale, which make it difficult for a human administrator to maintain an accurate access control state in the system. This paper proposes Deep Learning Based Access Control (DLBAC) by leveraging significant advances in deep learning technology as a potential solution to this problem. We envision that DLBAC could complement and, in the long-term, has the potential to even replace, classical access control models with a neural network that reduces the burden of access control model engineering and updates. Without loss of generality, we conduct a thorough investigation of a candidate DLBAC model, called DLBAC_alpha, using both real-world and synthetic datasets. We demonstrate the feasibility of the proposed approach by addressing issues related to accuracy, generalization, and explainability. We also discuss challenges and future research directions.

ProSPEC: Proactive Security Policy Enforcement for Containers

By providing lightweight and portable support for cloud native applications, container environments have gained significant momentum lately. A container orchestrator such as Kubernetes can enable the automatic deployment and maintenance of a large number of containerized applications. However, due to its critical role, a container orchestrator also attracts a wide range of security threats exploiting misconfigurations or implementation flaws. Moreover, enforcing security policies at runtime against such security threats becomes far more challenging, as the large scale of container environments implies high complexity, while the high dynamicity demands a short response time. In this paper, we tackle this key security challenge to container environments through a proactive approach, namely, ProSPEC. Our approach leverages learning-based prediction to conduct the computationally intensive steps (e.g., security verification) in advance, while keeping the runtime steps (e.g., policy enforcement) lightweight. Consequently, ProSPEC can ensure a practical response time (e.g., less than 10 ms in contrast to 600 ms with one of the most popular existing approaches) for large container environments (up to 800 Pods).

NEUTRON: A Graph-based Pipeline for Zero-trust Network Architectures

The Zero-Trust Architecture (ZTA) security paradigm deploys comprehensive user- and resource-aware defenses both at the network's perimeter and inside the network. However, deploying a ZTA approach requires specifying and managing a large, network-spanning set of fine-grained security policies, which will increase administrators' workloads and the chance of errors. This paper presents the design and prototype implementation of the NEUTRON policy framework, which provides an automated end-to-end policy pipeline covering specification, management, testing, and deployment. NEUTRON uses a flexible, graph-based approach to specify and share complex, fine-grained network security policies. NEUTRON provides a software structure so that policy patterns may be easily shared between organizations, reducing the burden of creating the policy. Administrators assemble the software for their site, and the NEUTRON policy generator creates the entire network-wide security policy. Treating the security policy like software also allows new approaches to policy verification and policy change impact analysis. Thus we designed the Security Policy Regression Tool (SPRT), which uses our novel Ruleset Aggregation Algorithm to perform scalable verification of the network-wide security policy across the network model. Moreover, our graph-based framework allows for efficient computation and visualization of the policy change impact.

Landmark Privacy: Configurable Differential Privacy Protection for Time Series

Several application domains, including healthcare, smart buildings, and traffic monitoring, require the continuous publishing of data, also known as time series. In many cases, time series are geotagged data containing sensitive personal details, and thus their processing entails privacy concerns. Several definitions have been proposed that allow for privacy preservation while processing and publishing such data, with differential privacy being the most prominent one. Most existing differential privacy schemes protect either a single timestamp (event-level), or all the data per user (user-level), or per window (w-event-level) in the time series, considering however all timestamps as equally significant. In this work, we define a novel configurable privacy notion, landmark privacy, which differentiates events into significant (landmarks) and regular, providing better data utility while adequately preserving the privacy of each event. We propose three schemes that guarantee landmark privacy, and design an appropriate dummy landmark selection module to better protect the actual temporal position of the landmarks. Finally, we provide a thorough experimental study where (i) we study the behavior of our framework on real and synthetic data, with and without temporal correlation, and (ii) demonstrate that landmark privacy achieves generally better data utility in the presence of landmarks than user-level privacy.
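One way to picture the notion: landmark timestamps are protected as strongly as user-level privacy would protect them (they jointly consume the privacy budget), while regular timestamps only need event-level noise. The sketch below illustrates this budget split with the Laplace mechanism; it is our simplification, not one of the paper's three schemes:

```python
# Illustrative budget allocation for landmark privacy (not the paper's schemes):
# landmark timestamps jointly consume the budget via sequential composition,
# while each regular timestamp is protected at event level.
import numpy as np

def sanitize(series, landmarks, eps, sensitivity=1.0):
    n_land = max(len(landmarks), 1)
    out = []
    for t, value in enumerate(series):
        # landmarks share eps among themselves; regular events use eps alone
        scale = sensitivity * (n_land / eps if t in landmarks else 1.0 / eps)
        out.append(value + np.random.laplace(scale=scale))
    return out

series = [12, 15, 14, 80, 16, 13]       # hypothetical counts; t=3 is a landmark
print(sanitize(series, landmarks={3}, eps=1.0))
```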

SESSION: Session 5: IoT Security

Session details: Session 5: IoT Security

Securing Smart Grids Through an Incentive Mechanism for Blockchain-Based Data Sharing

Smart grids leverage the data collected from smart meters to make important operational decisions. However, they are vulnerable to False Data Injection (FDI) attacks in which an attacker manipulates meter data to disrupt the grid operations. Existing works on FDI are based on a simple threat model in which a single grid operator has access to all the data, and only some meters can be compromised.

Our goal is to secure smart grids against FDI under a realistic threat model. To this end, we present a threat model in which there are multiple operators, each with a partial view of the grid, and each can be fully compromised. An effective defense against FDI in this setting is to share data between the operators. However, the main challenge here is to incentivize data sharing. We address this by proposing an incentive mechanism that rewards operators for uploading data, but penalizes them if the data is missing or anomalous. We derive formal conditions under which our incentive mechanism is provably secure against operators who withhold or distort measurement data for profit. We then implement the data sharing solution on a private blockchain, introducing several optimizations that overcome the inherent performance limitations of the blockchain. Finally, we conduct an experimental evaluation that demonstrates that our implementation has practical performance.

Security Analysis of IoT Frameworks using Static Taint Analysis

Internet of Things (IoT) frameworks are designed to facilitate provisioning and secure operation of IoT devices. A typical IoT framework consists of various software layers and components including third-party libraries, communication protocol stacks, the Hardware Abstraction Layer (HAL), the kernel, and the apps. IoT frameworks have implicit data flows in addition to explicit data flows due to their event-driven nature. In this paper, we present a static taint tracking framework, IFLOW, that facilitates the security analysis of system code by enabling specification of data-flow queries that can refer to a variety of software entities. We have formulated various security-relevant data-flow queries and solved them using IFLOW to analyze the security of several popular IoT frameworks: Amazon FreeRTOS SDK, SmartThings SDK, and Google IoT SDK. Our results show that IFLOW can both detect real bugs and localize security analysis to the relevant components of IoT frameworks.

Toward a Resilient Key Exchange Protocol for IoT

In order for resource-constrained Internet of Things (IoT) devices to set up secure communication channels to exchange confidential messages, Symmetric Key Cryptography (SKC) is usually preferred to resource-intensive Public Key Cryptography (PKC). At the core of setting up a secure channel is secure key exchange, the process of two IoT devices securely agreeing on a common session key before they communicate. While key exchange using SKC is more resource-friendly than PKC for IoT environments, it requires either a pre-shared secret or trusted intermediaries between the two devices; neither assumption is realistic in IoT. In this paper, we relax the above assumptions and introduce a new intermediary-based secure key exchange protocol for IoT devices that do not support PKC. With a design that is lightweight and deployable in IoT, our protocol fundamentally departs from existing intermediary-based solutions in that (1) it leverages intermediary parties that can be malicious and (2) it can detect malicious intermediary parties. We provide a formal proof that our protocol is secure and conduct a theoretical analysis showing that the failure probability of our protocol is negligible with a reasonable setup, and that its malicious helper detection probability can reach 1.0 even when a malicious helper tampers with only a small number of messages. We implemented our protocol, and our experimental results show that it significantly reduces computation time and energy cost. Depending on the IoT device type (Raspberry Pi, Arduino Due, or SAM D21) and the PKC algorithm compared against (ECDH, DH, or RSA), our protocol is 2.3 to 1591 times faster on one of the two devices and 0.7 to 4.67 times faster on the other.

A TOCTOU Attack on DICE Attestation

A major security challenge for modern IoT deployments is to ensure that the devices run legitimate firmware free from malware. This challenge can be addressed through a security primitive called attestation, which allows a remote backend to verify the firmware integrity of the devices it manages. In order to accelerate broad attestation adoption in the IoT domain, the Trusted Computing Group (TCG) has introduced the Device Identifier Composition Engine (DICE) series of specifications. DICE is a hardware-software architecture for constrained (e.g., microcontroller-based) IoT devices, where the firmware is divided into successively executed layers. In this paper, we demonstrate a remote Time-Of-Check Time-Of-Use (TOCTOU) attack on DICE-based attestation. We demonstrate that it is possible to install persistent malware in the flash memory of a constrained microcontroller that cannot be detected through DICE-based attestation. The main idea of our attack is to install malware during the runtime of the application logic in the top firmware layer. The malware reads the valid attestation key and stores it on the device's flash memory. After reboot, the malware uses the previously stored key for all subsequent attestations to the backend. We conduct the installation of malware and copying of the key through Return-Oriented Programming (ROP). As a platform for our demonstration, we use the Cortex-M-based nRF52840 microcontroller. We provide a discussion of several possible countermeasures which can mitigate the shortcomings of the DICE specifications.

SESSION: Session 6: Authentication and Device Security

Session details: Session 6: Authentication and Device Security

Shared Multi-Keyboard and Bilingual Datasets to Support Keystroke Dynamics Research

Keystroke dynamics has been shown to be a promising method for user authentication based on a user's typing rhythms. Over the years, it has seen increasing applications such as in preventing transaction fraud, account takeovers, and identity theft. However, due to the variable nature of keystroke dynamics, a user's typing patterns may vary on a different keyboard or in a different keyboard language setting, which may affect the system accuracy. In other words, an algorithm modeled with data collected using a mechanical keyboard may perform significantly differently when tested with an ergonomic keyboard. Similarly, an algorithm modeled with data collected in one language may perform significantly differently when tested with another language. Hence, there is a need to study the impact of multiple keyboards and multiple languages on keystroke dynamics performance. This motivated us to develop two free-text keystroke dynamics datasets. The first is a multi-keyboard keystroke dataset comprising four physical keyboards - mechanical, ergonomic, membrane, and laptop keyboards - and the second is a bilingual keystroke dataset in both English and Chinese languages. Data were collected from a total of 86 participants using a non-intrusive web-based keylogger in a semi-controlled setting. To the best of our knowledge, these are the first multi-keyboard and bilingual keystroke datasets, as well as the data collection software, to be made publicly available for research purposes. The usefulness of our datasets was demonstrated by evaluating the performance of two state-of-the-art free-text algorithms.

Leveraging Disentangled Representations to Improve Vision-Based Keystroke Inference Attacks Under Low Data Constraints

Keystroke inference attacks are a form of side-channel attacks in which an attacker leverages various techniques to recover a user's keystrokes as she inputs information into some display (e.g., while sending a text message or entering her PIN). Typically, these attacks leverage machine learning approaches, but assessing the realism of the threat space has lagged behind the pace of machine learning advancements, due in part to the challenges in curating large real-life datasets. We aim to overcome the challenge of having a limited amount of real data by introducing a video domain adaptation technique that is able to leverage synthetic data through supervised disentangled learning. Specifically, for a given domain, we decompose the observed data into two factors of variation: style and content. Doing so provides four learned representations: real-life style, synthetic style, real-life content, and synthetic content. Then, we combine them into feature representations from all combinations of style-content pairings across domains, and train a model on these combined representations to classify the content (i.e., labels) of a given datapoint in the style of another domain. We evaluate our method on real-life data using a variety of metrics to quantify the amount of information an attacker is able to recover. We show that our method prevents our model from overfitting to a small real-life training set, indicating that our method is an effective form of data augmentation, thereby making keystroke inference attacks more practical.

Cache Shaping: An Effective Defense Against Cache-Based Website Fingerprinting

Cache-based website fingerprinting attacks can infer which website a user visits by measuring CPU cache activities. Studies have shown that an attacker can achieve high accuracy with a low sampling rate by monitoring the cache occupancy of the entire Last Level Cache. Although a defense has been proposed, it was not effective when an attacker adapts and retrains a classifier with defended data. In this paper, we propose a new defense, referred to as cache shaping, to preserve user privacy against cache-based website fingerprinting attacks. Our proposed defense produces dummy cache activities by introducing dummy I/O operations implemented with multiple processes, which hides fingerprints when a user visits websites. Our experimental results over large-scale datasets collected from multiple web browsers and operating systems show that our defense remains effective even if an attacker retrains a classifier with defended cache traces. We demonstrate the efficacy of our defense in the closed-world setting and the open-world setting by leveraging deep neural networks as classifiers.
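A toy sketch of the core idea, assuming a POSIX system: helper processes continuously perform dummy reads and writes over a large buffer, adding noise to the cache occupancy an attacker measures. The real defense's scheduling and buffer sizing are more elaborate than this:

```python
# Toy sketch only: spawn helper processes that perform dummy I/O while a user
# browses, polluting LLC occupancy measurements (POSIX os.pread/os.pwrite).
import multiprocessing as mp
import os, tempfile, time

def dummy_io_worker(stop, buf_size=4 * 1024 * 1024):
    fd, path = tempfile.mkstemp()
    try:
        while not stop.is_set():
            os.pwrite(fd, os.urandom(buf_size), 0)   # touch a large buffer...
            os.pread(fd, buf_size, 0)                # ...and read it back
    finally:
        os.close(fd)
        os.unlink(path)

if __name__ == "__main__":
    stop = mp.Event()
    workers = [mp.Process(target=dummy_io_worker, args=(stop,)) for _ in range(4)]
    for w in workers:
        w.start()
    time.sleep(10)            # stand-in for the period a user visits websites
    stop.set()
    for w in workers:
        w.join()
```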

Quantifying the Risk of Wormhole Attacks on Bluetooth Contact Tracing

Digital contact tracing is a valuable tool for containing the spread of infectious diseases. During the COVID-19 pandemic, different systems have been developed that enable decentralized contact tracing on mobile devices. Several of the systems provide strong security and privacy guarantees. However, they also inherit weaknesses of the underlying wireless protocols. In particular, systems using Bluetooth LE beacons are vulnerable to so-called wormhole attacks, in which an attacker tunnels the beacons between different locations and creates false contacts between individuals. While this vulnerability has been widely discussed, the risk of successful attacks in practice is still largely unknown. In this paper, we quantitatively analyze the risk of wormhole attacks for the exposure notification system of Google and Apple, which builds on Bluetooth LE. To this end, we dissect and model the communication process of the system and identify factors contributing to the risk. Through a causal and empirical analysis, we find that the incidence and infectivity of the traced disease drive the risk of wormhole attacks, whereas technical aspects only play a minor role. Given the infectious Delta variant of COVID-19, the risk of successful attacks increases and may pose a threat to digital contact tracing. As a remedy, we propose countermeasures that can be integrated into existing contact tracing systems and significantly reduce the success of wormhole attacks.

Towards Resiliency of Heavy Vehicles through Compromised Sensor Data Reconstruction

Almost all aspects of modern automobiles are controlled by embedded computers, known as Electronic Control Units (ECUs). ECUs are connected with each other over a Controller Area Network (CAN) network. ECUs communicate with each other and control the automobile's behavior using messages. Heavy vehicles, unlike passenger cars, are constructed using ECUs manufactured by different Original Equipment Manufacturers (OEMs). For reasons of interoperability, the Society of Automotive Engineers (SAE) mandates that all ECUs should communicate using the standardized SAE-J1939 protocol that gives semantics to the signals transmitted on the CAN network. Security concerns have been historically ignored in protocols and standards. Consequently, an ECU having malicious code can spoof other ECUs, e.g., a message can be injected through the OBD-II port or the telematics unit into the internal network to interfere with the behavior of the vehicle. Intrusion Detection Systems (IDS) have been proposed and utilized to detect various types of security attacks. However, such systems are only capable of detecting attacks and cannot mitigate them. A compromised ECU may generate invalid data values; even if such invalid values are detected, there is still a need to counter their effects. Almost all prior works focus on detecting attacks. We demonstrate how to make the vehicle resilient to attacks. We analyze the log files of real driving scenarios and show that ECUs are significantly dependent on other ECUs to operate. We demonstrate that parameters of a compromised ECU can be reconstructed from those of other non-compromised ECUs to allow the vehicle to continue operation and make it resilient to attacks. We achieve this by modeling the behavior of an ECU using the multivariate Long Short-Term Memory (LSTM) neural network. We then reconstruct compromised ECU values using information obtained from trustworthy ECUs. Despite some levels of error, our model can reconstruct trustworthy data values that can be substituted for values generated by compromised ECUs. The error between the reconstructed values and the correct ones is less than 6% of the operating range for the compromised ECU, which is low enough for the reconstructed values to serve as substitutes. Our proposed approach makes the vehicle resilient without requiring changes to the internal architecture.
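A minimal sketch of the reconstruction model, with hypothetical shapes and synthetic data standing in for J1939 logs (assuming TensorFlow/Keras is available): an LSTM maps a window of trustworthy ECUs' signals to the current value of the compromised parameter.

```python
# Sketch of the reconstruction idea (hypothetical data and shapes): an LSTM
# learns one ECU parameter from windows of other ECUs' parameters, so its
# output can stand in for a compromised sensor's value.
import numpy as np
import tensorflow as tf

T, F = 32, 6                                    # window length, input signals
X = np.random.rand(1000, T, F).astype("float32")
y = X[:, -1, :3].mean(axis=1, keepdims=True)    # stand-in for the target signal

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(T, F)),
    tf.keras.layers.Dense(1),                   # reconstructed parameter value
])
model.compile(optimizer="adam", loss="mae")     # MAE ~ fraction of operating range
model.fit(X, y, epochs=3, batch_size=64, verbose=0)
print("reconstruction MAE:", model.evaluate(X, y, verbose=0))
```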

SESSION: Session 7: Encryption and Privacy

Session details: Session 7: Encryption and Privacy

Parallel Operations over TFHE-Encrypted Multi-Digit Integers

Recent advances in Fully Homomorphic Encryption (FHE) allow for a practical evaluation of non-trivial functions over encrypted data. In particular, novel approaches for combining ciphertexts broadened the scope of prospective applications. However, for arithmetic circuits, the overall complexity grows with the desired precision and there is only a limited space for parallelization. In this paper, we put forward several methods for fully parallel addition of multi-digit integers encrypted with the TFHE scheme. Since these methods handle integers in a special representation, we also revisit the signum function, first addressed by Bourse et al., and we propose a method for the maximum of two numbers; both with particular respect to parallelization. On top of that, we outline an approach for multiplication by a known integer. According to our experiments, the fastest approach for parallel addition of 31-bit encrypted integers in an idealized setting with 32 threads is estimated to be more than 6x faster than the fastest sequential approach. Finally, we demonstrate our algorithms on an evaluation of a practical neural network.
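The root of the parallelism is a redundant number representation in which addition needs no carry propagation. The classic plaintext example is carry-save addition, where every digit position is computed independently of all others; the paper evaluates per-digit steps of this kind homomorphically over TFHE ciphertexts. An illustrative sketch (ours, in the clear):

```python
# Plaintext illustration of why redundant representations parallelize: in
# carry-save form, three addends reduce to two (sum + carry) with no carry
# propagation, so every digit position is processed independently.

def carry_save_add(a_bits, b_bits, c_bits):
    """One parallel step (LSB first): positions are mutually independent."""
    s = [x ^ y ^ z for x, y, z in zip(a_bits, b_bits, c_bits)]
    cy = [0] + [(x & y) | (x & z) | (y & z) for x, y, z in zip(a_bits, b_bits, c_bits)]
    return s, cy

def to_bits(v, n):
    return [(v >> i) & 1 for i in range(n)]

def to_int(bits):
    return sum(b << i for i, b in enumerate(bits))

a, b, c, n = 23, 42, 77, 8
s, cy = carry_save_add(to_bits(a, n), to_bits(b, n), to_bits(c, n))
assert to_int(s) + to_int(cy) == a + b + c
print(f"{a} + {b} + {c} = {to_int(s)} + {to_int(cy)} = {to_int(s) + to_int(cy)}")
```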

Private Lives Matter: A Differential Private Functional Encryption Scheme

The use of data combined with tailored statistical analysis has presented a unique opportunity to organizations in diverse fields to observe users' behaviors and needs, and accordingly adapt and fine-tune their services. However, in order to offer utilizable, plausible, and personalized alternatives to users, this process usually also entails a breach of their privacy. The use of statistical databases for releasing data analytics is growing exponentially, and while many cryptographic methods are utilized to protect the confidentiality of the data -- a task that has been ably carried out by many authors over the years -- only a few works focus on the problem of privatizing the actual databases. Believing that securing and privatizing databases are two equally important problems, in this paper, we propose a hybrid approach by combining Functional Encryption with the principles of Differential Privacy. Our main goal is not only to design a scheme for processing statistical data and releasing statistics in a privacy-preserving way but also to provide a richer, more balanced, and comprehensive approach in which data analytics and cryptography go hand in hand with a shift towards increased privacy.

Efficient Dynamic Searchable Encryption with Forward Privacy under the Decent Leakage

Dynamic searchable symmetric encryption (SSE) enables clients to update and search encrypted data stored on a server, providing efficient search operations in exchange for leaking inconsequential information. The amount of permitted leakage is a crucial factor of dynamic SSE; more leakage allows us to design an efficient scheme, while leakage attacks tell us that the leakage has a real-world impact. Leakage-abuse attacks (NDSS 2012) and subsequent works suggest that dynamic SSE schemes should not unnecessarily reveal extra information during the search procedure, and in particular, file-injection attacks (USENIX Security 2016) showed that forward privacy, which restricts the leakage during the addition procedure, is a vital security notion for dynamic SSE. In this paper, we propose a new dynamic SSE scheme with a good balance of efficiency and security levels; our scheme achieves both high efficiency and forward privacy and only requires decent leakage, i.e., only the leakage of search and access patterns during search operations. Specifically, we first show there is still no such scheme by uncovering a flaw in the security proof of Etemad et al.'s scheme (PoPETs 2018) and showing that extra leakage is required to fix it. We then propose the first forward-private dynamic SSE scheme that only requires symmetric-key primitives and the standard, decent leakage to prove the security. Although the client's state is slightly larger than in existing schemes, our experimental results show that our scheme is comparable in efficiency to Etemad et al.'s scheme, the most efficient existing scheme with forward privacy.
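To make the forward-privacy requirement concrete, here is a minimal counter-based index in the spirit of symmetric-key forward-private schemes such as Mitra (not the paper's construction): each update lands at a fresh pseudorandom address, and search tokens are computed client-side, so the server can never link future additions to previously searched keywords.

```python
# Minimal counter-based, forward-private index sketch (Mitra-style; NOT the
# paper's construction). Revealing past addresses at search time is the
# allowed search/access-pattern leakage; future addresses stay unlinkable.
import hmac, hashlib, os

def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

class Client:
    def __init__(self):
        self.key, self.counters = os.urandom(32), {}

    def add_token(self, keyword: str, file_id: int):
        c = self.counters.get(keyword, 0)
        self.counters[keyword] = c + 1
        kw = prf(self.key, keyword.encode())
        addr = prf(kw, b"addr" + c.to_bytes(4, "big"))
        mask = prf(kw, b"mask" + c.to_bytes(4, "big"))[:4]
        return addr, bytes(m ^ b for m, b in zip(mask, file_id.to_bytes(4, "big")))

    def search_tokens(self, keyword: str):
        kw = prf(self.key, keyword.encode())
        return [(prf(kw, b"addr" + c.to_bytes(4, "big")),
                 prf(kw, b"mask" + c.to_bytes(4, "big"))[:4])
                for c in range(self.counters.get(keyword, 0))]

server = {}                                # untrusted index: address -> masked id
client = Client()
for fid in (7, 13, 99):
    addr, val = client.add_token("invoice", fid)
    server[addr] = val

ids = [int.from_bytes(bytes(v ^ m for v, m in zip(server[addr], mask)), "big")
       for addr, mask in client.search_tokens("invoice")]
print("files matching 'invoice':", ids)
```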

RS-PKE: Ranked Searchable Public-Key Encryption for Cloud-Assisted Lightweight Platforms

Since more and more data from lightweight platforms like IoT devices or mobile apps are being outsourced to the cloud, the need to ensure privacy while retaining data usability is essential. In this paper, we design a framework where lightweight platforms like IoT devices can encrypt documents and generate document indexes using the public key before uploading the document to the cloud, and an admin can search and retrieve the top-k most relevant documents that match a specific keyword using the private key. In most existing searchable encryption schemes that support IoT, all the documents that match a queried keyword are returned to the admin. This is not practical as IoT devices continuously upload data. We formally name our framework Ranked Searchable Public-Key Encryption (RS-PKE). We also implemented a prototype of RS-PKE and tested it in the Amazon EC2 cloud using the RFC dataset. The comprehensive evaluation demonstrates that RS-PKE is efficient and secure for practical deployment.

SESSION: Panel I

Session details: Panel I

Security and Privacy for Emerging IoT and CPS Domains

The proliferation of IoT and CPS technologies demands novel conceptual, foundational, and applied cybersecurity solutions. The dynamic behaviour of these distributed systems, together with the physical and computational constraints of smart devices, requires cybersecurity approaches for timely prevention and detection of attacks. This panel aims to discuss open challenges and highlight future research directions for cybersecurity in IoT and CPS.

SESSION: Panel II

Session details: Panel II

Enforcement of Laws and Privacy Preferences in Modern Computing Systems

Modern civilization is highly dependent on computing systems, touching all aspects of business, government, and individual life. At the same time, there has been an increase in laws and privacy preferences whose implementation and effectiveness depend on software. Whereas organizations and individuals have been expected to comply with laws and regulations, now computing systems must also be compliant and accountable. Computing systems need to be designed with privacy preferences and legal statutes in mind, and should be adaptable to change.

SESSION: Poster Session I

Session details: Poster Session I

I Don't Know Why You Need My Data: A Case Study of Popular Social Media Privacy Policies

Data privacy, a critical human right, is gaining importance as new technologies are developed and old ones evolve. In mobile platforms such as Android, data privacy regulations require developers to communicate data access requests using privacy policy statements (PPS). This case study cross-examines the PPS of popular social media (SM) apps --- Facebook and Twitter --- for language ambiguity and sensitive data requests, and checks whether the statements tally with the data requests made in the Manifest file. Subsequently, we conduct a comparative analysis between the PPS of these two apps to examine trends that may constitute a threat to user data privacy.

Disclosure Risk from Homogeneity Attack in Differentially Private Release of Frequency Distribution

Differential privacy (DP) provides a robust model to achieve privacy guarantees in released information. We examine the robustness of the protection against homogeneity attack (HA) in multi-dimensional frequency distributions sanitized via DP randomization mechanisms. We propose measures for disclosure risk from HA and derive closed-form relationships between privacy loss parameters in DP and disclosure risk from HA. We also provide a lower bound on the disclosure risk on a sensitive attribute when all the cells formed by quasi-identifiers are homogeneous for the sensitive attribute. The availability of the closed-form relationships helps understand the abstract concepts of DP and privacy loss parameters by putting them in the context of a concrete privacy attack and offers a perspective for choosing privacy loss parameters when employing DP mechanisms to release information in practice. We apply the closed-form mathematical relationships to real-life datasets to assess disclosure risk due to HA in differentially private sanitized frequency distributions at various privacy loss parameters.

A New Bound for Privacy Loss from Bayesian Posterior Sampling

Differential privacy (DP) is a state-of-the-art concept that formalizes privacy guarantees. We derive a new bound for the privacy loss from releasing Bayesian posterior samples in the setting of DP. The new bound is tighter than the existing bounds for common Bayesian models and is also consistent with the likelihood principle. We apply the privacy loss quantified by the new bound to release differentially private synthetic data from Bayesian models in several experiments and show the improved utility of the synthetic data compared to those generated from explicitly designed randomization mechanisms that privatize posterior distributions.

Does Deception Leave a Content Independent Stylistic Trace?

A recent survey claims that there are no general linguistic cues for deception. Since Internet societies are plagued with deceptive attacks such as phishing and fake news, this claim means that we must build individual datasets and detectors for each kind of attack. It also implies that when a new scam (e.g., Covid) arrives, we must start the whole process of data collection, annotation, and model building from scratch. In this paper, we put this claim to the test by building a quality domain-independent deception dataset and investigating whether a model can perform well on more than one form of deception.

Transforming Memory Image to Sound Wave Signals for an Effective IoT Fingerprinting

As the need for and adoption of smart environments continue to rise, owing mainly to the evolution in IoT technology's processing and sensing capabilities, the security community must contend with increasing attack surfaces on our networks, critical systems, and infrastructures. Thus, developing an effective fingerprint to deal with some of these threats is of paramount importance. In this paper, we explore the use of memory snapshots for effective dynamic process-level fingerprints. Our technique transforms a memory snapshot into a sound wave signal, from which we then retrieve their distinctive Mel-Frequency Cepstral Coefficients (MFCC) features as unique process-level identifiers. The evaluation of this proposed technique on our dataset demonstrated that MFCC-based fingerprints generated from the same IoT process memory at different times exhibit much stronger similarities than those acquired from different IoT process spaces.
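A sketch of the transformation pipeline, with a hypothetical file name and audio parameters (assuming librosa is available): the raw snapshot bytes are mapped to a waveform in [-1, 1], and the MFCCs of that "signal" become the fingerprint vector.

```python
# Sketch only (file name and parameters hypothetical): raw memory-snapshot
# bytes become a waveform, whose MFCCs form the process-level fingerprint.
import numpy as np
import librosa

snapshot = np.fromfile("process_mem.bin", dtype=np.uint8)     # memory dump
signal = (snapshot.astype(np.float32) - 127.5) / 127.5        # bytes -> [-1, 1]

mfcc = librosa.feature.mfcc(y=signal, sr=16_000, n_mfcc=20)   # treat as 16 kHz audio
fingerprint = mfcc.mean(axis=1)                               # one vector per snapshot
print("fingerprint shape:", fingerprint.shape)
```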

SESSION: Poster Session II

Session details: Poster Session II

Using Adversarial Defences Against Image Classification CAPTCHA

CAPTCHAs are widely used today as a reliable method to set up a Turing test to discern between humans and computers. With the improvements in AI technology, many AI-hard problems can now be solved with new techniques, for example, better Optical Character Recognition models. This work highlights the possibility of using adversarial defence techniques such as spatial smoothing and JPEG compression to defeat image classification CAPTCHAs.
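Both defences are simple input transformations applied before an image reaches the CAPTCHA solver. A sketch using Pillow, with hypothetical file names:

```python
# The two preprocessing defences named above (file names hypothetical):
# median-filter spatial smoothing, then a lossy JPEG re-encode; both tend
# to destroy adversarial perturbations embedded in the CAPTCHA image.
import io
from PIL import Image, ImageFilter

img = Image.open("captcha_tile.png").convert("RGB")

smoothed = img.filter(ImageFilter.MedianFilter(size=3))   # spatial smoothing

buf = io.BytesIO()
smoothed.save(buf, format="JPEG", quality=75)             # JPEG compression
defended = Image.open(io.BytesIO(buf.getvalue()))
defended.save("captcha_defended.png")
```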

Poisoning Attacks against Feature-Based Image Classification

Adversarial machine learning and the robustness of machine learning are gaining attention, especially in image classification. Attacks based on data poisoning, with the aim of lowering the integrity or availability of a model, have shown high success rates while barely reducing the classifier's accuracy - particularly against Deep Learning approaches such as Convolutional Neural Networks (CNNs). While Deep Learning has become the most prominent technique for many pattern recognition tasks, feature-extraction based systems still have their applications - and there is surprisingly little research dedicated to the vulnerability of those approaches. We address this gap and show preliminary results in evaluating poisoning attacks against feature-extraction based systems, and compare them to CNNs, on a traffic sign classification dataset. Our findings show that feature-extraction based ML systems require higher poisoning percentages to achieve similar backdoor success, and also need a consistent (static) backdoor position to work.

Demystifying Video Traffic from IoT (Spy) Camera using Undecrypted Network Traffic

Video traffic can create significant privacy and security threats to an organization or a smart home. The integration of IoT cameras has increased this problem manifold, especially since there is no clear distinction between the protocols used by IoT cameras and those used by traditional video streaming or sharing applications. In this paper, we initiate a study on distinguishing video traffic of IoT cameras from that of video conferencing or sharing applications. We have used three IoT cameras, four video conferencing applications, and two video sharing platforms to collect network traffic at the network layer and above. We found that several protocols, including the Real-time Transport Protocol (RTP), QUIC, UDT, and TLS, are used for transferring video traffic in these applications. We found that the protocols that carry IoT camera traffic have significantly different characteristics compared to those in video conferencing and sharing applications, e.g., in terms of video codec.

kTRACKER: Passively Tracking KRACK using ML Model

Recently, a number of attacks (like the key reinstallation attack, KRACK) have been demonstrated on the WPA2 protocol suite in Wi-Fi WLANs. In this paper, we design and implement a system, called kTRACKER, to passively detect anomalies in the handshake of Wi-Fi security protocols, in particular WPA2, between a client and an access point using COTS radios. A state-machine model is implemented to detect the KRACK attack by passively monitoring multiple wireless channels. In particular, we perform deep packet inspection and develop a grouping algorithm that groups Wi-Fi handshake packets to identify the symptoms of KRACK in specific stages of a handshake session. Our implementation of kTRACKER does not require any modification to the firmware of the supplicant (i.e., client), the authenticator (i.e., access point), or the COTS devices; our system just needs to be within accessible range of clients and access points. We use a publicly available dataset for the performance analysis of kTRACKER. We employ gradient boosting-based supervised machine learning models and show that an accuracy of around 93.39% and a false positive rate of 5.08% can be achieved using kTRACKER.
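The last stage is a standard supervised classifier over per-handshake features. A sketch with synthetic, hypothetical features (in kTRACKER, the real feature set comes from the deep packet inspection and grouping steps described above):

```python
# Sketch of the final classification stage over per-handshake features.
# Features here are synthetic and hypothetical, purely for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical features: [#message-3 frames, replay-counter delta,
#                         mean inter-frame gap (ms), #retransmissions]
benign = rng.normal([1.0, 1.0, 30.0, 1.0], 0.3, size=(500, 4))
krack = rng.normal([2.5, 0.0, 12.0, 3.0], 0.5, size=(500, 4))
X, y = np.vstack([benign, krack]), np.array([0] * 500 + [1] * 500)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier().fit(Xtr, ytr)
print("holdout accuracy:", clf.score(Xte, yte))
```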

Qubit Reset and Refresh: A Gamechanger for Random Number Generation

Generation of random binary numbers for cryptographic use is often addressed using pseudorandom number generating functions in compilers and specialized cryptographic packages. Using IBM's Qiskit reset functionality, we were able to implement a straightforward in-line Python function that returns a list of quantum-generated random numbers, by creating and executing a circuit on IBM quantum systems.

We successfully created a list of 1000 1024-bit binary random numbers as well as a list of 40,000 25-bit binary random numbers for randomness testing, using the NIST Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications. The quantum-generated random data we tested showed very strong randomness, according to the NIST suite.

Previously, IBM's quantum implementation required a single qubit for each bit of data generated in a circuit, making generation of large random numbers impractical. IBM's addition of the reset instruction eliminates this restriction and allows for the creation of functions that can generate a larger quantity of data-bit output, using only a small number of qubits.
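A minimal sketch of the reset-based generator (ours, using the open-source Qiskit API with the Aer simulator; on IBM hardware the circuit would instead be submitted through IBM's runtime service): a single qubit is repeatedly put into superposition, measured, and reset, producing arbitrarily many output bits from one qubit.

```python
# Sketch assuming Qiskit 1.x with qiskit-aer installed: one qubit generates
# n output bits via repeated H -> measure -> reset.
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

def quantum_random_bits(n_bits: int) -> str:
    qc = QuantumCircuit(1, n_bits)
    for i in range(n_bits):
        qc.h(0)             # uniform superposition
        qc.measure(0, i)    # collapse to a random classical bit
        qc.reset(0)         # reuse the same qubit for the next bit
    backend = AerSimulator()
    result = backend.run(transpile(qc, backend), shots=1, memory=True).result()
    return result.get_memory()[0].replace(" ", "")

print(quantum_random_bits(25))   # one 25-bit binary random number
```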

Towards Robust Detection of PDF-based Malware

With the indisputable prevalence of PDFs, several studies into PDF malware and their evasive variants have been conducted to test the robustness of the ML-based PDF classifier frameworks Hidost and Mimicus. As heavily documented, the fundamental difference between them is that Hidost investigates the logical structure of PDFs, while Mimicus detects malicious indicators through their structural features. However, there exist techniques to mutate such features so that malicious PDFs are able to bypass these classifiers. In this work, we investigated three known attacks: Mimicry, Mimicry+, and Reverse Mimicry, to compare how effective they are in evading the classifiers in Hidost and Mimicus. The results show that Mimicry and Mimicry+ are effective in bypassing the models in Mimicus but not in Hidost, while Reverse Mimicry is effective against both Mimicus and Hidost.

Macro-level Inference in Collaborative Learning

As data collection increases, so do efforts to extract the underlying knowledge. Among these, collaborative learning efforts become more important, where multiple organisations want to jointly learn a common predictive model, e.g., to detect anomalies or to learn how to improve a production process. Instead of learning only from their own data, a collaborative approach enables the participants to learn a more generalising model, capable of predicting settings not yet encountered by their own organisation, but already encountered by some of the others. However, in many cases, the participants do not want to directly share and disclose their data, for regulatory reasons, or because the data constitute a business asset. Approaches such as federated learning allow a collaborative model to be trained without exposing the data itself. However, federated learning still requires exchanging intermediate models from each participant. Information that can be inferred from these models is thus a concern. Threats to individual data points, and defences against them, have been studied, e.g., in membership inference attacks. However, we argue that in many use cases global properties are also of interest, not only to outsiders but specifically also to the other participants, which might be competitors. In a production process, for example, knowing which types of steps a company performs frequently, or obtaining information on the quantities of a specific product or material a company processes, could reveal business secrets without requiring details of individual data points.

MK-RS-PKE: Multi-Keyword Ranked Searchable Public-Key Encryption for Cloud-Assisted Lightweight Platforms

Since more and more data from lightweight platforms like IoT devices or mobile apps are being outsourced to the cloud, the need to ensure privacy while retaining data usability is essential. This paper designs a framework where lightweight platforms like IoT devices can encrypt documents and generate document indexes using the public key before uploading the document to the cloud. An admin can search and retrieve the top-k most relevant documents that match a multi-keyword query using the private key. Most existing searchable encryption schemes that support IoT return all documents matching the queried keywords; this is impractical since IoT devices produce massive amounts of data. We formally name our framework Multi-keyword Ranked Searchable Public-Key Encryption (MK-RS-PKE).