CODASPY '21: Proceedings of the Eleventh ACM Conference on Data and Application Security and Privacy


SESSION: Keynote I


When Models Learn Too Much

Statistical machine learning uses training data to produce models that capture patterns in that data. When models are trained on private data, such as medical records or personal emails, there is a risk that those models will not only learn the hoped-for patterns but also learn and expose sensitive information about their training data. Several types of inference attacks on machine learning models have been found, and methods have been proposed to mitigate the risk of exposing sensitive aspects of training data.

Differential privacy provides formal guarantees that bound certain types of inference risk, but, at least with state-of-the-art methods, providing substantive differential privacy guarantees for complex models requires adding so much noise to the training process that the resulting models are useless. Experimental evidence, however, suggests that inference attacks have limited power, and in many cases a very small amount of privacy noise seems to be enough to defuse them.

In this talk, I will give an overview of a variety of inference risks for machine learning models, discuss strategies for evaluating model inference risks, report on experiments by our research group to better understand the power of inference attacks in more realistic settings, and explore some of the broader connections between privacy, fairness, and adversarial robustness.
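The training noise discussed above is typically added in the style of DP-SGD: each example's gradient is clipped to a norm bound, and Gaussian noise scaled to that bound is added before averaging. The following is a minimal sketch of one such update on plain Python lists. The function name and parameters are illustrative assumptions, it is not the speaker's group's method, and it omits the privacy accounting that converts the noise level into an (ε, δ) guarantee.

```python
import math
import random


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD-style update: clip each example's gradient, sum, add Gaussian noise, average."""
    dim = len(params)
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += g[i] * scale
    # Noise standard deviation is calibrated to the clipping bound.
    sigma = noise_multiplier * clip_norm
    batch = len(per_example_grads)
    noisy_avg = [(s + rng.gauss(0.0, sigma)) / batch for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_avg)]
```

A larger `noise_multiplier` strengthens the formal guarantee but also drowns the averaged gradient, which is the utility trade-off the talk describes for complex models.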


SESSION: Keynote II

Measurable and Deployable Security: Gaps, Successes, and Opportunities

Security measurement helps identify deployment gaps and presents extremely valuable research opportunities. However, such research is often deemed not novel by academia. I will first share my research journey designing and producing CryptoGuard, a high-precision tool for scanning cryptographic vulnerabilities in large Java projects. That work led us to publish two benchmarks for systematically assessing state-of-the-art academic and commercial solutions, and helped Oracle Labs integrate our detection into their routine scanning.

Other specific measurement and deployment cases to discuss include the Payment Card Industry Data Security Standard, which was involved in high-profile data breach incidents, and fine-grained Address Space Layout Randomization (ASLR). The talk will also point out the need for measurement in AI development in the context of code repair.

Broadening research styles by accepting and encouraging deployment-related work will help our field progress toward maturity.

SESSION: Session 1A: Adversarial Machine Learning


Membership Inference Attacks and Defenses in Classification Models

We study the membership inference (MI) attack against classifiers, where the attacker's goal is to determine whether a data instance was used for training the classifier. Through systematic cataloging of existing MI attacks and extensive experimental evaluations of them, we find that a model's vulnerability to MI attacks is tightly related to the generalization gap---the difference between training accuracy and test accuracy. We then propose a defense against MI attacks that aims to close the gap by intentionally reducing the training accuracy. More specifically, the training process attempts to match the training and validation accuracies by means of a new set regularizer that uses the Maximum Mean Discrepancy between the empirical distributions of the softmax outputs on the training and validation sets. Our experimental results show that combining this approach with another simple defense (mix-up training) significantly improves on state-of-the-art defenses against MI attacks, with minimal impact on test accuracy.
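The set regularizer described above compares two empirical distributions of softmax vectors. A minimal sketch of a (biased) squared-MMD estimate follows, assuming a Gaussian kernel with a hand-picked bandwidth; the kernel, bandwidth, and function names are assumptions for illustration, not details taken from the paper.

```python
import math


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples of softmax vectors."""

    def mean_k(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
```

During training, a term such as `lam * mmd_squared(train_softmax, val_softmax)` (with a hypothetical weight `lam`) would be added to the classification loss; a practical version would compute it on framework tensors so that gradients flow through it.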

Using Single-Step Adversarial Training to Defend Iterative Adversarial Examples

Adversarial examples are among the biggest challenges for machine learning models, especially neural network classifiers. Adversarial examples are inputs manipulated with perturbations that are insignificant to humans yet able to fool machine learning models. Researchers have made great progress in using adversarial training as a defense. However, its overwhelming computational cost limits its applicability, and little has been done to overcome this issue. Single-step adversarial training methods have been proposed as computationally viable solutions; however, they still fail to defend against iterative adversarial examples. In this work, we first experimentally analyze several state-of-the-art (SOTA) defenses against adversarial examples. Then, based on observations from these experiments, we propose a novel single-step adversarial training method that can defend against both single-step and iterative adversarial examples. Through extensive evaluations, we demonstrate that our proposed method successfully combines the advantages of both single-step (low training overhead) and iterative (high robustness) adversarial training defenses. Compared with ATDA on the CIFAR-10 dataset, for example, our proposed method achieves a 35.67% enhancement in test accuracy and a 19.14% reduction in training time. When compared with methods that use BIM or Madry examples (iterative methods) on the CIFAR-10 dataset, our proposed method saves up to 76.03% in training time, with less than 3.78% degradation in test accuracy. Finally, our experiments with the ImageNet dataset clearly show the scalability of our approach and its performance advantages over SOTA single-step approaches.
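To make the single-step versus iterative distinction concrete, the sketch below implements FGSM (one signed-gradient step) and BIM (several smaller steps projected back into an ε-ball) against a toy logistic-regression model in plain Python. The model and all parameter values are assumptions chosen for illustration; this is not the paper's proposed training method, which would generate adversarial examples on the fly during training.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def input_grad(w, b, x, y):
    """Gradient of binary cross-entropy w.r.t. the input x for a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """Single-step attack: one signed-gradient step of size eps (one gradient evaluation)."""
    g = input_grad(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def bim(w, b, x, y, eps, alpha, steps):
    """Iterative attack: repeated steps of size alpha, clipped to the eps-ball around x."""
    adv = list(x)
    for _ in range(steps):
        g = input_grad(w, b, adv, y)
        adv = [ai + alpha * sign(gi) for ai, gi in zip(adv, g)]
        adv = [min(max(ai, xi - eps), xi + eps) for ai, xi in zip(adv, x)]
    return adv
```

For a linear model both attacks land on the same point. On nonlinear networks BIM follows the curvature of the loss and is stronger, but it costs `steps` gradient evaluations per example instead of one, which is the training-time gap the abstract is concerned with.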

Real-Time Evasion Attacks against Deep Learning-Based Anomaly Detection from Distributed System Logs

Distributed system logs, which record states and events that occurred during the execution of a distributed system, provide valuable information for troubleshooting and diagnosing its operational issues. Due to the complexity of such systems, there have been recent research efforts on automating anomaly detection from distributed system logs using deep learning models. As these anomaly detection models can also be used to detect malicious activities inside distributed systems, it is important to understand their robustness against evasive manipulations in adversarial environments. Although there are various attacks against deep learning models in domains such as natural language processing and image classification, they cannot be applied directly to evade anomaly detection from distributed system logs. In this work, we explore the adversarial robustness of deep learning-based anomaly detection models on distributed system logs. We propose a real-time attack method called LAM (Log Anomaly Mask) that perturbs streaming logs online with minimal modifications so that the attacks can evade anomaly detection by even state-of-the-art deep learning models. To overcome the challenge of search-space complexity, LAM models the perturber as a reinforcement learning agent that operates in a partially observable environment to predict the best perturbation action. We have evaluated the effectiveness of LAM on two log-based anomaly detection systems for distributed systems: DeepLog and an AutoEncoder-based anomaly detection system. Our experimental results show that LAM significantly reduces the true positive rate of these two models while achieving attack imperceptibility and real-time responsiveness.
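For context on the detectors LAM targets, DeepLog flags a log key as anomalous when it is not among the top-k keys its model predicts to follow the recent history. The sketch below reproduces only that decision rule, with a simple frequency count over fixed-length histories standing in for DeepLog's LSTM; the class name, window size, and k are assumptions. It also makes clear what an evasion attack against such a detector must achieve: every key it emits has to stay inside the predicted set.

```python
from collections import Counter, defaultdict


class TopKNextKeyDetector:
    """DeepLog-style top-k check; a frequency table stands in for the LSTM (assumption)."""

    def __init__(self, window, k):
        self.window = window  # number of preceding log keys used as history
        self.k = k  # how many predicted next keys count as "normal"
        self.counts = defaultdict(Counter)

    def fit(self, normal_sequences):
        """Count which key follows each history in normal (attack-free) log sequences."""
        for seq in normal_sequences:
            for i in range(self.window, len(seq)):
                history = tuple(seq[i - self.window:i])
                self.counts[history][seq[i]] += 1

    def is_anomalous(self, history, next_key):
        """Flag next_key if it is not among the k most frequent successors of history."""
        successors = self.counts.get(tuple(history), Counter())
        candidates = [key for key, _ in successors.most_common(self.k)]
        return next_key not in candidates
```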

SESSION: Session 1B: Adversarial Machine Learning


We Can Pay Less: Coordinated False Data Injection Attack Against Residential Demand Response in Smart Grids

Advanced metering infrastructure, along with home automation processes, is enabling more efficient and effective demand-side management for both consumers and utility companies. However, tight cyber-physical integration also enables novel attack vectors for false data injection attacks (FDIA), as home automation/home energy management systems reside outside the utilities' control perimeter. Authentic users themselves can manipulate these systems without causing the significant security breaches that traditional FDIAs involve. This work presents a novel FDIA that exploits one of the commonly utilised distributed device scheduling architectures. We evaluate the attack impact using a realistic dataset to demonstrate that adversaries gain significant benefits, independently of the actual algorithm used for optimisation, as long as they control a sufficient amount of demand. Unlike with traditional FDIAs, reliable security mechanisms such as proper authentication, security protocols, security controls, or sealed/controlled devices cannot prevent this new type of FDIA. Thus, we propose a set of possible impact alleviation solutions to thwart this type of attack.

Brittle Features of Device Authentication

Authenticating a networked device relies on identifying its unique characteristics. Recent device fingerprinting proposals demonstrate that device activity, such as network traffic, can be used to extract features that identify devices using machine learning (ML). However, there has been little work examining how adversarial machine learning can compromise these schemes. In this work, we present two efficient attacks against three ML-based device authentication (MDA) systems. One of the attacks is an adaptation of an existing gradient-estimation-based attack to the MDA setting; the second uses a fuzzing-based approach. We find that the MDA systems use brittle features for device identification and hence can be reliably fooled with only 30 to 80 failed authentication attempts. However, selecting features that are robust against adversarial attack is challenging, as indicators such as information gain do not reflect the features that adversaries most profitably attack. We demonstrate that it is possible to defend MDA systems that rely on neural networks, and, for the general case, offer targeted advice for designing more robust MDA systems.
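Gradient-estimation attacks treat the model as a black box and approximate its gradient from score queries alone. A minimal sketch of coordinate-wise central finite differences follows; the `score` function is a hypothetical stand-in for whatever confidence an MDA system exposes, and this is a generic estimator rather than the specific attack adapted in the paper. Each estimate costs two queries per feature, which is why query budgets, such as a limited number of failed authentication attempts, matter.

```python
def estimate_gradient(score, x, delta=1e-3):
    """Approximate the gradient of a black-box score at x by central finite differences.

    Issues 2 * len(x) queries to `score`, one pair per feature.
    """
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += delta
        down = list(x)
        down[i] -= delta
        grad.append((score(up) - score(down)) / (2.0 * delta))
    return grad
```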

Role-Based Deception in Enterprise Networks

Historically, enterprise network reconnaissance has been an active process, often involving port scanning. However, as routers and switches become more complex, they also become more susceptible to compromise. From this vantage point, an attacker can passively identify high-value hosts such as the workstations of IT administrators, C-suite executives, and finance personnel. The goal of this paper is to develop a technique to deceive and dissuade such adversaries. We propose HoneyRoles, which uses honey connections to build metaphorical haystacks around the network traffic of client hosts belonging to high-value organizational roles. The honey connections also act as network canaries to signal network compromise, thereby dissuading the adversary from acting on information observed in network flows. We design a prototype implementation of HoneyRoles as an OpenFlow SDN controller and evaluate its security using the PRISM probabilistic model checker. Our performance evaluation shows that HoneyRoles has a small effect on network request completion time, and our security analysis demonstrates that once an alert is raised, HoneyRoles can quickly identify the compromised switch with high probability. In doing so, we show that role-based network deception is a promising approach for defending against adversaries in compromised network devices.

SESSION: Session 2: Blockchains, Digital Currency


BFastPay: A Routing-free Protocol for Fast Payment in Bitcoin Network

Bitcoin is the most popular cryptocurrency, and it supports payment services via the Bitcoin peer-to-peer network. However, Bitcoin suffers from a fundamental problem: in practice, a secure Bitcoin transaction requires the payee to wait for at least 6 block confirmations (one hour) before the payment is considered validated. Such a long waiting time hinders the wide deployment of Bitcoin payment services because many usage scenarios require a much shorter waiting time. In this paper, we propose BFastPay to accelerate Bitcoin payment validation. BFastPay employs a smart contract called BFPayArbitrator to hold the payer's security deposit and fulfill the role of a trusted payment arbitrator, which guarantees that a payee always receives the payment even if attacks occur. BFastPay is a routing-free solution that eliminates the need for payment routing in traditional payment routing networks (e.g., Lightning Network). The theoretical and experimental results show that BFastPay can significantly reduce the Bitcoin payment waiting time (e.g., from 60 minutes to less than 1 second) with nearly no extra operation cost.

Security Threats from Bitcoin Wallet Smartphone Applications: Vulnerabilities, Attacks, and Countermeasures

Bitcoin is currently the most popular cryptocurrency. With the proliferation of smartphones and high-speed mobile Internet, more and more users have started accessing their Bitcoin wallets on their smartphones. Users can download and install a variety of Bitcoin wallet applications (e.g., Coinbase, Luno, Bitcoin Wallet) on their smartphones and access their Bitcoin wallets anytime and anywhere. However, it is still unknown whether these Bitcoin wallet smartphone applications are secure or whether they open new attack surfaces that adversaries can use against their users. In this work, we examined the security of the 10 most popular Bitcoin wallet smartphone applications and discovered three security vulnerabilities. By exploiting them, adversaries can launch various attacks, including Bitcoin deanonymization, reflection and amplification spamming, and wallet fraud attacks. To address the identified vulnerabilities, we developed a phone-side Bitcoin Security Rectifier to protect users of Bitcoin wallet smartphone applications. The rectifier does not require any modifications to current wallet applications and is compliant with Bitcoin standards.

BlockFLA: Accountable Federated Learning via Hybrid Blockchain Architecture

Federated Learning (FL) is a distributed and decentralized machine learning protocol. By executing FL, a set of agents can jointly train a model without sharing their datasets with each other or with a third party. This makes FL particularly suitable for settings where data privacy is desired.

At the same time, concealing training data gives attackers an opportunity to inject backdoors into the trained model. It has been shown that an attacker can inject backdoors into the trained model during FL and later leverage the backdoor to make the model misclassify inputs. Several works have tried to alleviate this threat by designing robust aggregation functions. However, since more sophisticated attacks that bypass the existing defenses continue to be developed, in this work we approach the problem from a complementary angle. In particular, we aim to discourage backdoor attacks by detecting and punishing the attackers, possibly after the end of the training phase.

To this end, we develop a hybrid blockchain-based FL framework that uses smart contracts to automatically detect and punish attackers via monetary penalties. Our framework is general in the sense that any aggregation function and any attacker detection algorithm can be plugged into it. We conduct experiments to demonstrate that our framework preserves the communication-efficient nature of FL, and we provide empirical results illustrating that it can successfully penalize attackers by leveraging our novel attacker detection algorithm.
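The plug-in structure described above can be pictured as a training round parameterized by an aggregation function and a detection function. In the minimal sketch below, federated averaging and coordinate-wise median are standard aggregation rules, while the distance-based detector and the deposit bookkeeping are hypothetical stand-ins; they are not the paper's novel detection algorithm or its smart contracts.

```python
import math
import statistics


def fed_avg(updates):
    """Coordinate-wise mean of the agents' update vectors."""
    return [sum(col) / len(col) for col in zip(*updates)]


def coordinate_median(updates):
    """Coordinate-wise median, a common robust aggregation rule."""
    return [statistics.median(col) for col in zip(*updates)]


def distance_detector(updates, aggregate, threshold):
    """Hypothetical detector: flag agents whose update lies far from the aggregate."""
    return {agent for agent, u in updates.items() if math.dist(u, aggregate) > threshold}


def run_round(updates, aggregator, detector, deposits, penalty):
    """Aggregate with any rule, then penalize whoever any detector flags."""
    agg = aggregator(list(updates.values()))
    for agent in detector(updates, agg):
        deposits[agent] -= penalty  # stand-in for a smart-contract monetary penalty
    return agg
```

Usage: `run_round(updates, coordinate_median, lambda u, g: distance_detector(u, g, threshold=1.0), deposits, penalty=5.0)` swaps in one aggregation rule and one detector without changing the round logic.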

SteemOps: Extracting and Analyzing Key Operations in Steemit Blockchain-based Social Media Platform

Advancements in distributed ledger technologies are driving the rise of blockchain-based social media platforms such as Steemit, where users interact with each other in ways similar to conventional social networks. These platforms are autonomously managed by users using decentralized consensus protocols in a cryptocurrency ecosystem. The deep integration of social networks and blockchains in these platforms provides potential for numerous cross-domain research studies that are of interest to both research communities. However, it is challenging to process and analyze large volumes of raw Steemit data, as doing so requires specialized skills in both software engineering and blockchain systems and involves substantial effort in extracting and filtering various types of operations. To tackle this challenge, we collect over 38 million blocks generated in Steemit during a 45-month period from 2016/03 to 2019/11 and extract ten key types of operations performed by the users. The result is SteemOps, a new dataset that organizes more than 900 million operations from Steemit into three sub-datasets: (i) the social-network operation dataset (SOD), (ii) the witness-election operation dataset (WOD), and (iii) the value-transfer operation dataset (VOD). We describe the dataset schema and its usage in detail and outline possible future research studies using SteemOps. SteemOps is designed to facilitate future research aimed at providing deeper insights into emerging blockchain-based social media platforms.
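Organizing extracted operations into the three sub-datasets amounts to routing each operation by its type. The sketch below assumes a simplified input of `(block_number, [(op_type, payload), ...])` pairs. The ten operation-type names follow Steem's naming style, but both the list and its assignment to SOD, WOD, and VOD are assumptions for illustration, not the paper's published schema.

```python
from collections import defaultdict

# Assumed assignment of ten operation types to the three sub-datasets (illustrative only).
SUB_DATASET = {
    "comment": "SOD", "vote": "SOD", "custom_json": "SOD",
    "witness_update": "WOD", "account_witness_vote": "WOD", "account_witness_proxy": "WOD",
    "transfer": "VOD", "transfer_to_vesting": "VOD", "withdraw_vesting": "VOD",
    "delegate_vesting_shares": "VOD",
}


def split_operations(blocks):
    """Route each operation of each block into SOD, WOD, or VOD; other types are skipped."""
    out = defaultdict(list)
    for block_num, ops in blocks:
        for op_type, payload in ops:
            target = SUB_DATASET.get(op_type)
            if target is not None:
                out[target].append({"block": block_num, "type": op_type, **payload})
    return out
```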

SESSION: Session 3: Privacy


Decentralized Reputation

In this work we develop a privacy-preserving reputation scheme for collaborative systems such as P2P networks in which peers can represent themselves with different pseudonyms when interacting with others. All these pseudonyms, however, are bound to the same reputation token, allowing honest peers to maintain their good record even when switching to a new pseudonym while preventing malicious ones from making a fresh start.

Our system is truly decentralized. Using an append-only distributed ledger such as Bitcoin's blockchain, we show how participants can make anonymous yet verifiable assertions about their own reputation. In particular, reputation can be demonstrated and updated effectively using efficient zkSNARK proofs. The system maintains soundness, peer-pseudonym unlinkability as well as unlinkability among pseudonyms of the same peer. We formally prove these properties and we evaluate the efficiency of the various operations, demonstrating the viability of our approach.

Don't fool yourself with Forward Privacy, Your queries STILL belong to us!

Dynamic Searchable Symmetric Encryption (DSSE) enables a user to perform encrypted search queries on encrypted data stored on a server. Recently, the notion of Forward Privacy (FP) was introduced to guarantee that a newly added document cannot be linked to previous queries, thereby thwarting related attacks and reducing information leakage and its consequences. However, in this paper we show that forward-private schemes have no advantage over traditional approaches in preventing these attacks, and that previous attacks still apply to FP schemes. In FP approaches, access pattern leakage is still possible and can be used to uncover the search pattern, which in turn can be exploited by passive and adaptive attacks. To address this issue, we construct a new parallelizable DSSE approach that obfuscates the access and search patterns. Our cost-efficient scheme supports both updates and searches. Our security proof and performance analysis demonstrate the practicality, efficiency, and security of our approach.
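The core observation, that access pattern leakage reveals the search pattern even when each query uses a fresh token, can be sketched in a few lines: queries that touch exactly the same set of encrypted documents are likely repeats of the same keyword. The sketch assumes the server records the document IDs each query returns and that no documents are added between the repeated queries; a real attack would also need to handle result sets that grow with updates. Function and argument names are illustrative.

```python
from collections import defaultdict


def infer_search_pattern(observed):
    """Group query IDs that returned identical sets of document IDs.

    observed: list of (query_id, iterable_of_doc_ids) as seen by the server.
    Returns groups of queries that are likely repeats of the same keyword.
    """
    groups = defaultdict(list)
    for query_id, doc_ids in observed:
        groups[frozenset(doc_ids)].append(query_id)
    return [queries for queries in groups.values() if len(queries) > 1]
```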

A Large Publicly Available Corpus of Website Privacy Policies Based on DMOZ

Studies have shown that website privacy policies are too long and hard for their target audience to comprehend. These studies, and a more recent body of research that uses machine learning and natural language processing to automatically summarize privacy policies, greatly benefit from, if not rely on, corpora of privacy policies collected from the web. While smaller annotated corpora of web privacy policies have been made public, we are not aware of any large publicly available corpus. We use DMOZ, a massive open-content directory of the web, and its 1.5 million manually categorized websites to collect hundreds of thousands of privacy policies associated with their categories, enabling research on privacy policies across different categories/market sectors. We review the statistics of this corpus and make it available for research. We also obtain valuable insights about privacy policies, e.g., which websites post them less often. Our corpus of web privacy policies is a valuable tool for researchers investigating privacy policies. For example, it facilitates comparison among different methods of privacy policy summarization by providing a benchmark, and it can be used in unsupervised machine learning to summarize privacy policies.

Adaptive Fingerprinting: Website Fingerprinting over Few Encrypted Traffic

Website fingerprinting attacks can infer which website a user visits from encrypted network traffic. Recent studies achieve high accuracy (e.g., 98%) by leveraging deep neural networks. However, current attacks rely on enormous amounts of encrypted traffic data, which are time-consuming to collect. Moreover, large-scale encrypted traffic data also need to be recollected frequently to keep up with changes in website content. In other words, the bootstrap time for carrying out website fingerprinting is impractical. In this paper, we propose a new method, named Adaptive Fingerprinting, which achieves high attack accuracy with only a small amount of encrypted traffic by leveraging adversarial domain adaptation. With our method, an attacker only needs to collect a small amount of traffic rather than large-scale datasets, which makes website fingerprinting more practical in the real world. Our extensive experimental results over multiple datasets show that our method achieves 89% accuracy with a small amount of encrypted traffic in the closed-world setting and 99% precision and 99% recall in the open-world setting. Compared to a recent study (named Triplet Fingerprinting), our method is much more efficient in pre-training time and is more scalable. Moreover, our method outperforms Triplet Fingerprinting in both the closed-world and open-world evaluations.

UTrack: Enterprise User Tracking Based on OS-Level Audit Logs

Tracking user activities inside an enterprise network has become a fundamental building block of today's security infrastructure, as it provides accurate user profiling and helps security auditors make informed decisions based on insights derived from abundant log data. Toward more accurate user tracking, we propose a novel paradigm named UTrack that leverages rich system-level audit logs. From a holistic perspective, we bridge the semantic gap between user accounts and real users, tracking a real user's activities across different user accounts and different network hosts based on causal relationships among processes. To achieve better scalability and a more salient view, we apply a variety of data reduction and compression techniques to process the large amount of data. We implement UTrack in a real enterprise environment consisting of 111 hosts, which generated more than 4 billion events in total during the one-month experiment. Through our evaluation, we demonstrate that UTrack accurately identifies the events that are relevant to user activities. Our data reduction and compression modules greatly reduce the output data size, producing an accurate and salient overview of a user session profile.
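Tracking a real user across accounts and hosts through causal relationships among processes can be pictured as a graph traversal from the process that starts a login session. In the minimal sketch below, processes are identified by `(host, pid)`, spawn relationships and cross-host logins are assumed to be already extracted as parent-to-child edges, and every event of a reachable process is attributed to the session regardless of which account it ran under. These inputs are hypothetical simplifications, and UTrack's data reduction and compression are not shown.

```python
from collections import defaultdict, deque


def trace_session(root, edges, events):
    """Return the events of every process causally reachable from the session's root process.

    root: (host, pid) of the login process.
    edges: (parent, child) pairs of processes; a remote login is modeled as an
        edge from the client-side process to the server-side process.
    events: (process, account, description) tuples from the audit log.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    # Breadth-first search over spawn edges, across accounts and hosts.
    seen = {root}
    queue = deque([root])
    while queue:
        proc = queue.popleft()
        for child in children[proc]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return [event for event in events if event[0] in seen]
```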

SESSION: Session 4: Policies

Graph-Based Specification of Admin-CBAC Policies

We present a graph-based language for the specification of administrative access control policies in Admin-CBAC, an administrative model for Category-Based Access Control. More precisely, we propose a multi-level graph representation of policies and a graph-rewriting semantics for administrative actions, from which properties (such as safety, liveness and effectiveness of policies) and constraints (such as separation of duties) can be checked using graph traversal algorithms and rewriting properties. Since Admin-CBAC is a generic model, the techniques are directly applicable to a variety of access control models. In particular, we illustrate our techniques for the RBAC and ABAC instances of Admin-CBAC.
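As an illustration of checking a constraint by graph traversal, the sketch below tests separation of duties on a single-level graph in which users point to categories and categories may point to sub-categories: a violation is any user from whom both categories of a conflicting pair are reachable. This is a simplification of the paper's multi-level graphs and rewriting semantics, and the node names are hypothetical.

```python
def reachable(graph, start):
    """All nodes reachable from start, by iterative depth-first search."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def sod_violations(graph, users, conflicting_pairs):
    """List (user, a, b) for every user who can reach both categories a and b."""
    violations = []
    for user in users:
        r = reachable(graph, user)
        for a, b in conflicting_pairs:
            if a in r and b in r:
                violations.append((user, a, b))
    return violations
```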

Incremental Maintenance of ABAC Policies

Discovery of Attribute-Based Access Control (ABAC) policies through mining has been studied extensively in the literature. However, current solutions assume that the rules are to be mined from a static data set of access permissions and that this process only needs to be done once. In real life, access policies are dynamic in nature and may change based on the situation. Simply applying the current approaches would require re-executing the mining algorithm for every update to the permissions or user/object attributes, which would be highly inefficient. In this paper, we propose to incrementally maintain ABAC policies by updating only the rules that may be affected by a change in the underlying access permissions or attributes. A comprehensive experimental evaluation demonstrates that the proposed incremental approach is significantly more efficient than conventional ABAC mining.

Formal Analysis of ReBAC Policy Mining Feasibility

Relationship-Based Access Control (ReBAC) expresses authorization in terms of various direct and indirect relationships amongst entities, most commonly between users. The need for ReBAC policy mining arises when an existing access control system is reformulated in ReBAC. This paper considers the feasibility of ReBAC policy mining in the context of user-to-user authorization, as arises in various social and business contexts. In accordance with the policy mining literature, we assume that complete data is provided regarding user-to-user authorizations for a given user set, along with complete relationship data amongst these users comprising a labeled relationship graph. A ReBAC policy language is also specified. ReBAC policy mining seeks to formulate a ReBAC policy, using the given policy language and relationship graph, that is exactly equivalent to the given authorizations. The ReBAC policy mining feasibility problem asks whether such a policy exists and, if so, requires providing it. We investigate this problem in the context of different ReBAC policy languages, which differ in the relationships, inverse relationships, and non-relationships that can be used to build the policy. We develop a feasibility detection algorithm and analyze its complexity. We show that our policy languages become progressively more expressive as additional capabilities are introduced. For infeasible cases, various solution approaches are discussed.
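For intuition about feasibility, consider a deliberately tiny policy language in which a policy is a set of relationship labels and user u may access user v exactly when some edge u → v carries one of those labels (no paths, inverses, or non-relationships, unlike the richer languages studied in the paper). Feasibility is then decidable directly: discard every label that appears between an unauthorized pair, and check that the remaining labels still cover every authorized pair. Because the remaining set is the largest policy that grants nothing unauthorized, if it fails to cover some authorization, no policy in this language can. A minimal sketch, with hypothetical function and argument names:

```python
def mine_single_label_policy(users, edges, authorized):
    """Return a set of labels that grants exactly `authorized`, or None if infeasible.

    users: iterable of user ids; edges: (u, label, v) triples of the relationship
    graph; authorized: set of (u, v) pairs that must be granted (u != v).
    """
    labels_between = {}
    for u, label, v in edges:
        labels_between.setdefault((u, v), set()).add(label)
    all_labels = {label for _, label, _ in edges}

    # Any label on an edge between an unauthorized pair would over-grant.
    unsafe = set()
    for u in users:
        for v in users:
            if u != v and (u, v) not in authorized:
                unsafe |= labels_between.get((u, v), set())

    # Largest policy that grants no unauthorized pair.
    policy = all_labels - unsafe
    for pair in authorized:
        if not (labels_between.get(pair, set()) & policy):
            return None  # this authorization cannot be expressed in the language
    return policy
```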

SESSION: Session 5: Pandemic Security Issues

Identifying and Characterizing COVID-19 Themed Malicious Domain Campaigns

Since the outbreak of the COVID-19 pandemic, attackers have acted quickly to exploit the confusion, uncertainty, and anxiety it caused, launching various attacks through COVID-19 themed malicious domains. Malicious domains are rarely deployed independently; they almost always belong to much bigger, coordinated attack campaigns. Thus, analyzing COVID-themed malicious domains from the angle of attack campaigns helps us gain a deeper understanding of the scale, scope, and sophistication of the threats these domains pose. In this paper, we collect data from multiple sources and identify and characterize COVID-themed malicious domain campaigns, including the evolution of these campaigns, their underlying infrastructures, and the different strategies taken by the attackers behind them. Our exploration suggests that some malicious domains are strongly correlated, which can guide us to identify new malicious domains and raise alarms at an early stage of their deployment. The results highlight the urgency of detecting and mitigating cyber attacks related to public events.
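One simple way to turn such correlations into campaign groupings is to merge domains that share infrastructure. The sketch below clusters domains that share any attribute value (for example an IP address or name server) using union-find; the attribute encoding and the merge rule are assumptions for illustration, not the paper's methodology, and the example domains and addresses in the test use reserved documentation ranges.

```python
def cluster_by_shared_infrastructure(domains):
    """Group domains that share any attribute value, using union-find.

    domains: dict mapping domain name -> set of attribute strings
    (e.g., "ip:192.0.2.1", "ns:ns1.example"). Returns a sorted list of sets.
    """
    parent = {d: d for d in domains}

    def find(d):
        # Path halving keeps the trees shallow.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    first_seen = {}
    for domain, attrs in domains.items():
        for attr in attrs:
            if attr in first_seen:
                parent[find(domain)] = find(first_seen[attr])
            else:
                first_seen[attr] = domain

    groups = {}
    for d in domains:
        groups.setdefault(find(d), set()).add(d)
    return sorted(groups.values(), key=lambda g: sorted(g))
```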

Contact Tracing Made Un-relay-able

Automated contact tracing is a key solution for controlling the spread of airborne transmissible diseases: it traces contacts among individuals in order to alert people about their potential risk of being infected. The current SARS-CoV-2 pandemic has put a heavy strain on the healthcare systems of many countries. Governments chose different approaches to face the spread of the virus, and contact tracing apps were considered the most effective ones. In particular, by leveraging Bluetooth Low Energy technology, mobile apps can achieve privacy-preserving contact tracing of citizens. While researchers proposed several contact tracing approaches, each government developed its own national contact tracing app.

In this paper, we demonstrate that many popular contact tracing apps (e.g., those promoted by the Italian, French, and Swiss governments) are vulnerable to relay attacks. Through such attacks, people might be misleadingly flagged as positive for SARS-CoV-2 and thus forced to quarantine, which could eventually lead to a breakdown of the healthcare system. To tackle this vulnerability, we propose a novel and lightweight solution that prevents relay attacks while providing the same privacy-preserving features as current approaches. To evaluate the feasibility of both the relay attack and our novel defence mechanism, we developed a proof of concept against the Italian contact tracing app (i.e., Immuni). Our defence is designed so that it can be integrated into any contact tracing app. To foster the adoption of our solution and encourage developers to integrate it into future releases, we publish the source code.

SESSION: Session 6: Hardware and Device Security/Privacy


Ghost Thread: Effective User-Space Cache Side Channel Protection

Cache-based side channel attacks pose a serious threat to computer security. Numerous cache attacks have been demonstrated, highlighting the need for effective and efficient defense mechanisms to shield systems from this threat. In this paper, we propose a novel application-level protection mechanism, called Ghost Thread. Ghost Thread is a flexible library that allows a user to protect cache accesses to a requested sensitive region to mitigate cache-based side channel attacks. This is accomplished by having separate threads inject random cache accesses into the sensitive cache region. Compared with prior work that injects noise through a modified OS and hardware, our approach is applicable to commodity OSes and hardware. Compared with other user-space mitigation mechanisms, our approach does not require any special hardware support, and it requires only slight code changes in the protected application, making it readily deployable. Evaluation results on an Apache server show that Ghost Thread provides both strong protection and negligible overhead on real-world applications where only a fragment requires protection. In the worst-case scenario, where the entire application requires protection, Ghost Thread still incurs negligible overhead when a system is underutilized, and moderate overhead when a system is fully utilized.

The Cost of OSCORE and EDHOC for Constrained Devices

Many modern IoT applications rely on the Constrained Application Protocol (CoAP). Recently, the Internet Engineering Task Force (IETF) proposed two novel protocols for securing it: 1) Object Security for Constrained RESTful Environments (OSCORE), which provides authenticated encryption for CoAP's payload data, and 2) Ephemeral Diffie-Hellman Over COSE (EDHOC), which provides the symmetric session keys required by OSCORE. In this paper, we present the design and detailed evaluation of four firmware libraries for these protocols, specifically targeted at constrained microcontrollers. More precisely, we present the design of the μOSCORE and μEDHOC libraries for regular microcontrollers and the μOSCORE-TEE and μEDHOC-TEE libraries for microcontrollers with a Trusted Execution Environment (TEE), such as microcontrollers featuring ARM TrustZone-M. Our firmware design for the latter class of devices addresses the fact that attackers may exploit common software vulnerabilities, e.g., buffer overflows in the protocol logic, OS, or application, to compromise protocol security. We evaluate our implementations in terms of RAM/FLASH requirements and execution speed on a broad range of microcontrollers. Our implementations are available as open-source software.

Secure Pull Printing with QR Codes and National eID Cards: A Software-oriented Design and an Open-source Implementation

With more systems becoming digitised, enterprises are adopting cloud technologies and outsourcing non-critical services to reduce the pressure on IT departments. In this process, it is crucial to achieve the right balance between costs, usability and security, prioritising security over the rest when handling sensitive data. Print management is often off-premise, and many enterprises report at least one print-related security incident that led to data loss in the past year. Such incidents can damage an enterprise's business, especially given the fines prescribed by current regulations, as well as its reputation. Focusing on securing enterprise printing, pull printing is the set of technologies and processes that allow print jobs to be released only under specific conditions, typically user authentication and proximity to a printer. We design a software-oriented pull printing infrastructure that supports a print release mechanism using QR codes, with electronic IDentity cards as a second-factor authenticator. Our solution addresses costs, as any medium-size organisation can adopt our open-source solution without additional devices or access badges, and the user experience, as we offer a driverless print environment and a user-friendly mobile application.
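A pull printing release flow of the kind described (jobs held on a server until the owner proves proximity by scanning a printer's QR code and passes a second factor) might look roughly like this. Everything here is hypothetical and not the paper's design: the QR payload format `printer|nonce|tag`, the HMAC key, the single-use nonces, and the `eid_verified` flag, which stands in for a real eID card check.

```python
import hashlib
import hmac
import secrets


class PullPrintServer:
    """Holds jobs until the owner scans a printer's QR code and passes a second factor."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # hypothetical server-side MAC key
        self._held: dict[str, str] = {}  # job_id -> owner
        self._open_nonces: set[str] = set()  # QR codes shown but not yet used

    def submit(self, owner: str, job_id: str) -> None:
        self._held[job_id] = owner

    def _tag(self, printer_id: str, nonce: str) -> str:
        return hmac.new(self._key, f"{printer_id}|{nonce}".encode(), hashlib.sha256).hexdigest()

    def qr_payload(self, printer_id: str) -> str:
        """String a printer would render as a QR code; scanning it shows physical proximity."""
        nonce = secrets.token_hex(8)
        self._open_nonces.add(nonce)
        return f"{printer_id}|{nonce}|{self._tag(printer_id, nonce)}"

    def release(self, user: str, qr: str, eid_verified: bool) -> list[str]:
        """Release the user's jobs if the QR code is genuine and unused and the eID check passed."""
        parts = qr.split("|")
        if len(parts) != 3 or not eid_verified:
            return []
        printer_id, nonce, tag = parts
        if nonce not in self._open_nonces or not hmac.compare_digest(tag, self._tag(printer_id, nonce)):
            return []
        self._open_nonces.discard(nonce)  # each QR code can be used only once
        jobs = [job for job, owner in self._held.items() if owner == user]
        for job in jobs:
            del self._held[job]
        return jobs
```

Making each QR code single-use means a photographed code cannot be replayed later from outside the printer room.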

SESSION: Session 7 Software Security and Malware

Session details: Session 7 Software Security and Malware

Code Specialization through Dynamic Feature Observation

Modern software (both programs and libraries) provides large amounts of functionality, vastly exceeding what is needed for a single given task. This additional functionality results in an increased attack surface: first, an attacker can use bugs in the unnecessary functionality to compromise the software, and second, defenses such as control-flow integrity (CFI) rely on conservative analyses that gradually lose precision with growing code size.

Removing unnecessary functionality is challenging, as the debloating mechanism must remove as much code as possible while keeping the code required for the program to function. Unfortunately, most software does not come with a formal description of the functionality that it provides, or even a mapping between functionality and code. We therefore require a mechanism that, given a set of representative inputs and configuration parameters, automatically infers the underlying functionality and discovers all reachable code corresponding to this functionality.

We propose Ancile, a code specialization technique that leverages fuzzing (based on user-provided seeds) to discover the code necessary to perform the functionality required by the user. From this, we remove all unnecessary code and tailor indirect control-flow transfers to the minimum necessary for each location, vastly reducing the attack surface. We evaluate Ancile using real-world software known to have a large attack surface, including image libraries and network daemons like nginx. For example, our evaluation shows that Ancile can remove up to 93.66% of indirect call transfer targets and up to 78% of functions in libtiff's tiffcrop utility, while still maintaining its original functionality.
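The specialization step (keep only code reached while exercising the desired functionality, and restrict each indirect call site to the targets actually observed) can be sketched on a toy program. This is not Ancile's implementation: the call graph, the `dispatch` call site, and the decoder names are hypothetical, and the fuzzing stage is omitted, so the seed inputs are simply replayed.

```python
from collections import defaultdict

# Hypothetical toy program: an indirect call site ("dispatch") picks a decoder by input format.
DISPATCH = {"png": "decode_png", "jpeg": "decode_jpeg", "tiff": "decode_tiff"}
CALLS = {
    "decode_png": ["inflate"],
    "decode_jpeg": ["idct"],
    "decode_tiff": ["inflate", "lzw"],
    "inflate": [],
    "idct": [],
    "lzw": [],
}


def run(input_format: str) -> list[tuple[str, str]]:
    """Execute the toy program and record every (call site, target) edge it takes."""
    trace = [("dispatch", DISPATCH[input_format])]
    stack = [DISPATCH[input_format]]
    while stack:
        caller = stack.pop()
        for callee in CALLS[caller]:
            trace.append((caller, callee))
            stack.append(callee)
    return trace


def specialize(inputs: list[str]) -> tuple[set[str], dict[str, set[str]]]:
    """Keep only functions reached by the inputs and, per call site, only observed targets."""
    reached: set[str] = set()
    allowed: dict[str, set[str]] = defaultdict(set)
    for fmt in inputs:
        for site, target in run(fmt):
            reached.add(target)
            allowed[site].add(target)
    return reached, dict(allowed)


reached, allowed = specialize(["png"])
removed = set(CALLS) - reached
print(sorted(removed))  # ['decode_jpeg', 'decode_tiff', 'idct', 'lzw']
print(allowed["dispatch"])  # {'decode_png'}
```

The `allowed` map plays the role of a per-location CFI target set: a user who only needs PNG support ends up with a single legal target at the dispatch site.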

Towards Accurate Labeling of Android Apps for Reliable Malware Detection

In training their newly developed malware detection methods, researchers rely on threshold-based labeling strategies that interpret the scan reports provided by online platforms, such as VirusTotal. The dynamicity of this platform renders those labeling strategies unsustainable over prolonged periods, which leads to inaccurate labels. Using inaccurately labeled apps to train and evaluate malware detection methods significantly undermines the reliability of their results, leading to either dismissing otherwise promising detection approaches or adopting intrinsically inadequate ones. The infeasibility of generating accurate labels via manual analysis and the lack of reliable alternatives force researchers to use VirusTotal to label apps. In this paper, we tackle this issue in two ways. First, we reveal the aspects of VirusTotal's dynamicity and how they impact threshold-based labeling strategies, and we provide actionable insights on how to use these labeling strategies reliably given VirusTotal's dynamicity. Second, we motivate the implementation of alternative platforms by (a) identifying VirusTotal limitations that such platforms should avoid, and (b) proposing an architecture of how such platforms can be constructed to mitigate VirusTotal's limitations.
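A threshold-based labeling strategy reduces a scan report to a count of flagging engines compared against a cutoff. The minimal sketch below uses hypothetical reports (60 engines named `engine_0` and onward) to show how the same app can flip from benign to malicious between two scans, which is the kind of instability the abstract attributes to VirusTotal's dynamicity.

```python
def make_report(n_engines: int, n_flagged: int) -> dict[str, bool]:
    """Hypothetical scan report: engine name -> whether it flagged the app."""
    return {f"engine_{i}": i < n_flagged for i in range(n_engines)}


def threshold_label(report: dict[str, bool], threshold: int) -> str:
    """Label an app malicious if at least `threshold` engines flag it."""
    positives = sum(report.values())
    return "malicious" if positives >= threshold else "benign"


# The same app, rescanned later after more engines update their signatures.
first_scan = make_report(60, 3)
rescan = make_report(60, 6)
print(threshold_label(first_scan, 5), threshold_label(rescan, 5))  # benign malicious
```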

SE-PAC: A Self-Evolving PAcker Classifier against rapid packers evolution

Packers are widespread tools used by malware authors to hinder static malware detection and analysis. Identifying the packer used to pack a malware sample is essential to properly unpack and analyze it, be it manually or automatically. While many well-known packers are in use, there is a growing trend toward new custom packers that make malware analysis and detection harder. Prior research has been very effective in identifying known packers or their variants using signature-based, supervised machine learning or similarity-based techniques. However, identifying new packer classes remains an open problem.

This paper presents a self-evolving packer classifier that provides an effective, incremental, and robust solution to cope with the rapid evolution of packers. We propose a composite pairwise distance metric combining different types of packer features. We derive an incremental clustering approach able to identify both (variants of) known packer classes and new ones, as well as to update clusters automatically and efficiently. Our system thus continuously enhances, integrates, adapts and evolves packer knowledge. Moreover, to optimize post-clustering packer processing costs, we introduce a new post-clustering strategy for selecting small subsets of relevant samples from the clusters. The effectiveness and time-resilience of our approach are assessed with: 1) a real-world malware feed dataset composed of 16k packed binaries, comprising 29 unique packers, and 2) a synthetic dataset composed of 19k manually crafted packed binaries, comprising 31 unique packers (including custom ones).
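The two ingredients named above, a composite pairwise distance over heterogeneous features and incremental clustering that can open new classes, can be sketched as follows. The feature set (PE section names, imported APIs, byte entropy), the weights, the threshold, and the single-linkage rule are all hypothetical choices for illustration, not SE-PAC's actual metric or algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackedSample:
    sections: frozenset[str]  # e.g. PE section names
    imports: frozenset[str]  # imported API names
    entropy: float  # overall byte entropy, 0..8 bits per byte


WEIGHTS = (0.4, 0.4, 0.2)  # hypothetical weights for sections, imports, entropy


def jaccard_distance(a: frozenset[str], b: frozenset[str]) -> float:
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def composite_distance(x: PackedSample, y: PackedSample) -> float:
    """Weighted sum of per-feature-type distances, each scaled to [0, 1]."""
    w_sec, w_imp, w_ent = WEIGHTS
    return (
        w_sec * jaccard_distance(x.sections, y.sections)
        + w_imp * jaccard_distance(x.imports, y.imports)
        + w_ent * abs(x.entropy - y.entropy) / 8.0
    )


class IncrementalClusterer:
    """Single-pass clustering: join the closest cluster or open a new packer class."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.clusters: list[list[PackedSample]] = []

    def add(self, sample: PackedSample) -> int:
        best_idx, best_dist = -1, float("inf")
        for idx, members in enumerate(self.clusters):
            dist = min(composite_distance(sample, m) for m in members)  # single linkage
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        if best_dist <= self.threshold:
            self.clusters[best_idx].append(sample)
            return best_idx
        self.clusters.append([sample])  # no cluster is close enough: a new class
        return len(self.clusters) - 1


upx_a = PackedSample(frozenset({"UPX0", "UPX1"}), frozenset({"LoadLibraryA", "GetProcAddress"}), 7.8)
upx_b = PackedSample(frozenset({"UPX0", "UPX1"}), frozenset({"LoadLibraryA", "GetProcAddress"}), 7.6)
custom = PackedSample(frozenset({".xyz"}), frozenset({"VirtualAlloc"}), 6.0)
clusterer = IncrementalClusterer(threshold=0.3)
print([clusterer.add(s) for s in (upx_a, upx_b, custom)])  # [0, 0, 1]
```

Because samples are processed one at a time, new packer classes appear without retraining, which is the property the abstract emphasizes.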

SESSION: Poster Session

Session details: Poster Session

Quantum Obfuscation: Quantum Predicates with Entangled qubits

In this paper, we discuss developing opaque predicates with the help of quantum entangled qubits. These opaque predicates obfuscate classical control flow in hybrid quantum-classical systems. The idea is to use a pair of entangled qubits, one at compile time and one in the compiled code at runtime, to create opaque predicates. We use the CHSH game (John Clauser, Michael Horne, Abner Shimony, and Richard Holt) to reach consensus about the value of a qubit at runtime, whose value can be predicted at compile time with high probability due to quantum properties. The paper discusses designing an opaque predicate that relies on the quantum behavior of the entangled qubits and quantum measurements. The obfuscation produced by this technique maintains a semantic accuracy of only 85.35% when one entangled pair of qubits is used. However, we show that the accuracy can be improved to 100% by introducing additional entangled qubit pairs.
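The 85.35% figure appears to correspond to the optimal quantum winning probability of the CHSH game, cos²(π/8) = (1 + √2/2)/2 ≈ 0.85355. The sketch below computes that value and, as an assumption that may differ from the paper's construction, shows how a majority vote over k independent entangled pairs drives the agreement probability toward 1.

```python
import math


def chsh_win_probability() -> float:
    """Optimal quantum winning probability of the CHSH game: cos^2(pi/8)."""
    return math.cos(math.pi / 8) ** 2


def majority_accuracy(p: float, k: int) -> float:
    """Probability that a majority of k independent rounds (k odd) is correct."""
    return sum(math.comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(k // 2 + 1, k + 1))


p = chsh_win_probability()
print(f"{p:.5f}")  # 0.85355
for k in (1, 3, 15):
    print(k, majority_accuracy(p, k))
```

Majority voting only approaches 1 asymptotically, so a scheme that reaches exactly 100% must combine the extra pairs differently than this independent-vote model does.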

Neutralizing Hostile Drones with Surveillance Drones

In this paper, we discuss a technique to safeguard specific airspace from intruding drones with the help of surveillance drones. The idea is to use multiple surveillance drones to patrol the area looking for suspicious flying objects. The surveillance drones are trained to distinguish permissible drones from hostile drones using image recognition algorithms. Once a hostile drone is detected, the surveillance drones surround it, making it difficult to maneuver. In the meantime, our automated drone attack framework launches cyber-attacks against the hostile drone to bring it down.

Blockchain-based Proof of Existence (PoE) Framework using Ethereum Smart Contracts

In recent years, Blockchain, underpinned by distributed ledger technology (DLT), has been touted as the next disruptive technology with the potential to revolutionise various industry verticals and horizontals. Plagiarism and infringements of intellectual property, such as copyrights of artifacts and trade secrets, are often fought in courts of law. There is an inherent need to adduce reliable evidence to establish a prima facie tort case or even beyond. In this paper, we aim to leverage Blockchain technology to provide a digital transformation in the post-Covid world by offering a new platform to aid in the protection of one's intellectual property rights through a Proof of Existence (PoE) framework using Ethereum smart contracts. We have developed a seamless web platform that lets users experience a simple yet secure Proof of Existence (PoE) service by allowing them to (i) certify, (ii) manage and (iii) view their documents securely through a digital portfolio. This PoE service leverages Blockchain characteristics to provide a reliable and transparent means of recording tamper-proof evidence of copyright information, with a timestamp as proof of existence, for all its transactions through smart contracts.
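The core of a Proof of Existence service is a write-once mapping from a document's cryptographic digest to an owner and timestamp; only the hash is published, so the document itself stays private. The sketch below is an in-memory Python stand-in for that logic, not the paper's Ethereum contract; the class and method names are hypothetical, and a real deployment would take the timestamp from the block rather than the local clock.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    owner: str
    timestamp: float


class ProofOfExistenceRegistry:
    """In-memory stand-in for a PoE contract: digest -> (owner, timestamp), write-once."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def certify(self, owner: str, document: bytes, now: Optional[float] = None) -> str:
        digest = hashlib.sha256(document).hexdigest()  # only the hash is recorded
        if digest in self._records:
            raise ValueError("document already certified")  # first registration wins
        self._records[digest] = Record(owner, time.time() if now is None else now)
        return digest

    def verify(self, document: bytes) -> Optional[Record]:
        return self._records.get(hashlib.sha256(document).hexdigest())


registry = ProofOfExistenceRegistry()
registry.certify("alice", b"design-spec v1", now=1_600_000_000.0)
print(registry.verify(b"design-spec v1"))  # Record(owner='alice', timestamp=1600000000.0)
print(registry.verify(b"design-spec v2"))  # None: any change to the document changes its hash
```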

IIoT-ARAS: IIoT/ICS Automated Risk Assessment System for Prediction and Prevention

As IT/OT convergence continues to evolve, traditionally isolated ICS/OT systems are increasingly exposed to a myriad of online and offline threats. Although IIoT improves reachability in ICS, data analytics, ease of access and decision making, it unwittingly opens the ICS environment to attackers. IIoT introduces multiple entry points into a previously isolated system, which used to protect itself via air-gapping and risk avoidance strategies. This study explores a comprehensive mapping of threats and risks for IT/OT convergence. Additionally, we propose IIoT-ARAS, an automated risk assessment system based on the OCTAVE Allegro and ISO/IEC 27030 methodologies. IIoT-ARAS is designed to be agentless, with minimal interruption to the OT environment. Furthermore, the system performs automated regular asset inventory checks, threshold optimization, probability computation, risk evaluation, and contingency plan configuration.
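A risk evaluation step of this kind can be sketched with the relative risk score used in OCTAVE Allegro worksheets: for each affected impact area, multiply the area's priority ranking by an impact value (low = 1, moderate = 2, high = 3) and sum. Weighting that score by a threat probability and comparing it to a threshold that triggers a contingency plan is an assumption made for illustration, not IIoT-ARAS's documented formula; the area rankings, asset names, and threshold are hypothetical.

```python
# Impact levels scored as in OCTAVE Allegro worksheets (low=1, moderate=2, high=3).
IMPACT_VALUE = {"low": 1, "moderate": 2, "high": 3}

# Hypothetical ranking of impact areas for an OT site (higher = more important).
AREA_RANKING = {"safety": 5, "production": 4, "reputation": 3, "financial": 2, "regulatory": 1}


def relative_risk_score(impacts: dict[str, str]) -> int:
    """Sum of (impact-area ranking x impact value) over the affected areas."""
    return sum(AREA_RANKING[area] * IMPACT_VALUE[level] for area, level in impacts.items())


def evaluate(asset: str, probability: float, impacts: dict[str, str], threshold: float) -> tuple[str, float, bool]:
    """Weight the impact score by the threat probability; flag assets needing a contingency plan."""
    score = probability * relative_risk_score(impacts)
    return asset, score, score >= threshold


print(evaluate("plc-line-1", 0.4, {"safety": "high", "production": "high", "financial": "moderate"}, 10.0))
print(evaluate("hmi-office", 0.5, {"reputation": "low"}, 10.0))
```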

OBFUS: An Obfuscation Tool for Software Copyright and Vulnerability Protection

In this paper, we propose OBFUS, a web-based tool that can easily apply obfuscation techniques to programs written in high-level and low-level programming languages. OBFUS's high-level obfuscator parses the source code and obfuscates it, layering obfuscations to produce more complex results. OBFUS's low-level obfuscator decompiles binary programs into LLVM IR, obfuscates the LLVM IR program, and recompiles it into an obfuscated binary program.
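Source-level obfuscation typically starts from a parse tree. As one common technique, identifier renaming, the sketch below uses Python's standard `ast` module to rename every function, parameter, and assigned variable defined in a snippet to meaningless names (`_v0`, `_v1`, ...) and then regenerates source. It is not OBFUS's implementation and works on Python rather than OBFUS's supported languages; it also ignores keyword-argument call sites and `global`/`nonlocal` statements, which a real tool would have to handle.

```python
import ast


def _bound_names(tree: ast.AST) -> set[str]:
    """Names the source itself defines: functions, parameters, assigned variables."""
    names: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names


class Renamer(ast.NodeTransformer):
    """Replace each locally defined identifier with a meaningless one (_v0, _v1, ...)."""

    def __init__(self, targets: set[str]) -> None:
        self.targets = targets
        self.mapping: dict[str, str] = {}

    def _rename(self, name: str) -> str:
        if name not in self.targets:
            return name  # leave builtins, imported modules and attributes untouched
        return self.mapping.setdefault(name, f"_v{len(self.mapping)}")

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._rename(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._rename(node.arg)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self._rename(node.id)
        return node


def obfuscate(source: str) -> str:
    tree = ast.parse(source)
    tree = Renamer(_bound_names(tree)).visit(tree)
    return ast.unparse(tree)  # requires Python 3.9+


src = "def area(width, height):\n    result = width * height\n    return result\n"
print(obfuscate(src))  # same behavior, identifiers replaced by _v0, _v1, ...
```

Layering, as the abstract describes, would mean running several such transformers (renaming, opaque predicates, control-flow flattening) one after another on the same tree.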

Object Allocation Pattern as an Indicator for Maliciousness - An Exploratory Analysis

Traditionally, Android malware is analyzed using static or dynamic analysis. Although static techniques are often fast, they cannot be applied to classify obfuscated samples or malware with a dynamic payload. In comparison, the dynamic approach can examine obfuscated variants but often incurs significant runtime overhead when collecting all relevant behavioral data. This paper conducts an exploratory analysis of memory forensics as an alternative technique for extracting feature vectors for an Android malware classifier. We used the reconstructed per-process object allocation network to identify distinguishable patterns in malware and benign applications. Our evaluation results indicate that the network structural features in the malware category are unique compared to the benign dataset, and thus features extracted from the remnants of in-memory allocated objects can be used for robust Android malware classification.
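Once an object allocation network has been reconstructed from memory, graph-level structural features can be computed as a fixed-length vector for a classifier. The sketch below assumes a hypothetical edge semantics (allocating object to allocated object) and an illustrative feature set (size, directed density, degree statistics); the paper's actual features may differ.

```python
from collections import Counter


def structural_features(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Graph-level features of an object allocation network (edge: allocating -> allocated object)."""
    unique_edges = set(edges)
    nodes = {node for edge in unique_edges for node in edge}
    n, m = len(nodes), len(unique_edges)
    out_degree = Counter(src for src, _ in unique_edges)
    in_degree = Counter(dst for _, dst in unique_edges)
    return {
        "nodes": n,
        "edges": m,
        "density": m / (n * (n - 1)) if n > 1 else 0.0,  # directed graph density
        "mean_out_degree": m / n if n else 0.0,
        "max_out_degree": max(out_degree.values(), default=0),
        "max_in_degree": max(in_degree.values(), default=0),
    }


# Hypothetical edges recovered from one process's heap.
print(structural_features([("Activity", "Handler"), ("Activity", "Buffer"), ("Handler", "Buffer")]))
# {'nodes': 3, 'edges': 3, 'density': 0.5, 'mean_out_degree': 1.0, 'max_out_degree': 2, 'max_in_degree': 2}
```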

Attribute-Based Access Control for NoSQL Databases

NoSQL databases have gained popularity in recent years for their ability to manage high volumes of unstructured data efficiently. This requires such databases to have strict data security mechanisms. Attribute-Based Access Control (ABAC) has been widely appreciated for its high flexibility and dynamic nature. We present an approach for integrating ABAC into NoSQL databases, specifically MongoDB, which typically support only Role-Based Access Control (RBAC). We also discuss an implementation of ABAC in MongoDB and its performance results, while emphasizing that the approach can be extended to other NoSQL databases as well.
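In ABAC, a decision depends on attributes of the subject, the resource, and the environment rather than on a role alone. The sketch below evaluates a hypothetical default-deny policy in plain Python and applies it at document granularity, as a query layer in front of a document store might; it does not use MongoDB's API and is not the paper's implementation. The clinician/department/working-hours rule is invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

Attributes = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    action: str
    # condition over (subject, resource, environment) attributes
    condition: Callable[[Attributes, Attributes, Attributes], bool]


def permitted(rules: list[Rule], action: str, subject: Attributes, resource: Attributes, env: Attributes) -> bool:
    """Default deny: allow only if some rule for this action holds."""
    return any(r.action == action and r.condition(subject, resource, env) for r in rules)


def authorized_view(rules: list[Rule], action: str, subject: Attributes,
                    documents: list[Attributes], env: Attributes) -> list[Attributes]:
    """Document-level filtering of a result set before it is returned to the caller."""
    return [doc for doc in documents if permitted(rules, action, subject, doc, env)]


RULES = [
    # Hypothetical policy: clinicians read records of their own department during day shifts.
    Rule("read", lambda s, r, e: s.get("role") == "clinician"
         and s.get("dept") == r.get("dept") and 8 <= e.get("hour", -1) < 18),
]

docs = [{"_id": 1, "dept": "cardiology"}, {"_id": 2, "dept": "oncology"}]
alice = {"role": "clinician", "dept": "cardiology"}
print([d["_id"] for d in authorized_view(RULES, "read", alice, docs, {"hour": 10})])  # [1]
print(authorized_view(RULES, "read", alice, docs, {"hour": 22}))  # []
```

Expressing the same condition as an RBAC role would require a separate role per department and shift, which is the flexibility argument the abstract makes for ABAC.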

A Multi Perspective Access Control in a Smart Home

Existing methods to manage privileges in smart home systems have not considered allocating privileges to users based on (i) the relationship of the user with the device, (ii) the location and risk of the device, and (iii) the current environment. In this work, we take a multi-perspective view of the problem of sharing fine-grained privileges of IoT devices among multiple users in a smart home. We propose the concepts of user roles (a subset of privileges specific to each device), tasks, and security levels (labels for each privilege) to allot the right privileges to users. This limits the exploitation of privileges assigned to legitimate insiders of the house. Our work thus matches the aspirations of previous surveys on building a comprehensive access control system to manage privileges in a shared smart home.
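Two of the proposed concepts, per-device user roles and per-privilege security levels, can be combined into a simple check, sketched below with an environmental adjustment. The devices, users, clearances, and the rule that raises the required level by one when the owner is away are all hypothetical; the paper's task concept is omitted.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Security level label for each (device, privilege) pair.
PRIVILEGE_LEVEL = {
    ("front_lock", "view_status"): Level.LOW,
    ("front_lock", "unlock"): Level.HIGH,
    ("thermostat", "set_temp"): Level.MEDIUM,
}

# Per-device user roles: the subset of that device's privileges each user holds.
USER_ROLE = {
    ("alice", "front_lock"): {"view_status", "unlock"},
    ("bob", "front_lock"): {"view_status"},
    ("bob", "thermostat"): {"set_temp"},
}

CLEARANCE = {"alice": Level.HIGH, "bob": Level.MEDIUM}


def allowed(user: str, device: str, privilege: str, owner_home: bool) -> bool:
    """Grant a privilege only if it is in the user's role for this device and cleared at its level."""
    if privilege not in USER_ROLE.get((user, device), set()):
        return False
    required = PRIVILEGE_LEVEL[(device, privilege)]
    if not owner_home:
        # hypothetical environment rule: demand one level more when the owner is away
        required = Level(min(Level.HIGH, required + 1))
    return CLEARANCE[user] >= required


print(allowed("bob", "front_lock", "unlock", True))  # False: not in bob's role for the lock
print(allowed("bob", "thermostat", "set_temp", True))  # True
print(allowed("bob", "thermostat", "set_temp", False))  # False: required level raised to HIGH
```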

Assessing the Alignment of Social Robots with Trustworthy AI Design Guidelines: A Preliminary Research Study

The last few years have seen a strong movement supporting the need for intelligent consumer products to align with specific design guidelines for trustworthy artificial intelligence (AI). This global movement has led to multiple institutional recommendations for the ethically aligned, trustworthy design of AI-driven technologies, like consumer robots and autonomous vehicles. There has been prior research towards finding security and privacy related vulnerabilities within various types of social robots. However, none of these previous works has studied the implications of these vulnerabilities in terms of the robot design aligning with trustworthy AI. In an attempt to address this gap in existing literature, we have performed a unique research study with two social robots, Zümi and Cozmo. In this study, we have explored flaws within the robots' systems, and have analyzed these flaws to assess the overall alignment of the robot system design with the IEEE global standards on the design of ethically aligned trustworthy autonomous intelligent systems (IEEE A/IS Standards). Our initial research shows that the vulnerabilities and design weaknesses we found in these robots can lead to hacking, injection attacks, and other malfunctions that might affect the technology users negatively. We test the intelligent functionalities in these robots to find faults, and conduct a preliminary examination of how these flaws can potentially result in non-adherence to the IEEE A/IS principles. Through this novel study, we demonstrate our approach towards determining the alignment of social robots with benchmarks for trustworthy AI, thereby creating a case for prospective design improvements to address unique risks leading to issues with robot ethics and trust.

Towards Efficient Labeling of Network Incident Datasets Using Tcpreplay and Snort

Research on network intrusion detection (NID) requires a large amount of traffic data with reliable labels indicating which packets are associated with particular network attacks. In this paper, we implement a prototype of an automated system to create labeled packet datasets for NID research. By re-transmitting pre-captured packet data in a controlled network environment with a pre-installed network intrusion detection system, the system automatically assigns labels to attack packets within the packet data. In a feasibility study, we investigate factors that may influence the detection accuracy of attack packets and show an example of using the prototype to label a packet file. Finally, we show an efficient way to locate the packets associated with issued NID alerts using this prototype.
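The final step, locating the packets associated with each IDS alert, amounts to matching an alert's flow 5-tuple and timestamp against the replayed packets. The sketch below assumes the offset between original capture time and replay time is known and uses a hypothetical 0.5-second matching window; it does not parse real Snort alert output or pcap files, and it treats flows as unidirectional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    ts: float  # capture timestamp in the original pcap
    src: str
    sport: int
    dst: str
    dport: int
    proto: str


@dataclass(frozen=True)
class Alert:
    ts: float  # alert timestamp observed during replay
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    sid: int  # IDS rule identifier


def label_packets(packets: list[Packet], alerts: list[Alert], replay_offset: float,
                  window: float = 0.5) -> dict[int, int]:
    """Map packet index -> rule id for packets whose flow and replayed time match an alert."""
    labels: dict[int, int] = {}
    for i, p in enumerate(packets):
        replay_ts = p.ts + replay_offset  # where this packet landed on the replay timeline
        for a in alerts:
            same_flow = (p.src, p.sport, p.dst, p.dport, p.proto) == (a.src, a.sport, a.dst, a.dport, a.proto)
            if same_flow and abs(a.ts - replay_ts) <= window:
                labels[i] = a.sid
                break
    return labels


packets = [
    Packet(0.0, "10.0.0.5", 51000, "10.0.0.9", 53, "udp"),
    Packet(0.1, "10.0.0.7", 40000, "10.0.0.9", 80, "tcp"),
    Packet(5.0, "10.0.0.7", 40000, "10.0.0.9", 80, "tcp"),
]
alerts = [Alert(100.12, "10.0.0.7", 40000, "10.0.0.9", 80, "tcp", sid=1000001)]
print(label_packets(packets, alerts, replay_offset=100.0))  # {1: 1000001}
```

The time window matters because a single long-lived flow may contain both attack and benign packets; matching on the 5-tuple alone would label all of them.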


Session details: Panels

AI for Security and Security for AI

On one side, the security industry has successfully adopted some AI-based techniques, with uses ranging from mitigating denial-of-service attacks to forensics, intrusion detection systems, homeland security, critical infrastructure protection, sensitive information leakage, access control, and malware detection. On the other side, we see the rise of Adversarial AI, whose core idea is to subvert AI systems for fun and profit. The methods used to build AI systems are systematically vulnerable to a new class of attacks, and adversaries are exploiting these vulnerabilities to alter AI system behavior to serve malicious goals. This panel discusses some of these aspects.

Is there a Security Mindset and Can it be Taught?

The field of cybersecurity is highly dynamic and needs continuous evolution. This requires not only formal and informal education but also the development of a security mindset in our future workforce. This panel elaborates on some of these aspects.