CODASPY '20: Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy

SESSION: Invited Talk

Can AI be for Good in the Midst of Cyber Attacks and Privacy Violations?: A Position Paper

Artificial Intelligence (AI) is affecting every aspect of our lives from healthcare to finance to driving to managing the home. Sophisticated machine learning techniques with a focus on deep learning are being applied successfully to detect cancer, to make the best choices for investments, to determine the most suitable routes for driving as well as to efficiently manage the electricity in our homes. We expect AI to have even more influence as advances are made with technology as well as in learning, planning, reasoning and explainable systems. While these advances will greatly benefit humanity, organizations such as the United Nations have embarked on initiatives such as "AI for Good" and we can expect to see more emphasis on applying AI for the good of humanity especially in developing countries. However, the question that needs to be answered is: Can AI be for Good when the AI techniques can be attacked and the AI techniques themselves can cause privacy violations? This position paper will provide an overview of this topic with protecting children and children's rights as an example.

SESSION: Session 1: Trusted Environment

ProximiTEE: Hardened SGX Attestation by Proximity Verification

Intel SGX enables protected enclaves on untrusted computing platforms. An important part of SGX is its remote attestation mechanism that allows a remote verifier to check that the expected enclave was correctly initialized before provisioning secrets to it. However, SGX attestation is vulnerable to relay attacks where the attacker, using malicious software on the target platform, redirects the attestation and therefore the provisioning of confidential data to a platform that he physically controls. Although relay attacks have been known for a long time, their consequences have not been carefully examined. In this paper, we analyze relay attacks and show that redirection increases the adversary's abilities to compromise the enclave in several ways, enabling for instance physical and digital side-channel attacks that would not be otherwise possible.

We propose ProximiTEE, a novel solution to prevent relay attacks. Our solution is based on a trusted embedded device that is attached to the target platform. Our device verifies the proximity of the attested enclave, thus allowing attestation to the intended enclave regardless of malicious software, such as a compromised OS, on the target platform. The device also performs periodic proximity verification which enables secure enclave revocation by detaching the device. Although proximity verification has been proposed as a defense against relay attacks before, this paper is the first to experimentally demonstrate that it can be secure and reliable for TEEs like SGX. Additionally, we consider a stronger adversary that has obtained leaked SGX attestation keys and emulates an enclave on the target platform. To address such emulation attacks, we propose a second solution where the target platform is securely initialized by booting it from the attached embedded device.

MOSE: Practical Multi-User Oblivious Storage via Secure Enclaves

Multi-user oblivious storage allows users to access their shared data on the cloud while retaining access pattern obliviousness and data confidentiality simultaneously. Most secure and efficient oblivious storage systems focus on the utilization of the maximum network bandwidth in serving concurrent accesses via a trusted proxy. However, since the proxy executes a standard ORAM protocol over the network, the performance is capped by the network bandwidth and latency. Moreover, some important features such as access control and security against active adversaries have not been thoroughly explored in such proxy settings. In this paper, we propose MOSE, a multi-user oblivious storage system that is efficient and enjoys some desirable security properties. Our main idea is to harness a secure enclave, namely Intel SGX, residing on the untrusted storage server to execute proxy logic, thereby minimizing the network bottleneck of proxy-based designs. In this regard, we address various technical design challenges such as memory constraints, side-channel attacks and scalability issues when enabling proxy logic in the secure enclave. We present a formal security model and analysis for secure enclave multi-user ORAM with access control. We optimize MOSE to boost its throughput in serving concurrent requests. We implemented MOSE and evaluated its performance on commodity hardware. Our evaluation confirmed the efficiency of MOSE: it achieves approximately two orders of magnitude higher throughput than the state-of-the-art proxy-based design, and its performance scales in proportion to the available system resources.

DeepTrust: An Automatic Framework to Detect Trustworthy Users in Opinion-based Systems

Opinion spamming has recently gained attention as more and more online platforms rely on users' opinions to help potential customers make informed decisions on products and services. Yet, while work on opinion spamming abounds, most efforts have focused on detecting an individual reviewer as spammer or fraudulent. We argue that this is no longer sufficient, as reviewers may contribute to an opinion-based system in various ways, and their input could range from highly informative to noisy or even malicious. In an effort to improve the detection of trustworthy individuals within opinion-based systems, in this paper, we develop a supervised approach to differentiate among different types of reviewers. Particularly, we model the problem of detecting trustworthy reviewers as a multi-class classification problem, wherein users may be fraudulent, unreliable or uninformative, or trustworthy. We note that expanding from the classic binary classification of trustworthy/untrustworthy (or malicious) reviewers is an interesting and challenging problem. Some untrustworthy reviewers may behave similarly to reliable reviewers, and yet be driven by dark motives. In contrast, other untrustworthy reviewers may not be malicious but rather lazy or unable to contribute to the common knowledge of the reviewed item. Our proposed method, DeepTrust, relies on a deep recurrent neural network that provides embeddings aggregating temporal information: we consider users' behavior over time, as they review multiple products. We model the interactions of reviewers and the products they review using a temporal bipartite graph and consider the context of each rating by including other reviewers' ratings of the same items. We carry out extensive experiments on a real-world dataset of Amazon reviewers, with known ground truth about spammers and fraudulent reviews. Our results show that DeepTrust can detect trustworthy, uninformative, and fraudulent users with an F1-measure of 0.93. Also, we drastically improve on detecting fraudulent reviewers (AUROC of 0.97 and average precision of 0.99 when combining DeepTrust with the F&G algorithm) as compared to the state-of-the-art REV2 method (AUROC of 0.79 and average precision of 0.48). Further, DeepTrust is robust to cold-start users and outperforms all existing baselines.

TrustAV: Practical and Privacy Preserving Malware Analysis in the Cloud

While the number of connected devices is constantly growing, we observe an increased incident rate of cyber attacks that target user data. Typically, personal devices contain the most sensitive information regarding their users, so there is no doubt that they can be a very valuable target for adversaries. Typical defense solutions to safeguard user devices and data are based on malware analysis mechanisms. To amortize the processing and maintenance overheads, the outsourcing of network inspection mechanisms to the cloud has become very popular recently. However, the majority of such cloud-based applications offer limited privacy-preserving guarantees for data processing in third-party environments. In this work, we propose TrustAV, a practical cloud-based malware detection solution intended for a wide range of device types. TrustAV is able to offload the processing of malware analysis to a remote server, where it is executed entirely inside hardware-supported secure enclaves. By doing so, TrustAV is able to shield the transfer and processing of user data even in untrusted environments with tolerable performance overheads, ensuring that private user data are never exposed to malicious entities or honest-but-curious providers. TrustAV also utilizes various techniques to overcome the performance overheads introduced by the Intel SGX technology and to reduce the required enclave memory (a limiting factor for malware analysis executed in secure enclave environments), offering up to 3x better performance.

SESSION: Session 2: Access Control and Authentication

CREHMA: Cache-aware REST-ful HTTP Message Authentication

Scalability and security are two important elements of contemporary distributed software systems. The Web vividly shows that, while complying with the constraints defined by the architectural style REST, the layered design of software with intermediate systems enables scaling at large. Intermediaries such as caches, however, interfere with the security guarantees of the industry standard for protecting data in transit on the Web, TLS, as in these circumstances the TLS channel already terminates at the intermediate system's server. For more in-depth defense strategies, service providers require message-oriented security means in addition to TLS. These are hardly available and only in the form of HTTP signature schemes that do not take caches into account either. In this paper we introduce CREHMA, a REST-ful HTTP message signature scheme that guarantees the integrity and authenticity of Web assets from end to end while simultaneously allowing service providers to enjoy the benefits of Web caches. Decisively, CREHMA achieves these guarantees without having to trust the integrity of the cache and without requiring changes to existing Web caching systems. In extensive experiments we evaluated CREHMA and found that it only introduces marginal impacts on metrics such as latency and data expansion while providing integrity protection from end to end. CREHMA thus extends the possibilities of service providers to achieve an appropriate balance between scalability and security.
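
The general idea of a message-level signature that survives a cache can be sketched as follows. This is a minimal illustration only, not the CREHMA scheme itself: the shared-key HMAC, the `SIGNED_HEADERS` selection, and the function names are all assumptions made for the example, since the abstract does not specify which message parts are covered or which signature algorithm is used.

```python
import hashlib
import hmac

# Hypothetical choice of response headers covered by the signature.
# Headers that a cache may add or change (e.g. Age) are deliberately left out.
SIGNED_HEADERS = ("content-type", "cache-control", "etag")


def signing_input(status: int, headers: dict, body: bytes) -> bytes:
    """Build a canonical byte string from status, selected headers, and body hash."""
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    lines = [str(status)]
    for name in SIGNED_HEADERS:
        lines.append(f"{name}:{lowered.get(name, '')}")
    lines.append(hashlib.sha256(body).hexdigest())
    return "\n".join(lines).encode("utf-8")


def sign(key: bytes, status: int, headers: dict, body: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the canonical signing input."""
    return hmac.new(key, signing_input(status, headers, body), hashlib.sha256).hexdigest()


def verify(key: bytes, status: int, headers: dict, body: bytes, signature: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign(key, status, headers, body)
    return hmac.compare_digest(expected, signature)
```

In this sketch a cache that adds an `Age` header does not invalidate the tag, because `Age` is not in the signed set, whereas any change to the body or to a signed header makes `verify` return `False`.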

Tap-Pair: Using Spatial Secrets for Single-Tap Device Pairing of Augmented Reality Headsets

Augmented Reality (AR) headsets, which allow for a realistic integration between the physical environment and virtual objects, are rapidly coming to customer and enterprise markets. This is largely because they enable a broad range of multi-user applications in which all participants experience the same augmentation of their natural surroundings. However, despite their increasing expansion, there currently exist no implemented methods for secure ad-hoc device pairing of multiple AR headsets. Given the importance of multi-user experiences for future applications of this technology, in this paper we propose two distinct ways to establish secure ad-hoc connections that rely only on typical user interactions in AR: gazing and tapping either at the location of a shared point on the wall or towards the user with whom one wants to connect. To show the feasibility and deployability of the proposed system on existing technology, we build a prototype of Tap-Pair, a system for ad-hoc pairing of AR headsets that is based on Password Authenticated Key Exchange protocols, requires only user interactions that are common in AR, and can be extended to more than two users. The experimental evaluation of the Tap-Pair prototype in a series of measurements at three different locations confirms the feasibility of our proposal, showing that the system built with currently available augmented reality headsets indeed achieves successful pairing in more than 90% of attempts, while keeping the probability of the attacker's success lower than 1e-3.

Admin-CBAC: An Administration Model for Category-Based Access Control

We present Admin-CBAC, an administrative model for Category-Based Access Control (CBAC). Since most of the access control models in use nowadays are instances of CBAC, in particular the popular RBAC and ABAC models, from Admin-CBAC we derive administrative models for RBAC and ABAC too. We define Admin-CBAC using Barker's metamodel, and use its axiomatic semantics to derive properties of administrative policies. Using an abstract operational semantics for administrative actions, we show how properties (such as safety, liveness and effectiveness of policies) and constraints (such as separation of duties) can be checked, and discuss the impact of policy changes. Although the most interesting properties of policies are generally undecidable in dynamic access control models, we identify particular cases where reachability-based properties are decidable and can be checked using our operational semantics, generalising previous results for RBAC and ABACα.

SESSION: Session 3: Adversarial Machine Learning

Random Spiking and Systematic Evaluation of Defenses Against Adversarial Examples

Image classifiers often suffer from adversarial examples, which are generated by strategically adding a small amount of noise to input images to trick classifiers into misclassification. Over the years, many defense mechanisms have been proposed, and different researchers have made seemingly contradictory claims on their effectiveness. We present an analysis of possible adversarial models, and propose an evaluation framework for comparing different defense mechanisms. As part of the framework, we introduce a more powerful and realistic adversary strategy. Furthermore, we propose a new defense mechanism called Random Spiking (RS), which generalizes dropout and introduces random noise in the training process in a controlled manner. Evaluations under our proposed framework suggest RS delivers better protection against adversarial examples than many existing schemes.
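
To make the phrase "generalizes dropout" concrete, the sketch below contrasts standard dropout with one possible reading of random spiking, in which a selected activation is replaced by random noise rather than by zero. The abstract does not give the noise distribution or the layer at which noise is injected, so the uniform range `[low, high]` and both function names are assumptions for illustration.

```python
import random


def dropout(activations, p, rng):
    """Standard (inverted) dropout: zero each activation with probability p,
    and rescale the survivors by 1 / (1 - p)."""
    keep = 1.0 - p
    return [0.0 if rng.random() < p else a / keep for a in activations]


def random_spike(activations, p, low, high, rng):
    """Assumed random-spiking variant: with probability p, replace an
    activation with uniform noise drawn from [low, high]."""
    return [rng.uniform(low, high) if rng.random() < p else a for a in activations]


rng = random.Random(0)           # fixed seed so runs are repeatable
layer_output = [0.5, -1.0, 2.0]  # toy activations of one layer
spiked = random_spike(layer_output, p=0.3, low=-1.0, high=1.0, rng=rng)
```

Under this reading, replacing a unit with a random value instead of zero forces the network to tolerate arbitrary perturbations in intermediate activations during training, not only missing ones.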

Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation

Deep learning models have consistently outperformed traditional machine learning models in various classification tasks, including image classification. As such, they have become increasingly prevalent in many real world applications including those where security is of great concern. Such popularity, however, may attract attackers to exploit the vulnerabilities of the deployed deep learning models and launch attacks against security-sensitive applications. In this paper, we focus on a specific type of data poisoning attack, which we refer to as a backdoor injection attack. The main goal of the adversary performing such attack is to generate and inject a backdoor into a deep learning model that can be triggered to recognize certain embedded patterns with a target label of the attacker's choice. Additionally, a backdoor injection attack should occur in a stealthy manner, without undermining the efficacy of the victim model. Specifically, we propose two approaches for generating a backdoor that is hardly perceptible yet effective in poisoning the model. We consider two attack settings, with backdoor injection carried out either before model training or during model updating. We carry out extensive experimental evaluations under various assumptions on the adversary model, and demonstrate that such attacks can be effective and achieve a high attack success rate (above 90%) at a small cost of model accuracy loss with a small injection rate, even under the weakest assumption wherein the adversary has no knowledge either of the original training data or the classifier model.

Explore the Transformation Space for Adversarial Images

Deep learning models are vulnerable to adversarial examples. Most current adversarial attacks add pixel-wise perturbations restricted to some \(L^p\)-norm, and defense models are also evaluated on adversarial examples restricted inside \(L^p\)-norm balls. However, we wish to explore adversarial examples that exist beyond \(L^p\)-norm balls and their implications for attacks and defenses. In this paper, we focus on adversarial images generated by transformations. We start with color transformation and propose two gradient-based attacks. Since the \(L^p\)-norm is inappropriate for measuring image quality in the transformation space, we use the similarity between transformations and the Structural Similarity Index. Next, we explore a larger transformation space consisting of combinations of color and affine transformations. We evaluate our transformation attacks on three data sets (CIFAR10, SVHN, and ImageNet) and their corresponding models. Finally, we perform retraining defenses to evaluate the strength of our attacks. The results show that transformation attacks are powerful. They find high-quality adversarial images that have higher transferability and misclassification rates than C&W's \(L^p\) attacks, especially at high confidence levels. They are also significantly harder to defend against by retraining than C&W's \(L^p\) attacks. More importantly, exploring different attack spaces makes it more challenging to train a universally robust model.

SESSION: Session 4: Privacy I

AuthPDB: Authentication of Probabilistic Queries on Outsourced Uncertain Data

Query processing over uncertain data has gained much attention recently. Due to the high computational complexity of query evaluation on uncertain data, the data owner can outsource her data to a server that provides query evaluation as a service. However, a dishonest server may return cheap (and incorrect) query answers, hoping that the client who has weak computational power cannot catch the incorrect results. To address the integrity issue, in this paper, we design AuthPDB, a framework that supports efficient authentication of query evaluation for both all-answer and top-k queries on outsourced probabilistic databases. Our empirical results on real-world datasets demonstrate the effectiveness and efficiency of AuthPDB.

A Baseline for Attribute Disclosure Risk in Synthetic Data

The generation of synthetic data is widely considered a viable method for alleviating privacy concerns and for reducing identification and attribute disclosure risk in micro-data. The records in a synthetic dataset are artificially created and thus do not directly relate to individuals in the original data in terms of a 1-to-1 correspondence. As a result, inferences about said individuals appear to be infeasible and, simultaneously, the utility of the data may be kept at a high level. In this paper, we challenge this belief by interpreting the standard attacker model for attribute disclosure as a classification problem. We show how disclosure risk measures presented in recent publications may be compared to or even be reformulated as machine learning classification models. Our overall goal is to empirically analyze attribute disclosure risk in synthetic data and to discuss its close relationship to data utility. Moreover, we improve the baseline for attribute disclosure risk from the attacker's perspective by applying variants of the RadiusNearestNeighbor and the EnsembleVote classifier.
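
Viewing attribute disclosure as classification can be illustrated with a small radius-based nearest-neighbor attacker. This is a hedged sketch, not the paper's classifier variants: the function name, the Euclidean distance, the majority vote, and the toy records are all assumptions. The attacker knows a target's quasi-identifiers and guesses the sensitive attribute from the synthetic records that fall within a given radius.

```python
import math
from collections import Counter


def radius_neighbors_predict(synthetic, target, radius):
    """Guess a sensitive attribute by majority vote among synthetic records
    whose quasi-identifiers lie within `radius` of the target's.

    synthetic: list of (quasi_identifier_tuple, sensitive_value) pairs
    target:    quasi-identifier tuple known to the attacker
    Returns None when no synthetic record lies within the radius.
    """
    votes = Counter(
        label for features, label in synthetic
        if math.dist(features, target) <= radius
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy synthetic release: (normalized age, normalized income) -> diagnosis flag.
synthetic_records = [
    ((1.0, 1.0), "yes"),
    ((1.2, 0.9), "yes"),
    ((5.0, 5.0), "no"),
]
guess = radius_neighbors_predict(synthetic_records, (1.1, 1.0), radius=0.5)
```

The attacker's accuracy over many targets then serves as an empirical disclosure-risk measure, which is why risk and predictive utility of the synthetic data are so closely tied.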

SESSION: Poster Session

Evaluation of Secure Remote Offering Service for Information Bank

An information bank is a reliable data ecosystem for the distribution and utilization of personal data (PD). In order to maintain the trust of individuals, sharing of personal data between businesses and the information bank is required to be secure. Therefore, the information bank must prevent abuse and leaking of personal data. There are several measures that can be taken to limit the damage imposed upon the individual in the case of data abuse or leakage. However, it is difficult to prevent abuse and leakage once the data has been shared with businesses. This work focuses on the security of an offering service on the information bank. The information bank offers useful information or services to individuals from businesses based on shared personal data. We devise a remote offering service enabling businesses to target individuals without sharing personal data. Moreover, we consider a malicious threat on the remote offering service and propose a mechanism for detecting this threat. The experimental results suggest that the proposed mechanism is useful in some real security use cases.

A Score Fusion Method by Neural Network in Multi-Factor Authentication

Recently, information security has attracted more interest from researchers. Personal authentication has become more important than ever, because authentication vulnerability is regarded as a problem. In cases where high confidentiality is required, multi-factor authentication, which combines multiple authentication factors, is often used. In this study, we focus on score fusion methods, which merge the authentication scores of the individual factors in multi-factor authentication. In conventional score fusion methods, the weighting of factors is fixed. Therefore, they are not suitable when the factors that achieve high accuracy differ between users. We propose a user-dependent weighting score fusion method using a neural network. Our proposed method is evaluated in comparison with conventional score fusion methods. The result shows that the accuracy of our proposed method is higher than that of conventional methods.
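
The difference between fixed and user-dependent weighting can be sketched with a simple weighted-sum fusion. This is only an illustration under stated assumptions: the factor names, the weight values, the threshold, and the function names are hypothetical, and the per-user weights are set by hand here, whereas the paper learns them with a neural network.

```python
def fuse(scores, weights):
    """Weighted-sum score fusion, normalized by the total weight used."""
    total = sum(weights[factor] for factor in scores)
    return sum(scores[factor] * weights[factor] for factor in scores) / total


# Conventional setting: one fixed weighting shared by all users (hypothetical values).
FIXED_WEIGHTS = {"face": 0.5, "voice": 0.5}

# User-dependent setting: hand-picked stand-ins for weights a model would learn.
PER_USER_WEIGHTS = {"alice": {"face": 0.8, "voice": 0.2}}


def authenticate(user, scores, threshold=0.6, per_user=None):
    """Accept when the fused score reaches the threshold."""
    weights = (per_user or {}).get(user, FIXED_WEIGHTS)
    return fuse(scores, weights) >= threshold


# Alice's face matcher is reliable for her, but her voice matcher is not.
alice_scores = {"face": 0.9, "voice": 0.2}
fixed_decision = authenticate("alice", alice_scores)
user_decision = authenticate("alice", alice_scores, per_user=PER_USER_WEIGHTS)
```

With the fixed weights the fused score is about 0.55 and the genuine user is rejected; with the user-dependent weights it is about 0.76 and she is accepted, which is the situation the abstract describes.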

Service-Oriented Modeling for Cyber Threat Analysis

The future of enterprise cyber defense is predictive and the use of model-based threat hunting is an enabling technique. Current approaches to threat modeling are predicated on the assumption that models are used to develop better software, rather than to describe threats to software being used as a service (SaaS). In this paper, we propose a service-modeling methodology that will facilitate pro-active cyber defense for organizations adopting SaaS. We model structural and dynamic elements to provide a robust representation of the defensible system. Our approach is validated by implementing a prototype and by using it to model a popular course management system.

DRAT: A Drone Attack Tool for Vulnerability Assessment

Drones are usually associated with the military but in recent times, they are also used for public and commercial interests such as transporting of goods, communications, agriculture, disaster mitigation and environment preservation. However, like any system, drones have vulnerabilities that can be exploited which can jeopardise a drone's operation and may lead to loss of lives, property and money. Thus drones deployed must be carefully evaluated and selected. Pen-testing is a way to assess the vulnerabilities of drones but it may require multiple commands, files or scripts. In this work, we propose a tool to allow easy pen-testing and assessment of drones. Vulnerability assessment of the DJI Mavic 2 Pro is discussed extensively as well. Future work includes addressing the vulnerabilities of other drones and expanding the tool to conduct pen-testing on other drones.

GRAMAC: A Graph Based Android Malware Classification Mechanism

Android malware analysis has been an active area of research as the number and types of Android malware have increased dramatically. Most previous works have used permission-based models, behavioral analysis, and code analysis to identify the family of a malware sample. Code analysis is weak against obfuscation and does not include real-time execution of the application. Behavioral analysis captures the runtime behavior but is weak when it comes to obfuscated applications. Permission-based models use only manifest files for analysing malware. In this paper, we propose a novel graph signature based malware classification mechanism. The proposed graph signature uses sensitive API calls to capture the flow of control, which helps to find a caller-callee relationship between the sensitive APIs and the nodes incident on them. A dataset of graph signatures of widely known malware families is then created. A new application's graph signature is compared with graph signatures in the dataset and the application is classified into the respective malware family or declared as goodware/unknown. Experiments with 15 malware families from the AMD dataset and a total of 400 applications gave an average accuracy of 0.97 with an error rate of 0.03.

A Performance Study on Cryptographic Algorithms for IoT Devices

Internet of Things (IoT) devices have grown in popularity over the past few years. These inter-connected devices collect and share data for automating industrial or household tasks. Despite its unprecedented growth, this paradigm currently faces many challenges that could hinder the deployment of such a system. These challenges include power, processing capabilities, and security. Our project aims to explore these areas by studying an IoT network that secures data using common cryptographic algorithms, such as AES, ChaCha20, RSA, and Twofish. We measure computational time and power usage while running these cryptographic algorithms on IoT devices. Our findings show that while Twofish is the most power-efficient, ChaCha20 is overall the most suitable one for IoT devices.

A Performance Comparison of WireGuard and OpenVPN

A fundamental problem that confronts virtual private network (VPN) applications is the overhead on throughput, ease of deployment and use, and overall utilization. WireGuard is a recently introduced light and secure cross-platform VPN application. It aims to simplify the process of setting up a secure connection while utilizing the multi-threading capability and minimizing the use of bandwidth. There have been several follow-up studies on WireGuard since its introduction, most of which focus on the security analysis of the protocol. Despite the author's claim that WireGuard has impressive wins over OpenVPN and IPsec, there is no rigorous analysis of its performance to date. This paper presents a performance comparison of WireGuard and its main rival OpenVPN on various metrics. We construct an automated test framework and deploy it on a total of eight nodes, including remote AWS instances and local virtual machines. Our test results clearly show two main advantages that WireGuard has over OpenVPN: its performance on multi-core machines and its light codebase.

Obscure: Information-Theoretically Secure, Oblivious, and Verifiable Aggregation Queries

We develop a secret-sharing-based prototype, entitled Obscure, that provides communication-efficient and information-theoretically secure algorithms for aggregation queries using multi-party computation (MPC). The query execution algorithms over secret-shared data are developed to deal with an honest-but-curious as well as a malicious server by providing result verification algorithms. Obscure prevents an adversary from learning the data, the query, and the tuple-identity satisfying the query.

Poisoning Attacks in Federated Learning: An Evaluation on Traffic Sign Classification

Federated Learning has recently gained traction as a means to analyze data without having to centralize it from initially distributed data sources. Generally, this is achieved by only exchanging and aggregating the parameters of the locally learned models. This enables better handling of sensitive data, e.g. of individuals, or business related content. Applications can further benefit from the distributed nature of the learning by using multiple computer resources, and eliminating network communication overhead. Adversarial Machine Learning in general deals with attacks on the learning process, and backdoor attacks are one specific attack that tries to break the integrity of a model by manipulating the behavior on certain inputs. Recent work has shown that despite the benefits of Federated Learning, the distributed setting also opens up new attack vectors for adversaries. In this paper, we thus specifically study this manipulation of the training process to embed a backdoor on the example of a dataset for traffic sign classification. Extending earlier work, we specifically include the setting of sequential learning, in addition to parallel averaging, and perform a broad analysis on a number of different settings.
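
The parameter aggregation step mentioned above, and why it is exposed to poisoning, can be sketched with a sample-size-weighted average in the style of federated averaging. The abstract does not state the exact aggregation rule used in the evaluation, so the function name, the weighting by local dataset size, and the toy parameter vectors are assumptions.

```python
def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighting each by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two honest clients agree on the parameters; a third submits a manipulated update.
honest_a = [1.0, 1.0]
honest_b = [1.0, 1.0]
poisoned = [10.0, -8.0]

clean_model = federated_average([honest_a, honest_b], [1, 1])
attacked_model = federated_average([honest_a, honest_b, poisoned], [1, 1, 1])
```

Here a single manipulated update moves the global parameters from `[1.0, 1.0]` to roughly `[4.0, -2.0]`. The server sees only parameters and never the local training data, so it cannot directly check how an update was produced.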

On the Impact of Word Representation in Hate Speech and Offensive Language Detection and Explanation

Online hate speech and offensive language have been widely recognized as critical social problems. To defend against these problems, several recent works have emerged that focus on the detection and explanation of hate speech and offensive language using machine learning approaches. Although these approaches are quite effective in the detection and explanation of hate speech and offensive language samples, they do not explore the impact of the representation of such samples. In this work, we introduce a novel, pronunciation-based representation of hate speech and offensive language samples to enable their detection with high accuracy. To demonstrate the effectiveness of our pronunciation-based representation, we extend an existing hate speech and offensive language defense model based on deep Long Short-term Memory (LSTM) neural networks by using our pronunciation-based representation of hate speech and offensive language samples to train this model. Our work finds that the pronunciation-based representation significantly reduces noise in the datasets and enhances the overall performance of the existing model.

Deployment-quality and Accessible Solutions for Cryptography Code Development

Cryptographic API misuses seriously threaten software security. Automatic screening of cryptographic misuse vulnerabilities has been a popular and important line of research over the years. However, the vision of producing a scalable detection tool that developers can routinely use to screen millions of lines of code has not been achieved yet. Our main technical goal is to attain a high-precision and high-throughput approach based on specialized program analysis. Specifically, we design inter-procedural program slicing on top of a new on-demand flow-, context- and field-sensitive data flow analysis. Our current prototype, named CryptoGuard, can detect a wide range of Java cryptographic API misuses with a precision of 98.61% when evaluated on 46 complex Apache Software Foundation projects (including Spark, Ranger, and Ofbiz). Our evaluation on 6,181 Android apps also generated many security insights. We created a comprehensive benchmark named CryptoAPI-Bench with 40-unit basic cases and 131-unit advanced cases for in-depth comparison with leading solutions (e.g., SpotBugs, CrySL, Coverity). To make CryptoGuard widely accessible, we are in the process of integrating CryptoGuard with the Software Assurance Marketplace (SWAMP). SWAMP is a popular no-cost service for continuous software assurance and static code analysis.

A Comprehensive Benchmark on Java Cryptographic API Misuses

Misuses of cryptographic APIs are prevalent in existing real-world Java code. Some open-source and commercial cryptographic vulnerability detection tools exist that capture misuses in Java programs. To analyze their efficiency and coverage, we build a comprehensive benchmark named CryptoAPI-Bench that consists of 171 unit test cases. The test cases include basic cases and complex cases. We assess four tools, namely SpotBugs, CryptoGuard, CrySL, and Coverity, using CryptoAPI-Bench and show their relative performance.

SESSION: Session 5: Mobile Security

Session details: Session 5: Mobile Security

Defensive Charging: Mitigating Power Side-Channel Attacks on Charging Smartphones

Mobile devices are increasingly relied upon in users' daily lives. This dependence supports a growing network of mobile device charging hubs in public spaces such as airports. Unfortunately, the public nature of these hubs makes them vulnerable to tampering. By embedding illicit power meters in the charging stations, an attacker can launch power side-channel attacks aimed at inferring user activity on smartphones (e.g., web browsing or typing patterns). In this paper, we present three power side-channel attacks that can be launched by an adversary during the phone charging process. Such attacks use machine learning to identify unique patterns hidden in the measured current draw and infer information about a user's activity. To defend against these attacks, we design and rigorously evaluate two defense mechanisms: one hardware-based and one software-based. The defenses randomly perturb the current drawn during charging, thereby masking the unique patterns of the user's activities. Our experiments show that the two defenses force each of the attacks to perform no better than random guessing. In practice, the user would only need to choose one of the defensive mechanisms to protect themselves against intrusions involving power draw analysis.

Dissecting Android Cryptocurrency Miners

Cryptojacking applications pose a serious threat to mobile devices. Due to their extensive computations, they deplete the battery quickly and can even damage the device. In this work we take a step towards combating this threat. We collected and manually verified a large dataset of Android mining apps. In this paper, we analyze the gathered miners and identify how they work, which libraries and APIs are most commonly used to facilitate their development, and which static features are typical for this class of applications. Further, we analyzed our dataset using VirusTotal. The majority of our samples are considered malicious by at least one VirusTotal scanner, but 16 apps are not detected by any engine, and at least 5 APKs had not previously been seen by the service. Mining code could be obfuscated or fetched at runtime, and there are many confusing miner-related apps that do not actually mine. Thus, static features alone are not sufficient for miner detection. We have collected a feature set of dynamic metrics for both miners and unrelated benign apps, and built a machine learning-based tool for dynamic detection. Our BrenntDroid tool is able to detect miners with 95% accuracy on our dataset.

Understanding Privacy Awareness in Android App Descriptions Using Deep Learning

Permissions are a key factor in Android to protect users' privacy. As it is often not obvious why applications require certain permissions, developer-provided descriptions in Google Play and third-party markets should explain to users how sensitive data is processed. Reliably recognizing whether app descriptions cover permission usage is challenging due to the lack of enforced quality standards and a variety of ways developers can express privacy-related facts. We introduce a machine learning-based approach to identify critical discrepancies between developer-described app behavior and permission usage. By combining state-of-the-art techniques in natural language processing (NLP) and deep learning, we design a convolutional neural network (CNN) for text classification that captures the relevance of words and phrases in app descriptions in relation to the usage of dangerous permissions. Our system predicts the likelihood that an app requires certain permissions and can warn about descriptions in which the requested access to sensitive user data and system features is textually not represented. We evaluate our solution on 77,000 real-world app descriptions and find that we can identify individual groups of dangerous permissions with a precision between 71% and 93%. To highlight the impact of individual words and phrases, we employ a model explanation algorithm and demonstrate that our technique can successfully bridge the semantic gap between described app functionality and its access to security- and privacy-sensitive resources.

FridgeLock: Preventing Data Theft on Suspended Linux with Usable Memory Encryption

To secure mobile devices, such as laptops and smartphones, against unauthorized physical data access, employing Full Disk Encryption (FDE) is a popular defense. This technique is effective if the device is always shut down when unattended. However, devices are often suspended instead of switched off. This leaves confidential data such as the FDE key, passphrases and user data in RAM which may be read out using cold boot, JTAG or DMA attacks. These attacks can be mitigated by encrypting the main memory during suspend. While this approach seems promising, it is not implemented on Windows or Linux. We present FridgeLock to add memory encryption on suspend to Linux. Our implementation as a Linux Kernel Module (LKM) does not require an admin to recompile the kernel. Using Dynamic Kernel Module Support (DKMS) allows for easy and fast deployment on existing Linux systems, where the distribution provides a prepackaged kernel and kernel updates. We tested our module on a range of 4.19 to 5.3 kernels and experienced a low performance impact, sustaining the system's usability. We hope that our tool leads to a more detailed evaluation of memory encryption in real world usage scenarios.

SESSION: Panel: A Vision for Winning the Cybersecurity Arms Race

Session details: Panel: A Vision for Winning the Cybersecurity Arms Race

Developing A Compelling Vision for Winning the Cybersecurity Arms Race

In cybersecurity there is a continuous arms race between the attackers and the defenders. In this panel, we investigate three key questions regarding this arms race. The first question is whether this arms race is winnable. Second, if the answer to the first question is affirmative, what steps do we need to take to win this race? Third, if the answer to the first question is negative, what is the justification for this, and what steps can we take to improve the state of affairs and raise the bar for the attackers significantly?

SESSION: Session 6: System Security

Session details: Session 6: System Security

The Good, the Bad and the (Not So) Ugly of Out-of-Band Authentication with eID Cards and Push Notifications: Design, Formal and Risk Analysis

Everyday life is permeated by new technologies allowing people to perform almost any kind of operation from their smart devices. Although this is amazing from a convenience perspective, it may result in several security issues concerning the need for authenticating users in a proper and secure way. Electronic identity cards (also called eID cards) play a very important role in this regard, due to the high level of assurance they provide in identification and authentication processes. However, authentication solutions relying on them are still uncommon and suffer from many usability limitations. In this paper, we thus present the design and implementation of a novel passwordless, multi-factor authentication protocol based on eID cards. To reduce known usability issues while keeping a high level of security, our protocol leverages push notifications and mobile devices equipped with NFC, which can be used to interact with eID cards. In addition, we evaluate the security of the protocol through a formal security analysis and a risk analysis, whose results emphasize the acceptable level of security.

n-m-Variant Systems: Adversarial-Resistant Software Rejuvenation for Cloud-Based Web Applications

Web servers are a popular target for adversaries as they are publicly accessible and often vulnerable to compromise. Compromises can go unnoticed for months, if not years, and recovery often involves a complete system rebuild. In this paper, we propose n-m-Variant Systems, an adversarial-resistant software rejuvenation framework for cloud-based web applications. We improve the state of the art by introducing a variable m that provides a knob for administrators to tune an environment to balance resource usage, performance overhead, and security guarantees. Using m, security guarantees can be tuned for seconds, minutes, days, or complete resistance. We design and implement an n-m-Variant System prototype to protect a MediaWiki PHP application serving dynamic content from an external SQL persistent storage. Our performance evaluation shows a throughput reduction of 65% for 108 seconds of resistance and 83% for 12 days of resistance to sophisticated adversaries, given appropriate resource allocation. Furthermore, we use theoretical analysis and simulation to characterize the impact of system parameters on resilience to adversaries. Through these efforts, our work demonstrates how properties of cloud-based servers can enhance the integrity of web servers.

ZeroLender: Trustless Peer-to-Peer Bitcoin Lending Platform

Since its inception a decade ago, Bitcoin and its underlying blockchain technology have been garnering interest from a large spectrum of financial institutions. Although it encompasses a currency, a payment method, and a ledger, Bitcoin as it currently stands does not support bitcoin lending. In this paper, we present a platform called ZeroLender for peer-to-peer lending in Bitcoin. Our protocol utilizes zero-knowledge proofs to achieve unlinkability between lenders and borrowers while securing payments in both directions against potential malicious behaviour of ZeroLender as well as the lenders, and we prove by simulation that our protocol is privacy-preserving. Based on our experiments, we show that the runtime and transcript size of our protocol scale linearly with respect to the number of lenders and repayments.

SESSION: Session 7: IoT

Session details: Session 7: IoT

Attacking and Protecting Tunneled Traffic of Smart Home Devices

The number of smart home IoT (Internet of Things) devices has been growing fast in recent years. Along with the great benefits brought by smart home devices, new threats have appeared. One major threat to smart home users is the compromise of their privacy by traffic analysis (TA) attacks. Researchers have shown that TA attacks can be performed successfully on either plain or encrypted traffic to identify smart home devices and infer user activities. Tunneling traffic is a very strong countermeasure to existing TA attacks. However, in this work, we design a Signature-based Tunneled Traffic Analysis (STTA) attack that can be effective even on tunneled traffic. Using a popular smart home traffic dataset, we demonstrate that our attack can achieve 83% accuracy in identifying 14 smart home devices. We further design a simple defense mechanism based on adding uniform random noise to effectively protect against our TA attack without introducing too much overhead. We prove that our defense mechanism achieves approximate differential privacy.
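
As a rough illustration of a uniform-noise defense, the sketch below pads each observed packet size with an integer drawn uniformly at random and reports the resulting bandwidth overhead. The abstract does not say which traffic feature the noise is added to or how it is parameterized, so the choice of packet sizes, the names pad_sizes and overhead, and the max_pad parameter are all assumptions made for illustration.

```python
import random


def pad_sizes(sizes, max_pad, rng=None):
    """Add an integer pad drawn uniformly from [0, max_pad] to each packet size.

    Hypothetical helper: padding packet sizes is an assumed instance of
    "adding uniform random noise", not the paper's stated mechanism.
    """
    rng = rng or random.Random()
    return [s + rng.randint(0, max_pad) for s in sizes]


def overhead(original, padded):
    """Fraction of extra bytes sent relative to the original traffic."""
    return (sum(padded) - sum(original)) / sum(original)


if __name__ == "__main__":
    sizes = [100, 200, 300]
    padded = pad_sizes(sizes, max_pad=50, rng=random.Random(0))
    print(padded, overhead(sizes, padded))
```

A larger max_pad hides more of the size signature but costs more bandwidth, which is the trade-off the abstract alludes to.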

SeCaS: Secure Capability Sharing Framework for IoT Devices in a Structured P2P Network

The emergence of the Internet of Things (IoT) has resulted in the possession of a continuously increasing number of highly heterogeneous connected devices by the same owner. To make full use of the potential of a personal IoT network, there must be secure and effective cooperation between them. While application platforms (e.g., Samsung SmartThings) and interoperable protocols (e.g., MQTT) exist already, the reliance on a central hub to coordinate communication introduces a single point of failure, provokes bottleneck problems and raises privacy concerns. In this paper we propose SeCaS, a Secure Capability Sharing framework, built on top of a peer-to-peer (P2P) architecture. SeCaS addresses the problems of fault tolerance, scalability and security in resource discovery and sharing for IoT infrastructures using a structured P2P network, in order to take advantage of the self-organised and decentralised communication it provides. SeCaS brings three main contributions: (i) a capability representation that allows each device to specify what services it offers, and can be used as a common language to search for, and exchange, capabilities, resulting in flexible service discovery that can leverage the properties of a distributed hash table (DHT); (ii) a set of four protocols that provides identification of the different devices that exist in the network and authenticity of the messages that are exchanged among them; and (iii) a thorough security and complexity analysis of the proposed scheme that shows SeCaS to be both secure and scalable.

IoT Expunge: Implementing Verifiable Retention of IoT Data

The growing deployment of Internet of Things (IoT) systems aims to ease the daily life of end-users by providing several value-added services. However, IoT systems may capture and store sensitive, personal data about individuals in the cloud, thereby jeopardizing user privacy. Emerging legislation, such as California's CalOPPA and GDPR in Europe, supports strong privacy laws to protect an individual's data in the cloud. One such law relates to strict enforcement of data retention policies. This paper proposes a framework, entitled IoT Expunge, that allows sensor data providers to store the data in cloud platforms that will ensure enforcement of retention policies. Additionally, the cloud provider produces verifiable proofs of its adherence to the retention policies. Experimental results on a real-world smart building testbed show that IoT Expunge imposes minimal overhead on the user to verify the data against data retention policies.

SESSION: Session 8: Privacy II

Session details: Session 8: Privacy II

CREPE: A Privacy-Enhanced Crash Reporting System

Software crashes are nearly impossible to avoid. The reported crashes often contain useful information assisting developers in finding the root cause of the crash. However, crash reports may carry sensitive and private information about the users and their systems, which may be used by an attacker who has compromised the crash reporting system to violate the user's privacy and security. Besides, a single bug may trigger loads of identical reports, which excessively consume system resources and overwhelm application developers.

In this paper, we introduce CREPE, a security-concerned crash reporting solution that effectively reduces the number of submitted crash reports to mitigate the security and privacy risk associated with the current implementation of the crash reporting system. Similar to the currently deployed systems, CREPE aggregates and categorizes the crashes based on their root cause. On top of that, the server marks the crash categories in which sufficient reports have been received as "saturated" and informs the clients periodically through software updates. On the client, CREPE engages the reporting application in categorizing each crash to only submit reports belonging to non-saturated categories. We evaluate CREPE using one year of data from the Mozilla crash reporting system containing 38,834,383 reports of Firefox crashes. Our analysis suggests that we can significantly reduce the number of submitted reports by bucketing the 100 most frequent crash signatures at the client. This helps to preserve the security and the privacy of a significant portion of users whose data has not been shared with the server due to the redundancy of their crash data with previously submitted reports.
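
The client-side filtering step described above can be sketched in a few lines. This is a minimal illustration rather than CREPE's actual implementation: the report layout (a dictionary with a "signature" key), the function names, and the example signatures are all hypothetical, and the saturated set is assumed to have been delivered by a software update.

```python
def should_submit(signature, saturated):
    """Submit only if the crash's category has not been marked saturated."""
    return signature not in saturated


def filter_reports(reports, saturated):
    """Keep reports whose (hypothetical) 'signature' field is non-saturated."""
    return [r for r in reports if should_submit(r["signature"], saturated)]


if __name__ == "__main__":
    # Assumed to arrive with a software update; the signatures are made up.
    saturated = {"sig_frequent_1", "sig_frequent_2"}
    reports = [
        {"signature": "sig_frequent_1", "payload": "..."},
        {"signature": "sig_rare", "payload": "..."},
    ]
    print([r["signature"] for r in filter_reports(reports, saturated)])
```

Reports in saturated categories never leave the client, which is where both the privacy benefit and the reduction in server load come from.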

A Hypothesis Testing Approach to Sharing Logs with Confidence

Logs generated by systems and applications contain a wide variety of heterogeneous information that is important for performance profiling, failure detection, and security analysis. There is a strong need for sharing the logs among different parties to outsource the analysis or to improve system and security research. However, sharing logs may inadvertently leak confidential or proprietary information. Besides sensitive information that is directly saved in logs, such as user-identifiers and software versions, indirect evidence like performance metrics can also lead to the leakage of sensitive information about the physical machines and the system. In this work, we introduce a game-based definition of the risk of exposing sensitive information through released logs. We propose log indistinguishability, a property that is met only when the logs leak little information about the protected sensitive attributes. We design an end-to-end framework that allows a user to identify risk of information leakage in logs, to protect the exposure with log redaction and obfuscation, and to release the logs with a much lower risk of exposing the sensitive attribute. Our framework contains a set of statistical tests to identify violations of the log indistinguishability property and a variety of obfuscation methods to prevent the leakage of sensitive information. The framework views the log-generating process as a black-box and can therefore be applied to different systems and processes. We perform case studies on two different types of log datasets: Spark event log and hardware counters. We show that our framework is effective in preventing the leakage of the sensitive attribute with a reasonable testing time and an acceptable utility loss in logs.

Rényi Differentially Private ADMM for Non-Smooth Regularized Optimization

In this paper we consider the problem of minimizing composite objective functions consisting of a convex differentiable loss function plus a non-smooth regularization term, such as the $L_1$ norm or the nuclear norm, under Rényi differential privacy (RDP). To solve the problem, we propose two stochastic alternating direction method of multipliers (ADMM) algorithms: ssADMM based on gradient perturbation and mpADMM based on output perturbation. Both algorithms decompose the original problem into sub-problems that have closed-form solutions. The first algorithm, ssADMM, applies the recent privacy amplification result for RDP to reduce the amount of noise to add. The second algorithm, mpADMM, numerically computes the sensitivity of ADMM variable updates and releases the updated parameter vector at the end of each epoch. We compare the performance of our algorithms with several baseline algorithms on both real and simulated datasets. Experimental results show that, in high privacy regimes (small ε), ssADMM and mpADMM outperform baseline algorithms in terms of classification and feature selection performance, respectively.
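
To make "sub-problems that have closed-form solutions" concrete for the $L_1$ case: in standard (non-private) ADMM, the sub-problem for an $L_1$ regularizer is solved by elementwise soft-thresholding, the proximal operator of the scaled $L_1$ norm. The sketch below shows only that textbook operator. It adds no privacy noise and is not taken from ssADMM or mpADMM, whose exact updates the abstract does not give; the name soft_threshold and the parameter kappa are illustrative.

```python
import math


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: the proximal operator of kappa * ||x||_1.

    Each entry is shrunk toward zero by kappa, and entries with magnitude
    at most kappa become zero, which is what produces sparse solutions.
    """
    return [math.copysign(max(abs(x) - kappa, 0.0), x) for x in v]


if __name__ == "__main__":
    print(soft_threshold([3.0, -0.5, -2.0], 1.0))
```

The zeroed entries are why $L_1$ regularization doubles as feature selection, the criterion on which the abstract evaluates mpADMM.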

PREDICT: Efficient Private Disease Susceptibility Testing in Direct-to-Consumer Model

Genome sequencing has rapidly advanced in the last decade, making it easier for anyone to obtain digital genomes at low cost from companies such as Helix, MyHeritage, and 23andMe. Companies now offer their services in a direct-to-consumer (DTC) model without the intervention of a medical institution, thereby providing people with direct services for paternity testing, ancestry testing and disease susceptibility testing (DST) to infer predisposition to diseases. Genome analyses are partly motivated by curiosity, and people often want to partake without fear of privacy invasion. Existing privacy protection solutions for DST adopt cryptographic techniques to protect the genome of a patient from the party responsible for computing the analysis. Said techniques include homomorphic encryption, which can be computationally expensive and could take minutes for only a few single-nucleotide polymorphisms (SNPs). A predominant approach is a solution that computes DST over encrypted data, but the design depends on a medical unit and exposes test results of patients to the medical unit, making the design uncomfortable for privacy-aware individuals. Hence it is pertinent to have an efficient privacy-preserving DST solution with a DTC service. We propose a novel DTC model that protects the privacy of SNPs and prevents leakage of test results to any party other than the genome owner. Conversely, we protect the privacy of the algorithms or trade secrets used by the genome analyzing companies. Our work utilizes a secure obfuscation technique in computing DST, eliminating expensive computations over encrypted data. Our approach significantly outperforms existing state-of-the-art solutions in runtime and scales linearly for equivalent levels of security. As an example, computing DST for 10,000 SNPs requires approximately 96 milliseconds on commodity hardware. With this efficient and privacy-preserving solution, which is also simulation-based secure, we open possibilities for performing genome analyses on collectively shared data resources.

SESSION: Session 9: Malware Detection

Session details: Session 9: Malware Detection

Deceiving Portable Executable Malware Classifiers into Targeted Misclassification with Practical Adversarial Examples

Due to voluminous malware attacks in cyberspace, machine learning has become popular for automating malware detection and classification. In this work we play devil's advocate by investigating a new type of threat aimed at deceiving multi-class Portable Executable (PE) malware classifiers into targeted misclassification with practical adversarial samples. Using a malware dataset with tens of thousands of samples, we construct three types of PE malware classifiers: the first based on frequencies of opcodes in the disassembled malware code (opcode classifier), the second on the list of API functions imported by each PE sample (API classifier), and the third on the list of system calls observed in dynamic execution (system call classifier). We develop a genetic algorithm augmented with different support functions to deceive these classifiers into misclassifying a PE sample into any target family. Using an Rbot malware sample whose source code is publicly available, we are able to create practical adversarial samples that can deceive the opcode classifier into targeted misclassification with a success rate of 75%, the API classifier with a success rate of 83.3%, and the system call classifier with a success rate of 91.7%.

DANdroid: A Multi-View Discriminative Adversarial Network for Obfuscated Android Malware Detection

We present DANdroid, a novel Android malware detection model using a deep learning Discriminative Adversarial Network (DAN) that classifies both obfuscated and unobfuscated apps as either malicious or benign. Our method, which we empirically demonstrate is robust against a selection of four prevalent and real-world obfuscation techniques, makes three contributions. Firstly, an innovative application of discriminative adversarial learning results in malware feature representations with a strong degree of resilience to the four obfuscation techniques. Secondly, three feature sets (raw opcodes, permissions and API calls) are combined in a multi-view deep learning architecture to increase this obfuscation resilience. Thirdly, we demonstrate the potential of our model to generalize over rare and future obfuscation methods not seen in training. With an overall dataset of 68,880 obfuscated and unobfuscated malicious and benign samples, our multi-view DAN model achieves an average F-score of 0.973 that compares favourably with the state of the art, despite being exposed to the selected obfuscation methods applied both individually and in combination.

PESC: A Per System-Call Stack Canary Design for Linux Kernel

The stack canary is the most widely deployed defense technique against stack buffer overflow attacks. However, since it was first proposed, the design of the stack canary has seen very few improvements during the past 20 years, making it vulnerable to new and sophisticated attacks. For example, the ARM64 Linux kernel still adopts the same design as StackGuard, using one global canary for the whole kernel. The x86_64 Linux kernel leverages a better design by using a per-task canary for different threads. Unfortunately, both of them are vulnerable to kernel memory leaks. Using memory leak bugs or hardware side-channel attacks, e.g., Meltdown or Spectre, attackers can easily peek at the kernel stack canary value, thus bypassing the protection. To address this issue, we propose a fine-grained design of the kernel stack canary named PESC, standing for Per-System-Call Canary, which changes the kernel canary value on a per-system-call basis. With PESC, attackers cannot accumulate any knowledge of prior canaries across multiple system calls. In other words, PESC is resilient to memory leaks. Our key observation is that before serving a system call, the kernel stack is empty and there are no residual canary values on the stack. As a result, we can directly change the canary value on system call entry without the burden of tracking and updating old canary values on the kernel stack. Moreover, to balance performance and security, we propose two PESC designs: one relies on the performance monitor counter register, termed PESC-PMC, while the other uses the kernel random number generator, denoted PESC-RNG. We implemented both PESC-PMC and PESC-RNG on real-world hardware, using a HiKey960 board for ARM64 and an Intel i7-7700 for x86_64. The synthetic benchmark and SPEC CPU2006 experimental results show that the performance overhead introduced by PESC-PMC and PESC-RNG on the whole system is less than 1%.
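
The per-system-call idea can be illustrated with a toy user-space model. This is not kernel code and not the PESC implementation: the ToyKernel class, its syscall method, and the injectable rand64 source are hypothetical, and Python's secrets module merely stands in for the performance monitor counter or kernel RNG that the abstract mentions.

```python
import secrets


class ToyKernel:
    """Toy model of a kernel that refreshes its stack canary per system call."""

    def __init__(self, rand64=None):
        # rand64 is an assumed stand-in for the PMC register or kernel RNG.
        self._rand64 = rand64 or (lambda: secrets.randbits(64))
        self.canary = self._rand64()

    def syscall(self, handler):
        # At entry the (modelled) kernel stack is empty, so no previously
        # pushed canary copies exist and the value can simply be replaced.
        self.canary = self._rand64()
        return handler(self.canary)


if __name__ == "__main__":
    kernel = ToyKernel()
    leaked = kernel.syscall(lambda canary: canary)  # attacker leaks a canary
    kernel.syscall(lambda canary: None)             # next call gets a new one
    print(leaked == kernel.canary)
```

A value leaked during one call is stale by the time the next call runs, which is the sense in which the abstract calls the design resilient to memory leaks.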

ISAdetect: Usable Automated Detection of CPU Architecture and Endianness for Executable Binary Files and Object Code

Static and dynamic binary analysis techniques are actively used to reverse engineer software's behavior and to detect its vulnerabilities, even when only the binary code is available for analysis. To avoid analysis errors due to misreading opcodes for the wrong CPU architecture, these analysis tools must precisely identify the Instruction Set Architecture (ISA) of the object code under analysis. The variety of CPU architectures that modern security and reverse engineering tools must support is ever increasing due to the massive proliferation of IoT devices and the diversity of firmware and malware targeting those devices. Recent studies concluded that falsely identifying the binary code's ISA alone caused about 10% of failures of IoT firmware analysis. The state-of-the-art approaches to detecting the ISA of executable object code look promising, and their results demonstrate effectiveness and high performance. However, they lack the support of publicly available datasets and toolsets, which makes the evaluation, comparison, and improvement of those techniques, datasets, and machine learning models quite challenging (if not impossible). This paper bridges multiple gaps in the field of automated and precise identification of architecture and endianness of binary files and object code. We develop from scratch the toolset and datasets that are lacking in this research space. As such, we contribute a comprehensive collection of open data, open source, and open API web services. We also attempt experiment reconstruction and cross-validation of the effectiveness, efficiency, and results of the state-of-the-art methods. When training and testing classifiers using solely code sections from executable binary files, all our classifiers performed equally well, achieving over 98% accuracy. The results are consistent and comparable with the current state of the art, and hence support the general validity of the algorithms, features, and approaches suggested in those works.