[[2208.02857] Identity-Based Authentication for On-Demand Charging of Electric Vehicles](http://arxiv.org/abs/2208.02857)
Dynamic wireless power transfer provides a means of charging Electric Vehicles (EVs) while driving, avoiding stops for charging and hence fostering their widespread adoption. Over the last decade, researchers have devoted much effort to providing a reliable infrastructure that improves comfort and time management for potential users. Because of the severe security and performance requirements, the schemes proposed in recent years lack a unified protocol that covers the modern architecture model with merged authentication and billing processes. Furthermore, they require the continuous interaction of the trusted entity during the process, which increases communication delay and reduces security because of the large number of message exchanges. In this paper, we propose a secure, computationally lightweight, unified protocol for fast authentication and billing that provides on-demand dynamic charging and comprehensively addresses the computational and security constraints. The protocol employs an ID-based public encryption scheme to manage mutual authentication and pseudonyms to preserve the user's identity across multiple charging processes. Compared to state-of-the-art authentication protocols, our proposal overcomes the problem of overwhelming interactions and provides public scheme security against the use of simple operations in wide open communications without impacting performance.
[[2208.02877] A Forward-secure Efficient Two-factor Authentication Protocol](http://arxiv.org/abs/2208.02877)
Two-factor authentication (2FA) schemes that rely on a combination of knowledge factors (e.g., PIN) and device possession have gained popularity. Some of these schemes remain secure even against strong adversaries that (a) observe the traffic between a client and server, and (b) have physical access to the client's device, or its PIN, or breach the server. However, these solutions have several shortcomings; namely, they (i) require a client to remember multiple secret values to prove its identity, (ii) involve several modular exponentiations, and (iii) are in the non-standard random oracle model. In this work, we present a 2FA protocol that resists such a strong adversary while addressing the above shortcomings. Our protocol requires a client to remember only a single secret value/PIN, does not involve any modular exponentiations, and is in a standard model. It is the first one that offers these features without using trusted chipsets. This protocol also imposes up to 40% lower communication overhead than the state-of-the-art solutions do.
[[2208.02820] MOVE: Effective and Harmless Ownership Verification via Embedded External Features](http://arxiv.org/abs/2208.02820)
Deep neural networks (DNNs) are now widely adopted in different applications. Despite their commercial value, training a well-performing DNN is resource-consuming. Accordingly, a well-trained model is valuable intellectual property for its owner. However, recent studies revealed the threat of model stealing, where adversaries can obtain a function-similar copy of the victim model even when they can only query the model. In this paper, we propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously, without introducing new security risks. In general, we conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features. Specifically, we embed the external features by modifying a few training samples with style transfer. We then train a meta-classifier to determine whether a model is stolen from the victim. This approach is inspired by the understanding that stolen models should contain the knowledge of features learned by the victim model. In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection. Extensive experiments on benchmark datasets verify the effectiveness of our method and its resistance to potential adaptive attacks. The code for reproducing the main experiments of our method is available at https://github.com/THUYimingLi/MOVE.
[[2208.02858] An Empirical Study on Ethereum Private Transactions and the Security Implications](http://arxiv.org/abs/2208.02858)
Recently, Decentralized Finance (DeFi) platforms on Ethereum are booming, and numerous traders are trying to capitalize on the opportunity for maximizing their benefits by launching front-running attacks and extracting Miner Extractable Values (MEVs) based on information in the public mempool. To protect end users from being harmed and hide transactions from the mempool, private transactions, a special type of transactions that are sent directly to miners, were invented. Private transactions have a high probability of being packed to the front positions of a block and being added to the blockchain by the target miner, without going through the public mempool, thus reducing the risk of being attacked by malicious entities.
Despite the good intention of inventing private transactions, due to their stealthy nature, private transactions have also been used by attackers to launch attacks, which has a negative impact on the Ethereum ecosystem. However, existing works only touch upon private transactions as by-products when studying MEV, while a systematic study on private transactions is still missing. To fill this gap and paint a complete picture of private transactions, we take the first step towards investigating the private transactions on Ethereum. In particular, we collect large-scale private transaction datasets and perform analysis on their characteristics, transaction costs and miner profits, as well as security impacts. This work provides deep insights on different aspects of private transactions.
[[2208.02878] Differentially Private Counterfactuals via Functional Mechanism](http://arxiv.org/abs/2208.02878)
Counterfactuals, an emerging type of model explanation, have recently attracted considerable attention from both industry and academia. Unlike conventional feature-based explanations (e.g., attributions), counterfactuals are a series of hypothetical samples that can flip model decisions with minimal perturbations to the queries. Given valid counterfactuals, humans are capable of reasoning under "what-if" circumstances, so as to better understand the model decision boundaries. However, releasing counterfactuals could be detrimental, since it may unintentionally leak sensitive information to adversaries, which raises risks for both model security and data privacy. To bridge the gap, in this paper, we propose a novel framework to generate differentially private counterfactuals (DPC) without touching the deployed model or explanation set, where noise is injected for protection while maintaining the explanatory role of counterfactuals. In particular, we train an autoencoder with the functional mechanism to construct noisy class prototypes, and then derive the DPC from the latent prototypes based on the post-processing immunity of differential privacy. Further evaluations demonstrate the effectiveness of the proposed framework, showing that DPC can successfully reduce the risks of both extraction and inference attacks.
[[2208.02883] Beware of Discarding Used SRAMs: Information is Stored Permanently](http://arxiv.org/abs/2208.02883)
Data recovery has long been a focus of security experts in the electronics industry, mostly targeting hard disks, a type of non-volatile memory. Unfortunately, no existing research from academia, industry, or government has considered data recovery from volatile memories, whose data is, by definition, lost when power is removed. To the best of our knowledge, we are the first to present an approach to recovering data from a static random access memory (SRAM). It is conventional wisdom that SRAM loses its contents whenever it is powered off, and that sensitive information, e.g., the firmware code, secret encryption keys, etc., therefore needs no protection when an SRAM-based computing system is retired. Unfortunately, the recycling of integrated circuits poses a severe threat to the protection of intellectual property. In this paper, we present a novel concept to retrieve SRAM data, as aging leads to a power-up state with an imprint of the stored values. We show that our proposed approaches can partially recover the previously used SRAM content. The accuracy of the recovered data can be further increased by combining multiple SRAM chips rather than using a single one. It is impossible to retrieve the prior content of some stable SRAM cells, where aging shifts these cells towards stability. As the locations of these cells vary from chip to chip due to uncontrollable process variation, a cell that cannot be recovered in one chip may be recoverable in another, which helps us recover the content. Finally, majority voting is used to combine the data from a set of SRAM chips to recover the stored data. We present our experimental results using commercial off-the-shelf SRAMs with binary image data stored before performing accelerated aging. We demonstrate successful partial retrieval on SRAMs aged with as little as 4 hours of accelerated aging at 85 °C.
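The majority-voting step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each chip has already been turned into a list of per-cell estimates (`1`, `0`, or `None` for a cell whose content could not be inferred); that representation and the function name `majority_vote` are hypothetical and not taken from the paper.

```python
def majority_vote(chip_estimates):
    """Combine per-cell bit estimates from several chips.

    chip_estimates: list of equally long lists, one per chip, holding
    1, 0, or None (None = this chip gives no estimate for the cell).
    This input format is an assumption made for illustration.
    """
    recovered = []
    for cell_bits in zip(*chip_estimates):
        ones = sum(1 for b in cell_bits if b == 1)
        zeros = sum(1 for b in cell_bits if b == 0)
        if ones > zeros:
            recovered.append(1)
        elif zeros > ones:
            recovered.append(0)
        else:
            # Tie, or no chip produced an estimate: leave the cell unknown.
            recovered.append(None)
    return recovered


# Three chips, four cells; the last cell is unrecoverable in every chip.
chips = [
    [1, 0, 1, None],
    [1, 1, None, None],
    [0, 0, 1, None],
]
result = majority_vote(chips)
```

Cells that one chip cannot resolve are filled in by the others, which is the intuition behind combining several chips.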
[[2208.02906] Quantifying the Sensitivity and Unclonability of Optical Physical Unclonable Functions](http://arxiv.org/abs/2208.02906)
Due to their unmatched entropy, complexity, and security level, optical Physical Unclonable Functions (PUFs) currently receive a lot of interest in the literature. Despite the large body of existing works, however, one of their core features has never been quantified in detail, namely their physical unclonability. This paper tackles this fundamental and yet largely unaddressed issue. In simulations and/or experiments, the sensitivity of diffraction-based optical responses is investigated with respect to various small alterations such as variation in the position, size, and number of the scatterers, as well as perturbations in the spatial alignment between the PUF and the measurement apparatus. Our analysis focuses on 2D optical PUFs because of their relevance in integrated applications and the need to respond to security concerns that can be raised when the physical structure of the geometry is accessible. Among the results of this study, the sensitivity analysis shows that a positional perturbation of scatterers on the order of 30 nm, i.e., far below the 632 nm wavelength of the probing laser light, is sufficient to invalidate the PUF response and thus detect a forgery attempt. These results support and quantify the high adversarial efforts required to clone optical PUFs, even for 2D layouts.
[[2208.02999] Cryptoeconomic Security for Data Availability Committees](http://arxiv.org/abs/2208.02999)
Layer 2 systems have received increasing attention due to their potential to scale the throughput of L1 blockchains. To avoid the cost of putting data on chain, these systems increasingly turn to off-chain data availability solutions such as data availability committees (DACs). However, placing trust in DACs conflicts with the goal of obtaining an L2 architecture whose security relies solely on the L1 chain. To eliminate such trust assumptions, we propose a DAC protocol that provides financial incentives to deter the DAC nodes from adversarial behavior. We then analyze the interaction of rational DAC nodes and clients as a dynamic game, with a Byzantine adversary that can corrupt and bribe the participants. We also define a notion of optimality for the DAC protocols, inspired by fairness and economic feasibility. Our main result shows that our protocol is optimal and guarantees security with the highest possible probability under reasonable assumptions on the adversary.
[[2208.03099] Planning and Scheduling in Digital Health with Answer Set Programming](http://arxiv.org/abs/2208.03099)
Hospitals face several complex combinatorial problems, and solving them is important for increasing patient satisfaction and the quality of care offered. Problems in healthcare are complex because solving them requires taking into account several constraints and different types of resources. Moreover, the solutions must be evaluated in a small amount of time to ensure usability in real scenarios. We plan to propose solutions to these kinds of problems both by extending already tested solutions and by modelling solutions for new problems, taking into account the literature and using real data when available. Solving these kinds of problems is important but, since the European Commission established with the General Data Protection Regulation that each person has the right to ask for an explanation of a decision taken by an AI, the usage of AI-based solvers, e.g., those based on Answer Set Programming, will be limited unless explainability methodologies are developed. Thus, another part of the research will be devoted to studying and proposing new methodologies for explaining the solutions obtained.
[[2208.03309] Lethal Dose Conjecture on Data Poisoning](http://arxiv.org/abs/2208.03309)
Data poisoning considers an adversary that distorts the training set of machine learning algorithms for malicious purposes. In this work, we bring to light one conjecture regarding the fundamentals of data poisoning, which we call the Lethal Dose Conjecture. The conjecture states: If $n$ clean training samples are needed for accurate predictions, then in a size-$N$ training set, only $\Theta(N/n)$ poisoned samples can be tolerated while ensuring accuracy. Theoretically, we verify this conjecture in multiple cases. We also offer a more general perspective of this conjecture through distribution discrimination. Deep Partition Aggregation (DPA) and its extension, Finite Aggregation (FA) are recent approaches for provable defenses against data poisoning, where they predict through the majority vote of many base models trained from different subsets of training set using a given learner. The conjecture implies that both DPA and FA are (asymptotically) optimal -- if we have the most data-efficient learner, they can turn it into one of the most robust defenses against data poisoning. This outlines a practical approach to developing stronger defenses against poisoning via finding data-efficient learners. Empirically, as a proof of concept, we show that by simply using different data augmentations for base learners, we can respectively double and triple the certified robustness of DPA on CIFAR-10 and GTSRB without sacrificing accuracy.
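The partition-and-vote idea behind DPA described above can be sketched as follows. This is a toy illustration, not the authors' implementation: the hash-based split, the 1-nearest-neighbour base learner on scalar features, and the conservative certificate `(gap - 1) // 2` are assumptions chosen to keep the example short, and all function names are hypothetical.

```python
import hashlib
from collections import Counter


def partition(dataset, k):
    """Assign each (x, y) sample to one of k disjoint partitions by hashing,
    so inserting or removing one sample changes exactly one partition."""
    parts = [[] for _ in range(k)]
    for sample in dataset:
        digest = hashlib.sha256(repr(sample).encode()).hexdigest()
        parts[int(digest, 16) % k].append(sample)
    return parts


def train_nearest_neighbour(part):
    """Toy base learner (assumption): 1-NN on scalar features."""
    def predict(x):
        if not part:
            return None  # an empty partition abstains
        return min(part, key=lambda s: abs(s[0] - x))[1]
    return predict


def dpa_predict(models, x):
    """Majority vote over base models plus a conservative count of
    inserted/removed samples the vote is guaranteed to survive."""
    votes = Counter(m(x) for m in models)
    votes.pop(None, None)
    ranked = votes.most_common()
    if not ranked:
        return None, 0
    top_label, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Each poisoned sample flips at most one vote, shrinking the gap by <= 2.
    certified = max(0, (top - runner_up - 1) // 2)
    return top_label, certified


# Hand-made base models: five vote "a", two vote "b".
toy_models = [lambda x: "a"] * 5 + [lambda x: "b"] * 2
label, certified = dpa_predict(toy_models, 0.0)
```

Because the vote gap shrinks by at most two per poisoned sample, a more data-efficient base learner allows more partitions and hence a larger certificate, which is the link to the conjecture.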
[[2208.02917] Padding-only defenses add delay in Tor](http://arxiv.org/abs/2208.02917)
Website fingerprinting is an attack that uses size and timing characteristics of encrypted downloads to identify targeted websites. Since this can defeat the privacy goals of anonymity networks such as Tor, many algorithms to defend against this attack in Tor have been proposed in the literature. These algorithms typically consist of some combination of the injection of dummy "padding" packets with the delay of actual packets to disrupt timing patterns. For usability reasons, Tor is intended to provide low latency; as such, many authors focus on padding-only defenses in the belief that they are "zero-delay." We demonstrate through Shadow simulations that by increasing queue lengths, padding-only defenses add delay when deployed network-wide, so they should not be considered "zero-delay." We further argue that future defenses should also be evaluated using network-wide deployment simulations.
[[2208.03111] Data-free Backdoor Removal based on Channel Lipschitzness](http://arxiv.org/abs/2208.03111)
Recent studies have shown that Deep Neural Networks (DNNs) are vulnerable to backdoor attacks, which lead to malicious behaviors of DNNs when specific triggers are attached to the input images. It was further demonstrated that infected DNNs possess a collection of channels that are more sensitive to the backdoor triggers than normal channels. Pruning these channels was then shown to be effective in mitigating the backdoor behaviors. To locate those channels, it is natural to consider their Lipschitzness, which measures their sensitivity to worst-case perturbations of the inputs. In this work, we introduce a novel concept called Channel Lipschitz Constant (CLC), which is defined as the Lipschitz constant of the mapping from the input images to the output of each channel. Then we provide empirical evidence of the strong correlation between an Upper bound of the CLC (UCLC) and the trigger-activated change in the channel activation. Since UCLC can be directly calculated from the weight matrices, we can detect the potential backdoor channels in a data-free manner, and do simple pruning on the infected DNN to repair the model. The proposed Channel Lipschitzness based Pruning (CLP) method is very fast, simple, data-free, and robust to the choice of the pruning threshold. Extensive experiments are conducted to evaluate the efficiency and effectiveness of CLP, which achieves state-of-the-art results among the mainstream defense methods even without any data. Source code is available at https://github.com/rkteddy/channel-Lipschitzness-based-pruning.
[[2208.03110] MorDeephy: Face Morphing Detection Via Fused Classification](http://arxiv.org/abs/2208.03110)
Face morphing attack detection (MAD) is one of the most challenging tasks in the field of face recognition today. In this work, we introduce a novel deep learning strategy for single-image face morphing detection, which combines the discrimination of morphed face images with a sophisticated face recognition task in a complex classification scheme. It is directed at learning deep facial features that carry information about the authenticity of these features. Our work also introduces several additional contributions: a public and easy-to-use face morphing detection benchmark and the results of our wild datasets filtering strategy. Our method, which we call MorDeephy, achieved state-of-the-art performance and demonstrated a prominent ability to generalise the task of morphing detection to unseen scenarios.
[[2208.03276] Modeling Self-Propagating Malware with Epidemiological Models](http://arxiv.org/abs/2208.03276)
Self-propagating malware (SPM) has recently resulted in large financial losses and high social impact, with well-known campaigns such as WannaCry and Colonial Pipeline being able to propagate rapidly on the Internet and cause service disruptions. To date, the propagation behavior of SPM is still not well understood, which makes it difficult to defend against these cyber threats. To address this gap, in this paper we perform a comprehensive analysis of a newly proposed epidemiological model for SPM propagation, Susceptible-Infected-Infected Dormant-Recovered (SIIDR). We perform a theoretical analysis of the stability of the SIIDR model and derive its basic reproduction number by representing it as a system of Ordinary Differential Equations with continuous time. We obtain access to 15 WannaCry attack traces generated under various conditions, derive the model's transition rates, and show that SIIDR best fits the real data. We find that the SIIDR model outperforms more established compartmental models from epidemiology, such as SI, SIS, and SIR, at modeling SPM propagation.
[[2208.02851] Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification](http://arxiv.org/abs/2208.02851)
Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging such as classification and segmentation. While the vulnerability of CNNs to adversarial attacks is a well-known problem, recent works have shown that ViTs are also susceptible to such attacks and suffer significant performance degradation under attack. The vulnerability of ViTs to carefully engineered adversarial samples raises serious concerns about their safety in clinical settings. In this paper, we propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks. The proposed Self-Ensembling Vision Transformer (SEViT) leverages the fact that feature representations learned by initial blocks of a ViT are relatively unaffected by adversarial perturbations. Learning multiple classifiers based on these intermediate feature representations and combining these predictions with that of the final ViT classifier can provide robustness against adversarial attacks. Measuring the consistency between the various predictions can also help detect adversarial samples. Experiments on two modalities (chest X-ray and fundoscopy) demonstrate the efficacy of SEViT architecture to defend against various adversarial attacks in the gray-box (attacker has full knowledge of the target model, but not the defense mechanism) setting. Code: https://github.com/faresmalik/SEViT
[[2208.03142] BoxShrink: From Bounding Boxes to Segmentation Masks](http://arxiv.org/abs/2208.03142)
One of the core challenges facing the medical image computing community is fast and efficient data sample labeling. Obtaining fine-grained labels for segmentation is particularly demanding since it is expensive, time-consuming, and requires sophisticated tools. On the contrary, applying bounding boxes is fast and takes significantly less time than fine-grained labeling, but does not produce detailed results. In response, we propose a novel framework for weakly-supervised tasks with the rapid and robust transformation of bounding boxes into segmentation masks without training any machine learning model, coined BoxShrink. The proposed framework comes in two variants - rapid-BoxShrink for fast label transformations, and robust-BoxShrink for more precise label transformations. An average of four percent improvement in IoU is found across several models when being trained using BoxShrink in a weakly-supervised setting, compared to using only bounding box annotations as inputs on a colonoscopy image data set. We open-sourced the code for the proposed framework and published it online.
[[2208.03232] Driving Points Prediction For Abdominal Probabilistic Registration](http://arxiv.org/abs/2208.03232)
Inter-patient abdominal registration has various applications, from pharmakinematic studies to anatomy modeling. Yet, it remains a challenging application due to the morphological heterogeneity and variability of the human abdomen. Among the various registration methods proposed for this task, probabilistic displacement registration models estimate displacement distribution for a subset of points by comparing feature vectors of points from the two images. These probabilistic models are informative and robust while allowing large displacements by design. As the displacement distributions are typically estimated on a subset of points (which we refer to as driving points), due to computational requirements, we propose in this work to learn a driving points predictor. Compared to previously proposed methods, the driving points predictor is optimized in an end-to-end fashion to infer driving points tailored for a specific registration pipeline. We evaluate the impact of our contribution on two different datasets corresponding to different modalities. Specifically, we compared the performances of 6 different probabilistic displacement registration models when using a driving points predictor or one of 2 other standard driving points selection methods. The proposed method improved performances in 11 out of 12 experiments.
[[2208.02934] A Noise-Robust Loss for Unlabeled Entity Problem in Named Entity Recognition](http://arxiv.org/abs/2208.02934)
Named Entity Recognition (NER) is an important task in natural language processing. However, traditional supervised NER requires large-scale annotated datasets. Distant supervision has been proposed to alleviate the massive demand for datasets, but datasets constructed in this way are extremely noisy and have a serious unlabeled entity problem. The cross entropy (CE) loss function is highly sensitive to unlabeled data, leading to severe performance degradation. As an alternative, we propose a new loss function called NRCES to cope with this problem. A sigmoid term is used to mitigate the negative impact of noise. In addition, we balance the convergence and noise tolerance of the model according to samples and the training process. Experiments on synthetic and real-world datasets demonstrate that our approach shows strong robustness in the case of a severe unlabeled entity problem, achieving a new state of the art on real-world datasets.
[[2208.03274] A Holistic Approach to Undesired Content Detection in the Real World](http://arxiv.org/abs/2208.03274)
We present a holistic approach to building a robust and useful natural language classification system for real-world content moderation. The success of such a system relies on a chain of carefully designed and executed steps, including the design of content taxonomies and labeling instructions, data quality control, an active learning pipeline to capture rare events, and a variety of methods to make the model robust and to avoid overfitting. Our moderation system is trained to detect a broad set of categories of undesired content, including sexual content, hateful content, violence, self-harm, and harassment. This approach generalizes to a wide range of different content taxonomies and can be used to create high-quality content classifiers that outperform off-the-shelf models.
[[2208.03295] Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls](http://arxiv.org/abs/2208.03295)
The promise of interaction between intelligent conversational agents and humans is that models can learn from such feedback in order to improve. Unfortunately, such exchanges in the wild will not always involve human utterances that are benign or of high quality, and will include a mixture of engaged (helpers) and unengaged or even malicious users (trolls). In this work we study how to perform robust learning in such an environment. We introduce a benchmark evaluation, SafetyMix, which can evaluate methods that learn safe vs. toxic language in a variety of adversarial settings to test their robustness. We propose and analyze several mitigating learning algorithms that identify trolls either at the example or at the user level. Our main finding is that user-based methods, that take into account that troll users will exhibit adversarial behavior across multiple examples, work best in a variety of settings on our benchmark. We then test these methods in a further real-life setting of conversations collected during deployment, with similar results.
[[2208.03306] Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models](http://arxiv.org/abs/2208.03306)
We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel training of large language models (LLMs). We show it is possible to independently train subparts of a new class of LLMs on different subsets of the data, eliminating the massive multi-node synchronization currently required to train LLMs. BTM learns a set of independent expert LMs (ELMs), each specialized to a different textual domain, such as scientific or legal text. These ELMs can be added and removed to update data coverage, ensembled to generalize to new domains, or averaged to collapse back to a single LM for efficient inference. New ELMs are learned by branching from (mixtures of) ELMs in the current set, further training the parameters on data for the new domain, and then merging the resulting model back into the set for future use. Experiments show that BTM improves in- and out-of-domain perplexities as compared to GPT-style Transformer LMs, when controlling for training cost. Through extensive analysis, we show that these results are robust to different ELM initialization schemes, but require expert domain specialization; LM ensembles with random data splits do not perform well. We also present a study of scaling BTM into a new corpus of 64 domains (192B whitespace-separated tokens in total); the resulting LM (22.4B total parameters) performs as well as a Transformer LM trained with 2.5 times more compute. These gains grow with the number of domains, suggesting more aggressive parallelism could be used to efficiently train larger models in future work.
[[2208.02922] ACE: Adaptive Constraint-aware Early Stopping in Hyperparameter Optimization](http://arxiv.org/abs/2208.02922)
Deploying machine learning models requires high model quality and needs to comply with application constraints. That motivates hyperparameter optimization (HPO) to tune model configurations under deployment constraints. The constraints often require additional computation cost to evaluate, and training ineligible configurations can waste a large amount of tuning cost. In this work, we propose an Adaptive Constraint-aware Early stopping (ACE) method to incorporate constraint evaluation into trial pruning during HPO. To minimize the overall optimization cost, ACE estimates the cost-effective constraint evaluation interval based on a theoretical analysis of the expected evaluation cost. Meanwhile, we propose a stratum early stopping criterion in ACE, which considers both optimization and constraint metrics in pruning and does not require regularization hyperparameters. Our experiments demonstrate superior performance of ACE in hyperparameter tuning of classification tasks under fairness or robustness constraints.
[[2208.03066] Tailoring to the Tails: Risk Measures for Fine-Grained Tail Sensitivity](http://arxiv.org/abs/2208.03066)
Expected risk minimization (ERM) is at the core of machine learning systems. This means that the risk inherent in a loss distribution is summarized using a single number - its average. In this paper, we propose a general approach to construct risk measures which exhibit a desired tail sensitivity and may replace the expectation operator in ERM. Our method relies on the specification of a reference distribution with a desired tail behaviour, which is in a one-to-one correspondence to a coherent upper probability. Any risk measure, which is compatible with this upper probability, displays a tail sensitivity which is finely tuned to the reference distribution. As a concrete example, we focus on divergence risk measures based on f-divergence ambiguity sets, which are a widespread tool used to foster distributional robustness of machine learning systems. For instance, we show how ambiguity sets based on the Kullback-Leibler divergence are intricately tied to the class of subexponential random variables. We elaborate the connection of divergence risk measures and rearrangement invariant Banach norms.
[[2208.03160] Almost-Orthogonal Layers for Efficient General-Purpose Lipschitz Networks](http://arxiv.org/abs/2208.03160)
It is a highly desirable property for deep networks to be robust against small input changes. One popular way to achieve this property is by designing networks with a small Lipschitz constant. In this work, we propose a new technique for constructing such Lipschitz networks that has a number of desirable properties: it can be applied to any linear network layer (fully-connected or convolutional), it provides formal guarantees on the Lipschitz constant, it is easy to implement and efficient to run, and it can be combined with any training objective and optimization method. In fact, our technique is the first one in the literature that achieves all of these properties simultaneously. Our main contribution is a rescaling-based weight matrix parametrization that guarantees each network layer to have a Lipschitz constant of at most 1 and results in the learned weight matrices to be close to orthogonal. Hence we call such layers almost-orthogonal Lipschitz (AOL). Experiments and ablation studies in the context of image classification with certified robust accuracy confirm that AOL layers achieve results that are on par with most existing methods. Yet, they are simpler to implement and more broadly applicable, because they do not require computationally expensive matrix orthogonalization or inversion steps as part of the network architecture. We provide code at https://github.com/berndprach/AOL.
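The rescaling-based parametrization described above can be illustrated with a short sketch. It assumes the commonly cited form of the AOL rescaling, in which column j of the weight matrix W is scaled by d_j = (sum_i |W^T W|_ij)^(-1/2); the abstract itself does not state this formula, so treat it as an assumption and consult the paper or the linked repository for the authors' version. Matrices are plain lists of rows, and `aol_rescale`, `matvec`, and `norm` are hypothetical helper names.

```python
import math
import random


def aol_rescale(W):
    """Rescale the columns of W so the result is 1-Lipschitz (spectral norm <= 1).

    Assumed formula: d_j = (sum_i |(W^T W)_ij|)^(-1/2), output = W * diag(d).
    """
    n_rows, n_cols = len(W), len(W[0])
    # Gram matrix G = W^T W.
    G = [[sum(W[k][i] * W[k][j] for k in range(n_rows)) for j in range(n_cols)]
         for i in range(n_cols)]
    d = []
    for j in range(n_cols):
        s = sum(abs(G[i][j]) for i in range(n_cols))
        # An all-zero column has s == 0; any scale works, so use 0.0.
        d.append(1.0 / math.sqrt(s) if s > 0 else 0.0)
    return [[W[k][j] * d[j] for j in range(n_cols)] for k in range(n_rows)]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


random.seed(0)
W = [[random.uniform(-2, 2) for _ in range(3)] for _ in range(4)]
P = aol_rescale(W)
x = [random.uniform(-1, 1) for _ in range(3)]
# The rescaled layer should never lengthen an input vector.
assert norm(matvec(P, x)) <= norm(x) + 1e-9
```

A matrix with orthonormal columns is left unchanged by this rescaling, which matches the "almost-orthogonal" intuition: the further the columns are from orthogonal, the more they are shrunk.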
[[2208.03249] Parameter Averaging for Robust Explainability](http://arxiv.org/abs/2208.03249)
Neural networks are known to be sensitive to initialisation. Explanation methods that rely on neural networks are not robust, since their explanations can vary when the model is initialised and trained with different random seeds. This sensitivity to model initialisation is undesirable in many safety-critical applications such as disease diagnosis in healthcare, in which explainability might have a significant impact on decision making. In this work, we introduce a novel method based on parameter averaging for robust explainability in the tabular data setting, referred to as XTab. We first initialise and train multiple instances of a shallow network (referred to as local masks) with different random seeds for a downstream task. We then obtain a global mask model by "averaging the parameters" of the local masks and show that the global model uses the majority rule to rank features based on their relative importance across all local models. We conduct extensive experiments on a variety of real and synthetic datasets, demonstrating that the proposed method can be used for feature selection as well as to obtain a global feature importance that is not sensitive to sub-optimal model initialisation.
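The averaging step can be illustrated with a generic sketch. This is not the XTab implementation: the parameter layout (a dict with a weight vector `w` and bias `b`), the random stand-ins for trained weights, and the ranking of features by absolute averaged weight are all assumptions made for illustration.

```python
import numpy as np

def average_parameters(local_params):
    """Element-wise average of a list of parameter dicts with identical keys."""
    names = local_params[0].keys()
    return {name: np.mean([p[name] for p in local_params], axis=0)
            for name in names}

# Random values stand in for the parameters of three local masks trained
# with different seeds (hypothetical; no training is performed here).
rng = np.random.default_rng(0)
local_masks = [{"w": rng.normal(size=5), "b": rng.normal(size=1)}
               for _ in range(3)]

global_mask = average_parameters(local_masks)

# Hypothetical importance ranking: features ordered by |averaged weight|.
ranking = np.argsort(-np.abs(global_mask["w"]))
print("feature ranking:", ranking.tolist())
```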
[[2208.03138] Human Saliency-Driven Patch-based Matching for Interpretable Post-mortem Iris Recognition](http://arxiv.org/abs/2208.03138)
Forensic iris recognition, as opposed to live iris recognition, is an emerging research area that leverages the discriminative power of iris biometrics to aid human examiners in their efforts to identify deceased persons. As a machine learning-based technique in a predominantly human-controlled task, forensic recognition serves as "back-up" to human expertise in the task of post-mortem identification. As such, the machine learning model must be (a) interpretable, and (b) post-mortem-specific, to account for changes in decaying eye tissue. In this work, we propose a method that satisfies both requirements, and that approaches the creation of a post-mortem-specific feature extractor in a novel way employing human perception. We first train a deep learning-based feature detector on post-mortem iris images, using annotations of image regions highlighted by humans as salient for their decision making. In effect, the method learns interpretable features directly from humans, rather than purely data-driven features. Second, regional iris codes (again, with human-driven filtering kernels) are used to pair detected iris patches, which are translated into pairwise, patch-based comparison scores. In this way, our method presents human examiners with human-understandable visual cues in order to justify the identification decision and corresponding confidence score. When tested on a dataset of post-mortem iris images collected from 259 deceased subjects, the proposed method places among the three best iris matchers, demonstrating better results than the commercial (non-human-interpretable) VeriEye approach. We propose a unique post-mortem iris recognition method trained with human saliency to give fully-interpretable comparison outcomes for use in the context of forensic examination, achieving state-of-the-art recognition performance.
[[2208.03051] Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment Analysis](http://arxiv.org/abs/2208.03051)
In this paper, we present our solutions for the Multimodal Sentiment Analysis Challenge (MuSe) 2022, which includes the MuSe-Humor, MuSe-Reaction and MuSe-Stress sub-challenges. MuSe 2022 focuses on humor detection, emotional reactions and multimodal emotional stress, utilising different modalities and data sets. In our work, different kinds of multimodal features are extracted, including acoustic, visual, text and biological features. These features are fused by TEMMA and GRU with self-attention mechanism frameworks. Our contributions are as follows: 1) several new audio features, facial expression features and paragraph-level text embeddings are extracted for accuracy improvement; 2) we substantially improve the accuracy and reliability of multimodal sentiment prediction by mining and blending the multimodal features; 3) effective data augmentation strategies are applied in model training to alleviate the problem of sample imbalance and prevent the model from learning biased subject characters. For the MuSe-Humor sub-challenge, our model obtains an AUC score of 0.8932. For the MuSe-Reaction sub-challenge, the Pearson's correlation coefficient of our approach on the test set is 0.3879, which outperforms all other participants. For the MuSe-Stress sub-challenge, our approach outperforms the baseline in both arousal and valence on the test dataset, reaching a final combined result of 0.5151.
[[2208.03219] Construction of English Resume Corpus and Test with Pre-trained Language Models](http://arxiv.org/abs/2208.03219)
Information extraction (IE) has always been one of the essential tasks of NLP, and one of its most critical application scenarios is the extraction of information from resumes. Structured text is obtained by classifying each part of the resume, which makes it convenient to store these texts for later search and analysis. Furthermore, the structured resume data can also be used in AI resume screening systems, significantly reducing the labor cost of HR. This study aims to transform the information extraction task for resumes into a simple sentence classification task. Based on the English resume dataset produced by a prior study, the classification rules are improved to create a larger and more fine-grained classification dataset of resumes. This corpus is also used to test the performance of several current mainstream pre-trained language models (PLMs). Furthermore, in order to explore the relationship between the number of training samples and the accuracy on the resume dataset, we also performed comparison experiments with training sets of different sizes. The experimental results show that the resume dataset with improved annotation rules and an increased sample size improves on the accuracy of the original resume dataset.
[[2208.02856] Embedding Alignment for Unsupervised Federated Learning via Smart Data Exchange](http://arxiv.org/abs/2208.02856)
Federated learning (FL) has been recognized as one of the most promising solutions for distributed machine learning (ML). In most of the current literature, FL has been studied for supervised ML tasks, in which edge devices collect labeled data. Nevertheless, in many applications, it is impractical to assume the existence of labeled data across devices. To this end, we develop a novel methodology, Cooperative Federated unsupervised Contrastive Learning (CF-CL), for FL across edge devices with unlabeled datasets. CF-CL employs local device cooperation, where data are exchanged among devices through device-to-device (D2D) communications to avoid local model bias resulting from non-independent and identically distributed (non-i.i.d.) local datasets. CF-CL introduces a push-pull smart data sharing mechanism tailored to unsupervised FL settings, in which each device pushes a subset of its local datapoints to its neighbors as reserved datapoints, and pulls a set of datapoints from its neighbors, sampled through a probabilistic importance sampling technique. We demonstrate that CF-CL leads to (i) alignment of unsupervised learned latent spaces across devices; (ii) faster global convergence, allowing for less frequent global model aggregations; and (iii) effectiveness in extreme non-i.i.d. data settings across the devices.
[[2208.03209] Bias and Fairness in Computer Vision Applications of the Criminal Justice System](http://arxiv.org/abs/2208.03209)
Discriminatory practices involving AI-driven police work have been the subject of much controversy in the past few years, with algorithms such as COMPAS, PredPol and ShotSpotter being accused of unfairly impacting minority groups. At the same time, the issues of fairness in machine learning, and in particular in computer vision, have been the subject of a growing number of academic works. In this paper, we examine how these areas intersect. We provide information on how these practices have come to exist and the difficulties in alleviating them. We then examine three applications currently in development to understand what risks they pose to fairness and how those risks can be mitigated.
[[2208.03167] Disentangling 3D Attributes from a Single 2D Image: Human Pose, Shape and Garment](http://arxiv.org/abs/2208.03167)
For visual manipulation tasks, we aim to represent image content with semantically meaningful features. However, learning implicit representations from images often lacks interpretability, especially when attributes are intertwined. We focus on the challenging task of extracting disentangled 3D attributes only from 2D image data. Specifically, we focus on human appearance and learn implicit pose, shape and garment representations of dressed humans from RGB images. Our method learns an embedding with disentangled latent representations of these three image properties and enables meaningful re-assembling of features and property control through a 2D-to-3D encoder-decoder structure. The 3D model is inferred solely from the feature map in the learned embedding space. To the best of our knowledge, our method is the first to achieve cross-domain disentanglement for this highly under-constrained problem. We qualitatively and quantitatively demonstrate our framework's ability to transfer pose, shape, and garments in 3D reconstruction on virtual data and show how an implicit shape loss can benefit the model's ability to recover fine-grained reconstruction details.
[[2208.03112] Explanation of Machine Learning Models of Colon Cancer Using SHAP Considering Interaction Effects](http://arxiv.org/abs/2208.03112)
When using machine learning techniques in decision-making processes, the interpretability of the models is important. Shapley additive explanation (SHAP) is one of the most promising interpretation methods for machine learning models. Interaction effects occur when the effect of one variable depends on the value of another variable. Even if each variable on its own has little effect on the outcome, their combination can have an unexpectedly large impact. Understanding interactions is important for understanding machine learning models; however, naive SHAP analysis cannot distinguish between main effects and interaction effects. In this paper, we introduce the Shapley-Taylor index as an interpretation method for machine learning models using SHAP considering interaction effects. We apply the method to the cancer cohort data of Kyushu University Hospital (N=29,080) to analyze what combination of factors contributes to the risk of colon cancer.
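As an illustration of how an interaction index separates main effects from interaction effects, the sketch below computes an order-2 Shapley-Taylor index on a toy value function. It assumes the usual definition: for a single feature the index is the discrete derivative at the empty set, and for a pair S it is (2/n) times the sum over T in N \ S of delta_S v(T) / C(n-1, |T|). The value function `v`, its coefficients and all function names are hypothetical and unrelated to the cohort data.

```python
from itertools import combinations
from math import comb

def discrete_derivative(v, S, T):
    """delta_S v(T): alternating sum of v(T | W) over all subsets W of S."""
    total = 0.0
    for r in range(len(S) + 1):
        for W in combinations(S, r):
            total += (-1) ** (len(S) - r) * v(frozenset(T) | frozenset(W))
    return total

def shapley_taylor_order2(v, n):
    """Order-2 Shapley-Taylor index for a set function v on players 0..n-1."""
    players = range(n)
    index = {}
    # Main effects: discrete derivative at the empty set.
    for i in players:
        index[(i,)] = discrete_derivative(v, (i,), ())
    # Pairwise interactions: weighted average over all contexts T.
    for S in combinations(players, 2):
        rest = [p for p in players if p not in S]
        acc = 0.0
        for r in range(len(rest) + 1):
            for T in combinations(rest, r):
                acc += discrete_derivative(v, S, T) / comb(n - 1, r)
        index[S] = (2 / n) * acc
    return index

# Toy value function: additive main effects plus one interaction
# between features 0 and 1 (all numbers are made up for illustration).
a = [1.0, 2.0, 3.0]

def v(S):
    return sum(a[i] for i in S) + (4.0 if {0, 1} <= S else 0.0)

idx = shapley_taylor_order2(v, 3)
for key, value in idx.items():
    print(key, round(value, 6))
```

In this toy case the pair (0, 1) receives the whole interaction term while the other pairs receive zero, and all indices together sum to v(N) - v(empty set), so the total effect is fully attributed.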