[[2208.09975] ETHERLED: Sending Covert Morse Signals from Air-Gapped Devices via Network Card (NIC) LEDs](http://arxiv.org/abs/2208.09975)
Highly secure devices are often isolated from the Internet or other public networks due to the confidential information they process. This level of isolation is referred to as an 'air-gap .'
In this paper, we present a new technique named ETHERLED, allowing attackers to leak data from air-gapped networked devices such as PCs, printers, network cameras, embedded controllers, and servers. Networked devices have an integrated network interface controller (NIC) that includes status and activity indicator LEDs. We show that malware installed on the device can control the status LEDs by blinking and alternating colors, using documented methods or undocumented firmware commands. Information can be encoded via simple encoding such as Morse code and modulated over these optical signals. An attacker can intercept and decode these signals from tens to hundreds of meters away. We show an evaluation and discuss defensive and preventive countermeasures for this exfiltration attack.
[[2208.10161] BRIEF but Powerful: Byzantine-Robust and Privacy-Preserving Federated Learning via Model Segmentation and Secure clustering](http://arxiv.org/abs/2208.10161)
Byzantine-robust Federated Learning (FL) aims to counter malicious clients and to train an accurate global model while maintaining an extremely low attack success rate. Most of the existing systems, however, are only robust in honest/semi-honest majority settings. FLTrust (NDSS '21) extends the context to the malicious majority for clients but with a strong restriction that the server should be provided with an auxiliary dataset before training in order to filter malicious inputs. Private FLAME/FLGUARD (USENIX '22) gives a solution to guarantee both robustness and updates confidentiality in the semi-honest majority context. It is so far impossible to balance the trade-off among malicious context, robustness, and updates confidentiality. To tackle this problem, we propose a novel Byzantine-robust and privacy-preserving FL system, called BRIEF, to capture malicious minority and majority for server and client sides. Specifically, based on the DBSCAN algorithm, we design a new method for clustering via pairwise adjusted cosine similarity to boost the accuracy of the clustering results. To thwart attacks of malicious majority, we develop an algorithm called Model Segmentation, where local updates in the same cluster are aggregated together, and the aggregations are sent back to corresponding clients correctly. We also leverage multiple cryptographic tools to conduct clustering tasks without sacrificing training correctness and updates confidentiality. We present detailed security proof and empirical evaluation along with convergence analysis for BRIEF. The experimental results demonstrate that the testing accuracy of BRIEF is practically close to the FL baseline (0.8% gap on average). At the same time, the attack success rate is around 0%-5%. We further optimize our design so that the communication overhead and runtime can be decreased by {67%-89.17% and 66.05%-68.75%}, respectively.
[[2208.09727] Security Implications of Large Language Model Code Assistants: A User Study](http://arxiv.org/abs/2208.09727)
Advances in Deep Learning have led to the emergence of Large Language Models (LLMs) such as OpenAI Codex which powers GitHub Copilot. LLMs have been fine tuned and packaged so that programmers can use them in an Integrated Development Environment (IDE) to write code. An emerging line of work is assessing the code quality of code written with the help of these LLMs, with security studies warning that LLMs do not fundamentally have any understanding of the code they are writing, so they are more likely to make mistakes that may be exploitable. We thus conducted a user study (N=58) to assess the security of code written by student programmers when guided by LLMs. Half of the students in our study had the help of the LLM and the other half did not. The students were asked to write code in C that performed operations over a singly linked list, including node operations such as inserting, updating, removing, combining, and others. While the results of our study showed that the students who had the help of an LLM were more likely to write functional code, no generalizable impact on security was observed -- the security impacts were localized to individual functions. We also investigate systematic stylistic differences between unaided and LLM-assisted code, finding that LLM code is more repetitive, which may have an amplifying effect if vulnerable code is repeated in addition to the impact on source code attribution.
[[2208.09741] Sensor Security: Current Progress, Research Challenges, and Future Roadmap](http://arxiv.org/abs/2208.09741)
Sensors are one of the most pervasive and integral components of today's safety-critical systems. Sensors serve as a bridge between physical quantities and connected systems. The connected systems with sensors blindly believe the sensor as there is no way to authenticate the signal coming from a sensor. This could be an entry point for an attacker. An attacker can inject a fake input signal along with the legitimate signal by using a suitable spoofing technique. As the sensor's transducer is not smart enough to differentiate between a fake and legitimate signal, the injected fake signal eventually can collapse the connected system. This type of attack is known as the transduction attack. Over the last decade, several works have been published to provide a defense against the transduction attack. However, the defenses are proposed on an ad-hoc basis; hence, they are not well-structured. Our work begins to fill this gap by providing a checklist that a defense technique should always follow to be considered as an ideal defense against the transduction attack. We name this checklist as the Golden reference of sensor defense. We provide insights on how this Golden reference can be achieved and argue that sensors should be redesigned from the transducer level to the sensor electronics level. We point out that only hardware or software modification is not enough; instead, a hardware/software (HW/SW) co-design approach is required to ride on this future roadmap to the robust and resilient sensor.
[[2208.10134] SoK: Machine Learning with Confidential Computing](http://arxiv.org/abs/2208.10134)
Privacy and security challenges in Machine Learning (ML) have become a critical topic to address, along with ML's pervasive development and the recent demonstration of large attack surfaces. As a mature system-oriented approach, confidential computing has been increasingly utilized in both academia and industry to improve privacy and security in various ML scenarios. In this paper, we systematize the findings on confidential computing-assisted ML security and privacy techniques for providing i) confidentiality guarantees and ii) integrity assurances. We further identify key challenges and provide dedicated analyses of the limitations in existing Trusted Execution Environment (TEE) systems for ML use cases. We discuss prospective works, including grounded privacy definitions, partitioned ML executions, dedicated TEE designs for ML, TEE-aware ML, and ML full pipeline guarantee. These potential solutions can help achieve a much strong TEE-enabled ML for privacy guarantees without introducing computation and system costs.
[[2208.10270] To show or not to show: Redacting sensitive text from videos of electronic displays](http://arxiv.org/abs/2208.10270)
With the increasing prevalence of video recordings there is a growing need for tools that can maintain the privacy of those recorded. In this paper, we define an approach for redacting personally identifiable text from videos using a combination of optical character recognition (OCR) and natural language processing (NLP) techniques. We examine the relative performance of this approach when used with different OCR models, specifically Tesseract and the OCR system from Google Cloud Vision (GCV). For the proposed approach the performance of GCV, in both accuracy and speed, is significantly higher than Tesseract. Finally, we explore the advantages and disadvantages of both models in real-world applications.
[[2208.09525] Glass-Vault: A Generic Transparent Privacy-preserving Exposure Notification Analytics Platform](http://arxiv.org/abs/2208.09525)
The highly transmissible COVID-19 disease is a serious threat to people's health and life. To automate tracing those who have been in close physical contact with newly infected people and/or to analyse tracing-related data, researchers have proposed various ad-hoc programs that require being executed on users' smartphones. Nevertheless, the existing solutions have two primary limitations: (1) lack of generality: for each type of analytic task, a certain kind of data needs to be sent to an analyst; (2) lack of transparency: parties who provide data to an analyst are not necessarily infected individuals; therefore, infected individuals' data can be shared with others (e.g., the analyst) without their fine-grained and direct consent. In this work, we present Glass-Vault, a protocol that addresses both limitations simultaneously. It allows an analyst to run authorised programs over the collected data of infectious users, without learning the input data. Glass-Vault relies on a new variant of generic Functional Encryption that we propose in this work. This new variant, called DD-Steel, offers these two additional properties: dynamic and decentralised. We illustrate the security of both Glass-Vault and DD-Steel in the Universal Composability setting. Glass-Vault is the first UC-secure protocol that allows analysing the data of Exposure Notification users in a privacy-preserving manner. As a sample application, we indicate how it can be used to generate "infection heatmaps".
[[2208.09595] The Saddle-Point Accountant for Differential Privacy](http://arxiv.org/abs/2208.09595)
We introduce a new differential privacy (DP) accountant called the saddle-point accountant (SPA). SPA approximates privacy guarantees for the composition of DP mechanisms in an accurate and fast manner. Our approach is inspired by the saddle-point method -- a ubiquitous numerical technique in statistics. We prove rigorous performance guarantees by deriving upper and lower bounds for the approximation error offered by SPA. The crux of SPA is a combination of large-deviation methods with central limit theorems, which we derive via exponentially tilting the privacy loss random variables corresponding to the DP mechanisms. One key advantage of SPA is that it runs in constant time for the $n$-fold composition of a privacy mechanism. Numerical experiments demonstrate that SPA achieves comparable accuracy to state-of-the-art accounting methods with a faster runtime.
[[2208.09716] zk-PCN: A Privacy-Preserving Payment Channel Network Using zk-SNARKs](http://arxiv.org/abs/2208.09716)
Payment channel network (PCN) is a layer-two scaling solution that enables fast off-chain transactions but does not involve on-chain transaction settlement. PCNs raise new privacy issues including balance secrecy, relationship anonymity and payment privacy. Moreover, protecting privacy causes low transaction success rates. To address this dilemma, we propose zk-PCN, a privacy-preserving payment channel network using zk-SNARKs. We prevent from exposing true balances by setting up \textit{public balances} instead. Using public balances, zk-PCN can guarantee high transaction success rates and protect PCN privacy with zero-knowledge proofs. Additionally, zk-PCN is compatible with the existing routing algorithms of PCNs. To support such compatibility, we propose zk-IPCN to improve zk-PCN with a novel proof generation (RPG) algorithm. zk-IPCN reduces the overheads of storing channel information and lowers the frequency of generating zero-knowledge proofs. Finally, extensive simulations demonstrate the effectiveness and efficiency of zk-PCN in various settings.
[[2208.09776] Privacy-Preserving Protocols for Smart Cameras and Other IoT Devices](http://arxiv.org/abs/2208.09776)
Millions of consumers depend on smart camera systems to remotely monitor their homes and businesses. However, the architecture and design of popular commercial systems require users to relinquish control of their data to untrusted third parties, such as service providers (e.g., the cloud). Third parties therefore can (and in some instances have) access the video footage without the users' knowledge or consent -- violating the core tenet of user privacy. In this paper, we introduce CaCTUs, a privacy-preserving smart camera system that returns control to the user; the root of trust begins with the user and is maintained through a series of cryptographic protocols designed to support popular features, such as sharing, deleting, and viewing videos live. In so doing, we demonstrate that it is feasible to implement a performant smart-camera system that leverages the convenience of a cloud-based model while retaining the ability to control access to (private) data. We then discuss how our techniques and protocols can also be extended to privacy-preserving designs of other IoT devices recording time series data.
[[2208.09967] Inferring Sensitive Attributes from Model Explanations](http://arxiv.org/abs/2208.09967)
Model explanations provide transparency into a trained machine learning model's blackbox behavior to a model builder. They indicate the influence of different input attributes to its corresponding model prediction. The dependency of explanations on input raises privacy concerns for sensitive user data. However, current literature has limited discussion on privacy risks of model explanations.
We focus on the specific privacy risk of attribute inference attack wherein an adversary infers sensitive attributes of an input (e.g., race and sex) given its model explanations. We design the first attribute inference attack against model explanations in two threat models where model builder either (a) includes the sensitive attributes in training data and input or (b) censors the sensitive attributes by not including them in the training data and input.
We evaluate our proposed attack on four benchmark datasets and four state-of-the-art algorithms. We show that an adversary can successfully infer the value of sensitive attributes from explanations in both the threat models accurately. Moreover, the attack is successful even by exploiting only the explanations corresponding to sensitive attributes. These suggest that our attack is effective against explanations and poses a practical threat to data privacy.
On combining the model predictions (an attack surface exploited by prior attacks) with explanations, we note that the attack success does not improve. Additionally, the attack success on exploiting model explanations is better compared to exploiting only model predictions. These suggest that model explanations are a strong attack surface to exploit for an adversary.
[[2208.10253] The Economics of Privacy and Utility: Investment Strategies](http://arxiv.org/abs/2208.10253)
The inevitable leakage of privacy as a result of unrestrained disclosure of personal information has motivated extensive research on robust privacy-preserving mechanisms. However, existing research is mostly limited to solving the problem in a static setting with disregard for the privacy leakage over time. Unfortunately, this treatment of privacy is insufficient in practical settings where users continuously disclose their personal information over time resulting in an accumulated leakage of the users' sensitive information. In this paper, we consider privacy leakage over a finite time horizon and investigate optimal strategies to maximize the utility of the disclosed data while limiting the finite-horizon privacy leakage. We consider a simple privacy mechanism that involves compressing the user's data before each disclosure to meet the desired constraint on future privacy. We further motivate several algorithms to optimize the dynamic privacy-utility tradeoff and evaluate their performance via extensive synthetic performance tests.
[[2208.09915] MockingBERT: A Method for Retroactively Adding Resilience to NLP Models](http://arxiv.org/abs/2208.09915)
Protecting NLP models against misspellings whether accidental or adversarial has been the object of research interest for the past few years. Existing remediations have typically either compromised accuracy or required full model re-training with each new class of attacks. We propose a novel method of retroactively adding resilience to misspellings to transformer-based NLP models. This robustness can be achieved without the need for re-training of the original NLP model and with only a minimal loss of language understanding performance on inputs without misspellings. Additionally we propose a new efficient approximate method of generating adversarial misspellings, which significantly reduces the cost needed to evaluate a model's resilience to adversarial attacks.
[[2208.10249] Prediction of User Request and Complaint in Spoken Customer-Agent Conversations](http://arxiv.org/abs/2208.10249)
We present the corpus called HealthCall. This was recorded in real-life conditions in the call center of Malakoff Humanis. It includes two separate audio channels, the first one for the customer and the second one for the agent. Each conversation was anonymized respecting the General Data Protection Regulation. This corpus includes a transcription of the spoken conversations and was divided into two sets: Train and Devel sets. Two important customer relationship management tasks were assessed on the HealthCall corpus: Automatic prediction of type of user requests and complaints detection. For this purpose, we have investigated 14 feature sets: 6 linguistic feature sets, 6 audio feature sets and 2 vocal interaction feature sets. We have used Bidirectional Encoder Representation from Transformers models for the linguistic features, openSMILE and Wav2Vec 2.0 for the audio features. The vocal interaction feature sets were designed and developed from Turn Takings. The results show that the linguistic features always give the best results (91.2% for the Request task and 70.3% for the Complaint task). The Wav2Vec 2.0 features seem more suitable for these two tasks than the ComPaRe16 features. Vocal interaction features outperformed ComPaRe16 features on Complaint task with a 57% rate achieved with only six features.
[[2208.09764] GAIROSCOPE: Injecting Data from Air-Gapped Computers to Nearby Gyroscopes](http://arxiv.org/abs/2208.09764)
It is known that malware can leak data from isolated, air-gapped computers to nearby smartphones using ultrasonic waves. However, this covert channel requires access to the smartphone's microphone, which is highly protected in Android OS and iOS, and might be non-accessible, disabled, or blocked.
In this paper we present `GAIROSCOPE,' an ultrasonic covert channel that doesn't require a microphone on the receiving side. Our malware generates ultrasonic tones in the resonance frequencies of the MEMS gyroscope. These inaudible frequencies produce tiny mechanical oscillations within the smartphone's gyroscope, which can be demodulated into binary information. Notably, the gyroscope in smartphones is considered to be a 'safe' sensor that can be used legitimately from mobile apps and javascript. We introduce the adversarial attack model and present related work. We provide the relevant technical background and show the design and implementation of GAIROSCOPE. We present the evaluation results and discuss a set of countermeasures to this threat. Our experiments show that attackers can exfiltrate sensitive information from air-gapped computers to smartphones located a few meters away via Speakers-to-Gyroscope covert channel.
[[2208.10251] Rethinking Textual Adversarial Defense for Pre-trained Language Models](http://arxiv.org/abs/2208.10251)
Although pre-trained language models (PrLMs) have achieved significant success, recent studies demonstrate that PrLMs are vulnerable to adversarial attacks. By generating adversarial examples with slight perturbations on different levels (sentence / word / character), adversarial attacks can fool PrLMs to generate incorrect predictions, which questions the robustness of PrLMs. However, we find that most existing textual adversarial examples are unnatural, which can be easily distinguished by both human and machine. Based on a general anomaly detector, we propose a novel metric (Degree of Anomaly) as a constraint to enable current adversarial attack approaches to generate more natural and imperceptible adversarial examples. Under this new constraint, the success rate of existing attacks drastically decreases, which reveals that the robustness of PrLMs is not as fragile as they claimed. In addition, we find that four types of randomization can invalidate a large portion of textual adversarial examples. Based on anomaly detector and randomization, we design a universal defense framework, which is among the first to perform textual adversarial defense without knowing the specific attack. Empirical results show that our universal defense framework achieves comparable or even higher after-attack accuracy with other specific defenses, while preserving higher original accuracy at the same time. Our work discloses the essence of textual adversarial attacks, and indicates that (1) further works of adversarial attacks should focus more on how to overcome the detection and resist the randomization, otherwise their adversarial examples would be easily detected and invalidated; and (2) compared with the unnatural and perceptible adversarial examples, it is those undetectable adversarial examples that pose real risks for PrLMs and require more attention for future robustness-enhancing strategies.
[[2208.10224] Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attacks](http://arxiv.org/abs/2208.10224)
A powerful category of data poisoning attacks modify a subset of training examples by small adversarial perturbations to change the prediction of certain test-time data. Existing defense mechanisms are not desirable to deploy in practice, as they often drastically harm the generalization performance, or are attack-specific and prohibitively slow to apply. Here, we propose a simple but highly effective approach that unlike existing methods breaks various types of poisoning attacks with the slightest drop in the generalization performance. We make the key observation that attacks exploit sharp loss regions to craft adversarial perturbations which can substantially alter examples' gradient or representations under small perturbations. To break poisoning attacks, our approach comprises two components: an optimized friendly noise that is generated to maximally perturb examples without degrading the performance, and a random varying noise component. The first component takes examples farther away from the sharp loss regions, and the second component smooths out the loss landscape. The combination of both components builds a very light-weight but extremely effective defense against the most powerful triggerless targeted and hidden-trigger backdoor poisoning attacks, including Gradient Matching, Bulls-eye Polytope, and Sleeper Agent. We show that our friendly noise is transferable to other architectures, and adaptive attacks cannot break our defense due to its random noise component.
[[2208.10276] An Input-Aware Mimic Defense Theory and its Practice](http://arxiv.org/abs/2208.10276)
The current security problems in cyberspace are characterized by strong and complex threats. Defenders face numerous problems such as lack of prior knowledge, various threats, and unknown vulnerabilities, which urgently need new fundamental theories to support. To address these issues, this article proposes a generic theoretical model for cyberspace defense and a new mimic defense framework, that is, Spatiotemporally heterogeneous, Input aware, and Dynamically updated Mimic Defense (SIDMD). We make the following contributions: (1) We first redefine vulnerabilities from the input space perspective to normalize the diverse cyberspace security problem. (2) We propose a novel unknown vulnerability discovery method and a dynamic scheduling strategy considering temporal and spatial dimensions without prior knowledge. Theoretical analysis and experimental results show that SIDMD has the best security performance in complex attack scenarios, and the probability of successful attacks is greatly reduced compared to the state-of-the-art.
[[2208.09518] Recurrent Neural Network-based Anti-jamming Framework for Defense Against Multiple Jamming Policies](http://arxiv.org/abs/2208.09518)
Conventional anti-jamming methods mainly focus on preventing single jammer attacks with an invariant jamming policy or jamming attacks from multiple jammers with similar jamming policies. These anti-jamming methods are ineffective against a single jammer following several different jamming policies or multiple jammers with distinct policies. Therefore, this paper proposes an anti-jamming method that can adapt its policy to the current jamming attack. Moreover, for the multiple jammers scenario, an anti-jamming method that estimates the future occupied channels using the jammers' occupied channels in previous time slots is proposed. In both single and multiple jammers scenarios, the interaction between the users and jammers is modeled using recurrent neural networks (RNN)s. The performance of the proposed anti-jamming methods is evaluated by calculating the users' successful transmission rate (STR) and ergodic rate (ER), and compared to a baseline based on Q-learning (DQL). Simulation results show that for the single jammer scenario, all the considered jamming policies are perfectly detected and high STR and ER are maintained. Moreover, when 70 % of the spectrum is under jamming attacks from multiple jammers, the proposed method achieves an STR and ER greater than 75 % and 80 %, respectively. These values rise to 90 % when 30 % of the spectrum is under jamming attacks. In addition, the proposed anti-jamming methods significantly outperform the DQL method for all the considered cases and jamming scenarios.
[[2208.09602] Analyzing Adversarial Robustness of Vision Transformers against Spatial and Spectral Attacks](http://arxiv.org/abs/2208.09602)
Vision Transformers have emerged as a powerful architecture that can outperform convolutional neural networks (CNNs) in image classification tasks. Several attempts have been made to understand robustness of Transformers against adversarial attacks, but existing studies draw inconsistent results, i.e., some conclude that Transformers are more robust than CNNs, while some others find that they have similar degrees of robustness. In this paper, we address two issues unexplored in the existing studies examining adversarial robustness of Transformers. First, we argue that the image quality should be simultaneously considered in evaluating adversarial robustness. We find that the superiority of one architecture to another in terms of robustness can change depending on the attack strength expressed by the quality of the attacked images. Second, by noting that Transformers and CNNs rely on different types of information in images, we formulate an attack framework, called Fourier attack, as a tool for implementing flexible attacks, where an image can be attacked in the spectral domain as well as in the spatial domain. This attack perturbs the magnitude and phase information of particular frequency components selectively. Through extensive experiments, we find that Transformers tend to rely more on phase information and low frequency information than CNNs, and thus sometimes they are even more vulnerable under frequency-selective attacks. It is our hope that this work provides new perspectives in understanding the properties and adversarial robustness of Transformers.
[[2208.09801] PointDP: Diffusion-driven Purification against Adversarial Attacks on 3D Point Cloud Recognition](http://arxiv.org/abs/2208.09801)
3D Point cloud is becoming a critical data representation in many real-world applications like autonomous driving, robotics, and medical imaging. Although the success of deep learning further accelerates the adoption of 3D point clouds in the physical world, deep learning is notorious for its vulnerability to adversarial attacks. In this work, we first identify that the state-of-the-art empirical defense, adversarial training, has a major limitation in applying to 3D point cloud models due to gradient obfuscation. We further propose PointDP, a purification strategy that leverages diffusion models to defend against 3D adversarial attacks. We extensively evaluate PointDP on six representative 3D point cloud architectures, and leverage 10+ strong and adaptive attacks to demonstrate its lower-bound robustness. Our evaluation shows that PointDP achieves significantly better robustness than state-of-the-art purification methods under strong attacks. Results of certified defenses on randomized smoothing combined with PointDP will be included in the near future.
[[2208.10231] An anomaly detection approach for backdoored neural networks: face recognition as a case study](http://arxiv.org/abs/2208.10231)
Backdoor attacks allow an attacker to embed functionality jeopardizing proper behavior of any algorithm, machine learning or not. This hidden functionality can remain inactive for normal use of the algorithm until activated by the attacker. Given how stealthy backdoor attacks are, consequences of these backdoors could be disastrous if such networks were to be deployed for applications as critical as border or access control. In this paper, we propose a novel backdoored network detection method based on the principle of anomaly detection, involving access to the clean part of the training data and the trained network. We highlight its promising potential when considering various triggers, locations and identity pairs, without the need to make any assumptions on the nature of the backdoor and its setup. We test our method on a novel dataset of backdoored networks and report detectability results with perfect scores.
[[2208.09482] A New Outlook on the Profitability of Rogue Mining Strategies in the Bitcoin Network](http://arxiv.org/abs/2208.09482)
Many of the recent works on the profitability of rogue mining strategies hinge on a parameter called $\gamma$ that measures the proportion of the honest network attracted by the attacker to mine on top of his fork. These works, see arXiv:1808.01041 and arXiv.1805.08281, have surmised conclusions based on premises that erroneously treat $\gamma$ to be constant. In this paper, we treat $\gamma$ as a stochastic process and attempt to find its distribution through a Markov analysis. We begin by making strong assumptions on gamma's behaviour and proceed to translate them mathematically in order to apply them in a Markov setting. The aforementioned is executed in two separate occasions for two different models. Furthermore, we model the Bitcoin network and numerically derive a limiting distribution whereby the relative accuracy of our models is tested through a likelihood analysis. Finally, we conclude that even with control of 20% of the total hashrate, honest mining is the strongly dominant strategy.
[[2208.10279] Defensive Distillation based Adversarial Attacks Mitigation Method for Channel Estimation using Deep Learning Models in Next-Generation Wireless Networks](http://arxiv.org/abs/2208.10279)
Future wireless networks (5G and beyond) are the vision of forthcoming cellular systems, connecting billions of devices and people together. In the last decades, cellular networks have been dramatically growth with advanced telecommunication technologies for high-speed data transmission, high cell capacity, and low latency. The main goal of those technologies is to support a wide range of new applications, such as virtual reality, metaverse, telehealth, online education, autonomous and flying vehicles, smart cities, smart grids, advanced manufacturing, and many more. The key motivation of NextG networks is to meet the high demand for those applications by improving and optimizing network functions. Artificial Intelligence (AI) has a high potential to achieve these requirements by being integrated in applications throughout all layers of the network. However, the security concerns on network functions of NextG using AI-based models, i.e., model poising, have not been investigated deeply. Therefore, it needs to design efficient mitigation techniques and secure solutions for NextG networks using AI-based methods. This paper proposes a comprehensive vulnerability analysis of deep learning (DL)-based channel estimation models trained with the dataset obtained from MATLAB's 5G toolbox for adversarial attacks and defensive distillation-based mitigation methods. The adversarial attacks produce faulty results by manipulating trained DL-based models for channel estimation in NextG networks, while making models more robust against any attacks through mitigation methods. This paper also presents the performance of the proposed defensive distillation mitigation method for each adversarial attack against the channel estimation model. The results indicated that the proposed mitigation method can defend the DL-based channel estimation models against adversarial attacks in NextG networks.
[[2208.10083] MetaRF: Differentiable Random Forest for Reaction Yield Prediction with a Few Trails](http://arxiv.org/abs/2208.10083)
Artificial intelligence has deeply revolutionized the field of medicinal chemistry with many impressive applications, but the success of these applications requires a massive amount of training samples with high-quality annotations, which seriously limits the wide usage of data-driven methods. In this paper, we focus on the reaction yield prediction problem, which assists chemists in selecting high-yield reactions in a new chemical space only with a few experimental trials. To attack this challenge, we first put forth MetaRF, an attention-based differentiable random forest model specially designed for the few-shot yield prediction, where the attention weight of a random forest is automatically optimized by the meta-learning framework and can be quickly adapted to predict the performance of new reagents while given a few additional samples. To improve the few-shot learning performance, we further introduce a dimension-reduction based sampling method to determine valuable samples to be experimentally tested and then learned. Our methodology is evaluated on three different datasets and acquires satisfactory performance on few-shot prediction. In high-throughput experimentation (HTE) datasets, the average yield of our methodology's top 10 high-yield reactions is relatively close to the results of ideal yield selection.
[[2208.09520] Accelerating Vision Transformer Training via a Patch Sampling Schedule](http://arxiv.org/abs/2208.09520)
We introduce the notion of a Patch Sampling Schedule (PSS), that varies the number of Vision Transformer (ViT) patches used per batch during training. Since all patches are not equally important for most vision objectives (e.g., classification), we argue that less important patches can be used in fewer training iterations, leading to shorter training time with minimal impact on performance. Additionally, we observe that training with a PSS makes a ViT more robust to a wider patch sampling range during inference. This allows for a fine-grained, dynamic trade-off between throughput and accuracy during inference. We evaluate using PSSs on ViTs for ImageNet both trained from scratch and pre-trained using a reconstruction loss function. For the pre-trained model, we achieve a 0.26% reduction in classification accuracy for a 31% reduction in training time (from 25 to 17 hours) compared to using all patches each iteration. Code, model checkpoints and logs are available at https://github.com/BradMcDanel/pss.
[[2208.09698] Fuse and Attend: Generalized Embedding Learning for Art and Sketches](http://arxiv.org/abs/2208.09698)
While deep Embedding Learning approaches have witnessed widespread success in multiple computer vision tasks, the state-of-the-art methods for representing natural images need not necessarily perform well on images from other domains, such as paintings, cartoons, and sketch. This is because of the huge shift in the distribution of data from across these domains, as compared to natural images. Domains like sketch often contain sparse informative pixels. However, recognizing objects in such domains is crucial, given multiple relevant applications leveraging such data, for instance, sketch to image retrieval. Thus, achieving an Embedding Learning model that could perform well across multiple domains is not only challenging, but plays a pivotal role in computer vision. To this end, in this paper, we propose a novel Embedding Learning approach with the goal of generalizing across different domains. During training, given a query image from a domain, we employ gated fusion and attention to generate a positive example, which carries a broad notion of the semantics of the query object category (from across multiple domains). By virtue of Contrastive Learning, we pull the embeddings of the query and positive, in order to learn a representation which is robust across domains. At the same time, to teach the model to be discriminative against examples from different semantic categories (across domains), we also maintain a pool of negative embeddings (from different categories). We show the prowess of our method using the DomainBed framework, on the popular PACS (Photo, Art painting, Cartoon, and Sketch) dataset.
[[2208.09756] Artifact-Based Domain Generalization of Skin Lesion Models](http://arxiv.org/abs/2208.09756)
Deep Learning failure cases are abundant, particularly in the medical area. Recent studies in out-of-distribution generalization have advanced considerably on well-controlled synthetic datasets, but they do not represent medical imaging contexts. We propose a pipeline that relies on artifacts annotation to enable generalization evaluation and debiasing for the challenging skin lesion analysis context. First, we partition the data into levels of increasingly higher biased training and test sets for better generalization assessment. Then, we create environments based on skin lesion artifacts to enable domain generalization methods. Finally, after robust training, we perform a test-time debiasing procedure, reducing spurious features in inference images. Our experiments show our pipeline improves performance metrics in biased cases, and avoids artifacts when using explanation methods. Still, when evaluating such models in out-of-distribution data, they did not prefer clinically-meaningful features. Instead, performance only improved in test sets that present similar artifacts from training, suggesting models learned to ignore the known set of artifacts. Our results raise a concern that debiasing models towards a single aspect may not be enough for fair skin lesion analysis.
[[2208.09788] FaceOff: A Video-to-Video Face Swapping System](http://arxiv.org/abs/2208.09788)
Doubles play an indispensable role in the movie industry. They take the place of the actors in dangerous stunt scenes or in scenes where the same actor plays multiple characters. The double's face is later replaced with the actor's face and expressions manually using expensive CGI technology, costing millions of dollars and taking months to complete. An automated, inexpensive, and fast way can be to use face-swapping techniques that aim to swap an identity from a source face video (or an image) to a target face video. However, such methods can not preserve the source expressions of the actor important for the scene's context. % essential for the scene. % that are essential in cinemas. To tackle this challenge, we introduce video-to-video (V2V) face-swapping, a novel task of face-swapping that can preserve (1) the identity and expressions of the source (actor) face video and (2) the background and pose of the target (double) video. We propose FaceOff, a V2V face-swapping system that operates by learning a robust blending operation to merge two face videos following the constraints above. It first reduces the videos to a quantized latent space and then blends them in the reduced space. FaceOff is trained in a self-supervised manner and robustly tackles the non-trivial challenges of V2V face-swapping. As shown in the experimental section, FaceOff significantly outperforms alternate approaches qualitatively and quantitatively.
[[2208.09913] A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective](http://arxiv.org/abs/2208.09913)
We propose the first unified theoretical analysis of mixed sample data augmentation (MSDA), such as Mixup and CutMix. Our theoretical results show that regardless of the choice of the mixing strategy, MSDA behaves as a pixel-level regularization of the underlying training loss and a regularization of the first layer parameters. Similarly, our theoretical results support that the MSDA training strategy can improve adversarial robustness and generalization compared to the vanilla training strategy. Using the theoretical results, we provide a high-level understanding of how different design choices of MSDA work differently. For example, we show that the most popular MSDA methods, Mixup and CutMix, behave differently, e.g., CutMix regularizes the input gradients by pixel distances, while Mixup regularizes the input gradients regardless of pixel distances. Our theoretical results also show that the optimal MSDA strategy depends on tasks, datasets, or model parameters. From these observations, we propose generalized MSDAs, a Hybrid version of Mixup and CutMix (HMix) and Gaussian Mixup (GMix), simple extensions of Mixup and CutMix. Our implementation can leverage the advantages of Mixup and CutMix, while our implementation is very efficient, and the computation cost is almost neglectable as Mixup and CutMix. Our empirical study shows that our HMix and GMix outperform the previous state-of-the-art MSDA methods in CIFAR-100 and ImageNet classification tasks. Source code is available at https://github.com/naver-ai/hmix-gmix
[[2208.09916] A Web Application for Experimenting and Validating Remote Measurement of Vital Signs](http://arxiv.org/abs/2208.09916)
With a surge in online medical advising remote monitoring of patient vitals is required. This can be facilitated with the Remote Photoplethysmography (rPPG) techniques that compute vital signs from facial videos. It involves processing video frames to obtain skin pixels, extracting the cardiac data from it and applying signal processing filters to extract the Blood Volume Pulse (BVP) signal. Different algorithms are applied to the BVP signal to estimate the various vital signs. We implemented a web application framework to measure a person's Heart Rate (HR), Heart Rate Variability (HRV), Oxygen Saturation (SpO2), Respiration Rate (RR), Blood Pressure (BP), and stress from the face video. The rPPG technique is highly sensitive to illumination and motion variation. The web application guides the users to reduce the noise due to these variations and thereby yield a cleaner BVP signal. The accuracy and robustness of the framework was validated with the help of volunteers.
[[2208.09926] A semi-supervised Teacher-Student framework for surgical tool detection and localization](http://arxiv.org/abs/2208.09926)
Surgical tool detection in minimally invasive surgery is an essential part of computer-assisted interventions. Current approaches are mostly based on supervised methods which require large fully labeled data to train supervised models and suffer from pseudo label bias because of class imbalance issues. However large image datasets with bounding box annotations are often scarcely available. Semi-supervised learning (SSL) has recently emerged as a means for training large models using only a modest amount of annotated data; apart from reducing the annotation cost. SSL has also shown promise to produce models that are more robust and generalizable. Therefore, in this paper we introduce a semi-supervised learning (SSL) framework in surgical tool detection paradigm which aims to mitigate the scarcity of training data and the data imbalance through a knowledge distillation approach. In the proposed work, we train a model with labeled data which initialises the Teacher-Student joint learning, where the Student is trained on Teacher-generated pseudo labels from unlabeled data. We propose a multi-class distance with a margin based classification loss function in the region-of-interest head of the detector to effectively segregate foreground classes from background region. Our results on m2cai16-tool-locations dataset indicate the superiority of our approach on different supervised data settings (1%, 2%, 5%, 10% of annotated data) where our model achieves overall improvements of 8%, 12% and 27% in mAP (on 1% labeled data) over the state-of-the-art SSL methods and a fully supervised baseline, respectively. The code is available at https://github.com/Mansoor-at/Semi-supervised-surgical-tool-det
[[2208.10221] Dynamic Adaptive Threshold based Learning for Noisy Annotations Robust Facial Expression Recognition](http://arxiv.org/abs/2208.10221)
The real-world facial expression recognition (FER) datasets suffer from noisy annotations due to crowd-sourcing, ambiguity in expressions, the subjectivity of annotators and inter-class similarity. However, the recent deep networks have strong capacity to memorize the noisy annotations leading to corrupted feature embedding and poor generalization. To handle noisy annotations, we propose a dynamic FER learning framework (DNFER) in which clean samples are selected based on dynamic class specific threshold during training. Specifically, DNFER is based on supervised training using selected clean samples and unsupervised consistent training using all the samples. During training, the mean posterior class probabilities of each mini-batch is used as dynamic class-specific threshold to select the clean samples for supervised training. This threshold is independent of noise rate and does not need any clean data unlike other methods. In addition, to learn from all samples, the posterior distributions between weakly-augmented image and strongly-augmented image are aligned using an unsupervised consistency loss. We demonstrate the robustness of DNFER on both synthetic as well as on real noisy annotated FER datasets like RAFDB, FERPlus, SFEW and AffectNet.
[[2208.09872] Provably Tightest Linear Approximation for Robustness Verification of Sigmoid-like Neural Networks](http://arxiv.org/abs/2208.09872)
The robustness of deep neural networks is crucial to modern AI-enabled systems and should be formally verified. Sigmoid-like neural networks have been adopted in a wide range of applications. Due to their non-linearity, Sigmoid-like activation functions are usually over-approximated for efficient verification, which inevitably introduces imprecision. Considerable efforts have been devoted to finding the so-called tighter approximations to obtain more precise verification results. However, existing tightness definitions are heuristic and lack theoretical foundations. We conduct a thorough empirical analysis of existing neuron-wise characterizations of tightness and reveal that they are superior only on specific neural networks. We then introduce the notion of network-wise tightness as a unified tightness definition and show that computing network-wise tightness is a complex non-convex optimization problem. We bypass the complexity from different perspectives via two efficient, provably tightest approximations. The results demonstrate the promising performance achievement of our approaches over state of the art: (i) achieving up to 251.28% improvement to certified lower robustness bounds; and (ii) exhibiting notably more precise verification results on convolutional networks.
[[2208.09619] A Novel Hybrid Sampling Framework for Imbalanced Learning](http://arxiv.org/abs/2208.09619)
Class imbalance is a frequently occurring scenario in classification tasks. Learning from imbalanced data poses a major challenge, which has instigated a lot of research in this area. Data preprocessing using sampling techniques is a standard approach to deal with the imbalance present in the data. Since standard classification algorithms do not perform well on imbalanced data, the dataset needs to be adequately balanced before training. This can be accomplished by oversampling the minority class or undersampling the majority class. In this study, a novel hybrid sampling algorithm has been proposed. To overcome the limitations of the sampling techniques while ensuring the quality of the retained sampled dataset, a sophisticated framework has been developed to properly combine three different sampling techniques. Neighborhood Cleaning rule is first applied to reduce the imbalance. Random undersampling is then strategically coupled with the SMOTE algorithm to obtain an optimal balance in the dataset. This proposed hybrid methodology, termed "SMOTE-RUS-NC", has been compared with other state-of-the-art sampling techniques. The strategy is further incorporated into the ensemble learning framework to obtain a more robust classification algorithm, termed "SRN-BRF". Rigorous experimentation has been conducted on 26 imbalanced datasets with varying degrees of imbalance. In virtually all datasets, the proposed two algorithms outperformed existing sampling strategies, in many cases by a substantial margin. Especially in highly imbalanced datasets where popular sampling techniques failed utterly, they achieved unparalleled performance. The superior results obtained demonstrate the efficacy of the proposed models and their potential to be powerful sampling algorithms in imbalanced domain.
[[2208.09779] Robust Node Classification on Graphs: Jointly from Bayesian Label Transition and Topology-based Label Propagation](http://arxiv.org/abs/2208.09779)
Node classification using Graph Neural Networks (GNNs) has been widely applied in various real-world scenarios. However, in recent years, compelling evidence emerges that the performance of GNN-based node classification may deteriorate substantially by topological perturbation, such as random connections or adversarial attacks. Various solutions, such as topological denoising methods and mechanism design methods, have been proposed to develop robust GNN-based node classifiers but none of these works can fully address the problems related to topological perturbations. Recently, the Bayesian label transition model is proposed to tackle this issue but its slow convergence may lead to inferior performance. In this work, we propose a new label inference model, namely LInDT, which integrates both Bayesian label transition and topology-based label propagation for improving the robustness of GNNs against topological perturbations. LInDT is superior to existing label transition methods as it improves the label prediction of uncertain nodes by utilizing neighborhood-based label propagation leading to better convergence of label inference. Besides, LIndT adopts asymmetric Dirichlet distribution as a prior, which also helps it to improve label inference. Extensive experiments on five graph datasets demonstrate the superiority of LInDT for GNN-based node classification under three scenarios of topological perturbations.
[[2208.09833] Combating Noisy-Labeled and Imbalanced Data by Two Stage Bi-Dimensional Sample Selection](http://arxiv.org/abs/2208.09833)
Robust learning on noisy-labeled data has been an important task in real applications, because label noise directly leads to the poor generalization of deep learning models. Existing label-noise learning methods usually assume that the ground-truth classes of the training data are balanced. However, the real-world data is often imbalanced, leading to the inconsistency between observed and intrinsic class distribution due to label noises. Distribution inconsistency makes the problem of label-noise learning more challenging because it is hard to distinguish clean samples from noisy samples on the intrinsic tail classes. In this paper, we propose a learning framework for label-noise learning with intrinsically long-tailed data. Specifically, we propose a robust sample selection method called two-stage bi-dimensional sample selection (TBSS) to better separate clean samples from noisy samples, especially for the tail classes. TBSS consists of two new separation metrics to jointly separate samples in each class. Extensive experiments on multiple noisy-labeled datasets with intrinsically long-tailed class distribution demonstrate the effectiveness of our method.
[[2208.10010] NOSMOG: Learning Noise-robust and Structure-aware MLPs on Graphs](http://arxiv.org/abs/2208.10010)
While Graph Neural Networks (GNNs) have demonstrated their efficacy in dealing with non-Euclidean structural data, they are difficult to be deployed in real applications due to the scalability constraint imposed by multi-hop data dependency. Existing methods attempt to address this scalability issue by training multi-layer perceptrons (MLPs) exclusively on node content features using labels derived from trained GNNs. Even though the performance of MLPs can be significantly improved, two issues prevent MLPs from outperforming GNNs and being used in practice: the ignorance of graph structural information and the sensitivity to node feature noises. In this paper, we propose to learn NOise-robust Structure-aware MLPs On Graphs (NOSMOG) to overcome the challenges. Specifically, we first complement node content with position features to help MLPs capture graph structural information. We then design a novel representational similarity distillation strategy to inject structural node similarities into MLPs. Finally, we introduce the adversarial feature augmentation to ensure stable learning against feature noises and further improve performance. Extensive experiments demonstrate that NOSMOG outperforms GNNs and the state-of-the-art method in both transductive and inductive settings across seven datasets, while maintaining a competitive inference efficiency.
[[2208.10053] Robust Bayesian Nonnegative Matrix Factorization with Implicit Regularizers](http://arxiv.org/abs/2208.10053)
We introduce a probabilistic model with implicit norm regularization for learning nonnegative matrix factorization (NMF) that is commonly used for predicting missing values and finding hidden patterns in the data, in which the matrix factors are latent variables associated with each data dimension. The nonnegativity constraint for the latent factors is handled by choosing priors with support on the nonnegative subspace, e.g., exponential density or distribution based on exponential function. Bayesian inference procedure based on Gibbs sampling is employed. We evaluate the model on several real-world datasets including Genomics of Drug Sensitivity in Cancer (GDSC $IC_{50}$) and Gene body methylation with different sizes and dimensions, and show that the proposed Bayesian NMF GL$2^2$ and GL$\infty$ models lead to robust predictions for different data values and avoid overfitting compared with competitive Bayesian NMF approaches.
[[2208.09500] Explainable Biometrics in the Age of Deep Learning](http://arxiv.org/abs/2208.09500)
Systems capable of analyzing and quantifying human physical or behavioral traits, known as biometrics systems, are growing in use and application variability. Since its evolution from handcrafted features and traditional machine learning to deep learning and automatic feature extraction, the performance of biometric systems increased to outstanding values. Nonetheless, the cost of this fast progression is still not understood. Due to its opacity, deep neural networks are difficult to understand and analyze, hence, hidden capacities or decisions motivated by the wrong motives are a potential risk. Researchers have started to pivot their focus towards the understanding of deep neural networks and the explanation of their predictions. In this paper, we provide a review of the current state of explainable biometrics based on the study of 47 papers and discuss comprehensively the direction in which this field should be developed.
[[2208.09677] Net2Brain: A Toolbox to compare artificial vision models with human brain responses](http://arxiv.org/abs/2208.09677)
We introduce Net2Brain, a graphical and command-line user interface toolbox for comparing the representational spaces of artificial deep neural networks (DNNs) and human brain recordings. While different toolboxes facilitate only single functionalities or only focus on a small subset of supervised image classification models, Net2Brain allows the extraction of activations of more than 600 DNNs trained to perform a diverse range of vision-related tasks (e.g semantic segmentation, depth estimation, action recognition, etc.), over both image and video datasets. The toolbox computes the representational dissimilarity matrices (RDMs) over those activations and compares them to brain recordings using representational similarity analysis (RSA), weighted RSA, both in specific ROIs and with searchlight search. In addition, it is possible to add a new data set of stimuli and brain recordings to the toolbox for evaluation. We demonstrate the functionality and advantages of Net2Brain with an example showcasing how it can be used to test hypotheses of cognitive computational neuroscience.
[[2208.10004] A diverse large-scale building dataset and a novel plug-and-play domain generalization method for building extraction](http://arxiv.org/abs/2208.10004)
In this paper, we introduce a new building dataset and propose a novel domain generalization method to facilitate the development of building extraction from high-resolution remote sensing images. The problem with the current building datasets involves that they lack diversity, the quality of the labels is unsatisfactory, and they are hardly used to train a building extraction model with good generalization ability, so as to properly evaluate the real performance of a model in practical scenes. To address these issues, we built a diverse, large-scale, and high-quality building dataset named the WHU-Mix building dataset, which is more practice-oriented. The WHU-Mix building dataset consists of a training/validation set containing 43,727 diverse images collected from all over the world, and a test set containing 8402 images from five other cities on five continents. In addition, to further improve the generalization ability of a building extraction model, we propose a domain generalization method named batch style mixing (BSM), which can be embedded as an efficient plug-and-play module in the frond-end of a building extraction model, providing the model with a progressively larger data distribution to learn data-invariant knowledge. The experiments conducted in this study confirmed the potential of the WHU-Mix building dataset to improve the performance of a building extraction model, resulting in a 6-36% improvement in mIoU, compared to the other existing datasets. The adverse impact of the inaccurate labels in the other datasets can cause about 20% IoU decrease. The experiments also confirmed the high performance of the proposed BSM module in enhancing the generalization ability and robustness of a model, exceeding the baseline model without domain generalization by 13% and the recent domain generalization methods by 4-15% in mIoU.
[[2208.10044] Multilayer deep feature extraction for visual texture recognition](http://arxiv.org/abs/2208.10044)
Convolutional neural networks have shown successful results in image classification achieving real-time results superior to the human level. However, texture images still pose some challenge to these models due, for example, to the limited availability of data for training in several problems where these images appear, high inter-class similarity, the absence of a global viewpoint of the object represented, and others. In this context, the present paper is focused on improving the accuracy of convolutional neural networks in texture classification. This is done by extracting features from multiple convolutional layers of a pretrained neural network and aggregating such features using Fisher vector. The reason for using features from earlier convolutional layers is obtaining information that is less domain specific. We verify the effectiveness of our method on texture classification of benchmark datasets, as well as on a practical task of Brazilian plant species identification. In both scenarios, Fisher vectors calculated on multiple layers outperform state-of-art methods, confirming that early convolutional layers provide important information about the texture image for classification.
[[2208.09617] Pretrained Language Encoders are Natural Tagging Frameworks for Aspect Sentiment Triplet Extraction](http://arxiv.org/abs/2208.09617)
Aspect Sentiment Triplet Extraction (ASTE) aims to extract the spans of aspect, opinion, and their sentiment relations as sentiment triplets. Existing works usually formulate the span detection as a 1D token tagging problem, and model the sentiment recognition with a 2D tagging matrix of token pairs. Moreover, by leveraging the token representation of Pretrained Language Encoders (PLEs) like BERT, they can achieve better performance. However, they simply leverage PLEs as feature extractors to build their modules but never have a deep look at what specific knowledge does PLEs contain. In this paper, we argue that instead of further designing modules to capture the inductive bias of ASTE, PLEs themselves contain "enough" features for 1D and 2D tagging: (1) The token representation contains the contextualized meaning of token itself, so this level feature carries necessary information for 1D tagging. (2) The attention matrix of different PLE layers can further capture multi-level linguistic knowledge existing in token pairs, which benefits 2D tagging. (3) Furthermore, with simple transformations, these two features can also be easily converted to the 2D tagging matrix and 1D tagging sequence, respectively. That will further boost the tagging results. By doing so, PLEs can be natural tagging frameworks and achieve a new state of the art, which is verified by extensive experiments and deep analyses.
[[2208.09625] Representing Knowledge by Spans: A Knowledge-Enhanced Model for Information Extraction](http://arxiv.org/abs/2208.09625)
Knowledge-enhanced pre-trained models for language representation have been shown to be more effective in knowledge base construction tasks (i.e.,~relation extraction) than language models such as BERT. These knowledge-enhanced language models incorporate knowledge into pre-training to generate representations of entities or relationships. However, existing methods typically represent each entity with a separate embedding. As a result, these methods struggle to represent out-of-vocabulary entities and a large amount of parameters, on top of their underlying token models (i.e.,~the transformer), must be used and the number of entities that can be handled is limited in practice due to memory constraints. Moreover, existing models still struggle to represent entities and relationships simultaneously. To address these problems, we propose a new pre-trained model that learns representations of both entities and relationships from token spans and span pairs in the text respectively. By encoding spans efficiently with span modules, our model can represent both entities and their relationships but requires fewer parameters than existing models. We pre-trained our model with the knowledge graph extracted from Wikipedia and test it on a broad range of supervised and unsupervised information extraction tasks. Results show that our model learns better representations for both entities and relationships than baselines, while in supervised settings, fine-tuning our model outperforms RoBERTa consistently and achieves competitive results on information extraction tasks.
[[2208.09705] gBuilder: A Scalable Knowledge Graph Construction System for Unstructured Corpus](http://arxiv.org/abs/2208.09705)
We design a user-friendly and scalable knowledge graph construction (KGC) system for extracting structured knowledge from the unstructured corpus. Different from existing KGC systems, gBuilder provides a flexible and user-defined pipeline to embracing the rapid development of IE models. More built-in template-based or heuristic operators and programmable operators are available for adapting to data from different domains. Furthermore, we also design a cloud-based self-adaptive task scheduling for gBuilder to ensure its scalability on large-scale knowledge graph construction. Experimental evaluation not only demonstrates the ability of gBuilder to organize multiple information extraction models for knowledge graph construction in a uniform platform, and also confirms its high scalability on large-scale KGC task.
[[2208.09715] SemEval-2022 Task 8: Multi-lingual News Article Similarity](http://arxiv.org/abs/2208.09715)
This work is about finding the similarity between a pair of news articles. There are seven different objective similarity metrics provided in the dataset for each pair and the news articles are in multiple different languages. On top of the pre-trained embedding model, we calculated cosine similarity for baseline results and feed-forward neural network was then trained on top of it to improve the results. We also built separate pipelines for each similarity metric for feature extraction. We could see significant improvement from baseline results using feature extraction and feed-forward neural network.
[[2208.10228] Survey of NLP in Pharmacology: Methodology, Tasks, Resources, Knowledge, and Tools](http://arxiv.org/abs/2208.10228)
Natural language processing (NLP) is an area of artificial intelligence that applies information technologies to process the human language, understand it to a certain degree, and use it in various applications. This area has rapidly developed in the last few years and now employs modern variants of deep neural networks to extract relevant patterns from large text corpora. The main objective of this work is to survey the recent use of NLP in the field of pharmacology. As our work shows, NLP is a highly relevant information extraction and processing approach for pharmacology. It has been used extensively, from intelligent searches through thousands of medical documents to finding traces of adversarial drug interactions in social media. We split our coverage into five categories to survey modern NLP methodology, commonly addressed tasks, relevant textual data, knowledge bases, and useful programming libraries. We split each of the five categories into appropriate subcategories, describe their main properties and ideas, and summarize them in a tabular form. The resulting survey presents a comprehensive overview of the area, useful to practitioners and interested observers.
[[2208.09838] Tyche: A library for probabilistic reasoning and belief modelling in Python](http://arxiv.org/abs/2208.09838)
This paper presents Tyche, a Python library to facilitate probabilistic reasoning in uncertain worlds through the construction, querying, and learning of belief models. Tyche uses aleatoric description logic (ADL), which provides computational advantages in its evaluation over other description logics. Tyche belief models can be succinctly created by defining classes of individuals, the probabilistic beliefs about them (concepts), and the probabilistic relationships between them (roles). We also introduce a method of observation propagation to facilitate learning from complex ADL observations. A demonstration of Tyche to predict the author of anonymised messages, and to extract author writing tendencies from anonymised messages, is provided. Tyche has the potential to assist in the development of expert systems, knowledge extraction systems, and agents to play games with incomplete and probabilistic information.
[[2208.09894] Byzantines can also Learn from History: Fall of Centered Clipping in Federated Learning](http://arxiv.org/abs/2208.09894)
The increasing popularity of the federated learning framework due to its success in a wide range of collaborative learning tasks also induces certain security concerns regarding the learned model due to the possibility of malicious clients participating in the learning process. Hence, the objective is to neutralize the impact of the malicious participants and to ensure the final model is trustable. One common observation regarding the Byzantine attacks is that the higher the variance among the clients' models/updates, the more space for attacks to be hidden. To this end, it has been recently shown that by utilizing momentum, thus reducing the variance, it is possible to weaken the strength of the known Byzantine attacks. The Centered Clipping framework (ICML 2021) has further shown that, besides reducing the variance, the momentum term from the previous iteration can be used as a reference point to neutralize the Byzantine attacks and show impressive performance against well-known attacks. However, in the scope of this work, we show that the centered clipping framework has certain vulnerabilities, and existing attacks can be revised based on these vulnerabilities to circumvent the centered clipping defense. Hence, we introduce a strategy to design an attack to circumvent the centered clipping framework and numerically illustrate its effectiveness against centered clipping as well as other known defense strategies by reducing test accuracy to 5-40 on best-case scenarios.
[[2208.10273] Long-Short History of Gradients is All You Need: Detecting Malicious and Unreliable Clients in Federated Learning](http://arxiv.org/abs/2208.10273)
Federated learning offers a framework of training a machine learning model in a distributed fashion while preserving privacy of the participants. As the server cannot govern the clients' actions, nefarious clients may attack the global model by sending malicious local gradients. In the meantime, there could also be unreliable clients who are benign but each has a portion of low-quality training data (e.g., blur or low-resolution images), thus may appearing similar as malicious clients. Therefore, a defense mechanism will need to perform a three-fold differentiation which is much more challenging than the conventional (two-fold) case. This paper introduces MUD-HoG, a novel defense algorithm that addresses this challenge in federated learning using long-short history of gradients, and treats the detected malicious and unreliable clients differently. Not only this, but we can also distinguish between targeted and untargeted attacks among malicious clients, unlike most prior works which only consider one type of the attacks. Specifically, we take into account sign-flipping, additive-noise, label-flipping, and multi-label-flipping attacks, under a non-IID setting. We evaluate MUD-HoG with six state-of-the-art methods on two datasets. The results show that MUD-HoG outperforms all of them in terms of accuracy as well as precision and recall, in the presence of a mixture of multiple (four) types of attackers as well as unreliable clients. Moreover, unlike most prior works which can only tolerate a low population of harmful users, MUD-HoG can work with and successfully detect a wide range of malicious and unreliable clients - up to 47.5% and 10%, respectively, of the total population. Our code is open-sourced at https://github.com/LabSAINT/MUD-HoG_Federated_Learning.
[[2208.10278] Practical Vertical Federated Learning with Unsupervised Representation Learning](http://arxiv.org/abs/2208.10278)
As societal concerns on data privacy recently increase, we have witnessed data silos among multiple parties in various applications. Federated learning emerges as a new learning paradigm that enables multiple parties to collaboratively train a machine learning model without sharing their raw data. Vertical federated learning, where each party owns different features of the same set of samples and only a single party has the label, is an important and challenging topic in federated learning. Communication costs among different parties have been a major hurdle for practical vertical learning systems. In this paper, we propose a novel communication-efficient vertical federated learning algorithm named FedOnce, which requires only one-shot communication among parties. To improve model accuracy and provide privacy guarantee, FedOnce features unsupervised learning representations in the federated setting and privacy-preserving techniques based on moments accountant. The comprehensive experiments on 10 datasets demonstrate that FedOnce achieves close performance compared to state-of-the-art vertical federated learning algorithms with much lower communication costs. Meanwhile, our privacy-preserving technique significantly outperforms the state-of-the-art approaches under the same privacy budget.
[[2208.09754] FLIS: Clustered Federated Learning via Inference Similarity for Non-IID Data Distribution](http://arxiv.org/abs/2208.09754)
Classical federated learning approaches yield significant performance degradation in the presence of Non-IID data distributions of participants. When the distribution of each local dataset is highly different from the global one, the local objective of each client will be inconsistent with the global optima which incur a drift in the local updates. This phenomenon highly impacts the performance of clients. This is while the primary incentive for clients to participate in federated learning is to obtain better personalized models. To address the above-mentioned issue, we present a new algorithm, FLIS, which groups the clients population in clusters with jointly trainable data distributions by leveraging the inference similarity of clients' models. This framework captures settings where different groups of users have their own objectives (learning tasks) but by aggregating their data with others in the same cluster (same learning task) to perform more efficient and personalized federated learning. We present experimental results to demonstrate the benefits of FLIS over the state-of-the-art benchmarks on CIFAR-100/10, SVHN, and FMNIST datasets. Our code is available at https://github.com/MMorafah/FLIS.
[[2208.10013] FairDisCo: Fairer AI in Dermatology via Disentanglement Contrastive Learning](http://arxiv.org/abs/2208.10013)
Deep learning models have achieved great success in automating skin lesion diagnosis. However, the ethnic disparity in these models' predictions, where lesions on darker skin types are usually underrepresented and have lower diagnosis accuracy, receives little attention. In this paper, we propose FairDisCo, a disentanglement deep learning framework with contrastive learning that utilizes an additional network branch to remove sensitive attributes, i.e. skin-type information from representations for fairness and another contrastive branch to enhance feature extraction. We compare FairDisCo to three fairness methods, namely, resampling, reweighting, and attribute-aware, on two newly released skin lesion datasets with different skin types: Fitzpatrick17k and Diverse Dermatology Images (DDI). We adapt two fairness-based metrics DPM and EOM for our multiple classes and sensitive attributes task, highlighting the skin-type bias in skin lesion classification. Extensive experimental evaluation demonstrates the effectiveness of FairDisCo, with fairer and superior performance on skin lesion classification tasks.
[[2208.10271] Agent-based Model of Initial Token Allocations: Evaluating Wealth Concentration in Fair Launches](http://arxiv.org/abs/2208.10271)
With advancements in distributed ledger technologies and smart contracts, tokenized voting rights gained prominence within Decentralized Finance (DeFi). Voting rights tokens (aka. governance tokens) are fungible tokens that grant individual holders the right to vote upon the fate of a project. The motivation behind these tokens is to achieve decentral control. Because the initial allocations of these tokens is often un-democratic, the DeFi project Yearn Finance experimented with a fair launch allocation where no tokens are pre-mined and all participants have an equal opportunity to receive them. Regardless, research on voting rights tokens highlights the formation of oligarchies over time. The hypothesis is that the tokens' tradability is the cause of concentration. To examine this proposition, this paper uses an Agent-based Model to simulate and analyze the concentration of voting rights tokens post fair launch under different trading modalities. It serves to examine three distinct token allocation scenarios considered as fair. The results show that regardless of the allocation, concentration persistently occurs. It confirms the hypothesis that the disease is endogenous: the cause of concentration is the tokens tradablility. The findings inform theoretical understandings and practical implications for on-chain governance mediated by tokens.
[[2208.09951] Bipartite Matchings with Group Fairness and Individual Fairness Constraints](http://arxiv.org/abs/2208.09951)
We address group as well as individual fairness constraints in matchings in
the context of assigning items to platforms. Each item belongs to certain
groups and has a preference ordering over platforms. Each platform enforces
group fairness by specifying an upper and a lower bound on the number of items
that can be matched to it from each group. There could be multiple optimal
solutions that satisfy the group fairness constraints. To achieve individual
fairness, we introduce probabilistic individual fairness', where the goal is
to compute a distribution over
group fair' matchings such that every item has
a reasonable probability of being matched to a platform among its top choices.
In the case where each item belongs to exactly one group, we provide a
polynomial-time algorithm that computes a probabilistic individually fair
distribution over group fair matchings. When an item can belong to multiple
groups, and the group fairness constraints are specified as only upper bounds,
we rehash the same algorithm to achieve three different polynomial-time
approximation algorithms.
[[2208.10095] Socially Fair Center-based and Linear Subspace Clustering](http://arxiv.org/abs/2208.10095)
Center-based clustering (e.g., $k$-means, $k$-medians) and clustering using linear subspaces are two most popular techniques to partition real-world data into smaller clusters. However, when the data consists of sensitive demographic groups, significantly different clustering cost per point for different sensitive groups can lead to fairness-related harms (e.g., different quality-of-service). The goal of socially fair clustering is to minimize the maximum cost of clustering per point over all groups. In this work, we propose a unified framework to solve socially fair center-based clustering and linear subspace clustering, and give practical, efficient approximation algorithms for these problems. We do extensive experiments to show that on multiple benchmark datasets our algorithms either closely match or outperform state-of-the-art baselines.
[[2208.10240] A Multimodal Transformer: Fusing Clinical Notes with Structured EHR Data for Interpretable In-Hospital Mortality Prediction](http://arxiv.org/abs/2208.10240)
Deep-learning-based clinical decision support using structured electronic health records (EHR) has been an active research area for predicting risks of mortality and diseases. Meanwhile, large amounts of narrative clinical notes provide complementary information, but are often not integrated into predictive models. In this paper, we provide a novel multimodal transformer to fuse clinical notes and structured EHR data for better prediction of in-hospital mortality. To improve interpretability, we propose an integrated gradients (IG) method to select important words in clinical notes and discover the critical structured EHR features with Shapley values. These important words and clinical features are visualized to assist with interpretation of the prediction outcomes. We also investigate the significance of domain adaptive pretraining and task adaptive fine-tuning on the Clinical BERT, which is used to learn the representations of clinical notes. Experiments demonstrated that our model outperforms other methods (AUCPR: 0.538, AUCROC: 0.877, F1:0.490).
[[2208.09944] MolGraph: a Python package for the implementation of small molecular graphs and graph neural networks with TensorFlow and Keras](http://arxiv.org/abs/2208.09944)
Molecular machine learning (ML) has proven important for tackling various molecular problems, including the prediction of protein-drug interactions and blood brain-barrier permeability. Since relatively recently, so-called graph neural networks (GNNs) have been implemented for molecular ML, showing comparable or superior performance to descriptor-based approaches. Although various tools and packages exist to apply GNNs for molecular ML, a new GNN package, named MolGraph (https://github.com/akensert/molgraph), was developed in this work with the motivation to create GNNs highly compatible with the TensorFlow and Keras application programming interface (API). As MolGraph focuses specifically and exclusively on molecular ML, a chemistry module was implemented to accommodate the generation of molecular graphs $\unicode{x2014}$ which could then be inputted to the GNNs for molecular ML. To validate the GNNs, they were benchmarked against the datasets of MoleculeNet, as well as three chromatographic retention time datasets. The results on these benchmarks show that the GNNs performed as expected. Additionally, the GNNs proved useful for molecular identification and improved interpretability of chromatographic retention data.