[[2210.09545] Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models](http://arxiv.org/abs/2210.09545)
Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks. In Natural Language Processing (NLP), DNNs are often backdoored during the fine-tuning process of a large-scale Pre-trained Language Model (PLM) with poisoned samples. Although the clean weights of PLMs are readily available, existing methods have ignored this information in defending NLP models against backdoor attacks. In this work, we take the first step to exploit the pre-trained (unfine-tuned) weights to mitigate backdoors in fine-tuned language models. Specifically, we leverage the clean pre-trained weights via two complementary techniques: (1) a two-step Fine-mixing technique, which first mixes the backdoored weights (fine-tuned on poisoned data) with the pre-trained weights, then fine-tunes the mixed weights on a small subset of clean data; (2) an Embedding Purification (E-PUR) technique, which mitigates potential backdoors existing in the word embeddings. We compare Fine-mixing with typical backdoor mitigation methods on three single-sentence sentiment classification tasks and two sentence-pair classification tasks and show that it outperforms the baselines by a considerable margin in all scenarios. We also show that our E-PUR method can benefit existing mitigation methods. Our work establishes a simple but strong baseline defense for secure fine-tuned NLP models against backdoor attacks.
[[2210.09439] CAN-BERT do it? Controller Area Network Intrusion Detection System based on BERT Language Model](http://arxiv.org/abs/2210.09439)
Due to the rising number of sophisticated customer functionalities,
electronic control units (ECUs) are increasingly integrated into modern
automotive systems. However, the high connectivity between the in-vehicle and
the external networks paves the way for hackers who could exploit in-vehicle
network protocols' vulnerabilities. Among these protocols, the Controller Area
Network (CAN), known as the most widely used in-vehicle networking technology,
lacks encryption and authentication mechanisms, making the communications
delivered by distributed ECUs insecure. Inspired by the outstanding performance
of bidirectional encoder representations from transformers (BERT) for improving
many natural language processing tasks, we propose in this paper CAN-BERT", a
deep learning based network intrusion detection system, to detect cyber attacks
on CAN bus protocol. We show that the BERT model can learn the sequence of
arbitration identifiers (IDs) in the CAN bus for anomaly detection using the
masked language model" unsupervised training objective. The experimental
results on the Car Hacking: Attack \& Defense Challenge 2020" dataset show
that
CAN-BERT" outperforms state-of-the-art approaches. In addition to being
able to identify in-vehicle intrusions in real-time within 0.8 ms to 3 ms w.r.t
CAN ID sequence length, it can also detect a wide variety of cyberattacks with
an F1-score of between 0.81 and 0.99.
[[2210.09802] NFGen: Automatic Non-linear Function Evaluation Code Generator for General-purpose MPC Platforms](http://arxiv.org/abs/2210.09802)
Due to the absence of a library for non-linear function evaluation, so-called general-purpose secure multi-party computation (MPC) are not as ''general'' as MPC programmers expect. Prior arts either naively reuse plaintext methods, resulting in suboptimal performance and even incorrect results, or handcraft ad hoc approximations for specific functions or platforms. We propose a general technique, NFGen, that utilizes pre-computed discrete piecewise polynomials to accurately approximate generic functions using fixed-point numbers. We implement it using a performance-prediction-based code generator to support different platforms. Conducting extensive evaluations of 23 non-linear functions against six MPC protocols on two platforms, we demonstrate significant performance, accuracy, and generality improvements over existing methods.
[[2210.09940] Automatic Detection of Fake Key Attacks in Secure Messaging](http://arxiv.org/abs/2210.09940)
Popular instant messaging applications such as WhatsApp and Signal provide end-to-end encryption for billions of users. They rely on a centralized, application-specific server to distribute public keys and relay encrypted messages between the users. Therefore, they prevent passive attacks but are vulnerable to some active attacks. A malicious or hacked server can distribute fake keys to users to perform man-in-the-middle or impersonation attacks. While typical secure messaging applications provide a manual method for users to detect these attacks, this burdens users, and studies show it is ineffective in practice. This paper presents KTACA, a completely automated approach for key verification that is oblivious to users and easy to deploy. We motivate KTACA by designing two approaches to automatic key verification. One approach uses client auditing (KTCA) and the second uses anonymous key monitoring (AKM). Both have relatively inferior security properties, leading to KTACA, which combines these approaches to provide the best of both worlds. We provide a security analysis of each defense, identifying which attacks they can automatically detect. We implement the active attacks to demonstrate they are possible, and we also create a prototype implementation of all the defenses to measure their performance and confirm their feasibility. Finally, we discuss the strengths and weaknesses of each defense, the overhead on clients and service providers, and deployment considerations.
[[2210.09617] Making Split Learning Resilient to Label Leakage by Potential Energy Loss](http://arxiv.org/abs/2210.09617)
As a practical privacy-preserving learning method, split learning has drawn much attention in academia and industry. However, its security is constantly being questioned since the intermediate results are shared during training and inference. In this paper, we focus on the privacy leakage problem caused by the trained split model, i.e., the attacker can use a few labeled samples to fine-tune the bottom model, and gets quite good performance. To prevent such kind of privacy leakage, we propose the potential energy loss to make the output of the bottom model become a more `complicated' distribution, by pushing outputs of the same class towards the decision boundary. Therefore, the adversary suffers a large generalization error when fine-tuning the bottom model with only a few leaked labeled samples. Experiment results show that our method significantly lowers the attacker's fine-tuning accuracy, making the split model more resilient to label leakage.
[[2210.09399] Probabilistic Forecasting Methods for System-Level Electricity Load Forecasting](http://arxiv.org/abs/2210.09399)
Load forecasts have become an integral part of energy security. Due to the various influencing factors that can be considered in such a forecast, there is also a wide range of models that attempt to integrate these parameters into a system in various ways. Due to the growing importance of probabilistic load forecast models, different approaches are presented in this analysis. The focus is on different models from the short-term sector. After that, another model from the long-term sector is presented. Then, the presented models are put in relation to each other and examined with reference to advantages and disadvantages. Afterwards, the presented papers are analyzed with focus on their comparability to each other. Finally, an outlook on further areas of development in the literature will be discussed.
[[2210.09634] DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling](http://arxiv.org/abs/2210.09634)
Nowadays, differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNN) have been immensely successful in machine learning. The combination of these two techniques, i.e., deep learning with differential privacy, promises the privacy-preserving release of high-utility models trained with sensitive data such as medical records. A classic mechanism for this purpose is DP-SGD, which is a differentially private version of the stochastic gradient descent (SGD) optimizer commonly used for DNN training. Subsequent approaches have improved various aspects of the model training process, including noise decay schedule, model architecture, feature engineering, and hyperparameter tuning. However, the core mechanism for enforcing DP in the SGD optimizer remains unchanged ever since the original DP-SGD algorithm, which has increasingly become a fundamental barrier limiting the performance of DP-compliant machine learning solutions.
Motivated by this, we propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core optimizer of DP-SGD, with consistent and significant accuracy gains over the latter. The main idea is to employ importance sampling (IS) in each SGD iteration for mini-batch selection, which reduces both sampling variance and the amount of random noise injected to the gradients that is required to satisfy DP. Integrating IS into the complex mathematical machinery of DP-SGD is highly non-trivial. DPIS addresses the challenge through novel mechanism designs, fine-grained privacy analysis, efficiency enhancements, and an adaptive gradient clipping optimization. Extensive experiments on four benchmark datasets, namely MNIST, FMNIST, CIFAR-10 and IMDb, demonstrate the superior effectiveness of DPIS over existing solutions for deep learning with differential privacy.
[[2210.09904] MaSS: Multi-attribute Selective Suppression](http://arxiv.org/abs/2210.09904)
The recent rapid advances in machine learning technologies largely depend on the vast richness of data available today, in terms of both the quantity and the rich content contained within. For example, biometric data such as images and voices could reveal people's attributes like age, gender, sentiment, and origin, whereas location/motion data could be used to infer people's activity levels, transportation modes, and life habits. Along with the new services and applications enabled by such technological advances, various governmental policies are put in place to regulate such data usage and protect people's privacy and rights. As a result, data owners often opt for simple data obfuscation (e.g., blur people's faces in images) or withholding data altogether, which leads to severe data quality degradation and greatly limits the data's potential utility.
Aiming for a sophisticated mechanism which gives data owners fine-grained control while retaining the maximal degree of data utility, we propose Multi-attribute Selective Suppression, or MaSS, a general framework for performing precisely targeted data surgery to simultaneously suppress any selected set of attributes while preserving the rest for downstream machine learning tasks. MaSS learns a data modifier through adversarial games between two sets of networks, where one is aimed at suppressing selected attributes, and the other ensures the retention of the rest of the attributes via general contrastive loss as well as explicit classification metrics. We carried out an extensive evaluation of our proposed method using multiple datasets from different domains including facial images, voice audio, and video clips, and obtained promising results in MaSS' generalizability and capability of suppressing targeted attributes without negatively affecting the data's usability in other downstream ML tasks.
[[2210.09963] Methods To Ensure Privacy Regarding Medical Data -- Including an examination of the differential privacy algorithm RAPPOR and its implementation in "Cryptool 2"](http://arxiv.org/abs/2210.09963)
This document examines several applicable methods to ensure privacy of data gathered in the health care sector. To ensure a common understanding of the topic, the introduction explains the need for anonymization methods based on an example. Next, reasons for data collection are introduced in connection to the purpose to protect mentioned data, as well as currently applicable privacy laws to enforce this privacy. The question "What kind of privacy we are talking about and what conditions have to be fulfilled?" is dealt with in the subsequent chapter "Differential Privacy". Thus being established, common anonymization methods are explained and reviewed for their use in the healthcare sector. The RAPPOR algorithm and its differential privacy is dealt with in more detail before coming to a conclusion.
[[2210.09394] Review Learning: Alleviating Catastrophic Forgetting with Generative Replay without Generator](http://arxiv.org/abs/2210.09394)
When a deep learning model is sequentially trained on different datasets, it forgets the knowledge acquired from previous data, a phenomenon known as catastrophic forgetting. It deteriorates performance of the deep learning model on diverse datasets, which is critical in privacy-preserving deep learning (PPDL) applications based on transfer learning (TL). To overcome this, we propose review learning (RL), a generative-replay-based continual learning technique that does not require a separate generator. Data samples are generated from the memory stored within the synaptic weights of the deep learning model which are used to review knowledge acquired from previous datasets. The performance of RL was validated through PPDL experiments. Simulations and real-world medical multi-institutional experiments were conducted using three types of binary classification electronic health record data. In the real-world experiments, the global area under the receiver operating curve was 0.710 for RL and 0.655 for TL. Thus, RL was highly effective in retaining previously learned knowledge.
[[2210.09618] Object Recognition in Different Lighting Conditions at Various Angles by Deep Learning Method](http://arxiv.org/abs/2210.09618)
Existing computer vision and object detection methods strongly rely on neural networks and deep learning. This active research area is used for applications such as autonomous driving, aerial photography, protection, and monitoring. Futuristic object detection methods rely on rectangular, boundary boxes drawn over an object to accurately locate its location. The modern object recognition algorithms, however, are vulnerable to multiple factors, such as illumination, occlusion, viewing angle, or camera rotation as well as cost. Therefore, deep learning-based object recognition will significantly increase the recognition speed and compatible external interference. In this study, we use convolutional neural networks (CNN) to recognize items, the neural networks have the advantages of end-to-end, sparse relation, and sharing weights. This article aims to classify the name of the various object based on the position of an object's detected box. Instead, under different distances, we can get recognition results with different confidence. Through this study, we find that this model's accuracy through recognition is mainly influenced by the proportion of objects and the number of samples. When we have a small proportion of an object on camera, then we get higher recognition accuracy; if we have a much small number of samples, we can get greater accuracy in recognition. The epidemic has a great impact on the world economy where designing a cheaper object recognition system is the need of time.
[[2210.09375] Reconstruction Attack on Differential Private Trajectory Protection Mechanisms](http://arxiv.org/abs/2210.09375)
Location trajectories collected by smartphones and other devices represent a valuable data source for applications such as location-based services. Likewise, trajectories have the potential to reveal sensitive information about individuals, e.g., religious beliefs or sexual orientations. Accordingly, trajectory datasets require appropriate sanitization. Due to their strong theoretical privacy guarantees, differential private publication mechanisms receive much attention. However, the large amount of noise required to achieve differential privacy yields structural differences, e.g., ship trajectories passing over land. We propose a deep learning-based Reconstruction Attack on Protected Trajectories (RAoPT), that leverages the mentioned differences to partly reconstruct the original trajectory from a differential private release. The evaluation shows that our RAoPT model can reduce the Euclidean and Hausdorff distances between the released and original trajectories by over 68% on two real-world datasets under protection with $\varepsilon \leq 1$. In this setting, the attack increases the average Jaccard index of the trajectories' convex hulls, representing a user's activity space, by over 180%. Trained on the GeoLife dataset, the model still reduces the Euclidean and Hausdorff distances by over 60% for T-Drive trajectories protected with a state-of-the-art mechanism ($\varepsilon = 0.1$). This work highlights shortcomings of current trajectory publication mechanisms, and thus motivates further research on privacy-preserving publication schemes.
[[2210.09917] Controllable Fake Document Infilling for Cyber Deception](http://arxiv.org/abs/2210.09917)
Recent works in cyber deception study how to deter malicious intrusion by generating multiple fake versions of a critical document to impose costs on adversaries who need to identify the correct information. However, existing approaches are context-agnostic, resulting in sub-optimal and unvaried outputs. We propose a novel context-aware model, Fake Document Infilling (FDI), by converting the problem to a controllable mask-then-infill procedure. FDI masks important concepts of varied lengths in the document, then infills a realistic but fake alternative considering both the previous and future contexts. We conduct comprehensive evaluations on technical documents and news stories. Results show that FDI outperforms the baselines in generating highly believable fakes with moderate modification to protect critical information and deceive adversaries.
[[2210.09515] Artificial intelligence and renegotiation of commercial lease contracts affected by pandemic-related contingencies from Covid-19](http://arxiv.org/abs/2210.09515)
This paper aims to investigate the possibility of using artificial intelligence (AI) to resolve the legal issues raised by the Covid-19 emergency about the fate of continuing execution contracts, or those with deferred or periodic execution, as well as, more generally, to deal with exceptional events and contingencies. We first study whether the Italian legal system allows for ''maintenance'' remedies to cope with contingencies and to avoid the termination of the contract, while ensuring effective protection of the interests of both parties. We then give a complete and technical description of an AI-based predictive framework, aimed at assisting both the Magistrate (in the course of litigation) and the parties themselves (in out-of-court proceedings) in the redetermination of the rent of commercial lease contracts. This framework, called A.I.A.Co. for Artificial Intelligence for contract law Against Covid-19, has been developed under the Italian grant ''Fondo Integrativo Speciale per la Ricerca''.
[[2210.09852] Scaling Adversarial Training to Large Perturbation Bounds](http://arxiv.org/abs/2210.09852)
The vulnerability of Deep Neural Networks to Adversarial Attacks has fuelled research towards building robust models. While most Adversarial Training algorithms aim at defending attacks constrained within low magnitude Lp norm bounds, real-world adversaries are not limited by such constraints. In this work, we aim to achieve adversarial robustness within larger bounds, against perturbations that may be perceptible, but do not change human (or Oracle) prediction. The presence of images that flip Oracle predictions and those that do not makes this a challenging setting for adversarial robustness. We discuss the ideal goals of an adversarial defense algorithm beyond perceptual limits, and further highlight the shortcomings of naively extending existing training algorithms to higher perturbation bounds. In order to overcome these shortcomings, we propose a novel defense, Oracle-Aligned Adversarial Training (OA-AT), to align the predictions of the network with that of an Oracle during adversarial training. The proposed approach achieves state-of-the-art performance at large epsilon bounds (such as an L-inf bound of 16/255 on CIFAR-10) while outperforming existing defenses (AWP, TRADES, PGD-AT) at standard bounds (8/255) as well.
[[2210.09421] Deepfake Text Detection: Limitations and Opportunities](http://arxiv.org/abs/2210.09421)
Recent advances in generative models for language have enabled the creation of convincing synthetic text or deepfake text. Prior work has demonstrated the potential for misuse of deepfake text to mislead content consumers. Therefore, deepfake text detection, the task of discriminating between human and machine-generated text, is becoming increasingly critical. Several defenses have been proposed for deepfake text detection. However, we lack a thorough understanding of their real-world applicability. In this paper, we collect deepfake text from 4 online services powered by Transformer-based tools to evaluate the generalization ability of the defenses on content in the wild. We develop several low-cost adversarial attacks, and investigate the robustness of existing defenses against an adaptive attacker. We find that many defenses show significant degradation in performance under our evaluation scenarios compared to their original claimed performance. Our evaluation shows that tapping into the semantic information in the text content is a promising approach for improving the robustness and generalization performance of deepfake text detection schemes.
[[2210.09364] Probabilistic Categorical Adversarial Attack & Adversarial Training](http://arxiv.org/abs/2210.09364)
The existence of adversarial examples brings huge concern for people to apply Deep Neural Networks (DNNs) in safety-critical tasks. However, how to generate adversarial examples with categorical data is an important problem but lack of extensive exploration. Previously established methods leverage greedy search method, which can be very time-consuming to conduct successful attack. This also limits the development of adversarial training and potential defenses for categorical data. To tackle this problem, we propose Probabilistic Categorical Adversarial Attack (PCAA), which transfers the discrete optimization problem to a continuous problem that can be solved efficiently by Projected Gradient Descent. In our paper, we theoretically analyze its optimality and time complexity to demonstrate its significant advantage over current greedy based attacks. Moreover, based on our attack, we propose an efficient adversarial training framework. Through a comprehensive empirical study, we justify the effectiveness of our proposed attack and defense algorithms.
[[2210.09405] Towards Generating Adversarial Examples on Mixed-type Data](http://arxiv.org/abs/2210.09405)
The existence of adversarial attacks (or adversarial examples) brings huge concern about the machine learning (ML) model's safety issues. For many safety-critical ML tasks, such as financial forecasting, fraudulent detection, and anomaly detection, the data samples are usually mixed-type, which contain plenty of numerical and categorical features at the same time. However, how to generate adversarial examples with mixed-type data is still seldom studied. In this paper, we propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data. Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction, by only slightly perturbing both the numerical and categorical features in the given data samples. More importantly, by adding designed regularizations, our generated adversarial examples can evade potential detection models, which makes the attack indeed insidious. Through extensive empirical studies, we validate the effectiveness and efficiency of our attack method and evaluate the robustness of existing classification models against our proposed attack. The experimental results highlight the feasibility of generating adversarial examples toward machine learning models in real-world applications.
[[2210.09482] You Can't See Me: Physical Removal Attacks on LiDAR-based Autonomous Vehicles Driving Frameworks](http://arxiv.org/abs/2210.09482)
Autonomous Vehicles (AVs) increasingly use LiDAR-based object detection systems to perceive other vehicles and pedestrians on the road. While existing attacks on LiDAR-based autonomous driving architectures focus on lowering the confidence score of AV object detection models to induce obstacle misdetection, our research discovers how to leverage laser-based spoofing techniques to selectively remove the LiDAR point cloud data of genuine obstacles at the sensor level before being used as input to the AV perception. The ablation of this critical LiDAR information causes autonomous driving obstacle detectors to fail to identify and locate obstacles and, consequently, induces AVs to make dangerous automatic driving decisions. In this paper, we present a method invisible to the human eye that hides objects and deceives autonomous vehicles' obstacle detectors by exploiting inherent automatic transformation and filtering processes of LiDAR sensor data integrated with autonomous driving frameworks. We call such attacks Physical Removal Attacks (PRA), and we demonstrate their effectiveness against three popular AV obstacle detectors (Apollo, Autoware, PointPillars), and we achieve 45{\deg} attack capability. We evaluate the attack impact on three fusion models (Frustum-ConvNet, AVOD, and Integrated-Semantic Level Fusion) and the consequences on the driving decision using LGSVL, an industry-grade simulator. In our moving vehicle scenarios, we achieve a 92.7% success rate removing 90% of a target obstacle's cloud points. Finally, we demonstrate the attack's success against two popular defenses against spoofing and object hiding attacks and discuss two enhanced defense strategies to mitigate our attack.
[[2210.09503] Towards Fair Classification against Poisoning Attacks](http://arxiv.org/abs/2210.09503)
Fair classification aims to stress the classification models to achieve the equality (treatment or prediction quality) among different sensitive groups. However, fair classification can be under the risk of poisoning attacks that deliberately insert malicious training samples to manipulate the trained classifiers' performance. In this work, we study the poisoning scenario where the attacker can insert a small fraction of samples into training data, with arbitrary sensitive attributes as well as other predictive features. We demonstrate that the fairly trained classifiers can be greatly vulnerable to such poisoning attacks, with much worse accuracy & fairness trade-off, even when we apply some of the most effective defenses (originally proposed to defend traditional classification tasks). As countermeasures to defend fair classification tasks, we propose a general and theoretically guaranteed framework which accommodates traditional defense methods to fair classification against poisoning attacks. Through extensive experiments, the results validate that the proposed defense framework obtains better robustness in terms of accuracy and fairness than representative baseline methods.
[[2210.09466] Anisotropic Multi-Scale Graph Convolutional Network for Dense Shape Correspondence](http://arxiv.org/abs/2210.09466)
This paper studies 3D dense shape correspondence, a key shape analysis application in computer vision and graphics. We introduce a novel hybrid geometric deep learning-based model that learns geometrically meaningful and discretization-independent features with a U-Net model as the primary node feature extraction module, followed by a successive spectral-based graph convolutional network. To create a diverse set of filters, we use anisotropic wavelet basis filters, being sensitive to both different directions and band-passes. This filter set overcomes the over-smoothing behavior of conventional graph neural networks. To further improve the model's performance, we add a function that perturbs the feature maps in the last layer ahead of fully connected layers, forcing the network to learn more discriminative features overall. The resulting correspondence maps show state-of-the-art performance on the benchmark datasets based on average geodesic errors and superior robustness to discretization in 3D meshes. Our approach provides new insights and practical solutions to the dense shape correspondence research.
[[2210.09509] Deep Data Augmentation for Weed Recognition Enhancement: A Diffusion Probabilistic Model and Transfer Learning Based Approach](http://arxiv.org/abs/2210.09509)
Weed management plays an important role in many modern agricultural applications. Conventional weed control methods mainly rely on chemical herbicides or hand weeding, which are often cost-ineffective, environmentally unfriendly, or even posing a threat to food safety and human health. Recently, automated/robotic weeding using machine vision systems has seen increased research attention with its potential for precise and individualized weed treatment. However, dedicated, large-scale, and labeled weed image datasets are required to develop robust and effective weed identification systems but they are often difficult and expensive to obtain. To address this issue, data augmentation approaches, such as generative adversarial networks (GANs), have been explored to generate highly realistic images for agricultural applications. Yet, despite some progress, those approaches are often complicated to train or have difficulties preserving fine details in images. In this paper, we present the first work of applying diffusion probabilistic models (also known as diffusion models) to generate high-quality synthetic weed images based on transfer learning. Comprehensive experimental results show that the developed approach consistently outperforms several state-of-the-art GAN models, representing the best trade-off between sample fidelity and diversity and highest FID score on a common weed dataset, CottonWeedID15. In addition, the expanding dataset with synthetic weed images can apparently boost model performance on four deep learning (DL) models for the weed classification tasks. Furthermore, the DL models trained on CottonWeedID15 dataset with only 10% of real images and 90% of synthetic weed images achieve a testing accuracy of over 94%, showing high-quality of the generated weed samples. The codes of this study are made publicly available at https://github.com/DongChen06/DMWeeds.
[[2210.09520] Using Language to Extend to Unseen Domains](http://arxiv.org/abs/2210.09520)
It is expensive to collect training data for every possible domain that a vision model may encounter when deployed. We instead consider how simply verbalizing the training domain (e.g. "photos of birds") as well as domains we want to extend to but do not have data for (e.g. "paintings of birds") can improve robustness. Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain, while preserving task relevant information. Without using any images from the unseen test domain, we show that over the extended domain containing both training and unseen test domains, LADS outperforms standard fine-tuning and ensemble approaches over a suite of four benchmarks targeting domain adaptation and dataset bias
[[2210.09643] Improving Adversarial Robustness by Contrastive Guided Diffusion Process](http://arxiv.org/abs/2210.09643)
Synthetic data generation has become an emerging tool to help improve the adversarial robustness in classification tasks since robust learning requires a significantly larger amount of training samples compared with standard classification tasks. Among various deep generative models, the diffusion model has been shown to produce high-quality synthetic images and has achieved good performance in improving the adversarial robustness. However, diffusion-type methods are typically slow in data generation as compared with other generative models. Although different acceleration techniques have been proposed recently, it is also of great importance to study how to improve the sample efficiency of generated data for the downstream task. In this paper, we first analyze the optimality condition of synthetic distribution for achieving non-trivial robust accuracy. We show that enhancing the distinguishability among the generated data is critical for improving adversarial robustness. Thus, we propose the Contrastive-Guided Diffusion Process (Contrastive-DP), which adopts the contrastive loss to guide the diffusion model in data generation. We verify our theoretical results using simulations and demonstrate the good performance of Contrastive-DP on image datasets.
[[2210.09655] WaGI : Wavelet-based GAN Inversion for Preserving High-frequency Image Details](http://arxiv.org/abs/2210.09655)
Recent GAN inversion models focus on preserving image-specific details through various methods, e.g., generator tuning or feature mixing. While those are helpful for preserving details compared to a naiive low-rate latent inversion, they still fail to maintain high-frequency features precisely. In this paper, we point out that the existing GAN inversion models have inherent limitations in both structural and training aspects, which preclude the delicate reconstruction of high-frequency features. Especially, we prove that the widely-used loss term in GAN inversion, i.e., L2, is biased to reconstruct low-frequency features mainly. To overcome this problem, we propose a novel GAN inversion model, coined WaGI, which enables to handle high-frequency features explicitly, by using a novel wavelet-based loss term and a newly proposed wavelet fusion scheme. To the best of our knowledge, WaGI is the first attempt to interpret GAN inversion in the frequency domain. We demonstrate that WaGI shows outstanding results on both inversion and editing, compared to the existing state-of-the-art GAN inversion models. Especially, WaGI robustly preserves high-frequency features of images even in the editing scenario. We will release our code with the pre-trained model after the review.
[[2210.09670] Hierarchical Normalization for Robust Monocular Depth Estimation](http://arxiv.org/abs/2210.09670)
In this paper, we address monocular depth estimation with deep neural networks. To enable training of deep monocular estimation models with various sources of datasets, state-of-the-art methods adopt image-level normalization strategies to generate affine-invariant depth representations. However, learning with image-level normalization mainly emphasizes the relations of pixel representations with the global statistic in the images, such as the structure of the scene, while the fine-grained depth difference may be overlooked. In this paper, we propose a novel multi-scale depth normalization method that hierarchically normalizes the depth representations based on spatial information and depth distributions. Compared with previous normalization strategies applied only at the holistic image level, the proposed hierarchical normalization can effectively preserve the fine-grained details and improve accuracy. We present two strategies that define the hierarchical normalization contexts in the depth domain and the spatial domain, respectively. Our extensive experiments show that the proposed normalization strategy remarkably outperforms previous normalization methods, and we set new state-of-the-art on five zero-shot transfer benchmark datasets.
[[2210.09796] Inception-Based Crowd Counting -- Being Fast while Remaining Accurate](http://arxiv.org/abs/2210.09796)
Recent sophisticated CNN-based algorithms have demonstrated their extraordinary ability to automate counting crowds from images, thanks to their structures which are designed to address the issue of various head scales. However, these complicated architectures also increase computational complexity enormously, making real-time estimation implausible. Thus, in this paper, a new method, based on Inception-V3, is proposed to reduce the amount of computation. This proposed approach (ICC), exploits the first five inception blocks and the contextual module designed in CAN to extract features at different receptive fields, thereby being context-aware. The employment of these two different strategies can also increase the model's robustness. Experiments show that ICC can at best reduce 85.3 percent calculations with 24.4 percent performance loss. This high efficiency contributes significantly to the deployment of crowd counting models in surveillance systems to guard the public safety. The code will be available at https://github.com/YIMINGMA/CrowdCounting-ICC,and its pre-trained weights on the Crowd Counting dataset, which comprises a large variety of scenes from surveillance perspectives, will also open-sourced.
[[2210.09846] Analyzing the Robustness of PECNet](http://arxiv.org/abs/2210.09846)
Comprehensive robustness analysis of PECNet, a pedestrian trajectory prediction system for autonomous vehicles. A novel metric is introduced for dataset analysis and classification. Synthetic data augmentation techniques ranging from Newtonian mechanics to Deep Reinforcement Learning based simulations are used to improve and test the system. An improvement of 9.5% over state-of-the-art results is seen on the FDE while compromising ADE. We introduce novel architectural changes using SIRENs for higher precision results to validate our robustness hypotheses. Additionally, we diagrammatically propose a novel multi-modal system for the same task.
[[2210.09900] SA-DNet: A on-demand semantic object registration network adapting to non-rigid deformation](http://arxiv.org/abs/2210.09900)
As an essential processing step before the fusing of infrared and visible images, the performance of image registration determines whether the two images can be fused at correct spatial position. In the actual scenario, the varied imaging devices may lead to a change in perspective or time gap between shots, making significant non-rigid spatial relationship in infrared and visible images. Even if a large number of feature points are matched, the registration accuracy may still be inadequate, affecting the result of image fusion and other vision tasks. To alleviate this problem, we propose a Semantic-Aware on-Demand registration network (SA-DNet), which mainly purpose is to confine the feature matching process to the semantic region of interest (sROI) by designing semantic-aware module (SAM) and HOL-Deep hybrid matching module (HDM). After utilizing TPS to transform infrared and visible images based on the corresponding feature points in sROI, the registered images are fused using image fusion module (IFM) to achieve a fully functional registration and fusion network. Moreover, we point out that for different demands, this type of approach allows us to select semantic objects for feature matching as needed and accomplishes task-specific registration based on specific requirements. To demonstrate the robustness of SA-DNet for non-rigid distortions, we conduct extensive experiments by comparing SA-DNet with five state-of-the-art infrared and visible image feature matching methods, and the experimental results show that our method adapts better to the presence of non-rigid distortions in the images and provides semantically well-registered images.
[[2210.09996] Perceptual Grouping in Vision-Language Models](http://arxiv.org/abs/2210.09996)
Recent advances in zero-shot image recognition suggest that vision-language models learn generic visual representations with a high degree of semantic information that may be arbitrarily probed with natural language phrases. Understanding an image, however, is not just about understanding what content resides within an image, but importantly, where that content resides. In this work we examine how well vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery. We demonstrate how contemporary vision and language representation learning models based on contrastive losses and large web-based data capture limited object localization information. We propose a minimal set of modifications that results in models that uniquely learn both semantic and spatial information. We measure this performance in terms of zero-shot image recognition, unsupervised bottom-up and top-down semantic segmentations, as well as robustness analyses. We find that the resulting model achieves state-of-the-art results in terms of unsupervised segmentation, and demonstrate that the learned representations are uniquely robust to spurious correlations in datasets designed to probe the causal behavior of vision models.
[[2210.10020] ULN: Towards Underspecified Vision-and-Language Navigation](http://arxiv.org/abs/2210.10020)
Vision-and-Language Navigation (VLN) is a task to guide an embodied agent moving to a target position using language instructions. Despite the significant performance improvement, the wide use of fine-grained instructions fails to characterize more practical linguistic variations in reality. To fill in this gap, we introduce a new setting, namely Underspecified vision-and-Language Navigation (ULN), and associated evaluation datasets. ULN evaluates agents using multi-level underspecified instructions instead of purely fine-grained or coarse-grained, which is a more realistic and general setting. As a primary step toward ULN, we propose a VLN framework that consists of a classification module, a navigation agent, and an Exploitation-to-Exploration (E2E) module. Specifically, we propose to learn Granularity Specific Sub-networks (GSS) for the agent to ground multi-level instructions with minimal additional parameters. Then, our E2E module estimates grounding uncertainty and conducts multi-step lookahead exploration to improve the success rate further. Experimental results show that existing VLN models are still brittle to multi-level language underspecification. Our framework is more robust and outperforms the baselines on ULN by ~10% relative success rate across all levels.
[[2210.09559] Unsupervised Inference of Data-Driven Discourse Structures using a Tree Auto-Encoder](http://arxiv.org/abs/2210.09559)
With a growing need for robust and general discourse structures in many downstream tasks and real-world applications, the current lack of high-quality, high-quantity discourse trees poses a severe shortcoming. In order the alleviate this limitation, we propose a new strategy to generate tree structures in a task-agnostic, unsupervised fashion by extending a latent tree induction framework with an auto-encoding objective. The proposed approach can be applied to any tree-structured objective, such as syntactic parsing, discourse parsing and others. However, due to the especially difficult annotation process to generate discourse trees, we initially develop such method to complement task-specific models in generating much larger and more diverse discourse treebanks.
[[2210.09565] Towards Domain-Independent Supervised Discourse Parsing Through Gradient Boosting](http://arxiv.org/abs/2210.09565)
Discourse analysis and discourse parsing have shown great impact on many important problems in the field of Natural Language Processing (NLP). Given the direct impact of discourse annotations on model performance and interpretability, robustly extracting discourse structures from arbitrary documents is a key task to further improve computational models in NLP. To this end, we present a new, supervised paradigm directly tackling the domain adaptation issue in discourse parsing. Specifically, we introduce the first fully supervised discourse parser designed to alleviate the domain dependency through a staged model of weak classifiers by introducing the gradient boosting framework.
[[2210.09644] Tencent's Multilingual Machine Translation System for WMT22 Large-Scale African Languages](http://arxiv.org/abs/2210.09644)
This paper describes Tencent's multilingual machine translation systems for the WMT22 shared task on Large-Scale Machine Translation Evaluation for African Languages. We participated in the $\mathbf{constrained}$ translation track in which only the data and pretrained models provided by the organizer are allowed. The task is challenging due to three problems, including the absence of training data for some to-be-evaluated language pairs, the uneven optimization of language pairs caused by data imbalance, and the curse of multilinguality. To address these problems, we adopt data augmentation, distributionally robust optimization, and language family grouping, respectively, to develop our multilingual neural machine translation (MNMT) models. Our submissions won the $\mathbf{1st\ place}$ on the blind test sets in terms of the automatic evaluation metrics. Codes, models, and detailed competition results are available at https://github.com/wxjiao/WMT2022-Large-Scale-African.
[[2210.09658] ROSE: Robust Selective Fine-tuning for Pre-trained Language Models](http://arxiv.org/abs/2210.09658)
Even though the large-scale language models have achieved excellent performances, they suffer from various adversarial attacks. A large body of defense methods has been proposed. However, they are still limited due to redundant attack search spaces and the inability to defend against various types of attacks. In this work, we present a novel fine-tuning approach called \textbf{RO}bust \textbf{SE}letive fine-tuning (\textbf{ROSE}) to address this issue. ROSE conducts selective updates when adapting pre-trained models to downstream tasks, filtering out invaluable and unrobust updates of parameters. Specifically, we propose two strategies: the first-order and second-order ROSE for selecting target robust parameters. The experimental results show that ROSE achieves significant improvements in adversarial robustness on various downstream NLP tasks, and the ensemble method even surpasses both variants above. Furthermore, ROSE can be easily incorporated into existing fine-tuning methods to improve their adversarial robustness further. The empirical analysis confirms that ROSE eliminates unrobust spurious updates during fine-tuning, leading to solutions corresponding to flatter and wider optima than the conventional method. Code is available at \url{https://github.com/jiangllan/ROSE}.
[[2210.09934] A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning](http://arxiv.org/abs/2210.09934)
Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries, which are expensive and impractical for low-resource languages. To disengage from these dependencies, researchers have explored training multilingual models on English-only resources and transferring them to low-resource languages. However, its effect is limited by the gap between embedding clusters of different languages. To address this issue, we propose Embedding-Push, Attention-Pull, and Robust targets to transfer English embeddings to virtual multilingual embeddings without semantic loss, thereby improving cross-lingual transferability. Experimental results on mBERT and XLM-R demonstrate that our method significantly outperforms previous works on the zero-shot cross-lingual text classification task and can obtain a better multilingual alignment.
[[2210.10040] The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks](http://arxiv.org/abs/2210.10040)
How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given language model? In this work, we study this question by contrasting social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye. To do so, we empirically simulate various alternative constructions for a given benchmark based on innocuous modifications (such as paraphrasing or random-sampling) that maintain the essence of their social bias. On two well-known social bias benchmarks (Winogender and BiasNLI) we observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models. We hope these troubling observations motivate more robust measures of social biases.
[[2210.10041] Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning](http://arxiv.org/abs/2210.10041)
While transferring a pretrained language model, common approaches conventionally attach their task-specific classifiers to the top layer and adapt all the pretrained layers. We investigate whether one could make a task-specific selection on which subset of the layers to adapt and where to place the classifier. The goal is to reduce the computation cost of transfer learning methods (e.g. fine-tuning or adapter-tuning) without sacrificing its performance.
We propose to select layers based on the variability of their hidden states given a task-specific corpus. We say a layer is already ``well-specialized'' in a task if the within-class variability of its hidden states is low relative to the between-class variability. Our variability metric is cheap to compute and doesn't need any training or hyperparameter tuning. It is robust to data imbalance and data scarcity. Extensive experiments on the GLUE benchmark demonstrate that selecting layers based on our metric can yield significantly stronger performance than using the same number of top layers and often match the performance of fine-tuning or adapter-tuning the entire language model.
[[2210.09671] Not All Poisons are Created Equal: Robust Training against Data Poisoning](http://arxiv.org/abs/2210.09671)
Data poisoning causes misclassification of test time target examples by injecting maliciously crafted samples in the training data. Existing defenses are often effective only against a specific type of targeted attack, significantly degrade the generalization performance, or are prohibitive for standard deep learning pipelines.
In this work, we propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks, and provides theoretical guarantees for the performance of the model. Targeted attacks work by adding bounded perturbations to a randomly selected subset of training data to match the targets' gradient or representation. We show that: (i) under bounded perturbations, only a number of poisons can be optimized to have a gradient that is close enough to that of the target and make the attack successful; (ii) such effective poisons move away from their original class and get isolated in the gradient space; (iii) dropping examples in low-density gradient regions during training can successfully eliminate the effective poisons, and guarantees similar training dynamics to that of training on full data. Our extensive experiments show that our method significantly decreases the success rate of state-of-the-art targeted attacks, including Gradient Matching and Bullseye Polytope, and easily scales to large datasets.
[[2210.09337] Robust Imitation of a Few Demonstrations with a Backwards Model](http://arxiv.org/abs/2210.09337)
Behavior cloning of expert demonstrations can speed up learning optimal policies in a more sample-efficient way over reinforcement learning. However, the policy cannot extrapolate well to unseen states outside of the demonstration data, creating covariate shift (agent drifting away from demonstrations) and compounding errors. In this work, we tackle this issue by extending the region of attraction around the demonstrations so that the agent can learn how to get back onto the demonstrated trajectories if it veers off-course. We train a generative backwards dynamics model and generate short imagined trajectories from states in the demonstrations. By imitating both demonstrations and these model rollouts, the agent learns the demonstrated paths and how to get back onto these paths. With optimal or near-optimal demonstrations, the learned policy will be both optimal and robust to deviations, with a wider region of attraction. On continuous control domains, we evaluate the robustness when starting from different initial states unseen in the demonstration data. While both our method and other imitation learning baselines can successfully solve the tasks for initial states in the training distribution, our method exhibits considerably more robustness to different initial states.
[[2210.10010] Vision Paper: Causal Inference for Interpretable and Robust Machine Learning in Mobility Analysis](http://arxiv.org/abs/2210.10010)
Artificial intelligence (AI) is revolutionizing many areas of our lives, leading a new era of technological advancement. Particularly, the transportation sector would benefit from the progress in AI and advance the development of intelligent transportation systems. Building intelligent transportation systems requires an intricate combination of artificial intelligence and mobility analysis. The past few years have seen rapid development in transportation applications using advanced deep neural networks. However, such deep neural networks are difficult to interpret and lack robustness, which slows the deployment of these AI-powered algorithms in practice. To improve their usability, increasing research efforts have been devoted to developing interpretable and robust machine learning methods, among which the causal inference approach recently gained traction as it provides interpretable and actionable information. Moreover, most of these methods are developed for image or sequential data which do not satisfy specific requirements of mobility data analysis. This vision paper emphasizes research challenges in deep learning-based mobility analysis that require interpretability and robustness, summarizes recent developments in using causal inference for improving the interpretability and robustness of machine learning methods, and highlights opportunities in developing causally-enabled machine learning models tailored for mobility analysis. This research direction will make AI in the transportation sector more interpretable and reliable, thus contributing to safer, more efficient, and more sustainable future transportation systems.
[[2210.09382] Tight Analysis of Extra-gradient and Optimistic Gradient Methods For Nonconvex Minimax Problems](http://arxiv.org/abs/2210.09382)
Despite the established convergence theory of Optimistic Gradient Descent Ascent (OGDA) and Extragradient (EG) methods for the convex-concave minimax problems, little is known about the theoretical guarantees of these methods in nonconvex settings. To bridge this gap, for the first time, this paper establishes the convergence of OGDA and EG methods under the nonconvex-strongly-concave (NC-SC) and nonconvex-concave (NC-C) settings by providing a unified analysis through the lens of single-call extra-gradient methods. We further establish lower bounds on the convergence of GDA/OGDA/EG, shedding light on the tightness of our analysis. We also conduct experiments supporting our theoretical results. We believe our results will advance the theoretical understanding of OGDA and EG methods for solving complicated nonconvex minimax real-world problems, e.g., Generative Adversarial Networks (GANs) or robust neural networks training.
[[2210.09507] An enhanced method of initial cluster center selection for K-means algorithm](http://arxiv.org/abs/2210.09507)
Clustering is one of the widely used techniques to find out patterns from a dataset that can be applied in different applications or analyses. K-means, the most popular and simple clustering algorithm, might get trapped into local minima if not properly initialized and the initialization of this algorithm is done randomly. In this paper, we propose a novel approach to improve initial cluster selection for K-means algorithm. This algorithm is based on the fact that the initial centroids must be well separated from each other since the final clusters are separated groups in feature space. The Convex Hull algorithm facilitates the computing of the first two centroids and the remaining ones are selected according to the distance from previously selected centers. To ensure the selection of one center per cluster, we use the nearest neighbor technique. To check the robustness of our proposed algorithm, we consider several real-world datasets. We obtained only 7.33%, 7.90%, and 0% clustering error in Iris, Letter, and Ruspini data respectively which proves better performance than other existing systems. The results indicate that our proposed method outperforms the conventional K means approach by accelerating the computation when the number of clusters is greater than 2.
[[2210.09843] BIOWISH: Biometric Recognition using Wearable Inertial Sensors detecting Heart Activity](http://arxiv.org/abs/2210.09843)
Wearable devices are increasingly used, thanks to the wide set of applications that can be deployed exploiting their ability to monitor physical activity and health-related parameters. Their usage has been recently proposed to perform biometric recognition, leveraging on the uniqueness of the recorded traits to generate discriminative identifiers. Most of the studies conducted on this topic have considered signals derived from cardiac activity, detecting it mainly using electrical measurements thorugh electrocardiography, or optical recordings employing photoplethysmography. In this paper we instead propose a BIOmetric recognition approach using Wearable Inertial Sensors detecting Heart activity (BIOWISH). In more detail, we investigate the feasibility of exploiting mechanical measurements obtained through seismocardiography and gyrocardiography to recognize a person. Several feature extractors and classifiers, including deep learning techniques relying on transfer learning and siamese training, are employed to derive distinctive characteristics from the considered signals, and differentiate between legitimate and impostor subjects. An multi-session database, comprising acquisitions taken from subjects performing different activities, is employed to perform experimental tests simulating a verification system. The obtained results testify that identifiers derived from measurements of chest vibrations, collected by wearable inertial sensors, could be employed to guarantee high recognition performance, even when considering short-time recordings.
[[2210.09698] Transfer learning with weak labels from radiology reports: application to glioma change detection](http://arxiv.org/abs/2210.09698)
Creating large annotated datasets represents a major bottleneck for the development of deep learning models in radiology. To overcome this, we propose a combined use of weak labels (imprecise, but fast-to-create annotations) and Transfer Learning (TL). Specifically, we explore inductive TL, where source and target domains are identical, but tasks are different due to a label shift: our target labels are created manually by three radiologists, whereas the source weak labels are generated automatically from textual radiology reports. We frame knowledge transfer as hyperparameter optimization, thus avoiding heuristic choices that are frequent in related works. We investigate the relationship between model size and TL, comparing a low-capacity VGG with a higher-capacity SEResNeXt. The task that we address is change detection in follow-up glioma imaging: we extracted 1693 T2-weighted magnetic resonance imaging difference maps from 183 patients, and classified them into stable or unstable according to tumor evolution. Weak labeling allowed us to increase dataset size more than 3-fold, and improve VGG classification results from 75% to 82% Area Under the ROC Curve (AUC) (p=0.04). Mixed training from scratch led to higher performance than fine-tuning or feature extraction. To assess generalizability, we also ran inference on an open dataset (BraTS-2015: 15 patients, 51 difference maps), reaching up to 76% AUC. Overall, results suggest that medical imaging problems may benefit from smaller models and different TL strategies with respect to computer vision problems, and that report-generated weak labels are effective in improving model performances. Code, in-house dataset and BraTS labels are released.
[[2210.09743] A Dashboard to Analysis and Synthesis of Dimensionality Reduction Methods in Remote Sensing](http://arxiv.org/abs/2210.09743)
Hyperspectral images (HSI) classification is a high technical remote sensing software. The purpose is to reproduce a thematic map . The HSI contains more than a hundred hyperspectral measures, as bands (or simply images), of the concerned region. They are taken at neighbors frequencies. Unfortunately, some bands are redundant features, others are noisily measured, and the high dimensionality of features made classification accuracy poor. The problematic is how to find the good bands to classify the regions items. Some methods use Mutual Information (MI) and thresholding, to select relevant images, without processing redundancy. Others control and avoid redundancy. But they process the dimensionality reduction, some times as selection, other times as wrapper methods without any relationship . Here , we introduce a survey on all scheme used, and after critics and improvement, we synthesize a dashboard, that helps user to analyze an hypothesize features selection and extraction softwares.
[[2210.09778] Compact multi-scale periocular recognition using SAFE features](http://arxiv.org/abs/2210.09778)
In this paper, we present a new approach for periocular recognition based on the Symmetry Assessment by Feature Expansion (SAFE) descriptor, which encodes the presence of various symmetric curve families around image key points. We use the sclera center as single key point for feature extraction, highlighting the object-like identity properties that concentrates to this unique point of the eye. As it is demonstrated, such discriminative properties can be encoded with a reduced set of symmetric curves. Experiments are done with a database of periocular images captured with a digital camera. We test our system against reference periocular features, achieving top performance with a considerably smaller feature vector (given by the use of a single key point). All the systems tested also show a nearly steady correlation between acquisition distance and performance, and they are also able to cope well when enrolment and test images are not captured at the same distance. Fusion experiments among the available systems are also provided.
[[2210.09345] CrossRE: A Cross-Domain Dataset for Relation Extraction](http://arxiv.org/abs/2210.09345)
Relation Extraction (RE) has attracted increasing attention, but current RE evaluation is limited to in-domain evaluation setups. Little is known on how well a RE system fares in challenging, but realistic out-of-distribution evaluation setups. To address this gap, we propose CrossRE, a new, freely-available cross-domain benchmark for RE, which comprises six distinct text domains and includes multi-label annotations. An additional innovation is that we release meta-data collected during annotation, to include explanations and flags of difficult instances. We provide an empirical evaluation with a state-of-the-art model for relation classification. As the meta-data enables us to shed new light on the state-of-the-art model, we provide a comprehensive analysis on the impact of difficult cases and find correlations between model and human annotations. Overall, our empirical investigation highlights the difficulty of cross-domain RE. We release our dataset, to spur more research in this direction.
[[2210.09770] EventGraph at CASE 2021 Task 1: A General Graph-based Approach to Protest Event Extraction](http://arxiv.org/abs/2210.09770)
This paper presents our submission to the 2022 edition of the CASE 2021 shared task 1, subtask 4. The EventGraph system adapts an end-to-end, graph-based semantic parser to the task of Protest Event Extraction and more specifically subtask 4 on event trigger and argument extraction. We experiment with various graphs, encoding the events as either "labeled-edge" or "node-centric" graphs. We show that the "node-centric" approach yields best results overall, performing well across the three languages of the task, namely English, Spanish, and Portuguese. EventGraph is ranked 3rd for English and Portuguese, and 4th for Spanish. Our code is available at: https://github.com/huiling-y/eventgraph_at_case
[[2210.09563] FedForgery: Generalized Face Forgery Detection with Residual Federated Learning](http://arxiv.org/abs/2210.09563)
With the continuous development of deep learning in the field of image generation models, a large number of vivid forged faces have been generated and spread on the Internet. These high-authenticity artifacts could grow into a threat to society security. Existing face forgery detection methods directly utilize the obtained public shared or centralized data for training but ignore the personal privacy and security issues when personal data couldn't be centralizedly shared in real-world scenarios. Additionally, different distributions caused by diverse artifact types would further bring adverse influences on the forgery detection task. To solve the mentioned problems, the paper proposes a novel generalized residual Federated learning for face Forgery detection (FedForgery). The designed variational autoencoder aims to learn robust discriminative residual feature maps to detect forgery faces (with diverse or even unknown artifact types). Furthermore, the general federated learning strategy is introduced to construct distributed detection model trained collaboratively with multiple local decentralized devices, which could further boost the representation generalization. Experiments conducted on publicly available face forgery detection datasets prove the superior performance of the proposed FedForgery. The designed novel generalized face forgery detection protocols and source code would be publicly available.
[[2210.09626] FLECS-CGD: A Federated Learning Second-Order Framework via Compression and Sketching with Compressed Gradient Differences](http://arxiv.org/abs/2210.09626)
In the recent paper FLECS (Agafonov et al, FLECS: A Federated Learning Second-Order Framework via Compression and Sketching), the second-order framework FLECS was proposed for the Federated Learning problem. This method utilize compression of sketched Hessians to make communication costs low. However, the main bottleneck of FLECS is gradient communication without compression. In this paper, we propose the modification of FLECS with compressed gradient differences, which we call FLECS-CGD (FLECS with Compressed Gradient Differences) and make it applicable for stochastic optimization. Convergence guarantees are provided in strongly convex and nonconvex cases. Experiments show the practical benefit of proposed approach.
[[2210.09943] On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition](http://arxiv.org/abs/2210.09943)
Face recognition systems are deployed across the world by government agencies and contractors for sensitive and impactful tasks, such as surveillance and database matching. Despite their widespread use, these systems are known to exhibit bias across a range of sociodemographic dimensions, such as gender and race. Nonetheless, an array of works proposing pre-processing, training, and post-processing methods have failed to close these gaps. Here, we take a very different approach to this problem, identifying that both architectures and hyperparameters of neural networks are instrumental in reducing bias. We first run a large-scale analysis of the impact of architectures and training hyperparameters on several common fairness metrics and show that the implicit convention of choosing high-accuracy architectures may be suboptimal for fairness. Motivated by our findings, we run the first neural architecture search for fairness, jointly with a search for hyperparameters. We output a suite of models which Pareto-dominate all other competitive architectures in terms of accuracy and fairness. Furthermore, we show that these models transfer well to other face recognition datasets with similar and distinct protected attributes. We release our code and raw result files so that researchers and practitioners can replace our fairness metrics with a bias measure of their choice.
[[2210.09957] Contextual bandits with concave rewards, and an application to fair ranking](http://arxiv.org/abs/2210.09957)
We consider Contextual Bandits with Concave Rewards (CBCR), a multi-objective bandit problem where the desired trade-off between the rewards is defined by a known concave objective function, and the reward vector depends on an observed stochastic context. We present the first algorithm with provably vanishing regret for CBCR without restrictions on the policy space, whereas prior works were restricted to finite policy spaces or tabular representations. Our solution is based on a geometric interpretation of CBCR algorithms as optimization algorithms over the convex set of expected rewards spanned by all stochastic policies. Building on Frank-Wolfe analyses in constrained convex optimization, we derive a novel reduction from the CBCR regret to the regret of a scalar-reward bandit problem. We illustrate how to apply the reduction off-the-shelf to obtain algorithms for CBCR with both linear and general reward functions, in the case of non-combinatorial actions. Motivated by fairness in recommendation, we describe a special case of CBCR with rankings and fairness-aware objectives, leading to the first algorithm with regret guarantees for contextual combinatorial bandits with fairness of exposure.
[[2210.09693] TFAD: A Decomposition Time Series Anomaly Detection Architecture with Time-Frequency Analysis](http://arxiv.org/abs/2210.09693)
Time series anomaly detection is a challenging problem due to the complex temporal dependencies and the limited label data. Although some algorithms including both traditional and deep models have been proposed, most of them mainly focus on time-domain modeling, and do not fully utilize the information in the frequency domain of the time series data. In this paper, we propose a Time-Frequency analysis based time series Anomaly Detection model, or TFAD for short, to exploit both time and frequency domains for performance improvement. Besides, we incorporate time series decomposition and data augmentation mechanisms in the designed time-frequency architecture to further boost the abilities of performance and interpretability. Empirical studies on widely used benchmark datasets show that our approach obtains state-of-the-art performance in univariate and multivariate time series anomaly detection tasks. Code is provided at https://github.com/DAMO-DI-ML/CIKM22-TFAD.
[[2210.09475] AMPNet: Attention as Message Passing for Graph Neural Networks](http://arxiv.org/abs/2210.09475)
Feature-level interactions between nodes can carry crucial information for understanding complex interactions in graph-structured data. Current interpretability techniques, however, are limited in their ability to capture feature-level interactions between different nodes. In this work, we propose AMPNet, a general Graph Neural Network (GNN) architecture for uncovering feature-level interactions between different spatial locations within graph-structured data. Our framework applies a multiheaded attention operation during message-passing to contextualize messages based on the feature interactions between different nodes. We evaluate AMPNet on several benchmark and real-world datasets, and develop a synthetic benchmark based on cyclic cellular automata to test the ability of our framework to recover cyclic patterns in node states based on feature-interactions. We also propose several methods for addressing the scalability of our architecture to large graphs, including subgraph sampling during training and node feature downsampling.