[[2302.07287] Forward Pass: On the Security Implications of Email Forwarding Mechanism and Policy](http://arxiv.org/abs/2302.07287) #security
The critical role played by email has led to a range of extension protocols (e.g., SPF, DKIM, DMARC) designed to protect against the spoofing of email sender domains. These protocols are complex as is, but are further complicated by automated email forwarding -- used by individual users to manage multiple accounts and by mailing lists to redistribute messages. In this paper, we explore how such email forwarding and its implementations can break the implicit assumptions in widely deployed anti-spoofing protocols. Using large-scale empirical measurements of 20 email forwarding services (16 leading email providers and four popular mailing list services), we identify a range of security issues rooted in forwarding behavior and show how they can be combined to reliably evade existing anti-spoofing controls. We show how this allows attackers to not only deliver spoofed email messages to prominent email providers (e.g., Gmail, Microsoft Outlook, and Zoho), but also reliably spoof email on behalf of tens of thousands of popular domains including sensitive domains used by organizations in government (e.g., state.gov), finance (e.g., transunion.com), law (e.g., perkinscoie.com) and news (e.g., washingtonpost.com) among others.
[[2302.07347] Security Threat Mitigation For Smart Contracts: A Survey](http://arxiv.org/abs/2302.07347) #security
The blockchain technology has been used for recording state transitions of smart contracts - decentralized applications that can be invoked through external transactions. Smart contracts gained popularity and accrued hundreds of billions of dollars in market capitalization in recent years. Unfortunately, like all other programs, smart contracts are prone to security vulnerabilities that have incurred multimillion-dollar damages over the past decade. As a result, many automated threat mitigation solutions have been proposed to counter the security issues of smart contracts. These threat mitigation solutions include various tools and methods that are challenging to compare. This survey develops a comprehensive classification taxonomy of smart contract threat mitigation solutions within five orthogonal dimensions: defense modality, core method, targeted contracts, input-output data mapping, and threat model. We classify 133 existing threat mitigation solutions using our taxonomy and confirm that the proposed five dimensions allow us to concisely and accurately describe any smart contract threat mitigation solution. In addition to learning what the threat mitigation solutions do, we also show how these solutions work by synthesizing their actual designs into a set of uniform workflows corresponding to the eight existing defense core methods. We further create an integrated coverage map for the known smart contract vulnerabilities by the existing threat mitigation solutions. Finally, we perform the evidence-based evolutionary analysis, in which we identify trends and future perspectives of threat mitigation in smart contracts and pinpoint major weaknesses of the existing methodologies. For the convenience of smart contract security developers, auditors, users, and researchers, we deploy a regularly updated comprehensive open-source online registry of threat mitigation solutions.
[[2302.07431] Exploring the Techniques of Information Security Certification](http://arxiv.org/abs/2302.07431) #security
If the information system is intruded or attacked by hackers, leaked personal data or serious economic loss may occur; the threats may be serious security problems. For security services, information security certification is built based on Public Key Infrastructure (PKI) to be an important tool for the services of bank transactions, natural person certificate, blockchain, and Hyper Text Transfer Protocol Secure (HTTPS). Therefore, this study uses Taiwan Patent Search System (TPSS) to find and analyze the contents of patents for obtaining the innovation reports of information security certification in Taiwan according to patents. This study considers the single-factor and two-factors to analyze the relationships of annuals, technology leaders, market leaders, and major applications for exploring the patent portfolios of technology leaders and market leaders in information security certification.
[[2302.07467] Demystifying security and compatibility issues in Android Apps](http://arxiv.org/abs/2302.07467) #security
Never before has any OS been so popular as Android. Existing mobile phones are not simply devices for making phone calls and receiving SMS messages, but powerful communication and entertainment platforms for web surfing, social networking, etc. Even though the Android OS offers powerful communication and application execution capabilities, it is riddled with defects (e.g., security risks, and compatibility issues), new vulnerabilities come to light daily, and bugs cost the economy tens of billions of dollars annually. For example, malicious apps (e.g., back-doors, fraud apps, ransomware, spyware, etc.) are reported [Google, 2022] to exhibit malicious behaviours, including privacy stealing, unwanted programs installed, etc. To counteract these threats, many works have been proposed that rely on static analysis techniques to detect such issues. However, static techniques are not sufficient on their own to detect such defects precisely. This will likely yield false positive results as static analysis has to make some trade-offs when handling complicated cases (e.g., object-sensitive vs. object-insensitive). In addition, static analysis techniques will also likely suffer from soundness issues because some complicated features (e.g., reflection, obfuscation, and hardening) are difficult to be handled [Sun et al., 2021b, Samhi et al., 2022].
[[2302.07572] Similarity Calculation Based on Homomorphic Encryption](http://arxiv.org/abs/2302.07572) #security
In recent years, although some homomorphic encryption algorithms have been proposed to provide additive homomorphic encryption and multiplicative homomorphic encryption. However, similarity measures are required for searches and queries under homomorphic encrypted ciphertexts. Therefore, this study considers the cosine similarity, angular similarity, Tanimoto similarity, and soft cosine similarity and combines homomorphic encryption algorithms for similarity calculation. This study proposes mathematical models to prove the proposed homomorphic encryption-based similarity calculation methods and gives practical cases to explain the proposed methods. In experiments, the performance of the proposed homomorphic encryption-based similarity calculation methods has been evaluated under different security strengths.
[[2302.07586] Vulnerability Analysis of Digital Banks' Mobile Applications](http://arxiv.org/abs/2302.07586) #security
There is a rapid increase in the number of mobile banking applications' users due to an increase in smart mobile devices. Mobile banking is a financial transaction and service offered through mobile devices. Almost all financial institutions now provide mobile banking services to their customers. However, the security of mobile banking applications is of huge concern because of the amount of personal data and information they collect. If an attacker gets hold of personal information, they can access bank payment or card accounts. This research aims to analyze the vulnerability of the UK digital banks' applications to identify vulnerabilities in the apps and proffer countermeasures that can help improve the security of the bank applications. Androbugs, a vulnerability scanner, was used to analyze the vulnerability of six digital banks' android applications. Starling, Monese, Atom bank, Transferwise, Monzo, and Revolut were scanned. All the scanned digital banks' applications have vulnerabilities; however, some have more vulnerabilities than others. For example, Revolut's mobile application has the highest number of identified vulnerabilities. Therefore, there is need for more security in the digital banks' applications as well as other mobile banking applications.
[[2302.07777] FIDO2 the Rescue? Platform vs](http://arxiv.org/abs/2302.07777) #security
Modern smartphones support FIDO2 passwordless authentication using either external security keys or internal biometric authentication, but it is unclear whether users appreciate and accept these new forms of web authentication for their own accounts. We present the first lab study (N=87) comparing platform and roaming authentication on smartphones, determining the practical strengths and weaknesses of FIDO2 as perceived by users in a mobile scenario. Most participants were willing to adopt passwordless authentication during our in-person user study, but closer analysis shows that participants prioritize usability, security, and availability differently depending on the account type. We identify remaining adoption barriers that prevent FIDO2 from succeeding password authentication, such as missing support for contemporary usage patterns, including account delegation and usage on multiple clients.
[[2302.07636] DP-BART for Privatized Text Rewriting under Local Differential Privacy](http://arxiv.org/abs/2302.07636) #privacy
Privatized text rewriting with local differential privacy (LDP) is a recent approach that enables sharing of sensitive textual documents while formally guaranteeing privacy protection to individuals. However, existing systems face several issues, such as formal mathematical flaws, unrealistic privacy guarantees, privatization of only individual words, as well as a lack of transparency and reproducibility. In this paper, we propose a new system 'DP-BART' that largely outperforms existing LDP systems. Our approach uses a novel clipping method, iterative pruning, and further training of internal representations which drastically reduces the amount of noise required for DP guarantees. We run experiments on five textual datasets of varying sizes, rewriting them at different privacy guarantees and evaluating the rewritten texts on downstream text classification tasks. Finally, we thoroughly discuss the privatized text rewriting approach and its limitations, including the problem of the strict text adjacency constraint in the LDP paradigm that leads to the high noise requirement.
[[2302.07801] Data Forensics in Diffusion Models: A Systematic Analysis of Membership Privacy](http://arxiv.org/abs/2302.07801) #privacy
In recent years, diffusion models have achieved tremendous success in the field of image generation, becoming the stateof-the-art technology for AI-based image processing applications. Despite the numerous benefits brought by recent advances in diffusion models, there are also concerns about their potential misuse, specifically in terms of privacy breaches and intellectual property infringement. In particular, some of their unique characteristics open up new attack surfaces when considering the real-world deployment of such models. With a thorough investigation of the attack vectors, we develop a systematic analysis of membership inference attacks on diffusion models and propose novel attack methods tailored to each attack scenario specifically relevant to diffusion models. Our approach exploits easily obtainable quantities and is highly effective, achieving near-perfect attack performance (>0.9 AUCROC) in realistic scenarios. Our extensive experiments demonstrate the effectiveness of our method, highlighting the importance of considering privacy and intellectual property risks when using diffusion models in image generation tasks.
[[2302.07717] Field-sensitive Data Flow Integrity](http://arxiv.org/abs/2302.07717) #protect
Although numerous defenses against memory vulnerability exploits have been studied so far, highly-compatible, precise, and efficient defense is still an open problem. In fact, existing defense methods have at least one of the following problems: they (1) cannot precisely protect structure fields, (2) incur high protection overheads, and/or (3) cannot maintain compatibility with existing code due to imposing memory layout change on the protected program.
In this paper, we propose a novel memory-protection method FIX-Sense that aims to solve all of these problems simultaneously. Our key idea is to perform memory protection based on field-sensitive data-flow integrity. Specifically, our method (1) computes a safe write-read relation for each memory object, at the structure-field granularity, based on field-sensitive value-flow analysis at the compile-time of the protected program. (2) At run-time, lightweight verification is performed to determine whether each memory read executed by the protected program belong to the safe write-read relation calculated for the memory object at compile time. (3) This verification is implemented by lightweight metadata management that tracks memory writes at the structure field granularity without changing the memory layout of the target program (especially the structure field layout).
[[2302.07735] Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge](http://arxiv.org/abs/2302.07735) #attack
Previous work has shown that Large Language Models are susceptible to so-called data extraction attacks. This allows an attacker to extract a sample that was contained in the training data, which has massive privacy implications. The construction of data extraction attacks is challenging, current attacks are quite inefficient, and there exists a significant gap in the extraction capabilities of untargeted attacks and memorization. Thus, targeted attacks are proposed, which identify if a given sample from the training data, is extractable from a model. In this work, we apply a targeted data extraction attack to the SATML2023 Language Model Training Data Extraction Challenge. We apply a two-step approach. In the first step, we maximise the recall of the model and are able to extract the suffix for 69% of the samples. In the second step, we use a classifier-based Membership Inference Attack on the generations. Our AutoSklearn classifier achieves a precision of 0.841. The full approach reaches a score of 0.405 recall at a 10% false positive rate, which is an improvement of 34% over the baseline of 0.301.
[[2302.07445] Silent Vulnerable Dependency Alert Prediction with Vulnerability Key Aspect Explanation](http://arxiv.org/abs/2302.07445) #attack
Due to convenience, open-source software is widely used. For beneficial reasons, open-source maintainers often fix the vulnerabilities silently, exposing their users unaware of the updates to threats. Previous works all focus on black-box binary detection of the silent dependency alerts that suffer from high false-positive rates. Open-source software users need to analyze and explain AI prediction themselves. Explainable AI becomes remarkable as a complementary of black-box AI models, providing details in various forms to explain AI decisions. Noticing there is still no technique that can discover silent dependency alert on time, in this work, we propose a framework using an encoder-decoder model with a binary detector to provide explainable silent dependency alert prediction. Our model generates 4 types of vulnerability key aspects including vulnerability type, root cause, attack vector, and impact to enhance the trustworthiness and users' acceptance to alert prediction. By experiments with several models and inputs, we confirm CodeBERT with both commit messages and code changes achieves the best results. Our user study shows that explainable alert predictions can help users find silent dependency alert more easily than black-box predictions. To the best of our knowledge, this is the first research work on the application of Explainable AI in silent dependency alert prediction, which opens the door of the related domains.
[[2302.07589] ARGUS: Context-Based Detection of Stealthy IoT Infiltration Attacks](http://arxiv.org/abs/2302.07589) #attack
IoT application domains, device diversity and connectivity are rapidly growing. IoT devices control various functions in smart homes and buildings, smart cities, and smart factories, making these devices an attractive target for attackers. On the other hand, the large variability of different application scenarios and inherent heterogeneity of devices make it very challenging to reliably detect abnormal IoT device behaviors and distinguish these from benign behaviors. Existing approaches for detecting attacks are mostly limited to attacks directly compromising individual IoT devices, or, require predefined detection policies. They cannot detect attacks that utilize the control plane of the IoT system to trigger actions in an unintended/malicious context, e.g., opening a smart lock while the smart home residents are absent.
In this paper, we tackle this problem and propose ARGUS, the first self-learning intrusion detection system for detecting contextual attacks on IoT environments, in which the attacker maliciously invokes IoT device actions to reach its goals. ARGUS monitors the contextual setting based on the state and actions of IoT devices in the environment. An unsupervised Deep Neural Network (DNN) is used for modeling the typical contextual device behavior and detecting actions taking place in abnormal contextual settings. This unsupervised approach ensures that ARGUS is not restricted to detecting previously known attacks but is also able to detect new attacks. We evaluated ARGUS on heterogeneous real-world smart-home settings and achieve at least an F1-Score of 99.64% for each setup, with a false positive rate (FPR) of at most 0.03%.
[[2302.07516] Offline-to-Online Knowledge Distillation for Video Instance Segmentation](http://arxiv.org/abs/2302.07516) #robust
In this paper, we present offline-to-online knowledge distillation (OOKD) for video instance segmentation (VIS), which transfers a wealth of video knowledge from an offline model to an online model for consistent prediction. Unlike previous methods that having adopting either an online or offline model, our single online model takes advantage of both models by distilling offline knowledge. To transfer knowledge correctly, we propose query filtering and association (QFA), which filters irrelevant queries to exact instances. Our KD with QFA increases the robustness of feature matching by encoding object-centric features from a single frame supplemented by long-range global information. We also propose a simple data augmentation scheme for knowledge distillation in the VIS task that fairly transfers the knowledge of all classes into the online model. Extensive experiments show that our method significantly improves the performance in video instance segmentation, especially for challenging datasets including long, dynamic sequences. Our method also achieves state-of-the-art performance on YTVIS-21, YTVIS-22, and OVIS datasets, with mAP scores of 46.1%, 43.6%, and 31.1%, respectively.
[[2302.07579] Semi-Supervised Deep Regression with Uncertainty Consistency and Variational Model Ensembling via Bayesian Neural Networks](http://arxiv.org/abs/2302.07579) #robust
Deep regression is an important problem with numerous applications. These range from computer vision tasks such as age estimation from photographs, to medical tasks such as ejection fraction estimation from echocardiograms for disease tracking. Semi-supervised approaches for deep regression are notably under-explored compared to classification and segmentation tasks, however. Unlike classification tasks, which rely on thresholding functions for generating class pseudo-labels, regression tasks use real number target predictions directly as pseudo-labels, making them more sensitive to prediction quality. In this work, we propose a novel approach to semi-supervised regression, namely Uncertainty-Consistent Variational Model Ensembling (UCVME), which improves training by generating high-quality pseudo-labels and uncertainty estimates for heteroscedastic regression. Given that aleatoric uncertainty is only dependent on input data by definition and should be equal for the same inputs, we present a novel uncertainty consistency loss for co-trained models. Our consistency loss significantly improves uncertainty estimates and allows higher quality pseudo-labels to be assigned greater importance under heteroscedastic regression. Furthermore, we introduce a novel variational model ensembling approach to reduce prediction noise and generate more robust pseudo-labels. We analytically show our method generates higher quality targets for unlabeled data and further improves training. Experiments show that our method outperforms state-of-the-art alternatives on different tasks and can be competitive with supervised methods that use full labels. Our code is available at https://github.com/xmed-lab/UCVME.
[[2302.07608] Uncertainty-Estimation with Normalized Logits for Out-of-Distribution Detection](http://arxiv.org/abs/2302.07608) #robust
Out-of-distribution (OOD) detection is critical for preventing deep learning models from making incorrect predictions to ensure the safety of artificial intelligence systems. Especially in safety-critical applications such as medical diagnosis and autonomous driving, the cost of incorrect decisions is usually unbearable. However, neural networks often suffer from the overconfidence issue, making high confidence for OOD data which are never seen during training process and may be irrelevant to training data, namely in-distribution (ID) data. Determining the reliability of the prediction is still a difficult and challenging task. In this work, we propose Uncertainty-Estimation with Normalized Logits (UE-NL), a robust learning method for OOD detection, which has three main benefits. (1) Neural networks with UE-NL treat every ID sample equally by predicting the uncertainty score of input data and the uncertainty is added into softmax function to adjust the learning strength of easy and hard samples during training phase, making the model learn robustly and accurately. (2) UE-NL enforces a constant vector norm on the logits to decouple the effect of the increasing output norm from optimization process, which causes the overconfidence issue to some extent. (3) UE-NL provides a new metric, the magnitude of uncertainty score, to detect OOD data. Experiments demonstrate that UE-NL achieves top performance on common OOD benchmarks and is more robust to noisy ID data that may be misjudged as OOD data by other methods.
[[2302.07689] Event-guided Multi-patch Network with Self-supervision for Non-uniform Motion Deblurring](http://arxiv.org/abs/2302.07689) #robust
Contemporary deep learning multi-scale deblurring models suffer from many issues: 1) They perform poorly on non-uniformly blurred images/videos; 2) Simply increasing the model depth with finer-scale levels cannot improve deblurring; 3) Individual RGB frames contain a limited motion information for deblurring; 4) Previous models have a limited robustness to spatial transformations and noise. Below, we extend the DMPHN model by several mechanisms to address the above issues: I) We present a novel self-supervised event-guided deep hierarchical Multi-patch Network (MPN) to deal with blurry images and videos via fine-to-coarse hierarchical localized representations; II) We propose a novel stacked pipeline, StackMPN, to improve the deblurring performance under the increased network depth; III) We propose an event-guided architecture to exploit motion cues contained in videos to tackle complex blur in videos; IV) We propose a novel self-supervised step to expose the model to random transformations (rotations, scale changes), and make it robust to Gaussian noises. Our MPN achieves the state of the art on the GoPro and VideoDeblur datasets with a 40x faster runtime compared to current multi-scale methods. With 30ms to process an image at 1280x720 resolution, it is the first real-time deep motion deblurring model for 720p images at 30fps. For StackMPN, we obtain significant improvements over 1.2dB on the GoPro dataset by increasing the network depth. Utilizing the event information and self-supervision further boost results to 33.83dB.
[[2302.07702] Audio-Visual Contrastive Learning with Temporal Self-Supervision](http://arxiv.org/abs/2302.07702) #robust
We propose a self-supervised learning approach for videos that learns representations of both the RGB frames and the accompanying audio without human supervision. In contrast to images that capture the static scene appearance, videos also contain sound and temporal scene dynamics. To leverage the temporal and aural dimension inherent to videos, our method extends temporal self-supervision to the audio-visual setting and integrates it with multi-modal contrastive objectives. As temporal self-supervision, we pose playback speed and direction recognition in both modalities and propose intra- and inter-modal temporal ordering tasks. Furthermore, we design a novel contrastive objective in which the usual pairs are supplemented with additional sample-dependent positives and negatives sampled from the evolving feature space. In our model, we apply such losses among video clips and between videos and their temporally corresponding audio clips. We verify our model design in extensive ablation experiments and evaluate the video and audio representations in transfer experiments to action recognition and retrieval on UCF101 and HMBD51, audio classification on ESC50, and robust video fingerprinting on VGG-Sound, with state-of-the-art results.
[[2302.07864] Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild](http://arxiv.org/abs/2302.07864) #robust
Diffusion models have shown promising results on single-image super-resolution and other image- to-image translation tasks. Despite this success, they have not outperformed state-of-the-art GAN models on the more challenging blind super-resolution task, where the input images are out of distribution, with unknown degradations. This paper introduces SR3+, a diffusion-based model for blind super-resolution, establishing a new state-of-the-art. To this end, we advocate self-supervised training with a combination of composite, parameterized degradations for self-supervised training, and noise-conditioing augmentation during training and testing. With these innovations, a large-scale convolutional architecture, and large-scale datasets, SR3+ greatly outperforms SR3. It outperforms Real-ESRGAN when trained on the same data, with a DRealSR FID score of 36.82 vs. 37.22, which further improves to FID of 32.37 with larger models, and further still with larger training sets.
[[2302.07324] READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises](http://arxiv.org/abs/2302.07324) #robust
For many real-world applications, the user-generated inputs usually contain various noises due to speech recognition errors caused by linguistic variations1 or typographical errors (typos). Thus, it is crucial to test model performance on data with realistic input noises to ensure robustness and fairness. However, little study has been done to construct such benchmarks for Chinese, where various language-specific input noises happen in the real world. In order to fill this important gap, we construct READIN: a Chinese multi-task benchmark with REalistic And Diverse Input Noises. READIN contains four diverse tasks and requests annotators to re-enter the original test data with two commonly used Chinese input methods: Pinyin input and speech input. We designed our annotation pipeline to maximize diversity, for example by instructing the annotators to use diverse input method editors (IMEs) for keyboard noises and recruiting speakers from diverse dialectical groups for speech noises. We experiment with a series of strong pretrained language models as well as robust training methods, we find that these models often suffer significant performance drops on READIN even with robustness methods like data augmentation. As the first large-scale attempt in creating a benchmark with noises geared towards user-generated inputs, we believe that READIN serves as an important complement to existing Chinese NLP benchmarks. The source code and dataset can be obtained from https://github.com/thunlp/READIN.
[[2302.07351] Constrained Decision Transformer for Offline Safe Reinforcement Learning](http://arxiv.org/abs/2302.07351) #robust
Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem from a novel multi-objective optimization perspective and propose the $\epsilon$-reducible concept to characterize problem difficulties. The inherent trade-offs between safety and task performance inspire us to propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment. Extensive experiments show the advantages of the proposed method in learning an adaptive, safe, robust, and high-reward policy. CDT outperforms its variants and strong offline safe RL baselines by a large margin with the same hyperparameters across all tasks, while keeping the zero-shot adaptation capability to different constraint thresholds, making our approach more suitable for real-world RL under constraints.
[[2302.07769] XploreNAS: Explore Adversarially Robust & Hardware-efficient Neural Architectures for Non-ideal Xbars](http://arxiv.org/abs/2302.07769) #robust
Compute In-Memory platforms such as memristive crossbars are gaining focus as they facilitate acceleration of Deep Neural Networks (DNNs) with high area and compute-efficiencies. However, the intrinsic non-idealities associated with the analog nature of computing in crossbars limits the performance of the deployed DNNs. Furthermore, DNNs are shown to be vulnerable to adversarial attacks leading to severe security threats in their large-scale deployment. Thus, finding adversarially robust DNN architectures for non-ideal crossbars is critical to the safe and secure deployment of DNNs on the edge. This work proposes a two-phase algorithm-hardware co-optimization approach called XploreNAS that searches for hardware-efficient & adversarially robust neural architectures for non-ideal crossbar platforms. We use the one-shot Neural Architecture Search (NAS) approach to train a large Supernet with crossbar-awareness and sample adversarially robust Subnets therefrom, maintaining competitive hardware-efficiency. Our experiments on crossbars with benchmark datasets (SVHN, CIFAR10 & CIFAR100) show upto ~8-16% improvement in the adversarial robustness of the searched Subnets against a baseline ResNet-18 model subjected to crossbar-aware adversarial training. We benchmark our robust Subnets for Energy-Delay-Area-Products (EDAPs) using the Neurosim tool and find that with additional hardware-efficiency driven optimizations, the Subnets attain ~1.5-1.6x lower EDAPs than ResNet-18 baseline.
[[2302.07748] Whats New? Identifying the Unfolding of New Events in Narratives](http://arxiv.org/abs/2302.07748) #extraction
Narratives include a rich source of events unfolding over time and context. Automatic understanding of these events may provide a summarised comprehension of the narrative for further computation (such as reasoning). In this paper, we study the Information Status (IS) of the events and propose a novel challenging task: the automatic identification of new events in a narrative. We define an event as a triplet of subject, predicate, and object. The event is categorized as new with respect to the discourse context and whether it can be inferred through commonsense reasoning. We annotated a publicly available corpus of narratives with the new events at sentence level using human annotators. We present the annotation protocol and a study aiming at validating the quality of the annotation and the difficulty of the task. We publish the annotated dataset, annotation materials, and machine learning baseline models for the task of new event extraction for narrative understanding.
[[2302.07450] FedABC: Targeting Fair Competition in Personalized Federated Learning](http://arxiv.org/abs/2302.07450) #federate
Federated learning aims to collaboratively train models without accessing their client's local private data. The data may be Non-IID for different clients and thus resulting in poor performance. Recently, personalized federated learning (PFL) has achieved great success in handling Non-IID data by enforcing regularization in local optimization or improving the model aggregation scheme on the server. However, most of the PFL approaches do not take into account the unfair competition issue caused by the imbalanced data distribution and lack of positive samples for some classes in each client. To address this issue, we propose a novel and generic PFL framework termed Federated Averaging via Binary Classification, dubbed FedABC. In particular, we adopt the ``one-vs-all'' training strategy in each client to alleviate the unfair competition between classes by constructing a personalized binary classification problem for each class. This may aggravate the class imbalance challenge and thus a novel personalized binary classification loss that incorporates both the under-sampling and hard sample mining strategies is designed. Extensive experiments are conducted on two popular datasets under different settings, and the results demonstrate that our FedABC can significantly outperform the existing counterparts.
[[2302.07305] FedLE: Federated Learning Client Selection with Lifespan Extension for Edge IoT Networks](http://arxiv.org/abs/2302.07305) #federate
Federated learning (FL) is a distributed and privacy-preserving learning framework for predictive modeling with massive data generated at the edge by Internet of Things (IoT) devices. One major challenge preventing the wide adoption of FL in IoT is the pervasive power supply constraints of IoT devices due to the intensive energy consumption of battery-powered clients for local training and model updates. Low battery levels of clients eventually lead to their early dropouts from edge networks, loss of training data jeopardizing the performance of FL, and their availability to perform other designated tasks. In this paper, we propose FedLE, an energy-efficient client selection framework that enables lifespan extension of edge IoT networks. In FedLE, the clients first run for a minimum epoch to generate their local model update. The models are partially uploaded to the server for calculating similarities between each pair of clients. Clustering is performed against these client pairs to identify those with similar model distributions. In each round, low-powered clients have a lower probability of being selected, delaying the draining of their batteries. Empirical studies show that FedLE outperforms baselines on benchmark datasets and lasts more training rounds than FedAvg with battery power constraints.
[[2302.07493] Adaptive incentive for cross-silo federated learning: A multi-agent reinforcement learning approach](http://arxiv.org/abs/2302.07493) #federate
Cross-silo federated learning (FL) is a typical FL that enables organizations(e.g., financial or medical entities) to train global models on isolated data. Reasonable incentive is key to encouraging organizations to contribute data. However, existing works on incentivizing cross-silo FL lack consideration of the environmental dynamics (e.g., precision of the trained global model and data owned by uncertain clients during the training processes). Moreover, most of them assume that organizations share private information, which is unrealistic. To overcome these limitations, we propose a novel adaptive mechanism for cross-silo FL, towards incentivizing organizations to contribute data to maximize their long-term payoffs in a real dynamic training environment. The mechanism is based on multi-agent reinforcement learning, which learns near-optimal data contribution strategy from the history of potential games without organizations' private information. Experiments demonstrate that our mechanism achieves adaptive incentive and effectively improves the long-term payoffs for organizations.
[[2302.07684] A Federated Learning Benchmark for Drug-Target Interaction](http://arxiv.org/abs/2302.07684) #federate
Aggregating pharmaceutical data in the drug-target interaction (DTI) domain has the potential to deliver life-saving breakthroughs. It is, however, notoriously difficult due to regulatory constraints and commercial interests. This work proposes the application of federated learning, which we argue to be reconcilable with the industry's constraints, as it does not require sharing of any information that would reveal the entities' data or any other high-level summary of it. When used on a representative GraphDTA model and the KIBA dataset it achieves up to 15% improved performance relative to the best available non-privacy preserving alternative. Our extensive battery of experiments shows that, unlike in other domains, the non-IID data distribution in the DTI datasets does not deteriorate FL performance. Additionally, we identify a material trade-off between the benefits of adding new data, and the cost of adding more clients.
[[2302.07676] DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes](http://arxiv.org/abs/2302.07676) #fair
Cross-view multi-object tracking aims to link objects between frames and camera views with substantial overlaps. Although cross-view multi-object tracking has received increased attention in recent years, existing datasets still have several issues, including 1) missing real-world scenarios, 2) lacking diverse scenes, 3) owning a limited number of tracks, 4) comprising only static cameras, and 5) lacking standard benchmarks, which hinder the investigation and comparison of cross-view tracking methods. To solve the aforementioned issues, we introduce DIVOTrack: a new cross-view multi-object tracking dataset for DIVerse Open scenes with dense tracking pedestrians in realistic and non-experimental environments. Our DIVOTrack has ten distinct scenarios and 550 cross-view tracks, surpassing all cross-view multi-object tracking datasets currently available. Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT, which learns object detection, single-view association, and cross-view matching with an all-in-one embedding model. Finally, we present a summary of current methodologies and a set of standard benchmarks with our DIVOTrack to provide a fair comparison and conduct a comprehensive analysis of current approaches and our proposed CrossMOT. The dataset and code are available at https://github.com/shengyuhao/DIVOTrack.
[[2302.07842] Augmented Language Models: a Survey](http://arxiv.org/abs/2302.07842) #interpretability
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demonstrations. While adhering to a standard missing tokens prediction objective, such augmented LMs can use various, possibly non-parametric external modules to expand their context processing ability, thus departing from the pure language modeling paradigm. We therefore refer to them as Augmented Language Models (ALMs). The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks. In this work, after reviewing current advance in ALMs, we conclude that this new research direction has the potential to address common limitations of traditional LMs such as interpretability, consistency, and scalability issues.
[[2302.07407] Bayesian Decision Trees via Tractable Priors and Probabilistic Context-Free Grammars](http://arxiv.org/abs/2302.07407) #interpretability
Decision Trees are some of the most popular machine learning models today due to their out-of-the-box performance and interpretability. Often, Decision Trees models are constructed greedily in a top-down fashion via heuristic search criteria, such as Gini impurity or entropy. However, trees constructed in this manner are sensitive to minor fluctuations in training data and are prone to overfitting. In contrast, Bayesian approaches to tree construction formulate the selection process as a posterior inference problem; such approaches are more stable and provide greater theoretical guarantees. However, generating Bayesian Decision Trees usually requires sampling from complex, multimodal posterior distributions. Current Markov Chain Monte Carlo-based approaches for sampling Bayesian Decision Trees are prone to mode collapse and long mixing times, which makes them impractical. In this paper, we propose a new criterion for training Bayesian Decision Trees. Our criterion gives rise to BCART-PCFG, which can efficiently sample decision trees from a posterior distribution across trees given the data and find the maximum a posteriori (MAP) tree. Learning the posterior and training the sampler can be done in time that is polynomial in the dataset size. Once the posterior has been learned, trees can be sampled efficiently (linearly in the number of nodes). At the core of our method is a reduction of sampling the posterior to sampling a derivation from a probabilistic context-free grammar. We find that trees sampled via BCART-PCFG perform comparable to or better than greedily-constructed Decision Trees in classification accuracy on several datasets. Additionally, the trees sampled via BCART-PCFG are significantly smaller -- sometimes by as much as 20x.
[[2302.07736] Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech](http://arxiv.org/abs/2302.07736) #explainability
Recent studies have alarmed that many online hate speeches are implicit. With its subtle nature, the explainability of the detection of such hateful speech has been a challenging problem. In this work, we examine whether ChatGPT can be used for providing natural language explanations (NLEs) for implicit hateful speech detection. We design our prompt to elicit concise ChatGPT-generated NLEs and conduct user studies to evaluate their qualities by comparison with human-generated NLEs. We discuss the potential and limitations of ChatGPT in the context of implicit hateful speech research.
[[2302.07458] CUTS: Neural Causal Discovery from Irregular Time-Series Data](http://arxiv.org/abs/2302.07458) #explainability
Causal discovery from time-series data has been a central task in machine learning. Recently, Granger causality inference is gaining momentum due to its good explainability and high compatibility with emerging deep neural networks. However, most existing methods assume structured input data and degenerate greatly when encountering data with randomly missing entries or non-uniform sampling frequencies, which hampers their applications in real scenarios. To address this issue, here we present CUTS, a neural Granger causal discovery algorithm to jointly impute unobserved data points and build causal graphs, via plugging in two mutually boosting modules in an iterative framework: (i) Latent data prediction stage: designs a Delayed Supervision Graph Neural Network (DSGNN) to hallucinate and register unstructured data which might be of high dimension and with complex distribution; (ii) Causal graph fitting stage: builds a causal adjacency matrix with imputed data under sparse penalty. Experiments show that CUTS effectively infers causal graphs from unstructured time-series data, with significantly superior performance to existing methods. Our approach constitutes a promising step towards applying causal discovery to real applications with non-ideal observations.
[[2302.07440] Road Redesign Technique Achieving Enhanced Road Safety by Inpainting with a Diffusion Model](http://arxiv.org/abs/2302.07440) #diffusion
Road infrastructure can affect the occurrence of road accidents. Therefore, identifying roadway features with high accident probability is crucial. Here, we introduce image inpainting that can assist authorities in achieving safe roadway design with minimal intervention in the current roadway structure. Image inpainting is based on inpainting safe roadway elements in a roadway image, replacing accident-prone (AP) features by using a diffusion model. After object-level segmentation, the AP features identified by the properties of accident hotspots are masked by a human operator and safe roadway elements are inpainted. With only an average time of 2 min for image inpainting, the likelihood of an image being classified as an accident hotspot drops by an average of 11.85%. In addition, safe urban spaces can be designed considering human factors of commuters such as gaze saliency. Considering this, we introduce saliency enhancement that suggests chrominance alteration for a safe road view.
[[2302.07685] Video Probabilistic Diffusion Models in Projected Latent Space](http://arxiv.org/abs/2302.07685) #diffusion
Despite the remarkable progress in deep generative models, synthesizing high-resolution and temporally coherent videos still remains a challenge due to their high-dimensionality and complex temporal dynamics along with large spatial variations. Recent works on diffusion models have shown their potential to solve this challenge, yet they suffer from severe computation- and memory-inefficiency that limit the scalability. To handle this issue, we propose a novel generative model for videos, coined projected latent video diffusion models (PVDM), a probabilistic diffusion model which learns a video distribution in a low-dimensional latent space and thus can be efficiently trained with high-resolution videos under limited resources. Specifically, PVDM is composed of two components: (a) an autoencoder that projects a given video as 2D-shaped latent vectors that factorize the complex cubic structure of video pixels and (b) a diffusion model architecture specialized for our new factorized latent space and the training/sampling procedure to synthesize videos of arbitrary length with a single model. Experiments on popular video generation datasets demonstrate the superiority of PVDM compared with previous video synthesis methods; e.g., PVDM obtains the FVD score of 639.7 on the UCF-101 long video (128 frames) generation benchmark, which improves 1773.4 of the prior state-of-the-art.
[[2302.07411] Real-time chaotic video encryption based on multithreaded parallel confusion and diffusion](http://arxiv.org/abs/2302.07411) #diffusion
Due to the strong correlation between adjacent pixels, most image encryption schemes perform multiple rounds of confusion and diffusion to protect the image against attacks. Such operations, however, are time-consuming, cannot meet the real-time requirements of video encryption. Existing works, therefore, realize video encryption by simplifying the encryption process or encrypting specific parts of video frames, which results in lower security compared to image encryption. To solve the problem, this paper proposes a real-time chaotic video encryption strategy based on multithreaded parallel confusion and diffusion. It takes a video as the input, splits the frame into subframes, creates a set of threads to simultaneously perform five rounds of confusion and diffusion operations on corresponding subframes, and efficiently outputs the encrypted frames. The encryption speed evaluation shows that our method significantly improves the confusion and diffusion speed, realizes real-time 480x480, 576x576, and 768x768 24FPS video encryption using Intel Core i5-1135G7, Intel Core i7-8700, and Intel Xeon Gold 6226R, respectively. The statistical and security analysis prove that the deployed cryptosystems have outstanding statistical properties, can resist attacks, channel noise, and data loss. Compared with existing works, to the best of our knowledge, the proposed strategy achieves the fastest encryption speed, and realizes the first real-time chaotic video encryption that reaches the security level of image encryption. In addition, it is suitable for many confusion, diffusion algorithms and can be easily deployed with both hardware and software.
[[2302.07400] Score-based Diffusion Models in Function Space](http://arxiv.org/abs/2302.07400) #diffusion
Diffusion models have recently emerged as a powerful framework for generative modeling. They consist of a forward process that perturbs input data with Gaussian white noise and a reverse process that learns a score function to generate samples by denoising. Despite their tremendous success, they are mostly formulated on finite-dimensional spaces, e.g. Euclidean, limiting their applications to many domains where the data has a functional form such as in scientific computing and 3D geometric data analysis. In this work, we introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space. In DDOs, the forward process perturbs input functions gradually using a Gaussian process. The generative process is formulated by integrating a function-valued Langevin dynamic. Our approach requires an appropriate notion of the score for the perturbed data distribution, which we obtain by generalizing denoising score matching to function spaces that can be infinite-dimensional. We show that the corresponding discretized algorithm generates accurate samples at a fixed cost that is independent of the data resolution. We theoretically and numerically verify the applicability of our approach on a set of problems, including generating solutions to the Navier-Stokes equation viewed as the push-forward distribution of forcings from a Gaussian Random Field (GRF).
[[2302.07405] Unsupervised physics-informed neural network in reaction-diffusion biology models (Ulcerative colitis and Crohn's disease cases) A preliminary study](http://arxiv.org/abs/2302.07405) #diffusion
We propose to explore the potential of physics-informed neural networks (PINNs) in solving a class of partial differential equations (PDEs) used to model the propagation of chronic inflammatory bowel diseases, such as Crohn's disease and ulcerative colitis. An unsupervised approach was privileged during the deep neural network training. Given the complexity of the underlying biological system, characterized by intricate feedback loops and limited availability of high-quality data, the aim of this study is to explore the potential of PINNs in solving PDEs. In addition to providing this exploratory assessment, we also aim to emphasize the principles of reproducibility and transparency in our approach, with a specific focus on ensuring the robustness and generalizability through the use of artificial intelligence. We will quantify the relevance of the PINN method with several linear and non-linear PDEs in relation to biology. However, it is important to note that the final solution is dependent on the initial conditions, chosen boundary conditions, and neural network architectures.
[[2302.07865] Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation](http://arxiv.org/abs/2302.07865) #diffusion
Distribution shifts are a major source of failure of deployed machine learning models. However, evaluating a model's reliability under distribution shifts can be challenging, especially since it may be difficult to acquire counterfactual examples that exhibit a specified shift. In this work, we introduce dataset interfaces: a framework which allows users to scalably synthesize such counterfactual examples from a given dataset. Specifically, we represent each class from the input dataset as a custom token within the text space of a text-to-image diffusion model. By incorporating these tokens into natural language prompts, we can then generate instantiations of objects in that dataset under desired distribution shifts. We demonstrate how applying our framework to the ImageNet dataset enables us to study model behavior across a diverse array of shifts, including variations in background, lighting, and attributes of the objects themselves. Code available at https://github.com/MadryLab/dataset-interfaces.