[[2301.09254] Learning to Linearize Deep Neural Networks for Secure and Efficient Private Inference](http://arxiv.org/abs/2301.09254) #secure
The large number of ReLU non-linearity operations in existing deep neural networks makes them ill-suited for latency-efficient private inference (PI). Existing techniques to reduce ReLU operations often involve manual effort and sacrifice significant accuracy. In this paper, we first present a novel measure of non-linearity layers' ReLU sensitivity, enabling mitigation of the time-consuming manual efforts in identifying the same. Based on this sensitivity, we then present SENet, a three-stage training method that for a given ReLU budget, automatically assigns per-layer ReLU counts, decides the ReLU locations for each layer's activation map, and trains a model with significantly fewer ReLUs to potentially yield latency and communication efficient PI. Experimental evaluations with multiple models on various datasets show SENet's superior performance both in terms of reduced ReLUs and improved classification accuracy compared to existing alternatives. In particular, SENet can yield models that require up to ~2x fewer ReLUs while yielding similar accuracy. For a similar ReLU budget SENet can yield models with ~2.32% improved classification accuracy, evaluated on CIFAR-100.
[[2301.08918] Is Signed Message Essential for Graph Neural Networks?](http://arxiv.org/abs/2301.08918) #secure
Message-passing Graph Neural Networks (GNNs), which collect information from adjacent nodes, achieve satisfying results on homophilic graphs. However, their performances are dismal in heterophilous graphs, and many researchers have proposed a plethora of schemes to solve this problem. Especially, flipping the sign of edges is rooted in a strong theoretical foundation, and attains significant performance enhancements. Nonetheless, previous analyses assume a binary class scenario and they may suffer from confined applicability. This paper extends the prior understandings to multi-class scenarios and points out two drawbacks: (1) the sign of multi-hop neighbors depends on the message propagation paths and may incur inconsistency, (2) it also increases the prediction uncertainty (e.g., conflict evidence) which can impede the stability of the algorithm. Based on the theoretical understanding, we introduce a novel strategy that is applicable to multi-class graphs. The proposed scheme combines confidence calibration to secure robustness while reducing uncertainty. We show the efficacy of our theorem through extensive experiments on six benchmark graph datasets.
[[2301.09376] Crowd3D: Towards Hundreds of People Reconstruction from a Single Image](http://arxiv.org/abs/2301.09376) #security
Image-based multi-person reconstruction in wide-field large scenes is critical for crowd analysis and security alert. However, existing methods cannot deal with large scenes containing hundreds of people, which encounter the challenges of large number of people, large variations in human scale, and complex spatial distribution. In this paper, we propose Crowd3D, the first framework to reconstruct the 3D poses, shapes and locations of hundreds of people with global consistency from a single large-scene image. The core of our approach is to convert the problem of complex crowd localization into pixel localization with the help of our newly defined concept, Human-scene Virtual Interaction Point (HVIP). To reconstruct the crowd with global consistency, we propose a progressive reconstruction network based on HVIP by pre-estimating a scene-level camera and a ground plane. To deal with a large number of persons and various human sizes, we also design an adaptive human-centric cropping scheme. Besides, we contribute a benchmark dataset, LargeCrowd, for crowd reconstruction in a large scene. Experimental results demonstrate the effectiveness of the proposed method. The code and datasets will be made public.
[[2301.08806] TxT: Real-time Transaction Encapsulation for Ethereum Smart Contracts](http://arxiv.org/abs/2301.08806) #security
Ethereum is a permissionless blockchain ecosystem that supports execution of smart contracts, the key enablers of decentralized finance (DeFi) and non-fungible tokens (NFT). However, the expressiveness of Ethereum smart contracts is a double-edged sword: while it enables blockchain programmability, it also introduces security vulnerabilities, i.e., the exploitable discrepancies between expected and actual behaviors of the contract code. To address these discrepancies and increase the vulnerability coverage, we propose a new smart contract security testing approach called transaction encapsulation. The core idea lies in the local execution of transactions on a fully-synchronized yet isolated Ethereum node, which creates a preview of outcomes of transaction sequences on the current state of blockchain. This approach poses a critical technical challenge -- the well-known time-of-check/time-of-use (TOCTOU) problem, i.e., the assurance that the final transactions will exhibit the same execution paths as the encapsulated test transactions. In this work, we determine the exact conditions for guaranteed execution path replicability of the tested transactions, and implement a transaction testing tool, TxT, which reveals the actual outcomes of Ethereum transactions. To ensure the correctness of testing, TxT deterministically verifies whether a given sequence of transactions ensues an identical execution path on the current state of blockchain. We analyze over 1.3 billion Ethereum transactions and determine that 96.5% of them can be verified by TxT. We further show that TxT successfully reveals the suspicious behaviors associated with 31 out of 37 vulnerabilities (83.8% coverage) in the smart contract weakness classification (SWC) registry. In comparison, the vulnerability coverage of all the existing defense approaches combined only reaches 40.5%.
[[2301.09207] VeraSel: Verifiable Random Selection for Mixnets Construction](http://arxiv.org/abs/2301.09207) #security
The security and performance of Mixnets depends on the trustworthiness of the Mixnodes in the network. The challenge is to limit the adversary's influence on which Mixnodes operate in the network. A trusted party (such as the Mixnet operator) may ensure this, however, it is a single point of failure in the event of corruption or coercion. Therefore, we study the problem of how to select a subset of Mixnodes in a distributed way for Mixnet construction. We present VeraSel, a scheme that enables Mixnodes to be chosen according to their weights in a distributed, unbiased, and verifiable fashion using Verifiable Random Functions (VRFs). It is shown that VeraSel enables any party to learn and verify which nodes has been selected based on the commitments and proofs generated by each Mixnode with VRF.
[[2301.09320] A Framework for Evaluating the Impact of Food Security Scenarios](http://arxiv.org/abs/2301.09320) #security
This study proposes an approach for predicting the impacts of scenarios on food security and demonstrates its application in a case study. The approach involves two main steps: (1) scenario definition, in which the end user specifies the assumptions and impacts of the scenario using a scenario template, and (2) scenario evaluation, in which a Vector Autoregression (VAR) model is used in combination with Monte Carlo simulation to generate predictions for the impacts of the scenario based on the defined assumptions and impacts. The case study is based on a proprietary time series food security database created using data from the Food and Agriculture Organization of the United Nations (FAOSTAT), the World Bank, and the United States Department of Agriculture (USDA). The database contains a wide range of data on various indicators of food security, such as production, trade, consumption, prices, availability, access, and nutritional value. The results show that the proposed approach can be used to predict the potential impacts of scenarios on food security and that the proprietary time series food security database can be used to support this approach. The study provides specific insights on how this approach can inform decision-making processes related to food security such as food prices and availability in the case study region.
[[2301.09255] Combined Use of Federated Learning and Image Encryption for Privacy-Preserving Image Classification with Vision Transformer](http://arxiv.org/abs/2301.09255) #privacy
In recent years, privacy-preserving methods for deep learning have become an urgent problem. Accordingly, we propose the combined use of federated learning (FL) and encrypted images for privacy-preserving image classification under the use of the vision transformer (ViT). The proposed method allows us not only to train models over multiple participants without directly sharing their raw data but to also protect the privacy of test (query) images for the first time. In addition, it can also maintain the same accuracy as normally trained models. In an experiment, the proposed method was demonstrated to well work without any performance degradation on the CIFAR-10 and CIFAR-100 datasets.
[[2301.09112] Differentially Private Natural Language Models: Recent Advances and Future Directions](http://arxiv.org/abs/2301.09112) #privacy
Recent developments in deep learning have led to great success in various natural language processing (NLP) tasks. However, these applications may involve data that contain sensitive information. Therefore, how to achieve good performance while also protect privacy of sensitive data is a crucial challenge in NLP. To preserve privacy, Differential Privacy (DP), which can prevent reconstruction attacks and protect against potential side knowledge, is becoming a de facto technique for private data analysis. In recent years, NLP in DP models (DP-NLP) has been studied from different perspectives, which deserves a comprehensive review. In this paper, we provide the first systematic review of recent advances on DP deep learning models in NLP. In particular, we first discuss some differences and additional challenges of DP-NLP compared with the standard DP deep learning. Then we investigate some existing work on DP-NLP and present its recent developments from two aspects: gradient perturbation based methods and embedding vector perturbation based methods. We also discuss some challenges and future directions of this topic.
[[2301.08778] Split Ways: Privacy-Preserving Training of Encrypted Data Using Split Learning](http://arxiv.org/abs/2301.08778) #privacy
Split Learning (SL) is a new collaborative learning technique that allows participants, e.g. a client and a server, to train machine learning models without the client sharing raw data. In this setting, the client initially applies its part of the machine learning model on the raw data to generate activation maps and then sends them to the server to continue the training process. Previous works in the field demonstrated that reconstructing activation maps could result in privacy leakage of client data. In addition to that, existing mitigation techniques that overcome the privacy leakage of SL prove to be significantly worse in terms of accuracy. In this paper, we improve upon previous works by constructing a protocol based on U-shaped SL that can operate on homomorphically encrypted data. More precisely, in our approach, the client applies Homomorphic Encryption (HE) on the activation maps before sending them to the server, thus protecting user privacy. This is an important improvement that reduces privacy leakage in comparison to other SL-based works. Finally, our results show that, with the optimum set of parameters, training with HE data in the U-shaped SL setting only reduces accuracy by 2.65% compared to training on plaintext. In addition, raw training data privacy is preserved.
[[2301.09041] Exploiting Out-of-band Motion Sensor Data to De-anonymize Virtual Reality Users](http://arxiv.org/abs/2301.09041) #privacy
Virtual Reality (VR) is an exciting new consumer technology which offers an immersive audio-visual experience to users through which they can navigate and interact with a digitally represented 3D space (i.e., a virtual world) using a headset device. By (visually) transporting users from the real or physical world to exciting and realistic virtual spaces, VR systems can enable true-to-life and more interactive versions of traditional applications such as gaming, remote conferencing, social networking and virtual tourism. However, as with any new consumer technology, VR applications also present significant user-privacy challenges. This paper studies a new type of privacy attack targeting VR users by connecting their activities visible in the virtual world (enabled by some VR application/service) to their physical state sensed in the real world. Specifically, this paper analyzes the feasibility of carrying out a de-anonymization or identification attack on VR users by correlating visually observed movements of users' avatars in the virtual world with some auxiliary data (e.g., motion sensor data from mobile/wearable devices held by users) representing their context/state in the physical world. To enable this attack, this paper proposes a novel framework which first employs a learning-based activity classification approach to translate the disparate visual movement data and motion sensor data into an activity-vector to ease comparison, followed by a filtering and identity ranking phase outputting an ordered list of potential identities corresponding to the target visual movement data. Extensive empirical evaluation of the proposed framework, under a comprehensive set of experimental settings, demonstrates the feasibility of such a de-anonymization attack.
[[2301.09378] Citadel: Self-Sovereign Identities on Dusk Network](http://arxiv.org/abs/2301.09378) #privacy
The amount of sensitive information that service providers handle about their users has become a concerning fact in many use cases, where users have no other option but to trust that those companies will not misuse their personal information. To solve that, Self-Sovereign Identity (SSI) systems have become a hot topic of research in recent years: SSI systems allow users to manage their identities transparently. Recent solutions represent the rights of users to use services as Non-Fungible Tokens (NFTs) stored on Blockchains, and users prove possession of these rights using Zero-Knowledge Proofs (ZKPs). However, even when ZKPs do not leak any information about the rights, the NFTs are stored as public values linked to known accounts, and thus, they can be traced. In this paper, we design a native privacy-preserving NFT model for the Dusk Network Blockchain, and on top of it, we deploy Citadel: our novel full-privacy-preserving SSI system, where the rights of the users are privately stored on the Dusk Network Blockchain, and users can prove their ownership in a fully private manner.
[[2301.08844] Statistical Theory of Differentially Private Marginal-based Data Synthesis Algorithms](http://arxiv.org/abs/2301.08844) #privacy
Marginal-based methods achieve promising performance in the synthetic data competition hosted by the National Institute of Standards and Technology (NIST). To deal with high-dimensional data, the distribution of synthetic data is represented by a probabilistic graphical model (e.g., a Bayesian network), while the raw data distribution is approximated by a collection of low-dimensional marginals. Differential privacy (DP) is guaranteed by introducing random noise to each low-dimensional marginal distribution. Despite its promising performance in practice, the statistical properties of marginal-based methods are rarely studied in the literature. In this paper, we study DP data synthesis algorithms based on Bayesian networks (BN) from a statistical perspective. We establish a rigorous accuracy guarantee for BN-based algorithms, where the errors are measured by the total variation (TV) distance or the $L^2$ distance. Related to downstream machine learning tasks, an upper bound for the utility error of the DP synthetic data is also derived. To complete the picture, we establish a lower bound for TV accuracy that holds for every $\epsilon$-DP synthetic data generator.
[[2301.09496] ECGAN: Self-supervised generative adversarial network for electrocardiography](http://arxiv.org/abs/2301.09496) #privacy
High-quality synthetic data can support the development of effective predictive models for biomedical tasks, especially in rare diseases or when subject to compelling privacy constraints. These limitations, for instance, negatively impact open access to electrocardiography datasets about arrhythmias. This work introduces a self-supervised approach to the generation of synthetic electrocardiography time series which is shown to promote morphological plausibility. Our model (ECGAN) allows conditioning the generative process for specific rhythm abnormalities, enhancing synchronization and diversity across samples with respect to literature models. A dedicated sample quality assessment framework is also defined, leveraging arrhythmia classifiers. The empirical results highlight a substantial improvement against state-of-the-art generative models for sequences and audio synthesis.
[[2301.09003] Blacks is to Anger as Whites is to Joy? Understanding Latent Affective Bias in Large Pre-trained Neural Language Models](http://arxiv.org/abs/2301.09003) #protect
Groundbreaking inventions and highly significant performance improvements in deep learning based Natural Language Processing are witnessed through the development of transformer based large Pre-trained Language Models (PLMs). The wide availability of unlabeled data within human generated data deluge along with self-supervised learning strategy helps to accelerate the success of large PLMs in language generation, language understanding, etc. But at the same time, latent historical bias/unfairness in human minds towards a particular gender, race, etc., encoded unintentionally/intentionally into the corpora harms and questions the utility and efficacy of large PLMs in many real-world applications, particularly for the protected groups. In this paper, we present an extensive investigation towards understanding the existence of "Affective Bias" in large PLMs to unveil any biased association of emotions such as anger, fear, joy, etc., towards a particular gender, race or religion with respect to the downstream task of textual emotion detection. We conduct our exploration of affective bias from the very initial stage of corpus level affective bias analysis by searching for imbalanced distribution of affective words within a domain, in large scale corpora that are used to pre-train and fine-tune PLMs. Later, to quantify affective bias in model predictions, we perform an extensive set of class-based and intensity-based evaluations using various bias evaluation corpora. Our results show the existence of statistically significant affective bias in the PLM based emotion detection systems, indicating biased association of certain emotions towards a particular gender, race, and religion.
[[2301.08751] Towards Understanding How Self-training Tolerates Data Backdoor Poisoning](http://arxiv.org/abs/2301.08751) #defense
Recent studies on backdoor attacks in model training have shown that polluting a small portion of training data is sufficient to produce incorrect manipulated predictions on poisoned test-time data while maintaining high clean accuracy in downstream tasks. The stealthiness of backdoor attacks has imposed tremendous defense challenges in today's machine learning paradigm. In this paper, we explore the potential of self-training via additional unlabeled data for mitigating backdoor attacks. We begin by making a pilot study to show that vanilla self-training is not effective in backdoor mitigation. Spurred by that, we propose to defend the backdoor attacks by leveraging strong but proper data augmentations in the self-training pseudo-labeling stage. We find that the new self-training regime help in defending against backdoor attacks to a great extent. Its effectiveness is demonstrated through experiments for different backdoor triggers on CIFAR-10 and a combination of CIFAR-10 with an additional unlabeled 500K TinyImages dataset. Finally, we explore the direction of combining self-supervised representation learning with self-training for further improvement in backdoor defense.
[[2301.09508] BayBFed: Bayesian Backdoor Defense for Federated Learning](http://arxiv.org/abs/2301.09508) #defense
Federated learning (FL) allows participants to jointly train a machine learning model without sharing their private data with others. However, FL is vulnerable to poisoning attacks such as backdoor attacks. Consequently, a variety of defenses have recently been proposed, which have primarily utilized intermediary states of the global model (i.e., logits) or distance of the local models (i.e., L2-norm) from the global model to detect malicious backdoors. However, as these approaches directly operate on client updates, their effectiveness depends on factors such as clients' data distribution or the adversary's attack strategies. In this paper, we introduce a novel and more generic backdoor defense framework, called BayBFed, which proposes to utilize probability distributions over client updates to detect malicious updates in FL: it computes a probabilistic measure over the clients' updates to keep track of any adjustments made in the updates, and uses a novel detection algorithm that can leverage this probabilistic measure to efficiently detect and filter out malicious updates. Thus, it overcomes the shortcomings of previous approaches that arise due to the direct usage of client updates; as our probabilistic measure will include all aspects of the local client training strategies. BayBFed utilizes two Bayesian Non-Parametric extensions: (i) a Hierarchical Beta-Bernoulli process to draw a probabilistic measure given the clients' updates, and (ii) an adaptation of the Chinese Restaurant Process (CRP), referred by us as CRP-Jensen, which leverages this probabilistic measure to detect and filter out malicious updates. We extensively evaluate our defense approach on five benchmark datasets: CIFAR10, Reddit, IoT intrusion detection, MNIST, and FMNIST, and show that it can effectively detect and eliminate malicious updates in FL without deteriorating the benign performance of the global model.
[[2301.09542] Improving Presentation Attack Detection for ID Cards on Remote Verification Systems](http://arxiv.org/abs/2301.09542) #attack
In this paper, an updated two-stage, end-to-end Presentation Attack Detection method for remote biometric verification systems of ID cards, based on MobileNetV2, is presented. Several presentation attack species such as printed, display, composite (based on cropped and spliced areas), plastic (PVC), and synthetic ID card images using different capture sources are used. This proposal was developed using a database consisting of 190.000 real case Chilean ID card images with the support of a third-party company. Also, a new framework called PyPAD, used to estimate multi-class metrics compliant with the ISO/IEC 30107-3 standard was developed, and will be made available for research purposes. Our method is trained on two convolutional neural networks separately, reaching BPCER\textsubscript{100} scores on ID cards attacks of 1.69\% and 2.36\% respectively. The two-stage method using both models together can reach a BPCER\textsubscript{100} score of 0.92\%.
[[2301.08824] An Automated Vulnerability Detection Framework for Smart Contracts](http://arxiv.org/abs/2301.08824) #attack
With the increase of the adoption of blockchain technology in providing decentralized solutions to various problems, smart contracts have become more popular to the point that billions of US Dollars are currently exchanged every day through such technology. Meanwhile, various vulnerabilities in smart contracts have been exploited by attackers to steal cryptocurrencies worth millions of dollars. The automatic detection of smart contract vulnerabilities therefore is an essential research problem. Existing solutions to this problem particularly rely on human experts to define features or different rules to detect vulnerabilities. However, this often causes many vulnerabilities to be ignored, and they are inefficient in detecting new vulnerabilities. In this study, to overcome such challenges, we propose a framework to automatically detect vulnerabilities in smart contracts on the blockchain. More specifically, first, we utilize novel feature vector generation techniques from bytecode of smart contract since the source code of smart contracts are rarely available in public. Next, the collected vectors are fed into our novel metric learning-based deep neural network(DNN) to get the detection result. We conduct comprehensive experiments on large-scale benchmarks, and the quantitative results demonstrate the effectiveness and efficiency of our approach.
[[2301.09069] Provable Unrestricted Adversarial Training without Compromise with Generalizability](http://arxiv.org/abs/2301.09069) #attack
Adversarial training (AT) is widely considered as the most promising strategy to defend against adversarial attacks and has drawn increasing interest from researchers. However, the existing AT methods still suffer from two challenges. First, they are unable to handle unrestricted adversarial examples (UAEs), which are built from scratch, as opposed to restricted adversarial examples (RAEs), which are created by adding perturbations bound by an $l_p$ norm to observed examples. Second, the existing AT methods often achieve adversarial robustness at the expense of standard generalizability (i.e., the accuracy on natural examples) because they make a tradeoff between them. To overcome these challenges, we propose a unique viewpoint that understands UAEs as imperceptibly perturbed unobserved examples. Also, we find that the tradeoff results from the separation of the distributions of adversarial examples and natural examples. Based on these ideas, we propose a novel AT approach called Provable Unrestricted Adversarial Training (PUAT), which can provide a target classifier with comprehensive adversarial robustness against both UAE and RAE, and simultaneously improve its standard generalizability. Particularly, PUAT utilizes partially labeled data to achieve effective UAE generation by accurately capturing the natural data distribution through a novel augmented triple-GAN. At the same time, PUAT extends the traditional AT by introducing the supervised loss of the target classifier into the adversarial loss and achieves the alignment between the UAE distribution, the natural data distribution, and the distribution learned by the classifier, with the collaboration of the augmented triple-GAN. Finally, the solid theoretical analysis and extensive experiments conducted on widely-used benchmarks demonstrate the superiority of PUAT.
[[2301.08849] CADA-GAN: Context-Aware GAN with Data Augmentation](http://arxiv.org/abs/2301.08849) #robust
Current child face generators are restricted by the limited size of the available datasets. In addition, feature selection can prove to be a significant challenge, especially due to the large amount of features that need to be trained for. To manage these problems, we proposed CADA-GAN, a \textbf{C}ontext-\textbf{A}ware GAN that allows optimal feature extraction, with added robustness from additional \textbf{D}ata \textbf{A}ugmentation. CADA-GAN is adapted from the popular StyleGAN2-Ada model, with attention on augmentation and segmentation of the parent images. The model has the lowest \textit{Mean Squared Error Loss} (MSEloss) on latent feature representations and the generated child image is robust compared with the one that generated from baseline models.
[[2301.08898] Recurrent Contour-based Instance Segmentation with Progressive Learning](http://arxiv.org/abs/2301.08898) #robust
Contour-based instance segmentation has been actively studied, thanks to its flexibility and elegance in processing visual objects within complex backgrounds. In this work, we propose a novel deep network architecture, i.e., PolySnake, for contour-based instance segmentation. Motivated by the classic Snake algorithm, the proposed PolySnake achieves superior and robust segmentation performance with an iterative and progressive contour refinement strategy. Technically, PolySnake introduces a recurrent update operator to estimate the object contour iteratively. It maintains a single estimate of the contour that is progressively deformed toward the object boundary. At each iteration, PolySnake builds a semantic-rich representation for the current contour and feeds it to the recurrent operator for further contour adjustment. Through the iterative refinements, the contour finally progressively converges to a stable status that tightly encloses the object instance. Moreover, with a compact design of the recurrent architecture, we ensure the running efficiency under multiple iterations. Extensive experiments are conducted to validate the merits of our method, and the results demonstrate that the proposed PolySnake outperforms the existing contour-based instance segmentation methods on several prevalent instance segmentation benchmarks. The codes and models are available at https://github.com/fh2019ustc/PolySnake.
[[2301.09063] DASTSiam: Spatio-Temporal Fusion and Discriminative Augmentation for Improved Siamese Tracking](http://arxiv.org/abs/2301.09063) #robust
Tracking tasks based on deep neural networks have greatly improved with the emergence of Siamese trackers. However, the appearance of targets often changes during tracking, which can reduce the robustness of the tracker when facing challenges such as aspect ratio change, occlusion, and scale variation. In addition, cluttered backgrounds can lead to multiple high response points in the response map, leading to incorrect target positioning. In this paper, we introduce two transformer-based modules to improve Siamese tracking called DASTSiam: the spatio-temporal (ST) fusion module and the Discriminative Augmentation (DA) module. The ST module uses cross-attention based accumulation of historical cues to improve robustness against object appearance changes, while the DA module associates semantic information between the template and search region to improve target discrimination. Moreover, Modifying the label assignment of anchors also improves the reliability of the object location. Our modules can be used with all Siamese trackers and show improved performance on several public datasets through comparative and ablation experiments.
[[2301.09253] CircNet: Meshing 3D Point Clouds with Circumcenter Detection](http://arxiv.org/abs/2301.09253) #robust
Reconstructing 3D point clouds into triangle meshes is a key problem in computational geometry and surface reconstruction. Point cloud triangulation solves this problem by providing edge information to the input points. Since no vertex interpolation is involved, it is beneficial to preserve sharp details on the surface. Taking advantage of learning-based techniques in triangulation, existing methods enumerate the complete combinations of candidate triangles, which is both complex and inefficient. In this paper, we leverage the duality between a triangle and its circumcenter, and introduce a deep neural network that detects the circumcenters to achieve point cloud triangulation. Specifically, we introduce multiple anchor priors to divide the neighborhood space of each point. The neural network then learns to predict the presences and locations of circumcenters under the guidance of those anchors. We extract the triangles dual to the detected circumcenters to form a primitive mesh, from which an edge-manifold mesh is produced via simple post-processing. Unlike existing learning-based triangulation methods, the proposed method bypasses an exhaustive enumeration of triangle combinations and local surface parameterization. We validate the efficiency, generalization, and robustness of our method on prominent datasets of both watertight and open surfaces. The code and trained models are provided at https://github.com/Ruitao-L/CircNet.
[[2301.09416] Towards Robust Video Instance Segmentation with Temporal-Aware Transformer](http://arxiv.org/abs/2301.09416) #robust
Most existing transformer based video instance segmentation methods extract per frame features independently, hence it is challenging to solve the appearance deformation problem. In this paper, we observe the temporal information is important as well and we propose TAFormer to aggregate spatio-temporal features both in transformer encoder and decoder. Specifically, in transformer encoder, we propose a novel spatio-temporal joint multi-scale deformable attention module which dynamically integrates the spatial and temporal information to obtain enriched spatio-temporal features. In transformer decoder, we introduce a temporal self-attention module to enhance the frame level box queries with the temporal relation. Moreover, TAFormer adopts an instance level contrastive loss to increase the discriminability of instance query embeddings. Therefore the tracking error caused by visually similar instances can be decreased. Experimental results show that TAFormer effectively leverages the spatial and temporal information to obtain context-aware feature representation and outperforms state-of-the-art methods.
[[2301.08745] Is ChatGPT A Good Translator? A Preliminary Study](http://arxiv.org/abs/2301.08745) #robust
This report provides a preliminary evaluation of ChatGPT for machine translation, including translation prompt, multilingual translation, and translation robustness. We adopt the prompts advised by ChatGPT to trigger its translation ability and find that the candidate prompts generally work well and show minor performance differences. By evaluating on a number of benchmark test sets, we find that ChatGPT performs competitively with commercial translation products (e.g., Google Translate) on high-resource European languages but lags behind significantly on lowresource or distant languages. As for the translation robustness, ChatGPT does not perform as well as the commercial systems on biomedical abstracts or Reddit comments but is potentially a good translator for spoken language. Scripts and data: https://github.com/wxjiao/Is-ChatGPT-A-Good-Translator
[[2301.08881] Dr](http://arxiv.org/abs/2301.08881) #robust
Neural text-to-SQL models have achieved remarkable performance in translating natural language questions into SQL queries. However, recent studies reveal that text-to-SQL models are vulnerable to task-specific perturbations. Previous curated robustness test sets usually focus on individual phenomena. In this paper, we propose a comprehensive robustness benchmark based on Spider, a cross-domain text-to-SQL benchmark, to diagnose the model robustness. We design 17 perturbations on databases, natural language questions, and SQL queries to measure the robustness from different angles. In order to collect more diversified natural question perturbations, we utilize large pretrained language models (PLMs) to simulate human behaviors in creating natural questions. We conduct a diagnostic study of the state-of-the-art models on the robustness set. Experimental results reveal that even the most robust model suffers from a 14.0% performance drop overall and a 50.7% performance drop on the most challenging perturbation. We also present a breakdown analysis regarding text-to-SQL model designs and provide insights for improving model robustness.
[[2301.08833] Bayesian Hierarchical Models for Counterfactual Estimation](http://arxiv.org/abs/2301.08833) #robust
Counterfactual explanations utilize feature perturbations to analyze the outcome of an original decision and recommend an actionable recourse. We argue that it is beneficial to provide several alternative explanations rather than a single point solution and propose a probabilistic paradigm to estimate a diverse set of counterfactuals. Specifically, we treat the perturbations as random variables endowed with prior distribution functions. This allows sampling multiple counterfactuals from the posterior density, with the added benefit of incorporating inductive biases, preserving domain specific constraints and quantifying uncertainty in estimates. More importantly, we leverage Bayesian hierarchical modeling to share information across different subgroups of a population, which can both improve robustness and measure fairness. A gradient based sampler with superior convergence characteristics efficiently computes the posterior samples. Experiments across several datasets demonstrate that the counterfactuals estimated using our approach are valid, sparse, diverse and feasible.
[[2301.08842] Limitations of Piecewise Linearity for Efficient Robustness Certification](http://arxiv.org/abs/2301.08842) #robust
Certified defenses against small-norm adversarial examples have received growing attention in recent years; though certified accuracies of state-of-the-art methods remain far below their non-robust counterparts, despite the fact that benchmark datasets have been shown to be well-separated at far larger radii than the literature generally attempts to certify. In this work, we offer insights that identify potential factors in this performance gap. Specifically, our analysis reveals that piecewise linearity imposes fundamental limitations on the tightness of leading certification techniques. These limitations are felt in practical terms as a greater need for capacity in models hoped to be certified efficiently. Moreover, this is in addition to the capacity necessary to learn a robust boundary, studied in prior work. However, we argue that addressing the limitations of piecewise linearity through scaling up model capacity may give rise to potential difficulties -- particularly regarding robust generalization -- therefore, we conclude by suggesting that developing smooth activation functions may be the way forward for advancing the performance of certified neural networks.
[[2301.09030] Condition monitoring and anomaly detection in cyber-physical systems](http://arxiv.org/abs/2301.09030) #robust
The modern industrial environment is equipping myriads of smart manufacturing machines where the state of each device can be monitored continuously. Such monitoring can help identify possible future failures and develop a cost-effective maintenance plan. However, it is a daunting task to perform early detection with low false positives and negatives from the huge volume of collected data. This requires developing a holistic machine learning framework to address the issues in condition monitoring of high-priority components and develop efficient techniques to detect anomalies that can detect and possibly localize the faulty components. This paper presents a comparative analysis of recent machine learning approaches for robust, cost-effective anomaly detection in cyber-physical systems. While detection has been extensively studied, very few researchers have analyzed the localization of the anomalies. We show that supervised learning outperforms unsupervised algorithms. For supervised cases, we achieve near-perfect accuracy of 98 percent (specifically for tree-based algorithms). In contrast, the best-case accuracy in the unsupervised cases was 63 percent :the area under the receiver operating characteristic curve (AUC) exhibits similar outcomes as an additional metric.
[[2301.09210] Debiasing the Cloze Task in Sequential Recommendation with Bidirectional Transformers](http://arxiv.org/abs/2301.09210) #robust
Bidirectional Transformer architectures are state-of-the-art sequential recommendation models that use a bi-directional representation capacity based on the Cloze task, a.k.a. Masked Language Modeling. The latter aims to predict randomly masked items within the sequence. Because they assume that the true interacted item is the most relevant one, an exposure bias results, where non-interacted items with low exposure propensities are assumed to be irrelevant. The most common approach to mitigating exposure bias in recommendation has been Inverse Propensity Scoring (IPS), which consists of down-weighting the interacted predictions in the loss function in proportion to their propensities of exposure, yielding a theoretically unbiased learning. In this work, we argue and prove that IPS does not extend to sequential recommendation because it fails to account for the temporal nature of the problem. We then propose a novel propensity scoring mechanism, which can theoretically debias the Cloze task in sequential recommendation. Finally we empirically demonstrate the debiasing capabilities of our proposed approach and its robustness to the severity of exposure bias.
[[2301.09521] WDC Products: A Multi-Dimensional Entity Matching Benchmark](http://arxiv.org/abs/2301.09521) #robust
The difficulty of an entity matching task depends on a combination of multiple factors such as the amount of corner-case pairs, the fraction of entities in the test set that have not been seen during training, and the size of the development set. Current entity matching benchmarks usually represent single points in the space along such dimensions or they provide for the evaluation of matching methods along a single dimension, for instance the amount of training data. This paper presents WDC Products, an entity matching benchmark which provides for the systematic evaluation of matching systems along combinations of three dimensions while relying on real-word data. The three dimensions are (i) amount of corner-cases (ii) generalization to unseen entities, and (iii) development set size. Generalization to unseen entities is a dimension not covered by any of the existing benchmarks yet but is crucial for evaluating the robustness of entity matching systems. WDC Products is based on heterogeneous product data from thousands of e-shops which mark-up products offers using schema.org annotations. Instead of learning how to match entity pairs, entity matching can also be formulated as a multi-class classification task that requires the matcher to recognize individual entities. WDC Products is the first benchmark that provides a pair-wise and a multi-class formulation of the same tasks and thus allows to directly compare the two alternatives. We evaluate WDC Products using several state-of-the-art matching systems, including Ditto, HierGAT, and R-SupCon. The evaluation shows that all matching systems struggle with unseen entities to varying degrees. It also shows that some systems are more training data efficient than others.
[[2301.09055] Resource-constrained FPGA Design for Satellite Component Feature Extraction](http://arxiv.org/abs/2301.09055) #extraction
The effective use of computer vision and machine learning for on-orbit applications has been hampered by limited computing capabilities, and therefore limited performance. While embedded systems utilizing ARM processors have been shown to meet acceptable but low performance standards, the recent availability of larger space-grade field programmable gate arrays (FPGAs) show potential to exceed the performance of microcomputer systems. This work proposes use of neural network-based object detection algorithm that can be deployed on a comparably resource-constrained FPGA to automatically detect components of non-cooperative, satellites on orbit. Hardware-in-the-loop experiments were performed on the ORION Maneuver Kinematics Simulator at Florida Tech to compare the performance of the new model deployed on a small, resource-constrained FPGA to an equivalent algorithm on a microcomputer system. Results show the FPGA implementation increases the throughput and decreases latency while maintaining comparable accuracy. These findings suggest future missions should consider deploying computer vision algorithms on space-grade FPGAs.
[[2301.09175] Ensemble Transfer Learning for Multilingual Coreference Resolution](http://arxiv.org/abs/2301.09175) #extraction
Entity coreference resolution is an important research problem with many applications, including information extraction and question answering. Coreference resolution for English has been studied extensively. However, there is relatively little work for other languages. A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data. To overcome this challenge, we design a simple but effective ensemble-based framework that combines various transfer learning (TL) techniques. We first train several models using different TL methods. Then, during inference, we compute the unweighted average scores of the models' predictions to extract the final set of predicted clusters. Furthermore, we also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts. Leveraging the idea that the coreferential links naturally exist between anchor texts pointing to the same article, our method builds a sizeable distantly-supervised dataset for the target language that consists of tens of thousands of documents. We can pre-train a model on the pseudo-labeled dataset before finetuning it on the final target dataset. Experimental results on two benchmark datasets, OntoNotes and SemEval, confirm the effectiveness of our methods. Our best ensembles consistently outperform the baseline approach of simple training by up to 7.68% in the F1 score. These ensembles also achieve new state-of-the-art results for three languages: Arabic, Dutch, and Spanish.
[[2301.09506] OvarNet: Towards Open-vocabulary Object Attribute Recognition](http://arxiv.org/abs/2301.09506) #federate
In this paper, we consider the problem of simultaneously detecting objects and inferring their visual attributes in an image, even for those with no manual annotations provided at the training stage, resembling an open-vocabulary scenario. To achieve this goal, we make the following contributions: (i) we start with a naive two-stage approach for open-vocabulary object detection and attribute classification, termed CLIP-Attr. The candidate objects are first proposed with an offline RPN and later classified for semantic category and attributes; (ii) we combine all available datasets and train with a federated strategy to finetune the CLIP model, aligning the visual representation with attributes, additionally, we investigate the efficacy of leveraging freely available online image-caption pairs under weakly supervised learning; (iii) in pursuit of efficiency, we train a Faster-RCNN type model end-to-end with knowledge distillation, that performs class-agnostic object proposals and classification on semantic categories and attributes with classifiers generated from a text encoder; Finally, (iv) we conduct extensive experiments on VAW, MS-COCO, LSA, and OVAD datasets, and show that recognition of semantic category and attributes is complementary for visual scene understanding, i.e., jointly training object detection and attributes prediction largely outperform existing approaches that treat the two tasks independently, demonstrating strong generalization ability to novel attributes and categories.
[[2301.08869] A Communication-Efficient Adaptive Algorithm for Federated Learning under Cumulative Regret](http://arxiv.org/abs/2301.08869) #federate
We consider the problem of online stochastic optimization in a distributed setting with $M$ clients connected through a central server. We develop a distributed online learning algorithm that achieves order-optimal cumulative regret with low communication cost measured in the total number of bits transmitted over the entire learning horizon. This is in contrast to existing studies which focus on the offline measure of simple regret for learning efficiency. The holistic measure for communication cost also departs from the prevailing approach that \emph{separately} tackles the communication frequency and the number of bits in each communication round.
[[2301.08968] The Best of Both Worlds: Accurate Global and Personalized Models through Federated Learning with Data-Free Hyper-Knowledge Distillation](http://arxiv.org/abs/2301.08968) #federate
Heterogeneity of data distributed across clients limits the performance of global models trained through federated learning, especially in the settings with highly imbalanced class distributions of local datasets. In recent years, personalized federated learning (pFL) has emerged as a potential solution to the challenges presented by heterogeneous data. However, existing pFL methods typically enhance performance of local models at the expense of the global model's accuracy. We propose FedHKD (Federated Hyper-Knowledge Distillation), a novel FL algorithm in which clients rely on knowledge distillation (KD) to train local models. In particular, each client extracts and sends to the server the means of local data representations and the corresponding soft predictions -- information that we refer to as ``hyper-knowledge". The server aggregates this information and broadcasts it to the clients in support of local training. Notably, unlike other KD-based pFL methods, FedHKD does not rely on a public dataset nor it deploys a generative model at the server. We analyze convergence of FedHKD and conduct extensive experiments on visual datasets in a variety of scenarios, demonstrating that FedHKD provides significant improvement in both personalized as well as global model performance compared to state-of-the-art FL methods designed for heterogeneous data settings.
[[2301.09109] Federated Recommendation with Additive Personalization](http://arxiv.org/abs/2301.09109) #federate
With rising concerns about privacy, developing recommendation systems in a federated setting become a new paradigm to develop next-generation Internet service architecture. However, existing approaches are usually derived from a distributed recommendation framework with an additional mechanism for privacy protection, thus most of them fail to fully exploit personalization in the new context of federated recommendation settings. In this paper, we propose a novel approach called Federated Recommendation with Additive Personalization (FedRAP) to enhance recommendation by learning user embedding and the user's personal view of item embeddings. Specifically, the proposed additive personalization is to add a personalized item embedding to a sparse global item embedding aggregated from all users. Moreover, a curriculum learning mechanism has been applied for additive personalization on item embeddings by gradually increasing regularization weights to mitigate the performance degradation caused by large variances among client-specific item embeddings. A unified formulation has been proposed with a sparse regularization of global item embeddings for reducing communication overhead. Experimental results on four real-world recommendation datasets demonstrate the effectiveness of FedRAP.
[[2301.09152] Prompt Federated Learning for Weather Forecasting: Toward Foundation Models on Meteorological Data](http://arxiv.org/abs/2301.09152) #federate
To tackle the global climate challenge, it urgently needs to develop a collaborative platform for comprehensive weather forecasting on large-scale meteorological data. Despite urgency, heterogeneous meteorological sensors across countries and regions, inevitably causing multivariate heterogeneity and data exposure, become the main barrier. This paper develops a foundation model across regions capable of understanding complex meteorological data and providing weather forecasting. To relieve the data exposure concern across regions, a novel federated learning approach has been proposed to collaboratively learn a brand-new spatio-temporal Transformer-based foundation model across participants with heterogeneous meteorological data. Moreover, a novel prompt learning mechanism has been adopted to satisfy low-resourced sensors' communication and computational constraints. The effectiveness of the proposed method has been demonstrated on classical weather forecasting tasks using three meteorological datasets with multivariate time series.
[[2301.09165] Energy Prediction using Federated Learning](http://arxiv.org/abs/2301.09165) #federate
In this work, we demonstrate the viability of using federated learning to successfully predict energy consumption as well as solar production for all households within a certain network using low-power and low-space consuming embedded devices. We also demonstrate our prediction performance improving over time without the need for sharing private consumer energy data. We simulate a system with four nodes using data for one year to show this.
[[2301.09269] M22: A Communication-Efficient Algorithm for Federated Learning Inspired by Rate-Distortion](http://arxiv.org/abs/2301.09269) #federate
In federated learning (FL), the communication constraint between the remote
learners and the Parameter Server (PS) is a crucial bottleneck. For this
reason, model updates must be compressed so as to minimize the loss in accuracy
resulting from the communication constraint. This paper proposes \emph{${\bf
M}$-magnitude weighted $L_{\bf 2}$ distortion + $\bf 2$ degrees of freedom''}
(M22) algorithm, a rate-distortion inspired approach to gradient compression
for federated training of deep neural networks (DNNs). In particular, we
propose a family of distortion measures between the original gradient and the
reconstruction we referred to as
$M$-magnitude weighted $L_2$'' distortion,
and we assume that gradient updates follow an i.i.d. distribution --
generalized normal or Weibull, which have two degrees of freedom. In both the
distortion measure and the gradient, there is one free parameter for each that
can be fitted as a function of the iteration number. Given a choice of gradient
distribution and distortion measure, we design the quantizer minimizing the
expected distortion in gradient reconstruction. To measure the gradient
compression performance under a communication constraint, we define the
\emph{per-bit accuracy} as the optimal improvement in accuracy that one bit of
communication brings to the centralized model over the training period. Using
this performance measure, we systematically benchmark the choice of gradient
distribution and distortion measure. We provide substantial insights on the
role of these choices and argue that significant performance improvements can
be attained using such a rate-distortion inspired compressor.
[[2301.09357] Accelerating Fair Federated Learning: Adaptive Federated Adam](http://arxiv.org/abs/2301.09357) #federate
Federated learning is a distributed and privacy-preserving approach to train a statistical model collaboratively from decentralized data of different parties. However, when datasets of participants are not independent and identically distributed (non-IID), models trained by naive federated algorithms may be biased towards certain participants, and model performance across participants is non-uniform. This is known as the fairness problem in federated learning. In this paper, we formulate fairness-controlled federated learning as a dynamical multi-objective optimization problem to ensure fair performance across all participants. To solve the problem efficiently, we study the convergence and bias of Adam as the server optimizer in federated learning, and propose Adaptive Federated Adam (AdaFedAdam) to accelerate fair federated learning with alleviated bias. We validated the effectiveness, Pareto optimality and robustness of AdaFedAdam in numerical experiments and show that AdaFedAdam outperforms existing algorithms, providing better convergence and fairness properties of the federated scheme.
[[2301.08837] New Insights into Multi-Calibration](http://arxiv.org/abs/2301.08837) #fair
We identify a novel connection between the recent literature on multi-group fairness for prediction algorithms and well-established notions of graph regularity from extremal graph theory. We frame our investigation using new, statistical distance-based variants of multi-calibration that are closely related to the concept of outcome indistinguishability. Adopting this perspective leads us naturally not only to our graph theoretic results, but also to new multi-calibration algorithms with improved complexity in certain parameter regimes, and to a generalization of a state-of-the-art result on omniprediction. Along the way, we also unify several prior algorithms for achieving multi-group fairness, as well as their analyses, through the lens of no-regret learning.
[[2301.08987] Tier Balancing: Towards Dynamic Fairness over Underlying Causal Factors](http://arxiv.org/abs/2301.08987) #fair
The pursuit of long-term fairness involves the interplay between decision-making and the underlying data generating process. In this paper, through causal modeling with a directed acyclic graph (DAG) on the decision-distribution interplay, we investigate the possibility of achieving long-term fairness from a dynamic perspective. We propose Tier Balancing, a technically more challenging but more natural notion to achieve in the context of long-term, dynamic fairness analysis. Different from previous fairness notions that are defined purely on observed variables, our notion goes one step further, capturing behind-the-scenes situation changes on the unobserved latent causal factors that directly carry out the influence from the current decision to the future data distribution. Under the specified dynamics, we prove that in general one cannot achieve the long-term fairness goal only through one-step interventions. Furthermore, in the effort of approaching long-term fairness, we consider the mission of "getting closer to" the long-term fairness goal and present possibility and impossibility results accordingly.
[[2301.08843] Towards Flexibility and Interpretability of Gaussian Process State-Space Model](http://arxiv.org/abs/2301.08843) #interpretability
Gaussian process state-space model (GPSSM) has attracted much attention over the past decade. However, the model representation power of GPSSM is far from satisfactory. Most GPSSM works rely on the standard Gaussian process (GP) with a preliminary kernel, such as squared exponential (SE) kernel and Mat\'{e}rn kernel, which limit the model representation power and its application in complex scenarios. To address this issue, this paper proposes a novel class of probabilistic state-space model named TGPSSM that enriches the GP priors in the standard GPSSM through parametric normalizing flow, making the state-space model more flexible and expressive. In addition, by inheriting the advantages of sparse representation of GP models, we propose a scalable and interpretable variational learning algorithm to learn the TGPSSM and infer the latent dynamics simultaneously. By integrating a constrained optimization framework and explicitly constructing a non-Gaussian state variational distribution, the proposed learning algorithm enables the TGPSSM to significantly improve the capabilities of state space representation and model inference. Experimental results based on various synthetic and real datasets corroborate that the proposed TGPSSM yields superior learning and inference performance compared to several state-of-the-art methods. The accompanying source code is available at https://github.com/zhidilin/TGPSSM.
[[2301.09230] Deterministic Online Classification: Non-iteratively Reweighted Recursive Least-Squares for Binary Class Rebalancing](http://arxiv.org/abs/2301.09230) #interpretability
Deterministic solutions are becoming more critical for interpretability. Weighted Least-Squares (WLS) has been widely used as a deterministic batch solution with a specific weight design. In the online settings of WLS, exact reweighting is necessary to converge to its batch settings. In order to comply with its necessity, the iteratively reweighted least-squares algorithm is mainly utilized with a linearly growing time complexity which is not attractive for online learning. Due to the high and growing computational costs, an efficient online formulation of reweighted least-squares is desired. We introduce a new deterministic online classification algorithm of WLS with a constant time complexity for binary class rebalancing. We demonstrate that our proposed online formulation exactly converges to its batch formulation and outperforms existing state-of-the-art stochastic online binary classification algorithms in real-world data sets empirically.
[[2301.09420] On Multi-Agent Deep Deterministic Policy Gradients and their Explainability for SMARTS Environment](http://arxiv.org/abs/2301.09420) #explainability
Multi-Agent RL or MARL is one of the complex problems in Autonomous Driving literature that hampers the release of fully-autonomous vehicles today. Several simulators have been in iteration after their inception to mitigate the problem of complex scenarios with multiple agents in Autonomous Driving. One such simulator--SMARTS, discusses the importance of cooperative multi-agent learning. For this problem, we discuss two approaches--MAPPO and MADDPG, which are based on-policy and off-policy RL approaches. We compare our results with the state-of-the-art results for this challenge and discuss the potential areas of improvement while discussing the explainability of these approaches in conjunction with waypoints in the SMARTS environment.
[[2301.08912] Rationalization for Explainable NLP: A Survey](http://arxiv.org/abs/2301.08912) #explainability
Recent advances in deep learning have improved the performance of many Natural Language Processing (NLP) tasks such as translation, question-answering, and text classification. However, this improvement comes at the expense of model explainability. Black-box models make it difficult to understand the internals of a system and the process it takes to arrive at an output. Numerical (LIME, Shapley) and visualization (saliency heatmap) explainability techniques are helpful; however, they are insufficient because they require specialized knowledge. These factors led rationalization to emerge as a more accessible explainable technique in NLP. Rationalization justifies a model's output by providing a natural language explanation (rationale). Recent improvements in natural language generation have made rationalization an attractive technique because it is intuitive, human-comprehensible, and accessible to non-technical users. Since rationalization is a relatively new field, it is disorganized. As the first survey, rationalization literature in NLP from 2007-2022 is analyzed. This survey presents available methods, explainable evaluations, code, and datasets used across various NLP tasks that use rationalization. Further, a new subfield in Explainable AI (XAI), namely, Rational AI (RAI), is introduced to advance the current state of rationalization. A discussion on observed insights, challenges, and future directions is provided to point to promising research opportunities.
[[2301.08914] ExClaim: Explainable Neural Claim Verification Using Rationalization](http://arxiv.org/abs/2301.08914) #explainability
With the advent of deep learning, text generation language models have improved dramatically, with text at a similar level as human-written text. This can lead to rampant misinformation because content can now be created cheaply and distributed quickly. Automated claim verification methods exist to validate claims, but they lack foundational data and often use mainstream news as evidence sources that are strongly biased towards a specific agenda. Current claim verification methods use deep neural network models and complex algorithms for a high classification accuracy but it is at the expense of model explainability. The models are black-boxes and their decision-making process and the steps it took to arrive at a final prediction are obfuscated from the user. We introduce a novel claim verification approach, namely: ExClaim, that attempts to provide an explainable claim verification system with foundational evidence. Inspired by the legal system, ExClaim leverages rationalization to provide a verdict for the claim and justifies the verdict through a natural language explanation (rationale) to describe the model's decision-making process. ExClaim treats the verdict classification task as a question-answer problem and achieves a performance of 0.93 F1 score. It provides subtasks explanations to also justify the intermediate outcomes. Statistical and Explainable AI (XAI) evaluations are conducted to ensure valid and trustworthy outcomes. Ensuring claim verification systems are assured, rational, and explainable is an essential step toward improving Human-AI trust and the accessibility of black-box systems.
[[2301.09042] The Shape of Explanations: A Topological Account of Rule-Based Explanations in Machine Learning](http://arxiv.org/abs/2301.09042) #explainability
Rule-based explanations provide simple reasons explaining the behavior of machine learning classifiers at given points in the feature space. Several recent methods (Anchors, LORE, etc.) purport to generate rule-based explanations for arbitrary or black-box classifiers. But what makes these methods work in general? We introduce a topological framework for rule-based explanation methods and provide a characterization of explainability in terms of the definability of a classifier relative to an explanation scheme. We employ this framework to consider various explanation schemes and argue that the preferred scheme depends on how much the user knows about the domain and the probability measure over the feature space.
[[2301.09430] RainDiffusion:When Unsupervised Learning Meets Diffusion Models for Real-world Image Deraining](http://arxiv.org/abs/2301.09430) #diffusion
What will happen when unsupervised learning meets diffusion models for real-world image deraining? To answer it, we propose RainDiffusion, the first unsupervised image deraining paradigm based on diffusion models. Beyond the traditional unsupervised wisdom of image deraining, RainDiffusion introduces stable training of unpaired real-world data instead of weakly adversarial training. RainDiffusion consists of two cooperative branches: Non-diffusive Translation Branch (NTB) and Diffusive Translation Branch (DTB). NTB exploits a cycle-consistent architecture to bypass the difficulty in unpaired training of standard diffusion models by generating initial clean/rainy image pairs. DTB leverages two conditional diffusion modules to progressively refine the desired output with initial image pairs and diffusive generative prior, to obtain a better generalization ability of deraining and rain generation. Rain-Diffusion is a non adversarial training paradigm, serving as a new standard bar for real-world image deraining. Extensive experiments confirm the superiority of our RainDiffusion over un/semi-supervised methods and show its competitive advantages over fully-supervised ones.
[[2301.09515] StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis](http://arxiv.org/abs/2301.09515) #diffusion
Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models. However, the best-performing models require iterative evaluation to generate a single sample. In contrast, generative adversarial networks (GANs) only need a single forward pass. They are thus much faster, but they currently remain far behind the state-of-the-art in large-scale text-to-image synthesis. This paper aims to identify the necessary steps to regain competitiveness. Our proposed model, StyleGAN-T, addresses the specific requirements of large-scale text-to-image synthesis, such as large capacity, stable training on diverse datasets, strong text alignment, and controllable variation vs. text alignment tradeoff. StyleGAN-T significantly improves over previous GANs and outperforms distilled diffusion models - the previous state-of-the-art in fast text-to-image synthesis - in terms of sample quality and speed.
[[2301.09428] Explaining the effects of non-convergent sampling in the training of Energy-Based Models](http://arxiv.org/abs/2301.09428) #diffusion
In this paper, we quantify the impact of using non-convergent Markov chains to train Energy-Based models (EBMs). In particular, we show analytically that EBMs trained with non-persistent short runs to estimate the gradient can perfectly reproduce a set of empirical statistics of the data, not at the level of the equilibrium measure, but through a precise dynamical process. Our results provide a first-principles explanation for the observations of recent works proposing the strategy of using short runs starting from random initial conditions as an efficient way to generate high-quality samples in EBMs, and lay the groundwork for using EBMs as diffusion models. After explaining this effect in generic EBMs, we analyze two solvable models in which the effect of the non-convergent sampling in the trained parameters can be described in detail. Finally, we test these predictions numerically on the Boltzmann machine.
[[2301.09474] DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion](http://arxiv.org/abs/2301.09474) #diffusion
Real-world data generation often involves complex inter-dependencies among instances, violating the IID-data hypothesis of standard learning paradigms and posing a challenge for uncovering the geometric structures for learning desired instance representations. To this end, we introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states that progressively incorporate other instances' information by their interactions. The diffusion process is constrained by descent criteria w.r.t.~a principled energy function that characterizes the global consistency of instance representations over latent structures. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs, which gives rise to a new class of neural encoders, dubbed as DIFFormer (diffusion-based Transformers), with two instantiations: a simple version with linear complexity for prohibitive instance numbers, and an advanced version for learning complex structures. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks, such as node classification on large graphs, semi-supervised image/text classification, and spatial-temporal dynamics prediction.