[[2209.07138] Self-Healing Secure Blockchain Framework in Microgrids](http://arxiv.org/abs/2209.07138)
Blockchain has recently been depicted as a secure protocol for information exchange in cyber-physical microgrids. However, it remains vulnerable to consensus-manipulation attacks. These stealth attacks are often difficult to detect as they use kernel-level access to mask their actions. In this paper, we first build a trusted and secure peer-to-peer network mechanism for physical DC microgrids' validation of transactions over a distributed ledger. Second, we leverage a physics-informed approach for detecting malware-infected nodes and then recovering from these stealth attacks using a self-healing recovery scheme built into the microgrid Blockchain network. This scheme allows compromised nodes to adapt to a reconstructed trustworthy signal in a multi-hop manner using corresponding measurements from the reliable nodes in the network. This supplements the capabilities of Blockchain, enabling it to detect and mitigate consensus-manipulation attempts.
[[2209.07215] ProAPT: Projection of APT Threats with Deep Reinforcement Learning](http://arxiv.org/abs/2209.07215)
The highest level in the Endsley situation awareness model is called projection, in which the status of elements in the environment in the near future is predicted. In cybersecurity situation awareness, the projection for an Advanced Persistent Threat (APT) requires predicting the next step of the APT. The threats are constantly changing and becoming more complex. As supervised and unsupervised learning methods require APT datasets for projecting the next step of APTs, they are unable to identify unknown APT threats. In reinforcement learning methods, the agent interacts with the environment, and so it might project the next step of known and unknown APTs. So far, reinforcement learning has not been used to project the next step for APTs. In reinforcement learning, the agent uses the previous states and actions to approximate the best action of the current state. When the number of states and actions is large, the agent employs a neural network (deep learning) to approximate the best action of each state. In this paper, we present a deep reinforcement learning system to project the next step of APTs. As there exists some relation between attack steps, we employ the Long Short-Term Memory (LSTM) method to approximate the best action of each state. In our proposed system, based on the current situation, we project the next steps of APT threats.
[[2209.07124] How Much Does It Cost to Train a Machine Learning Model over Distributed Data Sources?](http://arxiv.org/abs/2209.07124)
Federated learning (FL) is one of the most appealing alternatives to the standard centralized learning paradigm, allowing a heterogeneous set of devices to train a machine learning model without sharing their raw data. However, FL requires a central server to coordinate the learning process, thus introducing potential scalability and security issues. In the literature, server-less FL approaches like gossip federated learning (GFL) and blockchain-enabled federated learning (BFL) have been proposed to mitigate these issues. In this work, we provide a complete overview of these three techniques and compare them according to an integral set of performance indicators, including model accuracy, time complexity, communication overhead, convergence time and energy consumption. An extensive simulation campaign supports a quantitative analysis. In particular, GFL is able to save 18% of the training time, 68% of the energy and 51% of the data to be shared with respect to the centralized FL (CFL) solution, but it is not able to reach the level of accuracy of CFL. On the other hand, BFL represents a viable solution for implementing decentralized learning with a higher level of security, at the cost of extra energy usage and data sharing. Finally, we identify open issues in the two decentralized federated learning implementations and provide insights on potential extensions and possible research directions in this new research field.
[[2209.07341] CLIPping Privacy: Identity Inference Attacks on Multi-Modal Machine Learning Models](http://arxiv.org/abs/2209.07341)
As deep learning is now used in many real-world applications, research has focused increasingly on the privacy of deep learning models and how to prevent attackers from obtaining sensitive information about the training data. However, image-text models like CLIP have not yet been looked at in the context of privacy attacks. While membership inference attacks aim to tell whether a specific data point was used for training, we introduce a new type of privacy attack, named identity inference attack (IDIA), designed for multi-modal image-text models like CLIP. Using IDIAs, an attacker can reveal whether a particular person was part of the training data by querying the model in a black-box fashion with different images of the same person. Letting the model choose from a wide variety of possible text labels, the attacker can probe whether the model recognizes the person and, therefore, whether the person was used for training. Through several experiments on CLIP, we show that the attacker can identify individuals used for training with very high accuracy and that the model learns to connect the names with the depicted people. Our experiments show that a multi-modal image-text model indeed leaks sensitive information about its training data and, therefore, should be handled with care.
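The black-box query procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `predict_name` stands in for a hypothetical wrapper around the model's zero-shot label selection, and the `min_hits` threshold is an invented parameter, not the decision rule from the paper.

```python
from typing import Any, Callable, Sequence


def identity_inference(
    predict_name: Callable[[Any, Sequence[str]], str],
    images: Sequence[Any],
    candidate_names: Sequence[str],
    true_name: str,
    min_hits: int = 1,
) -> bool:
    """Guess whether a person was in the training data.

    predict_name is a hypothetical black-box call: given one image and a list
    of candidate names, it returns the name the model ranks highest.
    min_hits is an assumed threshold on how many images must be named correctly.
    """
    hits = sum(1 for img in images if predict_name(img, candidate_names) == true_name)
    return hits >= min_hits


# Usage with a stub standing in for the real model.
def stub_model(image, names):
    # Pretend the model only "recognizes" images tagged as known.
    return "Alice Example" if image == "known" else names[0]


names = ["Bob Sample", "Alice Example", "Carol Test"]
print(identity_inference(stub_model, ["known", "other"], names, "Alice Example"))
```

Requiring more than one correct prediction (a larger `min_hits`) would trade recall for fewer false alarms caused by lucky guesses.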
[[2209.06955] Adversarial Correctness and Privacy for Probabilistic Data Structures](http://arxiv.org/abs/2209.06955)
We study the security of Probabilistic Data Structures (PDS) for handling Approximate Membership Queries (AMQ); prominent examples of AMQ-PDS are Bloom and Cuckoo filters. AMQ-PDS are increasingly being deployed in environments where adversaries can gain benefit from carefully selecting inputs, for example to increase the false positive rate of an AMQ-PDS. They are also being used in settings where the inputs are sensitive and should remain private in the face of adversaries who can access an AMQ-PDS through an API or who can learn its internal state by compromising the system running the AMQ-PDS.
We develop simulation-based security definitions that speak to correctness and privacy of AMQ-PDS. Our definitions are general and apply to a broad range of adversarial settings. We use our definitions to analyse the behaviour of both Bloom filters and insertion-only Cuckoo filters. We show that these AMQ-PDS can be provably protected through replacement or composition of hash functions with keyed pseudorandom functions in their construction. We also examine the practical impact on storage size and computation of providing secure instances of Bloom and insertion-only Cuckoo filters.
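As a rough illustration of replacing a Bloom filter's hash functions with a keyed pseudorandom function, the sketch below derives the index functions from HMAC-SHA256 under a secret key. The class name, the parameters and the choice of HMAC-SHA256 as the PRF are assumptions made for this example, not the construction analysed in the paper.

```python
import hashlib
import hmac
import os
from typing import Iterator, Optional


class KeyedBloomFilter:
    """Bloom filter whose index functions come from HMAC-SHA256 under a secret key."""

    def __init__(self, num_bits: int, num_hashes: int, key: Optional[bytes] = None):
        if not 0 < num_hashes < 256:
            raise ValueError("num_hashes must be in 1..255")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Without the key, an adversary cannot predict which bits an input sets.
        self.key = key if key is not None else os.urandom(32)
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item: bytes) -> Iterator[int]:
        # One PRF call per index; the leading counter byte separates the functions.
        for i in range(self.num_hashes):
            tag = hmac.new(self.key, bytes([i]) + item, hashlib.sha256).digest()
            yield int.from_bytes(tag[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(item))


bf = KeyedBloomFilter(num_bits=1024, num_hashes=4)
bf.add(b"alice")
print(b"alice" in bf)
```

The storage cost is unchanged relative to an unkeyed filter; the extra cost is one keyed PRF evaluation per index, plus the need to keep the key secret.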
[[2209.07064] SecSkyline: Fast Privacy-Preserving Skyline Queries over Encrypted Cloud Databases](http://arxiv.org/abs/2209.07064)
The well-known benefits of cloud computing have spurred the popularity of database service outsourcing, where one can resort to the cloud to conveniently store and query databases. With this trend comes a threat to data privacy, as the cloud gains access to the databases and queries, which may contain sensitive information such as medical or financial data. A large body of work has been presented for querying encrypted databases, mostly focused on secure keyword search. In this paper, we instead focus on supporting secure skyline query processing over encrypted outsourced databases, where little work has been done. A skyline query is an advanced kind of database query that is important for multi-criteria decision-making systems and applications. We propose SecSkyline, a new system framework building on lightweight cryptography for fast privacy-preserving skyline queries. SecSkyline ambitiously provides strong protection not only for the content confidentiality of the outsourced database, the query, and the result, but also for data patterns that may incur indirect data leakages, such as dominance relationships among data points and search access patterns. Extensive experiments demonstrate that SecSkyline is substantially superior to the state-of-the-art in query latency, with up to 813$\times$ improvement.
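For readers unfamiliar with skyline queries, the following is a minimal plaintext sketch of the dominance relation and a naive skyline computation. It assumes smaller values are better in every dimension and says nothing about how SecSkyline evaluates these comparisons over encrypted data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Example: (price, distance) pairs where lower is better for both.
hotels = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(skyline(hotels))  # [(1, 5), (2, 2), (5, 1)]
```

The pairwise dominance tests are exactly the kind of data pattern the abstract says must be hidden from the cloud.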
[[2209.07303] Differentially Private Estimation of Hawkes Process](http://arxiv.org/abs/2209.07303)
Point process models are of great importance in real world applications. In certain critical applications, estimation of point process models involves large amounts of sensitive personal data from users. Privacy concerns naturally arise which have not been addressed in the existing literature. To bridge this glaring gap, we propose the first general differentially private estimation procedure for point process models. Specifically, we take the Hawkes process as an example, and introduce a rigorous definition of differential privacy for event stream data based on a discretized representation of the Hawkes process. We then propose two differentially private optimization algorithms, which can efficiently estimate Hawkes process models with the desired privacy and utility guarantees under two different settings. Experiments are provided to back up our theoretical analysis.
[[2209.07403] Private Stochastic Optimization in the Presence of Outliers: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses](http://arxiv.org/abs/2209.07403)
We study differentially private (DP) stochastic optimization (SO) with data containing outliers and loss functions that are not Lipschitz continuous. To date, the vast majority of work on DP SO assumes that the loss is Lipschitz (i.e. stochastic gradients are uniformly bounded), and their error bounds scale with the Lipschitz parameter of the loss. While this assumption is convenient, it is often unrealistic: in many practical problems where privacy is required, data may contain outliers or be unbounded, causing some stochastic gradients to have large norm. In such cases, the Lipschitz parameter may be prohibitively large, leading to vacuous excess risk bounds. Thus, building on a recent line of work [WXDX20, KLZ22], we make the weaker assumption that stochastic gradients have bounded $k$-th moments for some $k \geq 2$. Compared with works on DP Lipschitz SO, our excess risk scales with the $k$-th moment bound instead of the Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). Moreover, in contrast to the prior works [WXDX20, KLZ22], our bounds do not require the loss function to be differentiable/smooth. We also devise an accelerated algorithm that runs in linear time and yields improved (compared to prior works) and nearly optimal excess risk for smooth losses. Additionally, our work is the first to address non-convex non-Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some classes of neural nets, among other practical models. Our Proximal-PL algorithm has nearly optimal excess risk that almost matches the strongly convex lower bound. Lastly, we provide shuffle DP variations of our algorithms, which do not require a trusted curator (e.g. for distributed learning).
[[2209.07076] Responsible AI Implementation: A Human-centered Framework for Accelerating the Innovation Process](http://arxiv.org/abs/2209.07076)
There is still a significant gap between expectations and the successful adoption of AI to innovate and improve businesses. Due to the emergence of deep learning, AI adoption is more complex as it often incorporates big data and the internet of things, affecting data privacy. Existing frameworks have identified the need to focus on human-centered design, combining technical and business/organizational perspectives. However, trust remains a critical issue that needs to be designed from the beginning. The proposed framework expands from the human-centered design approach, emphasizing and maintaining the trust that underpins the process. This paper proposes a theoretical framework for responsible artificial intelligence (AI) implementation. The proposed framework emphasizes a synergistic business technology approach for the agile co-creation process. The aim is to streamline the adoption process of AI to innovate and improve business by involving all stakeholders throughout the project so that the AI technology is designed, developed, and deployed in conjunction with people and not in isolation. The framework presents a fresh viewpoint on responsible AI implementation based on analytical literature review, conceptual framework design, and practitioners' mediating expertise. The framework emphasizes establishing and maintaining trust throughout the human-centered design and agile development of AI. This human-centered approach is aligned with and enabled by the privacy by design principle. The creators of the technology and the end-users are working together to tailor the AI solution specifically for the business requirements and human characteristics. An illustrative case study on adopting AI for assisting planning in a hospital will demonstrate that the proposed framework applies to real-life applications.
[[2209.07116] Decentralized Learning with Separable Data: Generalization and Fast Algorithms](http://arxiv.org/abs/2209.07116)
Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study algorithmic and generalization properties of decentralized learning with gradient descent on separable data. Specifically, for decentralized gradient descent (DGD) and a variety of loss functions that asymptote to zero at infinity (including exponential and logistic losses), we derive novel finite-time generalization bounds. This complements a long line of recent work that studies the generalization performance and the implicit bias of gradient descent over separable data, but has thus far been limited to centralized learning scenarios. Notably, our generalization bounds match in order their centralized counterparts. Critical behind this, and of independent interest, is establishing novel bounds on the training loss and the rate-of-consensus of DGD for a class of self-bounded losses. Finally, on the algorithmic front, we design improved gradient-based routines for decentralized learning with separable data and empirically demonstrate orders-of-magnitude of speed-up in terms of both training and generalization performance.
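A minimal sketch of one common form of decentralized gradient descent on logistic loss follows: each agent mixes its neighbours' iterates through a mixing matrix and then takes a local gradient step. The update form, the function names, the toy data and the mixing matrix are assumptions for illustration; they are not the paper's improved routines.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, label in {-1, +1})


def logistic_grad(w: Sequence[float], data: Sequence[Sample]) -> List[float]:
    """Gradient of the mean logistic loss log(1 + exp(-y * <w, x>)) over local samples."""
    grad = [0.0] * len(w)
    for x, y in data:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        coeff = -y / (1.0 + math.exp(margin))
        for d in range(len(w)):
            grad[d] += coeff * x[d] / len(data)
    return grad


def dgd_step(weights, mixing, local_data, lr):
    """One DGD round: mix neighbours' iterates, then subtract the local gradient."""
    n, dim = len(weights), len(weights[0])
    new_weights = []
    for i in range(n):
        mixed = [sum(mixing[i][j] * weights[j][d] for j in range(n)) for d in range(dim)]
        grad = logistic_grad(weights[i], local_data[i])
        new_weights.append([m - lr * g for m, g in zip(mixed, grad)])
    return new_weights


# Three agents, each holding one sample of a linearly separable dataset.
local_data = [[([1.0, 0.0], 1)], [([-1.0, 0.0], -1)], [([1.0, 1.0], 1)]]
mixing = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]  # doubly stochastic
weights = [[0.0, 0.0] for _ in range(3)]
for _ in range(50):
    weights = dgd_step(weights, mixing, local_data, lr=0.5)
print(weights)
```

On separable data the logistic loss has no finite minimizer, so the iterates keep growing in norm while the training loss tends to zero, which is the regime the abstract studies.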
[[2209.07446] Efficiency Ordering of Stochastic Gradient Descent](http://arxiv.org/abs/2209.07446)
We consider the stochastic gradient descent (SGD) algorithm driven by a general stochastic sequence, including i.i.d. noise and random walk on an arbitrary graph, among others; and analyze it in the asymptotic sense. Specifically, we employ the notion of 'efficiency ordering', a well-analyzed tool for comparing the performance of Markov Chain Monte Carlo (MCMC) samplers, for SGD algorithms in the form of Loewner ordering of covariance matrices associated with the scaled iterate errors in the long term. Using this ordering, we show that input sequences that are more efficient for MCMC sampling also lead to smaller covariance of the errors for SGD algorithms in the limit. This also suggests that an arbitrarily weighted MSE of SGD iterates in the limit becomes smaller when driven by more efficient chains. Our finding is of particular interest in applications such as decentralized optimization and swarm learning, where SGD is implemented in a random walk fashion on the underlying communication graph for cost issues and/or data privacy. We demonstrate how certain non-Markovian processes, for which typical mixing-time based non-asymptotic bounds are intractable, can outperform their Markovian counterparts in the sense of efficiency ordering for SGD. We show the utility of our method by applying it to gradient descent with shuffling and mini-batch gradient descent, reaffirming key results from existing literature under a unified framework. Empirically, we also observe efficiency ordering for variants of SGD such as accelerated SGD and Adam, opening up the possibility of extending our notion of efficiency ordering to a broader family of stochastic optimization algorithms.
[[2209.07125] BadRes: Reveal the Backdoors through Residual Connection](http://arxiv.org/abs/2209.07125)
Generally, residual connections are indispensable network components in building CNNs and Transformers for various downstream tasks in CV and VL, as they provide skip shortcuts between network blocks. However, the layer-by-layer loopback residual connections may also hurt the model's robustness by letting unexpected inputs pass through. In this paper, we propose a simple yet strong backdoor attack method, BadRes, in which the residual connections act as a turnstile that is deterministic on clean inputs but unpredictable on poisoned ones. We have performed empirical evaluations on four datasets with ViT and BEiT models, and BadRes achieves a 97% attack success rate with zero performance degradation on clean data. Moreover, we analyze BadRes with state-of-the-art defense methods and reveal the fundamental weakness lying in residual connections.
[[2209.07491] Defending Root DNS Servers Against DDoS Using Layered Defenses](http://arxiv.org/abs/2209.07491)
Distributed Denial-of-Service (DDoS) attacks exhaust resources, leaving a server unavailable to legitimate clients. The Domain Name System (DNS) is a frequent target of DDoS attacks. Since DNS is a critical infrastructure service, protecting it from DoS is imperative. Many prior approaches have focused on specific filters or anti-spoofing techniques to protect generic services. DNS root nameservers are more challenging to protect, since they use fixed IP addresses, serve very diverse clients and requests, receive predominantly UDP traffic that can be spoofed, and must guarantee high quality of service. In this paper we propose a layered DDoS defense for DNS root nameservers. Our defense uses a library of defensive filters, which can be optimized for different attack types, with different levels of selectivity. We further propose a method that automatically and continuously evaluates and selects the best combination of filters throughout the attack. We show that this layered defense approach provides exceptional protection against all attack types using traces of ten real attacks from a DNS root nameserver. Our automated system can select the best defense within seconds and quickly reduces traffic to the server within a manageable range, while keeping collateral damage lower than 2%. We can handle millions of filtering rules without noticeable operational overhead.
[[2209.06827] Weakly Supervised Invariant Representation Learning Via Disentangling Known and Unknown Nuisance Factors](http://arxiv.org/abs/2209.06827)
Disentangled and invariant representations are two critical goals of representation learning, and many approaches have been proposed to achieve either one of them. However, the two goals are actually complementary, so we propose a framework to accomplish both simultaneously. We introduce a weakly supervised signal to learn a disentangled representation consisting of three splits that contain predictive, known nuisance and unknown nuisance information, respectively. Furthermore, we incorporate a contrastive method to enforce representation invariance. Experiments show that the proposed method outperforms state-of-the-art (SOTA) methods on four standard benchmarks and has better adversarial defense ability than other methods without adversarial training.
[[2209.06971] PointACL: Adversarial Contrastive Learning for Robust Point Clouds Representation under Adversarial Attack](http://arxiv.org/abs/2209.06971)
Despite the recent success of self-supervised contrastive learning models for 3D point cloud representation, the adversarial robustness of such pre-trained models has raised concerns. Adversarial contrastive learning (ACL) is considered an effective way to improve the robustness of pre-trained models. In contrastive learning, the projector is considered an effective component for removing unnecessary feature information during contrastive pretraining, and most ACL works also use a contrastive loss with projected feature representations to generate adversarial examples in pretraining, while "unprojected" feature representations are used in generating adversarial inputs during inference. Because of the distribution gap between projected and "unprojected" features, these models are limited in their ability to obtain robust feature representations for downstream tasks. We introduce a new method to generate high-quality 3D adversarial examples for adversarial training by utilizing a virtual adversarial loss with "unprojected" feature representations in the contrastive learning framework. We present our robust-aware loss function to train the self-supervised contrastive learning framework adversarially. Furthermore, we find that selecting high-difference points with the Difference of Normal (DoN) operator as additional input for adversarial self-supervised contrastive learning can significantly improve the adversarial robustness of the pre-trained model. We validate our method, PointACL, on downstream tasks, including 3D classification and 3D segmentation with multiple datasets. It obtains robust accuracy comparable to that of state-of-the-art contrastive adversarial learning methods.
[[2209.06823] DEANet: Decomposition Enhancement and Adjustment Network for Low-Light Image Enhancement](http://arxiv.org/abs/2209.06823)
Low-light conditions seriously degrade the quality of captured images. Solving the problem of poor low-light image quality can effectively improve the visual quality of images and the usability of computer vision systems, and it has important applications in many fields. This paper proposes DEANet, a Retinex-based network for low-light image enhancement. It combines the frequency information and content information of the image across three sub-networks: a decomposition network, an enhancement network and an adjustment network. These three sub-networks are used, respectively, for decomposition; denoising, contrast enhancement and detail preservation; and adjustment and image generation. Our model produces robust results across low-light images. The model is trained on the public LOL dataset, and the experimental results show that our method is better than existing state-of-the-art methods in terms of both visual appearance and quantitative quality.
[[2209.06861] Landmark-free Statistical Shape Modeling via Neural Flow Deformations](http://arxiv.org/abs/2209.06861)
Statistical shape modeling aims at capturing shape variations of an anatomical structure that occur within a given population. Shape models are employed in many tasks, such as shape reconstruction and image segmentation, but also shape generation and classification. Existing shape priors either require dense correspondence between training examples or lack robustness and topological guarantees. We present FlowSSM, a novel shape modeling approach that learns shape variability without requiring dense correspondence between training instances. It relies on a hierarchy of continuous deformation flows, which are parametrized by a neural network. Our model outperforms state-of-the-art methods in providing an expressive and robust shape prior for distal femur and liver. We show that the emerging latent representation is discriminative by separating healthy from pathological shapes. Ultimately, we demonstrate its effectiveness on two shape reconstruction tasks from partial data. Our source code is publicly available (https://github.com/davecasp/flowssm).
[[2209.06953] On the interplay of adversarial robustness and architecture components: patches, convolution and attention](http://arxiv.org/abs/2209.06953)
In recent years novel architecture components for image classification have been developed, starting with attention and patches used in transformers. While prior works have analyzed the influence of some aspects of architecture components on the robustness to adversarial attacks, in particular for vision transformers, the understanding of the main factors is still limited. We compare several (non)-robust classifiers with different architectures and study their properties, including the effect of adversarial training on the interpretability of the learnt features and robustness to unseen threat models. An ablation from ResNet to ConvNeXt reveals key architectural changes leading to almost $10\%$ higher $\ell_\infty$-robustness.
[[2209.06954] Finetuning Pretrained Vision-Language Models with Correlation Information Bottleneck for Robust Visual Question Answering](http://arxiv.org/abs/2209.06954)
Benefiting from large-scale Pretrained Vision-Language Models (VL-PMs), the performance of Visual Question Answering (VQA) has started to approach human oracle performance. However, finetuning large-scale VL-PMs with limited data for VQA usually faces overfitting and poor generalization issues, leading to a lack of robustness. In this paper, we aim to improve the robustness of VQA systems (i.e., the ability of the systems to defend against input variations and human-adversarial attacks) from the perspective of Information Bottleneck when finetuning VL-PMs for VQA. Generally, internal representations obtained by VL-PMs inevitably contain irrelevant and redundant information for the downstream VQA task, resulting in statistically spurious correlations and insensitivity to input variations. To encourage representations to converge to a minimal sufficient statistic in vision-language learning, we propose the Correlation Information Bottleneck (CIB) principle, which seeks a tradeoff between representation compression and redundancy by minimizing the mutual information (MI) between the inputs and internal representations while maximizing the MI between the outputs and the representations. Meanwhile, CIB measures the internal correlations among visual and linguistic inputs and representations by a symmetrized joint MI estimation. Extensive experiments on five VQA benchmarks of input robustness and two VQA benchmarks of human-adversarial robustness demonstrate the effectiveness and superiority of the proposed CIB in improving the robustness of VQA systems.
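For orientation, the tradeoff described above has the same shape as the classical Information Bottleneck Lagrangian, written below. The symbols are assumptions introduced here: $X$ for the inputs, $Y$ for the outputs, $T$ for the internal representation, and $\beta > 0$ for the tradeoff weight. The CIB objective in the paper additionally involves a correlation term and may differ in form.

$$
\min_{p(t \mid x)} \; I(X; T) \;-\; \beta \, I(T; Y)
$$

Minimizing $I(X;T)$ compresses the representation, while the $-\beta\, I(T;Y)$ term rewards keeping the information that predicts the answer.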
[[2209.07005] Self-Supervised Texture Image Anomaly Detection By Fusing Normalizing Flow and Dictionary Learning](http://arxiv.org/abs/2209.07005)
A common research area in anomaly detection is the detection of anomalies in industrial images with textured backgrounds. Interference from the texture and the small size of texture anomalies are the main reasons why many existing models fail to detect such anomalies. To address these problems, we propose an anomaly detection strategy that combines dictionary learning and normalizing flow. Our method enhances the existing two-stage anomaly detection approach: to improve the baseline method, we add normalizing flow to representation learning and combine deep learning with dictionary learning. In experimental validation, the improved algorithm exceeds 95% detection accuracy on all MVTec AD texture-type data and shows strong robustness. The baseline method's detection accuracy on the Carpet data was 67.9%; our method raises it to 99.7%.
[[2209.07026] Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?](http://arxiv.org/abs/2209.07026)
Vision Transformers (ViTs) have proven to be effective in solving 2D image understanding tasks by training over large-scale image datasets and, on a somewhat separate track, in modeling the 3D visual world too, such as voxels or point clouds. However, with the growing hope that transformers can become the "universal" modeling tool for heterogeneous data, ViTs for 2D and 3D tasks have so far adopted vastly different architecture designs that are hardly transferable. That invites an (over-)ambitious question: can we close the gap between the 2D and 3D ViT architectures? As a pilot study, this paper demonstrates the appealing promise of understanding the 3D visual world using a standard 2D ViT architecture, with only minimal customization at the input and output levels and without redesigning the pipeline. To build a 3D ViT from its 2D sibling, we "inflate" the patch embedding and token sequence, accompanied by new positional encoding mechanisms designed to match the 3D data geometry. The resultant "minimalist" 3D ViT, named Simple3D-Former, performs surprisingly robustly on popular 3D tasks such as object classification, point cloud segmentation and indoor scene detection, compared to highly customized 3D-specific designs. It can hence act as a strong baseline for new 3D ViTs. Moreover, we note that pursuing a unified 2D-3D ViT design has practical relevance beyond scientific curiosity. Specifically, we demonstrate that Simple3D-Former naturally enables exploiting the wealth of pre-trained weights from large-scale realistic 2D images (e.g., ImageNet), which can be plugged in to enhance 3D task performance "for free".
[[2209.07061] PROB-SLAM: Real-time Visual SLAM Based on Probabilistic Graph Optimization](http://arxiv.org/abs/2209.07061)
Traditional SLAM algorithms are typically based on hand-crafted features, which lack high-level information. By introducing semantic information, SLAM can achieve higher stability and robustness than with purely hand-crafted features. However, the high uncertainty of semantic detection networks limits the practical usefulness of this high-level information. To address the uncertainty introduced by semantics, this paper proposes a novel probability map based on a Gaussian distribution assumption. This map transforms binary semantic object detections into probabilities, which help establish a probabilistic data association between hand-crafted features and semantic information. In our algorithm, higher-confidence regions are given higher weights in each update step, while the edges of the detection area are assigned lower confidence. The uncertainty is thereby reduced and has less effect on the nonlinear optimization. Experiments on the TUM RGB-D dataset show that our system reduces the error of ORB-SLAM2 by about 15% in indoor environments. We also demonstrate that the method can be successfully applied to environments containing dynamic objects.
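One way to picture such a probability map is a Gaussian centred on the detection box, so that a feature near the box centre receives a weight close to 1 and a feature near the edge receives a much lower weight. The sketch below is an assumption-laden illustration: the function name, the axis-aligned Gaussian and the `sigma_scale` parameter are all hypothetical and are not taken from the paper.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def gaussian_weight(px: float, py: float, box: Box, sigma_scale: float = 0.25) -> float:
    """Confidence weight for a feature at (px, py) inside a detection box.

    The Gaussian is centred on the box; its standard deviation along each axis
    is sigma_scale times the box size (an assumed, tunable choice).
    """
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    sx = sigma_scale * (x_max - x_min)
    sy = sigma_scale * (y_max - y_min)
    return math.exp(-0.5 * (((px - cx) / sx) ** 2 + ((py - cy) / sy) ** 2))


box = (100.0, 50.0, 200.0, 150.0)
print(gaussian_weight(150.0, 100.0, box))  # centre of the box
print(gaussian_weight(100.0, 100.0, box))  # left edge of the box
```

Such weights could then scale each feature's contribution to the optimization, so that uncertain detections near box boundaries influence the estimate less.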
[[2209.07220] Face Shape-Guided Deep Feature Alignment for Face Recognition Robust to Face Misalignment](http://arxiv.org/abs/2209.07220)
For the past decades, face recognition (FR) has been actively studied in the computer vision and pattern recognition communities. Recently, due to the advances in deep learning, FR technology shows high performance on most of the benchmark datasets. However, when an FR algorithm is applied to a real-world scenario, the performance is known to be still unsatisfactory. This is mainly attributed to the mismatch between training and testing sets. Among such mismatches, face misalignment between training and testing faces is one of the factors that hinder successful FR. To address this limitation, we propose a face shape-guided deep feature alignment framework for FR that is robust to face misalignment. Based on a face shape prior (e.g., face keypoints), we train the proposed deep network by introducing alignment processes, i.e., pixel and feature alignments, between well-aligned and misaligned face images. Through the pixel alignment process, which decodes the aggregated feature extracted from a face image and the face shape prior, we add an auxiliary task of reconstructing the well-aligned face image. Since the aggregated features are linked to the face feature extraction network as a guide via the feature alignment process, the learned face feature becomes robust to face misalignment. Although face shape estimation is required in the training stage, the additional face alignment process, which is usually incorporated in the conventional FR pipeline, is not necessarily needed in the testing phase. Through comparative experiments on FR datasets, we validate the effectiveness of the proposed method under face misalignment.
[[2209.07237] Robust Implementation of Foreground Extraction and Vessel Segmentation for X-ray Coronary Angiography Image Sequence](http://arxiv.org/abs/2209.07237)
The extraction of contrast-filled vessels from X-ray coronary angiography (XCA) image sequences has important clinical significance for intuitive diagnosis and therapy. In this study, the XCA image sequence O is regarded as a three-dimensional tensor input, the vessel layer H as a sparse tensor, and the background layer B as a low-rank tensor. Using tensor nuclear norm (TNN) minimization, a novel method for vessel layer extraction based on tensor robust principal component analysis (TRPCA) is proposed. Furthermore, considering the irregular movement of vessels and the dynamic interference of surrounding irrelevant tissues, a total variation (TV) regularized spatial-temporal constraint is introduced to separate the dynamic background E. Subsequently, for vessel images with uneven contrast distribution, a two-stage region growth (TSRG) method is utilized for vessel enhancement and segmentation. A global threshold segmentation is used as pre-processing to obtain the main branch, and the Radon-Like features (RLF) filter is used to enhance and connect broken minor segments; the final vessel mask is constructed by combining the two intermediate results. We evaluated the visibility of the TV-TRPCA algorithm for foreground extraction and the accuracy of the TSRG algorithm for vessel segmentation on real clinical XCA image sequences and a third-party database. Both qualitative and quantitative results verify the superiority of the proposed methods over the existing state-of-the-art approaches.
[[2209.07399] A Light Recipe to Train Robust Vision Transformers](http://arxiv.org/abs/2209.07399)
In this paper, we ask whether Vision Transformers (ViTs) can serve as an underlying architecture for improving the adversarial robustness of machine learning models against evasion attacks. While earlier works have focused on improving Convolutional Neural Networks, we show that also ViTs are highly suitable for adversarial training to achieve competitive performance. We achieve this objective using a custom adversarial training recipe, discovered using rigorous ablation studies on a subset of the ImageNet dataset. The canonical training recipe for ViTs recommends strong data augmentation, in part to compensate for the lack of vision inductive bias of attention modules, when compared to convolutions. We show that this recipe achieves suboptimal performance when used for adversarial training. In contrast, we find that omitting all heavy data augmentation, and adding some additional bag-of-tricks ($\varepsilon$-warmup and larger weight decay), significantly boosts the performance of robust ViTs. We show that our recipe generalizes to different classes of ViT architectures and large-scale models on full ImageNet-1k. Additionally, investigating the reasons for the robustness of our models, we show that it is easier to generate strong attacks during training when using our recipe and that this leads to better robustness at test time. Finally, we further study one consequence of adversarial training by proposing a way to quantify the semantic nature of adversarial perturbations and highlight its correlation with the robustness of the model. Overall, we recommend that the community should avoid translating the canonical training recipes in ViTs to robust training and rethink common training choices in the context of adversarial training.
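The $\varepsilon$-warmup mentioned above can be pictured as a schedule that grows the perturbation budget from zero to its final value early in training. The sketch below assumes a linear ramp; the abstract does not state the shape of the schedule, and `epsilon_warmup`, `warmup_steps`, and `eps_max` are hypothetical names.

```python
def epsilon_warmup(step, warmup_steps, eps_max):
    """Perturbation budget at a given training step.

    Assumption: linear ramp from 0 to eps_max over warmup_steps,
    then constant. The paper's actual schedule may differ.
    """
    if warmup_steps <= 0:
        return eps_max
    return eps_max * min(1.0, step / warmup_steps)

# Example: budget used by the attack at a few points in training.
schedule = [epsilon_warmup(s, 100, 1.0) for s in (0, 50, 100, 200)]
```

Starting with a small budget lets the model first fit mildly perturbed inputs before facing the full-strength attack.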
[[2209.07419] FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection](http://arxiv.org/abs/2209.07419)
Promising complementarity exists between the texture features of color images and the geometric information of LiDAR point clouds. However, many challenges remain for efficient and robust feature fusion in the field of 3D object detection. In this paper, unstructured 3D point clouds are first filled into the 2D plane, and 3D point cloud features are extracted faster using projection-aware convolution layers. Further, the corresponding indexes between different sensor signals are established in advance during data preprocessing, which enables faster cross-modal feature fusion. To address the misalignment between LiDAR points and image pixels, two new plug-and-play fusion modules, LiCamFuse and BiLiCamFuse, are proposed. In LiCamFuse, soft query weights that account for the Euclidean distance between bimodal features are proposed. In BiLiCamFuse, a fusion module with dual attention is proposed to deeply correlate the geometric and textural features of the scene. The quantitative results on the KITTI dataset demonstrate that the proposed method achieves better feature-level fusion. In addition, the proposed network shows a shorter running time compared to existing methods.
[[2209.07518] Distribution Aware Metrics for Conditional Natural Language Generation](http://arxiv.org/abs/2209.07518)
Traditional automated metrics for evaluating conditional natural language generation use pairwise comparisons between a single generated text and the best-matching gold-standard ground truth text. When multiple ground truths are available, scores are aggregated using an average or max operation across references. While this approach works well when diversity in the ground truth data (i.e. dispersion of the distribution of conditional texts) can be ascribed to noise, such as in automated speech recognition, it does not allow for robust evaluation in the case where diversity in the ground truths represents signal for the model. In this work we argue that existing metrics are not appropriate for domains such as visual description or summarization where ground truths are semantically diverse, and where the diversity in those captions captures useful additional information about the context. We propose a novel paradigm for multi-candidate evaluation of conditional language generation models, and a new family of metrics that compare the distributions of reference and model-generated caption sets using small sample sets of each. We demonstrate the utility of our approach with a case study in visual description: where we show that existing models optimize for single-description quality over diversity, and gain some insights into how sampling methods and temperature impact description quality and diversity.
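The conventional multi-reference protocol described above (score one candidate against each reference, then take the average or the max) can be sketched as follows. The token-overlap F1 used here is a stand-in chosen for brevity, not one of the metrics studied in the paper, and `token_f1` and `aggregate` are hypothetical names.

```python
def token_f1(candidate, reference):
    """Toy pairwise score: F1 over whitespace tokens (illustrative only)."""
    c, r = candidate.lower().split(), reference.lower().split()
    common = sum(min(c.count(t), r.count(t)) for t in set(c))
    if common == 0:
        return 0.0
    precision, recall = common / len(c), common / len(r)
    return 2 * precision * recall / (precision + recall)

def aggregate(candidate, references, mode="max"):
    """Aggregate pairwise scores across references with max or mean."""
    scores = [token_f1(candidate, ref) for ref in references]
    return max(scores) if mode == "max" else sum(scores) / len(scores)

refs = ["a dog runs", "the park is green"]
best = aggregate("a dog runs", refs, mode="max")
mean = aggregate("a dog runs", refs, mode="mean")
```

With semantically diverse references, the max rewards matching any single reference and the mean penalizes a candidate for not matching all of them; neither compares the distribution of generated texts with the distribution of references, which is the gap the paper targets.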
[[2209.06946] Robust Product Classification with Instance-Dependent Noise](http://arxiv.org/abs/2209.06946)
Noisy labels in large e-commerce product data (i.e., product items placed into incorrect categories) are a critical issue for the product categorization task because they are unavoidable, non-trivial to remove, and degrade prediction performance significantly. Training a product title classification model that is robust to noisy labels in the data is very important for making product classification applications more practical. In this paper, we study the impact of instance-dependent noise on the performance of product title classification by comparing our data denoising algorithm with different noise-resistant training algorithms designed to prevent a classifier model from over-fitting to noise. We develop a simple yet effective Deep Neural Network for product title classification to use as a base classifier. Along with recent methods of simulating instance-dependent noise, we propose a novel noise simulation algorithm based on product title similarity. Our experiments cover multiple datasets, various noise methods, and different training solutions. Results uncover the limits of the classification task when the noise rate is not negligible and the data distribution is highly skewed.
[[2209.07239] UBARv2: Towards Mitigating Exposure Bias in Task-Oriented Dialogs](http://arxiv.org/abs/2209.07239)
This paper studies the exposure bias problem in task-oriented dialog systems, where the model's generated content over multiple turns drives the dialog context away from the ground-truth distribution at training time, introducing error propagation and damaging the robustness of the TOD system. To bridge the gap between training and inference for multi-turn task-oriented dialogs, we propose session-level sampling which explicitly exposes the model to sampled generated content of dialog context during training. Additionally, we employ a dropout-based consistency regularization with the masking strategy R-Mask to further improve the robustness and performance of the model. The proposed UBARv2 achieves state-of-the-art performance on the standardized evaluation benchmark MultiWOZ and extensive experiments show the effectiveness of the proposed methods.
[[2209.07351] Rethinking Round-trip Translation for Automatic Machine Translation Evaluation](http://arxiv.org/abs/2209.07351)
A parallel corpus is generally required to automatically evaluate translation quality using metrics such as BLEU, METEOR and BERTScore. While the reference-based evaluation paradigm is widely used in many machine translation tasks, it is difficult to apply to translation involving low-resource languages, as those languages suffer from a deficiency of corpora. Round-trip translation provides an encouraging way to alleviate the urgent requirement for a parallel corpus, although it was unfortunately not observed to correlate with forward translation in the era of statistical machine translation. In this paper, we first observe that forward translation quality consistently correlates with the corresponding round-trip translation quality in the scope of neural machine translation. We then carefully analyse and unveil the reason for the contradictory results on statistical machine translation systems. Second, we propose a simple yet effective regression method to predict forward translation scores based on round-trip translation scores for various language pairs, including those between very low-resource languages. We conduct extensive experiments to show the effectiveness and robustness of the predictive models on 1,000+ language pairs. Finally, we test our method on challenging settings, such as predicting scores: i) for language pairs unseen in training and ii) on real-world WMT shared tasks but in new domains. The extensive experiments demonstrate the robustness and utility of our approach. We believe our work will inspire further work on very low-resource multilingual machine translation.
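A minimal sketch of predicting a forward translation score from a round-trip score is given below, assuming a one-variable ordinary least-squares fit. The abstract only says "regression", so the model form is an assumption, the function names are hypothetical, and the numbers are synthetic placeholders rather than results from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

# Synthetic placeholder data: round-trip scores (x) and forward scores (y)
# for language pairs where a parallel corpus is available.
round_trip = [1.0, 2.0, 3.0, 4.0]
forward = [3.0, 5.0, 7.0, 9.0]

model = fit_line(round_trip, forward)
# Estimate the forward score of a pair that has only a round-trip score.
estimate = predict(model, 5.0)
```

The fit is done on language pairs that have references, and the fitted model is then applied to pairs where only round-trip scores can be computed.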
[[2209.06931] Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries](http://arxiv.org/abs/2209.06931)
The widespread adoption of deep neural networks in computer vision applications has brought forth a significant interest in adversarial robustness. Existing research has shown that maliciously perturbed inputs specifically tailored for a given model (i.e., adversarial examples) can be successfully transferred to another independently trained model to induce prediction errors. Moreover, this property of adversarial examples has been attributed to features derived from predictive patterns in the data distribution. Thus, we are motivated to investigate the following question: Can adversarial defenses, like adversarial examples, be successfully transferred to other independently trained models? To this end, we propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE). After examining theoretical motivation and implications, we experimentally show that our method can provide adversarial robustness to multiple independently pre-trained classifiers that are otherwise ineffective against an adaptive white box adversary. Furthermore, we show that RTFEs can even provide one-shot adversarial robustness to models independently trained on different datasets.
[[2209.07235] Sound and Complete Verification of Polynomial Networks](http://arxiv.org/abs/2209.07235)
Polynomial Networks (PNs) have demonstrated promising performance on face and image recognition recently. However, robustness of PNs is unclear and thus obtaining certificates becomes imperative for enabling their adoption in real-world applications. Existing verification algorithms on ReLU neural networks (NNs) based on branch and bound (BaB) techniques cannot be trivially applied to PN verification. In this work, we devise a new bounding method, equipped with BaB for global convergence guarantees, called VPN. One key insight is that we obtain much tighter bounds than the interval bound propagation baseline. This enables sound and complete PN verification with empirical validation on MNIST, CIFAR10 and STL10 datasets. We believe our method has its own interest to NN verification.
[[2209.06828] A Temporal Anomaly Detection System for Vehicles utilizing Functional Working Groups and Sensor Channels](http://arxiv.org/abs/2209.06828)
A modern vehicle fitted with sensors, actuators, and Electronic Control Units (ECUs) can be divided into several operational subsystems called Functional Working Groups (FWGs). Examples of these FWGs include the engine system, transmission, fuel system, brakes, etc. Each FWG has associated sensor-channels that gauge vehicular operating conditions. This data rich environment is conducive to the development of Predictive Maintenance (PdM) technologies. Undercutting various PdM technologies is the need for robust anomaly detection models that can identify events or observations which deviate significantly from the majority of the data and do not conform to a well defined notion of normal vehicular operational behavior. In this paper, we introduce the Vehicle Performance, Reliability, and Operations (VePRO) dataset and use it to create a multi-phased approach to anomaly detection. Utilizing Temporal Convolution Networks (TCN), our anomaly detection system can achieve 96% detection accuracy and accurately predicts 91% of true anomalies. The performance of our anomaly detection system improves when sensor channels from multiple FWGs are utilized.
[[2209.07263] Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization)](http://arxiv.org/abs/2209.07263)
We study the average robustness notion in deep neural networks in (selected) wide and narrow, deep and shallow, as well as lazy and non-lazy training settings. We prove that in the under-parameterized setting, width has a negative effect while it improves robustness in the over-parameterized setting. The effect of depth closely depends on the initialization and the training mode. In particular, when initialized with LeCun initialization, depth helps robustness with lazy training regime. In contrast, when initialized with Neural Tangent Kernel (NTK) and He-initialization, depth hurts the robustness. Moreover, under non-lazy training regime, we demonstrate how the width of a two-layer ReLU network benefits robustness. Our theoretical developments improve the results by Huang et al. [2021], Wu et al. [2021] and are consistent with Bubeck and Sellke [2021], Bubeck et al. [2021].
[[2209.07368] Causal Coupled Mechanisms: A Control Method with Cooperation and Competition for Complex System](http://arxiv.org/abs/2209.07368)
Complex systems are ubiquitous in the real world and tend to have complicated and poorly understood dynamics. For their control issues, the challenge is to guarantee accuracy, robustness, and generalization in such bloated and troubled environments. Fortunately, a complex system can be divided into multiple modular structures that human cognition appears to exploit. Inspired by this cognition, a novel control method, Causal Coupled Mechanisms (CCMs), is proposed that explores the cooperation in division and competition in combination. Our method employs the theory of hierarchical reinforcement learning (HRL), in which 1) the high-level policy with competitive awareness divides the whole complex system into multiple functional mechanisms, and 2) the low-level policy finishes the control task of each mechanism. Specifically for cooperation, a cascade control module helps the series operation of CCMs, and a forward coupled reasoning module is used to recover the coupling information lost in the division process. On both synthetic systems and a real-world biological regulatory system, the CCM method achieves robust and state-of-the-art control results even with unpredictable random noise. Moreover, generalization results show that reusing prepared specialized CCMs helps to perform well in environments with different confounders and dynamics.
[[2209.06866] Robust Constrained Reinforcement Learning](http://arxiv.org/abs/2209.06866)
Constrained reinforcement learning is to maximize the expected reward subject to constraints on utilities/costs. However, the training environment may not be the same as the test one, due to, e.g., modeling error, adversarial attack, non-stationarity, resulting in severe performance degradation and more importantly constraint violation. We propose a framework of robust constrained reinforcement learning under model uncertainty, where the MDP is not fixed but lies in some uncertainty set, the goal is to guarantee that constraints on utilities/costs are satisfied for all MDPs in the uncertainty set, and to maximize the worst-case reward performance over the uncertainty set. We design a robust primal-dual approach, and further theoretically develop guarantee on its convergence, complexity and robust feasibility. We then investigate a concrete example of $\delta$-contamination uncertainty set, design an online and model-free algorithm and theoretically characterize its sample complexity.
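One way to write down the problem described above is given here as a sketch. The symbols $V_r$, $V_c$, $b$, $\mathcal{P}$, $p_s^a$ and $q$ are notation introduced for illustration, and the direction of the constraint (utility at least $b$) is an assumed convention; the paper may use different notation.

```latex
% Robust constrained RL: maximize worst-case reward while the
% constraint holds for every MDP in the uncertainty set \mathcal{P}.
\max_{\pi} \; \min_{P \in \mathcal{P}} V_r^{\pi, P}
\quad \text{subject to} \quad
\min_{P \in \mathcal{P}} V_c^{\pi, P} \ge b

% \delta-contamination uncertainty set around a nominal kernel p_s^a:
\mathcal{P}_s^a = \left\{ (1 - \delta)\, p_s^a + \delta\, q \;:\; q \in \Delta(\mathcal{S}) \right\},
\qquad 0 \le \delta \le 1
```

Here $\Delta(\mathcal{S})$ denotes the set of probability distributions over the state space, so with probability $\delta$ the next state may be drawn from an arbitrary distribution $q$.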
[[2209.07081] DEQGAN: Learning the Loss Function for PINNs with Generative Adversarial Networks](http://arxiv.org/abs/2209.07081)
Solutions to differential equations are of significant scientific and engineering relevance. Physics-Informed Neural Networks (PINNs) have emerged as a promising method for solving differential equations, but they lack a theoretical justification for the use of any particular loss function. This work presents Differential Equation GAN (DEQGAN), a novel method for solving differential equations using generative adversarial networks to "learn the loss function" for optimizing the neural network. Presenting results on a suite of twelve ordinary and partial differential equations, including the nonlinear Burgers', Allen-Cahn, Hamilton, and modified Einstein's gravity equations, we show that DEQGAN can obtain multiple orders of magnitude lower mean squared errors than PINNs that use $L_2$, $L_1$, and Huber loss functions. We also show that DEQGAN achieves solution accuracies that are competitive with popular numerical methods. Finally, we present two methods to improve the robustness of DEQGAN to different hyperparameter settings.
[[2209.07369] Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization](http://arxiv.org/abs/2209.07369)
We present a minimax optimal learner for the problem of learning predictors robust to adversarial examples at test-time. Interestingly, we find that this requires new algorithmic ideas and approaches to adversarially robust learning. In particular, we show, in a strong negative sense, the suboptimality of the robust learner proposed by Montasser, Hanneke, and Srebro (2019) and a broader family of learners we identify as local learners. Our results are enabled by adopting a global perspective, specifically, through a key technical contribution: the global one-inclusion graph, which may be of independent interest, that generalizes the classical one-inclusion graph due to Haussler, Littlestone, and Warmuth (1994). Finally, as a byproduct, we identify a dimension characterizing qualitatively and quantitatively what classes of predictors $\mathcal{H}$ are robustly learnable. This resolves an open problem due to Montasser et al. (2019), and closes a (potentially) infinite gap between the established upper and lower bounds on the sample complexity of adversarially robust learning.
[[2209.07001] Pose Attention-Guided Profile-to-Frontal Face Recognition](http://arxiv.org/abs/2209.07001)
In recent years, face recognition systems have achieved exceptional success due to promising advances in deep learning architectures. However, they still fail to achieve expected accuracy when matching profile images against a gallery of frontal images. Current approaches either perform pose normalization (i.e., frontalization) or disentangle pose information for face recognition. We instead propose a new approach to utilize pose as an auxiliary information via an attention mechanism. In this paper, we hypothesize that pose attended information using an attention mechanism can guide contextual and distinctive feature extraction from profile faces, which further benefits a better representation learning in an embedded domain. To achieve this, first, we design a unified coupled profile-to-frontal face recognition network. It learns the mapping from faces to a compact embedding subspace via a class-specific contrastive loss. Second, we develop a novel pose attention block (PAB) to specially guide the pose-agnostic feature extraction from profile faces. To be more specific, PAB is designed to explicitly help the network to focus on important features along both channel and spatial dimension while learning discriminative yet pose invariant features in an embedding subspace. To validate the effectiveness of our proposed method, we conduct experiments on both controlled and in the wild benchmarks including Multi-PIE, CFP, IJBC, and show superiority over the state of the arts.
[[2209.07031] A semantic hierarchical graph neural network for text classification](http://arxiv.org/abs/2209.07031)
The key to the text classification task is language representation and important information extraction, and there are many related studies. In recent years, the research on graph neural network (GNN) in text classification has gradually emerged and shown its advantages, but the existing models mainly focus on directly inputting words as graph nodes into the GNN models ignoring the different levels of semantic structure information in the samples. To address the issue, we propose a new hierarchical graph neural network (HieGNN) which extracts corresponding information from word-level, sentence-level and document-level respectively. Experimental results on several benchmark datasets achieve better or similar results compared to several baseline methods, which demonstrate that our model is able to obtain more useful information for classification from samples.
[[2209.07442] Automatic Error Analysis for Document-level Information Extraction](http://arxiv.org/abs/2209.07442)
Document-level information extraction (IE) tasks have recently begun to be revisited in earnest using the end-to-end neural network techniques that have been successful on their sentence-level IE counterparts. Evaluation of the approaches, however, has been limited in a number of dimensions. In particular, the precision/recall/F1 scores typically reported provide few insights on the range of errors the models make. We build on the work of Kummerfeld and Klein (2013) to propose a transformation-based framework for automating error analysis in document-level event and (N-ary) relation extraction. We employ our framework to compare two state-of-the-art document-level template-filling approaches on datasets from three domains; and then, to gauge progress in IE since its inception 30 years ago, vs. four systems from the MUC-4 (1992) evaluation.
[[2209.07479] Gollum: A Gold Standard for Large Scale Multi Source Knowledge Graph Matching](http://arxiv.org/abs/2209.07479)
The number of Knowledge Graphs (KGs) generated with automatic and manual approaches is constantly growing. For an integrated view and usage, an alignment between these KGs is necessary on the schema as well as instance level. While there are approaches that try to tackle this multi source knowledge graph matching problem, large gold standards are missing to evaluate their effectiveness and scalability. We close this gap by presenting Gollum -- a gold standard for large-scale multi source knowledge graph matching with over 275,000 correspondences between 4,149 different KGs. They originate from knowledge graphs derived by applying the DBpedia extraction framework to a large wiki farm. Three variations of the gold standard are made available: (1) a version with all correspondences for evaluating unsupervised matching approaches, and two versions for evaluating supervised matching: (2) one where each KG is contained both in the train and test set, and (3) one where each KG is exclusively contained in the train or the test set.
[[2209.07018] FRANS: Automatic Feature Extraction for Time Series Forecasting](http://arxiv.org/abs/2209.07018)
Feature extraction methods help in dimensionality reduction and capture relevant information. In time series forecasting (TSF), features can be used as auxiliary information to achieve better accuracy. Traditionally, features used in TSF are handcrafted, which requires domain knowledge and significant data-engineering work. In this research, we first introduce a notion of static and dynamic features, which then enables us to develop our autonomous Feature Retrieving Autoregressive Network for Static features (FRANS) that does not require domain knowledge. The method is based on a CNN classifier that is trained to create for each series a collective and unique class representation either from parts of the series or, if class labels are available, from a set of series of the same class. It allows to discriminate series with similar behaviour but from different classes and makes the features extracted from the classifier to be maximally discriminatory. We explore the interpretability of our features, and evaluate the prediction capabilities of the method within the forecasting meta-learning environment FFORMA. Our results show that our features lead to improvement in accuracy in most situations. Once trained our approach creates features orders of magnitude faster than statistical methods.
[[2209.06997] M^4I: Multi-modal Models Membership Inference](http://arxiv.org/abs/2209.06997)
With the development of machine learning techniques, the attention of research has moved from single-modal learning to multi-modal learning, as real-world data exist in the form of different modalities. However, multi-modal models often carry more information than single-modal models, and they are usually applied in sensitive scenarios, such as medical report generation or disease identification. In contrast to existing membership inference attacks against machine learning classifiers, we focus on the setting in which the input and output of the multi-modal model are in different modalities, such as image captioning. This work studies the privacy leakage of multi-modal models through the lens of the membership inference attack, a process of determining whether or not a data record was involved in the model training process. To achieve this, we propose Multi-modal Models Membership Inference (M^4I) with two attack methods to infer the membership status, named metric-based (MB) M^4I and feature-based (FB) M^4I, respectively. More specifically, MB M^4I adopts similarity metrics while attacking to infer target data membership. FB M^4I uses a pre-trained shadow multi-modal feature extractor to carry out the inference attack by comparing the similarities between extracted input and output features. Extensive experimental results show that both attack methods can achieve strong performance: average attack success rates of 72.5% and 94.83%, respectively, can be obtained under unrestricted scenarios. Moreover, we evaluate multiple defense mechanisms against our attacks. The source code of M^4I attacks is publicly available at https://github.com/MultimodalMI/Multimodal-membership-inference.git.
[[2209.07267] Compressed Particle-Based Federated Bayesian Learning and Unlearning](http://arxiv.org/abs/2209.07267)
Conventional frequentist FL schemes are known to yield overconfident decisions. Bayesian FL addresses this issue by allowing agents to process and exchange uncertainty information encoded in distributions over the model parameters. However, this comes at the cost of a larger per-iteration communication overhead. This letter investigates whether Bayesian FL can still provide advantages in terms of calibration when constraining communication bandwidth. We present compressed particle-based Bayesian FL protocols for FL and federated "unlearning" that apply quantization and sparsification across multiple particles. The experimental results confirm that the benefits of Bayesian FL are robust to bandwidth constraints.
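The two compression operations named above, sparsification and quantization, can be sketched on a single parameter vector (one "particle") as follows. Top-k magnitude sparsification and uniform quantization are common choices assumed here for illustration; the letter's exact operators are not given in the abstract, and the function names are hypothetical.

```python
def top_k_sparsify(values, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]

def uniform_quantize(values, levels):
    """Round each entry to the nearest of `levels` evenly spaced points
    between the minimum and maximum of the vector (levels >= 2)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / (levels - 1)
    return [lo + round((v - lo) / step) * step for v in values]

particle = [0.1, -2.0, 0.5, 3.0]
sparse = top_k_sparsify(particle, k=2)
coarse = uniform_quantize([0.0, 0.4, 1.0], levels=2)
```

In a particle-based protocol, each agent would apply such operators to every particle it transmits, trading some fidelity for a smaller per-iteration payload.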
[[2209.06850] CAT: Controllable Attribute Translation for Fair Facial Attribute Classification](http://arxiv.org/abs/2209.06850)
As the social impact of visual recognition has been under scrutiny, several protected-attribute balanced datasets emerged to address dataset bias in imbalanced datasets. However, in facial attribute classification, dataset bias stems from both protected attribute level and facial attribute level, which makes it challenging to construct a multi-attribute-level balanced real dataset. To bridge the gap, we propose an effective pipeline to generate high-quality and sufficient facial images with desired facial attributes and supplement the original dataset to be a balanced dataset at both levels, which theoretically satisfies several fairness criteria. The effectiveness of our method is verified on sex classification and facial attribute classification by yielding comparable task performance as the original dataset and further improving fairness in a comprehensive fairness evaluation with a wide range of metrics. Furthermore, our method outperforms both resampling and balanced dataset construction to address dataset bias, and debiasing models to address task bias.
[[2209.06967] A novel illumination condition varied image dataset-Food Vision Dataset (FVD) for fair and reliable consumer acceptability predictions from food](http://arxiv.org/abs/2209.06967)
Recent advances in artificial intelligence promote a wide range of computer vision applications in many different domains. Digital cameras, acting as human eyes, can perceive fundamental object properties, such as shapes and colors, and can be further used for conducting high-level tasks, such as image classification, and object detections. Human perceptions have been widely recognized as the ground truth for training and evaluating computer vision models. However, in some cases, humans can be deceived by what they have seen. Well-functioned human vision relies on stable external lighting while unnatural illumination would influence human perception of essential characteristics of goods. To evaluate the illumination effects on human and computer perceptions, the group presents a novel dataset, the Food Vision Dataset (FVD), to create an evaluation benchmark to quantify illumination effects, and to push forward developments of illumination estimation methods for fair and reliable consumer acceptability prediction from food appearances. FVD consists of 675 images captured under 3 different power and 5 different temperature settings every alternate day for five such days.
[[2209.07044] Fair Inference for Discrete Latent Variable Models](http://arxiv.org/abs/2209.07044)
It is now well understood that machine learning models, trained on data without due care, often exhibit unfair and discriminatory behavior against certain populations. Traditional algorithmic fairness research has mainly focused on supervised learning tasks, particularly classification. While fairness in unsupervised learning has received some attention, the literature has primarily addressed fair representation learning of continuous embeddings. In this paper, we conversely focus on unsupervised learning using probabilistic graphical models with discrete latent variables. We develop a fair stochastic variational inference technique for the discrete latent variables, which is accomplished by including a fairness penalty on the variational distribution that aims to respect the principles of intersectionality, a critical lens on fairness from the legal, social science, and humanities literature, and then optimizing the variational parameters under this penalty. We first show the utility of our method in improving equity and fairness for clustering using naïve Bayes and Gaussian mixture models on benchmark datasets. To demonstrate the generality of our approach and its potential for real-world impact, we then develop a special-purpose graphical model for criminal justice risk assessments, and use our fairness approach to prevent the inferences from encoding unfair societal biases.
[[2209.07047] iFlipper: Label Flipping for Individual Fairness](http://arxiv.org/abs/2209.07047)
As machine learning becomes prevalent, mitigating any unfairness present in the training data becomes critical. Among the various notions of fairness, this paper focuses on the well-known individual fairness, which states that similar individuals should be treated similarly. While individual fairness can be improved when training a model (in-processing), we contend that fixing the data before model training (pre-processing) is a more fundamental solution. In particular, we show that label flipping is an effective pre-processing technique for improving individual fairness. Our system iFlipper solves the optimization problem of minimally flipping labels given a limit to the individual fairness violations, where a violation occurs when two similar examples in the training data have different labels. We first prove that the problem is NP-hard. We then propose an approximate linear programming algorithm and provide theoretical guarantees on how close its result is to the optimal solution in terms of the number of label flips. We also propose techniques for making the linear programming solution more optimal without exceeding the violations limit. Experiments on real datasets show that iFlipper significantly outperforms other pre-processing baselines in terms of individual fairness and accuracy on unseen test sets. In addition, iFlipper can be combined with in-processing techniques for even better results.
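The violation notion in the abstract above (two similar training examples with different labels) can be illustrated with a small sketch. This is not iFlipper's algorithm. It only counts violations before and after a label flip. The Euclidean distance, the `threshold` parameter, and the function name `count_violations` are assumptions made for illustration.

```python
from itertools import combinations
import math


def count_violations(points, labels, threshold):
    """Count pairs of similar examples whose labels differ.

    Two examples count as "similar" when their Euclidean distance is at
    most `threshold` (an assumed similarity rule, not taken from the paper).
    """
    violations = 0
    for i, j in combinations(range(len(points)), 2):
        similar = math.dist(points[i], points[j]) <= threshold
        if similar and labels[i] != labels[j]:
            violations += 1
    return violations


# Toy data: two tight clusters; the first cluster has inconsistent labels.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels = [0, 1, 1, 1]
print(count_violations(points, labels, threshold=0.5))   # 1

# Flipping a single label removes the violation.
flipped = [0, 0, 1, 1]
print(count_violations(points, flipped, threshold=0.5))  # 0
```

The brute-force pair loop is quadratic in the number of examples, which is fine for a toy check but not for real datasets.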
[[2209.07190] Adaptive Fairness Improvement Based on Causality Analysis](http://arxiv.org/abs/2209.07190)
Given a discriminating neural network, the problem of fairness improvement is to systematically reduce discrimination without significantly sacrificing its performance (i.e., accuracy). Multiple categories of fairness improving methods have been proposed for neural networks, including pre-processing, in-processing and post-processing. Our empirical study, however, shows that these methods are not always effective (e.g., they may improve fairness at the price of a huge accuracy drop) or even not helpful (e.g., they may worsen both fairness and accuracy). In this work, we propose an approach which adaptively chooses the fairness improving method based on causality analysis. That is, we choose the method based on how the neurons and attributes responsible for unfairness are distributed among the input attributes and the hidden neurons. Our experimental evaluation shows that our approach is effective (i.e., it always identifies the best fairness improving method) and efficient (i.e., with an average time overhead of 5 minutes).
[[2209.07219] Training Neural Networks in Single vs Double Precision](http://arxiv.org/abs/2209.07219)
The commitment to single-precision floating-point arithmetic is widespread in the deep learning community. To evaluate whether this commitment is justified, the influence of computing precision (single and double precision) on the optimization performance of the Conjugate Gradient (CG) method (a second-order optimization algorithm) and RMSprop (a first-order algorithm) has been investigated. Neural networks with one to five fully connected hidden layers, moderate or strong nonlinearity, and up to 4 million network parameters have been optimized for Mean Square Error (MSE). The training tasks have been set up so that their MSE minimum was known to be zero. Computational experiments show that single precision can keep up (with superlinear convergence) with double precision as long as line search finds an improvement. First-order methods such as RMSprop do not benefit from double precision. However, for moderately nonlinear tasks, CG is clearly superior. For strongly nonlinear tasks, both algorithm classes find only solutions that are fairly poor in terms of mean square error relative to the output variance. CG with double floating-point precision is superior whenever the solutions have the potential to be useful for the application goal.
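As a small illustration of why precision can matter once improvements become tiny, the sketch below shows an update that survives in double precision but vanishes when stored in single precision. It is not taken from the paper's experiments. The helper names `to_float32` and `step_is_lost` are hypothetical, and rounding through `struct` is used only to emulate 32-bit storage without extra libraries.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def step_is_lost(weight, update, single):
    """Return True if adding `update` leaves the stored weight unchanged."""
    if single:
        w32 = to_float32(weight)
        # Emulate a float32 add: round the operands and the result to 32 bits.
        new = to_float32(w32 + to_float32(update))
        return new == w32
    return weight + update == weight


# An update of 1e-8 on a weight of 1.0 is below float32 resolution near 1.0
# (about 1.2e-7) but well within float64 resolution (about 2.2e-16).
print(step_is_lost(1.0, 1e-8, single=True))   # True
print(step_is_lost(1.0, 1e-8, single=False))  # False
```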
[[2209.07245] Efficient first-order predictor-corrector multiple objective optimization for fair misinformation detection](http://arxiv.org/abs/2209.07245)
Multiple-objective optimization (MOO) aims to simultaneously optimize multiple conflicting objectives and has found important applications in machine learning, such as minimizing classification loss and discrepancy in treating different populations for fairness. At optimality, further optimizing one objective will necessarily harm at least one other objective, and decision-makers need to comprehensively explore multiple optima (called the Pareto front) to pinpoint one final solution. We address the efficiency of finding the Pareto front. First, finding the front from scratch using stochastic multi-gradient descent (SMGD) is expensive with large neural networks and datasets. We propose to explore the Pareto front as a manifold from a few initial optima, based on a predictor-corrector method. Second, for each exploration step, the predictor solves a large-scale linear system that scales quadratically in the number of model parameters and requires one backpropagation to evaluate a second-order Hessian-vector product per iteration of the solver. We propose a Gauss-Newton approximation that only scales linearly, and that requires only a first-order inner product per iteration. This also allows for a choice between the MINRES and conjugate gradient methods when approximately solving the linear system. The innovations make predictor-corrector possible for large networks. Experiments on multi-objective (fairness and accuracy) misinformation detection tasks show that 1) the predictor-corrector method can find Pareto fronts better than or similar to SMGD in less time; and 2) the proposed first-order method does not harm the quality of the Pareto front identified by the second-order method, while further reducing running time.
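The definition of Pareto optimality used above can be made concrete with a brute-force filter over candidate solutions. This sketch is not the predictor-corrector method from the abstract. It only shows what "non-dominated" means when both objectives are minimized. The candidate values and the names `dominates` and `pareto_front` are made up for illustration.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (classification loss, unfairness) pairs for five models.
candidates = [(0.10, 0.30), (0.15, 0.20), (0.20, 0.25), (0.30, 0.05), (0.12, 0.35)]
print(pareto_front(candidates))
# [(0.1, 0.3), (0.15, 0.2), (0.3, 0.05)]
```

Here `(0.20, 0.25)` is dropped because `(0.15, 0.20)` is better on both objectives, and `(0.12, 0.35)` is dropped because `(0.10, 0.30)` is.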
[[2209.07312] Multicalibrated Regression for Downstream Fairness](http://arxiv.org/abs/2209.07312)
We show how to take a regression function $\hat{f}$ that is appropriately "multicalibrated" and efficiently post-process it into an approximately error minimizing classifier satisfying a large variety of fairness constraints. The post-processing requires no labeled data, and only a modest amount of unlabeled data and computation. The computational and sample complexity requirements of computing $\hat{f}$ are comparable to the requirements for solving a single fair learning task optimally, but it can in fact be used to solve many different downstream fairness-constrained learning problems efficiently. Our post-processing method easily handles intersecting groups, generalizing prior work on post-processing regression functions to satisfy fairness constraints that only applied to disjoint groups. Our work extends recent work showing that multicalibrated regression functions are "omnipredictors" (i.e. can be post-processed to optimally solve unconstrained ERM problems) to constrained optimization.
[[2209.07463] Omnipredictors for Constrained Optimization](http://arxiv.org/abs/2209.07463)
The notion of omnipredictors (Gopalan, Kalai, Reingold, Sharan and Wieder, ITCS 2021) suggested a new paradigm for loss minimization. Rather than learning a predictor based on a known loss function, omnipredictors can easily be post-processed to minimize any one of a rich family of loss functions compared with the loss of a class $C$. It has been shown that such omnipredictors exist and are implied (for all convex and Lipschitz loss functions) by the notion of multicalibration from the algorithmic fairness literature. Nevertheless, it is often the case that the action selected must obey some additional constraints (such as capacity or parity constraints). In itself, the original notion of omnipredictors does not apply in this well-motivated and heavily studied context of constrained loss minimization.
In this paper, we introduce omnipredictors for constrained optimization and study their complexity and implications. The notion that we introduce allows the learner to be unaware of the loss function that will be later assigned as well as the constraints that will be later imposed, as long as the subpopulations that are used to define these constraints are known.
The paper shows how to obtain omnipredictors for constrained optimization problems, relying on appropriate variants of multicalibration. For some interesting constraints and general loss functions and for general constraints and some interesting loss functions, we show how omnipredictors are implied by a variant of multicalibration that is similar in complexity to standard multicalibration. We demonstrate that in the general case, standard multicalibration is insufficient and show that omnipredictors are implied by multicalibration with respect to a class containing all the level sets of hypotheses in $C$. We also investigate the implications when the constraints are group fairness notions.
[[2209.07046] Exploring Visual Interpretability for Contrastive Language-Image Pre-training](http://arxiv.org/abs/2209.07046)
Contrastive Language-Image pre-training (CLIP) learns rich representations via readily available supervision from natural language. It can improve general performance on downstream vision tasks, including but not limited to zero-shot, long-tail, segmentation, retrieval, captioning and video tasks. However, to the best of our knowledge, the visual interpretability of CLIP has not been studied yet. To provide visual explanations of its predictions, we propose the Image-Text Similarity Map (ITSM). Based on it, we surprisingly find that CLIP prefers background regions to foregrounds and presents erroneous visualizations that contradict human understanding. Experimentally, we find the devil is in the pooling part, where inappropriate pooling methods lead to a phenomenon called semantic shift. To correct and boost the visualization results, we propose Masked Max Pooling, with the attention map from the self-supervised image encoder. Meanwhile, the interpretability task and the recognition task require different representations. To address the problem, we propose dual projections to cater to this requirement. We integrate the above methods as Interpretable Contrastive Language-Image pre-training (ICLIP). Experiments suggest ICLIP greatly improves interpretability. For example, the nontrivial improvements are $32.85\%$ and $49.10\%$, respectively, on the VOC 2012 dataset.
[[2209.07089] Constrained Update Projection Approach to Safe Policy Optimization](http://arxiv.org/abs/2209.07089)
Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a novel policy optimization method based on the Constrained Update Projection framework that enjoys a rigorous safety guarantee. Central to our CUP development are the newly proposed surrogate functions along with the performance bound. Compared to previous safe RL methods, CUP offers the following benefits: 1) CUP generalizes the surrogate functions to the generalized advantage estimator (GAE), leading to strong empirical performance; 2) CUP unifies performance bounds, providing a better understanding and interpretability for some existing algorithms; 3) CUP provides a non-convex implementation via only first-order optimizers, which does not require any strong approximation on the convexity of the objectives. To validate our CUP method, we compared CUP against a comprehensive list of safe RL baselines on a wide range of tasks. Experiments show the effectiveness of CUP both in terms of reward and safety constraint satisfaction. We have opened the source code of CUP at https://github.com/RL-boxes/Safe-RL/tree/main/CUP.
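For readers unfamiliar with the generalized advantage estimator (GAE) mentioned above, here is a minimal sketch of the standard GAE recursion for a single trajectory. It is not CUP's implementation and does not include the constrained projection step. The function name `gae`, the default `gamma` and `lam` values, and the toy inputs are assumptions.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Standard generalized advantage estimation for one trajectory.

    `values` must have len(rewards) + 1 entries: the value estimate of
    every visited state plus a bootstrap value for the final state.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# With gamma = lam = 1 and zero value estimates, GAE reduces to reward-to-go.
print(gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=1.0, lam=1.0))
# [3.0, 2.0, 1.0]
```

Setting `lam=0` gives the one-step TD residual (low variance, more bias), while `lam=1` gives the Monte Carlo style estimate (less bias, higher variance). In a safe RL setting the same recursion could plausibly be applied to a cost signal as well as to the reward.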
[[2209.07175] Literature Review of various Fuzzy Rule based Systems](http://arxiv.org/abs/2209.07175)
Fuzzy rule based systems (FRBSs) are rule-based systems which use linguistic fuzzy variables as antecedents and consequents to represent human-understandable knowledge. They have been applied to various applications and areas throughout the literature. However, FRBSs suffer from many drawbacks such as uncertainty representation, a high number of rules, interpretability loss, and high computational time for learning. To overcome these issues, many extensions of FRBSs exist. In this paper, we present an overview and literature review for various types and prominent areas of fuzzy systems (FRBSs), namely genetic fuzzy systems (GFS), hierarchical fuzzy systems (HFS), neuro fuzzy systems (NFS), evolving fuzzy systems (eFS), FRBSs for big data, FRBSs for imbalanced data, interpretability in FRBSs, and FRBSs which use cluster centroids as fuzzy rules, during the years 2010-2021. GFS use genetic/evolutionary approaches to improve the learning ability of FRBSs, HFS address the curse of dimensionality for FRBSs, NFS improve the approximation ability of FRBSs using neural networks, and eFS consider dynamic systems for streaming data. FRBSs are seen as good solutions for big data and imbalanced data. In recent years, interpretability in FRBSs has gained popularity due to high-dimensional and big data, and rules are initialized with cluster centroids to limit the number of rules in FRBSs. This paper also highlights important contributions, publication statistics and current trends in the field. The paper also addresses several open research areas which need further attention from the FRBSs research community.
[[2209.07066] A Lattice-Based Embedding Method for Reversible Audio Watermarking](http://arxiv.org/abs/2209.07066)
Reversible audio watermarking (RAW) is a promising technique in various applications. To simultaneously meet the demand of achieving high imperceptibility and robustness, this paper proposes a novel RAW scheme based on lattices. The scheme is referred to as Meet-in-the-Middle Embedding (MME), in which the lattice quantization errors are properly scaled and added back to the quantized host signals. Simulations show that MME excels in a wide range of metrics including signal-to-watermark ratio (SWR), objective difference grade (ODG), and bit error rate (BER).