[[2302.08121] Practically Efficient Secure Computation of Rank-based Statistics Over Distributed Datasets](http://arxiv.org/abs/2302.08121) #secure
In this paper, we propose a practically efficient model for securely computing rank-based statistics, e.g., median, percentiles and quartiles, over distributed datasets in the malicious setting without leaking individual data privacy. Based on the binary search technique of Aggarwal et al. (EUROCRYPT \textquotesingle 04), we respectively present an interactive protocol and a non-interactive protocol, involving at most $\log ||R||$ rounds, where $||R||$ is the range size of the dataset elements. Besides, we introduce a series of optimisation techniques to reduce the round complexity. Our computing model is modular and can be instantiated with either homomorphic encryption or secret-sharing schemes. Compared to the state-of-the-art solutions, it provides stronger security and privacy while maintaining high efficiency and accuracy. Unlike differential-privacy-based solutions, it does not suffer a trade-off between accuracy and privacy. On the other hand, it only involves $O(N \log ||R||)$ time complexity, which is far more efficient than those bitwise-comparison-based solutions with $O(N^2\log ||R||)$ time complexity, where $N$ is the dataset size. Finally, we provide a UC-secure instantiation with the threshold Paillier cryptosystem and $\Sigma$-protocol zero-knowledge proofs of knowledge.
[[2302.08237] A cloud-based deep learning system for improving crowd safety at event entrances](http://arxiv.org/abs/2302.08237) #security
Crowding at the entrances of large events may lead to critical and life-threatening situations, particularly when people start pushing each other to reach the event faster. A system for automatic and timely identification of pushing behavior would help organizers and security forces to intervene early and mitigate dangerous situations. In this paper, we propose a cloud-based deep learning system for early detection of pushing automatically in the live video stream of crowded event entrances. The proposed system relies mainly on two models: a pre-trained deep optical flow and an adapted version of the EfficientNetV2B0 classifier. The optical flow model extracts the characteristics of the crowd motion in the live video stream, while the classifier analyses the crowd motion and annotates pushing patches in the live stream. A novel dataset is generated based on five real-world experiments and their associated ground truth data to train the adapted EfficientNetV2B0 model. The experimental situations simulated a crowded event entrance, and social psychologists manually created the ground truths for each video experiment. Several experiments on the videos and the generated dataset are carried out to evaluate the accuracy and annotation delay time of the proposed system. Furthermore, the experts manually revised the annotation results of the system. Findings indicate that the system identified pushing behaviors with an accuracy rate of 89% within an acceptable delay time.
[[2302.07953] AI Security Threats against Pervasive Robotic Systems: A Course for Next Generation Cybersecurity Workforce](http://arxiv.org/abs/2302.07953) #security
Robotics, automation, and related Artificial Intelligence (AI) systems have become pervasive bringing in concerns related to security, safety, accuracy, and trust. With growing dependency on physical robots that work in close proximity to humans, the security of these systems is becoming increasingly important to prevent cyber-attacks that could lead to privacy invasion, critical operations sabotage, and bodily harm. The current shortfall of professionals who can defend such systems demands development and integration of such a curriculum. This course description includes details about seven self-contained and adaptive modules on "AI security threats against pervasive robotic systems". Topics include: 1) Introduction, examples of attacks, and motivation; 2) - Robotic AI attack surfaces and penetration testing; 3) - Attack patterns and security strategies for input sensors; 4) - Training attacks and associated security strategies; 5) - Inference attacks and associated security strategies; 6) - Actuator attacks and associated security strategies; and 7) - Ethics of AI, robotics, and cybersecurity.
[[2302.08000] How Effective is Multiple-Vantage-Point Domain Control Validation?](http://arxiv.org/abs/2302.08000) #security
Multiple-vantage-point domain control validation (multiVA) is an emerging defense for mitigating BGP hijacking attacks against certificate authorities. While the adoption of multiVA is on the rise, little work has quantified its effectiveness against BGP hijacks in the wild. We bridge the gap by presenting the first analysis framework that measures the security of a multiVA deployment under real-world network configurations (e.g., DNS and RPKI). Our framework accurately models the attack surface of multiVA by 1) considering the attacks on DNS nameservers involved in domain validation, 2) considering deployed practical security techniques such as RPKI, 3) performing fine-grained internet-scale analysis to compute multiVA resilience (i.e., how difficult it is to launch a BGP hijack against a domain and get a bogus certificate under multiVA). We use our framework to perform a rigorous security analysis of the multiVA deployment of Let's Encrypt, using a dataset that consists of about 1 million certificates and 31 billion DNS queries collected over four months. Our analysis shows while DNS does enlarge the attack surface of multiVA, the of Let's Encrypt's multiVA deployment still offers an 88% median resilience against BGP hijacks, a notable improvement over 76% offered by single-vantage-point validation. RPKI, even in its current state of partial deployment, effectively mitigates BGP attacks and improves the security of the deployment by 15% as compared to the case without considering RPKI. Exploring 11,000 different multiVA configurations, we find that Let's Encrypt's deployment can be further enhanced to achieve a resilience of over 99% by using a full quorum policy with only two additional vantage points in different public clouds.
[[2302.08260] HE-MAN -- Homomorphically Encrypted MAchine learning with oNnx models](http://arxiv.org/abs/2302.08260) #security
Machine learning (ML) algorithms are increasingly important for the success of products and services, especially considering the growing amount and availability of data. This also holds for areas handling sensitive data, e.g. applications processing medical data or facial images. However, people are reluctant to pass their personal sensitive data to a ML service provider. At the same time, service providers have a strong interest in protecting their intellectual property and therefore refrain from publicly sharing their ML model. Fully homomorphic encryption (FHE) is a promising technique to enable individuals using ML services without giving up privacy and protecting the ML model of service providers at the same time. Despite steady improvements, FHE is still hardly integrated in today's ML applications.
We introduce HE-MAN, an open-source two-party machine learning toolset for privacy preserving inference with ONNX models and homomorphically encrypted data. Both the model and the input data do not have to be disclosed. HE-MAN abstracts cryptographic details away from the users, thus expertise in FHE is not required for either party. HE-MAN 's security relies on its underlying FHE schemes. For now, we integrate two different homomorphic encryption schemes, namely Concrete and TenSEAL. Compared to prior work, HE-MAN supports a broad range of ML models in ONNX format out of the box without sacrificing accuracy. We evaluate the performance of our implementation on different network architectures classifying handwritten digits and performing face recognition and report accuracy and latency of the homomorphically encrypted inference. Cryptographic parameters are automatically derived by the tools. We show that the accuracy of HE-MAN is on par with models using plaintext input while inference latency is several orders of magnitude higher compared to the plaintext case.
[[2302.08359] Towards a Unified Cybersecurity Testing Lab for Satellite, Aerospace, Avionics, Maritime, Drone (SAAMD) technologies and communications](http://arxiv.org/abs/2302.08359) #security
Aviation, maritime, and aerospace traffic control, radar, communication, and software technologies received increasing attention in the research literature over the past decade, as software-defined radios have enabled practical wireless attacks on communication links previously thought to be unreachable by unskilled or low-budget attackers. Moreover, recently it became apparent that both offensive and defensive cybersecurity has become a strategically differentiating factor for such technologies on the war fields (e.g., Ukraine), affecting both civilian and military missions regardless of their involvement. However, attacks and countermeasures are usually studied in simulated settings, thus introducing the lack of realism or non-systematic and highly customized practical setups, thus introducing high costs, overheads, and less reproducibility. Our "Unified Cybersecurity Testing Lab" seeks to close this gap by building a laboratory that can provide a systematic, affordable, highly-flexible, and extensible setup.
In this paper, we introduce and motivate our "Unified Cybersecurity Testing Lab for Satellite, Aerospace, Avionics, Maritime, Drone (SAAMD)" technologies and communications, as well as some peer-reviewed results and evaluation of the targeted threat vectors. We show via referenced peer-reviewed works that the current modules of the lab were successfully used to realistically attack and analyze air-traffic control, radar, communication, and software technologies such as ADS-B, AIS, ACARS, EFB, EPIRB and COSPAS-SARSAT. We are currently developing and integrating support for additional technologies (e.g., CCSDS, FLARM), and we plan future extensions on our own as well as in collaboration with research and industry. Our "Unified Cybersecurity Testing Lab" is open for use, experimentation, and collaboration with other researchers, contributors and interested parties.
[[2302.08361] Cybersecurity of COSPAS-SARSAT and EPIRB: threat and attacker models, exploits, future research](http://arxiv.org/abs/2302.08361) #security
COSPAS-SARSAT is an International programme for "Search and Rescue" (SAR) missions based on the "Satellite Aided Tracking" system (SARSAT). It is designed to provide accurate, timely, and reliable distress alert and location data to help SAR authorities of participating countries to assist persons and vessels in distress. Two types of satellite constellations serve COSPAS-SARSAT, low earth orbit search and rescue (LEOSAR) and geostationary orbiting search and rescue (GEOSAR). Despite its nearly-global deployment and critical importance, unfortunately enough, we found that COSPAS-SARSAT protocols and standard 406 MHz transmissions lack essential means of cybersecurity.
In this paper, we investigate the cybersecurity aspects of COSPAS-SARSAT space-/satellite-based systems. In particular, we practically and successfully implement and demonstrate the first (to our knowledge) attacks on COSPAS-SARSAT 406 MHz protocols, namely replay, spoofing, and protocol fuzzing on EPIRB protocols. We also identify a set of core research challenges preventing more effective cybersecurity research in the field and outline the main cybersecurity weaknesses and possible mitigations to increase the system's cybersecurity level.
[[2302.07917] Evaluating Trade-offs in Computer Vision Between Attribute Privacy, Fairness and Utility](http://arxiv.org/abs/2302.07917) #privacy
This paper investigates to what degree and magnitude tradeoffs exist between utility, fairness and attribute privacy in computer vision. Regarding privacy, we look at this important problem specifically in the context of attribute inference attacks, a less addressed form of privacy. To create a variety of models with different preferences, we use adversarial methods to intervene on attributes relating to fairness and privacy. We see that that certain tradeoffs exist between fairness and utility, privacy and utility, and between privacy and fairness. The results also show that those tradeoffs and interactions are more complex and nonlinear between the three goals than intuition would suggest.
[[2302.08374] Efficiency 360: Efficient Vision Transformers](http://arxiv.org/abs/2302.08374) #privacy
Transformers are widely used for solving tasks in natural language processing, computer vision, speech, and music domains. In this paper, we talk about the efficiency of transformers in terms of memory (the number of parameters), computation cost (number of floating points operations), and performance of models, including accuracy, the robustness of the model, and fair \& bias-free features. We mainly discuss the vision transformer for the image classification task. Our contribution is to introduce an efficient 360 framework, which includes various aspects of the vision transformer, to make it more efficient for industrial applications. By considering those applications, we categorize them into multiple dimensions such as privacy, robustness, transparency, fairness, inclusiveness, continual learning, probabilistic models, approximation, computational complexity, and spectral complexity. We compare various vision transformer models based on their performance, the number of parameters, and the number of floating point operations (FLOPs) on multiple datasets.
[[2302.07956] Tight Auditing of Differentially Private Machine Learning](http://arxiv.org/abs/2302.07956) #privacy
Auditing mechanisms for differential privacy use probabilistic means to empirically estimate the privacy level of an algorithm. For private machine learning, existing auditing mechanisms are tight: the empirical privacy estimate (nearly) matches the algorithm's provable privacy guarantee. But these auditing techniques suffer from two limitations. First, they only give tight estimates under implausible worst-case assumptions (e.g., a fully adversarial dataset). Second, they require thousands or millions of training runs to produce non-trivial statistical estimates of the privacy leakage.
This work addresses both issues. We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets -- if the adversary can see all model updates during training. Prior auditing works rely on the same assumption, which is permitted under the standard differential privacy threat model. This threat model is also applicable, e.g., in federated learning settings. Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy. We demonstrate the utility of our improved auditing schemes by surfacing implementation bugs in private machine learning code that eluded prior auditing techniques.
[[2302.07975] Multi-Task Differential Privacy Under Distribution Skew](http://arxiv.org/abs/2302.07975) #privacy
We study the problem of multi-task learning under user-level differential privacy, in which $n$ users contribute data to $m$ tasks, each involving a subset of users. One important aspect of the problem, that can significantly impact quality, is the distribution skew among tasks. Certain tasks may have much fewer data samples than others, making them more susceptible to the noise added for privacy. It is natural to ask whether algorithms can adapt to this skew to improve the overall utility.
We give a systematic analysis of the problem, by studying how to optimally allocate a user's privacy budget among tasks. We propose a generic algorithm, based on an adaptive reweighting of the empirical loss, and show that when there is task distribution skew, this gives a quantifiable improvement of excess empirical risk.
Experimental studies on recommendation problems that exhibit a long tail of small tasks, demonstrate that our methods significantly improve utility, achieving the state of the art on two standard benchmarks.
[[2302.07992] Vector-based Efficient Data Hiding in Encrypted Images via Multi-MSB Replacement](http://arxiv.org/abs/2302.07992) #privacy
As an essential technique for data privacy protection, reversible data hiding in encrypted images (RDHEI) methods have drawn intensive research interest in recent years. In response to the increasing demand for protecting data privacy, novel methods that perform RDHEI are continually being developed. We propose two effective multi-MSB (most significant bit) replacement-based approaches that yield comparably high data embedding capacity, improve overall processing speed, and enhance reconstructed images' quality. Our first method, Efficient Multi-MSB Replacement-RDHEI (EMR-RDHEI), obtains higher data embedding rates (DERs, also known as payloads) and better visual quality in reconstructed images when compared with many other state-of-the-art methods. Our second method, Lossless Multi-MSB Replacement-RDHEI (LMR-RDHEI), can losslessly recover original images after an information embedding process is performed. To verify the accuracy of our methods, we compared them with other recent RDHEI techniques and performed extensive experiments using the widely accepted BOWS-2 dataset. Our experimental results showed that the DER of our EMR-RDHEI method ranged from 1.2087 bit per pixel (bpp) to 6.2682 bpp with an average of 3.2457 bpp. For the LMR-RDHEI method, the average DER was 2.5325 bpp, with a range between 0.2129 bpp and 6.0168 bpp. Our results demonstrate that these methods outperform many other state-of-the-art RDHEI algorithms. Additionally, the multi-MSB replacement-based approach provides a clean design and efficient vectorized implementation.
[[2302.08044] Balancing Privacy Protection and Interpretability in Federated Learning](http://arxiv.org/abs/2302.08044) #privacy
Federated learning (FL) aims to collaboratively train the global model in a distributed manner by sharing the model parameters from local clients to a central server, thereby potentially protecting users' private information. Nevertheless, recent studies have illustrated that FL still suffers from information leakage as adversaries try to recover the training data by analyzing shared parameters from local clients. To deal with this issue, differential privacy (DP) is adopted to add noise to the gradients of local models before aggregation. It, however, results in the poor performance of gradient-based interpretability methods, since some weights capturing the salient region in feature map will be perturbed. To overcome this problem, we propose a simple yet effective adaptive differential privacy (ADP) mechanism that selectively adds noisy perturbations to the gradients of client models in FL. We also theoretically analyze the impact of gradient perturbation on the model interpretability. Finally, extensive experiments on both IID and Non-IID data demonstrate that the proposed ADP can achieve a good trade-off between privacy and interpretability in FL.
[[2302.08066] Masking and Mixing Adversarial Training](http://arxiv.org/abs/2302.08066) #attack
While convolutional neural networks (CNNs) have achieved excellent performances in various computer vision tasks, they often misclassify with malicious samples, a.k.a. adversarial examples. Adversarial training is a popular and straightforward technique to defend against the threat of adversarial examples. Unfortunately, CNNs must sacrifice the accuracy of standard samples to improve robustness against adversarial examples when adversarial training is used. In this work, we propose Masking and Mixing Adversarial Training (M2AT) to mitigate the trade-off between accuracy and robustness. We focus on creating diverse adversarial examples during training. Specifically, our approach consists of two processes: 1) masking a perturbation with a binary mask and 2) mixing two partially perturbed images. Experimental results on CIFAR-10 dataset demonstrate that our method achieves better robustness against several adversarial attacks than previous methods.
[[2302.08320] Introduction to Presentation Attacks in Signature Biometrics and Recent Advances](http://arxiv.org/abs/2302.08320) #attack
Applications based on biometric authentication have received a lot of interest in the last years due to the breathtaking results obtained using personal traits such as face or fingerprint. However, it is important not to forget that these biometric systems have to withstand different types of possible attacks. This chapter carries out an analysis of different Presentation Attack (PA) scenarios for on-line handwritten signature verification. The main contributions of this chapter are: i) an updated overview of representative methods for Presentation Attack Detection (PAD) in signature biometrics; ii) a description of the different levels of PAs existing in on-line signature verification regarding the amount of information available to the impostor, as well as the training, effort, and ability to perform the forgeries; and iii) an evaluation of the system performance in signature biometrics under different scenarios considering recent publicly available signature databases, DeepSignDB and SVC2021_EvalDB. This work is in line with recent efforts in the Common Criteria standardization community towards security evaluation of biometric systems.
[[2302.07941] An Experimentation Infrastructure for Quantitative Measurements of Cyber Resilience](http://arxiv.org/abs/2302.07941) #attack
The vulnerability of cyber-physical systems to cyber attack is well known, and the requirement to build cyber resilience into these systems has been firmly established. The key challenge this paper addresses is that maturing this discipline requires the development of techniques, tools, and processes for objectively, rigorously, and quantitatively measuring the attributes of cyber resilience. Researchers and program managers need to be able to determine if the implementation of a resilience solution actually increases the resilience of the system. In previous work, a table top exercise was conducted using a notional heavy vehicle on a fictitious military mission while under a cyber attack. While this exercise provided some useful data, more and higher fidelity data is required to refine the measurement methodology. This paper details the efforts made to construct a cost-effective experimentation infrastructure to provide such data. It also presents a case study using some of the data generated by the infrastructure.
[[2302.07982] Correlation-Aware Neural Networks for DDoS Attack Detection In IoT Systems](http://arxiv.org/abs/2302.07982) #attack
We present a comprehensive study on applying machine learning to detect distributed Denial of service (DDoS) attacks using large-scale Internet of Things (IoT) systems. While prior works and existing DDoS attacks have largely focused on individual nodes transmitting packets at a high volume, we investigate more sophisticated futuristic attacks that use large numbers of IoT devices and camouflage their attack by having each node transmit at a volume typical of benign traffic. We introduce new correlation-aware architectures that take into account the correlation of traffic across IoT nodes, and we also compare the effectiveness of centralized and distributed detection models. We extensively analyze the proposed architectures by evaluating five different neural network models trained on a dataset derived from a 4060-node real-world IoT system. We observe that long short-term memory (LSTM) and a transformer-based model, in conjunction with the architectures that use correlation information of the IoT nodes, provide higher performance (in terms of F1 score and binary accuracy) than the other models and architectures, especially when the attacker camouflages itself by following benign traffic distribution on each transmitting node. For instance, by using the LSTM model, the distributed correlation-aware architecture gives 81% F1 score for the attacker that camouflages their attack with benign traffic as compared to 35% for the architecture that does not use correlation information. We also investigate the performance of heuristics for selecting a subset of nodes to share their data for correlation-aware architectures to meet resource constraints.
[[2302.08466] Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data](http://arxiv.org/abs/2302.08466) #attack
We study black-box model stealing attacks where the attacker can query a machine learning model only through publicly available APIs. Specifically, our aim is to design a black-box model extraction attack that uses minimal number of queries to create an informative and distributionally equivalent replica of the target model. First, we define distributionally equivalent and max-information model extraction attacks. Then, we reduce both the attacks into a variational optimisation problem. The attacker solves this problem to select the most informative queries that simultaneously maximise the entropy and reduce the mismatch between the target and the stolen models. This leads us to an active sampling-based query selection algorithm, Marich. We evaluate Marich on different text and image data sets, and different models, including BERT and ResNet18. Marich is able to extract models that achieve $69-96\%$ of true model's accuracy and uses $1,070 - 6,950$ samples from the publicly available query datasets, which are different from the private training datasets. Models extracted by Marich yield prediction distributions, which are $\sim2-4\times$ closer to the target's distribution in comparison to the existing active sampling-based algorithms. The extracted models also lead to $85-95\%$ accuracy under membership inference attacks. Experimental results validate that Marich is query-efficient, and also capable of performing task-accurate, high-fidelity, and informative model extraction.
[[2302.08257] On the Effect of Adversarial Training Against Invariance-based Adversarial Examples](http://arxiv.org/abs/2302.08257) #attack
Adversarial examples are carefully crafted attack points that are supposed to fool machine learning classifiers. In the last years, the field of adversarial machine learning, especially the study of perturbation-based adversarial examples, in which a perturbation that is not perceptible for humans is added to the images, has been studied extensively. Adversarial training can be used to achieve robustness against such inputs. Another type of adversarial examples are invariance-based adversarial examples, where the images are semantically modified such that the predicted class of the model does not change, but the class that is determined by humans does. How to ensure robustness against this type of adversarial examples has not been explored yet. This work addresses the impact of adversarial training with invariance-based adversarial examples on a convolutional neural network (CNN).
We show that when adversarial training with invariance-based and perturbation-based adversarial examples is applied, it should be conducted simultaneously and not consecutively. This procedure can achieve relatively high robustness against both types of adversarial examples. Additionally, we find that the algorithm used for generating invariance-based adversarial examples in prior work does not correctly determine the labels and therefore we use human-determined labels.
[[2302.07950] Topological Neural Discrete Representation Learning \
a la Kohonen](http://arxiv.org/abs/2302.07950) #robust`Unsupervised learning of discrete representations from continuous ones in neural networks (NNs) is the cornerstone of several applications today. Vector Quantisation (VQ) has become a popular method to achieve such representations, in particular in the context of generative models such as Variational Auto-Encoders (VAEs). For example, the exponential moving average-based VQ (EMA-VQ) algorithm is often used. Here we study an alternative VQ algorithm based on the learning rule of Kohonen Self-Organising Maps (KSOMs; 1982) of which EMA-VQ is a special case. In fact, KSOM is a classic VQ algorithm which is known to offer two potential benefits over the latter: empirically, KSOM is known to perform faster VQ, and discrete representations learned by KSOM form a topological structure on the grid whose nodes are the discrete symbols, resulting in an artificial version of the topographic map in the brain. We revisit these properties by using KSOM in VQ-VAEs for image processing. In particular, our experiments show that, while the speed-up compared to well-configured EMA-VQ is only observable at the beginning of training, KSOM is generally much more robust than EMA-VQ, e.g., w.r.t. the choice of initialisation schemes. Our code is public.
[[2302.08011] Vision-Based Terrain Relative Navigation on High-Altitude Balloon and Sub-Orbital Rocket](http://arxiv.org/abs/2302.08011) #robust
We present an experimental analysis on the use of a camera-based approach for high-altitude navigation by associating mapped landmarks from a satellite image database to camera images, and by leveraging inertial sensors between camera frames. We evaluate performance of both a sideways-tilted and downward-facing camera on data collected from a World View Enterprises high-altitude balloon with data beginning at an altitude of 33 km and descending to near ground level (4.5 km) with 1.5 hours of flight time. We demonstrate less than 290 meters of average position error over a trajectory of more than 150 kilometers. In addition to showing performance across a range of altitudes, we also demonstrate the robustness of the Terrain Relative Navigation (TRN) method to rapid rotations of the balloon, in some cases exceeding 20 degrees per second, and to camera obstructions caused by both cloud coverage and cords swaying underneath the balloon. Additionally, we evaluate performance on data collected by two cameras inside the capsule of Blue Origin's New Shepard rocket on payload flight NS-23, traveling at speeds up to 880 km/hr, and demonstrate less than 55 meters of average position error.
[[2302.08058] Learning Non-Local Spatial-Angular Correlation for Light Field Image Super-Resolution](http://arxiv.org/abs/2302.08058) #robust
Exploiting spatial-angular correlation is crucial to light field (LF) image super-resolution (SR), but is highly challenging due to its non-local property caused by the disparities among LF images. Although many deep neural networks (DNNs) have been developed for LF image SR and achieved continuously improved performance, existing methods cannot well leverage the long-range spatial-angular correlation and thus suffer a significant performance drop when handling scenes with large disparity variations. In this paper, we propose a simple yet effective method to learn the non-local spatial-angular correlation for LF image SR. In our method, we adopt the epipolar plane image (EPI) representation to project the 4D spatial-angular correlation onto multiple 2D EPI planes, and then develop a Transformer network with repetitive self-attention operations to learn the spatial-angular correlation by modeling the dependencies between each pair of EPI pixels. Our method can fully incorporate the information from all angular views while achieving a global receptive field along the epipolar line. We conduct extensive experiments with insightful visualizations to validate the effectiveness of our method. Comparative results on five public datasets show that our method not only achieves state-of-the-art SR performance, but also performs robust to disparity variations. Code is publicly available at https://github.com/ZhengyuLiang24/EPIT.
[[2302.08185] WHC: Weighted Hybrid Criterion for Filter Pruning on Convolutional Neural Networks](http://arxiv.org/abs/2302.08185) #robust
Filter pruning has attracted increasing attention in recent years for its capacity in compressing and accelerating convolutional neural networks. Various data-independent criteria, including norm-based and relationship-based ones, were proposed to prune the most unimportant filters. However, these state-of-the-art criteria fail to fully consider the dissimilarity of filters, and thus might lead to performance degradation. In this paper, we first analyze the limitation of relationship-based criteria with examples, and then introduce a new data-independent criterion, Weighted Hybrid Criterion (WHC), to tackle the problems of both norm-based and relationship-based criteria. By taking the magnitude of each filter and the linear dependence between filters into consideration, WHC can robustly recognize the most redundant filters, which can be safely pruned without introducing severe performance degradation to networks. Extensive pruning experiments in a simple one-shot manner demonstrate the effectiveness of the proposed WHC. In particular, WHC can prune ResNet-50 on ImageNet with more than 42% of floating point operations reduced without any performance loss in top-5 accuracy.
[[2302.08207] Learning Thin-Plate Spline Motion and Seamless Composition for Parallax-Tolerant Unsupervised Deep Image Stitching](http://arxiv.org/abs/2302.08207) #robust
Traditional image stitching approaches tend to leverage increasingly complex geometric features (point, line, edge, etc.) for better performance. However, these hand-crafted features are only suitable for specific natural scenes with adequate geometric structures. In contrast, deep stitching schemes overcome the adverse conditions by adaptively learning robust semantic features, but they cannot handle large-parallax cases due to homography-based registration. To solve these issues, we propose UDIS++, a parallax-tolerant unsupervised deep image stitching technique. First, we propose a robust and flexible warp to model the image registration from global homography to local thin-plate spline motion. It provides accurate alignment for overlapping regions and shape preservation for non-overlapping regions by joint optimization concerning alignment and distortion. Subsequently, to improve the generalization capability, we design a simple but effective iterative strategy to enhance the warp adaption in cross-dataset and cross-resolution applications. Finally, to further eliminate the parallax artifacts, we propose to composite the stitched image seamlessly by unsupervised learning for seam-driven composition masks. Compared with existing methods, our solution is parallax-tolerant and free from laborious designs of complicated geometric features for specific scenes. Extensive experiments show our superiority over the SoTA methods, both quantitatively and qualitatively. The code will be available at https://github.com/nie-lang/UDIS2.
[[2302.08274] Robust Human Motion Forecasting using Transformer-based Model](http://arxiv.org/abs/2302.08274) #robust
Comprehending human motion is a fundamental challenge for developing Human-Robot Collaborative applications. Computer vision researchers have addressed this field by only focusing on reducing error in predictions, but not taking into account the requirements to facilitate its implementation in robots. In this paper, we propose a new model based on Transformer that simultaneously deals with the real time 3D human motion forecasting in the short and long term. Our 2-Channel Transformer (2CH-TR) is able to efficiently exploit the spatio-temporal information of a shortly observed sequence (400ms) and generates a competitive accuracy against the current state-of-the-art. 2CH-TR stands out for the efficient performance of the Transformer, being lighter and faster than its competitors. In addition, our model is tested in conditions where the human motion is severely occluded, demonstrating its robustness in reconstructing and predicting 3D human motion in a highly noisy environment. Our experiment results show that the proposed 2CH-TR outperforms the ST-Transformer, which is another state-of-the-art model based on the Transformer, in terms of reconstruction and prediction under the same conditions of input prefix. Our model reduces in 8.89% the mean squared error of ST-Transformer in short-term prediction, and 2.57% in long-term prediction in Human3.6M dataset with 400ms input prefix.
[[2302.08258] Tragic and Comical Networks](http://arxiv.org/abs/2302.08258) #robust
There is a growing tradition in the joint field of network studies and drama history that produces interpretations from the character networks of the plays.The potential of such an interpretation is that the diagrams provide a different representation of the relationships between characters as compared to reading the text or watching the performance. Our aim is to create a method that is able to cluster texts with similar structures on the basis of the play's well-interpretable and simple properties, independent from the number of characters in the drama, or in other words, the size of the network. Finding these features is the most important part of our research, as well as establishing the appropriate statistical procedure to calculate the similarities between the texts. Our data was downloaded from the DraCor database and analyzed in R (we use the GerDracor and the ShakeDraCor sub-collection). We want to propose a robust method based on the distribution of words among characters; distribution of characters in scenes, average length of speech acts, or character-specific and macro-level network properties such as clusterization coefficient and network density. Based on these metrics a supervised classification procedure is applied to the sub-collections to classify comedies and tragedies using the Support Vector Machine (SVM) method. Our research shows that this approach can also produce reliable results on a small sample size.
[[2302.08387] LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with Knowledge Distillation](http://arxiv.org/abs/2302.08387) #robust
Large-scale language-agnostic sentence embedding models such as LaBSE (Feng et al., 2022) obtain state-of-the-art performance for parallel sentence alignment. However, these large-scale models can suffer from inference speed and computation overhead. This study systematically explores learning language-agnostic sentence embeddings with lightweight models. We demonstrate that a thin-deep encoder can construct robust low-dimensional sentence embeddings for 109 languages. With our proposed distillation methods, we achieve further improvements by incorporating knowledge from a teacher model. Empirical results on Tatoeba, United Nations, and BUCC show the effectiveness of our lightweight models. We release our lightweight language-agnostic sentence embedding models LEALLA on TensorFlow Hub.
[[2302.08500] Auditing large language models: a three-layered approach](http://arxiv.org/abs/2302.08500) #robust
The emergence of large language models (LLMs) represents a major advance in artificial intelligence (AI) research. However, the widespread use of LLMs is also coupled with significant ethical and social challenges. Previous research has pointed towards auditing as a promising governance mechanism to help ensure that AI systems are designed and deployed in ways that are ethical, legal, and technically robust. However, existing auditing procedures fail to address the governance challenges posed by LLMs, which are adaptable to a wide range of downstream tasks. To help bridge that gap, we offer three contributions in this article. First, we establish the need to develop new auditing procedures that capture the risks posed by LLMs by analysing the affordances and constraints of existing auditing procedures. Second, we outline a blueprint to audit LLMs in feasible and effective ways by drawing on best practices from IT governance and system engineering. Specifically, we propose a three-layered approach, whereby governance audits, model audits, and application audits complement and inform each other. Finally, we discuss the limitations not only of our three-layered approach but also of the prospect of auditing LLMs at all. Ultimately, this article seeks to expand the methodological toolkit available to technology providers and policymakers who wish to analyse and evaluate LLMs from technical, ethical, and legal perspectives.
[[2302.08048] Robust Mid-Pass Filtering Graph Convolutional Networks](http://arxiv.org/abs/2302.08048) #robust
Graph convolutional networks (GCNs) are currently the most promising paradigm for dealing with graph-structure data, while recent studies have also shown that GCNs are vulnerable to adversarial attacks. Thus developing GCN models that are robust to such attacks become a hot research topic. However, the structural purification learning-based or robustness constraints-based defense GCN methods are usually designed for specific data or attacks, and introduce additional objective that is not for classification. Extra training overhead is also required in their design. To address these challenges, we conduct in-depth explorations on mid-frequency signals on graphs and propose a simple yet effective Mid-pass filter GCN (Mid-GCN). Theoretical analyses guarantee the robustness of signals through the mid-pass filter, and we also shed light on the properties of different frequency signals under adversarial attacks. Extensive experiments on six benchmark graph data further verify the effectiveness of our designed Mid-GCN in node classification accuracy compared to state-of-the-art GCNs under various adversarial attack strategies.
[[2302.08051] Graph Adversarial Immunization for Certifiable Robustness](http://arxiv.org/abs/2302.08051) #robust
Despite achieving great success, graph neural networks (GNNs) are vulnerable to adversarial attacks. Existing defenses focus on developing adversarial training or robust GNNs. However, little research attention is paid to the potential and practice of immunization on graphs. In this paper, we propose and formulate graph adversarial immunization, i.e., vaccinating part of graph structure to improve certifiable robustness of graph against any admissible adversarial attack. We first propose edge-level immunization to vaccinate node pairs. Despite the primary success, such edge-level immunization cannot defend against emerging node injection attacks, since it only immunizes existing node pairs. To this end, we further propose node-level immunization. To circumvent computationally expensive combinatorial optimization when solving adversarial immunization, we design AdvImmune-Edge and AdvImmune-Node algorithms to effectively obtain the immune node pairs or nodes. Experiments demonstrate the superiority of AdvImmune methods. In particular, AdvImmune-Node remarkably improves the ratio of robust nodes by 79%, 294%, and 100%, after immunizing only 5% nodes. Furthermore, AdvImmune methods show excellent defensive performance against various attacks, outperforming state-of-the-art defenses. To the best of our knowledge, this is the first attempt to improve certifiable robustness from graph data perspective without losing performance on clean graphs, providing new insights into graph adversarial learning.
[[2302.07980] A Meta-Learning Approach to Population-Based Modelling of Structures](http://arxiv.org/abs/2302.07980) #robust
A major problem of machine-learning approaches in structural dynamics is the frequent lack of structural data. Inspired by the recently-emerging field of population-based structural health monitoring (PBSHM), and the use of transfer learning in this novel field, the current work attempts to create models that are able to transfer knowledge within populations of structures. The approach followed here is meta-learning, which is developed with a view to creating neural network models which are able to exploit knowledge from a population of various tasks to perform well in newly-presented tasks, with minimal training and a small number of data samples from the new task. Essentially, the method attempts to perform transfer learning in an automatic manner within the population of tasks. For the purposes of population-based structural modelling, the different tasks refer to different structures. The method is applied here to a population of simulated structures with a view to predicting their responses as a function of some environmental parameters. The meta-learning approach, which is used herein is the model-agnostic meta-learning (MAML) approach; it is compared to a traditional data-driven modelling approach, that of Gaussian processes, which is a quite effective alternative when few data samples are available for a problem. It is observed that the models trained using meta-learning approaches, are able to outperform conventional machine learning methods regarding inference about structures of the population, for which only a small number of samples are available. Moreover, the models prove to learn part of the physics of the problem, making them more robust than plain machine-learning algorithms. Another advantage of the methods is that the structures do not need to be parametrised in order for the knowledge transfer to be performed.
[[2302.07998] cGAN-Based High Dimensional IMU Sensor Data Generation for Therapeutic Activities](http://arxiv.org/abs/2302.07998) #robust
Human activity recognition is a core technology for applications such as rehabilitation, ambient health monitoring, and human-computer interactions. Wearable devices, particularly IMU sensors, can help us collect rich features of human movements that can be leveraged in activity recognition. Developing a robust classifier for activity recognition has always been of interest to researchers. One major problem is that there is usually a deficit of training data for some activities, making it difficult and sometimes impossible to develop a classifier. In this work, a novel GAN network called TheraGAN was developed to generate realistic IMU signals associated with a particular activity. The generated signal is of a 6-channel IMU. i.e., angular velocities and linear accelerations. Also, by introducing simple activities, which are meaningful subparts of a complex full-length activity, the generation process was facilitated for any activity with arbitrary length. To evaluate the generated signals, besides perceptual similarity metrics, they were applied along with real data to improve the accuracy of classifiers. The results show that the maximum increase in the f1-score belongs to the LSTM classifier by a 13.27% rise when generated data were added. This shows the validity of the generated data as well as TheraGAN as a tool to build more robust classifiers in case of imbalanced data problem.
[[2302.08416] A Bayesian Perspective for Determinant Minimization Based Robust Structured Matrix Factorizatio](http://arxiv.org/abs/2302.08416) #robust
We introduce a Bayesian perspective for the structured matrix factorization problem. The proposed framework provides a probabilistic interpretation for existing geometric methods based on determinant minimization. We model input data vectors as linear transformations of latent vectors drawn from a distribution uniform over a particular domain reflecting structural assumptions, such as the probability simplex in Nonnegative Matrix Factorization and polytopes in Polytopic Matrix Factorization. We represent the rows of the linear transformation matrix as vectors generated independently from a normal distribution whose covariance matrix is inverse Wishart distributed. We show that the corresponding maximum a posteriori estimation problem boils down to the robust determinant minimization approach for structured matrix factorization, providing insights about parameter selections and potential algorithmic extensions.
[[2302.08469] Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators](http://arxiv.org/abs/2302.08469) #robust
Analog in-memory computing (AIMC) -- a promising approach for energy-efficient acceleration of deep learning workloads -- computes matrix-vector multiplications (MVMs) but only approximately, due to nonidealities that often are non-deterministic or nonlinear. This can adversely impact the achievable deep neural network (DNN) inference accuracy as compared to a conventional floating point (FP) implementation. While retraining has previously been suggested to improve robustness, prior work has explored only a few DNN topologies, using disparate and overly simplified AIMC hardware models. Here, we use hardware-aware (HWA) training to systematically examine the accuracy of AIMC for multiple common artificial intelligence (AI) workloads across multiple DNN topologies, and investigate sensitivity and robustness to a broad set of nonidealities. By introducing a new and highly realistic AIMC crossbar-model, we improve significantly on earlier retraining approaches. We show that many large-scale DNNs of various topologies, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers, can in fact be successfully retrained to show iso-accuracy on AIMC. Our results further suggest that AIMC nonidealities that add noise to the inputs or outputs, not the weights, have the largest impact on DNN accuracy, and that RNNs are particularly robust to all nonidealities.
[[2302.07919] COVID-VTS: Fact Extraction and Verification on Short Video Platforms](http://arxiv.org/abs/2302.07919) #extraction
We introduce a new benchmark, COVID-VTS, for fact-checking multi-modal information involving short-duration videos with COVID19- focused information from both the real world and machine generation. We propose, TwtrDetective, an effective model incorporating cross-media consistency checking to detect token-level malicious tampering in different modalities, and generate explanations. Due to the scarcity of training data, we also develop an efficient and scalable approach to automatically generate misleading video posts by event manipulation or adversarial matching. We investigate several state-of-the-art models and demonstrate the superiority of TwtrDetective.
[[2302.08493] Deep Multi-stream Network for Video-based Calving Sign Detection](http://arxiv.org/abs/2302.08493) #extraction
We have designed a deep multi-stream network for automatically detecting calving signs from video. Calving sign detection from a camera, which is a non-contact sensor, is expected to enable more efficient livestock management. As large-scale, well-developed data cannot generally be assumed when establishing calving detection systems, the basis for making the prediction needs to be presented to farmers during operation, so black-box modeling (also known as end-to-end modeling) is not appropriate. For practical operation of calving detection systems, the present study aims to incorporate expert knowledge into a deep neural network. To this end, we propose a multi-stream calving sign detection network in which multiple calving-related features are extracted from the corresponding feature extraction networks designed for each attribute with different characteristics, such as a cow's posture, rotation, and movement, known as calving signs, and are then integrated appropriately depending on the cow's situation. Experimental comparisons conducted using videos of 15 cows demonstrated that our multi-stream system yielded a significant improvement over the end-to-end system, and the multi-stream architecture significantly contributed to a reduction in detection errors. In addition, the distinctive mixture weights we observed helped provide interpretability of the system's behavior.
[[2302.08351] A Survey on Event-based News Narrative Extraction](http://arxiv.org/abs/2302.08351) #extraction
Narratives are fundamental to our understanding of the world, providing us with a natural structure for knowledge representation over time. Computational narrative extraction is a subfield of artificial intelligence that makes heavy use of information retrieval and natural language processing techniques. Despite the importance of computational narrative extraction, relatively little scholarly work exists on synthesizing previous research and strategizing future research in the area. In particular, this article focuses on extracting news narratives from an event-centric perspective. Extracting narratives from news data has multiple applications in understanding the evolving information landscape. This survey presents an extensive study of research in the area of event-based news narrative extraction. In particular, we screened over 900 articles that yielded 54 relevant articles. These articles are synthesized and organized by representation model, extraction criteria, and evaluation approaches. Based on the reviewed studies, we identify recent trends, open challenges, and potential research lines.
[[2302.08205] A method for incremental discovery of financial event types based on anomaly detection](http://arxiv.org/abs/2302.08205) #extraction
Event datasets in the financial domain are often constructed based on actual application scenarios, and their event types are weakly reusable due to scenario constraints; at the same time, the massive and diverse new financial big data cannot be limited to the event types defined for specific scenarios. This limitation of a small number of event types does not meet our research needs for more complex tasks such as the prediction of major financial events and the analysis of the ripple effects of financial events. In this paper, a three-stage approach is proposed to accomplish incremental discovery of event types. For an existing annotated financial event dataset, the three-stage approach consists of: for a set of financial event data with a mixture of original and unknown event types, a semi-supervised deep clustering model with anomaly detection is first applied to classify the data into normal and abnormal events, where abnormal events are events that do not belong to known types; then normal events are tagged with appropriate event types and abnormal events are reasonably clustered. Finally, a cluster keyword extraction method is used to recommend the type names of events for the new event clusters, thus incrementally discovering new event types. The proposed method is effective in the incremental discovery of new event types on real data sets.
[[2302.08015] Individual Fairness Guarantee in Learning with Censorship](http://arxiv.org/abs/2302.08015) #fair
Algorithmic fairness, studying how to make machine learning (ML) algorithms fair, is an established area of ML. As ML technologies expand their application domains, including ones with high societal impact, it becomes essential to take fairness into consideration when building ML systems. Yet, despite its wide range of socially sensitive applications, most work treats the issue of algorithmic bias as an intrinsic property of supervised learning, i.e., the class label is given as a precondition. Unlike prior fairness work, we study individual fairness in learning with censorship where the assumption of availability of the class label does not hold, while still requiring that similar individuals are treated similarly. We argue that this perspective represents a more realistic model of fairness research for real-world application deployment, and show how learning with such a relaxed precondition draws new insights that better explain algorithmic fairness. We also thoroughly evaluate the performance of the proposed methodology on three real-world datasets, and validate its superior performance in minimizing discrimination while maintaining predictive performance.
[[2302.08017] Preventing Discriminatory Decision-making in Evolving Data Streams](http://arxiv.org/abs/2302.08017) #fair
Bias in machine learning has rightly received significant attention over the last decade. However, most fair machine learning (fair-ML) work to address bias in decision-making systems has focused solely on the offline setting. Despite the wide prevalence of online systems in the real world, work on identifying and correcting bias in the online setting is severely lacking. The unique challenges of the online environment make addressing bias more difficult than in the offline setting. First, Streaming Machine Learning (SML) algorithms must deal with the constantly evolving real-time data stream. Second, they need to adapt to changing data distributions (concept drift) to make accurate predictions on new incoming data. Adding fairness constraints to this already complicated task is not straightforward. In this work, we focus on the challenges of achieving fairness in biased data streams while accounting for the presence of concept drift, accessing one sample at a time. We present Fair Sampling over Stream ($FS^2$), a novel fair rebalancing approach capable of being integrated with SML classification algorithms. Furthermore, we devise the first unified performance-fairness metric, Fairness Bonded Utility (FBU), to evaluate and compare the trade-off between performance and fairness of different bias mitigation methods efficiently. FBU simplifies the comparison of fairness-performance trade-offs of multiple techniques through one unified and intuitive evaluation, allowing model designers to easily choose a technique. Overall, extensive evaluations show our measures surpass those of other fair online techniques previously reported in the literature.
[[2302.08077] Group Fairness with Uncertainty in Sensitive Attributes](http://arxiv.org/abs/2302.08077) #fair
We consider learning a fair predictive model when sensitive attributes are uncertain, say, due to a limited amount of labeled data, collection bias, or privacy mechanism. We formulate the problem, for the independence notion of fairness, using the information bottleneck principle, and propose a robust optimization with respect to an uncertainty set of the sensitive attributes. As an illustrative case, we consider the joint Gaussian model and reduce the task to a quadratically constrained quadratic problem (QCQP). To ensure a strict fairness guarantee, we propose a robust QCQP and completely characterize its solution with an intuitive geometric understanding. When uncertainty arises due to limited labeled sensitive attributes, our analysis reveals the contribution of each new sample towards the optimal performance achieved with unlimited access to labeled sensitive attributes. This allows us to identify non-trivial regimes where uncertainty incurs no performance loss of the proposed algorithm while continuing to guarantee strict fairness. We also propose a bootstrap-based generic algorithm that is applicable beyond the Gaussian case. We demonstrate the value of our analysis and method on synthetic data as well as real-world classification and regression tasks.
[[2302.08158] Counterfactual Fair Opportunity: Measuring Decision Model Fairness with Counterfactual Reasoning](http://arxiv.org/abs/2302.08158) #fair
The increasing application of Artificial Intelligence and Machine Learning models poses potential risks of unfair behavior and, in light of recent regulations, has attracted the attention of the research community. Several researchers focused on seeking new fairness definitions or developing approaches to identify biased predictions. However, none try to exploit the counterfactual space to this aim. In that direction, the methodology proposed in this work aims to unveil unfair model behaviors using counterfactual reasoning in the case of fairness under unawareness setting. A counterfactual version of equal opportunity named counterfactual fair opportunity is defined and two novel metrics that analyze the sensitive information of counterfactual samples are introduced. Experimental results on three different datasets show the efficacy of our methodologies and our metrics, disclosing the unfair behavior of classic machine learning and debiasing models.
[[2302.08204] Counterfactual Reasoning for Bias Evaluation and Detection in a Fairness under Unawareness setting](http://arxiv.org/abs/2302.08204) #fair
Current AI regulations require discarding sensitive features (e.g., gender, race, religion) in the algorithm's decision-making process to prevent unfair outcomes. However, even without sensitive features in the training set, algorithms can persist in discrimination. Indeed, when sensitive features are omitted (fairness under unawareness), they could be inferred through non-linear relations with the so called proxy features. In this work, we propose a way to reveal the potential hidden bias of a machine learning model that can persist even when sensitive features are discarded. This study shows that it is possible to unveil whether the black-box predictor is still biased by exploiting counterfactual reasoning. In detail, when the predictor provides a negative classification outcome, our approach first builds counterfactual examples for a discriminated user category to obtain a positive outcome. Then, the same counterfactual samples feed an external classifier (that targets a sensitive feature) that reveals whether the modifications to the user characteristics needed for a positive outcome moved the individual to the non-discriminated group. When this occurs, it could be a warning sign for discriminatory behavior in the decision process. Furthermore, we leverage the deviation of counterfactuals from the original sample to determine which features are proxies of specific sensitive information. Our experiments show that, even if the model is trained without sensitive features, it often suffers discriminatory biases.
[[2302.08406] Entity Aware Modelling: A Survey](http://arxiv.org/abs/2302.08406) #fair
Personalized prediction of responses for individual entities caused by external drivers is vital across many disciplines. Recent machine learning (ML) advances have led to new state-of-the-art response prediction models. Models built at a population level often lead to sub-optimal performance in many personalized prediction settings due to heterogeneity in data across entities (tasks). In personalized prediction, the goal is to incorporate inherent characteristics of different entities to improve prediction performance. In this survey, we focus on the recent developments in the ML community for such entity-aware modeling approaches. ML algorithms often modulate the network using these entity characteristics when they are readily available. However, these entity characteristics are not readily available in many real-world scenarios, and different ML methods have been proposed to infer these characteristics from the data. In this survey, we have organized the current literature on entity-aware modeling based on the availability of these characteristics as well as the amount of training data. We highlight how recent innovations in other disciplines, such as uncertainty quantification, fairness, and knowledge-guided machine learning, can improve entity-aware modeling.
[[2302.08507] The Scope of Multicalibration: Characterizing Multicalibration via Property Elicitation](http://arxiv.org/abs/2302.08507) #fair
We make a connection between multicalibration and property elicitation and show that (under mild technical conditions) it is possible to produce a multicalibrated predictor for a continuous scalar distributional property $\Gamma$ if and only if $\Gamma$ is elicitable.
On the negative side, we show that for non-elicitable continuous properties there exist simple data distributions on which even the true distributional predictor is not calibrated. On the positive side, for elicitable $\Gamma$, we give simple canonical algorithms for the batch and the online adversarial setting, that learn a $\Gamma$-multicalibrated predictor. This generalizes past work on multicalibrated means and quantiles, and in fact strengthens existing online quantile multicalibration results.
To further counter-weigh our negative result, we show that if a property $\Gamma^1$ is not elicitable by itself, but is elicitable conditionally on another elicitable property $\Gamma^0$, then there is a canonical algorithm that jointly multicalibrates $\Gamma^1$ and $\Gamma^0$; this generalizes past work on mean-moment multicalibration.
Finally, as applications of our theory, we provide novel algorithmic and impossibility results for fair (multicalibrated) risk assessment.
[[2302.08038] Fuzzy Knowledge Distillation from High-Order TSK to Low-Order TSK](http://arxiv.org/abs/2302.08038) #interpretability
High-order Takagi-Sugeno-Kang (TSK) fuzzy classifiers possess powerful classification performance yet have fewer fuzzy rules, but always be impaired by its exponential growth training time and poorer interpretability owing to High-order polynomial used in consequent part of fuzzy rule, while Low-order TSK fuzzy classifiers run quickly with high interpretability, however they usually require more fuzzy rules and perform relatively not very well. Address this issue, a novel TSK fuzzy classifier embeded with knowledge distillation in deep learning called HTSK-LLM-DKD is proposed in this study. HTSK-LLM-DKD achieves the following distinctive characteristics: 1) It takes High-order TSK classifier as teacher model and Low-order TSK fuzzy classifier as student model, and leverages the proposed LLM-DKD (Least Learning Machine based Decoupling Knowledge Distillation) to distill the fuzzy dark knowledge from High-order TSK fuzzy classifier to Low-order TSK fuzzy classifier, which resulting in Low-order TSK fuzzy classifier endowed with enhanced performance surpassing or at least comparable to High-order TSK classifier, as well as high interpretability; specifically 2) The Negative Euclidean distance between the output of teacher model and each class is employed to obtain the teacher logits, and then it compute teacher/student soft labels by the softmax function with distillating temperature parameter; 3) By reformulating the Kullback-Leibler divergence, it decouples fuzzy dark knowledge into target class knowledge and non-target class knowledge, and transfers them to student model. The advantages of HTSK-LLM-DKD are verified on the benchmarking UCI datasets and a real dataset Cleveland heart disease, in terms of classification performance and model interpretability.
[[2302.08192] Frugal day-ahead forecasting of multiple local electricity loads by aggregating adaptive models](http://arxiv.org/abs/2302.08192) #interpretability
We focus on day-ahead electricity load forecasting of substations of the distribution network in France; therefore, our problem lies between the instability of a single consumption and the stability of a countrywide total demand. Moreover, we are interested in forecasting the loads of over one thousand substations; consequently, we are in the context of forecasting multiple time series. To that end, we rely on an adaptive methodology that provided excellent results at a national scale; the idea is to combine generalized additive models with state-space representations. However, the extension of this methodology to the prediction of over a thousand time series raises a computational issue. We solve it by developing a frugal variant, reducing the number of parameters estimated; we estimate the forecasting models only for a few time series and achieve transfer learning by relying on aggregation of experts. It yields a reduction of computational needs and their associated emissions. We build several variants, corresponding to different levels of parameter transfer, and we look for the best trade-off between accuracy and frugality. The selected method achieves competitive results compared to state-of-the-art individual models. Finally, we highlight the interpretability of the models, which is important for operational applications.
[[2302.08261] Knowledge-augmented Graph Machine Learning for Drug Discovery: A Survey from Precision to Interpretability](http://arxiv.org/abs/2302.08261) #interpretability
The integration of Artificial Intelligence (AI) into the field of drug discovery has been a growing area of interdisciplinary scientific research. However, conventional AI models are heavily limited in handling complex biomedical structures (such as 2D or 3D protein and molecule structures) and providing interpretations for outputs, which hinders their practical application. As of late, Graph Machine Learning (GML) has gained considerable attention for its exceptional ability to model graph-structured biomedical data and investigate their properties and functional relationships. Despite extensive efforts, GML methods still suffer from several deficiencies, such as the limited ability to handle supervision sparsity and provide interpretability in learning and inference processes, and their ineffectiveness in utilising relevant domain knowledge. In response, recent studies have proposed integrating external biomedical knowledge into the GML pipeline to realise more precise and interpretable drug discovery with limited training instances. However, a systematic definition for this burgeoning research direction is yet to be established. This survey presents a comprehensive overview of long-standing drug discovery principles, provides the foundational concepts and cutting-edge techniques for graph-structured data and knowledge databases, and formally summarises Knowledge-augmented Graph Machine Learning (KaGML) for drug discovery. A thorough review of related KaGML works, collected following a carefully designed search methodology, are organised into four categories following a novel-defined taxonomy. To facilitate research in this promptly emerging field, we also share collected practical resources that are valuable for intelligent drug discovery and provide an in-depth discussion of the potential avenues for future advancements.
[[2302.08160] The Inadequacy of Shapley Values for Explainability](http://arxiv.org/abs/2302.08160) #explainability
This paper develops a rigorous argument for why the use of Shapley values in explainable AI (XAI) will necessarily yield provably misleading information about the relative importance of features for predictions. Concretely, this paper demonstrates that there exist classifiers, and associated predictions, for which the relative importance of features determined by the Shapley values will incorrectly assign more importance to features that are provably irrelevant for the prediction, and less importance to features that are provably relevant for the prediction. The paper also argues that, given recent complexity results, the existence of efficient algorithms for the computation of rigorous feature attribution values in the case of some restricted classes of classifiers should be deemed unlikely at best.
[[2302.07944] Effective Data Augmentation With Diffusion Models](http://arxiv.org/abs/2302.07944) #diffusion
Data augmentation is one of the most prevalent tools in deep learning, underpinning many recent advances, including those from classification, generative models, and representation learning. The standard approach to data augmentation combines simple transformations like rotations and flips to generate new images from existing ones. However, these new images lack diversity along key semantic axes present in the data. Consider the task of recognizing different animals. Current augmentations fail to produce diversity in task-relevant high-level semantic attributes like the species of the animal. We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models. Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples. We evaluate our approach on image classification tasks in a few-shot setting, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains.
[[2302.07979] PRedItOR: Text Guided Image Editing with Diffusion Prior](http://arxiv.org/abs/2302.07979) #diffusion
Diffusion models have shown remarkable capabilities in generating high quality and creative images conditioned on text. An interesting application of such models is structure preserving text guided image editing. Existing approaches rely on text conditioned diffusion models such as Stable Diffusion or Imagen and require compute intensive optimization of text embeddings or fine-tuning the model weights for text guided image editing. We explore text guided image editing with a Hybrid Diffusion Model (HDM) architecture similar to DALLE-2. Our architecture consists of a diffusion prior model that generates CLIP image embedding conditioned on a text prompt and a custom Latent Diffusion Model trained to generate images conditioned on CLIP image embedding. We discover that the diffusion prior model can be used to perform text guided conceptual edits on the CLIP image embedding space without any finetuning or optimization. We combine this with structure preserving edits on the image decoder using existing approaches such as reverse DDIM to perform text guided image editing. Our approach, PRedItOR does not require additional inputs, fine-tuning, optimization or objectives and shows on par or better results than baselines qualitatively and quantitatively. We provide further analysis and understanding of the diffusion prior model and believe this opens up new possibilities in diffusion models research.
[[2302.08113] MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation](http://arxiv.org/abs/2302.08113) #diffusion
Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image, and fast adaptation to new tasks still remains an open challenge, currently mostly addressed by costly and long re-training and fine-tuning or ad-hoc adaptations to specific image generation tasks. In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or finetuning. At the center of our approach is a new generation process, based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints. We show that MultiDiffusion can be readily applied to generate high quality and diverse images that adhere to user-provided controls, such as desired aspect ratio (e.g., panorama), and spatial guiding signals, ranging from tight segmentation masks to bounding boxes. Project webpage: https://multidiffusion.github.io
[[2302.08357] Boundary Guided Mixing Trajectory for Semantic Control with Diffusion Models](http://arxiv.org/abs/2302.08357) #diffusion
Applying powerful generative denoising diffusion models (DDMs) for downstream tasks such as image semantic editing usually requires either fine-tuning pre-trained DDMs or learning auxiliary editing networks. In this work, we achieve SOTA semantic control performance on various application settings by optimizing the denoising trajectory solely via frozen DDMs. As one of the first optimization-based diffusion editing work, we start by seeking a more comprehensive understanding of the intermediate high-dimensional latent spaces by theoretically and empirically analyzing their probabilistic and geometric behaviors in the Markov chain. We then propose to further explore the critical step in the denoising trajectory that characterizes the convergence of a pre-trained DDM. Last but not least, we further present our method to search for the semantic subspaces boundaries for controllable manipulation, by guiding the denoising trajectory towards the targeted boundary at the critical convergent step. We conduct extensive experiments on various DPMs architectures (DDPM, iDDPM) and datasets (CelebA, CelebA-HQ, LSUN-church, LSUN-bedroom, AFHQ-dog) with different resolutions (64, 256) as empirical demonstrations.
[[2302.08411] Explicit Diffusion of Gaussian Mixture Model Based Image Priors](http://arxiv.org/abs/2302.08411) #diffusion
In this work we tackle the problem of estimating the density $f_X$ of a random variable $X$ by successive smoothing, such that the smoothed random variable $Y$ fulfills $(\partial_t - \Delta_1)f_Y(\,\cdot\,, t) = 0$, $f_Y(\,\cdot\,, 0) = f_X$. With a focus on image processing, we propose a product/fields of experts model with Gaussian mixture experts that admits an analytic expression for $f_Y (\,\cdot\,, t)$ under an orthogonality constraint on the filters. This construction naturally allows the model to be trained simultaneously over the entire diffusion horizon using empirical Bayes. We show preliminary results on image denoising where our model leads to competitive results while being tractable, interpretable, and having only a small number of learnable parameters. As a byproduct, our model can be used for reliable noise estimation, allowing blind denoising of images corrupted by heteroscedastic noise.
[[2302.08453] T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models](http://arxiv.org/abs/2302.08453) #diffusion
The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated strong power of learning complex structures and meaningful semantics. However, relying solely on text prompts cannot fully take advantage of the knowledge learned by the model, especially when flexible and accurate structure control is needed. In this paper, we aim to ``dig out" the capabilities that T2I models have implicitly learned, and then explicitly use them to control the generation more granularly. Specifically, we propose to learn simple and small T2I-Adapters to align internal knowledge in T2I models with external control signals, while freezing the original large T2I models. In this way, we can train various adapters according to different conditions, and achieve rich control and editing effects. Further, the proposed T2I-Adapters have attractive properties of practical value, such as composability and generalization ability. Extensive experiments demonstrate that our T2I-Adapter has promising generation quality and a wide range of applications.
[[2302.08510] Text-driven Visual Synthesis with Latent Diffusion Prior](http://arxiv.org/abs/2302.08510) #diffusion
There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models enabling versatile downstream applications such as 3D object synthesis from texts, image editing, and customized generation. We present a generic approach using latent diffusion models as powerful image priors for various visual synthesis tasks. Existing methods that utilize such priors fail to use these models' full capabilities. To improve this, our core ideas are 1) a feature matching loss between features from different layers of the decoder to provide detailed guidance and 2) a KL divergence loss to regularize the predicted latent features and stabilize the training. We demonstrate the efficacy of our approach on three different applications, text-to-3D, StyleGAN adaptation, and layered image editing. Extensive results show our method compares favorably against baselines.
[[2302.08086] Understanding the Distillation Process from Deep Generative Models to Tractable Probabilistic Circuits](http://arxiv.org/abs/2302.08086) #diffusion
Probabilistic Circuits (PCs) are a general and unified computational framework for tractable probabilistic models that support efficient computation of various inference tasks (e.g., computing marginal probabilities). Towards enabling such reasoning capabilities in complex real-world tasks, Liu et al. (2022) propose to distill knowledge (through latent variable assignments) from less tractable but more expressive deep generative models. However, it is still unclear what factors make this distillation work well. In this paper, we theoretically and empirically discover that the performance of a PC can exceed that of its teacher model. Therefore, instead of performing distillation from the most expressive deep generative model, we study what properties the teacher model and the PC should have in order to achieve good distillation performance. This leads to a generic algorithmic improvement as well as other data-type-specific ones over the existing latent variable distillation pipeline. Empirically, we outperform SoTA TPMs by a large margin on challenging image modeling benchmarks. In particular, on ImageNet32, PCs achieve 4.06 bits-per-dimension, which is only 0.34 behind variational diffusion models (Kingma et al., 2021).
[[2302.08224] DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization](http://arxiv.org/abs/2302.08224) #diffusion
Neural network-based Combinatorial Optimization (CO) methods have shown promising results in solving various NP-complete (NPC) problems without relying on hand-crafted domain knowledge. This paper broadens the current scope of neural solvers for NPC problems by introducing a new graph-based diffusion framework, namely DIFUSCO. Our framework casts NPC problems as discrete {0, 1}-vector optimization problems and leverages graph-based denoising diffusion models to generate high-quality solutions. We investigate two types of diffusion models with Gaussian and Bernoulli noise, respectively, and devise an effective inference schedule to enhance the solution quality. We evaluate our methods on two well-studied NPC combinatorial optimization problems: Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS). Experimental results show that DIFUSCO strongly outperforms the previous state-of-the-art neural solvers, improving the performance gap between ground-truth and neural solvers from 1.76% to 0.46% on TSP-500, from 2.46% to 1.17% on TSP-1000, and from 3.19% to 2.58% on TSP10000. For the MIS problem, DIFUSCO outperforms the previous state-of-the-art neural solver on the challenging SATLIB benchmark. Our code is available at "https://github.com/Edward-Sun/DIFUSCO".