[[2304.03387] From Social Engineering to Quantum Threats: Safeguarding User Wallets with FailSafe](http://arxiv.org/abs/2304.03387) #secure
While cryptocurrencies have been rapidly gaining adoption, secure wallet interactions are still elusive for many users, which frequently leads to loss of funds. Here we propose an approach to securing interactions with cryptocurrency wallets for end-users. The approach called FailSafe consists of several defence-in-depth measures that can be applied near-term as well as a tool called qMig for aiding eventual quantum migration.
[[2304.03541] Code-based Cryptography: Lecture Notes](http://arxiv.org/abs/2304.03541) #secure
These lecture notes were written for courses given at École normale supérieure de Lyon and at the 2022 summer school in post-quantum cryptography held at the university of Budapest. Our objective is to give a general introduction to the foundations of code-based cryptography, which is currently known to be secure even against quantum adversaries. In particular, we focus on the decoding problem, whose hardness underlies the security of many cryptographic primitives, the most prominent being the McEliece and Alekhnovich encryption schemes.
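
To make the decoding problem concrete, here is a minimal sketch that is not taken from the lecture notes: given a parity-check matrix `H`, a syndrome `s`, and a weight bound `t`, find an error vector `e` of weight at most `t` with `H e = s (mod 2)`. The function name `syndrome_decode` and the choice of the [7,4] Hamming code as a toy instance are assumptions made for illustration. The brute-force search is exponential in `t`, which is the cost that code-based schemes rely on for large random-looking codes.

```python
from itertools import combinations

import numpy as np

# Parity-check matrix of the [7,4] Hamming code (toy instance, assumed for
# illustration): column j (1-indexed) is the binary expansion of j.
H = np.array([[(j >> b) & 1 for j in range(1, 8)] for b in range(3)], dtype=np.uint8)


def syndrome_decode(H, s, t):
    """Brute-force search for e with weight <= t and H e = s (mod 2).

    Returns the first such error vector found, or None if there is none.
    """
    n = H.shape[1]
    for weight in range(t + 1):
        for support in combinations(range(n), weight):
            e = np.zeros(n, dtype=np.uint8)
            for i in support:
                e[i] = 1
            if np.array_equal(H @ e % 2, s):
                return e
    return None


# Usage: corrupt one bit of a codeword and recover the error from the syndrome.
codeword = np.array([1, 1, 1, 0, 0, 0, 0], dtype=np.uint8)  # H @ codeword = 0 (mod 2)
received = codeword.copy()
received[4] ^= 1
error = syndrome_decode(H, H @ received % 2, t=1)
```

XOR-ing `error` into `received` restores the codeword. For this tiny code the search is instant; the hardness assumption concerns much larger parameters.
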
[[2304.03763] Clutter Detection and Removal in 3D Scenes with View-Consistent Inpainting](http://arxiv.org/abs/2304.03763) #privacy
Removing clutter from scenes is essential in many applications, ranging from privacy-concerned content filtering to data augmentation. In this work, we present an automatic system that removes clutter from 3D scenes and inpaints with coherent geometry and texture. We propose techniques for its two key components: 3D segmentation from shared properties and 3D inpainting, both of which are important problems. The definition of 3D scene clutter (frequently-moving objects) is not well captured by commonly-studied object categories in computer vision. To tackle the lack of well-defined clutter annotations, we group noisy fine-grained labels, leverage virtual rendering, and impose an instance-level area-sensitive loss. Once clutter is removed, we inpaint geometry and texture in the resulting holes by merging inpainted RGB-D images. This requires novel voting and pruning strategies that guarantee multi-view consistency across individually inpainted images for mesh reconstruction. Experiments on the ScanNet and Matterport datasets show that our method outperforms baselines for clutter segmentation and 3D inpainting, both visually and quantitatively.
[[2304.03472] Does Prompt-Tuning Language Model Ensure Privacy?](http://arxiv.org/abs/2304.03472) #privacy
Prompt-tuning has received attention as an efficient tuning method in the language domain, i.e., tuning a prompt that is a few tokens long, while keeping the large language model frozen, yet achieving comparable performance with conventional fine-tuning. Considering the emerging privacy concerns with language models, we initiate the study of privacy leakage in the setting of prompt-tuning. We first describe a real-world email service pipeline to provide customized output for various users via prompt-tuning. Then we propose a novel privacy attack framework to infer users' private information by exploiting the prompt module with user-specific signals. We conduct a comprehensive privacy evaluation on the target pipeline to demonstrate the potential leakage from prompt-tuning. The results also demonstrate the effectiveness of the proposed attack.
[[2304.03538] Adjustable Privacy using Autoencoder-based Learning Structure](http://arxiv.org/abs/2304.03538) #privacy
Inference centers need more data to have a more comprehensive and beneficial learning model, and for this purpose, they need to collect data from data providers. On the other hand, data providers are cautious about delivering their datasets to inference centers in terms of privacy considerations. In this paper, by modifying the structure of the autoencoder, we present a method that manages the utility-privacy trade-off well. To be more precise, the data is first compressed using the encoder, then confidential and non-confidential features are separated and uncorrelated using the classifier. The confidential feature is appropriately combined with noise, and the non-confidential feature is enhanced, and at the end, data with the original data format is produced by the decoder. The proposed architecture also allows data providers to set the level of privacy required for confidential features. The proposed method has been examined for both image and categorical databases, and the results show a significant performance improvement compared to previous methods.
[[2304.03545] AI Model Disgorgement: Methods and Choices](http://arxiv.org/abs/2304.03545) #privacy
Responsible use of data is an indispensable part of any machine learning (ML) implementation. ML developers must carefully collect and curate their datasets, and document their provenance. They must also make sure to respect intellectual property rights, preserve individual privacy, and use data in an ethical way. Over the past few years, ML models have significantly increased in size and complexity. These models require a very large amount of data and compute capacity to train, to the extent that any defects in the training corpus cannot be trivially remedied by retraining the model from scratch. Despite sophisticated controls on training data and a significant amount of effort dedicated to ensuring that training corpora are properly composed, the sheer volume of data required for the models makes it challenging to manually inspect each datum comprising a training corpus. One potential fix for training corpus data defects is model disgorgement -- the elimination of not just the improperly used data, but also the effects of improperly used data on any component of an ML model. Model disgorgement techniques can be used to address a wide range of issues, such as reducing bias or toxicity, increasing fidelity, and ensuring responsible usage of intellectual property. In this paper, we introduce a taxonomy of possible disgorgement methods that are applicable to modern ML systems. In particular, we investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
[[2304.03579] A lightweight Encryption Method For Privacy-Preserving in Process Mining](http://arxiv.org/abs/2304.03579) #privacy
Novel technological achievements in the fields of business intelligence, business management and data science are based on real-time and complex virtual networks. A notable characteristic of current business networks is that data is shared between a large number of organizations, which leads to systems with high computational complexity. Discovery, conformance and enhancement of the business processes are performed using the generated event logs. In this regard, one of the overlooked challenges is privacy preservation in industrial process mining. To preserve data privacy with the low computational complexity that current digital business technology requires, a novel lightweight encryption method based on the Haar transform and a private key is proposed in this paper. We compare the proposed method with the well-known homomorphic cryptosystem and Walsh-Hadamard encryption (WHE) in terms of cryptography, computational complexity and structure vulnerability. The analyses show that the proposed method anonymizes the event logs with significantly lower complexity and higher accuracy than the two aforementioned cryptosystems.
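
For readers unfamiliar with the Haar transform named above, the following is a minimal sketch of a single-level orthonormal Haar transform and its inverse. It shows only the transform itself; how the paper combines it with a private key is not described in the abstract and is not reproduced here. The function names `haar_forward` and `haar_inverse` are hypothetical.

```python
import numpy as np


def haar_forward(x):
    """Single-level orthonormal Haar transform of an even-length signal.

    Returns (approximation, detail) coefficient arrays of half the length.
    """
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed samples."""
    approx = np.asarray(approx, dtype=float)
    detail = np.asarray(detail, dtype=float)
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x


# Usage: transform a short signal and reconstruct it.
signal = [4.0, 2.0, 5.0, 5.0]
approx, detail = haar_forward(signal)
restored = haar_inverse(approx, detail)
```

Each level needs only additions, subtractions and one scaling per pair of samples, which is consistent with the low computational cost the abstract emphasizes.
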
[[2304.03722] Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data](http://arxiv.org/abs/2304.03722) #privacy
Generating synthetic data through generative models is gaining interest in the ML community and beyond. In the past, synthetic data was often regarded as a means to private data release, but a surge of recent papers explore how its potential reaches much further than this -- from creating more fair data to data augmentation, and from simulation to text generated by ChatGPT. In this perspective we explore whether, and how, synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs. Just as importantly, we discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data -- the most important of which is quantifying how much we can trust any finding or prediction drawn from synthetic data.
[[2304.03757] Replicability and stability in learning](http://arxiv.org/abs/2304.03757) #privacy
Replicability is essential in science as it allows us to validate and verify research findings. Impagliazzo, Lei, Pitassi and Sorrell ('22) recently initiated the study of replicability in machine learning. A learning algorithm is replicable if it typically produces the same output when applied on two i.i.d. inputs using the same internal randomness. We study a variant of replicability that does not involve fixing the randomness. An algorithm satisfies this form of replicability if it typically produces the same output when applied on two i.i.d. inputs (without fixing the internal randomness). This variant is called global stability and was introduced by Bun, Livni and Moran ('20) in the context of differential privacy.
Impagliazzo et al. showed how to boost any replicable algorithm so that it produces the same output with probability arbitrarily close to 1. In contrast, we demonstrate that for numerous learning tasks, global stability can only be accomplished weakly, where the same output is produced only with probability bounded away from 1. To overcome this limitation, we introduce the concept of list replicability, which is equivalent to global stability. Moreover, we prove that list replicability can be boosted so that it is achieved with probability arbitrarily close to 1. We also describe basic relations between standard learning-theoretic complexity measures and list replicable numbers. Our results in addition imply that, besides trivial cases, replicable algorithms (in the sense of Impagliazzo et al.) must be randomized.
The proof of the impossibility result is based on a topological fixed-point theorem. For every algorithm, we are able to locate a "hard input distribution" by applying the Poincaré-Miranda theorem in a related topological setting. The equivalence between global stability and list replicability is algorithmic.
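
As a reading aid, the two notions contrasted in this abstract can be written informally as follows. The notation is an assumption made here, not taken from the paper: $A$ is the learning algorithm, $S$ and $S'$ are independent samples of size $n$ from a distribution $D$, $r$ and $r'$ are independent draws of the internal randomness, and $\rho$, $\eta$ are parameters.

$$\Pr_{S, S' \sim D^n,\; r}\big[A(S; r) = A(S'; r)\big] \ge 1 - \rho \qquad \text{(replicability: shared randomness)}$$

$$\Pr_{S, S' \sim D^n,\; r, r'}\big[A(S; r) = A(S'; r')\big] \ge \eta \qquad \text{(variant: randomness not fixed)}$$

In these terms, the boosting result of Impagliazzo et al. drives $\rho$ toward 0, while the impossibility result says that for numerous tasks $\eta$ must stay bounded away from 1.
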
[[2304.03315] Exploration of Quantum Computer Power Side-Channels](http://arxiv.org/abs/2304.03315) #defense
Noisy Intermediate-Scale Quantum (NISQ) computers are being rapidly improved, with larger numbers of qubits and improved fidelity. The rapidly increasing qubit counts and improving fidelity of quantum computers will enable novel algorithms to be executed on them, generating novel results and data whose intellectual property will be a highly guarded secret. At the same time, quantum computers are likely to remain specialized machines, and many will be controlled and maintained in a remote, cloud-based environment where end users who want to come up with novel algorithms have no control over the physical space. Lack of physical control by users means that physical attacks could be possible, by malicious insiders in the data center, for example. This work shows for the first time that power-based side-channel attacks could be deployed against quantum computers. The attacks could be used to recover information about the control pulses sent to quantum computers. From the control pulses, the gate-level description of the circuits, and eventually the secret algorithms, can be reverse engineered. This work demonstrates how and what information could be recovered, and then in turn how to defend against power-based side-channels. Real control pulse information from real quantum computers is used to demonstrate potential power-based side-channel attacks. The proposed defenses can be deployed today, without hardware changes.
[[2304.03510] Multispectral Imaging for Differential Face Morphing Attack Detection: A Preliminary Study](http://arxiv.org/abs/2304.03510) #attack
Face morphing attack detection is emerging as an increasingly challenging problem owing to advancements in high-quality and realistic morphing attack generation. Reliable detection of morphing attacks is essential because these attacks are targeted at border control applications. This paper presents a multispectral framework for differential morphing-attack detection (D-MAD). D-MAD methods use two facial images, one captured from the ePassport (also called the reference image) and one from a trusted device (for example, Automatic Border Control (ABC) gates), to detect whether the face image presented in the ePassport is morphed. The proposed multispectral D-MAD framework introduces a multispectral image, acquired as the trusted capture across seven different spectral bands, to detect morphing attacks. Extensive experiments were conducted on the newly created datasets with 143 unique data subjects that were captured using both visible and multispectral cameras in multiple sessions. The results indicate the superior performance of the proposed multispectral framework compared to visible images.
[[2304.03370] Reliable Learning for Test-time Attacks and Distribution Shift](http://arxiv.org/abs/2304.03370) #attack
Machine learning algorithms are often used in environments which are not captured accurately even by the most carefully obtained training data, either due to the possibility of 'adversarial' test-time attacks, or on account of 'natural' distribution shift. For test-time attacks, we introduce and analyze a novel robust reliability guarantee, which requires a learner to output predictions along with a reliability radius $\eta$, with the meaning that its prediction is guaranteed to be correct as long as the adversary has not perturbed the test point farther than a distance $\eta$. We provide learners that are optimal in the sense that they always output the best possible reliability radius on any test point, and we characterize the reliable region, i.e. the set of points where a given reliability radius is attainable. We additionally analyze reliable learners under distribution shift, where the test points may come from an arbitrary distribution Q different from the training distribution P. For both cases, we bound the probability mass of the reliable region for several interesting examples, for linear separators under nearly log-concave and s-concave distributions, as well as for smooth boundary classifiers under smooth probability distributions.
[[2304.03388] EZClone: Improving DNN Model Extraction Attack via Shape Distillation from GPU Execution Profiles](http://arxiv.org/abs/2304.03388) #attack
Deep Neural Networks (DNNs) have become ubiquitous due to their performance on prediction and classification problems. However, they face a variety of threats as their usage spreads. Model extraction attacks, which steal DNNs, endanger intellectual property, data privacy, and security. Previous research has shown that system-level side-channels can be used to leak the architecture of a victim DNN, exacerbating these risks. We propose two DNN architecture extraction techniques catering to various threat models. The first technique uses a malicious, dynamically linked version of PyTorch to expose a victim DNN architecture through the PyTorch profiler. The second, called EZClone, exploits aggregate (rather than time-series) GPU profiles as a side-channel to predict DNN architecture, employing a simple approach and assuming little adversary capability as compared to previous work. We investigate the effectiveness of EZClone when minimizing the complexity of the attack, when applied to pruned models, and when applied across GPUs. We find that EZClone correctly predicts DNN architectures for the entire set of PyTorch vision architectures with 100% accuracy. No other work has shown this degree of architecture prediction accuracy with the same adversarial constraints or using aggregate side-channel information. Prior work has shown that, once a DNN has been successfully cloned, further attacks such as model evasion or model inversion can be accelerated significantly.
[[2304.03405] A Comprehensive Survey of Upgradeable Smart Contract Patterns](http://arxiv.org/abs/2304.03405) #attack
In this work, we provide a comprehensive survey of smart contract upgradability patterns using proxies. A primary characteristic of smart contracts on the Ethereum blockchain is that they are immutable: once implemented, no changes can be made. Taking human error into account, as well as technology improvements and newly discovered vulnerabilities, there has been a need to upgrade these smart contracts, which may hold enormous amounts of Ether and hence become the target of attacks. Several such attacks have caused tremendous losses in the past, and millions of dollars in Ether have been locked away in broken contracts. Thus far we have collected many upgradable proxy patterns and studied their features to build a comprehensive catalog of patterns. We present a summary of these upgradable proxy patterns which we collected and studied. We scraped the source code for approximately 100000 verified contracts from Etherscan.io, the most popular block explorer for Ethereum, out of which we extracted around 64k unique files, most containing multiple contracts. We have begun to automate the analysis of these contracts using the popular static analysis tool Slither, while at the same time implementing much more robust detection of upgradable proxies using this framework. Comparing the results of the original implementation to our own, we have found that approximately 70 percent of the contracts which were initially flagged as upgradeable proxies are false positives which we have eliminated.
[[2304.03640] FedDiSC: A Computation-efficient Federated Learning Framework for Power Systems Disturbance and Cyber Attack Discrimination](http://arxiv.org/abs/2304.03640) #attack
With the growing concern about the security and privacy of smart grid systems, cyberattacks on critical power grid components, such as state estimation, have proven to be one of the top-priority cyber-related issues and have received significant attention in recent years. However, cyberattack detection in smart grids now faces new challenges, including privacy preservation and decentralized power zones with strategic data owners. To address these technical bottlenecks, this paper proposes a novel Federated Learning-based privacy-preserving and communication-efficient attack detection framework, known as FedDiSC, that enables Discrimination between power System disturbances and Cyberattacks. Specifically, we first propose a Federated Learning approach to enable Supervisory Control and Data Acquisition subsystems of decentralized power grid zones to collaboratively train an attack detection model without sharing sensitive power related data. Secondly, we put forward a representation learning-based Deep Auto-Encoder network to accurately detect power system and cybersecurity anomalies. Lastly, to adapt our proposed framework to the timeliness of real-world cyberattack detection in SGs, we leverage the use of a gradient privacy-preserving quantization scheme known as DP-SIGNSGD to improve its communication efficiency. Extensive simulations of the proposed framework on publicly available Industrial Control Systems datasets demonstrate that the proposed framework can achieve superior detection accuracy while preserving the privacy of sensitive power grid related information. Furthermore, we find that the gradient quantization scheme utilized improves communication efficiency by 40% when compared to a traditional federated learning approach without gradient quantization which suggests suitability in a real-world scenario.
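
The communication saving from gradient quantization can be illustrated with a minimal sketch of sign-based compression with majority-vote aggregation. This is an assumption-laden illustration, not the FedDiSC implementation: it omits the differential-privacy step that DP-SIGNSGD adds, and the function names `sign_compress`, `majority_vote` and `sgd_step` are hypothetical.

```python
import numpy as np


def sign_compress(grad):
    """Client side: keep only the sign of each gradient coordinate (+1 or -1)."""
    return np.where(np.asarray(grad) >= 0, 1, -1).astype(np.int8)


def majority_vote(signs):
    """Server side: coordinate-wise sign of the summed client votes."""
    total = np.sum(np.stack(signs).astype(np.int32), axis=0)
    return np.where(total >= 0, 1, -1).astype(np.int8)


def sgd_step(weights, signs, lr=0.01):
    """Update the shared model using the aggregated sign vector."""
    return weights - lr * majority_vote(signs)


# Usage: three clients send one sign per coordinate instead of a float.
client_grads = [
    np.array([0.5, -0.2, 0.1]),
    np.array([0.3, 0.4, -0.9]),
    np.array([-0.1, -0.6, -0.2]),
]
votes = [sign_compress(g) for g in client_grads]
new_weights = sgd_step(np.zeros(3), votes, lr=0.1)
```

Each client transmits one sign per coordinate rather than a full-precision value, which is where the reduction in traffic comes from; the 40% figure in the abstract is the paper's own measurement and is not derived from this sketch.
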
[[2304.03657] SCART: Simulation of Cyber Attacks for Real-Time](http://arxiv.org/abs/2304.03657) #attack
Real-Time systems are often implemented as reactive systems that respond to stimuli and complete tasks in a known bounded time. The development process of such systems usually involves using a cycle-accurate simulation environment and even a digital twin system that can accurately simulate the system and the environment it operates in. In addition, many real-time systems require high reliability and strive to be immune against security attacks. Thus, the development environment must support reliability-related events such as the failure of a sensor, malfunction of a subsystem, and foreseen cyber security attacks. This paper presents the SCART framework, an innovative solution that aims to allow extending simulation environments of real-time systems with the capability to incorporate reliability-related events and advanced cyber security attacks, e.g., an attack on a single sensor as well as "complex security attacks" that aim to change the behavior of a group of sensors. We validate our system by applying the newly proposed environment to a drone's flight control system, including its navigation system, which uses machine learning algorithms. Such a system is very challenging since it requires many experiments that can hardly be achieved by using live systems. We showed that using SCART is very efficient, can increase the model's accuracy, and significantly reduce false-positive rates. Some of these experiments were also validated using a set of "real drones".
[[2304.03373] Training-Free Layout Control with Cross-Attention Guidance](http://arxiv.org/abs/2304.03373) #robust
Recent diffusion-based generators can produce high-quality images based only on textual prompts. However, they do not correctly interpret instructions that specify the spatial layout of the composition. We propose a simple approach that can achieve robust layout control without requiring training or fine-tuning the image generator. Our technique, which we call layout guidance, manipulates the cross-attention layers that the model uses to interface textual and visual information and steers the reconstruction in the desired direction given, e.g., a user-specified layout. In order to determine how to best guide attention, we study the role of different attention maps when generating images and experiment with two alternative strategies, forward and backward guidance. We evaluate our method quantitatively and qualitatively with several experiments, validating its effectiveness. We further demonstrate its versatility by extending layout guidance to the task of editing the layout and context of a given real image.
[[2304.03391] Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval](http://arxiv.org/abs/2304.03391) #robust
Cross-modal retrieval methods are the preferred tool to search databases for the text that best matches a query image and vice versa. However, image-text retrieval models commonly learn to memorize spurious correlations in the training data, such as frequent object co-occurrence, instead of looking at the actual underlying reasons for the prediction in the image. For image-text retrieval, this manifests in retrieved sentences that mention objects that are not present in the query image. In this work, we introduce ODmAP@k, an object decorrelation metric that measures a model's robustness to spurious correlations in the training data. We use automatic image and text manipulations to control the presence of such object correlations in designated test data. Additionally, our data synthesis technique is used to tackle model biases due to spurious correlations of semantically unrelated objects in the training data. We apply our proposed pipeline, which involves the finetuning of image-text retrieval frameworks on carefully designed synthetic data, to three state-of-the-art models for image-text retrieval. This results in significant improvements for all three models, both in terms of the standard retrieval performance and in terms of our object decorrelation metric. The code is available at https://github.com/ExplainableML/Spurious_CM_Retrieval.
[[2304.03400] RoSteALS: Robust Steganography using Autoencoder Latent Space](http://arxiv.org/abs/2304.03400) #robust
Data hiding such as steganography and invisible watermarking has important applications in copyright protection, privacy-preserved communication and content provenance. Existing works often fall short in either preserving image quality, or robustness against perturbations or are too complex to train. We propose RoSteALS, a practical steganography technique leveraging frozen pretrained autoencoders to free the payload embedding from learning the distribution of cover images. RoSteALS has a light-weight secret encoder of just 300k parameters, is easy to train, has perfect secret recovery performance and comparable image quality on three benchmarks. Additionally, RoSteALS can be adapted for novel cover-less steganography applications in which the cover image can be sampled from noise or conditioned on text prompts via a denoising diffusion process. Our model and code are available at https://github.com/TuBui/RoSteALS.
[[2304.03456] Rethinking Evaluation Protocols of Visual Representations Learned via Self-supervised Learning](http://arxiv.org/abs/2304.03456) #robust
Linear probing (LP) (and $k$-NN) on the upstream dataset with labels (e.g., ImageNet) and transfer learning (TL) to various downstream datasets are commonly employed to evaluate the quality of visual representations learned via self-supervised learning (SSL). Although existing SSL methods have shown good performances under those evaluation protocols, we observe that the performances are very sensitive to the hyperparameters involved in LP and TL. We argue that this is an undesirable behavior since truly generic representations should be easily adapted to any other visual recognition task, i.e., the learned representations should be robust to the settings of LP and TL hyperparameters. In this work, we try to figure out the cause of performance sensitivity by conducting extensive experiments with state-of-the-art SSL methods. First, we find that input normalization for LP is crucial to eliminate performance variations according to the hyperparameters. Specifically, batch normalization before feeding inputs to a linear classifier considerably improves the stability of evaluation, and also resolves inconsistency of $k$-NN and LP metrics. Second, for TL, we demonstrate that a weight decay parameter in SSL significantly affects the transferability of learned representations, which cannot be identified by LP or $k$-NN evaluations on the upstream dataset. We believe that the findings of this study will be beneficial for the community by drawing attention to the shortcomings in the current SSL evaluation schemes and underscoring the need to reconsider them.
[[2304.03495] Devil's on the Edges: Selective Quad Attention for Scene Graph Generation](http://arxiv.org/abs/2304.03495) #robust
Scene graph generation aims to construct a semantic graph structure from an image such that its nodes and edges respectively represent objects and their relationships. One of the major challenges for the task lies in the presence of distracting objects and relationships in images; contextual reasoning is strongly distracted by irrelevant objects or backgrounds and, more importantly, a vast number of irrelevant candidate relations. To tackle the issue, we propose the Selective Quad Attention Network (SQUAT) that learns to select relevant object pairs and disambiguate them via diverse contextual interactions. SQUAT consists of two main components: edge selection and quad attention. The edge selection module selects relevant object pairs, i.e., edges in the scene graph, which helps contextual reasoning, and the quad attention module then updates the edge features using both edge-to-node and edge-to-edge cross-attentions to capture contextual information between objects and object pairs. Experiments demonstrate the strong performance and robustness of SQUAT, achieving the state of the art on the Visual Genome and Open Images v6 benchmarks.
[[2304.03550] Hierarchical Disentanglement-Alignment Network for Robust SAR Vehicle Recognition](http://arxiv.org/abs/2304.03550) #robust
Due to Synthetic Aperture Radar (SAR) imaging characteristics, SAR vehicle recognition faces the problem of extracting discriminative and robust target features from a small dataset. Deep learning has shown impressive performance on the MSTAR dataset. However, data bias in a small dataset, such as background correlation, impairs the causality of these methods, i.e., discriminative features contain target and background differences. Moreover, different operating conditions of SAR lead to target signatures and background clutter variations in imaging results. However, many deep learning-based methods only verify robustness to target or background variations in the current experimental setting. In this paper, we propose a novel domain alignment framework named Hierarchical Disentanglement-Alignment Network (HDANet) to enhance features' causality and robustness. Concisely, HDANet consists of three parts. The first part uses data augmentation to generate signature variations for domain alignment. The second part disentangles the target features through a multitask-assisted mask to prevent non-causal clutter from interfering with subsequent alignment and recognition. The third part employs a contrastive loss for domain alignment to extract robust target features, and applies the SimSiam structure to mitigate conflicts between contrastive loss and feature discrimination. Finally, the proposed method shows high robustness across MSTAR's multiple target, sensor, and environment variants. Notably, we add a new scene variant to verify the robustness to target and background variations. Moreover, the saliency map and Shapley value qualitatively and quantitatively demonstrate causality. Our code is available at https://github.com/waterdisappear/SAR-ATR-HDANet.
[[2304.03347] On the Evaluations of ChatGPT and Emotion-enhanced Prompting for Mental Health Analysis](http://arxiv.org/abs/2304.03347) #robust
Automated mental health analysis shows great potential for enhancing the efficiency and accessibility of mental health care, whereas the recent dominant methods utilized pre-trained language models (PLMs) as the backbone and incorporated emotional information. The latest large language models (LLMs), such as ChatGPT, exhibit dramatic capabilities on diverse natural language processing tasks. However, existing studies on ChatGPT's zero-shot performance for mental health analysis have limitations in inadequate evaluation, utilization of emotional information, and explainability of methods. In this work, we comprehensively evaluate the mental health analysis and emotional reasoning ability of ChatGPT on 11 datasets across 5 tasks, including binary and multi-class mental health condition detection, cause/factor detection of mental health conditions, emotion recognition in conversations, and causal emotion entailment. We empirically analyze the impact of different prompting strategies with emotional cues on ChatGPT's mental health analysis ability and explainability. Experimental results show that ChatGPT outperforms traditional neural network methods but still has a significant gap with advanced task-specific methods. The qualitative analysis shows its potential in explainability compared with advanced black-box methods but also limitations on robustness and inaccurate reasoning. Prompt engineering with emotional cues is found to be effective in improving its performance on mental health analysis but requires the proper way of emotion infusion.
[[2304.03394] Deep Learning for Opinion Mining and Topic Classification of Course Reviews](http://arxiv.org/abs/2304.03394) #robust
Student opinions for a course are important to educators and administrators, regardless of the type of the course or the institution. Reading and manually analyzing open-ended feedback becomes infeasible for massive volumes of comments at institution level or online forums. In this paper, we collected and pre-processed a large number of course reviews publicly available online. We applied machine learning techniques with the goal of gaining insight into student sentiments and topics. Specifically, we utilized current Natural Language Processing (NLP) techniques, such as word embeddings and deep neural networks, and state-of-the-art BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (Robustly optimized BERT approach) and XLNet (Generalized Auto-regression Pre-training). We performed extensive experimentation to compare these techniques versus traditional approaches. This comparative study demonstrates how to apply modern machine learning approaches for sentiment polarity extraction and topic-based classification utilizing course feedback. For sentiment polarity, the top model was RoBERTa with 95.5% accuracy and 84.7% F1-macro, while for topic classification, an SVM (Support Vector Machine) was the top classifier with 79.8% accuracy and 80.6% F1-macro. We also provided an in-depth exploration of the effect of certain hyperparameters on the model performance and discussed our observations. These findings can be used by institutions and course providers as a guide for analyzing their own course feedback using NLP models towards self-evaluation and improvement.
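The abstract above reports both accuracy and F1-macro, and the two can diverge noticeably when classes are imbalanced. A minimal sketch of both metrics follows; the toy labels are invented for illustration and are not the paper's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Imbalanced toy data: eight positive reviews, two negative ones,
# with one negative review misclassified as positive.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

acc = accuracy(y_true, y_pred)   # 0.9
f1m = macro_f1(y_true, y_pred)   # (16/17 + 2/3) / 2, about 0.804
```

A single error on the minority class costs little accuracy but pulls the minority-class F1 down to 2/3, which is why macro-averaged F1 can sit well below accuracy.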
[[2304.03427] Cleansing Jewel: A Neural Spelling Correction Model Built On Google OCR-ed Tibetan Manuscripts](http://arxiv.org/abs/2304.03427) #robust
Scholars in the humanities rely heavily on ancient manuscripts to study history, religion, and socio-political structures in the past. Many efforts have been devoted to digitizing these precious manuscripts using Optical Character Recognition (OCR) technology, but most manuscripts were blemished over the centuries so that an OCR program cannot be expected to capture faded graphs and stains on pages. This work presents a neural spelling correction model built on Google OCR-ed Tibetan Manuscripts to auto-correct OCR-ed noisy output. This paper is divided into four sections: dataset, model architecture, training and analysis. First, we feature-engineered our raw Tibetan etext corpus into two sets of structured data frames -- a set of paired toy data and a set of paired real data. Then, we implemented a Confidence Score mechanism into the Transformer architecture to perform spelling correction tasks. According to the Loss and Character Error Rate, our Transformer + Confidence score mechanism architecture proves to be superior to Transformer, LSTM-2-LSTM and GRU-2-GRU architectures. Finally, to examine the robustness of our model, we analyzed erroneous tokens and visualized Attention and Self-Attention heatmaps in our model.
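The abstract above compares architectures by Character Error Rate (CER). The paper's exact evaluation code is not given here; the common definition, assumed in this sketch, is the Levenshtein edit distance between hypothesis and reference divided by the reference length.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,             # delete h from the hypothesis
                curr[j - 1] + 1,         # insert r into the hypothesis
                prev[j - 1] + (h != r),  # substitute, free if characters match
            ))
        prev = curr
    return prev[-1]

def cer(hyp, ref):
    """Character Error Rate relative to a non-empty reference string."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(hyp, ref) / len(ref)

# Classic example: "kitten" -> "sitting" needs 3 edits; the reference has 7 characters.
example = cer("kitten", "sitting")
```

For Tibetan text the unit of comparison matters (code points versus stacked syllable components); this sketch simply compares Python string elements, which is an assumption.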
[[2304.03439] Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4](http://arxiv.org/abs/2304.03439) #robust
Harnessing logical reasoning ability is a comprehensive natural language understanding endeavor. With the release of Generative Pretrained Transformer 4 (GPT-4), highlighted as "advanced" at reasoning tasks, we are eager to learn the GPT-4 performance on various logical reasoning tasks. This report analyses multiple logical reasoning datasets, with popular benchmarks like LogiQA and ReClor, and newly-released datasets like AR-LSAT. We test the multi-choice reading comprehension and natural language inference tasks with benchmarks requiring logical reasoning. We further construct a logical reasoning out-of-distribution dataset to investigate the robustness of ChatGPT and GPT-4. We also make a performance comparison between ChatGPT and GPT-4. Experiment results show that ChatGPT performs significantly better than the RoBERTa fine-tuning method on most logical reasoning benchmarks. GPT-4 shows even higher performance on our manual tests. Among benchmarks, ChatGPT and GPT-4 do relatively well on well-known datasets like LogiQA and ReClor. However, the performance drops significantly when handling newly released and out-of-distribution datasets. Logical reasoning remains challenging for ChatGPT and GPT-4, especially on out-of-distribution and natural language inference datasets.
[[2304.03365] Robust Decision-Focused Learning for Reward Transfer](http://arxiv.org/abs/2304.03365) #robust
Decision-focused (DF) model-based reinforcement learning has recently been introduced as a powerful algorithm which can focus on learning the MDP dynamics which are most relevant for obtaining high rewards. While this approach increases the performance of agents by focusing the learning towards optimizing for the reward directly, it does so by learning less accurate dynamics (from an MLE standpoint), and may thus be brittle to changes in the reward function. In this work, we develop the robust decision-focused (RDF) algorithm which leverages the non-identifiability of DF solutions to learn models which maximize expected returns while simultaneously learning models which are robust to changes in the reward function. We demonstrate on a variety of toy examples and healthcare simulators that RDF significantly increases the robustness of DF to changes in the reward function, without decreasing the overall return the agent obtains.
[[2304.03374] Optimizing Neural Networks through Activation Function Discovery and Automatic Weight Initialization](http://arxiv.org/abs/2304.03374) #robust
Automated machine learning (AutoML) methods improve upon existing models by optimizing various aspects of their design. While present methods focus on hyperparameters and neural network topologies, other aspects of neural network design can be optimized as well. To further the state of the art in AutoML, this dissertation introduces techniques for discovering more powerful activation functions and establishing more robust weight initialization for neural networks. These contributions improve performance, but also provide new perspectives on neural network optimization. First, the dissertation demonstrates that discovering solutions specialized to specific architectures and tasks gives better performance than reusing general approaches. Second, it shows that jointly optimizing different components of neural networks is synergistic, and results in better performance than optimizing individual components alone. Third, it demonstrates that learned representations are easier to optimize than hard-coded ones, creating further opportunities for AutoML. The dissertation thus makes concrete progress towards fully automatic machine learning in the future.
[[2304.03376] Interpretable statistical representations of neural population dynamics and geometry](http://arxiv.org/abs/2304.03376) #robust
The dynamics of neuron populations during diverse tasks often evolve on low-dimensional manifolds. However, it remains challenging to discern the contributions of geometry and dynamics for encoding relevant behavioural variables. Here, we introduce an unsupervised geometric deep learning framework for representing non-linear dynamical systems based on statistical distributions of local phase portrait features. Our method provides robust geometry-aware or geometry-agnostic representations for the unbiased comparison of dynamics based on measured trajectories. We demonstrate that our statistical representation can generalise across neural network instances to discriminate computational mechanisms, obtain interpretable embeddings of neural dynamics in a primate reaching task with geometric correspondence to hand kinematics, and develop a decoding algorithm with state-of-the-art accuracy. Our results highlight the importance of using the intrinsic manifold structure over temporal information to develop better decoding algorithms and assimilate data across experiments.
[[2304.03431] Domain Generalization In Robust Invariant Representation](http://arxiv.org/abs/2304.03431) #robust
Unsupervised approaches for learning representations invariant to common transformations are used quite often for object recognition. Learning invariances makes models more robust and practical to use in real-world scenarios. Since data transformations that do not change the intrinsic properties of the object cause the majority of the complexity in recognition tasks, models that are invariant to these transformations help reduce the amount of training data required. This further increases the model's efficiency and simplifies training. In this paper, we investigate the generalization of invariant representations on out-of-distribution data and try to answer the question: Do model representations invariant to some transformations in a particular seen domain also remain invariant in previously unseen domains? Through extensive experiments, we demonstrate that the invariant model learns unstructured latent representations that are robust to distribution shifts, thus making invariance a desirable property for training in resource-constrained settings.
[[2304.03580] Language-aware Multiple Datasets Detection Pretraining for DETRs](http://arxiv.org/abs/2304.03580) #extraction
Pretraining on large-scale datasets can boost the performance of object detectors while the annotated datasets for object detection are hard to scale up due to the high labor cost. What we possess are numerous isolated field-specific datasets; thus, it is appealing to jointly pretrain models across an aggregation of datasets to enhance data volume and diversity. In this paper, we propose a strong framework for utilizing Multiple datasets to pretrain DETR-like detectors, termed METR, without the need for manual label space integration. It converts the typical multi-classification in object detection into binary classification by introducing a pre-trained language model. Specifically, we design a category extraction module for extracting potential categories involved in an image and assign these categories into different queries by language embeddings. Each query is only responsible for predicting a class-specific object. Besides, to adapt our novel detection paradigm, we propose a group bipartite matching strategy that limits the ground truths to match queries assigned to the same category. Extensive experiments demonstrate that METR achieves extraordinary results on either multi-task joint training or the pretrain & finetune paradigm. Notably, our pre-trained models have highly flexible transferability and increase the performance upon various DETR-like detectors on COCO val2017 benchmark. Codes will be available after this paper is published.
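The group bipartite matching idea in the abstract above (ground truths may only match queries assigned to the same category) can be illustrated with a toy sketch. This is not METR's implementation: the function name, the cost matrix, and the brute-force search over permutations are all assumptions chosen to keep the example dependency-free; a practical detector would typically use a Hungarian solver per group.

```python
from itertools import permutations

def group_bipartite_match(query_cats, gt_cats, cost):
    """Match each ground truth to a distinct query of the same category.

    query_cats: category assigned to each query.
    gt_cats:    category of each ground-truth object.
    cost:       cost[q][g] is the matching cost of query q and ground truth g.
    Returns a dict mapping ground-truth index -> query index.
    """
    matches = {}
    for cat in sorted(set(gt_cats)):
        q_idx = [q for q, c in enumerate(query_cats) if c == cat]
        g_idx = [g for g, c in enumerate(gt_cats) if c == cat]
        if len(q_idx) < len(g_idx):
            raise ValueError(f"not enough queries for category {cat!r}")
        # Brute force over ordered query subsets; fine for tiny groups only.
        best = min(
            permutations(q_idx, len(g_idx)),
            key=lambda perm: sum(cost[q][g] for q, g in zip(perm, g_idx)),
        )
        for q, g in zip(best, g_idx):
            matches[g] = q
    return matches

query_cats = ["cat", "cat", "dog", "dog"]
gt_cats = ["dog", "cat"]
cost = [
    [0.9, 0.2],  # query 0 (cat)
    [0.8, 0.7],  # query 1 (cat)
    [0.1, 0.3],  # query 2 (dog)
    [0.5, 0.1],  # query 3 (dog)
]
matches = group_bipartite_match(query_cats, gt_cats, cost)
```

In this toy case the "cat" ground truth (index 1) has its globally cheapest match in query 3, but that query belongs to the "dog" group, so the restricted matching assigns it query 0 instead.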
[[2304.03608] ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation](http://arxiv.org/abs/2304.03608) #extraction
Image keypoints and descriptors play a crucial role in many visual measurement tasks. In recent years, deep neural networks have been widely used to improve the performance of keypoint and descriptor extraction. However, the conventional convolution operations do not provide the geometric invariance required for the descriptor. To address this issue, we propose the Sparse Deformable Descriptor Head (SDDH), which learns the deformable positions of supporting features for each keypoint and constructs deformable descriptors. Furthermore, SDDH extracts descriptors at sparse keypoints instead of a dense descriptor map, which enables efficient extraction of descriptors with strong expressiveness. In addition, we relax the neural reprojection error (NRE) loss from dense to sparse to train the extracted sparse descriptors. Experimental results show that the proposed network is both efficient and powerful in various visual measurement tasks, including image matching, 3D reconstruction, and visual relocalization.
[[2304.03754] Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering](http://arxiv.org/abs/2304.03754) #extraction
Causal Video Question Answering (CVidQA) queries not only association or temporal relations but also causal relations in a video. Existing question synthesis methods pre-trained question generation (QG) systems on reading comprehension datasets with text descriptions as inputs. However, QG models only learn to ask association questions (e.g., "what is someone doing...") and result in inferior performance due to the poor transfer of association knowledge to CVidQA, which focuses on causal questions like "why is someone doing ...". Observing this, we proposed to exploit causal knowledge to generate question-answer pairs, and proposed a novel framework, Causal Knowledge Extraction from Language Models (CaKE-LM), leveraging causal commonsense knowledge from language models to tackle CVidQA. To extract knowledge from LMs, CaKE-LM generates causal questions containing two events with one triggering another (e.g., "score a goal" triggers "soccer player kicking ball") by prompting LM with the action (soccer player kicking ball) to retrieve the intention (to score a goal). CaKE-LM significantly outperforms conventional methods by 4% to 6% of zero-shot CVidQA accuracy on NExT-QA and Causal-VidQA datasets. We also conduct comprehensive analyses and provide key findings for future research.
[[2304.03691] Feature Mining for Encrypted Malicious Traffic Detection with Deep Learning and Other Machine Learning Algorithms](http://arxiv.org/abs/2304.03691) #extraction
The popularity of encryption mechanisms poses a great challenge to malicious traffic detection. The reason is traditional detection techniques cannot work without the decryption of encrypted traffic. Currently, research on encrypted malicious traffic detection without decryption has focused on feature extraction and the choice of machine learning or deep learning algorithms. In this paper, we first provide an in-depth analysis of traffic features and compare different state-of-the-art traffic feature creation approaches, while proposing a novel concept for encrypted traffic feature which is specifically designed for encrypted malicious traffic analysis. In addition, we propose a framework for encrypted malicious traffic detection. The framework is a two-layer detection framework which consists of both deep learning and traditional machine learning algorithms. Through comparative experiments, it outperforms classical deep learning and traditional machine learning algorithms, such as ResNet and Random Forest. Moreover, to provide sufficient training data for the deep learning model, we also curate a dataset composed entirely of public datasets. The composed dataset is more comprehensive than using any public dataset alone. Lastly, we discuss the future directions of this research.
[[2304.03626] Asynchronous Federated Continual Learning](http://arxiv.org/abs/2304.03626) #federate
The standard class-incremental continual learning setting assumes a set of tasks seen one after the other in a fixed and predefined order. This is not very realistic in federated learning environments where each client works independently in an asynchronous manner getting data for the different tasks in time-frames and orders totally uncorrelated with the other ones. We introduce a novel federated learning setting (AFCL) where the continual learning of multiple tasks happens at each client with different orderings and in asynchronous time slots. We tackle this novel task using prototype-based learning, a representation loss, fractal pre-training, and a modified aggregation policy. Our approach, called FedSpace, effectively tackles this task as shown by the results on the CIFAR-100 dataset using 3 different federated splits with 50, 100, and 500 clients, respectively. The code and federated splits are available at https://github.com/LTTM/FedSpace.
[[2304.03728] Interpretable Unified Language Checking](http://arxiv.org/abs/2304.03728) #fair
Despite recent concerns about undesirable behaviors generated by large language models (LLMs), including non-factual, biased, and hateful language, we find LLMs are inherent multi-task language checkers based on their latent representations of natural and social knowledge. We present an interpretable, unified, language checking (UniLC) method for both human and machine-generated language that aims to check if language input is factual and fair. While fairness and fact-checking tasks have been handled separately with dedicated models, we find that LLMs can achieve high performance on a combination of fact-checking, stereotype detection, and hate speech detection tasks with a simple, few-shot, unified set of prompts. With the "1/2-shot" multi-task language checking method proposed in this work, the GPT3.5-turbo model outperforms fully supervised baselines on several language tasks. The simple approach and results suggest that based on strong latent knowledge representations, an LLM can be an adaptive and explainable tool for detecting misinformation, stereotypes, and hate speech.
[[2304.03646] Fairness through Aleatoric Uncertainty](http://arxiv.org/abs/2304.03646) #fair
We propose a unique solution to tackle the often-competing goals of fairness and utility in machine learning classification tasks. While fairness ensures that the model's predictions are unbiased and do not discriminate against any particular group, utility focuses on maximizing the accuracy of the model's predictions. Our aim is to investigate the relationship between uncertainty and fairness. Our approach leverages this concept by employing Bayesian learning to estimate the uncertainty in sample predictions where the estimation is independent of confounding effects related to the protected attribute. Through empirical evidence, we show that samples with low classification uncertainty are modeled more accurately and fairly than those with high uncertainty, which may have biased representations and higher prediction errors. To address the challenge of balancing fairness and utility, we propose a novel fairness-utility objective that is defined based on uncertainty quantification. The weights in this objective are determined by the level of uncertainty, allowing us to optimize both fairness and utility simultaneously. Experiments on real-world datasets demonstrate the effectiveness of our approach. Our results show that our method outperforms state-of-the-art methods in terms of the fairness-utility tradeoff and this applies to both group and individual fairness metrics. This work presents a fresh perspective on the trade-off between accuracy and fairness in machine learning and highlights the potential of using uncertainty as a means to achieve optimal fairness and utility.
[[2304.03745] Assessing Perceived Fairness from Machine Learning Developer's Perspective](http://arxiv.org/abs/2304.03745) #fair
Fairness in machine learning (ML) applications is an important practice for developers in research and industry. In ML applications, unfairness is triggered due to bias in the data, curation process, erroneous assumptions, and implicit bias rendered within the algorithmic development process. As ML applications come into broader use, developing fair ML applications is critical. Literature suggests multiple views on how fairness in ML is described from the users' perspective and students as future developers. In particular, ML developers have not been the focus of research relating to perceived fairness. This paper reports on a pilot investigation of ML developers' perception of fairness. In describing the perception of fairness, the paper performs an exploratory pilot study to assess the attributes of this construct using a systematic focus group of developers. In the focus group, we asked participants to discuss three questions: 1) What are the characteristics of fairness in ML? 2) What factors influence developers' belief about the fairness of ML? and 3) What practices and tools are utilized for fairness in ML development? The findings of this exploratory work from the focus group show that to assess fairness developers generally focus on the overall ML application design and development, i.e., business-specific requirements, data collection, pre-processing, in-processing, and post-processing. Thus, we conclude that the procedural aspects of organizational justice theory can explain developers' perception of fairness. The findings of this study can be utilized further to assist development teams in integrating fairness in the ML application development lifecycle. It will also motivate ML developers and organizations to develop best practices for assessing the fairness of ML-based applications.
[[2304.03322] Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models](http://arxiv.org/abs/2304.03322) #diffusion
Image inpainting refers to the task of generating a complete, natural image based on a partially revealed reference image. Recently, many research interests have been focused on addressing this problem using fixed diffusion models. These approaches typically directly replace the revealed region of the intermediate or final generated images with that of the reference image or its variants. However, since the unrevealed regions are not directly modified to match the context, it results in incoherence between revealed and unrevealed regions. To address the incoherence problem, a small number of methods introduce a rigorous Bayesian framework, but they tend to introduce mismatches between the generated and the reference images due to the approximation errors in computing the posterior distributions. In this paper, we propose COPAINT, which can coherently inpaint the whole image without introducing mismatches. COPAINT also uses the Bayesian framework to jointly modify both revealed and unrevealed regions, but approximates the posterior distribution in a way that allows the errors to gradually drop to zero throughout the denoising steps, thus strongly penalizing any mismatches with the reference image. Our experiments verify that COPAINT can outperform the existing diffusion-based methods under both objective and subjective metrics. The codes are available at https://github.com/UCSB-NLP-Chang/CoPaint/.
[[2304.03638] Compressed Regression over Adaptive Networks](http://arxiv.org/abs/2304.03638) #diffusion
In this work we derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem. Agents employ the recently proposed ACTC (adapt-compress-then-combine) diffusion strategy, where the signals exchanged locally by neighboring agents are encoded with randomized differential compression operators. We provide a detailed characterization of the mean-square estimation error, which is shown to comprise a term related to the error that agents would achieve without communication constraints, plus a term arising from compression. The analysis reveals quantitative relationships between the compression loss and fundamental attributes of the distributed regression problem, in particular, the stochastic approximation error caused by the gradient noise and the network topology (through the Perron eigenvector). We show that knowledge of such relationships is critical to allocate optimally the communication resources across the agents, taking into account their individual attributes, such as the quality of their data or their degree of centrality in the network topology. We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents. Illustrative examples show that a significant performance improvement, as compared to a blind (i.e., uniform) resource allocation, can be achieved by optimizing the allocation by means of the provided mean-square-error formulas.