[[2210.03638] Demystifying Quantum Blockchain for Healthcare](http://arxiv.org/abs/2210.03638)
The application of blockchain technology can be beneficial in the field of healthcare as well as in the fight against the COVID-19 epidemic. In this work, the importance of blockchain is analyzed and it is observed that blockchain technology and the processes associated with it will be utilised in the healthcare systems of the future for data acquisition from sensors, automatic patient monitoring, and secure data storage. This technology substantially simplifies the process of carrying out operations because it can store a substantial quantity of data in a dispersed and secure manner, and enable access whenever and wherever it is required. With the assistance of quantum blockchain, the benefits of quantum computing, such as the capability to acquire thermal imaging based on quantum computing and the speed with which patients may be located and monitored, can all be exploited to their full potential. Quantum blockchain is another tool that can be utilised to maintain the confidentiality, authenticity, and accessibility of data records. The processing of medical records could potentially benefit from greater speed and privacy if it combines quantum computing and blockchain technology. The authors of this paper investigate the possible benefits and applications of blockchain and quantum technologies in the field of medicine, pharmacy and healthcare systems. In this context, this work explored and compared quantum technologies and blockchain-based technologies in conjunction with other cutting-edge information and communications technologies such as artificial intelligence, machine learning, drones, and so on.
[[2210.03372] Pre-trained Adversarial Perturbations](http://arxiv.org/abs/2210.03372)
Self-supervised pre-training has drawn increasing attention in recent years due to its superior performance on numerous downstream tasks after fine-tuning. However, it is well-known that deep learning models lack the robustness to adversarial examples, which can also invoke security issues to pre-trained models, despite being less explored. In this paper, we delve into the robustness of pre-trained models by introducing Pre-trained Adversarial Perturbations (PAPs), which are universal perturbations crafted for the pre-trained models to maintain the effectiveness when attacking fine-tuned ones without any knowledge of the downstream tasks. To this end, we propose a Low-Level Layer Lifting Attack (L4A) method to generate effective PAPs by lifting the neuron activations of low-level layers of the pre-trained models. Equipped with an enhanced noise augmentation strategy, L4A is effective at generating more transferable PAPs against fine-tuned models. Extensive experiments on typical pre-trained vision models and ten downstream tasks demonstrate that our method improves the attack success rate by a large margin compared with state-of-the-art methods.
[[2210.03592] Specialized Re-Ranking: A Novel Retrieval-Verification Framework for Cloth Changing Person Re-Identification](http://arxiv.org/abs/2210.03592)
Cloth changing person re-identification (Re-ID) can work under more complicated scenarios with higher security than normal Re-ID and biometric techniques and is therefore extremely valuable in applications. Meanwhile, higher flexibility in appearance always leads to more similar-looking, confusing images, which is the weakness of the widely used retrieval methods. In this work, we shed light on how to handle these similar images. Specifically, we propose a novel retrieval-verification framework. Given an image, the retrieval module can search for similar images quickly. Our proposed verification network will then compare the input image and the candidate images by contrasting their local details and give a similarity score. An innovative ranking strategy is also introduced to strike a good balance between retrieval and verification results. Comprehensive experiments are conducted to show the effectiveness of our framework and its capability in improving the state-of-the-art methods remarkably on both synthetic and realistic datasets.
[[2210.03207] Threat Repair with Optimization Modulo Theories](http://arxiv.org/abs/2210.03207)
We propose a model-based procedure for automatically preventing security threats using formal models. We encode system models and potential threats as satisfiability modulo theory (SMT) formulas. This model allows us to ask security questions as satisfiability queries. We formulate threat prevention as an optimization problem over the same formulas. The outcome of our threat prevention procedure is a suggestion of model attribute repair that eliminates threats. Whenever threat prevention fails, we automatically explain why the threat happens. We implement our approach using the state-of-the-art Z3 SMT solver and interface it with the threat analysis tool THREATGET. We demonstrate the value of our procedure in two case studies from automotive and smart home domains, including an industrial-strength example.
[[2210.03254] Network Intrusion Detection System in a Light Bulb](http://arxiv.org/abs/2210.03254)
Internet of Things (IoT) devices are progressively being utilised in a variety of edge applications to monitor and control home and industry infrastructure. Due to the limited compute and energy resources, active security protections are usually minimal in many IoT devices. This has created a critical security challenge that has attracted researchers' attention in the field of network security. Despite a large number of proposed Network Intrusion Detection Systems (NIDSs), there is limited research into practical IoT implementations, and to the best of our knowledge, no edge-based NIDS has been demonstrated to operate on common low-power chipsets found in the majority of IoT devices, such as the ESP8266. This research aims to address this gap by pushing the boundaries on low-power Machine Learning (ML) based NIDSs. We propose and develop an efficient and low-power ML-based NIDS, and demonstrate its applicability for IoT edge applications by running it on a typical smart light bulb. We also evaluate our system against other proposed edge-based NIDSs and show that our model has a higher detection performance, and is significantly faster and smaller, and therefore more applicable to a wider range of IoT edge devices.
[[2210.03458] PAC Security: Automatic Privacy Measurement and Control of Data Processing](http://arxiv.org/abs/2210.03458)
We propose and study a new privacy definition, termed Probably Approximately Correct (PAC) Security. PAC security characterizes the information-theoretic hardness to recover sensitive data given arbitrary information disclosure/leakage during/after any processing. Unlike the classic cryptographic definition and Differential Privacy (DP), which consider the adversarial (input-independent) worst case, PAC security is a simulatable metric that accommodates priors and quantifies the instance-based impossibility of inference. A fully automatic analysis and proof generation framework is proposed, where security parameters can be produced with arbitrarily high confidence via Monte-Carlo simulation for any black-box data processing oracle. This appealing automation property enables analysis of complicated data processing, where the worst-case proof in the classic privacy regime could be loose or even intractable. Furthermore, we show that the magnitude of (necessary) perturbation required in PAC security is not explicitly dependent on dimensionality, which is in contrast to the worst-case information-theoretic lower bound. We also include practical applications of PAC security with comparisons.
[[2210.03518] LGTBIDS: Layer-wise Graph Theory Based Intrusion Detection System in Beyond 5G](http://arxiv.org/abs/2210.03518)
The advancement in wireless communication technologies is becoming more demanding and pervasive. One of the fundamental factors limiting the efficiency of the network is its security challenges. The communication network is vulnerable to security attacks such as spoofing attacks and signal strength attacks. Intrusion detection is a central approach to ensuring the security of the communication network. In this paper, an Intrusion Detection System based on the framework of graph theory is proposed. A Layerwise Graph Theory-Based Intrusion Detection System (LGTBIDS) algorithm is designed to detect the attacked node. The algorithm performs a layer-wise analysis to extract the vulnerable nodes and ultimately the attacked node(s). For each layer, every node is scanned for the possibility of susceptible node(s). The strategy of the IDS is based on the analysis of energy efficiency and secrecy rate. The nodes with energy efficiency and secrecy rate beyond the range of upper and lower thresholds are detected as the nodes under attack. Further, detected node(s) are transmitted a random sequence of bits, followed by the process of re-authentication. The obtained results validate the better performance, low computation time, and low complexity. Finally, the proposed approach is compared with the conventional solution of intrusion detection.
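The threshold rule described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' algorithm: the `Node` fields, the units, the threshold values, and the choice to flag a node when either metric leaves its range are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    layer: int
    energy_efficiency: float  # hypothetical metric value
    secrecy_rate: float       # hypothetical metric value


def in_range(value, bounds):
    """Return True when value lies inside the closed interval (low, high)."""
    low, high = bounds
    return low <= value <= high


def detect_attacked_nodes(nodes, ee_bounds, sr_bounds):
    """Scan nodes layer by layer and flag those outside either threshold range."""
    flagged = []
    for layer in sorted({n.layer for n in nodes}):
        for n in nodes:
            if n.layer != layer:
                continue
            # Assumption: a node is suspicious if either metric is out of range.
            if not in_range(n.energy_efficiency, ee_bounds) or not in_range(n.secrecy_rate, sr_bounds):
                flagged.append(n.node_id)
    return flagged


nodes = [
    Node("a", 1, 5.0, 2.0),
    Node("b", 1, 9.5, 2.1),  # energy efficiency above the upper threshold
    Node("c", 2, 5.2, 0.1),  # secrecy rate below the lower threshold
    Node("d", 2, 4.8, 1.9),
]
suspects = detect_attacked_nodes(nodes, ee_bounds=(4.0, 6.0), sr_bounds=(1.0, 3.0))
print(suspects)
```

In the approach described above, flagged nodes would then go through re-authentication; that step is not modelled here.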
[[2210.03205] Synthetic Dataset Generation for Privacy-Preserving Machine Learning](http://arxiv.org/abs/2210.03205)
Machine Learning (ML) has achieved enormous success in solving a variety of problems in computer vision, speech recognition, and object detection, to name a few. The principal reason for this success is the availability of huge datasets for training deep neural networks (DNNs). However, datasets cannot be publicly released if they contain sensitive information such as medical records, and data privacy becomes a major concern. Encryption methods could be a possible solution; however, their deployment on ML applications seriously impacts classification accuracy and results in substantial computational overhead. Alternatively, obfuscation techniques could be used, but maintaining a good trade-off between visual privacy and accuracy is challenging. In this paper, we propose a method to generate secure synthetic datasets from the original private datasets. Given a network with Batch Normalization (BN) layers pretrained on the original dataset, we first record the class-wise BN layer statistics. Next, we generate the synthetic dataset by optimizing random noise such that the synthetic data match the layer-wise statistical distribution of original images. We evaluate our method on image classification datasets (CIFAR10, ImageNet) and show that synthetic data can be used in place of the original CIFAR10/ImageNet data for training networks from scratch, producing comparable classification performance. Further, to analyze visual privacy provided by our method, we use Image Quality Metrics and show a high degree of visual dissimilarity between the original and synthetic images. Moreover, we show that our proposed method preserves data privacy under various privacy-leakage attacks including Gradient Matching Attack, Model Memorization Attack, and GAN-based Attack.
[[2210.03221] Q-LSTM Language Model -- Decentralized Quantum Multilingual Pre-Trained Language Model for Privacy Protection](http://arxiv.org/abs/2210.03221)
Large-scale language models are trained on a massive amount of natural language data that might encode or reflect our private information. With careful manipulation, malicious agents can reverse engineer the training data even if data sanitation and differential privacy algorithms were involved in the pre-training process. In this work, we propose a decentralized training framework to address privacy concerns in training large-scale language models. The framework consists of a cloud quantum language model built with Variational Quantum Classifiers (VQC) for sentence embedding and a local Long-Short Term Memory (LSTM) model. We use both intrinsic evaluation (loss, perplexity) and extrinsic evaluation (downstream sentiment analysis task) to evaluate the performance of our quantum language model. Our quantum model was comparable to its classical counterpart on all the above metrics. We also perform ablation studies to look into the effect of the size of VQC and the size of training data on the performance of the model. Our approach solves privacy concerns without sacrificing downstream task performance. The intractability of quantum operations on classical hardware ensures the confidentiality of the training data and makes it impossible to be recovered by any adversary.
[[2210.03403] TAN without a burn: Scaling Laws of DP-SGD](http://arxiv.org/abs/2210.03403)
Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently, in particular with the use of massive batches and aggregated data augmentations for a large number of steps. These techniques require much more compute than their non-private counterparts, shifting the traditional privacy-accuracy trade-off to a privacy-accuracy-compute trade-off and making hyper-parameter search virtually impossible for realistic scenarios. In this work, we decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements. We first use the tools of Rényi Differential Privacy (RDP) to show that the privacy budget, when not overcharged, only depends on the total amount of noise (TAN) injected throughout training. We then derive scaling laws for training models with DP-SGD to optimize hyper-parameters with more than a 100× reduction in computational budget. We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 points gain in accuracy for a privacy budget epsilon=8.
[[2210.03505] Private and Efficient Meta-Learning with Low Rank and Sparse Decomposition](http://arxiv.org/abs/2210.03505)
Meta-learning is critical for a variety of practical ML systems -- like personalized recommendations systems -- that are required to generalize to new tasks despite a small number of task-specific training points. Existing meta-learning techniques use two complementary approaches of either learning a low-dimensional representation of points for all tasks, or task-specific fine-tuning of a global model trained using all the tasks. In this work, we propose a novel meta-learning framework that combines both the techniques to enable handling of a large number of data-starved tasks. Our framework models network weights as a sum of low-rank and sparse matrices. This allows us to capture information from multiple domains together in the low-rank part while still allowing task specific personalization using the sparse part. We instantiate and study the framework in the linear setting, where the problem reduces to that of estimating the sum of a rank-$r$ and a $k$-column sparse matrix using a small number of linear measurements. We propose an alternating minimization method with hard thresholding -- AMHT-LRS -- to learn the low-rank and sparse part effectively and efficiently. For the realizable, Gaussian data setting, we show that AMHT-LRS indeed solves the problem efficiently with nearly optimal samples. We extend AMHT-LRS to ensure that it preserves privacy of each individual user in the dataset, while still ensuring strong generalization with nearly optimal number of samples. Finally, on multiple datasets, we demonstrate that the framework allows personalized models to obtain superior performance in the data-scarce regime.
[[2210.03520] Exploring the Relationships between Privacy by Design Schemes and Privacy Laws: A Comparative Analysis](http://arxiv.org/abs/2210.03520)
Internet of Things (IoT) applications have the potential to derive sensitive information about individuals. Therefore, developers must exercise due diligence to make sure that data are managed according to the privacy regulations and data protection laws. However, doing so can be a difficult and challenging task. Recent research has revealed that developers typically face difficulties when complying with regulations. One key reason is that, at times, regulations are vague, and could be challenging to extract and enact such legal requirements. In our research paper, we have conducted a systematic analysis of the data protection laws that are used across different continents, namely: (i) General Data Protection Regulations (GDPR), (ii) the Personal Information Protection and Electronic Documents Act (PIPEDA), (iii) the California Consumer Privacy Act (CCPA), (iv) Australian Privacy Principles (APPs), and (v) New Zealand's Privacy Act 1993. In this technical report, we presented the detailed results of the conducted framework analysis method to attain a comprehensive view of different data protection laws and highlighted the disparities, in order to assist developers in adhering to the regulations across different regions, along with creating a Combined Privacy Law Framework (CPLF). After that, we gave an overview of various Privacy by Design (PbD) schemes developed previously by different researchers. Then, the key principles and individuals' rights of the CPLF were mapped with the privacy principles, strategies, guidelines, and patterns of the Privacy by Design (PbD) schemes in order to investigate the gaps in existing schemes.
[[2210.03647] Learnware: Small Models Do Big](http://arxiv.org/abs/2210.03647)
There are complaints about current machine learning techniques, such as the requirement of a huge amount of training data and proficient training skills, the difficulty of continual learning, the risk of catastrophic forgetting, the leaking of private or proprietary data, etc. Most research efforts have focused on one of these issues separately, paying less attention to the fact that most issues are entangled in practice. The prevailing big model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed those issues, while becoming a serious source of carbon emissions. This article offers an overview of the learnware paradigm, which attempts to free users from building machine learning models from scratch, with the hope of reusing small models to do things even beyond their original purposes. The key ingredient is the specification, which enables a trained model to be adequately identified for reuse according to the requirements of future users who know nothing about the model in advance.
[[2210.03312] Distillation-Resistant Watermarking for Model Protection in NLP](http://arxiv.org/abs/2210.03312)
How can we protect the intellectual property of trained NLP models? Modern NLP models are prone to stealing by querying and distilling from their publicly exposed APIs. However, existing protection methods such as watermarking only work for images but are not applicable to text. We propose Distillation-Resistant Watermarking (DRW), a novel technique to protect NLP models from being stolen via distillation. DRW protects a model by injecting watermarks into the victim's prediction probability corresponding to a secret key and is able to detect such a key by probing a suspect model. We prove that a protected model still retains the original accuracy within a certain bound. We evaluate DRW on a diverse set of NLP tasks including text classification, part-of-speech tagging, and named entity recognition. Experiments show that DRW protects the original model and detects stealing suspects at 100% mean average precision for all four tasks while the prior method fails on two.
[[2210.03249] Joint Protection Scheme for Deep Neural Network Hardware Accelerators and Models](http://arxiv.org/abs/2210.03249)
Deep neural networks (DNNs) are utilized in numerous image processing, object detection, and video analysis tasks and need to be implemented using hardware accelerators to achieve practical speed. Logic locking is one of the most popular methods for preventing chip counterfeiting. Nevertheless, existing logic-locking schemes need to sacrifice the number of input patterns leading to wrong output under incorrect keys to resist the powerful satisfiability (SAT)-attack. Furthermore, DNN model inference is fault-tolerant. Hence, using a wrong key for those SAT-resistant logic-locking schemes may not affect the accuracy of DNNs. This makes the previous SAT-resistant logic-locking schemes ineffective at protecting DNN accelerators. Besides, to prevent DNN models from being illegally used, the models need to be obfuscated by the designers before they are provided to end-users. Previous obfuscation methods either require a long time to retrain the model or leak information about the model. This paper proposes a joint protection scheme for DNN hardware accelerators and models. The DNN accelerator is modified using a hardware key (Hkey) and a model key (Mkey). Different from previous logic locking, the Hkey, which is used to protect the accelerator, does not affect the output when it is wrong. As a result, the SAT attack can be effectively resisted. On the other hand, a wrong Hkey leads to a substantial increase in memory accesses, inference time, and energy consumption and makes the accelerator unusable. A correct Mkey can recover the DNN model that is obfuscated by the proposed method. Compared to previous model obfuscation schemes, our proposed method avoids model retraining and does not leak model information.
[[2210.03297] Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems](http://arxiv.org/abs/2210.03297)
Decision-based adversarial attacks construct inputs that fool a machine-learning model into making targeted mispredictions by making only hard-label queries. For the most part, these attacks have been applied directly to isolated neural network models. However, in practice, machine learning models are just a component of a much larger system. By adding just a single preprocessor in front of a classifier, we find that state-of-the-art query-based attacks are as much as seven times less effective at attacking a prediction pipeline than attacking the machine learning model alone. Hence, attacks that are unaware of this invariance inevitably waste a large number of queries to re-discover or overcome it. We, therefore, develop techniques to first reverse-engineer the preprocessor and then use this extracted information to attack the end-to-end system. Our extraction method requires only a few hundred queries to learn the preprocessors used by most publicly available model pipelines, and our preprocessor-aware attacks recover the same efficacy as just attacking the model alone. The code can be found at https://github.com/google-research/preprocessor-aware-black-box-attack.
[[2210.03543] A2: Efficient Automated Attacker for Boosting Adversarial Training](http://arxiv.org/abs/2210.03543)
Based on the significant improvement of model robustness by AT (Adversarial Training), various variants have been proposed to further boost the performance. Well-recognized methods have focused on different components of AT (e.g., designing loss functions and leveraging additional unlabeled data). It is generally accepted that stronger perturbations yield more robust models. However, how to generate stronger perturbations efficiently remains unaddressed. In this paper, we propose an efficient automated attacker called A2 to boost AT by generating the optimal perturbations on-the-fly during training. A2 is a parameterized automated attacker that searches the attacker space for the best attacker against the defense model and examples. Extensive experiments across different datasets demonstrate that A2 generates stronger perturbations with low extra cost and reliably improves the robustness of various AT methods against different attacks.
[[2210.03688] A Wolf in Sheep's Clothing: Spreading Deadly Pathogens Under the Disguise of Popular Music](http://arxiv.org/abs/2210.03688)
A Negative Pressure Room (NPR) is an essential requirement by the Bio-Safety Levels (BSLs) in biolabs or infectious-control hospitals to prevent deadly pathogens from being leaked from the facility. An NPR maintains a negative pressure inside with respect to the outside reference space so that microbes are contained inside of an NPR. Nowadays, differential pressure sensors (DPSs) are utilized by the Building Management Systems (BMSs) to control and monitor the negative pressure in an NPR. This paper demonstrates a non-invasive and stealthy attack on NPRs by spoofing a DPS at its resonant frequency. Our contributions are: (1) We show that DPSs used in NPRs typically have resonant frequencies in the audible range. (2) We use this finding to design malicious music to create resonance in DPSs, resulting in an overshooting in the DPS's normal pressure readings. (3) We show how the resonance in DPSs can fool the BMSs so that the NPR turns its negative pressure to a positive one, causing a potential *leak* of deadly microbes from NPRs. We do experiments on 8 DPSs from 5 different manufacturers to evaluate their resonant frequencies considering the sampling tube length and find resonance in 6 DPSs. We can achieve a 2.5 Pa change in negative pressure from a ~7 cm distance when a sampling tube is not present and from a ~2.5 cm distance for a 1 m sampling tube length. We also introduce an interval-time variation approach for an adversarial control over the negative pressure and show that the *forged* pressure can be varied within 12-33 Pa. Our attack is also capable of attacking multiple NPRs simultaneously. Moreover, we demonstrate our attack at a real-world NPR located in an anonymous bioresearch facility, which is FDA approved and follows CDC guidelines. We also provide countermeasures to prevent the attack.
[[2210.03719] BayesImposter: Bayesian Estimation Based ](http://arxiv.org/abs/2210.03719)
Over the last six years, several papers used memory deduplication to trigger various security issues, such as leaking heap addresses and causing bit flips in the physical memory. The most essential requirement for successful memory deduplication is to provide identical copies of a physical page. Recent works use a brute-force approach to create identical copies of a physical page, which is an inaccurate and time-consuming primitive from the attacker's perspective.
Our work begins to fill this gap by providing a domain-specific structured way to duplicate a physical page in cloud settings in the context of industrial control systems (ICSs). Here, we show a new attack primitive, *BayesImposter*, which shows that the attacker can duplicate the .bss section of the target control DLL file of cloud protocols using the *Bayesian estimation* technique. Our approach requires less memory (i.e., 4 KB compared to GB) and time (i.e., 13 minutes compared to hours) than the brute-force approach used in recent works. We point out that ICSs can be expressed as state-space models; hence, *Bayesian estimation* is an ideal choice to be combined with memory deduplication for a successful attack in cloud settings. To demonstrate the strength of *BayesImposter*, we create a real-world automation platform using a scaled-down automated high-bay warehouse and industrial-grade SIMATIC S7-1500 PLC from Siemens as a target ICS. We demonstrate that *BayesImposter* can predictively inject false commands into the PLC that can cause possible equipment damage with machine failure in the target ICS. Moreover, we show that *BayesImposter* is capable of adversarial control over the target ICS resulting in severe consequences, such as killing a person but making it look like an accident. Therefore, we also provide countermeasures to prevent the attack.
[[2210.03561] Empowering Graph Representation Learning with Test-Time Graph Transformation](http://arxiv.org/abs/2210.03561)
As powerful tools for representation learning on graphs, graph neural networks (GNNs) have facilitated various applications from drug discovery to recommender systems. Nevertheless, the effectiveness of GNNs is immensely challenged by issues related to data quality, such as distribution shift, abnormal features and adversarial attacks. Recent efforts have been made on tackling these issues from a modeling perspective which requires additional cost of changing model architectures or re-training model parameters. In this work, we provide a data-centric view to tackle these issues and propose a graph transformation framework named GTrans which adapts and refines graph data at test time to achieve better performance. We provide theoretical analysis on the design of the framework and discuss why adapting graph data works better than adapting the model. Extensive experiments have demonstrated the effectiveness of GTrans on three distinct scenarios for eight benchmark datasets where suboptimal data is presented. Remarkably, GTrans performs the best in most cases with improvements up to 2.8%, 8.2% and 3.8% over the best baselines on three experimental settings.
[[2210.03158] Neural Volumetric Mesh Generator](http://arxiv.org/abs/2210.03158)
Deep generative models have shown success in generating 3D shapes with different representations. In this work, we propose Neural Volumetric Mesh Generator (NVMG), which can generate novel and high-quality volumetric meshes. Unlike previous 3D generative models for point clouds, voxels, and implicit surfaces, the volumetric mesh representation is a ready-to-use representation in industry with details on both the surface and interior. Generating such highly structured data thus poses a significant challenge. We first propose a diffusion-based generative model to tackle this problem by generating voxelized shapes with close-to-reality outlines and structures. We can simply obtain a tetrahedral mesh as a template with the voxelized shape. Further, we use a voxel-conditional neural network to predict the smooth implicit surface conditioned on the voxels, and progressively project the tetrahedral mesh to the predicted surface under regularizations. The regularization terms are carefully designed so that they can (1) get rid of defects like flipping and high distortion; (2) enforce the regularity of the interior and surface structure during the deformation procedure for a high-quality final mesh. As shown in the experiments, our pipeline can generate high-quality artifact-free volumetric and surface meshes from random noise or a reference image without any post-processing. Compared with the state-of-the-art voxel-to-mesh deformation method, we show more robustness and better performance when taking generated voxels as input.
[[2210.03339] Dual Clustering Co-teaching with Consistent Sample Mining for Unsupervised Person Re-Identification](http://arxiv.org/abs/2210.03339)
In unsupervised person Re-ID, peer-teaching strategy leveraging two networks to facilitate training has been proven to be an effective method to deal with the pseudo label noise. However, training two networks with a set of noisy pseudo labels reduces the complementarity of the two networks and results in label noise accumulation. To handle this issue, this paper proposes a novel Dual Clustering Co-teaching (DCCT) approach. DCCT mainly exploits the features extracted by two networks to generate two sets of pseudo labels separately by clustering with different parameters. Each network is trained with the pseudo labels generated by its peer network, which can increase the complementarity of the two networks to reduce the impact of noises. Furthermore, we propose dual clustering with dynamic parameters (DCDP) to make the network adaptive and robust to dynamically changing clustering parameters. Moreover, Consistent Sample Mining (CSM) is proposed to find the samples with unchanged pseudo labels during training for potential noisy sample removal. Extensive experiments demonstrate the effectiveness of the proposed method, which outperforms the state-of-the-art unsupervised person Re-ID methods by a considerable margin and surpasses most methods utilizing camera information.
[[2210.03382] Temporal Feature Alignment in Contrastive Self-Supervised Learning for Human Activity Recognition](http://arxiv.org/abs/2210.03382)
Automated Human Activity Recognition has long been a problem of great interest in human-centered and ubiquitous computing. In recent years, a plethora of supervised learning algorithms based on deep neural networks has been suggested to address this problem using various modalities. While every modality has its own limitations, there is one common challenge: supervised learning requires vast amounts of annotated data, which is hard to collect in practice. In this paper, we benefit from the self-supervised learning (SSL) paradigm that is typically used to learn deep feature representations from unlabeled data. Moreover, we upgrade SimCLR, a contrastive SSL framework widely used in various applications, by introducing a temporal feature alignment procedure for Human Activity Recognition. Specifically, we propose integrating a dynamic time warping (DTW) algorithm in a latent space to force features to be aligned in a temporal dimension. Extensive experiments have been conducted for the unimodal scenario with inertial modality as well as in multimodal settings using inertial and skeleton data. According to the obtained results, the proposed approach has great potential in learning robust feature representations compared to the recent SSL baselines, and clearly outperforms supervised models in semi-supervised learning. The code for the unimodal case is available via the following link: https://github.com/bulatkh/csshar_tfa.
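The temporal alignment step builds on dynamic time warping. The following is a minimal NumPy sketch of classic DTW between two feature sequences, not the authors' implementation: the function name `dtw_distance` and the Euclidean frame cost are assumptions made for illustration, and the paper applies DTW to learned latent features rather than to toy arrays.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors.

    a: array-like of shape (n, d); b: array-like of shape (m, d).
    The per-frame cost is the Euclidean distance (an assumption of this sketch).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]
```

Because a frame may be matched to several frames of the other sequence, a sequence and a time-stretched copy of it have zero DTW cost, which is the property that makes DTW suitable for aligning activities performed at different speeds.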
[[2210.03429] Adversarially Robust Prototypical Few-shot Segmentation with Neural-ODEs](http://arxiv.org/abs/2210.03429)
Few-shot Learning (FSL) methods are being adopted in settings where data is not abundantly available. This is especially seen in medical domains where the annotations are expensive to obtain. Deep Neural Networks have been shown to be vulnerable to adversarial attacks. This is even more severe in the case of FSL due to the lack of a large number of training examples. In this paper, we provide a framework to make few-shot segmentation models adversarially robust in the medical domain where such attacks can severely impact the decisions made by clinicians who use them. We propose a novel robust few-shot segmentation framework, Prototypical Neural Ordinary Differential Equation (PNODE), that provides defense against gradient-based adversarial attacks. We show that our framework is more robust compared to traditional adversarial defense mechanisms such as adversarial training. Adversarial training involves increased training time and shows robustness to limited types of attacks depending on the type of adversarial examples seen during training. Our proposed framework generalises well to common adversarial attacks like FGSM, PGD and SMIA while having the model parameters comparable to the existing few-shot segmentation models. We show the effectiveness of our proposed approach on three publicly available multi-organ segmentation datasets in both in-domain and cross-domain settings by attacking the support and query sets without the need for ad-hoc adversarial training.
[[2210.03433] PS-ARM: An End-to-End Attention-aware Relation Mixer Network for Person Search](http://arxiv.org/abs/2210.03433)
Person search is a challenging problem with various real-world applications that aims at joint person detection and re-identification of a query person from uncropped gallery images. Although previous studies focus on learning rich feature information, it is still hard to retrieve the query person due to the occurrence of appearance deformations and background distractors. In this paper, we propose a novel attention-aware relation mixer (ARM) module for person search, which exploits the global relation between different local regions within the RoI of a person and makes it robust against various appearance deformations and occlusion. The proposed ARM is composed of a relation mixer block and a spatio-channel attention layer. The relation mixer block introduces a spatially attended spatial mixing and a channel-wise attended channel mixing for effectively capturing discriminative relation features within an RoI. These discriminative relation features are further enriched by introducing a spatio-channel attention where the foreground and background discriminability is empowered in a joint spatio-channel space. Our ARM module is generic and does not rely on fine-grained supervision or topological assumptions, and can hence be easily integrated into any Faster R-CNN based person search method. Comprehensive experiments are performed on two challenging benchmark datasets: CUHK-SYSU and PRW. Our PS-ARM achieves state-of-the-art performance on both datasets. On the challenging PRW dataset, our PS-ARM achieves an absolute gain of 5 in the mAP score over SeqNet, while operating at a comparable speed.
[[2210.03659] Spatio-temporal Tendency Reasoning for Human Body Pose and Shape Estimation from Videos](http://arxiv.org/abs/2210.03659)
In this paper, we present a spatio-temporal tendency reasoning (STR) network for recovering human body pose and shape from videos. Previous approaches have focused on how to extend 3D human datasets and temporal-based learning to promote accuracy and temporal smoothing. Different from them, our STR aims to learn accurate and natural motion sequences in an unconstrained environment through temporal and spatial tendency and to fully excavate the spatio-temporal features of existing video data. To this end, our STR learns the representation of features in the temporal and spatial dimensions respectively, to concentrate on a more robust representation of spatio-temporal features. More specifically, for efficient temporal modeling, we first propose a temporal tendency reasoning (TTR) module. TTR constructs a time-dimensional hierarchical residual connection representation within a video sequence to effectively reason about the tendencies of temporal sequences and retain effective dissemination of human information. Meanwhile, to enhance the spatial representation, we design a spatial tendency enhancing (STE) module that further learns to excite spatially time-frequency domain sensitive features in human motion information representations. Finally, we introduce integration strategies to integrate and refine the spatio-temporal feature representations. Extensive experimental findings on large-scale publicly available datasets reveal that our STR remains competitive with the state-of-the-art on three datasets. Our code is available at https://github.com/Changboyang/STR.git.
[[2210.03319] Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow Interpolation](http://arxiv.org/abs/2210.03319)
This paper investigates an unsupervised approach towards deriving a universal, cross-lingual word embedding space, where words with similar semantics from different languages are close to one another. Previous adversarial approaches have shown promising results in inducing cross-lingual word embedding without parallel data. However, the training stage shows instability for distant language pairs. Instead of mapping the source language space directly to the target language space, we propose to make use of a sequence of intermediate spaces for smooth bridging. Each intermediate space may be conceived as a pseudo-language space and is introduced via simple linear interpolation. This approach is modeled after domain flow in computer vision, but with a modified objective function. Experiments on intrinsic Bilingual Dictionary Induction tasks show that the proposed approach can improve the robustness of adversarial models with comparable and even better precision. Further experiments on the downstream task of Cross-Lingual Natural Language Inference show that the proposed model achieves significant performance improvement for distant language pairs in downstream tasks compared to state-of-the-art adversarial and non-adversarial models.
[[2210.03378] UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation](http://arxiv.org/abs/2210.03378)
This paper presents our strategy to address the SemEval-2022 Task 3 PreTENS: Presupposed Taxonomies Evaluating Neural Network Semantics. The goal of the task is to identify if a sentence is deemed acceptable or not, depending on the taxonomic relationship that holds between a noun pair contained in the sentence. For sub-task 1 -- binary classification -- we propose an effective way to enhance the robustness and the generalizability of language models for better classification on this downstream task. We design a two-stage fine-tuning procedure on the ELECTRA language model using data augmentation techniques. Rigorous experiments are carried out using multi-task learning and data-enriched fine-tuning. Experimental results demonstrate that our proposed model, UU-Tax, is indeed able to generalize well for our downstream task. For sub-task 2 -- regression -- we propose a simple classifier that trains on features obtained from Universal Sentence Encoder (USE). In addition to describing the submitted systems, we discuss other experiments that employ pre-trained language models and data augmentation techniques. For both sub-tasks, we perform error analysis to further understand the behaviour of the proposed models. We achieved a global F1_Binary score of 91.25% in sub-task 1 and a rho score of 0.221 in sub-task 2.
[[2210.03454] DABERT: Dual Attention Enhanced BERT for Semantic Matching](http://arxiv.org/abs/2210.03454)
Transformer-based pre-trained language models such as BERT have achieved remarkable results in Semantic Sentence Matching. However, existing models still suffer from insufficient ability to capture subtle differences. Minor noise like word addition, deletion, and modification of sentences may cause flipped predictions. To alleviate this problem, we propose a novel Dual Attention Enhanced BERT (DABERT) to enhance the ability of BERT to capture fine-grained differences in sentence pairs. DABERT comprises (1) a Dual Attention module, which measures soft word matches by introducing a new dual channel alignment mechanism to model affinity and difference attention; and (2) an Adaptive Fusion module, which uses attention to learn the aggregation of difference and affinity features and generates a vector describing the matching details of sentence pairs. We conduct extensive experiments on well-studied semantic matching and robustness test datasets, and the experimental results show the effectiveness of our proposed method.
[[2210.03696] NMTSloth: Understanding and Testing Efficiency Degradation of Neural Machine Translation Systems](http://arxiv.org/abs/2210.03696)
Neural Machine Translation (NMT) systems have received much recent attention due to their human-level accuracy. While existing works mostly focus on either improving accuracy or testing accuracy robustness, the computation efficiency of NMT systems, which is of paramount importance due to often vast translation demands and real-time requirements, has surprisingly received little attention. In this paper, we make the first attempt to understand and test potential computation efficiency robustness in state-of-the-art NMT systems. By analyzing the working mechanism and implementation of 1455 publicly accessible NMT systems, we observe a fundamental property in NMT systems that could be manipulated in an adversarial manner to reduce computation efficiency significantly. Our key motivation is to generate test inputs that could sufficiently delay the generation of EOS such that NMT systems would have to go through enough iterations to satisfy the pre-configured threshold. We present NMTSloth, which develops a gradient-guided technique that searches for a minimal and unnoticeable perturbation at character-level, token-level, and structure-level, which sufficiently delays the appearance of EOS and forces these inputs to reach the naturally-unreachable threshold. To demonstrate the effectiveness of NMTSloth, we conduct a systematic evaluation on three publicly available NMT systems: Google T5, AllenAI WMT14, and Helsinki-NLP translators. Experimental results show that NMTSloth can increase NMT systems' response latency and energy consumption by 85% to 3153% and 86% to 3052%, respectively, by perturbing just one character or token in the input sentence. Our case study shows that inputs generated by NMTSloth significantly affect the battery power in real-world mobile devices (i.e., drain more than 30 times as much battery power as normal inputs).
[[2210.03120] GBSVM: Granular-ball Support Vector Machine](http://arxiv.org/abs/2210.03120)
GBSVM (Granular-ball Support Vector Machine) is an important attempt to use the coarse granularity of a granular-ball as the input to construct a classifier instead of a data point. It is the first classifier whose input contains no points, i.e., $x_i$, in the history of machine learning. However, on the one hand, its dual model is not derived, and the algorithm has not been implemented and can not be applied. On the other hand, there are some errors in its existing model. To address these problems, this paper has fixed the errors of the original model of GBSVM, and derived its dual model. Furthermore, an algorithm is designed using particle swarm optimization algorithm to solve the dual model. The experimental results on the UCI benchmark datasets demonstrate that GBSVM has good robustness and efficiency.
[[2210.03122] Temporal Spatial Decomposition and Fusion Network for Time Series Forecasting](http://arxiv.org/abs/2210.03122)
Feature engineering is required to obtain better results for time series forecasting, and decomposition is a crucial step. A single decomposition approach often cannot be used for numerous forecasting tasks, since standard time series decomposition lacks flexibility and robustness. Traditional feature selection relies heavily on preexisting domain knowledge, has no generic methodology, and requires a lot of labor. In addition, most time series prediction models based on deep learning suffer from interpretability issues, so their "black box" results lead to a lack of confidence. Addressing the above issues is the motivation for this work. In this paper, we propose TSDFNet, a neural network with a self-decomposition mechanism and an attentive feature fusion mechanism. It abandons feature engineering as a preprocessing convention and creatively integrates it as an internal module of the deep model. The self-decomposition mechanism gives TSDFNet extensible and adaptive decomposition capabilities for any time series: users can choose their own basis functions to decompose the sequence into temporal and generalized spatial dimensions. The attentive feature fusion mechanism captures the importance of external variables and their causality with target variables. It can automatically suppress unimportant features while enhancing effective ones, so that users do not have to struggle with feature selection. Moreover, TSDFNet makes it easy to look into the "black box" of the deep neural network through feature visualization and to analyze the prediction results. We demonstrate performance improvements over existing widely accepted models on more than a dozen datasets, and three experiments showcase the interpretability of TSDFNet.
[[2210.03123] Enhancing Mixup-Based Graph Learning for Language Processing via Hybrid Pooling](http://arxiv.org/abs/2210.03123)
Graph neural networks (GNNs) have recently been popular in natural language and programming language processing, particularly in text and source code classification. Graph pooling, which aggregates node representations into a representation of the entire graph that can be used for multiple downstream tasks (e.g., graph classification), is a crucial component of GNNs. Recently, to enhance graph learning, Manifold Mixup, a data augmentation strategy that mixes the graph data vector after the pooling layer, has been introduced. However, since there are a series of graph pooling methods, how they affect the effectiveness of such a Mixup approach is unclear. In this paper, we take the first step to explore the influence of graph pooling methods on the effectiveness of the Mixup-based data augmentation approach. Specifically, 9 types of hybrid pooling methods are considered in the study, e.g., $\mathcal{M}_{sum}(\mathcal{P}_{att},\mathcal{P}_{max})$. The experimental results on both natural language datasets (Gossipcop, Politifact) and programming language datasets (Java250, Python800) demonstrate that hybrid pooling methods are more suitable for Mixup than the standard max pooling and the state-of-the-art graph multiset transformer (GMT) pooling, in terms of metric accuracy and robustness.
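A hybrid pooling operator of the sum type, combining attention pooling with max pooling, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function names and the single scoring vector `w` used for attention are hypothetical, and the paper's attention pooling may be parameterized differently.

```python
import numpy as np


def max_pool(h):
    """Max pooling: element-wise maximum over the node axis of h, shape (nodes, dim)."""
    return h.max(axis=0)


def attention_pool(h, w):
    """Attention pooling with a hypothetical scoring vector w of shape (dim,)."""
    scores = h @ w
    scores = scores - scores.max()  # subtract the max for a numerically stable softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ h  # attention-weighted sum of node features


def hybrid_sum_pool(h, w):
    """Sum-type hybrid: add the attention-pooled and max-pooled graph vectors."""
    return attention_pool(h, w) + max_pool(h)
```

For example, with node features `np.eye(2)` and a zero scoring vector, attention pooling reduces to the mean of the nodes and the hybrid output is the mean plus the element-wise maximum. Under Manifold Mixup, it is this pooled graph vector that would be mixed.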
[[2210.03150] Towards Out-of-Distribution Adversarial Robustness](http://arxiv.org/abs/2210.03150)
Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fails to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different $L_p$ norms, we show that there is potential for improvement against many commonly used attacks by adopting a domain generalisation approach. Concretely, we treat each type of attack as a domain, and apply the Risk Extrapolation method (REx), which promotes similar levels of robustness against all training attacks. Compared to existing methods, we obtain similar or superior worst-case adversarial robustness on attacks seen during training. Moreover, we achieve superior performance on families or tunings of attacks only encountered at test time. On ensembles of attacks, our approach improves the accuracy from 3.4% for the best existing baseline to 25.9% on MNIST, and from 16.9% to 23.5% on CIFAR10.
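Treating each attack as a domain, the variance-penalized form of Risk Extrapolation (V-REx) adds a penalty on the spread of the per-domain risks. The sketch below assumes the V-REx variant and a hypothetical penalty weight `beta`; the abstract does not say which REx variant or weight the authors use, and the per-attack losses would in practice come from a model evaluated under each attack.

```python
import numpy as np


def vrex_objective(per_attack_losses, beta=1.0):
    """V-REx style objective: sum of per-domain risks plus beta times their variance.

    per_attack_losses: one scalar loss per attack type (each attack is a domain).
    beta: hypothetical penalty weight; larger values push the risks to be equal.
    """
    risks = np.asarray(per_attack_losses, dtype=float)
    return risks.sum() + beta * risks.var()
```

When all attacks incur the same loss the penalty vanishes, so the objective only adds pressure when robustness is uneven across attacks.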
[[2210.03164] InfoOT: Information Maximizing Optimal Transport](http://arxiv.org/abs/2210.03164)
Optimal transport aligns samples across distributions by minimizing the transportation cost between them, e.g., the geometric distances. Yet, it ignores coherence structure in the data such as clusters, does not handle outliers well, and cannot integrate new data points. To address these drawbacks, we propose InfoOT, an information-theoretic extension of optimal transport that maximizes the mutual information between domains while minimizing geometric distances. The resulting objective can still be formulated as a (generalized) optimal transport problem, and can be efficiently solved by projected gradient descent. This formulation yields a new projection method that is robust to outliers and generalizes to unseen samples. Empirically, InfoOT improves the quality of alignments across benchmarks in domain adaptation, cross-domain retrieval, and single-cell alignment.
[[2210.03275] Out-of-Distribution Generalization in Algorithmic Reasoning Through Curriculum Learning](http://arxiv.org/abs/2210.03275)
Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks, and is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules can solve problems independently of the particular values of the variables. Large transformer-based language models have pushed the boundaries on how well neural networks can generalize to novel inputs, but their complexity obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in smaller scale transformers. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on complex problems if the training set includes examples sampled from the whole distribution of simpler component tasks.
[[2210.03675] Koopman Neural Forecaster for Time Series with Temporal Distribution Shifts](http://arxiv.org/abs/2210.03675)
Temporal distributional shifts, with underlying dynamics changing over time, frequently occur in real-world time series, and pose a fundamental challenge for deep neural networks (DNNs). In this paper, we propose a novel deep sequence model based on the Koopman theory for time series forecasting: Koopman Neural Forecaster (KNF) that leverages DNNs to learn the linear Koopman space and the coefficients of chosen measurement functions. KNF imposes appropriate inductive biases for improved robustness against distributional shifts, employing both a global operator to learn shared characteristics, and a local operator to capture changing dynamics, as well as a specially-designed feedback loop to continuously update the learnt operators over time for rapidly varying behaviors. To the best of our knowledge, this is the first time that Koopman theory is applied to real-world chaotic time series without known governing laws. We demonstrate that KNF achieves the superior performance compared to the alternatives, on multiple time series datasets that are shown to suffer from distribution shifts.
[[2210.03731] Demystifying Map Space Exploration for NPUs](http://arxiv.org/abs/2210.03731)
Map Space Exploration is the problem of finding optimized mappings of a Deep Neural Network (DNN) model on an accelerator. It is known to be extremely computationally expensive, and there has been active research looking at both heuristics and learning-based methods to make the problem computationally tractable. However, while there are dozens of mappers out there (all empirically claiming to find better mappings than others), the research community lacks systematic insights on how different search techniques navigate the map-space and how different mapping axes contribute to the accelerator's performance and efficiency. Such insights are crucial to developing mapping frameworks for emerging DNNs that are increasingly irregular (due to neural architecture search) and sparse, making the corresponding map spaces much more complex. In this work, rather than proposing yet another mapper, we do a first-of-its-kind apples-to-apples comparison of search techniques leveraged by different mappers. Next, we extract the learnings from our study and propose two new techniques that can augment existing mappers -- warm-start and sparsity-aware -- that demonstrate speedups, scalability, and robustness across diverse DNN models.
[[2210.03453] Key Information Extraction in Purchase Documents using Deep Learning and Rule-based Corrections](http://arxiv.org/abs/2210.03453)
Deep Learning (DL) has come to dominate the fields of Natural Language Processing (NLP) and Computer Vision (CV) in recent times. However, DL commonly relies on the availability of large data annotations, so alternative or complementary pattern-based techniques can help to improve results. In this paper, we build upon Key Information Extraction (KIE) in purchase documents using both DL and rule-based corrections. Our system initially relies on Optical Character Recognition (OCR) and text understanding based on entity tagging to identify purchase facts of interest (e.g., product codes, descriptions, quantities, or prices). These facts are then linked to the same product group, which is recognized by means of line detection and some grouping heuristics. Once these DL approaches have been applied, we contribute several mechanisms consisting of rule-based corrections for improving the baseline DL predictions. We demonstrate the enhancements provided by these rule-based corrections over the baseline DL results in the presented experiments for purchase documents from public and NielsenIQ datasets.
[[2210.03337] A Unified Framework for Multi-intent Spoken Language Understanding with prompting](http://arxiv.org/abs/2210.03337)
Multi-intent Spoken Language Understanding has great potential for widespread implementation. Jointly modeling Intent Detection (ID) and Slot Filling (SF) in it provides a channel to exploit the correlation between intents and slots. However, current approaches tend to formulate these two sub-tasks differently, which leads to two issues: 1) It hinders models from effectively extracting shared features. 2) Rather complicated structures are needed to enhance expressive ability, at the cost of the interpretability of frameworks. In this work, we describe a Prompt-based Spoken Language Understanding (PromptSLU) framework that intuitively unifies the two sub-tasks into the same form by using a common pre-trained Seq2Seq model. In detail, ID and SF are completed by concisely filling the utterance into task-specific prompt templates as input, and sharing an output format of key-value pair sequences. Furthermore, variable intents are predicted first, then naturally embedded into prompts to guide slot-value pair inference from a semantic perspective. Finally, inspired by prevalent multi-task learning, we introduce an auxiliary sub-task, which helps to learn relationships among provided labels. Experimental results show that our framework outperforms several state-of-the-art baselines on two public datasets.
[[2210.03419] Event Extraction: A Survey](http://arxiv.org/abs/2210.03419)
Extracting the reported events from text is one of the key research themes in natural language processing. This process includes several tasks such as event detection, argument extraction, and role labeling. As one of the most important topics in natural language processing and natural language understanding, event extraction has applications that span a wide range of domains such as newswire, the biomedical domain, history and humanity, and cyber security. This report presents a comprehensive survey for event detection from textual documents. In this report, we provide the task definition, the evaluation method, as well as the benchmark datasets and a taxonomy of methodologies for event extraction. We also present our vision of future research directions in event detection.
[[2210.03690] Few-Shot Anaphora Resolution in Scientific Protocols via Mixtures of In-Context Experts](http://arxiv.org/abs/2210.03690)
Anaphora resolution is an important task for information extraction across a range of languages, text genres, and domains, motivating the need for methods that do not require large annotated datasets. In-context learning has emerged as a promising approach, yet there are a number of challenges in applying in-context learning to resolve anaphora. For example, encoding a single in-context demonstration that consists of: an anaphor, a paragraph-length context, and a list of corresponding antecedents, requires conditioning a language model on a long sequence of tokens, limiting the number of demonstrations per prompt. In this paper, we present MICE (Mixtures of In-Context Experts), which we demonstrate is effective for few-shot anaphora resolution in scientific protocols (Tamari et al., 2021). Given only a handful of training examples, MICE combines the predictions of hundreds of in-context experts, yielding a 30% increase in F1 score over a competitive prompt retrieval baseline. Furthermore, we show MICE can be used to train compact student models without sacrificing performance. As far as we are aware, this is the first work to present experimental results demonstrating the effectiveness of in-context learning on the task of few-shot anaphora resolution in scientific protocols.
[[2210.03277] Rethinking Normalization Methods in Federated Learning](http://arxiv.org/abs/2210.03277)
Federated learning (FL) is a popular distributed learning framework that can reduce privacy risks by not explicitly sharing private data. In this work, we explicitly uncover the external covariate shift problem in FL, which is caused by the independent local training processes on different devices. We demonstrate that external covariate shifts will lead to the obliteration of some devices' contributions to the global model. Further, we show that normalization layers are indispensable in FL since their inherent properties can alleviate the problem of obliterating some devices' contributions. However, recent works have shown that batch normalization, which is one of the standard components in many deep neural networks, incurs an accuracy drop of the global model in FL. The essential reason for the failure of batch normalization in FL is poorly studied. We unveil that external covariate shift is the key reason why batch normalization is ineffective in FL. We also show that layer normalization is a better choice in FL, as it can mitigate the external covariate shift and improve the performance of the global model. We conduct experiments on CIFAR10 under non-IID settings. The results demonstrate that models with layer normalization converge fastest and achieve the best or comparable accuracy for three different model architectures.
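The contrast between the two normalization layers can be seen in a small NumPy sketch. It omits the learnable scale and shift parameters and the running statistics of real layers (an assumption made for brevity), and the function names are illustrative. Batch normalization uses statistics computed across the samples in a batch, so its output for a given sample depends on the local data it is batched with, whereas layer normalization uses only that sample's own features.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Normalize each feature using statistics across the batch axis (axis 0)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def layer_norm(x, eps=1e-5):
    """Normalize each sample using statistics across its own features (axis 1)."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


# The same sample placed in two batches with very different companions,
# mimicking two devices that hold differently distributed local data.
sample = np.array([1.0, 2.0, 3.0])
batch_a = np.stack([sample, np.array([0.0, 0.0, 0.0])])
batch_b = np.stack([sample, np.array([10.0, 20.0, 30.0])])
```

Here `layer_norm(batch_a)[0]` equals `layer_norm(batch_b)[0]`, while the corresponding `batch_norm` outputs differ, which illustrates why per-sample statistics are less sensitive to how data is distributed across devices.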
[[2210.03444] Depersonalized Federated Learning: Tackling Statistical Heterogeneity by Alternating Stochastic Gradient Descent](http://arxiv.org/abs/2210.03444)
Federated learning (FL), which enables distributed devices to cooperatively train a common machine learning (ML) model for intelligent inference without sharing data, has gained increasing attention recently.
However, the raw data held by the various participants are always non-independent-and-identically-distributed (non-i.i.d.), which results in slow convergence of the FL training process.
To address this issue, we propose a new FL method that can significantly mitigate statistical heterogeneity through a depersonalized mechanism.
In particular, we decouple the global and local objectives, which are optimized by performing stochastic gradient descent alternately, to reduce the accumulated variance on the global model (generated in local update phases), hence accelerating the FL convergence.
Then we analyze the proposed method in detail to show that it converges at a sublinear rate in the general non-convex setting.
Finally, extensive numerical experiments are conducted on public datasets to verify the effectiveness of our proposed method.
[[2210.03175] Evaluating Fairness Without Sensitive Attributes: A Framework Using Only Auxiliary Models](http://arxiv.org/abs/2210.03175)
Although the volume of literature and public attention on machine learning fairness has been growing significantly, in practice some tasks as basic as measuring fairness, which is the first step in studying and promoting fairness, can be challenging. This is because sensitive attributes are often unavailable due to privacy regulations. The straightforward solution is to use auxiliary models to predict the missing sensitive attributes. However, our theoretical analyses show that the estimation error of the directly measured fairness metrics is proportional to the error rates of auxiliary models' predictions. Existing works that attempt to reduce the estimation error often require strong assumptions, e.g. access to the ground-truth sensitive attributes or some form of conditional independence. In this paper, we drop those assumptions and propose a framework that uses only off-the-shelf auxiliary models. The main challenge is how to reduce the negative impact of imperfectly predicted sensitive attributes on the fairness metrics without knowing the ground-truth sensitive attributes. Inspired by the noisy label learning literature, we first derive a closed-form relationship between the directly measured fairness metrics and their corresponding ground-truth metrics. And then we estimate some key statistics (most importantly transition matrix in the noisy label literature), which we use, together with the derived relationship, to calibrate the fairness metrics. In addition, we theoretically prove the upper bound of the estimation error in our calibrated metrics and show our method can substantially decrease the estimation error especially when auxiliary models are inaccurate or the target model is highly biased. Experiments on COMPAS and CelebA validate our theoretical analyses and show our method can measure fairness significantly more accurately than baselines under favorable circumstances.
[[2210.03274] TCNL: Transparent and Controllable Network Learning Via Embedding Human-Guided Concepts](http://arxiv.org/abs/2210.03274)
Explaining deep learning models is of vital importance for understanding artificial intelligence systems, improving safety, and evaluating fairness. To better understand and control the CNN model, many methods for transparency-interpretability have been proposed. However, most of these methods are not intuitive for human understanding and offer insufficient human control over the CNN model. We propose a novel method, Transparent and Controllable Network Learning (TCNL), to overcome such challenges. Towards the goal of improving transparency-interpretability, in TCNL, we define some concepts for specific classification tasks through scientific human-intuition study and incorporate concept information into the CNN model. In TCNL, the shallow feature extractor first produces preliminary features. Then several concept feature extractors are built right after the shallow feature extractor to learn high-dimensional concept representations. Each concept feature extractor is encouraged to encode information related to the predefined concepts. We also build the concept mapper to visualize features extracted by the concept extractor in a human-intuitive way. TCNL provides a generalizable approach to transparency-interpretability. Researchers can define concepts corresponding to certain classification tasks and encourage the model to encode specific concept information, which to a certain extent improves transparency-interpretability and the controllability of the CNN model. The datasets (with concept sets) for our experiments will also be released (https://github.com/bupt-ai-cz/TCNL).
[[2210.03352] The Ethical Risks of Analyzing Crisis Events on Social Media with Machine Learning](http://arxiv.org/abs/2210.03352)
Social media platforms provide a continuous stream of real-time news regarding crisis events on a global scale. Several machine learning methods utilize the crowd-sourced data for the automated detection of crises and the characterization of their precursors and aftermaths. Early detection and localization of crisis-related events can help save lives and economies. Yet, the applied automation methods introduce ethical risks worthy of investigation
[[2210.03303] Data-driven Approach to Differentiating between Depression and Dementia from Noisy Speech and Language Data](http://arxiv.org/abs/2210.03303)
A significant number of studies apply acoustic and linguistic characteristics of human speech as prominent markers of dementia and depression. However, studies on discriminating depression from dementia are rare. Co-morbid depression is frequent in dementia and these clinical conditions share many overlapping symptoms, but the ability to distinguish between depression and dementia is essential as depression is often curable. In this work, we investigate the ability of clustering approaches to distinguish between depression and dementia from human speech. We introduce a novel aggregated dataset, which combines narrative speech data from multiple conditions, i.e., Alzheimer's disease, mild cognitive impairment, healthy control, and depression. We compare linear and non-linear clustering approaches and show that non-linear clustering techniques distinguish better between distinct disease clusters. Our interpretability analysis shows that the main differentiating symptoms between dementia and depression are acoustic abnormality, repetitiveness (or circularity) of speech, word-finding difficulty, coherence impairment, and differences in lexical complexity and richness.
[[2210.03629] ReAct: Synergizing Reasoning and Acting in Language Models](http://arxiv.org/abs/2210.03629)
While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples.
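The interleaving of reasoning traces and actions described in this abstract can be sketched as a simple loop. This is an illustrative sketch only, not the authors' implementation: the `Thought:`/`Action:`/`Observation:` text format, the `react_loop` function, the scripted stand-in for the language model, and the toy `search` tool are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, Optional, Tuple


def react_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Tuple[Optional[str], str]:
    """Alternate model output (thought + action) with tool observations."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(transcript)  # expected: "Thought: ...\nAction: name[argument]"
        transcript += output + "\n"
        match = re.search(r"Action: (\w+)\[(.*?)\]", output)
        if match is None:
            break  # the model produced no parseable action
        name, argument = match.groups()
        if name == "finish":
            return argument, transcript  # the model committed to an answer
        tool = tools.get(name)
        observation = tool(argument) if tool else f"Unknown action: {name}"
        # The observation is fed back so the next thought can use it.
        transcript += f"Observation: {observation}\n"
    return None, transcript


def scripted_llm(transcript: str) -> str:
    """Hypothetical stand-in for a prompted language model."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: search[France]"
    return "Thought: The observation answers the question.\nAction: finish[Paris]"


# Toy knowledge source standing in for an external API.
toy_kb = {"France": "Paris is the capital of France."}
tools = {"search": lambda query: toy_kb.get(query, "No result.")}

answer, transcript = react_loop(scripted_llm, tools, "What is the capital of France?")
print(transcript)
print("Answer:", answer)
```

In a real setting the scripted function would be replaced by a prompted language model and the dictionary lookup by an external source such as a knowledge base. The loop structure of thought, action, and observation stays the same.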