[[2208.09030] A Secure and Efficient Data Deduplication Scheme with Dynamic Ownership Management in Cloud Computing](http://arxiv.org/abs/2208.09030)
Encrypted data deduplication is an important technique for eliminating duplicate copies of repeating data, and has been widely used in cloud storage to save storage space and network bandwidth. Recently, several deduplication schemes solving the privacy-preserving problem of dynamic ownership management have been proposed. However, these schemes suffer from low efficiency when the cloud user joining and revocation frequently go on, especially in the absence of a trusted third party. In this paper, we propose a novel server-side deduplication scheme for encrypted data in a hybrid cloud architecture, where a public cloud (Pub-CSP) manages the storage and a private cloud (Pri-CSP) plays a role as the data owner to perform deduplication and dynamic ownership management. Further, to mitigate the communication overhead we adopt a pre-verified accessing control approach to prevent the unauthorized cloud users from downloading data and use an initial uploader check mechanism to ensure only the first uploader needs to perform encryption. Our security analysis and performance evaluation demonstrate that our proposed scheme has better performance in terms of security, effectiveness, and practicability compared with other schemes.
[[2208.09245] Deep Joint Source-Channel and Encryption Coding: Secure Semantic Communications](http://arxiv.org/abs/2208.09245)
Deep learning driven joint source-channel coding (JSCC) for wireless image or video transmission, also called DeepJSCC, has been a topic of interest recently with very promising results. The idea is to map similar source samples to nearby points in the channel input space such that, despite the noise introduced by the channel, the input can be recovered with minimal distortion. In DeepJSCC, this is achieved by an autoencoder architecture with a non-trainable channel layer between the encoder and decoder. DeepJSCC has many favorable properties, such as better end-to-end distortion performance than its separate source and channel coding counterpart as well as graceful degradation with respect to channel quality. However, due to the inherent correlation between the source sample and channel input, DeepJSCC is vulnerable to eavesdropping attacks. In this paper, we propose the first DeepJSCC scheme for wireless image transmission that is secure against eavesdroppers, called DeepJSCEC. DeepJSCEC not only preserves the favorable properties of DeepJSCC, it also provides security against chosen-plaintext attacks from the eavesdropper, without the need to make assumptions about the eavesdropper's channel condition, or its intended use of the intercepted signal. Numerical results show that DeepJSCEC achieves similar or better image quality than separate source coding using BPG compression, AES encryption, and LDPC codes for channel coding, while preserving the graceful degradation of image quality with respect to channel quality. We also show that the proposed encryption method is problem agnostic, meaning it can be applied to other end-to-end JSCC problems, such as remote classification, without modification. Given the importance of security in modern wireless communication systems, we believe this work brings DeepJSCC schemes much closer to adoption in practice.
[[2208.09235] A Pragmatic Methodology for Blind Hardware Trojan Insertion in Finalized Layouts](http://arxiv.org/abs/2208.09235)
A potential vulnerability for integrated circuits (ICs) is the insertion of hardware trojans (HTs) during manufacturing. Understanding the practicability of such an attack can lead to appropriate measures for mitigating it. In this paper, we demonstrate a pragmatic framework for analyzing HT susceptibility of finalized layouts. Our framework is representative of a fabrication-time attack, where the adversary is assumed to have access only to a layout representation of the circuit. The framework inserts trojans into tapeout-ready layouts utilizing an Engineering Change Order (ECO) flow. The attacked security nodes are blindly searched utilizing reverse-engineering techniques. For our experimental investigation, we utilized three crypto-cores (AES-128, SHA-256, and RSA) and a microcontroller (RISC-V) as targets. We explored 96 combinations of triggers, payloads and targets for our framework. Our findings demonstrate that even in high-density designs, the covert insertion of sophisticated trojans is possible. All this while maintaining the original target logic, with minimal impact on power and performance. Furthermore, from our exploration, we conclude that it is too naive to only utilize placement resources as a metric for HT vulnerability. This work highlights that the HT insertion success is a complex function of the placement, routing resources, the position of the attacked nodes, and further design-specific characteristics. As a result, our framework goes beyond just an attack, we present the most advanced analysis tool to assess the vulnerability of HT insertion into finalized layouts.
[[2208.09281] Usable Security for an IoT OS: Integrating the Zoo of Embedded Crypto Components Below a Common API](http://arxiv.org/abs/2208.09281)
IoT devices differ widely in crypto-supporting hardware, ranging from no hardware support to powerful accelerators supporting numerous of operations including protected key storage. An operating system should provide uniform access to these heterogeneous hardware features, which is a particular challenge in the resource constrained IoT. Effective security is tied to the usability of cryptographic interfaces. A thoughtful API design is challenging, and it is beneficial to re-use such an interface and to share the knowledge of programming embedded security widely.
In this paper, we integrate an emerging cryptographic interface into usable system-level calls for the IoT operating system RIOT, which runs on more than 240 platforms. This interface supports ID-based key handling to access key material in protected storage without exposing it to anyone. Our design foresees hardware acceleration on all available variants; our implementation integrates diverse cryptographic hardware and software backends via the uniform interface. Our performance measurements show that the overhead of the uniform API with integrated key management is negligible compared to the individual crypto operation. Our approach enhances the usability, portability, and flexibility of cryptographic support in the IoT.
[[2208.09191] Synthetic Data in Human Analysis: A Survey](http://arxiv.org/abs/2208.09191)
Deep neural networks have become prevalent in human analysis, boosting the performance of applications, such as biometric recognition, action recognition, as well as person re-identification. However, the performance of such networks scales with the available training data. In human analysis, the demand for large-scale datasets poses a severe challenge, as data collection is tedious, time-expensive, costly and must comply with data protection laws. Current research investigates the generation of \textit{synthetic data} as an efficient and privacy-ensuring alternative to collecting real data in the field. This survey introduces the basic definitions and methodologies, essential when generating and employing synthetic data for human analysis. We conduct a survey that summarises current state-of-the-art methods and the main benefits of using synthetic data. We also provide an overview of publicly available synthetic datasets and generation models. Finally, we discuss limitations, as well as open research problems in this field. This survey is intended for researchers and practitioners in the field of human analysis.
[[2208.09011] Verifiable Differential Privacy For When The Curious Become Dishonest](http://arxiv.org/abs/2208.09011)
Many applications seek to produce differentially private statistics on sensitive data. Traditional approaches in the centralised model rely on a trusted aggregator to gather the raw data, aggregate statistics and introduce appropriate noise. Recent work has tried to relax the trust assumptions and reduce the need for trusted entities. However, such systems can trade off trust for increased noise and still require complete trust in some participants. Moreover, they do not prevent a malicious entity from introducing adversarial noise to skew the result or unmask some inputs.
In this paper, we introduce the notion of ``verifiable differential privacy with covert security''. The purpose is to ensure both privacy of the client's data and assurance that the output is not subject to any form of adversarial manipulation. The result is that everyone is assured that the noise used for differential privacy has been generated correctly, but no one can determine what the noise was. In the event of a malicious entity attempting to pervert the protocol, their actions will be detected with a constant probability negligibly close to one. We show that such verifiable privacy is practical and can be implemented at scale.
[[2208.09079] A Multi-Modal Wildfire Prediction and Personalized Early-Warning System Based on a Novel Machine Learning Framework](http://arxiv.org/abs/2208.09079)
Wildfires are increasingly impacting the environment, human health and safety. Among the top 20 California wildfires, those in 2020-2021 burned more acres than the last century combined. California's 2018 wildfire season caused damages of $148.5 billion. Among millions of impacted people, those living with disabilities (around 15% of the world population) are disproportionately impacted due to inadequate means of alerts. In this project, a multi-modal wildfire prediction and personalized early warning system has been developed based on an advanced machine learning architecture. Sensor data from the Environmental Protection Agency and historical wildfire data from 2012 to 2018 have been compiled to establish a comprehensive wildfire database, the largest of its kind. Next, a novel U-Convolutional-LSTM (Long Short-Term Memory) neural network was designed with a special architecture for extracting key spatial and temporal features from contiguous environmental parameters indicative of impending wildfires. Environmental and meteorological factors were incorporated into the database and classified as leading indicators and trailing indicators, correlated to risks of wildfire conception and propagation respectively. Additionally, geological data was used to provide better wildfire risk assessment. This novel spatio-temporal neural network achieved >97% accuracy vs. around 76% using traditional convolutional neural networks, successfully predicting 2018's five most devastating wildfires 5-14 days in advance. Finally, a personalized early warning system, tailored to individuals with sensory disabilities or respiratory exacerbation conditions, was proposed. This technique would enable fire departments to anticipate and prevent wildfires before they strike and provide early warnings for at-risk individuals for better preparation, thereby saving lives and reducing economic damages.
[[2208.09411] Wildfire Forecasting with Satellite Images and Deep Generative Model](http://arxiv.org/abs/2208.09411)
Wildfire forecasting has been one of the most critical tasks that humanities want to thrive. It plays a vital role in protecting human life. Wildfire prediction, on the other hand, is difficult because of its stochastic and chaotic properties. We tackled the problem by interpreting a series of wildfire images as a video and used it to anticipate how the fire would behave in the future. However, creating video prediction models that account for the inherent uncertainty of the future is challenging. The bulk of published attempts is based on stochastic image-autoregressive recurrent networks, which raises various performance and application difficulties, such as computational cost and limited efficiency on massive datasets. Another possibility is to use entirely latent temporal models that combine frame synthesis and temporal dynamics. However, due to design and training issues, no such model for stochastic video prediction has yet been proposed in the literature. This paper addresses these issues by introducing a novel stochastic temporal model whose dynamics are driven in a latent space. It naturally predicts video dynamics by allowing our lighter, more interpretable latent model to beat previous state-of-the-art approaches on the GOES-16 dataset. Results will be compared towards various benchmarking models.
[[2208.09285] Shadows Aren't So Dangerous After All: A Fast and Robust Defense Against Shadow-Based Adversarial Attacks](http://arxiv.org/abs/2208.09285)
Robust classification is essential in tasks like autonomous vehicle sign recognition, where the downsides of misclassification can be grave. Adversarial attacks threaten the robustness of neural network classifiers, causing them to consistently and confidently misidentify road signs. One such class of attack, shadow-based attacks, causes misidentifications by applying a natural-looking shadow to input images, resulting in road signs that appear natural to a human observer but confusing for these classifiers. Current defenses against such attacks use a simple adversarial training procedure to achieve a rather low 25\% and 40\% robustness on the GTSRB and LISA test sets, respectively. In this paper, we propose a robust, fast, and generalizable method, designed to defend against shadow attacks in the context of road sign recognition, that augments source images with binary adaptive threshold and edge maps. We empirically show its robustness against shadow attacks, and reformulate the problem to show its similarity $\varepsilon$ perturbation-based attacks. Experimental results show that our edge defense results in 78\% robustness while maintaining 98\% benign test accuracy on the GTSRB test set, with similar results from our threshold defense. Link to our code is in the paper.
[[2208.09336] Dispersed Pixel Perturbation-based Imperceptible Backdoor Trigger for Image Classifier Models](http://arxiv.org/abs/2208.09336)
Typical deep neural network (DNN) backdoor attacks are based on triggers embedded in inputs. Existing imperceptible triggers are computationally expensive or low in attack success. In this paper, we propose a new backdoor trigger, which is easy to generate, imperceptible, and highly effective. The new trigger is a uniformly randomly generated three-dimensional (3D) binary pattern that can be horizontally and/or vertically repeated and mirrored and superposed onto three-channel images for training a backdoored DNN model. Dispersed throughout an image, the new trigger produces weak perturbation to individual pixels, but collectively holds a strong recognizable pattern to train and activate the backdoor of the DNN. We also analytically reveal that the trigger is increasingly effective with the improving resolution of the images. Experiments are conducted using the ResNet-18 and MLP models on the MNIST, CIFAR-10, and BTSR datasets. In terms of imperceptibility, the new trigger outperforms existing triggers, such as BadNets, Trojaned NN, and Hidden Backdoor, by over an order of magnitude. The new trigger achieves an almost 100% attack success rate, only reduces the classification accuracy by less than 0.7%-2.4%, and invalidates the state-of-the-art defense techniques.
[[2208.09195] Real-Time Robust Video Object Detection System Against Physical-World Adversarial Attacks](http://arxiv.org/abs/2208.09195)
DNN-based video object detection (VOD) powers autonomous driving and video surveillance industries with rising importance and promising opportunities. However, adversarial patch attack yields huge concern in live vision tasks because of its practicality, feasibility, and powerful attack effectiveness. This work proposes Themis, a software/hardware system to defend against adversarial patches for real-time robust video object detection. We observe that adversarial patches exhibit extremely localized superficial feature importance in a small region with non-robust predictions, and thus propose the adversarial region detection algorithm for adversarial effect elimination. Themis also proposes a systematic design to efficiently support the algorithm by eliminating redundant computations and memory traffics. Experimental results show that the proposed methodology can effectively recover the system from the adversarial attack with negligible hardware overhead.
[[2208.09427] Curbing Task Interference using Representation Similarity-Guided Multi-Task Feature Sharing](http://arxiv.org/abs/2208.09427)
Multi-task learning of dense prediction tasks, by sharing both the encoder and decoder, as opposed to sharing only the encoder, provides an attractive front to increase both accuracy and computational efficiency. When the tasks are similar, sharing the decoder serves as an additional inductive bias providing more room for tasks to share complementary information among themselves. However, increased sharing exposes more parameters to task interference which likely hinders both generalization and robustness. Effective ways to curb this interference while exploiting the inductive bias of sharing the decoder remains an open challenge. To address this challenge, we propose Progressive Decoder Fusion (PDF) to progressively combine task decoders based on inter-task representation similarity. We show that this procedure leads to a multi-task network with better generalization to in-distribution and out-of-distribution data and improved robustness to adversarial attacks. Additionally, we observe that the predictions of different tasks of this multi-task network are more consistent with each other.
[[2208.09316] UKP-SQuARE v2 Explainability and Adversarial Attacks for Trustworthy QA](http://arxiv.org/abs/2208.09316)
Question Answering (QA) systems are increasingly deployed in applications where they support real-world decisions. However, state-of-the-art models rely on deep neural networks, which are difficult to interpret by humans. Inherently interpretable models or post hoc explainability methods can help users to comprehend how a model arrives at its prediction and, if successful, increase their trust in the system. Furthermore, researchers can leverage these insights to develop new methods that are more accurate and less biased. In this paper, we introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations. While saliency maps are useful to inspect the importance of each input token for the model's prediction, graph-based explanations from external Knowledge Graphs enable the users to verify the reasoning behind the model prediction. In addition, we provide multiple adversarial attacks to compare the robustness of QA models. With these explainability methods and adversarial attacks, we aim to ease the research on trustworthy QA models. SQuARE is available on https://square.ukp-lab.de.
[[2208.09466] Gender Bias and Universal Substitution Adversarial Attacks on Grammatical Error Correction Systems for Automated Assessment](http://arxiv.org/abs/2208.09466)
Grammatical Error Correction (GEC) systems perform a sequence-to-sequence task, where an input word sequence containing grammatical errors, is corrected for these errors by the GEC system to output a grammatically correct word sequence. With the advent of deep learning methods, automated GEC systems have become increasingly popular. For example, GEC systems are often used on speech transcriptions of English learners as a form of assessment and feedback - these powerful GEC systems can be used to automatically measure an aspect of a candidate's fluency. The count of \textit{edits} from a candidate's input sentence (or essay) to a GEC system's grammatically corrected output sentence is indicative of a candidate's language ability, where fewer edits suggest better fluency. The count of edits can thus be viewed as a \textit{fluency score} with zero implying perfect fluency. However, although deep learning based GEC systems are extremely powerful and accurate, they are susceptible to adversarial attacks: an adversary can introduce a small, specific change at the input of a system that causes a large, undesired change at the output. When considering the application of GEC systems to automated language assessment, the aim of an adversary could be to cheat by making a small change to a grammatically incorrect input sentence that conceals the errors from a GEC system, such that no edits are found and the candidate is unjustly awarded a perfect fluency score. This work examines a simple universal substitution adversarial attack that non-native speakers of English could realistically employ to deceive GEC systems used for assessment.
[[2208.09140] An Optimal Energy Efficient Design of Artificial Noise for Preventing Power Leakage based Side-Channel Attacks](http://arxiv.org/abs/2208.09140)
Side-channel attacks (SCAs), which infer secret information (for example secret keys) by exploiting information that leaks from the implementation (such as power consumption), have been shown to be a non-negligible threat to modern cryptographic implementations and devices in recent years. Hence, how to prevent side-channel attacks on cryptographic devices has become an important problem. One of the widely used countermeasures to against power SCAs is the injection of random noise sequences into the raw leakage traces. However, the indiscriminate injection of random noise can lead to significant increases in energy consumption in device, and ways must be found to reduce the amount of energy in noise generation while keeping the side-channel invisible. In this paper, we propose an optimal energy-efficient design for artificial noise generation to prevent side-channel attacks. This approach exploits the sparsity among the leakage traces. We model the side-channel as a communication channel, which allows us to use channel capacity to measure the mutual information between the secret and the leakage traces. For a given energy budget in the noise generation, we obtain the optimal design of the artificial noise injection by solving the side-channel's channel capacity minimization problem. The experimental results also validate the effectiveness of our proposed scheme.
[[2208.09170] Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning](http://arxiv.org/abs/2208.09170)
Self-supervised monocular methods can efficiently learn depth information of weakly textured surfaces or reflective objects. However, the depth accuracy is limited due to the inherent ambiguity in monocular geometric modeling. In contrast, multi-frame depth estimation methods improve the depth accuracy thanks to the success of Multi-View Stereo (MVS), which directly makes use of geometric constraints. Unfortunately, MVS often suffers from texture-less regions, non-Lambertian surfaces, and moving objects, especially in real-world video sequences without known camera motion and depth supervision. Therefore, we propose MOVEDepth, which exploits the MOnocular cues and VElocity guidance to improve multi-frame Depth learning. Unlike existing methods that enforce consistency between MVS depth and monocular depth, MOVEDepth boosts multi-frame depth learning by directly addressing the inherent problems of MVS. The key of our approach is to utilize monocular depth as a geometric priority to construct MVS cost volume, and adjust depth candidates of cost volume under the guidance of predicted camera velocity. We further fuse monocular depth and MVS depth by learning uncertainty in the cost volume, which results in a robust depth estimation against ambiguity in multi-view geometry. Extensive experiments show MOVEDepth achieves state-of-the-art performance: Compared with Monodepth2 and PackNet, our method relatively improves the depth accuracy by 20\% and 19.8\% on the KITTI benchmark. MOVEDepth also generalizes to the more challenging DDAD benchmark, relatively outperforming ManyDepth by 7.2\%. The code is available at https://github.com/JeffWang987/MOVEDepth.
[[2208.09198] TTT-UCDR: Test-time Training for Universal Cross-Domain Retrieval](http://arxiv.org/abs/2208.09198)
Image retrieval is a niche problem in computer vision curated towards finding similar images in a database using a query. In this work, for the first time in literature, we employ test-time training techniques for adapting to distribution shifts under Universal Cross-Domain Retrieval (UCDR). Test-time training has previously been shown to reduce generalization error for image classification, domain adaptation, semantic segmentation, and zero-shot sketch-based image retrieval (ZS-SBIR). In UCDR, in addition to the semantic shift of unknown categories present in ZS-SBIR, the presence of unknown domains leads to even higher distribution shifts. To bridge this domain gap, we use self-supervision through 3 different losses - Barlow Twins, Jigsaw Puzzle and RotNet on a pretrained network at test-time. This simple approach leads to improvements on UCDR benchmarks and also improves model robustness under a challenging cross-dataset generalization setting.
[[2208.09286] Background Invariance Testing According to Semantic Proximity](http://arxiv.org/abs/2208.09286)
In many applications, machine learned (ML) models are required to hold some invariance qualities, such as rotation, size, intensity, and background invariance. Unlike many types of variance, the variants of background scenes cannot be ordered easily, which makes it difficult to analyze the robustness and biases of the models concerned. In this work, we present a technical solution for ordering background scenes according to their semantic proximity to a target image that contains a foreground object being tested. We make use of the results of object recognition as the semantic description of each image, and construct an ontology for storing knowledge about relationships among different objects using association analysis. This ontology enables (i) efficient and meaningful search for background scenes of different semantic distances to a target image, (ii) quantitative control of the distribution and sparsity of the sampled background scenes, and (iii) quality assurance using visual representations of invariance testing results (referred to as variance matrices). In this paper, we also report the training of an ML4ML assessor to evaluate the invariance quality of ML models automatically.
[[2208.09315] Self-Supervised Visual Place Recognition by Mining Temporal and Feature Neighborhoods](http://arxiv.org/abs/2208.09315)
Visual place recognition (VPR) using deep networks has achieved state-of-the-art performance. However, most of them require a training set with ground truth sensor poses to obtain positive and negative samples of each observation's spatial neighborhood for supervised learning. When such information is unavailable, temporal neighborhoods from a sequentially collected data stream could be exploited for self-supervised training, although we find its performance suboptimal. Inspired by noisy label learning, we propose a novel self-supervised framework named \textit{TF-VPR} that uses temporal neighborhoods and learnable feature neighborhoods to discover unknown spatial neighborhoods. Our method follows an iterative training paradigm which alternates between: (1) representation learning with data augmentation, (2) positive set expansion to include the current feature space neighbors, and (3) positive set contraction via geometric verification. We conduct comprehensive experiments on both simulated and real datasets, with either RGB images or point clouds as inputs. The results show that our method outperforms our baselines in recall rate, robustness, and heading diversity, a novel metric we propose for VPR. Our code and datasets can be found at https://ai4ce.github.io/TF-VPR/.
[[2208.09330] Low-light Enhancement Method Based on Attention Map Net](http://arxiv.org/abs/2208.09330)
Low-light image enhancement is a crucial preprocessing task for some complex vision tasks. Target detection, image segmentation, and image recognition outcomes are all directly impacted by the impact of image enhancement. However, the majority of the currently used image enhancement techniques do not produce satisfactory outcomes, and these enhanced networks have relatively weak robustness. We suggest an improved network called BrightenNet that uses U-Net as its primary structure and incorporates a number of different attention mechanisms as a solution to this issue. In a specific application, we employ the network as the generator and LSGAN as the training framework to achieve better enhancement results. We demonstrate the validity of the proposed network BrightenNet in the experiments that follow in this paper. The results it produced can both preserve image details and conform to human vision standards.
[[2208.09414] ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization](http://arxiv.org/abs/2208.09414)
Modality selection is an important step when designing multimodal systems, especially in the case of cross-domain activity recognition as certain modalities are more robust to domain shift than others. However, selecting only the modalities which have a positive contribution requires a systematic approach. We tackle this problem by proposing an unsupervised modality selection method (ModSelect), which does not require any ground-truth labels. We determine the correlation between the predictions of multiple unimodal classifiers and the domain discrepancy between their embeddings. Then, we systematically compute modality selection thresholds, which select only modalities with a high correlation and low domain discrepancy. We show in our experiments that our method ModSelect chooses only modalities with positive contributions and consistently improves the performance on a Synthetic-to-Real domain adaptation benchmark, narrowing the domain gap.
[[2208.09180] Effective Transfer Learning for Low-Resource Natural Language Understanding](http://arxiv.org/abs/2208.09180)
Natural language understanding (NLU) is the task of semantic decoding of human languages by machines. NLU models rely heavily on large training data to ensure good performance. However, substantial languages and domains have very few data resources and domain experts. It is necessary to overcome the data scarcity challenge, when very few or even zero training samples are available. In this thesis, we focus on developing cross-lingual and cross-domain methods to tackle the low-resource issues. First, we propose to improve the model's cross-lingual ability by focusing on the task-related keywords, enhancing the model's robustness and regularizing the representations. We find that the representations for low-resource languages can be easily and greatly improved by focusing on just the keywords. Second, we present Order-Reduced Modeling methods for the cross-lingual adaptation, and find that modeling partial word orders instead of the whole sequence can improve the robustness of the model against word order differences between languages and task knowledge transfer to low-resource languages. Third, we propose to leverage different levels of domain-related corpora and additional masking of data in the pre-training for the cross-domain adaptation, and discover that more challenging pre-training can better address the domain discrepancy issue in the task knowledge transfer. Finally, we introduce a coarse-to-fine framework, Coach, and a cross-lingual and cross-domain parsing framework, X2Parser. Coach decomposes the representation learning process into a coarse-grained and a fine-grained feature learning, and X2Parser simplifies the hierarchical task structures into flattened ones. We observe that simplifying task structures makes the representation learning more effective for low-resource languages and domains.
[[2208.09329] Causal Intervention Improves Implicit Sentiment Analysis](http://arxiv.org/abs/2208.09329)
Despite having achieved great success for sentiment analysis, existing neural models struggle with implicit sentiment analysis. This may be due to the fact that they may latch onto spurious correlations ("shortcuts", e.g., focusing only on explicit sentiment words), resulting in undermining the effectiveness and robustness of the learned model. In this work, we propose a causal intervention model for Implicit Sentiment Analysis using Instrumental Variable (ISAIV). We first review sentiment analysis from a causal perspective and analyze the confounders existing in this task. Then, we introduce an instrumental variable to eliminate the confounding causal effects, thus extracting the pure causal effect between sentence and sentiment. We compare the proposed ISAIV model with several strong baselines on both the general implicit sentiment analysis and aspect-based implicit sentiment analysis tasks. The results indicate the great advantages of our model and the efficacy of implicit sentiment reasoning.
[[2208.09418] SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability](http://arxiv.org/abs/2208.09418)
Interpretability of Deep Learning (DL) models is arguably the barrier in front of trustworthy AI. Despite great efforts made by the Explainable AI (XAI) community, explanations lack robustness--indistinguishable input perturbations may lead to different XAI results. Thus, it is vital to assess how robust DL interpretability is, given an XAI technique. To this end, we identify the following challenges that state-of-the-art is unable to cope with collectively: i) XAI techniques are highly heterogeneous; ii) misinterpretations are normally rare events; iii) both worst-case and overall robustness are of practical interest. In this paper, we propose two evaluation methods to tackle them--i) they are of black-box nature, based on Genetic Algorithm (GA) and Subset Simulation (SS); ii) bespoke fitness functions are used by GA to solve a constrained optimisation efficiently, while SS is dedicated to estimating rare event probabilities; iii) two diverse metrics are introduced, concerning the worst-case interpretation discrepancy and a probabilistic notion of \textit{how} robust in general, respectively. We conduct experiments to study the accuracy, sensitivity and efficiency of our methods that outperform state-of-the-arts. Finally, we show two applications of our methods for ranking robust XAI methods and selecting training schemes to improve both classification and interpretation robustness.
[[2208.09027] GraTO: Graph Neural Network Framework Tackling Over-smoothing with Neural Architecture Search](http://arxiv.org/abs/2208.09027)
Current Graph Neural Networks (GNNs) suffer from the over-smoothing problem, which results in indistinguishable node representations and low model performance with more GNN layers. Many methods have been put forward to tackle this problem in recent years. However, existing tackling over-smoothing methods emphasize model performance and neglect the over-smoothness of node representations. Additional, different approaches are applied one at a time, while there lacks an overall framework to jointly leverage multiple solutions to the over-smoothing challenge. To solve these problems, we propose GraTO, a framework based on neural architecture search to automatically search for GNNs architecture. GraTO adopts a novel loss function to facilitate striking a balance between model performance and representation smoothness. In addition to existing methods, our search space also includes DropAttribute, a novel scheme for alleviating the over-smoothing challenge, to fully leverage diverse solutions. We conduct extensive experiments on six real-world datasets to evaluate GraTo, which demonstrates that GraTo outperforms baselines in the over-smoothing metrics and achieves competitive performance in accuracy. GraTO is especially effective and robust with increasing numbers of GNN layers. Further experiments bear out the quality of node representations learned with GraTO and the effectiveness of model architecture. We make cide of GraTo available at Github (\url{https://github.com/fxsxjtu/GraTO}).
[[2208.09139] DAFT: Distilling Adversarially Fine-tuned Models for Better OOD Generalization](http://arxiv.org/abs/2208.09139)
We consider the problem of OOD generalization, where the goal is to train a model that performs well on test distributions that are different from the training distribution. Deep learning models are known to be fragile to such shifts and can suffer large accuracy drops even for slightly different test distributions. We propose a new method - DAFT - based on the intuition that adversarially robust combination of a large number of rich features should provide OOD robustness. Our method carefully distills the knowledge from a powerful teacher that learns several discriminative features using standard training while combining them using adversarial training. The standard adversarial training procedure is modified to produce teachers which can guide the student better. We evaluate DAFT on standard benchmarks in the DomainBed framework, and demonstrate that DAFT achieves significant improvements over the current state-of-the-art OOD generalization methods. DAFT consistently out-performs well-tuned ERM and distillation baselines by up to 6%, with more pronounced gains for smaller networks.
[[2208.09449] A Novel Plug-and-Play Approach for Adversarially Robust Generalization](http://arxiv.org/abs/2208.09449)
In this work, we propose a robust framework that employs adversarially robust training to safeguard the machine learning models against perturbed testing data. We achieve this by incorporating the worst-case additive adversarial error within a fixed budget for each sample during model estimation. Our main focus is to provide a plug-and-play solution that can be incorporated in the existing machine learning algorithms with minimal changes. To that end, we derive the closed-form ready-to-use solution for several widely used loss functions with a variety of norm constraints on adversarial perturbation. Finally, we validate our approach by showing significant performance improvement on real-world datasets for supervised problems such as regression and classification, as well as for unsupervised problems such as matrix completion and learning graphical models, with very little computational overhead.
[[2208.09061] Mouse Dynamics Behavioral Biometrics: A Survey](http://arxiv.org/abs/2208.09061)
Utilization of internet in everyday life has made us vulnerable in terms of privacy and security of our data and systems. Therefore, there are pressing needs to protect our data and systems by improving authentication mechanisms, which needs to be low cost, unobtrusive, and ideally ubiquitous in nature. Behavioral biometrics modalities such as mouse dynamics (i.e., mouse behaviors on a graphical user interface-GUI) and widget interactions (i.e., another modality closely related to mouse dynamics that also considers the target (widget) of a GUI interaction, such as links, button, and combo-box) can bolster the security of existing authentication systems because of their ability to distinguish an individual based on their unique features. As a result, it can be difficult for an imposter to impersonate these behavioral biometrics, making them suitable for authentication. In this paper, we survey the literature on mouse dynamics and widget interactions dated from 1897 to
[[2208.09183] Improved Image Classification with Token Fusion](http://arxiv.org/abs/2208.09183)
In this paper, we propose a method using the fusion of CNN and transformer structure to improve image classification performance. In the case of CNN, information about a local area on an image can be extracted well, but there is a limit to the extraction of global information. On the other hand, the transformer has an advantage in relatively global extraction, but has a disadvantage in that it requires a lot of memory for local feature value extraction. In the case of an image, it is converted into a feature map through CNN, and each feature map's pixel is considered a token. At the same time, the image is divided into patch areas and then fused with the transformer method that views them as tokens. For the fusion of tokens with two different characteristics, we propose three methods: (1) late token fusion with parallel structure, (2) early token fusion, (3) token fusion in a layer by layer. In an experiment using ImageNet 1k, the proposed method shows the best classification performance.
[[2208.09163] UniCausal: Unified Benchmark and Model for Causal Text Mining](http://arxiv.org/abs/2208.09163)
Current causal text mining datasets vary in objectives, data coverage, and annotation schemes. These inconsistent efforts prevented modeling capabilities and fair comparisons of model performance. Few datasets include cause-effect span annotations, which are needed for end-to-end causal extraction. Therefore, we proposed UniCausal, a unified benchmark for causal text mining across three tasks: Causal Sequence Classification, Cause-Effect Span Detection and Causal Pair Classification. We consolidated and aligned annotations of six high quality human-annotated corpus, resulting in a total of 58,720, 12,144 and 69,165 examples for each task respectively. Since the definition of causality can be subjective, our framework was designed to allow researchers to work on some or all datasets and tasks. As an initial benchmark, we adapted BERT pre-trained models to our task and generated baseline scores. We achieved 70.10% Binary F1 score for Sequence Classification, 52.42% Macro F1 score for Span Detection, and 84.68% Binary F1 score for Pair Classification.
[[2208.09299] SimLDA: A tool for topic model evaluation](http://arxiv.org/abs/2208.09299)
Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying aspects in the presence of limited data. We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA) and compare it with the gold standard VB and collapsed Gibbs sampling. In situations where marginalisation leads to non-conjugate messages, we use ideas from sampling to derive approximate update equations. In cases where conjugacy holds, Loopy Belief update (LBU) (also known as Lauritzen-Spiegelhalter) is used. Our algorithm, ALBU (approximate LBU), has strong similarities with Variational Message Passing (VMP) (which is the message passing variant of VB). To compare the performance of the algorithms in the presence of limited data, we use data sets consisting of tweets and news groups. Using coherence measures we show that ALBU learns latent distributions more accurately than does VB, especially for smaller data sets.
[[2208.09354] End-to-end Clinical Event Extraction from Chinese Electronic Health Record](http://arxiv.org/abs/2208.09354)
Event extraction is an important work of medical text processing. According to the complex characteristics of medical text annotation, we use the end-to-end event extraction model to enhance the output formatting information of events. Through pre training and fine-tuning, we can extract the attributes of the four dimensions of medical text: anatomical position, subject word, description word and occurrence state. On the test set, the accuracy rate was 0.4511, the recall rate was 0.3928, and the F1 value was 0.42. The method of this model is simple, and it has won the second place in the task of mining clinical discovery events (task2) in the Chinese electronic medical record of the seventh China health information processing Conference (chip2021).
[[2208.09440] Feature Selection for Fault Detection and Prediction based on Event Log Analysis](http://arxiv.org/abs/2208.09440)
Event logs are widely used for anomaly detection and prediction in complex systems. Existing log-based anomaly detection methods usually consist of four main steps: log collection, log parsing, feature extraction, and anomaly detection, wherein the feature extraction step extracts useful features for anomaly detection by counting log events. For a complex system, such as a lithography machine consisting of a large number of subsystems, its log may contain thousands of different events, resulting in abounding extracted features. However, when anomaly detection is performed at the subsystem level, analyzing all features becomes expensive and unnecessary. To mitigate this problem, we develop a feature selection method for log-based anomaly detection and prediction, largely improving the effectiveness and efficiency.
[[2208.09215] Almost Cost-Free Communication in Federated Best Arm Identification](http://arxiv.org/abs/2208.09215)
We study the problem of best arm identification in a federated learning multi-armed bandit setup with a central server and multiple clients. Each client is associated with a multi-armed bandit in which each arm yields {\em i.i.d.}\ rewards following a Gaussian distribution with an unknown mean and known variance. The set of arms is assumed to be the same at all the clients. We define two notions of best arm -- local and global. The local best arm at a client is the arm with the largest mean among the arms local to the client, whereas the global best arm is the arm with the largest average mean across all the clients. We assume that each client can only observe the rewards from its local arms and thereby estimate its local best arm. The clients communicate with a central server on uplinks that entail a cost of $C\ge0$ units per usage per uplink. The global best arm is estimated at the server. The goal is to identify the local best arms and the global best arm with minimal total cost, defined as the sum of the total number of arm selections at all the clients and the total communication cost, subject to an upper bound on the error probability. We propose a novel algorithm {\sc FedElim} that is based on successive elimination and communicates only in exponential time steps and obtain a high probability instance-dependent upper bound on its total cost. The key takeaway from our paper is that for any $C\geq 0$ and error probabilities sufficiently small, the total number of arm selections (resp.\ the total cost) under {\sc FedElim} is at most~$2$ (resp.~$3$) times the maximum total number of arm selections under its variant that communicates in every time step. Additionally, we show that the latter is optimal in expectation up to a constant factor, thereby demonstrating that communication is almost cost-free in {\sc FedElim}. We numerically validate the efficacy of {\sc FedElim}.
[[2208.09378] Federated Learning with Noisy Labels](http://arxiv.org/abs/2208.09378)
Federated Learning (FL) is a distributed machine learning paradigm that enables learning models from decentralized private datasets, where the labeling effort is entrusted to the clients. While most existing FL approaches assume high-quality labels are readily available on users' devices; in reality, label noise can naturally occur in FL and follows a non-i.i.d. distribution among clients. Due to the non-iid-ness challenges, existing state-of-the-art centralized approaches exhibit unsatisfactory performance, while previous FL studies rely on data exchange or repeated server-side aid to improve model's performance. Here, we propose FedLN, a framework to deal with label noise across different FL training stages; namely, FL initialization, on-device model training, and server model aggregation. Specifically, FedLN computes per-client noise-level estimation in a single federated round and improves the models' performance by correcting (or limiting the effect of) noisy samples. Extensive experiments on various publicly available vision and audio datasets demonstrate a 24% improvement on average compared to other existing methods for a label noise level of 70%. We further validate the efficiency of FedLN in human-annotated real-world noisy datasets and report a 9% increase on average in models' recognition rate, highlighting that FedLN can be useful for improving FL services provided to everyday users.
[[2208.09432] Federated Select: A Primitive for Communication- and Memory-Efficient Federated Learning](http://arxiv.org/abs/2208.09432)
Federated learning (FL) is a framework for machine learning across heterogeneous client devices in a privacy-preserving fashion. To date, most FL algorithms learn a "global" server model across multiple rounds. At each round, the same server model is broadcast to all participating clients, updated locally, and then aggregated across clients. In this work, we propose a more general procedure in which clients "select" what values are sent to them. Notably, this allows clients to operate on smaller, data-dependent slices. In order to make this practical, we outline a primitive, federated select, which enables client-specific selection in realistic FL systems. We discuss how to use federated select for model training and show that it can lead to drastic reductions in communication and client memory usage, potentially enabling the training of models too large to fit on-device. We also discuss the implications of federated select on privacy and trust, which in turn affect possible system constraints and design. Finally, we discuss open questions concerning model architectures, privacy-preserving technologies, and practical FL systems.
[[2208.09478] Communication Size Reduction of Federated Learning based on Neural ODE Model](http://arxiv.org/abs/2208.09478)
Federated learning is a machine learning method in which data is not aggregated on a server, but is distributed to the edges, in consideration of security and privacy. ResNet is a classic but representative neural network that succeeds in deepening the neural network by learning a residual function that adds the inputs and outputs together. In federated learning, communication is performed between the server and edge devices to exchange weight parameters, but ResNet has deep layers and a large number of parameters, so communication size becomes large. In this paper, we use Neural ODE as a lightweight model of ResNet to reduce communication size in federated learning. In addition, we newly introduce a flexible federated learning using Neural ODE models with different number of iterations, which correspond to ResNet with different depths. The CIFAR-10 dataset is used in the evaluation, and the use of Neural ODE reduces communication size by approximately 90% compared to ResNet. We also show that the proposed flexible federated learning can merge models with different iteration counts.
[[2208.09147] Disentangled Representation with Causal Constraints for Counterfactual Fairness](http://arxiv.org/abs/2208.09147)
Much research has been devoted to the problem of learning fair representations; however, they do not explicitly the relationship between latent representations. In many real-world applications, there may be causal relationships between latent representations. Furthermore, most fair representation learning methods focus on group-level fairness and are based on correlations, ignoring the causal relationships underlying the data. In this work, we theoretically demonstrate that using the structured representations enable downstream predictive models to achieve counterfactual fairness, and then we propose the Counterfactual Fairness Variational AutoEncoder (CF-VAE) to obtain structured representations with respect to domain knowledge. The experimental results show that the proposed method achieves better fairness and accuracy performance than the benchmark fairness methods.
[[2208.09240] An Unsupervised Short- and Long-Term Mask Representation for Multivariate Time Series Anomaly Detection](http://arxiv.org/abs/2208.09240)
Anomaly detection of multivariate time series is meaningful for system behavior monitoring. This paper proposes an anomaly detection method based on unsupervised Short- and Long-term Mask Representation learning (SLMR). The main idea is to extract short-term local dependency patterns and long-term global trend patterns of the multivariate time series by using multi-scale residual dilated convolution and Gated Recurrent Unit(GRU) respectively. Furthermore, our approach can comprehend temporal contexts and feature correlations by combining spatial-temporal masked self-supervised representation learning and sequence split. It considers the importance of features is different, and we introduce the attention mechanism to adjust the contribution of each feature. Finally, a forecasting-based model and a reconstruction-based model are integrated to focus on single timestamp prediction and latent representation of time series. Experiments show that the performance of our method outperforms other state-of-the-art models on three real-world datasets. Further analysis shows that our method is good at interpretability.