[[2211.02246] DatChain -- Blockchain implementation in Data transfer for IoT Devices](http://arxiv.org/abs/2211.02246)
Currently, the IoT ecosystem is comprised of fully connected smart devices that exchange data to provide more automated, precise, and fast decisions. This idealised situation can only be accomplished if a system for data transactions is processed efficiently and security is ensured with high scalability and practicability. The integrity of data must be maintained during the exchange or transfer of data between entities. We propose to make a application called DatChain that responds to the above situation. The application stores data sensed by the Iot sensors in the backend after encrypting it and when the data is required for any purpose it can be exchanged using a suitable blockchain network that can keep up with the transfer rate even at high traffic in a secure environment.
[[2211.02408] Rickrolling the Artist: Injecting Invisible Backdoors into Text-Guided Image Generation Models](http://arxiv.org/abs/2211.02408)
While text-to-image synthesis currently enjoys great popularity among researchers and the general public, the security of these models has been neglected so far. Many text-guided image generation models rely on pre-trained text encoders from external sources, and their users trust that the retrieved models will behave as promised. Unfortunately, this might not be the case. We introduce backdoor attacks against text-guided generative models and demonstrate that their text encoders pose a major tampering risk. Our attacks only slightly alter an encoder so that no suspicious model behavior is apparent for image generations with clean prompts. By then inserting a single non-Latin character into the prompt, the adversary can trigger the model to either generate images with pre-defined attributes or images following a hidden, potentially malicious description. We empirically demonstrate the high effectiveness of our attacks on Stable Diffusion and highlight that the injection process of a single backdoor takes less than two minutes. Besides phrasing our approach solely as an attack, it can also force an encoder to forget phrases related to certain concepts, such as nudity or violence, and help to make image generation safer.
[[2211.02266] Rescuing the End-user systems from Vulnerable Applications using Virtualization Techniques](http://arxiv.org/abs/2211.02266)
In systems owned by normal end-users, many times security attacks are mounted by sneaking in malicious applications or exploiting existing software vulnerabilities through security non-conforming actions of users. Virtualization approaches can address this problem by providing a quarantine environment for applications, malicious devices, and device drivers, which are mostly used as entry points for security attacks. However, the existing methods to provide quarantine environments using virtualization are not transparent to the user, both in terms of application interface transparency and file system transparency. Further, software configuration level solutions like remote desktops and remote application access mechanisms combined with shared file systems do not meet the user transparency and security requirements. We propose qOS, a VM-based solution combined with certain OS extensions to meet the security requirements of end-point systems owned by normal users, in a transparent and efficient manner. We demonstrate the efficacy of qOS by empirically evaluating the prototype implementation in the Linux+KVM system in terms of efficiency, security, and user transparency.
[[2211.02579] V2X Misbehavior in Maneuver Sharing and Coordination Service: Considerations for Standardization](http://arxiv.org/abs/2211.02579)
Connected and Automated Vehicles (CAV) use sensors and wireless communication to improve road safety and efficiency. However, attackers may target Vehicle-to-Everything (V2X) communication. Indeed, an attacker may send authenticated-but-wrong data to send false location information, alert incorrect events, or report a bogus object endangering safety of other CAVs. Standardization Development Organizations (SDO) are currently working on developing security standards against such attacks. Unfortunately, current standardization efforts do not include misbehavior specifications for advanced V2X services such as Maneuver Sharing and Coordination Service (MSCS). This work assesses the security of MSC Messages (MSCM) and proposes inputs for consideration in existing standards.
[[2211.02652] AntFuzzer: A Grey-Box Fuzzing Framework for EOSIO Smart Contracts](http://arxiv.org/abs/2211.02652)
In the past few years, several attacks against the vulnerabilities of EOSIO smart contracts have caused severe financial losses to this prevalent blockchain platform. As a lightweight test-generation approach, grey-box fuzzing can open up the possibility of improving the security of EOSIO smart contracts. However, developing a practical grey-box fuzzer for EOSIO smart contracts from scratch is time-consuming and requires a deep understanding of EOSIO internals. In this work, we proposed AntFuzzer, the first highly extensible grey-box fuzzing framework for EOSIO smart contracts. AntFuzzer implements a novel approach that interfaces AFL to conduct AFL-style grey-box fuzzing on EOSIO smart contracts. Compared to black-box fuzzing tools, AntFuzzer can effectively trigger those hard-to-cover branches. It achieved an improvement in code coverage on 37.5% of smart contracts in our benchmark dataset. AntFuzzer provides unified interfaces for users to easily develop new detection plugins for continually emerging vulnerabilities. We have implemented 6 detection plugins on AntFuzzer to detect major vulnerabilities of EOSIO smart contracts. In our large-scale fuzzing experiments on 4,616 real-world smart contracts, AntFuzzer successfully detected 741 vulnerabilities. The results demonstrate the effectiveness and efficiency of AntFuzzer and our detection pl
[[2211.02139] Can Querying for Bias Leak Protected Attributes? Achieving Privacy With Smooth Sensitivity](http://arxiv.org/abs/2211.02139)
Existing regulations prohibit model developers from accessing protected
attributes (gender, race, etc.), often resulting in fairness assessments on
populations without knowing their protected groups. In such scenarios,
institutions often adopt a separation between the model developers (who train
models with no access to the protected attributes) and a compliance team (who
may have access to the entire dataset for auditing purpose). However, the model
developers might be allowed to test their models for bias by querying the
compliance team for group fairness metrics. In this paper, we first demonstrate
that simply querying for fairness metrics, such as statistical parity and
equalized odds can leak the protected attributes of individuals to the model
developers. We demonstrate that there always exist strategies by which the
model developers can identify the protected attribute of a targeted individual
in the test dataset from just a single query. In particular, we show that one
can reconstruct the protected attributes of all the individuals from O(Nk log
n/Nk) queries when Nk<
[[2211.02161] Privacy-preserving Deep Learning based Record Linkage](http://arxiv.org/abs/2211.02161)
Deep learning-based linkage of records across different databases is becoming increasingly useful in data integration and mining applications to discover new insights from multiple sources of data. However, due to privacy and confidentiality concerns, organisations often are not willing or allowed to share their sensitive data with any external parties, thus making it challenging to build/train deep learning models for record linkage across different organizations' databases. To overcome this limitation, we propose the first deep learning-based multi-party privacy-preserving record linkage (PPRL) protocol that can be used to link sensitive databases held by multiple different organisations. In our approach, each database owner first trains a local deep learning model, which is then uploaded to a secure environment and securely aggregated to create a global model. The global model is then used by a linkage unit to distinguish unlabelled record pairs as matches and non-matches. We utilise differential privacy to achieve provable privacy protection against re-identification attacks. We evaluate the linkage quality and scalability of our approach using several large real-world databases, showing that it can achieve high linkage quality while providing sufficient privacy protection against existing attacks.
[[2211.02369] A Jigsaw Puzzle Solver-based Attack on Block-wise Image Encryption for Privacy-preserving DNNs](http://arxiv.org/abs/2211.02369)
Privacy-preserving deep neural networks (DNNs) have been proposed for protecting data privacy in the cloud server. Although several encryption schemes for visually protection have been proposed for privacy-preserving DNNs, several attacks enable to restore visual information from encrypted images. On the other hand, it has been confirmed that the block-wise image encryption scheme which utilizes block and pixel shuffling is robust against several attacks. In this paper, we propose a jigsaw puzzle solver-based attack to restore visual information from encrypted images including block and pixel shuffling. In experiments, images encrypted by using the block-wise image encryption are mostly restored by using the proposed attack.
[[2211.02179] Verifying RISC-V Physical Memory Protection](http://arxiv.org/abs/2211.02179)
We formally verify an open-source hardware implementation of physical memory protection (PMP) in RISC-V, which is a standard feature used for memory isolation in security critical systems such as the Keystone trusted execution environment. PMP provides per-hardware-thread machine-mode control registers that specify the access privileges for physical memory regions. We first formalize the functional property of the PMP rules based on the RISC-V ISA manual. Then, we use the LIME tool to translate an open-source implementation of the PMP hardware module written in Chisel to the UCLID5 formal verification language. We encode the formal specification in UCLID5 and verify the functional correctness of the hardware. This is an initial effort towards verifying the Keystone framework, where the trusted computing base (TCB) relies on PMP to provide security guarantees such as integrity and confidentiality.
[[2211.02223] Adversarial Defense via Neural Oscillation inspired Gradient Masking](http://arxiv.org/abs/2211.02223)
Spiking neural networks (SNNs) attract great attention due to their low power consumption, low latency, and biological plausibility. As they are widely deployed in neuromorphic devices for low-power brain-inspired computing, security issues become increasingly important. However, compared to deep neural networks (DNNs), SNNs currently lack specifically designed defense methods against adversarial attacks. Inspired by neural membrane potential oscillation, we propose a novel neural model that incorporates the bio-inspired oscillation mechanism to enhance the security of SNNs. Our experiments show that SNNs with neural oscillation neurons have better resistance to adversarial attacks than ordinary SNNs with LIF neurons on kinds of architectures and datasets. Furthermore, we propose a defense method that changes model's gradients by replacing the form of oscillation, which hides the original training gradients and confuses the attacker into using gradients of 'fake' neurons to generate invalid adversarial samples. Our experiments suggest that the proposed defense method can effectively resist both single-step and iterative attacks with comparable defense effectiveness and much less computational costs than adversarial training methods on DNNs. To the best of our knowledge, this is the first work that establishes adversarial defense through masking surrogate gradients on SNNs.
[[2211.02272] Logits are predictive of network type](http://arxiv.org/abs/2211.02272)
We show that it is possible to predict which deep network has generated a given logit vector with accuracy well above chance. We utilize a number of networks on a dataset, initialized with random weights or pretrained weights, as well as fine-tuned networks. A classifier is then trained on the logit vectors of the trained set of this dataset to map the logit vector to the network index that has generated it. The classifier is then evaluated on the test set of the dataset. Results are better with randomly initialized networks, but also generalize to pretrained networks as well as fine-tuned ones. Classification accuracy is higher using unnormalized logits than normalized ones. We find that there is little transfer when applying a classifier to the same networks but with different sets of weights. In addition to help better understand deep networks and the way they encode uncertainty, we anticipate our finding to be useful in some applications (e.g. tailoring an adversarial attack for a certain type of network). Code is available at https://github.com/aliborji/logits.
[[2211.02245] Unintended Memorization and Timing Attacks in Named Entity Recognition Models](http://arxiv.org/abs/2211.02245)
Named entity recognition models (NER), are widely used for identifying named entities (e.g., individuals, locations, and other information) in text documents. Machine learning based NER models are increasingly being applied in privacy-sensitive applications that need automatic and scalable identification of sensitive information to redact text for data sharing. In this paper, we study the setting when NER models are available as a black-box service for identifying sensitive information in user documents and show that these models are vulnerable to membership inference on their training datasets. With updated pre-trained NER models from spaCy, we demonstrate two distinct membership attacks on these models. Our first attack capitalizes on unintended memorization in the NER's underlying neural network, a phenomenon NNs are known to be vulnerable to. Our second attack leverages a timing side-channel to target NER models that maintain vocabularies constructed from the training data. We show that different functional paths of words within the training dataset in contrast to words not previously seen have measurable differences in execution time. Revealing membership status of training samples has clear privacy implications, e.g., in text redaction, sensitive words or phrases to be found and removed, are at risk of being detected in the training dataset. Our experimental evaluation includes the redaction of both password and health data, presenting both security risks and privacy/regulatory issues. This is exacerbated by results that show memorization with only a single phrase. We achieved 70% AUC in our first attack on a text redaction use-case. We also show overwhelming success in the timing attack with 99.23% AUC. Finally we discuss potential mitigation approaches to realize the safe use of NER models in light of the privacy and security implications of membership inference attacks.
[[2211.02468] Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning](http://arxiv.org/abs/2211.02468)
Intentionally crafted adversarial samples have effectively exploited weaknesses in deep neural networks. A standard method in adversarial robustness assumes a framework to defend against samples crafted by minimally perturbing a sample such that its corresponding model output changes. These sensitivity attacks exploit the model's sensitivity toward task-irrelevant features. Another form of adversarial sample can be crafted via invariance attacks, which exploit the model underestimating the importance of relevant features. Previous literature has indicated a tradeoff in defending against both attack types within a strictly L_p bounded defense. To promote robustness toward both types of attacks beyond Euclidean distance metrics, we use metric learning to frame adversarial regularization as an optimal transport problem. Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
[[2211.02150] 3D Reconstruction of Multiple Objects by mmWave Radar on UAV](http://arxiv.org/abs/2211.02150)
In this paper, we explore the feasibility of utilizing a mmWave radar sensor installed on a UAV to reconstruct the 3D shapes of multiple objects in a space. The UAV hovers at various locations in the space, and its onboard radar senor collects raw radar data via scanning the space with Synthetic Aperture Radar (SAR) operation. The radar data is sent to a deep neural network model, which outputs the point cloud reconstruction of the multiple objects in the space. We evaluate two different models. Model 1 is our recently proposed 3DRIMR/R2P model, and Model 2 is formed by adding a segmentation stage in the processing pipeline of Model 1. Our experiments have demonstrated that both models are promising in solving the multiple object reconstruction problem. We also show that Model 2, despite producing denser and smoother point clouds, can lead to higher reconstruction loss or even loss of objects. In addition, we find that both models are robust to the highly noisy radar data obtained by unstable SAR operation due to the instability or vibration of a small UAV hovering at its intended scanning point. Our exploratory study has shown a promising direction of applying mmWave radar sensing in 3D object reconstruction.
[[2211.02307] Domain Adaptive Video Semantic Segmentation via Cross-Domain Moving Object Mixing](http://arxiv.org/abs/2211.02307)
The network trained for domain adaptation is prone to bias toward the easy-to-transfer classes. Since the ground truth label on the target domain is unavailable during training, the bias problem leads to skewed predictions, forgetting to predict hard-to-transfer classes. To address this problem, we propose Cross-domain Moving Object Mixing (CMOM) that cuts several objects, including hard-to-transfer classes, in the source domain video clip and pastes them into the target domain video clip. Unlike image-level domain adaptation, the temporal context should be maintained to mix moving objects in two different videos. Therefore, we design CMOM to mix with consecutive video frames, so that unrealistic movements are not occurring. We additionally propose Feature Alignment with Temporal Context (FATC) to enhance target domain feature discriminability. FATC exploits the robust source domain features, which are trained with ground truth labels, to learn discriminative target domain features in an unsupervised manner by filtering unreliable predictions with temporal consensus. We demonstrate the effectiveness of the proposed approaches through extensive experiments. In particular, our model reaches mIoU of 53.81% on VIPER to Cityscapes-Seq benchmark and mIoU of 56.31% on SYNTHIA-Seq to Cityscapes-Seq benchmark, surpassing the state-of-the-art methods by large margins.
[[2211.02404] Tensor Robust PCA with Nonconvex and Nonlocal Regularization](http://arxiv.org/abs/2211.02404)
Tensor robust principal component analysis (TRPCA) is a promising way for low-rank tensor recovery, which minimizes the convex surrogate of tensor rank by shrinking each tensor singular values equally. However, for real-world visual data, large singular values represent more signifiant information than small singular values. In this paper, we propose a nonconvex TRPCA (N-TRPCA) model based on the tensor adjustable logarithmic norm. Unlike TRPCA, our N-TRPCA can adaptively shrink small singular values more and shrink large singular values less. In addition, TRPCA assumes that the whole data tensor is of low rank. This assumption is hardly satisfied in practice for natural visual data, restricting the capability of TRPCA to recover the edges and texture details from noisy images and videos. To this end, we integrate nonlocal self-similarity into N-TRPCA, and further develop a nonconvex and nonlocal TRPCA (NN-TRPCA) model. Specifically, similar nonlocal patches are grouped as a tensor and then each group tensor is recovered by our N-TRPCA. Since the patches in one group are highly correlated, all group tensors have strong low-rank property, leading to an improvement of recovery performance. Experimental results demonstrate that the proposed NN-TRPCA outperforms some existing TRPCA methods in visual data recovery. The demo code is available at https://github.com/qguo2010/NN-TRPCA.
[[2211.02578] Data Models for Dataset Drift Controls in Machine Learning With Images](http://arxiv.org/abs/2211.02578)
Camera images are ubiquitous in machine learning research. They also play a central role in the delivery of important services spanning medicine and environmental surveying. However, the application of machine learning models in these domains has been limited because of robustness concerns. A primary failure mode are performance drops due to differences between the training and deployment data. While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of the primary object of interest: the data. This makes it difficult to create physically faithful drift test cases or to provide specifications of data models that should be avoided when deploying a machine learning model. In this study, we demonstrate how these shortcomings can be overcome by pairing machine learning robustness validation with physical optics. We examine the role raw sensor data and differentiable data models can play in controlling performance risks related to image dataset drift. The findings are distilled into three applications. First, drift synthesis enables the controlled generation of physically faithful drift test cases. The experiments presented here show that the average decrease in model performance is ten to four times less severe than under post-hoc augmentation testing. Second, the gradient connection between task and data models allows for drift forensics that can be used to specify performance-sensitive data models which should be avoided during deployment of a machine learning model. Third, drift adjustment opens up the possibility for processing adjustments in the face of drift. This can lead to speed up and stabilization of classifier training at a margin of up to 20% in validation accuracy. A guide to access the open code and datasets is available at https://github.com/aiaudit-org/raw2logit.
[[2211.02580] Evaluating and Improving Factuality in Multimodal Abstractive Summarization](http://arxiv.org/abs/2211.02580)
Current metrics for evaluating factuality for abstractive document summarization have achieved high correlations with human judgment, but they do not account for the vision modality and thus are not adequate for vision-and-language summarization. We propose CLIPBERTScore, a simple weighted combination of CLIPScore and BERTScore to leverage the robustness and strong factuality detection performance between image-summary and document-summary, respectively. Next, due to the lack of meta-evaluation benchmarks to evaluate the quality of multimodal factuality metrics, we collect human judgments of factuality with respect to documents and images. We show that this simple combination of two metrics in the zero-shot setting achieves higher correlations than existing factuality metrics for document summarization, outperforms an existing multimodal summarization metric, and performs competitively with strong multimodal factuality metrics specifically fine-tuned for the task. Our thorough analysis demonstrates the robustness and high correlation of CLIPBERTScore and its components on four factuality metric-evaluation benchmarks. Finally, we demonstrate two practical downstream applications of our CLIPBERTScore metric: for selecting important images to focus on during training, and as a reward for reinforcement learning to improve factuality of multimodal summary generation w.r.t automatic and human evaluation. Our data and code are publicly available at https://github.com/meetdavidwan/faithful-multimodal-summ
[[2211.02643] A Transformer Architecture for Online Gesture Recognition of Mathematical Expressions](http://arxiv.org/abs/2211.02643)
The Transformer architecture is shown to provide a powerful framework as an end-to-end model for building expression trees from online handwritten gestures corresponding to glyph strokes. In particular, the attention mechanism was successfully used to encode, learn and enforce the underlying syntax of expressions creating latent representations that are correctly decoded to the exact mathematical expression tree, providing robustness to ablated inputs and unseen glyphs. For the first time, the encoder is fed with spatio-temporal data tokens potentially forming an infinitely large vocabulary, which finds applications beyond that of online gesture recognition. A new supervised dataset of online handwriting gestures is provided for training models on generic handwriting recognition tasks and a new metric is proposed for the evaluation of the syntactic correctness of the output expression trees. A small Transformer model suitable for edge inference was successfully trained to an average normalised Levenshtein accuracy of 94%, resulting in valid postfix RPN tree representation for 94% of predictions.
[[2211.02069] LMentry: A Language Model Benchmark of Elementary Language Tasks](http://arxiv.org/abs/2211.02069)
As the performance of large language models rapidly improves, benchmarks are getting larger and more complex as well. We present LMentry, a benchmark that avoids this "arms race" by focusing on a compact set of tasks that are trivial to humans, e.g. writing a sentence containing a specific word, identifying which words in a list belong to a specific category, or choosing which of two words is longer. LMentry is specifically designed to provide quick and interpretable insights into the capabilities and robustness of large language models. Our experiments reveal a wide variety of failure cases that, while immediately obvious to humans, pose a considerable challenge for large language models, including OpenAI's latest 175B-parameter instruction-tuned model, TextDavinci002. LMentry complements contemporary evaluation approaches of large language models, providing a quick, automatic, and easy-to-run "unit test", without resorting to large benchmark suites of complex tasks.
[[2211.02429] Dealing with Abbreviations in the Slovenian Biographical Lexicon](http://arxiv.org/abs/2211.02429)
Abbreviations present a significant challenge for NLP systems because they cause tokenization and out-of-vocabulary errors. They can also make the text less readable, especially in reference printed books, where they are extensively used. Abbreviations are especially problematic in low-resource settings, where systems are less robust to begin with. In this paper, we propose a new method for addressing the problems caused by a high density of domain-specific abbreviations in a text. We apply this method to the case of a Slovenian biographical lexicon and evaluate it on a newly developed gold-standard dataset of 51 Slovenian biographies. Our abbreviation identification method performs significantly better than commonly used ad-hoc solutions, especially at identifying unseen abbreviations. We also propose and present the results of a method for expanding the identified abbreviations in context.
[[2211.02052] Theta-Resonance: A Single-Step Reinforcement Learning Method for Design Space Exploration](http://arxiv.org/abs/2211.02052)
Given an environment (e.g., a simulator) for evaluating samples in a specified design space and a set of weighted evaluation metrics -- one can use Theta-Resonance, a single-step Markov Decision Process (MDP), to train an intelligent agent producing progressively more optimal samples. In Theta-Resonance, a neural network consumes a constant input tensor and produces a policy as a set of conditional probability density functions (PDFs) for sampling each design dimension. We specialize existing policy gradient algorithms in deep reinforcement learning (D-RL) in order to use evaluation feedback (in terms of cost, penalty or reward) to update our policy network with robust algorithmic stability and minimal design evaluations. We study multiple neural architectures (for our policy network) within the context of a simple SoC design space and propose a method of constructing synthetic space exploration problems to compare and improve design space exploration (DSE) algorithms. Although we only present categorical design spaces, we also outline how to use Theta-Resonance in order to explore continuous and mixed continuous-discrete design spaces.
[[2211.02146] Robust Time Series Chain Discovery with Incremental Nearest Neighbors](http://arxiv.org/abs/2211.02146)
Time series motif discovery has been a fundamental task to identify meaningful repeated patterns in time series. Recently, time series chains were introduced as an expansion of time series motifs to identify the continuous evolving patterns in time series data. Informally, a time series chain (TSC) is a temporally ordered set of time series subsequences, in which every subsequence is similar to the one that precedes it, but the last and the first can be arbitrarily dissimilar. TSCs are shown to be able to reveal latent continuous evolving trends in the time series, and identify precursors of unusual events in complex systems. Despite its promising interpretability, unfortunately, we have observed that existing TSC definitions lack the ability to accurately cover the evolving part of a time series: the discovered chains can be easily cut by noise and can include non-evolving patterns, making them impractical in real-world applications. Inspired by a recent work that tracks how the nearest neighbor of a time series subsequence changes over time, we introduce a new TSC definition which is much more robust to noise in the data, in the sense that they can better locate the evolving patterns while excluding the non-evolving ones. We further propose two new quality metrics to rank the discovered chains. With extensive empirical evaluations, we demonstrate that the proposed TSC definition is significantly more robust to noise than the state of the art, and the top ranked chains discovered can reveal meaningful regularities in a variety of real world datasets.
[[2211.02233] Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler](http://arxiv.org/abs/2211.02233)
Active learning with strong and weak labelers considers a practical setting where we have access to both costly but accurate strong labelers and inaccurate but cheap predictions provided by weak labelers. We study this problem in the streaming setting, where decisions must be taken \textit{online}. We design a novel algorithmic template, Weak Labeler Active Cover (WL-AC), that is able to robustly leverage the lower quality weak labelers to reduce the query complexity while retaining the desired level of accuracy. Prior active learning algorithms with access to weak labelers learn a difference classifier which predicts where the weak labels differ from strong labelers; this requires the strong assumption of realizability of the difference classifier (Zhang and Chaudhuri,2015). WL-AC bypasses this \textit{realizability} assumption and thus is applicable to many real-world scenarios such as random corrupted weak labels and high dimensional family of difference classifiers (\textit{e.g.,} deep neural nets). Moreover, WL-AC cleverly trades off evaluating the quality with full exploitation of weak labelers, which allows to convert any active learning strategy to one that can leverage weak labelers. We provide an instantiation of this template that achieves the optimal query complexity for any given weak labeler, without knowing its accuracy a-priori. Empirically, we propose an instantiation of the WL-AC template that can be efficiently implemented for large-scale models (\textit{e.g}., deep neural nets) and show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.
[[2211.02646] Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions](http://arxiv.org/abs/2211.02646)
As multimodal learning finds applications in a wide variety of high-stakes societal tasks, investigating their robustness becomes important. Existing work has focused on understanding the robustness of vision-and-language models to imperceptible variations on benchmark tasks. In this work, we investigate the robustness of multimodal classifiers to cross-modal dilutions - a plausible variation. We develop a model that, given a multimodal (image + text) input, generates additional dilution text that (a) maintains relevance and topical coherence with the image and existing text, and (b) when added to the original text, leads to misclassification of the multimodal input. Via experiments on Crisis Humanitarianism and Sentiment Detection tasks, we find that the performance of task-specific fusion-based multimodal classifiers drops by 23.3% and 22.5%, respectively, in the presence of dilutions generated by our model. Metric-based comparisons with several baselines and human evaluations indicate that our dilutions show higher relevance and topical coherence, while simultaneously being more effective at demonstrating the brittleness of the multimodal classifiers. Our work aims to highlight and encourage further research on the robustness of deep multimodal models to realistic variations, especially in human-facing societal applications. The code and other resources are available at https://claws-lab.github.io/multimodal-robustness/.
[[2211.02091] Fairness in Federated Learning via Core-Stability](http://arxiv.org/abs/2211.02091)
Federated learning provides an effective paradigm to jointly optimize a model benefited from rich distributed data while protecting data privacy. Nonetheless, the heterogeneity nature of distributed data makes it challenging to define and ensure fairness among local agents. For instance, it is intuitively "unfair" for agents with data of high quality to sacrifice their performance due to other agents with low quality data. Currently popular egalitarian and weighted equity-based fairness measures suffer from the aforementioned pitfall. In this work, we aim to formally represent this problem and address these fairness issues using concepts from co-operative game theory and social choice theory. We model the task of learning a shared predictor in the federated setting as a fair public decision making problem, and then define the notion of core-stable fairness: Given $N$ agents, there is no subset of agents $S$ that can benefit significantly by forming a coalition among themselves based on their utilities $U_N$ and $U_S$ (i.e., $\frac{|S|}{N} U_S \geq U_N$). Core-stable predictors are robust to low quality local data from some agents, and additionally they satisfy Proportionality and Pareto-optimality, two well sought-after fairness and efficiency notions within social choice. We then propose an efficient federated learning protocol CoreFed to optimize a core stable predictor. CoreFed determines a core-stable predictor when the loss functions of the agents are convex. CoreFed also determines approximate core-stable predictors when the loss functions are not convex, like smooth neural networks. We further show the existence of core-stable predictors in more general settings using Kakutani's fixed point theorem. Finally, we empirically validate our analysis on two real-world datasets, and we show that CoreFed achieves higher core-stability fairness than FedAvg while having similar accuracy.
[[2211.02106] Federated Hypergradient Descent](http://arxiv.org/abs/2211.02106)
In this work, we explore combining automatic hyperparameter tuning and optimization for federated learning (FL) in an online, one-shot procedure. We apply a principled approach on a method for adaptive client learning rate, number of local steps, and batch size. In our federated learning applications, our primary motivations are minimizing communication budget as well as local computational resources in the training pipeline. Conventionally, hyperparameter tuning methods involve at least some degree of trial-and-error, which is known to be sample inefficient. In order to address our motivations, we propose FATHOM (Federated AuTomatic Hyperparameter OptiMization) as a one-shot online procedure. We investigate the challenges and solutions of deriving analytical gradients with respect to the hyperparameters of interest. Our approach is inspired by the fact that, with the exception of local data, we have full knowledge of all components involved in our training process, and this fact can be exploited in our algorithm impactfully. We show that FATHOM is more communication efficient than Federated Averaging (FedAvg) with optimized, static valued hyperparameters, and is also more computationally efficient overall. As a communication efficient, one-shot online procedure, FATHOM solves the bottleneck of costly communication and limited local computation, by eliminating a potentially wasteful tuning process, and by optimizing the hyperparamters adaptively throughout the training procedure without trial-and-error. We show our numerical results through extensive empirical experiments with the Federated EMNIST-62 (FEMNIST) and Federated Stack Overflow (FSO) datasets, using FedJAX as our baseline framework.
[[2211.02296] Decentralized Federated Reinforcement Learning for User-Centric Dynamic TFDD Control](http://arxiv.org/abs/2211.02296)
The explosive growth of dynamic and heterogeneous data traffic brings great challenges for 5G and beyond mobile networks. To enhance the network capacity and reliability, we propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme that adaptively allocates the uplink and downlink time-frequency resources of base stations (BSs) to meet the asymmetric and heterogeneous traffic demands while alleviating the inter-cell interference. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) that maximizes the long-term expected sum rate under the users' packet dropping ratio constraints. In order to jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named federated Wolpertinger deep deterministic policy gradient (FWDDPG) algorithm. The BSs decide their local time-frequency configurations through RL algorithms and achieve global training via exchanging local RL models with their neighbors under a decentralized federated learning framework. Specifically, to deal with the large-scale discrete action space of each BS, we adopt a DDPG-based algorithm to generate actions in a continuous space, and then utilize Wolpertinger policy to reduce the mapping errors from continuous action space back to discrete action space. Simulation results demonstrate the superiority of our proposed algorithm to benchmark algorithms with respect to system sum rate.
[[2211.02141] Shapes2Toon: Generating Cartoon Characters from Simple Geometric Shapes](http://arxiv.org/abs/2211.02141)
Cartoons are an important part of our entertainment culture. Though drawing a cartoon is not for everyone, creating it using an arrangement of basic geometric primitives that approximates that character is a fairly frequent technique in art. The key motivation behind this technique is that human bodies
[[2211.02092] Making Machine Learning Datasets and Models FAIR for HPC: A Methodology and Case Study](http://arxiv.org/abs/2211.02092)
The FAIR Guiding Principles aim to improve the findability, accessibility, interoperability, and reusability of digital content by making them both human and machine actionable. However, these principles have not yet been broadly adopted in the domain of machine learning-based program analyses and optimizations for High-Performance Computing (HPC). In this paper, we design a methodology to make HPC datasets and machine learning models FAIR after investigating existing FAIRness assessment and improvement techniques. Our methodology includes a comprehensive, quantitative assessment for elected data, followed by concrete, actionable suggestions to improve FAIRness with respect to common issues related to persistent identifiers, rich metadata descriptions, license and provenance information. Moreover, we select a representative training dataset to evaluate our methodology. The experiment shows the methodology can effectively improve the dataset and model's FAIRness from an initial score of 19.1% to the final score of 83.0%.
[[2211.02166] A $k$-additive Choquet integral-based approach to approximate the SHAP values for local interpretability in machine learning](http://arxiv.org/abs/2211.02166)
Besides accuracy, recent studies on machine learning models have been addressing the question on how the obtained results can be interpreted. Indeed, while complex machine learning models are able to provide very good results in terms of accuracy even in challenging applications, it is difficult to interpret them. Aiming at providing some interpretability for such models, one of the most famous methods, called SHAP, borrows the Shapley value concept from game theory in order to locally explain the predicted outcome of an instance of interest. As the SHAP values calculation needs previous computations on all possible coalitions of attributes, its computational cost can be very high. Therefore, a SHAP-based method called Kernel SHAP adopts an efficient strategy that approximate such values with less computational effort. In this paper, we also address local interpretability in machine learning based on Shapley values. Firstly, we provide a straightforward formulation of a SHAP-based method for local interpretability by using the Choquet integral, which leads to both Shapley values and Shapley interaction indices. Moreover, we also adopt the concept of $k$-additive games from game theory, which contributes to reduce the computational effort when estimating the SHAP values. The obtained results attest that our proposal needs less computations on coalitions of attributes to approximate the SHAP values.