[[2212.11394] Secure Aggregation of Semi-Honest Clients and Servers in Federated Learning with Secret-Shared Homomorphism](http://arxiv.org/abs/2212.11394) #secure
Privacy-preserving distributed machine learning has been recognized as one of the most promising paradigms for the collaboration of multiple or many clients who cannot disclose their local data -- neither the training data set nor the model parameters. One popular approach to preserving the confidentiality of local data is homomorphic encryption (HE): the clients send encrypted model parameters to the server, which aggregates the models into a global model and broadcasts it to the clients. Evidently, the security of the above HE-based approach depends on the assumption that the clients are honest and cannot obtain the encrypted models from others (otherwise the models can be immediately decrypted because all clients share the same private key); this is indeed the case since the encrypted models are transmitted through protocols like Transport Layer Security (TLS). However, it is often overlooked that the leakage of encrypted models does not necessarily happen on the network channel; for example, if a semi-honest client colludes with a semi-honest server (the former having the private key and the latter having the ciphertext), possibly offline in another session, they can surely recover the model parameters in question. This paper presents a new protocol for secure aggregation under the mild, in our opinion more practical, assumption that all parties are semi-honest. As a starting point, we construct a new security primitive, called \textit{asymmetric threshold secret sharing} (ATSS), which works with homomorphic encryption to constitute an aggregation protocol to achieve the desired security. We demonstrate that the ATSS-based protocol is provably secure and delivers reasonable performance in practice.
[[2212.11475] CHEM: Efficient Secure Aggregation with Cached Homomorphic Encryption in Federated Machine Learning Systems](http://arxiv.org/abs/2212.11475) #secure
Although homomorphic encryption can be incorporated into neural network layers for securing machine learning tasks, such as confidential inference over encrypted data samples and encrypted local models in federated learning, the computational overhead has been an Achilles heel. This paper proposes a caching protocol, namely CHEM, such that tensor ciphertexts can be constructed from a pool of cached radixes rather than carrying out expensive encryption operations. From a theoretical perspective, we demonstrate that CHEM is semantically secure and can be parameterized with straightforward analysis under practical assumptions. Experimental results on three popular public data sets show that adopting CHEM only incurs sub-second overhead and yet reduces the encryption cost by 48%--89% for encoding input data samples in confidential inference and 67%--87% for encoding local models in federated learning, respectively.
[[2212.11449] Detecting Network Security Vulnerabilities and Proactive Strategies to Mitigate Potential Threats](http://arxiv.org/abs/2212.11449) #security
In multi-tier network systems, custom applications, Web services and platform environments, storing data and information assets becomes a challenge for any organisation. Although there are different methods to secure network systems, the best way to test the level of security is to conduct penetration testing. In this paper, we describe how we performed live penetration testing for a particular network, namely, 192.168.3.0/24 (Case Study) by identifying the system vulnerabilities to enable its penetration. After compromising the system, critical data (Flags) must be found, indicating our successful penetration. As professional penetration testers, we used an arsenal of penetration testing tools utilised by malicious actors on the internet, such as Nmap, Nessus, Sparta and Metasploit, etc. Typically, much effort was employed on reconnaissance & scanning phases, rather than system exploration, due to their importance in identifying security vulnerabilities in the system environment. The vulnerability analysis highlighted the most critical threats, which token is an advantage to gain access, namely, FTP services, HTTP, and human errors. However, comprising the system is not sufficient because the critical data (Flag) generally requires the administrators rights. Consequently, teams often examine the system to find a way to escalate privilege to the root level. Furthermore, some critical data (Flags) require decryption algorithms or the analysis of captured packets to make them readable. We found eight Flags and identified a system security breach. Mitigation strategies addressing the identified vulnerabilities are recommended to ensure the given networks are secured against future attacks.
[[2212.11562] Role of Cybersecurity and Blockchain in Battlefield of Things](http://arxiv.org/abs/2212.11562) #security
The Internet of Things is an essential component in the growth of an ecosystem that enables quick and precise judgments to be made for communication on the battleground. The usage of the battlefield of things (BoT) is, however, subject to several restrictions for a variety of reasons. There is a potential for instances of replay, data manipulation, breaches of privacy, and other similar occurrences. As a direct result of this, the implementation of a security mechanism to protect the communication that occurs within BoT has turned into an absolute requirement. To this aim, we propose a blockchain-based solution that is both safe and private for use in communications inside the BoT ecosystem. In addition, research is conducted on the benefits of integrating blockchain technology and cybersecurity into BoT application implementations. This work elaborates on the importance of integrating cybersecurity and blockchain-based tools, techniques and methodologies for BoT.
[[2212.11700] Blockchain Scalability and Security: Communications Among Fast-Changing Committees Made Simple](http://arxiv.org/abs/2212.11700) #security
For permissionless blockchains, scalability is paramount. While current technologies still fail to address this problem fully, many research works propose sharding or other techniques that extensively adopt parallel processing of transactions. In these approaches, a potentially large number of committees of nodes independently perform consensus and process new transactions. Hence, in addition to regular intra-committee communication, (1) new transactions have to be delivered to the right committee, (2) committees need to communicate to process inter-shard transactions or (3) to exchange intermediate results. To contrast slowly adaptive adversaries, committees should be frequently changed. However, efficient communication to frequently-changing committees is hard.
We propose a simple approach that allows us to implicitly select committee members and effectively deliver messages to all members of a specific committee, even when committees are changed frequently. The aim of our design is to provide a committee selection procedure and a committee-targeted communication primitive to be applied in most of the scalable blockchain architectures that are currently proposed in literature. We provide a theoretical proof of the security of our approach and first experimental results that shows that our approach might be feasible in practice.
[[2212.11599] Synopsis: Sequential Decision Problems with Weak Feedback](http://arxiv.org/abs/2212.11599) #security
This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem, where one can not infer the loss incurred for selecting an action from observed feedback. We also introduce a new setup named Censored Semi Bandits, where the loss incurred for selecting an action can be observed under certain conditions. Finally, we study the channel selection problem in the communication networks, where the reward for an action is only observed when no other player selects that action to play in the round. These problems find applications in many fields like healthcare, crowd-sourcing, security, adaptive resource allocation, among many others. This thesis aims to address the above-described sequential decision problems by exploiting specific structures these problems exhibit. We develop provably optimal algorithms for each of these setups with weak feedback and validate their empirical performance on different problem instances derived from synthetic and real datasets.
[[2212.11603] Sequential Decision Problems with Weak Feedback](http://arxiv.org/abs/2212.11603) #security
This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem, where one can not infer the loss incurred for selecting an action from observed feedback. We also introduce a new setup named Censored Semi Bandits, where the loss incurred for selecting an action can be observed under certain conditions. Finally, we study the channel selection problem in the communication networks, where the reward for an action is only observed when no other player selects that action to play in the round. These problems find applications in many fields like healthcare, crowd-sourcing, security, adaptive resource allocation, among many others. This thesis aims to address the above-described sequential decision problems by exploiting specific structures these problems exhibit. We develop provably optimal algorithms for each of these setups with weak feedback and validate their empirical performance on different problem instances derived from synthetic and real datasets.
[[2212.11486] Over-the-Air Federated Learning with Enhanced Privacy](http://arxiv.org/abs/2212.11486) #privacy
Federated learning (FL) has emerged as a promising learning paradigm in which only local model parameters (gradients) are shared. Private user data never leaves the local devices thus preserving data privacy. However, recent research has shown that even when local data is never shared by a user, exchanging model parameters without protection can also leak private information. Moreover, in wireless systems, the frequent transmission of model parameters can cause tremendous bandwidth consumption and network congestion when the model is large. To address this problem, we propose a new FL framework with efficient over-the-air parameter aggregation and strong privacy protection of both user data and models. We achieve this by introducing pairwise cancellable random artificial noises (PCR-ANs) on end devices. As compared to existing over-the-air computation (AirComp) based FL schemes, our design provides stronger privacy protection. We analytically show the secrecy capacity and the convergence rate of the proposed wireless FL aggregation algorithm.
[[2212.11468] IPProtect: protecting the intellectual property of visual datasets during data valuation](http://arxiv.org/abs/2212.11468) #protect
Data trading is essential to accelerate the development of data-driven machine learning pipelines. The central problem in data trading is to estimate the utility of a seller's dataset with respect to a given buyer's machine learning task, also known as data valuation. Typically, data valuation requires one or more participants to share their raw dataset with others, leading to potential risks of intellectual property (IP) violations. In this paper, we tackle the novel task of preemptively protecting the IP of datasets that need to be shared during data valuation. First, we identify and formalize two kinds of novel IP risks in visual datasets: data-item (image) IP and statistical (dataset) IP. Then, we propose a novel algorithm to convert the raw dataset into a sanitized version, that provides resistance to IP violations, while at the same time allowing accurate data valuation. The key idea is to limit the transfer of information from the raw dataset to the sanitized dataset, thereby protecting against potential intellectual property violations. Next, we analyze our method for the likely existence of a solution and immunity against reconstruction attacks. Finally, we conduct extensive experiments on three computer vision datasets demonstrating the advantages of our method in comparison to other baselines.
[[2212.11772] A Self-Adjusting Fusion Representation Learning Model for Unaligned Text-Audio Sequences](http://arxiv.org/abs/2212.11772) #protect
Inter-modal interaction plays an indispensable role in multimodal sentiment analysis. Due to different modalities sequences are usually non-alignment, how to integrate relevant information of each modality to learn fusion representations has been one of the central challenges in multimodal learning. In this paper, a Self-Adjusting Fusion Representation Learning Model (SA-FRLM) is proposed to learn robust crossmodal fusion representations directly from the unaligned text and audio sequences. Different from previous works, our model not only makes full use of the interaction between different modalities but also maximizes the protection of the unimodal characteristics. Specifically, we first employ a crossmodal alignment module to project different modalities features to the same dimension. The crossmodal collaboration attention is then adopted to model the inter-modal interaction between text and audio sequences and initialize the fusion representations. After that, as the core unit of the SA-FRLM, the crossmodal adjustment transformer is proposed to protect original unimodal characteristics. It can dynamically adapt the fusion representations by using single modal streams. We evaluate our approach on the public multimodal sentiment analysis datasets CMU-MOSI and CMU-MOSEI. The experiment results show that our model has significantly improved the performance of all the metrics on the unaligned text-audio sequences.
[[2212.11760] Aliasing is a Driver of Adversarial Attacks](http://arxiv.org/abs/2212.11760) #attack
Aliasing is a highly important concept in signal processing, as careful consideration of resolution changes is essential in ensuring transmission and processing quality of audio, image, and video. Despite this, up until recently aliasing has received very little consideration in Deep Learning, with all common architectures carelessly sub-sampling without considering aliasing effects. In this work, we investigate the hypothesis that the existence of adversarial perturbations is due in part to aliasing in neural networks. Our ultimate goal is to increase robustness against adversarial attacks using explainable, non-trained, structural changes only, derived from aliasing first principles. Our contributions are the following. First, we establish a sufficient condition for no aliasing for general image transformations. Next, we study sources of aliasing in common neural network layers, and derive simple modifications from first principles to eliminate or reduce it. Lastly, our experimental results show a solid link between anti-aliasing and adversarial attacks. Simply reducing aliasing already results in more robust classifiers, and combining anti-aliasing with robust training out-performs solo robust training on $L_2$ attacks with none or minimal losses in performance on $L_{\infty}$ attacks.
[[2212.11751] Mind Your Heart: Stealthy Backdoor Attack on Dynamic Deep Neural Network in Edge Computing](http://arxiv.org/abs/2212.11751) #attack
Transforming off-the-shelf deep neural network (DNN) models into dynamic multi-exit architectures can achieve inference and transmission efficiency by fragmenting and distributing a large DNN model in edge computing scenarios (e.g., edge devices and cloud servers). In this paper, we propose a novel backdoor attack specifically on the dynamic multi-exit DNN models. Particularly, we inject a backdoor by poisoning one DNN model's shallow hidden layers targeting not this vanilla DNN model but only its dynamically deployed multi-exit architectures. Our backdoored vanilla model behaves normally on performance and cannot be activated even with the correct trigger. However, the backdoor will be activated when the victims acquire this model and transform it into a dynamic multi-exit architecture at their deployment. We conduct extensive experiments to prove the effectiveness of our attack on three structures (ResNet-56, VGG-16, and MobileNet) with four datasets (CIFAR-10, SVHN, GTSRB, and Tiny-ImageNet) and our backdoor is stealthy to evade multiple state-of-the-art backdoor detection or removal methods.
[[2212.11810] GAN-based Domain Inference Attack](http://arxiv.org/abs/2212.11810) #attack
Model-based attacks can infer training data information from deep neural network models. These attacks heavily depend on the attacker's knowledge of the application domain, e.g., using it to determine the auxiliary data for model-inversion attacks. However, attackers may not know what the model is used for in practice. We propose a generative adversarial network (GAN) based method to explore likely or similar domains of a target model -- the model domain inference (MDI) attack. For a given target (classification) model, we assume that the attacker knows nothing but the input and output formats and can use the model to derive the prediction for any input in the desired form. Our basic idea is to use the target model to affect a GAN training process for a candidate domain's dataset that is easy to obtain. We find that the target model may distract the training procedure less if the domain is more similar to the target domain. We then measure the distraction level with the distance between GAN-generated datasets, which can be used to rank candidate domains for the target model. Our experiments show that the auxiliary dataset from an MDI top-ranked domain can effectively boost the result of model-inversion attacks.
[[2212.11484] SALVE: Self-supervised Adaptive Low-light Video Enhancement](http://arxiv.org/abs/2212.11484) #robust
A self-supervised adaptive low-light video enhancement (SALVE) method is proposed in this work. SALVE first conducts an effective Retinex-based low-light image enhancement on a few key frames of an input low-light video. Next, it learns mappings from the low- to enhanced-light frames via Ridge regression. Finally, it uses these mappings to enhance the remaining frames in the input video. SALVE is a hybrid method that combines components from a traditional Retinex-based image enhancement method and a learning-based method. The former component leads to a robust solution which is easily adaptive to new real-world environments. The latter component offers a fast, computationally inexpensive and temporally consistent solution. We conduct extensive experiments to show the superior performance of SALVE. Our user study shows that 87% of participants prefer SALVE over prior work.
[[2212.11511] Confidence-Aware Paced-Curriculum Learning by Label Smoothing for Surgical Scene Understanding](http://arxiv.org/abs/2212.11511) #robust
Curriculum learning and self-paced learning are the training strategies that gradually feed the samples from easy to more complex. They have captivated increasing attention due to their excellent performance in robotic vision. Most recent works focus on designing curricula based on difficulty levels in input samples or smoothing the feature maps. However, smoothing labels to control the learning utility in a curriculum manner is still unexplored. In this work, we design a paced curriculum by label smoothing (P-CBLS) using paced learning with uniform label smoothing (ULS) for classification tasks and fuse uniform and spatially varying label smoothing (SVLS) for semantic segmentation tasks in a curriculum manner. In ULS and SVLS, a bigger smoothing factor value enforces a heavy smoothing penalty in the true label and limits learning less information. Therefore, we design the curriculum by label smoothing (CBLS). We set a bigger smoothing value at the beginning of training and gradually decreased it to zero to control the model learning utility from lower to higher. We also designed a confidence-aware pacing function and combined it with our CBLS to investigate the benefits of various curricula. The proposed techniques are validated on four robotic surgery datasets of multi-class, multi-label classification, captioning, and segmentation tasks. We also investigate the robustness of our method by corrupting validation data into different severity levels. Our extensive analysis shows that the proposed method improves prediction accuracy and robustness.
[[2212.11533] LaneAF: Robust Multi-Lane Detection with Affinity Fields](http://arxiv.org/abs/2212.11533) #robust
Lane detection is a long-standing task and a basic module in autonomous driving. The task is to detect the lane of the current driving road, and provide relevant information such as the ID, direction, curvature, width, length, with visualization. Our work is based on CNN backbone DLA-34, along with Affinity Fields, aims to achieve robust detection of various lanes without assuming the number of lanes. Besides, we investigate novel decoding methods to achieve more efficient lane detection algorithm.
[[2212.11677] DuAT: Dual-Aggregation Transformer Network for Medical Image Segmentation](http://arxiv.org/abs/2212.11677) #robust
Transformer-based models have been widely demonstrated to be successful in computer vision tasks by modelling long-range dependencies and capturing global representations. However, they are often dominated by features of large patterns leading to the loss of local details (e.g., boundaries and small objects), which are critical in medical image segmentation. To alleviate this problem, we propose a Dual-Aggregation Transformer Network called DuAT, which is characterized by two innovative designs, namely, the Global-to-Local Spatial Aggregation (GLSA) and Selective Boundary Aggregation (SBA) modules. The GLSA has the ability to aggregate and represent both global and local spatial features, which are beneficial for locating large and small objects, respectively. The SBA module is used to aggregate the boundary characteristic from low-level features and semantic information from high-level features for better preserving boundary details and locating the re-calibration objects. Extensive experiments in six benchmark datasets demonstrate that our proposed model outperforms state-of-the-art methods in the segmentation of skin lesion images, and polyps in colonoscopy images. In addition, our approach is more robust than existing methods in various challenging situations such as small object segmentation and ambiguous object boundaries.
[[2212.11920] Beyond SOT: It's Time to Track Multiple Generic Objects at Once](http://arxiv.org/abs/2212.11920) #robust
Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video. While the task has received much attention in the last decades, researchers have almost exclusively focused on the single object setting. Multi-object GOT benefits from a wider applicability, rendering it more attractive in real-world applications. We attribute the lack of research interest into this problem to the absence of suitable benchmarks. In this work, we introduce a new large-scale GOT benchmark, LaGOT, containing multiple annotated target objects per sequence. Our benchmark allows researchers to tackle key remaining challenges in GOT, aiming to increase robustness and reduce computation through joint tracking of multiple objects simultaneously. Furthermore, we propose a Transformer-based GOT tracker TaMOS capable of joint processing of multiple objects through shared computation. TaMOs achieves a 4x faster run-time in case of 10 concurrent objects compared to tracking each object independently and outperforms existing single object trackers on our new benchmark. Finally, TaMOs achieves highly competitive results on single-object GOT datasets, setting a new state-of-the-art on TrackingNet with a success rate AUC of 84.4%. Our benchmark, code, and trained models will be made publicly available.
[[2212.11850] Did You See That? A Covert Channel Exploiting Recent Legitimate Traffic](http://arxiv.org/abs/2212.11850) #robust
Covert channels are unforeseen and stealthy communication channels that enable manifold adversary scenarios, such as the covert exfiltration of confidential data or the stealthy orchestration of botnets. However, they can also allow the exchange of confidential information by journalists. All covert channels described until now therefore need to craft seemingly legitimate information flows for their information exchange, mimicking unsuspicious behavior.
In this paper, we present DYST (Did You See That?), which represents a new class of covert channels we call history covert channels. History covert channels can communicate almost exclusively based on unaltered legitimate traffic created by regular nodes participating in a network. Only a negligible fraction of the covert communication process requires the transfer of actual covert channel information. We extend the current taxonomy for covert channels to show how history channels can be categorized. We theoretically analyze the characteristics of history channels and show how their configuration can be optimized for two channel implementations, called DYST-Basic and DYST-Ext.
We further implement a proof-of-concept code for both DYST variants and evaluate the performance (robustness, detectability, and optimization) with both, simulated and real traffic. Finally, we discuss application scenarios and potential countermeasures against DYST.
[[2212.11702] Robust Meta-Representation Learning via Global Label Inference and Classification](http://arxiv.org/abs/2212.11702) #robust
Few-shot learning (FSL) is a central problem in meta-learning, where learners must efficiently learn from few labeled examples. Within FSL, feature pre-training has recently become an increasingly popular strategy to significantly improve generalization performance. However, the contribution of pre-training is often overlooked and understudied, with limited theoretical understanding of its impact on meta-learning performance. Further, pre-training requires a consistent set of global labels shared across training tasks, which may be unavailable in practice. In this work, we address the above issues by first showing the connection between pre-training and meta-learning. We discuss why pre-training yields more robust meta-representation and connect the theoretical analysis to existing works and empirical results. Secondly, we introduce Meta Label Learning (MeLa), a novel meta-learning algorithm that learns task relations by inferring global labels across tasks. This allows us to exploit pre-training for FSL even when global labels are unavailable or ill-defined. Lastly, we introduce an augmented pre-training procedure that further improves the learned meta-representation. Empirically, MeLa outperforms existing methods across a diverse range of benchmarks, in particular under a more challenging setting where the number of training tasks is limited and labels are task-specific. We also provide extensive ablation study to highlight its key properties.
[[2212.11746] Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning](http://arxiv.org/abs/2212.11746) #robust
Cooperative multi-agent reinforcement learning (c-MARL) is widely applied in safety-critical scenarios, thus the analysis of robustness for c-MARL models is profoundly important. However, robustness certification for c-MARLs has not yet been explored in the community. In this paper, we propose a novel certification method, which is the first work to leverage a scalable approach for c-MARLs to determine actions with guaranteed certified bounds. c-MARL certification poses two key challenges compared with single-agent systems: (i) the accumulated uncertainty as the number of agents increases; (ii) the potential lack of impact when changing the action of a single agent into a global team reward. These challenges prevent us from directly using existing algorithms. Hence, we employ the false discovery rate (FDR) controlling procedure considering the importance of each agent to certify per-state robustness and propose a tree-search-based algorithm to find a lower bound of the global reward under the minimal certified perturbation. As our method is general, it can also be applied in single-agent environments. We empirically show that our certification bounds are much tighter than state-of-the-art RL certification solutions. We also run experiments on two popular c-MARL algorithms: QMIX and VDN, in two different environments, with two and four agents. The experimental results show that our method produces meaningful guaranteed robustness for all models and environments. Our tool CertifyCMARL is available at https://github.com/TrustAI/CertifyCMA
[[2212.11792] CatlNet: Learning Communication and Coordination Policies from CaTL+ Specifications](http://arxiv.org/abs/2212.11792) #robust
In this paper, we propose a learning-based framework to simultaneously learn the communication and distributed control policies for a heterogeneous multi-agent system (MAS) under complex mission requirements from Capability Temporal Logic plus (CaTL+) specifications. Both policies are trained, implemented, and deployed using a novel neural network model called CatlNet. Taking advantage of the robustness measure of CaTL+, we train CatlNet centrally to maximize it where network parameters are shared among all agents, allowing CatlNet to scale to large teams easily. CatlNet can then be deployed distributedly. A plan repair algorithm is also introduced to guide CatlNet's training and improve both training efficiency and the overall performance of CatlNet. The CatlNet approach is tested in simulation and results show that, after training, CatlNet can steer the decentralized MAS system online to satisfy a CaTL+ specification with a high success rate.
[[2212.11958] Asymmetric Cross-Scale Alignment for Text-Based Person Search](http://arxiv.org/abs/2212.11958) #extraction
Text-based person search (TBPS) is of significant importance in intelligent surveillance, which aims to retrieve pedestrian images with high semantic relevance to a given text description. This retrieval task is characterized with both modal heterogeneity and fine-grained matching. To implement this task, one needs to extract multi-scale features from both image and text domains, and then perform the cross-modal alignment. However, most existing approaches only consider the alignment confined at their individual scales, e.g., an image-sentence or a region-phrase scale. Such a strategy adopts the presumable alignment in feature extraction, while overlooking the cross-scale alignment, e.g., image-phrase. In this paper, we present a transformer-based model to extract multi-scale representations, and perform Asymmetric Cross-Scale Alignment (ACSA) to precisely align the two modalities. Specifically, ACSA consists of a global-level alignment module and an asymmetric cross-attention module, where the former aligns an image and texts on a global scale, and the latter applies the cross-attention mechanism to dynamically align the cross-modal entities in region/image-phrase scales. Extensive experiments on two benchmark datasets CUHK-PEDES and RSTPReid demonstrate the effectiveness of our approach. Codes are available at \href{url}{https://github.com/mul-hjh/ACSA}.
[[2212.11522] AsyncFLEO: Asynchronous Federated Learning for LEO Satellite Constellations with High-Altitude Platforms](http://arxiv.org/abs/2212.11522) #federate
Low Earth Orbit (LEO) constellations, each comprising a large number of satellites, have become a new source of big data "from the sky". Downloading such data to a ground station (GS) for big data analytics demands very high bandwidth and involves large propagation delays. Federated Learning (FL) offers a promising solution because it allows data to stay in-situ (never leaving satellites) and it only needs to transmit machine learning model parameters (trained on the satellites' data). However, the conventional, synchronous FL process can take several days to train a single FL model in the context of satellite communication (Satcom), due to a bottleneck caused by straggler satellites. In this paper, we propose an asynchronous FL framework for LEO constellations called AsyncFLEO to improve FL efficiency in Satcom. Not only does AsynFLEO address the bottleneck (idle waiting) in synchronous FL, but it also solves the issue of model staleness caused by straggler satellites. AsyncFLEO utilizes high-altitude platforms (HAPs) positioned "in the sky" as parameter servers, and consists of three technical components: (1) a ring-of-stars communication topology, (2) a model propagation algorithm, and (3) a model aggregation algorithm with satellite grouping and staleness discounting. Our extensive evaluation with both IID and non-IID data shows that AsyncFLEO outperforms the state of the art by a large margin, cutting down convergence delay by 22 times and increasing accuracy by 40%.
[[2212.11729] Federated Learning -- Methods, Applications and beyond](http://arxiv.org/abs/2212.11729) #federate
In recent years the applications of machine learning models have increased rapidly, due to the large amount of available data and technological progress.While some domains like web analysis can benefit from this with only minor restrictions, other fields like in medicine with patient data are strongerregulated. In particular \emph{data privacy} plays an important role as recently highlighted by the trustworthy AI initiative of the EU or general privacy regulations in legislation. Another major challenge is, that the required training \emph{data is} often \emph{distributed} in terms of features or samples and unavailable for classicalbatch learning approaches. In 2016 Google came up with a framework, called \emph{Federated Learning} to solve both of these problems. We provide a brief overview on existing Methods and Applications in the field of vertical and horizontal \emph{Federated Learning}, as well as \emph{Fderated Transfer Learning}.
[[2212.11409] DExT: Detector Explanation Toolkit](http://arxiv.org/abs/2212.11409) #interpretability
State-of-the-art object detectors are treated as black boxes due to their highly non-linear internal computations. Even with unprecedented advancements in detector performance, the inability to explain how their outputs are generated limits their use in safety-critical applications. Previous work fails to produce explanations for both bounding box and classification decisions, and generally make individual explanations for various detectors. In this paper, we propose an open-source Detector Explanation Toolkit (DExT) which implements the proposed approach to generate a holistic explanation for all detector decisions using certain gradient-based explanation methods. We suggests various multi-object visualization methods to merge the explanations of multiple objects detected in an image as well as the corresponding detections in a single image. The quantitative evaluation show that the Single Shot MultiBox Detector (SSD) is more faithfully explained compared to other detectors regardless of the explanation methods. Both quantitative and human-centric evaluations identify that SmoothGrad with Guided Backpropagation (GBP) provides more trustworthy explanations among selected methods across all detectors. We expect that DExT will motivate practitioners to evaluate object detectors from the interpretability perspective by explaining both bounding box and classification decisions.
[[2212.11415] Circumventing interpretability: How to defeat mind-readers](http://arxiv.org/abs/2212.11415) #interpretability
The increasing capabilities of artificial intelligence (AI) systems make it ever more important that we interpret their internals to ensure that their intentions are aligned with human values. Yet there is reason to believe that misaligned artificial intelligence will have a convergent instrumental incentive to make its thoughts difficult for us to interpret. In this article, I discuss many ways that a capable AI might circumvent scalable interpretability methods and suggest a framework for thinking about these potential future risks.
[[2212.11870] Impossibility Theorems for Feature Attribution](http://arxiv.org/abs/2212.11870) #interpretability
Despite a sea of interpretability methods that can produce plausible explanations, the field has also empirically seen many failure cases of such methods. In light of these results, it remains unclear for practitioners how to use these methods and choose between them in a principled way. In this paper, we show that for even moderately rich model classes (easily satisfied by neural networks), any feature attribution method that is complete and linear--for example, Integrated Gradients and SHAP--can provably fail to improve on random guessing for inferring model behaviour. Our results apply to common end-tasks such as identifying local model behaviour, spurious feature identification, and algorithmic recourse. One takeaway from our work is the importance of concretely defining end-tasks. In particular, we show that once such an end-task is defined, a simple and direct approach of repeated model evaluations can outperform many other complex feature attribution methods.
[[2212.11565] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation](http://arxiv.org/abs/2212.11565) #diffusion
To reproduce the success of text-to-image (T2I) generation, recent works in text-to-video (T2V) generation employ large-scale text-video dataset for fine-tuning. However, such paradigm is computationally expensive. Humans have the amazing ability to learn new visual concepts from just one single exemplar. We hereby study a new T2V generation problem$\unicode{x2014}$One-Shot Video Generation, where only a single text-video pair is presented for training an open-domain T2V generator. Intuitively, we propose to adapt the T2I diffusion model pretrained on massive image data for T2V generation. We make two key observations: 1) T2I models are able to generate images that align well with the verb terms; 2) extending T2I models to generate multiple images concurrently exhibits surprisingly good content consistency. To further learn continuous motion, we propose Tune-A-Video with a tailored Sparse-Causal Attention, which generates videos from text prompts via an efficient one-shot tuning of pretrained T2I diffusion models. Tune-A-Video is capable of producing temporally-coherent videos over various applications such as change of subject or background, attribute editing, style transfer, demonstrating the versatility and effectiveness of our method.
[[2212.11972] Scalable Adaptive Computation for Iterative Generation](http://arxiv.org/abs/2212.11972) #diffusion
We present the Recurrent Interface Network (RIN), a neural net architecture that allocates computation adaptively to the input according to the distribution of information, allowing it to scale to iterative generation of high-dimensional data. Hidden units of RINs are partitioned into the interface, which is locally connected to inputs, and latents, which are decoupled from inputs and can exchange information globally. The RIN block selectively reads from the interface into latents for high-capacity processing, with incremental updates written back to the interface. Stacking multiple blocks enables effective routing across local and global levels. While routing adds overhead, the cost can be amortized in recurrent computation settings where inputs change gradually while more global context persists, such as iterative generation using diffusion models. To this end, we propose a latent self-conditioning technique that "warm-starts" the latents at each iteration of the generation process. When applied to diffusion models operating directly on pixels, RINs yield state-of-the-art image and video generation without cascades or guidance, while being domain-agnostic and up to 10$\times$ more efficient compared to specialized 2D and 3D U-Nets.
[[2212.11685] GENIE: Large Scale Pre-training for Text Generation with Diffusion Model](http://arxiv.org/abs/2212.11685) #diffusion
In this paper, we propose a large-scale language pre-training for text GENeration using dIffusion modEl, which is named GENIE. GENIE is a pre-training sequence-to-sequence text generation model which combines Transformer and diffusion. The diffusion model accepts the latent information from the encoder, which is used to guide the denoising of the current time step. After multiple such denoise iterations, the diffusion model can restore the Gaussian noise to the diverse output text which is controlled by the input text. Moreover, such architecture design also allows us to adopt large scale pre-training on the GENIE. We propose a novel pre-training method named continuous paragraph denoise based on the characteristics of the diffusion model. Extensive experiments on the XSum, CNN/DailyMail, and Gigaword benchmarks shows that GENIE can achieves comparable performance with various strong baselines, especially after pre-training, the generation quality of GENIE is greatly improved. We have also conduct a lot of experiments on the generation diversity and parameter impact of GENIE. The code for GENIE will be made publicly available.