[[2210.16367] LAKEE: A Lightweight Authenticated Key Exchange Protocol for Power Constrained Devices](http://arxiv.org/abs/2210.16367)
The rapid development of IoT networks has led to a research trend in designing effective security features for them. Due to the power-constrained nature of IoT devices, the security features should remain as lightweight as possible. Currently, most of the IoT network traffic is unencrypted. The leakage of smart devices' unencrypted data can come with the significant cost of a privacy breach. To have a secure channel with encrypted traffic, two endpoints in a network have to authenticate each other and calculate a short-term key. They can then communicate through an authenticated and secure channel. This process is referred to as authenticated key exchange (AKE). Although Datagram Transport Layer Security (DTLS) offers an AKE protocol for IoT networks, research has proposed more efficient and case-specific alternatives. This paper presents LAKEE, a straightforward, lightweight AKE protocol for IoT networks. Our protocol employs elliptic curve cryptography for generating a short-term session key. It reduces the communication and computational overhead of its alternatives while maintaining or improving their security strength. The simplicity and low overhead of our protocol make it a fit for a network of constrained devices.
[[2210.16453] Joint Sub-component Level Segmentation and Classification for Anomaly Detection within Dual-Energy X-Ray Security Imagery](http://arxiv.org/abs/2210.16453)
X-ray baggage security screening is in widespread use and crucial to maintaining transport security for threat/anomaly detection tasks. The automatic detection of anomaly, which is concealed within cluttered and complex electronics/electrical items, using 2D X-ray imagery is of primary interest in recent years. We address this task by introducing joint object sub-component level segmentation and classification strategy using deep Convolution Neural Network architecture. The performance is evaluated over a dataset of cluttered X-ray baggage security imagery, consisting of consumer electrical and electronics items using variants of dual-energy X-ray imagery (pseudo-colour, high, low, and effective-Z). The proposed joint sub-component level segmentation and classification approach achieve ~99% true positive and ~5% false positive for anomaly detection task.
[[2210.16719] Multi-view Multi-label Anomaly Network Traffic Classification based on MLP-Mixer Neural Network](http://arxiv.org/abs/2210.16719)
Network traffic classification is the basis of many network security applications and has attracted enough attention in the field of cyberspace security. Existing network traffic classification based on convolutional neural networks (CNNs) often emphasizes local patterns of traffic data while ignoring global information associations. In this paper, we propose a MLP-Mixer based multi-view multi-label neural network for network traffic classification. Compared with the existing CNN-based methods, our method adopts the MLP-Mixer structure, which is more in line with the structure of the packet than the conventional convolution operation. In our method, the packet is divided into the packet header and the packet body, together with the flow features of the packet as input from different views. We utilize a multi-label setting to learn different scenarios simultaneously to improve the classification performance by exploiting the correlations between different scenarios. Taking advantage of the above characteristics, we propose an end-to-end network traffic classification method. We conduct experiments on three public datasets, and the experimental results show that our method can achieve superior performance.
[[2210.16765] Benchmarking Adversarial Patch Against Aerial Detection](http://arxiv.org/abs/2210.16765)
DNNs are vulnerable to adversarial examples, which poses great security concerns for security-critical systems. In this paper, a novel adaptive-patch-based physical attack (AP-PA) framework is proposed, which aims to generate adversarial patches that are adaptive in both physical dynamics and varying scales, and by which the particular targets can be hidden from being detected. Furthermore, the adversarial patch is also gifted with attack effectiveness against all targets of the same class with a patch outside the target (No need to smear targeted objects) and robust enough in the physical world. In addition, a new loss is devised to consider more available information of detected objects to optimize the adversarial patch, which can significantly improve the patch's attack efficacy (Average precision drop up to 87.86% and 85.48% in white-box and black-box settings, respectively) and optimizing efficiency. We also establish one of the first comprehensive, coherent, and rigorous benchmarks to evaluate the attack efficacy of adversarial patches on aerial detection tasks. Finally, several proportionally scaled experiments are performed physically to demonstrate that the elaborated adversarial patches can successfully deceive aerial detection algorithms in dynamic physical circumstances. The code is available at https://github.com/JiaweiLian/AP-PA.
[[2210.16519] Security-Preserving Federated Learning via Byzantine-Sensitive Triplet Distance](http://arxiv.org/abs/2210.16519)
While being an effective framework of learning a shared model across multiple edge devices, federated learning (FL) is generally vulnerable to Byzantine attacks from adversarial edge devices. While existing works on FL mitigate such compromised devices by only aggregating a subset of the local models at the server side, they still cannot successfully ignore the outliers due to imprecise scoring rule. In this paper, we propose an effective Byzantine-robust FL framework, namely dummy contrastive aggregation, by defining a novel scoring function that sensitively discriminates whether the model has been poisoned or not. Key idea is to extract essential information from every local models along with the previous global model to define a distance measure in a manner similar to triplet loss. Numerical results validate the advantage of the proposed approach by showing improved performance as compared to the state-of-the-art Byzantine-resilient aggregation methods, e.g., Krum, Trimmed-mean, and Fang.
[[2210.16595] BEPHAP: A Blockchain-Based Efficient Privacy-Preserving Handover Authentication Protocol with Key Agreement for Internet of Vehicles](http://arxiv.org/abs/2210.16595)
The Internet of Vehicles (IoV) can significantly improve transportation efficiency and ensure traffic safety. Authentication is regarded as the fundamental defense line against attacks in IoV. However, the state-of-the-art approaches suffer from several drawbacks, including bottlenecks of the single cloud server model, high computational overhead of operations, excessive trust in cloud servers and roadside units (RSUs), and leakage of vehicle trajectory privacy. In this paper, BEPHAP, a Blockchain-based Efficient Privacy-preserving Handover Authentication Protocol with key agreement for internet of vehicles, is introduced to address these problems. BEPHAP achieves anonymous cross-domain mutual handover authentication with key agreement based on the tamper-proof blockchain, symmetric cryptography, and the chameleon hash function under a security model that cloud servers and RSUs may launch attacks. BEPHAP is particularly well suited for IoV since it allows vehicles only need to perform lightweight cryptographic operations during the authentication phase. BEPHAP also achieves data confidentiality, unlinkability, traceability, non-repudiation, non-frameability, and key escrow freeness. Formal verification based on ProVerif and formal security proofs based on the BAN logic indicates that BEPHAP is resistant to various typical attacks, such as man-in-the-middle attacks, impersonation attacks, and replay attacks. Performance analysis demonstrates that BEPHAP surpasses existing works in both computation and communication efficiencies. And the message loss rate remains 0 at 5000 requests per second, which meets the requirement of IoV.
[[2210.16346] Improving Hyperspectral Adversarial Robustness using Ensemble Networks in the Presences of Multiple Attacks](http://arxiv.org/abs/2210.16346)
Semantic segmentation of hyperspectral images (HSI) has seen great strides in recent years by incorporating knowledge from deep learning RGB classification models. Similar to their classification counterparts, semantic segmentation models are vulnerable to adversarial examples and need adversarial training to counteract them. Traditional approaches to adversarial robustness focus on training or retraining a single network on attacked data, however, in the presence of multiple attacks these approaches decrease the performance compared to networks trained individually on each attack. To combat this issue we propose an Adversarial Discriminator Ensemble Network (ADE-Net) which focuses on attack type detection and adversarial robustness under a unified model to preserve per data-type weight optimally while robustifiying the overall network. In the proposed method, a discriminator network is used to separate data by attack type into their specific attack-expert ensemble network. Our approach allows for the presence of multiple attacks mixed together while also labeling attack types during testing. We experimentally show that ADE-Net outperforms the baseline, which is a single network adversarially trained under a mix of multiple attacks, for HSI Indian Pines, Kennedy Space, and Houston datasets.
[[2210.16606] Neural Combinatorial Logic Circuit Synthesis from Input-Output Examples](http://arxiv.org/abs/2210.16606)
We propose a novel, fully explainable neural approach to synthesis of combinatorial logic circuits from input-output examples. The carrying advantage of our method is that it readily extends to inductive scenarios, where the set of examples is incomplete but still indicative of the desired behaviour. Our method can be employed for a virtually arbitrary choice of atoms - from logic gates to FPGA blocks - as long as they can be formulated in a differentiable fashion, and consistently yields good results for synthesis of practical circuits of increasing size. In particular, we succeed in learning a number of arithmetic, bitwise, and signal-routing operations, and even generalise towards the correct behaviour in inductive scenarios. Our method, attacking a discrete logical synthesis problem with an explainable neural approach, hints at a wider promise for synthesis and reasoning-related tasks.
[[2210.16371] Distributed Black-box Attack against Image Classification Cloud Services](http://arxiv.org/abs/2210.16371)
Black-box adversarial attacks can fool image classifiers into misclassifying images without requiring access to model structure and weights. Recently proposed black-box attacks can achieve a success rate of more than 95\% after less than 1,000 queries. The question then arises of whether black-box attacks have become a real threat against IoT devices that rely on cloud APIs to achieve image classification. To shed some light on this, note that prior research has primarily focused on increasing the success rate and reducing the number of required queries. However, another crucial factor for black-box attacks against cloud APIs is the time required to perform the attack. This paper applies black-box attacks directly to cloud APIs rather than to local models, thereby avoiding multiple mistakes made in prior research. Further, we exploit load balancing to enable distributed black-box attacks that can reduce the attack time by a factor of about five for both local search and gradient estimation methods.
[[2210.16682] Robust Distributed Learning Against Both Distributional Shifts and Byzantine Attacks](http://arxiv.org/abs/2210.16682)
In distributed learning systems, robustness issues may arise from two sources. On one hand, due to distributional shifts between training data and test data, the trained model could exhibit poor out-of-sample performance. On the other hand, a portion of working nodes might be subject to byzantine attacks which could invalidate the learning result. Existing works mostly deal with these two issues separately. In this paper, we propose a new algorithm that equips distributed learning with robustness measures against both distributional shifts and byzantine attacks. Our algorithm is built on recent advances in distributionally robust optimization as well as norm-based screening (NBS), a robust aggregation scheme against byzantine attacks. We provide convergence proofs in three cases of the learning model being nonconvex, convex, and strongly convex for the proposed algorithm, shedding light on its convergence behaviors and endurability against byzantine attacks. In particular, we deduce that any algorithm employing NBS (including ours) cannot converge when the percentage of byzantine nodes is 1/3 or higher, instead of 1/2, which is the common belief in current literature. The experimental results demonstrate the effectiveness of our algorithm against both robustness issues. To the best of our knowledge, this is the first work to address distributional shifts and byzantine attacks simultaneously.
[[2210.16451] Robust Boosting Forests with Richer Deep Feature Hierarchy](http://arxiv.org/abs/2210.16451)
We propose a robust variant of boosting forest to the various adversarial defense methods, and apply it to enhance the robustness of the deep neural network. We retain the deep network architecture, weights, and middle layer features, then install gradient boosting forest to select the features from each layer of the deep network, and predict the target. For training each decision tree, we propose a novel conservative and greedy trade-off, with consideration for less misprediction instead of pure gain functions, therefore being suboptimal and conservative. We actively increase tree depth to remedy the accuracy with splits in more features, being more greedy in growing tree depth. We propose a new task on 3D face model, whose robustness has not been carefully studied, despite the great security and privacy concerns related to face analytics. We tried a simple attack method on a pure convolutional neural network (CNN) face shape estimator, making it degenerate to only output average face shape with invisible perturbation. Our conservative-greedy boosting forest (CGBF) on face landmark datasets showed a great improvement over original pure deep learning methods under the adversarial attacks.
[[2210.16467] ImplantFormer: Vision Transformer based Implant Position Regression Using Dental CBCT Data](http://arxiv.org/abs/2210.16467)
Implant prosthesis is the most optimum treatment of dentition defect or dentition loss, which usually involves a surgical guide design process to decide the position of implant. However, such design heavily relies on the subjective experiences of dentist. To relieve this problem, in this paper, a transformer based Implant Position Regression Network, ImplantFormer, is proposed to automatically predict the implant position based on the oral CBCT data. The 3D CBCT data is firstly transformed into a series of 2D transverse plane slice views. ImplantFormer is then proposed to predict the position of implant based on the 2D slices of crown images. Convolutional stem and decoder are designed to coarsely extract image feature before the operation of patch embedding and integrate multi-levels feature map for robust prediction. The predictions of our network at tooth crown area are finally projected back to the positions at tooth root. As both long-range relationship and local features are involved, our approach can better represent global information and achieves better location performance than the state-of-the-art detectors. Experimental results on a dataset of 128 patients, collected from Shenzhen University General Hospital, show that our ImplantFormer achieves superior performance than benchmarks.
[[2210.16646] Breaking the Symmetry: Resolving Symmetry Ambiguities in Equivariant Neural Networks](http://arxiv.org/abs/2210.16646)
Equivariant networks have been adopted in many 3-D learning areas. Here we identify a fundamental limitation of these networks: their ambiguity to symmetries. Equivariant networks cannot complete symmetry-dependent tasks like segmenting a left-right symmetric object into its left and right sides. We tackle this problem by adding components that resolve symmetry ambiguities while preserving rotational equivariance. We present OAVNN: Orientation Aware Vector Neuron Network, an extension of the Vector Neuron Network. OAVNN is a rotation equivariant network that is robust to planar symmetric inputs. Our network consists of three key components. 1) We introduce an algorithm to calculate symmetry detecting features. 2) We create a symmetry-sensitive orientation aware linear layer. 3) We construct an attention mechanism that relates directional information across points. We evaluate the network using left-right segmentation and find that the network quickly obtains accurate segmentations. We hope this work motivates investigations on the expressivity of equivariant networks on symmetric objects.
[[2210.16847] 1st Place Solutions for UG2+ Challenge 2022 ATMOSPHERIC TURBULENCE MITIGATION](http://arxiv.org/abs/2210.16847)
In this technical report, we briefly introduce the solution of our team ''summer'' for Atomospheric Turbulence Mitigation in UG$^2$+ Challenge in CVPR
[[2210.16870] A simple, efficient and scalable contrastive masked autoencoder for learning visual representations](http://arxiv.org/abs/2210.16870)
We introduce CAN, a simple, efficient and scalable method for self-supervised learning of visual representations. Our framework is a minimal and conceptually clean synthesis of (C) contrastive learning, (A) masked autoencoders, and (N) the noise prediction approach used in diffusion models. The learning mechanisms are complementary to one another: contrastive learning shapes the embedding space across a batch of image samples; masked autoencoders focus on reconstruction of the low-frequency spatial correlations in a single image sample; and noise prediction encourages the reconstruction of the high-frequency components of an image. The combined approach results in a robust, scalable and simple-to-implement algorithm. The training process is symmetric, with 50% of patches in both views being masked at random, yielding a considerable efficiency improvement over prior contrastive learning methods. Extensive empirical studies demonstrate that CAN achieves strong downstream performance under both linear and finetuning evaluations on transfer learning and robustness tasks. CAN outperforms MAE and SimCLR when pre-training on ImageNet, but is especially useful for pre-training on larger uncurated datasets such as JFT-300M: for linear probe on ImageNet, CAN achieves 75.4% compared to 73.4% for SimCLR and 64.1% for MAE. The finetuned performance on ImageNet of our ViT-L model is 86.1%, compared to 85.5% for SimCLR, and 85.4% for MAE. The overall FLOPs load of SimCLR is 70% higher than CAN for ViT-L models.
[[2210.16900] High Resolution Multi-Scale RAFT (Robust Vision Challenge 2022)](http://arxiv.org/abs/2210.16900)
In this report, we present our optical flow approach, MS-RAFT+, that won the Robust Vision Challenge 2022. It is based on the MS-RAFT method, which successfully integrates several multi-scale concepts into single-scale RAFT. Our approach extends this method by exploiting an additional finer scale for estimating the flow, which is made feasible by on-demand cost computation. This way, it can not only operate at half the original resolution, but also use MS-RAFT's shared convex upsampler to obtain full resolution flow. Moreover, our approach relies on an adjusted fine-tuning scheme during training. This in turn aims at improving the generalization across benchmarks. Among all participating methods in the Robust Vision Challenge, our approach ranks first on VIPER and second on KITTI, Sintel, and Middlebury, resulting in the first place of the overall ranking.
[[2210.16422] Toward Unifying Text Segmentation and Long Document Summarization](http://arxiv.org/abs/2210.16422)
Text segmentation is important for signaling a document's structure. Without segmenting a long document into topically coherent sections, it is difficult for readers to comprehend the text, let alone find important information. The problem is only exacerbated by a lack of segmentation in transcripts of audio/video recordings. In this paper, we explore the role that section segmentation plays in extractive summarization of written and spoken documents. Our approach learns robust sentence representations by performing summarization and segmentation simultaneously, which is further enhanced by an optimization-based regularizer to promote selection of diverse summary sentences. We conduct experiments on multiple datasets ranging from scientific articles to spoken transcripts to evaluate the model's performance. Our findings suggest that the model can not only achieve state-of-the-art performance on publicly available benchmarks, but demonstrate better cross-genre transferability when equipped with text segmentation. We perform a series of analyses to quantify the impact of section segmentation on summarizing written and spoken documents of substantial length and complexity.
[[2210.16732] How Far are We from Robust Long Abstractive Summarization?](http://arxiv.org/abs/2210.16732)
Abstractive summarization has made tremendous progress in recent years. In this work, we perform fine-grained human annotations to evaluate long document abstractive summarization systems (i.e., models and metrics) with the aim of implementing them to generate reliable summaries. For long document abstractive models, we show that the constant strive for state-of-the-art ROUGE results can lead us to generate more relevant summaries but not factual ones. For long document evaluation metrics, human evaluation results show that ROUGE remains the best at evaluating the relevancy of a summary. It also reveals important limitations of factuality metrics in detecting different types of factual errors and the reasons behind the effectiveness of BARTScore. We then suggest promising directions in the endeavor of developing factual consistency metrics. Finally, we release our annotated long document dataset with the hope that it can contribute to the development of metrics across a broader range of summarization settings.
[[2210.16865] Learning to Decompose: Hypothetical Question Decomposition Based on Comparable Texts](http://arxiv.org/abs/2210.16865)
Explicit decomposition modeling, which involves breaking down complex tasks into more straightforward and often more interpretable sub-tasks, has long been a central theme in developing robust and interpretable NLU systems. However, despite the many datasets and resources built as part of this effort, the majority have small-scale annotations and limited scope, which is insufficient to solve general decomposition tasks. In this paper, we look at large-scale intermediate pre-training of decomposition-based transformers using distant supervision from comparable texts, particularly large-scale parallel news. We show that with such intermediate pre-training, developing robust decomposition-based models for a diverse range of tasks becomes more feasible. For example, on semantic parsing, our model, DecompT5, improves 20% to 30% on two datasets, Overnight and TORQUE, over the baseline language model. We further use DecompT5 to build a novel decomposition-based QA system named DecompEntail, improving over state-of-the-art models, including GPT-3, on both HotpotQA and StrategyQA by 8% and 4%, respectively.
[[2210.16906] DyG2Vec: Representation Learning for Dynamic Graphs with Self-Supervision](http://arxiv.org/abs/2210.16906)
The challenge in learning from dynamic graphs for predictive tasks lies in extracting fine-grained temporal motifs from an ever-evolving graph. Moreover, task labels are often scarce, costly to obtain, and highly imbalanced for large dynamic graphs. Recent advances in self-supervised learning on graphs demonstrate great potential, but focus on static graphs. State-of-the-art (SoTA) models for dynamic graphs are not only incompatible with the self-supervised learning (SSL) paradigm but also fail to forecast interactions beyond the very near future. To address these limitations, we present DyG2Vec, an SSL-compatible, efficient model for representation learning on dynamic graphs. DyG2Vec uses a window-based mechanism to generate task-agnostic node embeddings that can be used to forecast future interactions. DyG2Vec significantly outperforms SoTA baselines on benchmark datasets for downstream tasks while only requiring a fraction of the training/inference time. We adapt two SSL evaluation mechanisms to make them applicable to dynamic graphs and thus show that SSL pre-training helps learn more robust temporal node representations, especially for scenarios with few labels.
[[2210.16365] Elastic Weight Consolidation Improves the Robustness of Self-Supervised Learning Methods under Transfer](http://arxiv.org/abs/2210.16365)
Self-supervised representation learning (SSL) methods provide an effective label-free initial condition for fine-tuning downstream tasks. However, in numerous realistic scenarios, the downstream task might be biased with respect to the target label distribution. This in turn moves the learned fine-tuned model posterior away from the initial (label) bias-free self-supervised model posterior. In this work, we re-interpret SSL fine-tuning under the lens of Bayesian continual learning and consider regularization through the Elastic Weight Consolidation (EWC) framework. We demonstrate that self-regularization against an initial SSL backbone improves worst sub-group performance in Waterbirds by 5% and Celeb-A by 2% when using the ViT-B/16 architecture. Furthermore, to help simplify the use of EWC with SSL, we pre-compute and publicly release the Fisher Information Matrix (FIM), evaluated with 10,000 ImageNet-1K variates evaluated on large modern SSL architectures including ViT-B/16 and ResNet50 trained with DINO.
[[2210.16400] Flatter, faster: scaling momentum for optimal speedup of SGD](http://arxiv.org/abs/2210.16400)
Commonly used optimization algorithms often show a trade-off between good generalization and fast training times. For instance, stochastic gradient descent (SGD) tends to have good generalization; however, adaptive gradient methods have superior training times. Momentum can help accelerate training with SGD, but so far there has been no principled way to select the momentum hyperparameter. Here we study implicit bias arising from the interplay between SGD with label noise and momentum in the training of overparametrized neural networks. We find that scaling the momentum hyperparameter $1-\beta$ with the learning rate to the power of $2/3$ maximally accelerates training, without sacrificing generalization. To analytically derive this result we develop an architecture-independent framework, where the main assumption is the existence of a degenerate manifold of global minimizers, as is natural in overparametrized models. Training dynamics display the emergence of two characteristic timescales that are well-separated for generic values of the hyperparameters. The maximum acceleration of training is reached when these two timescales meet, which in turn determines the scaling limit we propose. We perform experiments, including matrix sensing and ResNet on CIFAR10, which provide evidence for the robustness of these results.
[[2210.16401] The Fisher-Rao Loss for Learning under Label Noise](http://arxiv.org/abs/2210.16401)
Choosing a suitable loss function is essential when learning by empirical risk minimisation. In many practical cases, the datasets used for training a classifier may contain incorrect labels, which prompts the interest for using loss functions that are inherently robust to label noise. In this paper, we study the Fisher-Rao loss function, which emerges from the Fisher-Rao distance in the statistical manifold of discrete distributions. We derive an upper bound for the performance degradation in the presence of label noise, and analyse the learning speed of this loss. Comparing with other commonly used losses, we argue that the Fisher-Rao loss provides a natural trade-off between robustness and training dynamics. Numerical experiments with synthetic and MNIST datasets illustrate this performance.
[[2210.16892] Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training](http://arxiv.org/abs/2210.16892)
Training state-of-the-art ASR systems such as RNN-T often has a high associated financial and environmental cost. Training with a subset of training data could mitigate this problem if the subset selected could achieve on-par performance with training with the entire dataset. Although there are many data subset selection(DSS) algorithms, direct application to the RNN-T is difficult, especially the DSS algorithms that are adaptive and use learning dynamics such as gradients, as RNN-T tend to have gradients with a significantly larger memory footprint. In this paper, we propose Partitioned Gradient Matching (PGM) a novel distributable DSS algorithm, suitable for massive datasets like those used to train RNN-T. Through extensive experiments on Librispeech 100H and Librispeech 960H, we show that PGM achieves between 3x to 6x speedup with only a very small accuracy degradation (under 1% absolute WER difference). In addition, we demonstrate similar results for PGM even in settings where the training data is corrupted with noise.
[[2210.16728] ISG: I can See Your Gene Expression](http://arxiv.org/abs/2210.16728)
This paper aims to predict gene expression from a histology slide image precisely. Such a slide image has a large resolution and sparsely distributed textures. These obstruct extracting and interpreting discriminative features from the slide image for diverse gene types prediction. Existing gene expression methods mainly use general components to filter textureless regions, extract features, and aggregate features uniformly across regions. However, they ignore gaps and interactions between different image regions and are therefore inferior in the gene expression task. Instead, we present ISG framework that harnesses interactions among discriminative features from texture-abundant regions by three new modules: 1) a Shannon Selection module, based on the Shannon information content and Solomonoff's theory, to filter out textureless image regions; 2) a Feature Extraction network to extract expressive low-dimensional feature representations for efficient region interactions among a high-resolution image; 3) a Dual Attention network attends to regions with desired gene expression features and aggregates them for the prediction task. Extensive experiments on standard benchmark datasets show that the proposed ISG framework outperforms state-of-the-art methods significantly.
[[2210.16829] Self-Regularized Prototypical Network for Few-Shot Semantic Segmentation](http://arxiv.org/abs/2210.16829)
The deep CNNs in image semantic segmentation typically require a large number of densely-annotated images for training and have difficulties in generalizing to unseen object categories. Therefore, few-shot segmentation has been developed to perform segmentation with just a few annotated examples. In this work, we tackle the few-shot segmentation using a self-regularized prototypical network (SRPNet) based on prototype extraction for better utilization of the support information. The proposed SRPNet extracts class-specific prototype representations from support images and generates segmentation masks for query images by a distance metric - the fidelity. A direct yet effective prototype regularization on support set is proposed in SRPNet, in which the generated prototypes are evaluated and regularized on the support set itself. The extent to which the generated prototypes restore the support mask imposes an upper limit on performance. The performance on the query set should never exceed the upper limit no matter how complete the knowledge is generalized from support set to query set. With the specific prototype regularization, SRPNet fully exploits knowledge from the support and offers high-quality prototypes that are representative for each semantic class and meanwhile discriminative for different classes. The query performance is further improved by an iterative query inference (IQI) module that combines a set of regularized prototypes. Our proposed SRPNet achieves new state-of-art performance on 1-shot and 5-shot segmentation benchmarks.
[[2210.16914] FatNet: High Resolution Kernels for Classification Using Fully Convolutional Optical Neural Networks](http://arxiv.org/abs/2210.16914)
This paper describes the transformation of a traditional in-silico classification network into an optical fully convolutional neural network with high-resolution feature maps and kernels. When using the free-space 4f system to accelerate the inference speed of neural networks, higher resolutions of feature maps and kernels can be used without the loss in frame rate. We present FatNet for the classification of images, which is more compatible with free-space acceleration than standard convolutional classifiers. It neglects the standard combination of convolutional feature extraction and classifier dense layers by performing both in one fully convolutional network. This approach takes full advantage of the parallelism in the 4f free-space system and performs fewer conversions between electronics and optics by reducing the number of channels and increasing the resolution, making the network faster in optics than off-the-shelf networks. To demonstrate the capabilities of FatNet, it trained with the CIFAR100 dataset on GPU and the simulator of the 4f system, then compared the results against ResNet-18. The results show 8.2 times fewer convolution operations at the cost of only 6% lower accuracy compared to the original network. These are promising results for the approach of training deep learning with high-resolution kernels in the direction towards the upcoming optics era.
[[2210.16391] Radically Lower Data-Labeling Costs for Visually Rich Document Extraction Models](http://arxiv.org/abs/2210.16391)
A key bottleneck in building automatic extraction models for visually rich documents like invoices is the cost of acquiring the several thousand high-quality labeled documents that are needed to train a model with acceptable accuracy. We propose Selective Labeling to simplify the labeling task to provide "yes/no" labels for candidate extractions predicted by a model trained on partially labeled documents. We combine this with a custom active learning strategy to find the predictions that the model is most uncertain about. We show through experiments on document types drawn from 3 different domains that selective labeling can reduce the cost of acquiring labeled data by $10\times$ with a negligible loss in accuracy.
[[2210.16541] Entity-centered Cross-document Relation Extraction](http://arxiv.org/abs/2210.16541)
Relation Extraction (RE) is a fundamental task of information extraction, which has attracted a large amount of research attention. Previous studies focus on extracting the relations within a sentence or document, while currently researchers begin to explore cross-document RE. However, current cross-document RE methods directly utilize text snippets surrounding target entities in multiple given documents, which brings considerable noisy and non-relevant sentences. Moreover, they utilize all the text paths in a document bag in a coarse-grained way, without considering the connections between these text paths.In this paper, we aim to address both of these shortages and push the state-of-the-art for cross-document RE. First, we focus on input construction for our RE model and propose an entity-based document-context filter to retain useful information in the given documents by using the bridge entities in the text paths. Second, we propose a cross-document RE model based on cross-path entity relation attention, which allow the entity relations across text paths to interact with each other. We compare our cross-document RE method with the state-of-the-art methods in the dataset CodRED. Our method outperforms them by at least 10% in F1, thus demonstrating its effectiveness.
[[2210.16424] Machine Unlearning of Federated Clusters](http://arxiv.org/abs/2210.16424)
Federated clustering is an unsupervised learning problem that arises in a number of practical applications, including personalized recommender and healthcare systems. With the adoption of recent laws ensuring the "right to be forgotten", the problem of machine unlearning for federated clustering methods has become of significant importance. This work proposes the first known unlearning mechanism for federated clustering with privacy criteria that support simple, provable, and efficient data removal at the client and server level. The gist of our approach is to combine special initialization procedures with quantization methods that allow for secure aggregation of estimated local cluster counts at the server unit. As part of our platform, we introduce secure compressed multiset aggregation (SCMA), which is of independent interest for secure sparse model aggregation. In order to simultaneously facilitate low communication complexity and secret sharing protocols, we integrate Reed-Solomon encoding with special evaluation points into the new SCMA pipeline and derive bounds on the time and communication complexity of different components of the scheme. Compared to completely retraining K-means++ locally and globally for each removal request, we obtain an average speed-up of roughly 84x across seven datasets, two of which contain biological and medical information that is subject to frequent unlearning requests.
[[2210.16441] GowFed -- A novel Federated Network Intrusion Detection System](http://arxiv.org/abs/2210.16441)
Network intrusion detection systems are evolving into intelligent systems that perform data analysis while searching for anomalies in their environment. Indeed, the development of deep learning techniques paved the way to build more complex and effective threat detection models. However, training those models may be computationally infeasible in most Edge or IoT devices. Current approaches rely on powerful centralized servers that receive data from all their parties -- violating basic privacy constraints and substantially affecting response times and operational costs due to the huge communication overheads. To mitigate these issues, Federated Learning emerged as a promising approach, where different agents collaboratively train a shared model, without exposing training data to others or requiring a compute-intensive centralized infrastructure. This work presents GowFed, a novel network threat detection system that combines the usage of Gower Dissimilarity matrices and Federated averaging. Different approaches of GowFed have been developed based on state-of the-art knowledge: (1) a vanilla version; and (2) a version instrumented with an attention mechanism. Furthermore, each variant has been tested using simulation oriented tools provided by TensorFlow Federated framework. In the same way, a centralized analogous development of the Federated systems is carried out to explore their differences in terms of scalability and performance -- across a set of designed experiments/scenarios. Overall, GowFed intends to be the first stepping stone towards the combined usage of Federated Learning and Gower Dissimilarity matrices to detect network threats in industrial-level networks.
[[2210.16520] Fast-Convergent Federated Learning via Cyclic Aggregation](http://arxiv.org/abs/2210.16520)
Federated learning (FL) aims at optimizing a shared global model over multiple edge devices without transmitting (private) data to the central server. While it is theoretically well-known that FL yields an optimal model -- centrally trained model assuming availability of all the edge device data at the central server -- under mild condition, in practice, it often requires massive amount of iterations until convergence, especially under presence of statistical/computational heterogeneity. This paper utilizes cyclic learning rate at the server side to reduce the number of training iterations with increased performance without any additional computational costs for both the server and the edge devices. Numerical results validate that, simply plugging-in the proposed cyclic aggregation to the existing FL algorithms effectively reduces the number of training iterations with improved performance.
[[2210.16524] Federated clustering with GAN-based data synthesis](http://arxiv.org/abs/2210.16524)
Federated clustering is an adaptation of centralized clustering in the federated settings, which aims to cluster data based on a global similarity measure while keeping all data local. The key here is how to construct a global similarity measure without sharing private data. To handle this, k-FED and federated fuzzy c-means (FFCM) respectively adapted K-means and fuzzy c-means to the federated learning settings, which aim to construct $K$ global cluster centroids by running K-means on a set of all local cluster centroids. However, the constructed global cluster centroids may be fragile and be sensitive to different non-independent and identically distributed (Non-IID) levels among clients. To handle this, we propose a simple but effective federated clustering framework with GAN-based data synthesis, which is called synthetic data aided federated clustering (SDA-FC). It outperforms k-FED and FFCM in terms of effectiveness and robustness, requires only one communication round, can run asynchronously, and can handle device failures. Moreover, although NMI is a far more commonly used metric than Kappa, empirical results indicate that Kappa is a more reliable one.
[[2210.16656] Auxo: Heterogeneity-Mitigating Federated Learning via Scalable Client Clustering](http://arxiv.org/abs/2210.16656)
Federated learning (FL) is an emerging machine learning (ML) paradigm that enables heterogeneous edge devices to collaboratively train ML models without revealing their raw data to a logically centralized server. Heterogeneity across participants is a fundamental challenge in FL, both in terms of non-independent and identically distributed (Non-IID) data distributions and variations in device capabilities. Many existing works present point solutions to address issues like slow convergence, low final accuracy, and bias in FL, all stemming from the client heterogeneity. We observe that, in a large population, there exist groups of clients with statistically similar data distributions (cohorts). In this paper, we propose Auxo to gradually identify cohorts among large-scale, low-participation, and resource-constrained FL populations. Auxo then adaptively determines how to train cohort-specific models in order to achieve better model performance and ensure resource efficiency. By identifying cohorts with smaller heterogeneity and performing efficient cohort-based training, our extensive evaluations show that Auxo substantially boosts the state-of-the-art solutions in terms of final accuracy, convergence time, and model bias.
[[2210.16790] One Gradient Frank-Wolfe for Decentralized Online Convex and Submodular Optimization](http://arxiv.org/abs/2210.16790)
Decentralized learning has been studied intensively in recent years motivated by its wide applications in the context of federated learning. The majority of previous research focuses on the offline setting in which the objective function is static. However, the offline setting becomes unrealistic in numerous machine learning applications that witness the change of massive data. In this paper, we propose \emph{decentralized online} algorithm for convex and continuous DR-submodular optimization, two classes of functions that are present in a variety of machine learning problems. Our algorithms achieve performance guarantees comparable to those in the centralized offline setting. Moreover, on average, each participant performs only a \emph{single} gradient computation per time step. Subsequently, we extend our algorithms to the bandit setting. Finally, we illustrate the competitive performance of our algorithms in real-world experiments.
[[2210.16754] Mitigating Unfairness via Evolutionary Multi-objective Ensemble Learning](http://arxiv.org/abs/2210.16754)
In the literature of mitigating unfairness in machine learning, many fairness measures are designed to evaluate predictions of learning models and also utilised to guide the training of fair models. It has been theoretically and empirically shown that there exist conflicts and inconsistencies among accuracy and multiple fairness measures. Optimising one or several fairness measures may sacrifice or deteriorate other measures. Two key questions should be considered, how to simultaneously optimise accuracy and multiple fairness measures, and how to optimise all the considered fairness measures more effectively. In this paper, we view the mitigating unfairness problem as a multi-objective learning problem considering the conflicts among fairness measures. A multi-objective evolutionary learning framework is used to simultaneously optimise several metrics (including accuracy and multiple fairness measures) of machine learning models. Then, ensembles are constructed based on the learning models in order to automatically balance different metrics. Empirical results on eight well-known datasets demonstrate that compared with the state-of-the-art approaches for mitigating unfairness, our proposed algorithm can provide decision-makers with better tradeoffs among accuracy and multiple fairness metrics. Furthermore, the high-quality models generated by the framework can be used to construct an ensemble to automatically achieve a better tradeoff among all the considered fairness metrics than other ensemble methods. Our code is publicly available at https://github.com/qingquan63/FairEMOL
[[2210.16435] Scalable Spectral Clustering with Group Fairness Constraints](http://arxiv.org/abs/2210.16435)
There are synergies of research interests and industrial efforts in modeling fairness and correcting algorithmic bias in machine learning. In this paper, we present a scalable algorithm for spectral clustering (SC) with group fairness constraints. Group fairness is also known as statistical parity where in each cluster, each protected group is represented with the same proportion as in the entirety. While FairSC algorithm (Kleindessner et al., 2019) is able to find the fairer clustering, it is compromised by high costs due to the kernels of computing nullspaces and the square roots of dense matrices explicitly. We present a new formulation of underlying spectral computation by incorporating nullspace projection and Hotelling's deflation such that the resulting algorithm, called s-FairSC, only involves the sparse matrix-vector products and is able to fully exploit the sparsity of the fair SC model. The experimental results on the modified stochastic block model demonstrate that s-FairSC is comparable with FairSC in recovering fair clustering. Meanwhile, it is sped up by a factor of 12 for moderate model sizes. s-FairSC is further demonstrated to be scalable in the sense that the computational costs of s-FairSC only increase marginally compared to the SC without fairness constraints.