secure

Title: Ping-Pong Swaps. (arXiv:2211.13335v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13335
Code URL: null
Copy Paste: [[2211.13335] Ping-Pong Swaps](http://arxiv.org/abs/2211.13335) #secure
Summary:
We propose Ping-Pong Swaps: A secure pure peer-to-peer crosschain swap mechanism of tokens or cryptocurrencies that does not require escrow nor an intermediate trusted third party. The only technical requirement is to be able to open unidirectional payment channels in both blockchain protocols. This allows anonymous cryptocurrency trading without the need of a centralized exchange, nor DEX's in DeFi platforms, nor multisignature escrow systems with penalties. Direct peer-to-peer crosschain swaps can be performed without a bridge platform. This enables the creation of a global peer-to-peer market of pairs of tokens or cryptocurrencies. Ping-pong swaps with fiat currency is possible if banks incorporate simple payment channel functionalities. Some inmediate applications are simple and fast rebalancing of Lightning Network channels, and wrapping tokens in smartchains.

Title: Fast and Efficient Malware Detection with Joint Static and Dynamic Features Through Transfer Learning. (arXiv:2211.13860v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13860
Code URL: null
Copy Paste: [[2211.13860] Fast and Efficient Malware Detection with Joint Static and Dynamic Features Through Transfer Learning](http://arxiv.org/abs/2211.13860) #secure
Summary:
In malware detection, dynamic analysis extracts the runtime behavior of malware samples in a controlled environment and static analysis extracts features using reverse engineering tools. While the former faces the challenges of anti-virtualization and evasive behavior of malware samples, the latter faces the challenges of code obfuscation. To tackle these drawbacks, prior works proposed to develop detection models by aggregating dynamic and static features, thus leveraging the advantages of both approaches. However, simply concatenating dynamic and static features raises an issue of imbalanced contribution due to the heterogeneous dimensions of feature vectors to the performance of malware detection models. Yet, dynamic analysis is a time-consuming task and requires a secure environment, leading to detection delays and high costs for maintaining the analysis infrastructure. In this paper, we first introduce a method of constructing aggregated features via concatenating latent features learned through deep learning with equally-contributed dimensions. We then develop a knowledge distillation technique to transfer knowledge learned from aggregated features by a teacher model to a student model trained only on static features and use the trained student model for the detection of new malware samples. We carry out extensive experiments with a dataset of 86709 samples including both benign and malware samples. The experimental results show that the teacher model trained on aggregated features constructed by our method outperforms the state-of-the-art models with an improvement of up to 2.38% in detection accuracy. The distilled student model not only achieves high performance (97.81% in terms of accuracy) as that of the teacher model but also significantly reduces the detection time (from 70046.6 ms to 194.9 ms) without requiring dynamic analysis.

security

Title: Corn Yield Prediction based on Remotely Sensed Variables Using Variational Autoencoder and Multiple Instance Regression. (arXiv:2211.13286v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13286
Code URL: null
Copy Paste: [[2211.13286] Corn Yield Prediction based on Remotely Sensed Variables Using Variational Autoencoder and Multiple Instance Regression](http://arxiv.org/abs/2211.13286) #security
Summary:
In the U.S., corn is the most produced crop and has been an essential part of the American diet. To meet the demand for supply chain management and regional food security, accurate and timely large-scale corn yield prediction is attracting more attention in precision agriculture. Recently, remote sensing technology and machine learning methods have been widely explored for crop yield prediction. Currently, most county-level yield prediction models use county-level mean variables for prediction, ignoring much detailed information. Moreover, inconsistent spatial resolution between crop area and satellite sensors results in mixed pixels, which may decrease the prediction accuracy. Only a few works have addressed the mixed pixels problem in large-scale crop yield prediction. To address the information loss and mixed pixels problem, we developed a variational autoencoder (VAE) based multiple instance regression (MIR) model for large-scaled corn yield prediction. We use all unlabeled data to train a VAE and the well-trained VAE for anomaly detection. As a preprocess method, anomaly detection can help MIR find a better representation of every bag than traditional MIR methods, thus better performing in large-scale corn yield prediction. Our experiments showed that variational autoencoder based multiple instance regression (VAEMIR) outperformed all baseline methods in large-scale corn yield prediction. Though a suitable meta parameter is required, VAEMIR shows excellent potential in feature learning and extraction for large-scale corn yield prediction.

Title: Detecting Anomalies using Generative Adversarial Networks on Images. (arXiv:2211.13808v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13808
Code URL: null
Copy Paste: [[2211.13808] Detecting Anomalies using Generative Adversarial Networks on Images](http://arxiv.org/abs/2211.13808) #security
Summary:
Automatic detection of anomalies such as weapons or threat objects in baggage security, or detecting impaired items in industrial production is an important computer vision task demanding high efficiency and accuracy. Most of the available data in the anomaly detection task is imbalanced as the number of positive/anomalous instances is sparse. Inadequate availability of the data makes training of a deep neural network architecture for anomaly detection challenging. This paper proposes a novel Generative Adversarial Network (GAN) based model for anomaly detection. It uses normal (non-anomalous) images to learn about the normality based on which it detects if an input image contains an anomalous/threat object. The proposed model uses a generator with an encoder-decoder network having dense convolutional skip connections for enhanced reconstruction and to capture the data distribution. A self-attention augmented discriminator is used having the ability to check the consistency of detailed features even in distant portions. We use spectral normalisation to facilitate stable and improved training of the GAN. Experiments are performed on three datasets, viz. CIFAR-10, MVTec AD (for industrial applications) and SIXray (for X-ray baggage security). On the MVTec AD and SIXray datasets, our model achieves an improvement of upto 21% and 4.6%, respectively

Title: Principled Data-Driven Decision Support for Cyber-Forensic Investigations. (arXiv:2211.13345v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13345
Code URL: null
Copy Paste: [[2211.13345] Principled Data-Driven Decision Support for Cyber-Forensic Investigations](http://arxiv.org/abs/2211.13345) #security
Summary:
In the wake of a cybersecurity incident, it is crucial to promptly discover how the threat actors breached security in order to assess the impact of the incident and to develop and deploy countermeasures that can protect against further attacks. To this end, defenders can launch a cyber-forensic investigation, which discovers the techniques that the threat actors used in the incident. A fundamental challenge in such an investigation is prioritizing the investigation of particular techniques since the investigation of each technique requires time and effort, but forensic analysts cannot know which ones were actually used before investigating them. To ensure prompt discovery, it is imperative to provide decision support that can help forensic analysts with this prioritization. A recent study demonstrated that data-driven decision support, based on a dataset of prior incidents, can provide state-of-the-art prioritization. However, this data-driven approach, called DISCLOSE, is based on a heuristic that utilizes only a subset of the available information and does not approximate optimal decisions. To improve upon this heuristic, we introduce a principled approach for data-driven decision support for cyber-forensic investigations. We formulate the decision-support problem using a Markov decision process, whose states represent the states of a forensic investigation. To solve the decision problem, we propose a Monte Carlo tree search based method, which relies on a k-NN regression over prior incidents to estimate state-transition probabilities. We evaluate our proposed approach on multiple versions of the MITRE ATT&CK dataset, which is a knowledge base of adversarial techniques and tactics based on real-world cyber incidents, and demonstrate that our approach outperforms DISCLOSE in terms of techniques discovered per effort spent.

Title: FedCut: A Spectral Analysis Framework for Reliable Detection of Byzantine Colluders. (arXiv:2211.13389v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13389
Code URL: null
Copy Paste: [[2211.13389] FedCut: A Spectral Analysis Framework for Reliable Detection of Byzantine Colluders](http://arxiv.org/abs/2211.13389) #security
Summary:
This paper proposes a general spectral analysis framework that thwarts a security risk in federated Learning caused by groups of malicious Byzantine attackers or colluders, who conspire to upload vicious model updates to severely debase global model performances. The proposed framework delineates the strong consistency and temporal coherence between Byzantine colluders' model updates from a spectral analysis lens, and, formulates the detection of Byzantine misbehaviours as a community detection problem in weighted graphs. The modified normalized graph cut is then utilized to discern attackers from benign participants. Moreover, the Spectral heuristics is adopted to make the detection robust against various attacks. The proposed Byzantine colluder resilient method, i.e., FedCut, is guaranteed to converge with bounded errors. Extensive experimental results under a variety of settings justify the superiority of FedCut, which demonstrates extremely robust model performance (MP) under various attacks. It was shown that FedCut's averaged MP is 2.1% to 16.5% better than that of the state of the art Byzantine-resilient methods. In terms of the worst-case model performance (MP), FedCut is 17.6% to 69.5% better than these methods.

Title: Network Security Modelling with Distributional Data. (arXiv:2211.13419v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13419
Code URL: null
Copy Paste: [[2211.13419] Network Security Modelling with Distributional Data](http://arxiv.org/abs/2211.13419) #security
Summary:
We investigate the detection of botnet command and control (C2) hosts in massive IP traffic using machine learning methods. To this end, we use NetFlow data -- the industry standard for monitoring of IP traffic -- and ML models using two sets of features: conventional NetFlow variables and distributional features based on NetFlow variables. In addition to using static summaries of NetFlow features, we use quantiles of their IP-level distributions as input features in predictive models to predict whether an IP belongs to known botnet families. These models are used to develop intrusion detection systems to predict traffic traces identified with malicious attacks. The results are validated by matching predictions to existing denylists of published malicious IP addresses and deep packet inspection. The usage of our proposed novel distributional features, combined with techniques that enable modelling complex input feature spaces result in highly accurate predictions by our trained models.

Title: GitHub Considered Harmful? Analyzing Open-Source Projects for the Automatic Generation of Cryptographic API Call Sequences. (arXiv:2211.13498v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13498
Code URL: null
Copy Paste: [[2211.13498] GitHub Considered Harmful? Analyzing Open-Source Projects for the Automatic Generation of Cryptographic API Call Sequences](http://arxiv.org/abs/2211.13498) #security
Summary:
GitHub is a popular data repository for code examples. It is being continuously used to train several AI-based tools to automatically generate code. However, the effectiveness of such tools in correctly demonstrating the usage of cryptographic APIs has not been thoroughly assessed. In this paper, we investigate the extent and severity of misuses, specifically caused by incorrect cryptographic API call sequences in GitHub. We also analyze the suitability of GitHub data to train a learning-based model to generate correct cryptographic API call sequences. For this, we manually extracted and analyzed the call sequences from GitHub. Using this data, we augmented an existing learning-based model called DeepAPI to create two security-specific models that generate cryptographic API call sequences for a given natural language (NL) description. Our results indicate that it is imperative to not neglect the misuses in API call sequences while using data sources like GitHub, to train models that generate code.

Title: SmartIntentNN: Towards Smart Contract Intent Detection. (arXiv:2211.13670v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13670
Code URL: null
Copy Paste: [[2211.13670] SmartIntentNN: Towards Smart Contract Intent Detection](http://arxiv.org/abs/2211.13670) #security
Summary:
Researchers currently have been focusing on smart contract vulnerability detection, but we find that developers' intent to write smart contracts is a more noteworthy security concern because smart contracts with malicious intent have caused significant financial loss to users. A more unfortunate fact is that we can only rely on manual audits to check for unfriendly smart contracts. In this paper, we propose \textsc{SmartIntentNN}, Smart Contract Intent Neural Network, a deep learning-based tool that aims to automate the process of developers' intent detection in smart contracts, saving human resources and overhead.

The demo video is available on \url{https://youtu.be/ho1SMtYm-wI}.

Title: Blockchain based solution design for Energy Exchange Platform. (arXiv:2211.13907v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13907
Code URL: null
Copy Paste: [[2211.13907] Blockchain based solution design for Energy Exchange Platform](http://arxiv.org/abs/2211.13907) #security
Summary:
It is observed that users have higher requirements for fairness, transparency, and privacy of transactions of energy exchanges that occur across platforms like Indian Energy Exchange (IEX) and Power Exchange India Limited (PXIL). As a decentralized distributed accounting system, blockchain is characterized by traceability, security, credibility, and non-tampering of transactions, which can meet the needs of integrated energy and multi-energy transactions. Based on the research on the application of blockchain technology in the field of integrated energy services, this solution proposes an integrated energy trading process based on smart contracts and explores the application of blockchain technology in integrated energy services.

Title: End-to-End Stochastic Optimization with Energy-Based Model. (arXiv:2211.13837v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13837
Code URL: https://github.com/Lingkai-Kong/SO-EBM
Copy Paste: [[2211.13837] End-to-End Stochastic Optimization with Energy-Based Model](http://arxiv.org/abs/2211.13837) #security
Summary:
Decision-focused learning (DFL) was recently proposed for stochastic optimization problems that involve unknown parameters. By integrating predictive modeling with an implicitly differentiable optimization layer, DFL has shown superior performance to the standard two-stage predict-then-optimize pipeline. However, most existing DFL methods are only applicable to convex problems or a subset of nonconvex problems that can be easily relaxed to convex ones. Further, they can be inefficient in training due to the requirement of solving and differentiating through the optimization problem in every training iteration. We propose SO-EBM, a general and efficient DFL method for stochastic optimization using energy-based models. Instead of relying on KKT conditions to induce an implicit optimization layer, SO-EBM explicitly parameterizes the original optimization problem using a differentiable optimization layer based on energy functions. To better approximate the optimization landscape, we propose a coupled training objective that uses a maximum likelihood loss to capture the optimum location and a distribution-based regularizer to capture the overall energy landscape. Finally, we propose an efficient training procedure for SO-EBM with a self-normalized importance sampler based on a Gaussian mixture proposal. We evaluate SO-EBM in three applications: power scheduling, COVID-19 resource allocation, and non-convex adversarial security game, demonstrating the effectiveness and efficiency of SO-EBM.

privacy

Title: Differentially Private Image Classification from Features. (arXiv:2211.13403v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13403
Code URL: null
Copy Paste: [[2211.13403] Differentially Private Image Classification from Features](http://arxiv.org/abs/2211.13403) #privacy
Summary:
Leveraging transfer learning has recently been shown to be an effective strategy for training large models with Differential Privacy (DP). Moreover, somewhat surprisingly, recent works have found that privately training just the last layer of a pre-trained model provides the best utility with DP. While past studies largely rely on algorithms like DP-SGD for training large models, in the specific case of privately learning from features, we observe that computational burden is low enough to allow for more sophisticated optimization schemes, including second-order methods. To that end, we systematically explore the effect of design parameters such as loss function and optimization algorithm. We find that, while commonly used logistic regression performs better than linear regression in the non-private setting, the situation is reversed in the private setting. We find that linear regression is much more effective than logistic regression from both privacy and computational aspects, especially at stricter epsilon values ($\epsilon < 1$). On the optimization side, we also explore using Newton's method, and find that second-order information is quite helpful even with privacy, although the benefit significantly diminishes with stricter privacy guarantees. While both methods use second-order information, least squares is effective at lower epsilons while Newton's method is effective at larger epsilon values. To combine the benefits of both, we propose a novel algorithm called DP-FC, which leverages feature covariance instead of the Hessian of the logistic regression loss and performs well across all $\epsilon$ values we tried. With this, we obtain new SOTA results on ImageNet-1k, CIFAR-100 and CIFAR-10 across all values of $\epsilon$ typically considered. Most remarkably, on ImageNet-1K, we obtain top-1 accuracy of 88\% under (8, $8 * 10^{-7}$)-DP and 84.3\% under (0.1, $8 * 10^{-7}$)-DP.

Title: Responsible Active Learning via Human-in-the-loop Peer Study. (arXiv:2211.13587v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13587
Code URL: null
Copy Paste: [[2211.13587] Responsible Active Learning via Human-in-the-loop Peer Study](http://arxiv.org/abs/2211.13587) #privacy
Summary:
Active learning has been proposed to reduce data annotation efforts by only manually labelling representative data samples for training. Meanwhile, recent active learning applications have benefited a lot from cloud computing services with not only sufficient computational resources but also crowdsourcing frameworks that include many humans in the active learning loop. However, previous active learning methods that always require passing large-scale unlabelled data to cloud may potentially raise significant data privacy issues. To mitigate such a risk, we propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability. Specifically, we first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side by maintaining an active learner (student) on the client-side. During training, the task learner instructs the light-weight active learner which then provides feedback on the active sampling criterion. To further enhance the active learner via large-scale unlabelled data, we introduce multiple peer students into the active learner which is trained by a novel learning paradigm, including the In-Class Peer Study on labelled data and the Out-of-Class Peer Study on unlabelled data. Lastly, we devise a discrepancy-based active sampling criterion, Peer Study Feedback, that exploits the variability of peer students to select the most informative data to improve model stability. Extensive experiments demonstrate the superiority of the proposed PSL over a wide range of active learning methods in both standard and sensitive protection settings.

Title: Data Provenance Inference in Machine Learning. (arXiv:2211.13416v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13416
Code URL: null
Copy Paste: [[2211.13416] Data Provenance Inference in Machine Learning](http://arxiv.org/abs/2211.13416) #privacy
Summary:
Unintended memorization of various information granularity has garnered academic attention in recent years, e.g. membership inference and property inference. How to inversely use this privacy leakage to facilitate real-world applications is a growing direction; the current efforts include dataset ownership inference and user auditing. Standing on the data lifecycle and ML model production, we propose an inference process named Data Provenance Inference, which is to infer the generation, collection or processing property of the ML training data, to assist ML developers in locating the training data gaps without maintaining strenuous metadata. We formularly define the data provenance and the data provenance inference task in ML training. Then we propose a novel inference strategy combining embedded-space multiple instance classification and shadow learning. Comprehensive evaluations cover language, visual and structured data in black-box and white-box settings, with diverse kinds of data provenance (i.e. business, county, movie, user). Our best inference accuracy achieves 98.96% in the white-box text model when "author" is the data provenance. The experimental results indicate that, in general, the inference performance positively correlated with the amount of reference data for inference, the depth and also the amount of the parameter of the accessed layer. Furthermore, we give a post-hoc statistical analysis of the data provenance definition to explain when our proposed method works well.

Title: A Privacy-Preserving Outsourced Data Model in Cloud Environment. (arXiv:2211.13542v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13542
Code URL: null
Copy Paste: [[2211.13542] A Privacy-Preserving Outsourced Data Model in Cloud Environment](http://arxiv.org/abs/2211.13542) #privacy
Summary:
Nowadays, more and more machine learning applications, such as medical diagnosis, online fraud detection, email spam filtering, etc., services are provided by cloud computing. The cloud service provider collects the data from the various owners to train or classify the machine learning system in the cloud environment. However, multiple data owners may not entirely rely on the cloud platform that a third party engages. Therefore, data security and privacy problems are among the critical hindrances to using machine learning tools, particularly with multiple data owners. In addition, unauthorized entities can detect the statistical input data and infer the machine learning model parameters. Therefore, a privacy-preserving model is proposed, which protects the privacy of the data without compromising machine learning efficiency. In order to protect the data of data owners, the epsilon-differential privacy is used, and fog nodes are used to address the problem of the lower bandwidth and latency in this proposed scheme. The noise is produced by the epsilon-differential mechanism, which is then added to the data. Moreover, the noise is injected at the data owner site to protect the owners data. Fog nodes collect the noise-added data from the data owners, then shift it to the cloud platform for storage, computation, and performing the classification tasks purposes.

Title: FPT: a Fixed-Point Accelerator for Torus Fully Homomorphic Encryption. (arXiv:2211.13696v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13696
Code URL: null
Copy Paste: [[2211.13696] FPT: a Fixed-Point Accelerator for Torus Fully Homomorphic Encryption](http://arxiv.org/abs/2211.13696) #privacy
Summary:
Fully Homomorphic Encryption is a technique that allows computation on encrypted data. It has the potential to drastically change privacy considerations in the cloud, but high computational and memory overheads are preventing its broad adoption. TFHE is a promising Torus-based FHE scheme that heavily relies on bootstrapping, the noise-removal tool that must be invoked after every encrypted gate computation.

We present FPT, a Fixed-Point FPGA accelerator for TFHE bootstrapping. FPT is the first hardware accelerator to heavily exploit the inherent noise present in FHE calculations. Instead of double or single-precision floating-point arithmetic, it implements TFHE bootstrapping entirely with approximate fixed-point arithmetic. Using an in-depth analysis of noise propagation in bootstrapping FFT computations, FPT is able to use noise-trimmed fixed-point representations that are up to 50% smaller than prior implementations using floating-point or integer FFTs.

FPT's microarchitecture is built as a streaming processor inspired by traditional streaming DSPs: it instantiates high-throughput computational stages that are directly cascaded, with simplified control logic and routing networks. FPT's streaming approach allows 100% utilization of arithmetic units and requires only small bootstrapping key caches, enabling an entirely compute-bound bootstrapping throughput of 1 BS / 35$\mu$s. This is in stark contrast to the established classical CPU approach to FHE bootstrapping acceleration, which tends to be heavily memory and bandwidth-constrained.

FPT is fully implemented and evaluated as a bootstrapping FPGA kernel for an Alveo U280 datacenter accelerator card. FPT achieves almost three orders of magnitude higher bootstrapping throughput than existing CPU-based implementations, and 2.5$\times$ higher throughput compared to recent ASIC emulation experiments.

Title: CryptoLight: An Electro-Optical Accelerator for Fully Homomorphic Encryption. (arXiv:2211.13780v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13780
Code URL: null
Copy Paste: [[2211.13780] CryptoLight: An Electro-Optical Accelerator for Fully Homomorphic Encryption](http://arxiv.org/abs/2211.13780) #privacy
Summary:
Fully homomorphic encryption (FHE) protects data privacy in cloud computing by enabling computations to directly occur on ciphertexts. Although the speed of computationally expensive FHE operations can be significantly boosted by prior ASIC-based FHE accelerators, the performance of key-switching, the dominate primitive in various FHE operations, is seriously limited by their small bit-width datapaths and frequent matrix transpositions. In this paper, we present an electro-optical (EO) FHE accelerator, CryptoLight, to accelerate FHE operations. Its 512-bit datapath supporting 510-bit residues greatly reduces the key-switching cost. We also create an in-scratchpad-memory transpose unit to fast transpose matrices. Compared to prior FHE accelerators, on average, CryptoLight reduces the latency of various FHE applications by >94.4% and the energy consumption by >95%.

Title: Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation. (arXiv:2211.13358v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13358
Code URL: https://github.com/feedzai/bank-account-fraud
Copy Paste: [[2211.13358] Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation](http://arxiv.org/abs/2211.13358) #privacy
Summary:
Evaluating new techniques on realistic datasets plays a crucial role in the development of ML research and its broader adoption by practitioners. In recent years, there has been a significant increase of publicly available unstructured data resources for computer vision and NLP tasks. However, tabular data -- which is prevalent in many high-stakes domains -- has been lagging behind. To bridge this gap, we present Bank Account Fraud (BAF), the first publicly available 1 privacy-preserving, large-scale, realistic suite of tabular datasets. The suite was generated by applying state-of-the-art tabular data generation techniques on an anonymized,real-world bank account opening fraud detection dataset. This setting carries a set of challenges that are commonplace in real-world applications, including temporal dynamics and significant class imbalance. Additionally, to allow practitioners to stress test both performance and fairness of ML methods, each dataset variant of BAF contains specific types of data bias. With this resource, we aim to provide the research community with a more realistic, complete, and robust test bed to evaluate novel and existing methods.

protect

Title: Tracking Dataset IP Use in Deep Neural Networks. (arXiv:2211.13535v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13535
Code URL: null
Copy Paste: [[2211.13535] Tracking Dataset IP Use in Deep Neural Networks](http://arxiv.org/abs/2211.13535) #protect
Summary:
Training highly performant deep neural networks (DNNs) typically requires the collection of a massive dataset and the use of powerful computing resources. Therefore, unauthorized redistribution of private pre-trained DNNs may cause severe economic loss for model owners. For protecting the ownership of DNN models, DNN watermarking schemes have been proposed by embedding secret information in a DNN model and verifying its presence for model ownership. However, existing DNN watermarking schemes compromise the model utility and are vulnerable to watermark removal attacks because a model is modified with a watermark. Alternatively, a new approach dubbed DEEPJUDGE was introduced to measure the similarity between a suspect model and a victim model without modifying the victim model. However, DEEPJUDGE would only be designed to detect the case where a suspect model's architecture is the same as a victim model's. In this work, we propose a novel DNN fingerprinting technique dubbed DEEPTASTER to prevent a new attack scenario in which a victim's data is stolen to build a suspect model. DEEPTASTER can effectively detect such data theft attacks even when a suspect model's architecture differs from a victim model's. To achieve this goal, DEEPTASTER generates a few adversarial images with perturbations, transforms them into the Fourier frequency domain, and uses the transformed images to identify the dataset used in a suspect model. The intuition is that those adversarial images can be used to capture the characteristics of DNNs built on a specific dataset. We evaluated the detection accuracy of DEEPTASTER on three datasets with three model architectures under various attack scenarios, including transfer learning, pruning, fine-tuning, and data augmentation. Overall, DEEPTASTER achieves a balanced accuracy of 94.95%, which is significantly better than 61.11% achieved by DEEPJUDGE in the same settings.

defense

attack

Title: Dual Graphs of Polyhedral Decompositions for the Detection of Adversarial Attacks. (arXiv:2211.13305v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13305
Code URL: null
Copy Paste: [[2211.13305] Dual Graphs of Polyhedral Decompositions for the Detection of Adversarial Attacks](http://arxiv.org/abs/2211.13305) #attack
Summary:
Previous work has shown that a neural network with the rectified linear unit (ReLU) activation function leads to a convex polyhedral decomposition of the input space. These decompositions can be represented by a dual graph with vertices corresponding to polyhedra and edges corresponding to polyhedra sharing a facet, which is a subgraph of a Hamming graph. This paper illustrates how one can utilize the dual graph to detect and analyze adversarial attacks in the context of digital images. When an image passes through a network containing ReLU nodes, the firing or non-firing at a node can be encoded as a bit ($1$ for ReLU activation, $0$ for ReLU non-activation). The sequence of all bit activations identifies the image with a bit vector, which identifies it with a polyhedron in the decomposition and, in turn, identifies it with a vertex in the dual graph. We identify ReLU bits that are discriminators between non-adversarial and adversarial images and examine how well collections of these discriminators can ensemble vote to build an adversarial image detector. Specifically, we examine the similarities and differences of ReLU bit vectors for adversarial images, and their non-adversarial counterparts, using a pre-trained ResNet-50 architecture. While this paper focuses on adversarial digital images, ResNet-50 architecture, and the ReLU activation function, our methods extend to other network architectures, activation functions, and types of datasets.

Title: SAGA: Spectral Adversarial Geometric Attack on 3D Meshes. (arXiv:2211.13775v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13775
Code URL: null
Copy Paste: [[2211.13775] SAGA: Spectral Adversarial Geometric Attack on 3D Meshes](http://arxiv.org/abs/2211.13775) #attack
Summary:
A triangular mesh is one of the most popular 3D data representations. As such, the deployment of deep neural networks for mesh processing is widely spread and is increasingly attracting more attention. However, neural networks are prone to adversarial attacks, where carefully crafted inputs impair the model's functionality. The need to explore these vulnerabilities is a fundamental factor in the future development of 3D-based applications. Recently, mesh attacks were studied on the semantic level, where classifiers are misled to produce wrong predictions. Nevertheless, mesh surfaces possess complex geometric attributes beyond their semantic meaning, and their analysis often includes the need to encode and reconstruct the geometry of the shape.

We propose a novel framework for a geometric adversarial attack on a 3D mesh autoencoder. In this setting, an adversarial input mesh deceives the autoencoder by forcing it to reconstruct a different geometric shape at its output. The malicious input is produced by perturbing a clean shape in the spectral domain. Our method leverages the spectral decomposition of the mesh along with additional mesh-related properties to obtain visually credible results that consider the delicacy of surface distortions. Our code is publicly available at https://github.com/StolikTomer/SAGA.

Title: Specognitor: Identifying Spectre Vulnerabilities via Prediction-Aware Symbolic Execution. (arXiv:2211.13526v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13526
Code URL: null
Copy Paste: [[2211.13526] Specognitor: Identifying Spectre Vulnerabilities via Prediction-Aware Symbolic Execution](http://arxiv.org/abs/2211.13526) #attack
Summary:
Spectre attacks exploit speculative execution to leak sensitive information. In the last few years, a number of static side-channel detectors have been proposed to detect cache leakage in the presence of speculative execution. However, these techniques either ignore branch prediction mechanism, detect static pre-defined patterns which is not suitable for detecting new patterns, or lead to false negatives.

In this paper, we illustrate the weakness of prediction-agnostic state-of-the-art approaches. We propose Specognitor, a novel prediction-aware symbolic execution engine to soundly explore program paths and detect subtle spectre variant 1 and variant 2 vulnerabilities. We propose a dynamic pattern detection mechanism to account for both existing and future vulnerabilities. Our experimental results show the effectiveness and efficiency of Specognitor in analyzing real-world cryptographic programs w.r.t. different processor families.

Title: Explainable and Safe Reinforcement Learning for Autonomous Air Mobility. (arXiv:2211.13474v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13474
Code URL: https://github.com/wleiiiii/gym-atc-attack-project
Copy Paste: [[2211.13474] Explainable and Safe Reinforcement Learning for Autonomous Air Mobility](http://arxiv.org/abs/2211.13474) #attack
Summary:
Increasing traffic demands, higher levels of automation, and communication enhancements provide novel design opportunities for future air traffic controllers (ATCs). This article presents a novel deep reinforcement learning (DRL) controller to aid conflict resolution for autonomous free flight. Although DRL has achieved important advancements in this field, the existing works pay little attention to the explainability and safety issues related to DRL controllers, particularly the safety under adversarial attacks. To address those two issues, we design a fully explainable DRL framework wherein we: 1) decompose the coupled Q value learning model into a safety-awareness and efficiency (reach the target) one; and 2) use information from surrounding intruders as inputs, eliminating the needs of central controllers. In our simulated experiments, we show that by decoupling the safety-awareness and efficiency, we can exceed performance on free flight control tasks while dramatically improving explainability on practical. In addition, the safety Q learning module provides rich information about the safety situation of environments. To study the safety under adversarial attacks, we additionally propose an adversarial attack strategy that can impose both safety-oriented and efficiency-oriented attacks. The adversarial aims to minimize safety/efficiency by only attacking the agent at a few time steps. In the experiments, our attack strategy increases as many collisions as the uniform attack (i.e., attacking at every time step) by only attacking the agent four times less often, which provide insights into the capabilities and restrictions of the DRL in future ATC designs. The source code is publicly available at https://github.com/WLeiiiii/Gym-ATC-Attack-Project.

robust

Title: How do Cross-View and Cross-Modal Alignment Affect Representations in Contrastive Learning?. (arXiv:2211.13309v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13309
Code URL: null
Copy Paste: [[2211.13309] How do Cross-View and Cross-Modal Alignment Affect Representations in Contrastive Learning?](http://arxiv.org/abs/2211.13309) #robust
Summary:
Various state-of-the-art self-supervised visual representation learning approaches take advantage of data from multiple sensors by aligning the feature representations across views and/or modalities. In this work, we investigate how aligning representations affects the visual features obtained from cross-view and cross-modal contrastive learning on images and point clouds. On five real-world datasets and on five tasks, we train and evaluate 108 models based on four pretraining variations. We find that cross-modal representation alignment discards complementary visual information, such as color and texture, and instead emphasizes redundant depth cues. The depth cues obtained from pretraining improve downstream depth prediction performance. Also overall, cross-modal alignment leads to more robust encoders than pre-training by cross-view alignment, especially on depth prediction, instance segmentation, and object detection.

Title: Multi-Task Learning of Object State Changes from Uncurated Videos. (arXiv:2211.13500v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13500
Code URL: https://github.com/soCzech/MultiTaskObjectStates
Copy Paste: [[2211.13500] Multi-Task Learning of Object State Changes from Uncurated Videos](http://arxiv.org/abs/2211.13500) #robust
Summary:
We aim to learn to temporally localize object state changes and the corresponding state-modifying actions by observing people interacting with objects in long uncurated web videos. We introduce three principal contributions. First, we explore alternative multi-task network architectures and identify a model that enables efficient joint learning of multiple object states and actions such as pouring water and pouring coffee. Second, we design a multi-task self-supervised learning procedure that exploits different types of constraints between objects and state-modifying actions enabling end-to-end training of a model for temporal localization of object states and actions in videos from only noisy video-level supervision. Third, we report results on the large-scale ChangeIt and COIN datasets containing tens of thousands of long (un)curated web videos depicting various interactions such as hole drilling, cream whisking, or paper plane folding. We show that our multi-task model achieves a relative improvement of 40% over the prior single-task methods and significantly outperforms both image-based and video-based zero-shot models for this problem. We also test our method on long egocentric videos of the EPIC-KITCHENS and the Ego4D datasets in a zero-shot setup demonstrating the robustness of our learned model.

Title: Chinese Character Recognition with Radical-Structured Stroke Trees. (arXiv:2211.13518v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13518
Code URL: null
Copy Paste: [[2211.13518] Chinese Character Recognition with Radical-Structured Stroke Trees](http://arxiv.org/abs/2211.13518) #robust
Summary:
The flourishing blossom of deep learning has witnessed the rapid development of Chinese character recognition. However, it remains a great challenge that the characters for testing may have different distributions from those of the training dataset. Existing methods based on a single-level representation (character-level, radical-level, or stroke-level) may be either too sensitive to distribution changes (e.g., induced by blurring, occlusion, and zero-shot problems) or too tolerant to one-to-many ambiguities. In this paper, we represent each Chinese character as a stroke tree, which is organized according to its radical structures, to fully exploit the merits of both radical and stroke levels in a decent way. We propose a two-stage decomposition framework, where a Feature-to-Radical Decoder perceives radical structures and radical regions, and a Radical-to-Stroke Decoder further predicts the stroke sequences according to the features of radical regions. The generated radical structures and stroke sequences are encoded as a Radical-Structured Stroke Tree (RSST), which is fed to a Tree-to-Character Translator based on the proposed Weighted Edit Distance to match the closest candidate character in the RSST lexicon. Our extensive experimental results demonstrate that the proposed method outperforms the state-of-the-art single-level methods by increasing margins as the distribution difference becomes more severe in the blurring, occlusion, and zero-shot scenarios, which indeed validates the robustness of the proposed method.

Title: 3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object Detection. (arXiv:2211.13529v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13529
Code URL: null
Copy Paste: [[2211.13529] 3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object Detection](http://arxiv.org/abs/2211.13529) #robust
Summary:
Fusing data from cameras and LiDAR sensors is an essential technique to achieve robust 3D object detection. One key challenge in camera-LiDAR fusion involves mitigating the large domain gap between the two sensors in terms of coordinates and data distribution when fusing their features. In this paper, we propose a novel camera-LiDAR fusion architecture called, 3D Dual-Fusion, which is designed to mitigate the gap between the feature representations of camera and LiDAR data. The proposed method fuses the features of the camera-view and 3D voxel-view domain and models their interactions through deformable attention. We redesign the transformer fusion encoder to aggregate the information from the two domains. Two major changes include 1) dual query-based deformable attention to fuse the dual-domain features interactively and 2) 3D local self-attention to encode the voxel-domain queries prior to dual-query decoding. The results of an experimental evaluation show that the proposed camera-LiDAR fusion architecture achieved competitive performance on the KITTI and nuScenes datasets, with state-of-the-art performances in some 3D object detection benchmarks categories.

Title: Cross-domain Transfer of defect features in technical domains based on partial target data. (arXiv:2211.13662v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13662
Code URL: null
Copy Paste: [[2211.13662] Cross-domain Transfer of defect features in technical domains based on partial target data](http://arxiv.org/abs/2211.13662) #robust
Summary:
A common challenge in real world classification scenarios with sequentially appending target domain data is insufficient training datasets during the training phase. Therefore, conventional deep learning and transfer learning classifiers are not applicable especially when individual classes are not represented or are severely underrepresented at the outset. In many technical domains, however, it is only the defect or worn reject classes that are insufficiently represented, while the non-defect class is often available from the beginning. The proposed classification approach addresses such conditions and is based on a CNN encoder. Following a contrastive learning approach, it is trained with a modified triplet loss function using two datasets: Besides the non-defective target domain class 1st dataset, a state-of-the-art labeled source domain dataset that contains highly related classes e.g., a related manufacturing error or wear defect but originates from a highly different domain e.g., different product, material, or appearance = 2nd dataset is utilized. The approach learns the classification features from the source domain dataset while at the same time learning the differences between the source and the target domain in a single training step, aiming to transfer the relevant features to the target domain. The classifier becomes sensitive to the classification features and by architecture robust against the highly domain-specific context. The approach is benchmarked in a technical and a non-technical domain and shows convincing classification results. In particular, it is shown that the domain generalization capabilities and classification results are improved by the proposed architecture, allowing for larger domain shifts between source and target domains.

Title: On Pitfalls of Measuring Occlusion Robustness through Data Distortion. (arXiv:2211.13734v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13734
Code URL: null
Copy Paste: [[2211.13734] On Pitfalls of Measuring Occlusion Robustness through Data Distortion](http://arxiv.org/abs/2211.13734) #robust
Summary:
Over the past years, the crucial role of data has largely been shadowed by the field's focus on architectures and training procedures. We often cause changes to the data without being aware of their wider implications. In this paper we show that distorting images without accounting for the artefacts introduced leads to biased results when establishing occlusion robustness. To ensure models behave as expected in real-world scenarios, we need to rule out the impact added artefacts have on evaluation. We propose a new approach, iOcclusion, as a fairer alternative for applications where the possible occluders are unknown.

Title: TemporalStereo: Efficient Spatial-Temporal Stereo Matching Network. (arXiv:2211.13755v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13755
Code URL: https://github.com/youmi-zym/temporalstereo
Copy Paste: [[2211.13755] TemporalStereo: Efficient Spatial-Temporal Stereo Matching Network](http://arxiv.org/abs/2211.13755) #robust
Summary:
We present TemporalStereo, a coarse-to-fine based online stereo matching network which is highly efficient, and able to effectively exploit the past geometry and context information to boost the matching accuracy. Our network leverages sparse cost volume and proves to be effective when a single stereo pair is given, however, its peculiar ability to use spatio-temporal information across frames allows TemporalStereo to alleviate problems such as occlusions and reflective regions while enjoying high efficiency also in the case of stereo sequences. Notably our model trained, once with stereo videos, can run in both single-pair and temporal ways seamlessly. Experiments show that our network relying on camera motion is even robust to dynamic objects when running on videos. We validate TemporalStereo through extensive experiments on synthetic (SceneFlow, TartanAir) and real (KITTI 2012, KITTI 2015) datasets. Detailed results show that our model achieves state-of-the-art performance on any of these datasets. Code is available at \url{https://github.com/youmi-zym/TemporalStereo.git}.

Title: Contrastive pretraining for semantic segmentation is robust to noisy positive pairs. (arXiv:2211.13756v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13756
Code URL: null
Copy Paste: [[2211.13756] Contrastive pretraining for semantic segmentation is robust to noisy positive pairs](http://arxiv.org/abs/2211.13756) #robust
Summary:
Domain-specific variants of contrastive learning can construct positive pairs from two distinct images, as opposed to augmenting the same image twice. Unlike in traditional contrastive methods, this can result in positive pairs not matching perfectly. Similar to false negative pairs, this could impede model performance. Surprisingly, we find that downstream semantic segmentation is either robust to the noisy pairs or even benefits from them. The experiments are conducted on the remote sensing dataset xBD, and a synthetic segmentation dataset, on which we have full control over the noise parameters. As a result, practitioners should be able to use such domain-specific contrastive methods without having to filter their positive pairs beforehand.

Title: Towards Practical Control of Singular Values of Convolutional Layers. (arXiv:2211.13771v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13771
Code URL: https://github.com/whiteteadragon/practical_svd_conv
Copy Paste: [[2211.13771] Towards Practical Control of Singular Values of Convolutional Layers](http://arxiv.org/abs/2211.13771) #robust
Summary:
In general, convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control. Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties and offered several methods for controlling them. Nevertheless, these methods present an intractable computational challenge or resort to coarse approximations. In this paper, we offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity. Our method is based on the tensor-train decomposition; it retains control over the actual singular values of convolutional mappings while providing structurally sparse and hardware-friendly representation. We demonstrate the improved properties of modern CNNs with our method and analyze its impact on the model performance, calibration, and adversarial robustness. The source code is available at: https://github.com/WhiteTeaDragon/practical_svd_conv

Title: Semantic Communication Enabling Robust Edge Intelligence for Time-Critical IoT Applications. (arXiv:2211.13787v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13787
Code URL: null
Copy Paste: [[2211.13787] Semantic Communication Enabling Robust Edge Intelligence for Time-Critical IoT Applications](http://arxiv.org/abs/2211.13787) #robust
Summary:
This paper aims to design robust Edge Intelligence using semantic communication for time-critical IoT applications. We systematically analyze the effect of image DCT coefficients on inference accuracy and propose the channel-agnostic effectiveness encoding for offloading by transmitting the most meaningful task data first. This scheme can well utilize all available communication resource and strike a balance between transmission latency and inference accuracy. Then, we design an effectiveness decoding by implementing a novel image augmentation process for convolutional neural network (CNN) training, through which an original CNN model is transformed into a Robust CNN model. We use the proposed training method to generate Robust MobileNet-v2 and Robust ResNet-50. The proposed Edge Intelligence framework consists of the proposed effectiveness encoding and effectiveness decoding. The experimental results show that the effectiveness decoding using the Robust CNN models perform consistently better under various image distortions caused by channel errors or limited communication resource. The proposed Edge Intelligence framework using semantic communication significantly outperforms the conventional approach under latency and data rate constraints, in particular, under ultra stringent deadlines and low data rate.

Title: FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction. (arXiv:2211.13874v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13874
Code URL: https://github.com/csbhr/ffhq-uv
Copy Paste: [[2211.13874] FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction](http://arxiv.org/abs/2211.13874) #robust
Summary:
We present a large-scale facial UV-texture dataset that contains over 50,000 high-quality texture UV-maps with even illuminations, neutral expressions, and cleaned facial regions, which are desired characteristics for rendering realistic 3D face models under different lighting conditions. The dataset is derived from a large-scale face image dataset namely FFHQ, with the help of our fully automatic and robust UV-texture production pipeline. Our pipeline utilizes the recent advances in StyleGAN-based facial image editing approaches to generate multi-view normalized face images from single-image inputs. An elaborated UV-texture extraction, correction, and completion procedure is then applied to produce high-quality UV-maps from the normalized face images. Compared with existing UV-texture datasets, our dataset has more diverse and higher-quality texture maps. We further train a GAN-based texture decoder as the nonlinear texture basis for parametric fitting based 3D face reconstruction. Experiments show that our method improves the reconstruction accuracy over state-of-the-art approaches, and more importantly, produces high-quality texture maps that are ready for realistic renderings. The dataset, code, and pre-trained texture decoder are publicly available at https://github.com/csbhr/FFHQ-UV.

Title: TAOTF: A Two-stage Approximately Orthogonal Training Framework in Deep Neural Networks. (arXiv:2211.13902v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13902
Code URL: null
Copy Paste: [[2211.13902] TAOTF: A Two-stage Approximately Orthogonal Training Framework in Deep Neural Networks](http://arxiv.org/abs/2211.13902) #robust
Summary:
The orthogonality constraints, including the hard and soft ones, have been used to normalize the weight matrices of Deep Neural Network (DNN) models, especially the Convolutional Neural Network (CNN) and Vision Transformer (ViT), to reduce model parameter redundancy and improve training stability. However, the robustness to noisy data of these models with constraints is not always satisfactory. In this work, we propose a novel two-stage approximately orthogonal training framework (TAOTF) to find a trade-off between the orthogonal solution space and the main task solution space to solve this problem in noisy data scenarios. In the first stage, we propose a novel algorithm called polar decomposition-based orthogonal initialization (PDOI) to find a good initialization for the orthogonal optimization. In the second stage, unlike other existing methods, we apply soft orthogonal constraints for all layers of DNN model. We evaluate the proposed model-agnostic framework both on the natural image and medical image datasets, which show that our method achieves stable and superior performances to existing methods.

Title: Towards Good Practices for Missing Modality Robust Action Recognition. (arXiv:2211.13916v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13916
Code URL: https://github.com/sangminwoo/actionmae
Copy Paste: [[2211.13916] Towards Good Practices for Missing Modality Robust Action Recognition](http://arxiv.org/abs/2211.13916) #robust
Summary:
Standard multi-modal models assume the use of the same modalities in training and inference stages. However, in practice, the environment in which multi-modal models operate may not satisfy such assumption. As such, their performances degrade drastically if any modality is missing in the inference stage. We ask: how can we train a model that is robust to missing modalities? This paper seeks a set of good practices for multi-modal action recognition, with a particular interest in circumstances where some modalities are not available at an inference time. First, we study how to effectively regularize the model during training (e.g., data augmentation). Second, we investigate on fusion methods for robustness to missing modalities: we find that transformer-based fusion shows better robustness for missing modality than summation or concatenation. Third, we propose a simple modular network, ActionMAE, which learns missing modality predictive coding by randomly dropping modality features and tries to reconstruct them with the remaining modality features. Coupling these good practices, we build a model that is not only effective in multi-modal action recognition but also robust to modality missing. Our model achieves the state-of-the-arts on multiple benchmarks and maintains competitive performances even in missing modality scenarios. Codes are available at https://github.com/sangminwoo/ActionMAE.

Title: SEAT: Stable and Explainable Attention. (arXiv:2211.13290v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2211.13290
Code URL: null
Copy Paste: [[2211.13290] SEAT: Stable and Explainable Attention](http://arxiv.org/abs/2211.13290) #robust
Summary:
Currently, attention mechanism becomes a standard fixture in most state-of-the-art natural language processing (NLP) models, not only due to outstanding performance it could gain, but also due to plausible innate explanation for the behaviors of neural architectures it provides, which is notoriously difficult to analyze. However, recent studies show that attention is unstable against randomness and perturbations during training or testing, such as random seeds and slight perturbation of embedding vectors, which impedes it from becoming a faithful explanation tool. Thus, a natural question is whether we can find some substitute of the current attention which is more stable and could keep the most important characteristics on explanation and prediction of attention. In this paper, to resolve the problem, we provide a first rigorous definition of such alternate namely SEAT (Stable and Explainable Attention). Specifically, a SEAT should has the following three properties: (1) Its prediction distribution is enforced to be close to the distribution based on the vanilla attention; (2) Its top-k indices have large overlaps with those of the vanilla attention; (3) It is robust w.r.t perturbations, i.e., any slight perturbation on SEAT will not change the prediction distribution too much, which implicitly indicates that it is stable to randomness and perturbations. Finally, through intensive experiments on various datasets, we compare our SEAT with other baseline methods using RNN, BiLSTM and BERT architectures via six different evaluation metrics for model interpretation, stability and accuracy. Results show that SEAT is more stable against different perturbations and randomness while also keeps the explainability of attention, which indicates it is a more faithful explanation. Moreover, compared with vanilla attention, there is almost no utility (accuracy) degradation for SEAT.

Title: Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes. (arXiv:2211.13638v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2211.13638
Code URL: null
Copy Paste: [[2211.13638] Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes](http://arxiv.org/abs/2211.13638) #robust
Summary:
In this paper, we move towards combining large parametric models with non-parametric prototypical networks. We propose prototypical fine-tuning, a novel prototypical framework for fine-tuning pretrained language models (LM), which automatically learns a bias to improve predictive performance for varying data sizes, especially low-resource settings. Our prototypical fine-tuning approach can automatically adjust the model capacity according to the number of data points and the model's inherent attributes. Moreover, we propose four principles for effective prototype fine-tuning towards the optimal solution. Experimental results across various datasets show that our work achieves significant performance improvements under various low-resource settings, as well as comparable and usually better performances in high-resource scenarios.

Title: Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality?. (arXiv:2211.13865v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2211.13865
Code URL: https://github.com/xiaoyi0814/canmt
Copy Paste: [[2211.13865] Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality?](http://arxiv.org/abs/2211.13865) #robust
Summary:
Neural machine translation (NMT) is often criticized for failures that happen without awareness. The lack of competency awareness makes NMT untrustworthy. This is in sharp contrast to human translators who give feedback or conduct further investigations whenever they are in doubt about predictions. To fill this gap, we propose a novel competency-aware NMT by extending conventional NMT with a self-estimator, offering abilities to translate a source sentence and estimate its competency. The self-estimator encodes the information of the decoding procedure and then examines whether it can reconstruct the original semantics of the source sentence. Experimental results on four translation tasks demonstrate that the proposed method not only carries out translation tasks intact but also delivers outstanding performance on quality estimation. Without depending on any reference or annotated data typically required by state-of-the-art metric and quality estimation methods, our model yields an even higher correlation with human quality judgments than a variety of aforementioned methods, such as BLEURT, COMET, and BERTScore. Quantitative and qualitative analyses show better robustness of competency awareness in our model.

Title: Lempel-Ziv Networks. (arXiv:2211.13250v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13250
Code URL: null
Copy Paste: [[2211.13250] Lempel-Ziv Networks](http://arxiv.org/abs/2211.13250) #robust
Summary:
Sequence processing has long been a central area of machine learning research. Recurrent neural nets have been successful in processing sequences for a number of tasks; however, they are known to be both ineffective and computationally expensive when applied to very long sequences. Compression-based methods have demonstrated more robustness when processing such sequences -- in particular, an approach pairing the Lempel-Ziv Jaccard Distance (LZJD) with the k-Nearest Neighbor algorithm has shown promise on long sequence problems (up to $T=200,000,000$ steps) involving malware classification. Unfortunately, use of LZJD is limited to discrete domains. To extend the benefits of LZJD to a continuous domain, we investigate the effectiveness of a deep-learning analog of the algorithm, the Lempel-Ziv Network. While we achieve successful proof of concept, we are unable to improve meaningfully on the performance of a standard LSTM across a variety of datasets and sequence processing tasks. In addition to presenting this negative result, our work highlights the problem of sub-par baseline tuning in newer research areas.

Title: Multiple Imputation with Neural Network Gaussian Process for High-dimensional Incomplete Data. (arXiv:2211.13297v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13297
Code URL: null
Copy Paste: [[2211.13297] Multiple Imputation with Neural Network Gaussian Process for High-dimensional Incomplete Data](http://arxiv.org/abs/2211.13297) #robust
Summary:
Missing data are ubiquitous in real world applications and, if not adequately handled, may lead to the loss of information and biased findings in downstream analysis. Particularly, high-dimensional incomplete data with a moderate sample size, such as analysis of multi-omics data, present daunting challenges. Imputation is arguably the most popular method for handling missing data, though existing imputation methods have a number of limitations. Single imputation methods such as matrix completion methods do not adequately account for imputation uncertainty and hence would yield improper statistical inference. In contrast, multiple imputation (MI) methods allow for proper inference but existing methods do not perform well in high-dimensional settings. Our work aims to address these significant methodological gaps, leveraging recent advances in neural network Gaussian process (NNGP) from a Bayesian viewpoint. We propose two NNGP-based MI methods, namely MI-NNGP, that can apply multiple imputations for missing values from a joint (posterior predictive) distribution. The MI-NNGP methods are shown to significantly outperform existing state-of-the-art methods on synthetic and real datasets, in terms of imputation error, statistical inference, robustness to missing rates, and computation costs, under three missing data mechanisms, MCAR, MAR, and MNAR.

Title: CoMadOut -- A Robust Outlier Detection Algorithm based on CoMAD. (arXiv:2211.13314v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13314
Code URL: null
Copy Paste: [[2211.13314] CoMadOut -- A Robust Outlier Detection Algorithm based on CoMAD](http://arxiv.org/abs/2211.13314) #robust
Summary:
Unsupervised learning methods are well established in the area of anomaly detection and achieve state of the art performances on outlier data sets. Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given data set. Especially among PCA-based methods, outliers have an additional destructive potential regarding the result: they may not only distort the orientation and translation of the principal components, they also make it more complicated to detect outliers. To address this problem, we propose the robust outlier detection algorithm CoMadOut, which satisfies two required properties: (1) being robust towards outliers and (2) detecting them. Our outlier detection method using coMAD-PCA defines dependent on its variant an inlier region with a robust noise margin by measures of in-distribution (ID) and out-of-distribution (OOD). These measures allow distribution based outlier scoring for each principal component, and thus, for an appropriate alignment of the decision boundary between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep and other comparable robust outlier detection methods showed that the performance of the introduced CoMadOut approach is competitive to well established methods related to average precision (AP), recall and area under the receiver operating characteristic (AUROC) curve. In summary our approach can be seen as a robust alternative for outlier detection tasks.

Title: Group SELFIES: A Robust Fragment-Based Molecular String Representation. (arXiv:2211.13322v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13322
Code URL: https://github.com/aspuru-guzik-group/group-selfies
Copy Paste: [[2211.13322] Group SELFIES: A Robust Fragment-Based Molecular String Representation](http://arxiv.org/abs/2211.13322) #robust
Summary:
We introduce Group SELFIES, a molecular string representation that leverages group tokens to represent functional groups or entire substructures while maintaining chemical robustness guarantees. Molecular string representations, such as SMILES and SELFIES, serve as the basis for molecular generation and optimization in chemical language models, deep generative models, and evolutionary methods. While SMILES and SELFIES leverage atomic representations, Group SELFIES builds on top of the chemical robustness guarantees of SELFIES by enabling group tokens, thereby creating additional flexibility to the representation. Moreover, the group tokens in Group SELFIES can take advantage of inductive biases of molecular fragments that capture meaningful chemical motifs. The advantages of capturing chemical motifs and flexibility are demonstrated in our experiments, which show that Group SELFIES improves distribution learning of common molecular datasets. Further experiments also show that random sampling of Group SELFIES strings improves the quality of generated molecules compared to regular SELFIES strings. Our open-source implementation of Group SELFIES is available online, which we hope will aid future research in molecular generation and optimization.

Title: Robustness Analysis of Deep Learning Models for Population Synthesis. (arXiv:2211.13339v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13339
Code URL: null
Copy Paste: [[2211.13339] Robustness Analysis of Deep Learning Models for Population Synthesis](http://arxiv.org/abs/2211.13339) #robust
Summary:
Deep generative models have become useful for synthetic data generation, particularly population synthesis. The models implicitly learn the probability distribution of a dataset and can draw samples from a distribution. Several models have been proposed, but their performance is only tested on a single cross-sectional sample. The implementation of population synthesis on single datasets is seen as a drawback that needs further studies to explore the robustness of the models on multiple datasets. While comparing with the real data can increase trust and interpretability of the models, techniques to evaluate deep generative models' robustness for population synthesis remain underexplored. In this study, we present bootstrap confidence interval for the deep generative models, an approach that computes efficient confidence intervals for mean errors predictions to evaluate the robustness of the models to multiple datasets. Specifically, we adopt the tabular-based Composite Travel Generative Adversarial Network (CTGAN) and Variational Autoencoder (VAE), to estimate the distribution of the population, by generating agents that have tabular data using several samples over time from the same study area. The models are implemented on multiple travel diaries of Montreal Origin- Destination Survey of 2008, 2013, and 2018 and compare the predictive performance under varying sample sizes from multiple surveys. Results show that the predictive errors of CTGAN have narrower confidence intervals indicating its robustness to multiple datasets of the varying sample sizes when compared to VAE. Again, the evaluation of model robustness against varying sample size shows a minimal decrease in model performance with decrease in sample size. This study directly supports agent-based modelling by enabling finer synthetic generation of populations in a reliable environment.

Title: Lifting Weak Supervision To Structured Prediction. (arXiv:2211.13375v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13375
Code URL: https://github.com/sprocketlab/ws-structured-prediction
Copy Paste: [[2211.13375] Lifting Weak Supervision To Structured Prediction](http://arxiv.org/abs/2211.13375) #robust
Summary:
Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources. WS is theoretically well understood for binary classification, where simple approaches enable consistent estimation of pseudolabel noise rates. Using this result, it has been shown that downstream models trained on the pseudolabels have generalization guarantees nearly identical to those trained on clean labels. While this is exciting, users often wish to use WS for structured prediction, where the output space consists of more than a binary or multi-class label set: e.g. rankings, graphs, manifolds, and more. Do the favorable theoretical properties of WS for binary classification lift to this setting? We answer this question in the affirmative for a wide range of scenarios. For labels taking values in a finite metric space, we introduce techniques new to weak supervision based on pseudo-Euclidean embeddings and tensor decompositions, providing a nearly-consistent noise rate estimator. For labels in constant-curvature Riemannian manifolds, we introduce new invariants that also yield consistent noise rate estimation. In both cases, when using the resulting pseudolabels in concert with a flexible downstream model, we obtain generalization guarantees nearly identical to those for models trained on clean data. Several of our results, which can be viewed as robustness guarantees in structured prediction with noisy labels, may be of independent interest. Empirical evaluation validates our claims and shows the merits of the proposed method.

Title: Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels. (arXiv:2211.13606v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13606
Code URL: null
Copy Paste: [[2211.13606] Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels](http://arxiv.org/abs/2211.13606) #robust
Summary:
Artificial intelligence (AI) methods are revolutionizing medical image analysis. However, robust AI models require large multi-site datasets for training. While multiple stakeholders have provided publicly available datasets, the ways in which these data are labeled differ widely. For example, one dataset of chest radiographs might contain labels denoting the presence of metastases in the lung, while another dataset of chest radiograph might focus on the presence of pneumonia. With conventional approaches, these data cannot be used together to train a single AI model. We propose a new framework that we call flexible federated learning (FFL) for collaborative training on such data. Using publicly available data of 695,000 chest radiographs from five institutions - each with differing labels - we demonstrate that large and heterogeneously labeled datasets can be used to train one big AI model with this framework. We find that models trained with FFL are superior to models that are trained on matching annotations only. This may pave the way for training of truly large-scale AI models that make efficient use of all existing data.

biometric

Title: Quality-Based Conditional Processing in Multi-Biometrics: Application to Sensor Interoperability. (arXiv:2211.13554v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13554
Code URL: null
Copy Paste: [[2211.13554] Quality-Based Conditional Processing in Multi-Biometrics: Application to Sensor Interoperability](http://arxiv.org/abs/2211.13554) #biometric
Summary:
As biometric technology is increasingly deployed, it will be common to replace parts of operational systems with newer designs. The cost and inconvenience of reacquiring enrolled users when a new vendor solution is incorporated makes this approach difficult and many applications will require to deal with information from different sources regularly. These interoperability problems can dramatically affect the performance of biometric systems and thus, they need to be overcome. Here, we describe and evaluate the ATVS-UAM fusion approach submitted to the quality-based evaluation of the 2007 BioSecure Multimodal Evaluation Campaign, whose aim was to compare fusion algorithms when biometric signals were generated using several biometric devices in mismatched conditions. Quality measures from the raw biometric data are available to allow system adjustment to changing quality conditions due to device changes. This system adjustment is referred to as quality-based conditional processing. The proposed fusion approach is based on linear logistic regression, in which fused scores tend to be log-likelihood-ratios. This allows the easy and efficient combination of matching scores from different devices assuming low dependence among modalities. In our system, quality information is used to switch between different system modules depending on the data source (the sensor in our case) and to reject channels with low quality data during the fusion. We compare our fusion approach to a set of rule-based fusion schemes over normalized scores. Results show that the proposed approach outperforms all the rule-based fusion schemes. We also show that with the quality-based channel rejection scheme, an overall improvement of 25% in the equal error rate is obtained.

Title: Fingerprint Image-Quality Estimation and its Application to Multialgorithm Verification. (arXiv:2211.13557v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13557
Code URL: null
Copy Paste: [[2211.13557] Fingerprint Image-Quality Estimation and its Application to Multialgorithm Verification](http://arxiv.org/abs/2211.13557) #biometric
Summary:
Signal-quality awareness has been found to increase recognition rates and to support decisions in multisensor environments significantly. Nevertheless, automatic quality assessment is still an open issue. Here, we study the orientation tensor of fingerprint images to quantify signal impairments, such as noise, lack of structure, blur, with the help of symmetry descriptors. A strongly reduced reference is especially favorable in biometrics, but less information is not sufficient for the approach. This is also supported by numerous experiments involving a simpler quality estimator, a trained method (NFIQ), as well as the human perception of fingerprint quality on several public databases. Furthermore, quality measurements are extensively reused to adapt fusion parameters in a monomodal multialgorithm fingerprint recognition environment. In this study, several trained and nontrained score-level fusion schemes are investigated. A Bayes-based strategy for incorporating experts past performances and current quality conditions, a novel cascaded scheme for computational efficiency, besides simple fusion rules, is presented. The quantitative results favor quality awareness under all aspects, boosting recognition rates and fusing differently skilled experts efficiently as well as effectively (by training).

Title: AFR-Net: Attention-Driven Fingerprint Recognition Network. (arXiv:2211.13897v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13897
Code URL: null
Copy Paste: [[2211.13897] AFR-Net: Attention-Driven Fingerprint Recognition Network](http://arxiv.org/abs/2211.13897) #biometric
Summary:
The use of vision transformers (ViT) in computer vision is increasing due to limited inductive biases (e.g., locality, weight sharing, etc.) and increased scalability compared to other deep learning methods (e.g., convolutional neural networks (CNN)). This has led to some initial studies on the use of ViT for biometric recognition, including fingerprint recognition. In this work, we improve on these initial studies for transformers in fingerprint recognition by i.) evaluating additional attention-based architectures in addition to vanilla ViT, ii.) scaling to larger and more diverse training and evaluation datasets, and iii.) combining the complimentary representations of attention-based and CNN-based embeddings for improved state-of-the-art (SOTA) fingerprint recognition for both authentication (1:1 comparisons) and identification (1:N comparisions). Our combined architecture, AFR-Net (Attention-Driven Fingerprint Recognition Network), outperforms several baseline transformer and CNN-based models, including a SOTA commercial fingerprint system, Verifinger v12.3, across many intra-sensor, cross-sensor (including contact to contactless), and latent to rolled fingerprint matching datasets. Additionally, we propose a realignment strategy using local embeddings extracted from intermediate feature maps within the networks to refine the global embeddings in low certainty situations, which boosts the overall recognition accuracy significantly for all the evaluations across each of the models. This realignment strategy requires no additional training and can be applied as a wrapper to any existing deep learning network (including attention-based, CNN-based, or both) to boost its performance.

steal

extraction

Title: Attention-based Feature Compression for CNN Inference Offloading in Edge Computing. (arXiv:2211.13745v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13745
Code URL: null
Copy Paste: [[2211.13745] Attention-based Feature Compression for CNN Inference Offloading in Edge Computing](http://arxiv.org/abs/2211.13745) #extraction
Summary:
This paper studies the computational offloading of CNN inference in device-edge co-inference systems. Inspired by the emerging paradigm semantic communication, we propose a novel autoencoder-based CNN architecture (AECNN), for effective feature extraction at end-device. We design a feature compression module based on the channel attention method in CNN, to compress the intermediate data by selecting the most important features. To further reduce communication overhead, we can use entropy encoding to remove the statistical redundancy in the compressed data. At the receiver, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy. To fasten the convergence, we use a step-by-step approach to train the neural networks obtained based on ResNet-50 architecture. Experimental results show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss, which outperforms the state-of-the-art work, BottleNet++. Compared to offloading inference task directly to edge server, AECNN can complete inference task earlier, in particular, under poor wireless channel condition, which highlights the effectiveness of AECNN in guaranteeing higher accuracy within time constraint.

Title: ReFace: Improving Clothes-Changing Re-Identification With Face Features. (arXiv:2211.13807v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13807
Code URL: https://github.com/bar371/ReFace
Copy Paste: [[2211.13807] ReFace: Improving Clothes-Changing Re-Identification With Face Features](http://arxiv.org/abs/2211.13807) #extraction
Summary:
Person re-identification (ReID) has been an active research field for many years. Despite that, models addressing this problem tend to perform poorly when the task is to re-identify the same people over a prolonged time, due to appearance changes such as different clothes and hairstyles. In this work, we introduce a new method that takes full advantage of the ability of existing ReID models to extract appearance-related features and combines it with a face feature extraction model to achieve new state-of-the-art results, both on image-based and video-based benchmarks. Moreover, we show how our method could be used for an application in which multiple people of interest, under clothes-changing settings, should be re-identified given an unseen video and a limited amount of labeled data. We claim that current ReID benchmarks do not represent such real-world scenarios, and publish a new dataset, 42Street, based on a theater play as an example of such an application. We show that our proposed method outperforms existing models also on this dataset while using only pre-trained modules and without any further training.

Title: Detecting Entities in the Astrophysics Literature: A Comparison of Word-based and Span-based Entity Recognition Methods. (arXiv:2211.13819v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2211.13819
Code URL: null
Copy Paste: [[2211.13819] Detecting Entities in the Astrophysics Literature: A Comparison of Word-based and Span-based Entity Recognition Methods](http://arxiv.org/abs/2211.13819) #extraction
Summary:
Information Extraction from scientific literature can be challenging due to the highly specialised nature of such text. We describe our entity recognition methods developed as part of the DEAL (Detecting Entities in the Astrophysics Literature) shared task. The aim of the task is to build a system that can identify Named Entities in a dataset composed by scholarly articles from astrophysics literature. We planned our participation such that it enables us to conduct an empirical comparison between word-based tagging and span-based classification methods. When evaluated on two hidden test sets provided by the organizer, our best-performing submission achieved $F_1$ scores of 0.8307 (validation phase) and 0.7990 (testing phase).

Title: Learning with Silver Standard Data for Zero-shot Relation Extraction. (arXiv:2211.13883v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2211.13883
Code URL: null
Copy Paste: [[2211.13883] Learning with Silver Standard Data for Zero-shot Relation Extraction](http://arxiv.org/abs/2211.13883) #extraction
Summary:
The superior performance of supervised relation extraction (RE) methods heavily relies on a large amount of gold standard data. Recent zero-shot relation extraction methods converted the RE task to other NLP tasks and used off-the-shelf models of these NLP tasks to directly perform inference on the test data without using a large amount of RE annotation data. A potentially valuable by-product of these methods is the large-scale silver standard data. However, there is no further investigation on the use of potentially valuable silver standard data. In this paper, we propose to first detect a small amount of clean data from silver standard data and then use the selected clean data to finetune the pretrained model. We then use the finetuned model to infer relation types. We also propose a class-aware clean data detection module to consider class information when selecting clean data. The experimental results show that our method can outperform the baseline by 12% and 11% on TACRED and Wiki80 dataset in the zero-shot RE task. By using extra silver standard data of different distributions, the performance can be further improved.

Title: MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts. (arXiv:2211.13896v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2211.13896
Code URL: https://github.com/myeclipse/musied
Copy Paste: [[2211.13896] MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts](http://arxiv.org/abs/2211.13896) #extraction
Summary:
Event detection (ED) identifies and classifies event triggers from unstructured texts, serving as a fundamental task for information extraction. Despite the remarkable progress achieved in the past several years, most research efforts focus on detecting events from formal texts (e.g., news articles, Wikipedia documents, financial announcements). Moreover, the texts in each dataset are either from a single source or multiple yet relatively homogeneous sources. With massive amounts of user-generated text accumulating on the Web and inside enterprises, identifying meaningful events in these informal texts, usually from multiple heterogeneous sources, has become a problem of significant practical value. As a pioneering exploration that expands event detection to the scenarios involving informal and heterogeneous texts, we propose a new large-scale Chinese event detection dataset based on user reviews, text conversations, and phone conversations in a leading e-commerce platform for food service. We carefully investigate the proposed dataset's textual informality and multi-source heterogeneity characteristics by inspecting data samples quantitatively and qualitatively. Extensive experiments with state-of-the-art event detection methods verify the unique challenges posed by these characteristics, indicating that multi-source informal event detection remains an open problem and requires further efforts. Our benchmark and code are released at \url{https://github.com/myeclipse/MUSIED}.

membership infer

federate

Title: Knowledge-Aware Federated Active Learning with Non-IID Data. (arXiv:2211.13579v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13579
Code URL: https://github.com/anonydoe/knowledge-aware-federated-active-learning-with-non-iid-data
Copy Paste: [[2211.13579] Knowledge-Aware Federated Active Learning with Non-IID Data](http://arxiv.org/abs/2211.13579) #federate
Summary:
Federated learning enables multiple decentralized clients to learn collaboratively without sharing the local training data. However, the expensive annotation cost to acquire data labels on local clients remains an obstacle in utilizing local data. In this paper, we propose a federated active learning paradigm to efficiently learn a global model with limited annotation budget while protecting data privacy in a decentralized learning way. The main challenge faced by federated active learning is the mismatch between the active sampling goal of the global model on the server and that of the asynchronous local clients. This becomes even more significant when data is distributed non-IID across local clients. To address the aforementioned challenge, we propose Knowledge-Aware Federated Active Learning (KAFAL), which consists of Knowledge-Specialized Active Sampling (KSAS) and Knowledge-Compensatory Federated Update (KCFU). KSAS is a novel active sampling method tailored for the federated active learning problem. It deals with the mismatch challenge by sampling actively based on the discrepancies between local and global models. KSAS intensifies specialized knowledge in local clients, ensuring the sampled data to be informative for both the local clients and the global model. KCFU, in the meantime, deals with the client heterogeneity caused by limited data and non-IID data distributions. It compensates for each client's ability in weak classes by the assistance of the global model. Extensive experiments and analyses are conducted to show the superiority of KSAS over the state-of-the-art active learning methods and the efficiency of KCFU under the federated active learning framework.

Title: Federated Learning Hyper-Parameter Tuning from a System Perspective. (arXiv:2211.13656v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13656
Code URL: https://github.com/datasystech/fedtune
Copy Paste: [[2211.13656] Federated Learning Hyper-Parameter Tuning from a System Perspective](http://arxiv.org/abs/2211.13656) #federate
Summary:
Federated learning (FL) is a distributed model training paradigm that preserves clients' data privacy. It has gained tremendous attention from both academia and industry. FL hyper-parameters (e.g., the number of selected clients and the number of training passes) significantly affect the training overhead in terms of computation time, transmission time, computation load, and transmission load. However, the current practice of manually selecting FL hyper-parameters imposes a heavy burden on FL practitioners because applications have different training preferences. In this paper, we propose FedTune, an automatic FL hyper-parameter tuning algorithm tailored to applications' diverse system requirements in FL training. FedTune iteratively adjusts FL hyper-parameters during FL training and can be easily integrated into existing FL systems. Through extensive evaluations of FedTune for diverse applications and FL aggregation algorithms, we show that FedTune is lightweight and effective, achieving 8.48%-26.75% system overhead reduction compared to using fixed FL hyper-parameters. This paper assists FL practitioners in designing high-performance FL training solutions. The source code of FedTune is available at https://github.com/DataSysTech/FedTune.

fair

interpretability

Title: MEGAN: Multi-Explanation Graph Attention Network. (arXiv:2211.13236v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13236
Code URL: https://github.com/awa59kst120df/graph_attention_student
Copy Paste: [[2211.13236] MEGAN: Multi-Explanation Graph Attention Network](http://arxiv.org/abs/2211.13236) #interpretability
Summary:
Explainable artificial intelligence (XAI) methods are expected to improve trust during human-AI interactions, provide tools for model analysis and extend human understanding of complex problems. Explanation-supervised training allows to improve explanation quality by training self-explaining XAI models on ground truth or human-generated explanations. However, existing explanation methods have limited expressiveness and interoperability due to the fact that only single explanations in form of node and edge importance are generated. To that end we propose the novel multi-explanation graph attention network (MEGAN). Our fully differentiable, attention-based model features multiple explanation channels, which can be chosen independently of the task specifications. We first validate our model on a synthetic graph regression dataset. We show that for the special single explanation case, our model significantly outperforms existing post-hoc and explanation-supervised baseline methods. Furthermore, we demonstrate significant advantages when using two explanations, both in quantitative explanation measures as well as in human interpretability. Finally, we demonstrate our model's capabilities on multiple real-world datasets. We find that our model produces sparse high-fidelity explanations consistent with human intuition about those tasks and at the same time matches state-of-the-art graph neural networks in predictive performance, indicating that explanations and accuracy are not necessarily a trade-off.

Title: Towards Interpretable Anomaly Detection via Invariant Rule Mining. (arXiv:2211.13577v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13577
Code URL: null
Copy Paste: [[2211.13577] Towards Interpretable Anomaly Detection via Invariant Rule Mining](http://arxiv.org/abs/2211.13577) #interpretability
Summary:
In the research area of anomaly detection, novel and promising methods are frequently developed. However, most existing studies, especially those leveraging deep neural networks, exclusively focus on the detection task only and ignore the interpretability of the underlying models as well as their detection results. However, anomaly interpretation, which aims to provide explanation of why specific data instances are identified as anomalies, is an equally (if not more) important task in many real-world applications. In this work, we pursue highly interpretable anomaly detection via invariant rule mining. Specifically, we leverage decision tree learning and association rule mining to automatically generate invariant rules that are consistently satisfied by the underlying data generation process. The generated invariant rules can provide explicit explanation of anomaly detection results and thus are extremely useful for subsequent decision-making. Furthermore, our empirical evaluation shows that the proposed method can also achieve comparable performance in terms of AUC and partial AUC with popular anomaly detection models in various benchmark datasets.

Title: ML Interpretability: Simple Isn't Easy. (arXiv:2211.13617v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13617
Code URL: null
Copy Paste: [[2211.13617] ML Interpretability: Simple Isn't Easy](http://arxiv.org/abs/2211.13617) #interpretability
Summary:
The interpretability of ML models is important, but it is not clear what it amounts to. So far, most philosophers have discussed the lack of interpretability of black-box models such as neural networks, and methods such as explainable AI that aim to make these models more transparent. The goal of this paper is to clarify the nature of interpretability by focussing on the other end of the 'interpretability spectrum'. The reasons why some models, linear models and decision trees, are highly interpretable will be examined, and also how more general models, MARS and GAM, retain some degree of interpretability. I find that while there is heterogeneity in how we gain interpretability, what interpretability is in particular cases can be explicated in a clear manner.

explainability

watermark

Title: Seeds Don't Lie: An Adaptive Watermarking Framework for Computer Vision Models. (arXiv:2211.13644v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13644
Code URL: null
Copy Paste: [[2211.13644] Seeds Don't Lie: An Adaptive Watermarking Framework for Computer Vision Models](http://arxiv.org/abs/2211.13644) #watermark
Summary:
In recent years, various watermarking methods were suggested to detect computer vision models obtained illegitimately from their owners, however they fail to demonstrate satisfactory robustness against model extraction attacks. In this paper, we present an adaptive framework to watermark a protected model, leveraging the unique behavior present in the model due to a unique random seed initialized during the model training. This watermark is used to detect extracted models, which have the same unique behavior, indicating an unauthorized usage of the protected model's intellectual property (IP). First, we show how an initial seed for random number generation as part of model training produces distinct characteristics in the model's decision boundaries, which are inherited by extracted models and present in their decision boundaries, but aren't present in non-extracted models trained on the same data-set with a different seed. Based on our findings, we suggest the Robust Adaptive Watermarking (RAW) Framework, which utilizes the unique behavior present in the protected and extracted models to generate a watermark key-set and verification model. We show that the framework is robust to (1) unseen model extraction attacks, and (2) extracted models which undergo a blurring method (e.g., weight pruning). We evaluate the framework's robustness against a naive attacker (unaware that the model is watermarked), and an informed attacker (who employs blurring strategies to remove watermarked behavior from an extracted model), and achieve outstanding (i.e., >0.9) AUC values. Finally, we show that the framework is robust to model extraction attacks with different structure and/or architecture than the protected model.

Title: CycleGANWM: A CycleGAN watermarking method for ownership verification. (arXiv:2211.13737v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2211.13737
Code URL: null
Copy Paste: [[2211.13737] CycleGANWM: A CycleGAN watermarking method for ownership verification](http://arxiv.org/abs/2211.13737) #watermark
Summary:
Due to the proliferation and widespread use of deep neural networks (DNN), their Intellectual Property Rights (IPR) protection has become increasingly important. This paper presents a novel model watermarking method for an unsupervised image-to-image translation (I2IT) networks, named CycleGAN, which leverage the image translation visual quality and watermark embedding. In this method, a watermark decoder is trained initially. Then the decoder is frozen and used to extract the watermark bits when training the CycleGAN watermarking model. The CycleGAN watermarking (CycleGANWM) is trained with specific loss functions and optimized to get a good performance on both I2IT task and watermark embedding. For watermark verification, this work uses statistical significance test to identify the ownership of the model from the extract watermark bits. We evaluate the robustness of the model against image post-processing and improve it by fine-tuning the model with adding data augmentation on the output images before extracting the watermark bits. We also carry out surrogate model attack under black-box access of the model. The experimental results prove that the proposed method is effective and robust to some image post-processing, and it is able to resist surrogate model attack.

diffusion

Title: HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising. (arXiv:2211.13287v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13287
Code URL: null
Copy Paste: [[2211.13287] HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising](http://arxiv.org/abs/2211.13287) #diffusion
Summary:
The paper presents a novel approach for vector-floorplan generation via a diffusion model, which denoises 2D coordinates of room/door corners with two inference objectives: 1) a single-step noise as the continuous quantity to precisely invert the continuous forward process; and 2) the final 2D coordinate as the discrete quantity to establish geometric incident relationships such as parallelism, orthogonality, and corner-sharing. Our task is graph-conditioned floorplan generation, a common workflow in floorplan design. We represent a floorplan as 1D polygonal loops, each of which corresponds to a room or a door. Our diffusion model employs a Transformer architecture at the core, which controls the attention masks based on the input graph-constraint and directly generates vector-graphics floorplans via a discrete and continuous denoising process. We have evaluated our approach on RPLAN dataset. The proposed approach makes significant improvements in all the metrics against the state-of-the-art with significant margins, while being capable of generating non-Manhattan structures and controlling the exact number of corners per room. A project website with supplementary video and document is here https://aminshabani.github.io/housediffusion.

Title: Make-A-Story: Visual Memory Conditioned Consistent Story Generation. (arXiv:2211.13319v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13319
Code URL: null
Copy Paste: [[2211.13319] Make-A-Story: Visual Memory Conditioned Consistent Story Generation](http://arxiv.org/abs/2211.13319) #diffusion
Summary:
There has been a recent explosion of impressive generative models that can produce high quality images (or videos) conditioned on text descriptions. However, all such approaches rely on conditional sentences that contain unambiguous descriptions of scenes and main actors in them. Therefore employing such models for more complex task of story visualization, where naturally references and co-references exist, and one requires to reason about when to maintain consistency of actors and backgrounds across frames/scenes, and when not to, based on story progression, remains a challenge. In this work, we address the aforementioned challenges and propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context across the generated frames. Sentence-conditioned soft attention over the memories enables effective reference resolution and learns to maintain scene and actor consistency when needed. To validate the effectiveness of our approach, we extend the MUGEN dataset and introduce additional characters, backgrounds and referencing in multi-sentence storylines. Our experiments for story generation on the MUGEN and the FlintstonesSV dataset show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality, which are consistent with the story, but also models appropriate correspondences between the characters and the background.

Title: Fast Sampling of Diffusion Models via Operator Learning. (arXiv:2211.13449v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13449
Code URL: null
Copy Paste: [[2211.13449] Fast Sampling of Diffusion Models via Operator Learning](http://arxiv.org/abs/2211.13449) #diffusion
Summary:
Diffusion models have found widespread adoption in various areas. However, sampling from them is slow because it involves emulating a reverse process with hundreds-to-thousands of network evaluations. Inspired by the success of neural operators in accelerating differential equations solving, we approach this problem by solving the underlying neural differential equation from an operator learning perspective. We examine probability flow ODE trajectories in diffusion models and observe a compact energy spectrum that can be learned efficiently in Fourier space. With this insight, we propose diffusion Fourier neural operator (DFNO) with temporal convolution in Fourier space to parameterize the operator that maps initial condition to the solution trajectory, which is a continuous function in time. DFNO can be applied to any diffusion model and generate high-quality samples in one model forward call. Our method achieves the state-of-the-art FID of 4.72 on CIFAR-10 using only one model evaluation.

Title: Sketch-Guided Text-to-Image Diffusion Models. (arXiv:2211.13752v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13752
Code URL: null
Copy Paste: [[2211.13752] Sketch-Guided Text-to-Image Diffusion Models](http://arxiv.org/abs/2211.13752) #diffusion
Summary:
Text-to-Image models have introduced a remarkable leap in the evolution of machine learning, demonstrating high-quality synthesis of images from a given text-prompt. However, these powerful pretrained models still lack control handles that can guide spatial properties of the synthesized images. In this work, we introduce a universal approach to guide a pretrained text-to-image diffusion model, with a spatial map from another domain (e.g., sketch) during inference time. Unlike previous works, our method does not require to train a dedicated model or a specialized encoder for the task. Our key idea is to train a Latent Guidance Predictor (LGP) - a small, per-pixel, Multi-Layer Perceptron (MLP) that maps latent features of noisy images to spatial maps, where the deep features are extracted from the core Denoising Diffusion Probabilistic Model (DDPM) network. The LGP is trained only on a few thousand images and constitutes a differential guiding map predictor, over which the loss is computed and propagated back to push the intermediate images to agree with the spatial map. The per-pixel training offers flexibility and locality which allows the technique to perform well on out-of-domain sketches, including free-hand style drawings. We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images that follow the guidance of a sketch of arbitrary style or domain. Project page: sketch-guided-diffusion.github.io

Title: DiffusionSDF: Conditional Generative Modeling of Signed Distance Functions. (arXiv:2211.13757v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2211.13757
Code URL: null
Copy Paste: [[2211.13757] DiffusionSDF: Conditional Generative Modeling of Signed Distance Functions](http://arxiv.org/abs/2211.13757) #diffusion
Summary:
Probabilistic diffusion models have achieved state-of-the-art results for image synthesis, inpainting, and text-to-image tasks. However, they are still in the early stages of generating complex 3D shapes. This work proposes DiffusionSDF, a generative model for shape completion, single-view reconstruction, and reconstruction of real-scanned point clouds. We use neural signed distance functions (SDFs) as our 3D representation to parameterize the geometry of various signals (e.g., point clouds, 2D images) through neural networks. Neural SDFs are implicit functions and diffusing them amounts to learning the reversal of their neural network weights, which we solve using a custom modulation module. Extensive experiments show that our method is capable of both realistic unconditional generation and conditional generation from partial inputs. This work expands the domain of diffusion models from learning 2D, explicit representations, to 3D, implicit representations.

Title: Design of Turing Systems with Physics-Informed Neural Networks. (arXiv:2211.13464v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2211.13464
Code URL: null
Copy Paste: [[2211.13464] Design of Turing Systems with Physics-Informed Neural Networks](http://arxiv.org/abs/2211.13464) #diffusion
Summary:
Reaction-diffusion (Turing) systems are fundamental to the formation of spatial patterns in nature and engineering. These systems are governed by a set of non-linear partial differential equations containing parameters that determine the rate of constituent diffusion and reaction. Critically, these parameters, such as diffusion coefficient, heavily influence the mode and type of the final pattern, and quantitative characterization and knowledge of these parameters can aid in bio-mimetic design or understanding of real-world systems. However, the use of numerical methods to infer these parameters can be difficult and computationally expensive. Typically, adjoint solvers may be used, but they are frequently unstable for very non-linear systems. Alternatively, massive amounts of iterative forward simulations are used to find the best match, but this is extremely effortful. Recently, physics-informed neural networks have been proposed as a means for data-driven discovery of partial differential equations, and have seen success in various applications. Thus, we investigate the use of physics-informed neural networks as a tool to infer key parameters in reaction-diffusion systems in the steady-state for scientific discovery or design. Our proof-of-concept results show that the method is able to infer parameters for different pattern modes and types with errors of less than 10\%. In addition, the stochastic nature of this method can be exploited to provide multiple parameter alternatives to the desired pattern, highlighting the versatility of this method for bio-mimetic design. This work thus demonstrates the utility of physics-informed neural networks for inverse parameter inference of reaction-diffusion systems to enhance scientific discovery and design.