[[2207.08022] Monitoring Vegetation From Space at Extremely Fine Resolutions via Coarsely-Supervised Smooth U-Net](http://arxiv.org/abs/2207.08022)
Monitoring vegetation productivity at extremely fine resolutions is valuable for real-world agricultural applications, such as detecting crop stress and providing early warning of food insecurity. Solar-Induced Chlorophyll Fluorescence (SIF) provides a promising way to directly measure plant productivity from space. However, satellite SIF observations are only available at a coarse spatial resolution, making it impossible to monitor how individual crop types or farms are doing. This poses a challenging coarsely-supervised regression (or downscaling) task; at training time, we only have SIF labels at a coarse resolution (3km), but we want to predict SIF at much finer spatial resolutions (e.g. 30m, a 100x increase). We also have additional fine-resolution input features, but the relationship between these features and SIF is unknown. To address this, we propose Coarsely-Supervised Smooth U-Net (CS-SUNet), a novel method for this coarse supervision setting. CS-SUNet combines the expressive power of deep convolutional networks with novel regularization methods based on prior knowledge (such as a smoothness loss) that are crucial for preventing overfitting. Experiments show that CS-SUNet resolves fine-grained variations in SIF more accurately than existing methods.
[[2207.07738] A Comprehensive Survey on the Cyber-Security of Smart Grids: Cyber-Attacks, Detection, Countermeasure Techniques, and Future Directions](http://arxiv.org/abs/2207.07738)
One of the significant challenges that smart grid networks face is cyber-security. Several studies have been conducted to highlight those security challenges. However, the majority of these surveys classify attacks based on the security requirements, confidentiality, integrity, and availability, without taking into consideration the accountability requirement. In addition, some of these surveys focused on the Transmission Control Protocol/Internet Protocol (TCP/IP) model, which does not differentiate between the application, session, and presentation and the data link and physical layers of the Open System Interconnection (OSI) model. In this survey paper, we provide a classification of attacks based on the OSI model and discuss in more detail the cyber-attacks that can target the different layers of smart grid networks communication. We also propose new classifications for the detection and countermeasure techniques and describe existing techniques under each category. Finally, we discuss challenges and future research directions.
[[2207.08105] Mobile Security for the modern CEO: Attacks, Mitigations, and Future Trends](http://arxiv.org/abs/2207.08105)
Todays world is digital, global, and interconnected and mobile devices are at the heart of modern communications in business, politics, and civil society. However, cyber threats are an omnipresent reality in our hyper-connected world. The world economic forum ranks cyber threats consistently among the global top security risks. Attacks on mobile devices grow yearly in volume and magnitude causing severe damage. This paper offers a comprehensive overview of modern mobile attacks categorized into malware, phishing, communication, supply chain, physical, and authentication attacks, including a section on mitigations and limitations. It also provides security design tips to secure the mobile setup and general recommendations to prevent the successful execution of an incoming attack. The last section highlights future technology trends and how those will impact and change the mobile security landscape in the future.
[[2207.08109] Security Evaluation of Compressible Image Encryption for Privacy-Preserving Image Classification against Ciphertext-only Attacks](http://arxiv.org/abs/2207.08109)
The security of learnable image encryption schemes for image classification using deep neural networks against several attacks has been discussed. On the other hand, block scrambling image encryption using the vision transformer has been proposed, which applies to lossless compression methods such as JPEG standard by dividing an image into permuted blocks. Although robustness of the block scrambling image encryption against jigsaw puzzle solver attacks that utilize a correlation among the blocks has been evaluated under the condition of a large number of encrypted blocks, the security of encrypted images with a small number of blocks has never been evaluated. In this paper, the security of the block scrambling image encryption against ciphertext-only attacks is evaluated by using jigsaw puzzle solver attacks.
[[2207.07697] POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging](http://arxiv.org/abs/2207.07697)
Fine-tuning models on edge devices like mobile phones would enable privacy-preserving personalization over sensitive data. However, edge training has historically been limited to relatively small models with simple architectures because training is both memory and energy intensive. We present POET, an algorithm to enable training large neural networks on memory-scarce battery-operated edge devices. POET jointly optimizes the integrated search search spaces of rematerialization and paging, two algorithms to reduce the memory consumption of backpropagation. Given a memory budget and a run-time constraint, we formulate a mixed-integer linear program (MILP) for energy-optimal training. Our approach enables training significantly larger models on embedded devices while reducing energy consumption while not modifying mathematical correctness of backpropagation. We demonstrate that it is possible to fine-tune both ResNet-18 and BERT within the memory constraints of a Cortex-M class embedded device while outperforming current edge training methods in energy efficiency. POET is an open-source project available at https://github.com/ShishirPatil/poet
[[2207.08124] Source-free Unsupervised Domain Adaptation for Blind Image Quality Assessment](http://arxiv.org/abs/2207.08124)
Existing learning-based methods for blind image quality assessment (BIQA) are heavily dependent on large amounts of annotated training data, and usually suffer from a severe performance degradation when encountering the domain/distribution shift problem. Thanks to the development of unsupervised domain adaptation (UDA), some works attempt to transfer the knowledge from a label-sufficient source domain to a label-free target domain under domain shift with UDA. However, it requires the coexistence of source and target data, which might be impractical for source data due to the privacy or storage issues. In this paper, we take the first step towards the source-free unsupervised domain adaptation (SFUDA) in a simple yet efficient manner for BIQA to tackle the domain shift without access to the source data. Specifically, we cast the quality assessment task as a rating distribution prediction problem. Based on the intrinsic properties of BIQA, we present a group of well-designed self-supervised objectives to guide the adaptation of the BN affine parameters towards the target domain. Among them, minimizing the prediction entropy and maximizing the batch prediction diversity aim to encourage more confident results while avoiding the trivial solution. Besides, based on the observation that the IQA rating distribution of single image follows the Gaussian distribution, we apply Gaussian regularization to the predicted rating distribution to make it more consistent with the nature of human scoring. Extensive experimental results under cross-domain scenarios demonstrated the effectiveness of our proposed method to mitigate the domain shift.
[[2207.07816] Sotto Voce: Federated Speech Recognition with Differential Privacy Guarantees](http://arxiv.org/abs/2207.07816)
Speech data is expensive to collect, and incredibly sensitive to its sources. It is often the case that organizations independently collect small datasets for their own use, but often these are not performant for the demands of machine learning. Organizations could pool these datasets together and jointly build a strong ASR system; sharing data in the clear, however, comes with tremendous risk, in terms of intellectual property loss as well as loss of privacy of the individuals who exist in the dataset. In this paper, we offer a potential solution for learning an ML model across multiple organizations where we can provide mathematical guarantees limiting privacy loss. We use a Federated Learning approach built on a strong foundation of Differential Privacy techniques. We apply these to a senone classification prototype and demonstrate that the model improves with the addition of private data while still respecting privacy.
[[2207.07721] FLIP: A Utility Preserving Privacy Mechanism for Time Series](http://arxiv.org/abs/2207.07721)
Guaranteeing privacy in released data is an important goal for data-producing agencies. There has been extensive research on developing suitable privacy mechanisms in recent years. Particularly notable is the idea of noise addition with the guarantee of differential privacy. There are, however, concerns about compromising data utility when very stringent privacy mechanisms are applied. Such compromises can be quite stark in correlated data, such as time series data. Adding white noise to a stochastic process may significantly change the correlation structure, a facet of the process that is essential to optimal prediction. We propose the use of all-pass filtering as a privacy mechanism for regularly sampled time series data, showing that this procedure preserves utility while also providing sufficient privacy guarantees to entity-level time series.
[[2207.07779] DeTrust-FL: Privacy-Preserving Federated Learning in Decentralized Trust Setting](http://arxiv.org/abs/2207.07779)
Federated learning has emerged as a privacy-preserving machine learning approach where multiple parties can train a single model without sharing their raw training data. Federated learning typically requires the utilization of multi-party computation techniques to provide strong privacy guarantees by ensuring that an untrusted or curious aggregator cannot obtain isolated replies from parties involved in the training process, thereby preventing potential inference attacks. Until recently, it was thought that some of these secure aggregation techniques were sufficient to fully protect against inference attacks coming from a curious aggregator. However, recent research has demonstrated that a curious aggregator can successfully launch a disaggregation attack to learn information about model updates of a target party. This paper presents DeTrust-FL, an efficient privacy-preserving federated learning framework for addressing the lack of transparency that enables isolation attacks, such as disaggregation attacks, during secure aggregation by assuring that parties' model updates are included in the aggregated model in a private and secure manner. DeTrust-FL proposes a decentralized trust consensus mechanism and incorporates a recently proposed decentralized functional encryption (FE) scheme in which all parties agree on a participation matrix before collaboratively generating decryption key fragments, thereby gaining control and trust over the secure aggregation process in a decentralized setting. Our experimental evaluation demonstrates that DeTrust-FL outperforms state-of-the-art FE-based secure multi-party aggregation solutions in terms of training time and reduces the volume of data transferred. In contrast to existing approaches, this is achieved without creating any trust dependency on external trusted entities.
[[2207.07964] A Parallel Privacy-Preserving Shortest Path Protocol from a Path Algebra Problem](http://arxiv.org/abs/2207.07964)
In this paper, we present a secure multiparty computation (SMC) protocol for single-source shortest distances (SSSD) in undirected graphs, where the location of edges is public, but their length is private. The protocol works in the Arithmetic Black Box (ABB) model on top of the separator tree of the graph, achieving good time complexity if the subgraphs of the graph have small separators (which is the case for e.g. planar graphs); the achievable parallelism is significantly higher than that of classical SSSD algorithms implemented on top of an ABB.
We implement our protocol on top of the Sharemind MPC platform, and perform extensive benchmarking over different network environments. We compare our algorithm against the baseline picked from classical algorithms - privacy-preserving Bellman-Ford algorithm (with public edges).
[[2207.07793] Towards the Desirable Decision Boundary by Moderate-Margin Adversarial Training](http://arxiv.org/abs/2207.07793)
Adversarial training, as one of the most effective defense methods against adversarial attacks, tends to learn an inclusive decision boundary to increase the robustness of deep learning models. However, due to the large and unnecessary increase in the margin along adversarial directions, adversarial training causes heavy cross-over between natural examples and adversarial examples, which is not conducive to balancing the trade-off between robustness and natural accuracy. In this paper, we propose a novel adversarial training scheme to achieve a better trade-off between robustness and natural accuracy. It aims to learn a moderate-inclusive decision boundary, which means that the margins of natural examples under the decision boundary are moderate. We call this scheme Moderate-Margin Adversarial Training (MMAT), which generates finer-grained adversarial examples to mitigate the cross-over problem. We also take advantage of logits from a teacher model that has been well-trained to guide the learning of our model. Finally, MMAT achieves high natural accuracy and robustness under both black-box and white-box attacks. On SVHN, for example, state-of-the-art robustness and natural accuracy are achieved.
[[2207.07803] Masked Spatial-Spectral Autoencoders Are Excellent Hyperspectral Defenders](http://arxiv.org/abs/2207.07803)
Deep learning methodology contributes a lot to the development of hyperspectral image (HSI) analysis community. However, it also makes HSI analysis systems vulnerable to adversarial attacks. To this end, we propose a masked spatial-spectral autoencoder (MSSA) in this paper under self-supervised learning theory, for enhancing the robustness of HSI analysis systems. First, a masked sequence attention learning module is conducted to promote the inherent robustness of HSI analysis systems along spectral channel. Then, we develop a graph convolutional network with learnable graph structure to establish global pixel-wise combinations.In this way, the attack effect would be dispersed by all the related pixels among each combination, and a better defense performance is achievable in spatial aspect.Finally, to improve the defense transferability and address the problem of limited labelled samples, MSSA employs spectra reconstruction as a pretext task and fits the datasets in a self-supervised manner.Comprehensive experiments over three benchmarks verify the effectiveness of MSSA in comparison with the state-of-the-art hyperspectral classification methods and representative adversarial defense strategies.
[[2207.08089] Threat Model-Agnostic Adversarial Defense using Diffusion Models](http://arxiv.org/abs/2207.08089)
Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks. Following the discovery of this vulnerability in real-world imaging and vision applications, the associated safety concerns have attracted vast research attention, and many defense techniques have been developed. Most of these defense methods rely on adversarial training (AT) -- training the classification network on images perturbed according to a specific threat model, which defines the magnitude of the allowed modification. Although AT leads to promising results, training on a specific threat model fails to generalize to other types of perturbations. A different approach utilizes a preprocessing step to remove the adversarial perturbation from the attacked image. In this work, we follow the latter path and aim to develop a technique that leads to robust classifiers across various realizations of threat models. To this end, we harness the recent advances in stochastic generative modeling, and means to leverage these for sampling from conditional distributions. Our defense relies on an addition of Gaussian i.i.d noise to the attacked image, followed by a pretrained diffusion process -- an architecture that performs a stochastic iterative process over a denoising network, yielding a high perceptual quality denoised outcome. The obtained robustness with this stochastic preprocessing step is validated through extensive experiments on the CIFAR-10 dataset, showing that our method outperforms the leading defense methods under various threat models.
[[2207.07932] Semi-Supervised Keypoint Detector and Descriptor for Retinal Image Matching](http://arxiv.org/abs/2207.07932)
For retinal image matching (RIM), we propose SuperRetina, the first end-to-end method with jointly trainable keypoint detector and descriptor. SuperRetina is trained in a novel semi-supervised manner. A small set of (nearly 100) images are incompletely labeled and used to supervise the network to detect keypoints on the vascular tree. To attack the incompleteness of manual labeling, we propose Progressive Keypoint Expansion to enrich the keypoint labels at each training epoch. By utilizing a keypoint-based improved triplet loss as its description loss, SuperRetina produces highly discriminative descriptors at full input image size. Extensive experiments on multiple real-world datasets justify the viability of SuperRetina. Even with manual labeling replaced by auto labeling and thus making the training process fully manual-annotation free, SuperRetina compares favorably against a number of strong baselines for two RIM tasks, i.e. image registration and identity verification. SuperRetina will be open source.
[[2207.08044] DIMBA: Discretely Masked Black-Box Attack in Single Object Tracking](http://arxiv.org/abs/2207.08044)
The adversarial attack can force a CNN-based model to produce an incorrect output by craftily manipulating human-imperceptible input. Exploring such perturbations can help us gain a deeper understanding of the vulnerability of neural networks, and provide robustness to deep learning against miscellaneous adversaries. Despite extensive studies focusing on the robustness of image, audio, and NLP, works on adversarial examples of visual object tracking -- especially in a black-box manner -- are quite lacking. In this paper, we propose a novel adversarial attack method to generate noises for single object tracking under black-box settings, where perturbations are merely added on initial frames of tracking sequences, which is difficult to be noticed from the perspective of a whole video clip. Specifically, we divide our algorithm into three components and exploit reinforcement learning for localizing important frame patches precisely while reducing unnecessary computational queries overhead. Compared to existing techniques, our method requires fewer queries on initialized frames of a video to manipulate competitive or even better attack performance. We test our algorithm in both long-term and short-term datasets, including OTB100, VOT2018, UAV123, and LaSOT. Extensive experiments demonstrate the effectiveness of our method on three mainstream types of trackers: discrimination, Siamese-based, and reinforcement learning-based trackers.
[[2207.07941] MixTailor: Mixed Gradient Aggregation for Robust Learning Against Tailored Attacks](http://arxiv.org/abs/2207.07941)
Implementations of SGD on distributed and multi-GPU systems creates new vulnerabilities, which can be identified and misused by one or more adversarial agents. Recently, it has been shown that well-known Byzantine-resilient gradient aggregation schemes are indeed vulnerable to informed attackers that can tailor the attacks (Fang et al., 2020; Xie et al., 2020b). We introduce MixTailor, a scheme based on randomization of the aggregation strategies that makes it impossible for the attacker to be fully informed. Deterministic schemes can be integrated into MixTailor on the fly without introducing any additional hyperparameters. Randomization decreases the capability of a powerful adversary to tailor its attacks, while the resulting randomized aggregation scheme is still competitive in terms of performance. For both iid and non-iid settings, we establish almost sure convergence guarantees that are both stronger and more general than those available in the literature. Our empirical studies across various datasets, attacks, and settings, validate our hypothesis and show that MixTailor successfully defends when well-known Byzantine-tolerant schemes fail.
[[2207.07996] Optimal Strategic Mining Against Cryptographic Self-Selection in Proof-of-Stake](http://arxiv.org/abs/2207.07996)
Cryptographic Self-Selection is a subroutine used to select a leader for modern proof-of-stake consensus protocols, such as Algorand. In cryptographic self-selection, each round $r$ has a seed $Q_r$. In round $r$, each account owner is asked to digitally sign $Q_r$, hash their digital signature to produce a credential, and then broadcast this credential to the entire network. A publicly-known function scores each credential in a manner so that the distribution of the lowest scoring credential is identical to the distribution of stake owned by each account. The user who broadcasts the lowest-scoring credential is the leader for round $r$, and their credential becomes the seed $Q_{r+1}$. Such protocols leave open the possibility of a selfish-mining style attack: a user who owns multiple accounts that each produce low-scoring credentials in round $r$ can selectively choose which ones to broadcast in order to influence the seed for round $r+1$. Indeed, the user can pre-compute their credentials for round $r+1$ for each potential seed, and broadcast only the credential (among those with a low enough score to be the leader) that produces the most favorable seed.
We consider an adversary who wishes to maximize the expected fraction of rounds in which an account they own is the leader. We show such an adversary always benefits from deviating from the intended protocol, regardless of the fraction of the stake controlled. We characterize the optimal strategy; first by proving the existence of optimal positive recurrent strategies whenever the adversary owns last than $38\%$ of the stake. Then, we provide a Markov Decision Process formulation to compute the optimal strategy.
[[2207.07742] Human keypoint detection for close proximity human-robot interaction](http://arxiv.org/abs/2207.07742)
We study the performance of state-of-the-art human keypoint detectors in the context of close proximity human-robot interaction. The detection in this scenario is specific in that only a subset of body parts such as hands and torso are in the field of view. In particular, (i) we survey existing datasets with human pose annotation from the perspective of close proximity images and prepare and make publicly available a new Human in Close Proximity (HiCP) dataset; (ii) we quantitatively and qualitatively compare state-of-the-art human whole-body 2D keypoint detection methods (OpenPose, MMPose, AlphaPose, Detectron2) on this dataset; (iii) since accurate detection of hands and fingers is critical in applications with handovers, we evaluate the performance of the MediaPipe hand detector; (iv) we deploy the algorithms on a humanoid robot with an RGB-D camera on its head and evaluate the performance in 3D human keypoint detection. A motion capture system is used as reference.
The best performing whole-body keypoint detectors in close proximity were MMPose and AlphaPose, but both had difficulty with finger detection. Thus, we propose a combination of MMPose or AlphaPose for the body and MediaPipe for the hands in a single framework providing the most accurate and robust detection. We also analyse the failure modes of individual detectors -- for example, to what extent the absence of the head of the person in the image degrades performance. Finally, we demonstrate the framework in a scenario where a humanoid robot interacting with a person uses the detected 3D keypoints for whole-body avoidance maneuvers.
[[2207.07797] CARBEN: Composite Adversarial Robustness Benchmark](http://arxiv.org/abs/2207.07797)
Prior literature on adversarial attack methods has mainly focused on attacking with and defending against a single threat model, e.g., perturbations bounded in Lp ball. However, multiple threat models can be combined into composite perturbations. One such approach, composite adversarial attack (CAA), not only expands the perturbable space of the image, but also may be overlooked by current modes of robustness evaluation. This paper demonstrates how CAA's attack order affects the resulting image, and provides real-time inferences of different models, which will facilitate users' configuration of the parameters of the attack level and their rapid evaluation of model prediction. A leaderboard to benchmark adversarial robustness against CAA is also introduced.
[[2207.07869] CA-SpaceNet: Counterfactual Analysis for 6D Pose Estimation in Space](http://arxiv.org/abs/2207.07869)
Reliable and stable 6D pose estimation of uncooperative space objects plays an essential role in on-orbit servicing and debris removal missions. Considering that the pose estimator is sensitive to background interference, this paper proposes a counterfactual analysis framework named CASpaceNet to complete robust 6D pose estimation of the spaceborne targets under complicated background. Specifically, conventional methods are adopted to extract the features of the whole image in the factual case. In the counterfactual case, a non-existent image without the target but only the background is imagined. Side effect caused by background interference is reduced by counterfactual analysis, which leads to unbiased prediction in final results. In addition, we also carry out lowbit-width quantization for CA-SpaceNet and deploy part of the framework to a Processing-In-Memory (PIM) accelerator on FPGA. Qualitative and quantitative results demonstrate the effectiveness and efficiency of our proposed method. To our best knowledge, this paper applies causal inference and network quantization to the 6D pose estimation of space-borne targets for the first time. The code is available at https://github.com/Shunli-Wang/CA-SpaceNet.
[[2207.07896] Cross Vision-RF Gait Re-identification with Low-cost RGB-D Cameras and mmWave Radars](http://arxiv.org/abs/2207.07896)
Human identification is a key requirement for many applications in everyday life, such as personalized services, automatic surveillance, continuous authentication, and contact tracing during pandemics, etc. This work studies the problem of cross-modal human re-identification (ReID), in response to the regular human movements across camera-allowed regions (e.g., streets) and camera-restricted regions (e.g., offices) deployed with heterogeneous sensors. By leveraging the emerging low-cost RGB-D cameras and mmWave radars, we propose the first-of-its-kind vision-RF system for cross-modal multi-person ReID at the same time. Firstly, to address the fundamental inter-modality discrepancy, we propose a novel signature synthesis algorithm based on the observed specular reflection model of a human body. Secondly, an effective cross-modal deep metric learning model is introduced to deal with interference caused by unsynchronized data across radars and cameras. Through extensive experiments in both indoor and outdoor environments, we demonstrate that our proposed system is able to achieve ~92.5% top-1 accuracy and ~97.5% top-5 accuracy out of 56 volunteers. We also show that our proposed system is able to robustly reidentify subjects even when multiple subjects are present in the sensors' field of view.
[[2207.07898] SPSN: Superpixel Prototype Sampling Network for RGB-D Salient Object Detection](http://arxiv.org/abs/2207.07898)
RGB-D salient object detection (SOD) has been in the spotlight recently because it is an important preprocessing operation for various vision tasks. However, despite advances in deep learning-based methods, RGB-D SOD is still challenging due to the large domain gap between an RGB image and the depth map and low-quality depth maps. To solve this problem, we propose a novel superpixel prototype sampling network (SPSN) architecture. The proposed model splits the input RGB image and depth map into component superpixels to generate component prototypes. We design a prototype sampling network so that the network only samples prototypes corresponding to salient objects. In addition, we propose a reliance selection module to recognize the quality of each RGB and depth feature map and adaptively weight them in proportion to their reliability. The proposed method makes the model robust to inconsistencies between RGB images and depth maps and eliminates the influence of non-salient objects. Our method is evaluated on five popular datasets, achieving state-of-the-art performance. We prove the effectiveness of the proposed method through comparative experiments.
[[2207.07900] Mutual Adaptive Reasoning for Monocular 3D Multi-Person Pose Estimation](http://arxiv.org/abs/2207.07900)
Inter-person occlusion and depth ambiguity make estimating the 3D poses of monocular multiple persons as camera-centric coordinates a challenging problem. Typical top-down frameworks suffer from high computational redundancy with an additional detection stage. By contrast, the bottom-up methods enjoy low computational costs as they are less affected by the number of humans. However, most existing bottom-up methods treat camera-centric 3D human pose estimation as two unrelated subtasks: 2.5D pose estimation and camera-centric depth estimation. In this paper, we propose a unified model that leverages the mutual benefits of both these subtasks. Within the framework, a robust structured 2.5D pose estimation is designed to recognize inter-person occlusion based on depth relationships. Additionally, we develop an end-to-end geometry-aware depth reasoning method that exploits the mutual benefits of both 2.5D pose and camera-centric root depths. This method first uses 2.5D pose and geometry information to infer camera-centric root depths in a forward pass, and then exploits the root depths to further improve representation learning of 2.5D pose estimation in a backward pass. Further, we designed an adaptive fusion scheme that leverages both visual perception and body geometry to alleviate inherent depth ambiguity issues. Extensive experiments demonstrate the superiority of our proposed model over a wide range of bottom-up methods. Our accuracy is even competitive with top-down counterparts. Notably, our model runs much faster than existing bottom-up and top-down methods.
[[2207.07913] Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation](http://arxiv.org/abs/2207.07913)
The current studies of Scene Graph Generation (SGG) focus on solving the long-tailed problem for generating unbiased scene graphs. However, most de-biasing methods overemphasize the tail predicates and underestimate head ones throughout training, thereby wrecking the representation ability of head predicate features. Furthermore, these impaired features from head predicates harm the learning of tail predicates. In fact, the inference of tail predicates heavily depends on the general patterns learned from head ones, e.g., "standing on" depends on "on". Thus, these de-biasing SGG methods can neither achieve excellent performance on tail predicates nor satisfying behaviors on head ones. To address this issue, we propose a Dual-branch Hybrid Learning network (DHL) to take care of both head predicates and tail ones for SGG, including a Coarse-grained Learning Branch (CLB) and a Fine-grained Learning Branch (FLB). Specifically, the CLB is responsible for learning expertise and robust features of head predicates, while the FLB is expected to predict informative tail predicates. Furthermore, DHL is equipped with a Branch Curriculum Schedule (BCS) to make the two branches work well together. Experiments show that our approach achieves a new state-of-the-art performance on VG and GQA datasets and makes a trade-off between the performance of tail predicates and head ones. Moreover, extensive experiments on two downstream tasks (i.e., Image Captioning and Sentence-to-Graph Retrieval) further verify the generalization and practicability of our method.
[[2207.08034] Progress and limitations of deep networks to recognize objects in unusual poses](http://arxiv.org/abs/2207.08034)
Deep networks should be robust to rare events if they are to be successfully deployed in high-stakes real-world applications (e.g., self-driving cars). Here we study the capability of deep networks to recognize objects in unusual poses. We create a synthetic dataset of images of objects in unusual orientations, and evaluate the robustness of a collection of 38 recent and competitive deep networks for image classification. We show that classifying these images is still a challenge for all networks tested, with an average accuracy drop of 29.5% compared to when the objects are presented upright. This brittleness is largely unaffected by various network design choices, such as training losses (e.g., supervised vs. self-supervised), architectures (e.g., convolutional networks vs. transformers), dataset modalities (e.g., images vs. image-text pairs), and data-augmentation schemes. However, networks trained on very large datasets substantially outperform others, with the best network tested$\unicode{x2014}$Noisy Student EfficentNet-L2 trained on JFT-300M$\unicode{x2014}$showing a relatively small accuracy drop of only 14.5% on unusual poses. Nevertheless, a visual inspection of the failures of Noisy Student reveals a remaining gap in robustness with the human visual system. Furthermore, combining multiple object transformations$\unicode{x2014}$3D-rotations and scaling$\unicode{x2014}$further degrades the performance of all networks. Altogether, our results provide another measurement of the robustness of deep networks that is important to consider when using them in the real world. Code and datasets are available at https://github.com/amro-kamal/ObjectPose.
[[2207.08134] Editing Out-of-domain GAN Inversion via Differential Activations](http://arxiv.org/abs/2207.08134)
Despite the demonstrated editing capacity in the latent space of a pretrained GAN model, inverting real-world images is stuck in a dilemma that the reconstruction cannot be faithful to the original input. The main reason for this is that the distributions between training and real-world data are misaligned, and because of that, it is unstable of GAN inversion for real image editing. In this paper, we propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm. In particular, during the phase of composition, we introduce a differential activation module for detecting semantic changes from a global perspective, \ie, the relative gap between the features of edited and unedited images. With the aid of the generated Diff-CAM mask, a coarse reconstruction can intuitively be composited by the paired original and edited images. In this way, the attribute-irrelevant regions can be survived in almost whole, while the quality of such an intermediate result is still limited by an unavoidable ghosting effect. Consequently, in the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction. Extensive experiments exhibit superiorities over the state-of-the-art methods, in terms of qualitative and quantitative evaluations. The robustness and flexibility of our method is also validated on both scenarios of single attribute and multi-attribute manipulations.
[[2207.07706] Probing Semantic Grounding in Language Models of Code with Representational Similarity Analysis](http://arxiv.org/abs/2207.07706)
Representational Similarity Analysis is a method from cognitive neuroscience, which helps in comparing representations from two different sources of data. In this paper, we propose using Representational Similarity Analysis to probe the semantic grounding in language models of code. We probe representations from the CodeBERT model for semantic grounding by using the data from the IBM CodeNet dataset. Through our experiments, we show that current pre-training methods do not induce semantic grounding in language models of code, and instead focus on optimizing form-based patterns. We also show that even a little amount of fine-tuning on semantically relevant tasks increases the semantic grounding in CodeBERT significantly. Our ablations with the input modality to the CodeBERT model show that using bimodal inputs (code and natural language) over unimodal inputs (only code) gives better semantic grounding and sample efficiency during semantic fine-tuning. Finally, our experiments with semantic perturbations in code reveal that CodeBERT is able to robustly distinguish between semantically correct and incorrect code.
[[2207.08099] Aspect-specific Context Modeling for Aspect-based Sentiment Analysis](http://arxiv.org/abs/2207.08099)
Aspect-based sentiment analysis (ABSA) aims at predicting sentiment polarity (SC) or extracting opinion span (OE) expressed towards a given aspect. Previous work in ABSA mostly relies on rather complicated aspect-specific feature induction. Recently, pretrained language models (PLMs), e.g., BERT, have been used as context modeling layers to simplify the feature induction structures and achieve state-of-the-art performance. However, such PLM-based context modeling can be not that aspect-specific. Therefore, a key question is left under-explored: how the aspect-specific context can be better modeled through PLMs? To answer the question, we attempt to enhance aspect-specific context modeling with PLM in a non-intrusive manner. We propose three aspect-specific input transformations, namely aspect companion, aspect prompt, and aspect marker. Informed by these transformations, non-intrusive aspect-specific PLMs can be achieved to promote the PLM to pay more attention to the aspect-specific context in a sentence. Additionally, we craft an adversarial benchmark for ABSA (advABSA) to see how aspect-specific modeling can impact model robustness. Extensive experimental results on standard and adversarial benchmarks for SC and OE demonstrate the effectiveness and robustness of the proposed method, yielding new state-of-the-art performance on OE and competitive performance on SC.
[[2207.08002] EEG2Vec: Learning Affective EEG Representations via Variational Autoencoders](http://arxiv.org/abs/2207.08002)
There is a growing need for sparse representational formats of human affective states that can be utilized in scenarios with limited computational memory resources. We explore whether representing neural data, in response to emotional stimuli, in a latent vector space can serve to both predict emotional states as well as generate synthetic EEG data that are participant- and/or emotion-specific. We propose a conditional variational autoencoder based framework, EEG2Vec, to learn generative-discriminative representations from EEG data. Experimental results on affective EEG recording datasets demonstrate that our model is suitable for unsupervised EEG modeling, classification of three distinct emotion categories (positive, neutral, negative) based on the latent representation achieves a robust performance of 68.49%, and generated synthetic EEG sequences resemble real EEG data inputs to particularly reconstruct low-frequency signal components. Our work advances areas where affective EEG representations can be useful in e.g., generating artificial (labeled) training data or alleviating manual feature extraction, and provide efficiency for memory constrained edge computing applications.
[[2207.08130] Discover Life Skills for Planning with Bandits via Observing and Learning How the World Works](http://arxiv.org/abs/2207.08130)
We propose a novel approach for planning agents to compose abstract skills via observing and learning from historical interactions with the world. Our framework operates in a Markov state-space model via a set of actions under unknown pre-conditions. We formulate skills as high-level abstract policies that propose action plans based on the current state. Each policy learns new plans by observing the states' transitions while the agent interacts with the world. Such an approach automatically learns new plans to achieve specific intended effects, but the success of such plans is often dependent on the states in which they are applicable. Therefore, we formulate the evaluation of such plans as infinitely many multi-armed bandit problems, where we balance the allocation of resources on evaluating the success probability of existing arms and exploring new options. The result is a planner capable of automatically learning robust high-level skills under a noisy environment; such skills implicitly learn the action pre-condition without explicit knowledge. We show that this planning approach is experimentally very competitive in high-dimensional state space domains.
[[2207.07690] Support Vector Machines with the Hard-Margin Loss: Optimal Training via Combinatorial Benders' Cuts](http://arxiv.org/abs/2207.07690)
The classical hinge-loss support vector machines (SVMs) model is sensitive to outlier observations due to the unboundedness of its loss function. To circumvent this issue, recent studies have focused on non-convex loss functions, such as the hard-margin loss, which associates a constant penalty to any misclassified or within-margin sample. Applying this loss function yields much-needed robustness for critical applications but it also leads to an NP-hard model that makes training difficult, since current exact optimization algorithms show limited scalability, whereas heuristics are not able to find high-quality solutions consistently. Against this background, we propose new integer programming strategies that significantly improve our ability to train the hard-margin SVM model to global optimality. We introduce an iterative sampling and decomposition approach, in which smaller subproblems are used to separate combinatorial Benders' cuts. Those cuts, used within a branch-and-cut algorithm, permit to converge much more quickly towards a global optimum. Through extensive numerical analyses on classical benchmark data sets, our solution algorithm solves, for the first time, 117 new data sets to optimality and achieves a reduction of 50% in the average optimality gap for the hardest datasets of the benchmark.
[[2207.07790] BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion](http://arxiv.org/abs/2207.07790)
We utilize an offline reinforcement learning (RL) model for sequential targeted promotion in the presence of budget constraints in a real-world business environment. In our application, the mobile app aims to boost customer retention by sending cash bonuses to customers and control the costs of such cash bonuses during each time period. To achieve the multi-task goal, we propose the Budget Constrained Reinforcement Learning for Sequential Promotion (BCRLSP) framework to determine the value of cash bonuses to be sent to users. We first find out the target policy and the associated Q-values that maximizes the user retention rate using an RL model. A linear programming (LP) model is then added to satisfy the constraints of promotion costs. We solve the LP problem by maximizing the Q-values of actions learned from the RL model given the budget constraints. During deployment, we combine the offline RL model with the LP model to generate a robust policy under the budget constraints. Using both online and offline experiments, we demonstrate the efficacy of our approach by showing that BCRLSP achieves a higher long-term customer retention rate and a lower cost than various baselines. Taking advantage of the near real-time cost control method, the proposed framework can easily adapt to data with a noisy behavioral policy and/or meet flexible budget constraints.
[[2207.07822] Adaptive Sketches for Robust Regression with Importance Sampling](http://arxiv.org/abs/2207.07822)
We introduce data structures for solving robust regression through stochastic gradient descent (SGD) by sampling gradients with probability proportional to their norm, i.e., importance sampling. Although SGD is widely used for large scale machine learning, it is well-known for possibly experiencing slow convergence rates due to the high variance from uniform sampling. On the other hand, importance sampling can significantly decrease the variance but is usually difficult to implement because computing the sampling probabilities requires additional passes over the data, in which case standard gradient descent (GD) could be used instead. In this paper, we introduce an algorithm that approximately samples $T$ gradients of dimension $d$ from nearly the optimal importance sampling distribution for a robust regression problem over $n$ rows. Thus our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data. Our techniques also extend to performing importance sampling for second-order optimization.
[[2207.07888] SizeShiftReg: a Regularization Method for Improving Size-Generalization in Graph Neural Networks](http://arxiv.org/abs/2207.07888)
In the past few years, graph neural networks (GNNs) have become the de facto model of choice for graph classification. While, from the theoretical viewpoint, most GNNs can operate on graphs of any size, it is empirically observed that their classification performance degrades when they are applied on graphs with sizes that differ from those in the training data. Previous works have tried to tackle this issue in graph classification by providing the model with inductive biases derived from assumptions on the generative process of the graphs, or by requiring access to graphs from the test domain. The first strategy is tied to the use of ad-hoc models and to the quality of the assumptions made on the generative process, leaving open the question of how to improve the performance of generic GNN models in general settings. On the other hand, the second strategy can be applied to any GNN, but requires access to information that is not always easy to obtain. In this work we consider the scenario in which we only have access to the training data, and we propose a regularization strategy that can be applied to any GNN to improve its generalization capabilities from smaller to larger graphs without requiring access to the test data. Our regularization is based on the idea of simulating a shift in the size of the training graphs using coarsening techniques, and enforcing the model to be robust to such a shift. Experimental results on standard datasets show that popular GNN models, trained on the 50% smallest graphs in the dataset and tested on the 10% largest graphs, obtain performance improvements of up to 30% when trained with our regularization strategy.
[[2207.07908] Multiscale Causal Structure Learning](http://arxiv.org/abs/2207.07908)
The inference of causal structures from observed data plays a key role in unveiling the underlying dynamics of the system. This paper exposes a novel method, named Multiscale-Causal Structure Learning (MS-CASTLE), to estimate the structure of linear causal relationships occurring at different time scales. Differently from existing approaches, MS-CASTLE takes explicitly into account instantaneous and lagged inter-relations between multiple time series, represented at different scales, hinging on stationary wavelet transform and non-convex optimization. MS-CASTLE incorporates, as a special case, a single-scale version named SS-CASTLE, which compares favorably in terms of computational efficiency, performance and robustness with respect to the state of the art onto synthetic data. We used MS-CASTLE to study the multiscale causal structure of the risk of 15 global equity markets, during covid-19 pandemic, illustrating how MS-CASTLE can extract meaningful information thanks to its multiscale analysis, outperforming SS-CASTLE. We found that the most persistent and strongest interactions occur at mid-term time resolutions. Moreover, we identified the stock markets that drive the risk during the considered period: Brazil, Canada and Italy. The proposed approach can be exploited by financial investors who, depending to their investment horizon, can manage the risk within equity portfolios from a causal perspective.
[[2207.07795] RCRN: Real-world Character Image Restoration Network via Skeleton Extraction](http://arxiv.org/abs/2207.07795)
Constructing high-quality character image datasets is challenging because real-world images are often affected by image degradation. There are limitations when applying current image restoration methods to such real-world character images, since (i) the categories of noise in character images are different from those in general images; (ii) real-world character images usually contain more complex image degradation, e.g., mixed noise at different noise levels. To address these problems, we propose a real-world character restoration network (RCRN) to effectively restore degraded character images, where character skeleton information and scale-ensemble feature extraction are utilized to obtain better restoration performance. The proposed method consists of a skeleton extractor (SENet) and a character image restorer (CiRNet). SENet aims to preserve the structural consistency of the character and normalize complex noise. Then, CiRNet reconstructs clean images from degraded character images and their skeletons. Due to the lack of benchmarks for real-world character image restoration, we constructed a dataset containing 1,606 character images with real-world degradation to evaluate the validity of the proposed method. The experimental results demonstrate that RCRN outperforms state-of-the-art methods quantitatively and qualitatively.
[[2207.07740] Knowledge Representation in Digital Agriculture: A Step Towards Standardised Model](http://arxiv.org/abs/2207.07740)
In recent years, data science has evolved significantly. Data analysis and mining processes become routines in all sectors of the economy where datasets are available. Vast data repositories have been collected, curated, stored, and used for extracting knowledge. And this is becoming commonplace. Subsequently, we extract a large amount of knowledge, either directly from the data or through experts in the given domain. The challenge now is how to exploit all this large amount of knowledge that is previously known for efficient decision-making processes. Until recently, much of the knowledge gained through a number of years of research is stored in static knowledge bases or ontologies, while more diverse and dynamic knowledge acquired from data mining studies is not centrally and consistently managed. In this research, we propose a novel model called ontology-based knowledge map to represent and store the results (knowledge) of data mining in crop farming to build, maintain, and enrich the process of knowledge discovery. The proposed model consists of six main sets: concepts, attributes, relations, transformations, instances, and states. This model is dynamic and facilitates the access, updates, and exploitation of the knowledge at any time. This paper also proposes an architecture for handling this knowledge-based model. The system architecture includes knowledge modelling, extraction, assessment, publishing, and exploitation. This system has been implemented and used in agriculture for crop management and monitoring. It is proven to be very effective and promising for its extension to other domains.
[[2207.08041] Personalized PCA: Decoupling Shared and Unique Features](http://arxiv.org/abs/2207.08041)
In this paper, we tackle a significant challenge in PCA: heterogeneity. When data are collected from different sources with heterogeneous trends while still sharing some congruency, it is critical to extract shared knowledge while retaining unique features of each source. To this end, we propose personalized PCA (PerPCA), which uses mutually orthogonal global and local principal components to encode both unique and shared features. We show that, under mild conditions, both unique and shared features can be identified and recovered by a constrained optimization problem, even if the covariance matrices are immensely different. Also, we design a fully federated algorithm inspired by distributed Stiefel gradient descent to solve the problem. The algorithm introduces a new group of operations called generalized retractions to handle orthogonality constraints, and only requires global PCs to be shared across sources. We prove the linear convergence of the algorithm under suitable assumptions. Comprehensive numerical experiments highlight PerPCA's superior performance in feature extraction and prediction from heterogeneous datasets. As a systematic approach to decouple shared and unique features from heterogeneous datasets, PerPCA finds applications in several tasks including video segmentation, topic extraction, and distributed clustering.
[[2207.07700] Introducing Federated Learning into Internet of Things ecosystems -- preliminary considerations](http://arxiv.org/abs/2207.07700)
Federated learning (FL) was proposed to facilitate the training of models in a distributed environment. It supports the protection of (local) data privacy and uses local resources for model training. Until now, the majority of research has been devoted to "core issues", such as adaptation of machine learning algorithms to FL, data privacy protection, or dealing with the effects of uneven data distribution between clients. This contribution is anchored in a practical use case, where FL is to be actually deployed within an Internet of Things ecosystem. Hence, somewhat different issues that need to be considered, beyond popular considerations found in the literature, are identified. Moreover, an architecture that enables the building of flexible, and adaptable, FL solutions is introduced.
[[2207.08057] Balancing Accuracy and Integrity for Reconfigurable Intelligent Surface-aided Over-the-Air Federated Learning](http://arxiv.org/abs/2207.08057)
Over-the-air federated learning (AirFL) allows devices to train a learning model in parallel and synchronize their local models using over-the-air computation. The integrity of AirFL is vulnerable due to the obscurity of the local models aggregated over-the-air. This paper presents a novel framework to balance the accuracy and integrity of AirFL, where multi-antenna devices and base station (BS) are jointly optimized with a reconfigurable intelligent surface (RIS). The key contributions include a new and non-trivial problem jointly considering the model accuracy and integrity of AirFL, and a new framework that transforms the problem into tractable subproblems. Under perfect channel state information (CSI), the new framework minimizes the aggregated model's distortion and retains the local models' recoverability by optimizing the transmit beamformers of the devices, the receive beamformers of the BS, and the RIS configuration in an alternating manner. Under imperfect CSI, the new framework delivers a robust design of the beamformers and RIS configuration to combat non-negligible channel estimation errors. As corroborated experimentally, the novel framework can achieve comparable accuracy to the ideal FL while preserving local model recoverability under perfect CSI, and improve the accuracy when the number of receive antennas is small or moderate under imperfect CSI.
[[2207.07953] Level Set-Based Camera Pose Estimation From Multiple 2D/3D Ellipse-Ellipsoid Correspondences](http://arxiv.org/abs/2207.07953)
In this paper, we propose an object-based camera pose estimation from a single RGB image and a pre-built map of objects, represented with ellipsoidal models. We show that contrary to point correspondences, the definition of a cost function characterizing the projection of a 3D object onto a 2D object detection is not straightforward. We develop an ellipse-ellipse cost based on level sets sampling, demonstrate its nice properties for handling partially visible objects and compare its performance with other common metrics. Finally, we show that the use of a predictive uncertainty on the detected ellipses allows a fair weighting of the contribution of the correspondences which improves the computed pose. The code is released at https://gitlab.inria.fr/tangram/level-set-based-camera-pose-estimation.
[[2207.07723] More Data Can Lead Us Astray: Active Data Acquisition in the Presence of Label Bias](http://arxiv.org/abs/2207.07723)
An increased awareness concerning risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. A vast majority of the proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently and at the intersection of these two categories, methods that propose active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typically overlook the bias presented in the observed labels. In this work, we study fairness considerations of active data collection strategies in the presence of label bias. We first present an overview of different types of label bias in the context of supervised learning systems. We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem. Our results illustrate the unintended consequences of deploying a model that attempts to mitigate a single type of bias while neglecting others, emphasizing the importance of explicitly differentiating between the types of bias that fairness-aware algorithms aim to address, and highlighting the risks of neglecting label bias during data collection.
[[2207.08001] SVGraph: Learning Semantic Graphs from Instructional Videos](http://arxiv.org/abs/2207.08001)
In this work, we focus on generating graphical representations of noisy, instructional videos for video understanding. We propose a self-supervised, interpretable approach that does not require any annotations for graphical representations, which would be expensive and time consuming to collect. We attempt to overcome "black box" learning limitations by presenting Semantic Video Graph or SVGraph, a multi-modal approach that utilizes narrations for semantic interpretability of the learned graphs. SVGraph 1) relies on the agreement between multiple modalities to learn a unified graphical structure with the help of cross-modal attention and 2) assigns semantic interpretation with the help of Semantic-Assignment, which captures the semantics from video narration. We perform experiments on multiple datasets and demonstrate the interpretability of SVGraph in semantic graph learning.
[[2207.08046] MDM:Visual Explanations for Neural Networks via Multiple Dynamic Mask](http://arxiv.org/abs/2207.08046)
The active region lookup of a neural network tells us which regions the neural network focuses on when making a decision, which gives us a basis for interpretability when the neural network makes a classification decision. We propose an algorithm Multiple Dynamic Mask(MDM), which is a general saliency graph query method with interpretability of the inference process. Its proposal is based on an assumption: when a picture is input to a neural network that has been trained, the activation features related to classification will affect the classification results of the neural network, and the features unrelated to classification will hardly affect the classification results of the network. MDM: A learning-based end-to-end algorithm for finding regions of interest for neural network classification. It has the following advantages: 1. It has the interpretability of the reasoning process. 2. It is universal, it can be used for any neural network and does not depend on the internal structure of the neural network. 3. The search performance is better. Because the algorithm is based on learning to generate masks and has the ability to adapt to different data and networks, the performance is better than the method proposed in the previous paper. For the MDM saliency map search algorithm, we experimentally compared the performance indicators of various saliency map search methods and the MDM with ResNet and DenseNet as the trained neural networks. The search effect performance of the MDM reached the state of the art. We applied the MDM to the interpretable neural network ProtoPNet and XProtoNet, which improved the interpretability of the model and the prototype search performance. We visualize the performance of convolutional neural architecture and Transformer architecture on saliency map search.
[[2207.07769] Anomalous behaviour in loss-gradient based interpretability methods](http://arxiv.org/abs/2207.07769)
Loss-gradients are used to interpret the decision making process of deep learning models. In this work, we evaluate loss-gradient based attribution methods by occluding parts of the input and comparing the performance of the occluded input to the original input. We observe that the occluded input has better performance than the original across the test dataset under certain conditions. Similar behaviour is observed in sound and image recognition tasks. We explore different loss-gradient attribution methods, occlusion levels and replacement values to explain the phenomenon of performance improvement under occlusion.
[[2207.07781] Subgroup Discovery in Unstructured Data](http://arxiv.org/abs/2207.07781)
Subgroup discovery is a descriptive and exploratory data mining technique to identify subgroups in a population that exhibit interesting behavior with respect to a variable of interest. Subgroup discovery has numerous applications in knowledge discovery and hypothesis generation, yet it remains inapplicable for unstructured, high-dimensional data such as images. This is because subgroup discovery algorithms rely on defining descriptive rules based on (attribute, value) pairs, however, in unstructured data, an attribute is not well defined. Even in cases where the notion of attribute intuitively exists in the data, such as a pixel in an image, due to the high dimensionality of the data, these attributes are not informative enough to be used in a rule. In this paper, we introduce the subgroup-aware variational autoencoder, a novel variational autoencoder that learns a representation of unstructured data which leads to subgroups with higher quality. Our experimental results demonstrate the effectiveness of the method at learning subgroups with high quality while supporting the interpretability of the concepts.
[[2207.07972] Certified Neural Network Watermarks with Randomized Smoothing](http://arxiv.org/abs/2207.07972)
Watermarking is a commonly used strategy to protect creators' rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models -- in principle, the watermark should be preserved when an adversary tries to copy the model. However, in practice, watermarks can often be removed by an intelligent adversary. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these new techniques often fail in the face of new or better-tuned adversaries. In this paper, we propose a certifiable watermarking method. Using the randomized smoothing technique proposed in Chiang et al., we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain l2 threshold. In addition to being certifiable, our watermark is also empirically more robust compared to previous watermarking methods. Our experiments can be reproduced with code at https://github.com/arpitbansal297/Certified_Watermarks
[[2207.08050] Repairing Systematic Outliers by Learning Clean Subspaces in VAEs](http://arxiv.org/abs/2207.08050)
Data cleaning often comprises outlier detection and data repair. Systematic errors result from nearly deterministic transformations that occur repeatedly in the data, e.g. specific image pixels being set to default values or watermarks. Consequently, models with enough capacity easily overfit to these errors, making detection and repair difficult. Seeing as a systematic outlier is a combination of patterns of a clean instance and systematic error patterns, our main insight is that inliers can be modelled by a smaller representation (subspace) in a model than outliers. By exploiting this, we propose Clean Subspace Variational Autoencoder (CLSVAE), a novel semi-supervised model for detection and automated repair of systematic errors. The main idea is to partition the latent space and model inlier and outlier patterns separately. CLSVAE is effective with much less labelled data compared to previous related models, often with less than 2% of the data. We provide experiments using three image datasets in scenarios with different levels of corruption and labelled set sizes, comparing to relevant baselines. CLSVAE provides superior repairs without human intervention, e.g. with just 0.25% of labelled data we see a relative error decrease of 58% compared to the closest baseline.