[[2209.05687] PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers](http://arxiv.org/abs/2209.05687)
Data-free quantization can potentially address data privacy and security concerns in model compression, and thus has been widely investigated. Recently, PSAQ-ViT designs a relative value metric, patch similarity, to generate data from pre-trained vision transformers (ViTs), achieving the first attempt at data-free quantization for ViTs. In this paper, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT. More specifically, following the patch similarity metric in PSAQ-ViT, we introduce an adaptive teacher-student strategy, which facilitates the constant cyclic evolution of the generated samples and the quantized model (student) in a competitive and interactive fashion under the supervision of the full-precision model (teacher), thus significantly improving the accuracy of the quantized model. Moreover, without the auxiliary category guidance, we employ the task- and model-independent prior information, making the general-purpose scheme compatible with a broad range of vision tasks and models. Extensive experiments are conducted on various models on image classification, object detection, and semantic segmentation tasks, and PSAQ-ViT V2, with the naive quantization strategy and without access to real-world data, consistently achieves competitive results, showing potential as a powerful baseline on data-free quantization for ViTs. For instance, with Swin-S as the (backbone) model, 8-bit quantization reaches 82.13 top-1 accuracy on ImageNet, 50.9 box AP and 44.1 mask AP on COCO, and 47.2 mIoU on ADE20K. We hope that accurate and general PSAQ-ViT V2 can serve as a potential and practice solution in real-world applications involving sensitive data. Code will be released and merged at: https://github.com/zkkli/PSAQ-ViT.
[[2209.05856] Just Noticeable Difference Modeling for Face Recognition System](http://arxiv.org/abs/2209.05856)
High-quality face images are required to guarantee the stability and reliability of automatic face recognition (FR) systems in surveillance and security scenarios. However, a massive amount of face data is usually compressed before being analyzed due to limitations on transmission or storage. The compressed images may lose the powerful identity information, resulting in the performance degradation of the FR system. Herein, we make the first attempt to study just noticeable difference (JND) for the FR system, which can be defined as the maximum distortion that the FR system cannot notice. More specifically, we establish a JND dataset including 3530 original images and 137,670 compressed images generated by advanced reference encoding/decoding software based on the Versatile Video Coding (VVC) standard (VTM-15.0). Subsequently, we develop a novel JND prediction model to directly infer JND images for the FR system. In particular, in order to maximum redundancy removal without impairment of robust identity information, we apply the encoder with multiple feature extraction and attention-based feature decomposition modules to progressively decompose face features into two uncorrelated components, i.e., identity and residual features, via self-supervised learning. Then, the residual feature is fed into the decoder to generate the residual map. Finally, the predicted JND map is obtained by subtracting the residual map from the original image. Experimental results have demonstrated that the proposed model achieves higher accuracy of JND map prediction compared with the state-of-the-art JND models, and is capable of saving more bits while maintaining the performance of the FR system compared with VTM-15.0.
[[2209.05911] Computer vision based vehicle tracking as a complementary and scalable approach to RFID tagging](http://arxiv.org/abs/2209.05911)
Logging of incoming/outgoing vehicles serves as a piece of critical information for root-cause analysis to combat security breach incidents in various sensitive organizations. RFID tagging hampers the scalability of vehicle tracking solutions on both logistics as well as technical fronts. For instance, requiring each incoming vehicle(departmental or private) to be RFID tagged is a severe constraint and coupling video analytics with RFID to detect abnormal vehicle movement is non-trivial. We leverage publicly available implementations of computer vision algorithms to develop an interpretable vehicle tracking algorithm using finite-state machine formalism. The state-machine consumes input from the cascaded object detection and optical character recognition(OCR) models for state transitions. We evaluated the proposed method on 75 video clips of 285 vehicles from our system deployment site. We observed that the detection rate is most affected by the speed and the type of vehicle. The highest detection rate is achieved when the vehicle movement is restricted to follow a movement restrictions(SOP) at the checkpoint similar to RFID tagging. We further analyzed 700 vehicle tracking predictions on live-data and identified that the majority of vehicle number prediction errors are due to illegible-text, image-blur, text occlusion and out-of-vocab letters in vehicle numbers. Towards system deployment and performance enhancement, we expect our ongoing system monitoring to provide evidences to establish a higher vehicle-throughput SOP at the security checkpoint as well as to drive the fine-tuning of the deployed computer-vision models and the state-machine to establish the proposed approach as a promising alternative to RFID-tagging.
[[2209.05572] Bao-Enclave: Virtualization-based Enclaves for Arm](http://arxiv.org/abs/2209.05572)
General-purpose operating systems (GPOS), such as Linux, encompass several million lines of code. Statistically, a larger code base inevitably leads to a higher number of potential vulnerabilities and inherently a more vulnerable system. To minimize the impact of vulnerabilities in GPOS, it has become common to implement security-sensitive programs outside the domain of the GPOS, i.e., in a Trusted Execution Environment (TEE). Arm TrustZone is the de-facto technology for implementing TEEs in Arm devices. However, over the last decade, TEEs have been successfully attacked hundreds of times. Unfortunately, these attacks have been possible due to the presence of several architectural and implementation flaws in TrustZone-based TEEs. In this paper, we propose Bao-Enclave, a virtualization-based solution that enables OEMs to remove security functionality from the TEE and move them into normal world isolated environments, protected from potentially malicious OSes, in the form of lightweight virtual machines (VMs). We evaluate Bao-Enclave on real hardware platforms and find out that Bao-Enclave may improve the performance of security-sensitive workloads by up to 4.8x, while significantly simplifying the TEE software TCB.
[[2209.05579] Intrusion Detection Systems Using Support Vector Machines on the KDDCUP'99 and NSL-KDD Datasets: A Comprehensive Survey](http://arxiv.org/abs/2209.05579)
With the growing rates of cyber-attacks and cyber espionage, the need for better and more powerful intrusion detection systems (IDS) is even more warranted nowadays. The basic task of an IDS is to act as the first line of defense, in detecting attacks on the internet. As intrusion tactics from intruders become more sophisticated and difficult to detect, researchers have started to apply novel Machine Learning (ML) techniques to effectively detect intruders and hence preserve internet users' information and overall trust in the entire internet network security. Over the last decade, there has been an explosion of research on intrusion detection techniques based on ML and Deep Learning (DL) architectures on various cyber security-based datasets such as the DARPA, KDDCUP'99, NSL-KDD, CAIDA, CTU-13, UNSW-NB15. In this research, we review contemporary literature and provide a comprehensive survey of different types of intrusion detection technique that applies Support Vector Machines (SVMs) algorithms as a classifier. We focus only on studies that have been evaluated on the two most widely used datasets in cybersecurity namely: the KDDCUP'99 and the NSL-KDD datasets. We provide a summary of each method, identifying the role of the SVMs classifier, and all other algorithms involved in the studies. Furthermore, we present a critical review of each method, in tabular form, highlighting the performance measures, strengths, and limitations of each of the methods surveyed.
[[2209.05799] A Neural Network-based SAT-Resilient Obfuscation Towards Enhanced Logic Locking](http://arxiv.org/abs/2209.05799)
Logic obfuscation is introduced as a pivotal defense against multiple hardware threats on Integrated Circuits (ICs), including reverse engineering (RE) and intellectual property (IP) theft. The effectiveness of logic obfuscation is challenged by the recently introduced Boolean satisfiability (SAT) attack and its variants. A plethora of countermeasures has also been proposed to thwart the SAT attack. Irrespective of the implemented defense against SAT attacks, large power, performance, and area overheads are indispensable. In contrast, we propose a cognitive solution: a neural network-based unSAT clause translator, SATConda, that incurs a minimal area and power overhead while preserving the original functionality with impenetrable security. SATConda is incubated with an unSAT clause generator that translates the existing conjunctive normal form (CNF) through minimal perturbations such as the inclusion of pair of inverters or buffers or adding a new lightweight unSAT block depending on the provided CNF. For efficient unSAT clause generation, SATConda is equipped with a multi-layer neural network that first learns the dependencies of features (literals and clauses), followed by a long-short-term-memory (LSTM) network to validate and backpropagate the SAT-hardness for better learning and translation. Our proposed SATConda is evaluated on ISCAS85 and ISCAS89 benchmarks and is seen to defend against multiple state-of-the-art successfully SAT attacks devised for hardware RE. In addition, we also evaluate our proposed SATCondas empirical performance against MiniSAT, Lingeling and Glucose SAT solvers that form the base for numerous existing deobfuscation SAT attacks.
[[2209.05872] Smart Contract Vulnerability Detection Technique: A Survey](http://arxiv.org/abs/2209.05872)
Smart contract, one of the most successful applications of blockchain, is taking the world by storm, playing an essential role in the blockchain ecosystem. However, frequent smart contract security incidents not only result in tremendous economic losses but also destroy the blockchain-based credit system. The security and reliability of smart contracts thus gain extensive attention from researchers worldwide. In this survey, we first summarize the common types and typical cases of smart contract vulnerabilities from three levels, i.e., Solidity code layer, EVM execution layer, and Block dependency layer. Further, we review the research progress of smart contract vulnerability detection and classify existing counterparts into five categories, i.e., formal verification, symbolic execution, fuzzing detection, intermediate representation, and deep learning. Empirically, we take 300 real-world smart contracts deployed on Ethereum as the test samples and compare the representative methods in terms of accuracy, F1-Score, and average detection time. Finally, we discuss the challenges in the field of smart contract vulnerability detection and combine with the deep learning technology to look forward to future research directions.
[[2209.06056] An Extensive Study of Residential Proxies in China](http://arxiv.org/abs/2209.06056)
We carry out the first in-depth characterization of residential proxies (RESIPs) in China, for which little is studied in previous works. Our study is made possible through a semantic-based classifier to automatically capture RESIP services. In addition to the classifier, new techniques have also been identified to capture RESIPs without interacting with and relaying traffic through RESIP services, which can significantly lower the cost and thus allow a continuous monitoring of RESIPs. Our RESIP service classifier has achieved a good performance with a recall of 99.7% and a precision of 97.6% in 10-fold cross validation. Applying the classifier has identified 399 RESIP services, a much larger set compared to 38 RESIP services collected in all previous works. Our effort of RESIP capturing lead to a collection of 9,077,278 RESIP IPs (51.36% are located in China), 96.70% of which are not covered in publicly available RESIP datasets. An extensive measurement on RESIPs and their services has uncovered a set of interesting findings as well as several security implications. Especially, 80.05% RESIP IPs located in China have sourced at least one malicious traffic flows during 2021, resulting in 52-million malicious traffic flows in total. And RESIPs have also been observed in corporation networks of 559 sensitive organizations including government agencies, education institutions and enterprises. Also, 3,232,698 China RESIP IPs have opened at least one TCP/UDP ports for accepting relaying requests, which incurs non-negligible security risks to the local network of RESIPs. Besides, 91% China RESIP IPs are of a lifetime less than 10 days while most China RESIP services show up a crest-trough pattern in terms of the daily active RESIPs across time.
[[2209.05724] Defense against Privacy Leakage in Federated Learning](http://arxiv.org/abs/2209.05724)
Federated Learning (FL) provides a promising distributed learning paradigm, since it seeks to protect users privacy by not sharing their private training data. Recent research has demonstrated, however, that FL is susceptible to model inversion attacks, which can reconstruct users' private data by eavesdropping on shared gradients. Existing defense solutions cannot survive stronger attacks and exhibit a poor trade-off between privacy and performance. In this paper, we present a straightforward yet effective defense strategy based on obfuscating the gradients of sensitive data with concealing data. Specifically, we alter a few samples within a mini batch to mimic the sensitive data at the gradient levels. Using a gradient projection technique, our method seeks to obscure sensitive data without sacrificing FL performance. Our extensive evaluations demonstrate that, compared to other defenses, our technique offers the highest level of protection while preserving FL performance. Our source code is located in the repository.
[[2209.05954] Automatically Score Tissue Images Like a Pathologist by Transfer Learning](http://arxiv.org/abs/2209.05954)
Cancer is the second leading cause of death in the world. Diagnosing cancer early on can save many lives. Pathologists have to look at tissue microarray (TMA) images manually to identify tumors, which can be time-consuming, inconsistent and subjective. Existing algorithms that automatically detect tumors have either not achieved the accuracy level of a pathologist or require substantial human involvements. A major challenge is that TMA images with different shapes, sizes, and locations can have the same score. Learning staining patterns in TMA images requires a huge number of images, which are severely limited due to privacy concerns and regulations in medical organizations. TMA images from different cancer types may have common characteristics that could provide valuable information, but using them directly harms the accuracy. Transfer learning is adopted to increase the training sample size by extracting knowledge from tissue images from different cancer types. Transfer learning has made it possible for the algorithm to break the critical accuracy barrier. The proposed algorithm reports an accuracy of 75.9% on breast cancer TMA images from the Stanford Tissue Microarray Database, achieving the 75% accuracy level of pathologists. This will allow pathologists to confidently use automatic algorithms to assist them in recognizing tumors consistently with a higher accuracy in real time.
[[2209.05578] Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning using Independent Component Analysis](http://arxiv.org/abs/2209.05578)
Federated learning (FL) aims to perform privacy-preserving machine learning on distributed data held by multiple data owners. To this end, FL requires the data owners to perform training locally and share the gradient updates (instead of the private inputs) with the central server, which are then securely aggregated over multiple data owners. Although aggregation by itself does not provably offer privacy protection, prior work showed that it may suffice if the batch size is sufficiently large. In this paper, we propose the Cocktail Party Attack (CPA) that, contrary to prior belief, is able to recover the private inputs from gradients aggregated over a very large batch size. CPA leverages the crucial insight that aggregate gradients from a fully connected layer is a linear combination of its inputs, which leads us to frame gradient inversion as a blind source separation (BSS) problem (informally called the cocktail party problem). We adapt independent component analysis (ICA)--a classic solution to the BSS problem--to recover private inputs for fully-connected and convolutional networks, and show that CPA significantly outperforms prior gradient inversion attacks, scales to ImageNet-sized inputs, and works on large batch sizes of up to 1024.
[[2209.06113] Generate novel and robust samples from data: accessible sharing without privacy concerns](http://arxiv.org/abs/2209.06113)
Generating new samples from data sets can mitigate extra expensive operations, increased invasive procedures, and mitigate privacy issues. These novel samples that are statistically robust can be used as a temporary and intermediate replacement when privacy is a concern. This method can enable better data sharing practices without problems relating to identification issues or biases that are flaws for an adversarial attack.
[[2209.05978] A Distributed Acoustic Sensor System for Intelligent Transportation using Deep Learning](http://arxiv.org/abs/2209.05978)
Intelligent transport systems (ITS) are pivotal in the development of sustainable and green urban living. ITS is data-driven and enabled by the profusion of sensors ranging from pneumatic tubes to smart cameras. This work explores a novel data source based on optical fibre-based distributed acoustic sensors (DAS) for traffic analysis. Detecting the type of vehicle and estimating the occupancy of vehicles are prime concerns in ITS. The first is motivated by the need for tracking, controlling, and forecasting traffic flow. The second targets the regulation of high occupancy vehicle lanes in an attempt to reduce emissions and congestion. These tasks are often conducted by individuals inspecting vehicles or through the use of emerging computer vision technologies. The former is not scale-able nor efficient whereas the latter is intrusive to passengers' privacy. To this end, we propose a deep learning technique to analyse DAS signals to address this challenge through continuous sensing and without exposing personal information. We propose a deep learning method for processing DAS signals and achieve 92% vehicle classification accuracy and 92-97% in occupancy detection based on DAS data collected under controlled conditions.
[[2209.06015] Black-box Ownership Verification for Dataset Protection via Backdoor Watermarking](http://arxiv.org/abs/2209.06015)
Deep learning, especially deep neural networks (DNNs), has been widely and successfully adopted in many critical applications for its high effectiveness and efficiency. The rapid development of DNNs has benefited from the existence of some high-quality datasets ($e.g.$, ImageNet), which allow researchers and developers to easily verify the performance of their methods. Currently, almost all existing released datasets require that they can only be adopted for academic or educational purposes rather than commercial purposes without permission. However, there is still no good way to ensure that. In this paper, we formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model, where defenders can only query the model while having no information about its parameters and training details. Based on this formulation, we propose to embed external patterns via backdoor watermarking for the ownership verification to protect them. Our method contains two main parts, including dataset watermarking and dataset verification. Specifically, we exploit poison-only backdoor attacks ($e.g.$, BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification. Experiments on multiple benchmark datasets of different tasks are conducted, which verify the effectiveness of our method. The code for reproducing main experiments is available at \url{https://github.com/THUYimingLi/DVBW}.
[[2209.05980] Certified Defences Against Adversarial Patch Attacks on Semantic Segmentation](http://arxiv.org/abs/2209.05980)
Adversarial patch attacks are an emerging security threat for real world deep learning applications. We present Demasked Smoothing, the first approach (up to our knowledge) to certify the robustness of semantic segmentation models against this threat model. Previous work on certifiably defending against patch attacks has mostly focused on image classification task and often required changes in the model architecture and additional training which is undesirable and computationally expensive. In Demasked Smoothing, any segmentation model can be applied without particular training, fine-tuning, or restriction of the architecture. Using different masking strategies, Demasked Smoothing can be applied both for certified detection and certified recovery. In extensive experiments we show that Demasked Smoothing can on average certify 64% of the pixel predictions for a 1% patch in the detection task and 48% against a 0.5% patch for the recovery task on the ADE20K dataset.
[[2209.05692] Sample Complexity of an Adversarial Attack on UCB-based Best-arm Identification Policy](http://arxiv.org/abs/2209.05692)
In this work I study the problem of adversarial perturbations to rewards, in a Multi-armed bandit (MAB) setting. Specifically, I focus on an adversarial attack to a UCB type best-arm identification policy applied to a stochastic MAB. The UCB attack presented in [1] results in pulling a target arm K very often. I used the attack model of [1] to derive the sample complexity required for selecting target arm K as the best arm. I have proved that the stopping condition of UCB based best-arm identification algorithm given in [2], can be achieved by the target arm K in T rounds, where T depends only on the total number of arms and $\sigma$ parameter of $\sigma^2-$ sub-Gaussian random rewards of the arms.
[[2209.05742] A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation Is the Fixed Point of Adversarial Game](http://arxiv.org/abs/2209.05742)
Rank aggregation with pairwise comparisons has shown promising results in elections, sports competitions, recommendations, and information retrieval. However, little attention has been paid to the security issue of such algorithms, in contrast to numerous research work on the computational and statistical characteristics. Driven by huge profits, the potential adversary has strong motivation and incentives to manipulate the ranking list. Meanwhile, the intrinsic vulnerability of the rank aggregation methods is not well studied in the literature. To fully understand the possible risks, we focus on the purposeful adversary who desires to designate the aggregated results by modifying the pairwise data in this paper. From the perspective of the dynamical system, the attack behavior with a target ranking list is a fixed point belonging to the composition of the adversary and the victim. To perform the targeted attack, we formulate the interaction between the adversary and the victim as a game-theoretic framework consisting of two continuous operators while Nash equilibrium is established. Then two procedures against HodgeRank and RankCentrality are constructed to produce the modification of the original data. Furthermore, we prove that the victims will produce the target ranking list once the adversary masters the complete information. It is noteworthy that the proposed methods allow the adversary only to hold incomplete information or imperfect feedback and perform the purposeful attack. The effectiveness of the suggested target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments. These experimental results show that the proposed methods could achieve the attacker's goal in the sense that the leading candidate of the perturbed ranking list is the designated one by the adversary.
[[2209.05624] Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features](http://arxiv.org/abs/2209.05624)
We consider the problem of category-level 6D pose estimation from a single RGB image. Our approach represents an object category as a cuboid mesh and learns a generative model of the neural feature activations at each mesh vertex to perform pose estimation through differentiable rendering. A common problem of rendering-based approaches is that they rely on bounding box proposals, which do not convey information about the 3D rotation of the object and are not reliable when objects are partially occluded. Instead, we introduce a coarse-to-fine optimization strategy that utilizes the rendering process to estimate a sparse set of 6D object proposals, which are subsequently refined with gradient-based optimization. The key to enabling the convergence of our approach is a neural feature representation that is trained to be scale- and rotation-invariant using contrastive learning. Our experiments demonstrate an enhanced category-level 6D pose estimation performance compared to prior work, particularly under strong partial occlusion.
[[2209.05779] Test-Time Adaptation with Principal Component Analysis](http://arxiv.org/abs/2209.05779)
Machine Learning models are prone to fail when test data are different from training data, a situation often encountered in real applications known as distribution shift. While still valid, the training-time knowledge becomes less effective, requiring a test-time adaptation to maintain high performance. Following approaches that assume batch-norm layer and use their statistics for adaptation, we propose a Test-Time Adaptation with Principal Component Analysis (TTAwPCA), which presumes a fitted PCA and adapts at test time a spectral filter based on the singular values of the PCA for robustness to corruptions. TTAwPCA combines three components: the output of a given layer is decomposed using a Principal Component Analysis (PCA), filtered by a penalization of its singular values, and reconstructed with the PCA inverse transform. This generic enhancement adds fewer parameters than current methods. Experiments on CIFAR-10-C and CIFAR- 100-C demonstrate the effectiveness and limits of our method using a unique filter of 2000 parameters.
[[2209.05785] Adversarial Coreset Selection for Efficient Robust Training](http://arxiv.org/abs/2209.05785)
Neural networks are vulnerable to adversarial attacks: adding well-crafted, imperceptible perturbations to their input can modify their output. Adversarial training is one of the most effective approaches to training robust models against such attacks. Unfortunately, this method is much slower than vanilla training of neural networks since it needs to construct adversarial examples for the entire training data at every iteration. By leveraging the theory of coreset selection, we show how selecting a small subset of training data provides a principled approach to reducing the time complexity of robust training. To this end, we first provide convergence guarantees for adversarial coreset selection. In particular, we show that the convergence bound is directly related to how well our coresets can approximate the gradient computed over the entire training data. Motivated by our theoretical analysis, we propose using this gradient approximation error as our adversarial coreset selection objective to reduce the training set size effectively. Once built, we run adversarial training over this subset of the training data. Unlike existing methods, our approach can be adapted to a wide variety of training objectives, including TRADES, $\ell_p$-PGD, and Perceptual Adversarial Training. We conduct extensive experiments to demonstrate that our approach speeds up adversarial training by 2-3 times while experiencing a slight degradation in the clean and robust accuracy.
[[2209.05804] Analyzing the Impact of Varied Window Hyper-parameters on Deep CNN for sEMG based Motion Intent Classification](http://arxiv.org/abs/2209.05804)
The use of deep neural networks in electromyogram (EMG) based prostheses control provides a promising alternative to the hand-crafted features by automatically learning muscle activation patterns from the EMG signals. Meanwhile, the use of raw EMG signals as input to convolution neural networks (CNN) offers a simple, fast, and ideal scheme for effective control of prostheses. Therefore, this study investigates the relationship between window length and overlap, which may influence the generation of robust raw EMG 2-dimensional (2D) signals for application in CNN. And a rule of thumb for a proper combination of these parameters that could guarantee optimal network performance was derived. Moreover, we investigate the relationship between the CNN receptive window size and the raw EMG signal size. Experimental results show that the performance of the CNN increases with the increase in overlap within the generated signals, with the highest improvement of 9.49% accuracy and 23.33% F1-score realized when the overlap is 75% of the window length. Similarly, the network performance increases with the increase in receptive window (kernel) size. Findings from this study suggest that a combination of 75% overlap in 2D EMG signals and wider network kernels may provide ideal motor intents classification for adequate EMG-CNN based prostheses control scheme.
[[2209.05921] Document Image Binarization in JPEG Compressed Domain using Dual Discriminator Generative Adversarial Networks](http://arxiv.org/abs/2209.05921)
Image binarization techniques are being popularly used in enhancement of noisy and/or degraded images catering different Document Image Anlaysis (DIA) applications like word spotting, document retrieval, and OCR. Most of the existing techniques focus on feeding pixel images into the Convolution Neural Networks to accomplish document binarization, which may not produce effective results when working with compressed images that need to be processed without full decompression. Therefore in this research paper, the idea of document image binarization directly using JPEG compressed stream of document images is proposed by employing Dual Discriminator Generative Adversarial Networks (DD-GANs). Here the two discriminator networks - Global and Local work on different image ratios and use focal loss as generator loss. The proposed model has been thoroughly tested with different versions of DIBCO dataset having challenges like holes, erased or smudged ink, dust, and misplaced fibres. The model proved to be highly robust, efficient both in terms of time and space complexities, and also resulted in state-of-the-art performance in JPEG compressed domain.
[[2209.05924] SVNet: Where SO(3) Equivariance Meets Binarization on Point Cloud Representation](http://arxiv.org/abs/2209.05924)
Efficiency and robustness are increasingly needed for applications on 3D point clouds, with the ubiquitous use of edge devices in scenarios like autonomous driving and robotics, which often demand real-time and reliable responses. The paper tackles the challenge by designing a general framework to construct 3D learning architectures with SO(3) equivariance and network binarization. However, a naive combination of equivariant networks and binarization either causes sub-optimal computational efficiency or geometric ambiguity. We propose to locate both scalar and vector features in our networks to avoid both cases. Precisely, the presence of scalar features makes the major part of the network binarizable, while vector features serve to retain rich structural information and ensure SO(3) equivariance. The proposed approach can be applied to general backbones like PointNet and DGCNN. Meanwhile, experiments on ModelNet40, ShapeNet, and the real-world dataset ScanObjectNN, demonstrated that the method achieves a great trade-off between efficiency, rotation robustness, and accuracy. The codes are available at https://github.com/zhuoinoulu/svnet.
[[2209.06040] DMTNet: Dynamic Multi-scale Network for Dual-pixel Images Defocus Deblurring with Transformer](http://arxiv.org/abs/2209.06040)
Recent works achieve excellent results in defocus deblurring task based on dual-pixel data using convolutional neural network (CNN), while the scarcity of data limits the exploration and attempt of vision transformer in this task. In addition, the existing works use fixed parameters and network architecture to deblur images with different distribution and content information, which also affects the generalization ability of the model. In this paper, we propose a dynamic multi-scale network, named DMTNet, for dual-pixel images defocus deblurring. DMTNet mainly contains two modules: feature extraction module and reconstruction module. The feature extraction module is composed of several vision transformer blocks, which uses its powerful feature extraction capability to obtain richer features and improve the robustness of the model. The reconstruction module is composed of several Dynamic Multi-scale Sub-reconstruction Module (DMSSRM). DMSSRM can restore images by adaptively assigning weights to features from different scales according to the blur distribution and content information of the input images. DMTNet combines the advantages of transformer and CNN, in which the vision transformer improves the performance ceiling of CNN, and the inductive bias of CNN enables transformer to extract more robust features without relying on a large amount of data. DMTNet might be the first attempt to use vision transformer to restore the blurring images to clarity. By combining with CNN, the vision transformer may achieve better performance on small datasets. Experimental results on the popular benchmarks demonstrate that our DMTNet significantly outperforms state-of-the-art methods.
[[2209.06078] On the Optimal Combination of Cross-Entropy and Soft Dice Losses for Lesion Segmentation with Out-of-Distribution Robustness](http://arxiv.org/abs/2209.06078)
We study the impact of different loss functions on lesion segmentation from medical images. Although the Cross-Entropy (CE) loss is the most popular option when dealing with natural images, for biomedical image segmentation the soft Dice loss is often preferred due to its ability to handle imbalanced scenarios. On the other hand, the combination of both functions has also been successfully applied in this kind of tasks. A much less studied problem is the generalization ability of all these losses in the presence of Out-of-Distribution (OoD) data. This refers to samples appearing in test time that are drawn from a different distribution than training images. In our case, we train our models on images that always contain lesions, but in test time we also have lesion-free samples. We analyze the impact of the minimization of different loss functions on in-distribution performance, but also its ability to generalize to OoD data, via comprehensive experiments on polyp segmentation from endoscopic images and ulcer segmentation from diabetic feet images. Our findings are surprising: CE-Dice loss combinations that excel in segmenting in-distribution images have a poor performance when dealing with OoD data, which leads us to recommend the adoption of the CE loss for this kind of problems, due to its robustness and ability to generalize to OoD samples. Code associated to our experiments can be found at \url{https://github.com/agaldran/lesion_losses_ood} .
[[2209.06172] Comparative analysis of segmentation and generative models for fingerprint retrieval task](http://arxiv.org/abs/2209.06172)
Biometric Authentication like Fingerprints has become an integral part of the modern technology for authentication and verification of users. It is pervasive in more ways than most of us are aware of. However, these fingerprint images deteriorate in quality if the fingers are dirty, wet, injured or when sensors malfunction. Therefore, extricating the original fingerprint by removing the noise and inpainting it to restructure the image is crucial for its authentication. Hence, this paper proposes a deep learning approach to address these issues using Generative (GAN) and Segmentation models. Qualitative and Quantitative comparison has been done between pix2pixGAN and cycleGAN (generative models) as well as U-net (segmentation model). To train the model, we created our own dataset NFD - Noisy Fingerprint Dataset meticulously with different backgrounds along with scratches in some images to make it more realistic and robust. In our research, the u-net model performed better than the GAN networks
[[2209.05522] TEDL: A Two-stage Evidential Deep Learning Method for Classification Uncertainty Quantification](http://arxiv.org/abs/2209.05522)
In this paper, we propose TEDL, a two-stage learning approach to quantify uncertainty for deep learning models in classification tasks, inspired by our findings in experimenting with Evidential Deep Learning (EDL) method, a recently proposed uncertainty quantification approach based on the Dempster-Shafer theory. More specifically, we observe that EDL tends to yield inferior AUC compared with models learnt by cross-entropy loss and is highly sensitive in training. Such sensitivity is likely to cause unreliable uncertainty estimation, making it risky for practical applications. To mitigate both limitations, we propose a simple yet effective two-stage learning approach based on our analysis on the likely reasons causing such sensitivity, with the first stage learning from cross-entropy loss, followed by a second stage learning from EDL loss. We also re-formulate the EDL loss by replacing ReLU with ELU to avoid the Dying ReLU issue. Extensive experiments are carried out on varied sized training corpus collected from a large-scale commercial search engine, demonstrating that the proposed two-stage learning framework can increase AUC significantly and greatly improve training robustness.
[[2209.05668] Class-Level Logit Perturbation](http://arxiv.org/abs/2209.05668)
Features, logits, and labels are the three primary data when a sample passes through a deep neural network. Feature perturbation and label perturbation receive increasing attention in recent years. They have been proven to be useful in various deep learning approaches. For example, (adversarial) feature perturbation can improve the robustness or even generalization capability of learned models. However, limited studies have explicitly explored for the perturbation of logit vectors. This work discusses several existing methods related to class-level logit perturbation. A unified viewpoint between positive/negative data augmentation and loss variations incurred by logit perturbation is established. A theoretical analysis is provided to illuminate why class-level logit perturbation is useful. Accordingly, new methodologies are proposed to explicitly learn to perturb logits for both single-label and multi-label classification tasks. Extensive experiments on benchmark image classification data sets and their long-tail versions indicated the competitive performance of our learning method. As it only perturbs on logit, it can be used as a plug-in to fuse with any existing classification algorithms. All the codes are available at https://github.com/limengyang1992/lpl.
[[2209.06116] Patching Weak Convolutional Neural Network Models through Modularization and Composition](http://arxiv.org/abs/2209.06116)
Despite great success in many applications, deep neural networks are not always robust in practice. For instance, a convolutional neuron network (CNN) model for classification tasks often performs unsatisfactorily in classifying some particular classes of objects. In this work, we are concerned with patching the weak part of a CNN model instead of improving it through the costly retraining of the entire model. Inspired by the fundamental concepts of modularization and composition in software engineering, we propose a compressed modularization approach, CNNSplitter, which decomposes a strong CNN model for $N$-class classification into $N$ smaller CNN modules. Each module is a sub-model containing a part of the convolution kernels of the strong model. To patch a weak CNN model that performs unsatisfactorily on a target class (TC), we compose the weak CNN model with the corresponding module obtained from a strong CNN model. The ability of the weak CNN model to recognize the TC can thus be improved through patching. Moreover, the ability to recognize non-TCs is also improved, as the samples misclassified as TC could be classified as non-TCs correctly. Experimental results with two representative CNNs on three widely-used datasets show that the averaged improvement on the TC in terms of precision and recall are 12.54% and 2.14%, respectively. Moreover, patching improves the accuracy of non-TCs by 1.18%. The results demonstrate that CNNSplitter can patch a weak CNN model through modularization and composition, thus providing a new solution for developing robust CNN models.
[[2209.06203] Normalizing Flows for Interventional Density Estimation](http://arxiv.org/abs/2209.06203)
Existing machine learning methods for causal inference usually estimate quantities expressed via the mean of potential outcomes (e.g., average treatment effect). However, such quantities do not capture the full information about the distribution of potential outcomes. In this work, we estimate the density of potential outcomes after interventions from observational data. Specifically, we propose a novel, fully-parametric deep learning method for this purpose, called Interventional Normalizing Flows. Our Interventional Normalizing Flows offer a properly normalized density estimator. For this, we introduce an iterative training of two normalizing flows, namely (i) a teacher flow for estimation of nuisance parameters and (ii) a student flow for parametric estimation of the density of potential outcomes. For efficient and doubly-robust estimation of the student flow parameters, we develop a custom tractable optimization objective based on a one-step bias correction. Across various experiments, we demonstrate that our Interventional Normalizing Flows are expressive and highly effective, and scale well with both sample size and high-dimensional confounding. To the best of our knowledge, our Interventional Normalizing Flows are the first fully-parametric, deep learning method for density estimation of potential outcomes.
[[2209.05550] Mathematical Framework for Online Social Media Regulation](http://arxiv.org/abs/2209.05550)
Social media platforms (SMPs) leverage algorithmic filtering (AF) as a means of selecting the content that constitutes a user's feed with the aim of maximizing their rewards. Selectively choosing the contents to be shown on the user's feed may yield a certain extent of influence, either minor or major, on the user's decision-making, compared to what it would have been under a natural/fair content selection. As we have witnessed over the past decade, algorithmic filtering can cause detrimental side effects, ranging from biasing individual decisions to shaping those of society as a whole, for example, diverting users' attention from whether to get the COVID-19 vaccine or inducing the public to choose a presidential candidate. The government's constant attempts to regulate the adverse effects of AF are often complicated, due to bureaucracy, legal affairs, and financial considerations. On the other hand SMPs seek to monitor their own algorithmic activities to avoid being fined for exceeding the allowable threshold. In this paper, we mathematically formalize this framework and utilize it to construct a data-driven statistical algorithm to regulate the AF from deflecting users' beliefs over time, along with sample and complexity guarantees. We show that our algorithm is robust against potential adversarial users. This state-of-the-art algorithm can be used either by authorities acting as external regulators or by SMPs for self-regulation.
[[2209.05774] PointScatter: Point Set Representation for Tubular Structure Extraction](http://arxiv.org/abs/2209.05774)
This paper explores the point set representation for tubular structure extraction tasks. Compared with the traditional mask representation, the point set representation enjoys its flexibility and representation ability, which would not be restricted by the fixed grid as the mask. Inspired by this, we propose PointScatter, an alternative to the segmentation models for the tubular structure extraction task. PointScatter splits the image into scatter regions and parallelly predicts points for each scatter region. We further propose the greedy-based region-wise bipartite matching algorithm to train the network end-to-end and efficiently. We benchmark the PointScatter on four public tubular datasets, and the extensive experiments on tubular structure segmentation and centerline extraction task demonstrate the effectiveness of our approach. Code is available at https://github.com/zhangzhao2022/pointscatter.
[[2209.05824] CPnP: Consistent Pose Estimator for Perspective-n-Point Problem with Bias Elimination](http://arxiv.org/abs/2209.05824)
The Perspective-n-Point (PnP) problem has been widely studied in both computer vision and photogrammetry societies. With the development of feature extraction techniques, a large number of feature points might be available in a single shot. It is promising to devise a consistent estimator, i.e., the estimate can converge to the true camera pose as the number of points increases. To this end, we propose a consistent PnP solver, named \emph{CPnP}, with bias elimination. Specifically, linear equations are constructed from the original projection model via measurement model modification and variable elimination, based on which a closed-form least-squares solution is obtained. We then analyze and subtract the asymptotic bias of this solution, resulting in a consistent estimate. Additionally, Gauss-Newton (GN) iterations are executed to refine the consistent solution. Our proposed estimator is efficient in terms of computations -- it has $O(n)$ computational complexity. Experimental tests on both synthetic data and real images show that our proposed estimator is superior to some well-known ones for images with dense visual features, in terms of estimation precision and computing time.
[[2209.05987] Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction](http://arxiv.org/abs/2209.05987)
Skills play a central role in the job market and many human resources (HR) processes. In the wake of other digital experiences, today's online job market has candidates expecting to see the right opportunities based on their skill set. Similarly, enterprises increasingly need to use data to guarantee that the skills within their workforce remain future-proof. However, structured information about skills is often missing, and processes building on self- or manager-assessment have shown to struggle with issues around adoption, completeness, and freshness of the resulting data. Extracting skills is a highly challenging task, given the many thousands of possible skill labels mentioned either explicitly or merely described implicitly and the lack of finely annotated training corpora. Previous work on skill extraction overly simplifies the task to an explicit entity detection task or builds on manually annotated training data that would be infeasible if applied to a complete vocabulary of skills. We propose an end-to-end system for skill extraction, based on distant supervision through literal matching. We propose and evaluate several negative sampling strategies, tuned on a small validation dataset, to improve the generalization of skill extraction towards implicitly mentioned skills, despite the lack of such implicit skills in the distantly supervised data. We observe that using the ESCO taxonomy to select negative examples from related skills yields the biggest improvements, and combining three different strategies in one model further increases the performance, up to 8 percentage points in RP@5. We introduce a manually annotated evaluation benchmark for skill extraction based on the ESCO taxonomy, on which we validate our models. We release the benchmark dataset for research purposes to stimulate further research on the task.
[[2209.06170] Computational Sarcasm Analysis on Social Media: A Systematic Review](http://arxiv.org/abs/2209.06170)
Sarcasm can be defined as saying or writing the opposite of what one truly wants to express, usually to insult, irritate, or amuse someone. Because of the obscure nature of sarcasm in textual data, detecting it is difficult and of great interest to the sentiment analysis research community. Though the research in sarcasm detection spans more than a decade, some significant advancements have been made recently, including employing unsupervised pre-trained transformers in multimodal environments and integrating context to identify sarcasm. In this study, we aim to provide a brief overview of recent advancements and trends in computational sarcasm research for the English language. We describe relevant datasets, methodologies, trends, issues, challenges, and tasks relating to sarcasm that are beyond detection. Our study provides well-summarized tables of sarcasm datasets, sarcastic features and their extraction methods, and performance analysis of various approaches which can help researchers in related domains understand current state-of-the-art practices in sarcasm detection.
[[2209.05627] SENDER: SEmi-Nonlinear Deep Efficient Reconstructor for Extraction Canonical, Meta, and Sub Functional Connectivity in the Human Brain](http://arxiv.org/abs/2209.05627)
Deep Linear and Nonlinear learning methods have already been vital machine learning methods for investigating the hierarchical features such as functional connectivity in the human brain via functional Magnetic Resonance signals; however, there are three major shortcomings: 1). For deep linear learning methods, although the identified hierarchy of functional connectivity is easily explainable, it is challenging to reveal more hierarchical functional connectivity; 2). For deep nonlinear learning methods, although non-fully connected architecture reduces the complexity of neural network structures that are easy to optimize and not vulnerable to overfitting, the functional connectivity hierarchy is difficult to explain; 3). Importantly, it is challenging for Deep Linear/Nonlinear methods to detect meta and sub-functional connectivity even in the shallow layers; 4). Like most conventional Deep Nonlinear Methods, such as Deep Neural Networks, the hyperparameters must be tuned manually, which is time-consuming. Thus, in this work, we propose a novel deep hybrid learning method named SEmi-Nonlinear Deep Efficient Reconstruction (SENDER), to overcome the aforementioned shortcomings: 1). SENDER utilizes a multiple-layer stacked structure for the linear learning methods to detect the canonical functional connectivity; 2). SENDER implements a non-fully connected architecture conducted for the nonlinear learning methods to reveal the meta-functional connectivity through shallow and deeper layers; 3). SENDER incorporates the proposed background components to extract the sub-functional connectivity; 4). SENDER adopts a novel rank reduction operator to implement the hyperparameters tuning automatically. To further validate the effectiveness, we compared SENDER with four peer methodologies using real functional Magnetic Resonance Imaging data for the human brain.
[[2209.06032] Investigating the Predictive Reproducibility of Federated Graph Neural Networks using Medical Datasets](http://arxiv.org/abs/2209.06032)
Graph neural networks (GNNs) have achieved extraordinary enhancements in various areas including the fields medical imaging and network neuroscience where they displayed a high accuracy in diagnosing challenging neurological disorders such as autism. In the face of medical data scarcity and high-privacy, training such data-hungry models remains challenging. Federated learning brings an efficient solution to this issue by allowing to train models on multiple datasets, collected independently by different hospitals, in fully data-preserving manner. Although both state-of-the-art GNNs and federated learning techniques focus on boosting classification accuracy, they overlook a critical unsolved problem: investigating the reproducibility of the most discriminative biomarkers (i.e., features) selected by the GNN models within a federated learning paradigm. Quantifying the reproducibility of a predictive medical model against perturbations of training and testing data distributions presents one of the biggest hurdles to overcome in developing translational clinical applications. To the best of our knowledge, this presents the first work investigating the reproducibility of federated GNN models with application to classifying medical imaging and brain connectivity datasets. We evaluated our framework using various GNN models trained on medical imaging and connectomic datasets. More importantly, we showed that federated learning boosts both the accuracy and reproducibility of GNN models in such medical learning tasks. Our source code is available at https://github.com/basiralab/reproducibleFedGNN.
[[2209.05690] Concept-Based Explanations for Tabular Data](http://arxiv.org/abs/2209.05690)
The interpretability of machine learning models has been an essential area of research for the safe deployment of machine learning systems. One particular approach is to attribute model decisions to high-level concepts that humans can understand. However, such concept-based explainability for Deep Neural Networks (DNNs) has been studied mostly on image domain. In this paper, we extend TCAV, the concept attribution approach, to tabular learning, by providing an idea on how to define concepts over tabular data. On a synthetic dataset with ground-truth concept explanations and a real-world dataset, we show the validity of our method in generating interpretability results that match the human-level intuitions. On top of this, we propose a notion of fairness based on TCAV that quantifies what layer of DNN has learned representations that lead to biased predictions of the model. Also, we empirically demonstrate the relation of TCAV-based fairness to a group fairness notion, Demographic Parity.
[[2209.05957] Adversarial Inter-Group Link Injection Degrades the Fairness of Graph Neural Networks](http://arxiv.org/abs/2209.05957)
We present evidence for the existence and effectiveness of adversarial attacks on graph neural networks (GNNs) that aim to degrade fairness. These attacks can disadvantage a particular subgroup of nodes in GNN-based node classification, where nodes of the underlying network have sensitive attributes, such as race or gender. We conduct qualitative and experimental analyses explaining how adversarial link injection impairs the fairness of GNN predictions. For example, an attacker can compromise the fairness of GNN-based node classification by injecting adversarial links between nodes belonging to opposite subgroups and opposite class labels. Our experiments on empirical datasets demonstrate that adversarial fairness attacks can significantly degrade the fairness of GNN predictions (attacks are effective) with a low perturbation rate (attacks are efficient) and without a significant drop in accuracy (attacks are deceptive). This work demonstrates the vulnerability of GNN models to adversarial fairness attacks. We hope our findings raise awareness about this issue in our community and lay a foundation for the future development of GNN models that are more robust to such attacks.