[[2212.13567] Range-Based Set Reconciliation and Authenticated Set Representations](http://arxiv.org/abs/2212.13567) #secure
Range-based set reconciliation is a simple approach to efficiently computing the union of two sets over a network, based on recursively partitioning the sets and comparing fingerprints of the partitions to probabilistically detect whether a partition requires further work. Whereas prior presentations of this approach focus on specific fingerprinting schemes for specific use-cases, we give a more generic description and analysis in the broader context of set reconciliation. Precisely capturing the design space for fingerprinting schemes allows us to survey for cryptographically secure schemes. Furthermore, we reduce the time complexity of local computations by a logarithmic factor compared to previous publications. In investigating secure associative hash functions, we open up a new class of tree-based authenticated data structures which need not prescribe a deterministic balancing scheme.
[[2212.13689] ML-based Secure Low-Power Communication in Adversarial Contexts](http://arxiv.org/abs/2212.13689) #secure
As wireless network technology becomes more and more popular, mutual interference between various signals has become more and more severe and common. Therefore, there is often a situation in which the transmission of its own signal is interfered with by occupying the channel. Especially in a confrontational environment, Jamming has caused great harm to the security of information transmission. So I propose ML-based secure ultra-low power communication, which is an approach to use machine learning to predict future wireless traffic by capturing patterns of past wireless traffic to ensure ultra-low-power transmission of signals via backscatters. In order to be more suitable for the adversarial environment, we use backscatter to achieve ultra-low power signal transmission, and use frequency-hopping technology to achieve successful confrontation with Jamming information. In the end, we achieved a prediction success rate of 96.19%.
[[2212.13986] Green Bitcoin: Global Sound Money](http://arxiv.org/abs/2212.13986) #secure
Modern societies have adopted government-issued fiat currencies many of which exist today mainly in the form of digits in credit and bank accounts. Fiat currencies are controlled by central banks for economic stimulation and stabilization. Boom-and-bust cycles are created. The volatility of the cycle has become increasingly extreme. Social inequality due to the concentration of wealth is prevalent worldwide. As such, restoring sound money, which provides stored value over time, has become a pressing issue. Currently, cryptocurrencies such as Bitcoin are in their infancy and may someday qualify as sound money. Bitcoin today is considered as a digital asset for storing value. But Bitcoin has problems. The first issue of the current Bitcoin network is its high energy consumption consensus mechanism. The second is the cryptographic primitives which are unsafe against post-quantum (PQ) attacks. We aim to propose Green Bitcoin which addresses both issues. To save energy in consensus mechanism, we introduce a post-quantum secure (self-election) verifiable coin-toss function and novel PQ secure proof-of-computation primitives. It is expected to reduce the rate of energy consumption more than 90 percent of the current Bitcoin network. The elliptic curve cryptography will be replaced with PQ-safe versions. The Green Bitcoin protocol will help Bitcoin evolve into a post-quantum secure network.
[[2212.13312] Users really do respond to smishing](http://arxiv.org/abs/2212.13312) #security
Text phish messages, referred to as Smishing is a type of social engineering attack where fake text messages are created, and used to lure users into responding to those messages. These messages aim to obtain user credentials, install malware on the phones, or launch smishing attacks. They ask users to reply to their message, click on a URL that redirects them to a phishing website, or call the provided number. Thousands of mobile users are affected by smishing attacks daily. Drawing inspiration by the works of Tu et al. (USENIX Security, 2019) on Robocalls and Tischer et al. (IEEE Symposium on Security and Privacy, 2016) on USB drives, this paper investigates why smishing works. Accordingly, we designed smishing experiments and sent phishing SMSes to 265 users to measure the efficacy of smishing attacks. We sent eight fake text messages to participants and recorded their CLICK, REPLY, and CALL responses along with their feedback in a post-test survey. Our results reveal that 16.92% of our participants had potentially fallen for our smishing attack. To test repeat phishing, we subjected a set of randomly selected participants to a second round of smishing attacks with a different message than the one they received in the first round. As a result, we observed that 12.82% potentially fell for the attack again. Using logistic regression, we observed that a combination of user REPLY and CLICK actions increased the odds that a user would respond to our smishing message when compared to CLICK. Additionally, we found a similar statistically significant increase when comparing Facebook and Walmart entity scenario to our IRS baseline.
[[2212.13353] A Miniscule Survey on Blockchain Scalability](http://arxiv.org/abs/2212.13353) #security
With the rise of cryptocurrency and NFTs in the past decade, blockchain technology has been an area of increasing interest to both industry and academic experts. In this paper, we discuss the feasibility of such systems through the lens of scalability. We also briefly dive into the security issues of such systems, as well as some applications, including healthcare, supply chain, and government applications.
[[2212.13421] Hardware Implementation of a Polar Code-based Public Key Cryptosystem](http://arxiv.org/abs/2212.13421) #security
In recent years, there have been many studies on quantum computing and the construction of quantum computers which are capable of breaking conventional number theory-based public key cryptosystems. Therefore, in the not-too-distant future, we need the public key cryptosystems that withstand against the attacks executed by quantum computers, so-called post-quantum cryptosystems. A public key cryptosystem based on polar codes (PKC-PC) has recently been introduced whose security depends on the difficulty of solving the general decoding problem of polar code. In this paper, we first implement the encryption, key generation and decryption algorithms of PKC-PC on Raspberry Pi3. Then, to evaluate its performance, we have measured several related parameters such as execution time, energy consumption, memory consumption and CPU utilization. All these metrics are investigated for encryption/decryption algorithms of PKC-PC with various parameters of polar codes. In the next step, the investigated parameters are compared to the implemented McEliece public key cryptosystem. Analyses of such results show that the execution time of encryption/decryption as well as the energy and memory consumption of PKC-PC is shorter than the McEliece cryptosystem.
[[2212.13452] Financial Crimes in Web3-empowered Metaverse: Taxonomy, Countermeasures, and Opportunities](http://arxiv.org/abs/2212.13452) #security
At present, the concept of metaverse has sparked widespread attention from the public to major industries. With the rapid development of blockchain and Web3 technologies, the decentralized metaverse ecology has attracted a large influx of users and capital.
Due to the lack of industry standards and regulatory rules, the Web3-empowered metaverse ecosystem has witnessed a variety of financial crimes, such as scams, code exploit, wash trading, money laundering, and illegal services and shops. To this end, it is especially urgent and critical to summarize and classify the financial security threats on the Web3-empowered metaverse in order to maintain the long-term healthy development of its ecology.
In this paper, we first outline the background, foundation, and applications of the Web3 metaverse. Then, we provide a comprehensive overview and taxonomy of the security risks and financial crimes that have emerged since the development of the decentralized metaverse. For each financial crime, we focus on three issues: a) existing definitions, b) relevant cases and analysis, and c) existing academic research on this type of crime. Next, from the perspective of academic research and government policy, we summarize the current anti-crime measurements and technologies in the metaverse. Finally, we discuss the opportunities and challenges in behavioral mining and the potential regulation of financial activities in the metaverse.
The overview of this paper is expected to help readers better understand the potential security threats in this emerging ecology, and to provide insights and references for financial crime fighting.
[[2212.13700] Publishing Efficient On-device Models Increases Adversarial Vulnerability](http://arxiv.org/abs/2212.13700) #security
Recent increases in the computational demands of deep neural networks (DNNs) have sparked interest in efficient deep learning mechanisms, e.g., quantization or pruning. These mechanisms enable the construction of a small, efficient version of commercial-scale models with comparable accuracy, accelerating their deployment to resource-constrained devices.
In this paper, we study the security considerations of publishing on-device variants of large-scale models. We first show that an adversary can exploit on-device models to make attacking the large models easier. In evaluations across 19 DNNs, by exploiting the published on-device models as a transfer prior, the adversarial vulnerability of the original commercial-scale models increases by up to 100x. We then show that the vulnerability increases as the similarity between a full-scale and its efficient model increase. Based on the insights, we propose a defense, $similarity$-$unpairing$, that fine-tunes on-device models with the objective of reducing the similarity. We evaluated our defense on all the 19 DNNs and found that it reduces the transferability up to 90% and the number of queries required by a factor of 10-100x. Our results suggest that further research is needed on the security (or even privacy) threats caused by publishing those efficient siblings.
[[2212.13716] One Bad Apple Spoils the Barrel: Understanding the Security Risks Introduced by Third-Party Components in IoT Firmware](http://arxiv.org/abs/2212.13716) #security
Currently, the development of IoT firmware heavily depends on third-party components (TPCs) to improve development efficiency. Nevertheless, TPCs are not secure, and the vulnerabilities in TPCs will influence the security of IoT firmware. Existing works pay less attention to the vulnerabilities caused by TPCs, and we still lack a comprehensive understanding of the security impact of TPC vulnerability against firmware. To fill in the knowledge gap, we design and implement FirmSec, which leverages syntactical features and control-flow graph features to detect the TPCs in firmware, and then recognizes the corresponding vulnerabilities. Based on FirmSec, we present the first large-scale analysis of the security risks raised by TPCs on $34,136$ firmware images. We successfully detect 584 TPCs and identify 128,757 vulnerabilities caused by 429 CVEs. Our in-depth analysis reveals the diversity of security risks in firmware and discovers some well-known vulnerabilities are still rooted in firmware. Besides, we explore the geographical distribution of vulnerable devices and confirm that the security situation of devices in different regions varies. Our analysis also indicates that vulnerabilities caused by TPCs in firmware keep growing with the boom of the IoT ecosystem. Further analysis shows 2,478 commercial firmware images have potentially violated GPL/AGPL licensing terms.
[[2212.13988] Machine Learning for Detecting Malware in PE Files](http://arxiv.org/abs/2212.13988) #security
The increasing number of sophisticated malware poses a major cybersecurity threat. Portable executable (PE) files are a common vector for such malware. In this work we review and evaluate machine learning-based PE malware detection techniques. Using a large benchmark dataset, we evaluate features of PE files using the most common machine learning techniques to detect malware.
[[2212.13989] AdvCat: Domain-Agnostic Robustness Assessment for Cybersecurity-Critical Applications with Categorical Inputs](http://arxiv.org/abs/2212.13989) #security
Machine Learning-as-a-Service systems (MLaaS) have been largely developed for cybersecurity-critical applications, such as detecting network intrusions and fake news campaigns. Despite effectiveness, their robustness against adversarial attacks is one of the key trust concerns for MLaaS deployment. We are thus motivated to assess the adversarial robustness of the Machine Learning models residing at the core of these security-critical applications with categorical inputs. Previous research efforts on accessing model robustness against manipulation of categorical inputs are specific to use cases and heavily depend on domain knowledge, or require white-box access to the target ML model. Such limitations prevent the robustness assessment from being as a domain-agnostic service provided to various real-world applications. We propose a provably optimal yet computationally highly efficient adversarial robustness assessment protocol for a wide band of ML-driven cybersecurity-critical applications. We demonstrate the use of the domain-agnostic robustness assessment method with substantial experimental study on fake news detection and intrusion detection problems.
[[2212.13993] Metaverse Communications, Networking, Security, and Applications: Research Issues, State-of-the-Art, and Future Directions](http://arxiv.org/abs/2212.13993) #security
Metaverse is an evolving orchestrator of the next-generation Internet architecture that produces an immersive and self-adapting virtual world in which humans perform activities similar to those in the real world, such as playing sports, doing work, and socializing. It is becoming a reality and is driven by ever-evolving advanced technologies such as extended reality, artificial intelligence, and blockchain. In this context, Metaverse will play an essential role in developing smart cities, which becomes more evident in the post COVID 19 pandemic metropolitan setting. However, the new paradigm imposes new challenges, such as developing novel privacy and security threats that can emerge in the digital Metaverse ecosystem. Moreover, it requires the convergence of several media types with the capability to quickly process massive amounts of data to keep the residents safe and well-informed, which can raise issues related to scalability and interoperability. In light of this, this research study aims to review the literature on the state of the art of integrating the Metaverse architecture concepts in smart cities. First, this paper presents the theoretical architecture of Metaverse and discusses international companies interest in this emerging technology. It also examines the notion of Metaverse relevant to virtual reality, identifies the prevalent threats, and determines the importance of communication infrastructure in information gathering for efficient Metaverse operation. Next, the notion of blockchain technologies is discussed regarding privacy preservation and how it can provide tamper-proof content sharing among Metaverse users. Finally, the application of distributed Metaverse for social good is highlighted.
[[2212.13791] StyleID: Identity Disentanglement for Anonymizing Faces](http://arxiv.org/abs/2212.13791) #privacy
Privacy of machine learning models is one of the remaining challenges that hinder the broad adoption of Artificial Intelligent (AI). This paper considers this problem in the context of image datasets containing faces. Anonymization of such datasets is becoming increasingly important due to their central role in the training of autonomous cars, for example, and the vast amount of data generated by surveillance systems. While most prior work de-identifies facial images by modifying identity features in pixel space, we instead project the image onto the latent space of a Generative Adversarial Network (GAN) model, find the features that provide the biggest identity disentanglement, and then manipulate these features in latent space, pixel space, or both. The main contribution of the paper is the design of a feature-preserving anonymization framework, StyleID, which protects the individuals' identity, while preserving as many characteristics of the original faces in the image dataset as possible. As part of the contribution, we present a novel disentanglement metric, three complementing disentanglement methods, and new insights into identity disentanglement. StyleID provides tunable privacy, has low computational complexity, and is shown to outperform current state-of-the-art solutions.
[[2212.13987] Encryption Mechanism And Resource Allocation Optimization Based On Edge Computing Environment](http://arxiv.org/abs/2212.13987) #privacy
A method for optimizing encryption mechanism and resource allocation based on edge computing environment is proposed. A local differential privacy algorithm based on a histogram algorithm is used to protect user information during task offloading, which allows accurate preservation of user contextual information while reducing interference with the playback decision. To efficiently offload tasks and improve offloading performance, a joint optimization algorithm for task offloading and resource allocation is proposed that optimizes overall latency. A balance will be found between privacy protection and task offloading accuracy. The impact of contextual data interference on task offloading decisions is minimized while ensuring a predefined level of privacy protection. In the concrete connected vehicle example, the method distributes tasks among roadside devices and neighboring vehicles with sufficient computational resources.
[[2212.13992] Social-Aware Clustered Federated Learning with Customized Privacy Preservation](http://arxiv.org/abs/2212.13992) #privacy
A key feature of federated learning (FL) is to preserve the data privacy of end users. However, there still exist potential privacy leakage in exchanging gradients under FL. As a result, recent research often explores the differential privacy (DP) approaches to add noises to the computing results to address privacy concerns with low overheads, which however degrade the model performance. In this paper, we strike the balance of data privacy and efficiency by utilizing the pervasive social connections between users. Specifically, we propose SCFL, a novel Social-aware Clustered Federated Learning scheme, where mutually trusted individuals can freely form a social cluster and aggregate their raw model updates (e.g., gradients) inside each cluster before uploading to the cloud for global aggregation. By mixing model updates in a social group, adversaries can only eavesdrop the social-layer combined results, but not the privacy of individuals. We unfold the design of SCFL in three steps. \emph{i) Stable social cluster formation. Considering users' heterogeneous training samples and data distributions, we formulate the optimal social cluster formation problem as a federation game and devise a fair revenue allocation mechanism to resist free-riders. ii) Differentiated trust-privacy mapping}. For the clusters with low mutual trust, we design a customizable privacy preservation mechanism to adaptively sanitize participants' model updates depending on social trust degrees. iii) Distributed convergence}. A distributed two-sided matching algorithm is devised to attain an optimized disjoint partition with Nash-stable convergence. Experiments on Facebook network and MNIST/CIFAR-10 datasets validate that our SCFL can effectively enhance learning utility, improve user payoff, and enforce customizable privacy protection.
[[2212.13599] Brain Cancer Segmentation Using YOLOv5 Deep Neural Network](http://arxiv.org/abs/2212.13599) #protect
An expansion of aberrant brain cells is referred to as a brain tumor. The brain's architecture is extremely intricate, with several regions controlling various nervous system processes. Any portion of the brain or skull can develop a brain tumor, including the brain's protective coating, the base of the skull, the brainstem, the sinuses, the nasal cavity, and many other places. Over the past ten years, numerous developments in the field of computer-aided brain tumor diagnosis have been made. Recently, instance segmentation has attracted a lot of interest in numerous computer vision applications. It seeks to assign various IDs to various scene objects, even if they are members of the same class. Typically, a two-stage pipeline is used to perform instance segmentation. This study shows brain cancer segmentation using YOLOv5. Yolo takes dataset as picture format and corresponding text file. You Only Look Once (YOLO) is a viral and widely used algorithm. YOLO is famous for its object recognition properties. You Only Look Once (YOLO) is a popular algorithm that has gone viral. YOLO is well known for its ability to identify objects. YOLO V2, V3, V4, and V5 are some of the YOLO latest versions that experts have published in recent years. Early brain tumor detection is one of the most important jobs that neurologists and radiologists have. However, it can be difficult and error-prone to manually identify and segment brain tumors from Magnetic Resonance Imaging (MRI) data. For making an early diagnosis of the condition, an automated brain tumor detection system is necessary. The model of the research paper has three classes. They are respectively Meningioma, Pituitary, Glioma. The results show that, our model achieves competitive accuracy, in terms of runtime usage of M2 10 core GPU.
[[2212.13675] XMAM:X-raying Models with A Matrix to Reveal Backdoor Attacks for Federated Learning](http://arxiv.org/abs/2212.13675) #attack
Federated Learning (FL) has received increasing attention due to its privacy protection capability. However, the base algorithm FedAvg is vulnerable when it suffers from so-called backdoor attacks. Former researchers proposed several robust aggregation methods. Unfortunately, many of these aggregation methods are unable to defend against backdoor attacks. What's more, the attackers recently have proposed some hiding methods that further improve backdoor attacks' stealthiness, making all the existing robust aggregation methods fail.
To tackle the threat of backdoor attacks, we propose a new aggregation method, X-raying Models with A Matrix (XMAM), to reveal the malicious local model updates submitted by the backdoor attackers. Since we observe that the output of the Softmax layer exhibits distinguishable patterns between malicious and benign updates, we focus on the Softmax layer's output in which the backdoor attackers are difficult to hide their malicious behavior. Specifically, like X-ray examinations, we investigate the local model updates by using a matrix as an input to get their Softmax layer's outputs. Then, we preclude updates whose outputs are abnormal by clustering. Without any training dataset in the server, the extensive evaluations show that our XMAM can effectively distinguish malicious local model updates from benign ones. For instance, when other methods fail to defend against the backdoor attacks at no more than 20% malicious clients, our method can tolerate 45% malicious clients in the black-box mode and about 30% in Projected Gradient Descent (PGD) mode. Besides, under adaptive attacks, the results demonstrate that XMAM can still complete the global model training task even when there are 40% malicious clients. Finally, we analyze our method's screening complexity, and the results show that XMAM is about 10-10000 times faster than the existing methods.
[[2212.13721] Emerging Mobile Phone-based Social Engineering Cyberattacks in the Zambian ICT Sector](http://arxiv.org/abs/2212.13721) #attack
The number of registered SIM cards and active mobile phone subscribers in Zambia in 2020 surpassed the population of the country. This clearly shows that mobile phones in Zambia have become part of everyday life easing not only the way people communicate but also the way people perform financial transactions owing to the integration of mobile phone systems with financial payment systems. This development has not come without a cost. Cyberattackers, using various social engineering techniques have jumped onto the bandwagon to defraud unsuspecting users. Considering the aforesaid, this paper presents a high-order analytical approach towards mobile phone-based social engineering cyberattacks (phishing, SMishing, and Vishing) in Zambia which seek to defraud benign victims. This paper presents a baseline study to reiterate the problem at hand. Furthermore, we devise an attack model and an evaluation framework and ascertain the most prevalent types of attack. We also present a logistic regression analysis in the results section to conclude the most prevalent mobile phone-based type of social engineering attack. Based on the artifacts and observed insights, we suggest recommendations to mitigate these emergent social engineering cyberattacks.
[[2212.13941] HeATed Alert Triage (HeAT): Transferrable Learning to Extract Multistage Attack Campaigns](http://arxiv.org/abs/2212.13941) #attack
With growing sophistication and volume of cyber attacks combined with complex network structures, it is becoming extremely difficult for security analysts to corroborate evidences to identify multistage campaigns on their network. This work develops HeAT (Heated Alert Triage): given a critical indicator of compromise (IoC), e.g., a severe IDS alert, HeAT produces a HeATed Attack Campaign (HAC) depicting the multistage activities that led up to the critical event. We define the concept of "Alert Episode Heat" to represent the analysts opinion of how much an event contributes to the attack campaign of the critical IoC given their knowledge of the network and security expertise. Leveraging a network-agnostic feature set, HeAT learns the essence of analyst's assessment of "HeAT" for a small set of IoC's, and applies the learned model to extract insightful attack campaigns for IoC's not seen before, even across networks by transferring what have been learned. We demonstrate the capabilities of HeAT with data collected in Collegiate Penetration Testing Competition (CPTC) and through collaboration with a real-world SOC. We developed HeAT-Gain metrics to demonstrate how analysts may assess and benefit from the extracted attack campaigns in comparison to common practices where IP addresses are used to corroborate evidences. Our results demonstrates the practical uses of HeAT by finding campaigns that span across diverse attack stages, remove a significant volume of irrelevant alerts, and achieve coherency to the analyst's original assessments.
[[2212.13991] Detection, Explanation and Filtering of Cyber Attacks Combining Symbolic and Sub-Symbolic Methods](http://arxiv.org/abs/2212.13991) #attack
Machine learning (ML) on graph-structured data has recently received deepened interest in the context of intrusion detection in the cybersecurity domain. Due to the increasing amounts of data generated by monitoring tools as well as more and more sophisticated attacks, these ML methods are gaining traction. Knowledge graphs and their corresponding learning techniques such as Graph Neural Networks (GNNs) with their ability to seamlessly integrate data from multiple domains using human-understandable vocabularies, are finding application in the cybersecurity domain. However, similar to other connectionist models, GNNs are lacking transparency in their decision making. This is especially important as there tend to be a high number of false positive alerts in the cybersecurity domain, such that triage needs to be done by domain experts, requiring a lot of man power. Therefore, we are addressing Explainable AI (XAI) for GNNs to enhance trust management by exploring combining symbolic and sub-symbolic methods in the area of cybersecurity that incorporate domain knowledge. We experimented with this approach by generating explanations in an industrial demonstrator system. The proposed method is shown to produce intuitive explanations for alerts for a diverse range of scenarios. Not only do the explanations provide deeper insights into the alerts, but they also lead to a reduction of false positive alerts by 66% and by 93% when including the fidelity metric.
[[2212.13994] Investigation and rectification of NIDS datasets and standratized feature set derivation for network attack detection with graph neural networks](http://arxiv.org/abs/2212.13994) #attack
Network Intrusion and Detection Systems (NIDS) are essential for malicious traffic and cyberattack detection in modern networks. Artificial intelligence-based NIDS are powerful tools that can learn complex data correlations for accurate attack prediction. Graph Neural Networks (GNNs) provide an opportunity to analyze network topology along with flow features which makes them particularly suitable for NIDS applications. However, successful application of such tool requires large amounts of carefully collected and labeled data for training and testing. In this paper we inspect different versions of ToN-IoT dataset and point out inconsistencies in some versions. We filter the full version of ToN-IoT and present a new version labeled ToN-IoT-R. To ensure generalization we propose a new standardized and compact set of flow features which are derived solely from NetFlowv5-compatible data. We separate numeric data and flags into different categories and propose a new dataset-agnostic normalization approach for numeric features. This allows us to preserve meaning of flow flags and we propose to conduct targeted analysis based on, for instance, network protocols. For flow classification we use E-GraphSage algorithm with modified node initialization technique that allows us to add node degree to node features. We achieve high classification accuracy on ToN-IoT-R and compare it with previously published results for ToN-IoT, NF-ToN-IoT, and NF-ToN-IoT-v2. We highlight the importance of careful data collection and labeling and appropriate data preprocessing choice and conclude that the proposed set of features is more applicable for real NIDS due to being less demanding to traffic monitoring equipment while preserving high flow classification accuracy.
[[2212.13607] EDoG: Adversarial Edge Detection For Graph Neural Networks](http://arxiv.org/abs/2212.13607) #attack
Graph Neural Networks (GNNs) have been widely applied to different tasks such as bioinformatics, drug design, and social networks. However, recent studies have shown that GNNs are vulnerable to adversarial attacks which aim to mislead the node or subgraph classification prediction by adding subtle perturbations. Detecting these attacks is challenging due to the small magnitude of perturbation and the discrete nature of graph data. In this paper, we propose a general adversarial edge detection pipeline EDoG without requiring knowledge of the attack strategies based on graph generation. Specifically, we propose a novel graph generation approach combined with link prediction to detect suspicious adversarial edges. To effectively train the graph generative model, we sample several sub-graphs from the given graph data. We show that since the number of adversarial edges is usually low in practice, with low probability the sampled sub-graphs will contain adversarial edges based on the union bound. In addition, considering the strong attacks which perturb a large number of edges, we propose a set of novel features to perform outlier detection as the preprocessing for our detection. Extensive experimental results on three real-world graph datasets including a private transaction rule dataset from a major company and two types of synthetic graphs with controlled properties show that EDoG can achieve above 0.8 AUC against four state-of-the-art unseen attack strategies without requiring any knowledge about the attack type; and around 0.85 with knowledge of the attack type. EDoG significantly outperforms traditional malicious edge detection baselines. We also show that an adaptive attack with full knowledge of our detection pipeline is difficult to bypass it.
[[2212.13415] Spacecraft Pose Estimation Based on Unsupervised Domain Adaptation and on a 3D-Guided Loss Combination](http://arxiv.org/abs/2212.13415) #robust
Spacecraft pose estimation is a key task to enable space missions in which two spacecrafts must navigate around each other. Current state-of-the-art algorithms for pose estimation employ data-driven techniques. However, there is an absence of real training data for spacecraft imaged in space conditions due to the costs and difficulties associated with the space environment. This has motivated the introduction of 3D data simulators, solving the issue of data availability but introducing a large gap between the training (source) and test (target) domains. We explore a method that incorporates 3D structure into the spacecraft pose estimation pipeline to provide robustness to intensity domain shift and we present an algorithm for unsupervised domain adaptation with robust pseudo-labelling. Our solution has ranked second in the two categories of the 2021 Pose Estimation Challenge organised by the European Space Agency and the Stanford University, achieving the lowest average error over the two categories.
[[2212.13439] Robust Cross-vendor Mammographic Texture Models Using Augmentation-based Domain Adaptation for Long-term Breast Cancer Risk](http://arxiv.org/abs/2212.13439) #robust
The future of population-based breast cancer screening is likely personalized strategies based on clinically relevant risk models. Mammography-based risk models should remain robust to domain shifts caused by different populations and mammographic devices. Modern risk models do not ensure adaptation across vendor-domains and are often conflated to unintentionally rely on both precursors of cancer and systemic/global mammographic information associated with short- and long-term risk, respectively, which might limit performance. We developed a robust, cross-vendor model for long-term risk assessment. An augmentation-based domain adaption technique, based on flavorization of mammographic views, ensured generalization to an unseen vendor-domain. We trained on samples without diagnosed/potential malignant findings to learn systemic/global breast tissue features, called mammographic texture, indicative of future breast cancer. However, training so may cause erratic convergence. By excluding noise-inducing samples and designing a case-control dataset, a robust ensemble texture model was trained. This model was validated in two independent datasets. In 66,607 Danish women with flavorized Siemens views, the AUC was 0.71 and 0.65 for prediction of interval cancers within two years (ICs) and from two years after screening (LTCs), respectively. In a combination with established risk factors, the model's AUC increased to 0.68 for LTCs. In 25,706 Dutch women with Hologic-processed views, the AUCs were not different from the AUCs in Danish women with flavorized views. The results suggested that the model robustly estimated long-term risk while adapting to an unseen processed vendor-domain. The model identified 8.1% of Danish women accounting for 20.9% of ICs and 14.2% of LTCs.
[[2212.13462] MVTN: Learning Multi-View Transformations for 3D Understanding](http://arxiv.org/abs/2212.13462) #robust
Multi-view projection techniques have shown themselves to be highly effective in achieving top-performing results in the recognition of 3D shapes. These methods involve learning how to combine information from multiple view-points. However, the camera view-points from which these views are obtained are often fixed for all shapes. To overcome the static nature of current multi-view techniques, we propose learning these view-points. Specifically, we introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition. As a result, MVTN can be trained end-to-end with any multi-view network for 3D shape classification. We integrate MVTN into a novel adaptive multi-view pipeline that is capable of rendering both 3D meshes and point clouds. Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks (ModelNet40, ScanObjectNN, ShapeNet Core55). Further analysis indicates that our approach exhibits improved robustness to occlusion compared to other methods. We also investigate additional aspects of MVTN, such as 2D pretraining and its use for segmentation. To support further research in this area, we have released MVTorch, a PyTorch library for 3D understanding and generation using multi-view projections.
[[2212.13726] A Clustering-guided Contrastive Fusion for Multi-view Representation Learning](http://arxiv.org/abs/2212.13726) #robust
The past two decades have seen increasingly rapid advances in the field of multi-view representation learning due to it extracting useful information from diverse domains to facilitate the development of multi-view applications. However, the community faces two challenges: i) how to learn robust representations from a large amount of unlabeled data to against noise or incomplete views setting, and ii) how to balance view consistency and complementary for various downstream tasks. To this end, we utilize a deep fusion network to fuse view-specific representations into the view-common representation, extracting high-level semantics for obtaining robust representation. In addition, we employ a clustering task to guide the fusion network to prevent it from leading to trivial solutions. For balancing consistency and complementary, then, we design an asymmetrical contrastive strategy that aligns the view-common representation and each view-specific representation. These modules are incorporated into a unified method known as CLustering-guided cOntrastiVE fusioN (CLOVEN). We quantitatively and qualitatively evaluate the proposed method on five datasets, demonstrating that CLOVEN outperforms 11 competitive multi-view learning methods in clustering and classification. In the incomplete view scenario, our proposed method resists noise interference better than those of our competitors. Furthermore, the visualization analysis shows that CLOVEN can preserve the intrinsic structure of view-specific representation while also improving the compactness of view-commom representation. Our source code will be available soon at https://github.com/guanzhou-ke/cloven.
[[2212.13945] A Segmentation Method for fluorescence images without a machine learning approach](http://arxiv.org/abs/2212.13945) #robust
Background: Image analysis applications in digital pathology include various methods for segmenting regions of interest. Their identification is one of the most complex steps, and therefore of great interest for the study of robust methods that do not necessarily rely on a machine learning (ML) approach. Method: A fully automatic and optimized segmentation process for different datasets is a prerequisite for classifying and diagnosing Indirect ImmunoFluorescence (IIF) raw data. This study describes a deterministic computational neuroscience approach for identifying cells and nuclei. It is far from the conventional neural network approach, but it is equivalent to their quantitative and qualitative performance, and it is also solid to adversative noise. The method is robust, based on formally correct functions, and does not suffer from tuning on specific data sets. Results: This work demonstrates the robustness of the method against the variability of parameters, such as image size, mode, and signal-to-noise ratio. We validated the method on two datasets (Neuroblastoma and NucleusSegData) using images annotated by independent medical doctors. Conclusions: The definition of deterministic and formally correct methods, from a functional to a structural point of view, guarantees the achievement of optimized and functionally correct results. The excellent performance of our deterministic method (NeuronalAlg) to segment cells and nuclei from fluorescence images was measured with quantitative indicators and compared with those achieved by three published ML approaches.
[[2212.13402] Traceable Automatic Feature Transformation via Cascading Actor-Critic Agents](http://arxiv.org/abs/2212.13402) #robust
Feature transformation for AI is an essential task to boost the effectiveness and interpretability of machine learning (ML). Feature transformation aims to transform original data to identify an optimal feature space that enhances the performances of a downstream ML model. Existing studies either combines preprocessing, feature selection, and generation skills to empirically transform data, or automate feature transformation by machine intelligence, such as reinforcement learning. However, existing studies suffer from: 1) high-dimensional non-discriminative feature space; 2) inability to represent complex situational states; 3) inefficiency in integrating local and global feature information. To fill the research gap, we formulate the feature transformation task as an iterative, nested process of feature generation and selection, where feature generation is to generate and add new features based on original features, and feature selection is to remove redundant features to control the size of feature space. Finally, we present extensive experiments and case studies to illustrate 24.7\% improvements in F1 scores compared with SOTAs and robustness in high-dimensional data.
[[2212.13669] Optimal algorithms for group distributionally robust optimization and beyond](http://arxiv.org/abs/2212.13669) #robust
Distributionally robust optimization (DRO) can improve the robustness and fairness of learning methods. In this paper, we devise stochastic algorithms for a class of DRO problems including group DRO, subpopulation fairness, and empirical conditional value at risk (CVaR) optimization. Our new algorithms achieve faster convergence rates than existing algorithms for multiple DRO settings. We also provide a new information-theoretic lower bound that implies our bounds are tight for group DRO. Empirically, too, our algorithms outperform known methods
[[2212.13792] Periocular Biometrics: A Modality for Unconstrained Scenarios](http://arxiv.org/abs/2212.13792) #biometric
Periocular refers to the region of the face that surrounds the eye socket. This is a feature-rich area that can be used by itself to determine the identity of an individual. It is especially useful when the iris or the face cannot be reliably acquired. This can be the case of unconstrained or uncooperative scenarios, where the face may appear partially occluded, or the subject-to-camera distance may be high. However, it has received revived attention during the pandemic due to masked faces, leaving the ocular region as the only visible facial area, even in controlled scenarios. This paper discusses the state-of-the-art of periocular biometrics, giving an overall framework of its most significant research aspects.
[[2212.13525] Cross-Resolution Flow Propagation for Foveated Video Super-Resolution](http://arxiv.org/abs/2212.13525) #extraction
The demand of high-resolution video contents has grown over the years. However, the delivery of high-resolution video is constrained by either computational resources required for rendering or network bandwidth for remote transmission. To remedy this limitation, we leverage the eye trackers found alongside existing augmented and virtual reality headsets. We propose the application of video super-resolution (VSR) technique to fuse low-resolution context with regional high-resolution context for resource-constrained consumption of high-resolution content without perceivable drop in quality. Eye trackers provide us the gaze direction of a user, aiding us in the extraction of the regional high-resolution context. As only pixels that falls within the gaze region can be resolved by the human eye, a large amount of the delivered content is redundant as we can't perceive the difference in quality of the region beyond the observed region. To generate a visually pleasing frame from the fusion of high-resolution region and low-resolution region, we study the capability of a deep neural network of transferring the context of the observed region to other regions (low-resolution) of the current and future frames. We label this task a Foveated Video Super-Resolution (FVSR), as we need to super-resolve the low-resolution regions of current and future frames through the fusion of pixels from the gaze region. We propose Cross-Resolution Flow Propagation (CRFP) for FVSR. We train and evaluate CRFP on REDS dataset on the task of 8x FVSR, i.e. a combination of 8x VSR and the fusion of foveated region. Departing from the conventional evaluation of per frame quality using SSIM or PSNR, we propose the evaluation of past foveated region, measuring the capability of a model to leverage the noise present in eye trackers during FVSR. Code is made available at https://github.com/eugenelet/CRFP.
[[2212.14033] How Do Deepfakes Move? Motion Magnification for Deepfake Source Detection](http://arxiv.org/abs/2212.14033) #extraction
With the proliferation of deep generative models, deepfakes are improving in quality and quantity everyday. However, there are subtle authenticity signals in pristine videos, not replicated by SOTA GANs. We contrast the movement in deepfakes and authentic videos by motion magnification towards building a generalized deepfake source detector. The sub-muscular motion in faces has different interpretations per different generative models which is reflected in their generative residue. Our approach exploits the difference between real motion and the amplified GAN fingerprints, by combining deep and traditional motion magnification, to detect whether a video is fake and its source generator if so. Evaluating our approach on two multi-source datasets, we obtain 97.17% and 94.03% for video source detection. We compare against the prior deepfake source detector and other complex architectures. We also analyze the importance of magnification amount, phase extraction window, backbone network architecture, sample counts, and sample lengths. Finally, we report our results for different skin tones to assess the bias.
[[2212.13258] Intelligent Feature Extraction, Data Fusion and Detection of Concrete Bridge Cracks: Current Development and Challenges](http://arxiv.org/abs/2212.13258) #extraction
As a common appearance defect of concrete bridges, cracks are important indices for bridge structure health assessment. Although there has been much research on crack identification, research on the evolution mechanism of bridge cracks is still far from practical applications. In this paper, the state-of-the-art research on intelligent theories and methodologies for intelligent feature extraction, data fusion and crack detection based on data-driven approaches is comprehensively reviewed. The research is discussed from three aspects: the feature extraction level of the multimodal parameters of bridge cracks, the description level and the diagnosis level of the bridge crack damage states. We focus on previous research concerning the quantitative characterization problems of multimodal parameters of bridge cracks and their implementation in crack identification, while highlighting some of their major drawbacks. In addition, the current challenges and potential future research directions are discussed.
[[2212.13904] A Novel Self-Supervised Learning-Based Anomaly Node Detection Method Based on an Autoencoder in Wireless Sensor Networks](http://arxiv.org/abs/2212.13904) #extraction
Due to the issue that existing wireless sensor network (WSN)-based anomaly detection methods only consider and analyze temporal features, in this paper, a self-supervised learning-based anomaly node detection method based on an autoencoder is designed. This method integrates temporal WSN data flow feature extraction, spatial position feature extraction and intermodal WSN correlation feature extraction into the design of the autoencoder to make full use of the spatial and temporal information of the WSN for anomaly detection. First, a fully connected network is used to extract the temporal features of nodes by considering a single mode from a local spatial perspective. Second, a graph neural network (GNN) is used to introduce the WSN topology from a global spatial perspective for anomaly detection and extract the spatial and temporal features of the data flows of nodes and their neighbors by considering a single mode. Then, the adaptive fusion method involving weighted summation is used to extract the relevant features between different models. In addition, this paper introduces a gated recurrent unit (GRU) to solve the long-term dependence problem of the time dimension. Eventually, the reconstructed output of the decoder and the hidden layer representation of the autoencoder are fed into a fully connected network to calculate the anomaly probability of the current system. Since the spatial feature extraction operation is advanced, the designed method can be applied to the task of large-scale network anomaly detection by adding a clustering operation. Experiments show that the designed method outperforms the baselines, and the F1 score reaches 90.6%, which is 5.2% higher than those of the existing anomaly detection methods based on unsupervised reconstruction and prediction. Code and model are available at https://github.com/GuetYe/anomaly_detection/GLSL
[[2212.13679] CCFL: Computationally Customized Federated Learning](http://arxiv.org/abs/2212.13679) #federate
Federated learning (FL) is a method to train model with distributed data from numerous participants such as IoT devices. It inherently assumes a uniform capacity among participants. However, participants have diverse computational resources in practice due to different conditions such as different energy budgets or executing parallel unrelated tasks. It is necessary to reduce the computation overhead for participants with inefficient computational resources, otherwise they would be unable to finish the full training process. To address the computation heterogeneity, in this paper we propose a strategy for estimating local models without computationally intensive iterations. Based on it, we propose Computationally Customized Federated Learning (CCFL), which allows each participant to determine whether to perform conventional local training or model estimation in each round based on its current computational resources. Both theoretical analysis and exhaustive experiments indicate that CCFL has the same convergence rate as FedAvg without resource constraints. Furthermore, CCFL can be viewed of a computation-efficient extension of FedAvg that retains model performance while considerably reducing computation overhead.
[[2212.13392] DeepCuts: Single-Shot Interpretability based Pruning for BERT](http://arxiv.org/abs/2212.13392) #interpretability
As language models have grown in parameters and layers, it has become much harder to train and infer with them on single GPUs. This is severely restricting the availability of large language models such as GPT-3, BERT-Large, and many others. A common technique to solve this problem is pruning the network architecture by removing transformer heads, fully-connected weights, and other modules. The main challenge is to discern the important parameters from the less important ones. Our goal is to find strong metrics for identifying such parameters. We thus propose two strategies: Cam-Cut based on the GradCAM interpretations, and Smooth-Cut based on the SmoothGrad, for calculating the importance scores. Through this work, we show that our scoring functions are able to assign more relevant task-based scores to the network parameters, and thus both our pruning approaches significantly outperform the standard weight and gradient-based strategies, especially at higher compression ratios in BERT-based models. We also analyze our pruning masks and find them to be significantly different from the ones obtained using standard metrics.
[[2212.13428] A Survey on Knowledge-Enhanced Pre-trained Language Models](http://arxiv.org/abs/2212.13428) #interpretability
Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges including poor interpretability, weak reasoning capability, and the need for a lot of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, \textit{\underline{K}nowledge-\underline{E}nhanced \underline{P}re-trained \underline{L}anguage \underline{M}odels} (KEPLMs) have the potential to overcome the above-mentioned limitations. In this paper, we examine KEPLMs systematically through a series of studies. Specifically, we outline the common types and different formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMS, present the applications of KEPLMs in downstream tasks, and discuss the future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field.
[[2212.13634] On the Equivalence of the Weighted Tsetlin Machine and the Perceptron](http://arxiv.org/abs/2212.13634) #interpretability
Tsetlin Machine (TM) has been gaining popularity as an inherently interpretable machine leaning method that is able to achieve promising performance with low computational complexity on a variety of applications. The interpretability and the low computational complexity of the TM are inherited from the Boolean expressions for representing various sub-patterns. Although possessing favorable properties, TM has not been the go-to method for AI applications, mainly due to its conceptual and theoretical differences compared with perceptrons and neural networks, which are more widely known and well understood. In this paper, we provide detailed insights for the operational concept of the TM, and try to bridge the gap in the theoretical understanding between the perceptron and the TM. More specifically, we study the operational concept of the TM following the analytical structure of perceptrons, showing the resemblance between the perceptrons and the TM. Through the analysis, we indicated that the TM's weight update can be considered as a special case of the gradient weight update. We also perform an empirical analysis of TM by showing the flexibility in determining the clause length, visualization of decision boundaries and obtaining interpretable boolean expressions from TM. In addition, we also discuss the advantages of TM in terms of its structure and its ability to solve more complex problems.
[[2212.13408] NEEDED: Introducing Hierarchical Transformer to Eye Diseases Diagnosis](http://arxiv.org/abs/2212.13408) #explainability
With the development of natural language processing techniques(NLP), automatic diagnosis of eye diseases using ophthalmology electronic medical records (OEMR) has become possible. It aims to evaluate the condition of both eyes of a patient respectively, and we formulate it as a particular multi-label classification task in this paper. Although there are a few related studies in other diseases, automatic diagnosis of eye diseases exhibits unique characteristics. First, descriptions of both eyes are mixed up in OEMR documents, with both free text and templated asymptomatic descriptions, resulting in sparsity and clutter of information. Second, OEMR documents contain multiple parts of descriptions and have long document lengths. Third, it is critical to provide explainability to the disease diagnosis model. To overcome those challenges, we present an effective automatic eye disease diagnosis framework, NEEDED. In this framework, a preprocessing module is integrated to improve the density and quality of information. Then, we design a hierarchical transformer structure for learning the contextualized representations of each sentence in the OEMR document. For the diagnosis part, we propose an attention-based predictor that enables traceable diagnosis by obtaining disease-specific information. Experiments on the real dataset and comparison with several baseline models show the advantage and explainability of our framework.
[[2212.13344] DiffFace: Diffusion-based Face Swapping with Facial Guidance](http://arxiv.org/abs/2212.13344) #diffusion
In this paper, we propose a diffusion-based face swapping framework for the first time, called DiffFace, composed of training ID conditional DDPM, sampling with facial guidance, and a target-preserving blending. In specific, in the training process, the ID conditional DDPM is trained to generate face images with the desired identity. In the sampling process, we use the off-the-shelf facial expert models to make the model transfer source identity while preserving target attributes faithfully. During this process, to preserve the background of the target image and obtain the desired face swapping result, we additionally propose a target-preserving blending strategy. It helps our model to keep the attributes of the target face from noise while transferring the source facial identity. In addition, without any re-training, our model can flexibly apply additional facial guidance and adaptively control the ID-attributes trade-off to achieve the desired results. To the best of our knowledge, this is the first approach that applies the diffusion model in face swapping task. Compared with previous GAN-based approaches, by taking advantage of the diffusion model for the face swapping task, DiffFace achieves better benefits such as training stability, high fidelity, diversity of the samples, and controllability. Extensive experiments show that our DiffFace is comparable or superior to the state-of-the-art methods on several standard face swapping benchmarks.
[[2212.13771] Exploring Vision Transformers as Diffusion Learners](http://arxiv.org/abs/2212.13771) #diffusion
Score-based diffusion models have captured widespread attention and funded fast progress of recent vision generative tasks. In this paper, we focus on diffusion model backbone which has been much neglected before. We systematically explore vision Transformers as diffusion learners for various generative tasks. With our improvements the performance of vanilla ViT-based backbone (IU-ViT) is boosted to be on par with traditional U-Net-based methods. We further provide a hypothesis on the implication of disentangling the generative backbone as an encoder-decoder structure and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with ASymmetriC ENcoder Decoder (ASCEND). Our improvements achieve competitive results on CIFAR-10, CelebA, LSUN, CUB Bird and large-resolution text-to-image tasks. To the best of our knowledge, we are the first to successfully train a single diffusion model on text-to-image task beyond 64x64 resolution. We hope this will motivate people to rethink the modeling choices and the training pipelines for diffusion-based generative models.