[[2306.07992] Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework](http://arxiv.org/abs/2306.07992) #secure
With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely used in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with those items. Attacks on VARS pose new security challenges to a wide range of applications such as e-Commerce and social networks where VARS are widely used. How to secure VARS from such adversarial attacks becomes a critical problem. Currently, there is still a lack of systematic study on how to design secure defense strategies against visual attacks on VARS. In this paper, we attempt to fill this gap by proposing an adversarial image reconstruction and detection framework to secure VARS. Our proposed method can simultaneously (1) secure VARS from adversarial attacks characterized by local perturbations by image reconstruction based on global vision transformers; and (2) accurately detect adversarial examples using a novel contrastive learning approach. Meanwhile, our framework is designed to be used as both a filter and a detector so that they can be jointly trained to improve the flexibility of our defense strategy to a variety of attacks and VARS models. We have conducted extensive experimental studies with two popular attack methods (FGSM and PGD). Our experimental results on two real-world datasets show that our defense strategy against visual attacks is effective and outperforms existing methods on different attacks. Moreover, our method can detect adversarial examples with high accuracy.
[[2306.08074] RETINA: Distributed and Secure Trust Management for Smart Grid Applications and Energy Trading](http://arxiv.org/abs/2306.08074) #secure
The rapid adoption of smart grids demands robust security and efficiency measures due to their critical role in delivering electricity and their potential for customer-oriented benefits. This paper presents an innovative framework, named RETINA, which provides a resilient and secure energy trading mechanism within smart grid systems. RETINA tackles the inherent security and infrastructure challenges in smart grids by establishing a trust-based security layer and facilitating energy transactions through blockchain technology. Our proposed solution integrates Public Key Infrastructure (PKI) and the Web of Trust (WoT) concepts, promoting decentralized communication channels and robust key management. We further introduce a smart contract-based energy trading mechanism that factors in trust, distance, and energy type (green or non-green) in cost calculation. The utility and robustness of RETINA have been validated in a virtualized testbed environment with 500 nodes, demonstrating superior performance in terms of scalability and resilience compared to the existing WoT scheme. Furthermore, RETINA successfully enables a secure and efficient energy trading scheme, promoting the use of renewable energy sources. Future enhancements will include application to a realistic smart grid deployment and the integration of additional functionalities. This groundbreaking solution has the potential to revolutionize the smart grid ecosystem, addressing its current limitations and propelling the industry towards a future of advanced and secure energy exchange.
[[2306.08134] Uncovering and Exploiting Hidden APIs in Mobile Super Apps](http://arxiv.org/abs/2306.08134) #secure
Mobile applications, particularly those from social media platforms such as WeChat and TikTok, are evolving into "super apps" that offer a wide range of services such as instant messaging and media sharing, e-commerce, e-learning, and e-government. These super apps often provide APIs for developers to create "miniapps" that run within the super app. These APIs should have been thoroughly scrutinized for security. Unfortunately, we find that many of them are undocumented and unsecured, potentially allowing miniapps to bypass restrictions and gain higher privileged access. To systematically identify these hidden APIs before they are exploited by attackers, we developed a tool APIScope with both static analysis and dynamic analysis, where static analysis is used to recognize hidden undocumented APIs, and dynamic analysis is used to confirm whether the identified APIs can be invoked by an unprivileged 3rdparty miniapps. We have applied APIScope to five popular super apps (i.e., WeChat, WeCom, Baidu, QQ, and Tiktok) and found that all of them contain hidden APIs, many of which can be exploited due to missing security checks. We have also quantified the hidden APIs that may have security implications by verifying if they have access to resources protected by Android permissions. Furthermore, we demonstrate the potential security hazards by presenting various attack scenarios, including unauthorized access to any web pages, downloading and installing malicious software, and stealing sensitive information. We have reported our findings to the relevant vendors, some of whom have patched the vulnerabilities and rewarded us with bug bounties.
[[2306.08151] Don't Leak Your Keys: Understanding, Measuring, and Exploiting the AppSecret Leaks in Mini-Programs](http://arxiv.org/abs/2306.08151) #secure
Mobile mini-programs in WeChat have gained significant popularity since their debut in 2017, reaching a scale similar to that of Android apps in the Play Store. Like Google, Tencent, the provider of WeChat, offers APIs to support the development of mini-programs and also maintains a mini-program market within the WeChat app. However, mini-program APIs often manage sensitive user data within the social network platform, both on the WeChat client app and in the cloud. As a result, cryptographic protocols have been implemented to secure data access. In this paper, we demonstrate that WeChat should have required the use of the "appsecret" master key, which is used to authenticate a mini-program, to be used only in the mini-program back-end. If this key is leaked in the front-end of the mini-programs, it can lead to catastrophic attacks on both mini-program developers and users. Using a mini-program crawler and a master key leakage inspector, we measured 3,450,586 crawled mini-programs and found that 40,880 of them had leaked their master keys, allowing attackers to carry out various attacks such as account hijacking, promotion abuse, and service theft. Similar issues were confirmed through testing and measuring of Baidu mini-programs too. We have reported these vulnerabilities and the list of vulnerable mini-programs to Tencent and Baidu, which awarded us with bug bounties, and also Tencent recently released a new API to defend against these attacks based on our findings.
[[2306.08271] Multiclass Confidence and Localization Calibration for Object Detection](http://arxiv.org/abs/2306.08271) #security
Albeit achieving high predictive accuracy across many challenging computer vision problems, recent studies suggest that deep neural networks (DNNs) tend to make overconfident predictions, rendering them poorly calibrated. Most of the existing attempts for improving DNN calibration are limited to classification tasks and restricted to calibrating in-domain predictions. Surprisingly, very little to no attempts have been made in studying the calibration of object detection methods, which occupy a pivotal space in vision-based security-sensitive, and safety-critical applications. In this paper, we propose a new train-time technique for calibrating modern object detection methods. It is capable of jointly calibrating multiclass confidence and box localization by leveraging their predictive uncertainties. We perform extensive experiments on several in-domain and out-of-domain detection benchmarks. Results demonstrate that our proposed train-time calibration method consistently outperforms several baselines in reducing calibration error for both in-domain and out-of-domain predictions. Our code and models are available at https://github.com/bimsarapathiraja/MCCL.
[[2306.07984] Cross Chain Bribery Contracts: Majority vs Mighty Minority](http://arxiv.org/abs/2306.07984) #security
Bribery is a perilous issue in the real world, especially in an economical aspect. This fraudulence is unavoidable, and more importantly, it is more difficult to trace in case smart contracts are utilized for bribing on a distributed public blockchain. In our paper, we propose a new threat to the security of a blockchain system, cross-chain bribery using smart contracts. An arbitrary wealthy briber can utilize cross-chain smart contracts to manipulate a consensus mechanism on a victim's blockchain or to disgrace a victim's blockchain. To better understand this threat, our paper proposes a framework to analyze bribery using cross-chain smart contracts. We analyze the amount of incentive to bribe rational miners in a victim's blockchain and also a full cost of conducting a cross-chain bribery attack. The result is that such attacks can be carried out with a reasonable amount of money or cryptocurrencies.
[[2306.07989] A Survey on Cross-Architectural IoT Malware Threat Hunting](http://arxiv.org/abs/2306.07989) #security
In recent years, the increase in non-Windows malware threats had turned the focus of the cybersecurity community. Research works on hunting Windows PE-based malwares are maturing, whereas the developments on Linux malware threat hunting are relatively scarce. With the advent of the Internet of Things (IoT) era, smart devices that are getting integrated into human life have become a hackers highway for their malicious activities. The IoT devices employ various Unix-based architectures that follow ELF (Executable and Linkable Format) as their standard binary file specification. This study aims at providing a comprehensive survey on the latest developments in cross-architectural IoT malware detection and classification approaches. Aided by a modern taxonomy, we discuss the feature representations, feature extraction techniques, and machine learning models employed in the surveyed works. We further provide more insights on the practical challenges involved in cross-architectural IoT malware threat hunting and discuss various avenues to instill potential future research.
[[2306.08056] Distributed Trust Through the Lens of Software Architecture](http://arxiv.org/abs/2306.08056) #security
Distributed trust is a nebulous concept that has evolved from different perspectives in recent years. While one can attribute its current prominence to blockchain and cryptocurrency, the distributed trust concept has been cultivating progress in federated learning, trustworthy and responsible AI in an ecosystem setting, data sharing, privacy issues across organizational boundaries, and zero trust cybersecurity. This paper will survey the concept of distributed trust in multiple disciplines. It will take a system/software architecture point of view to look at trust redistribution/shift and the associated tradeoffs in systems and applications enabled by distributed trust technologies.
[[2306.08060] Software Supply Chain Vulnerabilities Detection in Source Code: Performance Comparison between Traditional and Quantum Machine Learning Algorithms](http://arxiv.org/abs/2306.08060) #security
The software supply chain (SSC) attack has become one of the crucial issues that are being increased rapidly with the advancement of the software development domain. In general, SSC attacks execute during the software development processes lead to vulnerabilities in software products targeting downstream customers and even involved stakeholders. Machine Learning approaches are proven in detecting and preventing software security vulnerabilities. Besides, emerging quantum machine learning can be promising in addressing SSC attacks. Considering the distinction between traditional and quantum machine learning, performance could be varies based on the proportions of the experimenting dataset. In this paper, we conduct a comparative analysis between quantum neural networks (QNN) and conventional neural networks (NN) with a software supply chain attack dataset known as ClaMP. Our goal is to distinguish the performance between QNN and NN and to conduct the experiment, we develop two different models for QNN and NN by utilizing Pennylane for quantum and TensorFlow and Keras for traditional respectively. We evaluated the performance of both models with different proportions of the ClaMP dataset to identify the f1 score, recall, precision, and accuracy. We also measure the execution time to check the efficiency of both models. The demonstration result indicates that execution time for QNN is slower than NN with a higher percentage of datasets. Due to recent advancements in QNN, a large level of experiments shall be carried out to understand both models accurately in our future research.
[[2306.08168] Decentralizing Custodial Wallets with MFKDF](http://arxiv.org/abs/2306.08168) #security
The average cryptocurrency user today faces a difficult choice between centralized custodial wallets, which are notoriously prone to spontaneous collapse, or cumbersome self-custody solutions, which if not managed properly can cause a total loss of funds. In this paper, we present a "best of both worlds" cryptocurrency wallet design that looks like, and inherits the user experience of, a centralized custodial solution, while in fact being entirely decentralized in design and implementation. In our design, private keys are not stored on any device, but are instead derived directly from a user's authentication factors, such as passwords, soft tokens (e.g., Google Authenticator), hard tokens (e.g., YubiKey), or out-of-band authentication (e.g., SMS). Public parameters (salts, one-time pads, etc.) needed to access the wallet can be safely stored in public view, such as on a public blockchain, thereby providing strong availability guarantees. Users can then simply "log in" to their decentralized wallet on any device using standard credentials and even recover from lost credentials, thereby providing the usability of a custodial wallet with the trust and security of a decentralized approach.
[[2306.08178] ChatGPT vs](http://arxiv.org/abs/2306.08178) #security
This study, to the best of our knowledge, is the first to explore the intersection between lightweight cryptography (LWC) and advanced artificial intelligence (AI) language models. LWC, in particular the ASCON algorithm which has been selected as the LWC standard by the National Institute of Standards and Technology (NIST) in Feb. 2023, has become increasingly significant for preserving data security given the quick expansion and resource limitations of Internet of Things (IoT) devices. On the other hand, OpenAI's large language model (LLM) ChatGPT has demonstrated significant potential in producing complex, human-like text. This paper offers a novel method for implementing the NIST LWC standard, ASCON, using the GPT-4 model. Moreover, this paper details the design and functionality of ASCON, the procedures and actual Python implementation of ASCON using ChatGPT, and a discussion of the outcomes. The results contribute valuable insights into the efficient application of advanced AI language models in cryptography, particularly in constrained environments. Source code can be found at: https://github.com/DrCintas/ASCON-with-ChatGPT.
[[2306.08243] MMASD: A Multimodal Dataset for Autism Intervention Analysis](http://arxiv.org/abs/2306.08243) #privacy
Autism spectrum disorder (ASD) is a developmental disorder characterized by significant social communication impairments and difficulties perceiving and presenting communication cues. Machine learning techniques have been broadly adopted to facilitate autism studies and assessments. However, computational models are primarily concentrated on specific analysis and validated on private datasets in the autism community, which limits comparisons across models due to privacy-preserving data sharing complications. This work presents a novel privacy-preserving open-source dataset, MMASD as a MultiModal ASD benchmark dataset, collected from play therapy interventions of children with Autism. MMASD includes data from 32 children with ASD, and 1,315 data samples segmented from over 100 hours of intervention recordings. To promote public access, each data sample consists of four privacy-preserving modalities of data: (1) optical flow, (2) 2D skeleton, (3) 3D skeleton, and (4) clinician ASD evaluation scores of children, e.g., ADOS scores. MMASD aims to assist researchers and therapists in understanding children's cognitive status, monitoring their progress during therapy, and customizing the treatment plan accordingly. It also has inspiration for downstream tasks such as action quality assessment and interpersonal synchrony estimation. MMASD dataset can be easily accessed at https://github.com/Li-Jicheng/MMASD-A-Multimodal-Dataset-for-Autism-Intervention-Analysis.
[[2306.08126] PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer](http://arxiv.org/abs/2306.08126) #privacy
Personalized dialogue agents (DAs) powered by large pre-trained language models (PLMs) often rely on explicit persona descriptions to maintain personality consistency. However, such descriptions may not always be available or may pose privacy concerns. To tackle this bottleneck, we introduce PersonaPKT, a lightweight transfer learning approach that can build persona-consistent dialogue models without explicit persona descriptions. By representing each persona as a continuous vector, PersonaPKT learns implicit persona-specific features directly from a small number of dialogue samples produced by the same persona, adding less than 0.1% trainable parameters for each persona on top of the PLM backbone. Empirical results demonstrate that PersonaPKT effectively builds personalized DAs with high storage efficiency, outperforming various baselines in terms of persona consistency while maintaining good response generation quality. In addition, it enhances privacy protection by avoiding explicit persona descriptions. Overall, PersonaPKT is an effective solution for creating personalized DAs that respect user privacy.
[[2306.07973] PrivaScissors: Enhance the Privacy of Collaborative Inference through the Lens of Mutual Information](http://arxiv.org/abs/2306.07973) #privacy
Edge-cloud collaborative inference empowers resource-limited IoT devices to support deep learning applications without disclosing their raw data to the cloud server, thus preserving privacy. Nevertheless, prior research has shown that collaborative inference still results in the exposure of data and predictions from edge devices. To enhance the privacy of collaborative inference, we introduce a defense strategy called PrivaScissors, which is designed to reduce the mutual information between a model's intermediate outcomes and the device's data and predictions. We evaluate PrivaScissors's performance on several datasets in the context of diverse attacks and offer a theoretical robustness guarantee.
[[2306.08011] Privacy Inference-Empowered Stealthy Backdoor Attack on Federated Learning under Non-IID Scenarios](http://arxiv.org/abs/2306.08011) #privacy
Federated learning (FL) naturally faces the problem of data heterogeneity in real-world scenarios, but this is often overlooked by studies on FL security and privacy. On the one hand, the effectiveness of backdoor attacks on FL may drop significantly under non-IID scenarios. On the other hand, malicious clients may steal private data through privacy inference attacks. Therefore, it is necessary to have a comprehensive perspective of data heterogeneity, backdoor, and privacy inference. In this paper, we propose a novel privacy inference-empowered stealthy backdoor attack (PI-SBA) scheme for FL under non-IID scenarios. Firstly, a diverse data reconstruction mechanism based on generative adversarial networks (GANs) is proposed to produce a supplementary dataset, which can improve the attacker's local data distribution and support more sophisticated strategies for backdoor attacks. Based on this, we design a source-specified backdoor learning (SSBL) strategy as a demonstration, allowing the adversary to arbitrarily specify which classes are susceptible to the backdoor trigger. Since the PI-SBA has an independent poisoned data synthesis process, it can be integrated into existing backdoor attacks to improve their effectiveness and stealthiness in non-IID scenarios. Extensive experiments based on MNIST, CIFAR10 and Youtube Aligned Face datasets demonstrate that the proposed PI-SBA scheme is effective in non-IID FL and stealthy against state-of-the-art defense methods.
[[2306.08153] (Amplified) Banded Matrix Factorization: A unified approach to private training](http://arxiv.org/abs/2306.08153) #privacy
Matrix factorization (MF) mechanisms for differential privacy (DP) have substantially improved the state-of-the-art in privacy-utility-computation tradeoffs for ML applications in a variety of scenarios, but in both the centralized and federated settings there remain instances where either MF cannot be easily applied, or other algorithms provide better tradeoffs (typically, as $\epsilon$ becomes small).
In this work, we show how MF can subsume prior state-of-the-art algorithms in both federated and centralized training settings, across all privacy budgets. The key technique throughout is the construction of MF mechanisms with banded matrices. For cross-device federated learning (FL), this enables multiple-participations with a relaxed device participation schema compatible with practical FL infrastructure (as demonstrated by a production deployment). In the centralized setting, we prove that banded matrices enjoy the same privacy amplification results as for the ubiquitous DP-SGD algorithm, but can provide strictly better performance in most scenarios -- this lets us always at least match DP-SGD, and often outperform it even at $\epsilon\ll2$. Finally, $\hat{b}$-banded matrices substantially reduce the memory and time complexity of per-step noise generation from $\mathcal{O}(n)$, $n$ the total number of iterations, to a constant $\mathcal{O}(\hat{b})$, compared to general MF mechanisms.
[[2306.08170] Is Your Wallet Snitching On You? An Analysis on the Privacy Implications of Web3](http://arxiv.org/abs/2306.08170) #privacy
With the recent hype around the Metaverse and NFTs, Web3 is getting more and more popular. The goal of Web3 is to decentralize the web via decentralized applications. Wallets play a crucial role as they act as an interface between these applications and the user. Wallets such as MetaMask are being used by millions of users nowadays. Unfortunately, Web3 is often advertised as more secure and private. However, decentralized applications as well as wallets are based on traditional technologies, which are not designed with privacy of users in mind. In this paper, we analyze the privacy implications that Web3 technologies such as decentralized applications and wallets have on users. To this end, we build a framework that measures exposure of wallet information. First, we study whether information about installed wallets is being used to track users online. We analyze the top 100K websites and find evidence of 1,325 websites running scripts that probe whether users have wallets installed in their browser. Second, we measure whether decentralized applications and wallets leak the user's unique wallet address to third-parties. We intercept the traffic of 616 decentralized applications and 100 wallets and find over 2000 leaks across 211 applications and more than 300 leaks across 13 wallets. Our study shows that Web3 poses a threat to users' privacy and requires new designs towards more privacy-aware wallet architectures.
[[2306.08173] Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training](http://arxiv.org/abs/2306.08173) #privacy
The surge in multimodal AI's success has sparked concerns over data privacy in vision-and-language tasks. While CLIP has revolutionized multimodal learning through joint training on images and text, its potential to unintentionally disclose sensitive information necessitates the integration of privacy-preserving mechanisms. We introduce a differentially private adaptation of the Contrastive Language-Image Pretraining (CLIP) model that effectively addresses privacy concerns while retaining accuracy. Our proposed method, Dp-CLIP, is rigorously evaluated on benchmark datasets encompassing diverse vision-and-language tasks such as image classification and visual question answering. We demonstrate that our approach retains performance on par with the standard non-private CLIP model. Furthermore, we analyze our proposed algorithm under linear representation settings. We derive the convergence rate of our algorithm and show a trade-off between utility and privacy when gradients are clipped per-batch and the loss function does not satisfy smoothness conditions assumed in the literature for the analysis of DP-SGD.
[[2306.08223] Protecting User Privacy in Remote Conversational Systems: A Privacy-Preserving framework based on text sanitization](http://arxiv.org/abs/2306.08223) #privacy
Large Language Models (LLMs) are gaining increasing attention due to their exceptional performance across numerous tasks. As a result, the general public utilize them as an influential tool for boosting their productivity while natural language processing researchers endeavor to employ them in solving existing or new research problems. Unfortunately, individuals can only access such powerful AIs through APIs, which ultimately leads to the transmission of raw data to the models' providers and increases the possibility of privacy data leakage. Current privacy-preserving methods for cloud-deployed language models aim to protect privacy information in the pre-training dataset or during the model training phase. However, they do not meet the specific challenges presented by the remote access approach of new large-scale language models.
This paper introduces a novel task, "User Privacy Protection for Dialogue Models," which aims to safeguard sensitive user information from any possible disclosure while conversing with chatbots. We also present an evaluation scheme for this task, which covers evaluation metrics for privacy protection, data availability, and resistance to simulation attacks. Moreover, we propose the first framework for this task, namely privacy protection through text sanitization. Before sending the input to remote large models, it filters out the sensitive information, using several rounds of text sanitization based on privacy types that users define. Upon receiving responses from the larger model, our framework automatically restores privacy to ensure that the conversation goes smoothly, without intervention from the privacy filter. Experiments based on real-world datasets demonstrate the efficacy of our privacy-preserving approach against eavesdropping from potential attackers.
[[2306.08157] Causal Feature Engineering of Price Directions of Cryptocurrencies using Dynamic Bayesian Networks](http://arxiv.org/abs/2306.08157) #privacy
Cryptocurrencies have gained popularity across various sectors, especially in finance and investment. The popularity is partly due to their unique specifications originating from blockchain-related characteristics such as privacy, decentralisation, and untraceability. Despite their growing popularity, cryptocurrencies remain a high-risk investment due to their price volatility and uncertainty. The inherent volatility in cryptocurrency prices, coupled with internal cryptocurrency-related factors and external influential global economic factors makes predicting their prices and price movement directions challenging. Nevertheless, the knowledge obtained from predicting the direction of cryptocurrency prices can provide valuable guidance for investors in making informed investment decisions. To address this issue, this paper proposes a dynamic Bayesian network (DBN) approach, which can model complex systems in multivariate settings, to predict the price movement direction of five popular altcoins (cryptocurrencies other than Bitcoin) in the next trading day. The efficacy of the proposed model in predicting cryptocurrency price directions is evaluated from two perspectives. Firstly, our proposed approach is compared to two baseline models, namely an auto-regressive integrated moving average and support vector regression. Secondly, from a feature engineering point of view, the impact of twenty-three different features, grouped into four categories, on the DBN's prediction performance is investigated. The experimental results demonstrate that the DBN significantly outperforms the baseline models. In addition, among the groups of features, technical indicators are found to be the most effective predictors of cryptocurrency price directions.
[[2306.08127] Friend or Foe Inside? Exploring In-Process Isolation to Maintain Memory Safety for Unsafe Rust](http://arxiv.org/abs/2306.08127) #protect
Rust is a popular memory-safe systems programming language. In order to interact with hardware or call into non-Rust libraries, Rust provides \emph{unsafe} language features that shift responsibility for ensuring memory safety to the developer. Failing to do so, may lead to memory safety violations in unsafe code which can violate safety of the entire application. In this work we explore in-process isolation with Memory Protection Keys as a mechanism to shield safe program sections from safety violations that may happen in unsafe sections. Our approach is easy to use and comprehensive as it prevents heap and stack-based violations. We further compare process-based and in-process isolation mechanisms and the necessary requirements for data serialization, communication, and context switching. Our results show that in-process isolation can be effective and efficient, permits for a high degree of automation, and also enables a notion of application rewinding where the safe program section may detect and safely handle violations in unsafe code.
[[2306.07993] Trustworthy Artificial Intelligence Framework for Proactive Detection and Risk Explanation of Cyber Attacks in Smart Grid](http://arxiv.org/abs/2306.07993) #attack
The rapid growth of distributed energy resources (DERs), such as renewable energy sources, generators, consumers, and prosumers in the smart grid infrastructure, poses significant cybersecurity and trust challenges to the grid controller. Consequently, it is crucial to identify adversarial tactics and measure the strength of the attacker's DER. To enable a trustworthy smart grid controller, this work investigates a trustworthy artificial intelligence (AI) mechanism for proactive identification and explanation of the cyber risk caused by the control/status message of DERs. Thus, proposing and developing a trustworthy AI framework to facilitate the deployment of any AI algorithms for detecting potential cyber threats and analyzing root causes based on Shapley value interpretation while dynamically quantifying the risk of an attack based on Ward's minimum variance formula. The experiment with a state-of-the-art dataset establishes the proposed framework as a trustworthy AI by fulfilling the capabilities of reliability, fairness, explainability, transparency, reproducibility, and accountability.
[[2306.07997] Machine Learning Approach on Multiclass Classification of Internet Firewall Log Files](http://arxiv.org/abs/2306.07997) #attack
Firewalls are critical components in securing communication networks by screening all incoming (and occasionally exiting) data packets. Filtering is carried out by comparing incoming data packets to a set of rules designed to prevent malicious code from entering the network. To regulate the flow of data packets entering and leaving a network, an Internet firewall keeps a track of all activity. While the primary function of log files is to aid in troubleshooting and diagnostics, the information they contain is also very relevant to system audits and forensics. Firewalls primary function is to prevent malicious data packets from being sent. In order to better defend against cyberattacks and understand when and how malicious actions are influencing the internet, it is necessary to examine log files. As a result, the firewall decides whether to 'allow,' 'deny,' 'drop,' or 'reset-both' the incoming and outgoing packets. In this research, we apply various categorization algorithms to make sense of data logged by a firewall device. Harmonic mean F1 score, recall, and sensitivity measurement data with a 99% accuracy score in the random forest technique are used to compare the classifier's performance. To be sure, the proposed characteristics did significantly contribute to enhancing the firewall classification rate, as seen by the high accuracy rates generated by the other methods.
[[2306.08169] Multi-Factor Credential Hashing for Asymmetric Brute-Force Attack Resistance](http://arxiv.org/abs/2306.08169) #attack
Since the introduction of bcrypt in 1999, adaptive password hashing functions, whereby brute-force resistance increases symmetrically with computational difficulty for legitimate users, have been our most powerful post-breach countermeasure against credential disclosure. Unfortunately, the relatively low tolerance of users to added latency places an upper bound on the deployment of this technique in most applications. In this paper, we present a multi-factor credential hashing function (MFCHF) that incorporates the additional entropy of multi-factor authentication into password hashes to provide asymmetric resistance to brute-force attacks. MFCHF provides full backward compatibility with existing authentication software (e.g., Google Authenticator) and hardware (e.g., YubiKeys), with support for common usability features like factor recovery. The result is a 10^6 to 10^48 times increase in the difficulty of cracking hashed credentials, with little added latency or usability impact.
[[2306.07998] Contrastive Attention Networks for Attribution of Early Modern Print](http://arxiv.org/abs/2306.07998) #robust
In this paper, we develop machine learning techniques to identify unknown printers in early modern (c.~1500--1800) English printed books. Specifically, we focus on matching uniquely damaged character type-imprints in anonymously printed books to works with known printers in order to provide evidence of their origins. Until now, this work has been limited to manual investigations by analytical bibliographers. We present a Contrastive Attention-based Metric Learning approach to identify similar damage across character image pairs, which is sensitive to very subtle differences in glyph shapes, yet robust to various confounding sources of noise associated with digitized historical books. To overcome the scarce amount of supervised data, we design a random data synthesis procedure that aims to simulate bends, fractures, and inking variations induced by the early printing process. Our method successfully improves downstream damaged type-imprint matching among printed works from this period, as validated by in-domain human experts. The results of our approach on two important philosophical works from the Early Modern period demonstrate potential to extend the extant historical research about the origins and content of these books.
[[2306.08013] TopP\&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models](http://arxiv.org/abs/2306.08013) #robust
We propose a robust and reliable evaluation metric for generative models by introducing topological and statistical treatments for rigorous support estimation. Existing metrics, such as Inception Score (IS), Fr\'echet Inception Distance (FID), and the variants of Precision and Recall (P\&R), heavily rely on supports that are estimated from sample features. However, the reliability of their estimation has not been seriously discussed (and overlooked) even though the quality of the evaluation entirely depends on it. In this paper, we propose Topological Precision and Recall (TopP\&R, pronounced 'topper'), which provides a systematic approach to estimating supports, retaining only topologically and statistically important features with a certain level of confidence. This not only makes TopP\&R strong for noisy features, but also provides statistical consistency. Our theoretical and experimental results show that TopP\&R is robust to outliers and non-independent and identically distributed (Non-IID) perturbations, while accurately capturing the true trend of change in samples. To the best of our knowledge, this is the first evaluation metric focused on the robust estimation of the support and provides its statistical consistency under noise.
[[2306.08200] POP: Prompt Of Prompts for Continual Learning](http://arxiv.org/abs/2306.08200) #robust
Continual learning (CL) has attracted increasing attention in the recent past. It aims to mimic the human ability to learn new concepts without catastrophic forgetting. While existing CL methods accomplish this to some extent, they are still prone to semantic drift of the learned feature space. Foundation models, which are endowed with a robust feature representation, learned from very large datasets, provide an interesting substrate for the solution of the CL problem. Recent work has also shown that they can be adapted to specific tasks by prompt tuning techniques that leave the generality of the representation mostly unscathed. An open question is, however, how to learn both prompts that are task specific and prompts that are global, i.e. capture cross-task information. In this work, we propose the Prompt Of Prompts (POP) model, which addresses this goal by progressively learning a group of task-specified prompts and a group of global prompts, denoted as POP, to integrate information from the former. We show that a foundation model equipped with POP learning is able to outperform classic CL methods by a significant margin. Moreover, as prompt tuning only requires a small set of training samples, POP is able to perform CL in the few-shot setting, while still outperforming competing methods trained on the entire dataset.
[[2306.08257] On the Robustness of Latent Diffusion Models](http://arxiv.org/abs/2306.08257) #robust
Latent diffusion models achieve state-of-the-art performance on a variety of generative tasks, such as image synthesis and image editing. However, the robustness of latent diffusion models is not well studied. Previous works only focus on the adversarial attacks against the encoder or the output image under white-box settings, regardless of the denoising process. Therefore, in this paper, we aim to analyze the robustness of latent diffusion models more thoroughly. We first study the influence of the components inside latent diffusion models on their white-box robustness. In addition to white-box scenarios, we evaluate the black-box robustness of latent diffusion models via transfer attacks, where we consider both prompt-transfer and model-transfer settings and possible defense mechanisms. However, all these explorations need a comprehensive benchmark dataset, which is missing in the literature. Therefore, to facilitate the research of the robustness of latent diffusion models, we propose two automatic dataset construction pipelines for two kinds of image editing models and release the whole dataset. Our code and dataset are available at \url{https://github.com/jpzhang1810/LDM-Robustness}.
[[2306.08122] Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level](http://arxiv.org/abs/2306.08122) #robust
The increasing reliance on large language models (LLMs) in academic writing has led to a rise in plagiarism. Existing AI-generated text classifiers have limited accuracy and often produce false positives. We propose a novel approach using natural language processing (NLP) techniques, offering quantifiable metrics at both sentence and document levels for easier interpretation by human evaluators. Our method employs a multi-faceted approach, generating multiple paraphrased versions of a given question and inputting them into the LLM to generate answers. By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response. Our approach achieves up to 94% accuracy in classifying human and AI text, providing a robust and adaptable solution for plagiarism detection in academic settings. This method improves with LLM advancements, reducing the need for new model training or reconfiguration, and offers a more transparent way of evaluating and detecting AI-generated text.
[[2306.08055] Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training](http://arxiv.org/abs/2306.08055) #robust
Hyperparameter tuning of deep learning models can lead to order-of-magnitude performance gains for the same amount of compute. Despite this, systematic tuning is uncommon, particularly for large models, which are expensive to evaluate and tend to have many hyperparameters, necessitating difficult judgment calls about tradeoffs, budgets, and search bounds. To address these issues and propose a practical method for robustly tuning large models, we present Cost-Aware Pareto Region Bayesian Search (CARBS), a Bayesian optimization algorithm that performs local search around the performance-cost Pareto frontier. CARBS does well even in unbounded search spaces with many hyperparameters, learns scaling relationships so that it can tune models even as they are scaled up, and automates much of the "black magic" of tuning. Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline (PPO, as provided in the original ProcGen paper). We also reproduce the model size vs. training tokens scaling result from the Chinchilla project (Hoffmann et al. 2022), while simultaneously discovering scaling laws for every other hyperparameter, via an easy automated process that uses significantly less compute and is applicable to any deep learning problem (not just language models).
[[2306.08194] Learning on Graphs under Label Noise](http://arxiv.org/abs/2306.08194) #robust
Node classification on graphs is a significant task with a wide range of applications, including social analysis and anomaly detection. Even though graph neural networks (GNNs) have produced promising results on this task, current techniques often presume that label information of nodes is accurate, which may not be the case in real-world applications. To tackle this issue, we investigate the problem of learning on graphs with label noise and develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve it. Specifically, we employ graph contrastive learning as a regularization term, which promotes two views of augmented nodes to have consistent representations. Since this regularization term cannot utilize label information, it can enhance the robustness of node representations to label noise. Moreover, to detect noisy labels on the graph, we present a sample selection technique based on the homophily assumption, which identifies noisy nodes by measuring the consistency between the labels with their neighbors. Finally, we purify these confident noisy labels to permit efficient semantic graph learning. Extensive experiments on three well-known benchmark datasets demonstrate the superiority of our CGNN over competing approaches.
[[2306.08210] Uncertainty-Aware Robust Learning on Noisy Graphs](http://arxiv.org/abs/2306.08210) #robust
Graph neural networks have shown impressive capabilities in solving various graph learning tasks, particularly excelling in node classification. However, their effectiveness can be hindered by the challenges arising from the widespread existence of noisy measurements associated with the topological or nodal information present in real-world graphs. These inaccuracies in observations can corrupt the crucial patterns within the graph data, ultimately resulting in undesirable performance in practical applications. To address these issues, this paper proposes a novel uncertainty-aware graph learning framework motivated by distributionally robust optimization. Specifically, we use a graph neural network-based encoder to embed the node features and find the optimal node embeddings by minimizing the worst-case risk through a minimax formulation. Such an uncertainty-aware learning process leads to improved node representations and a more robust graph predictive model that effectively mitigates the impact of uncertainty arising from data noise. Our experimental result shows that the proposed framework achieves superior predictive performance compared to the state-of-the-art baselines under various noisy settings.
[[2306.08277] FRIGATE: Frugal Spatio-temporal Forecasting on Road Networks](http://arxiv.org/abs/2306.08277) #robust
Modelling spatio-temporal processes on road networks is a task of growing importance. While significant progress has been made on developing spatio-temporal graph neural networks (Gnns), existing works are built upon three assumptions that are not practical on real-world road networks. First, they assume sensing on every node of a road network. In reality, due to budget-constraints or sensor failures, all locations (nodes) may not be equipped with sensors. Second, they assume that sensing history is available at all installed sensors. This is unrealistic as well due to sensor failures, loss of packets during communication, etc. Finally, there is an assumption of static road networks. Connectivity within networks change due to road closures, constructions of new roads, etc. In this work, we develop FRIGATE to address all these shortcomings. FRIGATE is powered by a spatio-temporal Gnn that integrates positional, topological, and temporal information into rich inductive node representations. The joint fusion of this diverse information is made feasible through a novel combination of gated Lipschitz embeddings with Lstms. We prove that the proposed Gnn architecture is provably more expressive than message-passing Gnns used in state-of-the-art algorithms. The higher expressivity of FRIGATE naturally translates to superior empirical performance conducted on real-world network-constrained traffic data. In addition, FRIGATE is robust to frugal sensor deployment, changes in road network connectivity, and temporal irregularity in sensing.
[[2306.08002] Modification in Elliptic Curve Cryptography based Mutual authentication scheme for smart grid communication using biometric approach](http://arxiv.org/abs/2306.08002) #biometric
Smart grid is a self-sufficient system. That tracks how the energy is used from its source to its final destination. The smart grid can increase the service quality while reducing the consumption of electricity. However, the safety and confidentiality of information data is the major challenge in smart grid environment. To overcome this there are numerous authentication procedures that have been documented. The mutual authentication system for the smart grid that is based on elliptic curve cryptography and biometrics was thus introduced by A.A. Khan et al.s. This protocol is secure from various attacks. But we found an inability of password and biometric updating phase. Therefore we provided the password and biometric updating phase in this protocol.
[[2306.08004] Detection and classification of faults aimed at preventive maintenance of PV systems](http://arxiv.org/abs/2306.08004) #extraction
Diagnosis in PV systems aims to detect, locate and identify faults. Diagnosing these faults is vital to guarantee energy production and extend the useful life of PV power plants. In the literature, multiple machine learning approaches have been proposed for this purpose. However, few of these works have paid special attention to the detection of fine faults and the specialized process of extraction and selection of features for their classification. A fine fault is one whose characteristic signature is difficult to distinguish to that of a healthy panel. As a contribution to the detection of fine faults (especially of the snail trail type), this article proposes an innovative approach based on the Random Forest (RF) algorithm. This approach uses a complex feature extraction and selection method that improves the computational time of fault classification while maintaining high accuracy.
[[2306.08044] Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care](http://arxiv.org/abs/2306.08044) #extraction
Most medical treatment decisions are sequential in nature. Hence, there is substantial hope that reinforcement learning may make it possible to formulate precise data-driven treatment plans. However, a key challenge for most applications in this field is the sparse nature of primarily mortality-based reward functions, leading to decreased stability of offline estimates. In this work, we introduce a deep Q-learning approach able to obtain more reliable critical care policies. This method integrates relevant but noisy intermediate biomarker signals into the reward specification, without compromising the optimization of the main outcome of interest (e.g. patient survival). We achieve this by first pruning the action set based on all available rewards, and second training a final model based on the sparse main reward but with a restricted action set. By disentangling accurate and approximated rewards through action pruning, potential distortions of the main objective are minimized, all while enabling the extraction of valuable information from intermediate signals that can guide the learning process. We evaluate our method in both off-policy and offline settings using simulated environments and real health records of patients in intensive care units. Our empirical results indicate that pruning significantly reduces the size of the action space while staying mostly consistent with the actions taken by physicians, outperforming the current state-of-the-art offline reinforcement learning method conservative Q-learning. Our work is a step towards developing reliable policies by effectively harnessing the wealth of available information in data-intensive critical care environments.
[[2306.07974] Chainlet Orbits: Topological Address Embedding for the Bitcoin Blockchain](http://arxiv.org/abs/2306.07974) #interpretability
The rise of cryptocurrencies like Bitcoin, which enable transactions with a degree of pseudonymity, has led to a surge in various illicit activities, including ransomware payments and transactions on darknet markets. These illegal activities often utilize Bitcoin as the preferred payment method. However, current tools for detecting illicit behavior either rely on a few heuristics and laborious data collection processes or employ computationally inefficient graph neural network (GNN) models that are challenging to interpret.
To overcome the computational and interpretability limitations of existing techniques, we introduce an effective solution called Chainlet Orbits. This approach embeds Bitcoin addresses by leveraging their topological characteristics in transactions. By employing our innovative address embedding, we investigate e-crime in Bitcoin networks by focusing on distinctive substructures that arise from illicit behavior.
The results of our node classification experiments demonstrate superior performance compared to state-of-the-art methods, including both topological and GNN-based approaches. Moreover, our approach enables the use of interpretable and explainable machine learning models in as little as 15 minutes for most days on the Bitcoin transaction network.
[[2306.08232] Curricular Subgoals for Inverse Reinforcement Learning](http://arxiv.org/abs/2306.08232) #interpretability
Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning, and has demonstrated its remarkable success in imitation learning. To promote expert-like behavior, existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert. However, these global designs are still limited by the redundant noise and error propagation problems, leading to the unsuitable reward assignment and thus downgrading the agent capability in complex multi-stage tasks. In this paper, we propose a novel Curricular Subgoal-based Inverse Reinforcement Learning (CSIRL) framework, that explicitly disentangles one task with several local subgoals to guide agent imitation. Specifically, CSIRL firstly introduces decision uncertainty of the trained agent over expert trajectories to dynamically select subgoals, which directly determines the exploration boundary of different task stages. To further acquire local reward functions for each stage, we customize a meta-imitation objective based on these curricular subgoals to train an intrinsic reward generator. Experiments on the D4RL and autonomous driving benchmarks demonstrate that the proposed methods yields results superior to the state-of-the-art counterparts, as well as better interpretability. Our code is available at https://github.com/Plankson/CSIRL.
[[2306.08259] LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting](http://arxiv.org/abs/2306.08259) #interpretability
Traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning in capturing non-linear patterns of traffic data. However, the promising results achieved on current public datasets may not be applicable to practical scenarios due to limitations within these datasets. First, the limited sizes of them may not reflect the real-world scale of traffic networks. Second, the temporal coverage of these datasets is typically short, posing hurdles in studying long-term patterns and acquiring sufficient samples for training deep models. Third, these datasets often lack adequate metadata for sensors, which compromises the reliability and interpretability of the data. To mitigate these limitations, we introduce the LargeST benchmark dataset. It encompasses a total number of 8,600 sensors with a 5-year time coverage and includes comprehensive metadata. Using LargeST, we perform in-depth data analysis to extract data insights, benchmark well-known baselines in terms of their performance and efficiency, and identify challenges as well as opportunities for future research. We release the datasets and baseline implementations at: https://github.com/liuxu77/LargeST.
[[2306.08068] DORSal: Diffusion for Object-centric Representations of Scenes $\textit{et al](http://arxiv.org/abs/2306.08068) #diffusion
Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of input images, and controllable scene generation that supports editing, is now possible. However, training jointly on a large number of scenes typically compromises rendering quality when compared to single-scene optimized models such as NeRFs. In this paper, we leverage recent progress in diffusion models to equip 3D scene representation learning models with the ability to render high-fidelity novel views, while retaining benefits such as object-level scene editing to a large degree. In particular, we propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on object-centric slot-based representations of scenes. On both complex synthetic multi-object scenes and on the real-world large-scale Street View dataset, we show that DORSal enables scalable neural rendering of 3D scenes with object-level editing and improves upon existing approaches.
[[2306.08103] Adding 3D Geometry Control to Diffusion Models](http://arxiv.org/abs/2306.08103) #diffusion
Diffusion models have emerged as a powerful method of generative modeling across a range of fields, capable of producing stunning photo-realistic images from natural language descriptions. However, these models lack explicit control over the 3D structure of the objects in the generated images. In this paper, we propose a novel method that incorporates 3D geometry control into diffusion models, making them generate even more realistic and diverse images. To achieve this, our method exploits ControlNet, which extends diffusion models by using visual prompts in addition to text prompts. We generate images of 3D objects taken from a 3D shape repository (e.g., ShapeNet and Objaverse), render them from a variety of poses and viewing directions, compute the edge maps of the rendered images, and use these edge maps as visual prompts to generate realistic images. With explicit 3D geometry control, we can easily change the 3D structures of the objects in the generated images and obtain ground-truth 3D annotations automatically. This allows us to use the generated images to improve a lot of vision tasks, e.g., classification and 3D pose estimation, in both in-distribution (ID) and out-of-distribution (OOD) settings. We demonstrate the effectiveness of our method through extensive experiments on ImageNet-50, ImageNet-R, PASCAL3D+, ObjectNet3D, and OOD-CV datasets. The results show that our method significantly outperforms existing methods across multiple benchmarks (e.g., 4.6 percentage points on ImageNet-50 using ViT and 3.5 percentage points on PASCAL3D+ and ObjectNet3D using NeMo).
[[2306.08247] Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation](http://arxiv.org/abs/2306.08247) #diffusion
Text-to-Image (T2I) generation with diffusion models allows users to control
the semantic content in the synthesized images given text conditions. As a
further step toward a more customized image creation application, we introduce
a new multi-modality generation setting that synthesizes images based on not
only the semantic-level textual input but also on the pixel-level visual
conditions. Existing literature first converts the given visual information to
semantic-level representation by connecting it to languages, and then
incorporates it into the original denoising process. Seemingly intuitive, such
methodological design loses the pixel values during the semantic transition,
thus failing to fulfill the task scenario where the preservation of low-level
vision is desired (e.g., ID of a given face image). To this end, we propose
Cyclic One-Way Diffusion (COW), a training-free framework for creating
customized images with respect to semantic text and pixel-visual conditioning.
Notably, we observe that sub-regions of an image impose mutual interference,
just like physical diffusion, to achieve ultimate harmony along the denoising
trajectory. Thus we propose to repetitively utilize the given visual condition
in a cyclic way, by planting the visual condition as a high-concentration
seed'' at the initialization step of the denoising process, and
diffuse''
it into a harmonious picture by controlling a one-way information flow from the
visual condition. We repeat the destroy-and-construct process multiple times to
gradually but steadily impose the internal diffusion process within the image.
Experiments on the challenging one-shot face and text-conditioned image
synthesis task demonstrate our superiority in terms of speed, image quality,
and conditional fidelity compared to learning-based text-vision conditional
methods.
[[2306.08251] GBSD: Generative Bokeh with Stage Diffusion](http://arxiv.org/abs/2306.08251) #diffusion
The bokeh effect is an artistic technique that blurs out-of-focus areas in a photograph and has gained interest due to recent developments in text-to-image synthesis and the ubiquity of smart-phone cameras and photo-sharing apps. Prior work on rendering bokeh effects have focused on post hoc image manipulation to produce similar blurring effects in existing photographs using classical computer graphics or neural rendering techniques, but have either depth discontinuity artifacts or are restricted to reproducing bokeh effects that are present in the training data. More recent diffusion based models can synthesize images with an artistic style, but either require the generation of high-dimensional masks, expensive fine-tuning, or affect global image characteristics. In this paper, we present GBSD, the first generative text-to-image model that synthesizes photorealistic images with a bokeh style. Motivated by how image synthesis occurs progressively in diffusion models, our approach combines latent diffusion models with a 2-stage conditioning algorithm to render bokeh effects on semantically defined objects. Since we can focus the effect on objects, this semantic bokeh effect is more versatile than classical rendering techniques. We evaluate GBSD both quantitatively and qualitatively and demonstrate its ability to be applied in both text-to-image and image-to-image settings.
[[2306.08276] TryOnDiffusion: A Tale of Two UNets](http://arxiv.org/abs/2306.08276) #diffusion
Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic detail-preserving visualization of the garment, while warping the garment to accommodate a significant body pose and shape change across the subjects. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body change in a single network. The key ideas behind Parallel-UNet include: 1) garment is warped implicitly via a cross attention mechanism, 2) garment warp and person blend happen as part of a unified process as opposed to a sequence of two separate tasks. Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.
[[2306.08183] ZeroForge: Feedforward Text-to-Shape Without 3D Supervision](http://arxiv.org/abs/2306.08183) #data-free
Current state-of-the-art methods for text-to-shape generation either require supervised training using a labeled dataset of pre-defined 3D shapes, or perform expensive inference-time optimization of implicit neural representations. In this work, we present ZeroForge, an approach for zero-shot text-to-shape generation that avoids both pitfalls. To achieve open-vocabulary shape generation, we require careful architectural adaptation of existing feed-forward approaches, as well as a combination of data-free CLIP-loss and contrastive losses to avoid mode collapse. Using these techniques, we are able to considerably expand the generative ability of existing feed-forward text-to-shape models such as CLIP-Forge. We support our method via extensive qualitative and quantitative evaluations
[[2306.08009] DHBE: Data-free Holistic Backdoor Erasing in Deep Neural Networks via Restricted Adversarial Distillation](http://arxiv.org/abs/2306.08009) #data-free
Backdoor attacks have emerged as an urgent threat to Deep Neural Networks (DNNs), where victim DNNs are furtively implanted with malicious neurons that could be triggered by the adversary. To defend against backdoor attacks, many works establish a staged pipeline to remove backdoors from victim DNNs: inspecting, locating, and erasing. However, in a scenario where a few clean data can be accessible, such pipeline is fragile and cannot erase backdoors completely without sacrificing model accuracy. To address this issue, in this paper, we propose a novel data-free holistic backdoor erasing (DHBE) framework. Instead of the staged pipeline, the DHBE treats the backdoor erasing task as a unified adversarial procedure, which seeks equilibrium between two different competing processes: distillation and backdoor regularization. In distillation, the backdoored DNN is distilled into a proxy model, transferring its knowledge about clean data, yet backdoors are simultaneously transferred. In backdoor regularization, the proxy model is holistically regularized to prevent from infecting any possible backdoor transferred from distillation. These two processes jointly proceed with data-free adversarial optimization until a clean, high-accuracy proxy model is obtained. With the novel adversarial design, our framework demonstrates its superiority in three aspects: 1) minimal detriment to model accuracy, 2) high tolerance for hyperparameters, and 3) no demand for clean data. Extensive experiments on various backdoor attacks and datasets are performed to verify the effectiveness of the proposed framework. Code is available at \url{https://github.com/yanzhicong/DHBE}
[[2306.08045] Efficient 3D Semantic Segmentation with Superpoint Transformer](http://arxiv.org/abs/2306.08045) #transformer
We introduce a novel superpoint-based transformer architecture for efficient semantic segmentation of large-scale 3D scenes. Our method incorporates a fast algorithm to partition point clouds into a hierarchical superpoint structure, which makes our preprocessing 7 times times faster than existing superpoint-based approaches. Additionally, we leverage a self-attention mechanism to capture the relationships between superpoints at multiple scales, leading to state-of-the-art performance on three challenging benchmark datasets: S3DIS (76.0% mIoU 6-fold validation), KITTI-360 (63.5% on Val), and DALES (79.6%). With only 212k parameters, our approach is up to 200 times more compact than other state-of-the-art models while maintaining similar performance. Furthermore, our model can be trained on a single GPU in 3 hours for a fold of the S3DIS dataset, which is 7x to 70x fewer GPU-hours than the best-performing methods. Our code and models are accessible at github.com/drprojects/superpoint_transformer.
[[2306.08073] Dynamic Clustering Transformer Network for Point Cloud Segmentation](http://arxiv.org/abs/2306.08073) #transformer
Point cloud segmentation is one of the most important tasks in computer vision with widespread scientific, industrial, and commercial applications. The research thereof has resulted in many breakthroughs in 3D object and scene understanding. Previous methods typically utilized hierarchical architectures for feature representation. However, the commonly used sampling and grouping methods in hierarchical networks are only based on point-wise three-dimensional coordinates, ignoring local semantic homogeneity of point clusters. Additionally, the prevalent Farthest Point Sampling (FPS) method is often a computational bottleneck. To address these issues, we propose a novel 3D point cloud representation network, called Dynamic Clustering Transformer Network (DCTNet). It has an encoder-decoder architecture, allowing for both local and global feature learning. Specifically, we propose novel semantic feature-based dynamic sampling and clustering methods in the encoder, which enables the model to be aware of local semantic homogeneity for local feature aggregation. Furthermore, in the decoder, we propose an efficient semantic feature-guided upsampling method. Our method was evaluated on an object-based dataset (ShapeNet), an urban navigation dataset (Toronto-3D), and a multispectral LiDAR dataset, verifying the performance of DCTNet across a wide variety of practical engineering applications. The inference speed of DCTNet is 3.8-16.8$\times$ faster than existing State-of-the-Art (SOTA) models on the ShapeNet dataset, while achieving an instance-wise mIoU of $86.6\%$, the current top score. Our method similarly outperforms previous methods on the other datasets, verifying it as the new State-of-the-Art in point cloud segmentation.
[[2306.08226] CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration](http://arxiv.org/abs/2306.08226) #generative
This paper presents CLIPXPlore, a new framework that leverages a vision-language model to guide the exploration of the 3D shape space. Many recent methods have been developed to encode 3D shapes into a learned latent shape space to enable generative design and modeling. Yet, existing methods lack effective exploration mechanisms, despite the rich information. To this end, we propose to leverage CLIP, a powerful pre-trained vision-language model, to aid the shape-space exploration. Our idea is threefold. First, we couple the CLIP and shape spaces by generating paired CLIP and shape codes through sketch images and training a mapper network to connect the two spaces. Second, to explore the space around a given shape, we formulate a co-optimization strategy to search for the CLIP code that better matches the geometry of the shape. Third, we design three exploration modes, binary-attribute-guided, text-guided, and sketch-guided, to locate suitable exploration trajectories in shape space and induce meaningful changes to the shape. We perform a series of experiments to quantitatively and visually compare CLIPXPlore with different baselines in each of the three exploration modes, showing that CLIPXPlore can produce many meaningful exploration results that cannot be achieved by the existing solutions.
[[2306.08230] Unbiased Learning of Deep Generative Models with Structured Discrete Representations](http://arxiv.org/abs/2306.08230) #generative
By composing graphical models with deep learning architectures, we learn generative models with the strengths of both frameworks. The structured variational autoencoder (SVAE) inherits structure and interpretability from graphical models, and flexible likelihoods for high-dimensional data from deep learning, but poses substantial optimization challenges. We propose novel algorithms for learning SVAEs, and are the first to demonstrate the SVAE's ability to handle multimodal uncertainty when data is missing by incorporating discrete latent variables. Our memory-efficient implicit differentiation scheme makes the SVAE tractable to learn via gradient descent, while demonstrating robustness to incomplete optimization. To more rapidly learn accurate graphical model parameters, we derive a method for computing natural gradients without manual derivations, which avoids biases found in prior work. These optimization innovations enable the first comparisons of the SVAE to state-of-the-art time series models, where the SVAE performs competitively while learning interpretable and structured discrete data representations.
[[2306.08129] AVIS: Autonomous Visual Information Seeking with Large Language Models](http://arxiv.org/abs/2306.08129) #large language model
In this paper, we propose an autonomous information seeking visual question answering framework, AVIS. Our method leverages a Large Language Model (LLM) to dynamically strategize the utilization of external tools and to investigate their outputs, thereby acquiring the indispensable knowledge needed to provide answers to the posed questions. Responding to visual questions that necessitate external knowledge, such as "What event is commemorated by the building depicted in this image?", is a complex task. This task presents a combinatorial search space that demands a sequence of actions, including invoking APIs, analyzing their responses, and making informed decisions. We conduct a user study to collect a variety of instances of human decision-making when faced with this task. This data is then used to design a system comprised of three components: an LLM-powered planner that dynamically determines which tool to use next, an LLM-powered reasoner that analyzes and extracts key information from the tool outputs, and a working memory component that retains the acquired information throughout the process. The collected user behavior serves as a guide for our system in two key ways. First, we create a transition graph by analyzing the sequence of decisions made by users. This graph delineates distinct states and confines the set of actions available at each state. Second, we use examples of user decision-making to provide our LLM-powered planner and reasoner with relevant contextual instances, enhancing their capacity to make informed decisions. We show that AVIS achieves state-of-the-art results on knowledge-intensive visual question answering benchmarks such as Infoseek and OK-VQA.
[[2306.08107] AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks](http://arxiv.org/abs/2306.08107) #large language model
The fields of both Natural Language Processing (NLP) and Automated Machine Learning (AutoML) have achieved remarkable results over the past years. In NLP, especially Large Language Models (LLMs) have experienced a rapid series of breakthroughs very recently. We envision that the two fields can radically push the boundaries of each other through tight integration. To showcase this vision, we explore the potential of a symbiotic relationship between AutoML and LLMs, shedding light on how they can benefit each other. In particular, we investigate both the opportunities to enhance AutoML approaches with LLMs from different perspectives and the challenges of leveraging AutoML to further improve LLMs. To this end, we survey existing work, and we critically assess risks. We strongly believe that the integration of the two fields has the potential to disrupt both fields, NLP and AutoML. By highlighting conceivable synergies, but also risks, we aim to foster further exploration at the intersection of AutoML and LLMs.
[[2306.08161] h2oGPT: Democratizing Large Language Models](http://arxiv.org/abs/2306.08161) #large language model
Foundation Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their real-world applications though natural language processing. However, they also pose many significant risks such as the presence of biased, private, or harmful text, and the unauthorized inclusion of copyrighted material.
We introduce h2oGPT, a suite of open-source code repositories for the creation and use of Large Language Models (LLMs) based on Generative Pretrained Transformers (GPTs). The goal of this project is to create the world's best truly open-source alternative to closed-source GPTs. In collaboration with and as part of the incredible and unstoppable open-source community, we open-source several fine-tuned h2oGPT models from 7 to 40 Billion parameters, ready for commercial use under fully permissive Apache 2.0 licenses. Included in our release is 100% private document search using natural language.
Open-source language models help boost AI development and make it more accessible and trustworthy. They lower entry hurdles, allowing people and groups to tailor these models to their needs. This openness increases innovation, transparency, and fairness. An open-source strategy is needed to share AI benefits fairly, and H2O.ai will continue to democratize AI and LLMs.
[[2306.08162] INT2](http://arxiv.org/abs/2306.08162) #large language model
We introduce a method that dramatically reduces fine-tuning VRAM requirements and rectifies quantization errors in quantized Large Language Models. First, we develop an extremely memory-efficient fine-tuning (EMEF) method for quantized models using Low-Rank Adaptation (LoRA), and drawing upon it, we construct an error-correcting algorithm designed to minimize errors induced by the quantization process. Our method reduces the memory requirements by up to 5.6 times, which enables fine-tuning a 7 billion parameter Large Language Model (LLM) on consumer laptops. At the same time, we propose a Low-Rank Error Correction (LREC) method that exploits the added LoRA layers to ameliorate the gap between the quantized model and its float point counterpart. Our error correction framework leads to a fully functional INT2 quantized LLM with the capacity to generate coherent English text. To the best of our knowledge, this is the first INT2 Large Language Model that has been able to reach such a performance. The overhead of our method is merely a 1.05 times increase in model size, which translates to an effective precision of INT2.1. Also, our method readily generalizes to other quantization standards, such as INT3, INT4, and INT8, restoring their lost performance, which marks a significant milestone in the field of model quantization. The strategies delineated in this paper hold promising implications for the future development and optimization of quantized models, marking a pivotal shift in the landscape of low-resource machine learning computations.
[[2306.08190] Assessing the Effectiveness of GPT-3 in Detecting False Political Statements: A Case Study on the LIAR Dataset](http://arxiv.org/abs/2306.08190) #large language model
The detection of political fake statements is crucial for maintaining information integrity and preventing the spread of misinformation in society. Historically, state-of-the-art machine learning models employed various methods for detecting deceptive statements. These methods include the use of metadata (W. Wang et al., 2018), n-grams analysis (Singh et al., 2021), and linguistic (Wu et al., 2022) and stylometric (Islam et al., 2020) features. Recent advancements in large language models, such as GPT-3 (Brown et al., 2020) have achieved state-of-the-art performance on a wide range of tasks. In this study, we conducted experiments with GPT-3 on the LIAR dataset (W. Wang et al., 2018) and achieved higher accuracy than state-of-the-art models without using any additional meta or linguistic features. Additionally, we experimented with zero-shot learning using a carefully designed prompt and achieved near state-of-the-art performance. An advantage of this approach is that the model provided evidence for its decision, which adds transparency to the model's decision-making and offers a chance for users to verify the validity of the evidence provided.
[[2306.08075] BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation](http://arxiv.org/abs/2306.08075) #segmentation
Current approaches for knowledge distillation in semantic segmentation tend to adopt a holistic approach that treats all spatial locations equally. However, for dense prediction tasks, it is crucial to consider the knowledge representation for different spatial locations in a different manner. Furthermore, edge regions between adjacent categories are highly uncertain due to context information leakage, which is particularly pronounced for compact networks. To address this challenge, this paper proposes a novel approach called boundary-privileged knowledge distillation (BPKD). BPKD distills the knowledge of the teacher model's body and edges separately from the compact student model. Specifically, we employ two distinct loss functions: 1) Edge Loss, which aims to distinguish between ambiguous classes at the pixel level in edge regions. 2) Body Loss, which utilizes shape constraints and selectively attends to the inner-semantic regions. Our experiments demonstrate that the proposed BPKD method provides extensive refinements and aggregation for edge and body regions. Additionally, the method achieves state-of-the-art distillation performance for semantic segmentation on three popular benchmark datasets, highlighting its effectiveness and generalization ability. BPKD shows consistent improvements over various lightweight semantic segmentation structures. The code is available at \url{https://github.com/AkideLiu/BPKD}.
[[2306.08213] SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation](http://arxiv.org/abs/2306.08213) #segmentation
Medical image segmentation based on deep learning often fails when deployed on images from a different domain. The domain adaptation methods aim to solve domain-shift challenges, but still face some problems. The transfer learning methods require annotation on the target domain, and the generative unsupervised domain adaptation (UDA) models ignore domain-specific representations, whose generated quality highly restricts segmentation performance. In this study, we propose a novel Structure-Modal Constrained (SMC) UDA framework based on a discriminative paradigm and introduce edge structure as a bridge between domains. The proposed multi-modal learning backbone distills structure information from image texture to distinguish domain-invariant edge structure. With the structure-constrained self-learning and progressive ROI, our methods segment the kidney by locating the 3D spatial structure of the edge. We evaluated SMC-UDA on public renal segmentation datasets, adapting from the labeled source domain (CT) to the unlabeled target domain (CT/MRI). The experiments show that our proposed SMC-UDA has a strong generalization and outperforms generative UDA methods.
[[2306.08275] C$^3$PS: Context-aware Conditional Cross Pseudo Supervision for Semi-supervised Medical Image Segmentation](http://arxiv.org/abs/2306.08275) #segmentation
Semi-supervised learning (SSL) methods, which can leverage a large amount of unlabeled data for improved performance, has attracted increasing attention recently. In this paper, we introduce a novel Context-aware Conditional Cross Pseudo Supervision method (referred as C$^3$PS) for semi-supervised medical image segmentation. Unlike previously published Cross Pseudo Supervision (CPS) works, this paper introduces a novel Conditional Cross Pseudo Supervision (CCPS) mechanism where the cross pseudo supervision is conditioned on a given class label. Context-awareness is further introduced in the CCPS to improve the quality of pseudo-labels for cross pseudo supervision. The proposed method has the additional advantage that in the later training stage, it can focus on the learning of hard organs. Validated on two typical yet challenging medical image segmentation tasks, our method demonstrates superior performance over the state-of-the-art methods.