secure

Title: PiXi: Password Inspiration by Exploring Information. (arXiv:2304.10728v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.10728
Code URL: null
Copy Paste: [[2304.10728] PiXi: Password Inspiration by Exploring Information](http://arxiv.org/abs/2304.10728) #secure
Summary:
Passwords, a first line of defense against unauthorized access, must be secure and memorable. However, people often struggle to create secure passwords they can recall. To address this problem, we design Password inspiration by eXploring information (PiXi), a novel approach to nudge users towards creating secure passwords. PiXi is the first of its kind that employs a password creation nudge to support users in the task of generating a unique secure password themselves. PiXi prompts users to explore unusual information right before creating a password, to shake them out of their typical habits and thought processes, and to inspire them to create unique (and therefore stronger) passwords. PiXi's design aims to create an engaging, interactive, and effective nudge to improve secure password creation. We conducted a user study ($N=238$) to compare the efficacy of PiXi to typical password creation. Our findings indicate that PiXi's nudges do influence users' password choices such that passwords are significantly longer and more secure (less predictable and guessable).

Title: Decentralized Inverse Transparency With Blockchain. (arXiv:2304.11033v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.11033
Code URL: null
Copy Paste: [[2304.11033] Decentralized Inverse Transparency With Blockchain](http://arxiv.org/abs/2304.11033) #secure
Summary:
Employee data can be used to facilitate work, but their misusage may pose risks for individuals. Inverse transparency therefore aims to track all usages of personal data, allowing individuals to monitor them to ensure accountability for potential misusage. This necessitates a trusted log to establish an agreed-upon and non-repudiable timeline of events. The unique properties of blockchain facilitate this by providing immutability and availability. For power asymmetric environments such as the workplace, permissionless blockchain is especially beneficial as no trusted third party is required. Yet, two issues remain: (1) In a decentralized environment, no arbiter can facilitate and attest to data exchanges. Simple peer-to-peer sharing of data, conversely, lacks the required non-repudiation. (2) With data governed by privacy legislation such as the GDPR, the core advantage of immutability becomes a liability. After a rightful request, an individual's personal data need to be rectified or deleted, which is impossible in an immutable blockchain.

To solve these issues, we present Kovacs, a decentralized data exchange and usage logging system for inverse transparency built on blockchain. Its new-usage protocol ensures non-repudiation, and therefore accountability, for inverse transparency. Its one-time pseudonym generation algorithm guarantees unlinkability and enables proof of ownership, which allows data subjects to exercise their legal rights regarding their personal data. With our implementation, we show the viability of our solution. The decentralized communication impacts performance and scalability, but exchange duration and storage size are still reasonable. More importantly, the provided information security meets high requirements. We conclude that Kovacs realizes decentralized inverse transparency through secure and GDPR-compliant use of permissionless blockchain.
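The tamper evidence that the abstract expects from a trusted log can be illustrated with a plain hash chain. The following is a minimal sketch, not Kovacs's protocol or a real blockchain: the function names, the `GENESIS` placeholder, and the record fields (`pseudonym`, `usage`) are all assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (assumption)


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    """Append a usage record that commits to everything logged before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev_hash,
                "hash": entry_hash(prev_hash, payload)})


def verify(log: list) -> bool:
    """Recompute every link; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical usage records, keyed by made-up one-time pseudonyms.
log = []
append(log, {"pseudonym": "p-01", "usage": "read attendance record"})
append(log, {"pseudonym": "p-02", "usage": "export timesheet"})
```

In this sketch, changing or erasing a stored payload makes `verify` fail. That is the tension with rectification and deletion requests that the abstract raises as issue (2).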

security

Title: Fooling Thermal Infrared Detectors in Physical World. (arXiv:2304.10712v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10712
Code URL: null
Copy Paste: [[2304.10712] Fooling Thermal Infrared Detectors in Physical World](http://arxiv.org/abs/2304.10712) #security
Summary:
Infrared imaging systems have a vast array of potential applications in pedestrian detection and autonomous driving, and their safety performance is of great concern. However, few studies have explored the safety of infrared imaging systems in real-world settings. Previous research has used physical perturbations such as small bulbs and thermal "QR codes" to attack infrared imaging detectors, but such methods are highly visible and lack stealthiness. Other researchers have used hot and cold blocks to deceive infrared imaging detectors, but this method is limited in its ability to execute attacks from various angles. To address these shortcomings, we propose a novel physical attack called adversarial infrared blocks (AdvIB). By optimizing the physical parameters of the adversarial infrared blocks, this method can execute a stealthy black-box attack on thermal imaging systems from various angles. We evaluate the proposed method based on its effectiveness, stealthiness, and robustness. Our physical tests show that the proposed method achieves a success rate of over 80% under most distance and angle conditions, validating its effectiveness. For stealthiness, our method involves attaching the adversarial infrared block to the inside of clothing, which keeps it out of sight. Additionally, we test the proposed method on advanced detectors, and experimental results demonstrate an average attack success rate of 51.2%, proving its robustness. Overall, our proposed AdvIB method offers a promising avenue for conducting stealthy, effective and robust black-box attacks on thermal imaging systems, with potential implications for real-world safety and security applications.

Title: Automated Mapping of CVE Vulnerability Records to MITRE CWE Weaknesses. (arXiv:2304.11130v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.11130
Code URL: null
Copy Paste: [[2304.11130] Automated Mapping of CVE Vulnerability Records to MITRE CWE Weaknesses](http://arxiv.org/abs/2304.11130) #security
Summary:
In recent years, cyber-security threats have grown in number and diversity, culminating in an increase in their reporting and analysis. To counter that, many non-profit organizations have emerged in this domain, such as MITRE and OWASP, which have been actively tracking vulnerabilities and publishing defense recommendations in standardized formats. As producing data in such formats manually is very time-consuming, there have been some proposals to automate the process. Unfortunately, a major obstacle to adopting supervised machine learning for this problem has been the lack of publicly available specialized datasets. Here, we aim to bridge this gap. In particular, we focus on mapping CVE records into MITRE CWE Weaknesses, and we release to the research community a manually annotated dataset of 4,012 records for this task. With a human-in-the-loop framework in mind, we approach the problem as a ranking task and aim to incorporate reinforcement learning to make use of the human feedback in future work. Our experimental results using fine-tuned deep learning models, namely Sentence-BERT and rankT5, show sizable performance gains over BM25, BERT, and RoBERTa, which demonstrates the need for an architecture capable of good semantic understanding for this task.
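To make the ranking formulation concrete, here is a minimal sketch of the kind of lexical BM25 baseline the abstract compares against. It is a plain-Python Okapi BM25 scorer, not the paper's code or dataset. The three CWE texts are abbreviated for illustration, and the function names, the query string, and the `k1`/`b` defaults are assumptions.

```python
import math
from collections import Counter

# Toy, heavily abbreviated CWE descriptions (illustrative, not MITRE's full wording).
CWE_DOCS = {
    "CWE-79": "cross site scripting improper neutralization of input during web page generation",
    "CWE-89": "sql injection improper neutralization of special elements used in an sql command",
    "CWE-787": "out of bounds write memory buffer overflow",
}


def tokenize(text: str) -> list:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bm25_rank(query: str, docs: dict, k1: float = 1.5, b: float = 0.75) -> list:
    """Return (doc_id, score) pairs sorted from most to least relevant."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for doc_id, doc_tokens in tokens.items():
        tf = Counter(doc_tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc_tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical CVE-style description used as the query.
ranking = bm25_rank("sql injection in login form", CWE_DOCS)
```

Because BM25 only rewards exact term overlap, a CVE description that paraphrases a weakness without sharing its vocabulary would score zero here. That is consistent with the abstract's point that the task benefits from semantic models.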

Title: DeepReShape: Redesigning Neural Networks for Efficient Private Inference. (arXiv:2304.10593v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.10593
Code URL: null
Copy Paste: [[2304.10593] DeepReShape: Redesigning Neural Networks for Efficient Private Inference](http://arxiv.org/abs/2304.10593) #security
Summary:
The increasing demand for privacy and security has driven the advancement of private inference (PI), a cryptographic method enabling inferences directly on encrypted data. However, the computational and storage burdens of non-linear operators (e.g., ReLUs) render it impractical. Despite these limitations, prior ReLU optimization methods consistently relied on classical networks that are not optimized for PI. Moreover, the selection of baseline networks in these ReLU optimization methods remains enigmatic and fails to provide insights into network attributes contributing to PI efficiency. In this paper, we investigate the desirable network architecture for efficient PI, and our key finding is that wider networks are superior at higher ReLU counts, while networks with a greater proportion of least-critical ReLUs excel at lower ReLU counts. Leveraging these findings, we develop a novel network redesign technique (DeepReShape) with a complexity of $\mathcal{O}(1)$, and synthesize specialized architectures (HybReNet). Compared to the state-of-the-art (SNL on CIFAR-100), we achieve a 2.35% accuracy gain at 180K ReLUs, and for ResNet50 on TinyImageNet our method saves 4.2$\times$ ReLUs at iso-accuracy.

Title: A Survey of Prevent and Detect Access Control Vulnerabilities. (arXiv:2304.10600v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.10600
Code URL: null
Copy Paste: [[2304.10600] A Survey of Prevent and Detect Access Control Vulnerabilities](http://arxiv.org/abs/2304.10600) #security
Summary:
Broken access control is one of the most common security vulnerabilities in web applications. These vulnerabilities are the major cause of many data breach incidents, which result in privacy concerns and revenue loss. However, preventing and detecting access control vulnerabilities proactively in web applications can be difficult. Currently, these vulnerabilities are actively detected by bug bounty hunters post-deployment, which creates attack windows for malicious access. Solving this problem proactively requires security awareness and expertise from developers, which calls for systematic solutions.

This survey aims to provide a structured overview of approaches that tackle access control vulnerabilities. It first discusses the unique features of access control vulnerabilities, then studies the existing works proposed to tackle access control vulnerabilities in web applications, which span the spectrum of software development from software design and implementation to software analysis and testing and runtime monitoring. Finally, we discuss the open problems in this field.

Title: Cryptanalysis of quantum permutation pad. (arXiv:2304.11081v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.11081
Code URL: null
Copy Paste: [[2304.11081] Cryptanalysis of quantum permutation pad](http://arxiv.org/abs/2304.11081) #security
Summary:
Cryptanalysis increases the level of confidence in cryptographic algorithms. We analyze the security of a symmetric cryptographic algorithm - quantum permutation pad (QPP) [8]. We found instances in which the ciphertext is the same as the plaintext even after the action of QPP, with probability 1/N when the entire set of permutation matrices of dimension N is used and with probability 1/N^m when an incomplete set of m permutation matrices of dimension N is used. We visually show such instances in a cipher image created by QPP with 256 permutation matrices of different dimensions. For any practical usage of QPP, we recommend a set of 256 permutation matrices of dimension greater than or equal to 2048.
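The 1/N figure can be sanity-checked with a small enumeration, under a simplifying assumption that is not taken from the paper: a single symbol is encrypted by one permutation drawn uniformly from all N! permutations of N symbols. The sketch below does not implement QPP and does not reproduce the 1/N^m case. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def fixed_point_probability(n: int, symbol: int = 0) -> Fraction:
    """Fraction of all n! permutations that map `symbol` to itself."""
    total = 0
    fixed = 0
    for perm in permutations(range(n)):
        total += 1
        # perm[symbol] is the image of `symbol`; equality means ciphertext == plaintext.
        if perm[symbol] == symbol:
            fixed += 1
    return Fraction(fixed, total)


# Exhaustive check for small dimensions only; n! grows too fast for large n.
probabilities = {n: fixed_point_probability(n) for n in range(2, 7)}
```

For each small n, the enumeration matches the counting argument that (n-1)! of the n! permutations leave a given symbol in place, which gives 1/n.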

Title: AI Product Security: A Primer for Developers. (arXiv:2304.11087v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.11087
Code URL: null
Copy Paste: [[2304.11087] AI Product Security: A Primer for Developers](http://arxiv.org/abs/2304.11087) #security
Summary:
Not too long ago, AI security used to mean the research and practice of how AI can empower cybersecurity, that is, AI for security. Ever since Ian Goodfellow and his team popularized adversarial attacks on machine learning, security for AI became an important concern and also part of AI security. It is imperative to understand the threats to machine learning products and avoid common pitfalls in AI product development. This article is addressed to developers, designers, managers and researchers of AI software products.

Title: Implementing and Evaluating Security in O-RAN: Interfaces, Intelligence, and Platforms. (arXiv:2304.11125v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.11125
Code URL: null
Copy Paste: [[2304.11125] Implementing and Evaluating Security in O-RAN: Interfaces, Intelligence, and Platforms](http://arxiv.org/abs/2304.11125) #security
Summary:
The Open Radio Access Network (RAN) is a networking paradigm that builds on top of cloud-based, multi-vendor, open and intelligent architectures to shape the next generation of cellular networks for 5G and beyond. While this new paradigm comes with many advantages in terms of observability and reconfigurability of the network, it inevitably expands the threat surface of cellular systems and can potentially expose its components to several cyber attacks, thus making securing O-RAN networks a necessity. In this paper, we explore the security aspects of O-RAN systems by focusing on the specifications and architectures proposed by the O-RAN Alliance. We address the problem of securing O-RAN systems with a holistic perspective, including considerations on the open interfaces used to interconnect the different O-RAN components, on the overall platform, and on the intelligence used to monitor and control the network. For each focus area we identify threats, discuss relevant solutions to address these issues, and demonstrate experimentally how such solutions can effectively defend O-RAN systems against selected cyber attacks. This article is the first work to approach the security of O-RAN holistically and with experimental evidence obtained on a state-of-the-art programmable O-RAN platform, thus providing unique guidelines for researchers in the field.

privacy

Title: Sparsity in neural networks can improve their privacy. (arXiv:2304.10553v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.10553
Code URL: null
Copy Paste: [[2304.10553] Sparsity in neural networks can improve their privacy](http://arxiv.org/abs/2304.10553) #privacy
Summary:
This article measures how sparsity can make neural networks more robust to membership inference attacks. The obtained empirical results show that sparsity improves the privacy of the network, while preserving comparable performances on the task at hand. This empirical study completes and extends existing literature.

Title: Outsourced Analysis of Encrypted Graphs in the Cloud with Privacy Protection. (arXiv:2304.10833v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.10833
Code URL: null
Copy Paste: [[2304.10833] Outsourced Analysis of Encrypted Graphs in the Cloud with Privacy Protection](http://arxiv.org/abs/2304.10833) #privacy
Summary:
Large graphs have unique value for organizations and research, such as client linkages in social networks and customer evaluation matrices in social channels. They are costly to maintain because they are large and frequently continue to grow. Owners of large graphs may need to use cloud resources, given the wide availability of public cloud resources, to increase capacity and computation flexibility. However, the cloud's accountability and the protection of graphs have become a significant issue. In this study, we consider privacy-preserving algorithms for an essential graph analysis task: graph spectral analysis for graphs outsourced to a cloud server. We create privacy-preserving variants of the two proposed eigendecomposition algorithms. They use two cryptographic approaches: additional substance homomorphic encryption (ASHE) strategies and some degree homomorphic encryption (SDHE) methods. Inadequate networks also feature a distinctively confidential info adaptation convention to allow the trade-off between secrecy and data sparseness. Both dense and sparse structures are investigated. According to test results, calculations with sparse encoding can drastically reduce information. SDHE-based strategies have reduced computing time, while ASHE-based methods have reduced storage costs.

Title: Mining Privacy-Preserving Association Rules based on Parallel Processing in Cloud Computing. (arXiv:2304.10836v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.10836
Code URL: null
Copy Paste: [[2304.10836] Mining Privacy-Preserving Association Rules based on Parallel Processing in Cloud Computing](http://arxiv.org/abs/2304.10836) #privacy
Summary:
With the onset of the Information Era and the rapid growth of information technology, ample space for processing and extracting data has opened up. However, privacy concerns may stifle expansion throughout this area. This study addresses the challenge of reliable mining techniques when transactions are dispersed across sources. This work looks at the prospect of creating a new set of three algorithms that can obtain maximum privacy, data utility, and time savings while doing so. This paper proposes a unique double encryption and Transaction Splitter approach to alter the database to optimize the data utility and confidentiality tradeoff in the preparation phase. This paper presents a customized Apriori approach for the mining process, which does not examine the entire database to estimate the support for each attribute. Existing distributed data solutions have a high encryption complexity and an insufficient specification of many participants' properties. The proposed solutions provide increased privacy protection against a variety of attack models. Furthermore, in terms of communication cycles and processing complexity, they are much simpler and quicker. Tests of the proposed work on a real-world transaction database demonstrate that the aim of the proposed method is realistic.

Title: Auditing and Generating Synthetic Data with Controllable Trust Trade-offs. (arXiv:2304.10819v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.10819
Code URL: null
Copy Paste: [[2304.10819] Auditing and Generating Synthetic Data with Controllable Trust Trade-offs](http://arxiv.org/abs/2304.10819) #privacy
Summary:
Data collected from the real world tends to be biased, unbalanced, and at risk of exposing sensitive and private information. This reality has given rise to the idea of creating synthetic datasets to alleviate risk, bias, harm, and privacy concerns inherent in the real data. This concept relies on Generative AI models to produce unbiased, privacy-preserving synthetic data while being true to the real data. In this new paradigm, how can we tell if this approach delivers on its promises? We present an auditing framework that offers a holistic assessment of synthetic datasets and AI models trained on them, centered around bias and discrimination prevention, fidelity to the real data, utility, robustness, and privacy preservation. We showcase our framework by auditing multiple generative models on diverse use cases, including education, healthcare, banking, human resources, and across different modalities, from tabular, to time-series, to natural language. Our use cases demonstrate the importance of a holistic assessment in order to ensure compliance with socio-technical safeguards that regulators and policymakers are increasingly enforcing. For this purpose, we introduce the trust index that ranks multiple synthetic datasets based on their prescribed safeguards and their desired trade-offs. Moreover, we devise a trust-index-driven model selection and cross-validation procedure via auditing in the training loop that we showcase on a class of transformer models that we dub TrustFormers, across different modalities. This trust-driven model selection allows for controllable trust trade-offs in the resulting synthetic data. We instrument our auditing framework with workflows that connect different stakeholders from model development to audit and certification via a synthetic data auditing report.

protect

Title: A Plug-and-Play Defensive Perturbation for Copyright Protection of DNN-based Applications. (arXiv:2304.10679v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10679
Code URL: null
Copy Paste: [[2304.10679] A Plug-and-Play Defensive Perturbation for Copyright Protection of DNN-based Applications](http://arxiv.org/abs/2304.10679) #protect
Summary:
The wide deployment of deep neural network (DNN) based applications (e.g., style transfer, cartoonish) has stimulated the need for copyright protection of such applications' output. Although some traditional visible copyright techniques are available, they would introduce undesired traces and result in a poor user experience. In this paper, we propose a novel plug-and-play invisible copyright protection method based on defensive perturbation for DNN-based applications (i.e., style transfer). Rather than applying the perturbation to attack the DNN model, we explore the potential use of perturbation in copyright protection. Specifically, we project the copyright information to the defensive perturbation with the designed copyright encoder, and this perturbation is added to the image to be protected. Then, we extract the copyright information from the encoded copyrighted image with the devised copyright decoder. Furthermore, we use a robustness module to strengthen the decoding capability of the decoder toward images with various distortions (e.g., JPEG compression), which may occur when the user posts the image on social media. To ensure the image quality of encoded images and decoded copyright images, we carefully devise a loss function. Objective and subjective experiment results demonstrate the effectiveness of the proposed method. We have also conducted physical world tests on social media (i.e., WeChat and Twitter) by posting encoded copyright images. The results show that the copyright information in the encoded image saved from social media can still be correctly extracted.

Title: Matching-based Data Valuation for Generative Model. (arXiv:2304.10701v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10701
Code URL: null
Copy Paste: [[2304.10701] Matching-based Data Valuation for Generative Model](http://arxiv.org/abs/2304.10701) #protect
Summary:
Data valuation is critical in machine learning, as it helps enhance model transparency and protect data properties. Existing data valuation methods have primarily focused on discriminative models, neglecting deep generative models that have recently gained considerable attention. Similar to discriminative models, there is an urgent need to assess data contributions in deep generative models as well. However, previous data valuation approaches mainly relied on discriminative model performance metrics and required model retraining. Consequently, they cannot be applied directly and efficiently to recent deep generative models, such as generative adversarial networks and diffusion models, in practice. To bridge this gap, we formulate the data valuation problem in generative models from a similarity-matching perspective. Specifically, we introduce Generative Model Valuator (GMValuator), the first model-agnostic approach for any generative model, designed to provide data valuation for generation tasks. We have conducted extensive experiments to demonstrate the effectiveness of the proposed method. To the best of our knowledge, GMValuator is the first work that offers a training-free, post-hoc data valuation strategy for deep generative models.

Title: Deep Attention Unet: A Network Model with Global Feature Perception Ability. (arXiv:2304.10829v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10829
Code URL: null
Copy Paste: [[2304.10829] Deep Attention Unet: A Network Model with Global Feature Perception Ability](http://arxiv.org/abs/2304.10829) #protect
Summary:
Remote sensing image segmentation is a specific task of remote sensing image interpretation. A good remote sensing image segmentation algorithm can provide guidance for environmental protection, agricultural production, and urban construction. This paper proposes a new type of UNet image segmentation algorithm based on channel self attention mechanism and residual connection called . In my experiment, the new network model improved mIOU by 2.48% compared to traditional UNet on the FoodNet dataset. The image segmentation algorithm proposed in this article enhances the internal connections between different items in the image, thus achieving better segmentation results for remote sensing images with occlusion.

Title: Deep Transfer Learning Applications in Intrusion Detection Systems: A Comprehensive Review. (arXiv:2304.10550v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.10550
Code URL: null
Copy Paste: [[2304.10550] Deep Transfer Learning Applications in Intrusion Detection Systems: A Comprehensive Review](http://arxiv.org/abs/2304.10550) #protect
Summary:
Globally, the external Internet is increasingly being connected to the contemporary industrial control system. As a result, there is an immediate need to protect the network from several threats. The key infrastructure of industrial activity may be protected from harm by using an intrusion detection system (IDS), a preventive mechanism, to recognize new kinds of dangerous threats and hostile activities. The most recent artificial intelligence (AI) techniques used to create IDS in many kinds of industrial control networks are examined in this study, with a particular emphasis on IDS-based deep transfer learning (DTL). The latter can be seen as a type of information fusion that merges and/or adapts knowledge from multiple domains to enhance the performance of the target task, particularly when the labeled data in the target domain is scarce. Publications issued after 2015 were taken into account. These selected publications were divided into three categories: DTL-only and IDS-only papers are covered in the introduction and background, and DTL-based IDS papers form the core of this review. By reading this review paper, researchers will be able to gain a better grasp of the current state of DTL approaches used in IDS in many different types of networks. Other useful information, such as the datasets used, the sort of DTL employed, the pre-trained network, IDS techniques, the evaluation metrics including accuracy/F-score and false alarm rate (FAR), and the improvement gained, is also covered. Algorithms and methods that are used in several studies, or that clearly illustrate the principle of a DTL-based IDS subcategory, are presented to the reader.

defense

Title: A Multiagent CyberBattleSim for RL Cyber Operation Agents. (arXiv:2304.11052v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.11052
Code URL: null
Copy Paste: [[2304.11052] A Multiagent CyberBattleSim for RL Cyber Operation Agents](http://arxiv.org/abs/2304.11052) #defense
Summary:
Hardening cyber physical assets is both crucial and labor-intensive. Recently, Machine Learning (ML) in general and Reinforcement Learning (RL) more specifically have shown great promise to automate tasks that otherwise would require significant human insight/intelligence. The development of autonomous RL agents requires a suitable training environment that allows us to quickly evaluate various alternatives, in particular how to arrange training scenarios that pit attackers and defenders against each other. CyberBattleSim is a training environment that supports the training of red agents, i.e., attackers. We added the capability to train blue agents, i.e., defenders. The paper describes our changes and reports on the results we obtained when training blue agents, either in isolation or jointly with red agents. Our results show that training a blue agent does lead to stronger defenses against attacks. In particular, training a blue agent jointly with a red agent increases the blue agent's capability to thwart sophisticated red agents.

Title: Training Automated Defense Strategies Using Graph-based Cyber Attack Simulations. (arXiv:2304.11084v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.11084
Code URL: null
Copy Paste: [[2304.11084] Training Automated Defense Strategies Using Graph-based Cyber Attack Simulations](http://arxiv.org/abs/2304.11084) #defense
Summary:
We implemented and evaluated an automated cyber defense agent. The agent takes security alerts as input and uses reinforcement learning to learn a policy for executing predefined defensive measures. The defender policies were trained in an environment intended to simulate a cyber attack. In the simulation, an attacking agent attempts to capture targets in the environment, while the defender attempts to protect them by enabling defenses. The environment was modeled using attack graphs based on the Meta Attack Language. We assumed that defensive measures have downtime costs, meaning that the defender agent was penalized for using them. We also assumed that the environment was equipped with an imperfect intrusion detection system that occasionally produces erroneous alerts based on the environment state. To evaluate the setup, we trained the defensive agent with different volumes of intrusion detection system noise. We also trained agents with different attacker strategies and graph sizes. In experiments, the defensive agent using policies trained with reinforcement learning outperformed agents using heuristic policies. Experiments also demonstrated that the policies could generalize across different attacker strategies. However, the performance of the learned policies decreased as the attack graphs increased in size.
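Two of the modeling assumptions in this abstract, an imperfect intrusion detection system and a downtime penalty for enabled defenses, can be sketched as follows. This is an illustrative toy, not the authors' environment: the function names, the per-node boolean state, the false-positive and false-negative rates, and the cost constants are all assumptions.

```python
import random


def noisy_alerts(compromised: list, fp_rate: float, fn_rate: float,
                 rng: random.Random) -> list:
    """Turn the true per-node compromise state into imperfect IDS alerts."""
    alerts = []
    for is_compromised in compromised:
        if is_compromised:
            # A real intrusion is missed with probability fn_rate.
            alerts.append(rng.random() >= fn_rate)
        else:
            # A clean node raises a false alarm with probability fp_rate.
            alerts.append(rng.random() < fp_rate)
    return alerts


def defender_reward(compromised: list, defenses_enabled: list,
                    target_value: float = 10.0, downtime_cost: float = 1.0) -> float:
    """Penalize both captured targets and every defense that is switched on."""
    loss = target_value * sum(compromised)
    downtime = downtime_cost * sum(defenses_enabled)
    return -(loss + downtime)


rng = random.Random(0)
state = [True, False, False, True]  # hypothetical: nodes 0 and 3 are captured
perfect = noisy_alerts(state, fp_rate=0.0, fn_rate=0.0, rng=rng)
reward = defender_reward(state, defenses_enabled=[True, False, False, False])
```

With both error rates at zero the alerts equal the true state. Raising either rate is one way to vary the "volume of noise" the abstract mentions, while the reward makes enabling every defense a losing default.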

attack

Title: Launching a Robust Backdoor Attack under Capability Constrained Scenarios. (arXiv:2304.10985v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.10985
Code URL: null
Copy Paste: [[2304.10985] Launching a Robust Backdoor Attack under Capability Constrained Scenarios](http://arxiv.org/abs/2304.10985) #attack
Summary:
As deep neural networks continue to be used in critical domains, concerns over their security have emerged. Deep learning models are vulnerable to backdoor attacks due to the lack of transparency. A poisoned backdoor model may perform normally in routine environments, but exhibit malicious behavior when the input contains a trigger. Current research on backdoor attacks focuses on improving the stealthiness of triggers, and most approaches require strong attacker capabilities, such as knowledge of the model structure or control over the training process. These attacks are impractical since in most cases the attacker's capabilities are limited. Additionally, the issue of model robustness has not received adequate attention. For instance, model distillation is commonly used to streamline model size as the number of parameters grows exponentially, and most previous backdoor attacks fail after model distillation; image augmentation operations can also destroy the trigger and thus disable the backdoor. This study explores the implementation of black-box backdoor attacks within capability constraints. An attacker can carry out such attacks by acting as either an image annotator or an image provider, without involvement in the training process or knowledge of the target model's structure. Through the design of a backdoor trigger, our attack remains effective after model distillation and image augmentation, making it more threatening and practical. Our experimental results demonstrate that our method achieves a high attack success rate in black-box scenarios and evades state-of-the-art backdoor defenses.

Title: Fundamental Limitations of Alignment in Large Language Models. (arXiv:2304.11082v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.11082
Code URL: null
Copy Paste: [[2304.11082] Fundamental Limitations of Alignment in Large Language Models](http://arxiv.org/abs/2304.11082) #attack
Summary:
An important aspect in developing language models that interact with humans is aligning their behavior to be useful and unharmful for their human users. This is usually achieved by tuning the model in a way that enhances desired behaviors and inhibits undesired ones, a process referred to as alignment. In this paper, we propose a theoretical approach called Behavior Expectation Bounds (BEB) which allows us to formally investigate several inherent characteristics and limitations of alignment in large language models. Importantly, we prove that for any behavior that has a finite probability of being exhibited by the model, there exist prompts that can trigger the model into outputting this behavior, with probability that increases with the length of the prompt. This implies that any alignment process that attenuates undesired behavior but does not remove it altogether is not safe against adversarial prompting attacks. Furthermore, our framework hints at the mechanism by which leading alignment approaches such as reinforcement learning from human feedback increase the LLM's proneness to being prompted into the undesired behaviors. Moreover, we include the notion of personas in our BEB framework, and find that behaviors which are generally very unlikely to be exhibited by the model can be brought to the front by prompting the model to behave as a specific persona. This theoretical result is being experimentally demonstrated at large scale by the so-called contemporary "chatGPT jailbreaks", where adversarial users trick the LLM into breaking its alignment guardrails by triggering it into acting as a malicious persona. Our results expose fundamental limitations in alignment of LLMs and bring to the forefront the need to devise reliable mechanisms for ensuring AI safety.

Title: Denial-of-Service or Fine-Grained Control: Towards Flexible Model Poisoning Attacks on Federated Learning. (arXiv:2304.10783v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.10783
Code URL: null
Copy Paste: [[2304.10783] Denial-of-Service or Fine-Grained Control: Towards Flexible Model Poisoning Attacks on Federated Learning](http://arxiv.org/abs/2304.10783) #attack
Summary:
Federated learning (FL) is vulnerable to poisoning attacks, where adversaries corrupt the global aggregation results and cause denial-of-service (DoS). Unlike recent model poisoning attacks that optimize the amplitude of malicious perturbations along certain prescribed directions to cause DoS, we propose a Flexible Model Poisoning Attack (FMPA) that can achieve versatile attack goals. We consider a practical threat scenario where no extra knowledge about the FL system (e.g., aggregation rules or updates on benign devices) is available to adversaries. FMPA exploits the global historical information to construct an estimator that predicts the next round of the global model as a benign reference. It then fine-tunes the reference model to obtain the desired poisoned model with low accuracy and small perturbations. Besides the goal of causing DoS, FMPA can be naturally extended to launch a fine-grained controllable attack, making it possible to precisely reduce the global accuracy. Armed with precise control, malicious FL service providers can gain advantages over their competitors without getting noticed, hence opening a new attack surface in FL other than DoS. Even for the purpose of DoS, experiments show that FMPA significantly decreases the global accuracy, outperforming six state-of-the-art attacks.

Title: Timing the Transient Execution: A New Side-Channel Attack on Intel CPUs. (arXiv:2304.10877v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.10877
Code URL: null
Copy Paste: [[2304.10877] Timing the Transient Execution: A New Side-Channel Attack on Intel CPUs](http://arxiv.org/abs/2304.10877) #attack
Summary:
The transient execution attack is a type of attack leveraging the vulnerability of modern CPU optimization technologies. New attacks surface rapidly. The side-channel is a key part of transient execution attacks to leak data. In this work, we discover a vulnerability that the change of the EFLAGS register in transient execution may have a side effect on the Jcc (jump on condition code) instruction after it in Intel CPUs. Based on our discovery, we propose a new side-channel attack that leverages the timing of both transient execution and Jcc instructions to deliver data. This attack encodes secret data to the change of register which makes the execution time of context slightly slower, which can be measured by the attacker to decode data. This attack doesn't rely on the cache system and doesn't need to reset the EFLAGS register manually to its initial state before the attack, which may make it more difficult to detect or mitigate. We implemented this side-channel on machines with Intel Core i7-6700, i7-7700, and i9-10980XE CPUs. In the first two processors, we combined it as the side-channel of the Meltdown attack, which could achieve 100% success leaking rate. We evaluate and discuss potential defenses against the attack. Our contributions include discovering security vulnerabilities in the implementation of Jcc instructions and EFLAGS register and proposing a new side-channel attack that does not rely on the cache system.

Title: PowerGAN: A Machine Learning Approach for Power Side-Channel Attack on Compute-in-Memory Accelerators. (arXiv:2304.11056v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.11056
Code URL: null
Copy Paste: [[2304.11056] PowerGAN: A Machine Learning Approach for Power Side-Channel Attack on Compute-in-Memory Accelerators](http://arxiv.org/abs/2304.11056) #attack
Summary:
Analog compute-in-memory (CIM) accelerators are becoming increasingly popular for deep neural network (DNN) inference due to their energy efficiency and in-situ vector-matrix multiplication (VMM) capabilities. However, as the use of DNNs expands, protecting user input privacy has become increasingly important. In this paper, we identify a security vulnerability wherein an adversary can reconstruct the user's private input data from a power side-channel attack, under proper data acquisition and pre-processing, even without knowledge of the DNN model. We further demonstrate a machine learning-based attack approach using a generative adversarial network (GAN) to enhance the reconstruction. Our results show that the attack methodology is effective in reconstructing user inputs from analog CIM accelerator power leakage, even at large noise levels and when countermeasures are applied. Specifically, we demonstrate the efficacy of our approach on the U-Net for brain tumor detection in magnetic resonance imaging (MRI) medical images, with a noise-level of 20% standard deviation of the maximum power signal value. Our study highlights a significant security vulnerability in analog CIM accelerators and proposes an effective attack methodology using a GAN to breach user privacy.

Title: An Unbiased Transformer Source Code Learning with Semantic Vulnerability Graph. (arXiv:2304.11072v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.11072
Code URL: https://github.com/pial08/semvuldet
Copy Paste: [[2304.11072] An Unbiased Transformer Source Code Learning with Semantic Vulnerability Graph](http://arxiv.org/abs/2304.11072) #attack
Summary:
Over the years, open-source software systems have become prey to threat actors. Even as open-source communities act quickly to patch the breach, code vulnerability screening should be an integral part of agile software development from the beginning. Unfortunately, current vulnerability screening techniques are ineffective at identifying novel vulnerabilities or providing developers with code vulnerability and classification. Furthermore, the datasets used for vulnerability learning often exhibit distribution shifts from the real-world testing distribution due to novel attack strategies deployed by adversaries and as a result, the machine learning model's performance may be hindered or biased. To address these issues, we propose a joint interpolated multitasked unbiased vulnerability classifier comprising a transformer "RoBERTa" and graph convolution neural network (GCN). We present a training process utilizing a semantic vulnerability graph (SVG) representation from source code, created by integrating edges from a sequential flow, control flow, and data flow, as well as a novel flow dubbed Poacher Flow (PF). Poacher flow edges reduce the gap between dynamic and static program analysis and handle complex long-range dependencies. Moreover, our approach reduces biases of classifiers regarding unbalanced datasets by integrating the Focal Loss objective function along with SVG. Remarkably, experimental results show that our classifier outperforms state-of-the-art results on vulnerability detection with fewer false negatives and false positives. After testing our model across multiple datasets, it shows an improvement of at least 2.41% and 18.75% in the best-case scenario. Evaluations using N-day program samples demonstrate that our proposed approach achieves 93% accuracy and was able to detect 4 zero-day vulnerabilities from popular GitHub repositories.
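
Note: the Focal Loss objective mentioned in the summary above is commonly written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), which down-weights easy examples so that rare classes contribute more to training. The following is a minimal sketch for the binary case, not the authors' implementation; the defaults `gamma=2.0` and `alpha=0.25` are assumptions taken from the original focal loss formulation, since the abstract does not state the values used.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example.

    p: predicted probability of the positive (vulnerable) class.
    y: ground-truth label, 1 or 0.
    gamma, alpha: assumed defaults, not values from the paper.
    """
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0) on fully wrong predictions.
    p_t = min(max(p_t, 1e-12), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confident correct prediction is penalized far less than a wrong one.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy.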

robust

Title: Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels. (arXiv:2304.10539v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.10539
Code URL: null
Copy Paste: [[2304.10539] Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels](http://arxiv.org/abs/2304.10539) #robust
Summary:
Conventional multi-label classification (MLC) methods assume that all samples are fully labeled and identically distributed. Unfortunately, this assumption is unrealistic in large-scale MLC data that has long-tailed (LT) distribution and partial labels (PL). To address the problem, we introduce a novel task, Partial labeling and Long-Tailed Multi-Label Classification (PLT-MLC), to jointly consider the above two imperfect learning environments. Not surprisingly, we find that most LT-MLC and PL-MLC approaches fail to solve the PLT-MLC, resulting in significant performance degradation on the two proposed PLT-MLC benchmarks. Therefore, we propose an end-to-end learning framework: COrrection $\rightarrow$ ModificatIon $\rightarrow$ balanCe, abbreviated as COMIC. Our bootstrapping philosophy is to simultaneously correct the missing labels (Correction) with convinced prediction confidence over a class-aware threshold and to learn from these recall labels during training. We next propose a novel multi-focal modifier loss that simultaneously addresses head-tail imbalance and positive-negative imbalance to adaptively modify the attention to different samples (Modification) under the LT class distribution. In addition, we develop a balanced training strategy by distilling the model's learning effect from head and tail samples, and thus design a balanced classifier (Balance) conditioned on the head and tail learning effect to maintain stable performance for all samples. Our experimental study shows that the proposed COMIC significantly outperforms general MLC, LT-MLC and PL-MLC methods in terms of effectiveness and robustness on our newly created PLT-MLC datasets.

Title: Multi-domain learning CNN model for microscopy image classification. (arXiv:2304.10616v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10616
Code URL: null
Copy Paste: [[2304.10616] Multi-domain learning CNN model for microscopy image classification](http://arxiv.org/abs/2304.10616) #robust
Summary:
For any type of microscopy image, getting a deep learning model to work well requires considerable effort to select a suitable architecture and time to train it. As there is a wide range of microscopes and experimental setups, designing a single model that can apply to multiple imaging domains, instead of having multiple per-domain models, becomes more essential. This task is challenging and somewhat overlooked in the literature. In this paper, we present a multi-domain learning architecture for the classification of microscopy images that differ significantly in types and contents. Unlike previous methods that are computationally intensive, we have developed a compact model, called Mobincep, by combining the simple but effective techniques of depth-wise separable convolution and the inception module. We also introduce a new optimization technique to regulate the latent feature space during training to improve the network's performance. We evaluated our model on three different public datasets and compared its performance in single-domain and multiple-domain learning modes. The proposed classifier surpasses state-of-the-art results and is robust for limited labeled data. Moreover, it helps to eliminate the burden of designing a new network when switching to new experiments.

Title: Enhancing object detection robustness: A synthetic and natural perturbation approach. (arXiv:2304.10622v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10622
Code URL: null
Copy Paste: [[2304.10622] Enhancing object detection robustness: A synthetic and natural perturbation approach](http://arxiv.org/abs/2304.10622) #robust
Summary:
Robustness against real-world distribution shifts is crucial for the successful deployment of object detection models in practical applications. In this paper, we address the problem of assessing and enhancing the robustness of object detection models against natural perturbations, such as varying lighting conditions, blur, and brightness. We analyze four state-of-the-art deep neural network models, Detr-ResNet-101, Detr-ResNet-50, YOLOv4, and YOLOv4-tiny, using the COCO 2017 dataset and ExDark dataset. By simulating synthetic perturbations with the AugLy package, we systematically explore the optimal level of synthetic perturbation required to improve the models' robustness through data augmentation techniques. Our comprehensive ablation study meticulously evaluates the impact of synthetic perturbations on object detection models' performance against real-world distribution shifts, establishing a tangible connection between synthetic augmentation and real-world robustness. Our findings not only substantiate the effectiveness of synthetic perturbations in improving model robustness, but also provide valuable insights for researchers and practitioners in developing more robust and reliable object detection models tailored for real-world applications.

Title: Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers. (arXiv:2304.10716v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10716
Code URL: https://github.com/megvii-research/tps-cvpr2023
Copy Paste: [[2304.10716] Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers](http://arxiv.org/abs/2304.10716) #robust
Summary:
Although vision transformers (ViTs) have shown promising results in various computer vision tasks recently, their high computational cost limits their practical applications. Previous approaches that prune redundant tokens have demonstrated a good trade-off between performance and computation costs. Nevertheless, errors caused by pruning strategies can lead to significant information loss. Our quantitative experiments reveal that the impact of pruned tokens on performance should be noticeable. To address this issue, we propose a novel joint Token Pruning & Squeezing module (TPS) for compressing vision transformers with higher efficiency. Firstly, TPS adopts pruning to get the reserved and pruned subsets. Secondly, TPS squeezes the information of pruned tokens into partial reserved tokens via the unidirectional nearest-neighbor matching and similarity-based fusing steps. Compared to state-of-the-art methods, our approach outperforms them under all token pruning intensities. Especially while shrinking DeiT-tiny&small computational budgets to 35%, it improves the accuracy by 1%-6% compared with baselines on ImageNet classification. The proposed method can accelerate the throughput of DeiT-small beyond DeiT-tiny, while its accuracy surpasses DeiT-tiny by 4.78%. Experiments on various transformers demonstrate the effectiveness of our method, while analysis experiments prove our higher robustness to the errors of the token pruning policy. Code is available at https://github.com/megvii-research/TPS-CVPR2023.
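
Note: the two TPS steps described in the summary above (prune, then squeeze pruned tokens into reserved ones through one-way nearest-neighbor matching and similarity-based fusing) can be illustrated with a small sketch. This is an illustration under stated assumptions, not the code from the linked repository: the function name `prune_and_squeeze`, the use of cosine similarity for matching, and the similarity-weighted average used for fusing are all hypothetical choices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two token vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def prune_and_squeeze(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens and fold the rest into them.

    tokens: list of equal-length float vectors.
    scores: one importance score per token (assumed to be given).
    keep:   number of reserved tokens, at least 1.
    """
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    reserved = sorted(order[:keep])   # step 1: pruning splits the token set
    pruned = order[keep:]

    # Each reserved token starts as itself with weight 1.
    sums = {i: list(tokens[i]) for i in reserved}
    weights = {i: 1.0 for i in reserved}

    for j in pruned:
        # Step 2a: one-way matching, pruned token -> most similar reserved token.
        best = max(reserved, key=lambda i: cosine(tokens[j], tokens[i]))
        # Step 2b: fuse with a weight given by the (non-negative) similarity.
        w = max(cosine(tokens[j], tokens[best]), 0.0)
        sums[best] = [s + w * x for s, x in zip(sums[best], tokens[j])]
        weights[best] += w

    return [[s / weights[i] for s in sums[i]] for i in reserved]


tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 0.8]]
scores = [0.9, 0.8, 0.1, 0.2]
squeezed = prune_and_squeeze(tokens, scores, keep=2)
```

The output has only `keep` tokens, but each one carries a weighted trace of the pruned tokens matched to it, which is the intuition behind reducing the information loss of plain pruning.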

Title: RoCOCO: Robust Benchmark MS-COCO to Stress-test Robustness of Image-Text Matching Models. (arXiv:2304.10727v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10727
Code URL: https://github.com/pseulki/rococo
Copy Paste: [[2304.10727] RoCOCO: Robust Benchmark MS-COCO to Stress-test Robustness of Image-Text Matching Models](http://arxiv.org/abs/2304.10727) #robust
Summary:
Recently, large-scale vision-language pre-training models and visual semantic embedding methods have significantly improved image-text matching (ITM) accuracy on the MS COCO 5K test set. However, it is unclear how robust these state-of-the-art (SOTA) models are when using them in the wild. In this paper, we propose a novel evaluation benchmark to stress-test the robustness of ITM models. To this end, we add various fooling images and captions to a retrieval pool. Specifically, we change images by inserting unrelated images, and change captions by substituting a noun, which can change the meaning of a sentence. We discover that just adding these newly created images and captions to the test set can degrade the performance (i.e., Recall@1) of a wide range of SOTA models (e.g., 81.9% $\rightarrow$ 64.5% in BLIP, 66.1% $\rightarrow$ 37.5% in VSE$\infty$). We expect that our findings can provide insights for improving the robustness of the vision-language models and devising more diverse stress-test methods in cross-modal retrieval tasks. Source code and dataset will be available at https://github.com/pseulki/rococo.
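
Note: Recall@1, the metric quoted in the summary above, is the fraction of queries whose correct match is ranked first in the retrieval pool. The sketch below shows how adding a single fooling item to the pool can lower it. It assumes one correct item per query and uses made-up similarity scores; it is not the benchmark's evaluation code.

```python
def recall_at_k(similarities, gold, k=1):
    """Fraction of queries whose gold item is among the top-k pool items.

    similarities: one row per query, one score per pool item.
    gold:         index of the correct pool item for each query
                  (assumes a single correct item per query).
    """
    hits = 0
    for row, g in zip(similarities, gold):
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if g in top:
            hits += 1
    return hits / len(gold)


# Two queries, two pool items: both are retrieved correctly.
clean = [[0.9, 0.1],
         [0.2, 0.8]]
print(recall_at_k(clean, gold=[0, 1]))    # 1.0

# Same pool plus one fooling item (last column) that outscores
# the correct match for the first query.
fooled = [[0.9, 0.1, 0.95],
          [0.2, 0.8, 0.3]]
print(recall_at_k(fooled, gold=[0, 1]))   # 0.5
```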

Title: Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation. (arXiv:2304.10756v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10756
Code URL: https://github.com/harshm121/m3l
Copy Paste: [[2304.10756] Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation](http://arxiv.org/abs/2304.10756) #robust
Summary:
Using multiple spatial modalities has been proven helpful in improving semantic segmentation performance. However, there are several real-world challenges that have yet to be addressed: (a) improving label efficiency and (b) enhancing robustness in realistic scenarios where modalities are missing at test time. To address these challenges, we first propose a simple yet efficient multi-modal fusion mechanism, Linear Fusion, that performs better than the state-of-the-art multi-modal models even with limited supervision. Second, we propose M3L: Multi-modal Teacher for Masked Modality Learning, a semi-supervised framework that not only improves the multi-modal performance but also makes the model robust to the realistic missing modality scenario using unlabeled data. We create the first benchmark for semi-supervised multi-modal semantic segmentation and also report the robustness to missing modalities. Our proposal shows an absolute improvement of up to 10% on robust mIoU above the most competitive baselines. Our code is available at https://github.com/harshm121/M3L

Title: Automated Static Camera Calibration with Intelligent Vehicles. (arXiv:2304.10814v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10814
Code URL: null
Copy Paste: [[2304.10814] Automated Static Camera Calibration with Intelligent Vehicles](http://arxiv.org/abs/2304.10814) #robust
Summary:
Connected and cooperative driving requires precise calibration of the roadside infrastructure for having a reliable perception system. To solve this requirement in an automated manner, we present a robust extrinsic calibration method for automated geo-referenced camera calibration. Our method requires a calibration vehicle equipped with a combined GNSS/RTK receiver and an inertial measurement unit (IMU) for self-localization. In order to remove any requirements for the target's appearance and the local traffic conditions, we propose a novel approach using hypothesis filtering. Our method does not require any human interaction with the information recorded by both the infrastructure and the vehicle. Furthermore, we do not limit road access for other road users during calibration. We demonstrate the feasibility and accuracy of our approach by evaluating our approach on synthetic datasets as well as a real-world connected intersection, and deploying the calibration on real infrastructure. Our source code is publicly available.

Title: IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases. (arXiv:2304.10637v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.10637
Code URL: https://github.com/ikergarcia1996/context-enriched-ner
Copy Paste: [[2304.10637] IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases](http://arxiv.org/abs/2304.10637) #robust
Summary:
Named Entity Recognition (NER) is a core natural language processing task in which pre-trained language models have shown remarkable performance. However, standard benchmarks like CoNLL 2003 do not address many of the challenges that deployed NER systems face, such as having to classify emerging or complex entities in a fine-grained way. In this paper we present a novel NER cascade approach comprising three steps: first, identifying candidate entities in the input sentence; second, linking each candidate to an existing knowledge base; third, predicting the fine-grained category for each entity candidate. We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities. Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting where we leverage knowledge bases of high-resource languages.

Title: LEIA: Linguistic Embeddings for the Identification of Affect. (arXiv:2304.10973v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.10973
Code URL: null
Copy Paste: [[2304.10973] LEIA: Linguistic Embeddings for the Identification of Affect](http://arxiv.org/abs/2304.10973) #robust
Summary:
The wealth of text data generated by social media has enabled new kinds of analysis of emotions with language models. These models are often trained on small and costly datasets of text annotations produced by readers who guess the emotions expressed by others in social media posts. This affects the quality of emotion identification methods due to training data size limitations and noise in the production of labels used in model development. We present LEIA, a model for emotion identification in text that has been trained on a dataset of more than 6 million posts with self-annotated emotion labels for happiness, affection, sadness, anger, and fear. LEIA is based on a word masking method that enhances the learning of emotion words during model pre-training. LEIA achieves macro-F1 values of approximately 73 on three in-domain test datasets, outperforming other supervised and unsupervised methods in a strong benchmark that shows that LEIA generalizes across posts, users, and time periods. We further perform an out-of-domain evaluation on five different datasets of social media and other sources, showing LEIA's robust performance across media, data collection methods, and annotation schemes. Our results show that LEIA generalizes its classification of anger, happiness, and sadness beyond the domain it was trained on. LEIA can be applied in future research to provide better identification of emotions in text from the perspective of the writer. The models produced for this article are publicly available at https://huggingface.co/LEIA

Title: Inducing anxiety in large language models increases exploration and bias. (arXiv:2304.11111v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.11111
Code URL: null
Copy Paste: [[2304.11111] Inducing anxiety in large language models increases exploration and bias](http://arxiv.org/abs/2304.11111) #robust
Summary:
Large language models are transforming research on machine learning while galvanizing public debates. Understanding not only when these models work well and succeed but also why they fail and misbehave is of great societal relevance. We propose to turn the lens of computational psychiatry, a framework used to computationally describe and modify aberrant behavior, to the outputs produced by these models. We focus on the Generative Pre-Trained Transformer 3.5 and subject it to tasks commonly studied in psychiatry. Our results show that GPT-3.5 responds robustly to a common anxiety questionnaire, producing higher anxiety scores than human subjects. Moreover, GPT-3.5's responses can be predictably changed by using emotion-inducing prompts. Emotion-induction not only influences GPT-3.5's behavior in a cognitive task measuring exploratory decision-making but also influences its behavior in a previously-established task measuring biases such as racism and ableism. Crucially, GPT-3.5 shows a strong increase in biases when prompted with anxiety-inducing text. Thus, it is likely that how prompts are communicated to large language models has a strong influence on their behavior in applied settings. These results progress our understanding of prompt engineering and demonstrate the usefulness of methods taken from computational psychiatry for studying the capable algorithms to which we increasingly delegate authority and autonomy.

Title: Smart Learning to Find Dumb Contracts. (arXiv:2304.10726v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.10726
Code URL: null
Copy Paste: [[2304.10726] Smart Learning to Find Dumb Contracts](http://arxiv.org/abs/2304.10726) #robust
Summary:
We introduce Deep Learning Vulnerability Analyzer (DLVA), a vulnerability detection tool for Ethereum smart contracts based on powerful deep learning techniques for sequential data adapted for bytecode. We train DLVA to judge bytecode even though the supervising oracle, Slither, can only judge source code. DLVA's training algorithm is general: we "extend" a source code analysis to bytecode without any manual feature engineering, predefined patterns, or expert rules. DLVA's training algorithm is also robust: it overcame a 1.25% rate of mislabeled contracts and, with the student surpassing the teacher, found vulnerable contracts that Slither mislabeled. In addition to extending a source code analyzer to bytecode, DLVA is much faster than conventional tools for smart contract vulnerability detection based on formal methods: DLVA checks contracts for 29 vulnerabilities in 0.2 seconds, a speedup of 10-500x+ compared to traditional tools.

DLVA has three key components. Smart Contract to Vector (SC2V) uses neural networks to map arbitrary smart contract bytecode to a high-dimensional floating-point vector. Sibling Detector (SD) classifies contracts when a target contract's vector is Euclidean-close to a labeled contract's vector in a training set; although only able to judge 55.7% of the contracts in our test set, it has an average accuracy of 97.4% with a false positive rate of only 0.1%. Lastly, Core Classifier (CC) uses neural networks to infer vulnerable contracts regardless of vector distance. DLVA has an overall accuracy of 96.6% with an associated false positive rate of only 3.7%.
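
Note: the Sibling Detector idea described above (label a contract only when its vector lies close to an already-labeled vector, otherwise defer to another classifier) can be sketched as a thresholded nearest-neighbor lookup. This is a minimal illustration, not DLVA's code: the function name `sibling_detect`, the two-dimensional vectors, and the `threshold` value are hypothetical, and the abstract does not say how the distance cutoff is chosen.

```python
import math


def sibling_detect(vec, labeled, threshold=0.1):
    """Return the label of the nearest labeled vector if it is close enough.

    vec:       embedding of the target contract.
    labeled:   list of (vector, label) pairs from the training set.
    threshold: assumed Euclidean distance cutoff (hypothetical value).

    Returns None when no labeled vector is within the cutoff, meaning
    the contract should be passed on to a fallback classifier.
    """
    best_label, best_dist = None, math.inf
    for ref_vec, label in labeled:
        d = math.dist(vec, ref_vec)   # Euclidean distance
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else None


training = [([0.0, 0.0], "safe"), ([1.0, 1.0], "vulnerable")]
near = sibling_detect([0.95, 1.0], training)   # close to a labeled sibling
far = sibling_detect([0.5, 0.5], training)     # no sibling within the cutoff
```

Returning `None` for distant vectors mirrors the behavior reported in the summary, where the detector judges only part of the test set and leaves the rest to the Core Classifier.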

Title: Using Z3 for Formal Modeling and Verification of FNN Global Robustness. (arXiv:2304.10558v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.10558
Code URL: https://github.com/weizeming/z3_for_verification_of_fnn_global_robustness
Copy Paste: [[2304.10558] Using Z3 for Formal Modeling and Verification of FNN Global Robustness](http://arxiv.org/abs/2304.10558) #robust
Summary:
While Feedforward Neural Networks (FNNs) have achieved remarkable success in various tasks, they are vulnerable to adversarial examples. Several techniques have been developed to verify the adversarial robustness of FNNs, but most of them focus on robustness verification against the local perturbation neighborhood of a single data point. There is still a large research gap in global robustness analysis. The global-robustness verifiable framework DeepGlobal has been proposed to identify all possible Adversarial Dangerous Regions (ADRs) of FNNs, not limited to data samples in a test set. In this paper, we propose a complete specification and implementation of DeepGlobal utilizing the SMT solver Z3 for more explicit definition, and propose several improvements to DeepGlobal for more efficient verification. To evaluate the effectiveness of our implementation and improvements, we conduct extensive experiments on a set of benchmark datasets. Visualization of our experiment results shows the validity and effectiveness of the approach.

Title: B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding. (arXiv:2304.10577v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.10577
Code URL: https://github.com/causalml/sharpcate
Copy Paste: [[2304.10577] B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding](http://arxiv.org/abs/2304.10577) #robust
Summary:
Estimating heterogeneous treatment effects from observational data is a crucial task across many fields, helping policy and decision-makers take better actions. There has been recent progress on robust and efficient methods for estimating the conditional average treatment effect (CATE) function, but these methods often do not take into account the risk of hidden confounding, which could arbitrarily and unknowingly bias any causal estimate based on observational data. We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on the level of hidden confounding. We derive the B-Learner by adapting recent results for sharp and valid bounds of the average treatment effect (Dorn et al., 2021) into the framework given by Kallus & Oprescu (2022) for robust and model-agnostic learning of distributional treatment effects. The B-Learner can use any function estimator such as random forests and deep neural networks, and we prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods. Semi-synthetic experimental comparisons validate the theoretical findings, and we use real-world data to demonstrate how the method might be used in practice.

Title: Debiasing Conditional Stochastic Optimization. (arXiv:2304.10613v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.10613
Code URL: null
Copy Paste: [[2304.10613] Debiasing Conditional Stochastic Optimization](http://arxiv.org/abs/2304.10613) #robust
Summary:
In this paper, we study the conditional stochastic optimization (CSO) problem which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, causal inference, etc. The sample-averaged gradient of the CSO objective is biased due to its nested structure and therefore requires a high sample complexity to reach convergence. We introduce a general stochastic extrapolation technique that effectively reduces the bias. We show that for nonconvex smooth objectives, combining this extrapolation with variance reduction techniques can achieve a significantly better sample complexity than existing bounds. We also develop new algorithms for the finite-sum variant of CSO that also significantly improve upon existing results. Finally, we believe that our debiasing technique could be an interesting tool applicable to other stochastic optimization problems too.

Title: DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards. (arXiv:2304.10770v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.10770
Code URL: null
Copy Paste: [[2304.10770] DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards](http://arxiv.org/abs/2304.10770) #robust
Summary:
Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness crucially decides the performance of RL algorithms, especially when facing sparse extrinsic rewards. Recent studies showed the effectiveness of encouraging exploration with intrinsic rewards estimated from novelty in observations. However, there is a gap between the novelty of an observation and an exploration in general, because the stochasticity in the environment as well as the behavior of an agent may affect the observation. To estimate exploratory behaviors accurately, we propose DEIR, a novel method where we theoretically derive an intrinsic reward from a conditional mutual information term that principally scales with the novelty contributed by agent explorations, and materialize the reward with a discriminative forward model. We conduct extensive experiments in both standard and hardened exploration games in MiniGrid to show that DEIR quickly learns a better policy than baselines. Our evaluations in ProcGen demonstrate both generalization capabilities and the general applicability of our intrinsic reward.

biometric

steal

Title: Schooling to Exploit Foolish Contracts. (arXiv:2304.10737v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.10737
Code URL: null
Copy Paste: [[2304.10737] Schooling to Exploit Foolish Contracts](http://arxiv.org/abs/2304.10737) #steal
Summary:
We introduce SCooLS, our Smart Contract Learning (Semi-supervised) engine. SCooLS uses neural networks to analyze Ethereum contract bytecode and identifies specific vulnerable functions. SCooLS incorporates two key elements: semi-supervised learning and graph neural networks (GNNs). Semi-supervised learning produces more accurate models than unsupervised learning, while not requiring the large oracle-labeled training set that supervised learning requires. GNNs enable direct analysis of smart contract bytecode without any manual feature engineering, predefined patterns, or expert rules.

SCooLS is the first application of semi-supervised learning to smart contract vulnerability analysis, as well as the first deep learning-based vulnerability analyzer to identify specific vulnerable functions. SCooLS's performance is better than existing tools, with an accuracy level of 98.4%, an F1 score of 90.5%, and an exceptionally low false positive rate of only 0.8%. Furthermore, SCooLS is fast, analyzing a typical function in 0.05 seconds.

We leverage SCooLS's ability to identify specific vulnerable functions to build an exploit generator, which was successful in stealing Ether from 76.9% of the true positives.

extraction

Title: GeoLayoutLM: Geometric Pre-training for Visual Information Extraction. (arXiv:2304.10759v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10759
Code URL: https://github.com/alibabaresearch/advancedliteratemachinery
Copy Paste: [[2304.10759] GeoLayoutLM: Geometric Pre-training for Visual Information Extraction](http://arxiv.org/abs/2304.10759) #extraction
Summary:
Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most of the existing models learn the geometric representation in an implicit way, which has been found insufficient for the RE task since geometric information is especially crucial for RE. Moreover, we reveal that another factor limiting the performance of RE is the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose in this paper a multi-modal framework, named GeoLayoutLM, for VIE. GeoLayoutLM explicitly models the geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation. According to extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores in the SER task and significantly outperforms the previous state of the art for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%). The code and models are publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/GeoLayoutLM

Title: DeformableFormer: Classification of Endoscopic Ultrasound Guided Fine Needle Biopsy in Pancreatic Diseases. (arXiv:2304.10791v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10791
Code URL: null
Copy Paste: [[2304.10791] DeformableFormer: Classification of Endoscopic Ultrasound Guided Fine Needle Biopsy in Pancreatic Diseases](http://arxiv.org/abs/2304.10791) #extraction
Summary:
Endoscopic Ultrasound-Fine Needle Aspiration (EUS-FNA) is used to examine pancreatic cancer. EUS-FNA is an examination using EUS to insert a thin needle into the tumor and collect pancreatic tissue fragments. The collected pancreatic tissue fragments are then stained to classify whether they are pancreatic cancer. However, staining and visual inspection are time consuming. In addition, if the pancreatic tissue fragment cannot be examined after staining, the collection must be done again on another day. Therefore, our purpose is to classify from an unstained image whether it is available for examination or not, and to exceed the accuracy of visual classification by specialist physicians. Image classification before staining can reduce the time required for staining and the burden on patients. However, the images of pancreatic tissue fragments used in this study cannot be successfully classified by processing the entire image because the pancreatic tissue fragments are only a part of the image. Therefore, we propose DeformableFormer, which uses Deformable Convolution in the MetaFormer framework. The architecture consists of a generalized model of the Vision Transformer, and we use Deformable Convolution in the TokenMixer part. In contrast to existing approaches, our proposed DeformableFormer can perform feature extraction more locally and dynamically through Deformable Convolution. Therefore, it can perform feature extraction suited to the classification target. To evaluate our method, we classify pancreatic tissue fragments into two categories: available and unavailable for examination. We demonstrated that our method exceeded the accuracy of specialist physicians and conventional methods.

Title: Learn to Cluster Faces with Better Subgraphs. (arXiv:2304.10831v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10831
Code URL: null
Copy Paste: [[2304.10831] Learn to Cluster Faces with Better Subgraphs](http://arxiv.org/abs/2304.10831) #extraction
Summary:
Face clustering can provide pseudo-labels to the massive unlabeled face data and improve the performance of different face recognition models. The existing clustering methods generally aggregate the features within subgraphs that are often implemented based on a uniform threshold or a learned cutoff position. This may reduce the recall of subgraphs and hence degrade the clustering performance. This work proposed an efficient neighborhood-aware subgraph adjustment method that can significantly reduce the noise and improve the recall of the subgraphs, and hence can drive the distant nodes to converge towards the same centers. More specifically, the proposed method consists of two components, i.e. face embeddings enhancement using the embeddings from neighbors, and enclosed subgraph construction of node pairs for structural information extraction. The embeddings are combined to predict the linkage probabilities for all node pairs to replace the cosine similarities to produce new subgraphs that can be further used for aggregation of GCNs or other clustering methods. The proposed method is validated through extensive experiments against a range of clustering solutions using three benchmark datasets and numerical results confirm that it outperforms the SOTA solutions in terms of generalization capability.

Title: TC-GAT: Graph Attention Network for Temporal Causality Discovery. (arXiv:2304.10706v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.10706
Code URL: null
Copy Paste: [[2304.10706] TC-GAT: Graph Attention Network for Temporal Causality Discovery](http://arxiv.org/abs/2304.10706) #extraction
Summary:
The present study explores the intricacies of causal relationship extraction, a vital component in the pursuit of causality knowledge. Causality is frequently intertwined with temporal elements, as the progression from cause to effect is not instantaneous but rather ensconced in a temporal dimension. Thus, the extraction of temporal causality holds paramount significance in the field. In light of this, we propose a method for extracting causality from the text that integrates both temporal and causal relations, with a particular focus on the time aspect. To this end, we first compile a dataset that encompasses temporal relationships. Subsequently, we present a novel model, TC-GAT, which employs a graph attention mechanism to assign weights to the temporal relationships and leverages a causal knowledge graph to determine the adjacency matrix. Additionally, we implement an equilibrium mechanism to regulate the interplay between temporal and causal relations. Our experiments demonstrate that our proposed method significantly surpasses baseline models in the task of causality extraction.

Title: Information Extraction from Documents: Question Answering vs Token Classification in real-world setups. (arXiv:2304.10994v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.10994
Code URL: null
Copy Paste: [[2304.10994] Information Extraction from Documents: Question Answering vs Token Classification in real-world setups](http://arxiv.org/abs/2304.10994) #extraction
Summary:
Research in Document Intelligence, and especially in Document Key Information Extraction (DocKIE), has mainly been framed as a token classification problem. Recent breakthroughs in both natural language processing (NLP) and computer vision helped build document-focused pre-training methods, leveraging a multimodal understanding of the document text, layout and image modalities. However, these breakthroughs also led to the emergence of a new DocKIE subtask of extractive document Question Answering (DocQA), as part of the Machine Reading Comprehension (MRC) research field. In this work, we compare the Question Answering approach with the classical token classification approach for document key information extraction. We designed experiments to benchmark five different experimental setups: raw performance, robustness to noisy environments, capacity to extract long entities, fine-tuning speed on Few-Shot Learning and finally Zero-Shot Learning. Our research showed that when dealing with clean and relatively short entities, it is still best to use a token classification-based approach, while the QA approach could be a good alternative for noisy environments or long-entity use cases.

Title: BERT Based Clinical Knowledge Extraction for Biomedical Knowledge Graph Construction and Analysis. (arXiv:2304.10996v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.10996
Code URL: null
Copy Paste: [[2304.10996] BERT Based Clinical Knowledge Extraction for Biomedical Knowledge Graph Construction and Analysis](http://arxiv.org/abs/2304.10996) #extraction
Summary:
Background: Knowledge is evolving over time, often as a result of new discoveries or changes in the adopted methods of reasoning. Also, new facts or evidence may become available, leading to new understandings of complex phenomena. This is particularly true in the biomedical field, where scientists and physicians are constantly striving to find new methods of diagnosis, treatment and eventually cure. Knowledge Graphs (KGs) offer a real way of organizing and retrieving the massive and growing amount of biomedical knowledge.

Objective: We propose an end-to-end approach for knowledge extraction and analysis from biomedical clinical notes using the Bidirectional Encoder Representations from Transformers (BERT) model and a Conditional Random Field (CRF) layer.

Methods: The approach is based on knowledge graphs, which can effectively process abstract biomedical concepts such as relationships and interactions between medical entities. Besides offering an intuitive way to visualize these concepts, KGs can solve more complex knowledge retrieval problems by simplifying them into simpler representations or by transforming the problems into representations from different perspectives. We created a biomedical Knowledge Graph using Natural Language Processing models for named entity recognition and relation extraction. The generated biomedical knowledge graphs (KGs) are then used for question answering.

Results: The proposed framework can successfully extract relevant structured information with high accuracy (90.7% for named-entity recognition (NER), 88% for relation extraction (RE)), according to experimental findings based on 505 real-world unstructured biomedical clinical notes from patients.

Conclusions: In this paper, we propose a novel end-to-end system for the construction of a biomedical knowledge graph from clinical text using a variation of BERT models.

Title: SkillGPT: a RESTful API service for skill extraction and standardization using a Large Language Model. (arXiv:2304.11060v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.11060
Code URL: https://github.com/aida-ugent/skillgpt
Copy Paste: [[2304.11060] SkillGPT: a RESTful API service for skill extraction and standardization using a Large Language Model](http://arxiv.org/abs/2304.11060) #extraction
Summary:
We present SkillGPT, a tool for skill extraction and standardization (SES) from free-style job descriptions and user profiles with an open-source Large Language Model (LLM) as backbone. Most previous methods for similar tasks either need supervision or rely on heavy data-preprocessing and feature engineering. Directly prompting the latest conversational LLM for standard skills, however, is slow, costly and inaccurate. In contrast, SkillGPT utilizes an LLM to perform its tasks in steps via summarization and vector similarity search, to balance speed with precision. The backbone LLM of SkillGPT is based on Llama, free for academic use and thus useful for exploratory research and prototype development. Hence, our cost-free SkillGPT gives users the convenience of conversational SES, efficiently and reliably.
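
Editor's sketch (not from the paper): the vector similarity search step mentioned in this summary can be illustrated with a cosine-similarity lookup. The skill names, the three-dimensional vectors, and the function names below are all hypothetical; a real system would use embeddings produced by a language model and a standardized skill taxonomy.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_skills(query_vec, skill_vectors, k=2):
    """Return the names of the k skills most similar to the query vector."""
    ranked = sorted(
        skill_vectors.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy embeddings standing in for model-generated vectors.
skill_vectors = {
    "python programming": (0.9, 0.1, 0.0),
    "project management": (0.1, 0.9, 0.1),
    "data analysis": (0.7, 0.2, 0.3),
}

# Hypothetical embedding of a summarized job description.
query = (0.8, 0.1, 0.1)
top = nearest_skills(query, skill_vectors, k=2)
print(top)
```

The summary does not say which similarity measure or index SkillGPT uses, so cosine similarity over a small in-memory dictionary is only an assumption made for illustration.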

membership infer

federate

Title: Get Rid Of Your Trail: Remotely Erasing Backdoors in Federated Learning. (arXiv:2304.10638v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.10638
Code URL: null
Copy Paste: [[2304.10638] Get Rid Of Your Trail: Remotely Erasing Backdoors in Federated Learning](http://arxiv.org/abs/2304.10638) #federate
Summary:
Federated Learning (FL) enables collaborative deep learning training across multiple participants without exposing sensitive personal data. However, the distributed nature of FL and the unvetted participants' data make it vulnerable to backdoor attacks. In these attacks, adversaries inject malicious functionality into the centralized model during training, leading to intentional misclassifications for specific adversary-chosen inputs. While previous research has demonstrated successful injections of persistent backdoors in FL, the persistence also poses a challenge, as their existence in the centralized model can prompt the central aggregation server to take preventive measures to penalize the adversaries. Therefore, this paper proposes a methodology that enables adversaries to effectively remove backdoors from the centralized model upon achieving their objectives or upon suspicion of possible detection. The proposed approach extends the concept of machine unlearning and presents strategies to preserve the performance of the centralized model and simultaneously prevent over-unlearning of information unrelated to backdoor patterns, making the adversaries stealthy while removing backdoors. To the best of our knowledge, this is the first work that explores machine unlearning in FL to remove backdoors to the benefit of adversaries. Exhaustive evaluation considering image classification scenarios demonstrates the efficacy of the proposed method in efficient backdoor removal from the centralized model, injected by state-of-the-art attacks across multiple configurations.

Title: Federated Learning for Predictive Maintenance and Quality Inspection in Industrial Applications. (arXiv:2304.11101v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.11101
Code URL: null
Copy Paste: [[2304.11101] Federated Learning for Predictive Maintenance and Quality Inspection in Industrial Applications](http://arxiv.org/abs/2304.11101) #federate
Summary:
Data-driven machine learning is playing a crucial role in the advancements of Industry 4.0, specifically in enhancing predictive maintenance and quality inspection. Federated learning (FL) enables multiple participants to develop a machine learning model without compromising the privacy and confidentiality of their data. In this paper, we evaluate the performance of different FL aggregation methods and compare them to central and local training approaches. Our study is based on four datasets with varying data distributions. The results indicate that the performance of FL is highly dependent on the data and its distribution among clients. In some scenarios, FL can be an effective alternative to traditional central or local training methods. Additionally, we introduce a new federated learning dataset from a real-world quality inspection setting.

fair

Title: Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding. (arXiv:2304.10548v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.10548
Code URL: null
Copy Paste: [[2304.10548] Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding](http://arxiv.org/abs/2304.10548) #fair
Summary:
Qualitative analysis of textual contents unpacks rich and valuable information by assigning labels to the data. However, this process is often labor-intensive, particularly when working with large datasets. While recent AI-based tools demonstrate utility, researchers may not have readily available AI resources and expertise, let alone be challenged by the limited generalizability of those task-specific models. In this study, we explored the use of large language models (LLMs) in supporting deductive coding, a major category of qualitative analysis where researchers use pre-determined codebooks to label the data into a fixed set of codes. Instead of training task-specific models, a pre-trained LLM could be used directly for various tasks without fine-tuning through prompt learning. Using a curiosity-driven questions coding task as a case study, we found, by combining GPT-3 with expert-drafted codebooks, our proposed approach achieved fair to substantial agreements with expert-coded results. We lay out challenges and opportunities in using LLMs to support qualitative coding and beyond.

Title: Individual Fairness in Bayesian Neural Networks. (arXiv:2304.10828v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.10828
Code URL: https://github.com/alicedoherty/bayesian-fairness
Copy Paste: [[2304.10828] Individual Fairness in Bayesian Neural Networks](http://arxiv.org/abs/2304.10828) #fair
Summary:
We study Individual Fairness (IF) for Bayesian neural networks (BNNs). Specifically, we consider the $\epsilon$-$\delta$-individual fairness notion, which requires that, for any pair of input points that are $\epsilon$-similar according to a given similarity metric, the output of the BNN is within a given tolerance $\delta>0$. We leverage bounds on statistical sampling over the input space and the relationship between adversarial robustness and individual fairness to derive a framework for the systematic estimation of $\epsilon$-$\delta$-IF, designing Fair-FGSM and Fair-PGD as global, fairness-aware extensions to gradient-based attacks for BNNs. We empirically study IF of a variety of approximately inferred BNNs with different architectures on fairness benchmarks, and compare against deterministic models learnt using frequentist techniques. Interestingly, we find that BNNs trained by means of approximate Bayesian inference consistently tend to be markedly more individually fair than their deterministic counterparts.
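
Editor's sketch (not from the paper): the $\epsilon$-$\delta$ condition above can be checked empirically on a finite set of points. The code below assumes a deterministic scalar model and an L-infinity similarity metric, both chosen only for illustration; it is a brute-force pair check, not the Fair-FGSM or Fair-PGD procedures, and it ignores the posterior over BNN weights.

```python
from itertools import combinations


def linf_distance(a, b):
    """L-infinity distance, used here as a stand-in similarity metric."""
    return max(abs(x - y) for x, y in zip(a, b))


def fairness_violations(model, points, eps, delta, dist=linf_distance):
    """Return every pair that is eps-similar but whose outputs differ by more than delta."""
    violations = []
    for a, b in combinations(points, 2):
        if dist(a, b) <= eps and abs(model(a) - model(b)) > delta:
            violations.append((a, b))
    return violations


def toy_model(x):
    """Hypothetical scalar model that is much more sensitive to the second feature."""
    return 0.5 * x[0] + 2.0 * x[1]


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0)]
violations = fairness_violations(toy_model, points, eps=0.15, delta=0.1)
print(violations)
```

An empty result on a finite sample shows only that no violation was found among the sampled pairs; it does not certify fairness over the whole input space.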

Title: Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study. (arXiv:2304.10909v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.10909
Code URL: https://github.com/joakimedin/medical-coding-reproducibility
Copy Paste: [[2304.10909] Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study](http://arxiv.org/abs/2304.10909) #fair
Summary:
Medical coding is the task of assigning medical codes to clinical free-text documentation. Healthcare professionals manually assign such codes to track patient diagnoses and treatments. Automated medical coding can considerably alleviate this administrative burden. In this paper, we reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models. We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation. In previous work, the macro F1 score has been calculated sub-optimally, and our correction doubles it. We contribute a revised model comparison using stratified sampling and identical experimental setups, including hyperparameters and decision boundary tuning. We analyze prediction errors to validate and falsify assumptions of previous works. The analysis confirms that all models struggle with rare codes, while long documents only have a negligible impact. Finally, we present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models. We release our code, model parameters, and new MIMIC-III and MIMIC-IV training and evaluation pipelines to accommodate fair future comparisons.

interpretability

Title: A Revisit to the Normalized Eight-Point Algorithm and A Self-Supervised Deep Solution. (arXiv:2304.10771v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10771
Code URL: null
Copy Paste: [[2304.10771] A Revisit to the Normalized Eight-Point Algorithm and A Self-Supervised Deep Solution](http://arxiv.org/abs/2304.10771) #interpretability
Summary:
The Normalized Eight-Point algorithm has been widely viewed as the cornerstone in two-view geometry computation, where the seminal Hartley's normalization greatly improves the performance of the direct linear transformation (DLT) algorithm. A natural question is, whether there exists and how to find other normalization methods that may further improve the performance as per each input sample. In this paper, we provide a novel perspective and make two contributions towards this fundamental problem: 1) We revisit the normalized eight-point algorithm and make a theoretical contribution by showing the existence of different and better normalization algorithms; 2) We present a deep convolutional neural network with a self-supervised learning strategy to the normalization. Given eight pairs of correspondences, our network directly predicts the normalization matrices, thus learning to normalize each input sample. Our learning-based normalization module could be integrated with both traditional (e.g., RANSAC) and deep learning framework (affording good interpretability) with minimal efforts. Extensive experiments on both synthetic and real images show the effectiveness of our proposed approach.
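
Editor's sketch (not from the paper): the classical Hartley normalization that this work revisits translates the points so their centroid is at the origin and scales them so their mean distance from the origin is $\sqrt{2}$. The function name and sample points below are illustrative; the learned normalization proposed in the paper is not reproduced here.

```python
import math


def hartley_normalization(points):
    """Return (T, normalized_points) for a list of 2D points.

    T is the 3x3 similarity transform (as nested lists) acting on
    homogeneous coordinates. Assumes the points are not all identical.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    s = math.sqrt(2.0) / mean_dist
    T = [
        [s, 0.0, -s * cx],
        [0.0, s, -s * cy],
        [0.0, 0.0, 1.0],
    ]
    normalized = [(s * (x - cx), s * (y - cy)) for x, y in points]
    return T, normalized


# Eight arbitrary image points in pixel coordinates (illustrative values).
pts = [(100.0, 200.0), (150.0, 220.0), (300.0, 80.0), (400.0, 400.0),
       (50.0, 350.0), (250.0, 260.0), (320.0, 150.0), (180.0, 90.0)]
T, norm_pts = hartley_normalization(pts)

mean_x = sum(x for x, _ in norm_pts) / len(norm_pts)
mean_y = sum(y for _, y in norm_pts) / len(norm_pts)
mean_norm = sum(math.hypot(x, y) for x, y in norm_pts) / len(norm_pts)
print(mean_x, mean_y, mean_norm)
```

In the eight-point pipeline, each image gets its own transform, the fundamental matrix is estimated from the normalized correspondences, and the estimate is then mapped back using the two transforms.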

explainability

watermark

diffusion

Title: Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models. (arXiv:2304.10700v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.10700
Code URL: null
Copy Paste: [[2304.10700] Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models](http://arxiv.org/abs/2304.10700) #diffusion
Summary:
Novel view synthesis from a single input image is a challenging task, where the goal is to generate a new view of a scene from a desired camera pose that may be separated by a large motion. The highly uncertain nature of this synthesis task due to unobserved elements within the scene (i.e., occlusion) and outside the field-of-view makes the use of generative models appealing to capture the variety of possible outputs. In this paper, we propose a novel generative model which is capable of producing a sequence of photorealistic images consistent with a specified camera trajectory, and a single starting image. Our approach is centred on an autoregressive conditional diffusion-based model capable of interpolating visible scene elements, and extrapolating unobserved regions in a view, in a geometrically consistent manner. Conditioning is limited to an image capturing a single camera view and the (relative) pose of the new camera view. To measure the consistency over a sequence of generated views, we introduce a new metric, the thresholded symmetric epipolar distance (TSED), to measure the number of consistent frame pairs in a sequence. While previous methods have been shown to produce high quality images and consistent semantics across pairs of views, we show empirically with our metric that they are often inconsistent with the desired camera poses. In contrast, we demonstrate that our method produces both photorealistic and view-consistent imagery.

Title: Improved Diffusion-based Image Colorization via Piggybacked Models. (arXiv:2304.11105v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.11105
Code URL: null
Copy Paste: [[2304.11105] Improved Diffusion-based Image Colorization via Piggybacked Models](http://arxiv.org/abs/2304.11105) #diffusion
Summary:
Image colorization has been attracting the research interests of the community for decades. However, existing methods still struggle to provide satisfactory colorized results given grayscale images due to a lack of human-like global understanding of colors. Recently, large-scale Text-to-Image (T2I) models have been exploited to transfer the semantic information from the text prompts to the image domain, where text provides a global control for semantic objects in the image. In this work, we introduce a colorization model piggybacking on the existing powerful T2I diffusion model. Our key idea is to exploit the color prior knowledge in the pre-trained T2I diffusion model for realistic and diverse colorization. A diffusion guider is designed to incorporate the pre-trained weights of the latent diffusion model to output a latent color prior that conforms to the visual semantics of the grayscale input. A lightness-aware VQVAE will then generate the colorized result with pixel-perfect alignment to the given grayscale image. Our model can also achieve conditional colorization with additional inputs (e.g. user hints and texts). Extensive experiments show that our method achieves state-of-the-art performance in terms of perceptual quality.

Title: BoDiffusion: Diffusing Sparse Observations for Full-Body Human Motion Synthesis. (arXiv:2304.11118v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.11118
Code URL: null
Copy Paste: [[2304.11118] BoDiffusion: Diffusing Sparse Observations for Full-Body Human Motion Synthesis](http://arxiv.org/abs/2304.11118) #diffusion
Summary:
Mixed reality applications require tracking the user's full-body motion to enable an immersive experience. However, typical head-mounted devices can only track head and hand movements, leading to a limited reconstruction of full-body motion due to variability in lower body configurations. We propose BoDiffusion -- a generative diffusion model for motion synthesis to tackle this under-constrained reconstruction problem. We present a time and space conditioning scheme that allows BoDiffusion to leverage sparse tracking inputs while generating smooth and realistic full-body motion sequences. To the best of our knowledge, this is the first approach that uses the reverse diffusion process to model full-body tracking as a conditional sequence generation task. We conduct experiments on the large-scale motion-capture dataset AMASS and show that our approach outperforms the state-of-the-art approaches by a significant margin in terms of full-body motion realism and joint reconstruction error.

Title: IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies. (arXiv:2304.10573v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.10573
Code URL: https://github.com/philippe-eecs/idql
Copy Paste: [[2304.10573] IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies](http://arxiv.org/abs/2304.10573) #diffusion
Summary:
Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a modified Bellman backup. However, it is unclear which policy actually attains the values represented by this implicitly trained Q-function. In this paper, we reinterpret IQL as an actor-critic method by generalizing the critic objective and connecting it to a behavior-regularized implicit actor. This generalization shows how the induced actor balances reward maximization and divergence from the behavior policy, with the specific loss choice determining the nature of this tradeoff. Notably, this actor can exhibit complex and multimodal characteristics, suggesting issues with the conditional Gaussian actor fit with advantage weighted regression (AWR) used in prior methods. Instead, we propose using samples from a diffusion-parameterized behavior policy and weights computed from the critic to then importance sample our intended policy. We introduce Implicit Diffusion Q-learning (IDQL), combining our general IQL critic with the policy extraction method. IDQL maintains the ease of implementation of IQL while outperforming prior offline RL methods and demonstrating robustness to hyperparameters. Code is available at https://github.com/philippe-eecs/IDQL.
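
Editor's sketch (not from the paper): the policy extraction step described above, drawing candidate actions from a behavior policy and resampling them with critic-derived weights, might look roughly like the following. The softmax weighting, the temperature parameter, and the hard-coded candidates and Q-values are assumptions for illustration; the summary does not state the paper's actual weight formula, and the candidates there come from a diffusion model.

```python
import math
import random


def resample_action(candidates, q_values, temperature=1.0, rng=None):
    """Pick one candidate with probability proportional to exp(Q / temperature).

    Subtracting the maximum Q-value before exponentiating keeps the
    computation numerically stable without changing the probabilities.
    """
    rng = rng or random.Random()
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical candidates standing in for samples from a behavior policy,
# with hypothetical critic values for each.
candidates = ["left", "forward", "right"]
q_values = [0.1, 2.0, 0.5]

rng = random.Random(0)
soft_pick = resample_action(candidates, q_values, temperature=1.0, rng=rng)
greedy_pick = resample_action(candidates, q_values, temperature=1e-3, rng=rng)
print(soft_pick, greedy_pick)
```

With a very small temperature the resampling approaches a greedy choice of the highest-valued candidate; a larger temperature stays closer to the behavior policy's own samples.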