secure

Title: ISC-FLAT: On the Conflict Between Control Flow Attestation and Real-Time Operations. (arXiv:2303.03561v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.03561
Code URL: null
Copy Paste: [[2303.03561] ISC-FLAT: On the Conflict Between Control Flow Attestation and Real-Time Operations](http://arxiv.org/abs/2303.03561) #secure
Summary:
The wide adoption of IoT gadgets and Cyber-Physical Systems (CPS) makes embedded devices increasingly important. While some of these devices perform mission-critical tasks, they are usually implemented using Micro-Controller Units (MCUs) that lack security mechanisms on par with those available to general-purpose computers, making them more susceptible to remote exploits that could corrupt their software integrity. Motivated by this problem, prior work has proposed techniques to remotely assess the trustworthiness of embedded MCU software. Among them, Control Flow Attestation (CFA) enables remote detection of runtime abuses that illegally modify the program's control flow during execution.

Despite these advances, current CFA methods share a fundamental limitation: they preclude interrupts during the execution of the software operation being attested. Simply put, existing CFA techniques are insecure unless interrupts are disabled on the MCU. On the other hand, we argue that the lack of interruptibility can undermine CFA's usefulness, as most embedded applications depend on interrupts to process asynchronous events in real time.

To address this limitation, we propose Interrupt-Safe Control Flow Attestation (ISC-FLAT): a CFA technique that is compatible with existing MCUs and enables interrupt handling without compromising the authenticity of CFA reports. Similar to other CFA techniques that do not require customized hardware modifications, ISC-FLAT leverages a Trusted Execution Environment (TEE) (in particular, our prototype is built on ARM TrustZone-M) to securely generate unforgeable CFA reports without precluding applications from processing interrupts. We implement a fully functional ISC-FLAT prototype on the ARM Cortex-M33 MCU and demonstrate that it incurs minimal runtime overhead when compared to existing TEE-based CFA methods that do not support interrupts.
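For readers unfamiliar with CFA, the sketch below illustrates the general idea of a hash-chained control-flow report. It assumes a design in which every taken branch is logged as a (source, destination) edge and folded into a running digest, which a TEE then MACs together with a verifier challenge. It is not ISC-FLAT's implementation and ignores interrupts entirely. The function names, key, nonce, and addresses are all hypothetical, and a real verifier must handle loops and many valid paths rather than comparing against one expected path.

```python
import hashlib
import hmac


def cfa_digest(edges):
    """Fold (src, dst) control-flow edges into a single SHA-256 hash chain.

    Each step hashes the previous digest with the next edge, so the final
    value commits to both the set and the order of branches taken.
    """
    digest = bytes(32)  # all-zero initial value (an arbitrary choice here)
    for src, dst in edges:
        step = src.to_bytes(4, "little") + dst.to_bytes(4, "little")
        digest = hashlib.sha256(digest + step).digest()
    return digest


def sign_report(key, challenge, edges):
    """Prover side (hypothetical): MAC over the challenge and the path digest.

    In a TEE-based design the key would be reachable only from the secure world.
    """
    return hmac.new(key, challenge + cfa_digest(edges), hashlib.sha256).digest()


def verify_report(key, challenge, expected_edges, report):
    """Verifier side: recompute the MAC for the expected path and compare."""
    expected = sign_report(key, challenge, expected_edges)
    return hmac.compare_digest(expected, report)


# Hypothetical key, nonce, and branch addresses for illustration only.
KEY = b"device-attestation-key"
CHALLENGE = b"nonce-0001"
benign_path = [(0x0800_0100, 0x0800_0200), (0x0800_0210, 0x0800_0120)]
hijacked_path = [(0x0800_0100, 0x0800_0200), (0x0800_0210, 0x0800_0400)]

report = sign_report(KEY, CHALLENGE, hijacked_path)
print(verify_report(KEY, CHALLENGE, benign_path, report))  # False
```

Because each digest depends on the previous one, changing a single branch destination (here the last edge) changes the final MAC, so the verifier rejects the report.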

Title: Client-specific Property Inference against Secure Aggregation in Federated Learning. (arXiv:2303.03908v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.03908
Code URL: null
Copy Paste: [[2303.03908] Client-specific Property Inference against Secure Aggregation in Federated Learning](http://arxiv.org/abs/2303.03908) #secure
Summary:
Federated learning has become a widely used paradigm for collaboratively training a common model among different participants with the help of a central server that coordinates the training. Although only model parameters or other model updates, rather than the participants' data, are exchanged during federated training, many attacks have shown that it is still possible to infer sensitive information, such as membership and properties, or even to reconstruct participant data outright. Although differential privacy is considered an effective defense against privacy attacks, it is also criticized for its negative effect on utility. Another possible defense is secure aggregation, which allows the server to access only the aggregated update instead of each individual one; it is often more appealing because it does not degrade model quality. However, combining only the aggregated updates, which are generated by a different composition of clients in every round, may still allow the inference of some client-specific information.

In this paper, we show that simple linear models can effectively capture client-specific properties from the aggregated model updates alone, owing to the linearity of aggregation. We formulate an optimization problem across different rounds to infer a tested property of every client from the output of the linear models, for example, whether a client has a specific sample in its training data (membership inference) or whether it misbehaves and attempts to degrade the performance of the common model through poisoning attacks. Our reconstruction technique is completely passive and undetectable. We demonstrate the efficacy of our approach in several scenarios, showing that secure aggregation provides very limited privacy guarantees in practice. The source code will be released upon publication.
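To make the linearity argument concrete, here is a minimal sketch under strong simplifying assumptions: each client i has a fixed binary property p_i, and a linear model's score on a round's aggregated update equals the sum of p_i over that round's participants, with no noise. Knowing who participated in each round, the server can then recover every p_i by least squares across rounds. The participation masks, ground truth, and solver are illustrative only and are not the paper's formulation.

```python
def solve_least_squares(A, y):
    """Solve min_p ||A p - y||^2 through the normal equations (A^T A) p = A^T y."""
    rows, n = len(A), len(A[0])
    M = [[sum(A[r][i] * A[r][j] for r in range(rows)) for j in range(n)] for i in range(n)]
    b = [sum(A[r][i] * y[r] for r in range(rows)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    p = [0.0] * n
    for i in reversed(range(n)):
        p[i] = (b[i] - sum(M[i][c] * p[c] for c in range(i + 1, n))) / M[i][i]
    return p


# Hypothetical ground truth: clients 0 and 3 have the tested property.
true_property = [1, 0, 0, 1]
# Participation mask per round (1 = client was in that round's aggregate).
rounds = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
]
# Assumed noise-free linear-model score on each round's aggregated update.
scores = [sum(m * p for m, p in zip(mask, true_property)) for mask in rounds]
estimate = solve_least_squares(rounds, scores)
print([round(v) for v in estimate])  # [1, 0, 0, 1]
```

Recovery works here because the participation matrix has full column rank; if two clients always appeared in the same rounds, their properties could not be separated.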

security

Title: SoK: Content Moderation for End-to-End Encryption. (arXiv:2303.03979v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.03979
Code URL: null
Copy Paste: [[2303.03979] SoK: Content Moderation for End-to-End Encryption](http://arxiv.org/abs/2303.03979) #security
Summary:
Popular messaging applications now enable end-to-end encryption (E2EE) by default, and E2EE data storage is becoming common. These important advances for security and privacy create new content moderation challenges for online services, because services can no longer directly access plaintext content. While ongoing public policy debates about E2EE and content moderation in the United States and European Union emphasize child sexual abuse material and misinformation in messaging and storage, we identify and synthesize a wealth of scholarship that goes far beyond those topics. We bridge literature that is diverse both in content moderation subject matter, such as malware, spam, hate speech, terrorist content, and enterprise policy compliance, and in intended deployments, including not only privacy-preserving content moderation for messaging, email, and cloud storage, but also private introspection of encrypted web traffic by middleboxes. In this work, we systematize the study of content moderation in E2EE settings. We set out a process pipeline for content moderation, drawing on a broad interdisciplinary literature that is not specific to E2EE. We examine cryptography and policy design choices at all stages of this pipeline, and we suggest areas of future research to fill gaps in literature and better understand possible paths forward.

Title: A Comparison of Methods for Neural Network Aggregation. (arXiv:2303.03488v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.03488
Code URL: null
Copy Paste: [[2303.03488] A Comparison of Methods for Neural Network Aggregation](http://arxiv.org/abs/2303.03488) #security
Summary:
Deep learning has been successful in theory. For deep learning to succeed in industry, we need algorithms capable of handling the many inconsistencies that appear in real data. These inconsistencies can have large effects on the implementation of a deep learning algorithm. Artificial intelligence is currently changing the medical industry. However, obtaining authorization to use medical data for training machine learning algorithms is a major hurdle. A possible solution is to share the data without sharing patient information. We propose a multi-party computation protocol for the deep learning algorithm. The protocol preserves both the privacy and the security of the training data. Three approaches to neural network assembly are analyzed: transfer learning, average ensemble learning, and series network learning. The results are compared with data-sharing approaches in different experiments. We analyze the security issues of the proposed protocol. Although the analysis is based on medical data, the results of multi-party computation of machine learning training are theoretical and can be applied in multiple research areas.

privacy

Title: Bootstrap The Original Latent: Freeze-and-thaw Adapter for Back-Propagated Black-Box Adaptation. (arXiv:2303.03709v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.03709
Code URL: null
Copy Paste: [[2303.03709] Bootstrap The Original Latent: Freeze-and-thaw Adapter for Back-Propagated Black-Box Adaptation](http://arxiv.org/abs/2303.03709) #privacy
Summary:
In this paper, considering the balance of data/model privacy of model owners and user needs, we propose a new setting called Back-Propagated Black-Box Adaptation (BPBA) for users to better train their private models via the guidance of the back-propagated results of foundation/source models. Our setting can ease the usage of foundation/source models as well as prevent the leakage and misuse of foundation/source models. Moreover, we also propose a new training strategy called Bootstrap The Original Latent (BTOL) to fully utilize the foundation/source models. Our strategy consists of a domain adapter and a freeze-and-thaw strategy. We apply our BTOL under BPBA and Black-box UDA settings on three different datasets. Experiments show that our strategy is efficient and robust in various settings without manual augmentations.

Title: EavesDroid: Eavesdropping User Behaviors via OS Side-Channels on Smartphones. (arXiv:2303.03700v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.03700
Code URL: null
Copy Paste: [[2303.03700] EavesDroid: Eavesdropping User Behaviors via OS Side-Channels on Smartphones](http://arxiv.org/abs/2303.03700) #privacy
Summary:
As the Internet of Things (IoT) continues to grow, smartphones have become an integral part of IoT systems. However, with the increasing amount of personal information stored on smartphones, users' privacy is at risk of being compromised by malicious attackers. Malware detection engines are commonly installed on smartphones to defend against these attacks, but new attacks that can evade these defenses may still emerge. In this paper, we present EavesDroid, a new side-channel attack on Android smartphones that allows an unprivileged attacker to accurately infer fine-grained user behaviors (e.g., viewing messages, playing videos) from on-screen operations. Our attack relies on the correlation between user behaviors and the return values of system calls. Because these return values are affected by many factors that cause fluctuation and misalignment, the attack is more challenging. We therefore build a CNN-GRU classification model, apply min-max normalization to the raw data, and combine multiple features to identify fine-grained user behaviors. A series of experiments on different Android smartphone models and systems shows that EavesDroid achieves an accuracy of 98% on known user behaviors in the test set and 86% in real-world settings. To prevent this attack, we recommend malware detection, obfuscating return values, or restricting applications from reading vulnerable return values.
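The preprocessing step mentioned above, min-max normalization followed by combining several features, can be sketched as follows. Each raw return-value trace is rescaled to [0, 1] independently, and the traces are then zipped into one feature vector per time step for a downstream classifier. The trace names and values are hypothetical; the paper's actual features and windowing are not specified here.

```python
def min_max_normalize(trace):
    """Rescale one trace to [0, 1]; a constant trace maps to all zeros."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        return [0.0] * len(trace)
    return [(v - lo) / (hi - lo) for v in trace]


def normalize_and_stack(traces):
    """Normalize each named trace independently, then emit one vector per time step."""
    names = sorted(traces)
    columns = [min_max_normalize(traces[name]) for name in names]
    return names, [list(step) for step in zip(*columns)]


# Hypothetical raw return-value traces sampled at the same instants.
traces = {
    "syscall_a": [3, 5, 7],
    "syscall_b": [10, 6, 2],
}
names, vectors = normalize_and_stack(traces)
print(names, vectors)  # ['syscall_a', 'syscall_b'] [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
```

Normalizing per trace keeps features with very different raw ranges from dominating the classifier input.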

protect

defense

attack

Title: Logit Margin Matters: Improving Transferable Targeted Adversarial Attack by Logit Calibration. (arXiv:2303.03680v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.03680
Code URL: null
Copy Paste: [[2303.03680] Logit Margin Matters: Improving Transferable Targeted Adversarial Attack by Logit Calibration](http://arxiv.org/abs/2303.03680) #attack
Summary:
Previous works have extensively studied the transferability of adversarial samples in untargeted black-box scenarios. However, it remains challenging to craft targeted adversarial examples with higher transferability than non-targeted ones. Recent studies reveal that the traditional Cross-Entropy (CE) loss function is insufficient for learning transferable targeted adversarial examples due to vanishing gradients. In this work, we provide a comprehensive investigation of the CE loss function and find that the logit margin between the targeted and untargeted classes quickly saturates under CE, which largely limits transferability. Therefore, in this paper, we aim to continually increase the logit margin throughout the optimization to address the saturation issue, and propose two simple and effective logit calibration methods, which downscale the logits with a temperature factor and an adaptive margin, respectively. Both encourage the optimization to produce a larger logit margin and lead to higher transferability. In addition, we show that minimizing the cosine distance between the adversarial examples and the classifier weights of the target class further improves transferability, which benefits from downscaling the logits via L2 normalization. Experiments conducted on the ImageNet dataset validate the effectiveness of the proposed methods, which outperform state-of-the-art methods in black-box targeted attacks. The source code is available at https://github.com/WJJLL/Target-Attack/
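A minimal sketch of the temperature variant of logit calibration, assuming plain cross-entropy on logits divided by a factor T. The gradient with respect to the original logits is (softmax(z / T) - y) / T, so once the target logit dominates, the T = 1 gradient almost vanishes while the downscaled version still pushes the margin upward. The logits and T = 5 are illustrative values, not settings from the paper, and the adaptive-margin and cosine variants are not shown.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def ce_loss(logits, target, temperature=1.0):
    """Cross-entropy computed on logits downscaled by a temperature factor."""
    return -math.log(softmax([v / temperature for v in logits])[target])


def ce_grad(logits, target, temperature=1.0):
    """Gradient of ce_loss with respect to the original (unscaled) logits."""
    p = softmax([v / temperature for v in logits])
    return [(pi - (1.0 if i == target else 0.0)) / temperature for i, pi in enumerate(p)]


def logit_margin(logits, target):
    """Target logit minus the largest non-target logit."""
    return logits[target] - max(v for i, v in enumerate(logits) if i != target)


logits = [12.0, 2.0, 1.0]  # hypothetical logits; class 0 is the attack target
print("margin:", logit_margin(logits, 0))
for T in (1.0, 5.0):
    print(f"T={T}: loss={ce_loss(logits, 0, T):.2e}, target grad={ce_grad(logits, 0, T)[0]:.2e}")
```

For the same logit margin of 10, the T = 1 gradient on the target logit is roughly three orders of magnitude smaller than the T = 5 one, which is the saturation effect described above.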

Title: Securing Autonomous Vehicles Under Partial-Information Cyber Attacks on LiDAR Data. (arXiv:2303.03470v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.03470
Code URL: null
Copy Paste: [[2303.03470] Securing Autonomous Vehicles Under Partial-Information Cyber Attacks on LiDAR Data](http://arxiv.org/abs/2303.03470) #attack
Summary:
Safety is paramount in autonomous vehicles (AVs). Auto manufacturers have spent millions of dollars and driven billions of miles to prove AVs are safe. However, this is ill-suited to answer: what happens to an AV if its data are adversarially compromised? We design a framework built on security-relevant metrics to benchmark AVs on longitudinal datasets. We establish the capabilities of a cyber-level attacker with access only to LiDAR datagrams and from them derive novel attacks on LiDAR. We demonstrate that even though the attacker has minimal knowledge and access only to raw datagrams, the attacks compromise perception and tracking in multi-sensor AVs and lead to objectively unsafe scenarios. To mitigate vulnerabilities and advance secure architectures in AVs, we present two improvements for security-aware fusion -- a data-asymmetry monitor and a scalable track-to-track fusion of 3D LiDAR and monocular detections (T2T-3DLM); we demonstrate that the approaches significantly reduce the attack effectiveness.

Title: Exploring the Limits of Indiscriminate Data Poisoning Attacks. (arXiv:2303.03592v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.03592
Code URL: null
Copy Paste: [[2303.03592] Exploring the Limits of Indiscriminate Data Poisoning Attacks](http://arxiv.org/abs/2303.03592) #attack
Summary:
Indiscriminate data poisoning attacks aim to decrease a model's test accuracy by injecting a small amount of corrupted training data. Despite significant interest, existing attacks remain relatively ineffective against modern machine learning (ML) architectures. In this work, we introduce the notion of model poisonability as a technical tool to explore the intrinsic limits of data poisoning attacks. We derive an easily computable threshold to establish and quantify a surprising phase transition phenomenon among popular ML models: data poisoning attacks become effective only when the poisoning ratio exceeds our threshold. Building on existing parameter corruption attacks and refining the Gradient Canceling attack, we perform extensive experiments to confirm our theoretical findings, test the predictability of our transition threshold, and significantly improve existing data poisoning baselines over a range of datasets and models. Our work highlights the critical role played by the poisoning ratio, and sheds new light on existing empirical results, attacks, and mitigation strategies in data poisoning.

Title: SCRAMBLE-CFI: Mitigating Fault-Induced Control-Flow Attacks on OpenTitan. (arXiv:2303.03711v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.03711
Code URL: null
Copy Paste: [[2303.03711] SCRAMBLE-CFI: Mitigating Fault-Induced Control-Flow Attacks on OpenTitan](http://arxiv.org/abs/2303.03711) #attack
Summary:
Secure elements physically exposed to adversaries are frequently targeted by fault attacks. These attacks can be utilized to hijack the control flow of software, allowing the attacker to bypass security measures, extract sensitive data, or gain full code execution. In this paper, we systematically analyze the threat vector of fault-induced control-flow manipulations on the open-source OpenTitan secure element. Our thorough analysis reveals that current countermeasures of this chip either induce large area overheads or still cannot prevent the attacker from exploiting the identified threats. In this context, we introduce SCRAMBLE-CFI, an encryption-based control-flow integrity scheme utilizing existing hardware features of OpenTitan. SCRAMBLE-CFI confines, with minimal hardware overhead, the impact of fault-induced control-flow attacks by encrypting each function with a different encryption tweak at load time. At runtime, code can only be decrypted successfully when the correct decryption tweak is active. We open-source our hardware changes and release our LLVM toolchain, which automatically protects programs. Our analysis shows that SCRAMBLE-CFI complementarily enhances the security guarantees of OpenTitan with a negligible hardware overhead of less than 3.97% and a runtime overhead of 7.02% for the Embench-IoT benchmarks.
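The core idea of per-function tweaks can be illustrated with a toy model: every function is encrypted at load time under its own tweak, and fetched code is decrypted with whichever tweak is currently active, so a fault that redirects execution into another function yields garbage instead of valid instructions. The SHA-256 XOR keystream below is a stand-in chosen only for readability, not the cipher used by SCRAMBLE-CFI or OpenTitan, and the key, function names, and byte strings are hypothetical.

```python
import hashlib


def keystream(key, tweak, length):
    """Toy keystream: SHA-256 over (key, tweak, counter) blocks."""
    out = b""
    counter = 0
    while len(out) < length:
        block = key + tweak.to_bytes(4, "little") + counter.to_bytes(4, "little")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:length]


def xor_crypt(key, tweak, data):
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, tweak, len(data))))


def load_image(key, functions):
    """Encrypt every function body under its own tweak (here simply its index)."""
    return {
        name: (tweak, xor_crypt(key, tweak, body))
        for tweak, (name, body) in enumerate(functions.items())
    }


def fetch(key, image, name, active_tweak):
    """Decrypt a function's code with whatever tweak is currently active."""
    _, ciphertext = image[name]
    return xor_crypt(key, active_tweak, ciphertext)


KEY = b"hypothetical-device-key"
functions = {  # made-up byte strings standing in for machine code
    "check_pin": b"\x13\x05\x10\x00\x67\x80\x00\x00",
    "unlock": b"\x93\x05\x20\x00\x67\x80\x00\x00",
}
image = load_image(KEY, functions)

# Legitimate call: the tweak for "unlock" is active when its code is fetched.
ok = fetch(KEY, image, "unlock", active_tweak=image["unlock"][0])
# Fault-induced jump: execution lands in "unlock" while check_pin's tweak is active.
garbled = fetch(KEY, image, "unlock", active_tweak=image["check_pin"][0])
print(ok == functions["unlock"], garbled == functions["unlock"])
```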

robust

Title: Memory Maps for Video Object Detection and Tracking on UAVs. (arXiv:2303.03508v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.03508
Code URL: null
Copy Paste: [[2303.03508] Memory Maps for Video Object Detection and Tracking on UAVs](http://arxiv.org/abs/2303.03508) #robust
Summary:
This paper introduces a novel approach to video object detection and tracking on Unmanned Aerial Vehicles (UAVs). By incorporating metadata, the proposed approach creates a memory map of object locations in real-world coordinates, providing a more robust and interpretable representation of object locations in both image space and the real world. We use this representation to boost confidences, resulting in improved performance for several temporal computer vision tasks, such as video object detection, short- and long-term single- and multi-object tracking, and video anomaly detection. These findings confirm the benefits of metadata in enhancing the capabilities of UAVs in the field of temporal computer vision and pave the way for further advancements in this area.

Title: Learning Discriminative Representations for Skeleton Based Action Recognition. (arXiv:2303.03729v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.03729
Code URL: null
Copy Paste: [[2303.03729] Learning Discriminative Representations for Skeleton Based Action Recognition](http://arxiv.org/abs/2303.03729) #robust
Summary:
Human action recognition aims to classify the category of a human action from a video segment. Recently, researchers have focused on designing GCN-based models that extract features from skeletons for this task, because skeleton representations are much more efficient and robust than other modalities such as RGB frames. However, when skeleton data are used, some important cues, such as related items, are discarded. This results in ambiguous actions that are hard to distinguish and tend to be misclassified. To alleviate this problem, we propose an auxiliary feature refinement head (FR Head), which consists of spatial-temporal decoupling and contrastive feature refinement, to obtain discriminative representations of skeletons. Ambiguous samples are dynamically discovered and calibrated in the feature space. Furthermore, FR Head can be applied at different stages of GCNs to build a multi-level refinement for stronger supervision. Extensive experiments are conducted on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets. Our proposed models obtain results competitive with state-of-the-art methods and help discriminate those ambiguous samples.

Title: Guiding Pseudo-labels with Uncertainty Estimation for Test-Time Adaptation. (arXiv:2303.03770v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.03770
Code URL: null
Copy Paste: [[2303.03770] Guiding Pseudo-labels with Uncertainty Estimation for Test-Time Adaptation](http://arxiv.org/abs/2303.03770) #robust
Summary:
Standard Unsupervised Domain Adaptation (UDA) methods assume the availability of both source and target data during the adaptation. In this work, we investigate Test-Time Adaptation (TTA), a specific case of UDA where a model is adapted to a target domain without access to source data. We propose a novel approach for the TTA setting based on a loss reweighting strategy that brings robustness against the noise that inevitably affects the pseudo-labels. The classification loss is reweighted based on the reliability of the pseudo-labels, which is measured by estimating their uncertainty. Guided by this reweighting strategy, the pseudo-labels are progressively refined by aggregating knowledge from neighbouring samples. Furthermore, a self-supervised contrastive framework is leveraged as a target space regulariser to enhance such knowledge aggregation. A novel negative pairs exclusion strategy is proposed to identify and exclude negative pairs made of samples sharing the same class, even in the presence of some noise in the pseudo-labels. Our method outperforms previous methods on three major benchmarks by a large margin. We set the new TTA state of the art on VisDA-C and DomainNet with a performance gain of +1.8% on both benchmarks, and on PACS with +12.3% in the single-source setting and +6.6% in multi-target adaptation. Additional analyses demonstrate that the proposed approach is robust to the noise, which results in significantly more accurate pseudo-labels compared to state-of-the-art approaches.
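One plausible way to realize uncertainty-based reweighting is sketched below, assuming the pseudo-label is taken from the averaged predictions of a sample's neighbours and its weight is one minus the normalized entropy of that average. The paper's exact uncertainty estimate and weighting function may differ; the function names and all inputs here are made up for illustration.

```python
import math


def average_probs(neighbor_probs):
    """Average the soft predictions of a sample's neighbours, class by class."""
    k = len(neighbor_probs)
    return [sum(p[c] for p in neighbor_probs) / k for c in range(len(neighbor_probs[0]))]


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def reweighted_pseudo_label_loss(model_probs, neighbor_probs):
    """Cross-entropy against the neighbours' pseudo-label, scaled by its reliability."""
    agg = average_probs(neighbor_probs)
    label = max(range(len(agg)), key=agg.__getitem__)
    # 1 - normalized entropy: 1 for a one-hot average, 0 for a uniform one.
    weight = 1.0 - entropy(agg) / math.log(len(agg))
    return weight * -math.log(model_probs[label]), label, weight


model_probs = [0.7, 0.2, 0.1]  # hypothetical current prediction for one sample
confident = [[0.9, 0.05, 0.05]] * 3  # neighbours that agree
noisy = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]  # neighbours that disagree
loss_c, label_c, w_c = reweighted_pseudo_label_loss(model_probs, confident)
loss_n, label_n, w_n = reweighted_pseudo_label_loss(model_probs, noisy)
print(label_c, round(w_c, 2), label_n, round(w_n, 2))
```

Agreeing neighbours give a weight of about 0.64 here, while disagreeing neighbours give about 0.05, so the unreliable pseudo-label contributes little to the loss.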

Title: Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams. (arXiv:2303.03856v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.03856
Code URL: null
Copy Paste: [[2303.03856] Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams](http://arxiv.org/abs/2303.03856) #robust
Summary:
Event cameras are neuromorphic vision sensors representing visual information as sparse and asynchronous event streams. Most state-of-the-art event-based methods project events into dense frames and process them with conventional learning models. However, these approaches sacrifice the sparsity and high temporal resolution of event data, resulting in a large model size and high computational complexity. To fit the sparse nature of events and sufficiently explore their implicit relationships, we develop a novel attention-aware framework named Event Voxel Set Transformer (EVSTr) for spatiotemporal representation learning on event streams. It first converts the event stream into a voxel set and then hierarchically aggregates voxel features to obtain robust representations. The core of EVSTr is an event voxel transformer encoder that extracts discriminative spatiotemporal features; it consists of two well-designed components: a multi-scale neighbor embedding layer (MNEL) for local information aggregation and a voxel self-attention layer (VSAL) for global representation modeling. To enable the framework to incorporate a long-term temporal structure, we introduce a segmental consensus strategy for modeling motion patterns over a sequence of segmented voxel sets. We evaluate the proposed framework on two event-based tasks: object classification and action recognition. Comprehensive experiments show that EVSTr achieves state-of-the-art performance while maintaining low model complexity. Additionally, we present a new dataset (NeuroHAR) recorded in challenging visual scenarios to address the lack of real-world event-based datasets for action recognition.
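Converting an event stream into a sparse voxel set, the first step described above, can be sketched as follows, assuming events arrive as (x, y, t, polarity) tuples and each non-empty voxel keeps simple positive/negative event counts. EVSTr's actual voxel features, grid size, and number of temporal bins are not given here; the cell size, bin count, and events below are hypothetical.

```python
from collections import defaultdict


def voxelize(events, cell=(4, 4), t_bins=3):
    """Group events (x, y, t, p) into a sparse voxel set keyed by (vx, vy, vt).

    Only occupied voxels are stored, which preserves the sparsity of the stream.
    """
    t0 = min(e[2] for e in events)
    t1 = max(e[2] for e in events)
    span = (t1 - t0) or 1.0  # avoid division by zero for a single timestamp
    voxels = defaultdict(lambda: [0, 0])  # [positive count, negative count]
    for x, y, t, p in events:
        vt = min(int((t - t0) / span * t_bins), t_bins - 1)
        key = (x // cell[0], y // cell[1], vt)
        voxels[key][0 if p > 0 else 1] += 1
    return dict(voxels)


events = [(0, 0, 0.0, 1), (1, 2, 0.1, -1), (5, 5, 0.5, 1), (6, 7, 1.0, 1)]
print(voxelize(events))  # {(0, 0, 0): [1, 1], (1, 1, 1): [1, 0], (1, 1, 2): [1, 0]}
```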

Title: DeepSeeColor: Realtime Adaptive Color Correction for Autonomous Underwater Vehicles via Deep Learning Methods. (arXiv:2303.04025v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.04025
Code URL: null
Copy Paste: [[2303.04025] DeepSeeColor: Realtime Adaptive Color Correction for Autonomous Underwater Vehicles via Deep Learning Methods](http://arxiv.org/abs/2303.04025) #robust
Summary:
Successful applications of complex vision-based behaviours underwater have lagged behind progress in terrestrial and aerial domains. This is largely due to the degraded image quality resulting from the physical phenomena involved in underwater image formation. Spectrally-selective light attenuation drains some colors from underwater images while backscattering adds others, making it challenging to perform vision-based tasks underwater. State-of-the-art methods for underwater color correction optimize the parameters of image formation models to restore the full spectrum of color to underwater imagery. However, these methods have high computational complexity that is unfavourable for realtime use by autonomous underwater vehicles (AUVs), as a result of having been primarily designed for offline color correction. Here, we present DeepSeeColor, a novel algorithm that combines a state-of-the-art underwater image formation model with the computational efficiency of deep learning frameworks. In our experiments, we show that DeepSeeColor offers comparable performance to the popular "Sea-Thru" algorithm (Akkaynak & Treibitz, 2019) while being able to rapidly process images at up to 60Hz, thus making it suitable for use onboard AUVs as a preprocessing step to enable more robust vision-based behaviours.
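For context, the kind of physics-based model these color-correction methods optimize can be written per color channel c as I_c = J_c exp(-beta_D,c z) + B_inf,c (1 - exp(-beta_B,c z)), where J is the unattenuated scene color, z the range, beta_D and beta_B the direct-signal and backscatter coefficients, and B_inf the veiling light. This is the revised model popularized by Sea-thru, and we assume DeepSeeColor's model has this form. Given the coefficients, correction is a closed-form inversion. The sketch uses made-up coefficients and range; estimating those parameters, which is where the computational cost discussed above lies, is not shown.

```python
import math


def degrade(J, z, beta_d, beta_b, b_inf):
    """Forward model for one channel: attenuated signal plus backscatter."""
    return J * math.exp(-beta_d * z) + b_inf * (1.0 - math.exp(-beta_b * z))


def restore(I, z, beta_d, beta_b, b_inf):
    """Invert the forward model: subtract backscatter, then undo attenuation."""
    backscatter = b_inf * (1.0 - math.exp(-beta_b * z))
    return (I - backscatter) * math.exp(beta_d * z)


# Hypothetical per-channel coefficients; red attenuates fastest, as is typical in water.
params = {
    "R": dict(beta_d=0.6, beta_b=0.5, b_inf=0.05),
    "G": dict(beta_d=0.15, beta_b=0.3, b_inf=0.25),
    "B": dict(beta_d=0.1, beta_b=0.3, b_inf=0.35),
}
true_color = {"R": 0.8, "G": 0.5, "B": 0.3}
z = 4.0  # assumed known range to the scene in metres

observed = {c: degrade(true_color[c], z, **params[c]) for c in "RGB"}
recovered = {c: restore(observed[c], z, **params[c]) for c in "RGB"}
print({c: round(observed[c], 3) for c in "RGB"})
print({c: round(recovered[c], 3) for c in "RGB"})
```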

Title: A Challenging Benchmark for Low-Resource Learning. (arXiv:2303.03840v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.03840
Code URL: null
Copy Paste: [[2303.03840] A Challenging Benchmark for Low-Resource Learning](http://arxiv.org/abs/2303.03840) #robust
Summary:
With promising yet saturated results in high-resource settings, low-resource datasets have gradually become popular benchmarks for evaluating the learning ability of advanced neural networks (e.g., BigBench, superGLUE). Some models even surpass humans according to benchmark test results. However, we find that there exists a set of hard examples in low-resource settings that challenge neural networks but are not well evaluated, which leads to overestimated performance. We first give a theoretical analysis of which factors make low-resource learning difficult. This analysis motivates us to propose a challenging benchmark, hardBench, to better evaluate learning ability; it covers 11 datasets, including 3 computer vision (CV) datasets and 8 natural language processing (NLP) datasets. Experiments on a wide range of models show that neural networks, even pre-trained language models, suffer sharp performance drops on our benchmark, demonstrating its effectiveness at exposing the weaknesses of neural networks. On NLP tasks, we surprisingly find that, despite better results on traditional low-resource benchmarks, pre-trained networks do not show performance improvements on our benchmark. These results demonstrate that there is still a large robustness gap between existing models and human-level performance.

Title: Can Decentralized Learning be more robust than Federated Learning?. (arXiv:2303.03829v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.03829
Code URL: null
Copy Paste: [[2303.03829] Can Decentralized Learning be more robust than Federated Learning?](http://arxiv.org/abs/2303.03829) #robust
Summary:
Decentralized Learning (DL) is a peer-to-peer learning approach that allows a group of users to jointly train a machine learning model. To ensure correctness, DL should be robust, i.e., Byzantine users must not be able to tamper with the result of the collaboration. In this paper, we introduce two new attacks against DL in which a Byzantine user can make the network converge to an arbitrary model of their choice and exclude an arbitrary user from the learning process. We demonstrate our attacks' effectiveness against Self-Centered Clipping, the state-of-the-art robust DL protocol. Finally, we show that the capabilities decentralization grants to Byzantine users result in decentralized learning always providing less robustness than federated learning.
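Since the attacks target Self-Centered Clipping, a sketch of its aggregation step may help. As we understand the published rule, each node moves toward its neighbours by a weighted sum of clipped differences, x_i + sum_j w_ij clip(x_j - x_i, tau), so a single neighbour can pull x_i by at most w_ij * tau per step. The weights, clipping radius, and models below are hypothetical, and this is not the attack from the paper.

```python
import math


def clip(vector, tau):
    """Scale a vector down to Euclidean norm tau if it is longer than tau."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm <= tau:
        return list(vector)
    return [v * tau / norm for v in vector]


def self_centered_clipping(own, neighbors, weights, tau):
    """One aggregation step: own + sum_j w_j * clip(x_j - own, tau).

    The remaining weight 1 - sum(weights) is implicitly the node's self-weight.
    """
    step = [0.0] * len(own)
    for w, x in zip(weights, neighbors):
        diff = clip([a - b for a, b in zip(x, own)], tau)
        step = [s + w * d for s, d in zip(step, diff)]
    return [o + s for o, s in zip(own, step)]


own = [0.0, 0.0]
honest = [0.1, 0.0]
byzantine = [100.0, 100.0]  # an outlier model sent by a Byzantine neighbour
new = self_centered_clipping(own, [honest, byzantine], weights=[0.3, 0.3], tau=1.0)
print(new)
```

Without clipping, the Byzantine neighbour alone would move the node by about 42 units; with tau = 1 the whole step stays within 0.6, the sum of the neighbour weights times tau.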

Title: Decision Transformer under Random Frame Dropping. (arXiv:2303.03391v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.03391
Code URL: null
Copy Paste: [[2303.03391] Decision Transformer under Random Frame Dropping](http://arxiv.org/abs/2303.03391) #robust
Summary:
Controlling agents remotely with deep reinforcement learning (DRL) in the real world is yet to come. One crucial stepping stone is to devise RL algorithms that are robust to information dropped because of corrupted communication or malfunctioning sensors. Typical RL methods usually require considerable online interaction data that are costly and unsafe to collect in the real world. Furthermore, when applied to frame-dropping scenarios, they perform unsatisfactorily even at moderate drop rates. To address these issues, we propose Decision Transformer under Random Frame Dropping (DeFog), an offline RL algorithm that enables agents to act robustly in frame-dropping scenarios without online interaction. DeFog first randomly masks out data in the offline datasets and explicitly adds the time span of frame dropping as an input. A subsequent finetuning stage on the same offline dataset with a higher mask rate further boosts performance. Empirical results show that DeFog outperforms strong baselines under severe frame drop rates such as 90%, while maintaining similar returns under non-frame-dropping conditions in the regular MuJoCo control benchmarks and the Atari environments. Our approach offers a robust and deployable solution for controlling agents in real-world environments with limited or unreliable data.
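The data-side idea of DeFog, masking frames and telling the model how stale its observation is, can be sketched as below. We assume a dropped frame is replaced by the last observation that actually arrived, and that the time span of frame dropping is the number of steps since that arrival. The paper's exact masking scheme and input encoding may differ, the function name is hypothetical, and the first frame is always kept here for simplicity.

```python
import random


def drop_frames(observations, drop_rate, rng):
    """Return (held_observations, drop_spans) for one trajectory.

    A dropped frame is replaced by the last observation that arrived, and
    drop_spans[t] counts how many steps old that observation is.
    """
    held, spans = [], []
    last, age = observations[0], 0
    for t, obs in enumerate(observations):
        if t == 0 or rng.random() >= drop_rate:
            last, age = obs, 0  # frame arrived: refresh and reset the span
        else:
            age += 1  # frame dropped: keep the stale observation
        held.append(last)
        spans.append(age)
    return held, spans


trajectory = ["s0", "s1", "s2", "s3", "s4", "s5"]
held, spans = drop_frames(trajectory, drop_rate=0.5, rng=random.Random(0))
print(held, spans)
```

During training, the held observations and the spans would both be fed to the model, so it can learn how far the true state may have drifted.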

Title: Robust Dominant Periodicity Detection for Time Series with Missing Data. (arXiv:2303.03553v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.03553
Code URL: null
Copy Paste: [[2303.03553] Robust Dominant Periodicity Detection for Time Series with Missing Data](http://arxiv.org/abs/2303.03553) #robust
Summary:
Periodicity detection is an important task in time series analysis, but it remains challenging due to the diverse characteristics of time series data, such as abrupt trend changes, outliers, noise, and especially block missing data. In this paper, we propose a robust and effective periodicity detection algorithm for time series with block missing data. We first design a robust trend filter to remove the interference of complicated trend patterns under missing data. Then, we propose a robust autocorrelation function (ACF) that can handle missing values and outliers effectively. We rigorously prove that the proposed robust ACF still works well when the length of the missing block is less than 1/3 of the period length. Finally, by combining time-frequency information, our algorithm estimates the period length accurately. The experimental results demonstrate that our algorithm outperforms existing periodicity detection algorithms on real-world time series datasets.
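A simplified stand-in for a missing-data-aware ACF is sketched below: for each lag, it computes the Pearson correlation using only the pairs where both values are observed, then returns the smallest lag whose score is within a tolerance of the best. This is not the paper's robust ACF (it has no trend filter and no outlier handling); the function names, the synthetic series with period 12, and the missing block of 3 points (shorter than a third of the period) are hypothetical.

```python
import math


def pairwise_acf(series, lag):
    """Correlation of x[t] and x[t + lag] over pairs where both are observed (None = missing)."""
    pairs = [(a, b) for a, b in zip(series, series[lag:]) if a is not None and b is not None]
    if len(pairs) < 3:
        return float("nan")
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def dominant_period(series, min_lag=2, max_lag=None, tol=1e-6):
    """Smallest lag whose ACF is within tol of the best ACF score."""
    max_lag = max_lag or len(series) // 2
    scores = {k: pairwise_acf(series, k) for k in range(min_lag, max_lag + 1)}
    scores = {k: v for k, v in scores.items() if not math.isnan(v)}
    best = max(scores.values())
    return min(k for k, v in scores.items() if v >= best - tol)


series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
series[30:33] = [None, None, None]  # a missing block of 3 points
print(dominant_period(series))  # 12
```

Picking the smallest near-best lag avoids reporting a multiple of the true period, since lags 24, 36, and so on correlate just as strongly.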

Title: AHPA: Adaptive Horizontal Pod Autoscaling Systems on Alibaba Cloud Container Service for Kubernetes. (arXiv:2303.03640v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.03640
Code URL: null
Copy Paste: [[2303.03640] AHPA: Adaptive Horizontal Pod Autoscaling Systems on Alibaba Cloud Container Service for Kubernetes](http://arxiv.org/abs/2303.03640) #robust
Summary:
The existing resource allocation policy for application instances in Kubernetes cannot dynamically adjust to business requirements, which can cause an enormous waste of resources during fluctuations. Moreover, the emergence of new cloud services imposes stricter resource management requirements. This paper discusses horizontal pod resource management in Alibaba Cloud Container Services with a newly deployed AI algorithm framework named AHPA, the adaptive horizontal pod auto-scaling system. Based on a robust decomposition forecasting algorithm and a performance training model, AHPA offers an optimal pod number adjustment plan that can reduce pod resources while maintaining business stability. Since being deployed in April 2021, this system has expanded to multiple customer scenarios, including logistics, social networks, AI audio and video, e-commerce, etc. Compared with previous algorithms, AHPA solves the elastic lag problem, increasing CPU usage by 10% and reducing resource cost by more than 20%. In addition, AHPA can automatically perform flexible planning according to the predicted business volume without manual intervention, significantly saving operation and maintenance costs.
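To contrast reactive and forecast-driven scaling, the sketch below places the replica formula documented for the standard Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), next to a hypothetical predictive rule that sizes the deployment from a load forecast before the load arrives. The HPA's tolerance band and stabilization window are omitted. The predictive function, its parameters, and the numbers are illustrative assumptions, not AHPA's algorithm.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    """Reactive rule from the Kubernetes HPA documentation (no tolerance band)."""
    return math.ceil(current_replicas * current_metric / target_metric)


def predictive_replicas(forecast_cpu_cores, cores_per_pod, target_utilization,
                        min_pods=1, max_pods=100):
    """Hypothetical forecast-driven rule: provision for the predicted load in advance."""
    needed = math.ceil(forecast_cpu_cores / (cores_per_pod * target_utilization))
    return max(min_pods, min(max_pods, needed))


# Reactive: 4 pods at 90% average CPU against a 60% target.
# Predictive: 12 cores of forecast demand, 2-core pods, 75% target utilization.
print(hpa_desired_replicas(4, 90, 60), predictive_replicas(12.0, 2.0, 0.75))  # 6 8
```

The reactive rule can only respond after utilization has already risen, which is the elastic lag that a forecast-driven plan tries to avoid.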

Title: Robust Semi-Supervised Anomaly Detection via Adversarially Learned Continuous Noise Corruption. (arXiv:2303.03925v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.03925
Code URL: null
Copy Paste: [[2303.03925] Robust Semi-Supervised Anomaly Detection via Adversarially Learned Continuous Noise Corruption](http://arxiv.org/abs/2303.03925) #robust
Summary:
Anomaly detection is the task of recognising novel samples which deviate significantly from pre-established normality. Abnormal classes are not present during training, meaning that models must learn effective representations solely from normal class data samples. Deep Autoencoders (AE) have been widely used for anomaly detection tasks, but suffer from overfitting to a null identity function. To address this problem, we implement a training scheme applied to a Denoising Autoencoder (DAE) which introduces an efficient method of producing Adversarially Learned Continuous Noise (ALCN) to maximally globally corrupt the input prior to denoising. Prior methods have applied similar approaches of adversarial training to increase the robustness of DAE; however, they exhibit limitations such as slow inference speed, reducing their real-world applicability, or producing generalised obfuscation which is easier to denoise. We show through rigorous evaluation that our ALCN method of regularisation during training improves AUC performance during inference while remaining efficient over both classical, leave-one-out novelty detection tasks with the variations 9 (normal) vs. 1 (abnormal) and 1 (normal) vs. 9 (abnormal) (MNIST AUCavg: 0.890 and 0.989; CIFAR-10 AUCavg: 0.670 and 0.742), and challenging real-world anomaly detection tasks: industrial inspection (MVTEC-AD AUCavg: 0.780) and plant disease detection (Plant Village AUC: 0.770), when compared to prior approaches.

biometric

steal

extraction

Title: At Your Fingertips: Extracting Piano Fingering Instructions from Videos. (arXiv:2303.03745v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.03745
Code URL: null
Copy Paste: [[2303.03745] At Your Fingertips: Extracting Piano Fingering Instructions from Videos](http://arxiv.org/abs/2303.03745) #extraction
Summary:
Piano fingering, knowing which finger to use to play each note in a musical piece, is a hard and important skill to master when learning to play the piano. While some sheet music is available with expert-annotated fingering information, most pieces lack this information, and people often resort to learning the fingering from demonstrations in online videos. We consider the AI task of automating the extraction of fingering information from videos. This is a non-trivial task: fingers are often occluded by other fingers, and it is often not clear from the video which keys were pressed, requiring the synchronization of hand position information with knowledge about the notes that were played. We show how to perform this task with high accuracy using a combination of deep-learning modules, including a GAN-based approach for fine-tuning on out-of-domain data. We extract the fingering information with an F1 score of 97%. We run the resulting system on 90 videos, resulting in high-quality piano fingering information for 150K notes, the largest available piano-fingering dataset to date.

Title: Hidden Knowledge: Mathematical Methods for the Extraction of the Fingerprint of Medieval Paper from Digital Images. (arXiv:2303.03794v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.03794
Code URL: null
Copy Paste: [[2303.03794] Hidden Knowledge: Mathematical Methods for the Extraction of the Fingerprint of Medieval Paper from Digital Images](http://arxiv.org/abs/2303.03794) #extraction
Summary:
Medieval paper, a handmade product, is made with a mould which leaves an indelible imprint on the sheet of paper. This imprint includes chain lines, laid lines and watermarks which are often visible on the sheet. Extracting these features allows the identification of paper stock and gives information about chronology, localisation and movement of books and people. Most computational work on feature extraction for paper analysis has so far focused on radiography or transmitted light images. While these imaging methods provide clear visualisation of the features of interest, they are expensive and time-consuming to acquire and not feasible for smaller institutions. However, reflected light images of medieval paper manuscripts are abundant and possibly cheaper to acquire. In this paper, we propose algorithms to detect and extract the laid and chain lines from reflected light images. We tackle the main drawbacks of reflected light images, namely the low-contrast attenuation of lines and intensity jumps due to noise and degradation, by employing the spectral total variation decomposition, and we develop methods for subsequent line extraction. Our results clearly demonstrate the feasibility of using reflected light images in paper analysis. This work enables feature extraction for paper manuscripts that have otherwise not been analysed due to a lack of appropriate images. We also open the door for paper stock identification at scale.

Title: A survey on automated detection and classification of acute leukemia and WBCs in microscopic blood cells. (arXiv:2303.03916v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.03916
Code URL: null
Copy Paste: [[2303.03916] A survey on automated detection and classification of acute leukemia and WBCs in microscopic blood cells](http://arxiv.org/abs/2303.03916) #extraction
Summary:
Leukemia (blood cancer) is an unusual spread of White Blood Cells or Leukocytes (WBCs) in the bone marrow and blood. Pathologists can diagnose leukemia by looking at a person's blood sample under a microscope. They identify and categorize leukemia by counting various blood cells and examining morphological features. This technique is time-consuming, and its outcome may also depend on the pathologist's professional skills and experience. In computer vision, traditional machine learning and deep learning techniques are practical roadmaps that increase the accuracy and speed of diagnosing and classifying medical images such as microscopic blood cells. This paper provides a comprehensive analysis of the detection and classification of acute leukemia and WBCs in microscopic blood cells. First, we divide the previous works into six categories based on the output of the models. Then, we describe the various steps of detection and classification of acute leukemia and WBCs, including Data Augmentation, Preprocessing, Segmentation, Feature Extraction, Feature Selection (Reduction), and Classification, with a focus on the classification step. Finally, we divide automated detection and classification methods for acute leukemia and WBCs into three categories based on the type of classifier used in the classification step: traditional, Deep Neural Network (DNN), and mixture (traditional and DNN) methods, and analyze each. The results of this study show that, in the diagnosis and classification of acute leukemia and WBCs, the Support Vector Machine (SVM) classifier in traditional machine learning models and the Convolutional Neural Network (CNN) classifier in deep learning models have been widely employed. Models that use these classifiers achieve higher performance metrics than models using other classifiers.

Title: Classifying Text-Based Conspiracy Tweets related to COVID-19 using Contextualized Word Embeddings. (arXiv:2303.03706v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.03706
Code URL: null
Copy Paste: [[2303.03706] Classifying Text-Based Conspiracy Tweets related to COVID-19 using Contextualized Word Embeddings](http://arxiv.org/abs/2303.03706) #extraction
Summary:
The FakeNews task in MediaEval 2022 investigates the challenge of finding accurate and high-performance models for classifying conspiracy tweets related to COVID-19. In this paper, we used BERT, ELMO, and their combination for feature extraction, with Random Forest as the classifier. The results show that ELMO performs slightly better than BERT; however, combining them at the feature level reduces performance.

Title: Exploring the Feasibility of ChatGPT for Event Extraction. (arXiv:2303.03836v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.03836
Code URL: null
Copy Paste: [[2303.03836] Exploring the Feasibility of ChatGPT for Event Extraction](http://arxiv.org/abs/2303.03836) #extraction
Summary:
Event extraction is a fundamental task in natural language processing that involves identifying and extracting information about events mentioned in text. However, it is a challenging task due to the lack of annotated data, which is expensive and time-consuming to obtain. The emergence of large language models (LLMs) such as ChatGPT provides an opportunity to solve language tasks with simple prompts, without the need for task-specific datasets and fine-tuning. While ChatGPT has demonstrated impressive results in tasks like machine translation, text summarization, and question answering, it presents challenges when used for complex tasks like event extraction. Unlike other tasks, event extraction requires the model to be provided with a complex set of instructions defining all event types and their schemas. To explore the feasibility of ChatGPT for event extraction and the challenges it poses, we conducted a series of experiments. Our results show that ChatGPT achieves, on average, only 51.04% of the performance of a task-specific model such as EEQA in long-tail and complex scenarios. Our usability testing experiments indicate that ChatGPT is not robust enough, and continuous refinement of the prompt does not lead to stable performance improvements, which can result in a poor user experience. Moreover, ChatGPT is highly sensitive to different prompt styles.

Title: Document-level Relation Extraction with Cross-sentence Reasoning Graph. (arXiv:2303.03912v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.03912
Code URL: null
Copy Paste: [[2303.03912] Document-level Relation Extraction with Cross-sentence Reasoning Graph](http://arxiv.org/abs/2303.03912) #extraction
Summary:
Relation extraction (RE) has recently moved from the sentence level to the document level, which requires aggregating document information and using entities and mentions for reasoning. Existing works put entity nodes and mention nodes with similar representations in a document-level graph, whose complex edges may incur redundant information. Furthermore, existing studies focus only on entity-level reasoning paths without considering global interactions among entities across sentences. To address these issues, we propose a novel document-level RE model with a GRaph information Aggregation and Cross-sentence Reasoning network (GRACR). Specifically, a simplified document-level graph is constructed to model the semantic information of all mentions and sentences in a document, and an entity-level graph is designed to explore relations of long-distance cross-sentence entity pairs. Experimental results show that GRACR achieves excellent performance on two public datasets of document-level RE. It is especially effective in extracting potential relations of cross-sentence entity pairs. Our code is available at https://github.com/UESTC-LHF/GRACR.

Title: Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction. (arXiv:2303.04132v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.04132
Code URL: null
Copy Paste: [[2303.04132] Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction](http://arxiv.org/abs/2303.04132) #extraction
Summary:
Large language models (LLMs) show great potential for synthetic data generation. This work shows that useful data can be synthetically generated even for tasks that cannot be solved directly by the LLM: we show that, for problems with structured outputs, it is possible to prompt an LLM to perform the task in the opposite direction, to generate plausible text for the target structure. Leveraging the asymmetry in task difficulty makes it possible to produce large-scale, high-quality data for complex tasks. We demonstrate the effectiveness of this approach on closed information extraction, where collecting ground-truth data is challenging, and no satisfactory dataset exists to date. We synthetically generate a dataset of 1.8M data points, demonstrate its superior quality compared to existing datasets in a human evaluation and use it to finetune small models (220M and 770M parameters). The models we introduce, SynthIE, outperform existing baselines of comparable size with a substantial gap of 57 and 79 absolute points in micro and macro F1, respectively. Code, data, and models are available at https://github.com/epfl-dlab/SynthIE.

membership infer

Title: Students Parrot Their Teachers: Membership Inference on Model Distillation. (arXiv:2303.03446v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.03446
Code URL: null
Copy Paste: [[2303.03446] Students Parrot Their Teachers: Membership Inference on Model Distillation](http://arxiv.org/abs/2303.03446) #membership infer
Summary:
Model distillation is frequently proposed as a technique to reduce the privacy leakage of machine learning. These empirical privacy defenses rely on the intuition that distilled "student" models protect the privacy of training data, as they only interact with this data indirectly through a "teacher" model. In this work, we design membership inference attacks to systematically study the privacy provided by knowledge distillation to both the teacher and student training sets. Our new attacks show that distillation alone provides only limited privacy across a number of domains. We explain the success of our attacks on distillation by showing that membership inference attacks on a private dataset can succeed even if the target model is never queried on any actual training points, but only on inputs whose predictions are highly influenced by training data. Finally, we show that our attacks are strongest when student and teacher sets are similar, or when the attacker can poison the teacher set.

Title: Can Membership Inferencing be Refuted?. (arXiv:2303.03648v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.03648
Code URL: null
Copy Paste: [[2303.03648] Can Membership Inferencing be Refuted?](http://arxiv.org/abs/2303.03648) #membership infer
Summary:
Membership inference (MI) attacks are currently the most popular test for measuring privacy leakage in machine learning models. Given a machine learning model, a data point and some auxiliary information, the goal of an MI attack is to determine whether the data point was used to train the model. In this work, we study the reliability of membership inference attacks in practice. Specifically, we show that a model owner can plausibly refute the result of a membership inference test on a data point $x$ by constructing a proof of repudiation that proves that the model was trained without $x$. We design efficient algorithms to construct proofs of repudiation for all data points of the training dataset. Our empirical evaluation demonstrates the practical feasibility of our algorithm by constructing proofs of repudiation for popular machine learning models on MNIST and CIFAR-10. Consequently, our results call for a re-evaluation of the implications of membership inference attacks in practice.
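The MI setting described above can be made concrete with the simplest loss-threshold attack: predict "member" when the model's loss on a point is unusually low. This is a generic baseline sketch, not the attacks or the proof-of-repudiation construction from either paper above. Per-point losses are assumed inputs, and the hypothetical `best_threshold` helper picks the threshold using true membership labels purely for illustration; a real attacker would calibrate it on shadow or auxiliary data.

```python
from typing import List, Sequence, Tuple


def loss_threshold_attack(losses: Sequence[float], threshold: float) -> List[bool]:
    """Predict 'member' (True) when the model's loss on a point is below the threshold."""
    return [loss < threshold for loss in losses]


def best_threshold(
    member_losses: Sequence[float], nonmember_losses: Sequence[float]
) -> Tuple[float, float]:
    """Pick the threshold (from observed losses) that maximizes balanced attack accuracy."""

    def balanced_acc(t: float) -> float:
        tpr = sum(l < t for l in member_losses) / len(member_losses)       # members caught
        tnr = sum(l >= t for l in nonmember_losses) / len(nonmember_losses)  # non-members cleared
        return (tpr + tnr) / 2

    candidates = sorted(set(member_losses) | set(nonmember_losses))
    candidates.append(candidates[-1] + 1.0)  # also allow "everything is a member"
    best = max(candidates, key=balanced_acc)
    return best, balanced_acc(best)
```

The attack works to the extent that training points receive systematically lower loss than unseen points; when the two loss distributions overlap, balanced accuracy falls toward 0.5.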

federate

fair

Title: Group conditional validity via multi-group learning. (arXiv:2303.03995v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.03995
Code URL: null
Copy Paste: [[2303.03995] Group conditional validity via multi-group learning](http://arxiv.org/abs/2303.03995) #fair
Summary:
We consider the problem of distribution-free conformal prediction and the criterion of group conditional validity. This criterion is motivated by many practical scenarios, including hidden stratification and group fairness. Existing methods achieve such guarantees under either restrictive grouping structures or distributional assumptions, or they are overly conservative under heteroskedastic noise. We propose a simple reduction to the problem of achieving validity guarantees for individual populations by leveraging algorithms for a problem called multi-group learning. This allows us to port theoretical guarantees from multi-group learning to obtain sample complexity guarantees for conformal prediction. We also provide a new algorithm for multi-group learning for groups with hierarchical structure. Using this algorithm in our reduction leads to improved sample complexity guarantees with a simpler predictor structure.
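For intuition about group conditional validity, the naive baseline below runs split conformal prediction separately within each group, so every group gets its own residual quantile. It is a minimal sketch assuming disjoint groups, absolute-residual scores, and precomputed point predictions; it is not the reduction to multi-group learning proposed in the paper, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Finite-sample split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for a finite interval
    return sorted(scores)[k - 1]


def groupwise_thresholds(
    residuals: Sequence[float], groups: Sequence[Hashable], alpha: float
) -> Dict[Hashable, float]:
    """Calibrate one threshold per group from held-out residuals y - y_hat."""
    by_group: Dict[Hashable, List[float]] = defaultdict(list)
    for r, g in zip(residuals, groups):
        by_group[g].append(abs(r))
    return {g: conformal_quantile(s, alpha) for g, s in by_group.items()}


def predict_interval(
    y_hat: float, group: Hashable, thresholds: Dict[Hashable, float]
) -> Tuple[float, float]:
    q = thresholds[group]
    return (y_hat - q, y_hat + q)
```

Small groups are the weak point of this baseline: with too few calibration points the finite-sample quantile becomes infinite, giving an uninformative interval.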

interpretability

Title: Towards Composable Distributions of Latent Space Augmentations. (arXiv:2303.03462v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.03462
Code URL: null
Copy Paste: [[2303.03462] Towards Composable Distributions of Latent Space Augmentations](http://arxiv.org/abs/2303.03462) #interpretability
Summary:
We propose a composable framework for latent space image augmentation that allows for easy combination of multiple augmentations. Image augmentation has been shown to be an effective technique for improving the performance of a wide variety of image classification and generation tasks. Our framework is based on the Variational Autoencoder architecture and uses a novel approach for augmentation via linear transformation within the latent space itself. We explore losses and augmentation latent geometry to enforce the transformations to be composable and involutory, thus allowing the transformations to be readily combined or inverted. Finally, we show that these properties perform better with certain pairs of augmentations, but that we can transfer the latent space to other sets of augmentations to modify performance, effectively constraining the VAE's bottleneck to preserve the variance of specific augmentations and features of the image which we care about. We demonstrate the effectiveness of our approach with initial results on the MNIST dataset against both a standard VAE and a Conditional VAE. This latent augmentation method allows for much greater control and geometric interpretability of the latent space, making it a valuable tool for researchers and practitioners in the field.
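An involutory linear map is its own inverse (the matrix times itself is the identity), so applying the same latent transformation twice restores the original code. The sketch below uses a Householder reflection, a textbook example of such a matrix, in plain Python; it only illustrates the property and is not how the paper obtains its transformations, which are enforced through training losses. The helper names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def householder(v: Vector) -> Matrix:
    """Reflection H = I - 2 v v^T / (v^T v); H is symmetric and orthogonal, so H H = I."""
    norm_sq = sum(c * c for c in v)
    n = len(v)
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq for j in range(n)]
        for i in range(n)
    ]


def matvec(m: Matrix, z: Vector) -> Vector:
    """Matrix-vector product m z, i.e. applying a linear latent transformation."""
    return [sum(m_ij * z_j for m_ij, z_j in zip(row, z)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a b, i.e. composing two linear latent transformations."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]
```

Composition is just matrix multiplication, so such maps chain easily. Note, however, that the product of two different reflections is a rotation, which is invertible but generally not involutory.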

Title: Filter Pruning based on Information Capacity and Independence. (arXiv:2303.03645v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.03645
Code URL: null
Copy Paste: [[2303.03645] Filter Pruning based on Information Capacity and Independence](http://arxiv.org/abs/2303.03645) #interpretability
Summary:
Filter pruning has been widely used in the compression and acceleration of convolutional neural networks (CNNs). However, most existing methods are still challenged by heavy compute cost and biased filter selection. Moreover, most designs for filter evaluation lack interpretability due to the absence of appropriate theoretical guidance. In this paper, we propose a novel filter pruning method which evaluates filters in an interpretable, multi-perspective and data-free manner. We introduce information capacity, a metric that represents the amount of information contained in a filter. Based on the interpretability and validity of information entropy, we propose to use it as a quantitative index of information quantity. In addition, we show experimentally that there is an obvious correlation between the entropy of a feature map and that of the corresponding filter, and accordingly propose an interpretable, data-driven scheme to measure the information capacity of the filter. Further, we introduce information independence, another metric that represents the correlation among different filters. Consequently, the least important filters, which have less information capacity and less information independence, are pruned. We evaluate our method on two benchmarks using multiple representative CNN architectures, including VGG-16 and ResNet. On CIFAR-10, we reduce 71.9% of floating-point operations (FLOPs) and 69.4% of parameters for ResNet-110 with a 0.28% accuracy increase. On ILSVRC-2012, we reduce 76.6% of FLOPs and 68.6% of parameters for ResNet-50 with only a 2.80% accuracy decrease, outperforming the state of the art.
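As a rough illustration of entropy as an information-quantity score, the sketch below bins each filter's weights into a histogram and ranks filters by Shannon entropy, lowest first, as pruning candidates. The histogram binning, the bin count, and the helper names are assumptions; the paper relates feature-map entropy to filter entropy and also uses information independence, neither of which is reproduced here.

```python
import math
from typing import List, Sequence


def histogram_entropy(values: Sequence[float], bins: int = 16) -> float:
    """Shannon entropy (bits) of a equal-width histogram over the given weights."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant filter carries no spread at all
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the max value into the last bin
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def rank_filters_for_pruning(filters: Sequence[Sequence[float]], bins: int = 16) -> List[int]:
    """Filter indices ordered from lowest to highest entropy (first = first to prune)."""
    scores = [histogram_entropy(f, bins) for f in filters]
    return sorted(range(len(filters)), key=lambda i: scores[i])
```

Sixteen evenly spread weights fill all 16 bins once each and reach the maximum of 4 bits, whereas a constant filter scores 0 and would be pruned first.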

Title: Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation. (arXiv:2303.03608v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.03608
Code URL: null
Copy Paste: [[2303.03608] Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation](http://arxiv.org/abs/2303.03608) #interpretability
Summary:
Interpretability and efficiency are two important considerations for the adoption of neural automatic metrics. In this work, we develop strong-performing automatic metrics for reference-based summarization evaluation, based on a two-stage evaluation pipeline that first extracts basic information units from one text sequence and then checks the extracted units in another sequence. The metrics we developed include two-stage metrics that can provide high interpretability at both the fine-grained unit level and the summary level, and one-stage metrics that achieve a balance between efficiency and interpretability. We make the developed tools publicly available through a Python package and GitHub.

Title: DA-VEGAN: Differentiably Augmenting VAE-GAN for microstructure reconstruction from extremely small data sets. (arXiv:2303.03403v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.03403
Code URL: null
Copy Paste: [[2303.03403] DA-VEGAN: Differentiably Augmenting VAE-GAN for microstructure reconstruction from extremely small data sets](http://arxiv.org/abs/2303.03403) #interpretability
Summary:
Microstructure reconstruction is an important and emerging field of research and an essential foundation for improving inverse computational materials engineering (ICME). Much of the recent progress in the field is based on generative adversarial networks (GANs). Although excellent results have been achieved across a variety of materials, challenges remain regarding the interpretability of the model's latent space as well as the applicability to extremely small data sets. The present work addresses these issues by introducing DA-VEGAN, a model with two central innovations. First, a $\beta$-variational autoencoder is incorporated into a hybrid GAN architecture that allows strong nonlinearities in the latent space to be penalized through an additional parameter, $\beta$. Second, a custom differentiable data augmentation scheme is developed specifically for this architecture. The differentiability allows the model to learn from extremely small data sets without mode collapse or deteriorated sample quality. An extensive validation on a variety of structures demonstrates the potential of the method, and future directions of investigation are discussed.
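The role of the β parameter can be seen in the standard β-VAE objective: reconstruction error plus β times the closed-form KL divergence between a diagonal-Gaussian posterior and a standard normal prior. This minimal sketch assumes the encoder outputs per-dimension means and log-variances; how DA-VEGAN combines this term with its GAN losses and differentiable augmentation is not shown, and the function names are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def beta_vae_loss(
    recon_error: float, mu: Sequence[float], logvar: Sequence[float], beta: float = 4.0
) -> float:
    """Reconstruction error plus beta-weighted KL; beta > 1 strengthens the latent penalty."""
    return recon_error + beta * gaussian_kl(mu, logvar)
```

With `beta = 1` this reduces to the ordinary VAE objective; larger values trade reconstruction quality for a more tightly regularized latent space.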

explainability

Title: Multi-resolution Interpretation and Diagnostics Tool for Natural Language Classifiers. (arXiv:2303.03542v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.03542
Code URL: null
Copy Paste: [[2303.03542] Multi-resolution Interpretation and Diagnostics Tool for Natural Language Classifiers](http://arxiv.org/abs/2303.03542) #explainability
Summary:
Developing explainability methods for Natural Language Processing (NLP) models is a challenging task, for two main reasons. First, the high dimensionality of the data (large number of tokens) results in low coverage and in turn small contributions for the top tokens, compared to the overall model performance. Second, owing to their textual nature, the input variables, after appropriate transformations, are effectively binary (presence or absence of a token in an observation), making the input-output relationship difficult to understand. Common NLP interpretation techniques lack flexibility in resolution, because they usually operate at the word level and provide either fully local (message-level) or fully global (over all messages) summaries. The goal of this paper is to create more flexible model explainability summaries by segments of observations or clusters of words that are semantically related to each other. In addition, we introduce a root cause analysis method for NLP models, by analyzing representative False Positive and False Negative examples from different segments. Finally, we illustrate, using a Yelp review data set with three segments (Restaurant, Hotel, and Beauty), that exploiting group/cluster structures in words and/or messages can aid in the interpretation of decisions made by NLP models and can be used to assess the model's sensitivity or bias towards gender, syntax, and word meanings.

watermark

diffusion

Title: DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer. (arXiv:2303.03755v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.03755
Code URL: null
Copy Paste: [[2303.03755] DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer](http://arxiv.org/abs/2303.03755) #diffusion
Summary:
Generating visual layouts is an essential ingredient of graphic design. The ability to condition layout generation on a partial subset of component attributes is critical to real-world applications that involve user interaction. Recently, diffusion models have demonstrated high-quality generative performance in various domains. However, it is unclear how to apply diffusion models to the natural representation of layouts, which consists of a mix of discrete (class) and continuous (location, size) attributes. To address the conditional layout generation problem, we introduce DLT, a joint discrete-continuous diffusion model. DLT is a transformer-based model with a flexible conditioning mechanism that allows conditioning on any given subset of all the layout component classes, locations, and sizes. Our method outperforms state-of-the-art generative models on various layout generation datasets with respect to different metrics and conditioning settings. Additionally, we validate the effectiveness of our proposed conditioning mechanism and the joint discrete-continuous diffusion process. This joint process can be incorporated into a wide range of mixed discrete-continuous generative tasks.

Title: Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles. (arXiv:2303.03751v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.03751
Code URL: null
Copy Paste: [[2303.03751] Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles](http://arxiv.org/abs/2303.03751) #diffusion
Summary:
In this paper, we focus on a novel optimization problem in which the objective function is a black box and can only be evaluated through a ranking oracle. This problem is common in real-world applications, particularly in cases where the function is assessed by human judges. Reinforcement Learning with Human Feedback (RLHF) is a prominent example of such an application, adopted by recent works [ouyang2022training, liu2023languages, chatgpt, bai2022training] to improve the quality of Large Language Models (LLMs) with human guidance. We propose ZO-RankSGD, a first-of-its-kind zeroth-order optimization algorithm, to solve this optimization problem with a theoretical guarantee. Specifically, our algorithm employs a new rank-based random estimator for the descent direction and is proven to converge to a stationary point. ZO-RankSGD can also be directly applied to the policy search problem in reinforcement learning when only a ranking oracle of the episode reward is available. This makes ZO-RankSGD a promising alternative to existing RLHF methods, as it optimizes in an online fashion and thus can work without any pre-collected data. Furthermore, we demonstrate the effectiveness of ZO-RankSGD in a novel application: improving the quality of images generated by a diffusion generative model with human ranking feedback. In our experiments, we found that ZO-RankSGD can significantly enhance the detail of generated images with only a few rounds of human feedback. Overall, our work advances the field of zeroth-order optimization by addressing the problem of optimizing functions with only ranking feedback, and offers an effective approach for aligning human and machine intentions in a wide range of domains. Our code is released at https://github.com/TZW1998/Taming-Stable-Diffusion-with-Human-Ranking-Feedback.
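To illustrate optimizing with only a ranking oracle, the sketch below perturbs the current point along random Gaussian directions, asks an oracle only for the order of the perturbed candidates, and steps against a rank-weighted sum of the directions. This is rank-based fitness shaping in the style of evolution strategies, swapped in for illustration; it is not the ZO-RankSGD estimator and carries none of its guarantees. `ranking_oracle` is a hypothetical stand-in for a human judge, and the step size and sample-count defaults are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def sphere(x: Sequence[float]) -> float:
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


def ranking_oracle(f: Callable[[Sequence[float]], float], points: Sequence[Vector]) -> List[int]:
    """Hypothetical judge: returns point indices ordered from best (lowest f) to worst."""
    return sorted(range(len(points)), key=lambda i: f(points[i]))


def rank_based_step(f, x: Vector, rng: random.Random,
                    m: int = 10, mu: float = 0.1, lr: float = 0.05) -> Vector:
    d = len(x)
    dirs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    candidates = [[xi + mu * ui for xi, ui in zip(x, u)] for u in dirs]
    order = ranking_oracle(f, candidates)  # only the ordering is used below
    # Centered rank weights: the best candidate gets the most negative weight and the
    # worst the most positive, so the weighted sum of directions points roughly uphill.
    g = [0.0] * d
    for rank, idx in enumerate(order):
        w = rank - (m - 1) / 2
        for j in range(d):
            g[j] += w * dirs[idx][j]
    norm = math.sqrt(sum(gj * gj for gj in g)) or 1.0
    # Step downhill by a fixed length lr.
    return [xi - lr * gj / norm for xi, gj in zip(x, g)]


def minimize(f, x0: Sequence[float], steps: int = 300, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        x = rank_based_step(f, x, rng)
    return x
```

On the toy quadratic `sphere`, the loop only ever sees orderings, never function values, yet it should move toward the minimum at the origin. The step is normalized to a fixed length `lr` because rank weights carry no information about gradient magnitude.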