diffusion

Title: Light Field Diffusion for Single-View Novel View Synthesis. (arXiv:2309.11525v1 [cs.CV])

Title: Deshadow-Anything: When Segment Anything Model Meets Zero-shot shadow removal. (arXiv:2309.11715v1 [cs.CV])

Title: Latent Diffusion Models for Structural Component Design. (arXiv:2309.11601v1 [cs.LG])

self-supervised

Title: Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation. (arXiv:2309.11667v1 [cs.CV])

Title: MoDA: Leveraging Motion Priors from Videos for Advancing Unsupervised Domain Adaptation in Semantic Segmentation. (arXiv:2309.11711v1 [cs.CV])

Title: DimCL: Dimensional Contrastive Learning For Improving Self-Supervised Learning. (arXiv:2309.11782v1 [cs.CV])

Title: Multi-level Asymmetric Contrastive Learning for Medical Image Segmentation Pre-training. (arXiv:2309.11876v1 [cs.CV])

Title: Unlocking the Heart Using Adaptive Locked Agnostic Networks. (arXiv:2309.11899v1 [cs.CV])

Title: Bridging the Gap: Learning Pace Synchronization for Open-World Semi-Supervised Learning. (arXiv:2309.11930v1 [cs.LG])

Title: A Study of Forward-Forward Algorithm for Self-Supervised Learning. (arXiv:2309.11955v1 [cs.CV])

In this study, for the first time, we study the performance of forward-forward vs. backpropagation for self-supervised representation learning and provide insights into the learned representation spaces. Our benchmark employs four standard datasets, namely MNIST, F-MNIST, SVHN and CIFAR-10, and three commonly used self-supervised representation learning techniques, namely rotation, flip and jigsaw.

Our main finding is that while the forward-forward algorithm performs comparably to backpropagation during (self-)supervised training, the transfer performance is significantly lagging behind in all the studied settings. This may be caused by a combination of factors, including having a loss function for each layer and the way the supervised training is realized in the forward-forward paradigm. In comparison to backpropagation, the forward-forward algorithm focuses more on the boundaries and drops part of the information unnecessary for making decisions which harms the representation learning goal. Further investigation and research are necessary to stabilize the forward-forward strategy for self-supervised learning, to work beyond the datasets and configurations demonstrated by Geoffrey Hinton.

Title: Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision. (arXiv:2309.12009v1 [cs.CV])

In this work, we first propose an Implicit Knowledge Exchange Module (IKEM) which alleviates the propagation of erroneous knowledge between low-performance modalities. Then, we further propose three new modalities to enrich the complementary information between modalities. Finally, to maintain efficiency when introducing new modalities, we propose a novel teacher-student framework to distill the knowledge from the secondary modalities into the mandatory modalities considering the relationship constrained by anchors, positives, and negatives, named relational cross-modality knowledge distillation. The experimental results demonstrate the effectiveness of our approach, unlocking the efficient use of skeleton-based multi-modality data. Source code will be made publicly available at https://github.com/desehuileng0o0/IKEM.

Title: Unveiling the Hidden Realm: Self-supervised Skeleton-based Action Recognition in Occluded Environments. (arXiv:2309.12029v1 [cs.CV])

foundation model

Title: SAM-OCTA: A Fine-Tuning Strategy for Applying Foundation Model to OCTA Image Segmentation Tasks. (arXiv:2309.11758v1 [cs.CV])

generative

Title: Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge. (arXiv:2309.11575v1 [cs.CV])

Title: TextCLIP: Text-Guided Face Image Generation And Manipulation Without Adversarial Training. (arXiv:2309.11923v1 [cs.CV])

Title: A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models. (arXiv:2309.11674v1 [cs.CL])

Title: Word Embedding with Neural Probabilistic Prior. (arXiv:2309.11824v1 [cs.CL])

Title: InstructERC: Reforming Emotion Recognition in Conversation with a Retrieval Multi-task LLMs Framework. (arXiv:2309.11911v1 [cs.CL])

InstructERC, to reformulates the ERC task from a discriminative framework to a generative framework based on Large Language Models (LLMs) . InstructERC has two significant contributions: Firstly, InstructERC introduces a simple yet effective retrieval template module, which helps the model explicitly integrate multi-granularity dialogue supervision information by concatenating the historical dialog content, label statement, and emotional domain demonstrations with high semantic similarity. Furthermore, we introduce two additional emotion alignment tasks, namely speaker identification and emotion prediction tasks, to implicitly model the dialogue role relationships and future emotional tendencies in conversations. Our LLM-based plug-and-play plugin framework significantly outperforms all previous models and achieves comprehensive SOTA on three commonly used ERC datasets. Extensive analysis of parameter-efficient and data-scaling experiments provide empirical guidance for applying InstructERC in practical scenarios. Our code will be released after blind review.

Title: SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References. (arXiv:2309.12250v1 [cs.CL])

Title: Inspire the Large Language Model by External Knowledge on BioMedical Named Entity Recognition. (arXiv:2309.12278v1 [cs.CL])

Title: CATS: Conditional Adversarial Trajectory Synthesis for Privacy-Preserving Trajectory Data Publication Using Deep Learning Approaches. (arXiv:2309.11587v1 [cs.LG])

Title: Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets. (arXiv:2309.12032v1 [cs.LG])

anomaly

in-context

Title: Towards Effective Disambiguation for Machine Translation with Large Language Models. (arXiv:2309.11668v1 [cs.CL])

Title: Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation. (arXiv:2309.11765v1 [cs.LG])

memory

Title: Efficient Long-Short Temporal Attention Network for Unsupervised Video Object Segmentation. (arXiv:2309.11707v1 [cs.CV])

Title: Memory-Augmented LLM Personalization with Short- and Long-Term Memory Coordination. (arXiv:2309.11696v1 [cs.CL])

Title: Clustering-based Domain-Incremental Learning. (arXiv:2309.12078v1 [cs.LG])

Title: SALSA-CLRS: A Sparse and Scalable Benchmark for Algorithmic Reasoning. (arXiv:2309.12253v1 [cs.LG])

few-shot