diffusion

Title: DreamCom: Finetuning Text-guided Inpainting Model for Image Composition. (arXiv:2309.15508v1 [cs.CV])

Title: Uncertainty Quantification via Neural Posterior Principal Components. (arXiv:2309.15533v1 [cs.CV])

Title: Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing. (arXiv:2309.15664v1 [cs.CV])

Title: Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation. (arXiv:2309.15726v1 [cs.CV])

Title: Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack. (arXiv:2309.15807v1 [cs.CV])

Title: Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation. (arXiv:2309.15818v1 [cs.CV])

Title: Exploiting the Signal-Leak Bias in Diffusion Models. (arXiv:2309.15842v1 [cs.CV])

Title: Learning Using Generated Privileged Information by Text-to-Image Diffusion Models. (arXiv:2309.15238v1 [cs.CL])

Title: PINF: Continuous Normalizing Flows for Physics-Constrained Deep Learning. (arXiv:2309.15139v1 [cs.LG])

Title: Generative Residual Diffusion Modeling for Km-scale Atmospheric Downscaling. (arXiv:2309.15214v1 [cs.LG])

Title: Maximum Diffusion Reinforcement Learning. (arXiv:2309.15293v1 [cs.LG])

self-supervised

Title: SEPT: Towards Efficient Scene Representation Learning for Motion Prediction. (arXiv:2309.15289v1 [cs.CV])

Title: M$^{3}$3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding. (arXiv:2309.15313v1 [cs.CV])

Title: KDD-LOAM: Jointly Learned Keypoint Detector and Descriptors Assisted LiDAR Odometry and Mapping. (arXiv:2309.15394v1 [cs.CV])

Title: The Triad of Failure Modes and a Possible Way Out. (arXiv:2309.15420v1 [cs.LG])

Title: SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction. (arXiv:2309.15702v1 [cs.CV])

Title: STANCE-C3: Domain-adaptive Cross-target Stance Detection via Contrastive Learning and Counterfactual Generation. (arXiv:2309.15176v1 [cs.CL])

Title: Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning. (arXiv:2309.15317v1 [cs.CL])

Title: Graph Neural Prompting with Large Language Models. (arXiv:2309.15427v1 [cs.CL])

Title: Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. (arXiv:2309.15800v1 [cs.CL])

Title: Scaling Representation Learning from Ubiquitous ECG with State-Space Models. (arXiv:2309.15292v1 [cs.LG])

foundation model

Title: Towards Foundation Models Learned from Anatomy in Medical Imaging via Self-Supervision. (arXiv:2309.15358v1 [cs.CV])

Title: Tackling VQA with Pretrained Foundation Models without Further Training. (arXiv:2309.15487v1 [cs.CV])

Title: Learning from SAM: Harnessing a Segmentation Foundation Model for Sim2Real Domain Adaptation through Regularization. (arXiv:2309.15562v1 [cs.CV])

Title: Deep Model Fusion: A Survey. (arXiv:2309.15698v1 [cs.LG])

generative

Title: Subjective Face Transform using Human First Impressions. (arXiv:2309.15381v1 [cs.CV])

Title: P2I-NET: Mapping Camera Pose to Image via Adversarial Learning for New View Synthesis in Real Indoor Environments. (arXiv:2309.15526v1 [cs.CV])

Title: Guided Frequency Loss for Image Restoration. (arXiv:2309.15563v1 [cs.CV])

Title: A Unified View of Differentially Private Deep Generative Modeling. (arXiv:2309.15696v1 [cs.LG])

Title: Generative Speech Recognition Error Correction with Large Language Models. (arXiv:2309.15649v1 [cs.CL])

Title: HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models. (arXiv:2309.15701v1 [cs.CL])

Title: ChatGPT-BCI: Word-Level Neural State Classification Using GPT, EEG, and Eye-Tracking Biomarkers in Semantic Inference Reading Comprehension. (arXiv:2309.15714v1 [cs.CL])

Title: Disinformation Detection: An Evolving Challenge in the Age of LLMs. (arXiv:2309.15847v1 [cs.CL])

Title: Deep Generative Methods for Producing Forecast Trajectories in Power Systems. (arXiv:2309.15137v1 [cs.LG])

Title: Deep Learning in Deterministic Computational Mechanics. (arXiv:2309.15421v1 [cs.LG])

Title: SANGEA: Scalable and Attributed Network Generation. (arXiv:2309.15648v1 [cs.LG])

anomaly

Title: Human Kinematics-inspired Skeleton-based Video Anomaly Detection. (arXiv:2309.15662v1 [cs.CV])

Title: ADGym: Design Choices for Deep Anomaly Detection. (arXiv:2309.15376v1 [cs.LG])

in-context

memory

Title: Contrastive Continual Multi-view Clustering with Filtered Structural Fusion. (arXiv:2309.15135v1 [cs.LG])

Title: Memory-Efficient Continual Learning Object Segmentation for Long Video. (arXiv:2309.15274v1 [cs.CV])

We propose two novel techniques to reduce the memory requirement of online video object segmentation (VOS) methods while improving modeling accuracy and generalization on long videos. Motivated by the success of continual learning techniques in preserving previously learned knowledge, we propose Gated-Regularizer Continual Learning (GRCL), which improves the performance of any online VOS method subject to limited memory, and Reconstruction-based Memory Selection Continual Learning (RMSCL), which enables online VOS methods to benefit efficiently from the information stored in memory.

Experimental results show that the proposed methods improve the performance of online VOS models by up to 10% and boost their robustness on long-video datasets, while maintaining comparable performance on the short-video datasets DAVIS16 and DAVIS17.

Title: Inherit with Distillation and Evolve with Contrast: Exploring Class Incremental Semantic Segmentation Without Exemplar Memory. (arXiv:2309.15413v1 [cs.CV])

Title: Local Compressed Video Stream Learning for Generic Event Boundary Detection. (arXiv:2309.15431v1 [cs.CV])

Title: Neuromorphic Imaging and Classification with Graph Learning. (arXiv:2309.15627v1 [cs.CV])

Title: One For All: Video Conversation is Feasible Without Video Instruction Tuning. (arXiv:2309.15785v1 [cs.CV])

Title: SHACIRA: Scalable HAsh-grid Compression for Implicit Neural Representations. (arXiv:2309.15848v1 [cs.CV])

Title: Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization. (arXiv:2309.15686v1 [cs.CL])

Title: Auto-grading C programming assignments with CodeBERT and Random Forest Regressor. (arXiv:2309.15216v1 [cs.LG])

Title: A Physics Enhanced Residual Learning (PERL) Framework for Traffic State Prediction. (arXiv:2309.15284v1 [cs.LG])

Title: Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey. (arXiv:2309.15467v1 [cs.LG])

Title: Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models. (arXiv:2309.15531v1 [cs.LG])

Title: Federated Deep Equilibrium Learning: A Compact Shared Representation for Edge Communication Efficiency. (arXiv:2309.15659v1 [cs.LG])

few-shot

Title: Confidence-based Visual Dispersal for Few-shot Unsupervised Domain Adaptation. (arXiv:2309.15575v1 [cs.CV])

Title: Few-Shot Multi-Label Aspect Category Detection Utilizing Prototypical Network with Sentence-Level Weighting and Label Augmentation. (arXiv:2309.15588v1 [cs.CL])

Title: Robust Internal Representations for Domain Generalization. (arXiv:2309.15522v1 [cs.LG])