diffusion

Title: Intriguing Properties of Diffusion Models: A Large-Scale Dataset for Evaluating Natural Attack Capability in Text-to-Image Generative Models. (arXiv:2308.15692v1 [cs.CV])

Title: Zero-shot Inversion Process for Image Attribute Editing with Diffusion Models. (arXiv:2308.15854v1 [cs.CV])

Title: Feature Attention Network (FA-Net): A Deep-Learning Based Approach for Underwater Single Image Enhancement. (arXiv:2308.15868v1 [cs.CV])

Title: Physics-Informed DeepMRI: Bridging the Gap from Heat Diffusion to k-Space Interpolation. (arXiv:2308.15918v1 [cs.CV])

Title: DiffuVolume: Diffusion Model for Volume based Stereo Matching. (arXiv:2308.15989v1 [cs.CV])

Title: SignDiff: Learning Diffusion Models for American Sign Language Production. (arXiv:2308.16082v1 [cs.CV])

To conduct large-scale American Sign Language Production (ASLP), we propose SignDiff, a dual-condition diffusion pre-training model, built on the latest work in related fields, that can generate human sign language speakers from a skeleton pose. SignDiff has a novel Frame Reinforcement Network called FR-Net, similar to dense human pose estimation work, which enhances the correspondence between text lexical symbols and sign language dense pose frames and reduces the occurrence of multiple fingers in the diffusion model. In addition, our ASLP method proposes two new improved modules and a new loss function to improve the accuracy and quality of sign language skeletal posture and to enhance the ability of the model to train on large-scale data.

We propose the first baseline for ASL production and report BLEU-4 scores of 17.19 and 12.85 on the How2Sign dev and test sets, respectively. We also evaluated our model on the previous mainstream dataset, PHOENIX14T, where the main experiments achieved state-of-the-art (SOTA) results. In addition, our image quality exceeds all previous results by 10 percentage points on the SSIM metric. Finally, we conducted ablation studies and qualitative evaluations for discussion.
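For readers unfamiliar with the BLEU-4 metric quoted above, the following is a minimal sketch of how a sentence-level BLEU-4 score can be computed. It is not the evaluation code used by the SignDiff authors, whose tooling is not described here. The function names `ngrams` and `bleu4` are hypothetical, the sketch assumes a single reference and no smoothing, and it returns a value in [0, 1], whereas published scores such as 17.19 are usually corpus-level figures on a 0 to 100 scale.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against one reference, without smoothing.

    Illustrative only: real evaluations are normally corpus-level and
    may use several references and a smoothing method.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the four precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / 4)


reference = "the cat sat on the mat".split()
print(bleu4(reference, reference))
print(bleu4("the cat sat on".split(), reference))
```

A candidate identical to the reference scores 1.0. The shorter candidate matches every n-gram it contains, so its score is reduced only by the brevity penalty.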

self-supervised

foundation model

Title: Multimodal Foundation Models For Echocardiogram Interpretation. (arXiv:2308.15670v1 [cs.CV])

generative

Title: Unveiling Camouflage: A Learnable Fourier-based Augmentation for Camouflaged Object Detection and Instance Segmentation. (arXiv:2308.15660v1 [cs.CV])

Title: DTrOCR: Decoder-only Transformer for Optical Character Recognition. (arXiv:2308.15996v1 [cs.CV])

Title: Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models. (arXiv:2308.16149v1 [cs.CL])

Title: On the Steganographic Capacity of Selected Learning Models. (arXiv:2308.15502v1 [cs.LG])

Title: Fully Embedded Time-Series Generative Adversarial Networks. (arXiv:2308.15730v1 [cs.LG])

anomaly

Title: AnoVL: Adapting Vision-Language Models for Unified Zero-shot Anomaly Localization. (arXiv:2308.15939v1 [cs.CV])

in-context

Title: Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap. (arXiv:2308.16060v1 [cs.CL])

memory

Title: Towards Earlier Detection of Oral Diseases On Smartphones Using Oral and Dental RGB Images. (arXiv:2308.15705v1 [cs.CV])

Title: Everything Perturbed All at Once: Enabling Differentiable Graph Attacks. (arXiv:2308.15614v1 [cs.LG])

Title: Advanced Deep Regression Models for Forecasting Time Series Oil Production. (arXiv:2308.16105v1 [cs.LG])

few-shot

Title: Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation. (arXiv:2308.16110v1 [cs.CV])

Title: MerA: Merging Pretrained Adapters For Few-Shot Learning. (arXiv:2308.15982v1 [cs.CL])