diffusion

Title: Masked Diffusion as Self-supervised Representation Learner. (arXiv:2308.05695v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05695
Code URL: null
Copy Paste: [[2308.05695] Masked Diffusion as Self-supervised Representation Learner](http://arxiv.org/abs/2308.05695) #diffusion
Summary:
Denoising diffusion probabilistic models have recently demonstrated state-of-the-art generative performance and been used as strong pixel-level representation learners. This paper decomposes the interrelation between the generative capability and representation learning ability inherent in diffusion models. We present masked diffusion model (MDM), a scalable self-supervised representation learner that substitutes the conventional additive Gaussian noise of traditional diffusion with a masking mechanism. Our proposed approach convincingly surpasses prior benchmarks, demonstrating remarkable advancements in both medical and natural image semantic segmentation tasks, particularly within the context of few-shot scenario.

Title: PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers. (arXiv:2308.05732v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2308.05732
Code URL: null
Copy Paste: [[2308.05732] PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers](http://arxiv.org/abs/2308.05732) #diffusion
Summary:
Time-dependent partial differential equations (PDEs) are ubiquitous in science and engineering. Recently, mostly due to the high computational cost of traditional solution techniques, deep neural network based surrogates have gained increased interest. The practical utility of such neural PDE solvers relies on their ability to provide accurate, stable predictions over long time horizons, which is a notoriously hard problem. In this work, we present a large-scale analysis of common temporal rollout strategies, identifying the neglect of non-dominant spatial frequency information, often associated with high frequencies in PDE solutions, as the primary pitfall limiting stable, accurate rollout performance. Based on these insights, we draw inspiration from recent advances in diffusion models to introduce PDE-Refiner; a novel model class that enables more accurate modeling of all frequency components via a multistep refinement process. We validate PDE-Refiner on challenging benchmarks of complex fluid dynamics, demonstrating stable and accurate rollouts that consistently outperform state-of-the-art models, including neural, numerical, and hybrid neural-numerical architectures. We further demonstrate that PDE-Refiner greatly enhances data efficiency, since the denoising objective implicitly induces a novel form of spectral data augmentation. Finally, PDE-Refiner's connection to diffusion models enables an accurate and efficient assessment of the model's predictive uncertainty, allowing us to estimate when the surrogate becomes inaccurate.

self-supervised

Title: SC3K: Self-supervised and Coherent 3D Keypoints Estimation from Rotated, Noisy, and Decimated Point Cloud Data. (arXiv:2308.05410v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05410
Code URL: null
Copy Paste: [[2308.05410] SC3K: Self-supervised and Coherent 3D Keypoints Estimation from Rotated, Noisy, and Decimated Point Cloud Data](http://arxiv.org/abs/2308.05410) #self-supervised
Summary:
This paper proposes a new method to infer keypoints from arbitrary object categories in practical scenarios where point cloud data (PCD) are noisy, down-sampled and arbitrarily rotated. Our proposed model adheres to the following principles: i) keypoints inference is fully unsupervised (no annotation given), ii) keypoints position error should be low and resilient to PCD perturbations (robustness), iii) keypoints should not change their indexes for the intra-class objects (semantic coherence), iv) keypoints should be close to or proximal to PCD surface (compactness). We achieve these desiderata by proposing a new self-supervised training strategy for keypoints estimation that does not assume any a priori knowledge of the object class, and a model architecture with coupled auxiliary losses that promotes the desired keypoints properties. We compare the keypoints estimated by the proposed approach with those of the state-of-the-art unsupervised approaches. The experiments show that our approach outperforms by estimating keypoints with improved coverage (+9.41%) while being semantically consistent (+4.66%) that best characterizes the object's 3D shape for downstream tasks. Code and data are available at: https://github.com/IITPAVIS/SC3K

Title: Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network. (arXiv:2308.05605v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05605
Code URL: https://github.com/wencheng256/daccn
Copy Paste: [[2308.05605] Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network](http://arxiv.org/abs/2308.05605) #self-supervised
Summary:
Monocular depth estimation is known as an ill-posed task in which objects in a 2D image usually do not contain sufficient information to predict their depth. Thus, it acts differently from other tasks (e.g., classification and segmentation) in many ways. In this paper, we find that self-supervised monocular depth estimation shows a direction sensitivity and environmental dependency in the feature representation. But the current backbones borrowed from other tasks pay less attention to handling different types of environmental information, limiting the overall depth accuracy. To bridge this gap, we propose a new Direction-aware Cumulative Convolution Network (DaCCN), which improves the depth feature representation in two aspects. First, we propose a direction-aware module, which can learn to adjust the feature extraction in each direction, facilitating the encoding of different types of information. Secondly, we design a new cumulative convolution to improve the efficiency for aggregating important environmental information. Experiments show that our method achieves significant improvements on three widely used benchmarks, KITTI, Cityscapes, and Make3D, setting a new state-of-the-art performance on the popular benchmarks with all three types of self-supervision.

Title: Zero Grads Ever Given: Learning Local Surrogate Losses for Non-Differentiable Graphics. (arXiv:2308.05739v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05739
Code URL: null
Copy Paste: [[2308.05739] Zero Grads Ever Given: Learning Local Surrogate Losses for Non-Differentiable Graphics](http://arxiv.org/abs/2308.05739) #self-supervised
Summary:
Gradient-based optimization is now ubiquitous across graphics, but unfortunately can not be applied to problems with undefined or zero gradients. To circumvent this issue, the loss function can be manually replaced by a "surrogate" that has similar minima but is differentiable. Our proposed framework, ZeroGrads, automates this process by learning a neural approximation of the objective function, the surrogate, which in turn can be used to differentiate through arbitrary black-box graphics pipelines. We train the surrogate on an actively smoothed version of the objective and encourage locality, focusing the surrogate's capacity on what matters at the current training episode. The fitting is performed online, alongside the parameter optimization, and self-supervised, without pre-computed data or pre-trained models. As sampling the objective is expensive (it requires a full rendering or simulator run), we devise an efficient sampling scheme that allows for tractable run-times and competitive performance at little overhead. We demonstrate optimizing diverse non-convex, non-differentiable black-box problems in graphics, such as visibility in rendering, discrete parameter spaces in procedural modelling or optimal control in physics-driven animation. In contrast to more traditional algorithms, our approach scales well to higher dimensions, which we demonstrate on problems with up to 35k interlinked variables.

Title: EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. (arXiv:2308.05725v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2308.05725
Code URL: null
Copy Paste: [[2308.05725] EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis](http://arxiv.org/abs/2308.05725) #self-supervised
Summary:
Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization). The adoption of these methods is still limited by the fact that most speech synthesis datasets are read, severely limiting spontaneity and expressivity. Here, we introduce Expresso, a high-quality expressive speech dataset for textless speech synthesis that includes both read speech and improvised dialogues rendered in 26 spontaneous expressive styles. We illustrate the challenges and potentials of this dataset with an expressive resynthesis benchmark where the task is to encode the input in low-bitrate units and resynthesize it in a target voice while preserving content and style. We evaluate resynthesis quality with automatic metrics for different self-supervised discrete encoders, and explore tradeoffs between quality, bitrate and invariance to speaker and style. All the dataset, evaluation metrics and baseline models are open source

foundation model

Title: Adaptive Low Rank Adaptation of Segment Anything to Salient Object Detection. (arXiv:2308.05426v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05426
Code URL: null
Copy Paste: [[2308.05426] Adaptive Low Rank Adaptation of Segment Anything to Salient Object Detection](http://arxiv.org/abs/2308.05426) #foundation model
Summary:
Foundation models, such as OpenAI's GPT-3 and GPT-4, Meta's LLaMA, and Google's PaLM2, have revolutionized the field of artificial intelligence. A notable paradigm shift has been the advent of the Segment Anything Model (SAM), which has exhibited a remarkable capability to segment real-world objects, trained on 1 billion masks and 11 million images. Although SAM excels in general object segmentation, it lacks the intrinsic ability to detect salient objects, resulting in suboptimal performance in this domain. To address this challenge, we present the Segment Salient Object Model (SSOM), an innovative approach that adaptively fine-tunes SAM for salient object detection by harnessing the low-rank structure inherent in deep learning. Comprehensive qualitative and quantitative evaluations across five challenging RGB benchmark datasets demonstrate the superior performance of our approach, surpassing state-of-the-art methods.

Title: AD-CLIP: Adapting Domains in Prompt Space Using CLIP. (arXiv:2308.05659v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05659
Code URL: null
Copy Paste: [[2308.05659] AD-CLIP: Adapting Domains in Prompt Space Using CLIP](http://arxiv.org/abs/2308.05659) #foundation model
Summary:
Although deep learning models have shown impressive performance on supervised learning tasks, they often struggle to generalize well when the training (source) and test (target) domains differ. Unsupervised domain adaptation (DA) has emerged as a popular solution to this problem. However, current DA techniques rely on visual backbones, which may lack semantic richness. Despite the potential of large-scale vision-language foundation models like CLIP, their effectiveness for DA has yet to be fully explored. To address this gap, we introduce AD-CLIP, a domain-agnostic prompt learning strategy for CLIP that aims to solve the DA problem in the prompt space. We leverage the frozen vision backbone of CLIP to extract both image style (domain) and content information, which we apply to learn prompt tokens. Our prompts are designed to be domain-invariant and class-generalizable, by conditioning prompt learning on image style and content features simultaneously. We use standard supervised contrastive learning in the source domain, while proposing an entropy minimization strategy to align domains in the embedding space given the target domain data. We also consider a scenario where only target domain samples are available during testing, without any source domain data, and propose a cross-domain style mapping network to hallucinate domain-agnostic tokens. Our extensive experiments on three benchmark DA datasets demonstrate the effectiveness of AD-CLIP compared to existing literature.

generative

Title: Data-Free Model Extraction Attacks in the Context of Object Detection. (arXiv:2308.05127v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2308.05127
Code URL: null
Copy Paste: [[2308.05127] Data-Free Model Extraction Attacks in the Context of Object Detection](http://arxiv.org/abs/2308.05127) #generative
Summary:
A significant number of machine learning models are vulnerable to model extraction attacks, which focus on stealing the models by using specially curated queries against the target model. This task is well accomplished by using part of the training data or a surrogate dataset to train a new model that mimics a target model in a white-box environment. In pragmatic situations, however, the target models are trained on private datasets that are inaccessible to the adversary. The data-free model extraction technique replaces this problem when it comes to using queries artificially curated by a generator similar to that used in Generative Adversarial Nets. We propose for the first time, to the best of our knowledge, an adversary black box attack extending to a regression problem for predicting bounding box coordinates in object detection. As part of our study, we found that defining a loss function and using a novel generator setup is one of the key aspects in extracting the target model. We find that the proposed model extraction method achieves significant results by using reasonable queries. The discovery of this object detection vulnerability will support future prospects for securing such models.

Title: Hierarchical Representations for Spatio-Temporal Visual Attention Modeling and Understanding. (arXiv:2308.05189v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05189
Code URL: null
Copy Paste: [[2308.05189] Hierarchical Representations for Spatio-Temporal Visual Attention Modeling and Understanding](http://arxiv.org/abs/2308.05189) #generative
Summary:
This PhD. Thesis concerns the study and development of hierarchical representations for spatio-temporal visual attention modeling and understanding in video sequences. More specifically, we propose two computational models for visual attention. First, we present a generative probabilistic model for context-aware visual attention modeling and understanding. Secondly, we develop a deep network architecture for visual attention modeling, which first estimates top-down spatio-temporal visual attention, and ultimately serves for modeling attention in the temporal domain.

Title: Vector quantization loss analysis in VQGANs: a single-GPU ablation study for image-to-image synthesis. (arXiv:2308.05242v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05242
Code URL: https://github.com/luv91/vqgan_project
Copy Paste: [[2308.05242] Vector quantization loss analysis in VQGANs: a single-GPU ablation study for image-to-image synthesis](http://arxiv.org/abs/2308.05242) #generative
Summary:
This study performs an ablation analysis of Vector Quantized Generative Adversarial Networks (VQGANs), concentrating on image-to-image synthesis utilizing a single NVIDIA A100 GPU. The current work explores the nuanced effects of varying critical parameters including the number of epochs, image count, and attributes of codebook vectors and latent dimensions, specifically within the constraint of limited resources. Notably, our focus is pinpointed on the vector quantization loss, keeping other hyperparameters and loss components (GAN loss) fixed. This was done to delve into a deeper understanding of the discrete latent space, and to explore how varying its size affects the reconstruction. Though, our results do not surpass the existing benchmarks, however, our findings shed significant light on VQGAN's behaviour for a smaller dataset, particularly concerning artifacts, codebook size optimization, and comparative analysis with Principal Component Analysis (PCA). The study also uncovers the promising direction by introducing 2D positional encodings, revealing a marked reduction in artifacts and insights into balancing clarity and overfitting.

Title: Shadow Datasets, New challenging datasets for Causal Representation Learning. (arXiv:2308.05707v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2308.05707
Code URL: https://github.com/Jiagengzhu/Shadow-dataset-for-crl
Copy Paste: [[2308.05707] Shadow Datasets, New challenging datasets for Causal Representation Learning](http://arxiv.org/abs/2308.05707) #generative
Summary:
Discovering causal relations among semantic factors is an emergent topic in representation learning. Most causal representation learning (CRL) methods are fully supervised, which is impractical due to costly labeling. To resolve this restriction, weakly supervised CRL methods were introduced. To evaluate CRL performance, four existing datasets, Pendulum, Flow, CelebA(BEARD) and CelebA(SMILE), are utilized. However, existing CRL datasets are limited to simple graphs with few generative factors. Thus we propose two new datasets with a larger number of diverse generative factors and more sophisticated causal graphs. In addition, current real datasets, CelebA(BEARD) and CelebA(SMILE), the originally proposed causal graphs are not aligned with the dataset distributions. Thus, we propose modifications to them.

Title: Neural Progressive Meshes. (arXiv:2308.05741v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05741
Code URL: null
Copy Paste: [[2308.05741] Neural Progressive Meshes](http://arxiv.org/abs/2308.05741) #generative
Summary:
The recent proliferation of 3D content that can be consumed on hand-held devices necessitates efficient tools for transmitting large geometric data, e.g., 3D meshes, over the Internet. Detailed high-resolution assets can pose a challenge to storage as well as transmission bandwidth, and level-of-detail techniques are often used to transmit an asset using an appropriate bandwidth budget. It is especially desirable for these methods to transmit data progressively, improving the quality of the geometry with more data. Our key insight is that the geometric details of 3D meshes often exhibit similar local patterns even across different shapes, and thus can be effectively represented with a shared learned generative space. We learn this space using a subdivision-based encoder-decoder architecture trained in advance on a large collection of surfaces. We further observe that additional residual features can be transmitted progressively between intermediate levels of subdivision that enable the client to control the tradeoff between bandwidth cost and quality of reconstruction, providing a neural progressive mesh representation. We evaluate our method on a diverse set of complex 3D shapes and demonstrate that it outperforms baselines in terms of compression ratio and reconstruction quality.

Title: Classification of Human- and AI-Generated Texts: Investigating Features for ChatGPT. (arXiv:2308.05341v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2308.05341
Code URL: null
Copy Paste: [[2308.05341] Classification of Human- and AI-Generated Texts: Investigating Features for ChatGPT](http://arxiv.org/abs/2308.05341) #generative
Summary:
Recently, generative AIs like ChatGPT have become available to the wide public. These tools can for instance be used by students to generate essays or whole theses. But how does a teacher know whether a text is written by a student or an AI? In our work, we explore traditional and new features to (1) detect text generated by AI from scratch and (2) text rephrased by AI. Since we found that classification is more difficult when the AI has been instructed to create the text in a way that a human would not recognize that it was generated by an AI, we also investigate this more advanced case. For our experiments, we produced a new text corpus covering 10 school topics. Our best systems to classify basic and advanced human-generated/AI-generated texts have F1-scores of over 96%. Our best systems for classifying basic and advanced human-generated/AI-rephrased texts have F1-scores of more than 78%. The systems use a combination of perplexity, semantic, list lookup, error-based, readability, AI feedback, and text vector features. Our results show that the new features substantially help to improve the performance of many classifiers. Our best basic text rephrasing detection system even outperforms GPTZero by 183.8% relative in F1-score.

Title: $\mathcal{G}^2Pxy$: Generative Open-Set Node Classification on Graphs with Proxy Unknowns. (arXiv:2308.05463v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2308.05463
Code URL: null
Copy Paste: [[2308.05463] $\mathcal{G}^2Pxy$: Generative Open-Set Node Classification on Graphs with Proxy Unknowns](http://arxiv.org/abs/2308.05463) #generative
Summary:
Node classification is the task of predicting the labels of unlabeled nodes in a graph. State-of-the-art methods based on graph neural networks achieve excellent performance when all labels are available during training. But in real-life, models are often applied on data with new classes, which can lead to massive misclassification and thus significantly degrade performance. Hence, developing open-set classification methods is crucial to determine if a given sample belongs to a known class. Existing methods for open-set node classification generally use transductive learning with part or all of the features of real unseen class nodes to help with open-set classification. In this paper, we propose a novel generative open-set node classification method, i.e. $\mathcal{G}^2Pxy$, which follows a stricter inductive learning setting where no information about unknown classes is available during training and validation. Two kinds of proxy unknown nodes, inter-class unknown proxies and external unknown proxies are generated via mixup to efficiently anticipate the distribution of novel classes. Using the generated proxies, a closed-set classifier can be transformed into an open-set one, by augmenting it with an extra proxy classifier. Under the constraints of both cross entropy loss and complement entropy loss, $\mathcal{G}^2Pxy$ achieves superior effectiveness for unknown class detection and known class classification, which is validated by experiments on benchmark graph datasets. Moreover, $\mathcal{G}^2Pxy$ does not have specific requirement on the GNN architecture and shows good generalizations.

anomaly

in-context

memory

Title: Explicifying Neural Implicit Fields for Efficient Dynamic Human Avatar Modeling via a Neural Explicit Surface. (arXiv:2308.05112v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05112
Code URL: null
Copy Paste: [[2308.05112] Explicifying Neural Implicit Fields for Efficient Dynamic Human Avatar Modeling via a Neural Explicit Surface](http://arxiv.org/abs/2308.05112) #memory
Summary:
This paper proposes a technique for efficiently modeling dynamic humans by explicifying the implicit neural fields via a Neural Explicit Surface (NES). Implicit neural fields have advantages over traditional explicit representations in modeling dynamic 3D content from sparse observations and effectively representing complex geometries and appearances. Implicit neural fields defined in 3D space, however, are expensive to render due to the need for dense sampling during volumetric rendering. Moreover, their memory efficiency can be further optimized when modeling sparse 3D space. To overcome these issues, the paper proposes utilizing Neural Explicit Surface (NES) to explicitly represent implicit neural fields, facilitating memory and computational efficiency. To achieve this, the paper creates a fully differentiable conversion between the implicit neural fields and the explicit rendering interface of NES, leveraging the strengths of both implicit and explicit approaches. This conversion enables effective training of the hybrid representation using implicit methods and efficient rendering by integrating the explicit rendering interface with a newly proposed rasterization-based neural renderer that only incurs a texture color query once for the initial ray interaction with the explicit surface, resulting in improved inference efficiency. NES describes dynamic human geometries with pose-dependent neural implicit surface deformation fields and their dynamic neural textures both in 2D space, which is a more memory-efficient alternative to traditional 3D methods, reducing redundancy and computational load. The comprehensive experiments show that NES performs similarly to previous 3D approaches, with greatly improved rendering speed and reduced memory cost.

Title: RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation. (arXiv:2308.05318v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05318
Code URL: https://github.com/irmvlab/rlsac
Copy Paste: [[2308.05318] RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation](http://arxiv.org/abs/2308.05318) #memory
Summary:
Robust estimation is a crucial and still challenging task, which involves estimating model parameters in noisy environments. Although conventional sampling consensus-based algorithms sample several times to achieve robustness, these algorithms cannot use data features and historical information effectively. In this paper, we propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation. RLSAC employs a graph neural network to utilize both data and memory features to guide exploring directions for sampling the next minimum set. The feedback of downstream tasks serves as the reward for unsupervised training. Therefore, RLSAC can avoid differentiating to learn the features and the feedback of downstream tasks for end-to-end robust estimation. In addition, RLSAC integrates a state transition module that encodes both data and memory features. Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis. Through analysis, it is apparent that RLSAC can be easily transferred to other sampling consensus-based robust estimation tasks. To the best of our knowledge, RLSAC is also the first method that uses reinforcement learning to sample consensus for end-to-end robust estimation. We release our codes at https://github.com/IRMVLab/RLSAC.

Title: Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation. (arXiv:2308.05493v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05493
Code URL: null
Copy Paste: [[2308.05493] Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation](http://arxiv.org/abs/2308.05493) #memory
Summary:
Endeavors have been recently made to transfer knowledge from the labeled pinhole image domain to the unlabeled panoramic image domain via Unsupervised Domain Adaptation (UDA). The aim is to tackle the domain gaps caused by the style disparities and distortion problem from the non-uniformly distributed pixels of equirectangular projection (ERP). Previous works typically focus on transferring knowledge based on geometric priors with specially designed multi-branch network architectures. As a result, considerable computational costs are induced, and meanwhile, their generalization abilities are profoundly hindered by the variation of distortion among pixels. In this paper, we find that the pixels' neighborhood regions of the ERP indeed introduce less distortion. Intuitively, we propose a novel UDA framework that can effectively address the distortion problems for panoramic semantic segmentation. In comparison, our method is simpler, easier to implement, and more computationally efficient. Specifically, we propose distortion-aware attention (DA) capturing the neighboring pixel distribution without using any geometric constraints. Moreover, we propose a class-wise feature aggregation (CFA) module to iteratively update the feature representations with a memory bank. As such, the feature similarity between two domains can be consistently optimized. Extensive experiments show that our method achieves new state-of-the-art performance while remarkably reducing 80% parameters.

Title: NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search. (arXiv:2308.05600v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2308.05600
Code URL: null
Copy Paste: [[2308.05600] NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search](http://arxiv.org/abs/2308.05600) #memory
Summary:
Deep neural network (DNN) deployment has been confined to larger hardware devices due to their expensive computational requirements. This challenge has recently reached another scale with the emergence of large language models (LLMs). In order to reduce both their memory footprint and latency, a promising technique is quantization. It consists in converting floating point representations to low bit-width fixed point representations, usually by assuming a uniform mapping onto a regular grid. This process, referred to in the literature as uniform quantization, may however be ill-suited as most DNN weights and activations follow a bell-shaped distribution. This is even worse on LLMs whose weight distributions are known to exhibit large, high impact, outlier values. In this work, we propose an improvement over the most commonly adopted way to tackle this limitation in deep learning models quantization, namely, non-uniform quantization. NUPES leverages automorphisms to preserve the scalar multiplications. Such transformations are derived from power functions. However, the optimization of the exponent parameter and weight values remains a challenging and novel problem which could not be solved with previous post training optimization techniques which only learn to round up or down weight values in order to preserve the predictive function. We circumvent this limitation with a new paradigm: learning new quantized weights over the entire quantized space. Similarly, we enable the optimization of the power exponent, i.e. the optimization of the quantization operator itself during training by alleviating all the numerical instabilities. The resulting predictive function is compatible with integer-only low-bit inference. We show the ability of the method to achieve state-of-the-art compression rates in both, data-free and data-driven configurations.

Title: Iterative Reweighted Least Squares Networks With Convergence Guarantees for Solving Inverse Imaging Problems. (arXiv:2308.05745v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2308.05745
Code URL: null
Copy Paste: [[2308.05745] Iterative Reweighted Least Squares Networks With Convergence Guarantees for Solving Inverse Imaging Problems](http://arxiv.org/abs/2308.05745) #memory
Summary:
In this work we present a novel optimization strategy for image reconstruction tasks under analysis-based image regularization, which promotes sparse and/or low-rank solutions in some learned transform domain. We parameterize such regularizers using potential functions that correspond to weighted extensions of the $\ell_p^p$-vector and $\mathcal{S}_p^p$ Schatten-matrix quasi-norms with $0 < p \le 1$. Our proposed minimization strategy extends the Iteratively Reweighted Least Squares (IRLS) method, typically used for synthesis-based $\ell_p$ and $\mathcal{S}_p$ norm and analysis-based $\ell_1$ and nuclear norm regularization. We prove that under mild conditions our minimization algorithm converges linearly to a stationary point, and we provide an upper bound for its convergence rate. Further, to select the parameters of the regularizers that deliver the best results for the problem at hand, we propose to learn them from training data by formulating the supervised learning process as a stochastic bilevel optimization problem. We show that thanks to the convergence guarantees of our proposed minimization strategy, such optimization can be successfully performed with a memory-efficient implicit back-propagation scheme. We implement our learned IRLS variants as recurrent networks and assess their performance on the challenging image reconstruction tasks of non-blind deblurring, super-resolution and demosaicking. The comparisons against other existing learned reconstruction approaches demonstrate that our overall method is very competitive and in many cases outperforms existing unrolled networks, whose number of parameters is orders of magnitude higher than in our case.

Title: IoT Security: On-Chip Secure Deletion Scheme using ECC Modulation in IoT Appliances. (arXiv:2308.05225v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2308.05225
Code URL: null
Copy Paste: [[2308.05225] IoT Security: On-Chip Secure Deletion Scheme using ECC Modulation in IoT Appliances](http://arxiv.org/abs/2308.05225) #memory
Summary:
NAND flash memory-based IoT devices inherently suffer from data retention issues. In IoT security, these retention issues are significant and require a robust solution for secure deletion. Secure deletion methods can be categorized into off-chip and on-chip schemes. Off-chip secure deletion schemes, based on block-level erasure operations, are unable to perform real-time trim operations. Consequently, they are vulnerable to hacking threats. On the other hand, on-chip secure deletion schemes enable real-time trim operations by performing deletion on a page-by-page basis. However, the on-chip scheme introduces a challenge of program disturbance for neighboring page data. The proposed on-chip deletion scheme tackles this problem by utilizing ECC code modulation through a partial program operation. This approach significantly reduces the program disturbance issue associated with neighboring page data. Moreover, the proposed code modulation secure deletion scheme allows for real-time verification of the deletion of original data.

Title: ReLU and Addition-based Gated RNN. (arXiv:2308.05629v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2308.05629
Code URL: null
Copy Paste: [[2308.05629] ReLU and Addition-based Gated RNN](http://arxiv.org/abs/2308.05629) #memory
Summary:
We replace the multiplication and sigmoid function of the conventional recurrent gate with addition and ReLU activation. This mechanism is designed to maintain long-term memory for sequence processing but at a reduced computational cost, thereby opening up for more efficient execution or larger models on restricted hardware. Recurrent Neural Networks (RNNs) with gating mechanisms such as LSTM and GRU have been widely successful in learning from sequential data due to their ability to capture long-term dependencies. Conventionally, the update based on current inputs and the previous state history is each multiplied with dynamic weights and combined to compute the next state. However, multiplication can be computationally expensive, especially for certain hardware architectures or alternative arithmetic systems such as homomorphic encryption. It is demonstrated that the novel gating mechanism can capture long-term dependencies for a standard synthetic sequence learning task while significantly reducing computational costs such that execution time is reduced by half on CPU and by one-third under encryption. Experimental results on handwritten text recognition tasks furthermore show that the proposed architecture can be trained to achieve comparable accuracy to conventional GRU and LSTM baselines. The gating mechanism introduced in this paper may enable privacy-preserving AI applications operating under homomorphic encryption by avoiding the multiplication of encrypted variables. It can also support quantization in (unencrypted) plaintext applications, with the potential for substantial performance gains since the addition-based formulation can avoid the expansion to double precision often required for multiplication.

few-shot

Title: Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning. (arXiv:2308.05317v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2308.05317
Code URL: null
Copy Paste: [[2308.05317] Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning](http://arxiv.org/abs/2308.05317) #few-shot
Summary:
We present a novel approach for structured data-to-text generation that addresses the limitations of existing methods that primarily focus on specific types of structured data. Our proposed method aims to improve performance in multi-task training, zero-shot and few-shot scenarios by providing a unified representation that can handle various forms of structured data such as tables, knowledge graph triples, and meaning representations. We demonstrate that our proposed approach can effectively adapt to new structured forms, and can improve performance in comparison to current methods. For example, our method resulted in a 66% improvement in zero-shot BLEU scores when transferring models trained on table inputs to a knowledge graph dataset. Our proposed method is an important step towards a more general data-to-text generation framework.

Title: Cross-heterogeneity Graph Few-shot Learning. (arXiv:2308.05275v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2308.05275
Code URL: null
Copy Paste: [[2308.05275] Cross-heterogeneity Graph Few-shot Learning](http://arxiv.org/abs/2308.05275) #few-shot
Summary:
In recent years, heterogeneous graph few-shot learning has been proposed to address the label sparsity issue in heterogeneous graphs (HGs), which contain various types of nodes and edges. The existing methods have achieved good performance by transferring generalized knowledge extracted from rich-labeled classes in source HG(s) to few-labeled classes in a target HG. However, these methods only consider the single-heterogeneity scenario where the source and target HGs share a fixed set of node/edge types, ignoring the more general scenario of cross-heterogeneity, where each HG can have a different and non-fixed set of node/edge types. To this end, we focus on the unexplored cross-heterogeneity scenario and propose a novel model for Cross-heterogeneity Graph Few-shot Learning, namely CGFL. In CGFL, we first extract meta-patterns to capture heterogeneous information and propose a multi-view heterogeneous graph neural network (MHGN) to learn meta-patterns across HGs. Then, we propose a score module to measure the informativeness of labeled samples and determine the transferability of each source HG. Finally, by integrating MHGN and the score module into a meta-learning mechanism, CGFL can effectively transfer generalized knowledge to predict new classes with few-labeled data. Extensive experiments on four real-world datasets have demonstrated the superior performance of CGFL over the state-of-the-art methods.