[[2207.11720] Progressive Feature Learning for Realistic Cloth-Changing Gait Recognition](http://arxiv.org/abs/2207.11720)
Gait recognition is instrumental in crime prevention and social security, for it can be conducted at a long distance without the cooperation of subjects. However, existing datasets and methods cannot deal with the most challenging problem in realistic gait recognition effectively: walking in different clothes (CL). In order to tackle this problem, we propose two benchmarks: CASIA-BN-RCC and OUMVLP-RCC, to simulate the cloth-changing condition in practice. The two benchmarks can force the algorithm to realize cross-view and cross-cloth with two sub-datasets. Furthermore, we propose a new framework that can be applied with off-the-shelf backbones to improve its performance in the Realistic Cloth-Changing problem with Progressive Feature Learning. Specifically, in our framework, we design Progressive Mapping and Progressive Uncertainty to extract the cross-view features and then extract cross-cloth features on the basis. In this way, the features from the cross-view sub-dataset can first dominate the feature space and relieve the uneven distribution caused by the adverse effect from the cross-cloth sub-dataset. The experiments on our benchmarks show that our framework can effectively improve the recognition performance in CL conditions. Our codes and datasets will be released after accepted.
[[2207.11306] Security policy audits: why and how](http://arxiv.org/abs/2207.11306)
Information security isn't just about software and hardware -- it's at least as much about policies and processes. But the research community overwhelmingly focuses on the former over the latter, while gaping policy and process problems persist. In this experience paper, we describe a series of security policy audits that we conducted, exposing policy flaws affecting billions of users that can be -- and often are -- exploited by low-tech attackers who don't need to use any tools or exploit software vulnerabilities. The solutions, in turn, need to be policy-based. We advocate for the study of policies and processes, point out its intellectual and practical challenges, lay out our theory of change, and present a research agenda.
[[2207.11519] Bandwidth-Hard Functions from Random Permutations](http://arxiv.org/abs/2207.11519)
ASIC hash engines are specifically optimized for parallel computations of cryptographic hashes and thus a natural environment for mounting brute-force attacks on hash functions. Two fundamental advantages of ASICs over general purpose computers are the area advantage and the energy efficiency. The memory-hard functions approach the problem by reducing the area advantage of ASICs compared to general-purpose computers. Traditionally, memory-hard functions have been analyzed in the (parallel) random oracle model. However, as the memory-hard security game is multi-stage, indifferentiability does not apply and instantiating the random oracle becomes a non-trivial problem. Chen and Tessaro (CRYPTO 2019) considered this issue and showed how random oracles should be instantiated in the context of memory-hard functions. The Bandwidth-Hard functions, introduced by Ren and Devadas (TCC 2017), aim to provide ASIC resistance by reducing the energy advantage of ASICs. In particular, bandwidth-hard functions provide ASIC resistance by guaranteeing high run time energy cost if the available cache is not large enough. Previously, bandwidth-hard functions have been analyzed in the parallel random oracle model. In this work, we show how those random oracles can be instantiated using random permutations in the context of bandwidth-hard functions. Our results are generic and valid for any hard-to-pebble graphs.
[[2207.11610] Will You Trust This TLS Certificate? Perceptions of People Working in IT (Extended Version)](http://arxiv.org/abs/2207.11610)
Flawed TLS certificates are not uncommon on the Internet. While they signal a potential issue, in most cases they have benign causes (e.g., misconfiguration or even deliberate deployment). This adds fuzziness to the decision on whether to trust a connection or not. Little is known about perceptions of flawed certificates by IT professionals, even though their decisions impact high numbers of end users. Moreover, it is unclear how much the content of error messages and documentation influences these perceptions. To shed light on these issues, we observed 75 attendees of an industrial IT conference investigating different certificate validation errors. We also analyzed the influence of reworded error messages and redesigned documentation. We find that people working in IT have very nuanced opinions, with trust decisions being far from binary. The self-signed and the name-constrained certificates seem to be over-trusted (the latter also being poorly understood). We show that even small changes in existing error messages can positively influence resource use, comprehension, and trust assessment. At the end of the article, we summarize lessons learned from conducting usable security studies with IT professionals.
[[2207.11615] SyncPCN/PSyncPCN: Payment Channel Networks without Blockchain Synchrony](http://arxiv.org/abs/2207.11615)
Payment channel networks (PCNs) enhance the scalability of blockchains by allowing parties to conduct transactions off-chain, i.e, without broadcasting every transaction to all blockchain participants. To conduct transactions, a sender and a receiver can either establish a direct payment channel with a funding blockchain transaction or leverage existing channels in a multi-hop payment. The security of PCNs usually relies on the synchrony of the underlying blockchain, i.e., evidence of misbehavior needs to be published on the blockchain within a time limit. Alternative payment channel proposals that do not require blockchain synchrony rely on quorum certificates and use a committee to register the transactions of a channel. However, these proposals do not support multi-hop payments, a limitation we aim to overcome. In this paper, we demonstrate that it is in fact impossible to design a multi-hop payment protocol with both network asynchrony and faulty channels, i.e., channels that may not correctly follow the protocol. We then detail two committee-based multi-hop payment protocols that respectively assume synchronous communications and possibly faulty channels, or asynchronous communication and correct channels. The first protocol relies on possibly faulty committees instead of the blockchain to resolve channel disputes, and enforces privacy properties within a synchronous network. The second one relies on committees that contain at most f faulty members out of 3f+1 and successively delegate to each other the role of eventually completing a multi-hop payment. We show that both protocols satisfy the security requirements of a multi-hop payment and compare their communication complexity and latency.
[[2207.11577] Augmented Bilinear Network for Incremental Multi-Stock Time-Series Classification](http://arxiv.org/abs/2207.11577)
Deep Learning models have become dominant in tackling financial time-series analysis problems, overturning conventional machine learning and statistical methods. Most often, a model trained for one market or security cannot be directly applied to another market or security due to differences inherent in the market conditions. In addition, as the market evolves through time, it is necessary to update the existing models or train new ones when new data is made available. This scenario, which is inherent in most financial forecasting applications, naturally raises the following research question: How to efficiently adapt a pre-trained model to a new set of data while retaining performance on the old data, especially when the old data is not accessible? In this paper, we propose a method to efficiently retain the knowledge available in a neural network pre-trained on a set of securities and adapt it to achieve high performance in new ones. In our method, the prior knowledge encoded in a pre-trained neural network is maintained by keeping existing connections fixed, and this knowledge is adjusted for the new securities by a set of augmented connections, which are optimized using the new data. The auxiliary connections are constrained to be of low rank. This not only allows us to rapidly optimize for the new task but also reduces the storage and run-time complexity during the deployment phase. The efficiency of our approach is empirically validated in the stock mid-price movement prediction problem using a large-scale limit order book dataset. Experimental results show that our approach enhances prediction performance as well as reduces the overall number of network parameters.
[[2207.11325] PieTrack: An MOT solution based on synthetic data training and self-supervised domain adaptation](http://arxiv.org/abs/2207.11325)
In order to cope with the increasing demand for labeling data and privacy issues with human detection, synthetic data has been used as a substitute and showing promising results in human detection and tracking tasks. We participate in the 7th Workshop on Benchmarking Multi-Target Tracking (BMTT), themed on "How Far Can Synthetic Data Take us"? Our solution, PieTrack, is developed based on synthetic data without using any pre-trained weights. We propose a self-supervised domain adaptation method that enables mitigating the domain shift issue between the synthetic (e.g., MOTSynth) and real data (e.g., MOT17) without involving extra human labels. By leveraging the proposed multi-scale ensemble inference, we achieved a final HOTA score of 58.7 on the MOT17 testing set, ranked third place in the challenge.
[[2207.11677] Learnable Privacy-Preserving Anonymization for Pedestrian Images](http://arxiv.org/abs/2207.11677)
This paper studies a novel privacy-preserving anonymization problem for pedestrian images, which preserves personal identity information (PII) for authorized models and prevents PII from being recognized by third parties. Conventional anonymization methods unavoidably cause semantic information loss, leading to limited data utility. Besides, existing learned anonymization techniques, while retaining various identity-irrelevant utilities, will change the pedestrian identity, and thus are unsuitable for training robust re-identification models. To explore the privacy-utility trade-off for pedestrian images, we propose a joint learning reversible anonymization framework, which can reversibly generate full-body anonymous images with little performance drop on person re-identification tasks. The core idea is that we adopt desensitized images generated by conventional methods as the initial privacy-preserving supervision and jointly train an anonymization encoder with a recovery decoder and an identity-invariant model. We further propose a progressive training strategy to improve the performance, which iteratively upgrades the initial anonymization supervision. Experiments further demonstrate the effectiveness of our anonymized pedestrian images for privacy protection, which boosts the re-identification performance while preserving privacy. Code is available at \url{https://github.com/whuzjw/privacy-reid}.
[[2207.11500] Catch Me If You Can: Deceiving Stance Detection and Geotagging Models to Protect Privacy of Individuals on Twitter](http://arxiv.org/abs/2207.11500)
The recent advances in natural language processing have yielded many exciting developments in text analysis and language understanding models; however, these models can also be used to track people, bringing severe privacy concerns. In this work, we investigate what individuals can do to avoid being detected by those models while using social media platforms. We ground our investigation in two exposure-risky tasks, stance detection and geotagging. We explore a variety of simple techniques for modifying text, such as inserting typos in salient words, paraphrasing, and adding dummy social media posts. Our experiments show that the performance of BERT-based models fined tuned for stance detection decreases significantly due to typos, but it is not affected by paraphrasing. Moreover, we find that typos have minimal impact on state-of-the-art geotagging models due to their increased reliance on social networks; however, we show that users can deceive those models by interacting with different users, reducing their performance by almost 50%.
[[2207.11788] Privacy Against Inference Attacks in Vertical Federated Learning](http://arxiv.org/abs/2207.11788)
Vertical federated learning is considered, where an active party, having access to true class labels, wishes to build a classification model by utilizing more features from a passive party, which has no access to the labels, to improve the model accuracy. In the prediction phase, with logistic regression as the classification model, several inference attack techniques are proposed that the adversary, i.e., the active party, can employ to reconstruct the passive party's features, regarded as sensitive information. These attacks, which are mainly based on a classical notion of the center of a set, i.e., the Chebyshev center, are shown to be superior to those proposed in the literature. Moreover, several theoretical performance guarantees are provided for the aforementioned attacks. Subsequently, we consider the minimum amount of information that the adversary needs to fully reconstruct the passive party's features. In particular, it is shown that when the passive party holds one feature, and the adversary is only aware of the signs of the parameters involved, it can perfectly reconstruct that feature when the number of predictions is large enough. Next, as a defense mechanism, two privacy-preserving schemes are proposed that worsen the adversary's reconstruction attacks, while preserving the full benefits that VFL brings to the active party. Finally, experimental results demonstrate the effectiveness of the proposed attacks and the privacy-preserving schemes.
[[2207.11689] PMUSpill: The Counters in Performance Monitor Unit that Leak SGX-Protected Secrets](http://arxiv.org/abs/2207.11689)
Performance Monitor Unit (PMU) is a significant hardware module on the current processors, which counts the events launched by processor into a set of PMU counters. Ideally, the events triggered by instructions that are executed but the results are not successfully committed (transient execution) should not be recorded. However, in this study, we discover that some PMU events triggered by the transient execution instructions will actually be recorded by PMU. Based on this, we propose the PMUSpill attack, which enables attackers to maliciously leak the secret data that are loaded during transient executions. The biggest challenge is how to encode the secret data into PMU events. We construct an instruction gadget to solve this challenge, whose execution path that can be identified by PMU counters represents what values the secret data are. We successfully implement the PMUSpill attack to leak the secret data stored in Intel Software Guard Extensions (SGX) (a Trusted Execution Environment (TEE) in the Intel's processors) through real experiments. Besides, we locate the vulnerable PMU counters and their trigger instructions by iterating all the valid PMU counters and instructions. The experiment results demonstrate that there are up to 20 PMU counters available to implement the PMUSpill attack. We also provide some possible hardware and software-based countermeasures for addressing the PMUSpill attack, which can be utilized to enhance the security of processors in future.
[[2207.11694] Proving Common Mechanisms Shared by Twelve Methods of Boosting Adversarial Transferability](http://arxiv.org/abs/2207.11694)
Although many methods have been proposed to enhance the transferability of adversarial perturbations, these methods are designed in a heuristic manner, and the essential mechanism for improving adversarial transferability is still unclear. This paper summarizes the common mechanism shared by twelve previous transferability-boosting methods in a unified view, i.e., these methods all reduce game-theoretic interactions between regional adversarial perturbations. To this end, we focus on the attacking utility of all interactions between regional adversarial perturbations, and we first discover and prove the negative correlation between the adversarial transferability and the attacking utility of interactions. Based on this discovery, we theoretically prove and empirically verify that twelve previous transferability-boosting methods all reduce interactions between regional adversarial perturbations. More crucially, we consider the reduction of interactions as the essential reason for the enhancement of adversarial transferability. Furthermore, we design the interaction loss to directly penalize interactions between regional adversarial perturbations during attacking. Experimental results show that the interaction loss significantly improves the transferability of adversarial perturbations.
[[2207.11465] Distributed Nonlinear State Estimation in Electric Power Systems using Graph Neural Networks](http://arxiv.org/abs/2207.11465)
Nonlinear state estimation (SE), with the goal of estimating complex bus voltages based on all types of measurements available in the power system, is usually solved using the iterative Gauss-Newton method. The nonlinear SE presents some difficulties when considering inputs from both phasor measurement units and supervisory control and data acquisition system. These include numerical instabilities, convergence time depending on the starting point of the iterative method, and the quadratic computational complexity of a single iteration regarding the number of state variables. This paper introduces an original graph neural network based SE implementation over the augmented factor graph of the nonlinear power system SE, capable of incorporating measurements on both branches and buses, as well as both phasor and legacy measurements. The proposed regression model has linear computational complexity during the inference time once trained, with a possibility of distributed implementation. Since the method is noniterative and non-matrix-based, it is resilient to the problems that the Gauss-Newton solver is prone to. Aside from prediction accuracy on the test set, the proposed model demonstrates robustness when simulating cyber attacks and unobservable scenarios due to communication irregularities. In those cases, prediction errors are sustained locally, with no effect on the rest of the power system's results.
[[2207.11250] Rich Feature Distillation with Feature Affinity Module for Efficient Image Dehazing](http://arxiv.org/abs/2207.11250)
Single-image haze removal is a long-standing hurdle for computer vision applications. Several works have been focused on transferring advances from image classification, detection, and segmentation to the niche of image dehazing, primarily focusing on contrastive learning and knowledge distillation. However, these approaches prove computationally expensive, raising concern regarding their applicability to on-the-edge use-cases. This work introduces a simple, lightweight, and efficient framework for single-image haze removal, exploiting rich "dark-knowledge" information from a lightweight pre-trained super-resolution model via the notion of heterogeneous knowledge distillation. We designed a feature affinity module to maximize the flow of rich feature semantics from the super-resolution teacher to the student dehazing network. In order to evaluate the efficacy of our proposed framework, its performance as a plug-and-play setup to a baseline model is examined. Our experiments are carried out on the RESIDE-Standard dataset to demonstrate the robustness of our framework to the synthetic and real-world domains. The extensive qualitative and quantitative results provided establish the effectiveness of the framework, achieving gains of upto 15\% (PSNR) while reducing the model size by $\sim$20 times.
[[2207.11341] Dynamic Graph Reasoning for Multi-person 3D Pose Estimation](http://arxiv.org/abs/2207.11341)
Multi-person 3D pose estimation is a challenging task because of occlusion and depth ambiguity, especially in the cases of crowd scenes. To solve these problems, most existing methods explore modeling body context cues by enhancing feature representation with graph neural networks or adding structural constraints. However, these methods are not robust for their single-root formulation that decoding 3D poses from a root node with a pre-defined graph. In this paper, we propose GR-M3D, which models the \textbf{M}ulti-person \textbf{3D} pose estimation with dynamic \textbf{G}raph \textbf{R}easoning. The decoding graph in GR-M3D is predicted instead of pre-defined. In particular, It firstly generates several data maps and enhances them with a scale and depth aware refinement module (SDAR). Then multiple root keypoints and dense decoding paths for each person are estimated from these data maps. Based on them, dynamic decoding graphs are built by assigning path weights to the decoding paths, while the path weights are inferred from those enhanced data maps. And this process is named dynamic graph reasoning (DGR). Finally, the 3D poses are decoded according to dynamic decoding graphs for each detected person. GR-M3D can adjust the structure of the decoding graph implicitly by adopting soft path weights according to input data, which makes the decoding graphs be adaptive to different input persons to the best extent and more capable of handling occlusion and depth ambiguity than previous methods. We empirically show that the proposed bottom-up approach even outperforms top-down methods and achieves state-of-the-art results on three 3D pose datasets.
[[2207.11347] An Impartial Take to the CNN vs Transformer Robustness Contest](http://arxiv.org/abs/2207.11347)
Following the surge of popularity of Transformers in Computer Vision, several studies have attempted to determine whether they could be more robust to distribution shifts and provide better uncertainty estimates than Convolutional Neural Networks (CNNs). The almost unanimous conclusion is that they are, and it is often conjectured more or less explicitly that the reason of this supposed superiority is to be attributed to the self-attention mechanism. In this paper we perform extensive empirical analyses showing that recent state-of-the-art CNNs (particularly, ConvNeXt) can be as robust and reliable or even sometimes more than the current state-of-the-art Transformers. However, there is no clear winner. Therefore, although it is tempting to state the definitive superiority of one family of architectures over another, they seem to enjoy similar extraordinary performances on a variety of tasks while also suffering from similar vulnerabilities such as texture, background, and simplicity biases.
[[2207.11378] Do Perceptually Aligned Gradients Imply Adversarial Robustness?](http://arxiv.org/abs/2207.11378)
In the past decade, deep learning-based networks have achieved unprecedented success in numerous tasks, including image classification. Despite this remarkable achievement, recent studies have demonstrated that such networks are easily fooled by small malicious perturbations, also known as adversarial examples. This security weakness led to extensive research aimed at obtaining robust models. Beyond the clear robustness benefits of such models, it was also observed that their gradients with respect to the input align with human perception. Several works have identified Perceptually Aligned Gradients (PAG) as a byproduct of robust training, but none have considered it as a standalone phenomenon nor studied its own implications. In this work, we focus on this trait and test whether Perceptually Aligned Gradients imply Robustness. To this end, we develop a novel objective to directly promote PAG in training classifiers and examine whether models with such gradients are more robust to adversarial attacks. Extensive experiments on CIFAR-10 and STL validate that such models have improved robust performance, exposing the surprising bidirectional connection between PAG and robustness.
[[2207.11396] Orientation and Context Entangled Network for Retinal Vessel Segmentation](http://arxiv.org/abs/2207.11396)
Most of the existing deep learning based methods for vessel segmentation neglect two important aspects of retinal vessels, one is the orientation information of vessels, and the other is the contextual information of the whole fundus region. In this paper, we propose a robust Orientation and Context Entangled Network (denoted as OCE-Net), which has the capability of extracting complex orientation and context information of the blood vessels. To achieve complex orientation aware, a Dynamic Complex Orientation Aware Convolution (DCOA Conv) is proposed to extract complex vessels with multiple orientations for improving the vessel continuity. To simultaneously capture the global context information and emphasize the important local information, a Global and Local Fusion Module (GLFM) is developed to simultaneously model the long-range dependency of vessels and focus sufficient attention on local thin vessels. A novel Orientation and Context Entangled Non-local (OCE-NL) module is proposed to entangle the orientation and context information together. In addition, an Unbalanced Attention Refining Module (UARM) is proposed to deal with the unbalanced pixel numbers of background, thick and thin vessels. Extensive experiments were performed on several commonly used datasets (DRIVE, STARE and CHASEDB1) and some more challenging datasets (AV-WIDE, UoA-DR, RFMiD and UK Biobank). The ablation study shows that the proposed method achieves promising performance on maintaining the continuity of thin vessels and the comparative experiments demonstrate that our OCE-Net can achieve state-of-the-art performance on retinal vessel segmentation.
[[2207.11484] GraphFit: Learning Multi-scale Graph-Convolutional Representation for Point Cloud Normal Estimation](http://arxiv.org/abs/2207.11484)
We propose a precise and efficient normal estimation method that can deal with noise and nonuniform density for unstructured 3D point clouds. Unlike existing approaches that directly take patches and ignore the local neighborhood relationships, which make them susceptible to challenging regions such as sharp edges, we propose to learn graph convolutional feature representation for normal estimation, which emphasizes more local neighborhood geometry and effectively encodes intrinsic relationships. Additionally, we design a novel adaptive module based on the attention mechanism to integrate point features with their neighboring features, hence further enhancing the robustness of the proposed normal estimator against point density variations. To make it more distinguishable, we introduce a multi-scale architecture in the graph block to learn richer geometric features. Our method outperforms competitors with the state-of-the-art accuracy on various benchmark datasets, and is quite robust against noise, outliers, as well as the density variations.
[[2207.11514] Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models](http://arxiv.org/abs/2207.11514)
We study open-world 3D scene understanding, a family of tasks that require agents to reason about their 3D environment with an open-set vocabulary and out-of-domain visual inputs - a critical skill for robots to operate in the unstructured 3D world. Towards this end, we propose Semantic Abstraction (SemAbs), a framework that equips 2D Vision-Language Models (VLMs) with new 3D spatial capabilities, while maintaining their zero-shot robustness. We achieve this abstraction using relevancy maps extracted from CLIP, and learn 3D spatial and geometric reasoning skills on top of those abstractions in a semantic-agnostic manner. We demonstrate the usefulness of SemAbs on two open-world 3D scene understanding tasks: 1) completing partially observed objects and 2) localizing hidden objects from language descriptions. Experiments show that SemAbs can generalize to novel vocabulary, materials/lighting, classes, and domains (i.e., real-world scans) from training on limited 3D synthetic data. Code and data will be available at https://semantic-abstraction.cs.columbia.edu/
[[2207.11617] Face Deblurring using Dual Camera Fusion on Mobile Phones](http://arxiv.org/abs/2207.11617)
Motion blur of fast-moving subjects is a longstanding problem in photography and very common on mobile phones due to limited light collection efficiency, particularly in low-light conditions. While we have witnessed great progress in image deblurring in recent years, most methods require significant computational power and have limitations in processing high-resolution photos with severe local motions. To this end, we develop a novel face deblurring system based on the dual camera fusion technique for mobile phones. The system detects subject motion to dynamically enable a reference camera, e.g., ultrawide angle camera commonly available on recent premium phones, and captures an auxiliary photo with faster shutter settings. While the main shot is low noise but blurry, the reference shot is sharp but noisy. We learn ML models to align and fuse these two shots and output a clear photo without motion blur. Our algorithm runs efficiently on Google Pixel 6, which takes 463 ms overhead per shot. Our experiments demonstrate the advantage and robustness of our system against alternative single-image, multi-frame, face-specific, and video deblurring algorithms as well as commercial products. To the best of our knowledge, our work is the first mobile solution for face motion deblurring that works reliably and robustly over thousands of images in diverse motion and lighting conditions.
[[2207.11643] Robust Scene Inference under Noise-Blur Dual Corruptions](http://arxiv.org/abs/2207.11643)
Scene inference under low-light is a challenging problem due to severe noise in the captured images. One way to reduce noise is to use longer exposure during the capture. However, in the presence of motion (scene or camera motion), longer exposures lead to motion blur, resulting in loss of image information. This creates a trade-off between these two kinds of image degradations: motion blur (due to long exposure) vs. noise (due to short exposure), also referred as a dual image corruption pair in this paper. With the rise of cameras capable of capturing multiple exposures of the same scene simultaneously, it is possible to overcome this trade-off. Our key observation is that although the amount and nature of degradation varies for these different image captures, the semantic content remains the same across all images. To this end, we propose a method to leverage these multi exposure captures for robust inference under low-light and motion. Our method builds on a feature consistency loss to encourage similar results from these individual captures, and uses the ensemble of their final predictions for robust visual recognition. We demonstrate the effectiveness of our approach on simulated images as well as real captures with multiple exposures, and across the tasks of object detection and image classification.
[[2207.11659] Improved Regularization of Event-based Learning by Reversing and Drifting](http://arxiv.org/abs/2207.11659)
Event camera has an enormous potential in challenging scenes for its advantages of high temporal resolution, high dynamic range, low power consumption, and no motion blur. However, event-based learning is hindered by insufficient generalization ability. In this paper, we first analyze the influence of different brightness variations on event data. Then we propose two novel augmentation methods: EventReverse and EventDrift. By reversing and drifting events to their corresponding positions in the spatiotemporal or polarity domain, the proposed methods generate samples affected by different brightness variations, which improves the robustness of event-based learning and results in a better generalization. Extensive experiments on N-CARS, N-Caltech101 and CIFAR10-DVS datasets demonstrate that our method is general and remarkably effective.
[[2207.11727] Can we achieve robustness from data alone?](http://arxiv.org/abs/2207.11727)
Adversarial training and its variants have come to be the prevailing methods to achieve adversarially robust classification using neural networks. However, its increased computational cost together with the significant gap between standard and robust performance hinder progress and beg the question of whether we can do better. In this work, we take a step back and ask: Can models achieve robustness via standard training on a suitably optimized set? To this end, we devise a meta-learning method for robust classification, that optimizes the dataset prior to its deployment in a principled way, and aims to effectively remove the non-robust parts of the data. We cast our optimization method as a multi-step PGD procedure on kernel regression, with a class of kernels that describe infinitely wide neural nets (Neural Tangent Kernels - NTKs). Experiments on MNIST and CIFAR-10 demonstrate that the datasets we produce enjoy very high robustness against PGD attacks, when deployed in both kernel regression classifiers and neural networks. However, this robustness is somewhat fallacious, as alternative attacks manage to fool the models, which we find to be the case for previous similar works in the literature as well. We discuss potential reasons for this and outline further avenues of research.
[[2207.11795] Cross-Modal 3D Shape Generation and Manipulation](http://arxiv.org/abs/2207.11795)
Creating and editing the shape and color of 3D objects require tremendous human effort and expertise. Compared to direct manipulation in 3D interfaces, 2D interactions such as sketches and scribbles are usually much more natural and intuitive for the users. In this paper, we propose a generic multi-modal generative model that couples the 2D modalities and implicit 3D representations through shared latent spaces. With the proposed model, versatile 3D generation and manipulation are enabled by simply propagating the editing from a specific 2D controlling modality through the latent spaces. For example, editing the 3D shape by drawing a sketch, re-colorizing the 3D surface via painting color scribbles on the 2D rendering, or generating 3D shapes of a certain category given one or a few reference images. Unlike prior works, our model does not require re-training or fine-tuning per editing task and is also conceptually simple, easy to implement, robust to input domain shifts, and flexible to diverse reconstruction on partial 2D inputs. We evaluate our framework on two representative 2D modalities of grayscale line sketches and rendered color images, and demonstrate that our method enables various shape manipulation and generation tasks with these 2D modalities.
[[2207.11562] Better Reasoning Behind Classification Predictions with BERT for Fake News Detection](http://arxiv.org/abs/2207.11562)
Fake news detection has become a major task to solve as there has been an increasing number of fake news on the internet in recent years. Although many classification models have been proposed based on statistical learning methods showing good results, reasoning behind the classification performances may not be enough. In the self-supervised learning studies, it has been highlighted that a quality of representation (embedding) space matters and directly affects a downstream task performance. In this study, a quality of the representation space is analyzed visually and analytically in terms of linear separability for different classes on a real and fake news dataset. To further add interpretability to a classification model, a modification of Class Activation Mapping (CAM) is proposed. The modified CAM provides a CAM score for each word token, where the CAM score on a word token denotes a level of focus on that word token to make the prediction. Finally, it is shown that the naive BERT model topped with a learnable linear layer is enough to achieve robust performance while being compatible with CAM.
[[2207.11697] Improving Mandarin Speech Recogntion with Block-augmented Transformer](http://arxiv.org/abs/2207.11697)
Recently Convolution-augmented Transformer (Conformer) has shown promising results in Automatic Speech Recognition (ASR), outperforming the previous best published Transformer Transducer. In this work, we believe that the output information of each block in the encoder and decoder is not completely inclusive, in other words, their output information may be complementary. We study how to take advantage of the complementary information of each block in a parameter-efficient way, and it is expected that this may lead to more robust performance. Therefore we propose the Block-augmented Transformer for speech recognition, named Blockformer. We have implemented two block ensemble methods: the base Weighted Sum of the Blocks Output (Base-WSBO), and the Squeeze-and-Excitation module to Weighted Sum of the Blocks Output (SE-WSBO). Experiments have proved that the Blockformer significantly outperforms the state-of-the-art Conformer-based models on AISHELL-1, our model achieves a CER of 4.35\% without using a language model and 4.10\% with an external language model on the testset.
[[2207.11657] FileInsurer: A Scalable and Reliable Protocol for Decentralized File Storage in Blockchain](http://arxiv.org/abs/2207.11657)
With the development of blockchain applications, the requirements for file storage in blockchain are increasing rapidly. Many protocols, including Filecoin, Arweave, and Sia, have been proposed to provide scalable decentralized file storage for blockchain applications. However, the reliability is not well promised by existing protocols. Inspired by the idea of insurance, we innovatively propose a decentralized file storage protocol in blockchain, named as FileInsurer, to achieve both scalability and reliability. While ensuring scalability by distributed storage, FileInsurer guarantees reliability by enhancing robustness and fully compensating for the file loss. Specifically, under mild conditions, we prove that no more than 0.1\% value of all files should be compensated even if half of the storage collapses. Therefore, only a relatively small deposit needs to be pledged by storage providers to cover the potential file loss. Because of lower burdens of deposit, storage providers have more incentives to participate in the storage network. FileInsurer can run in the top layer of the InterPlanetary File System (IPFS), and thus it can be directly applied in Web 3.0, Non-Fungible Tokens, and Metaverse.
[[2207.11290] TRUST-LAPSE: An Explainable & Actionable Mistrust Scoring Framework for Model Monitoring](http://arxiv.org/abs/2207.11290)
Continuous monitoring of trained ML models to determine when their predictions should and should not be trusted is essential for their safe deployment. Such a framework ought to be high-performing, explainable, post-hoc and actionable. We propose TRUST-LAPSE, a "mistrust" scoring framework for continuous model monitoring. We assess the trustworthiness of each input sample's model prediction using a sequence of latent-space embeddings. Specifically, (a) our latent-space mistrust score estimates mistrust using distance metrics (Mahalanobis distance) and similarity metrics (cosine similarity) in the latent-space and (b) our sequential mistrust score determines deviations in correlations over the sequence of past input representations in a non-parametric, sliding-window based algorithm for actionable continuous monitoring. We evaluate TRUST-LAPSE via two downstream tasks: (1) distributionally shifted input detection and (2) data drift detection, across diverse domains -- audio & vision using public datasets and further benchmark our approach on challenging, real-world electroencephalograms (EEG) datasets for seizure detection. Our latent-space mistrust scores achieve state-of-the-art results with AUROCs of 84.1 (vision), 73.9 (audio), 77.1 (clinical EEGs), outperforming baselines by over 10 points. We expose critical failures in popular baselines that remain insensitive to input semantic content, rendering them unfit for real-world model monitoring. We show that our sequential mistrust scores achieve high drift detection rates: over 90% of the streams show < 20% error for all domains. Through extensive qualitative and quantitative evaluations, we show that our mistrust scores are more robust and provide explainability for easy adoption into practice.
[[2207.11842] $\textit{FastSVD-ML-ROM}$: A Reduced-Order Modeling Framework based on Machine Learning for Real-Time Applications](http://arxiv.org/abs/2207.11842)
Digital twins have emerged as a key technology for optimizing the performance of engineering products and systems. High-fidelity numerical simulations constitute the backbone of engineering design, providing an accurate insight into the performance of complex systems. However, large-scale, dynamic, non-linear models require significant computational resources and are prohibitive for real-time digital twin applications. To this end, reduced order models (ROMs) are employed, to approximate the high-fidelity solutions while accurately capturing the dominant aspects of the physical behavior. The present work proposes a new machine learning (ML) platform for the development of ROMs, to handle large-scale numerical problems dealing with transient nonlinear partial differential equations. Our framework, mentioned as $\textit{FastSVD-ML-ROM}$, utilizes $\textit{(i)}$ a singular value decomposition (SVD) update methodology, to compute a linear subspace of the multi-fidelity solutions during the simulation process, $\textit{(ii)}$ convolutional autoencoders for nonlinear dimensionality reduction, $\textit{(iii)}$ feed-forward neural networks to map the input parameters to the latent spaces, and $\textit{(iv)}$ long short-term memory networks to predict and forecast the dynamics of parametric solutions. The efficiency of the $\textit{FastSVD-ML-ROM}$ framework is demonstrated for a 2D linear convection-diffusion equation, the problem of fluid around a cylinder, and the 3D blood flow inside an arterial segment. The accuracy of the reconstructed results demonstrates the robustness and assesses the efficiency of the proposed approach.
[[2207.11523] Unstructured Road Segmentation using Hypercolumn based Random Forests of Local experts](http://arxiv.org/abs/2207.11523)
Monocular vision based road detection methods are mostly based on machine learning methods, relying on classification and feature extraction accuracy, and suffer from appearance, illumination and weather changes. Traditional methods introduce the predictions into conditional random fields or markov random fields models to improve the intermediate predictions based on structure. These methods are optimization based and therefore resource heavy and slow, making it unsuitable for real time applications. We propose a method to detect and segment roads with a random forest classifier of local experts with superpixel based machine-learned features. The random forest takes in machine learnt descriptors from a pre-trained convolutional neural network - VGG-16. The features are also pooled into their respective superpixels, allowing for local structure to be continuous. We compare our algorithm against Nueral Network based methods and Traditional approaches (based on Hand-crafted features), on both Structured Road (CamVid and Kitti) and Unstructured Road Datasets. Finally, we introduce a Road Scene Dataset with 1000 annotated images, and verify that our algorithm works well in non-urban and rural road scenarios.
[[2207.11433] Enhancing Document-level Relation Extraction by Entity Knowledge Injection](http://arxiv.org/abs/2207.11433)
Document-level relation extraction (RE) aims to identify the relations between entities throughout an entire document. It needs complex reasoning skills to synthesize various knowledge such as coreferences and commonsense. Large-scale knowledge graphs (KGs) contain a wealth of real-world facts, and can provide valuable knowledge to document-level RE. In this paper, we propose an entity knowledge injection framework to enhance current document-level RE models. Specifically, we introduce coreference distillation to inject coreference knowledge, endowing an RE model with the more general capability of coreference reasoning. We also employ representation reconciliation to inject factual knowledge and aggregate KG representations and document representations into a unified space. The experiments on two benchmark datasets validate the generalization of our entity knowledge injection framework and the consistent improvement to several document-level RE models.
[[2207.11528] Supporting peace negotiations in the Yemen war through machine learning](http://arxiv.org/abs/2207.11528)
Today's conflicts are becoming increasingly complex, fluid and fragmented, often involving a host of national and international actors with multiple and often divergent interests. This development poses significant challenges for conflict mediation, as mediators struggle to make sense of conflict dynamics, such as the range of conflict parties and the evolution of their political positions, the distinction between relevant and less relevant actors in peace-making, or the identification of key conflict issues and their interdependence. International peace efforts appear ill-equipped to successfully address these challenges. While technology is already being experimented with and used in a range of conflict related fields, such as conflict predicting or information gathering, less attention has been given to how technology can contribute to conflict mediation. This case study contributes to emerging research on the use of state-of-the-art machine learning technologies and techniques in conflict mediation processes. Using dialogue transcripts from peace negotiations in Yemen, this study shows how machine-learning can effectively support mediating teams by providing them with tools for knowledge management, extraction and conflict analysis. Apart from illustrating the potential of machine learning tools in conflict mediation, the paper also emphasises the importance of interdisciplinary and participatory, co-creation methodology for the development of context-sensitive and targeted tools and to ensure meaningful and responsible implementation.
[[2207.11716] A Cognitive Study on Semantic Similarity Analysis of Large Corpora: A Transformer-based Approach](http://arxiv.org/abs/2207.11716)
Semantic similarity analysis and modeling is a fundamentally acclaimed task in many pioneering applications of natural language processing today. Owing to the sensation of sequential pattern recognition, many neural networks like RNNs and LSTMs have achieved satisfactory results in semantic similarity modeling. However, these solutions are considered inefficient due to their inability to process information in a non-sequential manner, thus leading to the improper extraction of context. Transformers function as the state-of-the-art architecture due to their advantages like non-sequential data processing and self-attention. In this paper, we perform semantic similarity analysis and modeling on the U.S Patent Phrase to Phrase Matching Dataset using both traditional and transformer-based techniques. We experiment upon four different variants of the Decoding Enhanced BERT - DeBERTa and enhance its performance by performing K-Fold Cross-Validation. The experimental results demonstrate our methodology's enhanced performance compared to traditional techniques, with an average Pearson correlation score of 0.79.
[[2207.11353] A Supervised Tensor Dimension Reduction-Based Prognostics Model for Applications with Incomplete Imaging Data](http://arxiv.org/abs/2207.11353)
This paper proposes a supervised dimension reduction methodology for tensor data which has two advantages over most image-based prognostic models. First, the model does not require tensor data to be complete which expands its application to incomplete data. Second, it utilizes time-to-failure (TTF) to supervise the extraction of low-dimensional features which makes the extracted features more effective for the subsequent prognostic. Besides, an optimization algorithm is proposed for parameter estimation and closed-form solutions are derived under certain distributions.
[[2207.11382] Density-Aware Personalized Training for Risk Prediction in Imbalanced Medical Data](http://arxiv.org/abs/2207.11382)
Medical events of interest, such as mortality, often happen at a low rate in electronic medical records, as most admitted patients survive. Training models with this imbalance rate (class density discrepancy) may lead to suboptimal prediction. Traditionally this problem is addressed through ad-hoc methods such as resampling or reweighting but performance in many cases is still limited. We propose a framework for training models for this imbalance issue: 1) we first decouple the feature extraction and classification process, adjusting training batches separately for each component to mitigate bias caused by class density discrepancy; 2) we train the network with both a density-aware loss and a learnable cost matrix for misclassifications. We demonstrate our model's improved performance in real-world medical datasets (TOPCAT and MIMIC-III) to show improved AUC-ROC, AUC-PRC, Brier Skill Score compared with the baselines in the domain.
[[2207.11719] Gradient-based Bi-level Optimization for Deep Learning: A Survey](http://arxiv.org/abs/2207.11719)
Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community including hyperparameter optimization and meta-knowledge extraction. Bi-level optimization embeds one problem within another and the gradient-based category solves the outer level task by computing the hypergradient, which is much more efficient than classical methods such as the evolutionary algorithm. In this survey, we first give a formal definition of the gradient-based bi-level optimization. Secondly, we illustrate how to formulate a research problem as a bi-level optimization problem, which is of great practical use for beginners. More specifically, there are two formulations: the single-task formulation to optimize hyperparameters such as regularization parameters and the distilled data, and the multi-task formulation to extract meta knowledge such as the model initialization. With a bi-level formulation, we then discuss four bi-level optimization solvers to update the outer variable including explicit gradient update, proxy update, implicit function update, and closed-form update. Last but not least, we conclude the survey by pointing out the great potential of gradient-based bi-level optimization on science problems (AI4Science).
[[2207.11759] Spatial-Temporal Federated Learning for Lifelong Person Re-identification on Distributed Edges](http://arxiv.org/abs/2207.11759)
Data drift is a thorny challenge when deploying person re-identification (ReID) models into real-world devices, where the data distribution is significantly different from that of the training environment and keeps changing. To tackle this issue, we propose a federated spatial-temporal incremental learning approach, named FedSTIL, which leverages both lifelong learning and federated learning to continuously optimize models deployed on many distributed edge clients. Unlike previous efforts, FedSTIL aims to mine spatial-temporal correlations among the knowledge learnt from different edge clients. Specifically, the edge clients first periodically extract general representations of drifted data to optimize their local models. Then, the learnt knowledge from edge clients will be aggregated by centralized parameter server, where the knowledge will be selectively and attentively distilled from spatial- and temporal-dimension with carefully designed mechanisms. Finally, the distilled informative spatial-temporal knowledge will be sent back to correlated edge clients to further improve the recognition accuracy of each edge client with a lifelong learning method. Extensive experiments on a mixture of five real-world datasets demonstrate that our method outperforms others by nearly 4% in Rank-1 accuracy, while reducing communication cost by 62%. All implementation codes are publicly available on https://github.com/MSNLAB/Federated-Lifelong-Person-ReID
[[2207.11456] Accelerating Vertical Federated Learning](http://arxiv.org/abs/2207.11456)
Privacy, security and data governance constraints rule out a brute force process in the integration of cross-silo data, which inherits the development of the Internet of Things. Federated learning is proposed to ensure that all parties can collaboratively complete the training task while the data is not out of the local. Vertical federated learning is a specialization of federated learning for distributed features. To preserve privacy, homomorphic encryption is applied to enable encrypted operations without decryption. Nevertheless, together with a robust security guarantee, homomorphic encryption brings extra communication and computation overhead. In this paper, we analyze the current bottlenecks of vertical federated learning under homomorphic encryption comprehensively and numerically. We propose a straggler-resilient and computation-efficient accelerating system that reduces the communication overhead in heterogeneous scenarios by 65.26% at most and reduces the computation overhead caused by homomorphic encryption by 40.66% at most. Our system can improve the robustness and efficiency of the current vertical federated learning framework without loss of security.
[[2207.11447] Handling Data Heterogeneity in Federated Learning via Knowledge Fusion](http://arxiv.org/abs/2207.11447)
Federated learning (FL) supports distributed training of a global machine learning model across multiple clients with the help from a central server. The local dataset held by each client is never exchanged in FL, so the local dataset privacy is protected. Although FL is increasingly popular, data heterogeneity across different clients leads to the client model drift issue and results in model performance degradation and poor model fairness. To address the issue, we design Federated learning with global-local Knowledge Fusion (FedKF) scheme in this paper. The key idea in FedKF is to let the server return the global knowledge to be fused with the local knowledge in each training round so that the local model can be regularized towards the global optima. Thus, the client model drift issue can be mitigated. In FedKF, we first propose the active-inactive model aggregation technique that supports a precise global knowledge representation. Then, we propose a data-free knowledge distillation (KD) approach to facilitate the KD from the global model to the local model while the local model can still learn the local knowledge (embedded in the local dataset) simultaneously, thereby realizing the global-local knowledge fusion process. The theoretical analysis and intensive experiments demonstrate that FedKF achieves high model performance, high fairness, and privacy-preserving simultaneously. The project source codes will be released on GitHub after the paper review.
[[2207.11812] Federated Graph Machine Learning: A Survey of Concepts, Techniques, and Applications](http://arxiv.org/abs/2207.11812)
Graph machine learning has gained great attention in both academia and industry recently. Most of the graph machine learning models, such as Graph Neural Networks (GNNs), are trained over massive graph data. However, in many real-world scenarios, such as hospitalization prediction in healthcare systems, the graph data is usually stored at multiple data owners and cannot be directly accessed by any other parties due to privacy concerns and regulation restrictions. Federated Graph Machine Learning (FGML) is a promising solution to tackle this challenge by training graph machine learning models in a federated manner. In this survey, we conduct a comprehensive review of the literature in FGML. Specifically, we first provide a new taxonomy to divide the existing problems in FGML into two settings, namely, \emph{FL with structured data} and \emph{structured FL}. Then, we review the mainstream techniques in each setting and elaborate on how they address the challenges under FGML. In addition, we summarize the real-world applications of FGML from different domains and introduce open graph datasets and platforms adopted in FGML. Finally, we present several limitations in the existing studies with promising research directions in this field.
[[2207.11836] Federated Graph Contrastive Learning](http://arxiv.org/abs/2207.11836)
Graph learning models are critical tools for researchers to explore graph-structured data. To train a capable graph learning model, a conventional method uses sufficient training data to train a graph model on a single device. However, it is prohibitive to do so in real-world scenarios due to privacy concerns. Federated learning provides a feasible solution to address such limitations via introducing various privacy-preserving mechanisms, such as differential privacy on graph edges. Nevertheless, differential privacy in federated graph learning secures the classified information maintained in graphs. It degrades the performances of the graph learning models. In this paper, we investigate how to implement differential privacy on graph edges and observe the performances decreasing in the experiments. We also note that the differential privacy on graph edges introduces noises to perturb graph proximity, which is one of the graph augmentations in graph contrastive learning. Inspired by that, we propose to leverage the advantages of graph contrastive learning to alleviate the performance dropping caused by differential privacy. Extensive experiments are conducted with several representative graph models and widely-used datasets, showing that contrastive learning indeed alleviates the models' performance dropping caused by differential privacy.
[[2207.11345] Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities](http://arxiv.org/abs/2207.11345)
As for other forms of AI, speech recognition has recently been examined with respect to performance disparities across different user cohorts. One approach to achieve fairness in speech recognition is to (1) identify speaker cohorts that suffer from subpar performance and (2) apply fairness mitigation measures targeting the cohorts discovered. In this paper, we report on initial findings with both discovery and mitigation of performance disparities using data from a product-scale AI assistant speech recognition system. We compare cohort discovery based on geographic and demographic information to a more scalable method that groups speakers without human labels, using speaker embedding technology. For fairness mitigation, we find that oversampling of underrepresented cohorts, as well as modeling speaker cohort membership by additional input variables, reduces the gap between top- and bottom-performing cohorts, without deteriorating overall recognition accuracy.
[[2207.11385] Causal Fairness Analysis](http://arxiv.org/abs/2207.11385)
Decision-making systems based on AI and machine learning have been used throughout a wide range of real-world scenarios, including healthcare, law enforcement, education, and finance. It is no longer far-fetched to envision a future where autonomous systems will be driving entire business decisions and, more broadly, supporting large-scale decision-making infrastructure to solve society's most challenging problems. Issues of unfairness and discrimination are pervasive when decisions are being made by humans, and remain (or are potentially amplified) when decisions are made using machines with little transparency, accountability, and fairness. In this paper, we introduce a framework for \textit{causal fairness analysis} with the intent of filling in this gap, i.e., understanding, modeling, and possibly solving issues of fairness in decision-making settings. The main insight of our approach will be to link the quantification of the disparities present on the observed data with the underlying, and often unobserved, collection of causal mechanisms that generate the disparity in the first place, challenge we call the Fundamental Problem of Causal Fairness Analysis (FPCFA). In order to solve the FPCFA, we study the problem of decomposing variations and empirical measures of fairness that attribute such variations to structural mechanisms and different units of the population. Our effort culminates in the Fairness Map, which is the first systematic attempt to organize and explain the relationship between different criteria found in the literature. Finally, we study which causal assumptions are minimally needed for performing causal fairness analysis and propose a Fairness Cookbook, which allows data scientists to assess the existence of disparate impact and disparate treatment.
[[2207.11837] Inter-model Interpretability: Self-supervised Models as a Case Study](http://arxiv.org/abs/2207.11837)
Since early machine learning models, metrics such as accuracy and precision have been the de facto way to evaluate and compare trained models. However, a single metric number doesn't fully capture the similarities and differences between models, especially in the computer vision domain. A model with high accuracy on a certain dataset might provide a lower accuracy on another dataset, without any further insights. To address this problem we build on a recent interpretability technique called Dissect to introduce \textit{inter-model interpretability}, which determines how models relate or complement each other based on the visual concepts they have learned (such as objects and materials). Towards this goal, we project 13 top-performing self-supervised models into a Learned Concepts Embedding (LCE) space that reveals proximities among models from the perspective of learned concepts. We further crossed this information with the performance of these models on four computer vision tasks and 15 datasets. The experiment allowed us to categorize the models into three categories and revealed for the first time the type of visual concepts different tasks requires. This is a step forward for designing cross-task learning algorithms.
[[2207.11564] A general-purpose method for applying Explainable AI for Anomaly Detection](http://arxiv.org/abs/2207.11564)
The need for explainable AI (XAI) is well established but relatively little has been published outside of the supervised learning paradigm. This paper focuses on a principled approach to applying explainability and interpretability to the task of unsupervised anomaly detection. We argue that explainability is principally an algorithmic task and interpretability is principally a cognitive task, and draw on insights from the cognitive sciences to propose a general-purpose method for practical diagnosis using explained anomalies. We define Attribution Error, and demonstrate, using real-world labeled datasets, that our method based on Integrated Gradients (IG) yields significantly lower attribution errors than alternative methods.
[[2207.11559] Tensor-based Multi-view Spectral Clustering via Shared Latent Space](http://arxiv.org/abs/2207.11559)
Multi-view Spectral Clustering (MvSC) attracts increasing attention due to diverse data sources. However, most existing works are prohibited in out-of-sample predictions and overlook model interpretability and exploration of clustering results. In this paper, a new method for MvSC is proposed via a shared latent space from the Restricted Kernel Machine framework. Through the lens of conjugate feature duality, we cast the weighted kernel principal component analysis problem for MvSC and develop a modified weighted conjugate feature duality to formulate dual variables. In our method, the dual variables, playing the role of hidden features, are shared by all views to construct a common latent space, coupling the views by learning projections from view-specific spaces. Such single latent space promotes well-separated clusters and provides straightforward data exploration, facilitating visualization and interpretation. Our method requires only a single eigendecomposition, whose dimension is independent of the number of views. To boost higher-order correlations, tensor-based modelling is introduced without increasing computational complexity. Our method can be flexibly applied with out-of-sample extensions, enabling greatly improved efficiency for large-scale data with fixed-size kernel schemes. Numerical experiments verify that our method is effective regarding accuracy, efficiency, and interpretability, showing a sharp eigenvalue decay and distinct latent variable distributions.
[[2207.11735] AMS-Net: Adaptive Multiscale Sparse Neural Network with Interpretable Basis Expansion for Multiphase Flow Problems](http://arxiv.org/abs/2207.11735)
In this work, we propose an adaptive sparse learning algorithm that can be applied to learn the physical processes and obtain a sparse representation of the solution given a large snapshot space. Assume that there is a rich class of precomputed basis functions that can be used to approximate the quantity of interest. We then design a neural network architecture to learn the coefficients of solutions in the spaces which are spanned by these basis functions. The information of the basis functions are incorporated in the loss function, which minimizes the differences between the downscaled reduced order solutions and reference solutions at multiple time steps. The network contains multiple submodules and the solutions at different time steps can be learned simultaneously. We propose some strategies in the learning framework to identify important degrees of freedom. To find a sparse solution representation, a soft thresholding operator is applied to enforce the sparsity of the output coefficient vectors of the neural network. To avoid over-simplification and enrich the approximation space, some degrees of freedom can be added back to the system through a greedy algorithm. In both scenarios, that is, removing and adding degrees of freedom, the corresponding network connections are pruned or reactivated guided by the magnitude of the solution coefficients obtained from the network outputs. The proposed adaptive learning process is applied to some toy case examples to demonstrate that it can achieve a good basis selection and accurate approximation. More numerical tests are performed on two-phase multiscale flow problems to show the capability and interpretability of the proposed method on complicated applications.