[[2211.14442] Contract-Backed Digital Cash](http://arxiv.org/abs/2211.14442) #secure
We characterize digital cash as the digital equivalent of physical cash: secure, fungible, decentralized, directly controlled, privacy-preserving; but enhanced with qualitatively new functionality. It is extremely efficiently transferable and, most importantly, transactional or, more generally, contract-backed. This facilitates fully automated, guaranteed transactional execution of atomic resource exchanges and more complex contracts, without a multitude of intermediaries and expensive or slow semi-manual processes.
A didactic objective is separating money characteristics from technology aspects such as specific blockchain and distributed ledger systems to help disentangle discussions of digital money design from implementation techniques.
We finally discuss the power and role of programmable (contract-backed) digital money in case studies: tokenization of invoice debt using smart contracts on Ethereum, with stablecoins serving as digital money; smart contracts for disbursing payments transparently and reliably in accordance with social legislation; and a Danish e-krone for crowdfunding public and private community projets.
These contributions are made in independent chapters by participants of the Working Group on Digital Cash at Copenhagen FinTech in 2018 and 2019, which have not been published before.
Collectively, the contributions illustrate the design space and potential of digital money when powered by smart digital contracts that effectively eliminate both counterparty risk (somebody does not pay or does not deliver) and settlement risk (a trade fails and needs to be aborted) orders of magnitude faster than in current financial practice.
[[2211.14667] Deep Fake Detection, Deterrence and Response: Challenges and Opportunities](http://arxiv.org/abs/2211.14667) #security
According to the 2020 cyber threat defence report, 78% of Canadian organizations experienced at least one successful cyberattack in 2020. The consequences of such attacks vary from privacy compromises to immersing damage costs for individuals, companies, and countries. Specialists predict that the global loss from cybercrime will reach 10.5 trillion US dollars annually by
[[2211.14790] Devils in the Clouds: An Evolutionary Study of Telnet Bot Loaders](http://arxiv.org/abs/2211.14790) #security
One of the innovations brought by Mirai and its derived malware is the adoption of self-contained loaders for infecting IoT devices and recruiting them in botnets. Functionally decoupled from other botnet components and not embedded in the payload, loaders cannot be analysed using conventional approaches that rely on honeypots for capturing samples. Different approaches are necessary for studying the loaders evolution and defining a genealogy. To address the insufficient knowledge about loaders' lineage in existing studies, in this paper, we propose a semantic-aware method to measure, categorize, and compare different loader servers, with the goal of highlighting their evolution, independent from the payload evolution. Leveraging behavior-based metrics, we cluster the discovered loaders and define eight families to determine the genealogy and draw a homology map. Our study shows that the source code of Mirai is evolving and spawning new botnets with new capabilities, both on the client side and the server side. In turn, shedding light on the infection loaders can help the cybersecurity community to improve detection and prevention tools.
[[2211.14428] Utility Assessment of Synthetic Data Generation Methods](http://arxiv.org/abs/2211.14428) #privacy
Big data analysis poses the dual problem of privacy preservation and utility, i.e., how accurate data analyses remain after transforming original data in order to protect the privacy of the individuals that the data is about - and whether they are accurate enough to be meaningful. In this paper, we thus investigate across several datasets whether different methods of generating fully synthetic data vary in their utility a priori (when the specific analyses to be performed on the data are not known yet), how closely their results conform to analyses on original data a posteriori, and whether these two effects are correlated. We find some methods (decision-tree based) to perform better than others across the board, sizeable effects of some choices of imputation parameters (notably the number of released datasets), no correlation between broad utility metrics and analysis accuracy, and varying correlations for narrow metrics. We did get promising findings for classification tasks when using synthetic data for training machine learning models, which we consider worth exploring further also in terms of mitigating privacy attacks against ML models such as membership inference and model inversion.
[[2211.14669] Game Theoretic Mixed Experts for Combinational Adversarial Machine Learning](http://arxiv.org/abs/2211.14669) #defense
Recent advances in adversarial machine learning have shown that defenses considered to be robust are actually susceptible to adversarial attacks which are specifically tailored to target their weaknesses. These defenses include Barrage of Random Transforms (BaRT), Friendly Adversarial Training (FAT), Trash is Treasure (TiT) and ensemble models made up of Vision Transformers (ViTs), Big Transfer models and Spiking Neural Networks (SNNs). A natural question arises: how can one best leverage a combination of adversarial defenses to thwart such attacks? In this paper, we provide a game-theoretic framework for ensemble adversarial attacks and defenses which answers this question. In addition to our framework we produce the first adversarial defense transferability study to further motivate a need for combinational defenses utilizing a diverse set of defense architectures. Our framework is called Game theoretic Mixed Experts (GaME) and is designed to find the Mixed-Nash strategy for a defender when facing an attacker employing compositional adversarial attacks. We show that this framework creates an ensemble of defenses with greater robustness than multiple state-of-the-art, single-model defenses in addition to combinational defenses with uniform probability distributions. Overall, our framework and analyses advance the field of adversarial machine learning by yielding new insights into compositional attack and defense formulations.
[[2211.14794] Traditional Classification Neural Networks are Good Generators: They are Competitive with DDPMs and GANs](http://arxiv.org/abs/2211.14794) #attack
Classifiers and generators have long been separated. We break down this separation and showcase that conventional neural network classifiers can generate high-quality images of a large number of categories, being comparable to the state-of-the-art generative models (e.g., DDPMs and GANs). We achieve this by computing the partial derivative of the classification loss function with respect to the input to optimize the input to produce an image. Since it is widely known that directly optimizing the inputs is similar to targeted adversarial attacks incapable of generating human-meaningful images, we propose a mask-based stochastic reconstruction module to make the gradients semantic-aware to synthesize plausible images. We further propose a progressive-resolution technique to guarantee fidelity, which produces photorealistic images. Furthermore, we introduce a distance metric loss and a non-trivial distribution loss to ensure classification neural networks can synthesize diverse and high-fidelity images. Using traditional neural network classifiers, we can generate good-quality images of 256$\times$256 resolution on ImageNet. Intriguingly, our method is also applicable to text-to-image generation by regarding image-text foundation models as generalized classifiers.
Proving that classifiers have learned the data distribution and are ready for image generation has far-reaching implications, for classifiers are much easier to train than generative models like DDPMs and GANs. We don't even need to train classification models because tons of public ones are available for download. Also, this holds great potential for the interpretability and robustness of classifiers.
[[2211.14719] BadPrompt: Backdoor Attacks on Continuous Prompts](http://arxiv.org/abs/2211.14719) #attack
The prompt-based learning paradigm has gained much research attention recently. It has achieved state-of-the-art performance on several NLP tasks, especially in the few-shot scenarios. While steering the downstream tasks, few works have been reported to investigate the security problems of the prompt-based models. In this paper, we conduct the first study on the vulnerability of the continuous prompt learning algorithm to backdoor attacks. We observe that the few-shot scenarios have posed a great challenge to backdoor attacks on the prompt-based models, limiting the usability of existing NLP backdoor methods. To address this challenge, we propose BadPrompt, a lightweight and task-adaptive algorithm, to backdoor attack continuous prompts. Specially, BadPrompt first generates candidate triggers which are indicative for predicting the targeted label and dissimilar to the samples of the non-targeted labels. Then, it automatically selects the most effective and invisible trigger for each sample with an adaptive trigger optimization algorithm. We evaluate the performance of BadPrompt on five datasets and two continuous prompt models. The results exhibit the abilities of BadPrompt to effectively attack continuous prompts while maintaining high performance on the clean test sets, outperforming the baseline models by a large margin. The source code of BadPrompt is publicly available at https://github.com/papersPapers/BadPrompt.
[[2211.14440] Don't Watch Me: A Spatio-Temporal Trojan Attack on Deep-Reinforcement-Learning-Augment Autonomous Driving](http://arxiv.org/abs/2211.14440) #attack
Deep reinforcement learning (DRL) is one of the most popular algorithms to realize an autonomous driving (AD) system. The key success factor of DRL is that it embraces the perception capability of deep neural networks which, however, have been proven vulnerable to Trojan attacks. Trojan attacks have been widely explored in supervised learning (SL) tasks (e.g., image classification), but rarely in sequential decision-making tasks solved by DRL. Hence, in this paper, we explore Trojan attacks on DRL for AD tasks. First, we propose a spatio-temporal DRL algorithm based on the recurrent neural network and attention mechanism to prove that capturing spatio-temporal traffic features is the key factor to the effectiveness and safety of a DRL-augment AD system. We then design a spatial-temporal Trojan attack on DRL policies, where the trigger is hidden in a sequence of spatial and temporal traffic features, rather than a single instant state used in existing Trojan on SL and DRL tasks. With our Trojan, the adversary acts as a surrounding normal vehicle and can trigger attacks via specific spatial-temporal driving behaviors, rather than physical or wireless access. Through extensive experiments, we show that while capturing spatio-temporal traffic features can improve the performance of DRL for different AD tasks, they suffer from Trojan attacks since our designed Trojan shows high stealthy (various spatio-temporal trigger patterns), effective (less than 3.1\% performance variance rate and more than 98.5\% attack success rate), and sustainable to existing advanced defenses.
[[2211.14642] SCAPHY: Detecting Modern ICS Attacks by Correlating Behaviors in SCADA and PHYsical](http://arxiv.org/abs/2211.14642) #attack
Modern Industrial Control Systems (ICS) attacks evade existing tools by using knowledge of ICS processes to blend their activities with benign Supervisory Control and Data Acquisition (SCADA) operation, causing physical world damages. We present SCAPHY to detect ICS attacks in SCADA by leveraging the unique execution phases of SCADA to identify the limited set of legitimate behaviors to control the physical world in different phases, which differentiates from attackers activities. For example, it is typical for SCADA to setup ICS device objects during initialization, but anomalous during processcontrol. To extract unique behaviors of SCADA execution phases, SCAPHY first leverages open ICS conventions to generate a novel physical process dependency and impact graph (PDIG) to identify disruptive physical states. SCAPHY then uses PDIG to inform a physical process-aware dynamic analysis, whereby code paths of SCADA process-control execution is induced to reveal API call behaviors unique to legitimate process-control phases. Using this established behavior, SCAPHY selectively monitors attackers physical world-targeted activities that violates legitimate processcontrol behaviors. We evaluated SCAPHY at a U.S. national lab ICS testbed environment. Using diverse ICS deployment scenarios and attacks across 4 ICS industries, SCAPHY achieved 95% accuracy & 3.5% false positives (FP), compared to 47.5% accuracy and 25% FP of existing work. We analyze SCAPHYs resilience to futuristic attacks where attacker knows our approach.
[[2211.14395] Deep Learning Training Procedure Augmentations](http://arxiv.org/abs/2211.14395) #robust
Recent advances in Deep Learning have greatly improved performance on various tasks such as object detection, image segmentation, sentiment analysis. The focus of most research directions up until very recently has been on beating state-of-the-art results. This has materialized in the utilization of bigger and bigger models and techniques which help the training procedure to extract more predictive power out of a given dataset. While this has lead to great results, many of which with real-world applications, other relevant aspects of deep learning have remained neglected and unknown. In this work, we will present several novel deep learning training techniques which, while capable of offering significant performance gains they also reveal several interesting analysis results regarding convergence speed, optimization landscape smoothness, and adversarial robustness. The methods presented in this work are the following:
$\bullet$ Perfect Ordering Approximation; a generalized model agnostic curriculum learning approach. The results show the effectiveness of the technique for improving training time as well as offer some new insight into the training process of deep networks.
$\bullet$ Cascading Sum Augmentation; an extension of mixup capable of utilizing more data points for linear interpolation by leveraging a smoother optimization landscape. This can be used for computer vision tasks in order to improve both prediction performance as well as improve passive model robustness.
[[2211.14512] Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation](http://arxiv.org/abs/2211.14512) #robust
Semantic segmentation models classify pixels into a set of known (``in-distribution'') visual classes. When deployed in an open world, the reliability of these models depends on their ability not only to classify in-distribution pixels but also to detect out-of-distribution (OoD) pixels. Historically, the poor OoD detection performance of these models has motivated the design of methods based on model re-training using synthetic training images that include OoD visual objects. Although successful, these re-trained methods have two issues: 1) their in-distribution segmentation accuracy may drop during re-training, and 2) their OoD detection accuracy does not generalise well to new contexts (e.g., country surroundings) outside the training set (e.g., city surroundings). In this paper, we mitigate these issues with: (i) a new residual pattern learning (RPL) module that assists the segmentation model to detect OoD pixels without affecting the inlier segmentation performance; and (ii) a novel context-robust contrastive learning (CoroCL) that enforces RPL to robustly detect OoD pixels among various contexts. Our approach improves by around 10\% FPR and 7\% AuPRC the previous state-of-the-art in Fishyscapes, Segment-Me-If-You-Can, and RoadAnomaly datasets. Our code is available at: https://github.com/yyliu01/RPL.
[[2211.14513] Rethinking Alignment and Uniformity in Unsupervised Image Semantic Segmentation](http://arxiv.org/abs/2211.14513) #robust
Unsupervised image semantic segmentation(UISS) aims to match low-level visual features with semantic-level representations without outer supervision. In this paper, we address the critical properties from the view of feature alignments and feature uniformity for UISS models. We also make a comparison between UISS and image-wise representation learning. Based on the analysis, we argue that the existing MI-based methods in UISS suffer from representation collapse. By this, we proposed a robust network called Semantic Attention Network(SAN), in which a new module Semantic Attention(SEAT) is proposed to generate pixel-wise and semantic features dynamically. Experimental results on multiple semantic segmentation benchmarks show that our unsupervised segmentation framework specializes in catching semantic representations, which outperforms all the unpretrained and even several pretrained methods.
[[2211.14521] Robust One-shot Segmentation of Brain Tissues via Image-aligned Style Transformation](http://arxiv.org/abs/2211.14521) #robust
One-shot segmentation of brain tissues is typically a dual-model iterative learning: a registration model (reg-model) warps a carefully-labeled atlas onto unlabeled images to initialize their pseudo masks for training a segmentation model (seg-model); the seg-model revises the pseudo masks to enhance the reg-model for a better warping in the next iteration. However, there is a key weakness in such dual-model iteration that the spatial misalignment inevitably caused by the reg-model could misguide the seg-model, which makes it converge on an inferior segmentation performance eventually. In this paper, we propose a novel image-aligned style transformation to reinforce the dual-model iterative learning for robust one-shot segmentation of brain tissues. Specifically, we first utilize the reg-model to warp the atlas onto an unlabeled image, and then employ the Fourier-based amplitude exchange with perturbation to transplant the style of the unlabeled image into the aligned atlas. This allows the subsequent seg-model to learn on the aligned and style-transferred copies of the atlas instead of unlabeled images, which naturally guarantees the correct spatial correspondence of an image-mask training pair, without sacrificing the diversity of intensity patterns carried by the unlabeled images. Furthermore, we introduce a feature-aware content consistency in addition to the image-level similarity to constrain the reg-model for a promising initialization, which avoids the collapse of image-aligned style transformation in the first iteration. Experimental results on two public datasets demonstrate 1) a competitive segmentation performance of our method compared to the fully-supervised method, and 2) a superior performance over other state-of-the-arts with an increase of average Dice by up to 4.67%. The source code is available.
[[2211.14627] Where to Pay Attention in Sparse Training for Feature Selection?](http://arxiv.org/abs/2211.14627) #robust
A new line of research for feature selection based on neural networks has recently emerged. Despite its superiority to classical methods, it requires many training iterations to converge and detect informative features. The computational time becomes prohibitively long for datasets with a large number of samples or a very high dimensional feature space. In this paper, we present a new efficient unsupervised method for feature selection based on sparse autoencoders. In particular, we propose a new sparse training algorithm that optimizes a model's sparse topology during training to pay attention to informative features quickly. The attention-based adaptation of the sparse topology enables fast detection of informative features after a few training iterations. We performed extensive experiments on 10 datasets of different types, including image, speech, text, artificial, and biological. They cover a wide range of characteristics, such as low and high-dimensional feature spaces, and few and large training samples. Our proposed approach outperforms the state-of-the-art methods in terms of selecting informative features while reducing training iterations and computational costs substantially. Moreover, the experiments show the robustness of our method in extremely noisy environments.
[[2211.14715] A Knowledge-based Learning Framework for Self-supervised Pre-training Towards Enhanced Recognition of Medical Images](http://arxiv.org/abs/2211.14715) #robust
Self-supervised pre-training has become the priory choice to establish reliable models for automated recognition of massive medical images, which are routinely annotation-free, without semantics, and without guarantee of quality. Note that this paradigm is still at its infancy and limited by closely related open issues: 1) how to learn robust representations in an unsupervised manner from unlabelled medical images of low diversity in samples? and 2) how to obtain the most significant representations demanded by a high-quality segmentation? Aiming at these issues, this study proposes a knowledge-based learning framework towards enhanced recognition of medical images, which works in three phases by synergizing contrastive learning and generative learning models: 1) Sample Space Diversification: Reconstructive proxy tasks have been enabled to embed a priori knowledge with context highlighted to diversify the expanded sample space; 2) Enhanced Representation Learning: Informative noise-contrastive estimation loss regularizes the encoder to enhance representation learning of annotation-free images; 3) Correlated Optimization: Optimization operations in pre-training the encoder and the decoder have been correlated via image restoration from proxy tasks, targeting the need for semantic segmentation. Extensive experiments have been performed on various public medical image datasets (e.g., CheXpert and DRIVE) against the state-of-the-art counterparts (e.g., SimCLR and MoCo), and results demonstrate that: The proposed framework statistically excels in self-supervised benchmarks, achieving 2.08, 1.23, 1.12, 0.76 and 1.38 percentage points improvements over SimCLR in AUC/Dice. The proposed framework achieves label-efficient semi-supervised learning, e.g., reducing the annotation cost by up to 99% in pathological classification.
[[2211.14736] Attribution-based XAI Methods in Computer Vision: A Review](http://arxiv.org/abs/2211.14736) #robust
The advancements in deep learning-based methods for visual perception tasks have seen astounding growth in the last decade, with widespread adoption in a plethora of application areas from autonomous driving to clinical decision support systems. Despite their impressive performance, these deep learning-based models remain fairly opaque in their decision-making process, making their deployment in human-critical tasks a risky endeavor. This in turn makes understanding the decisions made by these models crucial for their reliable deployment. Explainable AI (XAI) methods attempt to address this by offering explanations for such black-box deep learning methods. In this paper, we provide a comprehensive survey of attribution-based XAI methods in computer vision and review the existing literature for gradient-based, perturbation-based, and contrastive methods for XAI, and provide insights on the key challenges in developing and evaluating robust XAI methods.
[[2211.14750] Conditioning Covert Geo-Location (CGL) Detection on Semantic Class Information](http://arxiv.org/abs/2211.14750) #robust
The primary goal of artificial intelligence is to mimic humans. Therefore, to advance toward this goal, the AI community attempts to imitate qualities/skills possessed by humans and imbibes them into machines with the help of datasets/tasks. Earlier, many tasks which require knowledge about the objects present in an image are satisfactorily solved by vision models. Recently, with the aim to incorporate knowledge about non-object image regions (hideouts, turns, and other obscured regions), a task for identification of potential hideouts termed Covert Geo-Location (CGL) detection was proposed by Saha et al. It involves identification of image regions which have the potential to either cause an imminent threat or appear as target zones to be accessed for further investigation to identify any occluded objects. Only certain occluding items belonging to certain semantic classes can give rise to CGLs. This fact was overlooked by Saha et al. and no attempts were made to utilize semantic class information, which is crucial for CGL detection. In this paper, we propose a multitask-learning-based approach to achieve 2 goals - i) extraction of features having semantic class information; ii) robust training of the common encoder, exploiting large standard annotated datasets as training set for the auxiliary task (semantic segmentation). To explicitly incorporate class information in the features extracted by the encoder, we have further employed attention mechanism in a novel manner. We have also proposed a better evaluation metric for CGL detection that gives more weightage to recognition rather than precise localization. Experimental evaluations performed on the CGL dataset, demonstrate a significant increase in performance of about 3% to 14% mIoU and 3% to 16% DaR on split 1, and 1% mIoU and 1% to 2% DaR on split 2 over SOTA, serving as a testimony to the superiority of our approach.
[[2211.14411] c-TPE: Generalizing Tree-structured Parzen Estimator with Inequality Constraints for Continuous and Categorical Hyperparameter Optimization](http://arxiv.org/abs/2211.14411) #robust
Hyperparameter optimization (HPO) is crucial for strong performance of deep learning algorithms. A widely-used versatile HPO method is a variant of Bayesian optimization called tree-structured Parzen estimator (TPE), which splits data into good and bad groups and uses the density ratio of those groups as an acquisition function (AF). However, real-world applications often have some constraints, such as memory requirements, or latency. In this paper, we present an extension of TPE to constrained optimization (c-TPE) via simple factorization of AFs. The experiments demonstrate c-TPE is robust to various constraint levels and exhibits the best average rank performance among existing methods with statistical significance on search spaces with categorical parameters on 81 settings.
[[2211.14424] Supervised Contrastive Prototype Learning: Augmentation Free Robust Neural Network](http://arxiv.org/abs/2211.14424) #robust
Transformations in the input space of Deep Neural Networks (DNN) lead to unintended changes in the feature space. Almost perceptually identical inputs, such as adversarial examples, can have significantly distant feature representations. On the contrary, Out-of-Distribution (OOD) samples can have highly similar feature representations to training set samples. Our theoretical analysis for DNNs trained with a categorical classification head suggests that the inflexible logit space restricted by the classification problem size is one of the root causes for the lack of $\textit{robustness}$. Our second observation is that DNNs over-fit to the training augmentation technique and do not learn $\textit{nuance invariant}$ representations. Inspired by the recent success of prototypical and contrastive learning frameworks for both improving robustness and learning nuance invariant representations, we propose a training framework, $\textbf{Supervised Contrastive Prototype Learning}$ (SCPL). We use N-pair contrastive loss with prototypes of the same and opposite classes and replace a categorical classification head with a $\textbf{Prototype Classification Head}$ (PCH). Our approach is $\textit{sample efficient}$, does not require $\textit{sample mining}$, can be implemented on any existing DNN without modification to their architecture, and combined with other training augmentation techniques. We empirically evaluate the $\textbf{clean}$ robustness of our method on out-of-distribution and adversarial samples. Our framework outperforms other state-of-the-art contrastive and prototype learning approaches in $\textit{robustness}$.
[[2211.14701] Spatio-Temporal Meta-Graph Learning for Traffic Forecasting](http://arxiv.org/abs/2211.14701) #robust
Traffic forecasting as a canonical task of multivariate time series forecasting has been a significant research topic in AI community. To address the spatio-temporal heterogeneity and non-stationarity implied in the traffic stream, in this study, we propose Spatio-Temporal Meta-Graph Learning as a novel Graph Structure Learning mechanism on spatio-temporal data. Specifically, we implement this idea into Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging the Meta-Graph Learner powered by a Meta-Node Bank into GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a new large-scale traffic speed dataset in which traffic incident information is contained. Our model outperformed the state-of-the-arts to a large degree on all three datasets (over 27% MAE and 34% RMSE). Besides, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle the road links and time slots with different patterns and be robustly adaptive to any anomalous traffic situations. Codes and datasets are available at https://github.com/deepkashiwa20/MegaCRN.
[[2211.14797] Latent SHAP: Toward Practical Human-Interpretable Explanations](http://arxiv.org/abs/2211.14797) #robust
Model agnostic feature attribution algorithms (such as SHAP and LIME) are ubiquitous techniques for explaining the decisions of complex classification models, such as deep neural networks. However, since complex classification models produce superior performance when trained on low-level (or encoded) features, in many cases, the explanations generated by these algorithms are neither interpretable nor usable by humans. Methods proposed in recent studies that support the generation of human-interpretable explanations are impractical, because they require a fully invertible transformation function that maps the model's input features to the human-interpretable features. In this work, we introduce Latent SHAP, a black-box feature attribution framework that provides human-interpretable explanations, without the requirement for a fully invertible transformation function. We demonstrate Latent SHAP's effectiveness using (1) a controlled experiment where invertible transformation functions are available, which enables robust quantitative evaluation of our method, and (2) celebrity attractiveness classification (using the CelebA dataset) where invertible transformation functions are not available, which enables thorough qualitative evaluation of our method.
[[2211.14647] Hacky Racers: Exploiting Instruction-Level Parallelism to Generate Stealthy Fine-Grained Timers](http://arxiv.org/abs/2211.14647) #steal
Side-channel attacks pose serious threats to many security models, especially sandbox-based browsers. While transient-execution side channels in out-of-order processors have previously been blamed for vulnerabilities such as Spectre and Meltdown, we show that in fact, the capability of out-of-order execution \emph{itself} to cause mayhem is far more general.
We develop Hacky Racers, a new type of timing gadget that uses instruction-level parallelism, another key feature of out-of-order execution, to measure arbitrary fine-grained timing differences, even in the presence of highly restricted JavaScript sandbox environments. While such environments try to mitigate timing side channels by reducing timer precision and removing language features such as \textit{SharedArrayBuffer} that can be used to indirectly generate timers via thread-level parallelism, no such restrictions can be designed to limit Hacky Racers. We also design versions of Hacky Racers that require no misspeculation whatsoever, demonstrating that transient execution is not the only threat to security from modern microarchitectural performance optimization.
We use Hacky Racers to construct novel \textit{backwards-in-time} Spectre gadgets, which break many hardware countermeasures in the literature by leaking secrets before misspeculation is discovered. We also use them to generate the first known last-level cache eviction set generator in JavaScript that does not require \textit{SharedArrayBuffer} support.
[[2211.14362] Chart-RCNN: Efficient Line Chart Data Extraction from Camera Images](http://arxiv.org/abs/2211.14362) #extraction
Line Chart Data Extraction is a natural extension of Optical Character Recognition where the objective is to recover the underlying numerical information a chart image represents. Some recent works such as ChartOCR approach this problem using multi-stage networks combining OCR models with object detection frameworks. However, most of the existing datasets and models are based on "clean" images such as screenshots that drastically differ from camera photos. In addition, creating domain-specific new datasets requires extensive labeling which can be time-consuming. Our main contributions are as follows: we propose a synthetic data generation framework and a one-stage model that outputs text labels, mark coordinates, and perspective estimation simultaneously. We collected two datasets consisting of real camera photos for evaluation. Results show that our model trained only on synthetic data can be applied to real photos without any fine-tuning and is feasible for real-world application.
[[2211.14607] Sketch2FullStack: Generating Skeleton Code of Full Stack Website and Application from Sketch using Deep Learning and Computer Vision](http://arxiv.org/abs/2211.14607) #extraction
For a full-stack web or app development, it requires a software firm or more specifically a team of experienced developers to contribute a large portion of their time and resources to design the website and then convert it to code. As a result, the efficiency of the development team is significantly reduced when it comes to converting UI wireframes and database schemas into an actual working system. It would save valuable resources and fasten the overall workflow if the clients or developers can automate this process of converting the pre-made full-stack website design to get a partially working if not fully working code. In this paper, we present a novel approach of generating the skeleton code from sketched images using Deep Learning and Computer Vision approaches. The dataset for training are first-hand sketched images of low fidelity wireframes, database schemas and class diagrams. The approach consists of three parts. First, the front-end or UI elements detection and extraction from custom-made UI wireframes. Second, individual database table creation from schema designs and lastly, creating a class file from class diagrams.
[[2211.14654] Unsupervised Wildfire Change Detection based on Contrastive Learning](http://arxiv.org/abs/2211.14654) #extraction
The accurate characterization of the severity of the wildfire event strongly contributes to the characterization of the fuel conditions in fire-prone areas, and provides valuable information for disaster response. The aim of this study is to develop an autonomous system built on top of high-resolution multispectral satellite imagery, with an advanced deep learning method for detecting burned area change. This work proposes an initial exploration of using an unsupervised model for feature extraction in wildfire scenarios. It is based on the contrastive learning technique SimCLR, which is trained to minimize the cosine distance between augmentations of images. The distance between encoded images can also be used for change detection. We propose changes to this method that allows it to be used for unsupervised burned area detection and following downstream tasks. We show that our proposed method outperforms the tested baseline approaches.
[[2211.14739] MNER-QG: An End-to-End MRC framework for Multimodal Named Entity Recognition with Query Grounding](http://arxiv.org/abs/2211.14739) #extraction
Multimodal named entity recognition (MNER) is a critical step in information extraction, which aims to detect entity spans and classify them to corresponding entity types given a sentence-image pair. Existing methods either (1) obtain named entities with coarse-grained visual clues from attention mechanisms, or (2) first detect fine-grained visual regions with toolkits and then recognize named entities. However, they suffer from improper alignment between entity types and visual regions or error propagation in the two-stage manner, which finally imports irrelevant visual information into texts. In this paper, we propose a novel end-to-end framework named MNER-QG that can simultaneously perform MRC-based multimodal named entity recognition and query grounding. Specifically, with the assistance of queries, MNER-QG can provide prior knowledge of entity types and visual regions, and further enhance representations of both texts and images. To conduct the query grounding task, we provide manual annotations and weak supervisions that are obtained via training a highly flexible visual grounding model with transfer learning. We conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MNER-QG outperforms the current state-of-the-art models on the MNER task, and also improves the query grounding performance.
[[2211.14470] Towards Better Document-level Relation Extraction via Iterative Inference](http://arxiv.org/abs/2211.14470) #extraction
Document-level relation extraction (RE) aims to extract the relations between entities from the input document that usually containing many difficultly-predicted entity pairs whose relations can only be predicted through relational inference. Existing methods usually directly predict the relations of all entity pairs of input document in a one-pass manner, ignoring the fact that predictions of some entity pairs heavily depend on the predicted results of other pairs. To deal with this issue, in this paper, we propose a novel document-level RE model with iterative inference. Our model is mainly composed of two modules: 1) a base module expected to provide preliminary relation predictions on entity pairs; 2) an inference module introduced to refine these preliminary predictions by iteratively dealing with difficultly-predicted entity pairs depending on other pairs in an easy-to-hard manner. Unlike previous methods which only consider feature information of entity pairs, our inference module is equipped with two Extended Cross Attention units, allowing it to exploit both feature information and previous predictions of entity pairs during relational inference. Furthermore, we adopt a two-stage strategy to train our model. At the first stage, we only train our base module. During the second stage, we train the whole model, where contrastive learning is introduced to enhance the training of inference module. Experimental results on three commonly-used datasets show that our model consistently outperforms other competitive baselines.
[[2211.14477] PCRED: Zero-shot Relation Triplet Extraction with Potential Candidate Relation Selection and Entity Boundary Detection](http://arxiv.org/abs/2211.14477) #extraction
Zero-shot relation triplet extraction (ZeroRTE) aims to extract relation triplets from unstructured texts, while the relation sets at the training and testing stages are disjoint. Previous state-of-the-art method handles this challenging task by leveraging pretrained language models to generate data as additional training samples, which increases the training cost and severely constrains the model performance. We tackle this task from a new perspective and propose a novel method named PCRED for ZeroRTE with Potential Candidate Relation selection and Entity boundary Detection. The model adopts a relation-first paradigm, which firstly recognizes unseen relations through candidate relation selection. By this approach, the semantics of relations are naturally infused in the context. Entities are extracted based on the context and the semantics of relations subsequently. We evaluate our model on two ZeroRTE datasets. The experiment result shows that our method consistently outperforms previous works. Besides, our model does not rely on any additional data, which boasts the advantages of simplicity and effectiveness. Our code is available at https://anonymous.4open.science/r/PCRED.
[[2211.14437] Unsupervised User-Based Insider Threat Detection Using Bayesian Gaussian Mixture Models](http://arxiv.org/abs/2211.14437) #extraction
Insider threats are a growing concern for organizations due to the amount of damage that their members can inflict by combining their privileged access and domain knowledge. Nonetheless, the detection of such threats is challenging, precisely because of the ability of the authorized personnel to easily conduct malicious actions and because of the immense size and diversity of audit data produced by organizations in which the few malicious footprints are hidden. In this paper, we propose an unsupervised insider threat detection system based on audit data using Bayesian Gaussian Mixture Models. The proposed approach leverages a user-based model to optimize specific behaviors modelization and an automatic feature extraction system based on Word2Vec for ease of use in a real-life scenario. The solution distinguishes itself by not requiring data balancing nor to be trained only on normal instances, and by its little domain knowledge required to implement. Still, results indicate that the proposed method competes with state-of-the-art approaches, presenting a good recall of 88\%, accuracy and true negative rate of 93%, and a false positive rate of 6.9%. For our experiments, we used the benchmark dataset CERT version 4.2.
[[2211.14391] MDA: Availability-Aware Federated Learning Client Selection](http://arxiv.org/abs/2211.14391) #federate
Recently, a new distributed learning scheme called Federated Learning (FL) has been introduced. FL is designed so that server never collects user-owned data meaning it is great at preserving privacy. FL's process starts with the server sending a model to clients, then the clients train that model using their data and send the updated model back to the server. Afterward, the server aggregates all the updates and modifies the global model. This process is repeated until the model converges. This study focuses on an FL setting called cross-device FL, which trains based on a large number of clients. Since many devices may be unavailable in cross-device FL, and communication between the server and all clients is extremely costly, only a fraction of clients gets selected for training at each round. In vanilla FL, clients are selected randomly, which results in an acceptable accuracy but is not ideal from the overall training time perspective, since some clients are slow and can cause some training rounds to be slow. If only fast clients get selected the learning would speed up, but it will be biased toward only the fast clients' data, and the accuracy degrades. Consequently, new client selection techniques have been proposed to improve the training time by considering individual clients' resources and speed. This paper introduces the first availability-aware selection strategy called MDA. The results show that our approach makes learning faster than vanilla FL by up to 6.5%. Moreover, we show that resource heterogeneity-aware techniques are effective but can become even better when combined with our approach, making it faster than the state-of-the-art selectors by up to 16%. Lastly, our approach selects more unique clients for training compared to client selectors that only select fast clients, which reduces our technique's bias.
[[2211.14393] FedSysID: A Federated Approach to Sample-Efficient System Identification](http://arxiv.org/abs/2211.14393) #federate
We study the problem of learning a linear system model from the observations of $M$ clients. The catch: Each client is observing data from a different dynamical system. This work addresses the question of how multiple clients collaboratively learn dynamical models in the presence of heterogeneity. We pose this problem as a federated learning problem and characterize the tension between achievable performance and system heterogeneity. Furthermore, our federated sample complexity result provides a constant factor improvement over the single agent setting. Finally, we describe a meta federated learning algorithm, FedSysID, that leverages existing federated algorithms at the client level.
[[2211.14462] Meta Architecure for Point Cloud Analysis](http://arxiv.org/abs/2211.14462) #fair
Recent advances in 3D point cloud analysis bring a diverse set of network architectures to the field. However, the lack of a unified framework to interpret those networks makes any systematic comparison, contrast, or analysis challenging, and practically limits healthy development of the field. In this paper, we take the initiative to explore and propose a unified framework called PointMeta, to which the popular 3D point cloud analysis approaches could fit. This brings three benefits. First, it allows us to compare different approaches in a fair manner, and use quick experiments to verify any empirical observations or assumptions summarized from the comparison. Second, the big picture brought by PointMeta enables us to think across different components, and revisit common beliefs and key design decisions made by the popular approaches. Third, based on the learnings from the previous two analyses, by doing simple tweaks on the existing approaches, we are able to derive a basic building block, termed PointMetaBase. It shows very strong performance in efficiency and effectiveness through extensive experiments on challenging benchmarks, and thus verifies the necessity and benefits of high-level interpretation, contrast, and comparison like PointMeta. In particular, PointMetaBase surpasses the previous state-of-the-art method by 0.7%/1.4/%2.1% mIoU with only 2%/11%/13% of the computation cost on the S3DIS datasets.
[[2211.14498] The Impact of Racial Distribution in Training Data on Face Recognition Bias: A Closer Look](http://arxiv.org/abs/2211.14498) #fair
Face recognition algorithms, when used in the real world, can be very useful, but they can also be dangerous when biased toward certain demographics. So, it is essential to understand how these algorithms are trained and what factors affect their accuracy and fairness to build better ones. In this study, we shed some light on the effect of racial distribution in the training data on the performance of face recognition models. We conduct 16 different experiments with varying racial distributions of faces in the training data. We analyze these trained models using accuracy metrics, clustering metrics, UMAP projections, face quality, and decision thresholds. We show that a uniform distribution of races in the training datasets alone does not guarantee bias-free face recognition algorithms and how factors like face image quality play a crucial role. We also study the correlation between the clustering metrics and bias to understand whether clustering is a good indicator of bias. Finally, we introduce a metric called racial gradation to study the inter and intra race correlation in facial features and how they affect the learning ability of the face recognition models. With this study, we try to bring more understanding to an essential element of face recognition training, the data. A better understanding of the impact of training data on the bias of face recognition algorithms will aid in creating better datasets and, in turn, better face recognition systems.
[[2211.14358] A Moral- and Event- Centric Inspection of Gender Bias in Fairy Tales at A Large Scale](http://arxiv.org/abs/2211.14358) #fair
Fairy tales are a common resource for young children to learn a language or understand how a society works. However, gender bias, e.g., stereotypical gender roles, in this literature may cause harm and skew children's world view. Instead of decades of qualitative and manual analysis of gender bias in fairy tales, we computationally analyze gender bias in a fairy tale dataset containing 624 fairy tales from 7 different cultures. We specifically examine gender difference in terms of moral foundations, which are measures of human morality, and events, which reveal human activities associated with each character. We find that the number of male characters is two times that of female characters, showing a disproportionate gender representation. Our analysis further reveal stereotypical portrayals of both male and female characters in terms of moral foundations and events. Female characters turn out more associated with care-, loyalty- and sanctity- related moral words, while male characters are more associated with fairness- and authority- related moral words. Female characters' events are often about emotion (e.g., weep), appearance (e.g., comb), household (e.g., bake), etc.; while male characters' events are more about profession (e.g., hunt), violence (e.g., destroy), justice (e.g., judge), etc. Gender bias in terms of moral foundations shows an obvious difference across cultures. For example, female characters are more associated with care and sanctity in high uncertainty-avoidance cultures which are less open to changes and unpredictability. Based on the results, we propose implications for children's literature and early literacy research.
[[2211.14620] The distribution of syntactic dependency distances](http://arxiv.org/abs/2211.14620) #fair
The syntactic structure of a sentence can be represented as a graph where vertices are words and edges indicate syntactic dependencies between them. In this setting, the distance between two syntactically linked words can be defined as the difference between their positions. Here we want to contribute to the characterization of the actual distribution of syntactic dependency distances, and unveil its relationship with short-term memory limitations. We propose a new double-exponential model in which decay in probability is allowed to change after a break-point. This transition could mirror the transition from the processing of words chunks to higher-level structures. We find that a two-regime model -- where the first regime follows either an exponential or a power-law decay -- is the most likely one in all 20 languages we considered, independently of sentence length and annotation style. Moreover, the break-point is fairly stable across languages and averages values of 4-5 words, suggesting that the amount of words that can be simultaneously processed abstracts from the specific language to a high degree. Finally, we give an account of the relation between the best estimated model and the closeness of syntactic dependencies, as measured by a recently introduced optimality score.
[[2211.14383] Interpreting Unfairness in Graph Neural Networks via Training Node Attribution](http://arxiv.org/abs/2211.14383) #fair
Graph Neural Networks (GNNs) have emerged as the leading paradigm for solving graph analytical problems in various real-world applications. Nevertheless, GNNs could potentially render biased predictions towards certain demographic subgroups. Understanding how the bias in predictions arises is critical, as it guides the design of GNN debiasing mechanisms. However, most existing works overwhelmingly focus on GNN debiasing, but fall short on explaining how such bias is induced. In this paper, we study a novel problem of interpreting GNN unfairness through attributing it to the influence of training nodes. Specifically, we propose a novel strategy named Probabilistic Distribution Disparity (PDD) to measure the bias exhibited in GNNs, and develop an algorithm to efficiently estimate the influence of each training node on such bias. We verify the validity of PDD and the effectiveness of influence estimation through experiments on real-world datasets. Finally, we also demonstrate how the proposed framework could be used for debiasing GNNs. Open-source code can be found at https://github.com/yushundong/BIND.
[[2211.14646] Towards Better Input Masking for Convolutional Neural Networks](http://arxiv.org/abs/2211.14646) #interpretability
The ability to remove features from the input of machine learning models is very important to understand and interpret model predictions. However, this is non-trivial for vision models since masking out parts of the input image and replacing them with a baseline color like black or grey typically causes large distribution shifts. Masking may even make the model focus on the masking patterns for its prediction rather than the unmasked portions of the image. In recent work, it has been shown that vision transformers are less affected by such issues as one can simply drop the tokens corresponding to the masked image portions. They are thus more easily interpretable using techniques like LIME which rely on input perturbation. Using the same intuition, we devise a masking technique for CNNs called layer masking, which simulates running the CNN on only the unmasked input. We find that our method is (i) much less disruptive to the model's output and its intermediate activations, and (ii) much better than commonly used masking techniques for input perturbation based interpretability techniques like LIME. Thus, layer masking is able to close the interpretability gap between CNNs and transformers, and even make CNNs more interpretable in many cases.
[[2211.14425] PatchGT: Transformer over Non-trainable Clusters for Learning Graph Representations](http://arxiv.org/abs/2211.14425) #interpretability
Recently the Transformer structure has shown good performances in graph learning tasks. However, these Transformer models directly work on graph nodes and may have difficulties learning high-level information. Inspired by the vision transformer, which applies to image patches, we propose a new Transformer-based graph neural network: Patch Graph Transformer (PatchGT). Unlike previous transformer-based models for learning graph representations, PatchGT learns from non-trainable graph patches, not from nodes directly. It can help save computation and improve the model performance. The key idea is to segment a graph into patches based on spectral clustering without any trainable parameters, with which the model can first use GNN layers to learn patch-level representations and then use Transformer to obtain graph-level representations. The architecture leverages the spectral information of graphs and combines the strengths of GNNs and Transformers. Further, we show the limitations of previous hierarchical trainable clusters theoretically and empirically. We also prove the proposed non-trainable spectral clustering method is permutation invariant and can help address the information bottlenecks in the graph. PatchGT achieves higher expressiveness than 1-WL-type GNNs, and the empirical study shows that PatchGT achieves competitive performances on benchmark datasets and provides interpretability to its predictions. The implementation of our algorithm is released at our Github repo: https://github.com/tufts-ml/PatchGT.
[[2211.14545] Ensemble Multi-Quantile: Adaptively Flexible Distribution Prediction for Uncertainty Quantification](http://arxiv.org/abs/2211.14545) #interpretability
We propose a novel, succinct, and effective approach to quantify uncertainty in machine learning. It incorporates adaptively flexible distribution prediction for $\mathbb{P}(\mathbf{y}|\mathbf{X}=x)$ in regression tasks. For predicting this conditional distribution, its quantiles of probability levels spreading the interval $(0,1)$ are boosted by additive models which are designed by us with intuitions and interpretability. We seek an adaptive balance between the structural integrity and the flexibility for $\mathbb{P}(\mathbf{y}|\mathbf{X}=x)$, while Gaussian assumption results in a lack of flexibility for real data and highly flexible approaches (e.g., estimating the quantiles separately without a distribution structure) inevitably have drawbacks and may not lead to good generalization. This ensemble multi-quantiles approach called EMQ proposed by us is totally data-driven, and can gradually depart from Gaussian and discover the optimal conditional distribution in the boosting. On extensive regression tasks from UCI datasets, we show that EMQ achieves state-of-the-art performance comparing to many recent uncertainty quantification methods including Gaussian assumption-based, Bayesian methods, quantile regression-based, and traditional tree models, under the metrics of calibration, sharpness, and tail-side calibration. Visualization results show what we actually learn from the real data and how, illustrating the necessity and the merits of such an ensemble model.
[[2211.14617] Mixture of Decision Trees for Interpretable Machine Learning](http://arxiv.org/abs/2211.14617) #interpretability
This work introduces a novel interpretable machine learning method called Mixture of Decision Trees (MoDT). It constitutes a special case of the Mixture of Experts ensemble architecture, which utilizes a linear model as gating function and decision trees as experts. Our proposed method is ideally suited for problems that cannot be satisfactorily learned by a single decision tree, but which can alternatively be divided into subproblems. Each subproblem can then be learned well from a single decision tree. Therefore, MoDT can be considered as a method that improves performance while maintaining interpretability by making each of its decisions understandable and traceable to humans.
Our work is accompanied by a Python implementation, which uses an interpretable gating function, a fast learning algorithm, and a direct interface to fine-tuned interpretable visualization methods. The experiments confirm that the implementation works and, more importantly, show the superiority of our approach compared to single decision trees and random forests of similar complexity.
[[2211.14575] Randomized Conditional Flow Matching for Video Prediction](http://arxiv.org/abs/2211.14575) #diffusion
We introduce a novel generative model for video prediction based on latent flow matching, an efficient alternative to diffusion-based models. In contrast to prior work that either incurs a high training cost by modeling the past through a memory state, as in recurrent neural networks, or limits the computational load by conditioning only on a predefined window of past frames, we efficiently and effectively take the past into account by conditioning at inference time only on a small random set of past frames at each integration step of the learned flow. Moreover, to enable the generation of high-resolution videos and speed up the training, we work in the latent space of a pretrained VQGAN. Furthermore, we propose to approximate the initial condition of the flow ODE with the previous noisy frame. This allows to reduce the number of integration steps and hence, speed up the sampling at inference time. We call our model Random frame conditional flow Integration for VidEo pRediction, or, in short, RIVER. We show that RIVER achieves superior or on par performance compared to prior work on common video prediction benchmarks.
[[2211.14680] A Physics-informed Diffusion Model for High-fidelity Flow Field Reconstruction](http://arxiv.org/abs/2211.14680) #diffusion
Machine learning models are gaining increasing popularity in the domain of fluid dynamics for their potential to accelerate the production of high-fidelity computational fluid dynamics data. However, many recently proposed machine learning models for high-fidelity data reconstruction require low-fidelity data for model training. Such requirement restrains the application performance of these models, since their data reconstruction accuracy would drop significantly if the low-fidelity input data used in model test has a large deviation from the training data. To overcome this restraint, we propose a diffusion model which only uses high-fidelity data at training. With different configurations, our model is able to reconstruct high-fidelity data from either a regular low-fidelity sample or a sparsely measured sample, and is also able to gain an accuracy increase by using physics-informed conditioning information from a known partial differential equation when that is available. Experimental results demonstrate that our model can produce accurate reconstruction results for 2d turbulent flows based on different input sources without retraining.