[[2209.11857] Analysis of the new standard hash function](http://arxiv.org/abs/2209.11857)
On 2$^{nd}$ October 2012 the NIST (National Institute of Standards and Technology) in the United States of America announced the new hashing algorithm which will be adopted as standard from now on. Among a total of 73 candidates, the winner was Keccak, designed by a group of cryptographers from Belgium and Italy. The public selection of a new standard of cryptographic hash function SHA (Secure Hash Algorithm) took five years. Its object is to generate a hash a fixed size from a pattern with arbitrary length. The first selection on behalf of NIST on a standard of this family took place in 1993 when SHA-1 was chosen, which later on was replaced by SHA-2. This paper is focused on the analysis both from the point of view of security and the implementation of the Keccak function, which is the base of the new SHA-3 standard. In particular, an implementation in the mobile platform Android is presented here, providing the first known external library in this mobile operating system so that any developer could use the new standard hashing. Finally, the new standard in applications in the Internet of Things is analysed.
[[2209.11861] Authentication and encryption for a robotic ad hoc network using identity-based cryptography](http://arxiv.org/abs/2209.11861)
In some situations the communications of a place can be affected, totally lost, or not even exist. In these cases, the MANETs play an important role, allowing to establish a communications point using the different nodes of the network to reach the destination using decentralized communications. This paper proposes the implementation of a Robotic MANET, a decentralized network using robots as its nodes, which allows to move the network nodes to the desired location remotely. For this, each robot has as a core a Raspberry Pi with the capabilities to perform audio and video streaming, remote control of robots, tracking of objects, and deployment of wireless networks. To protect the network, different security mechanisms are used that allow secure authentication on the network by different nodes and encryption of information transmitted between them. All communications are protected through Identity-Based Cryptography, specifically with an Identity-Based Signcryption scheme.
[[2209.11957] Cooperative Resource Management in Quantum Key Distribution (QKD) Networks for Semantic Communication](http://arxiv.org/abs/2209.11957)
Increasing privacy and security concerns in intelligence-native 6G networks require quantum key distribution-secured semantic information communication (QKD-SIC). In QKD-SIC systems, edge devices connected via quantum channels can efficiently encrypt semantic information from the semantic source, and securely transmit the encrypted semantic information to the semantic destination. In this paper, we consider an efficient resource (i.e., QKD and KM wavelengths) sharing problem to support QKD-SIC systems under the uncertainty of semantic information generated by edge devices. In such a system, QKD service providers offer QKD services with different subscription options to the edge devices. As such, to reduce the cost for the edge device users, we propose a QKD resource management framework for the edge devices communicating semantic information. The framework is based on a two-stage stochastic optimization model to achieve optimal QKD deployment. Moreover, to reduce the deployment cost of QKD service providers, QKD resources in the proposed framework can be utilized based on efficient QKD-SIC resource management, including semantic information transmission among edge devices, secret-key provisioning, and cooperation formation among QKD service providers. In detail, the formulated two-stage stochastic optimization model can achieve the optimal QKD-SIC resource deployment while meeting the secret-key requirements for semantic information transmission of edge devices. Moreover, to share the cost of the QKD resource pool among cooperative QKD service providers forming a coalition in a fair and interpretable manner, the proposed framework leverages the concept of Shapley value from cooperative game theory as a solution. Experimental results demonstrate that the proposed framework can reduce the deployment cost by about 40% compared with existing non-cooperative baselines.
[[2209.12145] Secure Decentralized IoT Service Platform using Consortium Blockchain](http://arxiv.org/abs/2209.12145)
Blockchain technology has gained increasing popularity in the research of Internet of Things (IoT) systems in the past decade. As a distributed and immutable ledger secured by strong cryptography algorithms, the blockchain brings a new perspective to secure IoT systems. Many studies have been devoted to integrating blockchain into IoT device management, access control, data integrity, security, and privacy. In comparison, the blockchain-facilitated IoT communication is much less studied. Nonetheless, we see the potential of blockchain in decentralizing and securing IoT communications. This paper proposes an innovative IoT service platform powered by consortium blockchain technology. The presented solution abstracts machine-to-machine (M2M) and human-to-machine (H2M) communications into services provided by IoT devices. Then, it materializes data exchange of the IoT network through smart contracts and blockchain transactions. Additionally, we introduce the auxiliary storage layer to the proposed platform to address various data storage requirements. Our proof-of-concept implementation is tested against various workloads and connection sizes under different block configurations to evaluate the platform's transaction throughput, latency, and hardware utilization. The experiment results demonstrate that our solution can maintain high performance under most testing scenarios and provide valuable insights on optimizing the blockchain configuration to achieve the best performance.
[[2209.11802] Reversible Data Hiding in Encrypted Text Using Paillier Cryptosystem](http://arxiv.org/abs/2209.11802)
Reversible Data Hiding in Encrypted Domain (RDHED) is an innovative method that can keep cover information secret and allows the data hider to insert additional information into it. This article presents a novel data hiding technique in an encrypted text called Reversible Data Hiding in Encrypted Text (RDHET). Initially, the original text is converted into their ASCII values. After that, the Paillier cryptosystem is adopted to encrypt all ASCII values of the original text and send it to the data hider for further processing. At the data hiding phase, the secret data are embedded into homomorphically encrypted text using a technique that does not lose any information, i.e., the homomorphic properties of the Paillier cryptosystem. Finally, the embedded secret data and the original text are recovered at the receiving end without any loss. Experimental results show that the proposed scheme is vital in the context of encrypted text processing at cloud-based services. Moreover, the scheme works well, especially for the embedding phase, text recovery, and performance on different security key sizes.
[[2209.11878] "My Privacy for their Security": Employees' Privacy Perspectives and Expectations when using Enterprise Security Software](http://arxiv.org/abs/2209.11878)
Employees are often required to use Enterprise Security Software ("ESS") on corporate and personal devices. ESS products collect users' activity data including users' location, applications used, and websites visited - operating from employees' device to the cloud. To the best of our knowledge, the privacy implications of this data collection have yet to be explored. We conduct an online survey (n=258) and a semi-structured interview (n=22) with ESS users to understand their privacy perceptions, the challenges they face when using ESS, and the ways they try to overcome those challenges. We found that while many participants reported receiving no information about what data their ESS collected, those who received some information often underestimated what was collected. Employees reported lack of communication about various data collection aspects including: the entities with access to the data and the scope of the data collected. We use the interviews to uncover several sources of misconceptions among the participants. Our findings show that while employees understand the need for data collection for security, the lack of communication and ambiguous data collection practices result in the erosion of employees' trust on the ESS and employers. We obtain suggestions from participants on how to mitigate these misconceptions and collect feedback on our design mockups of a privacy notice and privacy indicators for ESS. Our work will benefit researchers, employers, and ESS developers to protect users' privacy in the growing ESS market.
[[2209.11904] CryptoGCN: Fast and Scalable Homomorphically Encrypted Graph Convolutional Network Inference](http://arxiv.org/abs/2209.11904)
Recently cloud-based graph convolutional network (GCN) has demonstrated great success and potential in many privacy-sensitive applications such as personal healthcare and financial systems. Despite its high inference accuracy and performance on cloud, maintaining data privacy in GCN inference, which is of paramount importance to these practical applications, remains largely unexplored. In this paper, we take an initial attempt towards this and develop $\textit{CryptoGCN}$--a homomorphic encryption (HE) based GCN inference framework. A key to the success of our approach is to reduce the tremendous computational overhead for HE operations, which can be orders of magnitude higher than its counterparts in the plaintext space. To this end, we develop an approach that can effectively take advantage of the sparsity of matrix operations in GCN inference to significantly reduce the computational overhead. Specifically, we propose a novel AMA data formatting method and associated spatial convolution methods, which can exploit the complex graph structure and perform efficient matrix-matrix multiplication in HE computation and thus greatly reduce the HE operations. We also develop a co-optimization framework that can explore the trade offs among the accuracy, security level, and computational overhead by judicious pruning and polynomial approximation of activation module in GCNs. Based on the NTU-XVIEW skeleton joint dataset, i.e., the largest dataset evaluated homomorphically by far as we are aware of, our experimental results demonstrate that $\textit{CryptoGCN}$ outperforms state-of-the-art solutions in terms of the latency and number of homomorphic operations, i.e., achieving as much as a 3.10$\times$ speedup on latency and reduces the total Homomorphic Operation Count by 77.4\% with a small accuracy loss of 1-1.5$\%$.
[[2209.12041] Blockchain technologies in the design of Industrial Control Systems for Smart Cities](http://arxiv.org/abs/2209.12041)
The proliferation of sensor technologies in Industrial Control Systems (ICS) helped to transform the environment towards better automation, process control and monitoring. However, sensor technologies expose the smart cities of the future to complex security challenges. Luckily, the sensing capabilities also create opportunities to capture various data types, which apart from operational use can add substantial value to developing mechanisms to protect ICS and critical infrastructure. We discuss Blockchain (BC), a disruptive technology with applications ranging from cryptocurrency to smart contracts and the value of integrating BC technologies into the design of ICS to support modern digital forensic readiness.
[[2209.12195] SPRITZ-1](http://arxiv.org/abs/2209.12195)
In the past few years, Convolutional Neural Networks (CNN) have demonstrated promising performance in various real-world cybersecurity applications, such as network and multimedia security. However, the underlying fragility of CNN structures poses major security problems, making them inappropriate for use in security-oriented applications including such computer networks. Protecting these architectures from adversarial attacks necessitates using security-wise architectures that are challenging to attack.
In this study, we present a novel architecture based on an ensemble classifier that combines the enhanced security of 1-Class classification (known as 1C) with the high performance of conventional 2-Class classification (known as 2C) in the absence of attacks.Our architecture is referred to as the 1.5-Class (SPRITZ-1.5C) classifier and constructed using a final dense classifier, one 2C classifier (i.e., CNNs), and two parallel 1C classifiers (i.e., auto-encoders). In our experiments, we evaluated the robustness of our proposed architecture by considering eight possible adversarial attacks in various scenarios. We performed these attacks on the 2C and SPRITZ-1.5C architectures separately. The experimental results of our study showed that the Attack Success Rate (ASR) of the I-FGSM attack against a 2C classifier trained with the N-BaIoT dataset is 0.9900. In contrast, the ASR is 0.0000 for the SPRITZ-1.5C classifier.
[[2209.12046] Blinder: End-to-end Privacy Protection in Sensing Systems via Personalized Federated Learning](http://arxiv.org/abs/2209.12046)
This paper proposes a sensor data anonymization model that is trained on decentralized data and strikes a desirable trade-off between data utility and privacy, even in heterogeneous settings where the collected sensor data have different underlying distributions. Our anonymization model, dubbed Blinder, is based on a variational autoencoder and discriminator networks trained in an adversarial fashion. We use the model-agnostic meta-learning framework to adapt the anonymization model trained via federated learning to each user's data distribution. We evaluate Blinder under different settings and show that it provides end-to-end privacy protection at the cost of increasing privacy loss by up to 4.00% and decreasing data utility by up to 4.24%, compared to the state-of-the-art anonymization model trained on centralized data. Our experiments confirm that Blinder can obscure multiple private attributes at once, and has sufficiently low power consumption and computational overhead for it to be deployed on edge devices and smartphones to perform real-time anonymization of sensor data.
[[2209.11843] Privacy-Preserving Online Content Moderation: A Federated Learning Use Case](http://arxiv.org/abs/2209.11843)
Users are daily exposed to a large volume of harmful content on various social network platforms. One solution is developing online moderation tools using Machine Learning techniques. However, the processing of user data by online platforms requires compliance with privacy policies. Federated Learning (FL) is an ML paradigm where the training is performed locally on the users' devices. Although the FL framework complies, in theory, with the GDPR policies, privacy leaks can still occur. For instance, an attacker accessing the final trained model can successfully perform unwanted inference of the data belonging to the users who participated in the training process. In this paper, we propose a privacy-preserving FL framework for online content moderation that incorporates Differential Privacy (DP). To demonstrate the feasibility of our approach, we focus on detecting harmful content on Twitter - but the overall concept can be generalized to other types of misbehavior. We simulate a text classifier - in FL fashion - which can detect tweets with harmful content. We show that the performance of the proposed FL framework can be close to the centralized approach - for both the DP and non-DP FL versions. Moreover, it has a high performance even if a small number of clients (each with a small number of data points) are available for the FL training. When reducing the number of clients (from 50 to 10) or the data points per client (from 1K to 0.1K), the classifier can still achieve ~81% AUC. Furthermore, we extend the evaluation to four other Twitter datasets that capture different types of user misbehavior and still obtain a promising performance (61% - 80% AUC). Finally, we explore the overhead on the users' devices during the FL training phase and show that the local training does not introduce excessive CPU utilization and memory consumption overhead.
[[2209.12136] Vision-based Perimeter Defense via Multiview Pose Estimation](http://arxiv.org/abs/2209.12136)
Previous studies in the perimeter defense game have largely focused on the fully observable setting where the true player states are known to all players. However, this is unrealistic for practical implementation since defenders may have to perceive the intruders and estimate their states. In this work, we study the perimeter defense game in a photo-realistic simulator and the real world, requiring defenders to estimate intruder states from vision. We train a deep machine learning-based system for intruder pose detection with domain randomization that aggregates multiple views to reduce state estimation errors and adapt the defensive strategy to account for this. We newly introduce performance metrics to evaluate the vision-based perimeter defense. Through extensive experiments, we show that our approach improves state estimation, and eventually, perimeter defense performance in both 1-defender-vs-1-intruder games, and 2-defenders-vs-1-intruder games.
[[2209.11964] Approximate better, Attack stronger: Adversarial Example Generation via Asymptotically Gaussian Mixture Distribution](http://arxiv.org/abs/2209.11964)
Strong adversarial examples are the keys to evaluating and enhancing the robustness of deep neural networks. The popular adversarial attack algorithms maximize the non-concave loss function using the gradient ascent. However, the performance of each attack is usually sensitive to, for instance, minor image transformations due to insufficient information (only one input example, few white-box source models and unknown defense strategies). Hence, the crafted adversarial examples are prone to overfit the source model, which limits their transferability to unidentified architectures. In this paper, we propose Multiple Asymptotically Normal Distribution Attacks (MultiANDA), a novel method that explicitly characterizes adversarial perturbations from a learned distribution. Specifically, we approximate the posterior distribution over the perturbations by taking advantage of the asymptotic normality property of stochastic gradient ascent (SGA), then apply the ensemble strategy on this procedure to estimate a Gaussian mixture model for a better exploration of the potential optimization space. Drawing perturbations from the learned distribution allow us to generate any number of adversarial examples for each input. The approximated posterior essentially describes the stationary distribution of SGA iterations, which captures the geometric information around the local optimum. Thus, the samples drawn from the distribution reliably maintain the transferability. Our proposed method outperforms nine state-of-the-art black-box attacks on deep learning models with or without defenses through extensive experiments on seven normally trained and seven defence models.
[[2209.12208] A Uniform Representation Learning Method for OCT-based Fingerprint Presentation Attack Detection and Reconstruction](http://arxiv.org/abs/2209.12208)
The technology of optical coherence tomography (OCT) to fingerprint imaging opens up a new research potential for fingerprint recognition owing to its ability to capture depth information of the skin layers. Developing robust and high security Automated Fingerprint Recognition Systems (AFRSs) are possible if the depth information can be fully utilized. However, in existing studies, Presentation Attack Detection (PAD) and subsurface fingerprint reconstruction based on depth information are treated as two independent branches, resulting in high computation and complexity of AFRS building.Thus, this paper proposes a uniform representation model for OCT-based fingerprint PAD and subsurface fingerprint reconstruction. Firstly, we design a novel semantic segmentation network which only trained by real finger slices of OCT-based fingerprints to extract multiple subsurface structures from those slices (also known as B-scans). The latent codes derived from the network are directly used to effectively detect the PA since they contain abundant subsurface biological information, which is independent with PA materials and has strong robustness for unknown PAs. Meanwhile, the segmented subsurface structures are adopted to reconstruct multiple subsurface 2D fingerprints. Recognition can be easily achieved by using existing mature technologies based on traditional 2D fingerprints. Extensive experiments are carried on our own established database, which is the largest public OCT-based fingerprint database with 2449 volumes. In PAD task, our method can improve 0.33% Acc from the state-of-the-art method. For reconstruction performance, our method achieves the best performance with 0.834 mIOU and 0.937 PA. By comparing with the recognition performance on surface 2D fingerprints, the effectiveness of our proposed method on high quality subsurface fingerprint reconstruction is further proved.
[[2209.11962] Trace-based cryptoanalysis of cyclotomic PLWE for the non-split case](http://arxiv.org/abs/2209.11962)
We provide an attack against the decision version of PLWE over the cyclotomic ring $\mathbb{F}q[x]/(\Phi{p^k}(x))$ with $k>1$ in the case where $q\equiv 1\pmod{p}$ but $\Phi_{p^k}(x)$ is not totally split over $\mathbb{F}q$. Our attack uses that the roots of $\Phi{p^k}(x)$ over suitable extensions of $\mathbb{F}_q$ have zero-trace and has overwhelming success probability in function of the number of samples taken as input. An implementation in Maple and some examples of our attack are also provided.
[[2209.11894] Closing the Loop: Graph Networks to Unify Semantic Objects and Visual Features for Multi-object Scenes](http://arxiv.org/abs/2209.11894)
In Simultaneous Localization and Mapping (SLAM), Loop Closure Detection (LCD) is essential to minimize drift when recognizing previously visited places. Visual Bag-of-Words (vBoW) has been an LCD algorithm of choice for many state-of-the-art SLAM systems. It uses a set of visual features to provide robust place recognition but fails to perceive the semantics or spatial relationship between feature points. Previous work has mainly focused on addressing these issues by combining vBoW with semantic and spatial information from objects in the scene. However, they are unable to exploit spatial information of local visual features and lack a structure that unifies semantic objects and visual features, therefore limiting the symbiosis between the two components. This paper proposes SymbioLCD2, which creates a unified graph structure to integrate semantic objects and visual features symbiotically. Our novel graph-based LCD system utilizes the unified graph structure by applying a Weisfeiler-Lehman graph kernel with temporal constraints to robustly predict loop closure candidates. Evaluation of the proposed system shows that having a unified graph structure incorporating semantic objects and visual features improves LCD prediction accuracy, illustrating that the proposed graph structure provides a strong symbiosis between these two complementary components. It also outperforms other Machine Learning algorithms - such as SVM, Decision Tree, Random Forest, Neural Network and GNN based Graph Matching Networks. Furthermore, it has shown good performance in detecting loop closure candidates earlier than state-of-the-art SLAM systems, demonstrating that extended semantic and spatial awareness from the unified graph structure significantly impacts LCD performance.
[[2209.11916] A Simple Strategy to Provable Invariance via Orbit Mapping](http://arxiv.org/abs/2209.11916)
Many applications require robustness, or ideally invariance, of neural networks to certain transformations of input data. Most commonly, this requirement is addressed by training data augmentation, using adversarial training, or defining network architectures that include the desired invariance by design. In this work, we propose a method to make network architectures provably invariant with respect to group actions by choosing one element from a (possibly continuous) orbit based on a fixed criterion. In a nutshell, we intend to 'undo' any possible transformation before feeding the data into the actual network. Further, we empirically analyze the properties of different approaches which incorporate invariance via training or architecture, and demonstrate the advantages of our method in terms of robustness and computational efficiency. In particular, we investigate the robustness with respect to rotations of images (which can hold up to discretization artifacts) as well as the provable orientation and scaling invariance of 3D point cloud classification.
[[2209.11945] Towards Bridging the Space Domain Gap for Satellite Pose Estimation using Event Sensing](http://arxiv.org/abs/2209.11945)
Deep models trained using synthetic data require domain adaptation to bridge the gap between the simulation and target environments. State-of-the-art domain adaptation methods often demand sufficient amounts of (unlabelled) data from the target domain. However, this need is difficult to fulfil when the target domain is an extreme environment, such as space. In this paper, our target problem is close proximity satellite pose estimation, where it is costly to obtain images of satellites from actual rendezvous missions. We demonstrate that event sensing offers a promising solution to generalise from the simulation to the target domain under stark illumination differences. Our main contribution is an event-based satellite pose estimation technique, trained purely on synthetic event data with basic data augmentation to improve robustness against practical (noisy) event sensors. Underpinning our method is a novel dataset with carefully calibrated ground truth, comprising of real event data obtained by emulating satellite rendezvous scenarios in the lab under drastic lighting conditions. Results on the dataset showed that our event-based satellite pose estimation method, trained only on synthetic data without adaptation, could generalise to the target domain effectively.
[[2209.11960] Raising the Bar on the Evaluation of Out-of-Distribution Detection](http://arxiv.org/abs/2209.11960)
In image classification, a lot of development has happened in detecting out-of-distribution (OoD) data. However, most OoD detection methods are evaluated on a standard set of datasets, arbitrarily different from training data. There is no clear definition of what forms a ``good" OoD dataset. Furthermore, the state-of-the-art OoD detection methods already achieve near perfect results on these standard benchmarks. In this paper, we define 2 categories of OoD data using the subtly different concepts of perceptual/visual and semantic similarity to in-distribution (iD) data. We define Near OoD samples as perceptually similar but semantically different from iD samples, and Shifted samples as points which are visually different but semantically akin to iD data. We then propose a GAN based framework for generating OoD samples from each of these 2 categories, given an iD dataset. Through extensive experiments on MNIST, CIFAR-10/100 and ImageNet, we show that a) state-of-the-art OoD detection methods which perform exceedingly well on conventional benchmarks are significantly less robust to our proposed benchmark. Moreover, b) models performing well on our setup also perform well on conventional real-world OoD detection benchmarks and vice versa, thereby indicating that one might not even need a separate OoD set, to reliably evaluate performance in OoD detection.
[[2209.11979] Robust Hyperspectral Image Fusion with Simultaneous Guide Image Denoising via Constrained Convex Optimization](http://arxiv.org/abs/2209.11979)
The paper proposes a new high spatial resolution hyperspectral (HR-HS) image estimation method based on convex optimization. The method assumes a low spatial resolution HS (LR-HS) image and a guide image as observations, where both observations are contaminated by noise. Our method simultaneously estimates an HR-HS image and a noiseless guide image, so the method can utilize spatial information in a guide image even if it is contaminated by heavy noise. The proposed estimation problem adopts hybrid spatio-spectral total variation as regularization and evaluates the edge similarity between HR-HS and guide images to effectively use apriori knowledge on an HR-HS image and spatial detail information in a guide image. To efficiently solve the problem, we apply a primal-dual splitting method. Experiments demonstrate the performance of our method and the advantage over several existing methods.
[[2209.12009] Tracking and Reconstructing Hand Object Interactions from Point Cloud Sequences in the Wild](http://arxiv.org/abs/2209.12009)
In this work, we tackle the challenging task of jointly tracking hand object pose and reconstructing their shapes from depth point cloud sequences in the wild, given the initial poses at frame 0. We for the first time propose a point cloud based hand joint tracking network, HandTrackNet, to estimate the inter-frame hand joint motion. Our HandTrackNet proposes a novel hand pose canonicalization module to ease the tracking task, yielding accurate and robust hand joint tracking. Our pipeline then reconstructs the full hand via converting the predicted hand joints into a template-based parametric hand model MANO. For object tracking, we devise a simple yet effective module that estimates the object SDF from the first frame and performs optimization-based tracking. Finally, a joint optimization step is adopted to perform joint hand and object reasoning, which alleviates the occlusion-induced ambiguity and further refines the hand pose. During training, the whole pipeline only sees purely synthetic data, which are synthesized with sufficient variations and by depth simulation for the ease of generalization. The whole pipeline is pertinent to the generalization gaps and thus directly transferable to real in-the-wild data. We evaluate our method on two real hand object interaction datasets, e.g. HO3D and DexYCB, without any finetuning. Our experiments demonstrate that the proposed method significantly outperforms the previous state-of-the-art depth-based hand and object pose estimation and tracking methods, running at a frame rate of 9 FPS.
[[2209.12138] Towards Stable Co-saliency Detection and Object Co-segmentation](http://arxiv.org/abs/2209.12138)
In this paper, we present a novel model for simultaneous stable co-saliency detection (CoSOD) and object co-segmentation (CoSEG). To detect co-saliency (segmentation) accurately, the core problem is to well model inter-image relations between an image group. Some methods design sophisticated modules, such as recurrent neural network (RNN), to address this problem. However, order-sensitive problem is the major drawback of RNN, which heavily affects the stability of proposed CoSOD (CoSEG) model. In this paper, inspired by RNN-based model, we first propose a multi-path stable recurrent unit (MSRU), containing dummy orders mechanisms (DOM) and recurrent unit (RU). Our proposed MSRU not only helps CoSOD (CoSEG) model captures robust inter-image relations, but also reduces order-sensitivity, resulting in a more stable inference and training process. { Moreover, we design a cross-order contrastive loss (COCL) that can further address order-sensitive problem by pulling close the feature embedding generated from different input orders.} We validate our model on five widely used CoSOD datasets (CoCA, CoSOD3k, Cosal2015, iCoseg and MSRC), and three widely used datasets (Internet, iCoseg and PASCAL-VOC) for object co-segmentation, the performance demonstrates the superiority of the proposed approach as compared to the state-of-the-art (SOTA) methods.
[[2209.12160] PL-EVIO: Robust Monocular Event-based Visual Inertial Odometry with Point and Line Features](http://arxiv.org/abs/2209.12160)
Event cameras are motion-activated sensors that capture pixel-level illumination changes instead of the intensity image with a fixed frame rate. Compared with the standard cameras, it can provide reliable visual perception during high-speed motions and in high dynamic range scenarios. However, event cameras output only a little information or even noise when the relative motion between the camera and the scene is limited, such as in a still state. While standard cameras can provide rich perception information in most scenarios, especially in good lighting conditions. These two cameras are exactly complementary. In this paper, we proposed a robust, high-accurate, and real-time optimization-based monocular event-based visual-inertial odometry (VIO) method with event-corner features, line-based event features, and point-based image features. The proposed method offers to leverage the point-based features in the nature scene and line-based features in the human-made scene to provide more additional structure or constraints information through well-design feature management. Experiments in the public benchmark datasets show that our method can achieve superior performance compared with the state-of-the-art image-based or event-based VIO. Finally, we used our method to demonstrate an onboard closed-loop autonomous quadrotor flight and large-scale outdoor experiments. Videos of the evaluations are presented on our project website: https://b23.tv/OE3QM6j
[[2209.12221] Hand Hygiene Assessment via Joint Step Segmentation and Key Action Scorer](http://arxiv.org/abs/2209.12221)
Hand hygiene is a standard six-step hand-washing action proposed by the World Health Organization (WHO). However, there is no good way to supervise medical staff to do hand hygiene, which brings the potential risk of disease spread. In this work, we propose a new computer vision task called hand hygiene assessment to provide intelligent supervision of hand hygiene for medical staff. Existing action assessment works usually make an overall quality prediction on an entire video. However, the internal structures of hand hygiene action are important in hand hygiene assessment. Therefore, we propose a novel fine-grained learning framework to perform step segmentation and key action scorer in a joint manner for accurate hand hygiene assessment. Existing temporal segmentation methods usually employ multi-stage convolutional network to improve the segmentation robustness, but easily lead to over-segmentation due to the lack of the long-range dependence. To address this issue, we design a multi-stage convolution-transformer network for step segmentation. Based on the observation that each hand-washing step involves several key actions which determine the hand-washing quality, we design a set of key action scorers to evaluate the quality of key actions in each step. In addition, there lacks a unified dataset in hand hygiene assessment. Therefore, under the supervision of medical staff, we contribute a video dataset that contains 300 video sequences with fine-grained annotations. Extensive experiments on the dataset suggest that our method well assesses hand hygiene videos and achieves outstanding performance.
[[2209.12244] Multimodal Learning with Channel-Mixing and Masked Autoencoder on Facial Action Unit Detection](http://arxiv.org/abs/2209.12244)
Recent studies utilizing multi-modal data aimed at building a robust model for facial Action Unit (AU) detection. However, due to the heterogeneity of multi-modal data, multi-modal representation learning becomes one of the main challenges. On one hand, it is difficult to extract the relevant features from multi-modalities by only one feature extractor, on the other hand, previous studies have not fully explored the potential of multi-modal fusion strategies. For example, early fusion usually required all modalities to be present during inference, while late fusion and middle fusion increased the network size for feature learning. In contrast to a large amount of work on late fusion, there are few works on early fusion to explore the channel information. This paper presents a novel multi-modal network called Multi-modal Channel-Mixing (MCM), as a pre-trained model to learn a robust representation in order to facilitate the multi-modal fusion. We evaluate the learned representation on a downstream task of automatic facial action units detection. Specifically, it is a single stream encoder network that uses a channel-mixing module in early fusion, requiring only one modality in the downstream detection task. We also utilize the masked ViT encoder to learn features from the fusion image and reconstruct back two modalities with two ViT decoders. We have conducted extensive experiments on two public datasets, known as BP4D and DISFA, to evaluate the effectiveness and robustness of the proposed multimodal framework. The results show our approach is comparable or superior to the state-of-the-art baseline methods.
[[2209.12362] Multi-dataset Training of Transformers for Robust Action Recognition](http://arxiv.org/abs/2209.12362)
We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition. We build our method on Transformers for its efficacy. Although we have witnessed great progress for video action recognition in the past decade, it remains challenging yet valuable how to train a single model that can perform well across multiple datasets. Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss, aiming to learn robust representations for action recognition. In particular, the informative loss maximizes the expressiveness of the feature embedding while the projection loss for each dataset mines the intrinsic relations between classes across datasets. We verify the effectiveness of our method on five challenging datasets, Kinetics-400, Kinetics-700, Moments-in-Time, Activitynet and Something-something-v2 datasets. Extensive experimental results show that our method can consistently improve the state-of-the-art performance.
[[2209.12016] Unsupervised Model-based Pre-training for Data-efficient Control from Pixels](http://arxiv.org/abs/2209.12016)
Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed in this but require large amounts of interactions between the agent and the environment. To alleviate the issue, unsupervised RL proposes to employ self-supervised interaction and learning, for adapting faster to future tasks. Yet, whether current unsupervised strategies improve generalization capabilities is still unclear, especially in visual control settings. In this work, we design an effective unsupervised RL strategy for data-efficient visual control. First, we show that world models pre-trained with data collected using unsupervised RL can facilitate adaptation for future tasks. Then, we analyze several design choices to adapt efficiently, effectively reusing the agents' pre-trained components, and learning and planning in imagination, with our hybrid planner, which we dub Dyna-MPC. By combining the findings of a large-scale empirical study, we establish an approach that strongly improves performance on the Unsupervised RL Benchmark, requiring 20$\times$ less data to match the performance of supervised methods. The approach also demonstrates robust performance on the Real-Word RL benchmark, hinting that the approach generalizes to noisy environments.
[[2209.11885] Physics-Informed Graph Neural Network for Spatial-temporal Production Forecasting](http://arxiv.org/abs/2209.11885)
Production forecast based on historical data provides essential value for developing hydrocarbon resources. Classic history matching workflow is often computationally intense and geometry-dependent. Analytical data-driven models like decline curve analysis (DCA) and capacitance resistance models (CRM) provide a grid-free solution with a relatively simple model capable of integrating some degree of physics constraints. However, the analytical solution may ignore subsurface geometries and is appropriate only for specific flow regimes and otherwise may violate physics conditions resulting in degraded model prediction accuracy. Machine learning-based predictive model for time series provides non-parametric, assumption-free solutions for production forecasting, but are prone to model overfit due to training data sparsity; therefore may be accurate over short prediction time intervals.
We propose a grid-free, physics-informed graph neural network (PI-GNN) for production forecasting. A customized graph convolution layer aggregates neighborhood information from historical data and has the flexibility to integrate domain expertise into the data-driven model. The proposed method relaxes the dependence on close-form solutions like CRM and honors the given physics-based constraints. Our proposed method is robust, with improved performance and model interpretability relative to the conventional CRM and GNN baseline without physics constraints.
[[2209.12128] A Deep Learning Approach to Analyzing Continuous-Time Systems](http://arxiv.org/abs/2209.12128)
Scientists often use observational time series data to study complex natural processes, from climate change to civil conflict to brain activity. But regression analyses of these data often assume simplistic dynamics. Recent advances in deep learning have yielded startling improvements to the performance of models of complex processes, from speech comprehension to nuclear physics to competitive gaming. But deep learning is generally not used for scientific analysis. Here, we bridge this gap by showing that deep learning can be used, not just to imitate, but to analyze complex processes, providing flexible function approximation while preserving interpretability. Our approach -- the continuous-time deconvolutional regressive neural network (CDRNN) -- relaxes standard simplifying assumptions (e.g., linearity, stationarity, and homoscedasticity) that are implausible for many natural systems and may critically affect the interpretation of data. We evaluate CDRNNs on incremental human language processing, a domain with complex continuous dynamics. We demonstrate dramatic improvements to predictive likelihood in behavioral and neuroimaging data, and we show that CDRNNs enable flexible discovery of novel patterns in exploratory analyses, provide robust control of possible confounds in confirmatory analyses, and open up research questions that are otherwise hard to study using observational data.
[[2209.12340] Solving Seismic Wave Equations on Variable Velocity Models with Fourier Neural Operator](http://arxiv.org/abs/2209.12340)
In the study of subsurface seismic imaging, solving the acoustic wave equation is a pivotal component in existing models. With the advancement of deep learning, neural networks are applied to numerically solve partial differential equations by learning the mapping between the inputs and the solution of the equation, the wave equation in particular, since traditional methods can be time consuming if numerous instances are to be solved. Previous works that concentrate on solving the wave equation by neural networks consider either a single velocity model or multiple simple velocity models, which is restricted in practice. Therefore, inspired by the idea of operator learning, this work leverages the Fourier neural operator (FNO) to effectively learn the frequency domain seismic wavefields under the context of variable velocity models. Moreover, we propose a new framework paralleled Fourier neural operator (PFNO) for efficiently training the FNO-based solver given multiple source locations and frequencies. Numerical experiments demonstrate the high accuracy of both FNO and PFNO with complicated velocity models in the OpenFWI datasets. Furthermore, the cross-dataset generalization test verifies that PFNO adapts to out-of-distribution velocity models. Also, PFNO has robust performance in the presence of random noise in the labels. Finally, PFNO admits higher computational efficiency on large-scale testing datasets, compared with the traditional finite-difference method. The aforementioned advantages endow the FNO-based solver with the potential to build powerful models for research on seismic waves.
[[2209.12213] ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement](http://arxiv.org/abs/2209.12213)
Modeling sparse and dense image matching within a unified functional correspondence model has recently attracted increasing research interest. However, existing efforts mainly focus on improving matching accuracy while ignoring its efficiency, which is crucial for realworld applications. In this paper, we propose an efficient structure named Efficient Correspondence Transformer (ECO-TR) by finding correspondences in a coarse-to-fine manner, which significantly improves the efficiency of functional correspondence model. To achieve this, multiple transformer blocks are stage-wisely connected to gradually refine the predicted coordinates upon a shared multi-scale feature extraction network. Given a pair of images and for arbitrary query coordinates, all the correspondences are predicted within a single feed-forward pass. We further propose an adaptive query-clustering strategy and an uncertainty-based outlier detection module to cooperate with the proposed framework for faster and better predictions. Experiments on various sparse and dense matching tasks demonstrate the superiority of our method in both efficiency and effectiveness against existing state-of-the-arts.
[[2209.12177] Application of Deep Learning in Generating Structured Radiology Reports: A Transformer-Based Technique](http://arxiv.org/abs/2209.12177)
Since radiology reports needed for clinical practice and research are written and stored in free-text narrations, extraction of relative information for further analysis is difficult. In these circumstances, natural language processing (NLP) techniques can facilitate automatic information extraction and transformation of free-text formats to structured data. In recent years, deep learning (DL)-based models have been adapted for NLP experiments with promising results. Despite the significant potential of DL models based on artificial neural networks (ANN) and convolutional neural networks (CNN), the models face some limitations to implement in clinical practice. Transformers, another new DL architecture, have been increasingly applied to improve the process. Therefore, in this study, we propose a transformer-based fine-grained named entity recognition (NER) architecture for clinical information extraction. We collected 88 abdominopelvic sonography reports in free-text formats and annotated them based on our developed information schema. The text-to-text transfer transformer model (T5) and Scifive, a pre-trained domain-specific adaptation of the T5 model, were applied for fine-tuning to extract entities and relations and transform the input into a structured format. Our transformer-based model in this study outperformed previously applied approaches such as ANN and CNN models based on ROUGE-1, ROUGE-2, ROUGE-L, and BLEU scores of 0.816, 0.668, 0.528, and 0.743, respectively, while providing an interpretable structured report.
[[2209.11950] Developing a Knowledge Graph Framework for Pharmacokinetic Natural Product-Drug Interactions](http://arxiv.org/abs/2209.11950)
Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical natural products are co-consumed with pharmaceutical drugs. Understanding mechanisms of NPDIs is key to preventing adverse events. We constructed a knowledge graph framework, NP-KG, as a step toward computational discovery of pharmacokinetic NPDIs. NP-KG is a heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature, constructed with the Phenotype Knowledge Translator framework and the semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through path searches and meta-path discovery to determine congruent and contradictory information compared to ground truth data. The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature. NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify pharmacokinetic interactions involving enzymes, transporters, and pharmaceutical drugs. We envision that NP-KG will facilitate improved human-machine collaboration to guide researchers in future studies of pharmacokinetic NPDIs. The NP-KG framework is publicly available at https://doi.org/10.5281/zenodo.6814507 and https://github.com/sanyabt/np-kg.
[[2209.11763] Enhancing Claim Classification with Feature Extraction from Anomaly-Detection-Derived Routine and Peculiarity Profiles](http://arxiv.org/abs/2209.11763)
Usage-based insurance is becoming the new standard in vehicle insurance; it is therefore relevant to find efficient ways of using insureds' driving data. Applying anomaly detection to vehicles' trip summaries, we develop a method allowing to derive a "routine" and a "peculiarity" anomaly profile for each vehicle. To this end, anomaly detection algorithms are used to compute a routine and a peculiarity anomaly score for each trip a vehicle makes. The former measures the anomaly degree of the trip compared to the other trips made by the concerned vehicle, while the latter measures its anomaly degree compared to trips made by any vehicle. The resulting anomaly scores vectors are used as routine and peculiarity profiles. Features are then extracted from these profiles, for which we investigate the predictive power in the claim classification framework. Using real data, we find that features extracted from the vehicles' peculiarity profile improve classification.
[[2209.11892] Highly Scalable Task Grouping for Deep Multi-Task Learning in Prediction of Epigenetic Events](http://arxiv.org/abs/2209.11892)
Deep neural networks trained for predicting cellular events from DNA sequence have become emerging tools to help elucidate the biological mechanism underlying the associations identified in genome-wide association studies. To enhance the training, multi-task learning (MTL) has been commonly exploited in previous works where trained networks were needed for multiple profiles differing in either event modality or cell type. All existing works adopted a simple MTL framework where all tasks share a single feature extraction network. Such a strategy even though effective to certain extent leads to substantial negative transfer, meaning the existence of large portion of tasks for which models obtained through MTL perform worse than those by single task learning. There have been methods developed to address such negative transfer in other domains, such as computer vision. However, these methods are generally difficult to scale up to handle large amount of tasks. In this paper, we propose a highly scalable task grouping framework to address negative transfer by only jointly training tasks that are potentially beneficial to each other. The proposed method exploits the network weights associated with task specific classification heads that can be cheaply obtained by one-time joint training of all tasks. Our results using a dataset consisting of 367 epigenetic profiles demonstrate the effectiveness of the proposed approach and its superiority over baseline methods.
[[2209.12282] Deep Feature Selection Using a Novel Complementary Feature Mask](http://arxiv.org/abs/2209.12282)
Feature selection has drawn much attention over the last decades in machine learning because it can reduce data dimensionality while maintaining the original physical meaning of features, which enables better interpretability than feature extraction. However, most existing feature selection approaches, especially deep-learning-based, often focus on the features with great importance scores only but neglect those with less importance scores during training as well as the order of important candidate features. This can be risky since some important and relevant features might be unfortunately ignored during training, leading to suboptimal solutions or misleading selections. In our work, we deal with feature selection by exploiting the features with less importance scores and propose a feature selection framework based on a novel complementary feature mask. Our method is generic and can be easily integrated into existing deep-learning-based feature selection approaches to improve their performance as well. Experiments have been conducted on benchmarking datasets and shown that the proposed method can select more representative and informative features than the state of the art.
[[2209.11944] Communication-Efficient {Federated} Learning Using Censored Heavy Ball Descent](http://arxiv.org/abs/2209.11944)
Distributed machine learning enables scalability and computational offloading, but requires significant levels of communication. Consequently, communication efficiency in distributed learning settings is an important consideration, especially when the communications are wireless and battery-driven devices are employed. In this paper we develop a censoring-based heavy ball (CHB) method for distributed learning in a server-worker architecture. Each worker self-censors unless its local gradient is sufficiently different from the previously transmitted one. The significant practical advantages of the HB method for learning problems are well known, but the question of reducing communications has not been addressed. CHB takes advantage of the HB smoothing to eliminate reporting small changes, and provably achieves a linear convergence rate equivalent to that of the classical HB method for smooth and strongly convex objective functions. The convergence guarantee of CHB is theoretically justified for both convex and nonconvex cases. In addition we prove that, under some conditions, at least half of all communications can be eliminated without any impact on convergence rate. Extensive numerical results validate the communication efficiency of CHB on both synthetic and real datasets, for convex, nonconvex, and nondifferentiable cases. Given a target accuracy, CHB can significantly reduce the number of communications compared to existing algorithms, achieving the same accuracy without slowing down the optimization process.
[[2209.12307] On the Stability Analysis of Open Federated Learning Systems](http://arxiv.org/abs/2209.12307)
We consider the open federated learning (FL) systems, where clients may join and/or leave the system during the FL process. Given the variability of the number of present clients, convergence to a fixed model cannot be guaranteed in open systems. Instead, we resort to a new performance metric that we term the stability of open FL systems, which quantifies the magnitude of the learned model in open systems. Under the assumption that local clients' functions are strongly convex and smooth, we theoretically quantify the radius of stability for two FL algorithms, namely local SGD and local Adam. We observe that this radius relies on several key parameters, including the function condition number as well as the variance of the stochastic gradient. Our theoretical results are further verified by numerical simulations on both synthetic and real-world benchmark data-sets.
[[2209.12099] Controllable Text Generation for Open-Domain Creativity and Fairness](http://arxiv.org/abs/2209.12099)
Recent advances in large pre-trained language models have demonstrated strong results in generating natural languages and significantly improved performances for many natural language generation (NLG) applications such as machine translation and text summarization. However, when the generation tasks are more open-ended and the content is under-specified, existing techniques struggle to generate long-term coherent and creative content. Moreover, the models exhibit and even amplify social biases that are learned from the training corpora. This happens because the generation models are trained to capture the surface patterns (i.e. sequences of words), instead of capturing underlying semantics and discourse structures, as well as background knowledge including social norms. In this paper, I introduce our recent works on controllable text generation to enhance the creativity and fairness of language generation models. We explore hierarchical generation and constrained decoding, with applications to creative language generation including story, poetry, and figurative languages, and bias mitigation for generation models.
[[2209.12226] Re-contextualizing Fairness in NLP: The Case of India](http://arxiv.org/abs/2209.12226)
Recent research has revealed undesirable biases in NLP data & models. However, these efforts focus of social disparities in West, and are not directly portable to other geo-cultural contexts. In this paper, we focus on NLP fairness in the context of India. We start with a brief account of prominent axes of social disparities in India. We build resources for fairness evaluation in the Indian context and use them to demonstrate prediction biases along some of the axes. We then delve deeper into social stereotypes for Region & Religion, demonstrating its prevalence in corpora & models. Finally, we outline a holistic research agenda to re-contextualize NLP fairness research for the Indian context, accounting for Indian societal context, bridging technological gaps in capability, resources, and adapting to Indian cultural values. While we focus on 'India' here, this framework can be generalized for recontextualization in other geo-cultural contexts.
[[2209.11762] Towards Auditing Unsupervised Learning Algorithms and Human Processes For Fairness](http://arxiv.org/abs/2209.11762)
Existing work on fairness typically focuses on making known machine learning algorithms fairer. Fair variants of classification, clustering, outlier detection and other styles of algorithms exist. However, an understudied area is the topic of auditing an algorithm's output to determine fairness. Existing work has explored the two group classification problem for binary protected status variables using standard definitions of statistical parity. Here we build upon the area of auditing by exploring the multi-group setting under more complex definitions of fairness.
[[2209.11817] An Efficient Algorithm for Fair Multi-Agent Multi-Armed Bandit with Low Regret](http://arxiv.org/abs/2209.11817)
Recently a multi-agent variant of the classical multi-armed bandit was proposed to tackle fairness issues in online learning. Inspired by a long line of work in social choice and economics, the goal is to optimize the Nash social welfare instead of the total utility. Unfortunately previous algorithms either are not efficient or achieve sub-optimal regret in terms of the number of rounds $T$. We propose a new efficient algorithm with lower regret than even previous inefficient ones. For $N$ agents, $K$ arms, and $T$ rounds, our approach has a regret bound of $\tilde{O}(\sqrt{NKT} + NK)$. This is an improvement to the previous approach, which has regret bound of $\tilde{O}( \min(NK, \sqrt{N} K^{3/2})\sqrt{T})$. We also complement our efficient algorithm with an inefficient approach with $\tilde{O}(\sqrt{KT} + N^2K)$ regret. The experimental findings confirm the effectiveness of our efficient algorithm compared to the previous approaches.
[[2209.11837] Doubly Fair Dynamic Pricing](http://arxiv.org/abs/2209.11837)
We study the problem of online dynamic pricing with two types of fairness constraints: a "procedural fairness" which requires the proposed prices to be equal in expectation among different groups, and a "substantive fairness" which requires the accepted prices to be equal in expectation among different groups. A policy that is simultaneously procedural and substantive fair is referred to as "doubly fair". We show that a doubly fair policy must be random to have higher revenue than the best trivial policy that assigns the same price to different groups. In a two-group setting, we propose an online learning algorithm for the 2-group pricing problems that achieves $\tilde{O}(\sqrt{T})$ regret, zero procedural unfairness and $\tilde{O}(\sqrt{T})$ substantive unfairness over $T$ rounds of learning. We also prove two lower bounds showing that these results on regret and unfairness are both information-theoretically optimal up to iterated logarithmic factors. To the best of our knowledge, this is the first dynamic pricing algorithm that learns to price while satisfying two fairness constraints at the same time.
[[2209.11972] Ground then Navigate: Language-guided Navigation in Dynamic Scenes](http://arxiv.org/abs/2209.11972)
We investigate the Vision-and-Language Navigation (VLN) problem in the context of autonomous driving in outdoor settings. We solve the problem by explicitly grounding the navigable regions corresponding to the textual command. At each timestamp, the model predicts a segmentation mask corresponding to the intermediate or the final navigable region. Our work contrasts with existing efforts in VLN, which pose this task as a node selection problem, given a discrete connected graph corresponding to the environment. We do not assume the availability of such a discretised map. Our work moves towards continuity in action space, provides interpretability through visual feedback and allows VLN on commands requiring finer manoeuvres like "park between the two cars". Furthermore, we propose a novel meta-dataset CARLA-NAV to allow efficient training and validation. The dataset comprises pre-recorded training sequences and a live environment for validation and testing. We provide extensive qualitative and quantitive empirical results to validate the efficacy of the proposed approach.
[[2209.11799] Emb-GAM: an Interpretable and Efficient Predictor using Pre-trained Language Models](http://arxiv.org/abs/2209.11799)
Deep learning models have achieved impressive prediction performance but often sacrifice interpretability, a critical consideration in high-stakes domains such as healthcare or policymaking. In contrast, generalized additive models (GAMs) can maintain interpretability but often suffer from poor prediction performance due to their inability to effectively capture feature interactions. In this work, we aim to bridge this gap by using pre-trained neural language models to extract embeddings for each input before learning a linear model in the embedding space. The final model (which we call Emb-GAM) is a transparent, linear function of its input features and feature interactions. Leveraging the language model allows Emb-GAM to learn far fewer linear coefficients, model larger interactions, and generalize well to novel inputs (e.g. unseen ngrams in text). Across a variety of natural-language-processing datasets, Emb-GAM achieves strong prediction performance without sacrificing interpretability. All code is made available on Github.
[[2209.12001] Toward Intention Discovery for Early Malice Detection in Bitcoin](http://arxiv.org/abs/2209.12001)
Bitcoin has been subject to illicit activities more often than probably any other financial assets, due to the pseudo-anonymous nature of its transacting entities. An ideal detection model is expected to achieve all the three properties of (I) early detection, (II) good interpretability, and (III) versatility for various illicit activities. However, existing solutions cannot meet all these requirements, as most of them heavily rely on deep learning without satisfying interpretability and are only available for retrospective analysis of a specific illicit type.
First, we present asset transfer paths, which aim to describe addresses' early characteristics. Next, with a decision tree based strategy for feature selection and segmentation, we split the entire observation period into different segments and encode each as a segment vector. After clustering all these segment vectors, we get the global status vectors, essentially the basic unit to describe the whole intention. Finally, a hierarchical self-attention predictor predicts the label for the given address in real time. A survival module tells the predictor when to stop and proposes the status sequence, namely intention. %
With the type-dependent selection strategy and global status vectors, our model can be applied to detect various illicit activities with strong interpretability. The well-designed predictor and particular loss functions strengthen the model's prediction speed and interpretability one step further. Extensive experiments on three real-world datasets show that our proposed algorithm outperforms state-of-the-art methods. Besides, additional case studies justify our model can not only explain existing illicit patterns but can also find new suspicious characters.