[[2208.06594] Using identity-based cryptography in mobile applications](http://arxiv.org/abs/2208.06594)
This work includes a review of two case studies of mobile applications that use Identity-Based Cryptography (IBC) to protect communications. It also describes a proposal for a new mobile application that combines the use of IBC for Wi-Fi or Bluetooth communication between smartphones with the promising Near Field Communication (NFC) technology for secure authentication. The proposed scheme involves NFC pairing to establish as the public key a piece of information linked to the device, such as the phone number, so that this information is then used in an IBC scheme for peer-to-peer communication. This is a work in progress, so the implementation of a prototype based on smartphones is still being improved.
[[2208.06946] Targeted Honeyword Generation with Language Models](http://arxiv.org/abs/2208.06946)
Honeywords are fictitious passwords inserted into databases in order to identify password breaches. The major difficulty is how to produce honeywords that are difficult to distinguish from real passwords. Although the generation of honeywords has been widely investigated in the past, the majority of existing research assumes attackers have no knowledge of the users. These honeyword generating techniques (HGTs) may utterly fail if attackers exploit users' personally identifiable information (PII) and the real passwords include users' PII. In this paper, we propose to build a more secure and trustworthy authentication system that employs off-the-shelf pre-trained language models which require no further training on real passwords to produce honeywords while retaining the PII of the associated real password, therefore significantly raising the bar for attackers.
We conducted a pilot experiment in which individuals were asked to distinguish between authentic passwords and honeywords, given the username, for honeywords produced by GPT-3 and by a tweaking technique. Results show that it is extremely difficult to distinguish the real passwords from the artificial ones for both techniques. We speculate that a larger sample size could reveal a significant difference between the two HGTs, favouring our proposed approach.
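The detection mechanism that honeywords rely on can be sketched as follows. This is a minimal illustration and not the paper's generator: `SWEETWORDS`, `REAL_INDEX`, and `check_login` are hypothetical names, the example passwords are invented, and plaintext strings are used only for readability (a deployed system would presumably store hashes and keep the index of the real password on a separate service).

```python
from enum import Enum


class LoginResult(Enum):
    OK = "ok"
    WRONG_PASSWORD = "wrong_password"
    BREACH_ALARM = "breach_alarm"


# Hypothetical data: one real password hidden among honeywords that keep the
# same PII (here the user's name), so knowing the PII does not single it out.
SWEETWORDS = {"alice": ["alice1987!", "alice1992#", "alice2001$"]}

# Position of the real password, assumed to be stored apart from the password file.
REAL_INDEX = {"alice": 1}


def check_login(user: str, attempt: str) -> LoginResult:
    """Classify a login attempt against the stored candidate list."""
    candidates = SWEETWORDS.get(user, [])
    if attempt not in candidates:
        return LoginResult.WRONG_PASSWORD
    if candidates.index(attempt) == REAL_INDEX[user]:
        return LoginResult.OK
    # A honeyword can only come from a leaked password file, so raise an alarm.
    return LoginResult.BREACH_ALARM
```

An attacker who steals the password file sees three equally plausible candidates per user; submitting either of the two honeywords reveals the breach.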
[[2208.07060] A Blockchain-based Decentralised and Dynamic Authorisation Scheme for the Internet of Things](http://arxiv.org/abs/2208.07060)
Authorisation has been recognised as an important security measure for preventing unauthorised access to critical resources, such as devices and data, within Internet of Things (IoT) networks. Existing authorisation methods for IoT networks are based on traditional access control models, which have several drawbacks, including architecture centralisation, policy tampering, access rights validation, malicious third-party policy assignment and control, and network-related overheads. The increasing trend of integrating Blockchain technology with IoT networks demonstrates its importance and potential to address the shortcomings of traditional IoT network authorisation mechanisms. This paper proposes a decentralised, secure, dynamic, and flexible authorisation scheme for IoT networks based on attribute-based access control (ABAC) fine-grained policies stored on a distributed immutable ledger. We design a Blockchain-based ABAC policy management framework divided into Attribute Management Authority (AMA) and Policy Management Authority (PMA) frameworks that use smart contract features to initialise, store, and manage attributes and policies on the Blockchain. To achieve flexibility and dynamicity in the authorisation process, we capture and utilise environment-related attributes in conjunction with the subject and object attributes of the ABAC model to define the policies. Furthermore, we design the Blockchain-based Access Management Framework (AMF) to manage user requests to access IoT devices while maintaining the privacy and auditability of user requests and assigned policies. We implemented a prototype of our proposed scheme and executed it on a local Ethereum Blockchain. Finally, we demonstrated the applicability and flexibility of our proposed scheme for an IoT-based smart home scenario, taking into account deployment, execution and financial costs.
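How an ABAC decision combines subject, object, and environment attributes can be sketched in a few lines. This is a plain Python illustration under stated assumptions, not the paper's smart contracts: the `Policy` class, `is_authorised`, and the smart-home rule `door_policy` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Attributes = Dict[str, Any]


@dataclass(frozen=True)
class Policy:
    """One ABAC rule: a predicate over each of the three attribute sets."""
    subject: Callable[[Attributes], bool]
    obj: Callable[[Attributes], bool]
    environment: Callable[[Attributes], bool]


def is_authorised(policy: Policy, subject: Attributes, obj: Attributes,
                  environment: Attributes) -> bool:
    """Grant access only if all three predicates hold."""
    return (policy.subject(subject)
            and policy.obj(obj)
            and policy.environment(environment))


# Hypothetical rule: residents may operate the door lock between 06:00 and 22:00.
door_policy = Policy(
    subject=lambda s: s.get("role") == "resident",
    obj=lambda o: o.get("type") == "door_lock",
    environment=lambda e: 6 <= e.get("hour", -1) < 22,
)
```

Because the environment predicate is evaluated at request time, the same subject and object can receive different decisions at different hours, which is the kind of dynamicity the abstract attributes to environment-related attributes.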
[[2208.07189] DHSA: Efficient Doubly Homomorphic Secure Aggregation for Cross-silo Federated Learning](http://arxiv.org/abs/2208.07189)
Secure aggregation is widely used in horizontal Federated Learning (FL) to prevent leakage of training data when model updates from data owners are aggregated. Secure aggregation protocols based on Homomorphic Encryption (HE) have been utilized in industrial cross-silo FL systems, a setting that involves privacy-sensitive organizations such as financial or medical institutions and therefore presents more stringent requirements on privacy and security. However, existing HE-based solutions have limitations in efficiency and in security guarantees against colluding adversaries without a Trusted Third Party (TTP).
This paper proposes an efficient Doubly Homomorphic Secure Aggregation (DHSA) scheme for cross-silo FL, which utilizes multi-key Homomorphic Encryption (MKHE) and a seed-homomorphic pseudorandom generator (SHPRG) as cryptographic primitives. The application of MKHE provides strong security guarantees against up to $N-2$ participants colluding with the aggregator, with no TTP required. To mitigate the large computation and communication cost of MKHE, we leverage the homomorphic property of SHPRG to replace the majority of MKHE computation by computationally friendly mask generation from SHPRG, while preserving the security. Overall, the resulting scheme satisfies the stringent security requirements of typical cross-silo FL scenarios, at the same time providing high computation and communication efficiency for practical usage. We experimentally demonstrate that our scheme achieves a speedup of up to 20$\times$ over the state-of-the-art HE-based secure aggregation, and reduces the traffic volume to approximately 1.5$\times$ inflation over the plain learning setting.
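The seed-homomorphic property that makes the masks cheap to remove can be illustrated with a toy example. The abstract does not spell out the protocol, so this sketch assumes that each participant masks its update with an expansion of a short seed and that only the sum of the seeds (the part MKHE would protect) reaches the aggregator. The expander `expand` is a plain linear map chosen because it is exactly seed-homomorphic; it is NOT a secure pseudorandom generator, and `Q`, `A`, `mask_update`, and `aggregate` are hypothetical names.

```python
import random

Q = 2**31 - 1      # toy modulus (assumption)
SEED_LEN = 4       # short seed: the only part that would need MKHE
UPDATE_LEN = 6     # length of a model update

_rng = random.Random(0)
# Public matrix shared by all parties.
A = [[_rng.randrange(Q) for _ in range(SEED_LEN)] for _ in range(UPDATE_LEN)]


def expand(seed):
    """Toy expander G(s) = A*s mod Q, so G(s1) + G(s2) = G(s1 + s2).

    Linear and therefore insecure; it only demonstrates the homomorphism.
    """
    return [sum(a * s for a, s in zip(row, seed)) % Q for row in A]


def add_vec(u, v):
    return [(x + y) % Q for x, y in zip(u, v)]


def sub_vec(u, v):
    return [(x - y) % Q for x, y in zip(u, v)]


def mask_update(update, seed):
    """Participant side: hide the update under a mask expanded from the seed."""
    return add_vec(update, expand(seed))


def aggregate(masked_updates, seed_sum):
    """Aggregator side: sum the masked updates, then strip the combined mask."""
    total = [0] * UPDATE_LEN
    for masked in masked_updates:
        total = add_vec(total, masked)
    return sub_vec(total, expand(seed_sum))
```

The aggregator never needs an individual seed: the sum of the masks equals the expansion of the summed seeds, so one expansion removes all masks at once.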
[[2208.06592] Confidence Matters: Inspecting Backdoors in Deep Neural Networks via Distribution Transfer](http://arxiv.org/abs/2208.06592)
Backdoor attacks have been shown to be a serious security threat against deep learning models, and detecting whether a given model has been backdoored becomes a crucial task. Existing defenses are mainly built upon the observation that the backdoor trigger is usually of small size or affects the activation of only a few neurons. However, the above observations are violated in many cases especially for advanced backdoor attacks, hindering the performance and applicability of the existing defenses. In this paper, we propose a backdoor defense DTInspector built upon a new observation. That is, an effective backdoor attack usually requires high prediction confidence on the poisoned training samples, so as to ensure that the trained model exhibits the targeted behavior with a high probability. Based on this observation, DTInspector first learns a patch that could change the predictions of most high-confidence data, and then decides the existence of backdoor by checking the ratio of prediction changes after applying the learned patch on the low-confidence data. Extensive evaluations on five backdoor attacks, four datasets, and three advanced attacking types demonstrate the effectiveness of the proposed defense.
[[2208.07049] Self-Supervised Vision Transformers for Malware Detection](http://arxiv.org/abs/2208.07049)
Malware detection plays a crucial role in cyber-security with the increase in malware growth and advancements in cyber-attacks. Previously unseen malware that has not yet been identified by security vendors is often used in these attacks, making it essential to find a solution that can self-learn from unlabeled sample data. This paper presents SHERLOCK, a self-supervision based deep learning model to detect malware based on the Vision Transformer (ViT) architecture. SHERLOCK is a novel malware detection method which learns unique features to differentiate malware from benign programs with the use of image-based binary representation. Experimental results using 1.2 million Android applications across a hierarchy of 47 types and 696 families show that self-supervised learning can achieve an accuracy of 97% for the binary classification of malware, which is higher than existing state-of-the-art techniques. Our proposed model is also able to outperform state-of-the-art techniques for multi-class malware classification of types and families, with macro-F1 scores of .497 and .491 respectively.
[[2208.06593] Analysis and implementation of the SNOW 3G generator used in 4G/LTE systems](http://arxiv.org/abs/2208.06593)
The fourth generation of cell phones, marketed as 4G/LTE (Long-Term Evolution) is being quickly adopted worldwide. Given the mobile and wireless nature of the involved communications, security is crucial. This paper includes both a theoretical study and a practical analysis of the SNOW 3G generator, included in such a standard for protecting confidentiality and integrity. From its implementation and performance evaluation in mobile devices, several conclusions about how to improve its efficiency are obtained.
[[2208.06722] A hands-on gaze on HTTP/3 security through the lens of HTTP/2 and a public dataset](http://arxiv.org/abs/2208.06722)
Following the ratification of the QUIC protocol in May 2021, the third major version of the Hypertext Transfer Protocol, namely HTTP/3, was published around one year later in RFC 9114. In light of these consequential advancements, the current work aspires to provide full coverage of the following issues, which to our knowledge have received little or no attention in the literature so far. First, we provide a complete review of attacks against HTTP/2, and elaborate on whether and how they can be migrated to HTTP/3. Second, through the creation of a testbed comprising the six currently most popular HTTP/3-enabled servers, we examine the effectiveness of a quartet of attacks, either stemming directly from the HTTP/2 relevant literature or being entirely new. This scrutiny led to the assignment of at least one CVE ID with a critical base score by MITRE. No less important, by capitalizing on a realistic, device-rich testbed, we compiled a voluminous, labeled corpus containing traces of ten diverse attacks against HTTP and QUIC services. An initial evaluation of the dataset, mainly by means of machine learning techniques, is included as well. Given that the 30 GB dataset is made available in both pcap and CSV formats, forthcoming research can easily take advantage of any subset of features, contingent upon the specific network topology and configuration.
[[2208.06774] Cryptanalyzing an Image Encryption Algorithm Underpinned by 2D Lag-Complex Logistic Map](http://arxiv.org/abs/2208.06774)
This paper analyzes the security performance of an image encryption algorithm based on the 2D lag-complex Logistic map (LCLM), which adopts the map as a pseudo-random number generator and uses the sum of all pixel values of the plain-image as its initial value to control the random combination of the basic encryption operations. However, multiple factors mean that the final pseudo-random sequences controlling the encryption process may be the same for different plain-images. Based on this point, we propose a chosen-plaintext attack that targets the six encryption steps with a divide-and-conquer strategy. Using the pitfalls of 2D-LCLM, the number of required chosen plain-images is further reduced to $5\cdot\log_2(MN)+95$, where $\mathit{MN}$ is the number of pixels of the plain-image.
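To give a sense of scale, the bound $5\cdot\log_2(MN)+95$ can be evaluated for common image sizes. The helper name `chosen_plain_images` is hypothetical, and the sketch only evaluates the formula as stated in the abstract; how the paper rounds the logarithm when $MN$ is not a power of two is not specified here, so the examples use power-of-two sizes.

```python
import math


def chosen_plain_images(height: int, width: int) -> float:
    """Evaluate 5 * log2(M * N) + 95 for an image of M x N pixels."""
    pixel_count = height * width
    return 5 * math.log2(pixel_count) + 95
```

For a 512 × 512 image, $MN = 2^{18}$, so the bound is $5 \cdot 18 + 95 = 185$ chosen plain-images; for 256 × 256 it is $5 \cdot 16 + 95 = 175$. The count grows only logarithmically with image size.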
[[2208.06943] GNPassGAN: Improved Generative Adversarial Networks For Trawling Offline Password Guessing](http://arxiv.org/abs/2208.06943)
The security of passwords depends on a thorough understanding of the strategies used by attackers. Unfortunately, real-world adversaries use pragmatic guessing tactics like dictionary attacks, which are difficult to simulate in password security research. Dictionary attacks must be carefully configured and modified to represent an actual threat. This approach, however, needs domain-specific knowledge and expertise that are difficult to duplicate. This paper reviews various deep learning-based password guessing approaches that do not require domain knowledge or assumptions about users' password structures and combinations. It also introduces GNPassGAN, a password guessing tool built on generative adversarial networks for trawling offline attacks. In comparison to the state-of-the-art PassGAN model, GNPassGAN is capable of guessing 88.03% more passwords and generating 31.69% fewer duplicates.
[[2208.07127] Deception for Cyber Defence: Challenges and Opportunities](http://arxiv.org/abs/2208.07127)
Deception is rapidly growing as an important tool for cyber defence, complementing existing perimeter security measures to rapidly detect breaches and data theft. One of the factors limiting the use of deception has been the cost of generating realistic artefacts by hand. Recent advances in Machine Learning have, however, created opportunities for scalable, automated generation of realistic deceptions. This vision paper describes the opportunities and challenges involved in developing models to mimic many common elements of the IT stack for deception effects.
[[2208.06997] Combining deep learning and crowdsourcing geo-images to predict housing quality in rural China](http://arxiv.org/abs/2208.06997)
Housing quality is an essential proxy for regional wealth, security and health. Understanding the distribution of housing quality is crucial for unveiling rural development status and informing policy proposals. However, present rural housing quality data depends heavily on top-down, time-consuming surveys at the national or provincial level and fails to unpack housing quality at the village level. To fill the gap between accurately depicting rural housing quality conditions and deficient data, we collect massive rural images and invite users to assess their housing quality at scale. Furthermore, a deep learning framework is proposed to automatically and efficiently predict housing quality based on crowd-sourced rural images.
[[2208.06962] InvisibiliTee: Angle-agnostic Cloaking from Person-Tracking Systems with a Tee](http://arxiv.org/abs/2208.06962)
After surveying the privacy concerns induced by person-tracking systems, we propose InvisibiliTee, a black-box adversarial attack method on state-of-the-art human detection models. The method learns printable adversarial patterns for T-shirts that cloak wearers in the physical world in front of person-tracking systems. We design an angle-agnostic learning scheme which utilizes segmentation of the fashion dataset and a geometric warping process so that the adversarial patterns generated are effective in fooling person detectors from all camera angles and for unseen black-box detection models. Empirical results in both digital and physical environments show that with the InvisibiliTee on, person-tracking systems' ability to detect the wearer drops significantly.
[[2208.06481] PRIVEE: A Visual Analytic Workflow for Proactive Privacy Risk Inspection of Open Data](http://arxiv.org/abs/2208.06481)
Open data sets that contain personal information are susceptible to adversarial attacks even when anonymized. By performing low-cost joins on multiple datasets with shared attributes, malicious users of open data portals might get access to information that violates individuals' privacy. However, open data sets are primarily published using a release-and-forget model, whereby data owners and custodians have little to no cognizance of these privacy risks. We address this critical gap by developing a visual analytic solution that enables data defenders to gain awareness about the disclosure risks in local, joinable data neighborhoods. The solution is derived through a design study with data privacy researchers, where we initially play the role of a red team and engage in an ethical data hacking exercise based on privacy attack scenarios. We use this problem and domain characterization to develop a set of visual analytic interventions as a defense mechanism and realize them in PRIVEE, a visual risk inspection workflow that acts as a proactive monitor for data defenders. PRIVEE uses a combination of risk scores and associated interactive visualizations to let data defenders explore vulnerable joins and interpret risks at multiple levels of data granularity. We demonstrate how PRIVEE can help emulate the attack strategies and diagnose disclosure risks through two case studies with data privacy experts.
[[2208.06680] Locating disparities in machine learning](http://arxiv.org/abs/2208.06680)
Machine learning has repeatedly been shown to provide predictions with disparate outcomes, in which subgroups of the population (e.g., defined by age, gender, or other sensitive attributes) are systematically disadvantaged. Previous literature has focused on detecting such disparities through statistical procedures when the sensitive attribute is specified a priori. However, this limits applicability in real-world settings where datasets are high dimensional and, on top of that, sensitive attributes may be unknown. As a remedy, we propose a data-driven framework called Automatic Location of Disparities (ALD) which aims at locating disparities in machine learning. ALD meets several demands from machine learning practice: ALD (1) is applicable to arbitrary machine learning classifiers; (2) operates on different definitions of disparities (e.g., statistical parity or equalized odds); (3) deals with both categorical and continuous predictors; (4) is suitable to handle high-dimensional settings; and (5) even identifies disparities due to intersectionality where disparities arise from complex and multi-way interactions (e.g., age above 60 and female). ALD produces interpretable fairness reports as output. We demonstrate the effectiveness of ALD based on both synthetic and real-world datasets. As a result, ALD helps practitioners and researchers of algorithmic fairness to detect disparities in machine learning algorithms, so that disparate -- or even unfair -- outcomes can be mitigated. Moreover, ALD supports practitioners in conducting algorithmic audits and protecting individuals from discrimination.
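One of the disparity definitions mentioned above, statistical parity, can be made concrete for an intersectional subgroup such as "age above 60 and female". The sketch below only evaluates the metric for a subgroup that is given in advance; it does not reproduce how ALD locates such subgroups automatically. The function names and the records are invented for illustration.

```python
from typing import Sequence


def positive_rate(predictions: Sequence[int]) -> float:
    """Share of positive (1) predictions in a group."""
    return sum(predictions) / len(predictions)


def statistical_parity_difference(predictions: Sequence[int],
                                  in_subgroup: Sequence[bool]) -> float:
    """Positive rate inside the subgroup minus the rate outside it.

    Zero means parity; a negative value means the subgroup receives
    positive predictions less often than everyone else.
    """
    inside = [p for p, g in zip(predictions, in_subgroup) if g]
    outside = [p for p, g in zip(predictions, in_subgroup) if not g]
    return positive_rate(inside) - positive_rate(outside)


# Invented records: (age, gender, classifier prediction).
records = [
    (65, "f", 0), (70, "f", 0), (62, "f", 1), (80, "f", 0),
    (30, "f", 1), (45, "m", 1), (68, "m", 1), (25, "m", 0),
]
predictions = [pred for _, _, pred in records]
# Intersectional subgroup: age above 60 AND female.
in_subgroup = [age > 60 and gender == "f" for age, gender, _ in records]
```

In this invented data the subgroup has a positive rate of 0.25 against 0.75 for everyone else, a difference of -0.5, even though neither "female" nor "above 60" alone shows a gap of that size.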
[[2208.06537] Defense against Backdoor Attacks via Identifying and Purifying Bad Neurons](http://arxiv.org/abs/2208.06537)
The opacity of neural networks leads to their vulnerability to backdoor attacks, where hidden attention of infected neurons is triggered to override normal predictions with attacker-chosen ones. In this paper, we propose a novel backdoor defense method to mark and purify the infected neurons in backdoored neural networks. Specifically, we first define a new metric, called benign salience. By combining the first-order gradient to retain the connections between neurons, benign salience can identify the infected neurons with higher accuracy than the metric commonly used in backdoor defense. Then, a new Adaptive Regularization (AR) mechanism is proposed to assist in purifying these identified infected neurons via fine-tuning. Due to its ability to adapt to different magnitudes of parameters, AR can provide faster and more stable convergence than the common regularization mechanism in neuron purifying. Extensive experimental results demonstrate that our method can erase the backdoor in neural networks with negligible performance degradation.
[[2208.06538] MaskBlock: Transferable Adversarial Examples with Bayes Approach](http://arxiv.org/abs/2208.06538)
The transferability of adversarial examples (AEs) across diverse models is of critical importance for black-box adversarial attacks, where attackers cannot access information about the black-box models. However, crafted AEs typically exhibit poor transferability. In this paper, by regarding the transferability of AEs as the generalization ability of the model, we reveal that vanilla black-box attacks craft AEs by solving a maximum likelihood estimation (MLE) problem. For MLE, the results are likely to be model-specific local optima when the available data is small, which limits the transferability of AEs. By contrast, we re-formulate crafting transferable AEs as a maximum a posteriori probability estimation problem, which is an effective approach to boost the generalization of results with limited available data. Because Bayes posterior inference is commonly intractable, a simple yet effective method called MaskBlock is developed to approximate it. Moreover, we show that the formulated framework is a generalized version of various attack methods. Extensive experiments illustrate that MaskBlock can significantly improve the transferability of crafted adversarial examples by up to about 20%.
[[2208.06918] Gradient Mask: Lateral Inhibition Mechanism Improves Performance in Artificial Neural Networks](http://arxiv.org/abs/2208.06918)
Lateral inhibitory connections have been observed in the cortex of the biological brain, and have been extensively studied in terms of their role in cognitive functions. However, in the vanilla version of backpropagation in deep learning, all gradients (which can be understood to comprise both signal and noise gradients) flow through the network during weight updates. This may lead to overfitting. In this work, inspired by biological lateral inhibition, we propose Gradient Mask, which effectively filters out noise gradients in the process of backpropagation. This allows the learned feature information to be more intensively stored in the network while filtering out noisy or unimportant features. Furthermore, we demonstrate analytically how lateral inhibition in artificial neural networks improves the quality of propagated gradients. A new criterion for gradient quality is proposed which can be used as a measure during training of various convolutional neural networks (CNNs). Finally, we conduct several different experiments to study how Gradient Mask improves the performance of the network both quantitatively and qualitatively. Quantitatively, accuracy in the original CNN architecture, accuracy after pruning, and accuracy after adversarial attacks have shown improvements. Qualitatively, the CNN trained using Gradient Mask has developed saliency maps that focus primarily on the object of interest, which is useful for data augmentation and network interpretability.
[[2208.06984] A Multi-objective Memetic Algorithm for Auto Adversarial Attack Optimization Design](http://arxiv.org/abs/2208.06984)
The phenomenon of adversarial examples has been revealed in various scenarios. Recent studies show that well-designed adversarial defense strategies can improve the robustness of deep learning models against adversarial examples. However, with the rapid development of defense technologies, it also tends to be more difficult to evaluate the robustness of the defended model due to the weak performance of existing manually designed adversarial attacks. To address this challenge, an efficient adversarial attack that, for a given defended model, incurs less computational burden and achieves lower robust accuracy is needed. Therefore, we propose a multi-objective memetic algorithm for auto adversarial attack optimization design, which realizes the automatic search for a near-optimal adversarial attack against defended models. First, a more general mathematical model of auto adversarial attack optimization design is constructed, where the search space includes not only the attacker operations, magnitude, iteration number, and loss functions but also the ways of connecting multiple adversarial attacks. In addition, we develop a multi-objective memetic algorithm combining NSGA-II and local search to solve the optimization problem. Finally, to decrease the evaluation cost during the search, we propose a representative data selection strategy based on sorting the cross-entropy loss values of each image output by the models. Experiments on CIFAR10, CIFAR100, and ImageNet datasets show the effectiveness of our proposed method.
[[2208.06628] CANdito: Improving Payload-based Detection of Attacks on Controller Area Networks](http://arxiv.org/abs/2208.06628)
Over the years, increasingly complex and interconnected vehicles have raised the need for effective and efficient Intrusion Detection Systems (IDSs) for on-board networks. In light of the stringent domain requirements and the heterogeneity of information transmitted on the Controller Area Network (CAN), multiple approaches have been proposed, which work at different abstraction levels and granularities. Among these, RNN-based solutions have received the attention of the research community for their performance and promising results. In this paper, we improve CANnolo, an RNN-based state-of-the-art IDS for CAN, by proposing CANdito, an unsupervised IDS that exploits Long Short-Term Memory autoencoders to detect anomalies through a signal reconstruction process. We evaluate CANdito by measuring its effectiveness against a comprehensive set of synthetic attacks injected in a real-world CAN dataset. We demonstrate the improvement of CANdito with respect to CANnolo on a real-world dataset injected with a comprehensive set of attacks, both in terms of detection and temporal performance.
[[2208.06956] ARIEL: Adversarial Graph Contrastive Learning](http://arxiv.org/abs/2208.06956)
Contrastive learning is an effective unsupervised method in graph representation learning, and the key component of contrastive learning lies in the construction of positive and negative samples. Previous methods usually utilize the proximity of nodes in the graph as the principle. Recently, data augmentation based contrastive learning has shown great power in the visual domain, and some works have extended this method from images to graphs. However, unlike data augmentation on images, data augmentation on graphs is far less intuitive and makes it much harder to provide high-quality contrastive samples, which leaves much space for improvement. In this work, by introducing an adversarial graph view for data augmentation, we propose a simple but effective method, Adversarial Graph Contrastive Learning (ARIEL), to extract informative contrastive samples within reasonable constraints. We develop a new technique called information regularization for stable training and use subgraph sampling for scalability. We generalize our method from node-level contrastive learning to the graph level by treating each graph instance as a supernode. ARIEL consistently outperforms the current graph contrastive learning methods for both node-level and graph-level classification tasks on real-world datasets. We further demonstrate that ARIEL is more robust in the face of adversarial attacks.
[[2208.06416] Uni6Dv2: Noise Elimination for 6D Pose Estimation](http://arxiv.org/abs/2208.06416)
Few prior 6D pose estimation methods use a backbone network to extract features from RGB and depth images, and Uni6D is the pioneer to do so. We find that the primary causes of the performance limitation in Uni6D are Instance-Outside and Instance-Inside noise. Uni6D inevitably introduces Instance-Outside noise from background pixels in the receptive field due to its inherently straightforward pipeline design and ignores the Instance-Inside noise in the input depth data. In this work, we propose a two-step denoising method to handle the aforementioned noise in Uni6D. In the first step, an instance segmentation network is used to crop and mask the instance to remove noise from non-instance regions. In the second step, a lightweight depth denoising module is proposed to calibrate the depth feature before feeding it into the pose regression network. Extensive experiments show that our method, called Uni6Dv2, is able to eliminate the noise effectively and robustly, outperforming Uni6D without sacrificing too much inference efficiency. It also reduces the need for annotated real data that requires costly labeling.
[[2208.06461] Real-Time Accident Detection in Traffic Surveillance Using Deep Learning](http://arxiv.org/abs/2208.06461)
Automatic detection of traffic accidents is an important emerging topic in traffic monitoring systems. Nowadays many urban intersections are equipped with surveillance cameras connected to traffic management systems. Therefore, computer vision techniques can be viable tools for automatic accident detection. This paper presents a new efficient framework for accident detection at intersections for traffic surveillance applications. The proposed framework consists of three hierarchical steps, including efficient and accurate object detection based on the state-of-the-art YOLOv4 method, object tracking based on Kalman filter coupled with the Hungarian algorithm for association, and accident detection by trajectory conflict analysis. A new cost function is applied for object association to accommodate for occlusion, overlapping objects, and shape changes in the object tracking step. The object trajectories are analyzed in terms of velocity, angle, and distance in order to detect different types of trajectory conflicts including vehicle-to-vehicle, vehicle-to-pedestrian, and vehicle-to-bicycle. Experimental results using real traffic video data show the feasibility of the proposed method in real-time applications of traffic surveillance. In particular, trajectory conflicts, including near-accidents and accidents occurring at urban intersections are detected with a low false alarm rate and a high detection rate. The robustness of the proposed framework is evaluated using video sequences collected from YouTube with diverse illumination conditions. The dataset is publicly available at: this http URL
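The tracking step (predict each track, then associate predictions with new detections) can be sketched with two simplifications that are assumptions of this example, not the paper's method: `predict` is a bare constant-velocity update standing in for the Kalman filter's predict step, and `associate` finds the minimum-cost assignment by brute force over permutations instead of the Hungarian algorithm, using plain Euclidean distance instead of the paper's new cost function. Brute force returns the same optimal assignment but is practical only for a handful of objects.

```python
from itertools import permutations
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def predict(position: Point, velocity: Point, dt: float = 1.0) -> Point:
    """Constant-velocity prediction of a track's next position."""
    return (position[0] + velocity[0] * dt, position[1] + velocity[1] * dt)


def associate(predicted: Sequence[Point],
              detections: Sequence[Point]) -> List[int]:
    """Return, for each track, the index of its matched detection.

    Minimises the total Euclidean distance over all one-to-one assignments.
    Assumes there are at least as many detections as tracks.
    """
    best_cost = float("inf")
    best_assignment: Tuple[int, ...] = ()
    for perm in permutations(range(len(detections)), len(predicted)):
        cost = sum(dist(predicted[i], detections[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assignment = cost, perm
    return list(best_assignment)
```

For example, two tracks predicted at (1, 0) and (10, 9) are matched to detections at (0.9, 0.1) and (10.2, 8.9) regardless of the order in which the detector reports them.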
[[2208.06579] Enhanced Vehicle Re-identification for ITS: A Feature Fusion approach using Deep Learning](http://arxiv.org/abs/2208.06579)
In recent years, the development of robust Intelligent Transportation Systems (ITS) has been pursued across the globe to provide better traffic efficiency by reducing frequent traffic problems. As an application of ITS, vehicle re-identification has gained ample interest in the domain of computer vision and robotics. Convolutional neural network (CNN) based methods have been developed to perform vehicle re-identification and to address key challenges such as occlusion, illumination change, and scale. The advancement of transformers in computer vision has opened an opportunity to explore the re-identification process further to enhance performance. In this paper, a framework is developed to perform the re-identification of vehicles across CCTV cameras. To perform re-identification, the proposed framework fuses the vehicle representations learned using a CNN and a transformer model. The framework is tested on a dataset that contains 81 unique vehicle identities observed across 20 CCTV cameras. In the experiments, the fused vehicle re-identification framework yields an mAP of 61.73%, which is significantly better than that of the standalone CNN or transformer model.
[[2208.06674] DS-MVSNet: Unsupervised Multi-view Stereo via Depth Synthesis](http://arxiv.org/abs/2208.06674)
In recent years, supervised and unsupervised learning-based multi-view stereo (MVS) methods have achieved excellent performance compared with traditional methods. However, these methods only use the probability volume computed by cost volume regularization to predict reference depths, and this approach cannot mine enough information from the probability volume. Furthermore, the unsupervised methods usually rely on two-step training or additional inputs, which makes the procedure more complicated. In this paper, we propose DS-MVSNet, an end-to-end unsupervised MVS structure with source depth synthesis. To mine the information in the probability volume, we creatively synthesize the source depths by splattering the probability volume and depth hypotheses to source views. Meanwhile, we propose adaptive Gaussian sampling and an improved adaptive bins sampling approach that improve the accuracy of the depth hypotheses. In addition, we utilize the source depths to render the reference images and propose a depth consistency loss and a depth smoothness loss. These can provide additional guidance according to photometric and geometric consistency in different views without additional inputs. Finally, we conduct a series of experiments on the DTU dataset and the Tanks & Temples dataset that demonstrate the efficiency and robustness of our DS-MVSNet compared with the state-of-the-art methods.
[[2208.06811] Contrastive Learning for Joint Normal Estimation and Point Cloud Filtering](http://arxiv.org/abs/2208.06811)
Point cloud filtering and normal estimation are two fundamental research problems in the 3D field. Existing methods usually perform normal estimation and filtering separately and often show sensitivity to noise and/or inability to preserve sharp geometric features such as corners and edges. In this paper, we propose a novel deep learning method to jointly estimate normals and filter point clouds. We first introduce a 3D patch based contrastive learning framework, with noise corruption as an augmentation, to train a feature encoder capable of generating faithful representations of point cloud patches while remaining robust to noise. These representations are consumed by a simple regression network and supervised by a novel joint loss, simultaneously estimating point normals and displacements that are used to filter the patch centers. Experimental results show that our method well supports the two tasks simultaneously and preserves sharp features and fine details. It generally outperforms state-of-the-art techniques on both tasks.
[[2208.06866] HyP$^2$ Loss: Beyond Hypersphere Metric Space for Multi-label Image Retrieval](http://arxiv.org/abs/2208.06866)
Image retrieval has become an increasingly appealing technique with broad multimedia application prospects, where deep hashing serves as the dominant branch towards low storage and efficient retrieval. In this paper, we carried out in-depth investigations on metric learning in deep hashing for establishing a powerful metric space in multi-label scenarios, where the pair loss suffers from high computational overhead and convergence difficulty, while the proxy loss is theoretically incapable of expressing the profound label dependencies and exhibits conflicts in the constructed hypersphere space. To address the problems, we propose a novel metric learning framework with Hybrid Proxy-Pair Loss (HyP$^2$ Loss) that constructs an expressive metric space with efficient training complexity w.r.t. the whole dataset. The proposed HyP$^2$ Loss focuses on optimizing the hypersphere space by learnable proxies and excavating data-to-data correlations of irrelevant pairs, which integrates the sufficient data correspondence of pair-based methods and the high efficiency of proxy-based methods. Extensive experiments on four standard multi-label benchmarks show that the proposed method outperforms the state-of-the-art, is robust across different hash bits and achieves significant performance gains with faster, more stable convergence. Our code is available at https://github.com/JerryXu0129/HyP2-Loss.
[[2208.06882] CoShNet: A Hybrid Complex Valued Neural Network using Shearlets](http://arxiv.org/abs/2208.06882)
In a hybrid neural network, the expensive convolutional layers are replaced by a non-trainable fixed transform with a great reduction in parameters. In previous works, good results were obtained by replacing the convolutions with wavelets. However, wavelet-based hybrid networks inherit the wavelet's lack of vanishing moments along curves and its axis bias. We propose to use shearlets, with their robust support for important image features like edges, ridges and blobs. The resulting network is called Complex Shearlets Network (CoShNet). It was tested on Fashion-MNIST against ResNet-50 and ResNet-18, obtaining 92.2% versus 90.7% and 91.8% respectively. The proposed network has 49.9k parameters versus 11.18M for ResNet-18 and uses 52 times fewer FLOPs. Finally, we trained in under 20 epochs versus the 200 epochs required by ResNet and did not need any hyperparameter tuning or regularization.
Code: https://github.com/Ujjawal-K-Panchal/coshnet
[[2208.06966] STAR-GNN: Spatial-Temporal Video Representation for Content-based Retrieval](http://arxiv.org/abs/2208.06966)
We propose a video feature representation learning framework called STAR-GNN, which applies a pluggable graph neural network component on a multi-scale lattice feature graph. The essence of STAR-GNN is to exploit both the temporal dynamics and spatial contents as well as visual connections between regions at different scales in the frames. It models a video with a lattice feature graph in which the nodes represent regions of different granularity, with weighted edges that represent the spatial and temporal links. The contextual nodes are aggregated simultaneously by graph neural networks with parameters trained with retrieval triplet loss. In the experiments, we show that STAR-GNN effectively implements a dynamic attention mechanism on video frame sequences, resulting in the emphasis for dynamic and semantically rich content in the video, and is robust to noise and redundancies. Empirical results show that STAR-GNN achieves state-of-the-art performance for Content-Based Video Retrieval.
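The retrieval triplet loss mentioned above can be illustrated with a minimal sketch. It assumes the standard margin-based formulation on Euclidean distances; the margin value and the toy embeddings are hypothetical, and the abstract does not specify which variant STAR-GNN uses.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard margin triplet loss: pull the positive closer than the negative.

    The margin of 0.2 is an illustrative default, not a value from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D video embeddings.
query = [0.0, 0.0]
relevant = [0.0, 1.0]    # distance 1 from the query
irrelevant = [3.0, 4.0]  # distance 5 from the query

satisfied = triplet_loss(query, relevant, irrelevant)  # relevant item is already closer
violated = triplet_loss(query, irrelevant, relevant)   # roles swapped, so loss is positive
```

The loss is zero once the relevant item is closer than the irrelevant one by at least the margin, so gradients come only from triplets that are still ranked incorrectly or too tightly.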
[[2208.06980] Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers](http://arxiv.org/abs/2208.06980)
With the growing adoption of deep learning for on-device TinyML applications, there has been an ever-increasing demand for more efficient neural network backbones optimized for the edge. Recently, the introduction of attention condenser networks has resulted in low-footprint, highly-efficient, self-attention neural networks that strike a strong balance between accuracy and speed. In this study, we introduce a new faster attention condenser design called double-condensing attention condensers that enable more condensed feature embedding. We further employ a machine-driven design exploration strategy that imposes best practices design constraints for greater efficiency and robustness to produce the macro-micro architecture constructs of the backbone. The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor when compared to several other state-of-the-art efficient backbones (>10X faster than FB-Net C at higher accuracy and speed) while having a small model size (>1.47X smaller than OFA-62 at higher speed and similar accuracy) and strong accuracy (1.1% higher top-1 accuracy than MobileViT XS on ImageNet at higher speed). These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.
[[2208.07142] Perspective Reconstruction of Human Faces by Joint Mesh and Landmark Regression](http://arxiv.org/abs/2208.07142)
Even though 3D face reconstruction has achieved impressive progress, most orthogonal projection-based face reconstruction methods cannot achieve accurate and consistent reconstruction results when the face is very close to the camera, due to the distortion under perspective projection. In this paper, we propose to simultaneously reconstruct the 3D face mesh in world space and predict 2D face landmarks on the image plane to address the problem of perspective 3D face reconstruction. Based on the predicted 3D vertices and 2D landmarks, the 6DoF (6 Degrees of Freedom) face pose can be easily estimated by a PnP solver to represent perspective projection. Our approach achieves 1st place on the leaderboard of the ECCV 2022 WCPA challenge and our model is visually robust under different identities, expressions and poses. The training code and models are released to facilitate future research.
[[2208.06458] LM-CORE: Language Models with Contextually Relevant External Knowledge](http://arxiv.org/abs/2208.06458)
Large transformer-based pre-trained language models have achieved impressive performance on a variety of knowledge-intensive tasks and can capture factual knowledge in their parameters. We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements. We posit that a more efficient alternative is to provide explicit access to contextually relevant structured knowledge to the model and train it to use that knowledge. We present LM-CORE -- a general framework to achieve this -- that allows *decoupling* of the language model training from the external knowledge source and allows the latter to be updated without affecting the already trained model. Experimental results show that LM-CORE, having access to external knowledge, achieves significant and robust outperformance over state-of-the-art knowledge-enhanced language models on knowledge probing tasks; can effectively handle knowledge updates; and performs well on two downstream tasks. We also present a thorough error analysis highlighting the successes and failures of LM-CORE.
[[2208.06838] Reduced Implication-bias Logic Loss for Neuro-Symbolic Learning](http://arxiv.org/abs/2208.06838)
Integrating logical reasoning and machine learning by approximating logical inference with differentiable operators is a widely used technique in Neuro-Symbolic systems. However, some differentiable operators can introduce a significant bias during backpropagation and degrade the performance of Neuro-Symbolic learning. In this paper, we reveal that this bias, named *Implication Bias*, is common in loss functions derived from fuzzy logic operators. Furthermore, we propose a simple yet effective method to transform the biased loss functions into *Reduced Implication-bias Logic Loss (RILL)* to address the above problem. Empirical study shows that RILL can achieve significant improvements compared with the biased logic loss functions, especially when the knowledge base is incomplete, and remains more robust than the compared methods when labelled data is insufficient.
[[2208.06979] DuETA: Traffic Congestion Propagation Pattern Modeling via Efficient Graph Learning for ETA Prediction at Baidu Maps](http://arxiv.org/abs/2208.06979)
Estimated time of arrival (ETA) prediction, also known as travel time estimation, is a fundamental task for a wide range of intelligent transportation applications, such as navigation, route planning, and ride-hailing services. To accurately predict the travel time of a route, it is essential to take into account both contextual and predictive factors, such as spatial-temporal interaction, driving behavior, and traffic congestion propagation inference. The ETA prediction models previously deployed at Baidu Maps have addressed the factors of spatial-temporal interaction (ConSTGAT) and driving behavior (SSML). In this work, we focus on modeling traffic congestion propagation patterns to improve ETA performance. Traffic congestion propagation pattern modeling is challenging, and it requires accounting for impact regions over time and cumulative effect of delay variations over time caused by traffic events on the road network. In this paper, we present a practical industrial-grade ETA prediction framework named DuETA. Specifically, we construct a congestion-sensitive graph based on the correlations of traffic patterns, and we develop a route-aware graph transformer to directly learn the long-distance correlations of the road segments. This design enables DuETA to capture the interactions between the road segment pairs that are spatially distant but highly correlated with traffic conditions. Extensive experiments are conducted on large-scale, real-world datasets collected from Baidu Maps. Experimental results show that ETA prediction can significantly benefit from the learned traffic congestion propagation patterns. In addition, DuETA has already been deployed in production at Baidu Maps, serving billions of requests every day. This demonstrates that DuETA is an industrial-grade and robust solution for large-scale ETA prediction services.
[[2208.06616] Self-supervised Contrastive Representation Learning for Semi-supervised Time-Series Classification](http://arxiv.org/abs/2208.06616)
Learning time-series representations when only unlabeled data or few labeled samples are available can be a challenging task. Recently, contrastive self-supervised learning has shown great improvement in extracting useful representations from unlabeled data via contrasting different augmented views of data. In this work, we propose a novel Time-Series representation learning framework via Temporal and Contextual Contrasting (TS-TCC) that learns representations from unlabeled data with contrastive learning. Specifically, we propose time-series specific weak and strong augmentations and use their views to learn robust temporal relations in the proposed temporal contrasting module, besides learning discriminative representations by our proposed contextual contrasting module. Additionally, we conduct a systematic study of time-series data augmentation selection, which is a key part of contrastive learning. We also extend TS-TCC to the semi-supervised learning settings and propose a Class-Aware TS-TCC (CA-TCC) that benefits from the available few labeled data to further improve representations learned by TS-TCC. Specifically, we leverage robust pseudo labels produced by TS-TCC to realize class-aware contrastive loss. Extensive experiments show that the linear evaluation of the features learned by our proposed framework performs comparably with the fully supervised training. Additionally, our framework shows high efficiency in few labeled data and transfer learning scenarios. The code is publicly available at https://github.com/emadeldeen24/TS-TCC.
[[2208.06988] IRL with Partial Observations using the Principle of Uncertain Maximum Entropy](http://arxiv.org/abs/2208.06988)
The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible while constrained to match empirically estimated feature expectations. However, in many real-world applications that use noisy sensors computing the feature expectations may be challenging due to partial observation of the relevant model variables. For example, a robot performing apprenticeship learning may lose sight of the agent it is learning from due to environmental occlusion. We show that in generalizing the principle of maximum entropy to these types of scenarios we unavoidably introduce a dependency on the learned model to the empirical feature expectations. We introduce the principle of uncertain maximum entropy and present an expectation-maximization based solution generalized from the principle of latent maximum entropy. Finally, we experimentally demonstrate the improved robustness to noisy data offered by our technique in a maximum causal entropy inverse reinforcement learning domain.
[[2208.06411] SFF-DA: Spatiotemporal Feature Fusion for Detecting Anxiety Nonintrusively](http://arxiv.org/abs/2208.06411)
Early detection of anxiety disorders is essential to reduce the suffering of people with mental disorders and to improve treatment outcomes. Anxiety screening based on the mHealth platform is of particular practical value in improving screening efficiency and reducing screening costs. In practice, differences in mobile devices in subjects' physical and mental evaluations and the problems faced with uneven data quality and small sample sizes of data in the real world have made existing methods ineffective. Therefore, we propose a framework based on spatiotemporal feature fusion for detecting anxiety nonintrusively. To reduce the impact of uneven data quality, we constructed a feature extraction network based on "3DCNN+LSTM" and fused spatiotemporal features of facial behavior and noncontact physiology. Moreover, we designed a similarity assessment strategy to solve the problem that the small sample size of data leads to a decline in model accuracy. Our framework was validated with our crew dataset from the real world and two public datasets, UBFC-PHYS and SWELL-KW. The experimental results show that the overall performance of our framework was better than that of the state-of-the-art comparison methods.
[[2208.06615] A Unified Two-Stage Group Semantics Propagation and Contrastive Learning Network for Co-Saliency Detection](http://arxiv.org/abs/2208.06615)
Co-saliency detection (CoSOD) aims at discovering the repetitive salient objects from multiple images. Two primary challenges are group semantics extraction and noise object suppression. In this paper, we present a unified Two-stage grOup semantics PropagatIon and Contrastive learning NETwork (TopicNet) for CoSOD. TopicNet can be decomposed into two substructures, including a two-stage group semantics propagation module (TGSP) to address the first challenge and a contrastive learning module (CLM) to address the second challenge. Concretely, for TGSP, we design an image-to-group propagation module (IGP) to capture the consensus representation of intra-group similar features and a group-to-pixel propagation module (GPP) to build the relevancy of consensus representation. For CLM, with the design of positive samples, the semantic consistency is enhanced. With the design of negative samples, the noise objects are suppressed. Experimental results on three prevailing benchmarks reveal that TopicNet outperforms other competitors in terms of various evaluation metrics.
[[2208.06756] Predicting skull fractures via CNN with classification algorithms](http://arxiv.org/abs/2208.06756)
Computed Tomography (CT) images have become quite important for diagnosing diseases. A CT scan slice contains a vast amount of data that may not be properly examined with the requisite precision and speed using normal visual inspection. A computer-assisted skull fracture classification expert system is needed to assist physicians. Convolutional Neural Networks (CNNs) are the most extensively used deep learning models for image categorization since they most often outperform other models in terms of accuracy and results. Several CNN architectures were developed, tested, and compared. ResNet50, used for feature extraction and combined with a gradient boosted decision tree classifier to categorize skull fractures from brain CT scans into three fracture categories, had the best overall results: an F1-score of 96%, a Hamming score of 95%, a balanced accuracy of 94%, and a ROC AUC of 96%.
[[2208.06885] Global Priors Guided Modulation Network for Joint Super-Resolution and Inverse Tone-Mapping](http://arxiv.org/abs/2208.06885)
Joint super-resolution and inverse tone-mapping (SR-ITM) aims to enhance the visual quality of videos that have quality deficiencies in resolution and dynamic range. This problem arises when using 4K high dynamic range (HDR) TVs to watch a low-resolution standard dynamic range (LR SDR) video. Previous methods that rely on learning local information typically cannot do well in preserving color conformity and long-range structural similarity, resulting in unnatural color transition and texture artifacts. In order to tackle these challenges, we propose a global priors guided modulation network (GPGMNet) for joint SR-ITM. In particular, we design a global priors extraction module (GPEM) to extract color conformity prior and structural similarity prior that are beneficial for ITM and SR tasks, respectively. To further exploit the global priors and preserve spatial information, we devise multiple global priors guided spatial-wise modulation blocks (GSMBs) with a few parameters for intermediate feature modulation, in which the modulation parameters are generated by the shared global priors and the spatial features map from the spatial pyramid convolution block (SPCB). With these elaborate designs, the GPGMNet can achieve higher visual quality with lower computational complexity. Extensive experiments demonstrate that our proposed GPGMNet is superior to the state-of-the-art methods. Specifically, our proposed model exceeds the state-of-the-art by 0.64 dB in PSNR, with 69% fewer parameters and 3.1× speedup. The code will be released soon.
[[2208.07010] Automatic Landmark Detection and Registration of Brain Cortical Surfaces via Quasi-Conformal Geometry and Convolutional Neural Networks](http://arxiv.org/abs/2208.07010)
In medical imaging, surface registration is extensively used for performing systematic comparisons between anatomical structures, with a prime example being the highly convoluted brain cortical surfaces. To obtain a meaningful registration, a common approach is to identify prominent features on the surfaces and establish a low-distortion mapping between them with the feature correspondence encoded as landmark constraints. Prior registration works have primarily focused on using manually labeled landmarks and solving highly nonlinear optimization problems, which are time-consuming and hence hinder practical applications. In this work, we propose a novel framework for the automatic landmark detection and registration of brain cortical surfaces using quasi-conformal geometry and convolutional neural networks. We first develop a landmark detection network (LD-Net) that allows for the automatic extraction of landmark curves given two prescribed starting and ending points based on the surface geometry. We then utilize the detected landmarks and quasi-conformal theory for achieving the surface registration. Specifically, we develop a coefficient prediction network (CP-Net) for predicting the Beltrami coefficients associated with the desired landmark-based registration and a mapping network called the disk Beltrami solver network (DBS-Net) for generating quasi-conformal mappings from the predicted Beltrami coefficients, with the bijectivity guaranteed by quasi-conformal theory. Experimental results are presented to demonstrate the effectiveness of our proposed framework. Altogether, our work paves a new way for surface-based morphometry and medical shape analysis.
[[2208.07011] Automatic Controlling Fish Feeding Machine using Feature Extraction of Nutriment and Ripple Behavior](http://arxiv.org/abs/2208.07011)
Controlling a fish feeding machine is a challenging problem because it currently relies on experienced fishermen, who control feeding adequately but based on assumption. To build a robust method for practical application, we propose automatic control of a fish feeding machine based on computer vision, combining nutriment counting and ripple behavior estimation using regression and textural features, respectively. To count the nutriments, we apply object detection and tracking methods to recognize the nutriments moving to the sea surface. Object tracking is currently an active and challenging research problem in computer vision. Unfortunately, robust tracking of multiple small objects with dense and complex relationships remains an unsolved problem in the aquaculture field, where many creatures appear in the scene. Based on the number of nutriments and the ripple behavior, we can control a fish feeding machine that consistently performs well in a real environment. The proposed method shows agreement for automatic fish feeding control through the activation graphs and textural features of ripple behavior. Our tracking method tracks the nutriments in the next frame more precisely than other methods. In terms of computational time, the proposed method reaches 3.86 fps, while other methods run below 1.93 fps. Quantitative evaluation indicates that the proposed method is valuable for aquaculture fish farms and widely applicable to real environments.
[[2208.07070] A Vision Transformer-Based Approach to Bearing Fault Classification via Vibration Signals](http://arxiv.org/abs/2208.07070)
Rolling bearings are the most crucial components of rotating machinery. Identifying defective bearings in a timely manner may prevent the malfunction of an entire machinery system. The mechanical condition monitoring field has entered the big data phase as a result of the fast advancement of machine parts. When working with large amounts of data, the manual feature extraction approach has the drawback of being inefficient and inaccurate. Data-driven methods like the Deep Learning method have been successfully used in recent years for mechanical intelligent fault detection. Convolutional neural networks (CNNs) were mostly used in earlier research to detect and identify bearing faults. The CNN model, however, suffers from the drawback of having trouble managing fault-time information, which results in a lack of classification results. In this study, bearing defects have been classified using a state-of-the-art Vision Transformer (ViT). Bearing defects were classified using Case Western Reserve University (CWRU) bearing failure laboratory experimental data. The research took into account 13 distinct kinds of defects under 0-load situations in addition to normal bearing conditions. Using the short-time Fourier transform (STFT), the vibration signals were converted into 2D time-frequency images. The 2D time-frequency images are used as input parameters for the ViT. The model achieved an overall accuracy of 98.8%.
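The step that turns a vibration signal into a 2D time-frequency image can be sketched as follows. This is a minimal pure-Python short-time Fourier transform, assuming a Hann window, a frame length of 64 samples and a hop of 32 samples. These parameters and the synthetic 128 Hz tone are illustrative choices, not the settings or data used in the paper.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a time-major magnitude spectrogram: one row of bins per frame."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of the windowed segment at frequency bin k.
            acc = sum(
                segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


# Synthetic stand-in for a vibration signal: a 128 Hz tone sampled at 1024 Hz.
sample_rate = 1024
tone = [math.sin(2 * math.pi * 128 * t / sample_rate) for t in range(512)]
image = stft_magnitude(tone)

# Bin k corresponds to k * sample_rate / frame_len Hz, so 128 Hz falls in bin 8.
peak_bins = [row.index(max(row)) for row in image]
```

In practice a library FFT would replace the naive DFT, and the resulting magnitude array would typically be log-scaled and resized before being split into patches for a ViT.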
[[2208.06961] A Hybrid Model of Classification and Generation for Spatial Relation Extraction](http://arxiv.org/abs/2208.06961)
Extracting spatial relations from texts is a fundamental task for natural language understanding and previous studies only regard it as a classification task, ignoring those spatial relations with null roles due to their poor information. To address the above issue, we first view spatial relation extraction as a generation task and propose a novel hybrid model HMCGR for this task. HMCGR contains a generation and a classification model, while the former can generate those null-role relations and the latter can extract those non-null-role relations to complement each other. Moreover, a reflexivity evaluation mechanism is applied to further improve the accuracy based on the reflexivity principle of spatial relation. Experimental results on SpaceEval show that HMCGR outperforms the SOTA baselines significantly.
[[2208.07130] Exploring Generative Models for Joint Attribute Value Extraction from Product Titles](http://arxiv.org/abs/2208.07130)
Attribute values of the products are an essential component in any e-commerce platform. Attribute Value Extraction (AVE) deals with extracting the attributes of a product and their values from its title or description. In this paper, we propose to tackle the AVE task using generative frameworks. We present two types of generative paradigms, namely, word sequence-based and positional sequence-based, by formulating the AVE task as a generation problem. We conduct experiments on two datasets where the generative approaches achieve the new state-of-the-art results. This shows that we can use the proposed framework for AVE tasks without additional tagging or task-specific model design.
[[2208.07017] Prospects of federated machine learning in fluid dynamics](http://arxiv.org/abs/2208.07017)
Physics-based models have been mainstream in fluid dynamics for developing predictive models. In recent years, machine learning has offered a renaissance to the fluid community due to the rapid developments in data science, processing units, neural network based technologies, and sensor adaptations. So far, in many applications in fluid dynamics, machine learning approaches have mostly focused on a standard process that requires centralizing the training data on a designated machine or in a data center. In this letter, we present a federated machine learning approach that enables localized clients to collaboratively learn an aggregated and shared predictive model while keeping all the training data on each edge device. We demonstrate the feasibility and prospects of such a decentralized learning approach with an effort to forge a deep learning surrogate model for reconstructing spatiotemporal fields. Our results indicate that federated machine learning might be a viable tool for designing highly accurate predictive decentralized digital twins relevant to fluid dynamics.
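The aggregation step described above, in which clients share model parameters rather than data, can be sketched with the widely used federated averaging rule. The abstract does not name the aggregation rule, so the sample-size weighting below is an assumption, and the client parameter vectors are invented.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes: number of local training samples held by each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical example: two clients, two parameters each.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]  # the second client holds three times as much data
global_model = federated_average(clients, sizes)
```

Only the parameter vectors cross the network in this scheme; the raw training fields stay on each edge device.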
[[2208.07194] An Efficient and Reliable Asynchronous Federated Learning Scheme for Smart Public Transportation](http://arxiv.org/abs/2208.07194)
Machine Learning (ML) is a distributed approach for training predictive models on the Internet of Vehicles (IoV) to enable smart public transportation. Since the traffic conditions change over time, the ML model that predicts traffic flows and the time passengers wait at stops must be updated continuously and efficiently. Federated learning (FL) is a distributed machine learning scheme that allows vehicles to receive continuous model updates without having to upload raw data to the cloud and wait for models to be trained. However, FL in smart public transportation is vulnerable to poisoning or DDoS attacks since vehicles travel in public. Besides, due to device heterogeneity and imbalanced data distributions, the synchronized aggregation strategy that collects local models from specific vehicles before aggregation is inefficient. Although Asynchronous Federated Learning (AFL) schemes have been developed to improve efficiency by aggregating local models as soon as they are received, the stale local models remain unreasonably weighted, resulting in poor learning performance. To enable smarter public transportation, this paper offers a blockchain-based asynchronous federated learning scheme with a dynamic scaling factor (DBAFL). Specifically, the novel committee-based consensus algorithm for blockchain improves reliability at the lowest possible cost of time. Meanwhile, the devised dynamic scaling factor allows AFL to assign reasonable weight to stale local models. Extensive experiments conducted on heterogeneous devices validate the superior learning performance, efficiency, and reliability of DBAFL.
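The idea of weighting stale local models in asynchronous aggregation can be sketched as follows. The decay function here is a hypothetical placeholder chosen for illustration; it is not the dynamic scaling factor defined by DBAFL, whose form the abstract does not give.

```python
def staleness_weight(staleness, base=0.5, decay=1.0):
    """Hypothetical mixing weight that shrinks as a local model gets staler.

    staleness: number of global updates since the client fetched its model.
    """
    return base / (1.0 + decay * staleness)


def async_update(global_model, local_model, staleness):
    """Mix one incoming local model into the global model as soon as it arrives."""
    alpha = staleness_weight(staleness)
    return [(1.0 - alpha) * g + alpha * l for g, l in zip(global_model, local_model)]


# A fresh update moves the global model further than a stale one.
fresh = async_update([0.0, 0.0], [2.0, 4.0], staleness=0)
stale = async_update([0.0, 0.0], [2.0, 4.0], staleness=4)
```

With a fixed weight instead, a model trained against a long-outdated global state would pull the aggregate as strongly as a fresh one, which is the weighting problem the abstract describes.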
[[2208.07204] USB: A Unified Semi-supervised Learning Benchmark](http://arxiv.org/abs/2208.07204)
Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. However, currently, popular SSL evaluation protocols are often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address the above issues, we construct a Unified SSL Benchmark (USB) by selecting 15 diverse, challenging, and comprehensive tasks from CV, natural language processing (NLP), and audio processing (Audio), on which we systematically evaluate dominant SSL methods, and also open-source a modular and extensible codebase for fair evaluation on these SSL methods. We further provide pre-trained versions of the state-of-the-art neural models for CV tasks to make the cost affordable for further tuning. USB enables the evaluation of a single SSL algorithm on more tasks from multiple domains but with less cost. Specifically, on a single NVIDIA V100, only 37 GPU days are required to evaluate FixMatch on 15 tasks in USB while 335 GPU days (279 GPU days on 4 CV datasets except for ImageNet) are needed on 5 CV tasks with the typical protocol.
[[2208.06648] Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness](http://arxiv.org/abs/2208.06648)
Biases have marked medical history, leading to unequal care affecting marginalised groups. The patterns of missingness in observational data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is too often a forgotten preprocessing step. At best, practitioners guide imputation choice by optimising overall performance, ignoring how this preprocessing can reinforce inequities. Our work questions this choice by studying how imputation affects downstream algorithmic fairness. First, we provide a structured view of the relationship between clinical presence mechanisms and group-specific missingness patterns. Then, through simulations and real-world experiments, we demonstrate that the imputation choice influences marginalised group performance and that no imputation strategy consistently reduces disparities. Importantly, our results show that current practices may endanger health equity as similarly performing imputation strategies at the population level can affect marginalised groups in different ways. Finally, we propose recommendations for mitigating inequity stemming from a neglected step of the machine learning pipeline.
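How an imputation choice can affect groups differently under group-specific missingness can be illustrated with a toy sketch. All values below are synthetic and chosen only to make the effect visible; the comparison of population-mean and group-mean imputation is an assumption for illustration and does not reproduce the paper's experiments or imply that either strategy is preferable in general.

```python
from statistics import mean

# Synthetic records: (group, true_value, is_missing). Group B is mostly unobserved.
records = [
    ("A", 10.0, False), ("A", 12.0, False), ("A", 11.0, False), ("A", 9.0, True),
    ("B", 20.0, False), ("B", 22.0, True), ("B", 18.0, True), ("B", 21.0, True),
]


def imputation_error_by_group(records, per_group):
    """Mean absolute error of mean imputation on the missing entries, per group."""
    observed = [(g, v) for g, v, missing in records if not missing]
    overall_mean = mean(v for _, v in observed)
    group_means = {
        g: mean(v for gg, v in observed if gg == g) for g in {g for g, _ in observed}
    }
    errors = {}
    for g, v, missing in records:
        if missing:
            fill = group_means[g] if per_group else overall_mean
            errors.setdefault(g, []).append(abs(fill - v))
    return {g: mean(errs) for g, errs in errors.items()}


population_errors = imputation_error_by_group(records, per_group=False)
group_errors = imputation_error_by_group(records, per_group=True)
```

In this toy setting the population mean is dominated by group A's observed values, so the imputed entries for group B are further from their true values than those for group A.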
[[2208.07211] RuDi: Explaining Behavior Sequence Models by Automatic Statistics Generation and Rule Distillation](http://arxiv.org/abs/2208.07211)
Risk scoring systems, which assign risk scores to users according to their behavior sequences, have been widely deployed in many applications. Though many deep learning methods with sophisticated designs have achieved promising results, their black-box nature hinders their application due to fairness, explainability, and compliance considerations. Rule-based systems are considered reliable in these sensitive scenarios. However, building a rule system is labor-intensive. Experts need to find informative statistics from user behavior sequences, design rules based on statistics and assign weights to each rule. In this paper, we bridge the gap between effective but black-box models and transparent rule models. We propose a two-stage method, RuDi, that distills the knowledge of black-box teacher models into rule-based student models. We design a Monte Carlo tree search-based statistics generation method that can provide a set of informative statistics in the first stage. Then statistics are composed into logical rules with our proposed neural logical networks by mimicking the outputs of teacher models. We evaluate RuDi on three real-world public datasets and an industrial dataset to demonstrate its effectiveness.
[[2208.06557] A Novel Regularization Approach to Fair ML](http://arxiv.org/abs/2208.06557)
A number of methods have been introduced for the fair ML issue, most of them complex and many of them very specific to the underlying ML methodology. Here we introduce a new approach that is simple, easily explained, and potentially applicable to a number of standard ML algorithms. Explicitly Deweighted Features (EDF) reduces the impact of each feature among the proxies of sensitive variables, allowing a different amount of deweighting to be applied to each such feature. The user specifies the deweighting hyperparameters to achieve a given point in the Utility/Fairness tradeoff spectrum. We also introduce a new, simple criterion for evaluating the degree of protection afforded by any fair ML method.
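As a rough illustration of per-feature deweighting, the sketch below fits a linear model with a ridge-style penalty whose strength is set separately for each feature. This is an assumption about one possible realization for linear models, not the paper's implementation; the function name `deweighted_least_squares`, the synthetic data, and the penalty values are hypothetical.

```python
import numpy as np

def deweighted_least_squares(X, y, penalties):
    """Solve min_b ||y - X b||^2 + sum_j penalties[j] * b[j]**2.

    A larger penalties[j] shrinks coefficient j more strongly toward zero;
    penalties[j] = 0 leaves feature j unpenalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    D = np.diag(np.asarray(penalties, dtype=float))
    # Closed-form solution of the penalized normal equations.
    return np.linalg.solve(X.T @ X + D, X.T @ y)

rng = np.random.default_rng(0)
n = 500
x0 = rng.normal(size=n)       # ordinary feature
proxy = rng.normal(size=n)    # stand-in for a proxy of a sensitive variable
X = np.column_stack([x0, proxy])
y = 1.0 * x0 + 1.0 * proxy + 0.1 * rng.normal(size=n)

plain = deweighted_least_squares(X, y, [0.0, 0.0])
deweighted = deweighted_least_squares(X, y, [0.0, 5000.0])
print("no deweighting:  ", plain)
print("proxy deweighted:", deweighted)
```

Raising the penalty on the proxy column reduces its coefficient while leaving the other feature largely unaffected, which is the kind of utility/fairness dial the abstract describes.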
[[2208.06894] The SVD of Convolutional Weights: A CNN Interpretability Framework](http://arxiv.org/abs/2208.06894)
Deep neural networks used for image classification often use convolutional filters to extract distinguishing features before passing them to a linear classifier. Most interpretability literature focuses on providing semantic meaning to convolutional filters to explain a model's reasoning process and confirm its use of relevant information from the input domain. Fully connected layers can be studied by decomposing their weight matrices using a singular value decomposition, in effect studying the correlations between the rows in each matrix to discover the dynamics of the map. In this work we define a singular value decomposition for the weight tensor of a convolutional layer, which provides an analogous understanding of the correlations between filters, exposing the dynamics of the convolutional map. We validate our definition using recent results in random matrix theory. By applying the decomposition across the linear layers of an image classification network we suggest a framework against which interpretability methods might be applied using hypergraphs to model class separation. Rather than looking to the activations to explain the network, we use the singular vectors with the greatest corresponding singular values for each linear layer to identify those features most important to the network. We illustrate our approach with examples and introduce the DeepDataProfiler library, the analysis tool used for this study.
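The fully connected case mentioned above can be sketched with NumPy. For the convolutional case, the sketch simply flattens each filter into a row before decomposing; that unfolding is an assumption made for illustration and may differ from the definition the paper gives. The shapes and random weights are hypothetical, and the DeepDataProfiler library is not used here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fully connected layer: weight matrix of shape (out_features, in_features).
W = rng.normal(size=(8, 16))
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# np.linalg.svd returns singular values in descending order, so the first
# k singular vectors belong to the k largest singular values.
k = 3
top_input_directions = Vt[:k]        # shape (k, in_features)
top_output_directions = U[:, :k]     # shape (out_features, k)

# Convolutional layer: weight tensor (out_channels, in_channels, kh, kw).
# Assumption: flatten each filter into one row, then decompose the matrix.
K = rng.normal(size=(8, 4, 3, 3))
K_mat = K.reshape(K.shape[0], -1)    # shape (8, 36)
Uc, Sc, Vct = np.linalg.svd(K_mat, full_matrices=False)

print("largest singular values (dense):", S[:k])
print("largest singular values (conv): ", Sc[:k])
```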
[[2208.06436] RandomSCM: interpretable ensembles of sparse classifiers tailored for omics data](http://arxiv.org/abs/2208.06436)
Background: Understanding the relationship between the Omics and the phenotype is a central problem in precision medicine. The high dimensionality of metabolomics data challenges learning algorithms in terms of scalability and generalization. Most learning algorithms do not produce interpretable models. -- Method: We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules. -- Results: Applications on metabolomics data show that it produces models that achieve high predictive performance. The interpretability of the models makes them useful for biomarker discovery and pattern discovery in high-dimensional data.
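To show what an ensemble of conjunctions of decision rules can look like, the sketch below evaluates hand-written threshold rules and combines several conjunctions by majority vote. It does not learn the rules and is not the RandomSCM algorithm itself; the rule format, the function names, and the majority-vote aggregation are assumptions for illustration.

```python
import numpy as np

def conjunction_predict(X, rules):
    """Predict 1 only where every rule in the conjunction holds.

    Each rule is (feature_index, threshold, direction). Direction ">" means
    the rule holds when X[:, feature_index] > threshold; "<=" means it holds
    when X[:, feature_index] <= threshold.
    """
    X = np.asarray(X, dtype=float)
    out = np.ones(X.shape[0], dtype=bool)
    for j, t, direction in rules:
        holds = X[:, j] > t if direction == ">" else X[:, j] <= t
        out &= holds
    return out.astype(int)

def ensemble_predict(X, rule_sets):
    """Majority vote over the predictions of several conjunctions."""
    votes = np.stack([conjunction_predict(X, r) for r in rule_sets])
    return (votes.mean(axis=0) > 0.5).astype(int)

# Three samples with three hypothetical metabolite intensities each.
X = np.array([[0.9, 0.1, 0.5],
              [0.2, 0.8, 0.5],
              [0.9, 0.9, 0.1]])
rule_sets = [
    [(0, 0.5, ">")],
    [(0, 0.5, ">"), (1, 0.5, "<=")],
    [(2, 0.3, ">")],
]
pred = ensemble_predict(X, rule_sets)
print(pred)  # [1 0 0]
```

Each conjunction stays readable as a short list of feature thresholds, which is what makes this family of models attractive for biomarker discovery.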
[[2208.06991] Towards Interpretable Sleep Stage Classification Using Cross-Modal Transformers](http://arxiv.org/abs/2208.06991)
Accurate sleep stage classification is significant for sleep health assessment. In recent years, several deep learning and machine learning based sleep staging algorithms have been developed, and they have achieved performance on par with human annotation. Despite improved performance, a limitation of most deep-learning based algorithms is their black-box behavior, which has limited their use in clinical settings. Here, we propose Cross-Modal Transformers, a transformer-based method for sleep stage classification. Our models both achieve performance competitive with state-of-the-art approaches and eliminate the black-box behavior of deep-learning models by utilizing the interpretability aspect of the attention modules. The proposed cross-modal transformers consist of a novel cross-modal transformer encoder architecture along with a multi-scale 1-dimensional convolutional neural network for automatic representation learning. Our sleep stage classifier based on this design was able to achieve sleep stage classification performance on par with or better than the state-of-the-art approaches, along with interpretability, a fourfold reduction in the number of parameters and a reduced training time compared to the current state-of-the-art. Our code is available at https://github.com/Jathurshan0330/Cross-Modal-Transformer.