[[2301.04725] Blockchain For Mobile Health Applications: Acceleration With GPU Computing](http://arxiv.org/abs/2301.04725) #secure
Blockchain is a linearly linked, distributed, and very robust data structure. Originally proposed as part of the Bitcoin distributed stack, it found a number of applications in a number of fields, most notably in smart contracts, social media, secure IoT, and cryptocurrency mining. It ensures data integrity by distributing strongly encrypted data in widely redundant segments. Each new insertion requires verification and approval by the majority of the users of the blockchain. Both encryption and verification are computationally intensive tasks which cannot be solved with ordinary off-the-shelf CPUs. This has resulted in a renewed scientific interest in secure distributed communication and coordination protocols. Mobile health applications are growing progressively popular and have the enormous advantage of timely diagnosis of certain conditions. However, privacy concerns have been raised as mobile health application by default have access to highly sensitive personal data. This chapter presents concisely how blockchain can be applied to mobile health applications in order to enhance privacy.
[[2301.04841] LZR: Identifying Unexpected Internet Services](http://arxiv.org/abs/2301.04841) #secure
Internet-wide scanning is a commonly used research technique that has helped uncover real-world attacks, find cryptographic weaknesses, and understand both operator and miscreant behavior. Studies that employ scanning have largely assumed that services are hosted on their IANA-assigned ports, overlooking the study of services on unusual ports. In this work, we investigate where Internet services are deployed in practice and evaluate the security posture of services on unexpected ports. We show protocol deployment is more diffuse than previously believed and that protocols run on many additional ports beyond their primary IANA-assigned port. For example, only 3% of HTTP and 6% of TLS services run on ports 80 and 443, respectively. Services on non-standard ports are more likely to be insecure, which results in studies dramatically underestimating the security posture of Internet hosts. Building on our observations, we introduce LZR ("Laser"), a system that identifies 99% of identifiable unexpected services in five handshakes and dramatically reduces the time needed to perform application-layer scans on ports with few responsive expected services (e.g., 5500% speedup on 27017/MongoDB). We conclude with recommendations for future studies.
[[2301.04888] Code-based Cryptography in IoT: A HW/SW Co-Design of HQC](http://arxiv.org/abs/2301.04888) #secure
Recent advances in quantum computing pose a serious threat on the security of widely used public-key cryptosystems. Thus, new post-quantum cryptographic algorithms have been proposed as part of the associated US NIST process to enable secure, encrypted communication in the age of quantum computing. Many hardware accelerators for structured lattice-based algorithms have already been published to meet the strict power, area and latency requirements of low-power IoT edge devices. However, the security of these algorithms is still uncertain. Currently, many new attacks against the lattice structure are investigated to judge on their security. In contrast, code-based algorithms, which rely on deeply explored security metrics and are appealing candidates in the NIST process, have not yet been investigated to the same depth in the context of IoT due to the computational complexity and memory footprint of state-of-the-art software implementations.
In this paper, we present to the best of our knowledge the first HW/SW co-design based implementation of the code-based Hamming Quasi Cyclic Key-Encapsulation Mechanism. We profile and evaluate this algorithm in order to explore the trade-off between software optimizations, tightly coupled hardware acceleration by instruction set extension and modular, loosely coupled accelerators. We provide detailed results on the energy consumption and performance of our design and compare it to existing implementations of lattice- and code-based algorithms. The design was implemented in two technologies: FPGA and ASIC. Our results show that code-based algorithms are valid alternatives in low-power IoT from an implementation perspective.
[[2301.05048] Open SESAME: Fighting Botnets with Seed Reconstructions of Domain Generation Algorithms](http://arxiv.org/abs/2301.05048) #security
An important aspect of many botnets is their capability to generate pseudorandom domain names using Domain Generation Algorithms (DGAs). A cyber criminal can register such domains to establish periodically changing rendezvous points with the bots. DGAs make use of seeds to generate sets of domains. Seeds can easily be changed in order to generate entirely new groups of domains while using the same underlying algorithm. While this requires very little manual effort for an adversary, security specialists typically have to manually reverse engineer new malware strains to reconstruct the seeds. Only when the seed and DGA are known, past and future domains can be generated, efficiently attributed, blocked, sinkholed or used for a take-down. Common counters in the literature consist of databases or Machine Learning (ML) based detectors to keep track of past and future domains of known DGAs and to identify DGA-generated domain names, respectively. However, database based approaches can not detect domains generated by new DGAs, and ML approaches can not generate future domain names. In this paper, we introduce SESAME, a system that combines the two above-mentioned approaches and contains a module for automatic Seed Reconstruction, which is, to our knowledge, the first of its kind. It is used to automatically classify domain names, rate their novelty, and determine the seeds of the underlying DGAs. SESAME consists of multiple DGA-specific Seed Reconstructors and is designed to work purely based on domain names, as they are easily obtainable from observing the network traffic. We evaluated our approach on 20.8 gigabytes of DNS-lookups. Thereby, we identified 17 DGAs, of which 4 were entirely new to us.
[[2301.05060] Evaluating the Fork-Awareness of Coverage-Guided Fuzzers](http://arxiv.org/abs/2301.05060) #security
Fuzz testing (or fuzzing) is an effective technique used to find security vulnerabilities. It consists of feeding a software under test with malformed inputs, waiting for a weird system behaviour (often a crash of the system). Over the years, different approaches have been developed, and among the most popular lies the coverage-based one. It relies on the instrumentation of the system to generate inputs able to cover as much code as possible. The success of this approach is also due to its usability as fuzzing techniques research approaches that do not require (or only partial require) human interactions. Despite the efforts, devising a fully-automated fuzzer still seems to be a challenging task. Target systems may be very complex; they may integrate cryptographic primitives, compute and verify check-sums and employ forks to enhance the system security, achieve better performances or manage different connections at the same time. This paper introduces the fork-awareness property to express the fuzzer ability to manage systems using forks. This property is leveraged to evaluate 14 of the most widely coverage-guided fuzzers and highlight how current fuzzers are ineffective against systems using forks.
[[2301.04794] LiteLSTM Architecture Based on Weights Sharing for Recurrent Neural Networks](http://arxiv.org/abs/2301.04794) #security
Long short-term memory (LSTM) is one of the robust recurrent neural network architectures for learning sequential data. However, it requires considerable computational power to learn and implement both software and hardware aspects. This paper proposed a novel LiteLSTM architecture based on reducing the LSTM computation components via the weights sharing concept to reduce the overall architecture computation cost and maintain the architecture performance. The proposed LiteLSTM can be significant for processing large data where time-consuming is crucial while hardware resources are limited, such as the security of IoT devices and medical data processing. The proposed model was evaluated and tested empirically on three different datasets from the computer vision, cybersecurity, speech emotion recognition domains. The proposed LiteLSTM has comparable accuracy to the other state-of-the-art recurrent architecture while using a smaller computation budget.
[[2301.04875] Color-NeuraCrypt: Privacy-Preserving Color-Image Classification Using Extended Random Neural Networks](http://arxiv.org/abs/2301.04875) #privacy
In recent years, with the development of cloud computing platforms, privacy-preserving methods for deep learning have become an urgent problem. NeuraCrypt is a private random neural network for privacy-preserving that allows data owners to encrypt the medical data before the data uploading, and data owners can train and then test their models in a cloud server with the encrypted data directly. However, we point out that the performance of NeuraCrypt is heavily degraded when using color images. In this paper, we propose a Color-NeuraCrypt to solve this problem. Experiment results show that our proposed Color-NeuraCrypt can achieve a better classification accuracy than the original one and other privacy-preserving methods.
[[2301.05012] Fairly Private: Investigating The Fairness of Visual Privacy Preservation Algorithms](http://arxiv.org/abs/2301.05012) #privacy
As the privacy risks posed by camera surveillance and facial recognition have grown, so has the research into privacy preservation algorithms. Among these, visual privacy preservation algorithms attempt to impart bodily privacy to subjects in visuals by obfuscating privacy-sensitive areas. While disparate performances of facial recognition systems across phenotypes are the subject of much study, its counterpart, privacy preservation, is not commonly analysed from a fairness perspective. In this paper, the fairness of commonly used visual privacy preservation algorithms is investigated through the performances of facial recognition models on obfuscated images. Experiments on the PubFig dataset clearly show that the privacy protection provided is unequal across groups.
[[2301.04785] Phase-shifted Adversarial Training](http://arxiv.org/abs/2301.04785) #attack
Adversarial training has been considered an imperative component for safely deploying neural network-based applications to the real world. To achieve stronger robustness, existing methods primarily focus on how to generate strong attacks by increasing the number of update steps, regularizing the models with the smoothed loss function, and injecting the randomness into the attack. Instead, we analyze the behavior of adversarial training through the lens of response frequency. We empirically discover that adversarial training causes neural networks to have low convergence to high-frequency information, resulting in highly oscillated predictions near each data. To learn high-frequency contents efficiently and effectively, we first prove that a universal phenomenon of frequency principle, i.e., \textit{lower frequencies are learned first}, still holds in adversarial training. Based on that, we propose phase-shifted adversarial training (PhaseAT) in which the model learns high-frequency components by shifting these frequencies to the low-frequency range where the fast convergence occurs. For evaluations, we conduct the experiments on CIFAR-10 and ImageNet with the adaptive attack carefully designed for reliable evaluation. Comprehensive results show that PhaseAT significantly improves the convergence for high-frequency information. This results in improved adversarial robustness by enabling the model to have smoothed predictions near each data.
[[2301.04733] AGMN: Association Graph-based Graph Matching Network for Coronary Artery Semantic Labeling on Invasive Coronary Angiograms](http://arxiv.org/abs/2301.04733) #robust
Semantic labeling of coronary arterial segments in invasive coronary angiography (ICA) is important for automated assessment and report generation of coronary artery stenosis in the computer-aided diagnosis of coronary artery disease (CAD). Inspired by the training procedure of interventional cardiologists for interpreting the structure of coronary arteries, we propose an association graph-based graph matching network (AGMN) for coronary arterial semantic labeling. We first extract the vascular tree from invasive coronary angiography (ICA) and convert it into multiple individual graphs. Then, an association graph is constructed from two individual graphs where each vertex represents the relationship between two arterial segments. Using the association graph, the AGMN extracts the vertex features by the embedding module, aggregates the features from adjacent vertices and edges by graph convolution network, and decodes the features to generate the semantic mappings between arteries. By learning the mapping of arterial branches between two individual graphs, the unlabeled arterial segments are classified by the labeled segments to achieve semantic labeling. A dataset containing 263 ICAs was employed to train and validate the proposed model, and a five-fold cross-validation scheme was performed. Our AGMN model achieved an average accuracy of 0.8264, an average precision of 0.8276, an average recall of 0.8264, and an average F1-score of 0.8262, which significantly outperformed existing coronary artery semantic labeling methods. In conclusion, we have developed and validated a new algorithm with high accuracy, interpretability, and robustness for coronary artery semantic labeling on ICAs.
[[2301.04860] Edge Preserving Implicit Surface Representation of Point Clouds](http://arxiv.org/abs/2301.04860) #robust
Learning implicit surface directly from raw data recently has become a very attractive representation method for 3D reconstruction tasks due to its excellent performance. However, as the raw data quality deteriorates, the implicit functions often lead to unsatisfactory reconstruction results. To this end, we propose a novel edge-preserving implicit surface reconstruction method, which mainly consists of a differentiable Laplican regularizer and a dynamic edge sampling strategy. Among them, the differential Laplican regularizer can effectively alleviate the implicit surface unsmoothness caused by the point cloud quality deteriorates; Meanwhile, in order to reduce the excessive smoothing at the edge regions of implicit suface, we proposed a dynamic edge extract strategy for sampling near the sharp edge of point cloud, which can effectively avoid the Laplacian regularizer from smoothing all regions. Finally, we combine them with a simple regularization term for robust implicit surface reconstruction. Compared with the state-of-the-art methods, experimental results show that our method significantly improves the quality of 3D reconstruction results. Moreover, we demonstrate through several experiments that our method can be conveniently and effectively applied to some point cloud analysis tasks, including point cloud edge feature extraction, normal estimation,etc.
[[2301.05106] Forgetful Active Learning with Switch Events: Efficient Sampling for Out-of-Distribution Data](http://arxiv.org/abs/2301.05106) #robust
This paper considers deep out-of-distribution active learning. In practice, fully trained neural networks interact randomly with out-of-distribution (OOD) inputs and map aberrant samples randomly within the model representation space. Since data representations are direct manifestations of the training distribution, the data selection process plays a crucial role in outlier robustness. For paradigms such as active learning, this is especially challenging since protocols must not only improve performance on the training distribution most effectively but further render a robust representation space. However, existing strategies directly base the data selection on the data representation of the unlabeled data which is random for OOD samples by definition. For this purpose, we introduce forgetful active learning with switch events (FALSE) - a novel active learning protocol for out-of-distribution active learning. Instead of defining sample importance on the data representation directly, we formulate "informativeness" with learning difficulty during training. Specifically, we approximate how often the network "forgets" unlabeled samples and query the most "forgotten" samples for annotation. We report up to 4.5\% accuracy improvements in over 270 experiments, including four commonly used protocols, two OOD benchmarks, one in-distribution benchmark, and three different architectures.
[[2301.05158] SemPPL: Predicting pseudo-labels for better contrastive representations](http://arxiv.org/abs/2301.05158) #robust
Learning from large amounts of unsupervised data and a small amount of supervision is an important open problem in computer vision. We propose a new semi-supervised learning method, Semantic Positives via Pseudo-Labels (SemPPL), that combines labelled and unlabelled data to learn informative representations. Our method extends self-supervised contrastive learning -- where representations are shaped by distinguishing whether two samples represent the same underlying datum (positives) or not (negatives) -- with a novel approach to selecting positives. To enrich the set of positives, we leverage the few existing ground-truth labels to predict the missing ones through a $k$-nearest neighbours classifier by using the learned embeddings of the labelled data. We thus extend the set of positives with datapoints having the same pseudo-label and call these semantic positives. We jointly learn the representation and predict bootstrapped pseudo-labels. This creates a reinforcing cycle. Strong initial representations enable better pseudo-label predictions which then improve the selection of semantic positives and lead to even better representations. SemPPL outperforms competing semi-supervised methods setting new state-of-the-art performance of $68.5\%$ and $76\%$ top-$1$ accuracy when using a ResNet-$50$ and training on $1\%$ and $10\%$ of labels on ImageNet, respectively. Furthermore, when using selective kernels, SemPPL significantly outperforms previous state-of-the-art achieving $72.3\%$ and $78.3\%$ top-$1$ accuracy on ImageNet with $1\%$ and $10\%$ labels, respectively, which improves absolute $+7.8\%$ and $+6.2\%$ over previous work. SemPPL also exhibits state-of-the-art performance over larger ResNet models as well as strong robustness, out-of-distribution and transfer performance.
[[2301.05169] Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning](http://arxiv.org/abs/2301.05169) #robust
Recent years have seen a surge of interest in learning high-level causal representations from low-level image pairs under interventions. Yet, existing efforts are largely limited to simple synthetic settings that are far away from real-world problems. In this paper, we present Causal Triplet, a causal representation learning benchmark featuring not only visually more complex scenes, but also two crucial desiderata commonly overlooked in previous works: (i) an actionable counterfactual setting, where only certain object-level variables allow for counterfactual observations whereas others do not; (ii) an interventional downstream task with an emphasis on out-of-distribution robustness from the independent causal mechanisms principle. Through extensive experiments, we find that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts. However, recent causal representation learning methods still struggle to identify such latent structures, indicating substantial challenges and opportunities for future work. Our code and datasets will be available at https://sites.google.com/view/causaltriplet.
[[2301.05175] Scene-Aware 3D Multi-Human Motion Capture from a Single Camera](http://arxiv.org/abs/2301.05175) #robust
In this work, we consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera. In contrast to expensive marker-based or multi-view systems, our lightweight setup is ideal for private users as it enables an affordable 3D motion capture that is easy to install and does not require expert knowledge. To deal with this challenging setting, we leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks. Thus, we introduce the first non-linear optimization-based approach that jointly solves for the absolute 3D position of each human, their articulated pose, their individual shapes as well as the scale of the scene. In particular, we estimate the scene depth and person unique scale from normalized disparity predictions using the 2D body joints and joint angles. Given the per-frame scene depth, we reconstruct a point-cloud of the static scene in 3D space. Finally, given the per-frame 3D estimates of the humans and scene point-cloud, we perform a space-time coherent optimization over the video to ensure temporal, spatial and physical plausibility. We evaluate our method on established multi-person 3D human pose benchmarks where we consistently outperform previous methods and we qualitatively demonstrate that our method is robust to in-the-wild conditions including challenging scenes with people of different sizes.
[[2301.05187] WIRE: Wavelet Implicit Neural Representations](http://arxiv.org/abs/2301.05187) #robust
Implicit neural representations (INRs) have recently advanced numerous vision-related areas. INR performance depends strongly on the choice of the nonlinear activation function employed in its multilayer perceptron (MLP) network. A wide range of nonlinearities have been explored, but, unfortunately, current INRs designed to have high accuracy also suffer from poor robustness (to signal noise, parameter variation, etc.). Inspired by harmonic analysis, we develop a new, highly accurate and robust INR that does not exhibit this tradeoff. Wavelet Implicit neural REpresentation (WIRE) uses a continuous complex Gabor wavelet activation function that is well-known to be optimally concentrated in space-frequency and to have excellent biases for representing images. A wide range of experiments (image denoising, image inpainting, super-resolution, computed tomography reconstruction, image overfitting, and novel view synthesis with neural radiance fields) demonstrate that WIRE defines the new state of the art in INR accuracy, training time, and robustness.
[[2301.04770] KAER: A Knowledge Augmented Pre-Trained Language Model for Entity Resolution](http://arxiv.org/abs/2301.04770) #robust
Entity resolution has been an essential and well-studied task in data cleaning research for decades. Existing work has discussed the feasibility of utilizing pre-trained language models to perform entity resolution and achieved promising results. However, few works have discussed injecting domain knowledge to improve the performance of pre-trained language models on entity resolution tasks. In this study, we propose Knowledge Augmented Entity Resolution (KAER), a novel framework named for augmenting pre-trained language models with external knowledge for entity resolution. We discuss the results of utilizing different knowledge augmentation and prompting methods to improve entity resolution performance. Our model improves on Ditto, the existing state-of-the-art entity resolution method. In particular, 1) KAER performs more robustly and achieves better results on "dirty data", and 2) with more general knowledge injection, KAER outperforms the existing baseline models on the textual dataset and dataset from the online product domain. 3) KAER achieves competitive results on highly domain-specific datasets, such as citation datasets, requiring the injection of expert knowledge in future work.
[[2301.05220] Adversarial Adaptation for French Named Entity Recognition](http://arxiv.org/abs/2301.05220) #robust
Named Entity Recognition (NER) is the task of identifying and classifying named entities in large-scale texts into predefined classes. NER in French and other relatively limited-resource languages cannot always benefit from approaches proposed for languages like English due to a dearth of large, robust datasets. In this paper, we present our work that aims to mitigate the effects of this dearth of large, labeled datasets. We propose a Transformer-based NER approach for French, using adversarial adaptation to similar domain or general corpora to improve feature extraction and enable better generalization. Our approach allows learning better features using large-scale unlabeled corpora from the same domain or mixed domains to introduce more variations during training and reduce overfitting. Experimental results on three labeled datasets show that our adaptation framework outperforms the corresponding non-adaptive models for various combinations of Transformer models, source datasets, and target corpora. We also show that adversarial adaptation to large-scale unlabeled corpora can help mitigate the performance dip incurred on using Transformer models pre-trained on smaller corpora.
[[2301.04652] Estimate Deformation Capacity of Non-Ductile RC Shear Walls using Explainable Boosting Machine](http://arxiv.org/abs/2301.04652) #robust
Machine learning is becoming increasingly prevalent for tackling challenges in earthquake engineering and providing fairly reliable and accurate predictions. However, it is mostly unclear how decisions are made because machine learning models are generally highly sophisticated, resulting in opaque black-box models. Machine learning models that are naturally interpretable and provide their own decision explanation, rather than using an explanatory, are more accurate in determining what the model actually computes. With this motivation, this study aims to develop a fully explainable machine learning model to predict the deformation capacity of non-ductile reinforced concrete shear walls based on experimental data collected worldwide. The proposed Explainable Boosting Machines (EBM)-based model is an interpretable, robust, naturally explainable glass-box model, yet provides high accuracy comparable to its black-box counterparts. The model enables the user to observe the relationship between the wall properties and the deformation capacity by quantifying the individual contribution of each wall property as well as the correlations among them. The mean coefficient of determination R2 and the mean ratio of predicted to actual value based on the test dataset are 0.92 and 1.05, respectively. The proposed predictive model stands out with its overall consistency with scientific knowledge, practicality, and interpretability without sacrificing high accuracy.
[[2301.04842] Towards High Performance One-Stage Human Pose Estimation](http://arxiv.org/abs/2301.04842) #extraction
Making top-down human pose estimation method present both good performance and high efficiency is appealing. Mask RCNN can largely improve the efficiency by conducting person detection and pose estimation in a single framework, as the features provided by the backbone are able to be shared by the two tasks. However, the performance is not as good as traditional two-stage methods. In this paper, we aim to largely advance the human pose estimation results of Mask-RCNN and still keep the efficiency. Specifically, we make improvements on the whole process of pose estimation, which contains feature extraction and keypoint detection. The part of feature extraction is ensured to get enough and valuable information of pose. Then, we introduce a Global Context Module into the keypoints detection branch to enlarge the receptive field, as it is crucial to successful human pose estimation. On the COCO val2017 set, our model using the ResNet-50 backbone achieves an AP of 68.1, which is 2.6 higher than Mask RCNN (AP of 65.5). Compared to the classic two-stage top-down method SimpleBaseline, our model largely narrows the performance gap (68.1 AP vs. 68.9 AP) with a much faster inference speed (77 ms vs. 168 ms), demonstrating the effectiveness of the proposed method. Code is available at: https://github.com/lingl_space/maskrcnn_keypoint_refined.
[[2301.05029] Interaction models for remaining useful life estimation](http://arxiv.org/abs/2301.05029) #extraction
The paper deals with the problem of controlling the state of industrial devices according to the readings of their sensors. The current methods rely on one approach to feature extraction in which the prediction occurs. We proposed a technique to build a scalable model that combines multiple different feature extractor blocks. A new model based on sequential sensor space analysis achieves state-of-the-art results on the C-MAPSS benchmark for equipment remaining useful life estimation. The resulting model performance was validated including the prediction changes with scaling.
[[2301.04829] Federated Transfer-Ordered-Personalized Learning for Driver Monitoring Application](http://arxiv.org/abs/2301.04829) #federate
Federated learning (FL) shines through in the internet of things (IoT) with its ability to realize collaborative learning and improve learning efficiency by sharing client model parameters trained on local data. Although FL has been successfully applied to various domains, including driver monitoring application (DMA) on the internet of vehicles (IoV), its usages still face some open issues, such as data and system heterogeneity, large-scale parallelism communication resources, malicious attacks, and data poisoning. This paper proposes a federated transfer-ordered-personalized learning (FedTOP) framework to address the above problems and test on two real-world datasets with and without system heterogeneity. The performance of the three extensions, transfer, ordered, and personalized, is compared by an ablation study and achieves 92.32% and 95.96% accuracy on the test clients of two datasets, respectively. Compared to the baseline, there is a 462% improvement in accuracy and a 37.46% reduction in communication resource consumption. The results demonstrate that the proposed FedTOP can be used as a highly accurate, streamlined, privacy-preserving, cybersecurity-oriented, personalized framework for DMA.
[[2301.05219] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning](http://arxiv.org/abs/2301.05219) #fair
The state of neural network pruning has been noticed to be unclear and even confusing for a while, largely due to "a lack of standardized benchmarks and metrics" [3]. To standardize benchmarks, first, we need to answer: what kind of comparison setup is considered fair? This basic yet crucial question has barely been clarified in the community, unfortunately. Meanwhile, we observe several papers have used (severely) sub-optimal hyper-parameters in pruning experiments, while the reason behind them is also elusive. These sub-optimal hyper-parameters further exacerbate the distorted benchmarks, rendering the state of neural network pruning even more obscure.
Two mysteries in pruning represent such a confusing status: the performance-boosting effect of a larger finetuning learning rate, and the no-value argument of inheriting pretrained weights in filter pruning.
In this work, we attempt to explain the confusing state of network pruning by demystifying the two mysteries. Specifically, (1) we first clarify the fairness principle in pruning experiments and summarize the widely-used comparison setups; (2) then we unveil the two pruning mysteries and point out the central role of network trainability, which has not been well recognized so far; (3) finally, we conclude the paper and give some concrete suggestions regarding how to calibrate the pruning benchmarks in the future. Code: https://github.com/mingsun-tse/why-the-state-of-pruning-so-confusing.
[[2301.04844] SACDNet: Towards Early Type 2 Diabetes Prediction with Uncertainty for Electronic Health Records](http://arxiv.org/abs/2301.04844) #fair
Type 2 diabetes mellitus (T2DM) is one of the most common diseases and a leading cause of death. The problem of early diagnosis of T2DM is challenging and necessary to prevent serious complications. This study proposes a novel neural network architecture for early T2DM prediction using multi-headed self-attention and dense layers to extract features from historic diagnoses, patient vitals, and demographics. The proposed technique is called the Self-Attention for Comorbid Disease Net (SACDNet), achieving an accuracy of 89.3% and an F1-Score of 89.1%, having a 1.6% increased accuracy and 1.3% increased f1-score compared to the baseline techniques. Monte Carlo (MC) Dropout is applied to the SACEDNet to get a bayesian approximation. A T2DM prediction framework based on the MC Dropout SACDNet is proposed to quantize the uncertainty associated with the predictions. A T2DM prediction dataset is also built as part of this study which is based on real-world routine Electronic Health Record (EHR) data comprising 4,124 diabetic and 181,767 non-diabetic examples, collected from 295 different EHR systems running in different parts of the United States of America. This dataset is further used to evaluate 7 different machine learning and 3 deep learning-based models. Finally, a detailed analysis of the fairness of every technique against different patient demographic groups is performed to validate the unbiased generalization of the techniques and the diversity of the data.
[[2301.04704] SensePOLAR: Word sense aware interpretability for pre-trained contextual word embeddings](http://arxiv.org/abs/2301.04704) #interpretability
Adding interpretability to word embeddings represents an area of active research in text representation. Recent work has explored thepotential of embedding words via so-called polar dimensions (e.g. good vs. bad, correct vs. wrong). Examples of such recent approaches include SemAxis, POLAR, FrameAxis, and BiImp. Although these approaches provide interpretable dimensions for words, they have not been designed to deal with polysemy, i.e. they can not easily distinguish between different senses of words. To address this limitation, we present SensePOLAR, an extension of the original POLAR framework that enables word-sense aware interpretability for pre-trained contextual word embeddings. The resulting interpretable word embeddings achieve a level of performance that is comparable to original contextual word embeddings across a variety of natural language processing tasks including the GLUE and SQuAD benchmarks. Our work removes a fundamental limitation of existing approaches by offering users sense aware interpretations for contextual word embeddings.
[[2301.04740] The Berkelmans-Pries Feature Importance Method: A Generic Measure of Informativeness of Features](http://arxiv.org/abs/2301.04740) #interpretability
Over the past few years, the use of machine learning models has emerged as a generic and powerful means for prediction purposes. At the same time, there is a growing demand for interpretability of prediction models. To determine which features of a dataset are important to predict a target variable $Y$, a Feature Importance (FI) method can be used. By quantifying how important each feature is for predicting $Y$, irrelevant features can be identified and removed, which could increase the speed and accuracy of a model, and moreover, important features can be discovered, which could lead to valuable insights. A major problem with evaluating FI methods, is that the ground truth FI is often unknown. As a consequence, existing FI methods do not give the exact correct FI values. This is one of the many reasons why it can be hard to properly interpret the results of an FI method. Motivated by this, we introduce a new global approach named the Berkelmans-Pries FI method, which is based on a combination of Shapley values and the Berkelmans-Pries dependency function. We prove that our method has many useful properties, and accurately predicts the correct FI values for several cases where the ground truth FI can be derived in an exact manner. We experimentally show for a large collection of FI methods (468) that existing methods do not have the same useful properties. This shows that the Berkelmans-Pries FI method is a highly valuable tool for analyzing datasets with complex interdependencies.
[[2301.05062] Tracr: Compiled Transformers as a Laboratory for Interpretability](http://arxiv.org/abs/2301.05062) #interpretability
Interpretability research aims to build tools for understanding machine learning (ML) models. However, such tools are inherently hard to evaluate because we do not have ground truth information about how ML models actually work. In this work, we propose to build transformer models manually as a testbed for interpretability research. We introduce Tracr, a "compiler" for translating human-readable programs into weights of a transformer model. Tracr takes code written in RASP, a domain-specific language (Weiss et al. 2021), and translates it into weights for a standard, decoder-only, GPT-like transformer architecture. We use Tracr to create a range of ground truth transformers that implement programs including computing token frequencies, sorting, and Dyck-n parenthesis checking, among others. To enable the broader research community to explore and use compiled models, we provide an open-source implementation of Tracr at https://github.com/deepmind/tracr.
[[2301.05217] Progress measures for grokking via mechanistic interpretability](http://arxiv.org/abs/2301.05217) #interpretability
Neural networks often exhibit emergent behavior, where qualitatively new capabilities arise from scaling up the amount of parameters, training data, or training steps. One approach to understanding emergence is to find continuous \textit{progress measures} that underlie the seemingly discontinuous qualitative changes. We argue that progress measures can be found via mechanistic interpretability: reverse-engineering learned behaviors into their individual components. As a case study, we investigate the recently-discovered phenomenon of ``grokking'' exhibited by small transformers trained on modular addition tasks. We fully reverse engineer the algorithm learned by these networks, which uses discrete Fourier transforms and trigonometric identities to convert addition to rotation about a circle. We confirm the algorithm by analyzing the activations and weights and by performing ablations in Fourier space. Based on this understanding, we define progress measures that allow us to study the dynamics of training and split training into three continuous phases: memorization, circuit formation, and cleanup. Our results show that grokking, rather than being a sudden shift, arises from the gradual amplification of structured mechanisms encoded in the weights, followed by the later removal of memorizing components.
[[2301.04802] Diffusion-based Data Augmentation for Skin Disease Classification: Impact Across Original Medical Datasets to Fully Synthetic Images](http://arxiv.org/abs/2301.04802) #diffusion
Despite continued advancement in recent years, deep neural networks still rely on large amounts of training data to avoid overfitting. However, labeled training data for real-world applications such as healthcare is limited and difficult to access given longstanding privacy, and strict data sharing policies. By manipulating image datasets in the pixel or feature space, existing data augmentation techniques represent one of the effective ways to improve the quantity and diversity of training data. Here, we look to advance augmentation techniques by building upon the emerging success of text-to-image diffusion probabilistic models in augmenting the training samples of our macroscopic skin disease dataset. We do so by enabling fine-grained control of the image generation process via input text prompts. We demonstrate that this generative data augmentation approach successfully maintains a similar classification accuracy of the visual classifier even when trained on a fully synthetic skin disease dataset. Similar to recent applications of generative models, our study suggests that diffusion models are indeed effective in generating high-quality skin images that do not sacrifice the classifier performance, and can improve the augmentation of training datasets after curation.
[[2301.05221] Guiding Text-to-Image Diffusion Model Towards Grounded Generation](http://arxiv.org/abs/2301.05221) #diffusion
The goal of this paper is to augment a pre-trained text-to-image diffusion model with the ability of open-vocabulary objects grounding, i.e., simultaneously generating images and segmentation masks for the corresponding visual entities described in the text prompt. We make the following contributions: (i) we insert a grounding module into the existing diffusion model, that can be trained to align the visual and textual embedding space of the diffusion model with only a small number of object categories; (ii) we propose an automatic pipeline for constructing a dataset, that consists of {image, segmentation mask, text prompt} triplets, to train the proposed grounding module; (iii) we evaluate the performance of open-vocabulary grounding on images generated from the text-to-image diffusion model and show that the module can well segment the objects of categories beyond seen ones at training time; (iv) we adopt the guided diffusion model to build a synthetic semantic segmentation dataset, and show that training a standard segmentation model on such dataset demonstrates competitive performance on zero-shot segmentation(ZS3) benchmark, which opens up new opportunities for adopting the powerful diffusion model for discriminative tasks.
[[2301.04655] ChatGPT is not all you need](http://arxiv.org/abs/2301.04655) #diffusion
During the last two years there has been a plethora of large generative models such as ChatGPT or Stable Diffusion that have been published. Concretely, these models are able to perform tasks such as being a general question and answering system or automatically creating artistic images that are revolutionizing several sectors. Consequently, the implications that these generative models have in the industry and society are enormous, as several job positions may be transformed. For example, Generative AI is capable of transforming effectively and creatively texts to images, like the DALLE-2 model; text to 3D images, like the Dreamfusion model; images to text, like the Flamingo model; texts to video, like the Phenaki model; texts to audio, like the AudioLM model; texts to other texts, like ChatGPT; texts to code, like the Codex model; texts to scientific texts, like the Galactica model or even create algorithms like AlphaTensor. This work consists on an attempt to describe in a concise way the main models are sectors that are affected by generative AI and to provide a taxonomy of the main generative models published recently.
[[2301.05182] Thompson Sampling with Diffusion Generative Prior](http://arxiv.org/abs/2301.05182) #diffusion
In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs well across bandit tasks of a same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with the learned prior to deal with new tasks at test time. Our posterior sampling algorithm is designed to carefully balance between the learned prior and the noisy observations that come from the learner's interaction with the environment. To capture realistic bandit scenarios, we also propose a novel diffusion model training procedure that trains even from incomplete and/or noisy data, which could be of independent interest. Finally, our extensive experimental evaluations clearly demonstrate the potential of the proposed approach.