[[2211.10027] Secure Quantum Computing for Healthcare Sector: A Short Analysis](http://arxiv.org/abs/2211.10027)
Quantum computing research might lead to "quantum leaps," and it could have unanticipated repercussions in the medical field. This technique has the potential to be used in a broad range of contexts, some of which include the development of novel drugs, the individualization of medical treatments, and the speeding of DNA sequencing. This work has assembled a list of the numerous methodologies presently employed in quantum medicine and other disciplines pertaining to healthcare. This work has created a list of the most critical concerns that need to be addressed before the broad use of quantum computing can be realized. In addition, this work investigates in detail the ways in which potential future applications of quantum computing might compromise the safety of healthcare delivery systems from the perspective of the medical industry and the patient-centric healthcare system. The primary objective of this investigation into quantum cryptography is to locate any potential flaws in the cryptographic protocols and strategies that have only very recently been the focus of scrutiny from academic research community members.
[[2211.10117] Scaling Native Language Identification with Transformer Adapters](http://arxiv.org/abs/2211.10117)
Native language identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is useful for a variety of purposes including marketing, security and educational applications. NLI is usually framed as a multi-label classification task, where numerous designed features are combined to achieve state-of-the-art results. Recently deep generative approach based on transformer decoders (GPT-2) outperformed its counterparts and achieved the best results on the NLI benchmark datasets. We investigate this approach to determine the practical implications compared to traditional state-of-the-art NLI systems. We introduce transformer adapters to address memory limitations and improve training/inference speed to scale NLI applications for production.
[[2211.10028] Comparative evaluation of different methods of "Homomorphic Encryption" and "Traditional Encryption" on a dataset with current problems and developments](http://arxiv.org/abs/2211.10028)
A database is a prime target for cyber-attacks as it contains confidential, sensitive, or protected information. With the increasing sophistication of the internet and dependencies on internet data transmission, it has become vital to be aware of various encryption technologies and trends. It can assist in safeguarding private information and sensitive data, as well as improve the security of client-server communication. Database encryption is a procedure that employs an algorithm to convert data contained in a database into "cipher text," which is incomprehensible until decoded. Homomorphic encryption technology, which works with encrypted data, can be utilized in both symmetric and asymmetric systems. In this paper, we evaluated homomorphic encryption techniques based on recent highly cited articles, as well as compared all database encryption problems and developments since 2018. The benefits and drawbacks of homomorphic approaches were examined over classic encryption methods including Transparent Database Encryption, Column Level Encryption, Field Level Encryption, File System Level Encryption, and Encrypting File System Encryption in this review. Additionally, popular databases that provide encryption services to their customers to protect their data are also examined.
[[2211.10062] Intrusion Detection in Internet of Things using Convolutional Neural Networks](http://arxiv.org/abs/2211.10062)
Internet of Things (IoT) has become a popular paradigm to fulfil needs of the industry such as asset tracking, resource monitoring and automation. As security mechanisms are often neglected during the deployment of IoT devices, they are more easily attacked by complicated and large volume intrusion attacks using advanced techniques. Artificial Intelligence (AI) has been used by the cyber security community in the past decade to automatically identify such attacks. However, deep learning methods have yet to be extensively explored for Intrusion Detection Systems (IDS) specifically for IoT. Most recent works are based on time sequential models like LSTM and there is short of research in CNNs as they are not naturally suited for this problem. In this article, we propose a novel solution to the intrusion attacks against IoT devices using CNNs. The data is encoded as the convolutional operations to capture the patterns from the sensors data along time that are useful for attacks detection by CNNs. The proposed method is integrated with two classical CNNs: ResNet and EfficientNet, where the detection performance is evaluated. The experimental results show significant improvement in both true positive rate and false positive rate compared to the baseline using LSTM.
[[2211.10299] Trusted Hart for Mobile RISC-V Security](http://arxiv.org/abs/2211.10299)
The majority of mobile devices today are based on Arm architecture that supports the hosting of trusted applications in Trusted Execution Environment (TEE). RISC-V is a relatively new open-source instruction set architecture that was engineered to fit many uses. In one potential RISC-V usage scenario, mobile devices could be based on RISC-V hardware.
We consider the implications of porting the mobile security stack on top of a RISC-V system on a chip, identify the gaps in the open-source Keystone framework for building custom TEEs, and propose a security architecture that, among other things, supports the GlobalPlatform TEE API specification for trusted applications. In addition to Keystone enclaves the architecture includes a Trusted Hart -- a normal core that runs a trusted operating system and is dedicated for security functions, like control of the device's keystore and the management of secure peripherals.
The proposed security architecture for RISC-V platform is verified experimentally using the HiFive Unleashed RISC-V development board.
[[2211.10173] How Do Input Attributes Impact the Privacy Loss in Differential Privacy?](http://arxiv.org/abs/2211.10173)
Differential privacy (DP) is typically formulated as a worst-case privacy guarantee over all individuals in a database. More recently, extensions to individual subjects or their attributes, have been introduced. Under the individual/per-instance DP interpretation, we study the connection between the per-subject gradient norm in DP neural networks and individual privacy loss and introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS), which allows one to apportion the subject's privacy loss to their input attributes. We experimentally show how this enables the identification of sensitive attributes and of subjects at high risk of data reconstruction.
[[2211.10014] Users are Closer than they Appear: Protecting User Location from WiFi APs](http://arxiv.org/abs/2211.10014)
WiFi-based indoor localization has now matured for over a decade. Most of the current localization algorithms rely on the WiFi access points (APs) in the enterprise network to localize the WiFi user accurately. Thus, the WiFi user's location information could be easily snooped by an attacker listening through a compromised WiFi AP. With indoor localization and navigation being the next step towards automation, it is important to give users the capability to defend against such attacks. In this paper, we present MIRAGE, a system that can utilize the downlink physical layer information to create a defense against an attacker snooping on a WiFi user's location information. MIRAGE achieves this by utilizing the beamforming capability of the transmitter that is already part of the WiFi protocols. With this initial idea, we have demonstrated that the user can obfuscate his/her location from the WiFi AP always with no compromise to the throughput of the existing WiFi communication system and reduce the user location accuracy of the attacker from 2.3m to more than 10m.
[[2211.10048] Clustering based opcode graph generation for malware variant detection](http://arxiv.org/abs/2211.10048)
Malwares are the key means leveraged by threat actors in the cyber space for their attacks. There is a large array of commercial solutions in the market and significant scientific research to tackle the challenge of the detection and defense against malwares. At the same time, attackers also advance their capabilities in creating polymorphic and metamorphic malwares to make it increasingly challenging for existing solutions. To tackle this issue, we propose a methodology to perform malware detection and family attribution. The proposed methodology first performs the extraction of opcodes from malwares in each family and constructs their respective opcode graphs. We explore the use of clustering algorithms on the opcode graphs to detect clusters of malwares within the same malware family. Such clusters can be seen as belonging to different sub-family groups. Opcode graph signatures are built from each detected cluster. Hence, for each malware family, a group of signatures is generated to represent the family. These signatures are used to classify an unknown sample as benign or belonging to one the malware families. We evaluate our methodology by performing experiments on a dataset consisting of both benign files and malware samples belonging to a number of different malware families and comparing the results to existing approach.
[[2211.09959] Potential Auto-driving Threat: Universal Rain-removal Attack](http://arxiv.org/abs/2211.09959)
The problem of robustness in adverse weather conditions is considered a significant challenge for computer vision algorithms in the applicants of autonomous driving. Image rain removal algorithms are a general solution to this problem. They find a deep connection between raindrops/rain-streaks and images by mining the hidden features and restoring information about the rain-free environment based on the powerful representation capabilities of neural networks. However, previous research has focused on architecture innovations and has yet to consider the vulnerability issues that already exist in neural networks. This research gap hints at a potential security threat geared toward the intelligent perception of autonomous driving in the rain. In this paper, we propose a universal rain-removal attack (URA) on the vulnerability of image rain-removal algorithms by generating a non-additive spatial perturbation that significantly reduces the similarity and image quality of scene restoration. Notably, this perturbation is difficult to recognise by humans and is also the same for different target images. Thus, URA could be considered a critical tool for the vulnerability detection of image rain-removal algorithms. It also could be developed as a real-world artificial intelligence attack method. Experimental results show that URA can reduce the scene repair capability by 39.5% and the image generation quality by 26.4%, targeting the state-of-the-art (SOTA) single-image rain-removal algorithms currently available.
[[2211.10227] Adversarial Detection by Approximation of Ensemble Boundary](http://arxiv.org/abs/2211.10227)
A spectral approximation of a Boolean function is proposed for approximating the decision boundary of an ensemble of Deep Neural Networks (DNNs) solving two-class pattern recognition problems. The Walsh combination of relatively weak DNN classifiers is shown experimentally to be capable of detecting adversarial attacks. By observing the difference in Walsh coefficient approximation between clean and adversarial images, it appears that transferability of attack may be used for detection. Approximating the decision boundary may also aid in understanding the learning and transferability properties of DNNs. While the experiments here use images, the proposed approach of modelling two-class ensemble decision boundaries could in principle be applied to any application area.
[[2211.10024] Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks](http://arxiv.org/abs/2211.10024)
Deep neural networks (DNNs) are powerful, but they can make mistakes that pose significant risks. A model performing well on a test set does not imply safety in deployment, so it is important to have additional tools to understand its flaws. Adversarial examples can help reveal weaknesses, but they are often difficult for a human to interpret or draw generalizable, actionable conclusions from. Some previous works have addressed this by studying human-interpretable attacks. We build on these with three contributions. First, we introduce a method termed Search for Natural Adversarial Features Using Embeddings (SNAFUE) which offers a fully-automated method for finding "copy/paste" attacks in which one natural image can be pasted into another in order to induce an unrelated misclassification. Second, we use this to red team an ImageNet classifier and identify hundreds of easily-describable sets of vulnerabilities. Third, we compare this approach with other interpretability tools by attempting to rediscover trojans. Our results suggest that SNAFUE can be useful for interpreting DNNs and generating adversarial data for them. Code is available at https://github.com/thestephencasper/snafue
[[2211.10033] Adversarial Stimuli: Attacking Brain-Computer Interfaces via Perturbed Sensory Events](http://arxiv.org/abs/2211.10033)
Machine learning models are known to be vulnerable to adversarial perturbations in the input domain, causing incorrect predictions. Inspired by this phenomenon, we explore the feasibility of manipulating EEG-based Motor Imagery (MI) Brain Computer Interfaces (BCIs) via perturbations in sensory stimuli. Similar to adversarial examples, these \emph{adversarial stimuli} aim to exploit the limitations of the integrated brain-sensor-processing components of the BCI system in handling shifts in participants' response to changes in sensory stimuli. This paper proposes adversarial stimuli as an attack vector against BCIs, and reports the findings of preliminary experiments on the impact of visual adversarial stimuli on the integrity of EEG-based MI BCIs. Our findings suggest that minor adversarial stimuli can significantly deteriorate the performance of MI BCIs across all participants (p=0.0003). Additionally, our results indicate that such attacks are more effective in conditions with induced stress.
[[2211.10076] Applications of Quantum Annealing in Cryptography](http://arxiv.org/abs/2211.10076)
This paper presents a new method to reduce the optimization of a pseudo-Boolean function to QUBO problem which can be solved by quantum annealer. The new method has two aspects, one is coefficient optimization and the other is variable optimization. The former is an improvement on the existing algorithm in a special case. The latter is realized by means of the maximal independent point set in graph theory. We apply this new method in integer factorization on quantum annealers and achieve the largest integer factorization (4137131) with 93 variables, the range of coefficients is [-1024,1024] which is much smaller than the previous results. We also focus on the quantum attacks on block ciphers and present an efficient method with smaller coefficients to transform Boolean equation systems into QUBO problems.
[[2211.10209] Leveraging Algorithmic Fairness to Mitigate Blackbox Attribute Inference Attacks](http://arxiv.org/abs/2211.10209)
Machine learning (ML) models have been deployed for high-stakes applications, e.g., healthcare and criminal justice. Prior work has shown that ML models are vulnerable to attribute inference attacks where an adversary, with some background knowledge, trains an ML attack model to infer sensitive attributes by exploiting distinguishable model predictions. However, some prior attribute inference attacks have strong assumptions about adversary's background knowledge (e.g., marginal distribution of sensitive attribute) and pose no more privacy risk than statistical inference. Moreover, none of the prior attacks account for class imbalance of sensitive attribute in datasets coming from real-world applications (e.g., Race and Sex). In this paper, we propose an practical and effective attribute inference attack that accounts for this imbalance using an adaptive threshold over the attack model's predictions. We exhaustively evaluate our proposed attack on multiple datasets and show that the adaptive threshold over the model's predictions drastically improves the attack accuracy over prior work. Finally, current literature lacks an effective defence against attribute inference attacks. We investigate the impact of fairness constraints (i.e., designed to mitigate unfairness in model predictions) during model training on our attribute inference attack. We show that constraint based fairness algorithms which enforces equalized odds acts as an effective defense against attribute inference attacks without impacting the model utility. Hence, the objective of algorithmic fairness and sensitive attribute privacy are aligned.
[[2211.10260] Integrated Space Domain Awareness and Communication System](http://arxiv.org/abs/2211.10260)
Space has been reforming and this evolution brings new threats that, together with technological developments and malicious intent, can pose a major challenge. Space domain awareness (SDA), a new conceptual idea, has come to the forefront. It aims sensing, detection, identification and countermeasures by providing autonomy, intelligence and flexibility against potential threats in space. In this study, we first present an insightful and clear view of the new space. Secondly, we propose an integrated SDA and communication (ISDAC) system for attacker detection. We assume that the attacker has beam-steering antennas and is capable to vary attack scenarios, such as random attacks on some receiver antennas. To track random patterns and meet SDA requirements, a lightweight convolutional neural network architecture is developed. The proposed ISDAC system shows superior and robust performance under 12 different attacker configurations with a detection accuracy of over 97.8%.
[[2211.09859] Data-Centric Debugging: mitigating model failures via targeted data collection](http://arxiv.org/abs/2211.09859)
Deep neural networks can be unreliable in the real world when the training set does not adequately cover all the settings where they are deployed. Focusing on image classification, we consider the setting where we have an error distribution $\mathcal{E}$ representing a deployment scenario where the model fails. We have access to a small set of samples $\mathcal{E}{sample}$ from $\mathcal{E}$ and it can be expensive to obtain additional samples. In the traditional model development framework, mitigating failures of the model in $\mathcal{E}$ can be challenging and is often done in an ad hoc manner. In this paper, we propose a general methodology for model debugging that can systemically improve model performance on $\mathcal{E}$ while maintaining its performance on the original test set. Our key assumption is that we have access to a large pool of weakly (noisily) labeled data $\mathcal{F}$. However, naively adding $\mathcal{F}$ to the training would hurt model performance due to the large extent of label noise. Our Data-Centric Debugging (DCD) framework carefully creates a debug-train set by selecting images from $\mathcal{F}$ that are perceptually similar to the images in $\mathcal{E}{sample}$. To do this, we use the $\ell_2$ distance in the feature space (penultimate layer activations) of various models including ResNet, Robust ResNet and DINO where we observe DINO ViTs are significantly better at discovering similar images compared to Resnets. Compared to LPIPS, we find that our method reduces compute and storage requirements by 99.58\%. Compared to the baselines that maintain model performance on the test set, we achieve significantly (+9.45\%) improved results on the debug-heldout sets.
[[2211.09945] SparseVLR: A Novel Framework for Verified Locally Robust Sparse Neural Networks Search](http://arxiv.org/abs/2211.09945)
The compute-intensive nature of neural networks (NNs) limits their deployment in resource-constrained environments such as cell phones, drones, autonomous robots, etc. Hence, developing robust sparse models fit for safety-critical applications has been an issue of longstanding interest. Though adversarial training with model sparsification has been combined to attain the goal, conventional adversarial training approaches provide no formal guarantee that the models would be robust against any rogue samples in a restricted space around a benign sample. Recently proposed verified local robustness techniques provide such a guarantee. This is the first paper that combines the ideas from verified local robustness and dynamic sparse training to develop `SparseVLR'-- a novel framework to search verified locally robust sparse networks. Obtained sparse models exhibit accuracy and robustness comparable to their dense counterparts at sparsity as high as 99%. Furthermore, unlike most conventional sparsification techniques, SparseVLR does not require a pre-trained dense model, reducing the training time by 50%. We exhaustively investigated SparseVLR's efficacy and generalizability by evaluating various benchmark and application-specific datasets across several models.
[[2211.10288] Just a Matter of Scale? Reevaluating Scale Equivariance in Convolutional Neural Networks](http://arxiv.org/abs/2211.10288)
The widespread success of convolutional neural networks may largely be attributed to their intrinsic property of translation equivariance. However, convolutions are not equivariant to variations in scale and fail to generalize to objects of different sizes. Despite recent advances in this field, it remains unclear how well current methods generalize to unobserved scales on real-world data and to what extent scale equivariance plays a role. To address this, we propose the novel Scaled and Translated Image Recognition (STIR) benchmark based on four different domains. Additionally, we introduce a new family of models that applies many re-scaled kernels with shared weights in parallel and then selects the most appropriate one. Our experimental results on STIR show that both the existing and proposed approaches can improve generalization across scales compared to standard convolutions. We also demonstrate that our family of models is able to generalize well towards larger scales and improve scale equivariance. Moreover, due to their unique design we can validate that kernel selection is consistent with input scale. Even so, none of the evaluated models maintain their performance for large differences in scale, demonstrating that a general understanding of how scale equivariance can improve generalization and robustness is still lacking.
[[2211.10086] Metadata Might Make Language Models Better](http://arxiv.org/abs/2211.10086)
This paper discusses the benefits of including metadata when training language models on historical collections. Using 19th-century newspapers as a case study, we extend the time-masking approach proposed by Rosin et al., 2022 and compare different strategies for inserting temporal, political and geographical information into a Masked Language Model. After fine-tuning several DistilBERT on enhanced input data, we provide a systematic evaluation of these models on a set of evaluation tasks: pseudo-perplexity, metadata mask-filling and supervised classification. We find that showing relevant metadata to a language model has a beneficial impact and may even produce more robust and fairer models.
[[2211.10163] Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi](http://arxiv.org/abs/2211.10163)
The widespread of offensive content online has become a reason for great concern in recent years, motivating researchers to develop robust systems capable of identifying such content automatically. With the goal of carrying out a fair evaluation of these systems, several international competitions have been organized, providing the community with important benchmark data and evaluation methods for various languages. Organized since 2019, the HASOC (Hate Speech and Offensive Content Identification) shared task is one of these initiatives. In its fourth iteration, HASOC 2022 included three subtracks for English, Hindi, and Marathi. In this paper, we report the results of the HASOC 2022 Marathi subtrack which provided participants with a dataset containing data from Twitter manually annotated using the popular OLID taxonomy. The Marathi track featured three additional subtracks, each corresponding to one level of the taxonomy: Task A - offensive content identification (offensive vs. non-offensive); Task B - categorization of offensive types (targeted vs. untargeted), and Task C - offensive target identification (individual vs. group vs. others). Overall, 59 runs were submitted by 10 teams. The best systems obtained an F1 of 0.9745 for Subtrack 3A, an F1 of 0.9207 for Subtrack 3B, and F1 of 0.9607 for Subtrack 3C. The best performing algorithms were a mixture of traditional and deep learning approaches.
[[2211.09810] Certifying Robustness of Convolutional Neural Networks with Tight Linear Approximation](http://arxiv.org/abs/2211.09810)
The robustness of neural network classifiers is becoming important in the safety-critical domain and can be quantified by robustness verification. However, at present, efficient and scalable verification techniques are always sound but incomplete. Therefore, the improvement of certified robustness bounds is the key criterion to evaluate the superiority of robustness verification approaches. In this paper, we present a Tight Linear approximation approach for robustness verification of Convolutional Neural Networks(Ti-Lin). For general CNNs, we first provide a new linear constraints for S-shaped activation functions, which is better than both existing Neuron-wise Tightest and Network-wise Tightest tools. We then propose Neuron-wise Tightest linear bounds for Maxpool function. We implement Ti-Lin, the resulting verification method. We evaluate it with 48 different CNNs trained on MNIST, CIFAR-10, and Tiny ImageNet datasets. Experimental results show that Ti-Lin significantly outperforms other five state-of-the-art methods(CNN-Cert, DeepPoly, DeepCert, VeriNet, Newise). Concretely, Ti-Lin certifies much more precise robustness bounds on pure CNNs with Sigmoid/Tanh/Arctan functions and CNNs with Maxpooling function with at most 63.70% and 253.54% improvement, respectively.
[[2211.10095] Improving Robustness of TCM-based Robust Steganography with Variable Robustness](http://arxiv.org/abs/2211.10095)
Recent study has found out that after multiple times of recompression, the DCT coefficients of JPEG image can form an embedding domain that is robust to recompression, which is called transport channel matching (TCM) method. Because the cost function of the adaptive steganography does not consider the impact of modification on the robustness, the modified DCT coefficients of the stego image after TCM will change after recompression. To reduce the number of changed coefficients after recompression, this paper proposes a robust steganography algorithm which dynamically updates the robustness cost of every DCT coefficient. The robustness cost proposed is calculated by testing whether the modified DCT coefficient can resist recompression in every step of STC embedding process. By adding robustness cost to the distortion cost and using the framework of STC embedding algorithm to embed the message, the stego images have good performance both in robustness and security. The experimental results show that the proposed algorithm can significantly enhance the robustness of stego images, and the embedded messages could be extracted correctly at almost all cases when recompressing with a lower quality factor and recompression process is known to the user of proposed algorithm.
[[2211.09894] Features Compression based on Counterfactual Analysis](http://arxiv.org/abs/2211.09894)
Counterfactual Explanations are becoming a de-facto standard in post-hoc interpretable machine learning. For a given classifier and an instance classified in an undesired class, its counterfactual explanation corresponds to small perturbations of that instance that allow changing the classification outcome. This work aims to leverage Counterfactual Explanations to detect the important decision boundaries of a pre-trained black-box model. This information is used to build a supervised discretization of the features in the dataset with a tunable granularity. A small and interpretable Decision Tree is trained on the discretized dataset that is stable and robust. Numerical results on real-world datasets show the effectiveness of the approach.
[[2211.09954] Robust DNN Surrogate Models with Uncertainty Quantification via Adversarial Training](http://arxiv.org/abs/2211.09954)
For computational efficiency, surrogate models have been used to emulate mathematical simulators for physical or biological processes. High-speed simulation is crucial for conducting uncertainty quantification (UQ) when the simulation is repeated over many randomly sampled input points (aka, the Monte Carlo method). In some cases, UQ is only feasible with a surrogate model. Recently, Deep Neural Network (DNN) surrogate models have gained popularity for their hard-to-match emulation accuracy. However, it is well-known that DNN is prone to errors when input data are perturbed in particular ways, the very motivation for adversarial training. In the usage scenario of surrogate models, the concern is less of a deliberate attack but more of the high sensitivity of the DNN's accuracy to input directions, an issue largely ignored by researchers using emulation models. In this paper, we show the severity of this issue through empirical studies and hypothesis testing. Furthermore, we adopt methods in adversarial training to enhance the robustness of DNN surrogate models. Experiments demonstrate that our approaches significantly improve the robustness of the surrogate models without compromising emulation accuracy.
[[2211.09981] Weighted Ensemble Self-Supervised Learning](http://arxiv.org/abs/2211.09981)
Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) enable leveraging large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. We refrain from ensembling the representation backbone; this choice yields an efficient ensemble method that incurs a small training cost and requires no architectural changes or computational overhead to downstream evaluation. The effectiveness of our method is demonstrated with two state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al., 2022). Our method outperforms both in multiple evaluation metrics on ImageNet-1K, particularly in the few-shot setting. We explore several weighting schemes and find that those which increase the diversity of ensemble heads lead to better downstream evaluation results. Thorough experiments yield improved prior art baselines which our method still surpasses; e.g., our overall improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning.
[[2211.10012] A Tale of Two Cities: Data and Configuration Variances in Robust Deep Learning](http://arxiv.org/abs/2211.10012)
Deep neural networks (DNNs), are widely used in many industries such as image recognition, supply chain, medical diagnosis, and autonomous driving. However, prior work has shown the high accuracy of a DNN model does not imply high robustness (i.e., consistent performances on new and future datasets) because the input data and external environment (e.g., software and model configurations) for a deployed model are constantly changing. Hence, ensuring the robustness of deep learning is not an option but a priority to enhance business and consumer confidence. Previous studies mostly focus on the data aspect of model variance. In this article, we systematically summarize DNN robustness issues and formulate them in a holistic view through two important aspects, i.e., data and software configuration variances in DNNs. We also provide a predictive framework to generate representative variances (counterexamples) by considering both data and configurations for robust learning through the lens of search-based optimization.
[[2211.10420] Mirror Sinkhorn: Fast Online Optimization on Transport Polytopes](http://arxiv.org/abs/2211.10420)
Optimal transport has arisen as an important tool in machine learning, allowing to capture geometric properties of the data. It is formulated as a linear program on transport polytopes. The problem of convex optimization on this set includes both OT and multiple related ones, such as point cloud registration.
We present in this work an optimization algorithm that utilizes Sinkhorn matrix scaling and mirror descent to minimize convex objectives on this domain. This algorithm can be run online and is both adaptive and robust to noise. A mathematical analysis of the convergence rate of the algorithm for minimising convex functions is provided, as well as experiments that illustrate its performance on synthetic data and real-world data.
[[2211.09950] TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos](http://arxiv.org/abs/2211.09950)
Recent advancements in cabled ocean observatories have increased the quality and prevalence of underwater videos; this data enables the extraction of high-level biologically relevant information such as species' behaviours. Despite this increase in capability, most modern methods for the automatic interpretation of underwater videos focus only on the detection and counting organisms. We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos. TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-staged, spatial, then temporal, encoder. TempNet also presents temporal attention during spatial encoding as well as Wavelet Down-Sampling pre-processing to improve model accuracy. Although our system is designed for applications to diverse fish behaviours (i.e, is generic), we demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events. We compare the proposed approach with a state-of-the-art end-to-end video detection method (ReMotENet) and a hybrid method previously offered exclusively for the detection of sablefish's startle events in videos from an existing dataset. Results show that our novel method comfortably outperforms the comparison baselines in multiple metrics, reaching a per-clip accuracy and precision of 80% and 0.81, respectively. This represents a relative improvement of 31% in accuracy and 27% in precision over the compared methods using this dataset. Our computational pipeline is also highly efficient, as it can process each 4-second video clip in only 38ms. Furthermore, since it does not employ features specific to sablefish startle events, our system can be easily extended to other behaviours in future works.
[[2211.09979] Comparison between EM and FCM algorithms in skin tone extraction](http://arxiv.org/abs/2211.09979)
This study aims to investigate implementing EM and FCM algorithms for skin color extraction. The capabilities of three well-known color spaces, namely, RGB, HSV, and YCbCr for skin-tone extraction are assessed by using statistical modeling of skin tones using EM and FCM algorithms. The results show that utilizing a Gaussian mixture model for parametric modeling of skin tones using EM algorithm works well in HSV color space when all three components of the color vector are used. In spite of discarding the luminance components in YCbCr and HSV color spaces, EM algorithm provides the best results. The results of the detailed comparisons are explained in the conclusion.
[[2211.10154] CRAFT: Concept Recursive Activation FacTorization for Explainability](http://arxiv.org/abs/2211.10154)
Attribution methods are a popular class of explainability methods that use heatmaps to depict the most important areas of an image that drive a model decision. Nevertheless, recent work has shown that these methods have limited utility in practice, presumably because they only highlight the most salient parts of an image (i.e., 'where' the model looked) and do not communicate any information about 'what' the model saw at those locations. In this work, we try to fill in this gap with CRAFT -- a novel approach to identify both 'what' and 'where' by generating concept-based explanations. We introduce 3 new ingredients to the automatic concept extraction literature: (i) a recursive strategy to detect and decompose concepts across layers, (ii) a novel method for a more faithful estimation of concept importance using Sobol indices, and (iii) the use of implicit differentiation to unlock Concept Attribution Maps. We conduct both human and computer vision experiments to demonstrate the benefits of the proposed approach. We show that our recursive decomposition generates meaningful and accurate concepts and that the proposed concept importance estimation technique is more faithful to the model than previous methods. When evaluating the usefulness of the method for human experimenters on a human-defined utility benchmark, we find that our approach significantly improves on two of the three test scenarios (while none of the current methods including ours help on the third). Overall, our study suggests that, while much work remains toward the development of general explainability methods that are useful in practical scenarios, the identification of meaningful concepts at the proper level of granularity yields useful and complementary information beyond that afforded by attribution methods.
[[2211.10018] A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach](http://arxiv.org/abs/2211.10018)
Relation extraction has the potential for large-scale knowledge graph construction, but current methods do not consider the qualifier attributes for each relation triplet, such as time, quantity or location. The qualifiers form hyper-relational facts which better capture the rich and complex knowledge graph structure. For example, the relation triplet (Leonard Parker, Educated At, Harvard University) can be factually enriched by including the qualifier (End Time, 1967). Hence, we propose the task of hyper-relational extraction to extract more specific and complete facts from text. To support the task, we construct HyperRED, a large-scale and general-purpose dataset. Existing models cannot perform hyper-relational extraction as it requires a model to consider the interaction between three entities. Hence, we propose CubeRE, a cube-filling model inspired by table-filling approaches and explicitly considers the interaction between relation triplets and qualifiers. To improve model scalability and reduce negative class imbalance, we further propose a cube-pruning method. Our experiments show that CubeRE outperforms strong baselines and reveal possible directions for future research. Our code and data are available at github.com/declare-lab/HyperRED.
[[2211.10082] Private Federated Statistics in an Interactive Setting](http://arxiv.org/abs/2211.10082)
Privately learning statistics of events on devices can enable improved user experience. Differentially private algorithms for such problems can benefit significantly from interactivity. We argue that an aggregation protocol can enable an interactive private federated statistics system where user's devices maintain control of the privacy assurance. We describe the architecture of such a system, and analyze its security properties.
[[2211.09855] ProtSi: Prototypical Siamese Network with Data Augmentation for Few-Shot Subjective Answer Evaluation](http://arxiv.org/abs/2211.09855)
Subjective answer evaluation is a time-consuming and tedious task, and the quality of the evaluation is heavily influenced by a variety of subjective personal characteristics. Instead, machine evaluation can effectively assist educators in saving time while also ensuring that evaluations are fair and realistic. However, most existing methods using regular machine learning and natural language processing techniques are generally hampered by a lack of annotated answers and poor model interpretability, making them unsuitable for real-world use. To solve these challenges, we propose ProtSi Network, a unique semi-supervised architecture that for the first time uses few-shot learning to subjective answer evaluation. To evaluate students' answers by similarity prototypes, ProtSi Network simulates the natural process of evaluator scoring answers by combining Siamese Network which consists of BERT and encoder layers with Prototypical Network. We employed an unsupervised diverse paraphrasing model ProtAugment, in order to prevent overfitting for effective few-shot text classification. By integrating contrastive learning, the discriminative text issue can be mitigated. Experiments on the Kaggle Short Scoring Dataset demonstrate that the ProtSi Network outperforms the most recent baseline models in terms of accuracy and quadratic weighted kappa.
[[2211.10001] BDTS: A Blockchain-based Data Trading System with Fair Exchange](http://arxiv.org/abs/2211.10001)
Trading data through blockchain platforms is hard to achieve \textit{fair exchange}. Reasons come from two folds: Firstly, guaranteeing fairness between sellers and consumers is a challenging task as the deception of any participating parties is risk-free. This leads to the second issue where judging the behavior of data executors (such as cloud service providers) among distrustful parties is impractical in traditional trading protocols. To fill the gaps, in this paper, we present a \underline{b}lockchain-based \underline{d}ata \underline{t}rading \underline{s}ystem, named BDTS. The proposed BDTS implements a fair-exchange protocol in which benign behaviors can obtain rewards while dishonest behaviors will be punished. Our scheme leverages the smart contract technique to act as the agency, managing data distribution and payment execution. The solution requires the seller to provide consumers with the correct decryption keys for proper execution and encourages a \textit{rational} data executor to behave faithfully for \textit{maximum} benefits. We analyze the strategies of consumers, sellers, and dealers based on the game theory and prove that our game can reach the subgame perfect Nash equilibrium when each party honestly behaves. Further, we implement our scheme based on the Hyperledger Fabric platform with a full-functional design. Evaluations show that our scheme achieves satisfactory efficiency and feasibility.
[[2211.09925] FairMILE: A Multi-Level Framework for Fair and Scalable Graph Representation Learning](http://arxiv.org/abs/2211.09925)
Graph representation learning models have been deployed for making decisions in multiple high-stakes scenarios. It is therefore critical to ensure that these models are fair. Prior research has shown that graph neural networks can inherit and reinforce the bias present in graph data. Researchers have begun to examine ways to mitigate the bias in such models. However, existing efforts are restricted by their inefficiency, limited applicability, and the constraints they place on sensitive attributes. To address these issues, we present FairMILE a general framework for fair and scalable graph representation learning. FairMILE is a multi-level framework that allows contemporary unsupervised graph embedding methods to scale to large graphs in an agnostic manner. FairMILE learns both fair and high-quality node embeddings where the fairness constraints are incorporated in each phase of the framework. Our experiments across two distinct tasks demonstrate that FairMILE can learn node representations that often achieve superior fairness scores and high downstream performance while significantly outperforming all the baselines in terms of efficiency.
[[2211.10285] A Fair Loss Function for Network Pruning](http://arxiv.org/abs/2211.10285)
Model pruning can enable the deployment of neural networks in environments with resource constraints. While pruning may have a small effect on the overall performance of the model, it can exacerbate existing biases into the model such that subsets of samples see significantly degraded performance. In this paper, we introduce the performance weighted loss function, a simple modified cross-entropy loss function that can be used to limit the introduction of biases during pruning. Experiments using biased classifiers for facial classification and skin-lesion classification tasks demonstrate that the proposed method is a simple and effective tool that can enable existing pruning methods to be used in fairness sensitive contexts.
[[2211.09869] RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation](http://arxiv.org/abs/2211.09869)
Diffusion models currently achieve state-of-the-art performance for both conditional and unconditional image generation. However, so far, image diffusion models do not support tasks required for 3D understanding, such as view-consistent 3D generation or single-view object reconstruction. In this paper, we present RenderDiffusion as the first diffusion model for 3D generation and inference that can be trained using only monocular 2D supervision. At the heart of our method is a novel image denoising architecture that generates and renders an intermediate three-dimensional representation of a scene in each denoising step. This enforces a strong inductive structure into the diffusion process that gives us a 3D consistent representation while only requiring 2D supervision. The resulting 3D representation can be rendered from any viewpoint. We evaluate RenderDiffusion on ShapeNet and Clevr datasets and show competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images. Additionally, our diffusion-based approach allows us to use 2D inpainting to edit 3D scenes. We believe that our work promises to enable full 3D generation at scale when trained on massive image collections, thus circumventing the need to have large-scale 3D model collections for supervision.
[[2211.10370] Invariant Learning via Diffusion Dreamed Distribution Shifts](http://arxiv.org/abs/2211.10370)
Though the background is an important signal for image classification, over reliance on it can lead to incorrect predictions when spurious correlations between foreground and background are broken at test time. Training on a dataset where these correlations are unbiased would lead to more robust models. In this paper, we propose such a dataset called Diffusion Dreamed Distribution Shifts (D3S). D3S consists of synthetic images generated through StableDiffusion using text prompts and image guides obtained by pasting a sample foreground image onto a background template image. Using this scalable approach we generate 120K images of objects from all 1000 ImageNet classes in 10 diverse backgrounds. Due to the incredible photorealism of the diffusion model, our images are much closer to natural images than previous synthetic datasets. D3S contains a validation set of more than 17K images whose labels are human-verified in an MTurk study. Using the validation set, we evaluate several popular DNN image classifiers and find that the classification performance of models generally suffers on our background diverse images. Next, we leverage the foreground & background labels in D3S to learn a foreground (background) representation that is invariant to changes in background (foreground) by penalizing the mutual information between the foreground (background) features and the background (foreground) labels. Linear classifiers trained on these features to predict foreground (background) from foreground (background) have high accuracies at 82.9% (93.8%), while classifiers that predict these labels from background and foreground have a much lower accuracy of 2.4% and 45.6% respectively. This suggests that our foreground and background features are well disentangled. We further test the efficacy of these representations by training classifiers on a task with strong spurious correlations.
[[2211.10437] A Structure-Guided Diffusion Model for Large-Hole Diverse Image Completion](http://arxiv.org/abs/2211.10437)
Diverse image completion, a problem of generating various ways of filling incomplete regions (i.e. holes) of an image, has made remarkable success. However, managing input images with large holes is still a challenging problem due to the corruption of semantically important structures. In this paper, we tackle this problem by incorporating explicit structural guidance. We propose a structure-guided diffusion model (SGDM) for the large-hole diverse completion problem. Our proposed SGDM consists of a structure generator and a texture generator, which are both diffusion probabilistic models (DMs). The structure generator generates an edge image representing a plausible structure within the holes, which is later used to guide the texture generation process. To jointly train these two generators, we design a strategy that combines optimal Bayesian denoising and a momentum framework. In addition to the quality improvement, auxiliary edge images generated by the structure generator can be manually edited to allow user-guided image editing. Our experiments using datasets of faces (CelebA-HQ) and natural scenes (Places) show that our method achieves a comparable or superior trade-off between visual quality and diversity compared to other state-of-the-art methods.
[[2211.10440] Magic3D: High-Resolution Text-to-3D Content Creation](http://arxiv.org/abs/2211.10440)
DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior and accelerate with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we further optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion (reportedly taking 1.5 hours on average), while also achieving higher resolution. User studies show 61.7% raters to prefer our approach over DreamFusion. Together with the image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues to various creative applications.