secure

Title: K-means Clustering Based Feature Consistency Alignment for Label-free Model Evaluation. (arXiv:2304.09758v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09758
Code URL: null
Copy Paste: [[2304.09758] K-means Clustering Based Feature Consistency Alignment for Label-free Model Evaluation](http://arxiv.org/abs/2304.09758) #secure
Summary:
The label-free model evaluation aims to predict the model performance on various test sets without relying on ground truths. The main challenge of this task is the absence of labels in the test data, unlike in classical supervised model evaluation. This paper presents our solutions for the 1st DataCV Challenge of the Visual Dataset Understanding workshop at CVPR 2023. Firstly, we propose a novel method called K-means Clustering Based Feature Consistency Alignment (KCFCA), which is tailored to handle the distribution shifts of various datasets. KCFCA utilizes the K-means algorithm to cluster labeled training sets and unlabeled test sets, and then aligns the cluster centers with feature consistency. Secondly, we develop a dynamic regression model to capture the relationship between the shifts in distribution and model accuracy. Thirdly, we design an algorithm to discover the outlier model factors, eliminate the outlier models, and combine the strengths of multiple autoeval models. On the DataCV Challenge leaderboard, our approach secured 2nd place with an RMSE of 6.8526. Our method significantly improved over the best baseline method by 36% (6.8526 vs. 10.7378). Furthermore, our method achieves a relatively more robust and optimal single model performance on the validation dataset.
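
The leaderboard figures above can be checked with a few lines of Python. This is only a sketch of the metric, not the authors' evaluation code: the function names `rmse` and `relative_improvement` are illustrative assumptions, and only the two RMSE values come from the summary.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and true accuracies."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))

def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse is lower than baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# RMSE values quoted in the summary (best baseline vs. the proposed approach).
improvement = relative_improvement(10.7378, 6.8526)
print(f"{improvement:.1%}")  # roughly 36%, matching the figure in the summary
```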

Title: A Protocol for Cast-as-Intended Verifiability with a Second Device. (arXiv:2304.09456v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.09456
Code URL: null
Copy Paste: [[2304.09456] A Protocol for Cast-as-Intended Verifiability with a Second Device](http://arxiv.org/abs/2304.09456) #secure
Summary:
Numerous institutions, such as companies, universities, or non-governmental organizations, employ Internet voting for remote elections. Since the main purpose of an election is to determine the voters' will, it is fundamentally important to ensure that the final election result correctly reflects the voters' votes. To this end, modern secure Internet voting schemes aim for what is called end-to-end verifiability. This fundamental security property ensures that the correctness of the final result can be verified, even if some of the computers or parties involved are malfunctioning or corrupted.

A standard component in this approach is so-called cast-as-intended verifiability, which enables individual voters to verify that the ballots cast on their behalf contain their intended choices. Numerous approaches for cast-as-intended verifiability have been proposed in the literature, some of which have also been employed in real-life Internet elections.

One of the well-established approaches for cast-as-intended verifiability is to employ a second device which can be used by voters to audit their submitted ballots. This approach offers several advantages - including support for flexible ballot/election types and intuitive user experience - and it has been used in real-life elections, for instance in Estonia.

In this work, we improve the existing solutions for cast-as-intended verifiability based on the use of a second device. We propose a solution which, while preserving the advantageous practical properties sketched above, provides tighter security guarantees. Our method does not increase the risk of vote-selling when compared to the underlying voting protocol being augmented and, to achieve this, it requires only comparatively weak trust assumptions. It can be combined with various voting protocols, including commitment-based systems offering everlasting privacy.

Title: Secure Mobile Payment Architecture Enabling Multi-factor Authentication. (arXiv:2304.09468v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.09468
Code URL: null
Copy Paste: [[2304.09468] Secure Mobile Payment Architecture Enabling Multi-factor Authentication](http://arxiv.org/abs/2304.09468) #secure
Summary:
The rise of smartphones has led to a significant increase in the usage of mobile payments. Mobile payments allow individuals to access financial resources and make transactions through their mobile devices while on the go. However, the current mobile payment systems were designed to align with traditional payment structures, which limits the full potential of smartphones, including their security features. This has become a major concern in the rapidly growing mobile payment market. To address these security concerns, in this paper we propose a new mobile payment architecture. This architecture leverages the advanced capabilities of modern smartphones to verify various aspects of a payment, such as funds, biometrics, location, and others. The proposed system aims to guarantee the legitimacy of transactions and protect against identity theft by verifying multiple elements of a payment. The security of mobile payment systems is crucial, given the rapid growth of the market. Evaluating mobile payment systems based on their authentication, encryption, and fraud detection capabilities is of utmost importance. The proposed architecture provides a secure mobile payment solution that enhances the overall payment experience by taking advantage of the advanced capabilities of modern smartphones. This will not only improve the security of mobile payments but also offer a more user-friendly payment experience for consumers.

Title: Secure Split Learning against Property Inference, Data Reconstruction, and Feature Space Hijacking Attacks. (arXiv:2304.09515v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09515
Code URL: null
Copy Paste: [[2304.09515] Secure Split Learning against Property Inference, Data Reconstruction, and Feature Space Hijacking Attacks](http://arxiv.org/abs/2304.09515) #secure
Summary:
Split learning of deep neural networks (SplitNN) has provided a promising solution to learning jointly for the mutual interest of a guest and a host, which may come from different backgrounds, holding features partitioned vertically. However, SplitNN creates a new attack surface for the adversarial participant, holding back its practical use in the real world. By investigating the adversarial effects of highly threatening attacks, including property inference, data reconstruction, and feature hijacking attacks, we identify the underlying vulnerability of SplitNN and propose a countermeasure. To prevent potential threats and ensure the learning guarantees of SplitNN, we design a privacy-preserving tunnel for information exchange between the guest and the host. The intuition is to perturb the propagation of knowledge in each direction with a controllable unified solution. To this end, we propose a new activation function named R3eLU, transferring private smashed data and partial loss into randomized responses in forward and backward propagations, respectively. We give the first attempt to secure split learning against three threatening attacks and present a fine-grained privacy budget allocation scheme. The analysis proves that our privacy-preserving SplitNN solution provides a tight privacy budget, while the experimental results show that our solution performs better than existing solutions in most cases and achieves a good tradeoff between defense and model usability.

Title: How Secure is Code Generated by ChatGPT?. (arXiv:2304.09655v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.09655
Code URL: null
Copy Paste: [[2304.09655] How Secure is Code Generated by ChatGPT?](http://arxiv.org/abs/2304.09655) #secure
Summary:
In recent years, large language models have been responsible for great advances in the field of artificial intelligence (AI). ChatGPT in particular, an AI chatbot developed and recently released by OpenAI, has taken the field to the next level. The conversational model is able not only to process human-like text, but also to translate natural language into code. However, the safety of programs generated by ChatGPT should not be overlooked. In this paper, we perform an experiment to address this issue. Specifically, we ask ChatGPT to generate a number of programs and evaluate the security of the resulting source code. We further investigate whether ChatGPT can be prodded to improve the security by appropriate prompts, and discuss the ethical aspects of using AI to generate code. Results suggest that ChatGPT is aware of potential vulnerabilities, but nonetheless often generates source code that is not robust to certain attacks.

security

Title: Security and Privacy Problems in Voice Assistant Applications: A Survey. (arXiv:2304.09486v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.09486
Code URL: null
Copy Paste: [[2304.09486] Security and Privacy Problems in Voice Assistant Applications: A Survey](http://arxiv.org/abs/2304.09486) #security
Summary:
Voice assistant applications have become ubiquitous nowadays. Two models that provide the two most important functions for real-life applications (e.g., Google Home, Amazon Alexa, Siri, etc.) are Automatic Speech Recognition (ASR) models and Speaker Identification (SI) models. According to recent studies, security and privacy threats have also emerged with the rapid development of the Internet of Things (IoT). The security issues researched include attack techniques toward machine learning models and other hardware components widely used in voice assistant applications. The privacy issues include technical-wise information stealing and policy-wise privacy breaches. Voice assistant applications take a steadily growing market share every year, but their privacy and security issues have never stopped causing huge economic losses and endangering users' personal sensitive information. Thus, it is important to have a comprehensive survey to outline the categorization of the current research regarding the security and privacy problems of voice assistant applications. This paper concludes and assesses five kinds of security attacks and three types of privacy threats in the papers published in the top-tier conferences of cyber security and voice domain.

Title: 5G-SRNG: 5G Spectrogram-based Random Number Generation for Devices with Low Entropy Sources. (arXiv:2304.09591v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.09591
Code URL: null
Copy Paste: [[2304.09591] 5G-SRNG: 5G Spectrogram-based Random Number Generation for Devices with Low Entropy Sources](http://arxiv.org/abs/2304.09591) #security
Summary:
Random number generation (RNG) is a crucial element in security protocols, and its performance and reliability are critical for the safety and integrity of digital systems. This is especially true in 5G networks with many devices with low entropy sources. This paper proposes 5G-SRNG, an end-to-end random number generation solution for devices with low entropy sources in 5G networks. Compared to traditional RNG methods, the 5G-SRNG relies on hardware or software random number generators, using 5G spectral information, such as from spectrum-sensing or a spectrum-aware feedback mechanism, as a source of entropy. The proposed algorithm is experimentally verified, and its performance is analysed by simulating a realistic 5G network environment. Results show that 5G-SRNG outperforms existing RNG in all aspects, including randomness, partial correlation and power, making it suitable for 5G network deployments.

Title: Contactless Human Activity Recognition using Deep Learning with Flexible and Scalable Software Define Radio. (arXiv:2304.09756v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09756
Code URL: null
Copy Paste: [[2304.09756] Contactless Human Activity Recognition using Deep Learning with Flexible and Scalable Software Define Radio](http://arxiv.org/abs/2304.09756) #security
Summary:
Ambient computing is gaining popularity as a major technological advancement for the future. The modern era has witnessed a surge in the advancement in healthcare systems, with viable radio frequency solutions proposed for remote and unobtrusive human activity recognition (HAR). Specifically, this study investigates the use of Wi-Fi channel state information (CSI) as a novel method of ambient sensing that can be employed as a contactless means of recognizing human activity in indoor environments. These methods avoid additional costly hardware required for vision-based systems, which are privacy-intrusive, by (re)using Wi-Fi CSI for various safety and security applications. In an experiment that used a Universal Software Radio Peripheral (USRP) to collect CSI samples, a subject engaged in six distinct activities, which included no activity, standing, sitting, and leaning forward, across different areas of the room. Additionally, more CSI samples were collected when the subject walked in two different directions. This study presents a Wi-Fi CSI-based HAR system that assesses and contrasts deep learning approaches, namely convolutional neural network (CNN), long short-term memory (LSTM), and hybrid (LSTM+CNN), employed for accurate activity recognition. The experimental results indicate that LSTM surpasses current models and achieves an average accuracy of 95.3% in multi-activity classification when compared to CNN and hybrid techniques. In the future, research needs to study the significance of resilience in diverse and dynamic environments to identify the activity of multiple users.

privacy

Title: Rehabilitation Exercise Repetition Segmentation and Counting using Skeletal Body Joints. (arXiv:2304.09735v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09735
Code URL: https://github.com/abedicodes/repetition-segmentation
Copy Paste: [[2304.09735] Rehabilitation Exercise Repetition Segmentation and Counting using Skeletal Body Joints](http://arxiv.org/abs/2304.09735) #privacy
Summary:
Physical exercise is an essential component of rehabilitation programs that improve quality of life and reduce mortality and re-hospitalization rates. In AI-driven virtual rehabilitation programs, patients complete their exercises independently at home, while AI algorithms analyze the exercise data to provide feedback to patients and report their progress to clinicians. To analyze exercise data, the first step is to segment it into consecutive repetitions. There has been a significant amount of research performed on segmenting and counting the repetitive activities of healthy individuals using raw video data, which raises concerns regarding privacy and is computationally intensive. Previous research on patients' rehabilitation exercise segmentation relied on data collected by multiple wearable sensors, which are difficult to use at home by rehabilitation patients. Compared to healthy individuals, segmenting and counting exercise repetitions in patients is more challenging because of the irregular repetition duration and the variation between repetitions. This paper presents a novel approach for segmenting and counting the repetitions of rehabilitation exercises performed by patients, based on their skeletal body joints. Skeletal body joints can be acquired through depth cameras or computer vision techniques applied to RGB videos of patients. Various sequential neural networks are designed to analyze the sequences of skeletal body joints and perform repetition segmentation and counting. Extensive experiments on three publicly available rehabilitation exercise datasets, KIMORE, UI-PRMD, and IntelliRehabDS, demonstrate the superiority of the proposed method compared to previous methods. The proposed method enables accurate exercise analysis while preserving privacy, facilitating the effective delivery of virtual rehabilitation programs.

Title: Neural Network Quantisation for Faster Homomorphic Encryption. (arXiv:2304.09490v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.09490
Code URL: null
Copy Paste: [[2304.09490] Neural Network Quantisation for Faster Homomorphic Encryption](http://arxiv.org/abs/2304.09490) #privacy
Summary:
Homomorphic encryption (HE) enables calculating on encrypted data, which makes it possible to perform privacy-preserving neural network inference. One disadvantage of this technique is that it is several orders of magnitude slower than calculation on unencrypted data. Neural networks are commonly trained using floating-point, while most homomorphic encryption libraries calculate on integers, thus requiring a quantisation of the neural network. A straightforward approach would be to quantise to large integer sizes (e.g. 32 bit) to avoid large quantisation errors. In this work, we reduce the integer sizes of the networks, using quantisation-aware training, to allow more efficient computations. For the targeted MNIST architecture proposed by Badawi et al., we reduce the integer sizes by 33% without significant loss of accuracy, while for the CIFAR architecture, we can reduce the integer sizes by 43%. Implementing the resulting networks under the BFV homomorphic encryption scheme using SEAL, we could reduce the execution time of an MNIST neural network by 80% and by 40% for a CIFAR neural network.
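
To illustrate the trade-off between integer size and quantisation error mentioned above, here is a minimal sketch of symmetric uniform quantisation in Python. It is plain post-training rounding, not the quantisation-aware training used in the paper, and the names `quantise` and `dequantise` as well as the sample weights are assumptions made for illustration.

```python
def quantise(weights, bits):
    """Symmetric uniform quantisation of floats to signed integers of `bits` width."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed range")
    q_max = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits
    largest = max(abs(w) for w in weights)
    scale = largest / q_max if largest > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed range.
    ints = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return ints, scale

def dequantise(ints, scale):
    """Map quantised integers back to approximate float weights."""
    return [i * scale for i in ints]

weights = [0.5, -1.0, 0.25, 0.0]                # hypothetical weights
ints8, scale8 = quantise(weights, 8)
ints4, scale4 = quantise(weights, 4)

# Fewer bits give a coarser step (scale), hence a larger worst-case rounding error.
print(scale8, scale4)
```

Smaller integers keep the homomorphic computation cheaper, but each step between representable values grows, which is why the paper pairs the reduction with training that accounts for it.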

Title: Visualising Personal Data Flows: Insights from a Case Study of Booking.com. (arXiv:2304.09603v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.09603
Code URL: null
Copy Paste: [[2304.09603] Visualising Personal Data Flows: Insights from a Case Study of Booking.com](http://arxiv.org/abs/2304.09603) #privacy
Summary:
Commercial organisations are holding and processing an ever-increasing amount of personal data. Policies and laws are continually changing to require these companies to be more transparent regarding collection, storage, processing and sharing of this data. This paper reports our work of taking Booking.com as a case study to visualise personal data flows extracted from their privacy policy. By showcasing how the company shares its consumers' personal data, we raise questions and extend discussions on the challenges and limitations of using privacy policy to inform customers of the true scale and landscape of personal data flows. More importantly, this case study can inform us about future research on more data flow-oriented privacy policy analysis and on the construction of a more comprehensive ontology on personal data flows in complicated business ecosystems.

protect

defense

Title: Maybenot: A Framework for Traffic Analysis Defenses. (arXiv:2304.09510v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2304.09510
Code URL: null
Copy Paste: [[2304.09510] Maybenot: A Framework for Traffic Analysis Defenses](http://arxiv.org/abs/2304.09510) #defense
Summary:
End-to-end encryption is a powerful tool for protecting the privacy of Internet users. Together with the increasing use of technologies such as Tor, VPNs, and encrypted messaging, it is becoming increasingly difficult for network adversaries to monitor and censor Internet traffic. One remaining avenue for adversaries is traffic analysis: the analysis of patterns in encrypted traffic to infer information about the users and their activities. Recent improvements using deep learning have made traffic analysis attacks more effective than ever before.

We present Maybenot, a framework for traffic analysis defenses. Maybenot is designed to be easy to use and integrate into existing end-to-end encrypted protocols. It is implemented in the Rust programming language as a crate (library), together with a simulator to further the development of defenses. Defenses in Maybenot are expressed as probabilistic state machines that schedule actions to inject padding or block outgoing traffic. Maybenot is an evolution from the Tor Circuit Padding Framework by Perry and Kadianakis, designed to support a wide range of protocols and use cases.
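
The idea of a defense expressed as a probabilistic state machine can be sketched in Python as follows. This is not Maybenot's API (the framework itself is a Rust crate); the states, probabilities, action names, and the `run` helper are all hypothetical and chosen only to show the mechanism of scheduling padding or blocking actions at random.

```python
import random

# Hypothetical machine: each state maps to (action, [(next_state, probability), ...]).
MACHINE = {
    "idle":  (None,      [("idle", 0.7), ("pad", 0.2), ("block", 0.1)]),
    "pad":   ("padding", [("idle", 0.5), ("pad", 0.5)]),
    "block": ("block",   [("idle", 1.0)]),
}

def run(machine, start, steps, rng):
    """Walk the machine for `steps` transitions, collecting the actions it schedules."""
    state, actions = start, []
    for _ in range(steps):
        action, transitions = machine[state]
        if action is not None:
            actions.append(action)          # schedule padding or blocking
        next_states = [s for s, _ in transitions]
        probs = [p for _, p in transitions]
        state = rng.choices(next_states, weights=probs, k=1)[0]
    return actions

# A seeded generator makes the walk reproducible for inspection.
actions = run(MACHINE, "idle", 20, random.Random(0))
print(actions)
```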

attack

robust

Title: Real-Time Helmet Violation Detection Using YOLOv5 and Ensemble Learning. (arXiv:2304.09246v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09246
Code URL: null
Copy Paste: [[2304.09246] Real-Time Helmet Violation Detection Using YOLOv5 and Ensemble Learning](http://arxiv.org/abs/2304.09246) #robust
Summary:
The proper enforcement of motorcycle helmet regulations is crucial for ensuring the safety of motorbike passengers and riders, as roadway cyclists and passengers are not likely to abide by these regulations if no proper enforcement systems are instituted. This paper presents the development and evaluation of a real-time YOLOv5 Deep Learning (DL) model for detecting riders and passengers on motorbikes, identifying whether the detected person is wearing a helmet. We trained the model on 100 videos recorded at 10 fps, each for 20 seconds. Our study demonstrated the applicability of DL models to accurately detect helmet regulation violators even in challenging lighting and weather conditions. We employed several data augmentation techniques in the study to ensure the training data is diverse enough to help build a robust model. The proposed model was tested on 100 test videos and produced an mAP score of 0.5267, ranking 11th on the AI City Track 5 public leaderboard. The use of deep learning techniques for image classification tasks, such as identifying helmet-wearing riders, has enormous potential for improving road safety. The study shows the potential of deep learning models for application in smart cities and enforcing traffic regulations and can be deployed in real-time for city-wide monitoring.

Title: Wavelets Beat Monkeys at Adversarial Robustness. (arXiv:2304.09403v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09403
Code URL: null
Copy Paste: [[2304.09403] Wavelets Beat Monkeys at Adversarial Robustness](http://arxiv.org/abs/2304.09403) #robust
Summary:
Research on improving the robustness of neural networks to adversarial noise - imperceptible malicious perturbations of the data - has received significant attention. The currently uncontested state-of-the-art defense to obtain robust deep neural networks is Adversarial Training (AT), but it consumes significantly more resources compared to standard training and trades off accuracy for robustness. An inspiring recent work [Dapello et al.] aims to bring neurobiological tools to the question: How can we develop Neural Nets that robustly generalize like human vision? [Dapello et al.] design a network structure with a neural hidden first layer that mimics the primate primary visual cortex (V1), followed by a back-end structure adapted from current CNN vision models. It seems to achieve non-trivial adversarial robustness on standard vision benchmarks when tested on small perturbations. Here we revisit this biologically inspired work, and ask whether a principled parameter-free representation with inspiration from physics is able to achieve the same goal. We discover that the wavelet scattering transform can replace the complex V1-cortex and simple uniform Gaussian noise can take the role of neural stochasticity, to achieve adversarial robustness. In extensive experiments on the CIFAR-10 benchmark with adaptive adversarial attacks we show that: 1) Robustness of VOneBlock architectures is relatively weak (though non-zero) when the strength of the adversarial attack radius is set to commonly used benchmarks. 2) Replacing the front-end VOneBlock by an off-the-shelf parameter-free Scatternet followed by simple uniform Gaussian noise can achieve much more substantial adversarial robustness without adversarial training. Our work shows how physically inspired structures yield new insights into robustness that were previously only thought possible by meticulously mimicking the human cortex.

Title: On the Effectiveness of Image Manipulation Detection in the Age of Social Media. (arXiv:2304.09414v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09414
Code URL: null
Copy Paste: [[2304.09414] On the Effectiveness of Image Manipulation Detection in the Age of Social Media](http://arxiv.org/abs/2304.09414) #robust
Summary:
Image manipulation detection algorithms designed to identify local anomalies often rely on the manipulated regions being "sufficiently" different from the rest of the non-tampered regions in the image. However, such anomalies might not be easily identifiable in high-quality manipulations, and their use is often based on the assumption that certain image phenomena are associated with the use of specific editing tools. This makes the task of manipulation detection hard in and of itself, with state-of-the-art detectors only being able to detect a limited number of manipulation types. More importantly, in cases where the anomaly assumption does not hold, the detection of false positives in otherwise non-manipulated images becomes a serious problem.

To understand the current state of manipulation detection, we present an in-depth analysis of deep learning-based and learning-free methods, assessing their performance on different benchmark datasets containing tampered and non-tampered samples. We provide a comprehensive study of their suitability for detecting different manipulations as well as their robustness when presented with non-tampered data. Furthermore, we propose a novel deep learning-based pre-processing technique that accentuates the anomalies present in manipulated regions to make them more identifiable by a variety of manipulation detection methods. To this end, we introduce an anomaly enhancement loss that, when used with a residual architecture, improves the performance of different detection algorithms with a minimal introduction of false positives on the non-manipulated data.

Lastly, we introduce an open-source manipulation detection toolkit comprising a number of standard detection algorithms.

Title: Decoupled Training for Long-Tailed Classification With Stochastic Representations. (arXiv:2304.09426v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09426
Code URL: null
Copy Paste: [[2304.09426] Decoupled Training for Long-Tailed Classification With Stochastic Representations](http://arxiv.org/abs/2304.09426) #robust
Summary:
Decoupling representation learning and classifier learning has been shown to be effective in classification with long-tailed data. There are two main ingredients in constructing a decoupled learning scheme: 1) how to train the feature extractor for representation learning so that it provides generalizable representations and 2) how to re-train the classifier that constructs proper decision boundaries by handling class imbalances in long-tailed data. In this work, we first apply Stochastic Weight Averaging (SWA), an optimization technique for improving the generalization of deep neural networks, to obtain better generalizing feature extractors for long-tailed classification. We then propose a novel classifier re-training algorithm based on stochastic representation obtained from the SWA-Gaussian, a Gaussian-perturbed SWA, and a self-distillation strategy that can harness the diverse stochastic representations based on uncertainty estimates to build more robust classifiers. Extensive experiments on CIFAR10/100-LT, ImageNet-LT, and iNaturalist-2018 benchmarks show that our proposed method improves upon previous methods both in terms of prediction accuracy and uncertainty estimation.
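
The averaging step at the core of SWA can be sketched as a running mean over weight checkpoints. This Python fragment shows only that step, under the assumption that weights are flat lists of floats; it does not cover SWA-Gaussian or the self-distillation strategy from the paper, and `swa_update` and the checkpoint values are illustrative.

```python
def swa_update(swa_weights, new_weights, n_averaged):
    """Fold one more checkpoint into the running average of n_averaged checkpoints."""
    return [
        (avg * n_averaged + w) / (n_averaged + 1)
        for avg, w in zip(swa_weights, new_weights)
    ]

# Hypothetical checkpoints collected along a training trajectory.
checkpoints = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]

swa = checkpoints[0]
for n, weights in enumerate(checkpoints[1:], start=1):
    swa = swa_update(swa, weights, n)

print(swa)  # [2.0, 2.0], the element-wise mean of the three checkpoints
```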

Title: Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection. (arXiv:2304.09446v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09446
Code URL: https://github.com/woodwindhu/dts
Copy Paste: [[2304.09446] Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection](http://arxiv.org/abs/2304.09446) #robust
Summary:
3D object detection from point clouds is crucial in safety-critical autonomous driving. Although many works have made great efforts and achieved significant progress on this task, most of them suffer from expensive annotation cost and poor transferability to unknown data due to the domain gap. Recently, a few works have attempted to tackle the domain gap in objects, but still fail to adapt to the gap of varying beam-densities between two domains, which is critical to mitigate the characteristic differences of the LiDAR collectors. To this end, we propose a density-insensitive domain adaption framework to address the density-induced domain gap. In particular, we first introduce Random Beam Re-Sampling (RBRS) to enhance the robustness of 3D detectors trained on the source domain to the varying beam-density. Then, we take this pre-trained detector as the backbone model, and feed the unlabeled target domain data into our newly designed task-specific teacher-student framework for predicting its high-quality pseudo labels. To further adapt the property of density-insensitivity into the target domain, we feed the teacher and student branches with the same sample of different densities, and propose an Object Graph Alignment (OGA) module to construct two object-graphs between the two branches for enforcing the consistency in both the attribute and relation of cross-density objects. Experimental results on three widely adopted 3D object detection datasets demonstrate that our proposed domain adaption method outperforms the state-of-the-art methods, especially over varying-density data. Code is available at https://github.com/WoodwindHu/DTS.

Title: Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and Spatio-Temporal Consistency ID Re-Assignment. (arXiv:2304.09471v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09471
Code URL: https://github.com/ipl-uw/aic23_track1_uwipl_etri
Copy Paste: [[2304.09471] Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and Spatio-Temporal Consistency ID Re-Assignment](http://arxiv.org/abs/2304.09471) #robust
Summary:
Multi-camera multiple people tracking has become an increasingly important area of research due to the growing demand for accurate and efficient indoor people tracking systems, particularly in settings such as retail, healthcare centers, and transit hubs. We propose a novel multi-camera multiple people tracking method that uses anchor-guided clustering for cross-camera re-identification and spatio-temporal consistency for geometry-based cross-camera ID reassigning. Our approach aims to improve the accuracy of tracking by identifying key features that are unique to every individual and utilizing the overlap of views between cameras to predict accurate trajectories without needing the actual camera parameters. The method has demonstrated robustness and effectiveness in handling both synthetic and real-world data. The proposed method is evaluated on the CVPR AI City Challenge 2023 dataset, achieving IDF1 of 95.36% with the first-place ranking in the challenge. The code is available at: https://github.com/ipl-uw/AIC23_Track1_UWIPL_ETRI.

Title: Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification. (arXiv:2304.09498v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09498
Code URL: https://github.com/jeremyxsc/mmet
Copy Paste: [[2304.09498] Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification](http://arxiv.org/abs/2304.09498) #robust
Summary:
Generalizable person re-identification (Re-ID) is a very hot research topic in machine learning and computer vision, which plays a significant role in realistic scenarios due to its various applications in public security and video surveillance. However, previous methods mainly focus on visual representation learning while neglecting to explore the potential of semantic features during training, which easily leads to poor generalization capability when adapted to a new domain. In this paper, we propose a Multi-Modal Equivalent Transformer called MMET for more robust visual-semantic embedding learning on visual, textual and visual-textual tasks respectively. To further enhance the robust feature learning in the context of transformer, a dynamic masking mechanism called Masked Multimodal Modeling strategy (MMM) is introduced to mask both the image patches and the text tokens, which can jointly work on multimodal or unimodal data and significantly boost the performance of generalizable person Re-ID. Extensive experiments on benchmark datasets demonstrate the competitive performance of our method over previous approaches. We hope this method could advance the research towards visual-semantic representation learning. Our source code is also publicly available at https://github.com/JeremyXSC/MMET.

Title: Realistic Data Enrichment for Robust Image Segmentation in Histopathology. (arXiv:2304.09534v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09534
Code URL: null
Copy Paste: [[2304.09534] Realistic Data Enrichment for Robust Image Segmentation in Histopathology](http://arxiv.org/abs/2304.09534) #robust
Summary:
Poor performance of quantitative analysis in histopathological Whole Slide Images (WSI) has been a significant obstacle in clinical practice. Annotating large-scale WSIs manually is a demanding and time-consuming task, unlikely to yield the expected results when used for fully supervised learning systems. Rarely observed disease patterns and large differences in object scales are difficult to model through conventional patient intake. Prior methods either fall back to direct disease classification, which only requires learning a few factors per image, or report on average image segmentation performance, which is highly biased towards majority observations. Geometric image augmentation is commonly used to improve robustness for average case predictions and to enrich limited datasets. So far no method provided sampling of a realistic posterior distribution to improve stability, e.g. for the segmentation of imbalanced objects within images. Therefore, we propose a new approach, based on diffusion models, which can enrich an imbalanced dataset with plausible examples from underrepresented groups by conditioning on segmentation maps. Our method can simply expand limited clinical datasets making them suitable to train machine learning pipelines, and provides an interpretable and human-controllable way of generating histopathology images that are indistinguishable from real ones to human experts. We validate our findings on two datasets, one from the public domain and one from a Kidney Transplant study.

Title: CrossFusion: Interleaving Cross-modal Complementation for Noise-resistant 3D Object Detection. (arXiv:2304.09694v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09694
Code URL: null
Copy Paste: [[2304.09694] CrossFusion: Interleaving Cross-modal Complementation for Noise-resistant 3D Object Detection](http://arxiv.org/abs/2304.09694) #robust
Summary:
The combination of LiDAR and camera modalities is proven to be necessary and typical for 3D object detection according to recent studies. Existing fusion strategies tend to rely too heavily on the LiDAR modality, which insufficiently exploits the abundant semantics from the camera sensor. However, existing methods cannot rely on information from other modalities because the corruption of LiDAR features results in a large domain gap. Following this, we propose CrossFusion, a more robust and noise-resistant scheme that makes full use of the camera and LiDAR features with the designed cross-modal complementation strategy. Our extensive experiments show that our method not only outperforms the state-of-the-art methods under the setting without introducing an extra depth estimation network but also demonstrates our model's noise resistance without re-training for the specific malfunction scenarios by increasing 5.2% mAP and 2.4% NDS.

Title: Skeleton-based action analysis for ADHD diagnosis. (arXiv:2304.09751v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09751
Code URL: null
Copy Paste: [[2304.09751] Skeleton-based action analysis for ADHD diagnosis](http://arxiv.org/abs/2304.09751) #robust
Summary:
Attention Deficit Hyperactivity Disorder (ADHD) is a common neurobehavioral disorder worldwide. While extensive research has focused on machine learning methods for ADHD diagnosis, most research relies on high-cost equipment, e.g., MRI machine and EEG patch. Therefore, low-cost diagnostic methods based on the action characteristics of ADHD are desired. Skeleton-based action recognition has gained attention due to the action-focused nature and robustness. In this work, we propose a novel ADHD diagnosis system with a skeleton-based action recognition framework, utilizing a real multi-modal ADHD dataset and state-of-the-art detection algorithms. Compared to conventional methods, the proposed method shows cost-efficiency and significant performance improvement, making it more accessible for a broad range of initial ADHD diagnoses. Through the experiment results, the proposed method outperforms the conventional methods in accuracy and AUC. Meanwhile, our method is widely applicable for mass screening.

Title: Attributing Image Generative Models using Latent Fingerprints. (arXiv:2304.09752v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09752
Code URL: https://github.com/guangyunie/watermarking-through-style-space-edition
Copy Paste: [[2304.09752] Attributing Image Generative Models using Latent Fingerprints](http://arxiv.org/abs/2304.09752) #robust
Summary:
Generative models have enabled the creation of content that is indistinguishable from that taken from nature. Open-source development of such models has raised concerns about the risks of their misuse for malicious purposes. One potential risk mitigation strategy is to attribute generative models via fingerprinting. Current fingerprinting methods exhibit a significant tradeoff between robust attribution accuracy and generation quality, and also lack design principles to improve this tradeoff. This paper investigates the use of latent semantic dimensions as fingerprints, from which we can analyze the effects of design variables, including the choice of fingerprinting dimensions, strength, and capacity, on the accuracy-quality tradeoff. Compared with previous SOTA, our method requires minimum computation and is more applicable to large-scale models. We use StyleGAN2 and the latent diffusion model to demonstrate the efficacy of our method.

Title: MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation. (arXiv:2304.09801v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09801
Code URL: null
Copy Paste: [[2304.09801] MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation](http://arxiv.org/abs/2304.09801) #robust
Summary:
Perception systems in modern autonomous driving vehicles typically take inputs from complementary multi-modal sensors, e.g., LiDAR and cameras. However, in real-world applications, sensor corruptions and failures lead to inferior performances, thus compromising autonomous safety. In this paper, we propose a robust framework, called MetaBEV, to address extreme real-world environments involving overall six sensor corruptions and two extreme sensor-missing situations. In MetaBEV, signals from multiple sensors are first processed by modal-specific encoders. Subsequently, a set of dense BEV queries are initialized, termed meta-BEV. These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities. The updated BEV representations are further leveraged for multiple 3D prediction tasks. Additionally, we introduce a new M2oE structure to alleviate the performance drop on distinct tasks in multi-task joint learning. Finally, MetaBEV is evaluated on the nuScenes dataset with 3D object detection and BEV map segmentation tasks. Experiments show MetaBEV outperforms prior arts by a large margin on both full and corrupted modalities. For instance, when the LiDAR signal is missing, MetaBEV improves 35.5% detection NDS and 17.7% segmentation mIoU upon the vanilla BEVFusion model; and when the camera signal is absent, MetaBEV still achieves 69.2% NDS and 53.7% mIoU, which is even higher than previous works that perform on full-modalities. Moreover, MetaBEV performs fairly against previous methods in both canonical perception and multi-task learning settings, refreshing state-of-the-art nuScenes BEV map segmentation with 70.4% mIoU.

Title: Transformer-Based Visual Segmentation: A Survey. (arXiv:2304.09854v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09854
Code URL: https://github.com/lxtgh/awesome-segmenation-with-transformer
Copy Paste: [[2304.09854] Transformer-Based Visual Segmentation: A Survey](http://arxiv.org/abs/2304.09854) #robust
Summary:
Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several closely related settings, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research. The project page can be found at https://github.com/lxtGH/Awesome-Segmenation-With-Transformer. We will also continually monitor developments in this rapidly evolving field.

Title: Token Imbalance Adaptation for Radiology Report Generation. (arXiv:2304.09185v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.09185
Code URL: null
Copy Paste: [[2304.09185] Token Imbalance Adaptation for Radiology Report Generation](http://arxiv.org/abs/2304.09185) #robust
Summary:
Imbalanced token distributions naturally exist in text documents, leading neural language models to overfit on frequent tokens. The token imbalance may dampen the robustness of radiology report generators, as complex medical terms appear less frequently but reflect more medical information. In this study, we demonstrate how current state-of-the-art models fail to generate infrequent tokens on two standard benchmark datasets (IU X-RAY and MIMIC-CXR) of radiology report generation. To solve the challenge, we propose the Token Imbalance Adapter (TIMER), aiming to improve generation robustness on infrequent tokens. The model automatically leverages token imbalance by an unlikelihood loss and dynamically optimizes generation processes to augment infrequent tokens. We compare our approach with multiple state-of-the-art methods on the two benchmarks. Experiments demonstrate the effectiveness of our approach in enhancing model robustness overall and on infrequent tokens. Our ablation analysis shows that our reinforcement learning method has a major effect in adapting token imbalance for radiology report generation.

Title: On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training. (arXiv:2304.09563v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.09563
Code URL: null
Copy Paste: [[2304.09563] On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training](http://arxiv.org/abs/2304.09563) #robust
Summary:
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews, and has been a fundamental real-world application. Since the early 2010s, ABSA has achieved extraordinarily high accuracy with various deep neural models. However, existing ABSA models with strong in-house performance may fail to generalize to some challenging cases where the contexts are variable, i.e., they show low robustness to real-world environments. In this study, we propose to enhance ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training. First, we strengthen the current most robust syntax-aware models by simultaneously incorporating the rich external syntactic dependencies and the labels with aspect through a universal-syntax graph convolutional network. From the corpus perspective, we propose to automatically induce high-quality synthetic training data of various types, allowing models to learn sufficient inductive bias for better robustness. Last, we perform adversarial training based on the rich pseudo data to enhance the resistance to context perturbation, and meanwhile employ contrastive learning to reinforce the representations of instances with contrastive sentiments. Extensive robustness evaluations are conducted. The results demonstrate that our enhanced syntax-aware model achieves better robustness than all the state-of-the-art baselines. By additionally incorporating our synthetic corpus, the robust testing results are improved by around 10% in accuracy, and are then further improved by applying the advanced training strategies. In-depth analyses are presented to reveal the factors influencing ABSA robustness.

Title: Early Detection of Parkinson's Disease using Motor Symptoms and Machine Learning. (arXiv:2304.09245v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09245
Code URL: null
Copy Paste: [[2304.09245] Early Detection of Parkinson's Disease using Motor Symptoms and Machine Learning](http://arxiv.org/abs/2304.09245) #robust
Summary:
Parkinson's disease (PD) has been found to affect 1 out of every 1000 people, and is more prevalent in the population above 60 years. Leveraging wearable systems to find accurate biomarkers for diagnosis has become the need of the hour, especially for a neurodegenerative condition like Parkinson's. This work focuses on early-occurring, common symptoms, such as motor and gait related parameters, to arrive at a quantitative analysis of the feasibility of an economical and robust wearable device. A subset of the Parkinson's Progression Markers Initiative (PPMI), the PPMI Gait dataset, has been utilised for feature selection after a thorough analysis with various Machine Learning algorithms. The identified influential features have then been used to test real-time data for early detection of Parkinson Syndrome, with a model accuracy of 91.9%.

Title: Amplifying Sine Unit: An Oscillatory Activation Function for Deep Neural Networks to Recover Nonlinear Oscillations Efficiently. (arXiv:2304.09759v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09759
Code URL: null
Copy Paste: [[2304.09759] Amplifying Sine Unit: An Oscillatory Activation Function for Deep Neural Networks to Recover Nonlinear Oscillations Efficiently](http://arxiv.org/abs/2304.09759) #robust
Summary:
Many industrial and real-life problems exhibit highly nonlinear periodic behaviors, and conventional methods may fall short of finding their analytical or closed-form solutions. Such problems demand cutting-edge computational tools with increased functionality and reduced cost. Recently, deep neural networks have gained massive research interest due to their ability to handle large data and their universality in learning complex functions. In this work, we put forward a methodology based on deep neural networks with a responsive layers structure to deal with nonlinear oscillations in microelectromechanical systems. We incorporated several oscillatory and non-oscillatory activation functions, such as the growing cosine unit (GCU), Sine, Mish and Tanh, in our designed network to obtain a comprehensive analysis of their performance on highly nonlinear and vibrational problems. Integrating oscillatory activation functions with deep neural networks yields better performance in predicting the periodic patterns of the underlying systems. To support oscillatory actuation for nonlinear systems, we propose a novel oscillatory activation function called the Amplifying Sine Unit (ASU), which is more efficient than GCU for complex vibratory systems such as microelectromechanical systems. Experimental results show that the designed network with our proposed activation function ASU is more reliable and robust in handling the challenges posed by nonlinearity and oscillations. To validate the proposed methodology, the outputs of our networks are compared with the results from the Livermore solver for ordinary differential equations, called LSODA. Further, graphical illustrations of the incurred errors are also presented in the work.

Title: Towards transparent and robust data-driven wind turbine power curve models. (arXiv:2304.09835v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09835
Code URL: null
Copy Paste: [[2304.09835] Towards transparent and robust data-driven wind turbine power curve models](http://arxiv.org/abs/2304.09835) #robust
Summary:
Wind turbine power curve models translate ambient conditions into turbine power output. They are essential for energy yield prediction and turbine performance monitoring. In recent years, data-driven machine learning methods have outperformed parametric, physics-informed approaches. However, they are often criticised for being opaque "black boxes", which raises concerns regarding their robustness in non-stationary environments, such as those faced by wind turbines. We therefore introduce an explainable artificial intelligence (XAI) framework to investigate and validate strategies learned by data-driven power curve models from operational SCADA data. It combines domain-specific considerations with Shapley Values and the latest findings from XAI for regression. Our results suggest that learned strategies can be better indicators for model robustness than validation or test set errors. Moreover, we observe that highly complex, state-of-the-art ML models are prone to learn physically implausible strategies. Consequently, we compare several measures to ensure physically reasonable model behaviour. Lastly, we propose the utilization of XAI in the context of wind turbine performance monitoring, by disentangling environmental and technical effects that cause deviations from an expected turbine output. We hope our work can guide domain experts towards training and selecting more transparent and robust data-driven wind turbine power curve models.

biometric

steal

extraction

Title: SigSegment: A Signal-Based Segmentation Algorithm for Identifying Anomalous Driving Behaviours in Naturalistic Driving Videos. (arXiv:2304.09247v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09247
Code URL: null
Copy Paste: [[2304.09247] SigSegment: A Signal-Based Segmentation Algorithm for Identifying Anomalous Driving Behaviours in Naturalistic Driving Videos](http://arxiv.org/abs/2304.09247) #extraction
Summary:
In recent years, distracted driving has garnered considerable attention as it continues to pose a significant threat to public safety on the roads. This has increased the need for innovative solutions that can identify and eliminate distracted driving behavior before it results in fatal accidents. In this paper, we propose a Signal-Based anomaly detection algorithm that segments videos into anomalies and non-anomalies using a deep CNN-LSTM classifier to precisely estimate the start and end times of an anomalous driving event. In the phase of anomaly detection and analysis, driver pose background estimation, mask extraction, and signal activity spikes are utilized. A Deep CNN-LSTM classifier was applied to candidate anomalies to detect and classify final anomalies. The proposed method achieved an overlap score of 0.5424 and ranked 9th on the public leader board in the AI City Challenge 2023, according to experimental validation results.

Title: Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes. (arXiv:2304.09433v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.09433
Code URL: https://github.com/hazyresearch/evaporate
Copy Paste: [[2304.09433] Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes](http://arxiv.org/abs/2304.09433) #extraction
Summary:
A long standing goal of the data management community is to develop general, automated systems that ingest semi-structured documents and output queryable tables without human effort or domain specific customization. Given the sheer variety of potential documents, state-of-the art systems make simplifying assumptions and use domain specific training. In this work, we ask whether we can maintain generality by using large language models (LLMs). LLMs, which are pretrained on broad data, can perform diverse downstream tasks simply conditioned on natural language task descriptions.

We propose and evaluate EVAPORATE, a simple, prototype system powered by LLMs. We identify two fundamentally different strategies for implementing this system: prompt the LLM to directly extract values from documents or prompt the LLM to synthesize code that performs the extraction. Our evaluations show a cost-quality tradeoff between these two approaches. Code synthesis is cheap, but far less accurate than directly processing each document with the LLM. To improve quality while maintaining low cost, we propose an extended code synthesis implementation, EVAPORATE-CODE+, which achieves better quality than direct extraction. Our key insight is to generate many candidate functions and ensemble their extractions using weak supervision. EVAPORATE-CODE+ not only outperforms the state-of-the art systems, but does so using a sublinear pass over the documents with the LLM. This equates to a 110x reduction in the number of tokens the LLM needs to process, averaged across 16 real-world evaluation settings of 10k documents each.
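The idea of generating many candidate extraction functions and combining their outputs can be illustrated with a small Python sketch. This is not the EVAPORATE-CODE+ implementation: the three extractor functions, the sample document format, and the `ensemble_extract` helper are all hypothetical, and a plain majority vote stands in for the weak-supervision aggregation the abstract describes.

```python
import re
from collections import Counter

# Hypothetical candidate extractors, standing in for LLM-synthesized functions.
def extract_by_label(doc):
    m = re.search(r"Price:\s*\$?(\d+)", doc)
    return m.group(1) if m else None

def extract_first_dollar(doc):
    m = re.search(r"\$(\d+)", doc)
    return m.group(1) if m else None

def extract_last_number(doc):
    nums = re.findall(r"\d+", doc)
    return nums[-1] if nums else None  # a deliberately unreliable heuristic

def ensemble_extract(doc, candidates):
    """Run every candidate and return the most common non-None value."""
    votes = Counter()
    for fn in candidates:
        try:
            value = fn(doc)
        except Exception:
            continue  # a synthesized function may crash; treat that as abstaining
        if value is not None:
            votes[value] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]

CANDIDATES = [extract_by_label, extract_first_dollar, extract_last_number]

if __name__ == "__main__":
    print(ensemble_extract("Item: lamp. Price: $25. Stock: 3", CANDIDATES))
```

In this toy setting the unreliable third extractor is outvoted by the other two. A weak-supervision approach would instead estimate how reliable each function is and weight its votes accordingly.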

membership infer

federate

Title: Federated Alternate Training (FAT): Leveraging Unannotated Data Silos in Federated Segmentation for Medical Imaging. (arXiv:2304.09327v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09327
Code URL: null
Copy Paste: [[2304.09327] Federated Alternate Training (FAT): Leveraging Unannotated Data Silos in Federated Segmentation for Medical Imaging](http://arxiv.org/abs/2304.09327) #federate
Summary:
Federated Learning (FL) aims to train a machine learning (ML) model in a distributed fashion to strengthen data privacy with limited data migration costs. It is a distributed learning framework naturally suitable for privacy-sensitive medical imaging datasets. However, most current FL-based medical imaging works assume silos have ground truth labels for training. In practice, label acquisition in the medical field is challenging as it often requires extensive labor and time costs. To address this challenge and leverage the unannotated data silos to improve modeling, we propose an alternate training-based framework, Federated Alternate Training (FAT), that alternates training between annotated data silos and unannotated data silos. Annotated data silos exploit annotations to learn a reasonable global segmentation model. Meanwhile, unannotated data silos use the global segmentation model as a target model to generate pseudo labels for self-supervised learning. We evaluate the performance of the proposed framework on two naturally partitioned Federated datasets, KiTS19 and FeTS2021, and show its promising performance.

Title: Practical Differentially Private and Byzantine-resilient Federated Learning. (arXiv:2304.09762v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09762
Code URL: null
Copy Paste: [[2304.09762] Practical Differentially Private and Byzantine-resilient Federated Learning](http://arxiv.org/abs/2304.09762) #federate
Summary:
Privacy and Byzantine resilience are two indispensable requirements for a federated learning (FL) system. Although there have been extensive studies on privacy and Byzantine security in their own track, solutions that consider both remain sparse. This is due to difficulties in reconciling privacy-preserving and Byzantine-resilient algorithms.

In this work, we propose a solution to such a two-fold issue. We use our version of differentially private stochastic gradient descent (DP-SGD) algorithm to preserve privacy and then apply our Byzantine-resilient algorithms. We note that while existing works follow this general approach, an in-depth analysis on the interplay between DP and Byzantine resilience has been ignored, leading to unsatisfactory performance. Specifically, for the random noise introduced by DP, previous works strive to reduce its impact on the Byzantine aggregation. In contrast, we leverage the random noise to construct an aggregation that effectively rejects many existing Byzantine attacks.

We provide both theoretical proof and empirical experiments to show our protocol is effective: it retains high accuracy while preserving the DP guarantee and Byzantine resilience. Compared with the previous work, our protocol 1) achieves significantly higher accuracy even in a high privacy regime; 2) works well even when up to 90% of distributed workers are Byzantine.
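For readers unfamiliar with DP-SGD, the following Python sketch shows the textbook form of one noisy gradient step: clip each per-example gradient, sum, add Gaussian noise, and average. It is an illustration only. The abstract refers to the authors' own version of DP-SGD and to their Byzantine-resilient aggregation, neither of which is reproduced here, and the names `clip`, `dp_sgd_update`, `max_norm`, and `noise_multiplier` are assumptions made for this example.

```python
import math
import random

def clip(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_update(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    # Noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]

if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dp_sgd_update(grads, max_norm=1.0, noise_multiplier=1.1,
                        rng=random.Random(0)))
```

Clipping bounds how much any single example can move the update, which is what lets the added noise give a privacy guarantee. A worker's noisy update would then be sent to the server for aggregation.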

Title: Learning to Transmit with Provable Guarantees in Wireless Federated Learning. (arXiv:2304.09329v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09329
Code URL: null
Copy Paste: [[2304.09329] Learning to Transmit with Provable Guarantees in Wireless Federated Learning](http://arxiv.org/abs/2304.09329) #federate
Summary:
We propose a novel data-driven approach to allocate transmit power for federated learning (FL) over interference-limited wireless networks. The proposed method is useful in challenging scenarios where the wireless channel is changing during the FL training process and when the training data are not independent and identically distributed (non-i.i.d.) on the local devices. Intuitively, the power policy is designed to optimize the information received at the server end during the FL process under communication constraints. Ultimately, our goal is to improve the accuracy and efficiency of the global FL model being trained. The proposed power allocation policy is parameterized using a graph convolutional network and the associated constrained optimization problem is solved through a primal-dual (PD) algorithm. Theoretically, we show that the formulated problem has zero duality gap and, once the power policy is parameterized, optimality depends on how expressive this parameterization is. Numerically, we demonstrate that the proposed method outperforms existing baselines under different wireless channel settings and varying degrees of data heterogeneity.

fair

Title: Generative models improve fairness of medical classifiers under distribution shifts. (arXiv:2304.09218v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09218
Code URL: null
Copy Paste: [[2304.09218] Generative models improve fairness of medical classifiers under distribution shifts](http://arxiv.org/abs/2304.09218) #fair
Summary:
A ubiquitous challenge in machine learning is the problem of domain generalisation. This can exacerbate bias against groups or labels that are underrepresented in the datasets used for model development. Model bias can lead to unintended harms, especially in safety-critical applications like healthcare. Furthermore, the challenge is compounded by the difficulty of obtaining labelled data due to high cost or lack of readily available domain expertise. In our work, we show that learning realistic augmentations automatically from data is possible in a label-efficient manner using generative models. In particular, we leverage the higher abundance of unlabelled data to capture the underlying data distribution of different conditions and subgroups for an imaging modality. By conditioning generative models on appropriate labels, we can steer the distribution of synthetic examples according to specific requirements. We demonstrate that these learned augmentations can surpass heuristic ones by making models more robust and statistically fair in- and out-of-distribution. To evaluate the generality of our approach, we study 3 distinct medical imaging contexts of varying difficulty: (i) histopathology images from a publicly available generalisation benchmark, (ii) chest X-rays from publicly available clinical datasets, and (iii) dermatology images characterised by complex shifts and imaging conditions. Complementing real training samples with synthetic ones improves the robustness of models in all three medical tasks and increases fairness by improving the accuracy of diagnosis within underrepresented groups. This approach leads to stark improvements OOD across modalities: 7.7% prediction accuracy improvement in histopathology, 5.2% in chest radiology with 44.6% lower fairness gap and a striking 63.5% improvement in high-risk sensitivity for dermatology with a 7.5x reduction in fairness gap.

Title: A Real Balanced Dataset For Understanding Bias? Factors That Impact Accuracy, Not Numbers of Identities and Images. (arXiv:2304.09818v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09818
Code URL: null
Copy Paste: [[2304.09818] A Real Balanced Dataset For Understanding Bias? Factors That Impact Accuracy, Not Numbers of Identities and Images](http://arxiv.org/abs/2304.09818) #fair
Summary:
The issue of disparities in face recognition accuracy across demographic groups has attracted increasing attention in recent years. Various face image datasets have been proposed as 'fair' or 'balanced' to assess the accuracy of face recognition algorithms across demographics. While these datasets often balance the number of identities and images across demographic groups, it is important to note that the number of identities and images in an evaluation dataset are not the driving factors for 1-to-1 face matching accuracy. Moreover, balancing the number of identities and images does not ensure balance in other factors known to impact accuracy, such as head pose, brightness, and image quality. We demonstrate these issues using several recently proposed datasets. To enhance the capacity for less biased evaluations, we propose a bias-aware toolkit that facilitates the creation of cross-demographic evaluation datasets balanced on factors mentioned in this paper.

Title: Long-Term Fairness with Unknown Dynamics. (arXiv:2304.09362v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09362
Code URL: null
Copy Paste: [[2304.09362] Long-Term Fairness with Unknown Dynamics](http://arxiv.org/abs/2304.09362) #fair
Summary:
While machine learning can myopically reinforce social inequalities, it may also be used to dynamically seek equitable outcomes. In this paper, we formalize long-term fairness in the context of online reinforcement learning. This formulation can accommodate dynamical control objectives, such as driving equity inherent in the state of a population, that cannot be incorporated into static formulations of fairness. We demonstrate that this framing allows an algorithm to adapt to unknown dynamics by sacrificing short-term incentives to drive a classifier-population system towards more desirable equilibria. For the proposed setting, we develop an algorithm that adapts recent work in online learning. We prove that this algorithm achieves simultaneous probabilistic bounds on cumulative loss and cumulative violations of fairness (as statistical regularities between demographic groups). We compare our proposed algorithm to the repeated retraining of myopic classifiers, as a baseline, and to a deep reinforcement learning algorithm that lacks safety guarantees. Our experiments model human populations according to evolutionary game theory and integrate real-world datasets.

Title: Loss minimization yields multicalibration for large neural networks. (arXiv:2304.09424v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09424
Code URL: null
Copy Paste: [[2304.09424] Loss minimization yields multicalibration for large neural networks](http://arxiv.org/abs/2304.09424) #fair
Summary:
Multicalibration is a notion of fairness that aims to provide accurate predictions across a large set of groups. Multicalibration is known to be a different goal than loss minimization, even for simple predictors such as linear functions. In this note, we show that for (almost all) large neural network sizes, optimally minimizing squared error leads to multicalibration. Our results are about representational aspects of neural networks, and not about algorithmic or sample complexity considerations. Previous such results were known only for predictors that were nearly Bayes-optimal and were therefore representation independent. We emphasize that our results do not apply to specific algorithms for optimizing neural networks, such as SGD, and they should not be interpreted as "fairness comes for free from optimizing neural networks".
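
A minimal sketch of what "accurate predictions across a large set of groups" can mean in practice, assuming binned predictions and an unweighted worst-case gap. The function name `multicalibration_violation`, the binning scheme, and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def multicalibration_violation(preds, labels, groups, n_bins=10):
    """Largest absolute calibration gap over all (group, prediction-bin) cells.

    preds:  predicted probabilities in [0, 1]
    labels: observed 0/1 outcomes
    groups: dict mapping a group name to a list of booleans (membership mask);
            groups are allowed to overlap
    Note: this unweighted maximum is one simple variant (an assumption here);
    formal definitions often weight or restrict cells by their mass.
    """
    worst = 0.0
    for mask in groups.values():
        gap_sums = defaultdict(float)
        counts = defaultdict(int)
        for p, y, member in zip(preds, labels, mask):
            if not member:
                continue
            b = min(int(p * n_bins), n_bins - 1)  # index of the prediction bin
            gap_sums[b] += y - p
            counts[b] += 1
        for b in counts:
            worst = max(worst, abs(gap_sums[b] / counts[b]))
    return worst


# Toy data: calibrated on the whole population, but not on subgroup "A".
preds = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
everyone = [True] * 8
group_a = [True, True, False, False, True, True, False, False]

overall_gap = multicalibration_violation(preds, labels, {"all": everyone})
subgroup_gap = multicalibration_violation(
    preds, labels, {"all": everyone, "A": group_a}
)
```

In this toy data the predictor is calibrated overall but miscalibrated on subgroup A, which is the kind of gap multicalibration is meant to rule out.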

Title: Equalised Odds is not Equal Individual Odds: Post-processing for Group and Individual Fairness. (arXiv:2304.09779v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09779
Code URL: null
Copy Paste: [[2304.09779] Equalised Odds is not Equal Individual Odds: Post-processing for Group and Individual Fairness](http://arxiv.org/abs/2304.09779) #fair
Summary:
Group fairness is achieved by equalising prediction distributions between protected sub-populations; individual fairness requires treating similar individuals alike. These two objectives, however, are incompatible when a scoring model is calibrated through discontinuous probability functions, where individuals can be randomly assigned an outcome determined by a fixed probability. This procedure may provide two similar individuals from the same protected group with classification odds that are disparately different -- a clear violation of individual fairness. Assigning unique odds to each protected sub-population may also prevent members of one sub-population from ever receiving equal chances of a positive outcome to another, which we argue is another type of unfairness called individual odds. We reconcile all this by constructing continuous probability functions between group thresholds that are constrained by their Lipschitz constant. Our solution preserves the model's predictive power, individual fairness and robustness while ensuring group fairness.
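
To illustrate the idea of a continuous, Lipschitz-bounded probability function between two thresholds, here is a minimal sketch using a linear ramp. The ramp shape, the function name `ramp_probability`, and the threshold values are assumptions chosen for illustration; the paper's actual construction may differ.

```python
def ramp_probability(score, lo, hi):
    """Probability of a positive outcome as a continuous function of the score.

    Returns 0 at or below `lo`, 1 at or above `hi`, and interpolates linearly
    in between, so the function is Lipschitz with constant 1 / (hi - lo).
    This linear ramp is an illustrative choice, not the paper's method.
    """
    if hi <= lo:
        raise ValueError("hi must be strictly greater than lo")
    if score <= lo:
        return 0.0
    if score >= hi:
        return 1.0
    return (score - lo) / (hi - lo)


# Hypothetical thresholds: two similar scores receive similar odds.
lo, hi = 0.25, 0.75
lipschitz = 1.0 / (hi - lo)
p_a = ramp_probability(0.50, lo, hi)
p_b = ramp_probability(0.52, lo, hi)
```

Unlike assigning one fixed probability to everyone between the thresholds, the difference `abs(p_a - p_b)` is bounded by `lipschitz * abs(0.50 - 0.52)`, so nearby scores cannot receive sharply different odds.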

interpretability

Title: Disentangling Neuron Representations with Concept Vectors. (arXiv:2304.09707v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09707
Code URL: https://github.com/lomahony/sw-interpretability
Copy Paste: [[2304.09707] Disentangling Neuron Representations with Concept Vectors](http://arxiv.org/abs/2304.09707) #interpretability
Summary:
Mechanistic interpretability aims to understand how models store representations by breaking down neural networks into interpretable units. However, the occurrence of polysemantic neurons, or neurons that respond to multiple unrelated features, makes interpreting individual neurons challenging. This has led to the search for meaningful vectors, known as concept vectors, in activation space instead of individual neurons. The main contribution of this paper is a method to disentangle polysemantic neurons into concept vectors encapsulating distinct features. Our method can search for fine-grained concepts according to the user's desired level of concept separation. The analysis shows that polysemantic neurons can be disentangled into directions consisting of linear combinations of neurons. Our evaluations show that the concept vectors found encode coherent, human-understandable features.
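
A minimal sketch of reading out a concept as a direction in activation space, i.e. a linear combination of neurons. The helper names and example vectors are hypothetical, and the sketch only shows the projection step; it does not reproduce how the paper finds its concept vectors.

```python
import math


def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("concept vector must be non-zero")
    return [x / norm for x in vector]


def concept_activation(activations, concept_vector):
    """Project a layer's activation vector onto a concept direction.

    The result is a weighted sum of neuron activations, with the weights
    given by the normalized concept vector.
    """
    direction = unit(concept_vector)
    return sum(a * d for a, d in zip(activations, direction))


# Neuron 0 contributes to both hypothetical concepts (it is polysemantic),
# yet the two directions separate the inputs.
concept_1 = [1.0, 1.0, 0.0]
concept_2 = [1.0, 0.0, 1.0]
activations = [2.0, 2.0, 0.0]

score_1 = concept_activation(activations, concept_1)
score_2 = concept_activation(activations, concept_2)
```

Here `score_1` is larger than `score_2` even though neuron 0 is active for both concepts, which is why directions can be easier to interpret than single neurons.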

Title: Emotion fusion for mental illness detection from social media: A survey. (arXiv:2304.09493v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2304.09493
Code URL: null
Copy Paste: [[2304.09493] Emotion fusion for mental illness detection from social media: A survey](http://arxiv.org/abs/2304.09493) #interpretability
Summary:
Mental illnesses are one of the most prevalent public health problems worldwide, which negatively influence people's lives and society's health. With the increasing popularity of social media, there has been a growing research interest in the early detection of mental illness by analysing user-generated posts on social media. According to the correlation between emotions and mental illness, leveraging and fusing emotion information has developed into a valuable research topic. In this article, we provide a comprehensive survey of approaches to mental illness detection in social media that incorporate emotion fusion. We begin by reviewing different fusion strategies, along with their advantages and disadvantages. Subsequently, we discuss the major challenges faced by researchers working in this area, including issues surrounding the availability and quality of datasets, the performance of algorithms and interpretability. We additionally suggest some potential directions for future research.

Title: Graph Neural Network-Based Anomaly Detection for River Network Systems. (arXiv:2304.09367v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2304.09367
Code URL: null
Copy Paste: [[2304.09367] Graph Neural Network-Based Anomaly Detection for River Network Systems](http://arxiv.org/abs/2304.09367) #interpretability
Summary:
Water is the lifeblood of river networks, and its quality plays a crucial role in sustaining both aquatic ecosystems and human societies. Real-time monitoring of water quality is increasingly reliant on in-situ sensor technology. Anomaly detection is crucial for identifying erroneous patterns in sensor data, but can be a challenging task due to the complexity and variability of the data, even under normal conditions. This paper presents a solution to the challenging task of anomaly detection for river network sensor data, which is essential for the accurate and continuous monitoring of water quality. We use a graph neural network model, the recently proposed Graph Deviation Network (GDN), which employs graph attention-based forecasting to capture the complex spatio-temporal relationships between sensors. We propose an alternate anomaly threshold criterion for the model, GDN+, based on the learned graph. To evaluate the model's efficacy, we introduce new benchmarking simulation experiments with highly sophisticated dependency structures and subsequence anomalies of various types. We further examine the strengths and weaknesses of this baseline approach, GDN, in comparison to other benchmarking methods on complex real-world river network data. Findings suggest that GDN+ outperforms the baseline approach in high-dimensional data, while also providing improved interpretability. We also introduce software called gnnad.

explainability

watermark

diffusion

Title: DiFaReli : Diffusion Face Relighting. (arXiv:2304.09479v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09479
Code URL: null
Copy Paste: [[2304.09479] DiFaReli : Diffusion Face Relighting](http://arxiv.org/abs/2304.09479) #diffusion
Summary:
We present a novel approach to single-view face relighting in the wild. Handling non-diffuse effects, such as global illumination or cast shadows, has long been a challenge in face relighting. Prior work often assumes Lambertian surfaces, simplified lighting models or involves estimating 3D shape, albedo, or a shadow map. This estimation, however, is error-prone and requires many training examples with lighting ground truth to generalize well. Our work bypasses the need for accurate estimation of intrinsic components and can be trained solely on 2D images without any light stage data, multi-view images, or lighting ground truth. Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We also propose a novel conditioning technique that eases the modeling of the complex interaction between light and geometry by using a rendered shading reference to spatially modulate the DDIM. We achieve state-of-the-art performance on standard benchmark Multi-PIE and can photorealistically relight in-the-wild images. Please visit our page: https://diffusion-face-relighting.github.io

Title: Reference-based Image Composition with Sketch via Structure-aware Diffusion Model. (arXiv:2304.09748v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09748
Code URL: null
Copy Paste: [[2304.09748] Reference-based Image Composition with Sketch via Structure-aware Diffusion Model](http://arxiv.org/abs/2304.09748) #diffusion
Summary:
Recent remarkable improvements in large-scale text-to-image generative models have shown promising results in generating high-fidelity images. To further enhance editability and enable fine-grained generation, we introduce a multi-input-conditioned image composition model that incorporates a sketch as a novel modality, alongside a reference image. Thanks to the edge-level controllability using sketches, our method enables a user to edit or complete an image sub-part with a desired structure (i.e., sketch) and content (i.e., reference image). Our framework fine-tunes a pre-trained diffusion model to complete missing regions using the reference image while maintaining sketch guidance. Albeit simple, this leads to wide opportunities to fulfill user needs for obtaining the in-demand images. Through extensive experiments, we demonstrate that our proposed method offers unique use cases for image manipulation, enabling user-driven modifications of arbitrary scenes.

Title: NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models. (arXiv:2304.09787v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2304.09787
Code URL: null
Copy Paste: [[2304.09787] NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models](http://arxiv.org/abs/2304.09787) #diffusion
Summary:
Automatically generating high-quality real-world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models that have been successfully utilized for efficient high-quality 2D content creation. We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene. To further compress this representation, we train a latent-autoencoder that maps the voxel grids to a set of latent representations. A hierarchical diffusion model is then fit to the latents to complete the scene generation pipeline. We achieve a substantial improvement over existing state-of-the-art scene generation models. Additionally, we show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.