Federated learning (FL)
We focus on three problems in FL schemes:
- Incorporating verifiability to secure different stages of the federated learning pipeline.
- Designing federated learning architectures with decentralized coordination.
- Addressing unresolved privacy challenges in federated learning.
Verifiability in the federated learning pipeline
In many contemporary machine learning workflows, particularly those involving edge computing, outsourced training, or federated learning, the entity that develops or deploys a model often lacks direct control over the training environment. This creates a security gap, as the training process may occur on untrusted infrastructure or with data provided by third parties. A growing concern in such scenarios involves model poisoning and data poisoning attacks, where adversaries manipulate the training process or inject malicious data to corrupt the resulting model. These threats not only undermine model accuracy but can also introduce backdoors or hidden behaviors, posing serious risks in safety-critical and privacy-sensitive applications.
To address these concerns, we use cryptographic primitives to design verifiable collaborative learning schemes that protect the integrity of outsourced model training against both data and model poisoning. These schemes employ a combination of cryptographic hashes, digital signatures, and zero-knowledge proofs to enforce trust and accountability throughout the training pipeline. Hashes provide continuity and traceability for data collected from authenticated sources, ensuring it remains untampered. Digital signatures authenticate the origin and integrity of training inputs, while zero-knowledge proofs give verifiable assurance that the training entity has correctly executed the model update process without revealing sensitive information. This design allows training tasks to be securely delegated to potentially untrusted parties while maintaining end-to-end verifiability.
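As a concrete illustration, the sketch below shows only the hash-and-sign layer of such a pipeline: a data provider hash-chains and signs each batch, and the trainer verifies provenance before committing to its model update. It assumes the `cryptography` package for Ed25519 signatures, and the zero-knowledge proof of correct training is left as a placeholder rather than a real zkSNARK.

```python
# Minimal sketch of the hash-and-sign layer of a verifiable training pipeline.
# Assumes the `cryptography` package; the zero-knowledge proof of correct
# training is abstracted behind a placeholder rather than implemented.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def batch_digest(batch: bytes, prev_digest: bytes = b"") -> bytes:
    """Hash-chain a data batch to its predecessor for traceability."""
    return hashlib.sha256(prev_digest + batch).digest()

# --- Data provider: authenticate the origin and integrity of a batch ---
provider_key = Ed25519PrivateKey.generate()
batch = b"serialized training examples"               # stand-in for real data
digest = batch_digest(batch)
signature = provider_key.sign(digest)                 # binds provider identity to the batch

# --- Trainer: verify provenance before using the batch ---
provider_key.public_key().verify(signature, digest)   # raises if the batch was tampered with

# --- Trainer: commit to the resulting model update ---
model_update = b"serialized weight delta"             # stand-in for a real update
update_commitment = hashlib.sha256(digest + model_update).hexdigest()

# In the full scheme, a zkSNARK proving that `model_update` was computed
# correctly from the committed batch would be attached here.
proof_of_training = {"commitment": update_commitment, "proof": None}
```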
Decentralized federated learning
Federated learning enables collaborative model training across distributed end devices, ensuring that raw user data remains local and never needs to be uploaded to a central repository. This decentralization enhances privacy by design; however, most practical implementations of federated learning still rely heavily on a central server to orchestrate the training process. This centralization reintroduces vulnerabilities, including single points of failure, trust assumptions in the coordinator, and risks of information leakage or manipulation. These limitations undermine the full potential of federated learning as a privacy-preserving and resilient paradigm for distributed machine learning.
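For reference, the toy sketch below captures the centrally coordinated aggregation pattern described above, using federated averaging as one standard instantiation. Synthetic numpy data stands in for client datasets and local training; the point is that raw data stays on the clients while a single server observes every update and controls aggregation.

```python
# Toy sketch of centrally coordinated federated averaging (FedAvg).
# Client data, model shape, and the local-training step are illustrative.
import numpy as np

def local_update(global_weights: np.ndarray, local_data: np.ndarray,
                 lr: float = 0.1) -> np.ndarray:
    """One step of (toy) local training: move weights toward the local mean."""
    return global_weights - lr * (global_weights - local_data.mean(axis=0))

def server_round(global_weights: np.ndarray,
                 client_datasets: list) -> np.ndarray:
    """Central server collects local updates and averages them."""
    updates = [local_update(global_weights, d) for d in client_datasets]
    sizes = np.array([len(d) for d in client_datasets], dtype=float)
    return np.average(updates, axis=0, weights=sizes)  # weighted by dataset size

# Raw data never leaves the clients, but the server sees every model update
# and single-handedly controls orchestration and aggregation.
clients = [np.random.randn(50, 4) + i for i in range(3)]
w = np.zeros(4)
for _ in range(5):
    w = server_round(w, clients)
```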
To address these challenges, we design a fully decentralized federated learning framework built on blockchain technology. We eliminate the need for a central coordinator by leveraging a public blockchain to manage orchestration, record-keeping, and incentivization. The framework incorporates homomorphic encryption to ensure the confidentiality of both training data and local model updates, preventing unauthorized access even during aggregation. To guarantee correctness and integrity without revealing private information, zkSNARKs (zero-knowledge Succinct Non-interactive Arguments of Knowledge) make model parameters and updates publicly verifiable. This combination of cryptographic tools provides strong privacy, security, and transparency guarantees throughout the learning lifecycle. We have implemented a proof-of-concept prototype and evaluated it on a public blockchain, demonstrating its practical viability in terms of computational overhead and on-chain costs, particularly when deployed on widely used platforms such as Ethereum.
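The sketch below illustrates only the confidentiality-preserving aggregation step, using an additively homomorphic scheme (Paillier via the `phe` package, an assumed dependency rather than necessarily the scheme used in our framework). Blockchain orchestration and zkSNARK verification are abstracted away; in the full design each ciphertext would carry an on-chain proof of well-formedness.

```python
# Minimal sketch of confidentiality-preserving aggregation with an additively
# homomorphic scheme (Paillier, via the `phe` package, an assumed dependency).
# Blockchain orchestration and zkSNARK verification are not modeled here.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# Each participant encrypts its local model update (one scalar per parameter
# here, for brevity) before publishing it.
local_updates = [0.12, -0.05, 0.30]
encrypted_updates = [public_key.encrypt(u) for u in local_updates]

# Any party (e.g., a relayer posting to the chain) can aggregate ciphertexts
# without learning the individual contributions.
encrypted_sum = sum(encrypted_updates[1:], encrypted_updates[0])

# Only the holder of the decryption key recovers the aggregate; a zkSNARK
# attached to each ciphertext would prove it encodes a well-formed update.
aggregate = private_key.decrypt(encrypted_sum) / len(local_updates)
print(aggregate)  # ~ mean of the plaintext updates
```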
Privacy challenges in federated learning
Federated Learning (FL) represents a promising approach to privacy-preserving machine learning by enabling decentralized model training directly on user devices, thereby ensuring that raw data remains local and is never shared with a central server. While this design mitigates many traditional privacy risks associated with centralized data collection, it does not eliminate all vulnerabilities.
In particular, FL remains susceptible to property inference attacks—a class of threats in which an adversary can infer whether a specific property or attribute is present in the training data, even when that property is unrelated to the primary task of the global model. This type of leakage poses significant risks in sensitive domains such as healthcare, finance, or personalized services, where secondary attributes may reveal private user information.
In this work, we investigate the feasibility and limitations of property inference attacks in FL settings by replicating and extending previous work using TensorFlow Federated. Our experimental evaluation confirms the reproducibility and credibility of these attacks, while also shedding light on their robustness across different conditions. Notably, we identify two core factors contributing to attack success: the pronounced impact of early model updates during the initial training rounds, and the heightened effectiveness of attacks when the target property is rare in the dataset—due to the stronger gradient signal such samples produce. These findings suggest that even in the absence of malicious tampering or poisoning, FL systems are vulnerable to subtle yet powerful privacy threats. As a result, our study underscores the urgent need for improved defenses and privacy-enhancing techniques to safeguard sensitive information in federated learning deployments.
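To make the attack setting concrete, the sketch below outlines a generic gradient-based property inference attack: an adversary who observes model updates trains a meta-classifier on labeled shadow updates and applies it to a victim's update. The shadow updates are simulated with synthetic numpy data rather than produced by a real TensorFlow Federated run, so this illustrates the attack principle, not a reproduction of our experiments.

```python
# Minimal sketch of a gradient-based property inference attack, assuming the
# attacker can observe per-client model updates (e.g., as the aggregator).
# Shadow updates are simulated synthetically instead of via a real FL run.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 32  # flattened size of an observed model update (illustrative)

def shadow_update(has_property: bool) -> np.ndarray:
    """Simulate a model update; property-bearing samples shift a few coordinates."""
    update = rng.normal(0.0, 1.0, dim)
    if has_property:
        update[:4] += 0.8   # the stronger gradient signal of rare-property samples
    return update

# The attacker trains a meta-classifier on shadow updates with known labels...
X = np.stack([shadow_update(i % 2 == 0) for i in range(400)])
y = np.array([i % 2 == 0 for i in range(400)], dtype=int)
meta_clf = LogisticRegression(max_iter=1000).fit(X, y)

# ...then predicts whether the target client's data contains the property.
victim_update = shadow_update(has_property=True)
print(meta_clf.predict_proba(victim_update.reshape(1, -1))[0, 1])
```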