Duality Glossary of Terms


Federated Learning  

Federated Learning is a machine learning technique that trains an algorithm across multiple decentralized edge devices, without the need to bring the data to a central location. In this approach, the data remains on the local devices, and the updates to the algorithm iteratively occur on the same devices. Federated learning can be useful in scenarios where the data is distributed across multiple devices or is sensitive to transfer. This technique is commonly used in areas such as healthcare, where patient data privacy is a top priority.

How Does Federated Learning Work?

Federated Learning allows access to a wide variety of data sets without needing to directly share sensitive data, whether it's between a user and a server or between organizations.

To begin, each edge device trains an initial model on its local data and sends that model, not the data itself, to the server. The central server then averages the various user-specific models to produce an updated global model, completing what is known as a Federated Learning Round. This process can be repeated as required to produce improved versions of the model.
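A single round of this process can be sketched in a few lines of Python. This is a minimal illustration, not any platform's implementation: the one-parameter linear model, the learning rate, the sample-weighted average, and all function names (`local_update`, `federated_round`, `train`) are assumptions made for the example.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave one device


def local_update(w: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Train on one device's private data, starting from the current global weight."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w: float, devices: List[Dataset]) -> float:
    """One round: every device trains locally, then the server averages the weights."""
    local_ws = [local_update(global_w, data) for data in devices]
    total = sum(len(data) for data in devices)
    # Assumption: average weighted by each device's number of samples
    return sum(w * len(data) for w, data in zip(local_ws, devices)) / total


def train(devices: List[Dataset], rounds: int = 50) -> float:
    """Repeat federated rounds, starting from an arbitrary initial weight of 0."""
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, devices)
    return w


# Two devices whose private data both follow y = 3x
devices = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)],
]
final_w = train(devices)
```

Only model weights cross the device boundary here; the server never sees the `(x, y)` pairs. Real deployments would typically exchange full parameter vectors and may add protections on the updates themselves.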

Benefits/Drawbacks of Federated Learning

Federated Learning comes with several benefits that include enhanced privacy, greatly reduced learning time, reduced cost of training, and enhanced regulatory compliance. 

Unfortunately, there are also some drawbacks, including debate over whether it provides privacy benefits in the first place. With Federated Learning, it may be possible to reverse engineer the underlying data sets from information revealed by the completed model, which is known to all collaborating parties. That said, platforms such as the Duality platform address those concerns by providing secured federated learning.
 

MPC

Multiparty computation (MPC) is a technique in cryptography that enables multiple parties to jointly compute a function over their private inputs without revealing those inputs to each other or to any external parties. In other words, MPC allows multiple parties to perform a computation over their confidential data without revealing any information about that data to the others involved in the computation. This technique is particularly useful in scenarios where multiple parties need to collaborate and perform computations, but none of them want to share their data with the others, such as in financial transactions, healthcare data analysis, or voting protocols.

How it Works

In an MPC protocol, two or more parties each hold a secret input, and they want to compute a function of their inputs without revealing their inputs to each other. The primary goal of an MPC protocol is to let the participants obtain the desired result while preserving the privacy of their data.

For example, suppose that two hospitals want to collaborate to identify patients with a rare health condition without revealing the patients' identities to each other. In that case, they can use MPC to jointly compute the appropriate function over their entire patient datasets. This approach can help them maintain privacy while still obtaining valuable insights into rare health conditions.
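One of the simplest MPC building blocks, additive secret sharing, can be sketched as follows. Note that this computes a simpler function than the patient-matching example above: it only adds up private counts. The modulus, the three-hospital setup, and the function names are assumptions made for illustration, and the sketch simulates all parties in one process rather than over a network.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # all share arithmetic is done modulo this value (illustrative choice)


def share(secret: int, n_parties: int) -> List[int]:
    """Split a secret into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs: List[int]) -> int:
    """Every party shares its input; each party adds the shares it holds; only the total is revealed."""
    n = len(private_inputs)
    # all_shares[i][j] is the share that party i sends to party j
    all_shares = [share(x, n) for x in private_inputs]
    # Party j publishes only the sum of the shares it received
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


# Three hospitals each hold a private count of patients with the condition
hospital_counts = [12, 7, 30]
total = secure_sum(hospital_counts)
```

Each individual share is uniformly random, so no single share says anything about the input it came from. With only two parties, however, the total by itself lets each party deduce the other's input, which is why the example uses three.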

Benefits of MPC

  • Privacy Preservation: The primary advantage of MPC is privacy preservation. It allows parties to perform computations while keeping their inputs private, ensuring that sensitive information remains secure.
  • Increased Security: MPC increases security by eliminating the need for data to be centralized. Data is broken down into multiple pieces, and each piece is held by a different party. This reduces the risk of a successful data breach or cyberattack.
  • Trustless Collaboration: With MPC, parties can collaborate securely without having to trust one another. 

Drawbacks of MPC

  • Complexity: MPC protocols can be complex to implement, which raises the barrier for businesses or organizations looking to utilize MPC.
  • Latency: MPC computations can take longer to perform than other methods due to the need for parties to exchange information securely. This latency can be a significant issue in situations where time is of the essence.
  • Threshold Limitations: Some MPC protocols have a threshold limit on the number of participants involved in a computation. As more parties are involved, the protocol becomes more complicated, which can increase the risk of a successful attack.

In summary, MPC has numerous benefits in terms of privacy, security, and trustless collaboration. However, implementing MPC can be challenging, and it may not be suitable for all use cases due to complexity, latency, and threshold limitations.

FHE

FHE stands for Fully Homomorphic Encryption, which is a type of encryption scheme that enables computation on ciphertexts directly, without the need for decryption. This means that the encrypted data remains encrypted throughout the entire computation process, and the result of the computation is also encrypted, without any party having access to the plaintext data at any point.

This property of FHE is particularly useful in scenarios where privacy is a concern, such as in cloud computing, where the data is stored on a remote server and processed by third-party service providers. With FHE, the data can be encrypted and stored on the server, and computation can be performed on the ciphertexts without the server or the service provider ever knowing the plaintext data.

 

The Holy Grail of Cryptography

Due to its unique ability to secure data end-to-end in all three states (at rest, in transit, and in use), homomorphic encryption (HE) has long been dubbed the “Holy Grail of Data Privacy” or the “Holy Grail of Cryptography.”

The idea of HE is not new: cryptographers first proposed it in 1978. However, they didn’t know at the time whether it was possible to achieve. It wasn’t until 2009 that Craig Gentry, then at Stanford, described the first plausible construction for a fully homomorphic encryption scheme, showing that FHE could be realized in principle. Since then, it has been adopted in a variety of areas, including the private and public sectors and academia, where it has been shown to perform at scale.

What FHE Does

FHE is perhaps the most important breakthrough in theoretical computer science of the 21st century. Since Gentry’s paper was published, research and implementation efforts throughout academia, government, and industry have brought FHE from theory to reality.

HE enables computations, including machine learning and AI analysis, on encrypted data, allowing data scientists, researchers, and data-driven enterprises to gain valuable insights without decrypting or exposing the underlying data or models. This enables organizations to extract value from data while maintaining privacy and complying with applicable regulations. In addition, HE provides a functional and dependable privacy layer, eliminating the trade-off between data privacy and utility. This is particularly useful for enabling collaborations between parties across sensitive data, such as privacy-preserving collaborations with patient data between multiple healthcare and research centers, or inter-bank cooperation in financial crime investigations, where different parties can analyze sensitive information without exposing the underlying data to one another.

Because homomorphically-encrypted data is encrypted from end-to-end in all three states, no trusted third parties are ever required. This allows for computations to be outsourced, keeping both the data and the analytical models used to operate on the data safe, secured, and concealed. A cloud host could run a computation on the data, get an encrypted result, and give that result back to the data owner. The data owner could then decrypt that result, with the decrypted result being the same as if they had run the computation on the original data without encryption. 
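The outsourcing flow described above (encrypt, compute on ciphertexts, return, decrypt) can be illustrated with a toy example. To stay self-contained, the sketch below uses the Paillier cryptosystem, which is only additively homomorphic and therefore not FHE; it is swapped in because it fits in a few lines. The tiny fixed primes make it completely insecure, and all names are assumptions made for the example.

```python
import math
import secrets

# Toy key generation with tiny, fixed primes (insecure; for illustration only)
P, Q = 10007, 10009
N = P * Q                  # public key
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(p-1, q-1)
MU = pow(LAMBDA, -1, N)                               # private: inverse of LAMBDA mod N


def encrypt(m: int) -> int:
    """Data owner: encrypt 0 <= m < N using the public key (generator g = N + 1)."""
    while True:
        r = secrets.randbelow(N - 1) + 1  # fresh randomness for every ciphertext
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Data owner: recover the plaintext using the private values LAMBDA and MU."""
    return ((pow(c, LAMBDA, N_SQ) - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Untrusted host: multiplying two ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ


# The owner encrypts, the host computes without the private key, the owner decrypts
c_a, c_b = encrypt(1200), encrypt(345)
c_sum = add_encrypted(c_a, c_b)
result = decrypt(c_sum)
```

The decrypted `result` equals the sum the owner would have obtained on the unencrypted values, and the host never handles a plaintext or a private key. A fully homomorphic scheme extends the same idea to both addition and multiplication, and therefore to arbitrary computations.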

Benefits and Drawbacks of FHE

Fully Homomorphic Encryption (FHE) has many potential benefits, but it also has drawbacks that must be taken into consideration.

Benefits:

  • Privacy: Since FHE allows computations to be performed on encrypted data directly, it can provide a higher level of privacy compared to traditional encryption methods. 
  • Security: Homomorphic encryption can keep data secure at rest, in transit, and in use, effectively reducing the risk of data breaches.
  • Accessibility: FHE could make cloud-based computation significantly more accessible and secure, as personal data would not need to be entrusted to third parties.
  • Efficiency: Because large amounts of data can be processed without being decrypted and re-encrypted along the way, FHE saves the time and resources those extra steps would otherwise consume.

Drawbacks:

  • Computational Overhead: One major drawback of FHE is its computational overhead, which typically carries a much higher processing cost than traditional encryption methods.
  • Complexity: Developing FHE algorithms is complex and requires significant expertise in cryptography and mathematics.
  • Limited Industry Adoption: FHE is still a relatively new technology, so it may take time for it to be widely adopted in industry.
  • Key Management: Managing keys with FHE is a complex task, and any mistake could lead to data loss or compromise.

In summary, FHE has a lot of potential benefits, but it does come with some drawbacks that need to be considered when deciding whether to use it or not.

TEE

A Trusted Execution Environment (TEE) is a secure area within a computer system or mobile device that ensures the confidentiality and integrity of data and processes that are executed inside it. The TEE is isolated and protected from the main operating system and other software applications, which prevents them from accessing or interfering with the data and processes within the TEE. The TEE is typically used for security-sensitive operations, such as secure storage of cryptographic keys, biometric authentication, and secure mobile payments. The TEE provides a high level of assurance that sensitive data and processes remain secure and tamper-proof, even if the main operating system or other software components are compromised.

How a Trusted Execution Environment Works

Trusted Execution Environments are established at the hardware level, which means that they are partitioned and isolated, complete with buses, peripherals, interrupts, memory regions, etc. TEEs run their own instance of an operating system, known as a Trusted OS, and the apps allowed to run in this isolated environment are referred to as Trusted Applications (TA). Untrusted apps run on an open part of the larger operating system referred to as the Rich Execution Environment (REE).

A trusted application has access to the full performance of the device despite operating in an isolated environment, and it is protected from all other applications. Data is usually encrypted in storage and transit and is only decrypted when it’s in the TEE for processing. The CPU blocks access to the TEE by all untrusted apps, regardless of the privileges of the entities requesting access.

To enhance security, two trusted applications running in the TEE also do not have access to each other’s data as they are separated through software and cryptographic functions.

Benefits of Trusted Execution Environment

TEE offers several benefits that include:

  • Data Integrity & Confidentiality: Your organization can use TEE to ensure data accuracy, consistency, and privacy as no third party will have access to the data when it’s unencrypted.
  • Code Integrity: TEE helps implement code integrity policies as your code is authenticated every time before it’s loaded into memory.
  • Secure Collaboration: When used in conjunction with other PETs such as federated learning (FL), multiparty computation (MPC) or fully homomorphic encryption (FHE), TEE allows organizations to securely collaborate without having to trust each other by providing a secure environment where code can be tested without being directly exported. This allows you to gain more value from your sensitive data.
  • Simplified Compliance: TEE provides an easy way to achieve compliance as sensitive data is not exposed, hardware requirements that may be present are met, and the technology is pre-installed on devices such as smartphones and PCs.

TEE Limitations

TEE has several major limitations compared to software-focused privacy technologies, particularly the financial burden of acquiring and deploying the technology, the effort of retrofitting existing solutions to use TEEs, and the challenge of vendor lock-in. In short, TEEs are inherently a hardware solution: they need to be purchased, physically delivered, installed, and maintained, and special software is needed to run on them. This is a much higher “conversion” burden than software-only privacy technologies carry. There is also little commonality between the various TEE vendors’ solutions, which implies vendor lock-in. If a major vendor were to stop supporting a specific architecture or, worse, if a hardware design flaw were found in a specific vendor’s solution, a completely new solution stack would need to be designed, installed, and integrated at great cost to the users of the technology.

In addition to the lifecycle costs, TEE technology is not foolproof as it has its own attack vectors both in the TEE Operating System and in the Trusted Apps (they still involve many lines of code). This has been proven through several lab tests, with Quarkslab successfully exploiting a vulnerability in Kinibi, a TrustZone-based TEE used on some Samsung devices, to obtain code execution in monitor mode.

Differential Privacy

Differential Privacy is a privacy-enhancing technique that allows organizations to collect and analyze data while preserving the privacy of the individuals in the dataset. Differential Privacy adds noise to the data which makes it harder for attackers to identify individual records while still maintaining the aggregate results.

With Differential Privacy, the goal is to provide accurate results without identifying individual records. To achieve this, a randomized function is used to add noise to the data. The amount of noise added to the data is controlled by a value called the privacy budget, which limits the amount of information that can be revealed about individuals in the data set.
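The relationship between the privacy budget and the amount of noise can be illustrated with the Laplace mechanism, one common additive-noise technique. In this sketch the budget is represented by a parameter `epsilon` (the conventional symbol, not a name used in this glossary), and the counting query, the sample data, and the function names are assumptions made for the example.

```python
import random
from typing import Callable, List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample zero-mean Laplace noise as the difference of two exponential draws."""
    # The difference of two independent exponentials with mean `scale` is Laplace(0, scale)
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records: List[int], predicate: Callable[[int], bool],
                  epsilon: float, rng: random.Random) -> float:
    """Answer a counting query with noise calibrated to the privacy budget epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


ages = [34, 71, 68, 45, 80, 23]
rng = random.Random()

# A smaller epsilon means a stricter budget and therefore noisier answers
strict_answer = private_count(ages, lambda age: age >= 65, 0.1, rng)
loose_answer = private_count(ages, lambda age: age >= 65, 1.0, rng)
```

Averaged over many queries the noise cancels out, which is why aggregate results stay useful, but any single answer is blurred enough to hide whether one particular person is in the data. Repeating the same query spends more of the budget each time.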

Differential Privacy can be used in a variety of contexts, such as collecting and analyzing medical records, conducting surveys, or tracking usage patterns of mobile phones. It provides a way to share aggregate information without compromising the privacy of individuals in the dataset.

The use of Differential Privacy lets individuals share their data with far less risk of their personal information being compromised. It has become an important tool for organizations dealing with sensitive data and striving to maintain a high level of privacy for their users.

How Differential Privacy Works

Differential Privacy is implemented by applying a randomized mechanism, ℳ[D], to any information exposed from a dataset, D, to an outside observer. The mechanism works by introducing controlled randomness or “noise” to the exposed data to protect privacy. A Differential Privacy mechanism can employ a range of techniques such as randomized response, shuffling, or additive noise. The particular choice of mechanism is essentially tailored to the nature and quality of the information sought by the observer. The mechanism is designed to provide an information-theoretic privacy guarantee: the output of a particular analysis remains nearly the same whether or not data about a particular individual is included.
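The guarantee that the output stays nearly the same can be stated precisely. The standard formulation is given below; the symbols ε (the privacy budget), D′ (a neighboring dataset), and S (a set of possible outputs) are notation introduced here rather than taken from the text above.

```latex
% A mechanism \mathcal{M} is \varepsilon-differentially private if, for every
% pair of datasets D and D' that differ in one individual's data, and for
% every set S of possible outputs,
\Pr\bigl[\mathcal{M}[D] \in S\bigr] \;\le\; e^{\varepsilon} \, \Pr\bigl[\mathcal{M}[D'] \in S\bigr]
```

When ε is small, the factor e^ε is close to 1, so an observer can barely distinguish an output computed with a given individual's data from one computed without it.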

Benefits & Limitations of Differential Privacy

Differential Privacy provides many benefits to organizations, including greater control and governance over data, plausible deniability to ensure people are more willing to share their sensitive data, resistance to linking attacks, and regulatory compliance. Limitations include usefulness only for large data sets, risks of privacy leaks, and a lack of end-to-end encryption; there is also no built-in ability to collaborate on multiple data sets. 

 

PETs

Privacy Enhancing Technologies (PETs) are a set of tools, methodologies, and techniques that are designed to protect the privacy of individuals and their personal data. These technologies are used to help people maintain control over their personal information and protect them against unauthorized access or misuse by others. Examples of PETs include encryption, anonymous communication tools, digital signatures, and privacy-focused search engines.

PETs can be used in a variety of contexts, such as online transactions, data sharing, and communication systems, to ensure confidentiality, integrity, and authenticity of data. PETs are particularly useful when deploying systems that collect personal data, such as medical records, online shopping histories, or credit scores. By using PETs in these contexts, individuals can maintain control over their personal information and reduce the risks associated with data loss, identity theft, or other privacy violations.

Here are several well-known PETs:
 

  • Homomorphic Encryption: Keeps data and/or models encrypted at rest, in transit, and in use (so sensitive data never needs to be decrypted) while still enabling analysis of that data.
  • Multiparty Computation: Allows multiple parties to perform joint computations on individual inputs without revealing the underlying data to one another.
  • Differential Privacy: Data aggregation method that adds randomized “noise” to the data so that the original inputs cannot be reverse engineered from the results.
  • Federated Learning: Statistical analysis or model training on decentralized data sets; a traveling algorithm where the model gets “smarter” with every analysis of the data.
  • Secure Enclave/Trusted Execution Environment: A physically isolated execution environment, usually a secure area of a main processor, that guarantees that code and data loaded inside are protected.
  • Zero-Knowledge Proofs: Cryptographic method by which one party can prove to another party that a given statement is true without conveying any additional information apart from the fact that the statement is indeed true.