Data Security and Privacy with Palantir AIP (Part 5: A Palantir Case Study)
This article explores how Palantir's AIP prioritizes data security and privacy. Going beyond simple access control, AIP layers security mechanisms such as data provenance tracking, markings, restricted views, and cipher-based data obfuscation to protect sensitive information while still enabling efficient use of data.
Interested in cooperation with Morph Systems? Contact us at mingyupark@morphsys.ai
As in previous AIPCon 6 Pre-Shows, we will explore key access control, security, and privacy elements against the backdrop of the fictional enterprise Onyx Incorporated.
Tracking Data Provenance
The Onyx internal directory application takes a single ontology object type called "employee" as input. Behind this object type are multiple datasets, and the data lineage allows you to see how the raw data is transformed, cleaned, and integrated into the ontology. As you work within the platform, data provenance is automatically maintained as you derive downstream data.
Initially, an employee object might have only a single backing dataset (the one supporting the internal directory). However, organizations may need to integrate a variety of sensitive data, such as financials, payroll, and more, to get a complete picture of employee information. The Palantir platform enables you to build applications, analytics, and monitoring that integrate this data without compromising security or privacy.
Access Control
Developers often have access to sensitive data as well, so access controls must be secure at both design time and runtime. Palantir's access controls apply to developers, end users, and every role in between. In the demo, the current user can access only one upstream dataset, and AIP uses access control to ensure that users see only the data they are authorized to see.
Once the user is added to a group that has access to the protected sensitive data, the same view shows them two backing datasets (address data and payroll data) that they could not see before.
Data lineage is useful not only for understanding where your data comes from, but also for understanding its freshness, build availability, original source, and security. You can also use data lineage to check what a given user has access to: after switching to a different demo user who is not in the group, the view shows which data that user can see (blue boxes) and which data they cannot see (red boxes).
Markings
Markings are the most basic form of access control on the platform. They are binary: you can access marked data only if you hold the corresponding marking. In Data Lineage, the markings on each dataset are shown by an icon in its upper right corner, and you can see that the payroll dataset is protected by both an HR data marking and a payroll data marking.
This explains what happened earlier: the user was added to a group and could then see sensitive data that was previously hidden, because that group is linked to the markings and membership in it grants access to the marked data.
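The link between groups and markings can be illustrated with a minimal sketch. This is not Palantir's API: the `User` class, the `GROUP_MARKINGS` table, and the group name `hr-sensitive` are hypothetical, and the sketch assumes that a user must hold every marking on a dataset to access it.

```python
from dataclasses import dataclass, field

# Hypothetical table: which markings each group grants to its members.
GROUP_MARKINGS = {
    "hr-sensitive": {"HR data", "Payroll data"},
}


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)

    def markings(self) -> set:
        """Collect every marking granted by the user's groups."""
        granted = set()
        for group in self.groups:
            granted |= GROUP_MARKINGS.get(group, set())
        return granted


def can_access(user: User, dataset_markings: set) -> bool:
    # Binary check (assumed): every marking on the dataset must be held.
    return dataset_markings <= user.markings()


# The payroll dataset carries both markings, as in the lineage view.
payroll_markings = {"HR data", "Payroll data"}

demo_user = User("demo")
before = can_access(demo_user, payroll_markings)

demo_user.groups.add("hr-sensitive")  # add the user to the group
after = can_access(demo_user, payroll_markings)

print(before, after)  # prints: False True
```

The check is deliberately all-or-nothing: there is no partial view of a marked dataset, which is what distinguishes markings from the row-level controls discussed later.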
Markings apply access control not only statically to individual data assets, but also along data provenance and lineage. If upstream data is marked, all data derived downstream inherits that marking by default. This enforcement extends all the way to the ontology.
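A rough way to picture this inheritance is as a union of markings over the lineage graph. The sketch below is an illustration under stated assumptions, not the platform's implementation: the dataset names, the `PARENTS` and `DIRECT` tables, and the function `effective_markings` are all hypothetical, and the lineage is assumed to be acyclic.

```python
def effective_markings(dataset, parents, direct):
    """Return a dataset's own markings plus those of everything upstream."""
    result = set(direct.get(dataset, set()))
    for parent in parents.get(dataset, []):
        result |= effective_markings(parent, parents, direct)
    return result


# Hypothetical lineage: each dataset maps to the datasets it is derived from.
PARENTS = {
    "employee_clean": ["directory_raw", "payroll_raw"],
    "employee_object": ["employee_clean"],
}

# Markings applied directly to a dataset (only the raw payroll data here).
DIRECT = {
    "payroll_raw": {"Payroll data"},
}

inherited = effective_markings("employee_object", PARENTS, DIRECT)
unmarked = effective_markings("directory_raw", PARENTS, DIRECT)
```

In this toy lineage, `employee_object` ends up carrying the payroll marking even though it was only ever applied to `payroll_raw`, while `directory_raw` stays unmarked.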
Restricted Views
Unlike markings, restricted views implement finer-grained, row-level access control within a dataset. You can use restricted views to partition a dataset into multiple subsets of rows based on specific conditions. Restricted views can also be expressed as group conditions, where a row is visible only to members of a given group.
For example, suppose you have an organizational policy that US HR staff can view payroll information only for US-based employees, and international HR staff can view payroll data only for international employees. The current user belongs to US HR and is therefore not authorized to view any rows that belong to an international pay group. As the screen shows, the user sees only the rows for their specific 'payroll_group' (US-based employees).
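The policy in this example can be sketched as a row filter keyed on 'payroll_group'. This is only an illustration of the idea: the group names, the `ROW_POLICY` table, the function `restricted_view`, and the sample rows are made up for the sketch and do not come from the platform or the demo.

```python
# Hypothetical policy: which group may see rows of each payroll_group value.
ROW_POLICY = {
    "US": "us-hr",
    "INTL": "intl-hr",
}


def restricted_view(rows, user_groups):
    """Keep only rows whose payroll_group is visible to one of the user's groups."""
    return [
        row
        for row in rows
        if ROW_POLICY.get(row["payroll_group"]) in user_groups
    ]


# Illustrative rows only; names and values are invented for this sketch.
payroll_rows = [
    {"employee": "A", "payroll_group": "US"},
    {"employee": "B", "payroll_group": "INTL"},
    {"employee": "C", "payroll_group": "US"},
]

us_hr_rows = restricted_view(payroll_rows, {"us-hr"})
no_group_rows = restricted_view(payroll_rows, set())
```

A user in `us-hr` gets back only employees A and C, and a user in neither group gets no rows at all. Rows with a 'payroll_group' value missing from the policy are hidden from everyone, which is the safer default in this sketch.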
Cipher
Cipher is another layer of security and data protection on top of basic access control. It is used when access control alone cannot minimize data exposure far enough, for example when data is needed for certain workflows but not all the time. Whereas markings and restricted views remove unauthorized data from view entirely, ciphers are useful when specific cells of information must remain available for downstream operations.
With a cipher, you can obfuscate data using an open encryption algorithm and then decrypt it later when you need it. On the screen, you can see that the address is displayed in a scrambled format called 'cipher text'. Individual values can be decrypted when a specific workflow needs them.
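The encrypt-then-decrypt-on-demand pattern can be sketched at the level of a single cell. The code below is a toy built only from the Python standard library (an HMAC-SHA256 keystream XORed with the value); it is not the algorithm Palantir's Cipher uses and is not suitable for production use. The function names, the `can_decrypt` flag, and the sample address are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_cell(key: bytes, plaintext: str) -> str:
    """Obfuscate one cell value; returns printable 'cipher text'."""
    nonce = os.urandom(16)  # fresh nonce per value
    data = plaintext.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    body = bytes(a ^ b for a, b in zip(data, stream))
    return base64.urlsafe_b64encode(nonce + body).decode("ascii")


def decrypt_cell(key: bytes, ciphertext: str, can_decrypt: bool) -> str:
    """Recover one cell value, but only for a workflow allowed to decrypt."""
    if not can_decrypt:
        raise PermissionError("this workflow may not decrypt the value")
    raw = base64.urlsafe_b64decode(ciphertext.encode("ascii"))
    nonce, body = raw[:16], raw[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


DEMO_KEY = os.urandom(32)          # hypothetical key, generated for the demo
address = "1 Example Street"       # invented sample value
scrambled = encrypt_cell(DEMO_KEY, address)
restored = decrypt_cell(DEMO_KEY, scrambled, can_decrypt=True)
```

Downstream tables can carry `scrambled` freely, and only a workflow that is explicitly allowed to decrypt gets the original address back, one value at a time.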
AI and agent workflows
Because features like markings (granting section access), ciphers (basic obfuscation), and restricted views (enforcing row-level access control in the ontology) are built into the ontology, the same types of controls apply to everything that is built on top of it. This means that AI agents must adhere to the same marking, security, and access control principles as humans. There are also additional access control primitives, such as tool scopes and logic, that control what an agent can access and what it can do.
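One way to picture how these controls combine for an agent is a check with two independent conditions: the tool must be within the agent's scope, and the data must be accessible under the same marking rule that applies to a human. The sketch below is a simplified model, not AIP's API; the `Agent` class, its fields, and the tool names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    # Markings held by the user the agent acts for (assumed model).
    user_markings: frozenset
    # Tools the agent is allowed to invoke (hypothetical tool scope).
    tool_scope: frozenset


def agent_can_call(agent: Agent, tool: str, dataset_markings: set) -> bool:
    """Allow a call only if the tool is in scope AND the data is accessible."""
    in_scope = tool in agent.tool_scope
    cleared = set(dataset_markings) <= agent.user_markings
    return in_scope and cleared


agent = Agent(
    user_markings=frozenset({"HR data"}),
    tool_scope=frozenset({"lookup_employee"}),
)

# In scope and cleared for the data: allowed.
directory_ok = agent_can_call(agent, "lookup_employee", {"HR data"})
# In scope, but the payroll marking is missing: denied, as for a human.
payroll_ok = agent_can_call(agent, "lookup_employee", {"HR data", "Payroll data"})
# Cleared for the data, but the tool is outside the agent's scope: denied.
edit_ok = agent_can_call(agent, "edit_employee", {"HR data"})
```

Keeping the two conditions separate mirrors the text: markings limit what the agent can see, while tool scopes limit what it can do.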
Early investment in data protection, privacy, and access control lets your software move quickly in the current era of generative AI. The generative AI workflows that organizations need can only be built on a foundation of data protection and data governance. Palantir has treated data protection and governance as core concepts from the beginning, an approach that has proven effective and has allowed it to deploy real workflows in real environments. With tools that minimize data exposure while keeping generative AI effective in production environments, both security and productivity can be achieved.









