Public vs. private: AI for IP and how both stack up on patentability, confidentiality, and security
May 3, 2024 | 3 min read

In my work helping IP teams integrate AI into their IP management workflows, there’s a question that I always hear: “If I take an invention disclosure, which is a confidential document, and put it into your platform, does that constitute a public disclosure? Does that cause me to lose confidentiality or trigger a patent bar?”
To answer this question, we need to draw the distinction between public and private AI, and address the differences between disclosure, confidentiality, and security.
Whether using AI constitutes public disclosure depends on whether it’s public or private
Broadly speaking, a public disclosure of an invention triggers a bar that makes the invention ineligible for patenting. And right now, the general consensus is that disclosing an invention to a public AI constitutes a public disclosure.
On the other hand, use of a private instance of AI does not constitute a public disclosure. Examples of AI platforms that can be deployed as private AI include Amazon SageMaker, Google Cloud AI, Microsoft Azure AI, and IBM’s Watson, as well as Tradespace.
The difference between a public and a private instance of AI comes down to model access. In a public AI system like the free versions of ChatGPT and Claude, everybody is using a single instance of a model, and user inputs serve as training inputs that improve the model’s outputs for all users. Once the model’s weights have been updated based on your information, that information is commingled with all other training data and enters the public domain. Data sharing like this is more common than you might think: Google Search does it too. All Google search and click data is used to improve the search algorithm for everyone (and to inform highly targeted predictions for advertising).
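To make the distinction concrete, here is a deliberately simplified sketch (the class and account names are hypothetical, and real model training is far more involved): in a public model, every user’s input lands in the same pool of training data, while a private instance only ever sees its own customer’s data.

```python
# Conceptual illustration only; not any vendor's real API.

class ModelInstance:
    def __init__(self, name):
        self.name = name
        self.training_data = []  # stands in for data baked into the weights

    def train_on(self, account, text):
        # Once text has influenced the weights, it can't be pulled back out.
        self.training_data.append((account, text))

# Public AI: one instance shared by every user.
public_model = ModelInstance("shared-public-model")
public_model.train_on("acme_corp", "confidential invention disclosure")
public_model.train_on("random_user", "casual question")
# acme_corp's disclosure now sits alongside everyone else's inputs.

# Private AI: each customer gets an isolated instance.
acme_private = ModelInstance("acme-private-instance")
acme_private.train_on("acme_corp", "confidential invention disclosure")
# Nothing in acme_private ever reaches other customers' instances.
```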
Paid doesn’t automatically mean private — and private can come with a major tradeoff
Just because an AI tool is paid doesn’t mean it’s automatically private. Most modern SaaS tools operate on a “multi-tenant” architecture in which every account’s data is stored in the same cloud database, but each account’s data stays separate thanks to widely available tooling. That kind of separation isn’t possible inside a Large Language Model. If multiple accounts are using the same copy of an LLM, called an instance, and one account provides data for training, the model’s weights will update and influence the outputs for all accounts.
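For readers who want to see the contrast, here is a minimal sketch of conventional multi-tenant isolation, using a hypothetical disclosures table: every row carries a tenant ID, and every query is scoped to a single tenant. There is no equivalent of that WHERE clause for a trained model’s weights, which is why training on a shared instance can’t be scoped per account.

```python
# Illustrative only: how ordinary SaaS tooling keeps tenants' data separate.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE disclosures (tenant_id TEXT, title TEXT)")
db.execute("INSERT INTO disclosures VALUES ('acme', 'Widget v2'), ('globex', 'Gadget X')")

def disclosures_for(tenant_id):
    # The WHERE clause is what keeps one account's data invisible to another.
    rows = db.execute(
        "SELECT title FROM disclosures WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()
    return [title for (title,) in rows]

print(disclosures_for("acme"))  # ['Widget v2'] -- globex's data never appears
```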
For a model to be truly private, a SaaS vendor would need to spin up a separate LLM instance for every single account it serves. Even though this increases operational complexity for the vendor, separate instances are the only way to ensure that the data you provide to a model doesn’t influence the weights of the models other accounts use and thereby trigger a public disclosure. Microsoft Azure, which has been a frontrunner in establishing best practices for AI, outlines these considerations in its guide to multitenancy.
One tradeoff of the private LLM approach is a lack of recency. Because private models are purposely isolated, they don’t benefit from the latest training data. Tasks like prior art search and competitive landscaping require up-to-date intelligence, so relying on private instances can create a serious gap.
Fortunately, recent advances in AI services have solved this dilemma. At Tradespace, for example, we use a “hub-and-spoke” model to address the tradeoff between privacy and recency. A central model, the hub, is updated on a recurring basis; when we push those updates, each private LLM instance we support is effectively retrained. Data never flows from our private models back to the hub. Doing this would have been prohibitively expensive even 12 months ago, but advances in AI orchestration mean we can do it on a monthly basis for hundreds of private model instances. Soon, we’ll be doing it weekly.
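As a rough illustration of that one-way flow (this is a conceptual sketch with made-up names, not our actual implementation), the hub publishes refreshed weights to each spoke, and nothing stored in a spoke is ever sent back:

```python
# Conceptual hub-and-spoke update cycle; hypothetical classes and names.

class Hub:
    def __init__(self):
        self.version = 1
        self.weights = {"base": "public training data only"}

    def publish_update(self):
        self.version += 1
        return {"version": self.version, "weights": dict(self.weights)}

class PrivateSpoke:
    def __init__(self, customer):
        self.customer = customer
        self.weights = {}
        self.local_data = []  # customer data stays here, inside the spoke

    def apply_update(self, update):
        # Pull the hub's refreshed weights; local data is never sent back.
        self.version = update["version"]
        self.weights = update["weights"]

hub = Hub()
spokes = [PrivateSpoke("acme"), PrivateSpoke("globex")]

update = hub.publish_update()      # recurring refresh from the hub
for spoke in spokes:
    spoke.apply_update(update)     # recency without commingling customer data
```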
Even when using a private AI, it’s important to confirm with your provider that your data is used to train your model only, not anybody else’s. You can also specify that your own model will not be trained on your data at all.
Confidentiality is a question of trust and controls
Confidentiality is a question similar to, but separate from, privacy. It asks, “How can I be confident that what I’m putting in here will not be shared with anyone else?” In the context of AI, the question becomes, “How do I know that you aren’t taking my data and using it to train your system?” In both cases, the question is essentially one of trust and, better still, controls. The best way to ensure both trust and controls is through clear, standardized service terms.
As discussed above, Microsoft Azure has been a leader in developing industry-standard terms governing training and data retention. Azure’s default is not to train on customers’ data. Azure also purges customer data from its servers and encrypts data both at rest and in transit. Our philosophy at Tradespace is that clear, user-oriented terms like these are critical to building trust, and that they outweigh any short-term gains a company might realize from training without explicit permission.
Evaluate an AI vendor’s security just like you would any other SaaS provider
Security is likely the best understood of the three concerns discussed here. Questions like “Can I trust how your system is built? Is it resistant to leaks and hacking? Even if I trust your intentions, how do I trust the integrity of what you’ve built?” have been common refrains in IT departments since long before LLMs.
Evaluating whether an AI vendor can be trusted from a security standpoint resembles evaluating any SaaS provider. Different providers will vary in their security capabilities and priorities, but just as with SaaS, adequate AI security is possible. When SaaS was first hitting the market, similar concerns existed; today, even the CIA uses AWS, and has for years. A big driver of commercial cloud adoption has been standardized security frameworks. In the commercial world, SOC 2 certification accelerated SaaS adoption; in the university space, the HECVAT gives technology transfer offices an easy way to assess cloud risk. Already, we’ve seen companies and universities adapting these frameworks to cover AI models.
If the prospect of adopting AI for your IP team raises concerns, you’re right to ask questions, but you need not be afraid of AI as a whole: AI isn’t a monolith. The common concerns of public disclosure, confidentiality, and security are important, but they’re also addressable. Start with a strong fact base and select an architecture that meets your needs.