Database Rights and Knowledge Graph IP

Sui generis database protection, knowledge graph ownership, and the GDPR erasure conflict

Key takeaways
  • EU sui generis database right (Directive 96/9/EC): protects databases whose maker has made a 'substantial investment' in obtaining, verifying, or presenting the contents. The right is automatic, with no registration required, and runs for 15 years from 1 January of the year following completion; a substantial change that itself reflects a substantial new investment gives the resulting database its own fresh term. The right lets the maker prevent third parties from extracting or re-utilizing the whole or a substantial part of the database without authorization.
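  • Copyright in selection and arrangement: under India's Copyright Act 1957 (Section 13, read with the definition of 'literary work' in Section 2(o), which covers compilations including computer databases) and the US doctrine from Feist Publications v. Rural Telephone Service Co. (1991), a database is protected by copyright only where the selection or arrangement of the data is original. The protection covers that selection and arrangement, not the underlying facts. A curated behavioral ontology with non-obvious category relationships and deliberate annotation decisions has a plausible copyright claim. A purely alphabetical or numerical listing does not.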
  • Who owns the ChromaDB collection: the platform operator who designed the schema, the ingestion pipeline, and the embedding methodology has strong ownership claims to the knowledge base as a structured work. Clients who contributed documents have potential claims to their specific contributions. The embedding model itself (here, one served through Ollama) is licensed separately under its own terms. A clear data ownership clause in the platform's Terms of Service is the primary protection mechanism, because courts will generally give effect to what the contract says.
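  • Behavioral embeddings as trade secrets: the specific embedding approach, fine-tuning methodology, and ontology schema can be protected as trade secrets with no registration and no public disclosure. The protection has no fixed term: it lasts as long as the information stays secret and the holder takes reasonable steps to keep it so, such as access controls and confidentiality agreements. It does not protect against independent development or lawful reverse engineering.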
  • GDPR tension: the EU sui generis database right lasts 15 years, while the GDPR right to erasure (Article 17) operates on demand. When a data subject makes a valid erasure request, their personal data must be deleted from the database even if doing so reduces the database's commercial value. The database right is not a ground for refusing an erasure request; the data protection right takes precedence.
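The 15-year term is simple date arithmetic, and a small sketch can make the end date concrete. It assumes the counting rule in Article 10(1) of Directive 96/9/EC, under which the term runs from 1 January of the year following completion. The function names are illustrative only. Whether an update is substantial enough to start a fresh term is a legal judgment that the code does not attempt.

```python
from datetime import date

TERM_YEARS = 15  # term length under Article 10(1) of Directive 96/9/EC


def sui_generis_expiry(completed_on: date) -> date:
    """Return the first day on which the term has lapsed.

    Assumption: the term is counted from 1 January of the year
    following the date of completion.
    """
    return date(completed_on.year + 1 + TERM_YEARS, 1, 1)


def is_within_term(completed_on: date, today: date) -> bool:
    """True while the database is still inside its 15-year term."""
    return completed_on <= today < sui_generis_expiry(completed_on)


# A database completed on 15 June 2024 stays within its term
# through 31 December 2039.
print(sui_generis_expiry(date(2024, 6, 15)))  # 2040-01-01
```

Recording the completion date of each substantially updated version alongside the investment evidence makes this calculation repeatable for later versions.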
Risk signals
  • No data ownership clause in client contracts: if clients contribute documents, structured data, or embeddings to the platform, the ownership of the resulting knowledge base is ambiguous and a dispute becomes likely.
  • Extracting and incorporating data from third-party databases, including ones that are publicly accessible via API or scraping, without assessing whether the source is protected by the EU sui generis database right. Transforming the data does not automatically eliminate the database right.
  • Storing personal data in vector databases without a clear legal basis, documented retention schedule, and a tested erasure mechanism. GDPR applies to embeddings that can be linked to natural persons even indirectly.
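One way to make the third risk signal checkable is to store a legal basis and a retention date with every vector that relates to a person, then audit the collection periodically. The sketch below is a minimal illustration under that assumption. The `VectorRecord` fields and the `audit` helper are hypothetical names, not part of any vector database's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VectorRecord:
    vector_id: str
    subject_ref: str    # pseudonymous reference to the data subject
    legal_basis: str    # e.g. "consent" or "contract"; empty if undocumented
    retain_until: date  # last day the vector may be kept


def audit(records, today):
    """Return (ids with no documented legal basis, ids past retention)."""
    missing_basis = [r.vector_id for r in records if not r.legal_basis.strip()]
    expired = [r.vector_id for r in records if r.retain_until < today]
    return missing_basis, expired


records = [
    VectorRecord("v1", "subj-001", "consent", date(2026, 1, 31)),
    VectorRecord("v2", "subj-002", "", date(2030, 1, 1)),
    VectorRecord("v3", "subj-001", "contract", date(2024, 12, 31)),
]
missing, expired = audit(records, today=date(2025, 6, 1))
print(missing)  # ['v2']
print(expired)  # ['v3']
```

In a real deployment these fields would most likely live in the per-vector metadata that the vector store already supports, so the audit can run as a metadata query.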
Action items
  • Add explicit data ownership and platform IP clauses to the Terms of Service and client contracts: client data remains client data; the platform schema, embedding pipeline, and ontology structure are platform IP. This avoids the ownership ambiguity that leads to disputes at contract termination.
  • For the EU market, document the investment made in building the behavioral ontology — staff hours, third-party data acquisition costs, annotation effort — as this evidence will be needed to assert the sui generis right in enforcement proceedings.
  • Implement a selective vector erasure mechanism: when a GDPR erasure request is received, identify all vectors associated with the data subject and delete them. Provide a mechanism that does not require a full database rebuild. Document each erasure in an erasure log keyed to the subject and the request date.
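The selective erasure step in the last action item can be sketched as follows. This is a minimal in-memory illustration, not a definitive implementation: `VectorStore`, `ErasureLog`, and `erase_subject` are hypothetical names, and the sketch assumes every vector was tagged with a pseudonymous `subject_ref` at ingestion. In ChromaDB the equivalent step would likely be a metadata-filtered `collection.delete(where={...})` call, again assuming the subject reference was stored in each vector's metadata.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class VectorStore:
    """Stand-in for a vector collection: vector_id -> (embedding, metadata)."""
    rows: dict = field(default_factory=dict)

    def add(self, vector_id, embedding, subject_ref):
        self.rows[vector_id] = (embedding, {"subject_ref": subject_ref})

    def delete_where_subject(self, subject_ref):
        """Delete only the subject's vectors; all other rows stay in place."""
        doomed = [vid for vid, (_, meta) in self.rows.items()
                  if meta["subject_ref"] == subject_ref]
        for vid in doomed:
            del self.rows[vid]
        return doomed


@dataclass
class ErasureLog:
    entries: list = field(default_factory=list)

    def record(self, subject_ref, requested_on, deleted_ids):
        # Keyed to the subject reference and the request date.
        self.entries.append({
            "subject_ref": subject_ref,
            "requested_on": requested_on.isoformat(),
            "deleted_count": len(deleted_ids),
        })


def erase_subject(store, log, subject_ref, requested_on):
    """Handle one erasure request: delete the vectors, then log the erasure."""
    deleted = store.delete_where_subject(subject_ref)
    log.record(subject_ref, requested_on, deleted)
    return deleted


store, log = VectorStore(), ErasureLog()
store.add("v1", [0.1, 0.2], "subj-001")
store.add("v2", [0.3, 0.4], "subj-002")
store.add("v3", [0.5, 0.6], "subj-001")
print(erase_subject(store, log, "subj-001", date(2025, 3, 1)))  # ['v1', 'v3']
print(sorted(store.rows))  # ['v2']
```

The log stores a pseudonymous reference and a count instead of the deleted content, so the erasure record does not itself retain the personal data it documents.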

Building a behavioral ontology, embedding model, or knowledge graph involves substantial investment. In the EU, that investment can give rise to an enforceable sui generis database right lasting 15 years. In India and the US, protection relies on copyright in selection and arrangement. GDPR complicates both, because an erasure request can force deletions regardless of either right.



