LlamaIndex - How to fix [Question]: SchemaLLMPathExtractor: can't get property extraction to work [3 comments, 3 participants]

GitHub stats
run-llama/llama_index#21142 · Fetched 2026-04-08 01:26:26
Comments: 3 · Participants: 3 · Timeline: 10 · Reactions: 0
Timeline (top): commented ×3, mentioned ×3, subscribed ×3, labeled ×1


Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I'm trying to extract triples with properties from a toy example using SchemaLLMPathExtractor. I can't get it to extract properties for me. I couldn't find an example in the docs. What am I missing here?

Minimal example:

from typing import Literal
from llama_index.core import Document, StorageContext
from llama_index.core.indices.property_graph import (
    PropertyGraphIndex,
    SchemaLLMPathExtractor,
)
from llama_index.core.prompts.base import PromptTemplate
from llama_index.llms.openai import OpenAI
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
import os

llm = OpenAI(model="gpt-4o")

entities = Literal["PERSON", "PLACE", "THING"]
relations = Literal["PART_OF", "HAS", "IS_A"]
schema = {
    "PERSON": ["PART_OF", "HAS", "IS_A"],
    "PLACE": ["PART_OF", "HAS"],
    "THING": ["IS_A"],
}

text = """
Alice Johnson is a 30 year old software engineer living in Vienna.
She has worked at Acme Corp since 2020 as a backend developer.
Acme Corp is part of the Tech Industry and was founded in 2005.
Alice owns a laptop called ThinkPad X1 Carbon which was purchased in 2022.
She also owns a smartphone, an iPhone 14, bought in 2023.
Bob Smith is a 45 year old manager at Acme Corp and has supervised Alice since 2021.
Vienna is part of Austria and has a population of about 1.9 million people.
"""

docs = [Document(text=text)]

entity_props = [
    ("age", "age of a person"),
    ("role", "job or profession"),
]

relation_props = [
    ("since", "when the relation started"),
    ("type", "type or category of the relation"),
]

extract_prompt = PromptTemplate(
    "Extract a property graph from the text.\n"
    "Return a list of (subject, relation, object) triples.\n"
    "Include entity and relation properties if present.\n"
    "Use only the provided schema.\n"
    "{text}\n"
    "-------\n"
)

# --- Extractor ---
kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=schema,
    possible_entity_props=entity_props,
    possible_relation_props=relation_props,
    # extract_prompt=extract_prompt,
    strict=True,
    max_triplets_per_chunk=10,
    allow_additional_properties=False,
)

# --- Neo4j connection ---
graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="password",
    url="bolt://localhost:7687",
)

storage_context = StorageContext.from_defaults(
    property_graph_store=graph_store
)

# --- Build index ---
index = PropertyGraphIndex.from_documents(
    docs,
    storage_context=storage_context,
    include_embeddings=False,
    kg_extractors=[kg_extractor],
)

Results (from a print(kg_schema) placed right before triplets = self._prune_invalid_triplets(kg_schema) in schema_llm.py):

# With default extraction prompt:
# triplets=[Triplet(subject=Entity(type='PERSON', name='Alice Johnson', properties=None), relation=Relation(type='IS_A', properties=None), object=Entity(type='THING', name='software engineer', properties=None)), Triplet(subject=Entity(type='PERSON', name='Alice Johnson', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='ThinkPad X1 Carbon', properties=None)), Triplet(subject=Entity(type='THING', name='ThinkPad X1 Carbon', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='laptop', properties=None)), Triplet(subject=Entity(type='PERSON', name='Alice Johnson', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='iPhone 14', properties=None)), Triplet(subject=Entity(type='THING', name='iPhone 14', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='smartphone', properties=None)), Triplet(subject=Entity(type='PERSON', name='Alice Johnson', properties=None), relation=Relation(type='PART_OF', properties=None), object=Entity(type='PLACE', name='Vienna', properties=None)), Triplet(subject=Entity(type='PLACE', name='Vienna', properties=None), relation=Relation(type='PART_OF', properties=None), object=Entity(type='PLACE', name='Austria', properties=None)), Triplet(subject=Entity(type='PERSON', name='Bob Smith', properties=None), relation=Relation(type='IS_A', properties=None), object=Entity(type='THING', name='manager', properties=None)), Triplet(subject=Entity(type='PERSON', name='Bob Smith', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='supervised Alice', properties=None)), Triplet(subject=Entity(type='THING', name='Acme Corp', properties=None), relation=Relation(type='PART_OF', properties=None), object=Entity(type='THING', name='Tech Industry', properties=None))]

# With custom extraction prompt:
# triplets=[Triplet(subject=Entity(type='PERSON', name='Alice Johnson', properties=None), relation=Relation(type='IS_A', properties=None), object=Entity(type='THING', name='software engineer', properties=None)), Triplet(subject=Entity(type='PERSON', name='Alice Johnson', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='PLACE', name='Vienna', properties=None)), Triplet(subject=Entity(type='PERSON', name='Alice Johnson', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='ThinkPad X1 Carbon', properties=None)), Triplet(subject=Entity(type='THING', name='ThinkPad X1 Carbon', properties=None), relation=Relation(type='PART_OF', properties=None), object=Entity(type='THING', name='laptop', properties=None)), Triplet(subject=Entity(type='THING', name='ThinkPad X1 Carbon', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='laptop', properties=None)), Triplet(subject=Entity(type='PERSON', name='Alice Johnson', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='iPhone 14', properties=None)), Triplet(subject=Entity(type='THING', name='iPhone 14', properties=None), relation=Relation(type='PART_OF', properties=None), object=Entity(type='THING', name='smartphone', properties=None)), Triplet(subject=Entity(type='PERSON', name='Alice Johnson', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='smartphone', properties=None)), Triplet(subject=Entity(type='PERSON', name='Alice Johnson', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='backend developer', properties=None)), Triplet(subject=Entity(type='PERSON', name='Alice Johnson', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='role', properties=None)), Triplet(subject=Entity(type='PERSON', name='Alice Johnson', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='age', properties=None)), Triplet(subject=Entity(type='PERSON', name='Bob Smith', properties=None), relation=Relation(type='IS_A', properties=None), object=Entity(type='THING', name='manager', properties=None)), Triplet(subject=Entity(type='PERSON', name='Bob Smith', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='role', properties=None)), Triplet(subject=Entity(type='PERSON', name='Bob Smith', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='age', properties=None)), Triplet(subject=Entity(type='PERSON', name='Bob Smith', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='supervisor', properties=None)), Triplet(subject=Entity(type='PERSON', name='Bob Smith', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='PERSON', name='Alice Johnson', properties=None)), Triplet(subject=Entity(type='PLACE', name='Vienna', properties=None), relation=Relation(type='PART_OF', properties=None), object=Entity(type='PLACE', name='Austria', properties=None)), Triplet(subject=Entity(type='THING', name='Acme Corp', properties=None), relation=Relation(type='PART_OF', properties=None), object=Entity(type='THING', name='Tech Industry', properties=None)), Triplet(subject=Entity(type='THING', name='Acme Corp', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='founded', properties=None)), Triplet(subject=Entity(type='THING', name='Acme Corp', properties=None), relation=Relation(type='HAS', properties=None), object=Entity(type='THING', name='since', properties=None))]

extent analysis

Fix Plan

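The suggested fix is to name the desired entity and relation properties explicitly in the extraction prompt, so the LLM is asked to fill them in rather than emit them as separate entities (as happened with 'age', 'role' and 'since' in the custom-prompt output above).

Here are the steps:

  • Update extract_prompt so it lists the properties by name, and pass it to the extractor (the extract_prompt argument is commented out in the minimal example).
  • Set allow_additional_properties to True in the SchemaLLMPathExtractor to allow properties that are not defined in the schema.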

Code Changes

extract_prompt = PromptTemplate(
    "Extract a property graph from the text.\n"
    "Return a list of (subject, relation, object) triples.\n"
    "Include entity and relation properties if present, such as age, role, since, type.\n"
    "Use only the provided schema.\n"
    "{text}\n"
    "-------\n"
)

kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=schema,
    possible_entity_props=entity_props,
    possible_relation_props=relation_props,
    extract_prompt=extract_prompt,  # was commented out in the minimal example
    strict=True,
    max_triplets_per_chunk=10,
    allow_additional_properties=True,  # was False in the minimal example
)
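One way to keep the prompt wording and the property lists from drifting apart is to generate the prompt text from entity_props and relation_props. The following is a minimal sketch in plain Python; build_extract_prompt is a hypothetical helper, not part of LlamaIndex, and the resulting string would still need to be wrapped in PromptTemplate as in the code above.

```python
from typing import List, Tuple

Prop = Tuple[str, str]  # (property name, short description)


def build_extract_prompt(entity_props: List[Prop], relation_props: List[Prop]) -> str:
    """Hypothetical helper: return prompt text that names every allowed property.

    The returned string keeps a literal {text} placeholder for the document chunk.
    """

    def describe(props: List[Prop]) -> str:
        return "\n".join(f"  - {name}: {desc}" for name, desc in props)

    return (
        "Extract a property graph from the text.\n"
        "Return a list of (subject, relation, object) triples.\n"
        "Fill in these entity properties when the text states them:\n"
        f"{describe(entity_props)}\n"
        "Fill in these relation properties when the text states them:\n"
        f"{describe(relation_props)}\n"
        "Use only the provided schema.\n"
        "{text}\n"  # plain (non-f) string, so the placeholder survives
        "-------\n"
    )


# Same property lists as in the minimal example.
entity_props = [("age", "age of a person"), ("role", "job or profession")]
relation_props = [
    ("since", "when the relation started"),
    ("type", "type or category of the relation"),
]

prompt_text = build_extract_prompt(entity_props, relation_props)
print(prompt_text.format(text="Alice Johnson is a 30 year old software engineer."))
```

Whether a more explicit prompt is enough to make the extractor return properties is not confirmed by the issue; this only removes one source of mismatch between the prompt and the configured property names.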

Verification

After these changes, the extracted triples should carry populated properties fields instead of properties=None. To check, print kg_schema right before the pruning step, as in the results above, and look for any non-None properties value.
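Scanning a long printed list for properties=None by eye is error-prone, so the check can be sketched as a small filter. The dataclasses below are hypothetical stand-ins that only mirror the field names visible in the printed output (subject, relation, object, type, name, properties); they are not the classes LlamaIndex generates, and the dict used for a filled-in properties value is an illustrative assumption about its shape.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Entity:  # hypothetical stand-in for the printed Entity(...)
    type: str
    name: str
    properties: Optional[Any] = None


@dataclass
class Relation:  # hypothetical stand-in for the printed Relation(...)
    type: str
    properties: Optional[Any] = None


@dataclass
class Triplet:  # hypothetical stand-in for the printed Triplet(...)
    subject: Entity
    relation: Relation
    object: Entity


def triplets_with_properties(triplets: List[Triplet]) -> List[Triplet]:
    """Return the triplets where the subject, relation or object has properties."""
    return [
        t
        for t in triplets
        if t.subject.properties or t.relation.properties or t.object.properties
    ]


sample = [
    # Shape taken from the issue output: nothing extracted.
    Triplet(
        Entity("PERSON", "Bob Smith"),
        Relation("IS_A"),
        Entity("THING", "manager"),
    ),
    # Illustrative only: what a successful extraction might look like.
    Triplet(
        Entity("PERSON", "Alice Johnson", properties={"age": "30"}),
        Relation("IS_A"),
        Entity("THING", "software engineer"),
    ),
]

flagged = triplets_with_properties(sample)
print(f"{len(flagged)} of {len(sample)} triplets carry properties")
```

The same attribute checks should apply to the real objects as long as they expose the field names shown in the printed output.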

Extra Tips

  • Keep the extract_prompt clear and concise so the LLM has less room to misread it.
  • Iterate on the extract_prompt wording until the output has the shape you expect.
  • If properties are still missing, print kg_schema to inspect the extracted triples before invalid ones are pruned.
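When comparing the output before and after pruning, it helps to know which triples a strict schema check would reject. The sketch below assumes the simplest reading of kg_validation_schema, namely that a triple is valid only if its relation is listed under the subject's entity type. That rule is an assumption for illustration; the library's actual pruning logic may apply additional or different checks. split_by_schema is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# (subject_type, subject_name, relation, object_type, object_name)
Triple = Tuple[str, str, str, str, str]


def split_by_schema(
    triples: List[Triple], schema: Dict[str, List[str]]
) -> Tuple[List[Triple], List[Triple]]:
    """Split triples into (kept, dropped) under the assumed subject-type rule."""
    kept: List[Triple] = []
    dropped: List[Triple] = []
    for triple in triples:
        subject_type, _, relation, _, _ = triple
        if relation in schema.get(subject_type, []):
            kept.append(triple)
        else:
            dropped.append(triple)
    return kept, dropped


# Schema from the minimal example.
schema = {
    "PERSON": ["PART_OF", "HAS", "IS_A"],
    "PLACE": ["PART_OF", "HAS"],
    "THING": ["IS_A"],
}

# Three triples taken from the default-prompt output in the issue.
triples = [
    ("PERSON", "Alice Johnson", "IS_A", "THING", "software engineer"),
    ("THING", "ThinkPad X1 Carbon", "HAS", "THING", "laptop"),
    ("THING", "Acme Corp", "PART_OF", "THING", "Tech Industry"),
]

kept, dropped = split_by_schema(triples, schema)
for triple in dropped:
    print("would be dropped:", triple)
```

Under this assumed rule, the two THING-subject triples are rejected because THING only allows IS_A, which suggests the schema itself may be worth revisiting alongside the prompt.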
