Ontology building

This document describes ontology building in the ONTOLOGY module of the knowledge platform. It covers the ontology modeling workflow and the detailed steps for LLM modeling.

Overview

The ONTOLOGY module is a core component of the knowledge platform that models enterprise data as a semantic knowledge graph. Through ontology modeling, you define the semantic relationships within your data to build a visual, queryable knowledge network that supports inference. This module includes the following features:

  • Ontology Graph: A visual graph of Objects and Links that supports interactive exploration and multi-hop traversal.

  • Type Definitions: Manages the type metadata for your ontology, including definitions for Object Types, Link Types, and Action Types. It supports property constraints in the JSON Schema format.

  • Instance management: Manages the data instances in the Ontology Graph. This includes creating, retrieving, updating, and deleting Objects and Links, with support for bulk operations and data retrieval.

  • Data management (ontology building): Provides modeling tools to automatically generate ontology definitions from data sources. It supports LLM modeling, quick modeling, schema change detection, and data synchronization.

  • Version control: Tracks the change history of ontology definitions, allowing you to compare versions and roll back to any previous version.

  • Permission management: Provides fine-grained access control (FGAC) to support data isolation at both the Object and property levels.

Ontology modeling workflow

The complete ontology modeling workflow includes the following steps:

  1. Create a dataset

    On the ONTOLOGY overview page, click + New in the upper-right corner to create a new dataset. A dataset is a container for an ontology that organizes and manages related ontology definitions.

  2. Generate ontology definitions

    Go to the Data Management page and choose a modeling method to generate ontology definitions:

    • LLM modeling (Recommended): Uses a large language model (LLM) to analyze a database schema and generate ontology definitions automatically. This method is well suited to complex data structures. For more information, see the LLM modeling section.

    • Quick modeling: Generates ontology definitions by directly mapping the database table structure, where each table corresponds to an Object Type. This method is suitable for simple data structures with straightforward mappings.

  3. Review and refine type definitions

    Go to the Type Definitions page to review the automatically generated Object Types, Link Types, and Action Types. Edit their names, properties, and descriptions to align the definitions with your business requirements.

  4. Synchronize data

    On the Data Management > Data Synchronization page, configure a data synchronization task to import data from the source database into the knowledge graph based on your ontology definitions.

  5. Explore and validate the graph

    Go to the Ontology Graph page to visually inspect the Objects and Links and verify that the modeling results meet expectations. This page supports multi-hop traversal and interactive exploration.

  6. Continuously iterate

    When the source data schema changes, use the Data Management > Schema Change Detection feature to detect the differences and update the ontology definitions and data.
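
The direct mapping that Quick modeling performs in step 2 (one table per Object Type) can be pictured with a short Python sketch. Everything in it is an illustrative assumption rather than the platform's actual output format: the `tables` input, the `quick_model` and `to_type_name` helpers, and the `name`, `source_table`, and `properties` keys are all hypothetical.

```python
from typing import Dict, List


def to_type_name(table_name: str) -> str:
    """Convert a snake_case table name to a PascalCase type name (assumed convention)."""
    return "".join(part.capitalize() for part in table_name.split("_") if part)


def quick_model(tables: Dict[str, List[str]]) -> List[dict]:
    """Map each table to one Object Type whose properties mirror the columns."""
    return [
        {"name": to_type_name(table), "source_table": table, "properties": list(columns)}
        for table, columns in tables.items()
    ]


# Hypothetical source schema: table name -> column names.
tables = {
    "customer": ["id", "name", "email"],
    "sales_order": ["id", "customer_id", "total"],
}
object_types = quick_model(tables)
for object_type in object_types:
    print(object_type["name"], object_type["properties"])
```

A mapping like this carries no knowledge of business meaning, which is why the workflow includes a review step for names, properties, and descriptions.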

LLM modeling

The LLM modeling wizard guides you through three steps to automatically generate and register ontology definitions from a database schema.

Step 1: Configure connection

Configure the data source connection and LLM analysis parameters. You can connect to your database in one of three ways:

| Connection method | Description | Use case |
| --- | --- | --- |
| Project default instance | Uses the PolarDB for PostgreSQL cluster configured in the backend. You only need to select the Source Database Name and Schema. | The data source and the platform are in the same PolarDB for PostgreSQL cluster. |
| Database connection parameters | Manually enter the Host, Port, Database, Username, and Password. | The data source is an external, standalone database. |
| DSN connection string | Enter a standard PostgreSQL connection string (postgresql://user:pass@host:port/db). | You already have a connection string. |

Note

When you use the Database connection parameters or DSN connection string method, you must first click Test Connection to verify connectivity. You can select a schema only after the test passes.
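
A DSN connection string carries the same fields as the Database connection parameters method. The sketch below shows the correspondence using Python's standard `urllib.parse` module. The `parse_dsn` helper and the sample credentials are hypothetical, and the fallback to port 5432 (PostgreSQL's conventional default) is an assumption about how a missing port might be handled, not documented platform behavior.

```python
from urllib.parse import urlsplit, unquote


def parse_dsn(dsn: str) -> dict:
    """Split a postgresql:// DSN into discrete connection parameters."""
    parts = urlsplit(dsn)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # assumed fallback: PostgreSQL's default port
        "database": parts.path.lstrip("/"),
        "username": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
    }


# Placeholder values for illustration only.
params = parse_dsn("postgresql://app_user:s3cret@db.example.com:5432/sales")
print(params)
```

Special characters in the username or password must be percent-encoded in a DSN, which is why the sketch decodes those two fields with `unquote`.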

Parameters

  • Schema: The database schema to analyze. The default is public. The system automatically lists available schemas for you to select.

  • Business Context (Optional): A natural language description of your business domain. This context helps the LLM more accurately understand the business meaning of your table structures.

  • Output Language: The language (Chinese or English) used for the generated display_name and description.

  • Generate ActionType: Specifies whether the LLM should recommend executable business operations for each Object.

  • Advanced options:

    • Exclude Tables: A comma-separated list of table name patterns (wildcards supported) to exclude from modeling.

    • Custom LLM Configuration: Lets you specify a custom LLM model name, API key, and base URL.

    • Wide Table Entity Extraction: Splits fields in a wide table into separate entities, generating independent Object Types.

    • Analysis Timeout: The timeout for the LLM analysis, which defaults to 5 minutes. You can extend this time for complex schemas.
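
To illustrate how a comma-separated Exclude Tables value might be applied, here is a minimal sketch. It assumes shell-style wildcards (`*` and `?`) as implemented by Python's standard `fnmatch` module; this document does not specify the wildcard syntax the platform uses, so confirm the behavior before relying on a pattern. The `filter_tables` helper and the table names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def filter_tables(table_names: List[str], exclude: str) -> List[str]:
    """Return the tables that match none of the comma-separated patterns."""
    patterns = [p.strip() for p in exclude.split(",") if p.strip()]
    return [
        name
        for name in table_names
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


tables = ["customer", "sales_order", "tmp_import", "audit_log", "audit_log_2023"]
kept = filter_tables(tables, "tmp_*, audit_log*")
print(kept)
```

Excluding temporary, staging, and audit tables in this way keeps them out of the metadata sent to the LLM, which can also shorten the analysis.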

After you configure the settings, click Start Analysis to begin modeling. The system connects to the database, extracts table metadata, column metadata, and sample data, and then calls the LLM to analyze them and generate type definitions.

Note

LLM modeling is intended for initial modeling only. If the current dataset already has ontology definitions, the system prompts you to use the Data Management > Schema Change Detection feature to perform incremental updates.

Step 2: Preview and refine

After the LLM analysis is complete, the system opens a preview page displaying all the generated type definitions.

  • View modes:

    • List View: Displays Object Types, Link Types, and Action Types as cards organized in tabs.

    • Graph View: Shows the relationship structure between types in a visual graph.

  • Editing operations:

    • Edit: Click the edit button on a card to directly modify the type definition in the JSON editor.

    • Delete: Supports cascade deletion. When you delete an Object Type, any Link Types and Action Types that reference it are automatically removed.

    • Edit from graph view: You can also select and edit nodes or edges directly in the graph view.
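
The cascade deletion behavior described for the Delete operation can be sketched as follows. The dictionary layout is a hypothetical stand-in for the real type definitions: the `source`, `target`, and `object_type` keys are assumptions made for illustration, not the platform's JSON format.

```python
from typing import Dict, List


def delete_object_type(definitions: Dict[str, List[dict]], name: str) -> Dict[str, List[dict]]:
    """Remove an Object Type plus every Link Type and Action Type that references it."""
    return {
        "object_types": [t for t in definitions["object_types"] if t["name"] != name],
        "link_types": [
            link for link in definitions["link_types"]
            if name not in (link["source"], link["target"])
        ],
        "action_types": [
            action for action in definitions["action_types"]
            if action["object_type"] != name
        ],
    }


# Hypothetical definitions used only to exercise the sketch.
definitions = {
    "object_types": [{"name": "Customer"}, {"name": "Order"}, {"name": "Product"}],
    "link_types": [
        {"name": "PLACED", "source": "Customer", "target": "Order"},
        {"name": "CONTAINS", "source": "Order", "target": "Product"},
    ],
    "action_types": [{"name": "CancelOrder", "object_type": "Order"}],
}
remaining = delete_object_type(definitions, "Customer")
print([link["name"] for link in remaining["link_types"]])
```

Because a deletion can remove more than the card you clicked, it is worth checking the Graph View afterward to see which Link Types and Action Types were removed with it.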

Click Compile and Check to validate the integrity of the current definitions. The checks cover type name uniqueness, the presence of an id property, and the validity of Link Type references. An Auto-fix feature can resolve common issues with a single click.
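
The three checks named above can be approximated with a short sketch. It is not the platform's validator: the definition layout, the `source` and `target` keys, and the choice to enforce name uniqueness within each kind of type (rather than across all kinds) are assumptions made for illustration. The `properties` values mimic JSON Schema fragments.

```python
from typing import Dict, List


def check_definitions(definitions: Dict[str, List[dict]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks pass."""
    problems = []

    # Check 1: type names are unique within each kind (assumed scope).
    for kind in ("object_types", "link_types", "action_types"):
        seen = set()
        for type_def in definitions.get(kind, []):
            if type_def["name"] in seen:
                problems.append(f"duplicate type name: {type_def['name']}")
            seen.add(type_def["name"])

    # Check 2: every Object Type declares an id property.
    for object_type in definitions.get("object_types", []):
        if "id" not in object_type.get("properties", {}):
            problems.append(f"{object_type['name']}: missing id property")

    # Check 3: every Link Type endpoint refers to a known Object Type.
    object_names = {t["name"] for t in definitions.get("object_types", [])}
    for link in definitions.get("link_types", []):
        for end in ("source", "target"):
            if link[end] not in object_names:
                problems.append(f"{link['name']}: unknown {end} type {link[end]}")
    return problems


# Hypothetical definitions with two deliberate mistakes.
definitions = {
    "object_types": [
        {"name": "Customer", "properties": {"id": {"type": "string"}, "name": {"type": "string"}}},
        {"name": "Order", "properties": {"total": {"type": "number"}}},
    ],
    "link_types": [{"name": "PLACED", "source": "Customer", "target": "Invoice"}],
    "action_types": [],
}
problems = check_definitions(definitions)
for problem in problems:
    print(problem)
```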

Step 3: Register and synchronize

After confirming the ontology definitions are correct, register them and synchronize the data.

Register definitions

  1. Before registration, the system validates the definitions to ensure their format and references are correct.

  2. After the validation passes, click Register to bulk import the ontology definitions into the system. This process creates the Object Types, Link Types, and Action Types, along with their corresponding vertex labels and edge labels in the graph database.

Data synchronization

After registration, the system automatically starts a data synchronization task to import data from the source database into the knowledge graph based on the ontology definitions. Data synchronization uses MERGE (UPSERT) semantics:

  • If a target instance does not exist, the system creates it (INSERT).

  • If a target instance already exists, the system updates it (UPDATE).

The system uses IDs to identify existing instances, which prevents duplicate data. For small datasets, synchronization runs synchronously. For large datasets, it runs as an asynchronous background task.
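
The MERGE (UPSERT) behavior can be illustrated with an in-memory sketch. A plain dictionary keyed by ID stands in for the graph database here, and the `merge_instances` helper and sample rows are hypothetical; the real synchronization task runs inside the platform.

```python
from typing import Dict, Iterable


def merge_instances(graph: Dict[str, dict], rows: Iterable[dict]) -> Dict[str, int]:
    """Upsert rows into an in-memory store keyed by ID and count each outcome."""
    counts = {"inserted": 0, "updated": 0}
    for row in rows:
        key = row["id"]
        if key in graph:
            graph[key].update(row)  # existing instance: UPDATE
            counts["updated"] += 1
        else:
            graph[key] = dict(row)  # new instance: INSERT
            counts["inserted"] += 1
    return counts


graph: Dict[str, dict] = {}
first = merge_instances(graph, [{"id": "c1", "name": "Ada"}, {"id": "c2", "name": "Bo"}])
second = merge_instances(graph, [{"id": "c1", "name": "Ada L."}, {"id": "c3", "name": "Cy"}])
print(first, second, len(graph))
```

Because instances are matched by ID, running the same synchronization twice updates existing instances instead of creating duplicates.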

Note

During data synchronization, ensure that the connection to the source database is stable and that its data tables are accessible. After synchronization is complete, you can view the imported Objects and Links on the Ontology Graph page.

Next steps

After building the ontology, you can perform the following operations:

  • Graph exploration: On the Ontology Graph page, browse Objects and Links through a visual interface that supports multi-hop traversal and conditional filtering.

  • Instance management: On the Instance Management page, view, edit, and delete specific data instances. Bulk operations are supported.

  • Version control: On the Version Control page, view the change history of ontology definitions. Version comparison and rollbacks are supported.

  • Permission management: On the Permission Management page, configure fine-grained access control to achieve data isolation at the Object and property levels.

  • Schema change detection: When your data source schema changes, use the Data Management > Schema Change Detection feature to automatically identify differences and update your ontology definitions.
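
As a rough picture of what schema change detection has to compute, the sketch below compares two snapshots of a source schema. It is only an approximation: this document does not say which differences the feature detects (for example, column type changes), and the snapshot format and the `diff_schema` helper are hypothetical.

```python
from typing import Dict, List


def diff_schema(old: Dict[str, List[str]], new: Dict[str, List[str]]) -> dict:
    """Compare two snapshots (table name -> column names) and report differences."""
    changes = {
        "added_tables": sorted(set(new) - set(old)),
        "removed_tables": sorted(set(old) - set(new)),
        "changed_tables": {},
    }
    for table in sorted(set(old) & set(new)):
        added = sorted(set(new[table]) - set(old[table]))
        removed = sorted(set(old[table]) - set(new[table]))
        if added or removed:
            changes["changed_tables"][table] = {
                "added_columns": added,
                "removed_columns": removed,
            }
    return changes


# Hypothetical snapshots taken before and after a source schema change.
old = {"customer": ["id", "name"], "legacy": ["id"]}
new = {"customer": ["id", "name", "email"], "invoice": ["id"]}
changes = diff_schema(old, new)
print(changes)
```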