Skip to content

Database Schema

Open Assistant uses SQLite (or PostgreSQL in production) to persist conversations, task history, credentials, and system state.

Entity Relationship Diagram

erDiagram
    conversations ||--o{ messages : contains
    conversations ||--o{ agent_tasks : triggers
    conversations ||--o{ conversation_memory : stores
    conversations ||--o{ future_tasks : schedules
    cron_jobs ||--o{ job_executions : has
    future_tasks ||--o{ job_executions : has
    agent_definitions ||--o{ agent_tasks : defines

    conversations {
        int id PK
        text conversation_id UK
        text channel
        text contact_identifier
        int context_version
        timestamp last_accessed
        timestamp created_at
        timestamp updated_at
        json metadata
    }

    messages {
        int id PK
        text conversation_id FK
        text message_id UK
        text role
        text content
        int token_count
        boolean is_summary
        timestamp timestamp
        json metadata
    }

    agent_definitions {
        int id PK
        text name UK
        text display_name
        text role
        text goal
        text backstory
        json tools
        int priority
        json intent_keywords
        text category
        boolean enabled
        boolean allow_delegation
        timestamp created_at
        timestamp updated_at
    }

    agent_tasks {
        int id PK
        text task_id UK
        text conversation_id FK
        text agent_name
        text action
        json parameters
        text status
        json result
        text error_message
        timestamp created_at
        timestamp started_at
        timestamp completed_at
    }

    cron_jobs {
        int id PK
        text job_id UK
        text name
        text description
        text cron_expression
        text job_type
        text tool_name
        json tool_parameters
        text prompt
        boolean enabled
        timestamp last_run_at
        timestamp next_run_at
        timestamp created_at
        timestamp updated_at
    }

    future_tasks {
        int id PK
        text task_id UK
        text name
        text description
        text conversation_id FK
        timestamp scheduled_time
        text job_type
        text tool_name
        json tool_parameters
        text prompt
        text status
        json result
        text error_message
        timestamp created_at
        timestamp completed_at
    }

    job_executions {
        int id PK
        text job_id FK
        text job_type
        timestamp started_at
        timestamp completed_at
        text status
        json result
        text error_message
        text container_id
    }

    service_credentials {
        int id PK
        text service_name UK
        text credential_type
        text credential_data
        timestamp expires_at
        timestamp created_at
        timestamp updated_at
    }

    service_connections {
        int id PK
        text service_name UK
        text status
        timestamp last_check
        text last_error
        json metadata
        timestamp updated_at
    }

    prompts {
        int id PK
        text key UK
        text value
        text description
        timestamp created_at
        timestamp updated_at
    }

    settings {
        int id PK
        text key UK
        text value
        text value_type
        text description
        text category
        boolean is_required
        boolean is_sensitive
        text validation_regex
        real min_value
        real max_value
        json options
        int display_order
        timestamp updated_at
    }

    audit_log {
        int id PK
        timestamp timestamp
        text event_type
        text service_name
        text agent_name
        text action
        text conversation_id
        json details
        boolean success
        text error_message
        text user_id
        text ip_address
    }

    conversation_memory {
        int id PK
        text conversation_id FK
        text memory_type
        json content
        timestamp created_at
    }

    search_index {
        int id PK
        text source
        text source_id
        text title
        text content
        text content_hash
        blob embedding
        json metadata
        timestamp indexed_at
    }

    schema_migrations {
        int id PK
        text version UK
        timestamp applied_at
    }

Schema Overview

Core Tables

1. Conversations

Stores conversation sessions across all communication channels.

Purpose: Track conversation context and history per channel Key Fields: - conversation_id: Unique identifier for the conversation - channel: Communication channel (whatsapp, webui) - contact_identifier: Phone number or session ID - context_version: Context management version - last_accessed: Last activity timestamp

Relationships: - One-to-many with messages - One-to-many with agent_tasks - One-to-many with conversation_memory

2. Messages

Individual messages within conversations.

Purpose: Store complete conversation history with role-based messages Key Fields: - role: Message source (user, assistant, system) - content: Message text - token_count: Number of tokens in the message - is_summary: Whether this message is a generated summary - metadata: Attachments, agent info, etc.

Relationships: - Many-to-one with conversations

3. Agent Definitions

Stores CrewAI agent configurations.

Purpose: Define agents with their roles, goals, and tool assignments Key Fields: - name: Unique agent identifier (coordinator, research, communication, writer, file_handler, planner, navigator, system, browser) - role: Agent role description - goal: What the agent aims to achieve - backstory: System prompt / detailed instructions - tools: JSON array of assigned tool names - priority: Selection priority (1-10, higher = selected first) - intent_keywords: JSON array of keywords for skill matching - category: Agent category - allow_delegation: Whether agent can delegate to others

Relationships: - One-to-many with agent_tasks

Default Agents: - system - System introspection and memory management - navigator - Geographic & route planning - file_handler - File management across Nextcloud/OneDrive - writer - Content writing - browser - Interactive web browsing - planner - Calendar and scheduling - communication - Email and messaging - research - Information retrieval and search - coordinator - Task coordination and delegation

See Agent Architecture for detailed agent definitions.

4. Agent Tasks

Tracks tasks executed by specialized agents.

Purpose: Monitor agent execution, results, and errors Key Fields: - agent_name: Which agent handled the task - action: What action was performed - status: pending, running, completed, failed - result: Task output data

Relationships: - Many-to-one with conversations (nullable) - Many-to-one with agent_definitions

Scheduling Tables

5. Cron Jobs

Scheduled recurring tasks with Docker-isolated execution.

Purpose: Define and manage scheduled tasks (tool calls or prompts) Key Fields: - cron_expression: Cron schedule (e.g., "0 9 * * MON") - job_type: 'tool' for direct execution, 'prompt' for agent processing - tool_name: Tool to execute (if job_type='tool') - tool_parameters: JSON parameters for tool - prompt: Prompt to send to coordinator (if job_type='prompt') - enabled: Active/inactive status - next_run_at: Calculated next execution time

Relationships: - One-to-many with job_executions

6. Future Tasks

One-time scheduled tasks.

Purpose: Schedule tasks to execute once at a specific time Key Fields: - name: Human-readable task name - description: Task description - conversation_id: Associated conversation (nullable) - scheduled_time: When to execute - job_type: 'tool' or 'prompt' - status: pending, completed, failed, cancelled - Same tool_name/tool_parameters/prompt fields as cron_jobs

Relationships: - Many-to-one with conversations (nullable) - One-to-many with job_executions

7. Job Executions

Unified execution history for both cron jobs and future tasks.

Purpose: Track execution history for monitoring and debugging Key Fields: - job_id: References cron_jobs.job_id or future_tasks.task_id - job_type: 'cron' or 'future_task' - status: running, success, failed - container_id: Docker container ID (for isolated execution) - result: Execution output - error_message: Failure details

Relationships: - Many-to-one with cron_jobs

Service Integration Tables

8. Service Credentials

Encrypted storage for OAuth tokens and API keys.

Purpose: Securely store authentication credentials Key Fields: - service_name: gmail, outlook, notion, etc. - credential_type: oauth_token, api_key, app_password - credential_data: Encrypted JSON blob - expires_at: Token expiration timestamp

Security: All credential_data is encrypted using Fernet symmetric encryption

9. Service Connections

Track connection status of integrated services.

Purpose: Monitor service health and connectivity Key Fields: - service_name: Service identifier (unique) - status: connected, disconnected, error - last_check: Last health check timestamp - last_error: Most recent error message - metadata: Service-specific info (email, username)

Personalization Tables

10. Prompts

Stores assistant personalization configuration: system prompt, memory, and soul.

Purpose: Enable users to customize the assistant's behavior, knowledge, and personality Key Fields: - key: Prompt identifier (system_prompt_default, system_prompt_custom, memory, soul) - value: Prompt text content - description: Explanation of the prompt's purpose

Default Entries: - system_prompt_default: Base system prompt defining core assistant behavior - system_prompt_custom: User-defined custom instructions added on top of the default - memory: Text-based store for user context (name, preferences, people, places, relations) - soul: Personality and communication style description, shaped by user feedback

System Tables

11. Settings

Application-wide configuration and user preferences.

Purpose: Store dynamic configuration values Key Fields: - key: Setting identifier - value: Setting value (stored as string) - value_type: Type hint for parsing (string, int, bool, json) - category: Setting category (default: 'application') - is_required: Whether the setting is required - is_sensitive: Whether the value is sensitive (e.g., API keys) - validation_regex: Optional regex for value validation - min_value/max_value: Numeric validation bounds - options: JSON array of valid values (for enum-like settings) - display_order: UI ordering hint

12. Audit Log

Complete audit trail of system operations.

Purpose: Security auditing and debugging Key Fields: - event_type: Type of event (api_call, agent_action, etc.) - success: Operation outcome - details: Event-specific data - user_id: Associated user identifier - ip_address: Client IP address

13. Conversation Memory

Stores conversation context and memory for each conversation.

Purpose: Enable context-aware responses with short-term and long-term memory Key Fields: - memory_type: Type of memory (short_term, long_term, facts, working) - content: JSON containing memory data - created_at: When memory was stored

Relationships: - Many-to-one with conversations

Memory Types: - short_term: Recent messages (last 10-20) - long_term: Conversation summaries - facts: Extracted facts (names, preferences, dates) - working: Current task context

14. Search Index

Stores searchable content with embeddings for semantic search.

Purpose: Enable semantic search across indexed content Key Fields: - source: Source system (e.g., 'memory', 'file', 'email') - source_id: Unique ID within the source system - title: Content title - content: Full text content - content_hash: Hash for deduplication - embedding: Vector embedding (BLOB) - metadata: Additional source-specific data - indexed_at: When content was indexed

Constraints: Unique on (source, source_id)

15. Schema Migrations

Tracks applied database migrations.

Purpose: Prevent re-running migrations on startup Key Fields: - version: Migration version identifier (unique) - applied_at: When the migration was applied

Data Flow Diagram

flowchart TD
    User[User Request] --> Channel{Channel}
    Channel -->|WhatsApp| WA[WhatsApp Handler]
    Channel -->|Web UI| UI[Web UI]

    WA --> Conv[Create/Get Conversation]
    UI --> Conv

    Conv --> Msg[Store Message]
    Msg --> Orch[Orchestrator]

    Orch --> Agent[Select Agent]
    Agent --> Task[Create Agent Task]
    Task --> Execute[Execute Action]

    Execute --> Service[Call External Service]
    Service --> Cred[(Service Credentials)]
    Service --> Conn[(Service Connections)]

    Execute --> Result[Store Result]
    Result --> Response[Create Response Message]
    Response --> Audit[(Audit Log)]

    Response --> User

Indexing Strategy

High-Traffic Queries

  • messages.conversation_id - Frequently queried for conversation history
  • agent_tasks.status - Task monitoring queries
  • conversations.updated_at - Recent conversations listing
  • audit_log.timestamp - Time-based audit queries

Optimization

  • Composite indexes on frequently filtered combinations
  • Foreign key indexes for join performance
  • Timestamp indexes for time-range queries

Data Retention Policies

Table Retention Policy
conversations Indefinite (user-managed)
messages Indefinite (user-managed)
agent_tasks Last 1000 per conversation
cron_job_executions Last 100 per job
audit_log 30 days
service_credentials Until revoked
service_connections Until disconnected
prompts Indefinite (user-managed)
settings Indefinite
conversation_memory Short-term: 20 msgs, Long-term: indefinite
future_tasks 90 days after execution

Encryption

Encrypted Fields

  • service_credentials.credential_data - Fernet symmetric encryption

Encryption Key Management

  • Key stored in environment variable: ENCRYPTION_KEY
  • Generated using: Fernet.generate_key()
  • Never committed to version control

Backup Strategy

flowchart LR
    DB[(SQLite DB)] -->|Daily| Backup[Automated Backup]
    Backup --> S3[Backup Storage]
    DB -->|Manual| ManualBackup[Manual Backup]
    ManualBackup --> Local[Local Storage]

Automated Backups: - Frequency: Daily at 3 AM - Retention: Last 7 daily, 4 weekly, 3 monthly - Location: data/backups/

Manual Backups:

python -m src.core.database backup

Database Files

  • data/assistant.db - Main application database
  • data/jobs.db - APScheduler job store (managed by APScheduler)
  • data/backups/ - Database backups directory

Migration Management

Migrations are located in src/core/migrations/ and applied automatically on startup.

Migration Format: {version}_{description}.sql

Example: 043_oss_db_init.sql

Applying Migrations:

python -m src.core.database init

Migration Tracking: schema_migrations table tracks applied migrations

Performance Considerations

SQLite Limitations

  • Single writer at a time
  • Not suitable for high-concurrency writes
  • Consider PostgreSQL for production

Optimization Tips

  • Use connection pooling
  • Enable WAL mode for better concurrency
  • Regular VACUUM operations
  • Monitor database size

Migration to PostgreSQL

The schema is designed to be compatible with PostgreSQL. Migration path: 1. Export data from SQLite 2. Update DATABASE_URL to PostgreSQL connection string 3. Run migrations on PostgreSQL 4. Import data

Schema Versioning

Schema changes are tracked through: - Migration files in src/core/migrations/ - schema_migrations table entries - Documentation updates in this file