Setting up a streamlined local development environment is crucial for building and testing this system efficiently. Let’s get this documented.
The Core Question: docling CLI vs. API
You’ve hit on a key architectural point. While having docling as a CLI is great for one-off tests, for an application integration, you absolutely want a containerized API version.
Why you need the API version:
- Decoupling: Your Laravel application should not depend on a specific executable being in a specific path on the host machine. It should communicate with a service over a network protocol (HTTP). This makes your app portable and mirrors a production setup.
- Process Management: Calling a CLI from a PHP queue worker (shell_exec or Process) is complex to manage. You have to handle stdout/stderr, process timeouts, and potential hangs. An HTTP API is far more robust and predictable.
- Concurrency: A dedicated API service can handle multiple concurrent requests from your queue workers far more gracefully than spawning multiple CLI processes.
- State & Caching: The API version can maintain state or caches (like loaded models) in memory, making subsequent calls much faster. The CLI has to bootstrap from scratch every single time.
Conclusion: We will proceed with the setup assuming docling runs as a containerized HTTP service.
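To make the contrast concrete, here is a sketch of what the HTTP contract might look like. The endpoint path and field names are assumptions (they mirror what the Laravel job later in this guide sends to `/process`); check the docling docs for the real schema.

```shell
# Hypothetical request body; the field names mirror what the Laravel job will send.
payload='{"url": "https://example.test/tmp/doc.pdf", "document_id": "abc123"}'
echo "$payload"

# The queue worker would then POST it to the containerized service:
# curl -sS -X POST -H 'Content-Type: application/json' \
#      -d "$payload" http://localhost:8001/process
```

Compare this with shelling out to a CLI, where you would have to manage the child process, its exit code, and its stdout yourself.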
Minimal Development Environment Topology
On a single machine, your services will run in their own processes/containers but communicate over localhost. This diagram shows the logical flow.
+--------------------------------------------------------------------------+
|                   Your Development Machine (localhost)                   |
|                                                                          |
|  +------------------+   (HTTP:8000)    +---------------------------+     |
|  |   Your Browser   | <--------------> |    Laravel Dev Server     |     |
|  +------------------+                  |    (php artisan serve)    |     |
|                                        +-------------+-------------+     |
|                                                      |                   |
|                                     (Dispatches Job) v                   |
|  +--------------------------+              +---------+---------+         |
|  |   Laravel Queue Worker   |              |  Redis / DB Queue |         |
|  | (php artisan queue:work) | <----------> +-------------------+         |
|  +------------+-------------+  (Pulls Job)                               |
|               |                                                          |
|               | (1. Calls API)                                           |
|               v                                                          |
|  +--------------------------+                  +-------------------+     |
|  |   Docling API Container  | (2. Writes Graph |       Neo4j       |     |
|  |   (Docker, Port 8001)    |  Data) --------> |    (Bolt: 7687)   |     |
|  +--------------------------+                  +-------------------+     |
|                                                                          |
|  (Your existing running services, which Laravel connects to)             |
|  +-------------------------+     +--------------------------+            |
|  |         MongoDB         |     |          Minio           |            |
|  |      (Port 27017)       |     |   (Ports 9000, 9001)     |            |
|  +-------------------------+     +--------------------------+            |
+--------------------------------------------------------------------------+
                |
                | (Outbound API Call)
                v
        +-------------------+
        |   Google Gemini   |
        |    API Endpoint   |
        +-------------------+
Setting Up The Development Environment
Here is a step-by-step guide to integrate the new AI components into your existing setup.
Prerequisites
- Your existing Laravel DMS, MongoDB, and Minio are running.
- Neo4j is running and accessible (we’ll assume on localhost).
- Docker and Docker Compose are installed on your machine.
Step 1: Run docling as an API Service
We will use Docker Compose to define and run the docling service. This is clean and easily manageable.
- Create a docker-compose.yml file in the root of your Laravel project (or a dedicated folder).
- Add the docling service definition to this file. The official docling documentation should specify the image name. We’ll use a placeholder, ghcr.io/docling-ai/docling:latest.
```yaml
# docker-compose.yml
version: '3.8'
services:
  docling:
    # Replace with the official docling image if different
    image: ghcr.io/docling-ai/docling:latest
    container_name: docling_api
    ports:
      - "8001:8001" # Expose the service on localhost:8001
    # Add any required environment variables for docling here, if any.
    # environment:
    #   - MODEL_CACHE=/models
    # volumes:
    #   - ./models:/models
    restart: unless-stopped
```
- Start the service: Open your terminal in the same directory as the docker-compose.yml file and run:

```shell
docker-compose up -d
```
- Verify it’s running: You can check the logs with docker-compose logs -f docling. After a minute, test the API endpoint (the exact path may vary; check the docling docs). A simple health-check endpoint is common:

```shell
# Example test command, adjust endpoint as needed
curl http://localhost:8001/health
```
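If the container needs time to start (for example, to load models), a one-shot curl can fail spuriously. A small retry helper makes the check more forgiving; this is just a sketch, and the `/health` path remains an assumption:

```shell
# Poll a URL until it responds successfully or we run out of attempts.
wait_for_http() {
  url=$1
  tries=${2:-5}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage, once the container is starting:
# wait_for_http http://localhost:8001/health 30 || echo "docling not ready"
```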
Step 2: Connect Laravel to the New Services
Now, we need to tell Laravel how to connect to all these new services.
- Install Required PHP Libraries:

```shell
# For connecting to Neo4j
composer require laudis/neo4j-php-client

# For the Gemini API via Prism (as per your plan; verify the current
# package name in the Prism docs, as it may be published as prism-php/prism)
composer require prism-php/gemini
```
- Update Your .env file: Add the connection details for the new services.

```ini
# .env

# Queue Configuration (Redis is recommended for local dev)
QUEUE_CONNECTION=redis
# If you don't have Redis, you can use `database` for testing,
# but run `php artisan queue:table` and `php artisan migrate` first.

# Neo4j Connection Details
NEO4J_HOST=localhost
NEO4J_PORT=7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_neo4j_password

# Docling Service URL
DOCLING_API_URL=http://localhost:8001/process

# Gemini API Key
GEMINI_API_KEY=your_google_ai_studio_api_key
```
- (Optional but Recommended) Create a Config File: To keep things clean, add entries for your services to config/services.php.

```php
// config/services.php
return [
    // ... other services

    'neo4j' => [
        'host' => env('NEO4J_HOST', 'localhost'),
        'scheme' => 'bolt',
        'port' => env('NEO4J_PORT', 7687),
        'user' => env('NEO4J_USER', 'neo4j'),
        'password' => env('NEO4J_PASSWORD', 'password'),
    ],

    'docling' => [
        'url' => env('DOCLING_API_URL'),
    ],

    'gemini' => [
        'key' => env('GEMINI_API_KEY'),
    ],
];
```
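For reference, the Neo4j client in Step 4 assembles these values into a single bolt DSN. A quick sketch of the resulting string (the password here is a stand-in):

```shell
# Mirrors the concatenation done by the PHP job in Step 4.
NEO4J_SCHEME=bolt
NEO4J_HOST=localhost
NEO4J_PORT=7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_neo4j_password

DSN="${NEO4J_SCHEME}://${NEO4J_USER}:${NEO4J_PASSWORD}@${NEO4J_HOST}:${NEO4J_PORT}"
echo "$DSN"   # bolt://neo4j:your_neo4j_password@localhost:7687
```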
Step 3: Set Up and Run the Laravel Queue Worker
The worker is the background process that will do all the heavy lifting.
- Open a new terminal window and navigate to your Laravel project root.
- Run the queue worker: This command starts a worker that will listen for jobs on the queue connection you configured in .env.

```shell
php artisan queue:work
```

Keep this terminal window open. You will see output here when jobs are processed.
Step 4: Implement the Core Logic (Artisan Commands & Jobs)
Now you can start building the pieces that connect everything.
- Create the Main Job: This job will orchestrate the call to docling and then write to Neo4j.

```shell
php artisan make:job ProcessDocumentForGraph
```

app/Jobs/ProcessDocumentForGraph.php
```php
<?php

namespace App\Jobs;

use App\Models\Document; // Your Document model
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Log;
use Laudis\Neo4j\ClientBuilder;

class ProcessDocumentForGraph implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(public Document $document) {}

    public function handle(): void
    {
        Log::info("Processing document for graph: {$this->document->id}");

        // 1. Get a temp URL from Minio for the document
        $fileUrl = $this->document->getTemporaryUrl(); // Implement this method in your model

        // 2. Call the Docling API
        $doclingUrl = config('services.docling.url');
        $response = Http::timeout(300)->post($doclingUrl, [
            'url' => $fileUrl,
            'document_id' => $this->document->id,
        ]);

        if ($response->failed()) {
            Log::error("Docling processing failed for document {$this->document->id}", $response->json() ?? []);
            $this->fail(); // Mark the job as failed
            return;
        }

        $graphData = $response->json();
        Log::info("Docling response received.", $graphData);

        // 3. Write to Neo4j
        $this->writeToNeo4j($graphData);

        // 4. Update the document status in MongoDB
        $this->document->update(['graph_status' => 'processed']);

        Log::info("Successfully processed and stored graph for document: {$this->document->id}");
    }

    private function writeToNeo4j(array $data): void
    {
        $client = ClientBuilder::create()
            ->withDriver('default', sprintf(
                '%s://%s:%s@%s:%s',
                config('services.neo4j.scheme'),
                config('services.neo4j.user'),
                config('services.neo4j.password'),
                config('services.neo4j.host'),
                config('services.neo4j.port')
            ))
            ->build();

        // Example: create nodes and relationships. This needs to be robust.
        $cypher = "MERGE (d:Document {mongo_id: \$docId}) ";
        foreach ($data['entities'] as $entity) {
            // IMPORTANT: sanitize labels, since Cypher cannot parameterize them
            $label = preg_replace('/[^a-zA-Z0-9_]/', '', $entity['label']);
            $cypher .= "MERGE (e:{$label} {name: \$entity_name_{$entity['id']}}) ";
            $cypher .= "MERGE (d)-[:CONTAINS_ENTITY]->(e) ";
        }

        // This is a simplified example. You'll need a more dynamic query builder.
        // In a real app, you would bind parameters instead of building a huge string:
        // $client->run($cypher, $parameters);
        Log::info("Executing Cypher query for document {$this->document->id}");
        // For now, this is just a placeholder for the actual logic.
    }
}
```
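The placeholder writeToNeo4j hints at the right direction: bind parameters rather than concatenating values into the query string. One common shape (a sketch, not a final query) uses UNWIND over an $entities list. Because Cypher cannot parameterize node labels, this sketch uses a fixed :Entity label plus a type property instead of dynamic labels:

```shell
# Stored in a shell variable here only for illustration; in PHP you would pass
# this string to $client->run($cypher, ['docId' => ..., 'entities' => ...]).
CYPHER=$(cat <<'EOF'
MERGE (d:Document {mongo_id: $docId})
WITH d
UNWIND $entities AS entity
MERGE (e:Entity {name: entity.name, type: entity.label})
MERGE (d)-[:CONTAINS_ENTITY]->(e)
EOF
)
echo "$CYPHER"
```

One round trip with a list parameter is also far cheaper than issuing one MERGE per entity.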
- Create the Artisan Command for Backfilling:

```shell
php artisan make:command ProcessArchiveForGraph
```

app/Console/Commands/ProcessArchiveForGraph.php
```php
<?php

namespace App\Console\Commands;

use App\Jobs\ProcessDocumentForGraph;
use App\Models\Document;
use Illuminate\Console\Command;

class ProcessArchiveForGraph extends Command
{
    protected $signature = 'docs:process-archive {--limit=100}';

    protected $description = 'Dispatch jobs to process archived documents and build the knowledge graph.';

    public function handle()
    {
        $this->info("Fetching documents to process...");

        Document::query()
            ->where('graph_status', '!=', 'processed') // Or whatever your criterion is
            ->limit($this->option('limit'))
            ->get()
            ->each(function (Document $document) {
                $this->line("Dispatching job for document: {$document->id}");
                ProcessDocumentForGraph::dispatch($document);
            });

        $this->info("All jobs have been dispatched.");

        return 0;
    }
}
```
With this setup, your development workflow is:
- Run docker-compose up -d once to start docling.
- Start your Laravel dev server (php artisan serve).
- Start your queue worker in another terminal (php artisan queue:work).
- Run php artisan docs:process-archive to kick off the ingestion process.
- Watch the queue worker terminal for job processing logs.
- Query the Neo4j Browser (http://localhost:7474) to see your graph being built.
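Once jobs start completing, a starter query for the Neo4j Browser (or cypher-shell) shows what has been written; the relationship name matches the one used in the job:

```shell
QUERY='MATCH (d:Document)-[:CONTAINS_ENTITY]->(e) RETURN d, e LIMIT 25'
echo "$QUERY"

# From a terminal, if cypher-shell is installed:
# cypher-shell -a bolt://localhost:7687 -u neo4j -p "$NEO4J_PASSWORD" "$QUERY"
```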