The Core Question: docling CLI vs. API
You’ve hit on a key architectural point. While having docling as a CLI is great for one-off tests, for an application integration, you absolutely want a containerized API version.
Why you need the API version:
- Decoupling: Your Laravel application should not depend on a specific executable being in a specific path on the host machine. It should communicate with a service over a network protocol (HTTP). This makes your app portable and mirrors a production setup.
- Process Management: Calling a CLI from a PHP queue worker (shell_exec or Process) is complex to manage. You have to handle stdout/stderr, process timeouts, and potential hangs. An HTTP API is far more robust and predictable.
- Concurrency: A dedicated API service can handle multiple concurrent requests from your queue workers far more gracefully than spawning multiple CLI processes.
- State & Caching: The API version can maintain state or caches (like loaded models) in memory, making subsequent calls much faster. The CLI has to bootstrap from scratch every single time.
The rest of this guide therefore assumes docling runs as a containerized HTTP service.
Minimal Development Environment Topology
On a single machine, your services will run in their own processes/containers but communicate over localhost. This diagram shows the logical flow.
Setting Up The Development Environment
Here is a step-by-step guide to integrate the new AI components into your existing setup.
Prerequisites
- Your existing Laravel DMS, MongoDB, and Minio are running.
- Neo4j is running and accessible (we’ll assume on localhost).
- Docker and Docker Compose are installed on your machine.
Step 1: Run docling as an API Service
We will use Docker Compose to define and run the docling service. This is clean and easily manageable.
- Create a docker-compose.yml file in the root of your Laravel project (or a dedicated folder).
- Add the docling service definition to this file. The official docling documentation should specify the image name. We’ll use a placeholder, ghcr.io/docling-ai/docling:latest.
- Start the service: Open your terminal in the same directory as the docker-compose.yml file and run: docker-compose up -d
- Verify it’s running: You can check the logs with docker-compose logs -f docling. After a minute, test the API endpoint (the exact path may vary, so check the docling docs). A simple health-check endpoint is common.
Step 2: Configure Your Laravel Application
Now, we need to tell Laravel how to connect to all these new services.
- Install Required PHP Libraries:
- Update Your .env file: Add the connection details for the new services.
- (Optional but Recommended) Create a Config File: To keep things clean, create a config file for your services.
config/services.php
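As a rough sketch of what those entries could look like: the env variable names (DOCLING_URL, DOCLING_TIMEOUT, NEO4J_URI) and the docling port 8080 are assumptions made for illustration, not values taken from the docling documentation. Port 7687 is Neo4j's standard Bolt port.

```php
<?php

// config/services.php (excerpt)
// Assumption: DOCLING_URL, DOCLING_TIMEOUT and NEO4J_URI are hypothetical
// variable names; use whatever names you added to .env.

return [

    // ... existing third-party service entries stay here ...

    'docling' => [
        // Base URL of the docling container. Port 8080 is a placeholder;
        // use the port you published in docker-compose.yml.
        'url' => env('DOCLING_URL', 'http://localhost:8080'),
        // Seconds to wait for a conversion before giving up.
        'timeout' => (int) env('DOCLING_TIMEOUT', 240),
    ],

    'neo4j' => [
        // Bolt URI. Credentials can be embedded in the URI,
        // e.g. bolt://user:password@localhost:7687
        'uri' => env('NEO4J_URI', 'bolt://localhost:7687'),
    ],

];
```

Application code can then read these values with config('services.docling.url') instead of calling env() directly, which keeps working after php artisan config:cache.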
Step 3: Set Up and Run the Laravel Queue Worker
The worker is the background process that will do all the heavy lifting.
- Open a new terminal window and navigate to your Laravel project root.
- Run the queue worker: php artisan queue:work starts a worker that will listen for jobs on the queue you configured in .env.
Step 4: Implement the Core Logic (Artisan Commands & Jobs)
Now you can start building the pieces that connect everything.
- Create the Main Job: This job will orchestrate the call to docling and then write to Neo4j.
app/Jobs/ProcessDocumentForGraph.php
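A minimal sketch of such a job follows. Several pieces are assumptions rather than facts about your setup or about docling: the /convert path and the multipart field name file are placeholders (check the docling docs for the real endpoint), Minio is assumed to be configured as the s3 filesystem disk, the config keys services.docling.url and services.neo4j.uri are hypothetical, and the Neo4j driver is assumed to be the laudis/neo4j-php-client package.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Storage;
use Laudis\Neo4j\ClientBuilder;
use RuntimeException;

class ProcessDocumentForGraph implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $tries = 3;      // retry a failed job up to three times
    public $timeout = 300;  // seconds before the worker kills the job

    // Both constructor arguments are hypothetical; pass whatever
    // identifies a document in your DMS.
    public function __construct(
        public string $documentId,
        public string $storagePath,
    ) {}

    public function handle(): void
    {
        // 1. Fetch the raw file from Minio (assumed to be the "s3" disk).
        $contents = Storage::disk('s3')->get($this->storagePath);
        if ($contents === null) {
            throw new RuntimeException("File not found: {$this->storagePath}");
        }

        // 2. Send it to the docling service. "/convert" is a placeholder path.
        $response = Http::timeout(240)
            ->attach('file', $contents, basename($this->storagePath))
            ->post(rtrim(config('services.docling.url'), '/') . '/convert')
            ->throw(); // turn 4xx/5xx responses into exceptions so the job retries

        // The response shape depends on the docling API; map it to
        // nodes and relationships here.
        $parsed = $response->json();

        // 3. Record the document in Neo4j. MERGE keeps the write idempotent,
        //    so a retried job does not create duplicate nodes.
        $client = ClientBuilder::create()
            ->withDriver('default', config('services.neo4j.uri'))
            ->build();

        $client->run(
            'MERGE (d:Document {id: $id}) SET d.storagePath = $path, d.processedAt = datetime()',
            ['id' => $this->documentId, 'path' => $this->storagePath]
        );
    }
}
```

The HTTP timeout is kept below the job's $timeout so that a slow conversion fails as an ordinary exception and is retried, rather than having the worker process killed mid-request.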
- Create the Artisan Command for Backfilling:
app/Console/Commands/ProcessArchiveForGraph.php
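One possible shape for the backfill command is sketched below. The App\Models\Document model and its storage_path attribute are hypothetical stand-ins for however your DMS stores documents, and the two arguments passed to ProcessDocumentForGraph::dispatch() are likewise an assumed constructor signature. Only the command name docs:process-archive comes from this guide.

```php
<?php

namespace App\Console\Commands;

use App\Jobs\ProcessDocumentForGraph;
use App\Models\Document; // hypothetical model for your archived documents
use Illuminate\Console\Command;

class ProcessArchiveForGraph extends Command
{
    protected $signature = 'docs:process-archive';

    protected $description = 'Queue every archived document for docling parsing and graph ingestion';

    public function handle(): int
    {
        $queued = 0;

        // cursor() streams models one at a time instead of loading
        // the whole archive into memory.
        foreach (Document::query()->cursor() as $document) {
            ProcessDocumentForGraph::dispatch(
                (string) $document->getKey(),
                $document->storage_path // hypothetical attribute name
            );
            $queued++;
        }

        $this->info("Queued {$queued} documents.");

        return self::SUCCESS;
    }
}
```

The command only enqueues work; the parsing and graph writes happen in the queue worker, so the command returns quickly even for a large archive.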
Once everything is in place, the day-to-day workflow is:
- Run docker-compose up -d once to start docling.
- Start your Laravel dev server (php artisan serve).
- Start your queue worker in another terminal (php artisan queue:work).
- Run php artisan docs:process-archive to kick off the ingestion process.
- Watch the queue worker terminal for job processing logs.
- Query Neo4j Browser (http://localhost:7474) to see your graph being built.
