ExperionCrawler/dxf-graph/Concept-P&ID Graph Pipeline.md at 960bda4a3cd19cb7b447bbfea3537a4a5f3ba1dc


windpacer e923aab43b Switched to opencode; commit before starting work

2026-05-08 17:22:10 +09:00

5.0 KiB


✔ 🎯 End-to-End P&ID Graph Pipeline (practical architecture)

            ┌──────────────────────┐
            │   P&ID PDF Input     │
            └─────────┬────────────┘
                      ↓
    ┌─────────────────────────────────┐
    │  1. Document Parsing Layer      │
    │  (layout + text + tables)       │
    └─────────┬───────────────────────┘
              ↓
    ┌─────────────────────────────────┐
    │  2. Spatial Element Extraction  │
    │  (symbols + coordinates)        │
    └─────────┬───────────────────────┘
              ↓
    ┌─────────────────────────────────┐
    │  3. Entity Extraction (LLM)     │
    │  FIC-101, Pump-01, Valve...     │
    └─────────┬───────────────────────┘
              ↓
    ┌─────────────────────────────────┐
    │  4. Relationship Inference      │
    │  (rules + LLM hybrid)           │
    └─────────┬───────────────────────┘
              ↓
    ┌─────────────────────────────────┐
    │  5. Graph Builder               │
    │  nodes + edges                  │
    └─────────┬───────────────────────┘
              ↓
    ┌─────────────────────────────────┐
    │  6. DB Integration Layer        │
    │  (existing OPC + SQL system)    │
    └─────────────────────────────────┘



✔ 1️⃣ Document Parsing Layer (PDF → structured data)

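Technology: Unstructured

Responsibilities:

- text extraction
- table extraction
- block segmentation
- preserving page coordinates

Output example:

    { "page": 12, "elements": [ { "text": "FIC-101", "bbox": [120, 300, 160, 320] } ] }

👉 Key point: coordinates must always be preserved.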
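A small normalization step after parsing can produce the coordinate-preserving output above. The following is a minimal Python sketch. It assumes the parser's results have already been flattened into hypothetical `(page, text, corner_points)` records. These records are a simplified stand-in and not Unstructured's element API. The sketch groups elements by page and reduces each polygon to the `[x0, y0, x1, y1]` bbox used in the example.

```python
from collections import defaultdict

# Hypothetical simplified parser record: (page, text, polygon corner points).
# This is NOT Unstructured's element API; adapt the field access to your parser.
RawElement = tuple[int, str, list[tuple[float, float]]]


def points_to_bbox(points):
    """Reduce a polygon (list of (x, y) corners) to [x0, y0, x1, y1]."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs), max(ys)]


def to_page_documents(raw_elements: list[RawElement]) -> list[dict]:
    """Group elements by page, keeping the text and bbox of every element."""
    pages = defaultdict(list)
    for page, text, points in raw_elements:
        if not text.strip():
            continue  # drop empty text blocks, keep everything else
        pages[page].append({"text": text.strip(), "bbox": points_to_bbox(points)})
    return [{"page": p, "elements": els} for p, els in sorted(pages.items())]


if __name__ == "__main__":
    raw = [(12, "FIC-101", [(120, 300), (120, 320), (160, 320), (160, 300)])]
    print(to_page_documents(raw))
```

Running it on the single `FIC-101` record reproduces the output example: `[{'page': 12, 'elements': [{'text': 'FIC-101', 'bbox': [120, 300, 160, 320]}]}]`.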

✔ 2️⃣ Spatial Element Extraction (the key stage)

This is where the P&ID comes to life: symbols and tags are tied to positions on the sheet.

Tasks:

- symbol detection
- line detection
- proximity mapping

Resulting JSON:

    {
      "FIC-101": { "x": 120, "y": 300 },
      "FT-101": { "x": 110, "y": 220 },
      "Valve-203": { "x": 300, "y": 310 }
    }
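Proximity mapping can be sketched as a pairwise distance check over the extracted positions. This minimal illustration makes two assumptions. First, positions are the symbol/tag centres shown in the JSON above. Second, a single hypothetical `radius` in drawing units is an adequate definition of "near". Real drawings usually need per-symbol tolerances, and a spatial index once tag counts grow.

```python
import math
from itertools import combinations


def proximity_pairs(positions, radius):
    """Return (a, b, distance) for every pair of tags within `radius`.

    `positions` maps tag name -> {"x": ..., "y": ...}, as in the JSON above.
    Brute force O(n^2): fine for one sheet; use a grid or k-d tree for large drawings.
    """
    pairs = []
    for a, b in combinations(sorted(positions), 2):
        pa, pb = positions[a], positions[b]
        d = math.hypot(pa["x"] - pb["x"], pa["y"] - pb["y"])
        if d <= radius:
            pairs.append((a, b, round(d, 1)))
    return sorted(pairs, key=lambda t: t[2])  # closest pairs first


if __name__ == "__main__":
    positions = {
        "FIC-101": {"x": 120, "y": 300},
        "FT-101": {"x": 110, "y": 220},
        "Valve-203": {"x": 300, "y": 310},
    }
    # radius=100 is an arbitrary illustrative threshold
    print(proximity_pairs(positions, radius=100))  # [('FIC-101', 'FT-101', 80.6)]
```

With these sample coordinates, only FT-101 is close to FIC-101. Valve-203 is roughly 180 units away, so it must be linked by line detection instead of proximity.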

✔ 3️⃣ Entity Extraction (LLM)

Model: Qwen (served via vLLM)

Input prompt:

    Extract all P&ID entities:
    controller
    sensor
    valve
    pump

Output:

    [
      {"name": "FIC-101", "type": "controller"},
      {"name": "FT-101", "type": "sensor"},
      {"name": "Valve-203", "type": "valve"}
    ]
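The model's answer is free text, so it pays to validate it before it reaches the graph. This is a minimal sketch, assuming the model is asked to return a bare JSON array like the output above. The helper parses the array and keeps only the four types named in the prompt. It also drops any name that the parsing and spatial stages never found. That set is passed in as `known_tags`, a hypothetical input, so invented tags are filtered out. The actual call to Qwen is omitted here. For example, it could go through vLLM's OpenAI-compatible server.

```python
import json

# The entity types listed in the prompt above.
ALLOWED_TYPES = {"controller", "sensor", "valve", "pump"}

# Hypothetical prompt wording; double braces are literal braces for str.format.
PROMPT_TEMPLATE = (
    "Extract all P&ID entities:\n"
    "controller\nsensor\nvalve\npump\n"
    'Return only a JSON array of {{"name": ..., "type": ...}} objects.\n'
    "Text found on the drawing:\n{texts}"
)


def build_prompt(texts):
    """Fill the prompt with the tag texts found by the parsing stage."""
    return PROMPT_TEMPLATE.format(texts="\n".join(texts))


def validate_entities(raw_output, known_tags):
    """Parse the LLM's JSON and keep only grounded, well-typed entities."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return []  # unparseable answer: treat as no entities rather than guess
    if not isinstance(data, list):
        return []
    entities = []
    for item in data:
        if not isinstance(item, dict):
            continue
        name, etype = item.get("name"), item.get("type")
        # Reject unknown types and names that never appeared on the drawing.
        if isinstance(name, str) and etype in ALLOWED_TYPES and name in known_tags:
            entities.append({"name": name, "type": etype})
    return entities
```

Consider an answer that contains `PIC-999` (not on the drawing) and `FT-101` typed as `gauge`. Both entries are dropped, and only grounded entities such as `FIC-101` survive.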

✔ 4️⃣ Relationship Inference (the most important stage)

This stage combines two approaches:

A. Rule Engine (required)

    if sensor near controller: relation = "signal"
    if controller connected to valve: relation = "control"
    if pump → tank: relation = "flow"
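The three rules above translate fairly directly into code. This minimal sketch rests on the following assumptions:

- Each entity carries its `type` from stage 3 and its `x`/`y` position from stage 2.
- "Connected" means the pair appears in a set of line-connected tags produced by line detection. This is a hypothetical input.
- The distance threshold and the `tank` type are illustrative assumptions.

An ISA-based rule set would be considerably richer.

```python
import math

# Hypothetical threshold for "near", in drawing units; tune per drawing scale.
NEAR_THRESHOLD = 100.0


def infer_relations(entities, connections):
    """Apply the three rules above and return relation dicts.

    entities:    name -> {"type": str, "x": float, "y": float}
    connections: iterable of (a, b) tag pairs joined by a drawn line;
                 direction is ignored, the rule decides source/target.
    """
    connected = {frozenset(pair) for pair in connections}
    relations = []
    for src, s in entities.items():
        for dst, d in entities.items():
            if src == dst:
                continue
            types = (s["type"], d["type"])
            dist = math.hypot(s["x"] - d["x"], s["y"] - d["y"])
            linked = frozenset((src, dst)) in connected
            if types == ("sensor", "controller") and dist <= NEAR_THRESHOLD:
                relation = "signal"
            elif types == ("controller", "valve") and linked:
                relation = "control"
            elif types == ("pump", "tank") and linked:
                relation = "flow"  # flow direction assumed pump -> tank
            else:
                continue
            relations.append({"source": src, "target": dst, "relation": relation})
    return relations
```

Consider the three sample entities with a single drawn line between FIC-101 and Valve-203. This yields the `signal` (FT-101 → FIC-101) and `control` (FIC-101 → Valve-203) edges, without confidence scores.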

B. LLM-assisted judgment

    Determine relationship based on P&ID context: Entities + coordinates

Final output:

    [
      { "source": "FT-101", "target": "FIC-101", "relation": "signal", "confidence": 0.93 },
      { "source": "FIC-101", "target": "Valve-203", "relation": "control", "confidence": 0.91 }
    ]

✔ 5️⃣ Graph Builder

    # relations: the final list of relation dicts from stage 4
    nodes = set()
    edges = []

    for r in relations:
        nodes.add(r["source"])
        nodes.add(r["target"])
        edges.append(r)

    graph = {"nodes": sorted(nodes), "edges": edges}

Graph structure:

    { "nodes": [...], "edges": [...] }

✔ 6️⃣ DB Integration (your existing system)

This is where your existing strengths come in:

- OPC UA: available ✔
- NL2SQL: available ✔
- DB: available ✔

Storage strategy:

    P&ID Graph DB
         ↓
    Mapping Layer
         ↓
    OPC Tags / SQL semantic layer
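One way to implement this storage strategy on an existing SQL database is to keep the graph in two relational tables, plus a separate mapping table to OPC UA tags. The sketch uses Python's built-in `sqlite3` so it is self-contained. The table names, the columns, and the `ns=2;s=...` NodeId strings are hypothetical placeholders, not part of any existing schema. A dedicated graph database is the alternative; this only illustrates the SQL route.

```python
import sqlite3

# Hypothetical schema: graph tables plus a mapping layer to OPC UA node ids.
# Note: SQLite only enforces REFERENCES when PRAGMA foreign_keys = ON.
SCHEMA = """
CREATE TABLE pid_node (name TEXT PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE pid_edge (
    source TEXT NOT NULL REFERENCES pid_node(name),
    target TEXT NOT NULL REFERENCES pid_node(name),
    relation TEXT NOT NULL,
    confidence REAL
);
CREATE TABLE tag_map (
    name TEXT PRIMARY KEY REFERENCES pid_node(name),
    opc_node_id TEXT NOT NULL
);
"""


def store_graph(conn, nodes, edges, opc_map):
    """Persist (name, type) nodes, relation dicts (with a "confidence" key),
    and a name -> OPC node id mapping."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany("INSERT INTO pid_node VALUES (?, ?)", nodes)
        conn.executemany(
            "INSERT INTO pid_edge VALUES (:source, :target, :relation, :confidence)",
            edges,
        )
        conn.executemany("INSERT INTO tag_map VALUES (?, ?)", opc_map.items())


def controlled_valves(conn):
    """Semantic query: each control edge with the target's OPC node id."""
    return conn.execute(
        """
        SELECT e.source, e.target, m.opc_node_id
        FROM pid_edge AS e
        JOIN tag_map AS m ON m.name = e.target
        WHERE e.relation = 'control'
        ORDER BY e.source
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    nodes = [("FT-101", "sensor"), ("FIC-101", "controller"), ("Valve-203", "valve")]
    edges = [
        {"source": "FT-101", "target": "FIC-101", "relation": "signal", "confidence": 0.93},
        {"source": "FIC-101", "target": "Valve-203", "relation": "control", "confidence": 0.91},
    ]
    # Placeholder NodeId strings; real ones come from the existing OPC UA server.
    opc_map = {"FIC-101": "ns=2;s=FIC-101", "Valve-203": "ns=2;s=Valve-203"}
    store_graph(conn, nodes, edges, opc_map)
    print(controlled_valves(conn))  # [('FIC-101', 'Valve-203', 'ns=2;s=Valve-203')]
```

Queries that join graph edges to live tag identifiers, like `controlled_valves`, are the kind an NL2SQL layer could target.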

✔ 🔥 Key design points (really important)

❗ 1. The LLM is not the decision-maker

👉 Its role is to assist interpretation.

❗ 2. Coordinates are the key

A P&ID is not just text. It is:

spatial layout + connectivity structure
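Here is one way to make "connectivity" concrete. Line detection yields segments, and a segment connects two symbols when each of its endpoints lands close to a symbol. This is a minimal sketch with these assumptions:

- Segments are `((x1, y1), (x2, y2))` tuples.
- Symbol centres come from the stage 2 JSON.
- The snapping `tolerance` is a hypothetical value.

Piping lines that bend or branch would first need their segments chained.

```python
import math


def nearest_symbol(point, symbols, tolerance):
    """Name of the symbol centre closest to `point`, or None if none lies within `tolerance`."""
    best, best_dist = None, tolerance
    for name, pos in symbols.items():
        dist = math.hypot(point[0] - pos["x"], point[1] - pos["y"])
        if dist <= best_dist:
            best, best_dist = name, dist
    return best


def connections_from_lines(segments, symbols, tolerance=15.0):
    """Snap both ends of each detected line segment to symbols.

    segments: list of ((x1, y1), (x2, y2)) from line detection
    symbols:  tag -> {"x": ..., "y": ...}, as produced in stage 2
    Returns undirected (a, b) pairs, stored in sorted order.
    """
    pairs = set()
    for start, end in segments:
        a = nearest_symbol(start, symbols, tolerance)
        b = nearest_symbol(end, symbols, tolerance)
        if a is not None and b is not None and a != b:
            pairs.add(tuple(sorted((a, b))))
    return pairs
```

A segment whose far end snaps to nothing (a dangling line, or a line leaving the sheet) yields no connection. It is not guessed.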

❗ 3. Without a rule engine, the pipeline fails

LLM alone = hallucination (tags and relationships that are not on the drawing)
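Points 1 and 3 together suggest a merge policy between the two sources of relationships. The sketch below shows one possible policy, not a prescribed one:

- Rule-engine edges are always kept.
- When the LLM agrees with a rule edge, its confidence is attached to that edge.
- An LLM-only edge between known entities goes to a review list instead of the graph.
- An LLM edge that touches an unknown entity is dropped.

```python
def merge_relations(rule_edges, llm_edges, known_entities):
    """Rules decide; the LLM only annotates rule edges or proposes edges for review.

    Returns (accepted, needs_review).
    """
    accepted = {}
    for e in rule_edges:
        key = (e["source"], e["target"], e["relation"])
        accepted[key] = {**e, "origin": "rule"}
    needs_review = []
    for e in llm_edges:
        key = (e["source"], e["target"], e["relation"])
        if key in accepted:
            # Both agree: keep the rule edge and attach the LLM's confidence.
            if "confidence" in e:
                accepted[key]["confidence"] = e["confidence"]
        elif e["source"] in known_entities and e["target"] in known_entities:
            # Plausible but unconfirmed by any rule: a human or a new rule decides.
            needs_review.append({**e, "origin": "llm"})
        # Anything else touches an unknown entity and is dropped as a hallucination.
    return list(accepted.values()), needs_review
```

Applied to the stage 4 example, the two rule edges pick up the 0.93 and 0.91 scores. An extra LLM suggestion such as FT-101 → Valve-203 would wait in `needs_review` instead of entering the graph.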

✔ 📌 Final architecture (complete form)

    P&ID PDF
       ↓
    Unstructured Parser
       ↓
    Spatial Engine
       ↓
    Qwen (Entity Extraction)
       ↓
    Rule Engine (relationship)
       ↓
    Graph Builder
       ↓
    DB + OPC Integration

✔ 🚀 What this architecture means

This is not just a feature. It is:

👉 an “Engineering Knowledge Graph generation engine”

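✔ Next steps (hands-on)

If you want, continue directly from here with:

- 1️⃣ Code based on an actual P&ID PDF
- 2️⃣ Detailed rule engine design (ISA-based)
- 3️⃣ Qwen prompt optimization (industrial use)
- 4️⃣ Choosing between a graph DB and a SQL structure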
