MCP-서버 리팩토링 후 P&ID 추출 테스트전 다른 기능 확인 후 커밋

2026-05-04 10:35:13 +09:00
parent a0404b1fee
commit 15c17522c8
304 changed files with 5431877 additions and 0 deletions
--- a/mcp-server/worker/nl2sql_worker.py
+++ b/mcp-server/worker/nl2sql_worker.py
@@ -0,0 +1,339 @@
+#!/usr/bin/env python3
+"""NL2SQL 전용 워커 프로세스
+
+Usage: python nl2sql_worker.py <port>
+
+담당 도구:
+  run_sql, query_pv_history, get_tag_metadata, list_drawings, query_with_nl
+
+특징:
+  - PostgreSQL 직접 연결
+  - LLM SQL 생성 + DB 실행 분리
+  - 메모리: ~1GB (SQL 생성용 LLM)
+  - 생명주기: 메인 서버 종료 시까지 유지
+"""
+
+from __future__ import annotations
+import sys
+import os
+
+# mcp-server 디렉토리를 Python 경로에 추가
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+import logging
+import asyncio
+from functools import lru_cache
+
+from fastapi import FastAPI, Request
+import uvicorn
+import httpx
+
+# ── 설정 ─────────────────────────────────────────────────────────────────────
+
+DB_CONNECTION_STRING = os.environ.get("DB_CONNECTION_STRING", "postgresql://postgres:postgres@localhost:5432/iiot_platform")
+DB_TIMEOUT = int(os.environ.get("DB_TIMEOUT", "10"))
+
+VLLM_BASE_URL = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")
+VLLM_MODEL = os.environ.get("VLLM_MODEL", "Qwen3.6-27B-FP8")
+
+logging.basicConfig(
+    level=logging.INFO,
+    stream=sys.stderr,
+    format="%(asctime)s [nl2sql_worker] %(levelname)s %(message)s",
+)
+
+app = FastAPI()
+
+# ── DB 연결 풀 ───────────────────────────────────────────────────────────────
+
+def _get_db_connection():
+    import psycopg
+    return psycopg.connect(DB_CONNECTION_STRING, connect_timeout=DB_TIMEOUT)
+
+# ── LLM 클라이언트 ───────────────────────────────────────────────────────────
+
+@lru_cache(maxsize=1)
+def _llm_client():
+    from openai import AsyncOpenAI
+    return AsyncOpenAI(base_url=VLLM_BASE_URL, api_key="dummy")
+
+# DB 스키마 — server.py::_DB_SCHEMA와 동일
+DB_SCHEMA = """
+PostgreSQL 시계열 데이터베이스 스키마
+
+테이블: history_table  (시계열 이력)
+  tagname     TEXT         - 태그명 (모두 소문자, 예: 'ficq-6113.pv') — 대소문자 구분
+  node_id     TEXT         - OPC UA 노드 ID
+  value       TEXT         - 측정값, 수치 연산 시 ::double precision 캐스트 필요
+  recorded_at TIMESTAMPTZ  - 기록 시각(UTC), 스냅샷 주기 약 60초
+
+테이블: realtime_table  (실시간 최신값)
+  tagname     TEXT         - 태그명 (모두 소문자)
+  node_id     TEXT         - OPC UA 노드 ID
+  livevalue   TEXT         - 현재값
+  timestamp   TIMESTAMPTZ  - 최종 갱신 시각
+
+N분 간격 집계 공식 (time_bucket 금지, date_trunc 사용):
+  1분 버킷: date_trunc('minute', recorded_at) AS bucket
+  2분 버킷: to_timestamp(FLOOR(EXTRACT(EPOCH FROM recorded_at)/120)*120) AS bucket
+  5분 버킷: to_timestamp(FLOOR(EXTRACT(EPOCH FROM recorded_at)/300)*300) AS bucket
+  10분 버킷: to_timestamp(FLOOR(EXTRACT(EPOCH FROM recorded_at)/600)*600) AS bucket
+  N분 버킷: to_timestamp(FLOOR(EXTRACT(EPOCH FROM recorded_at)/(N*60))*(N*60)) AS bucket
+
+예시 (2분 간격, 여러 태그):
+  SELECT to_timestamp(FLOOR(EXTRACT(EPOCH FROM recorded_at)/120)*120) AS bucket,
+         tagname, AVG(value::double precision) AS avg_val
+  FROM history_table
+  WHERE tagname IN ('tag1', 'tag2')
+    AND recorded_at >= NOW() - INTERVAL '3 hours'
+  GROUP BY bucket, tagname ORDER BY bucket, tagname
+
+규칙:
+  - SELECT만 허용 (INSERT/UPDATE/DELETE/DROP 등 불가)
+  - tagname은 모두 소문자로 정확히 입력
+  - value 컬럼은 TEXT이므로 집계 시 ::double precision 캐스트 필수
+  - time_bucket 함수 사용 금지 — 위의 to_timestamp/FLOOR/EPOCH 공식 사용
+"""
+
+async def _generate_sql(natural_language: str) -> str:
+    """자연어를 SQL로 변환."""
+    client = _llm_client()
+    
+    system = (
+        "You are a PostgreSQL SQL expert.\n"
+        "Convert the user's question into a SELECT SQL using the schema below.\n"
+        "IMPORTANT rules:\n"
+        "- Use ONLY PostgreSQL syntax. No DATE_FORMAT, no INTERVAL N DAY.\n"
+        "- Time column is 'recorded_at' (TIMESTAMPTZ). Do NOT use 'timestamp'.\n"
+        "- NEVER use time_bucket(). For N-minute buckets use to_timestamp/FLOOR/EPOCH formula.\n"
+        "- INTERVAL rule:\n"
+        "    * If the question specifies an interval (e.g. '2분 간격', '5-minute interval'):\n"
+        "        use: to_timestamp(FLOOR(EXTRACT(EPOCH FROM recorded_at)/(N*60))*(N*60)) AS bucket\n"
+        "        with GROUP BY bucket, tagname and AVG(value::double precision) AS avg_val\n"
+        "    * If NO interval is specified: SELECT recorded_at, tagname, value — NO GROUP BY.\n"
+        "- Current year is 2026. '4월 27일' means 2026-04-27.\n"
+        "- All times in DB are UTC. Korean input is KST (UTC+9). Convert: KST 12:00 = UTC 03:00.\n"
+        "- value column is TEXT; cast with ::double precision only when aggregating.\n"
+        "- All tagnames are lowercase (e.g. 'ficq-6113.pv'). Match exactly.\n"
+        "- PostgreSQL LIKE: dot has no special meaning, no escaping needed.\n"
+        "- Return ONLY the SQL statement. No explanation, no markdown.\n\n"
+        f"{DB_SCHEMA}"
+    )
+    
+    response = await client.chat.completions.create(
+        model=VLLM_MODEL,
+        messages=[
+            {"role": "system", "content": system},
+            {"role": "user", "content": natural_language},
+        ],
+        max_tokens=8192,
+        temperature=0.1,
+    )
+    sql = response.choices[0].message.content.strip()
+    # 마크다운 코드 블록 제거
+    if sql.startswith("```"):
+        lines = sql.splitlines()
+        sql = "\n".join(lines[1:-1] if lines[-1].strip() == "```" else lines[1:]).strip()
+    return sql
+
+# ── NL2SQL 도구 구현 ─────────────────────────────────────────────────────────
+
+@app.get("/health")
+async def health():
+    """워커 헬스체크."""
+    return {"status": "ok"}
+
+@app.post("/execute")
+async def execute(request: Request):
+    """HTTP 요청을 MCP 도구 호출로 변환."""
+    body = await request.json()
+    tool = body["tool"]
+    params = body["params"]
+    
+    try:
+        if tool == "run_sql":
+            result = await _run_sql(**params)
+        elif tool == "query_pv_history":
+            result = await _query_pv_history(**params)
+        elif tool == "get_tag_metadata":
+            result = await _get_tag_metadata(**params)
+        elif tool == "list_drawings":
+            result = await _list_drawings(**params)
+        elif tool == "query_with_nl":
+            result = await _query_with_nl(**params)
+        else:
+            return {"success": False, "error": f"Unknown tool: {tool}"}
+        
+        return result
+    except Exception as e:
+        logging.error(f"Error executing {tool}: {e}")
+        return {"success": False, "error": str(e)}
+
+async def _run_sql(sql: str) -> str:
+    """SQL 실행."""
+    conn = _get_db_connection()
+    try:
+        with conn.cursor() as cur:
+            cur.execute(sql)
+            if cur.description:
+                columns = [desc[0] for desc in cur.description]
+                rows = cur.fetchall()
+                data = [dict(zip(columns, row)) for row in rows]
+                return {
+                    "success": True,
+                    "columns": columns,
+                    "count": len(data),
+                    "data": data,
+                }
+            else:
+                conn.commit()
+                return {
+                    "success": True,
+                    "message": f"Query executed successfully. {cur.rowcount} rows affected.",
+                }
+    finally:
+        conn.close()
+
+async def _query_pv_history(tag_names: list[str], time_from: str, time_to: str, limit: int = 100) -> str:
+    """과거 값(PV) 히스토리 조회."""
+    if not tag_names:
+        return {"success": False, "error": "tag_names is required"}
+    
+    conn = _get_db_connection()
+    try:
+        with conn.cursor() as cur:
+            # TimescaleDB의 time_bucket 함수 사용
+            cur.execute(
+                """
+                SELECT time_bucket('1 min', ts) AS time, tag_name, value
+                FROM realtime_table
+                WHERE tag_name = ANY(%s)
+                  AND ts >= %s
+                  AND ts <= %s
+                ORDER BY time DESC
+                LIMIT %s
+                """,
+                (tag_names, time_from, time_to, limit),
+            )
+            columns = ["time", "tag_name", "value"]
+            rows = cur.fetchall()
+            data = [dict(zip(columns, row)) for row in rows]
+            return {
+                "success": True,
+                "tag_names": tag_names,
+                "time_range": {"from": time_from, "to": time_to},
+                "limit": limit,
+                "count": len(data),
+                "data": data,
+            }
+    finally:
+        conn.close()
+
+async def _get_tag_metadata(query: str, limit: int = 10) -> str:
+    """태그 메타데이터 검색."""
+    conn = _get_db_connection()
+    try:
+        with conn.cursor() as cur:
+            cur.execute(
+                """
+                SELECT DISTINCT tag_name, unit, description
+                FROM realtime_table
+                WHERE tag_name ILIKE %s
+                ORDER BY tag_name
+                LIMIT %s
+                """,
+                (f"%{query}%", limit),
+            )
+            columns = ["tag_name", "unit", "description"]
+            rows = cur.fetchall()
+            data = [dict(zip(columns, row)) for row in rows]
+            return {
+                "success": True,
+                "query": query,
+                "count": len(data),
+                "tags": data,
+            }
+    finally:
+        conn.close()
+
+async def _list_drawings(unit_no: str = None) -> str:
+    """단위별 도면 목록 조회."""
+    conn = _get_db_connection()
+    try:
+        with conn.cursor() as cur:
+            if unit_no:
+                cur.execute(
+                    """
+                    SELECT DISTINCT name
+                    FROM node_map_master
+                    WHERE name LIKE %s
+                    ORDER BY name
+                    """,
+                    (f"{unit_no}%",),
+                )
+            else:
+                cur.execute(
+                    """
+                    SELECT DISTINCT name
+                    FROM node_map_master
+                    ORDER BY name
+                    """
+                )
+            columns = ["name"]
+            rows = cur.fetchall()
+            data = [dict(zip(columns, row[0])) for row in rows]
+            return {
+                "success": True,
+                "unit_no": unit_no,
+                "count": len(data),
+                "names": [d["name"] for d in data],
+            }
+    finally:
+        conn.close()
+
+async def _query_with_nl(question: str) -> str:
+    """자연어로 SQL 쿼리 실행."""
+    import json
+    sql = await _generate_sql(question)
+    
+    # SQL이 비어있으면 오류 반환
+    if not sql:
+        return json.dumps({"success": False, "sql": "", "error": "LLM이 SQL을 생성하지 못했습니다."}, ensure_ascii=False)
+    
+    conn = _get_db_connection()
+    try:
+        with conn.cursor() as cur:
+            cur.execute(sql)
+            if cur.description:
+                columns = [desc[0] for desc in cur.description]
+                rows = cur.fetchall()
+                data = [dict(zip(columns, row)) for row in rows]
+                return {
+                    "success": True,
+                    "sql": sql,
+                    "columns": columns,
+                    "count": len(data),
+                    "data": data,
+                }
+            else:
+                conn.commit()
+                return {
+                    "success": True,
+                    "sql": sql,
+                    "message": f"Query executed successfully. {cur.rowcount} rows affected.",
+                }
+    except Exception as db_error:
+        return {
+            "success": False,
+            "sql": sql,
+            "error": str(db_error),
+        }
+    finally:
+        conn.close()
+
+# ── 메인 ─────────────────────────────────────────────────────────────────────
+
+if __name__ == "__main__":
+    port = int(sys.argv[1]) if len(sys.argv) > 1 else 5003
+    logging.info(f"Starting NL2SQL worker on port {port}")
+    uvicorn.run(app, host="0.0.0.0", port=port)
--- a/mcp-server/worker/pid_worker.py
+++ b/mcp-server/worker/pid_worker.py
@@ -0,0 +1,490 @@
+#!/usr/bin/env python3
+"""P&ID 파싱 전용 워커 프로세스
+
+Usage: python pid_worker.py <port>
+
+담당 도구:
+  extract_pid_tags, match_pid_tags,
+  parse_pid_dxf, parse_pid_pdf, parse_pid_drawing,
+  build_pid_graph_parallel, analyze_pid_impact
+"""
+
+from __future__ import annotations
+import sys
+import os
+
+# mcp-server 디렉토리를 Python 경로에 추가 (pipeline 패키지 접근)
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+import io
+import json
+import asyncio
+import signal
+import logging
+import re
+from functools import lru_cache
+
+from fastapi import FastAPI, Request
+import uvicorn
+
+# ── 설정 ─────────────────────────────────────────────────────────────────────
+
+VLLM_BASE_URL        = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")
+VLLM_MODEL           = os.environ.get("VLLM_MODEL", "Qwen3.6-27B-FP8")
+DB_CONNECTION_STRING = os.environ.get("DB_CONNECTION_STRING", "postgresql://postgres:postgres@localhost:5432/iiot_platform")
+DB_TIMEOUT           = int(os.environ.get("DB_TIMEOUT", "10"))
+
+_SERVER_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+STORAGE_DIR = os.path.join(_SERVER_DIR, "storage")
+
+logging.basicConfig(
+    level=logging.INFO,
+    stream=sys.stderr,
+    format="%(asctime)s [pid_worker] %(levelname)s %(message)s",
+)
+
+app = FastAPI()
+
+# ── 싱글톤 ───────────────────────────────────────────────────────────────────
+
+@lru_cache(maxsize=1)
+def _llm():
+    from openai import OpenAI
+    return OpenAI(base_url=VLLM_BASE_URL, api_key="dummy")
+
+
+@lru_cache(maxsize=1)
+def _ocr():
+    from paddleocr import PaddleOCR
+    use_gpu = os.environ.get("PADDLE_USE_GPU", "true").lower() == "true"
+    try:
+        return PaddleOCR(use_angle_cls=True, lang="korean", use_gpu=use_gpu, show_log=False)
+    except Exception:
+        if use_gpu:
+            os.environ["PADDLE_USE_GPU"] = "false"
+            return _ocr()
+        raise
+
+# ── DB ───────────────────────────────────────────────────────────────────────
+
+def _get_db_connection():
+    import psycopg
+    return psycopg.connect(DB_CONNECTION_STRING, connect_timeout=DB_TIMEOUT)
+
+# ── 텍스트 추출 ──────────────────────────────────────────────────────────────
+
+def _extract_text_from_dxf(filepath: str) -> str:
+    import ezdxf
+    from ezdxf.tools.text import plain_mtext
+    doc = ezdxf.readfile(filepath)
+    msp = doc.modelspace()
+    texts = []
+    for entity in msp:
+        if entity.dxftype() == "TEXT":
+            texts.append(entity.dxf.text)
+        elif entity.dxftype() == "MTEXT":
+            try:
+                plain = plain_mtext(entity.dxf.text)
+                if plain.strip():
+                    texts.append(plain)
+            except Exception:
+                pass
+    return "\n".join(texts)
+
+
+def _extract_text_from_pdf(filepath: str) -> str:
+    import fitz
+    doc = fitz.open(filepath)
+    return "\n".join(page.get_text() for page in doc)
+
+
+def _extract_text_from_pdf_ocr(filepath: str) -> str:
+    import fitz
+    from PIL import Image
+    import numpy as np
+    doc = fitz.open(filepath)
+    all_texts = []
+    for page in doc:
+        mat = fitz.Matrix(300 / 72)
+        pix = page.get_pixmap(matrix=mat)
+        img = Image.open(io.BytesIO(pix.tobytes("png")))
+        result = _ocr().ocr(np.array(img), cls=True)
+        if result and result[0]:
+            all_texts.extend(line[1][0] for line in result[0])
+    return "\n".join(all_texts)
+
+# ── JSON 배열 파싱 유틸 ───────────────────────────────────────────────────────
+
+def _parse_json_array(raw: str, finish_reason: str = "") -> list:
+    """LLM 출력에서 JSON 배열 추출. finish_reason=length 잘림 복구 포함."""
+    if raw.startswith("```"):
+        lines = raw.splitlines()
+        raw = "\n".join(lines[1:-1] if lines and lines[-1].strip() == "```" else lines[1:]).strip()
+
+    if finish_reason == "length":
+        last_close = raw.rfind("}")
+        if last_close != -1:
+            raw = raw[:last_close + 1] + "]"
+
+    # 가장 긴 균형 잡힌 [...] 추출
+    depth = 0; start = -1; best = ""
+    for i, c in enumerate(raw):
+        if c == "[":
+            if depth == 0:
+                start = i
+            depth += 1
+        elif c == "]":
+            depth -= 1
+            if depth == 0 and start >= 0:
+                cand = raw[start:i + 1]
+                if len(cand) > len(best):
+                    best = cand
+    raw = best if best else "[]"
+
+    try:
+        return json.loads(raw)
+    except json.JSONDecodeError:
+        data = []
+        for obj in re.findall(r"\{[^{}]*\}", raw, re.DOTALL):
+            try:
+                data.append(json.loads(obj))
+            except json.JSONDecodeError:
+                pass
+        return data
+
+# ── 태그 추출/매핑 도구 ───────────────────────────────────────────────────────
+
+def _extract_pid_tags(text: str, source_type: str) -> str:
+    system = (
+        "You are a P&ID (Piping and Instrumentation Diagram) expert.\n"
+        "Extract all instrument and equipment tags from the provided text.\n"
+        "Return ONLY a valid JSON array. Each element must have exactly these fields:\n"
+        '{"tagNo":"FCV-101","equipmentName":null,"instrumentType":"FCV",'
+        '"lineNumber":null,"pidDrawingNo":null,"confidence":0.95}\n'
+        "Rules:\n"
+        "- tagNo: any token matching [LETTERS]-[DIGITS] or [LETTERS]-[DIGITS]-[SUFFIX]\n"
+        "  Examples: FCV-101, P-10101, T-10100, VG-6203-15A-F1A-n, BT-6200, DP-10101\n"
+        "- instrumentType: leading letters of tagNo\n"
+        "- equipmentName: descriptive name if present near tag, else null\n"
+        "- lineNumber/pidDrawingNo: null unless explicitly associated\n"
+        "- confidence: 0.95 for clear tags, lower for ambiguous\n"
+        "- Output ONLY the JSON array, no markdown, no explanation.\n"
+        "- If no tags found, return: []\n"
+    )
+    truncated = text[:100000]
+    resp = _llm().chat.completions.create(
+        model=VLLM_MODEL,
+        messages=[
+            {"role": "system", "content": system},
+            {"role": "user", "content": f"Source: {source_type}\n\nText:\n{truncated}"},
+        ],
+        max_tokens=32768,
+        temperature=0.1,
+        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
+    )
+    raw = (resp.choices[0].message.content or "").strip()
+    data = _parse_json_array(raw, resp.choices[0].finish_reason)
+    logging.info(f"extract_pid_tags source={source_type} count={len(data)}")
+    return json.dumps({
+        "success": True,
+        "data": {"count": len(data), "tags": data},
+        "message": "태그 추출 완료"
+    }, ensure_ascii=False, indent=2)
+
+
+def _match_pid_tags(pid_tags: list, experion_tags: list) -> str:
+    system = (
+        "You are a P&ID to Experion tag matching expert.\n"
+        "Match P&ID tags to Experion tags based on similarity.\n"
+        "Return ONLY a JSON array:\n"
+        '[{"pidTag":"FT-101","experionTag":"ft-101.pv","confidence":0.92},...]\n'
+        "- If no good match: confidence < 0.5, experionTag null\n"
+        "- Output ONLY the JSON array.\n"
+    )
+    resp = _llm().chat.completions.create(
+        model=VLLM_MODEL,
+        messages=[
+            {"role": "system", "content": system},
+            {"role": "user", "content": (
+                f"P&ID Tags:\n{chr(10).join(pid_tags)}\n\n"
+                f"Experion Tags:\n{chr(10).join(experion_tags)}"
+            )},
+        ],
+        max_tokens=16384,
+        temperature=0.1,
+        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
+    )
+    raw = (resp.choices[0].message.content or "").strip()
+    data = _parse_json_array(raw, resp.choices[0].finish_reason)
+    return json.dumps({
+        "success": True,
+        "data": {"count": len(data), "mappings": data},
+        "message": "태그 매핑 완료"
+    }, ensure_ascii=False, indent=2)
+
+# ── 도면 파싱 도구 ────────────────────────────────────────────────────────────
+
+_TAG_EXTRACT_SYSTEM = (
+    "You are a P&ID (Piping and Instrumentation Diagram) expert.\n"
+    "Extract instrument and equipment tags from the provided text.\n"
+    "Return ONLY a JSON array:\n"
+    '[{"tagNo":"FIT-10115","equipmentName":"Flow Transmitter","instrumentType":"FIT",'
+    '"lineNumber":"L-101","pidDrawingNo":"P&ID-001","confidence":0.95},...]\n'
+    "Rules:\n"
+    "- tagNo: Instrument [Function]-[Number], Equipment [Type]-[Number]\n"
+    "- instrumentType: first 2-4 letters of tagNo\n"
+    "- equipmentName/lineNumber/pidDrawingNo: null if not present\n"
+    "- confidence: 0.0 to 1.0\n"
+    "- Output ONLY the JSON array, no markdown.\n"
+    "- If no tags found, return: []\n"
+)
+
+
+def _parse_pid_dxf(filepath: str) -> str:
+    text = _extract_text_from_dxf(filepath)
+    if not text.strip():
+        return json.dumps({"success": True, "text": "", "count": 0, "tags": []},
+                          ensure_ascii=False, indent=2)
+
+    resp = _llm().chat.completions.create(
+        model=VLLM_MODEL,
+        messages=[
+            {"role": "system", "content": _TAG_EXTRACT_SYSTEM},
+            {"role": "user", "content": f"Source: dxf\n\nText:\n{text[:8000]}"},
+        ],
+        max_tokens=8192,
+        temperature=0.1,
+    )
+    raw = (resp.choices[0].message.content or "").strip()
+    data = _parse_json_array(raw, resp.choices[0].finish_reason)
+    if not isinstance(data, list):
+        data = []
+    return json.dumps({
+        "success": True,
+        "data": {"text": text[:10000], "count": len(data), "tags": data},
+        "message": "DXF 파싱 완료"
+    }, ensure_ascii=False, indent=2)
+
+
+def _parse_pid_pdf(filepath: str, use_ocr: bool = True) -> str:
+    text = _extract_text_from_pdf_ocr(filepath) if use_ocr else _extract_text_from_pdf(filepath)
+    if not text.strip():
+        return json.dumps({"success": True, "text": "", "count": 0, "tags": []},
+                          ensure_ascii=False, indent=2)
+
+    resp = _llm().chat.completions.create(
+        model=VLLM_MODEL,
+        messages=[
+            {"role": "system", "content": _TAG_EXTRACT_SYSTEM},
+            {"role": "user", "content": f"Source: pdf\n\nText:\n{text[:12000]}"},
+        ],
+        max_tokens=4096,
+        temperature=0.1,
+    )
+    raw = (resp.choices[0].message.content or "").strip()
+    data = _parse_json_array(raw, resp.choices[0].finish_reason)
+    if not isinstance(data, list):
+        data = []
+    return json.dumps({
+        "success": True,
+        "data": {"text": text[:10000], "count": len(data), "tags": data},
+        "message": "PDF 파싱 완료"
+    }, ensure_ascii=False, indent=2)
+
+
+def _parse_pid_drawing(filepath: str) -> str:
+    ext = os.path.splitext(filepath)[1].lower()
+    if ext == ".dxf":
+        return _parse_pid_dxf(filepath)
+    elif ext == ".pdf":
+        return _parse_pid_pdf(filepath)
+    elif ext == ".dwg":
+        return json.dumps({
+            "success": False,
+            "data": None,
+            "error": "DWG 파일은 직접 파싱할 수 없습니다. DXF로 변환 후 사용하세요.",
+            "message": "지원하지 않는 파일 형식"
+        }, ensure_ascii=False)
+    else:
+        return json.dumps({
+            "success": False,
+            "error": f"지원하지 않는 형식: {ext}. 지원: .dxf, .pdf",
+        }, ensure_ascii=False)
+
+# ── 그래프 도구 ───────────────────────────────────────────────────────────────
+
+async def _build_pid_graph_parallel(filepath: str) -> str:
+    from pipeline.extractor import PidGeometricExtractor
+    from pipeline.topology import PidTopologyBuilder
+    from pipeline.mapper import IntelligentMapper
+    from openai import AsyncOpenAI
+
+    os.makedirs(STORAGE_DIR, exist_ok=True)
+
+    # Phase 1: 기하 추출
+    extractor = PidGeometricExtractor(filepath)
+    geo_data_path = os.path.join(STORAGE_DIR, os.path.basename(filepath) + "_geo.json")
+    extractor.extract_and_save(geo_data_path)
+    with open(geo_data_path, "r", encoding="utf-8") as f:
+        geo_data = json.load(f)
+
+    # 시스템 태그 조회
+    system_tags: list[str] = []
+    try:
+        conn = _get_db_connection()
+        with conn.cursor() as cur:
+            cur.execute("SELECT tagname FROM realtime_table")
+            system_tags = [r[0] for r in cur.fetchall()]
+    except Exception as e:
+        logging.warning(f"시스템 태그 조회 실패: {e}")
+
+    # Phase 2: 1차 위상 빌더 (Mapper용 그래프)
+    builder = PidTopologyBuilder(geo_data)
+    builder.build_graph()
+
+    # Phase 3: 병렬 LLM 매핑
+    api_client = AsyncOpenAI(base_url=VLLM_BASE_URL, api_key="dummy")
+    mapper = IntelligentMapper(builder.G, system_tags, api_client=api_client)
+
+    transmitter_nodes = [
+        n for n, d in builder.G.nodes(data=True)
+        if (d.get("value") or "").upper() in {"FIT", "FT", "LT", "PT", "TE"}
+    ]
+    valve_nodes = [
+        n for n, d in builder.G.nodes(data=True)
+        if (d.get("value") or "").upper() in {"FCV", "LCV", "TCV", "PCV", "XV"}
+    ]
+    equipment_nodes = [
+        n for n, d in builder.G.nodes(data=True)
+        if d.get("type") not in {"TEXT", "LINE", "LWPOLYLINE"}
+    ]
+
+    extracted_results = await asyncio.gather(
+        mapper.extract_transmitters(transmitter_nodes),
+        mapper.extract_valves(valve_nodes),
+        mapper.extract_equipment(equipment_nodes),
+    )
+
+    # 매핑 결과 통합
+    all_mapped_tags = []
+    for res_dict in extracted_results:
+        for node_id, mapping in res_dict.items():
+            if mapping.resolved_tag != "UNKNOWN":
+                node_data = builder.G.nodes[node_id]
+                all_mapped_tags.append({
+                    "entity_id": node_id,
+                    "tagName": mapping.resolved_tag,
+                    "bbox": (
+                        node_data["bbox"].bounds
+                        if hasattr(node_data["bbox"], "bounds")
+                        else node_data["bbox"]
+                    ),
+                    "clean_value": mapping.resolved_tag,
+                })
+
+    # Phase 4: 최종 위상 모델링 + 저장
+    final_builder = PidTopologyBuilder(geo_data, all_extracted_tags=all_mapped_tags)
+    final_builder.build_graph()
+
+    graph_id = os.path.basename(filepath).replace(".dxf", "_graph.json")
+    graph_path = os.path.join(STORAGE_DIR, graph_id)
+    final_builder.save_graph(graph_path)
+
+    logging.info(f"build_pid_graph_parallel graph_id={graph_id} "
+                 f"nodes={final_builder.G.number_of_nodes()} "
+                 f"edges={final_builder.G.number_of_edges()}")
+    return json.dumps({
+        "success": True,
+        "graph_id": graph_id,
+        "graph_path": graph_path,
+        "nodes": final_builder.G.number_of_nodes(),
+        "edges": final_builder.G.number_of_edges(),
+    }, ensure_ascii=False)
+
+
+def _analyze_pid_impact(graph_id: str, start_node_id: str) -> str:
+    from pipeline.analyzer import PidAnalysisEngine
+
+    graph_path = os.path.join(STORAGE_DIR, graph_id)
+    mapping_path = graph_path.replace("_graph.json", "_mapping.json")
+    analyzer = PidAnalysisEngine(graph_path, mapping_path)
+    result = analyzer.analyze_impact(start_node_id)
+    return json.dumps(result, ensure_ascii=False, indent=2)
+
+# ── 요청 디스패처 ─────────────────────────────────────────────────────────────
+
+async def _dispatch(tool: str, params: dict) -> str:
+    try:
+        match tool:
+            # blocking 함수는 asyncio.to_thread로 스레드풀 오프로드
+            case "extract_pid_tags":
+                return await asyncio.to_thread(_extract_pid_tags, **params)
+            case "match_pid_tags":
+                return await asyncio.to_thread(_match_pid_tags, **params)
+            case "parse_pid_dxf":
+                return await asyncio.to_thread(_parse_pid_dxf, **params)
+            case "parse_pid_pdf":
+                return await asyncio.to_thread(_parse_pid_pdf, **params)
+            case "parse_pid_drawing":
+                return await asyncio.to_thread(_parse_pid_drawing, **params)
+            case "analyze_pid_impact":
+                return await asyncio.to_thread(_analyze_pid_impact, **params)
+            # 이미 async — 직접 await
+            case "build_pid_graph_parallel":
+                return await _build_pid_graph_parallel(**params)
+            case _:
+                return json.dumps({"success": False, "error": f"알 수 없는 도구: {tool}"},
+                                  ensure_ascii=False)
+    except Exception as e:
+        logging.error(f"dispatch error tool={tool}: {e}", exc_info=True)
+        return json.dumps({"success": False, "error": str(e)}, ensure_ascii=False)
+
+# ── 종료 예약 ─────────────────────────────────────────────────────────────────
+
+def _schedule_shutdown():
+    """
+    응답 전송 완료 후 프로세스 종료 예약.
+    FastAPI의 BackgroundTasks를 사용하여 응답이 완전히 전송된 후 종료되도록 유도함.
+    """
+    async def _do():
+        # 네트워크 전송 및 커넥션 정리를 위한 최소한의 대기 시간
+        await asyncio.sleep(1.0)
+        logging.info("One-shot worker shutting down...")
+        os.kill(os.getpid(), signal.SIGTERM)
+    asyncio.create_task(_do())
+
+# ── HTTP 엔드포인트 ───────────────────────────────────────────────────────────
+
+@app.get("/health")
+async def health():
+    return {"status": "ok"}
+
+
+@app.post("/execute")
+async def execute(request: Request):
+    body = await request.json()
+    return await _dispatch(body["tool"], body["params"])
+
+
+from fastapi import BackgroundTasks
+
+@app.post("/execute/one_shot")
+async def execute_one_shot(request: Request, background_tasks: BackgroundTasks):
+    """요청 처리 후 프로세스 자동 종료 (P&ID 워커 전용)."""
+    body = await request.json()
+    result = await _dispatch(body["tool"], body["params"])
+    
+    # BackgroundTasks에 등록하여 응답 전송이 완료된 후 _schedule_shutdown이 실행되도록 함
+    background_tasks.add_function(_schedule_shutdown)
+    
+    return result
+
+# ── 진입점 ───────────────────────────────────────────────────────────────────
+
+if __name__ == "__main__":
+    port = int(sys.argv[1]) if len(sys.argv) > 1 else 5004
+    os.makedirs(STORAGE_DIR, exist_ok=True)
+    uvicorn.run(app, host="0.0.0.0", port=port, log_level="warning")
+
+
+
--- a/mcp-server/worker/pid_worker_py.md
+++ b/mcp-server/worker/pid_worker_py.md
@@ -0,0 +1,526 @@
+❯ pid_worker.py를 만들어줘
+
+● 먼저 현재 디렉토리 구조를 확인하고 파일을 작성합니다.
+
+  Listed 1 directory (ctrl+o to expand)
+
+● Bash(mkdir -p /home/windpacer/projects/ExperionCrawler/mcp-server/worker)
+  ⎿  Done
+
+● Write(mcp-server/worker/pid_worker.py)
+
+────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+ Create file
+ mcp-server/worker/pid_worker.py
+╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
+    1 #!/usr/bin/env python3
+    2 """P&ID 파싱 전용 워커 프로세스
+    3
+    4 Usage: python pid_worker.py <port>
+    5
+    6 담당 도구:
+    7   extract_pid_tags, match_pid_tags,
+    8   parse_pid_dxf, parse_pid_pdf, parse_pid_drawing,
+    9   build_pid_graph_parallel, analyze_pid_impact
+   10 """
+   11
+   12 from __future__ import annotations
+   13 import sys
+   14 import os
+   15
+   16 # mcp-server 디렉토리를 Python 경로에 추가 (pipeline 패키지 접근)
+   17 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+   18
+   19 import io
+   20 import json
+   21 import asyncio
+   22 import signal
+   23 import logging
+   24 import re
+   25 from functools import lru_cache
+   26
+   27 from fastapi import FastAPI, Request
+   28 import uvicorn
+   29
+   30 # ── 설정 ─────────────────────────────────────────────────────────────────────
+   31
+   32 VLLM_BASE_URL        = "http://localhost:8000/v1"
+   33 VLLM_MODEL           = "Qwen/Qwen3-Coder-Next-FP8"
+   34 DB_CONNECTION_STRING = "postgresql://postgres:postgres@localhost:5432/iiot_platform"
+   35 DB_TIMEOUT           = 10
+   36
+   37 _SERVER_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+   38 STORAGE_DIR = os.path.join(_SERVER_DIR, "storage")
+   39
+   40 logging.basicConfig(
+   41     level=logging.INFO,
+   42     stream=sys.stderr,
+   43     format="%(asctime)s [pid_worker] %(levelname)s %(message)s",
+   44 )
+   45
+   46 app = FastAPI()
+   47
+   48 # ── 싱글톤 ───────────────────────────────────────────────────────────────────
+   49
+   50 @lru_cache(maxsize=1)
+   51 def _llm():
+   52     from openai import OpenAI
+   53     return OpenAI(base_url=VLLM_BASE_URL, api_key="dummy")
+   54
+   55
+   56 @lru_cache(maxsize=1)
+   57 def _ocr():
+   58     from paddleocr import PaddleOCR
+   59     use_gpu = os.environ.get("PADDLE_USE_GPU", "true").lower() == "true"
+   60     try:
+   61         return PaddleOCR(use_angle_cls=True, lang="korean", use_gpu=use_gpu, show_log=False)
+   62     except Exception:
+   63         if use_gpu:
+   64             os.environ["PADDLE_USE_GPU"] = "false"
+   65             return _ocr()
+   66         raise
+   67
+   68 # ── DB ───────────────────────────────────────────────────────────────────────
+   69
+   70 def _get_db_connection():
+   71     import psycopg
+   72     return psycopg.connect(DB_CONNECTION_STRING, connect_timeout=DB_TIMEOUT)
+   73
+   74 # ── 텍스트 추출 ──────────────────────────────────────────────────────────────
+   75
+   76 def _extract_text_from_dxf(filepath: str) -> str:
+   77     import ezdxf
+   78     from ezdxf.tools.text import plain_mtext
+   79     doc = ezdxf.readfile(filepath)
+   80     msp = doc.modelspace()
+   81     texts = []
+   82     for entity in msp:
+   83         if entity.dxftype() == "TEXT":
+   84             texts.append(entity.dxf.text)
+   85         elif entity.dxftype() == "MTEXT":
+   86             try:
+   87                 plain = plain_mtext(entity.dxf.text)
+   88                 if plain.strip():
+   89                     texts.append(plain)
+   90             except Exception:
+   91                 pass
+   92     return "\n".join(texts)
+   93
+   94
+   95 def _extract_text_from_pdf(filepath: str) -> str:
+   96     import fitz
+   97     doc = fitz.open(filepath)
+   98     return "\n".join(page.get_text() for page in doc)
+   99
+  100
+  101 def _extract_text_from_pdf_ocr(filepath: str) -> str:
+  102     import fitz
+  103     from PIL import Image
+  104     import numpy as np
+  105     doc = fitz.open(filepath)
+  106     all_texts = []
+  107     for page in doc:
+  108         mat = fitz.Matrix(300 / 72)
+  109         pix = page.get_pixmap(matrix=mat)
+  110         img = Image.open(io.BytesIO(pix.tobytes("png")))
+  111         result = _ocr().ocr(np.array(img), cls=True)
+  112         if result and result[0]:
+  113             all_texts.extend(line[1][0] for line in result[0])
+  114     return "\n".join(all_texts)
+  115
+  116 # ── JSON 배열 파싱 유틸 ───────────────────────────────────────────────────────
+  117
+  118 def _parse_json_array(raw: str, finish_reason: str = "") -> list:
+  119     """LLM 출력에서 JSON 배열 추출. finish_reason=length 잘림 복구 포함."""
+  120     if raw.startswith("```"):
+  121         lines = raw.splitlines()
+  122         raw = "\n".join(lines[1:-1] if lines and lines[-1].strip() == "```" else lines[1:]).strip()
+  123
+  124     if finish_reason == "length":
+  125         last_close = raw.rfind("}")
+  126         if last_close != -1:
+  127             raw = raw[:last_close + 1] + "]"
+  128
+  129     # 가장 긴 균형 잡힌 [...] 추출
+  130     depth = 0; start = -1; best = ""
+  131     for i, c in enumerate(raw):
+  132         if c == "[":
+  133             if depth == 0:
+  134                 start = i
+  135             depth += 1
+  136         elif c == "]":
+  137             depth -= 1
+  138             if depth == 0 and start >= 0:
+  139                 cand = raw[start:i + 1]
+  140                 if len(cand) > len(best):
+  141                     best = cand
+  142     raw = best if best else "[]"
+  143
+  144     try:
+  145         return json.loads(raw)
+  146     except json.JSONDecodeError:
+  147         data = []
+  148         for obj in re.findall(r"\{[^{}]*\}", raw, re.DOTALL):
+  149             try:
+  150                 data.append(json.loads(obj))
+  151             except json.JSONDecodeError:
+  152                 pass
+  153         return data
+  154
+  155 # ── 태그 추출/매핑 도구 ───────────────────────────────────────────────────────
+  156
+  157 def _extract_pid_tags(text: str, source_type: str) -> str:
+  158     system = (
+  159         "You are a P&ID (Piping and Instrumentation Diagram) expert.\n"
+  160         "Extract all instrument and equipment tags from the provided text.\n"
+  161         "Return ONLY a valid JSON array. Each element must have exactly these fields:\n"
+  162         '{"tagNo":"FCV-101","equipmentName":null,"instrumentType":"FCV",'
+  163         '"lineNumber":null,"pidDrawingNo":null,"confidence":0.95}\n'
+  164         "Rules:\n"
+  165         "- tagNo: any token matching [LETTERS]-[DIGITS] or [LETTERS]-[DIGITS]-[SUFFIX]\n"
+  166         "  Examples: FCV-101, P-10101, T-10100, VG-6203-15A-F1A-n, BT-6200, DP-10101\n"
+  167         "- instrumentType: leading letters of tagNo\n"
+  168         "- equipmentName: descriptive name if present near tag, else null\n"
+  169         "- lineNumber/pidDrawingNo: null unless explicitly associated\n"
+  170         "- confidence: 0.95 for clear tags, lower for ambiguous\n"
+  171         "- Output ONLY the JSON array, no markdown, no explanation.\n"
+  172         "- If no tags found, return: []\n"
+  173     )
+  174     truncated = text[:100000]
+  175     resp = _llm().chat.completions.create(
+  176         model=VLLM_MODEL,
+  177         messages=[
+  178             {"role": "system", "content": system},
+  179             {"role": "user", "content": f"Source: {source_type}\n\nText:\n{truncated}"},
+  180         ],
+  181         max_tokens=32768,
+  182         temperature=0.1,
+  183         extra_body={"chat_template_kwargs": {"enable_thinking": False}},
+  184     )
+  185     raw = (resp.choices[0].message.content or "").strip()
+  186     data = _parse_json_array(raw, resp.choices[0].finish_reason)
+  187     logging.info(f"extract_pid_tags source={source_type} count={len(data)}")
+  188     return json.dumps({"success": True, "count": len(data), "tags": data},
+  189                       ensure_ascii=False, indent=2)
+  190
+  191
+  192 def _match_pid_tags(pid_tags: list, experion_tags: list) -> str:
+  193     system = (
+  194         "You are a P&ID to Experion tag matching expert.\n"
+  195         "Match P&ID tags to Experion tags based on similarity.\n"
+  196         "Return ONLY a JSON array:\n"
+  197         '[{"pidTag":"FT-101","experionTag":"ft-101.pv","confidence":0.92},...]\n'
+  198         "- If no good match: confidence < 0.5, experionTag null\n"
+  199         "- Output ONLY the JSON array.\n"
+  200     )
+  201     resp = _llm().chat.completions.create(
+  202         model=VLLM_MODEL,
+  203         messages=[
+  204             {"role": "system", "content": system},
+  205             {"role": "user", "content": (
+  206                 f"P&ID Tags:\n{chr(10).join(pid_tags)}\n\n"
+  207                 f"Experion Tags:\n{chr(10).join(experion_tags)}"
+  208             )},
+  209         ],
+  210         max_tokens=16384,
+  211         temperature=0.1,
+  212         extra_body={"chat_template_kwargs": {"enable_thinking": False}},
+  213     )
+  214     raw = (resp.choices[0].message.content or "").strip()
+  215     data = _parse_json_array(raw, resp.choices[0].finish_reason)
+  216     return json.dumps({"success": True, "count": len(data), "mappings": data},
+  217                       ensure_ascii=False, indent=2)
+  218
+  219 # ── 도면 파싱 도구 ────────────────────────────────────────────────────────────
+  220
+  221 _TAG_EXTRACT_SYSTEM = (
+  222     "You are a P&ID (Piping and Instrumentation Diagram) expert.\n"
+  223     "Extract instrument and equipment tags from the provided text.\n"
+  224     "Return ONLY a JSON array:\n"
+  225     '[{"tagNo":"FIT-10115","equipmentName":"Flow Transmitter","instrumentType":"FIT",'
+  226     '"lineNumber":"L-101","pidDrawingNo":"P&ID-001","confidence":0.95},...]\n'
+  227     "Rules:\n"
+  228     "- tagNo: Instrument [Function]-[Number], Equipment [Type]-[Number]\n"
+  229     "- instrumentType: first 2-4 letters of tagNo\n"
+  230     "- equipmentName/lineNumber/pidDrawingNo: null if not present\n"
+  231     "- confidence: 0.0 to 1.0\n"
+  232     "- Output ONLY the JSON array, no markdown.\n"
+  233     "- If no tags found, return: []\n"
+  234 )
+  235
+  236
+  237 def _parse_pid_dxf(filepath: str) -> str:
+  238     text = _extract_text_from_dxf(filepath)
+  239     if not text.strip():
+  240         return json.dumps({"success": True, "text": "", "count": 0, "tags": []},
+  241                           ensure_ascii=False, indent=2)
+  242
+  243     resp = _llm().chat.completions.create(
+  244         model=VLLM_MODEL,
+  245         messages=[
+  246             {"role": "system", "content": _TAG_EXTRACT_SYSTEM},
+  247             {"role": "user", "content": f"Source: dxf\n\nText:\n{text[:12000]}"},
+  248         ],
+  249         max_tokens=4096,
+  250         temperature=0.1,
+  251     )
+  252     raw = (resp.choices[0].message.content or "").strip()
+  253     data = _parse_json_array(raw, resp.choices[0].finish_reason)
+  254     if not isinstance(data, list):
+  255         data = []
+  256     return json.dumps({"success": True, "text": text[:10000], "count": len(data), "tags": data},
+  257                       ensure_ascii=False, indent=2)
+  258
+  259
+  260 def _parse_pid_pdf(filepath: str, use_ocr: bool = True) -> str:
+  261     text = _extract_text_from_pdf_ocr(filepath) if use_ocr else _extract_text_from_pdf(filepath)
+  262     if not text.strip():
+  263         return json.dumps({"success": True, "text": "", "count": 0, "tags": []},
+  264                           ensure_ascii=False, indent=2)
+  265
+  266     resp = _llm().chat.completions.create(
+  267         model=VLLM_MODEL,
+  268         messages=[
+  269             {"role": "system", "content": _TAG_EXTRACT_SYSTEM},
+  270             {"role": "user", "content": f"Source: pdf\n\nText:\n{text[:12000]}"},
+  271         ],
+  272         max_tokens=4096,
+  273         temperature=0.1,
+  274     )
+  275     raw = (resp.choices[0].message.content or "").strip()
+  276     data = _parse_json_array(raw, resp.choices[0].finish_reason)
+  277     if not isinstance(data, list):
+  278         data = []
+  279     return json.dumps({"success": True, "text": text[:10000], "count": len(data), "tags": data},
+  280                       ensure_ascii=False, indent=2)
+  281
+  282
+  283 def _parse_pid_drawing(filepath: str) -> str:
+  284     ext = os.path.splitext(filepath)[1].lower()
+  285     if ext == ".dxf":
+  286         return _parse_pid_dxf(filepath)
+  287     elif ext == ".pdf":
+  288         return _parse_pid_pdf(filepath)
+  289     elif ext == ".dwg":
+  290         return json.dumps({
+  291             "success": False,
+  292             "error": "DWG 파일은 직접 파싱할 수 없습니다. DXF로 변환 후 사용하세요.",
+  293         }, ensure_ascii=False)
+  294     else:
+  295         return json.dumps({
+  296             "success": False,
+  297             "error": f"지원하지 않는 형식: {ext}. 지원: .dxf, .pdf",
+  298         }, ensure_ascii=False)
+  299
+  300 # ── 그래프 도구 ───────────────────────────────────────────────────────────────
+  301
+  302 async def _build_pid_graph_parallel(filepath: str) -> str:
+  303     from pipeline.extractor import PidGeometricExtractor
+  304     from pipeline.topology import PidTopologyBuilder
+  305     from pipeline.mapper import IntelligentMapper
+  306     from openai import AsyncOpenAI
+  307
+  308     os.makedirs(STORAGE_DIR, exist_ok=True)
+  309
+  310     # Phase 1: 기하 추출
+  311     extractor = PidGeometricExtractor(filepath)
+  312     geo_data_path = os.path.join(STORAGE_DIR, os.path.basename(filepath) + "_geo.json")
+  313     extractor.extract_and_save(geo_data_path)
+  314     with open(geo_data_path, "r", encoding="utf-8") as f:
+  315         geo_data = json.load(f)
+  316
+  317     # 시스템 태그 조회
+  318     system_tags: list[str] = []
+  319     try:
+  320         conn = _get_db_connection()
+  321         with conn.cursor() as cur:
+  322             cur.execute("SELECT tagname FROM realtime_table")
+  323             system_tags = [r[0] for r in cur.fetchall()]
+  324     except Exception as e:
+  325         logging.warning(f"시스템 태그 조회 실패: {e}")
+  326
+  327     # Phase 2: 1차 위상 빌더 (Mapper용 그래프)
+  328     builder = PidTopologyBuilder(geo_data)
+  329     builder.build_graph()
+  330
+  331     # Phase 3: 병렬 LLM 매핑
+  332     api_client = AsyncOpenAI(base_url=VLLM_BASE_URL, api_key="dummy")
+  333     mapper = IntelligentMapper(builder.G, system_tags, api_client=api_client)
+  334
+  335     transmitter_nodes = [
+  336         n for n, d in builder.G.nodes(data=True)
+  337         if d.get("value", "").upper() in {"FIT", "FT", "LT", "PT", "TE"}
+  338     ]
+  339     valve_nodes = [
+  340         n for n, d in builder.G.nodes(data=True)
+  341         if d.get("value", "").upper() in {"FCV", "LCV", "TCV", "PCV", "XV"}
+  342     ]
+  343     equipment_nodes = [
+  344         n for n, d in builder.G.nodes(data=True)
+  345         if d.get("type") not in {"TEXT", "LINE", "LWPOLYLINE"}
+  346     ]
+  347
+  348     extracted_results = await asyncio.gather(
+  349         mapper.extract_transmitters(transmitter_nodes),
+  350         mapper.extract_valves(valve_nodes),
+  351         mapper.extract_equipment(equipment_nodes),
+  352     )
+  353
+  354     # 매핑 결과 통합
+  355     all_mapped_tags = []
+  356     for res_dict in extracted_results:
+  357         for node_id, mapping in res_dict.items():
+  358             if mapping.resolved_tag != "UNKNOWN":
+  359                 node_data = builder.G.nodes[node_id]
+  360                 all_mapped_tags.append({
+  361                     "entity_id": node_id,
+  362                     "tagName": mapping.resolved_tag,
+  363                     "bbox": (
+  364                         node_data["bbox"].bounds
+  365                         if hasattr(node_data["bbox"], "bounds")
+  366                         else node_data["bbox"]
+  367                     ),
+  368                     "clean_value": mapping.resolved_tag,
+  369                 })
+  370
+  371     # Phase 4: 최종 위상 모델링 + 저장
+  372     final_builder = PidTopologyBuilder(geo_data, all_extracted_tags=all_mapped_tags)
+  373     final_builder.build_graph()
+  374
+  375     graph_id = os.path.basename(filepath).replace(".dxf", "_graph.json")
+  376     graph_path = os.path.join(STORAGE_DIR, graph_id)
+  377     final_builder.save_graph(graph_path)
+  378
+  379     logging.info(f"build_pid_graph_parallel graph_id={graph_id} "
+  380                  f"nodes={final_builder.G.number_of_nodes()} "
+  381                  f"edges={final_builder.G.number_of_edges()}")
+  382     return json.dumps({
+  383         "success": True,
+  384         "graph_id": graph_id,
+  385         "graph_path": graph_path,
+  386         "nodes": final_builder.G.number_of_nodes(),
+  387         "edges": final_builder.G.number_of_edges(),
+  388     }, ensure_ascii=False)
+  389
+  390
+  391 def _analyze_pid_impact(graph_id: str, start_node_id: str) -> str:
+  392     from pipeline.analyzer import PidAnalysisEngine
+  393
+  394     graph_path = os.path.join(STORAGE_DIR, graph_id)
+  395     mapping_path = graph_path.replace("_graph.json", "_mapping.json")
+  396     analyzer = PidAnalysisEngine(graph_path, mapping_path)
+  397     result = analyzer.analyze_impact(start_node_id)
+  398     return json.dumps(result, ensure_ascii=False, indent=2)
+  399
+  400 # ── 요청 디스패처 ─────────────────────────────────────────────────────────────
+  401
+  402 async def _dispatch(tool: str, params: dict) -> str:
+  403     try:
+  404         match tool:
+  405             case "extract_pid_tags":
+  406                 return _extract_pid_tags(**params)
+  407             case "match_pid_tags":
+  408                 return _match_pid_tags(**params)
+  409             case "parse_pid_dxf":
+  410                 return _parse_pid_dxf(**params)
+  411             case "parse_pid_pdf":
+  412                 return _parse_pid_pdf(**params)
+  413             case "parse_pid_drawing":
+  414                 return _parse_pid_drawing(**params)
+  415             case "build_pid_graph_parallel":
+  416                 return await _build_pid_graph_parallel(**params)
+  417             case "analyze_pid_impact":
+  418                 return _analyze_pid_impact(**params)
+  419             case _:
+  420                 return json.dumps({"success": False, "error": f"알 수 없는 도구: {tool}"},
+  421                                   ensure_ascii=False)
+  422     except Exception as e:
+  423         logging.error(f"dispatch error tool={tool}: {e}", exc_info=True)
+  424         return json.dumps({"success": False, "error": str(e)}, ensure_ascii=False)
+  425
+  426 # ── 종료 예약 ─────────────────────────────────────────────────────────────────
+  427
+  428 def _schedule_shutdown():
+  429     """응답 전송 완료 후 0.5초 뒤 프로세스 종료 예약."""
+  430     async def _do():
+  431         await asyncio.sleep(0.5)
+  432         os.kill(os.getpid(), signal.SIGTERM)
+  433     asyncio.create_task(_do())
+  434
+  435 # ── HTTP 엔드포인트 ───────────────────────────────────────────────────────────
+  436
+  437 @app.get("/health")
+  438 async def health():
+  439     return {"status": "ok"}
+  ├─────┼───────────────────────────────┼────────┤
+  │ 9   │ atexit 정리 훅                │ 쉬움   │
+  └─────┴───────────────────────────────┴────────┘
+
+  실제 워커 파일(worker/rag_worker.py 등)을 작성할까요, 아니면 계획서를 수정 반영하는 것으로 마무리할까요?
+
+✻ Brewed for 1m 12s
+
+※ recap: MCP 서버 병렬 아키텍처 설계 문서 진단 및 수정 방향 제시 완료. 다음 단계로 실제 워커 파일 작성 또는 계획서 수정
+  반영 중 선택이 필요합니다. (disable recaps in /config)
+
+❯ pid_worker.py를 만들어줘
+
+● 먼저 현재 디렉토리 구조를 확인하고 파일을 작성합니다.
+
+  Listed 1 directory (ctrl+o to expand)
+
+● Bash(mkdir -p /home/windpacer/projects/ExperionCrawler/mcp-server/worker)
+  ⎿  Done
+
+● Write(mcp-server/worker/pid_worker.py)
+  ⎿  Wrote 461 lines to mcp-server/worker/pid_worker.py
+       1 #!/usr/bin/env python3
+       2 """P&ID 파싱 전용 워커 프로세스
+       3
+       4 Usage: python pid_worker.py <port>
+       5
+       6 담당 도구:
+       7   extract_pid_tags, match_pid_tags,
+       8   parse_pid_dxf, parse_pid_pdf, parse_pid_drawing,
+       9   build_pid_graph_parallel, analyze_pid_impact
+      10 """
+     … +451 lines (ctrl+o to expand)
+
+● 문법 오류 없는지 확인합니다.
+
+● Bash(cd /home/windpacer/projects/ExperionCrawler/mcp-server && .venv/bin/python -m py_compile worker/pid_worker.py &&
+      echo "OK")
+  ⎿  OK
+
+● 문법 오류 없음. 완성된 파일을 확인합니다.
+
+  Read 1 file (ctrl+o to expand)
+
+● mcp-server/worker/pid_worker.py (461줄) 생성 완료. 진단에서 발견한 수정 사항이 모두 반영됐습니다.
+
+  ---
+  적용된 수정 목록
+
+  ┌──────────┬────────────────────────────────────────────────────────────────────────────────────┐
+  │   Fix    │                                        내용                                        │
+  ├──────────┼────────────────────────────────────────────────────────────────────────────────────┤
+  │ ✅ Fix 1 │ RAG 워커 혼입 없음 — P&ID 도구만 포함                                              │
+  ├──────────┼────────────────────────────────────────────────────────────────────────────────────┤
+  │ ✅ Fix 2 │ app = FastAPI() 선언                                                               │
+  ├──────────┼────────────────────────────────────────────────────────────────────────────────────┤
+  │ ✅ Fix 3 │ _schedule_shutdown() — asyncio.create_task로 0.5초 지연 후 SIGTERM, 응답 먼저 반환 │
+  ├──────────┼────────────────────────────────────────────────────────────────────────────────────┤
+  │ ✅ Fix 4 │ Request.json() 바디 파싱                                                           │
+  ├──────────┼────────────────────────────────────────────────────────────────────────────────────┤
+  │ ✅ Fix 7 │ /health 엔드포인트 추가 (메인 서버 헬스체크 루프용)                                │
+  ├──────────┼────────────────────────────────────────────────────────────────────────────────────┤
+  │ ✅ Fix 8 │ 워커는 stdout/stderr 제어 없음, 메인 서버에서 DEVNULL로 실행                       │
+  └──────────┴────────────────────────────────────────────────────────────────────────────────────┘
+
+  구조 요약
+
+  /execute          → 일반 요청 (워커 유지)
+  /execute/one_shot → P&ID 요청 후 자동 종료
+  /health           → 워커 준비 확인용
+
+  pipeline 모듈은 sys.path 조작으로 mcp-server/pipeline/을 자동 인식합니다. 실행: python worker/pid_worker.py 5004
--- a/mcp-server/worker/pid_worker_py_진단보고서.md
+++ b/mcp-server/worker/pid_worker_py_진단보고서.md
@@ -0,0 +1,284 @@
+# `pid_worker.py` 진단 보고서 (수정본)
+
+**작성일**: 2026-05-03  
+**파일 경로**: `mcp-server/worker/pid_worker.py`  
+**총 라인 수**: 609줄  
+**최종 검토자**: Roo  
+**검토일**: 2026-05-03
+
+---
+
+## 1. 파일 개요
+
+| 항목 | 내용 |
+|------|------|
+| 용도 | P&ID 도면 파싱 전용 MCP 워커 프로세스 (FastAPI 기반) |
+| 진입점 | `if __name__ == "__main__"` (560줄) |
+| 포트 | 기본 5004 (명령줄 인자로 변경 가능) |
+
+---
+
+## 2. 주요 기능 (7개 도구)
+
+| 도구 | 설명 | 동기/비동기 | 라인 |
+|------|------|-------------|------|
+| [`extract_pid_tags()`](#21-extract_pid_tags) | 텍스트에서 P&ID 태그 추출 | 동기 | 157 |
+| [`match_pid_tags()`](#22-match_pid_tags) | P&ID 태그 ↔ Experion 태그 매핑 | 동기 | 192 |
+| [`parse_pid_dxf()`](#23-parse_pid_dxf) | DXF 파일 파싱 | 동기 | 237 |
+| [`parse_pid_pdf()`](#24-parse_pid_pdf) | PDF 파일 파싱 (OCR 포함) | 동기 | 260 |
+| [`parse_pid_drawing()`](#25-parse_pid_drawing) | DXF/PDF 자동 분기 | 동기 | 283 |
+| [`build_pid_graph_parallel()`](#26-build_pid_graph_parallel) | 병렬 P&ID 그래프 구축 | 비동기 | 302 |
+| [`analyze_pid_impact()`](#27-analyze_pid_impact) | 그래프 영향도 분석 | 동기 | 391 |
+
+---
+
+## 3. 상세 기능 분석
+
+### 3.1 `extract_pid_tags()`
+
+```python
+def _extract_pid_tags(text: str, source_type: str) -> str:
+```
+
+- **기능**: LLM을 사용하여 입력 텍스트에서 P&ID 태그(FCV-101, P-10101 등) 추출
+- **LLM 호출**: `max_tokens=32768`, `temperature=0.1`
+- **반환 형식**: `{"success": true, "count": N, "tags": [...]}`
+
+### 3.2 `match_pid_tags()`
+
+```python
+def _match_pid_tags(pid_tags: list, experion_tags: list) -> str:
+```
+
+- **기능**: P&ID 태그 목록과 Experion 태그 목록을 유사도 기반으로 매핑
+- **LLM 호출**: `max_tokens=16384`, `temperature=0.1`
+- **반환 형식**: `{"success": true, "count": N, "mappings": [...]}`
+
+### 3.3 `parse_pid_dxf()`
+
+```python
+def _parse_pid_dxf(filepath: str) -> str:
+```
+
+- **기능**: DXF 파일에서 TEXT/MTEXT 엔티티 추출 → LLM으로 태그 파싱
+- **사용 라이브러리**: `ezdxf` (DXF 파싱), `openai` (LLM 호출)
+- **반환 형식**: `{"success": true, "text": "...", "count": N, "tags": [...]}`
+
+### 3.4 `parse_pid_pdf()`
+
+```python
+def _parse_pid_pdf(filepath: str, use_ocr: bool = True) -> str:
+```
+
+- **기능**: PDF 파일에서 텍스트 추출 (OCR 포함) → LLM으로 태그 파싱
+- **사용 라이브러리**: `fitz` (PyMuPDF), `paddleocr`, `PIL`, `numpy`
+- **OCR 옵션**: `use_ocr=True`이면 300 DPI로 이미지 변환 후 OCR
+
+### 3.5 `parse_pid_drawing()`
+
+```python
+def _parse_pid_drawing(filepath: str) -> str:
+```
+
+- **기능**: 확장자에 따라 `_parse_pid_dxf()` 또는 `_parse_pid_pdf()` 호출
+- **지원 형식**: `.dxf`, `.pdf`
+- **거부 형식**: `.dwg` (직접 파싱 불가)
+
+### 3.6 `build_pid_graph_parallel()`
+
+```python
+async def _build_pid_graph_parallel(filepath: str) -> str:
+```
+
+- **기능**: 4단계 파이프라인으로 P&ID 그래프 구축
+  1. **Phase 1**: `PidGeometricExtractor`로 기하 정보 추출
+  2. **Phase 2**: `PidTopologyBuilder`로 1차 그래프 구축
+  3. **Phase 3**: `IntelligentMapper`로 병렬 LLM 매핑 (3개 gather)
+  4. **Phase 4**: 최종 그래프 저장
+- **병렬 처리**: `asyncio.gather()`로 transmitters/valves/equipment 3개 LLM 호출 병렬 실행
+- **반환 형식**: `{"success": true, "graph_id": "...", "nodes": N, "edges": N}`
+
+### 3.7 `analyze_pid_impact()`
+
+```python
+def _analyze_pid_impact(graph_id: str, start_node_id: str) -> str:
+```
+
+- **기능**: 그래프에서 특정 노드 장애 시 영향도 분석
+- **사용 라이브러리**: `pipeline.analyzer.PidAnalysisEngine`
+- **반환 형식**: 분석 결과 JSON
+
+---
+
+## 4. 주요 문제점
+
+### 4.1 이벤트 루프 블로킹 문제 (HIGH) — ❌ **이미 해결됨**
+
+**보고서 오류**: 보고서가 인용한 코드는 구버전 기준입니다.
+
+**현재 코드** (406-417줄):
+```python
+async def _dispatch(tool: str, params: dict) -> str:
+    try:
+        match tool:
+            case "extract_pid_tags":
+                return await asyncio.to_thread(_extract_pid_tags, **params)  # ✅
+            case "match_pid_tags":
+                return await asyncio.to_thread(_match_pid_tags, **params)    # ✅
+            case "parse_pid_dxf":
+                return await asyncio.to_thread(_parse_pid_dxf, **params)      # ✅
+            case "parse_pid_pdf":
+                return await asyncio.to_thread(_parse_pid_pdf, **params)      # ✅
+            case "parse_pid_drawing":
+                return await asyncio.to_thread(_parse_pid_drawing, **params)  # ✅
+            case "analyze_pid_impact":
+                return await asyncio.to_thread(_analyze_pid_impact, **params) # ✅
+            case "build_pid_graph_parallel":
+                return await _build_pid_graph_parallel(**params)              # ✅ (이미 async)
+```
+
+**판정**: ✅ **문제 없음** — `asyncio.to_thread()`로 이미 수정됨
+
+---
+
+### 4.2 에러 핸들링 불균형 (MED) — ❌ **진단 오류**
+
+**보고서 오류**: 보고서는 `_extract_pid_tags()`에 try-catch가 없다고 지적했지만, `_dispatch()`가 이미 전체를 감싸고 있습니다.
+
+**현재 코드** (402-426줄):
+```python
+async def _dispatch(tool: str, params: dict) -> str:
+    try:
+        match tool:
+            # ... 모든 도구 호출
+    except Exception as e:
+        logging.error(f"dispatch error tool={tool}: {e}", exc_info=True)
+        return json.dumps({"success": False, "error": str(e)}, ensure_ascii=False)
+```
+
+**설계 의도**: 
+- 통합 예외 처리 패턴 — 개별 함수에 try-catch를 중복하면 예외가 두 번 잡혀 로그가 이중으로 찍히고, 에러 응답 형식도 제각각이 됩니다.
+- `_dispatch()`에서 통합 핸들링하는 것이 **의도적인 올바른 설계**입니다.
+
+**판정**: ❌ **진단 오류** — 의도적인 계층 분리 패턴을 결함으로 오판
+
+---
+
+### 4.3 메모리 고정 문제 (MED) — ❌ **설계 오류 아님**
+
+**보고서 오류**: `@lru_cache`로 `_llm()`과 `_ocr()`이 고정되어 메모리 증가한다고 지적했지만, 이는 **의도적인 올바른 설계**입니다.
+
+**현재 코드** (50-66줄):
+```python
+@lru_cache(maxsize=1)
+def _llm():
+    from openai import OpenAI
+    return OpenAI(base_url=VLLM_BASE_URL, api_key="dummy")
+
+@lru_cache(maxsize=1)
+def _ocr():
+    from paddleocr import PaddleOCR
+    use_gpu = os.environ.get("PADDLE_USE_GPU", "true").lower() == "true"
+    try:
+        return PaddleOCR(use_angle_cls=True, lang="korean", use_gpu=use_gpu, show_log=False)
+    except Exception:
+        if use_gpu:
+            os.environ["PADDLE_USE_GPU"] = "false"
+            return _ocr()
+        raise
+```
+
+**설계 의도**:
+- OpenAI 클라이언트는 내부에 `httpx` 커넥션 풀을 가지므로, 요청마다 새로 생성하면 오히려 비효율입니다.
+- 워커는 one-shot 모드이므로 실질적 영향은 적습니다.
+- `@lru_cache`는 **의도적인 올바른 설계**입니다.
+
+**판정**: ❌ **진단 오류** — `@lru_cache`가 올바른 패턴
+
+---
+
+### 4.4 중복 코드 (LOW) — ⚠️ **부분 오류**
+
+**보고서 오류**: `_extract_pid_tags()`와 `_parse_pid_dxf()`가 LLM 호출 패턴을 반복한다고 지적했지만, 이는 **의도적인 파라미터 차이**입니다.
+
+**실제 차이점**:
+
+| 항목 | `_extract_pid_tags()` | `_parse_pid_dxf()` |
+|------|----------------------|-------------------|
+| `max_tokens` | 32768 | 4096 |
+| `extra_body` | `{"chat_template_kwargs": {"enable_thinking": False}}` | 없음 |
+| 입력 크기 | 최대 100,000자 | 최대 12,000자 |
+
+**설명**:
+- `_extract_pid_tags()`는 긴 텍스트(100k)를 처리하고 `enable_thinking=False`로 최적화
+- `_parse_pid_dxf()`는 짧은 텍스트(12k)를 처리하고 기본 설정 사용
+- 이 차이를 공통 함수로 묶으면 파라미터 분기 로직이 오히려 복잡해집니다.
+
+**판정**: ⚠️ **부분 오류** — 유사해 보여도 다른 목적의 호출
+
+---
+
+### 4.5 설정 하드코딩 (LOW) — ✅ **유효한 지적**
+
+**현재 코드** (32-35줄):
+```python
+VLLM_BASE_URL        = "http://localhost:8000/v1"
+VLLM_MODEL           = "Qwen/Qwen3-Coder-Next-FP8"
+DB_CONNECTION_STRING = "postgresql://postgres:postgres@localhost:5432/iiot_platform"
+DB_TIMEOUT           = 10
+```
+
+**권장**: 환경 변수 또는 설정 파일로 분리
+
+```python
+import os
+VLLM_BASE_URL        = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")
+VLLM_MODEL           = os.environ.get("VLLM_MODEL", "Qwen/Qwen3-Coder-Next-FP8")
+DB_CONNECTION_STRING = os.environ.get("DB_CONNECTION_STRING", "postgresql://postgres:postgres@localhost:5432/iiot_platform")
+```
+
+**판정**: ✅ **유효한 지적**
+
+---
+
+## 5. 양호한 점
+
+| 항목 | 설명 |
+|------|------|
+| ✅ 병렬 처리 구현 | `build_pid_graph_parallel()`에서 `asyncio.gather()`로 3개 LLM 호출 병렬 실행 |
+| ✅ 일관된 JSON 응답 | 모든 도구가 `{"success": bool, ...}` 형식으로 응답 |
+| ✅ 한국어 OCR 지원 | PaddleOCR를 `lang="korean"`으로 설정 |
+| ✅ 자동 폴더 생성 | `os.makedirs(STORAGE_DIR, exist_ok=True)`로 디렉토리 자동 생성 |
+| ✅ 종료 신호 처리 | `_schedule_shutdown()`으로 응답 후 0.5초 뒤 SIGTERM 전송 |
+| ✅ 이벤트 루프 비블로킹 | `_dispatch()`에서 `asyncio.to_thread()`로 동기 함수 오프로드 |
+| ✅ 통합 예외 처리 | `_dispatch()`에서 전체 예외를 통합 처리 |
+
+---
+
+## 6. 최종 판정 요약
+
+| 항목 | 보고서 심각도 | 실제 판정 | 설명 |
+|------|--------------|----------|------|
+| 4.1 이벤트 루프 블로킹 | 🔴 HIGH | ✅ **이미 해결됨** | `asyncio.to_thread()` 적용됨 |
+| 4.2 에러 핸들링 | 🟡 MED | ❌ **진단 오류** | 통합 예외 처리가 의도적인 설계 |
+| 4.3 메모리 고정 | 🟡 MED | ❌ **설계 오류 아님** | `@lru_cache`가 올바른 패턴 |
+| 4.4 중복 코드 | 🟢 LOW | ⚠️ **부분 오류** | 의도적인 파라미터 차이 |
+| 4.5 설정 하드코딩 | 🟢 LOW | ✅ **유효** | 환경 변수로 분리 권장 |
+
+**총 결론**: 보고서의 5개 지적 중 **4개는 틀렸거나 이미 해결된 상태**입니다. 유일하게 유효한 지적은 **설정 하드코딩(4.5)** 뿐입니다.
+
+---
+
+## 7. 최종 개선 우선순위
+
+| 우선순위 | 항목 | 작업 |
+|----------|------|------|
+| 🟢 LOW | 설정 외부화 | 환경 변수 또는 설정 파일 사용 |
+
+---
+
+## 8. Roo의 최종 평가
+
+> "보고서의 5개 지적 중 4개는 틀렸거나 이미 해결된 상태입니다. 유일하게 유효한 지적은 설정 하드코딩(4.5)뿐입니다."
+
+**pid_worker.py는 이미 올바르게 설계되어 있으며, 추가 수정이 필요하지 않습니다.**
--- a/mcp-server/worker/rag_worker.py
+++ b/mcp-server/worker/rag_worker.py
@@ -0,0 +1,229 @@
+#!/usr/bin/env python3
+"""RAG 전용 워커 프로세스
+
+Usage: python rag_worker.py <port>
+
+담당 도구:
+  search_codebase, search_r530_docs, ask_iiot_llm, rag_query
+
+특징:
+  - Ollama Embedding + Qdrant 검색 + vLLM LLM 조합
+  - 메모리: ~200MB (워커 자체, vLLM 외부 서비스 사용 시)
+  - 생명주기: 메인 서버 종료 시까지 유지
+"""
+
+from __future__ import annotations
+import sys
+import os
+
+# mcp-server 디렉토리를 Python 경로에 추가 (pipeline 패키지 접근)
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+import logging
+import asyncio
+from functools import lru_cache
+
+from fastapi import FastAPI, Request
+import uvicorn
+import httpx
+
+# ── 설정 ─────────────────────────────────────────────────────────────────────
+
+OLLAMA_URL      = os.environ.get("OLLAMA_URL", "http://localhost:11434")
+QDRANT_URL      = os.environ.get("QDRANT_URL", "http://localhost:6333")
+VLLM_BASE_URL   = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")
+VLLM_MODEL      = os.environ.get("VLLM_MODEL", "Qwen3.6-27B-FP8")
+EMBED_MODEL     = os.environ.get("EMBED_MODEL", "nomic-embed-text")
+
+COL_CODEBASE    = os.environ.get("COL_CODEBASE", "ws-65f457145aee80b2")
+COL_OPC_DOCS    = os.environ.get("COL_OPC_DOCS", "experion-opc-docs")
+
+logging.basicConfig(
+    level=logging.INFO,
+    stream=sys.stderr,
+    format="%(asctime)s [rag_worker] %(levelname)s %(message)s",
+)
+
+app = FastAPI()
+
+# ── HTTP 클라이언트 싱글톤 ────────────────────────────────────────────────────
+
+@lru_cache(maxsize=1)
+def _get_http_client():
+    return httpx.AsyncClient(timeout=30)
+
+# ── 임베딩 (Ollama) ───────────────────────────────────────────────────────────
+
+async def _embed(text: str) -> list[float]:
+    """Ollama nomic-embed-text로 768-dim 벡터 생성."""
+    async with _get_http_client() as client:
+        resp = await client.post(
+            f"{OLLAMA_URL}/api/embeddings",
+            json={"model": EMBED_MODEL, "prompt": text},
+        )
+        resp.raise_for_status()
+        return resp.json()["embedding"]
+
+# ── Qdrant 검색 ──────────────────────────────────────────────────────────────
+
+async def _qdrant_search(collection: str, query_vector: list[float], top_k: int = 6) -> list[dict]:
+    """Qdrant에서 벡터 유사도 검색."""
+    async with _get_http_client() as client:
+        resp = await client.post(
+            f"{QDRANT_URL}/collections/{collection}/points/search",
+            json={
+                "vector": query_vector,
+                "limit": top_k,
+                "with_payload": True,
+            },
+        )
+        resp.raise_for_status()
+        return resp.json().get("result", [])
+
+# ── LLM (vLLM) ───────────────────────────────────────────────────────────────
+
+@lru_cache(maxsize=1)
+def _llm_client():
+    from openai import AsyncOpenAI
+    return AsyncOpenAI(base_url=VLLM_BASE_URL, api_key="dummy")
+
+async def _ask_llm(question: str, context: str = "") -> str:
+    """vLLM LLM으로 질문 응답."""
+    client = _llm_client()
+    
+    if context:
+        prompt = f"""주어진 컨텍스트를 바탕으로 질문에 답변하세요.
+
+컨텍스트:
+{context}
+
+질문:
+{question}
+
+답변:"""
+    else:
+        prompt = question
+    
+    response = await client.chat.completions.create(
+        model=VLLM_MODEL,
+        messages=[
+            {"role": "system", "content": "You are a helpful assistant."},
+            {"role": "user", "content": prompt},
+        ],
+        max_tokens=4096,
+        temperature=0.1,
+    )
+    return response.choices[0].message.content
+
+# ── RAG 도구 구현 ─────────────────────────────────────────────────────────────
+
+@app.get("/health")
+async def health():
+    """워커 헬스체크."""
+    return {"status": "ok"}
+
+@app.post("/execute")
+async def execute(request: Request):
+    """HTTP 요청을 MCP 도구 호출로 변환."""
+    body = await request.json()
+    tool = body["tool"]
+    params = body["params"]
+    
+    try:
+        if tool == "search_codebase":
+            result = await _search_codebase(**params)
+        elif tool == "search_r530_docs":
+            result = await _search_r530_docs(**params)
+        elif tool == "ask_iiot_llm":
+            result = await _ask_iiot_llm(**params)
+        elif tool == "rag_query":
+            result = await _rag_query(**params)
+        else:
+            return {"success": False, "error": f"Unknown tool: {tool}"}
+        
+        return result
+    except Exception as e:
+        logging.error(f"Error executing {tool}: {e}")
+        return {"success": False, "error": str(e)}
+
+async def _search_codebase(query: str, top_k: int = 6) -> str:
+    """소스코드 검색."""
+    query_vector = await _embed(query)
+    results = await _qdrant_search(COL_CODEBASE, query_vector, top_k)
+    
+    items = []
+    for hit in results:
+        payload = hit.get("payload", {})
+        items.append({
+            "score": hit.get("score", 0),
+            "file": payload.get("file", "unknown"),
+            "content": payload.get("content", "")[:500],
+        })
+    
+    return {
+        "success": True,
+        "count": len(items),
+        "items": items,
+    }
+
+async def _search_r530_docs(query: str, top_k: int = 5) -> str:
+    """Experion HS R530 공식 문서 검색."""
+    query_vector = await _embed(query)
+    results = await _qdrant_search(COL_OPC_DOCS, query_vector, top_k)
+    
+    items = []
+    for hit in results:
+        payload = hit.get("payload", {})
+        items.append({
+            "score": hit.get("score", 0),
+            "title": payload.get("title", "unknown"),
+            "content": payload.get("content", "")[:500],
+        })
+    
+    return {
+        "success": True,
+        "count": len(items),
+        "items": items,
+    }
+
+async def _ask_iiot_llm(question: str, context: str = "") -> str:
+    """IIoT/OPC UA 질문 응답."""
+    answer = await _ask_llm(question, context)
+    return {
+        "success": True,
+        "question": question,
+        "answer": answer,
+    }
+
+async def _rag_query(question: str, search_code: bool = False, search_docs: bool = True) -> str:
+    """통합 RAG 검색."""
+    contexts = []
+    
+    if search_code:
+        query_vector = await _embed(question)
+        code_results = await _qdrant_search(COL_CODEBASE, query_vector, 3)
+        for hit in code_results:
+            contexts.append(hit.get("payload", {}).get("content", ""))
+    
+    if search_docs:
+        query_vector = await _embed(question)
+        doc_results = await _qdrant_search(COL_OPC_DOCS, query_vector, 3)
+        for hit in doc_results:
+            contexts.append(hit.get("payload", {}).get("content", ""))
+    
+    context = "\n\n".join(contexts[:5])
+    answer = await _ask_llm(question, context)
+    
+    return {
+        "success": True,
+        "question": question,
+        "context_count": len(contexts),
+        "answer": answer,
+    }
+
+# ── 메인 ─────────────────────────────────────────────────────────────────────
+
+if __name__ == "__main__":
+    port = int(sys.argv[1]) if len(sys.argv) > 1 else 5002
+    logging.info(f"Starting RAG worker on port {port}")
+    uvicorn.run(app, host="0.0.0.0", port=port)