ExperionCrawler/.rooBackup/2026-05-02_pipeline_sync/Graph_Pipeline_Phase5.md

# 🔌 Graph Pipeline Phase 5: MCP Server Integration and High-Performance Parallel Architecture

This document describes how to integrate the Graph Pipeline designed in Phases 1 to 4 into the project's **Unified MCP Server (`mcp-server/server.py`)**. To address the latency and buffer problems that arise when processing large drawings, it applies a parallel architecture that combines the **distributed processing scheme** from `PID_Parser_Plan_Revision.md` with vLLM's **Continuous Batching**.

---

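## 🏗️ 1. Integrated Architecture Design

### 1.1 High-Performance Parallel End-to-End Data Flow
Instead of issuing a single sequential request, the pipeline moves to a **[preprocessing $\rightarrow$ parallel distributed extraction $\rightarrow$ consolidated postprocessing]** structure.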

`Frontend (UI)` $\rightarrow$ `C# Server (API)` $\rightarrow$ `MCP Server (Orchestrator)` $\rightarrow$ `Parallel Worker Tools (vLLM Batching)` $\rightarrow$ `Result Aggregator` $\rightarrow$ `C# Server`

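1.  **Request:** The user clicks the "start drawing analysis" button in the UI.
2.  **Preprocessing (Orchestrator):** The MCP server loads the DXF file, extracts the geometric data, and splits it by analysis target (Transmitter, Valve, Pump, and so on).
3.  **Parallel calls (Continuous Batching):**
    *   Using the split data, the orchestrator calls several MCP tools (or issues multiple requests to the same tool) **concurrently (asynchronously)**.
    *   The vLLM server groups these in-flight requests with **Continuous Batching**, which raises overall throughput compared with sending the requests one at a time.
4.  **Aggregation and storage (Aggregator):** The results returned by each distributed tool are merged to build the final topology graph, which is then saved to the DB.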
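
Continuous Batching can only group requests that are in flight at the same time, which is why step 3 insists on concurrent calls. The following is a minimal sketch of that point, not project code: `FakeLlmServer` is a hypothetical stand-in for the vLLM endpoint that simply records how many requests overlap.

```python
import asyncio


class FakeLlmServer:
    """Hypothetical stand-in for an LLM endpoint; tracks overlapping requests."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak_in_flight = 0

    async def complete(self, prompt: str) -> str:
        self.in_flight += 1
        self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        await asyncio.sleep(0.01)  # simulated inference latency
        self.in_flight -= 1
        return f"result for {prompt}"


CATEGORIES = ["transmitters", "valves", "gauges", "equipment", "pumps"]


async def run_sequential(server: FakeLlmServer) -> list:
    # Each request finishes before the next one starts: nothing to batch.
    return [await server.complete(c) for c in CATEGORIES]


async def run_concurrent(server: FakeLlmServer) -> list:
    # All requests are submitted before any of them completes.
    return await asyncio.gather(*(server.complete(c) for c in CATEGORIES))


def measure_peaks() -> tuple:
    """Return (peak overlap when sequential, peak overlap when concurrent)."""
    sequential, concurrent = FakeLlmServer(), FakeLlmServer()
    asyncio.run(run_sequential(sequential))
    asyncio.run(run_concurrent(concurrent))
    return sequential.peak_in_flight, concurrent.peak_in_flight
```

In the sequential variant the server never sees more than one request at a time, while the concurrent variant exposes all five categories at once, which is the condition a batching server needs.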

### 1.2 Role Assignment Within the MCP Server (Distributed Processing Model)
Following `PID_Parser_Plan_Revision.md`, the functionality is broken into fine-grained tools that can run in parallel.

| Category | MCP Tool / Module | Role | Parallelization Strategy |
|---|---|---|---|
| **Orchestrator** | `orchestrate_pid_pipeline` | Controls the overall process, splits the data, and aggregates results | Asyncio-based asynchronous control |
| **Worker 1** | `extract_transmitters` | Extracts FIT, FT, LT, PT, TE | vLLM batching request |
| **Worker 2** | `extract_valves` | Extracts FCV, LCV, TCV, PCV, XV | vLLM batching request |
| **Worker 3** | `extract_gauges` | Extracts PG, TG, LG | vLLM batching request |
| **Worker 4** | `extract_equipment` | Extracts Column, Tank, Filter, Drum, Heat Exchanger, etc. | vLLM batching request |
| **Worker 5** | `extract_pumps` | Extracts P-xxxx, VP-xxxx | vLLM batching request |
| **Analyzer** | `analyze_pid_impact` | Impact analysis on the constructed graph | Graph algorithm (CPU) |
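
One way the orchestrator could split text entities across the workers above is by tag prefix. The sketch below is illustrative only: the tag pattern (letters, an optional hyphen, then digits) and the rule that anything unrecognized falls through to `extract_equipment` are assumptions, not behavior specified in this plan.

```python
import re
from collections import defaultdict

# Prefix lists are taken from the table above; the routing itself is hypothetical.
PREFIX_TO_WORKER = {
    **dict.fromkeys(["FIT", "FT", "LT", "PT", "TE"], "extract_transmitters"),
    **dict.fromkeys(["FCV", "LCV", "TCV", "PCV", "XV"], "extract_valves"),
    **dict.fromkeys(["PG", "TG", "LG"], "extract_gauges"),
    **dict.fromkeys(["P", "VP"], "extract_pumps"),
}

# Assumed tag shape: leading letters, optional hyphen, then digits (e.g. "FIT-101").
TAG_PATTERN = re.compile(r"^([A-Z]+)-?\d+")


def route_tag(text: str) -> str:
    """Return the worker tool name that should handle this text entity."""
    match = TAG_PATTERN.match(text.strip().upper())
    if match is None:
        return "extract_equipment"  # assumption: free-text labels go to equipment
    return PREFIX_TO_WORKER.get(match.group(1), "extract_equipment")


def partition_by_worker(texts: list) -> dict:
    """Group text entities into one bucket per worker tool."""
    buckets = defaultdict(list)
    for text in texts:
        buckets[route_tag(text)].append(text)
    return dict(buckets)
```

Matching on the full letter prefix rather than the first character keeps `P-1001` (pump) apart from `PT-205` (transmitter), `PG101` (gauge), and `PCV-12` (valve).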

---

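## 💻 2. MCP Server Integration Implementation Guide

### 2.1 Asynchronous Parallel Processing Design (Asyncio + vLLM Batching)
In the `FastMCP` environment, `asyncio.gather` is used to call the extraction tools concurrently, so that vLLM receives the requests together and Continuous Batching can take effect.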

```python
# mcp-server/server.py integration design (conceptual code)
# Assumes `mcp` (the FastMCP instance), PidGeometricExtractor, PidTopologyBuilder,
# flatten_results, and the extract_*_async workers are defined elsewhere in the project.
import asyncio
import json
import os

import networkx as nx

async def run_parallel_extraction(geo_data):
    """
    Call the per-category extraction tools concurrently so that vLLM can batch them.
    """
    # One coroutine per category; each worker prepares its own prompt and data
    tasks = [
        extract_transmitters_async(geo_data),
        extract_valves_async(geo_data),
        extract_gauges_async(geo_data),
        extract_equipment_async(geo_data),
        extract_pumps_async(geo_data)
    ]

    # Submit all requests at once so vLLM can batch them internally
    results = await asyncio.gather(*tasks)
    return results

@mcp.tool()
async def build_pid_graph_parallel(filepath: str) -> str:
    """
    P&ID graph construction tool that applies the distributed processing scheme.
    """
    # 1. Preprocessing (Phase 1)
    extractor = PidGeometricExtractor(filepath)
    geo_data = extractor.extract_all()

    # 2. Parallel distributed extraction (uses vLLM batching)
    # vLLM groups the LLM requests that the worker tools send concurrently
    extracted_parts = await run_parallel_extraction(geo_data)

    # 3. Result aggregation and topology modeling (Phase 2)
    all_tags = flatten_results(extracted_parts)
    builder = PidTopologyBuilder(geo_data, all_tags)
    builder.build_graph()

    # 4. Save as GraphML; the extension now matches the format actually written
    stem = os.path.splitext(os.path.basename(filepath))[0]
    graph_id = f"{stem}_graph.graphml"
    nx.write_graphml(builder.G, f"storage/{graph_id}")

    return json.dumps({"success": True, "graph_id": graph_id, "nodes": builder.G.number_of_nodes()})
```
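
With a plain `asyncio.gather`, one failing worker raises and the results of the other four are lost. A possible aggregation step that tolerates partial failure is sketched below. `gather_worker_results` is a hypothetical helper, and the assumption that each worker returns a list of tag strings is not stated in this plan.

```python
import asyncio


async def gather_worker_results(workers: dict) -> tuple:
    """Run named worker coroutines concurrently.

    Returns (tags, failures): a flat list of all extracted tags, and a dict
    mapping each failed worker's name to a description of its error.
    """
    names = list(workers)
    outcomes = await asyncio.gather(*workers.values(), return_exceptions=True)

    tags, failures = [], {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            failures[name] = repr(outcome)  # keep going with the other workers
        else:
            tags.extend(outcome)
    return tags, failures


# --- Demo with fake workers (stand-ins for the real extraction tools) ---
async def _fake_worker(items: list) -> list:
    return items


async def _failing_worker() -> list:
    raise RuntimeError("LLM timeout")


def demo() -> tuple:
    async def main() -> tuple:
        workers = {
            "extract_transmitters": _fake_worker(["FIT-101"]),
            "extract_valves": _fake_worker(["FCV-201"]),
            "extract_pumps": _failing_worker(),
        }
        return await gather_worker_results(workers)

    return asyncio.run(main())
```

Whether a partially extracted graph should be saved or the whole run rejected is a design decision this sketch leaves to the caller, which receives the failures alongside the tags.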

### 2.2 Interface with the C# Server (Using `McpClient`)
The C# server calls the tools above through `src/Infrastructure/Mcp/McpClient.cs`.

```csharp
// src/Core/Application/Services/PidGraphService.cs (new service)
public async Task<ImpactResult> GetImpactAnalysisAsync(string graphId, string nodeId)
{
    var request = new McpToolRequest {
        ToolName = "analyze_pid_impact",
        Arguments = new { graph_id = graphId, start_node_id = nodeId }
    };

    var jsonResponse = await _mcpClient.CallToolAsync(request);
    return JsonSerializer.Deserialize<ImpactResult>(jsonResponse);
}
```
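
On the Python side, `analyze_pid_impact` receives the `graph_id` and `start_node_id` arguments shown above. This plan only says the analyzer is a CPU-bound graph algorithm, so the following is a sketch under the assumption that "impact" means every node reachable downstream of the start node. It uses a plain edge list instead of the stored graph to stay self-contained, and `downstream_impact` is a hypothetical name.

```python
from collections import deque
from typing import Optional


def downstream_impact(edges: list, start_node_id: str,
                      max_depth: Optional[int] = None) -> dict:
    """Breadth-first search over directed edges.

    Returns {node_id: hop_distance} for every node reachable from
    start_node_id, excluding the start node itself.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    distances = {start_node_id: 0}
    queue = deque([start_node_id])
    while queue:
        node = queue.popleft()
        if max_depth is not None and distances[node] >= max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)

    distances.pop(start_node_id)
    return distances


# Illustrative edges only; these tags are not taken from a real drawing.
EXAMPLE_EDGES = [
    ("P-1001", "FCV-201"),
    ("FCV-201", "FIT-101"),
    ("FIT-101", "T-01"),
    ("LT-301", "T-01"),
]
```

Returning hop distances rather than a bare node set lets the caller rank affected items by proximity, and a shape like this could be serialized to JSON for the C# `ImpactResult` if that type is defined to match.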

---

## 🛠️ 3. Program Structure and Deployment Strategy

### 3.1 Extended Directory Structure
```text
mcp-server/
├── server.py              # Main MCP server (tool definitions)
├── pipeline/              # Core Graph Pipeline logic (Phases 1 to 4)
│   ├── __init__.py
│   ├── extractor.py       # Phase 1: Geometric Extraction
│   ├── topology.py        # Phase 2: Topology Modeling
│   ├── mapper.py          # Phase 3: Intelligent Mapping
│   └── analyzer.py        # Phase 4: Impact Analysis
└── storage/               # Storage for generated graph files (.graphml)
```

### 3.2 Execution Process
1.  **Start the MCP server:** `python mcp-server/server.py --http` (port 5001)
2.  **Start the C# server:** `dotnet run` (port 5000)
3.  **Communication:** C# server $\xrightarrow{HTTP/JSON}$ MCP server $\xrightarrow{Python\ Libs}$ result returned.
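
For reference when debugging the HTTP/JSON hop, MCP tool invocations are JSON-RPC 2.0 messages using the `tools/call` method. The sketch below only builds such a request body. The helper name `build_tool_call` is hypothetical, and the endpoint path, headers, and how `McpClient` actually serializes its requests are not covered in this document.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing JSON-RPC ids


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Example: the request the impact-analysis call would correspond to.
# The graph id and node id values here are placeholders.
example_body = build_tool_call(
    "analyze_pid_impact",
    {"graph_id": "example_graph.graphml", "start_node_id": "P-1001"},
)
```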

---

## 🚀 4. Definition of Done

- [ ] Are the core tools such as `build_pid_graph` and `analyze_pid_impact` defined in `mcp-server/server.py`?
- [ ] Has the Phase 1 to 4 Python logic been structured and integrated as modules under `mcp-server/pipeline/`?
- [ ] Can the C# `McpClient` call the MCP server's graph analysis tools and receive the results?
- [ ] Is the **end-to-end pipeline** complete, from drawing upload $\rightarrow$ graph generation $\rightarrow$ tag mapping $\rightarrow$ impact analysis?
- [ ] Does every step run reliably with `json_response=True` and `stateless_http=True` set?