ExperionCrawler/P&ID_병렬LLM_아키텍처_개선안.md

# Parallel LLM Architecture Improvement Proposal for P&ID Drawing Parsing

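## 1. Analysis of Existing Problems

### 1.1 Bottlenecks in the Current Structure
| Phase | Issue | Severity |
|-------|-------|----------|
| Phase 1 | ezdxf processes about 28,000 entities | 0.58 s (acceptable) |
| Phase 2 | O(n²) node merging | Timeout (severe) |
| Phase 3 | Sequential LLM API calls | Unpredictable latency |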

### 1.2 The Successful Parallel Processing Structure in test_dxf_extract_pid*.py

```python
# Structure shared by test_dxf_extract_pid1.py, pid2.py, and pid3.py
chunks = [
    {
        'name': 'Field Instruments - Sensors',
        'system': 'Extract sensor tags only...',
        'user': 'Extract ALL tags of FT, FIT, LT, PT...'
    },
    {
        'name': 'Field Instruments - Valves',
        'system': 'Extract valve tags only...',
        'user': 'Extract ALL tags of FCV, TCV, LCV...'
    },
    {
        'name': 'System Tags',
        'system': 'Extract system tags only...',
        'user': 'Extract ALL tags of LI, PI, TI...'
    }
]

# This loop handles the chunks one at a time, but the requests are independent and could run concurrently
for chunk in chunks:
    resp = llm.chat.completions.create(...)
```

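**Key findings**:
- **Chunk-level splitting**: prompts are separated by tag type
- **Concurrent execution is possible**: each chunk is independent, so the chunks can run in parallel
- **Better use of LLM resources**: concurrent requests let vLLM batch work across its tensor-parallel GPUs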

---

## 2. Parallel LLM Processing Architecture Design

### 2.1 Overall Pipeline Structure (Proposed)

```
┌─────────────────────────────────────────────────────────────────────┐
│            P&ID Drawing Parsing Pipeline (Parallel LLM)             │
└─────────────────────────────────────────────────────────────────────┘

Phase 1: Geometric extraction (ezdxf)
├─ Load DXF file (0.84 s)
├─ Compute per-entity BBox (0.58 s)
└─ Result: 28,257 GeometricEntity objects

Phase 2: Topology builder (spatial index + parallel LLM)
├─ Build spatial index (R-tree)
├─ Merge nodes (O(n log n))
└─ Result: NetworkX graph

Phase 3: Intelligent mapping (parallel LLM)
├─ Split into chunks by tag type
│  ├─ Sensor Tags (FT, FIT, LT, PT, TE, ...)
│  ├─ Valve Tags (FCV, TCV, LCV, PCV, XV, ...)
│  ├─ Equipment Tags (Pump, Tank, Heat Exchanger)
│  └─ System Tags (FICQ, TICA, PICA, ...)
│
├─ Parallel LLM execution (4 chunks at once)
│  ├─ LLM Worker 1: Sensor Tags → 100 tags
│  ├─ LLM Worker 2: Valve Tags → 80 tags
│  ├─ LLM Worker 3: Equipment Tags → 50 tags
│  └─ LLM Worker 4: System Tags → 120 tags
│
└─ Result: 350 mapped tags
```

### 2.2 Parallel LLM Worker Structure

```python
# Parallel LLM worker
import asyncio
from typing import List, Dict, Any
from openai import AsyncOpenAI

class ParallelLLMWorker:
    def __init__(self, api_client: AsyncOpenAI, max_concurrent: int = 4):
        self.client = api_client
        self.max_concurrent = max_concurrent
        self.semaphore = asyncio.Semaphore(max_concurrent)

    async def process_chunk(self, chunk: Dict[str, Any]) -> List[Dict[str, Any]]:
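        """Process one chunk (async, with a semaphore capping concurrency)."""
        async with self.semaphore:
            system = chunk['system']
            # Assumes the chunk carries a 'text' field and its 'user' template
            # contains a '{text}' placeholder; otherwise this raises KeyError.
            user = chunk['user'].format(text=chunk['text'])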

            response = await self.client.chat.completions.create(
                model='Qwen/Qwen3-Coder-Next-FP8',
                messages=[
                    {'role': 'system', 'content': system},
                    {'role': 'user', 'content': user},
                ],
                max_tokens=65536,
                temperature=0.1,
            )

            return self._parse_response(response)

    async def process_all_chunks(self, chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Process all chunks concurrently."""
        tasks = [self.process_chunk(chunk) for chunk in chunks]
        results = await asyncio.gather(*tasks)

        # Merge results, keeping the first occurrence of each tagNo
        all_tags = []
        seen_tags = set()
        for tags in results:
            for tag in tags:
                tag_no = tag.get('tagNo')
                if tag_no and tag_no not in seen_tags:
                    seen_tags.add(tag_no)
                    all_tags.append(tag)
        return all_tags
```
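The worker above calls `self._parse_response(response)` without showing it. The following is a minimal sketch of what such a parser might look like, assuming the model replies with a JSON array of objects that each carry a `tagNo` key, possibly wrapped in a Markdown code fence. The helper name `parse_tag_list` and the reply shape are assumptions for illustration, not the project's actual implementation.

```python
import json
import re
from typing import Any, Dict, List

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to keep this listing clean


def parse_tag_list(content: str) -> List[Dict[str, Any]]:
    """Parse a model reply assumed to contain a JSON array of tag objects."""
    # Strip an optional Markdown code fence around the JSON payload.
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, content, flags=re.DOTALL)
    payload = match.group(1) if match else content
    try:
        data = json.loads(payload.strip())
    except json.JSONDecodeError:
        return []  # Treat unparseable output as "no tags" instead of raising.
    if not isinstance(data, list):
        return []
    # Keep only objects that carry the 'tagNo' key used by the merge step.
    return [item for item in data if isinstance(item, dict) and item.get("tagNo")]
```

Inside the worker, `_parse_response` could then be a thin wrapper that passes `response.choices[0].message.content` to this function. Returning an empty list on malformed output keeps one bad reply from aborting the merge, at the cost of hiding the failure unless it is logged separately.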

---

## 3. Detailed Implementation Plan

### 3.1 Phase 1: Geometric Extraction (No Change)

```python
# pid_geometric_extractor.py (used as-is)
import ezdxf

class PidGeometricExtractor:
    def __init__(self, file_path: str):
        self.doc = ezdxf.readfile(file_path)
        self.msp = self.doc.modelspace()

    def extract_and_save(self, output_path: str):
        results = []
        for entity in self.msp:
            bbox_obj = self.get_bbox(entity)
            # ... extraction logic
        return results
```

### 3.2 Phase 2: Topology Builder (Introducing a Spatial Index)

```python
# pid_topology_builder.py (proposed)
from typing import Any, Dict, List

import networkx as nx
from rtree import index

class PidTopologyBuilder:
    def __init__(self, geometric_data: List[Dict[str, Any]]):
        self.data = geometric_data
        self.G = nx.DiGraph()

    def build_graph(self):
        # 1. Build the spatial index
        self._build_spatial_index()

        # 2. Merge nodes (using the R-tree)
        self._merge_nodes_spatial()

        # 3. Link tags to equipment
        self._link_tags_to_equipment()

        # 4. Link pipes
        self._link_pipes()

    def _build_spatial_index(self):
        """Build the R-tree spatial index."""
        p = index.Property()
        self.idx = index.Index(properties=p)
        for i, item in enumerate(self.data):
            bbox = item['bbox']
            self.idx.insert(i, (
                bbox['min_x'], bbox['min_y'],
                bbox['max_x'], bbox['max_y']
            ))

    def _merge_nodes_spatial(self):
        """Merge nodes using the spatial index (O(n log n))."""
        merge_threshold = 2.0
        merged = []
        visited = set()

        for i, item in enumerate(self.data):
            if i in visited:
                continue

            bbox = item['bbox']
            # Query only nearby nodes: the bbox expanded by merge_threshold on every side
            neighbors = list(self.idx.intersection((
                bbox['min_x'] - merge_threshold,
                bbox['min_y'] - merge_threshold,
                bbox['max_x'] + merge_threshold,
                bbox['max_y'] + merge_threshold
            )))

            # ... merge logic
```
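The merge step above stops at the neighbor query and leaves the merge logic out. As an illustration of the general idea (compare only nearby boxes, then group connected ones), here is a self-contained sketch that swaps the R-tree for a plain uniform grid from the standard library and groups boxes with union-find. It is not the project's merge logic; `merge_groups`, `boxes_within`, and the `cell` size are hypothetical names and values chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def boxes_within(a: BBox, b: BBox, gap: float) -> bool:
    """True if box a, expanded by `gap` on every side, intersects box b."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2]
            and a[1] - gap <= b[3] and b[1] - gap <= a[3])


def merge_groups(boxes: List[BBox], gap: float = 2.0, cell: float = 50.0) -> List[List[int]]:
    """Group indices of boxes that lie within `gap` of each other, transitively."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Bucket every box into each grid cell touched by its expanded extent.
    grid: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        for gx in range(int((x0 - gap) // cell), int((x1 + gap) // cell) + 1):
            for gy in range(int((y0 - gap) // cell), int((y1 + gap) // cell) + 1):
                grid[(gx, gy)].append(i)

    # Only boxes sharing a cell are compared, instead of every possible pair.
    for members in grid.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                if boxes_within(boxes[i], boxes[j], gap):
                    parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = defaultdict(list)
    for i in range(len(boxes)):
        groups[find(i)].append(i)
    return sorted(groups.values())


groups = merge_groups([(0, 0, 1, 1), (2, 0, 3, 1), (100, 100, 101, 101), (4.5, 0, 5, 1)])
```

Boxes 0, 1, and 3 end up in one group because each is within the threshold of its neighbor, even though boxes 0 and 3 are not directly within range of each other. A grid behaves poorly when a few entities span very large extents (long pipe lines land in many cells), which is one reason an R-tree, as proposed above, may be the better fit for real drawings.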

### 3.3 Phase 3: Parallel LLM Mapping (New Implementation)

```python
# pid_parallel_llm_mapper.py (new file)
import asyncio
from typing import List, Dict, Any
from openai import AsyncOpenAI
from rapidfuzz import process, fuzz

class ParallelLLMMapper:
    def __init__(self, graph, system_tags: List[str], api_client: AsyncOpenAI,
                 max_concurrent: int = 4):
        self.graph = graph
        self.system_tags = system_tags
        self.client = api_client
        self.max_concurrent = max_concurrent
        self.semaphore = asyncio.Semaphore(max_concurrent)

    def create_chunks(self, node_ids: List[str]) -> List[Dict[str, Any]]:
        """Split nodes into chunks by tag type."""
        # Buckets for each tag type
        sensors = []
        valves = []
        equipment = []
        system = []

        for node_id in node_ids:
            node_data = self.graph.nodes[node_id]
            tag_text = node_data.get('value', '').upper()

            # Classify by tag type (substring match, so the order of these checks matters)
            if any(x in tag_text for x in ['FT', 'FIT', 'LT', 'PT', 'TE', 'PG', 'LG', 'TG']):
                sensors.append(node_id)
            elif any(x in tag_text for x in ['FCV', 'TCV', 'LCV', 'PCV', 'XV', 'FV', 'LV', 'PV', 'TV']):
                valves.append(node_id)
            elif any(x in tag_text for x in ['PUMP', 'TANK', 'HEAT', 'EXCHANGER']):
                equipment.append(node_id)
            else:
                system.append(node_id)

        # Build one chunk per non-empty bucket
        chunks = []
        if sensors:
            chunks.append({
                'name': 'Sensors',
                'node_ids': sensors,
                'system': 'You are a P&ID expert. Extract sensor tags only.',
                'user': 'Extract sensor tags: {tags}'
            })
        if valves:
            chunks.append({
                'name': 'Valves',
                'node_ids': valves,
                'system': 'You are a P&ID expert. Extract valve tags only.',
                'user': 'Extract valve tags: {tags}'
            })
        if equipment:
            chunks.append({
                'name': 'Equipment',
                'node_ids': equipment,
                'system': 'You are a P&ID expert. Extract equipment tags only.',
                'user': 'Extract equipment tags: {tags}'
            })
        if system:
            chunks.append({
                'name': 'System',
                'node_ids': system,
                'system': 'You are a P&ID expert. Extract system tags only.',
                'user': 'Extract system tags: {tags}'
            })

        return chunks

    async def process_chunk(self, chunk: Dict[str, Any]) -> Dict[str, Any]:
        """Process one chunk (async, guarded by the semaphore)."""
        async with self.semaphore:
            node_ids = chunk['node_ids']
            tag_texts = [self.graph.nodes[nid]['value'] for nid in node_ids]

            # Extract the top candidate system tags for each tag with RapidFuzz
            candidates_list = []
            for tag_text in tag_texts:
                candidates = process.extract(tag_text, self.system_tags, limit=5)
                candidates_list.append(candidates)

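            # Build the LLM prompt; each line carries the node ID so the reply can be keyed by it
            prompt = f"""
            {chunk['system']}
            Map each of the following tags to a system tag:
            {chr(10).join(f'{nid}: {t} -> {c}' for nid, t, c in zip(node_ids, tag_texts, candidates_list))}

            Respond in JSON format:
            {{"node_id": "resolved_tag", ...}}
            """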

            response = await self.client.chat.completions.create(
                model='Qwen/Qwen3-Coder-Next-FP8',
                messages=[{'role': 'user', 'content': prompt}],
                max_tokens=65536,
                temperature=0.1,
            )

            return self._parse_response(response)

    async def process_all_chunks(self, chunks: List[Dict[str, Any]]) -> Dict[str, Any]:
        """Process all chunks concurrently."""
        tasks = [self.process_chunk(chunk) for chunk in chunks]
        results = await asyncio.gather(*tasks)

        # Merge the per-chunk mappings into one dictionary
        merged = {}
        for result in results:
            merged.update(result)
        return merged
```
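`create_chunks` classifies tags with substring checks, so a label such as `FILTER F-3` lands in the sensor bucket because it contains `LT`. One possible refinement, sketched below, is to match only the leading letter code of a tag. The code lists are taken from the block above; the function name `classify_tag` and the assumed tag shape (letters, optional separator, then a digit) are illustrative assumptions and may not fit every drawing.

```python
import re

SENSOR_CODES = {"FT", "FIT", "LT", "PT", "TE", "PG", "LG", "TG"}
VALVE_CODES = {"FCV", "TCV", "LCV", "PCV", "XV", "FV", "LV", "PV", "TV"}
EQUIPMENT_WORDS = {"PUMP", "TANK", "HEAT", "EXCHANGER"}

# Leading letters of a tag such as "FIT-101" or "FCV 2003A" (assumed tag shape).
_CODE_RE = re.compile(r"^\s*([A-Z]+)[\s\-_]*\d")


def classify_tag(text: str) -> str:
    """Return 'sensor', 'valve', 'equipment', or 'system' for a tag label."""
    upper = text.upper()
    match = _CODE_RE.match(upper)
    code = match.group(1) if match else None
    if code in SENSOR_CODES:
        return "sensor"
    if code in VALVE_CODES:
        return "valve"
    # Equipment is recognized by whole words rather than by a letter code.
    if any(word in EQUIPMENT_WORDS for word in re.findall(r"[A-Z]+", upper)):
        return "equipment"
    return "system"


labels = ["FIT-101", "fcv 2003A", "HEAT EXCHANGER E-101", "FILTER F-3", "FICQ-201"]
kinds = [classify_tag(label) for label in labels]
```

With this rule, `FILTER F-3` falls through to the system bucket instead of being treated as a sensor, while `FICQ-201` is still routed to the system chunk as in the original design.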

---

## 4. Performance Estimates

### 4.1 Phase 1: Geometric Extraction
- **Current**: 1.4 s
- **After improvement**: 1.4 s (no change)

### 4.2 Phase 2: Topology Builder
- **Current**: timeout (O(n²))
- **After improvement**: 2-3 s (R-tree, O(n log n))
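
To give a feel for the gap between the two complexity classes at this drawing's size, the sketch below counts the pairwise comparisons an all-pairs merge would need and compares them with an `n log2 n` figure. The second number is only an order-of-magnitude proxy: it ignores constant factors, and real R-tree query cost depends on how the entities are distributed.

```python
import math


def pairwise_comparisons(n: int) -> int:
    """Number of distinct pairs an all-pairs (O(n^2)) merge has to examine."""
    return n * (n - 1) // 2


def indexed_lookups(n: int) -> float:
    """Rough n * log2(n) proxy for index-backed neighbor queries."""
    return n * math.log2(n)


n = 28_257  # entity count reported for Phase 1
all_pairs = pairwise_comparisons(n)
indexed = indexed_lookups(n)
print(f"all pairs: {all_pairs:,}")
print(f"n log2 n:  {indexed:,.0f}")
print(f"ratio:     {all_pairs / indexed:,.0f}x")
```

The all-pairs count comes to roughly 400 million box comparisons, which is consistent with a pure-Python merge loop timing out.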

### 4.3 Phase 3: Parallel LLM Mapping
- **Current**: unpredictable (sequential API calls)
- **After improvement**: 5-10 s (4 chunks processed in parallel)

**Expected speedup**:
- Phase 2: 100x or more (timeout → 2-3 s)
- Phase 3: 3-5x (sequential → parallel)

---

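## 5. Implementation Priorities

| Priority | Task | Estimated Time | Impact |
|----------|------|----------------|--------|
| 1 | Introduce the R-tree spatial index | 1 day | HIGH |
| 2 | Implement the parallel LLM worker | 1 day | HIGH |
| 3 | Integrate Phases 2-3 | 0.5 day | MEDIUM |
| 4 | Testing and benchmarking | 0.5 day | LOW |

**Total estimated time**: 3 days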

---

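## 6. Reference: Why test_dxf_extract_pid*.py Worked

### 6.1 Chunk-Level Splitting
- Separating prompts by tag type makes **deliberate parallelization** possible
- Each chunk is independent, so **failures can be isolated**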
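
Failure isolation does not come for free with `asyncio.gather(*tasks)`: by default the first exception propagates to the caller and the other chunks' results are lost to it. A minimal sketch of one way to isolate failures is to pass `return_exceptions=True` and separate successes from errors afterwards. The coroutine `fake_llm_call` below is a stand-in that simulates an API error; it is not part of the project.

```python
import asyncio
from typing import Dict, List


async def fake_llm_call(name: str) -> List[str]:
    """Stand-in for one chunk's LLM request; the 'Valves' chunk fails on purpose."""
    await asyncio.sleep(0.01)
    if name == "Valves":
        raise RuntimeError("simulated API error")
    return [f"{name}-tag"]


async def run_isolated(names: List[str]) -> Dict[str, object]:
    # return_exceptions=True turns a raised exception into a result value,
    # so one failing chunk does not discard the others.
    results = await asyncio.gather(
        *(fake_llm_call(name) for name in names), return_exceptions=True
    )
    return dict(zip(names, results))


outcome = asyncio.run(run_isolated(["Sensors", "Valves", "System"]))
succeeded = {k: v for k, v in outcome.items() if not isinstance(v, Exception)}
failed = [k for k, v in outcome.items() if isinstance(v, Exception)]
```

Here `failed` names the chunks that could be retried on their own, while `succeeded` can be merged right away.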

### 6.2 Leveraging vLLM Tensor Parallelism
- The `Qwen/Qwen3-Coder-Next-FP8` model can be served across **8 GPUs**
- Running 4 chunks at once is intended to **keep all GPUs as busy as possible**

### 6.3 Asynchronous Processing
- `asyncio.gather()` runs multiple chunks concurrently
- Each chunk runs inside `async with semaphore`, which caps the degree of parallelism
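
The interaction between `asyncio.gather()` and the semaphore can be checked without calling a real model. The sketch below launches more simulated requests than the semaphore allows and records the peak number in flight. `measure_peak` and the sleep-based latency are illustrative stand-ins, not project code.

```python
import asyncio


async def measure_peak(num_chunks: int, max_concurrent: int) -> int:
    """Run simulated requests and return the highest number in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def worker() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # simulated request latency
            in_flight -= 1

    # gather() schedules every worker immediately; the semaphore decides
    # how many of them are allowed past `async with` at the same time.
    await asyncio.gather(*(worker() for _ in range(num_chunks)))
    return peak


peak = asyncio.run(measure_peak(num_chunks=10, max_concurrent=4))
```

With ten simulated chunks and `max_concurrent=4`, the peak stays at four, which is the behavior the worker classes rely on to avoid flooding the inference server.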

---

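## 7. Conclusion

### 7.1 Key Improvements
1. **Phase 2**: an R-tree spatial index reduces node merging from O(n²) to O(n log n)
2. **Phase 3**: adopt the parallel processing structure of test_dxf_extract_pid*.py
3. **Parallel LLM**: run 4 chunks concurrently to make the most of GPU resources

### 7.2 Expected Performance
- **Current**: timeout (stalls in Phase 2)
- **After improvement**: about 7-13 s for Phases 2 and 3 combined, or roughly 8-14 s including the 1.4 s of Phase 1 (for about 28,000 entities)
- **Speedup**: 100x or more (Phase 2), 3-5x (Phase 3)

### 7.3 Implementation Strategy
1. Implement Phase 2 (spatial index) first → resolves the Phase 2 timeout
2. Implement Phase 3 (parallel LLM) → follow the structure of test_dxf_extract_pid*.py
3. Integrate the full pipeline → run benchmark tests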