# 🛠️ Graph Pipeline Phase 1: Geometric Data Extraction

This document covers the detailed implementation plan for **geometric data extraction**, the first stage of the P&ID Graph Pipeline. The goal is to go beyond simple text extraction and preserve the **physical position (coordinates)** and **geometric attributes** of every object in the drawing, so that later topology modeling is possible.

---

## 📦 1. Required Packages and Environment Setup

### 1.1 Python Packages

| Package | Purpose | Notes |
|---|---|---|
| `ezdxf` | DXF file parsing and entity extraction | Core library |
| `shapely` | Geometric operations (Intersection, Distance, Bounding Box) | Required for coordinate-based analysis |
| `numpy` | Bulk coordinate computation and matrix operations | Performance optimization |
| `pandas` | Structuring extracted object data and saving to CSV/JSON | Data management |
| `pydantic` | Schema definition and validation of extracted data | Ensures data integrity |
| `pytesseract` / `pdf2image` | Region-based OCR extraction from PDF drawings | Needed for PDF processing |

### 1.2 Installation Command

```bash
pip install ezdxf shapely numpy pandas pydantic pytesseract pdf2image
```

---

## 📐 2. Detailed Design Structure

### 2.1 Data Model (Schema)

Every extracted object follows the `GeometricEntity` model, which has the following common attributes.

```python
from pydantic import BaseModel
from typing import List, Tuple


class BoundingBox(BaseModel):
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    center: Tuple[float, float]


class GeometricEntity(BaseModel):
    entity_id: str
    entity_type: str  # TEXT, LINE, CIRCLE, POLYLINE, ARC
    layer: str
    bbox: BoundingBox
    properties: dict  # text value, color, line weight, etc.
    coordinates: List[Tuple[float, float]]  # start/end points or a list of vertices
```

### 2.2 Processing Pipeline Flow

1. **DXF Load:** Load the drawing with `ezdxf.readfile()`.
2. **Entity Iteration:** Iterate over the entities on all layers and classify them by type.
3. **Coordinate Extraction:**
    * `TEXT`: Compute the BBox from the insertion point and the text length.
    * `LINE`: Extract the start and end points.
    * `POLYLINE`: Extract the list of all vertices.
    * `CIRCLE/ARC`: Extract the center point and radius.
4. **Spatial Normalization:** Normalize the drawing coordinate system to the analysis system's coordinate system.
5. **Structured Export:** Save to JSON or a database (PostgreSQL/PostGIS).

---

## 💻 3. Implementation Coding Guide (Example)

### 3.1 Core Code for DXF Geometric Extraction

```python
import json
import re
from typing import Optional

import ezdxf
from shapely.geometry import Polygon, box


class PidGeometricExtractor:
    def __init__(self, file_path: str):
        self.doc = ezdxf.readfile(file_path)
        self.msp = self.doc.modelspace()

    def clean_text(self, text: str) -> str:
        """Strip DXF control characters and MTEXT formatting as far as possible to reduce LLM token load."""
        if not text:
            return ""

        # 1. Remove MTEXT formatting codes.
        #    Codes that take an argument terminated by ';':
        #    \W (width), \A (alignment), \C (color), \H (height), \T (tracking)
        text = re.sub(r'\\[WACHT][^;\\{}]*;', ' ', text)
        #    Codes without an argument: \P (line break), \L and \l (underline on/off),
        #    \S (stack marker). Trailing digits are not consumed, so "\P10922" keeps "10922".
        text = re.sub(r'\\[PLlS]', ' ', text)

        # 2. Remove braces { } (used by MTEXT to scope formatting)
        text = re.sub(r'[{}]', ' ', text)

        # 3. Remove DXF special control codes, in either case
        #    (%%U: Underline, %%O: Overline, %%S: Strikethrough, %%R: Registered)
        text = re.sub(r'%%[UOSR]', ' ', text, flags=re.IGNORECASE)

        # 4. Collapse runs of whitespace into one space and trim both ends
        text = re.sub(r'\s+', ' ', text).strip()

        return text

    def get_bbox(self, entity) -> Optional[Polygon]:
        """Compute the entity's bounding box as a shapely Polygon (None if unsupported or on error)."""
        try:
            if entity.dxftype() == 'TEXT':
                p = entity.dxf.insert
                h = entity.dxf.height
                # Rough width from the cleaned text length (character count * height * 0.6).
                # Approximation: alignment and rotation are not taken into account.
                width = len(self.clean_text(entity.dxf.text)) * h * 0.6
                return box(p.x, p.y, p.x + width, p.y + h)

            elif entity.dxftype() == 'MTEXT':
                p = entity.dxf.insert
                h = entity.dxf.get('char_height', 2.5)
                # MTEXT usually defines a width attribute; otherwise estimate it from the text
                w = entity.dxf.get('width', 0)
                if w <= 0:
                    w = len(self.clean_text(entity.text)) * h * 0.6
                # Approximation: attachment point and line count are not taken into account.
                return box(p.x, p.y, p.x + w, p.y + h)

            elif entity.dxftype() == 'LINE':
                start = entity.dxf.start
                end = entity.dxf.end
                return box(min(start.x, end.x), min(start.y, end.y),
                           max(start.x, end.x), max(start.y, end.y))

            elif entity.dxftype() == 'LWPOLYLINE':
                points = entity.get_points()
                xs = [p[0] for p in points]
                ys = [p[1] for p in points]
                return box(min(xs), min(ys), max(xs), max(ys))
        except Exception as e:
            print(f"Error calculating bbox for {entity.dxftype()}: {e}")
        return None

    def extract_and_save(self, output_path: str):
        """
        Save the extracted geometric data to a file so that Phase 3 workers can
        reference it through shared memory or the file system
        (reflects the Phase 5 parallel architecture).
        """
        results = []
        for entity in self.msp:
            bbox_obj = self.get_bbox(entity)
            if bbox_obj is None:
                continue  # unsupported entity type or failed bbox calculation

            # Extract the text value (cleaned below)
            raw_text = ""
            if entity.dxftype() == 'TEXT':
                raw_text = entity.dxf.text
            elif entity.dxftype() == 'MTEXT':
                raw_text = entity.text

            min_x, min_y, max_x, max_y = bbox_obj.bounds
            results.append({
                "id": entity.dxf.handle,
                "type": entity.dxftype(),
                "layer": entity.dxf.layer,
                "bbox": {
                    "min_x": min_x,
                    "min_y": min_y,
                    "max_x": max_x,
                    "max_y": max_y
                },
                "raw_value": raw_text,
                "clean_value": self.clean_text(raw_text) if raw_text else None
            })

        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(results, f, ensure_ascii=False, indent=4)

        return output_path


# Usage example (from the Phase 5 Orchestrator's point of view)
extractor = PidGeometricExtractor("plant_drawing.dxf")
# The data is not returned directly; it is written to shared storage (a file)
geo_data_path = extractor.extract_and_save("shared_geo_data.json")
```

### 3.2 Utility Functions: Proximity Check

These are the core utilities to be used later in stage 2 (topology modeling).

```python
from shapely.geometry import Point


def is_near(entity_a_bbox, entity_b_bbox, threshold=5.0):
    """Check whether the shortest distance between two bounding boxes is within the threshold."""
    return entity_a_bbox.distance(entity_b_bbox) <= threshold


def is_inside(point, bbox):
    """Check whether a given point lies inside the bounding box."""
    return bbox.contains(Point(point))
```

---

## 🚀 4. Phase 1 Completion Criteria (Definition of Done)

- [ ] Is the coordinate data of every `TEXT`, `LINE`, and `POLYLINE` in the DXF file extracted without omission?
- [ ] Is an accurate `Bounding Box` computed and stored for each object?
- [ ] Is the extracted data saved as a JSON file conforming to the `GeometricEntity` schema so that workers can share it? (reflects Phase 5)
- [ ] (Optional) For PDF drawings, are text coordinates extracted through OCR?

---

## 🧐 Supervisor Review Results (2026-05-02)

### 1. Program Design Check

- **Strengths**: Combining `ezdxf` and `shapely` to preserve geometric data (BBox, coordinates) is a very appropriate approach. In particular, loading the data into a file or shared store with the Phase 5 parallel architecture in mind is excellent in terms of scalability.
- **Items needing improvement**:
    - **MTEXT handling**: The example code (`3.1`) as reviewed handles only `TEXT` entities, but analysis of the actual DXF file shows a large number of `MTEXT` entities. `MTEXT` contains inline formatting codes (e.g. `\P`, `\W`), so plain text extraction requires cleaning.
    - **BBox calculation precision**: The BBox of a `TEXT` entity is computed with constants such as `p.x + 10, p.y + 5`. Dynamic calculation logic that reflects the drawing's actual font size (`height`) and alignment (`align`) must be added.

### 2. Differences Found by Analyzing the Actual Drawing (`No-10_Plant_PID.dxf`)

- **Entity scale**: There are 28,819 entities in total, which is a substantial amount of data. An indexing strategy may be needed rather than plain list storage.
- **Text complexity**:
    - Many revision texts in `MTEXT` contain control codes such as `\P` (line break) and `\L` (underline). Extracting them as-is is likely to introduce noise into topology analysis.
    - DXF special control codes such as `%%U` (Underline) are embedded in text values, so a preprocessing step that removes them is essential.
- **Data characteristics**: Composite pipe line numbers in the form `IA-10922-25A-F1A-n` were observed. Logic that clearly distinguishes these from ordinary tag names when extracting and managing them is expected to matter in Phase 2/3.

### 3. Final Recommendations

1. **Add MTEXT support**: Add `MTEXT` handling logic to `PidGeometricExtractor` and implement a `clean_text()` utility function that removes control codes.
2. **Implement dynamic BBox**: Use `entity.dxf.height` to compute an accurate bounding box that matches the text size.
3. **Strengthen the preprocessing pipeline**: Add a cleaning step at extraction time that removes special codes such as `%%U` to improve data quality.
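
On the entity-scale point raised in the review (28,819 entities, where a plain list may not be enough), one possible indexing strategy is a uniform grid over bounding boxes. The sketch below is only an illustration under assumptions: `GridIndex`, `build_index`, and the `cell_size` of 50 drawing units are hypothetical names and values, not part of the pipeline, and the records are assumed to use the `id` and `bbox` keys written by `extract_and_save()`. It uses only the standard library.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def _overlaps(a: BBox, b: BBox) -> bool:
    """True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Hypothetical uniform-grid index over entity bounding boxes."""

    def __init__(self, cell_size: float = 50.0):
        self.cell_size = cell_size  # assumed value; tune to the drawing's scale
        self._cells: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
        self._boxes: Dict[str, BBox] = {}

    def _cells_for(self, bbox: BBox) -> Iterable[Tuple[int, int]]:
        # Enumerate every grid cell that the box touches.
        x0, x1 = int(bbox[0] // self.cell_size), int(bbox[2] // self.cell_size)
        y0, y1 = int(bbox[1] // self.cell_size), int(bbox[3] // self.cell_size)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                yield (cx, cy)

    def insert(self, entity_id: str, bbox: BBox) -> None:
        self._boxes[entity_id] = bbox
        for cell in self._cells_for(bbox):
            self._cells[cell].add(entity_id)

    def query(self, bbox: BBox, margin: float = 0.0) -> List[str]:
        """Return sorted ids whose box overlaps `bbox` grown by `margin` on every side."""
        q = (bbox[0] - margin, bbox[1] - margin, bbox[2] + margin, bbox[3] + margin)
        candidates: Set[str] = set()
        for cell in self._cells_for(q):
            candidates |= self._cells.get(cell, set())
        return sorted(i for i in candidates if _overlaps(self._boxes[i], q))


def build_index(records: List[dict], cell_size: float = 50.0) -> GridIndex:
    """Build an index from records shaped like the JSON written in Phase 1."""
    index = GridIndex(cell_size)
    for rec in records:
        b = rec["bbox"]
        index.insert(rec["id"], (b["min_x"], b["min_y"], b["max_x"], b["max_y"]))
    return index


# Made-up sample records, for illustration only
records = [
    {"id": "A1", "bbox": {"min_x": 0, "min_y": 0, "max_x": 10, "max_y": 2.5}},
    {"id": "B2", "bbox": {"min_x": 12, "min_y": 0, "max_x": 40, "max_y": 0}},
    {"id": "C3", "bbox": {"min_x": 500, "min_y": 500, "max_x": 510, "max_y": 502.5}},
]
index = build_index(records)
near_a1 = index.query((0, 0, 10, 2.5), margin=5.0)
print(near_a1)
```

Growing the query box by `margin` returns a superset of the entities within that Euclidean distance, so the result is best treated as a candidate list; an exact check such as `is_near()` can then run on those few candidates instead of on every pair of entities.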