chore: clean up root documents: create knowledge/ CANON source + isolate scattered documents outside the root

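Secure seed quality (block GIGO). Split the ~150 documents that were scattered across the root by purpose.

- Create knowledge/ = the single CANON knowledge source (RAG/knowledge references only this location)
  · Plant knowledge (7 docs): structure descriptions (rounds 6-1 and 6-2), side-draw extraction relations and time delay, PGMEA general knowledge and operating precautions
  · 도면-데이터시트/ (drawings and datasheets): 15 As-Built + 2 FCV datasheets (PDF binaries are listed in .gitignore and kept on disk)
- Plans, diagnostics, conversation logs, multi-model drafts (byQwen/byGemma, etc.), and completed work (dxf-graph/, fastTable/, plans/) are **isolated in a store outside the project root** (moved, not deleted; restorable): /home/windpacer/projects/ReferenceSources/ExperionCrawler/ (ExperionCrawler.Tests/ is in the same location: completed/failed work, restore if needed)
- .gitignore: exclude large PDFs (knowledge 104M + src/Web/uploads 157M) and *.backup

Basis plan (archived): ReferenceSources/.../plans/online-lora-학습-파이프라인-실행계획-byOPUS.md Phase -1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>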
2026-05-26 09:55:19 +09:00
parent ab3e36680f
commit 3e9f3076ef
367 changed files with 1566 additions and 2525740 deletions
--- a/scripts/run-qwen3.6-35b-a3b.sh
+++ b/scripts/run-qwen3.6-35b-a3b.sh
@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+NAME="vllm_qwen35b"
+PORT="${1:-8001}"
+
+echo "Starting Qwen3.6-35B-A3B-FP8 on port ${PORT} (LoRA enabled)..."
+
+docker rm -f "$NAME" 2>/dev/null || true
+
+docker run -d --name "$NAME" \
+  --restart unless-stopped \
+  --gpus all --network host --ipc host \
+  --ulimit memlock=-1 --ulimit stack=67108864 \
+  -v /home/windpacer/.cache/huggingface:/root/.cache/huggingface \
+  -v /home/windpacer/.cache/vllm:/root/.cache/vllm \
+  -v /home/windpacer/ai-models:/root/ai-models \
+  --entrypoint "" \
+  vllm-node-tf5 \
+  bash -c "
+exec vllm serve /root/ai-models/Qwen3.6-35B-A3B-FP8 \
+  --served-model-name Qwen3.6-35B-A3B-FP8 \
+  --max-model-len 65536 \
+  --max-num-seqs 4 \
+  --gpu-memory-utilization 0.55 \
+  --port ${PORT} --host 0.0.0.0 \
+  --enable-chunked-prefill \
+  --enable-auto-tool-choice \
+  --tool-call-parser qwen3_coder \
+  --reasoning-parser qwen3 \
+  --trust-remote-code \
+  --kv-cache-dtype fp8 \
+  --default-chat-template-kwargs '{\"preserve_thinking\": true}' \
+  --speculative-config '{\"method\": \"qwen3_next_mtp\", \"num_speculative_tokens\": 2}' \
+  --override-generation-config '{\"temperature\": 0.6, \"top_p\": 0.95}' \
+  --load-format instanttensor \
+  --enable-lora \
+  --max-lora-rank 64 \
+  --max-loras 4 \
+  --lora-dtype auto \
+  -tp 1
+"
+
+echo "Waiting for model to load..."
+for i in $(seq 1 120); do
+  if curl -sf "http://localhost:${PORT}/v1/models" > /dev/null 2>&1; then
+    echo "✓ Ready on port ${PORT}"
+    curl -s "http://localhost:${PORT}/v1/models" | python3 -m json.tool 2>/dev/null || true
+    exit 0
+  fi
+  echo "  Waiting... (${i}/120)"
+  sleep 5
+done
+echo "❌ Failed to start within 10 minutes"
+exit 1
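
The readiness loop at the end of the script polls `/v1/models` up to 120 times, 5 seconds apart, which gives the stated 10-minute budget. The same pattern could be factored into a small reusable helper. The following is only a sketch under assumptions: the function name `wait_until` and its argument order are hypothetical and are not part of this commit.

```shell
# Hypothetical helper (not in the commit): retry a probe command until it
# succeeds or the attempt budget is exhausted.
#   $1 = maximum number of attempts
#   $2 = seconds to sleep between attempts
#   remaining arguments = the probe command and its arguments
wait_until() {
  local attempts="$1"
  local interval="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Probe output is discarded; only the exit status matters.
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    echo "  Waiting... (${i}/${attempts})" >&2
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Example with the script's budget (120 attempts x 5 s, about 10 minutes):
# wait_until 120 5 curl -sf "http://localhost:8001/v1/models"
```

Passing the probe as arguments keeps the retry logic independent of `curl`, so the same function could wait on any command that signals readiness through its exit status.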