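Introduction to RAG - A Technique for Giving LLMs Up-to-Date Information

📚 Overview

RAG (Retrieval-Augmented Generation) is a technique that supplies a large language model (LLM) with external knowledge so that it can generate answers grounded in more accurate and more current information.

This article covers how RAG works, how to implement it, and best practices for using it.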

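🕰️ Historical Background

Challenges of LLMs

Between 2018 and 2020, large language models such as BERT and GPT-3 appeared and delivered remarkable performance on natural language processing tasks. However, several challenges remained.

Main challenges:

Knowledge cutoff: the model knows nothing that happened after its training data was collected
Hallucination: the model sometimes generates information that is not grounded in fact
Limited domain knowledge: expertise in specialized fields is often insufficient
Hard to update: refreshing the model's knowledge requires costly retraining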

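The Emergence of RAG - 2020

Paper: "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Facebook AI, 2020)
Approach: retrieve relevant information from an external knowledge base and include it in the model's input
Benefit: knowledge can be updated by changing the knowledge base, without retraining the model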

🔧 Technical Walkthrough

How RAG Works

graph TB
    A[User Query] --> B[Embedding Model]
    B --> C[Vector Search]
    C --> D[Vector Database]
    D --> E[Top-K Results]
    E --> F[Context Builder]
    F --> G[LLM]
    A --> G
    G --> H[Generated Answer]
    
    I[Documents] --> J[Chunking]
    J --> K[Embedding]
    K --> D
    
    style A fill:#51cf66
    style D fill:#4dabf7
    style G fill:#ff6b6b

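Flow:

Index construction:
- Split documents into chunks
- Vectorize (embed) each chunk
- Store the vectors in the vector DB
Query processing:
- Vectorize the user query
- Run a similarity search against the vector DB
- Retrieve the top-K most relevant chunks
Generation:
- Combine the retrieved chunks with the query
- Pass the result to the LLM to generate the answer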
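The three stages above can be sketched end to end without any external service. This is a toy illustration only: word counts stand in for a real embedding model, a Python list stands in for the vector DB, and the function names (`embed`, `build_index`, `retrieve_top_k`, `build_prompt`) are made up for this sketch. The final LLM call is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Index construction: embed every chunk and keep it next to its text."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve_top_k(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Query processing: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Generation input: retrieved chunks followed by the question."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


toy_chunks = [
    "RAG retrieves relevant chunks and passes them to the LLM.",
    "A vector database stores embeddings for similarity search.",
    "Bananas are rich in potassium.",
]
toy_index = build_index(toy_chunks)
top = retrieve_top_k(toy_index, "How does RAG use the LLM?", k=1)
prompt = build_prompt("How does RAG use the LLM?", top)
print(prompt)
```

In a real system each toy piece is replaced by the components described in the following sections.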

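1. Chunking strategies

Splitting documents into appropriately sized pieces is an important step, because chunk boundaries determine what the retriever can return.

Fixed-size chunking: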

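def fixed_size_chunking(text: str, chunk_size: int = 500, overlap: int = 50):
    # overlap must be smaller than chunk_size, or the window would never advance
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    chunks = []
    start = 0
    
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # the last chunk already reaches the end of the text
        start = end - overlap
    
    return chunks

# Example usage
text = "A long document..."
chunks = fixed_size_chunking(text, chunk_size=500, overlap=50)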
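To make the overlap arithmetic concrete: with a chunk size of c and an overlap of o, each window starts c - o characters after the previous one. The helper below is a hypothetical companion (`chunk_spans` is not part of the article's code) that returns the character spans a sliding window would cover, stopping as soon as a window reaches the end of the text.

```python
def chunk_spans(text_len: int, chunk_size: int = 500, overlap: int = 50) -> list[tuple[int, int]]:
    """Return (start, end) character offsets for a sliding window over text_len characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap  # distance between consecutive window starts
    spans = []
    start = 0
    while start < text_len:
        end = min(start + chunk_size, text_len)
        spans.append((start, end))
        if end == text_len:
            break  # this window already covers the tail of the text
        start += stride
    return spans


spans = chunk_spans(1200, chunk_size=500, overlap=50)
print(spans)
```

For a 1200-character text this yields three windows, and each adjacent pair shares exactly 50 characters.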

Sentence-based chunking:

import re

def sentence_chunking(text: str, sentences_per_chunk: int = 3):
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks = []
    
    for i in range(0, len(sentences), sentences_per_chunk):
        chunk = ' '.join(sentences[i:i + sentences_per_chunk])
        chunks.append(chunk)
    
    return chunks
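The regular expression above expects whitespace after `.`, `!`, or `?`, so it will not split text written without spaces between sentences, such as Japanese. The following is a minimal sketch of one way to handle both cases. The pattern and the names `split_sentences` and `sentence_chunks` are assumptions for illustration; closing brackets or quotes that follow the punctuation are not handled.

```python
import re

# Japanese sentence enders split even without whitespace;
# ASCII enders still require whitespace so that "3.14" stays intact.
_SENTENCE_END = re.compile(r"(?<=[。！？])\s*|(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, keeping the terminating punctuation."""
    return [s for s in _SENTENCE_END.split(text) if s]


def sentence_chunks(text: str, sentences_per_chunk: int = 2) -> list[str]:
    """Group consecutive sentences into chunks, joined with a single space."""
    sentences = split_sentences(text)
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


sample = "今日は晴れです。明日は雨です！RAG is useful. It costs 3.14 dollars."
sentences = split_sentences(sample)
print(sentences)
```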

Semantic chunking (recommended; approximated here by paragraph boundaries):

def semantic_chunking(text: str, max_chunk_size: int = 1000):
    # Split on paragraph boundaries (blank lines)
    paragraphs = text.split('\n\n')
    chunks = []
    current_chunk = ""
    
    for para in paragraphs:
        if len(current_chunk) + len(para) < max_chunk_size:
            current_chunk += para + "\n\n"
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = para + "\n\n"
    
    if current_chunk:
        chunks.append(current_chunk.strip())
    
    return chunks
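The function above packs whole paragraphs up to a size limit. A stricter form of semantic chunking compares the embeddings of neighboring sentences and starts a new chunk where the similarity drops. The sketch below shows that idea under stated assumptions: `semantic_split`, the pluggable `embed` callable, and the 0.5 threshold are illustrative choices, and the toy embedding exists only to make the example runnable.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def _cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_split(
    sentences: list[str],
    embed: Callable[[str], Vector],
    threshold: float = 0.5,
) -> list[str]:
    """Start a new chunk whenever two adjacent sentences are less similar than threshold."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    prev_vec = embed(sentences[0])
    for sentence in sentences[1:]:
        vec = embed(sentence)
        if _cosine(prev_vec, vec) < threshold:
            groups.append([sentence])  # topic shift: open a new chunk
        else:
            groups[-1].append(sentence)
        prev_vec = vec
    return [" ".join(group) for group in groups]


def toy_embed(sentence: str) -> Vector:
    """Toy 2-d embedding: one axis for cat sentences, one for everything else."""
    return (1.0, 0.0) if "cat" in sentence.lower() else (0.0, 1.0)


topic_chunks = semantic_split(
    [
        "Cats sleep a lot.",
        "A cat purrs when happy.",
        "Stocks fell on Monday.",
        "Stock markets recovered later.",
    ],
    toy_embed,
)
print(topic_chunks)
```

In practice `embed` would wrap a real embedding model, and the threshold would need tuning on your own documents.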

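2. Embedding (vectorization)

This step converts text into numeric vectors so that semantically similar texts end up close together in vector space.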

OpenAI Embeddings:

from openai import OpenAI

client = OpenAI()

def get_embedding(text: str, model: str = "text-embedding-3-small"):
    response = client.embeddings.create(
        input=text,
        model=model
    )
    return response.data[0].embedding

# Example usage
embedding = get_embedding("This is a test")
print(len(embedding))  # 1536-dimensional vector
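Embedding chunks one request at a time gets slow for large corpora. The OpenAI embeddings endpoint also accepts a list of strings as `input`, so a batching helper is a natural next step. The sketch below keeps the API call behind a callable so it can run without credentials. The names `batched` and `embed_in_batches` and the batch size of 100 are assumptions for illustration, and the commented wiring assumes results come back in input order.

```python
from typing import Callable, Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 100,
) -> list[list[float]]:
    """Embed texts batch by batch, preserving the input order."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


# Hypothetical wiring to the OpenAI client shown above (not executed here):
# embed_batch = lambda batch: [
#     d.embedding
#     for d in client.embeddings.create(input=batch, model="text-embedding-3-small").data
# ]

# Stand-in embedder so the sketch runs offline: one number per text.
def fake_embed_batch(batch: list[str]) -> list[list[float]]:
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed_batch, batch_size=2)
print(vectors)
```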

Local model (sentence-transformers):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

def get_local_embedding(text: str):
    return model.encode(text)

# Example usage
embedding = get_local_embedding("This is a test")
print(embedding.shape)  # 384-dimensional vector
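Once texts are vectors, "similar meaning" becomes "small angle between vectors". Cosine similarity is a common way to measure this, and it works on the vectors either embedding function above returns. A minimal sketch follows; the function name `cosine_similarity` is chosen here for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1.0 means same direction, 0.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Because the score depends only on direction, scaling a vector does not change it. Vectors from different embedding models live in different spaces and should not be compared with each other.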

3. Vector databases

Pinecone example:

from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
index = pc.Index("my-index")

# Insert vectors
def insert_chunks(chunks: list[str]):
    vectors = []
    
    for i, chunk in enumerate(chunks):
        embedding = get_embedding(chunk)
        vectors.append({
            "id": f"chunk-{i}",
            "values": embedding,
            "metadata": {"text": chunk}
        })
    
    index.upsert(vectors)

# Search
def search(query: str, top_k: int = 3):
    query_embedding = get_embedding(query)
    
    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True
    )
    
    return [match['metadata']['text'] for match in results['matches']]

ChromaDB example (local):

import chromadb

client = chromadb.Client()
collection = client.create_collection("my_docs")

# Add documents (ChromaDB embeds them with the collection's embedding function)
def add_documents(chunks: list[str]):
    collection.add(
        documents=chunks,
        ids=[f"chunk-{i}" for i in range(len(chunks))]
    )

# Search
def search_local(query: str, n_results: int = 3):
    results = collection.query(
        query_texts=[query],
        n_results=n_results
    )
    return results['documents'][0]
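Both examples above derive IDs from list position (`chunk-0`, `chunk-1`, ...), so indexing a second document with the same function would reuse those IDs. One possible remedy is to derive IDs from a source label plus a hash of the chunk text. The sketch below is an assumption-laden illustration: `chunk_id`, `ids_for_chunks`, the `source` label, and the 16-character digest prefix are all hypothetical choices.

```python
import hashlib


def chunk_id(source: str, chunk: str) -> str:
    """Stable ID: the same source and text always map to the same ID."""
    digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()[:16]
    return f"{source}:{digest}"


def ids_for_chunks(source: str, chunks: list[str]) -> dict[str, str]:
    """Map stable IDs to chunk text; exact duplicate chunks collapse to one entry."""
    return {chunk_id(source, chunk): chunk for chunk in chunks}


by_id = ids_for_chunks("handbook.txt", ["alpha", "beta", "alpha"])
print(by_id)
```

With IDs like these, re-indexing an unchanged document overwrites its existing entries instead of colliding with another document's chunks.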

4. Integrating with the LLM

OpenAI GPT-4 example:

from openai import OpenAI

client = OpenAI()

def generate_answer(query: str, context_chunks: list[str]):
    # Build the context
    context = "\n\n".join([f"Source {i+1}:\n{chunk}" for i, chunk in enumerate(context_chunks)])
    
    # Build the prompt
    prompt = f"""Answer the question using the information below.

{context}

Question: {query}

Answer:"""
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are an assistant that answers accurately based on the information provided."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3
    )
    
    return response.choices[0].message.content

# Example usage
query = "What is RAG?"
relevant_chunks = search(query, top_k=3)
answer = generate_answer(query, relevant_chunks)
print(answer)
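The context string above grows with every retrieved chunk, and an LLM's context window is finite. A simple guard is to stop adding chunks once a size budget is reached. The sketch below assumes a character budget as a rough proxy for tokens; `build_context`, `max_chars`, and the `[n]` labels are illustrative, and the separators between blocks are not counted against the budget.

```python
def build_context(chunks: list[str], max_chars: int = 2000) -> str:
    """Concatenate numbered chunks in rank order until the character budget is used up."""
    parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] {chunk}"
        if used + len(block) > max_chars:
            break  # keep the higher-ranked chunks, drop the rest
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)


context = build_context(["aaa", "bbb", "ccc"], max_chars=14)
print(context)
```

Since chunks arrive ordered by relevance, truncating from the end drops the least relevant material first. The numeric labels also give the model something to cite.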

💡 Practical Example: A Complete RAG System

import chromadb
from openai import OpenAI
from sentence_transformers import SentenceTransformer

class RAGSystem:
    def __init__(self):
        self.client = OpenAI()
        self.chroma_client = chromadb.Client()
        self.collection = self.chroma_client.create_collection("docs")
        self.embedding_model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
    
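    def chunk_text(self, text: str, chunk_size: int = 500, overlap: int = 50):
        """Split text into overlapping chunks (overlap must be smaller than chunk_size)"""
        chunks = []
        start = 0
        
        while start < len(text):
            end = start + chunk_size
            chunks.append(text[start:end])
            if end >= len(text):
                break  # the last chunk already reaches the end of the text
            start = end - overlap
        
        return chunks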
    
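    def add_documents(self, documents: list[str]):
        """Chunk, embed, and store documents"""
        all_chunks = []
        chunk_ids = []
        
        for doc_id, doc in enumerate(documents):
            chunks = self.chunk_text(doc)
            for chunk_id, chunk in enumerate(chunks):
                all_chunks.append(chunk)
                chunk_ids.append(f"doc-{doc_id}-chunk-{chunk_id}")
        
        # Add to ChromaDB, embedding with the multilingual model loaded in __init__
        # (without explicit embeddings, ChromaDB would use its own default model)
        self.collection.add(
            documents=all_chunks,
            embeddings=self.embedding_model.encode(all_chunks).tolist(),
            ids=chunk_ids
        )
        
        print(f"Added {len(all_chunks)} chunks from {len(documents)} documents")
    
    def retrieve(self, query: str, top_k: int = 3):
        """Retrieve the most relevant chunks"""
        # Embed the query with the same model that embedded the chunks
        results = self.collection.query(
            query_embeddings=self.embedding_model.encode([query]).tolist(),
            n_results=top_k
        )
        return results['documents'][0]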
    
    def generate(self, query: str, context_chunks: list[str]):
        """Generate an answer"""
        context = "\n\n".join([f"[Source {i+1}]\n{chunk}" for i, chunk in enumerate(context_chunks)])
        
        prompt = f"""Answer the question accurately using the information below.
If the answer is not contained in the information, reply "The provided information does not cover this."

{context}

Question: {query}

Answer:"""
        
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are an accurate and reliable assistant."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.3
        )
        
        return response.choices[0].message.content
    
    def query(self, question: str, top_k: int = 3):
        """Run the full RAG pipeline"""
        # 1. Retrieve relevant information
        chunks = self.retrieve(question, top_k)
        
        # 2. Generate the answer
        answer = self.generate(question, chunks)
        
        return {
            "answer": answer,
            "sources": chunks
        }

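# Example usage
rag = RAGSystem()

# Add documents
documents = [
    "RAG stands for Retrieval-Augmented Generation, a technique that improves LLM answers by using external knowledge.",
    "LangChain is a framework that makes it easy to implement RAG; it was first released in 2022.",
    "Vector databases enable fast similarity search; examples include Pinecone, Weaviate, and Qdrant."
]

rag.add_documents(documents)

# Ask a question
result = rag.query("What is RAG?")
print("Answer:", result['answer'])
print("\nSources:")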
for i, source in enumerate(result['sources'], 1):
    print(f"{i}. {source}")

📊 RAG vs Fine-Tuning

graph TB
    A[Knowledge Update] --> B{Approach}
    B --> C[RAG]
    B --> D[Fine-tuning]
    
    C --> E[External DB]
    C --> F[Fast Updates]
    C --> G[No Retraining]
    
    D --> H[Model Weights]
    D --> I[Slow Updates]
    D --> J[Requires Retraining]
    
    style C fill:#51cf66
    style D fill:#ff6b6b

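| Aspect | RAG | Fine-tuning |
|---|---|---|
| Knowledge updates | Near real time (update the index) | Requires retraining |
| Cost | Generally lower | Generally higher |
| Specialization | Moderate | High |
| Implementation | Easier | Harder |
| Traceability | High (sources can be cited) | Low |
| Use cases | FAQ bots, internal document search | Domain-specific tasks |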

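🎯 Best Practices

1. Optimizing chunk size

# Compare chunk sizes empirically.
# evaluate_rag, text, and test_queries are placeholders you supply:
# an evaluation function, a corpus, and a set of labeled queries.
chunk_sizes = [200, 500, 1000]
for size in chunk_sizes:
    chunks = fixed_size_chunking(text, chunk_size=size)
    accuracy = evaluate_rag(chunks, test_queries)
    print(f"Chunk size {size}: Accuracy {accuracy}")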

Recommended starting points:

Prose: 500-1000 characters
Code: one function or class per chunk
Technical documents: one section per chunk
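Comparing chunk sizes requires some way to score retrieval. One simple option is hit rate at k: the fraction of test queries for which at least one of the top-k chunks contains an expected piece of text. The sketch below is one possible scoring function, not a standard API. `hit_rate_at_k`, the `(query, expected_substring)` test-case format, and the toy retriever are assumptions for illustration.

```python
from typing import Callable


def hit_rate_at_k(
    retrieve: Callable[[str, int], list[str]],
    test_cases: list[tuple[str, str]],
    k: int = 3,
) -> float:
    """Fraction of queries whose top-k results contain the expected substring."""
    if not test_cases:
        return 0.0
    hits = 0
    for query, expected in test_cases:
        results = retrieve(query, k)
        if any(expected in chunk for chunk in results):
            hits += 1
    return hits / len(test_cases)


# Toy retriever so the sketch runs without a vector DB.
def toy_retrieve(query: str, k: int) -> list[str]:
    canned = {
        "What is RAG?": ["RAG combines retrieval with generation."],
        "What is BM25?": ["Bananas are yellow."],
    }
    return canned.get(query, [])[:k]


cases = [("What is RAG?", "retrieval"), ("What is BM25?", "keyword")]
score = hit_rate_at_k(toy_retrieve, cases, k=3)
print(score)
```

Substring matching is crude, but it is enough to see whether a chunk size change helps or hurts retrieval before measuring answer quality.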

2. Using metadata

# metadatas needs exactly one dict per entry in chunks (two chunks in this example)
collection.add(
    documents=chunks,
    metadatas=[
        {"source": "doc1.pdf", "page": 1, "type": "technical"},
        {"source": "doc2.pdf", "page": 2, "type": "faq"}
    ],
    ids=chunk_ids
)

# Filtered search
results = collection.query(
    query_texts=["technical specifications"],
    n_results=5,
    where={"type": "technical"}
)

3. Reranking

from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-12-v2')

def rerank(query: str, chunks: list[str], top_k: int = 3):
    # Score each (query, chunk) pair
    pairs = [[query, chunk] for chunk in chunks]
    scores = reranker.predict(pairs)
    
    # Sort by score, highest first
    ranked = sorted(zip(chunks, scores), key=lambda x: x[1], reverse=True)
    return [chunk for chunk, score in ranked[:top_k]]

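4. Hybrid search

# vector_search, bm25_search, and merge_results are placeholders for your own
# retrieval and fusion functions; they are not defined in this article.
def hybrid_search(query: str, top_k: int = 5):
    # Vector search
    vector_results = vector_search(query, top_k=10)
    
    # Keyword search (BM25)
    keyword_results = bm25_search(query, top_k=10)
    
    # Merge the results
    combined = merge_results(vector_results, keyword_results, top_k=top_k)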
    return combined
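The merge step is left open above. One widely used option is reciprocal rank fusion (RRF), which combines ranked lists using only rank positions, so vector scores and BM25 scores never need to be put on the same scale. The sketch below is one possible `merge_results`, not the article's own implementation. The function name `reciprocal_rank_fusion` is chosen here, and the constant `k=60` is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]],
    top_k: int = 5,
    k: int = 60,
) -> list[str]:
    """Fuse ranked lists: each item scores the sum of 1 / (k + rank) over the lists it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda item: scores[item], reverse=True)[:top_k]


vector_hits = ["a", "b", "c"]   # hypothetical vector search ranking
keyword_hits = ["b", "d", "a"]  # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([vector_hits, keyword_hits], top_k=3)
print(fused)
```

Items that rank well in both lists rise to the top, while items found by only one retriever still get a chance to appear.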

🔍 Related Questions

Quiz questions related to this article:

Q1: Basic concepts of RAG
Q2: Embeddings and vector search
Q3: Chunking strategies

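📝 Summary

RAG: improves LLM answers by using external knowledge
Chunking: split documents into appropriately sized pieces
Embedding: convert text into vectors
Vector DB: fast similarity search
Best practices: metadata, reranking, and hybrid search

Next step: build a RAG system yourself and try it on your own data!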
