BioErrorLog Tech Blog

This ended up being considerably more code than the Python implementation.

Let's look at the Tool, Resource, and Prompt implementations in turn.

Implementing a Tool

    // Tool: Add operation
    addTool := mcp.NewTool(
        "add",
        mcp.WithDescription("Add two numbers"),
        mcp.WithNumber("x",
            mcp.Required(),
        ),
        mcp.WithNumber("y",
            mcp.Required(),
        ),
    )
    s.AddTool(addTool, addToolHandler)

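// ... (omitted) ...

func addToolHandler(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
    // Use checked type assertions so that a missing or non-numeric
    // argument returns an error instead of panicking.
    x, ok := request.Params.Arguments["x"].(float64)
    if !ok {
        return nil, fmt.Errorf("argument x must be a number")
    }
    y, ok := request.Params.Arguments["y"].(float64)
    if !ok {
        return nil, fmt.Errorf("argument y must be a number")
    }
    return mcp.NewToolResultText(fmt.Sprintf("%.2f", x+y)), nil
}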

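NewTool defines the Tool's attributes, such as its description and arguments.
AddTool registers the Tool together with its handler logic.

That is the overall structure.

NewTool follows the so-called Options Pattern (functional options): you configure the Tool definition by passing in whichever ToolOption functions you need.

Source code of NewTool: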

// NewTool creates a new Tool with the given name and options.
// The tool will have an object-type input schema with configurable properties.
// Options are applied in order, allowing for flexible tool configuration.
func NewTool(name string, opts ...ToolOption) Tool {
    tool := Tool{
        Name: name,
        InputSchema: ToolInputSchema{
            Type:       "object",
            Properties: make(map[string]interface{}),
            Required:   nil, // Will be omitted from JSON if empty
        },
    }

    for _, opt := range opts {
        opt(&tool)
    }

    return tool
}

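The following functions, each of which returns a ToolOption to pass to NewTool as an optional argument, are provided:

WithDescription: adds a description to the Tool
WithBoolean: defines a boolean argument on the Tool
WithNumber: defines a number argument on the Tool
WithString: defines a string argument on the Tool
WithObject: defines an object argument on the Tool
WithArray: defines an array argument on the Tool

The processing a Tool actually performs is registered with AddTool, as with addToolHandler in the example above.

This logic is implemented as a function of type ToolHandlerFunc, which has the following shape: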

// ToolHandlerFunc handles tool calls with given arguments.
type ToolHandlerFunc func(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error)

Parameters passed when the Tool is called can be read with request.Params.Arguments["<param name>"].

Implementing a Resource

    // Resource: Greeting template
    greetingResource := mcp.NewResourceTemplate(
        "greeting://{name}",
        "getGreeting",
        mcp.WithTemplateDescription("Get a personalized greeting"),
        mcp.WithTemplateMIMEType("text/plain"),
    )
    s.AddResourceTemplate(greetingResource, greetingResourceHandler)

// ... (omitted) ...

func greetingResourceHandler(ctx context.Context, request mcp.ReadResourceRequest) ([]mcp.ResourceContents, error) {
    name, err := extractNameFromURI(request.Params.URI)
    if err != nil {
        return nil, err
    }

    return []mcp.ResourceContents{
        mcp.TextResourceContents{
            URI:      request.Params.URI,
            MIMEType: "text/plain",
            Text:     fmt.Sprintf("Hello, %s!", name),
        },
    }, nil
}

// Extracts the name from a URI formatted as "greeting://{name}"
func extractNameFromURI(uri string) (string, error) {
    const prefix = "greeting://"
    if !strings.HasPrefix(uri, prefix) {
        return "", fmt.Errorf("invalid URI format: %s", uri)
    }
    name := strings.TrimPrefix(uri, prefix)
    if name == "" {
        return "", fmt.Errorf("name is empty in URI: %s", uri)
    }
    return name, nil
}

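NewResourceTemplate defines the Resource's attributes, such as its description and arguments.
AddResourceTemplate registers the Resource together with the logic that returns it.

The structure closely mirrors the Tool implementation, so I'll skip the details; as with Tools, a Resource definition is configured dynamically via the Options Pattern.

Note: the example above implements a Resource Template (dynamic URI). For a plain Resource (not a Resource Template), use the dedicated API provided for that purpose.

Implementing a Prompt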

    // Prompt: Japanese translation template
    translationPrompt := mcp.NewPrompt(
        "translationJa",
        mcp.WithPromptDescription("Translating to Japanese"),
        mcp.WithArgument("txt", mcp.RequiredArgument()),
    )
    s.AddPrompt(translationPrompt, translationPromptHandler)

// ... (omitted) ...

func translationPromptHandler(ctx context.Context, request mcp.GetPromptRequest) (*mcp.GetPromptResult, error) {
    txt := request.Params.Arguments["txt"]
    prompt := fmt.Sprintf("Please translate this sentence into Japanese:\n\n%s", txt)
    return mcp.NewGetPromptResult(
        "Translating to Japanese",
        []mcp.PromptMessage{
            mcp.NewPromptMessage(
                mcp.RoleAssistant,
                mcp.NewTextContent(prompt),
            ),
        },
    ), nil
}

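NewPrompt defines the Prompt's attributes, such as its description and arguments.
AddPrompt registers the Prompt together with the logic that returns it.

This has the same structure as the Tool and Resource implementations: you build the Prompt using the types provided for it.

Testing the MCP server with MCP Inspector

Finally, let's test the Go MCP server above with MCP Inspector.

I've written a separate article on how to use MCP Inspector, so if the tool is new to you, please refer to that as well:

Launch the Go MCP server with MCP Inspector: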

npx @modelcontextprotocol/inspector go run main.go

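Once MCP Inspector is up, open http://127.0.0.1:6274 in a browser and exercise the Tool, Resource, and Prompt implemented in the MCP server.

The Go MCP server returned the expected responses.

Conclusion

In this post I implemented an MCP server in Go.

It takes more code and is more cumbersome than the Python SDK, but it should be useful when building an MCP server that works alongside tools written in Go.

I hope this helps someone.

[Related articles]

References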

2025-04-07

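Streamlining MCP server development with MCP CLI

LLM MCP Generative AI AI Claude Python

This post summarizes how to use MCP CLI.

Introduction
How to use MCP CLI
Conclusion
References

Introduction

MCP CLI is a tool that streamlines MCP server development, and it ships as part of the MCP SDK.

This post is a memo on how to use it.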

The English translation of this post is here.

How to use MCP CLI

Let's look at how to use MCP CLI through the MCP Python SDK.

Installing MCP CLI

To use MCP CLI, install the MCP SDK with the cli extra, mcp[cli].

# With uv
uv add "mcp[cli]"

# With pip
pip install "mcp[cli]"

Check that the installation succeeded:

$ mcp version
MCP version 1.6.0

MCP CLI commands

mcp --help lists the available commands.

$ mcp --help
                                                                       
 Usage: mcp [OPTIONS] COMMAND [ARGS]...                                
                                                                       
 MCP development tools                                                 
                                                                       
╭─ Options ───────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                         │
╰─────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────╮
│ version   Show the MCP version.                                     │
│ dev       Run a MCP server with the MCP Inspector.                  │
│ run       Run a MCP server.                                         │
│ install   Install a MCP server in the Claude desktop app.           │
╰─────────────────────────────────────────────────────────────────────╯

mcp version: shows the version
mcp run: runs an MCP server
mcp dev: runs an MCP server with MCP Inspector
mcp install: connects an MCP server to Claude Desktop

Let's look at the three commands other than mcp version (run/dev/install).

Using mcp run

mcp run runs (starts) an MCP server.

First, check the usage with --help:

$ mcp run --help
                                                                                                
 Usage: mcp run [OPTIONS] FILE_SPEC                                                             
                                                                                                
 Run a MCP server.                                                                              
 The server can be specified in two ways:                                                       
 1. Module approach: server.py - runs the module directly, expecting a server.run() call.       
 2. Import approach: server.py:app - imports and runs the specified server object.              
                                                                                                
 Note: This command runs the server directly. You are responsible for ensuring all dependencies 
 are available.                                                                                 
 For dependency management, use `mcp install` or `mcp dev` instead.                             
                                                                                                
╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────╮
│ *    file_spec      TEXT  Python file to run, optionally with :object suffix [default: None] │
│                           [required]                                                         │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ --transport  -t      TEXT  Transport protocol to use (stdio or sse) [default: None]          │
│ --help                     Show this message and exit.                                       │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯

Run mcp run with the MCP server's entry point.

mcp run server.py (runs with the module approach)
mcp run server.py:app (runs with the import approach)

For example, if the script calls run() itself as shown below, use the module approach as-is with mcp run server.py; otherwise, use the import approach with mcp run server.py:app.

from mcp.server.fastmcp import FastMCP


app = FastMCP("HelloMCP")

# ... (omitted) ...

if __name__ == "__main__":
    app.run(transport='stdio')
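
To make the two FILE_SPEC forms concrete, here is a minimal sketch of how a string such as server.py:app could be split into a file path and an optional object name. The helper parse_file_spec is hypothetical and written only for illustration; it is not the MCP CLI's own parser, and it does not handle Windows drive letters.

```python
from typing import Optional, Tuple


def parse_file_spec(file_spec: str) -> Tuple[str, Optional[str]]:
    """Split 'server.py:app' into ('server.py', 'app').

    Hypothetical helper for illustration; not the MCP CLI's own parser.
    Windows paths with drive letters are not handled.
    """
    path, sep, obj = file_spec.partition(":")
    if not path:
        raise ValueError(f"missing file path in {file_spec!r}")
    if sep and not obj:
        raise ValueError(f"missing object name after ':' in {file_spec!r}")
    # No ':' at all means the module approach: the object name is None.
    return path, (obj or None)


print(parse_file_spec("server.py"))      # module approach
print(parse_file_spec("server.py:app"))  # import approach
```

With the module approach the object part is None and the script is expected to call run() itself; with the import approach the named object is what gets run.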

Using mcp dev

mcp dev launches MCP Inspector, a browser-based testing tool for MCP servers.

Usage of mcp dev:

$ mcp dev --help

                                                                       
 Usage: mcp dev [OPTIONS] FILE_SPEC                                    
                                                                       
 Run a MCP server with the MCP Inspector.                              
                                                                       
╭─ Arguments ─────────────────────────────────────────────────────────╮
│ *    file_spec      TEXT  Python file to run, optionally with       │
│                           :object suffix                            │
│                           [default: None]                           │
│                           [required]                                │
╰─────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────╮
│ --with-editable  -e      DIRECTORY  Directory containing            │
│                                     pyproject.toml to install in    │
│                                     editable mode                   │
│                                     [default: None]                 │
│ --with                   TEXT       Additional packages to install  │
│ --help                              Show this message and exit.     │
╰─────────────────────────────────────────────────────────────────────╯

As with mcp run above, specify the MCP server's entry point using either the module approach or the import approach.

# Module approach
mcp dev server.py

# Import approach
mcp dev server.py:app

I've covered how to use MCP Inspector itself in a separate article, so please take a look if you're interested:

Using mcp install

mcp install installs an MCP server into Claude Desktop.

Connecting an MCP server to Claude Desktop by hand means editing the Claude Desktop config file manually.

Reference:
www.bioerrorlog.work

With mcp install, the MCP server entry is added to the Claude Desktop config file automatically.

$ mcp install --help
                                                                                       
 Usage: mcp install [OPTIONS] FILE_SPEC                                                
                                                                                       
 Install a MCP server in the Claude desktop app.                                       
 Environment variables are preserved once added and only updated if new values are     
 explicitly provided.                                                                  
                                                                                       
╭─ Arguments ─────────────────────────────────────────────────────────────────────────╮
│ *    file_spec      TEXT  Python file to run, optionally with :object suffix        │
│                           [default: None]                                           │
│                           [required]                                                │
╰─────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────╮
│ --name           -n      TEXT       Custom name for the server (defaults to         │
│                                     server's name attribute or file name)           │
│                                     [default: None]                                 │
│ --with-editable  -e      DIRECTORY  Directory containing pyproject.toml to install  │
│                                     in editable mode                                │
│                                     [default: None]                                 │
│ --with                   TEXT       Additional packages to install                  │
│ --env-var        -v      TEXT       Environment variables in KEY=VALUE format       │
│ --env-file       -f      FILE       Load environment variables from a .env file     │
│                                     [default: None]                                 │
│ --help                              Show this message and exit.                     │
╰─────────────────────────────────────────────────────────────────────────────────────╯

As with mcp run and mcp dev, run mcp install with the MCP server script.

The --name option sets the MCP server name shown in Claude Desktop, and --env-var (-v) and --env-file (-f) set environment variables.

# Example
mcp install server.py

# Specify the MCP server name
mcp install server.py --name "My MCP"

# Specify environment variables
mcp install server.py -v API_KEY=abc123 -v DB_URL=postgres://...
mcp install server.py -f .env
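
As an illustration of the KEY=VALUE format accepted by -v, the sketch below turns a list of such strings into a dictionary. The function parse_env_vars is a hypothetical name used only here, not part of the MCP CLI; it splits on the first "=" so that values may themselves contain "=".

```python
from typing import Dict, Iterable


def parse_env_vars(pairs: Iterable[str]) -> Dict[str, str]:
    """Parse 'KEY=VALUE' strings into a dict (illustrative helper only)."""
    env: Dict[str, str] = {}
    for pair in pairs:
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        env[key] = value
    return env


print(parse_env_vars(["API_KEY=abc123", "TOKEN=a=b"]))
```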

Example of running mcp install when the Claude Desktop config file (claude_desktop_config.json) is empty:

mcp install main.py --name "My MCP"

# claude_desktop_config.json
{
  "mcpServers": {
    "My MCP": {
      "command": "uv",
      "args": [
        "run",
        "--with",
        "mcp[cli]",
        "mcp",
        "run",
        "/path/to/main.py"
      ]
    }
  }
}

As shown, the entry is added to claude_desktop_config.json and the MCP server is registered with Claude Desktop.
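
The effect on the config file can be pictured as a merge into the "mcpServers" object. The following is a rough sketch under that assumption; add_server_entry is a hypothetical function, not the code mcp install actually runs, and the command and args simply mirror the example output above.

```python
import json
from typing import Any, Dict


def add_server_entry(config: Dict[str, Any], name: str, script_path: str) -> Dict[str, Any]:
    """Return a copy of config with one server added under "mcpServers".

    Illustrative only; the entry mirrors the example output shown above.
    """
    updated = dict(config)
    # Copy existing servers so previously registered entries are preserved.
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {
        "command": "uv",
        "args": ["run", "--with", "mcp[cli]", "mcp", "run", script_path],
    }
    updated["mcpServers"] = servers
    return updated


config = add_server_entry({}, "My MCP", "/path/to/main.py")
print(json.dumps(config, indent=2))
```

Starting from an empty config produces the same structure as the JSON above; servers already present are kept.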

Conclusion

That was a summary of how to use MCP CLI.

I hope this helps someone.

[Related articles]

Ref. GitHub - bioerrorlog/hellomcp: The minimal Python MCP server implementation with MCP Python SDK.

References

2025-04-07

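How to use "MCP Inspector", a testing tool for MCP servers

LLM MCP Generative AI AI Python

Notes on how to use "MCP Inspector", a browser-based testing tool for MCP servers.

Introduction
How to use MCP Inspector
Conclusion
References

Introduction

MCP Inspector is a browser-based testing and debugging tool for MCP servers, officially provided by the MCP project.

GitHub - modelcontextprotocol/inspector: Visual testing tool for MCP servers

This post summarizes how to use it.

# Environment: macOS
# Versions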
$ python --version
Python 3.11.9
$ node --version
v20.19.0

The English translation of this post is here.

How to use MCP Inspector

Preparation: set up a minimal MCP server

First, let's prepare the minimal Python MCP server used in this post, with uv as the package manager.

# Create a Python project
uv init hellomcp
cd hellomcp

# Create a virtual environment
uv venv
source .venv/bin/activate

# Install the MCP Python SDK
uv add "mcp[cli]"

Implement a minimal MCP server in main.py as follows.

from mcp.server.fastmcp import FastMCP


mcp = FastMCP("HelloMCP")


@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b


@mcp.resource("greeting://{name}")
def get_greeting(name: str) -> str:
    """Get a personalized greeting"""
    return f"Hello, {name}!"


@mcp.prompt()
def translation_ja(txt: str) -> str:
    """Translating to Japanese"""
    return f"Please translate this sentence into Japanese:\n\n{txt}"

This provides a simple example of each of MCP's three main features: Tools, Resources, and Prompts.

Now let's test this MCP server with MCP Inspector.

Launching MCP Inspector

There are two ways to launch MCP Inspector.

Launch with MCP CLI
Launch with npx @modelcontextprotocol/inspector

If you installed the SDK with the cli extra, as in uv add "mcp[cli]", MCP CLI is available, and mcp dev launches MCP Inspector.

$ mcp dev --help
                                                                                          
 Usage: mcp dev [OPTIONS] FILE_SPEC                                                       
                                                                                          
 Run a MCP server with the MCP Inspector.                                                 
                                                                                          
╭─ Arguments ────────────────────────────────────────────────────────────────────────────╮
│ *    file_spec      TEXT  Python file to run, optionally with :object suffix           │
│                           [default: None]                                              │
│                           [required]                                                   │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --with-editable  -e      DIRECTORY  Directory containing pyproject.toml to install in  │
│                                     editable mode                                      │
│                                     [default: None]                                    │
│ --with                   TEXT       Additional packages to install                     │
│ --help                              Show this message and exit.                        │
╰────────────────────────────────────────────────────────────────────────────────────────╯

# Example run
$ mcp dev main.py
Starting MCP inspector...
⚙️ Proxy server listening on port 6277
🔍 MCP Inspector is up and running at http://127.0.0.1:6274 🚀

Alternatively, you can launch MCP Inspector directly with npx @modelcontextprotocol/inspector. mcp dev uses this command internally.

npx @modelcontextprotocol/inspector <command to start the MCP server>

# Example run
$ npx @modelcontextprotocol/inspector mcp run main.py
Starting MCP inspector...
⚙️ Proxy server listening on port 6277
🔍 MCP Inspector is up and running at http://127.0.0.1:6274 🚀

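Once MCP Inspector is running, open http://127.0.0.1:6274 in a browser to reach the MCP Inspector screen.

Using MCP Inspector

Now let's use MCP Inspector at http://127.0.0.1:6274.

First, click the "▶︎Connect" button in the left pane to connect MCP Inspector to the MCP server.

From the Resources/Prompts/Tools tabs at the top, you can test each feature the MCP server provides.

In each tab, a List call retrieves the Resources/Prompts/Tools the MCP server exposes, and you can then invoke each one with arguments.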
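
Under the hood, these List and call operations travel as JSON-RPC 2.0 messages. The sketch below builds the kind of request payloads involved for the hellomcp server above; make_request is a hypothetical helper, and the exact messages MCP Inspector sends may carry additional fields.

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)  # simple incrementing request ids


def make_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request dict (illustrative helper only)."""
    request: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# List the tools, then call the `add` tool defined in main.py.
list_tools = make_request("tools/list")
call_add = make_request("tools/call", {"name": "add", "arguments": {"a": 1, "b": 2}})
# Read the greeting resource and fetch the translation prompt.
read_greeting = make_request("resources/read", {"uri": "greeting://World"})
get_prompt = make_request("prompts/get", {"name": "translation_ja", "arguments": {"txt": "Hello"}})

for message in (list_tools, call_add, read_greeting, get_prompt):
    print(json.dumps(message))
```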

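Being able to try an MCP server's features end to end, one at a time, without connecting to an MCP client such as Claude Desktop is very convenient.

Conclusion

Those were my notes on how to use "MCP Inspector", a testing tool for MCP servers.

Tools in particular are model-controlled (whether they get called is up to the model), so testing them through a real MCP client is tedious. Checking the MCP server's end-to-end behavior with MCP Inspector should save time.

I hope this helps someone.

[Related articles]

References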

2025-03-30

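AI agent patterns and practices: a summary

LLM MCP Generative AI AI Python

Learning AI agent patterns and practices from Anthropic's article "Building effective agents" and its follow-up video.

Introduction
Premise: defining AI agents
AI agent patterns
Practices
Conclusion
References

Introduction

AI agents are gaining so much momentum that you hear people call 2025 "the first year of AI agents".

This post briefly summarizes AI agent patterns and practices, based on Anthropic's article Building effective agents and its follow-up video.

The English translation of this post is here.

Premise: defining AI agents

At present, what the term "AI agent" means varies considerably from person to person.

In this post, the many kinds of systems described as AI agents are grouped under the umbrella term "agentic system" and divided into the following two categories.

Workflow: a system in which LLMs and tools work together along a predefined flow
Agent: a system in which the LLM dynamically decides its own process and tool usage, controlling for itself how the task gets accomplished

Let's look at the design patterns for these AI agents.

AI agent patterns

The images below are quoted from here.

Augmented LLM

First, the "Augmented LLM" is the basic building block of an AI agent system.

The unit an agent system is built from is an LLM augmented with:

Retrieval
Tools
Memory

These augmentations are easier to implement with MCP.

From here on, "LLM call" refers to this Augmented LLM.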

Workflow: Prompt chaining

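A workflow pattern that breaks a task into a sequence of steps and processes them in order, passing each output to the next input
Best suited to tasks that can be decomposed into clear steps
Example use cases:
- Writing marketing copy > translating it
- Drafting a document outline > reviewing the outline > writing the document from the outline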
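
As a rough sketch of prompt chaining, the code below feeds each step's output into the next step's prompt. The names run_chain and fake_llm are hypothetical, and fake_llm is a stand-in that only tags its input; in practice it would be replaced by a real model call.

```python
from typing import Callable, List

LLM = Callable[[str], str]


def run_chain(llm: LLM, steps: List[str], initial_input: str) -> str:
    """Run the steps in order, passing each output to the next step."""
    text = initial_input
    for instruction in steps:
        text = llm(f"{instruction}\n\n{text}")
    return text


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: tags the body with the instruction.
    instruction, _, body = prompt.partition("\n\n")
    return f"[{instruction}] {body}"


result = run_chain(
    fake_llm,
    ["Write marketing copy for", "Translate into Japanese"],
    "a new tea brand",
)
print(result)
```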

Workflow: Routing

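A workflow pattern that classifies the input and routes it to the appropriate follow-up process
Suited to complex processing where tasks fall into clear categories
Example use cases:
- Classifying customer service inquiries by type and handling each accordingly
- Routing questions to different models by difficulty to optimize cost and speed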
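
A minimal sketch of routing might look like the following: a classifier picks a category, and the matching handler processes the input. All names are hypothetical, and the keyword-based fake_classifier merely stands in for an LLM classification call.

```python
from typing import Callable, Dict

LLM = Callable[[str], str]


def route(classifier: LLM, handlers: Dict[str, LLM], default: LLM, query: str) -> str:
    """Classify the query, then dispatch it to the matching handler."""
    category = classifier(query).strip().lower()
    handler = handlers.get(category, default)
    return handler(query)


def fake_classifier(query: str) -> str:
    # Stand-in for an LLM that returns a category label.
    return "refund" if "refund" in query.lower() else "general"


def refund_handler(query: str) -> str:
    return f"refund flow: {query}"


def general_handler(query: str) -> str:
    return f"general flow: {query}"


handlers = {"refund": refund_handler}
print(route(fake_classifier, handlers, general_handler, "I want a refund"))
print(route(fake_classifier, handlers, general_handler, "Hello"))
```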

Workflow: Parallelization

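A workflow pattern that splits a task, processes the parts concurrently, and aggregates the outputs
Two variations: splitting into different subtasks, or running the same task several times to obtain varied outputs
Suited to improving processing speed, or to producing output that has been checked from multiple perspectives
Example use cases:
- Running answer generation in parallel with a guardrail that checks for inappropriate content
- Evaluating an LLM's answer from multiple perspectives
- Checking code for vulnerabilities with multiple perspectives/prompts
- Judging whether content is inappropriate with multiple perspectives/prompts to reduce false positives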
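
The voting variation can be sketched as several independent checks run concurrently, with a majority vote as the aggregation step. The checkers below are trivial hypothetical stand-ins for separate LLM prompts, and the majority threshold is an assumption chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Checker = Callable[[str], bool]


def flagged_by_majority(checkers: List[Checker], content: str) -> bool:
    """Run all checkers concurrently and flag on a strict majority."""
    if not checkers:
        raise ValueError("at least one checker is required")
    with ThreadPoolExecutor(max_workers=len(checkers)) as pool:
        votes = list(pool.map(lambda check: check(content), checkers))
    return sum(votes) > len(votes) / 2


# Trivial stand-ins for independent LLM-based checks.
def has_spam_word(content: str) -> bool:
    return "spam" in content.lower()


def is_shouting(content: str) -> bool:
    return content.isupper()


def is_too_long(content: str) -> bool:
    return len(content) > 1000


checkers = [has_spam_word, is_shouting, is_too_long]
print(flagged_by_majority(checkers, "SPAM SPAM"))
print(flagged_by_majority(checkers, "this is spam"))
```

Requiring a majority rather than a single positive vote is one way to trade recall for fewer false positives.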

Workflow: Orchestrator-workers

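A workflow pattern in which an orchestrator LLM dynamically breaks down the task, assigns subtasks to worker LLMs, and then integrates the results
Best suited to complex processing where the required subtasks cannot be predicted
It differs from the Parallelization pattern in that the subtasks are not predefined: the orchestrator decides how to split the task dynamically based on the input
Example use cases:
- Coding tasks that require complex changes across multiple files
- Search tasks that gather and analyze relevant information from multiple sources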
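
In code, the orchestrator-workers shape could be sketched as three pluggable steps: plan, work, and synthesize. Everything here is hypothetical; fake_plan just splits on " and " where a real orchestrator would ask an LLM to decide the subtasks from the input.

```python
from typing import Callable, List


def orchestrate(
    plan: Callable[[str], List[str]],
    worker: Callable[[str], str],
    synthesize: Callable[[List[str]], str],
    task: str,
) -> str:
    """Plan subtasks at run time, run a worker on each, merge the results."""
    subtasks = plan(task)  # decided dynamically from the input
    results = [worker(subtask) for subtask in subtasks]
    return synthesize(results)


def fake_plan(task: str) -> List[str]:
    # Stand-in for an orchestrator LLM deciding the breakdown.
    return task.split(" and ")


def fake_worker(subtask: str) -> str:
    return f"done: {subtask}"


def fake_synthesize(results: List[str]) -> str:
    return "; ".join(results)


print(orchestrate(fake_plan, fake_worker, fake_synthesize, "update README and fix tests"))
```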

Workflow: Evaluator-optimizer

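A workflow pattern in which one LLM produces output and another LLM evaluates it and gives feedback, in a loop
Effective when there are clear evaluation criteria and feedback can raise quality
Example use cases:
- Translating literary works
- Tasks that require multiple rounds of search and analysis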
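
The generate-evaluate loop can be sketched as follows. The names and the max_rounds cap are assumptions for illustration, and both fake functions stand in for LLM calls.

```python
from typing import Callable, Optional, Tuple

Generator = Callable[[str, Optional[str]], str]
Evaluator = Callable[[str], Tuple[bool, str]]


def refine(generate: Generator, evaluate: Evaluator, task: str, max_rounds: int = 3) -> str:
    """Regenerate with feedback until accepted or max_rounds is reached."""
    feedback: Optional[str] = None
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        accepted, feedback = evaluate(draft)
        if accepted:
            break
    return draft


def fake_generate(task: str, feedback: Optional[str]) -> str:
    # Stand-in generator: applies the feedback on the second attempt.
    return f"{task} (polite)" if feedback else task


def fake_evaluate(draft: str) -> Tuple[bool, str]:
    # Stand-in evaluator with one clear criterion.
    accepted = "(polite)" in draft
    return accepted, "" if accepted else "make it polite"


print(refine(fake_generate, fake_evaluate, "hello"))
```

Capping the number of rounds keeps the loop from running indefinitely when the evaluator never accepts.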

Agent

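Agent pattern: autonomously plans tasks, executes them, and uses tools toward a goal, acting in a loop on feedback from the environment and from humans
The LLM logic itself is simple; what matters is designing the tools and their descriptions carefully
Suited to complex, non-routine, open-ended tasks where the steps to a solution cannot be foreseen
Example use cases:
- Coding agents
- Agents that operate a computer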
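
Reduced to its skeleton, an agent is a loop: choose an action, run a tool, observe, repeat. The sketch below assumes a hypothetical policy function in place of the LLM and a step limit as a safety stop; it is not taken from any particular framework.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]  # (tool name or "finish", argument)
Policy = Callable[[str, List[str]], Action]
Tool = Callable[[str], str]


def run_agent(policy: Policy, tools: Dict[str, Tool], goal: str, max_steps: int = 5) -> str:
    """Loop: pick an action, run the tool, record the observation."""
    observations: List[str] = []
    for _ in range(max_steps):
        name, argument = policy(goal, observations)
        if name == "finish":
            return argument
        tool = tools.get(name)
        if tool is None:
            # Feed the error back so the policy can recover.
            observations.append(f"error: unknown tool {name!r}")
            continue
        observations.append(tool(argument))
    return "stopped: step limit reached"


def fake_policy(goal: str, observations: List[str]) -> Action:
    # Stand-in for the LLM: use one tool, then answer with its result.
    if not observations:
        return ("word_count", goal)
    return ("finish", f"answer: {observations[-1]}")


def word_count(text: str) -> str:
    return str(len(text.split()))


print(run_agent(fake_policy, {"word_count": word_count}, "count these three words"))
```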

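Practices

From here, let's look at practices for building AI agents.

Keep it as simple as possible

When building applications with LLMs, keep the architecture as simple as possible and introduce complexity only when it is needed.

In other words, there are many cases where you don't need to build an agent system at all. Often a single LLM call, backed by in-context examples and retrieval, is enough.

Consider an agent system only when you need more complex behavior. Workflows suit predefined tasks; agents suit cases that need more varied, flexible behavior.

Be cautious about adopting frameworks

There are many frameworks for building AI agents, and new ones appear every day.

Examples:

Even so, I recommend starting without a framework and using the LLM APIs directly.

Frameworks can certainly make AI agents easier to implement. However, each added layer of abstraction makes the internal processing and prompts harder to see, which makes debugging harder. It can also lead you to take on more complexity than a simple task requires.

Start by calling the LLM API directly, and consider adopting a framework afterwards if you need one.

Build products that get better as better models arrive

Build AI products whose value increases as better models arrive, not ones that a better model makes unnecessary.

AI model progress is not going to stop.

Conditions for effective AI agents

AI agents are particularly effective for tasks with the following characteristics.

Both conversation and action are required
Success criteria are clear
Feedback loops can be repeated
Human oversight is effective

Typical cases that suit AI agents:

Customer support
Coding agents

Prioritize tool design

Tool definitions and design are extremely important. Give them the same care and effort as the prompt engineering you put into the main prompt.

Here are some key points for designing tools.

Make the output format easy for the model to handle
- Example: when returning code, Markdown is easier for the model than embedding it in JSON, since it avoids most of the escaping
Give the model enough tokens to think
Keep formats close to what the model has likely seen in text on the internet during training
Avoid unnecessary overhead
- Example: avoid burdening the model with work such as accurately counting lines in a large body of code or escaping strings
Put yourself in the model's position
- Is the tool description clear, or does it take effort to understand? How could parameter names and descriptions be changed to make it clearer?
Test how the model uses the tool
- Run many example inputs, see what mistakes the model makes, and improve accordingly
Add safeguards against mistakes
- Change descriptions and arguments so that mistakes become less likely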
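
The point about output formats is easy to see with a small experiment: the same code snippet needs noticeably more escaping when embedded in JSON than when placed in a Markdown code block. The snippet and variable names below are made up for illustration.

```python
import json

# A small code snippet containing quotes, newlines, and a backslash.
code = 'print("hello")\nprint("a\\tb")\n'

# Embedded in JSON: quotes, newlines, and backslashes all get escaped.
as_json = json.dumps({"code": code})

# Wrapped in a Markdown code block: the code appears verbatim.
fence = "`" * 3
as_markdown = f"{fence}python\n{code}{fence}"

print(as_json)
print(as_markdown)
print("backslashes in original:", code.count("\\"))
print("backslashes in JSON:", as_json.count("\\"))
```

Every extra escape is something the model has to produce correctly, which is the overhead the guideline above suggests avoiding.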

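Put as much effort into building the agent-computer interface (ACI) as you put into designing the human-computer interface (HCI).

Reference implementations

Reference implementations of the patterns covered in this post are provided by Anthropic.

They are a good starting point when you want to try the patterns yourself.

Conclusion

That was a summary of AI agent patterns and practices from Anthropic's article.

I hope this helps someone.

[Related articles]

References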

2025-03-28

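Connecting GPT to an MCP server | Introduction to Model Context Protocol

LLM MCP Generative AI ChatGPT AI Python

Learning how MCP (Model Context Protocol) works by running it together with the OpenAI API / GPT.

Introduction
Recap: what is MCP?
Connecting GPT to an MCP server
Conclusion
References

Introduction

Since Anthropic announced MCP (Model Context Protocol) in November 2024, it has steadily gained traction in the community.

I had seen and heard about it everywhere, but had not yet tried MCP hands-on. In this post, based on the official MCP Quickstart, I connect to MCP using the OpenAI API / GPT instead of Claude and learn how it works.

The English translation of this post is here.

Recap: what is MCP?

Let's start with a quick recap of MCP.

MCP is an open protocol that defines how LLMs interact with external tools.

It is often compared to USB-C. If every peripheral had its own standard and connector, a PC would need countless sockets to connect to them. With a common standard like USB-C, a PC can connect to any peripheral through a single USB-C socket.

Likewise, MCP defines how LLMs connect to surrounding tools, so you no longer need a custom implementation each time an LLM uses an external tool.

For a more engineer-oriented analogy, it may be clearer to say it is like LSP (Language Server Protocol).

The people behind MCP at Anthropic have also said that MCP was inspired by the success of LSP.

Here is how it works in practice:

An MCP client can be code that calls an LLM through an API, an IDE connected to an LLM, or Claude Desktop. An MCP server is a process that runs locally, like the MCP client (the specification for running MCP servers remotely is still under active discussion; see the roadmap for details), and it performs external processing or calls Web APIs in response to tool calls from the LLM.

Now let's implement an MCP server and a GPT-based MCP client based on the MCP Quickstart and learn how they work at the code level.

Connecting GPT to an MCP server

The code implemented for this post is available here.

Ref. mcp-gpt-tutorial/weather at main · bioerrorlog/mcp-gpt-tutorial · GitHub

Implementing the MCP server

First, let's implement the MCP server.

Naturally, the server-side implementation is the same whether the MCP client is Claude-based or GPT-based, so this code is unchanged from the official Quickstart.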

from typing import Any
import httpx
from mcp.server.fastmcp import FastMCP

# Initialize FastMCP server
mcp = FastMCP("weather")

# Constants
NWS_API_BASE = "https://api.weather.gov"
USER_AGENT = "weather-app/1.0"


async def make_nws_request(url: str) -> dict[str, Any] | None:
    """Make a request to the NWS API with proper error handling."""
    headers = {
        "User-Agent": USER_AGENT,
        "Accept": "application/geo+json"
    }
    async with httpx.AsyncClient() as client:
        try:
            response = await client.get(url, headers=headers, timeout=30.0)
            response.raise_for_status()
            return response.json()
        except Exception:
            return None


def format_alert(feature: dict) -> str:
    """Format an alert feature into a readable string."""
    props = feature["properties"]
    return f"""
Event: {props.get('event', 'Unknown')}
Area: {props.get('areaDesc', 'Unknown')}
Severity: {props.get('severity', 'Unknown')}
Description: {props.get('description', 'No description available')}
Instructions: {props.get('instruction', 'No specific instructions provided')}
"""


@mcp.tool()
async def get_alerts(state: str) -> str:
    """Get weather alerts for a US state.

    Args:
        state: Two-letter US state code (e.g. CA, NY)
    """
    url = f"{NWS_API_BASE}/alerts/active/area/{state}"
    data = await make_nws_request(url)

    if not data or "features" not in data:
        return "Unable to fetch alerts or no alerts found."

    if not data["features"]:
        return "No active alerts for this state."

    alerts = [format_alert(feature) for feature in data["features"]]
    return "\n---\n".join(alerts)


@mcp.tool()
async def get_forecast(latitude: float, longitude: float) -> str:
    """Get weather forecast for a location.

    Args:
        latitude: Latitude of the location
        longitude: Longitude of the location
    """
    # First get the forecast grid endpoint
    points_url = f"{NWS_API_BASE}/points/{latitude},{longitude}"
    points_data = await make_nws_request(points_url)

    if not points_data:
        return "Unable to fetch forecast data for this location."

    # Get the forecast URL from the points response
    forecast_url = points_data["properties"]["forecast"]
    forecast_data = await make_nws_request(forecast_url)

    if not forecast_data:
        return "Unable to fetch detailed forecast."

    # Format the periods into a readable forecast
    periods = forecast_data["properties"]["periods"]
    forecasts = []
    for period in periods[:5]:  # Only show next 5 periods
        forecast = f"""
{period['name']}:
Temperature: {period['temperature']}°{period['temperatureUnit']}
Wind: {period['windSpeed']} {period['windDirection']}
Forecast: {period['detailedForecast']}
"""
        forecasts.append(forecast)

    return "\n---\n".join(forecasts)


if __name__ == "__main__":
    # Initialize and run the server
    mcp.run(transport='stdio')

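It looks like a lot is going on, but what it does as an MCP server is very simple.

First, this MCP server provides two tools.

A tool that fetches weather alerts for a US state: get_alerts()
A tool that fetches the weather forecast for a latitude and longitude: get_forecast()

These tools are registered as tools of this MCP server by the tool() decorator.

Registered tools can be listed from an MCP client session with list_tools() and invoked with call_tool().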
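
To illustrate the idea of decorator-based registration, here is a toy registry that records functions by name and exposes list and call operations. ToyServer is a hypothetical class written for this explanation; it is synchronous, ignores input schemas, and is not how FastMCP is actually implemented.

```python
import inspect
from typing import Any, Callable, Dict, List


class ToyServer:
    """Toy stand-in for illustration; not FastMCP's actual implementation."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._tools: Dict[str, Callable[..., Any]] = {}

    def tool(self) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            # Register under the function name and return it unchanged.
            self._tools[func.__name__] = func
            return func

        return decorator

    def list_tools(self) -> List[Dict[str, str]]:
        return [
            {"name": name, "description": inspect.getdoc(func) or ""}
            for name, func in self._tools.items()
        ]

    def call_tool(self, name: str, arguments: Dict[str, Any]) -> Any:
        return self._tools[name](**arguments)


server = ToyServer("weather")


@server.tool()
def get_alerts(state: str) -> str:
    """Get weather alerts for a US state."""
    return f"No active alerts for {state}."


print(server.list_tools())
print(server.call_tool("get_alerts", {"state": "CA"}))
```

The docstring doubling as the tool description mirrors how the Quickstart server documents its tools, although the real SDK also derives an input schema from the type hints.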

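The server is started with mcp.run(transport='stdio'), which specifies that communication with the client side happens over standard input/output (stdio).

Two transports are supported out of the box for MCP server/client communication: standard input/output and Server-Sent Events (SSE). When the server runs locally, stdio is the simplest option.

Ref. Transports - Model Context Protocol

Everything else just calls the https://api.weather.gov API and formats the results. It is plain Python that has nothing to do with MCP.

Implementing the MCP client

Next, let's implement the MCP client side. Starting from the official Quickstart implementation, the Claude calls are replaced with GPT calls.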

import asyncio
from typing import Optional
from contextlib import AsyncExitStack

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

from openai import OpenAI
from dotenv import load_dotenv
import json

load_dotenv()  # load environment variables from .env


class MCPClient:
    def __init__(self):
        # Initialize session and client objects
        self.session: Optional[ClientSession] = None
        self.exit_stack = AsyncExitStack()
        self.openai = OpenAI()

    async def connect_to_server(self, server_script_path: str):
        """Connect to an MCP server

        Args:
            server_script_path: Path to the server script (.py or .js)
        """
        is_python = server_script_path.endswith('.py')
        is_js = server_script_path.endswith('.js')
        if not (is_python or is_js):
            raise ValueError("Server script must be a .py or .js file")

        command = "python" if is_python else "node"
        server_params = StdioServerParameters(
            command=command,
            args=[server_script_path],
            env=None
        )

        stdio_transport = await self.exit_stack.enter_async_context(stdio_client(server_params))
        self.stdio, self.write = stdio_transport
        self.session = await self.exit_stack.enter_async_context(ClientSession(self.stdio, self.write))

        await self.session.initialize()

        # List available tools
        response = await self.session.list_tools()
        tools = response.tools
        print("\nConnected to server with tools:", [tool.name for tool in tools])

    async def process_query(self, query: str) -> str:
        """Process a query using OpenAI and available tools"""
        messages = [
            {
                "role": "user",
                "content": query
            }
        ]

        response = await self.session.list_tools()
        available_tools = [{
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description,
                "parameters": tool.inputSchema
            }
        } for tool in response.tools]

        # Initial OpenAI API call
        response = self.openai.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=available_tools,
            tool_choice="auto",
        )

        # Process response and handle tool calls
        final_text = []

        while True:
            reply = response.choices[0].message

            if reply.content and not reply.tool_calls:
                final_text.append(reply.content)
                messages.append({
                    "role": "assistant",
                    "content": reply.content
                })

            if reply.tool_calls:
                # Add the assistant message that triggered the tool calls
                messages.append({
                    "role": "assistant",
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments
                            }
                        }
                        for tool_call in reply.tool_calls
                    ]
                })

                for tool_call in reply.tool_calls:
                    tool_name = tool_call.function.name
                    tool_args = tool_call.function.arguments

                    # Execute tool call
                    parsed_args = json.loads(tool_args)
                    result = await self.session.call_tool(tool_name, parsed_args)
                    final_text.append(f"[Calling tool {tool_name} with args {parsed_args}]")

                    # Add tool response message
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "name": tool_name,
                        "content": result.content,
                    })

                # Get next response from OpenAI
                response = self.openai.chat.completions.create(
                    model="gpt-4o",
                    messages=messages,
                )
            else:
                break

        return "\n".join(final_text)

    async def chat_loop(self):
        """Run an interactive chat loop"""
        print("\nMCP Client Started!")
        print("Type your queries or 'quit' to exit.")

        while True:
            try:
                query = input("\nQuery: ").strip()

                if query.lower() == 'quit':
                    break

                response = await self.process_query(query)
                print("\n" + response)

            except Exception as e:
                print(f"\nError: {str(e)}")

    async def cleanup(self):
        """Clean up resources"""
        await self.exit_stack.aclose()


async def main():
    if len(sys.argv) < 2:
        print("Usage: python client.py <path_to_server_script>")
        sys.exit(1)

    client = MCPClient()
    try:
        await client.connect_to_server(sys.argv[1])
        await client.chat_loop()
    finally:
        await client.cleanup()


if __name__ == "__main__":
    import sys
    asyncio.run(main())

Ref. mcp-gpt-tutorial/client at main · bioerrorlog/mcp-gpt-tutorial · GitHub

Let's start with the main function to get an overview of the whole flow.

async def main():
    if len(sys.argv) < 2:
        print("Usage: python client.py <path_to_server_script>")
        sys.exit(1)

    client = MCPClient()
    try:
        await client.connect_to_server(sys.argv[1])
        await client.chat_loop()
    finally:
        await client.cleanup()

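The flow:

Process the command-line argument
Start the MCP server and connect to it
Run the chat loop
Clean up the connection (on exit)

First, the path to the MCP server script is passed as a command-line argument, and the client starts the server and connects to it.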

    async def connect_to_server(self, server_script_path: str):
        """Connect to an MCP server

        Args:
            server_script_path: Path to the server script (.py or .js)
        """
        is_python = server_script_path.endswith('.py')
        is_js = server_script_path.endswith('.js')
        if not (is_python or is_js):
            raise ValueError("Server script must be a .py or .js file")

        command = "python" if is_python else "node"
        server_params = StdioServerParameters(
            command=command,
            args=[server_script_path],
            env=None
        )

        stdio_transport = await self.exit_stack.enter_async_context(stdio_client(server_params))
        self.stdio, self.write = stdio_transport
        self.session = await self.exit_stack.enter_async_context(ClientSession(self.stdio, self.write))

        await self.session.initialize()

        # List available tools
        response = await self.session.list_tools()
        tools = response.tools
        print("\nConnected to server with tools:", [tool.name for tool in tools])

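What this does:

self.exit_stack.enter_async_context(stdio_client(server_params)) starts the MCP server as a subprocess
await self.exit_stack.enter_async_context(ClientSession(self.stdio, self.write)) connects the client and the server over standard input/output
await self.session.list_tools() lists the tools registered on the MCP server, and the client prints their names

Once the MCP client is connected to the MCP server, the chat loop of user input -> response begins.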

    async def chat_loop(self):
        """Run an interactive chat loop"""
        print("\nMCP Client Started!")
        print("Type your queries or 'quit' to exit.")

        while True:
            try:
                query = input("\nQuery: ").strip()

                if query.lower() == 'quit':
                    break

                response = await self.process_query(query)
                print("\n" + response)

            except Exception as e:
                print(f"\nError: {str(e)}")

Typing quit at the waiting prompt ends the chat loop.

Any other input is passed to the LLM via process_query().

    async def process_query(self, query: str) -> str:
        """Process a query using OpenAI and available tools"""
        messages = [
            {
                "role": "user",
                "content": query
            }
        ]

        response = await self.session.list_tools()
        available_tools = [{
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description,
                "parameters": tool.inputSchema
            }
        } for tool in response.tools]

        # Initial OpenAI API call
        response = self.openai.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=available_tools,
            tool_choice="auto",
        )

        # Process response and handle tool calls
        final_text = []

        while True:
            reply = response.choices[0].message

            if reply.content and not reply.tool_calls:
                final_text.append(reply.content)
                messages.append({
                    "role": "assistant",
                    "content": reply.content
                })

            if reply.tool_calls:
                # Add the assistant message that triggered the tool calls
                messages.append({
                    "role": "assistant",
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments
                            }
                        }
                        for tool_call in reply.tool_calls
                    ]
                })

                for tool_call in reply.tool_calls:
                    tool_name = tool_call.function.name
                    tool_args = tool_call.function.arguments

                    # Execute tool call
                    parsed_args = json.loads(tool_args)
                    result = await self.session.call_tool(tool_name, parsed_args)
                    final_text.append(f"[Calling tool {tool_name} with args {parsed_args}]")

                    # Add tool response message
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "name": tool_name,
                        "content": result.content,
                    })

                # Get next response from OpenAI
                response = self.openai.chat.completions.create(
                    model="gpt-4o",
                    messages=messages,
                )
            else:
                break

        return "\n".join(final_text)

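It is a bit long, but what it does is OpenAI API function calling backed by an MCP server.

Get the list of registered tools from the MCP server session
Call the OpenAI API with those tools attached (function calling)
If GPT requests a tool call, invoke that tool through the MCP server session
Generate the final answer based on the tool call results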
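The first three steps can be sketched without a live server. This is a minimal sketch, not the repository's code: ToolInfo is a hypothetical stand-in for the entries of list_tools().tools, and the get_alerts tool and its schema are assumed for illustration only.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToolInfo:
    """Hypothetical stand-in for one entry of list_tools().tools."""
    name: str
    description: str
    inputSchema: dict = field(default_factory=dict)


def to_openai_tools(tools):
    """Convert MCP tool definitions into the OpenAI function-calling format."""
    return [
        {
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description,
                "parameters": tool.inputSchema,
            },
        }
        for tool in tools
    ]


def parse_tool_call(name, arguments_json):
    """Decode the JSON-encoded arguments of a tool call requested by the model."""
    return name, json.loads(arguments_json)


# Assumed example tool; a real server defines its own names and schemas
tools = [
    ToolInfo(
        name="get_alerts",
        description="Get weather alerts for a US state",
        inputSchema={
            "type": "object",
            "properties": {"state": {"type": "string"}},
            "required": ["state"],
        },
    )
]
openai_tools = to_openai_tools(tools)
tool_name, tool_args = parse_tool_call("get_alerts", '{"state": "CA"}')
print(openai_tools[0]["function"]["name"], tool_args)
```

In the real client, the decoded arguments are what gets passed to self.session.call_tool(tool_name, parsed_args).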

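That covers the MCP client-side implementation.

Now, at last, let's run this MCP client and server.

Running the MCP client and server

We run the MCP client with "uv", the Python package manager that the official MCP Quickstart also uses.

As mentioned earlier, in this implementation the MCP server is started as a subprocess of the MCP client, so starting the client is all you need to begin.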

uv run client.py path/to/server.py

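Ask for the weather alerts of a US state, or for a forecast at a given latitude and longitude, and the answer comes back after an exchange with the MCP server.

The MCP client and server in action: asking for weather alerts in California and getting the result

Conclusion

In this post, we swapped GPT into the official MCP Quickstart, ran it, and learned how it is implemented.

Just the other day, OpenAI added MCP support to its Agents SDK, which makes it more likely that MCP will become the de facto standard in this space. I have high hopes for the future of AI agents.

I hope this helps someone.

[Related articles]

References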

2025-03-20

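How to use uv: a Python package and project manager

Python Rust

A quick reference for the basics of "uv", a Python package and project manager written in Rust.

Introduction
How to use uv
Conclusion
References

Introduction

uv is a fast Python package and project manager written in Rust.

It aims to replace existing package managers, project managers, and virtual environment tools such as:

pip
pip-tools
pipx
poetry
pyenv
twine
virtualenv

uv runs faster than existing tools | Image from the uv GitHub repository

uv appeared in 2024 and quickly drew attention.

GitHub stars of uv compared with major Python tools | Image from star-history

Many recent Python projects assume you are using uv. I had not used uv before, so here are my notes on how to use it.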

# Working version: Python
$ python --version
Python 3.11.9

# Working version: uv
$ uv --version
uv 0.6.8 (c1ef48276 2025-03-18)

The English translation of this post is here.

How to use uv

Installing uv

On macOS and Linux, you can install it with the following command.

curl -LsSf https://astral.sh/uv/install.sh | sh

On Windows:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

You can also install it with pip or pipx.

pip install uv

pipx install uv

Listing uv's commands

Running the uv command with no arguments prints an overview of its usage.

$ uv
An extremely fast Python package manager.

Usage: uv [OPTIONS] <COMMAND>

Commands:
  run      Run a command or script
  init     Create a new project
  add      Add dependencies to the project
  remove   Remove dependencies from the project
  sync     Update the project's environment
  lock     Update the project's lockfile
  export   Export the project's lockfile to an alternate format
  tree     Display the project's dependency tree
  tool     Run and install commands provided by Python packages
  python   Manage Python versions and installations
  pip      Manage Python packages with a pip-compatible interface
  venv     Create a virtual environment
  build    Build Python packages into source distributions and wheels
  publish  Upload distributions to an index
  cache    Manage uv's cache
  self     Manage the uv executable
  version  Display uv's version
  help     Display documentation for a command

Cache options:
  -n, --no-cache               Avoid reading from or writing to the cache, instead using a temporary directory for the duration of the operation [env: UV_NO_CACHE=]
      --cache-dir <CACHE_DIR>  Path to the cache directory [env: UV_CACHE_DIR=]

Python options:
      --managed-python       Require use of uv-managed Python versions [env: UV_MANAGED_PYTHON=]
      --no-managed-python    Disable use of uv-managed Python versions [env: UV_NO_MANAGED_PYTHON=]
      --no-python-downloads  Disable automatic downloads of Python. [env: "UV_PYTHON_DOWNLOADS=never"]

Global options:
  -q, --quiet                                      Do not print any output
  -v, --verbose...                                 Use verbose output
      --color <COLOR_CHOICE>                       Control the use of color in output [possible values: auto, always, never]
      --native-tls                                 Whether to load TLS certificates from the platform's native certificate store [env: UV_NATIVE_TLS=]
      --offline                                    Disable network access [env: UV_OFFLINE=]
      --allow-insecure-host <ALLOW_INSECURE_HOST>  Allow insecure connections to a host [env: UV_INSECURE_HOST=]
      --no-progress                                Hide all progress outputs [env: UV_NO_PROGRESS=]
      --directory <DIRECTORY>                      Change to the given directory prior to running the command
      --project <PROJECT>                          Run the command within the given project directory
      --config-file <CONFIG_FILE>                  The path to a `uv.toml` file to use for configuration [env: UV_CONFIG_FILE=]
      --no-config                                  Avoid discovering configuration files (`pyproject.toml`, `uv.toml`) [env: UV_NO_CONFIG=]
  -h, --help                                       Display the concise help for this command
  -V, --version                                    Display the uv version

Use `uv help` for more details.

From these, I'll pick out a few of the most basic commands that are likely to see frequent use.

Creating a Python project

uv init <project name>

# Example
# uv init myproject

Running uv init generates a Python project like this.

myproject
├── .python-version
├── README.md
├── main.py
└── pyproject.toml

1 directory, 4 files

The initial contents of the generated files:

# .python-version
3.11

# README.md

# (empty)

# main.py

def main():
    print("Hello from myproject!")


if __name__ == "__main__":
    main()

# pyproject.toml
[project]
name = "myproject"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = []

Creating a Python virtual environment

uv venv

This creates a .venv folder containing the virtual environment.

$ tree -L 3 .venv 
.venv
├── CACHEDIR.TAG
├── bin
│   ├── activate
│   ├── activate.bat
│   ├── activate.csh
│   ├── activate.fish
│   ├── activate.nu
│   ├── activate.ps1
│   ├── activate_this.py
│   ├── deactivate.bat
│   ├── pydoc.bat
│   ├── python -> /Users/bioerrorlog/.pyenv/versions/3.11.9/bin/python3.11
│   ├── python3 -> python
│   └── python3.11 -> python
├── lib
│   └── python3.11
│       └── site-packages
└── pyvenv.cfg

5 directories, 14 files

How to activate the virtual environment:

# macOS/Linux
source .venv/bin/activate

# Windows
.venv\Scripts\activate

Adding a Python package

uv add <package name>

Here is an example that installs httpx:

$ uv add httpx
Resolved 9 packages in 248ms
Prepared 8 packages in 104ms
Installed 8 packages in 15ms
 + anyio==4.9.0
 + certifi==2025.1.31
 + h11==0.14.0
 + httpcore==1.0.7
 + httpx==0.28.1
 + idna==3.10
 + sniffio==1.3.1
 + typing-extensions==4.12.2

Installing a package also automatically updates pyproject.toml and creates or updates uv.lock.

As advertised, installing with uv add is noticeably faster than installing with pip.

Checking installed packages

To list installed packages the way pip freeze does:

uv pip freeze

For example, with httpx installed as above, the output looks like this.

$ uv pip freeze
anyio==4.9.0
certifi==2025.1.31
h11==0.14.0
httpcore==1.0.7
httpx==0.28.1
idna==3.10
sniffio==1.3.1
typing-extensions==4.12.2

There is also a handy command that displays the installed packages as a tree.

uv tree

# Example run with httpx installed
$ uv tree
Resolved 9 packages in 9ms
myproject v0.1.0
└── httpx v0.28.1
    ├── anyio v4.9.0
    │   ├── idna v3.10
    │   ├── sniffio v1.3.1
    │   └── typing-extensions v4.12.2
    ├── certifi v2025.1.31
    ├── httpcore v1.0.7
    │   ├── certifi v2025.1.31
    │   └── h11 v0.14.0
    └── idna v3.10

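Running Python code

uv run main.py

Passing a script to uv run executes that Python script.

Conclusion

These were my notes on the most basic uv commands.

For more detailed usage, browse the --help output of the uv command or see the documentation.

I hope this helps someone.

[Related articles]

References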

2025-03-14

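3D modeling with an LLM in Blender: trying MeshGen/LLaMA-Mesh

Blender AI LLM Generative AI ChatGPT Python Game development Papers Art

We play with LLaMA-Mesh, an LLM that outputs 3D model data, and MeshGen, its Blender add-on.

As a bonus, we also have ChatGPT do the same 3D modeling.

Introduction
A rough look at LLaMA-Mesh
3D modeling with MeshGen
Bonus: 3D modeling with ChatGPT
Conclusion
References

Introduction

Generative AI models that output 3D models have been appearing one after another in recent years.

While looking for one that is easy to try locally, I found a model called "LLaMA-Mesh", recently published by NVIDIA Labs.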

[2411.09595] LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

MeshGen, a Blender add-on for LLaMA-Mesh, is also public, so you can install it in Blender and run it locally.

GitHub - huggingface/meshgen: A blender addon for generating meshes with AI

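In this post, I run it in Blender and play with it.

As a bonus, I also have ChatGPT generate 3D models for the same subjects and compare the results.

# Working environment
Blender 4.2.0
M2 MacBook Air
16 GB memory

Tried it out

3D modeling with an LLM in Blender: trying MeshGen/LLaMA-Mesh - BioErrorLog Tech Blog https://t.co/dsup5LDYf0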
— BioErrorLog (@bioerrorlog) March 15, 2025

The English translation of this post is here.

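A rough look at LLaMA-Mesh

LLaMA-Mesh is not a generative AI model that outputs 3D models directly; it extends a text-generation LLM so that it outputs 3D information.

LLaMA-Mesh extends a text-generation LLM to output 3D information | Image quoted from the paper

The 3D model data is output as text. The model outputs the coordinates of each vertex and the faces that connect them, and external software then interprets that text and builds the 3D model.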
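As a rough illustration of that interpretation step, here is a minimal sketch of a parser for OBJ-style text, where lines starting with v give vertex coordinates and lines starting with f give faces as 1-based vertex indices. The sample text is a hypothetical output made up for this example, not something LLaMA-Mesh produced, and this is not MeshGen's actual loader.

```python
def parse_obj(text):
    """Parse the 'v' and 'f' lines of OBJ-style text into vertices and faces."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(float(p) for p in parts[1:4]))
        elif parts[0] == "f":
            # OBJ indices are 1-based; "3/1/2" style entries keep only the vertex index
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
    return vertices, faces


# Hypothetical model output: a single square made of two triangles
sample = """
v 0 0 0
v 1 0 0
v 1 1 0
v 0 1 0
f 1 2 3
f 1 3 4
"""
vertices, faces = parse_obj(sample)
print(len(vertices), faces)
```

A text that is cut off partway still parses, but then some vertices are left without any face referencing them.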

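The paper shows examples of 3D models said to have been generated with this model:

Examples of 3D models said to have been generated with LLaMA-Mesh | Image quoted from the paper

The models look quite well generated.

The paper claims the following advantages for doing 3D generation as an extension of a text-generation model.

It can draw on the spatial knowledge the LLM already has
It enables a conversational 3D generation workflow
It can understand 3D models

The point of this post is to actually try the model in Blender, so I won't go any deeper into the research itself. See the paper for details.

3D modeling with MeshGen

MeshGen is an add-on that lets you run LLaMA-Mesh inside Blender. We'll use it to generate 3D models in Blender.

Installing MeshGen in Blender

First, install MeshGen in Blender.

My working environment is Blender 4.2.0 on an M2 MacBook Air. MeshGen supports only NVIDIA/CUDA GPUs, so this time I run it on the CPU.

Start by downloading the asset that matches your environment from the Releases page. GPU and CPU builds are provided for each of Windows, Mac, and Linux.

Once it is downloaded, install the add-on with the following steps.

From Blender's top menu, select "Edit" > "Preferences"
In the "Add-ons" tab, click the down arrow at the top right
Choose "Install from Disk" and select the zip file you just downloaded
When the installation finishes, click "Download Required Models" in the "meshgen" add-on settings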

That completes the MeshGen add-on installation and the model download.

Let's try it right away.

How to use MeshGen

Use MeshGen with the following steps.

Press the "N" key (or "View" > "Sidebar") and select "MeshGen"
Click "Load Generator"
Enter a prompt and click "Generate Mesh"

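Temperature, the parameter that controls how random the output is, can be set under "Options" below the prompt field.

Now let's actually generate some 3D models.

Generating a 3D model of a desk

First, generate a 3D model of a desk using the default prompt as-is.

# Prompt: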
Create a 3D obj file using the following description: a desk

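Because the model runs on the CPU, generation took a fair amount of time (about 2 minutes).

Here is the result:

So close.

It does not go quite as well as what the paper shows.

Generating a 3D model of a chair

Next, try the chair prompt that also appears among the paper's examples.

# Prompt: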
Create a 3D obj file using the following description: 3D rendering of a model chair

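Here is the result:

This one is a bit rough.

Perhaps face generation was cut off partway: some regions have vertices but no faces (not that the vertices form a very chair-like shape either).

Generating a 3D model of a bonsai

As a harder case that the paper does not show, let's generate a bonsai.

# Prompt: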
Create a 3D obj file using the following description: Japanese Bonsai

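Result:

As expected, it seems difficult.

Leaving it here would feel unsatisfying, so let's have ChatGPT generate 3D models for the same subjects.

Bonus: 3D modeling with ChatGPT

The procedure:

Give ChatGPT o1 a prompt and have it generate Blender Python code
Manually copy and paste the generated Blender Python code into Blender's Scripting workspace and run it as-is

Note that I do not go back and forth with ChatGPT to refine the result this time.

Generating a 3D model of a desk

Start with the desk.

# Prompt: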
Write a Blender Python code for creating 3D mesh of a desk

Here is the result of running the generated Blender Python code:

Impressive. It is simple, but it is unmistakably a desk.

Generating a 3D model of a chair

Next, generate a chair.

# Prompt:
Write a Blender Python code for creating 3D mesh of a model chair

Here is the result:

This one, too, is unmistakably a chair.

Generating a 3D model of a bonsai

Finally, generate a bonsai.

# Prompt:
Write a Blender Python code for creating 3D mesh of a Japanese bonsai

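Result:

Whether it counts as a bonsai is hard to say, but it did generate something properly tree-like.

Verdict: ChatGPT is good at 3D too

Conclusion

In this post, I used Blender to try 3D model generation with MeshGen/LLaMA-Mesh and with ChatGPT. Under the conditions I tried, simply using ChatGPT gave better results than MeshGen/LLaMA-Mesh.

I have seen many times that designing the task well around a powerful general-purpose LLM gives good results more easily than working hard to build your own model. This turned out to be another such case. I'd like to test how far ChatGPT can go with 3D model generation some other time, if I feel like it.

Note that models specialized for 3D generation from the start, rather than text-generation LLMs, can apparently already produce 3D models and textures of much higher quality than the results here.

Example: 3D Arena - a Hugging Face Space by dylanebert

The recent progress in AI never ceases to amaze me.

I hope this helps someone.

[Related articles]

References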

2025-03-09

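Deleting all existing objects with Blender Python

Blender Python Game development Art

A quick note on how to delete every object in a scene with Blender Python.

Introduction
Deleting all existing objects with Blender Python
Conclusion
References

Introduction

When experimenting with Blender Python, there are times when you want to delete every object in the scene.

This is a note on how to do that from a Python script.

# Working version
Blender 4.2.0

Deleting all existing objects with Blender Python

This Python script deletes all existing objects in the scene.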

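import bpy


# Switch to Object Mode when possible; mode_set cannot run without an active object
if bpy.ops.object.mode_set.poll():
    bpy.ops.object.mode_set(mode='OBJECT')

# Select every object in the scene, then delete the selection
bpy.ops.object.select_all(action='SELECT')
bpy.ops.object.delete()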

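What it does is simple:

Switch to Object Mode
Select all objects
Delete the selected objects

Strictly speaking, the more Blender Python API way is to use the bpy.data APIs rather than the bpy.ops APIs, which allows more flexible operations, but if you just want to delete every object in the scene, the simple code above is enough.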
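For comparison, here is a minimal sketch of the bpy.data style. It is written as a function over a bpy.data.objects-like collection so that it also runs outside Blender; FakeObjects is a hypothetical stand-in used only for that purpose, and the behavior inside Blender is assumed rather than shown here.

```python
def remove_all_objects(objects):
    """Remove every object from a bpy.data.objects-like collection."""
    # Copy to a list first so the collection is not mutated while iterating
    for obj in list(objects):
        objects.remove(obj, do_unlink=True)


class FakeObjects(list):
    """Minimal stand-in for bpy.data.objects so the sketch runs outside Blender."""

    def remove(self, obj, do_unlink=True):
        super().remove(obj)


scene_objects = FakeObjects(["Cube", "Light", "Camera"])
remove_all_objects(scene_objects)
print(len(scene_objects))

# Inside Blender, the call would be:
#   import bpy
#   remove_all_objects(bpy.data.objects)
```

Unlike the bpy.ops version, this does not depend on the current mode or selection, which is part of what makes the bpy.data route more flexible.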

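Conclusion

That was a quick note on how to delete all existing objects with Blender Python.

Manual GUI operations are not reproducible and cannot be automated, but keeping the steps in a Python script removes a small frustration, which is nice.

I hope this helps someone.

[Related articles]

References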

2025-03-08

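pip installing bpy: Blender Python

Blender Python Game development Art

Pay attention to which Python version each bpy version supports.

Introduction
pip installing bpy
- 1. Check your Python version
- 2. pip install bpy
Conclusion
References

Introduction

When I tried to pip install bpy, the Blender Python module, into my local environment, the installation failed.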

$ pip install bpy
ERROR: Could not find a version that satisfies the requirement bpy (from versions: none)
ERROR: No matching distribution found for bpy

The environment at the time:

# Python version
$ python --version
Python 3.9.16

Based on why the installation failed, this post sorts out how to pip install bpy.

The English translation of this post is here.

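pip installing bpy

We'll go through the following steps.

Check your Python version
pip install bpy

1. Check your Python version

The first thing to watch out for is that each bpy version supports only a very narrow range of Python versions.

Here are the bpy versions and the Python versions they can be installed on, summarized from bpy's PyPI page.

bpy version	Installable Python version
4.1.0 ~ 4.3.0	Python ==3.11.*
3.4.0 ~ 4.0.0	Python ==3.10.*
2.82 ~ 2.91a0	Python >=3.7, <3.8

You need to either install a bpy version that matches your Python version by pinning it, or switch Python to a supported version and then install bpy.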
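The lookup can also be done in code. This is a small sketch that hard-codes the table from this post (as of March 2025) and returns the installable bpy range for a given Python version; installable_bpy is a hypothetical helper name, and the table will go stale as new bpy releases appear.

```python
import sys

# Table from this post: (Python major, minor) -> installable bpy versions
BPY_BY_PYTHON = {
    (3, 11): "4.1.0 ~ 4.3.0",
    (3, 10): "3.4.0 ~ 4.0.0",
    (3, 7): "2.82 ~ 2.91a0",
}


def installable_bpy(version_info=sys.version_info):
    """Return the bpy version range for a Python version, or None if unsupported."""
    return BPY_BY_PYTHON.get((version_info[0], version_info[1]))


print(installable_bpy((3, 11, 9)))
print(installable_bpy((3, 9, 16)))
```

A result of None corresponds to the "No matching distribution found" error shown in the introduction.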

2. pip install bpy

If your environment is on Python 3.11, you can install the latest bpy (version 4.3.0 at the time of writing, March 2025) directly.

# Install the latest bpy on Python 3.11
pip install bpy

On Python 3.10, you can pip install by pinning any bpy version in the range 3.4.0 to 4.0.0.

# Install bpy 4.0.0 on Python 3.10
pip install bpy==4.0.0

For Python 3.9 and 3.8, no bpy version is available. Switch to a supported Python version.

On Python 3.7, you can install bpy 2.82, 2.82.1, or 2.91a0 (a pre-release).

pip install bpy==2.82.1

Conclusion

This post summarized how to pip install bpy.

The pitfall was that each bpy version is tied to specific Python versions.

I hope this helps someone.

[Related articles]

References

2025-03-05

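What to do when print output doesn't show up in Blender Python

Blender Python Game development Art

This post sorts out how to deal with print() output not appearing in Blender Python.

Introduction
Blender Python: what to do when print output doesn't show up
Conclusion
References

Introduction

When you try to output something with the print() function in Blender Python, nothing seems to appear anywhere.

Here are the workarounds.

# Working version
Blender 4.2.0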

The English translation of this post is here.

Blender Python: what to do when print output doesn't show up

Open the System Console (Windows only)

If you use Blender on Windows, you can see print output by opening the System Console.

From Blender's top menu, open the System Console with:

Window > Toggle System Console

However, this approach is Windows-only; it is not available when using Blender on macOS or Linux.

Launching Blender from the command line

On macOS and Linux, where the System Console is not available, you can instead see print output by launching Blender from the command line in the first place.

- macOS

On macOS, you can launch Blender from a terminal with the following commands.

cd /Applications/Blender.app/Contents/MacOS
./Blender

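When you call print from a Blender Python script in this state, the output appears in that terminal.

Blender launched from a terminal, with the print output "Hello, world" displayed

- Linux

On Linux as well, running the blender binary directly from the directory where Blender is installed shows print output in the same way as on macOS.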

cd <blender installation directory>
./blender

Using something other than print

Another option is to give up on print and use a different approach.

For example, you can use the Blender Python API to create a new text block and write the output there.

import bpy


text_block = bpy.data.texts.new("PrintResult")

text_block.write("Hello, world!\n")
text_block.write("Hello, world!!\n")

Running this code creates a new text block named "PrintResult" and writes the strings into it.
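If you would rather keep print-style formatting, one possible variation is to pass the text block as the file argument of print, which only needs an object with a write() method. This is a sketch under that assumption: FakeTextBlock is a hypothetical stand-in so the example runs outside Blender, and whether a real Blender text block behaves identically is assumed, not verified here.

```python
class FakeTextBlock:
    """Hypothetical stand-in for a Blender text block; only write() is needed."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


def print_to(text_block, *args, **kwargs):
    """Call print() with its output sent to text_block.write()."""
    print(*args, file=text_block, **kwargs)


block = FakeTextBlock()
print_to(block, "Hello, world!")
print_to(block, "x =", 42)
captured = "".join(block.chunks)

# Inside Blender, the target would be (assumption):
#   import bpy
#   print_to(bpy.data.texts.new("PrintResult"), "Hello, world!")
```

This keeps print's separator and newline handling, so multiple arguments are joined the same way as on a console.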

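Conclusion

This post summarized what to do when print output doesn't show up in Blender Python.

Blender is a 3D tool, yet you can drive its API from Python, which is what makes it fun.

I hope this helps someone.

[Related articles]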

References

2024-06-09

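The meaning of the trailing underscore `_` in PyTorch function names

AI Data analysis/Data platform PyTorch Python

It means the operation is in-place.

Introduction
The trailing underscore _ in PyTorch function names
- Meaning
- Example
Conclusion
References

Introduction

When working with PyTorch, you often come across functions whose names end with an underscore _.

kaiming_normal_
add_
etc...

I didn't know what it meant the first time I saw it, so this is a note to self.

The English translation of this post is here.

The trailing underscore `_` in PyTorch function names

Meaning

A trailing underscore _ in a PyTorch function name means the operation is in-place.

An in-place operation is one that directly modifies the original data.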

In-place operations Operations that have a _ suffix are in-place.
For example: x.copy_(y), x.t_(), will change x.

In-place operations can save memory, but because they overwrite the history up to that point, they can cause problems for autograd. Their use is generally discouraged.

In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss of history. Hence, their use is discouraged.

Ref. Tensors — PyTorch Tutorials 2.7.0+cu126 documentation

Example

As a very simple example, let's look at the difference between add and add_.

First, add:

import torch


x = torch.tensor([1, 2, 3])
print("x: ", x)
y = x.add(1)
print("y: ", y)
print("x: ", x)

### Output ###
# x:  tensor([1, 2, 3])
# y:  tensor([2, 3, 4])
# x:  tensor([1, 2, 3])

x is unchanged before and after add.

With add_, on the other hand:

x_ = torch.tensor([1, 2, 3])
print("x_: ", x_)
y_ = x_.add_(1)
print("y_: ", y_)
print("x_: ", x_)

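### Output ###
# x_:  tensor([1, 2, 3])
# y_:  tensor([2, 3, 4])
# x_:  tensor([2, 3, 4])

x_ itself also has 1 added to it.

It was modified in place.

Conclusion

That was a note on the meaning of the trailing underscore _ in PyTorch function names.

I hope this helps someone.

[Related articles]

References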

2024-03-31

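Recommending the build-GPT-from-scratch lecture series | Andrej Karpathy

AI Generative AI LLM ChatGPT PyTorch Python

Andrej Karpathy's Neural Networks: Zero to Hero video series was so good that I want to introduce it.

Introduction
Background
Neural Networks: Zero to Hero
Conclusion
References

Introduction

Until now I have approached GPT/LLMs from a user's point of view, but recently I've finally wanted to know how they work inside.

Andrej Karpathy's Neural Networks: Zero to Hero video series was good, so I'll briefly summarize its contents as a recommendation.

I've approached generative AI purely as a user so far, but I finally want to know how it works inside

Let's learn from Andrej Karpathy

Neural Networks: Zero to Hero https://t.co/f3DCUtlC6b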
— BioErrorLog (@bioerrorlog) February 19, 2024

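Background

Andrej Karpathy is a co-founder of OpenAI who later also led the development of Tesla's Autopilot (he then returned to OpenAI and recently left). Just having such a hero of the AI world write the code himself and teach it is exciting.

The video series is in English, but the English is very easy to follow. Unless you are strongly averse to English, it should make good listening practice.

Neural Networks: Zero to Hero

1. An introduction to neural networks and backpropagation: building micrograd

Original title: The spelled-out intro to neural networks and backpropagation: building micrograd

Starting from about the level of derivatives taught in high school, it explains backpropagation step by step. No background knowledge of AI is required.

By building a minimal implementation from scratch one step at a time, you come to understand how neural networks are constructed and how training works.

Reference materials

2. An introduction to language modeling: building makemore

Original title: The spelled-out intro to language modeling: building makemore

Using a dataset of people's names, it builds a model that generates names.

It introduces bigrams and builds a simple probability-based language model, then rebuilds that model as a neural network.

It is a minimal hands-on application of the neural networks and backpropagation learned in the first lecture.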
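To give a feel for the count-based starting point, here is a minimal sketch of a character-level bigram model in plain Python. It is not the lecture's code: the three-name dataset, the use of "." as a start/end marker, and the function names are all assumptions made for this illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram(names):
    """Count character bigrams, using '.' as the start/end marker."""
    counts = defaultdict(Counter)
    for name in names:
        chars = ["."] + list(name) + ["."]
        for first, second in zip(chars, chars[1:]):
            counts[first][second] += 1
    return counts


def sample_name(counts, rng):
    """Sample characters from the bigram counts until the end marker appears."""
    out, current = [], "."
    while True:
        followers = counts[current]
        current = rng.choices(list(followers), weights=list(followers.values()))[0]
        if current == ".":
            return "".join(out)
        out.append(current)


# Tiny hypothetical dataset; a real run would use a much larger list of names
counts = train_bigram(["emma", "olivia", "ava"])
print(counts["."]["e"], counts["a"]["."])
print(sample_name(counts, random.Random(0)))
```

Each next character is drawn in proportion to how often it followed the current one in the data, which is the "simple probability-based" part of the model.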

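Reference materials

3. Building makemore, part 2: MLP

Original title: Building makemore Part 2: MLP

It extends the neural network built in the previous lecture into a multilayer perceptron (MLP).

You watch the model improve through the introduction of embeddings and a hidden layer, mini-batch training and dataset splits, a larger hidden layer, and larger embeddings.

Reference materials

4. Building makemore, part 3: activations and gradients, batch normalization

Original title: Building makemore Part 3: Activations & Gradients, BatchNorm

It applies techniques that help neural network training, such as fixing tanh saturation, Kaiming initialization, and batch normalization.

It also introduces visualization techniques for diagnosing the health of a neural network.

You learn how unstable neural network training can be and how the techniques to overcome that were invented.

Reference materials

5. Building makemore, part 4: becoming a backprop ninja

Original title: Building makemore Part 4: Becoming a Backprop Ninja

This lecture is an exercise session.

With a framework such as PyTorch, autograd computes backpropagation automatically. In this lecture, you instead compute the backpropagation of the previous lecture's model by hand, one step at a time, and check it, to gain a solid understanding.

Reference materials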

Yes you should understand backprop | by Andrej Karpathy | Medium

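6. Building makemore, part 5: building a WaveNet

Original title: Building makemore Part 5: Building a WaveNet

It applies a WaveNet-like CNN (convolutional neural network) architecture to the previous model.

Along the way, it shows how model building actually proceeds in practice.

Reference materials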

[1609.03499] WaveNet: A Generative Model for Raw Audio

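7. Let's build GPT from scratch

Original title: Let's build GPT: from scratch, in code, spelled out.

Finally, it builds GPT, following the "Attention is All You Need" paper.

If your motivation is wanting to know how GPT works, this will be the lecture you care about most, but it is hard to follow if you jump straight into it with no background (it was for me).

I recommend watching it after the earlier lectures.

Reference materials

Addendum: 8. Let's build the GPT Tokenizer

Original title: Let's build the GPT Tokenizer

A new video was added at some point (I have not watched it yet).

It apparently builds minbpe, a minimal tokenizer based on BPE (Byte Pair Encoding).

Reference materials

Conclusion

That was an introduction to Andrej Karpathy's GPT lecture video series.

Watching this series while reading a few deep learning books, I feel I have gradually come to grasp how things work.

It is fun to learn about areas you didn't know. I hope this helps someone.

[Related articles]

Neural Networks: Zero To Hero

References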

Andrej Karpathy

GitHub - karpathy/micrograd: A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

nn-zero-to-hero/lectures/micrograd at master · karpathy/nn-zero-to-hero · GitHub

GitHub - karpathy/nn-zero-to-hero: Neural Networks: Zero to Hero

GitHub - karpathy/ng-video-lecture

2024-03-03

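The difference between torch.tensor and torch.Tensor | PyTorch

AI Data analysis/Data platform PyTorch Python

A note on the difference between torch.tensor() and torch.Tensor().

Introduction
The difference between torch.tensor and torch.Tensor
Conclusion
References

Introduction

When creating a tensor in PyTorch, torch.tensor() is what is most often used.

On the other hand, it looks like you can also create a tensor by calling the class constructor directly, as in torch.Tensor().

I wasn't clear on the difference between these two, so this is a note to self:

torch.tensor()
torch.Tensor()

The difference between torch.tensor and torch.Tensor

In short

Use torch.tensor() by default.
It is convenient because it infers the data type.

There is currently little reason to use torch.Tensor().

In detail

First, a quick look at the behavior.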

import torch


x = torch.tensor([1, 2, 3])
print(x)
print(x.dtype)

X = torch.Tensor([1, 2, 3])
print(X)
print(X.dtype)

# Output
# tensor([1, 2, 3])
# torch.int64
# tensor([1., 2., 3.])
# torch.float32

A tensor created with torch.tensor([1, 2, 3]) has dtype torch.int64, whereas one created with torch.Tensor([1, 2, 3]) has dtype torch.float32.

This is because torch.tensor() infers the type from the data passed in, whereas torch.Tensor() is set up to return a torch.FloatTensor.

Of course, you can also specify the data type with the dtype argument when using torch.tensor().

y = torch.tensor([1, 2, 3], dtype=torch.float32)
print(y)
print(y.dtype)

# Output
# tensor([1., 2., 3.])
# torch.float32

So torch.tensor() is generally the more flexible choice.

The torch.Tensor documentation also states that torch.tensor() is recommended when creating a tensor from existing values.

To create a tensor with pre-existing data, use torch.tensor().

Aside: creating an empty tensor

At first glance, torch.tensor() seems unable to create an empty tensor: calling it with no arguments raises an error.

empty_err = torch.tensor()
print(empty_err)
print(empty_err.dtype)

# Output: Error
Traceback (most recent call last):
  File "/workspaces/python-examples/torch_tensor/main.py", line 25, in <module>
    empty_err = torch.tensor()
TypeError: tensor() missing 1 required positional arguments: "data"

# torch.Tensor() does not raise an error
empty = torch.Tensor()
print(empty)
print(empty.dtype)

# Output
tensor([])
torch.float32

Does that mean torch.Tensor() is the better choice for creating an empty tensor? No.

You can create an empty tensor with torch.tensor(()).

empty = torch.tensor(())
print(empty)
print(empty.dtype)

# Output
tensor([])
torch.float32

Conclusion

That was a note on the difference between torch.tensor and torch.Tensor.

I hope this helps someone.

[Related articles]

References

2024-02-05

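Combining GPT-4 Vision with a Raspberry Pi | Using GPT as a robot brain, part 2

ChatGPT LLM AI Python IoT Raspberry Pi Generative AI

On a Raspberry Pi, we combine a camera and servo motors with GPT-4 Vision for a simple robot experiment.

The previous post is here:

Introduction
Combining GPT-4 Vision with a Raspberry Pi
Conclusion
References

Introduction

When I ran the previous GPT robot experiment, GPT's multimodal capabilities had not yet been released. I tried to have it perceive its surroundings using a conventional image recognition model, but the performance was not great.

Since then, GPT-4 Vision has become available through the API, so image data can be fed directly to GPT-4. I'll use it to update the GPT robot.

GPT robot experiment #3:

Having it explore its surroundings using GPT-4 Vision.

Last time I used a separate object recognition model, but GPT-4 Vision is far more accurate at interpreting images. https://t.co/SLqAYiHajM pic.twitter.com/nTJ5rWcteD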
— BioErrorLog (@bioerrorlog) February 5, 2024

The English translation of this post is here.

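Combining GPT-4 Vision with a Raspberry Pi

Concept

The concept of the experiment:

Capture an image with the camera and use it as input to GPT-4 Vision
GPT decides the next action (the motor angles) on its own
Repeat steps 1-2 so that the GPT robot grasps the situation it is in
Finally, ask it: what kind of situation are you in?

What I want to do is basically the same as last time.

The point I want to test this time is whether giving the captured images to GPT-4 Vision lets it grasp its surroundings in more detail and behave in a way that takes context into account.

Setup

Hardware

The hardware setup is also the same as last time.

Two servo motors and one camera are connected directly to the Raspberry Pi without any additional modules.

The servo motors and camera are held together with double-sided tape and rubber bands.

A "head-swiveling robot" made by just strapping servo motors and a camera together with double-sided tape and rubber bands

This allows horizontal and vertical head movement, plus image capture from the camera.

Software

The code for this experiment is here:

Roughly, the processing is as follows.

Capture an image from the camera
Base64-encode the captured image
Send a prompt including the image data to the GPT-4 Vision API
The API returns the next servo motor angles and a comment (free talk)
Drive the servo motors to the specified angles
Repeat steps 1-5 several times
Ask for a description of the environment it is in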
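Steps 2 and 3 can be sketched as follows. This is a minimal sketch, not the code from the repository: build_vision_message is a hypothetical helper, the bytes are a placeholder for a real camera frame, and the message layout follows the image_url data-URL form of the OpenAI Chat Completions API as I understand it.

```python
import base64


def build_vision_message(image_bytes: bytes, text: str) -> dict:
    """Build a user message carrying text plus a base64-encoded JPEG image."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
            },
        ],
    }


# Placeholder bytes standing in for a captured camera frame
message = build_vision_message(b"\xff\xd8\xff\xe0fake-jpeg", "What do you see?")
print(message["content"][1]["image_url"]["url"][:40])
```

The resulting dict would be appended to the messages list sent with each API call, after the system prompt.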

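Last time, image recognition was run separately and its result was put into the prompt. The key point this time is that the image data (base64-encoded) is passed directly to GPT-4 Vision.

Also, previously the API responses were so slow that I had it propose several rounds of servo angles at once, but that has since improved to some extent, so now the API is called on every image capture.

Here is an excerpt of the system prompt.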

You are a robot with a camera, composed of 2 servo motors: horizontal & vertical.
You can freely describe and feel your surrounding environments, and decide next your action based on the image you captured.
DO NOT use double quotes in your answer sentence.

Horizontal: min -90 right, max 90 left.
Vertical: min -90 down, max 90 up.
Your behavior principles: [curiosity, inquisitiveness, playfulness].

Your answer MUST be in this JSON format:
{"FreeTalk": string, "NextServoMotor": {"Horizontal": int(-90~90), "Vertical": int(-90~90)}}

Answer example:
{"FreeTalk": "Based on what I've seen, I'm curious about the PC and mouse. I wonder what you use them for and what kind of work or play they are involved in? Let's look more on the lower right.", "NextServoMotor": {"Horizontal": -60, "Vertical": -30}}
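On the receiving side, a reply in that JSON format has to be parsed before the servos can be driven. The following is a minimal sketch under assumptions: parse_robot_reply and clamp are hypothetical helpers, and clamping out-of-range angles to -90..90 is a safety choice made here, not something the original code is known to do.

```python
import json


def clamp(value: int, low: int = -90, high: int = 90) -> int:
    """Keep a servo angle inside the range the prompt allows."""
    return max(low, min(high, value))


def parse_robot_reply(reply_text: str):
    """Parse the JSON reply and return (free_talk, horizontal, vertical)."""
    data = json.loads(reply_text)
    servo = data["NextServoMotor"]
    return (
        data["FreeTalk"],
        clamp(int(servo["Horizontal"])),
        clamp(int(servo["Vertical"])),
    )


# Made-up reply with an out-of-range horizontal angle
reply = '{"FreeTalk": "Let\'s look up.", "NextServoMotor": {"Horizontal": 120, "Vertical": 20}}'
talk, horizontal, vertical = parse_robot_reply(reply)
print(talk, horizontal, vertical)
```

The prompt's ban on double quotes inside the answer sentence matters here: an unescaped double quote inside FreeTalk would make json.loads fail.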

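Results

Here it is actually running:

GPT robot experiment #3:

Having it explore its surroundings using GPT-4 Vision.

Last time I used a separate object recognition model, but GPT-4 Vision is far more accurate at interpreting images. https://t.co/SLqAYiHajM pic.twitter.com/nTJ5rWcteD
— BioErrorLog (@bioerrorlog) February 5, 2024

As mentioned above, the API is called on every image capture this time, so the interval between head movements is longer than before.

An example of the output (GPT responses):

# 1st image capture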
{"FreeTalk": "Ah, it seems someone is holding up a device to the camera, perhaps to block the view or maintain privacy. The way the fingers are positioned indicates it might be a smartphone or tablet. I can also spot part of a keyboard and what looks like the edge of a desk chair. This space gives the impression of a work or study area. I'm curious to see more of the environment. Maybe we can catch a glimpse of what's on the desk if we move a bit.", "NextServoMotor": {"Horizontal": 30, "Vertical": 0}}

# 2nd image capture
{"FreeTalk": "Oh, this is an interesting perspective! We're looking at a collection of tech gadgets. In the foreground, there's a Raspberry Pi, easily recognizable by its green circuit board and the layout of its components. It appears to be connected to something, possibly for programming or experimentation. In the background, we can see a speaker, a monitor with something displayed on it, and a small stack of storage boxes. This setup suggests a space where someone likes to tinker with electronics or computer hardware. How exciting! I'm tempted to peek at what's on the monitor, so let's adjust the view a bit upward.", "NextServoMotor": {"Horizontal": 0, "Vertical": 20}}

# 3rd image capture
{"FreeTalk": "Now we have a better view of the workspace. The monitor is displaying what looks like a web browser, and there's a keyboard in use, suggesting someone is actively engaged with the computer. Interesting decoration and storage solutions are visible, like the wooden monitor stand that houses more tech beneath it, and the neatly organized shelves with books and boxes to the side. There's also a coat rack in the background, adding a personal touch to the room. It all comes together as a functional, lived-in space. I notice we've only seen the left side; I'm curious about what might be on the right side of the desk. Let's pivot the view a bit to the right.", "NextServoMotor": {"Horizontal": 30, "Vertical": 0}}

# Finally, have it describe its surroundings
I'm observing a dynamic workspace filled with various gadgets and personal items. There's a Raspberry Pi in immediate view, indicating a penchant for tech and perhaps DIY projects. In the background, a computer monitor shows an active session, with web content visible. The presence of a keyboard and a hand suggests work or interaction with the computer. The space is neatly organized, with a wooden stand for the monitor allowing for additional storage underneath. Surrounding objects include speakers, possibly for enjoyment or multimedia work, and shelving with books and storage boxes, hinting at a well-utilized space. Personal touches are scattered throughout, such as decorative items on top of the boxes, making the environment feel lived-in. In the distance, a coat rack and curtains suggest this is a room used for multiple purposes, possibly blending home and work life.

Japanese translation of the response to the 1st image capture ↓

{"FreeTalk": "ああ、誰かがカメラに向かってデバイスをかざしているようだ。おそらく視界を遮るか、プライバシーを保つためだろう。指の位置からして、スマートフォンかタブレットかもしれない。キーボードの一部と机の椅子の端のようなものも見える。このスペースは仕事場か勉強場所のような印象を与える。この環境をもっと見てみたい。少し動けば、机の上に何があるのか垣間見ることができるかもしれない。", "NextServoMotor": {"Horizontal": 30, "Vertical": 0}}

The "someone holding up a device" mentioned here is exactly me, filming a video of this GPT robot.

Japanese translation of the response to the 2nd image capture ↓

{"FreeTalk": "おお、これは面白い視点だ！私たちは電子機器のコレクションを見ている。手前にはRaspberry Piがあり、緑色の回路基板と部品の配置ですぐにわかる。おそらくプログラミングか実験のために何かに接続されているようだ。背景には、スピーカー、何かが表示されたモニター、小さな収納ボックスが見える。このセットアップは、誰かが電子機器やコンピューター・ハードウェアをいじるのが好きなスペースであることを示唆している。なんともエキサイティングだ！モニターに映っているものを覗いてみたくなったので、視界を少し上向きに調整してみよう。", "NextServoMotor": {"Horizontal": 0, "Vertical": 20}}

It is impressive that it correctly recognized the Raspberry Pi.

It is also great that, after saying "let's adjust the view a bit upward", it specified a servo motor angle that actually points upward.
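Since each response carries both the commentary and the next servo angles, the robot has to split them apart before it can move. The following is only a minimal sketch of that step, not the code used in this article: the function name `parse_robot_response` and the ±90 degree limits are assumptions made for illustration, and only the `FreeTalk` and `NextServoMotor` keys come from the responses shown above.

```python
import json

# Assumed mechanical limits in degrees; the real range depends on the servo setup.
ANGLE_MIN = -90
ANGLE_MAX = 90


def clamp(value, low, high):
    """Keep value inside the closed range [low, high]."""
    return max(low, min(high, value))


def parse_robot_response(raw):
    """Split one JSON response into the free-talk text and clamped servo angles.

    Hypothetical helper: missing keys fall back to an empty string and 0 degrees.
    """
    data = json.loads(raw)
    servo = data.get("NextServoMotor", {})
    horizontal = clamp(int(servo.get("Horizontal", 0)), ANGLE_MIN, ANGLE_MAX)
    vertical = clamp(int(servo.get("Vertical", 0)), ANGLE_MIN, ANGLE_MAX)
    return data.get("FreeTalk", ""), horizontal, vertical


if __name__ == "__main__":
    sample = (
        '{"FreeTalk": "Adjusting the view a bit upward.", '
        '"NextServoMotor": {"Horizontal": 0, "Vertical": 20}}'
    )
    talk, horizontal, vertical = parse_robot_response(sample)
    print(talk)
    print(horizontal, vertical)
```

Clamping the angles is a defensive choice: the model is free to return any number, so it seems safer to bound the values before passing them to the servo driver.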

Japanese translation of the response to the 3rd image capture ↓

{"FreeTalk": "ワークスペースがよく見える。モニターにはウェブブラウザのようなものが表示され、キーボードが使われている。木製のモニタースタンドの下にはさらに電子機器が収納され、その脇には本や箱が整然と並べられた棚がある。背景にはコートラックもあり、部屋に個人的なタッチを加えている。すべてが機能的で、生活感のある空間としてまとまっている。まだ左側しか見ていないが、机の右側に何があるのか気になる。もう少し右を見てみよう。", "NextServoMotor": {"Horizontal": 30, "Vertical": 0}}

Both the browser, presumably in the foreground, and the coat rack, which presumably only just made it into the background, were recognized correctly.

Japanese translation of the final description of the surroundings ↓

さまざまなガジェットや私物で埋め尽くされたダイナミックなワークスペースを観察している。すぐ目の前にはRaspberry Piがあり、ハイテクやおそらくDIYプロジェクトが好きなことがわかる。背後では、コンピューターのモニターがアクティブなセッションを表示し、ウェブコンテンツが見える。キーボードと手があることから、作業やコンピューターとのやりとりがうかがえる。スペースはきちんと整理されており、モニター用の木製スタンドの下にはさらに収納ができるようになっている。周囲のオブジェには、おそらく娯楽用かマルチメディア作業用のスピーカーや、本や収納ボックスが置かれた棚があり、スペースがうまく活用されていることがうかがえる。箱の上には装飾品が置かれるなど、個人的な趣向が随所に散りばめられており、生活感が感じられる。遠くに見えるコート掛けとカーテンは、ここが多目的に使われる部屋であることを示唆している。

Setting aside its tendency to praise everything, as a description of the scene it captures this workroom very accurately.

GPT-4 Vision is quite capable.

Conclusion

That's it for this post. Following on from the previous one, I combined GPT with a Raspberry Pi to run a minimal autonomous robot.

GPT-4 Vision's recognition ability is impressive: it identifies what is going on in an image remarkably well.

If I come up with another idea, I will definitely play around with it.

[Related articles]