Per-second of GPU time, model by model. Image models are typically a few cents per generation; video and large LLMs cost more. Replicate's site lists per-model rates.

Can it stream LLM tokens?

Yes for models that support streaming. The MCP exposes a streaming mode that emits chunks as they arrive.

Does it auto-pick the right model?

No — you (or the agent) pass the model owner/name. Use the `list_models` tool to discover.

Replicate

Name: Top MCPs — curated MCP directory
Creator: Top MCPs
License: https://creativecommons.org/licenses/by/4.0/

Run any open-source model on Replicate from inside an AI agent.

Score 56(?)ReplicateMITVerified May 31, 2026Top MCPs for AI & Machine Learning →

Install View on GitHub

Quick answer

What it does

Wraps Replicate's prediction API: list models, get model schema, create a prediction, poll status, retrieve output. Supports both synchronous and async modes.

Best for

Image generation
Video generation
Audio processing
Calling long-tail open-source models

Not for

High-volume production
Local-only workflows

Setup recipe

Pick your client, then follow the three steps.

Install

claude_desktop_config.jsonjson

{
  "mcpServers": {
    "replicate": {
      "command": "npx",
      "args": [
        "-y",
        "replicate-mcp"
      ],
      "env": {
        "REPLICATE_API_TOKEN": "${REPLICATE_API_TOKEN}"
      }
    }
  }
}

Paste under mcpServers. Fully quit and reopen Claude after editing.

CLI or .mcp.jsonshell

# export REPLICATE_API_TOKEN=YOUR_API_TOKEN
claude mcp add replicate -- npx -y replicate-mcp

Run from your repo. Commit .mcp.json to share with your team.

.cursor/mcp.jsonjson

{
  "mcpServers": {
    "replicate": {
      "command": "npx",
      "args": [
        "-y",
        "replicate-mcp"
      ],
      "env": {
        "REPLICATE_API_TOKEN": "${REPLICATE_API_TOKEN}"
      }
    }
  }
}

Global path: ~/.cursor/mcp.json. Reload window after editing.

.vscode/mcp.jsonjsonc

{
  "servers": {
    "replicate": {
      "command": "npx",
      "args": [
        "-y",
        "replicate-mcp"
      ],
      "env": {
        "REPLICATE_API_TOKEN": "${REPLICATE_API_TOKEN}"
      }
    }
  }
}

VS Code uses the "servers" key (not "mcpServers").

~/.codeium/windsurf/mcp_config.jsonjson

{
  "mcpServers": {
    "replicate": {
      "command": "npx",
      "args": [
        "-y",
        "replicate-mcp"
      ],
      "env": {
        "REPLICATE_API_TOKEN": "${REPLICATE_API_TOKEN}"
      }
    }
  }
}

Open via Cascade → hammer icon → Configure.

cline_mcp_settings.jsonjson

{
  "mcpServers": {
    "replicate": {
      "command": "npx",
      "args": [
        "-y",
        "replicate-mcp"
      ],
      "env": {
        "REPLICATE_API_TOKEN": "${REPLICATE_API_TOKEN}"
      }
    }
  }
}

Open via the Cline sidebar → MCP Servers → Edit.

~/.continue/config.jsonjson

{
  "experimental": {
    "modelContextProtocolServers": [
      {
        "transport": {
          "type": "stdio",
          "command": "npx",
          "args": [
            "-y",
            "replicate-mcp"
          ],
          "env": {
            "REPLICATE_API_TOKEN": "${REPLICATE_API_TOKEN}"
          }
        }
      }
    ]
  }
}

Continue uses modelContextProtocolServers with a transport block.

~/.codex/config.tomlshell

# ~/.codex/config.toml
[mcp_servers.replicate]
command = "npx"
args = [
  "-y",
  "replicate-mcp",
]
env = { REPLICATE_API_TOKEN = "${REPLICATE_API_TOKEN}" }

Codex uses TOML. Each server is a [mcp_servers.<name>] subtable.

~/.config/zed/settings.jsonjsonc

{
  "context_servers": {
    "replicate": {
      "command": {
        "path": "npx",
        "args": [
          "-y",
          "replicate-mcp"
        ]
      },
      "env": {
        "REPLICATE_API_TOKEN": "${REPLICATE_API_TOKEN}"
      }
    }
  }
}

Zed calls them "context_servers". Settings live-reload on save.

ChatGPT → Apps directorynone

Replicate doesn't ship a hosted HTTPS endpoint today. ChatGPT supports remote MCP servers only — to use this server in ChatGPT you'll need to deploy it to a public HTTPS URL first (e.g. via Cloudflare Workers or Vercel) or wait for an official remote build.

2
Set required secrets
Set REPLICATE_API_TOKEN in your shell environment before launching your MCP client.
3
Try a minimum working prompt
Minimum working prompt pending verification. Try any prompt from the MCP’s README once installed.

Tools & permissions

Tools list pending verification. The server exposes tools over MCP; we haven’t yet parsed its capability manifest into this page. Check the GitHub repo for the authoritative list.

Security & scope

Access scope: Network
Sandbox: All inference runs on Replicate's infra. The API token grants account-level access to your billing and predictions.
Gotchas: Tokens are full-account; restrict use to one MCP and rotate periodically.
Async predictions stay in your account history — clean up sensitive inputs.

Agent prompt pack

— copy into Claude, Cursor, or ChatGPT.

Paste into Claude, Cursor, or ChatGPT. Edit the [brackets] before sending.

Recommend the best MCP servers for [task: e.g. ai & machine learning work] in [client: Claude].

Constraints:
- Prefer tools that are [official | open-source | read-only] — pick what matters for my use case.
- Exclude MCPs that require [e.g. a paid plan, OAuth-only flows, remote-only transport].
- Return at most 3 picks, ranked.

For each pick include:
1. One-sentence rationale.
2. The ready-to-paste install snippet for my client.
3. Any required secrets I need to create before installing.

Cross-check the top-mcps.com listing: https://top-mcps.com/top-mcps-for-ai-machine-learning

Compare Replicate against a real alternative. Swap the second MCP in [brackets] if you want a different match.

Compare Replicate MCP vs [OpenRouter MCP] for the following job: [describe the job, e.g. "let an agent create GitHub issues on bug triage"].

Judge them on:
- Setup time and complexity (what a new user hits first).
- Auth model (none / API key / OAuth 2.1) and credential risk.
- Transport (stdio / Streamable HTTP / SSE) and where the server runs.
- Required secrets and the blast radius if they leak.
- Operational risk in an unattended agent loop.
- Which one is "good enough" for a weekend prototype vs. production.

End with one sentence: which should I pick for my scenario, which is: [my scenario].

References:
- https://top-mcps.com/mcp/replicate
- top-mcps.com listing for OpenRouter

Asks the agent to install and verify. Works inside Claude Code, Cursor Agent, Codex CLI.

Install the Replicate MCP server for my [client: Claude] at the default config path for that client.

Use the exact install snippet published at https://top-mcps.com/mcp/replicate (fetch https://top-mcps.com/mcp/replicate.json for the canonical server.json if you can read URLs).

Before finishing:
1. Create the required secrets (REPLICATE_API_TOKEN) and put them in the appropriate env block — do not hard-code them.
2. Restart or reload the client so it picks up the new server.
3. Verify the server is connected (green / running state) and at least one tool is listed.
4. If anything fails, read the client's MCP logs and report the exact error — do not silently retry.

Confirm when done and list the tools the server now exposes.

Frequently asked questions

What changed

— 2 updates tracked.

May 31, 2026
Refreshed install snippets and fact sheet; verified for 2026.
Feb 1, 2025
Initial directory listing.

More AI & Machine Learning MCPs

Other tools in the same category worth evaluating.

Hugging Face

Search, inspect, and run Hugging Face models and datasets from an agent.

huggingfacemlmodelsinference

3 minLow

OpenRouter

Route one prompt across 300+ LLMs with a single API key.

aillmopenrouterinference

2 minLow

Pinecone

Official

Managed vector database for semantic search and retrieval in AI agents.

vector-dbpineconeretrievalembeddings

5 minMedium

Compared with Replicate

Side-by-side breakdowns for the choices people most often weigh against this MCP.

ReplicatevsHugging Face ReplicatevsOpenRouter

Exploring Top MCPs for AI & Machine Learning? See all AI & Machine Learning MCPs →