n8n-nodes-gemini3-vertex
v0.2.6
n8n community nodes exposing Gemini 3 features for Google Vertex AI
Nodes
- Google Vertex Chat Model (Gemini 3): a Chat Model sub-node for Agents and Chains. Adds the native Gemini 3 thinking level (MINIMAL/LOW/MEDIUM/HIGH), a thinking budget, per-category safety settings, and a streaming toggle.
- Google Vertex Gemini 3: an action node with a Message a Model operation. Adds the native thinking level, thought summaries, per-category safety settings, Google Search grounding, and structured JSON output.
Both Model and Thinking Level are resource locators — pick from a list or switch to Expression / ID mode to supply the value dynamically. Safety settings expose one threshold dropdown per harm category, all at once, so there is no per-row "add item" clicking.
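As a rough illustration of the per-category safety settings, the sketch below converts a flat "one threshold per category" selection into the array form that the Gemini API expects under safetySettings. The function name toSafetySettings and the shape of the selection object are assumptions made for this example, not the node's actual internals.

```typescript
// The four harm categories and thresholds below are standard Gemini API enum values.
type HarmCategory =
  | 'HARM_CATEGORY_HARASSMENT'
  | 'HARM_CATEGORY_HATE_SPEECH'
  | 'HARM_CATEGORY_SEXUALLY_EXPLICIT'
  | 'HARM_CATEGORY_DANGEROUS_CONTENT';

type HarmBlockThreshold =
  | 'BLOCK_NONE'
  | 'BLOCK_ONLY_HIGH'
  | 'BLOCK_MEDIUM_AND_ABOVE'
  | 'BLOCK_LOW_AND_ABOVE';

interface SafetySetting {
  category: HarmCategory;
  threshold: HarmBlockThreshold;
}

// Hypothetical helper: turn the flat per-category selection into the request array.
// Categories left unset are omitted, so the API default applies to them.
function toSafetySettings(
  selection: Partial<Record<HarmCategory, HarmBlockThreshold>>,
): SafetySetting[] {
  const settings: SafetySetting[] = [];
  for (const [category, threshold] of Object.entries(selection)) {
    if (threshold !== undefined) {
      settings.push({ category: category as HarmCategory, threshold });
    }
  }
  return settings;
}
```

Calling toSafetySettings({ HARM_CATEGORY_HARASSMENT: 'BLOCK_ONLY_HIGH' }) yields a one-element array, leaving the other categories at their defaults.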
Installation
In n8n: Settings → Community Nodes → Install and enter
n8n-nodes-gemini3-vertex.
Credentials
Both nodes use the built-in Google API credential — a GCP service account with the Vertex AI API enabled (email, private key, region).
Model selection
The Model field is a resource locator: From List queries the live Vertex
AI model catalogue (ai.models.list, base models, filtered to Gemini) for the
selected project and region, so it never goes stale — or switch to ID to type
a model name directly.
Leave the Model field empty and the node auto-resolves the latest flash
model at run time — it queries the live catalogue and picks the highest
Gemini version of gemini-<version>-flash (excluding flash-lite and non-chat
flash variants). No hardcoded model, nothing to keep updating.
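The selection rule can be sketched roughly as follows. This is an illustration only: pickLatestFlash and its regular expression are hypothetical, and the sketch simplifies by treating any name with a suffix after -flash (such as flash-lite) as excluded. The package's own resolver may apply different rules.

```typescript
// Matches "gemini-<version>-flash", optionally prefixed by a resource path.
// Anchoring at the end of the name rules out flash-lite and other suffixed variants.
const FLASH_PATTERN = /(?:^|\/)gemini-(\d+(?:\.\d+)*)-flash$/;

// Compare dotted versions numerically, so that 2.10 ranks above 2.5.
function compareVersions(a: number[], b: number[]): number {
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Hypothetical helper: return the flash model with the highest version, if any.
function pickLatestFlash(modelNames: string[]): string | undefined {
  let best: { name: string; version: number[] } | undefined;
  for (const name of modelNames) {
    const match = FLASH_PATTERN.exec(name);
    if (!match) continue;
    const [, versionText = '0'] = match;
    const version = versionText.split('.').map(Number);
    if (!best || compareVersions(version, best.version) > 0) {
      best = { name, version };
    }
  }
  return best?.name;
}
```

Given a catalogue listing that contains gemini-2.5-flash, gemini-3-flash, and gemini-3-flash-lite, this sketch selects gemini-3-flash.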
Feature placement
- The sub-node forwards the native thinking level and thinking budget through @langchain/google-vertexai (pinned to 2.1.24, matching the version n8n itself ships, to avoid duplicate-@langchain/core runtime conflicts).
- Thought summaries as an independent toggle live on the action node. The sub-node's LangChain layer couples thought inclusion to the thinking budget rather than exposing a separate switch.
Gotchas with Gemini 3.x
- includeThoughts / thoughtSummary: On Gemini 3.x, Vertex accepts the flag (the model still thinks; you can see the token count in usageMetadata.thoughtsTokenCount), but the response does not include the thought text the way Gemini 1.5 / 2.x did. The action node's thoughtSummary output field will be empty for 3.x models. Use it with 1.5 / 2.x if you need the readable reasoning.
- Thinking-by-default models eating your tokens: Gemini 3.5 (and similar thinking-by-default flash models) will consume your entire maxOutputTokens on reasoning if you don't bound it. If you ask for a short answer with a small token budget and get back empty text, set Thinking Level = Minimal (or give it more tokens). This caught us in the integration tests, and the same trap will catch a workflow.
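The token-budget trap can usually be recognised from the response itself. The sketch below is one possible check, assuming the raw generateContent response shape (candidates, finishReason, usageMetadata.thoughtsTokenCount). The interface and the function name thinkingAteTheBudget are hypothetical and not part of this package.

```typescript
// Minimal subset of a generateContent response: only the fields this check reads.
interface GenerateContentResponseLike {
  candidates?: Array<{
    finishReason?: string;
    content?: { parts?: Array<{ text?: string; thought?: boolean }> };
  }>;
  usageMetadata?: { thoughtsTokenCount?: number };
}

// Hypothetical helper: true when the visible answer is empty, generation stopped
// at the token limit, and some tokens were spent on thinking.
function thinkingAteTheBudget(response: GenerateContentResponseLike): boolean {
  const candidate = response.candidates?.[0];
  const answerText = (candidate?.content?.parts ?? [])
    .filter((part) => !part.thought) // ignore thought parts, keep answer parts
    .map((part) => part.text ?? '')
    .join('')
    .trim();
  const thoughtTokens = response.usageMetadata?.thoughtsTokenCount ?? 0;
  return answerText === '' && candidate?.finishReason === 'MAX_TOKENS' && thoughtTokens > 0;
}
```

When the check returns true, lowering the Thinking Level or raising maxOutputTokens is the likely remedy.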
Caveat: grounding & JSON output
Google Search grounding and forced JSON output conflict with how n8n Agents bind their own tools, so they live on the action node rather than the sub-node. The two features also cannot be combined with each other (a Gemini API restriction); the action node validates this and errors clearly.
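A guard of this kind might look like the sketch below. The option names googleSearchGrounding and jsonOutput and the error wording are assumptions made for the example. The node's real parameter names and message may differ.

```typescript
// Hypothetical option names, for illustration only.
interface ActionNodeOptions {
  googleSearchGrounding?: boolean;
  jsonOutput?: boolean;
}

// Fail fast, before any request is sent, when both features are enabled.
function assertCompatibleOptions(options: ActionNodeOptions): void {
  if (options.googleSearchGrounding && options.jsonOutput) {
    throw new Error(
      'Google Search grounding and structured JSON output cannot be combined. Disable one of them.',
    );
  }
}
```

Validating before the request is sent keeps the failure local to the node and avoids a billable call that the API would reject anyway.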
Development
npm install --ignore-scripts # eslint-plugin-n8n-nodes-base has a pnpm-only preinstall guard
npm run build
npm test # unit tests (mocked, no network)
Requires Node.js 20+.
Live integration tests
npm run test:integration exercises both nodes against the real Vertex AI
API and verifies the parameters the nodes send actually take effect in
Google's response. It needs a GCP service-account key:
export GCP_KEY_FILE=/absolute/path/to/service-account.json
# optional overrides:
export GCP_PROJECT_ID=my-project # defaults to project_id in the key file
export GCP_LOCATION=us-central1 # default
export GEMINI_MODEL=gemini-3.1-pro # default
npm run test:integration
Without GCP_KEY_FILE the suites skip themselves, so a normal npm test never makes network calls. These tests make billable API calls.
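The skip-when-unconfigured behaviour and the defaults listed above can be sketched as a small resolver. The function resolveIntegrationConfig and the injected readProjectId callback are hypothetical. Only the environment variable names and the default values come from this README.

```typescript
interface IntegrationConfig {
  keyFile: string;
  projectId: string;
  location: string;
  model: string;
}

// Hypothetical helper: returns null when GCP_KEY_FILE is unset, which a test
// suite can treat as "skip". Otherwise applies the documented defaults.
// readProjectId stands in for reading project_id from the key file.
function resolveIntegrationConfig(
  env: Record<string, string | undefined>,
  readProjectId: (keyFilePath: string) => string,
): IntegrationConfig | null {
  const keyFile = env['GCP_KEY_FILE'];
  if (!keyFile) return null;
  return {
    keyFile,
    projectId: env['GCP_PROJECT_ID'] || readProjectId(keyFile),
    location: env['GCP_LOCATION'] || 'us-central1',
    model: env['GEMINI_MODEL'] || 'gemini-3.1-pro',
  };
}
```

In a Jest- or Vitest-style suite, a null result could select describe.skip instead of describe, so that the unit-test run never touches the network.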
What is verified against the live response:
| Parameter | Verified via |
| --- | --- |
| thinkingLevel (MINIMAL→HIGH) | usageMetadata.thoughtsTokenCount scales up (action node); usage_metadata.output_token_details.reasoning (sub-node) |
| includeThoughts | a response part with thought: true |
| Google Search grounding | candidates[].groundingMetadata present |
| responseSchema | output parses as schema-shaped JSON |
| maxOutputTokens | finishReason === 'MAX_TOKENS' |
| systemInstruction | model obeys the instruction |
| streaming | generateContentStream / .stream() yields chunks |
| latest-flash resolution | resolveLatestFlash returns a live *-flash model (not flash-lite) |
| safety settings | accepted without error, normal response still produced |
| temperature / topP / topK | accepted without error (Google does not echo these back) |
