[PAID] 🧠 Gemini Extension to interact with the Gemini-pro model from Google

Black_Knight · April 4, 2025, 11:55am

Good job

techxsarthak · April 4, 2025, 11:56am

Thanks, your extensions are also good as always

Black_Knight · April 5, 2025, 2:00pm

New Update

Gemini extension now allows you to create and modify images via instructions using Google's Gemini AI models directly within your App Inventor projects!

Black_Knight:

blocks1108×224 22 KB

GenerateImage

Creates a new image from a text description.

prompt (Text): Description of the desired image.

apiKey (Text): Google API Key.

modelName (Text): Image generation model (e.g., "gemini-1.5-flash"). Check Google docs.

blocks(1)1036×332 34.9 KB

EditImage

Modifies an existing image provided as Base64.

prompt (Text): Instructions for changes.

inputImageBase64 (Text): Image to edit (Base64 string).

inputMimeType (Text): MIME type of input image (e.g., "image/jpeg").

apiKey (Text): Google API Key.

modelName (Text): Image editing model.

blocks(2)1710×332 39.3 KB

EditImageFromPath

Modifies an existing image using its file path.

prompt (Text): Instructions for changes.

inputImagePath (Text): Full path to the image file on device.

apiKey (Text): Google API Key.

modelName (Text): Image editing model.

blocks(3)1240×278 29.2 KB

EditMultipleImagesSimple

Advanced editing/generation using multiple input images (URL/Path/Base64) and text.

prompt (Text): Instructions involving the images.

imageSourceStrings (List): List of image sources (URLs, paths, or Base64 strings).

apiKey (Text): Google API Key.

modelName (Text): Multi-image capable model.

blocks(4)944×224 22 KB

DisplayBase64Image

Helper block to display Base64 image data on an Image component.

base64Data (Text): Base64 image data (from GotImageResponse).

mimeType (Text): Image MIME type (from GotImageResponse).

imageComponent (Component): The Image component to display on.

component_event1240×176 13.7 KB

Event: GotImageResponse

Fires when image generation/editing succeeds.

imageBase64 (Text): Resulting image as Base64. Empty on failure.

mimeType (Text): MIME type of the result (e.g., "image/png").

responseText (Text): Any text from the API (e.g., errors if blocked).

rawApiResponse (Text): Full JSON response (for debugging).

imagePath (Text): Path where the result image was saved in app storage (ASD). Empty on failure.

Examples of generating and editing with Gemini

Screenshot_2025-04-05-15-17-48-771080×9632 570 KB

image576×1280 109 KB

image332×1280 81.9 KB

Black_Knight · April 18, 2025, 10:32pm

New Update: You can now use the Gemini AI model to analyze any video, even from a local path or any Youtube video URL

Black_Knight:

_- visual selection648×588 43.2 KB

blocks (1)2042×440 71.3 KB

StreamGenerateContentFromLocalVideoPath

Parameters:

videoPath (String): The local file path to a video file.

prompt (String): The text prompt related to the video content.

apiKey (String): Your Google AI API Key.

modelName (String): The Gemini model to use.

systemInstructionsValue (String): Optional system instructions.

jsonSchemaString (String): Optional JSON schema for structured output.

Description: Uploads a local video file using the File API, polls until the file is processed ("ACTIVE"), and then starts a streaming request based on the video content and prompt. Optionally includes system instructions and/or requests structured output via a JSON schema. Response chunks arrive via GotGeminiStream. Triggers StreamFinished when done or ErrorOccurred on failure.

blocks2252×386 60.9 KB

StreamGenerateContentFromLocalVideoPathWithInstructions

Parameters:

videoPath (String): The local file path to a video file.

prompt (String): The text prompt related to the video content.

apiKey (String): Your Google AI API Key.

modelName (String): The Gemini model to use.

systemInstructionsValue (String): Optional system instructions.

Description: Similar to StreamGenerateContentFromLocalVideoPath, but only includes the option for system instructions (no structured output schema). Uploads the video, waits for processing, then starts the streaming request. Response chunks arrive via GotGeminiStream. Triggers StreamFinished when done or ErrorOccurred on failure. Uses standard Designer Properties for generation config.

_- visual selection612×612 46.4 KB

blocks1368×278 37.5 KB

StreamGenerateContentFromYouTubeUrl (Overload 1 - Basic)

Parameters:

youtubeUrl (String): Public URL of a YouTube video (including Shorts).

prompt (String): Text prompt relating to the video.

apiKey (String): Your Google AI API Key.

modelName (String): The Gemini model to use.

Description: Starts a streaming analysis request using a YouTube URL and prompt. Uses default generation settings from Designer Properties. Response chunks arrive via GotGeminiStream. Triggers StreamFinished when done or ErrorOccurred on failure.

blocks (1)1504×386 56.6 KB

StreamGenerateStructuredContentFromYouTubeUrl (Overload 2 - Advanced)

Parameters:

youtubeUrl (String): Public URL of a YouTube video (including Shorts).

prompt (String): Text prompt relating to the video.

apiKey (String): Your Google AI API Key.

modelName (String): The Gemini model to use.

systemInstructionsValue (String): Optional system instructions.

jsonSchemaString (String): Optional JSON schema for structured output.

Description: Starts a streaming analysis request using a YouTube URL and prompt. Optionally includes system instructions and/or requests structured output via a JSON schema. Response chunks arrive via GotGeminiStream. Triggers StreamFinished when done or ErrorOccurred on failure.

Black_Knight · April 25, 2025, 1:01pm

Overview

Gemini Extension brings Google’s powerful, multimodal Gemini AI directly into MIT App Inventor, enabling you to build and customize any AI-driven API—text processing, OCR, image analysis, video intelligence, image editing, and more—without incurring extra third-party API fees

Why Gemini Extension?

By embedding Google’s state-of-the-art Gemini model (optimized in Ultra, Pro, and Nano sizes), this extension delivers enterprise-grade AI capabilities right inside your App Inventor projects—no external subscriptions required blog.google. You define your own JSON Schemas to guarantee structured, consistent outputs, eliminating parsing headaches and ensuring your data flows smoothly into your app

Key Features

Structured Outputs & Schemas

JSON-Schema Enforcement: Supply any JSON Schema and Gemini Extension will strictly adhere to it, so your responses are always predictable and ready for use

Text Processing & NLP

Advanced Language Tasks: Summarization, sentiment analysis, intent classification, and intelligent text generation—all powered by Gemini’s deep understanding of context and nuance blog.google.

OCR & Image Analysis

Seamless Text Extraction: Leverage Google Cloud’s OCR API to convert images and documents into machine-readable text, complete with layout and language support Google Cloud.
Free Credit on Signup: New users receive $300 of free Google Cloud credits, making it cost-effective to kickstart large-scale OCR projects Google Cloud.

Video Intelligence

Smart Video Insights: Automatically detect objects, scenes, and explicit content in videos using Google Cloud Video Intelligence API—ideal for content moderation, metadata generation, and more Google Cloud.

Image Editing & Creative APIs

AI-Powered Editing: Perform inpainting, outpainting, background removal, color adjustments, and more, all within your App Inventor blocks—no GPU management needed stablediffusionapi.com.
Competitive Alternatives: While other services like Photoroom API and Phot.AI offer image editing, Gemini Extension integrates directly into your project, removing extra steps and costs Photoroom phot.ai.

Benefits

Zero Extra API Costs: Build any custom AI pipeline without paying monthly fees to external providers—your only investment is the extension itself mit-cml.github.io.
Rapid Development: Install in minutes, configure schemas visually in App Inventor, and start testing—all without backend setup or server maintenance.
Scalable for Any Project: From hobby experiments to enterprise deployments with thousands of users, Gemini Extension handles it all under Google’s robust infrastructure.
Full Customization: Mix and match features—combine OCR, NLP, image analysis, and video intelligence in a single workflow tailored to your app’s needs

OCR Gemini customized API Example:

APK file
Try from here

Get Started Today

Transform the way you build AI features in MIT App Inventor with only 5.99$. Install Gemini Extension now, unlock Google’s Gemini AI, and craft the exact APIs your app demands—cost-efficiently, reliably, and at scale.

binna794 · May 1, 2025, 10:28am

Hello,

I recently purchased your Gemini Extension for MIT App Inventor through PayPal, but I wasn’t redirected to the download link after payment was completed.

Could you please send the extension file (or the access link) directly to my email address at [mail id removed by mod, please do not post personal info, use PMs]

Let me know if you need any payment confirmation or transaction details.

Thank you for your support!

Best regards,

Black_Knight · May 1, 2025, 8:24pm

I am sorry for such a situation,
I have sent the Extension for you please check your email,
I hope this extension will move your app development to the next level.

Thanks,

Black_Knight · May 8, 2025, 4:38am

New blocks added that will enhance the UX

files Upload Manager

Uploads a local file, waits for it to be processed (ACTIVE), " +
"and returns detailed metadata via the 'GotFileMetadata' event. " +
"Also reports progress via 'FileUploadProgress'.

image787×186 8.21 KB

Reports the progress of a file upload (e.g., video, audio, pdf)."

Retrieves the direct download URI for the content of a file identified by its resource name (e.g., 'files/your_file_id'). " +
"Use this URI to download the file content directly (e.g., with Web component).

image understanding

Analyzes multiple images (from URLs/Paths) based on prompt, streaming results. Optionally provide system instructions and/or JSON schema. Results via GotGeminiStream/StreamFinished events.

Analyzes an image from URL based on prompt, streaming results. Optionally provide system instructions and/or a JSON schema for structured output. Results via GotGeminiStream/StreamFinished events.

video understanding

Black_Knight · May 21, 2025, 9:58pm

HUGE Gemini Extension Update! Generate Mind-Blowing Audio!

Get ready! You can now transform text into incredibly realistic speech with the NEW Text-to-Speech (TTS) service just added to your Gemini AI extension, powered by Google!

This isn't just any speech generator; it's INSANELY powerful! Create:

Crystal-clear single voice narrations
Dynamic multi-speaker conversations (perfect for podcasts!)
And so much more!

Black_Knight:

GenerateSingleSpeakerAudio

Parameters:

text_input (String): The text content to be converted into speech. This can include natural language prompts to guide the style, accent, pace, and tone (e.g., "Say cheerfully: Have a wonderful day!").

api_key (String): Your Google AI API Key (used to initialize the client).

model_name (String): The specific Gemini model to use for speech generation (e.g., "gemini-2.5-flash-preview-tts", "gemini-2.5-pro-preview-tts").

voice_name (String): The desired prebuilt voice for the audio output (e.g., 'Kore', 'Puck', 'Zephyr'). A list of available voices can be found in the Gemini API documentation.

output_filename (String): (Optional) The desired filename to save the generated audio (e.g., "output.wav"). The method of saving might vary based on implementation.

Description: Converts a given text input into audio spoken by a single synthesized voice. The API allows for control over the speech style through prompts and selection from a variety of prebuilt voices. The generated audio can then be streamed or saved to a file.

Single audio examles

with style (whispering)

with style (Acting)

without style

image871×255 59.5 KB

GenerateMultiSpeakerAudio

Parameters:

script_input (String): A text script that includes dialogue for multiple speakers. Speaker names should be clearly indicated in the script (e.g., "Joe: Hello! Jane: Hi there!"). This input can also include natural language prompts to guide the style and tone for each speaker (e.g., "Make Speaker1 sound tired and Speaker2 sound excited: Speaker1: ... Speaker2: ...").

api_key (String): Your Google AI API Key (used to initialize the client).

model_name (String): The specific Gemini model to use for speech generation (e.g., "gemini-2.5-flash-preview-tts", "gemini-2.5-pro-preview-tts").

speaker_configurations (List of Objects): A list where each object defines a speaker and their voice. Each object should contain:

speaker_tag (String): The identifier for the speaker as used in the script_input (e.g., "Joe", "Speaker1").

voice_name (String): The desired prebuilt voice for this specific speaker (e.g., 'Kore', 'Puck').

output_filename (String): (Optional) The desired filename to save the generated multi-speaker audio (e.g., "dialogue.wav"). The method of saving might vary based on implementation.

Description: Generates audio from a text script involving up to two distinct speakers. Each speaker can be assigned a unique prebuilt voice. The API supports prompts within the script to control the style, tone, and delivery for each speaker individually. The output can be streamed or saved.

blocks examble

image880×518 38.3 KB

for detailed guide how to use GeminiTTS visit this guide

Black_Knight · June 13, 2025, 10:58pm

Gemini Extension Update: Unlock the Web with URL Context!

We're excited to announce a powerful new feature for the Gemini extension: URL Context . This update allows your apps to give the Gemini model the ability to read and understand the content of web pages you provide directly in your prompt!

What is URL Context?

Imagine you want Gemini to summarize a news article, compare two product pages, or answer questions based on a specific blog post. Before, you would have to copy and paste all the text.

Now, you can simply include the web page links (URLs) in your prompt and enable the new enableUrlContext feature. Gemini will visit those URLs, read the content, and use that information to give you a much more relevant and contextual response.

This opens up amazing new possibilities, such as:

Article Summarization: "Summarize the key points of this article for me: [URL]"
Data Extraction: "Extract all the technical specifications from this product page: [URL]"
Content Comparison: "Compare the pros and cons of the cameras reviewed in [URL1] and [URL2]"
Question Answering: "Based on the information at [URL], what is the main ingredient in their recipe?"

Black_Knight · June 28, 2025, 11:55pm

This is an example of how to use Gemini Vesion API with thermer chat extension

See the video :

The blocks :

Black_Knight · August 15, 2025, 4:22pm

New Update _{_gemini.aix}

Two old blocks of "single block response & streaming response" merged into this single block with new argument of isStreaming: boolean

blocks3338×646 212 KB

Two blocks were added to make it easy and fast to ask the model one single question without continuous chat, unlike the previous function GenerateGeminiContent block that enables you to create continuous chat :

blocks900×268 31.9 KB

blocks (1)1236×320 43 KB

sidrobo · August 18, 2025, 2:47pm

Can you please provide all features in one message

Black_Knight · August 18, 2025, 3:06pm

All Gemini extension features

Text Generation :
- Simple Chat : Provides a basic Ask function for simple text-in, text-out conversations.
- Advanced Generation : A powerful GenerateGeminiContent function that supports both single-turn and multi-turn conversations, system instructions, and optional tools.
- Streaming : Offers streaming versions of all major generation functions (StreamGenerateGeminiContent, StreamGenerateGroundedContent, etc.) that provide the response in real-time chunks.
Image Understanding (Vision) :
- Simple Image Queries : An AskWithImage block to ask questions about a single image.
- Multi-Modal Analysis : The ability to send multiple images, videos, audio files, PDFs, and text in a single prompt for comprehensive analysis.
- Multiple Input Sources : Accepts files from local paths, Base64 encoded strings, content URIs, and public URLs (for PDFs and images).
- YouTube Video Analysis : Can analyze content directly from a public YouTube URL (including Shorts) when provided with a prompt.
Image Generation & Editing :
- Text-to-Image : GenerateImage function to create an image from a text description.
- Image Editing : EditImage and EditImageFromPath functions to modify an existing image based on a text prompt.
- Multi-Image Editing : EditMultipleImagesSimple function to process a prompt against a list of images from various sources (URLs, paths, Base64).
Audio Understanding :
- Can process local audio files (e.g., MP3, WAV) as part of a prompt to be analyzed by the model.
Video Understanding :
- Can process local video files as part of a prompt, allowing the AI to analyze the video's content frame-by-frame.
Text-to-Speech (TTS) :
- Single Speaker : GenerateSingleSpeakerSpeech function to convert text into speech using a specified prebuilt voice.
- Multi-Speaker : GenerateMultiSpeakerSpeech function to create dialogue with multiple distinct voices from a structured script.

Advanced Features & Tools

Structured Output (JSON) :
- Users can provide a JSON Schema to force the model to return its answer in a structured JSON format, making it easy to parse and use data in the app. This is supported by multiple functions.
- Includes a CreateJsonSchema helper block to easily build the required schema.
Google Search Grounding :
- The StreamGenerateGroundedContent function can be enabled to have the model perform a Google search to ground its response in real-world information, providing source links for its claims.
Code Execution :
- The model can be given the ability to generate and execute code (like Python) to solve complex problems, with the results returned in the response.
File API Integration :
- Efficient File Uploads : Includes robust functions to upload large files (like videos) directly to Google's servers. This is highly efficient as the file is processed on the server and referenced by a URI, avoiding the need to send the full file with every request.
- File Management : Provides blocks to get detailed metadata (UploadFileAndGetMetadata) and the direct download link (GetFileContentUri) for uploaded files.
- Reusability : Uploaded files can be reused in multiple API calls by referencing their URI.

Events and Callbacks

The extension is event-driven, providing specific events to handle different outcomes:

General Responses : RespondedToGemini (for single responses), GotGeminiStream (for each piece of a streaming response), and StreamFinished.
Image & Audio Generation : GotImageResponse (returns Base64 image data and a saved file path) and GotSpeechAudio (returns Base64 audio and a saved file path).
File Uploads : FileUploadProgress (provides real-time progress for large uploads) and FileUploadComplete / GotFileMetadata (fires when a file is uploaded and processed, returning its URI and details).
Error Handling : A robust ErrorOccurred event that provides detailed error messages for easier debugging.
API Key Validation : APIKeyValid, APIKeyInvalid, and APIKeyCheckError events to confirm if the provided API key works.
Grounding Sources : GotGroundingInfo event that returns a list of source URLs and titles when using Google Search.

Utility and Helper Functions

File Encoding : Multiple blocks to encode various file types (images, videos, PDFs) into Base64 format.
Path Conversion : A GetFilePathFromURI function to handle file paths provided by components like the Activity Starter or File Picker.
Permission Handling : Blocks to check for and request the necessary storage permissions on Android.
Image Display : A DisplayBase64Image helper to easily display a Base64 string in an Image component.
Model Management : A GetGeminiModelNames function to retrieve a list of all available models for the user's API key.
Favicon Fetcher : A simple utility to get the URL for a website's favicon.

Configuration

Designer Properties : The extension allows setting key parameters directly in the MIT App Inventor designer, including:
- API Key and default Model Name.
- Generation controls: Temperature, Top P, Top K, and Max Output Tokens.
- Safety settings: Category and Threshold for content moderation.

sidrobo · August 19, 2025, 4:20am

Which gemini model needs to be used to access all features

Black_Knight · August 19, 2025, 10:04am

There is no model that can access all features

For General Analysis (Text, Chat, Vision, Audio, Video):
- Use Gemini 1.5 Pro or Gemini 2.5 Pro . This covers most of the extension's features. For a faster alternative, use Gemini 1.5 Flash .
For Generating and Editing Images:
- Use an Imagen model (e.g., imagen-4).
For Generating Speech (Text-to-Speech):
- Use a Gemini TTS model (e.g., gemini-2.5-flash-preview-tts).

Black_Knight · September 24, 2025, 7:23pm

I'm excited to share a major update to the Gemini extension!

We've just added a powerful new feature: Image Editing . To celebrate, we are also introducing our most powerful and a-peeling model yet: the Nano Bananana AKA gemini-2.5-flash-image-preview model!

Now you can perform powerful image edits directly within your App Inventor projects. Take a look:

We are very excited to see what you can create with this new functionality.

Happy Inventing

Black_Knight · November 19, 2025, 9:50am

Gemini 3 pro is here this is the game changer!

https://x.com/Google/status/1990924447402828120?t=1Avhi2kbi6XVDg7SQNkuQA&s=19

Black_Knight · December 11, 2025, 11:23pm

Major Update: Function Calling & Files API Integration!

Hello App Inventors!

We are thrilled to announce a game-changing update for the Gemini extension. This version transforms Gemini from a simple chatbot into a powerful AI Agent capable of controlling your app, while also giving you massive upgrades in file and image handling.

What's New?

Demo :

1. Function Calling: Turn Gemini into an Android Agent

The biggest feature in this update is Function Calling. You can now teach Gemini how to use tools within your app!

Instead of just returning text, Gemini can intelligently decide to trigger events in your app based on the user's conversation.

How it Works (The Tool Loop):

Send Request: You provide a prompt ("Give me the weather in Egypt") and a list of tools your app has.
Tool Needed: Gemini realizes it can't answer directly, so it asks you to run the get_weather tool.
Execution: Your app runs the function (e.g., gets data from a weather API).
Returning Data: You send the result (e.g., "30°C, Sunny") back to Gemini using SendFunctionResponse.
Completion: Gemini uses that data to give a final natural language answer: "The weather in Egypt is 30°C and Sunny."

Example Declaration: Here is how you define a function for Gemini using the functionDeclarations parameter:

[

{

"name": "get_weather",

"description": "this function job to get weather status for specific location",

"parameters": {

"type": "object",

"properties": {

"location": {

"type": "string"

}

},

"required": [

"location"

],

"propertyOrdering": [

"location"

]

}

}

]

Key Blocks:

DeclareFunctions: Define the available tools (like the JSON above).
FunctionCallRequested (Event): Fires when Gemini wants to perform an action.
SendFunctionResponse: Return the action's result back to the model.

2. Files API & Hybrid Image Engine

We've completely overhauled how files and images are handled to eliminate size limits and boost performance.

Hybrid Image Engine

Smart Switching: Small images (< 4MB) are processed instantly. Large images (> 4MB) automatically use the Files API.
No More Limits: Send full-resolution 20MB+ raw photos without crashing your app!

Full Files API Control

Manage your AI's knowledge base dynamically:

UploadFile: Upload PDFs, Audio, Video, or Images to Gemini's cloud storage.
ListUploadedFiles: View what's stored in your project.
AskWithFile / AskWithUploadedFiles: "Read this PDF" or "Watch this video" and answer questions about it.
DeleteFile: Manage your storage quota programmatically.

(Add a screenshot here of the new file management blocks)

Why Update?

Build Agents: Create smart home assistants, personal schedulers, or data analysis bots that actually do things.
Stable & Fast: The new image engine prevents "Payload Too Large" and Out-Of-Memory errors.
Multimodal Power: Analyze huge documents and long videos with ease.

PAID_file

Price: 5.99$ not 7$ for limited period
Purchase: PayPal Link or You can pay HERE using your credit card
In both cases after payment, you'll be redirected to the download URL. Contact me for any help or issues.

Happy Coding!

Black_Knight · January 5, 2026, 8:44am

Hi everyone,
I'm working on v2 of the Gemini Extension, and I want to make sure it covers your specific use cases.
Instead of just asking for features, I want to know: What kind of AI app are you trying to build right now?
Are you building a chatbot assistant?
An educational app for homework help?
A tool to generate marketing text?
If you tell me what you are building, I can add the specific blocks or parameters to make that easier for you. Let me know in the comments!

[PAID] 🧠 Gemini Extension to interact with the Gemini-pro model from Google

New Update

Gemini extension now allows you to create and modify images via instructions using Google's Gemini AI models directly within your App Inventor projects!

New Update: You can now use the Gemini AI model to analyze any video, even from a local path or any Youtube video URL

Overview

Why Gemini Extension?

Key Features

Structured Outputs & Schemas

Text Processing & NLP

OCR & Image Analysis

Video Intelligence

Image Editing & Creative APIs

Benefits

Get Started Today

New blocks added that will enhance the UX

files Upload Manager

image understanding

video understanding

HUGE Gemini Extension Update! Generate Mind-Blowing Audio!

Gemini Extension Update: Unlock the Web with URL Context!

What is URL Context?

This is an example of how to use Gemini Vesion API with thermer chat extension

New Update _gemini.aix

All Gemini extension features

Advanced Features & Tools

Events and Callbacks

Utility and Helper Functions

Configuration

Gemini 3 pro is here this is the game changer!

Major Update: Function Calling & Files API Integration!

What's New?

1. Function Calling: Turn Gemini into an Android Agent

2. Files API & Hybrid Image Engine

Hybrid Image Engine

Full Files API Control

Why Update?

PAID_file

New Update _{_gemini.aix}