The Gemini extension for AI2 lets you interact with Google's Gemini-Pro, Gemini-Pro-Vision, and Gemini 2.0 Flash Thinking models, including the models that Bard is based on, to generate text and to control a stream of text generation.
Features of the Gemini Extension for AI2:
Text Generation with Gemini API: Leverage the power of Google Gemini API for advanced text generation within your AI2 applications. Includes support for various Gemini models, including Gemini-Pro, Gemini-Pro-Vision, and Gemini 2.0 Flash Thinking.
Streaming Text Generation: Experience real-time text generation with streaming capabilities, providing immediate feedback and a more interactive user experience.
Vision Capabilities (Image & Video Support):
- Generate Text with Images: Incorporate images into your prompts to create multimodal AI interactions.
- Video Thumbnail Generation: Extract and utilize video thumbnails for richer content processing.
- PDF Processing from URL: Process and generate content from PDF files directly from web URLs.
- PDF Processing from Local Path: Process and generate content from PDF files stored on the device's local storage.
Audio Processing from Local Path: Process and generate content from audio files stored on the device's local storage.
Gemini 2.0 Flash Thinking Model Support: Access and utilize the faster and more efficient Gemini 2.0 Flash Thinking model for rapid text generation tasks.
Code Execution Support: (Optional) Enable code execution within Gemini API requests for dynamic and interactive responses.
Structured Output with JSON Schema: Define JSON Schemas to ensure structured and predictable output from the Gemini API, ideal for data-driven applications.
File Handling & Encoding:
- Base64 Encoding for Files (Optimized & Standard): Efficiently encode various file types (images, videos, PDFs, general files) to Base64 for API compatibility. Includes optimized fast encoding and standard encoding options.
- File Path & URI Handling: Robustly handle file paths and content URIs to access local files for processing.
- MIME Type Detection: Automatically detect MIME types for files to ensure correct data handling with the Gemini API.
PaLM API Integration (Text Generation): Includes support for the Google PaLM API for text generation, offering flexibility and access to different models.
Model Listing: Fetch and display a list of available Gemini models directly within your AI2 app.
Stream Control (Stop/Open Stream): Provide user control over streaming processes with functions to start, stop, and manage active streams.
Error Handling & Events: Gracefully handle API errors, JSON parsing issues, and file processing errors, providing informative error events for debugging. Includes events for stream completion (StreamFinished) and manual stream stopping (StoppedStream).
Asynchronous Operations: All API interactions and file processing are handled asynchronously to prevent blocking the UI thread and ensure app responsiveness.
Benefits of using the Gemini Extension:
Unlock Advanced AI Capabilities: Easily integrate cutting-edge AI text and multimodal generation into your App Inventor projects without complex coding, now with support for even faster models.
Enhanced User Engagement: Streaming responses and interactive features create more dynamic and engaging user experiences.
Versatile Content Creation: Generate diverse content formats, from text and code to responses based on images, videos, PDFs, and audio, from both web URLs and local storage.
Structured Data Handling: Utilize JSON Schema to create applications that reliably process and generate structured data.
Simplified File Integration (Local & Web): Seamlessly work with local files and web-based files (images, videos, PDFs, audio, etc.) within your AI2 apps for richer AI interactions.
Flexibility with Multiple APIs & Models: Access both Gemini and PaLM APIs, and choose between different Gemini models including the fast Gemini 2.0 Flash Thinking, selecting the best option for your specific needs and performance requirements.
Easy to Use & Extensible: Designed for ease of use within the App Inventor environment, while providing a foundation for future feature expansions.
Here are some specific examples of how the Gemini Extension can be used:
Intelligent Chatbots & Virtual Assistants: Build sophisticated chatbots that understand text, images, and now audio, providing context-aware and engaging conversations.
Content Generation Tools: Create apps that generate articles, social media posts, product descriptions, creative stories, and more, with or without image, PDF, or audio prompts.
Image & Video Analysis Applications: Develop apps that analyze images and videos, extracting information and generating relevant text descriptions or summaries.
Document Processing & Summarization (Local & Web): Build tools to process PDF documents from URLs or local storage, extracting key information and generating summaries or answering questions based on the content.
Audio Analysis & Transcription Apps: Create applications that can process local audio files to generate text transcriptions, summaries, or answer questions based on audio content.
Code Generation & Assistance Tools: Create applications that can generate code snippets or provide coding assistance with optional code execution capabilities.
Data Extraction & Structuring Apps: Develop apps that extract information from unstructured text or multimodal inputs and output structured JSON data according to predefined schemas.
Educational & Creative Apps: Design interactive learning experiences, story generators, and creative tools that leverage the power of AI for enhanced engagement and personalization across various media types.
The potential applications are vast and limited only by your imagination!
Blocks
Explanation
Generating Content
To generate content using Gemini, use the GenerateGeminiContent block. This block takes three arguments:
modelName (String): The name of the Gemini model to use (e.g., "gemini-1.5-flash"); check the docs for available names.
apiKey (String): Your Google API key.
contents: A list of dictionaries, where each dictionary represents a content item. Each content item can have the following keys:
- role: A string representing the role of the content item in the conversation.
- parts: A list of dictionaries, where each dictionary represents a part of the content item. Each part can have the following keys:
  - text: A string representing the text of the part.
So the JSON input for contents will look like this:
[
  {"role": "user", "parts": [{ "text": "Write the first line of a story about a magic backpack."}]},
  {"role": "model", "parts": [{ "text": "In the bustling city of Meadow Brook, lived a young girl named Sophie. She was a bright and curious soul with an imaginative mind."}]},
  {"role": "user", "parts": [{ "text": "Can you set it in a quiet village in 1600s France?"}]}
]
Blocks example:
Old block (for explanation):
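The contents structure above can be sketched outside App Inventor as follows. This is only an illustration in Python: `build_contents` is a hypothetical helper, not part of the extension, and restricting roles to "user" and "model" is an assumption based on the example above.

```python
import json

def build_contents(turns):
    """Convert (role, text) pairs into the contents structure described above."""
    contents = []
    for role, text in turns:
        # Assumption: only the two roles shown in the example are accepted.
        if role not in ("user", "model"):
            raise ValueError(f"unexpected role: {role!r}")
        contents.append({"role": role, "parts": [{"text": text}]})
    return contents

turns = [
    ("user", "Write the first line of a story about a magic backpack."),
    ("model", "In the bustling city of Meadow Brook, lived a young girl named Sophie."),
    ("user", "Can you set it in a quiet village in 1600s France?"),
]
contents = build_contents(turns)
contents_json = json.dumps(contents, indent=2)  # valid JSON, no trailing comma
print(contents_json)
```

Each turn becomes one dictionary with a role and a single text part, which is the same shape you assemble with list and dictionary blocks in AI2.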
The GenerateGeminiContent block generates content using the specified parameters and returns the result in the RespondedToGemini event.
The RespondedToGemini event is triggered with the following parameters:
apiResponse: The raw API response from Gemini.
textParts: A list of strings representing the generated text parts.
role: The role of the generated content.
finishReason: The reason why the generation was finished.
index: The index of the generated content.
safetyRatings: A list of dictionaries representing the safety ratings of the generated content. Each dictionary will have the following keys:
- category: The category of the safety rating.
- probability: The probability of the safety rating.
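To make the event parameters concrete, here is a rough Python sketch of how they could be derived from a raw response. The sample response is hypothetical and its shape (a `candidates` list holding `content`, `finishReason`, `index`, and `safetyRatings`) is an assumption about the Gemini REST reply; `parse_response` is an illustrative name, not the extension's code.

```python
import json

# Hypothetical raw response, shaped like a Gemini generateContent reply (assumed).
raw = json.dumps({
    "candidates": [{
        "content": {"role": "model",
                    "parts": [{"text": "Once upon a time"}, {"text": ", in France."}]},
        "finishReason": "STOP",
        "index": 0,
        "safetyRatings": [{"category": "HARM_CATEGORY_HARASSMENT",
                           "probability": "NEGLIGIBLE"}],
    }]
})

def parse_response(api_response):
    """Pull the RespondedToGemini-style fields out of the first candidate."""
    candidate = json.loads(api_response)["candidates"][0]
    content = candidate.get("content", {})
    return {
        "textParts": [p["text"] for p in content.get("parts", []) if "text" in p],
        "role": content.get("role", ""),
        "finishReason": candidate.get("finishReason", ""),
        "index": candidate.get("index", 0),
        "safetyRatings": candidate.get("safetyRatings", []),
    }

event = parse_response(raw)
print(event["textParts"], event["finishReason"])
```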
Function: StreamGenerateGeminiContent
Description:
Stream generate content using the Google Gemini API with optional Code Execution. Provides a streaming response for text and code, with code execution capability.
Parameters:
contents: A list of dictionaries, where each dictionary represents a content item. Each content item can have the following keys:
- role: A string representing the role of the content item in the conversation.
- parts: A list of dictionaries, where each dictionary represents a part of the content item. Each part can have the following keys:
  - text: A string representing the text of the part.
So the JSON input for contents will look like this:
[
  {"role": "user", "parts": [{ "text": "Write the first line of a story about a magic backpack."}]},
  {"role": "model", "parts": [{ "text": "In the bustling city of Meadow Brook, lived a young girl named Sophie. She was a bright and curious soul with an imaginative mind."}]},
  {"role": "user", "parts": [{ "text": "Can you set it in a quiet village in 1600s France?"}]}
]
apiKey (String): Your Google API key.
modelName (String): The name of the Gemini model to use (e.g., "gemini-1.5-flash"); check the docs for available names.
enableCodeExecution (boolean): Enable code execution capability (true/false).
Blocks example:
Old block (for explanation):
Functionality:
- Asynchronously initiates streaming content generation from the specified Gemini API model.
- Constructs the API request from contents, including optional tools for code execution if enableCodeExecution is true.
- Receives streamed responses via Server-Sent Events (SSE).
- For each chunk, extracts text and/or executable code blocks.
- Calls GotGeminiStream(textValue) on the UI thread with the combined text and formatted code blocks (using Markdown code fences).
- Calls StreamFinished() on the UI thread upon stream completion.
- Calls ErrorOccurred(errorMessage, "Gemini") or ErrorOccurred(errorMessage, "Gemini-JSON") on the UI thread for errors.
Callbacks:
- GotGeminiStream(textValue): Called on the UI thread with each streamed chunk of text and code.
- StreamFinished(): Called on the UI thread when streaming is finished.
- ErrorOccurred(errorMessage, "Gemini"): Called for general errors.
- ErrorOccurred(errorMessage, "Gemini-JSON"): Called for JSON parsing errors during streaming.
Usage Notes:
- For streaming text and code generation with the Gemini API.
- The contents parameter allows for multi-turn conversations and image inputs.
- Set enableCodeExecution to true to let the model potentially generate and execute code.
- GotGeminiStream provides incremental content, including formatted code blocks.
- Use StreamFinished to know when generation is complete.
- Handle the different ErrorOccurred callbacks for debugging.
- Requires an internet connection and a valid API key. The model name must be specified.
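The per-chunk step described above (read an SSE line, extract text and executable code, wrap code in Markdown fences) might look roughly like this in Python. It is a sketch under assumptions: the `data:` line prefix is standard SSE, but the `executableCode` part shape and the `chunk_to_text` helper are illustrative guesses, not the extension's actual implementation.

```python
import json

FENCE = "`" * 3  # Markdown code fence, built here to keep this example's own fence intact

def chunk_to_text(sse_line):
    """Return the display text for one SSE line, or None if it carries no data."""
    if not sse_line.startswith("data:"):
        return None  # blank separator lines and comments are skipped
    payload = json.loads(sse_line[len("data:"):].strip())
    pieces = []
    for candidate in payload.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if "text" in part:
                pieces.append(part["text"])
            elif "executableCode" in part:  # assumed part shape
                code = part["executableCode"]
                lang = code.get("language", "").lower()
                pieces.append(f"{FENCE}{lang}\n{code.get('code', '')}\n{FENCE}")
    return "".join(pieces)

# Hypothetical stream: a text chunk, an SSE separator line, then a code chunk.
stream = [
    'data: {"candidates": [{"content": {"parts": [{"text": "Here is code:\\n"}]}}]}',
    '',
    'data: {"candidates": [{"content": {"parts": [{"executableCode": {"language": "PYTHON", "code": "print(1)"}}]}}]}',
]
received = [t for t in map(chunk_to_text, stream) if t is not None]
print("".join(received))
```

Each non-None result corresponds to one GotGeminiStream call; reaching the end of the stream corresponds to StreamFinished.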
The streamed content is returned in the GotGeminiStream event.
The GotGeminiStream event is triggered with the following parameter:
text: A string representing the generated text.
You can manually stop the stream using the StopStream block. The StoppedStream event is triggered when the stream is stopped.
You can also check whether the stream is currently running using the IsStreaming block.
Function: GenerateGeminiThinkingContent
Description:
Generate content using the Gemini 2.0 Flash Thinking model. Retrieves the full response in one call.
Parameters:
prompt (String): Text prompt for content generation.
apiKey (String): Google Cloud API key.
Function: StreamGenerateGeminiThinkingContent
Description:
Stream generate content using the Gemini 2.0 Flash Thinking model. Retrieves content in chunks for a responsive experience.
Parameters:
prompt (String): Text prompt for content generation.
apiKey (String): Google Cloud API key.
Functionality:
- Asynchronously initiates streaming content generation from the Gemini API.
- Receives content in chunks via Server-Sent Events (SSE).
- Calls GotGeminiStream(textValue) on the UI thread for each chunk.
- Calls StreamFinished() on the UI thread when streaming completes.
- Calls ErrorOccurred(errorMessage, "GeminiThinking") or ErrorOccurred(errorMessage, "GeminiThinking-JSON") on the UI thread if an error occurs.
Function: StreamGenerateContentFromPdfUrl
Description:
Stream generate content from a PDF URL using the Google Gemini API (Streaming). Processes a PDF from a URL, uploads it to the Gemini API, and streams the generated content in chunks.
Parameters:
pdfUrl (String): URL of the PDF file to process.
prompt (String): Text prompt to guide content generation based on the PDF content.
apiKey (String): Google Cloud API key.
modelName (String): Gemini model name (e.g., "gemini-pro-vision").
Function: StreamGenerateGeminiStructuredContent
Description:
Stream generate structured content using the Google Gemini API. The response is streamed in chunks and formatted according to the provided JSON schema.
Parameters:
contents (YailList of YailDictionary): List of content turns, same format as StreamGenerateGeminiContent.
apiKey (String): Google Cloud API key.
modelName (String): Gemini model name (e.g., "gemini-pro").
scheme (String): JSON Schema string defining the desired structure of the API response. This schema can be created using CreateJsonSchema.
Usage Notes:
- Use this function to get structured JSON output from the Gemini API, streamed in chunks.
- Provide a valid JSON Schema string as the scheme parameter to define the desired output structure.
- The Gemini API will attempt to format its response according to the provided schema.
- GotGeminiStream will provide chunks of text that, when combined, should form a valid JSON object matching the schema.
- Use CreateJsonSchema to easily create the scheme parameter.
- Requires an internet connection, a valid API key, and a Gemini model that supports structured output.
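Because the chunks only form valid JSON once combined, it helps to buffer them until the stream finishes and parse the result once. A minimal Python sketch of that idea follows; `assemble_structured` and the sample chunks are hypothetical, and the check covers only required keys, not full schema validation.

```python
import json

def assemble_structured(chunks, required):
    """Join streamed chunks, parse them as JSON, and check the required keys."""
    text = "".join(chunks)
    data = json.loads(text)  # raises json.JSONDecodeError if the stream was cut short
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    return data

# Hypothetical chunks as they might arrive in successive GotGeminiStream events.
chunks = ['{"title": "Magic', ' Backpack", "pa', 'ges": 12}']
result = assemble_structured(chunks, ["title", "pages"])
print(result)
```

In AI2 terms, this means appending each GotGeminiStream text to a variable and decoding the JSON only in StreamFinished.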
Function: CreateJsonSchema
Description:
Create a JSON Schema string for structured output. Builds a JSON Schema based on provided property names, types, descriptions, and required properties.
Parameters:
propertyNames (YailList of String): List of property names for the JSON schema.
propertyTypes (YailList of String): List of property types corresponding to propertyNames (e.g., "string", "number", "array"). Supported types: "string", "number", "array", "boolean", "integer", "object".
propertyDescriptions (YailList of String): List of descriptions for each property. Can be empty strings or null for default descriptions.
requiredProperties (YailList of String): List of property names that are required in the JSON output.
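As a rough illustration of what a schema built from these four lists could look like, here is a Python sketch. The function name, the exact output layout, and the choice to omit empty descriptions (the block is said to use default descriptions instead) are all assumptions; the extension's real output may differ.

```python
import json

SUPPORTED_TYPES = {"string", "number", "array", "boolean", "integer", "object"}

def create_json_schema(names, types, descriptions, required):
    """Build a JSON Schema string from parallel lists of names, types, descriptions."""
    if len(names) != len(types):
        raise ValueError("propertyNames and propertyTypes must have the same length")
    properties = {}
    for i, (name, type_) in enumerate(zip(names, types)):
        if type_ not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported type: {type_!r}")
        prop = {"type": type_}
        desc = descriptions[i] if i < len(descriptions) else None
        if desc:  # assumption: empty or missing descriptions are simply left out
            prop["description"] = desc
        properties[name] = prop
    schema = {"type": "object", "properties": properties, "required": list(required)}
    return json.dumps(schema)

scheme = create_json_schema(
    ["title", "pages"], ["string", "integer"], ["Book title", ""], ["title"]
)
print(scheme)
```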
Function: StreamGenerateContentFromLocalPdfPath
Description:
Stream generate content from a PDF from a local file path using the Google Gemini API (Streaming). Processes a PDF from the device's local storage, uploads it to the Gemini API, and streams the generated content in chunks.
Function: StreamGenerateContentFromLocalAudioPath
Description:
Stream generate content from audio from a local file path using the Google Gemini API (Streaming). Processes an audio file from the device's local storage, uploads it to the Gemini API, and streams the generated content in chunks.
Parameters:
audioPath (String): The absolute file path to the audio file on the device's local storage. The app needs storage permissions to access this path.
prompt (String): Text prompt to guide content generation based on the audio content.
apiKey (String): Google Cloud API key.
modelName (String): Gemini model name (e.g., "gemini-pro-vision", or a model suitable for audio processing if available in the future).
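A local file needs a MIME type before it can be sent along with the prompt. The sketch below shows one way to detect it from the file extension in Python. The `KNOWN_TYPES` table and `detect_mime_type` are hypothetical and say nothing about how the extension itself detects types.

```python
import mimetypes
import os

# Small explicit table (an assumption) so common cases do not depend on the
# system MIME database.
KNOWN_TYPES = {
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg",
    ".pdf": "application/pdf",
}

def detect_mime_type(path):
    """Guess a MIME type from the file extension, with a generic fallback."""
    ext = os.path.splitext(path)[1].lower()
    if ext in KNOWN_TYPES:
        return KNOWN_TYPES[ext]
    guessed, _ = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"

print(detect_mime_type("/storage/emulated/0/Music/clip.MP3"))
```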
Generating Content with Images
To generate content using images, use the StreamGenerateGeminiVisionContent block. This block takes two arguments:
contents: A list of dictionaries, where each dictionary represents a content item. Each content item can have the following keys:
- role: A string representing the role of the content item in the conversation.
- parts: A list of dictionaries, where each dictionary represents a part of the content item. Each part can have the following keys:
  - text: A string representing the text of the part.
  - inlineData: A dictionary representing inline data, such as an image. The inlineData dictionary can have the following keys:
    - mimeType: The MIME type of the inline data.
    - data: The base64-encoded data of the inline data.
So the JSON input for contents will look like this (replace the data value with the Base64 string of your image):
[
  { "parts": [
    { "text": "Describe what the people are doing in this image:\n" },
    { "inlineData": { "mimeType": "image/jpeg", "data": "'$(base64 -w0 image0.jpeg)'" } },
    { "text": " " }
  ] }
]
Blocks example:
Old block (for explanation):
api key: Your Google Cloud API key.
The StreamGenerateGeminiVisionContent block opens a stream of content generation using the specified parameters. The generated content is returned in the GotGeminiStream event.
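For reference, a contents value with one text part and one image part could be assembled like this in Python. `image_part` is a hypothetical helper, and the eight bytes used as image data are just the PNG file signature standing in for a real file.

```python
import base64

def image_part(image_bytes, mime_type):
    """Wrap raw image bytes as an inlineData part with Base64-encoded data."""
    return {"inlineData": {"mimeType": mime_type,
                           "data": base64.b64encode(image_bytes).decode("ascii")}}

fake_image = b"\x89PNG\r\n\x1a\n"  # PNG signature only, a stand-in for real image bytes
contents = [{"role": "user", "parts": [
    {"text": "Describe what the people are doing in this image:\n"},
    image_part(fake_image, "image/png"),
]}]
print(contents[0]["parts"][1]["inlineData"]["mimeType"])
```

The mimeType should match the actual file format, since the API uses it to decode the data field.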
StreamGenerateGeminiFileContentFromBase64
This function sends a streaming request to the Google Gemini API to generate content based on the provided files and text.
Parameters:
apiKey (String): Your Google API key.
modelName (String): The name of the Gemini model to use (e.g., "gemini-1.5-flash"); check the docs for available names.
fileBase64List (YailList): A list of strings containing the Base64 encoded data of the files.
mimeTypeList (YailList): A list of strings containing the MIME types of the files in fileBase64List. The order of MIME types must correspond to the order of files.
additionalText (String): Any additional text to include in the request.
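Since the two lists are matched by position, a length mismatch silently pairs files with the wrong type. The following Python sketch shows the pairing and a guard against that; `build_file_parts` is hypothetical, and putting the text part after the files is an assumption.

```python
def build_file_parts(file_base64_list, mime_type_list, additional_text):
    """Pair each Base64 string with its MIME type and append the optional text."""
    if len(file_base64_list) != len(mime_type_list):
        raise ValueError("fileBase64List and mimeTypeList must have the same length")
    parts = [
        {"inlineData": {"mimeType": mime, "data": data}}
        for data, mime in zip(file_base64_list, mime_type_list)
    ]
    if additional_text:
        parts.append({"text": additional_text})  # assumed position: after the files
    return [{"role": "user", "parts": parts}]

# "AAAA" and "BBBB" are placeholder strings, not real encoded files.
request_contents = build_file_parts(
    ["AAAA", "BBBB"], ["image/png", "application/pdf"], "Compare these files."
)
print(len(request_contents[0]["parts"]))
```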
GetGeminiModelNames
This function retrieves a list of available Gemini model names from the Google Gemini API.
Parameters:
- apiKey (String): Your Google API key.
Events:
- GotGeminiModelNames(List modelNames): This event is triggered when the API request is successful and the list of model names is retrieved. The modelNames parameter contains the list of model names as strings.
- ErrorOccurred(String message, String component): This event is triggered if an error occurs during the API request.
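To show what turning the API reply into a list of names involves, here is a small Python sketch. The sample reply is hypothetical, its `models` / `name` layout is an assumption about the model-listing response, and whether the extension strips the `models/` prefix is not stated in this post.

```python
import json

# Hypothetical reply, shaped like a model-listing response (assumed).
sample = json.dumps({"models": [
    {"name": "models/gemini-pro", "supportedGenerationMethods": ["generateContent"]},
    {"name": "models/embedding-001", "supportedGenerationMethods": ["embedContent"]},
]})

def extract_model_names(api_response):
    """Return model names with any leading 'models/' prefix removed."""
    models = json.loads(api_response).get("models", [])
    return [m["name"].split("/", 1)[-1] for m in models if "name" in m]

model_names = extract_model_names(sample)
print(model_names)
```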
Encoding Images to Base64
The EncodeImageToBase64 block encodes an image file to Base64 without line breaks, equivalent to the -w0 option of the base64 command. This is useful for sending images to the Gemini API.
The EncodeImageToBase64 block takes one argument:
imagePath: The path to the image file.
The EncodeImageToBase64 block returns the base64-encoded image as a string.
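The difference between wrapped and unwrapped Base64 can be seen in a few lines of Python. This sketch works on bytes in memory rather than an image path, so it only illustrates the encoding, not the block itself.

```python
import base64

data = bytes(range(256)) * 2  # 512 bytes standing in for an image file

wrapped = base64.encodebytes(data).decode("ascii")   # newline after every 76 characters
unwrapped = base64.b64encode(data).decode("ascii")   # single line, like `base64 -w0`

print("wrapped has line breaks:", "\n" in wrapped)
print("unwrapped has line breaks:", "\n" in unwrapped)
```

Both forms decode to the same bytes; the single-line form is the one that fits safely inside a JSON string.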
Error Handling
The ErrorOccurred event is triggered if an error occurs while using the Gemini extension. It is triggered with the following parameters:
message: A string describing the error.
component: The name of the component that caused the error.
Examples
Here is an example of how to use the Gemini extension to generate text:
contents = [{"role": "user", "parts": [{"text": "Hello, Gemini!"}]}]
api_key = "YOUR_API_KEY"
GenerateGeminiContent(contents, api_key)
Blocks:
Here is an example of how to use the Gemini extension to generate text in a stream:
contents = [{"role": "user", "parts": [{"text": "Hello, Gemini!"}]}]
api_key = "YOUR_API_KEY"
StreamGenerateGeminiContent(contents, api_key)
Blocks:
Here is an example of how to use the Gemini extension to generate text with images:
contents = [
  {
    "role": "user",
    "parts": [
      {"text": "Here is an image of a cat:"},
      {"inlineData": {"mimeType": "image/jpeg", "data": base64_image}}
    ]
  }
]
api_key = "YOUR_API_KEY"
Blocks:
Here is an example of how to use the Gemini extension to generate text with images in a FreeForm prompt:
You can use this extension to convert the TextBox component to a FreeForm layout:
contents = [
  {
    "parts": [
      { "text": "Describe what the people are doing in this image:\n" },
      { "inlineData": { "mimeType": "image/jpeg", "data": "'$(base64 -w0 image0.jpeg)'" } },
      { "text": "\nand what is the relation between this image and \n" },
      { "inlineData": { "mimeType": "image/webp", "data": "'$(base64 -w0 image1.webp)'" } }
    ]
  }
]
api_key = "YOUR_API_KEY"
Blocks:
Freeform preview example:
PaLM_2 blocks 
PaLM 2
PaLM 2 is a large language model (LLM) developed by Google that can perform various tasks involving natural language understanding and generation, such as reasoning, coding, mathematics, and multilingual translation. It is an improved version of PaLM, which was released in 2022. PaLM 2 is based on three main innovations: compute-optimal scaling, an improved dataset mixture, and an updated model architecture and objective. PaLM 2 is also used in other generative AI tools, such as the PaLM API and Bard.
Applications that use this extension :
videos preview:
Aix_file:
Check the comparison between PAID and FREE file
PAID_file
Price: $5.99
PayPal payment URL: Purchase Gemini.aix. After payment you will be directed to the download URL, so you do not have to contact me to get the extension file; however, you can contact me if you need any help or run into a problem.
FREE_file
Gemini_Mini.aix (11.6 KB)
Have Inquiries?
For any queries regarding the Gemini extension, feel free to reach out via PM.
Note :
You can try Gemini and get your API key from here