[PAID] 🧠 Gemini Extension to interact with the Gemini-pro model from Google

Black_Knight · December 18, 2023, 6:14am

Gemini

The Gemini extension for AI2 allows you to interact with the Google Gemini-Pro, Gemini-Pro-Vision, and Gemini 1.5 Flash models (including models Bard uses) to generate text and control streaming text generation.

Features of the Gemini Extension for AI2:

image804×870 62.6 KB

Gemini API Text Generation: Use Gemini-Pro, Gemini-Pro-Vision, and Gemini 1.5 Flash models.

Image Generation: Create new images from text prompts using supported models.

Image Editing: Modify existing images using text instructions (from path or Base64).

Streaming Text Generation: Real-time, interactive responses.

Vision Support: Generate text from images, video thumbnails, PDFs (URL/Local).

Local Audio Processing: Generate text based on local audio files.

Gemini 1.5 Flash Model Support: Access the faster Flash model.

Code Execution (Optional): Enable code execution in Gemini responses.

Structured JSON Output: Define schemas for predictable results.

File Handling: Optimized Base64 encoding (images, video, PDF, audio, etc.), path/URI handling, MIME type detection.

PaLM API Integration: Basic text generation support for PaLM.

List Available Models: Fetch model names.

Stream Control: Start/Stop streaming functions.

Error Handling: Events for API/file/JSON errors, stream completion/stopping.

Asynchronous Operations: Non-blocking UI for responsiveness.

Benefits:

image949×792 62.4 KB

Integrate cutting-edge AI text and multimodal generation easily into AI2 projects.

Create dynamic user experiences with streaming and interactive features.

Generate diverse content from text, images, videos, PDFs, and audio (local/web).

Build data-driven apps with reliable structured JSON output.

Seamlessly work with local and web files for richer AI interactions.

Choose between Gemini (including Flash) and PaLM APIs/models.

Easy to use within App Inventor, designed for extensibility.

Use Cases:

image804×511 44.2 KB

Chatbots: Build bots understanding text, images, audio.

Content Creation: Generate articles, posts, stories (with text, image, PDF, audio prompts).

Media Analysis: Analyze images/videos, generate descriptions/summaries.

Document Processing: Process PDFs (URL/Local) for summaries/Q&A.

Audio Processing: Transcribe/summarize local audio.

Code Tools: Generate/assist with code (optional execution).

Data Structuring: Extract info into structured JSON via schemas.

Edu/Creative Tools: Interactive learning, story generation.
The potential applications are vast!

Blocks

Explanation

Generating Content (Non-Streaming)

22498×310 27.4 KB

Use the GenerateGeminiContent block.

modelName (String): Gemini model (e.g., "gemini-1.5-flash"). See docs.

apiKey (String): Your Google API key.

contents: List of dictionaries representing conversation turns. Each dict has role (String) and parts (List of dicts with text key).

Blocks example:

Screenshot 2023-12-16 2356054110×622 272 KB

Screenshot 2023-12-16 2356051336×172 19.4 KB

The RespondedToGemini event handles the response.

apiResponse: Raw API response.

textParts: List of generated text strings.

role: Role of the response.

finishReason: Why generation stopped.

index: Content index.

safetyRatings: List of safety rating dictionaries (category, probability).

blocks3426×386 83.4 KB

Function: StreamGenerateGeminiContent

Stream content using the Gemini API, with optional Code Execution.

Parameters:

contents: List of conversation turn dictionaries (same structure as GenerateGeminiContent).

apiKey (String): Your Google API key.

modelName (String): Gemini model (e.g., "gemini-1.5-flash"). See docs.

enableCodeExecution (boolean): Enable code execution (true/false).

image798×700 16.5 KB

image512×512 12.2 KB

Blocks example

Screenshot 2023-12-16 2356054222×622 275 KB

Functionality:
Initiates streaming via Server-Sent Events (SSE). Extracts text/code from chunks. Triggers GotGeminiStream with content (Markdown for code). Triggers StreamFinished on completion. Triggers ErrorOccurred for API or JSON errors.

Callbacks: GotGeminiStream(textValue), StreamFinished(), ErrorOccurred(errorMessage, component).

Usage Notes: For streaming text/code. contents handles multi-turn/images. enableCodeExecution allows interactive code. Handle events for results and errors. Requires internet and valid API key.

GotGeminiStream` event.

Triggered with each chunk of streamed data.

text: String representing the generated text chunk.

Use StopStream to manually stop. StoppedStream event fires when stopped.

Use IsStreaming to check if a stream is active.

blocks1448×164 25.4 KB

Function: GenerateGeminiThinkingContent

Generate content using the Gemini 1.5 Flash model (non-streaming).

prompt (String): Text prompt.

apiKey (String): Google API key.

blocks3480×398 83.1 KB

Function: StreamGenerateGroundedContent

Streams content from Gemini, instructing it to use Google Search for grounding (fact-checking).

prompt (String): Text prompt.

apiKey (String): Google API key.

Event: GotGroundingInfo

Fires (usually near end of stream) with web sources used for grounding.

sourceUris: List of URLs consulted.

sourceTitles: Corresponding list of page titles.

blocks (1)1098×162 18.8 KB

Function: GetFaviconUrl

Constructs a URL for a website's favicon using Google's service.

url (String): The website URL.

blocks1542×164 26.5 KB

Function: StreamGenerateGeminiThinkingContent

Stream content using the Gemini 1.5 Flash model. Retrieves content in chunks.

prompt (String): Text prompt.

apiKey (String): Google API key.

Triggers GotGeminiStream, StreamFinished, and ErrorOccurred.

blocks1566×268 42.5 KB

Function: StreamGenerateContentFromPdfUrl

Stream content based on a PDF file retrieved from a URL.

pdfUrl (String): URL of the PDF.

prompt (String): Text prompt related to the PDF content.

apiKey (String): Google API key.

modelName (String): Gemini model (e.g., "gemini-pro-vision").

blocks3094×1228 310 KB

Function: StreamGenerateGeminiStructuredContent

image309×905 29.2 KB
-----------------------
image296×536 24.9 KB

Stream structured JSON content matching a provided schema.

contents (List): Conversation turns (same format as StreamGenerateGeminiContent).

apiKey (String): Google API key.

modelName (String): Gemini model (e.g., "gemini-pro").

scheme (String): JSON Schema string defining desired output structure (use CreateJsonSchema).

Usage Notes: Get structured JSON streamed. GotGeminiStream provides chunks that form the final JSON. Requires model supporting structured output.

blocks1154×894 87.9 KB

Function: CreateJsonSchema

Builds a JSON Schema string.

propertyNames (List of String): Names of JSON properties.

propertyTypes (List of String): Corresponding types ("string", "number", "array", "boolean", "integer", "object").

propertyDescriptions (List of String): Descriptions for each property.

requiredProperties (List of String): Names of required properties.

blocks2012×322 50.4 KB

Function: StreamGenerateContentFromLocalPdfPath

Stream content based on a PDF file from the device's local storage.

pdfPath (String): Absolute path to the local PDF file.

prompt (String): Text prompt.

apiKey (String): Google API key.

modelName (String): Gemini model.

blocks2044×322 52 KB

Function: StreamGenerateContentFromLocalAudioPath

Stream content based on an audio file from the device's local storage.

audioPath (String): Absolute path to the local audio file. Needs storage permissions.

prompt (String): Text prompt related to the audio.

apiKey (String): Google API key.

modelName (String): Gemini model (e.g., "gemini-pro-vision" or future audio models).

255885316×554 149 KB

Generating Content with Images (Streaming)

Use StreamGenerateGeminiVisionContent.

contents: List of dictionaries. Can include text parts ("text": "...") and image parts ("inlineData": {"mimeType": "image/jpeg", "data": "base64_string..."}).

apiKey: Your Google API key.

Blocks example:

Screenshot 2023-12-16 2356055210×502 225 KB

This block opens a stream; results arrive via the GotGeminiStream event.

42528×424 170 KB

StreamGenerateGeminiFileContentFromBase64

Streams content based on provided Base64 encoded files and text.

apiKey (String): Google API key.

modelName (String): Gemini model. See docs.

fileBase64List (List): List of Base64 encoded file strings.

mimeTypeList (List): Corresponding list of MIME types.

additionalText (String): Additional text prompt.

blocks1108×224 22 KB

GenerateImage

Creates a new image from a text description.

prompt (Text): Description of the desired image.

apiKey (Text): Google API Key.

modelName (Text): Image generation model (e.g., "gemini-1.5-flash"). Check Google docs.

blocks(1)1036×332 34.9 KB

EditImage

Modifies an existing image provided as Base64.

prompt (Text): Instructions for changes.

inputImageBase64 (Text): Image to edit (Base64 string).

inputMimeType (Text): MIME type of input image (e.g., "image/jpeg").

apiKey (Text): Google API Key.

modelName (Text): Image editing model.

blocks(2)1710×332 39.3 KB

EditImageFromPath

Modifies an existing image using its file path.

prompt (Text): Instructions for changes.

inputImagePath (Text): Full path to the image file on device.

apiKey (Text): Google API Key.

modelName (Text): Image editing model.

blocks(3)1240×278 29.2 KB

EditMultipleImagesSimple

Advanced editing/generation using multiple input images (URL/Path/Base64) and text.

prompt (Text): Instructions involving the images.

imageSourceStrings (List): List of image sources (URLs, paths, or Base64 strings).

apiKey (Text): Google API Key.

modelName (Text): Multi-image capable model.

blocks(4)944×224 22 KB

DisplayBase64Image

Helper block to display Base64 image data on an Image component.

base64Data (Text): Base64 image data (from GotImageResponse).

mimeType (Text): Image MIME type (from GotImageResponse).

imageComponent (Component): The Image component to display on.

component_event1240×176 13.7 KB

Event: GotImageResponse

Fires when image generation/editing succeeds.

imageBase64 (Text): Resulting image as Base64. Empty on failure.

mimeType (Text): MIME type of the result (e.g., "image/png").

responseText (Text): Any text from the API (e.g., errors if blocked).

rawApiResponse (Text): Full JSON response (for debugging).

imagePath (Text): Path where the result image was saved in app storage (ASD). Empty on failure.

Examples of generating and editing with Gemini

_- visual selection648×588 43.2 KB

blocks (1)2042×440 71.3 KB

StreamGenerateContentFromLocalVideoPath

Parameters:

videoPath (String): The local file path to a video file.

prompt (String): The text prompt related to the video content.

apiKey (String): Your Google AI API Key.

modelName (String): The Gemini model to use.

systemInstructionsValue (String): Optional system instructions.

jsonSchemaString (String): Optional JSON schema for structured output.

Description: Uploads a local video file using the File API, polls until the file is processed ("ACTIVE"), and then starts a streaming request based on the video content and prompt. Optionally includes system instructions and/or requests structured output via a JSON schema. Response chunks arrive via GotGeminiStream. Triggers StreamFinished when done or ErrorOccurred on failure.

blocks2252×386 60.9 KB

StreamGenerateContentFromLocalVideoPathWithInstructions

Parameters:

videoPath (String): The local file path to a video file.

prompt (String): The text prompt related to the video content.

apiKey (String): Your Google AI API Key.

modelName (String): The Gemini model to use.

systemInstructionsValue (String): Optional system instructions.

Description: Similar to StreamGenerateContentFromLocalVideoPath, but only includes the option for system instructions (no structured output schema). Uploads the video, waits for processing, then starts the streaming request. Response chunks arrive via GotGeminiStream. Triggers StreamFinished when done or ErrorOccurred on failure. Uses standard Designer Properties for generation config.

_- visual selection612×612 46.4 KB

blocks1368×278 37.5 KB

StreamGenerateContentFromYouTubeUrl (Overload 1 - Basic)

Parameters:

youtubeUrl (String): Public URL of a YouTube video (including Shorts).

prompt (String): Text prompt relating to the video.

apiKey (String): Your Google AI API Key.

modelName (String): The Gemini model to use.

Description: Starts a streaming analysis request using a YouTube URL and prompt. Uses default generation settings from Designer Properties. Response chunks arrive via GotGeminiStream. Triggers StreamFinished when done or ErrorOccurred on failure.

blocks (1)1504×386 56.6 KB

StreamGenerateStructuredContentFromYouTubeUrl (Overload 2 - Advanced)

Parameters:

youtubeUrl (String): Public URL of a YouTube video (including Shorts).

prompt (String): Text prompt relating to the video.

apiKey (String): Your Google AI API Key.

modelName (String): The Gemini model to use.

systemInstructionsValue (String): Optional system instructions.

jsonSchemaString (String): Optional JSON schema for structured output.

Description: Starts a streaming analysis request using a YouTube URL and prompt. Optionally includes system instructions and/or requests structured output via a JSON schema. Response chunks arrive via GotGeminiStream. Triggers StreamFinished when done or ErrorOccurred on failure.

GenerateSingleSpeakerAudio
- Parameters:
  - text_input (String): The text content to be converted into speech. This can include natural language prompts to guide the style, accent, pace, and tone (e.g., "Say cheerfully: Have a wonderful day!").
  - api_key (String): Your Google AI API Key (used to initialize the client).
  - model_name (String): The specific Gemini model to use for speech generation (e.g., "gemini-2.5-flash-preview-tts", "gemini-2.5-pro-preview-tts").
  - voice_name (String): The desired prebuilt voice for the audio output (e.g., 'Kore', 'Puck', 'Zephyr'). A list of available voices can be found in the Gemini API documentation.
  - output_filename (String): (Optional) The desired filename to save the generated audio (e.g., "output.wav"). The method of saving might vary based on implementation.
- Description: Converts a given text input into audio spoken by a single synthesized voice. The API allows for control over the speech style through prompts and selection from a variety of prebuilt voices. The generated audio can then be streamed or saved to a file.

Single audio examles

with style (whispering)

with style (Acting)

without style

GenerateMultiSpeakerAudio
- Parameters:
  - script_input (String): A text script that includes dialogue for multiple speakers. Speaker names should be clearly indicated in the script (e.g., "Joe: Hello! Jane: Hi there!"). This input can also include natural language prompts to guide the style and tone for each speaker (e.g., "Make Speaker1 sound tired and Speaker2 sound excited: Speaker1: ... Speaker2: ...").
  - api_key (String): Your Google AI API Key (used to initialize the client).
  - model_name (String): The specific Gemini model to use for speech generation (e.g., "gemini-2.5-flash-preview-tts", "gemini-2.5-pro-preview-tts").
  - speaker_configurations (List of Objects): A list where each object defines a speaker and their voice. Each object should contain:
    - speaker_tag (String): The identifier for the speaker as used in the script_input (e.g., "Joe", "Speaker1").
    - voice_name (String): The desired prebuilt voice for this specific speaker (e.g., 'Kore', 'Puck').
  - output_filename (String): (Optional) The desired filename to save the generated multi-speaker audio (e.g., "dialogue.wav"). The method of saving might vary based on implementation.
- Description: Generates audio from a text script involving up to two distinct speakers. Each speaker can be assigned a unique prebuilt voice. The API supports prompts within the script to control the style, tone, and delivery for each speaker individually. The output can be streamed or saved.

blocks examble

for detailed guide how to use GeminiTTS visit this guide

Multible audio examle

51090×112 6.17 KB

GetGeminiModelNames

Retrieves a list of available Gemini model names.

apiKey (String): Your Google API key.

Events:

6710×172 5.16 KB

GotGeminiModelNames(modelNames as List): Triggered on success with the list of model names.

ErrorOccurred(message, component): Triggered if an error occurs during the API request.

Screenshot 2023-12-16 235605708×102 9.29 KB

Encoding Images to Base64

The EncodeImageToBase64 block encodes an image file path to Base64 (removes line breaks).

imagePath: Path to the image file.

Returns the Base64 string.

Error Handling

The ErrorOccurred event signals errors from the extension.

message: Error description.

component: Name of the component causing the error ("Gemini", "Gemini-JSON", etc.).

Examples

Example: Generate text (non-streaming):

Screenshot 2023-12-16 2356053406×282 62.1 KB

Example: Generate text (streaming):

Screenshot 2023-12-16 2356053516×282 63.4 KB

Example: Generate text with images (streaming):

Screenshot 2023-12-16 2356055210×502 221 KB

Example: Generate text with images in FreeForm Prompt (streaming):
Use TextFormater extension for FreeForm layout.

Screenshot 2023-12-16 2356055210×796 363 KB

Applications that use this extension :

Videos preview:

Aix_file:

Check the comparison between PAID and FREE versions:point_down:

PAID_file

Price: $5.99
Purchase: PayPal Link or You can pay HERE using your credit card
In both cases after payment, you'll be redirected to the download URL. Contact me for any help or issues.

FREE_file

Gemini_Mini.aix (11.6 KB)

Have Inquiries?
For questions about the Gemini extension, contact me via PM on Telegram.

Note :

You can try Gemini and get your API key from Google AI Studio.

Black_Knight · December 27, 2023, 7:01am

PaLM 2 NewBlocks added

Long_Cao · March 26, 2024, 9:36pm

Hi, I just paid for gemini.aix but it sent me chatGPT.aix

Black_Knight · March 27, 2024, 3:15am

@Long_Cao I have sent you the file of Gemini. aix and I thank you for reporting this issue
and for ChatGpt.aix it's yours cause this is my fault not yours

Lima · May 17, 2024, 1:25am

I just made payment via "Gemini.aix".

Lima · May 17, 2024, 1:33am

I paid for Gemini.aix today, but I was not directed to the website to download the file.

Taifun · May 17, 2024, 2:35am

Taifun

Lima · May 17, 2024, 9:35pm

I sent a private message and also tried via chat, but I still couldn't speak to "Cavaleiro_Negro".

AyProductions · May 18, 2024, 6:37am

~~I think "Black_Knight" is the extension author, not the one you said.~~
Oh wait you traslated the author's name, you obviously won't be able to send a message to them! Send it to "Black_Knight" (and do not translate the name please)...

Lima · May 18, 2024, 9:52am

Sorry for this error, the author of the extension is "Black_Knight". Thanks!

AyProductions · May 18, 2024, 9:52am

You're welcome.

Lima · May 18, 2024, 10:06am

I would like help in contacting "Black_Knight". I already sent a private message, tried via chat, but I was unsuccessful. I purchased the "Gemini.aix" extension, but after payment I was not directed to download the file.

Taifun · May 18, 2024, 12:25pm

@Black_Knight also gets messages if you post something here in this thread
just be patient

Taifun

Lima · May 18, 2024, 9:37pm

Thank you very much for your help, I will wait for the answer

Black_Knight · May 30, 2024, 10:25am

Firstly I am very sorry for my very late reply and for this bad situation I was very busy with my military missions at the last of those days and you

You can Contact me again here and I will solve your problem

Lima · May 30, 2024, 11:10am

Send me the URL to download the extension and everything will be resolved.

Lima · May 30, 2024, 12:01pm

Thank you very much for sending me the file. If I have any questions about the application, I will post them here.

Black_Knight · May 30, 2024, 12:18pm

You are welcome man !

Ok you can ask if you want any thing about it

Black_Knight · May 31, 2024, 5:59pm

New update for the Extension to meet the latest updates of the Gemini API .

Here's a summary of the updates made to the Gemini.aix compared to the initial version.

1. Model Selection:

The GenerateGeminiContent, StreamGenerateGeminiContent, and functions now all accept a modelName parameter, allowing the user to specify which Gemini model to use for the request. This provides flexibility in choosing the appropriate model for different tasks.

2. StreamGenerateGeminiFileContentFromBase64 Function:

New Function: A new function called StreamGenerateGeminiFileContentFromBase64 has been added.

Dogs re partially colorblind!1383×512 92.4 KB
Base64 File Input: This function accepts a list of Base64 encoded files (fileBase64List) and a corresponding list of MIME types (mimeTypeList).
Generic File Handling: It handles various file types (not just images) by using the MIME type information.
Streaming Response: It uses streaming to receive the response from the Gemini API and triggers the GotGeminiStream event for each chunk of text received.

3. GetGeminiModelNames Function:

New Function: A new function called GetGeminiModelNames has been added.

51090×112 6.17 KB
Retrieving Model Names: It retrieves a list of available Gemini model names from the API and triggers the GotGeminiModelNames event with the list.

6710×172 5.16 KB

4. GetFilePathFromDataURI Function:

New Function: A new function called GetFilePathFromDataURI has been added.
Data URI to File Path: It converts a Data URI (representing a file) to a local file path. It handles content://, file://, and data:// URI schemes.

5. getMimeType Function:

New Function: A new function called getMimeType has been added.
Get MIME Type: It takes a file path as input and returns the MIME type of the file using Files.probeContentType(path).

6. Code Cleanup and Improvements:

Removed Redundant Parameter: The contents parameter in the StreamGenerateGeminiVisionContentFromPathsAndText function was removed as it became unnecessary after adding separate parameters for images and text.
Error Handling: The code now includes more robust error handling, using try-catch blocks and triggering the ErrorOccurred event when necessary.

Overall, the updated code is more versatile, efficient, and user-friendly:

More Features: It provides functions to retrieve model names, handle various file types, and work with Data URIs.
Flexibility: Users can now choose specific Gemini models and send different file types to the API.
Efficiency: Streaming responses allow for better handling of large data.
Improved Usability: The code is more organized and includes better documentation and error handling.

These updates enhance the functionality and make the extension more useful for a wider range of applications within App Inventor.

Black_Knight · May 31, 2024, 7:03pm

If any one interested of this competition AI event from google