
LostMind AI - Gemini Chat Assistant

A sophisticated Python application integrating Google's Vertex AI Gemini models with multi-modal support, file processing capabilities, and both GUI and CLI interfaces.

The Problem

Users need to access and leverage the capabilities of state-of-the-art AI models across multiple data types (text, images, PDFs) without complex setup or technical expertise.

The Solution

A comprehensive Python application providing direct integration with Google's Vertex AI platform, allowing multi-modal interactions with Gemini models through both GUI and CLI interfaces with robust error handling.

Impact

Enables non-technical users to leverage advanced AI capabilities across multiple data formats, simplifies complex interactions with AI models, and demonstrates clean, maintainable architecture for AI integrations.

Technologies: Python, Google Vertex AI, Gemini API, Tkinter, API Integration, Asynchronous Programming
Status: Completed

LostMind AI - Gemini Chat Assistant

A sophisticated Python application that provides direct access to Google's Vertex AI Gemini models through an intuitive interface, supporting multi-modal interactions and comprehensive file processing capabilities.

The Real Problem

LLM platforms like Vertex AI provide incredible capabilities but present several challenges for typical users:

  • Complexity Barrier: Direct API integration requires technical knowledge many users don't possess
  • Multi-Modal Limitations: Processing different file types typically requires separate tools and workflows
  • Authentication Hurdles: Managing API credentials and authentication can be daunting
  • Lack of Context Persistence: Maintaining conversation history and context is challenging when directly using APIs
  • Limited Error Handling: API responses don't provide user-friendly error messages and recovery options

The Solution: Architecture & Implementation

The Gemini Chat Assistant solves these problems with a clean, modular architecture focused on maintainability and usability:

import logging
import os
import sys
from tkinter import messagebox

import PIL.Image
from google import genai
from google.genai import types

class GeminiChatAssistant:
    def __init__(self, gui_mode=True):
        self.gui_mode = gui_mode
        self.chat_history = []
        self.uploaded_files = []
        self.system_instruction = DEFAULT_INSTRUCTION
        self.selected_model = "gemini-2.0-flash-001"
        self.temperature = 0.7
        self.top_p = 0.95
        
        # Set up logging
        self.logger = logging.getLogger(__name__)
        self.logger.setLevel(logging.INFO)
        handler = logging.StreamHandler()
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler.setFormatter(formatter)
        self.logger.addHandler(handler)
        
        # Initialize the GenAI client
        try:
            self.client = genai.Client(
                vertexai=True,
                project='lostmind-ai-sumit-mon',
                location='us-central1',
            )
            # List available models for selection
            self.available_models = self.list_available_models()
        except Exception as e:
            error_msg = f"Failed to initialize GenAI client: {str(e)}"
            self.logger.error(error_msg)
            if self.gui_mode:
                messagebox.showerror("Error", error_msg)
            else:
                print(f"Error: {error_msg}")
                print("Please ensure you have set up authentication for Vertex AI.")
                sys.exit(1)

The implementation provides several key features:

1. Multi-Modal File Processing

The application supports multiple file types with specialized processing for each:

def upload_file(self, file_path):
    """Process and upload a file to be used in the conversation"""
    if not os.path.exists(file_path):
        error_msg = f"File '{file_path}' not found."
        self.logger.error(error_msg)
        if self.gui_mode:
            messagebox.showerror("Error", error_msg)
        else:
            print(f"Error: {error_msg}")
        return None
    
    # Define size limits based on file type
    file_ext = os.path.splitext(file_path)[1].lower()
    size_limits = {
        '.jpg': 10,  # 10MB
        '.jpeg': 10,
        '.png': 10,
        '.gif': 10,
        '.bmp': 10,
        '.pdf': 10,  # 10MB
        '.txt': 5,   # 5MB
        '.md': 5,
        '.py': 5,
        # ... additional formats ...
    }
    
    # File type-specific processing
    if file_ext in ['.jpg', '.jpeg', '.png', '.gif', '.bmp']:
        image = PIL.Image.open(file_path)
        # Process image...
    elif file_ext in ['.txt', '.md', '.py', '.java', '.js', '.html', '.css', '.json', '.csv']:
        ...  # Process text files...
    elif file_ext == '.pdf':
        ...  # Process PDF files...
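
The size_limits table above implies an enforcement step that the excerpt elides; a plausible check (an assumption about the full implementation, not code from the project) would run right after the table is built:

    # Hypothetical enforcement of the size_limits table (values are MB)
    limit_mb = size_limits.get(file_ext, 5)
    if os.path.getsize(file_path) > limit_mb * 1024 * 1024:
        error_msg = f"File exceeds the {limit_mb}MB limit for '{file_ext}' files."
        self.logger.error(error_msg)
        return None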

2. Robust Error Handling

A key strength is the comprehensive error management throughout the codebase:

try:
    ...  # Process complex operation...
except Exception as e:
    error_msg = f"Failed to process file: {str(e)}"
    self.logger.error(error_msg)  # Log for debugging
    if self.gui_mode:
        messagebox.showerror("Error", error_msg)  # GUI error
    else:
        print(f"Error: {error_msg}")  # CLI error
    return None  # Graceful failure
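
Because this log-then-surface pattern repeats throughout the codebase, it could be factored into a single helper; the sketch below is a hypothetical refactoring, not code from the project:

def _report_error(self, error_msg):
    """Log an error and surface it through whichever interface is active."""
    self.logger.error(error_msg)
    if self.gui_mode:
        messagebox.showerror("Error", error_msg)  # GUI dialog
    else:
        print(f"Error: {error_msg}")  # CLI fallback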

3. Clean Message Processing Pipeline

The message processing follows well-structured steps:

def send_message(self, user_input, include_files=True, use_search=True):
    """Send a message to the model and get the response"""
    try:
        # 1. Add user message to history
        self.chat_history.append({"role": "user", "content": user_input, "is_visible": True})
        
        # 2. Prepare contents list with visible chat history
        contents = []
        for entry in self.chat_history:
            if not entry.get("is_visible", True):
                continue
            
            if entry["role"] == "user":
                parts = []
                
                # 3. Include uploaded files if this is the latest message and include_files is True
                if include_files and entry == self.chat_history[-1]:
                    for file in self.uploaded_files:
                        parts.append(file["part"])
                
                # 4. Add the text content
                parts.append({"text": entry["content"]})
                
                contents.append(types.Content(role="user", parts=parts))
            else:  # AI responses
                contents.append(types.Content(
                    role="model",
                    parts=[{"text": entry["content"]}]
                ))
        
        # 5. Set up generation config with safety settings
        generation_config = types.GenerateContentConfig(
            temperature=self.temperature,
            top_p=self.top_p,
            max_output_tokens=8192,
            response_modalities=["TEXT"],
            safety_settings=[...]
        )
        
        # 6. Add Google Search tool if requested and using Gemini 2 model
        if use_search and "gemini-2" in self.selected_model:
            self.logger.info("Adding Google Search capability to request")
            generation_config.tools = [types.Tool(google_search=types.GoogleSearch())]
        
        # 7. Generate content
        response = self.client.models.generate_content(
            model=self.selected_model,
            contents=contents,
            config=generation_config
        )
        
        # 8. Add AI response to history
        response_text = response.text
        self.chat_history.append({"role": "ai", "content": response_text, "is_visible": True})
        
        return response_text
        
    except Exception as e:
        error_msg = f"Failed to generate response: {str(e)}"
        ...  # Error handling, following the pattern shown above
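
A minimal usage sketch (the file name is a placeholder, and this assumes the client initialized successfully):

assistant = GeminiChatAssistant(gui_mode=False)
assistant.upload_file("report.pdf")  # hypothetical file to attach
reply = assistant.send_message(
    "Summarize the attached PDF.",
    include_files=True,
    use_search=False,
)
print(reply)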

4. Dual User Interface

The application provides both GUI and CLI interfaces with consistent functionality:

class GeminiChatGUI:
    def __init__(self, root):
        self.root = root
        self.root.title("Gemini Chat Assistant")
        self.root.geometry("950x700")
        self.root.minsize(800, 600)
        
        # Initialize the assistant
        self.assistant = GeminiChatAssistant(gui_mode=True)
        
        # Create the UI with settings, chat, and input frames
        
class GeminiChatCLI:
    def __init__(self):
        # Initialize the assistant
        self.assistant = GeminiChatAssistant(gui_mode=False)
        
        print("Welcome to Gemini Chat Assistant (CLI Mode)!")
        self.configure_assistant()
        self.chat_loop()
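
The entry point itself is not shown above; a plausible dispatcher between the two interfaces (the --cli flag is an assumption, not necessarily the project's actual flag) could look like this:

import argparse
import tkinter as tk

def main():
    parser = argparse.ArgumentParser(description="Gemini Chat Assistant")
    parser.add_argument("--cli", action="store_true", help="Run in CLI mode")
    args = parser.parse_args()
    
    if args.cli:
        GeminiChatCLI()  # the CLI constructor runs its own chat loop
    else:
        root = tk.Tk()
        GeminiChatGUI(root)
        root.mainloop()  # hand control to the Tkinter event loop

if __name__ == "__main__":
    main()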

Technical Implementation Challenges

Building this application required overcoming several technical hurdles:

1. Authentication and Configuration Management

The application needed to securely handle API credentials while making setup user-friendly:

# Start with verification
CREDS_FILE="$SCRIPT_DIR/credentials/service-account-key.json"
 
# Offer convenient file selection if credentials not found
if [ ! -f "$CREDS_FILE" ]; then
    # Use Finder to select a file
    echo -e "${YELLOW}Please select your Google Cloud service account key file...${NC}"
    SELECTED_FILE=$(osascript -e 'tell application "Finder" to set selectedFile to POSIX path of (choose file with prompt "Select your Google Cloud service account key file:")')
    
    # Copy the selected file to the credentials directory
    cp "$SELECTED_FILE" "$CREDS_FILE"
fi
 
# Set credentials environment variable
export GOOGLE_APPLICATION_CREDENTIALS="$CREDS_FILE"
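
On the Python side, the same credentials can be verified before any API call via Application Default Credentials; a minimal check (not part of the original source) might be:

import google.auth
from google.auth.exceptions import DefaultCredentialsError

try:
    credentials, project_id = google.auth.default()
    print(f"Authenticated against project: {project_id}")
except DefaultCredentialsError:
    print("No credentials found. Set GOOGLE_APPLICATION_CREDENTIALS first.")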

2. Multi-Modal Content Handling

Processing different file types required distinct approaches for each format:

# For image files
if file_ext in ['.jpg', '.jpeg', '.png', '.gif', '.bmp']:
    image = PIL.Image.open(file_path)
    
    # Create file part with correct format for Vertex AI
    # (normalize '.jpg' to the standard 'image/jpeg' mime type)
    mime_type = "image/jpeg" if file_ext in ('.jpg', '.jpeg') else f"image/{file_ext[1:]}"
    file_part = {"inline_data": {"mime_type": mime_type, "data": self.image_to_base64(image)}}
    
# For text files
elif file_ext in ['.txt', '.md', '.py', '.java', '.js', '.html', '.css', '.json', '.csv']:
    with open(file_path, 'r', encoding='utf-8') as f:
        text_content = f.read()
    
    # Create file part with correct format for Vertex AI
    file_part = {"text": f"FILE CONTENT ({os.path.basename(file_path)}):\n\n{text_content}"}
    
# For PDF files
elif file_ext == '.pdf':
    # Read PDF as binary data
    with open(file_path, 'rb') as f:
        pdf_data = f.read()
    
    # Create file part with correct format for Vertex AI
    file_part = {"inline_data": {"mime_type": "application/pdf", "data": base64.b64encode(pdf_data).decode('utf-8')}}
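
The image_to_base64 helper referenced above is not shown in the excerpt; one plausible implementation (an assumption, re-encoding in the image's original format so the bytes match the mime type derived from the extension) is:

import base64
import io

def image_to_base64(self, image):
    """Serialize a PIL image to a base64 string for an inline_data payload."""
    buffer = io.BytesIO()
    # Re-encode using the image's original format (falling back to PNG)
    image.save(buffer, format=image.format or "PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")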

3. Cloud Storage Integration

The app connects to Google Cloud Storage to handle larger files:

def upload_gcs_file(self, gcs_uri):
    """Upload a file from Google Cloud Storage"""
    if not gcs_uri.startswith("gs://"):
        error_msg = f"Invalid GCS URI: {gcs_uri}. Must start with 'gs://'"
        # Error handling...
        return None
    
    try:
        # Extract filename from GCS URI
        file_name = gcs_uri.split("/")[-1]
        file_ext = os.path.splitext(file_name)[1].lower()
        
        # For text files, download and process them
        if file_ext in ['.txt', '.md', '.py', '.java', '.js', '.html', '.css', '.json', '.csv']:
            from google.cloud import storage
            
            # Parse the GCS URI
            bucket_name = gcs_uri.replace("gs://", "").split("/")[0]
            blob_name = gcs_uri.replace(f"gs://{bucket_name}/", "")
            
            # Initialize storage client and download
            storage_client = storage.Client()
            bucket = storage_client.bucket(bucket_name)
            blob = bucket.blob(blob_name)
            
            with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file_name)[1]) as tmp:
                temp_file = tmp.name
            blob.download_to_filename(temp_file)
            
            # Process downloaded file...
        # Handle other file types...
    except Exception as e:
        error_msg = f"Failed to load file from GCS: {str(e)}"
        # Error handling, following the pattern above...
        return None
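
One detail the excerpt leaves open is cleanup of the temporary file. As a standalone sketch of the download step (an illustration, not the project's code; the caller deletes the returned file when finished):

import os
import tempfile

from google.cloud import storage

def download_gcs_object(gcs_uri: str) -> str:
    """Download a gs:// object to a temp file and return the local path."""
    bucket_name, _, blob_name = gcs_uri.removeprefix("gs://").partition("/")
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    
    suffix = os.path.splitext(blob_name)[1]
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        temp_path = tmp.name
    blob.download_to_filename(temp_path)
    return temp_path  # caller is responsible for os.remove(temp_path)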

4. Model Identification and Selection

The application dynamically identifies available models from the Vertex AI platform:

def list_available_models(self):
    """Get list of available models from Vertex AI"""
    try:
        models = list(self.client.models.list())
        return [model.name for model in models 
                if "gemini" in model.name.lower() and 
                not model.name.endswith("vision") and 
                not model.name.endswith("latest")]
    except Exception as e:
        self.logger.warning(f"Failed to retrieve model list: {str(e)}")
        # Fallback to default models
        return [
            "gemini-1.5-flash-001",
            "gemini-1.5-pro-001",
            "gemini-2.0-flash-001",
            "gemini-2.0-pro-001"
        ]
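
A short usage sketch (hypothetical, showing how the list might feed the model setting):

models = assistant.list_available_models()
for name in models:
    print(f"  - {name}")
assistant.selected_model = models[0]  # or let the user pick in the GUI/CLI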

Project Organization

The project follows a clean organization structure with clear separation of components:

gemini-chat-assistant/
├── gemini_chat_assistant.py   # Main application file
├── run_gemini_chat.sh         # Startup script with environment setup
├── requirements.txt           # Dependencies
└── credentials/               # API credentials storage (gitignored)
    └── service-account-key.json

The code itself follows a clean architecture pattern:

  1. Core Backend Class: GeminiChatAssistant handles all API interaction and business logic
  2. UI Classes: Separate GeminiChatGUI and GeminiChatCLI for interface handling
  3. Utility Functions: Dedicated methods for file processing, error handling, and export

Learning Journey and Technical Growth

Developing this application provided significant learning experiences:

  1. API Integration Skills: Working directly with the Vertex AI API required understanding authentication flows, request structure, and response handling
  2. Multi-Modal Content Processing: Handling various file types required learning format-specific processing techniques
  3. GUI Development: Building a responsive, user-friendly interface with Tkinter involved learning event-driven programming patterns
  4. Error Resilience: Implementing comprehensive error handling with graceful failure modes
  5. Cross-Platform Deployment: Creating platform-specific startup scripts and environment management

Future Development

The project has a clear roadmap for future enhancements:

  1. Streaming Responses: Implementing real-time token streaming for more responsive interactions (see the sketch after this list)
  2. Session Management: Adding session persistence to save conversations between runs
  3. Enhanced File Formats: Adding support for more file types and larger file handling
  4. Custom Model Fine-Tuning: Integration with Vertex AI fine-tuning capabilities
  5. Web Interface: Adding a Flask or FastAPI web interface option
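
For the first roadmap item, the google-genai SDK already exposes a streaming variant of the generation call; a minimal sketch of how it could slot in (project ID and prompt are placeholders):

from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")
config = types.GenerateContentConfig(temperature=0.7, max_output_tokens=8192)

# Print tokens as they arrive instead of waiting for the full response
for chunk in client.models.generate_content_stream(
    model="gemini-2.0-flash-001",
    contents="Explain streaming responses in one paragraph.",
    config=config,
):
    print(chunk.text or "", end="", flush=True)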

Impact & Outcomes

This project demonstrates the ability to build comprehensive AI applications with:

  • Clean Architecture: Proper separation of concerns with clear component boundaries
  • Robust Error Handling: Graceful failure modes and comprehensive logging
  • User-Centric Design: Interface options catering to different user preferences
  • Cloud Integration: Proper integration with Google Cloud services
  • Maintainable Codebase: Well-structured, modular design with clear documentation

The application provides a foundation for building more advanced AI tools that leverage state-of-the-art models while maintaining accessibility for non-technical users.