LostMind AI - Gemini Chat Assistant
A sophisticated Python application that provides direct access to Google's Vertex AI Gemini models through intuitive GUI and CLI interfaces, with support for multi-modal interactions and comprehensive file processing.
The Real Problem
LLM platforms like Vertex AI provide incredible capabilities but present several challenges for typical users:
- Complexity Barrier: Direct API integration requires technical knowledge many users don't possess
- Multi-Modal Limitations: Processing different file types typically requires separate tools and workflows
- Authentication Hurdles: Managing API credentials and authentication can be daunting
- Lack of Context Persistence: Maintaining conversation history and context is challenging when directly using APIs
- Limited Error Handling: API responses don't provide user-friendly error messages and recovery options
The Solution: Architecture & Implementation
The Gemini Chat Assistant solves these problems with a clean, modular architecture focused on maintainability and usability:
    # Imports used across the excerpts in this write-up
    import base64
    import logging
    import os
    import sys
    import tempfile
    from tkinter import messagebox

    import PIL.Image
    from google import genai       # google-genai SDK
    from google.genai import types

    class GeminiChatAssistant:
        def __init__(self, gui_mode=True):
            self.gui_mode = gui_mode
            self.chat_history = []
            self.uploaded_files = []
            self.system_instruction = DEFAULT_INSTRUCTION  # module-level constant defined elsewhere
            self.selected_model = "gemini-2.0-flash-001"
            self.temperature = 0.7
            self.top_p = 0.95

            # Set up logging
            self.logger = logging.getLogger(__name__)
            self.logger.setLevel(logging.INFO)
            handler = logging.StreamHandler()
            formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
            handler.setFormatter(formatter)
            self.logger.addHandler(handler)

            # Initialize the GenAI client
            try:
                self.client = genai.Client(
                    vertexai=True,
                    project='lostmind-ai-sumit-mon',
                    location='us-central1',
                )
                # List available models for selection
                self.available_models = self.list_available_models()
            except Exception as e:
                error_msg = f"Failed to initialize GenAI client: {str(e)}"
                self.logger.error(error_msg)
                if self.gui_mode:
                    messagebox.showerror("Error", error_msg)
                else:
                    print(f"Error: {error_msg}")
                    print("Please ensure you have set up authentication for Vertex AI.")
                sys.exit(1)
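For illustration, the backend class can be exercised directly without either interface; a minimal sketch (the prompt text is hypothetical):

    # Hypothetical usage sketch of the backend class
    assistant = GeminiChatAssistant(gui_mode=False)
    reply = assistant.send_message("Summarise the key points of our discussion", include_files=False)
    print(reply)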
The implementation provides several key features:
1. Multi-Modal File Processing
The application supports multiple file types with specialized processing for each:
    def upload_file(self, file_path):
        """Process and upload a file to be used in the conversation"""
        if not os.path.exists(file_path):
            error_msg = f"File '{file_path}' not found."
            self.logger.error(error_msg)
            if self.gui_mode:
                messagebox.showerror("Error", error_msg)
            else:
                print(f"Error: {error_msg}")
            return None

        # Define size limits (in MB) based on file type
        file_ext = os.path.splitext(file_path)[1].lower()
        size_limits = {
            '.jpg': 10,   # 10MB
            '.jpeg': 10,
            '.png': 10,
            '.gif': 10,
            '.bmp': 10,
            '.pdf': 10,   # 10MB
            '.txt': 5,    # 5MB
            '.md': 5,
            '.py': 5,
            # ... additional formats ...
        }

        # File type-specific processing
        if file_ext in ['.jpg', '.jpeg', '.png', '.gif', '.bmp']:
            image = PIL.Image.open(file_path)
            ...  # Process image...
        elif file_ext in ['.txt', '.md', '.py', '.java', '.js', '.html', '.css', '.json', '.csv']:
            ...  # Process text files...
        elif file_ext == '.pdf':
            ...  # Process PDF files...
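The size_limits table above implies an enforcement step before processing; a sketch of what that check could look like (the exact error wording is an assumption):

    # Sketch: enforce the per-type size limit before processing (assumed logic)
    size_mb = os.path.getsize(file_path) / (1024 * 1024)
    limit_mb = size_limits.get(file_ext)
    if limit_mb is not None and size_mb > limit_mb:
        error_msg = f"File exceeds the {limit_mb}MB limit for {file_ext} files."
        self.logger.error(error_msg)
        return None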
2. Robust Error Handling
A key strength is the comprehensive error management throughout the codebase:
    try:
        ...  # Process complex operation...
    except Exception as e:
        error_msg = f"Failed to process file: {str(e)}"
        self.logger.error(error_msg)                  # Log for debugging
        if self.gui_mode:
            messagebox.showerror("Error", error_msg)  # GUI error
        else:
            print(f"Error: {error_msg}")              # CLI error
        return None                                   # Graceful failure
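Because the log/GUI/CLI branching recurs throughout the class, it could be factored into a single helper; a hedged sketch of that refactoring (the method name _report_error is hypothetical, not the project's code):

    def _report_error(self, error_msg):
        """Hypothetical helper consolidating the repeated error-reporting pattern."""
        self.logger.error(error_msg)
        if self.gui_mode:
            messagebox.showerror("Error", error_msg)
        else:
            print(f"Error: {error_msg}")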
3. Clean Message Processing Pipeline
The message processing follows well-structured steps:
    def send_message(self, user_input, include_files=True, use_search=True):
        """Send a message to the model and get the response"""
        try:
            # 1. Add user message to history
            self.chat_history.append({"role": "user", "content": user_input, "is_visible": True})

            # 2. Prepare contents list with visible chat history
            contents = []
            for entry in self.chat_history:
                if not entry.get("is_visible", True):
                    continue
                if entry["role"] == "user":
                    parts = []
                    # 3. Include uploaded files if this is the latest message and include_files is True
                    if include_files and entry == self.chat_history[-1]:
                        for file in self.uploaded_files:
                            parts.append(file["part"])
                    # 4. Add the text content
                    parts.append({"text": entry["content"]})
                    contents.append(types.Content(role="user", parts=parts))
                else:  # AI responses
                    contents.append(types.Content(
                        role="model",
                        parts=[{"text": entry["content"]}]
                    ))

            # 5. Set up generation config with safety settings
            generation_config = types.GenerateContentConfig(
                temperature=self.temperature,
                top_p=self.top_p,
                max_output_tokens=8192,
                response_modalities=["TEXT"],
                safety_settings=[...]
            )

            # 6. Add Google Search tool if requested and using a Gemini 2 model
            if use_search and "gemini-2" in self.selected_model:
                self.logger.info("Adding Google Search capability to request")
                generation_config.tools = [types.Tool(google_search=types.GoogleSearch())]

            # 7. Generate content
            response = self.client.models.generate_content(
                model=self.selected_model,
                contents=contents,
                config=generation_config
            )

            # 8. Add AI response to history
            response_text = response.text
            self.chat_history.append({"role": "ai", "content": response_text, "is_visible": True})
            return response_text
        except Exception as e:
            # Error handling follows the same pattern shown above
            error_msg = f"Failed to generate response: {str(e)}"
            self.logger.error(error_msg)
            if self.gui_mode:
                messagebox.showerror("Error", error_msg)
            else:
                print(f"Error: {error_msg}")
            return None
4. Dual User Interface
The application provides both GUI and CLI interfaces with consistent functionality:
    class GeminiChatGUI:
        def __init__(self, root):
            self.root = root
            self.root.title("Gemini Chat Assistant")
            self.root.geometry("950x700")
            self.root.minsize(800, 600)

            # Initialize the assistant
            self.assistant = GeminiChatAssistant(gui_mode=True)

            # Create the UI with settings, chat, and input frames...

    class GeminiChatCLI:
        def __init__(self):
            # Initialize the assistant
            self.assistant = GeminiChatAssistant(gui_mode=False)
            print("Welcome to Gemini Chat Assistant (CLI Mode)!")
            self.configure_assistant()
            self.chat_loop()
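A likely entry point dispatches between the two interfaces; a sketch assuming a --cli command-line flag (the flag name is an assumption):

    import argparse
    import tkinter as tk

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="Gemini Chat Assistant")
        parser.add_argument("--cli", action="store_true", help="Run in CLI mode")
        args = parser.parse_args()

        if args.cli:
            GeminiChatCLI()          # the CLI constructor runs its own chat loop
        else:
            root = tk.Tk()
            GeminiChatGUI(root)
            root.mainloop()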
Technical Implementation Challenges
Building this application required overcoming several technical hurdles:
1. Authentication and Configuration Management
The application needed to securely handle API credentials while making setup user-friendly:
    # Assumed preamble for this excerpt: resolve the script's directory and
    # define the colour variables used below (the full script defines these)
    SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
    YELLOW='\033[1;33m'
    NC='\033[0m'

    # Start with verification
    CREDS_FILE="$SCRIPT_DIR/credentials/service-account-key.json"

    # Offer convenient file selection if credentials not found
    if [ ! -f "$CREDS_FILE" ]; then
        # Use Finder (macOS) to select a file
        echo -e "${YELLOW}Please select your Google Cloud service account key file...${NC}"
        SELECTED_FILE=$(osascript -e 'tell application "Finder" to set selectedFile to POSIX path of (choose file with prompt "Select your Google Cloud service account key file:")')
        # Copy the selected file to the credentials directory
        cp "$SELECTED_FILE" "$CREDS_FILE"
    fi

    # Set credentials environment variable
    export GOOGLE_APPLICATION_CREDENTIALS="$CREDS_FILE"
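Once GOOGLE_APPLICATION_CREDENTIALS is set, the Python side can confirm that credentials resolve before constructing the client; a sketch using the standard google-auth library:

    import google.auth
    from google.auth.exceptions import DefaultCredentialsError

    try:
        # Resolves Application Default Credentials from the environment
        credentials, project_id = google.auth.default()
        print(f"Authenticated against project: {project_id}")
    except DefaultCredentialsError:
        print("No credentials found; check GOOGLE_APPLICATION_CREDENTIALS.")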
2. Multi-Modal Content Handling
Processing different file types required distinct approaches for each format:
    # For image files
    if file_ext in ['.jpg', '.jpeg', '.png', '.gif', '.bmp']:
        image = PIL.Image.open(file_path)
        # Normalise the extension to a valid MIME subtype ('.jpg' -> 'jpeg')
        mime_subtype = 'jpeg' if file_ext in ('.jpg', '.jpeg') else file_ext[1:]
        # Create file part with correct format for Vertex AI
        file_part = {"inline_data": {"mime_type": f"image/{mime_subtype}", "data": self.image_to_base64(image)}}

    # For text files
    elif file_ext in ['.txt', '.md', '.py', '.java', '.js', '.html', '.css', '.json', '.csv']:
        with open(file_path, 'r', encoding='utf-8') as f:
            text_content = f.read()
        # Create file part with correct format for Vertex AI
        file_part = {"text": f"FILE CONTENT ({os.path.basename(file_path)}):\n\n{text_content}"}

    # For PDF files
    elif file_ext == '.pdf':
        # Read PDF as binary data
        with open(file_path, 'rb') as f:
            pdf_data = f.read()
        # Create file part with correct format for Vertex AI
        file_part = {"inline_data": {"mime_type": "application/pdf", "data": base64.b64encode(pdf_data).decode('utf-8')}}
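The image_to_base64 helper called above is not shown in this excerpt; a plausible implementation (a sketch, not necessarily the project's exact code):

    import base64
    import io

    def image_to_base64(self, image):
        # Serialise the PIL image into an in-memory buffer, then base64-encode it
        buffer = io.BytesIO()
        image.save(buffer, format=image.format or "PNG")
        return base64.b64encode(buffer.getvalue()).decode("utf-8")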
3. Cloud Storage Integration
The app connects to Google Cloud Storage so that larger files can be referenced directly from buckets:
    def upload_gcs_file(self, gcs_uri):
        """Upload a file from Google Cloud Storage"""
        if not gcs_uri.startswith("gs://"):
            error_msg = f"Invalid GCS URI: {gcs_uri}. Must start with 'gs://'"
            # Error handling...
            return None
        try:
            # Extract filename from GCS URI
            file_name = gcs_uri.split("/")[-1]
            file_ext = os.path.splitext(file_name)[1].lower()

            # For text files, download and process them
            if file_ext in ['.txt', '.md', '.py', '.java', '.js', '.html', '.css', '.json', '.csv']:
                from google.cloud import storage

                # Parse the GCS URI into bucket and blob names
                bucket_name = gcs_uri.replace("gs://", "").split("/")[0]
                blob_name = gcs_uri.replace(f"gs://{bucket_name}/", "")

                # Initialize storage client and download to a temporary file
                storage_client = storage.Client()
                bucket = storage_client.bucket(bucket_name)
                blob = bucket.blob(blob_name)
                with tempfile.NamedTemporaryFile(delete=False, suffix=file_ext) as tmp:
                    temp_file = tmp.name
                blob.download_to_filename(temp_file)
                # Process downloaded file...
            # Handle other file types...
        except Exception as e:
            # Error handling follows the pattern shown earlier
            return None
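Usage is a single call with a bucket URI (the bucket and object names here are hypothetical):

    # Hypothetical example: pull a markdown file from a bucket into the conversation
    assistant.upload_gcs_file("gs://my-example-bucket/notes/summary.md")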
4. Model Identification and Selection
The application dynamically identifies available models from the Vertex AI platform:
    def list_available_models(self):
        """Get list of available models from Vertex AI"""
        try:
            models = list(self.client.models.list())
            return [model.name for model in models
                    if "gemini" in model.name.lower() and
                    not model.name.endswith("vision") and
                    not model.name.endswith("latest")]
        except Exception as e:
            self.logger.warning(f"Failed to retrieve model list: {str(e)}")
            # Fall back to a list of known default models
            return [
                "gemini-1.5-flash-001",
                "gemini-1.5-pro-001",
                "gemini-2.0-flash-001",
                "gemini-2.0-pro-001"
            ]
Project Organization
The project follows a clean organization structure with clear separation of components:
gemini-chat-assistant/
├── gemini_chat_assistant.py # Main application file
├── run_gemini_chat.sh # Startup script with environment setup
├── requirements.txt # Dependencies
└── credentials/ # API credentials storage (gitignored)
└── service-account-key.json
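Based on the imports that appear throughout the code, requirements.txt would contain roughly the following (the exact contents are an assumption):

    google-genai          # genai.Client for Vertex AI access
    google-cloud-storage  # GCS integration
    Pillow                # image processing (PIL)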
The code itself follows a clean architecture pattern:
- Core Backend Class: GeminiChatAssistant handles all API interaction and business logic
- UI Classes: Separate GeminiChatGUI and GeminiChatCLI classes for interface handling
- Utility Functions: Dedicated methods for file processing, error handling, and export
Learning Journey and Technical Growth
Developing this application provided significant learning experiences:
- API Integration Skills: Working directly with the Vertex AI API required understanding authentication flows, request structure, and response handling
- Multi-Modal Content Processing: Handling various file types required learning format-specific processing techniques
- GUI Development: Building a responsive, user-friendly interface with Tkinter involved learning event-driven programming patterns
- Error Resilience: Implementing comprehensive error handling with graceful failure modes
- Cross-Platform Deployment: Creating platform-specific startup scripts and environment management
Future Development
The project has a clear roadmap for future enhancements:
- Streaming Responses: Implementing real-time token streaming for more responsive interactions (see the sketch after this list)
- Session Management: Adding session persistence to save conversations between runs
- Enhanced File Formats: Adding support for more file types and larger file handling
- Custom Model Fine-Tuning: Integration with Vertex AI fine-tuning capabilities
- Web Interface: Adding a Flask or FastAPI web interface option
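A streaming variant of send_message could build on the SDK's streaming call; a sketch assuming the google-genai generate_content_stream method:

    # Sketch of streaming output (assumes the google-genai streaming API)
    for chunk in self.client.models.generate_content_stream(
        model=self.selected_model,
        contents=contents,
        config=generation_config,
    ):
        if chunk.text:
            print(chunk.text, end="", flush=True)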
Impact & Outcomes
This project demonstrates the ability to build comprehensive AI applications with:
- Clean Architecture: Proper separation of concerns with clear component boundaries
- Robust Error Handling: Graceful failure modes and comprehensive logging
- User-Centric Design: Interface options catering to different user preferences
- Cloud Integration: Proper integration with Google Cloud services
- Maintainable Codebase: Well-structured, modular design with clear documentation
The application provides a foundation for building more advanced AI tools that leverage state-of-the-art models while maintaining accessibility for non-technical users.