Faham committed on
Commit
4b35e49
·
0 Parent(s):

CREATE: initialized repo

.gitignore ADDED
@@ -0,0 +1,14 @@
1
+ # Python-generated files
2
+ __pycache__/
3
+ *.py[oc]
4
+ build/
5
+ dist/
6
+ wheels/
7
+ *.egg-info
8
+
9
+ # Virtual environments
10
+ .venv
11
+
12
+ # model files
13
+ *.pth
14
+ models/*.pth
.python-version ADDED
@@ -0,0 +1 @@
1
+ 3.9
.streamlit/config.toml ADDED
@@ -0,0 +1,21 @@
1
+ [global]
2
+ developmentMode = false
3
+
4
+ [server]
5
+ headless = false
6
+ port = 8501
7
+ enableCORS = false
8
+ enableXsrfProtection = false
9
+
10
+ [browser]
11
+ gatherUsageStats = false
12
+
13
+ [theme]
14
+ primaryColor = "#1f77b4"
15
+ backgroundColor = "#ffffff"
16
+ secondaryBackgroundColor = "#f0f2f6"
17
+ textColor = "#262730"
18
+ font = "sans serif"
19
+
20
+ [client]
21
+ showErrorDetails = true
README.md ADDED
@@ -0,0 +1,271 @@
1
+ # Sentiment Analysis Testing Ground
2
+
3
+ A comprehensive multi-page Streamlit application for testing three independent sentiment analysis models: text, audio, and vision.
4
+
5
+ ## πŸš€ Features
6
+
7
+ - **Multi-Page Interface**: Clean navigation with dedicated pages for each model
8
+ - **Text Sentiment Analysis**: βœ… **READY TO USE** - TextBlob NLP model integrated
9
+ - **Audio Sentiment Analysis**: βœ… **READY TO USE** - Fine-tuned Wav2Vec2 model integrated
10
+ - πŸ“ **File Upload**: Support for WAV, MP3, M4A, FLAC files
11
+ - πŸŽ™οΈ **Audio Recording**: Direct microphone recording (max 5 seconds)
12
+ - πŸ”„ **Smart Preprocessing**: Automatic 16kHz sampling, 5s max duration (CREMA-D + RAVDESS format)
13
+ - **Vision Sentiment Analysis**: βœ… **READY TO USE** - Fine-tuned ResNet-50 model integrated
14
+ - πŸ“ **File Upload**: Support for PNG, JPG, JPEG, BMP, TIFF files
15
+ - πŸ“· **Camera Capture**: Take photos directly with your camera
16
+ - πŸ”„ **Smart Preprocessing**: Automatic face detection, tight face crop (0% padding), grayscale conversion, 224x224 resize
17
+ - **Fused Model**: Combine predictions from all three models
18
+ - **Modern UI**: Beautiful, responsive interface with custom styling
19
+ - **File Support**: Multiple audio and image format support
20
+
21
+ ## πŸ“‹ Requirements
22
+
23
+ - Python 3.9 or higher
24
+ - Streamlit 1.28.0 or higher
25
+ - PyTorch 1.13.0 or higher
26
+ - Additional dependencies listed in `requirements.txt`
27
+
28
+ ## πŸ› οΈ Installation
29
+
30
+ 1. **Clone the repository**:
31
+
32
+ ```bash
33
+ git clone <your-repo-url>
34
+ cd sentiment-fused
35
+ ```
36
+
37
+ 2. **Create a virtual environment** (recommended):
38
+
39
+ ```bash
40
+ python -m venv venv
41
+
42
+ # On Windows
43
+ venv\Scripts\activate
44
+
45
+ # On macOS/Linux
46
+ source venv/bin/activate
47
+ ```
48
+
49
+ 3. **Install dependencies**:
50
+ ```bash
51
+ pip install -r requirements.txt
52
+ ```
53
+
54
+ ## πŸš€ Usage
55
+
56
+ 1. **Start the Streamlit application**:
57
+
58
+ ```bash
59
+ streamlit run app.py
60
+ ```
61
+
62
+ 2. **Open your browser** and navigate to the URL shown in the terminal (usually `http://localhost:8501`)
63
+
64
+ 3. **Navigate between pages** using the sidebar:
65
+ - 🏠 **Home**: Overview and welcome page
66
+ - πŸ“ **Text Sentiment**: βœ… **Ready to use** - Analyze text with TextBlob
67
+ - 🎡 **Audio Sentiment**: βœ… **Ready to use** - Analyze audio with Wav2Vec2 - πŸ“ Upload audio files or πŸŽ™οΈ record directly with microphone using `st.audio_input`
68
+ - πŸ–ΌοΈ **Vision Sentiment**: βœ… **Ready to use** - Analyze images with ResNet-50
69
+ - πŸ“ Upload image files or πŸ“· take photos with camera
70
+ - πŸ”— **Fused Model**: Combine all three models
71
+
72
+ ## πŸ§ͺ Testing the Models
73
+
74
+ Before running the full app, you can test if the models load correctly:
75
+
76
+ ### Vision Model Test
77
+
78
+ ```bash
79
+ python test_vision_model.py
80
+ ```
81
+
82
+ ### Audio Model Test
83
+
84
+ ```bash
85
+ python test_audio_model.py
86
+ ```
87
+
88
+ These will verify that:
89
+
90
+ - The model files exist
91
+ - PyTorch can load the architectures
92
+ - The trained weights can be loaded
93
+ - Inference runs without errors
94
+
95
+ ### πŸ” Troubleshooting Model Issues
96
+
97
+ If you encounter tensor size mismatch errors, run the diagnostic scripts:
98
+
99
+ ```bash
100
+ python check_model.py # For vision model
101
+ python test_audio_model.py # For audio model
102
+ ```
103
+
104
+ These will examine your model files and identify:
105
+
106
+ - The actual number of output classes
107
+ - Whether the architectures match expected models
108
+ - Any compatibility issues
109
+
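If you prefer to inspect a checkpoint by hand, a minimal sketch along these lines (assuming the `.pth` files are plain `state_dict` objects, which is what the app expects) recovers the class count from the final layer's weight shape:

```python
import torch

# Load the saved state_dict on CPU so no GPU is required for the check.
checkpoint = torch.load("models/resnet50_model.pth", map_location="cpu")

# The classification head's weight has shape (num_classes, in_features),
# so its first dimension is the number of output classes the model was trained with.
if "fc.weight" in checkpoint:              # ResNet-50 head
    print("classes:", checkpoint["fc.weight"].shape[0])
elif "classifier.weight" in checkpoint:    # Wav2Vec2 classification head
    print("classes:", checkpoint["classifier.weight"].shape[0])
else:
    print("unrecognized layout; keys:", list(checkpoint.keys())[:10])
```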
110
+ **Common Issues:**
111
+
112
+ - **Tensor size mismatch**: Models might have been trained with different numbers of classes
113
+ - **Architecture mismatch**: Models might not match expected architectures
114
+ - **Weight loading errors**: Corrupted or incompatible model files
115
+ - **Library dependencies**: Missing transformers, librosa, or other required libraries
116
+
117
+ ## πŸ“ Project Structure
118
+
119
+ ```
120
+ sentiment-fused/
121
+ ├── app.py                          # Main Streamlit application
122
+ ├── requirements.txt                # Python dependencies
123
+ ├── README.md                       # This file
124
+ ├── test_vision_model.py            # Vision model test script
125
+ ├── test_audio_model.py             # Audio model test script
126
+ ├── main.py                         # Original main file
127
+ ├── pyproject.toml                  # Project configuration
128
+ └── models/                         # Model files and notebooks
129
+     ├── audio_sentiment_analysis.ipynb
130
+     ├── vision_sentiment_analysis.ipynb
131
+     ├── wav2vec2_model.pth          # ✅ Fine-tuned Wav2Vec2 model (READY)
132
+     └── resnet50_model.pth          # ✅ Fine-tuned ResNet-50 model (READY)
133
+ ```
134
+
135
+ ## πŸ”§ Model Integration Status
136
+
137
+ ### βœ… Text Sentiment Model - **READY TO USE**
138
+
139
+ - **Model**: TextBlob (Natural Language Processing)
140
+ - **Features**: Sentiment classification (Positive/Negative/Neutral) with confidence scores
141
+ - **Input**: Any text input
142
+ - **Analysis**: Real-time NLP sentiment analysis
143
+ - **Status**: Fully integrated and tested
144
+
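For reference, the polarity thresholds used in `app.py` map TextBlob's score (which ranges from -1 to 1) to the three labels roughly like this:

```python
from textblob import TextBlob

polarity = TextBlob("I really enjoyed this movie!").sentiment.polarity

# Thresholds mirror app.py: small absolute polarity is treated as Neutral.
if polarity > 0.1:
    label = "Positive"
elif polarity < -0.1:
    label = "Negative"
else:
    label = "Neutral"
print(label, round(polarity, 2))
```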
145
+ ### βœ… Vision Sentiment Model - **READY TO USE**
146
+
147
+ - **Model**: ResNet-50 fine-tuned on FER2013 dataset
148
+ - **Training Dataset**:
149
+ - πŸ–ΌοΈ **FER2013**: Facial Expression Recognition 2013 dataset
150
+ - 🎯 **Classes**: 7 emotions mapped to 3 sentiments (Negative, Neutral, Positive)
151
+ - πŸ—οΈ **Architecture**: ResNet-50 with ImageNet weights, fine-tuned for sentiment
152
+ - **Classes**: 3 sentiment classes (Negative, Neutral, Positive)
153
+ - **Input**: Images (PNG, JPG, JPEG, BMP, TIFF)
154
+ - **Preprocessing**:
155
+ - πŸ” **Face Detection**: Automatic face detection using OpenCV
156
+ - 🎨 **Grayscale Conversion**: Convert to grayscale and replicate to 3 channels
157
+ - πŸ“ **Face Cropping**: Crop to face region with 0% padding (tightest crop)
158
+ - πŸ“ **Resize**: Scale to 224x224 pixels (FER2013 format)
159
+ - 🎯 **Transforms**: Resize(224) β†’ CenterCrop(224) β†’ ToTensor β†’ ImageNet Normalization
160
+ - πŸ“Š **Format**: 224x224 RGB with ImageNet mean/std normalization
161
+ - **Status**: Fully integrated and tested
162
+
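The transform chain above corresponds to a torchvision pipeline of roughly this shape (a sketch of what `app.py` applies after the face crop):

```python
from torchvision import transforms

# Matches the FER2013 fine-tuning setup: resize, center-crop, ImageNet normalization.
vision_transforms = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
# Usage: tensor = vision_transforms(pil_image).unsqueeze(0)  # add a batch dimension
```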
163
+ ### βœ… Audio Sentiment Model - **READY TO USE**
164
+
165
+ - **Model**: Wav2Vec2-base fine-tuned on RAVDESS + CREMA-D datasets
166
+ - **Training Datasets**:
167
+ - 🎡 **RAVDESS**: Ryerson Audio-Visual Database of Emotional Speech and Song
168
+ - 🎡 **CREMA-D**: Crowd-sourced Emotional Multimodal Actors Dataset
169
+ - **Classes**: 3 sentiment classes (Negative, Neutral, Positive)
170
+ - **Input**:
171
+ - πŸ“ **File Upload**: Audio files (WAV, MP3, M4A, FLAC)
172
+ - πŸŽ™οΈ **Direct Recording**: Microphone input using `st.audio_input`
173
+ - **Preprocessing**:
174
+ - πŸ”„ **Sampling Rate**: 16kHz (matching CREMA-D + RAVDESS training)
175
+ - ⏱️ **Duration**: Max 5 seconds (matching training max_duration_s=5.0)
176
+ - 🎡 **Feature Extraction**: AutoFeatureExtractor with truncation and padding
177
+ - πŸ“Š **Format**: Automatic resampling, max_length=int(5.0 \* 16000)
178
+ - **Status**: Fully integrated and tested
179
+
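The audio preprocessing above boils down to a feature-extractor call of roughly this form (a sketch mirroring the training configuration; `example.wav` is a placeholder path):

```python
import librosa
from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")

# Load and resample to the 16 kHz rate used for CREMA-D + RAVDESS training.
audio, _ = librosa.load("example.wav", sr=16000)

inputs = feature_extractor(
    audio,
    sampling_rate=16000,
    max_length=int(5.0 * 16000),  # 5-second cap, matching training
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)
# inputs.input_values is ready to feed to the fine-tuned Wav2Vec2 classifier.
```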
180
+ ### πŸ”— Fused Model - **FULLY READY**
181
+
182
+ The fused model now uses all three integrated models: text (TextBlob), audio (Wav2Vec2), and vision (ResNet-50).
183
+
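The current fusion strategy in `app.py` is majority voting over the individual predictions with averaged confidence; stripped of the Streamlit plumbing, the core logic looks roughly like this (`fuse` is an illustrative name, the real function is `predict_fused_sentiment`):

```python
def fuse(results):
    """results: list of (sentiment, confidence) pairs from the individual models."""
    votes = {}
    for sentiment, _ in results:
        votes[sentiment] = votes.get(sentiment, 0) + 1
    final = max(votes, key=votes.get)                        # majority vote
    confidence = sum(c for _, c in results) / len(results)   # averaged confidence
    return final, confidence

# Example: two models say Positive, one says Neutral -> ("Positive", ~0.77)
print(fuse([("Positive", 0.9), ("Neutral", 0.6), ("Positive", 0.8)]))
```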
184
+ ## πŸ“Š Supported File Formats
185
+
186
+ ### Audio Files
187
+
188
+ - WAV (.wav)
189
+ - MP3 (.mp3)
190
+ - M4A (.m4a)
191
+ - FLAC (.flac)
192
+
193
+ ### Image Files
194
+
195
+ - PNG (.png)
196
+ - JPEG (.jpg, .jpeg)
197
+ - BMP (.bmp)
198
+ - TIFF (.tiff)
199
+
200
+ ## 🎨 Customization
201
+
202
+ The application includes custom CSS styling that can be modified in the `app.py` file. Key styling classes:
203
+
204
+ - `.main-header`: Main page headers
205
+ - `.model-card`: Information cards
206
+ - `.result-box`: Result display boxes
207
+ - `.upload-section`: File upload areas
208
+
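Styling changes go through the same `st.markdown` call that defines these classes; for example, overriding the header color only requires injecting a small CSS block (a sketch; the color value is illustrative):

```python
import streamlit as st

# Overrides the .main-header color defined in app.py's embedded stylesheet.
st.markdown(
    """
    <style>
    .main-header { color: #d62728; }  /* illustrative color */
    </style>
    """,
    unsafe_allow_html=True,
)
```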
209
+ ## πŸ” Troubleshooting
210
+
211
+ ### Common Issues
212
+
213
+ 1. **Port already in use**: Change the port with `streamlit run app.py --server.port 8502`
214
+
215
+ 2. **Vision model loading errors**:
216
+
217
+ - Ensure `models/resnet50_model.pth` exists
218
+ - Run `python test_vision_model.py` to diagnose issues
219
+ - Check PyTorch installation: `python -c "import torch; print(torch.__version__)"`
220
+
221
+ 3. **Memory issues**: Large audio/image files may require more memory. Consider file size limits
222
+
223
+ 4. **OpenCV issues**: If face detection fails, ensure `opencv-python` is installed:
224
+
225
+ ```bash
226
+ pip install opencv-python
227
+ ```
228
+
229
+ 5. **Dependency conflicts**: Use a virtual environment to avoid package conflicts
230
+
231
+ ### Performance Tips
232
+
233
+ - Use appropriate file sizes for audio and images
234
+ - Consider implementing caching for model predictions
235
+ - Use GPU acceleration if available for PyTorch models
236
+ - The vision model automatically uses GPU if available
237
+
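On the prediction-caching tip: the model loaders in `app.py` already use `st.cache_resource`, but per-input prediction caching is not implemented; a minimal sketch of what it could look like inside `app.py` with Streamlit's data cache (`cached_text_prediction` is a hypothetical wrapper name):

```python
import streamlit as st

@st.cache_data(show_spinner=False)
def cached_text_prediction(text: str):
    # Reruns with the same text reuse the stored result instead of
    # calling the model again; Streamlit keys the cache on the argument.
    return predict_text_sentiment(text)
```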
238
+ ## 🀝 Contributing
239
+
240
+ 1. Fork the repository
241
+ 2. Create a feature branch
242
+ 3. Make your changes
243
+ 4. Test thoroughly
244
+ 5. Submit a pull request
245
+
246
+ ## πŸ“ License
247
+
248
+ This project is licensed under the MIT License - see the LICENSE file for details.
249
+
250
+ ## πŸ™ Acknowledgments
251
+
252
+ - Streamlit team for the amazing web framework
253
+ - PyTorch community for deep learning tools
254
+ - Hugging Face for transformer models
255
+ - All contributors to the open-source libraries used
256
+
257
+ ## πŸ“ž Support
258
+
259
+ For questions or issues:
260
+
261
+ 1. Check the troubleshooting section above
262
+ 2. Run `python test_vision_model.py` for vision model issues
263
+ 3. Review the model integration examples
264
+ 4. Open an issue on the repository
265
+ 5. Contact the development team
266
+
267
+ ---
268
+
269
+ **Happy Sentiment Analysis! 🧠✨**
270
+
271
+ **Note**: All **THREE MODELS** are now fully integrated and ready to use! πŸŽ‰
app.py ADDED
@@ -0,0 +1,1220 @@
1
+ import streamlit as st
2
+ import pandas as pd
3
+ from PIL import Image
4
+ import io
5
+ import numpy as np
6
+ import tempfile
7
+ import os
8
+ import torch
9
+ import torch.nn as nn
10
+ from torchvision import transforms, models
11
+ import torch.nn.functional as F
12
+
13
+ # Page configuration
14
+ st.set_page_config(
15
+ page_title="Sentiment Analysis Testing Ground",
16
+ page_icon="🧠",
17
+ layout="wide",
18
+ initial_sidebar_state="expanded",
19
+ )
20
+
21
+ # Custom CSS for better styling
22
+ st.markdown(
23
+ """
24
+ <style>
25
+ .main-header {
26
+ font-size: 2.5rem;
27
+ font-weight: bold;
28
+ color: #1f77b4;
29
+ text-align: center;
30
+ margin-bottom: 2rem;
31
+ }
32
+ .model-card {
33
+ background-color: #f0f2f6;
34
+ padding: 1.5rem;
35
+ border-radius: 10px;
36
+ margin: 1rem 0;
37
+ border-left: 4px solid #1f77b4;
38
+ }
39
+ .result-box {
40
+ background-color: #e8f4fd;
41
+ padding: 1rem;
42
+ border-radius: 8px;
43
+ border: 1px solid #1f77b4;
44
+ margin: 1rem 0;
45
+ }
46
+ .upload-section {
47
+ background-color: #f8f9fa;
48
+ padding: 1.5rem;
49
+ border-radius: 10px;
50
+ border: 2px dashed #dee2e6;
51
+ text-align: center;
52
+ margin: 1rem 0;
53
+ }
54
+ </style>
55
+ """,
56
+ unsafe_allow_html=True,
57
+ )
58
+
59
+
60
+ # Global variables for models
61
+ @st.cache_resource
62
+ def load_vision_model():
63
+ """Load the pre-trained ResNet-50 vision sentiment model"""
64
+ try:
65
+ # Check if model file exists
66
+ model_path = "models/resnet50_model.pth"
67
+ if not os.path.exists(model_path):
68
+ st.error(f"❌ Vision model file not found at: {model_path}")
69
+ return None
70
+
71
+ # Load the model weights first to check the architecture
72
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
73
+ checkpoint = torch.load(model_path, map_location=device)
74
+
75
+ # Check the number of classes from the checkpoint
76
+ if "fc.weight" in checkpoint:
77
+ num_classes = checkpoint["fc.weight"].shape[0]
78
+ st.info(f"πŸ“Š Model checkpoint has {num_classes} output classes")
79
+ else:
80
+ # Fallback: try to infer from the last layer
81
+ num_classes = 3 # Default assumption
82
+ st.warning(
83
+ "⚠️ Could not determine number of classes from checkpoint, assuming 3"
84
+ )
85
+
86
+ # Initialize ResNet-50 model with the correct number of classes
87
+ # Note: Your model was trained with RGB images, so we keep 3 channels
88
+ model = models.resnet50(weights=None) # Don't load ImageNet weights
89
+
90
+ num_ftrs = model.fc.in_features
91
+ model.fc = nn.Linear(num_ftrs, num_classes) # Use actual number of classes
92
+
93
+ # Load trained weights
94
+ model.load_state_dict(checkpoint)
95
+ model.to(device)
96
+ model.eval()
97
+
98
+ st.success(f"βœ… Vision model loaded successfully with {num_classes} classes!")
99
+ return model, device, num_classes
100
+ except Exception as e:
101
+ st.error(f"❌ Error loading vision model: {str(e)}")
102
+ return None, None, None
103
+
104
+
105
+ @st.cache_data
106
+ def get_vision_transforms():
107
+ """Get the image transforms used during FER2013 training"""
108
+ return transforms.Compose(
109
+ [
110
+ transforms.Resize(224), # Match training: transforms.Resize(224)
111
+ transforms.CenterCrop(224), # Match training: transforms.CenterCrop(224)
112
+ transforms.ToTensor(),
113
+ transforms.Normalize(
114
+ mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
115
+ ), # ImageNet normalization
116
+ ]
117
+ )
118
+
119
+
120
+ def detect_and_preprocess_face(image, crop_tightness=0.05):
121
+ """
122
+ Detect face in image, crop to face region, convert to grayscale, and resize to 224x224
123
+ to match FER2013 dataset format (grayscale converted to 3-channel RGB)
124
+
125
+ Args:
126
+ image: Input image (PIL Image or numpy array)
127
+ crop_tightness: Padding around face (0.0 = no padding, 0.3 = 30% padding)
128
+ """
129
+ try:
130
+ import cv2
131
+ import numpy as np
132
+
133
+ # Convert PIL image to OpenCV format
134
+ if isinstance(image, Image.Image):
135
+ # Convert PIL to numpy array
136
+ img_array = np.array(image)
137
+ # Convert RGB to BGR for OpenCV
138
+ if len(img_array.shape) == 3:
139
+ img_array = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)
140
+ else:
141
+ img_array = image
142
+
143
+ # Load face detection cascade
144
+ face_cascade = cv2.CascadeClassifier(
145
+ cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
146
+ )
147
+
148
+ # Convert to grayscale for face detection (detection works better on grayscale)
149
+ gray = cv2.cvtColor(img_array, cv2.COLOR_BGR2GRAY)
150
+
151
+ # Detect faces
152
+ faces = face_cascade.detectMultiScale(
153
+ gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30)
154
+ )
155
+
156
+ if len(faces) == 0:
157
+ st.warning("⚠️ No face detected in the image. Using center crop instead.")
158
+ # Fallback: center crop and resize
159
+ if isinstance(image, Image.Image):
160
+ # Convert to RGB first
161
+ rgb_pil = image.convert("RGB")
162
+ # Center crop to square
163
+ width, height = rgb_pil.size
164
+ size = min(width, height)
165
+ left = (width - size) // 2
166
+ top = (height - size) // 2
167
+ right = left + size
168
+ bottom = top + size
169
+ cropped = rgb_pil.crop((left, top, right, bottom))
170
+ # Resize to 224x224 (matching FER2013 training: transforms.Resize(224))
171
+ resized = cropped.resize((224, 224), Image.Resampling.LANCZOS)
172
+
173
+ # Convert to grayscale and then to 3-channel RGB
174
+ gray_pil = resized.convert("L")
175
+ # Convert back to RGB (this replicates grayscale values to all 3 channels)
176
+ gray_rgb_pil = gray_pil.convert("RGB")
177
+ return gray_rgb_pil
178
+ else:
179
+ return None
180
+
181
+ # Get the largest face (assuming it's the main subject)
182
+ x, y, w, h = max(faces, key=lambda rect: rect[2] * rect[3])
183
+
184
+ # Add padding around the face based on user preference
185
+ padding_x = int(w * crop_tightness)
186
+ padding_y = int(h * crop_tightness)
187
+
188
+ # Ensure we don't go out of bounds
189
+ x1 = max(0, x - padding_x)
190
+ y1 = max(0, y - padding_y)
191
+ x2 = min(img_array.shape[1], x + w + padding_x)
192
+ y2 = min(img_array.shape[0], y + h + padding_y)
193
+
194
+ # Crop to face region
195
+ face_crop = img_array[y1:y2, x1:x2]
196
+
197
+ # Convert BGR to RGB first
198
+ face_crop_rgb = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
199
+
200
+ # Convert to grayscale
201
+ face_gray = cv2.cvtColor(face_crop_rgb, cv2.COLOR_RGB2GRAY)
202
+
203
+ # Resize to 224x224 (matching FER2013 training: transforms.Resize(224))
204
+ face_resized = cv2.resize(face_gray, (224, 224), interpolation=cv2.INTER_AREA)
205
+
206
+ # Convert grayscale to 3-channel RGB (replicate grayscale values)
207
+ face_rgb_3channel = cv2.cvtColor(face_resized, cv2.COLOR_GRAY2RGB)
208
+
209
+ # Convert back to PIL Image
210
+ face_pil = Image.fromarray(face_rgb_3channel)
211
+
212
+ return face_pil
213
+
214
+ except ImportError:
215
+ st.error(
216
+ "❌ OpenCV not installed. Please install it with: pip install opencv-python"
217
+ )
218
+ st.info("Falling back to basic preprocessing...")
219
+ # Fallback: basic grayscale conversion and resize
220
+ if isinstance(image, Image.Image):
221
+ rgb_pil = image.convert("RGB")
222
+ resized = rgb_pil.resize((224, 224), Image.Resampling.LANCZOS)  # keep fallback consistent with the 224x224 pipeline
223
+ # Convert to grayscale and then to 3-channel RGB
224
+ gray_pil = resized.convert("L")
225
+ gray_rgb_pil = gray_pil.convert("RGB")
226
+ return gray_rgb_pil
227
+ return None
228
+ except Exception as e:
229
+ st.error(f"❌ Error in face detection: {str(e)}")
230
+ st.info("Falling back to basic preprocessing...")
231
+ # Fallback: basic grayscale conversion and resize
232
+ if isinstance(image, Image.Image):
233
+ rgb_pil = image.convert("RGB")
234
+ resized = rgb_pil.resize((224, 224), Image.Resampling.LANCZOS)  # keep fallback consistent with the 224x224 pipeline
235
+ # Convert to grayscale and then to 3-channel RGB
236
+ gray_pil = resized.convert("L")
237
+ gray_rgb_pil = gray_pil.convert("RGB")
238
+ return gray_rgb_pil
239
+ return None
240
+
241
+
242
+ def get_sentiment_mapping(num_classes):
243
+ """Get the sentiment mapping based on number of classes"""
244
+ if num_classes == 3:
245
+ return {0: "Negative", 1: "Neutral", 2: "Positive"}
246
+ elif num_classes == 4:
247
+ # Common 4-class emotion mapping
248
+ return {0: "Angry", 1: "Sad", 2: "Happy", 3: "Neutral"}
249
+ elif num_classes == 7:
250
+ # FER2013 7-class emotion mapping
251
+ return {
252
+ 0: "Angry",
253
+ 1: "Disgust",
254
+ 2: "Fear",
255
+ 3: "Happy",
256
+ 4: "Sad",
257
+ 5: "Surprise",
258
+ 6: "Neutral",
259
+ }
260
+ else:
261
+ # Generic mapping for unknown number of classes
262
+ return {i: f"Class_{i}" for i in range(num_classes)}
263
+
264
+
265
+ # Placeholder functions for model predictions
266
+ def predict_text_sentiment(text):
267
+ """
268
+ Analyze text sentiment using TextBlob
269
+ """
270
+ if not text or text.strip() == "":
271
+ return "No text provided", 0.0
272
+
273
+ try:
274
+ from textblob import TextBlob
275
+
276
+ # Create TextBlob object
277
+ blob = TextBlob(text)
278
+
279
+ # Get polarity (-1 to 1, where -1 is very negative, 1 is very positive)
280
+ polarity = blob.sentiment.polarity
281
+
282
+ # Get subjectivity (0 to 1, where 0 is very objective, 1 is very subjective)
283
+ subjectivity = blob.sentiment.subjectivity
284
+
285
+ # Convert polarity to sentiment categories
286
+ if polarity > 0.1:
287
+ sentiment = "Positive"
288
+ confidence = min(0.95, 0.6 + abs(polarity) * 0.3)
289
+ elif polarity < -0.1:
290
+ sentiment = "Negative"
291
+ confidence = min(0.95, 0.6 + abs(polarity) * 0.3)
292
+ else:
293
+ sentiment = "Neutral"
294
+ confidence = 0.7 - abs(polarity) * 0.2
295
+
296
+ # Round confidence to 2 decimal places
297
+ confidence = round(confidence, 2)
298
+
299
+ return sentiment, confidence
300
+
301
+ except ImportError:
302
+ st.error(
303
+ "❌ TextBlob not installed. Please install it with: pip install textblob"
304
+ )
305
+ return "TextBlob not available", 0.0
306
+ except Exception as e:
307
+ st.error(f"❌ Error in text sentiment analysis: {str(e)}")
308
+ return "Error occurred", 0.0
309
+
310
+
311
+ @st.cache_resource
312
+ def load_audio_model():
313
+ """Load the pre-trained Wav2Vec2 audio sentiment model"""
314
+ try:
315
+ # Check if model file exists
316
+ model_path = "models/wav2vec2_model.pth"
317
+ if not os.path.exists(model_path):
318
+ st.error(f"❌ Audio model file not found at: {model_path}")
319
+ return None, None, None, None
320
+
321
+ # Load the model weights first to check the architecture
322
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
323
+ checkpoint = torch.load(model_path, map_location=device)
324
+
325
+ # Check the number of classes from the checkpoint
326
+ if "classifier.weight" in checkpoint:
327
+ num_classes = checkpoint["classifier.weight"].shape[0]
328
+ st.info(f"πŸ“Š Audio model checkpoint has {num_classes} output classes")
329
+ else:
330
+ num_classes = 3 # Default assumption
331
+ st.warning(
332
+ "⚠️ Could not determine number of classes from checkpoint, assuming 3"
333
+ )
334
+
335
+ # Initialize Wav2Vec2 model with the correct number of classes
336
+ from transformers import AutoModelForAudioClassification
337
+
338
+ model = AutoModelForAudioClassification.from_pretrained(
339
+ "facebook/wav2vec2-base", num_labels=num_classes
340
+ )
341
+
342
+ # Load trained weights
343
+ model.load_state_dict(checkpoint)
344
+ model.to(device)
345
+ model.eval()
346
+
347
+ # Load feature extractor
348
+ from transformers import AutoFeatureExtractor
349
+
350
+ feature_extractor = AutoFeatureExtractor.from_pretrained(
351
+ "facebook/wav2vec2-base"
352
+ )
353
+
354
+ st.success(f"βœ… Audio model loaded successfully with {num_classes} classes!")
355
+ return model, device, num_classes, feature_extractor
356
+ except Exception as e:
357
+ st.error(f"❌ Error loading audio model: {str(e)}")
358
+ return None, None, None, None
359
+
360
+
361
+ def predict_audio_sentiment(audio_bytes):
362
+ """
363
+ Analyze audio sentiment using fine-tuned Wav2Vec2 model
364
+ Preprocessing matches CREMA-D + RAVDESS training specifications:
365
+ - Target sampling rate: 16kHz
366
+ - Max duration: 5.0 seconds
367
+ - Feature extraction: AutoFeatureExtractor with max_length, truncation, padding
368
+ """
369
+ if audio_bytes is None:
370
+ return "No audio provided", 0.0
371
+
372
+ try:
373
+ # Load model if not already loaded
374
+ model, device, num_classes, feature_extractor = load_audio_model()
375
+ if model is None:
376
+ return "Model not loaded", 0.0
377
+
378
+ # Load and preprocess audio
379
+ import librosa
380
+ import io
381
+ import tempfile
382
+
383
+ # Save audio bytes to temporary file
384
+ with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_file:
385
+ tmp_file.write(audio_bytes)
386
+ tmp_file_path = tmp_file.name
387
+
388
+ try:
389
+ # Load audio with librosa
390
+ audio, sr = librosa.load(tmp_file_path, sr=None)
391
+
392
+ # Resample to 16kHz if needed
393
+ if sr != 16000:
394
+ audio = librosa.resample(y=audio, orig_sr=sr, target_sr=16000)
395
+
396
+ # Preprocess with feature extractor (matching CREMA-D + RAVDESS training exactly)
397
+ # From training: max_length=int(max_duration_s * TARGET_SAMPLING_RATE) = 5.0 * 16000
398
+ inputs = feature_extractor(
399
+ audio,
400
+ sampling_rate=16000,
401
+ max_length=int(5.0 * 16000), # 5 seconds max (matching training)
402
+ truncation=True,
403
+ padding="max_length",
404
+ return_tensors="pt",
405
+ )
406
+
407
+ # Move to device
408
+ input_values = inputs.input_values.to(device)
409
+
410
+ # Run inference
411
+ with torch.no_grad():
412
+ outputs = model(input_values)
413
+ probabilities = torch.softmax(outputs.logits, dim=1)
414
+ confidence, predicted = torch.max(probabilities, 1)
415
+
416
+ # Get sentiment mapping based on number of classes
417
+ if num_classes == 3:
418
+ sentiment_map = {0: "Negative", 1: "Neutral", 2: "Positive"}
419
+ else:
420
+ # Generic mapping for unknown number of classes
421
+ sentiment_map = {i: f"Class_{i}" for i in range(num_classes)}
422
+
423
+ sentiment = sentiment_map[predicted.item()]
424
+ confidence_score = confidence.item()
425
+
426
+ return sentiment, confidence_score
427
+
428
+ finally:
429
+ # Clean up temporary file
430
+ os.unlink(tmp_file_path)
431
+
432
+ except ImportError as e:
433
+ st.error(f"❌ Required library not installed: {str(e)}")
434
+ st.info("Please install: pip install librosa transformers")
435
+ return "Library not available", 0.0
436
+ except Exception as e:
437
+ st.error(f"❌ Error in audio sentiment prediction: {str(e)}")
438
+ return "Error occurred", 0.0
439
+
440
+
441
+ def predict_vision_sentiment(image, crop_tightness=0.0):
442
+ """
443
+ Load ResNet-50 and run inference for vision sentiment analysis
444
+
445
+ Args:
446
+ image: Input image (PIL Image or numpy array)
447
+ crop_tightness: Padding around face (0.0 = no padding, 0.3 = 30% padding)
448
+ """
449
+ if image is None:
450
+ return "No image provided", 0.0
451
+
452
+ try:
453
+ # Load model if not already loaded
454
+ model, device, num_classes = load_vision_model()
455
+ if model is None:
456
+ return "Model not loaded", 0.0
457
+
458
+ # Preprocess image to match FER2013 format
459
+ st.info(
460
+ "πŸ” Detecting face and preprocessing image to match training data format..."
461
+ )
462
+ preprocessed_image = detect_and_preprocess_face(image, crop_tightness=crop_tightness)
463
+
464
+ if preprocessed_image is None:
465
+ return "Image preprocessing failed", 0.0
466
+
467
+ # Show preprocessed image
468
+ st.image(
469
+ preprocessed_image,
470
+ caption="Preprocessed Image (224x224 Grayscale → 3-channel RGB)",
471
+ width=200,
472
+ )
473
+
474
+ # Get transforms
475
+ transform = get_vision_transforms()
476
+
477
+ # Convert preprocessed image to tensor
478
+ image_tensor = transform(preprocessed_image).unsqueeze(0).to(device)
479
+
480
+ # Run inference
481
+ with torch.no_grad():
482
+ outputs = model(image_tensor)
483
+
484
+ # Debug: print output shape
485
+ st.info(f"πŸ” Model output shape: {outputs.shape}")
486
+
487
+ probabilities = F.softmax(outputs, dim=1)
488
+ confidence, predicted = torch.max(probabilities, 1)
489
+
490
+ # Get sentiment mapping based on number of classes
491
+ sentiment_map = get_sentiment_mapping(num_classes)
492
+ sentiment = sentiment_map[predicted.item()]
493
+ confidence_score = confidence.item()
494
+
495
+ return sentiment, confidence_score
496
+
497
+ except Exception as e:
498
+ st.error(f"Error in vision sentiment prediction: {str(e)}")
499
+ st.error(
500
+ f"Model output shape mismatch. Expected {num_classes} classes but got different."
501
+ )
502
+ return "Error occurred", 0.0
503
+
504
+
505
+ def predict_fused_sentiment(text=None, audio_bytes=None, image=None):
506
+ """
507
+ TODO: Implement ensemble/fusion logic combining all three models
508
+ This is a placeholder function for fused sentiment analysis
509
+ """
510
+ # Placeholder logic - replace with actual fusion implementation
511
+ results = []
512
+
513
+ if text:
514
+ text_sentiment, text_conf = predict_text_sentiment(text)
515
+ results.append((text_sentiment, text_conf))
516
+
517
+ if audio_bytes:
518
+ audio_sentiment, audio_conf = predict_audio_sentiment(audio_bytes)
519
+ results.append((audio_sentiment, audio_conf))
520
+
521
+ if image:
522
+ vision_sentiment, vision_conf = predict_vision_sentiment(image)
523
+ results.append((vision_sentiment, vision_conf))
524
+
525
+ if not results:
526
+ return "No inputs provided", 0.0
527
+
528
+ # Simple ensemble logic (replace with your fusion strategy)
529
+ sentiment_counts = {}
530
+ total_confidence = 0
531
+
532
+ for sentiment, confidence in results:
533
+ sentiment_counts[sentiment] = sentiment_counts.get(sentiment, 0) + 1
534
+ total_confidence += confidence
535
+
536
+ # Majority voting with confidence averaging
537
+ final_sentiment = max(sentiment_counts, key=sentiment_counts.get)
538
+ avg_confidence = total_confidence / len(results)
539
+
540
+ return final_sentiment, avg_confidence
541
+
542
+
543
+ # Sidebar navigation
544
+ st.sidebar.title("🧠 Sentiment Analysis")
545
+ st.sidebar.markdown("---")
546
+
547
+ # Navigation
548
+ page = st.sidebar.selectbox(
549
+ "Choose a page:",
550
+ [
551
+ "🏠 Home",
552
+ "πŸ“ Text Sentiment",
553
+ "🎡 Audio Sentiment",
554
+ "πŸ–ΌοΈ Vision Sentiment",
555
+ "πŸ”— Fused Model",
556
+ ],
557
+ )
558
+
559
+ # Home Page
560
+ if page == "🏠 Home":
561
+ st.markdown(
562
+ '<h1 class="main-header">Sentiment Analysis Testing Ground</h1>',
563
+ unsafe_allow_html=True,
564
+ )
565
+
566
+ st.markdown(
567
+ """
568
+ <div class="model-card">
569
+ <h2>Welcome to your Multi-Modal Sentiment Analysis Testing Platform!</h2>
570
+ <p>This application provides a comprehensive testing environment for your three independent sentiment analysis models:</p>
571
+ </div>
572
+ """,
573
+ unsafe_allow_html=True,
574
+ )
575
+
576
+ col1, col2, col3 = st.columns(3)
577
+
578
+ with col1:
579
+ st.markdown(
580
+ """
581
+ <div class="model-card">
582
+ <h3>πŸ“ Text Sentiment Model</h3>
583
+ <p>βœ… <strong>READY TO USE</strong> - Analyze sentiment from text input using TextBlob</p>
584
+ <ul>
585
+ <li>Process any text input</li>
586
+ <li>Get sentiment classification (Positive/Negative/Neutral)</li>
587
+ <li>View confidence scores</li>
588
+ <li>Real-time NLP analysis</li>
589
+ </ul>
590
+ </div>
591
+ """,
592
+ unsafe_allow_html=True,
593
+ )
594
+
595
+ with col2:
596
+ st.markdown(
597
+ """
598
+ <div class="model-card">
599
+ <h3>🎡 Audio Sentiment Model</h3>
600
+ <p>βœ… <strong>READY TO USE</strong> - Analyze sentiment from audio files using fine-tuned Wav2Vec2</p>
601
+ <ul>
602
+ <li>Upload audio files (.wav, .mp3, .m4a, .flac)</li>
603
+ <li>πŸŽ™οΈ Record audio directly with microphone (max 5s)</li>
604
+ <li>πŸ”„ Automatic preprocessing: 16kHz sampling, 5s max duration (CREMA-D + RAVDESS format)</li>
605
+ <li>Listen to uploaded/recorded audio</li>
606
+ <li>Get sentiment predictions</li>
607
+ <li>Real-time audio analysis</li>
608
+ </ul>
609
+ </div>
610
+ """,
611
+ unsafe_allow_html=True,
612
+ )
613
+
614
+ with col3:
615
+ st.markdown(
616
+ """
617
+ <div class="model-card">
618
+ <h3>πŸ–ΌοΈ Vision Sentiment Model</h3>
619
+ <p>Analyze sentiment from images using fine-tuned ResNet-50</p>
620
+ <ul>
621
+ <li>Upload image files (.png, .jpg, .jpeg, .bmp, .tiff)</li>
622
+ <li>πŸ”„ Automatic face detection & preprocessing</li>
623
+ <li>🎯 Fixed 0% padding for tightest face crop</li>
624
+ <li>πŸ“ Convert to 224x224 grayscale β†’ 3-channel RGB (FER2013 format)</li>
625
+ <li>🎯 Transforms: Resize(224) β†’ CenterCrop(224) β†’ ImageNet Normalization</li>
626
+ <li>Preview original & preprocessed images</li>
627
+ <li>Get sentiment predictions</li>
628
+ </ul>
629
+ </div>
630
+ """,
631
+ unsafe_allow_html=True,
632
+ )
633
+
634
+ st.markdown(
635
+ """
636
+ <div class="model-card">
637
+ <h3>πŸ”— Fused Model</h3>
638
+ <p>Combine predictions from all three models for enhanced accuracy</p>
639
+ <ul>
640
+ <li>Multi-modal input processing</li>
641
+ <li>Ensemble prediction strategies</li>
642
+ <li>Comprehensive sentiment analysis</li>
643
+ </ul>
644
+ </div>
645
+ """,
646
+ unsafe_allow_html=True,
647
+ )
648
+
649
+ st.markdown("---")
650
+ st.markdown(
651
+ """
652
+ <div style="text-align: center; color: #666;">
653
+ <p><strong>Note:</strong> This application now has <strong>ALL THREE MODELS</strong> fully integrated and ready to use! πŸŽ‰</p>
654
+ <p><strong>TextBlob</strong> (Text) + <strong>Wav2Vec2</strong> (Audio) + <strong>ResNet-50</strong> (Vision)</p>
655
+ </div>
656
+ """,
657
+ unsafe_allow_html=True,
658
+ )
659
+
660
+ # Text Sentiment Page
661
+ elif page == "πŸ“ Text Sentiment":
662
+ st.title("πŸ“ Text Sentiment Analysis")
663
+ st.markdown("Analyze the sentiment of your text using our TextBlob-based model.")
664
+
665
+ # Text input
666
+ text_input = st.text_area(
667
+ "Enter your text here:",
668
+ height=150,
669
+ placeholder="Type or paste your text here to analyze its sentiment...",
670
+ )
671
+
672
+ # Analyze button
673
+ if st.button("πŸ” Analyze Sentiment", type="primary", use_container_width=True):
674
+ if text_input and text_input.strip():
675
+ with st.spinner("Analyzing text sentiment..."):
676
+ sentiment, confidence = predict_text_sentiment(text_input)
677
+
678
+ # Display results
679
+ st.markdown("### Results")
680
+
681
+ # Display results in columns
682
+ col1, col2 = st.columns(2)
683
+ with col1:
684
+ st.metric("Sentiment", sentiment)
685
+ with col2:
686
+ st.metric("Confidence", f"{confidence:.2f}")
687
+
688
+ # Color-coded sentiment display
689
+ sentiment_colors = {
690
+ "Positive": "🟒",
691
+ "Negative": "πŸ”΄",
692
+ "Neutral": "🟑",
693
+ }
694
+
695
+ st.markdown(
696
+ f"""
697
+ <div class="result-box">
698
+ <h4>{sentiment_colors.get(sentiment, "❓")} Sentiment: {sentiment}</h4>
699
+ <p><strong>Confidence:</strong> {confidence:.2f}</p>
700
+ <p><strong>Input Text:</strong> "{text_input[:100]}{'...' if len(text_input) > 100 else ''}"</p>
701
+ <p><strong>Model:</strong> TextBlob (Natural Language Processing)</p>
702
+ </div>
703
+ """,
704
+ unsafe_allow_html=True,
705
+ )
706
+ else:
707
+ st.error("Please enter some text to analyze.")
708
+
709
+ # Audio Sentiment Page
710
+ elif page == "🎡 Audio Sentiment":
711
+ st.title("🎡 Audio Sentiment Analysis")
712
+ st.markdown(
713
+ "Analyze the sentiment of your audio files using our fine-tuned Wav2Vec2 model."
714
+ )
715
+
716
+ # Preprocessing information
717
+ st.info(
718
+ "ℹ️ **Audio Preprocessing**: Audio will be automatically processed to match CREMA-D + RAVDESS training format: "
719
+ "16kHz sampling rate, max 5 seconds, with automatic resampling and feature extraction."
720
+ )
721
+
722
+ # Model status
723
+ model, device, num_classes, feature_extractor = load_audio_model()
724
+ if model is None:
725
+ st.error("❌ Audio model could not be loaded. Please check the model file.")
726
+ st.info("Expected model file: `models/wav2vec2_model.pth`")
727
+ else:
728
+ st.success(
729
+ f"βœ… Audio model loaded successfully on {device} with {num_classes} classes!"
730
+ )
731
+
732
+ # Input method selection
733
+ st.subheader("🎀 Choose Input Method")
734
+ input_method = st.radio(
735
+ "Select how you want to provide audio:",
736
+ ["πŸ“ Upload Audio File", "πŸŽ™οΈ Record Audio"],
737
+ horizontal=True,
738
+ )
739
+
740
+ if input_method == "πŸ“ Upload Audio File":
741
+ # File uploader
742
+ uploaded_audio = st.file_uploader(
743
+ "Choose an audio file",
744
+ type=["wav", "mp3", "m4a", "flac"],
745
+ help="Supported formats: WAV, MP3, M4A, FLAC",
746
+ )
747
+
748
+ audio_source = "uploaded_file"
749
+ audio_name = uploaded_audio.name if uploaded_audio else None
750
+
751
+ else: # Audio recording
752
+ st.markdown(
753
+ """
754
+ <div class="model-card">
755
+ <h3>πŸŽ™οΈ Audio Recording</h3>
756
+ <p>Record audio directly with your microphone (max 5 seconds).</p>
757
+ <p><strong>Note:</strong> Make sure your microphone is accessible and you have permission to use it.</p>
758
+ </div>
759
+ """,
760
+ unsafe_allow_html=True,
761
+ )
762
+
763
+ # Audio recorder
764
+ recorded_audio = st.audio_input(
765
+ label="Click to start recording",
766
+ help="Click the microphone button to start/stop recording. Maximum recording time is 5 seconds.",
767
+ )
768
+
769
+ if recorded_audio is not None:
770
+ # Display recorded audio
771
+ st.audio(recorded_audio, format="audio/wav")
772
+ st.success("βœ… Audio recorded successfully!")
773
+
774
+ # Convert recorded audio to bytes for processing
775
+ uploaded_audio = recorded_audio
776
+ audio_source = "recorded"
777
+ audio_name = "Recorded Audio"
778
+ else:
779
+ uploaded_audio = None
780
+ audio_source = None
781
+ audio_name = None
782
+
783
+ if uploaded_audio is not None:
784
+ # Display audio player
785
+ if audio_source == "recorded":
786
+ st.audio(uploaded_audio, format="audio/wav")
787
+ st.info(f"πŸŽ™οΈ {audio_name} | Source: Microphone Recording")
788
+ else:
789
+ st.audio(
790
+ uploaded_audio, format=f'audio/{uploaded_audio.name.split(".")[-1]}'
791
+ )
792
+ # File info for uploaded files
793
+ file_size = len(uploaded_audio.getvalue()) / 1024 # KB
794
+ st.info(f"πŸ“ File: {uploaded_audio.name} | Size: {file_size:.1f} KB")
795
+
796
+ # Analyze button
797
+ if st.button(
798
+ "πŸ” Analyze Audio Sentiment", type="primary", use_container_width=True
799
+ ):
800
+ if model is None:
801
+ st.error("❌ Model not loaded. Cannot analyze audio.")
802
+ else:
803
+ with st.spinner("Analyzing audio sentiment..."):
804
+ audio_bytes = uploaded_audio.getvalue()
805
+ sentiment, confidence = predict_audio_sentiment(audio_bytes)
806
+
807
+ # Display results
808
+ st.markdown("### Results")
809
+
810
+ col1, col2 = st.columns(2)
811
+ with col1:
812
+ st.metric("Sentiment", sentiment)
813
+ with col2:
814
+ st.metric("Confidence", f"{confidence:.2f}")
815
+
816
+ # Color-coded sentiment display
817
+ sentiment_colors = {"Positive": "🟒", "Negative": "πŸ”΄", "Neutral": "🟑"}
818
+
819
+ st.markdown(
820
+ f"""
821
+ <div class="result-box">
822
+ <h4>{sentiment_colors.get(sentiment, "❓")} Sentiment: {sentiment}</h4>
823
+ <p><strong>Confidence:</strong> {confidence:.2f}</p>
824
+ <p><strong>Audio Source:</strong> {audio_name}</p>
825
+ <p><strong>Model:</strong> Wav2Vec2 (Fine-tuned on RAVDESS + CREMA-D)</p>
826
+ </div>
827
+ """,
828
+ unsafe_allow_html=True,
829
+ )
830
+ else:
831
+ if input_method == "πŸ“ Upload Audio File":
832
+ st.info("πŸ‘† Please upload an audio file to begin analysis.")
833
+ else:
834
+ st.info("πŸŽ™οΈ Click the microphone button above to record audio for analysis.")
835
+
836
+ # Vision Sentiment Page
837
+ elif page == "πŸ–ΌοΈ Vision Sentiment":
838
+ st.title("πŸ–ΌοΈ Vision Sentiment Analysis")
839
+ st.markdown(
840
+ "Analyze the sentiment of your images using our fine-tuned ResNet-50 model."
841
+ )
842
+
843
+ st.info(
844
+ "ℹ️ **Note**: Images will be automatically preprocessed to match FER2013 format: face detection, grayscale conversion, and 224x224 resize (converted to 3-channel RGB)."
845
+ )
846
+
847
+ # Face cropping is set to 0% (no padding) for tightest crop
848
+ st.info(
849
+ "🎯 **Face Cropping**: Set to 0% padding for tightest crop on facial features"
850
+ )
851
+
852
+ # Model status
853
+ model, device, num_classes = load_vision_model()
854
+ if model is None:
855
+ st.error("❌ Vision model could not be loaded. Please check the model file.")
856
+ st.info("Expected model file: `models/resnet50_model.pth`")
857
+ else:
858
+ st.success(
859
+ f"βœ… Vision model loaded successfully on {device} with {num_classes} classes!"
860
+ )
861
+
862
+ # Input method selection
863
+ st.subheader("πŸ“Έ Choose Input Method")
864
+ input_method = st.radio(
865
+ "Select how you want to provide an image:",
866
+ ["πŸ“ Upload Image File", "πŸ“· Take Photo with Camera"],
867
+ horizontal=True,
868
+ )
869
+
870
+ if input_method == "πŸ“ Upload Image File":
871
+ # File uploader
872
+ uploaded_image = st.file_uploader(
873
+ "Choose an image file",
874
+ type=["png", "jpg", "jpeg", "bmp", "tiff"],
875
+ help="Supported formats: PNG, JPG, JPEG, BMP, TIFF",
876
+ )
877
+
878
+ if uploaded_image is not None:
879
+ # Display image
880
+ image = Image.open(uploaded_image)
881
+ st.image(
882
+ image,
883
+ caption=f"Uploaded Image: {uploaded_image.name}",
884
+ use_container_width=True,
885
+ )
886
+
887
+ # File info
888
+ file_size = len(uploaded_image.getvalue()) / 1024 # KB
889
+ st.info(
890
+ f"πŸ“ File: {uploaded_image.name} | Size: {file_size:.1f} KB | Dimensions: {image.size[0]}x{image.size[1]}"
891
+ )
892
+
893
+ # Analyze button
894
+ if st.button(
895
+ "πŸ” Analyze Image Sentiment", type="primary", use_container_width=True
896
+ ):
897
+ if model is None:
898
+ st.error("❌ Model not loaded. Cannot analyze image.")
899
+ else:
900
+ with st.spinner("Analyzing image sentiment..."):
901
+ sentiment, confidence = predict_vision_sentiment(image)
902
+
903
+ # Display results
904
+ st.markdown("### Results")
905
+
906
+ col1, col2 = st.columns(2)
907
+ with col1:
908
+ st.metric("Sentiment", sentiment)
909
+ with col2:
910
+ st.metric("Confidence", f"{confidence:.2f}")
911
+
912
+ # Color-coded sentiment display
913
+ sentiment_colors = {
914
+ "Positive": "🟒",
915
+ "Negative": "πŸ”΄",
916
+ "Neutral": "🟑",
917
+ }
918
+
919
+ st.markdown(
920
+ f"""
921
+ <div class="result-box">
922
+ <h4>{sentiment_colors.get(sentiment, "❓")} Sentiment: {sentiment}</h4>
923
+ <p><strong>Confidence:</strong> {confidence:.2f}</p>
924
+ <p><strong>Image File:</strong> {uploaded_image.name}</p>
925
+ <p><strong>Model:</strong> ResNet-50 (Fine-tuned on FER2013)</p>
926
+ </div>
927
+ """,
928
+ unsafe_allow_html=True,
929
+ )
930
+
931
+ else: # Camera capture
932
+ st.markdown(
933
+ """
934
+ <div class="model-card">
935
+ <h3>πŸ“· Camera Capture</h3>
936
+ <p>Take a photo directly with your camera to analyze its sentiment.</p>
937
+ <p><strong>Note:</strong> Make sure your camera is accessible and you have permission to use it.</p>
938
+ </div>
939
+ """,
940
+ unsafe_allow_html=True,
941
+ )
942
+
943
+ # Camera input
944
+ camera_photo = st.camera_input(
945
+ "Take a photo",
946
+ help="Click the camera button to take a photo, or use the upload button to select an existing photo",
947
+ )
948
+
949
+ if camera_photo is not None:
950
+ # Display captured image
951
+ image = Image.open(camera_photo)
952
+ st.image(
953
+ image,
954
+ caption="Captured Photo",
955
+ use_container_width=True,
956
+ )
957
+
958
+ # Image info
959
+ st.info(
960
+ f"πŸ“· Captured Photo | Dimensions: {image.size[0]}x{image.size[1]} | Format: {image.format}"
961
+ )
962
+
963
+ # Analyze button
964
+ if st.button(
965
+ "πŸ” Analyze Photo Sentiment", type="primary", use_container_width=True
966
+ ):
967
+ if model is None:
968
+ st.error("❌ Model not loaded. Cannot analyze image.")
969
+ else:
970
+ with st.spinner("Analyzing photo sentiment..."):
971
+ sentiment, confidence = predict_vision_sentiment(image)
972
+
973
+ # Display results
974
+ st.markdown("### Results")
975
+
976
+ col1, col2 = st.columns(2)
977
+ with col1:
978
+ st.metric("Sentiment", sentiment)
979
+ with col2:
980
+ st.metric("Confidence", f"{confidence:.2f}")
981
+
982
+ # Color-coded sentiment display
983
+ sentiment_colors = {
984
+ "Positive": "🟒",
985
+ "Negative": "πŸ”΄",
986
+ "Neutral": "🟑",
987
+ }
988
+
989
+ st.markdown(
990
+ f"""
991
+ <div class="result-box">
992
+ <h4>{sentiment_colors.get(sentiment, "❓")} Sentiment: {sentiment}</h4>
993
+ <p><strong>Confidence:</strong> {confidence:.2f}</p>
994
+ <p><strong>Image Source:</strong> Camera Capture</p>
995
+ <p><strong>Model:</strong> ResNet-50 (Fine-tuned on FER2013)</p>
996
+ </div>
997
+ """,
998
+ unsafe_allow_html=True,
999
+ )
1000
+
1001
+ # Show info if no image is provided
1002
+ if input_method == "πŸ“ Upload Image File" and "uploaded_image" not in locals():
1003
+ st.info("πŸ‘† Please upload an image file to begin analysis.")
1004
+ elif input_method == "πŸ“· Take Photo with Camera" and "camera_photo" not in locals():
1005
+ st.info("πŸ“· Click the camera button above to take a photo for analysis.")
1006
+
1007
+ # Fused Model Page
1008
+ elif page == "πŸ”— Fused Model":
1009
+ st.title("πŸ”— Fused Model Analysis")
1010
+ st.markdown(
1011
+ "Combine predictions from all three models for enhanced sentiment analysis."
1012
+ )
1013
+
1014
+ st.markdown(
1015
+ """
1016
+ <div class="model-card">
1017
+ <h3>Multi-Modal Sentiment Analysis</h3>
1018
+ <p>This page allows you to input text, audio, and/or image data to get a comprehensive sentiment analysis
1019
+ using all three models combined.</p>
1020
+ </div>
1021
+ """,
1022
+ unsafe_allow_html=True,
1023
+ )
1024
+
1025
+ # Input sections
1026
+ col1, col2 = st.columns(2)
1027
+
1028
+ with col1:
1029
+ st.subheader("πŸ“ Text Input")
1030
+ text_input = st.text_area(
1031
+ "Enter text (optional):",
1032
+ height=100,
1033
+ placeholder="Type or paste your text here...",
1034
+ )
1035
+
1036
+ st.subheader("🎡 Audio Input")
1037
+
1038
+ # Audio preprocessing information for fused model
1039
+ st.info(
1040
+ "ℹ️ **Audio Preprocessing**: Audio will be automatically processed to match CREMA-D + RAVDESS training format: "
1041
+ "16kHz sampling rate, max 5 seconds, with automatic resampling and feature extraction."
1042
+ )
1043
+
1044
+ # Audio input method for fused model
1045
+ audio_input_method = st.radio(
1046
+ "Audio input method:",
1047
+ ["πŸ“ Upload File", "πŸŽ™οΈ Record Audio"],
1048
+ key="fused_audio_method",
1049
+ horizontal=True,
1050
+ )
1051
+
1052
+ if audio_input_method == "πŸ“ Upload File":
1053
+ uploaded_audio = st.file_uploader(
1054
+ "Upload audio file (optional):",
1055
+ type=["wav", "mp3", "m4a", "flac"],
1056
+ key="fused_audio",
1057
+ )
1058
+ audio_source = "uploaded_file"
1059
+ audio_name = uploaded_audio.name if uploaded_audio else None
1060
+ else:
1061
+ # Audio recorder for fused model
1062
+ recorded_audio = st.audio_input(
1063
+ label="Record audio (optional):",
1064
+ key="fused_audio_recorder",
1065
+ help="Click to record audio for sentiment analysis",
1066
+ )
1067
+
1068
+ if recorded_audio is not None:
1069
+ st.audio(recorded_audio, format="audio/wav")
1070
+ st.success("βœ… Audio recorded successfully!")
1071
+ uploaded_audio = recorded_audio
1072
+ audio_source = "recorded"
1073
+ audio_name = "Recorded Audio"
1074
+ else:
1075
+ uploaded_audio = None
1076
+ audio_source = None
1077
+ audio_name = None
1078
+
1079
+ with col2:
1080
+ st.subheader("πŸ–ΌοΈ Image Input")
1081
+
1082
+ # Face cropping is set to 0% (no padding) for tightest crop
1083
+ st.info(
1084
+ "🎯 **Face Cropping**: Set to 0% padding for tightest crop on facial features"
1085
+ )
1086
+
1087
+ # Image input method for fused model
1088
+ image_input_method = st.radio(
1089
+ "Image input method:",
1090
+ ["πŸ“ Upload File", "πŸ“· Take Photo"],
1091
+ key="fused_image_method",
1092
+ horizontal=True,
1093
+ )
1094
+
1095
+ if image_input_method == "πŸ“ Upload File":
1096
+ uploaded_image = st.file_uploader(
1097
+ "Upload image file (optional):",
1098
+ type=["png", "jpg", "jpeg", "bmp", "tiff"],
1099
+ key="fused_image",
1100
+ )
1101
+
1102
+ if uploaded_image:
1103
+ image = Image.open(uploaded_image)
1104
+ st.image(image, caption="Uploaded Image", use_container_width=True)
1105
+ else:
1106
+ # Camera capture for fused model
1107
+ camera_photo = st.camera_input(
1108
+ "Take a photo (optional):",
1109
+ key="fused_camera",
1110
+ help="Click to take a photo for sentiment analysis",
1111
+ )
1112
+
1113
+ if camera_photo:
1114
+ image = Image.open(camera_photo)
1115
+ st.image(image, caption="Captured Photo", use_container_width=True)
1116
+ # Set uploaded_image to camera_photo for processing
1117
+ uploaded_image = camera_photo
1118
+
1119
+ if uploaded_audio:
1120
+ st.audio(
1121
+ uploaded_audio, format=f'audio/{uploaded_audio.name.split(".")[-1]}'
1122
+ )
1123
+
1124
+ # Analyze button
1125
+ if st.button("πŸ” Run Fused Analysis", type="primary", use_container_width=True):
1126
+ if text_input or uploaded_audio or uploaded_image:
1127
+ with st.spinner("Running fused sentiment analysis..."):
1128
+ # Prepare inputs
1129
+ audio_bytes = uploaded_audio.getvalue() if uploaded_audio else None
1130
+ image = Image.open(uploaded_image) if uploaded_image else None
1131
+
1132
+ # Get fused prediction
1133
+ sentiment, confidence = predict_fused_sentiment(
1134
+ text=text_input if text_input else None,
1135
+ audio_bytes=audio_bytes,
1136
+ image=image,
1137
+ )
1138
+
1139
+ # Display results
1140
+ st.markdown("### Fused Model Results")
1141
+
1142
+ col1, col2 = st.columns(2)
1143
+ with col1:
1144
+ st.metric("Final Sentiment", sentiment)
1145
+ with col2:
1146
+ st.metric("Overall Confidence", f"{confidence:.2f}")
1147
+
1148
+ # Show individual model results
1149
+ st.markdown("### Individual Model Results")
1150
+
1151
+ results_data = []
1152
+
1153
+ if text_input:
1154
+ text_sentiment, text_conf = predict_text_sentiment(text_input)
1155
+ results_data.append(
1156
+ {
1157
+ "Model": "Text (TextBlob) βœ…",
1158
+ "Input": f"Text: {text_input[:50]}...",
1159
+ "Sentiment": text_sentiment,
1160
+ "Confidence": f"{text_conf:.2f}",
1161
+ }
1162
+ )
1163
+
1164
+ if uploaded_audio:
1165
+ audio_sentiment, audio_conf = predict_audio_sentiment(audio_bytes)
1166
+ results_data.append(
1167
+ {
1168
+ "Model": "Audio (Wav2Vec2) βœ…",
1169
+ "Input": f"Audio: {audio_name}",
1170
+ "Sentiment": audio_sentiment,
1171
+ "Confidence": f"{audio_conf:.2f}",
1172
+ }
1173
+ )
1174
+
1175
+ if uploaded_image:
1176
+ # Face cropping is set to 0% (no padding) for tightest crop
1177
+ vision_sentiment, vision_conf = predict_vision_sentiment(
1178
+ image, crop_tightness=0.0
1179
+ )
1180
+ results_data.append(
1181
+ {
1182
+ "Model": "Vision (ResNet-50)",
1183
+ "Input": f"Image: {uploaded_image.name}",
1184
+ "Sentiment": vision_sentiment,
1185
+ "Confidence": f"{vision_conf:.2f}",
1186
+ }
1187
+ )
1188
+
1189
+ if results_data:
1190
+ df = pd.DataFrame(results_data)
1191
+ st.dataframe(df, use_container_width=True)
1192
+
1193
+ # Final result display
1194
+ sentiment_colors = {"Positive": "🟒", "Negative": "πŸ”΄", "Neutral": "🟑"}
1195
+
1196
+ st.markdown(
1197
+ f"""
1198
+ <div class="result-box">
1199
+ <h4>{sentiment_colors.get(sentiment, "❓")} Final Fused Sentiment: {sentiment}</h4>
1200
+ <p><strong>Overall Confidence:</strong> {confidence:.2f}</p>
1201
+ <p><strong>Models Used:</strong> {len(results_data)}</p>
1202
+ </div>
1203
+ """,
1204
+ unsafe_allow_html=True,
1205
+ )
1206
+ else:
1207
+ st.warning(
1208
+ "⚠️ Please provide at least one input (text, audio, or image) for fused analysis."
1209
+ )
1210
+
1211
+ # Footer
1212
+ st.markdown("---")
1213
+ st.markdown(
1214
+ """
1215
+ <div style="text-align: center; color: #666; padding: 1rem;">
1216
+ <p>Built with ❀️ | by <a href="https://github.com/iamfaham">iamfaham</a></p>
1217
+ </div>
1218
+ """,
1219
+ unsafe_allow_html=True,
1220
+ )
models/audio_sentiment_analysis.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
models/vision_sentiment_analysis.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
pyproject.toml ADDED
@@ -0,0 +1,7 @@
1
+ [project]
2
+ name = "sentiment-fused"
3
+ version = "0.1.0"
4
+ description = "Add your description here"
5
+ readme = "README.md"
6
+ requires-python = ">=3.9"
7
+ dependencies = []
requirements.txt ADDED
@@ -0,0 +1,12 @@
1
+ streamlit>=1.28.0
2
+ pandas>=1.5.0
3
+ Pillow>=9.0.0
4
+ numpy>=1.21.0
5
+ textblob>=0.17.0
6
+ torch>=1.13.0
7
+ torchvision>=0.14.0
8
+ transformers>=4.21.0
9
+ librosa>=0.9.0
10
+ soundfile>=0.12.0
11
+ opencv-python>=4.5.0
12
+ accelerate>=0.20.0
run_app.py ADDED
@@ -0,0 +1,65 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Startup script for the Sentiment Analysis Testing Ground Streamlit application.
4
+ This script provides an easy way to launch the application with proper configuration.
5
+ """
6
+
7
+ import subprocess
8
+ import sys
9
+ import os
10
+
11
+
12
+ def main():
13
+ """Main function to start the Streamlit application."""
14
+
15
+ print("🧠 Starting Sentiment Analysis Testing Ground...")
16
+ print("=" * 50)
17
+
18
+ # Check if app.py exists
19
+ if not os.path.exists("app.py"):
20
+ print("❌ Error: app.py not found in current directory!")
21
+ print("Please make sure you're in the correct directory.")
22
+ sys.exit(1)
23
+
24
+ # Check if requirements are installed
25
+ try:
26
+ import streamlit
27
+ import pandas
28
+ import PIL
29
+
30
+ print("βœ… Dependencies check passed")
31
+ except ImportError as e:
32
+ print(f"❌ Missing dependency: {e}")
33
+ print("Please install requirements: pip install -r requirements.txt")
34
+ sys.exit(1)
35
+
36
+ print("πŸš€ Launching Streamlit application...")
37
+ print("πŸ“± The app will open in your default browser")
38
+ print("πŸ”— If it doesn't open automatically, go to: http://localhost:8501")
39
+ print("⏹️ Press Ctrl+C to stop the application")
40
+ print("=" * 50)
41
+
42
+ try:
43
+ # Start Streamlit with the app
44
+ subprocess.run(
45
+ [
46
+ sys.executable,
47
+ "-m",
48
+ "streamlit",
49
+ "run",
50
+ "app.py",
51
+ "--server.headless",
52
+ "false",
53
+ "--server.port",
54
+ "8501",
55
+ ]
56
+ )
57
+ except KeyboardInterrupt:
58
+ print("\nπŸ‘‹ Application stopped by user")
59
+ except Exception as e:
60
+ print(f"❌ Error starting application: {e}")
61
+ sys.exit(1)
62
+
63
+
64
+ if __name__ == "__main__":
65
+ main()
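
run_app.py verifies only three hard-coded imports before launching Streamlit. As a sketch (not part of this commit), the same check can be generalised with importlib so the module list lives in one place; note that import names differ from the PyPI names in requirements.txt (opencv-python imports as cv2, Pillow as PIL), and the list below is illustrative rather than exhaustive.

```python
# Hypothetical generalisation of the dependency check in run_app.py.
import importlib.util
import sys

REQUIRED_MODULES = ["streamlit", "pandas", "PIL", "numpy", "textblob", "torch", "cv2"]


def check_dependencies(modules=REQUIRED_MODULES) -> bool:
    """Return True if every module can be found without importing it."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    if missing:
        print(f"❌ Missing dependencies: {', '.join(missing)}")
        print("Please install requirements: pip install -r requirements.txt")
        return False
    print("βœ… Dependencies check passed")
    return True


if __name__ == "__main__":
    if not check_dependencies():
        sys.exit(1)
```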
test_audio_model.py ADDED
@@ -0,0 +1,173 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script for the Wav2Vec2 audio sentiment analysis model
4
+ """
5
+
6
+ import os
7
+ import torch
8
+ import numpy as np
9
+ import librosa
10
+ from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
11
+ import tempfile
12
+
13
+
14
+ def test_audio_model():
15
+ """Test the audio model loading and inference"""
16
+
17
+ print("πŸ”Š Testing Wav2Vec2 Audio Sentiment Model")
18
+ print("=" * 50)
19
+
20
+ # Check if model file exists
21
+ model_path = "models/wav2vec2_model.pth"
22
+ if not os.path.exists(model_path):
23
+ print(f"❌ Audio model file not found at: {model_path}")
24
+ return False
25
+
26
+ print(f"βœ… Found model file: {model_path}")
27
+
28
+ try:
29
+ # Set device
30
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
31
+ print(f"πŸ–₯️ Using device: {device}")
32
+
33
+ # Load the model checkpoint to check architecture
34
+ checkpoint = torch.load(model_path, map_location=device)
35
+ print(f"πŸ“Š Checkpoint keys: {list(checkpoint.keys())}")
36
+
37
+ # Check for classifier weights
38
+ if "classifier.weight" in checkpoint:
39
+ num_classes = checkpoint["classifier.weight"].shape[0]
40
+ print(f"πŸ“Š Model has {num_classes} output classes")
41
+ else:
42
+ print("⚠️ Could not determine number of classes from checkpoint")
43
+ num_classes = 3 # Default assumption
44
+
45
+ # Initialize model
46
+ print("πŸ”„ Initializing Wav2Vec2 model...")
47
+ model_checkpoint = "facebook/wav2vec2-base"
48
+ model = AutoModelForAudioClassification.from_pretrained(
49
+ model_checkpoint, num_labels=num_classes
50
+ )
51
+
52
+ # Load trained weights
53
+ print("πŸ”„ Loading trained weights...")
54
+ model.load_state_dict(checkpoint)
55
+ model.to(device)
56
+ model.eval()
57
+
58
+ print("βœ… Model loaded successfully!")
59
+
60
+ # Test with dummy audio
61
+ print("πŸ§ͺ Testing inference with dummy audio...")
62
+
63
+ # Create dummy audio (1 second of random noise at 16kHz)
64
+ dummy_audio = np.random.randn(16000).astype(np.float32)
65
+
66
+ # Load feature extractor
67
+ feature_extractor = AutoFeatureExtractor.from_pretrained(model_checkpoint)
68
+
69
+ # Preprocess audio
70
+ inputs = feature_extractor(
71
+ dummy_audio,
72
+ sampling_rate=16000,
73
+ max_length=80000, # 5 seconds * 16000 Hz
74
+ truncation=True,
75
+ padding="max_length",
76
+ return_tensors="pt",
77
+ )
78
+
79
+ # Move to device
80
+ input_values = inputs.input_values.to(device)
81
+
82
+ # Run inference
83
+ with torch.no_grad():
84
+ outputs = model(input_values)
85
+ probabilities = torch.softmax(outputs.logits, dim=1)
86
+ confidence, predicted = torch.max(probabilities, 1)
87
+
88
+ print(f"πŸ” Model output shape: {outputs.logits.shape}")
89
+ print(f"🎯 Predicted class: {predicted.item()}")
90
+ print(f"πŸ“Š Confidence: {confidence.item():.3f}")
91
+ print(f"πŸ“ˆ All probabilities: {probabilities.squeeze().cpu().numpy()}")
92
+
93
+ # Sentiment mapping
94
+ sentiment_map = {0: "Negative", 1: "Neutral", 2: "Positive"}
95
+ predicted_sentiment = sentiment_map.get(
96
+ predicted.item(), f"Class_{predicted.item()}"
97
+ )
98
+ print(f"😊 Predicted sentiment: {predicted_sentiment}")
99
+
100
+ print("βœ… Audio model test completed successfully!")
101
+ return True
102
+
103
+ except Exception as e:
104
+ print(f"❌ Error testing audio model: {str(e)}")
105
+ import traceback
106
+
107
+ traceback.print_exc()
108
+ return False
109
+
110
+
111
+ def check_audio_model_file():
112
+ """Check the audio model file details"""
113
+
114
+ print("\nπŸ” Audio Model File Analysis")
115
+ print("=" * 30)
116
+
117
+ model_path = "models/wav2vec2_model.pth"
118
+ if not os.path.exists(model_path):
119
+ print(f"❌ Model file not found: {model_path}")
120
+ return
121
+
122
+ # File size
123
+ file_size = os.path.getsize(model_path) / (1024 * 1024) # MB
124
+ print(f"πŸ“ File size: {file_size:.1f} MB")
125
+
126
+ try:
127
+ # Load checkpoint
128
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
129
+ checkpoint = torch.load(model_path, map_location=device)
130
+
131
+ print(f"πŸ”‘ Checkpoint keys ({len(checkpoint)} total):")
132
+ for key, value in checkpoint.items():
133
+ if isinstance(value, torch.Tensor):
134
+ print(f" - {key}: {value.shape} ({value.dtype})")
135
+ else:
136
+ print(f" - {key}: {type(value)}")
137
+
138
+ # Check classifier
139
+ if "classifier.weight" in checkpoint:
140
+ num_classes = checkpoint["classifier.weight"].shape[0]
141
+ print(f"\n🎯 Classifier output classes: {num_classes}")
142
+ print(
143
+ f"πŸ“Š Classifier weight shape: {checkpoint['classifier.weight'].shape}"
144
+ )
145
+ if "classifier.bias" in checkpoint:
146
+ print(
147
+ f"πŸ“Š Classifier bias shape: {checkpoint['classifier.bias'].shape}"
148
+ )
149
+
150
+ # Check wav2vec2 base model
151
+ if "wav2vec2.feature_extractor.conv_layers.0.conv.weight" in checkpoint:
152
+ print(f"πŸ”Š Wav2Vec2 base model: Present")
153
+
154
+ except Exception as e:
155
+ print(f"❌ Error analyzing checkpoint: {str(e)}")
156
+
157
+
158
+ if __name__ == "__main__":
159
+ print("πŸš€ Starting Wav2Vec2 Audio Model Tests")
160
+ print("=" * 60)
161
+
162
+ # Check model file
163
+ check_audio_model_file()
164
+
165
+ print("\n" + "=" * 60)
166
+
167
+ # Test model loading and inference
168
+ success = test_audio_model()
169
+
170
+ if success:
171
+ print("\nπŸŽ‰ All audio model tests passed!")
172
+ else:
173
+ print("\nπŸ’₯ Audio model tests failed!")
test_vision_model.py ADDED
@@ -0,0 +1,136 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script for the vision sentiment analysis model.
4
+ This script verifies that the ResNet-50 model can be loaded and run inference.
5
+ """
6
+
7
+ import os
8
+ import sys
9
+ import torch
10
+ import torch.nn as nn
11
+ from torchvision import transforms, models
12
+ from PIL import Image
13
+ import numpy as np
14
+
15
+
16
+ def get_sentiment_mapping(num_classes):
17
+ """Get the sentiment mapping based on number of classes"""
18
+ if num_classes == 3:
19
+ return {0: "Negative", 1: "Neutral", 2: "Positive"}
20
+ elif num_classes == 4:
21
+ # Common 4-class emotion mapping
22
+ return {0: "Angry", 1: "Sad", 2: "Happy", 3: "Neutral"}
23
+ elif num_classes == 7:
24
+ # FER2013 7-class emotion mapping
25
+ return {0: "Angry", 1: "Disgust", 2: "Fear", 3: "Happy", 4: "Sad", 5: "Surprise", 6: "Neutral"}
26
+ else:
27
+ # Generic mapping for unknown number of classes
28
+ return {i: f"Class_{i}" for i in range(num_classes)}
29
+
30
+
31
+ def test_vision_model():
32
+ """Test the vision sentiment analysis model"""
33
+
34
+ print("🧠 Testing Vision Sentiment Analysis Model")
35
+ print("=" * 50)
36
+
37
+ # Check if model file exists
38
+ model_path = "models/resnet50_model.pth"
39
+ if not os.path.exists(model_path):
40
+ print(f"❌ Model file not found: {model_path}")
41
+ print("Please ensure the model file exists in the models/ directory")
42
+ return False
43
+
44
+ print(f"βœ… Model file found: {model_path}")
45
+
46
+ try:
47
+ # Load the model weights first to check the architecture
48
+ print("πŸ“₯ Loading model checkpoint...")
49
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
50
+ checkpoint = torch.load(model_path, map_location=device)
51
+
52
+ # Check the number of classes from the checkpoint
53
+ if 'fc.weight' in checkpoint:
54
+ num_classes = checkpoint['fc.weight'].shape[0]
55
+ print(f"πŸ“Š Model checkpoint has {num_classes} output classes")
56
+ else:
57
+ # Fallback: no classifier weights found, assume the default 3-class setup
58
+ num_classes = 3 # Default assumption
59
+ print("⚠️ Could not determine number of classes from checkpoint, assuming 3")
60
+
61
+ # Initialize ResNet-50 model with the correct number of classes
62
+ print("πŸ”§ Initializing ResNet-50 model...")
63
+ model = models.resnet50(weights=None) # Don't load ImageNet weights
64
+ num_ftrs = model.fc.in_features
65
+ model.fc = nn.Linear(num_ftrs, num_classes) # Use actual number of classes
66
+
67
+ print(f"πŸ“₯ Loading trained weights for {num_classes} classes...")
68
+ model.load_state_dict(checkpoint)
69
+ model.to(device)
70
+ model.eval()
71
+
72
+ print(f"βœ… Model loaded successfully with {num_classes} classes!")
73
+ print(f"πŸ–₯️ Using device: {device}")
74
+
75
+ # Test with a dummy image
76
+ print("πŸ§ͺ Testing inference with dummy image...")
77
+
78
+ # Create a dummy image (224x224 RGB)
79
+ dummy_image = Image.fromarray(
80
+ np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
81
+ )
82
+
83
+ # Apply transforms
84
+ transform = transforms.Compose(
85
+ [
86
+ transforms.Resize(224),
87
+ transforms.CenterCrop(224),
88
+ transforms.ToTensor(),
89
+ transforms.Normalize(
90
+ mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
91
+ ),
92
+ ]
93
+ )
94
+
95
+ image_tensor = transform(dummy_image).unsqueeze(0).to(device)
96
+
97
+ # Run inference
98
+ with torch.no_grad():
99
+ outputs = model(image_tensor)
100
+ print(f"πŸ” Model output shape: {outputs.shape}")
101
+
102
+ probabilities = torch.nn.functional.softmax(outputs, dim=1)
103
+ confidence, predicted = torch.max(probabilities, 1)
104
+
105
+ # Get sentiment mapping based on number of classes
106
+ sentiment_map = get_sentiment_mapping(num_classes)
107
+ sentiment = sentiment_map[predicted.item()]
108
+ confidence_score = confidence.item()
109
+
110
+ print(f"🎯 Test prediction: {sentiment} (confidence: {confidence_score:.3f})")
111
+ print(f"πŸ“‹ Available classes: {list(sentiment_map.values())}")
112
+ print("βœ… Inference test passed!")
113
+
114
+ return True
115
+
116
+ except Exception as e:
117
+ print(f"❌ Error testing model: {str(e)}")
118
+ import traceback
119
+ traceback.print_exc()
120
+ return False
121
+
122
+
123
+ def main():
124
+ """Main function"""
125
+ success = test_vision_model()
126
+
127
+ if success:
128
+ print("\nπŸŽ‰ All tests passed! The vision model is ready to use.")
129
+ print("You can now run the Streamlit app with: streamlit run app.py")
130
+ else:
131
+ print("\nπŸ’₯ Tests failed. Please check the error messages above.")
132
+ sys.exit(1)
133
+
134
+
135
+ if __name__ == "__main__":
136
+ main()
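
test_vision_model.py feeds the network a random RGB tensor, so it never exercises the preprocessing the README describes (face detection, tight crop with 0% padding, grayscale conversion, 224x224 resize). Below is a hedged sketch of that pipeline using OpenCV's bundled Haar cascade; the app's own preprocessing may differ in the details, and the cascade file is simply OpenCV's default frontal-face model rather than anything shipped in this repo.

```python
# Sketch: README-style preprocessing (face detect -> tight crop -> grayscale -> 224x224)
# before handing the image to the fine-tuned ResNet-50. Illustrative only.
import cv2
from PIL import Image
from torchvision import transforms

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)


def preprocess_face(path: str):
    bgr = cv2.imread(path)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError("No face detected")
    x, y, w, h = faces[0]                       # tight crop, 0% padding
    face = gray[y : y + h, x : x + w]
    face_rgb = cv2.cvtColor(face, cv2.COLOR_GRAY2RGB)   # 3 channels for ResNet-50
    pil = Image.fromarray(face_rgb).resize((224, 224))
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    return transform(pil).unsqueeze(0)          # shape: [1, 3, 224, 224]


# tensor = preprocess_face("photo.jpg")  # then run model(tensor) as in the test script
```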
uv.lock ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ version = 1
2
+ revision = 2
3
+ requires-python = ">=3.9"
4
+
5
+ [[package]]
6
+ name = "sentiment-fused"
7
+ version = "0.1.0"
8
+ source = { virtual = "." }