AI tool to connect Redis OSS source code, benchmarks, and profiling data. Includes GitHub PR performance analyzer, function analysis service, and profiling analysis system.
RedOpt
AI tool to connect source code, benchmarks, and profiling data. Includes a GitHub PR performance analyzer, Function Analysis Service, and Profiling Analysis System.
Features
- OpenAI Agent Integration: Built using OpenAI's Agent framework
- Performance Analysis: Analyzes GitHub PRs for performance improvements and regressions
- Redis Focus: Specifically designed for Redis performance analysis
- Simple Configuration: Easy setup with environment variables
- Function Analysis Service: Complete C/C++ function analysis with LLVM/Clang AST parsing, graph embeddings, call tree analysis, and semantic similarity search
- Real pprof Profiling Analysis: Parse binary pprof files with precise CPU percentages, function-level performance data, and command impact analysis
- Single Question Mode: Ask performance questions directly via the CLI: redopt chat "question"
- SQLite Performance Database: Lightweight, efficient storage for profiling data with flat/cumulative CPU percentages
- Multiple Interfaces: AI agent tools, REST API, and interactive chat interface
- Slack Notifications: Automatic alerts for significant performance impacts via webhook integration
Architecture Overview
Redis Code Analyzer Architecture
Input Sources:
- Project Source Code (index)
- C/C++ Functions (analyze)
- GitHub PR URLs (PR)
- Benchmark YAML Files (parse)
- Perf Script Data (profile)
Processing Pipeline:
- LLVM/Clang Parser → Graph Converter → Graph2Vec Encoder
- Perf Script Parser → Hotspot Analysis & Command Group Mapping
User Interfaces:
- AI Chat Agent
- REST API Server
- GitHub PR Analyzer
- Profiling Analysis
- Command Group Mapping
Storage:
- Redis Database: Functions, Embeddings, Metadata, Profiles, Hotspots
- Vector Search (Similarity)
- RediSearch (Profiling)
Flow 1: Source Code → Clang AST → Graphs → Embeddings → Redis → Search
Flow 2: GitHub PR → Analysis → Performance Impact → Results
Flow 3: Benchmark YAML + Perf Data → Hotspots → Command Mapping → Redis
Data Flow Explanation
- Input Processing:
  - Codebase Indexing: Redis source code → LLVM/Clang parser
  - Interactive Analysis: User provides C/C++ functions → Direct analysis
  - PR Analysis: GitHub URLs → Performance-focused analysis
  - Profiling Analysis: Benchmark YAML + Perf script data → Hotspot analysis
  - Chat Queries: Natural language → AI agent processing
- Core Analysis Pipelines:
  Function Analysis Pipeline:
  - AST Generation: LLVM/Clang extracts Abstract Syntax Trees
  - Graph Conversion: ASTs transformed to NetworkX graph structures
  - Vector Encoding: Graph2Vec generates semantic embeddings
  - Complexity Analysis: Cyclomatic complexity calculation
  Profiling Analysis Pipeline:
  - Benchmark Parsing: YAML benchmark definitions → structured metadata
  - Perf Script Parsing: Collapsed stack format → call stacks and hotspots
  - Command Mapping: Functions → Redis commands → command groups
  - Hotspot Analysis: Sample counts → percentage coverage → performance impact
- Storage & Retrieval:
  - Redis Database: Stores function metadata, graphs, embeddings, profiles, and hotspots
  - Vector Search: Cosine similarity for semantic function matching
  - RediSearch: Full-text and structured search for profiling data
  - JSON Backup: File-based storage for offline analysis
- User Interfaces:
  - AI Chat Agent: Conversational interface with function analysis and profiling tools
  - REST API: Programmatic access for integration
  - PR Analyzer: Specialized GitHub PR performance analysis
  - Profiling Dashboard: Command group performance analysis
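As an illustration of the Complexity Analysis step, here is a minimal sketch of cyclomatic complexity computed from a control-flow graph with the standard formula M = E - N + 2P. It uses a plain adjacency dictionary instead of NetworkX to stay dependency-free; the function name and the sample graph are hypothetical and are not taken from RedOpt's code.

```python
from typing import Dict, List


def cyclomatic_complexity(cfg: Dict[str, List[str]], components: int = 1) -> int:
    """Return M = E - N + 2P for a control-flow graph given as an adjacency list."""
    # Collect every node, including ones that only appear as edge targets.
    nodes = set(cfg)
    for targets in cfg.values():
        nodes.update(targets)
    edges = sum(len(targets) for targets in cfg.values())
    return edges - len(nodes) + 2 * components


# Hypothetical CFG for: if (x) { a(); } else { b(); } return;
cfg = {
    "entry": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}
print(cyclomatic_complexity(cfg))  # prints 2 (4 edges - 4 nodes + 2)
```

A single branch yields a complexity of 2, and straight-line code yields 1.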
Key Components
Function Analysis Tools
- analyze_function_code(): Parse and analyze C/C++ functions using LLVM/Clang
- find_similar_functions_by_id(): Semantic similarity search using Graph2Vec embeddings
- search_functions_by_name(): Name-based function lookup in the pre-indexed Redis database
- find_function_callers() / find_function_callees(): Call tree analysis for function relationships
- find_redis_commands_using_function(): Map functions to Redis commands
- check_function_database_status(): System health and statistics
Profiling Analysis Tools
- get_function_performance_hotspots(): Find performance hotspots for specific functions
- get_commands_affected_by_function(): See which Redis commands use a function
- get_top_performance_hotspots(): Get overall performance hotspots across all benchmarks
- get_hotspots_by_command_group(): Get hotspots for specific command groups (sorted-set, string, etc.)
- search_performance_functions(): Search functions in performance profiling data
- get_profiling_database_status(): Check profiling database status and statistics
Processing Pipeline
- LLVM/Clang Parser: Industry-standard AST extraction from C/C++ code
- Graph Converter: Transforms ASTs into NetworkX graph structures
- Graph2Vec Encoder: Generates vector embeddings for semantic similarity
- Perf Script Parser: Parses collapsed stack format from perf tools
- Hotspot Analyzer: Maps call stacks to function performance metrics
- Command Group Mapper: Links functions to Redis commands and command groups
- Deduplication: Filters function declarations, keeps only implementations
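To make the Perf Script Parser and Hotspot Analyzer steps concrete, the sketch below parses collapsed-stack lines (frames joined by ";" followed by a sample count) and derives flat and cumulative percentages per function. It is an assumption about how such a parser could work, not RedOpt's implementation; the function name `hotspots` and the sample counts are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def hotspots(lines: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Map each function to (flat %, cumulative %) from collapsed-stack lines."""
    flat: Counter = Counter()  # samples where the function is the leaf frame
    cum: Counter = Counter()   # samples where the function appears anywhere
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count_text = line.rpartition(" ")
        count = int(count_text)
        frames = stack.split(";")
        total += count
        flat[frames[-1]] += count
        for function in set(frames):  # count recursive frames once per stack
            cum[function] += count
    if total == 0:
        return {}
    return {
        function: (100.0 * flat[function] / total, 100.0 * cum[function] / total)
        for function in cum
    }


# Illustrative input; the counts are made up.
sample = [
    "main;processCommand;scanGenericCommand;dictScan 60",
    "main;processCommand;scanGenericCommand 20",
    "main;processCommand;addReply 20",
]
for name, (flat_pct, cum_pct) in sorted(hotspots(sample).items()):
    print(f"{name}: flat={flat_pct:.1f}% cum={cum_pct:.1f}%")
```

Flat percentage attributes samples only to the leaf frame, while cumulative percentage credits every function on the stack, which is why a dispatcher such as `scanGenericCommand` can have a low flat but high cumulative share.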
Storage Layer
- Redis Database: High-performance storage for functions, embeddings, and profiling data
- Vector Search: Cosine similarity search on function embeddings
- RediSearch: Full-text search on profiling data with command group indexing
- JSON Files: Backup storage and offline analysis capability
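The cosine similarity search over function embeddings can be sketched in a few lines. This is a simplified in-memory version under stated assumptions: the three-dimensional vectors are toy values, `most_similar` is a hypothetical helper, and a real deployment would query the vector index in Redis rather than a Python dictionary.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float], index: Dict[str, Sequence[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank indexed functions by similarity to the query embedding."""
    scored = [(name, cosine_similarity(query, vector)) for name, vector in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings; real Graph2Vec vectors have many more dimensions.
index = {
    "dictScan": [0.9, 0.1, 0.0],
    "dictFind": [0.8, 0.2, 0.1],
    "listDelNode": [0.0, 0.1, 0.9],
}
print(most_similar([1.0, 0.0, 0.0], index, top_k=2))
```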
AI Integration
- Conversational Agent: Natural language interface for code and performance analysis
- Function Tools: Automated function analysis and similarity search
- Profiling Tools: Performance hotspot analysis and command group insights
- Context Awareness: Maintains conversation history and context
Workflow Examples
Indexing a Codebase
redopt index --source ~/redis/src --output ./functions
Flow: Redis Source → Clang Parser → Graph Converter → Graph2Vec → Redis Database
Indexing Profiling Data
Index real Redis performance profiling data from pprof files:
# Index profiling data with pprof support
redopt profile-index \
--benchmark sample-inputs/benchmarks/memtier_benchmark-1Mkeys-generic-scan-count-500-pipeline-10.yml \
--pprof sample-inputs/pprof/generic-scan-count-500-pipeline-10.pb.gz
# This will:
# - Parse benchmark metadata (commands, command groups)
# - Extract function performance data from pprof using 'pprof -top'
# - Store flat and cumulative CPU percentages in SQLite
# - Enable the AI agent to answer performance questions
Flow: Benchmark YAML + pprof File → pprof Parser → Function Performance Data → SQLite Database
Real Output:
Starting profiling data indexing...
Loaded benchmark: memtier_benchmark-1Mkeys-generic-scan-count-500-pipeline-10
Parsed 50 profile entries from pprof
Top function: scanGenericCommand (83.01% cum)
Stored benchmark: generic-scan-count-500-pipeline-10 with 50 profile entries
Profiling data indexing completed successfully!
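The extraction step relies on the text report printed by `pprof -top`, whose table has the columns flat, flat%, sum%, cum, cum%, and the function name. The sketch below shows one way such a report could be turned into flat and cumulative percentages. `ProfileEntry` and `parse_pprof_top` are hypothetical names, and the sample report is illustrative: its percentages mirror figures quoted elsewhere in this README, but the time columns are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProfileEntry:
    function: str
    flat_percent: float
    cum_percent: float


def parse_pprof_top(text: str) -> List[ProfileEntry]:
    """Parse the table section of a 'pprof -top' text report."""
    entries: List[ProfileEntry] = []
    in_table = False
    for line in text.splitlines():
        # The name is the sixth field and may contain spaces, e.g. "(inline)".
        fields = line.split(None, 5)
        if not in_table:
            in_table = fields[:2] == ["flat", "flat%"]
            continue
        if len(fields) < 6:
            continue
        entries.append(
            ProfileEntry(
                function=fields[5].strip(),
                flat_percent=float(fields[1].rstrip("%")),
                cum_percent=float(fields[4].rstrip("%")),
            )
        )
    return entries


# Illustrative report; the time values are made up.
report = """\
Type: cpu
      flat  flat%   sum%        cum   cum%
     1.10s 11.03% 11.03%      2.15s 21.52%  dictScanDefragBucket
     0.67s  6.75% 17.78%      0.80s  8.03%  _addReplyProtoToList
     0.58s  5.83% 23.61%      0.62s  6.24%  update_zmalloc_stat_alloc (inline)
"""
for entry in parse_pprof_top(report):
    print(entry)
```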
Interactive Analysis
# Interactive chat mode
redopt chat
> Find functions similar to dictScan
> What are the hotspots for sorted-set commands?
> Which functions affect ZRANGE performance?
# Single question mode (NEW!)
redopt chat "What are the Redis functions that take more than 5% of CPU?"
redopt chat "What Redis commands are affected by the listDelNode function?"
Flow: Chat Query → AI Agent → Function/Profiling Tools → Vector/SQLite Search → Results
Real Examples:
$ redopt chat "What are the Redis functions that take more than 5% of CPU in the benchmark data?"
Redis Code Analyzer - Single Question Mode
==================================================
Question: What are the Redis functions that take more than 5% of CPU in the benchmark data?
Analyzing...
SQLite database initialized
Profiling service connected to SQLite
Answer:
The following Redis functions take more than 5% of CPU time based on benchmark data:
1. **dictScanDefragBucket**:
- Flat CPU %: 11.03
- Cumulative CPU %: 21.52
2. **_addReplyProtoToList**:
- Flat CPU %: 6.75
- Cumulative CPU %: 8.03
3. **[[kernel.kallsyms]_text]**:
- Flat CPU %: 6.01
- Cumulative CPU %: 6.67
4. **update_zmalloc_stat_alloc (inline)**:
- Flat CPU %: 5.83
- Cumulative CPU %: 6.24
5. **rev (inline)**:
- Flat CPU %: 5.63
- Cumulative CPU %: 5.66
6. **_addReplyToBufferOrList.part.0**:
- Flat CPU %: 5.03
- Cumulative CPU %: 5.09
$ redopt chat "What Redis commands are affected by the listDelNode function?"
Redis Code Analyzer - Single Question Mode
==================================================
Question: What Redis commands are affected by the listDelNode function?
Analyzing...
SQLite database initialized
Profiling service connected to SQLite
Answer:
The `listDelNode` function affects Redis commands in the `scan` command group, with a
performance impact estimated at about 7.53%. Specifically, it is part of the `generic`
command group.
Function Analysis
curl -X POST "localhost:8000/analyze" -H "Content-Type: application/json" -d '{"code": "int func() {...}"}'
Flow: Function Code → Clang Parser → Graph Analysis → Similarity Search → JSON Response
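The same request can be built from Python with only the standard library. This is a sketch under assumptions: it reuses the `/analyze` path and the `code` field from the curl example, sends the body as JSON, and assumes the server listens on `localhost:8000`. The helper name `build_analyze_request` and the sample C function are hypothetical.

```python
import json
import urllib.request


def build_analyze_request(
    code: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request carrying a C/C++ function as a JSON body."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("int add(int a, int b) { return a + b; }")
print(request.get_method(), request.full_url)

# To send it against a running server:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.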
Profiling Analysis
Query performance data directly via AI agent or programmatically:
# AI-powered performance analysis
redopt chat "What are the top 3 functions consuming the most CPU?"
redopt chat "Show me functions with more than 10% flat CPU usage"
redopt chat "Which functions are related to scanning operations?"
# Programmatic access via SQLite
sqlite3 profiling.db "SELECT function, flat_percent, cum_percent FROM profile_entries WHERE flat_percent > 5.0 ORDER BY cum_percent DESC"
Flow: Performance Query → SQLite Database → Function Data → CPU Percentages → Analysis Results
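The SQLite query above can also be run from Python with the built-in `sqlite3` module. The sketch below uses an in-memory database so it runs without `profiling.db`. The table and column names come from the query above, but the full schema is an assumption, and the last row is a made-up entry added only to show the filter.

```python
import sqlite3
from typing import List, Tuple


def hot_functions(conn: sqlite3.Connection, min_flat: float) -> List[Tuple[str, float, float]]:
    """Return functions whose flat CPU share exceeds min_flat, hottest cumulative first."""
    cursor = conn.execute(
        "SELECT function, flat_percent, cum_percent FROM profile_entries "
        "WHERE flat_percent > ? ORDER BY cum_percent DESC",
        (min_flat,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the real table may have more columns.
conn.execute(
    "CREATE TABLE profile_entries (function TEXT, flat_percent REAL, cum_percent REAL)"
)
conn.executemany(
    "INSERT INTO profile_entries VALUES (?, ?, ?)",
    [
        ("dictScanDefragBucket", 11.03, 21.52),
        ("_addReplyProtoToList", 6.75, 8.03),
        ("rev (inline)", 5.63, 5.66),
        ("hypothetical_cold_function", 0.40, 0.90),  # invented row, below the threshold
    ],
)
for row in hot_functions(conn, 5.0):
    print(row)
```

Passing the threshold as a bound parameter avoids building SQL strings by hand.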
Installation
Prerequisites
- Python 3.12 or higher
- OpenAI API key
- GitHub Personal Access Token
Install from PyPI
# Install redopt
pip install redopt
# Or with pipx for isolated installation
pipx install redopt
Install from Source with Poetry
# Clone the repository
git clone https://github.com/redis/redopt.git
cd redopt
# Install dependencies
poetry install
# Activate the virtual environment
poetry shell
Docker Setup (Recommended)
For the easiest setup, use Docker Compose to run Redis with all required modules:
# Start Redis with RedisJSON and RedisInsight
docker-compose up -d
# Check if Redis is running
docker-compose ps
# View Redis logs
docker-compose logs redis
# Stop Redis
docker-compose down
This will start:
- Redis Stack on port 6379 (with RedisJSON, RediSearch, and other modules)
- RedisInsight web UI on port 8001 for database management
Access RedisInsight at: http://localhost:8001
Configuration
Create a .env file in your project root:
# Required
OPENAI_API_KEY=your_openai_api_key_here
GITHUB_TOKEN=your_github_personal_access_token_here
# Optional
OPENAI_MODEL=gpt-4o
OPENAI_BASE_URL=https://api.openai.com/v1
LOG_LEVEL=INFO
MAX_DIFF_LINES=1000
INCLUDE_COMMENTS=true
INCLUDE_REVIEWS=true
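For readers wiring these variables into their own scripts, here is a minimal sketch of reading and validating them. It is not RedOpt's `config.py`: the `Settings` class and `load_settings` function are hypothetical, and treating the sample values above as defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("OPENAI_API_KEY", "GITHUB_TOKEN")


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    github_token: str
    openai_model: str
    log_level: str
    max_diff_lines: int
    include_comments: bool


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, failing early when required keys are missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        github_token=env["GITHUB_TOKEN"],
        # Defaults below are assumed from the sample .env values.
        openai_model=env.get("OPENAI_MODEL", "gpt-4o"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        max_diff_lines=int(env.get("MAX_DIFF_LINES", "1000")),
        include_comments=_as_bool(env.get("INCLUDE_COMMENTS", "true")),
    )


settings = load_settings(
    {"OPENAI_API_KEY": "sk-example", "GITHUB_TOKEN": "ghp-example", "MAX_DIFF_LINES": "250"}
)
print(settings.openai_model, settings.max_diff_lines)
```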
GitHub Token Setup
- Go to GitHub Settings → Developer settings → Personal access tokens
- Generate a new token with the most restrictive permissions that still allow reading the pull requests you want to analyze
Usage
1. Codebase Indexing
Index an entire C/C++ codebase (like Redis) to extract and analyze all functions:
# Index Redis source code
redopt index \
--source ~/redislabs/redis/src \
--output ./redis_functions \
--clang-path /usr/bin/clang \
--include-dirs ~/redislabs/redis/src
# Alternative command
function-analysis-index index \
--source ~/redislabs/redis/src \
--output ./redis_functions
# Options:
# --source: Source directory to index
# --output: Output directory for JSON files
# --clang-path: Path to clang executable
# --include-dirs: Include directories for compilation
# --extensions: File extensions to process (.c .cpp .h)
# --no-recursive: Don't search subdirectories
# --no-redis: Don't store in Redis
# --no-json: Don't save JSON files
# --redis-host: Redis host (default: localhost)
# --redis-port: Redis port (default: 6379)
This will:
- Parse all C/C++ files using LLVM/Clang
- Extract function metadata, AST, and complexity
- Generate Graph2Vec embeddings
- Store results in Redis and/or JSON files
- Create a summary with statistics
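The first step, finding the files to parse, can be pictured with the short sketch below. It mirrors the documented `--extensions` and `--no-recursive` options, but `collect_sources` is a hypothetical helper and not the indexer's actual code; the temporary directory only exists to make the example runnable.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence


def collect_sources(
    source: Path,
    extensions: Sequence[str] = (".c", ".cpp", ".h"),
    recursive: bool = True,
) -> List[Path]:
    """List files under source whose suffix is one of the wanted extensions."""
    wanted = {ext.lower() for ext in extensions}
    candidates = source.rglob("*") if recursive else source.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in wanted)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "deps").mkdir()
    (root / "dict.c").write_text("/* placeholder */\n")
    (root / "deps" / "helper.h").write_text("/* placeholder */\n")
    (root / "README.md").write_text("not a source file\n")
    found_recursive = [p.relative_to(root).as_posix() for p in collect_sources(root)]
    found_flat = [
        p.relative_to(root).as_posix() for p in collect_sources(root, recursive=False)
    ]

print(found_recursive)
print(found_flat)
```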
2. Profiling Data Indexing
Index Redis performance profiling data from pprof files:
# Index profiling data with pprof support
redopt profile-index \
--benchmark sample-inputs/benchmarks/memtier_benchmark-1Mkeys-generic-scan-count-500-pipeline-10.yml \
--pprof sample-inputs/pprof/generic-scan-count-500-pipeline-10.pb.gz
# Options:
# --benchmark: YAML benchmark definition file
# --pprof: Binary pprof file (.pb.gz format)
# --output: Output directory (optional)
# --db-path: SQLite database path (default: profiling.db)
# Batch processing multiple files
redopt profile-index \
--benchmark-dir ./benchmarks \
--pprof-dir ./pprof_files \
--output ./profiling_results
This will:
- Parse benchmark metadata (commands, command groups, descriptions)
- Extract function performance data using the pprof -top command
- Store flat and cumulative CPU percentages in SQLite
- Enable AI agent to answer performance questions
- Create comprehensive profiling database
Sample Questions After Indexing:
# Performance analysis questions
redopt chat "What are the Redis functions that take more than 5% of CPU in the benchmark data?"
# Expected: "dictScanDefragBucket (11.03% flat, 21.52% cumulative), _addReplyProtoToList (6.75% flat, 8.03% cumulative)..."
redopt chat "What Redis commands are affected by the scanGenericCommand function?"
# Expected: "scanGenericCommand affects the SCAN command with 83.01% performance impact in the 'generic' command group"
redopt chat "Show me the top 3 functions consuming the most CPU"
# Expected: Ranked list of functions with their CPU percentages
redopt chat "What specific Redis command and command group are being tested in this benchmark?"
# Expected: Command and group information from benchmark metadata
3. Interactive AI Chat
Chat with the Redis Code Analyzer AI agent:
# Start interactive chat mode
redopt chat
# Single question mode (NEW!)
redopt chat "What are the Redis functions that take more than 5% of CPU?"
redopt chat "What Redis commands are affected by the scanGenericCommand function?"
# Examples of what you can do:
> Search for Redis functions by name: "find functions related to dict"
> Analyze this Redis function: int dictScan(dict *d, ...) { ... }
> Find functions similar to dictFind
> What Redis commands use the dictFind function?
> What are the performance implications of this code change?
> Show me statistics about the function database
> Analyze GitHub PR: https://github.com/redis/redis/pull/14108
> What are the top 3 functions consuming the most CPU?
> Show me the benchmark database status
The AI agent can:
- Search the pre-indexed Redis codebase (7,000+ functions)
- Analyze C/C++ function code automatically using LLVM/Clang
- Find semantically similar functions using Graph2Vec embeddings
- Perform call tree analysis to trace function relationships
- Map functions to Redis commands
- Analyze real profiling data from pprof files with precise CPU percentages
- Identify performance hotspots and functions consuming >5% CPU
- Link functions to Redis commands and command groups with performance impact
- Query SQLite profiling database for flat and cumulative CPU usage
- Provide Redis performance insights from real benchmark data
- Answer questions about the codebase and performance characteristics
- Analyze GitHub PRs for performance impact
4. GitHub PR Analysis
Run the performance analyzer on a specific PR:
redopt
The tool will analyze GitHub PRs for performance-related changes and can use the indexed function database for enhanced analysis.
5. Slack Notifications
RedOpt AI can send notifications to Slack channels for significant performance impacts:
# Set up Slack webhook token
export PERFORMANCE_WH_TOKEN=T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
# Analyze a PR with automatic Slack alerts for significant impacts
redopt chat "Assess the impact of https://github.com/redis/redis/pull/14200. If the impact is significant warn us on slack on #perf-ci."
# Send a manual notification
redopt chat "Send a test message to slack saying 'Hello from RedOpt AI'"
Features:
- Automatic alerts for PRs with >5% performance impact
- Performance alerts with detailed impact analysis
- Interactive action buttons (JIRA, GitHub comments, benchmarks)
- Repository-specific functionality (Redis repos get additional features)
- Configurable via webhook token
- Supports custom channels and message formatting
See docs/slack_notifications.md for detailed configuration and usage.
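The alerting rule and webhook call can be sketched as follows. This is an assumption-heavy illustration: it treats `PERFORMANCE_WH_TOKEN` as the path portion of a Slack incoming-webhook URL (the `T.../B.../...` shape suggests this), applies the 5% threshold from the feature list, and uses hypothetical helper names. The actual message format is described in docs/slack_notifications.md.

```python
import json
import urllib.request

SIGNIFICANT_IMPACT_PERCENT = 5.0  # threshold taken from the feature list above


def should_alert(impact_percent: float) -> bool:
    """Alert on improvements or regressions larger than the threshold."""
    return abs(impact_percent) > SIGNIFICANT_IMPACT_PERCENT


def build_slack_request(token: str, text: str) -> urllib.request.Request:
    """Build an incoming-webhook POST; assumes token is the webhook URL path."""
    return urllib.request.Request(
        url=f"https://hooks.slack.com/services/{token}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


impact = 7.53  # example value
if should_alert(impact):
    slack_request = build_slack_request(
        "T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX",
        f"Performance impact of {impact:.2f}% detected",
    )
    print(slack_request.full_url)
    # urllib.request.urlopen(slack_request) would deliver the message.
```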
How It Works
The tool uses an OpenAI Agent to:
- Fetch GitHub PR data (description, files, comments, reviews)
- Analyze the changes for performance impact
- Generate a structured analysis including:
- Performance improvements and regressions
- Affected Redis commands
- Significance assessment
- Summary of changes
Project Structure
src/
├── main.py                      # Main application entry point
├── config.py                    # Configuration management
├── github_client/               # GitHub API client
│   ├── client.py                # GitHub client implementation
│   └── models.py                # Data models
├── function_analysis/           # Function analysis service
│   ├── core/                    # Core analysis components
│   │   ├── clang_parser.py      # LLVM/Clang AST parsing
│   │   ├── graph_converter.py   # Convert clang AST to networkx graphs
│   │   ├── graph2vec.py         # Graph embedding generation
│   │   └── models.py            # Data models and schemas
│   ├── storage/                 # Storage layer
│   │   ├── redis_client.py      # Redis integration
│   │   └── vector_search.py     # Similarity search implementation
│   ├── interfaces/              # User interfaces
│   │   ├── function_tool.py     # AI agent function tool
│   │   ├── api.py               # FastAPI REST endpoints
│   │   └── chat.py              # Interactive chat interface
│   └── cli/                     # Command line tools
│       └── indexer.py           # Codebase indexing CLI
└── profiling/                   # Profiling analysis service
    ├── models.py                # Profiling data models
    ├── parsers/                 # Data parsers
    │   └── perf_parser.py       # Perf script parser
    ├── storage/                 # Storage layer
    │   └── profile_storage.py   # Redis storage for profiling data
    ├── queries/                 # Query interface
    │   └── profile_queries.py   # Profiling data queries
    └── cli/                     # Command line tools
        └── profile_indexer.py   # Profiling data indexing CLI
Publishing to PyPI
Prerequisites for Publishing
- PyPI Account: Create accounts on PyPI and TestPyPI
- API Tokens: Generate API tokens for both PyPI and TestPyPI
- Poetry Configuration: Configure Poetry with your credentials
Configure Poetry for Publishing
# Configure PyPI credentials
poetry config pypi-token.pypi your-pypi-api-token
# Configure TestPyPI for testing (optional)
poetry config repositories.testpypi https://test.pypi.org/legacy/
poetry config pypi-token.testpypi your-testpypi-api-token
Publishing Process
# 1. Update version in pyproject.toml
poetry version patch # or minor, major
# 2. Build the package
poetry build
# 3. Test publish to TestPyPI (optional but recommended)
poetry publish -r testpypi
# 4. Test installation from TestPyPI
pip install --index-url https://test.pypi.org/simple/ redopt
# 5. Publish to PyPI
poetry publish
# 6. Verify installation
pip install redopt
redopt --help
Version Management
# Patch version (0.1.0 -> 0.1.1)
poetry version patch
# Minor version (0.1.0 -> 0.2.0)
poetry version minor
# Major version (0.1.0 -> 1.0.0)
poetry version major
# Pre-release versions
poetry version prerelease # 0.1.0 -> 0.1.1a0
poetry version prepatch # 0.1.0 -> 0.1.1a0
poetry version preminor # 0.1.0 -> 0.2.0a0
poetry version premajor # 0.1.0 -> 1.0.0a0
Development
This is a Redis AI Week project focused on performance analysis of Redis-related pull requests, combining:
- Static Code Analysis: Function-level semantic analysis using LLVM/Clang and Graph2Vec
- Runtime Profiling: Performance hotspot analysis from perf script data
- Benchmark Integration: Mapping between functions, Redis commands, and performance characteristics
- AI-Powered Insights: Conversational interface for exploring code and performance relationships
The system enables developers to:
- Understand which functions are performance-critical for specific Redis operations
- Find semantically similar functions that might have similar performance characteristics
- Analyze the performance impact of code changes through both static analysis and profiling data
- Get AI-powered insights about Redis performance optimization opportunities
Authors
- Filipe Oliveira filipe@redis.com
- Paulo Sousa paulo.sousa@redis.com
File details
Details for the file redopt-0.1.0.tar.gz.
File metadata
- Download URL: redopt-0.1.0.tar.gz
- Upload date:
- Size: 80.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.10.7 Linux/5.19.0-46-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 78ea0663937d43ade3ac939b7164867d93ab400552b0f8545b1782a229ba1f0d |
| MD5 | 131f8487da4ae2d188dcd1c0b0f54795 |
| BLAKE2b-256 | 83f3539bfa35047d0b9cf370840d04ba70f0ce4096ff2fe2e101365f4e5c1c7b |
File details
Details for the file redopt-0.1.0-py3-none-any.whl.
File metadata
- Download URL: redopt-0.1.0-py3-none-any.whl
- Upload date:
- Size: 93.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.10.7 Linux/5.19.0-46-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 674c1be2b89662f2c45ab281e8634b9a9076fa3763d8d82478123fc72d464706 |
| MD5 | fb2f30503a5b5c641375a319cb2b3072 |
| BLAKE2b-256 | dbe869460ae131533108938cf3ae9f5a6c373518efc6246abbbaeee98449922a |