502 lines
13 KiB
Markdown
502 lines
13 KiB
Markdown
---
|
|
name: lp-query-optimizer
|
|
description: Optimizes Django ORM queries for league-planner with select_related, prefetch_related, only/defer, bulk operations, and N+1 detection. Use when fixing slow queries.
|
|
argument-hint: <model-or-view-name>
|
|
allowed-tools: Read, Write, Edit, Glob, Grep
|
|
---
|
|
|
|
# League-Planner Query Optimizer
|
|
|
|
Optimizes Django 6 ORM queries following league-planner patterns: proper relationship loading, avoiding N+1 queries, using bulk operations, and leveraging PostgreSQL-specific features.
|
|
|
|
## When to Use
|
|
|
|
- Fixing slow database queries identified in logs or profiling
|
|
- Optimizing views that load multiple related objects
|
|
- Implementing bulk operations for large data sets
|
|
- Reviewing query patterns before deployment
|
|
|
|
## Prerequisites
|
|
|
|
- Django Debug Toolbar or Silk profiler enabled for query analysis
|
|
- Understanding of the model relationships in the app
|
|
- Access to query logs (`DEBUG=True` or logging configured)
|
|
|
|
## Instructions
|
|
|
|
### Step 1: Identify the Problem
|
|
|
|
Enable query logging in settings:
|
|
|
|
```python
|
|
# leagues/settings.py (development)
|
|
LOGGING = {
|
|
'handlers': {
|
|
'console': {
|
|
'class': 'logging.StreamHandler',
|
|
},
|
|
},
|
|
'loggers': {
|
|
'django.db.backends': {
|
|
'level': 'DEBUG',
|
|
'handlers': ['console'],
|
|
},
|
|
},
|
|
}
|
|
```
|
|
|
|
Or use Django Debug Toolbar / Silk to identify:
|
|
- Number of queries per request
|
|
- Duplicate queries (N+1 problem)
|
|
- Slow queries (>100ms)
|
|
|
|
### Step 2: Apply Optimization Patterns
|
|
|
|
#### Pattern 1: select_related for ForeignKey/OneToOne
|
|
|
|
```python
|
|
# BAD: N+1 queries
|
|
teams = Team.objects.all()
|
|
for team in teams:
|
|
print(team.season.league.name) # 2 extra queries per team!
|
|
|
|
# GOOD: 1 query with JOINs
|
|
teams = Team.objects.select_related(
|
|
'season',
|
|
'season__league',
|
|
'country',
|
|
).all()
|
|
for team in teams:
|
|
print(team.season.league.name) # No extra queries
|
|
```
|
|
|
|
#### Pattern 2: prefetch_related for ManyToMany/Reverse FK
|
|
|
|
```python
|
|
# BAD: N+1 queries
|
|
scenarios = Scenario.objects.all()
|
|
for scenario in scenarios:
|
|
for match in scenario.matches.all(): # Extra query per scenario
|
|
print(match.home_team.name) # Extra query per match!
|
|
|
|
# GOOD: 2 queries total (scenarios + prefetched matches with teams)
|
|
scenarios = Scenario.objects.prefetch_related(
|
|
Prefetch(
|
|
'matches',
|
|
queryset=Match.objects.select_related('home_team', 'away_team', 'day')
|
|
)
|
|
).all()
|
|
```
|
|
|
|
#### Pattern 3: Prefetch with Filtering
|
|
|
|
```python
|
|
from django.db.models import Prefetch
|
|
|
|
# Prefetch only active matches with their related data
|
|
seasons = Season.objects.prefetch_related(
|
|
Prefetch(
|
|
'scenarios',
|
|
queryset=Scenario.objects.filter(is_active=True).only('id', 'name', 'season_id')
|
|
),
|
|
Prefetch(
|
|
'teams',
|
|
queryset=Team.objects.select_related('country').filter(is_active=True)
|
|
),
|
|
)
|
|
```
|
|
|
|
#### Pattern 4: only() and defer() for Partial Loading
|
|
|
|
```python
|
|
# Load only needed fields (reduces memory and transfer)
|
|
teams = Team.objects.only(
|
|
'id', 'name', 'abbreviation', 'country_id'
|
|
).select_related('country')
|
|
|
|
# Defer heavy fields
|
|
scenarios = Scenario.objects.defer(
|
|
'description', # Large text field
|
|
'settings_json', # Large JSON field
|
|
).all()
|
|
|
|
# Combining with values() for aggregation queries
|
|
team_stats = Match.objects.filter(
|
|
scenario_id=scenario_id
|
|
).values(
|
|
'home_team_id'
|
|
).annotate(
|
|
total_home_matches=Count('id'),
|
|
total_goals=Sum('home_goals'),
|
|
)
|
|
```
|
|
|
|
#### Pattern 5: Bulk Operations
|
|
|
|
```python
|
|
from django.db import transaction
|
|
|
|
# BAD: N individual INSERTs
|
|
for data in items:
|
|
Match.objects.create(**data)
|
|
|
|
# GOOD: Single bulk INSERT
|
|
with transaction.atomic():
|
|
Match.objects.bulk_create([
|
|
Match(**data) for data in items
|
|
], batch_size=1000)
|
|
|
|
# Bulk UPDATE
|
|
Match.objects.filter(
|
|
scenario_id=scenario_id,
|
|
is_confirmed=False
|
|
).update(
|
|
is_confirmed=True,
|
|
updated_at=timezone.now()
|
|
)
|
|
|
|
# Bulk UPDATE with different values
|
|
from django.db.models import Case, When, Value
|
|
|
|
Match.objects.filter(id__in=match_ids).update(
|
|
status=Case(
|
|
When(id=1, then=Value('confirmed')),
|
|
When(id=2, then=Value('cancelled')),
|
|
default=Value('pending'),
|
|
)
|
|
)
|
|
|
|
# bulk_update for different values per object (Django 4.0+)
|
|
matches = list(Match.objects.filter(id__in=match_ids))
|
|
for match in matches:
|
|
match.status = calculate_status(match)
|
|
|
|
Match.objects.bulk_update(matches, ['status'], batch_size=500)
|
|
```
|
|
|
|
### Step 3: Model-Level Optimizations
|
|
|
|
Add these to your models for automatic optimization:
|
|
|
|
```python
|
|
class Scenario(models.Model):
|
|
# ... fields ...
|
|
|
|
class Meta:
|
|
# Indexes for common query patterns
|
|
indexes = [
|
|
models.Index(fields=['season', 'is_active']),
|
|
models.Index(fields=['created_at']),
|
|
# Partial index for active scenarios only
|
|
models.Index(
|
|
fields=['name'],
|
|
condition=models.Q(is_active=True),
|
|
name='idx_active_scenario_name'
|
|
),
|
|
]
|
|
|
|
# Default manager with common prefetches
|
|
@classmethod
|
|
def get_with_matches(cls, pk: int):
|
|
"""Get scenario with all matches pre-loaded."""
|
|
return cls.objects.prefetch_related(
|
|
Prefetch(
|
|
'matches',
|
|
queryset=Match.objects.select_related(
|
|
'home_team', 'away_team', 'day', 'kick_off_time'
|
|
).order_by('day__number', 'kick_off_time__time')
|
|
)
|
|
).get(pk=pk)
|
|
|
|
@classmethod
|
|
def get_list_optimized(cls, season_id: int):
|
|
"""Optimized query for listing scenarios."""
|
|
return cls.objects.filter(
|
|
season_id=season_id
|
|
).select_related(
|
|
'season'
|
|
).prefetch_related(
|
|
Prefetch(
|
|
'matches',
|
|
queryset=Match.objects.only('id', 'scenario_id')
|
|
)
|
|
).annotate(
|
|
match_count=Count('matches'),
|
|
confirmed_count=Count('matches', filter=Q(matches__is_confirmed=True)),
|
|
).order_by('-created_at')
|
|
```
|
|
|
|
## Query Optimization Patterns
|
|
|
|
### Common Relationships in League-Planner
|
|
|
|
```
|
|
League
|
|
└─ select_related: (none - top level)
|
|
└─ prefetch_related: seasons, managers, spectators
|
|
|
|
Season
|
|
└─ select_related: league
|
|
└─ prefetch_related: teams, scenarios, memberships
|
|
|
|
Scenario
|
|
└─ select_related: season, season__league
|
|
└─ prefetch_related: matches, optimization_runs
|
|
|
|
Match
|
|
└─ select_related: scenario, home_team, away_team, day, kick_off_time, stadium
|
|
└─ prefetch_related: (usually none)
|
|
|
|
Team
|
|
└─ select_related: season, country, stadium
|
|
└─ prefetch_related: home_matches, away_matches, players
|
|
|
|
Draw
|
|
└─ select_related: season
|
|
└─ prefetch_related: groups, groups__teams, constraints
|
|
|
|
Group
|
|
└─ select_related: super_group, super_group__draw
|
|
└─ prefetch_related: teams, teamsingroup
|
|
```
|
|
|
|
### Aggregation Patterns
|
|
|
|
```python
|
|
from django.db.models import Count, Sum, Avg, Max, Min, F, Q
|
|
|
|
# Count related objects
|
|
seasons = Season.objects.annotate(
|
|
team_count=Count('teams'),
|
|
active_scenarios=Count('scenarios', filter=Q(scenarios__is_active=True)),
|
|
)
|
|
|
|
# Subquery for complex aggregations
|
|
from django.db.models import Subquery, OuterRef
|
|
|
|
latest_run = OptimizationRun.objects.filter(
|
|
scenario=OuterRef('pk')
|
|
).order_by('-created_at')
|
|
|
|
scenarios = Scenario.objects.annotate(
|
|
latest_score=Subquery(latest_run.values('score')[:1]),
|
|
latest_run_date=Subquery(latest_run.values('created_at')[:1]),
|
|
)
|
|
|
|
# Window functions (PostgreSQL)
|
|
from django.db.models import Window
|
|
from django.db.models.functions import Rank, RowNumber
|
|
|
|
teams = Team.objects.annotate(
|
|
season_rank=Window(
|
|
expression=Rank(),
|
|
partition_by=F('season'),
|
|
order_by=F('points').desc(),
|
|
)
|
|
)
|
|
```
|
|
|
|
### PostgreSQL-Specific Optimizations
|
|
|
|
```python
|
|
# Array aggregation
|
|
from django.contrib.postgres.aggregates import ArrayAgg, StringAgg
|
|
|
|
seasons = Season.objects.annotate(
|
|
team_names=ArrayAgg('teams__name', ordering='teams__name'),
|
|
team_list=StringAgg('teams__name', delimiter=', '),
|
|
)
|
|
|
|
# JSON aggregation
|
|
from django.db.models.functions import JSONObject
|
|
from django.contrib.postgres.aggregates import JSONBAgg
|
|
|
|
scenarios = Scenario.objects.annotate(
|
|
match_summary=JSONBAgg(
|
|
JSONObject(
|
|
id='matches__id',
|
|
home='matches__home_team__name',
|
|
away='matches__away_team__name',
|
|
),
|
|
filter=Q(matches__is_final=True),
|
|
)
|
|
)
|
|
|
|
# Full-text search
|
|
from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank
|
|
|
|
teams = Team.objects.annotate(
|
|
search=SearchVector('name', 'city', 'stadium__name'),
|
|
).filter(
|
|
search=SearchQuery('bayern munich')
|
|
)
|
|
```
|
|
|
|
### Caching Query Results
|
|
|
|
```python
|
|
from django.core.cache import cache
|
|
|
|
def get_season_teams(season_id: int) -> list:
|
|
"""Get teams with caching."""
|
|
cache_key = f'season:{season_id}:teams'
|
|
teams = cache.get(cache_key)
|
|
|
|
if teams is None:
|
|
teams = list(
|
|
Team.objects.filter(season_id=season_id)
|
|
.select_related('country')
|
|
.values('id', 'name', 'country__name', 'country__code')
|
|
)
|
|
cache.set(cache_key, teams, timeout=300) # 5 minutes
|
|
|
|
return teams
|
|
|
|
# Invalidate on changes
|
|
def invalidate_season_cache(season_id: int):
|
|
cache.delete(f'season:{season_id}:teams')
|
|
cache.delete(f'season:{season_id}:scenarios')
|
|
```
|
|
|
|
## Examples
|
|
|
|
### Example 1: Optimizing Schedule View
|
|
|
|
```python
|
|
# BEFORE: ~500 queries for 300 matches
|
|
def schedule_view(request, scenario_id):
|
|
scenario = Scenario.objects.get(pk=scenario_id)
|
|
matches = scenario.matches.all() # N+1 for each match's teams, day, etc.
|
|
|
|
context = {'matches': matches}
|
|
return render(request, 'schedule.html', context)
|
|
|
|
# AFTER: 3 queries total
|
|
def schedule_view(request, scenario_id):
|
|
scenario = Scenario.objects.select_related(
|
|
'season',
|
|
'season__league',
|
|
).get(pk=scenario_id)
|
|
|
|
matches = Match.objects.filter(
|
|
scenario=scenario
|
|
).select_related(
|
|
'home_team',
|
|
'home_team__country',
|
|
'away_team',
|
|
'away_team__country',
|
|
'day',
|
|
'kick_off_time',
|
|
'stadium',
|
|
).order_by('day__number', 'kick_off_time__time')
|
|
|
|
context = {
|
|
'scenario': scenario,
|
|
'matches': matches,
|
|
}
|
|
return render(request, 'schedule.html', context)
|
|
```
|
|
|
|
### Example 2: Bulk Match Creation
|
|
|
|
```python
|
|
# BEFORE: Slow - one INSERT per match
|
|
def create_matches(scenario, match_data_list):
|
|
for data in match_data_list:
|
|
Match.objects.create(scenario=scenario, **data)
|
|
|
|
# AFTER: Fast - bulk INSERT
|
|
from django.db import transaction
|
|
|
|
def create_matches(scenario, match_data_list):
|
|
matches = [
|
|
Match(
|
|
scenario=scenario,
|
|
home_team_id=data['home_team_id'],
|
|
away_team_id=data['away_team_id'],
|
|
day_id=data.get('day_id'),
|
|
kick_off_time_id=data.get('kick_off_time_id'),
|
|
)
|
|
for data in match_data_list
|
|
]
|
|
|
|
with transaction.atomic():
|
|
Match.objects.bulk_create(matches, batch_size=500)
|
|
|
|
return len(matches)
|
|
```
|
|
|
|
### Example 3: Complex Reporting Query
|
|
|
|
```python
|
|
def get_season_report(season_id: int) -> dict:
|
|
"""Generate comprehensive season report with optimized queries."""
|
|
from django.db.models import Count, Avg, Sum, Q, F
|
|
from django.db.models.functions import TruncDate
|
|
|
|
# Single query for team statistics
|
|
team_stats = Team.objects.filter(
|
|
season_id=season_id
|
|
).annotate(
|
|
home_matches=Count('home_matches'),
|
|
away_matches=Count('away_matches'),
|
|
total_distance=Sum('away_matches__distance'),
|
|
).values('id', 'name', 'home_matches', 'away_matches', 'total_distance')
|
|
|
|
# Single query for scenario comparison
|
|
scenarios = Scenario.objects.filter(
|
|
season_id=season_id,
|
|
is_active=True,
|
|
).annotate(
|
|
match_count=Count('matches'),
|
|
confirmed_pct=Count('matches', filter=Q(matches__is_confirmed=True)) * 100.0 / Count('matches'),
|
|
avg_distance=Avg('matches__distance'),
|
|
).values('id', 'name', 'match_count', 'confirmed_pct', 'avg_distance')
|
|
|
|
# Matches by day aggregation
|
|
matches_by_day = Match.objects.filter(
|
|
scenario__season_id=season_id,
|
|
scenario__is_active=True,
|
|
).values(
|
|
'day__number'
|
|
).annotate(
|
|
count=Count('id'),
|
|
).order_by('day__number')
|
|
|
|
return {
|
|
'teams': list(team_stats),
|
|
'scenarios': list(scenarios),
|
|
'matches_by_day': list(matches_by_day),
|
|
}
|
|
```
|
|
|
|
## Common Pitfalls
|
|
|
|
- **select_related on M2M**: Only use for FK/O2O; use prefetch_related for M2M
|
|
- **Chained prefetch**: Remember prefetched data is cached; re-filtering creates new queries
|
|
- **values() with related**: Use `values('related__field')` carefully; it can create JOINs
|
|
- **Forgetting batch_size**: bulk_create/bulk_update without batch_size can cause memory issues
|
|
- **Ignoring database indexes**: Ensure fields in WHERE/ORDER BY have proper indexes
|
|
|
|
## Verification
|
|
|
|
Use Django Debug Toolbar or these queries to verify:
|
|
|
|
```python
|
|
from django.db import connection, reset_queries
|
|
from django.conf import settings
|
|
|
|
settings.DEBUG = True
|
|
reset_queries()
|
|
|
|
# Run your code here
|
|
result = my_function()
|
|
|
|
# Check queries
|
|
print(f"Total queries: {len(connection.queries)}")
|
|
for q in connection.queries:
|
|
print(f"{q['time']}s: {q['sql'][:100]}...")
|
|
```
|
|
|
|
Or with Silk profiler at `/silk/` when `USE_SILK=True`.
|