---
name: lp-query-optimizer
description: Optimizes Django ORM queries for league-planner with select_related, prefetch_related, only/defer, bulk operations, and N+1 detection. Use when fixing slow queries.
argument-hint: <model-or-view-name>
allowed-tools: Read, Write, Edit, Glob, Grep
---

# League-Planner Query Optimizer

Optimizes Django 5.2 ORM queries following league-planner patterns: proper relationship loading, avoiding N+1 queries, using bulk operations, and leveraging PostgreSQL-specific features.

## When to Use

- Fixing slow database queries identified in logs or profiling
- Optimizing views that load multiple related objects
- Implementing bulk operations for large data sets
- Reviewing query patterns before deployment

## Prerequisites

- Django Debug Toolbar or Silk profiler enabled for query analysis
- Understanding of the model relationships in the app
- Access to query logs (`DEBUG=True` plus the `django.db.backends` logger configured in Step 1)

## Instructions

### Step 1: Identify the Problem

Enable query logging in settings:

```python
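# leagues/settings.py (development)
# django.db.backends emits SQL only while DEBUG=True
LOGGING = {
    'version': 1,  # required by logging.config.dictConfig
    'disable_existing_loggers': False,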
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console'],
        },
    },
}
```

Or use Django Debug Toolbar / Silk to identify:
- Number of queries per request
- Duplicate queries (N+1 problem)
- Slow queries (>100ms)
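The duplicate and slow-query checks can also be scripted. Below is a minimal sketch that works on a list of dicts shaped like Django's `connection.queries` entries (each with `sql` and `time` keys, time in seconds stored as a string). `normalize_sql` and `summarize_queries` are hypothetical helpers, and the literal-stripping regexes are a rough heuristic, not a SQL parser.

```python
import re
from collections import Counter

# Hypothetical helpers: collapse literals so queries that differ only in
# their parameters (the N+1 signature) normalize to the same shape.
_STRING = re.compile(r"'(?:[^']|'')*'")
_NUMBER = re.compile(r"\b\d+\b")


def normalize_sql(sql: str) -> str:
    """Replace string/number literals with '?' and collapse whitespace."""
    sql = _STRING.sub("?", sql)
    sql = _NUMBER.sub("?", sql)
    return " ".join(sql.split())


def summarize_queries(queries, slow_ms=100.0, repeat_threshold=5):
    """Summarize dicts shaped like Django's connection.queries entries.

    Each entry needs 'sql' and 'time' (seconds, stored as a string).
    """
    shapes = Counter(normalize_sql(q["sql"]) for q in queries)
    return {
        "total": len(queries),
        # Queries slower than slow_ms milliseconds
        "slow": [q["sql"] for q in queries if float(q["time"]) * 1000 > slow_ms],
        # Query shapes executed at least repeat_threshold times
        "repeated": {s: n for s, n in shapes.items() if n >= repeat_threshold},
    }
```

Many repeats of one shape within a single request usually mean a relation is being loaded inside a loop.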

### Step 2: Apply Optimization Patterns

#### Pattern 1: select_related for ForeignKey/OneToOne

```python
# BAD: N+1 queries
teams = Team.objects.all()
for team in teams:
    print(team.season.league.name)  # 2 extra queries per team!

# GOOD: 1 query with JOINs
teams = Team.objects.select_related(
    'season',
    'season__league',
    'country',
).all()
for team in teams:
    print(team.season.league.name)  # No extra queries
```
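To see why the JOIN matters, here is a small stand-alone sketch using Python's built-in `sqlite3` with hypothetical `league`/`season`/`team` tables (not the league-planner schema, and not Django's generated SQL). `Connection.set_trace_callback` records every statement, so the per-row lookups of the N+1 loop can be counted against the single JOIN that `select_related('season__league')` corresponds to.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Tiny in-memory schema: league <- season <- team (hypothetical tables)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE league (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE season (id INTEGER PRIMARY KEY, league_id INTEGER REFERENCES league(id));
        CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT, season_id INTEGER REFERENCES season(id));
        INSERT INTO league VALUES (1, 'Bundesliga');
        INSERT INTO season VALUES (1, 1);
        """
    )
    conn.executemany(
        "INSERT INTO team (name, season_id) VALUES (?, 1)",
        [(f"Team {i}",) for i in range(18)],
    )
    return conn


def count_queries(conn, fn):
    """Run fn(conn) and return (result, number of SQL statements executed)."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


def n_plus_one(conn):
    # One query for the teams, then two lookups per team (season, then league)
    names = []
    for _team_id, season_id in conn.execute("SELECT id, season_id FROM team").fetchall():
        league_id = conn.execute(
            "SELECT league_id FROM season WHERE id = ?", (season_id,)
        ).fetchone()[0]
        names.append(
            conn.execute("SELECT name FROM league WHERE id = ?", (league_id,)).fetchone()[0]
        )
    return names


def joined(conn):
    # Same data in a single statement with JOINs, as select_related would do
    rows = conn.execute(
        "SELECT league.name FROM team "
        "JOIN season ON season.id = team.season_id "
        "JOIN league ON league.id = season.league_id"
    ).fetchall()
    return [name for (name,) in rows]
```

With 18 teams the loop issues 37 statements (1 + 2 × 18) against 1 for the JOIN, and the gap grows linearly with the number of rows.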

#### Pattern 2: prefetch_related for ManyToMany/Reverse FK

```python
from django.db.models import Prefetch

# BAD: N+1 queries
scenarios = Scenario.objects.all()
for scenario in scenarios:
    for match in scenario.matches.all():  # Extra query per scenario
        print(match.home_team.name)  # Extra query per match!

# GOOD: 2 queries total (scenarios + prefetched matches with teams)
scenarios = Scenario.objects.prefetch_related(
    Prefetch(
        'matches',
        queryset=Match.objects.select_related('home_team', 'away_team', 'day')
    )
).all()
```
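`prefetch_related` does not JOIN the parent and child tables; it runs one extra query with `WHERE fk IN (...)` and attaches the children to their parents in Python. The sketch below mimics that mechanism with `sqlite3` and hypothetical `scenario`/`team`/`matches` tables; the JOINs inside the second query play the role of `select_related('home_team', 'away_team')` on the `Prefetch` queryset.

```python
import sqlite3
from collections import defaultdict


def build_db() -> sqlite3.Connection:
    """In-memory stand-in tables for scenarios, teams and matches (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE scenario (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE matches (
            id INTEGER PRIMARY KEY,
            scenario_id INTEGER REFERENCES scenario(id),
            home_team_id INTEGER REFERENCES team(id),
            away_team_id INTEGER REFERENCES team(id)
        );
        INSERT INTO team VALUES (1, 'Bayern'), (2, 'Dortmund');
        INSERT INTO scenario VALUES (1, 'Draft A'), (2, 'Draft B');
        INSERT INTO matches VALUES (1, 1, 1, 2), (2, 1, 2, 1);
        """
    )
    return conn


def load_scenarios_with_matches(conn: sqlite3.Connection) -> dict:
    """Map scenario id -> list of (home, away) names using exactly two queries."""
    # Query 1: the parent rows
    ids = [sid for (sid,) in conn.execute("SELECT id FROM scenario ORDER BY id")]
    if not ids:
        return {}
    # Query 2: every child of those parents in one IN (...) lookup
    placeholders = ", ".join("?" * len(ids))
    rows = conn.execute(
        "SELECT m.scenario_id, home.name, away.name FROM matches AS m "
        "JOIN team AS home ON home.id = m.home_team_id "
        "JOIN team AS away ON away.id = m.away_team_id "
        f"WHERE m.scenario_id IN ({placeholders}) ORDER BY m.id",
        ids,
    ).fetchall()
    # Group children under their parent in Python; parents without children get []
    grouped = defaultdict(list)
    for scenario_id, home, away in rows:
        grouped[scenario_id].append((home, away))
    return {scenario_id: grouped[scenario_id] for scenario_id in ids}
```

The query count stays at two regardless of how many scenarios or matches there are; only the size of the `IN (...)` list grows.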

#### Pattern 3: Prefetch with Filtering

```python
from django.db.models import Prefetch

# Prefetch only active scenarios and active teams (with country): 3 queries total
seasons = Season.objects.prefetch_related(
    Prefetch(
        'scenarios',
        queryset=Scenario.objects.filter(is_active=True).only('id', 'name', 'season_id')
    ),
    Prefetch(
        'teams',
        queryset=Team.objects.select_related('country').filter(is_active=True)
    ),
)
```

#### Pattern 4: only() and defer() for Partial Loading

```python
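from django.db.models import Count, Sum

# Load only needed fields (reduces memory use and data transfer).
# Keep the FK that select_related() follows (country_id) in only().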
teams = Team.objects.only(
    'id', 'name', 'abbreviation', 'country_id'
).select_related('country')

# Defer heavy fields; reading a deferred field later costs one extra query per instance
scenarios = Scenario.objects.defer(
    'description',  # Large text field
    'settings_json',  # Large JSON field
).all()

# Combining with values() for aggregation queries
team_stats = Match.objects.filter(
    scenario_id=scenario_id
).values(
    'home_team_id'
).annotate(
    total_home_matches=Count('id'),
    total_goals=Sum('home_goals'),
)
```
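In the aggregation query above, `values('home_team_id')` placed before `annotate()` decides the grouping: each distinct `home_team_id` becomes one result row. A rough raw-SQL equivalent, run against a hypothetical `matches` table in `sqlite3`, looks like this (Django's actual SQL differs in quoting and aliasing):

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Hypothetical matches table with a home_goals column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE matches (
            id INTEGER PRIMARY KEY,
            scenario_id INTEGER,
            home_team_id INTEGER,
            home_goals INTEGER
        );
        INSERT INTO matches (scenario_id, home_team_id, home_goals) VALUES
            (1, 1, 2), (1, 1, 1), (1, 2, 3), (2, 1, 5);
        """
    )
    return conn


def home_team_stats(conn: sqlite3.Connection, scenario_id: int) -> list:
    """Raw SQL shaped like values('home_team_id').annotate(Count('id'), Sum('home_goals'))."""
    # The values() field becomes both the selected column and the GROUP BY key;
    # each annotate() aggregate becomes one aggregate column.
    rows = conn.execute(
        "SELECT home_team_id, COUNT(id), SUM(home_goals) FROM matches "
        "WHERE scenario_id = ? GROUP BY home_team_id ORDER BY home_team_id",
        (scenario_id,),
    ).fetchall()
    return [
        {"home_team_id": team, "total_home_matches": count, "total_goals": goals}
        for team, count, goals in rows
    ]
```

Swapping the order (`annotate()` then `values()`) would instead annotate every row of the model and group by all of its fields, which is rarely what a report needs.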

#### Pattern 5: Bulk Operations

```python
from django.db import transaction
from django.utils import timezone

# BAD: N individual INSERTs
for data in items:
    Match.objects.create(**data)

# GOOD: one multi-row INSERT per batch of up to 1000 objects (save() and signals are skipped)
with transaction.atomic():
    Match.objects.bulk_create([
        Match(**data) for data in items
    ], batch_size=1000)

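# Bulk UPDATE: a single SQL statement; save(), signals and auto_now are skipped,
# so updated_at is set explicitly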
Match.objects.filter(
    scenario_id=scenario_id,
    is_confirmed=False
).update(
    is_confirmed=True,
    updated_at=timezone.now()
)

# Conditional bulk UPDATE: per-row values via CASE WHEN in one statement
# (the IDs below are illustrative)
from django.db.models import Case, When, Value

Match.objects.filter(id__in=match_ids).update(
    status=Case(
        When(id=1, then=Value('confirmed')),
        When(id=2, then=Value('cancelled')),
        default=Value('pending'),
    )
)

# bulk_update for values computed in Python (available since Django 2.2);
# calculate_status() stands in for your own logic
matches = list(Match.objects.filter(id__in=match_ids))
for match in matches:
    match.status = calculate_status(match)

Match.objects.bulk_update(matches, ['status'], batch_size=500)
```
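`batch_size` controls how many objects go into each INSERT. On backends with multi-row inserts (PostgreSQL and SQLite among them), `bulk_create` sends one `INSERT ... VALUES (...), (...)` statement per batch. The sketch below reproduces that batching by hand with `sqlite3`; the `matches` table and the `batched`/`bulk_insert_matches` helpers are hypothetical. SQLite caps bound parameters per statement (999 before version 3.32), which is one reason batches should stay bounded.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` items (itertools.batched needs Python 3.12+)."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def bulk_insert_matches(conn: sqlite3.Connection, pairs, batch_size: int) -> int:
    """Insert (home_team_id, away_team_id) pairs; return the INSERT statement count."""
    statements = 0
    with conn:  # one transaction around all batches; rolls back on error
        for chunk in batched(pairs, batch_size):
            # One multi-row INSERT per batch: VALUES (?, ?), (?, ?), ...
            values = ", ".join(["(?, ?)"] * len(chunk))
            params = [value for pair in chunk for value in pair]
            conn.execute(
                f"INSERT INTO matches (home_team_id, away_team_id) VALUES {values}",
                params,
            )
            statements += 1
    return statements
```

For example, 1000 pairs with `batch_size=400` go out as three statements (400, 400 and 200 rows).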

### Step 3: Model-Level Optimizations

Add indexes for common filter patterns, plus helper methods that bundle the right loading strategy:

```python
from django.db import models
from django.db.models import Count, Prefetch, Q


class Scenario(models.Model):
    # ... fields ...

    class Meta:
        # Indexes for common query patterns
        indexes = [
            models.Index(fields=['season', 'is_active']),
            models.Index(fields=['created_at']),
            # Partial index for active scenarios only
            models.Index(
                fields=['name'],
                condition=models.Q(is_active=True),
                name='idx_active_scenario_name'
            ),
        ]

    # Query helpers with common prefetches (a custom Manager/QuerySet is an alternative)
    @classmethod
    def get_with_matches(cls, pk: int):
        """Get scenario with all matches pre-loaded."""
        return cls.objects.prefetch_related(
            Prefetch(
                'matches',
                queryset=Match.objects.select_related(
                    'home_team', 'away_team', 'day', 'kick_off_time'
                ).order_by('day__number', 'kick_off_time__time')
            )
        ).get(pk=pk)

    @classmethod
    def get_list_optimized(cls, season_id: int):
        """Optimized query for listing scenarios."""
        return cls.objects.filter(
            season_id=season_id
        ).select_related(
            'season'
        ).prefetch_related(
            Prefetch(
                'matches',
                queryset=Match.objects.only('id', 'scenario_id')
            )
        ).annotate(
            match_count=Count('matches'),
            confirmed_count=Count('matches', filter=Q(matches__is_confirmed=True)),
        ).order_by('-created_at')
```
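A partial index only helps queries whose WHERE clause implies the index condition. This sketch shows the rule with SQLite's `EXPLAIN QUERY PLAN` on a hypothetical `scenario` table carrying the same partial index; PostgreSQL applies the same requirement, since its planner must prove that the query's predicate implies the index predicate. In Django terms, `.filter(is_active=True, name=...)` should be able to use `idx_active_scenario_name`, while a filter on `name` alone cannot.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Hypothetical scenario table with a partial index like the Meta above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE scenario (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            is_active INTEGER NOT NULL
        );
        CREATE INDEX idx_active_scenario_name ON scenario (name) WHERE is_active = 1;
        """
    )
    return conn


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Join the detail column (last field) of all EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)
```

The plan text format varies between SQLite versions, so a check should look only for the index name rather than the full line.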

## Query Optimization Patterns

### Common Relationships in League-Planner

```
League
  └─ select_related: (none - top level)
  └─ prefetch_related: seasons, managers, spectators

Season
  └─ select_related: league
  └─ prefetch_related: teams, scenarios, memberships

Scenario
  └─ select_related: season, season__league
  └─ prefetch_related: matches, optimization_runs

Match
  └─ select_related: scenario, home_team, away_team, day, kick_off_time, stadium
  └─ prefetch_related: (usually none)

Team
  └─ select_related: season, country, stadium
  └─ prefetch_related: home_matches, away_matches, players

Draw
  └─ select_related: season
  └─ prefetch_related: groups, groups__teams, constraints

Group
  └─ select_related: super_group, super_group__draw
  └─ prefetch_related: teams, teamsingroup
```

### Aggregation Patterns

```python
from django.db.models import Count, Sum, Avg, Max, Min, F, Q

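# Count related objects. Counting two multi-valued relations in one query
# needs distinct=True; otherwise the JOINs multiply rows and inflate counts.
seasons = Season.objects.annotate(
    team_count=Count('teams', distinct=True),
    active_scenarios=Count(
        'scenarios', filter=Q(scenarios__is_active=True), distinct=True
    ),
)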

# Subquery for complex aggregations
from django.db.models import Subquery, OuterRef

latest_run = OptimizationRun.objects.filter(
    scenario=OuterRef('pk')
).order_by('-created_at')

scenarios = Scenario.objects.annotate(
    latest_score=Subquery(latest_run.values('score')[:1]),
    latest_run_date=Subquery(latest_run.values('created_at')[:1]),
)

# Window functions (PostgreSQL; also supported by other modern backends)
from django.db.models import Window
from django.db.models.functions import Rank

teams = Team.objects.annotate(
    season_rank=Window(
        expression=Rank(),
        partition_by=F('season'),
        order_by=F('points').desc(),
    )
)
```
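The `distinct=True` flag matters because each reverse relation adds its own LEFT JOIN, so rows multiply before they are counted. The sketch below reproduces that in `sqlite3` with hypothetical `season`/`team`/`scenario` tables: 3 teams times 2 scenarios give 6 joined rows, so plain `COUNT` reports 6 teams and 3 active scenarios instead of 3 and 1. The `CASE WHEN` expression stands in for the `filter=` argument.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """One season with 3 teams and 2 scenarios (1 active); hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE season (id INTEGER PRIMARY KEY);
        CREATE TABLE team (id INTEGER PRIMARY KEY, season_id INTEGER);
        CREATE TABLE scenario (id INTEGER PRIMARY KEY, season_id INTEGER, is_active INTEGER);
        INSERT INTO season VALUES (1);
        INSERT INTO team (season_id) VALUES (1), (1), (1);
        INSERT INTO scenario (season_id, is_active) VALUES (1, 1), (1, 0);
        """
    )
    return conn


# Two LEFT JOINs to different reverse relations, the shape of
# annotate(team_count=Count('teams'), active_scenarios=Count('scenarios', filter=...)).
SQL_TEMPLATE = """
    SELECT COUNT({d}team.id),
           COUNT({d}CASE WHEN scenario.is_active = 1 THEN scenario.id END)
    FROM season
    LEFT JOIN team ON team.season_id = season.id
    LEFT JOIN scenario ON scenario.season_id = season.id
    GROUP BY season.id
"""


def season_counts(conn: sqlite3.Connection, distinct: bool) -> tuple:
    """Return (team_count, active_scenarios) with or without DISTINCT."""
    sql = SQL_TEMPLATE.format(d="DISTINCT " if distinct else "")
    return conn.execute(sql).fetchone()
```

`distinct=True` repairs counts but not sums: `SUM(DISTINCT ...)` would also drop legitimately equal values, so sums across a multiplied JOIN are better computed in a `Subquery`.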

### PostgreSQL-Specific Optimizations

```python
# Array aggregation
from django.contrib.postgres.aggregates import ArrayAgg, StringAgg

seasons = Season.objects.annotate(
    team_names=ArrayAgg('teams__name', ordering='teams__name'),
    team_list=StringAgg('teams__name', delimiter=', '),
)

# JSON aggregation
from django.db.models import Q
from django.db.models.functions import JSONObject
from django.contrib.postgres.aggregates import JSONBAgg

scenarios = Scenario.objects.annotate(
    match_summary=JSONBAgg(
        JSONObject(
            id='matches__id',
            home='matches__home_team__name',
            away='matches__away_team__name',
        ),
        filter=Q(matches__is_final=True),
    )
)

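# Full-text search. The vector is computed per row here; for large tables,
# store it in a SearchVectorField backed by a GinIndex instead.
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector

vector = SearchVector('name', 'city', 'stadium__name')
query = SearchQuery('bayern munich')
teams = Team.objects.annotate(
    search=vector,
    rank=SearchRank(vector, query),
).filter(
    search=query
).order_by('-rank')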
```

### Caching Query Results

```python
from django.core.cache import cache

def get_season_teams(season_id: int) -> list:
    """Get teams with caching."""
    cache_key = f'season:{season_id}:teams'
    teams = cache.get(cache_key)

    if teams is None:
        # values() JOINs country on its own; select_related() is ignored with values()
        teams = list(
            Team.objects.filter(season_id=season_id)
            .values('id', 'name', 'country__name', 'country__code')
        )
        cache.set(cache_key, teams, timeout=300)  # 5 minutes

    return teams

# Invalidate whenever the season's teams or scenarios change
# (e.g. from post_save/post_delete signal handlers)
def invalidate_season_cache(season_id: int):
    cache.delete(f'season:{season_id}:teams')
    cache.delete(f'season:{season_id}:scenarios')
```

## Examples

### Example 1: Optimizing Schedule View

```python
from django.shortcuts import get_object_or_404, render


# BEFORE: ~500 queries for 300 matches
def schedule_view(request, scenario_id):
    scenario = Scenario.objects.get(pk=scenario_id)
    matches = scenario.matches.all()  # N+1 for each match's teams, day, etc.

    context = {'matches': matches}
    return render(request, 'schedule.html', context)


# AFTER: 2 queries (scenario JOINed with season and league, then all matches)
def schedule_view(request, scenario_id):
    scenario = get_object_or_404(
        Scenario.objects.select_related('season', 'season__league'),
        pk=scenario_id,
    )

    matches = Match.objects.filter(
        scenario=scenario
    ).select_related(
        'home_team',
        'home_team__country',
        'away_team',
        'away_team__country',
        'day',
        'kick_off_time',
        'stadium',
    ).order_by('day__number', 'kick_off_time__time')

    context = {
        'scenario': scenario,
        'matches': matches,
    }
    return render(request, 'schedule.html', context)
```

### Example 2: Bulk Match Creation

```python
# BEFORE: Slow - one INSERT per match
def create_matches(scenario, match_data_list):
    for data in match_data_list:
        Match.objects.create(scenario=scenario, **data)

# AFTER: Fast - bulk INSERT
from django.db import transaction

def create_matches(scenario, match_data_list):
    matches = [
        Match(
            scenario=scenario,
            home_team_id=data['home_team_id'],
            away_team_id=data['away_team_id'],
            day_id=data.get('day_id'),
            kick_off_time_id=data.get('kick_off_time_id'),
        )
        for data in match_data_list
    ]

    with transaction.atomic():
        Match.objects.bulk_create(matches, batch_size=500)

    return len(matches)
```

### Example 3: Complex Reporting Query

```python
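def get_season_report(season_id: int) -> dict:
    """Generate comprehensive season report with optimized queries."""
    from django.db.models import Avg, Count, OuterRef, Q, Subquery, Sum
    from django.db.models.functions import NullIf

    # Total away distance per team as a correlated subquery, so the sum is not
    # multiplied by the home_matches JOIN below
    away_distance = (
        Match.objects.filter(away_team=OuterRef('pk'))
        .order_by()
        .values('away_team')
        .annotate(total=Sum('distance'))
        .values('total')
    )

    # Single query for team statistics. Annotation names must not clash with
    # the reverse relations, and distinct=True keeps the two JOINs from
    # inflating each other's counts.
    team_stats = Team.objects.filter(
        season_id=season_id
    ).annotate(
        home_match_count=Count('home_matches', distinct=True),
        away_match_count=Count('away_matches', distinct=True),
        total_distance=Subquery(away_distance),
    ).values('id', 'name', 'home_match_count', 'away_match_count', 'total_distance')

    # Single query for scenario comparison; NullIf yields NULL instead of
    # dividing by zero for scenarios without matches
    scenarios = Scenario.objects.filter(
        season_id=season_id,
        is_active=True,
    ).annotate(
        match_count=Count('matches'),
        confirmed_pct=(
            Count('matches', filter=Q(matches__is_confirmed=True)) * 100.0
            / NullIf(Count('matches'), 0)
        ),
        avg_distance=Avg('matches__distance'),
    ).values('id', 'name', 'match_count', 'confirmed_pct', 'avg_distance')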

    # Matches by day aggregation
    matches_by_day = Match.objects.filter(
        scenario__season_id=season_id,
        scenario__is_active=True,
    ).values(
        'day__number'
    ).annotate(
        count=Count('id'),
    ).order_by('day__number')

    return {
        'teams': list(team_stats),
        'scenarios': list(scenarios),
        'matches_by_day': list(matches_by_day),
    }
```

## Common Pitfalls

- **select_related on M2M**: Only use for FK/O2O; use prefetch_related for M2M
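- **Filtering prefetched relations**: prefetched rows are cached per object, but calling `.filter()` on the relation (e.g. `scenario.matches.filter(...)`) bypasses the cache and runs a new query; filter in the `Prefetch` queryset instead, or filter `scenario.matches.all()` in Python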
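- **values() with related**: each `related__field` in `values()` adds a JOIN, and lookups across multi-valued relations return one row per related object
- **Forgetting batch_size**: without it, bulk_create/bulk_update can build one very large statement (bulk_update's CASE expressions grow with every object), which is slow and memory-hungry; set an explicit batch_size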
- **Ignoring database indexes**: Ensure fields in WHERE/ORDER BY have proper indexes

## Verification

Use Django Debug Toolbar, or count queries directly (in tests, `self.assertNumQueries(n)` performs the same check as an assertion):

```python
from django.db import connection
from django.test.utils import CaptureQueriesContext

# Records queries even with DEBUG=False (it forces a debug cursor);
# evaluate querysets inside the block, e.g. with list()
with CaptureQueriesContext(connection) as ctx:
    result = my_function()

# Check queries
print(f"Total queries: {len(ctx.captured_queries)}")
for q in ctx.captured_queries:
    print(f"{q['time']}s: {q['sql'][:100]}...")
```

Or with Silk profiler at `/silk/` when `USE_SILK=True`.