martin 30daee0c7e chore: Update Django version references from 5.2 to 6 across documentation and skills.

2026-02-05 14:28:23 +01:00


name	description	argument-hint	allowed-tools
lp-query-optimizer	Optimizes Django ORM queries for league-planner with select_related, prefetch_related, only/defer, bulk operations, and N+1 detection. Use when fixing slow queries.	<model-or-view-name>	Read, Write, Edit, Glob, Grep

League-Planner Query Optimizer

Optimizes Django 6 ORM queries following league-planner patterns: proper relationship loading, avoiding N+1 queries, using bulk operations, and leveraging PostgreSQL-specific features.

When to Use

Fixing slow database queries identified in logs or profiling
Optimizing views that load multiple related objects
Implementing bulk operations for large data sets
Reviewing query patterns before deployment

Prerequisites

Django Debug Toolbar or Silk profiler enabled for query analysis
Understanding of the model relationships in the app
Access to query logs (DEBUG=True or logging configured)

Instructions

Step 1: Identify the Problem

Enable query logging in settings:

# leagues/settings.py (development)
LOGGING = {
    'version': 1,  # required by logging.config.dictConfig
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console'],
        },
    },
}

Or use Django Debug Toolbar / Silk to identify:

Number of queries per request
Duplicate queries (N+1 problem)
Slow queries (>100ms)
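
The same three signals can be pulled out of a recorded query log by hand. The following is a minimal sketch, assuming the list-of-dicts shape that Django's connection.queries uses when DEBUG is on (each entry has 'sql' and 'time' keys). The helper names, the literal-stripping regexes, and the sample log are illustrative and not part of Django or league-planner.

```python
import re
from collections import Counter


def normalize_sql(sql):
    """Replace literals with '?' so queries that differ only in values group together."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)     # quoted string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)  # bare numeric literals
    return sql


def summarize_queries(queries, slow_threshold=0.1, min_repeats=3):
    """Report total count, repeated query shapes (likely N+1), and slow queries."""
    shapes = Counter(normalize_sql(q["sql"]) for q in queries)
    return {
        "total": len(queries),
        "repeated": {sql: n for sql, n in shapes.items() if n >= min_repeats},
        "slow": [q["sql"] for q in queries if float(q["time"]) > slow_threshold],
    }


# Hypothetical log: one list query followed by a per-row lookup (the N+1 shape)
sample_log = [
    {"sql": 'SELECT * FROM "team"', "time": "0.002"},
    {"sql": 'SELECT * FROM "season" WHERE "season"."id" = 1', "time": "0.001"},
    {"sql": 'SELECT * FROM "season" WHERE "season"."id" = 2', "time": "0.001"},
    {"sql": 'SELECT * FROM "season" WHERE "season"."id" = 3', "time": "0.250"},
]
report = summarize_queries(sample_log)
print(report["total"], report["repeated"], report["slow"])
```

In a Django shell the argument would be connection.queries; a query shape that repeats once per row of an earlier result is the usual sign of an N+1 problem.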

Step 2: Apply Optimization Patterns

Pattern 1: select_related for ForeignKey/OneToOne

# BAD: N+1 queries
teams = Team.objects.all()
for team in teams:
    print(team.season.league.name)  # 2 extra queries per team!

# GOOD: 1 query with JOINs
teams = Team.objects.select_related(
    'season',
    'season__league',
    'country',
).all()
for team in teams:
    print(team.season.league.name)  # No extra queries

Pattern 2: prefetch_related for ManyToMany/Reverse FK

from django.db.models import Prefetch

# BAD: N+1 queries
scenarios = Scenario.objects.all()
for scenario in scenarios:
    for match in scenario.matches.all():  # Extra query per scenario
        print(match.home_team.name)  # Extra query per match!

# GOOD: 2 queries total (scenarios + prefetched matches with teams)
scenarios = Scenario.objects.prefetch_related(
    Prefetch(
        'matches',
        queryset=Match.objects.select_related('home_team', 'away_team', 'day')
    )
).all()
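
To make the query counts concrete, here is a database-level sketch of roughly what prefetch_related does: one query for the parent rows, one IN query for all children, and the grouping done in Python. It uses the standard-library sqlite3 module with a made-up two-table schema; the table names, columns, and the run() wrapper are assumptions for illustration, not league-planner code.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scenario (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE matches (id INTEGER PRIMARY KEY, scenario_id INTEGER, label TEXT);
    INSERT INTO scenario VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO matches VALUES (1, 1, 'm1'), (2, 1, 'm2'), (3, 2, 'm3');
""")

executed = []  # every SQL string sent through run()


def run(sql, params=()):
    """Execute a statement and record it so the queries can be counted."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


def load_naive():
    """One query for the scenarios, then one more per scenario (N+1)."""
    return {
        name: [label for (label,) in run(
            "SELECT label FROM matches WHERE scenario_id = ? ORDER BY id", (sid,))]
        for sid, name in run("SELECT id, name FROM scenario ORDER BY id")
    }


def load_prefetched():
    """One query for the scenarios, one IN query for all of their matches."""
    scenarios = run("SELECT id, name FROM scenario ORDER BY id")
    ids = [sid for sid, _ in scenarios]
    placeholders = ", ".join("?" for _ in ids)
    grouped = defaultdict(list)
    rows = run(
        "SELECT scenario_id, label FROM matches "
        f"WHERE scenario_id IN ({placeholders}) ORDER BY id",
        ids,
    )
    for sid, label in rows:
        grouped[sid].append(label)  # the "join" happens here, in Python
    return {name: grouped[sid] for sid, name in scenarios}


naive = load_naive()
naive_count = len(executed)

executed.clear()
prefetched = load_prefetched()
prefetched_count = len(executed)

print(naive_count, prefetched_count)
```

Both loaders return the same data; only the number of round trips differs, and the naive count grows with the number of scenarios while the prefetched count stays at two.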

Pattern 3: Prefetch with Filtering

from django.db.models import Prefetch

# Prefetch only active matches with their related data
seasons = Season.objects.prefetch_related(
    Prefetch(
        'scenarios',
        queryset=Scenario.objects.filter(is_active=True).only('id', 'name', 'season_id')
    ),
    Prefetch(
        'teams',
        queryset=Team.objects.select_related('country').filter(is_active=True)
    ),
)

Pattern 4: only() and defer() for Partial Loading

# Load only needed fields (reduces memory and transfer)
teams = Team.objects.only(
    'id', 'name', 'abbreviation', 'country_id'
).select_related('country')

# Defer heavy fields
scenarios = Scenario.objects.defer(
    'description',  # Large text field
    'settings_json',  # Large JSON field
).all()

from django.db.models import Count, Sum

# values() followed by annotate() groups by home_team_id: one aggregated row per home team
team_stats = Match.objects.filter(
    scenario_id=scenario_id
).values(
    'home_team_id'
).annotate(
    total_home_matches=Count('id'),
    total_goals=Sum('home_goals'),
)

Pattern 5: Bulk Operations

from django.db import transaction
from django.utils import timezone

# BAD: N individual INSERTs
for data in items:
    Match.objects.create(**data)

# GOOD: Single bulk INSERT
with transaction.atomic():
    Match.objects.bulk_create([
        Match(**data) for data in items
    ], batch_size=1000)

# Bulk UPDATE
Match.objects.filter(
    scenario_id=scenario_id,
    is_confirmed=False
).update(
    is_confirmed=True,
    updated_at=timezone.now()
)

# Bulk UPDATE with different values
from django.db.models import Case, When, Value

Match.objects.filter(id__in=match_ids).update(
    status=Case(
        When(id=1, then=Value('confirmed')),
        When(id=2, then=Value('cancelled')),
        default=Value('pending'),
    )
)

# bulk_update for different values per object (available since Django 2.2)
matches = list(Match.objects.filter(id__in=match_ids))
for match in matches:
    match.status = calculate_status(match)

Match.objects.bulk_update(matches, ['status'], batch_size=500)
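
When the input is a generator or too large to hold as one list of model instances, the batching can also be done on the caller's side. The sketch below shows only the chunking logic in plain Python; chunked and insert_in_batches are hypothetical helper names, and save_batch stands in for whatever performs the actual write.

```python
from itertools import islice


def chunked(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def insert_in_batches(objects, save_batch, batch_size=500):
    """Hand each chunk to save_batch and return how many objects were sent."""
    total = 0
    for batch in chunked(objects, batch_size):
        save_batch(batch)
        total += len(batch)
    return total


# Stand-in for a bulk insert: record each batch instead of hitting a database
sent = []
count = insert_in_batches((n for n in range(1200)), sent.append, batch_size=500)
print(count, [len(batch) for batch in sent])
```

In a Django project, save_batch would presumably be a bound method such as Match.objects.bulk_create, so that only one chunk of unsaved instances exists in memory at a time.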

Step 3: Model-Level Optimizations

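Add indexes and query helpers to the model so that callers share the same optimized access paths:

from django.db import models
from django.db.models import Count, Prefetch, Q

class Scenario(models.Model):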
    # ... fields ...

    class Meta:
        # Indexes for common query patterns
        indexes = [
            models.Index(fields=['season', 'is_active']),
            models.Index(fields=['created_at']),
            # Partial index for active scenarios only
            models.Index(
                fields=['name'],
                condition=models.Q(is_active=True),
                name='idx_active_scenario_name'
            ),
        ]

    # Query helpers with common prefetches (classmethods, not a custom manager)
    @classmethod
    def get_with_matches(cls, pk: int):
        """Get scenario with all matches pre-loaded."""
        return cls.objects.prefetch_related(
            Prefetch(
                'matches',
                queryset=Match.objects.select_related(
                    'home_team', 'away_team', 'day', 'kick_off_time'
                ).order_by('day__number', 'kick_off_time__time')
            )
        ).get(pk=pk)

    @classmethod
    def get_list_optimized(cls, season_id: int):
        """Optimized query for listing scenarios."""
        return cls.objects.filter(
            season_id=season_id
        ).select_related(
            'season'
        ).prefetch_related(
            Prefetch(
                'matches',
                queryset=Match.objects.only('id', 'scenario_id')
            )
        ).annotate(
            match_count=Count('matches'),
            confirmed_count=Count('matches', filter=Q(matches__is_confirmed=True)),
        ).order_by('-created_at')

Query Optimization Patterns

Common Relationships in League-Planner

League
  └─ select_related: (none - top level)
  └─ prefetch_related: seasons, managers, spectators

Season
  └─ select_related: league
  └─ prefetch_related: teams, scenarios, memberships

Scenario
  └─ select_related: season, season__league
  └─ prefetch_related: matches, optimization_runs

Match
  └─ select_related: scenario, home_team, away_team, day, kick_off_time, stadium
  └─ prefetch_related: (usually none)

Team
  └─ select_related: season, country, stadium
  └─ prefetch_related: home_matches, away_matches, players

Draw
  └─ select_related: season
  └─ prefetch_related: groups, groups__teams, constraints

Group
  └─ select_related: super_group, super_group__draw
  └─ prefetch_related: teams, teamsingroup

Aggregation Patterns

from django.db.models import Count, Sum, Avg, Max, Min, F, Q

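# Count related objects; distinct=True is needed because two reverse
# relations are joined in one query, which would otherwise multiply the counts
seasons = Season.objects.annotate(
    team_count=Count('teams', distinct=True),
    active_scenarios=Count('scenarios', filter=Q(scenarios__is_active=True), distinct=True),
)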

# Subquery for complex aggregations
from django.db.models import Subquery, OuterRef

latest_run = OptimizationRun.objects.filter(
    scenario=OuterRef('pk')
).order_by('-created_at')

scenarios = Scenario.objects.annotate(
    latest_score=Subquery(latest_run.values('score')[:1]),
    latest_run_date=Subquery(latest_run.values('created_at')[:1]),
)

# Window functions (PostgreSQL)
from django.db.models import Window
from django.db.models.functions import Rank, RowNumber

teams = Team.objects.annotate(
    season_rank=Window(
        expression=Rank(),
        partition_by=F('season'),
        order_by=F('points').desc(),
    )
)
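
Counting across two different reverse relations in one annotate() call, such as teams and scenarios on Season, deserves care: both tables are joined in the same query, so every team row is repeated once per scenario. The sketch below reproduces the effect in SQL with the standard-library sqlite3 module; the schema is a simplified stand-in, not the real league-planner tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE season (id INTEGER PRIMARY KEY);
    CREATE TABLE team (id INTEGER PRIMARY KEY, season_id INTEGER);
    CREATE TABLE scenario (id INTEGER PRIMARY KEY, season_id INTEGER);
    INSERT INTO season VALUES (1);
    INSERT INTO team VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO scenario VALUES (1, 1), (2, 1);
""")

# Same shape as an annotate() with two Count() calls over two reverse relations
JOINED = """
    SELECT {team_count}, {scenario_count}
    FROM season
    LEFT JOIN team ON team.season_id = season.id
    LEFT JOIN scenario ON scenario.season_id = season.id
    GROUP BY season.id
"""

# 3 teams x 2 scenarios = 6 joined rows, so plain COUNT sees 6 of each
plain = conn.execute(JOINED.format(
    team_count="COUNT(team.id)",
    scenario_count="COUNT(scenario.id)",
)).fetchone()

# COUNT(DISTINCT ...) collapses the duplicated rows again
distinct = conn.execute(JOINED.format(
    team_count="COUNT(DISTINCT team.id)",
    scenario_count="COUNT(DISTINCT scenario.id)",
)).fetchone()

print(plain, distinct)
```

Count(..., distinct=True) maps to the second form. Sum and Avg have no equally safe counterpart, which is why a Subquery per relation is usually the better choice for them.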

PostgreSQL-Specific Optimizations

# Array aggregation
from django.contrib.postgres.aggregates import ArrayAgg, StringAgg

seasons = Season.objects.annotate(
    team_names=ArrayAgg('teams__name', order_by='teams__name'),  # 'ordering' is deprecated since Django 5.2
    team_list=StringAgg('teams__name', delimiter=', '),
)

# JSON aggregation
from django.db.models import Q
from django.db.models.functions import JSONObject
from django.contrib.postgres.aggregates import JSONBAgg

scenarios = Scenario.objects.annotate(
    match_summary=JSONBAgg(
        JSONObject(
            id='matches__id',
            home='matches__home_team__name',
            away='matches__away_team__name',
        ),
        filter=Q(matches__is_final=True),
    )
)

# Full-text search
from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank

teams = Team.objects.annotate(
    search=SearchVector('name', 'city', 'stadium__name'),
).filter(
    search=SearchQuery('bayern munich')
)

Caching Query Results

from django.core.cache import cache

def get_season_teams(season_id: int) -> list:
    """Get teams with caching."""
    cache_key = f'season:{season_id}:teams'
    teams = cache.get(cache_key)

    if teams is None:
        teams = list(
            # values() adds the JOIN to country itself; select_related() is ignored here
            Team.objects.filter(season_id=season_id)
            .values('id', 'name', 'country__name', 'country__code')
        )
        cache.set(cache_key, teams, timeout=300)  # 5 minutes

    return teams

# Invalidate on changes
def invalidate_season_cache(season_id: int):
    cache.delete(f'season:{season_id}:teams')
    cache.delete(f'season:{season_id}:scenarios')

Examples

Example 1: Optimizing Schedule View

# BEFORE: ~500 queries for 300 matches
def schedule_view(request, scenario_id):
    scenario = Scenario.objects.get(pk=scenario_id)
    matches = scenario.matches.all()  # N+1 for each match's teams, day, etc.

    context = {'matches': matches}
    return render(request, 'schedule.html', context)

# AFTER: 2 queries total (scenario + matches with JOINs)
def schedule_view(request, scenario_id):
    scenario = Scenario.objects.select_related(
        'season',
        'season__league',
    ).get(pk=scenario_id)

    matches = Match.objects.filter(
        scenario=scenario
    ).select_related(
        'home_team',
        'home_team__country',
        'away_team',
        'away_team__country',
        'day',
        'kick_off_time',
        'stadium',
    ).order_by('day__number', 'kick_off_time__time')

    context = {
        'scenario': scenario,
        'matches': matches,
    }
    return render(request, 'schedule.html', context)

Example 2: Bulk Match Creation

# BEFORE: Slow - one INSERT per match
def create_matches(scenario, match_data_list):
    for data in match_data_list:
        Match.objects.create(scenario=scenario, **data)

# AFTER: Fast - bulk INSERT
from django.db import transaction

def create_matches(scenario, match_data_list):
    matches = [
        Match(
            scenario=scenario,
            home_team_id=data['home_team_id'],
            away_team_id=data['away_team_id'],
            day_id=data.get('day_id'),
            kick_off_time_id=data.get('kick_off_time_id'),
        )
        for data in match_data_list
    ]

    with transaction.atomic():
        Match.objects.bulk_create(matches, batch_size=500)

    return len(matches)

Example 3: Complex Reporting Query

def get_season_report(season_id: int) -> dict:
    """Generate comprehensive season report with optimized queries."""
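    from django.db.models import Avg, Count, OuterRef, Q, Subquery, Sum
    from django.db.models.functions import NullIf

    # Away travel distance per team, computed in a subquery so the two
    # match joins below cannot inflate the sum
    away_distance = Match.objects.filter(
        away_team=OuterRef('pk')
    ).order_by().values('away_team').annotate(
        total=Sum('distance')
    ).values('total')

    # Single query for team statistics. Annotation names must differ from the
    # reverse relation names (home_matches, away_matches), and distinct=True
    # keeps the two joins from multiplying the counts.
    team_stats = Team.objects.filter(
        season_id=season_id
    ).annotate(
        home_match_count=Count('home_matches', distinct=True),
        away_match_count=Count('away_matches', distinct=True),
        total_distance=Subquery(away_distance),
    ).values('id', 'name', 'home_match_count', 'away_match_count', 'total_distance')

    # Single query for scenario comparison
    scenarios = Scenario.objects.filter(
        season_id=season_id,
        is_active=True,
    ).annotate(
        match_count=Count('matches'),
        # NullIf avoids a division by zero for scenarios without matches
        confirmed_pct=Count('matches', filter=Q(matches__is_confirmed=True)) * 100.0 / NullIf(Count('matches'), 0),
        avg_distance=Avg('matches__distance'),
    ).values('id', 'name', 'match_count', 'confirmed_pct', 'avg_distance')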

    # Matches by day aggregation
    matches_by_day = Match.objects.filter(
        scenario__season_id=season_id,
        scenario__is_active=True,
    ).values(
        'day__number'
    ).annotate(
        count=Count('id'),
    ).order_by('day__number')

    return {
        'teams': list(team_stats),
        'scenarios': list(scenarios),
        'matches_by_day': list(matches_by_day),
    }

Common Pitfalls

select_related on M2M: Only use for FK/O2O; use prefetch_related for M2M
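Re-filtering a prefetch: Prefetched results are cached on the instance; calling .filter() on the relation bypasses that cache and issues a new query, so filter inside Prefetch(queryset=...) instead
values() with related: Each values('related__field') lookup adds a JOIN, and multi-valued relations return one row per related object
Forgetting batch_size: bulk_create/bulk_update without batch_size send all objects in a single batch, which can produce very large statements and high memory use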
Ignoring database indexes: Ensure fields in WHERE/ORDER BY have proper indexes

Verification

Use Django Debug Toolbar, or run this snippet in a shell to count the queries a function issues:

from django.db import connection, reset_queries
from django.conf import settings

settings.DEBUG = True  # connection.queries is only recorded while DEBUG is on (development only)
reset_queries()

# Run your code here
result = my_function()

# Check queries
print(f"Total queries: {len(connection.queries)}")
for q in connection.queries:
    print(f"{q['time']}s: {q['sql'][:100]}...")

Or with Silk profiler at /silk/ when USE_SILK=True.
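
For repeated checks, the counting can be wrapped in a small context manager. This is a sketch that only assumes the connection object exposes a queries list, as Django's connection does while DEBUG is on; count_queries and the stand-in connection are illustrative names. For test suites, Django already ships assertNumQueries and django.test.utils.CaptureQueriesContext, which are the better fit there.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def count_queries(connection):
    """Yield a result object whose .count is filled in when the block exits."""
    result = SimpleNamespace(count=0)
    before = len(connection.queries)
    try:
        yield result
    finally:
        result.count = len(connection.queries) - before


# Stand-in connection so the sketch runs without a database
fake_connection = SimpleNamespace(queries=[{"sql": "SELECT 0", "time": "0.001"}])

with count_queries(fake_connection) as counter:
    # In real use this would be the view or function under inspection
    fake_connection.queries.append({"sql": "SELECT 1", "time": "0.001"})
    fake_connection.queries.append({"sql": "SELECT 2", "time": "0.001"})

print(counter.count)
```

Comparing counter.count before and after an optimization gives a quick regression check without opening the toolbar.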
