
Aurora PostgreSQL Severe Performance Degradation Under Concurrent Load

**Environment:**

- Database: AWS Aurora PostgreSQL
- ORM: SQLAlchemy
- API Framework: Python FastAPI

**Issue:**

I'm experiencing significant query performance degradation when my API receives concurrent requests. I ran a performance test comparing single execution vs. concurrent execution of the same query, and the results are concerning.

**Real-World Observations:**

While monitoring our production API endpoint during load tests with 100 concurrent users, I've observed the following:

- Run through pgAdmin without concurrent load, the same complex query consistently completes in ~60 ms.
- During periods of high concurrency (100 simultaneous users), response times for this same query become wildly inconsistent: some executions still complete in 60-100 ms, while others suddenly take up to 2 seconds.
- There is no clear pattern to which queries are slow.

**Test Results** (a minimal sketch of the test harness appears at the end of this post):

Single query execution time: 0.3098 seconds

Simulating 100 concurrent clients, all requests starting simultaneously:

- Total execution time: 32.7863 seconds
- Successful queries: 100 out of 100; failed queries: 0
- Average query time: 0.5591 s (559 ms)
- Min time: 0.2756 s; max time: 1.9853 s
- Queries exceeding the 500 ms threshold: 21 (21.0%)
- 50th percentile (median): 0.3114 s (311 ms)
- 95th percentile: 1.7712 s (1771 ms)
- 99th percentile: 1.9853 s (1985 ms)

With 100 concurrent threads:

- Each query takes ~12.4x longer on average (3.62 s vs. 0.29 s)
- Huge variance between the fastest (0.5 s) and slowest (4.8 s) query
- Overall throughput is ~17.2 queries/second (better than sequential, but still concerning)

**Query Details:**

The query is moderately complex, involving several JOINs across multiple tables, a subquery using EXISTS, and ORDER BY and LIMIT clauses.

**My Setup**

**SQLAlchemy Configuration:**
from sqlalchemy.ext.asyncio import (
    AsyncSession,
    async_sessionmaker,
    create_async_engine,
)

engine = create_async_engine(
    settings.ASYNC_DATABASE_URL,
    echo=settings.SQL_DEBUG,
    pool_pre_ping=True,   # test each connection with a ping on checkout
    pool_use_lifo=True,   # hand out the most recently returned connection first
    pool_size=20,         # persistent connections held in the pool
    max_overflow=100,     # pool_size + max_overflow = 120 connections max per process
    pool_timeout=30,      # seconds to wait for a free connection before raising
    pool_recycle=30,      # discard and reopen connections older than 30 seconds
)

AsyncSessionLocal = async_sessionmaker(
    bind=engine,
    class_=AsyncSession,
    expire_on_commit=False,  # keep ORM objects usable after commit
    autoflush=False,         # flush explicitly rather than before each query
    # note: the legacy autocommit flag does not exist in SQLAlchemy 2.0,
    # which is where async_sessionmaker was introduced
)
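As a diagnostic aside: to see whether the pool itself saturates during the test, the underlying QueuePool can be sampled while the load runs. A minimal sketch, assuming the `engine` defined above (the AsyncEngine's sync facade exposes the pool):

import asyncio

async def sample_pool_status(interval: float = 1.0) -> None:
    """Periodically print pool state: size, checked-out connections, overflow."""
    while True:
        # The AsyncEngine wraps a sync Engine whose QueuePool has a status() summary.
        print(engine.sync_engine.pool.status())
        await asyncio.sleep(interval)

Running this alongside the load test (e.g. via `asyncio.create_task(sample_pool_status())`) shows whether checkouts pile up against the pool_size=20 limit and spill into overflow.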
**FastAPI Dependency:**
from collections.abc import AsyncGenerator

async def get_db() -> AsyncGenerator[AsyncSession, None]:
    """Yield a database session, committing on success and rolling back on error."""
    async with AsyncSessionLocal() as session:
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise
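For completeness, endpoints consume this dependency in the usual FastAPI way; here is a minimal sketch (the path and statement are placeholders, not my real endpoint, which runs the JOIN/EXISTS query described above):

from fastapi import Depends, FastAPI
from sqlalchemy import text

app = FastAPI()

@app.get("/items")  # placeholder route; the real one executes the complex query
async def list_items(db: AsyncSession = Depends(get_db)):
    # Placeholder statement standing in for the JOIN/EXISTS query.
    result = await db.execute(text("SELECT 1"))
    return {"value": result.scalar_one()}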
**Questions:**

- **Connection Pool Settings:** Are my SQLAlchemy pool settings appropriate for handling 100 concurrent requests? What would be optimal?
- **Aurora Configuration:** Which Aurora PostgreSQL parameters should I tune to improve concurrent query performance?
- **Query Optimization:** Is there a standard approach to optimizing complex queries with JOINs and EXISTS subqueries for better concurrency?
- **ORM vs. Raw SQL:** Would bypassing the SQLAlchemy ORM help performance?

Any guidance or best practices would be greatly appreciated. I'd be happy to provide additional details if needed.

**Update:**

**Hardware Configuration**

1. Aurora regional cluster with 1 instance
2. Capacity Type: Provisioned (Min: 0.5 ACUs / 1 GiB, Max: 16 ACUs / 32 GiB)
3. Storage Config: Standard

**Performance Insights**

1. Max ACU utilization: 70%
2. Max CPU utilization: 45%
3. Max DB connections: 111
4. EBS IO balance: 100%
5. Buffer cache hit ratio: 100%
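As referenced in the test results above, the single-vs-concurrent comparison can be driven by a harness along these lines. This is a minimal sketch assuming the `AsyncSessionLocal` factory defined earlier; the query is a placeholder for the real statement:

import asyncio
import statistics
import time

from sqlalchemy import text

QUERY = text("SELECT 1")  # placeholder for the real JOIN/EXISTS query
N_CLIENTS = 100

async def timed_query() -> float:
    """Open a session, run the query once, and return the elapsed wall time."""
    start = time.perf_counter()
    async with AsyncSessionLocal() as session:
        await session.execute(QUERY)
    return time.perf_counter() - start

async def main() -> None:
    # Baseline: one isolated execution.
    print(f"Single query execution time: {await timed_query():.4f} seconds")

    # Concurrent run: all clients start at (roughly) the same time.
    start = time.perf_counter()
    times = await asyncio.gather(*(timed_query() for _ in range(N_CLIENTS)))
    total = time.perf_counter() - start

    times = sorted(times)
    # statistics.quantiles with n=100 yields 99 cut points; indexes 49/94/98
    # correspond to the 50th, 95th, and 99th percentiles.
    p50, p95, p99 = (statistics.quantiles(times, n=100)[i] for i in (49, 94, 98))
    print(f"Total: {total:.4f}s  avg: {statistics.mean(times):.4f}s")
    print(f"min: {times[0]:.4f}s  max: {times[-1]:.4f}s")
    print(f"p50: {p50:.4f}s  p95: {p95:.4f}s  p99: {p99:.4f}s")

asyncio.run(main())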
Asked by Abhishek Tyagi (11 rep)
May 20, 2025, 07:18 PM
Last activity: May 21, 2025, 02:50 PM