
Guide to Analyze and Optimize SQL Queries

In today's fast-paced, data-driven landscape, understanding the hidden mechanics of your SQL queries is paramount for building reliable applications. Poorly optimized queries lead to sluggish response times, resource overload, and ultimately, frustrated users.

This guide provides a practical approach to analyzing and optimizing your SQL queries for improved performance and data retrieval efficiency.

Analyzing your SQL Queries 

Almost all SQL databases offer profiling tools that show where a query spends its time and give a comprehensive picture of its execution. Here, MySQL with NodeJS will serve as the example stack for demonstration.

In MySQL, EXPLAIN ANALYZE is the profiling tool. It executes the query, records the query plan, counts rows, and measures the time spent at each execution stage. After execution, it reports crucial metrics such as table scans, per-step time consumption, and estimated versus retrieved rows, all vital for query optimization.

How to use EXPLAIN ANALYZE

Consider that a public library has two tables: 

  1. A user table to keep track of library users 
  2. A book table to record each book that a user has added 

All SQL code can be found in the references section. 

Initially, the book table has 10 entries and the user table has 1 entry.

This is a snippet from the profiling tool after execution: 

EXPLAIN ANALYZE SELECT * FROM book;

-> Table scan on book  (cost=1.25 rows=10) (actual time=0.0332..0.0377 rows=10 loops=1)

This tells us two things: 

  1. The cost and rows estimated by the query optimizer before executing the query, i.e. cost=1.25, rows=10. Note that cost is expressed in the optimizer's own arbitrary units, not milliseconds. The query optimizer is responsible for planning which indexes or table scans to use to produce the desired output in minimal time. 
  2. The actual time, rows, and number of loops taken, i.e. actual time=0.0332..0.0377 ms, rows=10, loops=1. Here, actual time is the most interesting metric: 0.0332 ms is the average time to return the first row, and 0.0377 ms is the average time to return all rows, per loop iteration. Since there is only 1 loop, these figures describe a single scan. 

Upon joining the book and user tables with:

SELECT * FROM book INNER JOIN user ON user.id = book.created_by;

-> Inner hash join (book.created_by = user.id)  (cost=1.6 rows=1) (actual time=0.0795..0.0953 rows=10 loops=1)
    -> Table scan on book  (cost=0.351 rows=10) (actual time=0.0203..0.0299 rows=10 loops=1)
    -> Hash
        -> Table scan on user  (cost=0.35 rows=1) (actual time=0.0281..0.0334 rows=1 loops=1)

We see table scans on both tables, feeding a hash join on user.id = book.created_by. Each step carries its own cost and timing figures. Remember, a “table scan” traverses the entire table, while an “index scan” uses a pre-built index structure for faster retrieval.

Optimizing your SQL Queries 

Identifying the most time-intensive parts of a query is a significant milestone in itself.

With this intelligence in hand, the following optimization techniques can be applied:

  1. Filtering Prior to Joining: When tables hold over a million records, joining before filtering means substantial time is spent performing the join and only then filtering the result set. It is usually optimal to filter the result set first and then execute the required joins (see the first sketch after this list). 
  2. Index mastery: Indexes should be utilized, and their use can be enforced if the query optimizer fails to select them for any reason. While using indexes, keep the matching criteria below roughly 30% of the table for efficient operation: the indexed parameters in the WHERE clause should select fewer than about 30% of the table's rows. Above that threshold, a full table scan often proves more effective than an index-based scan (see the index sketch after this list). 
  3. Parallelization of operations: For APIs executing multiple DB calls, opting for parallel DB calls instead of sequential execution is recommended. In NodeJS, tools like Promise.allSettled([]) or Promise.all([]) execute asynchronous calls simultaneously (see the NodeJS sketch after this list). 
  4. Strengthening connection lines: With multiple concurrent DB calls, increase the number of DB connections via the ODM/ORM library in use. Keeping, say, 10 open connections idle in a pool ensures each of 10 parallel queries receives a dedicated connection, eliminating waits for an available connection and the time taken to establish new ones (the NodeJS sketch below configures such a pool). 
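
To illustrate point 1, here is a minimal sketch; the created_at column and the date literal are assumptions for illustration and do not appear in the article's schema:

-- Assumed column: book.created_at (illustration only).
-- Join-then-filter: the join touches every book row first.
SELECT *
FROM book
INNER JOIN user ON user.id = book.created_by
WHERE book.created_at >= '2024-01-01';

-- Filter-then-join: only the surviving rows are joined.
SELECT *
FROM (SELECT * FROM book WHERE created_at >= '2024-01-01') AS recent_book
INNER JOIN user ON user.id = recent_book.created_by;

MySQL's optimizer often pushes such predicates down on its own, so run EXPLAIN ANALYZE on both forms to confirm the rewrite actually changes the plan.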
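
For point 2, a sketch of adding an index on the join column and, if needed, forcing its use; the index name is an assumption:

CREATE INDEX idx_book_created_by ON book (created_by);

-- If the optimizer ignores the index, MySQL can be told to use it:
SELECT * FROM book FORCE INDEX (idx_book_created_by)
WHERE created_by = 1;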
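
For points 3 and 4, a minimal NodeJS sketch, assuming the mysql2/promise driver; the connection settings and queries are illustrative, not from the original article:

const mysql = require('mysql2/promise');

// Pool of up to 10 open connections, matching the suggestion above.
const pool = mysql.createPool({
  host: 'localhost',
  user: 'app',          // assumed credentials
  database: 'library',
  connectionLimit: 10,
});

async function loadDashboard() {
  // Promise.all fires both queries at once instead of awaiting them in turn;
  // each query grabs its own connection from the pool.
  const [[books], [users]] = await Promise.all([
    pool.query('SELECT * FROM book'),
    pool.query('SELECT * FROM user'),
  ]);
  return { books, users };
}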

References 

  • Creating the book and user tables (the original code listing was an image):
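
A minimal sketch of what these tables could look like, reconstructed from the columns the article actually uses (user.id and book.created_by); every other column is an assumption:

CREATE TABLE user (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255)                 -- assumed column
);

CREATE TABLE book (
  id INT AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(255),               -- assumed column
  created_by INT,                   -- used in the article's join
  FOREIGN KEY (created_by) REFERENCES user (id)
);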

  • Stored procedure to populate the book table (the original code listing was an image):
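
A hedged sketch of such a procedure; the loop bound of 10 matches the article's initial data, while the procedure name and generated values are assumptions:

DELIMITER //
CREATE PROCEDURE populate_books()   -- name is an assumption
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i <= 10 DO                  -- the article starts with 10 book entries
    INSERT INTO book (title, created_by)
    VALUES (CONCAT('Book ', i), 1); -- single user with id = 1
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;

CALL populate_books();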

At CoReCo Technologies, our focus lies in utilizing technology to solve real-world issues and add value to end-users. Throughout the solutioning phase, our primary focus remains on problem-solving rather than the technology itself. For us, technology is a means to an end, not the final goal. Additionally, we go the extra mile to find optimal solutions within the given constraints such as cost and time.  

As of January 2024, we have served 60+ global customers with 100+ digital transformation projects successfully executed. For more details, please visit us at www.corecotechnologies.com or write to us at [email protected]. 

 

Kshitij Gajbhiye 

Software Engineer 

CoReCo Technologies Private Limited 

 
