
Prompting Strategies to Optimize LLM Generated Results for LangChain-LLM SQL Database Bots

Artificial intelligence is advancing quickly, and large language models (LLMs) have become remarkably capable at understanding and generating natural language. These AI-powered systems can generate SQL queries, retrieve and manipulate data, and even offer insights based on user prompts. However, to maximize the effectiveness of these bots, it’s crucial to implement strategies that optimize the quality and relevance of LLM-generated results. There are several essential components that require optimization, but we’ll focus on prompting strategies for now.

Dialect-Specific Prompting

One simple strategy is to tailor the prompt to the required SQL dialect, using the dialect-specific prompts that LangChain provides.

_mysql_prompt = """ 
You are a MySQL expert. Given an input question, first create a syntactically correct MySQL query to run, then look at the results of the query and return the answer to the input question. 

Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per MySQL. You can order the results to return the most informative data in the database.

Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in backticks (`) to denote them as delimited identifiers.

Pay attention to use only the column names you can see in the tables below. Be careful not to query for columns that do not exist. Also, pay attention to which column is in which table.

Pay attention to use CURDATE() function to get the current date, if the question involves "today".

Use the following format:

Question: Question here
SQLQuery: SQL Query to run
SQLResult: Result of the SQLQuery
Answer: Final answer here
"""

This is supported for the following SQL dialects when using the built-in create_sql_query_chain and SQLDatabase: ‘crate’, ‘duckdb’, ‘googlesql’, ‘mssql’, ‘mysql’, ‘mariadb’, ‘oracle’, ‘postgresql’, ‘sqlite’, ‘clickhouse’, ‘prestodb’.

Table Information (Examples of Rows)

The model needs at least some schema information to write relevant SQL queries. The SQLDatabase wrapper exposes convenient methods that supply this context, such as table definitions and example rows.

# Gather schema context (table DDL plus sample rows) from the SQLDatabase wrapper
context = db.get_context()
print(context["table_info"])

Output:

CREATE TABLE "Student" (
    "StudentId" INTEGER NOT NULL,
    "FirstName" NVARCHAR(50) NOT NULL,
    "LastName" NVARCHAR(50) NOT NULL,
    "DateOfBirth" DATE NOT NULL,
    "EnrollmentDate" DATE NOT NULL,
    PRIMARY KEY ("StudentId")
);

/* 3 rows from Student table:
StudentId  FirstName  LastName  DateOfBirth  EnrollmentDate
1          John       Doe       2000-01-15   2023-06-01
2          Jane       Smith     1999-11-20   2023-06-02
3          Alice      Johnson   2001-05-10   2023-06-03
*/

When the database schema is too large to fit in the model’s context window, we need to selectively include only the relevant table definitions based on user input.

Few-Shot Examples

Including a few examples of SQL queries based on natural language inputs can significantly enhance the model’s performance, especially for more complex queries.

examples = [
    {
        "input": "List all courses that have more than 3 credits.",
        "query": "SELECT course_name, course_description FROM courses WHERE credits > 3;"
    },
    {
        "input": "Find all students born after January 1, 2000.",
        "query": "SELECT first_name, last_name FROM students WHERE date_of_birth > '2000-01-01' ORDER BY date_of_birth;"
    }
]
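These examples are ultimately rendered into the Question/SQLQuery format the prompt template expects. A minimal sketch of that rendering step (the helper name is illustrative; LangChain's FewShotPromptTemplate handles this for you):

```python
# One example in the same shape as the list above.
examples = [
    {
        "input": "List all courses that have more than 3 credits.",
        "query": "SELECT course_name, course_description FROM courses WHERE credits > 3;",
    },
]

def format_examples(examples: list) -> str:
    """Render examples in the Question/SQLQuery layout used by the prompt."""
    return "\n\n".join(
        f"Question: {ex['input']}\nSQLQuery: {ex['query']}" for ex in examples
    )
```

The resulting text is placed between the instructions and the user's question, giving the model concrete demonstrations of the expected output format.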

Dynamic Few-Shot Examples

When dealing with large sets of examples, it’s often best to include only the most relevant ones in the prompt. This can be achieved by using an ExampleSelector, such as SemanticSimilarityExampleSelector, which retrieves examples most similar to the input.

prompt.format(input="List all students and their enrolled courses.", top_k=4, table_info="Student")
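SemanticSimilarityExampleSelector embeds the example inputs and retrieves the nearest neighbours of the incoming question. The sketch below illustrates the select-the-k-most-similar idea using simple word overlap (Jaccard similarity) as a cheap stand-in for embedding similarity; all names here are illustrative, not LangChain APIs:

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity: a cheap stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def select_examples(question: str, examples: list, k: int = 2) -> list:
    """Return the k examples whose inputs are most similar to the question."""
    ranked = sorted(examples, key=lambda ex: jaccard(question, ex["input"]), reverse=True)
    return ranked[:k]

examples = [
    {"input": "List all courses that have more than 3 credits.", "query": "..."},
    {"input": "Find all students born after January 1, 2000.", "query": "..."},
    {"input": "Count invoices issued last month.", "query": "..."},
]
best = select_examples("List students born in 2000", examples, k=1)
```

Only the selected examples are injected into the prompt, so the context window is spent on demonstrations that actually resemble the user's question.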

Conclusion

These prompting strategies significantly improve the effectiveness of LangChain-LLM SQL database bots. They help the model better understand the underlying database and deliver more accurate responses, minimizing the chances of generating made-up answers. Selecting the right prompting strategy for your database and implementing it can enhance the performance and reliability of your LLM-powered applications.

For more details on how CoReCo Technologies can help optimize your AI and LLM-based solutions, visit us at www.corecotechnologies.com. If you’re interested in a collaboration, please reach out to Yogesh Hasabe at [email protected].

Yogesh Hasabe
