
Prompting Strategies to Optimize LLM Generated Results for LangChain-LLM SQL Database Bots

Artificial intelligence is advancing quickly, and large language models (LLMs) have become remarkably capable at understanding and generating natural language. These AI-powered systems can generate SQL queries, retrieve and manipulate data, and even offer insights based on user prompts. However, to maximize the effectiveness of these bots, it’s crucial to implement strategies that optimize the quality and relevance of LLM-generated results. There are several essential components that require optimization, but we’ll focus on prompting strategies for now.

Dialect-Specific Prompting

One simple strategy is to tailor the prompt to the required SQL dialect, using the dialect-specific prompts that LangChain provides.

_mysql_prompt = """ 
You are a MySQL expert. Given an input question, first create a syntactically correct MySQL query to run, then look at the results of the query and return the answer to the input question. 

Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per MySQL. You can order the results to return the most informative data in the database.

Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in backticks (`) to denote them as delimited identifiers.

Pay attention to use only the column names you can see in the tables below. Be careful not to query for columns that do not exist. Also, pay attention to which column is in which table.

Pay attention to use CURDATE() function to get the current date, if the question involves "today".

Use the following format:

Question: Question here
SQLQuery: SQL Query to run
SQLResult: Result of the SQLQuery
Answer: Final answer here
"""

This is supported for the following SQL dialects when using the built-in create_sql_query_chain and SQLDatabase: ‘crate’, ‘duckdb’, ‘googlesql’, ‘mssql’, ‘mysql’, ‘mariadb’, ‘oracle’, ‘postgresql’, ‘sqlite’, ‘clickhouse’, ‘prestodb’.

Table Information (Examples of Rows)

The model needs at least some schema information to write relevant SQL queries. The SQLDatabase wrapper exposes convenient methods that supply this context, such as table definitions and example rows.

# Gather schema context (table DDL plus sample rows) from the SQLDatabase wrapper
context = db.get_context()
print(context["table_info"])

Output:

CREATE TABLE "Student" (
    "StudentId" INTEGER NOT NULL,
    "FirstName" NVARCHAR(50) NOT NULL,
    "LastName" NVARCHAR(50) NOT NULL,
    "DateOfBirth" DATE NOT NULL,
    "EnrollmentDate" DATE NOT NULL,
    PRIMARY KEY ("StudentId")
);

/* 3 rows from Student table:
StudentId  FirstName  LastName  DateOfBirth  EnrollmentDate
1          John       Doe       2000-01-15   2023-06-01
2          Jane       Smith     1999-11-20   2023-06-02
3          Alice      Johnson   2001-05-10   2023-06-03
*/

When the database schema is too large to fit in the model’s context window, we need to selectively include only the relevant table definitions based on user input.

Few-Shot Examples

Including a few examples of SQL queries based on natural language inputs can significantly enhance the model’s performance, especially for more complex queries.

examples = [
    {
        "input": "List all courses that have more than 3 credits.",
        "query": "SELECT course_name, course_description FROM courses WHERE credits > 3;"
    },
    {
        "input": "Find all students born after January 1, 2000.",
        "query": "SELECT first_name, last_name FROM students WHERE date_of_birth > '2000-01-01' ORDER BY date_of_birth;"
    }
]
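These examples are ultimately rendered into the Question/SQLQuery format the prompt template expects. A minimal sketch of that rendering step (the helper name is illustrative; LangChain's FewShotPromptTemplate handles this for you):

```python
# One example in the same shape as the list above.
examples = [
    {
        "input": "List all courses that have more than 3 credits.",
        "query": "SELECT course_name, course_description FROM courses WHERE credits > 3;",
    },
]

def format_examples(examples: list) -> str:
    """Render examples in the Question/SQLQuery layout used by the prompt."""
    return "\n\n".join(
        f"Question: {ex['input']}\nSQLQuery: {ex['query']}" for ex in examples
    )
```

The resulting text is placed between the instructions and the user's question, giving the model concrete demonstrations of the expected output format.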

Dynamic Few-Shot Examples

When dealing with large sets of examples, it’s often best to include only the most relevant ones in the prompt. This can be achieved by using an ExampleSelector, such as SemanticSimilarityExampleSelector, which retrieves examples most similar to the input.

prompt.format(input="List all students and their enrolled courses.", top_k=4, table_info="Student")
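SemanticSimilarityExampleSelector embeds the example inputs and retrieves the nearest neighbours of the incoming question. The sketch below illustrates the select-the-k-most-similar idea using simple word overlap (Jaccard similarity) as a cheap stand-in for embedding similarity; all names here are illustrative, not LangChain APIs:

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity: a cheap stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def select_examples(question: str, examples: list, k: int = 2) -> list:
    """Return the k examples whose inputs are most similar to the question."""
    ranked = sorted(examples, key=lambda ex: jaccard(question, ex["input"]), reverse=True)
    return ranked[:k]

examples = [
    {"input": "List all courses that have more than 3 credits.", "query": "..."},
    {"input": "Find all students born after January 1, 2000.", "query": "..."},
    {"input": "Count invoices issued last month.", "query": "..."},
]
best = select_examples("List students born in 2000", examples, k=1)
```

Only the selected examples are injected into the prompt, so the context window is spent on demonstrations that actually resemble the user's question.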

Conclusion

These prompting strategies significantly improve the effectiveness of LangChain-LLM SQL database bots. They help the model better understand the underlying database and deliver more accurate responses, minimizing the chances of generating made-up answers. Selecting the right prompting strategy for your database and implementing it can enhance the performance and reliability of your LLM-powered applications.

For more details on how CoReCo Technologies can help optimize your AI and LLM-based solutions, visit us at www.corecotechnologies.com. If you’re interested in a collaboration, please reach out to Yogesh Hasabe at [email protected].

Yogesh Hasabe
