Ask Your File: Query CSV or Parquet Using AI- NL2SQL

Imagine being able to upload a .csv or .parquet file and simply ask:
“What is the average transaction amount?”
“Which product had the highest sales?”
— without writing a single line of SQL.

That’s what Ask Your File does. It’s a lightweight, AI-powered Natural Language to SQL (NL2SQL) web app that lets you explore your own data using natural language. Here’s how I built it, and why each component matters.


What the App Does
  • Upload a CSV or Parquet file
  • Ask a question in plain English
  • The app:
    • Loads your data into memory
    • Uses OpenAI to generate a SQL query
    • Executes it with DuckDB
    • Shows both the answer and the generated SQL

No database setup, no SQL editor, no dashboarding tools — just one file, one prompt.


Tools and Components Used

1. OpenAI (GPT-4) – Natural Language to SQL

At the heart of the app is an OpenAI model (like GPT-4), which interprets your question and converts it into SQL.

“What is the total revenue by product?”
is translated to something like:

SELECT product, SUM(revenue) FROM data GROUP BY product

This is what allows non-technical users to interact with structured data using natural language.


2. LangChain SQL Agent – Schema Awareness + Query Orchestration

LangChain is the framework that connects everything. It reads the schema from your uploaded file, constructs prompts, and handles communication with the language model.

It also shows you the actual SQL query generated, which is useful for transparency and learning.


3. DuckDB – Instant, In-Memory SQL Execution

DuckDB is a fast, zero-setup database engine designed for analytics. It can read both CSV and Parquet directly, without needing to import or transform the data.

  • No database servers
  • No configuration
  • Just run SQL directly on the file you uploaded

Perfect for quick, local analytics.


4. LangChain Memory – Conversational Follow-Ups

The app uses ConversationBufferMemory so you can ask follow-up questions like:

“How about just for 2023?”

This enables a smoother, more conversational experience — much closer to how a human analyst would work with you.


5. Streamlit – Simple Web App Interface

Streamlit powers the front-end of the app, including:

  • File upload (CSV/Parquet)
  • A text box for your question
  • Display of both the result and the generated SQL

Streamlit makes it easy to deploy and share — no front-end code needed.


6. Streamlit Secrets – Secure API Key Handling

The app reads the OpenAI API key from Streamlit’s secure secrets system. This keeps your credentials safe and out of the codebase.

tomlCopyEdit# .streamlit/secrets.toml
OPENAI_API_KEY = "sk-xxxxxxxxxxxxxxxxxx"

7. Git + GitHub – Version Control & Sharing

The full project is tracked in Git and published on GitHub for easy collaboration and open-source sharing. It also serves as the deployment base for Streamlit Cloud.


Example Questions You Can Ask

Try uploading any CSV or Parquet file and asking:

  • “Which customer spent the most?”
  • “How many records are in the file?”
  • “What is the average order quantity?”
  • “List the top 3 categories by sales.”

The app responds with both the result and the SQL that powered it.


Try It Yourself

You can try the app live here:

👉 https://langchain-csv-agent-hzi8wj2vhbzmnomucfiilz.streamlit.app/


Why This Matters

This project shows how easy it is to build AI-assisted analytics using modern tools. Instead of building full dashboards, users can ask questions and get instant insight — even on raw files.

It’s not just a cool demo — it’s a real example of how AI is making data more accessible for everyone.


Tech Stack Summary
  • OpenAI – Natural language understanding
  • LangChain – Agent + memory + prompt logic
  • DuckDB – SQL engine for CSV/Parquet
  • Streamlit – Web app and UI
  • GitHub + Streamlit Cloud – Deployment