Instructor Library: How to Get Structured JSON From LLMs
Instructor is a Python library for getting structured data, such as JSON (JavaScript Object Notation, a standard format for sharing data), consistently from AI models. It uses Pydantic (a data validation library for Python) to check every response against rules you define, and it can ask the model to try again when a response fails. This can save hours of manual error handling per project.
How does Instructor solve the "Chatty AI" problem?
When you ask an AI model like Claude Sonnet 4 to "return a list of users in JSON," it often adds conversational filler like "Sure! Here is your list:". This extra text breaks your code because computers expect raw data, not a friendly chat.
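To make the failure concrete, here is a minimal sketch that uses only the standard library. The `chatty_reply` string and the `try_parse` helper are made up for illustration; the string is not real model output.

```python
import json

# Hypothetical model reply: valid JSON wrapped in conversational filler.
chatty_reply = 'Sure! Here is your list:\n[{"name": "Sarah", "age": 28}]'


def try_parse(text):
    """Return the parsed JSON, or None if the text is not pure JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# The friendly preamble makes the whole reply unparseable.
print(try_parse(chatty_reply))  # None

# The same data without the filler parses cleanly.
print(try_parse('[{"name": "Sarah", "age": 28}]'))
```

The data itself was fine in both cases; only the extra sentence in front of it caused the failure.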
Instructor acts as a bridge between your Python code and the AI. It uses a technique called function calling (a way for AI to trigger specific code structures) to force the model to fill out a form instead of writing a paragraph. If the AI makes a mistake, Instructor can automatically catch the error and ask the AI to fix it before your program even sees it.
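The "form" is typically described as a JSON Schema. The sketch below shows the schema that Pydantic (version 2 is assumed) generates for a small model. `Person` is a hypothetical model used only here, and how Instructor packages the schema for each AI provider is not shown.

```python
from pydantic import BaseModel, Field


# Hypothetical model used only for this illustration.
class Person(BaseModel):
    name: str = Field(description="The person's full name")
    age: int = Field(description="The person's age in years")


# Pydantic v2 can describe the model as a JSON Schema dictionary.
schema = Person.model_json_schema()

print(schema["properties"]["age"]["type"])  # integer
print(schema["required"])                   # ['name', 'age']
```

Each field becomes a labeled slot with a type, which is what lets the model fill in values rather than write free text.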
This process transforms a messy, unpredictable text generator into a reliable data engine. You no longer need to write complex "Regular Expressions" (patterns used to find specific text) just to pull a single number or name out of a long AI response.
What do you need to get started?
Before writing your first script, you need a basic environment set up on your computer. Don't worry if you haven't done this in a while; the process is straightforward.
What You'll Need:
- Python 3: A recent stable release, such as Python 3.14. Instructor also runs on older Python 3 versions, so you do not need the very latest one. You can download it from python.org.
- An API Key: You will need a key from Anthropic (for Claude) or OpenAI (for GPT-5).
- A Code Editor: We recommend VS Code or Cursor for the best experience.
To install the necessary libraries, open your terminal (the command-line interface on your computer) and run:
# Install the instructor library and the anthropic client
pip install -U instructor anthropic pydantic
How do you define your data structure?
The heart of Instructor is the Pydantic model. Think of this as a "blueprint" or a digital form that you want the AI to fill out. You define exactly what pieces of information you want and what type they should be (like a string of text or a whole number).
In our experience, spending an extra minute defining clear descriptions in your blueprint makes the AI significantly more accurate.
from pydantic import BaseModel, Field
# This is your blueprint
class UserInfo(BaseModel):
    # Field descriptions help the AI understand what to look for
    name: str = Field(description="The person's full name")
    age: int = Field(description="The person's age in years")
    occupation: str = Field(description="Their current job title")
In this example, str stands for "string" (text) and int stands for "integer" (a whole number). By setting these types, you are telling Instructor to reject any response where the AI tries to put text in the age box.
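You can check this behavior without calling an AI at all. The sketch below validates hand-written values against the same blueprint, assuming Pydantic version 2 with its default (non-strict) settings.

```python
from pydantic import BaseModel, Field, ValidationError


class UserInfo(BaseModel):
    name: str = Field(description="The person's full name")
    age: int = Field(description="The person's age in years")
    occupation: str = Field(description="Their current job title")


# A numeric string such as "28" is converted to an int by default ...
ok = UserInfo(name="Sarah", age="28", occupation="software engineer")
print(ok.age)  # 28

# ... but text that is not a number is rejected.
try:
    UserInfo(name="Sarah", age="twenty-eight", occupation="software engineer")
except ValidationError as exc:
    rejected_fields = [err["loc"][0] for err in exc.errors()]
    print(rejected_fields)  # ['age']
```

Note the nuance: Pydantic is forgiving about values that clearly represent a number, and strict about values that do not.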
How do you connect Instructor to an AI model?
Once your blueprint is ready, you need to "patch" your AI client. Patching is a way of adding new powers to an existing tool without changing how it fundamentally works.
Step 1: Import the libraries.
Step 2: Create a standard AI client.
Step 3: Wrap that client with Instructor.
import instructor
from anthropic import Anthropic
# Initialize the standard Anthropic client
# Make sure your API key is set in your environment variables
client = Anthropic()
# This "patch" gives the client the ability to understand your blueprints
structured_client = instructor.from_anthropic(client)
Now, instead of just getting back a big block of text, your structured_client can return actual Python objects that your code can use immediately.
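Because the client reads your key from the environment, it can help to check for it before making any request. The helper below is a hypothetical convenience (`require_api_key` is not part of Instructor or the Anthropic client); it only looks up the `ANTHROPIC_API_KEY` variable and fails with a readable message.

```python
import os


def require_api_key(env=os.environ, name="ANTHROPIC_API_KEY"):
    """Return the key, or raise a clear error if it is missing or empty."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script.")
    return key


# Demonstration with plain dictionaries instead of the real environment.
print(require_api_key({"ANTHROPIC_API_KEY": "sk-example"}))  # sk-example

try:
    require_api_key({})
except RuntimeError as exc:
    print(exc)
```

In a real script you would call `require_api_key()` with no arguments near the top, before creating the client.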
Step-by-Step: Extracting data from a sentence
Let's put everything together into a working script. We will take a random sentence and turn it into a clean, structured object using Claude Sonnet 4.
Step 1: Define the model.
We use the UserInfo class we created earlier.
Step 2: Make the request.
You will use the messages.create method, but with one extra ingredient: the response_model.
# Step 2: Make the request and run the extraction
user_data = structured_client.messages.create(
    # Model ID for Claude Sonnet 4; check Anthropic's model list for the current ID
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    # Tell Instructor which blueprint to use
    response_model=UserInfo,
    messages=[
        {"role": "user", "content": "Meet Sarah, a 28-year-old software engineer."}
    ],
)
# Step 3: Use the data
print(f"Name: {user_data.name}")
print(f"Age: {user_data.age}")
print(f"Job: {user_data.occupation}")
What you should see: When you run this code, the output should look like this (the exact wording of the job title may vary slightly):
Name: Sarah
Age: 28
Job: software engineer
The AI didn't say "Hello!"; it just gave you the data. Accessing user_data.name works because user_data is a real Python object, not just a string of text.
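Here is a short sketch of what "a real Python object" buys you. It builds the object by hand as a stand-in for the AI's response, so it runs without an API key, and it assumes Pydantic version 2.

```python
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int
    occupation: str


# Stand-in for the object returned by the request above.
user_data = UserInfo(name="Sarah", age=28, occupation="software engineer")

# Attributes are typed, so arithmetic works without any conversion.
print(user_data.age + 1)  # 29

# Convert to a plain dict or a JSON string when you need to store or send it.
print(user_data.model_dump())
print(user_data.model_dump_json())
```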
What are the common gotchas for beginners?
It is normal to run into a few bumps when first using structured data. Here are the most common issues and how to solve them.
- Missing API Keys: If your code crashes immediately, check that you have set your ANTHROPIC_API_KEY in your terminal environment.
- Validation Errors: If the AI can't find the information (for example, if you don't mention an age in the prompt), the code might throw an error. You can fix this by making fields "Optional" or providing a default value.
- Model Hallucinations: Sometimes the AI might guess an age if it's not provided. To reduce the risk, add "Do not guess information" to your Field descriptions, and make the field optional so the AI has a valid way to leave it blank.
- Wrong Model Version: Using an outdated model string (like something from 2024) may result in errors, slower performance, or missing support for the latest features. Check your provider's model list and use a current model ID.
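The "Optional" fix for validation errors can be sketched as follows. The object is built by hand to simulate a prompt that never mentions an age, and the wording of the description is only a suggestion. Pydantic version 2 is assumed.

```python
from typing import Optional

from pydantic import BaseModel, Field


class UserInfo(BaseModel):
    name: str = Field(description="The person's full name")
    # None is allowed, so a missing age no longer fails validation.
    age: Optional[int] = Field(
        default=None,
        description="The person's age in years. Do not guess; leave empty if not stated.",
    )
    occupation: str = Field(description="Their current job title")


# Simulated extraction result for a prompt that never mentions an age.
info = UserInfo(name="Sarah", occupation="software engineer")
print(info.age)  # None
```

Your own code then needs to handle the `None` case, for example by skipping age-based logic when the value is missing.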
Why is this better than traditional prompting?
Without Instructor, you would have to write a "Prompt" (the instructions you give to an AI) that says: "Please return JSON. Do not include any other text. Ensure the age is a number." Even then, the AI might fail 5% of the time.
With Instructor, the validation happens at the code level. If the AI returns "twenty-eight" instead of 28, Instructor sees that it doesn't match the int requirement. When retries are enabled (through the max_retries argument), it can then automatically send a follow-up message back to the AI that includes the validation error, along the lines of: "You provided a string, but I need a number. Please try again."
This "Self-Correction" happens behind the scenes, so your final application is far less likely to break on malformed output. This makes it practical to build real products, like automated invoice processors or medical record summarizers, that cannot afford to have formatting errors.
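The idea behind this loop can be sketched in a few lines. This is not Instructor's actual implementation: `extract_with_retries` and `fake_model` are hypothetical names, and the fake model simply returns a wrong answer first and a correct one second so the example runs offline.

```python
import json

from pydantic import BaseModel, ValidationError


class UserInfo(BaseModel):
    name: str
    age: int


def extract_with_retries(ask_model, prompt, max_retries=2):
    """Validate each reply; on failure, feed the error back and ask again."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_retries + 1):
        reply = ask_model(messages)
        try:
            return UserInfo(**json.loads(reply))
        except (ValidationError, json.JSONDecodeError) as exc:
            # Append the failed reply and the error so the model can correct itself.
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user", "content": f"Fix these errors: {exc}"})
    raise RuntimeError("The model never produced valid data")


# Fake model: wrong on the first call, correct on the second.
replies = iter([
    '{"name": "Sarah", "age": "twenty-eight"}',
    '{"name": "Sarah", "age": 28}',
])
calls = []


def fake_model(messages):
    calls.append(len(messages))  # record how much history each call received
    return next(replies)


user = extract_with_retries(fake_model, "Meet Sarah, a 28-year-old engineer.")
print(user.age, calls)  # 28 [1, 3]
```

The second call sees three messages: the original prompt, the failed reply, and the correction request. Your application only ever receives the validated result.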
Next Steps
Now that you've built your first structured data extractor, you can start exploring more advanced features. You might try creating nested models (a blueprint inside another blueprint) or using "Enums" (a list of fixed choices, like 'Red', 'Green', or 'Blue') to limit what the AI can choose.
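As a starting point, here is a small sketch that combines both ideas. The `Color`, `Item`, and `Order` names are invented for this example, and the objects are built by hand rather than by an AI. Pydantic version 2 is assumed.

```python
from enum import Enum
from typing import List

from pydantic import BaseModel, ValidationError


class Color(str, Enum):
    RED = "Red"
    GREEN = "Green"
    BLUE = "Blue"


class Item(BaseModel):
    label: str
    color: Color  # only the three values above are accepted


class Order(BaseModel):
    customer: str
    line_items: List[Item]  # a blueprint inside another blueprint


order = Order(customer="Sarah", line_items=[{"label": "mug", "color": "Blue"}])
print(order.line_items[0].color is Color.BLUE)  # True

try:
    Item(label="mug", color="Purple")
except ValidationError:
    print("Purple is not an allowed color")
```

Passing `Order` as the response_model would ask the AI to fill in the whole nested structure in one request.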
To see the full range of capabilities, including how to handle streaming data or complex lists, check out the official Instructor documentation.