Feature Request
Description
Enable passing dataframes directly to lui() function calls using Arrow format for efficient data transfer.
Proposed API
from typing import Union, Dict, Optional
import pandas as pd
import pyarrow as pa
DfLike = Union[pd.DataFrame, pa.Table]
# Single dataframe
lui('analyze this data', df=sales_df)
# Multiple named dataframes
lui('compare these datasets', dfs={'sales': sales_df, 'inventory': inventory_df})
# Optional parameters
def lui(prompt: str, df: Optional[DfLike] = None, dfs: Optional[Dict[str, DfLike]] = None):
    ...
Use Case
Users working with dataframes currently have to convert them to text or serialize them manually before calling lui(). Direct dataframe support would:
- Eliminate manual serialization steps
- Preserve data types and schemas
- Enable efficient transfer of large datasets
- Work seamlessly with existing pandas/pyarrow workflows
Implementation Details
- Accept pd.DataFrame and pa.Table as DfLike types
- Automatically convert pandas DataFrames to Arrow format internally
- Use Arrow IPC format for efficient binary serialization
- Support both a single dataframe (df param) and multiple named dataframes (dfs param)
- Handle upload behind the scenes transparently
Example Usage
import pandas as pd
from louie import lui
# Single dataframe
sales_df = pd.DataFrame({'product': ['A', 'B'], 'revenue': [1000, 1500]})
response = lui('What are the top selling products?', df=sales_df)
# Multiple dataframes
inventory_df = pd.DataFrame({'product': ['A', 'B'], 'stock': [50, 30]})
response = lui(
    'Which products need restocking based on sales velocity?',
    dfs={'sales': sales_df, 'inventory': inventory_df}
)
# With pyarrow.Table
import pyarrow as pa
arrow_table = pa.Table.from_pandas(sales_df)
response = lui('Analyze this Arrow table', df=arrow_table)
Benefits
- Clean, intuitive API matching existing lui() signature
- Zero friction for data scientists already using pandas/pyarrow
- Type-checkable signature through Union type hints
- Efficient binary format reduces latency
- Preserves nulls, timestamps, and other complex types correctly
Technical Considerations
- Arrow IPC format for wire protocol
- Automatic chunking for large dataframes
- Memory-efficient streaming where possible
- Graceful fallback for unsupported column types
🤖 Generated with Claude Code