Feature: Support dataframes in lui() calls via Arrow upload #27

@lmeyerov

Feature Request

Description

Enable passing dataframes directly to lui() function calls using Arrow format for efficient data transfer.

Proposed API

from typing import Dict, Optional, Union

import pandas as pd
import pyarrow as pa

# Anything lui() would accept as tabular input
DfLike = Union[pd.DataFrame, pa.Table]

# Proposed signature: both dataframe parameters are optional
def lui(prompt: str, df: Optional[DfLike] = None, dfs: Optional[Dict[str, DfLike]] = None):
    ...

# Single dataframe
lui('analyze this data', df=sales_df)

# Multiple named dataframes
lui('compare these datasets', dfs={'sales': sales_df, 'inventory': inventory_df})

Use Case

Users working with dataframes currently have to convert them to text or serialize them manually before calling lui(). Direct dataframe support would:

  • Eliminate manual serialization steps
  • Preserve data types and schemas
  • Enable efficient transfer of large datasets
  • Work seamlessly with existing pandas/pyarrow workflows

Implementation Details

  • Accept pd.DataFrame and pa.Table as DfLike types
  • Automatically convert pandas DataFrames to Arrow format internally
  • Use Arrow IPC format for efficient binary serialization
  • Support both single dataframe (df param) and multiple named dataframes (dfs param)
  • Handle the upload transparently, with no extra steps for the caller

Example Usage

import pandas as pd
from louie import lui

# Single dataframe
sales_df = pd.DataFrame({'product': ['A', 'B'], 'revenue': [1000, 1500]})
response = lui('What are the top selling products?', df=sales_df)

# Multiple dataframes
inventory_df = pd.DataFrame({'product': ['A', 'B'], 'stock': [50, 30]})
response = lui(
    'Which products need restocking based on sales velocity?',
    dfs={'sales': sales_df, 'inventory': inventory_df}
)

# With pyarrow.Table
import pyarrow as pa
arrow_table = pa.Table.from_pandas(sales_df)
response = lui('Analyze this Arrow table', df=arrow_table)

Benefits

  • Clean, intuitive API that extends the existing lui() signature
  • Zero friction for data scientists already using pandas/pyarrow
  • Static type checking through the DfLike Union type hint
  • Binary format avoids text encoding and parsing, which should reduce latency
  • Preserves nulls, timestamps, and other complex types correctly

Technical Considerations

  • Arrow IPC format for wire protocol
  • Automatic chunking for large dataframes
  • Memory-efficient streaming where possible
  • Graceful fallback for unsupported column types

🤖 Generated with Claude Code
