
polars.LazyFrame doesn't work with from_arrow, even though it works with a raw SQL query #308

@OutSquareCapital

Description


What happens?

Output of the code example:

PS C:\Users\stett\Documents\python\pql> uv run t.py
Query (sql on LazyFrame):
WITH lf AS (SELECT * FROM arrow_scan(0x15c661746e0, 0x7ffbf06c5ab0, 0x7ffbf06c5420))SELECT * FROM lf

shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ x   │
│ 2   ┆ y   │
│ 3   ┆ z   │
└─────┴─────┘
Query (sql on DataFrame):
WITH df AS (SELECT * FROM arrow_scan(0x15c66515a90, 0x7ffbf06c5ab0, 0x7ffbf06c5420))SELECT * FROM df

shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ x   │
│ 2   ┆ y   │
│ 3   ┆ z   │
└─────┴─────┘
Query (from Arrow on DataFrame):
SELECT * FROM arrow_scan(0x15c67bb85e0, 0x7ffbf06c5ab0, 0x7ffbf06c5420) AS arrow_object_8756061e3d8cb73a

shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ x   │
│ 2   ┆ y   │
│ 3   ┆ z   │
└─────┴─────┘
Traceback (most recent call last):
  File "C:\Users\stett\Documents\python\pql\t.py", line 19, in <module>
    _qry_on(duckdb.from_arrow(lf), "from Arrow on LazyFrame")
            ~~~~~~~~~~~~~~~~~^^^^
_duckdb.InvalidInputException: Invalid Input Error: Python Object Type LazyFrame is not an accepted Arrow Object.
PS C:\Users\stett\Documents\python\pql> 

In every case that succeeds, including the LazyFrame queried by name through raw SQL, the generated query calls arrow_scan. I therefore concluded that this is more likely a runtime type check in from_arrow than a real limitation of LazyFrame, which is why I am raising it as an issue.

To Reproduce

import duckdb
import polars as pl


def _qry_on(qry: duckdb.DuckDBPyRelation, name: str) -> None:

    print(f"Query ({name}):\n{qry.sql_query()}\n")
    print(qry.pl())


if __name__ == "__main__":
    data = {"a": [1, 2, 3], "b": ["x", "y", "z"]}
    lf = pl.LazyFrame(data)
    df = pl.DataFrame(data)

    _qry_on(duckdb.sql("""SELECT * FROM lf"""), "sql on LazyFrame")
    _qry_on(duckdb.sql("""SELECT * FROM df"""), "sql on DataFrame")
    _qry_on(duckdb.from_arrow(df), "from Arrow on DataFrame")
    _qry_on(duckdb.from_arrow(lf), "from Arrow on LazyFrame")

OS:

Windows

DuckDB Package Version:

1.4.4

Python Version:

3.13.7

Full Name:

Stettler Thibaud

Affiliation:

None

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have not tested with any other build.

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration to reproduce the issue?

  • Yes, I have
