Using planframe-pandas
This guide covers the intended public usage pattern:
- define a schema as a PandasFrame subclass
- construct frames from Python-native data (PlanFrame constructs pandas internally)
- chain transforms (always lazy)
- execute via boundaries (collect, to_dicts, to_dict, collect_backend, stream_dicts, or async: acollect, ato_dicts, ato_dict, acollect_backend, astream_dicts)
Optional ExecutionOptions can be passed at those boundaries when you need streaming hints.
Streaming rows
If you want to iterate rows (and potentially avoid building large intermediate lists), use:
- stream_dicts() / astream_dicts(): yield dict[str, object]
- stream(name=...) / astream(name=...): yield schema-derived pydantic.BaseModel
See the core guide: Streaming rows.
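The benefit of a row stream over a materialized list can be sketched in plain Python. This is not PlanFrame code: iter_rows is a hypothetical stand-in for a row-yielding boundary such as stream_dicts(), used only to show that a generator lets the consumer stop early without building every row.

```python
from itertools import islice
from typing import Iterator


def iter_rows(columns: dict[str, list[object]]) -> Iterator[dict[str, object]]:
    """Yield one row dict at a time from column-oriented data (hypothetical helper)."""
    names = list(columns)
    for values in zip(*(columns[name] for name in names)):
        yield dict(zip(names, values))


data = {"id": [1, 2, 3], "age": [10, 20, 30]}

# Only the first two rows are ever built; the third is never materialized.
first_two = list(islice(iter_rows(data), 2))
print(first_two)
```

Whether a given backend can actually avoid intermediate materialization depends on the adapter; the sketch only illustrates the consumer-side pattern.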
Pandas-like helpers (PandasLikeFrame)
PandasFrame subclasses planframe.pandas.PandasLikeFrame, so you can use pandas-style patterns that still compile to PlanFrame’s lazy IR—e.g. boolean indexing df[mask], column selection via filter(items=...) / like= / regex=, astype, eval (alias for assign), drop_duplicates, and pandas-flavored drop / rename overloads. See the core guide: Pandas-like API (planframe.pandas).
Quickstart
Run:
./.venv/bin/python docs/planframe_pandas/guides/examples/basic_usage.py
Expected output:
columns=['id', 'age', 'age_plus_one']
to_dict={'id': [1], 'age': [10], 'age_plus_one': [11]}
rows=[{'id': 1, 'age': 10, 'age_plus_one': 11}]
Construction rules
- Do: User({"id": [1], "age": [10]})
- Do: User([{"id": 1, "age": 10}])
- Don’t: pass pandas.DataFrame directly into User(...)
If you need advanced construction, use Frame.source(...) with a backend frame (this is intentionally “escape hatch” territory).
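The two accepted Python-native shapes carry the same information. A minimal sketch, assuming nothing about PlanFrame's internals: rows_to_columns is a hypothetical helper that converts the list-of-rows form into the dict-of-columns form, and it assumes every row has the same keys.

```python
def rows_to_columns(rows: list[dict[str, object]]) -> dict[str, list[object]]:
    """Convert row-oriented records into column-oriented lists (hypothetical helper)."""
    columns: dict[str, list[object]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


by_rows = [{"id": 1, "age": 10}]
by_columns = {"id": [1], "age": [10]}

# Both shapes describe the same single-row frame.
print(rows_to_columns(by_rows) == by_columns)
```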
Execution model
PlanFrame is always lazy:
- Chaining methods (like df[["a", "b"]] or .assign(...)) does not run pandas operations.
- collect_backend() evaluates the full plan by calling adapter methods on demand (and returns a backend-native DataFrame).
- collect() evaluates the full plan and returns list[pydantic.BaseModel] rows (PlanFrame’s cross-backend materialization contract).
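To make "always lazy" concrete, here is a toy model of the idea in plain Python. ToyPlan is hypothetical and is not how PlanFrame is implemented; it only demonstrates that chained transforms are recorded and that work happens at the execution boundary.

```python
from typing import Callable

Rows = list[dict[str, object]]


class ToyPlan:
    """Records transforms and runs them only at an execution boundary (toy model)."""

    def __init__(self, rows: Rows, steps: tuple[Callable[[Rows], Rows], ...] = ()) -> None:
        self._rows = rows
        self._steps = steps

    def assign(self, name: str, fn: Callable[[dict[str, object]], object]) -> "ToyPlan":
        # Chaining returns a new plan; nothing is computed here.
        def step(rows: Rows) -> Rows:
            return [{**row, name: fn(row)} for row in rows]

        return ToyPlan(self._rows, self._steps + (step,))

    def collect(self) -> Rows:
        # The boundary: every recorded step runs now, in order.
        rows = self._rows
        for step in self._steps:
            rows = step(rows)
        return rows


calls: list[int] = []


def age_plus_one(row: dict[str, object]) -> object:
    calls.append(1)  # track when the transform actually runs
    return row["age"] + 1


plan = ToyPlan([{"id": 1, "age": 10}]).assign("age_plus_one", age_plus_one)
print(len(calls))      # 0: chaining did not execute anything
print(plan.collect())
print(len(calls))      # 1: the transform ran once, at collect()
```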
Row numbering and clamping
Two common primitives:
- with_row_count(name="row_nr", offset=0) adds a monotonically increasing row number column.
- clip(lower=..., upper=..., subset=...) clamps numeric columns (if subset=None, PlanFrame clamps all numeric schema fields).
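The intended semantics of both primitives can be sketched on plain row dicts. The functions below are illustrative re-implementations, not PlanFrame's code. Two assumptions are baked in: the row number is placed as the first column, and numeric columns are detected from values (PlanFrame, per the text above, decides from the schema).

```python
def with_row_count(rows, name="row_nr", offset=0):
    """Add a monotonically increasing counter to each row (semantics sketch)."""
    return [{name: offset + i, **row} for i, row in enumerate(rows)]


def clip(rows, lower=None, upper=None, subset=None):
    """Clamp numeric values into [lower, upper]; subset=None targets every numeric column."""

    def clamp(value):
        if lower is not None and value < lower:
            return lower
        if upper is not None and value > upper:
            return upper
        return value

    def is_target(name, value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        return numeric and (subset is None or name in subset)

    return [
        {k: clamp(v) if is_target(k, v) else v for k, v in row.items()}
        for row in rows
    ]


rows = [
    {"name": "a", "score": -5},
    {"name": "b", "score": 50},
    {"name": "c", "score": 500},
]
numbered = with_row_count(rows, offset=1)
clipped = clip(numbered, lower=0, upper=100, subset=["score"])
print(clipped)
```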
Schema-only selectors and multi-column helpers
- select_schema(selector, strict=True) evaluates a selector object against the current PlanFrame Schema (backend-independent) and lowers to an explicit selection.
- Multi-column helpers: cast_many, cast_subset, fill_null_many, fill_null_subset.
- Rename helpers: rename_upper/lower/title/strip(...).
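Selecting from the schema alone means only column names and declared types are consulted, never the data. A rough sketch of that idea in plain Python, with a plain dict standing in for the schema; select_by_schema and the lower-casing map are hypothetical illustrations, and the strict flag is not modeled here.

```python
from typing import Callable


def select_by_schema(
    schema: dict[str, type], predicate: Callable[[str, type], bool]
) -> list[str]:
    """Resolve a selector to explicit column names using only names and types."""
    return [name for name, dtype in schema.items() if predicate(name, dtype)]


schema = {"ID": int, "Age": int, "Name": str}

# A type-based selector lowers to an explicit, ordered list of columns.
int_columns = select_by_schema(schema, lambda name, dtype: dtype is int)
print(int_columns)

# A rename helper in the spirit of rename_lower: one mapping for every column.
lower_map = {name: name.lower() for name in schema}
print(lower_map)
```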
Grouping and aggregation
group_by takes one or more keys, each either a column name (str) or a planframe.expr expression (same general idea as sort / join keys). Keys that are expressions are not named after a single input column; in the aggregated result they appear as __pf_g0, __pf_g1, … by position in the key list.
agg takes keyword arguments name=value where each value is either:
- Tuple form: ("op", "column") with op one of count, sum, mean, min, max, n_unique.
- Aggregation expression: wrap any supported inner expression with agg_sum, agg_mean, agg_min, agg_max, agg_count, or agg_n_unique from planframe.expr (these produce AggExpr IR).
Example:
from planframe.expr import agg_sum, col
from planframe_pandas import PandasFrame

class S(PandasFrame):
    g: int
    x: int

pf = S({"g": [1, 1, 2], "x": [10, 20, 7]})
# n uses the tuple form; sx uses an aggregation expression.
out = pf.groupby("g").agg(n=("count", "x"), sx=agg_sum(col("x")))
# Nothing has run yet; collect_backend() executes the plan.
df = out.collect_backend()
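For reference, the values this aggregation should produce can be worked out by hand. The sketch below does not call PlanFrame; it groups the same input by g in plain Python and computes the count and the sum of x. Row order in the real backend result is not guaranteed by anything stated here.

```python
from collections import defaultdict

data = {"g": [1, 1, 2], "x": [10, 20, 7]}

# Bucket x values by their group key g.
groups: dict[int, list[int]] = defaultdict(list)
for g, x in zip(data["g"], data["x"]):
    groups[g].append(x)

# One output row per group: n counts the x values, sx sums them.
expected = [{"g": g, "n": len(xs), "sx": sum(xs)} for g, xs in groups.items()]
print(expected)
# [{'g': 1, 'n': 2, 'sx': 30}, {'g': 2, 'n': 1, 'sx': 7}]
```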
Reshape and nested data
- melt: implemented via pandas.melt(...)
- pivot_longer: convenience wrapper around melt(...)
- pivot: implemented via DataFrame.pivot_table(...); if you pass on_columns, PlanFrame will ensure those output columns exist (filling missing with NA) and will reorder to match.
- pivot_wider: convenience wrapper around pivot(...) (pass on_columns for deterministic output columns)
- explode: implemented via DataFrame.explode(...)
- unnest: expands dict-like values into columns (via pandas.json_normalize); name collisions raise an error.
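Why on_columns gives deterministic output is easiest to see on a small case. The following is a pure-Python sketch of the described behavior, not the pandas-backed implementation: the pivot function and its parameter names (index, on, values) are hypothetical, missing cells are filled with None rather than NA, and duplicate cells simply keep the last value instead of being aggregated.

```python
def pivot(rows, index, on, values, on_columns=None):
    """Pivot long rows to wide; on_columns fixes which output columns exist and their order."""
    wide: dict[object, dict[str, object]] = {}
    seen: list[str] = []
    for row in rows:
        col = str(row[on])
        if col not in seen:
            seen.append(col)
        wide.setdefault(row[index], {})[col] = row[values]
    # Without on_columns the output depends on which values happen to appear.
    columns = list(on_columns) if on_columns is not None else seen
    return [
        {index: key, **{c: cells.get(c) for c in columns}}
        for key, cells in wide.items()
    ]


long_rows = [
    {"id": 1, "metric": "a", "value": 10},
    {"id": 2, "metric": "a", "value": 20},
    {"id": 1, "metric": "b", "value": 30},
]

# "c" never appears in the data, yet the output still has that column.
wide_rows = pivot(long_rows, "id", "metric", "value", on_columns=["a", "b", "c"])
print(wide_rows)
```

With on_columns omitted, the same call would produce only the columns a and b, so downstream code could not rely on c being present.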
I/O and optional dependencies
- CSV: to_csv(...) (alias for sink_csv(...)) works with the built-in pandas writer.
- Parquet: to_parquet(...) (alias for sink_parquet(...)) requires an extra engine. Install planframe-pandas[parquet] (uses pyarrow).
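If Parquet output is optional in your application, one way to fail early is to check for the engine before reaching the write call. This is a small sketch using only the standard library; parquet_engine_available is a hypothetical helper, not part of planframe-pandas.

```python
from importlib.util import find_spec


def parquet_engine_available() -> bool:
    """Return True if pyarrow can be imported in the current environment."""
    return find_spec("pyarrow") is not None


if parquet_engine_available():
    print("pyarrow found: Parquet output should be usable")
else:
    print("pyarrow missing: install planframe-pandas[parquet] first")
```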