from faceberg import catalog

cat = catalog("./mycatalog")
table = cat.load_table("default.imdb")

# Select specific columns
df = table.scan(limit=10).select("label", "text").to_pandas()
print(df.head())
                                                text  label
0  I rented I AM CURIOUS-YELLOW from my video sto...      0
1  "I Am Curious: Yellow" is a risible and preten...      0
2  If only to avoid making this type of film in t...      0
3  This film was probably inspired by Godard's Ma...      0
4  Oh, brother...after hearing about this ridicul...      0
from faceberg import catalog

cat = catalog("./mycatalog")
table = cat.load_table("default.imdb")

# View schema
print("Schema:")
for field in table.schema().fields:
    print(f" {field.name}: {field.field_type}")
Snapshot ID: 1
Total records: 100000
Total files: 3
Working with Large Datasets
For large datasets, limit the scan and read the result as an Arrow table, converting to Pandas only when needed:
from faceberg import catalog

cat = catalog("./mycatalog")
table = cat.load_table("default.imdb")

# Read a limited scan through Arrow
scan = table.scan(limit=100)

# Get as Arrow table (more memory efficient)
arrow_table = scan.to_arrow()
print(f"Arrow table: {arrow_table.num_rows} rows, {arrow_table.num_columns} columns")

# Convert to Pandas when needed
df = arrow_table.to_pandas()
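
The snippet above loads the scan result as a single Arrow table. For data that does not fit in memory, a streaming variant may be preferable. The following is a minimal sketch, not part of faceberg itself. It assumes that the object returned by `cat.load_table` behaves like a PyIceberg table whose scan exposes `to_arrow_batch_reader()` (present in recent PyIceberg releases). The helpers `count_labels` and `label_counts_from_scan`, and the choice of the `label` column, are hypothetical names introduced for illustration.

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Sequence


def count_labels(
    batches: Iterable[Mapping[str, Sequence[Any]]],
    column: str = "label",
) -> Counter:
    """Tally the values of `column`, consuming one batch at a time."""
    counts: Counter = Counter()
    for batch in batches:
        # Each batch is a mapping of column name to a list of values.
        counts.update(batch[column])
    return counts


def label_counts_from_scan(scan: Any, column: str = "label") -> Counter:
    """Stream a scan batch by batch instead of building one large table.

    Assumption: `scan` provides to_arrow_batch_reader(), which yields
    pyarrow.RecordBatch objects; RecordBatch.to_pydict() converts each
    batch into a dict of column name to Python list.
    """
    reader = scan.to_arrow_batch_reader()
    return count_labels((batch.to_pydict() for batch in reader), column)


if __name__ == "__main__":
    # Stand-in batches so the sketch runs without a catalog on disk.
    demo_batches = [{"label": [0, 1, 0]}, {"label": [1, 1]}]
    print(count_labels(demo_batches))
```

With a real catalog, the call would look something like `label_counts_from_scan(table.scan().select("label"))`, again assuming the batch reader is available. Because the generator hands over one batch at a time, peak memory is governed by the batch size rather than the table size.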