Local catalogs store metadata on your local filesystem instead of on the HuggingFace Hub. They’re useful for:
Testing before deploying to HuggingFace
Offline development
CI/CD pipelines
Quick experiments
Create a Local Catalog
from faceberg import catalog

# Create catalog in current directory
cat = catalog("mycatalog")
cat.init()

# Add datasets
cat.add_dataset("default.imdb", "stanfordnlp/imdb", config="plain_text")
cat.add_dataset("default.gsm8k", "openai/gsm8k", config="main")

# List tables
tables = cat.list_tables("default")
print(f"Tables: {[str(t) for t in tables]}")
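Each table keeps its Iceberg metadata in a metadata/ directory (for example, mycatalog/default/imdb/metadata/), which contains:
v*.metadata.json: Main Iceberg table metadata (schema, partitions, snapshots)
snap-*.avro: Manifest list pointing to manifest files
*.avro: Manifest files with data file references
version-hint.text: Current metadata version for quick lookups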
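To see how these files fit together, the sketch below follows version-hint.text to the current metadata JSON and pulls out a few fields defined by the Iceberg table spec (current schema columns and the current snapshot's manifest list). It assumes faceberg follows the Hadoop-style convention where version-hint.text holds a bare version number N pointing at vN.metadata.json; load_current_metadata and summarize are hypothetical helpers, not part of faceberg.

```python
import json
from pathlib import Path


def load_current_metadata(metadata_dir):
    """Read version-hint.text and load the matching v<N>.metadata.json.

    Assumes the Hadoop-style layout: version-hint.text contains only N.
    """
    metadata_dir = Path(metadata_dir)
    version = (metadata_dir / "version-hint.text").read_text().strip()
    path = metadata_dir / f"v{version}.metadata.json"
    return path, json.loads(path.read_text())


def summarize(metadata):
    """Extract a few Iceberg table-spec fields from table metadata."""
    # Format v2 lists schemas under "schemas"; format v1 may only have "schema".
    schemas = {s["schema-id"]: s for s in metadata.get("schemas", [])}
    schema = schemas.get(metadata.get("current-schema-id"), metadata.get("schema"))

    # The current snapshot points at its manifest list (a snap-*.avro file).
    snapshots = {s["snapshot-id"]: s for s in metadata.get("snapshots", [])}
    snapshot = snapshots.get(metadata.get("current-snapshot-id"))

    return {
        "format_version": metadata.get("format-version"),
        "columns": [f["name"] for f in schema.get("fields", [])] if schema else [],
        "manifest_list": snapshot.get("manifest-list") if snapshot else None,
    }


if __name__ == "__main__":
    path, meta = load_current_metadata("mycatalog/default/imdb/metadata")
    print(path)
    print(summarize(meta))
```

If faceberg names or versions its metadata files differently, only load_current_metadata needs to change; summarize works on any Iceberg metadata JSON.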
Serve via REST API
Expose your local catalog via REST for testing:
# Start server on default port (8181)
faceberg ./mycatalog serve

# Custom port
faceberg ./mycatalog serve --port 9000

# Development mode with auto-reload
faceberg ./mycatalog serve --reload
The server provides:
REST catalog at http://localhost:8181
API docs at http://localhost:8181/schema/swagger
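The REST catalog can also be inspected without DuckDB. This standard-library sketch assumes faceberg exposes the standard Iceberg REST catalog route GET /v1/namespaces/{namespace}/tables directly under http://localhost:8181 with no URL prefix; check the Swagger page to confirm the actual paths. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8181"  # default port of `faceberg ./mycatalog serve`


def tables_url(base_url, namespace):
    """Build the Iceberg REST 'list tables' URL for a dotted namespace."""
    # The REST spec joins multi-level namespace parts with the 0x1F unit separator.
    ns = urllib.parse.quote("\x1f".join(namespace.split(".")), safe="")
    return f"{base_url.rstrip('/')}/v1/namespaces/{ns}/tables"


def parse_table_names(payload):
    """Turn a ListTablesResponse body into dotted 'namespace.table' names."""
    return [
        ".".join(ident["namespace"] + [ident["name"]])
        for ident in payload.get("identifiers", [])
    ]


def get_json(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = get_json(tables_url(BASE_URL, "default"))
    print(parse_table_names(payload))
```

Keeping URL building and response parsing separate from the network call makes both easy to test without a running server.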
Connect from Another Process
import duckdb

conn = duckdb.connect()
conn.execute("INSTALL iceberg; LOAD iceberg")

# Attach local REST catalog
conn.execute("""
    ATTACH 'http://localhost:8181' AS cat (
        TYPE ICEBERG,
        AUTHORIZATION_TYPE 'none'
    )
""")

result = conn.execute("SELECT * FROM cat.default.imdb LIMIT 5").fetchdf()
print(result)
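Because the server runs in a separate process, a script that attaches immediately can race ahead of it. A minimal readiness check, assuming the server answers GET /v1/config (the configuration endpoint in the Iceberg REST spec) once it is listening; wait_for_server is a hypothetical helper, not part of faceberg or DuckDB.

```python
import time
import urllib.error
import urllib.request

# Ignore any http_proxy settings: the catalog server is on localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_server(base_url="http://localhost:8181", timeout=10.0, interval=0.25):
    """Poll GET {base_url}/v1/config until it answers or `timeout` seconds pass."""
    url = base_url.rstrip("/") + "/v1/config"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=interval + 1):
                return True
        except urllib.error.HTTPError:
            # An HTTP error status still means something is listening.
            return True
        except OSError:
            pass  # connection refused or timed out: not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    if not wait_for_server(timeout=30):
        raise SystemExit("REST server not reachable; start it with: faceberg ./mycatalog serve")
    print("Server is up; safe to ATTACH")
```

HTTPError is caught before the broader OSError because it is a subclass of URLError (itself an OSError); swapping the order would treat a live server that returns an error status as unreachable.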
Interactive DuckDB Shell
The quack command opens DuckDB with the catalog pre-attached:
# Start REST server first (in another terminal)
faceberg ./mycatalog serve

# Open interactive shell
faceberg ./mycatalog quack
Then run SQL:
SHOW ALL TABLES;
SELECT * FROM iceberg_catalog.default.imdb LIMIT 5;
Query Without REST Server
You can also query directly using the metadata file path:
import duckdb

conn = duckdb.connect()
conn.execute("INSTALL iceberg; LOAD iceberg")

# Query using direct metadata path
result = conn.execute("""
    SELECT label, substr(text, 1, 80) AS preview
    FROM iceberg_scan('mycatalog/default/imdb/metadata/v1.metadata.json')
    LIMIT 3
""").fetchdf()
print(result)
   label                                            preview
0      0  I rented I AM CURIOUS-YELLOW from my video sto...
1      0  "I Am Curious: Yellow" is a risible and preten...
2      0  If only to avoid making this type of film in t...
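Hardcoding v1.metadata.json only works until the table gets a new metadata version. A sketch that picks the highest-numbered metadata file and builds the iceberg_scan query, assuming faceberg names files v<N>.metadata.json as in the example above (other Iceberg catalogs use different names, which this regex would not match); latest_metadata_file and scan_query are hypothetical helpers.

```python
import re
from pathlib import Path

# Matches Hadoop-style names such as v1.metadata.json or v10.metadata.json.
_VERSION_RE = re.compile(r"^v(\d+)\.metadata\.json$")


def latest_metadata_file(table_dir):
    """Return the v<N>.metadata.json with the largest N under table_dir/metadata."""
    candidates = []
    for p in (Path(table_dir) / "metadata").iterdir():
        m = _VERSION_RE.match(p.name)
        if m:
            candidates.append((int(m.group(1)), p))
    if not candidates:
        raise FileNotFoundError(f"no v<N>.metadata.json under {table_dir}/metadata")
    # Compare numerically: as strings, "v10" would sort before "v2".
    return max(candidates, key=lambda c: c[0])[1]


def scan_query(metadata_path, limit=3):
    """Build an iceberg_scan query, escaping single quotes in the path."""
    path_sql = str(metadata_path).replace("'", "''")
    return f"SELECT * FROM iceberg_scan('{path_sql}') LIMIT {int(limit)}"


if __name__ == "__main__":
    import duckdb

    conn = duckdb.connect()
    conn.execute("INSTALL iceberg; LOAD iceberg")
    path = latest_metadata_file("mycatalog/default/imdb")
    print(conn.execute(scan_query(path)).fetchdf())
```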
Testing Pattern
For tests, create isolated catalogs in temporary directories: