External Tables vs Managed Tables in Databricks Unity Catalog: A Hands-On Guide

If you've ever wondered whether to let Databricks manage your data or keep control of it yourself, this post walks you through the exact commands.
Step 1: Create the External Location
Before creating any external table, you need to tell Databricks where on S3 it's allowed to read/write. This is the External Location — it maps a storage credential (which wraps an IAM Role) to an S3 path.
-- One-time setup: Register your S3 path with Databricks
CREATE EXTERNAL LOCATION practice_tables
URL 's3://bucket-name/tables/'
WITH (STORAGE CREDENTIAL storagecredentialname)
COMMENT 'External tables for practice';
Think of the External Location as a "gate" — nothing can read or write to that S3 path without going through it.
Step 2: Grant Permissions on the External Location
Unity Catalog enforces access at every level. You must explicitly grant your user (or service principal) the right to create tables at that S3 location.
-- Allow user to create external tables at this location
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION practice_tables TO `user@company.com`;
-- Allow reading and writing files directly
GRANT READ FILES ON EXTERNAL LOCATION practice_tables TO `user@company.com`;
GRANT WRITE FILES ON EXTERNAL LOCATION practice_tables TO `user@company.com`;
Without CREATE EXTERNAL TABLE, you cannot register a table at this path. READ/WRITE FILES allows direct file access.
Step 3: Verify External Location Was Created
spark.sql("SHOW EXTERNAL LOCATIONS").display()
You should see practice_tables pointing to s3://bucket-name/tables/ with storagecredname as the credential.
Step 4: Create Catalog and Schema
Unity Catalog uses a three-level namespace: catalog.schema.table. Create these containers before the table.
-- Top-level namespace
CREATE CATALOG IF NOT EXISTS practice;
-- Logical grouping inside the catalog
CREATE SCHEMA IF NOT EXISTS practice.raw;
IF NOT EXISTS makes this safe to run multiple times.
Step 5: Create the External Table
The defining feature of an external table is the LOCATION clause — this tells Databricks exactly where to store the Delta files on S3.
CREATE TABLE IF NOT EXISTS practice.raw.test_events
(
id STRING,
event_type STRING,
timestamp TIMESTAMP
)
USING DELTA
LOCATION 's3://bucket-name/tables/practice/raw/test_events/';
Databricks registers the table metadata in Unity Catalog, but the actual data files live at your specified S3 path.
Step 6: Write and Query Data
-- Insert a test record
INSERT INTO practice.raw.test_events
VALUES ('1', 'click', current_timestamp());
-- select from table
SELECT * FROM practice.raw.test_events;
Your data is now physically located in s3://bucket-name/tables/practice/raw/test_events/ — independent of Databricks-managed storage.
Step 7: Create a Managed Table (For Comparison)
-- No LOCATION clause = managed table
-- Databricks chooses the storage location (usually a root bucket)
CREATE TABLE practice.raw.orders
(
order_id STRING,
amount DOUBLE,
created_at TIMESTAMP
)
USING DELTA;
No S3 path required. Databricks manages the file location and lifecycle automatically.
Step 8: Inspect Table Details
spark.sql("DESCRIBE EXTENDED practice.raw.orders").display(truncate=False)
spark.sql("DESCRIBE EXTENDED practice.raw.test_events").display(truncate=False)
The Drop Table Difference
This is the most critical operational difference:
-- Managed: drops metadata AND deletes underlying data files
DROP TABLE practice.raw.orders;
-- ❌ Data is deleted from storage.
-- External: drops metadata ONLY; S3 files remain
DROP TABLE practice.raw.test_events;
-- ✅ Data files remain at s3://bucket-name/tables/practice/raw/test_events/
-- You can recreate the table pointing to the same location.
Side-by-Side Comparison
| Feature | Managed Table | External Table |
|---|---|---|
| Data Location | Databricks-managed | User-defined S3 path |
| Drop Behavior | Deletes data + metadata | Deletes metadata only |
| Setup | Simple CREATE TABLE |
Requires External Location + Credential |
| File Access | Via Databricks | Via Databricks or direct S3 access |
| Ownership | Databricks manages lifecycle | You manage lifecycle/storage |
Conclusion
Managed tables offer simplicity and automated management. External tables provide control over storage location and data retention independent of the table metadata. Choose based on whether you need Databricks to manage the lifecycle (Managed) or if you need persistent, independent control of the data files (External).



