Skip to main content

Command Palette

Search for a command to run...

External Tables vs Managed Tables in Databricks Unity Catalog: A Hands-On Guide

Updated
4 min read
External Tables vs Managed Tables in Databricks Unity Catalog: A Hands-On Guide

If you've ever wondered whether to let Databricks manage your data or keep control of it yourself, this post walks you through the exact commands.


Step 1: Create the External Location

Before creating any external table, you need to tell Databricks where on S3 it's allowed to read/write. This is the External Location — it maps a storage credential (which wraps an IAM Role) to an S3 path.


-- One-time setup: Register your S3 path with Databricks
CREATE EXTERNAL LOCATION practice_tables
    URL 's3://bucket-name/tables/'
    WITH (STORAGE CREDENTIAL storagecredentialname)
    COMMENT 'External tables for practice';

Think of the External Location as a "gate" — nothing can read or write to that S3 path without going through it.


Step 2: Grant Permissions on the External Location

Unity Catalog enforces access at every level. You must explicitly grant your user (or service principal) the right to create tables at that S3 location.


-- Allow user to create external tables at this location
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION practice_tables TO `user@company.com`;

-- Allow reading and writing files directly
GRANT READ FILES  ON EXTERNAL LOCATION practice_tables TO `user@company.com`;
GRANT WRITE FILES ON EXTERNAL LOCATION practice_tables TO `user@company.com`;

Without CREATE EXTERNAL TABLE, you cannot register a table at this path. READ/WRITE FILES allows direct file access.


Step 3: Verify External Location Was Created


spark.sql("SHOW EXTERNAL LOCATIONS").display()

You should see practice_tables pointing to s3://bucket-name/tables/ with storagecredname as the credential.


Step 4: Create Catalog and Schema

Unity Catalog uses a three-level namespace: catalog.schema.table. Create these containers before the table.


-- Top-level namespace
CREATE CATALOG IF NOT EXISTS practice;

-- Logical grouping inside the catalog
CREATE SCHEMA IF NOT EXISTS practice.raw;

IF NOT EXISTS makes this safe to run multiple times.


Step 5: Create the External Table

The defining feature of an external table is the LOCATION clause — this tells Databricks exactly where to store the Delta files on S3.


CREATE TABLE IF NOT EXISTS practice.raw.test_events
(
    id          STRING,
    event_type  STRING,
    timestamp   TIMESTAMP
)
USING DELTA
LOCATION 's3://bucket-name/tables/practice/raw/test_events/';

Databricks registers the table metadata in Unity Catalog, but the actual data files live at your specified S3 path.


Step 6: Write and Query Data


-- Insert a test record
INSERT INTO practice.raw.test_events
VALUES ('1', 'click', current_timestamp());

-- select from table
SELECT * FROM practice.raw.test_events;

Your data is now physically located in s3://bucket-name/tables/practice/raw/test_events/ — independent of Databricks-managed storage.


Step 7: Create a Managed Table (For Comparison)


-- No LOCATION clause = managed table
-- Databricks chooses the storage location (usually a root bucket)
CREATE TABLE practice.raw.orders
(
    order_id   STRING,
    amount     DOUBLE,
    created_at TIMESTAMP
)
USING DELTA;

No S3 path required. Databricks manages the file location and lifecycle automatically.


Step 8: Inspect Table Details


spark.sql("DESCRIBE EXTENDED practice.raw.orders").display(truncate=False)
spark.sql("DESCRIBE EXTENDED practice.raw.test_events").display(truncate=False)

The Drop Table Difference

This is the most critical operational difference:

-- Managed: drops metadata AND deletes underlying data files
DROP TABLE practice.raw.orders;
-- ❌ Data is deleted from storage.

-- External: drops metadata ONLY; S3 files remain
DROP TABLE practice.raw.test_events;
-- ✅ Data files remain at s3://bucket-name/tables/practice/raw/test_events/
-- You can recreate the table pointing to the same location.

Side-by-Side Comparison

Feature Managed Table External Table
Data Location Databricks-managed User-defined S3 path
Drop Behavior Deletes data + metadata Deletes metadata only
Setup Simple CREATE TABLE Requires External Location + Credential
File Access Via Databricks Via Databricks or direct S3 access
Ownership Databricks manages lifecycle You manage lifecycle/storage

Conclusion

Managed tables offer simplicity and automated management. External tables provide control over storage location and data retention independent of the table metadata. Choose based on whether you need Databricks to manage the lifecycle (Managed) or if you need persistent, independent control of the data files (External).