Getting Started with Databricks Connect on AWS Using Serverless Compute


Databricks Connect lets you write PySpark code locally in VS Code and execute it remotely on Databricks — no cluster management needed when using serverless compute. This post walks through the exact steps to get it working on Windows from scratch.


Step 1: Install Python 3.12.9

Databricks Connect 16.2+ requires Python 3.12; newer versions such as 3.13 or 3.14 are not yet supported. Download the Windows 64-bit installer from python.org and, during installation:

  • ✅ Check "Add Python to PATH"

Verify:

python --version
# Python 3.12.9
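
The same check can be done from inside Python, which is handy when several interpreters are installed and it is unclear which one a script will pick up. This is a minimal sketch; the helper name is_supported_python and the REQUIRED constant are made up for illustration and are not part of Databricks Connect.

```python
import sys

# Major/minor pair this walkthrough expects (assumption taken from the text above).
REQUIRED = (3, 12)


def is_supported_python(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor equals the required pair."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if is_supported_python() else "not the version this walkthrough expects"
    print(f"Python {found}: {status}")
```

Only the major and minor numbers are compared, so any 3.12.x patch release passes.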

Step 2: Create and Activate a Virtual Environment

Navigate to your project folder and create an isolated Python environment:



python -m venv .venv

# Activate (command prompt on Windows)
.venv\Scripts\activate.bat

You should see (.venv) appear in your terminal prompt confirming activation.
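
If the prompt prefix is hidden by a custom shell theme, the interpreter itself can report whether it is running inside a virtual environment. A small sketch, assuming a standard venv; in_virtual_env is a hypothetical helper name.

```python
import sys


def in_virtual_env(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Inside a venv, sys.prefix points at the venv folder while
    sys.base_prefix still points at the base Python installation."""
    return prefix != base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_env())
    print("Interpreter prefix:", sys.prefix)
```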


Step 3: Install Databricks Connect

Install the databricks-connect package. According to the Databricks Connect usage requirements page in the Databricks on AWS documentation, Databricks Connect 17.2.x to 17.3.x is the supported range for "Serverless, version 4", and it requires Python 3.12.

pip install "databricks-connect==17.3.*"
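
To confirm which version pip actually resolved, the installed package metadata can be queried from Python. This is a sketch that assumes the 17.3.* pin used above; matches_pin and installed_connect_version are illustrative names, not Databricks APIs.

```python
from importlib.metadata import PackageNotFoundError, version


def matches_pin(installed, pin_prefix="17.3."):
    """Return True when the installed version string falls under the pinned series."""
    return installed.startswith(pin_prefix)


def installed_connect_version():
    """Return the installed databricks-connect version, or None if it is not installed."""
    try:
        return version("databricks-connect")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_connect_version()
    if found is None:
        print("databricks-connect is not installed in this environment")
    else:
        print(f"databricks-connect {found}; matches 17.3.* pin: {matches_pin(found)}")
```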

Step 4: Authenticate with Your AWS Databricks Workspace

Use the Databricks CLI to authenticate and populate a named profile in ~/.databrickscfg. Here we create a profile called dev:

databricks auth login --host https://your-workspace.cloud.databricks.com -p dev

This opens your browser for OAuth login. It then writes the profile automatically to ~/.databrickscfg:

[dev]
host                  = https://your-workspace.cloud.databricks.com
auth_type             = databricks-cli
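
Before running any Spark code, it can help to confirm that the profile really landed in the config file. The file is INI-style, so Python's standard configparser can read it. This is a rough sketch: parse_profile and load_profile are hypothetical helpers, and the default path assumes the CLI wrote to ~/.databrickscfg as described above.

```python
import configparser
from pathlib import Path


def parse_profile(cfg_text, profile):
    """Return the key/value pairs of a named profile, or None if it is absent."""
    # interpolation=None keeps '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # Note: configparser treats [DEFAULT] specially, so this sketch only
    # handles explicitly named profiles such as [dev].
    if not parser.has_section(profile):
        return None
    return dict(parser[profile])


def load_profile(profile, cfg_path=None):
    """Read a profile from the given path, defaulting to ~/.databrickscfg."""
    path = Path(cfg_path) if cfg_path else Path.home() / ".databrickscfg"
    if not path.exists():
        return None
    return parse_profile(path.read_text(encoding="utf-8"), profile)
```

Calling load_profile("dev") should return a dictionary containing the host and auth_type entries shown above, or None if the login step did not complete.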

Step 5: Run Your Script

from databricks.connect import DatabricksSession


spark = DatabricksSession.builder.profile("dev").serverless(True).getOrCreate()

# Check the runtime version reported by the remote session
version = spark.sql("SELECT current_version()").collect()[0][0]
print(f"\n✅ Connected! Runtime: {version}\n")

df = spark.read.table("shipments.bronze.shipmentsdev3").limit(10)
df.select("shipment_id", "order_id", "customer_id", "carrier", "tracking_number").show(10, truncate=False)

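Running this prints the connection message with the runtime version, followed by the first 10 rows of the selected columns.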

How to Confirm Serverless

The runtime version string 18.0.x-photon-scala2.13 contains photon. Photon is Databricks' vectorized query engine, and it is enabled automatically on serverless compute on AWS. Classic all-purpose clusters use Photon only if it is manually enabled, so seeing photon in the version of a session built with .serverless(True) is a good sign that you are on serverless.
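
The version string can also be split into its parts programmatically rather than inspected by eye. The sketch below assumes the dash-separated layout seen in the example string (release, optional photon marker, Scala version); parse_runtime_version is a hypothetical helper, and other workspaces may report strings in a different shape.

```python
def parse_runtime_version(version_string):
    """Split a string such as '18.0.x-photon-scala2.13' into its components."""
    parts = version_string.strip().lower().split("-")
    # Pick out the Scala version from a segment like 'scala2.13', if present.
    scala = next(
        (part[len("scala"):] for part in parts[1:] if part.startswith("scala")),
        None,
    )
    return {
        "release": parts[0],
        "photon": "photon" in parts[1:],
        "scala": scala,
    }


if __name__ == "__main__":
    info = parse_runtime_version("18.0.x-photon-scala2.13")
    print(info)
```

The photon flag only says that Photon is active; it does not by itself distinguish serverless from a classic cluster that has Photon switched on.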