Why I Think Spark Will Have the Staying Power of SQL

Spark is to SQL what calculus is to algebra.

Michael Vedomske

October 27, 2017


Old-timer Just Keeps on Tickin’

SQL has been around for almost 40 years, at least in commercial form. That form dates to 1979, when Relational Software, Inc. (which later became Oracle) released Oracle Version 2 (a marketing rename of what was really version 1).

Think about that for a second. A hotshot SQL-skilled new hire fresh out of undergrad in 1979 would be retiring in just a couple of years. Some people still build their careers on SQL skills. In other words, SQL has had incredible staying power, and it still does.

Enter the Young Gun

So what does this have to do with AI? Well, I’m going to go out on a limb and say that we’re three years into a similar journey with another landscape-shifting technology: Spark. Spark 1.0 was released as an Apache project in May 2014. I happened to be a fresh hire at the time (albeit with a PhD, not an undergrad degree), and Spark was HOT. I mean, it was exactly what our company (and many others) needed, and every release just got better.

Here are a few reasons I believe Spark will stay relevant for years to come.

1. Daddy Warbucks Got Yer Back

SQL was backed by a strong company (read: it had commercial support) while also benefiting from the open-source efforts of outside contributors, and it was eventually standardized. Spark has the commercial support of Databricks, which is currently valued at nearly $1B only a few years into its existence. And as an Apache project, it is developed at an extremely rapid clip by a vibrant open-source community.

What’s probably even more important is the fact that it is used at so many large companies. In other words, it has weaseled its way into the core toolset of much of the world’s GDP. And that’s just the beginning, because according to Databricks, they’re still working on reaching the other 99%.

2. One Stop ML Shop

One of the things that made Spark really great was that it could run HQL (Hive Query Language) queries, and it soon had its own SQL-like language, Spark SQL. For the first time, data wrangling and machine learning could be executed on big data in one place, in well-known languages, at extraordinary speed. It was the holy grail of big data science.

Spark meets two primary needs:

Easy data wrangling (in a familiar approach: SQL)
Many of your favorite machine learning algorithms at scale.

In other words, SQL’s staying power, and its natural way of thinking about data, is what will help Spark have staying power too. Yes, many data stores today are NoSQL, but being able to use non-relational databases and think about the data in them as if they were relational is what makes Spark so powerful. Notice that SQL is still the reference point here: databases are even categorized by their relation to SQL (think “NoSQL”), and that’s saying something.

Spark can handle pretty much any data store you throw at it, and data scientists can use a common way of thinking about data (SQL) to handle it. You don’t have to use the SQL-like interface, but it’s there, and many take advantage of it. Don’t care for the SQL/HQL approach? That’s fine; you can use Spark the way many people use bash for data wrangling. Spark spans many skill levels.

3. It Feels Familiar

Because Spark has a machine learning library, you can use it much like you would use familiar data science languages such as R and Python. The usefulness here goes beyond just syntax; it’s the process that makes it so user-friendly.

Interactively playing with and exploring data is one of the most powerful parts of R and Python. You can very quickly start to peel back the layers and find the stories within the data. Before Spark, that process was painful and slow on big data (sorry, MapReduce 🙁). Suddenly, with Spark, working with very large data sets felt much more like what we experienced in R and Python. Sure, there was still some waiting, but nothing close to what it was before.

The other powerful part of R and Python is their packages, which contain numerous algorithms for machine learning (and just about any other data-related task you can think of). Spark offers this too through its machine learning library, MLlib, though in a more limited way because every algorithm has to be parallelized. Spark makes big data feel a little smaller. In today’s parlance, the user experience is solid.

Apache Spark Architecture – See You In 40 Years

SQL made working with data much simpler. For the first time, people could use straightforward logic and language to get at previously hidden knowledge. Spark is the next natural step in that evolution. In this step, the hidden knowledge is less explicit and is found via feature engineering, machine learning, and dipping into vast stores of previously untapped data. Spark makes these things simple in the same way that SQL made the first step of data exploration simple.

Spark is to SQL what calculus is to algebra. And that’s why I think Spark will have the staying power of SQL.
