Apache Spark and Databricks

What is Apache Spark?

  • An open-source big data processing engine, widely used for data science
  • Big data covers massive data volumes, streaming data, and unstructured or semi-structured data such as images, video, and sound.
  • There is no built-in IDE; you need to bring your own tools
  • It is a query and data analytics engine; it is meant to run queries
  • It is NOT a storage engine. You store the data in a separate storage layer such as Amazon S3, a data lake, or HDFS
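
To make the split between query engine and storage concrete, here is a minimal PySpark sketch. It assumes PySpark and a Java runtime are installed, and it uses a small local CSV file as a stand-in for a real storage layer such as S3 or HDFS. The helper names (write_sample_csv, total_per_customer), the file name, and the column names are hypothetical and chosen only for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_csv(directory: str) -> str:
    """Write a tiny CSV file that stands in for data held in a storage layer."""
    path = Path(directory) / "events.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer", "amount"])
        writer.writerows([["ann", 10], ["bob", 5], ["ann", 7]])
    return str(path)


def total_per_customer(path: str) -> dict:
    """Use Spark purely as a query engine over a file it does not own."""
    # Imported here so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    # Assumption: a single-threaded local session is enough for a demo.
    spark = SparkSession.builder.master("local[1]").appName("sketch").getOrCreate()
    try:
        # Spark reads the data from wherever it lives; it does not store it.
        df = spark.read.csv(path, header=True, inferSchema=True)
        df.createOrReplaceTempView("events")
        rows = spark.sql(
            "SELECT customer, SUM(amount) AS total FROM events GROUP BY customer"
        ).collect()
        return {row["customer"]: row["total"] for row in rows}
    finally:
        spark.stop()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(total_per_customer(write_sample_csv(tmp)))
```

In a real deployment the path would typically point at the storage layer instead of a temporary directory, and the query itself would stay the same.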

What is Databricks?

  • Commercial product from the creators of Apache Spark
  • Complete development environment for Apache Spark
  • Numerous proprietary Spark enhancements
  • Ideal for Data Science team collaboration
  • Optimized for the cloud; I don't believe you can spin it up in your own data center
