What is Apache Spark?
- An open-source big data platform for data science
- "Big data" covers massive data volumes, streaming data, and unstructured or semi-structured data such as images, video, and sound
- There is no built-in IDE; you need to bring your own tools
- It is a query/data analytics engine: it is meant to run queries
- It is NOT a storage engine. Data lives in a separate storage layer such as S3, a data lake, or HDFS
What is Databricks?
- Commercial product from the creators of Apache Spark
- Complete development environment for Apache Spark
- Numerous proprietary Spark enhancements
- Ideal for Data Science team collaboration
- Optimized for the cloud (AWS, Azure, Google Cloud); as far as I know, you cannot spin it up in your own data center