Powering Interactive BI Analytics with Presto and Delta Lake


Hi everyone, this is Kamil here, and I will be talking about Presto, for those of you who don't know what it is, about the enterprise adoption of Presto, and about the work that we've done to bring Presto, Spark, and Delta Lake together so you can get better results for your team. Presto is a community-driven open source project and a high-performance MPP SQL engine, designed to be geared towards performance on SQL analytics.
One of the key capabilities of Presto is that compute and storage are separate, so you can scale those two independently, adding compute nodes as needed to balance cost and performance. Presto is just a compute layer: it handles the execution of the query and doesn't worry about storing the data. You bring your own storage, whatever is most effective for your use case, and through its connector extension points Presto can query a variety of different sources and do correlations of data across them in your analytical queries. You can run Presto basically anywhere; there are deployments of Presto on premises and in the cloud, and it works in any of those environments.
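Just to make that concrete, here is a minimal sketch of querying Presto from Python with the open source presto-python-client; the host, catalog, schema, and table names are all made up.

    import prestodb  # pip install presto-python-client

    # Presto is only the compute layer: the catalog decides which storage it
    # reaches out to (a data lake via Hive, Delta, PostgreSQL, and so on).
    conn = prestodb.dbapi.connect(
        host="presto.example.com",
        port=8080,
        user="analyst",
        catalog="hive",
        schema="analytics",
    )
    cur = conn.cursor()
    cur.execute("SELECT country, count(*) FROM customers GROUP BY country")
    print(cur.fetchall())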
That puts the engine wherever the center of gravity for that data is. Presto was originally developed by the team at Facebook, and you can see some of the user logos here on the screen; it's a fast-growing SQL engine with very large deployments, especially at web-scale companies running thousands of nodes, which pushes the limits on scale and performance. Starburst is the enterprise Presto company,
and we add enterprise enhancements and integrations: security and permissions, integration with LDAP, and additional connectors for enterprise connectivity needs such as querying Oracle, Teradata, and DB2. Those are all packaged to run in enterprise settings, with autoscaling, HA, and monitoring, so you can operate in those different environments effectively. Many of the leading Presto contributors and committers are behind the company right now; we provide patches and 24-by-7 support, and we help drive the contributor and committer roadmap, enhancing Presto for everyone. If you choose to go with our platform, there are additional benefits as well. So that's how Presto can connect to the data you have already. So why add Delta Lake?
Delta Lake was open sourced last year and it's a really exciting technology with a lot of benefits; I'll mention several of those here. It brings ACID properties over the data lake, so you can update and insert individual rows without having to build your own frameworks to do that, right on the table, which is amazing, and it just works, so that's great. Our initial implementation is built on the connectivity we have for Hive, and we plan to extend it in the future as well. The data in Delta Lake is stored as Parquet files, which is where the industry is going when it comes to storing analytical data these days, and that's amazing.
There are also capabilities around schema evolution, which has long been a pain point in the analytics space. To show more of the benefits of Delta: data files can be arranged in a special order, which helps many analytical queries. We're very thankful to Databricks here, and many of our customers are also users of Delta, so it made a lot of sense for us at Starburst to make sure Presto can query Delta effectively.
Our Delta reader has Hive metastore integration, understands the Delta Lake properties, and takes the Delta transaction log, with its statistics about the data, as an input through Presto. That allows us to plan and execute queries effectively and to combine Delta data with data from different sources, in a reader built for Presto here.
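To give a rough picture of what that transaction log input contains, here is a small sketch that lists the data files and per-file statistics a reader can pick up from a Delta table's log; the table path is hypothetical, and a real reader also has to apply remove actions and checkpoint files.

    import json
    from pathlib import Path

    # Each commit under _delta_log/ is newline-delimited JSON; "add" actions name
    # the Parquet files that make up the table and can carry per-file statistics.
    log_dir = Path("/data/warehouse/orders/_delta_log")  # hypothetical table location

    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            action = json.loads(line)
            add = action.get("add")
            if add is None:
                continue
            stats = json.loads(add["stats"]) if add.get("stats") else {}
            print(add["path"], add["size"], stats.get("numRecords"),
                  stats.get("minValues"), stats.get("maxValues"))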
In the very first version of that reader we benchmarked a Delta table against the same data read as a plain collection of Parquet files, and on average (audio cuts out) obviously those queries benefit. The queries observed in this benchmark scan the data and do some aggregation, on top of the foundation that we've built for the Delta reader, and we're actually even more enthusiastic about what's ahead: we've seen speedups of over 10x versus the previous solution, which helps with performance overall.
You can try Delta with Starburst Presto today; there's a link here on the screen. Obviously it took engineering in addition to the open source pieces to bring these things together right in one environment.
Databricks and Starburst complement each other, with Presto as a fast SQL layer across your sources. What we've done, leveraging open source Presto, is add connectors to more sources, with the enterprise systems I mentioned, Oracle, Teradata, and DB2, being the primary examples, plus storage engines for on premises and on all the clouds obviously, Cassandra, et cetera. Federating those different sources is really, really powerful, and we know from working with those sources how to do it effectively; that's something you can call on as part of our platform.
On the security side, your data can stay encrypted at rest and encrypted on the wire between the client tool and the Presto cluster; there is fine-grained access control, and you can also apply row filters.
We have integrations and certifications on the BI side (presenter mumbles); one of those tools by now is part of Databricks, and they speak to Presto natively and benefit from the broad set of connectors.
A typical deployment leverages Starburst and Presto alongside a data processing engine, and often that's Databricks Spark. We think those technologies complement each other well. Spark is great for data engineering and machine learning jobs, and obviously for managing the data lake, whether you leverage Spark SQL or the other APIs Databricks provides inside Spark. Presto was designed for, and excels in, serving lots of queries at the same time for BI, reporting, and analytics, and it can federate different sources: Delta and Parquet files obviously, but also relational databases, NoSQL engines, and more. Together they provide a lot of value, and the fast performance of Presto and Starburst is one reason there are so many joint customers.
If you look at the data and analytics ecosystem, you start by having your raw data sources landing in the data lake, and Delta Lake specifically; then there's the machine learning side, in SageMaker for example, for those use cases; and SQL is needed all over the place. Presto is the perfect answer for responsive SQL access from BI tools and SQL editors for analytical purposes. You'll see the service later in the demo.
You can also (audio cuts out) run it on the Red Hat OpenShift platform and deploy it in many different ways as well, whichever is adopted at your company. It's an integrated way to do all of those things, and it's simplified drastically when you bring all this together with the right architecture and technology.
So what is the architecture that we can show here, the one we want to advocate for and that is already being leveraged by joint Databricks and Starburst customers? Data is streaming into Delta Lake and moves through those different layers, into the silver layer of the Delta Lake store and then the gold layer, the aggregate store, for fast analytics. The data engineering happens in Databricks, and Presto provides fast, highly concurrent SQL on your aggregate store, or on any of those tables if needed. Not all data has its center of gravity there, however; you still have Oracle, DB2, et cetera,
with data coming from other sources, like more textual data sets, comments, less structured data. Data is spread across so many places, and at the same time you quite often need to analyze it together; that's what we provide here with Starburst, so you don't have to build separate pipelines for those pathways. You'll see it in the demo in a moment: traditional enterprise data warehouses and operational data sources as well, serving further analytical needs in the overall architecture.
You don't have to have all the data in one place to get a single source of truth for your data; you can point Presto at this data where it lives, without copying or extracting the data into one place. Imagine, for example, that some of it sits in Elasticsearch: Presto can push down some processing there and bring the results into the relational world for them to be analyzed. Obviously the end users then connect with a BI tool or SQL editor, such as Looker,
or anything that speaks JDBC or ODBC drivers. Let me show all of this in a demo and put those things together. I have a Databricks notebook here running a structured streaming ingestion; I make sure the connectivity is live and run the command to receive this data, and the stream is actually live, writing into Delta Lake.
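For context, the structured streaming ingestion in a notebook like that might look roughly like this; it's only a sketch, assuming a Databricks notebook where spark is already defined, with a made-up schema, source path, and output location.

    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    # Hypothetical schema for the incoming customer order events.
    schema = StructType([
        StructField("customer_id", StringType()),
        StructField("country", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Read a live stream of JSON files and append it to a Delta table that
    # Presto can then query; paths are placeholders.
    events = (spark.readStream
              .schema(schema)
              .json("/mnt/raw/orders"))

    (events.writeStream
           .format("delta")
           .outputMode("append")
           .option("checkpointLocation", "/mnt/checkpoints/orders")
           .start("/mnt/delta/orders"))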
Then I switch to DBeaver, which is a SQL editor tool, and from the DBeaver client I can see all the different data sources, as you can see now; the Presto cluster is set up in the Amazon cloud, and I can query each source quickly as well, individually. Several of them hold some customer information, so bringing this all together I can correlate this information between all these customer tables, and you can see the query running here in the Presto web UI.
It's a query with four tables, right there on the DBeaver screen.
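As a sketch of what a federated join of that shape can look like, with hypothetical catalog and table names, here is the same Python client issuing one query that spans four sources.

    import prestodb  # pip install presto-python-client

    conn = prestodb.dbapi.connect(
        host="presto.example.com", port=8080, user="analyst",
        catalog="delta", schema="analytics",
    )
    cur = conn.cursor()

    # Four tables from different catalogs joined in a single Presto query,
    # aggregated per country the way the dashboard slices it.
    cur.execute("""
        SELECT geo.country, count(*) AS orders, sum(o.amount) AS revenue
        FROM delta.analytics.orders o
        JOIN postgresql.public.customers c ON o.customer_id = c.id
        JOIN mysql.crm.accounts a ON c.account_id = a.id
        JOIN hive.reference.geo geo ON a.country_code = geo.code
        GROUP BY geo.country
        ORDER BY revenue DESC
    """)
    for country, orders, revenue in cur.fetchall():
        print(country, orders, revenue)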
Then a classic BI tool, Tableau, presents all the same information in a geographical dashboard showing which countries the customers are coming from; as new data arrives it's reflected on the dashboard, again by federating all the sources, with per-country and per-region, report-style visualizations and interactive live queries that hit the sources during query time. That's the power of both of those technologies together. With that, I'm happy to answer any questions.