Hey Graph’ers!
Are you ready for a special AMA with members of the core dev teams working on The Graph’s new roadmap - the New Era of The Graph? This isn’t just a new chapter for The Graph—it marks a transformative evolution in the world of web3, aiming to empower developers, boost the ecosystem, and redefine what’s possible with decentralized data.
📅 Join this special AMA from Tuesday, November 28, to Thursday, November 30, 2023.
Because core dev team members span multiple time zones, responses to your questions will be staggered, offering a continuous and evolving dialogue over three days. Feel free to ask questions at your convenience and return often to see new answers and participate in the ongoing discussions. Community members and moderators from The Graph’s Reddit channel will be on hand to guide the AMA and ensure your questions are addressed.
Meet the AMA participants, all from the core dev teams at The Graph:
- Adam Fuller - Product Manager, Edge & Node
- Alex Bourget - Co-founder & CTO, StreamingFast
- Chris Wessels - Founder, GraphOps
- Daniel Keyes - Founder & CEO, Pinax
- Eva Beylin - Director, The Graph Foundation
- Sam Green - Co-founder & Head of Research, Semiotic Labs
- Uri Goldshtein - Founder, The Guild
- Vincent Wen - Engineering Manager, Messari
- Yaniv Tal - Founder & CEO, Geo
✨ The New Era promises a suite of new data services and features that are set to drive the next generation of decentralized applications. From new tooling and features to updates and upgrades, The Graph is empowering developers and ecosystem contributors. You can read the official announcement here and on Twitter.
The roadmap is structured around five core objectives:
- World of Data Services: Expanding beyond subgraphs to deliver a rich market of data services on the network (e.g., new query languages, LLMs, etc.)
- Developer Empowerment: Supporting developers through enhanced DevEx and tooling (e.g., Sunrise of Decentralized Data, upgrade Indexer, etc.)
- Protocol Evolution and Resiliency: Delivering a more resilient, flexible, and efficient protocol
- Optimized Indexer Performance: Boosting Indexer performance with improved tooling and operational capabilities
- Interconnected Graph of Data: Creating tools for composable data and an organized knowledge graph
At the center of The Graph protocol is the power of community - so let’s hear your thoughts, feedback, and of course, answer any questions you may have about this New Era. Whether you’re curious about specific features, the roadmap’s objectives, or how you can get involved, the core devs are here to chat.
🌐 So, let’s dive in - ask the core devs anything about The New Era of The Graph!
Please note that this AMA will adhere to this channel’s Moderation & Administration Policy:
https://www.reddit.com/r/thegraph/comments/l0t81p/welcome_to_the_official_subreddit_for_the_graph/
Thanks for hosting this, and providing a glimpse of the future at datapalooza.
The separation of Extraction, Transformation, Loading, and Querying data seems to be key to accelerating the availability and flexibility of the data provided by The Graph. Sam’s announcement of bringing ClickHouse SQL to The Graph really excites me, as I’m currently wasting a lot of time writing/maintaining code to make aggregations over data as part of the transform step, instead of aggregating at query time.
What can we expect to see for the rollout of ClickHouse SQL on The Graph?
Since this depends on Substreams, which in turn depends on Firehose, what steps are needed to get Substreams working on OP Stack chains?
Will there be a way to get an “event substream” without call handlers, shipped earlier than the full Firehose implementation for OP Stack chains, since this can be done with just an RPC instead of instrumenting op-geth or op-reth?
Thanks.
Hi u/Drewsapple! This is Sam from Semiotic Labs. Regarding your rollout question, here’s the current status:
* We currently have Substreams to ClickHouse working well
* We have recently prototyped the SQL API
* We have a sketch for how to handle DBT experiments by the developer
* The plan is to get SQL queries on the network by Q1 2024
* We are very interested in learning more about our developers’ specific use cases for SQL. Please dm me if you would be interested in chatting!
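To make the query-time aggregation idea concrete, here is a minimal, hypothetical sketch. The transfers table, its columns, and the sample values are invented for illustration, and Python’s built-in sqlite3 stands in for ClickHouse (whose SQL dialect and engine differ substantially); the point is simply that totals which used to be materialized in the transform step can instead be computed by the query itself.

```python
# Illustrative only: a query-time aggregation, the kind a SQL data service
# could serve. Table and column names (transfers, block_time, sender, amount)
# are hypothetical; sqlite3 stands in for ClickHouse here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transfers (block_time TEXT, sender TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?)",
    [
        ("2023-11-28", "0xabc", 100),
        ("2023-11-28", "0xdef", 250),
        ("2023-11-29", "0xabc", 50),
    ],
)

# Aggregate at query time instead of precomputing totals in the transform step.
rows = conn.execute(
    "SELECT block_time, SUM(amount) FROM transfers "
    "GROUP BY block_time ORDER BY block_time"
).fetchall()
print(rows)  # [('2023-11-28', 350), ('2023-11-29', 50)]
```

The same GROUP BY would run against ClickHouse once the SQL API is live, with the transform step left to emit raw rows only.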
Pinax will answer your OP stack question :)
Hey, I’m Daniel Keyes, CEO of Pinax, and we’re very pleased to be here participating in this AMA.
Thanks for asking these great questions. For SQL data services, Pinax is currently investigating how to deploy these services in a performant, modular, and reliable way. We’ll work closely with StreamingFast and Semiotic to improve the workflow as an operator of these services.
For Firehose, Pinax is working on adding RPC nodes for many EVM chains (if you want to see which ones, check the hosted service list of supported blockchains here: https://thegraph.com/docs/en/developing/supported-networks/). The StreamingFast team is working on a Firehose “light” stream that will not need to have deep instrumentation.
There will be some discussion on this topic in the Monthly Core Dev call this coming Thursday if you want to learn more. This page has info on how to access recordings of previous Core Dev calls and how to join in the future.
Hey apple :)
I’d love to learn more about your use case, and where you’re building aggregations in the transform step. Is that within Substreams? If so, we’re working hard to make this simpler and simpler (see the substreams CLI changelog, in particular around the init command).
That being said, we very much understand there’s a whole lot that is best done at query time. That’s why we’re putting a lot of effort into the SQL sink (https://github.com/streamingfast/substreams-sink-sql). It already has a high-throughput injector, reorg navigation (which we just released) for Postgres, support for ClickHouse, and a bunch of other features.
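At its core, reorg navigation in a sink means every row remembers which block wrote it, so a fork can be undone by deleting rows above the last canonical block. The sketch below is not the real substreams-sink-sql implementation; the schema, function names, and sqlite3 backend are all illustrative.

```python
# Minimal sketch of reorg handling in a SQL sink: rows carry the block number
# that produced them, so when the chain forks, everything above the last valid
# block is deleted and re-injected from the canonical chain. Illustrative only.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entries (block_num INTEGER, key TEXT, value TEXT)")

def insert_entry(block_num, key, value):
    db.execute("INSERT INTO entries VALUES (?, ?, ?)", (block_num, key, value))

def handle_reorg(last_valid_block):
    # Undo everything written by blocks that were forked out.
    db.execute("DELETE FROM entries WHERE block_num > ?", (last_valid_block,))

insert_entry(100, "supply", "1000")
insert_entry(101, "supply", "1010")
insert_entry(102, "supply", "1025")   # block 102 later forks out

handle_reorg(101)                     # chain reorged back to block 101
remaining = [r[0] for r in db.execute("SELECT block_num FROM entries")]
print(remaining)  # [100, 101]
```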
This SQL sink is also what we’re turning into a deployable unit, shippable to The Graph network eventually. You have our first take at it here: https://substreams.streamingfast.io/tutorials/substreams-sql … but I think it’ll evolve quite a bit. The goal is that indexers can run those deployment endpoints, and even some gateways can accept deployment requests and decide where to optimally run workloads.
Our goal is to make it as easy as possible for you to think of a data service, pluck some from the community, and have them running on your behalf somewhere on The Graph network.
We’ve just recently closed this issue: https://github.com/streamingfast/substreams/issues/278 and we’ve rolled out an RPC poller for Firehose Ethereum that requires only an RPC node. The data is lighter, but we can cover many more chains much faster.
Using this method, we’ve backfilled the Arbitrum network (the pre-Nitro “Classic” era), and we’ll be syncing one chain after another this way. We’re currently syncing Bitcoin Core (!) using this new method. OP is next on our list, but with a few instructions one could start using it right away. We’ve crafted a more precise definition of a Firehose extractor (you can read about it here: https://github.com/streamingfast/firehose-core/issues/17) and have implemented the RPC poller using this interface. Our goal is to speed up chain coverage by simplifying extraction and not always requiring core chain instrumentation. And if people want better performance than some RPC nodes allow (which induce a bit of latency), deeper instrumentation can still be added after the fact.
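The polling approach described above can be sketched in a few lines: walk the chain block by block over plain RPC, checking that each block links to its parent so a fork is noticed. This is a toy model, not the firehose-core implementation; fetch_block stands in for an eth_getBlockByNumber-style call, and the in-memory chain is fake.

```python
# Sketch of a "light" extractor that polls an RPC endpoint block by block
# instead of instrumenting the node. fetch_block is a stand-in for an
# eth_getBlockByNumber-style call; the real poller lives in firehose-core.
def poll_blocks(fetch_block, start, head):
    """Yield blocks in order, verifying each links to its parent."""
    parent_hash = None
    for number in range(start, head + 1):
        block = fetch_block(number)
        if parent_hash is not None and block["parent_hash"] != parent_hash:
            raise RuntimeError(f"fork detected at block {number}")
        parent_hash = block["hash"]
        yield block

# Fake in-memory chain standing in for an RPC node, for illustration.
chain = {
    0: {"number": 0, "hash": "h0", "parent_hash": None},
    1: {"number": 1, "hash": "h1", "parent_hash": "h0"},
    2: {"number": 2, "hash": "h2", "parent_hash": "h1"},
}
blocks = list(poll_blocks(chain.__getitem__, 0, 2))
print([b["number"] for b in blocks])  # [0, 1, 2]
```

Trading the deep instrumentation of an extractor built into the node for this kind of polling is what makes covering many chains quickly feasible, at the cost of lighter data and a bit of RPC latency.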
I think this addresses your last question too ^^.
Thanks for reaching out!
- Alexandre Bourget, CTO at StreamingFast.io