    Hey apple :)

    > I’m currently wasting a lot of time writing/maintaining code to make aggregations over data as part of the transform step, instead of aggregating at query-time.

    I’d love to learn more about your use case, and where you’re building aggregations in the transform step. Is that within Substreams? If so, we’re working hard to make this simpler and simpler. For instance, we’ll be:

    1. working on a WASI-compatible target, so you can write modules in many languages, not only Rust, and leverage existing libraries and the skills you already have.
    2. building more and more code-generation tools, so you can get off the ground much more quickly, for instance having all those dynamic data source patterns built for you automatically. We’ve started that with ABI-to-database-table generation (check the latest substreams CLI changelog, under the init command).
    3. supporting community efforts to build DSLs and higher-level libraries in Rust, so you can do more with less code.
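
    To make the transform-step aggregation concrete, here’s a rough sketch of what that hand-written code tends to look like as a Substreams store module in Rust. The `Transfer`/`Transfers` messages are hypothetical stand-ins for whatever your map module emits, and while the additive store shown follows the `substreams` crate’s API, treat the exact signatures as illustrative:

    ```rust
    use substreams::scalar::BigInt;
    use substreams::store::{StoreAdd, StoreAddBigInt};

    // Hypothetical protobuf messages from an upstream map module; in a real
    // project these would be generated from your own .proto definitions.
    #[derive(Clone, PartialEq, prost::Message)]
    pub struct Transfer {
        #[prost(uint64, tag = "1")]
        pub ordinal: u64,
        #[prost(uint64, tag = "2")]
        pub timestamp: u64, // seconds since the epoch
        #[prost(uint64, tag = "3")]
        pub amount: u64,
    }

    #[derive(Clone, PartialEq, prost::Message)]
    pub struct Transfers {
        #[prost(message, repeated, tag = "1")]
        pub transfers: Vec<Transfer>,
    }

    // A store module accumulating per-day transfer volume: exactly the kind
    // of aggregation code that a query-time SUM()/GROUP BY could replace.
    #[substreams::handlers::store]
    fn store_daily_volume(transfers: Transfers, store: StoreAddBigInt) {
        for t in transfers.transfers {
            let day = t.timestamp / 86_400; // bucket by UTC day
            store.add(t.ordinal, format!("volume:day:{day}"), BigInt::from(t.amount));
        }
    }
    ```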

    That being said, we very much understand that a whole lot of things are best done at query time. That’s why we’re putting a lot of effort into the SQL sink (https://github.com/streamingfast/substreams-sink-sql). It already has a high-throughput injector, reorg navigation for Postgres (which we just released), support for ClickHouse, and a bunch of other features.
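
    For context on how rows get into that sink: the usual pattern is a `db_out` map module that emits `DatabaseChanges`, keeping rows raw so the aggregation happens in SQL at query time. A minimal sketch, assuming the `substreams-database-change` crate and the same hypothetical `Transfers` message as above (method names illustrative):

    ```rust
    use substreams::errors::Error;
    use substreams_database_change::pb::database::DatabaseChanges;
    use substreams_database_change::tables::Tables;

    // Emit one raw row per transfer; SUM()/GROUP BY then run in Postgres or
    // ClickHouse at query time, instead of in the transform step.
    #[substreams::handlers::map]
    fn db_out(transfers: Transfers) -> Result<DatabaseChanges, Error> {
        let mut tables = Tables::new();
        for t in transfers.transfers {
            tables
                .create_row("transfers", t.ordinal.to_string())
                .set("timestamp", t.timestamp)
                .set("amount", t.amount);
        }
        Ok(tables.to_database_changes())
    }
    ```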

    > What can we expect to see for the rollout of ClickHouse SQL on The Graph?

    This SQL sink is also what we’re turning into a deployable unit, eventually shippable to The Graph network. You can see our first take at it here: https://substreams.streamingfast.io/tutorials/substreams-sql … but I think it’ll evolve quite a bit. The goal is for indexers to run those deployment endpoints, and even for some gateways to accept deployment requests and decide where to run workloads optimally.

    Our goal is to make it as easy as possible for you to think up a data service, or pluck some from the community, and have them running on your behalf somewhere on The Graph network.

    > Since this is dependent on Substreams, which in turn depends on Firehose, what steps are needed to get Substreams working on OP Stack chains?

    We’ve just recently closed this issue: https://github.com/streamingfast/substreams/issues/278 and rolled out the RPC poller for Firehose Ethereum, which requires only an RPC node. The data is lighter, but we can reach many more chains much faster.

    Using this method, we’ve backfilled the Arbitrum network (the pre-Nitro era, called “Classic”). We’ll be syncing one chain after the other this way; we’re currently syncing Bitcoin Core (!). OP is next on our list, but with a few instructions one could start using it right away. We’ve crafted a more precise definition of a Firehose extractor (you can read about it here: https://github.com/streamingfast/firehose-core/issues/17) and have implemented the RPC poller using this interface. Our goal is to speed up chain coverage by simplifying extraction, not always requiring core chain instrumentation. If people want better performance (avoiding the bit of latency some RPC nodes induce), deeper instrumentation can still be added after the fact.
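
    To illustrate the extractor idea: conceptually, an RPC poller just asks a node for the next block and hands it to the Firehose pipeline, retrying at the head of the chain. The real interface is defined in Go in firehose-core (see the issue above); the Rust trait below is purely a conceptual sketch, with every name invented for illustration:

    ```rust
    // Conceptual sketch only: the actual extractor interface lives in Go in
    // firehose-core. All names here are invented for illustration.
    use std::time::Duration;

    /// Anything that can fetch a block by number. Only this part (an RPC
    /// call, in practice) is chain-specific.
    trait BlockExtractor {
        type Block;
        type Error;

        /// Fetch block `number`, or None if the chain hasn't reached it yet.
        fn fetch_block(&self, number: u64) -> Result<Option<Self::Block>, Self::Error>;
    }

    /// Generic polling loop that any chain's extractor could share.
    fn poll_blocks<E: BlockExtractor>(
        extractor: &E,
        mut next: u64,
        mut emit: impl FnMut(E::Block),
    ) -> Result<(), E::Error> {
        loop {
            match extractor.fetch_block(next)? {
                Some(block) => {
                    emit(block); // hand the block to the Firehose pipeline
                    next += 1;
                }
                // At the chain head: wait and retry, trading a bit of latency
                // for not needing core chain instrumentation.
                None => std::thread::sleep(Duration::from_millis(500)),
            }
        }
    }
    ```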

    I think this addresses your last question too ^^.

    Thanks for reaching out!

    - Alexandre Bourget, CTO at StreamingFast.io