SQL on Ethereum: How to Work With All the Data from a Transactio…

SQL on Ethereum: How to Work With All the Data from a Transaction

October 4th, 2021

if you’re looking for more web3 data content, check out my 30-day free course (with videos)!

What's in your wallet?

If you've ever made a transaction on Ethereum (or any smart contract enabled blockchain), then you've probably looked it up on a block explorer and seen this heap of information:

and that's just the information on the first tab

Learning to read the details of a transaction will be the foundation for all your Ethereum data analysis and knowledge, so let's cover all the pieces and how to work with them in SQL. I'll be using Dune Analytics to run my queries, but there are many other tools you can use to query the chain such as Big Query and Flipside Crypto.

If you're completely new to SQL and Ethereum I recommend starting with my full beginners' overview first.

Guide Overview

We're going to cover transactions in four layers:

Transaction basics
Function calls and state
Internal Transactions (Traces)
Logs (Events Emitted)

As the base for our transaction examples, we'll be using the Mirror Crowdfund contract. Put simply, this is a smart contract that allows you to get ERC20 (fungible) or ERC721 (NFTs) tokens in exchange for donating ETH to the contract. The creator of the contract can then withdraw those funds by closing the crowdfund. This is by no means a simple contract, but the point I want to make here is that you don't need to understand all the solidity code to start your analysis - you just need to know how to navigate the four layers above.

The three transactions we'll study are:

Creation/deployment of the crowdfund contract
Contributions of ETH to the contract
Closing and withdrawing funds from the contract

Side note, we also just opened up crowdfunds for anyone to use, so if you're curious or want to create a crowdfund head to mirror.xyz/dashboard to get started. Hop into our discord while you're at it!

Creation/Deployment of the crowdfund contract

First Transaction: 0x5e5ef5dd9d147028f9bc21127e3de774a80c56a2e510d95f41984e6b7af1b8db

Let's start with the transaction basics.

Each transaction has a unique keccak256 transaction hash of a few different variables
There's a blocknumber associated based on when the transaction was mined, typically a new block is created every 15 seconds.
From is the one who signed the transaction, To is the contract address that was interacted with
Value is the ETH value that was transferred from the signer's wallet. Even if that value is 0 that doesn't mean that no ETH was transferred during the transaction.
Gas is a bit complicated (especially with EIP-1559), but just keep this formula in mind:

Gas Price * Gas Used by Transaction = Transaction Fee

Now for the meat and bones, the input data of a transaction:

This is just bytecode for any function call and the parameters passed in. The first 8 characters (4 bytes) are the function signature 0x849a3aa3, essentially a hash of the function name and parameter types. And no, these are not always unique which can lead to hacks/security issues. In this case, this function calls the factory contract to create the crowdfund contract (it's a proxy, but we won't get into that).

createCrowdfund((uint256,uint256,bytes32)[], (address,uint256), string, string, address, address, uint256, uint256, uint256)

This shows up if you click "decode input data", and you can see the various variables values set as well. Every subsequent 64 characters (32 bytes) is a different input variable. The crowdfund comes with three tiers of editions. In this crowdfund for BLVKHVND they used quantities of 1000, 250, and 50 with prices of 0.1, 0.3, and 1 ETH respectively.

Notice that the price actually shows up as 100000000000000000 , which is because the first 18 zeroes represent decimals. We'll have to do the conversions by dividing by 10^18 in our data.

That was a lot, let's get to querying. Dune has a table called ethereum.transactions which has all the variables we've talked about above for every transaction since the first block. We can query this table for the appearance of 0x849a3aa3 in the last few months:

SELECT * FROM ethereum.transactions 
WHERE "block_time" > now() - interval '3 months'
AND "data" is not null
AND SUBSTRING ( encode("data", 'hex'), 1, 8 ) = '849a3aa3'

ethereum.transactions is a very large table, so if you query without filters the query is going to timeout (taking more than 30 minutes). Filtering by block_time is usually most useful, and in this case we're taking all the rows that have occurred within 3 months. Also, many transactions are just ETH transfers without any data attached so we'll filter that out by only keeping data is not null. Now for checking for the function signature, we need to encode the data into a string from hexadecimal, then take only the characters from position 1 to position 8 using SUBSTRING.

Now the complicated parts, internal transactions and events emitted. For this, it'll be easier to look at the code. If you go to the contract tab on etherscan and do a ctrl+f on file 1 of 10 you'll find the following code (I've edited out some bits to make this more readable).

function createCrowdfund(
        ...variables...
    ) external returns (address crowdfundProxy) {
        ...some variable prep code...

        crowdfundProxy = address(
            new CrowdfundWithPodiumEditionsProxy{
                salt: keccak256(abi.encode(symbol_, operator_))
            }(treasuryConfig, operator_)
        );

        emit CrowdfundDeployed(crowdfundProxy, name_, symbol_, operator_);

        ...register to treasury code...
    }

The first key line here is crowdfundProxy = address(contract_to_be_created), which is what deploys the new contract and creates an internal transaction of type CREATE 0. Transferring ETH also creates an internal transaction of type CALL , which we'll see in the next transaction we study.

We can query for all the crowdfund contracts created with:

SELECT tx."block_time", tx."from", tr."type", tr."code"
FROM ethereum.transactions tx 
LEFT JOIN ethereum.traces tr ON tx."hash"=tr."tx_hash" --tracks internal transactions
WHERE tx."to" = '\x15312b97389a1dc3bcaba7ae58ebbd552e606ed2' -- crowdfund podiums edition
AND tr."type" = 'create'

We need ethereum.transactions because we want to filter for traces (internal transactions) only related to transactions on the factory contract. We need this since an internal transaction will not always have the same to as that of the overall transaction. We can JOIN the tables on the transaction hash, and then filter for only internal transactions of the create type.

The second key line here is emit CrowdfundDeployed, which creates a log that is stored in the node but not in the block. If you look at the logs, you'll notice that EditionCreated events are also emitted, but this is from another contract that actually creates the ERC721 tokens (hence a different address).

Similar to a function signature, events have a unique hash as well that sits in Topic 0. So in the events above, 0x5133bb164b64ffa4461bc0c782a5c0e71cdc9d6c6ef5aa9af84f7fd2cd966d8e is the hash for CrowdfundDeployed and 0xbaf1f6ab5aa5406df2735e70c52585e630f9744f4ecdedd8b619e983e927f0b6 is the hash for EditionCreated.

We can query the ethereum.logs table in dune to see all crowdfunds created as well:

SELECT * FROM ethereum.logs
WHERE "topic1"='\x5133bb164b64ffa4461bc0c782a5c0e71cdc9d6c6ef5aa9af84f7fd2cd966d8e'::bytea

topic2 and topic3 typically hold the data for ETH transfers, otherwise, event data will show up in the data column. We'll get more into how to work with this later.

Logs are very helpful, as they can be used to emit state variables instead of just the function call values (TheGraph uses logs to model subgraphs for GraphQL queries). Next, we'll utilize everything we've covered to study the contributions of ETH to our newly created crowdfund contract (sitting at the address 0x320d83769eb64096ea74b686eb586e197997f930 ).

If you've made it this far, then you're already through all the tough concepts. Give yourself a pat on the back! We'll really be getting into the details in the next two sections, so take a breather if you need to.

Contributions of ETH to the contract

Second Transaction: 0xd4ce80a5ee62190c5f5d5a5a7e95ba7751c8f3ef63ea0e4b65a1abfdbbb9d1ef

This one is fairly simple to read. Jesse paid 1 ETH to mint an edition of tokenId 167 from the BLVKHVND crowdfund. He also got 1000 HVND, the ERC20 token the crowdfund gives out based on the size of the donation.

But what if we wanted to see how much ETH has been contributed over time, or how many editions have been sold? Sometimes contracts will have a view function in Read Contract on etherscan where you can get total balances. But in this case, the contract doesn't have that.

Remember that function calls change the state data, which we'll need to piece together the overall state data by aggregating over transaction history. Sometimes the overall state of a contract can be emitted in events, such as with Compound V2's AccrueInterest event.

In our case, we'll need to do two things in one query to get to total ETH contributed:

get the transactions that have the "contribute" method called
sum the total ETH transferred by filtering for internal transactions which have the type CALL

Remember, I can get the method function signature by decoding the input data on etherscan.

SELECT SUM(tr."value"/1e18) as contribution FROM ethereum.transactions tx 
LEFT JOIN ethereum.traces tr ON tx."hash" = tr."tx_hash"
--transactions filtering 
WHERE tx."to" = '\x320d83769eb64096ea74b686eb586e197997f930'::bytea
AND tx."data" is not null
AND SUBSTRING ( encode(tx."data", 'hex'), 1, 8 ) IN ('a08f793c', 'ce4661bb')
--traces filtering 
AND tr."success"
AND tr."value" > 0
AND tr."call_type" = 'call'

There was technically another method called contributeForPodium, which is why we check for two function signatures above. The CALL type actually has subtypes as well at the opcode level, so we need the specific base call_type of call (if you're familiar with a delegatecall, then you'll know that would give us a double count). We joined on transaction hash, and then divided by 10^18 to get the right decimals of ETH value.

Let's move on to the last transaction, where the data starts to get really tricky on us.

Closing and withdrawing funds from the contract

Third Transaction: 0xe9d5fefde77d4086d0f64dd1403f9b6e8e12aac74db238ebf11252740c3f65a8

Here, we can see that 337 ETH was transferred and 1,012,965 HVND tokens (the latter of which was decided by operatorPercent_ in the first transaction). After this function is called, the contract just operates the way any normal ERC20 would.

In the case that a crowdfund was already closed, we could have gotten the total raised from the data in this transaction - such as value transferred in an internal transaction of CALL type. It's better to tie this to an event though, in case there are some transfer behaviors that we don't know about. But wait, why are the logs not readable?

Well, this is where we start to get into some pretty confusing patterns. Earlier I mentioned that this crowdfund is deployed as a proxy - that means it’s just like an empty USB that plugs into a computer that actually holds the logic. It's much cheaper to create USBs than computers - and that logic holds for on-chain too (except the cost is in gas). If you want to read about proxy patterns, I'd check out this great article by the OpenZeppelin team.

The computer in this case is known as the logic and is only deployed once. The proxy is deployed many times, and it doesn't have the logic functions or events in the contract code. Therefore, etherscan isn't equipped to show the decoded data in logs. So then how do we piece this together? We could take the keccak256 hash of the event, just like we did for function signatures. But here's where reading the code will help save you some time. If you go to Read Contract on the factory contract, you'll see the address of the logic contract:

From there, we can look for the closeFunding() function in the code:

function closeFunding() external onlyOperator nonReentrant {
        ...code...

        _mint(operator, operatorTokens);
        // Announce that funding has been closed.
        emit FundingClosed(address(this).balance, operatorTokens);

        ...ETH value transfers...
    }

ETH value transfers don't emit events since they are just internal transactions. And if you are familiar with how the ERC20 standard works, you'll know that _mint actually creates a Transfer event (meaning that covers our first event). That means that FundingClosed must be the second log, with the topic of 0x352ce94da8e3109dc06c05ed84e8a0aaf9ce2c4329dfd10ad1190cf620048972. Can you figure out why else it couldn't be the third log (hint: what's a key difference between the first two logs and the third log)?

With that knowledge, we can query this just like any other event, with some fancy data decoding (remember parameters are every 64 characters (32 bytes). We have to turn it into a string to slice it, and then we change it into a number and divide by 10^18 to get rid of decimals.

SELECT "contract_address", 
        bytea2numeric( decode ( SUBSTRING ( encode("data", 'hex') , 1, 64 ), 'hex'))/1e18 as eth_raised, 
        bytea2numeric ( decode ( SUBSTRING ( encode("data", 'hex') , 65 , 64 ), 'hex'))/1e18 as tokens_allocated_owned
FROM ethereum.logs
WHERE "topic1"='\x352ce94da8e3109dc06c05ed84e8a0aaf9ce2c4329dfd10ad1190cf620048972'::bytea
AND "contract_address"='\x320d83769eb64096ea74b686eb586e197997f930'::bytea

Congrats, you now know your way around ethereum.transactions, ethereum.traces, and ethereum.logs. They can always be joined by transaction hash, and then the rest is just knowing how to manipulate the data with encode/decode, substring, and some bytea operators. Woohoo!

We could have done this exercise for the contribute method in the last transaction too. Since this is all happening on the proxy contract.

Putting it all together

Now, if we had to go and keep track of function signatures and event topics - as well as decoding all the variables in each query - I think we would have all quit data analysis by now. Luckily, most data services have some variation of contract decoding, meaning I can give a contract address and the ABI and Dune will take care of the decoding for me. That way, events/functions become their own tables and I can easily make the same "total contributions" query from earlier with this:

WITH 
    union_sum as (
        SELECT SUM("amount")/1e18 as raised FROM mirror."CrowdfundWithPodiumEditionsLogic_evt_Contribution"
        WHERE "contract_address"='\x320d83769eb64096ea74b686eb586e197997f930'
        
        UNION ALL 
        
        SELECT SUM("amount")/1e18 as raised FROM mirror."CrowdfundWithPodiumEditionsLogic_evt_ContributionForEdition"
        WHERE "contract_address"='\x320d83769eb64096ea74b686eb586e197997f930'
    )
    
SELECT SUM("raised") FROM union_sum

Thankfully this query is much more readable and easier to write. They even take care of proxy/factory logic patterns - thanks team! Without this abstraction, I guarantee that data analysis would be ten times messier to write and one hundred times worse to debug. Dune has plenty of other useful tables as well, such as prices.usd for daily token prices and dex.trades for all token trades across all main exchanges (and event nft.trades for OpenSea NFT actions).

While most of the time you'll be playing with decoded data, knowing what really sits underneath it all will help you level up a lot faster in Web3! Plus, you're now etherscan fluent - which I promise will be a part of every crypto job description in the future. I hope you found this helpful, and as always feel free to reach out if you need some help getting started.

Subscribe to Andrew Hong

Receive the latest updates directly to your inbox.

Verification

This entry has been permanently stored onchain and signed by its creator.

Arweave Transaction

uICmLpk4FV508nH…sK38hv1jyZVqhQA

Author Address

0x2Ae8c972fB2E6c0…2dc672ED190DA06

Content Digest

mbR1n_CvflL1KIK…BqDPNndH8mu2eJw