In my previous post on digital identity, I mentioned that "the tokenization of these graph shards could take many forms and will likely be layered upon by proof tokens." I believe this sharded graph identity approach requires a community-specific reputation score that measures how influential a given person has been in expanding a specific network. While some reputation scores are more fixed, requiring a user to have completed action X or Y, this score captures a user's reputation in the context of other users in a more fluid manner, acting like a signal rather than a badge.
Typically in Web2, users are "rewarded" by an algorithm that highlights them based on the engagement and attention they bring to the platform. Their reputation score is thus just the number of likes or followers they have, regardless of who those vanity metrics come from. In Web3, we typically reward users with tokens representing the value of the protocol or product. These tokens also carry a lot of power in the form of voting and other privileges. As such, a score that signals not only influence but also how supportive of and aligned with the rest of the community a user is will become increasingly important over time.
For this post, we will be focusing on creating a reputation score for Mirror users (voters, writers, contributors) based on where each user sits on an interaction-based social graph across Ethereum, Twitter, and Mirror data. I chose Mirror for three reasons:
Social graphs themselves are not new, but selectively layering them will open up quite a few new doors of usability and meaningfulness. There are two main reasons for this:
Enabling new applications: The way that we understand the layering affects what we do with the data. The way I see it, Ethereum is a base layer social graph that everything else is built upon. Platforms like Mirror and Twitter are contexts that sit on top of this base layer and shift how we see the connection of users across the space.
Because I wanted to analyze only users of Mirror, I took just a subset of available Ethereum and Twitter data. Those who have a high community-specific reputation score could get a larger token airdrop, be delegated for roles (or more votes) in treasury management, or get preferential access to new protocol features. There are also many more such contexts like Mirror built on top of Web2 + Web3 platforms, such as NFT communities (Cryptopunks, BAYC, Blitmaps) and gaming communities (Axie Infinity, Darkforest).
As DAOs start working together more and the metaverse becomes more interconnected, we'll see more communities (and contexts) overlap. I imagine that studying how people and communities interact within and across different contexts of the social graph could produce mixed-community reputation scores which could be applied in quite a few different situations. For example, this score could be used for multi-token or collaborative NFT airdrops, as well as choosing leaders for partnerships and programs like rabbithole's pathfinder initiative.
What the data represents: The kinds of data collected and used affect the social graph as well. Using data from $WRITE votes, funding contributions (across editions, splits, crowdfunds, and auctions), and Twitter mentions, I want to represent three buckets of scarcity (respectively): belief, capital, and attention.
I believe that analyzing how interactions connect different people across these three buckets gives us a proxy for how much they support each other. I also believe that networks of support give us a more accurate representation of communities within social graphs. These two assumptions gave me confidence in using a concept called "betweenness centrality" as a primitive for a reputation score.
The selection of data representation and contexts was key to the form the social graph ultimately took. If I wanted to proxy just willingness to contribute, then I would probably create completely different node schemas based on different kinds of products/categories of creators rather than a pure user-to-user graph. This would likely shift the shape of the social graph completely.
Let's try to target and quantify the strong connection points of the social graph of race participants, writers, and contributors. I've chosen to use a concept called "betweenness centrality" to represent the score of each node. Betweenness centrality is calculated by finding the shortest paths between every pair of nodes and measuring how often each node lies on those paths.
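To make this concrete, here is a minimal, self-contained sketch of the computation on a toy graph. The user names and graph here are made up for illustration, and in practice a library such as networkx does this far more efficiently (via Brandes' algorithm); this version just spells out the mechanics.

```python
from collections import deque
from itertools import combinations

def shortest_paths(adj, s, t):
    """Enumerate every shortest path from s to t with a BFS over paths."""
    best, found, queue = None, [], deque([[s]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # all remaining queued paths are longer than the shortest
        if path[-1] == t:
            best = len(path)
            found.append(path)
            continue
        for nbr in adj[path[-1]]:
            if nbr not in path:
                queue.append(path + [nbr])
    return found

def betweenness(adj):
    """Fraction of all-pairs shortest paths passing through each node,
    normalized by the number of pairs that exclude the node itself."""
    nodes = sorted(adj)
    score = {v: 0.0 for v in nodes}
    for s, t in combinations(nodes, 2):
        paths = shortest_paths(adj, s, t)
        for v in nodes:
            if v not in (s, t):
                score[v] += sum(v in p for p in paths) / len(paths)
    norm = (len(nodes) - 1) * (len(nodes) - 2) / 2
    return {v: x / norm for v, x in score.items()}

# Two tight clusters joined by a single bridge: carol and dan are the
# connectors, so they should score highest.
adj = {
    "alice": {"bob", "carol"}, "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dan"},
    "dan": {"carol", "erin", "frank"},
    "erin": {"dan", "frank"}, "frank": {"dan", "erin"},
}
scores = betweenness(adj)
# carol and dan sit on every cross-cluster shortest path -> score 0.6 each
```

The nodes inside each cluster score zero here: no shortest path needs them, which is exactly why betweenness surfaces bridges rather than raw popularity.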
Community clusters in a social graph usually look something like this:
For Mirror, the base social graph of who has voted for who looks like this:
However, this doesn't have an outside-of-race social context yet. Let's layer in Ethereum transaction-level data - this is limited to Mirror-related transactions between participants, such as sending a split, funding a crowdfund, and buying an edition or reserve auction.
Now we'll add Twitter data too, which will link nodes based on who has mentioned another participant in their last 2000 tweets.
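The mention-edge extraction can be sketched roughly like this. Everything below is illustrative: the handle names are hypothetical, and the real pipeline would first map Twitter handles to verified Mirror participants.

```python
import re

# Twitter handles are 1-15 word characters after an "@".
MENTION_RE = re.compile(r"@(\w{1,15})")

def mention_edges(author, tweets, participants):
    """Yield directed (author, mentioned_user) edges, restricted to
    handles that belong to known Mirror race participants."""
    for tweet in tweets:
        for handle in MENTION_RE.findall(tweet):
            if handle in participants and handle != author:
                yield (author, handle)

participants = {"writer_a", "writer_b", "voter_c"}
tweets = [
    "gm @writer_b, loved the new edition",
    "@voter_c thanks for the vote!",
]
edges = list(mention_edges("writer_a", tweets, participants))
# edges == [("writer_a", "writer_b"), ("writer_a", "voter_c")]
```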
Showing the interactions (edges) in a cleaner fashion, the mix of interactions across the social graph looks like this:
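The layering itself can be sketched as merging the three interaction sources into one edge list, tagging each edge with the scarcity bucket it represents. The source lists below are illustrative stand-ins for the real data pulls:

```python
# Illustrative stand-ins for the three data sources.
write_votes   = [("voter_c", "writer_a")]   # $WRITE race votes -> belief
eth_transfers = [("patron_d", "writer_a")]  # crowdfunds, editions, splits -> capital
twitter_ats   = [("writer_a", "writer_b")]  # mentions in recent tweets -> attention

edges = (
    [(u, v, {"bucket": "belief"})    for u, v in write_votes] +
    [(u, v, {"bucket": "capital"})   for u, v in eth_transfers] +
    [(u, v, {"bucket": "attention"}) for u, v in twitter_ats]
)

# Build a simple undirected adjacency map over the combined layers;
# the betweenness score is then computed over this merged graph.
graph = {}
for u, v, attrs in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
```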
Now getting back to the original point, the expectation is that some of the nodes sitting between large clusters will get a higher weighting than others, since they connect the paths across the most other nodes.
This is called "betweenness" and is a factor I believe to be very important when trying to proactively grow a community. The base idea here is that the higher the "betweenness" factor a person has, the more likely they will lead to creating connections and branches that build up a more diverse community. There have been a few research papers that highlight the beneficial effect that nodes with high betweenness have on the diffusion of a community as well as building the resilience of a network.
Some of you might be wondering why I chose betweenness centrality over something like closeness or degree centrality. Those latter two metrics highlight pure influence, and I don't think reputation in a community should be based on just those who already have that level of influence. The Graph Algorithms textbook by Neo4j puts the concept behind betweenness very well:
"Sometimes the most important cog in the system is not the one with the most overt power or the highest status. Sometimes it’s the middlemen that connect groups or the brokers who have the most control over resources or the flow of information. Betweenness Centrality is a way of detecting the amount of influence a node has over the flow of information or resources in a graph. It is typically used to find nodes that serve as a bridge from one part of a graph to another."
A lot of Web2 has been about concentrated influence and echo chambers; I believe Web3 should instead try to enable and incentivize the creation of bridges as much as possible.
For anyone who wants to see the scores, check out the sheet here. Note that these values have not been weighted based on data/edge type, so some users may have higher betweenness due to Twitter interactions rather than Mirror/Ethereum interactions.
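One way to address that weighting caveat would be to discount edges by source type before computing betweenness, so a Twitter mention moves the score less than a $WRITE vote or an on-chain contribution. The numbers below are assumptions, not values from this analysis:

```python
from collections import defaultdict

# Illustrative discounts per scarcity bucket (assumed, not calibrated).
WEIGHTS = {"belief": 1.0, "capital": 1.0, "attention": 0.25}

def weighted_edges(typed_edges):
    """Collapse parallel typed edges between the same pair of users
    into a single weighted edge by summing the per-type discounts."""
    totals = defaultdict(float)
    for u, v, bucket in typed_edges:
        key = tuple(sorted((u, v)))  # undirected: (a, b) == (b, a)
        totals[key] += WEIGHTS[bucket]
    return dict(totals)

typed = [
    ("a", "b", "attention"), ("b", "a", "attention"),  # two mentions
    ("a", "b", "belief"),                              # one vote
    ("b", "c", "capital"),                             # one contribution
]
combined = weighted_edges(typed)
# {("a", "b"): 1.5, ("b", "c"): 1.0}
```

The resulting weights could then feed a weighted betweenness computation, where repeated cheap interactions never fully substitute for a single scarce one.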
We've already discussed some of the applications of this score, but the solution I used may still appear unnecessarily complex or overengineered. I think this methodology is required to provide composability while also measuring reputation in a way that is much tougher to game.
Composability: As mentioned before, this graph is built up layer-by-layer. The data elements are all publicly available to collect, the model follows a search algorithm and is flexible to whatever nodes or edges (users or interactions) are chosen, and once the pipeline is built, it can be reused or forked for any set of tweaks. Hopefully in the future, all the available data components will just sit in a user interface and this model will become drag-and-drop. From there I imagine you could export scores or connect directly to something like disperse.app or galaxy drops.
Durability: The problem with many scores and measures is that once they have been used publicly, people can start to figure out how to game the system. This is especially true for anything that is based purely on user <> protocol interactions. A model that depends on user <> user <> user interactions is harder to game because the users in the current community won't necessarily reciprocate interactions with a bad actor. Actions also compound, so a single interaction (or a single type of interaction) will not be enough to earn a higher score. Even if someone does find a way to game it, then congratulations: you now have another active participant contributing to the community.
These two elements allow for an overall stronger reputation score mechanism, which I believe justifies the efforts it takes to get there.
I would like to continue iterating on this research and possibly use this kind of score for specific use cases from protocols and communities (please reach out if interested). If you've made it this far and want to contribute to this analysis with ideas or technical expertise, I do invite you to join ImagDAO for further work on social graph data and decentralized identity.
The next steps include more data collection, setting up a Neo4j dashboard for others to play with, and of course applying more data science algorithms. Machine learning and community detection will likely take on a bigger role in the future as we try to predict how the social graph will grow or try to target specific sub-communities.
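As a taste of where the community-detection work could go, here is a sketch of the first step of a Girvan-Newman-style approach on a toy graph (all names and edges are illustrative): the edge with the highest edge betweenness is exactly the kind of bridge discussed above, and removing such edges repeatedly splits the graph into its communities.

```python
from collections import deque
from itertools import combinations

def shortest_paths(adj, s, t):
    """Enumerate every shortest path from s to t with a BFS over paths."""
    best, found, queue = None, [], deque([[s]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break
        if path[-1] == t:
            best = len(path)
            found.append(path)
            continue
        for nbr in adj[path[-1]]:
            if nbr not in path:
                queue.append(path + [nbr])
    return found

def edge_betweenness(adj):
    """Credit each edge with the fraction of all-pairs shortest
    paths that travel along it."""
    score = {}
    for s, t in combinations(sorted(adj), 2):
        paths = shortest_paths(adj, s, t)
        for p in paths:
            for u, v in zip(p, p[1:]):
                e = tuple(sorted((u, v)))
                score[e] = score.get(e, 0.0) + 1.0 / len(paths)
    return score

# Two communities joined by the single bridge edge ("c", "d").
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
scores = edge_betweenness(adj)
bridge = max(scores, key=scores.get)
# Removing the bridge edge splits the graph into its two communities.
```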
I mentioned before that this analysis was only possible because of the verifiable consolidation of identity by Mirror's platform (Ethereum <> Twitter). Continuation of this work heavily depends on an identity consolidation API as well, so anyone working in this context please connect with me as well (looking at you Ceramic and Gitcoin 👀).
The data and scripts may be shared publicly later once I refactor and figure out how to protect the privacy of users in this social graph.
Special thanks to Ben Schecter for all his help and ideas in reviewing this post.
If you want to support more of this research, consider bidding on the auction for "Decentralized Community" or contributing to the split below: