Dear /r/Bitcoin, I wrote a set of tools that import data from the .bitcoin/blocks/blkXXXXX.dat files into a database and let you explore the blockchain on your own PC by running ad-hoc queries over all the main Bitcoin Core data structures, including blocks, TXs, inputs, outputs, addresses and more.
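
For anyone curious about the raw input, here is a minimal standalone sketch of how a blkNNNNN.dat file is laid out. It is not how the Toolbox reads the files (the Toolbox delegates that to Bitcoin Core). Each record is a 4-byte network magic (f9beb4d9 on mainnet), a 4-byte little-endian length, then the serialized block, whose first 80 bytes are the header; the block hash is the double SHA-256 of that header, displayed byte-reversed. The sketch assumes an un-obfuscated mainnet file (recent Bitcoin Core releases can XOR-obfuscate block files), and `iter_blk_records` and `BlockRecord` are hypothetical names. The offsets it yields are the kind of data a block position index stores.

```python
import hashlib
import struct
from typing import Iterator, NamedTuple

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # network magic preceding each mainnet block record


class BlockRecord(NamedTuple):
    block_hash: str  # display (byte-reversed) hex of double-SHA256 of the 80-byte header
    offset: int      # byte offset of the serialized block, just after the 8-byte prefix
    size: int        # length of the serialized block in bytes


def iter_blk_records(data: bytes, magic: bytes = MAINNET_MAGIC) -> Iterator[BlockRecord]:
    """Walk an (assumed un-obfuscated) blkNNNNN.dat buffer, yielding one record per block."""
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != magic:
            # Files are preallocated, so the unused tail is typically zero-filled.
            break
        (size,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        if size < 80 or start + size > len(data):
            break  # truncated or malformed record
        header = data[start:start + 80]
        digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
        yield BlockRecord(digest[::-1].hex(), start, size)
        pos = start + size
```

Pairing each hash with its file number and offset is enough to seek back to the raw block later without re-scanning the file.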

It’s written in C++ and uses Bitcoin Core itself to read the data, so its parsing always matches the Bitcoin Core release it is built against. I abstracted away the database functions, so you can implement “drivers” for any other DB system. I’ve been playing with it on MySQL, but others might prefer Neo4j for graph analysis or Cassandra as a NoSQL store.
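
The post doesn't show the driver API, and the Toolbox's own version is C++. As a rough Python illustration of the idea, a loader could talk only to a small abstract interface that each backend implements. `DatabaseDriver`, `SQLiteDriver`, the method names and the table layout are all hypothetical; the example backend uses only the standard-library `sqlite3` module.

```python
import sqlite3
from abc import ABC, abstractmethod


class DatabaseDriver(ABC):
    """Hypothetical driver interface: the loader calls these, each backend implements them."""

    @abstractmethod
    def insert_block(self, block_hash: str, height: int, file_no: int, byte_offset: int) -> None: ...

    @abstractmethod
    def insert_output(self, txid: str, vout: int, address: str, value_sat: int) -> None: ...

    @abstractmethod
    def commit(self) -> None: ...


class SQLiteDriver(DatabaseDriver):
    """Example backend built on the standard-library sqlite3 module."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS blocks (
                hash TEXT PRIMARY KEY, height INTEGER, file_no INTEGER, byte_offset INTEGER);
            CREATE TABLE IF NOT EXISTS outputs (
                txid TEXT, vout INTEGER, address TEXT, value_sat INTEGER,
                PRIMARY KEY (txid, vout));
            """
        )

    def insert_block(self, block_hash, height, file_no, byte_offset):
        self.conn.execute("INSERT INTO blocks VALUES (?, ?, ?, ?)",
                          (block_hash, height, file_no, byte_offset))

    def insert_output(self, txid, vout, address, value_sat):
        self.conn.execute("INSERT INTO outputs VALUES (?, ?, ?, ?)",
                          (txid, vout, address, value_sat))

    def commit(self):
        self.conn.commit()
```

A MySQL or Neo4j driver would implement the same three methods with its own client library, and the loading code would not need to change.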

Once the data is loaded, you can run any database query against it.
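
As an example of an ad-hoc query, the sketch below computes each transaction's fee: the value of the outputs its inputs spend minus the value of its own outputs. The `inputs`/`outputs` schema and the rows are made-up toy data, not the Toolbox's actual tables, and it uses an in-memory SQLite database so it runs without a MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE outputs (txid TEXT, vout INTEGER, address TEXT, value_sat INTEGER);
    CREATE TABLE inputs  (txid TEXT, prev_txid TEXT, prev_vout INTEGER);

    -- Toy rows: tx 'a' pays 50,000 sat to addr1; tx 'b' spends a:0 and
    -- pays 30,000 + 19,000 sat, leaving 1,000 sat as the fee.
    INSERT INTO outputs VALUES ('a', 0, 'addr1', 50000);
    INSERT INTO outputs VALUES ('b', 0, 'addr2', 30000), ('b', 1, 'addr1', 19000);
    INSERT INTO inputs  VALUES ('b', 'a', 0);
    """
)

# Fee = sum of the previous outputs spent by the tx's inputs - sum of the tx's own outputs.
FEE_QUERY = """
SELECT i.txid,
       SUM(prev.value_sat)
         - (SELECT SUM(o.value_sat) FROM outputs o WHERE o.txid = i.txid) AS fee_sat
FROM inputs i
JOIN outputs prev ON prev.txid = i.prev_txid AND prev.vout = i.prev_vout
GROUP BY i.txid
"""

fees = conn.execute(FEE_QUERY).fetchall()
print(fees)  # [('b', 1000)]
```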

I implemented a simple reference interface in Python so you can start playing with the blockchain right away. Python is only the outer layer; any other language can be plugged in, including big data analysis systems like Spark or Hadoop.

Some stuff you can do with it:

  • Trace the funds of any Bitcoin address by building a transaction graph
  • Run your own local block explorer without relying on any external API
  • Sum and subtract input and output values to build whatever statistics you need
  • Plug in visualization modules to produce charts and graphs from the data
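
A minimal sketch of address tracing, assuming the address graph has been exported as (source address, destination address) pairs: a breadth-first search lists every address reachable from a starting one within a hop limit. `trace_funds` and the edge format are hypothetical. Reachability here only means funds moved along that path at some point; it does not say how much of the original amount arrived.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def trace_funds(edges: Iterable[Tuple[str, str]], start: str, max_hops: int = 3) -> Dict[str, int]:
    """Breadth-first walk over (source, destination) address edges.

    Returns each reachable address mapped to its hop distance from `start`.
    """
    graph: Dict[str, List[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        addr = queue.popleft()
        if hops[addr] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in graph.get(addr, []):
            if nxt not in hops:
                hops[nxt] = hops[addr] + 1
                queue.append(nxt)
    return hops


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
print(trace_funds(edges, "A", max_hops=1))  # {'A': 0, 'B': 1, 'C': 1}
```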

While loading the block files, it does some additional work:

  • Creates an index of each block's position in the .dat files. Bitcoin Core maintains such an index too, but access to it is locked while Core is running. With the Toolbox you can use this data even while Bitcoin Core is running independently, and you can go back to the raw block data at any time. This could be turned into a web service and exposed via an API.
  • Generates an address graph by resolving each input to the previous output it spends and building a DB table with source and destination TXs and addresses side by side. It’s like a WWW block explorer in your own database that you can query in any way.
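
The post doesn't say exactly how the Toolbox resolves inputs. One common approach, sketched here with hypothetical names and types, is to walk transactions in block order while keeping a map of unspent outputs keyed by outpoint (txid, output index): each input is looked up to find the address it spends from, and each source is paired with every destination output. On the full chain that map is large, so a real loader would more likely do the lookup as a database join.

```python
from typing import Dict, List, NamedTuple, Tuple


class TxOut(NamedTuple):
    address: str
    value_sat: int


class Tx(NamedTuple):
    txid: str
    inputs: List[Tuple[str, int]]  # outpoints being spent: (prev_txid, prev_vout)
    outputs: List[TxOut]


def build_address_edges(txs: List[Tx]) -> List[Tuple[str, str, str, str]]:
    """Return (source_txid, spending_txid, source_address, destination_address) rows.

    Assumes `txs` is in block order, so every spent output was seen earlier.
    """
    utxo: Dict[Tuple[str, int], TxOut] = {}
    rows = []
    for tx in txs:
        # Resolve each input to the output it spends and remove it from the unspent set.
        sources = []
        for outpoint in tx.inputs:
            prev = utxo.pop(outpoint, None)
            if prev is not None:
                sources.append((outpoint[0], prev.address))
        for prev_txid, src_addr in sources:
            for out in tx.outputs:
                rows.append((prev_txid, tx.txid, src_addr, out.address))
        # Register this transaction's outputs so later inputs can spend them.
        for vout, out in enumerate(tx.outputs):
            utxo[(tx.txid, vout)] = out
    return rows


txs = [
    Tx("a", [], [TxOut("addr1", 50000)]),  # coinbase-like: no inputs
    Tx("b", [("a", 0)], [TxOut("addr2", 30000), TxOut("addr1", 19000)]),
]
print(build_address_edges(txs))
# [('a', 'b', 'addr1', 'addr2'), ('a', 'b', 'addr1', 'addr1')]
```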

Addresses, TXs and all other data structures are decoded by Bitcoin Core itself: the C++ Toolbox links against Bitcoin Core and uses its canonical implementation, so the loaded data matches what Core itself sees. The Python reference implementation is almost entirely independent of the C++ code (except for the configuration code, which shares the same parameters and config file as the C++ Toolbox).
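
In the Toolbox, Bitcoin Core does the address decoding, which also covers SegWit (Bech32) and other script types. Purely to show what is involved in the simplest case, here is a sketch of legacy mainnet P2PKH encoding: version byte 0x00 plus the 20-byte HASH160 of the public key, a 4-byte double-SHA-256 checksum, then Base58. The function names are hypothetical, and this is not a substitute for Core's implementation.

```python
import hashlib
from typing import Tuple

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def _checksum(data: bytes) -> bytes:
    # First 4 bytes of double SHA-256.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]


def base58check_encode(version: int, payload: bytes) -> str:
    full = bytes([version]) + payload
    full += _checksum(full)
    n = int.from_bytes(full, "big")
    chars = []
    while n > 0:
        n, rem = divmod(n, 58)
        chars.append(B58_ALPHABET[rem])
    # Each leading zero byte is encoded as a literal '1'.
    pad = len(full) - len(full.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(chars))


def base58check_decode(address: str) -> Tuple[int, bytes]:
    n = 0
    for ch in address:
        n = n * 58 + B58_ALPHABET.index(ch)  # raises ValueError on invalid characters
    pad = len(address) - len(address.lstrip("1"))
    full = b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")
    data, checksum = full[:-4], full[-4:]
    if _checksum(data) != checksum:
        raise ValueError("bad checksum")
    return data[0], data[1:]


def p2pkh_address(pubkey_hash160: bytes) -> str:
    """Mainnet P2PKH address (version byte 0x00) for a 20-byte HASH160."""
    if len(pubkey_hash160) != 20:
        raise ValueError("expected a 20-byte HASH160")
    return base58check_encode(0x00, pubkey_hash160)
```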

Released under the MIT license.

I hope you like playing with it!


Announcement:

Docs:

Source code:

submitted by /u/josefonseca