Hi there Thomas, just a quick question concerning the status of this project? Is it done and how did it turn out? I feel this is actually a very important project for the future of Monero and I was just curious on the status. Also, is it possible to post an example of what you are doing, so interested people can get an idea of what is taking place?
Documentation and cleanup of source code
What
Clean up and properly document (doxygen) the monero source code, and handle any merges necessary to do so seamlessly with other concurrent development efforts.
Who
I'm Thomas Winget, and I've been contributing to the Monero codebase since around June 2014. My largest contribution is the migration from storing the blockchain in a static, binary format to storing the blockchain using a database API, as well as two BlockchainDB implementations (LMDB and BerkeleyDB). Other contributions of mine can be found via the network graph on github or simply looking at my fork.
As a side effect of the work I've done already on Monero, I'm rather familiar with most of the codebase. This puts me in a good position to write down how everything works and fits together, as I've had to sort that out already for many parts.
As a side effect of this effort, I will further cement the knowledge I have of the codebase and gain knowledge in areas I'm less familiar with, enabling me to efficiently implement new features in the future and guide others who may have questions regarding a specific module. In addition, documenting the code will function as a sort of code review, allowing the possibility of spotting any bugs (minor or otherwise).
Why
In my time developing on Monero, there have been countless instances where I need to reference what another piece of code does in order to work with it, and having all the code documented properly in an easily accessible manner (doxygen) would be helpful for that for anyone working in Monero's codebase. I have also come across areas where the code could be made more efficient or clear without changing any functionality, as well as many typographical errors (though these are minor, I think we can all agree they're annoying if nothing else).
Documentation is by far not a glamorous task (probably why it's a bit sparse in Monero), but every development effort benefits immensely from properly documented code.
Proposal and Milestones
I would like to spend about 20 hours per week working on the above, at a rate of 45 XMR per hour (just over $20/hr USD at the time of writing this), paid out every week. I am unsure whether or not 80 hours of work will be enough to cover the entire codebase. There are over 70,000 lines of source in the src/ folder alone, and there is code in other folders that brings the total to over 100k, so this is no small task!
Milestones would be as follows:
- First 20 hours (900/3600 XMR)
- Second 20 hours (1800/3600 XMR)
- Third 20 hours (2700/3600 XMR)
- Fourth 20 hours (3600/3600 XMR)
After each milestone (and during), I will be available for comment, so anything that seems unclear in my documentation can be addressed and dealt with accordingly.
Ok, final 20 hours complete at long last. Really a bit longer, but I'm still not anywhere near "done". I just likely won't attempt to do so much at a time again... >_>
Much of this time was spent rebasing/cherry-picking the previous milestones' commits, which is why above I said "really a bit longer", as that didn't quite seem fair. I'll be doing more yet, but as I'm over 20 hours by a bit and just wrapped up a section, I thought it's time for an update. New developments follow below.
Transaction pool documentation:
commit c6bb201a07d84e195be901acda2fcd0b886bf66f
Author: Thomas Winget
Date: Thu Mar 24 20:03:02 2016 -0400
Transaction pool documentation (and some cleanup)
tx_pool.h doxygen documentation completed.
Many notes made on areas for improvement, be that functionality or
code clarity.
Commented code and unused code removed.
As the pull request for the previous milestones is still pending, I have not yet submitted this as a pull request. The changes can be found here on my github.
Testnet documentation:
This documentation isn't something that will go in the source code directly, and thus not on the github. I went through every instance of "testnet" in the source to evaluate what "testnet or not" means. I could probably make the resulting document look nicer, but it should be clear enough. While this is not directly documenting the source code, it will lead to better and ideally self-documenting code. For now, I'll simply paste the document below, but if anyone has any questions I'll be glad to answer them. I've also uploaded it to fpaste because the preview shows that it's ugly on here.
For reading it:
The organization is source lines followed by their purpose, followed by a line of hyphens, repeat for all purposes of the notion of "testnet". Source lines which are grouped together without a line between are directly related to one another. The exception to this is the "pass the testnet flag around" section, as it didn't seem useful to group them in this manner.
EDIT: The formatting the preview showed broke into fragile pieces, apparently, so just look at it on fpaste.
Okay, long delay since the last milestone update, but here's #3:
BlockchainDB's documentation is now Doxygen-compliant and up-to-date with recent changes. A couple of things were renamed, as their purpose had changed but their names implied the old purpose.
It's amazing how much things change in under a year. BlockchainDB has undergone many changes, so the new documentation is not guaranteed to be 100% correct. I'll be attaching my notes to the end of this post, some of them refer to bits I still need to ask about (or spend time looking into). I'll also paste the commit logs at the end, because why not.
If anyone has thoughts on my notes below, please don't hesitate, as many of them I'm not 100% sure what the best course of action is at this time.
As always, you can follow updates on my github repo.
Current TODO list
General:
Discuss updating function names to better reflect their purpose.
Fix inconsistent indentation.
evaluate return values of all functions (lots of bool return values not being used, etc.) related to this, consider throwing rather than returning empty containers where a container is the return type
src/
cryptonote_core/
cryptonote_core.*/
move BlockchainDB initialization and DB-specific code from cryptonote_core.* to better places
blockchain.*/
evaluate locking mechanism and possibility of separate reader/writer locks
blockchain_db/
blockchain_db.*/
remove_transaction_data() should not remove outputs as well, but currently
subclasses do so. This should be reorganized. The current subclass function
to do so is remove_tx_outputs()
global output index counting should maybe be handled in the parent class,
up for debate.
Need to make sure that aborting a partial add_block after an exception
will not error out part way through due to another exception. This could
be tricky, as "remove_*" might fail because it hadn't been added yet, or
it might fail because of a database issue. This will likely require more
fine-grained exceptions. For now, with the exceptions that exist, I think
that a generic DB_ERROR is suitable for, well, a db error, and the DNE
exceptions should be used as such. This will likely be sufficient to
differentiate.
add_transaction (concrete in parent) takes a pointer to a tx hash as a
parameter. This should be a reference, but perhaps there's a reason it
is not. Needs looking into.
Destructor: should it call sync() and close() if open?
open: db_flags should either be handled in subclass implementations
or standardized among all blockchaindbs.
batch transactions: batch_stop and batch_commit both have commit code,
batch_stop should call to batch_commit instead.
IMPORTANT: make sure BlockchainDB (and/or subclasses) handles correctly
removing partial block data in the event of a failure during a block add.
get_blocks_range: should this throw if the request is out of range, or
should it let get_block_from_height throw and just ignore it as it does
now?
get_tx_list: much the same as above with get_blocks_range.
*important* hardfork functions: verify that documented functionality is true/correct
** Git log **
commit cc3394b38d87785aafe850e58dc2d79cd65d47de
Author: Thomas Winget
Date: Tue Dec 1 17:38:22 2015 -0500
Update BlockchainDB documentation
BlockchainDB is now Doxygen-compliant and its documentation is
up-to-date with recent changes.
src/blockchain_db/blockchain_db.h | 1063 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------
1 file changed, 911 insertions(+), 152 deletions(-)
commit 204ed981b0e114dc122de16d8642ec44745074f1
Author: Thomas Winget
Date: Tue Dec 1 15:03:46 2015 -0500
Rename function for clarity
As BlockchainDB::get_output_key was changed to return a struct containing
the output key along with two other pieces of metadata for the output, the
function name is now changed to be get_output_data.
In addition, the database handles in db_bdb and db_lmdb were renamed
from m_output_keys to m_output_data to mirror this.
src/blockchain_db/berkeleydb/db_bdb.cpp | 28 ++++++++++++++--------------
src/blockchain_db/berkeleydb/db_bdb.h | 8 ++++----
src/blockchain_db/blockchain_db.h | 8 ++++----
src/blockchain_db/lmdb/db_lmdb.cpp | 22 +++++++++++-----------
src/blockchain_db/lmdb/db_lmdb.h | 8 ++++----
src/blockchain_utilities/blockchain_dump.cpp | 2 +-
src/cryptonote_core/blockchain.cpp | 10 +++++-----
7 files changed, 43 insertions(+), 43 deletions(-)
I'd like to thank everyone for their patience with me, as I've been very slow to complete this task. I will admit that a large part of that is motivation, as it is a rather boring task. That said, another part of it is frustration. There have been a lot of great changes in the source over the past year, and I have not kept up with even half of them (other than reading commit logs). As I have worked to document some of these changes, I have seen changes which, while I agree with what they do, I disagree with how they've been done (from an organization/maintainability standpoint). I admit 100% that this is an issue of pride, and I will work past it.
As I work to complete this contract, I am taking notes of the things I disagree with. Obviously mine is but one opinion, so I will not change any of these things for this contract (except obvious things like replacing BOOST_FOREACH with C++11 for, etc.). After I've completed this contract I will review my notes and see if I can come up with solutions, then submit them for review.
Again, thank you all (especially those who donated to this effort, but also to the community at-large) for your patience. I intend to spend the whole day working on more documentation, so look for an update at or before 00:00 UTC Wednesday morning.
tl;dr - look for next milestone update at or before the coming midnight UTC.
thanks for sticking with it man!
How is it going? Also, was the 4x20 hours meant as consecutive? Or just arbitrary. I am looking forward to hearing some new ideas as well btw :)
Update #2:
Second 20 hours completed. Progress can be seen on the branch on my fork.
Rather than retype all of the commit messages here, you'll find a paste of "git log --stat" below, or at least hopefully you will, this preview thing hates me.
Please feel free to have a look through the generated Doxygen and let me know if anything seems unclear or incorrect (including Doxygen I haven't touched yet, it all needs to be correct eventually, and if you spot something I may as well know about it!)