I recently hired into a data analytics team for a hospital, and we don’t have a style guide. There’s a lot of frustration from folks working with legacy code, so I thought putting together a style guide would help people working with code they didn’t write, starting with a required header for SQL scripts as low-hanging fruit.
Or so I thought.
My counterpart over in application development says that we shouldn’t be documenting any metadata in-line, and he’d rather implement “docfx” if we want to improve code metadata and documentation. I’m terrified of half-implementing yet another application to further muddy the waters. I’m concerned it will become just one more place to look while troubleshooting something.
Am I going crazy? I thought code headers were an industry standard, and in-line comments are regarded as practically necessary when working with a larger team…
“I recently hired into a data analytics team”
I work in Data Engineering and have spent most of my time on analytics teams. They don’t have a SWE/CS background and, because of that, generally don’t follow any good programming practices. In my experience style guides are hard to get them to follow properly even if you set up SQLFluff for them. I can barely make them see the advantage of not committing directly to main (at least we’re using git). It’s very frustrating.
Yep that’s us–maybe half of us have CS degrees.
The funny thing is that the pushback is coming from the “regular” development folks. At least we’re using git too :)
Yes, serious people write docs. I hate this bullshit about code that should be so good that it’s “auto-documenting.” It never happens in real life. Code is at best of average quality, and it needs documentation. At my previous job they had “guidelines” to make sure that code didn’t need docs. It was a bad joke and we had the worst code I’ve ever seen.
I don’t have solutions for you though. You need a combo of documentation generation, a code formatter (in CI maybe, or before a commit), and code linters to check for errors.
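For the SQL header idea specifically, the "linter" part can be very small. Here is a rough Python sketch of a check that could run before a commit or in CI. The field names (Author, Created, Purpose), the block-comment header format, and the function names are all assumptions made up for illustration, not anything the original poster's team uses.

```python
import re

# Hypothetical required fields; a real style guide would pick its own.
REQUIRED_FIELDS = ("Author", "Created", "Purpose")


def leading_comment(sql_text: str) -> str:
    """Return the body of a /* ... */ comment at the very top of a script, or ''."""
    match = re.match(r"\s*/\*(.*?)\*/", sql_text, flags=re.DOTALL)
    return match.group(1) if match else ""


def missing_header_fields(sql_text: str) -> list[str]:
    """List the required fields that are absent or left empty in the header."""
    header = leading_comment(sql_text)
    return [
        field
        for field in REQUIRED_FIELDS
        # A field counts only if "Field:" starts a line and has a value after it.
        if not re.search(rf"^\W*{field}[ \t]*:[ \t]*\S", header, flags=re.MULTILINE)
    ]


# Made-up sample scripts, for illustration only.
GOOD_SCRIPT = """/*
 * Author:  J. Doe
 * Created: 2024-01-15
 * Purpose: Daily census by unit
 */
SELECT 1;
"""
NO_HEADER_SCRIPT = "SELECT 1;\n"

if __name__ == "__main__":
    print(missing_header_fields(GOOD_SCRIPT))
    print(missing_header_fields(NO_HEADER_SCRIPT))
```

A wrapper that walks the repo's .sql files and fails the build when the list is non-empty would be the next step. The point is that the header lives in the script itself and the check is cheap to enforce.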
I like it better when the docs are embedded in the code or alongside it. Everywhere I’ve worked it is a pain trying to find some random Confluence page or whatever where some API doc is.
Also if it’s not in the code, it will get outdated quickly and nobody will ever look at it. Separate docs are only really useful for main concepts that are not going to change that quickly.
“Self-documenting” just means “(I thought) I understood it when I wrote it, so you should too.” In other words, it really means “I don’t want to document my code.”
Hmm, do I want to open some external site/program to see my documentation or have it already in the code in front of me?
We use doxygen at my company and I think I’ve only ever opened it twice in 9 years.
A header might be useful, although there are likely better ways to (not) document what each SQL statement does.
But inline documentation? I’d suggest trying to work around that. Here’s an explanation as to why: https://youtu.be/Bf7vDBBOBUA
If possible, and as much as possible, things should simply make enough sense to be self-documenting, with only the high-level concepts actually documented. Everything else is at risk of becoming outdated or, worse, confusing.
Self-documenting code only documents what the code does, not why it does it. I can look at a well-written method that populates a list with random elements from another list and go “I know what that does!” but reading the code doesn’t tell me the reason this code was written or why alternatives weren’t chosen.
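To make the what-versus-why point concrete, here is a small Python sketch of that kind of method. The function name and the rationale in the docstring are invented for illustration. The code alone shows what happens, and only the comment can carry the reasons.

```python
import random
from typing import List, Optional


def pick_audit_sample(records: List[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return k records chosen at random from records, without repeats.

    Why (hypothetical rationale): reviewers can only hand-check a few
    records per run, so we sample instead of reviewing everything.

    Why not records[:k]: the input is assumed to be sorted by date, so
    the first k would always be the oldest rows and newer records would
    never be reviewed.
    """
    # A dedicated, seedable generator lets a specific run be reproduced.
    rng = random.Random(seed)
    # random.sample raises ValueError if k is larger than len(records).
    return rng.sample(records, k)


if __name__ == "__main__":
    print(pick_audit_sample(["a", "b", "c", "d", "e"], k=2, seed=7))
```

Strip the docstring and the function is still perfectly readable, but nobody can tell whether swapping it for a simple slice would be a harmless cleanup or a bug.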
In the case of Rust, it goes even a step further when working with unsafe code. Sure I know what invariants need to be held for unsafe code to be sound, but not everyone does, and it isn’t always clear why a particular assumption made in an unsafe block (the list has at least 5 elements, for example) can be made soundly.