A user on the online forum 4chan has leaked a massive 270GB of data purportedly belonging to The New York Times. This leak includes what is claimed to be the source code for the newspaper’s digital operations.
It’s mostly node modules
“send nodes”
I hate Web 3.0
Node has been around longer than web3
NPM nightmares intensify
I also hate making things from smaller pieces, the engineering in software engineering. /s
270GB of mostly node modules?
You’re right, it would be bigger if it was node
Sounds pretty average
pnpm store would probably save them a lot of grief if true
Nah, having big node_modules folders is a security feature. It’s like keeping your valuables in a bag of trash.
It also takes a long time to make a dump, so you have a higher chance of noticing it happening.
Reminds me of the time someone said “Who is this 4chan?” on TV and it became a meme. Good times.
Source for the curious: https://www.youtube.com/watch?v=qz5i171h_no
He can’t keep getting away with it.
270GB feels insane for the source code of a single organisation. Are there media assets or backups in there too?
EDIT: yep, multiple subsidiaries and Slack comms, which could inflate it by a lot. We post a whole lot of uncompressed shit on our Slack.
NY Times has freaking great data visualisations; they are (were?) employing a wizard in this space, doing custom extensions on d3.js.
Source code… for a website?
Subscription software. Tracking software. Ad tools. Promotion tools. Tools for journalists.
The website is just what you see.
Yeah, I guess I didn’t consider all the other operational shit that goes into providing content and funding for the website.
It’s why our PCs have gotten insanely fast but websites still load like fucking trash. All the back-end spying shit takes up a ton of CPU cycles. If you don’t already have ’em, run uBlock Origin and NoScript and the internet is so fucking speedy 😆
I hadn’t noticed, but then again I run uBlock Origin on Firefox.
Yeah, you’ve got yourself covered. NoScript helps with JavaScript being pesky, but it breaks a lot of shit tbh.
You can still make it work, it’s just more stuff to click on. I used to use NoScript too, but eventually stopped using it.
That’s not what makes websites slow. It’s React.
Retards with React. “I’m optimizing user experience”
Oh, I haven’t heard of this; will check it out.
Anything more complicated than a static website is going to have a significant amount of server-side code.
Also, the article explains that it’s not just the website, but ALL of their repos, which would include their smartphone apps, backend tools, etc.
I doubt this will affect much … that’s a lot more source code than I’d expect though, dang.
Presumably a lot of it is for internal operations (custom editing software or something of that ilk).
It sounds like it’s not all source code, from the article.
Now everyone will get to run Wordle!
In case anyone missed the hubbub: [ETA: This is from March 2024; unconnected to this hack/leak]
The Times has filed several Digital Millennium Copyright Act, or DMCA, takedown notices to developers of Wordle-inspired games, which cited infringement on the Times’ ownership of the Wordle name, as well as its look and feel — such as the layout and color scheme of green, gray and yellow tiles.
Numerous impacted developers have also taken to social media to share their frustrations. Many said that their games, which range from Wordle-like offerings in other languages to more guessing games, would be taken down as a result.
Still, Brauneis said he believes the Times’ arguments for Wordle copyright infringement are on “a little bit shaky ground” for several reasons. Rules of a game, for example, are not covered by copyright — and that can include the layout of the game itself, he said.
I prefer Connections.
We still have no legal right to use, change and share its source code, control it both ourselves and in groups. It’s still anti-libre software.
Anything that may help develop better adblockers/paywall bypasses or exposes how/what of our personal information is collected is a win in my book. And this may very well be none of those things.
They only exist when we keep them relevant and we already know we can’t prove it’s private but if it helps some people, that’s good.
Right, because fuck paying for proper journalism. Everything must be free!
Remind me again, how does that work?
I pay for the NYT, and yet every other screen is a fucking ad (often the same ad repeated over and over). You already have my subscription money, and unless they decide not to be so greedy (haha), their ads get shoved up my pihole.
The inverse of this is where subscription services that previously had no ads for paying subscribers then add in ads on paid plans while also increasing the fees associated. It’s a pretty standard practice, NYT included. Adblocking is necessary.
Just seeing how something is approached helps.
I sometimes rebuild software from one language to another for practice.
Very few care about licenses unless the use of such material can be proven, and good luck with that
Did this leak happen before or after NYT published an investigation detailing how Israeli forces were raping and torturing defenseless Palestinian detainees brought in from the Gaza Strip?
I have not read the news in a really long time just cause paywalls are annoying as frick.
Consider paying for the news…?
I’d only do that if you want independent news.
I’m not sure what you’re saying here …
Pay for news if you want it to be independent, and not beholden to sponsors.
I’d go as far as to say that paying for news (if you have the means to do so comfortably), is your duty as a commitment to democracy.
Ahh, yes I agree on all points; thanks for the clarification!
It’s amazing the number of times on Lemmy that someone will come in with the completely opposite “explanation” for what I was saying. Almost like they have an agenda.
It’s so weird to turn my statement of “support the news with money” into “the mainstream media can’t be trusted”.
Maybe it’s only happened twice, but it’s still weird that it’s happened twice.
Phrasing things in semi-sarcastic converse will do that
I was wondering if that’s where you were going in part.
I think it’s a bit of the phrasing; you stated an opinion that’s vague to the point of tiptoeing towards the potentially loaded question: “who’s independent media?”
It’s not uncommon in the conservative media sphere to see a similar series of leading, ambiguous questions. They’re never genuine; it’s always in the style of:
You know what the best operating system is? I’ll tell you what the best operating system is, it’s Linux. Do you know why Linux is the best operating system? It’s because it’s got penguins and penguins are great! Do you know why penguins are great? I mean, can you think of a more iconic bird? That’s why, that is why … and Big Microsoft is out to destroy your hopes and dreams aren’t they? Yes, yes they absolutely are, with their soulless Windows operating system that’s manufactured by the flying spaghetti monster. Now obviously folks, only use Linux if you support freedom not the unholy flying spaghetti monster. The flying spaghetti monster will destroy America. It’s its one true mission. Support freedom, support penguins, stop the flying spaghetti monster.
I think it’s made a bunch of us antsy lol
Why would anyone sponsor an IDF propaganda outlet?
Never!
You can go to archive.is and put in the url of a news story you want to read in the second box and it will usually let you bypass the paywall.
I expect that paywall to be fully useless soon.
That’s a really silly take … a Paywall is just an authorization mechanism.
That’s like saying the source code of Lemmy leaks and you expect your account to be compromised any second.
I can sell you a copy of Lemmy’s source code, are you interested?
I’ll sell it for cheaper!
I can give you 25 schmeckles.
Well, open-source code is less likely to rely on “security through obscurity” than closed-source code.
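To make the “a paywall is just an authorization mechanism” point concrete, here is a minimal sketch in Python. It assumes the access check runs server-side and that the signing secret is kept out of the repository; `SECRET_KEY`, `issue_token` and `can_read_article` are hypothetical names for illustration and say nothing about how the NYT or Lemmy actually do it.

```python
import hashlib
import hmac

# Hypothetical server-side secret. The assumption is that it lives in
# deployment config, not in the source tree, so a code leak does not expose it.
SECRET_KEY = b"not-in-the-leaked-repo"


def sign(user_id: str) -> str:
    """Return an HMAC-SHA256 tag showing the server issued a token for user_id."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()


def issue_token(user_id: str) -> str:
    """Build a token of the form '<user_id>.<tag>' for a paying subscriber."""
    return f"{user_id}.{sign(user_id)}"


def can_read_article(token: str) -> bool:
    """Server-side paywall check: accept only tokens carrying a valid tag."""
    user_id, _, tag = token.rpartition(".")
    if not user_id:
        return False
    # Constant-time comparison on bytes avoids leaking how much of the tag matched.
    return hmac.compare_digest(tag.encode(), sign(user_id).encode())
```

Reading this code tells an attacker exactly how tokens are checked, but forging one still requires the secret. If secrets were hard-coded in the leaked repos, or the check were done only in the browser, that would be a different story.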
Oh. They stopped seeding the torrent at 85%…
Critical support
I just received an email, is this related?
That’s a lot of data, but surely it’s not all their articles, cos I’d very much like to train Mixtral 8x7B on it along with 4chan data and shit from the dark web. Surely there is a project where such a model is public and being trained on literally everything regardless of legality.
EDIT: why am I getting downvoted?