Hi!

Kagi had a rough couple of months on the PR side, and a comment from another Lemmy user arguing that they aren’t using Google’s index set me off… because I had just read a couple of weeks ago on their own website that they primarily use Google’s search index.

Lo and behold, that user was “right”: No mention of Google whatsoever on Kagi’s Search Sources page. If that’s all you had to go off of, you’d be excused for thinking they are only using their internal index to power their web search since that’s what they now strongly imply. The only “reference” to external indexes is this nebulous sentence:

Our search results also include anonymized API calls to all major search result providers worldwide, specialized search engines like Marginalia, and sources of vertical information […]

… Unless one goes to check that pesky Wayback Machine. Here is the same page from March 2024, which I will copy/paste here for posterity:

Search Sources

You can think of Kagi as a “search client,” working like an email client that connects to various indexes and sources, including ours, to find relevant results and package them into a superior, secure, and privacy-respecting search experience, all happening automatically and in a split-second for you.

External

Our data includes anonymized API calls to traditional search indexes like Google, Yandex, Mojeek and Brave, specialized search engines like Marginalia, and sources of vertical information like Wolfram Alpha, Apple, Wikipedia, Open Meteo, Yelp, TripAdvisor and other APIs. Typically every search query on Kagi will call a number of different sources at the same time, all with the purpose of bringing the best possible search results to the user.

For example, when you search for images in Kagi, we use 7 different sources of information (including non-typical sources such as Flickr and Wikipedia Commons), trying to surface the very best image results for your query. The same is also the case for Kagi’s Video/News/Podcasts results.

Internal

But most importantly, we are known for our unique results, coming from our web index (internal name - Teclis) and news index (internal name - TinyGem). Kagi’s indexes provide unique results that help you discover non-commercial websites and “small web” discussions surrounding a particular topic. Kagi’s Teclis and TinyGem indexes are both available as an API.

We do not stop there and we are always trying new things to surface relevant, high-quality results. For example, we recently launched the Kagi Small Web initiative which platforms content from personal blogs and discussions around the web. Discovering high quality content written without the motive of financial gain, gives Kagi’s search results a unique flavor and makes it feel more humane to use.
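The “search client” model described in the archived page (one query fanned out to several sources at the same time, then merged into one result list) can be sketched roughly like this. It is a minimal illustration under stated assumptions, not Kagi’s actual code: the source names, latencies, URLs and the `fake_source`/`search` helpers are all hypothetical, and the upstream API calls are simulated with `asyncio.sleep` instead of real network requests.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    url: str
    source: str


async def fake_source(name: str, query: str, latency: float, urls: list[str]) -> list[Result]:
    """Hypothetical stand-in for one upstream API call; a real client would send `query` over HTTP."""
    await asyncio.sleep(latency)  # simulated network latency
    return [Result(url=u, source=name) for u in urls]


async def search(
    query: str,
    sources: dict[str, tuple[float, list[str]]],
    timeout: float = 1.0,
) -> list[Result]:
    """Query every source concurrently, skip any that fail or time out, de-duplicate by URL."""
    tasks = [
        asyncio.wait_for(fake_source(name, query, latency, urls), timeout)
        for name, (latency, urls) in sources.items()
    ]
    # gather() preserves the order of `tasks`, so merged results follow source order.
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    merged: list[Result] = []
    seen: set[str] = set()
    for outcome in outcomes:
        if isinstance(outcome, BaseException):
            continue  # one slow or broken source should not sink the whole query
        for result in outcome:
            if result.url not in seen:
                seen.add(result.url)
                merged.append(result)
    return merged


if __name__ == "__main__":
    demo = {
        "index_a": (0.0, ["https://a.example/1", "https://shared.example/x"]),
        "index_b": (0.01, ["https://shared.example/x", "https://b.example/2"]),
        "slow_index": (5.0, ["https://slow.example/3"]),
    }
    for r in asyncio.run(search("example query", demo, timeout=0.1)):
        print(r.source, r.url)
```

A per-source timeout is one plausible design choice here: a single slow upstream index then costs you its own results rather than delaying the whole query.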


Of course, running an index is crazy expensive. By their own admission, Teclis is narrowly focused on “non-commercial websites and ‘small web’ discussions”. Mojeek indexes nowhere near enough pages to meaningfully compete with Google, and Yandex specializes in the Russosphere. Bing (Google’s only meaningful direct indexing competitor) is not named, so I assume they don’t use it. So it’s not a leap to say that Google powers most of Kagi’s English-language web searches, just like Bing powers almost all search alternatives such as DDG.

I don’t personally mind that they use Google as an index (it makes the most sense and it’s still the highest-quality one out there IMO, and Kagi can’t compete with Google’s sheer capital on the indexing front). But I do mind a lot that they aren’t being transparent about it anymore. This is very shady and misleading, which is a shame because Kagi otherwise provides a valuable and higher quality service than Google’s free search does.

  • Imprudent3449@lemm.ee
    6 months ago

    This is disappointing. Because Kagi requires an account and billing, I would say transparency should be vitally important for them, since privacy concerns are going to be a large reason a lot of people are looking to switch from Google in the first place. It’s always a concern brought up when search alternatives are discussed in forums, and “just trust me BRO” is going to start to ring kind of hollow if they play little games like this.

    • WhatAmLemmy@lemmy.world
      6 months ago

      “Just trust me bro” is always bullshit with capitalism. On a long enough timeline, for-profit orgs will always expand to double/triple/quadruple/etc. dip into their customer base. It might as well be a fundamental law of economics at this point.

    • capital@lemmy.world
      6 months ago

      They’re either anonymizing your searches to the downstream index or they aren’t.

      Does seeing an itemized list of indexes used change that?

  • Lemmchen@feddit.de
    6 months ago

    Our data includes anonymized API calls to traditional search indexes like Google, Yandex, Mojeek and Brave, specialized search engines like Marginalia, and sources of vertical information like Wolfram Alpha, Apple, Wikipedia, Open Meteo, Yelp, TripAdvisor and other APIs

    I don’t want to be that guy, but technically they said they are using traditional indexes like Google, not that they are in fact using Google. But I guess that is splitting hairs.
    Also, maybe they just dropped Google from their indexes? And what’s more: Why does it matter if they are using Google at all, when the results are satisfying?

    Knowing which indexes they are using exactly would be nice to know, though.

  • Avid Amoeba@lemmy.ca
    6 months ago

    Our search results also include anonymized API calls to all major search result providers worldwide

    When I read this, it doesn’t tell me they don’t use Google. Quite the opposite. It says “all”, which immediately tells me Google is among them.

  • vanderbilt@lemmy.world
    6 months ago

    I don’t care whose indexes they use so long as the results are good. The problem isn’t the index, it’s how the contents get prioritized and presented. Kagi happens to do so well for me.

    • vanderbilt@lemmy.world
      6 months ago

      He offered to start a conversation about the blog post and give his perspective. The only thing I see here is the author refusing to stand by their post.

      • RunawayFixer@lemmy.world
        6 months ago

        I didn’t read every little bit either, but that was my takeaway as well. I saw an emotionally invested CEO who could not bear seeing his baby dragged through the mud, so he wanted to provide a counterpoint to what he saw as misinformation and accusations, but in a polite, professional manner. My first instinct was that he would have been wasting his time, but seeing as his comments got posted and they make a more convincing, level-headed argument than the accusations, maybe it was worth it.

        • Zengen@lemmy.world
          6 months ago

          I agree with this perspective. The CEO felt like the more reasonable guy here, who wanted to respectfully and professionally clear his name through polite conversation. And the writer here seemed very aggressive. The line of questioning was outwardly hostile and accusatory, with literally no good evidence.

    • redcalcium@lemmy.institute
      6 months ago

      Just for some perspective, if you want to know how little reach the fedi post with the link to this blog post got: the first post in this thread already has more likes and boosts after less than an hour since posting it than my blog post ever did that he felt the need to confront me over.

      The author probably wasn’t aware that their blog post had huge engagement on Hacker News just the day before and that the CEO got roasted there, so the CEO probably felt the need to contact the author to “correct” their post.

      • Snot Flickerman@lemmy.blahaj.zone
        6 months ago

        The author was aware. They made a post about it getting posted to Hacker News, stating “I specifically requested for this not to happen.”

        so the CEO probably felt the need to contact the author to “correct” their post.

        This still makes the CEO seem like an unhinged fucking freak who does not respect personal boundaries, it literally makes him look no better, no matter how he came across it.

        • douglasg14b@lemmy.world
          6 months ago

          … Contacting someone makes you an: “unhinged fucking freak who does not respect personal boundaries”?

          More people need to go touch grass, this is insane.

        • Wiz@midwest.social
          6 months ago

          Yes. How many times did she ask him to stop contacting her?

          Yet he kept coming at her, all like, “Just debate me!”

          No. Take a hint, dude!

        • capital@lemmy.world
          6 months ago

          lol they asked that their public post not be posted somewhere else on the internet?

          Are they new here or something? The fuck?

        • DeprecatedCompatV2@programming.dev
          6 months ago

          If someone posts an angry rant about your company and you email them to say “you’re wrong and I’m sorry you feel that way” that makes you an “unhinged … freak?” This is not the president sending the secret service to your college dorm room lol.

          • Wiz@midwest.social
            6 months ago

            No, he started being an unhinged freak when it was a private email exchange.

            • DeprecatedCompatV2@programming.dev
              6 months ago

              Look, if it was a random kid on tiktok that’s one thing, but slinging (potentially) slanderous information around (and publishing it, technically) is a serious matter with real-world consequences. If someone made a blog post about how you torture animals and have a horrible taste in music, you’d probably want to do something about it.

              • Wiz@midwest.social
                6 months ago

                It wasn’t slanderous. It was her opinion about a couple items. The bad part was him hounding her after she repeatedly said to leave her alone.

    • niemcycle@lemmy.world
      6 months ago

      Oh interesting, I never knew about that side of Kagi. The fact the company is focused so hard on AI is a red flag. I don’t think I’ll renew my subscription when it comes up later this year, given how erratic their plans seem.

      • beefbot@lemmy.blahaj.zone
        6 months ago

        Eh, yeah ya do. It clearly speaks to the question of how honest and forthcoming the CEO, and by extension the company culture, is about which sources they use. The CEO has a history of the kind of interactions they had with the poster.

    • Dark Arc
      6 months ago

      Ironically going to use Kagi to summarize the blog post:

      Kagi is trying to expand into too many different products and services beyond just their search engine, which is stretching their resources too thin.

      Kagi spent a significant portion of their funding (1/3) to set up a t-shirt printing business to give away free t-shirts to their first 20,000 users, which seems like a questionable financial decision.

      Kagi was not paying sales tax for two years and had to retroactively pay up, indicating potential financial mismanagement.

      Kagi is heavily focused on developing AI tools and features, to the point where the author believes it is becoming the main focus over improving the core search functionality.

      The author is very critical of Kagi’s founder Vlad’s dismissive attitude towards privacy concerns and his belief that email addresses are not personally identifiable information.

      Vlad has a “my way or the highway” management style and is unwilling to consider feedback or criticism about Kagi’s direction.

      Kagi’s dedication to privacy is questionable, as the founder does not seem to take many privacy concerns seriously.

      The author believes Kagi’s AI-powered features like “FastGPT” and the “Universal Summarizer” are inaccurate and unreliable.

      The author is skeptical about Kagi’s long-term sustainability and viability as a business.

      Overall, the author has lost faith in Kagi due to the company’s questionable financial decisions, overreliance on AI, and the founder’s dismissive attitude towards user concerns.

      I did actually read the post (and I think this is actually my second time seeing this). I’m majorly unconvinced by the author … and yes, the criticized AI summarizer is that good. I regularly use it after reading something to share the details with friends (or get a rough idea of what’s being discussed and decide if I want to read something non-trivially long).

      It also works on YouTube videos (presumably using the transcript) which can be a HUGE time saver.

      Ultimately the search is good; it’s better than what Google offers me, and I’ve found their AI tools fairly useful (despite having distrust for GPT-style chat bots/BS generating AI, I think summarization of some specified source is something they might actually do well; the major concern of piecing together random pieces of random sources of varying integrity is largely mitigated).

      Whether the company will stay private / whether it lives on beyond Vlad is the biggest concern I have with using it. However, “what’s the other (practical) option to invest in?” I find myself in a similar position with Steam and Proton (at least the latter open sources much of their work). For now anyways, the weather is fair, so I’ll stay on board.

  • thejml@lemm.ee
    6 months ago

    Honestly, if the search results are good, anonymized, and consistent, I’m not worried about it not using Google’s index. In all honesty, I’d much prefer it did NOT use Google. The further I can distance myself from their shady SEO/SEM practices, results stuffing, large site favoring, and monetization techniques, the better.

        • capital@lemmy.world
          6 months ago

          Our search results also include anonymized API calls to all major search result providers worldwide…

          Google search is the 800 lb gorilla in this space. When I read that, there’s no doubt in my mind that it includes Google.

          • boatswain@infosec.pub
            6 months ago

            Sure, but given that their previous language explicitly mentioned Google, why remove that unless they’re trying to make people think that maybe they don’t use Google? It’s a shady change, from a company whose CEO is already doing somewhat unhinged things.

        • d13@programming.dev
          6 months ago

          I wonder if they are preparing to stop using it. That could be a benign reason for the change in wording.

    • asudox@lemmy.world
      6 months ago

      I agree. I was paying Kagi a few months ago but then started self-hosting a SearXNG instance. The nicest thing about it is that I can replace links and select the engines I want to use. It’s very customizable.

  • e8d79@discuss.tchncs.de
    6 months ago

    I unsubscribed and deleted my Kagi account mainly because of their attitude to data privacy, but also because of their nutjob CEO. When I subscribed I was excited because I thought they wanted to build a proper competitor to other search engine operators, but they are actually just another company that tries to shove AI into absolutely everything. So, after realising that they are an untrustworthy company full of tech maximalists trying to build the torment nexus, I immediately canceled my subscription and moved back to DuckDuckGo and Marginalia. Maybe I’ll give SearXNG another go; it’s just that self-hosting is a bit of a bother.

  • 0oWow@lemmy.world
    6 months ago

    The fact that they specifically mentioned those search engines, when I checked back in late March, was a selling point for me.

    I’m not sure I would have even tried it if I only saw the new wording.

    Searches are good on Kagi though, but Brave Search Premium is trying to catch my attention.

    • muse@kbin.social
      6 months ago

      Brave being run by a bigot crypto bro makes it a non-option for lots of us, sadly.

    • azertyfun@sh.itjust.works (OP)
      6 months ago

      I know Brave browser has had a lot of controversy in the past regarding their business practices, including rolling out their own crypto-coin.

      They apparently make the really bold claim of using their own index exclusively. If true (given their track record, I am not 100% willing to accept that as truth without seeing some independent analysis), that would do wonders for the search ecosystem. I’m definitely interested to see how it pans out.

    • BaroqueInMind@lemmy.one
      6 months ago

      It still requires the use of Google/Bing/etc. API calls. There’s literally no way to truly self-host a web-indexing search engine without sacrificing your privacy or paying millions of dollars.

      • hedgehog@ttrpg.network
        6 months ago

        You can use YaCy, which can be run as an independent self-hosted index (in “Local” mode), where it will index sites visited as part of web crawls that you initiate, or you can run it as part of a decentralized peer-to-peer network of indexes.

        YaCy has its own search UI but you can also set up SearXNG to use it.

        • BaroqueInMind@lemmy.one
          6 months ago

          I mentioned this software a while back here on Lemmy, and someone with actual expertise said that running YaCy on noncommercial hardware becomes untenable after a certain duration due to poor network quality, incompatible indexing algorithms, a massive database, and slow search query responses.

          • Dark Arc
            6 months ago

            I am not that person, but the only way I see YaCy being useful/usable long term is as a web crawler for specific sites that you personally find high value in/regularly pruning irrelevant index data.
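For anyone wondering what “an index” actually is, as opposed to a metasearch front end: at its core it is an inverted index mapping each term to the pages that contain it, and “pruning irrelevant index data” amounts to removing pages from those postings. The sketch below is a toy illustration only; `TinyIndex` is a hypothetical class, not how YaCy stores, ranks, or distributes its index.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: term -> set of URLs (hypothetical; not YaCy's storage format)."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[str]] = defaultdict(set)
        self.docs: dict[str, set[str]] = {}  # url -> terms, so a page can be pruned later

    @staticmethod
    def tokenize(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, url: str, text: str) -> None:
        self.remove(url)  # re-crawling a page replaces its old entry
        terms = self.tokenize(text)
        self.docs[url] = terms
        for term in terms:
            self.postings[term].add(url)

    def remove(self, url: str) -> None:
        """Prune a page, e.g. one from a site you no longer care about."""
        for term in self.docs.pop(url, set()):
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]  # drop empty postings to keep the index small

    def search(self, query: str) -> set[str]:
        """Return URLs that contain every query term (boolean AND, no ranking)."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))
```

Even this toy version shows why the storage side grows quickly: every crawled page adds an entry to the postings of every distinct term it contains.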

    • Cheradenine@sh.itjust.works
      6 months ago

      You can do it, or use one of the instances at searx.space

      Searx is great, it’s all I use, but it’s a metasearch engine; there is no ‘Searx index’, which is what this is about.
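To illustrate what “meta” means here: a metasearch engine has no index of its own and instead merges the ranked lists returned by other engines. One common way to do that merge is reciprocal rank fusion, sketched below with hypothetical engine names and URLs; this is an assumed technique for illustration, not SearXNG’s actual scoring code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge each engine's ranked URL list into one list (reciprocal rank fusion)."""
    scores: defaultdict[str, float] = defaultdict(float)
    for urls in rankings.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)  # a URL found by several engines accumulates score
    # Highest fused score first; ties broken by URL so the output is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

For example, merging `{"engine_a": ["u1", "u2", "u3"], "engine_b": ["u2", "u4"], "engine_c": ["u2", "u1"]}` puts `u2` first because three engines returned it. The constant `k = 60` is the value commonly used with this method; it damps the advantage of a single top position so that agreement across engines counts for more.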

      • hedgehog@ttrpg.network
        6 months ago

        there is not a ‘Searx Index’ which is what this is about.

        There’s YaCy, which includes a search index (which can be independent or can join a P2P network of indexes), web crawler, and web ui for searching. It can also be added as a SearXNG engine.

  • 7heo@lemmy.ml
    6 months ago

    Hi! Great post, good research with sources, great initiative, thank you. 🙏

  • ikidd@lemmy.world
    6 months ago

    Maybe they should give away some T-shirts to advertise their non-affiliation.

  • antihumanitarian@lemmy.world
    6 months ago

    Them using Google indexes anonymously isn’t intending to solve the problem you think it is. It’s more about incentive structures. Google’s “free” search optimizes for ad revenue now. The API access doesn’t as much, and Kagi certainly doesn’t have an ad incentive. So privacy is a nice bonus, but the real benefit is a customer serving incentive structure.

  • tb_@lemmy.world
    6 months ago

    This video by TechAltar is great and goes into why Bing/Google are often the backend for alternative search engines, such as DDG (Bing), Ecosia (Bing), and Startpage (Google).

  • UraniumBlazer@lemm.ee
    6 months ago

    An unrelated-ish question, I know, but how much would Kagi’s index have cost?