Premium Flanker • 1 month ago
TurboQuant Cuts Memory Usage For AI Inference
In short, this new compression technique reduces the memory footprint of AI models at inference time (not training), which helps local models on consumer hardware and any other deployment serving AI models.
https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
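To see why quantization-style compression cuts inference memory, here is a minimal sketch of plain symmetric int8 weight quantization. This is an illustrative example only, not TurboQuant's actual algorithm (the linked post describes a more aggressive "extreme compression" scheme); the function names are made up for this sketch.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 plus one float scale per tensor.

    Illustrative sketch only -- NOT TurboQuant's method.
    """
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Approximate reconstruction used at inference time.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
q, scale = quantize_int8(w)

# int8 storage is 4x smaller than float32 storage.
print(w.nbytes // q.nbytes)
```

Even this basic scheme shrinks weight storage 4x (float32 to int8) at a small accuracy cost; more extreme schemes like the one in the post push the ratio further, which is what makes large models fit on consumer hardware.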