Vector Institute aims to clear up confusion about AI model performance 10 Apr 2025, 8:41 pm
AI models are advancing at a dizzying clip, with builders boasting ever more impressive performance with each iteration.
But how do the various models really stack up? And how can IT buyers truly know that vendors are being forthcoming with their results?
The Geoffrey Hinton-founded Vector Institute for Artificial Intelligence hopes to bring more clarity with its new State of Evaluations study, which includes an interactive leaderboard. The independent, non-profit AI research institute tested 11 top open and closed source models against 16 benchmarks in math, general knowledge, coding, safety, and other areas, fully open sourcing its results.
“Researchers, developers, regulators, and end-users can independently verify results, compare model performance, and build out their own benchmarks and evaluations to drive improvements and accountability,” said John Willes, Vector Institute’s AI infrastructure and research engineering manager.
How did the top models do?
Vector Institute analyzed a range of state-of-the-art models:
Open source:
- Qwen2.5-72B-Instruct (Alibaba)
- Llama-3.1-70B-Instruct (Meta)
- Command R+ (Cohere)
- Mistral-Large-Instruct-2407 (Mistral)
- DeepSeek-R1 (DeepSeek)
Closed source:
- OpenAI GPT-4o
- OpenAI o1
- OpenAI GPT-4o-mini
- Google Gemini-1.5-Pro
- Google Gemini-1.5-Flash
- Anthropic Claude-3.5-Sonnet
Models were ranked on two types of benchmarks: basic, comprising short question-and-answer tasks; and agentic, requiring sequential decisions and tool use to solve multi-step challenges. They were tested on language understanding, math, code generation, general AI assistance, AI harmfulness, common-sense reasoning, software engineering, graduate-level intelligence, and other tasks.
Model performance ranged widely, but DeepSeek and o1 consistently scored highest. Command R+, on the other hand, exhibited the lowest performance, but Willes pointed out that it is the smallest and oldest of the models tested.
Overall, closed source models tended to outperform open source models, particularly with the most challenging knowledge and reasoning tasks. That said, DeepSeek’s performance proves that open source can remain competitive.
“In simple cases, these models are quite capable,” said Willes. “But as these tasks get more complicated, we see a large cliff in terms of reasoning capability and understanding.”
One such task could be, for instance, a customer support function requiring a number of steps. “For complex tasks, there’s still engineering work to be done,” said Willes. “We’re a long way from really general purpose models.”
All 11 models also struggled with agentic benchmarks designed to assess real world problem-solving abilities around general knowledge, safety, and coding. Claude 3.5 Sonnet and o1 ranked the highest in this area, particularly when it came to more structured tasks with explicit objectives. Still, all models had a hard time with software engineering and other tasks requiring open-ended reasoning and planning.
Multimodality is becoming increasingly important for AI systems, as it allows models to process different kinds of input. To measure this, Vector used the Multimodal Massive Multitask Understanding (MMMU) benchmark, which evaluates a model’s ability to reason about images and text across both multiple-choice and open-ended formats. Questions cover math, finance, music, and history and are designated as “easy,” “medium,” and “hard.”
In its evaluation, Vector found that o1 exhibited “superior” multimodal understanding across different formats and difficulty levels. Claude 3.5 Sonnet also did well, but not at o1’s level. Again, here, researchers found that most models dropped in performance when given more challenging, open-ended tasks.
“There’s a lot of work going on right now that’s exploring how to make these systems really multimodal, so they can take text input, image input, audio input, and unify their capabilities,” said Willes. “The takeaway here is we’re not quite there yet.”
Overcoming challenges of benchmarking
Willes pointed out that one of the big problems with benchmarking is evaluation leakage, where models learn to perform well on specific evaluation datasets they’ve seen before, but not on new, unseen data.
“Once these benchmarks are out in the public domain, it’s awesome because others can replicate and validate,” he said. However, “there’s a huge challenge in making sure that when a model improves its performance in the benchmark, we’re sure it’s because we’ve had a step change in the model’s capability, not just that it’s seen the answers to the test.”
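One simple, generic way to screen for that kind of leakage is to check for verbatim overlap between benchmark items and a training corpus. The toy sketch below is illustrative only and is not part of Vector’s methodology; real contamination analysis involves much more (n-gram statistics, embeddings, paraphrase detection).

# Toy leakage check: flag benchmark questions whose 8-grams also appear in a training corpus.
def ngrams(text: str, n: int = 8) -> set:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 0))}

def is_contaminated(question: str, corpus: list, n: int = 8) -> bool:
    # Any shared n-gram between the question and a training document is a red flag
    q_grams = ngrams(question, n)
    return any(q_grams & ngrams(doc, n) for doc in corpus)

corpus = ["...training documents go here..."]
print(is_contaminated("What is the capital of France and when was it founded?", corpus))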
To help IT buyers make sense of its findings and apply the best models to their specific use cases, Vector has released all of its sample-level results.
“Most of the time, when people report these metrics, they give you a high level metric,” said Willes. But on Vector’s interactive leaderboard, users can click through and analyze every single question asked of the model, and the ensuing output.
So, if enterprise users have a particular use case they want to dig into, they can drill deep into the results to gain that understanding. It is important to have a strong connection to real-world use cases so that IT decision-makers can map one-to-one between the models being evaluated and what they’re building, Willes pointed out.
“That’s one of the things that we’re trying to solve here, is to make the methodology as open as possible,” he said.
To overcome some of the most common benchmarking challenges, Vector is advocating for more novel benchmarks and dynamic evaluation, he explained, such as judging models against each other and against a continuously-evolving scale.
“[Dynamic evaluations] have a lot more longevity and avoid a lot of the evaluation leakage issues,” said Willes. Ultimately, he said, “there’s a need for continued development in benchmarking and evaluation.”
Google unveils Firebase Studio for AI app development 10 Apr 2025, 6:10 pm
Google is previewing Firebase Studio, a cloud-based agentic development environment designed to build, test, deploy, and run AI applications.
Introduced April 9, Firebase Studio fuses tools such as the Project IDX cloud IDE, the Genkit framework for AI applications, and Gemini in Firebase, an AI-powered collaborative assistant, into a unified, agentic experience, Google said. The platform offers prototyping capabilities, coding workspaces, and flexible deployment options, allowing developers to build the next generation of innovative applications more quickly, the company said.
Developers can use an app prototyping agent to generate functional web app prototypes, starting with Next.js, using natural language prompts, images, or drawings. Firebase Studio wires up Genkit and provides a Gemini API key to enable AI features to work out-of-the-box.
Developers also can iterate quickly with Firebase Studio, Google said. Apps can be edited through chat with Gemini. For example, developers could ask Gemini to add user authentication, change the layout, or refine the UI. Gemini understands a developer’s code base, according to Google. Additionally, developers can leverage a coding workspace in a familiar Code OSS-based IDE, backed by Gemini code assistance for code completion, debugging, terminal access, and integrations with Firebase services.
To see how a prototype looks on a device, developers can generate a public URL for a web preview or a QR code to load a preview of the app on a phone. When developers are pleased with a prototype and ready to test, they can click Publish. Firebase Studio uses Firebase App Hosting for one-click deployment. Developers then can open an app inside a Firebase Studio coding space, refine the architecture, and expand features to prepare for production deployment. An entire workspace can be shared with a URL. Users can collaborate in real time within the same Firebase Studio environment and then push updates.
Developers can sign up for early access to Gemini Code Assist agents within Firebase Studio for tasks such as code migration, AI model testing, and code documentation.
Google’s BigQuery and Looker get agents to simplify analytics tasks 10 Apr 2025, 7:44 am
Google has added new agents to its BigQuery data warehouse and Looker business intelligence platform to help data practitioners automate and simplify analytics tasks.
The data agents, announced at the company’s Google Cloud Next conference, include a data engineering and data science agent — both of which have been made generally available.
[ Related: Google Cloud Next ’25: News and insights ]
The data engineering agent, which is embedded inside BigQuery, is designed to help data practitioners by delivering support to build data pipelines, perform data preparation, and automate metadata generation.
According to Google, the data engineering agent will simplify and accelerate analytics tasks, as data practitioners typically have to spend the majority of their time creating pipelines for data and preparing data to get actionable insights from it.
Another capability of the data engineering agent, which is currently in preview, is the ability to detect anomalies in order to maintain data quality.
The data science agent, accessible via Colab, the company’s free, cloud-based Jupyter notebook service, is designed to help data scientists automate feature engineering.
Feature engineering in data science refers to the process of transforming raw data into features that a model can use to make predictions.
The agent is also capable of providing intelligent model selection and enabling scalable training along with faster iteration cycles.
This would allow enterprise data science teams to focus on building data science workflows instead of having to worry about preparing data and managing infrastructure.
Google has also added a conversational analytics tool to Looker, currently in preview, to help enterprise users interact with data using natural language.
The tool, essentially an agent, shows the reasoning behind its response to a query to help the end user understand and monitor its behavior — a requirement for most enterprises deploying agents that want to avoid pitfalls such as hallucination.
However, Google pointed out that the tool is powered by Looker’s semantic layer, which is expected to improve its accuracy.
The cloud services provider has also made an API of conversational analytics available to developers to help them integrate it into applications and workflows.
The agents made available via BigQuery, Looker, and Colab come at no additional cost, the company said in a statement.
BigQuery gets a new knowledge engine and an AI query engine
As part of the updates to BigQuery, the cloud services provider is adding a knowledge engine to help enterprises analyze autonomous data — datasets inside the warehouse that exist independently of any applications.
According to Google, the knowledge engine will use Gemini to analyze schema relationships, table descriptions, and query histories to generate metadata on the fly, and model data relationships.
This engine would act as the foundational layer for enterprises to ground AI models and agents in business context, Google said.
In order to assist data practitioners further, Google said that it was adding intelligent SQL cells to BigQuery notebooks.
The cells can understand the context of the data and provide suggestions as data scientists write code, while enabling them to join data sources directly within the notebook, the company said.
Other updates to the notebook include the ability to share insights across an enterprise and build interactive data applications.
In its efforts to help data practitioners analyze structured and unstructured data together inside BigQuery, Google has added a new AI query engine.
“This engine enables data scientists to move beyond simply retrieving structured data to seamlessly processing both structured and unstructured data together with added real-world context,” Yasmeen Ahmad, managing director of data analytics at Google Cloud, wrote in a blog post.
The AI query engine co-processes traditional SQL alongside Gemini to inject runtime access to real-world knowledge, linguistic understanding, and reasoning abilities, Ahmad added.
Other updates to BigQuery
Expanding its efforts to help data practitioners analyze unstructured data further, Google is adding multimodal tables to BigQuery.
Multimodal tables, currently in preview, will allow enterprises to bring complex data types to BigQuery and store them alongside structured data in unified storage for querying.
“To effectively manage this comprehensive data estate, our enhanced BigQuery governance provides a single, unified view for data stewards and professionals to handle discovery, classification, curation, quality, usage, and sharing, including automated cataloging and metadata generation,” Ahmad wrote.
While BigQuery governance is still in preview, automated cataloging has been made generally available. Other updates to BigQuery include the general availability of Google Cloud for Apache Kafka to facilitate real-time data streaming and analytics, and the addition of serverless execution of Apache Spark workloads, now in preview.
Google Cloud Next 2025: News and insights 10 Apr 2025, 7:37 am
Google Cloud Next ’25, which runs from April 9-11 in Las Vegas, is highlighting the latest advancements and future directions of Google Cloud Platform and cloud computing. This year, artificial intelligence and machine learning are dominating the news, along with announcements of new AI-powered tools and services aimed at boosting productivity, automating tasks, and driving innovation across industries.
Another theme will undoubtedly be data analytics and management. With the rise in the volume of data, Google Cloud Next is expected to unveil updates and new features for BigQuery, data lakes, and other data-centric services. The focus will likely be on making data more accessible, actionable, and secure for enterprises of all sizes. Expect discussions around real-time analytics, data governance, and the integration of AI/ML with data workflows.
Hybrid and multicloud strategies will also be a prominent topic. Google Cloud has been promoting its Anthos platform, and so you can expect further developments and success stories around enabling seamless workload migration and management across on-premises, Google Cloud, and other cloud environments. This will likely include updates focused on consistency, security, and cost optimization in hybrid and multicloud deployments.
Follow this page for ongoing coverage of Google Cloud Next.
Google Cloud Next news
Google’s BigQuery and Looker get agents to simplify analytics tasks
April 10, 2025: Google added new agents to its BigQuery data warehouse and Looker business intelligence platform to help data practitioners automate and simplify analytics tasks. The data agents include a data engineering and data science agent — both of which have been made generally available.
Cisco, Google Cloud offer enterprises new way to connect SD-WANs
April 10, 2025: Cisco and Google Cloud have expanded their partnership to integrate Cisco’s SD-WAN with the cloud provider’s fully managed Cloud WAN service. For Cisco SD-WAN customers, the integration provides a new, secure way to tie together geographically dispersed enterprise data center sites with their Google Cloud workloads using Google’s core global network backbone.
Google Cloud introduces cloud app design center
April 10, 2025: Google Cloud has introduced Application Design Center, a service that helps platform administrators and developers design, deploy, and manage applications on the Google Cloud Platform. Application Design Center is designed to provide a visual, canvas-style approach to designing and modifying application templates.
Google launches unified enterprise security platform, announces AI security agents
April 9, 2025: Google has launched a new enterprise security platform, Google Unified Security, that combines the company’s visibility, threat detection, and incident response capabilities and makes it available across networks, endpoints, cloud infrastructure, and apps.
Google to add on-demand genAI data analyst to Workspace
April 9, 2025: Google’s on-demand generative AI analyst for spreadsheets stood out among a slew of new AI features for the Google Workspace productivity suite announced at the Cloud Next event. The upcoming “help me analyze” feature in the Google Sheets application in Workspace is designed to take information from tables and provide instant data analysis and insights.
Google targets AI inferencing opportunity with Ironwood chip
April 9, 2025: Google has unveiled Ironwood, a new chip that could help enterprises accelerate generative AI workloads, especially inferencing — the process used by a large language model (LLM) to generate responses to a user request.
Google’s Agent2Agent open protocol aims to connect disparate agents
April 9, 2025: Google has taken the covers off a new open protocol — Agent2Agent (A2A) — that aims to connect agents across disparate ecosystems. Google said that the A2A protocol will enable enterprises to adopt agents more readily as it bypasses the challenge of agents that are built on different vendor ecosystems.
Google adds open source framework for building agents to Vertex AI
April 9, 2025: Google is adding a new open source framework for building agents to its AI and machine learning platform Vertex AI, along with other updates to help deploy and maintain these agents. Google said the open source Agent Development Kit (ADK) will make it possible to build an AI agent in fewer than 100 lines of Python code. It expects to add support for more languages later this year.
Google adds natural language query capabilities to AlloyDB
April 9, 2025: Google is enhancing AlloyDB, its managed database-as-a-service (DBaaS), to help developers build applications underpinned by generative AI. Announced at Google’s annual Cloud Next conference, the updates could give the PostgreSQL-compatible AlloyDB an edge over PostgreSQL itself or other compatible offerings such as Amazon Aurora.
Related Google Cloud News
Google acquires Wiz: A win for multicloud security
March 25, 2025: Google’s recent acquisition of Wiz positions the tech giant as a leader ready to tackle today’s multicloud challenges, potentially outperforming competitors such as Microsoft Azure and AWS. The collaboration aims to simplify complex security architectures and underscores Google’s commitment to addressing the gaps many enterprises face when combining different cloud ecosystems.
Who needs Google technology? Probably not you
March 3, 2025: Clearly Google is doing something right. Although Google Cloud’s revenue still lags AWS and Microsoft Azure, it’s growing much faster (albeit on a smaller base). But that’s not the real story of its growth. The story is that Google Cloud is growing at all.
Six key takeaways from Google Cloud Next ’24
April 10, 2024: Generative AI was the theme at Google Cloud Next ’24, as Google rolled out new chips, software updates for AI workloads, updates to LLMs, and generative AI-based assistants for its machine learning platform Vertex AI.
Google Cloud Next 2024: AI networking gets a boost
April 10, 2024: Google announced new cloud networking capabilities that aim to help enterprises securely connect AI and multicloud workloads. The new features expand on the company’s Cross-Cloud Network service and are focused on high-speed networking for AI/ML workloads and any-to-any cloud connectivity.
Gemini Code Assist debuts at Google Cloud Next 24
April 9, 2024: Gemini Code Assist provides AI-powered code completion, code generation, and chat. It works in the Google Cloud Console and integrates into popular code editors such as Visual Studio Code and JetBrains IDEs.
Adding smarts to Azure data lakes with Fabric Data Agents 10 Apr 2025, 5:00 am
Enterprise AI needs one thing if it’s to get around the limitations of large language models and deliver the results businesses need from their agents. It doesn’t matter if you’re building retrieval-augmented generation (RAG) applications, fine-tuning, or using the Model Context Protocol, what you need is data—and lots of it.
Thus Microsoft has been evolving its large-scale data lake platform Fabric to work with its Azure AI Foundry development platform. At its recent FabCon 2025 event, Microsoft announced further integrations between the two, using Fabric to develop agents that work with data and can then be orchestrated and built into applications inside Azure AI Foundry. By mixing familiar data analytics techniques with AI tools, Microsoft is making it easier to access enterprise data and insights and to use them to ground AI agent outputs.
Working with Fabric and AI
Fabric’s data agents are designed to be built and tested outside the Azure AI Foundry. You can use them to explore your data conversationally, as an alternative approach to traditional analytics tools. Here you ask questions about your data and use those answers to refine prompts and queries, ensuring that the prompt returns sensible data that can help guide effective business decisions when built into an application. With data scientists and business analysts using iterative techniques to deliver grounded queries, any risk associated with using an LLM in a business application is significantly reduced.
Fabric data agents work with existing OneLake implementations, giving them a base set of data to use as context for your queries. Along with your data, they can be fine-tuned using examples or be given specific instructions to help build queries.
There are some prerequisites before you can build a data agent. The key requirement is an F64 or higher Fabric capacity, along with a suitable data source. This can be a lakehouse, a data warehouse, a set of Power BI semantic models, or a KQL database. Limiting the sources makes sense, as it reduces the risk of losing the context associated with a query and keeps the AI grounded. This helps ensure the agent uses a limited set of known query types, allowing it to turn your questions into the appropriate query.
Building AI-powered queries
The agent uses user credentials when making queries, so it only works with data the user can view. Role-based access controls are the default, keeping your data as secure as possible. Agents’ operations need to avoid leaking confidential information, especially if they’re to be embedded within more complex Azure AI Foundry agentic workflows and processes.
Fabric data agents are based on Azure OpenAI Assistant APIs. This ensures that requests are gated through Azure’s AI security tools, including enforcing responsible AI policies and using its regularly updated prompt filters to reduce the risks associated with prompt injection or other AI attacks. As Fabric’s agents are read-only, we need to be sure they can’t be used to leak confidential data.
The queries generated by the agent are built using one of three different tools, which translate natural language to Fabric’s query languages: SQL for relational stores, DAX for Power BI, and KQL for non-relational queries using Kusto. This will allow you to validate any queries if necessary, as they’re designed to be correctly formed. However, in practice, Fabric data agents are intended for business users to build complex queries without needing to write any code.
Tuning an agent with instructions and examples
Microsoft itself suggests that building a Fabric data agent should require much the same level of knowledge as creating a Power BI report. Building an agent takes more than choosing data sources and tables; there’s a key element of the process that takes advantage of an LLM’s use of context.
By adding instructions and sample queries to an agent definition, you can start to improve the context it uses to respond to user queries. Instructions can refine which data sources are used for what type of question, as well as provide added specialist knowledge that might not be included in your data. For example, instructions can define specialized terms used in your organization. It’s not quite fine-tuning, as the instructions don’t affect the model, but it does provide context to improve output and reduce the risk of hallucination.
Having tuning tools built into the agent creation process is important. Microsoft is aiming to make Fabric a single source of truth for organizational data, so keeping the risk of errors to a minimum must be a key requirement for any AI built on that data.
Unlike with other agent frameworks, you have to put in the necessary work to first ensure that you choose the right sources for your agent. Then you have to make sure that it’s given enough context to route queries to the appropriate source (for example, if you’re using Fabric to store observability information for applications, then your agent should use KQL). Finally, you have to assemble a set of question-and-answer pairs that train the agent in the types of queries it will work with and how it should respond.
If an agent gives incorrect answers to queries, the most effective way to improve its grounding is to use more examples to improve its context. The more curated examples you use when building a data agent, the better.
You don’t need much coding skill to build a Fabric data agent. The process is very much designed for data specialists—an approach in line with Microsoft’s policy of making AI tools available to subject matter experts. If you prefer to use code to build a data agent, Microsoft provides a Python SDK to create, manage, and use Fabric data agents.
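For illustration, a code-first flow might look something like the sketch below. The package path, function names, and parameters are hypothetical stand-ins, not the documented SDK surface; consult Microsoft’s Fabric data agent SDK reference for the real API.

# Hypothetical sketch only: module, function, and parameter names are placeholders.
from fabric.dataagent.client import create_data_agent  # hypothetical import path

# Create an agent and attach a OneLake source (names are placeholders)
agent = create_data_agent("sales-insights-agent")
agent.add_datasource("SalesLakehouse", type="lakehouse")

# Ground the agent with instructions and example question/query pairs
agent.update_configuration(
    instructions="Prefer SalesLakehouse for revenue questions; all amounts are in USD.",
)
agent.add_fewshots({
    "Which region had the highest revenue last quarter?":
        "SELECT region FROM sales GROUP BY region ORDER BY SUM(revenue) DESC LIMIT 1",
})

# Publish a shareable version and query it
agent.publish()
print(agent.ask("List the top five customers by revenue this year"))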
Building your first Azure data agent
Getting started is simple enough. Fabric data agents are now a standard item in the Fabric workspace, so all you need to do is create a new agent. Start by giving it a name. This can be anything, though it’s best to use a name related to the agent’s purpose—especially if you intend to publish it as an endpoint for Azure AI Foundry applications.
Once you give your agent a name, you can add up to five data sources from your Fabric environment. The tool provides access to the Fabric OneLake data catalog, and once you’ve selected a source you can expose tables to the agent by simply using checkboxes to select the data you want. If you need to add other sources later or change your table choice, you can do so from Fabric’s Explorer interface. One useful tip is to ensure that tables have descriptive names. The more descriptive they are, the more accurate the queries generated by the agent.
You can now test the agent by asking questions. Microsoft notes that you can only ask questions about the data; there’s no reasoning capability in the model, so it can’t infer results where there is no data or where it would require access to information that isn’t stored in Fabric. That’s no surprise, as what we’re building here is a traditional RAG application with a deep connection to your data.
Tuning and sharing a data agent
The agent you have at this point has no instructions or tuning; you’re simply testing that it can construct and parse queries against your sources. Once you’ve got a basic agent in place, you can apply instructions and tuning. A separate pane in the design surface lets you add up to 15,000 characters of instructions. These are in English and should describe how the agent should work with your data sources. You can use the agent’s prompting tools to test instructions alongside your proven queries.
Now you can use example queries to tune the model, using the few-shot learning technique. By providing pairs of queries and their expected responses, you give the underlying model worked examples that guide it to produce the answers you expect. Few-shot learning is a useful tool for data-based agents because you can get reliable results with very few query/answer pairs. You can provide examples for all supported data sources, apart from Power BI semantic models.
Once tuned and tested, a Fabric data agent can be published and shared with colleagues or applications. Publishing creates two versions of your agent: one you can continue to change, and one that’s frozen and ready to share. Fabric data agents can be used with Azure AI Foundry as components in the Azure AI Agent Service. Here they are used as knowledge sources, with one Fabric data agent per Azure AI agent. Endpoints can be accessed via a REST API, using your workspace details as part of the calling URL.
Microsoft’s agent tool takes the interesting approach of putting development in the hands of subject matter experts. Here Fabric data agents are a low-code tool for building grounded, data-centric, analytical AI services that use Fabric’s OneLake as a RAG data source. Business analysts and data scientists can build, test, and tune agents before opening them up to the rest of the business via Azure AI Foundry, providing deep access to key parts of business data and allowing it to become part of your next generation of AI-powered business workflows.
DSPy: An open-source framework for LLM-powered applications 10 Apr 2025, 5:00 am
The past year has seen explosive growth in generative AI and the tools for integrating generative AI models into applications. Developers are eager to harness large language models (LLMs) to build smarter applications, but doing so effectively remains challenging. New open-source projects are emerging to simplify this task. DSPy is one such project—a fresh framework that exemplifies current trends in making LLM app development more modular, reliable, and data-driven. This article provides an overview of DSPy, covering what it is, the problem it tackles, how it works, key use cases, and where it’s headed.
Project overview – DSPy
DSPy (short for Declarative Self-improving Python) is an open-source Python framework created by researchers at Stanford University. Described as a toolkit for “programming, rather than prompting, language models,” DSPy allows developers to build AI systems by writing compositional Python code instead of hard-coding fragile prompts. The project was open sourced in late 2023 alongside a research paper on self-improving LLM pipelines, and has quickly gained traction in the AI community.
As of this writing, the DSPy GitHub repository, which is hosted under the StanfordNLP organization, has accumulated nearly 23,000 stars and nearly 300 contributors—a strong indicator of developer interest. The project is under active development with frequent releases (version 2.6.14 was released in March 2025) and an expanding ecosystem. Notably, at least 500 projects on GitHub already use DSPy as a dependency, signaling early adoption in real-world LLM applications. In short, DSPy has rapidly moved from research prototype to one of the most-watched open-source frameworks for LLM-powered software.
What problem does DSPy solve?
Building applications with LLMs today involves a lot of prompt engineering and ad hoc orchestration. Developers using frameworks like LangChain or LlamaIndex must manually craft prompt templates and chain model calls together, introducing several pain points:
- Brittle prompts and workflows. Writing prompts can be time-consuming and error-prone, and the prompts often break when you change models or inputs. Small differences in wording might yield inconsistent outputs, making maintenance a nightmare.
- Lack of reusability. Prompt logic is typically embedded in code or configuration that’s hard to generalize. There’s no standardized way to reuse reasoning steps, retrieval, or other components across projects.
- Scaling and optimization challenges. Improving the performance of an LLM app may require endless trial-and-error in writing prompts, providing examples, or configuring hyperparameters. Existing tools provide little automation for this process, so developers must rely on intuition and constant tweaking.
DSPy addresses these issues by shifting the paradigm from prompt hacking to high-level programming. Instead of writing one-off prompts, developers define the behavior of the AI in code (specifying model inputs, outputs, and constraints) and let DSPy handle the rest. Under the hood, DSPy will automatically optimize prompts and parameters for you, using algorithms to refine them based on feedback and desired metrics. Whenever you update your code, data, or evaluation criteria, you can recompile the DSPy program and it will re-tune the prompts to fit the changes. The framework essentially replaces manual prompt tuning with a smarter, iterative compilation process.
By replacing fragile prompts with declarative modules, DSPy makes LLM pipelines more robust to changes. It mitigates the “pipeline of sticks” problem where an update to the model or task requires rebuilding your prompt chain from scratch. In comparison to LangChain or LlamaIndex, which excel at connecting LLMs with tools and data but leave prompt crafting to the developer, DSPy provides a higher-level abstraction.
The value becomes apparent when integrating multiple steps or models: DSPy can coordinate the parts and optimize their interactions without extensive human fine-tuning. In summary, DSPy’s promise is to replace the painstaking, unscalable aspects of current LLM app development with a more systematic, maintainable approach to building AI applications.
A closer look at DSPy
How does DSPy achieve this shift from prompting to programming? The framework introduces an architecture inspired by software engineering principles and machine learning pipelines:
- Modules and signatures. At the core of DSPy are modules—reusable components that encapsulate a particular strategy for invoking an LLM. You define a module by specifying its input and output interface (called a signature). For example, you might declare a module for question answering as "question -> answer: text", or a math solver as "question -> answer: float". DSPy expands these signatures into proper prompts and parses the model’s responses according to the expected output type. This design decouples your application logic from the raw prompt strings. You can compose multiple modules to form a pipeline, and each module can be updated or optimized independently.
- Optimizers (self-improving pipelines). What differentiates DSPy is its optimizer subsystem. Optimizers are algorithms that DSPy uses to iteratively improve the prompts or even fine-tune smaller models behind the scenes. Using provided example data or even heuristic feedback, DSPy will generate variations of prompts, test them, and retain the ones that perform best on your defined success metrics. One of DSPy’s standout features is the ability to continuously refine prompts using feedback loops, leading to better model responses with each iteration. In practice, this means higher accuracy and consistency without the developer having to manually adjust prompt wording over and over. (A minimal example of compiling a module with an optimizer appears after this list.)
- Built-in strategies. Out of the box, DSPy provides a library of prebuilt modules for common patterns in advanced LLM usage. These include Chain of Thought (to guide the model in step-by-step reasoning), ReAct (for reasoning and acting with tools in an agent loop), and other primitives for few-shot examples, tool usage, and history/context management. The framework also integrates with external tools via its Tool and Example abstractions. For instance, you can incorporate a web search or database lookup as part of a pipeline, or enforce output validation using schemas.
- Model and library support. DSPy is designed to be LLM-agnostic and work with a variety of model providers. It supports mainstream cloud APIs such as OpenAI GPT, Anthropic Claude, Databricks Dolly, etc., as well as local models running on your own hardware. Under the hood, DSPy uses a unified language model interface, and can leverage libraries like Hugging Face or the OpenAI SDK, so developers can plug in whatever model is appropriate.
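To make the optimizer idea concrete, here is a minimal sketch of compiling a simple question-answering module against a handful of labeled examples. The optimizer and metric shown reflect recent DSPy releases, but treat the exact names and signatures as assumptions to verify against the version you have installed.

import dspy

dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# A simple module to optimize
qa = dspy.ChainOfThought("question -> answer")

# A few labeled examples; with_inputs() marks which field the model receives
trainset = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="What is 2 plus 2?", answer="4").with_inputs("question"),
]

# Metric: does the expected answer appear in the model's answer?
def answer_match(example, prediction, trace=None):
    return example.answer.lower() in prediction.answer.lower()

# Bootstrap few-shot demonstrations and keep the prompts that score best
optimizer = dspy.BootstrapFewShot(metric=answer_match, max_bootstrapped_demos=4)
compiled_qa = optimizer.compile(qa, trainset=trainset)

print(compiled_qa(question="What is the capital of Italy?"))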
Key use cases for DSPy
How can developers put DSPy into practice? Thanks to its flexible architecture, DSPy can be applied to a wide range of LLM-driven scenarios. Here are a few key use cases where the framework particularly shines:
Complex question answering with retrieval-augmented generation (RAG)
DSPy enables the creation of robust QA systems that retrieve relevant information before generating answers.
import dspy

# Configure the language model
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Define a retrieval function (e.g., search Wikipedia via a hosted ColBERTv2 index)
def search_wikipedia(query: str) -> list[str]:
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]

# Define the RAG module
class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question, context -> answer")

    def forward(self, question):
        # Retrieve supporting passages, then generate a grounded answer
        context = search_wikipedia(question)
        return self.generate(question=question, context=context)

# Instantiate the RAG module
rag = RAG()

# Example usage
question = "What is the capital of France?"
answer = rag(question)
print(answer)
Text summarization
DSPy facilitates dynamic text summarization by defining modules that adapt to varying input lengths and styles.
import dspy

# Configure the language model
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Define the summarization module
class Summarizer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.summarize = dspy.ChainOfThought("document -> summary")

    def forward(self, document):
        return self.summarize(document=document)

# Instantiate the summarizer
summarizer = Summarizer()

# Example usage
document = "DSPy is a framework for programming language models..."
summary = summarizer(document)
print(summary)
LLM agents with tool integration
DSPy supports building AI agents that can reason and interact with external tools to perform tasks.
import dspy

# Configure the language model
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Define a custom tool (e.g., a calculator)
def calculator(expression: str) -> float:
    # Note: eval() is fine for a demo but should not be used on untrusted input
    return eval(expression)

# Define the agent module
class Agent(dspy.Module):
    def __init__(self):
        super().__init__()
        self.react = dspy.ReAct("question -> answer", tools=[calculator])

    def forward(self, question):
        return self.react(question=question)

# Instantiate the agent
agent = Agent()

# Example usage
question = "What is 2 plus 2?"
answer = agent(question)
print(answer)
Bottom line – DSPy
DSPy is an ambitious framework that pushes the envelope of what development frameworks for LLMs can do. It addresses real pain points by making LLM applications more declarative, modular, and self-improving. By decoupling application logic from prompt strings, DSPy allows developers to combine modules into pipelines, update or optimize modules independently, and even continuously optimize modules using feedback loops. While still young, DSPy has a strong potential to become a go-to solution for building complex LLM-powered applications.
GitHub Advanced Security now offers security campaigns 9 Apr 2025, 5:08 pm
GitHub has made security campaigns available for GitHub Advanced Security and GitHub Code Security users. Security campaigns help control security debt and manage risk by enabling collaboration between developers and security teams, GitHub said.
Announced April 8 and available in the Copilot Autofix code scanning tool, security campaigns help security and developer teams collaborate on security across an entire codebase, according to GitHub. The feature makes vulnerability remediation quicker and more scalable, the company said.
Security campaigns with Copilot Autofix have been available in public preview since October 2024. With general availability, GitHub also provided updated capabilities, including draft security campaigns, which let security managers iterate on the scope of campaigns before making them available to developers. The feature also now allows the creation of GitHub issues that are updated automatically as the campaign progresses. Additionally, security managers can view aggregated statistics showing progress across both currently active and past campaigns.
Copilot Autofix with security campaigns helps security teams triage and prioritize vulnerabilities, with the ability to generate code suggestions for as many as 1,000 code scanning alerts simultaneously, GitHub said. The Autofix tool provides instant remediation suggestions and reduces mean time to remediation by as much as 60%, GitHub said.
Google adds open source framework for building agents to Vertex AI 9 Apr 2025, 8:00 am
Google is adding a new open source framework for building agents to its AI and machine learning platform Vertex AI, along with other updates to help deploy and maintain these agents.
It unveiled the open source Agent Development Kit (ADK) at its annual Google Cloud Next conference, saying it will make it possible to build an AI agent in under 100 lines of Python code. It expects to add support for more languages later this year.
[ Related: Google Cloud Next ’25: News and insights ]
ADK is built on the same framework that powers Google Agentspace and Google Customer Engagement Suite (CES) agents.
Saurabh Tiwary, Google’s VP of Cloud AI, said developers will be able to use it throughout the agent development lifecycle, from shaping how agents think, reason, and operate within guardrails, to selecting which LLMs they run in the backend. Interaction with agents won’t be limited to text: ADK also provides bidirectional audio and video capabilities, he said.
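Based on the quickstart pattern Google has published for the ADK, a minimal agent can be sketched in a few lines of Python. Treat the class and parameter names below as assumptions that may vary across ADK versions; the tool function is a made-up placeholder.

# Minimal ADK-style agent sketch; verify names against your installed google-adk version.
from google.adk.agents import Agent

def get_order_status(order_id: str) -> dict:
    """Placeholder tool: look up an order in a backend system (stubbed)."""
    return {"order_id": order_id, "status": "shipped"}

root_agent = Agent(
    name="support_agent",
    model="gemini-2.0-flash",          # the LLM running in the backend
    description="Answers customer questions about orders.",
    instruction="Be concise. Use the get_order_status tool when asked about an order.",
    tools=[get_order_status],          # plain Python functions are exposed as tools
)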
Agent Development Kit to help build multi-agent systems faster
Gartner vice president analyst Jim Hare said he expects the ADK will help enterprises accelerate the build-out of in-demand multi-agent systems.
Making the framework open source will free up Google’s engineering resources, Hare said, as it will enable the company to leverage a community of developers to enhance and maintain the code base. “At the same time, Google gains by showing the developer audience it wants to help enterprises benefit from using agentic AI using open software and tooling,” he said.
Duncan Van Kouteren, an analyst at Nucleus Research, said that the open-source framework will also create a natural path for enterprises to move to Google’s paid cloud services. “Google anticipates that once developers build with their framework, they’re more likely to use Google Cloud for deployment,” he said.
Some of Google’s cloud rivals have similar ambitions, including IBM with its open-source BEE Agent Framework and Microsoft with AutoGen.
The ADK also supports the Model Context Protocol (MCP). Anthropic created the protocol to enable agents to interact with data sources and LLMs, and it’s rapidly gaining support from other software vendors.
Other Vertex AI updates
Google has also added a collection of ready-to-use, pre-built agent patterns and components named Agent Garden to accelerate the model development process.
It’s a strategy already followed by Salesforce and Microsoft, which introduced ready-to-use agent templates and components in their Agentforce and Copilot offerings, respectively.
Other updates to Vertex AI include a fully managed runtime engine to help developers deploy agents in production and maintain control over their behavior. Agent Engine is intended to handle infrastructure management and help with tasks such as rebuilding an agent or an agent system when moving it to production. Developers will be able to use Agent Engine to deploy agents or systems built on any framework, including ADK, LangGraph, and Crew.ai among others, on Google Cloud, Tiwary said.
Agent Engine can also help agents retain the context of sessions as it supports short-term memory and long-term memory, he added. This means that applications integrated with such agents will be able to recall past conversations and user preferences. Further, he said, it can be used with Vertex AI’s evaluation tools to improve agents’ functionality and performance.
Another option available to developers is to connect to Agentspace via Agent Engine. This gives enterprises a way to make agents available to employees while maintaining control over them, Tiwary said.
Van Kouteren sees Agent Engine as a way to eliminate the overhead of managing the infrastructure to run agents in production.
“It handles all the behind-the-scenes work such as security, scaling, and monitoring, so enterprise teams can focus on what their AI agents do rather than worrying about how to keep them running properly,” he said.
Google said it intends to expand Agent Engine with computer-use and code-execution capabilities, and a dedicated simulation environment so developers can rigorously test agents with diverse user personas to ensure reliability in production.
Google Cloud introduces cloud app design center 9 Apr 2025, 8:00 am
Google Cloud has introduced Application Design Center, a service that helps platform administrators and developers design, deploy, and manage applications on the Google Cloud Platform. Application Design Center is in public preview.
Application Design Center provides a visual, canvas-style approach to designing and modifying application templates, Google Cloud said. Platform administrators and platform engineers can build spaces tailored to the needs of each development team. Developers can configure application templates for deployment, view infrastructure as code in-line, and collaborate with others on designs. Deployments are automatically registered in App Hub for operations and troubleshooting.
[ Related: Google Cloud Next ’25: News and insights ]
Developers can design and deploy applications using the design canvas or natural language chat, powered by Gemini Cloud Assist. As developers drag application components to the canvas to create an application diagram, the tool offers suggestions for additional components and possible connections. At the same time, developers can chat with Gemini to get design suggestions. Gemini will even propose an initial design for a business problem.
App Design Center can be used to design and deploy serving infrastructure, containerized applications, and generative AI applications. After deployment, developers can connect to a code repository to pull in client code or containers, Google Cloud said.
Google adds natural language query capabilities to AlloyDB 9 Apr 2025, 8:00 am
Google is enhancing AlloyDB, its fully managed database-as-a-service (DBaaS), to help developers build applications underpinned by generative AI.
Announced at Google’s annual Cloud Next conference, the updates could give the PostgreSQL-compatible AlloyDB an edge over PostgreSQL itself or other compatible offerings such as Amazon Aurora.
[ Related: Google Cloud Next ’25: News and insights ]
Among the additions, a new AlloyDB AI query engine enables developers to use natural language expressions inside SQL queries.
When Google launched AlloyDB in May 2022, the open-source PostgreSQL was rising in popularity due to its transactional and analytical capabilities, extended support for spatial data, broader SQL support, enhanced security and governance, and expanded support for programming languages. Google saw an opportunity to offer a cloud-based alternative as a service — but it’s an opportunity that also attracted the attention of rivals Amazon Aurora and Microsoft’s Azure Database for PostgreSQL. Now the challenge for Google is to make its offering stand out.
Support for natural language in SQL queries
The arrival of the AlloyDB AI query engine means that developers can now embed free text questions inside SQL queries, even if those depend on less-structured data such as images and descriptions, said Bradley Shimmin, lead of data and analytics practice at The Futurum Group.
This, said ISG Software Research’s executive director David Menninger, will ease the burden on developers, who otherwise need to be very precise when writing SQL statements.
By way of example, Menninger said, instead of writing “SELECT customer_name FROM customer_table WHERE city in (‘Boston’, ‘Cambridge’)”, a developer using AI Query could give a narrative description of what they were looking for, such as “list all the customers near the Charles River.”
With its new query engine, said Futurum Group’s Shimmin, Google is following the trend of database providers converging database operations with semantic and relational query methodologies to expand capabilities of traditional SQL use cases.
Alongside the query engine, Google is adding the next generation of AlloyDB’s natural language capability, which is expected to allow developers to query structured data inside AlloyDB, helping them build applications that better understand an end user’s natural language input.
ISG’s Menninger sees the natural language capability as a productivity tool for developers.
“It’s often easier to write a natural language query than to write out a SQL statement. It may not be the final SQL statement, but what’s generated can be edited so it moves the development process along more quickly,” Menninger said.
For the enterprises, the analyst sees the natural language ability making data more accessible for end users.
“You don’t necessarily need an analytics tool. You can simply ask the database some questions and get responses. And developers can embed these capabilities in the applications they create, benefiting end users,” Menninger explained.
Google Agentspace can now search structured data in AlloyDB
As part of the updates to AlloyDB, Google said that enterprises that subscribe to its Agentspace service will now be able to search structured data inside AlloyDB.
The Agentspace service, launched in December, is intended to help enterprises build agents, which in turn can be used to search data stored across various sources.
These agents can also be programmed to take actions based on the data held at different sources within an enterprise, the company said.
dbInsights’ chief analyst Tony Baer said the extension of Agentspace to AlloyDB is a logical move as Google expects that enterprises will use agents to work or interact with their data in the future.
More support for migrations and other updates
Other database updates announced at Cloud Next this year include updated support for migrations, added support for running Oracle’s Base Database service, and Model Context Protocol (MCP) support for the Gen AI Toolbox for Databases.
The update to migrations comes in the form of the Google Database Migration Service (DMS) now supporting SQL Server to PostgreSQL migrations for Cloud SQL and AlloyDB.
“These new capabilities support the migration of both self-managed and cloud-managed SQL Server offerings, and a range of SQL Server editions and versions,” Google said in a statement.
In April last year, Google added Gemini support to the DMS to make migrations faster by supporting conversion of database-resident code.
The Toolbox is an open-source server designed to streamline the creation, deployment, and management of AI agents capable of querying databases.
With industry support for MCP rising, it was inevitable that Google would add support for the protocol to its GenAI Toolbox for Databases, analysts said.
Anthropic introduced MCP last year to make it easier to bring data to LLMs. Since then, it has become a standard means of linking up models, tools, and data resources in support of agentic processes, Shimmin said.
For Menninger, MCP is the emerging standard that enterprises are starting to use to provide context to agents in order to enhance their performance.
Google’s Agent2Agent open protocol aims to connect disparate agents 9 Apr 2025, 8:00 am
Google has taken the covers off a new open protocol — Agent2Agent (A2A) — that aims to connect agents across disparate ecosystems.
At its annual Cloud Next conference, Google said that the A2A protocol will enable enterprises to adopt agents more readily as it bypasses the challenge of agents that are built on different vendor ecosystems not being able to communicate with each other.
“Using A2A, agents can publish their capabilities and negotiate how they will interact with users (via text, forms, or bidirectional audio/video) – all while working securely together,” said Saurabh Tiwary, vice president of Cloud AI at Google.
[ Related: Google Cloud Next ’25: News and insights ]
The interoperability being offered by the A2A protocol will allow businesses to automate complex workflows that span multiple systems, potentially increasing productivity while reducing integration costs, according to Paul Chada, co-founder of DoozerAI — an agentic digital worker platform.
Google said the protocol is built atop existing, popular standards including HTTP, SSE, and JSON-RPC. This should make it easier to integrate with existing IT stacks businesses already use.
While HTTP serves as the foundation of web communication, SSE (server-sent events) is used to push updates from a server to a client, and JSON-RPC lets applications talk to each other remotely using JSON messages.
“Using HTTP and JSON-RPC is practical and should make implementation easier,” said Anil Clifford, founder of IT services and consulting firm Eden Digital.
[ Related: Agentic AI – Ongoing news and insights ]
However, Clifford was uncertain of the protocol’s success in handling edge cases in real-world scenarios, which he believes will determine the protocol’s proficiency.
In its efforts to proliferate the protocol, Google said that it is working with more than 50 partners, including SAP, LangChain, MongoDB, Workday, Box, Deloitte, Elastic, Salesforce, ServiceNow, UiPath, UKG, and Weights & Biases.
How A2A works
A2A facilitates communication between a “client” agent and a “remote” agent.
While a client agent is responsible for formulating and communicating tasks, the remote agent is responsible for acting on those tasks to provide the correct information or take the correct action.
The interaction between agents across use cases is dependent on several key capabilities baked within the protocol, such as capability discovery, task management, collaboration, and user experience negotiation.
The capability discovery feature allows the client agent to read the capabilities that agents publish via their Agent Cards in JSON format.
This allows the client agent to identify the best agent that can perform a task and leverage A2A to communicate with the remote agent, Google said.
The task management capability allows the client and remote agent to talk to each other to complete a task based on end-user input.
This task object is defined by the protocol and has a lifecycle, Google said. It can be completed immediately or, for long-running tasks, each of the agents can communicate to stay in sync with each other on the latest status of completing a task. The output of the tasks is called an artifact.
The collaboration capability allows the client and remote agent to send each other messages to communicate context, replies, artifacts, or user interactions.
The user experience negotiation capability, according to the executives, allows the client and remote agents to negotiate the correct format needed to respond to an end user request as well as understand the user’s UI capabilities, such as iframes, video, web forms, etc.
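To show how these capabilities map onto the underlying standards, the sketch below has a client agent fetch a remote agent’s Agent Card and then send it a task over JSON-RPC. The well-known path and method name follow the A2A draft as published at launch, but treat them as assumptions that may change as the protocol evolves.

import uuid
import requests

REMOTE_AGENT = "https://agents.example.com"  # placeholder remote agent URL

# Capability discovery: read the remote agent's Agent Card (JSON)
card = requests.get(f"{REMOTE_AGENT}/.well-known/agent.json").json()
print(card.get("name"), card.get("skills"))

# Task management: send a task as a JSON-RPC 2.0 request
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",            # method name per the launch draft
    "params": {
        "id": str(uuid.uuid4()),       # task ID chosen by the client agent
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "Summarize open support tickets"}],
        },
    },
}
response = requests.post(card.get("url", REMOTE_AGENT), json=payload).json()
print(response.get("result"))          # the completed task, including any artifacts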
Complementary to Model Context Protocol
The A2A protocol is different from Anthropic’s Model Context Protocol (MCP) but both could complement each other, analysts said.
While MCP focuses on the interaction between an application and a generative AI model, the A2A protocol focuses on the interaction between different AI agents.
“You could view MCP as providing vertical integration (application-to-model) while A2A provides horizontal integration (agent-to-agent),” Chada said, adding that an agent built using MCP could potentially use A2A to communicate with other agents.
Chada also pointed out that A2A is distinct from Nvidia’s AgentIQ, which is more of a development toolkit for building and optimizing agent systems rather than an agent communication protocol.
A2A as a new industry standard
Microsoft, Amazon, and other major cloud providers could end up adopting A2A given that major vendors, such as Salesforce, ServiceNow, and Workday are already part of A2A’s partner network, Chada said.
But Clifford feels that Microsoft could have a good reason to develop a competing or complementary standard, given its deep integration with OpenAI and its enterprise software presence. “The question is whether enterprises will benefit from this competition or suffer from fragmentation. History suggests we’ll see competing standards before consolidation occurs — likely causing headaches for early adopters,” Clifford said.
Five things to consider before you deploy an LLM 9 Apr 2025, 5:00 am
If the screwdriver were invented by the tech industry today, then it would be widely deployed for a variety of tasks, including hammering nails. Since the debut of ChatGPT, there has been growing fervor and backlash against large language models (LLMs). Indeed, many adaptations of the technology seem misappropriated, and its capabilities are overhyped, given its frequent lack of veracity. This is not to say there are not many great uses for an LLM, but you should answer some key questions before going full bore.
Will an LLM be better or at least equal to human responses?
Does anyone like those customer service chatbots that can’t answer any question that isn’t already on the website’s front page? On the other hand, talking to a person in customer service who just reads a script and isn’t empowered to help is equally frustrating. Any deployment of an LLM should test whether it is equal or superior to the chatbot or human responses it is replacing.
What is the liability exposure?
In a litigious society, any new process or technology has to be evaluated against its potential for legal exposure. There are obvious areas for caution, such as medicine, law, or finance, but what about an LLM-generated answer that points people to a policy or offers advice that is misleading or not permitted? Bad company policies and poor management have already led to class action lawsuits over human-generated responses. An improperly trained or constrained LLM, however, could generate such responses for a far larger number of users and create unintended liability.
Is it actually cheaper?
Sure, it is easy to measure your subscription and use of a general LLM like ChatGPT, but more specific custom systems can have higher costs beyond just the compute power. What about the staff and other infrastructure to maintain and debug the system? You can hire quite a few customer service personnel for the price of one AI expert. Additionally, ChatGPT and similar services seem to be subsidized by investment at the moment. Presumably at some point they will want to turn a profit and therefore your cost could go up. Is it actually cheaper and will it stay so for the life of your system?
How will you maintain it?
Most LLM systems will be custom trained on specific data sets. A disadvantage of the neural networks on which LLMs rely is that they are notoriously difficult to debug. As the technology progresses, a model may eventually be able to update (or unlearn) something it has learned, but for now this can be quite difficult. What is your process for regularly updating the LLM, especially when it gives a bad response?
What is your testing process?
A key benefit of an LLM is that you don’t have to anticipate every possible permutation of a question in order for it to provide a credible answer. However, the word “credible” doesn’t mean correct. At least the most common questions and various permutations should be tested. If your LLM is replacing a human or existing machine process, the questions people are asking today are a good data set to start with.
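As a starting point, a regression-style harness can replay the questions customers already ask and flag answers that drop required facts. The sketch below assumes your deployed assistant is reachable through an OpenAI-compatible chat API and that you maintain a hand-built list of question and expected-keyword pairs; both are placeholders for whatever system and test set you actually use.

```python
# Sketch of a minimal LLM regression test: replay known questions and check
# that required facts appear in the answers. Model name, client, and test
# cases are placeholders for your own deployment.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TEST_CASES = [
    # (question, keywords the answer must contain)
    ("What is your refund window?", ["30 days"]),
    ("Do you ship internationally?", ["yes"]),
]

failures = []
for question, required in TEST_CASES:
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content.lower()
    missing = [kw for kw in required if kw.lower() not in answer]
    if missing:
        failures.append((question, missing))

print(f"{len(TEST_CASES) - len(failures)}/{len(TEST_CASES)} checks passed")
for question, missing in failures:
    print(f"FAIL: {question!r} missing {missing}")
```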
There is an old proverb of dubious provenance that translates roughly to “slow down, I’m in a hurry.” Not everything will be a great fit for LLMs, and there is ample evidence (no, Grammarly, I don’t want to sound more positive!) that enthusiasm has outstripped capabilities. However, if you measure quality and cost and put decent maintenance and testing procedures in place, LLMs can be a valuable tool in many different use cases.
Four paradoxes of software development 9 Apr 2025, 5:00 am
Civil engineers can rightfully claim that no two bridges are exactly alike. However, bridges share many known characteristics, and the materials they are built with have known characteristics. Building bridges involves many known knowns and not as many unknown unknowns as one might think.
I am not a civil engineer, and I have nothing but respect for the fine folks who design and build our bridges, but I point this out to contrast bridge building to writing software. Writing good, functioning software is hard. No project undertaken by software development teams has ever been done before. There are similarities among projects, sure, but every software project has its own nuances, requirements, and plentiful supply of unknown unknowns.
Or, one might say, software development is full of paradoxes that are challenging to deal with. Here are four.
No one knows how long the job will take, but the customer demands a completion date.
This, frankly, is probably the biggest challenge that software development organizations face. We simply can’t be certain how long any project will take. Sure, we can estimate, but we are almost always wildly off. Sometimes we drastically overestimate the time required, but usually we drastically underestimate it.
For our customers, this is both a mystery and a huge pain. Not understanding the first part of the paradox, they don’t understand why they can’t know for sure when their new software will arrive. Then of course they are frustrated when the software isn’t delivered as promised.
We try story points and planning poker and all kinds of other agile techniques to figure out when things will get done, but we never seem to be able to get past Hofstadter’s Law: It always takes longer than you expect, even when you take into account Hofstadter’s Law.
Adding developers to a late project makes it later.
Known as Brooks’s Law, this rule may be the strangest of the paradoxes to the casual observer.
Normally, if you realize that you aren’t going to hit your monthly quota for filling toothpaste tubes, you can put more toothpaste tube fillers on the job and make the date. If you want to double the number of houses that you build in a given year, you can usually double the inputs—labor and materials—and get twice as many houses, give or take a few.
However, as Fred Brooks showed in his book The Mythical Man Month, “adding manpower to a late software project makes it later.” That is a paradox, but it’s as close to a law in software development as we will get. Brooks showed that because new team members require time to learn the context of a complex system and increase the communication overhead, they can’t contribute to the project immediately, thus lengthening the time to project completion while also driving up costs.
The better you get at coding, the less coding you do.
It takes many years to gain experience as a software developer. Learning the right way to code, the right way to design, and all of the rules and subtleties of writing clean, maintainable software doesn’t happen overnight.
But all too often, as you gain that experience, you are put into leadership positions that actually reduce the amount of code you write. Instead of coding, you end up in design meetings, reviewing code written by others, and managing people. Sometimes you get promoted out of writing code altogether.
That is not to say that a senior developer’s contribution decreases. By planning projects, mentoring younger developers, enforcing coding standards, and realizing how important it is for everyone to write good code, a senior developer contributes mightily to the success of the team and the company.
But you still end up writing less code.
Software development platforms and tools keep getting better, but software takes just as long to develop and run.
If you compare how we build web applications today, with amazingly powerful tools like React, Astro, and Next.js, versus how we built websites 30 years ago, when we processed data and HTML using the Common Gateway Interface (CGI), you realize that we’ve advanced light-years from those early days.
And yet, while our tools get more and more sophisticated, and our processors get faster and faster, software development never seems to move any faster. Work always seems to expand to exceed not only time budgets, but every CPU cycle as well.
Our sites look nicer, but are we really any more productive? Do our sites run faster and process data better? Sure, these new frameworks and libraries abstract away many complexities (does anyone want to write jQuery code anymore?), but they also introduce new problems like long build pipelines, configuration nightmares, and dependency bloat.
The existence of these paradoxes doesn’t mean things are hopeless. I don’t point them out to frustrate or depress you. Every day, paradoxically, teams still build and ship working software.
I point them out to make sure we realize that they exist, that we need to accept them and deal with them, and hopefully avoid the pitfalls and potholes that they present. We can’t eliminate the strangeness and chaos, but we can anticipate them and deal with them. Our job is to ship despite them.
One last paradox might be that software is never really done. There is always one more feature that you can add. At least with a bridge, it is quite clear when the job is finished and that the product works as designed.
What is Kubernetes? Scalable cloud-native applications 9 Apr 2025, 5:00 am
Kubernetes is a popular open source platform for container orchestration—that is, for managing applications built from multiple, largely self-contained runtimes called containers.
Containers have become increasingly popular since Docker launched in 2013, but large applications spread out across many containers are difficult to coordinate.
Kubernetes makes applications dramatically easier to manage at scale. It has become a key player in the container revolution.
What is container orchestration?
Containers are a powerful successor to virtual machines, which are software-based emulations of computers. Both containers and VMs help keep applications discrete, but containers operate at the application level, not the machine level. Containers require far less overhead and grant much greater flexibility than VMs do. They’ve reshaped the way we think about developing, deploying, and maintaining software.
In a containerized architecture, the various services that constitute an application (web interface, database, etc.) are packaged into separate containers and deployed across a cluster of physical or virtual machines. But this gives rise to the need for container orchestration—a tool for automating the deployment, management, scaling, networking, and availability of container-based applications.
What is Kubernetes?
Kubernetes is an open source project that has become one of the most popular container orchestration tools. It allows you to deploy and manage multi-container applications at scale. While in practice Kubernetes is most often used with Docker, the best-known containerization platform, it can also work with any container system that conforms to the Open Container Initiative (OCI) standards for container image formats and runtimes.
Kubernetes has also grown to manage virtual machine workloads side-by-side with containers, by way of projects like KubeVirt. In that sense, it’s also evolving into a general framework for applications, regardless of how they’re hosted. That said, containers are still the first choice for running apps in Kubernetes due to their flexibility and convenience.
Because Kubernetes is open source, with relatively few restrictions, it can be used freely by anyone who wants to run containers, most anywhere they want to run them: on-premises, in the public cloud, or both.
Google and Kubernetes
Kubernetes began life as a project within Google. It is a successor to (though not a direct descendent of) Google Borg, an earlier container management tool that Google used internally. Google open sourced Kubernetes in 2014, in part because the distributed microservices architectures that Kubernetes facilitates make it easy to run applications in the cloud. Almost every cloud vendor, Google included, sees support for containers, microservices, and Kubernetes as a big draw for customers, and cloud vendors have gone to great lengths to add this support to their service portfolios.
Kubernetes and many of its related projects are currently maintained by the Cloud Native Computing Foundation, which is itself under the umbrella of the Linux Foundation.
Kubernetes vs. Docker
Kubernetes doesn’t replace Docker but augments it. However, Kubernetes does replace some of the container management technologies that have emerged around Docker.
One such technology is Docker swarm mode, a system for managing a cluster of Docker engines referred to as a “swarm”—essentially a small orchestration system. It’s still possible to use Docker swarm mode instead of Kubernetes, but Docker Inc. has made Kubernetes a key part of Docker support.
On an even smaller scale, Docker also has Docker Compose, a way to bring up a multi-container application on a single host. If you just want to run a multi-container application on one machine, without spreading it across a cluster, Docker Compose covers that scenario.
Kubernetes is significantly more complex than Docker swarm mode or Docker Compose, and it requires more work to deploy. But again, the work is intended to provide a big payoff in the long run—a more manageable, resilient application infrastructure in production. For development work, and smaller container clusters, Docker swarm mode is a simpler choice. And for single-machine deployments of multi-container applications, there’s Docker Compose.
Kubernetes vs. Mesos
Another project you might have heard about as a competitor to Kubernetes is Mesos, an Apache project that originally emerged from developers at Twitter. At one time, Mesos was seen as an answer to the Google Borg project.
Mesos does offer container orchestration services, but its ambitions go far beyond that: It aims to be a sort of cloud operating system that can coordinate both containerized and non-containerized components. Many different platforms can run within Mesos—including Kubernetes itself.
Mesos has also received far less development than Kubernetes recently. Its last significant release was in 2020; its last updates were in 2024. Kubernetes, by contrast, continues to be updated regularly. When combined with projects like KubeVirt, it eclipses much of the functionality Mesos provides.
Kubernetes architecture
The Kubernetes architecture is based on several key concepts and abstractions. Some of these are variations on familiar themes while others are unique to Kubernetes.
Kubernetes clusters
The highest-level Kubernetes abstraction, the cluster, refers to the group of machines running Kubernetes (itself a clustered application) and the containers managed by it. Machines in a cluster are referred to as worker nodes. A Kubernetes cluster must have a master, the system that commands and controls all the other Kubernetes machines in the cluster. This system runs the components collectively known as the control plane.
A highly available (HA) Kubernetes setup can replicate the control plane across multiple machines. The configuration data for the cluster can also be replicated across nodes. But at any given time, only one master can run the job scheduler and controller-manager.
Kubernetes nodes and pods
Each cluster contains Kubernetes nodes. Nodes might be physical machines or VMs. Again, the idea is abstraction: Whatever the application is running on, Kubernetes handles deployment on that substrate. Kubernetes even makes it possible to ensure certain containers run only on certain substrates—for example, only virtual machines, or only bare metal.
Nodes run pods, the most basic Kubernetes objects. Each pod represents a single instance of an application or running process in Kubernetes and consists of one or more containers. Kubernetes starts, stops, and replicates all containers in a pod as a group. This way, users don’t need to think at the level of individual containers; instead, they can think in terms of running application instances.
Pods are created and destroyed on nodes as needed to conform to the desired state, which is specified by the user in the pod definition. Etcd, a distributed key-value store, keeps details about how Kubernetes should be configured, from the state of pods on up. For example, you can state that you want at least two instances of a web front end to satisfy incoming requests, which synchronize with a single instance of a database.
Kubernetes provides an abstraction called a controller that describes how pods are to be spun up, rolled out, and spun down. One simple controller is the Deployment controller, which assumes every pod is stateless (that is, any data is stored outside it) and can be stopped or started as needed. It’s used to scale an application up or down, update an application to a new version, or roll back an application to a known-good version if there’s a problem.
For applications with a persistent state of some kind, you’d use a StatefulSet controller. There are other controllers that handle other scenarios.
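To give a sense of what this looks like in practice, here is a minimal sketch using the official Kubernetes Python client to declare a two-replica, stateless Deployment. The image, labels, and namespace are placeholders, and it assumes a cluster reachable via your local kubeconfig.

```python
# Sketch: declare a two-replica stateless Deployment with the official
# Kubernetes Python client. Names, labels, and image are placeholders.
from kubernetes import client, config

config.load_kube_config()  # assumes ~/.kube/config points at a cluster
apps = client.AppsV1Api()

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="web-frontend"),
    spec=client.V1DeploymentSpec(
        replicas=2,  # desired state: at least two front-end pods
        selector=client.V1LabelSelector(match_labels={"app": "web-frontend"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "web-frontend"}),
            spec=client.V1PodSpec(
                containers=[client.V1Container(name="web", image="nginx:1.27")]
            ),
        ),
    ),
)

apps.create_namespaced_deployment(namespace="default", body=deployment)
```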
Kubernetes services
Pods live and die as needed, so we need a different abstraction to deal with the lifecycle of an application as a whole. An app should be persistent even when the pods that compose it aren’t. To that end, Kubernetes provides an abstraction called a service.
A service in Kubernetes describes how a given group of pods (or other Kubernetes objects) can be accessed via the network. As the Kubernetes documentation puts it, the pods that constitute the back end of an application might change, but the front end shouldn’t have to know about that or track it. Services handle those details.
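Continuing the earlier sketch, the snippet below gives the hypothetical "web-frontend" pods a stable network identity with a ClusterIP Service; the names and ports are again placeholders.

```python
# Sketch: expose the "web-frontend" pods behind a stable ClusterIP Service.
# Names and ports are placeholders.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

service = client.V1Service(
    api_version="v1",
    kind="Service",
    metadata=client.V1ObjectMeta(name="web-frontend"),
    spec=client.V1ServiceSpec(
        selector={"app": "web-frontend"},  # matches the Deployment's pod labels
        ports=[client.V1ServicePort(port=80, target_port=80)],
        type="ClusterIP",  # reachable inside the cluster only
    ),
)

core.create_namespaced_service(namespace="default", body=service)
```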
A few more pieces internal to Kubernetes round out the picture. The scheduler parcels out workloads to nodes so they’re balanced across resources, and so that deployments meet the requirements of the application definitions. The controller manager ensures that the state of the system—applications, workloads, and so on—matches the desired state defined in Etcd’s configuration settings (no more than one database, at least two front-end nodes, etc.).
It is important to keep in mind that none of the low-level mechanisms used by containers, such as Docker itself, are replaced by Kubernetes. Rather, Kubernetes provides a larger set of abstractions for using these mechanisms for the sake of keeping applications running at scale.
Kubernetes policies
Policies in Kubernetes ensure that pods adhere to certain standards of behavior. Policies prevent pods from using excess resources: CPU, memory, process IDs, or disk space. Such “limit ranges” are expressed in relative terms for CPUs (e.g., 50% of a hardware thread) and absolute terms for memory (e.g., 200MB). These limits can be combined with resource quotas to ensure that different teams of Kubernetes users (as opposed to applications generally) have equal access to resources.
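For illustration, a hedged sketch of such a policy using the same Python client: a LimitRange that gives every container in a namespace default CPU and memory ceilings. The namespace and the numbers echo the examples above but are placeholders.

```python
# Sketch: a LimitRange giving containers default CPU/memory limits and requests.
# The namespace and numbers are illustrative.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

limit_range = client.V1LimitRange(
    api_version="v1",
    kind="LimitRange",
    metadata=client.V1ObjectMeta(name="container-defaults"),
    spec=client.V1LimitRangeSpec(
        limits=[
            client.V1LimitRangeItem(
                type="Container",
                default={"cpu": "500m", "memory": "200Mi"},          # default limits
                default_request={"cpu": "250m", "memory": "100Mi"},  # default requests
            )
        ]
    ),
)

# "team-a" is a placeholder namespace that must already exist.
core.create_namespaced_limit_range(namespace="team-a", body=limit_range)
```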
Kubernetes Ingress
While Kubernetes services run within a cluster, you’ll want to be able to access these services from the outside world. Several Kubernetes components facilitate this with varying degrees of simplicity and robustness, including NodePort and LoadBalancer. The component with the most flexibility is Ingress, an API that manages external access to a cluster’s services, typically via HTTP.
Ingress requires a bit of configuration to set up properly. Matthew Palmer, who wrote a book on Kubernetes development, steps you through the process on his website.
Kubernetes with Prometheus
A common need with containerized applications, especially at scale, is visibility—knowing what applications are doing and where they may be having problems. Kubernetes components can emit metrics to be used by Prometheus, the open source monitoring tool created to work in conjunction with Kubernetes and other cloud-native technologies.
The Kubernetes Dashboard
One Kubernetes component that helps you stay on top of all of these other components is Dashboard, a web-based UI you can use to deploy and troubleshoot applications and manage cluster resources. Dashboard isn’t installed by default, but adding it isn’t difficult.
Benefits of using Kubernetes
Because Kubernetes introduces new abstractions and concepts, and because the learning curve is high, it’s only normal to ask about the long-term payoffs for using it. Let’s consider some of the benefits of running applications inside Kubernetes.
Kubernetes automates application management
One of the most basic duties Kubernetes takes off your hands is keeping a large, complex application up, running, and responsive to user demands. It automates application health, replication, load balancing, and hardware resource allocation.
Kubernetes applications that become “unhealthy,” or don’t conform to the definition of health you’ve specified for them, can be automatically repaired. Kubernetes also lets you set soft and hard limits on application resource usage, including memory, storage I/O, and network bandwidth. Applications that use minimal resources can be packaged together on the same hardware; ones that need to stretch out can be placed on systems where they have room to grow. And, again, rolling out updates across a cluster, or rolling back if updates break, can be automated.
Kubernetes eases deployment
Package managers such as Debian Linux’s apt and Python’s pip save users the trouble of manually installing and configuring an application. This is especially handy when an application has multiple external dependencies.
Helm is essentially a package manager for Kubernetes. Many popular software applications must run in Kubernetes as a group of interdependent containers. Helm provides a definition mechanism, a “chart,” that describes how an application or service can be run as a group of containers inside Kubernetes.
You can create your own Helm charts from scratch, and you might have to if you’re building a custom application to be deployed internally. But if you’re using a popular application that has a common deployment pattern, there’s a good chance someone has already composed a Helm chart for it and published it in the Artifact Hub.
Another place to look for official Helm charts is the Kubeapps directory, which allows Kubernetes applications to be deployed and installed from within a Kubernetes cluster itself, using a handy web-based interface.
Kubernetes simplifies application resource management
Containers are meant to be immutable; the code and data you put into them isn’t supposed to change. But applications need state, meaning they need a reliable way to deal with data that changes. That’s made all the more complicated by the way containers live, die, and are reborn across the lifetime of an application.
Kubernetes provides abstractions to allow containers and applications to deal with data storage in the same decoupled way as other resources. Many common kinds of storage, from Amazon EBS volumes to plain old NFS shares, can be accessed via Kubernetes storage drivers, called volumes. Normally, volumes are bound to a specific pod, but a volume subtype called a persistent volume (PV) can be used for data that needs to live on independently of any pod.
Containers often need to work with secrets. These are credentials like API keys or service passwords that you don’t want hard-coded into a container or stashed openly on a disk volume. While there are third-party solutions like Docker secrets and HashiCorp Vault, Kubernetes has its own mechanism for natively handling secrets, although it should be configured with care (for instance, by restricting access through RBAC, role-based access control).
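A minimal sketch of native secret handling with the same Python client follows; the secret name, namespace, and value are placeholders, and in production you would pair this with RBAC rules and encryption at rest.

```python
# Sketch: store an API key as a Kubernetes Secret and read it back.
# Name, namespace, and value are placeholders.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

secret = client.V1Secret(
    api_version="v1",
    kind="Secret",
    metadata=client.V1ObjectMeta(name="payments-api"),
    string_data={"api-key": "replace-me"},  # stored base64-encoded by the API server
)
core.create_namespaced_secret(namespace="default", body=secret)

# Later, a workload or operator (with appropriate RBAC) can read it back:
fetched = core.read_namespaced_secret("payments-api", "default")
print(list(fetched.data.keys()))  # values come back base64-encoded
```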
Hybrid cloud and multi-cloud deployments
One of the long-standing dreams of cloud computing is to be able to run any application in any cloud, or in any mix of public or private clouds. This isn’t just to avoid vendor lock-in, but also to take advantage of features specific to individual clouds.
For some time, the most common mechanism for keeping multiple clusters in sync with one another across multiple regions and clouds was a Kubernetes SIG project called KubeFed, for Kubernetes Cluster Federation. In a federation, a given application deployment can be kept consistent between multiple clusters, and different clusters can share service discovery so that a back-end resource can be accessed from any cluster. Federations can also be used to create highly available or fault-tolerant Kubernetes deployments, whether or not you’re spanning multiple cloud environments.
However, in September 2023, the KubeFed project was archived. A successor project, Karmada, uses Kubernetes-native APIs to synchronize applications across clusters. It requires no changes to the applications themselves.
Small deployments and edge computing
Kubernetes deployments don’t have to be big to be useful. K3s, for instance, is a tiny Kubernetes deployment—a single 70MB binary—that can run on embedded hardware or low-resource ARM systems (2GB of RAM). Minimal Kubernetes distros have created space for Kubernetes in edge computing—not just in environments with tight hardware constraints, but also minimal or even no external networking.
Where to get Kubernetes
Kubernetes is available in many forms—from open source bits to commercially backed distributions to public cloud services. The best way to figure out where to get Kubernetes is by use case.
- If you want to do it all yourself: The source code, and pre-built binaries for most common platforms, can be downloaded from the GitHub repository for Kubernetes. If you want to try out a tiny instance of Kubernetes on your own system, you can use Minikube to set up a local cluster on a single machine, or use the K3s distribution.
- If you’re using Docker: Docker Desktop’s most recent editions come with Kubernetes as a pack-in. This is ostensibly the easiest way for container mavens to get a leg up with Kubernetes, since it comes by way of a product you’re almost certainly already familiar with. (Docker can also use Minikube for deployments.)
- If you’re deploying on-prem or in a private cloud: Chances are good that any infrastructure you choose for your private cloud has Kubernetes built-in. Standard-issue, certified, supported Kubernetes distributions are available from dozens of vendors.
- If you’re deploying in a public cloud: The three major public cloud vendors all offer Kubernetes as a service. Google Cloud Platform offers Google Kubernetes Engine. Microsoft Azure offers the Azure Kubernetes Service. And Amazon offers Amazon Elastic Kubernetes Service (EKS). Managed Kubernetes services are also available from many other vendors.
Kubernetes tutorials and certifications
Now that you’ve got the basics under your belt, are you ready to get started with Kubernetes? You might want to start off with the simple tutorials on the Kubernetes project site itself; when you’re ready for something more advanced, check out the list of guides in the awesome-kubernetes repository, which has something for everyone. For migration advice, see “How to succeed with Kubernetes.”
If you feel you have a good handle on how Kubernetes works and you want to demonstrate your expertise to employers, certification may be the way to go. Check out the pair of Kubernetes-related certifications offered jointly by the Linux Foundation and the Cloud Native Computing Foundation:
- Certified Kubernetes Administrator: Seeks to “provide assurance that CKAs have the skills, knowledge, and competency to perform the responsibilities of Kubernetes administrators,” including application lifecycle management, installation, configuration, validation, cluster maintenance, and troubleshooting.
- Certified Kubernetes Application Developer: Certifies that “users can design, build, configure, and expose cloud native applications for Kubernetes.”
The certification exams are $445 each. There are also accompanying training courses, which can serve as a structured way to learn more about Kubernetes.
What misleading Meta Llama 4 benchmark scores show enterprise leaders about evaluating AI performance claims 8 Apr 2025, 9:49 pm
Benchmarks are critical when evaluating AI — they reveal how well models work, as well as their strengths and weaknesses, based on factors like reliability, accuracy, and versatility.
But the revelation that Meta misled users about the performance of its new Llama 4 model has raised red flags about the accuracy and relevancy of benchmarking, particularly when model builders tweak their products to get better results.
“Organizations need to perform due diligence and evaluate these claims for themselves, because operating environments, data, and even differences in prompts can change the outcome of how these models perform,” said Dave Schubmehl, research VP for AI and automation at IDC.
Vendors may fudge results, but it’s not likely to dissuade IT buyers
On Saturday, Meta unexpectedly dropped two new Llama models, Scout and Maverick, claiming that Maverick outperformed GPT-4o and Gemini 2.0 Flash and achieved comparable results to the new DeepSeek v3 on reasoning and coding. The model quickly claimed second place behind Gemini 2.5 Pro on LMArena, an AI proving ground where human raters compare model outputs.
The company also claimed that Scout delivered better results than Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across a broad range of benchmarks.
However, independent researchers soon discovered, by reading the fine print on the Llama website, that Meta had used an experimental chat version of Maverick in testing that was specifically optimized for conversationality, as opposed to the publicly-available version it dropped over the weekend. Meta has denied any wrongdoing.
It’s not the first time vendors have been misleading about benchmarking, and experts say a little fudging isn’t likely to prompt enterprise buyers to look elsewhere.
“Every vendor will try to use benchmarked results as a demonstration of superior performance,” said Hyoun Park, CEO and chief analyst at Amalgam Insights. “There is always some doubt placed on vendors that intentionally game the system from a benchmarking perspective, especially when they are opaque in their methods.”
However, as long as leading AI vendors show that they are keeping pace with their competitors, or can potentially do so, there will likely be little to no long-term backlash, he said. He pointed out that the foundation model landscape is changing “extremely rapidly,” with massive improvements either in performance or productivity happening monthly, or even more frequently.
“Frankly, none of today’s model benchmark leaderboards will be relevant in six to 12 months,” said Park.
Enterprises: Do your due diligence with AI
With the proliferation of models in the market, it’s naturally important for organizations and developers to have some idea of how AI will work in their environment, and benchmarks partially serve this need, Schubmehl pointed out.
“Benchmarks provide a starting point, especially since performance is becoming increasingly important as applications using AI models become more complex,” he said. However, “evaluation with the organizations’ data, prompts, and operating environments is the ultimate benchmark for most enterprises.”
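In practice, that evaluation can start as a scripted bake-off: run each candidate model over a sample of your own prompts and score the outputs against criteria your business cares about. The sketch below assumes an OpenAI-compatible endpoint and a small hand-labeled sample; the model names, prompts, and scoring rule are placeholders for your real evaluation criteria.

```python
# Sketch of an in-house bake-off: run candidate models over your own prompts
# and record latency plus a simple keyword-based score. Models, prompts, and
# scoring are placeholders.
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
CANDIDATES = ["gpt-4o-mini", "gpt-4o"]  # swap in whatever you're evaluating
SAMPLES = [
    (
        "Classify this ticket as 'billing' or 'shipping': "
        "'My package is two weeks late.'",
        ["shipping"],
    ),
]

for model in CANDIDATES:
    hits, total_latency = 0, 0.0
    for prompt, must_mention in SAMPLES:
        start = time.perf_counter()
        out = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        ).choices[0].message.content.lower()
        total_latency += time.perf_counter() - start
        hits += all(term in out for term in must_mention)
    print(f"{model}: {hits}/{len(SAMPLES)} criteria met, "
          f"avg latency {total_latency / len(SAMPLES):.1f}s")
```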
Park emphasized that benchmarks are ultimately only as useful as the accuracy of their simulated environments. For a well-defined transactional technology such as a server or database, metrics and guardrails can often simulate specific high-traffic or compute-heavy environments fairly accurately.
However, the goals of AI are often outcome-based rather than tied to tasks or rules-based workflows. For instance, the ability to answer a customer service question is different from solving a customer service request, Park noted. AI may be very good at the former task but struggle with the intricacies of the chain of thought, across many permutations, that the latter requires.
Therefore, when evaluating models, enterprise buyers should first consider if benchmarked tasks and results match their business processes and end results, or whether the benchmarking stops at an intermediate point. They must conceptually understand the processes and work that is being supported or automated, and align benchmark results to their current work process.
It is also important to ensure that the benchmark environment is similar to the business production environment, he said, and to document areas where network, compute, storage, inputs, outputs, and contextual augmentation of the benchmark environment differ from the production environment.
Further, make sure that the model tested matches the model that is available for preview or production, Park advised. It is common for models to be optimized for a benchmark without revealing much detail about the cost or time required for the training, augmentation, or tuning behind that optimization.
Ultimately, “businesses seeking to conduct a competitive evaluation of AI models can use benchmarks as a starting point, but really need to scenario test in their own corporate or cloud environments if they want an accurate understanding of how a model may work for them,” Park emphasized.
GitHub Copilot rolls out agent mode in Visual Studio Code 8 Apr 2025, 6:15 pm
GitHub has rolled out agent mode and MCP (Model Context Protocol) support in GitHub Copilot for all Visual Studio Code users. The company is also releasing an open source GitHub MCP server, allowing developers to add GitHub functionality to any LLM that supports MCP.
These capabilities were announced April 4. Agent mode helps developers analyze code, propose edits, run tests, and validate results across multiple files, while MCP support unlocks access to context and capabilities sought by developers. The GitHub MCP server, released in preview, provides seamless integration with GitHub APIs, enabling advanced automation and interaction capabilities for developers and tools, GitHub said.
GitHub also expanded model support in GitHub Copilot. Anthropic Claude 3.5, 3.7 Sonnet, 3.7 Sonnet Thinking, Google Gemini 2.0 Flash, and OpenAI o3-mini are now generally available in GitHub Copilot via premium requests. This is included in all paid Copilot tiers.
GitHub also announced the general availability of the Copilot code review agent, which helps offload basic reviews to a Copilot agent that finds bugs or potential performance problems and suggests fixes. This means developers can start iterating on code while waiting for a human review, helping to keep code repositories more maintainable and focused on quality, GitHub said. To improve Copilot code review, support has been added for C, C++, Kotlin, and Swift, now in public preview.
Beware these 10 malicious VS Code extensions 8 Apr 2025, 10:47 am
Developers using Microsoft’s Visual Studio Code (VSCode) editor are being warned to delete, or at least stay away from, 10 newly published extensions which will trigger the installation of a cryptominer.
The warning comes from researchers at Extension Total, who said these malicious extensions, which pretend to be popular development tools, may have been installed as many as 1 million times since April 4, when they were published on Microsoft’s Visual Studio Code Marketplace. However, the researchers also suspect the threat actors may have inflated the download numbers.
Meta launches AI family Llama 4 — but the EU doesn’t get everything 8 Apr 2025, 6:22 am
Over the weekend, Meta launched Llama 4, a new series of AI models trained on large amounts of text, images, and video.
According to Meta, Llama 4 is better than its competitors GPT-4o and Gemini 2.0 in a number of areas, including programming, reasoning and language translation.
The two variants Llama 4 Scout and Llama 4 Maverick are available on Llama.com and Hugging Face now, while the top-of-the-line Llama 4 Behemoth will take a little longer.
Why is cloud-based AI so hard? 8 Apr 2025, 5:00 am
The public cloud market continues its explosive growth trajectory, with enterprises rushing to their cloud consoles to allocate more resources, particularly for AI initiatives. Cloud providers are falling over themselves to promote their latest AI capabilities, posting numerous job requisitions (many unfunded “ghost jobs”) and offering generous credits to entice enterprise adoption. However, beneath this veneer of enthusiasm lies a troubling reality that few are willing to discuss openly.
The statistics tell a sobering story: Gartner estimates that 85% of AI implementations fail to meet expectations or aren’t completed. I consistently witness projects begin with great fanfare, only to fade into obscurity quietly. Companies excel at spending money but struggle to build and deploy AI effectively.
How strong is demand for AI really?
There’s a puzzling disconnect in the cloud computing industry today. Cloud providers consistently claim they’re struggling to meet the overwhelming demand for AI computing resources, citing waiting lists for GPU access and the need for massive infrastructure expansion. Yet their quarterly earnings reports often fall short of Wall Street’s expectations, creating a curious paradox.
The providers are simultaneously announcing unprecedented capital expenditures for AI infrastructure. Some are planning 40% or higher increases in their capital budgets even as they seem to struggle to demonstrate proportional revenue growth.
Investors’ fundamental concern is that AI remains an expensive research project, and there’s significant uncertainty about how the global economy will absorb, utilize, and pay for these capabilities at scale. Cloud providers may conflate potential future demand with current market reality, leading to a mismatch between infrastructure investments and immediate revenue generation.
This suggests that although AI’s long-term potential is significant, the short-term market dynamics may be more complex than providers’ public statements indicate.
The ROI conundrum
Data quality is perhaps the most significant barrier to successful AI implementation. As organizations venture into more complex AI applications, particularly generative AI, the demand for tailored, high-quality data sets has exposed serious deficiencies in existing enterprise data infrastructure. Most enterprises knew their data wasn’t perfect, but they didn’t realize just how bad it was until AI projects began failing. For years, they’ve avoided addressing these fundamental data issues, accumulating technical debt that now threatens to derail their AI ambitions.
Leadership hesitation compounds these challenges. Many enterprises are abandoning generative AI initiatives because the data problems are too expensive to fix. CIOs, increasingly concerned about their careers, are reluctant to take on these projects without a clear path to success. This creates a cyclical problem where lack of investment leads to continued failure, further reinforcing leadership’s unwillingness.
Return on investment has been dramatically slower than anticipated, creating a significant gap between AI’s potential and practical implementation. Organizations are being forced to carefully assess the foundational elements necessary for AI success, including robust data governance and strategic planning. Unfortunately, too many enterprises consider these things too expensive or risky.
Sensing this hesitation, cloud providers are responding with increasingly aggressive marketing and incentive programs. Free credits, extended trials, and promises of easy implementation abound. However, these tactics often mask the real issues. Some providers are even creating artificial demand signals by posting numerous AI-related job openings, many of which are unfunded, to create the impression of rapid adoption and success.
Another critical factor slowing adoption is the severe shortage of skilled professionals who can effectively implement and manage AI systems. Enterprises are discovering that traditional IT teams lack the specialized knowledge needed for successful AI deployment. Although cloud providers do offer various tools and platforms, the expertise gap remains a significant barrier.
This situation will likely create a stark divide between AI “haves” and “have-nots.” Organizations that successfully organize their data and effectively implement AI will use generative AI as a strategic differentiator to advance their business. Others will fall behind, creating a competitive gap that may be difficult to close.
A strategic path for adoption
Enterprise leaders must move away from the current pattern of rushed, poorly planned AI implementations. The path to success isn’t chasing every new AI capability or burning through cloud credits; it’s thoughtful, strategic development.
Start by getting your data house in order. Without clean, well-organized data, even the most sophisticated AI tools will fail to deliver value. This means investing in proper data governance and quality control measures before diving into AI projects.
Build expertise from within. Cloud providers offer powerful tools, but your team needs to understand how to apply them effectively to your business challenges. Invest in training your existing staff and strategically hire AI specialists who can bridge the gap between technology and business outcomes.
Begin with small, focused projects that address specific business problems. Prove the value through controlled experiments before scaling up. This approach helps build confidence, develop internal capabilities, and demonstrate tangible ROI.
The road ahead for cloud-based AI
Cloud providers will continue to grow in the coming years, but their market could contract unless they can help their customers develop AI strategies that overcome the current high failure rates. The reasons enterprises struggle with generative AI, agentic AI, and project failures are well understood. This isn’t a mystery to analysts and CTOs. Yet enterprises seem unwilling or unable to invest in solutions.
The gap between AI supply and demand will eventually close, but it will take significantly longer than cloud providers and their marketing teams suggest. Organizations that take a measured approach of thoughtful planning and building proper foundations may move more slowly initially, but will ultimately be more successful in their AI implementations and realize better returns on their investments.
As we move forward, cloud providers and enterprises must align their expectations with reality and focus on building sustainable, practical AI implementations rather than chasing the latest hype cycle. I hope that enterprises and cloud providers both can get what they are looking for; it should be the same thing—right?
Visual Studio Code stabilizes agent mode 8 Apr 2025, 12:48 am
Visual Studio Code 1.99, the latest release of Microsoft’s popular code editor, is now available. Highlights of the update center on GitHub Copilot agent mode, Next Edit Suggestions, and Copilot chat.
Developers can access Visual Studio Code 1.99, also known as the March 2025 release, through code.visualstudio.com. VS Code is available for Windows, Linux, and Mac.
In the VS Code 1.99 release, unveiled April 3, agent mode becomes part of VS Code Stable. With chat agent mode, developers can use natural language to define a high-level task and start an agentic code editing session. In agent mode, GitHub Copilot autonomously plans the work needed and selects relevant files and context, then makes edits to a code base and invokes tools to accomplish the developer’s request. Agent mode monitors the outcome of edits and tools and iterates to resolve issues. Developers can sign up for a GitHub Copilot subscription through the GitHub Copilot Free plan. Agent mode also features an experimental thinking tool that can give a model the opportunity to think between tool calls.
Model Context Protocol (MCP) servers are now supported in agent mode. MCP provides a standard method for AI to interact with external tools, applications, and data sources. When users input a chat prompt using agent mode in VS Code, the model can invoke tools to perform tasks such as accessing databases, file operations, or retrieving web data. This integration enables more dynamic, context-aware coding assistance, according to Microsoft.
For AI-powered code editing, Next Edit Suggestions is now generally available. Improvements make suggestions more compact, less likely to interfere with surrounding code, and easier to read at a glance. In addition, the gutter indicator has been updated to make suggestions more noticeable. Other Copilot-related improvements include muting diagnostics events outside the editor when rewriting a file with AI edits, and saving files explicitly when the user decides to keep AI edits. Syntax highlighting for inline suggestions is now enabled by default.
VS Code 1.99 follows last month’s VS Code 1.98, which also brought enhancements for GitHub Copilot. Also in Visual Studio Code 1.99:
- AI-powered editing support for notebooks now is available in VS Code Stable. VS Code also now provides a dedicated tool for creating new Jupyter notebooks directly from chat.
- Improvements have been made to the reference picker that is used for various source control operations such as checkout, merge, rebase, or delete branch. The updated reference picker contains the details of the last commit (author, commit message, commit date), along with ahead/behind information for local branches.
- Enhanced IntelliSense for the code CLI brings support for subcommands to the code, code-insiders, and code-tunnel commands.
- For simplification, the terminal tab by default now shows much less detail.
- The shell integration PowerShell script now is signed. This means shell integration on Windows, when using the default PowerShell execution policy of RemoteSigned, now should start working automatically.
- With a BYOK (Bring Your Own Key) preview, GitHub Copilot Pro and GitHub Copilot Free users now can bring their own API keys for popular providers such as Azure, Anthropic, Gemini, OpenAI, Ollama, and Open Router.
- Beginning with VS Code 1.99, prebuilt servers distributed by VS Code are only compatible with Linux distributions that are based on glibc 2.28 or later.