
· 12 min read

Application Programming Interfaces (APIs) come in various forms, with synchronous and asynchronous being the primary types. Synchronous protocols like Hypertext Transfer Protocol (HTTP) represent RESTful implementations, while asynchronous protocols like Server-Sent Events (SSE) and MQ Telemetry Transport (MQTT) represent EVENTful implementations.

EVENTful message design can be categorized into three main types: notifications, objects, and streams. These messages can represent past actions (events) or future actions (commands).

A Brief History of Asynchronous APIs

Asynchronous APIs have been pivotal in the development of interactive and real-time web applications. The journey began with the establishment of standards like MQTT in 1999, which emerged as a lightweight messaging protocol ideal for low-bandwidth, high-latency environments. MQTT's publish-subscribe model was a departure from the synchronous HTTP protocol introduced in 1991, offering a more efficient way to handle real-time, bidirectional communication.

The term "EVENTful" APIs aptly encapsulates the nature of asynchronous communication, with EVENT standing for Efficient, Versatile, Nonblocking, and Timely. These characteristics are inherent to asynchronous APIs, which include message-based and event-driven architectures, providing a robust foundation for services that require real-time updates and interactions.

The concept of APIs has a storied history, dating back to the early days of computing as referenced in the 1951 book "The Preparation of Programs for an Electronic Digital Computer." Over the past 70 years, APIs have evolved dramatically, especially with the advent of web-based APIs. A significant milestone was the introduction of the Ajax pattern by Jesse James Garrett in 2005. Ajax leveraged asynchronous JavaScript and XML to enable web pages to dynamically fetch and display content without a full page reload, enhancing user experience and web application performance.

Ajax laid the groundwork for asynchronous web requests, primarily interacting with RESTful APIs. However, as the web evolved, so did the need for more efficient real-time communication methods. This led to the adoption of Server-Sent Events (SSE), a modern approach that allows servers to push updates to clients over a single, long-held HTTP connection. Unlike MQTT, which is a protocol designed for machine-to-machine communication, SSE is specifically tailored for web applications, providing a standardized way to stream updates from the server to the client.

RESTful Systems

In RESTful systems, the consumer initiates communication, making a request to which the service responds with the appropriate data. For example, a RESTful service might maintain a dataset of all completed trades for an investment firm. As each trade is executed and finalized, this dataset is updated accordingly. When a consumer needs to retrieve information on completed trades, it can request the service, which then responds with the relevant data for all the trades completed that day.

To illustrate this with a practical example, imagine an investment firm that needs to keep its traders and consumers informed about the status of their stock market trades. The firm could utilize a RESTful API to manage this data flow. The service would have access to a completed-trades dataset, exposing an endpoint that consumers could request to obtain the list of trades completed on a given day. A consumer application might request a web address such as https://investment.arbs.io/completed-trades to retrieve this information.

The service's HTTP response to such a request could be structured as follows:

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 1234

{
  "trades": [
    {
      "id": "1",
      "stock": "AAPL",
      "volume": "100",
      "price": "150.00",
      "status": "completed"
    },
    {
      "id": "2",
      "stock": "MSFT",
      "volume": "50",
      "price": "250.00",
      "status": "completed"
    }
    // more trades here...
  ]
}

This JSON response provides a clear and structured representation of the completed trades, allowing consumers to easily parse and use the data as needed.
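
To make the request-response flow concrete, here is a minimal consumer sketch in Python. It assumes the completed-trades endpoint shown above is reachable and returns the JSON structure illustrated; the requests library usage and the absence of authentication are illustrative simplifications, not a prescribed implementation.

import requests

# Illustrative sketch: request the completed-trades endpoint and print each trade.
# Authentication is omitted and assumed to be handled elsewhere (e.g. an API key header).
ENDPOINT = "https://investment.arbs.io/completed-trades"

def fetch_completed_trades():
    response = requests.get(ENDPOINT, timeout=10)
    response.raise_for_status()  # surface HTTP errors rather than silently continuing
    return response.json()["trades"]

if __name__ == "__main__":
    for trade in fetch_completed_trades():
        print(f'{trade["id"]}: {trade["stock"]} x {trade["volume"]} @ {trade["price"]} ({trade["status"]})')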

EVENTful Systems

EVENTful systems stand in contrast to RESTful systems by adopting a model where services actively "push" messages to consumers. These messages are sent based on predefined filters or subscriptions made by the consumer applications. Unlike RESTful systems, which adhere to a strict request-response pattern, EVENTful systems blur the traditional roles of consumers and services, allowing consumers to act as both producers and consumers of messages. This model is beneficial in scenarios where real-time updates are crucial, and the constant polling of a service for updates would be inefficient.

For instance, an EVENTful system could notify consumers immediately when a trade is executed. Instead of consumers repeatedly querying the service for the latest trade statuses, the service could send a message to the relevant consumers as soon as a trade is completed. This ensures that consumers receive updates in real-time without the need to make redundant requests to the service.

Let's reimagine the previous example. We have a service that sends a message each time a stock trade is executed. The message could be sent to all interested parties, such as traders, investment managers, or consumers, as soon as the trade is confirmed.

The structure of the JSON payload for a trade_executed event might look similar to the following:

{
  "metadata": {
    "message_source": "trade-execution-system",
    "event-type": "trade-executed"
  },
  "message": {
    "trade_id": "3335dc20",
    "stock_symbol": "AAPL",
    "volume": "100",
    "price": "150.00",
    "trader_id": "butsona",
    "timestamp": "2023-03-13T12:00:00Z"
  }
}

In this payload, the metadata provides context about the source and type of the message, while the message body contains the details of the executed trade, such as the trade ID, stock symbol, volume, price, trader ID, and execution timestamp. Each message represents a single trade execution, and it's up to the receiving applications to maintain a record of all trades if they need a historical log.

By leveraging EVENTful systems, the investment firm can minimize latency and resource consumption, ensuring that all parties have the most current information as soon as it's available.
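
As a sketch of how such a push might be implemented, the following Python example publishes the trade_executed message over MQTT using the paho-mqtt helper module. The broker address and topic name are assumptions for illustration, not a prescribed deployment.

import json
from paho.mqtt import publish

# Hypothetical broker and topic; replace with your firm's actual infrastructure.
BROKER_HOST = "broker.example.com"
TOPIC = "trades/executed"

message = {
    "metadata": {
        "message_source": "trade-execution-system",
        "event-type": "trade-executed",
    },
    "message": {
        "trade_id": "3335dc20",
        "stock_symbol": "AAPL",
        "volume": "100",
        "price": "150.00",
        "trader_id": "butsona",
        "timestamp": "2023-03-13T12:00:00Z",
    },
}

# Publish once to the topic; every consumer subscribed to it receives the message.
publish.single(TOPIC, payload=json.dumps(message), hostname=BROKER_HOST, qos=1)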

Types of EVENTful Messages

EVENTful systems can utilize a variety of message types to communicate different kinds of information. While the terminology for these message types has yet to be standardized, understanding the underlying concepts is crucial for designing an effective event-driven architecture. The three primary types of messages are notifications, event-carried state, and event-source messages.

Message-Notifications

Notifications are the simplest form of EVENTful messages, often containing minimal information about an event and a pointer to where more detailed information can be found. In the context of an investment firm, a notification might be sent out to inform a trader or consumer about a significant event, such as a large trade execution or a market movement that triggers an alert. The notification could include basic details and a URL to access a more comprehensive report or to perform a follow-up action.

For example, a notification message in a trading environment might look like this:

{
  "metadata": {
    "event-type": "trade-alert",
    "message_source": "trading-platform"
  },
  "message": {
    "trade_id": "f18121e4",
    "timestamp": "2023-03-13T12:00:00Z",
    "trade_url": "https://investment.arbs.io/trade-notifications/f18121e4"
  }
}

A useful side-effect of notifications is that they can simplify security: the detailed data sits behind the protected trade_url, so only consumers with the correct permissions can retrieve it, and the notification itself carries little sensitive information. This reduces the complexity of securing the messaging channel.
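
A consumer receiving such a notification typically follows the trade_url to fetch the full details. The sketch below assumes the notification payload shown above and a bearer token for authorisation; both the token handling and the field names are illustrative assumptions.

import requests

def handle_notification(notification: dict, access_token: str) -> dict:
    # Follow the protected trade_url carried in the notification to retrieve the
    # full trade details; only callers with valid credentials will succeed.
    trade_url = notification["message"]["trade_url"]
    response = requests.get(
        trade_url,
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()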

Event-Carried State

Event-Carried State Transfer (ECS) messages are used when a significant block of data needs to be transmitted, such as a summary of a trade or a batch of trades. These messages carry all the necessary information to update a data store or to provide a comprehensive view of a trade without requiring additional queries to the system. This approach is beneficial for consolidating data from multiple sources and ensuring that the recipient has immediate access to the full context of the event.

An ECS message in a trading system might include details about the trade, the trader, and the current status, as follows:

{
  "metadata": {
    "event-type": "trade-summary",
    "message_source": "trade-execution-system",
    "timestamp": "2023-03-13T14:13:12Z"
  },
  "message": {
    "trade": {
      "trade_id": "f18121e4",
      "stock_symbol": "AAPL",
      "volume": "100",
      "price": "150.00",
      "trader_id": "butsona"
    },
    "status": {
      "execution_time": "2023-03-13T13:13:13Z",
      "trade_status": "completed"
    }
  }
}

Event-Source

Event-Source messages, or Delta messages, are designed to convey incremental changes or updates to data. They are useful for streaming real-time updates about the progress of a trade or a series of trades. These messages could inform consumers or internal systems about the progression of trade executions, price changes, or other relevant market events.

For example, a series of event-source messages might be sent to indicate the stages of trade execution:

[
  {
    "trade_id": "f18121e4",
    "stage": "initiated",
    "timestamp": "2023-03-13T10:11:12Z"
  },
  {
    "trade_id": "f18121e4",
    "stage": "executed",
    "timestamp": "2023-03-13T10:14:13Z"
  },
  {
    "trade_id": "f18121e4",
    "stage": "confirmed",
    "timestamp": "2023-03-13T10:14:15Z"
  },
  {
    "trade_id": "f18121e4",
    "stage": "settled",
    "timestamp": "2023-03-13T10:16:15Z"
  }
]

Each message in this array represents a discrete update to the trade's status, providing a granular view of the trade's lifecycle. By employing these different types of EVENTful messages, an investment firm can ensure efficient and timely communication within its trading ecosystem, facilitating quick decision-making and enhancing overall market responsiveness.
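
Because each Event-Source message carries only a delta, a consumer that needs the current state of a trade must fold the stream of updates into a local view. A minimal sketch of that reduction, assuming messages carry the trade_id, stage, and ISO-8601 timestamp fields shown above:

from typing import Iterable

def current_trade_stages(events: Iterable[dict]) -> dict:
    """Reduce a stream of event-source messages to the latest known stage per trade."""
    latest: dict[str, dict] = {}
    for event in events:
        trade_id = event["trade_id"]
        known = latest.get(trade_id)
        # Keep the most recent stage seen for each trade (ISO-8601 timestamps in the
        # same timezone sort correctly as plain strings).
        if known is None or event["timestamp"] >= known["timestamp"]:
            latest[trade_id] = event
    return latest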

Events and Commands

In an EVENTful system within an investment firm, messages are a crucial component of the communication process. They are categorized based on purpose and timing: command messages are sent to trigger an action, while event messages are sent to notify that an action has occurred. Understanding and distinguishing between these two types of messages is fundamental when designing an EVENTful API system.

Identifying Events and Defining Commands

Defining Events

Identifying events is a critical step in the architecture of asynchronous APIs. It involves pinpointing all significant activities within the domain of stock market trading that should be tracked. Once identified, these activities are documented as events and incorporated into the API's implementation. Events in a trading environment might include a trader placing an order, a consumer logging into their account, or a trade being executed.

In addition to these user-centric events, process-centric events occur within the trading system itself. These could involve the internal progression of a trade order through various stages, such as order validation, execution, and settlement. Events may also signal issues or exceptions, such as a trade rejection due to insufficient funds or a discrepancy in a trade order.

A corresponding message is designed to convey the necessary information for each identified event. The nature of the event will determine the type of message required—whether it's a simple notification, a more detailed object-style message, or a continuous stream of updates. For instance, a notification might be sent when a consumer logs in, while an object-style message might be used to convey the details of a new trade order, and streaming messages could provide real-time updates as a trade progresses through various stages.

Discoverability is essential, so cataloguing these events is crucial, along with critical information such as the event name, message type, triggering conditions, example messages, and any additional notes. This documentation becomes a valuable resource for designers, developers, and architects involved in the system's development.

Here's an example of how events might be documented for an investment firm:

  • Trade Initiated: A trader orders a stock trade through the investment firm's platform.
  • Trade Executed: A trade order has been successfully executed on the stock exchange.
  • Trade Settled: The executed trade has been settled, and the securities and funds have been exchanged.

After identifying the significant past events, the next step is defining the commands representing future actions. Commands in a trading system include placing a trade order, modifying an existing order, or transferring funds. These commands are integral to the system's interactivity, allowing users to initiate actions that the trading system will process.

By meticulously identifying events and defining commands, an investment firm can ensure that its EVENTful system is comprehensive, responsive, and capable of handling the complexities of stock market trading. This approach enables the firm to maintain high service and efficiency, providing consumers with timely information and the ability to act swiftly in a dynamic market environment.

Defining Commands

In EVENTful systems, the distinction between events and commands is pivotal. Events indicate actions within the system, providing a historical record of transactions and activities. Conversely, commands instruct parts of the system to perform future actions, which may be executed immediately or after some delay, depending on the process involved.

For example, a command to execute a trade might be processed nearly instantaneously, reflecting a change in the status of a trade from pending to completed. However, specific commands, such as those involving complex payment processing or third-party bank verifications, may incur delays. In some cases, if there are issues with payment verification, the completion of a command might take significantly longer, ranging from minutes to hours.

The ability to define and initiate commands is essential for the functionality of an EVENTful system. Commands can encompass various actions, such as placing a new trade, modifying an existing trade, or cancelling a trade order. When designing such a system, it is crucial to meticulously describe all the commands necessary to support the desired activities within the trading environment.

The methodology for defining commands mirrors that for identifying events: build a structured catalogue of each command detailing the command name, the type of message it will generate, an example of the message, and any supplementary notes. This documentation serves as a reference both for the team members implementing the EVENTful system, ensuring clarity and consistency across the development process, and for consumers who need to understand how to interact with the solution. It is common to place commands behind RESTful APIs to improve security and discoverability (for example, using the OpenAPI Specification).

For instance, the command list might include actions such as:

  • Place Trade: A command that initiates a new trade order within the trading platform.
  • Modify Trade: A command that alters an existing trade order's parameters.
  • Cancel Trade: A command that enables the cancellation of a previously placed trade order.

Upon finalizing the list of commands, the foundational elements of an EVENTful system are established, encompassing messages, events, and commands.

Summary

We have explored the fundamental aspects of an EVENTful API system, focusing on transmitting messages between machines based on past events and future commands. The three message types (Message-Notifications, Event-Carried State, and Event-Source) have been discussed, along with guidance on their appropriate use. Additionally, the processes for identifying events and defining commands have been outlined, with the recommendation to document these elements for ease of collaboration and implementation. This structured approach is instrumental in building an efficient and effective EVENTful system, facilitating real-time communication and action in an increasingly fast-paced world.

· 16 min read

Artificial Intelligence (AI), particularly Large Language Models (LLMs), has revolutionised various sectors by allowing organisations to tackle complex problems and perform tasks efficiently. However, as organisations increasingly adopt LLMs, the need to understand their behaviour in a production environment and use this understanding to improve their development has become apparent.

The solution to these challenges lies in software observability, which refers to the ability to understand an application's behaviour based on the telemetry data it generates at runtime. The complexity of modern software systems, including LLMs, necessitates using observability tools and practices to manage their complexity and unpredictability.

LLMs have transformed the way organisations approach machine learning (ML). They offer solutions for complex problems, making them an accessible tool for any product engineering team. However, the very features that make LLMs so appealing also present significant challenges, particularly regarding reliability and predictability.

The Challenges of LLMs

LLMs are essentially black boxes. Their outputs are nondeterministic and depend on natural language inputs, which are inherently broad and unpredictable. This means that users of applications with natural language inputs will inevitably do unexpected things. Debugging an LLM is difficult: unless you're an ML researcher, you are unlikely to be able to explain why an LLM produces a particular output for a given input.

Testing LLMs

Traditional unit and integration tests, which verify that specific inputs yield specific outputs, are ineffective with LLMs. The range of possible outputs is vast, making it impossible to test all potential inputs and scenarios exhaustively. Instead, ML teams often build evaluation systems that can assess the effectiveness of a model.

De-risking Product Launch

Attempts to de-risk a product launch through early access programs or limited user testing can introduce bias and create a false sense of security. These programs often fail to capture the full range of user behaviour and potential edge cases seen in real-world usage. Instead, it's better to embrace a "ship to learn" mentality and release features earlier, but you need a way to systematically "learn" from what was shipped.

Observability in LLMs

To deal with the challenges LLMs pose, engineering teams have turned to observability as a better way to debug, monitor, and use data from production to inform product improvements. Teams can manage modern systems by collecting relevant information about their applications from within their code and systematically analysing and monitoring this data. This principle can apply equally to products that use modern AI systems.

Prompt Engineering

Prompt engineering, also known as prompting, is a collection of techniques used to guide a Large Language Model (LLM) to generate desired outputs without modifying the model itself. The primary method of communicating with the model is through one or more textual inputs, which can include instructions, data, user inputs, example outputs, and more.

Consider a scenario where you use an LLM to generate a SQL query based on natural language input. The prompt for this task could include the following text, with several placeholders that parameterise the relevant data for the task:

-- You are an AI that turns natural language input into SQL queries. Given the user input, the table, and its columns, produce an SQL statement.

-- Input: Get all orders that Bob has made this year.
-- Output:
select * from orders where customer = 'Bob' and year(order_date) = '2023'

Prompts offer a flexible way to influence the behaviour and output of an LLM, allowing the generated text to be customised for specific tasks, styles, or domains. However, prompt engineering is a subtle and nuanced process. Even minor changes to the prompts can lead to significant differences in the outputs produced by the model.
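
A minimal sketch of how such a prompt might be parameterised in Python; the placeholder names and the table metadata passed in are assumptions for illustration, not a specific product's template.

# Hypothetical prompt template; the placeholders are filled per request.
PROMPT_TEMPLATE = """-- You are an AI that turns natural language input into SQL queries.
-- Given the user input, the table, and its columns, produce an SQL statement.
-- Table: {table} ({columns})
-- Input: {user_input}
-- Output:"""

def build_prompt(user_input: str, table: str, columns: list[str]) -> str:
    return PROMPT_TEMPLATE.format(
        table=table,
        columns=", ".join(columns),
        user_input=user_input,
    )

prompt = build_prompt(
    "Get all orders that Bob has made this year.",
    table="orders",
    columns=["id", "customer", "order_date", "total"],
)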

OpenTelemetry and (W3C) Trace Context

OpenTelemetry, an open standard for telemetry creation, offers a unified framework for capturing and collecting observability data. This is instrumental in gathering insightful data about LLM-based products' user behaviour and system performance.

One of the primary data types supported by OpenTelemetry is a trace. A trace, also known as a distributed trace, is a set of structured logs, known as spans, that are linked by an ID and assigned a duration. Spans can also identify a parent span, enabling the representation of a hierarchy of operations in data.

To leverage OpenTelemetry for modern AI observability, several steps need to be taken:

  1. Automatic Instrumentation Installation: Install automatic instrumentation or a relevant instrumentation library to monitor incoming and outgoing requests. This enables tracking external API call behaviour, such as calls to OpenAI, and correlating that information with a user request to your applications.

  2. OpenTelemetry SDK Installation: Install the appropriate OpenTelemetry Software Development Kit (SDK) for your language into your codebase. Use the OpenTelemetry APIs to create manual instrumentation that captures relevant data and operations before and after a call to a generative AI model.

By integrating OpenTelemetry's automatic tracing instrumentation capabilities with manual instrumentation, you can capture all the necessary data to start systematically analysing user behaviour. This approach allows you to understand how user behaviour influences the results produced by a generative AI model.
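
A minimal sketch of manual instrumentation around a generative AI call using the OpenTelemetry Python API. The attribute names and the call_model function are illustrative assumptions; a real deployment would also configure an exporter and may follow emerging semantic conventions for LLM spans.

from opentelemetry import trace

tracer = trace.get_tracer("llm-feature")

def call_model(prompt: str) -> str:
    # Placeholder for the real call to a generative AI provider.
    raise NotImplementedError

def generate_with_tracing(user_input: str, prompt: str) -> str:
    # Wrap the model call in a span so the prompt, output, and any errors are
    # captured alongside the surrounding request trace.
    with tracer.start_as_current_span("llm.generate") as span:
        span.set_attribute("llm.user_input", user_input)
        span.set_attribute("llm.prompt", prompt)
        try:
            output = call_model(prompt)
            span.set_attribute("llm.output", output)
            return output
        except Exception as exc:
            span.record_exception(exc)
            raise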

Cost Management in Language Model Services

When utilising external services such as OpenAI, it's crucial to understand the cost implications associated with each API call. These services typically provide mechanisms to monitor usage daily or monthly, catering to most organisations' needs. However, for those requiring more detailed insights, there are methods to track costs more granularly.

Token-Based Cost Tracking

In the context of large language models (LLMs), the concept of 'tokens' is central to understanding cost. Tokens are encoded representations of input and output text. When you provide input text to an LLM, it's encoded into a list of tokens, efficiently representing the text. The LLM responds by emitting tokens, which are then decoded into the response text.

Cost Considerations

While cost tracking is essential, it's worth noting that it's rarely the primary concern for users of LLMs. The cost of using most LLMs is relatively low, and trends suggest that they will become even more affordable over time due to increased efficiency and competitive pressures.

Rate Limiting and Cost Estimation

When using an LLM, rate limiting can also be applied at the application level. This, combined with the ability to calculate costs based on token usage and vendor rate limits, simplifies estimating your monthly bill. This approach ensures that you can effectively manage your costs while leveraging the powerful capabilities of LLMs.
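
A minimal cost-estimation sketch based on token counts. The per-1,000-token prices below are placeholders, not actual vendor pricing; substitute the rates published by your provider.

# Hypothetical per-1,000-token prices in USD; replace with your vendor's published rates.
PROMPT_PRICE_PER_1K = 0.0015
COMPLETION_PRICE_PER_1K = 0.002

def estimate_request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of a single LLM API call from its token usage."""
    return (
        prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
        + completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
    )

def estimate_monthly_cost(requests_per_day: int, avg_prompt_tokens: int,
                          avg_completion_tokens: int, days: int = 30) -> float:
    """Rough monthly bill estimate from average traffic and token usage."""
    per_request = estimate_request_cost(avg_prompt_tokens, avg_completion_tokens)
    return per_request * requests_per_day * days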

Understanding and Optimising Large Language Models (LLMs) and Generative AI Systems

Large Language Models (LLMs) and generative AI systems are complex, nondeterministic entities that can handle various inputs. To effectively monitor and optimise these systems, it is crucial to track both inputs and outputs systematically. This process involves five key components:

  1. User inputs
  2. LLM outputs
  3. Data values post parsing/validation of LLM outputs (assuming no errors)
  4. Any errors, whether from LLM output or parsing/validation of the LLM output
  5. User feedback (e.g., thumbs up/down responses)

This assumes that a mechanism for tracking user feedback is integrated into your telemetry. While not mandatory, this feature significantly enhances your system's observability and prompt engineering efforts.

The Importance of Parsing and Validating LLM Outputs

Parsing and validating LLM outputs is a critical aspect of managing these systems. There are several reasons for this:

  1. Security: LLM outputs should be treated as untrusted inputs to your system. Parsing and validating these outputs can help mitigate potential security risks, such as prompt injection attacks.
  2. Versatility: Parsing and validating LLM outputs allows you to use these systems for various applications, not just basic chatbots. This process enables you to validate the outputs against a set of rules, which is crucial for using those outputs in other parts of an application or displaying them to users.
  3. Accuracy: Some prompt-engineering techniques involve having an LLM output pieces of an answer that you manually assemble into the complete answer later. This approach can reduce the complexity of a task for the LLM and increase its accuracy.
  4. Error Handling: Parsing and validating LLM outputs can produce specific and often correctable errors. This allows for a set of "fixups" to the data the LLM returns, which can yield impressive results.

The Need to Capture Both LLM Outputs and Final Outputs

When you parse and validate LLM outputs, the final output (assuming there's no error) is often in a different format from what the LLM initially responds with. Therefore, capturing both the LLM outputs and your final outputs in your telemetry is crucial.
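
A minimal sketch of such a parsing and validation step, assuming the model was asked to return JSON with specific fields; the required field names and the code-fence fixup are illustrative assumptions, not a general-purpose validator.

import json

REQUIRED_FIELDS = {"sql", "explanation"}  # illustrative schema for a structured LLM response

class ValidationError(Exception):
    pass

def parse_llm_output(raw_output: str) -> dict:
    # Fixup: models sometimes wrap JSON in code fences; strip them before parsing.
    cleaned = raw_output.strip()
    cleaned = cleaned.removeprefix("```json").removeprefix("```").removesuffix("```").strip()
    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"LLM output was not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValidationError("LLM output was not a JSON object")
    missing = REQUIRED_FIELDS - parsed.keys()
    if missing:
        raise ValidationError(f"LLM output missing fields: {sorted(missing)}")
    return parsed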

The Importance of Tracking All Errors

Tracking all errors, whether they arise from a network error, a timeout, the LLM itself, or the parsing and validation process, is crucial for understanding what kinds of user inputs can lead to errors. This information is vital for improving the user experience and identifying opportunities to correct an LLM output directly if it fails a parsing/validation step.

Analysing Inputs, Outputs, and Errors

With user inputs, LLM outputs, validation/parsing outputs, and errors at your disposal, you can start analysing them. Using an observability tool, write a query that groups requests by error and frequency. This will provide a prioritised list of issues to fix.
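
In practice this can be as simple as grouping recent telemetry records by their error attribute, as in the sketch below; it assumes each record is a dictionary with optional error, user_input, and llm_output fields, which is an illustrative shape rather than a fixed schema.

from collections import Counter

def prioritised_errors(records: list[dict]) -> list[tuple[str, int]]:
    """Group telemetry records by error type and sort by frequency (highest first)."""
    counts = Counter(r["error"] for r in records if r.get("error"))
    return counts.most_common()

# Example: the most frequent errors become the prioritised list of issues to fix.
records = [
    {"user_input": "show my trades", "error": "json_parse_error"},
    {"user_input": "total volume today", "error": None},
    {"user_input": "delete everything", "error": "validation_failed"},
    {"user_input": "best performing stock?", "error": "json_parse_error"},
]
print(prioritised_errors(records))  # [('json_parse_error', 2), ('validation_failed', 1)]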

Monitoring API Call Performance for Large Language Models (LLMs)

Understanding the performance of API calls to Large Language Models (LLMs) is crucial for maintaining a high-quality user experience. This involves distinguishing between the latency and errors associated with these API calls and those of the complete operations involving an LLM.

Factors Influencing API Call Performance

Several factors can influence the latency and errors experienced with API calls to LLMs. These include:

  1. API Call Frequency: The number of API calls made per user request can significantly impact the performance. For instance, generating a vector embedding for each user input before calling an LLM can increase the number of API calls, potentially affecting the latency.
  2. Rate of API Calls: The frequency of API calls made per minute can also influence the performance. A high rate of calls can lead to rate limiting, timeouts, or errors due to resource unavailability.
  3. Token Count: The average number of tokens passed to an LLM for each request, and the average number of tokens received per request can affect the latency and error rate.
  4. Rate Limiting: The frequency of rate-limited requests can indicate whether the API calls are being throttled, leading to increased latency or errors.
  5. API Call Contribution to Overall Latency: Understanding the proportion of overall user-experienced latency due to an API call to an LLM can help identify areas for optimisation.

Understanding and Evaluating LLM API Performance

While these factors may not always provide direct action points, they are essential for comprehending the overall behaviour of your product. They can also serve as evaluation metrics for different LLM APIs and vendors, enabling you to effectively assess their ability to service your requests.

Monitoring and Service-Level Objectives

Service-level objectives (SLOs) are critical to any system's performance monitoring strategy. They provide a quantifiable measure of the system's performance, allowing teams to identify and address issues proactively. SLOs are typically defined in terms of service-level indicators (SLIs), which are functions that return a binary true or false based on the measurement of specific data, such as requests to OpenAI.

Key Service-Level Indicators: Latency and Error Rates

When setting up SLOs for systems that involve Large Language Models (LLMs), two fundamental SLIs to track are latency and error rates. While other SLOs could be beneficial, these two are often the most critical starting points.

Latency SLOs

Latency is the time it takes for a user to receive a result after initiating a request. It's crucial to monitor latency throughout the entire lifecycle of user interaction with a feature that uses LLMs. This includes the time taken to gather input, build or gather the prompt to an LLM, make additional API calls (such as fetching a vector embedding), make the call to an LLM, and parse/validate results.

Error Rate SLOs

The second SLO to monitor is error rates. This includes any error encountered during the process, whether it originates from an API call or parsing/validation of LLM results.
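
A minimal sketch of these two SLIs expressed as simple predicates over measured events; the five-second latency threshold and the field names are illustrative assumptions, not recommended values.

def latency_sli(event: dict, threshold_seconds: float = 5.0) -> bool:
    """True if the request completed within the latency target."""
    return event["duration_seconds"] <= threshold_seconds

def error_sli(event: dict) -> bool:
    """True if the request completed without any error (API, parsing, or validation)."""
    return event.get("error") is None

def slo_attainment(events: list[dict], sli) -> float:
    """Fraction of events that meet the SLI; compare this against the SLO target (e.g. 99%)."""
    if not events:
        return 1.0
    return sum(1 for e in events if sli(e)) / len(events)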

SLO Monitoring and Alerting

SLO alerts for LLMs should be non-urgent. They exist to inform so that a team can plan a corrective action rather than cause a team to halt everything and fix a problem. It's recommended to send these alerts to a messaging channel, such as one in Slack or Microsoft Teams, and never trigger an alert on platforms like PagerDuty. Unlike LLM SLO alerts, paging alerts should always be directly actionable.

Utilising Observability Data for Product Enhancement

Observability, an essential aspect of system monitoring, is crucial in developing and improving products, especially those utilising Large Language Models (LLMs). Given that LLMs are not traditionally debuggable, the only way to identify improvement areas is by analysing user data. When collected in sufficient quantities, this data can reveal patterns where the product fails to meet user expectations, thus providing opportunities for iteration and improvement.

Leveraging Production Data for Iteration

Using production data for iteration may seem complex, but it's pretty straightforward. All it requires is an observability tool and the ability to interpret the results of a query against that data. The essential data points to consider are any existing error, the input provided by the user, the output of the LLM, and the output of parsing/validation, if it exists.

Addressing Correctable Errors

In many products, LLMs generate outputs that follow a specific structure rather than open-ended text. When the structure output by an LLM is incorrect, you have the option to intervene directly and correct that error.

Building a Prompt Evaluation System Based on Production Data

In the long run, two primary ways to improve your product's use of an LLM are rigorous prompt engineering and fine-tuning an LLM. Both methods require a systematic way to quantify improvements in an LLM's performance, which can be achieved through an evaluation system.

Using Production Data to Power an Evaluation System

The most significant task in an evaluation system is building the dataset to be evaluated. Observability is critical to creating evaluation datasets because it gathers real-world user inputs and system outputs. The data used to power an evaluation system must be representative. If you do not have enough unique data points to evaluate, you will gain a false sense of confidence in your evaluations.

A Collaborative Approach

Observability, particularly in the context of Large Language Models (LLMs), is not a solitary endeavour. It requires a collaborative effort from various roles within an organisation. This principle holds for all software observability. While roles such as Site Reliability Engineering (SRE), DevOps, and platform engineers often take the lead, enhancing software, including LLMs, necessitates the involvement of a diverse range of individuals.

The Role of Different Stakeholders in Observability

Regarding products that utilise LLMs, roles not traditionally associated with observability come into play. These include product managers, ML engineers, and data scientists. While it is not mandatory to have all of these job titles on your team to get started, having individuals who can fulfil the responsibilities typically associated with these roles is crucial.

Key Areas of Understanding for Effective LLM Implementation

Certain areas of understanding and action within your organisation are essential to optimise the use of LLMs in production. These include:

  • Instrumenting your application to emit the necessary telemetry
  • Analysing telemetry with the intent to enhance a feature
  • Monitoring telemetry to ensure your changes are effective
  • Handling end-user feedback
  • Understanding user expectations for your LLM-powered feature
  • Knowing when your data will be representative
  • Cleaning and classifying data effectively for evaluation
  • Setting up production data pipelines for continuous evaluation system improvement
  • Establishing developer tools and infrastructure to support prompt engineering efforts, prompt lifecycle management, and systematic validation of prompt changes against an evaluation system

Shifting Responsibilities and Roles

The introduction of LLMs may necessitate changes in responsibilities for existing roles. Software engineers should focus more on data quality, representativeness, and working with probabilistic systems. ML engineers must adopt a more product-minded approach, understanding user interactions and intended product behaviour. Product managers must familiarise themselves with Python and Jupyter Notebook to participate in prompt-engineering experiments. LLMs are transformative, not just for products but also for the roles people play within an organisation.

Understanding User Interactions

A common theme across many organisations is that LLMs compel individuals at all levels to understand how their users interact with their products. LLMs not only fundamentally alter existing products, but they also enable entirely new categories of products and capabilities. These introduce new modalities for user interaction, and success is only possible with understanding these interactions and user expectations.

Adapting to New Responsibilities

As you begin to use an LLM in production, it is not necessary to immediately hire a host of people with new job titles. However, your organisation must be prepared to adapt: individuals may need to take on responsibilities not traditionally associated with their roles. With this adaptability, your organisation can effectively utilise LLMs in the long term.

The Future of Observability Tools and Practices

Observability tools and practices are a crucial component of modern software development, and their importance escalates when building products that use LLMs. LLMs, being nondeterministic and essentially black boxes, present unique challenges to reliability and require a different approach to development and iteration.

Teams that already practice observability likely find that their existing tools and playbooks transition well into making LLMs in production more reliable. However, they will face new challenges in integrating that data into development for core prompt engineering and model fine-tuning work.

Looking ahead, we can expect advancements in the following domains:

  • Automated tools for Large Language Models (LLMs) instrumentation
  • Enhanced solutions for managing the lifecycle of prompt engineering
  • Advanced utilities for transferring data from production environments to development ones
  • Specialized observability tools focusing on the areas above
  • Improved solutions for simplifying the fine-tuning of LLMs
  • Ready-to-use evaluation frameworks that streamline the construction and operation of evaluation systems

While innovation will lead to improved tools and practices, it's unlikely that a single tool or practice will solve all the challenges involved in making LLMs more reliable. Therefore, adopting a more general approach to making software more reliable and applying it to LLMs is valuable. Software observability is that approach.

· 13 min read

Quantum computing represents a paradigm shift in information processing, offering a fundamentally distinct approach to problem-solving and computation. Unlike classical computers operating in a single state at any given moment, quantum computers leverage the ability to exist in many states concurrently. This unique characteristic has led many researchers to posit that quantum computers could potentially deliver exponential speedups and tackle problems that are currently beyond the reach of classical computers.

Classical Computing: The Foundation

The advent of classical computers has propelled us into the Information Age, catalyzing a myriad of digital revolutions, including personal computing, internet communication, smartphones, machine learning, and the broader knowledge economy. Classical computers encode and manipulate data in units known as bits, using billions of semiconductor components called transistors to switch or amplify electrical signals. A classical bit, akin to the power switch on your electronic device, can exist in one of two states at any given time - 0 or 1. This binary nature of classical information processing forms the bedrock of our current digital landscape.

Quantum Computing: The Mechanics

Quantum computers, on the other hand, process information by harnessing the behaviours of subatomic particles such as electrons, ions, or photons. Data is stored in quantum registers composed of quantum bits or qubits. Unlike classical bits, qubits are not confined to binary states. The principles of superposition, entanglement, and interference govern them.

Superposition

Superposition is a quantum property that allows qubits to exist in multiple states until an external measurement is made. For instance, an electron's state could be a superposition of "spin up" and "spin down". Drawing from the famous Schrödinger's cat analogy, a qubit in superposition is akin to the cat being both dead and alive until observed. A qubit could be in a 0 state, a 1 state, or any complex linear combination of 0 and 1. Upon measurement, the qubit collapses into a binary state, becoming a classical bit.
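
In standard notation (an illustrative aside), such a state is written |ψ⟩ = α|0⟩ + β|1⟩, where α and β are complex amplitudes satisfying |α|² + |β|² = 1; measuring the qubit yields 0 with probability |α|² and 1 with probability |β|².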

Entanglement

Entanglement is another quantum phenomenon where particles become interconnected so that they cannot be described independently, even across vast distances. This is in stark contrast to classical bits, which are independent of each other. In quantum computing, entangled qubits fall into a shared quantum state, and manipulating one qubit can influence the entire system's probability distribution. The number of states also grows exponentially with the addition of each qubit, offering a significant advantage over classical computers.

Interference

Interference, the final quantum property impacting the operation of a quantum computer, arises when the wave functions of the entangled qubits, which together describe the quantum computer's state, add together. Constructive interference increases the probability of a correct solution, while destructive interference decreases it. Quantum algorithms are designed to orchestrate this interference to maximize the likelihood of measuring useful states.

In essence, quantum computing leverages the principles of superposition, entanglement, and interference to process information in ways that classical computers cannot. The potential of quantum computing is immense, with the capability to process more possibilities than the number of atoms in the observable universe, given a quantum computer unaffected by decoherence and noise. However, the field is still nascent, and much research is needed to realize and harness this potential fully.
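
As a small illustration of superposition and entanglement in code, the sketch below builds a two-qubit Bell-state circuit with Qiskit (introduced later in this article). Repeatedly measuring such a circuit on a simulator or real device would, ideally, return '00' and '11' with roughly equal probability and never '01' or '10'; running it requires the qiskit package, and execution would additionally need a simulator or hardware backend.

from qiskit import QuantumCircuit

# Build a Bell state: put qubit 0 into superposition, then entangle it with qubit 1.
circuit = QuantumCircuit(2, 2)
circuit.h(0)       # Hadamard gate: qubit 0 becomes an equal superposition of 0 and 1
circuit.cx(0, 1)   # CNOT gate: qubit 1 flips conditional on qubit 0, entangling the pair
circuit.measure([0, 1], [0, 1])  # collapse both qubits to classical bits

print(circuit.draw())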

The Genesis of Quantum Computing

The inception of quantum computing can be traced back to the Soviet mathematician Yuri Manin, who first proposed the concept in his book, "Computable and Uncomputable," published in 1980. In the same year, American physicist Paul Benioff, affiliated with the French Centre de Physique Théorique, introduced a quantum mechanical model of a Turing machine in a scholarly paper.

In 1981, Benioff and American theoretical physicist Richard Feynman presented separate talks on quantum computing at MIT's inaugural Conference on the Physics of Computation. Feynman, in his lecture titled "Simulating Physics with Computers," underscored the necessity of a quantum computer for simulating a quantum system, famously stating, "Nature isn't classical, dammit, and if you want to make a simulation of nature, you'd better make it quantum mechanical."

Quantum Computing: Gaining Momentum

The pioneering work of Benioff and Feynman sparked a surge of interest in quantum computing during the final decades of the 20th century. British theoretical physicist David Deutsch, intrigued by the potential of a quantum computer to test the "many-worlds interpretation" of quantum physics, proposed the concept of a quantum Turing machine (QTM) in a 1985 paper.

By 1992, Deutsch, in collaboration with Australian mathematician Richard Jozsa, identified a computational problem that could be efficiently solved on a universal quantum computer using their Deutsch-Jozsa algorithm. This problem, they believed, could not be solved efficiently on a classical computer. For his significant contributions, Deutsch is often called the "father of quantum computing."

Quantum Speedup: Shor's and Grover's Algorithms

In the late 20th century, we witnessed the development of several quantum computer models and algorithms. One of the most notable is Shor's algorithm, developed by Peter Shor, an applied mathematician at AT&T Bell Labs. In 1994, Shor introduced a method for factoring large integers in polynomial time, considered "efficiently solvable" or "tractable."

Factoring, the process of decomposing a number into smaller numbers that multiply to give the original number, is a fundamental mathematical operation. While it is straightforward to multiply factors to produce the original number, finding the factors of a large number is challenging due to the vast search space of possible factors.

Shor's algorithm can break existing encryption systems by factoring the products of large primes in polynomial time, which has significant implications for cryptographic systems and data security. However, the practical implementation of Shor's algorithm on a quantum computer capable of breaking current advanced encryption schemes is still a distant reality due to the limitations in the number of qubits in current quantum computers.

In 1996, Lov Grover, a researcher at Bell Labs, made another significant development in quantum computing. Grover introduced a quantum algorithm for database search, which provides a quadratic speedup for one-way function problems typically solved by random or brute-force search. Grover's algorithm leverages the principles of qubit superposition and interference to iteratively amplify solution states and suppress non-solution states, thereby finding the correct solution with high probability. This algorithm is particularly effective for computational problems where finding a solution is difficult, but verifying a solution is relatively straightforward.

Quantum AI/ML

The advent of quantum computing has sparked a wave of excitement due to its potential to revolutionize information processing. This enthusiasm led to the establishment of initiatives for enhanced information sharing, policymaking, and prioritization of national and international research efforts.

In the mid-1990s, the National Institute of Standards and Technology (NIST) and the Department of Defense (DoD) hosted the first U.S. government workshops on quantum computing. By 2000, theoretical physicist David DiVincenzo had outlined the requirements for constructing a quantum computer, known as the DiVincenzo criteria. These criteria include well-defined qubits, initialization to a pure state, a universal set of quantum gates, qubit-specific measurement, and long coherence times.

In 2002, an expert panel convened by Los Alamos National Laboratory released a Quantum Information Science and Technology Roadmap. This roadmap aimed to capture the challenges in quantum computing, provide direction on technical goals, and track progress toward those goals through various technologies and approaches. The panel adopted the DiVincenzo criteria to evaluate the viability of different quantum computing approaches.

Significant milestones were achieved as evaluations of quantum computing models and approaches yielded physical hardware and valuable algorithms. In 1995, Christopher Monroe and David Wineland demonstrated the first quantum logic gate with trapped ions, an indispensable component for constructing gate-based quantum computers. A decade later, researchers at the University of Michigan created a scalable and mass-producible semiconductor chip ion trap, paving the way for scalable quantum computing.

In 2009, researchers at Yale University made the first solid-state gate quantum processor. Two years later, D-Wave Systems of Burnaby, British Columbia, became the first company to market a commercial quantum computer. D-Wave's machine, which uses a unique approach to analogue computing known as quantum annealing, is not a universal quantum computer but is specialized for problems where the search space is discrete, with many local minima or plateaus, such as combinatorial optimization problems.

The introduction of the original D-Wave machine highlighted the potential economic rewards and national security dividends of advances in quantum hardware and software. However, the research involved would be expensive and risky. This led to partnerships between private-sector companies and government agencies in the early 2000s. Early adopters of D-Wave quantum computers included Google in alliance with NASA, Lockheed Martin Corporation in cooperation with the University of Southern California, and the U.S. Department of Energy's Los Alamos National Laboratory.

Recognizing the potential of quantum computers in solving intractable problems in computer science, especially machine learning, Google Research, NASA, and the Universities Space Research Association established a Quantum Artificial Intelligence Lab (QuAIL) at NASA's Ames Research Center in Silicon Valley. NASA aims to use hybrid quantum-classical technologies to tackle some of the most challenging machine learning problems, such as generative unsupervised learning. IBM, Intel, and Rigetti are also pursuing goals to demonstrate quantum computational speedups over classical computers and algorithms in various areas, sometimes called quantum supremacy or quantum advantage.

In 2017, University of Toronto assistant professor Peter Wittek founded the Quantum Stream in the Creative Destruction Lab (CDL). Quantum Stream encourages scientists, entrepreneurs, and investors to pursue commercial opportunities in quantum computing and machine learning. Quantum Stream's technology partners include D-Wave Systems, IBM Q, Rigetti Computing, Xanadu, and Zapata Computing. Numerous startups and well-established companies are also forging ahead to create their quantum computing technologies and applications.

In November 2021, IBM Quantum announced Eagle, a 127-qubit quantum processor. However, the University of Science and Technology of China also claimed a 66-qubit superconducting quantum processor called Zuchongzhi and an even more powerful photonic quantum computer called Jiuzhang 2.0 in the same month.

Determining who has achieved primacy in quantum computing is challenging due to the murky process of verifying and benchmarking quantum computers and the inherent diversity in current approaches and models of quantum computers. There is excitement surrounding various models for manipulating qubits: gate model quantum computing, quantum annealing, adiabatic quantum computing (AQC), and topological quantum computing. There is also great diversity in methods for building physical implementations of quantum systems.

The physical implementation of quantum computers is crucial because quantum computers and qubits are notoriously difficult to control. Information stored in qubits can escape when they become accidentally entangled with the outside environment, the measurement device and controls, or the material of the quantum computer itself. This seepage of quantum information is called decoherence. Qubits must also be physically shielded from any noise: changing magnetic and electrical fields, radiation from other electronic devices, cosmic rays from space, radiation from warm objects, and other rogue particles and waves.

In 2018, President Donald Trump signed the National Quantum Initiative Act into law. The act is designed to plan, coordinate, and accelerate quantum research and development for economic and national security over a ten-year period. Funded under the National Quantum Initiative Act is the Quantum Economic Development Consortium™ (QED-C™), with NIST and SRI International as lead managers.

Several critical online resources support quantum computer science. The Quantum Algorithm Zoo, a comprehensive catalogue of quantum algorithms, is managed by Stephen Jordan in Microsoft Research's Quantum Systems group. IBM hosts the Quantum Experience, an online interface to the company's superconducting quantum systems and a repository of quantum information processing protocols. Qiskit is an open-source software development kit (SDK) for anyone interested in working with OpenQASM (a programming language for describing universal physical quantum circuits) and IBM Q quantum processors. In collaboration with the University of Waterloo, the "moonshot factory" X, and Volkswagen, Google AI announced TensorFlow Quantum (TFQ) in 2020; TFQ is a Python-based open-source library and framework for hands-on quantum machine learning.

The Revolution in Information Science and Technology

Quantum computing, a field that leverages the principles of quantum mechanics, has already begun to influence various sectors, including machine learning (ML) and artificial intelligence (AI), genomics, drug discovery, and more. Quantum simulation, a notable application, could expedite the prototyping of materials and designs, potentially revolutionizing industries like manufacturing and aerospace.

However, the current capabilities of quantum computers are limited to simulating only a few particles and their interactions. Despite this, researchers are uncovering promising insights that could help us understand complex phenomena like superconductivity, environmental-friendly production methods, and the intricacies of aerodynamics.

A Game Changer in Encryption and AI

In recent years, there have been several groundbreaking developments in quantum computing. For instance, the National Security Agency's SIGINT initiatives, revealed by Edward Snowden in 2014, aimed to break strong encryption and gain access to secure digital networks. The agency planned to develop an $80 million quantum "god machine" for these purposes.

Moreover, researchers have made strides in understanding quantum Darwinism, a theory that explains how our classical physics world emerges from the quantum world. This theory suggests that the transition from quantum to classical is akin to the process of evolutionary natural selection.

A New Era of Information Technology

The convergence of quantum computing and AI, often called Quantum AI/ML (QAI), could drastically transform information science and technology, economic activities, social paradigms, and political arrangements. This shift could lead to a post-scarcity golden age where quantum AI democratizes access to limitless computational possibilities.

Johannes Otterbach, from the quantum-computer company Rigetti, has noted that quantum computing and machine learning are inherently probabilistic, making them natural partners. Quantum computers could significantly speed up training in machine learning, advancing all three primary subcategories of ML: supervised learning, unsupervised learning, and reinforcement learning.

Revolutionizing Various Industries

Quantum computing and AI have already begun to intersect in various applications. For instance, quantum algorithms have been developed for route and traffic optimization, computing the quickest route for each vehicle in a fleet and optimizing it in real-time. Companies like Toyota Tsusho Corp and Volkswagen have demonstrated the potential of these quantum routing algorithms to reduce wait times and traffic congestion.

Predictive and risk analytic QAI technology could also aid in forecasting and managing hazards such as geopolitical events, financial panics, and future pandemics. Furthermore, quantum AI could revolutionize fields like seismology, geological prospecting, and medical imaging.

Quantum Ultra-intelligence: The Future of AI

The potential of quantum artificial intelligence has inspired a new literary subgenre called quantum fiction. While these works are purely fictional, they reflect the aspirations of computer scientists striving to engineer an artificial general intelligence (AGI) that possesses self-awareness.

However, it remains to be seen whether we can maintain control over a self-aware QAI or persuade it into a collaborative partnership with humanity. As we continue to develop these technologies, ensuring they remain beneficial to society is crucial.

Summary

The convergence of quantum computing and AI is set to revolutionize various aspects of our lives. As Max Tegmark, an MIT physicist and ML specialist has said, "Everything we love about civilization is a product of intelligence, so amplifying our human intelligence with artificial intelligence has the potential of helping civilization flourish like never before—as long as we manage to keep the technology beneficial."

· 5 min read

Artificial Intelligence (AI) is rapidly transforming the landscape of database systems, with its influence permeating operational and service-delivery aspects. This transformation is a symbiotic one. On the one hand, AI is being leveraged to enhance database performance, facilitating autonomous and semi-autonomous operations and data service delivery. On the other hand, databases are integral to AI and Machine Learning (ML), as they manage and supply high-quality, reliable data when needed.

Leveraging AI for Enhanced Database Performance

Artificial intelligence and machine learning technologies hold the potential to improve the performance of various types of databases. They can be utilized for tasks such as discovering, processing, and searching datasets, delivering quick results. As Thomas Davenport and Thomas Redman noted in the MIT Sloan Management Review, AI is subtly enhancing data management, including aspects like data quality, accessibility, and security. They further elaborate that managing data is a labour-intensive activity that involves cleaning, extracting, integrating, cataloguing, labelling, and organizing data, among other tasks.

Today's data managers are challenged to provide improved data capabilities within limited or relatively static budgets. With organizations sourcing and ingesting more data than ever before, often in the gigabyte and multiterabyte range, this data must be readily available to business users, data scientists, and mission-critical applications. AI is revolutionizing the way databases operate today, autonomously improving database query development and performance and managing databases' daily operation, provisioning, and security.

Emerging methodologies that incorporate AI in database management include AIOps, which applies AI to streamline and automate data operations; DataOps, which involves the application of intelligent collaboration and automation to data pipelines; and DataSecOps, which pertains to data security operations on cloud-native databases.

Applying AI to database functions will enable data engineers, architects, administrators, and scientists to focus on more significant tasks beyond routine maintenance. These tasks include digital transformation and innovation, crucial for thriving in today's highly competitive environment.

The Role of Databases in AI Development

Databases are indispensable to AI development. AI's success hinges on the availability of meaningful and relevant data, making a well-managed database the bedrock of AI. The quality of AI models and algorithms is directly proportional to the quality of data they are fed. Organizations rely on databases operating at optimal performance to supply timely and pertinent data for training datasets and large language models.

In the future, enterprises and data managers must pinpoint the data crucial for training models and address potential data shortages for maintaining these models. The data fuelling AI systems must be current and relevant to business issues, often in real time. Moreover, this data must be of the highest quality and trustworthiness.

Data utilized by ML models is often "raw" or unstructured, necessitating content delivery networks as part of a high-performance data architecture. While simple time-series data can be accumulated and stored in a database, training using audio or image data often exceeds the capabilities of databases. A content delivery network—comprising interconnected servers that cache such assets close to applications or end users—may be more appropriate in such cases.

Databases supporting AI initiatives must also manage a broad spectrum of data types, from structured to unstructured. Distributed SQL databases with Hybrid Transactional/Analytical Processing (HTAP) capabilities meet this requirement, delivering real-time analytical data of all types when and where needed.
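
As a simplified illustration of the HTAP idea, one system serving both transactional writes and analytical reads, the following Python sketch uses SQLite purely to show the two workload styles hitting the same table; SQLite is neither distributed nor a true HTAP engine, and the table, columns, and figures are invented for the example.

```python
import sqlite3

# An in-memory database stands in for a distributed HTAP system (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE trades (
        trade_id    INTEGER PRIMARY KEY,
        symbol      TEXT NOT NULL,
        quantity    INTEGER NOT NULL,
        price       REAL NOT NULL,
        executed_at TEXT NOT NULL
    )
    """
)

# Transactional workload: individual trade executions are written as they occur.
conn.executemany(
    "INSERT INTO trades (symbol, quantity, price, executed_at) VALUES (?, ?, ?, ?)",
    [
        ("ACME", 100, 12.50, "2024-01-15T09:31:00"),
        ("ACME", 250, 12.55, "2024-01-15T09:32:10"),
        ("GLOBX", 75, 98.10, "2024-01-15T09:33:45"),
    ],
)
conn.commit()

# Analytical workload: the same table answers an aggregate question immediately,
# with no export to a separate warehouse.
for symbol, volume, avg_price in conn.execute(
    "SELECT symbol, SUM(quantity), AVG(price) FROM trades GROUP BY symbol"
):
    print(f"{symbol}: volume={volume}, avg_price={avg_price:.2f}")
```

In a genuine distributed HTAP database, the transactional and analytical paths are typically served by different storage engines or replicas under the hood, but the application-facing contract is the same: one system, both workloads.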

The Value Addition of Generative AI

Generative AI—offered by platforms like OpenAI's ChatGPT, Google's Bard, or Microsoft's Bing Chat—can potentially revolutionize various aspects of the database world. Operationally, generative AI can generate code for applications or scripts that boost database performance and integration. This allows database developers, architects, engineers, and administrators to undertake higher-level tasks and respond more swiftly to business needs.

Generative AI can also aid in database configuration and assist in designing a high-performance data architecture, leveraging patterns and experiences stored locally or across the network.

From a service-delivery perspective, modern databases will be tasked with preserving the data used within large language models for enterprise-specific instances of generative AI. This data underpins recommendations delivered to database teams and across the broader business.

The Evolution of SQL Development in the Age of AI Innovation

The emergence of AI has significantly broadened the capabilities of databases and the roles of those working with them. AI enables the automatic construction of simple SQL queries through natural language processing prompts, minimizing or eliminating the need for coding. An AI-driven SQL interface can also suggest questions based on an analysis of the backend database.
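
As a minimal sketch of that flow, the snippet below assumes a hypothetical generate_sql() helper that would forward a natural-language prompt and a schema description to a language model and return a candidate SQL statement; here it returns a canned query so the example stays self-contained and runnable, and the schema and data are invented for illustration.

```python
import sqlite3


def generate_sql(prompt: str, schema: str) -> str:
    """Hypothetical NL-to-SQL helper. In practice this would call a language
    model with the prompt and schema; here it returns a fixed query so the
    sketch has no external dependencies."""
    return (
        "SELECT region, SUM(amount) AS total_sales "
        "FROM sales GROUP BY region ORDER BY total_sales DESC"
    )


# Toy database and schema for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 1200.0), ("APAC", 950.0), ("EMEA", 300.0)],
)

schema = "sales(region TEXT, amount REAL)"
query = generate_sql("Show total sales by region, highest first", schema)
print(query)
for region, total in conn.execute(query):
    print(region, total)
```

The useful property is that the generated statement is small enough for a non-technical user, or a reviewing developer, to verify by inspection before it is run against production data.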

Generative AI, for example, has much to offer for ad hoc or natural language queries created by non-technical users. For programmers, AI has shown proficiency in generating syntactically correct windowing functions, which are often tedious to develop and beyond the skillset of most business users. Machine learning techniques can generate simple queries for non-experts, which can be easily verified to produce accurate results. AI has already demonstrated its ability to comprehend natural language queries that aid programming on MySQL, making MySQL a preferred target because of the abundance of training data available for it. AI can understand a schema and apply SQL best practices. However, it still struggles to distinguish between transactional and analytical queries and to maintain consistency across shards. This calls for an AI-assisted programming approach built on a more versatile, user-friendly, and flexible database.
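
To make the windowing point concrete, here is the kind of query such a tool is often asked to draft: a running total of traded quantity per symbol, ordered by execution time. The example runs against SQLite, which supports window functions from version 3.25 onward, and the table and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, executed_at TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("ACME", "09:31", 100),
        ("ACME", "09:32", 250),
        ("GLOBX", "09:33", 75),
        ("ACME", "09:40", 50),
    ],
)

# A windowing query of the sort generative AI tools are frequently asked to write:
# a per-symbol running total, partitioned by symbol and ordered by execution time.
query = """
SELECT symbol,
       executed_at,
       quantity,
       SUM(quantity) OVER (
           PARTITION BY symbol
           ORDER BY executed_at
       ) AS running_quantity
FROM trades
ORDER BY symbol, executed_at
"""
for row in conn.execute(query):
    print(row)
```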

This emerging architectural approach supports the delivery of real-time, AI-driven insights and capabilities. Used alongside streaming technologies, databases are becoming the backbone of real-time AI.

AI introduces new methods for building and managing databases and elevates the roles databases play. Enterprises need to prepare for and harness AI's power with scalable, highly available data architectures capable of processing mixed workloads and delivering intelligence on demand.

· 4 min read

Artificial Intelligence (AI) is a dynamic field that combines computing and cognitive sciences. It involves the study of intelligent agents that can perceive and respond to their environments to achieve specific goals. Examples of these agents range from chatbots like Siri and Alexa to sensor and actuator-based systems found in Roomba vacuum cleaners and Tesla cars. AI aims to mimic human and animal intelligence and creativity through machines and code.

AI research focuses on human behaviours and characteristics such as pattern recognition, problem-solving and decision-making, learning and knowledge representation, communication, and emotions. While some AI advancements, like Boston Dynamics robots performing synchronized gymnastics, garner millions of views on social media, many AI applications, such as recommendation and search engines, banking and investment software, and shopping and pricing bots, have become so commonplace that their effectiveness is often overlooked.

The concept of thinking machines dates back to ancient civilizations. From Hesiod's tale of the lethal autonomous robot Talos in 700 BC to Samuel Butler's 1872 utopian novel Erewhon featuring conscious, self-replicating machines, the idea of artificial intelligence has been a recurring theme in literature and film. This theme often explores authenticity, personhood, companionship, loneliness, dystopia, and immortality.

The development of AI as a scientific field is interdisciplinary. It draws ideas from cybernetics, which studies the role of mammalian neural pathways and connections in producing homeostasis and intelligent control. This field inspired pioneers like John von Neumann, Warren McCulloch, Walter Pitts, and Claude Shannon, whose work continues to influence system theory, artificial neural networks, and AI.

Cognitive psychology, which emerged as a reaction against behaviourism in the 1950s, is another source of AI ideas. It combines the information theory work of Claude Shannon, Alan Turing's conception of mental activity as computation, Allen Newell and Herbert Simon's information processing models of human perception, memory, communication, and problem-solving, and Noam Chomsky's generative linguistics.

A third source for AI is rule-based and symbolic representations of problems, also known as good old-fashioned AI (GOFAI). GOFAI nurtured knowledge-based expert systems that emulated human decision-making in various academic fields and commercial applications.

Machine learning, a subfield of AI, uses computer algorithms to build systems that can learn autonomously from data and experience. It is divided into three broad types: supervised, unsupervised, and reinforcement learning. Supervised learning algorithms rely on labelled training data provided by human specialists, while unsupervised machine learning algorithms search for patterns or structures in unlabelled datasets. Reinforcement learning involves intelligent agents interacting directly with the environment to achieve rewards or attain goals through trial, error, and feedback.
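
A minimal sketch of the first two categories follows, assuming NumPy and scikit-learn are available; the toy data, model choices, and thresholds are illustrative only, and reinforcement learning is omitted because it requires an interactive environment rather than a static dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=0)

# Supervised learning: features paired with labels a human specialist provided.
X_labelled = rng.normal(size=(100, 2))
y = (X_labelled[:, 0] + X_labelled[:, 1] > 0).astype(int)  # toy labels
classifier = LogisticRegression().fit(X_labelled, y)
print("Predicted class:", classifier.predict([[0.5, 0.5]])[0])

# Unsupervised learning: the same kind of features but no labels; the algorithm
# searches for structure (here, two clusters) on its own.
X_unlabelled = np.vstack([
    rng.normal(loc=-2.0, size=(50, 2)),
    rng.normal(loc=2.0, size=(50, 2)),
])
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X_unlabelled)
print("Cluster assignments for the first five points:", clusters[:5])
```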

Deep learning is a subfield of machine learning inspired by the structures and functions of the human brain. It has led to significant advances in speech recognition and natural language processing (NLP), computer vision and image recognition, neuromorphic computing, sustainability science, bioinformatics, and smart devices and vehicles.

AI research is also influenced by philosophy, particularly the questions of ethics and consciousness. The ethics of AI extend back to Isaac Asimov's Three Laws of Robotics and continue to be a topic of discussion today, especially with the rise of algorithmic bias and discrimination. Efforts to address these issues include:

  • the European Union's General Data Protection Regulation (GDPR)
  • the Ethics Guidelines for Trustworthy AI (2018)
  • the proposed Artificial Intelligence Act (2021)

AI autonomy in motor vehicles, autonomous weapons systems, and caregiver robots presents new opportunities and threats. SAE International (formerly the Society of Automotive Engineers) defines six levels of driver automation, ranging from level 0, where the human driver is in complete control, to level 5, where the vehicle can drive itself under all conditions without a human being present.

Lethal autonomous weapons systems (LAWS) are divided into levels of AI autonomy, from human-in-the-loop weapons that operate under direct human authority to human-out-of-the-loop weapon systems that identify, target, and destroy enemies without human oversight.

AI can also significantly impact the nature and future of work. While it threatens to displace millions of workers in various industries, it could also address persistent shortages in others, such as trucking. The impact could be even more significant with advances in quantum artificial intelligence (QAI) and superintelligence, which could revolutionize areas like traffic management, pharmaceutical discovery, and military encryption but could also pose existential risks to human civilization.