Data ingestion

Collect, transfer, store, and normalize data from various sources. Create a complete and correlated dataset of cloud usage and costs for analysis.

Manage data sources

  • Identify suitable external data sources based on the requirements of the reporting and analysis capability and the unit economics capability.

  • Identify suitable internal business data with contextual information to support cost allocation policies.

  • Identify suitable data sources for the cross-functional collaboration framework.

  • Define the granularity for each data source.

  • Define the elements, dimensions, and metrics to collect from each data source.

  • Establish and maintain relationships with vendors. This includes data source providers and tool providers that acquire or process data for the organization.

Ensure data quality

  • Define and maintain mechanisms to produce stable and normalized data for FinOps practices.

  • Define and maintain mechanisms to assess data quality and consistency.

  • Adjust and manage data source documentation and content expectations based on changes identified in ingested data.

  • Develop and maintain observability and alerting features. These features notify teams when the data ingestion process exceeds established limits.

  • Inform data source owners and data users about issues with availability, quality, or consistency.

  • Collect policy and governance requirements related to data ingestion frequency, timeliness, granularity, auditing, and data protection. This includes data for usage, cost, and supplementary information such as sustainability and observability.
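
The quality-assessment and alerting mechanisms described above can be sketched as a small batch check. This is a minimal illustration, not a prescribed implementation: the `CostRecord` fields, the `check_quality` function, and the 1% default threshold are all hypothetical, and a real pipeline would choose checks that match its own data sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostRecord:
    """One ingested cost line. All field names here are hypothetical."""
    record_id: str
    service: Optional[str]
    cost: Optional[float]


def check_quality(records: list[CostRecord], max_bad_ratio: float = 0.01) -> dict:
    """Count basic quality problems and flag the batch if too many rows fail."""
    seen_ids: set[str] = set()
    issues = {"missing_field": 0, "duplicate_id": 0}
    bad_rows = 0
    for rec in records:
        row_bad = False
        # A row with no service name or no cost cannot be allocated or summed.
        if rec.service is None or rec.cost is None:
            issues["missing_field"] += 1
            row_bad = True
        # A repeated identifier suggests the same line was ingested twice.
        if rec.record_id in seen_ids:
            issues["duplicate_id"] += 1
            row_bad = True
        seen_ids.add(rec.record_id)
        if row_bad:
            bad_rows += 1
    bad_ratio = bad_rows / len(records) if records else 0.0
    # "alert" is True when the share of bad rows exceeds the agreed limit.
    return {"issues": issues, "bad_ratio": bad_ratio, "alert": bad_ratio > max_bad_ratio}
```

The returned `alert` flag is the point at which a team would notify data source owners and data users; how that notification is delivered is left open here.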

Ensure data timeliness and availability

  • Identify or design a data warehouse to store data that meets reporting and analysis needs. Define the access methods, maintenance responsibilities, and management policies for the data warehouse.

  • Define normalization operations between data sources. Specify where to perform normalization and which standards and database-related keys to use.

  • Submit data requirements to data providers based on the needs for reporting, analysis, and unit economics.

  • Maintain the data warehouse. Ensure it has the appropriate scale, cost, performance, elasticity, and availability throughout its lifecycle.

  • Incorporate data ingestion metrics and performance into unit economics.

  • Provide guidance to everyone in the organization on how to access ingested data. Clarify expectations for data quality and availability.

Definition

Data ingestion covers the collection, transfer, processing, transformation, and correlation of various datasets. It creates a data warehouse with appropriate granularity, accessibility, and integrity. This ensures the data is queryable and contextualized to support the activities of all FinOps roles across all FinOps capabilities.

Data ingestion requirements vary greatly depending on how each organization practices FinOps:

  • An organization that relies entirely on a cloud service provider's tools for all FinOps capabilities may not need to import or process any data.

  • An organization that uses a third-party FinOps tool provider can rely fully or partially on the provider to acquire data.

  • An organization that requires data beyond cloud usage data will have a more complex data ingestion process.

Therefore, as an organization evolves and its data needs for other capabilities change, its data ingestion capability must also evolve.

The data that supports FinOps activities must include cloud cost and usage data from cloud services or other pay-as-you-go service providers. It may also include the following:

  • Modified cloud bill data, such as adjusted or marked-up data

  • Carbon usage data

  • Cloud resource utilization or performance data

  • Observability data

  • Data from on-premises, hybrid, or Apsara Stack clouds

  • Metadata from a Configuration Management Database (CMDB) or other service management systems

  • IT Asset Management (ITAM) data for license or issuance fees

  • Cloud data from specialized tools, such as Kubernetes usage

  • Business data, such as revenue, user count, or transaction volume

  • Other data or metadata that provides context for cloud usage and costs

FinOps capabilities such as cost allocation, reporting and analysis, and unit economics provide specific requirements. These requirements determine the necessary data sources, granularity, normalization level, correlation, and storage methods at any given time.

Effective FinOps requires detailed data streams that are updated regularly and iterated frequently. These streams include information such as usage, utilization, and cost. This data must be classifiable, correlatable, and analyzable to provide a solid foundation for decision-making.

For organizations new to the cloud, data ingestion can be an initial challenge. Cloud service providers' cost and usage datasets are large and complex. Each provider uses proprietary schemas and data structures. The complexity, scale, inconsistency, and latency of cloud data create obstacles for using standard business intelligence (BI) tools or building custom ones. Given the size and scale of the data, effective analysis is difficult without advanced technology or big data capabilities.

The FinOps Open Cost & Usage Specification (FOCUS) project is dedicated to providing consistency and standardization for cloud cost data. Its goal is to eventually expand to data sources from other providers, including SaaS provider data, sustainability data, license publisher data, and data from Apsara Stack and observability providers. As vendors and data providers adopt the FOCUS specification, organizations will benefit from the interoperability of common and custom tools.
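
As a rough illustration of what normalization toward a common schema can look like, the sketch below renames provider-specific fields to a few FOCUS-style column names (`ProviderName`, `ServiceName`, `BilledCost`, `ResourceId`). The raw field names, the provider keys, and the `normalize` function are hypothetical, and the column set should be checked against the FOCUS version an organization actually adopts.

```python
from decimal import Decimal

# Hypothetical raw field names for two providers, mapped to FOCUS-style columns.
FIELD_MAPS = {
    "provider_a": {"product": "ServiceName", "amount": "BilledCost", "instance": "ResourceId"},
    "provider_b": {"svc_name": "ServiceName", "charge": "BilledCost", "res_id": "ResourceId"},
}


def normalize(provider: str, raw: dict) -> dict:
    """Rename one raw billing row into the shared column names."""
    mapping = FIELD_MAPS[provider]
    row = {"ProviderName": provider}
    for source_field, target_column in mapping.items():
        # Missing source fields become None so gaps stay visible downstream.
        row[target_column] = raw.get(source_field)
    # Store cost as Decimal so later sums do not pick up float rounding errors.
    if row["BilledCost"] is not None:
        row["BilledCost"] = Decimal(str(row["BilledCost"]))
    return row
```

Once rows from both providers share the same columns, they can be stored and queried together without provider-specific logic in each report.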

Observability platforms, security platforms, carbon usage platforms, and business operations applications also have very large datasets that may need to be correlated with cloud data. Metadata created as part of a tagging or cost allocation policy, managed by the cost allocation capability, can provide important keys for correlating, contextualizing, and summarizing these datasets. Data ingestion ensures that tags or labels generated by the cloud platform are integrated and mapped to internal allocation metadata.
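
The mapping of platform tags to internal allocation metadata might be sketched as follows. The tag keys, the internal key names, and the `map_tags` function are assumptions made for illustration; an actual mapping would come from the organization's cost allocation policy.

```python
# Hypothetical mapping from provider tag keys to internal allocation keys.
TAG_KEY_MAP = {
    "costcenter": "cost_center",
    "cost-center": "cost_center",
    "team": "owner_team",
    "env": "environment",
}


def map_tags(provider_tags: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Translate provider tags to internal keys and report keys with no mapping."""
    mapped: dict[str, str] = {}
    unmapped: list[str] = []
    for key, value in provider_tags.items():
        # Compare keys case-insensitively because tag casing often varies by team.
        internal_key = TAG_KEY_MAP.get(key.strip().lower())
        if internal_key is None:
            unmapped.append(key)
        else:
            mapped[internal_key] = value.strip()
    return mapped, unmapped
```

Returning the unmapped keys separately makes gaps in the tagging policy visible instead of silently dropping them.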

The data ingestion capability can discover or build cloud data sources for reporting and analysis. An organization can create a shared data warehouse for cloud data or use an existing one. The choice depends on the data's complexity, the organization's needs, and whether the data must be connected to other data sources.

The goal of data ingestion is not to accumulate the largest, most granular, real-time dataset available. Instead, the goal is to collect and integrate valuable data for the organization at its current maturity level. Over time, the required data will also change as an organization's analytical needs mature, its service types diversify, it uses other cloud or SaaS products, and its internal policies and usage patterns change.

When an organization's data needs change, the data ingestion capability responds with actions such as building or adding data sources, obtaining more detailed data, combining data with metadata, standardizing data formats, developing specific tools, and improving data ingestion efficiency and convenience. This work proceeds incrementally as FinOps maturity increases and as value is gained from investing in these steps.

Maturity assessment

Crawl

  • Use tools provided by the cloud service provider without acquiring specific data.

  • Use summary files of cost and usage data, or retrieve aggregate data from a single data source or cloud service provider through an API.

  • Analyze data from different sources separately without normalization.

  • Focus primarily on cloud cost and usage data, with little to no data collected from other sources.

  • Apply uniform tags, naming standards, and hierarchies in key cloud platforms and data resources to manually correlate different sources.

  • Data sources required for cost allocation, reporting, and analysis are available, such as utilization data, carbon data, and internal metadata.

  • Summarizing data requires manual intervention or multi-step transformations.

Walk

  • Import resource-level data from multiple cloud service providers and other relevant data sources.

  • Normalize cost metrics from different data sources to provide a standard, consistent data warehouse.

  • Use one or more third-party FinOps tools or platforms to normalize data, or use the dimensions and metrics from a FOCUS-compliant provider.

  • Generate consistent reports for different clouds, possibly using different reporting methods.

  • Map data to the business and adapt it as business needs change.

  • Collect most historical data to analyze annual trends.

  • Collect performance data, utilization data, and carbon usage data.

  • Data integrity check mechanisms are in place.

Run

  • A comprehensive and unified data warehouse is available that includes data on cloud usage, costs, performance, sustainability, and utilization.

  • Acquire data at the most granular level to support more complex data analysis or reporting needs.

  • Use FOCUS or other standards to normalize dimensions and cost metrics from all data sources, achieving consistent reporting across multicloud environments.

  • Map data to the business as needs change and collect historical data.

  • Collect all historical data for in-depth trend analysis.

  • Collect non-cloud data from SaaS, software license vendors, or other content providers.

  • Data integrity mechanisms are in place, including quality checks and automated processes.

Functional activities

FinOps practitioners

  • Collaborate with other FinOps roles to identify a list of data sources that meet current reporting, analysis, and operational needs.

  • Identify data gaps and collaborate with relevant teams to improve source data.

  • Determine the required granularity for each data source.

  • Establish a data normalization model to correlate fields from various sources.

  • Proactively validate data source content regularly. Track changes, respond promptly with adjustments, re-document, and inform all relevant personnel.

  • Throughout the data lifecycle, ensure that data sources and data stores contain accurate cost and usage information, are appropriately sized, are backed up promptly, and are well managed.

  • Provide data access methods to ensure data is accessible to those who need it.

  • Create reports and document expectations. Update them regularly as maturity increases.

  • Use the FOCUS test case library and collaborate with other FinOps roles to determine the FOCUS datasets required for FinOps practices.
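
One way to support the regular validation of data source content described above is to compare the columns a source delivers against the columns that were documented. The sketch below assumes a hypothetical `detect_schema_drift` helper and illustrative column names; it only detects added and removed columns, not changes in meaning or type.

```python
def detect_schema_drift(expected: set[str], actual: set[str]) -> dict[str, list[str]]:
    """Report columns that appeared or disappeared relative to the documented schema."""
    return {
        # Columns delivered by the source but not yet documented.
        "added": sorted(actual - expected),
        # Documented columns that the source no longer delivers.
        "removed": sorted(expected - actual),
    }
```

A non-empty result is a prompt to adjust processing, update the documentation, and inform the people who rely on the data.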

Product

  • Provide business or product information, define KPIs, or supply other required information based on FinOps practice requirements.

Finance

  • Provide access permissions to data sources as required by FinOps practices.

  • Ensure the finance department uses the latest, most appropriate data sources for reporting, forecasting, and decision-making.

  • Participate in or lead data integrity and quality validation efforts. Ensure that invoices, data sources, and other information can be correlated and matched as expected. This typically involves reconciling data monthly or periodically by comparing data from the cloud service provider's native tools with standardized data.

  • Minimize or eliminate data changes outside of the source data systems. Record and reconcile all source changes to ensure traceability.
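
The periodic reconciliation between provider figures and standardized data can be sketched as a total comparison with a tolerance. The `reconcile` function and the 0.01 default tolerance are assumptions for illustration; finance teams would set their own tolerance and usually reconcile at a finer level than a single total.

```python
from decimal import Decimal


def reconcile(
    invoice_total: Decimal,
    normalized_costs: list[Decimal],
    tolerance: Decimal = Decimal("0.01"),
) -> dict:
    """Compare an invoice total with the sum of normalized cost rows."""
    # Start the sum from Decimal zero so an empty list still yields a Decimal.
    normalized_total = sum(normalized_costs, Decimal("0"))
    difference = invoice_total - normalized_total
    return {
        "normalized_total": normalized_total,
        "difference": difference,
        # Small differences within the tolerance are treated as rounding noise.
        "matched": abs(difference) <= tolerance,
    }
```

Using `Decimal` rather than floating-point values keeps currency sums exact, which matters when the comparison threshold is a fraction of a cent.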

Procurement

  • In contracts or managed interactions with data source vendors, clearly define data ingestion requirements and standards to ensure the necessary data access permissions are obtained.

  • Make necessary requests to data source vendors to ensure contractual obligations and terms are enforced in line with the needs of FinOps practices.

Engineering

  • Within the scope of engineering permissions, provide access to performance and usage monitoring information for use by the FinOps data warehouse.

  • Identify issues or inconsistencies in the data used for analysis or reporting to maintain data integrity, accuracy, and quality.

Leadership

  • Support a centralized data normalization policy to fulfill requests for the different types of information required by FinOps practices.

  • Promote and reinforce communication that emphasizes the need to establish a single source of truth for cloud usage and cost data to support reporting and decision-making.

Related roles

  • Within their scope of permissions, provide access to data sources or content to correlate information with the FinOps data warehouse content.

  • Develop data architectures and formats to support data normalization and information correlation in the FinOps data warehouse over time.

  • Identify emerging FinOps data issues, inconsistencies, or changes to help the data store owner maintain data quality and availability.

  • Collaborate with third-party platform providers to resolve data discrepancies between data sources and their reports.

Success metrics and KPIs

  • Receive data from cloud service providers once or multiple times daily. For example:

    • Receive cloud cost and usage reports at least once per day.

    • Generate exported bills at least once per day.

  • All required data sources are identified according to established protocols, have consistent and standard formats, and are accessible.

  • Successfully complete data quality checks.

  • Successfully complete FOCUS Validator checks.

  • Investigate data quality or availability issues from automated notifications or reports within one business day and resolve them within three business days.

  • Data ingestion and processing occur within the expected timeframe.

  • When a data source changes, identify the change within one business day and adjust data storage or processing parameters within three business days.

  • New data sources, finer granularity, or new data correlations are identified and enabled as needed.

  • Data timeliness: The time since the last update for each data source compared to the expected time.

  • Data ingestion time: The actual time from receiving a new version of data to storing the raw data in the FinOps data warehouse, compared to the expected time.

  • Extract, transform, and load (ETL)/normalization/correlation time: The actual time from storing raw data from each data source to completing data correlation, normalization, transformation, and storage in the data warehouse, compared to the expected time.

  • Percentage (%) of total cost presented in a standardized way.

  • Matching percentage (%) of metadata elements.
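
Several of the KPIs above reduce to simple calculations. The sketch below shows one possible reading of three of them: data timeliness, the percentage of cost presented in a standardized way, and the metadata matching percentage. The function names, the row layout (`cost`, `normalized`, `tags`), and the interpretation of "matching" as the share of required metadata keys present are all assumptions.

```python
from datetime import datetime, timedelta


def is_stale(last_update: datetime, now: datetime, expected_interval: timedelta) -> bool:
    """Data timeliness: True when a source is older than its expected refresh interval."""
    return now - last_update > expected_interval


def percent_normalized(rows: list[dict]) -> float:
    """Share of total cost that comes from rows already in the standard format."""
    total = sum(row["cost"] for row in rows)
    if total == 0:
        return 0.0
    normalized = sum(row["cost"] for row in rows if row["normalized"])
    return 100.0 * normalized / total


def metadata_match_percent(rows: list[dict], required_keys: list[str]) -> float:
    """Share of required metadata keys that are present across all rows."""
    possible = len(rows) * len(required_keys)
    if possible == 0:
        return 0.0
    present = sum(1 for row in rows for key in required_keys if key in row["tags"])
    return 100.0 * present / possible
```

Tracking these values per data source over time shows whether ingestion is keeping pace with the expectations agreed with data users.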

Inputs and outputs

  • Cloud service provider cost and usage data, generated at the required granularity, definition, and frequency.

  • FOCUS datasets, containing usage and cost data from cloud service providers, along with supplementary data such as sustainability, observability, and SaaS data.

  • Utilization, performance, or observability data, including system metrics such as CPU, memory, disk, or network utilization, provided at the required resource or resource group level.

  • Log or system transaction data that records the number of uses or quantity of various resources, typically shared resources.

  • Business performance data that provides context, such as customer numbers, revenue or sales figures, transaction counts, or other business outcomes, to explain cloud costs and usage.

  • KPIs determined by unit economics, which support the collection and correlation of data elements in the FinOps data warehouse.

  • Compliance and governance requirements, which use the acquired data to achieve overall cloud policies and other performance targets set by FinOps.
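
To illustrate how business performance data gives context to cloud cost, the sketch below joins daily cost with a daily business metric to produce a unit cost. The `cost_per_unit` function, the date-string keys, and the choice of transactions as the unit are hypothetical; days missing from either input, or with zero units, are skipped rather than guessed.

```python
def cost_per_unit(daily_cost: dict[str, float], daily_units: dict[str, int]) -> dict[str, float]:
    """Divide each day's cost by that day's business units, where both are known."""
    result: dict[str, float] = {}
    # Only days present in both datasets can be correlated.
    for day in sorted(daily_cost.keys() & daily_units.keys()):
        units = daily_units[day]
        if units > 0:
            result[day] = daily_cost[day] / units
    return result
```

For example, a day with a cost of 100.0 and 2,000 transactions yields a unit cost of 0.05 per transaction.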

Alibaba Cloud related capabilities

Bill Management

Bill overview

Bill details

Split bill details

Amortized cost