Split a dataset

This topic describes how to use the train_test_split API to split a federated dataset into a training dataset and a test dataset.

Function path

fascia.data.horizontal.dataframe.train_test_split

Function definition

def train_test_split(data: HDataFrame, 
                     ratio: float, 
                     random_state: int = None, 
                     shuffle: bool = True) -> (HDataFrame, HDataFrame):

Parameters

Parameter     Type        Description
data          HDataFrame  The federated dataset to split.
ratio         float       The split ratio, that is, the proportion of the data assigned to the first returned dataset (the training dataset). The value must be between 0 and 1, inclusive, with a precision of up to three decimal places.
random_state  int         The random seed. If specified, the same seed always produces the same split. Default: None.
shuffle       bool        Whether to shuffle the data before splitting. Default: True.
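The following is a minimal local sketch of how the ratio, random_state, and shuffle parameters plausibly interact. It is not the fascia implementation. It works on a plain Python list instead of an HDataFrame and assumes that ratio is the fraction of rows placed in the first (training) dataset and that the row count is rounded to the nearest integer. The helpers check_ratio and split_rows are hypothetical names.

```python
import random
from decimal import Decimal
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def check_ratio(ratio: float) -> None:
    """Reject ratios outside [0, 1] or with more than three decimal places."""
    if not 0 <= ratio <= 1:
        raise ValueError(f"ratio must be between 0 and 1, got {ratio}")
    # Decimal(str(ratio)) keeps the written digits (0.7 -> Decimal('0.7')),
    # so the exponent tells how many decimal places were given.
    if Decimal(str(ratio)).as_tuple().exponent < -3:
        raise ValueError(f"ratio allows at most three decimal places, got {ratio}")


def split_rows(rows: Sequence[T], ratio: float,
               random_state: Optional[int] = None,
               shuffle: bool = True) -> Tuple[List[T], List[T]]:
    """Hypothetical local split: (first, second) with about `ratio` of rows in first."""
    check_ratio(ratio)
    order = list(range(len(rows)))
    if shuffle:
        # A dedicated Random(seed): the same seed always gives the same order;
        # random_state=None seeds from system randomness instead.
        random.Random(random_state).shuffle(order)
    cut = round(len(rows) * ratio)  # assumption: round to the nearest row count
    return [rows[i] for i in order[:cut]], [rows[i] for i in order[cut:]]


rows = list(range(10))
train, test = split_rows(rows, 0.7, random_state=42)
again = split_rows(rows, 0.7, random_state=42)
print(len(train), len(test), (train, test) == again)  # prints: 7 3 True
```

Using a dedicated random.Random instance, rather than the module-level random functions, keeps the split reproducible for a given seed without affecting any other code that uses the global random state.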

Example

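from fascia.data.horizontal.dataframe import train_test_split

# Split an existing federated dataset and save the two resulting datasets.
# Assume that fed_df is an existing federated dataset (HDataFrame).
train_set, test_set = train_test_split(fed_df, ratio=0.7)

# Assumption: save_fed_dataframe is provided by the runtime environment (it is not
# imported here), and '$output1' and '$output2' are output placeholders.
save_fed_dataframe(train_set, '$output1')
save_fed_dataframe(test_set, '$output2')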
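In a horizontally partitioned dataset, each participant holds different rows that share the same columns. One plausible model of the split, stated here as an assumption rather than documented fascia behavior, is that each party's partition is split locally with the same ratio, so both results remain federated. The sketch below represents an HDataFrame as a hypothetical dict from party name to that party's rows; split_partitions and the party names are illustrative only.

```python
import random
from typing import Dict, List, Optional, Tuple

Rows = List[dict]
Partitions = Dict[str, Rows]  # hypothetical stand-in for an HDataFrame


def split_partitions(partitions: Partitions, ratio: float,
                     random_state: Optional[int] = None,
                     shuffle: bool = True) -> Tuple[Partitions, Partitions]:
    """Split each party's rows independently; return (train, test) keyed by party."""
    if not 0 <= ratio <= 1:
        raise ValueError(f"ratio must be between 0 and 1, got {ratio}")
    train: Partitions = {}
    test: Partitions = {}
    for party, rows in partitions.items():
        rows = list(rows)  # copy so the caller's data is not reordered
        if shuffle:
            # Each party shuffles only its own rows, using the shared seed.
            random.Random(random_state).shuffle(rows)
        cut = round(len(rows) * ratio)  # assumption: nearest-integer rounding
        train[party], test[party] = rows[:cut], rows[cut:]
    return train, test


# Two hypothetical parties holding rows with the same columns.
fed_df = {
    "alice": [{"age": a, "label": a % 2} for a in range(20, 30)],
    "bob": [{"age": a, "label": a % 2} for a in range(40, 60)],
}
train_set, test_set = split_partitions(fed_df, 0.7, random_state=0)
for party in fed_df:
    print(party, len(train_set[party]), len(test_set[party]))
```

Because each partition is split on its own, rows never move between parties in this model, and every party contributes roughly the same proportion of its rows to the training dataset.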

Return value

A tuple of two federated datasets (HDataFrame objects): the training dataset and the test dataset, in that order.