The LLM-LaTeX Remove Comments component preprocesses text data for large language model (LLM) workflows. It removes comment lines and inline comments from LaTeX documents in TeX format.
Supported computing resources
Algorithm
The component uses the following regular expressions to identify and remove comments in LaTeX text:
| Type | Regular expression |
|---|---|
| Comment line | |
| Inline comment | |
The component finds all strings that match these regular expressions and replaces them with an empty string. The following example shows this process:
| Before | After |
|---|---|
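As a rough illustration of the match-and-replace approach, the following Python sketch removes both kinds of comments. The two patterns are assumptions chosen for illustration and are not necessarily the expressions the component uses; the function name `remove_latex_comments` and its flags are also hypothetical. A comment line is assumed to be a line whose first non-whitespace character is `%`, and an inline comment is assumed to be an unescaped `%` followed by the rest of the line.

```python
import re

# Assumed patterns for illustration only; not the component's actual expressions.
# Comment line: optional leading whitespace, then '%', through the end of the line
# (including the trailing newline, so the whole line disappears).
COMMENT_LINE = re.compile(r"^[ \t]*%.*\n?", re.MULTILINE)
# Inline comment: a '%' not preceded by a backslash, through the end of the line.
INLINE_COMMENT = re.compile(r"(?<!\\)%.*")


def remove_latex_comments(text, remove_comment_lines=True, remove_inline_comments=True):
    """Replace every match of the enabled patterns with an empty string."""
    if remove_comment_lines:
        text = COMMENT_LINE.sub("", text)
    if remove_inline_comments:
        text = INLINE_COMMENT.sub("", text)
    return text


sample = (
    "% preamble note\n"
    "\\section{Intro} % short title\n"
    "Accuracy rose by 5\\% overall.\n"
)
cleaned = remove_latex_comments(sample)
print(cleaned)
```

In this sketch the escaped percent sign in `5\%` is preserved, while the first line and the trailing `% short title` are dropped. The simplified patterns do not handle every case, for example a `%` inside a `verbatim` environment or one that follows an escaped backslash (`\\%`).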
Configure the component
Add the LLM-LaTeX Remove Comments component to your Designer workflow and configure its parameters in the pane on the right.
| Parameter group | Parameter | Description |
|---|---|---|
| Field settings | Select target column | Select one or more columns to process. |
| | Remove all comment lines | Removes all comment lines when selected. |
| | Remove all inline comments | Removes all inline comments when selected. |
| | Set output table lifecycle | Specifies the number of days before the temporary output table is deleted. This value must be a positive integer. The default is 28. |
| Performance tuning | Number of CPUs per instance | The number of CPUs for each map task instance. Value range: 50–800. Default value: 100. |
| | Memory size per instance (MB) | The memory size for each map task instance, in MB. Value range: 256–12288. Default value: 1024. |
| | Data size per instance (MB) | The maximum amount of data in MB that each map task instance can process. Value range: 1 to Integer.MAX_VALUE. Default value: 256. You can use this parameter to control the input volume for each map task. |
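To make the ranges and the effect of the data size setting concrete, the sketch below checks values against the documented limits and gives a rough estimate of how many map task instances a job might use. The parameter keys, the `validate` and `estimate_map_instances` helpers, and the assumption that input is split evenly up to the per-instance cap are all hypothetical; the actual scheduler may split input differently.

```python
import math

# Ranges and defaults copied from the parameter table; the key names are hypothetical.
# Each entry is (minimum, maximum, default). None means no documented upper bound.
LIMITS = {
    "lifecycle_days": (1, None, 28),
    "cpu_per_instance": (50, 800, 100),
    "memory_mb_per_instance": (256, 12288, 1024),
    "data_mb_per_instance": (1, 2**31 - 1, 256),  # Integer.MAX_VALUE is 2**31 - 1
}


def validate(name, value):
    """Raise ValueError if value is not an integer inside the documented range."""
    low, high, _default = LIMITS[name]
    if not isinstance(value, int) or value < low or (high is not None and value > high):
        raise ValueError(f"{name}={value} is outside the documented range")
    return value


def estimate_map_instances(total_input_mb, data_mb_per_instance=256):
    """Assumption: input is divided evenly, with each instance taking at most the cap."""
    validate("data_mb_per_instance", data_mb_per_instance)
    return max(1, math.ceil(total_input_mb / data_mb_per_instance))


# Under this assumption, a smaller cap yields more, smaller map tasks.
print(estimate_map_instances(10_000, 256))
print(estimate_map_instances(10_000, 1024))
```

Under these assumptions, lowering the data size per instance increases parallelism at the cost of more task instances, while raising it does the opposite.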