LLM-Remove LaTeX Comment Lines (DLC)


The LLM-Remove LaTeX Comment Lines (DLC) component removes both full-line and inline comments from LaTeX-formatted text. The input data must be a JSONL file stored in OSS. In a JSONL file, each line is a self-contained JSON object.
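For illustration, the following Python sketch writes and reads a small JSONL file with the standard `json` module. The field name `text` and the sample records are hypothetical. In the component, you choose the real field with the Field to process parameter.

```python
import json

# Hypothetical records. "text" is an assumed field name holding LaTeX source.
records = [
    {"id": 1, "text": "%% header comment\n\\documentclass{article}\n"},
    {"id": 2, "text": "\\section{Intro} % TODO: expand\n"},
]

def to_jsonl(rows):
    """Serialize each record as one JSON object per line (JSONL)."""
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)

def from_jsonl(data):
    """Parse JSONL text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in data.splitlines() if line.strip()]

jsonl_text = to_jsonl(records)
print(jsonl_text, end="")
```

Because newlines inside the LaTeX text are escaped as `\n`, each record occupies exactly one physical line of the file.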

Supported computing resources

DLC

Algorithm

The component uses the following regular expressions to identify comments in LaTeX-formatted text:

| Comment type | Regular expression |
| --- | --- |
| Full-line comments | `r'(?m)^%.*\n?'` |
| Inline comments | `r'[^\\]%.+$'` |
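The following Python sketch, which uses only the standard `re` module, shows what each pattern matches. It illustrates the patterns as written and is not the component's source code. A few consequences follow from the patterns themselves: the full-line pattern matches only a `%` in the first column; the inline pattern's `[^\\]` skips an escaped `\%` but also consumes the character before the `%`, so that character is deleted on replacement; and `.+` requires at least one character after the `%`, so a bare trailing `%` is kept. Because the inline pattern has no `(?m)` flag, its `$` anchors only at the end of the string, so this sketch applies it to one line at a time. Whether the component does the same is an assumption.

```python
import re

# The two patterns exactly as listed in the table above.
FULL_LINE = re.compile(r'(?m)^%.*\n?')
INLINE = re.compile(r'[^\\]%.+$')

# Full-line pattern: only lines whose first character is % (column 0).
sample = "%% header\n\\begin{document}\n  % indented\n"
full_matches = FULL_LINE.findall(sample)
print(full_matches)  # ['%% header\n']

# Inline pattern, applied to one line at a time (assumption, see above).
lines = [
    r"\section{Intro} % TODO",  # ordinary inline comment
    r"x = 50\% of y",           # escaped percent sign, not a comment
    r"end of sentence%",        # bare trailing %, nothing after it
]
inline_matches = {}
for line in lines:
    m = INLINE.search(line)
    # The match starts one character before the %, here the space.
    inline_matches[line] = m.group(0) if m else None
    print(f"{line!r} -> {inline_matches[line]!r}")
```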

The component finds all strings that match these regular expressions and replaces them with an empty string. The following example shows how this works.

Before processing

%%
%% This is file 'sample-sigconf.tex',
%% The first command in your LaTeX source must be the \documentclass command.
\documentclass[sigconf,review,anonymous]{acmart}

%% NOTE that a single column version is required for
%% submission and peer review. This can be done by changing
\input{math_commands.tex}
%% end of the preamble, start of the body of the document source.
\begin{document}

%% The "title" command has an optional parameter,
%% allowing the author to define a "short title" to be used in page headers.
\title{Hierarchical Cross Contrastive Learning of Visual Representations}

\author{Hesen Chen}

After processing

In the Current Field Value dialog box, the field contains the cleaned LaTeX source code. Every line that starts with % has been removed, and the following lines remain:

\documentclass[sigconf,review,anonymous]{acmart}

\input{math_commands.tex}
\begin{document}

\title{Hierarchical Cross Contrastive Learning of Visual Representations}

\author{Hesen Chen}
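The before-and-after example can be approximated locally with the sketch below. `strip_latex_comments` is a hypothetical helper, not part of the component. It applies the full-line pattern to the whole text and, as an assumption, the inline pattern line by line. Its `full_line` and `inline` flags stand in for the two remove options of the component.

```python
import re

FULL_LINE = re.compile(r'(?m)^%.*\n?')
INLINE = re.compile(r'[^\\]%.+$')

def strip_latex_comments(text, full_line=True, inline=True):
    """Hypothetical helper: remove comments with the two documented patterns.

    Assumption: the inline pattern is applied to each line separately,
    because without (?m) its `$` only anchors at the end of the string.
    """
    if full_line:
        # Drop whole comment lines together with their trailing newline.
        text = FULL_LINE.sub('', text)
    if inline:
        # Cut each line at the first unescaped % that has text after it.
        text = '\n'.join(INLINE.sub('', line) for line in text.split('\n'))
    return text

before = r"""%%
%% This is file 'sample-sigconf.tex',
%% The first command in your LaTeX source must be the \documentclass command.
\documentclass[sigconf,review,anonymous]{acmart}

%% NOTE that a single column version is required for
%% submission and peer review. This can be done by changing
\input{math_commands.tex}
%% end of the preamble, start of the body of the document source.
\begin{document}

%% The "title" command has an optional parameter,
%% allowing the author to define a "short title" to be used in page headers.
\title{Hierarchical Cross Contrastive Learning of Visual Representations}

\author{Hesen Chen}
"""

after = strip_latex_comments(before)
print(after)
```

Blank lines in the input are kept, because the full-line pattern removes only lines that begin with `%`.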

Configure the component

On the Designer workflow page, add the LLM-Remove LaTeX Comment Lines (DLC) component and configure its parameters in the right-side pane.

| Parameter type | Parameter | Required | Description | Default |
| --- | --- | --- | --- | --- |
| Field settings | Field to process | Yes | The field in each JSON object that contains the LaTeX text to clean. | None |
| Field settings | Remove all full-line comments | No | Specifies whether to remove full-line comments (lines that begin with %). | Selected |
| Field settings | Remove all inline comments | No | Specifies whether to remove inline comments (text from an unescaped % to the end of the line). | Selected |
| Field settings | OSS output directory | No | The OSS directory for the output data. If empty, the component uses the default workspace path. | None |
| Execution tuning | Number of processes | No | The number of concurrent processes for the job. | 8 |
| Execution tuning | Select resource group: Public resource group | No | Select a node specification (CPU or GPU instance specification), the number of nodes, and a VPC. | None |
| Execution tuning | Select resource group: Dedicated resource group | No | Select the number of CPU cores, memory, shared memory, number of GPUs, and number of nodes. | None |
| Execution tuning | Maximum runtime | No | The maximum runtime of the job. The job is terminated if it exceeds this limit. | None |
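To check the two remove options before running a job, you might approximate the component on a small local JSONL sample. In this sketch, `process_jsonl`, `clean_text`, and the field name `content` are hypothetical. The two boolean arguments mirror Remove all full-line comments and Remove all inline comments, both selected by default. OSS output, Number of processes, resource groups, and Maximum runtime are handled by the component and are not modeled here.

```python
import json
import re

FULL_LINE = re.compile(r'(?m)^%.*\n?')
INLINE = re.compile(r'[^\\]%.+$')

def clean_text(text, remove_full_line=True, remove_inline=True):
    """Apply the two documented patterns; the inline one line by line (assumed)."""
    if remove_full_line:
        text = FULL_LINE.sub('', text)
    if remove_inline:
        text = '\n'.join(INLINE.sub('', line) for line in text.split('\n'))
    return text

def process_jsonl(jsonl_text, field, remove_full_line=True, remove_inline=True):
    """Clean `field` in every JSONL record; other fields pass through unchanged."""
    out_lines = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        if isinstance(record.get(field), str):
            record[field] = clean_text(record[field], remove_full_line, remove_inline)
        out_lines.append(json.dumps(record, ensure_ascii=False))
    return ('\n'.join(out_lines) + '\n') if out_lines else ''

source = '\n'.join([
    json.dumps({"id": 1, "content": "%% preamble note\n\\begin{document}\n"}),
    json.dumps({"id": 2, "content": "Text here. % reviewer remark\n"}),
])
result = process_jsonl(source, field="content", remove_inline=False)
print(result, end="")
```

Here inline removal is disabled, so the reviewer remark in the second record is kept while the full-line comment in the first record is removed.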