Getting Started

Learn how to set up Harp, create templates, and start automating your document workflows in minutes.

Generate Templates with AI

You can use any LLM (ChatGPT, Claude, Gemini, etc.) to generate Harp template JSON files, then import them directly into the app. This is the fastest way to create templates for complex documents.

How It Works

  1. Copy the prompt below and paste it into any LLM
  2. Replace the placeholder with a description of your document type
  3. Copy the JSON output from the LLM
  4. Save it as a .json file
  5. In Harp, go to Templates and click Import Template
  6. Select your JSON file — done

Template JSON Schema

Every Harp template is a JSON file with this structure. The key part is the extractionFields array — it defines what data gets extracted.

| Property         | Type   | Description |
|------------------|--------|-------------|
| name             | string | Template name (e.g., "Invoice", "Bank Statement") |
| systemPrompt     | string | Always: "You are a data extraction agent. Extract data in the required structured format." |
| outputFormat     | string | Always: "xlsx" |
| extractionFields | array  | The fields to extract (see field schema below) |

Field Schema

Each field in extractionFields has these properties:

| Property     | Type    | Description |
|--------------|---------|-------------|
| name         | string  | snake_case field name (e.g., vendor_name) |
| description  | string  | Clear description of what to extract |
| type         | string  | "string", "number", or "array" |
| required     | boolean | Whether this field is required (usually true) |
| expandToRows | boolean | Only for array type. Set to true to expand rows in Excel |
| items        | array   | Only for array type. Sub-fields for each row |

Use "string" for text and dates. Use "number" for amounts and quantities. Use "array" with expandToRows for repeated rows like line items or transactions.

Schema Example

Template JSON Structure
{
  "name": "Template Name",
  "systemPrompt": "You are a data extraction agent. Extract data in the required structured format.",
  "outputFormat": "xlsx",
  "extractionFields": [
    {
      "name": "field_name",
      "description": "what this field contains",
      "type": "string",
      "required": true
    },
    {
      "name": "amount",
      "description": "the total amount",
      "type": "number",
      "required": true
    },
    {
      "name": "line_items",
      "description": "the repeated rows of data",
      "type": "array",
      "required": true,
      "expandToRows": true,
      "items": [
        {
          "name": "item_description",
          "description": "description of the item",
          "type": "string",
          "required": true
        },
        {
          "name": "item_amount",
          "description": "cost of the item",
          "type": "number",
          "required": true
        }
      ]
    }
  ]
}
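
Before importing a generated file, it can help to check it against the two tables above. The following is a minimal sketch, not part of Harp: the function names (validate_field, validate_template, validate_file) are hypothetical, and the rule that extractionFields and items must be non-empty is an assumption rather than documented behavior.

```python
import json

# Field types listed in the Field Schema table.
ALLOWED_TYPES = {"string", "number", "array"}


def validate_field(field, path):
    """Return a list of problems found in one field definition."""
    if not isinstance(field, dict):
        return [f"{path}: field must be an object"]
    problems = []
    for key, expected in (("name", str), ("description", str),
                          ("type", str), ("required", bool)):
        if not isinstance(field.get(key), expected):
            problems.append(f"{path}: '{key}' must be a {expected.__name__}")
    ftype = field.get("type")
    if isinstance(ftype, str) and ftype not in ALLOWED_TYPES:
        problems.append(f"{path}: unknown type {ftype!r}")
    if ftype == "array":
        items = field.get("items")
        # Assumption: an array field without sub-fields is not useful.
        if not isinstance(items, list) or not items:
            problems.append(f"{path}: array fields need a non-empty 'items' list")
        else:
            for i, sub in enumerate(items):
                problems += validate_field(sub, f"{path}.items[{i}]")
    return problems


def validate_template(template):
    """Return a list of problems; an empty list means the structure looks right."""
    problems = []
    for key in ("name", "systemPrompt", "outputFormat"):
        if not isinstance(template.get(key), str):
            problems.append(f"'{key}' must be a string")
    if template.get("outputFormat") not in (None, "xlsx"):
        problems.append("'outputFormat' should be \"xlsx\"")
    fields = template.get("extractionFields")
    if not isinstance(fields, list) or not fields:
        problems.append("'extractionFields' must be a non-empty array")
    else:
        for i, field in enumerate(fields):
            problems += validate_field(field, f"extractionFields[{i}]")
    return problems


def validate_file(path):
    """Load a template .json file and validate it."""
    with open(path, encoding="utf-8") as fh:
        return validate_template(json.load(fh))
```

Calling validate_file("bank-statement.json") on the example further down should return an empty list. A non-empty list names each property that does not match the schema.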

Full Example: Bank Statement

Here's a real template for extracting bank statement transactions. Each transaction row gets its own row in the Excel output.

bank-statement.json
{
  "name": "Bank Statement",
  "systemPrompt": "You are a data extraction agent. Extract data in the required structured format.",
  "outputFormat": "xlsx",
  "extractionFields": [
    {
      "name": "transactions",
      "description": "the transactions",
      "type": "array",
      "required": true,
      "expandToRows": true,
      "items": [
        {
          "name": "transaction_date",
          "description": "the date of the transaction",
          "type": "string",
          "required": true
        },
        {
          "name": "value_date",
          "description": "the date of the value",
          "type": "string",
          "required": true
        },
        {
          "name": "description_particulars",
          "description": "the description of the transaction",
          "type": "string",
          "required": true
        },
        {
          "name": "deposit",
          "description": "amount deposited",
          "type": "number",
          "required": true
        },
        {
          "name": "withdrawals",
          "description": "amount withdrawn",
          "type": "number",
          "required": true
        },
        {
          "name": "running_balance",
          "description": "the running balance after the transaction",
          "type": "number",
          "required": true
        },
        {
          "name": "bank_name",
          "description": "the name of the bank",
          "type": "string",
          "required": true
        },
        {
          "name": "account_number",
          "description": "the account number",
          "type": "string",
          "required": true
        }
      ]
    }
  ]
}

Prompt for Any LLM

Copy this prompt, paste it into any LLM, and replace [DESCRIBE YOUR DOCUMENT TYPE HERE] with a description of your document. The LLM will generate a ready-to-import JSON template.

LLM Prompt — copy and paste into ChatGPT, Claude, etc.
I need you to generate a Harp template JSON for extracting data from [DESCRIBE YOUR DOCUMENT TYPE HERE].

Here is the JSON schema for a Harp template:

- "name" (string): A descriptive name for the template.
- "systemPrompt" (string): Always set to "You are a data extraction agent. Extract data in the required structured format."
- "outputFormat" (string): Always set to "xlsx".
- "extractionFields" (array): The fields to extract. Each field has:
  - "name" (string): snake_case field name (e.g., "invoice_number", "vendor_name")
  - "description" (string): Clear description of what to extract. Be specific — mention formats, edge cases, and where the data typically appears on the document.
  - "type" (string): One of "string" or "number"
  - "required" (boolean): Whether this field is required (usually true)

For repeated/tabular data (like line items or transaction rows), use:
  - "type": "array"
  - "expandToRows": true
  - "items" (array): Sub-fields for each row, each with name, description, type, and required.

Rules:
1. Use snake_case for all field names.
2. Use "string" for text, dates, and IDs. Use "number" for amounts and quantities.
3. Use "array" with "expandToRows": true for any repeated rows of data.
4. Write detailed descriptions — they directly affect extraction quality.
5. Output valid JSON only, no extra text.

Example output for a bank statement:
{
  "name": "Bank Statement",
  "systemPrompt": "You are a data extraction agent. Extract data in the required structured format.",
  "outputFormat": "xlsx",
  "extractionFields": [
    {
      "name": "transactions",
      "description": "the transactions",
      "type": "array",
      "required": true,
      "expandToRows": true,
      "items": [
        {
          "name": "transaction_date",
          "description": "the date of the transaction",
          "type": "string",
          "required": true
        },
        {
          "name": "value_date",
          "description": "the date of the value",
          "type": "string",
          "required": true
        },
        {
          "name": "description_particulars",
          "description": "the description of the transaction",
          "type": "string",
          "required": true
        },
        {
          "name": "deposit",
          "description": "amount deposited",
          "type": "number",
          "required": true
        },
        {
          "name": "withdrawals",
          "description": "amount withdrawn",
          "type": "number",
          "required": true
        },
        {
          "name": "running_balance",
          "description": "the running balance after the transaction",
          "type": "number",
          "required": true
        },
        {
          "name": "bank_name",
          "description": "the name of the bank",
          "type": "string",
          "required": true
        },
        {
          "name": "account_number",
          "description": "the account number",
          "type": "string",
          "required": true
        }
      ]
    }
  ]
}

Now generate a template for: [DESCRIBE YOUR DOCUMENT TYPE HERE]

After the LLM generates the JSON, save it as a .json file and import it in Harp via Templates > Import Template. Review the fields and adjust descriptions if needed.
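
If you script this step instead of copying by hand, the replace and save steps might look like the sketch below. It is an illustration under assumptions: the helper names are hypothetical, and it assumes the LLM reply contains exactly one JSON object, possibly wrapped in a code fence or a line of preamble. How you obtain the reply text is up to you.

```python
import json
from pathlib import Path

PLACEHOLDER = "[DESCRIBE YOUR DOCUMENT TYPE HERE]"


def fill_prompt(prompt: str, description: str) -> str:
    """Replace every occurrence of the placeholder with the document description."""
    return prompt.replace(PLACEHOLDER, description)


def extract_json(reply: str) -> dict:
    """Parse the text between the first '{' and the last '}' as JSON.

    This tolerates a reply wrapped in a code fence or preceded by a sentence.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the reply")
    return json.loads(reply[start:end + 1])


def save_template(reply: str, path: str) -> Path:
    """Write the parsed template to a .json file, pretty-printed."""
    out = Path(path)
    out.write_text(json.dumps(extract_json(reply), indent=2), encoding="utf-8")
    return out
```

Parsing the reply before saving means a malformed response fails here, with a json.JSONDecodeError, rather than at import time.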

Quick Start

Get up and running with Harp in just a few minutes. Follow these steps to install the app and process your first document.

1. Download and Install

Download Harp for your operating system from our website. Harp is available for macOS and Windows.

Harp comes with 25 free pages so you can try it out before purchasing credits. No credit card required.

2. Create a Template

Templates tell Harp what data to extract from your documents. You'll need at least one template before processing files.

  1. Open Harp and click Templates in the sidebar
  2. Click New Template
  3. Give it a name (e.g., "Invoice") and add the fields you want to extract
  4. Click Save

You can also generate templates with AI (see Generate Templates with AI), or see Creating Templates for a detailed guide.

3. Process Your First Document

You're ready to extract data from your first PDF or image.

  1. Drag and drop a PDF or image onto the Harp window, or use the Add Files button
  2. Select the template you created
  3. Click Process and wait for the extraction to complete
  4. Review the extracted data and export to Excel

For best results, use high-quality scans with clear text. Harp works with both text-based PDFs and scanned images.

Creating Templates

Templates are the core of Harp. They define exactly what data you want to extract from your documents. A well-crafted template is the difference between accurate, useful data and messy results.

What Is a Template?

A template is a collection of fields that describe the data points you need from a document. Each field has a name, a type, and an optional description. When Harp processes a document, it uses your template to understand what information to look for and how to structure it.

You can create as many templates as you need — one for invoices, another for contracts, one for receipts, and so on. Templates are reusable across any number of documents.

Creating a New Template

  1. Click Templates in the sidebar
  2. Click New Template
  3. Enter a descriptive name (e.g., "Vendor Invoice", "Patient Intake Form")
  4. Add fields using the Add Field button
  5. For each field, set the name, type, and write a clear description
  6. Click Save Template when you're done

Field Types

Choose the right field type for each piece of data you want to extract. This helps Harp format the output correctly.

| Type    | Use For                              | Example |
|---------|--------------------------------------|---------|
| text    | Names, addresses, descriptions, IDs  | "Acme Corp", "INV-2024-001" |
| number  | Amounts, quantities, percentages     | 1250.00, 42, 18.5 |
| date    | Any date value                       | 2024-01-15, Jan 15 2024 |
| boolean | Yes/No fields, checkboxes            | true, false |
| table   | Repeated rows (line items, entries)  | Invoice line items, transaction rows |
| list    | Multiple values for one field        | Skills, tags, categories |

Template Examples

Here are some common templates to help you get started. Use these as a starting point and customize them for your documents.

Example: Invoice Template
Template Name: Invoice

Fields:
  vendor_name      (text)    "Company or person issuing the invoice"
  invoice_number   (text)    "Unique invoice identifier or reference number"
  invoice_date     (date)    "Date the invoice was issued"
  due_date         (date)    "Payment due date"
  subtotal         (number)  "Amount before tax"
  tax_amount       (number)  "Total tax applied"
  total_amount     (number)  "Final amount due including tax"
  line_items       (table)   "Each line item with description, quantity, unit price, and amount"

Example: Receipt Template
Template Name: Receipt

Fields:
  store_name       (text)    "Name of the store or merchant"
  date             (date)    "Date of purchase"
  payment_method   (text)    "How payment was made (cash, card, etc.)"
  items            (table)   "Each purchased item with name, quantity, and price"
  subtotal         (number)  "Total before tax"
  tax              (number)  "Tax amount charged"
  total            (number)  "Final total paid"

Example: Contract Template
Template Name: Contract

Fields:
  contract_title   (text)    "Title or subject of the contract"
  party_a          (text)    "First party name and details"
  party_b          (text)    "Second party name and details"
  effective_date   (date)    "Date the contract takes effect"
  expiry_date      (date)    "Date the contract expires, if any"
  contract_value   (number)  "Total contract value or fee"
  key_terms        (list)    "Important terms, conditions, or obligations"
  signatures       (boolean) "Whether the contract is signed by all parties"

Example: Medical Form Template
Template Name: Patient Intake

Fields:
  patient_name     (text)    "Full name of the patient"
  date_of_birth    (date)    "Patient's date of birth"
  phone            (text)    "Contact phone number"
  insurance_id     (text)    "Insurance policy or member ID"
  allergies        (list)    "Known allergies"
  medications      (list)    "Current medications"
  visit_reason     (text)    "Reason for the visit or chief complaint"

Writing Good Field Descriptions

The description you write for each field directly impacts extraction quality. It tells the AI what to look for and how to interpret the data. Be specific and mention any formatting preferences.

Good vs. Bad Descriptions
Good descriptions:
  "The final invoice total including all taxes and fees,
   as a number without currency symbols (e.g., 1250.00)"

  "Full legal name of the vendor company, not abbreviated.
   If multiple names appear, use the one from the letterhead."

  "Each line item should include: item description,
   quantity, unit price, and line total"

Bad descriptions:
  "total"
  "name"
  "items"

Think of the description as instructions you'd give to a person reading the document for the first time. The more context you provide, the better the extraction.
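
A rough way to catch descriptions like the bad examples above is to flag very short ones. This is only a heuristic sketch: the function name weak_descriptions and the five-word threshold are assumptions, and word count is a stand-in for specificity, not a measure of it.

```python
def weak_descriptions(fields, min_words=5):
    """Return the names of fields whose description looks too thin.

    A description is flagged when it has fewer than `min_words` words.
    Sub-fields listed under "items" are checked as well.
    """
    weak = []
    for field in fields:
        description = field.get("description", "")
        if len(description.split()) < min_words:
            weak.append(field["name"])
        # Recurse into the sub-fields of array fields, if any.
        weak += weak_descriptions(field.get("items", []), min_words)
    return weak
```

Run it over the extractionFields array of a template. Any name it returns is a candidate for a longer description that mentions format, location on the page, or edge cases.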

Table Fields

Table fields are used when a document contains repeated rows of data, like invoice line items or transaction entries. When you add a table field, you define the columns that each row should have.

For example, an invoice line items table might have columns for description, quantity, unit price, and amount. Harp will extract each row separately and export them to a dedicated sheet in the Excel output.

In the description for a table field, list out the columns you expect. For example: "Each line item with description, quantity, unit price, and total amount."

Tips for Better Results

  1. Start simple. Begin with a few key fields and add more once you see how the extraction performs.
  2. Use clear field names. Names like vendor_name and invoice_date are better than field1 or data.
  3. Be specific in descriptions. Mention formats, edge cases, and where on the document the data is typically found.
  4. Choose the right type. Using number for monetary values ensures they export as numbers in Excel, not text.
  5. Test with a sample document. Process one document first, review the results, and refine your template before batch processing.

Setting Up Watch Folders

Watch folders automatically process documents as they arrive. Drop a file into the folder and Harp handles the rest — extraction, formatting, and export happen without any manual steps.

How Watch Folders Work

When you configure a watch folder, Harp monitors it for new files. When a PDF or image appears, Harp automatically processes it using your specified template and exports the results to Excel.

Setting Up a Watch Folder

  1. Go to Watch Folders in the sidebar
  2. Click Add Folder
  3. Select the folder you want to monitor
  4. Choose the template to use for processing
  5. Configure output options (where to save the Excel file)
  6. Toggle the folder to Active

Watch Folder Configuration Example
Watch Folder Settings:
  Input Folder:    ~/Documents/Invoices/Incoming
  Template:        Invoice Template
  Output Folder:   ~/Documents/Invoices/Processed
  Output Format:   Excel (.xlsx)

  Options:
    - Move processed files to: ~/Documents/Invoices/Archive
    - Append to existing Excel file: Yes
    - Process subfolders: No

Workflow Example

A typical automated workflow might look like this:

Automated Invoice Processing Workflow
1. Email rule saves invoice attachments to ~/Invoices/Incoming
2. Harp detects new PDF in watch folder
3. Document is processed with Invoice Template
4. Extracted data is appended to invoices.xlsx
5. Original PDF is moved to ~/Invoices/Archive
6. You review the Excel file at your convenience

Combine watch folders with email rules or scanner software to create fully automated document processing pipelines.
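
As one illustration of the feeding side of such a pipeline, the sketch below moves supported files from a scanner's output folder into the watch folder. It is not part of Harp: the function name feed_watch_folder and both folder paths are hypothetical, and the extension list is taken from the supported formats named in this documentation.

```python
import shutil
from pathlib import Path

# Formats this documentation lists as supported.
SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp"}


def feed_watch_folder(source, incoming):
    """Move supported files from `source` into `incoming`.

    Returns the names of the files that were moved. Files whose name
    already exists in `incoming` are left in place, not overwritten.
    """
    src, dst = Path(source), Path(incoming)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            target = dst / path.name
            if target.exists():
                continue
            shutil.move(str(path), str(target))
            moved.append(target.name)
    return moved
```

A scheduled task or cron job could call feed_watch_folder with your scanner's output folder and the watch folder's input path. Harp then picks the files up as described above.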

Excel Export

Harp exports extracted data to Excel format for easy analysis and integration with your existing workflows.

Export Options

When exporting, you can choose between several options:

Export Options
Export Formats:
  - Excel (.xlsx) - Recommended for most use cases
  - CSV (.csv) - For legacy system compatibility

Export Modes:
  - New file - Create a fresh file for each export
  - Append - Add new rows to an existing file
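
The difference between the two modes can be shown with a small CSV sketch. This illustrates the semantics only and is not how Harp writes files: export_rows is a hypothetical helper, and it assumes appended rows have the same columns as the existing file.

```python
import csv
from pathlib import Path


def export_rows(rows, path, mode="new"):
    """Write `rows` (a list of dicts) to a CSV file.

    mode="new" replaces any existing file and writes a header row.
    mode="append" adds rows to an existing file without repeating the
    header; if the file does not exist yet, it behaves like "new".
    """
    if not rows:
        return
    out = Path(path)
    append = mode == "append" and out.exists()
    with out.open("a" if append else "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        if not append:
            writer.writeheader()
        writer.writerows(rows)
```

Append mode suits a running log such as invoices.csv that grows with each processed document. New-file mode keeps each export separate.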

Excel Output Structure

Exported Excel files follow a consistent structure with columns matching your template fields.

Example Excel Output
| vendor_name | invoice_number | invoice_date | total_amount |
|-------------|----------------|--------------|--------------|
| Acme Corp   | INV-2024-001   | 2024-01-15   | 1,250.00     |
| Beta LLC    | INV-2024-002   | 2024-01-18   | 875.50       |
| Gamma Inc   | INV-2024-003   | 2024-01-20   | 2,100.00     |

Handling Table Fields

When your template includes table fields (like line items), Harp creates a separate sheet for the nested data with a reference to the parent row.

Multi-sheet Excel Output
Sheet 1: Main Data
| invoice_id | vendor_name | total_amount |
|------------|-------------|--------------|
| 1          | Acme Corp   | 1,250.00     |

Sheet 2: Line Items
| invoice_id | description     | quantity | price  |
|------------|-----------------|----------|--------|
| 1          | Widget A        | 10       | 50.00  |
| 1          | Widget B        | 5        | 150.00 |
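
The parent-row reference can be sketched as follows. This is an illustration of the layout shown above, not Harp's implementation: split_sheets is a hypothetical function, and the shape of the input (one dict per document, with the table field as a list of dicts) is an assumption.

```python
def split_sheets(documents, table_field, id_column="invoice_id"):
    """Split nested results into main rows and table rows.

    Each document gets a sequential id. Every row of its table field
    carries that id so it can be matched back to the parent row.
    """
    main_rows, table_rows = [], []
    for doc_id, doc in enumerate(documents, start=1):
        main = {id_column: doc_id}
        main.update({k: v for k, v in doc.items() if k != table_field})
        main_rows.append(main)
        for row in doc.get(table_field, []):
            table_rows.append({id_column: doc_id, **row})
    return main_rows, table_rows


documents = [{
    "vendor_name": "Acme Corp",
    "total_amount": 1250.0,
    "line_items": [
        {"description": "Widget A", "quantity": 10, "price": 50.0},
        {"description": "Widget B", "quantity": 5, "price": 150.0},
    ],
}]
main_sheet, line_item_sheet = split_sheets(documents, "line_items")
```

Here main_sheet holds one row for Acme Corp and line_item_sheet holds two rows, both with invoice_id 1, matching the two sheets above.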

Credits & Billing

Harp uses a simple pay-per-page credit system. No monthly subscriptions, no contracts — buy credits when you need them and use them at your own pace.

How Credits Work

1 credit = 1 page. Each page of a document costs one credit to process.

Credits never expire. Use them whenever you need — there's no time limit.

25 free pages included. Every new account starts with 25 free credits to try Harp.

Buy in bulk and save. Larger credit packs come at a lower per-page price.
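
Since one credit covers one page, estimating a batch is simple addition. A minimal sketch, with a hypothetical function name and example page counts:

```python
def credits_needed(page_counts, balance):
    """Return (total credits for the batch, credits still to buy).

    `page_counts` holds the number of pages in each document;
    one page costs one credit.
    """
    total = sum(page_counts)
    shortfall = max(0, total - balance)
    return total, shortfall


# Three documents of 3, 10, and 20 pages against the 25 free credits.
total, to_buy = credits_needed([3, 10, 20], balance=25)
```

In this example the batch needs 33 credits, so 8 more would have to be purchased beyond the 25 free ones.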

Buying Credits

You can purchase credits directly from within the Harp app. Go to Settings and click Buy Credits. You'll be taken to a secure checkout page. Credits are added to your account instantly after purchase.

Checking Your Balance

Your current credit balance is always visible in the app. You can also see your usage history and how many credits each document used.

View our credit packs and pricing on the pricing page.

Troubleshooting

Common issues and how to resolve them.

Poor Extraction Quality

Improving Extraction Accuracy
Problem: Extracted data is missing or incorrect

Solutions:
1. Use higher quality scans (300 DPI recommended)
2. Ensure documents are not skewed or rotated
3. Add more detailed field descriptions in your template
4. Split multi-page documents if extraction is struggling
5. For table fields, describe expected columns in the description
6. Contact support if issues persist

Watch Folder Not Processing

Watch Folder Troubleshooting
Problem: Files in watch folder are not being processed

Solutions:
1. Check that the watch folder is set to "Active"
2. Verify Harp has read/write permissions to the folder
3. Ensure files are valid PDFs or supported image formats
   (PNG, JPG, JPEG, TIFF, BMP, WEBP)
4. Check if you have available credits in your account
5. Look for error messages in Harp's activity log
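
Checks 2 and 3 can be scripted. The sketch below is a hypothetical helper, not a Harp tool. It tests permissions for the user running the script, which is only the same as Harp's access if Harp runs under that user, and it takes its extension list from the formats named above.

```python
import os
from pathlib import Path

# Formats listed in check 3 above.
SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp"}


def diagnose_watch_folder(folder):
    """Return a list of findings that may explain unprocessed files."""
    path = Path(folder)
    if not path.is_dir():
        return [f"{path} is not a folder"]
    findings = []
    # Permission check for the current user only.
    if not os.access(path, os.R_OK | os.W_OK):
        findings.append("missing read/write permission")
    for entry in sorted(path.iterdir()):
        if entry.is_file() and entry.suffix.lower() not in SUPPORTED:
            findings.append(f"unsupported file type: {entry.name}")
    return findings
```

An empty list means the folder passed both checks. The Active toggle, the credit balance, and the activity log still need to be checked inside Harp.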

Application Won't Start

Startup Issues
macOS:
  - Right-click the app and select "Open" (first launch)
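  - Check System Settings > Privacy & Security (System Preferences > Security & Privacy on macOS 12 and earlier)
  - Try clearing the app's extended attributes, including the quarantine flag: xattr -cr /Applications/Harp.app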

Windows:
  - Run as Administrator
  - Check Windows Defender isn't blocking the app
  - Try reinstalling with the latest version

Updates

Harp checks for updates automatically and will notify you when a new version is available. You can also manually check for updates from the app menu. We recommend always running the latest version for the best extraction accuracy and features.

Getting Help

If you run into issues or have questions, here are the best ways to get support.

Common Questions

What file formats does Harp support?

PDF, PNG, JPG, JPEG, TIFF, BMP, and WEBP. Both text-based and scanned documents are supported.

Is my data secure?

Harp runs on your desktop. Documents are sent to cloud-based AI only for extraction and are never stored on our servers after processing.

Can I use Harp offline?

Harp requires an internet connection to process documents, as it uses cloud-based AI for extraction. However, your documents are not permanently stored on any server.

Do credits expire?

No. Credits never expire. Buy them once and use them whenever you need.

Have a question not covered here? Email us at satya@querygen.ai — we typically respond within 24 hours.
