
Unstructured Loader

With the Unstructured Loader, you can load files of various types and extract their content through the Unstructured API, either as an array of structured elements or as plain text.

Installation

Install the peer dependency:

npm install unstructured-client

Add Environment Variables

.env
UNSTRUCTURED_API_KEY="YOUR_SAMPLE_API_KEY"
# You can get one from https://unstructured.io/api-key-hosted
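The loader's constructor throws if the key is empty, but it can help to fail fast with a clearer message when the variable is read. The sketch below is a hypothetical helper, requireEnv, that is not part of this component; it assumes the .env file has already been loaded by your framework or a loader such as dotenv.

```typescript
// Hypothetical helper (not part of the loader): read a required environment
// variable and fail fast with a clear message if it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env
): string {
  const value = env[name];
  if (value === undefined || value.trim().length === 0) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  // Trim stray whitespace that sometimes sneaks into .env files.
  return value.trim();
}

// Usage, once the .env file has been loaded:
// const apiKey = requireEnv("UNSTRUCTURED_API_KEY");
```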

Copy the code

Add the following code to your utils/unstructuredLoader.ts file:

unstructuredLoader.ts
import { UnstructuredClient } from "unstructured-client";
import { PartitionResponse } from "unstructured-client/sdk/models/operations";
import {
  PartitionParameters,
  Strategy,
} from "unstructured-client/sdk/models/shared";
import * as fs from "fs";
 
interface UnstructuredLoaderProps {
  apiKey: string;
  baseUrl?: string;
}
 
interface LoadUnstructuredDirectoryDataParams {
  filePath: string;
  fileName: string;
  options?: Omit<PartitionParameters, "files">;
  returnText?: boolean;
}
 
interface LoadUnstructuredFileDataParams {
  fileContent: Uint8Array;
  fileName: string;
  returnText?: boolean;
  options?: Omit<PartitionParameters, "files">;
}
 
export class UnstructuredLoader {
  private client: UnstructuredClient;
 
  constructor(props: UnstructuredLoaderProps) {
    const { apiKey, baseUrl } = props;
 
    if (!apiKey || apiKey.trim().length === 0) {
      throw new Error("No API key provided for Unstructured!");
    }
 
    this.client = new UnstructuredClient({
      ...(baseUrl && baseUrl.trim().length !== 0 ? { serverURL: baseUrl } : {}),
      security: {
        apiKeyAuth: apiKey,
      },
    });
  }
 
  // Reads a single file from the local filesystem and partitions it.
  async loadUnstructuredDirectoryData(
    params: LoadUnstructuredDirectoryDataParams
  ) {
    const { filePath } = params;
    const fileContent = fs.readFileSync(filePath);
    return this.processFileData({ ...params, fileContent });
  }
 
  async loadUnstructuredFileData(params: LoadUnstructuredFileDataParams) {
    return this.processFileData(params);
  }
 
  private async processFileData({
    fileContent,
    fileName,
    options,
    returnText,
  }: LoadUnstructuredFileDataParams) {
    try {
      const res: PartitionResponse = await this.client.general.partition({
        partitionParameters: {
          files: {
            content: fileContent,
            fileName,
          },
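          ...options,
          // Resolve the strategy after spreading options, so an explicit
          // `strategy: undefined` cannot override the Auto default.
          strategy: options?.strategy ?? Strategy.Auto,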
        },
      });
 
      if (res.statusCode !== 200) {
        throw new Error(`Unexpected status code: ${res.statusCode}`);
      }
 
      if (!res.elements || res.elements.length === 0) {
        throw new Error("No elements returned in the response");
      }
 
      return returnText ? this.extractText(res.elements) : res.elements;
    } catch (error: any) {
      throw new Error(`Error processing file data: ${error.message}`);
    }
  }
 
  private extractText(elements: Array<{ [k: string]: any }>): string {
    return elements
      .map((el) => el.text)
      .filter(Boolean)
      .join("\n");
  }
}
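When returnText is true, the loader returns a single string built by extractText: it takes the text field of every element, drops elements without text, and joins the rest with newlines. The standalone sketch below mirrors that logic; the sample elements are made up for illustration and only loosely resemble real Unstructured output.

```typescript
// Standalone mirror of UnstructuredLoader.extractText, for illustration.
// Elements are modeled loosely as records; only the `text` field is read.
type PartitionElement = { [k: string]: any };

function extractText(elements: PartitionElement[]): string {
  return elements
    .map((el) => el.text) // pull each element's text (may be undefined)
    .filter(Boolean)      // drop undefined, null and empty strings
    .join("\n");          // one line per remaining element
}

// Made-up sample elements: the image element carries no text.
const sample: PartitionElement[] = [
  { type: "Title", text: "Quarterly report" },
  { type: "Image" },
  { type: "NarrativeText", text: "Revenue grew." },
];

console.log(extractText(sample));
// Quarterly report
// Revenue grew.
```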
 
 

Usage

Initialize client

Initialize the UnstructuredLoader client.

import { UnstructuredLoader } from "@utils/unstructuredLoader";
 
const loader = new UnstructuredLoader({
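  // Fall back to "" so the constructor's own check reports a missing key
  // (and so the value type-checks as a string).
  apiKey: process.env.UNSTRUCTURED_API_KEY ?? "",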
});
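When self-hosting, pass baseUrl as well. The constructor only forwards it to UnstructuredClient as serverURL when it is a non-blank string; otherwise the SDK uses its default server. The sketch below reproduces that option handling as a hypothetical standalone function, buildClientOptions, so the behavior can be checked without the SDK; the localhost URL is only an example.

```typescript
// Shape of the options the loader passes to UnstructuredClient.
interface ClientOptions {
  serverURL?: string;
  security: { apiKeyAuth: string };
}

// Hypothetical mirror of the UnstructuredLoader constructor's option handling.
function buildClientOptions(apiKey: string, baseUrl?: string): ClientOptions {
  if (!apiKey || apiKey.trim().length === 0) {
    throw new Error("No API key provided for Unstructured!");
  }
  return {
    // Only include serverURL when a non-blank baseUrl was given.
    ...(baseUrl && baseUrl.trim().length !== 0 ? { serverURL: baseUrl } : {}),
    security: { apiKeyAuth: apiKey },
  };
}

// Hosted API (SDK default) vs. a self-hosted instance at an example URL.
console.log(buildClientOptions("my-key"));
console.log(buildClientOptions("my-key", "http://localhost:8000"));
```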
 

Load a file from the local filesystem

Provide the local file path to extract the file's content.

const elementsFromDirectory = await loader.loadUnstructuredDirectoryData({
  filePath: "./sample.png",
  fileName: "Sample_File",
  returnText: true,
});
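The example above hard-codes fileName. If you would rather keep the real name and extension, one option is to derive it from the path. The sketch below uses a hypothetical helper, readLocalFile, that returns the same { fileContent, fileName } pair that loadUnstructuredFileData accepts; a temporary text file stands in for ./sample.png.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { basename, join } from "node:path";

// Hypothetical helper: read a local file and derive fileName from its path.
function readLocalFile(filePath: string): { fileContent: Uint8Array; fileName: string } {
  const fileContent = readFileSync(filePath); // a Buffer, which is a Uint8Array subclass
  return { fileContent, fileName: basename(filePath) };
}

// Demo with a throwaway file in a fresh temporary directory.
const dir = mkdtempSync(join(tmpdir(), "unstructured-"));
const demoPath = join(dir, "sample.txt");
writeFileSync(demoPath, "hello");

const params = readLocalFile(demoPath);
console.log(params.fileName, params.fileContent.length); // sample.txt 5
```

The result can be passed straight to loader.loadUnstructuredFileData({ ...params, returnText: true }).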

Load files directly

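Files can also be loaded directly from memory. This example assumes the file arrives as multipart form data, for instance inside a route handler that has access to the incoming request object.

// `request` is the incoming Request object of your route handler.
const data = await request.formData();
const file = data.get("file");

if (!file || typeof file === "string") {
  throw new Error('Expected a file in the "file" form field');
}

const arrayBuffer = await file.arrayBuffer();
const uint8Array = new Uint8Array(arrayBuffer);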
 
const elementsFromFile = await loader.loadUnstructuredFileData({
  fileContent: uint8Array,
  fileName: "Sample_File",
  returnText: true,
});
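The validation and conversion steps above can be factored into a small helper. The sketch below is hypothetical (toFileParams and FileLike are not part of this component): it accepts any form entry, rejects missing entries and plain text fields, and falls back to a fixed name when the entry carries none. A File from FormData.get satisfies the FileLike shape.

```typescript
// Minimal structural type for a form file entry (File and Blob expose arrayBuffer()).
interface FileLike {
  arrayBuffer(): Promise<ArrayBufferLike>;
  name?: string;
}

// Hypothetical helper: validate a form entry and convert it to the
// { fileContent, fileName } shape expected by loadUnstructuredFileData.
async function toFileParams(
  entry: string | FileLike | null,
  fallbackName: string
): Promise<{ fileContent: Uint8Array; fileName: string }> {
  if (entry === null || typeof entry === "string") {
    throw new Error("Expected a file upload, got nothing or a plain text field");
  }
  const fileContent = new Uint8Array(await entry.arrayBuffer());
  // Prefer the uploaded file's own name; use the fallback if it is empty.
  return { fileContent, fileName: entry.name || fallbackName };
}

// Usage inside the handler:
// const params = await toFileParams(data.get("file"), "Sample_File");
// const elements = await loader.loadUnstructuredFileData({ ...params, returnText: true });
```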

Props

UnstructuredLoader

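Prop | Type | Description | Default
--- | --- | --- | ---
apiKey | string | The API key for Unstructured.io. Required; the constructor throws if it is missing or blank. | none
baseUrl | string? | Server URL to use when self-hosting the API. | The SDK's default server URL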

loadUnstructuredDirectoryData

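Prop | Type | Description
--- | --- | ---
filePath | string | Path to the file on the local filesystem.
fileName | string | Name of the file, sent to the API along with its content.
returnText | boolean? | If true, returns the text of all elements joined with newlines as a single string; otherwise returns the array of elements.
options | Omit<PartitionParameters, "files">? | Additional partition parameters as specified in the Unstructured documentation. strategy defaults to Strategy.Auto.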

loadUnstructuredFileData

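Prop | Type | Description
--- | --- | ---
fileContent | Uint8Array | Raw content of the file.
fileName | string | Name of the file, sent to the API along with its content.
returnText | boolean? | If true, returns the text of all elements joined with newlines as a single string; otherwise returns the array of elements.
options | Omit<PartitionParameters, "files">? | Additional partition parameters as specified in the Unstructured documentation. strategy defaults to Strategy.Auto.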
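Caller-supplied options are merged with a default strategy of Strategy.Auto. The sketch below shows that merge with plain strings so it runs without the SDK. PartitionOptions is a simplified, hypothetical subset of PartitionParameters, and the string values "auto" and "hi_res" are assumed to correspond to the SDK's Strategy enum members; check the Unstructured documentation for the full list.

```typescript
// Simplified, hypothetical subset of PartitionParameters.
type PartitionOptions = {
  strategy?: string;
  languages?: string[];
};

// Merge caller options over the default. Spreading `options` first and
// resolving `strategy` last means an explicit `strategy: undefined`
// still falls back to "auto" instead of overriding the default.
function withDefaults(options?: PartitionOptions): PartitionOptions {
  return { ...options, strategy: options?.strategy ?? "auto" };
}

console.log(withDefaults());                       // { strategy: 'auto' }
console.log(withDefaults({ strategy: "hi_res" })); // { strategy: 'hi_res' }
```

Putting the default last is what keeps it from being silently replaced by an undefined value, while every other option passes through unchanged.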

Credits

This component is built on top of the Unstructured TypeScript SDK (unstructured-client).