API Reference

readme_ready.index.index

Utility to Index a Repository and Store It in a Vector Store

index

index(config)

Indexes a repository to generate documentation and vector store files

Processes the repository specified in the config to create JSON files, converts them to Markdown format, and builds a vector store from the Markdown documents. Creates the necessary directories for JSON, Markdown, and data outputs as specified in the configuration.

Parameters:	`config` (`AutodocRepoConfig`) – An AutodocRepoConfig instance containing configuration settings for indexing, including output paths, repository details, and processing options.
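The directory setup mentioned above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `prepare_output_dirs` and the `docs/json`, `docs/markdown`, and `docs/data` layout are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def prepare_output_dirs(output_root: str) -> dict:
    """Create one subdirectory per stage (directory names are hypothetical)."""
    paths = {}
    for stage in ("json", "markdown", "data"):
        stage_dir = Path(output_root) / "docs" / stage
        # parents/exist_ok make the call safe to repeat when re-indexing.
        stage_dir.mkdir(parents=True, exist_ok=True)
        paths[stage] = stage_dir
    return paths


with tempfile.TemporaryDirectory() as tmp:
    dirs = prepare_output_dirs(tmp)
    created = sorted(p.name for p in (Path(tmp) / "docs").iterdir())
```

Creating all three directories up front means each later stage (summaries, Markdown conversion, vector store) can write its output without checking whether its target exists.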

readme_ready.index.convert_json_to_markdown

Utility to Convert Summary JSON to Markdown

convert_json_to_markdown

convert_json_to_markdown(config)

Convert JSON summary to Markdown documents

Traverses the root directory, finds the summary JSON for each file and directory, and converts each one into Markdown format.

Parameters:	`config` (`AutodocRepoConfig`) – An AutodocRepoConfig instance containing configuration settings for indexing, including output paths, repository details, and processing options.
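The traversal described above might look roughly like the sketch below. It uses only the standard library; the function name `json_summaries_to_markdown` and the `summary` field are assumptions for illustration, since this reference does not document the JSON schema.

```python
import json
import tempfile
from pathlib import Path


def json_summaries_to_markdown(json_root: str, markdown_root: str) -> int:
    """Mirror every *.json summary under json_root as a .md file."""
    written = 0
    for json_path in Path(json_root).rglob("*.json"):
        data = json.loads(json_path.read_text(encoding="utf-8"))
        summary = data.get("summary", "")  # hypothetical field name
        if not summary:
            continue  # nothing to document for this entry
        # Keep the same relative layout, swapping the extension.
        rel = json_path.relative_to(json_root).with_suffix(".md")
        out_path = Path(markdown_root) / rel
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(summary + "\n", encoding="utf-8")
        written += 1
    return written


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "json", Path(tmp) / "markdown"
    (src / "pkg").mkdir(parents=True)
    (src / "pkg" / "mod.json").write_text(
        json.dumps({"summary": "Does X."}), encoding="utf-8"
    )
    count = json_summaries_to_markdown(str(src), str(dst))
    result = (dst / "pkg" / "mod.md").read_text(encoding="utf-8")
```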

readme_ready.index.create_vector_store

Utilities to Create Vector Store

RepoLoader

Bases: BaseLoader

Class to load and process a repository

A loader class that loads and processes a repository, given the root directory path and a list of file patterns to ignore.

Typical usage example:

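from readme_ready.index.create_vector_store import RepoLoader
# path is the repository root; ignore is a list of file patterns to skip
loader = RepoLoader(path, ignore)
docs = loader.load()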

create_vector_store

create_vector_store(root, output, ignore, llms, device)

Creates a vector store from Markdown documents

Loads documents from the specified root directory, splits the text into chunks, creates a vector store using the selected LLM, and saves the vector store to the output path. Ignores files matching the patterns provided in the ignore list.

Parameters:

root (str) – The root directory containing the documents to be processed.
output (str) – The directory where the vector store will be saved.
ignore (List[str]) – A list of file patterns to ignore during document loading.
llms (List[LLMModels]) – A list of LLMModels to use for generating embeddings.
device (str) – The device to use for embedding generation (e.g., 'cpu' or 'auto').
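The "splits the text into chunks" step can be illustrated with a small character-based splitter. This is a stand-in under stated assumptions: the reference does not say which splitter or chunk sizes the library uses, so `split_text`, `chunk_size`, and `overlap` are hypothetical names chosen for the sketch.

```python
def split_text(text: str, chunk_size: int, overlap: int) -> list:
    """Split text into fixed-size character chunks that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, overlap=1)
```

Overlapping chunks keep a little shared context between neighbours, so a sentence that straddles a boundary is still retrievable from at least one chunk.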

process_directory

process_directory(directory_path, ignore)

Processes a directory

Processes a specified directory, and converts all the content of the files in the directory into Document format. Ignores files matching the patterns provided in the ignore list.

Parameters:	`directory_path` (`str`) – The root directory containing the files to be processed. `ignore` (`List[str]`) – A list of file patterns to ignore during processing.

Returns:	`docs` (`List[Document]`) – List of Documents with file contents and metadata.

process_file

process_file(file_path, ignore)

Processes a file

Processes a specified file and converts the content of the file into Document format. Ignores any file matching the patterns provided in the ignore list.

Parameters:	`file_path` (`str`) – The file to be processed. `ignore` (`List[str]`) – A list of file patterns to ignore during processing.

Returns:	`doc` (`Document | None`) – A Document with file contents and metadata, or None if the file is ignored.
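The behavior of `process_file` and `process_directory` can be approximated with the standard library as shown below. This is only a sketch: the `Document` dataclass is a stand-in for the real type, the helper names are hypothetical, and matching patterns against base names with `fnmatch` is an assumed rule that may differ from the library's.

```python
import fnmatch
import os
import tempfile
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    """Stand-in for the Document type: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def is_ignored(path: str, ignore: List[str]) -> bool:
    # Match glob patterns against the base name (assumed matching rule).
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(name, pattern) for pattern in ignore)


def load_file(file_path: str, ignore: List[str]) -> Optional[Document]:
    """Return a Document for the file, or None when it is ignored."""
    if is_ignored(file_path, ignore):
        return None
    with open(file_path, encoding="utf-8") as handle:
        return Document(handle.read(), {"source": file_path})


def load_directory(directory_path: str, ignore: List[str]) -> List[Document]:
    """Collect Documents for every non-ignored file under the directory."""
    docs = []
    for root, dirs, files in os.walk(directory_path):
        # Prune ignored directories in place so os.walk skips them.
        dirs[:] = [d for d in dirs if not is_ignored(d, ignore)]
        for name in sorted(files):
            doc = load_file(os.path.join(root, name), ignore)
            if doc is not None:
                docs.append(doc)
    return docs


with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("a.md", "# A"), ("b.svg", "<svg/>")]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(text)
    loaded = load_directory(tmp, ignore=["*.svg"])
    contents = [d.page_content for d in loaded]
```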

readme_ready.index.process_repository

Utilities to Process Repository and Summarize File Contents

process_repository

process_repository(config, dry_run=False)

Process a repository to generate JSON summary using LLMs

Traverses the repository, summarizes the contents of each file and directory using an LLM with a summarization prompt, and saves the summaries to JSON files.

Parameters:	`config` (`AutodocRepoConfig`) – An AutodocRepoConfig instance containing configuration settings for indexing, including output paths, repository details, and processing options. `dry_run` (`bool`, default: `False`) – Flag to enable dry-run mode, in which the process traverses the directory without actually indexing the documents.
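The effect of `dry_run` can be illustrated with a simplified traversal. This sketch assumes a hypothetical `summarize_tree` helper and a placeholder `fake_llm` function in place of a real model call; it returns a dictionary rather than writing JSON files, purely to keep the example short.

```python
import os
import tempfile
from typing import Callable, Dict


def summarize_tree(root: str, summarize: Callable[[str], str],
                   dry_run: bool = False) -> Dict[str, str]:
    """Walk root and summarize each file; in dry-run mode only list files."""
    summaries = {}
    for current, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(current, name)
            rel = os.path.relpath(path, root)
            if dry_run:
                summaries[rel] = ""  # visited, but no model call is made
                continue
            with open(path, encoding="utf-8") as handle:
                summaries[rel] = summarize(handle.read())
    return summaries


def fake_llm(text: str) -> str:
    # Placeholder for a real model call.
    return f"{len(text)} characters"


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "main.py"), "w", encoding="utf-8") as f:
        f.write("print('hi')")
    dry = summarize_tree(tmp, fake_llm, dry_run=True)
    real = summarize_tree(tmp, fake_llm)
```

A dry run is useful for checking which files would be visited before spending time or API calls on summarization.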

readme_ready.query.query

Utility to Query a Code Repository or Generate README

query

query(repo_config, user_confg)

Queries the repository for information based on user input.

Initializes a question-answering chain, displays a welcome message, and enters a loop to prompt the user for questions about the repository. Processes each question by invoking the QA chain, updates the chat history, and displays the response in Markdown format. The loop continues until the user inputs 'exit'.

Parameters:	`repo_config` (`AutodocRepoConfig`) – An AutodocRepoConfig instance containing configuration settings for the repository. `user_confg` (`AutodocUserConfig`) – An AutodocUserConfig instance containing user-specific configuration settings.
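The question loop described above can be sketched without any model dependencies. In this illustration, `chat_loop` and `echo_chain` are hypothetical names, the questions come from a list instead of a terminal prompt, and the placeholder chain simply echoes its input.

```python
from typing import Callable, Iterable, List, Tuple

History = List[Tuple[str, str]]


def chat_loop(answer: Callable[[str, History], str],
              questions: Iterable[str]) -> History:
    """Answer questions until 'exit', keeping (question, answer) history."""
    history: History = []
    for question in questions:
        if question.strip() == "exit":
            break
        reply = answer(question, history)
        # Later questions can see earlier turns through the history.
        history.append((question, reply))
    return history


def echo_chain(question: str, history: History) -> str:
    # Placeholder for invoking a QA chain with the question and chat history.
    return f"answer {len(history) + 1}: {question}"


transcript = chat_loop(
    echo_chain, ["What is this?", "How to run?", "exit", "ignored"]
)
```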

generate_readme

generate_readme(repo_config, user_config, readme_config)

Generates a README file based on repository and user configurations.

Initializes a README generation chain, clears the terminal, and prepares the output file. Iterates over the specified headings in the README configuration, generates content for each section by invoking the chain, and writes the content in Markdown format to the README file. Handles any RuntimeError that occurs during the process.

Parameters:	`repo_config` (`AutodocRepoConfig`) – An AutodocRepoConfig instance containing configuration settings for the repository. `user_config` (`AutodocUserConfig`) – An AutodocUserConfig instance containing user-specific configuration settings. `readme_config` (`AutodocReadmeConfig`) – An AutodocReadmeConfig instance containing configuration settings for README generation.
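The per-heading generation loop can be sketched as follows. The names `write_readme` and `fake_chain` are hypothetical, the `##` heading level is an assumption, and the placeholder function stands in for invoking the real README generation chain.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def write_readme(headings: List[str], generate: Callable[[str], str],
                 output_path: Path) -> None:
    """Write one Markdown section per heading; stop cleanly on RuntimeError."""
    with output_path.open("w", encoding="utf-8") as readme:
        try:
            for heading in headings:
                body = generate(heading)
                readme.write(f"## {heading}\n\n{body}\n\n")
        except RuntimeError as error:
            # Keep the sections written so far and report the failure.
            print(f"README generation stopped: {error}")


def fake_chain(heading: str) -> str:
    # Placeholder for invoking a README generation chain.
    return f"Text for {heading}."


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "README.md"
    write_readme(["Description", "Requirements"], fake_chain, target)
    readme_text = target.read_text(encoding="utf-8")
```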

readme_ready.query.create_chat_chain

Creates Chains for QA Chat or README Generation

make_qa_chain

make_qa_chain(project_name, repository_url, content_type, chat_prompt, target_audience, vector_store, llms, device='cpu', on_token_stream=False)

Creates a question-answering (QA) chain for the specified project

Initializes and configures the QA chain using the provided repository and user configurations. Selects the appropriate language model (LLM), sets up the retriever with a history-aware mechanism, and combines document chains for processing queries. The chain facilitates interaction with the vector store to retrieve and process relevant information based on user queries.

Parameters:

project_name (str) – The name of the project for which the QA chain is being created.
repository_url (str) – The URL of the repository containing the project.
content_type (str) – The type of content to be processed (e.g., 'code', 'documentation').
chat_prompt (str) – The prompt template used for generating chat responses.
target_audience (str) – The intended audience for the QA responses.
vector_store (HNSWLib) – An instance of HNSWLib representing the vector store containing document embeddings.
llms (List[LLMModels]) – A list of LLMModels to select from for generating embeddings and responses.
device (str, default: 'cpu') – The device to use for model inference.
on_token_stream (bool, default: False) – Whether to handle token streams during model inference.

Returns:	`Runnable` – A retrieval chain configured for question-answering, combining the retriever and document processing chain.
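The shape of such a chain (a history-aware retrieval step followed by a document-combining step) can be shown with plain Python callables. This is a toy sketch, not the library's chain: `make_toy_qa_chain`, `keyword_retriever`, and `fake_llm` are hypothetical, and the history is folded into the query by simple concatenation rather than by asking a model to rephrase the question.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]


def make_toy_qa_chain(retrieve: Callable[[str], List[str]],
                      llm: Callable[[str], str]) -> Callable[[str, History], str]:
    """Compose a history-aware retrieval step with an answer step."""

    def chain(question: str, history: History) -> str:
        # 1. Fold earlier questions into the search query (history-aware step).
        query = " ".join([q for q, _ in history] + [question])
        # 2. Fetch supporting documents for the combined query.
        context = "\n".join(retrieve(query))
        # 3. Put the documents and the question into one prompt.
        return llm(f"Context:\n{context}\n\nQuestion: {question}")

    return chain


docs = ["index builds the vector store", "query starts a chat loop"]


def keyword_retriever(query: str) -> List[str]:
    # Placeholder retriever: keep documents sharing a word with the query.
    words = set(query.lower().split())
    return [d for d in docs if words & set(d.split())]


def fake_llm(prompt: str) -> str:
    # Placeholder model: report how many context lines it received.
    lines = prompt.split("\n\nQuestion:")[0].splitlines()[1:]
    return f"{len(lines)} document(s) used"


qa = make_toy_qa_chain(keyword_retriever, fake_llm)
first = qa("what does index do", [])
second = qa("and chat", [("what does index do", first)])
```

The second call retrieves more context than the first because the earlier question is carried along in the query, which is the point of making the retriever history-aware.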

make_readme_chain

make_readme_chain(project_name, repository_url, content_type, chat_prompt, target_audience, vector_store, llms, peft_model=None, device='cpu', on_token_stream=False)

Creates a README generation chain for the specified project

Initializes and configures the README generation chain using the provided repository, user, and README configurations. Selects the appropriate language model (LLM), sets up the document processing chain with the specified prompts, and integrates with the vector store to generate comprehensive README sections based on project data. The chain facilitates automated generation of README files tailored to the project's specifications.

Parameters:

project_name (str) – The name of the project for which the README is being generated.
repository_url (str) – The URL of the repository containing the project.
content_type (str) – The type of content to be included in the README (e.g., 'overview', 'installation').
chat_prompt (str) – The prompt template used for generating README content.
target_audience (str) – The intended audience for the README.
vector_store (HNSWLib) – An instance of HNSWLib representing the vector store containing document embeddings.
llms (List[LLMModels]) – A list of LLMModels to select from for generating README content.
peft_model (str | None, default: None) – An optional PEFT (Parameter-Efficient Fine-Tuning) model to use for generation.
device (str, default: 'cpu') – The device to use for model inference.
on_token_stream (bool, default: False) – Whether to handle token streams during model inference.

Returns:	`Runnable` – A retrieval chain configured for README generation, combining the retriever and document processing chain.

readme_ready.types

Utility Classes for README generation

AutodocReadmeConfig

Configuration class for managing README-specific settings in the README generation process.

Attributes:	`headings` (`str`) – A comma separated list of headings to include in the README. The input string is split by commas and stripped of extra whitespace.

Typical usage example:

readme_config = AutodocReadmeConfig(
    headings = "Description,Requirements"
)
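The splitting behavior described for `headings` can be sketched as below. The function name `parse_headings` is hypothetical; the logic follows the description above (split on commas, strip surrounding whitespace).

```python
def parse_headings(headings: str) -> list:
    """Split a comma-separated string and trim whitespace around each item."""
    return [part.strip() for part in headings.split(",")]


parsed = parse_headings("Description, Requirements ,Usage")
```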

AutodocRepoConfig

Configuration class for managing the README generation process of a repository.

Attributes:

name (str) – The name of the repository.
repository_url (str) – The URL of the repository to be documented.
root (str) – The root directory of the repository.
output (str) – The directory where the generated README will be stored.
llms (List[LLMModels]) – A list of language models to be used in the documentation process.
priority (Priority) – The priority level for processing tasks.
max_concurrent_calls (int) – The maximum number of concurrent calls allowed during processing.
add_questions (bool) – Whether to include generated questions in the documentation.
ignore (List[str]) – A list of file or directory patterns to be excluded from documentation.
file_prompt (str) – The template or prompt to process individual files.
folder_prompt (str) – The template or prompt to process folders.
chat_prompt (str) – The template or prompt for chatbot interactions.
content_type (str) – The type of content being documented (e.g., code, docs).
target_audience (str) – The intended audience for the documentation.
link_hosted (bool) – Whether to generate hosted links in the documentation.
peft_model_path (str | None) – Path to a PEFT (Parameter-Efficient Fine-Tuning) model, if applicable.
device (str | None) – The device to be used for processing (e.g., "cpu", "auto").

Typical usage example:

from readme_ready.types import AutodocRepoConfig, LLMModels

model = LLMModels.LLAMA2_7B_CHAT_GPTQ
repo_config = AutodocRepoConfig(
    name = "<REPOSITORY_NAME>",
    root = "<REPOSITORY_ROOT_DIR_PATH>",
    repository_url = "<REPOSITORY_URL>",
    output = "<OUTPUT_DIR_PATH>",
    llms = [model],
    peft_model_path = "<PEFT_MODEL_NAME_OR_PATH>",
    ignore = [
        ".*",
        "*package-lock.json",
        "*package.json",
        "node_modules",
        "*dist*",
        "*build*",
        "*test*",
        "*.svg",
        "*.md",
        "*.mdx",
        "*.toml"
    ],
    file_prompt = "",
    folder_prompt = "",
    chat_prompt = "",
    content_type = "docs",
    target_audience = "smart developer",
    link_hosted = True,
    priority = None,
    max_concurrent_calls = 50,
    add_questions = False,
    device = "auto",
)

AutodocUserConfig

Configuration class for managing user-specific settings in the README generation process.

Attributes:	`llms` (`List[LLMModels]`) – A list of language models available for the user to utilize. `streaming` (`bool`) – Whether to enable streaming during the documentation process. Defaults to False.

Typical usage example:

model = LLMModels.LLAMA2_7B_CHAT_GPTQ
user_config = AutodocUserConfig(
    llms = [model]
)

LLMModels

Bases: str, Enum

Supported Large Language Models (LLMs) for README generation task.

Members

GPT3 (str): OpenAI GPT-3.5-turbo model.
GPT4 (str): OpenAI GPT-4 model.
GPT432k (str): OpenAI GPT-4-32k model with extended context window.
TINYLLAMA_1p1B_CHAT_GGUF (str): TinyLlama 1.1B Chat model from TheBloke with GGUF format.
GOOGLE_GEMMA_2B_INSTRUCT_GGUF (str): Gemma 2B Instruction model in GGUF format by bartowski.
LLAMA2_7B_CHAT_GPTQ (str): LLaMA 2 7B Chat model using GPTQ from TheBloke.
LLAMA2_13B_CHAT_GPTQ (str): LLaMA 2 13B Chat model using GPTQ from TheBloke.
CODELLAMA_7B_INSTRUCT_GPTQ (str): CodeLlama 7B Instruction model using GPTQ from TheBloke.
CODELLAMA_13B_INSTRUCT_GPTQ (str): CodeLlama 13B Instruction model using GPTQ from TheBloke.
LLAMA2_7B_CHAT_HF (str): LLaMA 2 7B Chat model hosted on Hugging Face.
LLAMA2_13B_CHAT_HF (str): LLaMA 2 13B Chat model hosted on Hugging Face.
CODELLAMA_7B_INSTRUCT_HF (str): CodeLlama 7B Instruction model hosted on Hugging Face.
CODELLAMA_13B_INSTRUCT_HF (str): CodeLlama 13B Instruction model hosted on Hugging Face.
GOOGLE_GEMMA_2B_INSTRUCT (str): Gemma 2B Instruction model by Google.
GOOGLE_GEMMA_7B_INSTRUCT (str): Gemma 7B Instruction model by Google.
GOOGLE_CODEGEMMA_2B (str): CodeGemma 2B model by Google for code-related tasks.
GOOGLE_CODEGEMMA_7B_INSTRUCT (str): CodeGemma 7B Instruction model by Google.

Typical usage example:

model = LLMModels.LLAMA2_7B_CHAT_GPTQ
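Because the enum derives from both `str` and `Enum`, its members behave like strings. The sketch below demonstrates this with a hypothetical `ExampleModels` enum; the member names and values are placeholders, not the real values of `LLMModels`.

```python
from enum import Enum


class ExampleModels(str, Enum):
    """Hypothetical enum; the member values are placeholders."""
    SMALL = "example-org/small-model"
    LARGE = "example-org/large-model"


model = ExampleModels.SMALL
# A str-based member compares equal to its plain string value.
same_as_string = model == "example-org/small-model"
# Members can also be looked up by value.
looked_up = ExampleModels("example-org/large-model")
```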