+ Added/refactored new custom GPTs scripts

- added gen_gpt_templ script
- improved Custom GPTs template generator
Elias Bachaalany
2025-07-30 05:39:03 -07:00
parent 612584a66c
commit 58e8bd1e72
9 changed files with 900 additions and 110 deletions

Tools/README.md Normal file

@@ -0,0 +1,13 @@
# TheBigPromptLibrary Tools
This directory contains various tools and utilities for working with ChatGPT's custom GPTs.
## Available Tools
- A [collection of scripts](./openai_gpts/README.md) for managing and working with ChatGPT Custom GPTs
## License
These tools are open-sourced under the GNU General Public License (GPL). Under this license, you are free to use, modify, and redistribute this software, provided that all copies and derivative works are also licensed under the GPL.
For more details, see the [GPLv3 License](https://www.gnu.org/licenses/gpl-3.0.html).

Tools/openai_gpts/README.md Normal file

@@ -0,0 +1,177 @@
# Custom GPTs scripts and tools
This directory contains utilities for working with ChatGPT Custom GPTs in TheBigPromptLibrary:
- **idxtool.py** - GPT indexing and searching tool
- **gen_gpt_templ.py** - Generate markdown templates for ChatGPT GPTs by downloading and parsing their metadata
- **oneoff.py** - One-off operations on GPT files (e.g., batch reformatting)
## idxtool
The `idxtool` script is a Custom GPT indexing and searching tool used in TheBigPromptLibrary.
### Command line
```
usage: idxtool.py [-h] [--toc [TOC]] [--find-gpt FIND_GPT]
[--template TEMPLATE] [--parse-gptfile PARSE_GPTFILE]
[--rename]
idxtool: A GPT indexing and searching tool for the CSP repo
options:
-h, --help show this help message and exit
--toc [TOC] Rebuild the table of contents of GPT custom instructions
--find-gpt FIND_GPT Find a GPT file by its ID or full ChatGPT URL
--template TEMPLATE Creates an empty GPT template file from a ChatGPT URL
--parse-gptfile PARSE_GPTFILE
Parses a GPT file name
--rename Rename the GPT file names to include their GPT ID
```
### Features
- Rebuild TOC: Use `--toc` to rebuild the table of contents for GPT custom instructions.
- Find GPT file: Use `--find-gpt [GPT ID | full ChatGPT URL | @response_file with IDs/URLs]` to find a GPT by its ID or URL.
- Rename GPT files: Use `--rename` to rename all the GPT files to include their GPT ID as a prefix.
- Create a starter template GPT file: Use `--template [Full ChatGPT URL]` to create a starter template GPT file.
- Help: Use `--help` to display the help message and usage instructions.
### Example
To rebuild the table of contents for the custom GPT files, run:
```bash
python idxtool.py --toc
```
To find a GPT by its ID, run:
```bash
python idxtool.py --find-gpt 3rtbLUIUO
```
or by URL:
```bash
python idxtool.py --find-gpt https://chat.openai.com/g/g-svehnI9xP-retro-adventures
```
Additionally, you can have a file with a list of IDs or URLs and pass it to the `--find-gpt` option:
```bash
python idxtool.py --find-gpt @gptids.txt
```
(note the '@' symbol).
The `gptids.txt` file contains a list of IDs or URLs, one per line:
```text
3rtbLUIUO
https://chat.openai.com/g/g-svehnI9xP-retro-adventures
#vYzt7bvAm
w2yOasK1r
waDWNw2J3
```
## gen_gpt_templ
The `gen_gpt_templ` script generates markdown templates for ChatGPT GPTs by downloading and parsing their metadata from the ChatGPT website.
### Command line
```bash
usage: gen_gpt_templ.py [-h] [--debug] [--dump] [input]
Generate markdown template for ChatGPT GPTs
positional arguments:
input GPT URL, GPT ID, g-prefixed GPT ID, or @response_file
options:
-h, --help show this help message and exit
--debug Save debug files (HTML and dump)
--dump Save parsed names and values to .txt file
```
### Features
- Downloads GPT metadata from ChatGPT URLs
- Parses GPT information including title, description, author, and profile picture
- Generates markdown templates with GPT metadata
- Supports multiple input formats:
- Full ChatGPT URL: `https://chatgpt.com/g/g-VgbIr9TQQ-ida-pro-c-sdk-and-decompiler`
- Conversation URL: `https://chatgpt.com/g/g-m5lMeGifF-sql-expert-querygpt/c/682cd38c-ca8c-800d-b6e2-33b8ba763824`
- GPT ID: `VgbIr9TQQ`
- Prefixed GPT ID: `g-VgbIr9TQQ`
- Response file: `@gptids.txt` (processes multiple GPTs from a file)
### Examples
Generate template for a single GPT:
```bash
python gen_gpt_templ.py https://chatgpt.com/g/g-VgbIr9TQQ-ida-pro-c-sdk-and-decompiler
```
Process multiple GPTs from a file:
```bash
python gen_gpt_templ.py @gptids.txt
```
Generate template with debug output:
```bash
python gen_gpt_templ.py g-VgbIr9TQQ --debug --dump
```
## Differences between idxtool and gen_gpt_templ
### idxtool --template
- Uses gen_gpt_templ internally to download actual GPT metadata from ChatGPT
- Creates templates with real GPT information (title, description, author, logo)
- Generates properly named files (`{gpt_id}.md`) without RENAMEME suffix
- Simpler interface for basic template generation within the idxtool workflow
### gen_gpt_templ
- Full-featured standalone tool with additional capabilities:
- `--dump` flag to save all parsed metadata to .txt file
- `--debug` flag to save HTML and debug information
- Batch processing with @response_file for multiple GPTs
- More detailed console output showing download and parsing progress
- Can be used as a module by other tools (like idxtool)
Use `idxtool --template` when you need a quick template as part of your GPT file management workflow. Use `gen_gpt_templ` directly when you need the advanced features like metadata dumping or batch processing.
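As a rough sketch of the module-style usage (assuming the script is run from the `Tools/openai_gpts` directory so `gen_gpt_templ` is importable), the same `process_gpt_input`/`generate_template` calls that `idxtool --template` relies on can be used directly:

```python
# Sketch: using gen_gpt_templ as a module, the way idxtool's make_template does.
# Assumes gen_gpt_templ.py is on the import path (e.g., the current directory).
import gen_gpt_templ

url, gpt_id = gen_gpt_templ.process_gpt_input("g-VgbIr9TQQ")
success, result = gen_gpt_templ.generate_template(url, debug=False, dump=False)
if success:
    # result is a GenerateTemplateResult(template, short_url, gpt_id, parser)
    with open(f"{result.gpt_id}.md", "w", encoding="utf-8") as f:
        f.write(result.template)
else:
    print(f"Error: {result}")
```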
## oneoff
The `oneoff` script performs one-off operations on GPT files, primarily batch processing tasks.
### Features
- **Reformat GPT files**: Reformats all GPT markdown files in a source directory and saves them to a destination directory
- Validates GPT file structure during processing
- Preserves GPT metadata (ID, name) during reformatting
### Usage
The script is designed for batch operations. Currently supports:
1. **Batch reformatting**: Process all `.md` files in a source directory, reformat them according to the standard GPT markdown structure, and save to a destination directory.
Example usage in code:
```python
from oneoff import reformat_gpt_files
success, message = reformat_gpt_files("source_gpts/", "formatted_gpts/")
print(message)
```
## License
This tool is open-sourced under the GNU General Public License (GPL). Under this license, you are free to use, modify, and redistribute this software, provided that all copies and derivative works are also licensed under the GPL.
For more details, see the [GPLv3 License](https://www.gnu.org/licenses/gpl-3.0.html).

Tools/openai_gpts/gen_gpt_templ.py Normal file

@@ -0,0 +1,654 @@
"""
Generate markdown templates for ChatGPT GPTs by downloading and parsing their metadata.
By Elias Bachaalany
Usage:
    gen_gpt_templ.py <gpt_url|gpt_id|g-prefixed_id> [--debug] [--dump]
    gen_gpt_templ.py @response_file.txt [--debug] [--dump]
"""
import re
import json
import os
import sys
import argparse
import requests
from collections import namedtuple
# Named tuple for generate_template return value
GenerateTemplateResult = namedtuple('GenerateTemplateResult',
['template', 'short_url', 'gpt_id', 'parser'])
# Global template string
TEMPLATE = """GPT URL: https://chatgpt.com/g/{short_url}
GPT logo: <img src="{profile_pic}" width="100px" />
GPT Title: {title}
GPT Description: {description} - By {author_display_name}
GPT instructions:
```markdown
```"""
# ----------------------------------------------------------
def parse_gpt_id(url):
"""
Parse the GPT ID from a ChatGPT URL
Args:
url (str): Full ChatGPT URL like https://chatgpt.com/g/g-VgbIr9TQQ-ida-pro-c-sdk-and-decompiler
Returns:
str or None: The GPT ID (e.g., 'VgbIr9TQQ') or None if not found
"""
# Pattern to match g- followed by 9 characters
pattern = r'/g/g-([a-zA-Z0-9]{9})'
match = re.search(pattern, url)
if match:
return match.group(1)
return None
# ----------------------------------------------------------
# Compile regex to extract streamController.enqueue arguments
_ENQUEUE_RE = re.compile(
r'window\.__reactRouterContext\.streamController\.enqueue\(\s*' # find the call
r'(?P<q>["\'])' # capture whether it's " or '
r'(?P<raw>(?:\\.|(?!\1).)*?)' # any escaped-char or char not the opening quote
r'(?P=q)\s*' # matching closing quote
r'\)',
flags=re.DOTALL
)
# ----------------------------------------------------------
def extract_enqueue_args(html_text, decode_escapes=True):
"""
Scans html_text for all streamController.enqueue(...) calls,
returns a list of the raw string-literals inside the quotes.
"""
args = []
for m in _ENQUEUE_RE.finditer(html_text):
raw = m.group('raw')
if decode_escapes:
# Only decode actual escape sequences, not Unicode characters
# This prevents double-encoding of emojis and other Unicode chars
try:
# First try to parse as JSON string to handle escapes properly
raw = json.loads('"' + raw + '"')
            except json.JSONDecodeError:
# Fallback to simple replacement of common escapes
raw = raw.replace('\\n', '\n').replace('\\t', '\t').replace('\\"', '"').replace("\\'", "'").replace('\\\\', '\\')
args.append(raw)
return args
# ----------------------------------------------------------
class CustomGPTParser:
def __init__(self):
self._parse_cache = {} # Cache for parsed data
self._parsed_items = None # Store parsed items internally
def parse(self, source, debug: bool = False):
# Determine if source is a filename or content
# First check if it could be a file (avoid treating content as filename)
is_likely_filename = (
len(source) < 1000 and # Reasonable filename length
'|' not in source and # Filenames don't contain pipes
os.path.isfile(source)
)
if is_likely_filename:
try:
with open(source, encoding='utf-8') as f:
content = f.read()
except Exception as e:
return (False, f"Error reading file: {e}")
else:
# Treat as content
content = source
# Parse the content
        if not (enqueue_args := extract_enqueue_args(content)):
            msg = "No enqueue arguments found in the provided string."
            if debug:
                print(msg)
            return (False, msg)
try:
# Use the argument with the longest length (most likely the Gizmo data)
s = max(enqueue_args, key=len)
data = json.loads(s)
parsed_items = []
for item in data:
if isinstance(item, dict):
for k, v in item.items():
parsed_items.append((k, v))
else:
if debug:
print(f" {item} (type: {type(item).__name__})")
parsed_items.append(item)
self._parsed_items = parsed_items
return (True, None)
except json.JSONDecodeError as e:
return (False, f"JSON decoding error: {e}")
def get_title(self):
"""
Extract the title of the GPT by finding the item preceding 'description'.
The algorithm walks through items to find 'description', then returns
the immediately preceding item as the title.
Returns:
str: The title value or empty string on failure
"""
# Check cache first
if 'title' in self._parse_cache:
return self._parse_cache['title']
# Need parsed items to work with
if not self._parsed_items:
return ''
# Convert to list if not already to allow indexing
items_list = list(self._parsed_items)
# Find 'description' and get the preceding item
for i, item in enumerate(items_list):
# Skip tuples
if isinstance(item, tuple):
continue
# Found 'description'?
if item == 'description' and i > 0:
# Get the previous item as title
prev_item = items_list[i - 1]
# Make sure it's a string value, not a tuple
if isinstance(prev_item, str):
self._parse_cache['title'] = prev_item
return prev_item
# Not found
return ''
def get_author_display_name(self):
"""
Extract the author display name by finding the item after 'user-{id}'.
The pattern is:
- 'user_id' (literal string)
- 'user-{actual_user_id}' (e.g., 'user-IUwuaeXwGuwv0UoRPaeEqlzs')
- '{author_display_name}' (e.g., 'Elias Bachaalany')
Returns:
str: The author display name or empty string on failure
"""
# Check cache first
if 'author_display_name' in self._parse_cache:
return self._parse_cache['author_display_name']
# Need parsed items to work with
if not self._parsed_items:
return ''
# Convert to list if not already to allow indexing
items_list = list(self._parsed_items)
# Find pattern: 'user_id' -> 'user-{id}' -> '{display_name}'
for i, item in enumerate(items_list):
# Skip tuples
if isinstance(item, tuple):
continue
# Found 'user_id'?
if item == 'user_id' and i + 2 < len(items_list):
# Check if next item is a user ID (starts with 'user-')
next_item = items_list[i + 1]
if isinstance(next_item, str) and next_item.startswith('user-'):
# The item after that should be the display name
display_name_item = items_list[i + 2]
if isinstance(display_name_item, str):
self._parse_cache['author_display_name'] = display_name_item
return display_name_item
# Not found
return ''
def get_str_value(self, name: str, default: str = None):
"""
Get a string value by name from the parsed items.
Args:
name: The key/name to search for
default: Default value if not found
Returns:
str: The value associated with the name or default
"""
# Check cache first
if name in self._parse_cache:
return self._parse_cache[name]
# Need parsed items to work with
if not self._parsed_items:
return default
# Search through items
it = iter(self._parsed_items)
for item in it:
# Handle tuple items (key-value pairs from dictionaries)
if isinstance(item, tuple):
# We don't handle tuple items now
continue
# Handle flat list items (name followed by value)
if item == name:
try:
val = next(it)
# Cache and return the value
self._parse_cache[name] = str(val)
return str(val)
except StopIteration:
return default
return default
def clear_cache(self):
"""Clear the internal cache"""
self._parse_cache.clear()
def get_parsed_items(self):
"""Get the parsed items (for backward compatibility)"""
return self._parsed_items if self._parsed_items else []
def dump(self, safe_ascii=True):
"""
Dump all parsed items in a formatted way.
Args:
safe_ascii (bool): If True, encode non-ASCII characters safely
Returns:
None (prints to stdout)
"""
if not self._parsed_items:
print("No parsed items to dump")
return
print(f"Dumping {len(self._parsed_items)} parsed items:")
print("-" * 60)
for item in self._parsed_items:
if isinstance(item, tuple) and len(item) == 2:
# Handle key-value pairs from dictionaries
k, v = item
if safe_ascii:
# Handle Unicode characters safely by encoding to ASCII with replacement
k_safe = str(k).encode('ascii', errors='replace').decode('ascii')
v_safe = str(v).encode('ascii', errors='replace').decode('ascii')
print(f" {k_safe}: {v_safe} (type: {type(v).__name__})")
else:
print(f" {k}: {v} (type: {type(v).__name__})")
else:
# Handle non-tuple items
if safe_ascii:
# Handle Unicode characters safely for non-dict items
item_safe = str(item).encode('ascii', errors='replace').decode('ascii')
print(f" {item_safe} (type: {type(item).__name__})")
else:
print(f" {item} (type: {type(item).__name__})")
print("-" * 60)
# ----------------------------------------------------------
def download_page(url: str, out_filename: str = '') -> tuple[bool, object]:
"""
Download a page using browser-like headers
Args:
url (str): The full URL to download
out_filename (str): Optional filename to save to. If empty, no file is written.
Returns:
tuple[bool, object]: (success, content/error_message)
- (True, content) if successful
- (False, error_message) if failed
"""
# Ensure we have a full URL
if not url.startswith('http'):
return (False, "Please provide a full URL starting with http:// or https://")
# Base headers from the sample request
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:139.0) Gecko/20100101 Firefox/139.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
# Remove Accept-Encoding to get uncompressed response
# 'Accept-Encoding': 'gzip, deflate, br, zstd',
'DNT': '1',
'Sec-GPC': '1',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'none',
'Sec-Fetch-User': '?1',
'Priority': 'u=0, i',
'TE': 'trailers'
}
try:
# Create a session to handle cookies and connections properly
session = requests.Session()
session.headers.update(headers)
# Make the GET request
response = session.get(url, timeout=30)
# Check if request was successful
response.raise_for_status()
# Save to file if filename provided
if out_filename:
with open(out_filename, 'w', encoding='utf-8') as f:
f.write(response.text)
return (True, response.text)
except requests.exceptions.RequestException as e:
return (False, f"Error downloading page: {e}")
except Exception as e:
return (False, f"Unexpected error: {e}")
# ----------------------------------------------------------
def process_gpt_input(input_str):
"""
Process GPT input which can be:
- Full URL: https://chatgpt.com/g/g-VgbIr9TQQ-ida-pro-c-sdk-and-decompiler
- Conversation URL: https://chatgpt.com/g/g-m5lMeGifF-sql-expert-querygpt/c/682cd38c-ca8c-800d-b6e2-33b8ba763824
- GPT ID: VgbIr9TQQ
- Prefixed GPT ID: g-VgbIr9TQQ
Returns:
tuple: (full_url, gpt_id)
"""
# Check if it's a full URL
if input_str.startswith('https://') or input_str.startswith('http://'):
gpt_id = parse_gpt_id(input_str)
if not gpt_id:
raise ValueError(f"Could not parse GPT ID from URL: {input_str}")
# If it's a conversation URL (contains /c/), extract the base GPT URL
if '/c/' in input_str:
# Extract the GPT part before /c/
base_url = input_str.split('/c/')[0]
return (base_url, gpt_id)
return (input_str, gpt_id)
# Check if it's a prefixed GPT ID (g-XXXXXXXXX)
if input_str.startswith('g-') and len(input_str) >= 11:
# Extract just the 9-character ID after 'g-'
gpt_id = input_str[2:11] # Get exactly 9 characters after 'g-'
url = f"https://chatgpt.com/g/{input_str}"
return (url, gpt_id)
# Assume it's a bare GPT ID (9 characters)
if len(input_str) == 9:
url = f"https://chatgpt.com/g/g-{input_str}"
return (url, input_str)
raise ValueError(f"Invalid GPT input format: {input_str}")
def generate_template(url, debug=False, dump=False):
"""
Download and parse GPT data, then generate markdown template
Args:
url: Full GPT URL
debug: Whether to save debug files (HTML and dump)
dump: Whether to print parsed items to console
Returns:
tuple: (success, result_or_error)
- (True, GenerateTemplateResult) if successful
- (False, error_message) if failed
"""
print(f"[DOWNLOAD] Fetching page from: {url}")
# Download the page
save_file = None
if debug:
save_file = "debug_download.html"
print(f"[DEBUG] Will save HTML to: {save_file}")
success, content = download_page(url, save_file)
if not success:
return (False, f"Download failed: {content}")
print(f"[DOWNLOAD] Successfully downloaded {len(content)} bytes")
# Parse the content
print(f"[PARSE] Parsing GPT data...")
parser = CustomGPTParser()
success, error = parser.parse(content)
if not success:
return (False, f"Parsing failed: {error}")
print(f"[PARSE] Successfully parsed {len(parser.get_parsed_items())} items")
# Save dump if debug mode
if debug:
from io import StringIO
old_stdout = sys.stdout
sys.stdout = buffer = StringIO()
parser.dump(safe_ascii=True)
dump_content = buffer.getvalue()
sys.stdout = old_stdout
dump_file = "debug_dump.txt"
with open(dump_file, 'w', encoding='utf-8') as f:
f.write(dump_content)
print(f"[DEBUG] Saved parsed data dump to: {dump_file}")
# Extract required fields
print(f"[EXTRACT] Extracting GPT metadata...")
short_url = parser.get_str_value('short_url', 'UNKNOWN')
profile_pic = parser.get_str_value('profile_picture_url', '')
title = parser.get_title()
description = parser.get_str_value('description', '')
author_display_name = parser.get_author_display_name()
print(f"[EXTRACT] Found:")
print(f" - Short URL: {short_url}")
print(f" - Title: {title}")
print(f" - Author: {author_display_name}")
try:
print(f" - Description: {description[:50]}..." if len(description) > 50 else f" - Description: {description}")
except UnicodeEncodeError:
# Handle special characters that can't be printed to console
safe_desc = description.encode('ascii', errors='replace').decode('ascii')
print(f" - Description: {safe_desc[:50]}..." if len(safe_desc) > 50 else f" - Description: {safe_desc}")
print(f" - Profile Pic: {'Yes' if profile_pic else 'No'}")
# Dump parsed items if requested
if dump:
print("\n[DUMP] Parsed items:")
parser.dump(safe_ascii=False)
# Generate template
template = TEMPLATE.format(
short_url=short_url,
profile_pic=profile_pic,
title=title,
description=description,
author_display_name=author_display_name
)
# Extract GPT ID from short_url (remove 'g-' prefix if it exists)
gpt_id = short_url[2:] if short_url.startswith('g-') else short_url
return (True, GenerateTemplateResult(template, short_url, gpt_id, parser))
def process_response_file(filename, debug=False, dump=False):
"""
Process a response file containing multiple GPT URLs/IDs
Args:
filename: Path to the response file
debug: Whether to save debug files
dump: Whether to dump parsed items
Returns:
tuple: (success_count, error_count)
"""
try:
with open(filename, 'r', encoding='utf-8') as f:
lines = f.readlines()
except Exception as e:
print(f"Error reading response file: {e}")
return (0, 1)
# Process each non-empty line
inputs = [line.strip() for line in lines if line.strip() and not line.strip().startswith('#')]
if not inputs:
print(f"No valid inputs found in {filename}")
return (0, 0)
print(f"\n{'=' * 70}")
print(f"PROCESSING RESPONSE FILE: {filename}")
print(f"Found {len(inputs)} items to process")
print(f"{'=' * 70}")
success_count = 0
error_count = 0
for i, input_str in enumerate(inputs, 1):
print(f"\n[ITEM {i}/{len(inputs)}] Processing: {input_str}")
print("-" * 60)
try:
# Process input
url, gpt_id = process_gpt_input(input_str)
print(f"[PARSED] Full URL: {url}")
print(f"[PARSED] GPT ID: {gpt_id}")
# Generate template
success, result = generate_template(url, debug, dump)
if success:
filename = f"{result.gpt_id}.md"
with open(filename, 'w', encoding='utf-8') as f:
f.write(result.template)
print(f"[SUCCESS] Template saved to: {filename}")
# Save dump file if requested
if dump:
dump_filename = f"{result.gpt_id}.txt"
from io import StringIO
old_stdout = sys.stdout
sys.stdout = buffer = StringIO()
result.parser.dump(safe_ascii=True)
dump_content = buffer.getvalue()
sys.stdout = old_stdout
with open(dump_filename, 'w', encoding='utf-8') as f:
f.write(dump_content)
print(f"[SUCCESS] Dump saved to: {dump_filename}")
success_count += 1
else:
print(f"[FAILED] Error: {result}")
error_count += 1
except Exception as e:
print(f"[ERROR] Exception: {e}")
error_count += 1
print(f"\n{'=' * 70}")
print(f"RESPONSE FILE COMPLETE")
print(f"Success: {success_count}, Errors: {error_count}")
print(f"{'=' * 70}")
return (success_count, error_count)
def main():
parser = argparse.ArgumentParser(description='Generate markdown template for ChatGPT GPTs')
parser.add_argument('input', nargs='?', help='GPT URL, GPT ID, g-prefixed GPT ID, or @response_file')
parser.add_argument('--debug', action='store_true', help='Save debug files (HTML and dump)')
parser.add_argument('--dump', action='store_true', help='Save parsed names and values to .txt file')
args = parser.parse_args()
# Check if input was provided
if not args.input:
parser.print_help()
return 1
try:
# Check if input is a response file
if args.input.startswith('@'):
# Process response file
filename = args.input[1:] # Remove the @ prefix
success_count, error_count = process_response_file(filename, args.debug, args.dump)
sys.exit(0 if error_count == 0 else 1)
else:
# Process single input
print(f"\n[INPUT] Processing: {args.input}")
url, gpt_id = process_gpt_input(args.input)
print(f"[PARSED] Full URL: {url}")
print(f"[PARSED] GPT ID: {gpt_id}")
# Generate template
success, result = generate_template(url, args.debug, args.dump)
if success:
# Save to file
filename = f"{result.gpt_id}.md"
with open(filename, 'w', encoding='utf-8') as f:
f.write(result.template)
print(f"Template saved to: {filename}")
# Save dump file if requested
if args.dump:
dump_filename = f"{result.gpt_id}.txt"
from io import StringIO
old_stdout = sys.stdout
sys.stdout = buffer = StringIO()
result.parser.dump(safe_ascii=True)
dump_content = buffer.getvalue()
sys.stdout = old_stdout
with open(dump_filename, 'w', encoding='utf-8') as f:
f.write(dump_content)
print(f"Dump saved to: {dump_filename}")
# Also print the template
print("\nGenerated template:")
print("=" * 50)
try:
print(result.template)
except UnicodeEncodeError:
# Handle special characters that can't be printed to console
safe_template = result.template.encode('ascii', errors='replace').decode('ascii')
print(safe_template)
else:
print(f"Error: {result}")
return 1
except Exception as e:
print(f"Error: {e}")
return 1
if __name__ == "__main__":
    sys.exit(main())

Tools/openai_gpts/gptparser.py Normal file

@@ -0,0 +1,167 @@
"""
GPT parsing module.
The GPT markdown files have to adhere to a very specific format described in the README.md file in the root of the CSP project.
"""
import os, re
from collections import namedtuple
from typing import Union, Tuple, Generator, Iterator
GPT_BASE_URLS = ('https://chat.openai.com/g/g-', 'https://chatgpt.com/g/g-')
GPT_BASE_URLS_L = [len(url) for url in GPT_BASE_URLS]
FIELD_PREFIX = 'GPT'
GPT_FILE_ID_RE = re.compile(r'^([0-9a-z]{9})_(.*)\.md$', re.IGNORECASE)
"""GPT file name regex with ID and name capture."""
GPT_FILE_VERSION_RE = re.compile(r'\[([^]]*)\]\.md$', re.IGNORECASE)
"""GPT file name regex with version capture."""
GptFieldInfo = namedtuple('FieldInfo', ['order', 'display'])
GptIdentifier = namedtuple('GptIdentifier', ['id', 'name'])
"""Description of the fields supported by GPT markdown files."""
SUPPORTED_FIELDS = {
'url': GptFieldInfo(10, 'URL'),
'title': GptFieldInfo(20, 'Title'),
'description': GptFieldInfo(30, 'Description'),
'logo': GptFieldInfo(40, 'Logo'),
'verif_status': GptFieldInfo(50, 'Verification Status'),
'instructions': GptFieldInfo(60, 'Instructions'),
'actions': GptFieldInfo(70, 'Actions'),
'kb_files_list': GptFieldInfo(80, 'KB Files List'),
'extras': GptFieldInfo(90, 'Extras'),
'protected': GptFieldInfo(100, 'Protected'),
}
"""
Dictionary of the fields supported by GPT markdown files:
- The key should always be in lower case
- The GPT markdown file will have the form: {FIELD_PREFIX} {key}: {value}
"""
class GptMarkdownFile:
"""
A class to represent a GPT markdown file.
"""
    def __init__(self, fields=None, filename: str = '') -> None:
        # Avoid a mutable default argument: each instance gets its own dict
        self.fields = fields if fields is not None else {}
        self.filename = filename
def get(self, key: str, strip: bool = True) -> Union[str, None]:
"""
Return the value of the field with the specified key.
:param key: str, key of the field.
:return: str, value of the field.
"""
key = key.lower()
if key == 'version':
m = GPT_FILE_VERSION_RE.search(self.filename)
return m.group(1) if m else ''
v = self.fields.get(key)
return v.strip() if strip else v
def id(self) -> Union[GptIdentifier, None]:
"""
Return the GPT identifier.
:return: GptIdentifier object.
"""
return parse_gpturl(self.fields.get('url'))
def __str__(self) -> str:
sorted_fields = sorted(self.fields.items(), key=lambda x: SUPPORTED_FIELDS[x[0]].order)
# Check if the field value contains the start marker of the markdown block and add a blank line before it
field_strings = []
for key, value in sorted_fields:
if value:
# Only replace the first occurrence of ```markdown
modified_value = value.replace("```markdown", "\r\n```markdown", 1)
field_string = f"{FIELD_PREFIX} {SUPPORTED_FIELDS[key].display}: {modified_value}"
field_strings.append(field_string)
return "\r\n".join(field_strings)
@staticmethod
    def parse(file_path: str) -> Tuple[bool, Union['GptMarkdownFile', str]]:
        """
        Parse a markdown file into a GptMarkdownFile object.
        :param file_path: str, path to the markdown file.
        :return: (True, GptMarkdownFile) on success, otherwise (False, error message).
        """
if not os.path.exists(file_path):
return (False, f"File '{file_path}' does not exist.")
with open(file_path, 'r', encoding='utf-8') as file:
fields = {key.lower(): [] for key in SUPPORTED_FIELDS.keys()}
            field_re = re.compile(rf"^\s*{FIELD_PREFIX}\s+({'|'.join(fields.keys())}):", re.IGNORECASE)
current_field = None
for line in file:
if m := field_re.match(line):
current_field = m.group(1).lower()
line = line[len(m.group(0)):].strip()
if current_field:
if current_field not in SUPPORTED_FIELDS:
return (False, f"Field '{current_field}' is not supported.")
fields[current_field].append(line)
gpt = GptMarkdownFile(
{key: ''.join(value) for key, value in fields.items()},
filename=file_path)
return (True, gpt)
def save(self, file_path: str) -> Tuple[bool, Union[str, None]]:
"""
Save the GptMarkdownFile object to a markdown file.
:param file_path: str, path to the markdown file.
"""
try:
with open(file_path, 'w', encoding='utf-8') as file:
file.write(str(self))
return (True, None)
except Exception as e:
return (False, f"Failed to save file '{file_path}': {e}")
def parse_gpturl(url: str) -> Union[GptIdentifier, None]:
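    """Parse a ChatGPT GPT URL and return a GptIdentifier(id, name), or None if the URL does not match a known GPT base URL."""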
for GPT_BASE_URL, GPT_BASE_URL_L in zip(GPT_BASE_URLS, GPT_BASE_URLS_L):
if url and url.startswith(GPT_BASE_URL):
id = url[GPT_BASE_URL_L:].split('\n')[0]
i = id.find('-')
if i != -1:
return GptIdentifier(id[:i], id[i+1:])
else:
return GptIdentifier(id, '')
def get_prompts_path() -> str:
"""Return the path to the Custom GPTs prompts directory."""
return os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..', 'CustomInstructions', 'ChatGPT'))
def enum_gpts() -> Generator[Tuple[bool, Union[GptMarkdownFile, str]], None, None]:
"""Enumerate all the GPT files in the prompts directory, parse them and return the parsed GPT object."""
prompts_path = get_prompts_path()
for file_path in os.listdir(prompts_path):
_, ext = os.path.splitext(file_path)
if ext != '.md':
continue
file_path = os.path.join(prompts_path, file_path)
ok, gpt = GptMarkdownFile.parse(file_path)
if ok:
yield (True, gpt)
else:
yield (False, f"Failed to parse '{file_path}': {gpt}")
def enum_gpt_files() -> Iterator[Tuple[str, str]]:
"""
Enumerate all the GPT files in the prompts directory while relying on the files naming convention.
To normalize all the GPT file names, run the `idxtool.py --rename`
"""
prompts_path = get_prompts_path()
for file_path in os.listdir(prompts_path):
m = GPT_FILE_ID_RE.match(file_path)
if not m:
continue
file_path = os.path.join(prompts_path, file_path)
yield (m.group(1), file_path)

Tools/openai_gpts/idxtool.py Normal file

@@ -0,0 +1,276 @@
"""
idxtool is a script used to perform various GPT indexing and searching tasks:
- Find a GPT file by its ID or full ChatGPT URL or via a file containing a list of GPT IDs.
- Rename all the GPTs to include their ChatGPT/g/ID in the filename.
- Generate TOC
- etc.
"""
import sys, os, argparse
from typing import Tuple
from urllib.parse import quote
import gptparser
from gptparser import enum_gpts, parse_gpturl, enum_gpt_files, get_prompts_path
import gen_gpt_templ
TOC_FILENAME = os.path.abspath(os.path.join(get_prompts_path(), '..', 'README.md'))
TOC_GPT_MARKER_LINE = '## ChatGPT GPT instructions'
def rename_gpts():
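    """Rename the GPT files so each filename is prefixed with its GPT ID, using `git mv` when possible and falling back to os.rename()."""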
effective_rename = nb_ok = nb_total = 0
for ok, gpt in enum_gpts():
nb_total += 1
if not ok or not (id := gpt.id()):
print(f"[!] {gpt.filename}")
continue
# Skip files with correct prefix
basename = os.path.basename(gpt.filename)
if basename.startswith(f"{id.id}_"):
nb_ok += 1
continue
effective_rename += 1
# New full file name with ID prefix
new_fn = os.path.join(os.path.dirname(gpt.filename), f"{id.id}_{basename}")
print(f"[+] {basename} -> {os.path.basename(new_fn)}")
if os.system(f"git mv \"{gpt.filename}\" \"{new_fn}\"") == 0:
nb_ok += 1
continue
# If git mv failed, then try os.rename
try:
os.rename(gpt.filename, new_fn)
nb_ok += 1
continue
except OSError as e:
print(f"Rename error: {e.strerror}")
msg = f"Renamed {nb_ok} out of {nb_total} GPT files."
ok = nb_ok == nb_total
if effective_rename == 0:
msg = f"All {nb_total} GPT files were already renamed. No action taken."
print(msg)
return (ok, msg)
def parse_gpt_file(filename) -> Tuple[bool, str]:
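    """Parse a GPT markdown file and save the reformatted result alongside it as '<name>.new.md'."""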
ok, gpt = gptparser.GptMarkdownFile.parse(filename)
if ok:
file_name_without_ext = os.path.splitext(os.path.basename(filename))[0]
dst_fn = os.path.join(
os.path.dirname(filename),
f"{file_name_without_ext}.new.md")
gpt.save(dst_fn)
else:
print(gpt)
return (ok, gpt)
def rebuild_toc(toc_out: str = '') -> Tuple[bool, str]:
"""
    Rebuilds the GPT custom instructions table of contents by reading all the GPT files in the CustomInstructions/ChatGPT directory.
"""
if not toc_out:
print(f"Rebuilding Table of Contents GPT custom instructions in place")
else:
print(f"Rebuilding Table of Contents GPT custom instructions to '{toc_out}'")
toc_in = TOC_FILENAME
if not toc_out:
toc_out = toc_in
if not os.path.exists(toc_in):
return (False, f"TOC File '{toc_in}' does not exist.")
# Read the TOC file and find the marker line for the GPT instructions
out = []
marker_found = False
with open(toc_in, 'r', encoding='utf-8') as file:
for line in file:
out.append(line)
if line.startswith(TOC_GPT_MARKER_LINE):
out.append('\n')
marker_found = True
break
if not marker_found:
return (False, f"Could not find the marker '{TOC_GPT_MARKER_LINE}' in '{toc_in}'. Please revert the TOC file and try again.")
# Write the TOC file all the way up to the marker line
try:
ofile = open(toc_out, 'w', encoding='utf-8')
    except OSError as e:
        return (False, f"Failed to open '{toc_out}' for writing: {e}")
# Count GPTs
enumerated_gpts = list(enum_gpts())
nb_ok = sum(1 for ok, gpt in enumerated_gpts if ok and gpt.id())
# Write the marker line and each GPT entry
out.append(f"There are {nb_ok} GPTs total:\n\n")
nb_ok = nb_total = 0
gpts = []
for ok, gpt in enumerated_gpts:
nb_total += 1
if ok:
if gpt_id := gpt.id():
nb_ok += 1
gpts.append((gpt_id, gpt))
else:
print(f"[!] No ID detected: {gpt.filename}")
else:
print(f"[!] {gpt}")
# Consistently sort the GPTs by ID and GPTs title
def gpts_sorter(key):
gpt_id, gpt = key
version = f"{gpt.get('version')}" if gpt.get('version') else ''
return f"{gpt.get('title')}{version} (id: {gpt_id.id}))"
gpts.sort(key=gpts_sorter)
for id, gpt in gpts:
file_link = f"./ChatGPT/{quote(os.path.basename(gpt.filename))}"
version = f" {gpt.get('version')}" if gpt.get('version') else ''
out.append(f"- [{gpt.get('title')}{version} (id: {id.id})]({file_link})\n")
ofile.writelines(out)
ofile.close()
msg = f"Generated TOC with {nb_ok} out of {nb_total} GPTs."
ok = nb_ok == nb_total
if ok:
print(msg)
return (ok, msg)
def make_template(input_str, verbose=True):
"""Creates a GPT template file from a ChatGPT URL/ID by downloading metadata"""
try:
# Process the input to handle URLs, IDs, conversation URLs, etc.
url, gpt_id = gen_gpt_templ.process_gpt_input(input_str)
if verbose:
print(f"[PARSED] Full URL: {url}")
print(f"[PARSED] GPT ID: {gpt_id}")
# Use gen_gpt_templ to generate the template with actual metadata
success, result = gen_gpt_templ.generate_template(url, debug=False, dump=False)
if not success:
msg = f"Failed to generate template: {result}"
if verbose:
print(msg)
return (False, msg)
# Extract the template content and gpt_id from the result
template_content = result.template
gpt_id = result.gpt_id
# Save to the current working directory with the proper filename
filename = f"{gpt_id}.md"
# Check if file already exists
if os.path.exists(filename):
msg = f"File '{filename}' already exists."
if verbose:
print(msg)
return (False, msg)
# Write the template content
with open(filename, 'w', encoding='utf-8') as file:
file.write(template_content)
msg = f"Created template '{filename}' for URL '{url}'"
if verbose:
print(msg)
return (True, msg)
except Exception as e:
msg = f"Error creating template: {str(e)}"
if verbose:
print(msg)
return (False, msg)
def find_gptfile(keyword, verbose=True):
"""Find a GPT file by its ID or full ChatGPT URL
The ID can be prefixed with '@' to indicate a file containing a list of GPT IDs.
"""
keyword = keyword.strip()
# Response file with a set of GPT IDs
if keyword.startswith('@'):
with open(keyword[1:], 'r', encoding='utf-8') as file:
ids = set()
for line in file:
line = line.strip()
# Skip comments
if line.startswith('#'):
continue
# If the line is a GPT URL, then extract the ID
if gpt_info := parse_gpturl(line):
ids.add(gpt_info.id)
continue
# If not a GPT URL, then it's a GPT ID
ids.add(line)
elif gpt_info := parse_gpturl(keyword):
# A single GPT URL
ids = {gpt_info.id}
else:
# A single GPT ID
ids = {keyword}
if verbose:
print(f'Looking for GPT files with IDs: {", ".join(ids)}')
matches = []
for id, filename in enum_gpt_files():
if id in ids:
if verbose:
print(filename)
matches.append((id, filename))
return matches
def main():
parser = argparse.ArgumentParser(description='idxtool: A GPT indexing and searching tool for the CSP repo')
parser.add_argument('--toc', nargs='?', const='', type=str, help='Rebuild the table of contents of custom GPTs')
parser.add_argument('--find-gpt', type=str, help='Find a GPT file by its ID or full ChatGPT URL')
parser.add_argument('--template', type=str, help='Creates a GPT template file from a ChatGPT URL, GPT ID, or g-prefixed ID')
parser.add_argument('--parse-gptfile', type=str, help='Parses a GPT file name')
parser.add_argument('--rename', action='store_true', help='Rename the GPT file names to include their GPT ID')
# Handle arguments
ok = True
args = parser.parse_args()
# Check if no arguments were provided
if not any(vars(args).values()):
parser.print_help()
sys.exit(0)
if args.parse_gptfile:
ok, err = parse_gpt_file(args.parse_gptfile)
if not ok:
print(err)
elif args.toc is not None:
ok, err = rebuild_toc(args.toc)
if not ok:
print(err)
elif args.find_gpt:
find_gptfile(args.find_gpt)
elif args.template:
make_template(args.template)
elif args.rename:
ok, err = rename_gpts()
if not ok:
print(err)
sys.exit(0 if ok else 1)
if __name__ == "__main__":
main()

Tools/openai_gpts/oneoff.py Normal file

@@ -0,0 +1,54 @@
"""
'oneoff.py' is a script that performs one-off operations on the GPT files
- Reformat all the GPT files in the source path and save them to the destination path.
"""
from gptparser import GptMarkdownFile
from typing import Tuple
import os
def reformat_gpt_files(src_path: str, dst_path: str) -> Tuple[bool, str]:
"""
Reformat all the GPT files in the source path and save them to the destination path.
:param src_path: str, path to the source directory.
:param dst_path: str, path to the destination directory.
"""
if not os.path.exists(src_path):
return (False, f"Source path '{src_path}' does not exist.")
if not os.path.exists(dst_path):
os.makedirs(dst_path)
print(f"Reformatting GPT files in '{src_path}' and saving them to '{dst_path}'...")
nb_ok = nb_total = 0
for src_file_path in os.listdir(src_path):
_, ext = os.path.splitext(src_file_path)
if ext != '.md':
continue
nb_total += 1
dst_file_path = os.path.join(dst_path, src_file_path)
src_file_path = os.path.join(src_path, src_file_path)
ok, gpt = GptMarkdownFile.parse(src_file_path)
if ok:
ok, msg = gpt.save(dst_file_path)
if ok:
id = gpt.id()
if id:
info = f"; id={id.id}"
if id.name:
info += f", name='{id.name}'"
else:
info = ''
print(f"[+] saved '{os.path.basename(src_file_path)}'{info}")
nb_ok += 1
else:
print(f"[!] failed to save '{src_file_path}': {msg}")
else:
print(f"[!] failed to parse '{src_file_path}': {gpt}")
msg = f"Reformatted {nb_ok} out of {nb_total} GPT files."
ok = nb_ok == nb_total
return (ok, msg)

requirements.txt Normal file

@@ -0,0 +1 @@
GitPython