Carl 2 days ago
säilyke
3cfd87f0a2
8 changed files with 636 additions and 0 deletions
  1. .gitignore (+6, -0)
  2. README.md (+117, -0)
  3. SKILL.md (+72, -0)
  4. config.example.json (+17, -0)
  5. requirements.txt (+1, -0)
  6. scripts/qwen_image_client.py (+267, -0)
  7. scripts/registry.py (+83, -0)
  8. scripts/shared/config.py (+73, -0)

+ 6 - 0
.gitignore

@@ -0,0 +1,6 @@
+.venv/
+__pycache__/
+*.pyc
+config.json
+outputs/*
+!outputs/.gitkeep

+ 117 - 0
README.md

@@ -0,0 +1,117 @@
+# Qwen Image Skill for OpenClaw
+
+This repository provides an OpenClaw-compatible image generation skill backed by a configurable Qwen-compatible image API.
+
+It follows the same basic contract as the ComfyUI reference skill:
+
+- `SKILL.md` defines how OpenClaw should discover and call the skill.
+- `scripts/registry.py` exposes the available workflow and parameters.
+- `scripts/qwen_image_client.py` executes the actual image generation request and saves images locally.
+
+## Features
+
+- Configurable `base_url`, `model`, and `api_key`
+- Natural-language image generation through a single workflow: `qwen/text-to-image`
+- OpenClaw-friendly registry output for parameter discovery
+- Local image download and storage under `./outputs`
+- Compatible with OpenAI-style `images/generations` APIs that return `b64_json` or image URLs
+
+## Project Structure
+
+```text
+qwen-image-skill/
+├── SKILL.md
+├── README.md
+├── config.example.json
+├── requirements.txt
+├── outputs/
+│   └── .gitkeep
+└── scripts/
+    ├── registry.py
+    ├── qwen_image_client.py
+    └── shared/
+        ├── __init__.py
+        └── config.py
+```
+
+## Installation
+
+Install this repository into your OpenClaw skills directory, for example:
+
+```bash
+cd ~/.openclaw/workspace/skills
+git clone <your-repo-url> qwen-image-skill-openclaw
+cd qwen-image-skill-openclaw
+python3 -m venv .venv
+source .venv/bin/activate
+python3 -m pip install -r requirements.txt
+cp config.example.json config.json
+```
+
+## Configuration
+
+Edit `config.json`:
+
+```json
+{
+  "provider": {
+    "name": "qwen-compatible",
+    "base_url": "https://api-inference.modelscope.cn/v1",
+    "api_key": "YOUR_QWEN_API_KEY",
+    "model": "qwen-image"
+  },
+  "generation": {
+    "output_dir": "./outputs",
+    "timeout_seconds": 300,
+    "default_size": "1024x1024",
+    "default_n": 1,
+    "default_response_format": "b64_json",
+    "default_quality": "standard"
+  }
+}
+```
+
+Notes:
+
+- `base_url` defaults to `https://api-inference.modelscope.cn/v1`. You only need to override it if you are using a different compatible gateway; note that the committed `config.example.json` points at the DashScope compatible-mode gateway instead.
+- `base_url` should point to the API root, not necessarily the full endpoint. The client appends `/images/generations` when needed.
+- `model` is fully configurable so you can switch to a newer Qwen image model later without code changes.
+- `api_key` is read from `config.json`. If you prefer environment variables later, that can be added separately.
+
+## Verify
+
+List the registered workflow:
+
+```bash
+python scripts/registry.py list --agent
+```
+
+Run a test generation:
+
+```bash
+python scripts/qwen_image_client.py \
+  --workflow qwen/text-to-image \
+  --args '{"prompt":"A cinematic portrait of a white cat astronaut on the moon","size":"1024x1024"}'
+```
+
+Expected success output (abridged; the full result also carries `raw_response`, plus `task_id` in async mode):
+
+```json
+{
+  "status": "success",
+  "run_id": "...",
+  "model": "qwen-image",
+  "images": [
+    "./outputs/..._1.png"
+  ]
+}
+```
+
+## API Compatibility Assumption
+
+This implementation targets OpenAI-style image generation APIs exposed by Qwen-compatible providers. It supports two modes:
+
+- Synchronous providers that return `data[].b64_json` or `data[].url` directly from `POST <base_url>/images/generations`
+- ModelScope-style asynchronous providers that require `X-ModelScope-Async-Mode: true` and polling `GET <base_url>/tasks/<task_id>` with `X-ModelScope-Task-Type: image_generation`
+
+For the default ModelScope endpoint, the client sends the request in async mode directly; for other gateways it detects the synchronous-call rejection and retries in async mode.
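
The synchronous branch of that contract can be sketched in isolation. The helper below is illustrative, not part of the commit: it decodes `data[].b64_json` entries from an OpenAI-style `images/generations` response into PNG files (URL entries would additionally need an HTTP fetch, omitted here).

```python
import base64
from pathlib import Path


def save_openai_style_images(response_payload: dict, out_dir: str, run_id: str) -> list[str]:
    """Decode data[].b64_json entries from an OpenAI-style
    images/generations response into PNG files.

    Illustrative sketch of the synchronous path only; data[].url
    entries would need an HTTP fetch, which is omitted here.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: list[str] = []
    for index, item in enumerate(response_payload.get("data") or [], start=1):
        if item.get("b64_json"):
            path = out / f"{run_id}_{index}.png"
            path.write_bytes(base64.b64decode(item["b64_json"]))
            paths.append(str(path))
    return paths
```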

+ 72 - 0
SKILL.md

@@ -0,0 +1,72 @@
+---
+name: qwen-image-skill
+description: |
+  Generate images through a configurable Qwen-compatible image generation API. This skill lets OpenClaw turn natural-language image requests into structured parameters, call the configured Qwen image model, and return downloaded image files.
+
+  Use this skill when:
+  (1) The user wants to generate an image from text.
+  (2) The user describes a scene, character, style, or composition and expects a rendered picture.
+  (3) The user asks to configure or verify the Qwen image generation model, API key, or endpoint.
+---
+
+# Qwen Image Agent Skill
+
+## Core Execution Specification
+
+As an OpenClaw agent equipped with this skill, your objective is to convert the user's natural-language image request into a structured argument payload, then hand it to the local Python client to generate images through the configured Qwen-compatible API.
+
+### Step 1: Query the Skill Registry
+
+Before generating, inspect the currently exposed workflow and parameter surface:
+
+```bash
+python ./scripts/registry.py list --agent
+```
+
+Rules:
+
+- Treat `prompt` as required.
+- If the user did not specify `size`, `n`, `quality`, or `negative_prompt`, infer sensible defaults from the request.
+- Do not expose implementation details like endpoints, headers, or internal response fields unless the user explicitly asks.
+
+### Step 2: Assemble Parameters
+
+Create a JSON object for the workflow `qwen/text-to-image`.
+
+Expected arguments:
+
+- `prompt`: fully written image prompt.
+- `size`: optional output size, such as `1024x1024`, `1280x720`, or `720x1280`.
+- `n`: optional image count.
+- `quality`: optional quality hint such as `standard` or `hd`.
+- `negative_prompt`: optional description of elements to avoid (a single string, matching the registry's parameter type).
+- `seed`: optional deterministic seed.
+
+If the user intent is vague, ask for the missing art direction only when necessary. Otherwise, refine the request yourself into a production-ready prompt.
+
+### Step 3: Execute Generation
+
+Run the local client:
+
+```bash
+python ./scripts/qwen_image_client.py --workflow qwen/text-to-image --args '<JSON_ARGS>'
+```
+
+Requirements:
+
+- Pass strict JSON in `--args`.
+- If the API returns multiple images, keep all paths.
+- If generation fails due to config issues, guide the user to update `config.json` from `config.example.json`.
+
+### Step 4: Return Results
+
+On success, return the generated local image paths and a concise summary of what was generated.
+
+On failure:
+
+- Surface the actual error.
+- If the error is about credentials or endpoint configuration, tell the user to set `provider.api_key`, `provider.base_url`, and `provider.model` in `config.json`.
+
+### Configuration Notes
+
+This skill reads runtime settings from `config.json` in the repository root. If that file does not exist yet, copy `config.example.json` to `config.json` and fill in the Qwen settings.

+ 17 - 0
config.example.json

@@ -0,0 +1,17 @@
+{
+  "provider": {
+    "name": "qwen-compatible",
+    "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
+    "api_key": "YOUR_QWEN_API_KEY",
+    "model": "qwen-image"
+  },
+  "generation": {
+    "output_dir": "./outputs",
+    "timeout_seconds": 300,
+    "poll_interval_seconds": 3,
+    "default_size": "1024x1024",
+    "default_n": 1,
+    "default_response_format": "b64_json",
+    "default_quality": "standard"
+  }
+}

+ 1 - 0
requirements.txt

@@ -0,0 +1 @@
+requests>=2.31.0,<3

+ 267 - 0
scripts/qwen_image_client.py

@@ -0,0 +1,267 @@
+from __future__ import annotations
+
+import argparse
+import base64
+import json
+import time
+import uuid
+from pathlib import Path
+from typing import Any
+
+import requests
+
+from shared.config import load_config, require_provider_config, resolve_output_dir
+
+
+def _normalize_base_url(base_url: str) -> str:
+    return base_url.rstrip("/")
+
+
+def _images_endpoint(base_url: str) -> str:
+    normalized = _normalize_base_url(base_url)
+    if normalized.endswith("/images/generations"):
+        return normalized
+    return f"{normalized}/images/generations"
+
+
+def _tasks_endpoint(base_url: str, task_id: str) -> str:
+    normalized = _normalize_base_url(base_url)
+    if normalized.endswith("/v1"):
+        return f"{normalized}/tasks/{task_id}"
+    return f"{normalized}/v1/tasks/{task_id}"
+
+
+def _build_payload(config: dict[str, Any], args_payload: dict[str, Any]) -> dict[str, Any]:
+    provider = config["provider"]
+    generation = config.get("generation", {})
+
+    payload: dict[str, Any] = {
+        "model": provider["model"],
+        "prompt": args_payload["prompt"],
+        "size": args_payload.get("size") or generation.get("default_size", "1024x1024"),
+        "n": args_payload.get("n") or generation.get("default_n", 1),
+        "response_format": args_payload.get("response_format") or generation.get("default_response_format", "b64_json"),
+        "quality": args_payload.get("quality") or generation.get("default_quality", "standard"),
+    }
+
+    if args_payload.get("negative_prompt"):
+        payload["negative_prompt"] = args_payload["negative_prompt"]
+    if args_payload.get("seed") is not None:
+        payload["seed"] = args_payload["seed"]
+
+    extra_body = args_payload.get("extra_body")
+    if isinstance(extra_body, dict):
+        payload.update(extra_body)
+
+    return payload
+
+
+def _write_b64_image(output_dir: Path, image_data: str, run_id: str, index: int) -> str:
+    image_bytes = base64.b64decode(image_data)
+    output_path = output_dir / f"{run_id}_{index}.png"
+    output_path.write_bytes(image_bytes)
+    return str(output_path)
+
+
+def _download_image(output_dir: Path, session: requests.Session, image_url: str, run_id: str, index: int) -> str:
+    response = session.get(image_url, timeout=60)
+    response.raise_for_status()
+    output_path = output_dir / f"{run_id}_{index}.png"
+    output_path.write_bytes(response.content)
+    return str(output_path)
+
+
+def _save_images(config: dict[str, Any], response_payload: dict[str, Any], session: requests.Session, run_id: str) -> list[str]:
+    output_dir = resolve_output_dir(config)
+    images: list[str] = []
+    data = response_payload.get("data") or []
+
+    for index, item in enumerate(data, start=1):
+        if item.get("b64_json"):
+            images.append(_write_b64_image(output_dir, item["b64_json"], run_id, index))
+            continue
+        if item.get("url"):
+            images.append(_download_image(output_dir, session, item["url"], run_id, index))
+            continue
+
+    return images
+
+
+def _should_retry_async(response: requests.Response) -> bool:
+    if response.status_code != 400:
+        return False
+
+    try:
+        payload = response.json()
+    except ValueError:
+        return False
+
+    message = str(payload.get("errors", {}).get("message", "")).lower()
+    return "does not support synchronous calls" in message and "async" in message
+
+
+def _should_use_async_mode(provider: dict[str, Any]) -> bool:
+    base_url = str(provider.get("base_url", "")).lower()
+    return "api-inference.modelscope.cn" in base_url
+
+
+def _poll_async_task(
+    config: dict[str, Any],
+    provider: dict[str, Any],
+    session: requests.Session,
+    task_id: str,
+    run_id: str,
+) -> dict[str, Any]:
+    generation = config.get("generation", {})
+    poll_interval = float(generation.get("poll_interval_seconds", 3))
+    timeout_seconds = int(generation.get("timeout_seconds", 300))
+    started_at = time.monotonic()
+    endpoint = _tasks_endpoint(provider["base_url"], task_id)
+
+    while True:
+        if time.monotonic() - started_at > timeout_seconds:
+            raise TimeoutError(f"Timed out waiting for task {task_id} after {timeout_seconds} seconds.")
+
+        response = session.get(
+            endpoint,
+            headers={"X-ModelScope-Task-Type": "image_generation"},
+            timeout=60,
+        )
+        response.raise_for_status()
+        payload = response.json()
+        task_status = str(payload.get("task_status", "")).upper()
+
+        if task_status == "SUCCEED":
+            output_images = payload.get("output_images") or []
+            images = [
+                _download_image(output_dir=resolve_output_dir(config), session=session, image_url=image_url, run_id=run_id, index=index)
+                for index, image_url in enumerate(output_images, start=1)
+            ]
+            if not images:
+                raise RuntimeError(f"Task {task_id} succeeded but returned no output_images.")
+
+            return {
+                "status": "success",
+                "run_id": run_id,
+                "task_id": task_id,
+                "model": provider["model"],
+                "images": images,
+                "raw_response": payload,
+            }
+
+        if task_status == "FAILED":
+            raise RuntimeError(json.dumps(payload, ensure_ascii=False))
+
+        time.sleep(poll_interval)
+
+
+def execute_generation(args_payload: dict[str, Any]) -> dict[str, Any]:
+    config = load_config()
+    provider = require_provider_config(config)
+    payload = _build_payload(config, args_payload)
+    endpoint = _images_endpoint(provider["base_url"])
+    timeout = int(config.get("generation", {}).get("timeout_seconds", 300))
+
+    run_id = str(uuid.uuid4())
+    with requests.Session() as session:
+        session.headers.update(
+            {
+                "Authorization": f"Bearer {provider['api_key']}",
+                "Content-Type": "application/json",
+            }
+        )
+
+        if _should_use_async_mode(provider):
+            async_response = session.post(
+                endpoint,
+                json=payload,
+                headers={"X-ModelScope-Async-Mode": "true"},
+                timeout=timeout,
+            )
+            async_response.raise_for_status()
+            async_payload = async_response.json()
+            task_id = async_payload.get("task_id")
+            if not str(task_id or "").strip():
+                raise RuntimeError("Async image generation did not return a task_id.")
+            return _poll_async_task(config, provider, session, str(task_id), run_id)
+
+        response = session.post(endpoint, json=payload, timeout=timeout)
+
+        if _should_retry_async(response):
+            async_response = session.post(
+                endpoint,
+                json=payload,
+                headers={"X-ModelScope-Async-Mode": "true"},
+                timeout=timeout,
+            )
+            async_response.raise_for_status()
+            async_payload = async_response.json()
+            task_id = async_payload.get("task_id")
+            if not str(task_id or "").strip():
+                raise RuntimeError("Async image generation did not return a task_id.")
+            return _poll_async_task(config, provider, session, str(task_id), run_id)
+
+        response.raise_for_status()
+        response_payload = response.json()
+
+        images = _save_images(config, response_payload, session, run_id)
+        if not images:
+            raise RuntimeError("Provider returned no downloadable images.")
+
+        return {
+            "status": "success",
+            "run_id": run_id,
+            "model": provider["model"],
+            "images": images,
+            "raw_response": response_payload,
+        }
+
+
+def _parse_args_json(raw_args: str) -> dict[str, Any]:
+    try:
+        payload = json.loads(raw_args)
+    except json.JSONDecodeError as exc:
+        raise ValueError(f"Invalid JSON in --args: {exc}") from exc
+
+    if not isinstance(payload, dict):
+        raise ValueError("--args must decode to a JSON object.")
+    if not str(payload.get("prompt", "")).strip():
+        raise ValueError("The 'prompt' field is required.")
+    return payload
+
+
+def _cmd_run(raw_args: str) -> dict[str, Any]:
+    payload = _parse_args_json(raw_args)
+    return execute_generation(payload)
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Qwen Image Client for OpenClaw Skill")
+    subparsers = parser.add_subparsers(dest="command")
+
+    sp_run = subparsers.add_parser("run", help="Generate images from a JSON payload")
+    sp_run.add_argument("--args", required=True, help="JSON string of generation parameters")
+
+    parser.add_argument("--workflow", help="Legacy compatibility flag. Only qwen/text-to-image is supported.")
+    parser.add_argument("--args", dest="legacy_args", help="Legacy compatibility flag when no subcommand is used.")
+    parsed = parser.parse_args()
+
+    result: dict[str, Any]
+    if parsed.command == "run":
+        result = _cmd_run(parsed.args)
+    elif parsed.workflow or parsed.legacy_args:
+        workflow = parsed.workflow or "qwen/text-to-image"
+        if workflow != "qwen/text-to-image":
+            raise SystemExit("Only qwen/text-to-image is supported by this skill.")
+        if not parsed.legacy_args:
+            raise SystemExit("--args is required.")
+        result = _cmd_run(parsed.legacy_args)
+    else:
+        parser.print_help()
+        return
+
+    print(json.dumps(result, ensure_ascii=False, indent=2))
+
+
+if __name__ == "__main__":
+    main()
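
The URL-normalization rules in `_images_endpoint` and `_tasks_endpoint` can be exercised in isolation; they are reimplemented here (under the same assumptions about the ModelScope path layout) purely as a standalone check.

```python
def images_endpoint(base_url: str) -> str:
    # Append /images/generations unless the caller already supplied the full path.
    base = base_url.rstrip("/")
    if base.endswith("/images/generations"):
        return base
    return f"{base}/images/generations"


def tasks_endpoint(base_url: str, task_id: str) -> str:
    # Task polling lives under /v1/tasks/<id>; avoid doubling /v1 when
    # the configured base_url already ends with it.
    base = base_url.rstrip("/")
    if base.endswith("/v1"):
        return f"{base}/tasks/{task_id}"
    return f"{base}/v1/tasks/{task_id}"
```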

+ 83 - 0
scripts/registry.py

@@ -0,0 +1,83 @@
+from __future__ import annotations
+
+import argparse
+import json
+
+from shared.config import load_config
+
+
+def build_registry_payload() -> dict:
+    config = load_config()
+    provider = config.get("provider", {})
+    generation = config.get("generation", {})
+
+    return {
+        "skill": "qwen-image-skill",
+        "provider": {
+            "name": provider.get("name", "qwen-compatible"),
+            "model": provider.get("model", "qwen-image"),
+        },
+        "workflows": [
+            {
+                "id": "qwen/text-to-image",
+                "name": "Qwen Text to Image",
+                "description": "Generate images from natural language prompts through a Qwen-compatible image generation API.",
+                "enabled": True,
+                "parameters": {
+                    "prompt": {
+                        "type": "string",
+                        "required": True,
+                        "description": "Main image prompt. Include subject, style, composition, lighting, and quality requirements.",
+                    },
+                    "size": {
+                        "type": "string",
+                        "required": False,
+                        "description": "Output size such as 1024x1024, 1280x720, or 720x1280.",
+                        "default": generation.get("default_size", "1024x1024"),
+                    },
+                    "n": {
+                        "type": "int",
+                        "required": False,
+                        "description": "Number of images to generate in one request.",
+                        "default": generation.get("default_n", 1),
+                    },
+                    "quality": {
+                        "type": "string",
+                        "required": False,
+                        "description": "Provider quality hint, for example standard or hd.",
+                        "default": generation.get("default_quality", "standard"),
+                    },
+                    "negative_prompt": {
+                        "type": "string",
+                        "required": False,
+                        "description": "Things to avoid in the generated image.",
+                    },
+                    "seed": {
+                        "type": "int",
+                        "required": False,
+                        "description": "Optional deterministic seed if the provider supports it.",
+                    },
+                },
+            }
+        ],
+    }
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Workflow Registry for Qwen Image OpenClaw Skill")
+    parser.add_argument("action", choices=["list"], help="Action to perform")
+    parser.add_argument("--agent", action="store_true", help="Output full JSON schema for agent parsing")
+    args = parser.parse_args()
+
+    payload = build_registry_payload()
+    if args.agent:
+        print(json.dumps(payload, ensure_ascii=False, indent=2))
+        return
+
+    workflows = payload["workflows"]
+    for workflow in workflows:
+        print(f"{workflow['id']}: {workflow['description']}")
+
+
+if __name__ == "__main__":
+    main()
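
An agent consuming the `list --agent` JSON might pull out the required parameter names like this (an illustrative helper, not part of the commit):

```python
def required_parameters(registry_payload: dict, workflow_id: str) -> list[str]:
    """Return the names of parameters marked required for one workflow
    in the registry's --agent JSON output."""
    for workflow in registry_payload.get("workflows", []):
        if workflow.get("id") == workflow_id:
            return [
                name
                for name, spec in workflow.get("parameters", {}).items()
                if spec.get("required")
            ]
    raise KeyError(f"Unknown workflow: {workflow_id}")
```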

+ 73 - 0
scripts/shared/config.py

@@ -0,0 +1,73 @@
+from __future__ import annotations
+
+import json
+from copy import deepcopy
+from pathlib import Path
+from typing import Any
+
+
+BASE_DIR = Path(__file__).resolve().parents[2]
+CONFIG_PATH = BASE_DIR / "config.json"
+EXAMPLE_CONFIG_PATH = BASE_DIR / "config.example.json"
+DEFAULT_CONFIG: dict[str, Any] = {
+    "provider": {
+        "name": "qwen-compatible",
+        "base_url": "https://api-inference.modelscope.cn/v1",
+        "model": "qwen-image",
+    },
+    "generation": {
+        "output_dir": "./outputs",
+        "timeout_seconds": 300,
+        "poll_interval_seconds": 3,
+        "default_size": "1024x1024",
+        "default_n": 1,
+        "default_response_format": "b64_json",
+        "default_quality": "standard",
+    },
+}
+
+
+def load_json(path: Path) -> dict[str, Any]:
+    with path.open("r", encoding="utf-8") as handle:
+        return json.load(handle)
+
+
+def _merge_defaults(defaults: dict[str, Any], value: dict[str, Any]) -> dict[str, Any]:
+    merged = deepcopy(defaults)
+    for key, item in value.items():
+        if isinstance(item, dict) and isinstance(merged.get(key), dict):
+            merged[key] = _merge_defaults(merged[key], item)
+        else:
+            merged[key] = item
+    return merged
+
+
+def load_config() -> dict[str, Any]:
+    if CONFIG_PATH.exists():
+        return _merge_defaults(DEFAULT_CONFIG, load_json(CONFIG_PATH))
+    return _merge_defaults(DEFAULT_CONFIG, load_json(EXAMPLE_CONFIG_PATH))
+
+
+def resolve_output_dir(config: dict[str, Any]) -> Path:
+    raw_output_dir = config.get("generation", {}).get("output_dir", "./outputs")
+    output_dir = Path(raw_output_dir)
+    if not output_dir.is_absolute():
+        output_dir = BASE_DIR / output_dir
+    output_dir.mkdir(parents=True, exist_ok=True)
+    return output_dir
+
+
+def require_provider_config(config: dict[str, Any]) -> dict[str, Any]:
+    provider = config.get("provider", {})
+    missing = [
+        key
+        for key in ("api_key",)
+        if not str(provider.get(key, "")).strip()
+    ]
+    if missing:
+        raise ValueError(
+            "Missing provider config fields: "
+            + ", ".join(missing)
+            + ". Update config.json before running the skill."
+        )
+    return provider
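
The deep-merge semantics of `_merge_defaults` (user config overrides defaults key by key, recursing into nested dicts rather than replacing them wholesale) can be verified with a standalone recreation:

```python
from copy import deepcopy


def merge_defaults(defaults: dict, value: dict) -> dict:
    """Recreation of _merge_defaults: each user key overrides the
    default, but nested dicts are merged recursively so unrelated
    sibling defaults survive."""
    merged = deepcopy(defaults)
    for key, item in value.items():
        if isinstance(item, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], item)
        else:
            merged[key] = item
    return merged
```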