# XPULink API Cookbook

Build powerful AI applications with zero infrastructure hassle - a comprehensive collection of examples for www.xpulink.ai
## Why XPULink?

### No GPU? No Problem!
- 100% Cloud-Hosted: All models run on XPULink's infrastructure
- Zero Setup: No CUDA, no drivers, no expensive hardware needed
- Instant Access: Get started in minutes with just an API key
### Powered by vLLM - Enterprise-Grade Performance
- 15-30x Faster than traditional inference frameworks
- 50% Better Memory Efficiency with PagedAttention technology
- High Concurrency: Handle thousands of requests simultaneously
- Low Latency: Optimized CUDA kernels for blazing-fast responses
### OpenAI-Compatible API
- Drop-in replacement for OpenAI API
- Use with LangChain, LlamaIndex, and other popular frameworks
- Minimal code changes to switch from OpenAI
### Cost-Effective
- Pay only for what you use
- No idle infrastructure costs
- Transparent pricing
## What's Inside
This cookbook provides production-ready examples for:
| Feature | Description | Best For |
|---|---|---|
| Text Generation | Basic LLM inference with Qwen3-32B | Chat, content generation |
| RAG System | PDF Q&A with BGE-M3 embeddings | Document analysis, knowledge bases |
| LoRA Fine-tuning | Custom model training | Domain adaptation, style transfer |
| Device Monitoring Agent | Industrial IoT diagnostics | Predictive maintenance, anomaly detection |
| Model Evaluation | Benchmark testing with OpenBench | Model comparison, performance analysis |
All examples now use LiteLLM for elegant, production-ready integration with custom APIs!
## Quick Start

### Prerequisites
- Python 3.8+
- XPULink API Key from www.xpulink.ai
### Installation

```bash
# Clone the repository
git clone https://github.com/xpulinkAI/cookbook.git
cd cookbook

# Install dependencies
pip install -r requirements.txt

# Set up your API key
echo "XPULINK_API_KEY=your_api_key_here" > .env
```
### Your First API Call (30 seconds!)

```python
from litellm import completion

response = completion(
    model="openai/qwen3-32b",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="your_api_key",
    api_base="https://www.xpulink.ai/v1",
    custom_llm_provider="openai",
)

print(response.choices[0].message.content)
```
That's it! No GPU setup, no model downloads, just pure API magic.
## Examples

### 1. Text Generation

The simplest way to use LLMs

```bash
cd function_call
python text_model.py
```
What you get:

- OpenAI-compatible chat completions
- Streaming support (see the sketch below)
- Function calling (when available)
- Full control over temperature, tokens, etc.
Why it's easy with XPULink:

- No model downloads (GBs of data)
- No GPU required
- Instant API access
- Auto-scaling infrastructure
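A minimal streaming sketch using the same endpoint as the quick start (assumes your key is exported as `XPULINK_API_KEY`):

```python
import os

from litellm import completion

# stream=True yields incremental chunks instead of one full response
stream = completion(
    model="openai/qwen3-32b",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    api_key=os.environ["XPULINK_API_KEY"],
    api_base="https://www.xpulink.ai/v1",
    custom_llm_provider="openai",
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```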
### 2. RAG System (Retrieval-Augmented Generation)

Build ChatGPT for your documents

```bash
cd RAG

# Put your PDFs in data/
mkdir -p data
cp your_document.pdf data/

# Run the system
python pdf_rag_bge_m3.py
```
Features:

- BGE-M3 Embeddings: best-in-class multilingual embedding model
- PDF Processing: automatic text extraction and chunking
- Semantic Search: find relevant context for any question
- LLM Integration: generate answers based on your documents
- Vector Storage: efficient retrieval with LlamaIndex
Why RAG on XPULink:

- No Embedding Server: BGE-M3 is hosted for you (see the sketch below)
- No LLM Hosting: Qwen3-32B ready to use
- Automatic Retries: built-in error handling
- LiteLLM Integration: clean, maintainable code
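If you want to call the hosted embedding model directly, here is a hedged sketch using LiteLLM's `embedding` API (the model id `openai/bge-m3` is an assumption; check your dashboard for the exact identifier):

```python
import os

from litellm import embedding

# Embed a batch of text chunks with the hosted BGE-M3 model.
# NOTE: "openai/bge-m3" is an assumed model id; verify it against your account.
response = embedding(
    model="openai/bge-m3",
    input=["What is PagedAttention?", "vLLM serves models efficiently."],
    api_key=os.environ["XPULINK_API_KEY"],
    api_base="https://www.xpulink.ai/v1",
    custom_llm_provider="openai",
)

vectors = [item["embedding"] for item in response.data]
print(len(vectors), len(vectors[0]))  # number of chunks, embedding dimension
```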
Use Cases:

- Corporate knowledge bases
- Customer support bots
- Research paper analysis
- Legal document search
See RAG/README.md for detailed documentation.
### 3. LoRA Fine-tuning

Customize models for your specific needs, in the cloud!

```bash
cd LoRA

# Interactive notebook (recommended)
jupyter notebook lora_finetune_example.ipynb

# Or use the Python script
python lora_finetune.py
```
What is LoRA?

- Parameter-Efficient: train only ~0.1% of model parameters
- Fast: minutes to hours (vs. days for full fine-tuning)
- Cheap: much lower compute costs
- Effective: near full fine-tuning quality
Why Fine-tune on XPULink:

- Cloud Training: zero local GPU needed
- Managed Infrastructure: we handle everything
- Easy API: upload, configure, train, deploy
- Quick Turnaround: get results fast
Perfect For:

- Enterprise: inject company knowledge
- Domain Experts: medical, legal, and finance terminology
- Style: custom tone, format, personality
- Task Optimization: code generation, summarization, etc.
Example:
```python
import json

from lora_finetune import XPULinkLoRAFineTuner

finetuner = XPULinkLoRAFineTuner()

# Prepare data in chat format
training_data = [
    {
        "messages": [
            {"role": "system", "content": "You are a Python expert."},
            {"role": "user", "content": "Explain decorators"},
            {"role": "assistant", "content": "Decorators in Python..."},
        ]
    },
    # ... more examples
]

# Write the examples to a JSONL file (one JSON object per line)
with open("training.jsonl", "w") as f:
    for example in training_data:
        f.write(json.dumps(example) + "\n")

# Train in the cloud
file_id = finetuner.upload_training_file("training.jsonl")
job_id = finetuner.create_finetune_job(file_id, model="qwen3-32b")
status = finetuner.wait_for_completion(job_id)

# Use your custom model
finetuned_model = status["fine_tuned_model"]
```
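Once the job completes, the returned model id can be queried like any other model. A sketch, assuming the fine-tuned adapter is served through the same OpenAI-compatible endpoint:

```python
import os

from litellm import completion

# Query the fine-tuned model (id from the previous block) via the chat endpoint
response = completion(
    model=f"openai/{finetuned_model}",
    messages=[{"role": "user", "content": "Explain decorators"}],
    api_key=os.environ["XPULINK_API_KEY"],
    api_base="https://www.xpulink.ai/v1",
    custom_llm_provider="openai",
)
print(response.choices[0].message.content)
```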
See LoRA/README.md for best practices and advanced configuration.
### 4. Device Monitoring Agent

AI-powered predictive maintenance

```bash
cd Agent

# Interactive demo
jupyter notebook device_agent_example.ipynb

# Or quick test
python simple_example.py
```
Capabilities:

- Real-time Analysis: multi-sensor data interpretation
- Log Intelligence: pattern recognition in error logs
- Maintenance Planning: predictive scheduling
- Trend Analysis: identify degradation patterns
- Automated Reports: structured diagnostic output
Industry Applications:

- Manufacturing: production line monitoring
- Energy: power generation equipment
- Transportation: fleet management
- Data Centers: server health monitoring
Why on XPULink:

- Always Available: 24/7 cloud inference
- Low Latency: fast response times
- Scalable: monitor thousands of devices
- Cost-Effective: no dedicated servers needed
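To give a flavor of the pattern, a minimal sketch that asks the hosted model to interpret a sensor snapshot (the prompt and field names are illustrative, not the Agent example's actual schema):

```python
import json
import os

from litellm import completion

# Illustrative sensor snapshot; a real pipeline would pull this from telemetry
reading = {"device": "pump-07", "temp_c": 92.5, "vibration_mm_s": 7.1, "rpm": 1430}

response = completion(
    model="openai/qwen3-32b",
    messages=[
        {
            "role": "system",
            "content": "You are an industrial maintenance assistant. Flag anomalies and suggest next steps.",
        },
        {"role": "user", "content": f"Diagnose this reading:\n{json.dumps(reading)}"},
    ],
    api_key=os.environ["XPULINK_API_KEY"],
    api_base="https://www.xpulink.ai/v1",
    custom_llm_provider="openai",
)
print(response.choices[0].message.content)
```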
See Agent/README.md for implementation details.
### 5. Model Evaluation

Benchmark your models with OpenBench

```bash
cd Evaluation

# Install OpenBench
pip install openbench

# Run evaluation
openbench evaluate \
  --model-type openai \
  --model-name qwen3-32b \
  --api-key $XPULINK_API_KEY \
  --base-url https://www.xpulink.ai/v1 \
  --benchmark mmlu
```
Supported Benchmarks:

- MMLU (Massive Multitask Language Understanding)
- GSM8K (math reasoning)
- HellaSwag (commonsense reasoning)
- Custom benchmarks
See Evaluation/README.md for a comprehensive guide.
## Architecture

### Built on vLLM - The Fastest Inference Engine

XPULink serves all models with vLLM, a high-throughput, memory-efficient LLM inference engine:
| Feature | vLLM (XPULink) | Traditional Frameworks |
|---|---|---|
| Throughput | 15-30x faster | 1x baseline |
| Memory | ~50% more efficient | Standard |
| Batching | Continuous (dynamic) | Static |
| Concurrency | Thousands of concurrent users | Limited |
| API | OpenAI-compatible | Custom |
Key Technologies:

- PagedAttention: paged KV-cache memory management
- Continuous Batching: no waiting for batch completion
- Tensor Parallelism: multi-GPU scaling
- Quantization: FP16 and INT8 support
Learn more: [vLLM on GitHub](https://github.com/vllm-project/vllm)
## Technical Stack

### LiteLLM Integration

All examples use LiteLLM for elegant API integration:

```python
from litellm import completion

# Clean, consistent API across all providers
response = completion(
    model="openai/qwen3-32b",
    messages=[...],
    api_key=api_key,
    api_base="https://www.xpulink.ai/v1",
    custom_llm_provider="openai",
)
```
Why LiteLLM:

- No Hacks: no workarounds or monkey-patching
- Production-Ready: used by thousands of developers
- Unified Interface: works with 100+ LLM providers
- Built-in Retries: automatic error handling
- Easy Migration: switch providers with one line (see the sketch below)
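For example, switching providers is just a different model string in the same call (the second model name below is illustrative):

```python
from litellm import completion

messages = [{"role": "user", "content": "Hello!"}]

# XPULink (OpenAI-compatible endpoint)
response = completion(
    model="openai/qwen3-32b",
    messages=messages,
    api_key="your_xpulink_key",
    api_base="https://www.xpulink.ai/v1",
    custom_llm_provider="openai",
)

# Same call against another provider: only the model string and key change
# (model name is illustrative)
response = completion(model="gpt-4o-mini", messages=messages, api_key="your_openai_key")
```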
## Best Practices

### API Key Security

```bash
# DO: use environment variables
XPULINK_API_KEY=your_key python script.py
```

```python
# DON'T: hardcode keys
api_key = "sk-..."  # Never do this!
```
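To load the key from the `.env` file created during installation, a sketch using python-dotenv (assumed to be in `requirements.txt`; otherwise `pip install python-dotenv`):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
api_key = os.environ["XPULINK_API_KEY"]
```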
### Error Handling

```python
from litellm import completion

# LiteLLM provides automatic retries
response = completion(
    model="openai/qwen3-32b",
    messages=[...],
    api_key=api_key,
    api_base="https://www.xpulink.ai/v1",
    custom_llm_provider="openai",
    num_retries=3,  # automatic retry on failure
)
```
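For failures that survive the retries, LiteLLM maps provider errors onto OpenAI-style exception classes you can catch (a sketch; double-check the class names against the LiteLLM docs for your version):

```python
import os

import litellm
from litellm import completion

try:
    response = completion(
        model="openai/qwen3-32b",
        messages=[{"role": "user", "content": "Hello!"}],
        api_key=os.environ["XPULINK_API_KEY"],
        api_base="https://www.xpulink.ai/v1",
        custom_llm_provider="openai",
        num_retries=3,
    )
    print(response.choices[0].message.content)
except litellm.RateLimitError:
    print("Rate limited; back off and retry later")
except litellm.APIConnectionError as exc:
    print(f"Connection failed: {exc}")
```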
### Performance Optimization

- Use an appropriate `temperature` for your use case
- Set reasonable `max_tokens` limits
- Batch requests when possible
- Use streaming for real-time applications (see the sketch below)
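A sketch putting these knobs together (the values are illustrative, not recommendations):

```python
import os

from litellm import completion

response = completion(
    model="openai/qwen3-32b",
    messages=[{"role": "user", "content": "Summarize PagedAttention in two sentences."}],
    api_key=os.environ["XPULINK_API_KEY"],
    api_base="https://www.xpulink.ai/v1",
    custom_llm_provider="openai",
    temperature=0.3,  # lower = more deterministic; raise for creative tasks
    max_tokens=256,   # cap output length to control latency and cost
)
print(response.choices[0].message.content)
```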
## Support & Community

### Getting Help

- Documentation: docs.xpulink.ai
- Issues: open an issue on GitHub
- Email: tech-support@xpulink.ai
- Website: www.xpulink.ai
### Contributing

We welcome contributions! Please:

1. Fork the repository
2. Create a feature branch
3. Submit a pull request
## License
MIT License - see LICENSE file for details
## Why Developers Love XPULink

> "No GPU setup, no model downloads - I had a RAG system running in 10 minutes!" - Sarah, ML Engineer

> "The fine-tuning API saved us weeks of infrastructure work. Just upload and train." - Mike, Startup Founder

> "vLLM performance + OpenAI compatibility = perfect combo" - Alex, DevOps Lead
## Ready to Build?
- Get your API key: www.xpulink.ai
- Pick an example: Start with RAG or text generation
- Run the code: Copy, paste, customize
- Ship to production: Scale with confidence
No credit card needed to start experimenting!
Built with ❤️ by the XPULink team
Powered by vLLM | OpenAI-Compatible | Production-Ready