* docs: add deployment, performance tuning guides and streamline getting started
- Add deployment-options.md: Library vs. Microservice decision guide
- Add inference-architecture.md: Separation of concerns with LLM servers
- Add performance-tuning.md: Concurrency and batching optimization guide
- Streamline index.md: Merge installation, add quick example, simplify
- Remove quick-start.md: Content merged into welcome page
- Remove installation.md: Content merged into welcome page
- Update model docs: Add concurrency control sections and cross-references
- Update mkdocs.yml: Add new Architecture section to navigation
* docs: add tasteful emojis to new documentation pages
* docs: consolidate redundant concurrency and troubleshooting content
- Remove duplicate max_parallel_requests tables from model-configs.md and inference-parameters.md
- Remove duplicate Concurrency Control section from model-configs.md
- Simplify Concurrency Control in inference-parameters.md to link to performance-tuning.md
- Remove Troubleshooting section from inference-architecture.md (covered in performance-tuning.md)
- performance-tuning.md is now the authoritative source for tuning guidance
* Simplified doc additions
* Switched default model to Nemotron 3 Nano
* Addressed feedback
* Added first blog draft
* adding basic jupytext structure
Co-authored-by: Johnny Greco <jogreco@nvidia.com>
* few fixes
* first test for CI
* adding error intentionally to check workflow behavior
* test calling from other workflows
* typo
* trying as job instead
* couple of fixes
* checking path
* trying to fix path
* wrapping up
---------
Co-authored-by: Johnny Greco <jogreco@nvidia.com>