Storage & Data Management
Managing disks, data transfers, and storage on SynpixCloud
Overview
SynpixCloud instances come with configurable storage options. Understanding how storage works helps you manage your data effectively and optimize costs.
Storage Types
System Disk
- Purpose - Operating system, software, and dependencies
- Default size - Varies by image (typically 50-100 GB)
- Persistence - Tied to instance lifecycle
Data Disk
- Purpose - Datasets, models, and user data
- Configurable - Choose size at instance creation
- Retention - Can be preserved when instance stops
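To see how these disks appear from inside an instance, you can list the block devices and mounted filesystems. This is a generic sketch; exact device names and mount points vary by instance type and image:
# List block devices and their sizes (device names vary by instance)
lsblk
# Show mounted filesystems and free space
df -h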
Disk Retention
When you stop or terminate an instance, you have options for your data:
| Option | Data Preserved | Storage Cost | Best For |
|---|---|---|---|
| No Retention | No | No ongoing cost | Temporary workloads |
| Retain Disk | Yes | Ongoing storage fee | Long-term projects |
Without disk retention enabled, all data is lost when the instance is terminated. Always back up important data!
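For example, before terminating an instance that has no retention, you can pull a compressed copy of a directory straight to your local machine. This is a generic sketch; <port>, <host>, and /root/data are placeholders for your own connection details and paths:
# Run on your local machine: stream a compressed archive of the remote data
ssh -p <port> root@<host> "tar -czf - /root/data" > synpix_backup.tar.gz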
Enabling Disk Retention
- During instance creation, enable Disk Retention
- For existing instances, go to Settings > Storage
- Toggle disk retention before stopping
Expanding Disk Space
If you need more storage space:
Online Expansion
Some instances support live disk expansion:
- Go to instance Settings > Storage
- Click Expand Disk
- Select new size
- Confirm and pay the difference
Extending Filesystem
After expanding the disk, extend the filesystem:
# Check current disk size
df -h
# Resize the partition (if needed)
sudo growpart /dev/vda 1
# Extend ext4 filesystem
sudo resize2fs /dev/vda1
# Or for XFS filesystem
sudo xfs_growfs /
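If you're not sure which of the two commands applies, check the filesystem type first. The device /dev/vda1 and the root mount point here are taken from the example above and may differ on your instance:
# Show the filesystem type of the root partition
df -T /
# Or inspect all block devices and their filesystems
lsblk -f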
Data Transfer
Uploading Files
Using SCP
# Upload single file
scp -P <port> local_file.zip root@<host>:/root/
# Upload directory
scp -P <port> -r local_folder root@<host>:/root/
Using rsync (Recommended for large transfers)
# Sync with progress
rsync -avz --progress local_folder/ root@<host>:/root/data/ -e "ssh -p <port>"
# Resume interrupted transfer
rsync -avz --partial --progress large_file.zip root@<host>:/root/ -e "ssh -p <port>"
Using SFTP
sftp -P <port> root@<host>
sftp> put local_file.zip
sftp> get remote_file.zip
Downloading Files
# Download from instance to local
scp -P <port> root@<host>:/root/results.zip ./
# Download directory
scp -P <port> -r root@<host>:/root/output ./
Large Dataset Transfer
For very large datasets, consider:
- Cloud storage sync - Use rclone to sync with S3, GCS, etc.
- wget/curl - Download directly from URLs (see the example after the rclone snippet below)
- HuggingFace Hub - Use huggingface-cli for model downloads
# Install rclone
curl https://rclone.org/install.sh | sudo bash
# Configure cloud storage
rclone config
# Sync from cloud
rclone sync remote:bucket/data /root/data
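For the wget/curl option mentioned above, a direct, resumable download might look like this; the URL and destination path are placeholders:
# Download directly from a URL, resuming if interrupted
wget -c https://example.com/dataset.tar.gz -O /root/data/dataset.tar.gz
# curl equivalent (follow redirects, resume partial download)
curl -L -C - -o /root/data/dataset.tar.gz https://example.com/dataset.tar.gz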
HuggingFace Model Downloads
Download models efficiently from HuggingFace:
# Install the CLI
pip install huggingface_hub
# Login (optional, for private models)
huggingface-cli login
# Download a model
huggingface-cli download meta-llama/Llama-2-7b --local-dir ./models
Mirror Acceleration (China)
For faster downloads in China, use a mirror:
# Set HuggingFace mirror
export HF_ENDPOINT=https://hf-mirror.com
# Then download normally
huggingface-cli download model-name
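To keep the mirror setting across shell sessions, you can append it to your shell profile; this sketch assumes bash and the default ~/.bashrc:
# Persist the mirror setting for future sessions
echo 'export HF_ENDPOINT=https://hf-mirror.com' >> ~/.bashrc
source ~/.bashrc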
Storage Best Practices
1. Organize Your Data
/root/
├── code/ # Your source code
├── data/ # Datasets
├── models/ # Trained models
├── checkpoints/ # Training checkpoints
└── output/ # Results and logs
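A quick way to create this layout (a sketch using bash brace expansion; adjust the directory names to taste):
# Create the suggested directory layout under /root
mkdir -p /root/{code,data,models,checkpoints,output}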
2. Regular Backups
# Backup to local machine
rsync -avz root@<host>:/root/important/ ./backup/ -e "ssh -p <port>"
# Backup to cloud storage
rclone sync /root/important remote:bucket/backup
3. Clean Temporary Files
# Clear pip cache
pip cache purge
# Clear conda cache
conda clean --all
# Clear apt cache
sudo apt clean
# Find large files
du -sh /* | sort -hr | head -20
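If you work with HuggingFace or PyTorch, their download caches can also grow large. The paths below are the usual defaults and may differ if you've changed cache locations:
# Clear the HuggingFace hub cache (default location)
rm -rf ~/.cache/huggingface/hub
# Clear the PyTorch hub cache (default location)
rm -rf ~/.cache/torch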
4. Use Compression
# Compress before transfer
tar -czvf data.tar.gz data/
# Extract on remote
tar -xzvf data.tar.gz
Storage Monitoring
Check Disk Usage
# Overall disk usage
df -h
# Directory sizes
du -sh /root/*
# Find largest files
find /root -type f -exec du -h {} + | sort -hr | head -20
Set Up Alerts
Monitor disk usage in your training scripts:
import shutil

def check_disk_space(path="/", threshold_gb=10):
    total, used, free = shutil.disk_usage(path)
    free_gb = free // (2**30)
    if free_gb < threshold_gb:
        print(f"Warning: Only {free_gb}GB free space remaining!")
    return free_gb
Troubleshooting
Disk Full
- Check what's using space: du -sh /* | sort -hr
- Clear caches (pip, conda, apt)
- Remove unused files/checkpoints
- Consider expanding the disk
Cannot Write to Disk
- Check permissions: ls -la /path/to/dir
- Check disk space: df -h
- Check if the filesystem is mounted read-only: mount | grep " / " (if it is, see the note below)
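If the filesystem turns out to be mounted read-only (which can happen after an underlying disk error), remounting it read-write sometimes restores access. This is a generic Linux step, not specific to SynpixCloud, and won't help if the disk itself is failing:
# Remount the root filesystem read-write (use with care)
sudo mount -o remount,rw /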
Slow Transfer Speeds
- Use compression: rsync -avz
- Use parallel transfers for many small files, or bundle them into a single archive (see the sketch below)
- Check network bandwidth on both ends
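Many small files transfer slowly because each one adds per-file overhead. A common workaround is to stream a single tar archive over SSH instead; <port>, <host>, and the paths below are placeholders:
# Bundle many small files into one compressed stream and unpack on the instance
tar -czf - local_folder/ | ssh -p <port> root@<host> "tar -xzf - -C /root/data/"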
Support
Storage questions? Contact us at support@synpixcloud.com