This is a URL parser written in C. It works like parse_url() in PHP: it splits a URL into components such as scheme, host, and path.
Copy url_parser.c and url_parser.h into your project.
The parse_url() function parses the URL given by the url parameter. It returns a pointer to a URL_COMPONENTS structure, or NULL if an error occurred.
The returned URL_COMPONENTS must be released using free_url_components().
In case of error, errno will be set.
- EINVAL - An invalid URL was given.
- ENOMEM - Insufficient memory was available.
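The two errno values above can be turned into readable messages with a small helper. This is only a sketch: describe_parse_error() is a hypothetical name and is not part of url_parser.h.

```c
#include <errno.h>

/* Hypothetical helper (not provided by url_parser.h): maps the errno
 * values documented for parse_url() to short messages. */
static const char *describe_parse_error(int err)
{
    switch (err) {
    case EINVAL:
        return "invalid URL";
    case ENOMEM:
        return "out of memory";
    default:
        return "unexpected error";
    }
}
```

A caller would typically read errno right after parse_url() returns NULL, before any other library call can overwrite it, for example `fprintf(stderr, "parse_url: %s\n", describe_parse_error(errno));`.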
This function simply splits the URL into components. It does not strictly validate URLs, so invalid URLs are also accepted, as are URLs that consist only of a path.
However, it returns an error (EINVAL) in some cases:
- The URL contains control characters (0x00-0x1f, 0x7f).
- The hostname contains a space.
- The port number contains non-numeric characters.
- The port number is out of range.
and so on.
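To make the control-character and port rules concrete, here is a minimal sketch of what such checks can look like. It is not the library's actual implementation: has_control_char() and parse_port() are hypothetical names, the 0-65535 port range is an assumption, and treating an empty port string as invalid is also an assumption.

```c
/* Hypothetical check: returns 1 if s contains a control character
 * (0x01-0x1f or 0x7f), otherwise 0. 0x00 terminates a C string, so it
 * can never be seen inside s. */
static int has_control_char(const char *s)
{
    for (; *s != '\0'; s++) {
        unsigned char ch = (unsigned char)*s;
        if (ch < 0x20 || ch == 0x7f) {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical check: parses a decimal port string. Returns the port
 * (assumed range 0-65535), or -1 if the string is empty, contains a
 * non-digit, or is out of range. */
static int parse_port(const char *s)
{
    long value = 0;

    if (*s == '\0') {
        return -1;
    }
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9') {
            return -1;
        }
        value = value * 10 + (*s - '0');
        if (value > 65535) {
            return -1; /* stop early, so long inputs cannot overflow */
        }
    }
    return (int)value;
}
```

Under these assumptions, parse_port("8080") yields 8080, while parse_port("80a") and parse_port("65536") both yield -1.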
The free_url_components() function frees a URL_COMPONENTS structure returned by parse_url().
    typedef struct url_components {
        char *scheme;
        char *user;
        char *password;
        char *host;
        int port;
        char *path;
        char *query;
        char *fragment;
    } URL_COMPONENTS;
URL_COMPONENTS is returned by parse_url().
scheme, user, password, host, path, query, fragment fields:
If they are not in the URL, they will be set to NULL.
port field:
It's a port number. If it is not in the URL, it will be set to -1.
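Because an absent port is reported as -1 and an absent scheme as NULL, callers usually need a fallback before connecting. The sketch below shows one way to do that. effective_port() is a hypothetical helper, the defaults (80 for http, 443 for https) are the well-known ports rather than anything the parser supplies, and the comparison is case-sensitive because the source does not say whether the parser normalizes the scheme.

```c
#include <string.h>

/* Hypothetical helper: picks a port when the URL did not contain one.
 * scheme may be NULL; port is -1 when the URL had no port.
 * Returns -1 if no default is known. */
static int effective_port(const char *scheme, int port)
{
    if (port != -1) {
        return port;            /* explicit port always wins */
    }
    if (scheme == NULL) {
        return -1;              /* nothing to derive a default from */
    }
    if (strcmp(scheme, "http") == 0) {
        return 80;
    }
    if (strcmp(scheme, "https") == 0) {
        return 443;
    }
    return -1;
}
```

With a parsed result c, this would be called as `effective_port(c->scheme, c->port)`.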
http://user:pass@example.com:8080/foo?bar=baz#qux
.scheme http
.user user
.password pass
.host example.com
.port 8080
.path /foo
.query bar=baz
.fragment qux
http://example.com/foo
.scheme http
.host example.com
.port -1
.path /foo
Other fields are NULL.
http://example.com/foo?
.scheme http
.host example.com
.port -1
.path /foo
.query "" (empty string)
Other fields are NULL.
/foo
.port -1
.path /foo
Other fields are NULL.
file:///foo/bar
.scheme file
.port -1
.path /foo/bar
Other fields are NULL.
//example.com/foo
.host example.com
.port -1
.path /foo
Other fields are NULL.
See also unittest.c.
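The difference between a NULL field and an empty string matters when a URL is put back together. The following sketch recomposes a URL from component values using the conventions shown in the examples (NULL means absent, port -1 means absent). compose_url() is a hypothetical helper, not part of this library. It omits user and password for brevity, and it does not round-trip every input: file:///foo/bar comes back as file:/foo/bar, because host is NULL in that case.

```c
#include <stdio.h>

/* Hypothetical helper: rebuilds a URL from component values.
 * NULL strings and port -1 are treated as "not present".
 * Returns the length snprintf() reports (may exceed size on truncation). */
static int compose_url(char *buf, size_t size,
                       const char *scheme, const char *host, int port,
                       const char *path, const char *query,
                       const char *fragment)
{
    char port_str[16] = "";

    if (port != -1) {
        snprintf(port_str, sizeof port_str, ":%d", port);
    }
    return snprintf(buf, size, "%s%s%s%s%s%s%s%s%s%s",
                    scheme ? scheme : "", scheme ? ":" : "",
                    host ? "//" : "", host ? host : "",
                    host ? port_str : "",   /* a port needs a host */
                    path ? path : "",
                    query ? "?" : "", query ? query : "",
                    fragment ? "#" : "", fragment ? fragment : "");
}
```

Passing an empty string as query keeps the trailing "?", while passing NULL drops it, which mirrors the distinction between the two http://example.com/foo examples.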
    #include <stdio.h>
    #include "url_parser.h"

    int main()
    {
        const char *url = "https://example.com:8080/foo/?bar=baz";
        URL_COMPONENTS *c;

        c = parse_url(url);
        if (!c) {
            return -1;
        }

        printf("Scheme: %s\n", c->scheme ? c->scheme : "");
        printf("Host: %s\n", c->host ? c->host : "");
        if (c->port != -1) {
            printf("Port: %d\n", c->port);
        }
        printf("Path: %s\n", c->path ? c->path : "");
        printf("Query: %s\n", c->query ? c->query : "");

        free_url_components(c);
        return 0;
    }
Result
    Scheme: https
    Host: example.com
    Port: 8080
    Path: /foo/
    Query: bar=baz