# Medicare {#sec-careapi}

```{r}
#| label: setup-common-01
#| include: false
source("includes/_common.R")

remove_at_symbol <- \(x) fuimus::sf_remove(s = x, p = "@", fix = TRUE)
```

## Links

 * [CMS API Docs](https://data.cms.gov/api-docs)
 * [API FAQ pdf](https://data.cms.gov/sites/default/files/2024-10/7ef65521-65a4-41ed-b600-3a0011f8ec4b/API%20Guide%20Formatted%201_6.pdf)

The **data.CMS.gov/data.json** API:

 * offers access to CMS public data
 * is RESTful
 * has predictable resource-oriented URLs
 * accepts form-encoded requests
 * returns JSON & JSON:API responses
 * uses standard HTTP response codes

To integrate endpoint requests with the **data.cms.gov Public API**, follow these steps:

 1. Find Dataset & Identifier:
    * [Search for a dataset](https://data.cms.gov/search)
    * Visit its **Overview** page
    * Click **Access API** then **API Docs for the Dataset**
 2. Endpoint structure:
    * `data.cms.gov/data-api/v1/dataset/{identifier}/data`
 3. Query format (**JSON:API** syntax):
    * Key-values: `filter[field_name]=value&filter[field_other]=value`
    * Responses support a maximum size of `5000` rows
    * Use `size` & `offset` query parameters to page through the data
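As a quick illustration of the endpoint structure and query parameters above, the sketch below builds one such request with `httr2` (used throughout this chapter). The identifier is the one that appears in the Opt Out Affidavits examples later in the chapter, and the bracketed `filter[State Code]` key is shown purely as an illustration of the documented key-value syntax; exact encoding requirements may vary.

```r
library(httr2)

# Endpoint structure: data.cms.gov/data-api/v1/dataset/{identifier}/data
# Identifier below is the one used for the Opt Out Affidavits examples in this chapter.
request("https://data.cms.gov/data-api/v1/dataset") |>
  req_url_path_append("9887a515-7552-4693-bf58-735c77af46d7") |>
  req_url_path_append("data") |>
  req_url_query(
    `filter[State Code]` = "GA", # JSON:API key-value filter (illustrative)
    size   = 10,                 # rows per request (maximum 5000)
    offset = 0                   # starting row, for paging
  ) |>
  req_perform() |>
  resp_body_json(simplifyVector = TRUE) |>
  str(max.level = 1)
```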
--------------------------------------------------------------------------------

::: {#nte-hierarchy .callout-note collapse="true"}
## Hierarchy

```r
<catalog>
- conformsTo
- context
- describedBy
- id
- type == dcat:Catalog
|=>
  <dataset>
  - type == dcat:Dataset
  - accessLevel
  -> accrualPeriodicity
  - bureauCode
  - [contactPoint]
    - type
    - fn
    - hasEmail
  - dataQuality
  - describedBy
  - describedByType
  |-> ($description)
  |-> ($identifier)
  -> keyword
  -> landingPage
  - language
  - license
  -> modified
  - [publisher]
    - type
    - name
  - programCode
  -> references
  |-> temporal
  - theme
  |-> title
  |=>
    <distribution>
    - type == dcat:Distribution
    |-> accessURL
    |-> ($description)
        |-> latest
    |-> downloadURL
    |-> ($format)
        |-> API
    |-> ($mediaType)
        - application/vnd.ms-excel
        - application/zip
        |-> text/csv
    |-> modified
    |-> resourcesAPI
    |-> title
    |-> temporal
```
:::

## Catalog

The [`data.json`](https://data.cms.gov/data.json) file is an [**Open Data**](https://resources.data.gov/resources/dcat-us/) catalog containing all available datasets. As new data is added, `data.json` is automatically updated.

```{r}
#| label: data_json
data_json <- read_json_arrow(
  file          = "https://data.cms.gov/data.json",
  col_select    = c("dataset"),
  as_data_frame = TRUE) |>
  to_duckdb()

data_json |> glimpse()
```

> **Note:** Removing `col_select = c("dataset")` from the above call returns the following metadata:

```{r}
#| label: col_select_remove
#| echo: false
c("context     <chr> https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld",
  "id          <chr> https://data.cms.gov/data.json",
  "type        <chr> dcat:Catalog",
  "conformsTo  <chr> https://project-open-data.cio.gov/v1.1/schema",
  "describedBy <chr> https://project-open-data.cio.gov/v1.1/schema/catalog.json",
  "dataset     <list> [<data.frame[138 x 22]>]") |>
  cat(sep = "\n")
```

## Dataset

Within `data.json`, there is an array called `dataset`:

```{r}
#| label: catalog_dataset
dataset <- data_json |>
  pull(dataset) |>
  pluck(1) |>
  as_tibble()

dataset
```

One can search through this array using the dataset's `title`, such as `"Payroll Based Journal Daily Nurse Staffing"`:

```{r}
#| label: dataset_title
dataset |>
  filter(grepl("Payroll Based Journal Daily Nurse Staffing", title)) |>
  unnest_wider(contactPoint, names_sep = "_") |>
  unnest_wider(publisher, names_sep = "_") |>
  mutate(
    bureauCode  = unlist(bureauCode, use.names = FALSE),
    keyword     = map_chr(keyword, \(x) paste0(unlist(x, use.names = FALSE), collapse = ", ")),
    language    = unlist(language, use.names = FALSE),
    programCode = unlist(programCode, use.names = FALSE),
    references  = unlist(references, use.names = FALSE),
    theme       = unlist(theme, use.names = FALSE)) |>
  rename_with(remove_at_symbol) |>
  purse()
```

## Distribution

Within `dataset`, there is an array called `distribution` which contains all dataset versions, in all available formats:

```{r}
#| label: distribution
distribution <- dataset |>
  select(distribution) |>
  unnest(distribution) |>
  rename_with(remove_at_symbol)

distribution |> purse()
```

### Formats

 * __description__: `"latest"` == URL always pointing to latest data
 * __mediaType__: `"text/csv"` == downloadable CSV file
 * __mediaType__: `"application/zip"` == downloadable ZIP file
 * __format__: `"API"` == API endpoint
 * __temporal__: Data at fixed points in time

```{r}
#| label: emphatic
#| echo: false
distribution |>
  count(description, format, mediaType) |>
  emphatic::as_emphatic() |>
  emphatic::hl(ggplot2::scale_colour_viridis_c(), cols = "n") |>
  emphatic::hl_adjust(text_contrast = 0.5, na = "-")
```

For instance, the following URL will *always* point to the **Q2 2021 Payroll Based Journal Daily Nurse Staffing** data:

```{r}
#| label: staffing
staffing <- distribution |>
  filter(grepl("Payroll Based Journal Daily Nurse Staffing", title),
         format == "API",
         grepl("^2021-04", temporal))

staffing |> purse()

staffing |>
  pull(accessURL) |>
  request() |>
  req_perform() |>
  resp_body_json(simplifyVector = TRUE) |>
  head() |>
  purse()
```

### Temporal Data

Datasets with multiple historical versions available will have a `temporal` field in the `distribution` array of the `data.json`.

> **Format**: `YYYY-mm-dd/YYYY-mm-dd`

The following example returns the [2017 Medicare Inpatient Hospitals - by Provider and Service](https://data.cms.gov/provider-summary-by-type-of-service/medicare-inpatienthospitals/medicare-inpatient-hospitals-by-provider-and-service):

```{r}
#| label: temporal_data
temporal <- distribution |>
  filter(grepl("Medicare Inpatient Hospitals - by Provider and Service", title),
         format == "API",
         temporal == "2017-01-01/2017-12-31")

temporal |> purse()

temporal |>
  pull(accessURL) |>
  request() |>
  req_perform() |>
  resp_body_json(simplifyVector = TRUE) |>
  head() |>
  purse()
```
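Because each historical release is its own row in `distribution`, listing every `temporal` range for a title is a convenient way to see which earlier releases exist before filtering down to one. A minimal sketch, assuming the `distribution` table built above:

```r
# List every available release of a dataset by its temporal range
# (assumes `distribution` from the chunk above is in scope)
distribution |>
  filter(grepl("Medicare Inpatient Hospitals - by Provider and Service", title),
         format == "API",
         !is.na(temporal)) |>
  select(temporal, modified, accessURL) |>
  arrange(desc(temporal))
```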
## Different JSON Methods

There are two methods of accessing the latest data. Both result in a URL pointing to the most recent version of the dataset. This URL is canonical, i.e., it will not change as new versions are added.

> For this reason, it is _recommended to **always** start with the **`data.json` object** as opposed to hardcoding any URL_.

### Standard JSON

Use the __`distribution`__ entry whose __`description`__ is `"latest"`:

```r
<data_json>
=> <dataset>
   => <distribution>
      -> ($description == "latest")
      -> ($accessURL)
```

For example, this URL for **Opt Out Affidavits** is:

```r
https://data.cms.gov/
  data-api/v1/dataset/9887a515-7552-4693-bf58-735c77af46d7/
  data
  ^^^^
```

```{r}
#| label: json_standard
distribution |>
  filter(grepl("Order and Referring", title),
         description == "latest") |>
  purse()
```

### JSON:API

The **JSON:API** form has a slightly different structure that includes metadata about the dataset. Otherwise it is identical to the **Standard JSON** method. Use the URL in the `identifier` field.

```r
<data_json>
=> <dataset>
   -> ($title == "Order and Referring")
   -> ($identifier)
```

For example, this URL for **Opt Out Affidavits** is:

```r
https://data.cms.gov/
  data-api/v1/dataset/9887a515-7552-4693-bf58-735c77af46d7/
  data-viewer
  ^^^^^^^^^^^
```

```{r}
#| label: json_api
dataset |>
  filter(grepl("Order and Referring", title)) |>
  unnest_wider(contactPoint, names_sep = "_") |>
  unnest_wider(publisher, names_sep = "_") |>
  mutate(
    bureauCode  = unlist(bureauCode, use.names = FALSE),
    keyword     = map_chr(keyword, \(x) paste0(unlist(x, use.names = FALSE), collapse = ", ")),
    language    = unlist(language, use.names = FALSE),
    programCode = unlist(programCode, use.names = FALSE),
    references  = unlist(references, use.names = FALSE),
    theme       = unlist(theme, use.names = FALSE)) |>
  rename_with(remove_at_symbol) |>
  purse()
```

--------------------------------------------------------------------------------

# Pagination

By default, a request returns the first **1,000 rows**; the `size` parameter can raise that limit to **5,000 rows per request**. Pagination with `size` and `offset` retrieves the entire dataset. For example, with the **Opt Out Affidavits** dataset, start with the following request to get the number of rows:

```{r}
#| label: pagination_1
base <- request("https://data.cms.gov/data-api/v1/dataset") |>
  req_url_path_append("9887a515-7552-4693-bf58-735c77af46d7") |>
  req_url_path_append("data")

base

stats <- base |> req_url_path_append("stats")

stats

stats <- stats |>
  req_perform() |>
  resp_body_json(simplifyVector = TRUE)

stats |> purse()

sq <- providertwo:::offset_seq(stats$found_rows, 5000)

glue::glue(
  "{base$url}",
  "?size=5000&",
  "offset=",
  "{sq}")
```
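To actually collect every page rather than just print the paged URLs, one approach is to request each offset in turn and stack the results. This is a minimal sketch, assuming the `base` request and `stats$found_rows` from the chunk above, plus `purrr`; `providertwo:::offset_seq()` presumably yields an equivalent offset sequence.

```r
# Fetch every 5,000-row page and bind the pages into one data frame
# (assumes `base` and `stats$found_rows` from the pagination chunk above)
offsets <- seq(0L, stats$found_rows - 1L, by = 5000L)

optout_all <- purrr::map(offsets, \(o) {
  base |>
    req_url_query(size = 5000, offset = o) |>
    req_perform() |>
    resp_body_json(simplifyVector = TRUE)
}) |>
  purrr::list_rbind()

nrow(optout_all) # should equal stats$found_rows
```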
## Pagination Example

This example paginates through the [Opt Out Affidavits](https://data.cms.gov/provider-characteristics/medicare-provider-supplier-enrollment/opt-out-affidavits) data:

::: callout
## Find the Number of Rows

Use the `stats` endpoints, i.e.:

 * `/data-viewer/stats`
 * `/data/stats`
:::

```r
data/stats?
  filter[f1][path]=State Code&
  filter[f1][value]=GA

data?
  size=5&
  filter[f1][path]=State Code&
  filter[f1][value]=GA
```

```{r}
#| label: pagination_optout
query <- glue::glue(
  'list("filter[id-{FID}][path]" = "{PATH}", "filter[id-{FID}][value]" = "{VALUE}")',
  FID   = 1,
  PATH  = "State Code",
  VALUE = "GA")

query

query <- query |>
  rlang::parse_expr() |>
  rlang::eval_bare()

query

accessURL <- distribution |>
  filter(grepl("Opt Out Affidavits : ", title),
         format == "API",
         description == "latest") |>
  pull(accessURL) |>
  request() |>
  req_url_query(size = 5000, !!!query)

resp_found <- accessURL |>
  req_url_path_append("stats") |>
  req_perform() |>
  resp_body_json() |>
  fuimus::gelm("found")

resp_found

accessURL |>
  req_perform() |>
  resp_body_json(simplifyVector = TRUE) |>
  tibble() |>
  purse()
```

# CSV Downloads

 1. Start with a request to the `data.json`.
 2. Match the dataset `title` with its name.
 3. The most recent release will be at the top of the `distribution` array.
 4. There is also a `temporal` field which can be used to find earlier releases.
 5. Datasets available as either CSV or ZIP files will be designated as such in the `mediaType` field.
 6. The `downloadURL` field will provide a direct download link for the data.

```{r}
#| label: csv_download
#| eval: true
orderrefer_url <- distribution |>
  filter(grepl("Order and Referring", title),
         mediaType == "text/csv") |>
  slice(1) |>
  pull(downloadURL)

orderrefer_csv <- read_csv_arrow(
  file          = orderrefer_url,
  as_data_frame = TRUE) |>
  to_duckdb()

# approx 2 million rows
orderrefer_csv
```

::: {#nte-zips .callout-note collapse="true"}
## ZIP Files

_CMS Summary Statistics/COVID data_

_Likely not relevant to package scope_

_A minimal download sketch for ZIP distributions appears at the end of this chapter._
:::

# _New:_ **Resources**

Resources (supplemental documents to the main dataset) are now available for download through the public API. Included are sub-files, tables, supplementary data, reports, and documentation. The new endpoints appear in the `resourcesAPI` field, a secondary endpoint from `data.json`.

> **Note**: Limit by the `name` of the resource you want to download. This name may change between versions.

## Example: Reassignment SubFile (`csv`)

> Site: [Public Provider Enrollment Reassignment SubFile](https://data.cms.gov/provider-characteristics/medicare-provider-supplier-enrollment/medicare-fee-for-service-public-provider-enrollment)

```{r}
#| label: add_resources
#| eval: true
resp_resources <- distribution |>
  filter(grepl("Medicare Fee-For-Service", title),
         description == "latest") |>
  pull(resourcesAPI) |>
  request() |>
  req_perform() |>
  resp_body_json(simplifyVector = FALSE) |>
  list_flatten() |>
  list_flatten()

reassign_url <- tibble(
  name = fuimus::gelm(resp_resources, "name$") |> unlist(use.names = FALSE),
  size = fuimus::gelm(resp_resources, "fileSize$") |> unlist(use.names = FALSE),
  url  = fuimus::gelm(resp_resources, "downloadURL$") |> unlist(use.names = FALSE))

reassign_url

reassign_url <- reassign_url |>
  filter(grepl("Reassignment", name))

reassign_url |> purse()

reassign_csv <- read_csv_arrow(
  file          = reassign_url$url,
  as_data_frame = TRUE) |>
  to_duckdb()

reassign_csv
```

--------------------------------------------------------------------------------
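As referenced in the ZIP Files note above, distributions with `mediaType == "application/zip"` can be fetched through the same `downloadURL` workflow as CSVs; the archive just needs to be extracted before reading. A minimal sketch, assuming the `distribution` table from earlier; the `"Summary Statistics"` title pattern is only a placeholder for whichever ZIP-format dataset is of interest.

```r
# Download a ZIP distribution via its downloadURL, then read an extracted CSV
# (assumes `distribution` from earlier; "Summary Statistics" is a placeholder title pattern)
zip_url <- distribution |>
  filter(grepl("Summary Statistics", title),
         mediaType == "application/zip") |>
  slice(1) |>
  pull(downloadURL)

zip_path <- tempfile(fileext = ".zip")
download.file(zip_url, zip_path, mode = "wb")

extracted <- unzip(zip_path, exdir = tempdir())      # extract all files, return their paths
csv_files <- extracted[grepl("\\.csv$", extracted)]  # keep only the CSVs

read_csv_arrow(file = csv_files[1], as_data_frame = TRUE) |>
  to_duckdb()
```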