There was a recent question in the /r/Rlanguage subreddit which piqued my interest. They asked how to find the right mapping, and with the large number of data types I wondered if there was a good way to dynamically discover how fields are cast.
First step is to decide how to communicate with the database. I used the package RMySQL for this.
1 2 | library(RMySQL) |
Now to connect to the DB. I am running this locally. Make sure to use a password, even locally.
1 2 3 4 5 6 | db_sql=dbConnect(MySQL(),user="root", host="127.0.0.1", dbname="test", password="", port=3306) |
I included two different queries. The one below grabs all of the rows from the table.
1 2 3 | q <- "SELECT * FROM main" rows <- dbGetQuery(db_sql, q) |
We get a couple warnings with the current table.
1 2 3 4 5 6 | Warning messages: 1: In .local(conn, statement, ...) : Unsigned INTEGER in col 0 imported as numeric 2: In .local(conn, statement, ...) : unrecognized MySQL field type 7 in column 5 imported as character |
We can use this loop to get the column name and data type for the imported dataframe.
1 2 3 4 | for(column in names(rows)) { print(paste0(column,": ",typeof(rows[[column]]))) } |
We get this nice output.
1 2 3 4 5 6 7 8 9 | [1] "id: double" [1] "char_test: character" [1] "float_test: double" [1] "bool_test: integer" [1] "double_test: double" [1] "timestamp_test: character" [1] "datetime_test: character" [1] "date_test: character" |
The column names in MySQL were the data type followed by _test
. As you can see, they were not all imported with a matching data type.
We can get a 1:1 mapping of the column type in MySQL and imported data type in R with the following piece of code.
1 2 3 4 | q <- "SELECT DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'main'" schema <- dbGetQuery(db_sql, q) print(data.frame(SQL = schema$DATA_TYPE, R = as.vector(sapply(rows,typeof)))) |
This gives us a nice printout of a new dataframe created on the fly.
1 2 3 4 5 6 7 8 9 10 | SQL R 1 int double 2 char character 3 float double 4 tinyint integer 5 double double 6 timestamp character 7 datetime character 8 date character |
And of course we need to close the connection if no more queries need to be made.
1 2 | dbDisconnect(db_sql) |
Hi, thank for this point.
I use RMariaDB for connect to MySQL, is similar package but it is more modern over RMySQL.
In RMySQL anytime i get error in inconding when i work with cyrillic characters.