Unicode/Chinese replication issue. #154

Open
BillyGEG opened this issue Aug 1, 2023 · 0 comments

BillyGEG commented Aug 1, 2023

Describe the bug
This happens when I try to replicate a table containing Chinese characters from PostgreSQL to MSSQL.

What I saw in PostgreSQL:

id  col_chi
--  -------
1   陳大文
2   吳小明

What I saw after replicating to MSSQL using ReplicaDB:

id  col_chi
--  -------
1   ???
2   ???
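
One common way to end up with literal `?` characters is a lossy conversion to a non-Unicode code page somewhere between source and sink. The sketch below is a Python illustration only, not ReplicaDB's code. It assumes a single-byte code page (`cp1252`, picked purely as an example), and it has not been confirmed that this is what happens in this pipeline.

```python
# Illustration only: how text turns into "???" when it passes through
# a code page that cannot represent it. cp1252 is an assumed example;
# the code page actually involved in this issue (if any) is unknown.

def lossy_round_trip(text: str, codepage: str = "cp1252") -> str:
    """Encode to a legacy code page, replacing unmappable characters, then decode."""
    raw = text.encode(codepage, errors="replace")  # unmappable chars become b"?"
    return raw.decode(codepage)


def unicode_round_trip(text: str) -> str:
    """Encode to UTF-16LE (the encoding behind SQL Server nvarchar) and decode."""
    return text.encode("utf-16-le").decode("utf-16-le")


if __name__ == "__main__":
    for name in ("陳大文", "吳小明"):
        print(name, "->", lossy_round_trip(name), "|", unicode_round_trip(name))
```

Each three-character name collapses to three question marks on the lossy path and survives unchanged on the Unicode path, which matches the symptom shown above.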

If I instead run an INSERT ... SELECT from PostgreSQL through a linked server on MSSQL:
insert into test_chinese_1
select * from openquery(link_postgres, 'select * from public.test_chinese_1;')

the Chinese characters are preserved:

id  col_chi
--  -------
1   陳大文
2   吳小明
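
Since the linked-server path keeps the text intact, it may help to check what is actually stored in the `nvarchar` column after a ReplicaDB run. SQL Server stores `nvarchar` data as UTF-16 little-endian, so the bytes returned by a query along the lines of `SELECT CONVERT(varbinary(40), col_chi) FROM dbo.test_chinese_1` can be compared by hand with the expected encoding. The helpers `nvarchar_hex` and `looks_like_stored_question_marks` below are hypothetical names for a small Python sketch; they are not part of ReplicaDB or of any driver.

```python
# Diagnostic sketch: compute the UTF-16LE hex that an nvarchar column
# should contain, to compare by hand with CONVERT(varbinary, col_chi).
# Both functions are hypothetical helpers written for this example.

def nvarchar_hex(text: str) -> str:
    """Return the UTF-16LE bytes of `text` as an uppercase hex string with a 0x prefix."""
    return "0x" + text.encode("utf-16-le").hex().upper()


def looks_like_stored_question_marks(hex_value: str) -> bool:
    """True if the value consists only of UTF-16LE '?' code units (3F00)."""
    body = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    body = body.upper()
    return len(body) > 0 and len(body) % 4 == 0 and all(
        body[i:i + 4] == "3F00" for i in range(0, len(body), 4)
    )


if __name__ == "__main__":
    print("expected:", nvarchar_hex("陳大文"))
    print("if lost: ", nvarchar_hex("???"))
```

If the column holds `0x3F003F003F00`, the characters were already replaced before or during the insert, rather than being rendered incorrectly by the client tool.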

Verbose log
[~]$ /replicadb/bin/replicadb -v --options-file /replicadb/conf/_replicadb_bis.conf --source-table public.test_chinese_1 --sink-table dbo.test_chinese_1 --mode complete-atomic
Picked up JAVA_TOOL_OPTIONS: -Djdbc.drivers=org.postgresql.Driver -Dfile.encoding=UTF8 -Dclient.encoding.override=GBK -Dpostgresql.enable_sspi=true -Duser.language=zh -Duser.country=TW
2023-08-01 10:17:04,871 INFO ReplicaDB:63 Running ReplicaDB version: 0.15.0
2023-08-01 10:17:04,877 INFO ReplicaDB:66 Setting verbose mode INFO
2023-08-01 10:17:05,268 INFO SQLServerManager:133 Creating staging table with this command: SELECT * INTO staging.test_chinese_1repdb010 FROM dbo.test_chinese_1 WHERE 0 = 1
2023-08-01 10:17:05,272 INFO SqlManager:388 Atomic and asynchronous deletion of all data from the sink table with this command: DELETE FROM dbo.test_chinese_1
2023-08-01 10:17:05,274 INFO ReplicaTask:35 Starting TaskId-0
2023-08-01 10:17:05,443 INFO SqlManager:128 TaskId-0: Executing SQL statement: SELECT * FROM public.test_chinese_1 OFFSET ?
2023-08-01 10:17:05,451 INFO SqlManager:148 TaskId-0: With args: 0,
2023-08-01 10:17:05,524 WARN ConnManager:188 Options source-columns and sink-columns are null, getting from Source ResultSetMetaData: id,col_chi
2023-08-01 10:17:05,524 INFO ReplicaTask:67 A total of 0 rows processed by task 0
2023-08-01 10:17:05,526 INFO ReplicaDB:120 Waiting for the asynchronous task to be completed...
2023-08-01 10:17:05,526 INFO SQLServerManager:50 IF OBJECTPROPERTY(OBJECT_ID('dbo.test_chinese_1'), 'TableHasIdentity') = 1 SET IDENTITY_INSERT dbo.test_chinese_1 ON
2023-08-01 10:17:05,526 INFO SqlManager:430 Inserting data from staging table to sink table within a transaction: INSERT INTO dbo.test_chinese_1 (id,col_chi) SELECT id,col_chi FROM staging.test_chinese_1repdb010
2023-08-01 10:17:05,528 INFO SQLServerManager:50 IF OBJECTPROPERTY(OBJECT_ID('dbo.test_chinese_1'), 'TableHasIdentity') = 1 SET IDENTITY_INSERT dbo.test_chinese_1 OFF
2023-08-01 10:17:05,529 INFO SqlManager:462 Dropping staging table with this command: DROP TABLE staging.test_chinese_1repdb010
2023-08-01 10:17:05,531 INFO ReplicaDB:54 Total process time: 668ms

To Reproduce
Steps to reproduce the behaviour:

  1. Source table DDL (PostgreSQL):
    create table public.test_chinese_1 (
    id serial,
    col_chi CHARACTER VARYING(10)
    );
    insert into public.test_chinese_1 (col_chi) values ('陳大文');
    insert into public.test_chinese_1 (col_chi) values ('吳小明');

  2. Sink table DDL (MSSQL):
    create table test_chinese_1(
    id int,
    col_chi nvarchar(10)
    )

  3. ReplicaDB configuration options-file.

jobs=1
fetch.size=1000
source.connect=jdbc:postgresql://postger_server:5432/nih?useUnicode=yes&characterEncoding=UTF8
source.user=
source.password=
# source.connect.parameter.useUnicode=yes
# source.connect.parameter.characterEncoding=UTF8

sink.connect=jdbc:sqlserver://mssql_server:1433;database=DM_GM_BIS_REPL;useUnicode=true;characterEncoding=UTF8
sink.user=
sink.password=
# sink.connect.parameter.[parameter_name]=parameter_value
# sink.connect.parameter.useUnicode=true
# sink.connect.parameter.characterEncoding=UTF8
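
To review exactly which properties the sink URL hands to the driver, the URL can be split into its `;key=value` parts. The Python sketch below does only that; `parse_sqlserver_url` is a hypothetical helper for inspection, not something ReplicaDB provides. As a note of caution, `useUnicode` and `characterEncoding` are best known as MySQL Connector/J properties, so whether the PostgreSQL and SQL Server drivers recognize them should be checked against each driver's documentation (the Microsoft driver documents `sendStringParametersAsUnicode` for controlling how string parameters are sent).

```python
# Sketch: split a SQL Server style JDBC URL into its ;key=value properties
# so the options passed to the driver are easy to review.
# parse_sqlserver_url is a hypothetical helper for inspection only.

def parse_sqlserver_url(url: str) -> tuple:
    """Return (server_part, properties_dict) for a jdbc:sqlserver:// URL."""
    prefix = "jdbc:sqlserver://"
    if not url.startswith(prefix):
        raise ValueError("not a jdbc:sqlserver:// URL")
    server, *pairs = url[len(prefix):].split(";")
    props = {}
    for pair in pairs:
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        props[key.strip()] = value.strip()
    return server, props


if __name__ == "__main__":
    sink = ("jdbc:sqlserver://mssql_server:1433;database=DM_GM_BIS_REPL;"
            "useUnicode=true;characterEncoding=UTF8")
    server, props = parse_sqlserver_url(sink)
    print(server)
    for key, value in props.items():
        print(f"  {key} = {value}")
```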

Expected behavior
The Chinese characters should be preserved after replication to MSSQL.
I tried several ways of forcing JDBC to use Unicode with UTF8/Big5/GBK encodings, but I am still unable to preserve the Chinese characters.

Additional context
I also tried setting JAVA_TOOL_OPTIONS at the OS level:

export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF8 -Dclient.encoding.override=GBK"
